diff --git a/CLAUDE.md b/CLAUDE.md index 39c2c14..fd365e0 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -33,6 +33,7 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC `web.py`: - Detection: `_compute_rms_windows_wav()` / `analyze_flac()` produce 100 ms RMS windows → `_noise_floor_db()` estimates the rolling floor → `_loud_sections()` emits scored sections → `_package_result()` shapes the `/api/analyze` payload. - Clips: `_api_clip()` validates params, `_clip_wav()` / `_clip_flac()` stream the decoded slice, `_wav_header()` builds the 44-byte PCM header. +- Filenames as a clock: `_recording_start()` parses the start time out of a filename stem; `_cut_filename()` turns a (stem, ext, start, end) into a wall-clock-named cut. Both the listing `date` field and `_api_cut()` use them. - Live headers: `_live_wav_header()`, `_live_flac_header()` (+ `_flac_frame_samples()`, CRC-8 verified). - Serving: `_stream()` (Range support), `_copy_to_response()`, `_safe_path()` (path traversal guard). @@ -54,6 +55,7 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC - **Recorder/web coupling is one file:** `RecorderManager` atomically writes `recordings/status.json` every 2 s listing in-progress files; deleted on clean shutdown. `web.py` reads it to show REC badges and to refuse analyse/cut/delete on active files. In-progress WAV/FLAC headers are unfinalized, so durations are not read for active files. - **Stream splits:** OGG/Opus/FLAC codec headers are extracted from the first ~16 KB of each connection and prepended to every split file so each file plays standalone. A new file is always opened on reconnect (gap in stream). MP3/AAC need no headers. - **Split timing:** files split at clock-aligned boundaries (`get_next_split_time()`), e.g. `split_minutes = 60` → on the hour. +- **Filename is the clock — fixed format, not configurable.** Recordings are named `%Y%m%d_%H%M%S.<ext>` (the *start* time). 
This is hardcoded as `FILENAME_FORMAT`, defined in **both** `isr.py` (recorder writes it) and `web.py` (reads it back) — the two copies must stay in sync. There is no `filename_pattern` config option (removed; `web.py` can't see `config.ini`, so a configurable pattern would break parsing). `web.py` derives the displayed DATE column from the filename via `_recording_start()` (falling back to mtime only for non-standard names — mtime is the last write ≈ end, not the start). Cut downloads are named by the wall-clock span they cover via `_cut_filename()`: a 22:31:30→22:32:30 slice of `20260523_220000.flac` becomes `20260523_22-31-30_22-32-30.flac`; non-standard source names fall back to `<stem>_cut_<start>s-<end>s`. - **ALSA:** capture spawns `arecord` as a subprocess, raw PCM read in 100 ms chunks by a thread. Device spec resolution: `default` → exact `hw:X,Y` → partial name → fallback to any literal ALSA PCM name (so `shared_mic` from asound.conf works without appearing in `arecord -l`). - **Shutdown:** SIGTERM is converted to KeyboardInterrupt in `main()`; `RecorderManager.stop()` joins all threads against a single shared 25 s deadline to stay inside Docker's `stop_grace_period: 30s`. - **Loud-section detection is adaptive — do not regress it to an absolute threshold.** Per-window dB is compared against a rolling noise floor (`NOISE_PERCENTILE`-th percentile per `NOISE_BLOCK_SECONDS` block, min-smoothed over ±2 blocks so events can't raise their own floor; clamped to ≥ `MIN_RMS`). A section needs `margin` dB of prominence and carries a `score` (peak dB above floor) used for ranking. Sections shorter than `min_duration` (default 0.5 s, after `min_gap` merging) are discarded — without this, isolated 100 ms pops (clicks, single raindrops) produced thousands of zero-length sections per day. The original fixed RMS threshold flagged every ambience change (passing cars, rain) and produced ~600 useless sections/day — that is why it was replaced. 
Known limitation: a short (~10 s) swell on a quiet street still flags because the floor blocks are 30 s; the planned fix is an onset/spectral filter or optional Silero VAD, **not** a higher margin. Tests in `tests/test_web.py`. diff --git a/README.md b/README.md index 6bba06b..f7e7d23 100644 --- a/README.md +++ b/README.md @@ -67,7 +67,6 @@ docker compose up -d --build |-----|---------|-------------| | `output_directory` | `recordings` | Output path relative to the working directory (or absolute). The Docker setup mounts `./recordings` at `/app/recordings` so this default works unchanged. | | `split_minutes` | `60` | Split into a new file every N minutes, aligned to clock boundaries (e.g. 60 → files start at :00, 30 → at :00 and :30). | -| `filename_pattern` | `%Y%m%d_%H%M%S` | strftime pattern; file extension is appended automatically. | | `max_retries` | `10` | Give up after this many consecutive failures per source. | | `retry_delay_seconds` | `5` | Wait between retries. | | `log_level` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` / `CRITICAL` | @@ -124,25 +123,17 @@ split_minutes = 60 [radio1] type = stream url = http://radio.example.com:8000/stream1 -filename_pattern = radio1_%Y%m%d_%H%M%S [system_audio] type = soundcard device = hw:0,0 -filename_pattern = system_%Y%m%d_%H%M%S ``` --- -## Filename patterns +## Filenames -strftime codes are substituted at split time. The file extension is added automatically. - -| Pattern | Example | -|---------|---------| -| `%Y%m%d_%H%M%S` | `20241225_143000.mp3` | -| `radio_%Y-%m-%d_%H%M` | `radio_2024-12-25_1430.mp3` | -| `%Y/%m/%d/rec_%H%M%S` | `2024/12/25/rec_143000.mp3` *(subdirs created automatically)* | +Recordings are named `<YYYYMMDD>_<HHMMSS>.<ext>` from the time the file is opened — its **start** time — e.g. `20241225_143000.flac`. This format is fixed and not configurable: the web UI parses the start time back out of the filename to show the recording date and to name cut clips with real wall-clock times (see below). 
--- @@ -169,7 +160,7 @@ Shows recordings grouped by day with collapsible sections. Features: - **Grace period** — configurable in the controls bar (default 2 s). Loud sections separated by less than this gap are merged into one. Raise this (e.g. to 15–30 s) when a single event generates many timestamps due to brief quiet gaps within it. - **Min duration** — configurable in the controls bar (default 0.5 s). Loud sections shorter than this (after grace-period merging) are discarded, so isolated sub-second pops — a click, a single raindrop — don't flood a day with thousands of near-zero-length sections. Set to 0 to disable. - **Clip playback** — clicking a loud-section chip plays a short server-rendered WAV clip (`/api/clip`, pre-roll included) in a player bar at the bottom of the page. Playback starts instantly even for sections deep inside multi-hundred-MB FLACs, because the browser never has to seek the full file. **J** / **K** (or the **Prev** / **Next** buttons) step through the queued sections — one file's, or a whole day's after **Highlights** — and **Auto-advance** plays the next section when one ends, turning a day's detections into a continuous review reel. **Open in file** switches to the full recording at the same position for context; each chip click also pre-fills the cut panel. -- **Cut & download** — `Cut` button opens the player row and reveals a cut panel. Enter start and end times in `m:ss` or `h:mm:ss` format and click **Download cut** to receive an ffmpeg-trimmed copy without re-encoding. Requires ffmpeg (included in the Docker image). +- **Cut & download** — `Cut` button opens the player row and reveals a cut panel. Enter start and end times in `m:ss` or `h:mm:ss` format and click **Download cut** to receive an ffmpeg-trimmed copy without re-encoding. Requires ffmpeg (included in the Docker image). The cut is named with the real wall-clock span it covers — `<YYYYMMDD>_<HH-MM-SS>_<HH-MM-SS>.<ext>`, e.g. 
a 22:31:30→22:32:30 slice of a recording started at 22:00:00 becomes `20260523_22-31-30_22-32-30.flac`. - **Filters** — live filename search and from/to date pickers above the table; applied client-side with no additional requests. Shows `N of M shown` when a filter is active. - **Delete** — `Delete` button per row with confirmation prompt; disabled for files currently being recorded; sends `DELETE /api/files/` and re-renders the table. - **Live REC badge** — files currently being written by `isr.py` show an animated REC indicator, polled every 5 seconds via `/api/status`. Duration for in-progress files shows `—` in the table (header is unfinalized until recording stops). The file list refreshes automatically when a recording starts, stops, or rolls over to a new split file (unless audio is playing). diff --git a/config.example.ini b/config.example.ini index 8430d3a..73b6592 100644 --- a/config.example.ini +++ b/config.example.ini @@ -17,13 +17,10 @@ output_directory = recordings # Duration in minutes after which to split into a new file split_minutes = 60 -# Filename pattern with strftime format codes -# Examples: -# %Y%m%d_%H%M%S -> 20241216_143000.ext -# recording_%Y-%m-%d_%H%M -> recording_2024-12-16_1430.ext -# %Y/%m/%d/audio_%H%M%S -> 2024/12/16/audio_143000.ext (creates subdirs) -# Common codes: %Y=year, %m=month, %d=day, %H=hour, %M=minute, %S=second -filename_pattern = %Y%m%d_%H%M%S +# Recording filenames are fixed as <YYYYMMDD>_<HHMMSS>.<ext> (the start time, +# e.g. 20241216_143000.flac). This is not configurable: the web UI parses the +# start time back out of the filename to show the date and to name cut clips +# with real wall-clock times. 
# Maximum number of connection/recording retry attempts before giving up max_retries = 10 @@ -58,7 +55,6 @@ format = auto # Override general settings for this source (optional): # output_directory = recordings/streams # split_minutes = 30 -# filename_pattern = mystream_%Y%m%d_%H%M%S # ============================================================================= @@ -93,7 +89,6 @@ format = auto # # Override general settings for this source (optional): # # output_directory = recordings/soundcard # # split_minutes = 60 -# # filename_pattern = soundcard_%Y%m%d_%H%M%S # ============================================================================= @@ -105,13 +100,11 @@ format = auto # type = stream # url = http://radio1.example.com:8000/live # format = auto -# filename_pattern = radio1_%Y%m%d_%H%M%S # # [radio_station_2] # type = stream # url = http://radio2.example.com:8000/live # format = auto -# filename_pattern = radio2_%Y%m%d_%H%M%S # # [system_audio] # type = soundcard @@ -119,7 +112,6 @@ format = auto # sample_rate = 48000 # channels = 2 # format = flac -# filename_pattern = system_%Y%m%d_%H%M%S # ============================================================================= diff --git a/isr.py b/isr.py index 22fc974..89334a6 100644 --- a/isr.py +++ b/isr.py @@ -42,6 +42,13 @@ except ImportError: SOUNDFILE_AVAILABLE = False +# Fixed recording-filename timestamp format. This is the recording's *start* +# time and is the single source of truth for the clock: web.py parses it back +# out to derive the displayed date and to name cut clips with real wall-clock +# times. It is intentionally not configurable — both files must agree on it. 
+FILENAME_FORMAT = '%Y%m%d_%H%M%S' + + # ============================================================================= # Audio Device & Backend System # ============================================================================= @@ -329,7 +336,6 @@ class BaseRecorder(ABC): # Common settings self.split_duration = config.get('split_minutes', 60) self.output_dir = config.get('output_directory', 'recordings') - self.filename_pattern = config.get('filename_pattern', '%Y%m%d_%H%M%S') self.max_retries = config.get('max_retries', 10) self.retry_delay = config.get('retry_delay_seconds', 5) self.file_format = config.get('format', 'auto') @@ -343,9 +349,9 @@ class BaseRecorder(ABC): return next_split.replace(second=0, microsecond=0) def generate_filename(self, ext: str) -> str: - """Generate filename from pattern with strftime substitution.""" + """Generate filename from the fixed start-time format (see FILENAME_FORMAT).""" now = self._clock() - filename = now.strftime(self.filename_pattern) + f".{ext}" + filename = now.strftime(FILENAME_FORMAT) + f".{ext}" full_path = os.path.join(self.output_dir, filename) Path(full_path).parent.mkdir(parents=True, exist_ok=True) return full_path @@ -806,7 +812,6 @@ class RecorderManager: general = { 'output_directory': config.get('general', 'output_directory', fallback='recordings'), 'split_minutes': config.getint('general', 'split_minutes', fallback=60), - 'filename_pattern': config.get('general', 'filename_pattern', fallback='%Y%m%d_%H%M%S', raw=True), 'max_retries': config.getint('general', 'max_retries', fallback=10), 'retry_delay_seconds': config.getint('general', 'retry_delay_seconds', fallback=5), 'log_level': config.get('general', 'log_level', fallback='INFO').upper(), diff --git a/tests/test_isr.py b/tests/test_isr.py index d87e2af..71e7cdb 100644 --- a/tests/test_isr.py +++ b/tests/test_isr.py @@ -142,7 +142,6 @@ class TestGetNextSplitTime: r._clock = fixed_clock(now) r.split_duration = split_minutes r.output_dir = 
cfg["output_directory"] - r.filename_pattern = "%Y%m%d_%H%M%S" r.max_retries = 3 r.retry_delay = 1 r.file_format = "auto" @@ -184,7 +183,7 @@ class TestGetNextSplitTime: class TestGenerateFilename: """Tests for BaseRecorder.generate_filename().""" - def _recorder(self, pattern: str, now: datetime, output_dir: str) -> isr.BaseRecorder: + def _recorder(self, now: datetime, output_dir: str) -> isr.BaseRecorder: class _Rec(isr.BaseRecorder): def record(self): pass @@ -198,28 +197,21 @@ class TestGenerateFilename: r._clock = fixed_clock(now) r.split_duration = 60 r.output_dir = output_dir - r.filename_pattern = pattern r.max_retries = 3 r.retry_delay = 1 r.file_format = "auto" return r - def test_basic_pattern(self, tmp_path): + def test_fixed_format(self, tmp_path): + # Filenames are always <YYYYMMDD>_<HHMMSS>.<ext> — the recording start. now = datetime(2024, 12, 25, 14, 30, 0) - r = self._recorder("%Y%m%d_%H%M%S", now, str(tmp_path)) + r = self._recorder(now, str(tmp_path)) name = r.generate_filename("mp3") assert name.endswith("20241225_143000.mp3") - def test_subdirectory_created(self, tmp_path): - now = datetime(2024, 12, 25, 14, 30, 0) - r = self._recorder("%Y/%m/%d/rec_%H%M%S", now, str(tmp_path)) - name = r.generate_filename("ogg") - parent = Path(name).parent - assert parent.exists() - def test_output_dir_prefix(self, tmp_path): now = datetime(2024, 1, 1, 0, 0, 0) - r = self._recorder("%Y%m%d", now, str(tmp_path)) + r = self._recorder(now, str(tmp_path)) name = r.generate_filename("wav") assert name.startswith(str(tmp_path)) @@ -236,7 +228,6 @@ class TestDetectFormat: "url": "http://example.com/stream", "output_directory": "/tmp", "split_minutes": 60, - "filename_pattern": "%Y%m%d_%H%M%S", "max_retries": 1, "retry_delay_seconds": 0, "format": "auto", @@ -252,7 +243,6 @@ class TestDetectFormat: r._clock = datetime.now r.split_duration = 60 r.output_dir = "/tmp" - r.filename_pattern = "%Y%m%d_%H%M%S" r.max_retries = 1 r.retry_delay = 0 r.file_format = "auto" @@ -446,7 +436,6 @@ class 
TestRecorderManagerLoadConfig: [general] output_directory = {str(tmp_path / "recordings")} split_minutes = 30 -filename_pattern = test_%Y%m%d max_retries = 3 retry_delay_seconds = 2 log_level = WARNING @@ -491,7 +480,6 @@ format = ogg [general] output_directory = {str(tmp_path / "recordings")} split_minutes = 60 -filename_pattern = %Y%m%d_%H%M%S max_retries = 10 retry_delay_seconds = 5 log_level = WARNING @@ -501,7 +489,6 @@ log_file = {log_file} type = stream url = http://example.com/stream split_minutes = 15 -filename_pattern = custom_%Y%m%d """ cfg_path = tmp_path / "config.ini" cfg_path.write_text(config_text) @@ -512,7 +499,6 @@ filename_pattern = custom_%Y%m%d rec = mgr.recorders[0] assert rec.split_duration == 15 - assert rec.filename_pattern == "custom_%Y%m%d" def test_unknown_type_is_skipped(self, tmp_path): log_file = str(tmp_path / "test.log") @@ -558,7 +544,6 @@ class TestStreamRecorderRecord: "url": "http://example.com/stream", "output_directory": "", # overridden per-test with tmp_path "split_minutes": 60, - "filename_pattern": "%Y%m%d_%H%M%S", "max_retries": 2, "retry_delay_seconds": 0, "format": fmt, @@ -610,7 +595,6 @@ class TestSoundcardRecorder: "format": "wav", "output_directory": str(tmp_path), "split_minutes": 60, - "filename_pattern": "%Y%m%d_%H%M%S", "max_retries": 1, "retry_delay_seconds": 0, } @@ -639,7 +623,6 @@ class TestSoundcardRecorder: "format": "flac", "output_directory": str(tmp_path), "split_minutes": 60, - "filename_pattern": "%Y%m%d_%H%M%S", "max_retries": 1, "retry_delay_seconds": 0, } @@ -662,7 +645,6 @@ class TestSoundcardRecorder: "format": "flac", "output_directory": str(tmp_path), "split_minutes": 60, - "filename_pattern": "%Y%m%d_%H%M%S", "max_retries": 1, "retry_delay_seconds": 0, } diff --git a/tests/test_web.py b/tests/test_web.py index 43a4127..e336ed6 100644 --- a/tests/test_web.py +++ b/tests/test_web.py @@ -2,7 +2,7 @@ import math -from web import _loud_sections, _noise_floor_db +from web import _cut_filename, 
_loud_sections, _noise_floor_db, _recording_start WINDOW_DUR = 0.1 # 100 ms windows, as produced by WINDOW_SAMPLES at 48 kHz @@ -99,3 +99,33 @@ def test_noise_floor_tracks_blocks_and_ignores_short_events(): floor = _noise_floor_db(db, WINDOW_DUR) assert len(floor) == len(db) assert all(abs(f - quiet_db) < 1.0 for f in floor) + + +# --------------------------------------------------------------------------- +# Filename parsing / cut naming +# --------------------------------------------------------------------------- + +def test_recording_start_parses_standard_name(): + from datetime import datetime + assert _recording_start("20260523_220000") == datetime(2026, 5, 23, 22, 0, 0) + + +def test_recording_start_rejects_nonstandard_name(): + assert _recording_start("radio1_20260523") is None + assert _recording_start("notes") is None + + +def test_cut_filename_uses_wall_clock_span(): + # Recording started 22:00:00; cut covers 22:31:30 → 22:32:30. + name = _cut_filename("20260523_220000", ".flac", 1890, 1950) + assert name == "20260523_22-31-30_22-32-30.flac" + + +def test_cut_filename_rolls_over_the_hour(): + name = _cut_filename("20260523_220000", ".wav", 3590, 3661) + assert name == "20260523_22-59-50_23-01-01.wav" + + +def test_cut_filename_falls_back_for_nonstandard_name(): + name = _cut_filename("mixtape", ".mp3", 740, 750.4) + assert name == "mixtape_cut_740s-750s.mp3" diff --git a/web.py b/web.py index cbc0e31..8e0474b 100644 --- a/web.py +++ b/web.py @@ -24,7 +24,7 @@ import subprocess import tempfile import threading import wave -from datetime import datetime +from datetime import datetime, timedelta from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer from pathlib import Path from urllib.parse import parse_qs, unquote, urlparse @@ -58,6 +58,12 @@ MIN_RMS = 0.002 # ≈ −54 dBFS; the floor never drops below this, CLIP_MAX_SECONDS = 600 # upper bound on /api/clip length +# Recording filenames encode the start time as this strftime format (kept in +# 
sync with isr.FILENAME_FORMAT). It is the authoritative recording start and +# the only reliable clock anchor — mtime drifts to the last write, so cut clip +# names and the displayed date are both derived from this. +FILENAME_FORMAT = '%Y%m%d_%H%M%S' + MIME_TYPES = { '.wav': 'audio/wav', '.mp3': 'audio/mpeg', @@ -72,6 +78,35 @@ MIME_TYPES = { # Audio analysis helpers # --------------------------------------------------------------------------- +def _recording_start(stem: str): + """Parse the recording start time encoded in a filename stem. + + Returns a datetime, or None if the stem is not in FILENAME_FORMAT + (e.g. a manually renamed file). strptime ignores any extension because + callers pass Path.stem. + """ + try: + return datetime.strptime(stem, FILENAME_FORMAT) + except ValueError: + return None + + +def _cut_filename(stem: str, ext: str, start: float, end: float) -> str: + """Name a cut by the real wall-clock span it covers. + + For a recording that started at 22:00:00, a 22:31:30→22:32:30 slice + (start=1890, end=1950) becomes ``20260523_22-31-30_22-32-30.flac``. + Falls back to the source stem plus second offsets when the filename is + not in FILENAME_FORMAT (e.g. a manually renamed recording). + """ + started = _recording_start(stem) + if started is None: + return f'{stem}_cut_{int(start)}s-{int(end)}s{ext}' + cut_start = started + timedelta(seconds=start) + cut_end = started + timedelta(seconds=end) + return f'{cut_start:%Y%m%d}_{cut_start:%H-%M-%S}_{cut_end:%H-%M-%S}{ext}' + + def _live_wav_header(path: Path, size: int): """Return the WAV header (through the 'data' chunk header) with RIFF and data sizes rewritten to match the current file size, or None. @@ -500,6 +535,11 @@ def list_files(recordings_dir: str): rel = str(path.relative_to(base)).replace('\\', '/') is_active = rel in active_files + # The recording start is encoded in the filename and is the true clock + # anchor; mtime is only a fallback for files not in FILENAME_FORMAT. 
+ started = _recording_start(path.stem) + date = (started or datetime.fromtimestamp(stat.st_mtime)).strftime('%Y-%m-%d %H:%M:%S') + # Skip reading partial headers for in-progress files — the WAV nframes # field and FLAC total_samples are both unfinalized while recording, # producing wildly incorrect values (e.g. 53375995583:39:01). @@ -509,7 +549,7 @@ def list_files(recordings_dir: str): 'name': rel, 'size': stat.st_size, 'mtime': stat.st_mtime, - 'date': datetime.fromtimestamp(stat.st_mtime).strftime('%Y-%m-%d %H:%M:%S'), + 'date': date, 'duration': duration, 'ext': path.suffix.lower().lstrip('.'), 'recording': is_active, @@ -831,7 +871,7 @@ class _Handler(BaseHTTPRequestHandler): return ext = path.suffix.lower() - out_name = f'{path.stem}_cut_{int(start)}s-{int(end)}s{ext}' + out_name = _cut_filename(path.stem, ext, start, end) # For lossless formats, re-encode (not copy) so the container header # is rewritten with the correct duration/size. For lossy formats,