feat: name cut clips by wall-clock time; fix recording filename format

Cut downloads were named by second offsets into the file (`..._cut_740s-750s.flac`). They are now named by the wall-clock span the slice covers, e.g. `20260523_22-31-30_22-32-30.flac` for a 22:31:30->22:32:30 cut of a recording started at 22:00:00.

To make this reliable, the recording filename is now a fixed `%Y%m%d_%H%M%S` start-time format (`FILENAME_FORMAT`) defined in both isr.py and web.py, replacing the user-configurable `filename_pattern`. web.py never reads config.ini, so a custom pattern could not be parsed back.

web.py parses the start time out of the filename via `_recording_start()` and builds cut names with `_cut_filename()`. The DATE column now also comes from the filename, falling back to mtime only for non-standard names, since mtime is the last write, not the start.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 14:30:30 +02:00
parent 2caf23f17d
commit 5e7620627b
7 changed files with 97 additions and 55 deletions
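The naming rule from the commit message can be tried in isolation. The following is a minimal sketch that mirrors the `_cut_filename()` logic in the web.py hunk; the name `cut_name` is a hypothetical stand-in used only for illustration, and the offsets are assumed to be seconds from the start of the recording.

```python
from datetime import datetime, timedelta

# Same format string the commit introduces as FILENAME_FORMAT.
FILENAME_FORMAT = "%Y%m%d_%H%M%S"


def cut_name(stem: str, ext: str, start: float, end: float) -> str:
    """Hypothetical stand-in for web.py's _cut_filename()."""
    try:
        # The stem is the recording's start time, e.g. "20260523_220000".
        started = datetime.strptime(stem, FILENAME_FORMAT)
    except ValueError:
        # Non-standard name: fall back to second offsets.
        return f"{stem}_cut_{int(start)}s-{int(end)}s{ext}"
    cut_start = started + timedelta(seconds=start)
    cut_end = started + timedelta(seconds=end)
    return f"{cut_start:%Y%m%d}_{cut_start:%H-%M-%S}_{cut_end:%H-%M-%S}{ext}"


print(cut_name("20260523_220000", ".flac", 1890, 1950))
# 20260523_22-31-30_22-32-30.flac
print(cut_name("mixtape", ".mp3", 740, 750.4))
# mixtape_cut_740s-750s.mp3
```

One consequence of this scheme: the date part comes from the cut's start, so a slice that crosses midnight keeps the start date and only the end time wraps (for example `20260523_23-59-50_00-00-10.flac`).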
@@ -33,6 +33,7 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC
 `web.py`:
 - Detection: `_compute_rms_windows_wav()` / `analyze_flac()` produce 100 ms RMS windows → `_noise_floor_db()` estimates the rolling floor → `_loud_sections()` emits scored sections → `_package_result()` shapes the `/api/analyze` payload.
 - Clips: `_api_clip()` validates params, `_clip_wav()` / `_clip_flac()` stream the decoded slice, `_wav_header()` builds the 44-byte PCM header.
 - Filenames as a clock: `_recording_start()` parses the start time out of a filename stem; `_cut_filename()` turns a (stem, ext, start, end) into a wall-clock-named cut. Both the listing `date` field and `_api_cut()` use them.
 - Live headers: `_live_wav_header()`, `_live_flac_header()` (+ `_flac_frame_samples()`, CRC-8 verified).
 - Serving: `_stream()` (Range support), `_copy_to_response()`, `_safe_path()` (path traversal guard).
@@ -54,6 +55,7 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC
 - **Recorder/web coupling is one file:** `RecorderManager` atomically writes `recordings/status.json` every 2 s listing in-progress files; deleted on clean shutdown. `web.py` reads it to show REC badges and to refuse analyse/cut/delete on active files. In-progress WAV/FLAC headers are unfinalized, so durations are not read for active files.
 - **Stream splits:** OGG/Opus/FLAC codec headers are extracted from the first ~16 KB of each connection and prepended to every split file so each file plays standalone. A new file is always opened on reconnect (gap in stream). MP3/AAC need no headers.
 - **Split timing:** files split at clock-aligned boundaries (`get_next_split_time()`), e.g. `split_minutes = 60` → on the hour.
 - **Filename is the clock — fixed format, not configurable.** Recordings are named `%Y%m%d_%H%M%S.<ext>` (the *start* time). This is hardcoded as `FILENAME_FORMAT`, defined in **both** `isr.py` (recorder writes it) and `web.py` (reads it back) — the two copies must stay in sync. There is no `filename_pattern` config option (removed; `web.py` can't see `config.ini`, so a configurable pattern would break parsing). `web.py` derives the displayed DATE column from the filename via `_recording_start()` (falling back to mtime only for non-standard names — mtime is the last write ≈ end, not the start). Cut downloads are named by the wall-clock span they cover via `_cut_filename()`: a 22:31:30→22:32:30 slice of `20260523_220000.flac` becomes `20260523_22-31-30_22-32-30.flac`; non-standard source names fall back to `<stem>_cut_<start>s-<end>s`.
 - **ALSA:** capture spawns `arecord` as a subprocess, raw PCM read in 100 ms chunks by a thread. Device spec resolution: `default` → exact `hw:X,Y` → partial name → fallback to any literal ALSA PCM name (so `shared_mic` from asound.conf works without appearing in `arecord -l`).
 - **Shutdown:** SIGTERM is converted to KeyboardInterrupt in `main()`; `RecorderManager.stop()` joins all threads against a single shared 25 s deadline to stay inside Docker's `stop_grace_period: 30s`.
 - **Loud-section detection is adaptive — do not regress it to an absolute threshold.** Per-window dB is compared against a rolling noise floor (`NOISE_PERCENTILE`-th percentile per `NOISE_BLOCK_SECONDS` block, min-smoothed over ±2 blocks so events can't raise their own floor; clamped to ≥ `MIN_RMS`). A section needs `margin` dB of prominence and carries a `score` (peak dB above floor) used for ranking. Sections shorter than `min_duration` (default 0.5 s, after `min_gap` merging) are discarded — without this, isolated 100 ms pops (clicks, single raindrops) produced thousands of zero-length sections per day. The original fixed RMS threshold flagged every ambience change (passing cars, rain) and produced ~600 useless sections/day — that is why it was replaced. Known limitation: a short (~10 s) swell on a quiet street still flags because the floor blocks are 30 s; the planned fix is an onset/spectral filter or optional Silero VAD, **not** a higher margin. Tests in `tests/test_web.py`.
@@ -67,7 +67,6 @@ docker compose up -d --build
 |-----|---------|-------------|
 | `output_directory` | `recordings` | Output path relative to the working directory (or absolute). The Docker setup mounts `./recordings` at `/app/recordings` so this default works unchanged. |
 | `split_minutes` | `60` | Split into a new file every N minutes, aligned to clock boundaries (e.g. 60 → files start at :00, 30 → at :00 and :30). |
-| `filename_pattern` | `%Y%m%d_%H%M%S` | strftime pattern; file extension is appended automatically. |
 | `max_retries` | `10` | Give up after this many consecutive failures per source. |
 | `retry_delay_seconds` | `5` | Wait between retries. |
 | `log_level` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` / `CRITICAL` |
@@ -124,25 +123,17 @@ split_minutes    = 60
 [radio1]
 type             = stream
 url              = http://radio.example.com:8000/stream1
-filename_pattern = radio1_%Y%m%d_%H%M%S
 [system_audio]
 type             = soundcard
 device           = hw:0,0
-filename_pattern = system_%Y%m%d_%H%M%S
 ```
 ---
-## Filename patterns
+## Filenames
-strftime codes are substituted at split time. The file extension is added automatically.
+Recordings are named `<YYYYMMDD>_<HHMMSS>.<ext>` from the time the file is opened — its **start** time — e.g. `20241225_143000.flac`. This format is fixed and not configurable: the web UI parses the start time back out of the filename to show the recording date and to name cut clips with real wall-clock times (see below).
-| Pattern | Example |
-|---------|---------|
-| `%Y%m%d_%H%M%S` | `20241225_143000.mp3` |
-| `radio_%Y-%m-%d_%H%M` | `radio_2024-12-25_1430.mp3` |
-| `%Y/%m/%d/rec_%H%M%S` | `2024/12/25/rec_143000.mp3` *(subdirs created automatically)* |
 ---
@@ -169,7 +160,7 @@ Shows recordings grouped by day with collapsible sections. Features:
 - **Grace period** — configurable in the controls bar (default 2 s). Loud sections separated by less than this gap are merged into one. Raise this (e.g. to 15–30 s) when a single event generates many timestamps due to brief quiet gaps within it.
 - **Min duration** — configurable in the controls bar (default 0.5 s). Loud sections shorter than this (after grace-period merging) are discarded, so isolated sub-second pops — a click, a single raindrop — don't flood a day with thousands of near-zero-length sections. Set to 0 to disable.
 - **Clip playback** — clicking a loud-section chip plays a short server-rendered WAV clip (`/api/clip`, pre-roll included) in a player bar at the bottom of the page. Playback starts instantly even for sections deep inside multi-hundred-MB FLACs, because the browser never has to seek the full file. **J** / **K** (or the **Prev** / **Next** buttons) step through the queued sections — one file's, or a whole day's after **Highlights** — and **Auto-advance** plays the next section when one ends, turning a day's detections into a continuous review reel. **Open in file** switches to the full recording at the same position for context; each chip click also pre-fills the cut panel.
- **Cut & download** — `Cut` button opens the player row and reveals a cut panel. Enter start and end times in `m:ss` or `h:mm:ss` format and click **Download cut** to receive an ffmpeg-trimmed copy without re-encoding. Requires ffmpeg (included in the Docker image).
+- **Cut & download** — `Cut` button opens the player row and reveals a cut panel. Enter start and end times in `m:ss` or `h:mm:ss` format and click **Download cut** to receive an ffmpeg-trimmed copy without re-encoding. Requires ffmpeg (included in the Docker image). The cut is named with the real wall-clock span it covers — `<YYYYMMDD>_<HH-MM-SS>_<HH-MM-SS>.<ext>`, e.g. a 22:31:30→22:32:30 slice of a recording started at 22:00:00 becomes `20260523_22-31-30_22-32-30.flac`.
 - **Filters** — live filename search and from/to date pickers above the table; applied client-side with no additional requests. Shows `N of M shown` when a filter is active.
 - **Delete** — `Delete` button per row with confirmation prompt; disabled for files currently being recorded; sends `DELETE /api/files/<name>` and re-renders the table.
 - **Live REC badge** — files currently being written by `isr.py` show an animated REC indicator, polled every 5 seconds via `/api/status`. Duration for in-progress files shows `—` in the table (header is unfinalized until recording stops). The file list refreshes automatically when a recording starts, stops, or rolls over to a new split file (unless audio is playing).
@@ -17,13 +17,10 @@ output_directory = recordings
 # Duration in minutes after which to split into a new file
 split_minutes = 60
-# Filename pattern with strftime format codes
+# Recording filenames are fixed as <YYYYMMDD>_<HHMMSS>.<ext> (the start time,
-# Examples:
+# e.g. 20241216_143000.flac). This is not configurable: the web UI parses the
-#   %Y%m%d_%H%M%S           -> 20241216_143000.ext
+# start time back out of the filename to show the date and to name cut clips
-#   recording_%Y-%m-%d_%H%M -> recording_2024-12-16_1430.ext
+# with real wall-clock times.
-#   %Y/%m/%d/audio_%H%M%S   -> 2024/12/16/audio_143000.ext (creates subdirs)
-# Common codes: %Y=year, %m=month, %d=day, %H=hour, %M=minute, %S=second
-filename_pattern = %Y%m%d_%H%M%S
 # Maximum number of connection/recording retry attempts before giving up
 max_retries = 10
@@ -58,7 +55,6 @@ format = auto
 # Override general settings for this source (optional):
 # output_directory = recordings/streams
 # split_minutes = 30
-# filename_pattern = mystream_%Y%m%d_%H%M%S
 # =============================================================================
@@ -93,7 +89,6 @@ format = auto
 # # Override general settings for this source (optional):
 # # output_directory = recordings/soundcard
 # # split_minutes = 60
-# # filename_pattern = soundcard_%Y%m%d_%H%M%S
 # =============================================================================
@@ -105,13 +100,11 @@ format = auto
 # type = stream
 # url = http://radio1.example.com:8000/live
 # format = auto
-# filename_pattern = radio1_%Y%m%d_%H%M%S
 #
 # [radio_station_2]
 # type = stream
 # url = http://radio2.example.com:8000/live
 # format = auto
-# filename_pattern = radio2_%Y%m%d_%H%M%S
 #
 # [system_audio]
 # type = soundcard
@@ -119,7 +112,6 @@ format = auto
 # sample_rate = 48000
 # channels = 2
 # format = flac
-# filename_pattern = system_%Y%m%d_%H%M%S
 # =============================================================================
@@ -42,6 +42,13 @@ except ImportError:
    SOUNDFILE_AVAILABLE = False
 # Fixed recording-filename timestamp format. This is the recording's *start*
 # time and is the single source of truth for the clock: web.py parses it back
 # out to derive the displayed date and to name cut clips with real wall-clock
 # times. It is intentionally not configurable — both files must agree on it.
 FILENAME_FORMAT = '%Y%m%d_%H%M%S'
 # =============================================================================
 # Audio Device & Backend System
 # =============================================================================
@@ -329,7 +336,6 @@ class BaseRecorder(ABC):
        # Common settings
        self.split_duration = config.get('split_minutes', 60)
        self.output_dir = config.get('output_directory', 'recordings')
-        self.filename_pattern = config.get('filename_pattern', '%Y%m%d_%H%M%S')
        self.max_retries = config.get('max_retries', 10)
        self.retry_delay = config.get('retry_delay_seconds', 5)
        self.file_format = config.get('format', 'auto')
@@ -343,9 +349,9 @@ class BaseRecorder(ABC):
        return next_split.replace(second=0, microsecond=0)
    def generate_filename(self, ext: str) -> str:
-        """Generate filename from pattern with strftime substitution."""
+        """Generate filename from the fixed start-time format (see FILENAME_FORMAT)."""
        now = self._clock()
-        filename = now.strftime(self.filename_pattern) + f".{ext}"
+        filename = now.strftime(FILENAME_FORMAT) + f".{ext}"
        full_path = os.path.join(self.output_dir, filename)
        Path(full_path).parent.mkdir(parents=True, exist_ok=True)
        return full_path
@@ -806,7 +812,6 @@ class RecorderManager:
        general = {
            'output_directory': config.get('general', 'output_directory', fallback='recordings'),
            'split_minutes': config.getint('general', 'split_minutes', fallback=60),
-            'filename_pattern': config.get('general', 'filename_pattern', fallback='%Y%m%d_%H%M%S', raw=True),
            'max_retries': config.getint('general', 'max_retries', fallback=10),
            'retry_delay_seconds': config.getint('general', 'retry_delay_seconds', fallback=5),
            'log_level': config.get('general', 'log_level', fallback='INFO').upper(),
@@ -142,7 +142,6 @@ class TestGetNextSplitTime:
        r._clock = fixed_clock(now)
        r.split_duration = split_minutes
        r.output_dir = cfg["output_directory"]
-        r.filename_pattern = "%Y%m%d_%H%M%S"
        r.max_retries = 3
        r.retry_delay = 1
        r.file_format = "auto"
@@ -184,7 +183,7 @@ class TestGetNextSplitTime:
 class TestGenerateFilename:
    """Tests for BaseRecorder.generate_filename()."""
-    def _recorder(self, pattern: str, now: datetime, output_dir: str) -> isr.BaseRecorder:
+    def _recorder(self, now: datetime, output_dir: str) -> isr.BaseRecorder:
        class _Rec(isr.BaseRecorder):
            def record(self): pass
@@ -198,28 +197,21 @@ class TestGenerateFilename:
        r._clock = fixed_clock(now)
        r.split_duration = 60
        r.output_dir = output_dir
-        r.filename_pattern = pattern
        r.max_retries = 3
        r.retry_delay = 1
        r.file_format = "auto"
        return r
-    def test_basic_pattern(self, tmp_path):
+    def test_fixed_format(self, tmp_path):
        # Filenames are always <YYYYMMDD>_<HHMMSS>.<ext> — the recording start.
        now = datetime(2024, 12, 25, 14, 30, 0)
-        r = self._recorder("%Y%m%d_%H%M%S", now, str(tmp_path))
+        r = self._recorder(now, str(tmp_path))
        name = r.generate_filename("mp3")
        assert name.endswith("20241225_143000.mp3")
-    def test_subdirectory_created(self, tmp_path):
-        now = datetime(2024, 12, 25, 14, 30, 0)
-        r = self._recorder("%Y/%m/%d/rec_%H%M%S", now, str(tmp_path))
-        name = r.generate_filename("ogg")
-        parent = Path(name).parent
-        assert parent.exists()
    def test_output_dir_prefix(self, tmp_path):
        now = datetime(2024, 1, 1, 0, 0, 0)
-        r = self._recorder("%Y%m%d", now, str(tmp_path))
+        r = self._recorder(now, str(tmp_path))
        name = r.generate_filename("wav")
        assert name.startswith(str(tmp_path))
@@ -236,7 +228,6 @@ class TestDetectFormat:
            "url": "http://example.com/stream",
            "output_directory": "/tmp",
            "split_minutes": 60,
            "filename_pattern": "%Y%m%d_%H%M%S",
            "max_retries": 1,
            "retry_delay_seconds": 0,
            "format": "auto",
@@ -252,7 +243,6 @@ class TestDetectFormat:
            r._clock = datetime.now
            r.split_duration = 60
            r.output_dir = "/tmp"
-            r.filename_pattern = "%Y%m%d_%H%M%S"
            r.max_retries = 1
            r.retry_delay = 0
            r.file_format = "auto"
@@ -446,7 +436,6 @@ class TestRecorderManagerLoadConfig:
 [general]
 output_directory = {str(tmp_path / "recordings")}
 split_minutes = 30
-filename_pattern = test_%Y%m%d
 max_retries = 3
 retry_delay_seconds = 2
 log_level = WARNING
@@ -491,7 +480,6 @@ format = ogg
 [general]
 output_directory = {str(tmp_path / "recordings")}
 split_minutes = 60
-filename_pattern = %Y%m%d_%H%M%S
 max_retries = 10
 retry_delay_seconds = 5
 log_level = WARNING
@@ -501,7 +489,6 @@ log_file = {log_file}
 type = stream
 url = http://example.com/stream
 split_minutes = 15
-filename_pattern = custom_%Y%m%d
 """
        cfg_path = tmp_path / "config.ini"
        cfg_path.write_text(config_text)
@@ -512,7 +499,6 @@ filename_pattern = custom_%Y%m%d
        rec = mgr.recorders[0]
        assert rec.split_duration == 15
-        assert rec.filename_pattern == "custom_%Y%m%d"
    def test_unknown_type_is_skipped(self, tmp_path):
        log_file = str(tmp_path / "test.log")
@@ -558,7 +544,6 @@ class TestStreamRecorderRecord:
            "url": "http://example.com/stream",
            "output_directory": "",  # overridden per-test with tmp_path
            "split_minutes": 60,
            "filename_pattern": "%Y%m%d_%H%M%S",
            "max_retries": 2,
            "retry_delay_seconds": 0,
            "format": fmt,
@@ -610,7 +595,6 @@ class TestSoundcardRecorder:
            "format": "wav",
            "output_directory": str(tmp_path),
            "split_minutes": 60,
            "filename_pattern": "%Y%m%d_%H%M%S",
            "max_retries": 1,
            "retry_delay_seconds": 0,
        }
@@ -639,7 +623,6 @@ class TestSoundcardRecorder:
            "format": "flac",
            "output_directory": str(tmp_path),
            "split_minutes": 60,
            "filename_pattern": "%Y%m%d_%H%M%S",
            "max_retries": 1,
            "retry_delay_seconds": 0,
        }
@@ -662,7 +645,6 @@ class TestSoundcardRecorder:
            "format": "flac",
            "output_directory": str(tmp_path),
            "split_minutes": 60,
            "filename_pattern": "%Y%m%d_%H%M%S",
            "max_retries": 1,
            "retry_delay_seconds": 0,
        }
@@ -2,7 +2,7 @@
 import math
-from web import _loud_sections, _noise_floor_db
+from web import _cut_filename, _loud_sections, _noise_floor_db, _recording_start
 WINDOW_DUR = 0.1  # 100 ms windows, as produced by WINDOW_SAMPLES at 48 kHz
@@ -99,3 +99,33 @@ def test_noise_floor_tracks_blocks_and_ignores_short_events():
    floor = _noise_floor_db(db, WINDOW_DUR)
    assert len(floor) == len(db)
    assert all(abs(f - quiet_db) < 1.0 for f in floor)
 # ---------------------------------------------------------------------------
 # Filename parsing / cut naming
 # ---------------------------------------------------------------------------
 def test_recording_start_parses_standard_name():
    from datetime import datetime
    assert _recording_start("20260523_220000") == datetime(2026, 5, 23, 22, 0, 0)
 def test_recording_start_rejects_nonstandard_name():
    assert _recording_start("radio1_20260523") is None
    assert _recording_start("notes") is None
 def test_cut_filename_uses_wall_clock_span():
    # Recording started 22:00:00; cut covers 22:31:30 → 22:32:30.
    name = _cut_filename("20260523_220000", ".flac", 1890, 1950)
    assert name == "20260523_22-31-30_22-32-30.flac"
 def test_cut_filename_rolls_over_the_hour():
    name = _cut_filename("20260523_220000", ".wav", 3590, 3661)
    assert name == "20260523_22-59-50_23-01-01.wav"
 def test_cut_filename_falls_back_for_nonstandard_name():
    name = _cut_filename("mixtape", ".mp3", 740, 750.4)
    assert name == "mixtape_cut_740s-750s.mp3"
@@ -24,7 +24,7 @@ import subprocess
 import tempfile
 import threading
 import wave
-from datetime import datetime
+from datetime import datetime, timedelta
 from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
 from pathlib import Path
 from urllib.parse import parse_qs, unquote, urlparse
@@ -58,6 +58,12 @@ MIN_RMS             = 0.002  # ≈ −54 dBFS; the floor never drops below this,
 CLIP_MAX_SECONDS = 600     # upper bound on /api/clip length
 # Recording filenames encode the start time as this strftime format (kept in
 # sync with isr.FILENAME_FORMAT). It is the authoritative recording start and
 # the only reliable clock anchor — mtime drifts to the last write, so cut clip
 # names and the displayed date are both derived from this.
 FILENAME_FORMAT = '%Y%m%d_%H%M%S'
 MIME_TYPES = {
    '.wav':  'audio/wav',
    '.mp3':  'audio/mpeg',
@@ -72,6 +78,35 @@ MIME_TYPES = {
 # Audio analysis helpers
 # ---------------------------------------------------------------------------
 def _recording_start(stem: str):
    """Parse the recording start time encoded in a filename stem.
    Returns a datetime, or None if the stem is not in FILENAME_FORMAT
    (e.g. a manually renamed file). Callers pass Path.stem, so the
    extension is never part of the string handed to strptime.
    """
    try:
        return datetime.strptime(stem, FILENAME_FORMAT)
    except ValueError:
        return None
 def _cut_filename(stem: str, ext: str, start: float, end: float) -> str:
    """Name a cut by the real wall-clock span it covers.
    For a recording that started at 22:00:00, a 22:31:30→22:32:30 slice
    (start=1890, end=1950) becomes ``20260523_22-31-30_22-32-30.flac``.
    Falls back to the source stem plus second offsets when the filename is
    not in FILENAME_FORMAT (e.g. a manually renamed recording).
    """
    started = _recording_start(stem)
    if started is None:
        return f'{stem}_cut_{int(start)}s-{int(end)}s{ext}'
    cut_start = started + timedelta(seconds=start)
    cut_end   = started + timedelta(seconds=end)
    return f'{cut_start:%Y%m%d}_{cut_start:%H-%M-%S}_{cut_end:%H-%M-%S}{ext}'
 def _live_wav_header(path: Path, size: int):
    """Return the WAV header (through the 'data' chunk header) with RIFF and
    data sizes rewritten to match the current file size, or None.
@@ -500,6 +535,11 @@ def list_files(recordings_dir: str):
        rel      = str(path.relative_to(base)).replace('\\', '/')
        is_active = rel in active_files
        # The recording start is encoded in the filename and is the true clock
        # anchor; mtime is only a fallback for files not in FILENAME_FORMAT.
        started = _recording_start(path.stem)
        date    = (started or datetime.fromtimestamp(stat.st_mtime)).strftime('%Y-%m-%d %H:%M:%S')
        # Skip reading partial headers for in-progress files — the WAV nframes
        # field and FLAC total_samples are both unfinalized while recording,
        # producing wildly incorrect values (e.g. 53375995583:39:01).
@@ -509,7 +549,7 @@ def list_files(recordings_dir: str):
            'name':      rel,
            'size':      stat.st_size,
            'mtime':     stat.st_mtime,
-            'date':      datetime.fromtimestamp(stat.st_mtime).strftime('%Y-%m-%d %H:%M:%S'),
+            'date':      date,
            'duration':  duration,
            'ext':       path.suffix.lower().lstrip('.'),
            'recording': is_active,
@@ -831,7 +871,7 @@ class _Handler(BaseHTTPRequestHandler):
            return
        ext      = path.suffix.lower()
-        out_name = f'{path.stem}_cut_{int(start)}s-{int(end)}s{ext}'
+        out_name = _cut_filename(path.stem, ext, start, end)
        # For lossless formats, re-encode (not copy) so the container header
        # is rewritten with the correct duration/size. For lossy formats,