feat: duration and seeking for in-progress FLAC recordings

FLAC duration cannot be derived from byte size (variable compression), so unlike WAV the header cannot be patched from st_size alone. Instead, every FLAC frame header carries its own frame/sample number: read the last 64 KB of the growing file, scan backwards for a frame sync, CRC-8-verify the header to reject false matches in compressed data, and compute the exact number of samples recorded so far.

STREAMINFO total_samples (36 bits at a fixed offset) is rewritten in the served bytes only; the on-disk file is never touched. Overhead: one tail read per /stream request, active files only.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:37:55 +02:00
parent fa055fc80a
commit 16dd7cbe51
3 changed files with 128 additions and 6 deletions
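The STREAMINFO rewrite described in the commit message can be illustrated in isolation. The sketch below is not the committed code: `patch_total_samples`, `sample_rate`, and the synthetic 42-byte header are hypothetical, built only to show how the low 36 bits of bytes 18-25 are replaced while sample rate, channel count, and bit depth stay untouched.

```python
# Sketch only: helper names and the synthetic header are assumptions,
# not part of the commit.

MASK36 = (1 << 36) - 1  # total_samples occupies the low 36 bits


def patch_total_samples(head: bytes, samples: int) -> bytes:
    """Return head[:26] with STREAMINFO total_samples set to `samples`."""
    # Bytes 18-25: sample rate (20 bits) | channels-1 (3) | bps-1 (5) |
    # total_samples (36), big-endian.
    field = int.from_bytes(head[18:26], 'big')
    field = (field & ~MASK36) | min(samples, MASK36)
    return head[:18] + field.to_bytes(8, 'big')


def sample_rate(head: bytes) -> int:
    """Read the 20-bit sample rate from the top of the packed field."""
    return int.from_bytes(head[18:26], 'big') >> 44


# Synthetic unfinalized header: 44.1 kHz, stereo, 16-bit, total_samples = 0.
packed = (44100 << 44) | (1 << 41) | (15 << 36)
head = (
    b'fLaC'
    + bytes([0x80, 0, 0, 34])          # last-block flag, type 0, length 34
    + (4096).to_bytes(2, 'big') * 2    # min and max block size
    + bytes(6)                         # min and max frame size (unknown)
    + packed.to_bytes(8, 'big')
    + bytes(16)                        # MD5 placeholder
)

patched = patch_total_samples(head, 441000)
duration_s = (int.from_bytes(patched[18:26], 'big') & MASK36) / sample_rate(patched)
print(len(patched), duration_s)
```

Because only the served bytes are replaced, a player sees a finite duration while the recorder keeps appending to an untouched file.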
@@ -38,6 +38,6 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC/
 - **Analysis cache:** results stored as `<analyses-dir>/<file>.analysis.json` keyed by threshold+min_gap; orphans pruned at web startup. In Docker the recordings mount is **read-only** for the web container, so the cache uses a separate `./analyses` bind mount. The `threshold` and `min_gap` keys MUST stay first in the cache JSON — `_cached_analysis_params()` reads only the first 256 bytes to avoid parsing the large embedded result.
 - **Analyze responses:** `/api/analyze` returns `rms_display` (~800 points), never the full per-window RMS list — the UI doesn't use it and it is ~45x larger.
 - **HTTP/1.1 keep-alive:** `_Handler.protocol_version = 'HTTP/1.1'`; every response path must set an accurate `Content-Length`. `_copy_to_response()` force-closes the connection if it under-delivers (file truncated mid-serve).
- **Live playback:** for files listed in status.json, `/stream/` patches the WAV header on the fly (`_live_wav_header`) so the browser sees the duration recorded so far and can seek; responses get `Cache-Control: no-store`.
+- **Live playback:** for files listed in status.json, `/stream/` patches the header on the fly so the browser sees the duration recorded so far and can seek; responses get `Cache-Control: no-store`. WAV: `_live_wav_header` derives sizes from the byte count. FLAC: `_live_flac_header` parses the sample count out of the last frame header in the file tail (CRC-8-verified to reject false sync matches) and rewrites STREAMINFO total_samples — duration is NOT derivable from byte size for FLAC.
 - **Path safety:** every file parameter in `web.py` goes through `_safe_path()`, which resolves and verifies the path stays inside the recordings dir.
 - **dsnoop in Docker:** sharing the soundcard requires `asound.conf` on the host *and* `ipc: host` in docker-compose (dsnoop uses shared memory across the container boundary).
@@ -169,7 +169,7 @@ Shows recordings grouped by day with collapsible sections. Features:
 - **Filters** — live filename search and from/to date pickers above the table; applied client-side with no additional requests. Shows `N of M shown` when a filter is active.
 - **Delete** — `✕ Delete` button per row with confirmation prompt; disabled for files currently being recorded; sends `DELETE /api/files/<name>` and re-renders the table.
 - **Live REC badge** — files currently being written by `isr.py` show an animated REC indicator, polled every 5 seconds via `/api/status`. Duration for in-progress files shows `—` in the table (header is unfinalized until recording stops). The file list refreshes automatically when a recording starts, stops, or rolls over to a new split file (unless audio is playing).
- **Listen while recording** — in-progress files are playable and seekable. For WAV the server patches the (still unfinalized) header on the fly so the browser sees the real duration-so-far; reopening the player reloads the source to pick up newly recorded audio. Live responses are sent with `Cache-Control: no-store`.
+- **Listen while recording** — in-progress files are playable and seekable. For WAV and FLAC the server patches the (still unfinalized) header on the fly so the browser sees the real duration-so-far — for FLAC the exact sample count is parsed from the last frame header in the file tail. Reopening the player reloads the source to pick up newly recorded audio. Live responses are sent with `Cache-Control: no-store`.
 - **Fast loading** — analysis results are cached server-side on disk and client-side per session; cached waveforms load only for expanded day groups, and collapsed days fetch nothing until opened.
 - **WCAG-compliant** — skip link, `aria-expanded`/`aria-controls` on the player toggle, `aria-live` status, focus management, `role=img` on SVG waveforms.
@@ -93,6 +93,124 @@ def _live_wav_header(path: Path, size: int):
        return None
 # CRC-8 (poly 0x07) used by FLAC frame headers
 _CRC8_TABLE = []
 for _i in range(256):
    _c = _i
    for _ in range(8):
        _c = ((_c << 1) ^ 0x07) & 0xFF if _c & 0x80 else (_c << 1) & 0xFF
    _CRC8_TABLE.append(_c)
 _FLAC_BLOCKSIZES = {1: 192, 2: 576, 3: 1152, 4: 2304, 5: 4608, 8: 256, 9: 512,
                    10: 1024, 11: 2048, 12: 4096, 13: 8192, 14: 16384, 15: 32768}
 def _crc8(data: bytes) -> int:
    crc = 0
    for b in data:
        crc = _CRC8_TABLE[crc ^ b]
    return crc
 def _flac_coded_number(buf: bytes, pos: int):
    """Decode the UTF-8-style frame/sample number; returns (value, next_pos)."""
    b0 = buf[pos]
    if b0 < 0x80:
        return b0, pos + 1
    n, mask = 0, 0x40
    while b0 & mask:
        n += 1
        mask >>= 1
    if n < 1 or n > 6:  # 10xxxxxx is not a valid leading byte
        return None
    val = b0 & (mask - 1)
    for i in range(1, n + 1):
        c = buf[pos + i]
        if c & 0xC0 != 0x80:
            return None
        val = (val << 6) | (c & 0x3F)
    return val, pos + 1 + n
 def _flac_frame_samples(buf: bytes, pos: int, fixed_bs: int):
    """If a valid FLAC frame header starts at pos, return the stream sample
    count through the end of that frame, else None. Validity is confirmed by
    the header's CRC-8, so false sync matches in compressed data are rejected."""
    try:
        variable = buf[pos + 1] & 0x01
        bs_code  = buf[pos + 2] >> 4
        sr_code  = buf[pos + 2] & 0x0F
        if bs_code == 0 or sr_code == 15 or buf[pos + 3] & 0x01:
            return None
        if (buf[pos + 3] >> 4) > 10:  # reserved channel assignment
            return None
        coded = _flac_coded_number(buf, pos + 4)
        if coded is None:
            return None
        val, p = coded
        bs = _FLAC_BLOCKSIZES.get(bs_code)
        if bs_code == 6:
            bs = buf[p] + 1
            p += 1
        elif bs_code == 7:
            bs = int.from_bytes(buf[p:p + 2], 'big') + 1
            p += 2
        if sr_code == 12:
            p += 1
        elif sr_code in (13, 14):
            p += 2
        if _crc8(buf[pos:p]) != buf[p]:
            return None
        if variable:           # val is the frame's starting sample number
            return val + (bs or 0)
        return val * (fixed_bs or bs or 4096) + (bs or 0)
    except IndexError:
        return None
 def _live_flac_header(path: Path, size: int):
    """Return the first 26 bytes of a FLAC file with STREAMINFO total_samples
    patched to the samples recorded so far, or None.
    Like _live_wav_header, but FLAC duration cannot be derived from the byte
    count (variable compression). Instead the sample count is parsed out of
    the last frame header in the file tail — each FLAC frame carries its own
    frame/sample number.
    """
    try:
        with open(path, 'rb') as fh:
            head = fh.read(42)
            if len(head) < 42 or head[:4] != b'fLaC':
                return None
            # STREAMINFO must be the first metadata block
            if head[4] & 0x7F != 0 or int.from_bytes(head[5:8], 'big') != 34:
                return None
            fixed_bs = int.from_bytes(head[8:10], 'big')
            tail_len = min(size, 65536)
            fh.seek(size - tail_len)
            buf = fh.read(tail_len)
        samples = None
        for i in range(len(buf) - 20, -1, -1):
            if buf[i] == 0xFF and (buf[i + 1] & 0xFE) == 0xF8:  # 14-bit sync + reserved bit 0
                samples = _flac_frame_samples(buf, i, fixed_bs)
                if samples:
                    break
        if not samples:
            return None
        # Bytes 18-25 hold: sample rate (20 bits) | channels-1 (3) |
        # bps-1 (5) | total_samples (36). Replace only the low 36 bits.
        field = int.from_bytes(head[18:26], 'big')
        field = (field & ~((1 << 36) - 1)) | min(samples, (1 << 36) - 1)
        patched = bytearray(head[:26])
        patched[18:26] = field.to_bytes(8, 'big')
        return bytes(patched)
    except Exception:
        return None
 def _get_audio_duration(path: Path):
    """Return duration in seconds for any supported audio file, or None."""
    ext = path.suffix.lower()
@@ -491,8 +609,8 @@ class _Handler(BaseHTTPRequestHandler):
        """Serve audio for inline playback with HTTP Range support.
        In-progress recordings are served with Cache-Control: no-store (the
-        content is still growing) and, for WAV, with a header patched to the
+        content is still growing) and, for WAV/FLAC, with a header patched to
-        current size so the browser can show a duration and seek.
+        the duration recorded so far so the browser can show it and seek.
        """
        path = self._safe_path(filename)
        if path is None:
@@ -503,8 +621,12 @@ class _Handler(BaseHTTPRequestHandler):
        is_active = self._is_active(filename)
        prefix = b''
-        if is_active and path.suffix.lower() == '.wav':
+        if is_active:
            ext = path.suffix.lower()
            if ext == '.wav':
                prefix = _live_wav_header(path, size) or b''
            elif ext == '.flac':
                prefix = _live_flac_header(path, size) or b''
        range_header = self.headers.get('Range', '')
        m = re.match(r'bytes=(\d+)-(\d*)', range_header) if range_header else None
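As a sanity check on the frame-header logic in `_flac_frame_samples`, one can build a synthetic header and confirm that the CRC-8 accepts it and rejects a corrupted copy. This is a minimal sketch under stated assumptions: `crc8`, `encode_coded_number`, and `build_frame_header` are hypothetical helpers, and the header describes a fixed-blocksize 4096-sample, 44.1 kHz, stereo, 16-bit frame.

```python
# Sketch only: all names here are illustrative, not part of web.py.


def crc8(data: bytes) -> int:
    """CRC-8, polynomial 0x07, initial value 0 (bitwise variant)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def encode_coded_number(value: int) -> bytes:
    """Encode a frame/sample number in FLAC's UTF-8-style scheme."""
    if value < 0x80:
        return bytes([value])
    for n in range(1, 7):  # n = number of continuation bytes
        if value < (1 << ((6 - n) + 6 * n)):
            lead = ((0xFF << (7 - n)) & 0xFF) | (value >> (6 * n))
            tail = [0x80 | ((value >> (6 * i)) & 0x3F) for i in range(n - 1, -1, -1)]
            return bytes([lead, *tail])
    raise ValueError('value does not fit in 36 bits')


def build_frame_header(frame_number: int) -> bytes:
    # 0xFF 0xF8: sync, reserved bit 0, fixed blocksize
    # 0xC9: blocksize code 12 (4096), sample-rate code 9 (44.1 kHz)
    # 0x18: channel assignment 1 (stereo), sample-size code 4 (16 bit)
    body = bytes([0xFF, 0xF8, 0xC9, 0x18]) + encode_coded_number(frame_number)
    return body + bytes([crc8(body)])


header = build_frame_header(1000)
header_ok = crc8(header[:-1]) == header[-1]

corrupted = bytearray(header)
corrupted[4] ^= 0x01  # flip one bit in the coded frame number
corrupted_ok = crc8(bytes(corrupted[:-1])) == corrupted[-1]

# Fixed blocksize: samples through the end of frame N = N * bs + bs
samples_so_far = 1000 * 4096 + 4096
print(header_ok, corrupted_ok, samples_so_far)
```

A CRC-8 only has 256 possible values, so a random byte run that passes the structural checks still has roughly a 1-in-256 chance of passing the CRC; the reserved-bit and code checks in the parser reduce the number of candidates that get that far.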