feat: duration and seeking for in-progress FLAC recordings

FLAC duration cannot be derived from byte size (variable compression),
so unlike WAV the header cannot be patched from st_size alone. Instead,
every FLAC frame header carries its own frame/sample number: read the
last 64 KB of the growing file, scan backwards for a frame sync,
CRC-8-verify the header to reject false matches in compressed data,
and compute the exact samples recorded so far. STREAMINFO
total_samples (36 bits at a fixed offset) is rewritten in the served
bytes only - the on-disk file is never touched.
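
The total_samples rewrite described above can be tried on a synthetic field. This is a minimal sketch, not the code from this commit: `patch_total_samples` is a hypothetical helper name, and the 8-byte value is hand-built (44100 Hz, stereo, 16-bit, total_samples assumed to be 0 for an unfinalized recording) rather than read from a real file.

```python
def patch_total_samples(field8: bytes, samples: int) -> bytes:
    """Replace the low 36 bits of the 8-byte STREAMINFO field.

    Layout (big-endian): sample rate (20 bits) | channels-1 (3) |
    bits-per-sample-1 (5) | total_samples (36).
    """
    mask36 = (1 << 36) - 1
    field = int.from_bytes(field8, "big")
    # Keep the upper 28 bits, overwrite the sample count (clamped to 36 bits).
    field = (field & ~mask36) | min(samples, mask36)
    return field.to_bytes(8, "big")


# Hand-built field (assumption): 44100 Hz, 2 channels, 16 bits, total_samples = 0.
original = ((44100 << 44) | (1 << 41) | (15 << 36)).to_bytes(8, "big")

patched = patch_total_samples(original, 441000)
value = int.from_bytes(patched, "big")
sample_rate = value >> 44
total = value & ((1 << 36) - 1)
duration = total / sample_rate  # seconds a player would report
print(sample_rate, total, duration)  # 44100 441000 10.0
```

Only the sample count changes; the sample rate, channel count and bit depth in the upper 28 bits are preserved.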

Overhead: one tail read per /stream request, active files only.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 12:37:55 +02:00
parent fa055fc80a
commit 16dd7cbe51
3 changed files with 128 additions and 6 deletions
+1 -1
@@ -38,6 +38,6 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC/
 - **Analysis cache:** results stored as `<analyses-dir>/<file>.analysis.json` keyed by threshold+min_gap; orphans pruned at web startup. In Docker the recordings mount is **read-only** for the web container, so the cache uses a separate `./analyses` bind mount. The `threshold` and `min_gap` keys MUST stay first in the cache JSON — `_cached_analysis_params()` reads only the first 256 bytes to avoid parsing the large embedded result.
 - **Analyze responses:** `/api/analyze` returns `rms_display` (~800 points), never the full per-window RMS list — the UI doesn't use it and it is ~45x larger.
 - **HTTP/1.1 keep-alive:** `_Handler.protocol_version = 'HTTP/1.1'`; every response path must set an accurate `Content-Length`. `_copy_to_response()` force-closes the connection if it under-delivers (file truncated mid-serve).
-- **Live playback:** for files listed in status.json, `/stream/` patches the WAV header on the fly (`_live_wav_header`) so the browser sees the duration recorded so far and can seek; responses get `Cache-Control: no-store`.
+- **Live playback:** for files listed in status.json, `/stream/` patches the header on the fly so the browser sees the duration recorded so far and can seek; responses get `Cache-Control: no-store`. WAV: `_live_wav_header` derives sizes from the byte count. FLAC: `_live_flac_header` parses the sample count out of the last frame header in the file tail (CRC-8-verified to reject false sync matches) and rewrites STREAMINFO total_samples — duration is NOT derivable from byte size for FLAC.
 - **Path safety:** every file parameter in `web.py` goes through `_safe_path()`, which resolves and verifies the path stays inside the recordings dir.
 - **dsnoop in Docker:** sharing the soundcard requires `asound.conf` on the host *and* `ipc: host` in docker-compose (dsnoop uses shared memory across the container boundary).
+1 -1
@@ -169,7 +169,7 @@ Shows recordings grouped by day with collapsible sections. Features:
 - **Filters** — live filename search and from/to date pickers above the table; applied client-side with no additional requests. Shows `N of M shown` when a filter is active.
 - **Delete** — `✕ Delete` button per row with confirmation prompt; disabled for files currently being recorded; sends `DELETE /api/files/<name>` and re-renders the table.
 - **Live REC badge** — files currently being written by `isr.py` show an animated REC indicator, polled every 5 seconds via `/api/status`. Duration for in-progress files shows `—` in the table (header is unfinalized until recording stops). The file list refreshes automatically when a recording starts, stops, or rolls over to a new split file (unless audio is playing).
-- **Listen while recording** — in-progress files are playable and seekable. For WAV the server patches the (still unfinalized) header on the fly so the browser sees the real duration-so-far; reopening the player reloads the source to pick up newly recorded audio. Live responses are sent with `Cache-Control: no-store`.
+- **Listen while recording** — in-progress files are playable and seekable. For WAV and FLAC the server patches the (still unfinalized) header on the fly so the browser sees the real duration-so-far — for FLAC the exact sample count is parsed from the last frame header in the file tail. Reopening the player reloads the source to pick up newly recorded audio. Live responses are sent with `Cache-Control: no-store`.
 - **Fast loading** — analysis results are cached server-side on disk and client-side per session; cached waveforms load only for expanded day groups, and collapsed days fetch nothing until opened.
 - **WCAG-compliant** — skip link, `aria-expanded`/`aria-controls` on the player toggle, `aria-live` status, focus management, `role=img` on SVG waveforms.
+125 -3
@@ -93,6 +93,124 @@ def _live_wav_header(path: Path, size: int):
         return None
+# CRC-8 (poly 0x07) used by FLAC frame headers
+_CRC8_TABLE = []
+for _i in range(256):
+    _c = _i
+    for _ in range(8):
+        _c = ((_c << 1) ^ 0x07) & 0xFF if _c & 0x80 else (_c << 1) & 0xFF
+    _CRC8_TABLE.append(_c)
+
+_FLAC_BLOCKSIZES = {1: 192, 2: 576, 3: 1152, 4: 2304, 5: 4608, 8: 256, 9: 512,
+                    10: 1024, 11: 2048, 12: 4096, 13: 8192, 14: 16384, 15: 32768}
+
+
+def _crc8(data: bytes) -> int:
+    crc = 0
+    for b in data:
+        crc = _CRC8_TABLE[crc ^ b]
+    return crc
+
+
+def _flac_coded_number(buf: bytes, pos: int):
+    """Decode the UTF-8-style frame/sample number; returns (value, next_pos)."""
+    b0 = buf[pos]
+    if b0 < 0x80:
+        return b0, pos + 1
+    n, mask = 0, 0x40
+    while b0 & mask:
+        n += 1
+        mask >>= 1
+    if n < 1 or n > 6:  # 10xxxxxx is not a valid leading byte
+        return None
+    val = b0 & (mask - 1)
+    for i in range(1, n + 1):
+        c = buf[pos + i]
+        if c & 0xC0 != 0x80:
+            return None
+        val = (val << 6) | (c & 0x3F)
+    return val, pos + 1 + n
+
+
+def _flac_frame_samples(buf: bytes, pos: int, fixed_bs: int):
+    """If a valid FLAC frame header starts at pos, return the stream sample
+    count through the end of that frame, else None. Validity is confirmed by
+    the header's CRC-8, so false sync matches in compressed data are rejected."""
+    try:
+        variable = buf[pos + 1] & 0x01
+        bs_code = buf[pos + 2] >> 4
+        sr_code = buf[pos + 2] & 0x0F
+        if bs_code == 0 or sr_code == 15 or buf[pos + 3] & 0x01:
+            return None
+        if (buf[pos + 3] >> 4) > 10:  # reserved channel assignment
+            return None
+        coded = _flac_coded_number(buf, pos + 4)
+        if coded is None:
+            return None
+        val, p = coded
+        bs = _FLAC_BLOCKSIZES.get(bs_code)
+        if bs_code == 6:
+            bs = buf[p] + 1
+            p += 1
+        elif bs_code == 7:
+            bs = int.from_bytes(buf[p:p + 2], 'big') + 1
+            p += 2
+        if sr_code == 12:
+            p += 1
+        elif sr_code in (13, 14):
+            p += 2
+        if _crc8(buf[pos:p]) != buf[p]:
+            return None
+        if variable:  # val is the frame's starting sample number
+            return val + (bs or 0)
+        return val * (fixed_bs or bs or 4096) + (bs or 0)
+    except IndexError:
+        return None
+
+
+def _live_flac_header(path: Path, size: int):
+    """Return the first 26 bytes of a FLAC file with STREAMINFO total_samples
+    patched to the samples recorded so far, or None.
+
+    Like _live_wav_header, but FLAC duration cannot be derived from the byte
+    count (variable compression). Instead the sample count is parsed out of
+    the last frame header in the file tail — each FLAC frame carries its own
+    frame/sample number.
+    """
+    try:
+        with open(path, 'rb') as fh:
+            head = fh.read(42)
+            if len(head) < 42 or head[:4] != b'fLaC':
+                return None
+            # STREAMINFO must be the first metadata block
+            if head[4] & 0x7F != 0 or int.from_bytes(head[5:8], 'big') != 34:
+                return None
+            fixed_bs = int.from_bytes(head[8:10], 'big')
+            tail_len = min(size, 65536)
+            fh.seek(size - tail_len)
+            buf = fh.read(tail_len)
+        samples = None
+        for i in range(len(buf) - 20, -1, -1):
+            if buf[i] == 0xFF and (buf[i + 1] & 0xFC) == 0xF8:
+                samples = _flac_frame_samples(buf, i, fixed_bs)
+                if samples:
+                    break
+        if not samples:
+            return None
+        # Bytes 18-25 hold: sample rate (20 bits) | channels-1 (3) |
+        # bps-1 (5) | total_samples (36). Replace only the low 36 bits.
+        field = int.from_bytes(head[18:26], 'big')
+        field = (field & ~((1 << 36) - 1)) | min(samples, (1 << 36) - 1)
+        patched = bytearray(head[:26])
+        patched[18:26] = field.to_bytes(8, 'big')
+        return bytes(patched)
+    except Exception:
+        return None
 def _get_audio_duration(path: Path):
     """Return duration in seconds for any supported audio file, or None."""
     ext = path.suffix.lower()
@@ -491,8 +609,8 @@ class _Handler(BaseHTTPRequestHandler):
         """Serve audio for inline playback with HTTP Range support.
 
         In-progress recordings are served with Cache-Control: no-store (the
-        content is still growing) and, for WAV, with a header patched to the
-        current size so the browser can show a duration and seek.
+        content is still growing) and, for WAV/FLAC, with a header patched to
+        the duration recorded so far so the browser can show it and seek.
         """
         path = self._safe_path(filename)
         if path is None:
@@ -503,8 +621,12 @@ class _Handler(BaseHTTPRequestHandler):
         is_active = self._is_active(filename)
         prefix = b''
-        if is_active and path.suffix.lower() == '.wav':
-            prefix = _live_wav_header(path, size) or b''
+        if is_active:
+            ext = path.suffix.lower()
+            if ext == '.wav':
+                prefix = _live_wav_header(path, size) or b''
+            elif ext == '.flac':
+                prefix = _live_flac_header(path, size) or b''
         range_header = self.headers.get('Range', '')
         m = re.match(r'bytes=(\d+)-(\d*)', range_header) if range_header else None
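
The CRC-8 check that `_flac_frame_samples` relies on can be exercised on a hand-built frame header. This is a simplified sketch under stated assumptions, not the code from this commit: `build_header` and `samples_through` are hypothetical names, the header is fixed-blocksize with a single-byte frame number, and the CRC is computed bit by bit instead of through the lookup table.

```python
def crc8(data: bytes) -> int:
    """Bitwise CRC-8, polynomial 0x07, initial value 0 (no lookup table)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def build_header(frame_number: int) -> bytes:
    """Hand-built fixed-blocksize frame header (hypothetical test fixture)."""
    assert 0 <= frame_number < 0x80  # keeps the coded number to one byte
    body = bytes([
        0xFF, 0xF8,    # 14-bit sync, reserved bit 0, fixed blocksize
        0xC9,          # block size code 12 (4096), sample rate code 9 (44.1 kHz)
        0x18,          # channel assignment 1 (stereo), sample size code 4 (16 bit)
        frame_number,  # coded frame number, single-byte form
    ])
    return body + bytes([crc8(body)])


def samples_through(header: bytes, blocksize: int = 4096):
    """Return the sample count through this frame, or None if a check fails."""
    if header[0] != 0xFF or (header[1] & 0xFC) != 0xF8:
        return None  # no frame sync
    if crc8(header[:-1]) != header[-1]:
        return None  # false sync match or corrupted header
    return header[4] * blocksize + blocksize


good = build_header(5)
# Flip one bit in the frame number but keep the old CRC byte.
bad = good[:4] + bytes([good[4] ^ 0x01]) + good[5:]
print(samples_through(good))  # 24576
print(samples_through(bad))   # None
```

Frame 5 with 4096-sample blocks ends at sample 6 * 4096 = 24576. The corrupted copy still starts with a valid sync pattern, and the CRC-8 mismatch is what rejects it.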