feat: name cut clips by wall-clock time; fix recording filename format

Cut downloads were named by second offsets into the recording
(`..._cut_740s-750s.flac`). They are now named by the actual recording time
the slice covers, e.g. `20260523_22-31-30_22-32-30.flac` for a
22:31:30->22:32:30 cut of a recording started at 22:00:00.

To make this reliable, the recording filename now uses a fixed
`%Y%m%d_%H%M%S` start-time format (`FILENAME_FORMAT`), defined identically in
isr.py and web.py. It replaces the user-configurable `filename_pattern`
(web.py never reads config.ini, so a custom pattern could not be parsed
back). web.py parses the
start time out of the filename via `_recording_start()` and builds cut names
with `_cut_filename()`. The DATE column now also comes from the filename
(falling back to mtime only for non-standard names), since mtime is the last
write, not the start.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 14:30:30 +02:00
parent 2caf23f17d
commit 5e7620627b
7 changed files with 97 additions and 55 deletions
+2
@@ -33,6 +33,7 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC
`web.py`:
- Detection: `_compute_rms_windows_wav()` / `analyze_flac()` produce 100 ms RMS windows → `_noise_floor_db()` estimates the rolling floor → `_loud_sections()` emits scored sections → `_package_result()` shapes the `/api/analyze` payload.
- Clips: `_api_clip()` validates params, `_clip_wav()` / `_clip_flac()` stream the decoded slice, `_wav_header()` builds the 44-byte PCM header.
- Filenames as a clock: `_recording_start()` parses the start time out of a filename stem; `_cut_filename()` turns a (stem, ext, start, end) into a wall-clock-named cut. Both the listing `date` field and `_api_cut()` use them.
- Live headers: `_live_wav_header()`, `_live_flac_header()` (+ `_flac_frame_samples()`, CRC-8 verified).
- Serving: `_stream()` (Range support), `_copy_to_response()`, `_safe_path()` (path traversal guard).
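The list above says `_wav_header()` builds the 44-byte PCM header, but its body is not part of this diff. The following is a minimal sketch of the canonical 44-byte RIFF/WAVE layout for integer PCM, packed with Python's `struct`; the function name `wav_header_sketch` and its parameters are hypothetical and are not claimed to match the repository's signature.

```python
import struct


def wav_header_sketch(data_bytes: int, sample_rate: int = 48000,
                      channels: int = 2, bits_per_sample: int = 16) -> bytes:
    """Hypothetical sketch: canonical 44-byte PCM WAV header for `data_bytes` of audio."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second of audio
    return struct.pack(
        '<4sI4s'        # RIFF chunk: id, size of everything after this field, form type
        '4sIHHIIHH'     # 'fmt ' chunk: id, size (16), PCM tag (1), channels, rate, byte rate, align, bits
        '4sI',          # 'data' chunk: id, payload size
        b'RIFF', 36 + data_bytes, b'WAVE',
        b'fmt ', 16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b'data', data_bytes,
    )
```

Packing the whole header with one little-endian format string keeps the two size fields at fixed offsets (bytes 4 and 40), which makes rewriting them for a file that is still growing a matter of patching eight bytes.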
@@ -54,6 +55,7 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC
- **Recorder/web coupling is one file:** `RecorderManager` atomically writes `recordings/status.json` every 2 s listing in-progress files; deleted on clean shutdown. `web.py` reads it to show REC badges and to refuse analyse/cut/delete on active files. In-progress WAV/FLAC headers are unfinalized, so durations are not read for active files.
- **Stream splits:** OGG/Opus/FLAC codec headers are extracted from the first ~16 KB of each connection and prepended to every split file so each file plays standalone. A new file is always opened on reconnect (gap in stream). MP3/AAC need no headers.
- **Split timing:** files split at clock-aligned boundaries (`get_next_split_time()`), e.g. `split_minutes = 60` → on the hour.
- **Filename is the clock — fixed format, not configurable.** Recordings are named `%Y%m%d_%H%M%S.<ext>` (the *start* time). This is hardcoded as `FILENAME_FORMAT`, defined in **both** `isr.py` (recorder writes it) and `web.py` (reads it back) — the two copies must stay in sync. There is no `filename_pattern` config option (removed; `web.py` can't see `config.ini`, so a configurable pattern would break parsing). `web.py` derives the displayed DATE column from the filename via `_recording_start()` (falling back to mtime only for non-standard names — mtime is the last write ≈ end, not the start). Cut downloads are named by the wall-clock span they cover via `_cut_filename()`: a 22:31:30→22:32:30 slice of `20260523_220000.flac` becomes `20260523_22-31-30_22-32-30.flac`; non-standard source names fall back to `<stem>_cut_<start>s-<end>s<ext>` (offsets truncated to whole seconds).
- **ALSA:** capture spawns `arecord` as a subprocess, raw PCM read in 100 ms chunks by a thread. Device spec resolution: `default` → exact `hw:X,Y` → partial name → fallback to any literal ALSA PCM name (so `shared_mic` from asound.conf works without appearing in `arecord -l`).
- **Shutdown:** SIGTERM is converted to KeyboardInterrupt in `main()`; `RecorderManager.stop()` joins all threads against a single shared 25 s deadline to stay inside Docker's `stop_grace_period: 30s`.
- **Loud-section detection is adaptive — do not regress it to an absolute threshold.** Per-window dB is compared against a rolling noise floor (`NOISE_PERCENTILE`-th percentile per `NOISE_BLOCK_SECONDS` block, min-smoothed over ±2 blocks so events can't raise their own floor; clamped to ≥ `MIN_RMS`). A section needs `margin` dB of prominence and carries a `score` (peak dB above floor) used for ranking. Sections shorter than `min_duration` (default 0.5 s, after `min_gap` merging) are discarded — without this, isolated 100 ms pops (clicks, single raindrops) produced thousands of zero-length sections per day. The original fixed RMS threshold flagged every ambience change (passing cars, rain) and produced ~600 useless sections/day — that is why it was replaced. Known limitation: a short (~10 s) swell on a quiet street still flags because the floor blocks are 30 s; the planned fix is an onset/spectral filter or optional Silero VAD, **not** a higher margin. Tests in `tests/test_web.py`.
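A minimal sketch of the adaptive floor and scoring described in the last bullet, in plain Python so it runs without numpy. It assumes a simple rank-based percentile and illustrative values for `NOISE_PERCENTILE` and `margin`, since the real values are not shown in this diff; `NOISE_BLOCK_SECONDS = 30` follows the "floor blocks are 30 s" remark and `MIN_RMS = 0.002` comes from `web.py`. The function names are hypothetical, this is not `web.py`'s `_noise_floor_db()` / `_loud_sections()`, and it omits `min_gap` merging and `min_duration` filtering.

```python
import math

# Illustrative constants: NOISE_PERCENTILE is an assumption; the other two follow the notes above.
NOISE_PERCENTILE = 20
NOISE_BLOCK_SECONDS = 30.0
MIN_RMS = 0.002
MIN_DB = 20 * math.log10(MIN_RMS)  # about -54 dBFS


def noise_floor_sketch(db, window_dur):
    """Per-window floor in dB: block percentile, min-smoothed over +/-2 blocks, clamped."""
    per_block = max(1, round(NOISE_BLOCK_SECONDS / window_dur))
    blocks = [db[i:i + per_block] for i in range(0, len(db), per_block)]
    # Rank-based percentile of each block.
    block_floor = []
    for block in blocks:
        ordered = sorted(block)
        k = min(len(ordered) - 1, int(len(ordered) * NOISE_PERCENTILE / 100))
        block_floor.append(ordered[k])
    # Minimum over the surrounding +/-2 blocks, so an event cannot raise its own floor.
    smoothed = [min(block_floor[max(0, i - 2):i + 3]) for i in range(len(block_floor))]
    # One floor value per window, never below the MIN_RMS level.
    return [max(smoothed[i // per_block], MIN_DB) for i in range(len(db))]


def sections_sketch(db, floor, window_dur, margin=10.0):
    """Return (start_s, end_s, score) for runs at least `margin` dB above the floor.

    score is the peak dB above the floor within the run, used for ranking.
    """
    out, start, peak = [], None, 0.0
    for i, (d, f) in enumerate(zip(db, floor)):
        above = d - f
        if above >= margin:
            if start is None:
                start, peak = i, above
            else:
                peak = max(peak, above)
        elif start is not None:
            out.append((start * window_dur, i * window_dur, peak))
            start = None
    if start is not None:
        out.append((start * window_dur, len(db) * window_dur, peak))
    return out
```

Because each block's floor is the minimum over its neighbours, an event that fills an entire 30 s block still sees the quieter floor around it and is scored, rather than being absorbed into the floor.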
+3 -12
@@ -67,7 +67,6 @@ docker compose up -d --build
|-----|---------|-------------|
| `output_directory` | `recordings` | Output path relative to the working directory (or absolute). The Docker setup mounts `./recordings` at `/app/recordings` so this default works unchanged. |
| `split_minutes` | `60` | Split into a new file every N minutes, aligned to clock boundaries (e.g. 60 → files start at :00, 30 → at :00 and :30). |
| `filename_pattern` | `%Y%m%d_%H%M%S` | strftime pattern; file extension is appended automatically. |
| `max_retries` | `10` | Give up after this many consecutive failures per source. |
| `retry_delay_seconds` | `5` | Wait between retries. |
| `log_level` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` / `CRITICAL` |
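The `split_minutes` row describes clock-aligned splits. A minimal sketch of computing the next boundary, assuming boundaries are counted from local midnight and that `split_minutes` divides a day evenly (as 60 and 30 do). `next_split_sketch` is a hypothetical name, not the repository's `get_next_split_time()`, whose body is not shown here; naive datetimes are used, so DST transitions are ignored.

```python
from datetime import datetime, timedelta


def next_split_sketch(now: datetime, split_minutes: int) -> datetime:
    """Hypothetical: next clock-aligned boundary strictly after `now`.

    Boundaries are multiples of `split_minutes` counted from midnight, so 60
    gives every hour on the hour and 30 gives :00 and :30.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_min = (now - midnight) // timedelta(minutes=1)  # whole minutes since midnight
    next_index = elapsed_min // split_minutes + 1
    # timedelta arithmetic rolls over into the next day when needed.
    return midnight + timedelta(minutes=next_index * split_minutes)
```

For example, with `split_minutes = 30` a recorder started at 14:29:59 opens its next file at 14:30:00, and one started at 23:45 with `split_minutes = 60` splits at midnight.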
@@ -124,25 +123,17 @@ split_minutes = 60
[radio1]
type = stream
url = http://radio.example.com:8000/stream1
filename_pattern = radio1_%Y%m%d_%H%M%S
[system_audio]
type = soundcard
device = hw:0,0
filename_pattern = system_%Y%m%d_%H%M%S
```
---
## Filename patterns
## Filenames
strftime codes are substituted at split time. The file extension is added automatically.
| Pattern | Example |
|---------|---------|
| `%Y%m%d_%H%M%S` | `20241225_143000.mp3` |
| `radio_%Y-%m-%d_%H%M` | `radio_2024-12-25_1430.mp3` |
| `%Y/%m/%d/rec_%H%M%S` | `2024/12/25/rec_143000.mp3` *(subdirs created automatically)* |
Recordings are named `<YYYYMMDD>_<HHMMSS>.<ext>` from the time the file is opened — its **start** time — e.g. `20241225_143000.flac`. This format is fixed and not configurable: the web UI parses the start time back out of the filename to show the recording date and to name cut clips with real wall-clock times (see below).
---
@@ -169,7 +160,7 @@ Shows recordings grouped by day with collapsible sections. Features:
- **Grace period** — configurable in the controls bar (default 2 s). Loud sections separated by less than this gap are merged into one. Raise this (e.g. to 15-30 s) when a single event generates many timestamps due to brief quiet gaps within it.
- **Min duration** — configurable in the controls bar (default 0.5 s). Loud sections shorter than this (after grace-period merging) are discarded, so isolated sub-second pops — a click, a single raindrop — don't flood a day with thousands of near-zero-length sections. Set to 0 to disable.
- **Clip playback** — clicking a loud-section chip plays a short server-rendered WAV clip (`/api/clip`, pre-roll included) in a player bar at the bottom of the page. Playback starts instantly even for sections deep inside multi-hundred-MB FLACs, because the browser never has to seek the full file. **J** / **K** (or the **Prev** / **Next** buttons) step through the queued sections — one file's, or a whole day's after **Highlights** — and **Auto-advance** plays the next section when one ends, turning a day's detections into a continuous review reel. **Open in file** switches to the full recording at the same position for context; each chip click also pre-fills the cut panel.
- **Cut & download** — `Cut` button opens the player row and reveals a cut panel. Enter start and end times in `m:ss` or `h:mm:ss` format and click **Download cut** to receive an ffmpeg-trimmed copy without re-encoding. Requires ffmpeg (included in the Docker image).
- **Cut & download** — `Cut` button opens the player row and reveals a cut panel. Enter start and end times in `m:ss` or `h:mm:ss` format and click **Download cut** to receive an ffmpeg-trimmed copy without re-encoding. Requires ffmpeg (included in the Docker image). The cut is named with the real wall-clock span it covers — `<YYYYMMDD>_<HH-MM-SS>_<HH-MM-SS>.<ext>`, e.g. a 22:31:30→22:32:30 slice of a recording started at 22:00:00 becomes `20260523_22-31-30_22-32-30.flac`.
- **Filters** — live filename search and from/to date pickers above the table; applied client-side with no additional requests. Shows `N of M shown` when a filter is active.
- **Delete** — `Delete` button per row with confirmation prompt; disabled for files currently being recorded; sends `DELETE /api/files/<name>` and re-renders the table.
- **Live REC badge** — files currently being written by `isr.py` show an animated REC indicator, polled every 5 seconds via `/api/status`. Duration for in-progress files shows `—` in the table (header is unfinalized until recording stops). The file list refreshes automatically when a recording starts, stops, or rolls over to a new split file (unless audio is playing).
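The **Grace period** and **Min duration** controls describe a two-step post-processing pass. A minimal sketch, assuming sections arrive as `(start, end)` pairs in seconds sorted by start; `merge_and_filter` is a hypothetical name, and the order (merge first, then discard) follows the "after grace-period merging" wording above.

```python
def merge_and_filter(sections, grace=2.0, min_duration=0.5):
    """Hypothetical: merge sections closer than `grace` s, then drop ones shorter than `min_duration`.

    `sections` must be sorted by start time; min_duration=0 keeps every merged section.
    """
    merged = []
    for start, end in sections:
        if merged and start - merged[-1][1] < grace:
            # Quiet gap shorter than the grace period: extend the previous section.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Discard isolated pops that are still too short after merging.
    return [(s, e) for s, e in merged if e - s >= min_duration]
```

With the defaults, a 0.1 s pop at 10.0 s followed by a 1 s burst at 11.0 s merges into one 2 s section, while an isolated 0.1 s pop elsewhere is dropped.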
+4 -12
@@ -17,13 +17,10 @@ output_directory = recordings
# Duration in minutes after which to split into a new file
split_minutes = 60
# Filename pattern with strftime format codes
# Examples:
# %Y%m%d_%H%M%S -> 20241216_143000.ext
# recording_%Y-%m-%d_%H%M -> recording_2024-12-16_1430.ext
# %Y/%m/%d/audio_%H%M%S -> 2024/12/16/audio_143000.ext (creates subdirs)
# Common codes: %Y=year, %m=month, %d=day, %H=hour, %M=minute, %S=second
filename_pattern = %Y%m%d_%H%M%S
# Recording filenames are fixed as <YYYYMMDD>_<HHMMSS>.<ext> (the start time,
# e.g. 20241216_143000.flac). This is not configurable: the web UI parses the
# start time back out of the filename to show the date and to name cut clips
# with real wall-clock times.
# Maximum number of connection/recording retry attempts before giving up
max_retries = 10
@@ -58,7 +55,6 @@ format = auto
# Override general settings for this source (optional):
# output_directory = recordings/streams
# split_minutes = 30
# filename_pattern = mystream_%Y%m%d_%H%M%S
# =============================================================================
@@ -93,7 +89,6 @@ format = auto
# # Override general settings for this source (optional):
# # output_directory = recordings/soundcard
# # split_minutes = 60
# # filename_pattern = soundcard_%Y%m%d_%H%M%S
# =============================================================================
@@ -105,13 +100,11 @@ format = auto
# type = stream
# url = http://radio1.example.com:8000/live
# format = auto
# filename_pattern = radio1_%Y%m%d_%H%M%S
#
# [radio_station_2]
# type = stream
# url = http://radio2.example.com:8000/live
# format = auto
# filename_pattern = radio2_%Y%m%d_%H%M%S
#
# [system_audio]
# type = soundcard
@@ -119,7 +112,6 @@ format = auto
# sample_rate = 48000
# channels = 2
# format = flac
# filename_pattern = system_%Y%m%d_%H%M%S
# =============================================================================
+9 -4
@@ -42,6 +42,13 @@ except ImportError:
SOUNDFILE_AVAILABLE = False
# Fixed recording-filename timestamp format. This is the recording's *start*
# time and is the single source of truth for the clock: web.py parses it back
# out to derive the displayed date and to name cut clips with real wall-clock
# times. It is intentionally not configurable — both files must agree on it.
FILENAME_FORMAT = '%Y%m%d_%H%M%S'
# =============================================================================
# Audio Device & Backend System
# =============================================================================
@@ -329,7 +336,6 @@ class BaseRecorder(ABC):
# Common settings
self.split_duration = config.get('split_minutes', 60)
self.output_dir = config.get('output_directory', 'recordings')
self.filename_pattern = config.get('filename_pattern', '%Y%m%d_%H%M%S')
self.max_retries = config.get('max_retries', 10)
self.retry_delay = config.get('retry_delay_seconds', 5)
self.file_format = config.get('format', 'auto')
@@ -343,9 +349,9 @@ class BaseRecorder(ABC):
return next_split.replace(second=0, microsecond=0)
def generate_filename(self, ext: str) -> str:
"""Generate filename from pattern with strftime substitution."""
"""Generate filename from the fixed start-time format (see FILENAME_FORMAT)."""
now = self._clock()
filename = now.strftime(self.filename_pattern) + f".{ext}"
filename = now.strftime(FILENAME_FORMAT) + f".{ext}"
full_path = os.path.join(self.output_dir, filename)
Path(full_path).parent.mkdir(parents=True, exist_ok=True)
return full_path
@@ -806,7 +812,6 @@ class RecorderManager:
general = {
'output_directory': config.get('general', 'output_directory', fallback='recordings'),
'split_minutes': config.getint('general', 'split_minutes', fallback=60),
'filename_pattern': config.get('general', 'filename_pattern', fallback='%Y%m%d_%H%M%S', raw=True),
'max_retries': config.getint('general', 'max_retries', fallback=10),
'retry_delay_seconds': config.getint('general', 'retry_delay_seconds', fallback=5),
'log_level': config.get('general', 'log_level', fallback='INFO').upper(),
+5 -23
@@ -142,7 +142,6 @@ class TestGetNextSplitTime:
r._clock = fixed_clock(now)
r.split_duration = split_minutes
r.output_dir = cfg["output_directory"]
r.filename_pattern = "%Y%m%d_%H%M%S"
r.max_retries = 3
r.retry_delay = 1
r.file_format = "auto"
@@ -184,7 +183,7 @@ class TestGetNextSplitTime:
class TestGenerateFilename:
"""Tests for BaseRecorder.generate_filename()."""
def _recorder(self, pattern: str, now: datetime, output_dir: str) -> isr.BaseRecorder:
def _recorder(self, now: datetime, output_dir: str) -> isr.BaseRecorder:
class _Rec(isr.BaseRecorder):
def record(self): pass
@@ -198,28 +197,21 @@ class TestGenerateFilename:
r._clock = fixed_clock(now)
r.split_duration = 60
r.output_dir = output_dir
r.filename_pattern = pattern
r.max_retries = 3
r.retry_delay = 1
r.file_format = "auto"
return r
def test_basic_pattern(self, tmp_path):
def test_fixed_format(self, tmp_path):
# Filenames are always <YYYYMMDD>_<HHMMSS>.<ext> — the recording start.
now = datetime(2024, 12, 25, 14, 30, 0)
r = self._recorder("%Y%m%d_%H%M%S", now, str(tmp_path))
r = self._recorder(now, str(tmp_path))
name = r.generate_filename("mp3")
assert name.endswith("20241225_143000.mp3")
def test_subdirectory_created(self, tmp_path):
now = datetime(2024, 12, 25, 14, 30, 0)
r = self._recorder("%Y/%m/%d/rec_%H%M%S", now, str(tmp_path))
name = r.generate_filename("ogg")
parent = Path(name).parent
assert parent.exists()
def test_output_dir_prefix(self, tmp_path):
now = datetime(2024, 1, 1, 0, 0, 0)
r = self._recorder("%Y%m%d", now, str(tmp_path))
r = self._recorder(now, str(tmp_path))
name = r.generate_filename("wav")
assert name.startswith(str(tmp_path))
@@ -236,7 +228,6 @@ class TestDetectFormat:
"url": "http://example.com/stream",
"output_directory": "/tmp",
"split_minutes": 60,
"filename_pattern": "%Y%m%d_%H%M%S",
"max_retries": 1,
"retry_delay_seconds": 0,
"format": "auto",
@@ -252,7 +243,6 @@ class TestDetectFormat:
r._clock = datetime.now
r.split_duration = 60
r.output_dir = "/tmp"
r.filename_pattern = "%Y%m%d_%H%M%S"
r.max_retries = 1
r.retry_delay = 0
r.file_format = "auto"
@@ -446,7 +436,6 @@ class TestRecorderManagerLoadConfig:
[general]
output_directory = {str(tmp_path / "recordings")}
split_minutes = 30
filename_pattern = test_%Y%m%d
max_retries = 3
retry_delay_seconds = 2
log_level = WARNING
@@ -491,7 +480,6 @@ format = ogg
[general]
output_directory = {str(tmp_path / "recordings")}
split_minutes = 60
filename_pattern = %Y%m%d_%H%M%S
max_retries = 10
retry_delay_seconds = 5
log_level = WARNING
@@ -501,7 +489,6 @@ log_file = {log_file}
type = stream
url = http://example.com/stream
split_minutes = 15
filename_pattern = custom_%Y%m%d
"""
cfg_path = tmp_path / "config.ini"
cfg_path.write_text(config_text)
@@ -512,7 +499,6 @@ filename_pattern = custom_%Y%m%d
rec = mgr.recorders[0]
assert rec.split_duration == 15
assert rec.filename_pattern == "custom_%Y%m%d"
def test_unknown_type_is_skipped(self, tmp_path):
log_file = str(tmp_path / "test.log")
@@ -558,7 +544,6 @@ class TestStreamRecorderRecord:
"url": "http://example.com/stream",
"output_directory": "", # overridden per-test with tmp_path
"split_minutes": 60,
"filename_pattern": "%Y%m%d_%H%M%S",
"max_retries": 2,
"retry_delay_seconds": 0,
"format": fmt,
@@ -610,7 +595,6 @@ class TestSoundcardRecorder:
"format": "wav",
"output_directory": str(tmp_path),
"split_minutes": 60,
"filename_pattern": "%Y%m%d_%H%M%S",
"max_retries": 1,
"retry_delay_seconds": 0,
}
@@ -639,7 +623,6 @@ class TestSoundcardRecorder:
"format": "flac",
"output_directory": str(tmp_path),
"split_minutes": 60,
"filename_pattern": "%Y%m%d_%H%M%S",
"max_retries": 1,
"retry_delay_seconds": 0,
}
@@ -662,7 +645,6 @@ class TestSoundcardRecorder:
"format": "flac",
"output_directory": str(tmp_path),
"split_minutes": 60,
"filename_pattern": "%Y%m%d_%H%M%S",
"max_retries": 1,
"retry_delay_seconds": 0,
}
+31 -1
@@ -2,7 +2,7 @@
import math
from web import _loud_sections, _noise_floor_db
from web import _cut_filename, _loud_sections, _noise_floor_db, _recording_start
WINDOW_DUR = 0.1 # 100 ms windows, as produced by WINDOW_SAMPLES at 48 kHz
@@ -99,3 +99,33 @@ def test_noise_floor_tracks_blocks_and_ignores_short_events():
floor = _noise_floor_db(db, WINDOW_DUR)
assert len(floor) == len(db)
assert all(abs(f - quiet_db) < 1.0 for f in floor)
# ---------------------------------------------------------------------------
# Filename parsing / cut naming
# ---------------------------------------------------------------------------
def test_recording_start_parses_standard_name():
from datetime import datetime
assert _recording_start("20260523_220000") == datetime(2026, 5, 23, 22, 0, 0)
def test_recording_start_rejects_nonstandard_name():
assert _recording_start("radio1_20260523") is None
assert _recording_start("notes") is None
def test_cut_filename_uses_wall_clock_span():
# Recording started 22:00:00; cut covers 22:31:30 → 22:32:30.
name = _cut_filename("20260523_220000", ".flac", 1890, 1950)
assert name == "20260523_22-31-30_22-32-30.flac"
def test_cut_filename_rolls_over_the_hour():
name = _cut_filename("20260523_220000", ".wav", 3590, 3661)
assert name == "20260523_22-59-50_23-01-01.wav"
def test_cut_filename_falls_back_for_nonstandard_name():
name = _cut_filename("mixtape", ".mp3", 740, 750.4)
assert name == "mixtape_cut_740s-750s.mp3"
+43 -3
@@ -24,7 +24,7 @@ import subprocess
import tempfile
import threading
import wave
from datetime import datetime
from datetime import datetime, timedelta
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from urllib.parse import parse_qs, unquote, urlparse
@@ -58,6 +58,12 @@ MIN_RMS = 0.002 # ≈ -54 dBFS; the floor never drops below this,
CLIP_MAX_SECONDS = 600 # upper bound on /api/clip length
# Recording filenames encode the start time as this strftime format (kept in
# sync with isr.FILENAME_FORMAT). It is the authoritative recording start and
# the only reliable clock anchor — mtime drifts to the last write, so cut clip
# names and the displayed date are both derived from this.
FILENAME_FORMAT = '%Y%m%d_%H%M%S'
MIME_TYPES = {
'.wav': 'audio/wav',
'.mp3': 'audio/mpeg',
@@ -72,6 +78,35 @@ MIME_TYPES = {
# Audio analysis helpers
# ---------------------------------------------------------------------------
def _recording_start(stem: str):
"""Parse the recording start time encoded in a filename stem.
Returns a datetime, or None if the stem is not in FILENAME_FORMAT
(e.g. a manually renamed file). Callers pass Path.stem, so no file
extension reaches strptime.
"""
try:
return datetime.strptime(stem, FILENAME_FORMAT)
except ValueError:
return None
def _cut_filename(stem: str, ext: str, start: float, end: float) -> str:
"""Name a cut by the real wall-clock span it covers.
For a recording that started at 22:00:00, a 22:31:30→22:32:30 slice
(start=1890, end=1950) becomes ``20260523_22-31-30_22-32-30.flac``.
Falls back to the source stem plus second offsets when the filename is
not in FILENAME_FORMAT (e.g. a manually renamed recording).
"""
started = _recording_start(stem)
if started is None:
return f'{stem}_cut_{int(start)}s-{int(end)}s{ext}'
cut_start = started + timedelta(seconds=start)
cut_end = started + timedelta(seconds=end)
return f'{cut_start:%Y%m%d}_{cut_start:%H-%M-%S}_{cut_end:%H-%M-%S}{ext}'
def _live_wav_header(path: Path, size: int):
"""Return the WAV header (through the 'data' chunk header) with RIFF and
data sizes rewritten to match the current file size, or None.
@@ -500,6 +535,11 @@ def list_files(recordings_dir: str):
rel = str(path.relative_to(base)).replace('\\', '/')
is_active = rel in active_files
# The recording start is encoded in the filename and is the true clock
# anchor; mtime is only a fallback for files not in FILENAME_FORMAT.
started = _recording_start(path.stem)
date = (started or datetime.fromtimestamp(stat.st_mtime)).strftime('%Y-%m-%d %H:%M:%S')
# Skip reading partial headers for in-progress files — the WAV nframes
# field and FLAC total_samples are both unfinalized while recording,
# producing wildly incorrect values (e.g. 53375995583:39:01).
@@ -509,7 +549,7 @@ def list_files(recordings_dir: str):
'name': rel,
'size': stat.st_size,
'mtime': stat.st_mtime,
'date': datetime.fromtimestamp(stat.st_mtime).strftime('%Y-%m-%d %H:%M:%S'),
'date': date,
'duration': duration,
'ext': path.suffix.lower().lstrip('.'),
'recording': is_active,
@@ -831,7 +871,7 @@ class _Handler(BaseHTTPRequestHandler):
return
ext = path.suffix.lower()
out_name = f'{path.stem}_cut_{int(start)}s-{int(end)}s{ext}'
out_name = _cut_filename(path.stem, ext, start, end)
# For lossless formats, re-encode (not copy) so the container header
# is rewritten with the correct duration/size. For lossy formats,