feat: onset-aware section scoring so slow swells rank at the bottom

A section's score is now its peak dB above the noise floor capped by the sharpest rise within ONSET_SECONDS (0.5 s). Real events (voices, impacts, barks) rise fast and keep their full prominence; a gradual swell that outruns the 30 s floor blocks (gusts, distant approaching cars) still flags but scores near zero, so score-ranked review (chips, U/I highlights, "Highlights only" mode) surfaces events first.

A section starting in a file's first 0.5 s is scored against the floor instead, so events cut off by a file split are not punished as swells.

Old cached analyses carry now-wrong scores, so the cache gains a leading "detector" version key (DETECTOR_VERSION = 2) checked by both _cached_analysis_params() and the /api/analyze cache hit path; v1 caches never match and are recomputed on the next analyse.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
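
The scoring rule described in the message can be sketched roughly as follows. This is an illustration only, not the code from `web.py`: the function name `section_score`, the 100 ms window length, and the exact definition of "rise" (a window's level minus the quietest level in the preceding `ONSET_SECONDS`) are all assumptions made for the example.

```python
import numpy as np

ONSET_SECONDS = 0.5    # value taken from the commit message
WINDOW_SECONDS = 0.1   # assumption: 100 ms analysis windows


def section_score(db, floor_db, start, end):
    """Score the section covering windows [start, end).

    Hypothetical helper: peak prominence above the noise floor,
    capped by the sharpest rise seen within ONSET_SECONDS.
    """
    db = np.asarray(db, dtype=float)
    floor_db = np.asarray(floor_db, dtype=float)

    # Peak dB above the floor anywhere in the section.
    prominence = float(np.max(db[start:end] - floor_db[start:end]))

    lag = int(round(ONSET_SECONDS / WINDOW_SECONDS))
    if start < lag:
        # The section begins in the file's first ONSET_SECONDS, so its
        # onset may be cut off by the file split: score against the floor only.
        return max(prominence, 0.0)

    # Assumed definition of "rise": how far each window sits above the
    # quietest window in the ONSET_SECONDS immediately before it.
    rises = [db[i] - db[i - lag:i].min() for i in range(start, end)]
    return max(min(prominence, float(max(rises))), 0.0)


# A sharp 20 dB event keeps its full prominence.
floor = np.full(200, -60.0)
impact = np.full(200, -60.0)
impact[100:110] = -40.0
impact_score = section_score(impact, floor, 100, 110)

# A 20 s swell peaking 12 dB above the floor rises only ~0.12 dB per
# window, so the onset cap pulls its score down to well under 1 dB.
swell = np.concatenate([
    np.full(50, -60.0),
    np.linspace(-60.0, -48.0, 101),
    np.linspace(-48.0, -60.0, 101)[1:],
])
swell_score = section_score(swell, np.full(swell.size, -60.0), 100, 200)
```

Under these assumptions the swell still has 12 dB of prominence, so it would still be flagged by the margin test; only its ranking score collapses.
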
2026-06-11 14:57:19 +02:00
parent 6431918989
commit f6031cfa16
4 changed files with 76 additions and 20 deletions
@@ -58,9 +58,9 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC
 - **Filename is the clock — fixed format, not configurable.** Recordings are named `%Y%m%d_%H%M%S.<ext>` (the *start* time). This is hardcoded as `FILENAME_FORMAT`, defined in **both** `isr.py` (recorder writes it) and `web.py` (reads it back) — the two copies must stay in sync. There is no `filename_pattern` config option (removed; `web.py` can't see `config.ini`, so a configurable pattern would break parsing). `web.py` derives the displayed DATE column from the filename via `_recording_start()` (falling back to mtime only for non-standard names — mtime is the last write ≈ end, not the start). Cut downloads are named by the wall-clock span they cover via `_cut_filename()`: a 22:31:30→22:32:30 slice of `20260523_220000.flac` becomes `20260523_22-31-30_22-32-30.flac`; non-standard source names fall back to `<stem>_cut_<start>s-<end>s`.
 - **ALSA:** capture spawns `arecord` as a subprocess, raw PCM read in 100 ms chunks by a thread. Device spec resolution: `default` → exact `hw:X,Y` → partial name → fallback to any literal ALSA PCM name (so `shared_mic` from asound.conf works without appearing in `arecord -l`).
 - **Shutdown:** SIGTERM is converted to KeyboardInterrupt in `main()`; `RecorderManager.stop()` joins all threads against a single shared 25 s deadline to stay inside Docker's `stop_grace_period: 30s`.
- **Loud-section detection is adaptive — do not regress it to an absolute threshold.** Per-window dB is compared against a rolling noise floor (`NOISE_PERCENTILE`-th percentile per `NOISE_BLOCK_SECONDS` block, min-smoothed over ±2 blocks so events can't raise their own floor; clamped to ≥ `MIN_RMS`). A section needs `margin` dB of prominence and carries a `score` (peak dB above floor) used for ranking. Sections shorter than `min_duration` (default 0.5 s, after `min_gap` merging) are discarded — without this, isolated 100 ms pops (clicks, single raindrops) produced thousands of zero-length sections per day. The original fixed RMS threshold flagged every ambience change (passing cars, rain) and produced ~600 useless sections/day — that is why it was replaced. Known limitation: a short (~10 s) swell on a quiet street still flags because the floor blocks are 30 s; the planned fix is an onset/spectral filter or optional Silero VAD, **not** a higher margin. Tests in `tests/test_web.py`.
+- **Loud-section detection is adaptive — do not regress it to an absolute threshold.** Per-window dB is compared against a rolling noise floor (`NOISE_PERCENTILE`-th percentile per `NOISE_BLOCK_SECONDS` block, min-smoothed over ±2 blocks so events can't raise their own floor; clamped to ≥ `MIN_RMS`). A section needs `margin` dB of prominence and carries a `score` used for ranking: peak dB above floor, **capped by the sharpest rise within `ONSET_SECONDS` (0.5 s)** — so a short (~10 s) swell that outruns the 30 s floor blocks still flags but scores ≈ 0 and sinks in the U/I highlight ranking, while sharp events keep their full prominence. A section starting in the first 0.5 s of a file is scored against the floor instead (events cut off by a file split must not be punished as swells). Do not regress the scoring to raw peak, and do not fight swells with a higher margin. If flagging itself (not just ranking) ever needs improving, the next step is a spectral filter or optional Silero VAD over candidate sections. Sections shorter than `min_duration` (default 0.5 s, after `min_gap` merging) are discarded — without this, isolated 100 ms pops (clicks, single raindrops) produced thousands of zero-length sections per day. The original fixed RMS threshold flagged every ambience change (passing cars, rain) and produced ~600 useless sections/day — that is why it was replaced. Tests in `tests/test_web.py`.
 - **Analysis params are coupled in five places.** CLI `--margin`/`--min-gap`/`--min-duration` → `/api/config` → UI inputs `#margin-input`/`#min-gap-input`/`#min-duration-input` → `/api/analyze` query params → cache JSON head keys. Renaming or adding a param means touching all five plus `cachedParamsMatch()` and the `_cached_analysis_params()` regex (see the threshold→margin change `c84b7d8` and the min_duration addition).
- **Analysis cache:** results stored as `<analyses-dir>/<file>.analysis.json` keyed by margin+min_gap+min_duration; orphans pruned at web startup. In Docker the recordings mount is **read-only** for the web container, so docker-compose layers a read-write `./recordings/analyses` bind mount over it. The `margin`, `min_gap`, and `min_duration` keys MUST stay first in the cache JSON — `_cached_analysis_params()` reads only the first 256 bytes to avoid parsing the large embedded result. Caches written by older detector versions (missing a key) never match and get overwritten on the next analyse.
+- **Analysis cache:** results stored as `<analyses-dir>/<file>.analysis.json` keyed by margin+min_gap+min_duration; orphans pruned at web startup. In Docker the recordings mount is **read-only** for the web container, so docker-compose layers a read-write `./recordings/analyses` bind mount over it. The `detector`, `margin`, `min_gap`, and `min_duration` keys MUST stay first in the cache JSON — `_cached_analysis_params()` reads only the first 256 bytes to avoid parsing the large embedded result. `detector` is `DETECTOR_VERSION`: bump it whenever detection/scoring changes make old cached results wrong (e.g. v2 = onset-capped scores); caches with another version (or missing keys) never match and get overwritten on the next analyse.
 - **Analyze responses:** `/api/analyze` returns `rms_display` (~800 points), never the full per-window RMS list — the UI doesn't use it and it is ~45x larger.
 - **Section playback uses clips, not seeks:** `/api/clip?file&start&end` decodes the slice server-side (wave/soundfile) and returns a standalone 16-bit WAV with exact Content-Length (capped at `CLIP_MAX_SECONDS`), `Cache-Control: private` so re-listening is free. The UI plays chips/J-K through the bottom clip bar (`clipQueue` in webui.html); seeking the full file only happens via "Open in file". Rationale (finding): libsndfile writes FLAC **without a SEEKTABLE**, so a browser seek bisects the whole multi-hundred-MB file with Range requests — seeking big FLACs in `<audio>` is inherently slow and must not be reintroduced as the primary navigation. Server-side `sf.SoundFile.seek()` on local disk is fast and frame-accurate.
 - **HTTP/1.1 keep-alive:** `_Handler.protocol_version = 'HTTP/1.1'`; every response path must set an accurate `Content-Length`. `_copy_to_response()` force-closes the connection if it under-delivers (file truncated mid-serve).
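
The versioned cache head described in the **Analysis cache** bullet can be sketched like this. It is a minimal illustration, not the project's code: `dump_cache`, `read_head`, `cache_matches`, the regex, and the `result` key are hypothetical stand-ins, and the parameter values are arbitrary. It works on strings for brevity, whereas the real reader is described as reading the first 256 bytes of the file.

```python
import json
import re

DETECTOR_VERSION = 2   # value taken from the commit message
HEAD_CHARS = 256       # the notes say only the first 256 bytes are read

# Hypothetical head pattern: the four keys must appear first, in this order.
_NUM = r'([-+\d.eE]+)'
_HEAD_RE = re.compile(
    r'^\{"detector":\s*(\d+),\s*"margin":\s*' + _NUM +
    r',\s*"min_gap":\s*' + _NUM +
    r',\s*"min_duration":\s*' + _NUM
)


def dump_cache(margin, min_gap, min_duration, result):
    """Serialize a cache entry with the matching keys written first."""
    # dicts keep insertion order and json.dumps preserves it, so the
    # head keys land at the very start of the text.
    return json.dumps({
        "detector": DETECTOR_VERSION,
        "margin": margin,
        "min_gap": min_gap,
        "min_duration": min_duration,
        "result": result,   # hypothetical name for the large payload
    })


def read_head(text):
    """Parse only the head; return None for old or malformed caches."""
    m = _HEAD_RE.match(text[:HEAD_CHARS])
    if m is None:
        return None
    return (int(m.group(1)), float(m.group(2)),
            float(m.group(3)), float(m.group(4)))


def cache_matches(text, margin, min_gap, min_duration):
    """True only if detector version and all three params agree."""
    wanted = (DETECTOR_VERSION, float(margin),
              float(min_gap), float(min_duration))
    return read_head(text) == wanted


current = dump_cache(6.0, 1.0, 0.5, {"sections": []})
# A v1-style cache has no leading "detector" key, so its head never parses.
v1_style = json.dumps({"margin": 6.0, "min_gap": 1.0,
                       "min_duration": 0.5, "result": {"sections": []}})
```

Because a missing or different `detector` value simply fails the match, stale caches need no migration step: they are treated as a miss and overwritten by the next analysis.
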