feat: onset-aware section scoring so slow swells rank at the bottom

A section's score is now its peak dB above the noise floor capped by the sharpest rise within ONSET_SECONDS (0.5 s). Real events (voices, impacts, barks) rise fast and keep their full prominence; a gradual swell that outruns the 30 s floor blocks (gusts, distant approaching cars) still flags but scores near zero, so score-ranked review (chips, U/I highlights, "Highlights only" mode) surfaces events first. A section starting in a file's first 0.5 s is scored against the floor instead, so events cut off by a file split are not punished as swells. Old cached analyses carry now-wrong scores, so the cache gains a leading "detector" version key (DETECTOR_VERSION = 2) checked by both _cached_analysis_params() and the /api/analyze cache hit path; v1 caches never match and are recomputed on the next analyse. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:57:19 +02:00
parent 6431918989
commit f6031cfa16
4 changed files with 76 additions and 20 deletions
@@ -156,7 +156,7 @@ Shows recordings grouped by day with collapsible sections. Features:
 - **Day groups** — recordings are grouped under a collapsible day heading showing date, file count, total duration, and total size. The most recent day is expanded by default; older days start collapsed. Expanded state is preserved across filter changes.
 - **Day highlights** — click **Highlights** on any day heading to run loudness analysis across all WAV/FLAC files in that day and display a combined activity timeline SVG. Orange segments show when loud sections occurred relative to the day's time span; blue shows the file extents. Labels show the start, midpoint, and end times. When a day has more sections than fit as chips, the chips show the top 50 by score (loudest-above-background first) so the most promising events are reviewed first; J/K still steps through all sections in time order, and U/I steps through only the top-scored highlights.
 - **Inline playback** — collapsible `Play` button per row; audio loads lazily via a seekable `/stream/` endpoint with HTTP Range support. Metadata is fetched immediately so the duration is visible without pressing play.
- **Waveform analysis** — on demand per file; computes RMS per 100 ms window and marks sections that stand out above the background. Detection is **adaptive**: a rolling noise floor (20th percentile per 30 s block) is estimated across the file, and a section is flagged when the level rises at least *margin* dB (default 12) above that floor. Slow ambience changes — rain setting in, day/night traffic hum — move the floor instead of producing false positives. Each section gets a **score** (its peak dB above the floor) used to rank sections by how much they stand out. Supported for WAV and FLAC (FLAC requires `numpy` + `soundfile`). Pure-Python fallback for WAV when numpy is absent. Results are cached in `recordings/analyses/<filename>.analysis.json`; subsequent requests at the same margin, min-gap, and min-duration settings return instantly without re-reading the audio. The cache file is deleted automatically when the audio file is deleted. Orphaned cache files (audio deleted outside the UI) are pruned on startup.
+- **Waveform analysis** — on demand per file; computes RMS per 100 ms window and marks sections that stand out above the background. Detection is **adaptive**: a rolling noise floor (20th percentile per 30 s block) is estimated across the file, and a section is flagged when the level rises at least *margin* dB (default 12) above that floor. Slow ambience changes — rain setting in, day/night traffic hum — move the floor instead of producing false positives. Each section gets a **score** used to rank it: its peak dB above the floor, capped by the sharpest rise within 0.5 s. Abrupt events — voices, impacts, barks — rise fast, so their score is their full prominence; a gradual swell (a gust, a distant approaching car) that drifts up faster than the floor can track still gets flagged, but scores near zero and sinks to the bottom of the highlight ranking. Supported for WAV and FLAC (FLAC requires `numpy` + `soundfile`). Pure-Python fallback for WAV when numpy is absent. Results are cached in `recordings/analyses/<filename>.analysis.json`; subsequent requests at the same margin, min-gap, and min-duration settings return instantly without re-reading the audio. The cache file is deleted automatically when the audio file is deleted. Orphaned cache files (audio deleted outside the UI) are pruned on startup.
 - **Grace period** — configurable in the controls bar (default 2 s). Loud sections separated by less than this gap are merged into one. Raise this (e.g. to 15–30 s) when a single event generates many timestamps due to brief quiet gaps within it.
 - **Min duration** — configurable in the controls bar (default 0.5 s). Loud sections shorter than this (after grace-period merging) are discarded, so isolated sub-second pops — a click, a single raindrop — don't flood a day with thousands of near-zero-length sections. Set to 0 to disable.
 - **Clip playback** — clicking a loud-section chip plays a short server-rendered WAV clip (`/api/clip`, pre-roll included) in a player bar at the bottom of the page. Playback starts instantly even for sections deep inside multi-hundred-MB FLACs, because the browser never has to seek the full file. **J** / **K** (or the **Prev** / **Next** buttons) step through the queued sections — one file's, or a whole day's after **Highlights** — and **Auto-advance** plays the next section when one ends, turning a day's detections into a continuous review reel. **U** / **I** step through *highlights only*: the top-scored sections of the queue (count set by the **Top** input in the player bar, default 50). Ticking **Highlights only** makes J/K, Prev/Next, and Auto-advance skip non-highlights too, so a day with thousands of detections can be reviewed as a short reel of just the loudest events. The same keys work during full-file playback, seeking the open recording between (highlight) sections. **Open in file** switches to the full recording at the same position for context; each chip click also pre-fills the cut panel.
@@ -172,7 +172,7 @@ Shows recordings grouped by day with collapsible sections. Features:

 The detector is purely energy-based: anything that rises *margin* dB above the rolling background floor for at least *min duration* seconds is flagged. In a quiet environment the floor sits very low, so even modest sounds — a gust, a distant car, rustling — clear the margin, and an outdoor day can easily produce over a thousand sections. What to adjust, in rough order of preference:

-1. **Review by rank instead of thinning the list** — tick **Highlights only** (or use **U**/**I**) to step through just the top-scored sections. Nothing is discarded, so quieter events are still there if the top of the list comes up empty.
+1. **Review by rank instead of thinning the list** — tick **Highlights only** (or use **U**/**I**) to step through just the top-scored sections. Scores favour sharp onsets, so slow ambience swells sort themselves to the bottom automatically. Nothing is discarded — quieter events are still there if the top of the list comes up empty.
 2. **Raise the grace period** (2 → 15–30 s) — merges clusters of related noise (a rain burst, one long conversation) into a single section. Cuts the count heavily without dropping any audio from review.
 3. **Raise the margin** (12 → 15–18 dB) — demands more prominence above the background. The quietest events disappear first, so move in small steps.
 4. **Raise min duration** (0.5 → 1–2 s) — drops short rustles and pops, but beware: single bangs or knocks are themselves short.