feat: onset-aware section scoring so slow swells rank at the bottom

A section's score is now its peak dB above the noise floor capped by the sharpest rise within ONSET_SECONDS (0.5 s). Real events (voices, impacts, barks) rise fast and keep their full prominence; a gradual swell that outruns the 30 s floor blocks (gusts, distant approaching cars) still flags but scores near zero, so score-ranked review (chips, U/I highlights, "Highlights only" mode) surfaces events first. A section starting in a file's first 0.5 s is scored against the floor instead, so events cut off by a file split are not punished as swells.

Old cached analyses carry now-wrong scores, so the cache gains a leading "detector" version key (DETECTOR_VERSION = 2) checked by both _cached_analysis_params() and the /api/analyze cache hit path; v1 caches never match and are recomputed on the next analyse.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:57:19 +02:00
parent 6431918989
commit f6031cfa16
4 changed files with 76 additions and 20 deletions
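
The scoring rule from the commit message can be tried in isolation. What follows is a minimal sketch, not the code in `web.py`: it assumes a constant floor, treats the whole input as a single section, and assumes a window counts as loud when it sits at least `margin_db` above the floor. `onset_capped_score` and its parameters are hypothetical names; the 100 ms window and the 0.5 s onset span are taken from the diff.

```python
def onset_capped_score(db, floor_db, margin_db=12.0, onset_win=5):
    """Peak dB above the floor, capped by the sharpest rise within
    `onset_win` windows (5 windows of 100 ms = 0.5 s)."""
    peak = 0.0
    onset = 0.0
    for i, d in enumerate(db):
        prominence = d - floor_db
        if prominence < margin_db:
            continue  # assumption: only loud windows contribute to the score
        peak = max(peak, prominence)
        # Without onset_win windows of history (file start), fall back to
        # prominence above the floor so a cut-off event is not penalised.
        rise = d - db[i - onset_win] if i >= onset_win else prominence
        onset = max(onset, rise)
    return round(max(0.0, min(peak, onset)), 1)


QUIET, LOUD = -54.0, -28.0

# Sharp burst: 2 s of quiet, then 1 s at the peak level.
burst = [QUIET] * 20 + [LOUD] * 10
# Slow swell: 2 s of quiet, then a linear 26 dB rise spread over 8 s.
swell = [QUIET] * 20 + [QUIET + 26.0 * k / 80 for k in range(81)]
# Event already in progress when the file starts.
cut_off = [LOUD] * 10 + [QUIET] * 20

burst_score = onset_capped_score(burst, QUIET)
swell_score = onset_capped_score(swell, QUIET)
cut_off_score = onset_capped_score(cut_off, QUIET)
```

All three inputs peak 26 dB above the floor, but the swell climbs only 26 × 0.5 / 8 ≈ 1.6 dB per onset span, so that rise becomes its score. The burst and the cut-off event both keep the full 26 dB.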
@@ -58,9 +58,9 @@ Dependencies: `requests` (streams), `numpy` + `soundfile` (FLAC output and FLAC
 - **Filename is the clock — fixed format, not configurable.** Recordings are named `%Y%m%d_%H%M%S.<ext>` (the *start* time). This is hardcoded as `FILENAME_FORMAT`, defined in **both** `isr.py` (recorder writes it) and `web.py` (reads it back) — the two copies must stay in sync. There is no `filename_pattern` config option (removed; `web.py` can't see `config.ini`, so a configurable pattern would break parsing). `web.py` derives the displayed DATE column from the filename via `_recording_start()` (falling back to mtime only for non-standard names — mtime is the last write ≈ end, not the start). Cut downloads are named by the wall-clock span they cover via `_cut_filename()`: a 22:31:30→22:32:30 slice of `20260523_220000.flac` becomes `20260523_22-31-30_22-32-30.flac`; non-standard source names fall back to `<stem>_cut_<start>s-<end>s`.
 - **ALSA:** capture spawns `arecord` as a subprocess, raw PCM read in 100 ms chunks by a thread. Device spec resolution: `default` → exact `hw:X,Y` → partial name → fallback to any literal ALSA PCM name (so `shared_mic` from asound.conf works without appearing in `arecord -l`).
 - **Shutdown:** SIGTERM is converted to KeyboardInterrupt in `main()`; `RecorderManager.stop()` joins all threads against a single shared 25 s deadline to stay inside Docker's `stop_grace_period: 30s`.
- **Loud-section detection is adaptive — do not regress it to an absolute threshold.** Per-window dB is compared against a rolling noise floor (`NOISE_PERCENTILE`-th percentile per `NOISE_BLOCK_SECONDS` block, min-smoothed over ±2 blocks so events can't raise their own floor; clamped to ≥ `MIN_RMS`). A section needs `margin` dB of prominence and carries a `score` (peak dB above floor) used for ranking. Sections shorter than `min_duration` (default 0.5 s, after `min_gap` merging) are discarded — without this, isolated 100 ms pops (clicks, single raindrops) produced thousands of zero-length sections per day. The original fixed RMS threshold flagged every ambience change (passing cars, rain) and produced ~600 useless sections/day — that is why it was replaced. Known limitation: a short (~10 s) swell on a quiet street still flags because the floor blocks are 30 s; the planned fix is an onset/spectral filter or optional Silero VAD, **not** a higher margin. Tests in `tests/test_web.py`.
+- **Loud-section detection is adaptive — do not regress it to an absolute threshold.** Per-window dB is compared against a rolling noise floor (`NOISE_PERCENTILE`-th percentile per `NOISE_BLOCK_SECONDS` block, min-smoothed over ±2 blocks so events can't raise their own floor; clamped to ≥ `MIN_RMS`). A section needs `margin` dB of prominence and carries a `score` used for ranking: peak dB above floor, **capped by the sharpest rise within `ONSET_SECONDS` (0.5 s)** — so a short (~10 s) swell that outruns the 30 s floor blocks still flags but scores ≈ 0 and sinks in the U/I highlight ranking, while sharp events keep their full prominence. A section starting in the first 0.5 s of a file is scored against the floor instead (events cut off by a file split must not be punished as swells). Do not regress the scoring to raw peak, and do not fight swells with a higher margin. If flagging itself (not just ranking) ever needs improving, the next step is a spectral filter or optional Silero VAD over candidate sections. Sections shorter than `min_duration` (default 0.5 s, after `min_gap` merging) are discarded — without this, isolated 100 ms pops (clicks, single raindrops) produced thousands of zero-length sections per day. The original fixed RMS threshold flagged every ambience change (passing cars, rain) and produced ~600 useless sections/day — that is why it was replaced. Tests in `tests/test_web.py`.
 - **Analysis params are coupled in five places.** CLI `--margin`/`--min-gap`/`--min-duration` → `/api/config` → UI inputs `#margin-input`/`#min-gap-input`/`#min-duration-input` → `/api/analyze` query params → cache JSON head keys. Renaming or adding a param means touching all five plus `cachedParamsMatch()` and the `_cached_analysis_params()` regex (see the threshold→margin change `c84b7d8` and the min_duration addition).
- **Analysis cache:** results stored as `<analyses-dir>/<file>.analysis.json` keyed by margin+min_gap+min_duration; orphans pruned at web startup. In Docker the recordings mount is **read-only** for the web container, so docker-compose layers a read-write `./recordings/analyses` bind mount over it. The `margin`, `min_gap`, and `min_duration` keys MUST stay first in the cache JSON — `_cached_analysis_params()` reads only the first 256 bytes to avoid parsing the large embedded result. Caches written by older detector versions (missing a key) never match and get overwritten on the next analyse.
+- **Analysis cache:** results stored as `<analyses-dir>/<file>.analysis.json` keyed by margin+min_gap+min_duration; orphans pruned at web startup. In Docker the recordings mount is **read-only** for the web container, so docker-compose layers a read-write `./recordings/analyses` bind mount over it. The `detector`, `margin`, `min_gap`, and `min_duration` keys MUST stay first in the cache JSON — `_cached_analysis_params()` reads only the first 256 bytes to avoid parsing the large embedded result. `detector` is `DETECTOR_VERSION`: bump it whenever detection/scoring changes make old cached results wrong (e.g. v2 = onset-capped scores); caches with another version (or missing keys) never match and get overwritten on the next analyse.
 - **Analyze responses:** `/api/analyze` returns `rms_display` (~800 points), never the full per-window RMS list — the UI doesn't use it and it is ~45x larger.
 - **Section playback uses clips, not seeks:** `/api/clip?file&start&end` decodes the slice server-side (wave/soundfile) and returns a standalone 16-bit WAV with exact Content-Length (capped at `CLIP_MAX_SECONDS`), `Cache-Control: private` so re-listening is free. The UI plays chips/J-K through the bottom clip bar (`clipQueue` in webui.html); seeking the full file only happens via "Open in file". Rationale (finding): libsndfile writes FLAC **without a SEEKTABLE**, so a browser seek bisects the whole multi-hundred-MB file with Range requests — seeking big FLACs in `<audio>` is inherently slow and must not be reintroduced as the primary navigation. Server-side `sf.SoundFile.seek()` on local disk is fast and frame-accurate.
 - **HTTP/1.1 keep-alive:** `_Handler.protocol_version = 'HTTP/1.1'`; every response path must set an accurate `Content-Length`. `_copy_to_response()` force-closes the connection if it under-delivers (file truncated mid-serve).
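
The key-order requirement in the analysis-cache bullet can be illustrated with a small stand-alone sketch. It reuses the regular expression from this commit, but `params_from_head` and the sample payloads are hypothetical, and it works on an in-memory string rather than a file. It relies on `json.dumps` emitting dict keys in insertion order with the default `", "` and `": "` separators.

```python
import json
import re

DETECTOR_VERSION = 2  # value introduced by this commit

HEAD_RE = re.compile(
    r'"detector":\s*(\d+),\s*"margin":\s*([0-9.eE+-]+),'
    r'\s*"min_gap":\s*([0-9.eE+-]+),\s*"min_duration":\s*([0-9.eE+-]+)')


def params_from_head(text, head_len=256):
    """Return the cached params if the head matches the current detector,
    else None. Only the first `head_len` characters are inspected."""
    m = HEAD_RE.search(text[:head_len])
    if not m or int(m.group(1)) != DETECTOR_VERSION:
        return None
    return {'margin': float(m.group(2)), 'min_gap': float(m.group(3)),
            'min_duration': float(m.group(4))}


params = {'margin': 12.0, 'min_gap': 2.0, 'min_duration': 0.5}
big_result = {'sections': [], 'padding': 'x' * 300}  # stand-in for a large result

current = json.dumps({'detector': 2, **params, 'result': big_result})
old_v1 = json.dumps({**params, 'result': big_result})           # no detector key
stale = json.dumps({'detector': 1, **params, 'result': big_result})
keys_last = json.dumps({'result': big_result, 'detector': 2, **params})

hit = params_from_head(current)
misses = [params_from_head(t) for t in (old_v1, stale, keys_last)]
```

Only `current` yields a parameter dict. A cache without the `detector` key, one with another version, and one whose keys sit behind the large `result` value all return `None`, which is the condition under which the cache is recomputed.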
@@ -156,7 +156,7 @@ Shows recordings grouped by day with collapsible sections. Features:
 - **Day groups** — recordings are grouped under a collapsible day heading showing date, file count, total duration, and total size. The most recent day is expanded by default; older days start collapsed. Expanded state is preserved across filter changes.
 - **Day highlights** — click **Highlights** on any day heading to run loudness analysis across all WAV/FLAC files in that day and display a combined activity timeline SVG. Orange segments show when loud sections occurred relative to the day's time span; blue shows the file extents. Labels show the start, midpoint, and end times. When a day has more sections than fit as chips, the chips show the top 50 by score (loudest-above-background first) so the most promising events are reviewed first; J/K still steps through all sections in time order, and U/I steps through only the top-scored highlights.
 - **Inline playback** — collapsible `Play` button per row; audio loads lazily via a seekable `/stream/` endpoint with HTTP Range support. Metadata is fetched immediately so the duration is visible without pressing play.
- **Waveform analysis** — on demand per file; computes RMS per 100 ms window and marks sections that stand out above the background. Detection is **adaptive**: a rolling noise floor (20th percentile per 30 s block) is estimated across the file, and a section is flagged when the level rises at least *margin* dB (default 12) above that floor. Slow ambience changes — rain setting in, day/night traffic hum — move the floor instead of producing false positives. Each section gets a **score** (its peak dB above the floor) used to rank sections by how much they stand out. Supported for WAV and FLAC (FLAC requires `numpy` + `soundfile`). Pure-Python fallback for WAV when numpy is absent. Results are cached in `recordings/analyses/<filename>.analysis.json`; subsequent requests at the same margin, min-gap, and min-duration settings return instantly without re-reading the audio. The cache file is deleted automatically when the audio file is deleted. Orphaned cache files (audio deleted outside the UI) are pruned on startup.
+- **Waveform analysis** — on demand per file; computes RMS per 100 ms window and marks sections that stand out above the background. Detection is **adaptive**: a rolling noise floor (20th percentile per 30 s block) is estimated across the file, and a section is flagged when the level rises at least *margin* dB (default 12) above that floor. Slow ambience changes — rain setting in, day/night traffic hum — move the floor instead of producing false positives. Each section gets a **score** used to rank it: its peak dB above the floor, capped by the sharpest rise within 0.5 s. Abrupt events — voices, impacts, barks — rise fast, so their score is their full prominence; a gradual swell (a gust, a distant approaching car) that drifts up faster than the floor can track still gets flagged, but scores near zero and sinks to the bottom of the highlight ranking. Supported for WAV and FLAC (FLAC requires `numpy` + `soundfile`). Pure-Python fallback for WAV when numpy is absent. Results are cached in `recordings/analyses/<filename>.analysis.json`; subsequent requests at the same margin, min-gap, and min-duration settings return instantly without re-reading the audio. The cache file is deleted automatically when the audio file is deleted. Orphaned cache files (audio deleted outside the UI) are pruned on startup.
 - **Grace period** — configurable in the controls bar (default 2 s). Loud sections separated by less than this gap are merged into one. Raise this (e.g. to 15–30 s) when a single event generates many timestamps due to brief quiet gaps within it.
 - **Min duration** — configurable in the controls bar (default 0.5 s). Loud sections shorter than this (after grace-period merging) are discarded, so isolated sub-second pops — a click, a single raindrop — don't flood a day with thousands of near-zero-length sections. Set to 0 to disable.
 - **Clip playback** — clicking a loud-section chip plays a short server-rendered WAV clip (`/api/clip`, pre-roll included) in a player bar at the bottom of the page. Playback starts instantly even for sections deep inside multi-hundred-MB FLACs, because the browser never has to seek the full file. **J** / **K** (or the **Prev** / **Next** buttons) step through the queued sections — one file's, or a whole day's after **Highlights** — and **Auto-advance** plays the next section when one ends, turning a day's detections into a continuous review reel. **U** / **I** step through *highlights only*: the top-scored sections of the queue (count set by the **Top** input in the player bar, default 50). Ticking **Highlights only** makes J/K, Prev/Next, and Auto-advance skip non-highlights too, so a day with thousands of detections can be reviewed as a short reel of just the loudest events. The same keys work during full-file playback, seeking the open recording between (highlight) sections. **Open in file** switches to the full recording at the same position for context; each chip click also pre-fills the cut panel.
@@ -172,7 +172,7 @@ Shows recordings grouped by day with collapsible sections. Features:
 The detector is purely energy-based: anything that rises *margin* dB above the rolling background floor for at least *min duration* seconds is flagged. In a quiet environment the floor sits very low, so even modest sounds — a gust, a distant car, rustling — clear the margin, and an outdoor day can easily produce over a thousand sections. What to adjust, in rough order of preference:
-1. **Review by rank instead of thinning the list** — tick **Highlights only** (or use **U**/**I**) to step through just the top-scored sections. Nothing is discarded, so quieter events are still there if the top of the list comes up empty.
+1. **Review by rank instead of thinning the list** — tick **Highlights only** (or use **U**/**I**) to step through just the top-scored sections. Scores favour sharp onsets, so slow ambience swells sort themselves to the bottom automatically. Nothing is discarded — quieter events are still there if the top of the list comes up empty.
 2. **Raise the grace period** (2 → 15–30 s) — merges clusters of related noise (a rain burst, one long conversation) into a single section. Cuts the count heavily without dropping any audio from review.
 3. **Raise the margin** (12 → 15–18 dB) — demands more prominence above the background. The quietest events disappear first, so move in small steps.
 4. **Raise min duration** (0.5 → 1–2 s) — drops short rustles and pops, but beware: single bangs or knocks are themselves short.
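
The rolling background floor that every option above works against can be sketched as follows. This is an illustration under assumptions, not the project's `_noise_floor_db` (which this diff does not show): it uses a nearest-rank percentile, returns one floor value per block instead of per window, and the helper names `percentile` and `block_floor_db` are hypothetical. The constants are the ones named in the documentation.

```python
import math

NOISE_BLOCK_SECONDS = 30.0
NOISE_PERCENTILE = 20
MIN_RMS = 0.002  # about -54 dBFS


def percentile(values, pct):
    """Nearest-rank percentile (an assumption; the real code may interpolate)."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx]


def block_floor_db(db, window_dur=0.1, smooth=2):
    """One floor value per block: the block percentile, min-smoothed over
    +/- `smooth` neighbouring blocks and clamped to the MIN_RMS level."""
    per_block = max(1, int(round(NOISE_BLOCK_SECONDS / window_dur)))
    raw = [percentile(db[i:i + per_block], NOISE_PERCENTILE)
           for i in range(0, len(db), per_block)]
    min_db = 20 * math.log10(MIN_RMS)
    floors = []
    for b in range(len(raw)):
        lo, hi = max(0, b - smooth), min(len(raw), b + smooth + 1)
        floors.append(max(min_db, min(raw[lo:hi])))
    return floors


quiet_db = 20 * math.log10(MIN_RMS)

# A 10 s burst inside an otherwise quiet 5-minute file.
with_burst = [quiet_db] * 3000
with_burst[400:500] = [-28.0] * 100
floors_burst = block_floor_db(with_burst)

# Ambience that steps up to -40 dB after 90 s and stays there.
with_shift = [quiet_db] * 900 + [-40.0] * 2100
floors_shift = block_floor_db(with_shift)

# Digital silence is clamped to the MIN_RMS level.
floors_silence = block_floor_db([-120.0] * 600)
```

In this sketch the 10 s burst leaves every block floor at the quiet level, while the sustained step raises the floor only once a block and both of its neighbours on each side are elevated. That matches the limitation described above: anything shorter than the floor blocks is judged against the quiet floor.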
@@ -92,6 +92,37 @@ def test_min_duration_applies_after_gap_merging():
    assert sections[0]['end'] >= 61.0
+def test_slow_swell_scores_near_zero_sharp_burst_scores_high():
+    # A ~10 s swell rises faster than the 30 s floor blocks can track, so it
+    # still flags — but its score must collapse (onset cap) so it ranks at the
+    # bottom, while a sharp burst with the same peak keeps its full score.
+    rms = [0.002] * 1200
+    for k in range(80):                       # rise −54 → −28 dB over 8 s
+        rms[400 + k] = 0.002 * 10 ** (26 * (k / 80) / 20)
+    rms[480:500] = [0.04] * 20                # hold 2 s at −28 dB
+    for k in range(80):                       # fall back over 8 s
+        rms[500 + k] = 0.04 * 10 ** (-26 * (k / 80) / 20)
+    rms[900:910] = [0.04] * 10                # sharp 1 s burst, same peak
+    sections = _run(rms)
+    assert len(sections) == 2
+    swell, burst = sections
+    assert swell['start'] < 50.0 < 90.0 <= burst['start']
+    assert burst['score'] >= 24.0             # ≈ full 26 dB prominence
+    assert swell['score'] <= 5.0              # capped by its slow onset
+    assert swell['score'] < burst['score']
+def test_burst_at_file_start_keeps_full_score():
+    # No pre-event history to measure a rise against: the onset cap falls back
+    # to prominence above the floor, so events cut off by a file split are not
+    # punished as swells.
+    rms = [0.05] * 10 + [0.002] * 1190
+    sections = _run(rms)
+    assert len(sections) == 1
+    assert sections[0]['start'] == 0.0
+    assert sections[0]['score'] >= 24.0
 def test_noise_floor_tracks_blocks_and_ignores_short_events():
    quiet_db = 20 * math.log10(0.002)
    db = [quiet_db] * 1200
@@ -55,6 +55,13 @@ NOISE_BLOCK_SECONDS = 30.0 # noise floor is estimated per block of this length
 NOISE_PERCENTILE    = 20   # percentile of windowed dB levels taken as the floor
 MIN_RMS             = 0.002  # ≈ −54 dBFS; the floor never drops below this, so
                             # digital silence does not make every tiny sound loud
+ONSET_SECONDS       = 0.5  # a section's score is capped by its sharpest dB rise
+                           # within this span, so slow swells rank low
+# Bumped whenever section detection/scoring changes in a way that makes old
+# cached results wrong (not just differently parameterised). Caches written
+# with another version never match and are recomputed on the next analyse.
+DETECTOR_VERSION = 2
 CLIP_MAX_SECONDS = 600     # upper bound on /api/clip length
@@ -331,8 +338,12 @@ def _loud_sections(rms_values: list, window_dur: float, duration: float,
                   margin_db: float, min_gap: float = MIN_GAP_SECONDS,
                   min_duration: float = MIN_DURATION_SECONDS) -> list:
    """Sections whose level rises at least margin_db above the local noise
-    floor. Each section carries a 'score': its peak dB above the floor, used
+    floor. Each section carries a 'score': its peak dB above the floor,
-    by the UI to rank sections by how much they stand out.
+    capped by the sharpest rise observed within ONSET_SECONDS. Real events
+    (voices, impacts, barks) have steep onsets, so their cap equals their
+    peak; a swell that drifts up slower than the noise-floor blocks can track
+    (wind, a distant approaching car) still flags but scores near zero, so
+    score-ranked review (chips, U/I highlights) surfaces events first.
    Sections shorter than min_duration (after min_gap merging) are discarded:
    without this, every isolated 100 ms window that pops above the floor — a
@@ -341,11 +352,16 @@ def _loud_sections(rms_values: list, window_dur: float, duration: float,
    db = [20 * math.log10(max(r, 1e-6)) for r in rms_values]
    floor = _noise_floor_db(db, window_dur)
    min_db = 20 * math.log10(MIN_RMS)
+    onset_win = max(1, int(round(ONSET_SECONDS / window_dur)))
    sections = []
    start_t = None
    last_loud_t = None
    peak = 0.0
+    onset = 0.0
+    def _score():
+        return round(max(0.0, min(peak, onset)), 1)
    for i, d in enumerate(db):
        t = i * window_dur
@@ -354,21 +370,27 @@ def _loud_sections(rms_values: list, window_dur: float, duration: float,
            if start_t is None:
                start_t = t
                peak = 0.0
+                onset = 0.0
            last_loud_t = t
            peak = max(peak, d - floor_eff)
+            # Rise within the onset span; a section starting before the file
+            # has history is measured against the floor instead (an event cut
+            # off by a file split must not be punished as a swell).
+            rise = d - db[i - onset_win] if i >= onset_win else d - floor_eff
+            onset = max(onset, rise)
        else:
            if start_t is not None and (t - last_loud_t) > min_gap:
                end_t = last_loud_t + window_dur
                if end_t - start_t >= min_duration - 1e-9:
                    sections.append({'start': round(start_t, 1),
                                     'end':   round(end_t, 1),
-                                     'score': round(peak, 1)})
+                                     'score': _score()})
                start_t = None
                last_loud_t = None
    if start_t is not None and (last_loud_t + window_dur - start_t) >= min_duration - 1e-9:
        sections.append({'start': round(start_t, 1), 'end': round(duration, 1),
-                         'score': round(peak, 1)})
+                         'score': _score()})
    return sections
@@ -452,21 +474,22 @@ def _analysis_cache_path(analyses_base: Path, recordings_base: Path, audio_path:
 def _cached_analysis_params(cache_path: Path):
-    """Read just margin/min_gap/min_duration from a cache file without parsing
+    """Read just detector/margin/min_gap/min_duration from a cache file
-    the whole JSON (the embedded result can be hundreds of KB). Relies on the
+    without parsing the whole JSON (the embedded result can be hundreds of
-    writer in _api_analyze putting these three keys first. Caches written by
+    KB). Relies on the writer in _api_analyze putting these keys first.
-    older detector versions lack one of the keys and simply never match."""
+    Caches written by other detector versions (or so old they lack a key)
+    simply never match and get recomputed on the next analyse."""
    try:
        with open(cache_path, 'r', encoding='utf-8') as fh:
            head = fh.read(256)
    except OSError:
        return None
-    m = re.search(r'"margin":\s*([0-9.eE+-]+),\s*"min_gap":\s*([0-9.eE+-]+),'
+    m = re.search(r'"detector":\s*(\d+),\s*"margin":\s*([0-9.eE+-]+),'
-                  r'\s*"min_duration":\s*([0-9.eE+-]+)', head)
+                  r'\s*"min_gap":\s*([0-9.eE+-]+),\s*"min_duration":\s*([0-9.eE+-]+)', head)
-    if not m:
+    if not m or int(m.group(1)) != DETECTOR_VERSION:
        return None
-    return {'margin': float(m.group(1)), 'min_gap': float(m.group(2)),
+    return {'margin': float(m.group(2)), 'min_gap': float(m.group(3)),
-            'min_duration': float(m.group(3))}
+            'min_duration': float(m.group(4))}
 def prune_orphan_analyses(analyses_base: Path, recordings_base: Path):
@@ -663,7 +686,8 @@ class _Handler(BaseHTTPRequestHandler):
        cache_path = _analysis_cache_path(analyses_base, recordings_base, path)
        try:
            cached = json.loads(cache_path.read_text('utf-8'))
-            if (cached.get('margin') == margin and cached.get('min_gap') == min_gap
+            if (cached.get('detector') == DETECTOR_VERSION
+                    and cached.get('margin') == margin and cached.get('min_gap') == min_gap
                    and cached.get('min_duration') == min_duration):
                payload = dict(cached['result'])
                payload.pop('rms', None)  # caches written before the full-RMS field was dropped
@@ -688,9 +712,10 @@ class _Handler(BaseHTTPRequestHandler):
        try:
            cache_path.parent.mkdir(parents=True, exist_ok=True)
            tmp = cache_path.with_suffix('.tmp')
-            # margin, min_gap and min_duration MUST stay first:
+            # detector, margin, min_gap and min_duration MUST stay first:
            # _cached_analysis_params reads only the first 256 bytes of this file
-            tmp.write_text(json.dumps({'margin': margin, 'min_gap': min_gap,
+            tmp.write_text(json.dumps({'detector': DETECTOR_VERSION,
+                                       'margin': margin, 'min_gap': min_gap,
                                       'min_duration': min_duration, 'result': result}), 'utf-8')
            os.replace(tmp, cache_path)
        except Exception as e: