feat: move analysis cache to recordings/analyses/, prune orphans on startup

- Cache files now live in recordings/analyses/<filename>.analysis.json
  (mirroring the relative path for files in subdirectories) rather than
  alongside each audio file.
- _api_delete now removes the corresponding cache file after deleting audio.
- prune_orphan_analyses() runs at startup and removes any .analysis.json
  whose audio file no longer exists (handles files deleted outside the UI).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 22:33:26 +02:00 · 2026-06-02 22:30:12 +02:00 · 2026-06-02 22:29:42 +02:00 · 2026-06-02 22:28:20 +02:00
3 changed files with 55 additions and 157 deletions
@@ -159,7 +159,7 @@ Shows recordings grouped by day with collapsible sections. Features:
 - **Day groups** — recordings are grouped under a collapsible day heading showing date, file count, total duration, and total size. The most recent day is expanded by default; older days start collapsed. Expanded state is preserved across filter changes.
 - **Day highlights** — click **★ Highlights** on any day heading to run loudness analysis across all WAV/FLAC files in that day and display a combined activity timeline SVG. Orange segments show when loud sections occurred relative to the day's time span; blue shows the file extents. Labels show the start, midpoint, and end times.
 - **Inline playback** — collapsible `▶ Play` button per row; audio loads lazily via a seekable `/stream/` endpoint with HTTP Range support. Metadata is fetched immediately so the duration is visible without pressing play.
- **Waveform analysis** — on demand per file; computes RMS per 100 ms window and highlights loud sections. Supported for WAV and FLAC (FLAC requires `numpy` + `soundfile`). Pure-Python fallback for WAV when numpy is absent.
+- **Waveform analysis** — on demand per file; computes RMS per 100 ms window and highlights loud sections. Supported for WAV and FLAC (FLAC requires `numpy` + `soundfile`). Pure-Python fallback for WAV when numpy is absent. Results are cached in `recordings/analyses/<filename>.analysis.json`; subsequent requests at the same threshold and min-gap settings return instantly without re-reading the audio. The cache file is deleted automatically when the audio file is deleted. Orphaned cache files (audio deleted outside the UI) are pruned on startup.
 - **Grace period** — configurable in the controls bar (default 2 s). Loud sections separated by less than this gap are merged into one. Raise this (e.g. to 15–30 s) when a single event generates many timestamps due to brief quiet gaps within it.
 - **Timestamp jump** — after analysis, click any loud-section chip to seek the player to that position and pre-fill the cut panel. Use **J** / **K** keyboard shortcuts to jump to the previous / next section while audio is playing.
 - **Cut & download** — `✂ Cut` button opens the player row and reveals a cut panel. Enter start and end times in `m:ss` or `h:mm:ss` format and click **↓ Download cut** to receive an ffmpeg-trimmed copy without re-encoding. Requires ffmpeg (included in the Docker image).
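The waveform-analysis and grace-period bullets above describe a two-step process: RMS per window, then merging loud runs separated by short quiet gaps. The sketch below illustrates that idea only. It is not the code in `web.py` (whose `analyze_wav` body is not part of this diff); the function names, the `>=` comparison, and the assumption that samples are floats in [-1, 1] are all hypothetical.

```python
import math


def window_rms(samples, window):
    """RMS of each consecutive window (the last one may be shorter).

    Assumes samples are floats normalised to [-1, 1].
    """
    out = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        out.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return out


def loud_sections(rms, window_sec, threshold, min_gap):
    """Return (start, end) times in seconds of loud runs.

    Runs separated by less than min_gap seconds of quiet are merged,
    which is the behaviour the grace period describes.
    """
    runs = []  # [first_window, last_window] index pairs
    for i, value in enumerate(rms):
        if value < threshold:
            continue
        if runs:
            quiet = (i - runs[-1][1] - 1) * window_sec
            if quiet == 0 or quiet < min_gap:
                runs[-1][1] = i  # extend the previous run
                continue
        runs.append([i, i])
    return [(a * window_sec, (b + 1) * window_sec) for a, b in runs]
```

With 1 s windows, a 2 s quiet gap is bridged by a 2.5 s grace period but not by a 0 s one, which is why raising the grace period collapses many chips into one.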
@@ -1,154 +0,0 @@
 # ISR Roadmap
 ## notify.py — NTFY Loudness Notifications
 ### Context
 Street ambience recorder. Goal: detect notable audio events (speech, thunder,
 sustained unusual sounds) in hourly recording files and push a notification via
 a self-hosted NTFY server. Generic short events (car horn, passing vehicle)
 should be filtered out by a minimum section duration.
 ### Design decisions
 | Topic | Decision |
 |---|---|
 | Detection | RMS + minimum section duration filter (KISS — no FFT for now) |
 | Timing | Configurable: `immediate` / `daily` / `both` |
 | Config | `[notify]` section in existing `config.ini` |
 | Code structure | `notify.py` imports `analyze_wav` / `analyze_flac` from `web.py` (DRY) |
 | Source name | Included in notification body; configurable display name per source |
 ---
 ### Config additions (`config.example.ini`)
 Add a `[notify]` section to `config.ini`:
 ```ini
 [notify]
 enabled              = true
 ntfy_url             = https://ntfy.example.com/mytopic   ; full URL incl. topic
 mode                 = immediate   ; immediate | daily | both
 daily_time           = 08:00       ; HH:MM — used in daily and both modes
 debounce_minutes     = 60          ; immediate mode: suppress repeat notifications within this window
 min_section_duration = 2.0         ; seconds — sections shorter than this are ignored (filters car horns etc.)
 min_sections         = 1           ; number of qualifying sections required to trigger a notification
 loudness_threshold   = 0.05        ; RMS 0–1, same scale as web.py analysis threshold
 ```
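A minimal sketch of how such a section could be read with the standard library, assuming `configparser` is the loader (the roadmap does not say). One detail worth noting: `configparser` does not strip inline `;` comments unless `inline_comment_prefixes` is set, so the example config above would otherwise yield values like `immediate   ; immediate | daily | both`. The function name and returned dict layout are hypothetical.

```python
import configparser

SAMPLE = """
[notify]
enabled              = true
ntfy_url             = https://ntfy.example.com/mytopic   ; full URL incl. topic
mode                 = immediate   ; immediate | daily | both
daily_time           = 08:00       ; HH:MM
debounce_minutes     = 60
min_section_duration = 2.0
min_sections         = 1
loudness_threshold   = 0.05
"""


def load_notify_settings(text):
    # Inline ';' comments are only stripped when requested explicitly.
    parser = configparser.ConfigParser(inline_comment_prefixes=(';',))
    parser.read_string(text)
    section = parser['notify']
    mode = section.get('mode', 'immediate')
    if mode not in ('immediate', 'daily', 'both'):
        raise ValueError(f'unknown notify mode: {mode!r}')
    return {
        'enabled': section.getboolean('enabled', fallback=False),
        'ntfy_url': section.get('ntfy_url', ''),
        'mode': mode,
        'daily_time': section.get('daily_time', '08:00'),
        'debounce_minutes': section.getint('debounce_minutes', fallback=60),
        'min_section_duration': section.getfloat('min_section_duration', fallback=2.0),
        'min_sections': section.getint('min_sections', fallback=1),
        'loudness_threshold': section.getfloat('loudness_threshold', fallback=0.05),
    }
```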
 Per recording source, add an optional `display_name`:
 ```ini
 [radio1]
 type         = stream
 url          = http://icecast.example.com:8000/live
 display_name = Street mic north   ; shown in notification; defaults to section name [radio1]
 ```
 ---
 ### Notification format
 ```
 Title:  ISR — Notable audio · Street mic north
 Body:   radio1_20260427_0300.wav
        3 notable sections (≥ 2.0 s each)
        → 00:12 – 00:18
        → 01:45 – 01:52
        → 47:03 – 47:11
        Peak RMS: 0.312
 ```
 Daily digest example:
 ```
 Title:  ISR Daily Digest · 2026-04-27
 Body:   Street mic north — 4 files with notable events
        03:00 file · 3 sections (peak 0.312)
        07:00 file · 1 section  (peak 0.091)
        14:00 file · 2 sections (peak 0.204)
        21:00 file · 1 section  (peak 0.178)
 ```
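The per-file notification body shown first above could be assembled roughly as follows. This is a sketch under assumptions: sections are taken to be `(start_sec, end_sec)` pairs, and both helper names are hypothetical.

```python
def fmt_mmss(seconds):
    """Format a position in seconds as mm:ss (minutes may exceed 59)."""
    minutes, secs = divmod(int(seconds), 60)
    return f'{minutes:02d}:{secs:02d}'


def immediate_body(filename, sections, min_duration, peak_rms):
    """Build the message body for one recording file."""
    noun = 'section' if len(sections) == 1 else 'sections'
    lines = [
        filename,
        f'{len(sections)} notable {noun} (≥ {min_duration:.1f} s each)',
    ]
    lines += [f'→ {fmt_mmss(a)} – {fmt_mmss(b)}' for a, b in sections]
    lines.append(f'Peak RMS: {peak_rms:.3f}')
    return '\n'.join(lines)
```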
 ---
 ### Implementation plan
 #### Phase 1 — Core
 1. **`config.example.ini`** — add `[notify]` section and `display_name` key to
   source section examples (as shown above).
 2. **`notify.py` — file watcher**
   - Polls `recordings/status.json` every 30 s.
   - Tracks which files were in `active` on the previous poll.
   - When a file disappears from `active` it was just closed → queues it for
     analysis.
   - Skips files with extensions that cannot be analysed (anything other than
     `.wav` / `.flac`).
 3. **`notify.py` — analysis + filter**
   - Imports `analyze_wav` / `analyze_flac` from `web.py`.
   - Applies `loudness_threshold` from `[notify]` config.
   - Filters resulting sections to those with duration ≥ `min_section_duration`.
   - Counts filtered sections against `min_sections` threshold.
 4. **`notify.py` — NTFY HTTP POST**
   - Plain `urllib` POST to `ntfy_url` (no extra dependencies).
   - Sets `Title` and message body as described above.
   - Logs success / failure to stdout.
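Steps 2 and 3 of Phase 1 reduce to a set difference between polls and a duration filter. A minimal sketch, assuming `active` in `status.json` is a list of filenames and that analysis yields `(start_sec, end_sec)` pairs; neither shape is confirmed by the roadmap, and the function names are hypothetical.

```python
from pathlib import PurePath

ANALYSABLE = {'.wav', '.flac'}


def newly_closed(previous_active, current_active):
    """Files present in the last poll but missing now were just closed."""
    gone = set(previous_active) - set(current_active)
    return sorted(f for f in gone if PurePath(f).suffix.lower() in ANALYSABLE)


def qualifies(sections, min_section_duration, min_sections):
    """Drop short sections, then check the count against min_sections."""
    kept = [(a, b) for a, b in sections if b - a >= min_section_duration]
    return kept, len(kept) >= min_sections
```

Filtering before counting means a file with many short events (car horns, passing vehicles) still does not trigger a notification.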
 #### Phase 2 — Cadence modes
 5. **Immediate mode with debounce**
   - Fires right after the file closes and analysis passes.
   - Persists last-notification timestamp per source to a small
     `notify_state.json` in the recordings directory.
   - Suppresses sending if last notification for that source was within
     `debounce_minutes`.
 6. **Daily digest mode**
   - Appends qualifying events to `notify_log.jsonl` in the recordings
     directory (one JSON line per event: timestamp, source, filename, sections,
     peak RMS).
   - On each poll checks whether `daily_time` has passed today and no digest
     has been sent yet (tracked in `notify_state.json`).
   - Reads all undigested entries from `notify_log.jsonl`, groups by
     `display_name`, sends one notification per source with notable activity.
   - Marks entries as digested.
 7. **Both mode**
   - Immediate path: only fires when peak RMS exceeds a second, higher
     threshold (`alarm_threshold`, default `0.3`; add to `[notify]` config).
   - Daily digest path: fires for everything that passes `min_sections`.
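The two timing checks in Phase 2 might look like the sketch below. The state layout is an assumption: a dict mapping source name to an ISO timestamp for the debounce, and an ISO date string for the last digest. The roadmap only says both are tracked in `notify_state.json`.

```python
from datetime import datetime, timedelta


def debounced(last_sent, source, now, debounce_minutes):
    """True if this source was notified within the debounce window."""
    last = last_sent.get(source)
    if last is None:
        return False
    return now - datetime.fromisoformat(last) < timedelta(minutes=debounce_minutes)


def digest_due(last_digest_date, now, daily_time):
    """True once daily_time (HH:MM) has passed today and no digest was sent today."""
    hour, minute = map(int, daily_time.split(':'))
    due_at = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return now >= due_at and last_digest_date != now.date().isoformat()
```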
 #### Phase 3 — Integration
 8. **Docker** — optional `notify` service in `docker-compose.yml`:
   ```yaml
   notify:
     build: .
     command: python notify.py
     volumes:
       - ./recordings:/app/recordings
       - ./config.ini:/app/config.ini:ro
     restart: unless-stopped
   ```
 9. **README** — new section documenting `notify.py` usage, config keys, and
   Docker setup.
 ---
 ### Open questions (decide before implementing)
 - **Log rotation**: `notify_log.jsonl` grows indefinitely. Options: cap at N
  days (configurable), cap at N MB, or leave cleanup to the user. No decision
  made yet.
 - **Multiple NTFY topics per source**: current design uses one global topic.
  If per-source topics are ever needed, `ntfy_url` could be moved to the source
  section and override the global one.
 - **FFT / frequency analysis** (future): distinguishing thunder (low rumble,
  50–200 Hz) from speech (300–3000 Hz) from vehicles would reduce false
  positives further. Deferred — requires `numpy` and adds meaningful complexity.
@@ -203,6 +203,33 @@ def analyze_flac(path: Path, window_samples: int = WINDOW_SAMPLES,
    return _package_result(rms_values, framerate, n_frames, window_samples, threshold, min_gap)
 # ---------------------------------------------------------------------------
 # Analysis cache helpers
 # ---------------------------------------------------------------------------
 def _analysis_cache_path(base: Path, audio_path: Path) -> Path:
    rel = audio_path.relative_to(base)
    return base / 'analyses' / rel.parent / (rel.name + '.analysis.json')
 def prune_orphan_analyses(base: Path):
    analyses_dir = base / 'analyses'
    if not analyses_dir.exists():
        return
    removed = 0
    for cache in analyses_dir.rglob('*.analysis.json'):
        rel = cache.relative_to(analyses_dir)
        audio_path = base / rel.parent / rel.name[:-len('.analysis.json')]
        if not audio_path.exists():
            try:
                cache.unlink()
                removed += 1
            except Exception:
                pass
    if removed:
        print(f'Pruned {removed} orphaned analysis cache file(s)')
 # ---------------------------------------------------------------------------
 # File listing
 # ---------------------------------------------------------------------------
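The pruning step relies on the cache-path mapping being invertible: stripping `.analysis.json` from a cache file under `analyses/` must give back the audio path. The standalone sketch below restates the two directions from the hunk above so the round trip can be checked in isolation; the names `cache_path_for` and `audio_path_for` are hypothetical and not part of `web.py`.

```python
from pathlib import Path

SUFFIX = '.analysis.json'


def cache_path_for(base, audio_path):
    """Mirror the audio file's relative path under base/analyses/."""
    rel = audio_path.relative_to(base)
    return base / 'analyses' / rel.parent / (rel.name + SUFFIX)


def audio_path_for(base, cache_path):
    """Inverse mapping, as used when looking for orphaned cache files."""
    rel = cache_path.relative_to(base / 'analyses')
    return base / rel.parent / rel.name[:-len(SUFFIX)]
```

Because the suffix is appended to the full filename, `a.wav` and `a.flac` in the same directory get distinct cache files.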
@@ -331,6 +358,16 @@ class _Handler(BaseHTTPRequestHandler):
        except Exception:
            pass
        base = Path(self.recordings_dir).resolve()
        cache_path = _analysis_cache_path(base, path)
        try:
            cached = json.loads(cache_path.read_text('utf-8'))
            if cached.get('threshold') == threshold and cached.get('min_gap') == min_gap:
                self._send(200, json.dumps(cached['result']).encode('utf-8'), 'application/json')
                return
        except Exception:
            pass
        ext = path.suffix.lower()
        if ext == '.wav':
            result = analyze_wav(path, threshold=threshold, min_gap=min_gap)
@@ -343,6 +380,14 @@ class _Handler(BaseHTTPRequestHandler):
            self._json_err(400, f'Loudness analysis is not available for {ext} files')
            return
        try:
            cache_path.parent.mkdir(parents=True, exist_ok=True)
            tmp = cache_path.with_suffix('.tmp')
            tmp.write_text(json.dumps({'threshold': threshold, 'min_gap': min_gap, 'result': result}), 'utf-8')
            os.replace(tmp, cache_path)
        except Exception:
            pass
        self._send(200, json.dumps(result).encode('utf-8'), 'application/json')
    def _api_status(self):
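The read and write paths in the two hunks above can be isolated as a pair of helpers. This is a sketch, not the committed code: it keeps the same JSON layout (`threshold`, `min_gap`, `result`) and the same `os.replace` step, but as a deliberate variation it uses `tempfile.mkstemp` for a unique temp name, whereas the diff derives a fixed name with `with_suffix('.tmp')`. The helper names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_cached(cache_path, threshold, min_gap):
    """Return the cached result, or None on a miss or parameter mismatch."""
    try:
        cached = json.loads(cache_path.read_text('utf-8'))
    except (OSError, ValueError):  # missing, unreadable, or corrupt file
        return None
    if not isinstance(cached, dict):
        return None
    if cached.get('threshold') != threshold or cached.get('min_gap') != min_gap:
        return None
    return cached.get('result')


def store_cached(cache_path, threshold, min_gap, result):
    """Write to a temp file in the same directory, then rename into place."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps({'threshold': threshold, 'min_gap': min_gap, 'result': result})
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as fh:
            fh.write(payload)
        os.replace(tmp_name, cache_path)  # readers never see a partial file
    except BaseException:
        try:
            os.unlink(tmp_name)
        except OSError:
            pass
        raise
```

Keeping the temp file in the same directory as the target matters: `os.replace` is only atomic when both paths are on the same filesystem.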
@@ -465,6 +510,11 @@ class _Handler(BaseHTTPRequestHandler):
            self._json_err(500, f'Failed to delete: {e}')
            return
        try:
            _analysis_cache_path(Path(self.recordings_dir).resolve(), path).unlink()
        except Exception:
            pass
        self._send(200, json.dumps({'deleted': filename}).encode(), 'application/json')
    def _api_cut(self, qs):
@@ -876,7 +926,7 @@ function seekToSection(idx, filename, startSec, endSec, sectionIdx) {
  activePlayerIdx = idx;
  const audio  = document.getElementById('aud-'+idx);
  const seekTo = Math.max(0, startSec - getPreroll());
-  const doSeek = () => { audio.currentTime = seekTo; };
+  const doSeek = () => { audio.currentTime = seekTo; audio.play().catch(() => {}); };
  if (audio.readyState >= 1) doSeek();
  else audio.addEventListener('loadedmetadata', doSeek, {once: true});
  setCutFields(idx, startSec, endSec);
@@ -1449,7 +1499,7 @@ function jumpToDaySection(si) {
  const audio = document.getElementById('aud-' + fileIdx);
  if (!audio) return;
  const seekTo = Math.max(0, start - getPreroll());
-  const doSeek = () => { audio.currentTime = seekTo; };
+  const doSeek = () => { audio.currentTime = seekTo; audio.play().catch(() => {}); };
  if (audio.readyState >= 1) doSeek();
  else audio.addEventListener('loadedmetadata', doSeek, { once: true });
@@ -1551,6 +1601,8 @@ def main():
    if not rec_dir.exists():
        print(f"Warning: recordings directory '{args.dir}' does not exist yet.")
    prune_orphan_analyses(rec_dir.resolve())
    class Handler(_Handler):
        recordings_dir = str(rec_dir.resolve())
        threshold      = args.threshold
Author	SHA1	Message	Date
admin	eb774a0876	feat: move analysis cache to recordings/analyses/, prune orphans on startup - Cache files now live in recordings/analyses/<filename>.analysis.json (mirroring the relative path for files in subdirectories) rather than alongside each audio file. - _api_delete now removes the corresponding cache file after deleting audio. - prune_orphan_analyses() runs at startup and removes any .analysis.json whose audio file no longer exists (handles files deleted outside the UI). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 22:33:26 +02:00
admin	e22c0059f6	feat: cache analysis results alongside audio files After the first analysis of a WAV/FLAC file, the result is written to <filename>.analysis.json next to the audio. Subsequent requests with the same threshold and min_gap parameters return the cached result immediately without re-reading the audio data. The cache is invalidated automatically if either parameter changes. Written via temp-then-replace for thread safety. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 22:30:12 +02:00
admin	df32c263bc	chore: remove NTFY notification roadmap Not being pursued — the planning doc just adds noise. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 22:29:42 +02:00
admin	af8113ba03	fix: auto-play audio when jumping to a loud section seekToSection and jumpToDaySection were only seeking (setting currentTime) but never calling play(), so the player would open and position correctly but stay paused. The loadedmetadata-deferred path already handles slow audio loading; play() is now called there too. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-02 22:28:20 +02:00