CIS490/docs/dashboard-request-sticky-cache.md
Max c2a71de4b2 scene 9 bars: paint full zoo + 0–1 visible scale
- multi_model_metrics: publish gbt / mlp / cnn / knn_semi /
  gru / lstm / bert (knn handled by knn streamer); read both
  *_train.json and *_eval.json with macro_f1.point fallback
- dashboard.css: add palette gradients for the four
  non-canonical names so the bars render with a fill colour
- dashboard.js: open the bar's visible scale to the full 0–1
  range so honest-low cross-host F1s show as a bar instead of
  clamping to 0%
- ship lambda-live-detection-loop.py + dashboard request docs
  (scenes 7/8/12, sticky cache, lambda-inference-demo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:18:00 -05:00


Dashboard request — sticky cache for slowly-changing event types

Audience: dashboard session (owns training/dashboard/). Producer side: training/producers/multi_model_metrics.py (scenes 9 + 12), training/producers/knn.py stream (scene 11), Lambda-side live_detection_loop_v2.py (scene 13).

Problem

The broadcaster fans events out to currently-connected browsers only. Any client that connects later (page refresh, second tab, mid-talk reload) sees empty widgets until the next producer tick republishes. The user has explicitly flagged this as a bug:

"Your functions need to be more stateful, when we call your data it needs to be available right away. For the streaming data, when we call a new page it needs to connect correctly."

The broadcaster already does sticky caching for some keys — its /healthz reports cached state under host_counts, phase_mix, recent_episodes, total_alerts, total_bytes, total_episodes. What's missing is sticky caching for the model + scatter + embedding event types.

Producer-side band-aid (already in place)

We've shortened the multi_model_metrics tick from 20 s → 5 s, so worst-case staleness on reconnect drops to ~5 s. That's acceptable for the talk but not the right architecture: at 5 s × 4 events × 2 event types we're spending bandwidth and CPU on retransmits of values the broadcaster could just remember.

Asks

Please add sticky caching to the broadcaster for these event types:

| event type | scene | key | TTL | replay-on-connect? |
|---|---|---|---|---|
| model_metric | 9 | one entry per model (last value wins) | none | yes |
| model_perf | 12 | one entry per model (last value wins) | none | yes |
| live_detection | 13 | a small ring buffer, e.g. last 60 events globally (or last 12 per host_id) | none | yes |
| embedding | 11 | one snapshot; see companion request dashboard-request-knn-cap-evict.md for the snapshot-replace pattern | none | yes |
| attack_profile | 7 | one entry per name (last curve wins) | none | yes |
| prediction | 8 | one entry per (episode_id, window_idx) (last value wins) | none | yes |

Implementation suggestion: extend the broadcaster's existing state-keys cache with a per-event-type "sticky map." On new client connect, replay the cache before any live event reaches the new client.
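
To make the suggestion concrete, here is a minimal sketch of a per-event-type sticky map with replay on connect. It is an illustration under assumptions, not the broadcaster's actual code: the class `StickyCache`, the `on_client_connect` hook, the `send` callback, and the event field names (`type`, `model`, `name`, `episode_id`, `window_idx`) are all hypothetical. live_detection is left out because the request asks for a ring buffer there rather than a last-value map.

```python
from typing import Any, Callable, Dict, Hashable, List

Event = Dict[str, Any]

# Hypothetical key extractors, one per last-value-wins event type in the table.
# The payload field names used here are assumptions, not the real schema.
KEY_FUNCS: Dict[str, Callable[[Event], Hashable]] = {
    "model_metric": lambda e: e["model"],
    "model_perf": lambda e: e["model"],
    "attack_profile": lambda e: e["name"],
    "prediction": lambda e: (e["episode_id"], e["window_idx"]),
    "embedding": lambda e: "snapshot",  # single slot: a new snapshot replaces the old one
}


class StickyCache:
    """Remembers the last event per (event type, key); entries have no TTL."""

    def __init__(self) -> None:
        self._maps: Dict[str, Dict[Hashable, Event]] = {}

    def remember(self, event: Event) -> None:
        key_func = KEY_FUNCS.get(event["type"])
        if key_func is None:
            return  # not a sticky type; nothing to cache
        self._maps.setdefault(event["type"], {})[key_func(event)] = event

    def replay(self) -> List[Event]:
        """Return every cached event, grouped by event type."""
        return [ev for per_type in self._maps.values() for ev in per_type.values()]


def on_client_connect(cache: StickyCache, send: Callable[[Event], None]) -> None:
    """Send the cached events to a new client before it joins the live fan-out."""
    for event in cache.replay():
        send(event)
```

In this sketch the broadcaster would call `cache.remember(event)` for every incoming event before fanning it out, and `on_client_connect` once per new connection. With no TTL, the prediction map grows with every new (episode_id, window_idx) pair, so a real implementation may want a cap there.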

For live_detection the right structure is a ring buffer: the widget's DOM cap is 60 cells per lane, so replaying the 60 newest events lets a new browser paint the lanes immediately.
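
A ring buffer of this kind can be sketched with `collections.deque` and its `maxlen` argument, which discards the oldest entry automatically once the cap is reached. The class name `LiveDetectionBuffer`, the `per_host` switch, and the `host_id` payload field are assumptions for illustration; the caps of 60 and 12 come from the table in this request.

```python
from collections import deque
from typing import Any, Deque, Dict, List

Event = Dict[str, Any]

GLOBAL_CAP = 60    # "last 60 events globally" from the table
PER_HOST_CAP = 12  # the per-host alternative from the table


class LiveDetectionBuffer:
    """Bounded buffer of recent live_detection events (hypothetical helper)."""

    def __init__(self, per_host: bool = False) -> None:
        self._per_host = per_host
        self._global: Deque[Event] = deque(maxlen=GLOBAL_CAP)
        self._by_host: Dict[str, Deque[Event]] = {}

    def push(self, event: Event) -> None:
        if self._per_host:
            # One small ring per host; deque drops the oldest entry when full.
            lane = self._by_host.setdefault(event["host_id"], deque(maxlen=PER_HOST_CAP))
            lane.append(event)
        else:
            self._global.append(event)

    def replay(self) -> List[Event]:
        """Global mode: oldest first. Per-host mode: grouped by host, oldest first within each."""
        if not self._per_host:
            return list(self._global)
        return [ev for lane in self._by_host.values() for ev in lane]
```

Replaying oldest first means the newest event is the last one the client receives, which matches the order it would have seen on a live connection.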

Verification

After this lands, our producers can drop their republish cadence back to a sane 30 s + on-change-only, and a cold page-load on dashboard.wg paints scenes 9, 11, 12, 13 within one frame.

We'll also drop the 5 s tick on multi_model_metrics once we verify replay works.
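
On the producer side, the "30 s + on-change-only" cadence could look roughly like the gate below. This is a sketch of one reading of that phrase (publish immediately when a value changes, otherwise republish at most every 30 s), not the producers' current code; `ChangeGate`, `should_publish`, and the injectable `clock` are hypothetical names.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple

MAX_SILENCE_S = 30.0  # assumed meaning of the 30 s republish cadence


class ChangeGate:
    """Decides whether a payload is worth publishing (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (fingerprint of the last published payload, time it was published)
        self._last: Dict[str, Tuple[str, float]] = {}

    def should_publish(self, key: str, payload: Dict[str, Any]) -> bool:
        # Payloads must be JSON-serialisable; sort_keys makes the fingerprint order-independent.
        fingerprint = json.dumps(payload, sort_keys=True)
        now = self._clock()
        previous = self._last.get(key)
        if previous is not None:
            old_fingerprint, sent_at = previous
            if old_fingerprint == fingerprint and now - sent_at < MAX_SILENCE_S:
                return False  # unchanged and recently sent: skip
        self._last[key] = (fingerprint, now)
        return True
```

A producer loop would call `gate.should_publish(model_name, payload)` on each tick and only send when it returns `True`.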