# Dashboard request — sticky cache for slowly-changing event types

**Audience:** dashboard session (owns `training/dashboard/`).
**Producer side:** `training/producers/multi_model_metrics.py`
(scenes 9 + 12), `training/producers/knn.py stream` (scene 11),
Lambda-side `live_detection_loop_v2.py` (scene 13).

## Problem

The broadcaster fans events out only to **currently connected**
browsers. A client that connects later (page refresh, a second tab,
a reload mid-talk) sees empty widgets until the next producer tick
rebroadcasts. The user has explicitly flagged this as a bug:

> "Your functions need to be more stateful, when we call your data it
> needs to be available right away. For the streaming data, when we
> call a new page it needs to connect correctly."

The broadcaster already does sticky caching for some keys: its
`/healthz` endpoint reports cached state under `host_counts`,
`phase_mix`, `recent_episodes`, `total_alerts`, `total_bytes`, and
`total_episodes`. What's missing is sticky caching for the model,
scatter, and embedding event types.

## Producer-side band-aid (already in place)

We've shortened the `multi_model_metrics` tick from 20 s to **5 s**,
so worst-case staleness on reconnect drops to about 5 s. That's
acceptable for the talk, but it isn't the right architecture: at
4 events × 2 event types per tick, we're retransmitting 8 events
every 5 s (96 per minute) that the broadcaster could simply remember.

## Asks

Please add sticky caching to the broadcaster for these event types:

| event type       | scene | key | TTL  | replay on connect? |
|------------------|-------|-----|------|--------------------|
| `model_metric`   | 9     | one entry per `model` (last value wins) | none | yes |
| `model_perf`     | 12    | one entry per `model` (last value wins) | none | yes |
| `live_detection` | 13    | small ring buffer, e.g. last 60 events globally (or last 12 per `host_id`) | none | yes |
| `embedding`      | 11    | one snapshot; see companion request `dashboard-request-knn-cap-evict.md` for the snapshot-replace pattern | none | yes |
| `attack_profile` | 7     | one entry per `name` (last curve wins) | none | yes |
| `prediction`     | 8     | one entry per `(episode_id, window_idx)` (last value wins) | none | yes |
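
A minimal sketch of the per-type keying in the table, assuming each event arrives as a flat dict with a `type` field plus the payload fields named in the key column (`model`, `name`, `episode_id`, `window_idx`). The envelope shape, `STICKY_KEYS`, and `StickyMap` are hypothetical; the broadcaster's real wire format may differ. `live_detection` is left out because it needs a ring buffer rather than a keyed map.

```python
from typing import Any, Callable, Dict, Hashable, List

Event = Dict[str, Any]

# Hypothetical key extractors, one per sticky event type. Assumes a flat
# dict envelope with a "type" field; adapt to the broadcaster's real format.
STICKY_KEYS: Dict[str, Callable[[Event], Hashable]] = {
    "model_metric": lambda e: e["model"],
    "model_perf": lambda e: e["model"],
    "attack_profile": lambda e: e["name"],
    "prediction": lambda e: (e["episode_id"], e["window_idx"]),
    # Constant key: every embedding event replaces the previous snapshot.
    "embedding": lambda e: "snapshot",
}


class StickyMap:
    """Per-event-type cache in which the last value for each key wins."""

    def __init__(self) -> None:
        self._by_type: Dict[str, Dict[Hashable, Event]] = {t: {} for t in STICKY_KEYS}

    def update(self, event: Event) -> bool:
        """Cache the event if its type is sticky; return True if it was cached."""
        key_fn = STICKY_KEYS.get(event.get("type"))
        if key_fn is None:
            return False  # non-sticky types are only fanned out, never cached
        self._by_type[event["type"]][key_fn(event)] = event
        return True

    def snapshot(self) -> List[Event]:
        """All cached events, grouped by type, for replay to a new client."""
        return [e for entries in self._by_type.values() for e in entries.values()]
```

Keying `embedding` on a constant gives the "one snapshot" behaviour from the table without a separate code path.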

Implementation suggestion: extend the broadcaster's existing
state-keys cache with a per-event-type "sticky map." When a new
client connects, replay the cache to it before any live event
reaches that client.
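
One way to guarantee "replay before any live event" in an asyncio broadcaster, sketched under the assumption that each client gets its own outbound `asyncio.Queue` drained by a writer task. The `Broadcaster` class and its methods are hypothetical, and the cache is simplified to the two model event types. The key step is that the replay is enqueued and the client is registered for live fan-out in one synchronous block with no `await` in between, so in a single-threaded event loop no live event can be queued ahead of the replay.

```python
import asyncio
from typing import Any, Dict, Set, Tuple

Event = Dict[str, Any]


class Broadcaster:
    """Hypothetical fan-out hub that replays a sticky cache on connect."""

    def __init__(self) -> None:
        self.clients: Set[asyncio.Queue] = set()
        # Simplified sticky map: (type, model) -> last event.
        # Other sticky types are omitted for brevity.
        self.sticky: Dict[Tuple[str, Any], Event] = {}

    def publish(self, event: Event) -> None:
        # Update the cache first so clients connecting later will see it.
        if event["type"] in ("model_metric", "model_perf"):
            self.sticky[(event["type"], event["model"])] = event
        for q in self.clients:
            q.put_nowait(event)

    def connect(self) -> asyncio.Queue:
        # Replay and registration happen with no await in between, so
        # nothing can publish to this queue before the replay is in it.
        q: asyncio.Queue = asyncio.Queue()
        for event in list(self.sticky.values()):
            q.put_nowait(event)
        self.clients.add(q)
        return q

    def disconnect(self, q: asyncio.Queue) -> None:
        self.clients.discard(q)
```

An unbounded queue per client keeps the sketch short; a real broadcaster may want `asyncio.Queue(maxsize=...)` so a stalled tab cannot grow memory without limit.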

For `live_detection` the right structure is a ring buffer: 60 cells
per lane matches the widget's DOM cap, so replaying the 60 newest
events lets a new browser paint the lanes immediately.
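
A sketch of that ring buffer using `collections.deque` with `maxlen`, which silently drops the oldest entry on append once full. It assumes `live_detection` events are dicts carrying a `host_id` field; `DetectionRing` is a hypothetical name, and it offers both the global-60 and per-host-12 variants from the table because the request leaves that choice open.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List

Event = Dict[str, Any]


class DetectionRing:
    """Sticky buffer for live_detection: keeps the newest N events.

    Caps mirror the table (60 globally, or 12 per host_id).
    """

    GLOBAL_CAP = 60
    PER_HOST_CAP = 12

    def __init__(self, per_host: bool = False) -> None:
        self.per_host = per_host
        self._global: Deque[Event] = deque(maxlen=self.GLOBAL_CAP)
        self._hosts: Dict[Any, Deque[Event]] = defaultdict(
            lambda: deque(maxlen=self.PER_HOST_CAP)
        )

    def append(self, event: Event) -> None:
        # deque(maxlen=...) evicts the oldest item automatically when full.
        if self.per_host:
            self._hosts[event["host_id"]].append(event)
        else:
            self._global.append(event)

    def replay(self) -> List[Event]:
        """Oldest-first; in per-host mode, grouped by host in first-seen order."""
        if self.per_host:
            return [e for ring in self._hosts.values() for e in ring]
        return list(self._global)
```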

## Verification

After this lands, our producers can drop their republish cadence
back to a sane 30 s plus on-change-only publishing, and a cold page
load on `dashboard.wg` paints scenes 9, 11, 12, and 13 within one frame.

We'll also drop the 5 s tick on `multi_model_metrics` once we
verify replay works.