CIS490/docs/dashboard-request-scenes-7-8-12.md
Max c2a71de4b2 scene 9 bars: paint full zoo + 0–1 visible scale
- multi_model_metrics: publish gbt / mlp / cnn / knn_semi /
  gru / lstm / bert (knn handled by knn streamer); read both
  *_train.json and *_eval.json with macro_f1.point fallback
- dashboard.css: add palette gradients for the four
  non-canonical names so the bars render with a fill colour
- dashboard.js: open the bar's visible scale to the full 0–1
  range so honest-low cross-host F1s show as a bar instead of
  clamping to 0%
- ship lambda-live-detection-loop.py + dashboard request docs
  (scenes 7/8/12, sticky cache, lambda-inference-demo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:18:00 -05:00

# Dashboard request — scenes 7, 8, 12 visibility fixes
**Audience:** dashboard session (owns `training/dashboard/`).
**Producer side (this session):**
* `training/producers/multi_model_metrics.py` — publishes
`ModelMetric` and `ModelPerf` for **gbt, mlp, cnn, knn_semi, gru,
lstm, bert** (every 5 s)
* `training/producers/knn.py stream` — publishes `ModelMetric`+
`ModelPerf` for **knn**
* Lambda-side `scripts/lambda-live-detection-loop.py` — publishes
`LiveDetection` **and now also `Prediction`** events per inference
window
All confirmed delivering (`{"delivered":N}` from `/publish`).
Visibility issues are all in `training/dashboard/static/dashboard.js`.
The user has flagged this twice now: scene 7 (chunking) and scene 9
(model bars) are not showing real-data state in deck mode. The events
exist; the widgets just don't render them. **This is the blocker
for the talk.**
---
## Scene 7 — chunking timeline (`#chunk-row`)
**Problem.** Cells are only built inside `buildExample()`, which is wired
to `demo_start`. The `prediction` handler can only update existing
cells:
```js
on('prediction', m => {
  if (typeof m.window_idx !== 'number') return;
  const cells = rowEl.querySelectorAll('.chunk-cell');
  const cell = cells[m.window_idx];
  if (!cell) return;  // ← always returns here if no demo has built the cells
  ...
});
```
If a real `prediction` event arrives without `demo_start` having
fired first, `cells.length === 0` and the event is silently dropped.
**Why we can't just publish `demo_start` from this side.** It has
destructive side-effects on other scenes: scene-9 (KNN scatter)
loads synthetic data on `demo_start`, scene-attack profile loads
synthetic curves on `demo_start`, etc. We tried this once and
clobbered the live KNN scatter.
**Fix request.** Lazy cell-build inside the `prediction` handler when
no cells exist yet:
```js
on('prediction', m => {
  // Reject missing, negative, or fractional indices before touching the DOM.
  if (!Number.isInteger(m.window_idx) || m.window_idx < 0) return;
  if (rowEl.children.length === 0 || rowEl.querySelector('.chunk-empty')) {
    // First real prediction: clear the empty-state placeholder. Cells are
    // appended lazily below, so the row grows as windows arrive.
    rowEl.innerHTML = '';
    ruleEl.innerHTML = '';
    axisEl.innerHTML = '';
  }
  // Ensure a cell exists at window_idx; pad with empty cells up to it.
  let cells = rowEl.querySelectorAll('.chunk-cell');
  while (cells.length <= m.window_idx) {
    const c = document.createElement('div');
    c.className = 'chunk-cell';
    rowEl.appendChild(c);
    ruleEl.appendChild(Object.assign(
      document.createElement('div'), { className: 'tick' }));
    const t = document.createElement('span');
    // cells.length is still the pre-append count, i.e. the new cell's index.
    t.textContent = `${cells.length * 10}s`;
    axisEl.appendChild(t);
    cells = rowEl.querySelectorAll('.chunk-cell');
  }
  const cell = cells[m.window_idx];
  const phase = m.predicted || m.actual;
  if (!phase) return;
  cell.className = `chunk-cell ${phase}`;
  // Replace every underscore, not just the first one.
  cell.textContent = phase.replace(/_/g, ' ');
});
```
This keeps `demo_start`/`demo_stop` working and additionally lights up
the row from real `prediction` events.
If the Lambda producer re-runs episodes from window 0, you may also
want a reset on `prediction` events with `window_idx === 0` (clear all
cells, rebuild fresh). We can publish a `prediction_reset` event too
if you'd prefer an explicit signal — let us know.
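
The reset-on-window-0 idea can be sketched without the DOM. This is a minimal model, not the dashboard's code: `applyPrediction` is a hypothetical helper, the row is a plain array of phase strings (`null` for an empty cell), and the phase names in the usage lines are made up. It assumes that a `window_idx === 0` event arriving when cells already exist means a new episode has started.

```js
// DOM-free model of the chunk row: one array entry per cell,
// holding a phase string or null for a not-yet-filled cell.
// Hypothetical helper; only window_idx / predicted / actual come from the event.
function applyPrediction(cells, m) {
  if (!Number.isInteger(m.window_idx) || m.window_idx < 0) return cells;
  // Assumption: window 0 on a non-empty row signals an episode restart.
  const restarted = m.window_idx === 0 && cells.length > 0;
  const next = restarted ? [] : cells.slice();
  // Pad with empty cells so that index window_idx exists.
  while (next.length <= m.window_idx) next.push(null);
  const phase = m.predicted || m.actual;
  if (phase) next[m.window_idx] = phase;
  return next;
}

// Usage with made-up phase names.
let demoRow = [];
demoRow = applyPrediction(demoRow, { window_idx: 0, predicted: 'benign' });
demoRow = applyPrediction(demoRow, { window_idx: 2, actual: 'attack' });
demoRow = applyPrediction(demoRow, { window_idx: 0, predicted: 'attack' });
console.log(demoRow);
```

An explicit `prediction_reset` event would replace the `restarted` heuristic with an unambiguous signal, which matters if window 0 can ever be re-sent within the same episode.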
---
## Scene 8 — model accuracy bars (`.model-row`)
**Problem.** The bar fill formula compresses to nothing for any
F1 < 0.5:
```js
const visiblePct = Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
```
Our trained models on the cross-device test split honestly land in
the 0.30–0.55 range (this is the **point** of held-out-by-host
evaluation: real generalization is hard). With the current scale, half
the bars render as 0% wide and look like there's no data flowing.
**Fix request.** Either:
(a) Use the full 0–1 range so a 0.35-F1 bar is still visibly 35% filled:
```js
const visiblePct = Math.max(0, Math.min(1, acc)) * 100;
```
(b) Or add the numeric F1 next to the empty-looking bars (we already
publish it in `accuracy`). The right-hand `.model-acc` element
already renders `acc.toFixed(3)`, so this may already be readable;
verify that it is still shown when the fill is 0%.
We strongly prefer (a). Hiding 0.30-F1 models behind a 0% bar tells the
user "no data" when the truth is "the model is honestly not great
under cross-host generalization." That's the headline finding.
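
To make the difference concrete, here is a small side-by-side of the two scales. The function names `compressedPct` and `fullRangePct` are ours, chosen for illustration; the formulas are the two expressions quoted above.

```js
// Current scale: maps [0.5, 1.0] onto 0–100% and clamps everything below 0.5 to 0%.
function compressedPct(acc) {
  return Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
}

// Option (a): maps [0.0, 1.0] onto 0–100%.
function fullRangePct(acc) {
  return Math.max(0, Math.min(1, acc)) * 100;
}

// Print both widths for a few F1 values spanning the range we publish.
for (const acc of [0.30, 0.35, 0.55, 0.90]) {
  console.log(
    `F1 ${acc.toFixed(2)}: current ${compressedPct(acc).toFixed(0)}%, ` +
    `full-range ${fullRangePct(acc).toFixed(0)}%`);
}
```

Under the current scale a 0.35-F1 model gets a 0% bar; under option (a) it gets roughly 35%, which is what a viewer would expect from the number next to it.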
---
## Scene 12 — accuracy vs inference cost scatter
**Problem A: y-axis range.** y is clamped to `[0.7, 1.0]` (or similar
high range). Every model with F1 < 0.7 stacks on the bottom edge.
**Fix.** Open the y-axis to `[0.0, 1.0]` (or auto-fit to the published
range with a small margin). The chart's whole point is "model honesty
under cross-device shift"; letting bad models show as bad is the
right answer.
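
If auto-fit is the easier route, a sketch along these lines would do. `fitYDomain` and the 0.05 default margin are illustrative assumptions, not existing dashboard code; the result is clamped to `[0, 1]` because F1 cannot leave that range.

```js
// Hypothetical helper: compute a y-axis domain from the published F1 values,
// padded by `margin` on each side and clamped to the valid F1 range [0, 1].
function fitYDomain(values, margin = 0.05) {
  const finite = values.filter(Number.isFinite);
  // No usable data yet: fall back to the full range.
  if (finite.length === 0) return [0, 1];
  const lo = Math.max(0, Math.min(...finite) - margin);
  const hi = Math.min(1, Math.max(...finite) + margin);
  // Guard against a degenerate (zero-height) domain.
  return lo < hi ? [lo, hi] : [0, 1];
}

console.log(fitYDomain([0.30, 0.42, 0.55]));
```

A fixed `[0.0, 1.0]` axis is simpler and keeps the scale comparable across page loads; auto-fit spreads the points out more when they cluster.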
**Problem B: overlapping labels.** Multiple points at the same
y-coordinate (especially when stacked at the floor) draw their model
name labels on top of each other. We've already shortened the
displayed names producer-side (`gbt-O`, `mlp-R`, `knns-O`, `trf-R`,
etc., max 6 chars). That helps but doesn't fully solve it when 5+
points cluster.
**Fix request, pick whichever is easiest:**
1. Skip label rendering when point density is high (only label points
that are local extrema, e.g. best F1, lowest latency, or
non-Pareto-dominated points).
2. Offset overlapping labels with a force layout (`d3-force` style) or
even just a fixed alternating up/down/left/right pattern.
3. Show labels only on hover, with a small dot-only render at rest.
Option (3) is the cleanest visually and matches how most real "model
zoo" scatters render in papers.
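
For option (1), the "non-Pareto-dominated" filter is short enough to sketch. The field names `f1` and `latencyMs` and the sample numbers are made up for illustration (only the short model names come from our producer); a point is dropped when some other point is at least as good on both axes and strictly better on one.

```js
// Hypothetical helper: keep only points that no other point dominates.
// Higher f1 is better, lower latencyMs is better.
function paretoFront(points) {
  return points.filter(p => !points.some(q =>
    q !== p &&
    q.f1 >= p.f1 && q.latencyMs <= p.latencyMs &&
    (q.f1 > p.f1 || q.latencyMs < p.latencyMs)));
}

// Made-up sample values, purely to exercise the filter.
const samplePoints = [
  { name: 'gbt-O',  f1: 0.55, latencyMs: 2 },
  { name: 'mlp-R',  f1: 0.40, latencyMs: 1 },
  { name: 'knns-O', f1: 0.35, latencyMs: 5 },
  { name: 'trf-R',  f1: 0.50, latencyMs: 40 },
];
console.log(paretoFront(samplePoints).map(p => p.name));
```

Only the surviving points would get a text label; the rest render as bare dots. The check is quadratic in the number of points, which is not a concern for a model zoo of this size.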
---
## Verification after dashboard JS lands
Producer side keeps publishing on these channels (already running on
the Pi + Lambda):
- `prediction` (scene 7), once the Lambda producer is re-pointed at
scene 7 events; see the side note below
- `model_metric` + `model_perf` (scenes 8, 12) every 30 s from
`multi_model_metrics.py` on the Pi
- `live_detection` (scene-live) continuously from Lambda
Open the dashboard, watch each scene. Empty-state placeholders should
disappear within ~30 s of page load.
---
## Side note for scene 7 — currently no `prediction` events flow
The Lambda producer (`live_detection_loop_v2.py`) currently emits
`live_detection` events for the scene-live swim lanes. If you want
scene 7 lit up with the same data, we can mirror per-window output to
the `prediction` event type as well; say the word and we'll add a
second emit. Without the lazy cell-build above, that second emit changes
nothing on the dashboard, so we'll hold off until the JS lands.