- multi_model_metrics: publish gbt / mlp / cnn / knn_semi / gru / lstm / bert (knn handled by knn streamer); read both `*_train.json` and `*_eval.json` with macro_f1.point fallback
- dashboard.css: add palette gradients for the four non-canonical names so the bars render with a fill colour
- dashboard.js: open the bar's visible scale to the full 0–1 range so honest-low cross-host F1s show as a bar instead of clamping to 0%
- ship lambda-live-detection-loop.py + dashboard request docs (scenes 7/8/12, sticky cache, lambda-inference-demo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Dashboard request: scenes 7, 8, 12 visibility fixes

**Audience:** dashboard session (owns `training/dashboard/`).

**Producer side (this session):**
* `training/producers/multi_model_metrics.py`: publishes
  `ModelMetric` and `ModelPerf` for **gbt, mlp, cnn, knn_semi, gru,
  lstm, bert** (every 5 s)
* `training/producers/knn.py stream`: publishes `ModelMetric` +
  `ModelPerf` for **knn**
* Lambda-side `scripts/lambda-live-detection-loop.py`: publishes
  `LiveDetection` **and now also `Prediction`** events per inference
  window

All confirmed delivering (`{"delivered":N}` from `/publish`).
Visibility issues are all in `training/dashboard/static/dashboard.js`.

The user has flagged this twice now: scene 7 (chunking) and scene 8
(model bars) are not showing real-data state in deck mode. The events
exist; the widgets just don't render them. **This is the blocker
for the talk.**

---

## Scene 7: chunking timeline (`#chunk-row`)

**Problem.** Cells are only built inside `buildExample()`, which is wired
to `demo_start`. The `prediction` handler can only update existing
cells:

```js
on('prediction', m => {
  if (typeof m.window_idx !== 'number') return;
  const cells = rowEl.querySelectorAll('.chunk-cell');
  const cell = cells[m.window_idx];
  if (!cell) return;   // ← always returns here if no demo has built the cells
  ...
});
```

If a real `prediction` event arrives without `demo_start` having
fired first, `cells.length === 0` and the event is silently dropped.

**Why we can't just publish `demo_start` from this side.** It has
destructive side-effects on other scenes: scene-9 (KNN scatter)
loads synthetic data on `demo_start`, scene-attack profile loads
synthetic curves on `demo_start`, etc. We tried this once and
clobbered the live KNN scatter.

**Fix request.** Lazy cell-build inside the `prediction` handler when
no cells exist yet:

```js
on('prediction', m => {
  // Ignore malformed events: a negative or fractional index has no cell.
  if (!Number.isInteger(m.window_idx) || m.window_idx < 0) return;
  if (rowEl.children.length === 0 || rowEl.querySelector('.chunk-empty')) {
    // First real prediction: clear the empty-state placeholder. Cells are
    // appended lazily below, so the row grows as windows arrive.
    rowEl.innerHTML = '';
    ruleEl.innerHTML = '';
    axisEl.innerHTML = '';
  }
  // Ensure a cell exists at window_idx; pad with empty cells up to it.
  let cells = rowEl.querySelectorAll('.chunk-cell');
  while (cells.length <= m.window_idx) {
    const c = document.createElement('div');
    c.className = 'chunk-cell';
    c.textContent = '';
    rowEl.appendChild(c);
    ruleEl.appendChild(Object.assign(
      document.createElement('div'), { className: 'tick' }));
    const t = document.createElement('span');
    // cells is still the pre-append list, so its length is the new cell's index.
    t.textContent = `${cells.length * 10}s`;
    axisEl.appendChild(t);
    // querySelectorAll returns a static NodeList, so re-query after appending.
    cells = rowEl.querySelectorAll('.chunk-cell');
  }
  const cell = cells[m.window_idx];
  const phase = m.predicted || m.actual;
  if (!phase) return;
  cell.className = `chunk-cell ${phase}`;
  // Replace every underscore, not only the first one.
  cell.textContent = phase.replace(/_/g, ' ');
});
```

This keeps `demo_start`/`demo_stop` working and additionally lights up
the row from real `prediction` events.

If the Lambda producer re-runs episodes from window 0, you may also
want a reset on `prediction` events with `window_idx === 0` (clear all
cells, rebuild fresh). We can publish a `prediction_reset` event too
if you'd prefer an explicit signal; let us know.

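
For the `window_idx === 0` reset mentioned above, one possible shape is to keep the row state as a plain array and rebuild the DOM from it. This is only a sketch, not the dashboard's actual code: `applyPrediction` and the phase strings in the usage lines are hypothetical, and it assumes that a `prediction` for window 0 always marks the start of a new episode.

```js
// Hypothetical helper (not in dashboard.js): returns the next array of cell
// phases, where null means "empty cell". The input array is never mutated.
function applyPrediction(cells, m) {
  if (!Number.isInteger(m.window_idx) || m.window_idx < 0) return cells;
  const phase = m.predicted || m.actual;
  if (!phase) return cells;
  // Assumption: window 0 always starts a new episode, so drop the old cells.
  const next = m.window_idx === 0 ? [] : cells.slice();
  while (next.length <= m.window_idx) next.push(null); // pad up to window_idx
  next[m.window_idx] = phase;
  return next;
}

// Usage with made-up phase names:
let row = [];
row = applyPrediction(row, { window_idx: 2, predicted: 'recon' });
console.log(row); // [ null, null, 'recon' ]
row = applyPrediction(row, { window_idx: 0, predicted: 'idle' });
console.log(row); // [ 'idle' ]
```

Keeping the reset decision in a pure function makes it easy to swap the trigger for an explicit `prediction_reset` event later without touching the rendering code.
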

---

## Scene 8: model accuracy bars (`.model-row`)

**Problem.** The bar fill formula compresses to nothing for any
F1 ≤ 0.5:

```js
const visiblePct = Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
```

Our trained models on the cross-device test split honestly land in
the 0.30–0.55 range (this is the **point** of held-out-by-host
evaluation: real generalization is hard). With the current scale, at
least half the bars render as 0% wide and look like there's no data
flowing.

**Fix request.** Either:

(a) Use the full 0–1 range so a 0.35-F1 bar is still visibly 35% filled:

```js
const visiblePct = Math.max(0, Math.min(1, acc)) * 100;
```

(b) Or add the numeric F1 next to the empty-looking bars (we already
publish it in `accuracy`). The right-hand `.model-acc` element
already renders `acc.toFixed(3)`, so this may already be readable;
please verify it is still shown when the fill is 0%.

We strongly prefer (a). Hiding 0.30-F1 models behind a 0% bar tells the
user "no data" when the truth is "the model is honestly not great
under cross-host generalization." That's the headline finding.

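
To make the difference between the two scales concrete, the sketch below evaluates both formulas from this section on a few F1 values in the range we see. The function names `barPctCurrent` and `barPctFull` are ours, for illustration only; just the two expressions come from this request.

```js
// Current scale: maps F1 in [0.5, 1] onto 0..100% and clamps anything lower to 0.
function barPctCurrent(acc) {
  return Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
}

// Requested scale (option a): F1 in [0, 1] maps directly onto 0..100%.
function barPctFull(acc) {
  return Math.max(0, Math.min(1, acc)) * 100;
}

// Print F1, current fill and requested fill side by side.
for (const acc of [0.30, 0.35, 0.55]) {
  console.log(
    acc.toFixed(2),
    barPctCurrent(acc).toFixed(0) + '%',
    barPctFull(acc).toFixed(0) + '%');
}
```

Under the current scale 0.30 and 0.35 both come out as a 0% fill and 0.55 as roughly 10%, while the full-range scale gives 30%, 35% and 55%.
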

---

## Scene 12: accuracy vs inference cost scatter

**Problem A: y-axis range.** y is clamped to `[0.7, 1.0]` (or a similar
high range). Every model with F1 < 0.7 stacks on the bottom edge.

**Fix.** Open the y-axis to `[0.0, 1.0]` (or auto-fit to the published
range with a small margin). The chart's whole point is "model honesty
under cross-device shift"; letting bad models show as bad is the
right answer.

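
If the auto-fit variant is easier to wire in, it could look roughly like the sketch below. `fitYDomain` and `yToPixel` are hypothetical names, the 0.05 default margin is an arbitrary choice, and we are assuming the scatter maps values to pixels linearly; adapt it to whatever the chart code actually does.

```js
// Hypothetical helper: y-domain that covers the published F1 values plus a
// margin, never leaving [0, 1].
function fitYDomain(values, margin = 0.05) {
  const finite = values.filter(Number.isFinite);
  if (finite.length === 0) return [0, 1]; // nothing published yet: full range
  const lo = Math.max(0, Math.min(...finite) - margin);
  const hi = Math.min(1, Math.max(...finite) + margin);
  return [lo, hi];
}

// Linear map from a value to a pixel row for a plot `height` px tall
// (row 0 is the top edge). Assumes hi > lo, which holds whenever margin > 0
// and the values are not all pinned at the same clamp edge.
function yToPixel(v, [lo, hi], height) {
  return height - ((v - lo) / (hi - lo)) * height;
}

const domain = fitYDomain([0.31, 0.42, 0.55]);
console.log(domain, yToPixel(0.42, domain, 200));
```

With a fixed `[0.0, 1.0]` axis the first function is unnecessary; the auto-fit only matters if the points end up bunched into a narrow band.
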
**Problem B: overlapping labels.** Multiple points at the same
y-coordinate (especially when stacked at the floor) draw their model
name labels on top of each other. We've already shortened the
displayed names producer-side (`gbt-O`, `mlp-R`, `knns-O`, `trf-R`,
etc., max 6 chars). That helps but doesn't fully solve it when 5+
points cluster.

**Fix request, pick whichever is easiest:**

1. Skip label rendering when point density is high (only label points
   that are local extrema, e.g. best F1, lowest latency, or
   non-Pareto-dominated points).
2. Offset overlapping labels with a force layout (`d3-force` style) or
   even just a fixed alternating up/down/left/right pattern.
3. Show labels only on hover, with a small dot-only render at rest.

Option (3) is the cleanest visually and matches how most real "model
zoo" scatters render in papers.

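
If option 1 turns out to be the easiest, the "non-Pareto-dominated" filter can be sketched as below. The field names `accuracy` and `latencyMs` and the sample values are placeholders for illustration, not the real event schema or real measurements; higher accuracy and lower latency are treated as better.

```js
// Hypothetical helper: keep only points that no other point beats on both
// axes. q dominates p when q is at least as good on both axes and strictly
// better on at least one.
function paretoFront(points) {
  return points.filter(p => !points.some(q =>
    q !== p &&
    q.accuracy >= p.accuracy &&
    q.latencyMs <= p.latencyMs &&
    (q.accuracy > p.accuracy || q.latencyMs < p.latencyMs)));
}

// Made-up points, only to show the call shape.
const sample = [
  { name: 'a', accuracy: 0.55, latencyMs: 40 },
  { name: 'b', accuracy: 0.40, latencyMs: 5 },
  { name: 'c', accuracy: 0.35, latencyMs: 20 }, // dominated by b
  { name: 'd', accuracy: 0.50, latencyMs: 60 }, // dominated by a
];
console.log(paretoFront(sample).map(p => p.name)); // [ 'a', 'b' ]
```

The pairwise check is quadratic, which should be fine for the handful of models on this chart. Only the returned points would get a label; the rest render as bare dots.
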

---

## Verification after dashboard JS lands

Producer side keeps publishing on these channels (already running on
the Pi + Lambda):

- `prediction` (scene 7): once the Lambda producer is re-pointed at
  scene 7 events, see the side note below
- `model_metric` + `model_perf` (scenes 8, 12): every 30 s from
  `multi_model_metrics.py` on the Pi
- `live_detection` (scene-live): continuously from Lambda

Open the dashboard, watch each scene. Empty-state placeholders should
disappear within ~30 s of page load.

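
For a quicker check than eyeballing each scene, something like the tracker below could be pasted into the dashboard temporarily. It is a sketch under assumptions: `createArrivalTracker` is a hypothetical helper, and hooking it up relies on the same `on(type, handler)` registration used in the snippets above accepting these event type names.

```js
// Hypothetical helper: remembers how long after `startMs` each expected event
// type was first seen.
function createArrivalTracker(expectedTypes, startMs) {
  const firstSeen = new Map();
  return {
    // Call once per incoming event; only the first arrival per type is kept.
    record(type, nowMs) {
      if (expectedTypes.includes(type) && !firstSeen.has(type)) {
        firstSeen.set(type, nowMs - startMs);
      }
    },
    // Event types that have not arrived yet.
    missing() {
      return expectedTypes.filter(t => !firstSeen.has(t));
    },
    // Plain object of { type: msSinceStart } for the types seen so far.
    latencies() {
      return Object.fromEntries(firstSeen);
    },
  };
}

// Wiring sketch (assumes the dashboard's `on` helper is in scope):
//   const types = ['prediction', 'model_metric', 'model_perf', 'live_detection'];
//   const tracker = createArrivalTracker(types, Date.now());
//   types.forEach(t => on(t, () => tracker.record(t, Date.now())));
//   setTimeout(() => console.log(tracker.missing(), tracker.latencies()), 30000);
```

An empty `missing()` list after 30 s means every channel delivered at least once, so any scene that is still blank is a rendering problem rather than a producer problem.
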

---

## Side note for scene 7: currently no `prediction` events flow

The Lambda producer (`live_detection_loop_v2.py`) currently emits
`live_detection` events for the scene-live swim lanes. If you want
scene 7 lit up with the same data, we can mirror per-window output to
the `prediction` event type as well; say the word and we'll add a
second emit. Doing that without the lazy cell-build above accomplishes
nothing on the dashboard, so we'll hold off until the JS lands.