Max c2a71de4b2 scene 9 bars: paint full zoo + 0–1 visible scale

- multi_model_metrics: publish gbt / mlp / cnn / knn_semi /
  gru / lstm / bert (knn handled by knn streamer); read both
  *_train.json and *_eval.json with macro_f1.point fallback
- dashboard.css: add palette gradients for the four
  non-canonical names so the bars render with a fill colour
- dashboard.js: open the bar's visible scale to the full 0–1
  range so honest-low cross-host F1s show as a bar instead of
  clamping to 0%
- ship lambda-live-detection-loop.py + dashboard request docs
  (scenes 7/8/12, sticky cache, lambda-inference-demo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 17:18:00 -05:00

6.5 KiB


Dashboard request — scenes 7, 8, 12 visibility fixes

Audience: dashboard session (owns training/dashboard/). Producer side (this session):

- training/producers/multi_model_metrics.py: publishes ModelMetric and ModelPerf for gbt, mlp, cnn, knn_semi, gru, lstm, bert (every 5 s)
- training/producers/knn.py stream: publishes ModelMetric + ModelPerf for knn
- Lambda-side scripts/lambda-live-detection-loop.py: publishes LiveDetection and now also Prediction events per inference window

All three are confirmed delivering (/publish responds with {"delivered":N}). The visibility issues are all in training/dashboard/static/dashboard.js.

The user has flagged this twice now: scene 7 (chunking) and scene 9 (model bars) are not showing real-data state in deck mode. The events exist; the widgets just don't render them. This is the blocker for the talk.

Scene 7 — chunking timeline (`#chunk-row`)

Problem. Cells are only built inside buildExample(), which is wired to demo_start. The prediction handler can only update existing cells:

```javascript
on('prediction', m => {
  if (typeof m.window_idx !== 'number') return;
  const cells = rowEl.querySelectorAll('.chunk-cell');
  const cell = cells[m.window_idx];
  if (!cell) return;            // ← always falls through if no demo
  ...
});
```

If a real prediction event arrives without demo_start having fired first, cells.length === 0 and the event is silently dropped.

Why we can't just publish demo_start from this side: it has destructive side effects on other scenes. scene-9 (KNN scatter) loads synthetic data on demo_start, the scene-attack profile loads synthetic curves on demo_start, and so on. We tried this once and it clobbered the live KNN scatter.

Fix request. Build the cells lazily inside the prediction handler when none exist yet:

```javascript
on('prediction', m => {
  // Skip events without a usable window index (non-negative integer only).
  if (!Number.isInteger(m.window_idx) || m.window_idx < 0) return;
  if (rowEl.children.length === 0 || rowEl.querySelector('.chunk-empty')) {
    // No demo loaded: clear the empty-state placeholder.
    // Cells are then added one at a time by the loop below.
    rowEl.innerHTML = '';
    ruleEl.innerHTML = '';
    axisEl.innerHTML = '';
  }
  // Ensure the cell at window_idx exists; pad with empty cells up to it.
  let cells = rowEl.querySelectorAll('.chunk-cell');
  while (cells.length <= m.window_idx) {
    const c = document.createElement('div');
    c.className = 'chunk-cell';
    rowEl.appendChild(c);
    ruleEl.appendChild(Object.assign(
      document.createElement('div'), { className: 'tick' }));
    // Axis label for the new cell: its index × 10 s.
    const t = document.createElement('span');
    t.textContent = `${cells.length * 10}s`;
    axisEl.appendChild(t);
    cells = rowEl.querySelectorAll('.chunk-cell');
  }
  const cell = cells[m.window_idx];
  const phase = m.predicted || m.actual;
  if (!phase) return;
  cell.className = `chunk-cell ${phase}`;
  // Replace every underscore, not just the first one.
  cell.textContent = phase.replace(/_/g, ' ');
});
```

This keeps demo_start/demo_stop working and additionally lights up the row from real prediction events.

If the Lambda producer re-runs episodes from window 0, you may also want a reset on prediction events with window_idx === 0 (clear all cells, rebuild fresh). We can publish a prediction_reset event too if you'd prefer an explicit signal — let us know.
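
If you go the implicit route, the reset rule is easy to check outside the browser. The sketch below is a DOM-free model, assuming the row is represented as an array of phase strings (`null` for a window not seen yet). `applyPrediction` is a hypothetical helper, not existing dashboard code, and the phase names in the example are made up. It pads like the handler above and starts from an empty row whenever `window_idx === 0` arrives.

```javascript
// Hypothetical, DOM-free model of the #chunk-row state: one entry per
// window, holding a phase string, or null for a window not seen yet.
function applyPrediction(row, m) {
  const idx = m.window_idx;
  // Ignore events without a usable window index.
  if (!Number.isInteger(idx) || idx < 0) return row;
  // Optional reset: window 0 means the producer restarted an episode,
  // so rebuild from an empty row instead of overwriting in place.
  const next = idx === 0 ? [] : row.slice();
  // Pad with empty cells up to and including idx.
  while (next.length <= idx) next.push(null);
  const phase = m.predicted || m.actual;
  if (phase) next[idx] = phase;
  return next;
}

let row = [];
for (const m of [
  { window_idx: 0, predicted: 'recon' },
  { window_idx: 2, actual: 'lateral_movement' },
  { window_idx: 0, predicted: 'benign' },  // episode restart
]) {
  row = applyPrediction(row, m);
}
console.log(row); // [ 'benign' ]
```

An explicit prediction_reset event would replace the `idx === 0` check with its own handler, which avoids clearing the row if window 0 is ever re-sent mid-episode.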

Scene 8 — model accuracy bars (`.model-row`)

Problem. The bar fill formula compresses to nothing for any F1 < 0.5:

```javascript
const visiblePct = Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
```

On the cross-device test split, our trained models honestly land in the 0.30–0.55 F1 range (that is the point of held-out-by-host evaluation: real generalization is hard). With the current scale, at least half the bars render at 0% width and look as if no data is flowing.

Fix request. Either:

(a) Use the full 0–1 range so a 0.35-F1 bar is still visibly 35% filled:

```javascript
const visiblePct = Math.max(0, Math.min(1, acc)) * 100;
```

(b) Or show the numeric F1 next to the empty-looking bars (we already publish it in accuracy). The right-hand .model-acc element already renders acc.toFixed(3), so this may already be readable; please verify it is still shown when the fill is 0%.

We strongly prefer (a). Hiding 0.30-F1 models behind a 0% bar tells the user "no data" when the truth is "the model is honestly not great under cross-host generalization." That's the headline finding.
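
Running both formulas side by side makes the clamping concrete. `visiblePctOld` is the formula quoted above from dashboard.js and `visiblePctNew` is option (a); the function names are ours, and the F1 values are illustrative points in the quoted range, not published results.

```javascript
// Current dashboard.js scale: F1 0.5 maps to an empty bar, 1.0 to full.
function visiblePctOld(acc) {
  return Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
}

// Requested option (a): bar width is the F1 itself on a 0–1 scale.
function visiblePctNew(acc) {
  return Math.max(0, Math.min(1, acc)) * 100;
}

// Illustrative F1 values spanning the 0.30–0.55 range.
for (const acc of [0.30, 0.35, 0.55]) {
  console.log(acc.toFixed(2),
              visiblePctOld(acc).toFixed(1) + '%',
              visiblePctNew(acc).toFixed(1) + '%');
}
// 0.30 0.0% 30.0%
// 0.35 0.0% 35.0%
// 0.55 10.0% 55.0%
```

Under the old scale a 0.55-F1 model gets a 10% sliver and anything below 0.5 disappears; under (a) the bar length reads directly as the F1.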

Scene 12 — accuracy vs inference cost scatter

Problem A: y-axis range. The y-axis is clamped to [0.7, 1.0] (or a similarly high range), so every model with F1 < 0.7 stacks on the bottom edge.

Fix. Open the y-axis to [0.0, 1.0] (or auto-fit to the published range with a small margin). The chart's whole point is "model honesty under cross-device shift" — letting bad models show as bad is the right answer.
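
For the auto-fit variant, a minimal sketch of the domain computation, assuming the chart can take a plain `[min, max]` pair for its y scale. `yDomain` and the 0.05 default margin are hypothetical choices, not existing dashboard code; the result never leaves [0, 1] and falls back to the full range before any data arrives.

```javascript
// Hypothetical helper: y-axis domain for the scene 12 scatter.
// Fits the published F1 values with a small margin, clamped to [0, 1].
function yDomain(f1s, margin = 0.05) {
  const vals = f1s.filter(v => Number.isFinite(v));
  // No data yet: show the full 0–1 range.
  if (vals.length === 0) return [0, 1];
  const lo = Math.max(0, Math.min(...vals) - margin);
  const hi = Math.min(1, Math.max(...vals) + margin);
  return [lo, hi];
}

console.log(yDomain([0.31, 0.42, 0.55]));
console.log(yDomain([]));
```

A fixed [0, 1] domain is simpler and keeps positions comparable across reloads; auto-fit spreads a tight cluster of low-F1 models further apart, at the cost of the axis shifting as new metrics arrive.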

Problem B: overlapping labels. Multiple points at the same y-coordinate (especially when stacked at the floor) draw their model name labels on top of each other. We've already shortened the displayed names producer-side (gbt-O, mlp-R, knns-O, trf-R, etc., max 6 chars). That helps but doesn't fully solve it when 5+ points cluster.

Fix request (pick whichever is easiest):

1. Skip label rendering when point density is high, and only label points that stand out (e.g. best F1, lowest latency, or non-Pareto-dominated points).
2. Offset overlapping labels with a force layout (d3-force style), or even just a fixed alternating up/down/left/right pattern.
3. Show labels only on hover, with a small dot-only render at rest.

Option (3) is the cleanest visually and matches how most real "model zoo" scatters render in papers.
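
If option 1 is easier to slot in, the non-dominated filter is short. This is a sketch assuming each point carries an F1 and an inference latency (lower is cheaper); `paretoLabels`, the field names `f1` and `latencyMs`, and the latency numbers below are hypothetical, with model names borrowed from the producer's short-label style.

```javascript
// Hypothetical helper for option 1: keep labels only for points that no
// other point beats on both axes (F1 at least as high AND latency at
// least as low, strictly better on at least one of the two).
function paretoLabels(points) {
  return points.filter(p => !points.some(q =>
    q !== p &&
    q.f1 >= p.f1 && q.latencyMs <= p.latencyMs &&
    (q.f1 > p.f1 || q.latencyMs < p.latencyMs)));
}

// Illustrative points only, not published metrics.
const pts = [
  { name: 'gbt-O', f1: 0.52, latencyMs: 3 },
  { name: 'mlp-R', f1: 0.41, latencyMs: 5 },
  { name: 'trf-R', f1: 0.55, latencyMs: 40 },
  { name: 'knns-O', f1: 0.35, latencyMs: 12 },
];
console.log(paretoLabels(pts).map(p => p.name)); // [ 'gbt-O', 'trf-R' ]
```

The dominated points still render as dots; only their labels are dropped, which usually removes most of the pile-up at the floor because those points tend to be dominated.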

Verification after dashboard JS lands

The producer side keeps publishing on these channels (already running on the Pi and on Lambda):

- prediction (scene 7): once the Lambda producer is re-pointed at scene 7 events; see the side note below
- model_metric + model_perf (scenes 8, 12): every 30 s from multi_model_metrics.py on the Pi
- live_detection (scene-live): continuously from Lambda

Open the dashboard and watch each scene. Empty-state placeholders should disappear within ~30 s of page load.
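
To make the ~30 s check mechanical rather than visual, one option is to record the arrival time of the last event per channel (for example, inside each `on(...)` handler) and list the channels that have gone quiet. `silentChannels` and the `lastSeen` map are hypothetical, not existing dashboard code. Until the Lambda producer emits prediction events, expect prediction to show up as silent.

```javascript
// Hypothetical helper: which channels have not delivered an event recently.
// lastSeen maps channel name -> timestamp (ms) of the last event received.
const CHANNELS = ['prediction', 'model_metric', 'model_perf', 'live_detection'];

function silentChannels(lastSeen, nowMs, maxAgeMs = 30_000) {
  return CHANNELS.filter(ch =>
    !(ch in lastSeen) || nowMs - lastSeen[ch] > maxAgeMs);
}

const now = Date.now();
console.log(silentChannels({
  model_metric: now - 5_000,
  model_perf: now - 5_000,
  live_detection: now - 45_000,
}, now)); // [ 'prediction', 'live_detection' ]
```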

Side note for scene 7 — currently no `prediction` events flow

The Lambda producer (live_detection_loop_v2.py) currently emits live_detection events for the scene-live swim lanes. If you want scene 7 lit up with the same data, we can mirror the per-window output to the prediction event type as well; say the word and we'll add a second emit. Without the lazy cell build above, that would change nothing on the dashboard, so we'll hold off until the JS lands.
