- multi_model_metrics: publish gbt / mlp / cnn / knn_semi / gru / lstm / bert (knn handled by knn streamer); read both *_train.json and *_eval.json with macro_f1.point fallback
- dashboard.css: add palette gradients for the four non-canonical names so the bars render with a fill colour
- dashboard.js: open the bar's visible scale to the full 0–1 range so honest-low cross-host F1s show as a bar instead of clamping to 0%
- ship lambda-live-detection-loop.py + dashboard request docs (scenes 7/8/12, sticky cache, lambda-inference-demo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dashboard request — scenes 7, 8, 12 visibility fixes
Audience: dashboard session (owns training/dashboard/).
Producer side (this session):
- training/producers/multi_model_metrics.py: publishes ModelMetric and ModelPerf for gbt, mlp, cnn, knn_semi, gru, lstm, bert (every 5 s)
- training/producers/knn.py stream: publishes ModelMetric + ModelPerf for knn
- Lambda-side scripts/lambda-live-detection-loop.py: publishes LiveDetection and now also Prediction events per inference window
All confirmed delivering ({"delivered":N} from /publish).
Visibility issues are all in training/dashboard/static/dashboard.js.
The user has flagged this twice now: scene 7 (chunking) and scene 8 (model bars) are not showing real-data state in deck mode. The events exist; the widgets just don't render them. This is the blocker for the talk.
Scene 7 — chunking timeline (#chunk-row)
Problem. Cells are only built inside buildExample(), which is wired
to demo_start. The prediction handler can only update existing
cells:
on('prediction', m => {
  if (typeof m.window_idx !== 'number') return;
  const cells = rowEl.querySelectorAll('.chunk-cell');
  const cell = cells[m.window_idx];
  if (!cell) return; // ← always returns early if no demo has run
  ...
});
If a real prediction event arrives without demo_start having
fired first, cells.length === 0 and the event is silently dropped.
Why we can't just publish demo_start from this side. It has
destructive side effects on other scenes: scene 9 (KNN scatter)
loads synthetic data on demo_start, the scene-attack profile loads
synthetic curves on demo_start, etc. We tried this once and
clobbered the live KNN scatter.
Fix request. Lazy cell-build inside the prediction handler when
no cells exist yet:
on('prediction', m => {
  // window_idx must be a non-negative integer; anything else is dropped.
  if (!Number.isInteger(m.window_idx) || m.window_idx < 0) return;
  if (rowEl.children.length === 0 || rowEl.querySelector('.chunk-empty')) {
    // First real prediction: clear the empty-state placeholder.
    // Cells are then added lazily below, so the width grows with window_idx.
    rowEl.innerHTML = '';
    ruleEl.innerHTML = '';
    axisEl.innerHTML = '';
  }
  // Ensure cell at index exists; pad with empty cells up to window_idx.
  let cells = rowEl.querySelectorAll('.chunk-cell');
  while (cells.length <= m.window_idx) {
    const c = document.createElement('div');
    c.className = 'chunk-cell';
    c.textContent = '';
    rowEl.appendChild(c);
    ruleEl.appendChild(Object.assign(
      document.createElement('div'), { className: 'tick' }));
    // querySelectorAll returns a static NodeList, so cells.length is still
    // the new cell's index here; labels assume 10 s windows.
    const t = document.createElement('span');
    t.textContent = `${cells.length * 10}s`;
    axisEl.appendChild(t);
    cells = rowEl.querySelectorAll('.chunk-cell');
  }
  const cell = cells[m.window_idx];
  const phase = m.predicted || m.actual;
  if (!phase) return;
  cell.className = `chunk-cell ${phase}`;
  // /_/g replaces every underscore, not only the first one.
  cell.textContent = phase.replace(/_/g, ' ');
});
This keeps demo_start/demo_stop working and additionally lights up
the row from real prediction events.
If the Lambda producer re-runs episodes from window 0, you may also
want a reset on prediction events with window_idx === 0 (clear all
cells, rebuild fresh). We can publish a prediction_reset event too
if you'd prefer an explicit signal — let us know.
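If you go the implicit-reset route, the rule can be stated and tested without the DOM. Below is a minimal sketch, assuming episodes always restart at window_idx 0. ChunkRowModel is a hypothetical stand-in for #chunk-row (one phase string per cell), not existing dashboard code.

```javascript
// Hypothetical DOM-free model of #chunk-row: one entry per cell, holding the
// phase string for that window ('' = no prediction yet).
class ChunkRowModel {
  constructor() {
    this.cells = [];
  }

  // Apply one prediction event; mirrors the lazy-build handler plus the
  // optional reset-on-window-0 rule.
  apply(m) {
    const idx = m.window_idx;
    if (!Number.isInteger(idx) || idx < 0) return; // drop malformed events

    // Assumption: window_idx 0 arriving after cells exist means the
    // producer restarted the episode, so start again from an empty row.
    if (idx === 0 && this.cells.length > 0) this.reset();

    // Pad with empty cells up to and including idx.
    while (this.cells.length <= idx) this.cells.push('');

    const phase = m.predicted || m.actual;
    if (phase) this.cells[idx] = phase;
  }

  // What a handler for an explicit prediction_reset event would call.
  reset() {
    this.cells = [];
  }
}
```

In dashboard.js the reset branch would clear rowEl, ruleEl and axisEl (as the placeholder branch already does) before padding. One caveat: a duplicated or re-sent window-0 event also trips this heuristic and wipes the row, which is an argument for the explicit prediction_reset signal.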
Scene 8 — model accuracy bars (.model-row)
Problem. The bar fill formula maps F1 0.5–1.0 onto the full bar width, so any F1 ≤ 0.5 renders as a 0%-wide bar:
const visiblePct = Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
Our trained models on the cross-device test split honestly land in the 0.30–0.55 range (that is the point of held-out-by-host evaluation: real generalization is hard). With the current scale, at least half the bars render 0% wide, and even a 0.55-F1 model gets only a 10% bar, so it looks like no data is flowing.
Fix request. Either:
(a) Use the full 0–1 range so a 0.35-F1 bar is still visibly 35% filled:
const visiblePct = Math.max(0, Math.min(1, acc)) * 100;
(b) Or show the numeric F1 next to the empty-looking bars (we already
publish it in the accuracy field). The right-hand .model-acc element
already renders acc.toFixed(3), so this may already be readable;
verify that it is still shown when the fill is 0%.
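For the 0.30–0.55 band quoted above, the difference between the current formula and option (a) is easy to tabulate. The two function names are ours, for illustration only; the formula bodies are the ones from dashboard.js and from option (a).

```javascript
// Current dashboard.js scale: F1 0.5..1.0 maps to 0..100% width, so every
// F1 at or below 0.5 clamps to an empty bar.
function visiblePctCurrent(acc) {
  return Math.max(0, Math.min(1, (acc - 0.5) / 0.5)) * 100;
}

// Option (a): full 0..1 range, so the bar width is simply F1 as a percentage.
function visiblePctFull(acc) {
  return Math.max(0, Math.min(1, acc)) * 100;
}

// F1 0.30 -> 0% vs 30%, 0.35 -> 0% vs 35%, 0.50 -> 0% vs 50%, 0.55 -> 10% vs 55%.
for (const f1 of [0.30, 0.35, 0.50, 0.55]) {
  console.log(
    `F1 ${f1.toFixed(2)}: current ${visiblePctCurrent(f1).toFixed(0)}%, ` +
    `full ${visiblePctFull(f1).toFixed(0)}%`);
}
```

Both versions still clamp out-of-range inputs, so a stray value above 1 or below 0 cannot overflow the bar.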
We strongly prefer (a). Hiding 0.30-F1 models behind a 0% bar tells the user "no data" when the truth is "the model is honestly not great under cross-host generalization." That's the headline finding.
Scene 12 — accuracy vs inference cost scatter
Problem A: y-axis range. y is clamped to [0.7, 1.0] (or a similarly
high range). Every model with F1 < 0.7 stacks on the bottom edge.
Fix. Open the y-axis to [0.0, 1.0] (or auto-fit to the published
range with a small margin). The chart's whole point is "model honesty
under cross-device shift" — letting bad models show as bad is the
right answer.
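If you prefer the auto-fit variant, a small helper keeps the margin logic out of the render code. This is a sketch under assumptions: fitYDomain and its margin/minPad defaults are hypothetical, and the wiring assumes scene 12 uses a d3 linear scale for y (we haven't checked the chart code), in which case it would be passed to y.domain(...) with the published F1 values (the accuracy field).

```javascript
// Hypothetical helper: derive the y-domain from the published F1 values
// instead of a hard-coded [0.7, 1.0]. `margin` pads each side by a fraction
// of the data span; `minPad` guarantees room even for a single point.
function fitYDomain(values, { margin = 0.05, minPad = 0.02, clamp = [0, 1] } = {}) {
  const finite = values.filter(Number.isFinite);
  if (finite.length === 0) return clamp.slice(); // no data yet: show full range
  const lo = Math.min(...finite);
  const hi = Math.max(...finite);
  const pad = Math.max((hi - lo) * margin, minPad);
  // Never extend past the meaningful F1 range.
  return [Math.max(clamp[0], lo - pad), Math.min(clamp[1], hi + pad)];
}

// e.g. y.domain(fitYDomain(points.map(p => p.accuracy)));  // d3 wiring, assumed
```

Calling it with no data falls back to the full [0, 1] range, which is also the simpler option if you'd rather not auto-fit at all.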
Problem B: overlapping labels. Multiple points at the same
y-coordinate (especially when stacked at the floor) draw their model
name labels on top of each other. We've already shortened the
displayed names producer-side (gbt-O, mlp-R, knns-O, trf-R,
etc., max 6 chars). That helps but doesn't fully solve it when 5+
points cluster.
Fix request, pick whichever is easiest:
(1) Skip label rendering when point density is high (only label points that are local extrema, e.g. best F1, lowest latency, or non-Pareto-dominated points).
(2) Offset overlapping labels with a force layout (d3-force style) or even just a fixed alternating up/down/left/right pattern.
(3) Show labels only on hover, with a small dot-only render at rest.
Option (3) is the cleanest visually and matches how most real "model zoo" scatters render in papers.
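For option (1), "non-Pareto-dominated" has a precise, cheap test: a point keeps its label unless another point is at least as accurate and at least as cheap, and strictly better on one of the two. A minimal sketch, assuming each point carries hypothetical f1 and cost fields (higher F1 is better, lower inference cost is better); the values below are illustrative, not real results.

```javascript
// Returns the points not dominated on (higher f1, lower cost).
// q dominates p when q is no worse on both axes and strictly better on one.
function paretoFront(points) {
  return points.filter(p => !points.some(q =>
    q !== p &&
    q.f1 >= p.f1 &&
    q.cost <= p.cost &&
    (q.f1 > p.f1 || q.cost < p.cost)));
}

// Illustrative values only.
const pts = [
  { name: 'gbt-O', f1: 0.52, cost: 1 },
  { name: 'mlp-R', f1: 0.40, cost: 2 },   // dominated by gbt-O
  { name: 'trf-R', f1: 0.55, cost: 20 },  // best F1
  { name: 'knns-O', f1: 0.30, cost: 0.5 }, // cheapest
];
console.log(paretoFront(pts).map(p => p.name)); // [ 'gbt-O', 'trf-R', 'knns-O' ]
```

Every point still gets a dot; only the front gets text. The quadratic scan is fine for a dozen-model zoo. Exact duplicates (same F1 and cost) are both kept and would still overlap, and the remaining labels can still appear on hover, which combines this with option (3).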
Verification after dashboard JS lands
Producer side keeps publishing on these channels (already running on the Pi + Lambda):
- prediction (scene 7): once the Lambda producer is re-pointed at scene 7 events; see the side note below
- model_metric + model_perf (scenes 8, 12): every 30 s from multi_model_metrics.py on the Pi
- live_detection (scene-live): continuously from Lambda
Open the dashboard, watch each scene. Empty-state placeholders should disappear within ~30 s of page load.
Side note for scene 7 — currently no prediction events flow
The Lambda producer (live_detection_loop_v2.py) currently emits
live_detection events for the scene-live swim lanes. If you want
scene 7 lit up with the same data, we can mirror per-window output to
the prediction event type as well; say the word and we'll add a
second emit. Doing that without the lazy cell build above accomplishes
nothing on the dashboard, so we'll hold off until the JS lands.