
Max 1fabd4a246	2026-05-08 01:19:00 -05:00

training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers

The model layer of the project, built honestly:

- tools/dataset_validate.py: full-sweep validator over the receiver store (sha256, schema, monotonic labels, telemetry-row gate). On the current corpus: 64,798 accepted + 8,154 degraded + 3,701 rejected + 7 errored across 76,660 shipped episodes. data/processed/validation_v1.parquet is committed as the per-episode acceptance index.
- training/_features.py: channel registry (46 channels across proc/guest/qmp/netflow), summary-stat windowing AND channel×time tensor extraction at 10s/5s windowing. Time alignment uses t_wall_ns (Unix ns), a tested fix for a real netflow-vs-host clock-base inconsistency that was silently dropping every netflow channel.
- training/_split.py: three held-out recipes (host / sample / time) with profile-stratification assertions. held_out_host carries untested_profiles for cases like scan-and-dial absent from the test host (5 of 6 profiles tested cross-device, never silently averaged).
- training/models/: 6 architectures behind a common BaseModel interface: gbt (XGBoost), mlp, cnn, gru, lstm, transformer. Each trained twice (realistic / oracle) per the deployment threat model. Schema-hashed checkpoints refuse to load if _features.py changed since training (silent-input-drift protection, tested).
- training/trainer/: unified training loop: class-weighted CE, LR warmup + cosine, gradient clipping, mixed precision when CUDA, early stopping on val macro F1, best-on-val checkpoint. Same loop runs MLP/CNN/GRU/LSTM/Transformer; GBT uses XGBoost early_stopping_rounds on val mlogloss.
- training/eval_/: bootstrap 95% CIs on macro F1, per-class F1, per-profile and per-host breakdown, paired-bootstrap significance for model-vs-model gap. Confusion matrix uses union of seen labels.
- training/dashboard/producers/: replay/metrics/perf/profiles emitting the six event types the dashboard's awaiting scenes consume; on-demand tensor extraction so the Pi can run live inference without 65 GB of shards.
- 17 unit tests (split coverage, features round-trip, schema mismatch, determinism, time-base alignment regression).

End-to-end smoke-trained all six on a 567-episode subset; held-out test macro F1 reported with paired-bootstrap significance.

The methodology now reports honest cross-device generalization, not in-distribution validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
producers	training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers	2026-05-08 01:19:00 -05:00
static	training/dashboard: click a db row → render the episode envelope	2026-05-08 01:16:54 -05:00
__init__.py	training/dashboard: live deck at dashboard.wg, fed by receiver	2026-05-07 21:26:07 -05:00
__main__.py	training/dashboard: live deck at dashboard.wg, fed by receiver	2026-05-07 21:26:07 -05:00
app.py	training/dashboard: click a db row → render the episode envelope	2026-05-08 01:16:54 -05:00
client.py	training/dashboard: PRODUCERS.md + client.py for the model session	2026-05-07 21:34:54 -05:00
dashboard.caddy	training/dashboard: live deck at dashboard.wg, fed by receiver	2026-05-07 21:26:07 -05:00
feeder.py	training/dashboard: live deck at dashboard.wg, fed by receiver	2026-05-07 21:26:07 -05:00
PRODUCERS.md	training/dashboard: PRODUCERS.md + client.py for the model session	2026-05-07 21:34:54 -05:00
README.md	training/dashboard: PRODUCERS.md + client.py for the model session	2026-05-07 21:34:54 -05:00

README.md

training/dashboard/

Live web display served at https://dashboard.wg. A Starlette app on 127.0.0.1:8447 behind Caddy; messages from Python are pushed to connected browsers over a WebSocket.

This is intentionally a blank slate — the default page just appends every received JSON message to a scrolling log. Build the real widgets on top of window.dashboard.onMessage.

Run locally

uv run python -m training.dashboard
# → open http://127.0.0.1:8447

Push live data from Python

Same process (e.g. a notebook driving the page):

from training.dashboard.app import broadcaster

# Top-level await works in a notebook; elsewhere, call this from inside a coroutine.
await broadcaster.publish({"type": "metric", "name": "loss", "value": 0.42})

Different process (orchestrator, receiver, ad-hoc shell):

curl -s http://127.0.0.1:8447/publish \
     -H 'content-type: application/json' \
     -d '{"type":"hello","msg":"from cron"}'

The /publish endpoint is loopback-only (403 otherwise) and is not reverse-proxied by Caddy, so it cannot be hit from the WG mesh.
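To make the loopback-only rule concrete, here is a minimal sketch of how such a gate could be expressed with the standard library. It is not the check used in app.py; the function names is_loopback_client and publish_status are hypothetical, and the only behavior taken from this README is "loopback callers are accepted, everyone else gets 403".

```python
import ipaddress


def is_loopback_client(host):
    """Return True only for loopback IP literals such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (hostname, empty string, ...): treat as remote.
        return False


def publish_status(client_host):
    """Hypothetical gate: 200 for loopback callers, 403 for everyone else."""
    return 200 if is_loopback_client(client_host) else 403
```

Under these assumptions, publish_status("127.0.0.1") and publish_status("::1") are accepted, while a mesh address such as 10.0.0.7 is refused.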

Producing events from your own code

If you're writing code that should drive a dashboard widget (model inference, training-loop metrics, profile envelopes), see PRODUCERS.md — it documents every event type the widgets subscribe to, the loopback-only /publish contract, the reconnect gotcha, and the systemd hardening that constrains producers running on the Pi. There's a stdlib-only Python helper at client.py:

from training.dashboard.client import Publisher
Publisher().publish({"type": "model_metric", "model": "lstm", "accuracy": 0.928})
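For readers who want to see what a dependency-free publish call involves, the sketch below posts one JSON event to the /publish endpoint using only urllib. It is an illustration, not the contents of client.py: the names build_request and try_publish, the 1-second timeout, and the choice to return False instead of raising are all assumptions. Only the URL and the JSON content type come from this README.

```python
import json
import urllib.error
import urllib.request

# Endpoint named in this README; reachable from the local machine only.
PUBLISH_URL = "http://127.0.0.1:8447/publish"


def build_request(message, url=PUBLISH_URL):
    """Build the POST request without sending it."""
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def try_publish(message, url=PUBLISH_URL, timeout=1.0):
    """Send one event; return False instead of raising if the dashboard is unreachable."""
    try:
        with urllib.request.urlopen(build_request(message, url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Assumed policy: a producer should keep running when the dashboard is down.
        return False
```

Swallowing connection errors is a deliberate choice here so that a training loop or inference job is not interrupted by a dashboard restart; callers that need delivery guarantees would want to log or retry instead.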

Customizing the page

The default static/index.html exposes window.dashboard:

window.dashboard.onMessage = (msg) => {
  if (msg.type === 'metric') updateChart(msg.name, msg.value);
};
window.dashboard.send({type: 'request-snapshot'});  // browser → server

Override onMessage to dispatch to your own widgets. The blank-slate log renderer is just there so a fresh deploy is observably alive.

Deploying to the Pi

sudo cp etc/cis490-dashboard.service /etc/systemd/system/
sudo cp training/dashboard/dashboard.caddy /etc/caddy/Caddyfile.d/
sudo systemctl daemon-reload && sudo systemctl enable --now cis490-dashboard.service
sudo systemctl reload caddy