CIS490

History

Max 4b2863ea99 producers/multi_model_metrics + scripts/rsync-from-lambda Pi-safe replacement for the original metrics.py + perf.py producers which load every checkpoint into memory and score the test set on each cycle. That pattern crashed the Pi during this project (300 MB knn pickles × 6 variants + 226 MB test set in memory at peak ≈ OOM). The new producer: - reads reports/eval/<model>_<mode>_train.json files (already contain the test_macro_f1 each trainer wrote) - publishes one model_metric event per file - publishes one model_perf event per file with a hardcoded per-architecture latency estimate (gbt 250 µs, knn 3500, mlp 50, cnn 500, gru 1500, lstm 2000, transformer 800, transformer_ssl 1000). These are family-level order-of-magnitude figures; proper benchmarks need to run on the deployment hardware (which is the A100, not the Pi). - re-publishes on a tick (default 30 s) for refresh-resilience. - NO model loading. Pi-safe. scripts/rsync-from-lambda.sh — pulls Lambda's artifacts/ + reports/eval/ to the Pi every 30 s. As Lambda finishes each model and writes its train.json, the Pi sees the new file within a cycle and the publisher broadcasts the metric on its next tick. Live multi-model dashboard during training, with no Pi-side inference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>		2026-05-08 16:04:15 -05:00
..
auto-update.sh	lab-host: cis490-autoupdate.timer for self-healing on push	2026-05-01 16:59:31 -05:00
build-lambda-bundle.sh	training: lambda-cloud one-shot training integration	2026-05-08 12:32:04 -05:00
fetch-alpine-baseline.sh	Close out the deployment-readiness gaps	2026-04-30 00:31:55 -05:00
fetch-lab-host-cert.sh	lab-host: cis490-cert-fetch.timer for automatic mTLS bootstrap retry	2026-05-02 13:30:16 -05:00
fetch-metasploitable2.sh	Tier 3 + Tier 4 auto-deploy: zero operator interaction	2026-04-30 23:12:08 -05:00
install-lab-host.sh	PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml	2026-05-04 01:25:01 -05:00
install-msfrpcd.sh	Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)	2026-05-02 12:26:19 -06:00
install-receiver.sh	bootstrap: auto-issue mTLS leaves to enrolled lab hosts (closes #9 , refs #3 )	2026-04-30 01:30:29 -05:00
install-tier-3-4.sh	Tier-3: fix QEMU boot, catalog admission, verify module	2026-05-05 16:41:41 -06:00
install-training-worker-windows.ps1	training/fleet: distributed multi-host trainer with capability gating	2026-05-08 01:20:20 -05:00
install-training-worker.sh	training/fleet: distributed multi-host trainer with capability gating	2026-05-08 01:20:20 -05:00
issue-cis490-client-cert-wrapper.sh	bootstrap: auto-issue mTLS leaves to enrolled lab hosts (closes #9 , refs #3 )	2026-04-30 01:30:29 -05:00
lambda-bootstrap.sh	scripts/lambda-bootstrap.sh: also fix eval invocation paths	2026-05-08 15:25:40 -05:00
rsync-from-lambda.sh	producers/multi_model_metrics + scripts/rsync-from-lambda	2026-05-08 16:04:15 -05:00
run-on-lambda.sh	training: lambda-cloud one-shot training integration	2026-05-08 12:32:04 -05:00
sync-training-data.sh	training: validator, feature/tensor extractors, 6 supervised models, schema-hashed checkpoints, eval suite, dashboard producers	2026-05-08 01:19:00 -05:00
train-pi-cpu-models.sh	scripts/train-pi-cpu-models.sh — sequential Pi-side trainer chain	2026-05-08 14:12:34 -05:00