Pi-safe replacement for the original metrics.py + perf.py producers
which load every checkpoint into memory and score the test set on each
cycle. That pattern crashed the Pi during this project (300 MB knn
pickles × 6 variants + 226 MB test set in memory at peak ≈ OOM).
The new producer:
- reads reports/eval/<model>_<mode>_train.json files (already
contain the test_macro_f1 each trainer wrote)
- publishes one model_metric event per file
- publishes one model_perf event per file with a hardcoded
per-architecture latency estimate (gbt 250 µs, knn 3500, mlp 50,
cnn 500, gru 1500, lstm 2000, transformer 800, transformer_ssl
1000). These are family-level order-of-magnitude figures; proper
benchmarks need to run on the deployment hardware (which is the
A100, not the Pi).
- re-publishes on a tick (default 30 s) for refresh-resilience.
- NO model loading. Pi-safe.
scripts/rsync-from-lambda.sh — pulls Lambda's artifacts/ + reports/eval/
to the Pi every 30 s. As Lambda finishes each model and writes its
train.json, the Pi sees the new file within a cycle and the publisher
broadcasts the metric on its next tick. Live multi-model dashboard
during training, with no Pi-side inference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>