Pi-safe replacement for the original metrics.py + perf.py producers,
which load every checkpoint into memory and score the test set on each
cycle. That pattern crashed the Pi during this project: 300 MB knn
pickles × 6 variants + 226 MB test set is about 2 GB in memory at peak,
enough to trigger an OOM.
The new producer:
- reads the reports/eval/<model>_<mode>_train.json files, which already
  contain the test_macro_f1 each trainer wrote
- publishes one model_metric event per file
- publishes one model_perf event per file with a hardcoded
  per-architecture latency estimate in µs (gbt 250, knn 3500, mlp 50,
  cnn 500, gru 1500, lstm 2000, transformer 800, transformer_ssl
  1000). These are family-level order-of-magnitude figures; proper
  benchmarks need to run on the deployment hardware (the A100, not
  the Pi).
- re-publishes on a tick (default 30 s) for refresh-resilience
- does NO model loading, which is what makes it Pi-safe
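
The producer loop described above can be sketched roughly as follows. This
is a minimal illustration, not the committed code: the event field names,
the publish callback, build_events, run, and the assumption that <mode>
contains no underscore are all hypothetical. Only the file-name pattern,
the test_macro_f1 key, the two event types, the latency figures, and the
30 s default tick come from this message.

```python
import json
import re
import time
from pathlib import Path

# Hardcoded per-architecture latency estimates in microseconds
# (family-level order-of-magnitude figures, not benchmarks).
LATENCY_US = {
    "gbt": 250, "knn": 3500, "mlp": 50, "cnn": 500,
    "gru": 1500, "lstm": 2000, "transformer": 800, "transformer_ssl": 1000,
}

# Assumption: <mode> has no underscore, so the greedy first group can
# capture multi-word model names such as "transformer_ssl".
NAME_RE = re.compile(r"^(?P<model>.+)_(?P<mode>[^_]+)_train\.json$")


def build_events(eval_dir):
    """Build model_metric / model_perf events from the train.json files.

    Only small JSON reports are read; no checkpoint is ever loaded.
    """
    events = []
    for path in sorted(Path(eval_dir).glob("*_train.json")):
        match = NAME_RE.match(path.name)
        if match is None:
            continue
        try:
            report = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: retry on the next tick
        if not isinstance(report, dict) or "test_macro_f1" not in report:
            continue
        model, mode = match["model"], match["mode"]
        events.append({"type": "model_metric", "model": model, "mode": mode,
                       "test_macro_f1": report["test_macro_f1"]})
        # Assumption: skip the perf event for an architecture with no estimate.
        if model in LATENCY_US:
            events.append({"type": "model_perf", "model": model, "mode": mode,
                           "latency_us": LATENCY_US[model], "estimated": True})
    return events


def run(eval_dir, publish, tick_s=30.0, max_ticks=None):
    """Re-publish every event on each tick; max_ticks=None loops forever."""
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        for event in build_events(eval_dir):
            publish(event)
        ticks += 1
        if max_ticks is None or ticks < max_ticks:
            time.sleep(tick_s)
```

Calling run("reports/eval", print) would print each event every 30 s; in a
real deployment the publish argument would be whatever function hands an
event to the dashboard transport.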
scripts/rsync-from-lambda.sh — pulls Lambda's artifacts/ + reports/eval/
to the Pi every 30 s. As Lambda finishes each model and writes its
train.json, the Pi sees the new file within a cycle and the publisher
broadcasts the metric on its next tick. Live multi-model dashboard
during training, with no Pi-side inference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
20 lines
874 B
Bash
Executable file
#!/usr/bin/env bash
# Tail Lambda's artifacts/ + reports/eval/ and pull new files to the Pi.
# Lightweight: pure rsync, no compute.
set -uo pipefail
KEY=$HOME/.ssh/lambda_ed25519
REMOTE=ubuntu@129.153.93.192
REPO=/home/max/.env/CIS490
mkdir -p "$REPO/artifacts" "$REPO/reports/eval"
while true; do
  n0=$(ls -1 "$REPO/artifacts"/*.ckpt.json 2>/dev/null | wc -l)
  rsync -aq -e "ssh -i $KEY -o StrictHostKeyChecking=accept-new -o ConnectTimeout=10" \
    "$REMOTE:cis490/artifacts/" "$REPO/artifacts/" 2>&1
  rsync -aq -e "ssh -i $KEY -o StrictHostKeyChecking=accept-new -o ConnectTimeout=10" \
    "$REMOTE:cis490/reports/eval/" "$REPO/reports/eval/" 2>&1
  n1=$(ls -1 "$REPO/artifacts"/*.ckpt.json 2>/dev/null | wc -l)
  if [[ "$n1" -gt "$n0" ]]; then
    echo "[$(date +%H:%M:%S)] +$((n1 - n0)) new artifacts (total=$n1)"
  fi
  sleep 30
done