CIS490/scripts/lambda-inference-demo.md
Max c2a71de4b2 scene 9 bars: paint full zoo + 0–1 visible scale
- multi_model_metrics: publish gbt / mlp / cnn / knn_semi /
  gru / lstm / bert (knn handled by knn streamer); read both
  *_train.json and *_eval.json with macro_f1.point fallback
- dashboard.css: add palette gradients for the four
  non-canonical names so the bars render with a fill colour
- dashboard.js: open the bar's visible scale to the full 0–1
  range so honest-low cross-host F1s show as a bar instead of
  clamping to 0%
- ship lambda-live-detection-loop.py + dashboard request docs
  (scenes 7/8/12, sticky cache, lambda-inference-demo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:18:00 -05:00

# Live inference demo — Lambda runs replay, Pi shows predictions
This document describes the architecture for the live "catching attacks"
demo (scene 7, the chunking timeline). The Pi cannot run inference: it is
RAM-bound and crashed once under the load, so all model loading and
per-window prediction must live on the A100.
## Topology
```
Pi (office-print, 10.100.0.1)            Lambda A100 (ssh ubuntu@<ip>)
┌──────────────────────────┐             ┌───────────────────────────┐
│ dashboard.wg             │             │ replay.py running on      │
│ /publish (loopback only) │             │ episode tarballs through  │
│  ↑                       │             │ gbt_oracle.ckpt.json      │
│  │ POST                  │             │     ↓                     │
│  │ via SSH reverse tunnel│             │ POST 127.0.0.1:8447       │
│  │                       │             │     ↑                     │
│  └─── ssh -R 8447:... ───┼─────────────┤     │                     │
│                          │             └───────────────────────────┘
└──────────────────────────┘
```
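To make the data path in the diagram concrete, here is a minimal sketch of what a single POST from the Lambda side might look like. The helper names (`build_event`, `publish_event`) and the JSON field names (`type`, `window_start`, `predicted`, `actual`) are assumptions for illustration only; the real event schema is defined by `replay.py` and the dashboard and is not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build and POST one prediction event through the tunnel.
# The field names below are assumptions, not the dashboard's confirmed schema.

PUBLISH_URL="${PUBLISH_URL:-http://127.0.0.1:8447/publish}"

# Build a JSON payload for one window. Labels are assumed to be plain
# alphanumeric strings, so no JSON escaping is attempted here.
build_event() {
  local window_start="$1" predicted="$2" actual="$3"
  printf '{"type":"prediction","window_start":%d,"predicted":"%s","actual":"%s"}' \
    "$window_start" "$predicted" "$actual"
}

# POST the payload read from stdin; -f makes curl fail on HTTP error codes.
publish_event() {
  build_event "$@" | curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- "$PUBLISH_URL"
}
```

Because the tunnel binds to Lambda's loopback, the same URL works unchanged whether the script runs on Lambda (through the tunnel) or directly on the Pi.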
## Setup steps
1. **Stage demo episodes on Lambda** (raw tarballs; reading them on the Pi requires `sudo`, so run this from the Pi):
```bash
ssh -i ~/.ssh/lambda_ed25519 ubuntu@<lambda-ip> \
'mkdir -p ~/cis490/data/episodes_demo'
for eid in <episode-ids>; do
sudo cat /var/lib/cis490/episodes/<host>/${eid}.tar.zst | \
ssh -i ~/.ssh/lambda_ed25519 ubuntu@<lambda-ip> \
"cat > ~/cis490/data/episodes_demo/${eid}.tar.zst"
done
```
2. **Open an SSH reverse tunnel** from the Pi to Lambda. This exposes the
Pi's loopback `127.0.0.1:8447` (the dashboard's `/publish` endpoint)
on Lambda's loopback `127.0.0.1:8447`:
```bash
ssh -i ~/.ssh/lambda_ed25519 \
-o ServerAliveInterval=30 \
-o ServerAliveCountMax=3 \
-o ExitOnForwardFailure=yes \
-N -R 8447:127.0.0.1:8447 \
ubuntu@<lambda-ip>
```
Verify: from Lambda, `curl http://127.0.0.1:8447/healthz` should
return the Pi's dashboard health JSON.
3. **Run replay loop on Lambda**:
```bash
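# Log in to Lambda; the remaining commands run in that remote shell.
ssh -i ~/.ssh/lambda_ed25519 ubuntu@<lambda-ip>
cd ~/cis490 && . .venv/bin/activate
export PYTHONPATH="$PWD/repo"
# Run the loop in the background so it survives the SSH session ending.
nohup bash replay_loop.sh > replay_loop.log 2>&1 &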
```
The loop iterates the staged demo episodes through the
trained `gbt_oracle.ckpt.json`, emitting `prediction` events
per window.
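The contents of `replay_loop.sh` are not shown in this document. As a rough sketch of the shape such a loop could take, the helpers below first wait for the tunnel's health endpoint and then run a command once per staged tarball. The names `wait_for`, `for_each_episode`, `EPISODE_DIR`, `HEALTH_URL`, and `REPLAY_CMD` are hypothetical, and the actual `replay.py` invocation is deliberately left as a placeholder because its arguments are not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a replay loop; the real replay_loop.sh may differ.

EPISODE_DIR="${EPISODE_DIR:-$HOME/cis490/data/episodes_demo}"
HEALTH_URL="${HEALTH_URL:-http://127.0.0.1:8447/healthz}"

# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" >/dev/null 2>&1 && return 0
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

# Run a command once per staged tarball, in sorted glob order.
# Usage: for_each_episode <dir> <command...>  (the tarball path is appended)
for_each_episode() {
  local dir="$1" tarball
  shift
  for tarball in "$dir"/*.tar.zst; do
    [ -e "$tarball" ] || continue   # the glob matched nothing
    "$@" "$tarball" || echo "replay failed: $tarball" >&2
  done
}

# Example wiring (REPLAY_CMD is a placeholder array for the replay.py call):
#   wait_for 10 3 curl -fsS "$HEALTH_URL" || exit 1
#   while true; do for_each_episode "$EPISODE_DIR" "${REPLAY_CMD[@]}"; done
```

Checking the health endpoint before the first episode avoids silently dropping events when the tunnel from step 2 is down, and logging per-tarball failures keeps one bad episode from stopping the demo.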
## What the user sees
- Scene 7 (chunking timeline) lights up with the predicted and actual
  phase for each 10-second window.
- Scenes 8, 9, and 12 are still populated by the Pi-side lightweight
  publishers (knn streamer + multi_model_metrics + profiles streamer).
## Why not run replay on Pi
The Pi has 8 GiB of RAM. `replay.py` loads every checkpoint into memory
at startup (300 MB for the KNN sidecars, times multiple variants).
Running that load concurrently with the metrics publisher's per-cycle
test-set scoring crashed the Pi. Inference belongs on the A100; the
Pi's job is display and lightweight event publishing only.
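One way to keep a memory-heavy loader from being started on the Pi by accident is a small preflight check. The sketch below is an assumption, not part of the existing scripts: `mem_available_mib` and `has_headroom` are hypothetical helpers, and any threshold passed to them is an arbitrary illustration rather than a measured requirement.

```bash
#!/usr/bin/env bash
# Hypothetical guard: refuse to start a heavy loader when free memory is low.

# Print MemAvailable in MiB, read from a /proc/meminfo-style file (values in kB).
mem_available_mib() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}

# Succeed only if at least <min-mib> MiB are reported as available.
# Usage: has_headroom <min-mib> [meminfo-file]
has_headroom() {
  local min_mib="$1" meminfo="${2:-/proc/meminfo}" avail
  avail="$(mem_available_mib "$meminfo")"
  [ -n "$avail" ] && [ "$avail" -ge "$min_mib" ]
}

# Example: has_headroom 2048 || { echo "not enough RAM; run on the A100" >&2; exit 1; }
```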