# Live inference demo: Lambda runs replay, Pi shows predictions

Architecture for the live "catching attacks" demo (scene 7 chunking timeline). Pi cannot run inference (RAM-bound; crashed once); all model loading + per-window prediction must live on the A100.

## Topology

```
Pi (office-print, 10.100.0.1)            Lambda A100 (ssh ubuntu@)
┌──────────────────────────┐             ┌───────────────────────────┐
│ dashboard.wg             │             │ replay.py running on      │
│ /publish (loopback only) │             │ episode tarballs through  │
│  ↑                       │             │ gbt_oracle.ckpt.json      │
│  │ POST                  │             │  ↓                        │
│  │ via SSH reverse tunnel│             │ POST 127.0.0.1:8447       │
│  │                       │             │  ↑                        │
│  └─── ssh -R 8447:... ───┼─────────────┤  │                        │
│                          │             └───────────────────────────┘
└──────────────────────────┘
```

## Setup steps

1. **Stage demo episodes on Lambda** (raw tarballs, sudo to read on Pi):

   ```bash
   ssh -i ~/.ssh/lambda_ed25519 ubuntu@ \
       'mkdir -p ~/cis490/data/episodes_demo'
   for eid in ; do
       sudo cat /var/lib/cis490/episodes//${eid}.tar.zst | \
           ssh -i ~/.ssh/lambda_ed25519 ubuntu@ \
               "cat > ~/cis490/data/episodes_demo/${eid}.tar.zst"
   done
   ```

2. **Open SSH reverse tunnel** from Pi to Lambda. Exposes Pi's loopback `127.0.0.1:8447` (the dashboard's `/publish` endpoint) on Lambda's loopback `127.0.0.1:8447`:

   ```bash
   ssh -i ~/.ssh/lambda_ed25519 \
       -o ServerAliveInterval=30 \
       -o ServerAliveCountMax=3 \
       -o ExitOnForwardFailure=yes \
       -N -R 8447:127.0.0.1:8447 \
       ubuntu@
   ```

   Verify: from Lambda, `curl http://127.0.0.1:8447/healthz` should return the Pi's dashboard health JSON.

3. **Run replay loop on Lambda**:

   ```bash
   ssh -i ~/.ssh/lambda_ed25519 ubuntu@
   cd ~/cis490 && . .venv/bin/activate
   export PYTHONPATH=$PWD/repo
   nohup bash replay_loop.sh > replay_loop.log 2>&1 &
   ```

   The loop iterates the staged demo episodes through the trained `gbt_oracle.ckpt.json`, emitting `prediction` events per window.
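
For orientation, here is a minimal sketch of the Lambda-side publishing path described in the steps above: check the tunnel via `/healthz`, then POST one `prediction` event to `/publish`. The payload fields (`episode_id`, `window_index`, `predicted_phase`, `actual_phase`), the JSON content type, and the helper names are assumptions for illustration; the schema that `replay.py` and the dashboard actually use is not shown in this document.

```python
import json
import urllib.request

# Lambda-side end of the reverse tunnel (from the setup steps).
PUBLISH_BASE = "http://127.0.0.1:8447"


def tunnel_is_up(base_url: str = PUBLISH_BASE, timeout: float = 3.0) -> bool:
    """Return True if GET /healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, connection refused, and timeouts are all OSError subclasses.
        return False


def build_prediction_event(episode_id, window_index, predicted_phase,
                           actual_phase=None, window_seconds=10):
    """Build one per-window event. Field names are hypothetical."""
    return {
        "type": "prediction",
        "episode_id": episode_id,
        "window_index": window_index,
        "window_start_s": window_index * window_seconds,
        "predicted_phase": predicted_phase,
        "actual_phase": actual_phase,
    }


def publish(event, base_url: str = PUBLISH_BASE, timeout: float = 3.0) -> int:
    """POST the event as JSON to /publish and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/publish",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A replay loop could call `tunnel_is_up()` once per episode and skip publishing (or back off) when it returns `False`, so a dropped tunnel does not take down the inference process.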

## What the user sees

- Scene 7 (chunking timeline) lights up with the predicted and actual phase for each 10-second window.
- Scenes 8, 9, and 12 are still populated from Pi-side lightweight publishers (knn streamer + multi_model_metrics + profiles streamer).

## Why not run replay on Pi

Pi RAM = 8 GiB. `replay.py` loads every checkpoint into memory at startup (300 MB for KNN sidecars × multiple variants); loading them concurrently with the metrics publisher's per-cycle test-set scoring crashed the Pi. Inference belongs on the A100. The Pi's job is display + lightweight event publishing only.
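
The memory argument can be made concrete with a small pre-flight check. This is a sketch, not something `replay.py` is known to do: it reads the 300 MB figure as a per-variant size (an interpretation of the note above), and the function names and the 50% headroom factor are hypothetical.

```python
MB = 1000 * 1000
SIDECAR_BYTES = 300 * MB  # assumed per-variant KNN sidecar size


def mem_available_bytes(meminfo_text: str) -> int:
    """Parse the MemAvailable line of /proc/meminfo (reported in kiB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def can_load(n_variants: int, available: int, headroom: float = 0.5) -> bool:
    """True if n_variants sidecars fit in `headroom` of available memory.

    The headroom fraction leaves room for other processes, such as the
    metrics publisher scoring its test set in the same time window.
    """
    return n_variants * SIDECAR_BYTES <= available * headroom
```

On a Linux host, `mem_available_bytes(open("/proc/meminfo").read())` gives the current figure; a guard like `can_load(...)` at startup would fail fast instead of letting the machine run out of memory mid-demo.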