CIS490/scripts/lambda-inference-demo.md

# Live inference demo — Lambda runs replay, Pi shows predictions

Architecture for the live "catching attacks" demo (scene 7, the
chunking timeline). The Pi cannot run inference: it is RAM-bound and
crashed once when it tried. All model loading and per-window
prediction must run on the A100.

## Topology

```
   Pi (office-print, 10.100.0.1)            Lambda A100 (ssh ubuntu@<ip>)
   ┌───────────────────────────┐             ┌───────────────────────────┐
   │ dashboard.wg              │             │  replay.py running on     │
   │ /publish (loopback only)  │             │  episode tarballs through │
   │   ↑                       │             │  gbt_oracle.ckpt.json     │
   │   │ POST                  │             │   ↓                       │
   │   │ via SSH reverse tunnel│             │  POST 127.0.0.1:8447      │
   │   │                       │             │   ↑                       │
   │   └─── ssh -R 8447:... ───┼─────────────┤   │                       │
   │                           │             └───────────────────────────┘
   └───────────────────────────┘
```
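
To make the data path concrete, here is a minimal sketch of what one publish call from Lambda could look like. The JSON field names (`type`, `window`, `predicted`, `actual`), the phase labels in the test call, and both function names are assumptions for illustration; the real payload schema is whatever `replay.py` and the dashboard agree on. Only the `127.0.0.1:8447` address and the `/publish` path come from the diagram above.

```bash
# build_prediction_event WINDOW PREDICTED ACTUAL
# Prints a one-line JSON event. Hypothetical schema; values are not
# JSON-escaped, so this only suits simple phase labels.
build_prediction_event() {
    printf '{"type":"prediction","window":%d,"predicted":"%s","actual":"%s"}\n' \
        "$1" "$2" "$3"
}

# publish_event JSON
# POSTs the event to Lambda's loopback; the reverse tunnel carries it
# to the Pi's /publish. curl -d implies POST, -f fails on HTTP errors.
publish_event() {
    curl -fsS --max-time 5 \
        -H 'Content-Type: application/json' \
        -d "$1" \
        "http://127.0.0.1:8447/publish" > /dev/null
}
```

With the tunnel up, `publish_event "$(build_prediction_event 3 recon recon)"` would send a single event by hand, which can help when debugging the dashboard without running the full replay.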

## Setup steps

1. **Stage demo episodes on Lambda.** Run this from the Pi. The raw
   tarballs need sudo to read there, so stream them over SSH:
   ```bash
   ssh -i ~/.ssh/lambda_ed25519 ubuntu@<lambda-ip> \
       'mkdir -p ~/cis490/data/episodes_demo'
   for eid in <episode-ids>; do
       sudo cat /var/lib/cis490/episodes/<host>/${eid}.tar.zst | \
           ssh -i ~/.ssh/lambda_ed25519 ubuntu@<lambda-ip> \
               "cat > ~/cis490/data/episodes_demo/${eid}.tar.zst"
   done
   ```

2. **Open an SSH reverse tunnel** from the Pi to Lambda. This exposes
   the Pi's loopback `127.0.0.1:8447` (the dashboard's `/publish`
   endpoint) on Lambda's loopback `127.0.0.1:8447`, so a POST to that
   address on Lambda reaches the dashboard:
   ```bash
   ssh -i ~/.ssh/lambda_ed25519 \
       -o ServerAliveInterval=30 \
       -o ServerAliveCountMax=3 \
       -o ExitOnForwardFailure=yes \
       -N -R 8447:127.0.0.1:8447 \
       ubuntu@<lambda-ip>
   ```
   Verify: from Lambda, `curl http://127.0.0.1:8447/healthz` should
   return the Pi's dashboard health JSON.

3. **Run replay loop on Lambda**:
   ```bash
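   # Log in to Lambda; the remaining commands run in that remote shell.
   ssh -i ~/.ssh/lambda_ed25519 ubuntu@<lambda-ip>
   cd ~/cis490 && . .venv/bin/activate
   export PYTHONPATH="$PWD/repo"
   # Run in the background so the loop survives logout; output goes to the log.
   nohup bash replay_loop.sh > replay_loop.log 2>&1 &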
   ```
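   The loop runs the staged demo episodes through the trained
   `gbt_oracle.ckpt.json` and emits `prediction` events per window.
   Each event is POSTed to `127.0.0.1:8447` on Lambda, and the tunnel
   from step 2 carries it to the Pi's `/publish` endpoint.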
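
Because step 3 depends on the tunnel from step 2, a small preflight on Lambda can wait for `/healthz` to answer before the loop starts. This is a sketch, not part of `replay_loop.sh`: the `retry` and `tunnel_is_up` helpers are hypothetical names, and the attempt count and delay in the usage line are arbitrary.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Succeeds when the Pi's dashboard answers through the reverse tunnel.
tunnel_is_up() {
    curl -fsS --max-time 5 "http://127.0.0.1:8447/healthz" > /dev/null
}
```

For example, `retry 10 3 tunnel_is_up && nohup bash replay_loop.sh > replay_loop.log 2>&1 &` would start the loop only once the dashboard is reachable, and give up after ten probes.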

## What the user sees

- Scene 7 (chunking timeline) lights up with the predicted and actual
  phase for each 10-second window.
- Scenes 8, 9, and 12 are still populated by the Pi-side lightweight
  publishers (knn streamer, multi_model_metrics, profiles streamer).

## Why not run replay on Pi

The Pi has 8 GiB of RAM. `replay.py` loads every checkpoint into
memory at startup (300 MB for the KNN sidecars, multiplied across
several variants). When that load coincided with the metrics
publisher's per-cycle test-set scoring, the Pi crashed. Inference
belongs on the A100; the Pi's job is display and lightweight event
publishing only.
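
A Pi-side guard along these lines could refuse to start an extra process when memory is already tight. This is a sketch under assumptions, not something the existing scripts do: both function names are hypothetical and the threshold in the usage line is arbitrary. `MemAvailable` is the standard Linux field in `/proc/meminfo`, reported in kB.

```bash
# mem_available_mib [MEMINFO_FILE]
# Prints the kernel's MemAvailable estimate in MiB (prints nothing if
# the field is missing).
mem_available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# has_headroom REQUIRED_MIB [MEMINFO_FILE]
# Succeeds only when at least REQUIRED_MIB is available.
has_headroom() {
    avail=$(mem_available_mib "${2:-/proc/meminfo}")
    [ -n "$avail" ] && [ "$avail" -ge "$1" ]
}
```

A publisher wrapper could then begin with `has_headroom 1024 || exit 1`, so a low-memory Pi skips the extra load instead of crashing mid-demo.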