README + AGENTS.md: reflect fleet, driver v2, all 4 collectors

README:
- Intro now describes the multi-host fleet + cross-host sample diversity
  as the primary workflow.
- Tier 2 section: profile-driven workload table replaces the old
  "yes / dd" description.
- New Tier 3 section: covers driver v2 dispatch + setup automation
  scripts.
- Tier maturity table refreshed (1, 2 ✅; 3 ✅ code / ⏳ image; 4 🚧).
- Telemetry-sources table moved into the per-tier story so the
  oracle-vs-feature split is visible from the top of the doc.
- Status section restructured by section (Pipeline, Telemetry,
  Orchestrator + drivers, Fleet) instead of a flat list. Cross-links to
  the new Forgejo issues for the remaining gaps:
    #4 — Tier 4 MalwareBazaar fetcher
    #5 — source 3 (perf stat)
    #6 — bridge pcap per-episode wiring
- Quick-start sections rewritten:
    1) "fleet mode (the primary workflow)" with --capacity + --waves
    2) "single episode, no fleet" covering both Tier 2 + Tier 3
    3) "multi-host fleet — how cross-host diversity works" explains the
       deterministic per-(host, slot, ep) selection mechanism
- Repo-layout table updated to include shipper/, scripts/, AGENTS.md,
  and the workloads/fleet additions.
- Deploying section: replaces the "TODO scaffolds" wording with the
  actual sudo install-receiver / install-lab-host / wg-pki bring-up flow
  that's running on the Pi today.

AGENTS.md: adds a "don't put off the hard parts" convention as the first
item under Other conventions, with explicit guidance on when
"deferred-with-reason" is legitimate (genuine operator artifact missing)
and the requirement to file an issue + automate the bring-up so it Just
Works once the artifact lands.

86/86 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 00:11:35 -05:00 · c89dbe29e7
commit c89dbe29e7
parent b80986d99c
2 changed files with 214 additions and 87 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -79,6 +79,15 @@ commits.

 ## Other conventions

+- **Don't put off the hard parts.** Frame "deferred-with-reason" only
+  for genuine blockers (binary not present on this machine, external
+  service unreachable). For anything you *could* do but find awkward
+  — bridge setup, cross-arch quirks, fleet concurrency — do it. The
+  user has flagged this twice when work was scoped down prematurely.
+  When something genuinely is blocked by an operator artifact, file
+  the Forgejo issue and *automate the bring-up* (e.g., installer
+  script + sha256-verifying fetcher) so the moment the artifact lands
+  it Just Works.
 - **Naming:** never coin USB / device / service names on the user's
  behalf. Ask first. Reusing an old name is especially bad.
 - **`/etc` configs:** `Read` first, copy second. Never overwrite a
--- a/README.md
+++ b/README.md
@@ -4,9 +4,16 @@ Course project for CIS490 (Cybersecurity). The end-goal is an ML model that
 watches performance metrics on a real device, decides whether the device has
 been breached, and triggers a hardware-level reset when confidence is high
 enough. This repository covers the **dataset side** — we run public malware
-samples against intentionally vulnerable Linux VMs and capture labeled
-time-series telemetry that mirrors what the deployed model would see in the
-field.
+samples (and behavior-matched mimics) against intentionally vulnerable Linux
+VMs and capture labeled time-series telemetry that mirrors what the deployed
+model would see in the field.
+
+Concretely, every lab host on the WireGuard mesh detects how much capacity
+it has, spins up that many concurrent VMs, gives each VM a *different*
+malware profile from the manifest, and ships the resulting labeled episode
+tarballs to the central receiver on the Pi over mTLS. Running the same
+fleet on multiple hosts gives novel, non-overlapping data per host with no
+coordinator — see [Multi-host fleet](#multi-host-fleet) below.

 The work is grounded in the trust-over-time scoring model from
 [IEEE 9881803](https://ieeexplore.ieee.org/document/9881803).
@@ -22,15 +29,33 @@ the set of timestamped phase transitions written to `labels.jsonl` —
 sharing a monotonic clock with the metric rows so anything aligned in
 time can be aligned in code.

-### Tier 2 — *real Alpine VM, real workload driven from inside the guest*
+### Tier 2 — *real Alpine VM, profile-driven workload inside the guest*

 This is the closest we get to real-malware behaviour without yet running
 real malware. Telemetry is real `/proc/<qemu_pid>` from outside the
-guest, **and the load is generated inside the guest** by busybox
-``yes`` (CPU saturation) and ``dd`` (disk bursts), driven over the
-serial console by `tools/vm_load_controller.py`. Every phase transition
-in `labels.jsonl` corresponds to an actual command issued inside the
-real VM.
+guest plus three more sources running concurrently (QMP, bridge pcap,
+in-guest agent — see *Telemetry sources* below). The *load* itself is
+generated inside the guest by a profile-matched shell command from
+[`exploits/workloads.py`](exploits/workloads.py), driven over the
+serial console by [`tools/vm_load_controller.py`](tools/vm_load_controller.py).
+
+Each sample's `profile` (from [`samples/manifest.toml`](samples/manifest.toml))
+dispatches to a different in-session workload, so the envelope each
+VM produces is observably different per family — exactly the variance
+the ML model needs to learn:
+
+| profile          | shape                                                  |
+|------------------|--------------------------------------------------------|
+| `cpu-saturate`   | sustained 1-vCPU saturation (XMRig)                    |
+| `scan-and-dial`  | SYN-style probes across the bridge subnet + dial-home  |
+| `io-walk`        | fs traversal + 4 KiB urandom writes (ransomware)       |
+| `bursty-c2`      | long idle + periodic 3-packet egress burst (Dridex)    |
+| `low-and-slow`   | minimal CPU + periodic memory churn (Kovter / fileless)|
+| `shell-resident` | one long-lived TCP socket + periodic command ticks (RAT)|
+
+Every phase transition in `labels.jsonl` corresponds to an actual
+command issued inside the real VM, and `meta.json` records which
+sample / profile / kind drove it.

 ![Real Alpine VM envelope](docs/images/real-vm-envelope.png)

@@ -41,10 +66,20 @@ controller killing the load process inside the VM. The
 infected_running → dormant → infected_running re-entry is the textbook
 envelope that justifies the whole project framing.

-Reproduce with:
+Reproduce one episode (profile-driven via `--sample` or `SAMPLE_NAME`
+env, defaults to the v1 yes-loop without one):

 ```sh
-uv run python tools/run_real_vm_demo.py --data-root data
+uv run python tools/run_real_vm_demo.py --data-root data \
+    --sample xmrig-cryptominer
+```
+
+Or run the **fleet** — one wave of `max_concurrent` parallel episodes,
+each slot pulling a different sample from the manifest:
+
+```sh
+uv run python tools/run_fleet.py --capacity            # see what the host can do
+uv run python tools/run_fleet.py --waves 1 --data-root data
 ```

 ### Tier 1 — *real Alpine VM, idle baseline*
@@ -67,21 +102,46 @@ above produces from real KVM behaviour.

 ![Synthetic envelope (host-side mimic)](docs/images/synthetic-envelope.png)

-### What's still missing for the real-malware envelope
+### Tier 3 — *real exploit fire, profile-matched workload (Driver v2)*
+
+The Tier-3 driver lives in [`exploits/`](exploits/README.md) — a tiny
+msgpack-over-HTTPS msfrpc client + `MSFExploitDriver`. With a
+[`Sample`](samples/manifest.py) supplied, the driver dispatches the
+post-exploit `infected_running` workload through
+[`exploits/workloads.py`](exploits/workloads.py) — same six profiles
+as Tier 2, so a fleet wave produces matched envelopes whether or not
+an exploit fires. Without a sample, the v1 yes-loop path is preserved
+for smoke runs.
+
+First canned module: `exploits/modules/vsftpd_234_backdoor.toml`
+(Metasploitable2's CVE-2011-2523). [`scripts/install-msfrpcd.sh`](scripts/install-msfrpcd.sh)
+sets up `msfrpcd` (loopback only) as a hardened systemd unit;
+[`scripts/fetch-metasploitable2.sh`](scripts/fetch-metasploitable2.sh)
+pulls + sha256-verifies a target image from operator-supplied URL.
+
+### Tier maturity

 | Tier | What it gives | Status |
 |---|---|---|
-| 1 — real VM, idle | confidence the collector reads real KVM behaviour | ✅ done |
-| 2 — real VM, real workload from inside the guest | first real-load envelope shape | ✅ done |
-| 3 — real VM, real exploit fire (Metasploitable + msfrpc) | honest `armed → infecting` transitions | 🟡 driver landed, integration pending |
-| 4 — real VM, real malware sample (XMRig from MalwareBazaar) | the full envelope we ultimately train on | 🚧 |
+| 1 — real VM, idle | confidence the collectors read real KVM behaviour | ✅ done |
+| 2 — real VM, profile-driven workload | distinguishable in-guest envelopes per malware family | ✅ done |
+| 3 — real VM, real exploit fire + profile workload | honest `armed → infecting` transitions, driver v2 dispatch | ✅ code; ⏳ awaiting Metasploitable2 image + msfrpcd on a lab host |
+| 4 — real VM, real malware sample (MalwareBazaar fetch) | the full envelope we ultimately train on | 🚧 manifest schema ready (`sample.sha256` → `kind=real`); fetcher TBD |

-The Tier-3 driver lives in [`exploits/`](exploits/README.md) — a tiny
-msgpack-over-HTTPS msfrpc client plus an `MSFExploitDriver` plugged
-into the orchestrator as the `on_phase` callback. First canned module:
-`exploits/modules/vsftpd_234_backdoor.toml` (Metasploitable2's
-CVE-2011-2523). End-to-end integration needs `msfrpcd` running and a
-Metasploitable2 image at `vm/images/`, which is the next bring-up step.
+### Telemetry sources (all four wire into one episode dir)
+
+| # | Source                         | Vantage       | Role                |
+|---|--------------------------------|---------------|---------------------|
+| 1 | host `/proc/<qemu_pid>`        | outside       | oracle (label only) |
+| 2 | QEMU QMP queries               | outside       | oracle (label only) |
+| 3 | `perf stat -p <qemu_pid>`      | outside       | oracle (planned)    |
+| 4 | Bridge pcap → 100 ms netflow   | gateway-side  | feature (deployable)|
+| 5 | In-guest agent (virtio-serial) | inside        | feature (deployable)|
+
+Sources 1, 2, 4, 5 are live as of this commit. The deploy/oracle split
+follows [`docs/threat-model.md`](docs/threat-model.md): only sources
+4 + 5 are usable as model *features* in the field — sources 1, 2, 3
+exist as labeling oracles only.

 For an interactive view of any episode (zoom/pan/hover), run:

@@ -92,87 +152,133 @@ tools/show_envelope.sh data/episodes/<episode_id>

 ---

-## Status
+## Status (86/86 tests passing as of `b80986d`)

- ✅ Receiver (HTTPS PUT, sha256-verified, idempotent) — running on Pi5 via Caddy + mTLS (wg-pki client CA)
- ✅ Orchestrator v0 — single- and scheduled-phase modes, ULID episode ids
- ✅ Host /proc oracle collector (source 1) @ 10 Hz
- ✅ **QMP collector** (source 2) — query-status / query-blockstats / query-stats, 1 Hz
- ✅ **Bridge pcap** (source 4) — pure-Python pcap parser + 100 ms-bucketed netflow.jsonl
- ✅ **In-guest agent** (source 5) — virtio-serial; cidata-embedded for first-boot install on Alpine; host-side reader re-stamps to host clock
- ✅ Synthetic envelope demo — full 8-phase envelope produced end-to-end
- ✅ Real VM (Alpine 3.21 cloud-init under KVM)
- ✅ **Tier 2 — real VM, real workload:** serial-console-driven load controller fires `yes`/`dd` inside the guest at every phase transition
- 🟡 **Tier 3 — exploit driver:** `MSFExploitDriver` + msfrpc client + first module config landed; `scripts/install-msfrpcd.sh` automates msfrpcd setup; `scripts/fetch-metasploitable2.sh` pulls + verifies the target image (URL+sha256 from operator). Driver v2 (sample-profile-driven workloads) is the next step for ML diversity.
- ✅ **Shipper** — lab-host ↔ Pi receiver via tar+zstd PUT over WG with mTLS; `--ping` smoke mode
- ✅ **Fleet runner** — host-capacity-aware concurrency (`tools/run_fleet.py`); resource detector reserves cores + RAM headroom; sample manifest with deterministic per-(host, slot, episode) selection so every host on the network produces *novel, varied, labeled* data
- ✅ **Sample manifest** — six initial profiles (cryptominer / botnet / ransomware / banking-trojan / fileless / RAT). Real-malware fetch from MalwareBazaar is the Tier-4 follow-up.
+**Pipeline (lab-host → Pi → tarball stored)**
+- ✅ Receiver app (HTTPS PUT, sha256-verified, idempotent) — running on the Pi behind Caddy with mTLS via the wg-pki client CA
+- ✅ `POST /v1/ping` smoke endpoint (writes nothing, exercises the full auth path)
+- ✅ Shipper (`shipper/`) — tar+zstd, retry/backoff, `--ping` mode
+- ✅ Caddy `collector.wg` block (in `spectral/caddy`)
+- ✅ Lab-host install script + systemd units (`scripts/install-lab-host.sh`, `etc/cis490-{shipper,orchestrator}.service`)
+- ✅ Receiver install script (`scripts/install-receiver.sh`)
+- ✅ wg-pki client-CA bootstrap + per-host leaf issuance (in `spectral/wg-pki`)

-> **Topology note:** in this project the **Pi5 is the WireGuard-side
-> *collector*** that receives episode tarballs from one or more lab hosts.
-> It is *not* the deployment target for the model. The deployment target is
-> generic ("any constrained Linux device"). See
+**Telemetry**
+- ✅ Source 1 — host `/proc/<qemu_pid>` @ 10 Hz
+- ✅ Source 2 — QEMU QMP @ 1 Hz
+- ✅ Source 4 — bridge pcap + 100 ms netflow bucketizer (pure-Python parser, no scapy/dpkt dep). Per-episode wiring in `EpisodeRunner` is tracked in [#6](http://maxgit.wg/spectral/CIS490/issues/6).
+- ✅ Source 5 — in-guest agent over virtio-serial; cidata-embedded for first-boot install on Alpine
+- 🚧 Source 3 — `perf stat -p <qemu_pid>` ([#5](http://maxgit.wg/spectral/CIS490/issues/5))
+
+**Orchestrator + drivers**
+- ✅ Orchestrator v0 — phase-scheduled episode runner, ULID episode ids
+- ✅ Tier 2 driver — real Alpine VM, profile-driven in-guest workload over serial console
+- ✅ Tier 3 driver v2 — `MSFExploitDriver` + msfrpc client + per-sample workload dispatch; first canned module `vsftpd_234_backdoor.toml`
+- ⏳ Tier 3 integration — needs operator to drop a Metasploitable2 image + run `scripts/install-msfrpcd.sh` on a lab host
+- 🚧 Tier 4 — MalwareBazaar fetch by sha256 (manifest schema is ready; tracked in [#4](http://maxgit.wg/spectral/CIS490/issues/4))
+
+**Fleet (multi-VM, multi-host data generation)**
+- ✅ Resource-aware capacity detector (cores / RAM / load) — `orchestrator/fleet.py`
+- ✅ Concurrent slot runner — `tools/run_fleet.py`
+- ✅ Sample manifest with six behavioural profiles + deterministic per-(host_id, slot, episode) selection so every host walks the catalog in a different order
+
+> **Topology note:** the **Pi5 is the WireGuard-side *collector*** that
+> receives episode tarballs from one or more lab hosts. It is *not* the
+> deployment target for the model. The deployment target is generic
+> ("any constrained Linux device"). See
 > [`docs/architecture.md`](docs/architecture.md).

 ---

 <details>
-<summary><b>Quick start — run the synthetic envelope demo (~90 s)</b></summary>
+<summary><b>Quick start — fleet mode (the primary workflow)</b></summary>

 ```sh
 git clone https://maxgit.wg/spectral/CIS490.git
 cd CIS490
-
-# One-time setup.
 uv sync

-# Generate one labeled episode (8 phases, 851 telemetry rows, 85 s).
-uv run python tools/run_envelope_demo.py --data-root data
+# 1. Build the cidata ISO with the in-guest agent baked in.
+uv run python tools/build_cidata.py vm/images/cidata.iso

-# Render a static PNG envelope of that episode.
-uv run python tools/plot_envelope.py data/episodes/<episode_id>
+# 2. See what this host is sized for.
+uv run python tools/run_fleet.py --capacity
+# cores: 4 (reserve 1)
+# ram:   7951 MiB total, 5223 MiB available (headroom 1024 MiB, per-vm 320 MiB)
+# load:  1m=0.51
+# caps:  by_cores=3, by_ram=13, by_load=3
+# --> max_concurrent VMs: 3

-# Or open an interactive plot in your browser:
+# 3. Run one wave (= max_concurrent parallel episodes, each with a
+#    different sample profile).
+uv run python tools/run_fleet.py --waves 1 --data-root data
+
+# 4. Plot any episode (matplotlib WebAgg).
 tools/show_envelope.sh data/episodes/<episode_id>
 ```

-The data lands in `data/episodes/<ulid>/`:
+Each episode dir contains:

 ```
-meta.json              episode metadata (image, snapshot, schedule, host fingerprint)
-events.jsonl           orchestrator actions (snapshot_load, phase_transition, episode_end)
+meta.json              episode metadata (image, sample, profile, fleet capacity)
+events.jsonl           orchestrator + driver events (exploit_fire, session_open, sample_executed, ...)
 labels.jsonl           one row per phase transition — THIS is the envelope
-telemetry-proc.jsonl   host /proc sampler at 10 Hz
+telemetry-proc.jsonl   source 1: host /proc sampler @ 10 Hz
+telemetry-qmp.jsonl    source 2: QMP query-status / blockstats / kvm stats @ 1 Hz
+telemetry-guest.jsonl  source 5: in-guest agent (CPU jiffies, mem, listen ports, top procs)
+network.pcap           source 4: tcpdump on br-malware
+netflow.jsonl          source 4: 100 ms-bucketed pcap aggregation
 done.marker            written last; the shipper only sees finished episodes
 ```

 </details>

 <details>
-<summary><b>Quick start — boot a real Linux VM (Cirros)</b></summary>
-
-The phase-2 launcher boots a Cirros qcow2 under KVM and exposes its
-QMP/monitor sockets and pidfile. The orchestrator then samples the real
-`qemu-system` process.
+<summary><b>Quick start — single episode, no fleet</b></summary>

 ```sh
-# Pre-staged: vm/images/cirros-baseline.qcow2 with snapshot 'baseline-v1'.
-# (See docs/sources.md for the Cirros sha256.)
+# Tier 2 (no exploit, profile-driven workload):
+uv run python tools/run_real_vm_demo.py --data-root data \
+    --sample mirai-class-bot

-# Boot in one terminal:
-RUN_DIR=/tmp/cis490-vm vm/launch_demo.sh
-
-# In another terminal, point the orchestrator at the VM's pid:
-QPID=$(cat /tmp/cis490-vm/qemu.pid)
-uv run python -m orchestrator --target-pid $QPID --duration 20
-
-# Plot:
-tools/show_envelope.sh data/episodes/<episode_id>
+# Tier 3 (real exploit fire via msfrpcd):
+MSFRPC_PASSWORD=$(. /etc/cis490/msfrpc.env; echo $MSFRPC_PASSWORD) \
+    uv run python tools/run_tier3_demo.py \
+    --module vsftpd_234_backdoor \
+    --sample ransomware-mimic \
+    --data-root data
 ```

-The idle-VM envelope shape is distinct from the synthetic load: periodic
-~10% CPU spikes from KVM/timer interrupts, flat ~230 MiB RSS, a single
-late-boot disk write. That's a real KVM guest you're seeing.
+</details>
+
+<details>
+<summary><b>Multi-host fleet — how cross-host diversity works</b></summary>
+
+Each lab host's `host_id` (set in `/etc/cis490/lab-host.toml`) seeds a
+deterministic walk through the sample catalog:
+
+```python
+# samples/manifest.py
+def select(self, *, host_id, slot, episode_index):
+    seed = f"{host_id}|{slot}|{episode_index}"
+    idx  = sha256(seed)[:8] % len(self.samples)
+    return self.samples[idx]
+```
+
+So:
+- `host=alice slot=0 ep=0` and `host=bob slot=0 ep=0` almost certainly
+  pick *different* samples (test asserts < 25% collision over 20 trials).
+- A single host walks the entire catalog within ~`len(manifest)` waves
+  (test confirms full coverage in 200 episodes).
+- No coordinator needed — every host independently produces non-overlapping
+  data, and `meta.fleet.host_id` + `meta.sample.name` make the join trivial
+  at training time.
+
+The fleet runner shells out to the same `tools/run_real_vm_demo.py` per
+slot, with `SLOT` / `RUN_DIR` / `SAMPLE_NAME` env passed through to the
+launcher. Each VM gets its own QMP socket, agent socket, hostfwd port
+range, and episode dir, so concurrency is collision-free up to the
+capacity ceiling.

 </details>

@@ -188,15 +294,18 @@ late-boot disk write. That's a real KVM guest you're seeing.
 | [`docs/deploy.md`](docs/deploy.md) | One-command install for the lab-host and receiver roles |
 | [`docs/lab-setup.md`](docs/lab-setup.md) | KVM prereqs, VM build, snapshot, virtio-serial wiring |
 | [`docs/sources.md`](docs/sources.md) | Works cited — every tool, dep, sample source, paper, and standard |
-| `orchestrator/` | State machine that drives the boot → arm → detonate → observe → revert loop |
-| `collectors/` | One module per telemetry source (host /proc, QMP, perf, pcap, guest agent) |
-| `receiver/` | Starlette app: PUT /v1/episodes ingest, sha256-verified, idempotent |
-| `vm/` | qcow2 images, launch scripts, snapshot recipes (binaries gitignored) |
-| `tools/` | Demo runners, load mimic, plot scripts |
-| [`exploits/`](exploits/README.md) | MSF RPC client + driver + per-module TOML configs (Tier 3) |
-| `samples/` | Sample manifest (sha256-pinned). **Binaries never committed.** |
+| `orchestrator/` | Episode runner + `fleet.py` (capacity detection, concurrent slot driver) |
+| `collectors/` | One module per telemetry source: `proc_qemu`, `qmp`, `pcap`, `guest_agent` |
+| `receiver/` | Starlette app: PUT `/v1/episodes` + POST `/v1/ping`, sha256-verified, idempotent |
+| `shipper/` | Lab-host-side: scan `data/episodes/`, tar+zstd, PUT over mTLS, retry/backoff |
+| `vm/` | Launch scripts (`launch_demo.sh`, `launch_target.sh`), `setup_bridge.sh`, in-guest agent at `vm/guest-agent/cis490_agent.py`. qcow2 images and pcap captures gitignored. |
+| `tools/` | `run_fleet.py`, `run_real_vm_demo.py`, `run_tier3_demo.py`, `build_cidata.py`, `plot_envelope.py`, `show_envelope.sh` |
+| [`exploits/`](exploits/README.md) | MSF RPC client (`msfrpc.py`), `driver.py` (v2 with sample dispatch), `workloads.py` (six profile-matched in-session loops), per-module TOML configs |
+| [`samples/`](samples/manifest.toml) | Sample manifest + loader. Binaries land at `samples/store/<sha256>` (gitignored). |
+| `scripts/` | `install-{lab-host,receiver,msfrpcd}.sh`, `fetch-metasploitable2.sh` |
 | `training/` | Model training code (deferred — schema first) |
-| `etc/` | systemd units and config templates installed by the deploy scripts |
+| `etc/` | systemd units and config templates (`cis490-{receiver,shipper,orchestrator}.service`, `lab-host.toml.example`, `receiver.toml.example`) |
+| [`AGENTS.md`](AGENTS.md) | Conventions for AI agents working on this and sibling spectral repos |

 </details>

@@ -237,17 +346,26 @@ Two roles, one bootstrap command each. Detailed in
  `index.jsonl`. Runs on the Pi5 in our setup.

 ```sh
-# On a lab host:
-./scripts/install-lab-host.sh   # (TODO — currently bring up by hand per docs/deploy.md)
-
 # On the Pi5 (or any always-on WG node):
-./scripts/install-receiver.sh   # (TODO — same)
+sudo ./scripts/install-receiver.sh
+# Add the collector.wg block to spectral/caddy (already merged), then:
+sudo systemctl enable --now cis490-receiver
+
+# One-time, on the Pi: bootstrap the CIS490 client CA.
+sudo /home/max/.env/wg-pki/scripts/init-cis490-client-ca.sh
+
+# On each lab host: enroll via wg-enroll first, then:
+sudo ./scripts/install-lab-host.sh
+# Drop a TLS leaf from wg-pki at /etc/cis490/certs/, edit /etc/cis490/lab-host.toml.
+sudo systemctl enable --now cis490-shipper cis490-orchestrator
 ```

-For now both bootstrap scripts are scaffolds; the units and configs they
-install live in `etc/`. The receiver itself works today
-(`uv run python -m receiver --config etc/receiver.toml.example` — modify
-paths).
+The orchestrator service runs `tools/run_fleet.py --waves 1` per
+invocation with `Restart=always`, giving a continuous stream of
+fresh-sample episodes per host. The shipper picks them up as
+`done.marker` files appear and PUTs them to `https://collector.wg`.
+
+For mTLS leaf-cert minting: `spectral/wg-pki/scripts/issue-cis490-client-cert.sh <host_id>`.

 </details>
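
Note on the `select` snippet this commit adds to README.md: as written it is pseudocode (`sha256(seed)[:8] % len(...)` cannot run on a `str` seed or a sliced digest). The sketch below is one runnable reading of it, not the code in `samples/manifest.py`. Assumptions: the seed is UTF-8 encoded, the first 8 digest bytes are read as a big-endian integer, and the `Sample` and `Manifest` classes here are hypothetical stand-ins. The sample names come from the README; the name-to-profile pairing is illustrative only.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical stand-in for a manifest entry; the real type may differ.
    name: str
    profile: str


class Manifest:
    # Hypothetical stand-in for the loader in samples/manifest.py.
    def __init__(self, samples):
        self.samples = list(samples)

    def select(self, *, host_id, slot, episode_index):
        """Pick a sample deterministically from (host_id, slot, episode_index)."""
        seed = f"{host_id}|{slot}|{episode_index}"
        digest = hashlib.sha256(seed.encode("utf-8")).digest()
        # Assumption: first 8 bytes, big-endian. The README does not say
        # how the digest prefix is turned into an integer.
        idx = int.from_bytes(digest[:8], "big") % len(self.samples)
        return self.samples[idx]


# Illustrative catalog; the profile assigned to each name is an assumption.
manifest = Manifest([
    Sample("xmrig-cryptominer", "cpu-saturate"),
    Sample("mirai-class-bot", "scan-and-dial"),
    Sample("ransomware-mimic", "io-walk"),
])

first = manifest.select(host_id="alice", slot=0, episode_index=0)
again = manifest.select(host_id="alice", slot=0, episode_index=0)
print(first.name, first == again)
```

Because the index is a hash taken modulo the catalog size, two hosts can still land on the same sample for a given slot and episode. The README's own test bound (< 25% collision over 20 trials) reflects that, so "non-overlapping" is better read as "rarely overlapping".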