diff --git a/README.md b/README.md
index 098cc63..dbab479 100644
--- a/README.md
+++ b/README.md
@@ -119,6 +119,28 @@ sets up `msfrpcd` (loopback only) as a hardened systemd unit;
 [`scripts/fetch-metasploitable2.sh`](scripts/fetch-metasploitable2.sh)
 pulls + sha256-verifies a target image from operator-supplied URL.
 
+### Tier 4 — *real malware sample, fetched + uploaded + executed*
+
+A manifest entry with a `sha256` flips its `Sample.kind` to `"real"`.
+The driver then bypasses the mimic profile and runs the real-binary
+path:
+
+1. [`tools/fetch_sample.py `](tools/fetch_sample.py) pulls the
+   binary from MalwareBazaar (Auth-Key from
+   `samples/.bazaar.token` or `MALWAREBAZAAR_API_KEY`), unzips with the
+   standard `infected` password, sha-verifies, and lands at
+   `samples/store/` (gitignored).
+2. At `infected_running`, the driver chunked-uploads the binary into
+   the shell session as 8 KiB base64 segments
+   (`exploits.workloads.chunked_real_binary_upload`). 256 KiB binaries
+   work without buffer-busting msfrpc.
+3. The session decodes, sha-verifies *again on the guest side*, chmods,
+   and execs only if the hash matches. Mismatch fail-stops the run.
+4. `meta.sample.sha256` + per-step events
+   (`real_binary_upload_begin`, `real_binary_verify`,
+   `sample_executed{kind=real}`) record exactly which binary was run
+   and when, so trainers can join cleanly.
+
 ### Tier maturity
 
 | Tier | What it gives | Status |
@@ -126,22 +148,22 @@ pulls + sha256-verifies a target image from operator-supplied URL.
 | 1 — real VM, idle | confidence the collectors read real KVM behaviour | ✅ done |
 | 2 — real VM, profile-driven workload | distinguishable in-guest envelopes per malware family | ✅ done |
 | 3 — real VM, real exploit fire + profile workload | honest `armed → infecting` transitions, driver v2 dispatch | ✅ code; ⏳ awaiting Metasploitable2 image + msfrpcd on a lab host |
-| 4 — real VM, real malware sample (MalwareBazaar fetch) | the full envelope we ultimately train on | 🚧 manifest schema ready (`sample.sha256` → `kind=real`); fetcher TBD |
+| 4 — real VM, real malware sample (MalwareBazaar fetch) | the full envelope we ultimately train on | ✅ code; ⏳ awaiting MalwareBazaar API key + sha256s in manifest |
 
-### Telemetry sources (all four wire into one episode dir)
+### Telemetry sources (all five wire into one episode dir)
 
 | # | Source | Vantage | Role |
 |---|--------------------------------|---------------|---------------------|
 | 1 | host `/proc/` | outside | oracle (label only) |
 | 2 | QEMU QMP queries | outside | oracle (label only) |
-| 3 | `perf stat -p ` | outside | oracle (planned) |
+| 3 | `perf stat -p ` | outside | oracle (label only) |
 | 4 | Bridge pcap → 100 ms netflow | gateway-side | feature (deployable)|
 | 5 | In-guest agent (virtio-serial) | inside | feature (deployable)|
 
-Sources 1, 2, 4, 5 are live as of this commit. The deploy/oracle split
-follows [`docs/threat-model.md`](docs/threat-model.md): only sources
-4 + 5 are usable as model *features* in the field — sources 1, 2, 3
-exist as labeling oracles only.
+All five are live. The deploy/oracle split follows
+[`docs/threat-model.md`](docs/threat-model.md): only sources 4 + 5
+are usable as model *features* in the field — sources 1, 2, 3 exist
+as labeling oracles only.
 
 For an interactive view of any episode (zoom/pan/hover), run:
 
@@ -152,7 +174,7 @@ tools/show_envelope.sh data/episodes/
 
 ---
 
-## Status (86/86 tests passing as of `b80986d`)
+## Status (106/106 tests passing as of `a88ac83`)
 
 **Pipeline (lab-host → Pi → tarball stored)**
 - ✅ Receiver app (HTTPS PUT, sha256-verified, idempotent) — running on the Pi behind Caddy with mTLS via the wg-pki client CA
@@ -166,16 +188,18 @@ tools/show_envelope.sh data/episodes/
 **Telemetry**
 - ✅ Source 1 — host `/proc/` @ 10 Hz
 - ✅ Source 2 — QEMU QMP @ 1 Hz
-- ✅ Source 4 — bridge pcap + 100 ms netflow bucketizer (pure-Python parser, no scapy/dpkt dep). Per-episode wiring in `EpisodeRunner` is tracked in [#6](http://maxgit.wg/spectral/CIS490/issues/6).
+- ✅ Source 3 — `perf stat -p ` (opt-in via `enable_perf`; needs `CAP_SYS_ADMIN` / `CAP_PERFMON`)
+- ✅ Source 4 — bridge pcap + 100 ms netflow bucketizer (pure-Python parser, no scapy/dpkt dep), wired into `EpisodeRunner` via `bridge_iface`
 - ✅ Source 5 — in-guest agent over virtio-serial; cidata-embedded for first-boot install on Alpine
-- 🚧 Source 3 — `perf stat -p ` ([#5](http://maxgit.wg/spectral/CIS490/issues/5))
 
 **Orchestrator + drivers**
 - ✅ Orchestrator v0 — phase-scheduled episode runner, ULID episode ids
+- ✅ Snapshot/revert via QMP `loadvm` (`revert_at_start` / `revert_at_end`) for clean baselines between episodes
 - ✅ Tier 2 driver — real Alpine VM, profile-driven in-guest workload over serial console
 - ✅ Tier 3 driver v2 — `MSFExploitDriver` + msfrpc client + per-sample workload dispatch; first canned module `vsftpd_234_backdoor.toml`
+- ✅ Tier 4 — `tools/fetch_sample.py` (MalwareBazaar by sha256) + chunked real-binary upload (`exploits.workloads.chunked_real_binary_upload`) + guest-side sha-verify-then-exec dispatch in `MSFExploitDriver`
 - ⏳ Tier 3 integration — needs operator to drop a Metasploitable2 image + run `scripts/install-msfrpcd.sh` on a lab host
-- 🚧 Tier 4 — MalwareBazaar fetch by sha256 (manifest schema is ready; tracked in [#4](http://maxgit.wg/spectral/CIS490/issues/4))
+- ⏳ Tier 4 integration — needs operator's MalwareBazaar API key + at least one `sha256` entry in `samples/manifest.toml`
 
 **Fleet (multi-VM, multi-host data generation)**
 - ✅ Resource-aware capacity detector (cores / RAM / load) — `orchestrator/fleet.py`