Validation on k-gamingcom (commit ac7b85f) showed perf enabled in
production but rows_perf=0 on every episode. Without lifecycle events
the failure mode is indistinguishable from "perf wasn't enabled" — §1
silent-downgrade. The events now surface the actual cause:
- perf_unavailable — binary missing OR launch failed (with reason)
- perf_started — perf is running (pid, events, interval)
- perf_first_row — first row written; counters_populated tells
whether any event was actually counted
- perf_finished — final tally (intervals_seen,
intervals_with_values)
- perf_no_counters — perf was alive but every interval came back
<not counted> (likely paranoid > 2 or PID
ownership mismatch)
`_flush()` now writes a row whenever an interval is observed, even
when every event was <not counted>. The all-None row is honest data
("perf observed this interval and counted nothing"), and the rows
become a count of observed intervals rather than a count of
successful measurements — distinct from rows_proc / rows_qmp which
do count successful measurements. Trainers filter on
`cycles is not None` etc. when they need only populated rows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diagnoses + fixes for the silent-collector / never-lands-session
failures that the 200-episode quality probe surfaced (§3 evidence).
All four address the producer; no compensating layers added.
perf collector (rows_perf=0 on 100% of episodes):
- perf stat -j writes to stderr by default with -p; we read stdout.
Add --log-fd 1 so JSON reaches stdout where the parser sees it.
- Event names come back annotated with the privilege scope perf
actually measured ("cycles:u" under perf_event_paranoid=2). Strip
the suffix so _build_row's plain-name lookups hit. Without this
every metric was None even when perf reported real numbers.
- tests/test_collectors_emit.py covers the regression with a real
busy-loop fixture; emit-test discipline per §4.4.
guest-agent collector (rows_guest=0 on 100% of episodes):
- Alpine cloud image doesn't ship python3, so the in-guest agent's
`#!/usr/bin/env python3` shebang silently fails. Add packages:
[python3] to cidata user-data so cloud-init installs it before
the OpenRC service starts.
- Guest agent now exits nonzero (was: silent stdout fallback) when
/dev/virtio-ports/cis490.guest.agent is missing, so OpenRC
reports the failure to /var/log/cis490-agent.log instead of the
bytes vanishing into the void. Refs §1.
- Host-side collector emits guest_agent_connected /
guest_agent_first_byte / guest_agent_silent_window into the
orchestrator's events.jsonl. Future episodes show the in-guest
failure mode per-episode instead of inferring from rows_guest=0.
k-gamingcom missing qmp/netflow/pcap (also affected elliott on
Tier-3 episodes — was misclassified as host divergence):
- tools/run_tier3_demo.py was building EpisodeConfig WITHOUT
qmp_socket / guest_agent_socket / bridge_iface — even though
launch_target.sh creates the underlying chardevs and BRIDGE
supplies the iface. tools/run_real_vm_demo.py wires them
correctly; Tier-3 had a copy-paste gap.
- tests/test_collectors_emit.py adds a source-grep regression so
the wiring stays honest.
samba_usermap_script never lands session (0/67 in §3 probe):
- Bind handler default WfsDelay (~5s) gives up before bind_perl on
Metasploitable2 has finished forking + binding LPORT under
SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in
exploits/driver.py so framework + driver agree on the wait
budget. Add ConnectTimeout=15 so the handler's bind connect has
retry budget instead of one-shot.
orchestrator/fleet.py: usable_modules + BRIDGE handling were both
unconditional, so:
- With BRIDGE set, requires_bridge modules were still being
dropped — picker only ever returned samba_usermap_script across
every slot/episode (the test_fleet_uses_all_modules_when_bridge_set
failure on HEAD).
- env.pop("BRIDGE") fired even when BRIDGE was the operator's
explicit setup, breaking modules that need bridge mode (vsftpd
backdoor on hardcoded port 6200, distccd, etc.).
Both made conditional on bridge_set so the picker walks the full
catalog under bridge mode and SLIRP-only modules still get a
clean SLIRP env when BRIDGE is unset.
receiver/app.py: half-pregnant v2 schema state in HEAD — calling
store.ingest_stream(episode_type=..., benign_profile=...) with
kwargs the matching store.py change was in the WIP stash. Removed
v2 awareness from app.py so v1 episodes (what the producer ships
today) get accepted again. SCHEMA_VERSION default reset to 1 to
match.
229 passed, 0 failed. (HEAD had 15 failures, all linked to the
half-pregnant v2 state above.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the gaps surfaced in the "what is not implemented" audit so the
fleet really is shippable end-to-end. Verified live on the Pi:
- cis490-shipper --ping → HTTP 200 through Caddy + mTLS via the
new wg-pki client CA leaf
- real episode dir → tar+zstd → PUT → HTTP 201 stored
- re-ship same bytes → 200 (idempotent)
- re-ship different bytes under same id → 409 (conflict)
Changes:
orchestrator/episode.py
- EpisodeConfig.revert_at_start / revert_at_end (Tier 0+ snapshot/
revert per docs/architecture.md). When set + qmp_socket present,
EpisodeRunner issues loadvm <snapshot_name> and emits
snapshot_revert / snapshot_revert_failed events on the same
monotonic clock as everything else.
collectors/qmp.py
- savevm() / loadvm() helpers using human-monitor-command, plus a
test against the fake QMP server.
exploits/workloads.py
- chunked_real_binary_upload() returns a ChunkedUpload plan: 8 KiB
base64 chunks (~6 KiB binary each) so msfrpc never sees a buffer-
busting payload. Includes a finalize step that sha256-verifies on
the guest before exec.
- real_binary_workload() now wraps the chunked plan for backwards
compat with single-shot callers.
exploits/driver.py
- Tier-4 dispatch walks the chunked plan in MSFExploitDriver:
each chunk is a separate session_shell_write; finalize verifies;
exec only runs on sha-ok. New events: real_binary_upload_begin,
real_binary_verify, real_binary_aborted.
etc/cis490-orchestrator.service
- Reads /etc/cis490/lab-host.env (FLEET_HOST_ID + optional BRIDGE).
- Grants AmbientCapabilities CAP_NET_RAW (tcpdump for source 4) +
CAP_SYS_ADMIN + CAP_PERFMON (perf for source 3) so collectors
work under hardening.
scripts/install-lab-host.sh
- Writes /etc/cis490/lab-host.env on first install with FLEET_HOST_ID
defaulting to `hostname -s`.
- Best-effort: fetches the Alpine baseline qcow2 (sha512-pinned) and
builds cidata.iso with the in-guest agent embedded; symlinks both
into /opt/cis490/vm/images/ so launchers find them.
scripts/fetch-alpine-baseline.sh
- Idempotent fetcher for the Alpine 3.21 cloud-init nocloud qcow2
matching the sha512 in docs/sources.md.
tools/plot_envelope.py
- Rebuilt to render whatever telemetry the episode dir contains:
proc → QMP block ops → perf IPC/miss-rate → bridge pkts/SYNs →
guest agent load/mem. Missing sources are silently skipped.
tools/index_reader.py
- cis490-index CLI: filter receiver's index.jsonl by host / sample
/ time range, sort, count-by group. Closest thing to a query
interface until we stand up Postgres/Timescale.
samples/README.md
- Rewritten to match the new manifest schema, the kind=real vs mimic
split, the per-(host, slot, ep) selection mechanic, and the
chunked-upload safety story.
Tests: 106 pass (was 102). New cases:
- test_qmp.py — savevm + loadvm (HMP wrapper + error path)
- test_tier4.py — chunked plan splitting, sha-pinned finalize,
end-to-end driver walks all chunks + verify + exec via the fake
msfrpc client
Closes the "what is not implemented" punch list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the three remaining 🚧 items from the README so every collector
the threat-model promises is actually live, and the Tier-4 path
(real-malware fetch + upload + exec) works end-to-end as soon as a
sha256 lands in samples/store/.
Closesspectral/CIS490#4, #5, #6.
== #6 — Bridge pcap wiring ==
EpisodeConfig grows three optional fields:
bridge_iface: str | None # e.g. "br-malware"
bridge_ip: str = "10.200.0.1"
pcap_snaplen: int = 256
When bridge_iface is set, EpisodeRunner spawns tcpdump for the duration
of the schedule (network.pcap), stops it cleanly on episode end, and
runs collectors.pcap.bucketize() to produce netflow.jsonl per the
100-ms schema in docs/data-model.md. EpisodeResult + meta.result
gain rows_netflow + pcap_bytes counters.
vm/launch_demo.sh + launch_target.sh now switch between SLIRP usermode
and tap+bridge based on $BRIDGE — operator pre-creates the tap as a
bridge member, no sudo from the launcher.
run_real_vm_demo.py picks BRIDGE up from env so the fleet runner can
opt entire waves into pcap mode by exporting BRIDGE before invocation.
== #5 — Source 3 perf collector ==
collectors/perf_qemu.py shells out to ``perf stat -p <pid> -I 100 -j``
and parses the per-event JSON stream. Aggregates one row per interval
across the canonical event set (cycles/instructions/cache-{refs,misses}/
branches/branch-misses/page-faults/context-switches), computes IPC +
cache-miss rate. Tolerates missing events (``<not counted>`` /
``<not supported>``) without dropping the row, and skips cleanly when
``perf`` isn't on PATH or the process can't be attached.
EpisodeConfig.enable_perf=True opts into the collector — off by default
because perf needs CAP_SYS_ADMIN or perf_event_paranoid <= 1. When
enabled, runs as a parallel thread alongside the other collectors;
EpisodeResult.rows_perf records the count.
== #4 — Tier 4 (real-malware fetch + upload + exec) ==
tools/fetch_sample.py: pulls a sample by sha256 from MalwareBazaar
(API key from env or samples/.bazaar.token), unzips with the standard
"infected" password, verifies the resulting binary's sha256, lands at
samples/store/<sha256>. Idempotent — already-staged correct binaries
return immediately.
samples/manifest.py: Sample.binary_path(store_root) resolves to the
staged binary path, or None for mimics / not-yet-fetched real samples.
exploits/workloads.py: real_binary_workload(bytes, sample) builds a
Workload that base64-uploads the binary into the shell session via a
heredoc, decodes + chmods + execs it in the background, captures the
PID for clean stop on dormant. Per-profile pid/bin paths so concurrent
samples in the same guest don't collide.
exploits/driver.py: dispatch order is now:
1) sample.kind == "real" + binary staged at sample_store_root
→ real_binary_workload (Tier 4)
2) profile mimic from workloads.workload_for() (Tier 3 v2)
3) None → driver v1 fallback yes-loop
DriverConfig.sample_store_root is the new field; run_tier3_demo.py
wires it to repo_root/samples/store. driver_setup event records
sample_sha256 so trainers can join Tier-4 episodes against the
manifest by hash.
samples/store/.gitkeep added (binaries themselves are gitignored).
Tests: 102 pass (was 86). New suites:
tests/test_perf_qemu.py — parser + builder + perf-missing fallback
tests/test_tier4.py — real_binary_workload base64 round-trip,
stop-cmd kills pidfile, per-profile path
isolation, driver dispatch chooses real vs
mimic correctly, fetcher input validation
and cached-fast-path
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is the chunk that makes "real data" actually flow on multiple
hosts in parallel. End-to-end pipe was up at 613c6fa / 2579683; now
the lab-host side has the diversity + concurrency it needs.
Collectors landed:
collectors/qmp.py — source 2 (oracle). Tiny synchronous QMP
client + row builder + run loop. Tolerates
older qemu without query-stats.
collectors/guest_agent.py — source 5 (deployable). Reads the
virtio-serial host-side socket, parses
agent JSON-lines, re-stamps to the host
monotonic clock, persists.
collectors/pcap.py — source 4 (deployable). tcpdump capture
+ pure-Python pcap reader + 100 ms
netflow.jsonl bucketizer. Decodes
Ethernet/IPv4/TCP/UDP enough for the
schema in docs/data-model.md.
In-guest agent:
vm/guest-agent/cis490_agent.py — stdlib-only Python agent. Reads
/proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs,
thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent.
tools/build_cidata.py — embeds the agent + an OpenRC service into
user-data so first boot of the Alpine cidata image auto-starts it.
Launchers:
vm/launch_demo.sh / launch_target.sh — second virtio-serial port for
the agent socket; SLOT env support so multiple VMs run without
socket / port collisions; PORT_BASE on launch_target so multiple
target VMs hostfwd different host ports.
vm/setup_bridge.sh — creates host-only br-malware (10.200.0.1/24,
no NAT). Idempotent.
Fleet:
orchestrator/fleet.py — capacity detector (cores / RAM / load
headroom) + concurrent-slot runner. Per-slot ENV selects the
sample. FleetCapacity dataclass round-trips into meta.json so
"this episode ran with 6 concurrent VMs" is auditable post-hoc.
tools/run_fleet.py — CLI: --capacity report; --waves N runs N
waves of (max_concurrent) episodes each, every slot with a
different sample.
etc/cis490-orchestrator.service — now drives the fleet runner with
Restart=always so each invocation runs one wave and respawns,
giving a continuous stream.
Samples:
samples/manifest.toml — six profiles spanning the five major
behaviour shapes. Each entry is real OR mimic (sha256 distinguishes).
samples/manifest.py — strict TOML loader (rejects dups, unknown
categories) + deterministic select(host_id, slot, episode_index)
so different hosts on the network walk the catalog in different
orders without any coordinator.
EpisodeRunner:
orchestrator/episode.py — optional qmp_socket + guest_agent_socket
fields on EpisodeConfig; when set, additional collector threads
run alongside proc_qemu. EpisodeResult now carries rows_qmp +
rows_guest counters.
Tier-3 setup automation:
scripts/install-msfrpcd.sh — installs metasploit-framework where
the package manager has it, generates a strong password into
/etc/cis490/msfrpc.env, drops a hardened systemd unit bound to
127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch
once MSFRPC_PASSWORD is sourced.
scripts/fetch-metasploitable2.sh — accepts IMAGE_URL + IMAGE_SHA256
from the operator (Rapid7 download is registration-walled), pulls,
verifies, converts vmdk → qcow2, lands at vm/images/.
Tests: 82 pass (was 51). New suites:
tests/test_qmp.py — fake QMP server, capability handshake,
blockstats, async-event interleaving,
5-failure backoff
tests/test_guest_agent.py — fake virtio socket, JSON-lines read +
re-stamp, malformed-line tolerance
tests/test_pcap.py — synthetic pcap with TCP/UDP/ARP frames,
bucketize correctness across windows
tests/test_fleet.py — capacity math (8-core idle / low-RAM /
high-load / Pi5 / 1-core box), manifest
selection determinism + diversity
What's queued for the next commit (already discussed in convo):
- MSFExploitDriver v2: map sample.profile → distinct in-session
workload so Tier-3 episodes don't all produce the same yes-loop
envelope. Critical for ML to learn varied malware shapes.
- Real-sample fetch from MalwareBazaar by sha256.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end: ``python -m orchestrator --target-pid <pid> --duration N`` now
writes a complete episode directory matching docs/data-model.md, with phase
labels, events, and a 10 Hz host /proc telemetry stream. No VM yet — pid is
arbitrary so we can validate the loop against e.g. ``sleep 5`` while the lab
side comes up.
collectors/proc_qemu.py — parses /proc/<pid>/{stat,io,status} (handles parens
in comm), single-shot collect_once(), and a stop-event-driven run_loop()
that ticks at a fixed cadence and exits when the pid disappears. Tagged
``available_in_deployment: false`` per the threat-model doc.
orchestrator/episode.py — EpisodeRunner: creates data/episodes/<ulid>/,
atomic meta.json, events.jsonl + labels.jsonl writers, drives the collector
in a thread for duration_s, writes done.marker last so the shipper never
sees a half-finished episode.
orchestrator/ulid.py — tiny 26-char Crockford-base32 ULID generator.
Time-sortable, no third-party dep.
orchestrator/__main__.py — CLI entry point.
Tests (15 new, 28 total green):
- proc_qemu: real-ish stat with parens-in-comm, missing /proc/<pid>/io,
missing pid, run_loop cadence, run_loop terminates when pid disappears.
- episode: full directory shape against os.getpid(), id override,
done.marker written after meta.json finalize.
- ulid: length+alphabet, 2000-burst uniqueness, time-sortability.
Smoke-tested against ``sleep 10``: 16 rows over 1.5s at 100ms cadence,
monotonic clock, RSS stable at ~3.5 MiB as expected for an idle sleep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>