Lays down the design surface for the CIS490 behavioral-malware-detection dataset and model. No code yet: schema and topology are decided first so collection can start without rework.

Docs:
- README: project goal, navigation
- architecture: lab topology, KVM choice, episode state machine, deployment-mirror reasoning
- threat-model: train/serve parity rule, oracle-vs-deployable feature split, two-model evaluation strategy
- data-model: per-episode JSONL layout, row schemas, phase enum
- transport: WG-native shipper/receiver design, idempotent uploads
- deploy: one-command install for lab-host and receiver roles
- lab-setup: KVM prereqs, VM build, snapshot, virtio-serial wiring

Skeleton: orchestrator/, collectors/, vm/, exploits/, samples/, training/ (each with a short README explaining purpose).

Extended .gitignore to exclude qcow2 images, pcaps, sample binaries, secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lab Setup
How to bring up the host, build the guest, and verify the snapshot loop.
Host prerequisites
qemu-system-x86_64 >= 8.0
qemu-img >= 8.0
bridge-utils
tcpdump / tshark
linux-tools-common (for `perf`)
zstd
python >= 3.11
uv (https://github.com/astral-sh/uv)
scripts/install-lab-host.sh installs all of these and wires up systemd —
see deploy.md.
KVM must be enabled in the kernel and the user must be in the kvm group:
ls /dev/kvm # must exist
groups # must include kvm
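The two manual checks above could also be run as a small preflight from Python. This is a minimal sketch, not part of the repository: the function names `kvm_preflight` and `current_group_names` are hypothetical, and it only mirrors the `ls /dev/kvm` and `groups` checks.

```python
import grp
import os


def kvm_preflight(user_groups: set[str], kvm_path: str = "/dev/kvm") -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not os.path.exists(kvm_path):
        problems.append(f"{kvm_path} does not exist (is KVM enabled?)")
    if "kvm" not in user_groups:
        problems.append("current user is not in the kvm group")
    return problems


def current_group_names() -> set[str]:
    """Names of the groups the current process belongs to."""
    names = set()
    for gid in {*os.getgroups(), os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid with no name entry; ignore it
    return names


if __name__ == "__main__":
    for problem in kvm_preflight(current_group_names()):
        print("FAIL:", problem)
```

Keeping the group set as a parameter makes the check easy to exercise without touching the real host state.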
Network: host-only malware bridge
br-malware (10.200.0.1/24) is the only network the guest sees, and it is
host-only — no NAT, no upstream route. The host's WG interface is on a
separate link (wg0) used only for shipping completed episodes to the
collector; the bridge and WG never touch.
| Interface | Purpose |
|---|---|
| br-malware (10.200.0.1/24) | host-only bridge, only NIC attached to the guest |
| guest eth0 | DHCP from a dnsmasq bound only to br-malware |
| host WG (wg0) | shipping channel to the collector, not connected to the bridge |
Detailed firewall rules and the egress-drop safety net are out of scope for this document and live in the deploy script. The relevant invariant for readers is: the guest cannot route off
br-malware, period.
Guest: Metasploitable 2
- Download from the Rapid7 mirror (verify sha256 against the published value before use).
- Convert VMware → qcow2:

      qemu-img convert -O qcow2 -p Metasploitable.vmdk metasploitable2.qcow2

- First boot (no snapshot yet): let it come up, log in (msfadmin/msfadmin), confirm services are listening on the expected ports, shut down cleanly.
- Take the baseline snapshot:

      qemu-img snapshot -c baseline-v1 metasploitable2.qcow2

Internal qcow2 snapshots load in well under a second; this is the "factory reset" mechanism for every episode.
Single-vCPU constrained-device emulation
-cpu host -smp 1,sockets=1,cores=1,threads=1
-m 512
-machine type=q35,accel=kvm
Plus a host-side cgroup CPU cap on the QEMU process (e.g. 80% of one core) so the guest behaves like a small, constrained device under load.
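One way the flags and the CPU cap could be combined is to launch QEMU inside a transient systemd scope. The sketch below is an assumption about how the orchestrator might build that command, not the project's actual launcher: `qemu_argv` is a hypothetical helper, and using `systemd-run --scope -p CPUQuota=...` is one of several ways to apply a cgroup cap. Disk and telemetry arguments are passed in by the caller because this document does not fix them here.

```python
def qemu_argv(extra: list[str], cpu_quota: str = "80%") -> list[str]:
    """Build a capped single-vCPU QEMU command line as an argv list.

    `extra` carries the disk and telemetry arguments chosen by the caller.
    """
    qemu = [
        "qemu-system-x86_64",
        "-machine", "type=q35,accel=kvm",
        "-cpu", "host",
        "-smp", "1,sockets=1,cores=1,threads=1",
        "-m", "512",
    ]
    # CPUQuota=80% limits the scope to 80% of one CPU's time.
    wrapper = ["systemd-run", "--scope", "-p", f"CPUQuota={cpu_quota}", "--"]
    return wrapper + qemu + list(extra)


if __name__ == "__main__":
    print(" ".join(qemu_argv(["-qmp", "unix:/run/qemu/qmp.sock,server=on,wait=off"])))
```

Returning an argv list rather than a shell string avoids quoting problems when the result is handed to `subprocess`.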
Telemetry channels
virtio-serial for the in-guest agent
-device virtio-serial-pci
-chardev socket,path=/run/qemu/guest-agent.sock,server=on,wait=off,id=ga
-device virtserialport,chardev=ga,name=cis490.guest.agent
The in-guest agent opens /dev/virtio-ports/cis490.guest.agent and writes
JSONL to it. On the host side, the orchestrator reads from the unix socket. The
channel never touches br-malware, so malware traffic on the network cannot interfere with it.
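The host side of this channel can be sketched as a line-oriented reader on the unix socket. This is illustrative only: `iter_agent_rows` and `connect_agent` are hypothetical names, and skipping malformed lines is an assumed policy (a guest reset can cut a line short), not something the design docs prescribe.

```python
import json
import socket
from typing import Iterator


def connect_agent(path: str = "/run/qemu/guest-agent.sock") -> socket.socket:
    """Connect to the chardev socket that QEMU is listening on."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def iter_agent_rows(sock: socket.socket) -> Iterator[dict]:
    """Yield one dict per well-formed JSONL line until the peer closes."""
    with sock.makefile("r", encoding="utf-8", errors="replace") as stream:
        for line in stream:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # assumed policy: drop truncated or garbled lines
            if isinstance(row, dict):
                yield row
```

A caller would typically run `for row in iter_agent_rows(connect_agent()): ...` and append each row to the episode's JSONL file.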
QMP for live oracle queries
-qmp unix:/run/qemu/qmp.sock,server=on,wait=off
The orchestrator polls query-stats, query-blockstats, and netdev stats over
this socket.
perf stat on the QEMU process
perf stat -p <qemu_pid> -I 100 \
-e cycles,instructions,cache-references,cache-misses,branches,branch-misses,page-faults,context-switches \
-x , -o telemetry-perf.csv
The collector tails the CSV, parses, and emits JSONL.
tcpdump on br-malware
tcpdump -i br-malware -w network.pcap -B 4096 -s 200
Post-process to netflow.jsonl with 100ms buckets.
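The bucketing itself is simple once packets have been reduced to (timestamp, length) pairs. The sketch below shows only that aggregation step under stated assumptions: timestamps are seconds relative to episode start, lengths are on-wire byte counts from the pcap record headers (the `-s 200` snaplen truncates payloads, not the recorded length), and the output keys `t_ms`, `packets`, and `bytes` are placeholders rather than the real netflow.jsonl schema.

```python
from collections import defaultdict

BUCKET_MS = 100


def bucket_packets(packets, bucket_ms: int = BUCKET_MS) -> list[dict]:
    """Aggregate (timestamp_seconds, length_bytes) pairs into fixed buckets."""
    totals = defaultdict(lambda: [0, 0])  # bucket index -> [packets, bytes]
    for timestamp, length in packets:
        index = int(timestamp * 1000) // bucket_ms
        totals[index][0] += 1
        totals[index][1] += length
    return [
        {"t_ms": index * bucket_ms, "packets": count, "bytes": size}
        for index, (count, size) in sorted(totals.items())
    ]
```

Buckets with no traffic are simply absent from the result; a consumer that needs a dense time series would have to fill them with zeros.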
Snapshot loop sanity check
A green light before any data collection:
- `qemu-img snapshot -l metasploitable2.qcow2` shows `baseline-v1`.
- Boot the VM with the qcow2.
- Touch a file in the guest. Shut down.
- `qemu-img snapshot -a baseline-v1 metasploitable2.qcow2`.
- Boot again. The file is gone. ✅
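The first step of this check can be automated by asking `qemu-img` for machine-readable output instead of reading the snapshot table by eye. This is a sketch under the assumption that `qemu-img info --output=json` is used in place of `qemu-img snapshot -l`; the helper names are hypothetical.

```python
import json
import subprocess


def snapshot_names(info_json: str) -> list[str]:
    """Extract snapshot tags from `qemu-img info --output=json` output."""
    info = json.loads(info_json)
    # The "snapshots" key is absent when the image has no internal snapshots.
    return [snap["name"] for snap in info.get("snapshots", [])]


def has_snapshot(image: str, tag: str = "baseline-v1") -> bool:
    """Return True if the qcow2 image contains an internal snapshot `tag`."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return tag in snapshot_names(result.stdout)
```

The orchestrator could call `has_snapshot("metasploitable2.qcow2")` before each run and refuse to start an episode if the baseline is missing.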
Safety checks before running real samples
- `ip route show table all | grep br-malware` shows no route off the bridge.
- `dig @host example.com` from a guest fails (no DNS for malware).
- The host's WG interface is not bridged to br-malware.
(See scripts/install-lab-host.sh for the firewall plumbing — it isn't the
focus of this project.)
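Two of these checks lend themselves to a scripted version. The sketch below is one interpretation, not the project's tooling: it treats "a route off the bridge" as any routing entry on br-malware that has a gateway or is a default route, reads routes from `ip -j route show table all` (iproute2's JSON output), and lists bridge members from sysfs. All function names are hypothetical.

```python
import json
import os
import subprocess


def offlink_routes(routes: list[dict], dev: str = "br-malware") -> list[dict]:
    """Routes on `dev` that would forward traffic beyond the local link."""
    return [
        route
        for route in routes
        if route.get("dev") == dev
        and ("gateway" in route or route.get("dst") == "default")
    ]


def load_routes() -> list[dict]:
    """Read every routing table as JSON via iproute2."""
    out = subprocess.run(
        ["ip", "-j", "route", "show", "table", "all"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def bridge_members(bridge: str = "br-malware") -> list[str]:
    """Interfaces enslaved to the bridge, read from sysfs."""
    return sorted(os.listdir(f"/sys/class/net/{bridge}/brif"))
```

A preflight would then require `offlink_routes(load_routes()) == []` and `"wg0" not in bridge_members()` before any real sample is run.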
Where to put VMs and snapshots
vm/images/ # qcow2 disk images (gitignored)
vm/snapshots/ # named snapshot exports if we ever externalize them
Both directories are gitignored. The repo only carries the recipes for reproducing them.