Lays down the design surface for the CIS490 behavioral-malware-detection dataset and model. No code yet: schema and topology are decided first so collection can start without rework.

Docs:
- README: project goal, navigation
- architecture: lab topology, KVM choice, episode state machine, deployment-mirror reasoning
- threat-model: train/serve parity rule, oracle-vs-deployable feature split, two-model evaluation strategy
- data-model: per-episode JSONL layout, row schemas, phase enum
- transport: WG-native shipper/receiver design, idempotent uploads
- deploy: one-command install for lab-host and receiver roles
- lab-setup: KVM prereqs, VM build, snapshot, virtio-serial wiring

Skeleton: orchestrator/, collectors/, vm/, exploits/, samples/, training/ (each with a short README explaining purpose).

Extended .gitignore to exclude qcow2 images, pcaps, sample binaries, secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Lab Setup

How to bring up the host, build the guest, and verify the snapshot loop.

## Host prerequisites

```
qemu-system-x86_64 >= 8.0
qemu-img >= 8.0
bridge-utils
tcpdump / tshark
linux-tools-common (for `perf`)
zstd
python >= 3.11
uv (https://github.com/astral-sh/uv)
```

`scripts/install-lab-host.sh` installs all of these and wires up systemd;
see [`deploy.md`](deploy.md).

KVM must be enabled in the kernel, and the user running QEMU must be in the `kvm` group:

```
ls /dev/kvm    # must exist
groups         # must include kvm
```

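The same prerequisites can be checked from Python before the orchestrator starts. This is only a sketch: the tool list is an assumption derived from the package list above (for example, `perf` is assumed to be the binary that `linux-tools-common` provides), and `missing_tools` and `kvm_usable` are hypothetical helper names, not part of the repo.

```python
import os
import shutil
from typing import Callable, Iterable, Optional

# Assumed binary names behind the packages listed in this section.
REQUIRED_TOOLS = ("qemu-system-x86_64", "qemu-img", "tcpdump", "perf", "zstd", "uv")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def kvm_usable(device: str = "/dev/kvm") -> bool:
    """True if the KVM device exists and this user may open it read/write."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"missing: {tool}")
    print("kvm ok" if kvm_usable() else "kvm not usable (no device, or user not in kvm group)")
```

Checking access to `/dev/kvm` directly covers both shell checks at once, since group membership only matters insofar as it grants that access.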
## Network: host-only malware bridge

`br-malware` (10.200.0.1/24) is the only network the guest sees, and it is
host-only: no NAT, no upstream route. The host's WireGuard (WG) interface is on a
*separate* link (`wg0`) used only for shipping completed episodes to the
collector; the bridge and WG never touch.

| Interface | Purpose |
|---|---|
| `br-malware` (10.200.0.1/24) | host-only bridge, only NIC attached to the guest |
| guest `eth0` | DHCP from a dnsmasq bound only to `br-malware` |
| host WG (`wg0`) | shipping channel to the collector, not connected to the bridge |

> Detailed firewall rules and the egress-drop safety net are out of scope for
> this document and live in the deploy script. The relevant invariant for
> readers is: **the guest cannot route off `br-malware`, period.**

## Guest: Metasploitable 2

1. Download from the [Rapid7 mirror](https://information.rapid7.com/download-metasploitable-2017.html)
   (verify sha256 against the published value before use).
2. Convert the VMware disk (VMDK) to qcow2:

   ```
   qemu-img convert -O qcow2 -p Metasploitable.vmdk metasploitable2.qcow2
   ```

3. First boot (no snapshot yet): let it come up, log in (msfadmin/msfadmin),
   confirm services are listening on the expected ports, shut down cleanly.
4. Take the baseline snapshot:

   ```
   qemu-img snapshot -c baseline-v1 metasploitable2.qcow2
   ```

Reverting to an internal qcow2 snapshot takes well under a second; this is the
"factory reset" mechanism for every episode. A snapshot created offline with
`qemu-img` holds disk state only, so the guest boots fresh after each revert.

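The sha256 check in step 1 can be scripted so it is never skipped. The sketch below uses only the standard library; `sha256_of` and `matches_published` are hypothetical helper names, and the published digest has to be copied from the download page by hand.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large image never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Compare against the published digest, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

A caller would refuse to run the conversion step when `matches_published` returns `False`.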
## Single-vCPU constrained-device emulation

The guest runs with a single vCPU and 512 MiB of RAM:

```
-cpu host -smp 1,sockets=1,cores=1,threads=1
-m 512
-machine type=q35,accel=kvm
```

A host-side cgroup CPU cap on the QEMU process (e.g. 80% of one core) is applied on top, so
the guest behaves like a small, constrained device under load.

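One way the orchestrator might assemble this command line is sketched below. The `-drive` argument and the use of `systemd-run --scope -p CPUQuota=...` for the cgroup cap are assumptions for illustration (the document does not say how the disk is attached or how the cap is applied), and `qemu_argv` and `with_cpu_cap` are hypothetical names.

```python
from pathlib import Path


def qemu_argv(image: Path, memory_mib: int = 512) -> list[str]:
    """QEMU command line for the single-vCPU guest, using the flags above."""
    return [
        "qemu-system-x86_64",
        "-machine", "type=q35,accel=kvm",
        "-cpu", "host",
        "-smp", "1,sockets=1,cores=1,threads=1",
        "-m", str(memory_mib),
        # Assumption: the disk is attached with QEMU's default drive interface.
        "-drive", f"file={image},format=qcow2",
    ]


def with_cpu_cap(argv: list[str], percent_of_one_core: int = 80) -> list[str]:
    """Wrap argv in a transient systemd scope that carries a CPUQuota cap."""
    if not 0 < percent_of_one_core <= 100:
        raise ValueError("cap must be within (0, 100] percent of one core")
    return ["systemd-run", "--scope", "-p", f"CPUQuota={percent_of_one_core}%", *argv]
```

Building the argument list as data rather than a shell string keeps it easy to log alongside each episode and avoids quoting problems in paths.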
## Telemetry channels

### virtio-serial for the in-guest agent

```
-device virtio-serial-pci
-chardev socket,path=/run/qemu/guest-agent.sock,server=on,wait=off,id=ga
-device virtserialport,chardev=ga,name=cis490.guest.agent
```

The in-guest agent opens `/dev/virtio-ports/cis490.guest.agent` and writes
JSONL to it. On the host side, the orchestrator reads from the Unix socket. The
channel never touches the network, so malware cannot interfere with it over `br-malware`.

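A minimal sketch of the host-side reader follows, assuming the agent writes one JSON object per line. Because QEMU owns the listening socket (`server=on`), the orchestrator connects as a client. `connect_agent` and `iter_jsonl` are hypothetical names, and the row fields are whatever the data-model doc defines.

```python
import json
import socket
from typing import Iterator


def connect_agent(path: str = "/run/qemu/guest-agent.sock") -> socket.socket:
    """Connect as a client; QEMU is the listening side of this socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def iter_jsonl(sock: socket.socket) -> Iterator[dict]:
    """Yield one dict per complete JSON line; skip blank or malformed lines."""
    with sock.makefile("rb") as stream:
        for raw in stream:
            line = raw.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except ValueError:
                continue  # e.g. a line cut short when the guest is reset
            if isinstance(row, dict):
                yield row
```

Skipping malformed lines instead of raising is a design choice for this sketch: a guest reset can truncate the final line of an episode, and one bad row should not end collection.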
### QMP for live oracle queries

```
-qmp unix:/run/qemu/qmp.sock,server=on,wait=off
```

The orchestrator polls `query-stats`, `query-blockstats`, and netdev stats over
this socket.

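The polling side can be sketched with a small synchronous client. QMP sends a greeting on connect and expects `qmp_capabilities` before any other command; asynchronous events can arrive between a command and its reply, so the client skips them. `QmpClient` is a hypothetical class for illustration, not the project's implementation.

```python
import json
import socket
from typing import Any, Optional


class QmpClient:
    """Tiny blocking QMP client: handshake once, then one command at a time."""

    def __init__(self, sock: socket.socket) -> None:
        self._stream = sock.makefile("rwb")
        greeting = self._read()
        if "QMP" not in greeting:
            raise ConnectionError("peer did not send a QMP greeting")
        self.execute("qmp_capabilities")  # leave capability-negotiation mode

    def _read(self) -> dict:
        line = self._stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        return json.loads(line)

    def execute(self, command: str, arguments: Optional[dict] = None) -> Any:
        message: dict = {"execute": command}
        if arguments is not None:
            message["arguments"] = arguments
        self._stream.write(json.dumps(message).encode() + b"\n")
        self._stream.flush()
        while True:
            reply = self._read()
            if "event" in reply:
                continue  # asynchronous event, not the answer to our command
            if "error" in reply:
                raise RuntimeError(reply["error"].get("desc", "QMP error"))
            return reply["return"]
```

With a socket connected to `/run/qemu/qmp.sock`, a poll might look like `client.execute("query-blockstats")` or `client.execute("query-stats", {"target": "vm"})`; `query-stats` takes a `target` argument.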
### perf stat on the QEMU process

```
perf stat -p <qemu_pid> -I 100 \
  -e cycles,instructions,cache-references,cache-misses,branches,branch-misses,page-faults,context-switches \
  -x , -o telemetry-perf.csv
```

`-I 100` prints the counters every 100 ms, and `-x ,` switches the output to
comma-separated fields. The collector tails the CSV, parses each interval row,
and emits JSONL.

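The parsing step might look like the sketch below. It assumes the field order that `perf stat` documents for `-I` combined with `-x`: timestamp, counter value, unit, event name, followed by run-time columns that this sketch ignores. `parse_perf_line` and the output keys (`t_s`, `event`, `value`, `unit`) are hypothetical; the real row schema belongs to the data-model doc.

```python
from typing import Optional, Union


def _number(text: str) -> Optional[Union[int, float]]:
    """Convert a counter field; perf prints e.g. '<not counted>' when it has no value."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return None


def parse_perf_line(line: str) -> Optional[dict]:
    """Parse one interval row into a JSON-ready dict, or None if it is not a data row."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or a comment line
    fields = line.split(",")
    if len(fields) < 4:
        return None
    try:
        t_s = float(fields[0])
    except ValueError:
        return None
    return {
        "t_s": t_s,
        "event": fields[3],
        "value": _number(fields[1]),
        "unit": fields[2] or None,
    }
```

Keeping uncounted values as `None` rather than `0` preserves the difference between "no events" and "counter was not running" for later feature extraction.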
### tcpdump on `br-malware`

```
tcpdump -i br-malware -w network.pcap -B 4096 -s 200
```

`-s 200` keeps only the first 200 bytes of each packet, and `-B 4096` sets a
4 MiB capture buffer. The capture is post-processed into `netflow.jsonl` with
100 ms buckets.

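The bucketing itself is simple once packets have been read from the pcap. The sketch below assumes some reader has already produced `(timestamp_us, wire_length)` pairs; reading the pcap is left out. `bucket_packets` and the output keys are hypothetical, not the project's `netflow.jsonl` schema.

```python
from collections import defaultdict
from typing import Iterable

BUCKET_US = 100_000  # 100 ms expressed in microseconds


def bucket_packets(packets: Iterable[tuple[int, int]]) -> list[dict]:
    """Aggregate (timestamp_us, wire_length) pairs into 100 ms buckets.

    Integer microseconds avoid float rounding at bucket edges.
    """
    totals: dict[int, list[int]] = defaultdict(lambda: [0, 0])
    for ts_us, length in packets:
        bucket = totals[ts_us // BUCKET_US]
        bucket[0] += 1       # packet count
        bucket[1] += length  # byte count
    return [
        {"t_ms": index * 100, "packets": count, "bytes": size}
        for index, (count, size) in sorted(totals.items())
    ]
```

Because the capture truncates packets to 200 bytes, byte counts should come from the original on-wire length recorded in each pcap record header, not from the captured length.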
## Snapshot loop sanity check

A green light before any data collection:

1. `qemu-img snapshot -l metasploitable2.qcow2` shows `baseline-v1`.
2. Boot the VM with the qcow2.
3. Touch a file in the guest. Shut down.
4. `qemu-img snapshot -a baseline-v1 metasploitable2.qcow2`.
5. Boot again. The file is gone. ✅

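Steps 1 and 4 of this loop can be automated as sketched below. Instead of scraping the `snapshot -l` table, the sketch reads `qemu-img info --output=json`, which lists internal snapshots under a `snapshots` key. The helper names are hypothetical, and the boot and in-guest steps are left to the orchestrator.

```python
import json
import subprocess


def snapshot_names(info_json: str) -> list[str]:
    """Extract snapshot tags from `qemu-img info --output=json` output."""
    info = json.loads(info_json)
    return [snap["name"] for snap in info.get("snapshots", [])]


def has_snapshot(image: str, tag: str = "baseline-v1") -> bool:
    """Step 1: confirm the baseline tag exists in the image."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", image],
        check=True, capture_output=True, text=True,
    )
    return tag in snapshot_names(result.stdout)


def revert(image: str, tag: str = "baseline-v1") -> None:
    """Step 4: apply the snapshot; only safe while the VM is shut down."""
    subprocess.run(["qemu-img", "snapshot", "-a", tag, image], check=True)
```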
## Safety checks before running real samples

- `ip route show table all | grep br-malware` shows no default route and no
  route to any destination beyond the bridge's own subnet.
- `dig @10.200.0.1 example.com` (the host's bridge address) from the guest fails (no DNS for malware).
- The host's WG interface is **not** bridged to `br-malware`.

(See `scripts/install-lab-host.sh` for the firewall plumbing; it isn't the
focus of this project.)

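The route check can be made machine-readable with `ip -j`, which prints JSON. The sketch below is IPv4-only and assumes the output of `ip -j -4 route show table all`, where each entry carries `dst` and `dev` keys; `unexpected_routes` is a hypothetical helper and does not replace the firewall rules in the deploy script.

```python
import ipaddress
import json


def unexpected_routes(ip_json: str, dev: str = "br-malware",
                      subnet: str = "10.200.0.0/24") -> list[dict]:
    """Return routes on `dev` whose destination lies outside `subnet`."""
    allowed = ipaddress.ip_network(subnet)
    bad = []
    for route in json.loads(ip_json):
        if route.get("dev") != dev:
            continue
        dst = route.get("dst", "default")
        if dst == "default":
            bad.append(route)  # a default route via the bridge is never expected
            continue
        # A bare address such as "10.200.0.1" is treated as a /32.
        network = ipaddress.ip_network(dst, strict=False)
        if not network.subnet_of(allowed):
            bad.append(route)
    return bad
```

An empty result means every route through the bridge stays inside 10.200.0.0/24; anything else should block the run.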
## Where to put VMs and snapshots

```
vm/images/      # qcow2 disk images (gitignored)
vm/snapshots/   # named snapshot exports if we ever externalize them
```

Both directories are gitignored. The repo only carries the *recipes* for
reproducing them.