Lays down the design surface for the CIS490 behavioral-malware-detection dataset and model. No code yet; schema and topology are decided first so collection can start without rework.

Docs:
- README: project goal, navigation
- architecture: lab topology, KVM choice, episode state machine, deployment-mirror reasoning
- threat-model: train/serve parity rule, oracle-vs-deployable feature split, two-model evaluation strategy
- data-model: per-episode JSONL layout, row schemas, phase enum
- transport: WG-native shipper/receiver design, idempotent uploads
- deploy: one-command install for lab-host and receiver roles
- lab-setup: KVM prereqs, VM build, snapshot, virtio-serial wiring

Skeleton: orchestrator/, collectors/, vm/, exploits/, samples/, training/ (each with a short README explaining purpose).

Extended .gitignore to exclude qcow2 images, pcaps, sample binaries, secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
collectors/
One module per telemetry source. All collectors:
- Receive an `episode_id`, an output directory, and a shared `t_mono_origin_ns`.
- Write JSONL into `data/episodes/<episode_id>/telemetry-<name>.jsonl`.
- Stamp every row with the same `t_mono_ns`/`t_wall_ns` clock pair.
- Stamp every row with `source` and `available_in_deployment` (true/false).
- Exit cleanly on `SIGTERM` from the orchestrator.
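
The contract above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `run_collector`, `install_sigterm_handler`, the `sample` callback, `interval_s`, and `max_rows` are hypothetical names, and two details are assumptions the list does not settle: that `t_mono_ns` is measured relative to `t_mono_origin_ns`, and that the output directory passed in is the `data/episodes` root.

```python
import json
import signal
import threading
import time
from pathlib import Path


def install_sigterm_handler(stop: threading.Event) -> None:
    """Ask the collector loop to finish when the orchestrator sends SIGTERM."""
    # Must be called from the main thread (a requirement of signal.signal).
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())


def run_collector(name, episode_id, out_root, t_mono_origin_ns, sample, stop,
                  *, available_in_deployment, interval_s=1.0, max_rows=None):
    """Append stamped rows to <out_root>/<episode_id>/telemetry-<name>.jsonl."""
    episode_dir = Path(out_root) / episode_id  # assumption: out_root is data/episodes
    episode_dir.mkdir(parents=True, exist_ok=True)
    path = episode_dir / f"telemetry-{name}.jsonl"
    written = 0
    with path.open("a", encoding="utf-8") as fh:
        while not stop.is_set() and (max_rows is None or written < max_rows):
            row = {
                # Assumption: t_mono_ns is relative to the shared origin.
                "t_mono_ns": time.monotonic_ns() - t_mono_origin_ns,
                "t_wall_ns": time.time_ns(),
                "source": name,
                "available_in_deployment": available_in_deployment,
                **sample(),  # collector-specific fields (hypothetical callback)
            }
            fh.write(json.dumps(row) + "\n")
            fh.flush()  # keep the file complete if the loop is stopped soon after
            written += 1
            stop.wait(interval_s)  # returns as soon as stop is set
    return path
```

A real collector would call `install_sigterm_handler(stop)` once at startup and then `run_collector(...)` with its own `sample` function; `max_rows` exists only to make the sketch easy to exercise.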
| Module | Source | Vantage | Role |
|---|---|---|---|
| `proc_qemu.py` | host `/proc/<qemu_pid>/{stat,io,status,schedstat}` | outside guest | oracle |
| `qmp.py` | QEMU QMP `query-stats`, `query-blockstats`, netdev | outside guest | oracle |
| `perf_qemu.py` | `perf stat -p <qemu_pid>` | outside guest | oracle |
| `pcap.py` | `tcpdump -i br-malware`, bucketed | gateway-side | feature |
| `guest_agent.py` | virtio-serial reader, parses agent JSONL | inside guest | feature |
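
To make the host-side `/proc` source in the first row concrete, here is a small sketch of turning `/proc/<qemu_pid>/io` into counters that could be merged into a telemetry row. `parse_proc_io` and `read_proc_io` are hypothetical helper names, not taken from `proc_qemu.py`; the sketch relies only on the kernel's documented `key: value` layout for that file.

```python
def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a simple non-negative integer counter.
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid: int) -> dict[str, int]:
    """Read and parse /proc/<pid>/io (needs permission to inspect that process)."""
    with open(f"/proc/{pid}/io", encoding="ascii") as fh:
        return parse_proc_io(fh.read())
```

Keeping the parser separate from the file read means it can be checked against captured text without a running QEMU process.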
The in-guest agent itself (a small Python+psutil program that runs on the
guest and writes to /dev/virtio-ports/cis490.guest.agent) lives under
vm/guest-agent/ because it is shipped into the guest at image-build time.
See docs/data-model.md for row schemas.
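
On the host side, `guest_agent.py` parses the JSONL that the in-guest agent writes. A minimal sketch of that parsing step, assuming the reader should tolerate blank or truncated lines rather than abort (the source does not specify this policy), might look like the following. `iter_agent_rows` is a hypothetical name, and how the host opens the virtio-serial stream is left out because it is not described here.

```python
import json
from typing import Iterable, Iterator


def iter_agent_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed JSONL line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line between rows
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line cut short while the guest was writing
        if isinstance(row, dict):  # rows are expected to be JSON objects
            yield row
```

Any iterable of text lines works as input, including an open file object, so the same function can be exercised against a recorded capture.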