# orchestrator/

The state machine that drives a single **episode**:

```
snapshot_load → clean → armed → infecting → infected_running → dormant → reverting
```
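
One way to encode this sequence is a string-valued enum with a forward-only transition check. This is a minimal sketch that assumes phases advance strictly in the order shown. The names `Phase`, `next_phase`, and `check_transition` are hypothetical and not taken from the project. Error paths, such as an exploit that never opens a session, are not modeled here.

```python
from enum import Enum


class Phase(str, Enum):
    """Episode phases, declared in the order the orchestrator walks them (hypothetical)."""

    SNAPSHOT_LOAD = "snapshot_load"
    CLEAN = "clean"
    ARMED = "armed"
    INFECTING = "infecting"
    INFECTED_RUNNING = "infected_running"
    DORMANT = "dormant"
    REVERTING = "reverting"


# Iterating an Enum yields members in definition order, so this is the linear sequence.
_ORDER = list(Phase)


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current`; raise if `current` is terminal."""
    idx = _ORDER.index(current)
    if idx == len(_ORDER) - 1:
        raise ValueError(f"{current.value} is terminal")
    return _ORDER[idx + 1]


def check_transition(src: Phase, dst: Phase) -> None:
    """Reject any transition other than the single forward step (assumed linear model)."""
    if next_phase(src) is not dst:
        raise ValueError(f"illegal transition {src.value} -> {dst.value}")
```

Mixing in `str` means a member serializes as its plain name with `json.dumps`. `Phase("dormant")` then recovers the member when reading a label row back.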

Responsibilities:

- Bring up the host-only bridge and verify isolation before the guest starts.
- Boot the guest from a named snapshot.
- Spawn the five telemetry collectors (`collectors/`) with a shared episode id and a shared monotonic clock origin.
- Drive the Metasploit Framework over RPC to fire the configured exploit module.
- Upload and execute the configured malware sample once a session is open.
- Emit phase transitions to `labels.jsonl` *at the moment the action is taken*.
- Revert the guest to its snapshot at episode end.
- Write `meta.json` with a summary of the episode result.
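
The shared episode id, the shared clock origin, and label emission can be sketched together. This sketch makes three assumptions:

- The origin is a single `time.monotonic_ns()` reading taken at episode start and handed to the orchestrator and every collector.
- All processes run on the same host. On Linux, Python's monotonic clock is backed by the system-wide `CLOCK_MONOTONIC`.
- Each label row holds the episode id, an offset from that origin, and the phase name.

`EpisodeClock`, `emit_phase`, and the row field names are hypothetical. They are not the project's actual row schema.

```python
import json
import time
import uuid
from dataclasses import dataclass, field
from typing import TextIO


@dataclass(frozen=True)
class EpisodeClock:
    """Hypothetical shared context passed to the orchestrator and each collector."""

    episode_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Captured once at episode start; every stream reports offsets from this
    # value so rows from different collectors sit on one time axis.
    t0_ns: int = field(default_factory=time.monotonic_ns)

    def offset_ns(self) -> int:
        """Nanoseconds elapsed since the episode's clock origin (never negative)."""
        return time.monotonic_ns() - self.t0_ns


def emit_phase(labels: TextIO, clock: EpisodeClock, phase: str) -> dict:
    """Append one JSONL label row for `phase`, timestamped when this is called."""
    row = {
        "episode_id": clock.episode_id,  # hypothetical field names
        "t_ns": clock.offset_ns(),
        "phase": phase,
    }
    labels.write(json.dumps(row) + "\n")
    # Flush right away so the row leaves Python's buffer even if a later
    # step of the episode crashes the orchestrator.
    labels.flush()
    return row
```

Under this sketch, the orchestrator calls `emit_phase` immediately around each action, for example just before firing the exploit module. A monotonic clock avoids wall-clock jumps (NTP adjustments, manual changes) skewing the offsets between labels and telemetry.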

Implementation lives in this directory and is imported as `orchestrator.*`.