Lays down the design surface for the CIS490 behavioral-malware-detection dataset and model. No code yet: schema and topology are decided first so collection can start without rework.

Docs:
- README: project goal, navigation
- architecture: lab topology, KVM choice, episode state machine, deployment-mirror reasoning
- threat-model: train/serve parity rule, oracle-vs-deployable feature split, two-model evaluation strategy
- data-model: per-episode JSONL layout, row schemas, phase enum
- transport: WG-native shipper/receiver design, idempotent uploads
- deploy: one-command install for lab-host and receiver roles
- lab-setup: KVM prereqs, VM build, snapshot, virtio-serial wiring

Skeleton: orchestrator/, collectors/, vm/, exploits/, samples/, training/ (each with a short README explaining purpose).

Extended .gitignore to exclude qcow2 images, pcaps, sample binaries, secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
samples/
Sample binaries are NEVER committed to this repo. This directory holds:
- manifest.yaml: sha256-pinned list of samples to fetch, with metadata (source, category, expected behavior, target CVE).
- fetch.py: script that pulls samples from configured sources (MalwareBazaar, theZoo, vx-underground), verifies sha256, and stores them under samples/store/ (gitignored).
- Per-sample notes in markdown describing observed behavior in our lab.
samples/store/ lives only on the lab host. It is gitignored and should
sit on a disk that is not auto-mounted on developer workstations.
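The verify-then-store step that fetch.py is described as performing could look roughly like the sketch below. No code exists in the repo yet, so the function name `store_verified`, the one-file-per-digest layout, and the read-only file mode are all assumptions made for illustration. The download itself is left out; the function only receives bytes that were already fetched.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store_verified(data: bytes, expected_sha256: str, store_dir: Path) -> Path:
    """Write `data` under store_dir only if its sha256 matches the pinned value.

    Hypothetical helper: the name and the on-disk layout are assumptions.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(f"sha256 mismatch: expected {expected_sha256}, got {actual}")

    store_dir.mkdir(parents=True, exist_ok=True)
    final_path = store_dir / actual  # assumed layout: one file per digest

    # Write to a temp file in the same directory, then rename, so a crash
    # never leaves a partially written sample under its final name.
    fd, tmp_name = tempfile.mkstemp(dir=store_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_name, 0o400)  # owner read-only, never executable
        os.replace(tmp_name, final_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final_path
```

Hashing the in-memory bytes before anything touches the store means an unverified sample never appears under samples/store/ at all.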
Manifest entry shape (placeholder)
samples:
  - name: linux.miner.xmrig.elf
    sha256: "..."        # pinned
    source: MalwareBazaar
    category: miner
    target_cve: null     # cryptominers are usually post-exploit payloads
    behavior: "high CPU, periodic stratum protocol traffic"
    pairs_with_exploit: exploit/multi/samba/usermap_script
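Since the entry shape is still a placeholder, the following is only a sketch of how one entry might be validated once the manifest has been loaded into a plain dict (loading the YAML itself would need a third-party parser such as PyYAML, which is not shown). The `SampleEntry` class, the `parse_entry` function, and the choice of which fields are required are assumptions, not decisions recorded in this repo.

```python
import re
from dataclasses import dataclass
from typing import Optional

SHA256_RE = re.compile(r"[0-9a-f]{64}")


@dataclass(frozen=True)
class SampleEntry:
    """One manifest entry; field names mirror the placeholder shape above."""
    name: str
    sha256: str
    source: str
    category: str
    behavior: str
    target_cve: Optional[str] = None          # null for post-exploit payloads
    pairs_with_exploit: Optional[str] = None


def parse_entry(raw: dict) -> SampleEntry:
    """Validate one already-parsed entry (hypothetical helper)."""
    # Assumption: these five fields are mandatory; the rest may be null.
    required = ("name", "sha256", "source", "category", "behavior")
    missing = [key for key in required if not raw.get(key)]
    if missing:
        raise ValueError(f"manifest entry missing fields: {missing}")

    sha = str(raw["sha256"]).lower()
    if not SHA256_RE.fullmatch(sha):
        raise ValueError(f"{raw['name']}: sha256 is not 64 hex characters")

    return SampleEntry(
        name=raw["name"],
        sha256=sha,
        source=raw["source"],
        category=raw["category"],
        behavior=raw["behavior"],
        target_cve=raw.get("target_cve"),
        pairs_with_exploit=raw.get("pairs_with_exploit"),
    )
```

With this check, an entry whose sha256 is still the literal placeholder "..." is rejected rather than fetched unpinned.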
Safety rules
- Only download to the lab host, never to a developer workstation.
- Verify sha256 immediately, before any other read.
- Keep the directory on a path that is not shared or otherwise reachable over the WG overlay.
- Re-verify sha256 before each detonation; refuse to run on mismatch.
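The last rule, re-verifying before each detonation and refusing on mismatch, can be sketched as a small gate that whatever launches a sample calls first. The names `HashMismatch`, `sha256_of_file`, and `require_pinned_hash` are hypothetical; how the orchestrator would actually invoke such a check is not yet specified here.

```python
import hashlib
from pathlib import Path


class HashMismatch(RuntimeError):
    """Raised when a stored sample no longer matches its pinned sha256."""


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large samples are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_pinned_hash(path: Path, expected_sha256: str) -> None:
    """Return quietly on a match; raise HashMismatch otherwise."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise HashMismatch(
            f"{path}: expected {expected_sha256}, got {actual}; refusing to run"
        )
```

Raising an exception instead of returning a boolean makes it harder for a caller to ignore a failed check by accident.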