CIS490/samples
Maximus Gorog fa1574a0a6 Scaffold project: docs, repo skeleton, transport + deploy design
Lays down the design surface for the CIS490 behavioral-malware-detection
dataset and model. No code yet — schema and topology are decided first so
collection can start without rework.

Docs:
- README: project goal, navigation
- architecture: lab topology, KVM choice, episode state machine,
  deployment-mirror reasoning
- threat-model: train/serve parity rule, oracle-vs-deployable feature
  split, two-model evaluation strategy
- data-model: per-episode JSONL layout, row schemas, phase enum
- transport: WG-native shipper/receiver design, idempotent uploads
- deploy: one-command install for lab-host and receiver roles
- lab-setup: KVM prereqs, VM build, snapshot, virtio-serial wiring

Skeleton: orchestrator/, collectors/, vm/, exploits/, samples/,
training/ (each with a short README explaining purpose).
Extended .gitignore to exclude qcow2 images, pcaps, sample binaries,
secrets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 23:21:00 -06:00

samples/

Sample binaries are NEVER committed to this repo. This directory holds:

  • manifest.yaml — sha256-pinned list of samples to fetch, with metadata (source, category, expected behavior, target CVE).
  • fetch.py — script that pulls samples from configured sources (MalwareBazaar, theZoo, vx-underground), verifies sha256, and stores them under samples/store/ (gitignored).
  • Per-sample notes in Markdown describing the behavior observed in our lab.

samples/store/ lives only on the lab host. It is gitignored and should sit on a disk that is not auto-mounted on developer workstations.
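
The verify-then-store step described for fetch.py could look roughly like the sketch below. It is not the project's implementation: `store_verified`, `HashMismatch`, and the choice to name stored files by their sha256 are assumptions made for illustration, and the download itself is left out (the function accepts any iterable of byte chunks).

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable


class HashMismatch(Exception):
    """Raised when the received bytes do not match the pinned sha256."""


def store_verified(chunks: Iterable[bytes], expected_sha256: str, store_dir: Path) -> Path:
    """Write chunks into store_dir only if their sha256 matches the pin.

    Hypothetical helper: the file is staged under a temporary name and
    renamed to <sha256> after the digest check passes.
    """
    store_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    # Stage inside store_dir so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=store_dir, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)  # hash the bytes as they arrive
                tmp.write(chunk)
        actual = digest.hexdigest()
        if actual != expected_sha256.lower():
            raise HashMismatch(f"expected {expected_sha256}, got {actual}")
        final = store_dir / actual
        os.replace(tmp_name, final)  # atomic rename on the same filesystem
        return final
    except BaseException:
        # Never leave unverified bytes behind in the store.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Hashing while writing means the staged file is never reopened before its digest is known, and a mismatch removes the partial file rather than leaving it next to verified samples.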

Manifest entry shape (placeholder)

samples:
  - name: linux.miner.xmrig.elf
    sha256: "..."                # pinned
    source: MalwareBazaar
    category: miner
    target_cve: null              # cryptominers are usually post-exploit payloads
    behavior: "high CPU, periodic stratum protocol traffic"
    pairs_with_exploit: exploit/multi/samba/usermap_script
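
Since the entry shape is still a placeholder, a small validator can catch malformed entries before anything is fetched. The following is a minimal sketch that works on an already-parsed entry (a plain dict); which fields are required versus optional is an assumption inferred from the example above, and `validate_entry` is a hypothetical name.

```python
import re

# Assumed split, based only on the example entry above.
REQUIRED_FIELDS = ("name", "sha256", "source", "category", "behavior")
OPTIONAL_FIELDS = ("target_cve", "pairs_with_exploit")
SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one manifest entry (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing or empty field: {field}")
    sha = entry.get("sha256")
    if sha and not SHA256_RE.fullmatch(str(sha)):
        problems.append("sha256 must be 64 hex characters")
    unknown = set(entry) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS)
    for field in sorted(unknown):
        problems.append(f"unknown field: {field}")
    return problems
```

Run against the placeholder entry as written, with its `"..."` hash, this reports the sha256 as invalid, which is the intended outcome until a real pin is filled in.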

Safety rules

  • Only download to the lab host, never to a developer workstation.
  • Verify sha256 immediately after download, before anything else reads the file.
  • Keep the directory on a path that is not shared or otherwise reachable over the WireGuard (WG) overlay.
  • Re-verify sha256 before each detonation; refuse to run on mismatch.
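
The last rule can be enforced with a small gate that runs just before detonation. This is a sketch under assumptions, not the orchestrator's actual code: `require_pinned_hash` and `SampleIntegrityError` are hypothetical names, and the pinned hash is assumed to come from manifest.yaml.

```python
import hashlib
from pathlib import Path


class SampleIntegrityError(Exception):
    """Raised when a stored sample no longer matches its pinned sha256."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large samples are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_pinned_hash(path: Path, pinned_sha256: str) -> None:
    """Refuse to proceed unless the file on disk still matches the pin."""
    actual = sha256_of(path)
    if actual != pinned_sha256.lower():
        raise SampleIntegrityError(
            f"{path}: pinned {pinned_sha256}, found {actual}; refusing to run"
        )
```

Calling `require_pinned_hash(sample_path, entry["sha256"])` as the first step of each episode would turn a silent mismatch into a hard failure.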