This is the chunk that makes "real data" actually flow on multiple
hosts in parallel. End-to-end pipe was up at 613c6fa / 2579683; now
the lab-host side has the diversity + concurrency it needs.
Collectors landed:
  collectors/qmp.py: source 2 (oracle). Tiny synchronous QMP client +
    row builder + run loop. Tolerates older qemu without query-stats.
  collectors/guest_agent.py: source 5 (deployable). Reads the
    virtio-serial host-side socket, parses agent JSON-lines, re-stamps
    to the host monotonic clock, persists.
  collectors/pcap.py: source 4 (deployable). tcpdump capture +
    pure-Python pcap reader + 100 ms netflow.jsonl bucketizer. Decodes
    Ethernet/IPv4/TCP/UDP enough for the schema in docs/data-model.md.
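The 100 ms bucketizing step can be pictured roughly as follows. This is a minimal sketch, not the code in collectors/pcap.py: the `Packet` tuple, the `bucketize` helper and the output field names (`t_bucket_ns`, `packets`, `bytes`) are assumptions made for illustration; the real schema is the one in docs/data-model.md.

```python
from collections import defaultdict
from typing import NamedTuple

BUCKET_NS = 100 * 1_000_000  # 100 ms window, as stated in the commit message


class Packet(NamedTuple):
    """Hypothetical decoded packet: capture time, wire length, protocol."""
    ts_ns: int
    length: int
    proto: str  # e.g. "tcp", "udp", "arp"


def bucketize(packets: list[Packet], bucket_ns: int = BUCKET_NS) -> list[dict]:
    """Group packets into fixed windows; emit one row per (window, proto)."""
    buckets: dict[tuple[int, str], list[int]] = defaultdict(lambda: [0, 0])
    for pkt in packets:
        start = (pkt.ts_ns // bucket_ns) * bucket_ns  # floor to window start
        acc = buckets[(start, pkt.proto)]
        acc[0] += 1           # packet count
        acc[1] += pkt.length  # byte count
    return [
        {"t_bucket_ns": start, "proto": proto, "packets": n, "bytes": size}
        for (start, proto), (n, size) in sorted(buckets.items())
    ]
```

Windows with no traffic produce no row in this sketch; whether the real bucketizer emits empty windows is not stated here.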
In-guest agent:
  vm/guest-agent/cis490_agent.py: stdlib-only Python agent. Reads
    /proc/{stat,meminfo,loadavg,net/dev,net/tcp*}, top-N RSS procs,
    thermal. Writes JSON-lines to /dev/virtio-ports/cis490.guest.agent.
  tools/build_cidata.py: embeds the agent + an OpenRC service into
    user-data so first boot of the Alpine cidata image auto-starts it.
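On the host, the agent's JSON-lines stream is read and re-stamped with the host monotonic clock. A minimal sketch of that step, assuming a binary file-like stream (for example `sock.makefile("rb")` on the host-side unix socket); the `restamp_rows` helper and the `t_host_mono_ns` field name are hypothetical and not taken from collectors/guest_agent.py:

```python
import json
import time
from typing import BinaryIO, Iterator


def restamp_rows(stream: BinaryIO) -> Iterator[dict]:
    """Yield agent rows with a host monotonic timestamp added.

    Malformed lines are skipped rather than aborting the stream.
    """
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except ValueError:
            continue  # torn or garbled line: tolerate and move on
        if not isinstance(row, dict):
            continue  # valid JSON, but not a row object
        row["t_host_mono_ns"] = time.monotonic_ns()  # hypothetical field name
        yield row
```

Stamping on receipt keeps all sources on one host clock, at the cost of including transport latency in the timestamp.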
Launchers:
  vm/launch_demo.sh / launch_target.sh: second virtio-serial port for
    the agent socket; SLOT env support so multiple VMs run without
    socket / port collisions; PORT_BASE on launch_target so multiple
    target VMs hostfwd different host ports.
  vm/setup_bridge.sh: creates host-only br-malware (10.200.0.1/24,
    no NAT). Idempotent.
Fleet:
  orchestrator/fleet.py: capacity detector (cores / RAM / load
    headroom) + concurrent-slot runner. Per-slot ENV selects the
    sample. FleetCapacity dataclass round-trips into meta.json so
    "this episode ran with 6 concurrent VMs" is auditable post-hoc.
  tools/run_fleet.py: CLI. --capacity prints a report; --waves N runs
    N waves of (max_concurrent) episodes each, every slot with a
    different sample.
  etc/cis490-orchestrator.service: now drives the fleet runner with
    Restart=always so each invocation runs one wave and respawns,
    giving a continuous stream.
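The capacity math (cores / RAM / load headroom) can be sketched as taking the tightest of three limits. Everything below is an assumption for illustration: the `Capacity` dataclass stands in for FleetCapacity with guessed field names, and `mem_per_vm_mb` and `reserve_cores` are hypothetical tuning knobs, not values from orchestrator/fleet.py.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Capacity:
    """Hypothetical stand-in for FleetCapacity; field names are assumed."""
    cores: int
    mem_available_mb: int
    load_1m: float
    max_concurrent: int


def plan_capacity(cores: int, mem_available_mb: int, load_1m: float,
                  mem_per_vm_mb: int = 1024, reserve_cores: int = 1) -> Capacity:
    """Take the tightest of three limits: spare cores, RAM, load headroom."""
    by_cores = cores - reserve_cores            # keep a core for the host
    by_mem = mem_available_mb // mem_per_vm_mb  # whole VMs that fit in RAM
    by_load = int(cores - load_1m)              # cores not already busy
    slots = max(0, min(by_cores, by_mem, by_load))
    return Capacity(cores, mem_available_mb, load_1m, slots)


def to_meta(cap: Capacity) -> dict:
    """Plain-dict form suitable for embedding in a meta.json document."""
    return {"fleet_capacity": asdict(cap)}
```

In this sketch a 1-core box gets zero slots; a real detector may prefer to always allow at least one.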
Samples:
  samples/manifest.toml: six profiles spanning the five major
    behaviour shapes. Each entry is real OR mimic (sha256 distinguishes).
  samples/manifest.py: strict TOML loader (rejects dups, unknown
    categories) + deterministic select(host_id, slot, episode_index)
    so different hosts on the network walk the catalog in different
    orders without any coordinator.
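One way to get coordinator-free, per-host catalog orders is to derive a permutation from a stable hash of the host id. The sketch below is an assumption about how such a selection could work, not the implementation in samples/manifest.py; `select_sample` and its explicit `catalog` argument are hypothetical.

```python
import hashlib


def select_sample(catalog: list[str], host_id: str, slot: int,
                  episode_index: int) -> str:
    """Pick a sample name deterministically, with a per-host catalog order."""
    if not catalog:
        raise ValueError("empty catalog")

    def host_key(name: str) -> bytes:
        # sha256 is stable across processes and hosts, unlike built-in hash().
        return hashlib.sha256(f"{host_id}:{name}".encode("utf-8")).digest()

    order = sorted(catalog, key=host_key)  # per-host permutation of the catalog
    return order[(episode_index + slot) % len(order)]
```

Within one episode index, slots 0..len(catalog)-1 map to distinct samples, which matches the "every slot with a different sample" behaviour described for the fleet runner as long as the slot count does not exceed the catalog size.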
EpisodeRunner:
  orchestrator/episode.py: optional qmp_socket + guest_agent_socket
    fields on EpisodeConfig; when set, additional collector threads
    run alongside proc_qemu. EpisodeResult now carries rows_qmp +
    rows_guest counters.
Tier-3 setup automation:
  scripts/install-msfrpcd.sh: installs metasploit-framework where
    the package manager has it, generates a strong password into
    /etc/cis490/msfrpc.env, drops a hardened systemd unit bound to
    127.0.0.1:55553. After this, run_tier3_demo.py works zero-touch
    once MSFRPC_PASSWORD is sourced.
  scripts/fetch-metasploitable2.sh: accepts IMAGE_URL + IMAGE_SHA256
    from the operator (Rapid7 download is registration-walled), pulls,
    verifies, converts vmdk → qcow2, lands at vm/images/.
Tests: 82 pass (was 51). New suites:
  tests/test_qmp.py: fake QMP server, capability handshake,
    blockstats, async-event interleaving, 5-failure backoff
  tests/test_guest_agent.py: fake virtio socket, JSON-lines read +
    re-stamp, malformed-line tolerance
  tests/test_pcap.py: synthetic pcap with TCP/UDP/ARP frames,
    bucketize correctness across windows
  tests/test_fleet.py: capacity math (8-core idle / low-RAM /
    high-load / Pi5 / 1-core box), manifest selection determinism +
    diversity
What's queued for the next commit (already discussed in convo):
  - MSFExploitDriver v2: map sample.profile → distinct in-session
    workload so Tier-3 episodes don't all produce the same yes-loop
    envelope. Critical for ML to learn varied malware shapes.
  - Real-sample fetch from MalwareBazaar by sha256.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
274 lines
8.5 KiB
Python
#!/usr/bin/env python3
"""In-guest telemetry agent: runs INSIDE the VM.

Writes one JSON-lines row per tick to a virtio-serial port that the
host has wired up as ``cis490.guest.agent``. The host-side collector
(`collectors.guest_agent`) reads these rows and stamps them with the
host's monotonic clock before persisting to ``telemetry-guest.jsonl``.

Stdlib only: no `psutil`, no extra deps to bake into the guest. Every
field is read from /proc on the guest, so this works on busybox-based
Alpine, on Cirros, and on Metasploitable2 unchanged.

Wire path inside the guest:
    /dev/virtio-ports/cis490.guest.agent

The host side opens the matching unix socket on the hypervisor.
The protocol is intentionally trivial: the agent emits
newline-delimited JSON; the host emits nothing back. One direction.

This source is the **deployable** side: every row is tagged
``available_in_deployment: true``. See docs/threat-model.md.
"""

from __future__ import annotations

import argparse
import json
import os
import platform
import sys
import time
from typing import Any


SOURCE = "guest_agent"
AVAILABLE_IN_DEPLOYMENT = True
DEFAULT_PORT = "/dev/virtio-ports/cis490.guest.agent"
DEFAULT_INTERVAL_MS = 100  # 10 Hz
DEFAULT_TOP_N = 8


# ---------- /proc parsers ---------------------------------------------------


def _read(path: str) -> str | None:
    """Return the file's text, or None if it cannot be read."""
    try:
        with open(path, "rb") as f:
            return f.read().decode("ascii", errors="replace")
    except OSError:
        # Covers missing files and permission errors, plus /proc entries
        # that vanish mid-read (an exiting process raises ProcessLookupError).
        return None


def read_loadavg() -> tuple[float, float, float] | None:
    text = _read("/proc/loadavg")
    if text is None:
        return None
    parts = text.split()
    return float(parts[0]), float(parts[1]), float(parts[2])


def read_meminfo() -> dict[str, int]:
    text = _read("/proc/meminfo")
    out: dict[str, int] = {}
    if text is None:
        return out
    for line in text.splitlines():
        k, _, rest = line.partition(":")
        v = rest.strip()
        if v.endswith(" kB"):
            try:
                out[k] = int(v[:-3]) * 1024
            except ValueError:
                pass
    return out


def read_cpu_total() -> dict[str, int] | None:
    """First line of /proc/stat: aggregate cpu user/nice/sys/idle/...
    in jiffies since boot."""
    text = _read("/proc/stat")
    if text is None:
        return None
    line = text.splitlines()[0]
    fields = line.split()
    # cpu user nice system idle iowait irq softirq steal guest guest_nice
    if not fields or fields[0] != "cpu":
        return None
    nums = [int(x) for x in fields[1:]]
    pad = nums + [0] * max(0, 10 - len(nums))
    return {
        "user": pad[0],
        "nice": pad[1],
        "system": pad[2],
        "idle": pad[3],
        "iowait": pad[4],
        "irq": pad[5],
        "softirq": pad[6],
        "steal": pad[7],
        "guest": pad[8],
        "guest_nice": pad[9],
    }


def read_thermal_milli_c() -> int | None:
    """Best-effort: /sys/class/thermal/thermal_zone0/temp."""
    text = _read("/sys/class/thermal/thermal_zone0/temp")
    if text is None:
        return None
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_net_devs() -> dict[str, dict[str, int]]:
    """Parse /proc/net/dev → {iface: {rx_bytes, rx_pkts, tx_bytes, tx_pkts}}."""
    text = _read("/proc/net/dev")
    out: dict[str, dict[str, int]] = {}
    if text is None:
        return out
    lines = text.splitlines()
    for line in lines[2:]:
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        name = name.strip()
        if name == "lo":
            continue
        cols = rest.split()
        if len(cols) < 16:
            continue
        out[name] = {
            "rx_bytes": int(cols[0]),
            "rx_pkts": int(cols[1]),
            "tx_bytes": int(cols[8]),
            "tx_pkts": int(cols[9]),
        }
    return out


def read_listen_ports() -> list[int]:
    """TCP listen sockets from /proc/net/tcp + tcp6. State 0A = LISTEN."""
    out: set[int] = set()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        text = _read(path)
        if not text:
            continue
        for line in text.splitlines()[1:]:
            cols = line.split()
            if len(cols) < 4:
                continue
            if cols[3] != "0A":
                continue
            local = cols[1]  # "ADDR:PORT" with PORT in hex
            _, _, port_hex = local.rpartition(":")
            try:
                out.add(int(port_hex, 16))
            except ValueError:
                pass
    return sorted(out)


def read_top_procs(top_n: int) -> list[dict[str, Any]]:
    """Top-N processes by RSS. Cheap O(N) scan of /proc."""
    procs: list[dict[str, Any]] = []
    try:
        entries = os.listdir("/proc")
    except OSError:
        return procs
    page_size = os.sysconf("SC_PAGESIZE")  # constant; look it up once
    for ent in entries:
        if not ent.isdigit():
            continue
        pid = int(ent)
        stat = _read(f"/proc/{pid}/stat")
        if stat is None:
            continue
        try:
            # comm may itself contain ")" or spaces, so split on the LAST ")".
            rparen = stat.rindex(")")
            comm = stat[stat.index("(") + 1 : rparen]
            # fields[0] is the state (field 3 in proc(5)), so field N is
            # fields[N - 3]: utime=14, stime=15, rss=24.
            fields = stat[rparen + 2:].split()
            utime = int(fields[11])
            stime = int(fields[12])
            rss_pages = int(fields[21])
        except (ValueError, IndexError):
            continue
        procs.append({
            "pid": pid,
            "comm": comm[:32],
            "cpu_jiffies": utime + stime,
            "rss_bytes": rss_pages * page_size,
        })
    procs.sort(key=lambda p: p["rss_bytes"], reverse=True)
    return procs[:top_n]


# ---------- one tick --------------------------------------------------------


def collect_once(top_n: int = DEFAULT_TOP_N) -> dict[str, Any]:
    mem = read_meminfo()
    cpu = read_cpu_total()
    load = read_loadavg()
    return {
        "t_guest_mono_ns": time.monotonic_ns(),
        "t_guest_wall_ns": time.time_ns(),
        "source": SOURCE,
        "available_in_deployment": AVAILABLE_IN_DEPLOYMENT,
        "kernel": platform.release(),
        "cpu_total_jiffies": cpu,
        "load_1m_5m_15m": list(load) if load else None,
        "mem_total_bytes": (mem.get("MemTotal") or 0),
        "mem_available_bytes": (mem.get("MemAvailable") or 0),
        "mem_buffers_bytes": (mem.get("Buffers") or 0),
        "mem_cached_bytes": (mem.get("Cached") or 0),
        "swap_used_bytes": (mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)),
        "thermal_milli_c": read_thermal_milli_c(),
        "net": read_net_devs(),
        "listen_ports": read_listen_ports(),
        "top_procs": read_top_procs(top_n),
    }


# ---------- main loop -------------------------------------------------------


def main(argv: list[str] | None = None) -> int:
    p = argparse.ArgumentParser(prog="cis490-guest-agent")
    p.add_argument("--port", default=DEFAULT_PORT,
                   help="virtio-serial port path inside the guest")
    p.add_argument("--interval-ms", type=int, default=DEFAULT_INTERVAL_MS)
    p.add_argument("--top-n", type=int, default=DEFAULT_TOP_N)
    p.add_argument("--once", action="store_true",
                   help="emit a single row and exit (for smoke tests)")
    args = p.parse_args(argv)

    if args.once:
        sys.stdout.write(json.dumps(collect_once(args.top_n)) + "\n")
        sys.stdout.flush()
        return 0

    # Open the virtio-serial port. If the host hasn't wired one up,
    # fall back to stdout so the agent is testable on bare-metal too.
    out_fp: Any
    if os.path.exists(args.port):
        out_fp = open(args.port, "wb", buffering=0)
    else:
        sys.stderr.write(f"[cis490-agent] {args.port} missing; writing to stdout\n")
        out_fp = sys.stdout.buffer

    interval_ns = args.interval_ms * 1_000_000
    next_tick = time.monotonic_ns()
    try:
        while True:
            row = collect_once(args.top_n)
            out_fp.write((json.dumps(row) + "\n").encode("utf-8"))
            try:
                out_fp.flush()
            except (AttributeError, OSError):
                pass
            # Fixed-rate schedule: advance the deadline by one interval so
            # collection time does not accumulate as drift.
            next_tick += interval_ns
            sleep_ns = next_tick - time.monotonic_ns()
            if sleep_ns > 0:
                time.sleep(sleep_ns / 1_000_000_000)
            else:
                # Overran the interval: resynchronise instead of bursting.
                next_tick = time.monotonic_ns()
    except KeyboardInterrupt:
        return 0
    except OSError as e:  # includes BrokenPipeError
        sys.stderr.write(f"[cis490-agent] write failed: {e}\n")
        return 1


if __name__ == "__main__":
    sys.exit(main())
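The `cpu_total_jiffies` field in each row is cumulative since boot, so a consumer has to difference two rows to get utilisation. A minimal sketch of that derivation follows; `cpu_busy_fraction` is a hypothetical helper, not part of the agent or the collectors, and it treats `idle` plus `iowait` as idle time, which is a common convention rather than something this file specifies.

```python
from __future__ import annotations

# Counters that partition total CPU time. guest and guest_nice are left out
# because Linux already accounts them inside user and nice.
_TOTAL_KEYS = ("user", "nice", "system", "idle", "iowait",
               "irq", "softirq", "steal")
_IDLE_KEYS = ("idle", "iowait")


def cpu_busy_fraction(prev: dict[str, int], curr: dict[str, int]) -> float | None:
    """Fraction of CPU time spent non-idle between two cumulative samples."""
    d_total = sum(curr[k] - prev[k] for k in _TOTAL_KEYS)
    d_idle = sum(curr[k] - prev[k] for k in _IDLE_KEYS)
    if d_total <= 0:
        return None  # identical samples or a counter reset (e.g. guest reboot)
    return (d_total - d_idle) / d_total
```

Because both samples come from the same guest, the jiffy length cancels out and the result is unitless.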