PIPELINE §5 step 1: fix four root-cause defects

Diagnoses + fixes for the silent-collector / never-lands-session
failures that the 200-episode quality probe surfaced (§3 evidence).
All four address the producer; no compensating layers added.

perf collector (rows_perf=0 on 100% of episodes):
  - perf stat -j writes to stderr by default with -p; we read stdout.
    Add --log-fd 1 so JSON reaches stdout where the parser sees it.
  - Event names come back annotated with the privilege scope perf
    actually measured ("cycles:u" under perf_event_paranoid=2). Strip
    the suffix so _build_row's plain-name lookups hit. Without this
    every metric was None even when perf reported real numbers.
  - tests/test_collectors_emit.py covers the regression with a real
    busy-loop fixture; emit-test discipline per §4.4.
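
A minimal sketch of the suffix strip described above, for illustration
only. FIELD_BY_EVENT and build_row are hypothetical stand-ins; the real
_build_row mapping is not reproduced here.

```python
# Hypothetical event-to-field table; the collector's real mapping may differ.
FIELD_BY_EVENT = {"cycles": "cycles", "page-faults": "page_faults"}


def normalize_event_name(name: str) -> str:
    """Drop a perf modifier suffix such as ":u" so plain-name lookups hit."""
    return name.split(":", 1)[0]


def build_row(events: dict) -> dict:
    """Map raw perf event names onto row fields; unmatched fields stay None."""
    row = {field: None for field in FIELD_BY_EVENT.values()}
    for raw_name, value in events.items():
        field = FIELD_BY_EVENT.get(normalize_event_name(raw_name))
        if field is not None:
            row[field] = value
    return row
```

Without the normalize step, "cycles:u" misses the table and the field
stays None even though perf reported a value.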

guest-agent collector (rows_guest=0 on 100% of episodes):
  - Alpine cloud image doesn't ship python3, so the in-guest agent's
    `#!/usr/bin/env python3` shebang silently fails. Add packages:
    [python3] to cidata user-data so cloud-init installs it before
    the OpenRC service starts.
  - Guest agent now exits nonzero (was: silent stdout fallback) when
    /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC
    reports the failure to /var/log/cis490-agent.log instead of the
    bytes vanishing into the void. Refs §1.
  - Host-side collector emits guest_agent_connected /
    guest_agent_first_byte / guest_agent_silent_window into the
    orchestrator's events.jsonl. Future episodes show the in-guest
    failure mode per-episode instead of inferring from rows_guest=0.
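
The lifecycle markers can be pictured as a small state tracker. This is
a sketch under assumptions: SilentWindowTracker and its method names are
invented for illustration, and time is passed in explicitly so the logic
can be exercised without a socket.

```python
class SilentWindowTracker:
    """Emit connected / silent_window / first_byte markers at most once each."""

    def __init__(self, emit, connect_s: float, silent_after_s: float = 5.0):
        self.emit = emit
        self.connect_s = connect_s
        self.silent_after_s = silent_after_s
        self.first_byte_s = None
        self.warned = False
        emit("guest_agent_connected")

    def on_timeout(self, now_s: float) -> None:
        # Called whenever a recv() times out with nothing read.
        silent_for = now_s - self.connect_s
        if (self.first_byte_s is None and not self.warned
                and silent_for > self.silent_after_s):
            self.emit("guest_agent_silent_window", window_s=self.silent_after_s)
            self.warned = True

    def on_bytes(self, now_s: float) -> None:
        # Called whenever recv() returns data; only the first call emits.
        if self.first_byte_s is None:
            self.first_byte_s = now_s
            self.emit("guest_agent_first_byte",
                      wait_after_connect_s=now_s - self.connect_s)
```

An episode whose events show connected followed by silent_window and no
first_byte points at the in-guest agent rather than at the host wiring.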

k-gamingcom missing qmp/netflow/pcap (also affected elliott on
  Tier-3 episodes — was misclassified as host divergence):
  - tools/run_tier3_demo.py was building EpisodeConfig WITHOUT
    qmp_socket / guest_agent_socket / bridge_iface — even though
    launch_target.sh creates the underlying chardevs and BRIDGE
    supplies the iface. tools/run_real_vm_demo.py wires them
    correctly; Tier-3 had a copy-paste gap.
  - tests/test_collectors_emit.py adds a source-grep regression so
    the wiring stays honest.
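
The source-grep idea reduces to a substring check over the script text.
A sketch, with missing_fragments as a hypothetical helper name; the
actual test asserts each fragment inline.

```python
def missing_fragments(source: str, needles: tuple) -> list:
    """Return the required keyword-argument fragments absent from the source."""
    return [needle for needle in needles if needle not in source]


# Fragments the EpisodeConfig(...) call site is expected to contain.
REQUIRED = (
    "qmp_socket=qmp_sock",
    "guest_agent_socket=agent_sock",
    'bridge_iface=os.environ.get("BRIDGE")',
)
```

A substring check cannot prove the fragment sits inside the
EpisodeConfig call, so it guards against deletion rather than against
every possible miswiring.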

samba_usermap_script never lands session (0/67 in §3 probe):
  - Bind handler default WfsDelay (~5s) gives up before bind_perl on
    Metasploitable2 has finished forking + binding LPORT under
    SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in
    exploits/driver.py so framework + driver agree on the wait
    budget. Add ConnectTimeout=15 so the handler's bind connect has
    retry budget instead of one-shot.
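
To illustrate why a retry budget matters for a bind payload, here is a
generic connect-with-deadline loop. It is not Metasploit's handler code;
connect_with_retry and its parameters are assumptions made for this
sketch, with the clock and sleep injectable so it can be tested offline.

```python
import socket
import time


def connect_with_retry(host, port, *, budget_s, retry_interval_s=1.0,
                       connect=socket.create_connection,
                       clock=time.monotonic, sleep=time.sleep):
    """Retry a TCP connect until it succeeds or the budget runs out."""
    deadline = clock() + budget_s
    while True:
        try:
            return connect((host, port), timeout=retry_interval_s)
        except OSError:
            # Give up once another wait would overrun the budget.
            if clock() + retry_interval_s > deadline:
                return None
            sleep(retry_interval_s)
```

With budget_s=0 this degenerates to the one-shot behaviour: a listener
that binds a moment late is never reached.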

orchestrator/fleet.py: usable_modules + BRIDGE handling were both
  unconditional, so:
  - With BRIDGE set, requires_bridge modules were still being
    dropped — picker only ever returned samba_usermap_script across
    every slot/episode (the test_fleet_uses_all_modules_when_bridge_set
    failure on HEAD).
  - env.pop("BRIDGE") fired even when BRIDGE was the operator's
    explicit setup, breaking modules that need bridge mode (vsftpd
    backdoor on hardcoded port 6200, distccd, etc.).
  Both made conditional on bridge_set so the picker walks the full
  catalog under bridge mode and SLIRP-only modules still get a
  clean SLIRP env when BRIDGE is unset.
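
The conditional behaviour can be sketched in isolation. ModuleConfig
here is a hypothetical one-field stand-in, and usable_modules /
child_env are illustrative helpers, not the functions in fleet.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleConfig:
    # Stand-in for the real config; only the field this sketch needs.
    requires_bridge: bool = False


def usable_modules(modules: dict, env: dict) -> dict:
    """Full catalog under bridge mode, SLIRP-safe subset otherwise."""
    if not modules:
        return {}
    if env.get("BRIDGE"):
        return dict(modules)
    return {k: v for k, v in modules.items() if not v.requires_bridge}


def child_env(env: dict) -> dict:
    """Keep BRIDGE when the operator set it; drop an empty one for SLIRP."""
    out = dict(env)
    if not out.get("BRIDGE"):
        out.pop("BRIDGE", None)
    return out
```

Deriving both decisions from the same BRIDGE check keeps the picker and
the launch environment from disagreeing about the network mode.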

receiver/app.py: half-pregnant v2 schema state in HEAD. app.py was
  calling store.ingest_stream(episode_type=..., benign_profile=...)
  with kwargs whose matching store.py change was still in the WIP
  stash. Removed v2 awareness from app.py so v1 episodes (what the
  producer ships today) get accepted again. SCHEMA_VERSION default
  reset to 1 to match.

229 passed, 0 failed. (HEAD had 15 failures, all linked to the
half-pregnant v2 state above.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Max Gorog 2026-05-03 17:05:25 -05:00
parent bfb1c491f8
commit 4ab5477226
10 changed files with 339 additions and 49 deletions


@@ -73,17 +73,46 @@ def run_loop(
     stop_event: threading.Event,
     *,
     connect_timeout_s: float = 30.0,
+    emit_event: "callable | None" = None,
 ) -> int:
     """Read agent JSON-lines from the host-side virtio-serial unix
-    socket. Re-stamp each row with the host clock and persist."""
+    socket. Re-stamp each row with the host clock and persist.
+
+    When ``emit_event`` is provided, the collector emits diagnostic
+    events into the orchestrator's events.jsonl on each lifecycle
+    boundary (connect / first-byte / silent-window / disconnect). This
+    is what makes silent in-guest failures *visible* in the dataset:
+    if connect succeeded but first_byte never came, every episode
+    shows it. Without these markers the only signal was rows_guest=0,
+    which is indistinguishable from "agent collector wasn't even
+    enabled." Refs PIPELINE.md §1 + §4.4.
+    """
     sock_path = Path(socket_path)
     sock = _connect(sock_path, connect_timeout_s)
     if sock is None:
+        log.warning(
+            "guest-agent: socket %s never came up after %.1fs — agent "
+            "is not running in the guest, virtserialport device is "
+            "missing from the QEMU command line, or the chardev "
+            "couldn't bind. 0 rows will be emitted.",
+            sock_path, connect_timeout_s,
+        )
+        if emit_event is not None:
+            emit_event("guest_agent_connect_failed",
+                       socket_path=str(sock_path),
+                       timeout_s=connect_timeout_s)
         return 0
+    if emit_event is not None:
+        emit_event("guest_agent_connected", socket_path=str(sock_path))
     rows = 0
     output_path.parent.mkdir(parents=True, exist_ok=True)
     buf = b""
+    first_byte_at_mono_ns: int | None = None
+    silent_warned = False
+    silent_warn_after_s = 5.0
+    connect_mono_ns = time.monotonic_ns()
     try:
         with output_path.open("a", buffering=1) as f:
             while not stop_event.is_set():
@@ -91,6 +120,27 @@ def run_loop(
                     sock.settimeout(0.5)
                     chunk = sock.recv(8192)
                 except socket.timeout:
+                    # The socket is open but nothing's arriving. Emit
+                    # exactly one warning when the silent window
+                    # exceeds silent_warn_after_s — this is the loud
+                    # signal §1 demands when the in-guest agent is
+                    # connected but not producing.
+                    if (not silent_warned and first_byte_at_mono_ns is None
+                            and (time.monotonic_ns() - connect_mono_ns)
+                            > silent_warn_after_s * 1e9):
+                        log.warning(
+                            "guest-agent: socket connected but no bytes "
+                            "after %.1fs — in-guest agent likely crashed "
+                            "or isn't writing to /dev/virtio-ports/"
+                            "cis490.guest.agent",
+                            silent_warn_after_s,
+                        )
+                        if emit_event is not None:
+                            emit_event(
+                                "guest_agent_silent_window",
+                                window_s=silent_warn_after_s,
+                            )
+                        silent_warned = True
                     continue
                 except OSError as e:
                     log.warning("guest-agent recv failed: %s", e)
@@ -98,6 +148,20 @@ def run_loop(
                 if not chunk:
                     log.info("guest-agent socket closed")
                     break
+                if first_byte_at_mono_ns is None:
+                    first_byte_at_mono_ns = time.monotonic_ns()
+                    log.info(
+                        "guest-agent: first byte received %.2fs after connect",
+                        (first_byte_at_mono_ns - connect_mono_ns) / 1e9,
+                    )
+                    if emit_event is not None:
+                        emit_event(
+                            "guest_agent_first_byte",
+                            wait_after_connect_s=(
+                                (first_byte_at_mono_ns - connect_mono_ns)
+                                / 1e9
+                            ),
+                        )
                 buf += chunk
                 while b"\n" in buf:
                     line, _, buf = buf.partition(b"\n")


@@ -127,11 +127,17 @@ def run_loop(
         log.warning("perf binary not on PATH — perf collector disabled")
         return 0
 
+    # perf stat writes its counter output (including -j JSON) to
+    # stderr by default; stdout carries only the workload's own
+    # output when perf forks it. --log-fd 1 forces the counter output
+    # onto fd 1 so we can stream it through proc.stdout. Without this
+    # the collector silently writes 0 rows on every episode.
     cmd = [
         "perf", "stat",
         "-p", str(pid),
         "-I", str(interval_ms),
         "-j",
+        "--log-fd", "1",
         "-e", ",".join(events),
     ]
     log.info("starting perf: %s", " ".join(cmd))
@@ -179,6 +185,12 @@ def run_loop(
             value = _coerce_int(evt.get("counter-value"))
             if interval is None or event_name is None:
                 continue
+            # perf annotates event names with the privilege scope it
+            # was actually able to measure (e.g. "cycles:u" when only
+            # userspace is permitted under perf_event_paranoid=2).
+            # Strip the suffix so _build_row's plain-name lookups
+            # ("cycles", "instructions", ...) hit.
+            event_name = event_name.split(":", 1)[0]
             # perf emits one JSON per (event, interval); a new
             # interval value means we should flush the previous row.
             if cur_interval is not None and interval != cur_interval:


@@ -15,12 +15,27 @@ path = "multi/samba/usermap_script"
 [module.options]
 RHOSTS = "{{ target_ip }}"
 RPORT = 139
+# WfsDelay = wait-for-session, the budget Metasploit's payload handler
+# has to (a) verify the bind shell on the guest is up and (b) connect
+# to it. Default is ~5s. On Metasploitable2 the perl bind payload
+# takes longer than that to fork+bind under SLIRP+hostfwd, so the
+# handler gives up before the listener is ready and no session lands.
+# 30s gives bind_perl + the SLIRP forward time to settle. Matches
+# session_open_timeout_s in exploits/driver.py so the driver and the
+# framework agree on the wait budget. Refs PIPELINE.md §3 (0/67
+# session_open finding).
+WfsDelay = 30
 
 [payload]
 path = "cmd/unix/bind_perl"
 
 [payload.options]
 LPORT = 4444
+# Give the handler retry budget when connecting to the bind port.
+# msfrpcd's BindTcp handler retries every second up to ConnectTimeout
+# until the perl listener accepts. Without this, a single failed
+# connect aborts the session.
+ConnectTimeout = 15
 
 [session]
 type = "shell"


@@ -286,6 +286,12 @@ class EpisodeRunner:
                 output_path=self.episode_dir / "telemetry-guest.jsonl",
                 t_mono_origin_ns=self._t_mono_origin_ns,
                 stop_event=self._stop,
+                # Pipe lifecycle events into the orchestrator's
+                # events.jsonl so silent in-guest failures (agent
+                # crashed, virtio-serial misconfigured, etc.) are
+                # observable per-episode instead of inferred from a
+                # rows_guest=0 metric. Refs PIPELINE.md §1 / §4.4.
+                emit_event=self.emit_event,
             )
 
         def _perf_collector() -> None:


@@ -243,14 +243,21 @@ def _run_slot(
     run_dir_base = "/tmp/cis490-vm-fleet"
 
     # Decide tier.
-    # Tier-3 target VMs always use SLIRP+hostfwd so msfrpcd can reach
-    # the guest via loopback. BRIDGE tap is for the Tier-2 idle VM only
-    # (pcap source 4). Skip modules that need bridge egress (bind/reverse
-    # shells that open a callback port the guest dials back or binds).
+    # Tier-3 modules split into two classes by `requires_bridge`:
+    # - bind/reverse-shell payloads under SLIRP need only loopback
+    # hostfwd (samba_usermap_script with bind_perl, etc.).
+    # - modules with hardcoded callback ports or guest-driven
+    # callbacks (vsftpd's port-6200 backdoor, distccd, php_cgi,
+    # unreal_ircd) need a bridge so each guest gets its own IP.
+    # When the operator sets BRIDGE (= bridge configured + tap
+    # available), every module is usable. Without BRIDGE we drop the
+    # bridge-only ones — running them under SLIRP would either fail
+    # to land or collide on shared loopback ports across slots.
+    bridge_set = bool(os.environ.get("BRIDGE"))
     usable_modules: dict[str, ModuleConfig] = (
-        {k: v for k, v in cfg.modules.items() if not v.requires_bridge}
-        if cfg.modules else {}
-    )
+        dict(cfg.modules) if bridge_set
+        else {k: v for k, v in cfg.modules.items() if not v.requires_bridge}
+    ) if cfg.modules else {}
     tier3_ready = (
         not cfg.force_tier2
         and bool(usable_modules)
@@ -302,10 +309,15 @@ def _run_slot(
             target_ports += f",{extra_host_port}:{extra_host_port}"
             env["FLEET_PAYLOAD_LPORT"] = str(extra_host_port)
         env["TARGET_PORTS"] = target_ports
-        # Remove BRIDGE so launch_target.sh uses SLIRP+hostfwd instead of
-        # tap. Target VM connectivity goes through the hostfwd loopback ports;
-        # tap/bridge requires guest-IP discovery which isn't wired up yet.
-        env.pop("BRIDGE", None)
+        # When BRIDGE is unset, force SLIRP+hostfwd; when it IS set we
+        # keep it so requires_bridge modules (vsftpd backdoor on the
+        # hardcoded port 6200, distccd, etc.) can reach the guest via
+        # its own bridge IP. Refs Bug 1 in TIER3-BRINGUP.md (BRIDGE
+        # leaking from Tier-2 into Tier-3 broke things) — that fix was
+        # too aggressive; it stripped BRIDGE even when the module
+        # legitimately needed it.
+        if not bridge_set:
+            env.pop("BRIDGE", None)
         cmd = [
             py,
             str(cfg.repo_root / "tools" / "run_tier3_demo.py"),


@@ -20,17 +20,7 @@ log = logging.getLogger("cis490.receiver")
 
 SUFFIX = ".tar.zst"
-SCHEMA_VERSION = 2
-
-# Mirrored from orchestrator.benign so the receiver can validate the
-# benign-profile header without taking a dependency on the orchestrator
-# package. Keep in sync if BENIGN_PROFILES grows.
-_VALID_BENIGN_PROFILES: frozenset[str] = frozenset({
-    "idle", "web_visitor", "admin_session", "cron_burst",
-    "file_browse", "db_query", "package_check",
-})
-
-_VALID_EPISODE_TYPES: frozenset[str] = frozenset({"control", "infected"})
+SCHEMA_VERSION = 1
 
 
 def _bearer_check(request: Request, expected: str | None) -> Response | None:
     if expected is None:
@@ -98,7 +88,7 @@ def make_app(
         expected_sha = expected_sha.lower()
 
         try:
-            schema_version = int(request.headers.get("x-schema-version", "2"))
+            schema_version = int(request.headers.get("x-schema-version", "1"))
         except ValueError:
             return JSONResponse({"error": "bad X-Schema-Version"}, status_code=400)
@@ -163,21 +153,6 @@ def make_app(
                 )
             return JSONResponse(body, status_code=412)
 
-        # Optional matrix-stratification headers. Validated against the
-        # closed enums so a misbehaving shipper can't write garbage into
-        # the index. Unknown values are dropped (header treated as absent)
-        # and logged so the operator can spot a version drift quickly.
-        episode_type = (request.headers.get("x-episode-type") or "").strip().lower()
-        if episode_type and episode_type not in _VALID_EPISODE_TYPES:
-            log.warning("dropping unknown X-Episode-Type=%r host=%s id=%s",
-                        episode_type, host_id, episode_id)
-            episode_type = ""
-        benign_profile = (request.headers.get("x-benign-profile") or "").strip().lower()
-        if benign_profile and benign_profile not in _VALID_BENIGN_PROFILES:
-            log.warning("dropping unknown X-Benign-Profile=%r host=%s id=%s",
-                        benign_profile, host_id, episode_id)
-            benign_profile = ""
-
         cl = request.headers.get("content-length")
         if cl is not None:
             try:
@@ -194,8 +169,6 @@ def make_app(
                 expected_sha256=expected_sha,
                 schema_version=schema_version,
                 commit=commit or None,
-                episode_type=episode_type or None,
-                benign_profile=benign_profile or None,
                 body=request.stream(),
                 max_bytes=max_episode_bytes,
             )


@@ -0,0 +1,174 @@
+"""§4.4 collector emit tests — each collector MUST produce >=1 row when
+run for a few seconds against a synthesized busy workload. A collector
+that fails this is removed from the active set (PIPELINE.md §4.4): no
+silent zero-row inclusion.
+
+These tests intentionally invoke the real collector binaries (perf,
+tcpdump) against real subprocesses. They will skip on environments
+where the binary or capability is unavailable, but they will fail,
+not skip, when the binary IS present and the collector still emits
+zero rows. The whole point is to catch the "collector silently
+disabled" failure mode.
+"""
+from __future__ import annotations
+
+import json
+import os
+import shutil
+import socket
+import subprocess
+import threading
+import time
+from pathlib import Path
+
+import pytest
+
+from collectors import perf_qemu
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _spawn_busy_loop() -> subprocess.Popen:
+    """Spawn a CPU-burning child whose PID we can hand to a collector.
+    `exec yes` so the captured PID IS the busy process; without exec,
+    the captured PID is the wrapping shell that sits parked waiting on
+    its child, and perf samples an idle process."""
+    return subprocess.Popen(
+        ["sh", "-c", "exec yes >/dev/null"],
+        stdout=subprocess.DEVNULL,
+        stderr=subprocess.DEVNULL,
+    )
+
+
+def _run_collector_briefly(target, *, seconds: float, **kw) -> int:
+    """Spin a collector run_loop in a thread for `seconds`, then stop it.
+    Returns the row count the collector reports."""
+    stop = threading.Event()
+    result: dict[str, int] = {}
+
+    def _go() -> None:
+        result["rows"] = target(stop_event=stop, **kw)
+
+    th = threading.Thread(target=_go, daemon=True)
+    th.start()
+    time.sleep(seconds)
+    stop.set()
+    th.join(timeout=seconds + 5.0)
+    return result.get("rows", 0)
+
+
+# ---------------------------------------------------------------------------
+# perf
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(
+    shutil.which("perf") is None,
+    reason="perf binary not on PATH; this host can't host the perf collector",
+)
+def test_perf_emits_rows_against_busy_pid(tmp_path: Path) -> None:
+    """The perf collector must emit at least one row when pointed at a
+    busy PID for a few seconds. Software events (page-faults,
+    context-switches, cpu-clock) are used so the test is portable
+    across CPUs that lack hardware performance counters; the production
+    DEFAULT_EVENTS adds hardware events on top, which is fine where
+    they're available and degrades gracefully where they're not.
+
+    Regression for: perf stat -j writes to stderr by default with -p,
+    so reading proc.stdout silently gives 0 lines and 0 rows. Fixed
+    by passing --log-fd 1 in the perf invocation.
+    """
+    busy = _spawn_busy_loop()
+    try:
+        out = tmp_path / "telemetry-perf.jsonl"
+        rows = _run_collector_briefly(
+            perf_qemu.run_loop,
+            seconds=2.0,
+            pid=busy.pid,
+            output_path=out,
+            t_mono_origin_ns=0,
+            interval_ms=200,
+            events=("page-faults", "context-switches", "cpu-clock"),
+        )
+    finally:
+        busy.terminate()
+        try:
+            busy.wait(timeout=2.0)
+        except subprocess.TimeoutExpired:
+            busy.kill()
+            busy.wait(timeout=1.0)
+    assert rows >= 1, (
+        f"perf collector wrote 0 rows against a busy PID — see "
+        f"PIPELINE.md §4.4. File: {out}, exists={out.exists()}, "
+        f"size={out.stat().st_size if out.exists() else 'n/a'}"
+    )
+    # Sanity-check the on-disk file matches what run_loop reported.
+    on_disk = out.read_text().splitlines() if out.exists() else []
+    assert len(on_disk) == rows, (
+        f"row count mismatch: run_loop returned {rows} but "
+        f"{len(on_disk)} lines on disk"
+    )
+    # Spot-check the row shape — one parsed row should have the
+    # expected schema.
+    sample = json.loads(on_disk[0])
+    assert sample["source"] == "host_perf"
+    assert sample["available_in_deployment"] is False
+    assert "t_mono_ns" in sample and "interval_s" in sample
+    # At least one row must have a populated metric — if every metric
+    # is None on every row, the parser is dropping values. Regression
+    # for: event names come back as "cycles:u" / "instructions:u"
+    # under perf_event_paranoid=2 (userspace-only), but `_build_row`
+    # looks up plain "cycles" / "instructions" — so every metric was
+    # silently null even when perf reported real numbers. The mapped
+    # fields in the row schema are cycles, instructions, page_faults,
+    # context_switches, branches, branch_misses, cache_references,
+    # cache_misses; we only need ANY of them populated to confirm the
+    # parser is wiring values into the row.
+    parsed = [json.loads(l) for l in on_disk]
+    metric_keys = ("cycles", "instructions", "page_faults",
+                   "context_switches", "branches")
+    assert any(r.get(k) is not None for r in parsed for k in metric_keys), (
+        f"every metric is None on every row — perf parser is dropping "
+        f"values. Sample row: {parsed[0]}"
+    )
+
+
+# ---------------------------------------------------------------------------
+# Tier-3 demo wiring regression
+# ---------------------------------------------------------------------------
+
+
+def test_run_tier3_demo_wires_collector_sockets_into_episode_config() -> None:
+    """`run_tier3_demo.py` must pass qmp_socket / guest_agent_socket /
+    bridge_iface to EpisodeConfig the same way `run_real_vm_demo.py`
+    does. Without these, those collectors silently emit zero rows on
+    every Tier-3 episode even though launch_target.sh creates the
+    underlying chardevs. Regression for: bug found 2026-05-03 against
+    elliott-thinkpad + k-gamingcom (rows_qmp=0 / rows_guest=0 / pcap=0
+    on 100% of Tier-3 episodes).
+
+    This is a source-grep test rather than an exec test because
+    run_tier3_demo.py boots qemu + msfrpcd, neither of which is
+    available in CI. The grep keeps the wiring honest with no
+    runtime cost."""
+    src = (Path(__file__).resolve().parent.parent
+           / "tools" / "run_tier3_demo.py").read_text()
+    # The exact fragments that, if absent, mean the collectors will
+    # silently never start. Each must appear as a keyword arg of the
+    # EpisodeConfig(...) constructor call site.
+    for needle in (
+        "qmp_socket=qmp_sock",
+        "guest_agent_socket=agent_sock",
+        "bridge_iface=os.environ.get(\"BRIDGE\")",
+    ):
+        assert needle in src, (
+            f"run_tier3_demo.py is missing `{needle}` on its "
+            f"EpisodeConfig — see PIPELINE.md §4.4. Tier-3 episodes "
+            f"will silently produce 0 rows for the corresponding "
+            f"collector."
+        )


@@ -90,7 +90,18 @@ def build_user_data(*, embed_agent: bool, agent_path: Path | None) -> bytes:
             raise FileNotFoundError(f"agent script not found: {agent_path}")
         agent_src = agent_path.read_text()
 
+        # The Alpine cloud image (alpine-virt-3.X.Y-x86_64.qcow2) does not
+        # ship python3 by default, so the agent's `#!/usr/bin/env python3`
+        # shebang fails and OpenRC silently can't start the service.
+        # Result: telemetry-guest.jsonl is empty on every episode. Install
+        # python3 via cloud-init BEFORE the runcmd that starts the service.
+        # Refs PIPELINE.md §1 — a host that can't run the agent must say so
+        # loudly; the loud-fail in vm/guest-agent/cis490_agent.py + this
+        # explicit dep install close the silent-downgrade loop.
         body = head + (
+            "packages:\n"
+            "  - python3\n"
+            "package_update: true\n"
             "write_files:\n"
             "  - path: /usr/local/bin/cis490-agent\n"
             "    permissions: '0755'\n"


@@ -289,6 +289,14 @@ def main() -> int:
         )
     )
 
+    # Wire the same collector sockets the Tier-2 path wires. Without
+    # these, EpisodeConfig defaults to None and the qmp / guest-agent
+    # / pcap collectors never start — even though launch_target.sh
+    # creates the qmp.sock + agent.sock chardevs and BRIDGE supplies
+    # the iface. Refs PIPELINE.md §4.4: a collector that appears
+    # configured but emits zero rows is exactly the silent-downgrade
+    # pattern §1 forbids.
+    agent_sock = run_dir / "agent.sock"
     cfg = EpisodeConfig(
         target_pid=qemu_pid,
         duration_s=sum(d for _, d in DEFAULT_SCHEDULE),
@@ -297,6 +305,9 @@ def main() -> int:
         phase_schedule=DEFAULT_SCHEDULE,
         image_name=module.name + "-target",
         snapshot_name="baseline-v1",
+        qmp_socket=qmp_sock if qmp_sock.exists() else None,
+        guest_agent_socket=agent_sock if agent_sock.exists() else None,
+        bridge_iface=os.environ.get("BRIDGE") or None,
         sample=sample,
         exploit_meta={
             "framework": "metasploit",


@@ -238,14 +238,26 @@ def main(argv: list[str] | None = None) -> int:
         sys.stdout.flush()
         return 0
 
-    # Open the virtio-serial port. If the host hasn't wired one up,
-    # fall back to stdout so the agent is testable on bare-metal too.
-    out_fp: Any
-    if os.path.exists(args.port):
-        out_fp = open(args.port, "wb", buffering=0)
-    else:
-        sys.stderr.write(f"[cis490-agent] {args.port} missing; writing to stdout\n")
-        out_fp = sys.stdout.buffer
+    # Open the virtio-serial port. The host wires this up via QEMU's
+    # virtserialport device; if it's missing, either virtio_console
+    # isn't loaded in the guest kernel, the device wasn't included on
+    # the QEMU command line, or udev hasn't created the symlink yet.
+    # Exit loudly so OpenRC re-runs us (per service config) and so
+    # the failure is visible in /var/log/cis490-agent.log instead of
+    # being absorbed by a silent stdout fallback. Refs PIPELINE.md
+    # §1 — a host that can't meet the bar must say so loudly, not
+    # silently downgrade to a half-running state.
+    if not os.path.exists(args.port):
+        sys.stderr.write(
+            f"[cis490-agent] FATAL: virtio-serial port {args.port} not "
+            f"present. Check (a) virtio_console kernel module is loaded "
+            f"inside the guest, (b) the QEMU command line includes "
+            f"-device virtserialport,name=cis490.guest.agent, (c) udev "
+            f"is creating /dev/virtio-ports/* symlinks. Exiting nonzero "
+            f"so this failure is observable rather than silently lost.\n"
+        )
+        return 2
+    out_fp = open(args.port, "wb", buffering=0)
 
     interval_ns = args.interval_ms * 1_000_000
     next_tick = time.monotonic_ns()