PIPELINE §5 step 1: fix four root-cause defects
Diagnoses + fixes for the silent-collector / never-lands-session
failures that the 200-episode quality probe surfaced (§3 evidence).
All four address the producer; no compensating layers added.
perf collector (rows_perf=0 on 100% of episodes):
- perf stat -j writes to stderr by default with -p; we read stdout.
Add --log-fd 1 so JSON reaches stdout where the parser sees it.
- Event names come back annotated with the privilege scope perf
actually measured ("cycles:u" under perf_event_paranoid=2). Strip
the suffix so _build_row's plain-name lookups hit. Without this
every metric was None even when perf reported real numbers.
- tests/test_collectors_emit.py covers the regression with a real
busy-loop fixture; emit-test discipline per §4.4.
guest-agent collector (rows_guest=0 on 100% of episodes):
- Alpine cloud image doesn't ship python3, so the in-guest agent's
`#!/usr/bin/env python3` shebang silently fails. Add packages:
[python3] to cidata user-data so cloud-init installs it before
the OpenRC service starts.
- Guest agent now exits nonzero (was: silent stdout fallback) when
/dev/virtio-ports/cis490.guest.agent is missing, so OpenRC
reports the failure to /var/log/cis490-agent.log instead of the
bytes vanishing into the void. Refs §1.
- Host-side collector emits guest_agent_connected /
guest_agent_first_byte / guest_agent_silent_window into the
orchestrator's events.jsonl. Future episodes show the in-guest
failure mode per-episode instead of inferring from rows_guest=0.
k-gamingcom missing qmp/netflow/pcap (also affected elliott on
Tier-3 episodes — was misclassified as host divergence):
- tools/run_tier3_demo.py was building EpisodeConfig WITHOUT
qmp_socket / guest_agent_socket / bridge_iface — even though
launch_target.sh creates the underlying chardevs and BRIDGE
supplies the iface. tools/run_real_vm_demo.py wires them
correctly; Tier-3 had a copy-paste gap.
- tests/test_collectors_emit.py adds a source-grep regression so
the wiring stays honest.
samba_usermap_script never lands session (0/67 in §3 probe):
- Bind handler default WfsDelay (~5s) gives up before bind_perl on
Metasploitable2 has finished forking + binding LPORT under
SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in
exploits/driver.py so framework + driver agree on the wait
budget. Add ConnectTimeout=15 so the handler's bind connect has
retry budget instead of one-shot.
orchestrator/fleet.py: usable_modules + BRIDGE handling were both
unconditional, so:
- With BRIDGE set, requires_bridge modules were still being
dropped — picker only ever returned samba_usermap_script across
every slot/episode (the test_fleet_uses_all_modules_when_bridge_set
failure on HEAD).
- env.pop("BRIDGE") fired even when BRIDGE was the operator's
explicit setup, breaking modules that need bridge mode (vsftpd
backdoor on hardcoded port 6200, distccd, etc.).
Both made conditional on bridge_set so the picker walks the full
catalog under bridge mode and SLIRP-only modules still get a
clean SLIRP env when BRIDGE is unset.
receiver/app.py: half-pregnant v2 schema state in HEAD — calling
store.ingest_stream(episode_type=..., benign_profile=...) with
kwargs the matching store.py change was in the WIP stash. Removed
v2 awareness from app.py so v1 episodes (what the producer ships
today) get accepted again. SCHEMA_VERSION default reset to 1 to
match.
229 passed, 0 failed. (HEAD had 15 failures, all linked to the
half-pregnant v2 state above.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
bfb1c491f8
commit
4ab5477226
10 changed files with 339 additions and 49 deletions
|
|
@ -73,17 +73,46 @@ def run_loop(
|
|||
stop_event: threading.Event,
|
||||
*,
|
||||
connect_timeout_s: float = 30.0,
|
||||
emit_event: "callable | None" = None,
|
||||
) -> int:
|
||||
"""Read agent JSON-lines from the host-side virtio-serial unix
|
||||
socket. Re-stamp each row with the host clock and persist."""
|
||||
socket. Re-stamp each row with the host clock and persist.
|
||||
|
||||
When ``emit_event`` is provided, the collector emits diagnostic
|
||||
events into the orchestrator's events.jsonl on each lifecycle
|
||||
boundary (connect / first-byte / silent-window / disconnect). This
|
||||
is what makes silent in-guest failures *visible* in the dataset:
|
||||
if connect succeeded but first_byte never came, every episode
|
||||
shows it. Without these markers the only signal was rows_guest=0,
|
||||
which is indistinguishable from "agent collector wasn't even
|
||||
enabled." Refs PIPELINE.md §1 + §4.4.
|
||||
"""
|
||||
sock_path = Path(socket_path)
|
||||
sock = _connect(sock_path, connect_timeout_s)
|
||||
if sock is None:
|
||||
log.warning(
|
||||
"guest-agent: socket %s never came up after %.1fs — agent "
|
||||
"is not running in the guest, virtserialport device is "
|
||||
"missing from the QEMU command line, or the chardev "
|
||||
"couldn't bind. 0 rows will be emitted.",
|
||||
sock_path, connect_timeout_s,
|
||||
)
|
||||
if emit_event is not None:
|
||||
emit_event("guest_agent_connect_failed",
|
||||
socket_path=str(sock_path),
|
||||
timeout_s=connect_timeout_s)
|
||||
return 0
|
||||
|
||||
if emit_event is not None:
|
||||
emit_event("guest_agent_connected", socket_path=str(sock_path))
|
||||
|
||||
rows = 0
|
||||
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
buf = b""
|
||||
first_byte_at_mono_ns: int | None = None
|
||||
silent_warned = False
|
||||
silent_warn_after_s = 5.0
|
||||
connect_mono_ns = time.monotonic_ns()
|
||||
try:
|
||||
with output_path.open("a", buffering=1) as f:
|
||||
while not stop_event.is_set():
|
||||
|
|
@ -91,6 +120,27 @@ def run_loop(
|
|||
sock.settimeout(0.5)
|
||||
chunk = sock.recv(8192)
|
||||
except socket.timeout:
|
||||
# The socket is open but nothing's arriving. Emit
|
||||
# exactly one warning when the silent window
|
||||
# exceeds silent_warn_after_s — this is the loud
|
||||
# signal §1 demands when the in-guest agent is
|
||||
# connected but not producing.
|
||||
if (not silent_warned and first_byte_at_mono_ns is None
|
||||
and (time.monotonic_ns() - connect_mono_ns)
|
||||
> silent_warn_after_s * 1e9):
|
||||
log.warning(
|
||||
"guest-agent: socket connected but no bytes "
|
||||
"after %.1fs — in-guest agent likely crashed "
|
||||
"or isn't writing to /dev/virtio-ports/"
|
||||
"cis490.guest.agent",
|
||||
silent_warn_after_s,
|
||||
)
|
||||
if emit_event is not None:
|
||||
emit_event(
|
||||
"guest_agent_silent_window",
|
||||
window_s=silent_warn_after_s,
|
||||
)
|
||||
silent_warned = True
|
||||
continue
|
||||
except OSError as e:
|
||||
log.warning("guest-agent recv failed: %s", e)
|
||||
|
|
@ -98,6 +148,20 @@ def run_loop(
|
|||
if not chunk:
|
||||
log.info("guest-agent socket closed")
|
||||
break
|
||||
if first_byte_at_mono_ns is None:
|
||||
first_byte_at_mono_ns = time.monotonic_ns()
|
||||
log.info(
|
||||
"guest-agent: first byte received %.2fs after connect",
|
||||
(first_byte_at_mono_ns - connect_mono_ns) / 1e9,
|
||||
)
|
||||
if emit_event is not None:
|
||||
emit_event(
|
||||
"guest_agent_first_byte",
|
||||
wait_after_connect_s=(
|
||||
(first_byte_at_mono_ns - connect_mono_ns)
|
||||
/ 1e9
|
||||
),
|
||||
)
|
||||
buf += chunk
|
||||
while b"\n" in buf:
|
||||
line, _, buf = buf.partition(b"\n")
|
||||
|
|
|
|||
|
|
@ -127,11 +127,17 @@ def run_loop(
|
|||
log.warning("perf binary not on PATH — perf collector disabled")
|
||||
return 0
|
||||
|
||||
# perf stat writes its output (including -j JSON) to stderr by
|
||||
# default when -p / --pid is in use; only when perf forks the
|
||||
# workload itself does it go to stdout. --log-fd 1 forces output
|
||||
# onto fd 1 so we can stream it through proc.stdout. Without this
|
||||
# the collector silently writes 0 rows on every episode.
|
||||
cmd = [
|
||||
"perf", "stat",
|
||||
"-p", str(pid),
|
||||
"-I", str(interval_ms),
|
||||
"-j",
|
||||
"--log-fd", "1",
|
||||
"-e", ",".join(events),
|
||||
]
|
||||
log.info("starting perf: %s", " ".join(cmd))
|
||||
|
|
@ -179,6 +185,12 @@ def run_loop(
|
|||
value = _coerce_int(evt.get("counter-value"))
|
||||
if interval is None or event_name is None:
|
||||
continue
|
||||
# perf annotates event names with the privilege scope it
|
||||
# was actually able to measure (e.g. "cycles:u" when only
|
||||
# userspace is permitted under perf_event_paranoid=2).
|
||||
# Strip the suffix so _build_row's plain-name lookups
|
||||
# ("cycles", "instructions", ...) hit.
|
||||
event_name = event_name.split(":", 1)[0]
|
||||
# perf emits one JSON per (event, interval); a new
|
||||
# interval value means we should flush the previous row.
|
||||
if cur_interval is not None and interval != cur_interval:
|
||||
|
|
|
|||
|
|
@ -15,12 +15,27 @@ path = "multi/samba/usermap_script"
|
|||
[module.options]
|
||||
RHOSTS = "{{ target_ip }}"
|
||||
RPORT = 139
|
||||
# WfsDelay = wait-for-session, the budget Metasploit's payload handler
|
||||
# has to (a) verify the bind shell on the guest is up and (b) connect
|
||||
# to it. Default is ~5s. On Metasploitable2 the perl bind payload
|
||||
# takes longer than that to fork+bind under SLIRP+hostfwd, so the
|
||||
# handler gives up before the listener is ready and no session lands.
|
||||
# 30s gives bind_perl + the SLIRP forward time to settle. Matches
|
||||
# session_open_timeout_s in exploits/driver.py so the driver and the
|
||||
# framework agree on the wait budget. Refs PIPELINE.md §3 (0/67
|
||||
# session_open finding).
|
||||
WfsDelay = 30
|
||||
|
||||
[payload]
|
||||
path = "cmd/unix/bind_perl"
|
||||
|
||||
[payload.options]
|
||||
LPORT = 4444
|
||||
# Give the handler retry budget when connecting to the bind port.
|
||||
# msfrpcd's BindTcp handler retries every second up to ConnectTimeout
|
||||
# until the perl listener accepts. Without this, a single failed
|
||||
# connect aborts the session.
|
||||
ConnectTimeout = 15
|
||||
|
||||
[session]
|
||||
type = "shell"
|
||||
|
|
|
|||
|
|
@ -286,6 +286,12 @@ class EpisodeRunner:
|
|||
output_path=self.episode_dir / "telemetry-guest.jsonl",
|
||||
t_mono_origin_ns=self._t_mono_origin_ns,
|
||||
stop_event=self._stop,
|
||||
# Pipe lifecycle events into the orchestrator's
|
||||
# events.jsonl so silent in-guest failures (agent
|
||||
# crashed, virtio-serial misconfigured, etc.) are
|
||||
# observable per-episode instead of inferred from a
|
||||
# rows_guest=0 metric. Refs PIPELINE.md §1 / §4.4.
|
||||
emit_event=self.emit_event,
|
||||
)
|
||||
|
||||
def _perf_collector() -> None:
|
||||
|
|
|
|||
|
|
@ -243,14 +243,21 @@ def _run_slot(
|
|||
run_dir_base = "/tmp/cis490-vm-fleet"
|
||||
|
||||
# Decide tier.
|
||||
# Tier-3 target VMs always use SLIRP+hostfwd so msfrpcd can reach
|
||||
# the guest via loopback. BRIDGE tap is for the Tier-2 idle VM only
|
||||
# (pcap source 4). Skip modules that need bridge egress (bind/reverse
|
||||
# shells that open a callback port the guest dials back or binds).
|
||||
# Tier-3 modules split into two classes by `requires_bridge`:
|
||||
# - bind/reverse-shell payloads under SLIRP need only loopback
|
||||
# hostfwd (samba_usermap_script with bind_perl, etc.).
|
||||
# - modules with hardcoded callback ports or guest-driven
|
||||
# callbacks (vsftpd's port-6200 backdoor, distccd, php_cgi,
|
||||
# unreal_ircd) need a bridge so each guest gets its own IP.
|
||||
# When the operator sets BRIDGE (= bridge configured + tap
|
||||
# available), every module is usable. Without BRIDGE we drop the
|
||||
# bridge-only ones — running them under SLIRP would either fail
|
||||
# to land or collide on shared loopback ports across slots.
|
||||
bridge_set = bool(os.environ.get("BRIDGE"))
|
||||
usable_modules: dict[str, ModuleConfig] = (
|
||||
{k: v for k, v in cfg.modules.items() if not v.requires_bridge}
|
||||
if cfg.modules else {}
|
||||
)
|
||||
dict(cfg.modules) if bridge_set
|
||||
else {k: v for k, v in cfg.modules.items() if not v.requires_bridge}
|
||||
) if cfg.modules else {}
|
||||
tier3_ready = (
|
||||
not cfg.force_tier2
|
||||
and bool(usable_modules)
|
||||
|
|
@ -302,9 +309,14 @@ def _run_slot(
|
|||
target_ports += f",{extra_host_port}:{extra_host_port}"
|
||||
env["FLEET_PAYLOAD_LPORT"] = str(extra_host_port)
|
||||
env["TARGET_PORTS"] = target_ports
|
||||
# Remove BRIDGE so launch_target.sh uses SLIRP+hostfwd instead of
|
||||
# tap. Target VM connectivity goes through the hostfwd loopback ports;
|
||||
# tap/bridge requires guest-IP discovery which isn't wired up yet.
|
||||
# When BRIDGE is unset, force SLIRP+hostfwd; when it IS set we
|
||||
# keep it so requires_bridge modules (vsftpd backdoor on the
|
||||
# hardcoded port 6200, distccd, etc.) can reach the guest via
|
||||
# its own bridge IP. Refs Bug 1 in TIER3-BRINGUP.md (BRIDGE
|
||||
# leaking from Tier-2 into Tier-3 broke things) — that fix was
|
||||
# too aggressive; it stripped BRIDGE even when the module
|
||||
# legitimately needed it.
|
||||
if not bridge_set:
|
||||
env.pop("BRIDGE", None)
|
||||
cmd = [
|
||||
py,
|
||||
|
|
|
|||
|
|
@ -20,17 +20,7 @@ log = logging.getLogger("cis490.receiver")
|
|||
|
||||
|
||||
SUFFIX = ".tar.zst"
|
||||
SCHEMA_VERSION = 2
|
||||
|
||||
# Mirrored from orchestrator.benign so the receiver can validate the
|
||||
# benign-profile header without taking a dependency on the orchestrator
|
||||
# package. Keep in sync if BENIGN_PROFILES grows.
|
||||
_VALID_BENIGN_PROFILES: frozenset[str] = frozenset({
|
||||
"idle", "web_visitor", "admin_session", "cron_burst",
|
||||
"file_browse", "db_query", "package_check",
|
||||
})
|
||||
_VALID_EPISODE_TYPES: frozenset[str] = frozenset({"control", "infected"})
|
||||
|
||||
SCHEMA_VERSION = 1
|
||||
|
||||
def _bearer_check(request: Request, expected: str | None) -> Response | None:
|
||||
if expected is None:
|
||||
|
|
@ -98,7 +88,7 @@ def make_app(
|
|||
expected_sha = expected_sha.lower()
|
||||
|
||||
try:
|
||||
schema_version = int(request.headers.get("x-schema-version", "2"))
|
||||
schema_version = int(request.headers.get("x-schema-version", "1"))
|
||||
except ValueError:
|
||||
return JSONResponse({"error": "bad X-Schema-Version"}, status_code=400)
|
||||
|
||||
|
|
@ -163,21 +153,6 @@ def make_app(
|
|||
)
|
||||
return JSONResponse(body, status_code=412)
|
||||
|
||||
# Optional matrix-stratification headers. Validated against the
|
||||
# closed enums so a misbehaving shipper can't write garbage into
|
||||
# the index. Unknown values are dropped (header treated as absent)
|
||||
# and logged so the operator can spot a version drift quickly.
|
||||
episode_type = (request.headers.get("x-episode-type") or "").strip().lower()
|
||||
if episode_type and episode_type not in _VALID_EPISODE_TYPES:
|
||||
log.warning("dropping unknown X-Episode-Type=%r host=%s id=%s",
|
||||
episode_type, host_id, episode_id)
|
||||
episode_type = ""
|
||||
benign_profile = (request.headers.get("x-benign-profile") or "").strip().lower()
|
||||
if benign_profile and benign_profile not in _VALID_BENIGN_PROFILES:
|
||||
log.warning("dropping unknown X-Benign-Profile=%r host=%s id=%s",
|
||||
benign_profile, host_id, episode_id)
|
||||
benign_profile = ""
|
||||
|
||||
cl = request.headers.get("content-length")
|
||||
if cl is not None:
|
||||
try:
|
||||
|
|
@ -194,8 +169,6 @@ def make_app(
|
|||
expected_sha256=expected_sha,
|
||||
schema_version=schema_version,
|
||||
commit=commit or None,
|
||||
episode_type=episode_type or None,
|
||||
benign_profile=benign_profile or None,
|
||||
body=request.stream(),
|
||||
max_bytes=max_episode_bytes,
|
||||
)
|
||||
|
|
|
|||
174
tests/test_collectors_emit.py
Normal file
174
tests/test_collectors_emit.py
Normal file
|
|
@ -0,0 +1,174 @@
|
|||
"""§4.4 collector emit tests — each collector MUST produce >=1 row when
|
||||
run for a few seconds against a synthesized busy workload. A collector
|
||||
that fails this is removed from the active set (PIPELINE.md §4.4) — no
|
||||
silent zero-row inclusion.
|
||||
|
||||
These tests intentionally invoke the real collector binaries (perf,
|
||||
tcpdump) against real subprocesses. They will skip on environments
|
||||
where the binary or capability is unavailable, but they will fail —
|
||||
not skip — when the binary IS present and the collector still emits
|
||||
zero rows. The whole point is to catch the "collector silently
|
||||
disabled" failure mode.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import socket
|
||||
import subprocess
|
||||
import threading
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from collectors import perf_qemu
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _spawn_busy_loop() -> subprocess.Popen:
|
||||
"""Spawn a CPU-burning child whose PID we can hand to a collector.
|
||||
`exec yes` so the captured PID IS the busy process — without exec,
|
||||
the captured PID is the wrapping shell that sits parked waiting on
|
||||
its child, and perf samples an idle process."""
|
||||
return subprocess.Popen(
|
||||
["sh", "-c", "exec yes >/dev/null"],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.DEVNULL,
|
||||
)
|
||||
|
||||
|
||||
def _run_collector_briefly(target, *, seconds: float, **kw) -> int:
|
||||
"""Spin a collector run_loop in a thread for `seconds`, then stop it.
|
||||
Returns the row count the collector reports."""
|
||||
stop = threading.Event()
|
||||
result: dict[str, int] = {}
|
||||
|
||||
def _go() -> None:
|
||||
result["rows"] = target(stop_event=stop, **kw)
|
||||
|
||||
th = threading.Thread(target=_go, daemon=True)
|
||||
th.start()
|
||||
time.sleep(seconds)
|
||||
stop.set()
|
||||
th.join(timeout=seconds + 5.0)
|
||||
return result.get("rows", 0)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# perf
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
shutil.which("perf") is None,
|
||||
reason="perf binary not on PATH; this host can't host the perf collector",
|
||||
)
|
||||
def test_perf_emits_rows_against_busy_pid(tmp_path: Path) -> None:
|
||||
"""The perf collector must emit at least one row when pointed at a
|
||||
busy PID for a few seconds. Software events (page-faults,
|
||||
context-switches, cpu-clock) are used so the test is portable
|
||||
across CPUs that lack hardware performance counters; the production
|
||||
DEFAULT_EVENTS adds hardware events on top, which is fine where
|
||||
they're available and degrades gracefully where they're not.
|
||||
|
||||
Regression for: perf stat -j writes to stderr by default with -p,
|
||||
so reading proc.stdout silently gives 0 lines and 0 rows. Fixed
|
||||
by passing --log-fd 1 in the perf invocation.
|
||||
"""
|
||||
busy = _spawn_busy_loop()
|
||||
try:
|
||||
out = tmp_path / "telemetry-perf.jsonl"
|
||||
rows = _run_collector_briefly(
|
||||
perf_qemu.run_loop,
|
||||
seconds=2.0,
|
||||
pid=busy.pid,
|
||||
output_path=out,
|
||||
t_mono_origin_ns=0,
|
||||
interval_ms=200,
|
||||
events=("page-faults", "context-switches", "cpu-clock"),
|
||||
)
|
||||
finally:
|
||||
busy.terminate()
|
||||
try:
|
||||
busy.wait(timeout=2.0)
|
||||
except subprocess.TimeoutExpired:
|
||||
busy.kill()
|
||||
busy.wait(timeout=1.0)
|
||||
|
||||
assert rows >= 1, (
|
||||
f"perf collector wrote 0 rows against a busy PID — see "
|
||||
f"PIPELINE.md §4.4. File: {out}, exists={out.exists()}, "
|
||||
f"size={out.stat().st_size if out.exists() else 'n/a'}"
|
||||
)
|
||||
# Sanity-check the on-disk file matches what run_loop reported.
|
||||
on_disk = out.read_text().splitlines() if out.exists() else []
|
||||
assert len(on_disk) == rows, (
|
||||
f"row count mismatch: run_loop returned {rows} but "
|
||||
f"{len(on_disk)} lines on disk"
|
||||
)
|
||||
# Spot-check the row shape — one parsed row should have the
|
||||
# expected schema.
|
||||
sample = json.loads(on_disk[0])
|
||||
assert sample["source"] == "host_perf"
|
||||
assert sample["available_in_deployment"] is False
|
||||
assert "t_mono_ns" in sample and "interval_s" in sample
|
||||
# At least one row must have a populated metric — if every metric
|
||||
# is None on every row, the parser is dropping values. Regression
|
||||
# for: event names come back as "cycles:u" / "instructions:u"
|
||||
# under perf_event_paranoid=2 (userspace-only), but `_build_row`
|
||||
# looks up plain "cycles" / "instructions" — so every metric was
|
||||
# silently null even when perf reported real numbers. The mapped
|
||||
# fields in the row schema are cycles, instructions, page_faults,
|
||||
# context_switches, branches, branch_misses, cache_references,
|
||||
# cache_misses; we only need ANY of them populated to confirm the
|
||||
# parser is wiring values into the row.
|
||||
parsed = [json.loads(l) for l in on_disk]
|
||||
metric_keys = ("cycles", "instructions", "page_faults",
|
||||
"context_switches", "branches")
|
||||
assert any(r.get(k) is not None for r in parsed for k in metric_keys), (
|
||||
f"every metric is None on every row — perf parser is dropping "
|
||||
f"values. Sample row: {parsed[0]}"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tier-3 demo wiring regression
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_run_tier3_demo_wires_collector_sockets_into_episode_config() -> None:
|
||||
"""`run_tier3_demo.py` must pass qmp_socket / guest_agent_socket /
|
||||
bridge_iface to EpisodeConfig the same way `run_real_vm_demo.py`
|
||||
does. Without these, those collectors silently emit zero rows on
|
||||
every Tier-3 episode even though launch_target.sh creates the
|
||||
underlying chardevs. Regression for: bug found 2026-05-03 against
|
||||
elliott-thinkpad + k-gamingcom (rows_qmp=0 / rows_guest=0 / pcap=0
|
||||
on 100% of Tier-3 episodes).
|
||||
|
||||
This is a source-grep test rather than an exec test because
|
||||
run_tier3_demo.py boots qemu + msfrpcd, neither of which is
|
||||
available in CI. The grep keeps the wiring honest with no
|
||||
runtime cost."""
|
||||
src = (Path(__file__).resolve().parent.parent
|
||||
/ "tools" / "run_tier3_demo.py").read_text()
|
||||
# The exact fragments that, if absent, mean the collectors will
|
||||
# silently never start. Each must appear as a keyword arg of the
|
||||
# EpisodeConfig(...) constructor call site.
|
||||
for needle in (
|
||||
"qmp_socket=qmp_sock",
|
||||
"guest_agent_socket=agent_sock",
|
||||
"bridge_iface=os.environ.get(\"BRIDGE\")",
|
||||
):
|
||||
assert needle in src, (
|
||||
f"run_tier3_demo.py is missing `{needle}` on its "
|
||||
f"EpisodeConfig — see PIPELINE.md §4.4. Tier-3 episodes "
|
||||
f"will silently produce 0 rows for the corresponding "
|
||||
f"collector."
|
||||
)
|
||||
|
|
@ -90,7 +90,18 @@ def build_user_data(*, embed_agent: bool, agent_path: Path | None) -> bytes:
|
|||
raise FileNotFoundError(f"agent script not found: {agent_path}")
|
||||
agent_src = agent_path.read_text()
|
||||
|
||||
# The Alpine cloud image (alpine-virt-3.X.Y-x86_64.qcow2) does not
|
||||
# ship python3 by default, so the agent's `#!/usr/bin/env python3`
|
||||
# shebang fails and OpenRC silently can't start the service.
|
||||
# Result: telemetry-guest.jsonl is empty on every episode. Install
|
||||
# python3 via cloud-init BEFORE the runcmd that starts the service.
|
||||
# Refs PIPELINE.md §1 — a host that can't run the agent must say so
|
||||
# loudly; the loud-fail in vm/guest-agent/cis490_agent.py + this
|
||||
# explicit dep install close the silent-downgrade loop.
|
||||
body = head + (
|
||||
"packages:\n"
|
||||
" - python3\n"
|
||||
"package_update: true\n"
|
||||
"write_files:\n"
|
||||
" - path: /usr/local/bin/cis490-agent\n"
|
||||
" permissions: '0755'\n"
|
||||
|
|
|
|||
|
|
@ -289,6 +289,14 @@ def main() -> int:
|
|||
)
|
||||
)
|
||||
|
||||
# Wire the same collector sockets the Tier-2 path wires. Without
|
||||
# these, EpisodeConfig defaults to None and the qmp / guest-agent
|
||||
# / pcap collectors never start — even though launch_target.sh
|
||||
# creates the qmp.sock + agent.sock chardevs and BRIDGE supplies
|
||||
# the iface. Refs PIPELINE.md §4.4: a collector that appears
|
||||
# configured but emits zero rows is exactly the silent-downgrade
|
||||
# pattern §1 forbids.
|
||||
agent_sock = run_dir / "agent.sock"
|
||||
cfg = EpisodeConfig(
|
||||
target_pid=qemu_pid,
|
||||
duration_s=sum(d for _, d in DEFAULT_SCHEDULE),
|
||||
|
|
@ -297,6 +305,9 @@ def main() -> int:
|
|||
phase_schedule=DEFAULT_SCHEDULE,
|
||||
image_name=module.name + "-target",
|
||||
snapshot_name="baseline-v1",
|
||||
qmp_socket=qmp_sock if qmp_sock.exists() else None,
|
||||
guest_agent_socket=agent_sock if agent_sock.exists() else None,
|
||||
bridge_iface=os.environ.get("BRIDGE") or None,
|
||||
sample=sample,
|
||||
exploit_meta={
|
||||
"framework": "metasploit",
|
||||
|
|
|
|||
|
|
@ -238,14 +238,26 @@ def main(argv: list[str] | None = None) -> int:
|
|||
sys.stdout.flush()
|
||||
return 0
|
||||
|
||||
# Open the virtio-serial port. If the host hasn't wired one up,
|
||||
# fall back to stdout so the agent is testable on bare-metal too.
|
||||
out_fp: Any
|
||||
if os.path.exists(args.port):
|
||||
# Open the virtio-serial port. The host wires this up via QEMU's
|
||||
# virtserialport device; if it's missing, either virtio_console
|
||||
# isn't loaded in the guest kernel, the device wasn't included on
|
||||
# the QEMU command line, or udev hasn't created the symlink yet.
|
||||
# Exit loudly so OpenRC re-runs us (per service config) and so
|
||||
# the failure is visible in /var/log/cis490-agent.log instead of
|
||||
# being absorbed by a silent stdout fallback. Refs PIPELINE.md
|
||||
# §1 — a host that can't meet the bar must say so loudly, not
|
||||
# silently downgrade to a half-running state.
|
||||
if not os.path.exists(args.port):
|
||||
sys.stderr.write(
|
||||
f"[cis490-agent] FATAL: virtio-serial port {args.port} not "
|
||||
f"present. Check (a) virtio_console kernel module is loaded "
|
||||
f"inside the guest, (b) the QEMU command line includes "
|
||||
f"-device virtserialport,name=cis490.guest.agent, (c) udev "
|
||||
f"is creating /dev/virtio-ports/* symlinks. Exiting nonzero "
|
||||
f"so this failure is observable rather than silently lost.\n"
|
||||
)
|
||||
return 2
|
||||
out_fp = open(args.port, "wb", buffering=0)
|
||||
else:
|
||||
sys.stderr.write(f"[cis490-agent] {args.port} missing; writing to stdout\n")
|
||||
out_fp = sys.stdout.buffer
|
||||
|
||||
interval_ns = args.interval_ms * 1_000_000
|
||||
next_tick = time.monotonic_ns()
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue