PIPELINE §5 step 1: fix four root-cause defects
Diagnoses + fixes for the silent-collector / never-lands-session
failures that the 200-episode quality probe surfaced (§3 evidence).
All four address the producer; no compensating layers added.
perf collector (rows_perf=0 on 100% of episodes):
- perf stat writes its counter output (including -j JSON) to stderr
by default, but we read stdout. Add --log-fd 1 so the JSON reaches
stdout where the parser sees it.
- Event names come back annotated with the privilege scope perf
actually measured ("cycles:u" under perf_event_paranoid=2). Strip
the suffix so _build_row's plain-name lookups hit. Without this
every metric was None even when perf reported real numbers.
- tests/test_collectors_emit.py covers the regression with a real
busy-loop fixture; emit-test discipline per §4.4.
guest-agent collector (rows_guest=0 on 100% of episodes):
- Alpine cloud image doesn't ship python3, so the in-guest agent's
`#!/usr/bin/env python3` shebang silently fails. Add packages:
[python3] to cidata user-data so cloud-init installs it before
the OpenRC service starts.
- Guest agent now exits nonzero (was: silent stdout fallback) when
/dev/virtio-ports/cis490.guest.agent is missing, so OpenRC
reports the failure to /var/log/cis490-agent.log instead of the
bytes vanishing into the void. Refs §1.
- Host-side collector emits guest_agent_connected /
guest_agent_connect_failed / guest_agent_first_byte /
guest_agent_silent_window into the orchestrator's events.jsonl.
Future episodes record the in-guest failure mode per episode
instead of leaving it to be inferred from rows_guest=0.
k-gamingcom missing qmp/netflow/pcap (also affected elliott on
Tier-3 episodes — was misclassified as host divergence):
- tools/run_tier3_demo.py was building EpisodeConfig WITHOUT
qmp_socket / guest_agent_socket / bridge_iface — even though
launch_target.sh creates the underlying chardevs and BRIDGE
supplies the iface. tools/run_real_vm_demo.py wires them
correctly; Tier-3 had a copy-paste gap.
- tests/test_collectors_emit.py adds a source-grep regression so
the wiring stays honest.
samba_usermap_script never lands session (0/67 in §3 probe):
- Bind handler default WfsDelay (~5s) gives up before bind_perl on
Metasploitable2 has finished forking + binding LPORT under
SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in
exploits/driver.py so framework + driver agree on the wait
budget. Add ConnectTimeout=15 so the handler's bind connect gets a
retry budget instead of a single attempt.
orchestrator/fleet.py: usable_modules + BRIDGE handling were both
unconditional, so:
- With BRIDGE set, requires_bridge modules were still being
dropped — picker only ever returned samba_usermap_script across
every slot/episode (the test_fleet_uses_all_modules_when_bridge_set
failure on HEAD).
- env.pop("BRIDGE") fired even when BRIDGE was the operator's
explicit setup, breaking modules that need bridge mode (vsftpd
backdoor on hardcoded port 6200, distccd, etc.).
Both are now conditional on bridge_set: under bridge mode the picker
walks the full catalog and BRIDGE is passed through, while without
BRIDGE the SLIRP-only modules still get a clean SLIRP env.
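The two conditional branches can be expressed as one pure function, which makes the bridge/no-bridge behaviour easy to test. Sketch only. `Module` stands in for `ModuleConfig`, `plan_slot` is hypothetical, and the module names in the usage are illustrative. The real code reads `os.environ` and mutates the child `env` in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Stand-in for ModuleConfig; only the field the picker consults."""
    requires_bridge: bool = False


def plan_slot(modules: dict[str, Module],
              env: dict[str, str]) -> tuple[dict[str, Module], dict[str, str]]:
    """Return (usable modules, child env) for one fleet slot.

    With BRIDGE set, every module is usable and BRIDGE is passed through.
    Without it, bridge-only modules are dropped and any empty BRIDGE is
    removed so the child falls back to SLIRP+hostfwd."""
    bridge_set = bool(env.get("BRIDGE"))
    usable = (
        dict(modules) if bridge_set
        else {k: v for k, v in modules.items() if not v.requires_bridge}
    )
    child_env = dict(env)
    if not bridge_set:
        child_env.pop("BRIDGE", None)
    return usable, child_env
```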
receiver/app.py: HEAD was in a half-pregnant v2 schema state. app.py
called store.ingest_stream(episode_type=..., benign_profile=...), but
the store.py change that accepts those kwargs was still in the WIP
stash. Removed the v2 awareness from app.py so v1 episodes (what the
producer ships today) are accepted again. SCHEMA_VERSION and the
X-Schema-Version header default are reset to 1 to match.
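This commit has to change both `SCHEMA_VERSION` and the literal header default. A sketch of deriving the default from the constant, so the two cannot drift apart again. `parse_schema_version` is hypothetical, and headers are modelled as a plain lower-cased dict rather than the framework's request object.

```python
SCHEMA_VERSION = 1


def parse_schema_version(headers: dict[str, str]) -> int:
    """Hypothetical helper: a missing X-Schema-Version means the current
    producer version. A non-integer value raises ValueError, which the
    caller turns into a 400 as the receiver does."""
    return int(headers.get("x-schema-version", str(SCHEMA_VERSION)))
```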
229 passed, 0 failed. (HEAD had 15 failures, all linked to the
half-pregnant v2 state above.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent bfb1c491f8
commit 4ab5477226
10 changed files with 339 additions and 49 deletions
@@ -73,17 +73,46 @@ def run_loop(
     stop_event: threading.Event,
     *,
     connect_timeout_s: float = 30.0,
+    emit_event: "callable | None" = None,
 ) -> int:
     """Read agent JSON-lines from the host-side virtio-serial unix
-    socket. Re-stamp each row with the host clock and persist."""
+    socket. Re-stamp each row with the host clock and persist.
+
+    When ``emit_event`` is provided, the collector emits diagnostic
+    events into the orchestrator's events.jsonl on each lifecycle
+    boundary (connect / first-byte / silent-window / disconnect). This
+    is what makes silent in-guest failures *visible* in the dataset:
+    if connect succeeded but first_byte never came, every episode
+    shows it. Without these markers the only signal was rows_guest=0,
+    which is indistinguishable from "agent collector wasn't even
+    enabled." Refs PIPELINE.md §1 + §4.4.
+    """
     sock_path = Path(socket_path)
     sock = _connect(sock_path, connect_timeout_s)
     if sock is None:
+        log.warning(
+            "guest-agent: socket %s never came up after %.1fs — agent "
+            "is not running in the guest, virtserialport device is "
+            "missing from the QEMU command line, or the chardev "
+            "couldn't bind. 0 rows will be emitted.",
+            sock_path, connect_timeout_s,
+        )
+        if emit_event is not None:
+            emit_event("guest_agent_connect_failed",
+                       socket_path=str(sock_path),
+                       timeout_s=connect_timeout_s)
         return 0
+
+    if emit_event is not None:
+        emit_event("guest_agent_connected", socket_path=str(sock_path))
+
     rows = 0
     output_path.parent.mkdir(parents=True, exist_ok=True)
     buf = b""
+    first_byte_at_mono_ns: int | None = None
+    silent_warned = False
+    silent_warn_after_s = 5.0
+    connect_mono_ns = time.monotonic_ns()
     try:
         with output_path.open("a", buffering=1) as f:
             while not stop_event.is_set():
@@ -91,6 +120,27 @@ def run_loop(
                     sock.settimeout(0.5)
                     chunk = sock.recv(8192)
                 except socket.timeout:
+                    # The socket is open but nothing's arriving. Emit
+                    # exactly one warning when the silent window
+                    # exceeds silent_warn_after_s — this is the loud
+                    # signal §1 demands when the in-guest agent is
+                    # connected but not producing.
+                    if (not silent_warned and first_byte_at_mono_ns is None
+                            and (time.monotonic_ns() - connect_mono_ns)
+                            > silent_warn_after_s * 1e9):
+                        log.warning(
+                            "guest-agent: socket connected but no bytes "
+                            "after %.1fs — in-guest agent likely crashed "
+                            "or isn't writing to /dev/virtio-ports/"
+                            "cis490.guest.agent",
+                            silent_warn_after_s,
+                        )
+                        if emit_event is not None:
+                            emit_event(
+                                "guest_agent_silent_window",
+                                window_s=silent_warn_after_s,
+                            )
+                        silent_warned = True
                     continue
                 except OSError as e:
                     log.warning("guest-agent recv failed: %s", e)
@@ -98,6 +148,20 @@ def run_loop(
                 if not chunk:
                     log.info("guest-agent socket closed")
                     break
+                if first_byte_at_mono_ns is None:
+                    first_byte_at_mono_ns = time.monotonic_ns()
+                    log.info(
+                        "guest-agent: first byte received %.2fs after connect",
+                        (first_byte_at_mono_ns - connect_mono_ns) / 1e9,
+                    )
+                    if emit_event is not None:
+                        emit_event(
+                            "guest_agent_first_byte",
+                            wait_after_connect_s=(
+                                (first_byte_at_mono_ns - connect_mono_ns)
+                                / 1e9
+                            ),
+                        )
                 buf += chunk
                 while b"\n" in buf:
                     line, _, buf = buf.partition(b"\n")
@@ -127,11 +127,17 @@ def run_loop(
         log.warning("perf binary not on PATH — perf collector disabled")
         return 0
 
+    # perf stat writes its counter output (including -j JSON) to
+    # stderr by default, whether it attaches with -p / --pid or forks
+    # the workload itself. --log-fd 1 redirects that output onto fd 1
+    # so we can stream it through proc.stdout. Without this the
+    # collector silently writes 0 rows on every episode.
     cmd = [
         "perf", "stat",
         "-p", str(pid),
         "-I", str(interval_ms),
         "-j",
+        "--log-fd", "1",
         "-e", ",".join(events),
     ]
     log.info("starting perf: %s", " ".join(cmd))
@@ -179,6 +185,12 @@ def run_loop(
                 value = _coerce_int(evt.get("counter-value"))
                 if interval is None or event_name is None:
                     continue
+                # perf annotates event names with the privilege scope it
+                # was actually able to measure (e.g. "cycles:u" when only
+                # userspace is permitted under perf_event_paranoid=2).
+                # Strip the suffix so _build_row's plain-name lookups
+                # ("cycles", "instructions", ...) hit.
+                event_name = event_name.split(":", 1)[0]
                 # perf emits one JSON per (event, interval); a new
                 # interval value means we should flush the previous row.
                 if cur_interval is not None and interval != cur_interval:
@@ -15,12 +15,27 @@ path = "multi/samba/usermap_script"
 [module.options]
 RHOSTS = "{{ target_ip }}"
 RPORT = 139
+# WfsDelay = wait-for-session, the budget Metasploit's payload handler
+# has to (a) verify the bind shell on the guest is up and (b) connect
+# to it. Default is ~5s. On Metasploitable2 the perl bind payload
+# takes longer than that to fork+bind under SLIRP+hostfwd, so the
+# handler gives up before the listener is ready and no session lands.
+# 30s gives bind_perl + the SLIRP forward time to settle. Matches
+# session_open_timeout_s in exploits/driver.py so the driver and the
+# framework agree on the wait budget. Refs PIPELINE.md §3 (0/67
+# session_open finding).
+WfsDelay = 30
 
 [payload]
 path = "cmd/unix/bind_perl"
 
 [payload.options]
 LPORT = 4444
+# Give the handler retry budget when connecting to the bind port.
+# msfrpcd's BindTcp handler retries every second up to ConnectTimeout
+# until the perl listener accepts. Without this, a single failed
+# connect aborts the session.
+ConnectTimeout = 15
 
 [session]
 type = "shell"
@@ -286,6 +286,12 @@ class EpisodeRunner:
                 output_path=self.episode_dir / "telemetry-guest.jsonl",
                 t_mono_origin_ns=self._t_mono_origin_ns,
                 stop_event=self._stop,
+                # Pipe lifecycle events into the orchestrator's
+                # events.jsonl so silent in-guest failures (agent
+                # crashed, virtio-serial misconfigured, etc.) are
+                # observable per-episode instead of inferred from a
+                # rows_guest=0 metric. Refs PIPELINE.md §1 / §4.4.
+                emit_event=self.emit_event,
             )
 
         def _perf_collector() -> None:
@@ -243,14 +243,21 @@ def _run_slot(
     run_dir_base = "/tmp/cis490-vm-fleet"
 
     # Decide tier.
-    # Tier-3 target VMs always use SLIRP+hostfwd so msfrpcd can reach
-    # the guest via loopback. BRIDGE tap is for the Tier-2 idle VM only
-    # (pcap source 4). Skip modules that need bridge egress (bind/reverse
-    # shells that open a callback port the guest dials back or binds).
+    # Tier-3 modules split into two classes by `requires_bridge`:
+    # - bind/reverse-shell payloads under SLIRP need only loopback
+    #   hostfwd (samba_usermap_script with bind_perl, etc.).
+    # - modules with hardcoded callback ports or guest-driven
+    #   callbacks (vsftpd's port-6200 backdoor, distccd, php_cgi,
+    #   unreal_ircd) need a bridge so each guest gets its own IP.
+    # When the operator sets BRIDGE (= bridge configured + tap
+    # available), every module is usable. Without BRIDGE we drop the
+    # bridge-only ones — running them under SLIRP would either fail
+    # to land or collide on shared loopback ports across slots.
+    bridge_set = bool(os.environ.get("BRIDGE"))
     usable_modules: dict[str, ModuleConfig] = (
-        {k: v for k, v in cfg.modules.items() if not v.requires_bridge}
-        if cfg.modules else {}
-    )
+        dict(cfg.modules) if bridge_set
+        else {k: v for k, v in cfg.modules.items() if not v.requires_bridge}
+    ) if cfg.modules else {}
     tier3_ready = (
         not cfg.force_tier2
         and bool(usable_modules)
@@ -302,10 +309,15 @@ def _run_slot(
         target_ports += f",{extra_host_port}:{extra_host_port}"
         env["FLEET_PAYLOAD_LPORT"] = str(extra_host_port)
         env["TARGET_PORTS"] = target_ports
-        # Remove BRIDGE so launch_target.sh uses SLIRP+hostfwd instead of
-        # tap. Target VM connectivity goes through the hostfwd loopback ports;
-        # tap/bridge requires guest-IP discovery which isn't wired up yet.
-        env.pop("BRIDGE", None)
+        # When BRIDGE is unset, force SLIRP+hostfwd; when it IS set we
+        # keep it so requires_bridge modules (vsftpd backdoor on the
+        # hardcoded port 6200, distccd, etc.) can reach the guest via
+        # its own bridge IP. Refs Bug 1 in TIER3-BRINGUP.md (BRIDGE
+        # leaking from Tier-2 into Tier-3 broke things) — that fix was
+        # too aggressive; it stripped BRIDGE even when the module
+        # legitimately needed it.
+        if not bridge_set:
+            env.pop("BRIDGE", None)
         cmd = [
             py,
             str(cfg.repo_root / "tools" / "run_tier3_demo.py"),
@@ -20,17 +20,7 @@ log = logging.getLogger("cis490.receiver")
 
 
 SUFFIX = ".tar.zst"
-SCHEMA_VERSION = 2
-
-# Mirrored from orchestrator.benign so the receiver can validate the
-# benign-profile header without taking a dependency on the orchestrator
-# package. Keep in sync if BENIGN_PROFILES grows.
-_VALID_BENIGN_PROFILES: frozenset[str] = frozenset({
-    "idle", "web_visitor", "admin_session", "cron_burst",
-    "file_browse", "db_query", "package_check",
-})
-_VALID_EPISODE_TYPES: frozenset[str] = frozenset({"control", "infected"})
+SCHEMA_VERSION = 1
 
 
 def _bearer_check(request: Request, expected: str | None) -> Response | None:
     if expected is None:
@@ -98,7 +88,7 @@ def make_app(
         expected_sha = expected_sha.lower()
 
         try:
-            schema_version = int(request.headers.get("x-schema-version", "2"))
+            schema_version = int(request.headers.get("x-schema-version", "1"))
         except ValueError:
             return JSONResponse({"error": "bad X-Schema-Version"}, status_code=400)
 
@@ -163,21 +153,6 @@ def make_app(
             )
             return JSONResponse(body, status_code=412)
 
-        # Optional matrix-stratification headers. Validated against the
-        # closed enums so a misbehaving shipper can't write garbage into
-        # the index. Unknown values are dropped (header treated as absent)
-        # and logged so the operator can spot a version drift quickly.
-        episode_type = (request.headers.get("x-episode-type") or "").strip().lower()
-        if episode_type and episode_type not in _VALID_EPISODE_TYPES:
-            log.warning("dropping unknown X-Episode-Type=%r host=%s id=%s",
-                        episode_type, host_id, episode_id)
-            episode_type = ""
-        benign_profile = (request.headers.get("x-benign-profile") or "").strip().lower()
-        if benign_profile and benign_profile not in _VALID_BENIGN_PROFILES:
-            log.warning("dropping unknown X-Benign-Profile=%r host=%s id=%s",
-                        benign_profile, host_id, episode_id)
-            benign_profile = ""
-
         cl = request.headers.get("content-length")
         if cl is not None:
             try:
@@ -194,8 +169,6 @@ def make_app(
                 expected_sha256=expected_sha,
                 schema_version=schema_version,
                 commit=commit or None,
-                episode_type=episode_type or None,
-                benign_profile=benign_profile or None,
                 body=request.stream(),
                 max_bytes=max_episode_bytes,
             )
tests/test_collectors_emit.py (normal file, 174 lines added)
@@ -0,0 +1,174 @@
+"""§4.4 collector emit tests — each collector MUST produce >=1 row when
+run for a few seconds against a synthesized busy workload. A collector
+that fails this is removed from the active set (PIPELINE.md §4.4) — no
+silent zero-row inclusion.
+
+These tests intentionally invoke the real collector binaries (perf,
+tcpdump) against real subprocesses. They will skip on environments
+where the binary or capability is unavailable, but they will fail —
+not skip — when the binary IS present and the collector still emits
+zero rows. The whole point is to catch the "collector silently
+disabled" failure mode.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import shutil
+import socket
+import subprocess
+import threading
+import time
+from pathlib import Path
+
+import pytest
+
+from collectors import perf_qemu
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _spawn_busy_loop() -> subprocess.Popen:
+    """Spawn a CPU-burning child whose PID we can hand to a collector.
+    `exec yes` so the captured PID IS the busy process — without exec,
+    the captured PID is the wrapping shell that sits parked waiting on
+    its child, and perf samples an idle process."""
+    return subprocess.Popen(
+        ["sh", "-c", "exec yes >/dev/null"],
+        stdout=subprocess.DEVNULL,
+        stderr=subprocess.DEVNULL,
+    )
+
+
+def _run_collector_briefly(target, *, seconds: float, **kw) -> int:
+    """Spin a collector run_loop in a thread for `seconds`, then stop it.
+    Returns the row count the collector reports."""
+    stop = threading.Event()
+    result: dict[str, int] = {}
+
+    def _go() -> None:
+        result["rows"] = target(stop_event=stop, **kw)
+
+    th = threading.Thread(target=_go, daemon=True)
+    th.start()
+    time.sleep(seconds)
+    stop.set()
+    th.join(timeout=seconds + 5.0)
+    return result.get("rows", 0)
+
+
+# ---------------------------------------------------------------------------
+# perf
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.skipif(
+    shutil.which("perf") is None,
+    reason="perf binary not on PATH; this host can't host the perf collector",
+)
+def test_perf_emits_rows_against_busy_pid(tmp_path: Path) -> None:
+    """The perf collector must emit at least one row when pointed at a
+    busy PID for a few seconds. Software events (page-faults,
+    context-switches, cpu-clock) are used so the test is portable
+    across CPUs that lack hardware performance counters; the production
+    DEFAULT_EVENTS adds hardware events on top, which is fine where
+    they're available and degrades gracefully where they're not.
+
+    Regression for: perf stat -j writes to stderr by default with -p,
+    so reading proc.stdout silently gives 0 lines and 0 rows. Fixed
+    by passing --log-fd 1 in the perf invocation.
+    """
+    busy = _spawn_busy_loop()
+    try:
+        out = tmp_path / "telemetry-perf.jsonl"
+        rows = _run_collector_briefly(
+            perf_qemu.run_loop,
+            seconds=2.0,
+            pid=busy.pid,
+            output_path=out,
+            t_mono_origin_ns=0,
+            interval_ms=200,
+            events=("page-faults", "context-switches", "cpu-clock"),
+        )
+    finally:
+        busy.terminate()
+        try:
+            busy.wait(timeout=2.0)
+        except subprocess.TimeoutExpired:
+            busy.kill()
+            busy.wait(timeout=1.0)
+
+    assert rows >= 1, (
+        f"perf collector wrote 0 rows against a busy PID — see "
+        f"PIPELINE.md §4.4. File: {out}, exists={out.exists()}, "
+        f"size={out.stat().st_size if out.exists() else 'n/a'}"
+    )
+    # Sanity-check the on-disk file matches what run_loop reported.
+    on_disk = out.read_text().splitlines() if out.exists() else []
+    assert len(on_disk) == rows, (
+        f"row count mismatch: run_loop returned {rows} but "
+        f"{len(on_disk)} lines on disk"
+    )
+    # Spot-check the row shape — one parsed row should have the
+    # expected schema.
+    sample = json.loads(on_disk[0])
+    assert sample["source"] == "host_perf"
+    assert sample["available_in_deployment"] is False
+    assert "t_mono_ns" in sample and "interval_s" in sample
+    # At least one row must have a populated metric — if every metric
+    # is None on every row, the parser is dropping values. Regression
+    # for: event names come back as "cycles:u" / "instructions:u"
+    # under perf_event_paranoid=2 (userspace-only), but `_build_row`
+    # looks up plain "cycles" / "instructions" — so every metric was
+    # silently null even when perf reported real numbers. The mapped
+    # fields in the row schema are cycles, instructions, page_faults,
+    # context_switches, branches, branch_misses, cache_references,
+    # cache_misses; we only need ANY of them populated to confirm the
+    # parser is wiring values into the row.
+    parsed = [json.loads(l) for l in on_disk]
+    metric_keys = ("cycles", "instructions", "page_faults",
+                   "context_switches", "branches")
+    assert any(r.get(k) is not None for r in parsed for k in metric_keys), (
+        f"every metric is None on every row — perf parser is dropping "
+        f"values. Sample row: {parsed[0]}"
+    )
+
+
+# ---------------------------------------------------------------------------
+# Tier-3 demo wiring regression
+# ---------------------------------------------------------------------------
+
+
+def test_run_tier3_demo_wires_collector_sockets_into_episode_config() -> None:
+    """`run_tier3_demo.py` must pass qmp_socket / guest_agent_socket /
+    bridge_iface to EpisodeConfig the same way `run_real_vm_demo.py`
+    does. Without these, those collectors silently emit zero rows on
+    every Tier-3 episode even though launch_target.sh creates the
+    underlying chardevs. Regression for: bug found 2026-05-03 against
+    elliott-thinkpad + k-gamingcom (rows_qmp=0 / rows_guest=0 / pcap=0
+    on 100% of Tier-3 episodes).
+
+    This is a source-grep test rather than an exec test because
+    run_tier3_demo.py boots qemu + msfrpcd, neither of which is
+    available in CI. The grep keeps the wiring honest with no
+    runtime cost."""
+    src = (Path(__file__).resolve().parent.parent
+           / "tools" / "run_tier3_demo.py").read_text()
+    # The exact fragments that, if absent, mean the collectors will
+    # silently never start. Each must appear as a keyword arg of the
+    # EpisodeConfig(...) constructor call site.
+    for needle in (
+        "qmp_socket=qmp_sock",
+        "guest_agent_socket=agent_sock",
+        "bridge_iface=os.environ.get(\"BRIDGE\")",
+    ):
+        assert needle in src, (
+            f"run_tier3_demo.py is missing `{needle}` on its "
+            f"EpisodeConfig — see PIPELINE.md §4.4. Tier-3 episodes "
+            f"will silently produce 0 rows for the corresponding "
+            f"collector."
+        )
@@ -90,7 +90,18 @@ def build_user_data(*, embed_agent: bool, agent_path: Path | None) -> bytes:
         raise FileNotFoundError(f"agent script not found: {agent_path}")
     agent_src = agent_path.read_text()
 
+    # The Alpine cloud image (alpine-virt-3.X.Y-x86_64.qcow2) does not
+    # ship python3 by default, so the agent's `#!/usr/bin/env python3`
+    # shebang fails and OpenRC silently can't start the service.
+    # Result: telemetry-guest.jsonl is empty on every episode. Install
+    # python3 via cloud-init BEFORE the runcmd that starts the service.
+    # Refs PIPELINE.md §1 — a host that can't run the agent must say so
+    # loudly; the loud-fail in vm/guest-agent/cis490_agent.py + this
+    # explicit dep install close the silent-downgrade loop.
     body = head + (
+        "packages:\n"
+        "  - python3\n"
+        "package_update: true\n"
         "write_files:\n"
         "  - path: /usr/local/bin/cis490-agent\n"
         "    permissions: '0755'\n"
@@ -289,6 +289,14 @@ def main() -> int:
         )
     )
 
+    # Wire the same collector sockets the Tier-2 path wires. Without
+    # these, EpisodeConfig defaults to None and the qmp / guest-agent
+    # / pcap collectors never start — even though launch_target.sh
+    # creates the qmp.sock + agent.sock chardevs and BRIDGE supplies
+    # the iface. Refs PIPELINE.md §4.4: a collector that appears
+    # configured but emits zero rows is exactly the silent-downgrade
+    # pattern §1 forbids.
+    agent_sock = run_dir / "agent.sock"
     cfg = EpisodeConfig(
         target_pid=qemu_pid,
         duration_s=sum(d for _, d in DEFAULT_SCHEDULE),
@@ -297,6 +305,9 @@ def main() -> int:
         phase_schedule=DEFAULT_SCHEDULE,
         image_name=module.name + "-target",
         snapshot_name="baseline-v1",
+        qmp_socket=qmp_sock if qmp_sock.exists() else None,
+        guest_agent_socket=agent_sock if agent_sock.exists() else None,
+        bridge_iface=os.environ.get("BRIDGE") or None,
         sample=sample,
         exploit_meta={
             "framework": "metasploit",
@@ -238,14 +238,26 @@ def main(argv: list[str] | None = None) -> int:
         sys.stdout.flush()
         return 0
 
-    # Open the virtio-serial port. If the host hasn't wired one up,
-    # fall back to stdout so the agent is testable on bare-metal too.
-    out_fp: Any
-    if os.path.exists(args.port):
-        out_fp = open(args.port, "wb", buffering=0)
-    else:
-        sys.stderr.write(f"[cis490-agent] {args.port} missing; writing to stdout\n")
-        out_fp = sys.stdout.buffer
+    # Open the virtio-serial port. The host wires this up via QEMU's
+    # virtserialport device; if it's missing, either virtio_console
+    # isn't loaded in the guest kernel, the device wasn't included on
+    # the QEMU command line, or udev hasn't created the symlink yet.
+    # Exit loudly so OpenRC re-runs us (per service config) and so
+    # the failure is visible in /var/log/cis490-agent.log instead of
+    # being absorbed by a silent stdout fallback. Refs PIPELINE.md
+    # §1 — a host that can't meet the bar must say so loudly, not
+    # silently downgrade to a half-running state.
+    if not os.path.exists(args.port):
+        sys.stderr.write(
+            f"[cis490-agent] FATAL: virtio-serial port {args.port} not "
+            f"present. Check (a) virtio_console kernel module is loaded "
+            f"inside the guest, (b) the QEMU command line includes "
+            f"-device virtserialport,name=cis490.guest.agent, (c) udev "
+            f"is creating /dev/virtio-ports/* symlinks. Exiting nonzero "
+            f"so this failure is observable rather than silently lost.\n"
+        )
+        return 2
+    out_fp = open(args.port, "wb", buffering=0)
 
     interval_ns = args.interval_ms * 1_000_000
     next_tick = time.monotonic_ns()