PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml

The experiment is now defined by a single version-pinned file —
manifest.toml at the repo root. PIPELINE.md §4.1 / §13 / §16. Every
lab host loads THIS exact file; per-host overrides of experiment
shape are forbidden.

Drops the following per-host CLI overrides that previously violated
the canonical-manifest principle:
  * --manifest, --modules-dir       (paths now derived)
  * --ram-per-vm-mib                (in manifest.experiment)
  * --max-concurrent                (manifest.experiment.fleet.max_concurrent_ceiling)
  * --max-tier3-slots               (manifest.experiment.fleet.max_tier3_slots)
  * --force-tier2                   (not a §14 sanctioned override knob —
                                     ship empty catalog to disable Tier-3)
  * --require-real-samples          (sample-side concern; out of fleet scope)
  * tools/run_*_demo.py --manifest  (samples path now from canonical)

New surface:
  * manifest.toml                   — the single source of truth
  * orchestrator/manifest.py        — load_canonical() + Manifest dataclass
                                      with strict validation, raises
                                      ManifestError on any failure
  * EpisodeConfig.experiment_meta   — populated by run_*_demo.py from
                                      the canonical manifest; stamped
                                      into every episode's meta.json
                                      under "experiment" key for
                                      provenance
  * cis490-orchestrator.service     — RestartPreventExitStatus=78 so
                                      manifest-load failures stay
                                      stuck-and-loud (§9, §4.7)
  * install-lab-host.sh             — validates manifest.toml at
                                      install time; missing or invalid
                                      = die with clear message

Catalog admission semantics: only modules whose name appears in
manifest.catalog get loaded into the runtime catalog (§4.3 in
miniature, will tighten further in step 4 when verified_against /
last_verified actually gate admission). Missing toml for an admitted
name is a sysadmin error → exit 78.

Renames cfg.manifest → cfg.samples + adds cfg.experiment to
disambiguate sample-manifest from experiment-manifest. Rewrites
test_fleet.py fixture to construct synthetic Manifest objects so
test outcomes don't depend on the on-disk manifest.toml content.

12 new tests in tests/test_manifest.py: schema-version mismatch,
unknown collector, duplicate collector, unknown phase, negative
phase seconds, negative ram, missing catalog fields, json round-trip.

Local run: `python tools/run_fleet.py --capacity` correctly logs the
loaded manifest and prints capacity. 241 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Max Gorog 2026-05-04 01:25:01 -05:00
parent dca6144a4a
commit 207a902c3e
11 changed files with 971 additions and 105 deletions

View file

@ -18,17 +18,25 @@ EnvironmentFile=/etc/cis490/lab-host.env
# unit still starts on Tier-2-only hosts where msfrpcd isn't installed.
EnvironmentFile=-/etc/cis490/msfrpc.env
# Fleet mode: detect host capacity, run that many concurrent episodes
# per wave with samples drawn from the manifest. Each invocation runs
# one wave and exits; systemd respawns per Restart= below, giving us
# a continuous stream of fresh-sample episodes per host. The shipper
# picks them up as `done.marker` files appear.
# per wave with samples + experiment shape drawn from the canonical
# manifest at /opt/cis490/manifest.toml. Each invocation runs one wave
# and exits; systemd respawns per Restart= below.
#
# Per PIPELINE.md §4.1 there are no --manifest, --max-tier3-slots,
# --ram-per-vm-mib, --max-concurrent, --force-tier2, or
# --require-real-samples flags. Experiment-shape parameters live in
# manifest.toml. Per-host overrides are forbidden.
#
# Exit 78 (sysadmin error) when the canonical manifest fails to load
# or when the host can't run the experiment. RestartPreventExitStatus=78
# keeps the unit stuck-and-loud rather than respawning into the same
# broken state — operator notices and fixes.
ExecStart=/opt/cis490/.venv/bin/python /opt/cis490/tools/run_fleet.py \
--data-root /var/lib/cis490/data \
--manifest /opt/cis490/samples/manifest.toml \
--waves 1 \
--max-tier3-slots 4
--waves 1
Restart=always
RestartSec=15
RestartPreventExitStatus=78
# Hardening — explicitly grant CAP_NET_RAW for tcpdump (source 4) and
# CAP_SYS_ADMIN / CAP_PERFMON for perf (source 3) when the operator

145
manifest.toml Normal file
View file

@ -0,0 +1,145 @@
# CIS490 canonical experiment manifest.
#
# This file is the single source of truth for what the experiment IS:
# which collectors run, at what cadence, against what target images,
# with which exploit modules in rotation, walking which phase budget,
# under what concurrency cap. Every lab host loads THIS exact file.
# Per-host overrides are forbidden — PIPELINE.md §4.1 / §13.
#
# Hosts that cannot run the canonical experiment exit 78 at
# orchestrator startup (PIPELINE.md §4.7). They produce zero episodes
# and say so loudly. There is no "ship what we can" fallback.
#
# Substantive amendments to this file follow PIPELINE.md §16:
# operator sign-off (§15), §8 decision tests, lands in the same merge
# as the code change it justifies. "This is just config" is not a
# free pass — the manifest is admission scope (§13).
schema_version = 1
name = "cis490-spectral-v1"
# ---------------------------------------------------------------------
# [experiment] — episode-shape parameters
# ---------------------------------------------------------------------
[experiment]
# Per-VM RAM in mebibytes. The capacity detector divides available
# host memory by this number to compute max_concurrent. Every slot
# gets the same RAM — no per-host fuzz.
ram_per_vm_mib = 320
# ---------------------------------------------------------------------
# [experiment.schedule] — phase budget (§4.5)
# ---------------------------------------------------------------------
# Each phase carries a duration in seconds — the MAXIMUM time the
# orchestrator will wait in that phase. Phase TRANSITIONS are written
# from observed events (PIPELINE.md §4.5), not from this clock. The
# clock is just an upper bound: if no event fires within the budget,
# the orchestrator advances and records the failure.
[[experiment.schedule.phases]]
name = "clean"
seconds = 10.0
[[experiment.schedule.phases]]
name = "armed"
seconds = 3.0
[[experiment.schedule.phases]]
name = "infecting"
seconds = 5.0
[[experiment.schedule.phases]]
name = "infected_running"
seconds = 25.0
[[experiment.schedule.phases]]
name = "dormant"
seconds = 15.0
[[experiment.schedule.phases]]
name = "infected_running"
seconds = 20.0
[[experiment.schedule.phases]]
name = "dormant"
seconds = 5.0
[[experiment.schedule.phases]]
name = "clean"
seconds = 5.0
# ---------------------------------------------------------------------
# [experiment.fleet] — concurrency policy
# ---------------------------------------------------------------------
[experiment.fleet]
# Hard ceiling on per-wave slots regardless of host capacity. 0 = no
# ceiling (capacity detector decides). Hosts whose capacity exceeds
# this still run only `max_concurrent_ceiling` slots, so the dataset
# isn't dominated by the largest host.
max_concurrent_ceiling = 0
# Cap of Tier-3 (real-exploit) slots per wave. Slots beyond this fall
# back to Tier-2. 0 = no cap. Useful when msfrpcd contention starts
# to matter; left at 0 by default.
max_tier3_slots = 0
# ---------------------------------------------------------------------
# [collectors] — active set (§4.4)
# ---------------------------------------------------------------------
# A collector listed here MUST emit ≥1 row when run against the
# canonical-manifest experiment (§4.4 admission). A collector that
# can't pass admission is REMOVED from this list — never silently
# included with zero rows.
[collectors]
active = [
"proc",
"qmp",
"perf",
"guest_agent",
"pcap",
"netflow",
]
[collectors.intervals]
proc_ms = 100
qmp_ms = 1000
perf_ms = 100
guest_agent_ms = 100
pcap_snaplen = 256
netflow_bucket_ms = 100
# ---------------------------------------------------------------------
# [catalog] — Tier-3 module catalog (§4.3)
# ---------------------------------------------------------------------
# Each entry references a module config in `exploits/modules/<name>.toml`.
# Every entry MUST carry `verified_against` and `last_verified`. The
# absence of either drops the module from the active catalog.
#
# Empty as of 2026-05-04: PIPELINE.md §3 found 0/67 session_open on
# samba_usermap_script against the SourceForge Metasploitable2 image,
# and no in-house verified target exists yet. §5 step 3 builds the
# target VM, step 4 re-admits modules with verification recorded.
# Until then, Tier-3 episodes do not run — the dataset is honest
# Tier-2 only.
[catalog]
modules = []
# ---------------------------------------------------------------------
# [targets] — target VM images (§4.2)
# ---------------------------------------------------------------------
# Each entry pins (image_name, sha256, build_script). The image MUST
# have been produced by `build_script` and verified per §4.2. Hosts
# that don't have the expected image at the expected sha256 fail
# preflight (§4.7) and produce zero episodes.
#
# Empty until §5 step 3 lands declarative target builds.
[targets]
images = []
# ---------------------------------------------------------------------
# [samples] — pointer to the malware-sample manifest
# ---------------------------------------------------------------------
# Samples roll forward independently of experiment shape (new sample
# = manifest entry; doesn't change collector set or schedule). Path
# is relative to repo root.
[samples]
manifest_path = "samples/manifest.toml"

View file

@ -160,6 +160,13 @@ class EpisodeConfig:
# doesn't need to import exploits.modules; populated by callers
# that have a ModuleConfig in hand.
exploit_meta: dict | None = None
# Canonical experiment manifest provenance (PIPELINE.md §4.1).
# Plain dict (output of Manifest.to_meta()) so the runner doesn't
# need to import orchestrator.manifest; populated by run_fleet /
# run_*_demo via load_canonical(). When None, meta.experiment is
# null — useful for unit tests that don't go through the canonical
# path; production episodes always carry it.
experiment_meta: dict | None = None
# Snapshot/revert (Tier 0+):
# revert_at_start — before any phase walks, loadvm <snapshot_name>.
# Use this to drop the guest back to a known-good baseline at
@ -466,6 +473,7 @@ class EpisodeRunner:
},
"exploit": self.cfg.exploit_meta,
"sample": sample_meta,
"experiment": self.cfg.experiment_meta,
"schedule": {
"baseline_seconds": self.cfg.duration_s,
"interval_ms": self.cfg.interval_ms,

View file

@ -43,6 +43,8 @@ from exploits.modules import (
)
from samples.manifest import Sample, SampleManifest
from .manifest import Manifest
log = logging.getLogger("cis490.fleet")
@ -93,32 +95,51 @@ class FleetCapacity:
@dataclass
class FleetConfig:
"""Per-host config that combines (a) the canonical experiment
manifest, (b) per-host identity (host_id), and (c) host-local
paths (data_root, repo_root). Experiment-shape parameters live
in `experiment` (the canonical Manifest). Per-host overrides of
experiment shape are forbidden see PIPELINE.md §4.1."""
host_id: str
repo_root: Path
data_root: Path
manifest: SampleManifest
# Canonical experiment manifest — drives schedule, ram_per_vm_mib,
# collector active set, intervals, fleet ceilings, catalog, targets.
experiment: "Manifest"
# Sample manifest (separate file at samples/manifest.toml). Holds
# the per-malware-family sample list; rolls forward independently
# of experiment shape so a new sample is one-line, not a
# canonical-manifest amendment.
samples: SampleManifest
# Module catalog for Tier-3 dispatch. Required for fleet-driven
# exploit-fire variety; empty catalog forces Tier-2 fallback.
# exploit-fire variety; empty catalog forces Tier-2 fallback. The
# canonical manifest's catalog admission gates which modules end
# up here (only those whose toml exists AND whose name is in
# experiment.catalog get loaded).
modules: dict[str, ModuleConfig] = field(default_factory=dict)
# VM resource shape — must match what the launcher requests.
ram_per_vm_mib: int = 320
# Cap concurrency below the calculated max (e.g. for a smoke test).
max_concurrent_override: int | None = None
# Skip episodes whose sample requires a real binary that's not present.
require_real_samples: bool = False
# Force Tier-2 even when msfrpcd is up; used by tests + dev runs
# that want a no-exploit baseline.
force_tier2: bool = False
# Limit how many slots per wave run as Tier-3. Slots 0..N-1 get
# Tier-3; the rest fall back to Tier-2. Metasploitable2 boot is IO-
# bound: running >~6 concurrent target VMs saturates disk and causes
# all slots to timeout waiting for the guest service to come up.
# None = no cap (all eligible slots use Tier-3).
max_tier3_slots: int | None = None
# msfrpcd connectivity (read by tier-3 driver via env).
# msfrpcd connectivity (read by tier-3 driver via env). Per-host
# because msfrpcd may be on a non-loopback bind on some hosts;
# NOT an experiment-shape parameter.
msfrpcd_host: str = "127.0.0.1"
msfrpcd_port: int = 55553
# ---- experiment-shape derived properties (manifest-driven) -----
@property
def ram_per_vm_mib(self) -> int:
return self.experiment.ram_per_vm_mib
@property
def max_concurrent_ceiling(self) -> int:
"""Hard ceiling on per-wave slots; 0 = no ceiling (capacity
detector decides)."""
return self.experiment.fleet.max_concurrent_ceiling
@property
def max_tier3_slots(self) -> int:
"""Cap of Tier-3 slots per wave; 0 = no cap."""
return self.experiment.fleet.max_tier3_slots
def _read_meminfo() -> dict[str, int]:
out: dict[str, int] = {}
@ -263,11 +284,16 @@ def _run_slot(
{k: v for k, v in cfg.modules.items() if not v.requires_bridge}
if cfg.modules else {}
)
# Tier-3 dispatch requires (a) at least one verified module loaded
# AND not filtered out by requires_bridge, (b) msfrpcd reachable,
# (c) the slot index is below the manifest's max_tier3_slots cap
# (0 = no cap). force_tier2 is no longer a sanctioned override knob
# per §14 — to disable Tier-3 globally, ship an empty catalog (the
# current state); to disable per-slot, raise the cap.
tier3_ready = (
not cfg.force_tier2
and bool(usable_modules)
bool(usable_modules)
and _msfrpcd_available(cfg.msfrpcd_host, cfg.msfrpcd_port)
and (cfg.max_tier3_slots is None or slot < cfg.max_tier3_slots)
and (cfg.max_tier3_slots == 0 or slot < cfg.max_tier3_slots)
)
env = os.environ.copy()
@ -345,13 +371,13 @@ def _run_slot(
]
tier = "tier2"
module_name = None
if not cfg.force_tier2 and not cfg.modules:
if not cfg.modules:
log.warning("slot=%d falling back to Tier 2: empty module catalog", slot)
elif not cfg.force_tier2 and not usable_modules:
elif not usable_modules:
log.warning("slot=%d falling back to Tier 2: no non-bridge modules available", slot)
elif not cfg.force_tier2 and cfg.max_tier3_slots is not None and slot >= cfg.max_tier3_slots:
elif cfg.max_tier3_slots > 0 and slot >= cfg.max_tier3_slots:
log.debug("slot=%d Tier 2 by max_tier3_slots=%d cap", slot, cfg.max_tier3_slots)
elif not cfg.force_tier2:
else:
log.warning("slot=%d falling back to Tier 2: msfrpcd unreachable at %s:%d",
slot, cfg.msfrpcd_host, cfg.msfrpcd_port)
@ -422,8 +448,12 @@ class FleetRunner:
ram_per_vm_mib=self.cfg.ram_per_vm_mib,
)
n_slots = capacity.max_concurrent
if self.cfg.max_concurrent_override is not None:
n_slots = min(n_slots, self.cfg.max_concurrent_override)
# Manifest ceiling: 0 = no ceiling. The capacity detector
# decides per-host based on hardware; the manifest caps that
# decision to keep the dataset balanced across hosts of
# different sizes (PIPELINE.md §4.1).
if self.cfg.max_concurrent_ceiling > 0:
n_slots = min(n_slots, self.cfg.max_concurrent_ceiling)
if n_slots <= 0:
log.warning(
"fleet capacity is zero (%s); cannot run", capacity.rationale,
@ -433,8 +463,9 @@ class FleetRunner:
)
log.info(
"fleet host=%s slots=%d episodes=%d manifest_size=%d",
self.cfg.host_id, n_slots, episodes, len(self.cfg.manifest),
"fleet host=%s slots=%d episodes=%d samples=%d experiment=%s",
self.cfg.host_id, n_slots, episodes, len(self.cfg.samples),
self.cfg.experiment.name,
)
all_results: list[SlotResult] = []
@ -444,18 +475,13 @@ class FleetRunner:
break
episode_index = episode_index_base + ep
slot_samples = [
self.cfg.manifest.select(
self.cfg.samples.select(
host_id=self.cfg.host_id,
slot=slot,
episode_index=episode_index,
)
for slot in range(n_slots)
]
if self.cfg.require_real_samples:
slot_samples = [s for s in slot_samples if s.kind == "real"]
if not slot_samples:
log.warning("require_real_samples: no real samples in manifest; skipping wave")
continue
log.info(
"wave %d/%d: %s",
@ -491,8 +517,8 @@ class FleetRunner:
# ---------------------------------------------------------------------------
def capacity_report() -> str:
c = detect_capacity()
def capacity_report(*, ram_per_vm_mib: int = 320) -> str:
c = detect_capacity(ram_per_vm_mib=ram_per_vm_mib)
return (
f"cores: {c.cores_total} (reserve {c.cores_reserved})\n"
f"ram: {c.ram_total_mib} MiB total, {c.ram_available_mib} MiB available "

371
orchestrator/manifest.py Normal file
View file

@ -0,0 +1,371 @@
"""Canonical experiment manifest (PIPELINE.md §4.1 / §13).
The manifest at `<repo_root>/manifest.toml` is the single source of
truth for what the experiment is: which collectors run, at what
cadence, against what targets, with which exploit modules in rotation,
walking which phase budget. Every lab host loads THIS file. There is
no per-host override flag, no `--manifest <path>` argument, no
fallback. A host that can't load and validate the canonical manifest
must exit 78 and ship zero episodes.
`load_canonical(repo_root)` reads from the fixed path and validates.
On any failure it raises `ManifestError`; callers translate that into
exit 78. `Manifest` is a frozen dataclass once loaded the values
don't move under us mid-run.
Substantive amendments follow PIPELINE.md §16: operator sign-off,
landed in the same merge as the code change, with §8 decision tests
applied to the amendment itself.
"""
from __future__ import annotations
import tomllib
from dataclasses import dataclass, field
from pathlib import Path
CANONICAL_FILENAME = "manifest.toml"
# Closed enums — keep in sync with the corresponding code that
# implements each name. A name not in these sets means the manifest
# is asking for something the orchestrator doesn't know how to do.
KNOWN_COLLECTORS: frozenset[str] = frozenset({
"proc",
"qmp",
"perf",
"guest_agent",
"pcap",
"netflow",
})
KNOWN_PHASES: frozenset[str] = frozenset({
"clean",
"armed",
"infecting",
"infected_running",
"dormant",
"failed",
})
class ManifestError(ValueError):
"""Raised when the canonical manifest is missing, unreadable, or
fails validation. The orchestrator translates this into exit 78
(PIPELINE.md §4.7 / §9)."""
@dataclass(frozen=True)
class Phase:
name: str
seconds: float
@dataclass(frozen=True)
class CollectorIntervals:
proc_ms: int
qmp_ms: int
perf_ms: int
guest_agent_ms: int
pcap_snaplen: int
netflow_bucket_ms: int
@dataclass(frozen=True)
class FleetPolicy:
max_concurrent_ceiling: int
max_tier3_slots: int
@dataclass(frozen=True)
class CatalogEntry:
name: str
verified_against: str
last_verified: str
@dataclass(frozen=True)
class TargetSpec:
image_name: str
sha256: str
build_script: str
@dataclass(frozen=True)
class Manifest:
schema_version: int
name: str
ram_per_vm_mib: int
schedule: tuple[Phase, ...]
fleet: FleetPolicy
collectors_active: tuple[str, ...]
intervals: CollectorIntervals
catalog: tuple[CatalogEntry, ...]
targets: tuple[TargetSpec, ...]
samples_manifest_path: str
# Resolved repo root + manifest path so callers can stamp them
# into meta.json for provenance without re-deriving.
repo_root: Path = field(repr=False)
manifest_path: Path = field(repr=False)
def to_meta(self) -> dict:
"""Lightweight representation suitable for embedding in
meta.json so episodes carry their experiment provenance.
Excludes the resolved Path fields (host-specific paths don't
belong in the wire-format)."""
return {
"schema_version": self.schema_version,
"name": self.name,
"ram_per_vm_mib": self.ram_per_vm_mib,
"phases": [
{"name": p.name, "seconds": p.seconds} for p in self.schedule
],
"fleet": {
"max_concurrent_ceiling": self.fleet.max_concurrent_ceiling,
"max_tier3_slots": self.fleet.max_tier3_slots,
},
"collectors_active": list(self.collectors_active),
"intervals": {
"proc_ms": self.intervals.proc_ms,
"qmp_ms": self.intervals.qmp_ms,
"perf_ms": self.intervals.perf_ms,
"guest_agent_ms": self.intervals.guest_agent_ms,
"pcap_snaplen": self.intervals.pcap_snaplen,
"netflow_bucket_ms": self.intervals.netflow_bucket_ms,
},
"catalog": [
{"name": c.name, "verified_against": c.verified_against,
"last_verified": c.last_verified}
for c in self.catalog
],
"targets": [
{"image_name": t.image_name, "sha256": t.sha256,
"build_script": t.build_script}
for t in self.targets
],
}
def load_canonical(repo_root: Path | str) -> Manifest:
"""Load + validate `<repo_root>/manifest.toml`. There is no
`manifest_path` parameter on purpose per §4.1 the canonical
manifest lives at exactly one path. Callers that pass the path
directly are off the runbook.
Raises ManifestError on any failure. Successful return guarantees
every field is present, every collector name is known, every
phase name is known, and every catalog entry has both
verified_against and last_verified.
"""
repo_root = Path(repo_root).resolve()
path = repo_root / CANONICAL_FILENAME
if not path.exists():
raise ManifestError(
f"canonical manifest not found at {path}. "
f"PIPELINE.md §4.1 requires exactly one manifest at the "
f"repo root; this host cannot run the experiment without it."
)
try:
raw = tomllib.loads(path.read_text())
except (OSError, tomllib.TOMLDecodeError) as e:
raise ManifestError(f"cannot parse {path}: {e}") from e
return _validate(raw, repo_root, path)
def _validate(raw: dict, repo_root: Path, path: Path) -> Manifest:
schema_version = _require_int(raw, "schema_version")
if schema_version != 1:
raise ManifestError(
f"manifest schema_version={schema_version} not supported; "
f"this orchestrator handles version 1 only. "
f"Upgrade orchestrator or downgrade manifest to match."
)
name = _require_str(raw, "name")
experiment = _require_dict(raw, "experiment")
ram_per_vm_mib = _require_int(experiment, "ram_per_vm_mib")
if ram_per_vm_mib <= 0:
raise ManifestError(
f"experiment.ram_per_vm_mib must be positive, got {ram_per_vm_mib}"
)
schedule_block = _require_dict(experiment, "schedule")
phases_raw = schedule_block.get("phases")
if not isinstance(phases_raw, list) or not phases_raw:
raise ManifestError(
"experiment.schedule.phases must be a non-empty array"
)
phases: list[Phase] = []
for i, p in enumerate(phases_raw):
if not isinstance(p, dict):
raise ManifestError(
f"experiment.schedule.phases[{i}] must be a table"
)
pname = _require_str(p, "name", ctx=f"phases[{i}]")
if pname not in KNOWN_PHASES:
raise ManifestError(
f"experiment.schedule.phases[{i}].name={pname!r} not in "
f"KNOWN_PHASES {sorted(KNOWN_PHASES)}"
)
secs = _require_float(p, "seconds", ctx=f"phases[{i}]")
if secs <= 0:
raise ManifestError(
f"experiment.schedule.phases[{i}].seconds must be > 0, "
f"got {secs}"
)
phases.append(Phase(name=pname, seconds=secs))
fleet_block = _require_dict(experiment, "fleet")
fleet = FleetPolicy(
max_concurrent_ceiling=_require_int(fleet_block, "max_concurrent_ceiling"),
max_tier3_slots=_require_int(fleet_block, "max_tier3_slots"),
)
if fleet.max_concurrent_ceiling < 0 or fleet.max_tier3_slots < 0:
raise ManifestError(
"experiment.fleet ceilings must be >= 0 (0 = no cap)"
)
collectors_block = _require_dict(raw, "collectors")
active_raw = collectors_block.get("active")
if not isinstance(active_raw, list):
raise ManifestError("collectors.active must be an array")
if len(set(active_raw)) != len(active_raw):
raise ManifestError(
f"collectors.active contains duplicates: {active_raw}"
)
for c in active_raw:
if c not in KNOWN_COLLECTORS:
raise ManifestError(
f"collectors.active references unknown collector "
f"{c!r}; known: {sorted(KNOWN_COLLECTORS)}"
)
collectors_active = tuple(active_raw)
intervals_block = _require_dict(collectors_block, "intervals")
intervals = CollectorIntervals(
proc_ms=_require_int(intervals_block, "proc_ms"),
qmp_ms=_require_int(intervals_block, "qmp_ms"),
perf_ms=_require_int(intervals_block, "perf_ms"),
guest_agent_ms=_require_int(intervals_block, "guest_agent_ms"),
pcap_snaplen=_require_int(intervals_block, "pcap_snaplen"),
netflow_bucket_ms=_require_int(intervals_block, "netflow_bucket_ms"),
)
for fname, fval in (
("proc_ms", intervals.proc_ms),
("qmp_ms", intervals.qmp_ms),
("perf_ms", intervals.perf_ms),
("guest_agent_ms", intervals.guest_agent_ms),
("pcap_snaplen", intervals.pcap_snaplen),
("netflow_bucket_ms", intervals.netflow_bucket_ms),
):
if fval <= 0:
raise ManifestError(
f"collectors.intervals.{fname} must be > 0, got {fval}"
)
catalog_block = _require_dict(raw, "catalog")
modules_raw = catalog_block.get("modules")
if not isinstance(modules_raw, list):
raise ManifestError("catalog.modules must be an array")
catalog: list[CatalogEntry] = []
for i, entry in enumerate(modules_raw):
if not isinstance(entry, dict):
raise ManifestError(f"catalog.modules[{i}] must be a table")
cname = _require_str(entry, "name", ctx=f"catalog[{i}]")
verified_against = _require_str(
entry, "verified_against", ctx=f"catalog[{i}]"
)
last_verified = _require_str(
entry, "last_verified", ctx=f"catalog[{i}]"
)
catalog.append(CatalogEntry(
name=cname,
verified_against=verified_against,
last_verified=last_verified,
))
targets_block = _require_dict(raw, "targets")
images_raw = targets_block.get("images")
if not isinstance(images_raw, list):
raise ManifestError("targets.images must be an array")
targets: list[TargetSpec] = []
for i, t in enumerate(images_raw):
if not isinstance(t, dict):
raise ManifestError(f"targets.images[{i}] must be a table")
targets.append(TargetSpec(
image_name=_require_str(t, "image_name", ctx=f"targets[{i}]"),
sha256=_require_str(t, "sha256", ctx=f"targets[{i}]"),
build_script=_require_str(t, "build_script", ctx=f"targets[{i}]"),
))
samples_block = _require_dict(raw, "samples")
samples_manifest_path = _require_str(samples_block, "manifest_path")
return Manifest(
schema_version=schema_version,
name=name,
ram_per_vm_mib=ram_per_vm_mib,
schedule=tuple(phases),
fleet=fleet,
collectors_active=collectors_active,
intervals=intervals,
catalog=tuple(catalog),
targets=tuple(targets),
samples_manifest_path=samples_manifest_path,
repo_root=repo_root,
manifest_path=path,
)
# ---------- helpers --------------------------------------------------
def _require(d: dict, key: str, kind: type, *, ctx: str = "") -> object:
where = f"{ctx}." if ctx else ""
if key not in d:
raise ManifestError(f"missing required field {where}{key}")
v = d[key]
if not isinstance(v, kind):
raise ManifestError(
f"field {where}{key} must be {kind.__name__}, got {type(v).__name__}"
)
return v
def _require_str(d: dict, key: str, *, ctx: str = "") -> str:
return _require(d, key, str, ctx=ctx) # type: ignore[return-value]
def _require_int(d: dict, key: str, *, ctx: str = "") -> int:
# tomllib parses both ints and floats; require strict int for fields
# that should be ints, accept int-valued floats for ergonomics.
where = f"{ctx}." if ctx else ""
if key not in d:
raise ManifestError(f"missing required field {where}{key}")
v = d[key]
if isinstance(v, bool): # bool is a subclass of int — reject explicitly
raise ManifestError(f"field {where}{key} must be int, got bool")
if isinstance(v, int):
return v
raise ManifestError(
f"field {where}{key} must be int, got {type(v).__name__}"
)
def _require_float(d: dict, key: str, *, ctx: str = "") -> float:
where = f"{ctx}." if ctx else ""
if key not in d:
raise ManifestError(f"missing required field {where}{key}")
v = d[key]
if isinstance(v, bool):
raise ManifestError(f"field {where}{key} must be number, got bool")
if isinstance(v, (int, float)):
return float(v)
raise ManifestError(
f"field {where}{key} must be number, got {type(v).__name__}"
)
def _require_dict(d: dict, key: str, *, ctx: str = "") -> dict:
return _require(d, key, dict, ctx=ctx) # type: ignore[return-value]

View file

@ -124,6 +124,24 @@ command -v perf >/dev/null || die \
command -v tcpdump >/dev/null || die \
"tcpdump not on PATH after install attempt — collector source 4 (bridge pcap + netflow) requires it. See ensure_collector_packages above for what was tried."
# Canonical experiment manifest (PIPELINE.md §4.1). Validate at install
# time so a host that cloned a repo missing manifest.toml fails the
# install loudly, not at first orchestrator startup. The orchestrator
# also validates and exits 78 on bad manifest, but install-time is the
# earliest point we can fail.
[[ -f "$REPO_ROOT/manifest.toml" ]] || die \
"$REPO_ROOT/manifest.toml not found — every lab host must ship the canonical experiment manifest (§4.1). Did the git pull complete?"
"$REPO_ROOT/.venv/bin/python" -c "
import sys, pathlib
sys.path.insert(0, '$REPO_ROOT')
from orchestrator.manifest import load_canonical, ManifestError
try:
load_canonical(pathlib.Path('$REPO_ROOT'))
except ManifestError as e:
print(f'manifest.toml failed validation: {e}', file=sys.stderr)
sys.exit(1)
" || die "manifest.toml validation failed — see error above (§4.1)"
# uv is preferred (lockfile-driven). Fall back to system pip if absent.
USE_UV=0
if command -v uv >/dev/null; then USE_UV=1; fi

View file

@ -245,7 +245,50 @@ def _fixture_modules() -> dict:
}
def _fleet_cfg_with_modules(tmp_path: Path, *, force_tier2: bool = False):
def _fixture_manifest(*, max_tier3_slots: int = 0,
max_concurrent_ceiling: int = 0):
"""Synthetic canonical Manifest for fleet tests.
Mirrors the production manifest.toml shape but constructed in-memory
so test outcomes don't depend on what the on-disk manifest happens
to say. Per-test parameterization (ceilings, future schedule
variants) goes through this builder, not through CLI overrides
that don't exist anymore (PIPELINE.md §4.1)."""
from orchestrator.manifest import (
CollectorIntervals, FleetPolicy, Manifest, Phase,
)
return Manifest(
schema_version=1,
name="test-fixture",
ram_per_vm_mib=320,
schedule=(
Phase("clean", 10.0),
Phase("armed", 3.0),
Phase("infecting", 5.0),
Phase("infected_running", 25.0),
Phase("dormant", 15.0),
Phase("clean", 5.0),
),
fleet=FleetPolicy(
max_concurrent_ceiling=max_concurrent_ceiling,
max_tier3_slots=max_tier3_slots,
),
collectors_active=("proc", "qmp", "perf", "guest_agent",
"pcap", "netflow"),
intervals=CollectorIntervals(
proc_ms=100, qmp_ms=1000, perf_ms=100,
guest_agent_ms=100, pcap_snaplen=256, netflow_bucket_ms=100,
),
catalog=(),
targets=(),
samples_manifest_path="samples/manifest.toml",
repo_root=REPO_ROOT,
manifest_path=REPO_ROOT / "manifest.toml",
)
def _fleet_cfg_with_modules(tmp_path: Path, *, max_tier3_slots: int = 0,
max_concurrent_ceiling: int = 0):
from orchestrator import fleet
from samples.manifest import SampleManifest
@ -254,9 +297,12 @@ def _fleet_cfg_with_modules(tmp_path: Path, *, force_tier2: bool = False):
host_id="test-host",
repo_root=repo_root,
data_root=tmp_path,
manifest=SampleManifest.load(repo_root / "samples" / "manifest.toml"),
experiment=_fixture_manifest(
max_tier3_slots=max_tier3_slots,
max_concurrent_ceiling=max_concurrent_ceiling,
),
samples=SampleManifest.load(repo_root / "samples" / "manifest.toml"),
modules=_fixture_modules(),
force_tier2=force_tier2,
)
@ -273,7 +319,7 @@ def test_fleet_dispatches_to_tier3_when_msfrpcd_listening(monkeypatch, tmp_path)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier3", res
@ -293,7 +339,7 @@ def test_fleet_falls_back_to_tier2_when_msfrpcd_down(monkeypatch, tmp_path) -> N
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier2"
@ -309,26 +355,40 @@ def test_fleet_falls_back_to_tier2_when_module_catalog_empty(monkeypatch, tmp_pa
host_id="test-host",
repo_root=REPO_ROOT,
data_root=tmp_path,
manifest=SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml"),
experiment=_fixture_manifest(),
samples=SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml"),
modules={}, # explicitly empty
)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier2"
def test_fleet_force_tier2_overrides_msfrpcd(monkeypatch, tmp_path) -> None:
def test_fleet_empty_module_catalog_falls_back_to_tier2(monkeypatch, tmp_path) -> None:
"""An empty module catalog forces Tier-2 fallback even when msfrpcd
is reachable. This replaces the former force_tier2 override knob:
per PIPELINE.md §14 the closed override list contains only
CIS490_ALLOW_DIRTY, and per §1 the right way to disable Tier-3 is
to ship no admitted modules not to flag-flip the orchestrator."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path, force_tier2=True)
from samples.manifest import SampleManifest
cfg = fleet.FleetConfig(
host_id="test-host",
repo_root=REPO_ROOT,
data_root=tmp_path,
experiment=_fixture_manifest(),
samples=SampleManifest.load(REPO_ROOT / "samples" / "manifest.toml"),
modules={}, # empty catalog → no Tier-3
)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert res.tier == "tier2"
@ -344,7 +404,7 @@ def test_fleet_skips_requires_bridge_modules_when_no_bridge(monkeypatch, tmp_pat
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
seen_modules = set()
for ep in range(20):
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=ep, capacity=capacity)
@ -379,7 +439,7 @@ def test_tier3_strips_bridge_env_even_when_set(monkeypatch, tmp_path) -> None:
monkeypatch.setenv("BRIDGE", "br-malware")
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert "BRIDGE" not in _RecordingPopen.calls[-1]["env"]
@ -399,7 +459,7 @@ def test_tier3_drops_requires_bridge_modules_unconditionally(monkeypatch, tmp_pa
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
slirp_friendly = {k for k, v in cfg.modules.items() if not v.requires_bridge}
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
seen = set()
for ep in range(40):
res = fleet._run_slot(cfg, slot=0, sample=sample,
@ -422,7 +482,7 @@ def test_fleet_assigns_unique_port_base_per_slot(monkeypatch, tmp_path) -> None:
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
sample = cfg.samples.samples[0]
fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
fleet._run_slot(cfg, slot=1, sample=sample, episode_index=0, capacity=capacity)
fleet._run_slot(cfg, slot=2, sample=sample, episode_index=0, capacity=capacity)

168
tests/test_manifest.py Normal file
View file

@ -0,0 +1,168 @@
"""Tests for orchestrator/manifest.py — the canonical experiment
manifest loader (PIPELINE.md §4.1)."""
from __future__ import annotations
from pathlib import Path
import pytest
from orchestrator.manifest import (
KNOWN_COLLECTORS, KNOWN_PHASES, ManifestError, load_canonical,
)
REPO_ROOT = Path(__file__).resolve().parent.parent
MINIMAL_VALID = """
schema_version = 1
name = "test-experiment"
[experiment]
ram_per_vm_mib = 320
[[experiment.schedule.phases]]
name = "clean"
seconds = 1.0
[[experiment.schedule.phases]]
name = "armed"
seconds = 1.0
[experiment.fleet]
max_concurrent_ceiling = 0
max_tier3_slots = 0
[collectors]
active = ["proc"]
[collectors.intervals]
proc_ms = 100
qmp_ms = 1000
perf_ms = 100
guest_agent_ms = 100
pcap_snaplen = 256
netflow_bucket_ms = 100
[catalog]
modules = []
[targets]
images = []
[samples]
manifest_path = "samples/manifest.toml"
"""
def _write_manifest(repo: Path, body: str) -> None:
(repo / "manifest.toml").write_text(body)
def test_canonical_manifest_in_repo_loads() -> None:
"""The actual manifest.toml shipped in the repo MUST load and
validate. If it doesn't, every lab host fails preflight."""
m = load_canonical(REPO_ROOT)
assert m.schema_version == 1
assert m.name == "cis490-spectral-v1"
# Every active collector must be in KNOWN_COLLECTORS.
for c in m.collectors_active:
assert c in KNOWN_COLLECTORS
# Every scheduled phase name must be in KNOWN_PHASES.
for p in m.schedule:
assert p.name in KNOWN_PHASES
def test_loads_minimal_valid(tmp_path: Path) -> None:
_write_manifest(tmp_path, MINIMAL_VALID)
m = load_canonical(tmp_path)
assert m.name == "test-experiment"
assert len(m.schedule) == 2
assert m.fleet.max_concurrent_ceiling == 0
assert m.collectors_active == ("proc",)
def test_missing_file_raises_manifest_error(tmp_path: Path) -> None:
with pytest.raises(ManifestError, match="not found"):
load_canonical(tmp_path)
def test_unsupported_schema_version_raises(tmp_path: Path) -> None:
_write_manifest(tmp_path, MINIMAL_VALID.replace(
"schema_version = 1", "schema_version = 2"
))
with pytest.raises(ManifestError, match="schema_version=2"):
load_canonical(tmp_path)
def test_unknown_collector_in_active_raises(tmp_path: Path) -> None:
_write_manifest(tmp_path, MINIMAL_VALID.replace(
'active = ["proc"]', 'active = ["proc", "totally_not_real"]'
))
with pytest.raises(ManifestError, match="totally_not_real"):
load_canonical(tmp_path)
def test_duplicate_collector_in_active_raises(tmp_path: Path) -> None:
_write_manifest(tmp_path, MINIMAL_VALID.replace(
'active = ["proc"]', 'active = ["proc", "proc"]'
))
with pytest.raises(ManifestError, match="duplicate"):
load_canonical(tmp_path)
def test_unknown_phase_name_raises(tmp_path: Path) -> None:
bad = MINIMAL_VALID.replace('name = "armed"', 'name = "magical"', 1)
_write_manifest(tmp_path, bad)
with pytest.raises(ManifestError, match="magical"):
load_canonical(tmp_path)
def test_negative_phase_seconds_raises(tmp_path: Path) -> None:
bad = MINIMAL_VALID.replace('seconds = 1.0\n\n[[experiment.schedule.phases]]\nname = "armed"\nseconds = 1.0',
'seconds = 1.0\n\n[[experiment.schedule.phases]]\nname = "armed"\nseconds = -5.0')
_write_manifest(tmp_path, bad)
with pytest.raises(ManifestError, match="must be > 0"):
load_canonical(tmp_path)
def test_negative_ram_raises(tmp_path: Path) -> None:
bad = MINIMAL_VALID.replace("ram_per_vm_mib = 320", "ram_per_vm_mib = -1")
_write_manifest(tmp_path, bad)
with pytest.raises(ManifestError, match="positive"):
load_canonical(tmp_path)
def test_catalog_entry_missing_verified_against_raises(tmp_path: Path) -> None:
bad = MINIMAL_VALID.replace(
"[catalog]\nmodules = []",
'[catalog]\n[[catalog.modules]]\nname = "fixture"\nlast_verified = "abc"\n',
)
_write_manifest(tmp_path, bad)
with pytest.raises(ManifestError, match="verified_against"):
load_canonical(tmp_path)
def test_catalog_entry_with_both_fields_loads(tmp_path: Path) -> None:
valid = MINIMAL_VALID.replace(
"[catalog]\nmodules = []",
('[catalog]\n[[catalog.modules]]\nname = "fixture"\n'
'verified_against = "test-target"\nlast_verified = "abc"\n'),
)
_write_manifest(tmp_path, valid)
m = load_canonical(tmp_path)
assert len(m.catalog) == 1
assert m.catalog[0].name == "fixture"
assert m.catalog[0].verified_against == "test-target"
assert m.catalog[0].last_verified == "abc"
def test_to_meta_round_trips_to_json_safe(tmp_path: Path) -> None:
"""meta.json embedding requires the to_meta dict be json-encodable."""
import json
_write_manifest(tmp_path, MINIMAL_VALID)
m = load_canonical(tmp_path)
encoded = json.dumps(m.to_meta())
decoded = json.loads(encoded)
assert decoded["schema_version"] == 1
assert decoded["name"] == "test-experiment"

View file

@ -1,13 +1,20 @@
"""``cis490-fleet`` — run as many concurrent labeled episodes as the
host can handle, drawing samples from the manifest.
host can handle, drawing samples and experiment shape from the
canonical manifest at <repo_root>/manifest.toml.
Per PIPELINE.md §4.1 the experiment manifest is the single source of
truth for what the experiment is. There are NO `--manifest`,
`--modules-dir`, `--ram-per-vm-mib`, `--max-concurrent`,
`--max-tier3-slots`, `--force-tier2`, or `--require-real-samples`
flags those would all be per-host overrides of the canonical
experiment shape, which §4.1 forbids. A host that can't load the
canonical manifest exits 78 (§9, §4.7).
Modes:
--capacity Print the resource calculation and exit. No VMs spawned.
--waves N Run N waves of episodes (one wave = max_concurrent
episodes, each in its own slot). Default: 1.
--max-concurrent N
Cap concurrency below the auto-detected ceiling.
"""
from __future__ import annotations
@ -27,26 +34,26 @@ from exploits.modules import load_module_configs # noqa: E402
from orchestrator.fleet import ( # noqa: E402
FleetConfig, FleetRunner, capacity_report, detect_capacity,
)
from orchestrator.manifest import ManifestError, load_canonical # noqa: E402
from samples.manifest import SampleManifest # noqa: E402
# LSB-style sysadmin-error exit code: do NOT respawn (paired with
# RestartPreventExitStatus=78 on cis490-orchestrator.service).
EXIT_SYSADMIN_ERROR = 78
def main(argv: list[str] | None = None) -> int:
p = argparse.ArgumentParser(prog="cis490-fleet")
p.add_argument("--capacity", action="store_true")
p.add_argument("--waves", type=int, default=1)
p.add_argument("--max-concurrent", type=int, default=None)
p.add_argument("--manifest",
default=str(Path(__file__).resolve().parent.parent / "samples" / "manifest.toml"))
p.add_argument("--modules-dir",
default=str(Path(__file__).resolve().parent.parent / "exploits" / "modules"))
p.add_argument("--data-root", default="data")
p.add_argument("--host-id", default=os.environ.get("FLEET_HOST_ID") or os.uname().nodename)
p.add_argument("--ram-per-vm-mib", type=int, default=320)
p.add_argument("--require-real-samples", action="store_true")
p.add_argument("--force-tier2", action="store_true",
help="Skip Tier 3 even when msfrpcd is reachable")
p.add_argument("--max-tier3-slots", type=int, default=None,
help="Cap concurrent Tier-3 slots; slots >= N fall back to Tier-2")
p.add_argument("--capacity", action="store_true",
help="Print capacity calculation and exit")
p.add_argument("--waves", type=int, default=1,
help="Number of episode waves to run")
p.add_argument("--data-root", default="data",
help="Per-host writable directory for episode dirs")
p.add_argument("--host-id",
default=os.environ.get("FLEET_HOST_ID") or os.uname().nodename,
help="Stable identity for this host (per-host)")
p.add_argument("--log-level", default="INFO")
args = p.parse_args(argv)
@ -54,27 +61,64 @@ def main(argv: list[str] | None = None) -> int:
level=getattr(logging, args.log_level.upper(), logging.INFO),
format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("cis490.fleet")
repo_root = Path(__file__).resolve().parent.parent
# Canonical manifest. PIPELINE.md §4.1 / §4.7 — failure to load is
# a sysadmin error, not a runtime warning. Exit 78 so systemd's
# RestartPreventExitStatus=78 keeps the unit stuck-and-loud rather
# than respawning into the same broken state.
try:
experiment = load_canonical(repo_root)
except ManifestError as e:
log.error("canonical manifest failed to load: %s", e)
log.error(
"this host cannot run the experiment without a valid "
"manifest.toml at %s. Per §4.1 there is no fallback. "
"Exiting %d (sysadmin error).",
repo_root / "manifest.toml", EXIT_SYSADMIN_ERROR,
)
return EXIT_SYSADMIN_ERROR
log.info("loaded experiment manifest: %s v%d (schema_version=%d)",
experiment.name, 1, experiment.schema_version)
if args.capacity:
print(capacity_report())
print(capacity_report(ram_per_vm_mib=experiment.ram_per_vm_mib))
return 0
manifest = SampleManifest.load(args.manifest)
repo_root = Path(__file__).resolve().parent.parent
modules_dir = Path(args.modules_dir)
modules = load_module_configs(modules_dir) if modules_dir.exists() else {}
samples_path = repo_root / experiment.samples_manifest_path
samples = SampleManifest.load(samples_path)
# Module catalog admission: only load modules whose name appears in
# experiment.catalog. Modules without a catalog entry don't get
# loaded into the runtime catalog regardless of whether their toml
# is on disk. PIPELINE.md §4.3.
modules_dir = repo_root / "exploits" / "modules"
on_disk_modules = (
load_module_configs(modules_dir) if modules_dir.exists() else {}
)
admitted_names = {entry.name for entry in experiment.catalog}
modules = {
name: cfg for name, cfg in on_disk_modules.items()
if name in admitted_names
}
if admitted_names and (admitted_names - set(modules.keys())):
missing = sorted(admitted_names - set(modules.keys()))
log.error(
"manifest.catalog references modules whose toml is missing "
"from %s: %s. Refusing to start (§4.3).", modules_dir, missing,
)
return EXIT_SYSADMIN_ERROR
cfg = FleetConfig(
host_id=args.host_id,
repo_root=repo_root,
data_root=Path(args.data_root).resolve(),
manifest=manifest,
experiment=experiment,
samples=samples,
modules=modules,
ram_per_vm_mib=args.ram_per_vm_mib,
max_concurrent_override=args.max_concurrent,
require_real_samples=args.require_real_samples,
force_tier2=args.force_tier2,
max_tier3_slots=args.max_tier3_slots,
)
runner = FleetRunner(cfg)
@ -88,6 +132,7 @@ def main(argv: list[str] | None = None) -> int:
print(json.dumps({
"host_id": args.host_id,
"experiment": experiment.name,
"capacity": result.capacity.to_dict(),
"modules_loaded": sorted(modules.keys()),
"slots": [

View file

@ -29,6 +29,7 @@ sys.path.insert(0, str(Path(__file__).resolve().parent))
from collectors import qmp # noqa: E402
from orchestrator.episode import EpisodeConfig, EpisodeRunner # noqa: E402
from orchestrator.manifest import ManifestError, load_canonical # noqa: E402
from samples.manifest import SampleManifest # noqa: E402
from vm_load_controller import VMLoadController # noqa: E402
from vm_serial import SerialClient # noqa: E402
@ -98,12 +99,9 @@ def main() -> int:
parser.add_argument(
"--sample",
default=os.environ.get("SAMPLE_NAME"),
help="Pick a workload profile from the manifest by name. Fleet runner "
"passes this via SAMPLE_NAME env. If unset, runs the v1 yes-loop.",
)
parser.add_argument(
"--manifest",
default=str(Path(__file__).resolve().parent.parent / "samples" / "manifest.toml"),
help="Pick a workload profile from the samples manifest by name. "
"Fleet runner passes this via SAMPLE_NAME env. If unset, runs the "
"v1 yes-loop.",
)
args = parser.parse_args()
@ -116,13 +114,23 @@ def main() -> int:
repo_root = Path(__file__).resolve().parent.parent
launcher = repo_root / "vm" / "launch_demo.sh"
# Canonical experiment manifest (PIPELINE.md §4.1). The samples-manifest
# path comes from here too; no per-call override.
try:
experiment = load_canonical(repo_root)
except ManifestError as e:
log.error("canonical manifest failed to load: %s", e)
return 78
samples_path = repo_root / experiment.samples_manifest_path
# Resolve sample if requested.
sample = None
if args.sample:
manifest = SampleManifest.load(args.manifest)
sample = next((s for s in manifest.samples if s.name == args.sample), None)
samples = SampleManifest.load(samples_path)
sample = next((s for s in samples.samples if s.name == args.sample), None)
if sample is None:
log.error("sample %r not in manifest %s", args.sample, args.manifest)
log.error("sample %r not in samples manifest %s",
args.sample, samples_path)
return 2
log.info("using sample=%s profile=%s kind=%s",
sample.name, sample.profile, sample.kind)
@ -217,6 +225,7 @@ def main() -> int:
qmp_socket=qmp_sock if qmp_sock.exists() else None,
guest_agent_socket=agent_sock if agent_sock.exists() else None,
bridge_iface=os.environ.get("BRIDGE") or None,
experiment_meta=experiment.to_meta(),
# Source 3 (oracle): perf-stat sampling of the qemu PID.
# Now that the stdout/stderr + event-name parser bugs are
# fixed, perf produces real rows. Per PIPELINE.md §4.4

View file

@ -38,6 +38,7 @@ from exploits.driver import DriverConfig, MSFExploitDriver # noqa: E402
from exploits.modules import load_module_config # noqa: E402
from exploits.msfrpc import MSFRpcClient, MSFRpcConfig # noqa: E402
from orchestrator.episode import EpisodeConfig, EpisodeRunner # noqa: E402
from orchestrator.manifest import ManifestError, load_canonical # noqa: E402
from samples.manifest import SampleManifest # noqa: E402
@ -170,12 +171,9 @@ def main() -> int:
parser.add_argument(
"--sample",
default=os.environ.get("SAMPLE_NAME"),
help="Pick a workload profile from the manifest by name. Fleet runner "
"passes this via SAMPLE_NAME env. Without it, falls back to the v1 yes-loop.",
)
parser.add_argument(
"--manifest",
default=str(Path(__file__).resolve().parent.parent / "samples" / "manifest.toml"),
help="Pick a workload profile from the samples manifest by name. "
"Fleet runner passes this via SAMPLE_NAME env. Without it, falls "
"back to the v1 yes-loop.",
)
args = parser.parse_args()
@ -198,15 +196,24 @@ def main() -> int:
log.error("no module config at %s", module_path)
return 2
# Canonical experiment manifest (PIPELINE.md §4.1).
try:
experiment = load_canonical(repo_root)
except ManifestError as e:
log.error("canonical manifest failed to load: %s", e)
return 78
samples_path = repo_root / experiment.samples_manifest_path
module = load_module_config(module_path)
log.info("module loaded: %s (%s)", module.name, module.module_path)
sample = None
if args.sample:
manifest = SampleManifest.load(args.manifest)
sample = next((s for s in manifest.samples if s.name == args.sample), None)
samples = SampleManifest.load(samples_path)
sample = next((s for s in samples.samples if s.name == args.sample), None)
if sample is None:
log.error("sample %r not in manifest %s", args.sample, args.manifest)
log.error("sample %r not in samples manifest %s",
args.sample, samples_path)
return 2
log.info("sample=%s profile=%s kind=%s",
sample.name, sample.profile, sample.kind)
@ -308,6 +315,7 @@ def main() -> int:
qmp_socket=qmp_sock if qmp_sock.exists() else None,
guest_agent_socket=agent_sock if agent_sock.exists() else None,
bridge_iface=os.environ.get("BRIDGE") or None,
experiment_meta=experiment.to_meta(),
# Source 3 (oracle): see equivalent in run_real_vm_demo.py.
# Now that the perf collector parser is fixed, turn it on
# in production rather than leave it silently disabled.