CIS490/manifest.toml
Max Gorog 207a902c3e PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml
The experiment is now defined by a single version-pinned file,
manifest.toml at the repo root (PIPELINE.md §4.1 / §13 / §16). Every
lab host loads THIS exact file; per-host overrides of experiment
shape are forbidden.

Drops the following per-host CLI overrides that previously violated
the canonical-manifest principle:
  * --manifest, --modules-dir       (paths now derived)
  * --ram-per-vm-mib                (in manifest.experiment)
  * --max-concurrent                (manifest.experiment.fleet.max_concurrent_ceiling)
  * --max-tier3-slots               (manifest.experiment.fleet.max_tier3_slots)
  * --force-tier2                   (not a §14 sanctioned override knob —
                                     ship empty catalog to disable Tier-3)
  * --require-real-samples          (sample-side concern; out of fleet scope)
  * tools/run_*_demo.py --manifest  (samples path now from canonical)

New surface:
  * manifest.toml                   — the single source of truth
  * orchestrator/manifest.py        — load_canonical() + Manifest dataclass
                                      with strict validation, raises
                                      ManifestError on any failure
  * EpisodeConfig.experiment_meta   — populated by run_*_demo.py from
                                      the canonical manifest; stamped
                                      into every episode's meta.json
                                      under "experiment" key for
                                      provenance
  * cis490-orchestrator.service     — RestartPreventExitStatus=78 so
                                      manifest-load failures stay
                                      stuck-and-loud (§9, §4.7)
  * install-lab-host.sh             — validates manifest.toml at
                                      install time; missing or invalid
                                      = die with clear message
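The exit-78 contract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the orchestrator's real entry point. `main()`, its `load` parameter, `broken_loader()` and the local `ManifestError` are hypothetical stand-ins for what orchestrator/manifest.py and the service wrapper actually do. The sketch relies on only two fixed facts: 78 is `EX_CONFIG` in sysexits.h, and systemd's `RestartPreventExitStatus=78` suppresses automatic restarts when the main process exits with that status.

```python
import sys
from typing import Callable

# 78 is EX_CONFIG ("configuration error") in sysexits.h. With
# RestartPreventExitStatus=78 in the unit, systemd will not auto-restart
# the service after this exit, so the failure stays visible.
EX_CONFIG = 78


class ManifestError(Exception):
    """Hypothetical stand-in for orchestrator.manifest.ManifestError."""


def broken_loader() -> object:
    """Hypothetical loader that fails the way an invalid manifest.toml would."""
    raise ManifestError("schema_version 2 != expected 1")


def main(load: Callable[[], object]) -> int:
    """Hypothetical entry point; `load` plays the role of load_canonical()."""
    try:
        manifest = load()
    except ManifestError as exc:
        # Stuck-and-loud: name the problem, run zero episodes, exit 78.
        print(f"FATAL: canonical manifest rejected: {exc}", file=sys.stderr)
        return EX_CONFIG
    print(f"loaded canonical manifest: {manifest!r}")
    return 0  # the real orchestrator would start running episodes here
```

Wired up as `sys.exit(main(...))`, a bad manifest becomes process exit status 78. `main(broken_loader)` returns 78, while a loader that succeeds yields 0.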

Catalog admission semantics: only modules whose names appear in
manifest.catalog are loaded into the runtime catalog. This is §4.3 in
miniature and will tighten in step 4, when verified_against /
last_verified actually gate admission. A missing toml for an admitted
name is a sysadmin error → exit 78.
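The admission rule is a lookup keyed by the manifest, not a directory scan. Below is a minimal sketch. It assumes module configs live at `exploits/modules/<name>.toml`, the layout the manifest's [catalog] comment describes. `admitted_module_paths()` and `CatalogAdmissionError` are hypothetical names, and parsing each TOML file is left to the caller.

```python
from pathlib import Path

EX_CONFIG = 78  # sysexits.h EX_CONFIG, used here for "sysadmin error"


class CatalogAdmissionError(Exception):
    """Hypothetical: the manifest admits a module that has no config on disk."""

    exit_code = EX_CONFIG


def admitted_module_paths(admitted: list[str], modules_dir: Path) -> dict[str, Path]:
    """Map each name listed in manifest.catalog to its <name>.toml config.

    Only admitted names are visited, so config files on disk that the
    manifest does not name can never leak into the runtime catalog.
    """
    paths: dict[str, Path] = {}
    for name in admitted:
        path = modules_dir / f"{name}.toml"
        if not path.is_file():
            raise CatalogAdmissionError(
                f"manifest admits {name!r} but {path} does not exist"
            )
        paths[name] = path
    return paths
```

The loop only visits names the manifest lists, so a stray `foo.toml` dropped into the modules directory is never admitted. A listed name with no file fails loudly, carrying exit code 78, instead of being silently skipped.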

Renames cfg.manifest → cfg.samples + adds cfg.experiment to
disambiguate sample-manifest from experiment-manifest. Rewrites
test_fleet.py fixture to construct synthetic Manifest objects so
test outcomes don't depend on the on-disk manifest.toml content.
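One way to build such fixtures is a frozen dataclass with test-friendly defaults, plus `dataclasses.replace()` for per-test overrides. This is a sketch only. The `Manifest` below is a hypothetical, trimmed-down stand-in with a few fields, and `make_manifest()` is an illustrative helper name, not necessarily what test_fleet.py uses.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical, trimmed-down Manifest; the real dataclass has more fields."""

    schema_version: int = 1
    name: str = "test-manifest"
    ram_per_vm_mib: int = 320
    max_concurrent_ceiling: int = 0
    max_tier3_slots: int = 0
    collectors: tuple[str, ...] = ("proc", "qmp")
    catalog: tuple[str, ...] = ()


def make_manifest(**overrides) -> Manifest:
    """Build a synthetic Manifest for tests; only the named fields change.

    Nothing is read from <repo>/manifest.toml, so editing that file
    cannot change a fleet test's outcome.
    """
    return dataclasses.replace(Manifest(), **overrides)
```

A pytest fixture could then `return make_manifest(max_concurrent_ceiling=2)` and stay stable however the on-disk manifest.toml changes.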

12 new tests in tests/test_manifest.py: schema-version mismatch,
unknown collector, duplicate collector, unknown phase, negative
phase seconds, negative ram, missing catalog fields, json round-trip.
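The checks those tests exercise can be sketched as a validator over the already-parsed TOML dict, for example the output of Python 3.11's `tomllib.load`. This is a minimal illustration, not the real load_canonical(). Several parts are assumptions:

* the `validate()` / `to_json()` / `from_json()` names
* the reduced `Manifest` fields
* the closed sets of collector and phase names, copied from those used in manifest.toml
* the required catalog keys

It also rejects a `ram_per_vm_mib` of zero as well as negatives, since zero would break the capacity division.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 1
# ASSUMPTION: closed vocabularies copied from the names manifest.toml uses;
# the real orchestrator/manifest.py may accept a different set.
KNOWN_COLLECTORS = {"proc", "qmp", "perf", "guest_agent", "pcap", "netflow"}
KNOWN_PHASES = {"clean", "armed", "infecting", "infected_running", "dormant"}
# ASSUMPTION: keys every [catalog] entry must carry (see the §4.3 comment).
CATALOG_REQUIRED = ("name", "verified_against", "last_verified")


class ManifestError(ValueError):
    """Hypothetical stand-in for orchestrator.manifest.ManifestError."""


@dataclass(frozen=True)
class Phase:
    name: str
    seconds: float


@dataclass(frozen=True)
class Manifest:
    schema_version: int
    name: str
    ram_per_vm_mib: int
    phases: tuple[Phase, ...]
    collectors: tuple[str, ...]
    catalog: tuple[str, ...]


def _get(table, key: str, where: str):
    # Turn a missing key (or a non-table) into ManifestError, not KeyError.
    if not isinstance(table, dict) or key not in table:
        raise ManifestError(f"missing {where}.{key}")
    return table[key]


def validate(raw: dict) -> Manifest:
    """Validate a parsed manifest dict; raise ManifestError on any failure."""
    version = _get(raw, "schema_version", "<root>")
    if version != SCHEMA_VERSION:
        raise ManifestError(f"schema_version {version!r} != {SCHEMA_VERSION}")

    exp = _get(raw, "experiment", "<root>")
    ram = _get(exp, "ram_per_vm_mib", "experiment")
    if ram <= 0:  # zero would also break the capacity division
        raise ManifestError(f"ram_per_vm_mib must be > 0, got {ram}")

    phases = []
    schedule = _get(exp, "schedule", "experiment")
    for p in _get(schedule, "phases", "experiment.schedule"):
        name = _get(p, "name", "phase")
        seconds = float(_get(p, "seconds", "phase"))
        if name not in KNOWN_PHASES:
            raise ManifestError(f"unknown phase {name!r}")
        if seconds < 0:
            raise ManifestError(f"phase {name!r} has negative seconds {seconds}")
        phases.append(Phase(name, seconds))

    active = _get(_get(raw, "collectors", "<root>"), "active", "collectors")
    for c in active:
        if c not in KNOWN_COLLECTORS:
            raise ManifestError(f"unknown collector {c!r}")
    if len(set(active)) != len(active):
        raise ManifestError(f"duplicate collector in {active}")

    catalog = []
    for entry in _get(_get(raw, "catalog", "<root>"), "modules", "catalog"):
        for key in CATALOG_REQUIRED:
            _get(entry, key, "catalog.modules[]")
        catalog.append(entry["name"])

    return Manifest(version, _get(raw, "name", "<root>"), ram,
                    tuple(phases), tuple(active), tuple(catalog))


def to_json(m: Manifest) -> str:
    # For stamping into meta.json; asdict() recurses into the Phase entries.
    return json.dumps(asdict(m), sort_keys=True)


def from_json(text: str) -> Manifest:
    d = json.loads(text)
    return Manifest(d["schema_version"], d["name"], d["ram_per_vm_mib"],
                    tuple(Phase(**p) for p in d["phases"]),
                    tuple(d["collectors"]), tuple(d["catalog"]))
```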

Local run: `python tools/run_fleet.py --capacity` correctly logs the
loaded manifest and prints capacity. 241 tests passing.
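The capacity figure follows the rules in the manifest's [experiment] and [experiment.fleet] comments:

* divide available host memory by ram_per_vm_mib;
* clamp to max_concurrent_ceiling when it is non-zero;
* clamp Tier-3 slots to max_tier3_slots when that is non-zero;
* disable Tier-3 entirely when the catalog is empty.

The sketch below makes three assumptions, all hypothetical or unspecified in the source. It treats available memory as already measured in MiB, since how the real detector measures it is not stated here. It uses floor division, because slots are whole VMs. The `compute_capacity()` / `Capacity` names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    max_concurrent: int
    tier3_slots: int


def compute_capacity(available_mib: int, ram_per_vm_mib: int,
                     max_concurrent_ceiling: int, max_tier3_slots: int,
                     catalog_size: int) -> Capacity:
    """Sketch of the slot arithmetic described in [experiment.fleet]."""
    # Every slot gets the same RAM, so capacity is a floor division.
    slots = available_mib // ram_per_vm_mib
    # 0 means "no ceiling": the capacity detector decides alone.
    if max_concurrent_ceiling > 0:
        slots = min(slots, max_concurrent_ceiling)
    # An empty catalog disables Tier-3 entirely (the manifest's current state).
    if catalog_size == 0:
        tier3 = 0
    elif max_tier3_slots > 0:
        tier3 = min(slots, max_tier3_slots)
    else:
        tier3 = slots  # 0 means "no cap"
    return Capacity(max_concurrent=slots, tier3_slots=tier3)
```

With the shipped manifest (ceiling 0, Tier-3 cap 0, empty catalog), a host reporting 8192 MiB available would get 25 slots (8192 // 320), all Tier-2.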

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 01:25:01 -05:00


# CIS490 canonical experiment manifest.
#
# This file is the single source of truth for what the experiment IS:
# which collectors run, at what cadence, against what target images,
# with which exploit modules in rotation, walking which phase budget,
# under what concurrency cap. Every lab host loads THIS exact file.
# Per-host overrides are forbidden — PIPELINE.md §4.1 / §13.
#
# Hosts that cannot run the canonical experiment exit 78 at
# orchestrator startup (PIPELINE.md §4.7). They produce zero episodes
# and say so loudly. There is no "ship what we can" fallback.
#
# Substantive amendments to this file follow PIPELINE.md §16: they
# need operator sign-off (§15) and §8 decision tests, and they land in
# the same merge as the code change they justify. "This is just config"
# is not a free pass: the manifest is admission scope (§13).

schema_version = 1
name = "cis490-spectral-v1"

# ---------------------------------------------------------------------
# [experiment] — episode-shape parameters
# ---------------------------------------------------------------------
[experiment]
# Per-VM RAM in mebibytes. The capacity detector divides available
# host memory by this number to compute max_concurrent. Every slot
# gets the same RAM — no per-host fuzz.
ram_per_vm_mib = 320

# ---------------------------------------------------------------------
# [experiment.schedule] — phase budget (§4.5)
# ---------------------------------------------------------------------
# Each phase carries a duration in seconds — the MAXIMUM time the
# orchestrator will wait in that phase. Phase TRANSITIONS are written
# from observed events (PIPELINE.md §4.5), not from this clock. The
# clock is just an upper bound: if no event fires within the budget,
# the orchestrator advances and records the failure.
[[experiment.schedule.phases]]
name = "clean"
seconds = 10.0

[[experiment.schedule.phases]]
name = "armed"
seconds = 3.0

[[experiment.schedule.phases]]
name = "infecting"
seconds = 5.0

[[experiment.schedule.phases]]
name = "infected_running"
seconds = 25.0

[[experiment.schedule.phases]]
name = "dormant"
seconds = 15.0

[[experiment.schedule.phases]]
name = "infected_running"
seconds = 20.0

[[experiment.schedule.phases]]
name = "dormant"
seconds = 5.0

[[experiment.schedule.phases]]
name = "clean"
seconds = 5.0

# ---------------------------------------------------------------------
# [experiment.fleet] — concurrency policy
# ---------------------------------------------------------------------
[experiment.fleet]
# Hard ceiling on per-wave slots regardless of host capacity. 0 = no
# ceiling (capacity detector decides). Hosts whose capacity exceeds
# this still run only `max_concurrent_ceiling` slots, so the dataset
# isn't dominated by the largest host.
max_concurrent_ceiling = 0

# Cap on Tier-3 (real-exploit) slots per wave. Slots beyond this fall
# back to Tier-2. 0 = no cap. Useful when msfrpcd contention starts
# to matter; left at 0 by default.
max_tier3_slots = 0

# ---------------------------------------------------------------------
# [collectors] — active set (§4.4)
# ---------------------------------------------------------------------
# A collector listed here MUST emit ≥1 row when run against the
# canonical-manifest experiment (§4.4 admission). A collector that
# can't pass admission is REMOVED from this list — never silently
# included with zero rows.
[collectors]
active = [
    "proc",
    "qmp",
    "perf",
    "guest_agent",
    "pcap",
    "netflow",
]

[collectors.intervals]
# *_ms values are in milliseconds; pcap_snaplen is the per-packet
# capture length in bytes, not an interval.
proc_ms = 100
qmp_ms = 1000
perf_ms = 100
guest_agent_ms = 100
pcap_snaplen = 256
netflow_bucket_ms = 100

# ---------------------------------------------------------------------
# [catalog] — Tier-3 module catalog (§4.3)
# ---------------------------------------------------------------------
# Each entry references a module config in `exploits/modules/<name>.toml`.
# Every entry MUST carry `verified_against` and `last_verified`. The
# absence of either drops the module from the active catalog.
#
# Empty as of 2026-05-04: PIPELINE.md §3 found 0/67 session_open on
# samba_usermap_script against the SourceForge Metasploitable2 image,
# and no in-house verified target exists yet. §5 step 3 builds the
# target VM; step 4 re-admits modules with verification recorded.
# Until then, Tier-3 episodes do not run: the dataset is honest
# Tier-2 only.
[catalog]
modules = []

# ---------------------------------------------------------------------
# [targets] — target VM images (§4.2)
# ---------------------------------------------------------------------
# Each entry pins (image_name, sha256, build_script). The image MUST
# have been produced by `build_script` and verified per §4.2. Hosts
# that don't have the expected image at the expected sha256 fail
# preflight (§4.7) and produce zero episodes.
#
# Empty until §5 step 3 lands declarative target builds.
[targets]
images = []

# ---------------------------------------------------------------------
# [samples] — pointer to the malware-sample manifest
# ---------------------------------------------------------------------
# Samples roll forward independently of experiment shape (new sample
# = manifest entry; doesn't change collector set or schedule). Path
# is relative to repo root.
[samples]
manifest_path = "samples/manifest.toml"