The experiment is now defined by a single version-pinned file:
manifest.toml at the repo root (PIPELINE.md §4.1 / §13 / §16). Every
lab host loads THIS exact file; per-host overrides of experiment
shape are forbidden.
Drops the following per-host CLI overrides that previously violated
the canonical-manifest principle:

  * --manifest, --modules-dir (paths now derived)
  * --ram-per-vm-mib (in manifest.experiment)
  * --max-concurrent (manifest.experiment.fleet.max_concurrent_ceiling)
  * --max-tier3-slots (manifest.experiment.fleet.max_tier3_slots)
  * --force-tier2 (not a §14 sanctioned override knob;
    ship empty catalog to disable Tier-3)
  * --require-real-samples (sample-side concern; out of fleet scope)
  * tools/run_*_demo.py --manifest (samples path now from canonical)
New surface:

  * manifest.toml: the single source of truth
  * orchestrator/manifest.py: load_canonical() + Manifest dataclass
    with strict validation; raises ManifestError on any failure
  * EpisodeConfig.experiment_meta: populated by run_*_demo.py from
    the canonical manifest; stamped into every episode's meta.json
    under the "experiment" key for provenance
  * cis490-orchestrator.service: RestartPreventExitStatus=78 so
    manifest-load failures stay stuck-and-loud (§9, §4.7)
  * install-lab-host.sh: validates manifest.toml at install time;
    missing or invalid = die with clear message
Catalog admission semantics: only modules whose name appears in
manifest.catalog get loaded into the runtime catalog (§4.3 in
miniature, will tighten further in step 4 when verified_against /
last_verified actually gate admission). Missing toml for an admitted
name is a sysadmin error → exit 78.
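A minimal sketch of the admission rule, assuming each module lives in
a file named <name>.toml under a modules directory (the file layout
and the admit_modules() helper are hypothetical; ManifestError is
redefined here only to keep the example self-contained):

```python
from pathlib import Path
from typing import Iterable


class ManifestError(Exception):
    """Stand-in for the orchestrator's manifest error type."""


def admit_modules(admitted_names: Iterable[str], modules_dir: Path) -> dict[str, Path]:
    """Map each admitted name to its toml path.

    Module files on disk that the manifest does not name are never
    looked at, so they cannot enter the runtime catalog.
    """
    admitted: dict[str, Path] = {}
    for name in admitted_names:
        path = modules_dir / f"{name}.toml"
        if not path.is_file():
            # An admitted name without a toml is a host setup problem;
            # the caller is expected to turn this into exit status 78.
            raise ManifestError(f"catalog admits {name!r} but {path} is missing")
        admitted[name] = path
    return admitted
```

Driving the loop from the manifest's names, not from a directory
listing, is what makes an empty catalog disable everything without
touching the files on disk.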
Renames cfg.manifest → cfg.samples + adds cfg.experiment to
disambiguate sample-manifest from experiment-manifest. Rewrites
test_fleet.py fixture to construct synthetic Manifest objects so
test outcomes don't depend on the on-disk manifest.toml content.
12 new tests in tests/test_manifest.py: schema-version mismatch,
unknown collector, duplicate collector, unknown phase, negative
phase seconds, negative ram, missing catalog fields, json round-trip.
Local run: `python tools/run_fleet.py --capacity` correctly logs the
loaded manifest and prints capacity. 241 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
58 lines
2.4 KiB
Desktop File
[Unit]
Description=CIS490 lab-host episode orchestrator (fleet mode)
Documentation=https://maxgit.wg/spectral/CIS490
# Episodes need KVM. msfrpcd (for Tier 3+) is brought up out-of-band
# by cis490-msfrpcd.service when installed.
After=network-online.target wg-quick@wg0.service
Wants=network-online.target

[Service]
Type=simple
User=cis490
Group=cis490
WorkingDirectory=/opt/cis490
# /etc/cis490/lab-host.env is written by scripts/install-lab-host.sh;
# carries FLEET_HOST_ID, BRIDGE, and any operator-supplied overrides.
EnvironmentFile=/etc/cis490/lab-host.env
# msfrpc credentials (written by install-msfrpcd.sh). Optional (-) so the
# unit still starts on Tier-2-only hosts where msfrpcd isn't installed.
EnvironmentFile=-/etc/cis490/msfrpc.env
# Fleet mode: detect host capacity, run that many concurrent episodes
# per wave with samples + experiment shape drawn from the canonical
# manifest at /opt/cis490/manifest.toml. Each invocation runs one wave
# and exits; systemd respawns per Restart= below.
#
# Per PIPELINE.md §4.1 there are no --manifest, --max-tier3-slots,
# --ram-per-vm-mib, --max-concurrent, --force-tier2, or
# --require-real-samples flags. Experiment-shape parameters live in
# manifest.toml. Per-host overrides are forbidden.
#
# Exit 78 (sysadmin error) when the canonical manifest fails to load
# or when the host can't run the experiment. RestartPreventExitStatus=78
# keeps the unit stuck-and-loud rather than respawning into the same
# broken state; operator notices and fixes.
ExecStart=/opt/cis490/.venv/bin/python /opt/cis490/tools/run_fleet.py \
    --data-root /var/lib/cis490/data \
    --waves 1
Restart=always
RestartSec=15
RestartPreventExitStatus=78

# Hardening: explicitly grant CAP_NET_RAW for tcpdump (source 4) and
# CAP_SYS_ADMIN / CAP_PERFMON for perf (source 3) when the operator
# enables those. Both are inherited by per-episode subprocesses.
# NoNewPrivileges=false is required because AmbientCapabilities only
# survives across exec() if NNP is off.
NoNewPrivileges=false
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
# /tmp is needed for per-slot RUN_DIR (cis490-vm-fleet-<slot>); the
# fleet runner stages QEMU's sockets + pidfile there.
ReadWritePaths=/var/lib/cis490 /tmp
SupplementaryGroups=kvm
AmbientCapabilities=CAP_NET_RAW CAP_NET_ADMIN CAP_SYS_ADMIN CAP_PERFMON
CapabilityBoundingSet=CAP_NET_RAW CAP_NET_ADMIN CAP_SYS_ADMIN CAP_PERFMON CAP_DAC_READ_SEARCH

[Install]
WantedBy=multi-user.target
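The exit-status contract this unit relies on could look roughly like
the sketch below on the Python side. It is not the actual
tools/run_fleet.py: run_wave(), its two callable parameters, and the
constant name are hypothetical, and ManifestError is redefined here
only so the example stands alone. The value 78 matches EX_CONFIG in
sysexits.h.

```python
import sys
from typing import Callable

EXIT_SYSADMIN_ERROR = 78  # listed in RestartPreventExitStatus= in the unit


class ManifestError(Exception):
    """Stand-in for the orchestrator's manifest error type."""


def run_wave(load_manifest: Callable[[], object],
             run_episodes: Callable[[object], None]) -> int:
    """Run one wave and return the process exit status."""
    try:
        manifest = load_manifest()
    except ManifestError as exc:
        # systemd will not respawn on 78, so the failure stays visible
        # until an operator fixes the manifest.
        print(f"FATAL: canonical manifest failed to load: {exc}", file=sys.stderr)
        return EXIT_SYSADMIN_ERROR
    run_episodes(manifest)
    return 0  # Restart=always respawns for the next wave after RestartSec
```

An entry point would call sys.exit(run_wave(...)). Any other uncaught
exception makes the interpreter exit with status 1, which
Restart=always treats as restartable, so only the deliberate 78 path
stops the respawn loop.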