Phase labels are written ONLY when justifying events arrive. The
schedule clock is now a budget (an upper bound), never a label
source. This is the core honesty fix the §3 evidence demanded:

  Before: every Tier-3 episode wrote `infected_running` from the
          schedule clock regardless of whether session_open ever
          fired. Per §10 every dishonest label is a poisoned
          training example. 67/67 of the §3 probe episodes were
          poisoned this way.

  After:  `infecting` writes ONLY when exploit_fire is observed in
          events.jsonl. `infected_running` writes ONLY when
          session_open is observed. Either timing out or seeing
          session_open_timeout terminates the walker with a `failed`
          label that the §4.6 acceptance gate will reject.
PHASE_JUSTIFYING_EVENTS in orchestrator/episode.py declares which
events justify which phases:

    "clean":            None              # orchestrator-emitted
    "armed":            None              # orchestrator-emitted
    "infecting":        ("exploit_fire",)
    "infected_running": ("session_open",)

Any event in TERMINAL_FAILURE_EVENTS = {"session_open_timeout"}
short-circuits an event-driven wait into a `failed` label.
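A minimal sketch of how the two tables might be consulted, assuming
events are reduced to their names in arrival order. `classify_wait`
and its "pending" return value are hypothetical, chosen for
illustration; they are not the contents of orchestrator/episode.py:

```python
from typing import Iterable, Optional, Tuple

# Mapping as listed above. None means the orchestrator emits the phase
# itself, so no external event is required to justify the label.
PHASE_JUSTIFYING_EVENTS: dict[str, Optional[Tuple[str, ...]]] = {
    "clean": None,
    "armed": None,
    "infecting": ("exploit_fire",),
    "infected_running": ("session_open",),
}
TERMINAL_FAILURE_EVENTS = frozenset({"session_open_timeout"})


def classify_wait(phase: str, seen: Iterable[str]) -> str:
    """Hypothetical helper: decide which label a phase may receive.

    Returns the phase name when it is justified, "failed" when a
    terminal failure event arrived first, and "pending" when nothing
    relevant has been observed yet.
    """
    justifiers = PHASE_JUSTIFYING_EVENTS[phase]
    if justifiers is None:
        return phase  # orchestrator-emitted phases need no event
    for name in seen:  # event names in arrival order
        if name in justifiers:
            return phase
        if name in TERMINAL_FAILURE_EVENTS:
            return "failed"
    return "pending"
```

In this sketch a "pending" result that outlives the phase budget would
also become `failed`; the clock only bounds the wait and never supplies
the label.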
`dormant` is intentionally OFF the canonical schedule. §4.5 calls
for dormant to be event-driven (session_idle / session_active) too,
but the driver doesn't emit those yet. Per §1 default-to-removal we
ship without dormant rather than label it from the clock; when the
driver gains those emits, dormant re-enters the schedule with
proper justification.
EpisodeRunner now owns:

  * `_event_log`: every emit_event appends here
  * `_event_cv`: condition variable for waiters
  * `_wait_for_event(names, since_t_mono_ns, timeout_s)`: returns the
    first matching event in the log with t_mono >= threshold; the
    threshold catches events that fired during the previous on_phase
    callback.
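The wait semantics can be sketched with a plain threading.Condition.
This is an illustration under assumptions, not the EpisodeRunner code:
the `EventLog` class, the dict event shape, and the optional
`t_mono_ns` argument to `emit_event` are hypothetical.

```python
import threading
import time
from typing import Optional, Sequence


class EventLog:
    """Hypothetical stand-in for the event-log part of EpisodeRunner."""

    def __init__(self) -> None:
        self._event_log: list = []
        self._event_cv = threading.Condition()

    def emit_event(self, name: str, t_mono_ns: Optional[int] = None) -> dict:
        event = {
            "name": name,
            "t_mono": time.monotonic_ns() if t_mono_ns is None else t_mono_ns,
        }
        with self._event_cv:
            self._event_log.append(event)
            self._event_cv.notify_all()  # wake every waiter so it re-scans
        return event

    def _wait_for_event(self, names: Sequence[str], since_t_mono_ns: int,
                        timeout_s: float) -> Optional[dict]:
        deadline = time.monotonic() + timeout_s
        scanned = 0  # the log is append-only, so old entries are never re-read
        with self._event_cv:
            while True:
                for event in self._event_log[scanned:]:
                    if (event["name"] in names
                            and event["t_mono"] >= since_t_mono_ns):
                        return event  # may predate this call: already arrived
                scanned = len(self._event_log)
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None  # budget exhausted without a justifier
                self._event_cv.wait(remaining)
```

Scanning the log before blocking is what makes the already-arrived
case work: an event emitted before the waiter started is still
returned, provided its t_mono is at or after the threshold.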
When an event-driven phase's justifier already arrived (e.g.
exploit_fire emitted by driver._fire() inside on_phase("armed")),
the walker uses the EVENT's t_mono on the label — not the time the
walker noticed. The label means "this is when this thing actually
happened."
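A hedged sketch of that labelling rule. `walk`, the `Waiter` callable
and the label dict shape are hypothetical; the sketch also leaves out
the on_phase callbacks and the sleeping-out of orchestrator-emitted
phase budgets, so it shows only where each label's t_mono comes from:

```python
from typing import Callable, Optional

PHASE_JUSTIFYING_EVENTS = {
    "clean": None,
    "armed": None,
    "infecting": ("exploit_fire",),
    "infected_running": ("session_open",),
}
TERMINAL_FAILURE_EVENTS = ("session_open_timeout",)

# (names, since_t_mono_ns, timeout_s) -> event dict, or None on timeout
Waiter = Callable[[tuple, int, float], Optional[dict]]


def walk(schedule: list, wait: Waiter, now_ns: Callable[[], int]) -> list:
    """Hypothetical walker: one label per phase, stopping at the first failure."""
    labels = []
    since = now_ns()
    for phase, budget_s in schedule:
        justifiers = PHASE_JUSTIFYING_EVENTS[phase]
        if justifiers is None:
            # Orchestrator-emitted phase: stamped with the walker's own clock.
            t = now_ns()
            labels.append({"phase": phase, "t_mono": t, "justified_by": None})
            since = t
            continue
        event = wait(justifiers + TERMINAL_FAILURE_EVENTS, since, budget_s)
        if event is None or event["name"] in TERMINAL_FAILURE_EVENTS:
            reason = "timeout" if event is None else event["name"]
            labels.append({"phase": "failed", "t_mono": now_ns(),
                           "justified_by": reason})
            break
        # Event-driven phase: the label carries the EVENT's t_mono,
        # not the moment the walker noticed it.
        labels.append({"phase": phase, "t_mono": event["t_mono"],
                       "justified_by": event["name"]})
        since = event["t_mono"]
    return labels
```

Carrying `since` forward from the previous label is what lets a
justifier emitted inside the previous phase's callback still count.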
manifest.toml: dropped the dormant cycle from the canonical schedule.
Episode is shorter (~30s) but every label is event-justified.
14 new tests in tests/test_event_driven_labeller.py covering: justifier
mapping invariants, _wait_for_event semantics (already-arrived,
future, timeout, since-threshold, first-of-multiple-names), walker
behavior (orchestrator-emitted phases, event-driven phases, missing
event → failed, terminal-failure-event short-circuit, stop event,
event-t_mono on label, phase_transition events with justified_by).
286 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
141 lines
5.4 KiB
TOML
# CIS490 canonical experiment manifest.
#
# This file is the single source of truth for what the experiment IS:
# which collectors run, at what cadence, against what target images,
# with which exploit modules in rotation, walking which phase budget,
# under what concurrency cap. Every lab host loads THIS exact file.
# Per-host overrides are forbidden — PIPELINE.md §4.1 / §13.
#
# Hosts that cannot run the canonical experiment exit 78 at
# orchestrator startup (PIPELINE.md §4.7). They produce zero episodes
# and say so loudly. There is no "ship what we can" fallback.
#
# Substantive amendments to this file follow PIPELINE.md §16:
# operator sign-off (§15), §8 decision tests, lands in the same merge
# as the code change it justifies. "This is just config" is not a
# free pass — the manifest is admission scope (§13).

schema_version = 1
name = "cis490-spectral-v1"

# ---------------------------------------------------------------------
# [experiment] — episode-shape parameters
# ---------------------------------------------------------------------
[experiment]
# Per-VM RAM in mebibytes. The capacity detector divides available
# host memory by this number to compute max_concurrent. Every slot
# gets the same RAM — no per-host fuzz.
ram_per_vm_mib = 320

# ---------------------------------------------------------------------
# [experiment.schedule] — phase budget (§4.5)
# ---------------------------------------------------------------------
# Each phase carries a duration in seconds — the MAXIMUM time the
# orchestrator will wait in that phase. Phase TRANSITIONS are written
# from observed events (PIPELINE.md §4.5), not from this clock. The
# clock is just an upper bound: if no event fires within the budget,
# the orchestrator emits a `failed` label and the §4.6 acceptance
# gate rejects the episode.
#
# The dormant cycle (dormant → infected_running → dormant) is NOT
# in the canonical schedule yet because the driver doesn't emit
# session_idle / session_active events. Per §1 default-to-removal,
# we ship without dormant rather than label it from the clock; when
# the driver gains those emits, they re-enter §4.5 with proper
# justification.
[[experiment.schedule.phases]]
name = "clean"
seconds = 10.0

[[experiment.schedule.phases]]
name = "armed"
seconds = 3.0

[[experiment.schedule.phases]]
name = "infecting"
seconds = 5.0

[[experiment.schedule.phases]]
name = "infected_running"
seconds = 25.0

[[experiment.schedule.phases]]
name = "clean"
seconds = 5.0

# ---------------------------------------------------------------------
# [experiment.fleet] — concurrency policy
# ---------------------------------------------------------------------
[experiment.fleet]
# Hard ceiling on per-wave slots regardless of host capacity. 0 = no
# ceiling (capacity detector decides). Hosts whose capacity exceeds
# this still run only `max_concurrent_ceiling` slots, so the dataset
# isn't dominated by the largest host.
max_concurrent_ceiling = 0

# Cap of Tier-3 (real-exploit) slots per wave. Slots beyond this fall
# back to Tier-2. 0 = no cap. Useful when msfrpcd contention starts
# to matter; left at 0 by default.
max_tier3_slots = 0

# ---------------------------------------------------------------------
# [collectors] — active set (§4.4)
# ---------------------------------------------------------------------
# A collector listed here MUST emit ≥1 row when run against the
# canonical-manifest experiment (§4.4 admission). A collector that
# can't pass admission is REMOVED from this list — never silently
# included with zero rows.
[collectors]
active = [
    "proc",
    "qmp",
    "perf",
    "guest_agent",
    "pcap",
    "netflow",
]

[collectors.intervals]
proc_ms = 100
qmp_ms = 1000
perf_ms = 100
guest_agent_ms = 100
pcap_snaplen = 256
netflow_bucket_ms = 100

# ---------------------------------------------------------------------
# [catalog] — Tier-3 module catalog (§4.3)
# ---------------------------------------------------------------------
# Each entry references a module config in `exploits/modules/<name>.toml`.
# Every entry MUST carry `verified_against` and `last_verified`. The
# absence of either drops the module from the active catalog.
#
# Empty as of 2026-05-04: PIPELINE.md §3 found 0/67 session_open on
# samba_usermap_script against the SourceForge Metasploitable2 image,
# and no in-house verified target exists yet. §5 step 3 builds the
# target VM, step 4 re-admits modules with verification recorded.
# Until then, Tier-3 episodes do not run — the dataset is honest
# Tier-2 only.
[catalog]
modules = []

# ---------------------------------------------------------------------
# [targets] — target VM images (§4.2)
# ---------------------------------------------------------------------
# Each entry pins (image_name, sha256, build_script). The image MUST
# have been produced by `build_script` and verified per §4.2. Hosts
# that don't have the expected image at the expected sha256 fail
# preflight (§4.7) and produce zero episodes.
#
# Empty until §5 step 3 lands declarative target builds.
[targets]
images = []

# ---------------------------------------------------------------------
# [samples] — pointer to the malware-sample manifest
# ---------------------------------------------------------------------
# Samples roll forward independently of experiment shape (new sample
# = manifest entry; doesn't change collector set or schedule). Path
# is relative to repo root.
[samples]
manifest_path = "samples/manifest.toml"