CIS490

bolyai/CIS490

Commit graph

Author

SHA1

Message

Date

Elliott Kolden

b29d30a1b2

Tier-3: fix QEMU boot, catalog admission, verify module

Bug 14 (vm/launch_target.sh): Metasploitable2 requires -machine pc
(i440fx), -cpu kvm32, -drive if=ide, and -device e1000. The previous
config (-machine q35, -cpu host, -drive if=virtio, virtio-net-pci)
caused a kernel panic at boot because /dev/vda != the grub root=/dev/sda1.
Services never started; the b'' probe fix (Bug 10) then correctly waited
out the full timeout with no result.
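
The Bug 14 change can be sketched as a Python argv builder. This is not the contents of vm/launch_target.sh, which the commit does not show. The function name, the disk path, the `qemu-system-i386` binary, and the user-mode netdev wiring are assumptions; only the four device choices come from the message above.

```python
def metasploitable2_qemu_args(disk_image: str) -> list[str]:
    """Build a QEMU argv using the legacy devices named in the commit."""
    return [
        "qemu-system-i386",                     # assumption: binary not named in the commit
        "-machine", "pc",                       # i440fx chipset, replacing q35
        "-cpu", "kvm32",                        # replacing -cpu host
        "-drive", f"file={disk_image},if=ide",  # IDE bus, so the disk is not /dev/vda
        "-netdev", "user,id=net0",              # assumption: SLIRP user-mode networking
        "-device", "e1000,netdev=net0",         # replacing virtio-net-pci
    ]


# Hypothetical image path, for illustration only.
args = metasploitable2_qemu_args("metasploitable2.qcow2")
```

Keeping the flags in one list makes it easy to assert in a test that no virtio device slips back into the launch configuration.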

Bug 15 (scripts/install-tier-3-4.sh): the verify step used
vsftpd_234_backdoor, which is marked requires_bridge=true and has a
hardcoded port-6200 backdoor. Changed to distccd_command_exec with
TARGET_PORTS="5632:3632,4444:4444".

manifest.toml: admit distccd_command_exec and unreal_ircd_3281_backdoor
to the module catalog. Both use cmd/unix/bind_perl (bind shell, no guest
egress, SLIRP-safe). distccd returns a valid protocol response so MSF's
handler runs and session_open fires. Verified against Metasploitable2
sourceforge image sha256 a8c019c3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-05 16:41:41 -06:00

Max Gorog

d9f913fc97

PIPELINE §5 step 6: event-driven labeller (§4.5)

Phase labels are written ONLY when justifying events arrive. The
schedule clock is now a budget — an upper bound — never a label
source. This is the core honesty fix the §3 evidence demanded:

  Before: every Tier-3 episode wrote `infected_running` from the
          schedule clock regardless of whether session_open ever
          fired. Per §10 every dishonest label is a poisoned
          training example. 67/67 of the §3 probe episodes were
          poisoned this way.

  After:  `infecting` writes ONLY when exploit_fire is observed in
          events.jsonl. `infected_running` writes ONLY when
          session_open is observed. Either timing out or seeing
          session_open_timeout terminates the walker with a `failed`
          label that the §4.6 acceptance gate will reject.

PHASE_JUSTIFYING_EVENTS in orchestrator/episode.py declares which
events justify which phases:

    "clean":            None              # orchestrator-emitted
    "armed":            None              # orchestrator-emitted
    "infecting":        ("exploit_fire",)
    "infected_running": ("session_open",)

TERMINAL_FAILURE_EVENTS = {"session_open_timeout"} short-circuit any
event-driven wait into a `failed` label.
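
A minimal sketch of how the two tables above could drive a labelling decision. The mapping and the failure set are taken from this message; the `label_for` helper and its signature are hypothetical and are not the code in orchestrator/episode.py. The timeout path, which the message says also produces `failed`, is left out here.

```python
from typing import Optional

PHASE_JUSTIFYING_EVENTS: dict[str, Optional[tuple[str, ...]]] = {
    "clean": None,                         # orchestrator-emitted
    "armed": None,                         # orchestrator-emitted
    "infecting": ("exploit_fire",),
    "infected_running": ("session_open",),
}
TERMINAL_FAILURE_EVENTS = frozenset({"session_open_timeout"})


def label_for(phase: str, observed: list[str]) -> Optional[str]:
    """Hypothetical helper: decide which label, if any, may be written.

    `observed` is the list of event names seen so far, in arrival order.
    Returns the phase name, "failed", or None when still waiting.
    """
    justifiers = PHASE_JUSTIFYING_EVENTS[phase]
    if justifiers is None:
        return phase                       # no event needed for this phase
    for name in observed:
        if name in TERMINAL_FAILURE_EVENTS:
            return "failed"                # short-circuit the wait
        if name in justifiers:
            return phase                   # a justifying event arrived
    return None                            # nothing justifies the label yet
```

Under this sketch, `label_for("infected_running", ["exploit_fire"])` returns `None`: the clock alone never produces a label.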

`dormant` is intentionally OFF the canonical schedule. §4.5 calls
for dormant to be event-driven (session_idle / session_active) too,
but the driver doesn't emit those yet. Per §1 default-to-removal we
ship without dormant rather than label it from the clock; once the
driver emits those events, dormant re-enters the schedule with
proper justification.

EpisodeRunner now owns:
  * `_event_log` — every emit_event appends here
  * `_event_cv`  — condition variable for waiters
  * `_wait_for_event(names, since_t_mono_ns, timeout_s)` — returns
                                  the first matching event in the log
                                  with t_mono >= threshold; threshold
                                  catches events that fired during
                                  the previous on_phase callback.
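
The waiting semantics listed above can be sketched with `threading.Condition`. The attribute and method names follow the message, but the `EventLog` class, the dict shape of an event, and returning `None` on timeout are assumptions made for illustration, not the actual EpisodeRunner code.

```python
import threading
import time
from typing import Optional


class EventLog:
    """Hypothetical stand-in for the event-log part of EpisodeRunner."""

    def __init__(self) -> None:
        self._event_log: list[dict] = []
        self._event_cv = threading.Condition()

    def emit_event(self, name: str, t_mono_ns: Optional[int] = None) -> dict:
        """Append an event and wake every waiter."""
        stamp = time.monotonic_ns() if t_mono_ns is None else t_mono_ns
        event = {"name": name, "t_mono": stamp}
        with self._event_cv:
            self._event_log.append(event)
            self._event_cv.notify_all()
        return event

    def _first_match(self, names, since_t_mono_ns) -> Optional[dict]:
        for event in self._event_log:
            if event["name"] in names and event["t_mono"] >= since_t_mono_ns:
                return event
        return None

    def _wait_for_event(self, names, since_t_mono_ns, timeout_s) -> Optional[dict]:
        """Return the first matching event at or after the threshold.

        Checks the log before sleeping, so an event that already arrived
        is returned immediately. Returns None on timeout (assumption).
        """
        deadline = time.monotonic() + timeout_s
        with self._event_cv:
            while True:
                match = self._first_match(names, since_t_mono_ns)
                if match is not None:
                    return match
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._event_cv.wait(remaining)
```

Because the returned event carries its own `t_mono`, a caller can stamp the label with the time the event fired rather than the time the wait returned.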

When an event-driven phase's justifier already arrived (e.g.
exploit_fire emitted by driver._fire() inside on_phase("armed")),
the walker uses the EVENT's t_mono on the label — not the time the
walker noticed. The label means "this is when this thing actually
happened."

manifest.toml: dropped the dormant cycle from the canonical schedule.
Episode is shorter (~30s) but every label is event-justified.

14 new tests in tests/test_event_driven_labeller.py covering: justifier
mapping invariants, _wait_for_event semantics (already-arrived,
future, timeout, since-threshold, first-of-multiple-names), walker
behavior (orchestrator-emitted phases, event-driven phases, missing
event → failed, terminal-failure-event short-circuit, stop event,
event-t_mono on label, phase_transition events with justified_by).

286 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-04 01:43:16 -05:00

Max Gorog

207a902c3e

PIPELINE §5 step 2: canonical manifest at <repo>/manifest.toml

The experiment is now defined by a single version-pinned file —
manifest.toml at the repo root. PIPELINE.md §4.1 / §13 / §16. Every
lab host loads THIS exact file; per-host overrides of experiment
shape are forbidden.

Drops the following per-host CLI overrides that previously violated
the canonical-manifest principle:
  * --manifest, --modules-dir       (paths now derived)
  * --ram-per-vm-mib                (in manifest.experiment)
  * --max-concurrent                (manifest.experiment.fleet.max_concurrent_ceiling)
  * --max-tier3-slots               (manifest.experiment.fleet.max_tier3_slots)
  * --force-tier2                   (not a §14 sanctioned override knob —
                                     ship empty catalog to disable Tier-3)
  * --require-real-samples          (sample-side concern; out of fleet scope)
  * tools/run_*_demo.py --manifest  (samples path now from canonical)

New surface:
  * manifest.toml                   — the single source of truth
  * orchestrator/manifest.py        — load_canonical() + Manifest dataclass
                                      with strict validation, raises
                                      ManifestError on any failure
  * EpisodeConfig.experiment_meta   — populated by run_*_demo.py from
                                      the canonical manifest; stamped
                                      into every episode's meta.json
                                      under "experiment" key for
                                      provenance
  * cis490-orchestrator.service     — RestartPreventExitStatus=78 so
                                      manifest-load failures stay
                                      stuck-and-loud (§9, §4.7)
  * install-lab-host.sh             — validates manifest.toml at
                                      install time; missing or invalid
                                      = die with clear message

Catalog admission semantics: only modules whose name appears in
manifest.catalog get loaded into the runtime catalog (§4.3 in
miniature, will tighten further in step 4 when verified_against /
last_verified actually gate admission). Missing toml for an admitted
name is a sysadmin error → exit 78.
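
The admission rule above can be sketched as follows. The `admit_modules` function, the `<name>.toml` file-naming convention, and the flat modules directory are assumptions; the commit states only that admitted names are loaded and that a missing toml exits with status 78 (which matches `EX_CONFIG` in sysexits.h).

```python
import sys
from pathlib import Path

EX_CONFIG = 78  # same status the service unit lists in RestartPreventExitStatus


def admit_modules(admitted: list[str], modules_dir: Path) -> dict[str, Path]:
    """Hypothetical helper: map each admitted name to its module toml.

    Files in modules_dir that are not named in `admitted` are ignored.
    A missing file for an admitted name ends the process with status 78.
    """
    catalog: dict[str, Path] = {}
    for name in admitted:
        path = modules_dir / f"{name}.toml"   # assumption: one toml per module
        if not path.is_file():
            print(f"admitted module has no toml: {name}", file=sys.stderr)
            raise SystemExit(EX_CONFIG)
        catalog[name] = path
    return catalog
```

Iterating over the admitted names, rather than over the directory, is what keeps un-admitted modules out of the runtime catalog.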

Renames cfg.manifest → cfg.samples + adds cfg.experiment to
disambiguate sample-manifest from experiment-manifest. Rewrites
test_fleet.py fixture to construct synthetic Manifest objects so
test outcomes don't depend on the on-disk manifest.toml content.

12 new tests in tests/test_manifest.py: schema-version mismatch,
unknown collector, duplicate collector, unknown phase, negative
phase seconds, negative ram, missing catalog fields, json round-trip.

Local run: `python tools/run_fleet.py --capacity` correctly logs the
loaded manifest and prints capacity. 241 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-04 01:25:01 -05:00

3 commits