7 commits

Author SHA1 Message Date
b73f5559dc Tier-3 fixes: b'' probe false-positive, requires_bridge, msgpack
Bug 10: _wait_for_tcp returned on recv()→b'' (connection closed by peer),
falsely signalling service-ready. Only socket.timeout or non-empty data
are genuine ready signals; b'' now retries.
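
A minimal sketch of the probe rule described above, assuming a
connect-then-recv loop. The function name, parameters, and defaults
here are illustrative, not the project's actual _wait_for_tcp:

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout_s: float = 60.0,
                 probe_timeout_s: float = 1.0,
                 retry_delay_s: float = 0.1) -> bool:
    """Return True once the service looks ready, False if timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port),
                                          timeout=probe_timeout_s) as sock:
                sock.settimeout(probe_timeout_s)
                try:
                    data = sock.recv(1)
                except socket.timeout:
                    # Connected and the peer is holding the socket open
                    # without talking first: treat as ready.
                    return True
                if data:
                    # The service sent at least one banner byte: ready.
                    return True
                # data == b"": the peer closed the connection. Not a ready
                # signal, so fall through and retry.
        except OSError:
            # Refused, reset, or connect timeout: service is not up yet.
            pass
        time.sleep(retry_delay_s)
    return False
```

The inner `except socket.timeout` has to come before the outer
`except OSError`, because socket.timeout is an OSError subclass.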

Bug 11: distccd_command_exec and unreal_ircd_3281_backdoor incorrectly
had requires_bridge=true. bind_perl payloads connect inward (host→guest
via hostfwd), not outward — no bridge egress needed. Both modules now
run on SLIRP-only fleet slots.

Bug 12: msgpack.unpackb crashed on integer session IDs from msfrpcd 6.x
(strict_map_key=True default). Added strict_map_key=False.

Bug 13 (documented): samba_usermap_script removed from catalog (NoReply
on every fire — already handled in dca6144 on origin/main).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 15:15:18 -06:00
Max Gorog
dca6144a4a catalog: remove samba_usermap_script — never landed sessions in prod
PIPELINE.md §1 (default-to-removal), §4.3 (catalog admission), §10
(every dishonest label is a poisoned training example).

Empirical evidence on commits 4ab5477c41763b: samba_usermap_script
fired its bind_perl payload but the framework's bind handler never
managed to connect to the guest's listening port within
session_open_timeout_s=30 (or even with WfsDelay=30 bumped on the
framework side). All 67 attempts in the §3 probe ended in
session_open_timeout. Yet the schedule clock was still writing
`infected_running` labels for the failed exploit — exactly the §10
poisoned-example pattern.

Until §5 step 3 builds an in-house target VM and step 4 re-admits
modules with `verified_against` recorded (§4.3), the production
catalog should consist of zero verified Tier-3 modules. That's the
state after this removal: the four remaining modules
(vsftpd_234_backdoor, distccd_command_exec, php_cgi_arg_injection,
unreal_ircd_3281_backdoor) are all `requires_bridge=true`, which the
fleet picker filters out unconditionally (the post-revert behavior
from commit 0390eb2). Net effect: production runs Tier-2 only,
producing honest Tier-2 episodes and zero dishonest Tier-3
infected_running labels.

Test fixture updated to inject synthetic in-memory ModuleConfigs
instead of loading from disk, so Tier-3 dispatch logic stays tested
even though no production module qualifies. test_exploits asserts
the new "every shipped module is requires_bridge until §4.3 admits
something verified" invariant — flips into a tripwire if anyone
reintroduces an unverified non-bridge module.
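
The invariant could be expressed roughly as follows. This is a sketch
only: the ModuleConfig below is a hypothetical stand-in, and treating
verified_against as an optional field on it is an assumption, not the
shipped schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModuleConfig:
    # Hypothetical stand-in; the real ModuleConfig carries more fields.
    name: str
    requires_bridge: bool
    verified_against: Optional[str] = None


def unverified_non_bridge(catalog: List[ModuleConfig]) -> List[str]:
    """Names of modules a SLIRP-only picker could select unverified."""
    return [m.name for m in catalog
            if not m.requires_bridge and m.verified_against is None]


shipped = [
    ModuleConfig("vsftpd_234_backdoor", requires_bridge=True),
    ModuleConfig("distccd_command_exec", requires_bridge=True),
    ModuleConfig("php_cgi_arg_injection", requires_bridge=True),
    ModuleConfig("unreal_ircd_3281_backdoor", requires_bridge=True),
]
# The tripwire: an empty list means no unverified module can be picked.
assert unverified_non_bridge(shipped) == []
```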

229 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 22:48:03 -05:00
Max Gorog
4ab5477226 PIPELINE §5 step 1: fix four root-cause defects
Diagnoses + fixes for the silent-collector / never-lands-session
failures that the 200-episode quality probe surfaced (§3 evidence).
All four address the producer; no compensating layers added.

perf collector (rows_perf=0 on 100% of episodes):
  - perf stat -j writes to stderr by default with -p; we read stdout.
    Add --log-fd 1 so JSON reaches stdout where the parser sees it.
  - Event names come back annotated with the privilege scope perf
    actually measured ("cycles:u" under perf_event_paranoid=2). Strip
    the suffix so _build_row's plain-name lookups hit. Without this
    every metric was None even when perf reported real numbers.
  - tests/test_collectors_emit.py covers the regression with a real
    busy-loop fixture; emit-test discipline per §4.4.
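
A sketch of the suffix stripping, under the assumption that only the
privilege/scope modifier letters (u, k, h, H, G) need handling. The
function name is illustrative and is not the collector's real helper:

```python
def strip_event_modifier(event: str) -> str:
    """Drop a trailing perf modifier such as ':u' from an event name."""
    name, sep, modifier = event.rpartition(":")
    # Strip only when the suffix consists solely of scope letters, so
    # tracepoint names like "sched:sched_switch" pass through unchanged.
    if sep and modifier and set(modifier) <= set("ukhHG"):
        return name
    return event


# Plain-name lookups succeed once the annotation is removed.
wanted = {"cycles", "instructions"}
assert strip_event_modifier("cycles:u") in wanted
```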

guest-agent collector (rows_guest=0 on 100% of episodes):
  - Alpine cloud image doesn't ship python3, so the in-guest agent's
    `#!/usr/bin/env python3` shebang silently fails. Add packages:
    [python3] to cidata user-data so cloud-init installs it before
    the OpenRC service starts.
  - Guest agent now exits nonzero (was: silent stdout fallback) when
    /dev/virtio-ports/cis490.guest.agent is missing, so OpenRC
    reports the failure to /var/log/cis490-agent.log instead of the
    bytes vanishing into the void. Refs §1.
  - Host-side collector emits guest_agent_connected /
    guest_agent_first_byte / guest_agent_silent_window into the
    orchestrator's events.jsonl. Future episodes show the in-guest
    failure mode per-episode instead of inferring from rows_guest=0.

k-gamingcom missing qmp/netflow/pcap (also affected elliott on
  Tier-3 episodes — was misclassified as host divergence):
  - tools/run_tier3_demo.py was building EpisodeConfig WITHOUT
    qmp_socket / guest_agent_socket / bridge_iface — even though
    launch_target.sh creates the underlying chardevs and BRIDGE
    supplies the iface. tools/run_real_vm_demo.py wires them
    correctly; Tier-3 had a copy-paste gap.
  - tests/test_collectors_emit.py adds a source-grep regression so
    the wiring stays honest.

samba_usermap_script never lands session (0/67 in §3 probe):
  - Bind handler default WfsDelay (~5s) gives up before bind_perl on
    Metasploitable2 has finished forking + binding LPORT under
    SLIRP+hostfwd. Bump to 30s; matches session_open_timeout_s in
    exploits/driver.py so framework + driver agree on the wait
    budget. Add ConnectTimeout=15 so the handler's bind connect has
    retry budget instead of one-shot.

orchestrator/fleet.py: usable_modules + BRIDGE handling were both
  unconditional, so:
  - With BRIDGE set, requires_bridge modules were still being
    dropped — picker only ever returned samba_usermap_script across
    every slot/episode (the test_fleet_uses_all_modules_when_bridge_set
    failure on HEAD).
  - env.pop("BRIDGE") fired even when BRIDGE was the operator's
    explicit setup, breaking modules that need bridge mode (vsftpd
    backdoor on hardcoded port 6200, distccd, etc.).
  Both made conditional on bridge_set so the picker walks the full
  catalog under bridge mode and SLIRP-only modules still get a
  clean SLIRP env when BRIDGE is unset.

receiver/app.py: half-pregnant v2 schema state in HEAD. app.py was
  calling store.ingest_stream(episode_type=..., benign_profile=...)
  with kwargs whose matching store.py change was still in the WIP
  stash. Removed v2 awareness from app.py so v1 episodes (what the
  producer ships today) get accepted again. SCHEMA_VERSION default
  reset to 1 to match.

229 passed, 0 failed. (HEAD had 15 failures, all linked to the
half-pregnant v2 state above.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:05:25 -05:00
667f042707 Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)
Root causes and fixes documented in TIER3-BRINGUP.md. Summary:

1. BRIDGE env var leaked into Tier-3 subprocess → target VM used tap
   instead of SLIRP; fix: env.pop("BRIDGE") in fleet _run_slot.

2. usable_modules filter conditioned on BRIDGE presence → bridge-requiring
   modules selected on SLIRP runs; fix: always filter requires_bridge.

3. cmd/unix/interact creates no session.list entry → session_open_timeout
   every episode; fix: switch samba_usermap_script to cmd/unix/bind_perl.

4. Per-slot LPORT hostfwd used wrong guest port (host:5444→guest:4444);
   fix: extra_host_port:extra_host_port mapping so guest binds the
   per-slot LPORT directly.

5. vsftpd backdoor port 6200 hardcoded → collision across concurrent slots;
   fix: requires_bridge=true filters it from SLIRP fleet runs.

6. SLIRP false-positive in _wait_for_tcp → exploit fires before Samba
   boots (~60 s too early); fix: replace TCP probe with serial console
   _wait_for_serial_login that waits for actual "login:" prompt.

7. Stale QEMU survives orchestrator restart (start_new_session=True) →
   holds hostfwd ports, new QEMU silently fails; fix: kill by pgid from
   old pidfile before rmtree.

8. PORT_BASE default used privileged port 21; fix: default to 2021+slot*100.

9. msfrpcd 6.x returns bytes for all string values even with raw=False;
   fix: MSFRpcClient._str() recursive decoder applied to all responses.
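
For item 9, a recursive decoder along these lines would do the job.
This is a standalone sketch, not the actual MSFRpcClient._str(); the
UTF-8 codec and errors="replace" are assumptions:

```python
from typing import Any


def decode_strs(value: Any) -> Any:
    """Recursively turn bytes into str inside dicts, lists, and tuples."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        # Keys can arrive as bytes too, so decode both sides.
        return {decode_strs(k): decode_strs(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [decode_strs(v) for v in value]
    # ints, bools, None, and already-decoded str pass through untouched.
    return value
```

A blanket decoder like this also converts genuinely binary fields, so
it only suits responses where every bytes value is meant to be text.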

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:26:19 -06:00
max
507eac617b Solvable Tier-3 holes: callback payloads, busybox workloads, bridge by default
Closes the next batch of issues from the post-mortem. The previous
"each run uses a different vulnerability" commit shipped 5 modules
but 3 of them couldn't actually fire under SLIRP+restrict=on:
their reverse-shell payloads needed a callback channel the launcher
didn't provide, AND their LHOST options were set to {{ target_ip }}
(the target's IP, not the attacker's — copy-paste from RHOSTS).
At the same time, the workloads.py shell commands used bash-only /dev/tcp
redirects that silently no-op'd in the busybox shell sessions
Metasploitable2 returns. Net effect: episodes that selected those
modules would have produced session_open_timeout + dead workloads.

Module configs (the three callback ones):
  exploits/modules/distccd_command_exec.toml
  exploits/modules/php_cgi_arg_injection.toml
  exploits/modules/unreal_ircd_3281_backdoor.toml
    - Switch payload from cmd/unix/reverse* to cmd/unix/bind_perl
      so the target listens on a known port; msfrpcd connects to it
      via the host's hostfwd (no callback path required).
    - Drop the bogus LHOST = "{{ target_ip }}" — bind shells don't
      use LHOST.
    - Add [runtime] table:
        requires_bridge = true
        extra_target_ports = [<bind_lport>]
      Both fields are honored by the loader (ModuleConfig.requires_bridge)
      and the launcher (TARGET_PORTS gets the extra port hostfwd'd
      when BRIDGE mode is active).

orchestrator/fleet.py
  When BRIDGE is unset in env, _run_slot filters the module catalog
  down to modules where requires_bridge=False before calling
  select_module. Two same-socket-shell modules (vsftpd_234_backdoor +
  samba_usermap_script) survive — fleet still has variety; just
  doesn't pick modules whose payloads can't land. With BRIDGE set,
  the full catalog rotates as before, AND BRIDGE is propagated to
  the per-slot subprocess env so launch_target.sh enters tap+bridge
  mode.
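
The filter and env handling above can be sketched as below. The
ModuleConfig stand-in and the slot_env helper are hypothetical and
exist only for illustration; only requires_bridge and BRIDGE come
from this change:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModuleConfig:
    # Hypothetical stand-in for the loader's ModuleConfig.
    name: str
    requires_bridge: bool = False


def usable_modules(catalog: List[ModuleConfig],
                   env: Dict[str, str]) -> List[ModuleConfig]:
    """Full catalog under bridge mode; same-socket modules otherwise."""
    if env.get("BRIDGE"):
        return list(catalog)
    return [m for m in catalog if not m.requires_bridge]


def slot_env(env: Dict[str, str]) -> Dict[str, str]:
    """Copy env for the per-slot subprocess, keeping BRIDGE only if set."""
    child = dict(env)
    if not child.get("BRIDGE"):
        # Unset or empty: drop it so the launcher stays in SLIRP mode.
        child.pop("BRIDGE", None)
    return child
```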

exploits/workloads.py
  Replaced bash-only constructs in three profiles:
    scan-and-dial  /dev/tcp/HOST/PORT redirects → nc -z -w 1
    bursty-c2      same fix
    shell-resident exec 3<>/dev/tcp/...  → piping into nc -w
  All three now run cleanly in busybox / dash / Metasploitable2's
  default shell. The remaining three profiles (cpu-saturate, io-walk,
  low-and-slow) were already busybox-portable.

scripts/install-lab-host.sh
  - lab-host.env now defaults BRIDGE=br-malware (was commented out).
    Operator opt-out is to comment the line out again.
  - New step 6b: provisions br-malware via vm/setup_bridge.sh AND
    pre-creates a per-slot tap pool (cis490tap0..7 for Tier-2 demo,
    cis490target0..7 for Tier-3 target) all attached to br-malware
    and brought up. Launchers reference these by SLOT — no sudo
    needed at episode time.
  - On bridge-setup failure, the script auto-comments BRIDGE in the
    env file with an "auto-disabled: bridge setup failed" note so
    the fleet falls back to same-socket modules + Tier-2 cleanly.

tools/cis490_doctor.py
  Three new checks for the lab-host role:
    bridge: br-malware exists / up
    tier3: msfrpcd listening on 127.0.0.1:55553
    tier3: module catalog parses (counts same-socket vs requires_bridge)
  All three are warn-level — they don't fail an otherwise-healthy
  Tier-2-only setup; they tell the operator what's missing for full
  Tier-3 + source 4 coverage.

Tests: 132 (was 129). New cases:
  test_fleet.py +3
    - fleet skips requires_bridge modules when BRIDGE unset (asserted
      across 20 episodes; never picks a callback module)
    - fleet uses the full catalog when BRIDGE is set
    - BRIDGE env propagates to per-slot subprocess

What's still untested live: the bind_perl payloads against a real
Metasploitable2 in the bridge-enabled launcher path. That's a
deployment validation, not a code change. The unit tests confirm
the dispatch / filter logic; the live test is the next operator
action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:32:52 -05:00
max
a193d17ead fleet: rotate exploit modules per (host, slot, ep); Tier 3 by default
Closes the "every run hits the same vulnerability" gap. Before this
commit, the fleet shipped Tier-2 episodes (no exploit at all) with
only the post-infection sample varying. Tier-3 had a single canned
module — vsftpd_234_backdoor — so even when exploit fire was
exercised, the entry vector never changed. The trainer would see one
shape of `armed → infecting` and learn nothing about how varied
real exploits look on the wire / in /proc.

What landed:

exploits/modules/
  + samba_usermap_script.toml          CVE-2007-2447, SMB:139
  + distccd_command_exec.toml          CVE-2004-2687, distcc:3632
  + php_cgi_arg_injection.toml         CVE-2012-1823, http:80
  + unreal_ircd_3281_backdoor.toml     CVE-2010-2075, ircd:6667
  (vsftpd_234_backdoor.toml unchanged)
  All five are canonical Metasploitable2 vectors with stable
  Metasploit modules. Each TOML carries the RPORT the launcher
  needs to wire its hostfwd at, plus a payload tuned to a clean
  shell session (cmd/unix/interact for in-band shells,
  cmd/unix/reverse* with deterministic LPORTs for reverse shells).

exploits/modules.py
  + select_module(catalog, host_id, slot, episode_index) — same
    SHA-256-keyed deterministic selection shape SampleManifest uses
    for samples. Two hosts at the same slot/episode hash to
    different modules; one host walks the full catalog within
    ~len(catalog) episodes.
  + module_target_port() — pulls RPORT off the module config so
    the fleet can plumb the launcher's hostfwd at the right service.
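
One way to get both properties (deterministic per key, full catalog
walk per host) is a SHA-256-derived offset plus the episode index.
This is a guess at the shape, not the shipped select_module: the key
format and the rotation step are assumptions, and a hash-derived
offset makes two hosts likely, not guaranteed, to differ.

```python
import hashlib
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_module(catalog: Sequence[T], host_id: str, slot: int,
                  episode_index: int) -> T:
    """Pick a module deterministically from (host_id, slot, episode_index)."""
    if not catalog:
        raise ValueError("module catalog is empty")
    # Hash (host, slot) into a starting offset, then rotate by episode so
    # one host visits every module within len(catalog) episodes.
    key = f"{host_id}:{slot}".encode("utf-8")
    offset = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return catalog[(offset + episode_index) % len(catalog)]
```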

orchestrator/fleet.py
  - _run_slot now decides Tier 3 vs Tier 2 from msfrpcd reachability
    + module-catalog populated. Default is Tier 3 when both are true;
    Tier 2 fallback when not (logged + recorded in SlotResult.tier
    so trainers can filter no-exploit episodes).
  - Per-slot module via select_module() — each concurrent slot in a
    wave gets a different vector AND a different sample.
  - PORT_BASE per slot (target_port + slot * 1000) so concurrent
    Tier-3 targets don't collide on the host-side hostfwd port.
  - _msfrpcd_available() probe gates the dispatch.
  - Fleet-side log line records (slot, ep, tier, sample, module,
    run_dir) so the operator can see at a glance what each wave is
    exercising.
  - SlotResult grows tier + module_name fields; FleetConfig grows
    modules + force_tier2 + msfrpcd_{host,port} fields.

orchestrator/episode.py
  + EpisodeConfig.exploit_meta — plain dict the runner stamps into
    meta.exploit so every Tier-3 episode records {framework,
    module path, module type, payload, RPORT, RHOSTS template}.
    Trainers join on meta.exploit.module_name to stratify by entry
    vector; meta.sample.name to stratify by post-infection family.

tools/run_tier3_demo.py
  + Builds exploit_meta from the loaded ModuleConfig and passes it
    to EpisodeConfig. Sample is now also passed (was missing).

tools/run_fleet.py
  + --modules-dir (default exploits/modules/) — load module catalog
    on startup; pass to FleetConfig.
  + --force-tier2 — escape hatch for dev / smoke tests.
  + JSON output now includes per-slot {tier, module} so the operator
    can see at a glance what each slot ran without grepping logs.

Tests: 129 (was 119). New cases:
  test_exploits.py +6
    - catalog has at least the five canonical Metasploitable2 vectors
    - select_module is deterministic per (host, slot, ep)
    - select_module diversifies across hosts
    - select_module walks the full catalog over many episodes
    - module_target_port pulls RPORT for each shipped TOML
  test_fleet.py +4
    - _run_slot dispatches to run_tier3_demo.py when msfrpcd up
    - falls back to run_real_vm_demo.py when msfrpcd unreachable
    - falls back when module catalog empty
    - --force-tier2 overrides msfrpcd availability
    - PORT_BASE is unique per concurrent slot (no hostfwd collision)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:22:49 -05:00
max
613c6fa223 Tier 3: msfrpc-driven exploit driver + first module config
Adds the Tier-3 exploit driver — an MSFExploitDriver that plugs into
EpisodeRunner.on_phase, fires a Metasploit module against a target VM
via msfrpcd, watches for the resulting session, and stamps each
transition (exploit_fire, session_open, session_landing_probe,
sample_executed, session_dormant, session_killed) into the episode's
events.jsonl on the orchestrator's monotonic clock.

What landed:
- exploits/msfrpc.py — minimal msgpack-over-HTTPS client (auth,
  module.execute, job/session lifecycle) so we don't depend on a
  third-party MSF wrapper.
- exploits/driver.py — phase-to-msfrpc adapter; idempotent fire,
  session-open polling with timeout, workload start/stop, teardown.
- exploits/modules.py + exploits/modules/vsftpd_234_backdoor.toml —
  TOML module configs with {{ target_ip }} placeholders, replacing the
  imperative .rc-script approach the README previously hinted at.
- vm/launch_target.sh — SLIRP+restrict=on launcher for the
  intentionally-vulnerable target VM (host can reach guest via
  hostfwd, guest cannot reach host or internet).
- tools/run_tier3_demo.py — end-to-end runner mirroring run_real_vm_demo.
- tests/test_exploits.py — 12 new tests against a fake MSFRpcClient,
  including an integration test that drives a real EpisodeRunner.

Plumbing changes:
- EpisodeRunner._emit_event → public emit_event, so external drivers
  share the runner's monotonic clock and events.jsonl.
- mkdir for episode_dir moved to __init__ so emit_event is callable
  before run() (driver_setup fires pre-schedule).

Status: driver + tests pass (40/40); end-to-end against a live msfrpcd
+ Metasploitable2 image is the next bring-up step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 23:11:52 -05:00