Tier-3 bring-up: 9 bugs fixed on elliott-ThinkPad (2026-05-01)
Root causes and fixes documented in TIER3-BRINGUP.md. Summary:
1. BRIDGE env var leaked into Tier-3 subprocess → target VM used tap
instead of SLIRP; fix: env.pop("BRIDGE") in fleet _run_slot.
2. usable_modules filter conditioned on BRIDGE presence → bridge-requiring
modules selected on SLIRP runs; fix: always filter requires_bridge.
3. cmd/unix/interact creates no session.list entry → session_open_timeout
every episode; fix: switch samba_usermap_script to cmd/unix/bind_perl.
4. Per-slot LPORT hostfwd used wrong guest port (host:5444→guest:4444);
fix: extra_host_port:extra_host_port mapping so guest binds the
per-slot LPORT directly.
5. vsftpd backdoor port 6200 hardcoded → collision across concurrent slots;
fix: requires_bridge=true filters it from SLIRP fleet runs.
6. SLIRP false-positive in _wait_for_tcp → exploit fires before Samba
boots (~60 s too early); fix: replace TCP probe with serial console
_wait_for_serial_login that waits for actual "login:" prompt.
7. Stale QEMU survives orchestrator restart (start_new_session=True) →
holds hostfwd ports, new QEMU silently fails; fix: kill by pgid from
old pidfile before rmtree.
8. PORT_BASE default used privileged port 21; fix: default to 2021+slot*100.
9. msfrpcd 6.x returns bytes for all string values even with raw=False;
fix: MSFRpcClient._str() recursive decoder applied to all responses.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 86bd9e21d7
commit 667f042707
14 changed files with 417 additions and 46 deletions
190
TIER3-BRINGUP.md
Normal file
@ -0,0 +1,190 @@
# Tier-3 Bring-up Bug Report — elliott-ThinkPad (2026-05-01)

Bugs found and fixed during the first real-exploit fleet run on this host.
All fixes are in the commits following the `Dev_REL1_043026` merge of main.

---

## Bug 1 — BRIDGE env var breaks Tier-3 target VM networking

**Symptom:** All Tier-3 slots time out at 300 s waiting for the target
service. QEMU starts with `netdev tap` instead of `netdev user` (SLIRP).

**Root cause:** `launch_target.sh` checks `BRIDGE` to switch between SLIRP
and tap networking. The fleet runner copied the parent environment (which had
`BRIDGE=br-malware` from the Tier-2 tap setup) into the Tier-3 subprocess.
The Tier-3 target VMs don't have a tap interface configured, so all guest
traffic is dropped.

**Fix:** `fleet.py` `_run_slot()` now calls `env.pop("BRIDGE", None)` before
launching `run_tier3_demo.py`. Tier-2 idle VMs continue to use tap; Tier-3
target VMs always use SLIRP+hostfwd.

**Files:** `orchestrator/fleet.py`
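
A minimal sketch of the environment scrubbing described in this fix, assuming the child process is launched with a copy of the parent environment. The helper name `tier3_child_env` and the sample values are hypothetical; only the `env.pop("BRIDGE", None)` call comes from the fix itself.

```python
from __future__ import annotations


def tier3_child_env(parent_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of parent_env that is safe to pass to a Tier-3 subprocess."""
    env = dict(parent_env)
    # launch_target.sh switches to tap networking whenever BRIDGE is set,
    # so drop it. The None default makes this a no-op when it is absent.
    env.pop("BRIDGE", None)
    return env


if __name__ == "__main__":
    # Illustrative values only.
    parent = {"BRIDGE": "br-malware", "FLEET_HOST_ID": "demo-host"}
    print(tier3_child_env(parent))
```

Working on a copy keeps `BRIDGE` available to the Tier-2 path in the parent process while the Tier-3 child never sees it.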

---

## Bug 2 — Bridge-requiring modules selected when BRIDGE is not available

**Symptom:** `distccd_command_exec` and `php_cgi_arg_injection` appear in
`usable_modules` even on SLIRP-only runs. The exploit fires but the
reverse-shell payload can't call back (no guest egress on `restrict=on`).

**Root cause:** `usable_modules` filtering was conditioned on `bridge_iface`
being set in the environment: the `requires_bridge` filter ran only when
BRIDGE was unset. When BRIDGE was set (as it is on this host for the Tier-2
tap setup), ALL modules were considered usable, even though Tier-3 targets
run on SLIRP. Modules that require bridge egress (reverse shells) silently
fell through, fired, and timed out waiting for a session.

**Fix:** `usable_modules` now always filters out `requires_bridge=True`
modules regardless of the BRIDGE env var. The `requires_bridge` field in the
module TOML is authoritative.

**Files:** `orchestrator/fleet.py`, `exploits/modules/*.toml`
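
The unconditional filter can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `ModuleConfig` here is a cut-down stand-in with only the fields the filter needs, and the `requires_bridge` values in the sample catalog are assumed from the descriptions in this report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleConfig:
    """Stand-in for the real ModuleConfig; only the fields used here."""
    name: str
    requires_bridge: bool = False


def usable_modules(catalog: dict[str, ModuleConfig]) -> dict[str, ModuleConfig]:
    """Keep only modules that can fire against a SLIRP+hostfwd target."""
    # No BRIDGE check: the requires_bridge flag from the TOML is the only input.
    return {k: v for k, v in catalog.items() if not v.requires_bridge}


if __name__ == "__main__":
    catalog = {
        "samba_usermap_script": ModuleConfig("samba_usermap_script"),
        "vsftpd_234_backdoor": ModuleConfig("vsftpd_234_backdoor", requires_bridge=True),
        "distccd_command_exec": ModuleConfig("distccd_command_exec", requires_bridge=True),
    }
    print(sorted(usable_modules(catalog)))
```

Because the function never reads the environment, a leftover `BRIDGE` variable cannot re-enable a module that needs bridge egress.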

---

## Bug 3 — `cmd/unix/interact` creates no persistent session

**Symptom:** `samba_usermap_script` fires (job_id=None), but no session
appears in `session.list` after 30 s. The exploit succeeds on the wire but
the driver reports `session_open_timeout`.

**Root cause:** `cmd/unix/interact` is a console-only payload. It attaches
directly to the module's job console and does NOT create a background
Meterpreter/shell session visible via `session.list`. msfrpcd's
`module.execute` returns `job_id=None` (no background job), and
`wait_for_new_session` polls until its timeout expires without ever seeing a
new session.

**Fix:** Changed payload to `cmd/unix/bind_perl` with `LPORT=4444`. The
bind-shell payload instructs the guest to listen on LPORT; msfrpcd connects
to `RHOSTS:LPORT` after the exploit fires, creating a proper shell session.

**Files:** `exploits/modules/samba_usermap_script.toml`

---

## Bug 4 — Per-slot LPORT/hostfwd port mapping wrong

**Symptom:** For slots 1+, the bind-shell port is reachable on the host but
msfrpcd cannot connect. `ss -tlnp` on the host shows port 5444 listening
(QEMU) but the module tries to connect to port 4444.

**Root cause:** The extra hostfwd was `host:5444→guest:4444` (old guest port)
but `FLEET_PAYLOAD_LPORT=5444` instructed the guest `bind_perl` to listen on
5444. Mismatch: the guest binds 5444 while hostfwd forwards
host:5444→guest:4444, so there is no path to the listener.

**Fix:** Extra hostfwd now uses `extra_host_port:extra_host_port` on both
sides. `extra_host_port = extra_guest_port + slot * 1000` (the module's base
LPORT plus a per-slot offset) is the per-slot LPORT, and the guest binds that
exact port.

**Files:** `orchestrator/fleet.py`
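
The corrected mapping can be sketched like this. The function name `target_ports_for_slot` is hypothetical; the arithmetic (`guest_port + 2000 + slot * 1000` for the service port, `extra_guest_port + slot * 1000` on both sides for bind ports) follows the `fleet.py` change in this commit.

```python
from __future__ import annotations


def target_ports_for_slot(
    guest_port: int, extra_guest_ports: tuple[int, ...], slot: int
) -> tuple[str, int | None]:
    """Return (TARGET_PORTS string, per-slot LPORT or None)."""
    # Main service: unprivileged host port forwarded to the guest service port.
    host_port = guest_port + 2000 + slot * 1000
    pairs = [f"{host_port}:{guest_port}"]
    lport = None
    for extra in extra_guest_ports:
        # Bind-shell port: the guest listens on the per-slot LPORT itself,
        # so the same number appears on both sides of the hostfwd pair.
        lport = extra + slot * 1000
        pairs.append(f"{lport}:{lport}")
    return ",".join(pairs), lport


if __name__ == "__main__":
    for slot in range(2):
        print(slot, target_ports_for_slot(139, (4444,), slot))
```

For slot 1 with Samba on guest port 139 and a base LPORT of 4444, this yields the pair `5444:5444` rather than the broken `5444:4444`.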

---

## Bug 5 — vsftpd module port 6200 collision across concurrent slots

**Symptom:** Multiple Tier-3 slots running `vsftpd_234_backdoor` all try to
hostfwd port 6200 (the backdoor bind port). QEMU for slots 1+ fails to start
because port 6200 is already bound by slot 0's QEMU.

**Root cause:** vsftpd's backdoor hardcodes port 6200 in both the vulnerable
binary and the Metasploit module. There is no LPORT override possible. With
SLIRP+hostfwd, every concurrent slot would need to forward the same host
port, which is impossible.

**Fix:** Marked `vsftpd_234_backdoor.toml` with `requires_bridge = true`. The
fleet runner filters it from `usable_modules` on SLIRP runs. When a bridge is
available each guest gets its own IP, and msfrpcd connects to `guest_ip:6200`
directly.

**Files:** `exploits/modules/vsftpd_234_backdoor.toml`

---

## Bug 6 — SLIRP false-positive in `_wait_for_tcp` causes premature exploit fire

**Symptom:** Log shows "target service is up" within 0.5 s of QEMU start. The
exploit fires at t=10 s (end of clean phase) but Metasploitable2 needs 30–60 s
to boot Samba. Result: `session_open_timeout` every episode.

**Root cause:** SLIRP's user-mode TCP stack completes the TCP three-way
handshake immediately for any port that has a `hostfwd` rule, regardless of
whether the guest OS has booted, so a bare `socket.create_connection()`
always succeeds. Following it with a `recv()` on a short timeout (0.5 s) does
not help: during very early boot the guest TCP stack is not up yet, so SLIRP
cannot reset the connection. The connection hangs open, `recv()` raises
`socket.timeout`, and the probe treats that as "service up".

**Fix:** Replaced `_wait_for_tcp` with `_wait_for_serial_login`. The new
function connects to QEMU's serial console socket (`serial.sock`) right after
the pidfile appears and streams boot output until `"login:"` is seen. The
serial console is authoritative: it reflects actual guest OS state, not
SLIRP's synthetic TCP layer.

Timing:

- `serial.sock` is created by QEMU at device init, before the pidfile.
- We connect immediately after the pidfile → we receive all boot output.
- Metasploitable2 prints `"metasploitable login:"` ≈ 50–70 s after QEMU start.
- The clean phase (10 s) runs AFTER the login prompt, so the exploit fires
  when Samba is reliably up.

**Files:** `tools/run_tier3_demo.py`
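
The core of the serial-console wait is an accumulate-and-search loop. The sketch below is a simplified illustration, not the function from this commit: `wait_for_prompt` is a hypothetical name, and `socket.socketpair()` stands in for QEMU's `serial.sock` so the example runs without a VM.

```python
from __future__ import annotations

import socket
import time


def wait_for_prompt(
    sock: socket.socket, timeout_s: float, prompt: bytes = b"login:"
) -> bool:
    """Read from sock until prompt appears (case-insensitive) or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    buf = b""
    sock.settimeout(0.2)
    while time.monotonic() < deadline:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            continue  # no output yet; keep waiting until the deadline
        if not chunk:
            return False  # peer closed before printing the prompt
        # Search the accumulated buffer, not just this chunk, so a prompt
        # split across two recv() calls is still found.
        buf += chunk
        if prompt in buf.lower():
            return True
    return False


if __name__ == "__main__":
    console, guest = socket.socketpair()
    guest.sendall(b"Starting Samba...\nmetasploitable log")
    guest.sendall(b"in: ")
    print(wait_for_prompt(console, timeout_s=2.0))
    console.close()
    guest.close()
```

Unlike the TCP probe, a read timeout here means "keep waiting" rather than "service up"; only the prompt text itself counts as readiness.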

---

## Bug 7 — Stale QEMU processes hold hostfwd ports across orchestrator restarts

**Symptom:** After a systemd restart of `cis490-orchestrator`, the new wave's
QEMU processes fail to bind their hostfwd ports (e.g., 2139). The new episode
detects the stale QEMU answering the port probe and proceeds as if the target
is up, but the stale QEMU has different hostfwd mappings (no bind port for
the current module), so the exploit never lands.

**Root cause:** QEMU is started with `start_new_session=True`, so it survives
the orchestrator's SIGTERM. The old QEMU from the previous wave is still
running and still holds the hostfwd ports.

**Fix:** `run_tier3_demo.py` reads the old `qemu.pid` file from the run
directory before removing and recreating that directory. If a PID is found,
`os.killpg(os.getpgid(pid), SIGTERM)` terminates the old QEMU process group,
followed by a 1.5 s sleep to let QEMU exit before the port is rebound.

**Files:** `tools/run_tier3_demo.py`
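
A hedged sketch of the pidfile cleanup, assuming a POSIX host. The function name `kill_stale_qemu` and the `pid <= 1` guard are additions for illustration and are not part of the committed code; the `os.killpg(os.getpgid(pid), SIGTERM)` call and the 1.5 s grace period come from the fix.

```python
from __future__ import annotations

import os
import signal
import time
from pathlib import Path


def kill_stale_qemu(pid_file: Path, grace_s: float = 1.5) -> bool:
    """SIGTERM the process group recorded in pid_file.

    Returns True if a signal was sent, False if there was nothing to kill.
    """
    try:
        pid = int(pid_file.read_text().strip())
        if pid <= 1:
            # Extra guard (assumption): getpgid(0) is our own group and
            # pid 1 is init, so never signal either of them.
            return False
        os.killpg(os.getpgid(pid), signal.SIGTERM)
    except (FileNotFoundError, ValueError, ProcessLookupError, PermissionError):
        return False
    # Give QEMU time to exit and release its hostfwd ports.
    time.sleep(grace_s)
    return True
```

A caller would run this on `run_dir / "qemu.pid"` before deleting the run directory. One caveat: a PID read from an old pidfile may have been reused by an unrelated process, so a stricter version might first check that the process is actually QEMU.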

---

## Bug 8 — `PORT_BASE` default uses privileged ports (< 1024)

**Symptom:** `launch_target.sh`'s default `PORT_BASE` was `21 + SLOT * 100`.
On Tier-2 hosts without Metasploitable2, standalone `run_tier3_demo.py` tries
to bind port 21 on loopback. The `cis490` service user cannot bind ports
< 1024, so QEMU exits immediately.

**Fix:** Default changed to `2021 + SLOT * 100`. Port 2021 is above 1024 and
matches the fleet runner's scheme for slot 0 (guest port + 2000).

**Files:** `vm/launch_target.sh`, `scripts/install-tier-3-4.sh`
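
The two port schemes can be compared with a short sketch. The function names are hypothetical; the formulas are taken from `launch_target.sh` and `fleet.py` as changed in this commit.

```python
# Ports below this need elevated privileges by default on Linux.
PRIVILEGED_PORT_LIMIT = 1024


def default_port_base(slot: int) -> int:
    """Standalone default from launch_target.sh: 2021 + SLOT * 100."""
    return 2021 + slot * 100


def fleet_host_port(guest_port: int, slot: int) -> int:
    """Fleet runner scheme: guest port + 2000 + slot * 1000."""
    return guest_port + 2000 + slot * 1000


if __name__ == "__main__":
    for slot in range(3):
        print(slot, default_port_base(slot), fleet_host_port(21, slot))
```

The two schemes agree for slot 0 only: the standalone default steps by 100 per slot, while the fleet runner steps by 1000. Both stay above the privileged range for every guest port listed in the `fleet.py` comment.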

---

## Bug 9 — msfrpc `module.execute` response is raw msgpack bytes, not str

**Symptom:** Key lookups on the `module.execute` response raise `KeyError`
or fail silently because msgpack returns `bin` type (bytes) for all string
values, even with `raw=False`, on some Metasploit 6.x builds.

**Fix:** Added `MSFRpcClient._str()` to recursively decode bytes→str in
msgpack responses. It is applied in `_raw_call`, so every response
(including `module.execute` and `session.list`) is normalized.

**Files:** `exploits/msfrpc.py`
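
A standalone sketch of the recursive decoder, mirroring the shape of `MSFRpcClient._str()` in this commit. The free-function name `normalize` and the sample response are illustrative; the sample is not captured msfrpcd output.

```python
from __future__ import annotations

from typing import Any


def normalize(v: Any) -> Any:
    """Decode bytes to str, recursing into dict keys/values and list items."""
    if isinstance(v, bytes):
        # errors="replace" keeps decoding from raising on non-UTF-8 bytes.
        return v.decode("utf-8", errors="replace")
    if isinstance(v, dict):
        return {normalize(k): normalize(val) for k, val in v.items()}
    if isinstance(v, list):
        return [normalize(i) for i in v]
    return v  # ints, None, str, etc. pass through unchanged


if __name__ == "__main__":
    raw = {b"job_id": 3, b"uuid": b"abc123", b"tags": [b"shell", 1]}
    resp = normalize(raw)
    print(resp["job_id"], resp["uuid"], resp["tags"])
```

Normalizing once at the transport layer means callers can use plain `str` keys such as `resp["job_id"]` without caring which msfrpcd build produced the response.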

---

## Net result after all fixes

With fixes 1–9 applied:

- All 4 Tier-3 slots use SLIRP+hostfwd with correct per-slot port mapping.
- `samba_usermap_script` fires `cmd/unix/bind_perl` with the correct per-slot
  LPORT; msfrpcd connects to the bind port via hostfwd.
- The exploit fires only after Metasploitable2 prints its login prompt on
  the serial console (~60 s after QEMU start).
- Sessions open, workloads execute, and episodes complete with `session_open`
  events (not `session_open_timeout`).

@ -14,6 +14,9 @@ WorkingDirectory=/opt/cis490
|
|||
# /etc/cis490/lab-host.env is written by scripts/install-lab-host.sh;
|
||||
# carries FLEET_HOST_ID, BRIDGE, and any operator-supplied overrides.
|
||||
EnvironmentFile=/etc/cis490/lab-host.env
|
||||
# msfrpc credentials (written by install-msfrpcd.sh). Optional (-) so the
|
||||
# unit still starts on Tier-2-only hosts where msfrpcd isn't installed.
|
||||
EnvironmentFile=-/etc/cis490/msfrpc.env
|
||||
# Fleet mode: detect host capacity, run that many concurrent episodes
|
||||
# per wave with samples drawn from the manifest. Each invocation runs
|
||||
# one wave and exits; systemd respawns per Restart= below, giving us
|
||||
|
|
@ -22,7 +25,8 @@ EnvironmentFile=/etc/cis490/lab-host.env
|
|||
ExecStart=/opt/cis490/.venv/bin/python /opt/cis490/tools/run_fleet.py \
|
||||
--data-root /var/lib/cis490/data \
|
||||
--manifest /opt/cis490/samples/manifest.toml \
|
||||
--waves 1
|
||||
--waves 1 \
|
||||
--max-tier3-slots 4
|
||||
Restart=always
|
||||
RestartSec=15
|
||||
|
||||
|
|
|
|||
|
|
@ -27,6 +27,7 @@ adapter between the phase machine and msfrpc.
|
|||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
from dataclasses import dataclass
|
||||
from typing import Callable
|
||||
|
|
@ -52,6 +53,10 @@ EmitEvent = Callable[..., None]
|
|||
class DriverConfig:
|
||||
target_ip: str
|
||||
session_open_timeout_s: float = 30.0
|
||||
# HOST_PORT for the module's service. When set, overrides RPORT in the
|
||||
# module's options so msfrpcd connects to the hostfwd'd loopback port
|
||||
# rather than the guest's privileged port directly.
|
||||
target_port: int | None = None
|
||||
# Driver v1 fallback workload — used only when no Sample is passed
|
||||
# in (Sample-driven runs override these via exploits.workloads).
|
||||
# We keep the v1 path so existing callers keep working unchanged.
|
||||
|
|
@ -185,6 +190,15 @@ class MSFExploitDriver:
|
|||
log.debug("module already fired; skipping re-fire")
|
||||
return
|
||||
opts = self.module.render_options(target_ip=self.cfg.target_ip)
|
||||
if self.cfg.target_port is not None:
|
||||
opts["RPORT"] = self.cfg.target_port
|
||||
# Fleet sets FLEET_PAYLOAD_LPORT to the per-slot host port for
|
||||
# bind-shell payloads (cmd/unix/bind_perl etc.) so the handler
|
||||
# connects to the right hostfwd'd loopback port.
|
||||
fleet_lport = os.environ.get("FLEET_PAYLOAD_LPORT")
|
||||
if fleet_lport and "LPORT" in opts:
|
||||
opts["LPORT"] = int(fleet_lport)
|
||||
log.info("LPORT overridden to %s (FLEET_PAYLOAD_LPORT)", fleet_lport)
|
||||
self.emit(
|
||||
"exploit_fire",
|
||||
module=self.module.module_path,
|
||||
|
|
|
|||
|
|
@ -45,6 +45,11 @@ class ModuleConfig:
|
|||
# The fleet runner skips these unless BRIDGE is set so episodes
|
||||
# that fire them actually produce data.
|
||||
requires_bridge: bool = False
|
||||
# Guest ports the fleet must also hostfwd (in addition to RPORT).
|
||||
# Used for bind-shell payloads where the handler connects to a
|
||||
# separate port. Fleet calculates per-slot host ports and sets
|
||||
# FLEET_PAYLOAD_LPORT so the driver can override LPORT at fire time.
|
||||
extra_target_ports: tuple[int, ...] = ()
|
||||
|
||||
def render_options(self, *, target_ip: str) -> dict[str, Any]:
|
||||
"""Substitute ``{{ target_ip }}`` placeholders in options.
|
||||
|
|
@ -99,6 +104,9 @@ def load_module_config(path: Path) -> ModuleConfig:
|
|||
expected_session_type=raw.get("session", {}).get("type", "shell"),
|
||||
description=raw.get("description", ""),
|
||||
requires_bridge=bool(raw.get("runtime", {}).get("requires_bridge", False)),
|
||||
extra_target_ports=tuple(
|
||||
int(p) for p in raw.get("runtime", {}).get("extra_target_ports", [])
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -2,8 +2,10 @@ description = """
|
|||
Samba 3.0.20 username-map command injection (CVE-2007-2447). Trigger
|
||||
is a crafted username at SMB authentication; the Samba daemon shells
|
||||
out via the username_map_script and runs whatever the attacker put in
|
||||
the username. Standard Metasploitable2 vector. Returns a root shell
|
||||
on the SMB socket — works with cmd/unix/interact.
|
||||
the username. Standard Metasploitable2 vector. Uses a bind-perl
|
||||
payload so msfrpcd can connect to the resulting shell via SLIRP
|
||||
hostfwd; LPORT is fleet-assigned per slot (base 4444, +1000/slot)
|
||||
to avoid collisions across concurrent episodes.
|
||||
"""
|
||||
|
||||
[module]
|
||||
|
|
@ -15,7 +17,16 @@ RHOSTS = "{{ target_ip }}"
|
|||
RPORT = 139
|
||||
|
||||
[payload]
|
||||
path = "cmd/unix/interact"
|
||||
path = "cmd/unix/bind_perl"
|
||||
|
||||
[payload.options]
|
||||
LPORT = 4444
|
||||
|
||||
[session]
|
||||
type = "shell"
|
||||
|
||||
[runtime]
|
||||
# bind_perl opens a new guest port; fleet hostfwds it via SLIRP.
|
||||
# No bridge egress needed — host connects in, not guest out.
|
||||
requires_bridge = false
|
||||
extra_target_ports = [4444]
|
||||
|
|
|
|||
|
|
@ -1,8 +1,14 @@
|
|||
description = """
|
||||
vsftpd 2.3.4 intentional backdoor (CVE-2011-2523). Triggered by an FTP
|
||||
USER name ending with ':)'. Standard Metasploitable2 exploit, fully
|
||||
deterministic — perfect for a Tier-3 first-light run because the
|
||||
exploit fire timing is bounded by a single FTP round-trip.
|
||||
deterministic — perfect for a Tier-3 first-light run.
|
||||
|
||||
NOTE: The backdoor binds a shell on port 6200 (hardcoded in both the
|
||||
vulnerable vsftpd binary AND the Metasploit module — not overridable).
|
||||
msfrpcd connects to RHOSTS:6200 after triggering the backdoor. With
|
||||
SLIRP+restrict=on and multiple concurrent slots, port 6200 can only be
|
||||
hostfwd'd once, causing collisions. Requires BRIDGE so the exploit
|
||||
handler can reach guest:6200 directly via the bridge IP.
|
||||
"""
|
||||
|
||||
[module]
|
||||
|
|
@ -12,12 +18,14 @@ path = "unix/ftp/vsftpd_234_backdoor"
|
|||
[module.options]
|
||||
RHOSTS = "{{ target_ip }}"
|
||||
RPORT = 21
|
||||
# The exploit returns its own command shell — we drive it with a
|
||||
# minimal cmd/unix/interact payload so the session lands as a plain
|
||||
# shell session usable by session.shell_write/read.
|
||||
|
||||
[payload]
|
||||
path = "cmd/unix/interact"
|
||||
|
||||
[session]
|
||||
type = "shell"
|
||||
|
||||
[runtime]
|
||||
# Port 6200 (backdoor bind) is hardcoded; can't offset per-slot.
|
||||
# Requires bridge so all concurrent slots get distinct guest IPs.
|
||||
requires_bridge = true
|
||||
|
|
|
|||
|
|
@ -104,8 +104,8 @@ class MSFRpcClient:
|
|||
if "job_id" not in resp:
|
||||
raise MSFRpcError(f"module.execute returned no job_id: {resp!r}")
|
||||
log.info(
|
||||
"module.execute %s/%s -> job_id=%s uuid=%s",
|
||||
module_type, module_name, resp["job_id"], resp.get("uuid"),
|
||||
"module.execute %s/%s -> job_id=%s uuid=%s resp=%r",
|
||||
module_type, module_name, resp["job_id"], resp.get("uuid"), resp,
|
||||
)
|
||||
return resp
|
||||
|
||||
|
|
@ -154,6 +154,22 @@ class MSFRpcClient:
|
|||
def _call_no_auth(self, method: str, *args: Any) -> dict[str, Any]:
|
||||
return self._raw_call([method, *args])
|
||||
|
||||
@staticmethod
|
||||
def _str(v: Any) -> Any:
|
||||
"""Decode bytes to str; recursively normalize dicts and lists.
|
||||
|
||||
msfrpcd (pacman metasploit 6.x) returns msgpack bin type for all
|
||||
string values, so raw=False still gives bytes. Normalise the whole
|
||||
response tree so callers can use plain str keys/values.
|
||||
"""
|
||||
if isinstance(v, bytes):
|
||||
return v.decode("utf-8", errors="replace")
|
||||
if isinstance(v, dict):
|
||||
return {MSFRpcClient._str(k): MSFRpcClient._str(val) for k, val in v.items()}
|
||||
if isinstance(v, list):
|
||||
return [MSFRpcClient._str(i) for i in v]
|
||||
return v
|
||||
|
||||
def _raw_call(self, payload: list[Any]) -> dict[str, Any]:
|
||||
body = msgpack.packb(payload, use_bin_type=False)
|
||||
conn = self._open_conn()
|
||||
|
|
@ -180,7 +196,7 @@ class MSFRpcClient:
|
|||
conn.close()
|
||||
|
||||
try:
|
||||
decoded = msgpack.unpackb(raw, raw=False)
|
||||
decoded = self._str(msgpack.unpackb(raw, raw=False))
|
||||
except Exception as e:
|
||||
raise MSFRpcError(f"could not decode msfrpcd response: {e}") from e
|
||||
|
||||
|
|
@ -221,11 +237,18 @@ def wait_for_new_session(
|
|||
) -> tuple[int, dict[str, Any]] | None:
|
||||
"""Poll ``session.list`` until a session id we haven't seen before
|
||||
appears, or until timeout. Returns ``(session_id, info)`` or None."""
|
||||
log = __import__("logging").getLogger("cis490.msfrpc")
|
||||
deadline = time.monotonic() + timeout_s
|
||||
logged_empty = False
|
||||
while time.monotonic() < deadline:
|
||||
sessions = client.session_list()
|
||||
if not logged_empty:
|
||||
log.debug("wait_for_new_session: seen=%r current=%r", seen, list(sessions.keys()))
|
||||
logged_empty = True
|
||||
for sid, info in sessions.items():
|
||||
if sid not in seen:
|
||||
return sid, info
|
||||
time.sleep(poll_s)
|
||||
# Log final state on timeout
|
||||
log.debug("wait_for_new_session timeout: final sessions=%r", client.session_list())
|
||||
return None
|
||||
|
|
|
|||
|
|
@ -109,6 +109,12 @@ class FleetConfig:
|
|||
# Force Tier-2 even when msfrpcd is up; used by tests + dev runs
|
||||
# that want a no-exploit baseline.
|
||||
force_tier2: bool = False
|
||||
# Limit how many slots per wave run as Tier-3. Slots 0..N-1 get
|
||||
# Tier-3; the rest fall back to Tier-2. Metasploitable2 boot is IO-
|
||||
# bound: running >~6 concurrent target VMs saturates disk and causes
|
||||
# all slots to timeout waiting for the guest service to come up.
|
||||
# None = no cap (all eligible slots use Tier-3).
|
||||
max_tier3_slots: int | None = None
|
||||
# msfrpcd connectivity (read by tier-3 driver via env).
|
||||
msfrpcd_host: str = "127.0.0.1"
|
||||
msfrpcd_port: int = 55553
|
||||
|
|
@ -237,26 +243,19 @@ def _run_slot(
|
|||
run_dir_base = "/tmp/cis490-vm-fleet"
|
||||
|
||||
# Decide tier.
|
||||
bridge_iface = os.environ.get("BRIDGE") or None
|
||||
# Filter the catalog to modules that can actually fire under the
|
||||
# current launcher mode. Reverse / bind shells require the host-
|
||||
# only bridge (no SLIRP+restrict=on guest egress), so skip those
|
||||
# when BRIDGE isn't set; otherwise the exploit fires but the
|
||||
# session never lands and the episode degenerates to a 30 s
|
||||
# session_open_timeout.
|
||||
if cfg.modules:
|
||||
if bridge_iface:
|
||||
usable_modules = dict(cfg.modules)
|
||||
else:
|
||||
usable_modules = {
|
||||
k: v for k, v in cfg.modules.items() if not v.requires_bridge
|
||||
}
|
||||
else:
|
||||
usable_modules = {}
|
||||
# Tier-3 target VMs always use SLIRP+hostfwd so msfrpcd can reach
|
||||
# the guest via loopback. BRIDGE tap is for the Tier-2 idle VM only
|
||||
# (pcap source 4). Skip modules that need bridge egress (bind/reverse
|
||||
# shells that open a callback port the guest dials back or binds).
|
||||
usable_modules: dict[str, ModuleConfig] = (
|
||||
{k: v for k, v in cfg.modules.items() if not v.requires_bridge}
|
||||
if cfg.modules else {}
|
||||
)
|
||||
tier3_ready = (
|
||||
not cfg.force_tier2
|
||||
and bool(usable_modules)
|
||||
and _msfrpcd_available(cfg.msfrpcd_host, cfg.msfrpcd_port)
|
||||
and (cfg.max_tier3_slots is None or slot < cfg.max_tier3_slots)
|
||||
)
|
||||
|
||||
env = os.environ.copy()
|
||||
|
|
@ -280,15 +279,33 @@ def _run_slot(
|
|||
usable_modules,
|
||||
host_id=cfg.host_id, slot=slot, episode_index=episode_index,
|
||||
)
|
||||
target_port = module_target_port(module) or 21
|
||||
guest_port = module_target_port(module) or 21
|
||||
# HOST_PORT: unprivileged port QEMU hostfwd's to the guest service.
|
||||
# +2000 shifts all base ports above 1024 (vsftpd:21->2021,
|
||||
# http:80->2080, smb:139->2139, distcc:3632->5632, irc:6667->8667).
|
||||
# Slot offset prevents concurrent targets from colliding on loopback.
|
||||
host_port = guest_port + 2000 + slot * 1000
|
||||
# Per-slot runner dir for the target VM.
|
||||
run_dir = f"{run_dir_base}-target-{slot}"
|
||||
env["RUN_DIR"] = run_dir
|
||||
# Each slot gets a unique host-side hostfwd port so concurrent
|
||||
# targets don't collide on the loopback port.
|
||||
env["PORT_BASE"] = str(target_port + slot * 1000)
|
||||
if bridge_iface:
|
||||
env["BRIDGE"] = bridge_iface
|
||||
env["PORT_BASE"] = str(host_port)
|
||||
# Main service port pair, plus per-slot bind ports for payloads
|
||||
# like cmd/unix/bind_perl that open a separate listener in the guest.
|
||||
# Per-slot offset (base + slot*1000) prevents collisions.
|
||||
target_ports = f"{host_port}:{guest_port}"
|
||||
for extra_guest_port in module.extra_target_ports:
|
||||
# Per-slot LPORT: base + slot*1000. FLEET_PAYLOAD_LPORT overrides
|
||||
# the payload's LPORT so the guest binds this exact port. The
|
||||
# hostfwd maps the same number on both sides because the guest's
|
||||
# bind port equals the per-slot LPORT (not the module's base LPORT).
|
||||
extra_host_port = extra_guest_port + slot * 1000
|
||||
target_ports += f",{extra_host_port}:{extra_host_port}"
|
||||
env["FLEET_PAYLOAD_LPORT"] = str(extra_host_port)
|
||||
env["TARGET_PORTS"] = target_ports
|
||||
# Remove BRIDGE so launch_target.sh uses SLIRP+hostfwd instead of
|
||||
# tap. Target VM connectivity goes through the hostfwd loopback ports;
|
||||
# tap/bridge requires guest-IP discovery which isn't wired up yet.
|
||||
env.pop("BRIDGE", None)
|
||||
cmd = [
|
||||
py,
|
||||
str(cfg.repo_root / "tools" / "run_tier3_demo.py"),
|
||||
|
|
@ -296,7 +313,8 @@ def _run_slot(
|
|||
"--run-dir", run_dir,
|
||||
"--module", module.name,
|
||||
"--sample", sample.name,
|
||||
"--target-port", str(target_port + slot * 1000),
|
||||
"--target-port", str(host_port),
|
||||
"--target-boot-timeout", "300",
|
||||
]
|
||||
tier = "tier3"
|
||||
module_name: str | None = module.name
|
||||
|
|
@ -314,6 +332,10 @@ def _run_slot(
|
|||
module_name = None
|
||||
if not cfg.force_tier2 and not cfg.modules:
|
||||
log.warning("slot=%d falling back to Tier 2: empty module catalog", slot)
|
||||
elif not cfg.force_tier2 and not usable_modules:
|
||||
log.warning("slot=%d falling back to Tier 2: no non-bridge modules available", slot)
|
||||
elif not cfg.force_tier2 and cfg.max_tier3_slots is not None and slot >= cfg.max_tier3_slots:
|
||||
log.debug("slot=%d Tier 2 by max_tier3_slots=%d cap", slot, cfg.max_tier3_slots)
|
||||
elif not cfg.force_tier2:
|
||||
log.warning("slot=%d falling back to Tier 2: msfrpcd unreachable at %s:%d",
|
||||
slot, cfg.msfrpcd_host, cfg.msfrpcd_port)
|
||||
|
|
|
|||
|
|
@ -244,6 +244,8 @@ fi
|
|||
install -d -o "$SERVICE_USER" -g "$SERVICE_USER" -m 0755 "$INSTALL_ROOT/vm/images"
|
||||
ln -sf "$ALPINE_IMG" "$INSTALL_ROOT/vm/images/alpine-baseline.qcow2" 2>/dev/null || true
|
||||
ln -sf "$CIDATA_ISO" "$INSTALL_ROOT/vm/images/cidata.iso" 2>/dev/null || true
|
||||
M2_IMG="$DATA_ROOT/vm/images/metasploitable2.qcow2"
|
||||
[[ -f "$M2_IMG" ]] && ln -sf "$M2_IMG" "$INSTALL_ROOT/vm/images/metasploitable2.qcow2" 2>/dev/null || true
|
||||
|
||||
# --- 8. Tier-3 + Tier-4 deploy (auto, idempotent) ----------------------
|
||||
# Bring up msfrpcd + Metasploitable2 + bridge + verify. Skipped only if
|
||||
|
|
|
|||
|
|
@ -102,6 +102,12 @@ else
|
|||
fi
|
||||
|
||||
# --- 3. systemd unit ----------------------------------------------------
|
||||
# msfrpcd writes module cache + logs to $HOME/.msf4. With ProtectHome=true
|
||||
# the service can't reach /root, so we redirect HOME to a path under
|
||||
# /var/lib/cis490 that is always writable.
|
||||
MSF_HOME="/var/lib/cis490/msf4"
|
||||
install -d -m 0755 -o root -g root "$MSF_HOME"
|
||||
|
||||
log "installing systemd unit"
|
||||
cat > "$UNIT" <<EOF
|
||||
[Unit]
|
||||
|
|
@ -113,6 +119,7 @@ Wants=network-online.target
|
|||
[Service]
|
||||
Type=simple
|
||||
EnvironmentFile=$ENV_FILE
|
||||
Environment=HOME=$MSF_HOME
|
||||
# msfrpcd flags:
|
||||
# -P <pw> password
|
||||
# -U <user> username
|
||||
|
|
|
|||
|
|
@ -101,7 +101,8 @@ if [[ -z "${SKIP_VERIFY:-}" ]]; then
|
|||
[[ -x "$PY" ]] || PY="$(command -v python3)"
|
||||
if ! sudo -E -u cis490 "$PY" "$INSTALL_ROOT/tools/run_tier3_demo.py" \
|
||||
--module vsftpd_234_backdoor \
|
||||
--target-port 21 \
|
||||
--target-port 2021 \
|
||||
--data-root "$DATA_ROOT/data" \
|
||||
--target-boot-timeout 240 \
|
||||
> /tmp/cis490-tier3-verify.log 2>&1; then
|
||||
log "verify run failed — log at /tmp/cis490-tier3-verify.log; dumping last 30 lines:"
|
||||
|
|
|
|||
|
|
@ -45,6 +45,8 @@ def main(argv: list[str] | None = None) -> int:
|
|||
p.add_argument("--require-real-samples", action="store_true")
|
||||
p.add_argument("--force-tier2", action="store_true",
|
||||
help="Skip Tier 3 even when msfrpcd is reachable")
|
||||
p.add_argument("--max-tier3-slots", type=int, default=None,
|
||||
help="Cap concurrent Tier-3 slots; slots >= N fall back to Tier-2")
|
||||
p.add_argument("--log-level", default="INFO")
|
||||
args = p.parse_args(argv)
|
||||
|
||||
|
|
@ -72,6 +74,7 @@ def main(argv: list[str] | None = None) -> int:
|
|||
max_concurrent_override=args.max_concurrent,
|
||||
require_real_samples=args.require_real_samples,
|
||||
force_tier2=args.force_tier2,
|
||||
max_tier3_slots=args.max_tier3_slots,
|
||||
)
|
||||
|
||||
runner = FleetRunner(cfg)
|
||||
|
|
|
|||
|
|
@ -66,13 +66,20 @@ def _wait_for_path(path: Path, timeout_s: float) -> None:
|
|||
|
||||
|
||||
def _wait_for_tcp(host: str, port: int, timeout_s: float) -> None:
|
||||
"""Legacy TCP probe — only reliable when the guest speaks first on connect.
|
||||
Kept for reference; replaced by _wait_for_serial_login for SLIRP guests."""
|
||||
import socket
|
||||
deadline = time.monotonic() + timeout_s
|
||||
last_err: Exception | None = None
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
with socket.create_connection((host, port), timeout=1.0):
|
||||
return
|
||||
with socket.create_connection((host, port), timeout=1.0) as s:
|
||||
s.settimeout(0.5)
|
||||
try:
|
||||
s.recv(1)
|
||||
except socket.timeout:
|
||||
pass
|
||||
return
|
||||
except OSError as e:
|
||||
last_err = e
|
||||
time.sleep(1.0)
|
||||
|
|
@ -82,6 +89,58 @@ def _wait_for_tcp(host: str, port: int, timeout_s: float) -> None:
|
|||
)
|
||||
|
||||
|
||||
def _wait_for_serial_login(
|
||||
serial_sock: "Path",
|
||||
timeout_s: float,
|
||||
prompt: bytes = b"login:",
|
||||
) -> None:
|
||||
"""Wait for a shell login prompt on the QEMU serial console.
|
||||
|
||||
SLIRP completes the TCP handshake before the guest OS boots, making
|
||||
TCP-based readiness probes on port 139/445 unreliable (they return
|
||||
immediately even when Samba isn't running yet). The serial console is
|
||||
authoritative: we connect right after QEMU writes its pidfile (before
|
||||
the guest produces any output) and stream boot messages until the
|
||||
"login:" prompt appears.
|
||||
|
||||
QEMU's serial chardev is ``server=on,wait=off``: the socket is created
|
||||
at QEMU startup. Data written before a client connects is discarded, so
|
||||
we must connect before the prompt appears. Since the pidfile is written
|
||||
after QEMU finishes device init (well before the guest kernel loads), we
|
||||
reliably connect in time.
|
||||
"""
|
||||
import socket as _socket
|
||||
|
||||
deadline = time.monotonic() + timeout_s
|
||||
while not serial_sock.exists():
|
||||
if time.monotonic() >= deadline:
|
||||
raise TimeoutError(f"serial socket {serial_sock} never appeared")
|
||||
time.sleep(0.2)
|
||||
|
||||
buf = b""
|
||||
sock = _socket.socket(_socket.AF_UNIX, _socket.SOCK_STREAM)
|
||||
sock.settimeout(2.0)
|
||||
try:
|
||||
sock.connect(str(serial_sock))
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
chunk = sock.recv(4096)
|
||||
if not chunk:
|
||||
break
|
||||
buf += chunk
|
||||
if prompt in buf.lower():
|
||||
return
|
||||
except _socket.timeout:
|
||||
pass
|
||||
finally:
|
||||
sock.close()
|
||||
|
||||
raise TimeoutError(
|
||||
f"login prompt not seen on serial console within {timeout_s}s "
|
||||
f"(last {min(200, len(buf))} bytes: {buf[-200:]!r})"
|
||||
)
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(prog="run_tier3_demo")
|
||||
parser.add_argument("--data-root", default="data")
|
||||
|
|
@ -181,6 +240,18 @@ def main() -> int:
|
|||
sample.name, sample.profile, sample.kind)
|
||||
|
||||
run_dir = Path(args.run_dir)
|
||||
# Kill any QEMU still holding this slot's run_dir from a previous wave.
|
||||
# QEMU is started with start_new_session=True so it survives orchestrator
|
||||
# SIGTERM without explicit cleanup here.
|
||||
old_pid_file = run_dir / "qemu.pid"
|
||||
if old_pid_file.exists():
|
||||
try:
|
||||
old_pid = int(old_pid_file.read_text().strip())
|
||||
import os as _os
|
||||
_os.killpg(_os.getpgid(old_pid), signal.SIGTERM)
|
||||
time.sleep(1.5)
|
||||
except (ProcessLookupError, ValueError, OSError):
|
||||
pass
|
||||
if run_dir.exists():
|
||||
import shutil
|
||||
shutil.rmtree(run_dir)
|
||||
|
|
@ -202,11 +273,11 @@ def main() -> int:
|
|||
try:
|
||||
_wait_for_path(pid_file, timeout_s=15.0)
|
||||
qemu_pid = int(pid_file.read_text().strip())
|
||||
log.info("qemu pid = %d; waiting for service on %s:%d (timeout %.0fs)",
|
||||
qemu_pid, args.target_ip, args.target_port,
|
||||
args.target_boot_timeout)
|
||||
_wait_for_tcp(args.target_ip, args.target_port, args.target_boot_timeout)
|
||||
log.info("target service is up")
|
||||
serial_sock = run_dir / "serial.sock"
|
||||
log.info("qemu pid = %d; waiting for login prompt on serial console (timeout %.0fs)",
|
||||
qemu_pid, args.target_boot_timeout)
|
||||
_wait_for_serial_login(serial_sock, timeout_s=args.target_boot_timeout)
|
||||
log.info("target guest OS ready (login prompt seen on serial console)")
|
||||
|
||||
# Pre-exploit savevm so EpisodeConfig.revert_at_{start,end}
|
||||
# has a known-good baseline to load. Best-effort — we still
|
||||
|
|
@ -260,6 +331,11 @@ def main() -> int:
|
|||
module=module,
|
||||
cfg=DriverConfig(
|
||||
target_ip=args.target_ip,
|
||||
# Override RPORT when target_port is an unprivileged host port
|
||||
# (i.e. fleet runner remapped the guest's privileged port to a
|
||||
# loopback port > 1024). When target_port == module RPORT the
|
||||
# caller wants direct guest access; leave RPORT unchanged.
|
||||
target_port=args.target_port if args.target_port > 1024 else None,
|
||||
sample_store_root=repo_root / "samples" / "store",
|
||||
),
|
||||
emit_event=runner.emit_event,
|
||||
|
|
|
|||
|
|
@ -34,9 +34,11 @@ RAM_MIB="${RAM_MIB:-512}"
|
|||
BRIDGE="${BRIDGE:-}"
|
||||
TAP="${TAP:-cis490target$SLOT}"
|
||||
# Ports the host should forward to the guest. Comma-separated host:guest pairs.
|
||||
# Default covers the vsftpd module's RPORT. Slot offset makes per-VM
|
||||
# fleet runs collision-free (slot 0 → 21, slot 1 → 121, slot 2 → 221, ...).
|
||||
PORT_BASE="${PORT_BASE:-$((21 + SLOT * 100))}"
|
||||
# Default covers the vsftpd module's RPORT. Host port uses an unprivileged
|
||||
# range (>1023) so the service user (cis490) can bind it without root.
|
||||
# Slot offset makes per-VM fleet runs collision-free
|
||||
# (slot 0 → 2021, slot 1 → 2121, slot 2 → 2221, ...).
|
||||
PORT_BASE="${PORT_BASE:-$((2021 + SLOT * 100))}"
|
||||
TARGET_PORTS="${TARGET_PORTS:-${PORT_BASE}:21}"
|
||||
# KVM if the host can take it; otherwise fall back to TCG. Cross-arch
|
||||
# images (Metasploitable2 is x86-only) on aarch64 hosts will need TCG.
|
||||
|
|
|
|||