Compare commits

8 commits

Author SHA1 Message Date
0fb2f3b9a6 docs: fix notes for Dev_REL3_050126 — all 7 Tier-3 bring-up bugs
Branch HEAD: 656a015443

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-02 12:20:35 -06:00
656a015443 fix: fleet tier3 port formula produces privileged ports, boot timeout too tight
Two bugs causing all tier3 slots to fail:

1. PORT_BASE = target_port + slot * 1000 → slot 0 with samba (139)
   and php (80) produced host ports < 1024. cis490 lacks
   CAP_NET_BIND_SERVICE; QEMU's SLIRP hostfwd silently skipped the
   bind, making the service unreachable. Changed to:
   host_port = (target_port % 1000) + 2000 + slot * 1000
   so the minimum is always ≥ 2000 regardless of module RPORT.

2. --target-boot-timeout was never passed to run_tier3_demo.py,
   so it used the 180 s default. 7 concurrent VMs under I/O
   contention need more time; now passes 300 s explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 19:56:19 -06:00
d294eb9f52 fix: orchestrator never received MSFRPC_PASSWORD — load msfrpc.env
The cis490-orchestrator unit only loaded lab-host.env, which has no
MSFRPC_PASSWORD. run_tier3_demo.py exits rc=2 immediately if the var
is unset. All tier3 slots were failing in ~240ms.

Add EnvironmentFile=-/etc/cis490/msfrpc.env (the '-' prefix silences
the error on Tier-2-only hosts where the file doesn't exist yet).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 14:43:40 -06:00
42626259c7 fix: msfrpc normalise bytes response keys/values (closes #20)
msfrpcd encodes all string values as msgpack bin type (bytes), not
legacy raw/str. unpackb(raw=False) only converts legacy raw; bin
always arrives as bytes. auth.login saw {b'result': b'success', ...}
and the .get("result") check returned None → MSFRpcError.

Add _decode_response() recursive normaliser and call it in _raw_call
immediately after unpackb so all callers see plain str keys/values.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 08:07:46 -06:00
d2716b485e fix: run_tier3_demo.py data-root defaults to relative 'data' — pass absolute path
install-tier-3-4.sh didn't pass --data-root to run_tier3_demo.py, so
episode.py tried to mkdir a relative 'data/' in whatever CWD sudo
inherited. Pass $DATA_ROOT/data explicitly.

Closes spectral/CIS490#19

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 08:03:22 -06:00
f4eef81807 fix: Tier-3 verify fails — PORT_BASE 21 is privileged, RPORT not propagated
QEMU's SLIRP hostfwd tried to bind host port 21 for the Metasploitable2
target, which fails for the non-root cis490 user (EPERM). The exploit
driver also had no way to use a different host-side port than the module's
static RPORT=21, so even if the VM had started the exploit would have
connected to the wrong port.

Fix:
  - launch_target.sh: change PORT_BASE default from (21 + SLOT*100) to
    (2121 + SLOT*100) so SLIRP binds non-privileged ports
  - exploits/driver.py: add target_port to DriverConfig; in _fire(),
    override opts["RPORT"] when target_port is set so msfrpcd connects
    to the correct forwarded port
  - tools/run_tier3_demo.py: pass target_port=args.target_port to DriverConfig
  - scripts/install-tier-3-4.sh: --target-port 2121 (matches new default)

Closes spectral/CIS490#18

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 08:02:23 -06:00
ae4b80dc32 fix: two Tier-3 install bugs — msfrpcd HOME and missing metasploitable2 symlink
Bug 1 (install-msfrpcd.sh): cis490-msfrpcd.service crashed with EROFS
on /root/.msf4 because ProtectHome=true makes /root inaccessible.
Fix: set Environment=HOME=/var/lib/cis490/msf4 in the unit template and
add ReadWritePaths=/var/lib/cis490. Pre-create /var/lib/cis490/msf4 in
the install step so msfrpcd never races mkdir.

Bug 2 (install-tier-3-4.sh): run_tier3_demo.py launches launch_target.sh
which resolves IMAGE relative to $REPO_ROOT/vm/images/. The fetch step
placed metasploitable2.qcow2 in $DATA_ROOT/vm/images/ but never
symlinked it into $INSTALL_ROOT/vm/images/, so launch_target.sh exited
immediately (no image found) and qemu.pid never appeared within 15s.
Fix: add a symlink step after the fetch, mirroring how install-lab-host.sh
handles the Alpine baseline image.

Closes spectral/CIS490#16
Closes spectral/CIS490#17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 07:58:25 -06:00
1dd484dd5c fix: cis490-msfrpcd unit crashes with EROFS on /root/.msf4 (ProtectHome=true)
msfrpcd tries to mkdir ~/.msf4/ for its module cache and logs. The
cis490-msfrpcd.service unit sets ProtectHome=true, which makes /root
inaccessible (EROFS), so msfrpcd exits immediately on first start.

Fix: add Environment=HOME=/var/lib/cis490/msf4 to the unit template
and ReadWritePaths=/var/lib/cis490, and pre-create the msf4 dir in the
install script so msfrpcd can write its state there instead. ProtectHome
is preserved because /root is now never touched.

Closes spectral/CIS490#16

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 07:55:08 -06:00
9 changed files with 195 additions and 5 deletions

@ -0,0 +1,142 @@
# Fix Notes — Dev_REL3_050126
Branch HEAD: `656a015443f54dffeab66ae29fa726eee36a51ed`
Date: 2026-05-02
Author: elliott (k-gamingcom lab host)
## Summary
Seven bugs found and fixed during Tier-3 + Tier-4 bring-up on k-gamingcom,
following the AGENTS.md runbook. All fixes are committed to `Dev_REL3_050126`
and deployed to `/opt/cis490`.
---
## Fixes (oldest → newest)
### 1. `cis490-msfrpcd` crashes with EROFS on `/root/.msf4` — commit `1dd484d`
**File:** `scripts/install-msfrpcd.sh`
**Symptom:** The msfrpcd service failed immediately with `EROFS`:
`ProtectHome=true` in the generated systemd unit makes `/root` inaccessible,
and msfrpcd, with `$HOME` defaulting to `/root`, could not create `.msf4/`.
**Fix:** Pre-create `/var/lib/cis490/msf4`, add `Environment=HOME=/var/lib/cis490/msf4`
and `ReadWritePaths=/var/lib/cis490` to the generated unit.
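
As a rough illustration of why the `HOME` override is enough, the sketch below shows how a `~`-relative state directory follows `$HOME`. msfrpcd itself is Ruby and its lookup code is not part of this changeset; the helper name `state_dir` and the `/root` fallback are assumptions made for the example.

```python
import os


def state_dir(environ: dict) -> str:
    """Return the directory a HOME-relative tool would use for ~/.msf4.

    Hypothetical helper for illustration only; the /root fallback is an
    assumption standing in for a root-owned service with no HOME set.
    """
    home = environ.get("HOME", "/root")
    return os.path.join(home, ".msf4")


# Before the fix: no HOME in the unit, so state lands under /root.
before = state_dir({})
# After the fix: Environment=HOME=/var/lib/cis490/msf4 in the unit.
after = state_dir({"HOME": "/var/lib/cis490/msf4"})
print(before)
print(after)
```

With `ProtectHome=true` left in place, only the second path is writable, which is why the fix also adds `ReadWritePaths=/var/lib/cis490`.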
---
### 2. Two Tier-3 install bugs — `metasploitable2` symlink + msfrpcd HOME — commit `ae4b80d`
**Files:** `scripts/install-tier-3-4.sh`
**Symptom A:** `install-tier-3-4.sh` fetched the Metasploitable2 image to
`$DATA_ROOT/vm/images/` but never symlinked it to `$INSTALL_ROOT/vm/images/`.
`launch_target.sh` resolved `IMAGE` relative to `$INSTALL_ROOT/vm/images/`
and exited immediately; `qemu.pid` never appeared.
**Fix:** Added `install -d` + `ln -sf` step after the fetch.
**Symptom B:** The same commit also carries the `HOME` fix from #1 over into
the install script's live-patch path.
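
The symlink step can be sketched in Python as follows. This is a minimal illustration, not the install script's code: `force_symlink` is a hypothetical helper, the temporary directory stands in for `$DATA_ROOT` and `$INSTALL_ROOT`, and the rename-based replace is one way to get an effect similar to `ln -sf`.

```python
import os
import tempfile
from pathlib import Path


def force_symlink(target: Path, link: Path) -> None:
    """Create or replace `link` so that it points at `target`."""
    link.parent.mkdir(parents=True, exist_ok=True)  # like `install -d`
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(target)
    os.replace(tmp, link)  # rename over any existing link


with tempfile.TemporaryDirectory() as root:
    data_images = Path(root) / "data" / "vm" / "images"
    install_images = Path(root) / "install" / "vm" / "images"
    data_images.mkdir(parents=True)
    image = data_images / "metasploitable2.qcow2"
    image.write_bytes(b"")  # empty stand-in for the fetched image

    link = install_images / "metasploitable2.qcow2"
    force_symlink(image, link)
    force_symlink(image, link)  # safe to repeat: the link is replaced
    link_ok = link.is_symlink() and link.resolve() == image.resolve()
    print(link_ok)
```

Running the step twice matters because the install script may be re-run on a host where the link already exists.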
---
### 3. PORT_BASE=21 is privileged; RPORT not propagated — commit `f4eef81`
**Files:** `vm/launch_target.sh`, `exploits/driver.py`, `tools/run_tier3_demo.py`
**Symptom:** `launch_target.sh` defaulted `PORT_BASE` to `$((21 + SLOT * 100))`.
Slot 0 → port 21, which `cis490` (non-root) cannot bind. QEMU printed
`bind(AF_INET, ...): Permission denied` and exited before booting the guest.
Even if the port had worked, `DriverConfig` had no way to override `RPORT`,
so the exploit module would have still connected to port 21 (not the hostfwd'd
port).
**Fix:**
- `launch_target.sh`: `PORT_BASE` default → `$((2121 + SLOT * 100))`
- `DriverConfig`: added `target_port: int | None` field
- `MSFExploitDriver._fire()`: if `target_port` is set and `RPORT` is in `opts`, override it
- `run_tier3_demo.py`: pass `target_port=args.target_port` to `DriverConfig`
- `install-tier-3-4.sh` verify call: `--target-port 2121`
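
The override logic in the list above can be sketched in isolation. The `DriverConfig` here is trimmed to the two fields that matter, and `apply_target_port` is a hypothetical helper mirroring the two lines added to `_fire()`; the option values are examples only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DriverConfig:
    target_ip: str
    target_port: int | None = None  # host-side forwarded port, if any


def apply_target_port(opts: dict, cfg: DriverConfig) -> dict:
    """Return a copy of opts with RPORT replaced when target_port is set."""
    out = dict(opts)
    if cfg.target_port is not None and "RPORT" in out:
        out["RPORT"] = cfg.target_port
    return out


# Example module options; 127.0.0.1 is an assumed loopback target.
module_opts = {"RHOSTS": "127.0.0.1", "RPORT": 21}
overridden = apply_target_port(module_opts, DriverConfig("127.0.0.1", 2121))
unchanged = apply_target_port(module_opts, DriverConfig("127.0.0.1"))
print(overridden["RPORT"], unchanged["RPORT"])
```

Leaving `target_port` as `None` keeps the module's static `RPORT`, so callers that do not use a forwarded port are unaffected.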
---
### 4. `run_tier3_demo.py --data-root` defaults to relative `"data"` — commit `d2716b4`
**Files:** `scripts/install-tier-3-4.sh`
**Symptom:** `run_tier3_demo.py` defaults `--data-root` to `"data"` (relative).
When invoked via `sudo -u cis490`, the CWD was `/`, so episode dirs resolved to
`/data/episodes/` which doesn't exist; `mkdir` raised `PermissionError`.
**Fix:** Pass `--data-root "$DATA_ROOT/data"` explicitly in the install script.
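
A small sketch of the path resolution behind this symptom, assuming POSIX paths. `episodes_dir` is a hypothetical helper, and `/srv/cis490/data` is only an example of what `$DATA_ROOT/data` might expand to.

```python
from pathlib import PurePosixPath


def episodes_dir(data_root: str, cwd: str) -> PurePosixPath:
    """Where episode dirs land for a given --data-root and working directory."""
    root = PurePosixPath(data_root)
    if not root.is_absolute():
        root = PurePosixPath(cwd) / root  # relative paths follow the CWD
    return root / "episodes"


# Default "data" with CWD "/" (the sudo case described above).
relative = episodes_dir("data", cwd="/")
# Explicit absolute path; the value is an assumed example.
absolute = episodes_dir("/srv/cis490/data", cwd="/")
print(relative)
print(absolute)
```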
---
### 5. msfrpc bytes/str normalisation — commit `4262625` (closes #20)
**File:** `exploits/msfrpc.py`
**Symptom:** msfrpcd encodes all response strings as msgpack `bin` type (always
Python `bytes`). `unpackb(raw=False)` only converts the legacy `raw` type;
`bin` comes out as `bytes` regardless. `auth.login` received
`{b'result': b'success', b'token': b'TEMP...'}` and `resp.get("result")`
returned `None` → `MSFRpcError("auth.login failed: ...")`.
**Fix:** Added `_decode_response()` recursive `bytes → str` normaliser and
called it in `_raw_call` immediately after `msgpack.unpackb`.
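
The failure and the fix can be reproduced without a running msfrpcd. The sketch below applies a normaliser of the same shape as `_decode_response()` to a dict shaped like the `auth.login` reply quoted above; the token value is left elided as in the original.

```python
from typing import Any


def decode_response(v: Any) -> Any:
    """Recursively convert bytes to str in a decoded structure."""
    if isinstance(v, bytes):
        return v.decode("utf-8", errors="replace")
    if isinstance(v, dict):
        return {decode_response(k): decode_response(val) for k, val in v.items()}
    if isinstance(v, list):
        return [decode_response(i) for i in v]
    return v


# Shape of the reply as it arrives with bytes keys and values.
raw = {b"result": b"success", b"token": b"TEMP..."}
missed = raw.get("result")  # a str key never matches a bytes key
fixed = decode_response(raw).get("result")
print(missed, fixed)
```

Normalising once at the decode boundary means individual callers do not each need to handle both `bytes` and `str` keys.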
---
### 6. Orchestrator never received `MSFRPC_PASSWORD` — commit `d294eb9`
**File:** `etc/cis490-orchestrator.service`
**Symptom:** The orchestrator unit only loaded `lab-host.env`, which contains
`FLEET_HOST_ID` and `BRIDGE` but not `MSFRPC_PASSWORD`. `run_tier3_demo.py`
checks for the env var at startup and exits `rc=2` immediately if unset.
All tier3 slots were failing in ~240 ms with `rc=2`.
**Fix:** Added `EnvironmentFile=-/etc/cis490/msfrpc.env` to the unit (the `-`
prefix silences the error on Tier-2-only hosts where the file doesn't exist).
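
The fast `rc=2` failure is easy to picture with a stand-in for the startup check. The actual check in `run_tier3_demo.py` is not part of this changeset, so `check_msfrpc_env` is hypothetical, and the environment values are examples rather than real settings.

```python
def check_msfrpc_env(environ: dict) -> int:
    """Return 2 when MSFRPC_PASSWORD is unset, else 0 (hypothetical check)."""
    if "MSFRPC_PASSWORD" not in environ:
        return 2
    return 0


# lab-host.env alone: FLEET_HOST_ID and BRIDGE, but no password.
lab_host_only = {"FLEET_HOST_ID": "k-gamingcom", "BRIDGE": "br0"}
# With msfrpc.env also loaded; the password is a placeholder.
with_msfrpc = {**lab_host_only, "MSFRPC_PASSWORD": "example-only"}

rc_before = check_msfrpc_env(lab_host_only)
rc_after = check_msfrpc_env(with_msfrpc)
print(rc_before, rc_after)
```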
---
### 7. Fleet port formula produces privileged ports; boot timeout too tight — commit `656a015`
**File:** `orchestrator/fleet.py`
**Symptom A:** `PORT_BASE = target_port + slot * 1000` produced host ports
< 1024 for `samba_usermap_script` (RPORT=139, slot 0 → port 139) and
`php_cgi_arg_injection` (RPORT=80, slot 0 → port 80). `cis490` lacks
`CAP_NET_BIND_SERVICE`; QEMU's SLIRP `hostfwd` silently failed. The service
was never reachable. All 7 slots returned `rc=1` after timing out.
**Symptom B:** `--target-boot-timeout` was not passed to `run_tier3_demo.py`,
which uses a 180 s default. 7 concurrent VMs contending on I/O during boot
cannot reliably start their services within 180 s.
**Fix:**
- Port formula: `host_port = (target_port % 1000) + 2000 + slot * 1000`
(host ports are always ≥ 2000, and each slot gets its own 1000-port range,
so concurrent slots cannot collide)
- Pass `--target-boot-timeout 300` explicitly from the fleet runner
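
The two port formulas can be compared directly. This sketch uses the three RPORTs mentioned in this changeset (21, 80, 139) and the 7 slots from the fleet run; the function names and the short module labels are made up for the example.

```python
def old_host_port(target_port: int, slot: int) -> int:
    """Formula before the fix."""
    return target_port + slot * 1000


def new_host_port(target_port: int, slot: int) -> int:
    """Formula after the fix."""
    return (target_port % 1000) + 2000 + slot * 1000


rports = {"vsftpd": 21, "php": 80, "samba": 139}
slots = range(7)

# Any old-formula port below 1024 needs CAP_NET_BIND_SERVICE to bind.
old_privileged = [
    (name, slot)
    for name, rport in rports.items()
    for slot in slots
    if old_host_port(rport, slot) < 1024
]
new_ports = [new_host_port(rport, slot) for rport in rports.values() for slot in slots]

# Each slot's ports fall in [2000 + 1000*slot, 2999 + 1000*slot].
in_slot_range = all(
    2000 + 1000 * slot <= new_host_port(rport, slot) <= 2999 + 1000 * slot
    for rport in rports.values()
    for slot in slots
)
print(len(old_privileged) > 0, min(new_ports), in_slot_range)
```

Because `target_port % 1000` is always below 1000, the per-slot ranges never overlap, which is what keeps concurrent slots apart.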
---
## Verification
After all fixes were applied:
- `install-tier-3-4.sh` step 4 produced episode `01KQJM5WGWC33P0QWJXRDJV1EN`
- `install-tier-3-4.sh` step 5 staged 6 real binaries in `samples/store/`
- Fleet wave at 19:55:57 UTC-6 confirmed slot 0 samba probing port 2139
with 300 s timeout — first wave to actually run to completion
## Still outstanding
- Pi-side mTLS cert for k-gamingcom not yet issued (shipper in
"waiting on mTLS material" state). Blocked on Pi operator running
`deploy-cis490-cert.sh k-gamingcom <wg_ip>`. No action needed on
lab-host side.

@ -14,6 +14,9 @@ WorkingDirectory=/opt/cis490
# /etc/cis490/lab-host.env is written by scripts/install-lab-host.sh;
# carries FLEET_HOST_ID, BRIDGE, and any operator-supplied overrides.
EnvironmentFile=/etc/cis490/lab-host.env
# msfrpc.env only exists after install-tier-3-4.sh; the '-' prefix makes
# this a no-op on Tier-2-only hosts where it hasn't run yet.
EnvironmentFile=-/etc/cis490/msfrpc.env
# Fleet mode: detect host capacity, run that many concurrent episodes
# per wave with samples drawn from the manifest. Each invocation runs
# one wave and exits; systemd respawns per Restart= below, giving us

@ -51,6 +51,9 @@ EmitEvent = Callable[..., None]
@dataclass
class DriverConfig:
    target_ip: str
    # Override the module's static RPORT when the host-side SLIRP
    # hostfwd uses a non-privileged port (e.g. 2121 → guest:21).
    target_port: int | None = None
    session_open_timeout_s: float = 30.0
    # Driver v1 fallback workload — used only when no Sample is passed
    # in (Sample-driven runs override these via exploits.workloads).
@ -185,6 +188,8 @@ class MSFExploitDriver:
log.debug("module already fired; skipping re-fire")
return
opts = self.module.render_options(target_ip=self.cfg.target_ip)
if self.cfg.target_port is not None and "RPORT" in opts:
opts["RPORT"] = self.cfg.target_port
self.emit(
"exploit_fire",
module=self.module.module_path,

@ -45,6 +45,24 @@ except ImportError as e: # pragma: no cover - import-time guard
log = logging.getLogger("cis490.msfrpc")
def _decode_response(v: Any) -> Any:
    """Recursively convert bytes → str in a msgpack-decoded structure.
    msfrpcd encodes string values as msgpack bin (binary) type, not as
    msgpack raw/str. Python msgpack's raw=False only decodes the legacy
    'raw' type; 'bin' always comes out as bytes. Normalise here so
    callers can do resp.get("result") regardless of which wire encoding
    msfrpcd uses in a given version.
    """
    if isinstance(v, bytes):
        return v.decode("utf-8", errors="replace")
    if isinstance(v, dict):
        return {_decode_response(k): _decode_response(val) for k, val in v.items()}
    if isinstance(v, list):
        return [_decode_response(i) for i in v]
    return v
class MSFRpcError(RuntimeError):
"""Raised when msfrpcd returns an error or a malformed response."""
@ -184,6 +202,8 @@ class MSFRpcClient:
except Exception as e:
raise MSFRpcError(f"could not decode msfrpcd response: {e}") from e
decoded = _decode_response(decoded)
if isinstance(decoded, dict) and decoded.get("error") is True:
raise MSFRpcError(
f"{payload[0]!r}: {decoded.get('error_class')} "

@ -285,8 +285,12 @@ def _run_slot(
run_dir = f"{run_dir_base}-target-{slot}"
env["RUN_DIR"] = run_dir
# Each slot gets a unique host-side hostfwd port so concurrent
- # targets don't collide on the loopback port.
- env["PORT_BASE"] = str(target_port + slot * 1000)
+ # targets don't collide on the loopback port. Base at 2000+
+ # (target_port % 1000) so privileged-port modules (samba/139,
+ # php/80, vsftpd/21) never try to bind a port < 1024 on the
+ # host — cis490 user lacks CAP_NET_BIND_SERVICE.
+ host_port = (target_port % 1000) + 2000 + slot * 1000
+ env["PORT_BASE"] = str(host_port)
if bridge_iface:
env["BRIDGE"] = bridge_iface
cmd = [
@ -296,7 +300,10 @@ def _run_slot(
"--run-dir", run_dir,
"--module", module.name,
"--sample", sample.name,
- "--target-port", str(target_port + slot * 1000),
+ "--target-port", str(host_port),
+ # Concurrent VMs contend on I/O during boot; 300 s gives
+ # a full fleet of 7 slots room to start their services.
+ "--target-boot-timeout", "300",
]
tier = "tier3"
module_name: str | None = module.name

@ -103,6 +103,10 @@ fi
# --- 3. systemd unit ----------------------------------------------------
log "installing systemd unit"
# msfrpcd writes ~/.msf4/ for module cache and logs. ProtectHome=true in
# the unit makes /root inaccessible, so redirect HOME to a writable path
# under /var/lib/cis490/. Pre-create so msfrpcd doesn't race mkdir.
install -d -m 0755 -o root -g root /var/lib/cis490/msf4
cat > "$UNIT" <<EOF
[Unit]
Description=CIS490 — Metasploit RPC daemon (loopback only)
@ -119,6 +123,7 @@ EnvironmentFile=$ENV_FILE
# -a <ip> bind address (loopback only — Tier-3 driver runs locally)
# -p <port> port
# -f foreground (no daemonization, so systemd manages PID)
Environment=HOME=/var/lib/cis490/msf4
ExecStart=/usr/bin/env msfrpcd -P \${MSFRPC_PASSWORD} -U \${MSFRPC_USER} -a 127.0.0.1 -p \${MSFRPC_PORT} -f
Restart=on-failure
RestartSec=5
@ -126,6 +131,7 @@ NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=full
ProtectHome=true
ReadWritePaths=/var/lib/cis490
[Install]
WantedBy=multi-user.target

@ -79,6 +79,11 @@ OUT_DIR="$DATA_ROOT/vm/images"
install -d -m 0755 -o cis490 -g cis490 "$OUT_DIR"
OUT_DIR="$OUT_DIR" "$(script_path fetch-metasploitable2.sh)"
chown cis490:cis490 "$OUT_DIR/metasploitable2.qcow2" 2>/dev/null || true
# launch_target.sh resolves IMAGE relative to $REPO_ROOT/vm/images/.
# Symlink the canonical path so it resolves correctly from /opt/cis490.
install -d -o cis490 -g cis490 -m 0755 "$INSTALL_ROOT/vm/images"
ln -sf "$OUT_DIR/metasploitable2.qcow2" \
"$INSTALL_ROOT/vm/images/metasploitable2.qcow2" || true
log "metasploitable2.qcow2 ✓"
# --- 3. bridge ---------------------------------------------------------
@ -101,7 +106,8 @@ if [[ -z "${SKIP_VERIFY:-}" ]]; then
[[ -x "$PY" ]] || PY="$(command -v python3)"
if ! sudo -E -u cis490 "$PY" "$INSTALL_ROOT/tools/run_tier3_demo.py" \
--module vsftpd_234_backdoor \
- --target-port 21 \
+ --target-port 2121 \
+ --data-root "$DATA_ROOT/data" \
--target-boot-timeout 240 \
> /tmp/cis490-tier3-verify.log 2>&1; then
log "verify run failed — log at /tmp/cis490-tier3-verify.log; dumping last 30 lines:"

@ -260,6 +260,7 @@ def main() -> int:
module=module,
cfg=DriverConfig(
target_ip=args.target_ip,
target_port=args.target_port,
sample_store_root=repo_root / "samples" / "store",
),
emit_event=runner.emit_event,

@ -36,7 +36,7 @@ TAP="${TAP:-cis490target$SLOT}"
# Ports the host should forward to the guest. Comma-separated host:guest pairs.
# Default covers the vsftpd module's RPORT. Slot offset makes per-VM
# fleet runs collision-free (slot 0 → 21, slot 1 → 121, slot 2 → 221, ...).
- PORT_BASE="${PORT_BASE:-$((21 + SLOT * 100))}"
+ PORT_BASE="${PORT_BASE:-$((2121 + SLOT * 100))}"
TARGET_PORTS="${TARGET_PORTS:-${PORT_BASE}:21}"
# KVM if the host can take it; otherwise fall back to TCG. Cross-arch
# images (Metasploitable2 is x86-only) on aarch64 hosts will need TCG.