diff --git a/docs/fix-notes-Dev_REL3_050126.md b/docs/fix-notes-Dev_REL3_050126.md
new file mode 100644
index 0000000..b110263
--- /dev/null
+++ b/docs/fix-notes-Dev_REL3_050126.md
@@ -0,0 +1,142 @@
+# Fix Notes — Dev_REL3_050126
+
+Branch HEAD: `656a015443f54dffeab66ae29fa726eee36a51ed`
+Date: 2026-05-02
+Author: elliott (k-gamingcom lab host)
+
+## Summary
+
+Seven bugs were found and fixed during Tier-3 + Tier-4 bring-up on k-gamingcom,
+following the `AGENTS.md` runbook. All fixes are committed to `Dev_REL3_050126`
+and deployed to `/opt/cis490`.
+
+---
+
+## Fixes (oldest → newest)
+
+### 1. `cis490-msfrpcd` crashes with EROFS on `/root/.msf4` — commit `1dd484d`
+
+**File:** `scripts/install-msfrpcd.sh`
+
+**Symptom:** The msfrpcd service failed immediately with `EROFS` because
+`ProtectHome=true` in the generated systemd unit made `/root` an empty, read-only
+mount for the service. msfrpcd defaulted `$HOME` to `/root` and could not create `.msf4/`.
+
+**Fix:** Pre-create `/var/lib/cis490/msf4`, then add `Environment=HOME=/var/lib/cis490/msf4`
+and `ReadWritePaths=/var/lib/cis490` to the generated unit.
+
+---
+
+### 2. Two Tier-3 install bugs — `metasploitable2` symlink + msfrpcd HOME — commit `ae4b80d`
+
+**File:** `scripts/install-tier-3-4.sh`
+
+**Symptom A:** `install-tier-3-4.sh` fetched the Metasploitable2 image to
+`$DATA_ROOT/vm/images/` but never symlinked it into `$INSTALL_ROOT/vm/images/`.
+`launch_target.sh` resolved `IMAGE` relative to `$INSTALL_ROOT/vm/images/`, so it
+exited immediately and `qemu.pid` never appeared.
+
+**Fix:** Added an `install -d` + `ln -sf` step after the fetch.
+
+**Symptom B:** The install script's live-patch path lacked the `HOME` fix from #1;
+this commit carries that fix over into it.
+
+---
+
+### 3. PORT_BASE=21 is privileged; RPORT not propagated — commit `f4eef81`
+
+**Files:** `vm/launch_target.sh`, `exploits/driver.py`, `tools/run_tier3_demo.py`
+
+**Symptom:** `launch_target.sh` defaulted `PORT_BASE` to `$((21 + SLOT * 100))`.
+Slot 0 → port 21, which `cis490` (non-root) cannot bind. QEMU printed
+`bind(AF_INET, ...): Permission denied` and exited before booting the guest.
+Even if the bind had succeeded, `DriverConfig` had no way to override `RPORT`,
+so the exploit module would still have connected to port 21 rather than the
+host port forwarded by QEMU's `hostfwd`.
+
+**Fix:**
+- `launch_target.sh`: `PORT_BASE` default → `$((2121 + SLOT * 100))`
+- `DriverConfig`: added a `target_port: int | None` field
+- `MSFExploitDriver._fire()`: if `target_port` is set and `RPORT` is among the module options, override `RPORT` with it
+- `run_tier3_demo.py`: pass `target_port=args.target_port` to `DriverConfig`
+- `install-tier-3-4.sh` verify call: `--target-port 2121`
+
+---
+
+### 4. `run_tier3_demo.py --data-root` defaults to relative `"data"` — commit `d2716b4`
+
+**File:** `scripts/install-tier-3-4.sh`
+
+**Symptom:** `run_tier3_demo.py` defaults `--data-root` to the relative path `"data"`.
+When invoked via `sudo -u cis490`, the CWD was `/`, so episode dirs resolved to
+`/data/episodes/`, which doesn't exist and which `cis490` cannot create; `mkdir` raised `PermissionError`.
+
+**Fix:** Pass `--data-root "$DATA_ROOT/data"` explicitly in the install script.
+
+---
+
+### 5. msfrpc bytes/str normalisation — commit `4262625` (closes #20)
+
+**File:** `exploits/msfrpc.py`
+
+**Symptom:** msfrpcd encodes all response strings as the msgpack `bin` type.
+`unpackb(raw=False)` only converts the legacy `raw` (str) type to Python `str`;
+`bin` always comes out as `bytes`. `auth.login` received
+`{b'result': b'success', b'token': b'TEMP...'}`, so `resp.get("result")`
+returned `None` → `MSFRpcError("auth.login failed: ...")`.
+
+**Fix:** Added a recursive `bytes → str` normaliser, `_decode_response()`, and
+called it in `_raw_call` immediately after `msgpack.unpackb`.
+
+---
+
+### 6. Orchestrator never received `MSFRPC_PASSWORD` — commit `d294eb9`
+
+**File:** `etc/cis490-orchestrator.service`
+
+**Symptom:** The orchestrator unit only loaded `lab-host.env`, which contains
+`FLEET_HOST_ID` and `BRIDGE` but not `MSFRPC_PASSWORD`. `run_tier3_demo.py`
+checks for the env var at startup and exits immediately with `rc=2` if it is unset.
+As a result, every Tier-3 slot failed within ~240 ms with `rc=2`.
+
+**Fix:** Added `EnvironmentFile=-/etc/cis490/msfrpc.env` to the unit (the `-`
+prefix suppresses the error on Tier-2-only hosts where the file doesn't exist).
+
+---
+
+### 7. Fleet port formula produces privileged ports; boot timeout too tight — commit `656a015`
+
+**File:** `orchestrator/fleet.py`
+
+**Symptom A:** `PORT_BASE = target_port + slot * 1000` produced host ports
+below 1024 for `samba_usermap_script` (RPORT=139, slot 0 → port 139) and
+`php_cgi_arg_injection` (RPORT=80, slot 0 → port 80). `cis490` lacks
+`CAP_NET_BIND_SERVICE`, so QEMU's SLIRP `hostfwd` silently failed and the service
+was never reachable. All 7 slots returned `rc=1` after timing out.
+
+**Symptom B:** `--target-boot-timeout` was not passed to `run_tier3_demo.py`,
+which uses a 180 s default. Seven concurrent VMs contending for I/O during boot
+cannot reliably start their services within 180 s.
+
+**Fix:**
+- Port formula: `host_port = (target_port % 1000) + 2000 + slot * 1000`
+  (minimum host port 2000, no collisions across module types or slots)
+- Pass `--target-boot-timeout 300` explicitly from the fleet runner
+
+---
+
+## Verification
+
+After all fixes were applied:
+
+- `install-tier-3-4.sh` step 4 produced episode `01KQJM5WGWC33P0QWJXRDJV1EN`
+- `install-tier-3-4.sh` step 5 staged 6 real binaries in `samples/store/`
+- The fleet wave at 19:55:57 UTC-6 confirmed slot 0 samba probing port 2139
+  with a 300 s timeout; it was the first wave to run to completion
+
+## Still outstanding
+
+- The Pi-side mTLS cert for k-gamingcom has not yet been issued (the shipper is in
+  "waiting on mTLS material" state). Blocked on the Pi operator running
+  `deploy-cis490-cert.sh k-gamingcom `. No action is needed on
+  the lab-host side.
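The `target_port` override described in fix 3 reduces to a small rule. The sketch below is a minimal illustration that assumes module options are held in a plain dict keyed by option name. `DriverConfig` here is a hypothetical, trimmed stand-in that carries only the new field, and `apply_target_port` is an illustrative helper, not `MSFExploitDriver._fire()` itself. It uses `Optional[int]`, which is equivalent to the `int | None` annotation in the notes, so it also runs on Python versions earlier than 3.10.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DriverConfig:
    """Hypothetical, trimmed stand-in for DriverConfig in exploits/driver.py.

    Only the field added in fix 3 is modelled here.
    """
    target_port: Optional[int] = None  # host port forwarded to the guest


def apply_target_port(cfg: DriverConfig, opts: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the module options with RPORT overridden.

    Mirrors the rule described for MSFExploitDriver._fire(): override only when
    target_port is set and the module actually exposes an RPORT option.
    """
    merged = dict(opts)  # leave the caller's dict untouched
    if cfg.target_port is not None and "RPORT" in merged:
        merged["RPORT"] = cfg.target_port
    return merged


if __name__ == "__main__":
    # Assumed option layout; real module options come from msfrpcd.
    module_opts = {"RHOSTS": "127.0.0.1", "RPORT": 21}
    print(apply_target_port(DriverConfig(target_port=2121), module_opts))
```

With `--target-port 2121`, as in the install script's verify call, a module whose default `RPORT` is 21 is pointed at the forwarded host port instead. Modules without an `RPORT` option are left alone.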
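The recursive `bytes → str` normaliser from fix 5 can be sketched as follows. This is a minimal version that assumes msfrpcd text fields are UTF-8. `decode_response` is a hypothetical name, not the code committed in `4262625`. The sketch shows why `resp.get("result")` misses when keys are `bytes`, and how a single pass after `msgpack.unpackb` fixes that.

```python
from typing import Any


def decode_response(obj: Any) -> Any:
    """Recursively turn msgpack ``bin`` values (Python bytes) into str.

    Hypothetical stand-in for ``_decode_response()`` in exploits/msfrpc.py;
    the committed implementation may differ.
    """
    if isinstance(obj, bytes):
        try:
            # Assumption: msfrpcd text fields are UTF-8.
            return obj.decode("utf-8")
        except UnicodeDecodeError:
            # Leave genuinely binary payloads as bytes instead of corrupting them.
            return obj
    if isinstance(obj, dict):
        # Keys are bytes too, which is why resp.get("result") returned None.
        return {decode_response(k): decode_response(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_response(item) for item in obj]
    if isinstance(obj, tuple):
        # Only produced when unpackb is called with use_list=False.
        return tuple(decode_response(item) for item in obj)
    # ints, floats, bools and None pass through unchanged.
    return obj


if __name__ == "__main__":
    raw = {b"result": b"success", b"token": b"TEMP1234"}
    resp = decode_response(raw)
    print(resp.get("result"))  # prints: success
```

Leaving undecodable values as `bytes` is a design choice in this sketch. Decoding with `errors="replace"` would be simpler, but it would silently corrupt any binary payload that an RPC call returns.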
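The port formula from fix 7 can be checked with a short sketch. `host_port` is an illustrative name, not the function in `orchestrator/fleet.py`, and the range checks were added for the example. The demo uses the three RPORTs mentioned in these notes (21, 80 and 139) across seven slots.

```python
def host_port(target_port: int, slot: int) -> int:
    """Map a module's RPORT and a fleet slot to the QEMU hostfwd host port.

    Implements the formula quoted in fix 7:
        host_port = (target_port % 1000) + 2000 + slot * 1000
    The name, signature and range checks are illustrative additions, not
    the code in orchestrator/fleet.py.
    """
    if not 1 <= target_port <= 65535:
        raise ValueError(f"invalid target port: {target_port}")
    if slot < 0:
        raise ValueError(f"invalid slot: {slot}")
    port = (target_port % 1000) + 2000 + slot * 1000
    if port > 65535:
        raise ValueError(f"slot {slot} pushes host port {port} past 65535")
    return port


if __name__ == "__main__":
    # RPORTs mentioned in these notes: 21 (fix 3), 80 and 139 (fix 7).
    for rport in (21, 80, 139):
        print(rport, [host_port(rport, slot) for slot in range(7)])
```

Each slot occupies its own 1000-port block starting at 2000, so ports from different slots never collide and none falls below 1024. Within a single slot, however, two modules whose RPORTs agree modulo 1000 (for example 80 and 8080) would map to the same host port.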