docs: fix notes for Dev_REL3_050126 — all 7 Tier-3 bring-up bugs
Branch HEAD: 656a015443
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commit: 0fb2f3b9a6
Parent: 656a015443
1 changed file with 142 additions and 0 deletions: docs/fix-notes-Dev_REL3_050126.md (normal file)
@@ -0,0 +1,142 @@

# Fix Notes — Dev_REL3_050126

Branch HEAD: `656a015443f54dffeab66ae29fa726eee36a51ed`
Date: 2026-05-02
Author: elliott (k-gamingcom lab host)

## Summary

Seven bugs found and fixed during Tier-3 + Tier-4 bring-up on k-gamingcom,
following the AGENTS.md runbook. All fixes are committed to `Dev_REL3_050126`
and deployed to `/opt/cis490`.

---

## Fixes (oldest → newest)

### 1. `cis490-msfrpcd` crashes with EROFS on `/root/.msf4` — commit `1dd484d`

**File:** `scripts/install-msfrpcd.sh`

**Symptom:** The msfrpcd service failed immediately with `EROFS` because
`ProtectHome=true` in the generated systemd unit made `/root` unwritable
inside the service's mount namespace. msfrpcd defaulted `$HOME` to `/root`
and could not create `.msf4/`.

**Fix:** Pre-create `/var/lib/cis490/msf4`, and add `Environment=HOME=/var/lib/cis490/msf4`
and `ReadWritePaths=/var/lib/cis490` to the generated unit.

---

### 2. Two Tier-3 install bugs — `metasploitable2` symlink + msfrpcd HOME — commit `ae4b80d`

**File:** `scripts/install-tier-3-4.sh`

**Symptom A:** `install-tier-3-4.sh` fetched the Metasploitable2 image to
`$DATA_ROOT/vm/images/` but never symlinked it into `$INSTALL_ROOT/vm/images/`.
`launch_target.sh` resolved `IMAGE` relative to `$INSTALL_ROOT/vm/images/`,
found no image there, and exited immediately; `qemu.pid` never appeared.

**Fix:** Added an `install -d` + `ln -sf` step after the fetch.
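
For readers more at home in Python than shell, the `install -d` + `ln -sf` step can be approximated as below. This is only a sketch of the idea, not the script's code: the function name `link_image` and its arguments are hypothetical, and the remove-then-link sequence is not atomic the way a careful shell implementation could be.

```python
import os
from pathlib import Path


def link_image(data_root: Path, install_root: Path, image_name: str) -> Path:
    """Symlink a fetched image from the data tree into the install tree.

    Rough equivalent of `install -d` followed by `ln -sf` (hypothetical helper).
    """
    source = data_root / "vm" / "images" / image_name
    link_dir = install_root / "vm" / "images"
    link_dir.mkdir(parents=True, exist_ok=True)  # like `install -d`

    link = link_dir / image_name
    # `ln -sf` replaces an existing link or file; emulate that by removing it first.
    if link.is_symlink() or link.exists():
        link.unlink()
    os.symlink(source, link)
    return link
```

Calling the helper twice with the same arguments leaves a single, valid symlink, which is the property the install step needs when it is re-run.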

**Symptom B:** The install script's live-patch path did not yet include the
`HOME` fix from fix 1 above; this commit carries it over.

---

### 3. PORT_BASE=21 is privileged; RPORT not propagated — commit `f4eef81`

**Files:** `vm/launch_target.sh`, `exploits/driver.py`, `tools/run_tier3_demo.py`

**Symptom:** `launch_target.sh` defaulted `PORT_BASE` to `$((21 + SLOT * 100))`.
Slot 0 → port 21, which `cis490` (non-root) cannot bind. QEMU printed
`bind(AF_INET, ...): Permission denied` and exited before booting the guest.
Even if the bind had succeeded, `DriverConfig` had no way to override `RPORT`,
so the exploit module would still have connected to port 21 rather than the
hostfwd'd port.

**Fix:**

- `launch_target.sh`: `PORT_BASE` default → `$((2121 + SLOT * 100))`
- `DriverConfig`: added `target_port: int | None` field
- `MSFExploitDriver._fire()`: if `target_port` is set and `RPORT` is in the module options, override `RPORT` with it
- `run_tier3_demo.py`: pass `target_port=args.target_port` to `DriverConfig`
- `install-tier-3-4.sh` verify call: `--target-port 2121`
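
The two halves of this fix can be sketched in a few lines of Python. This is an illustration under assumptions, not the code in `exploits/driver.py`: the `DriverConfig` below shows only the new field, and `default_port_base` and `apply_target_port` are hypothetical names standing in for the shell default and the override inside `MSFExploitDriver._fire()`.

```python
from dataclasses import dataclass
from typing import Optional


def default_port_base(slot: int) -> int:
    """Mirror of the new shell default: $((2121 + SLOT * 100))."""
    return 2121 + slot * 100


@dataclass
class DriverConfig:
    # Hypothetical subset of the real config; only the new field is shown.
    target_port: Optional[int] = None


def apply_target_port(opts: dict, cfg: DriverConfig) -> dict:
    """Return a copy of opts with RPORT overridden when target_port is set."""
    patched = dict(opts)
    if cfg.target_port is not None and "RPORT" in patched:
        patched["RPORT"] = cfg.target_port
    return patched
```

With `DriverConfig(target_port=2121)`, module options that carry `RPORT=21` come back with `RPORT=2121`; options without an `RPORT` key, or a config without `target_port`, pass through unchanged.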

---

### 4. `run_tier3_demo.py --data-root` defaults to relative `"data"` — commit `d2716b4`

**File:** `scripts/install-tier-3-4.sh`

**Symptom:** `run_tier3_demo.py` defaults `--data-root` to `"data"` (relative).
When invoked via `sudo -u cis490`, the CWD was `/`, so episode dirs resolved to
`/data/episodes/`, which doesn't exist; `mkdir` raised `PermissionError`.

**Fix:** Pass `--data-root "$DATA_ROOT/data"` explicitly in the install script.
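
The failure mode is ordinary relative-path resolution, which the following sketch makes explicit. The helper name `resolve_data_root` and the absolute path used in the note below are illustrative assumptions (the actual value of `$DATA_ROOT` is not given here), and the example assumes POSIX paths.

```python
from pathlib import Path


def resolve_data_root(data_root: str, cwd: str) -> Path:
    """Show where a --data-root value ends up for a given working directory."""
    path = Path(data_root)
    # A relative path is interpreted against the process CWD; an absolute
    # path is unaffected by it.
    return path if path.is_absolute() else Path(cwd) / path
```

`resolve_data_root("data", "/")` yields `/data`, so the episode directory becomes `/data/episodes`, while an absolute value such as the made-up `/srv/example/data` comes back unchanged whatever the CWD is.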

---

### 5. msfrpc bytes/str normalisation — commit `4262625` (closes #20)

**File:** `exploits/msfrpc.py`

**Symptom:** msfrpcd encodes all response strings as the msgpack `bin` type,
which always unpacks to Python `bytes`. `unpackb(raw=False)` only converts the
legacy `raw` type; `bin` comes out as `bytes` regardless. `auth.login` received
`{b'result': b'success', b'token': b'TEMP...'}`, so `resp.get("result")`
returned `None` and the client raised `MSFRpcError("auth.login failed: ...")`.

**Fix:** Added `_decode_response()`, a recursive `bytes → str` normaliser, and
called it in `_raw_call` immediately after `msgpack.unpackb`.
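
A recursive normaliser of this kind might look like the sketch below. It is not the committed `_decode_response()`; the name `decode_response`, the choice of UTF-8 with `errors="replace"`, and the conversion of tuples to lists are assumptions made for the example.

```python
from typing import Any


def decode_response(obj: Any) -> Any:
    """Recursively convert bytes to str in an already-unpacked structure."""
    if isinstance(obj, bytes):
        # Assumption: values are UTF-8 text; undecodable bytes are replaced.
        return obj.decode("utf-8", errors="replace")
    if isinstance(obj, dict):
        # Keys need decoding too, otherwise resp.get("result") still misses.
        return {decode_response(k): decode_response(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decode_response(item) for item in obj]
    return obj
```

Applied to `{b"result": b"success", b"token": b"example-token"}` (a made-up token), the result has `str` keys and values, so `resp.get("result")` returns `"success"`. One design caveat: a response that legitimately carries binary payloads would be mangled by blanket decoding, so a real implementation may want to exempt known binary fields.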

---

### 6. Orchestrator never received `MSFRPC_PASSWORD` — commit `d294eb9`

**File:** `etc/cis490-orchestrator.service`

**Symptom:** The orchestrator unit only loaded `lab-host.env`, which contains
`FLEET_HOST_ID` and `BRIDGE` but not `MSFRPC_PASSWORD`. `run_tier3_demo.py`
checks for the env var at startup and exits with `rc=2` immediately if it is
unset. All Tier-3 slots were failing in ~240 ms with `rc=2`.

**Fix:** Added `EnvironmentFile=-/etc/cis490/msfrpc.env` to the unit (the `-`
prefix silences the error on Tier-2-only hosts where the file doesn't exist).
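
The startup check that produced the fast `rc=2` exits can be pictured roughly as follows. This is a guess at the shape of the check, not the code in `run_tier3_demo.py`: the function name `check_msfrpc_env` is hypothetical, and treating an empty value the same as an unset one is an assumption.

```python
import os
import sys
from typing import Mapping


def check_msfrpc_env(environ: Mapping[str, str] = os.environ) -> int:
    """Return 0 if MSFRPC_PASSWORD is usable, else the exit code 2."""
    # Assumption: an empty password counts as unset.
    if not environ.get("MSFRPC_PASSWORD"):
        print("MSFRPC_PASSWORD is not set", file=sys.stderr)
        return 2
    return 0
```

An environment holding only `FLEET_HOST_ID` and `BRIDGE`, as loaded from `lab-host.env`, returns 2 here, which is consistent with every slot failing before any RPC call was attempted.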

---

### 7. Fleet port formula produces privileged ports; boot timeout too tight — commit `656a015`

**File:** `orchestrator/fleet.py`

**Symptom A:** `PORT_BASE = target_port + slot * 1000` produced host ports
below 1024 for `samba_usermap_script` (RPORT=139, slot 0 → port 139) and
`php_cgi_arg_injection` (RPORT=80, slot 0 → port 80). `cis490` lacks
`CAP_NET_BIND_SERVICE`; QEMU's SLIRP `hostfwd` silently failed. The service
was never reachable. All 7 slots returned `rc=1` after timing out.

**Symptom B:** `--target-boot-timeout` was not passed to `run_tier3_demo.py`,
which uses a 180 s default. Seven concurrent VMs contending on I/O during boot
cannot reliably start their services within 180 s.

**Fix:**

- Port formula: `host_port = (target_port % 1000) + 2000 + slot * 1000`
  (minimum host port 2000; slots never overlap, and two modules can collide
  only if their `RPORT` values are congruent modulo 1000, which is not the
  case for the ports listed here)
- Pass `--target-boot-timeout 300` explicitly from the fleet runner
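
The arithmetic behind the port fix is easy to check by hand; the sketch below writes out both formulas as quoted above. The function names `old_host_port` and `host_port` are hypothetical wrappers, not identifiers from `orchestrator/fleet.py`.

```python
def old_host_port(target_port: int, slot: int) -> int:
    """Previous formula: privileged whenever target_port < 1024 and slot == 0."""
    return target_port + slot * 1000


def host_port(target_port: int, slot: int) -> int:
    """New formula: fold the port into 0..999, then offset past 2000 per slot."""
    return (target_port % 1000) + 2000 + slot * 1000


if __name__ == "__main__":
    for rport in (21, 80, 139):
        for slot in range(7):
            print(rport, slot, old_host_port(rport, slot), host_port(rport, slot))
```

For `RPORT=139` in slot 0 the old formula gives 139 and the new one gives 2139; for `RPORT=80` the new value is 2080. Because the folded port is always below 1000, each slot owns its own block of 1000 host ports. The fold also means that two target ports differing by a multiple of 1000 (80 and 1080, say) would map to the same host port within a slot.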

---

## Verification

After all fixes were applied:

- `install-tier-3-4.sh` step 4 produced episode `01KQJM5WGWC33P0QWJXRDJV1EN`
- `install-tier-3-4.sh` step 5 staged 6 real binaries in `samples/store/`
- Fleet wave at 19:55:57 UTC-6 confirmed slot 0 samba probing port 2139
  with a 300 s timeout; this was the first wave to run to completion

## Still outstanding

- The Pi-side mTLS cert for k-gamingcom has not been issued yet (the shipper
  is in the "waiting on mTLS material" state). This is blocked on the Pi
  operator running `deploy-cis490-cert.sh k-gamingcom <wg_ip>`. No action is
  needed on the lab-host side.