Fix Notes — Dev_REL3_050126
Branch HEAD: 656a015443f54dffeab66ae29fa726eee36a51ed
Date: 2026-05-02
Author: elliott (k-gamingcom lab host)
Summary
Seven bugs found and fixed during Tier-3 + Tier-4 bring-up on k-gamingcom,
following the AGENTS.md runbook. All fixes are committed to Dev_REL3_050126
and deployed to /opt/cis490.
Fixes (oldest → newest)
1. cis490-msfrpcd crashes with EROFS on /root/.msf4 — commit 1dd484d
File: scripts/install-msfrpcd.sh
Symptom: msfrpcd service failed immediately with EROFS because
ProtectHome=true in the generated systemd unit made /root a read-only
overlay. msfrpcd defaulted $HOME to /root and could not create .msf4/.
Fix: Pre-create /var/lib/cis490/msf4, add Environment=HOME=/var/lib/cis490/msf4
and ReadWritePaths=/var/lib/cis490 to the generated unit.
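The fix is small enough to show as data. Below is a minimal Python sketch, assuming the install script writes the unit as plain text. The helper names render_service_overrides and precreate_msf_home are hypothetical (scripts/install-msfrpcd.sh is a shell script). Only the two directives and the /var/lib/cis490 paths come from the notes above.

```python
from pathlib import Path

# Paths taken from the fix notes; the helper names below are hypothetical.
STATE_ROOT = "/var/lib/cis490"
MSF_HOME = f"{STATE_ROOT}/msf4"


def render_service_overrides(msf_home: str = MSF_HOME, state_root: str = STATE_ROOT) -> str:
    """Return the [Service] directives the fix adds to the generated unit.

    HOME points msfrpcd away from /root (blocked by ProtectHome=true), and
    ReadWritePaths keeps the state root writable inside the sandbox.
    """
    lines = [
        "[Service]",
        f"Environment=HOME={msf_home}",
        f"ReadWritePaths={state_root}",
    ]
    return "\n".join(lines) + "\n"


def precreate_msf_home(msf_home: Path) -> None:
    """Create the HOME directory before the service first starts (idempotent)."""
    msf_home.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    print(render_service_overrides(), end="")
```

Pre-creating the directory matters because the sandbox only grants write access to it; it does not create it.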
2. Two Tier-3 install bugs — metasploitable2 symlink + msfrpcd HOME — commit ae4b80d
File: scripts/install-tier-3-4.sh
Symptom A: install-tier-3-4.sh fetched the Metasploitable2 image to
$DATA_ROOT/vm/images/ but never symlinked it to $INSTALL_ROOT/vm/images/.
launch_target.sh resolved IMAGE relative to $INSTALL_ROOT/vm/images/
and exited immediately; qemu.pid never appeared.
Fix: Added install -d + ln -sf step after the fetch.
Symptom B: The install script's live-patch path lacked the msfrpcd HOME fix
from item 1.
Fix: Carried the same HOME fix over into that path.
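The Symptom A fix (install -d followed by ln -sf) can be sketched in Python as below. This is an illustration, not the script's code. link_image is a hypothetical helper and the image filename is a placeholder; it assumes the destination is never a real directory. The $DATA_ROOT/vm/images and $INSTALL_ROOT/vm/images layout comes from the notes.

```python
from pathlib import Path


def link_image(data_root: Path, install_root: Path, image_name: str) -> Path:
    """Expose a fetched image under <install_root>/vm/images/ via a symlink."""
    src = data_root / "vm" / "images" / image_name
    dst_dir = install_root / "vm" / "images"
    dst_dir.mkdir(parents=True, exist_ok=True)  # equivalent of install -d
    dst = dst_dir / image_name
    # Equivalent of ln -sf: drop any existing file or (possibly dangling)
    # link at the destination, then create the link fresh.
    if dst.is_symlink() or dst.exists():
        dst.unlink()
    dst.symlink_to(src)
    return dst
```

Because the old link is removed first, re-running the install is safe, which matches the intent of the -f flag.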
3. PORT_BASE=21 is privileged; RPORT not propagated — commit f4eef81
Files: vm/launch_target.sh, exploits/driver.py, tools/run_tier3_demo.py
Symptom: launch_target.sh defaulted PORT_BASE to $((21 + SLOT * 100)).
Slot 0 → port 21, which cis490 (non-root) cannot bind. QEMU printed
bind(AF_INET, ...): Permission denied and exited before booting the guest.
Even if the bind had succeeded, DriverConfig had no way to override RPORT,
so the exploit module would still have connected to port 21 instead of the
hostfwd'd port.
Fix:
- launch_target.sh: PORT_BASE default → $((2121 + SLOT * 100))
- DriverConfig: added target_port: int | None field
- MSFExploitDriver._fire(): if target_port is set and RPORT is in opts, override RPORT
- run_tier3_demo.py: pass target_port=args.target_port to DriverConfig
- install-tier-3-4.sh verify call: --target-port 2121
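A minimal sketch of the override rule, assuming module options are held in a plain dict. DriverConfig is trimmed to the one field this fix adds. apply_rport_override and default_port_base are hypothetical stand-ins for the logic inside MSFExploitDriver._fire() and launch_target.sh.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


def default_port_base(slot: int) -> int:
    """Python rendering of launch_target.sh's new default: 2121 + SLOT * 100."""
    return 2121 + slot * 100


@dataclass
class DriverConfig:
    # Only the field added by this fix is shown; the real class has more.
    target_port: int | None = None


def apply_rport_override(opts: dict[str, Any], cfg: DriverConfig) -> dict[str, Any]:
    """Override RPORT only when target_port is set AND the module has RPORT."""
    out = dict(opts)  # leave the caller's options untouched
    if cfg.target_port is not None and "RPORT" in out:
        out["RPORT"] = cfg.target_port
    return out
```

Checking for RPORT before overriding keeps the driver from injecting an option into modules that do not declare one.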
4. run_tier3_demo.py --data-root defaults to relative "data" — commit d2716b4
File: scripts/install-tier-3-4.sh
Symptom: run_tier3_demo.py defaults --data-root to "data" (relative).
When invoked via sudo -u cis490, the CWD was /, so episode directories resolved
to /data/episodes/, which doesn't exist, and mkdir raised PermissionError.
Fix: Pass --data-root "$DATA_ROOT/data" explicitly in the install script.
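The failure mode follows directly from how relative paths are joined to the working directory. episodes_dir below is a hypothetical helper, and the "episodes" subdirectory is inferred from the /data/episodes/ path above. It shows that a relative --data-root depends on the CWD while an absolute one does not.

```python
import posixpath


def episodes_dir(data_root: str, cwd: str) -> str:
    """Resolve <data_root>/episodes the way a relative path resolves at runtime.

    posixpath.join discards cwd when data_root is absolute, which is why
    passing an absolute --data-root makes the result independent of the CWD.
    """
    return posixpath.normpath(posixpath.join(cwd, data_root, "episodes"))
```

With cwd "/" (the sudo -u cis490 case), episodes_dir("data", "/") gives "/data/episodes". An absolute root such as the placeholder "/srv/example/data" gives the same result from any CWD.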
5. msfrpc bytes/str normalisation — commit 4262625 (closes #20)
File: exploits/msfrpc.py
Symptom: msfrpcd encodes every response string as the msgpack bin type, which
always unpacks to Python bytes. unpackb(raw=False) only decodes the str
(formerly raw) type; bin values come out as bytes regardless. The auth.login
response therefore decoded to {b'result': b'success', b'token': b'TEMP...'},
so resp.get("result") returned None → MSFRpcError("auth.login failed: ...").
Fix: Added _decode_response() recursive bytes → str normaliser and
called it in _raw_call immediately after msgpack.unpackb.
6. Orchestrator never received MSFRPC_PASSWORD — commit d294eb9
File: etc/cis490-orchestrator.service
Symptom: The orchestrator unit only loaded lab-host.env, which contains
FLEET_HOST_ID and BRIDGE but not MSFRPC_PASSWORD. run_tier3_demo.py
checks for that variable at startup and exits immediately with rc=2 if it is
unset, so every Tier-3 slot was failing within ~240 ms.
Fix: Added EnvironmentFile=-/etc/cis490/msfrpc.env to the unit (the -
prefix silences the error on Tier-2-only hosts where the file doesn't exist).
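A minimal sketch of the kind of startup gate described above. check_msfrpc_env is a hypothetical name. Treating an empty value as unset is an assumption; only the variable name and rc=2 come from the notes.

```python
import os
import sys
from typing import Mapping

EXIT_MISSING_ENV = 2  # rc observed in the notes when MSFRPC_PASSWORD is unset


def check_msfrpc_env(environ: Mapping[str, str]) -> int:
    """Return 0 if MSFRPC_PASSWORD is present and non-empty, else 2."""
    if not environ.get("MSFRPC_PASSWORD"):
        print("MSFRPC_PASSWORD is not set", file=sys.stderr)
        return EXIT_MISSING_ENV
    return 0


if __name__ == "__main__":
    sys.exit(check_msfrpc_env(os.environ))
```

An environment built only from lab-host.env (FLEET_HOST_ID, BRIDGE) fails this check, which matches the fast rc=2 exits seen before the EnvironmentFile line was added.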
7. Fleet port formula produces privileged ports; boot timeout too tight — commit 656a015
File: orchestrator/fleet.py
Symptom A: PORT_BASE = target_port + slot * 1000 produced host ports below
1024 for slot 0 of samba_usermap_script (RPORT=139 → port 139) and
php_cgi_arg_injection (RPORT=80 → port 80). cis490 lacks
CAP_NET_BIND_SERVICE, so QEMU's SLIRP hostfwd silently failed and the
service was never reachable. All 7 slots returned rc=1 after timing out.
Symptom B: --target-boot-timeout was not passed to run_tier3_demo.py, so it
fell back to its 180 s default. With 7 concurrent VMs contending for I/O
during boot, the target services cannot reliably start within 180 s.
Fix:
- Port formula: host_port = (target_port % 1000) + 2000 + slot * 1000
  (minimum host port 2000; no collisions across module types or slots)
- Pass --target-boot-timeout 300 explicitly from the fleet runner
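The old and new formulas can be compared directly. In this sketch, demo_args is a hypothetical helper, and it assumes the fleet runner hands the host port to run_tier3_demo.py via --target-port. Collision-freedom holds only when the module target ports differ modulo 1000 (80 and 1080 would collide), and the slot must be small enough to stay below 65536.

```python
from __future__ import annotations

PRIVILEGED_MAX = 1023  # ports below 1024 need CAP_NET_BIND_SERVICE


def old_host_port(target_port: int, slot: int) -> int:
    """Formula before the fix: slot 0 reuses the guest's (privileged) port."""
    return target_port + slot * 1000


def host_port(target_port: int, slot: int) -> int:
    """Formula after the fix: never below 2000."""
    return (target_port % 1000) + 2000 + slot * 1000


def demo_args(target_port: int, slot: int, boot_timeout: int = 300) -> list[str]:
    """Hypothetical argv fragment for run_tier3_demo.py."""
    return [
        "--target-port", str(host_port(target_port, slot)),
        "--target-boot-timeout", str(boot_timeout),
    ]
```

For the samba module in slot 0 this yields port 2139, the port seen in the verification wave below. Across the 7 slots of both affected modules, all 14 host ports are distinct and unprivileged.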
Verification
After all fixes were applied:
- install-tier-3-4.sh step 4 produced episode 01KQJM5WGWC33P0QWJXRDJV1EN
- install-tier-3-4.sh step 5 staged 6 real binaries in samples/store/
- Fleet wave at 19:55:57 UTC-6 confirmed slot 0 samba probing port 2139 with a
  300 s timeout; this was the first wave to actually run to completion
Still outstanding
- Pi-side mTLS cert for k-gamingcom not yet issued (shipper in
"waiting on mTLS material" state). Blocked on Pi operator running
deploy-cis490-cert.sh k-gamingcom <wg_ip>. No action needed on lab-host side.