Dev_REL2_043026: k-gamingcom first bring-up — 3 install bugs + mTLS certs needed #10

Closed
opened 2026-04-30 16:07:14 -05:00 by elliott · 1 comment
Owner

Lab Host: k-gamingcom

Branch: Dev_REL2_043026
Date: 2026-04-30
Operator: elliott


Bugs Found and Fixed

All three bugs surfaced during the first scripts/install-lab-host.sh run on k-gamingcom. All are fixed in branch Dev_REL2_043026.

Bug 1 — pycdlib in wrong dependency group

File: pyproject.toml

pycdlib was listed under [dependency-groups] dev, but install-lab-host.sh installs only the main dependencies into the service venv. As a result, build_cidata.py failed on first install with ModuleNotFoundError: No module named pycdlib, and the in-guest agent ISO was not built.

Fix: Move pycdlib>=1.14 to the dependencies list.


Bug 2 — vm/images/ directory not created in install root before symlinking

File: scripts/install-lab-host.sh

The install script tries to symlink the Alpine qcow2 and cidata ISO into $INSTALL_ROOT/vm/images/, but that directory is never created. Because the calls are written as ln -sf ... || true, the failure is swallowed and the script continues. Result: every episode exits with rc=1 within 15 s because launch_demo.sh cannot find the image.

Fix: Add install -d -o cis490 -g cis490 -m 0755 $INSTALL_ROOT/vm/images before the symlink calls.
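The failure mode and the fix can be reproduced in isolation. The sketch below is Python rather than the shell used by install-lab-host.sh, runs entirely inside a temporary directory, and uses a stand-in file instead of a real image. It omits the ownership and mode settings (-o cis490 -g cis490 -m 0755) because changing ownership requires root. The helper name link_image is hypothetical, and a POSIX host is assumed.

```python
import os
import tempfile
from pathlib import Path


def link_image(src: Path, images_dir: Path) -> Path:
    """Create images_dir if needed, then symlink src into it (roughly `ln -sf`)."""
    images_dir.mkdir(parents=True, exist_ok=True)  # the step the script was missing
    dest = images_dir / src.name
    if dest.is_symlink() or dest.exists():
        dest.unlink()  # emulate -f: replace an existing entry
    os.symlink(src, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    image = root / "alpine.qcow2"  # empty stand-in, not a real qcow2
    image.write_bytes(b"")
    images_dir = root / "install_root" / "vm" / "images"

    # Without the parent directory, creating the symlink fails.
    try:
        os.symlink(image, images_dir / image.name)
        failed_without_dir = False
    except FileNotFoundError:
        failed_without_dir = True

    # With the directory created first, the link is made and resolves to the image.
    dest = link_image(image, images_dir)
    link_ok = dest.is_symlink() and dest.resolve() == image.resolve()

print(failed_without_dir, link_ok)
```

Unlike the original ln -sf ... || true, this version lets any remaining error propagate, so a broken install stops at the install step rather than 15 s into every episode.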


Bug 3 — cis490_doctor.py module import + subprocess CWD

File: tools/cis490_doctor.py

Two issues:

a. The doctor does from exploits.modules import load_module_configs inline (not in a subprocess), but sys.path does not include the repo root. Since package = false, nothing is installed into site-packages. The import fails with No module named exploits, giving a false red row in the report.

b. The check_end_to_end function runs python -m shipper --ping via subprocess without cwd set. The shipper module is only importable when CWD is the repo root. The subprocess inherits whatever the caller's CWD is, causing No module named shipper.

Fix a: Insert repo_root into sys.path at the start of main().
Fix b: Add a cwd parameter to the _run() helper, and pass cwd="/opt/cis490" to the _run() call for the shipper ping.
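Both fixes can be sketched in a few lines. The real signature of _run() in tools/cis490_doctor.py is not shown in this issue, so the version below is a hypothetical stand-in; fake_shipper is a throwaway module created in a temporary directory to play the role of shipper, and ensure_on_sys_path is an illustrative name for fix a.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def _run(cmd, cwd=None, timeout=30):
    """Run cmd and return (returncode, stdout, stderr); cwd=None inherits the caller's CWD."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


def ensure_on_sys_path(repo_root) -> None:
    """Fix a: make in-process imports of repo packages work without installing them."""
    repo_root = str(repo_root)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)


with tempfile.TemporaryDirectory() as repo, tempfile.TemporaryDirectory() as elsewhere:
    # Stand-in for a module that is importable only from the repo root.
    (Path(repo) / "fake_shipper.py").write_text("print('pong')\n")
    cmd = [sys.executable, "-m", "fake_shipper"]

    # Fix b: the same command fails or succeeds depending only on cwd.
    rc_wrong_cwd, _, err_wrong_cwd = _run(cmd, cwd=elsewhere)
    rc_repo_cwd, out_repo_cwd, _ = _run(cmd, cwd=repo)

print(rc_wrong_cwd, err_wrong_cwd.strip())
print(rc_repo_cwd, out_repo_cwd.strip())
```

Making cwd an explicit keyword with a None default keeps existing _run() call sites unchanged while letting the shipper ping pin its working directory.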


Remaining Items (Operator Action Required)

| Item | Status | Action |
|---|---|---|
| mTLS leaf cert for k-gamingcom | missing | On Pi: sudo /home/max/.env/wg-pki/scripts/deploy-cis490-cert.sh k-gamingcom <wg_ip> |
| collector.wg DNS | workaround applied | Added 10.100.0.1 collector.wg bootstrap.wg to /etc/hosts (permanent fix: run wg-enroll) |
| cis490-shipper service | failing | Will start cleanly once certs land at /etc/cis490/certs/ |

Verified Working

  • cis490-orchestrator runs 7 concurrent Tier-2 KVM episodes (rc=0, ~130 s each)
  • All 5 JSONL streams writing (telemetry-proc.jsonl, telemetry-qmp.jsonl, telemetry-guest.jsonl, labels.jsonl, events.jsonl), plus done.marker
  • 7 completed episodes in /var/lib/cis490/data/episodes/ after first wave
  • cis490-orchestrator systemd unit enabled with Restart=always for continuous generation
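The "completed episodes" check above can be automated along these lines. This sketch assumes the six files sit directly inside each episode directory under /var/lib/cis490/data/episodes/, which the issue does not state explicitly. The function names and the demo directory names are illustrative, and the demo runs against a temporary directory rather than the real data path.

```python
import tempfile
from pathlib import Path

# File names taken from the issue text; their location inside an episode dir is assumed.
EXPECTED_FILES = (
    "telemetry-proc.jsonl",
    "telemetry-qmp.jsonl",
    "telemetry-guest.jsonl",
    "labels.jsonl",
    "events.jsonl",
    "done.marker",
)


def missing_outputs(episode_dir: Path) -> list[str]:
    """Return the expected files that are absent from one episode directory."""
    return [name for name in EXPECTED_FILES if not (episode_dir / name).is_file()]


def completed_episodes(episodes_root: Path) -> list[Path]:
    """Return episode directories that contain every expected file."""
    return sorted(
        d for d in episodes_root.iterdir() if d.is_dir() and not missing_outputs(d)
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    complete = root / "ep-0001"    # hypothetical episode names
    incomplete = root / "ep-0002"
    for episode in (complete, incomplete):
        episode.mkdir()
        for name in EXPECTED_FILES:
            (episode / name).touch()
    (incomplete / "done.marker").unlink()  # simulate an episode that never finished

    demo_missing = missing_outputs(incomplete)
    demo_completed = [d.name for d in completed_episodes(root)]

print(demo_missing, demo_completed)
```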

Doctor Summary (post-fix)

summary: 13 ok, 4 warn, 5 fail

Remaining fails: mTLS certs (3 files missing), shipper service inactive, shipper ping FileNotFoundError. All blocked on Pi cert issuance.
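As a quick cross-check, the summary line can be parsed and compared against the failures listed above. This is a sketch only: the regular expression is fitted to the single summary line quoted in this issue, and counting each missing cert file as one fail row is an assumption about how the doctor reports them.

```python
import re

SUMMARY_RE = re.compile(r"summary:\s*(\d+) ok,\s*(\d+) warn,\s*(\d+) fail")


def parse_summary(line: str) -> dict[str, int]:
    """Parse a doctor summary line into ok/warn/fail counts."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    ok, warn, fail = (int(group) for group in match.groups())
    return {"ok": ok, "warn": warn, "fail": fail}


counts = parse_summary("summary: 13 ok, 4 warn, 5 fail")

# Assumed breakdown: 3 missing cert files + inactive shipper service + failed shipper ping.
known_fails = 3 + 1 + 1
print(counts, counts["fail"] == known_fails)
```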

elliott self-assigned this 2026-04-30 16:07:14 -05:00
Owner

Fixes merged to main in 95ac56a (Dev_REL2_043026 fast-forward).

  • pycdlib moved to main deps so install-lab-host.sh's service venv has it
  • vm/images dir is now created with install -d before the symlink calls
  • doctor inserts repo_root into sys.path, _run() takes a cwd kwarg, and the shipper --ping subprocess passes cwd=/opt/cis490

New lab-host first-boot should now show the doctor row green for tier3 module catalog and survive past the build_cidata step.

max closed this issue 2026-04-30 16:14:23 -05:00
Reference: bolyai/CIS490#10