Adds a pull-based cert distribution path so install-lab-host.sh can
fetch its own leaf cert without operator intervention. Removes the
ssh-from-Pi requirement that blocked elliott-lab.

How the chicken-and-egg gets solved: a freshly wg-enrolled lab host
already has WG access (gatekept by iptmonads at L4) and trusts the
Caddy local CA (bundled in this repo at etc/caddy-root.crt). It makes
a single TLS call to https://bootstrap.wg/v1/cert/<host_id> (no mTLS),
gets back a tar of {ca.crt, leaf.pem, leaf.key}, extracts it to
/etc/cis490/certs/, and the shipper unblocks. The trust boundary is
"reached :443 over WG"; no operator action needed.

bootstrap/
  app.py       Starlette: GET /v1/cert/{host_id}, GET /v1/health.
               Validates host_id charset, rate-limits per source IP,
               logs every mint with the X-Real-IP Caddy injects.
  __main__.py  uvicorn launcher; runs as root because the wg-pki CA
               private key is root-only.

etc/cis490-bootstrap.service
  systemd unit on 127.0.0.1:8446 with ProtectSystem=strict + narrow
  ReadWritePaths=/var/lib/wg-pki. ProtectHome=no because systemd's
  read-only mode hides /home contents (the issuer script the wrapper
  exec's lives there).

scripts/issue-cis490-client-cert-wrapper.sh
  Adapter the bootstrap service shells out to. Resolves the actual
  wg-pki issuer script across the three plausible install layouts
  (/opt/wg-pki, /home/max/wg-pki, /home/max/.env/wg-pki) so a single
  copy of the unit file works on any operator's box. Forces --out-dir
  to /var/lib/wg-pki/issued so writes stay inside the service's narrow
  ReadWritePaths.

scripts/install-lab-host.sh
  After scaffolding lab-host.toml, if /etc/cis490/certs/lab-host.pem
  is absent, curls bootstrap.wg with --cacert etc/caddy-root.crt (no
  chicken-and-egg), extracts, chowns/chmods. Skips silently if
  bootstrap.wg is unreachable so manual hand-carry remains possible.

scripts/install-receiver.sh
  Drops cis490-bootstrap.service alongside cis490-receiver and prints
  both as "enable --now" candidates. cis490-bootstrap is the thing
  that makes lab hosts self-provisioning.

etc/caddy-root.crt
  Bundled copy of wg-pki's published Caddy local CA root, so the
  bootstrap fetch can verify TLS without depending on a wg-pki clone
  that may or may not be on the lab host yet.

Verified live on the Pi:

  $ curl --cacert etc/caddy-root.crt https://bootstrap.wg/v1/cert/elliott-lab -o /tmp/x.tar
  HTTP 200 size=10240
  $ tar tf /tmp/x.tar
  ca.crt
  elliott-lab.key
  elliott-lab.pem
  $ openssl verify -CAfile … elliott-lab.pem
  /tmp/.../elliott-lab.pem: OK
  $ openssl x509 -subject … -noout
  subject=CN=elliott-lab

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
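For illustration only, the extract step on the lab-host side could look roughly like the Python sketch below. The real installer is a shell script that uses curl; this is not its code. The function name `extract_cert_bundle`, the member whitelist, and the file modes (0600 for the key, 0644 otherwise) are assumptions made for the example. The member names follow the `tar tf` listing above.

```python
from __future__ import annotations

import io
import os
import tarfile
import tempfile
from pathlib import Path


def extract_cert_bundle(tar_bytes: bytes, dest: Path, host_id: str) -> list[str]:
    """Extract exactly the three expected members into dest.

    Hypothetical helper: anything outside the whitelist is refused, so a
    malformed bundle cannot write outside dest.
    """
    expected = {"ca.crt", f"{host_id}.pem", f"{host_id}.key"}
    dest.mkdir(parents=True, exist_ok=True)
    written: list[str] = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar.getmembers():
            if member.name not in expected or not member.isfile():
                raise ValueError(f"unexpected tar member: {member.name!r}")
            src = tar.extractfile(member)
            if src is None:
                raise ValueError(f"unreadable tar member: {member.name!r}")
            # Assumed permissions: private key owner-only, certs world-readable.
            mode = 0o600 if member.name.endswith(".key") else 0o644
            target = dest / member.name
            fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
            with os.fdopen(fd, "wb") as fh:
                fh.write(src.read())
            os.chmod(target, mode)  # enforce the mode regardless of umask
            written.append(member.name)
    missing = expected - set(written)
    if missing:
        raise ValueError(f"bundle is missing: {sorted(missing)}")
    return sorted(written)


def make_demo_bundle(host_id: str) -> bytes:
    """Build a placeholder tarball with the same member names (no real keys)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("ca.crt", f"{host_id}.pem", f"{host_id}.key"):
            payload = f"placeholder for {name}\n".encode()
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    print(extract_cert_bundle(make_demo_bundle("elliott-lab"), out_dir, "elliott-lab"))
```

Whitelisting member names rather than calling `extractall` keeps the client safe even if the tarball it receives is not what it expected.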
146 lines
5.3 KiB
Python
"""``cis490-bootstrap``: auto-issue mTLS leaf certs to enrolled lab hosts.

This is the chicken-and-egg fix for first-time lab-host setup. A
freshly wg-enrolled device has WG access (and trusts the wg-pki CA)
but has no client cert yet, so it can't authenticate to the
mTLS-protected ``collector.wg``. This service exposes a *plain-TLS*
(no client-auth) endpoint that the lab host can call once during
``install-lab-host.sh`` to retrieve its leaf cert tarball.

Trust boundary: anything that reaches ``bootstrap.wg`` has already
passed iptmonads' WG-membership check at L4. No further
authentication is required for the bootstrap pull: by the time a
caller can connect at all they're a peer the operator authorized.

The privilege boundary, on the other hand, is real: minting certs
requires the wg-pki CA private key (root-only at
``/var/lib/wg-pki/cis490-client-ca/ca.key``). This service therefore
runs as root in a tight sandbox (see ``etc/cis490-bootstrap.service``)
and shells out to ``issue-cis490-client-cert.sh`` for each mint.

Endpoints:

  GET /v1/cert/{host_id}  return tarball of {ca.crt, leaf.pem, leaf.key}
                          for ``host_id``. Cached: successive calls
                          return the same bytes.
  GET /v1/health          liveness probe (no auth needed).

Each mint is logged with the source IP (after Caddy's X-Real-IP
forward) so the operator has an audit trail of which devices have
fetched which certs.
"""

from __future__ import annotations

import logging
import re
import subprocess
import time
from pathlib import Path
from typing import Awaitable, Callable

from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import FileResponse, JSONResponse, Response
from starlette.routing import Route


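log = logging.getLogger("cis490.bootstrap")


# Sane host_id charset: same rules the receiver enforces, mirrored
# here so mint requests can't smuggle path traversal in.
_HOST_ID_RE = re.compile(r"^[A-Za-z0-9_.-]{1,64}$")


def _is_valid_host_id(s: str) -> bool:
    # The charset alone still admits "." and "..", which would step
    # outside issued_root when used as a path component, so reject them.
    if s in {".", ".."}:
        return False
    # fullmatch() also rejects a trailing newline, which "$" combined
    # with match() would let through.
    return _HOST_ID_RE.fullmatch(s) is not None

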
def make_app(
    *,
    issuer_script: Path,
    issued_root: Path,
    rate_limit_window_s: float = 5.0,
) -> Starlette:
    """Build the Starlette app. Wired by the production launcher in
    ``bootstrap/__main__.py``; tests can pass synthetic paths."""
    issued_root.mkdir(parents=True, exist_ok=True)

    # Coarse per-IP rate limiter to make a casual scan annoying. Not
    # a real defense; the WG mesh is the actual perimeter.
    last_request: dict[str, float] = {}

    async def health(request: Request) -> Response:
        return JSONResponse({"status": "ok"})

    async def get_cert(request: Request) -> Response:
        host_id: str = request.path_params["host_id"]
        if not _is_valid_host_id(host_id):
            return JSONResponse({"error": "bad host_id"}, status_code=400)

        # Caddy forwards the original WG-side IP via X-Real-IP /
        # X-Forwarded-For; fall back to the direct peer if running
        # without Caddy in front (tests).
        src = (
            request.headers.get("x-real-ip")
            or (request.headers.get("x-forwarded-for") or "").split(",")[0].strip()
            or (request.client.host if request.client else "?")
        )

        now = time.monotonic()
        # A missing entry means this source has not been seen yet.
        # Defaulting to 0.0 would wrongly throttle a first request
        # whenever the monotonic clock reads less than the window.
        prev = last_request.get(src)
        if prev is not None and (now - prev) < rate_limit_window_s:
            return JSONResponse(
                {"error": "rate limited; back off"},
                status_code=429,
            )
        last_request[src] = now

        tar_path = issued_root / host_id / f"{host_id}.tar"
        if not tar_path.exists():
            log.info("minting cert for host_id=%s src=%s", host_id, src)
            try:
                subprocess.run(
                    [
                        str(issuer_script), host_id,
                        "--out-dir", str(issued_root / host_id),
                    ],
                    check=True,
                    capture_output=True,
                    text=True,
                    timeout=30,
                )
            except subprocess.CalledProcessError as e:
                log.error("issue script failed for %s: rc=%d stderr=%s",
                          host_id, e.returncode, e.stderr[:500])
                return JSONResponse(
                    {"error": "mint failed", "detail": e.stderr[:500]},
                    status_code=500,
                )
            except (OSError, subprocess.TimeoutExpired) as e:
                log.exception("issue script transport error for %s", host_id)
                return JSONResponse(
                    {"error": f"transport: {e}"},
                    status_code=500,
                )
        else:
            log.info("cache hit for host_id=%s src=%s", host_id, src)

        if not tar_path.exists():
            return JSONResponse({"error": "tarball not produced"}, status_code=500)
        return FileResponse(
            tar_path,
            media_type="application/x-tar",
            filename=f"{host_id}.tar",
            headers={
                "X-Cis490-Host-Id": host_id,
                "X-Cis490-Cert-Source-IP": src,
            },
        )

    routes = [
        Route("/v1/health", health, methods=["GET"]),
        Route("/v1/cert/{host_id}", get_cert, methods=["GET"]),
    ]
    return Starlette(routes=routes)
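
The per-source throttling and source-IP resolution inside `get_cert` can be exercised without Starlette. The sketch below pulls the same logic into two standalone pieces. `PerKeyRateLimiter` and `resolve_source_ip` are hypothetical names introduced for this example, not part of the module above, and the injectable `clock` parameter is an assumption added so the window can be tested deterministically.

```python
from __future__ import annotations

import time
from typing import Callable, Mapping, Optional


class PerKeyRateLimiter:
    """Allow at most one accepted request per key every window_s seconds."""

    def __init__(
        self,
        window_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window_s = window_s
        self._clock = clock
        self._last_accepted: dict[str, float] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        prev = self._last_accepted.get(key)
        # Only throttle keys that have been seen before; a rejected
        # request does not push the window forward.
        if prev is not None and (now - prev) < self._window_s:
            return False
        self._last_accepted[key] = now
        return True


def resolve_source_ip(headers: Mapping[str, str], peer: Optional[str]) -> str:
    """Prefer X-Real-IP, then the first X-Forwarded-For hop, then the peer.

    Assumes header names are already lower-cased, as in the handler above.
    """
    return (
        headers.get("x-real-ip")
        or (headers.get("x-forwarded-for") or "").split(",")[0].strip()
        or (peer or "?")
    )


if __name__ == "__main__":
    limiter = PerKeyRateLimiter(window_s=5.0)
    src = resolve_source_ip({"x-real-ip": "10.0.0.2"}, "127.0.0.1")
    print(src, limiter.allow(src), limiter.allow(src))
```

Passing a fake clock in tests avoids sleeping through the window. As the comment in `make_app` notes, this kind of limiter only discourages casual scanning, and the WG mesh remains the actual perimeter.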