Two correctness fixes that the §4.5 event-driven labeller surfaced:
1. tools/run_real_vm_demo.py was hardcoding a Tier-3-shaped schedule
(clean → armed → infecting → infected_running → ...) for episodes
with no exploit firing. Pre-§4.5 those episodes wrote dishonest
`infected_running` labels from the schedule clock — exactly the §3
evidence pattern. Post-§4.5 they write `failed` at the infecting
transition (the justifying exploit_fire never arrives), which is
honest about what happened but useless for training.
The honest fix: Tier-2 episodes get a clean-only schedule. All
telemetry is tagged `clean` because nothing infected anything. The
total duration matches the canonical Tier-3 schedule, so episode
lengths are comparable across tiers and the dataset carries no
length bias (§10).
Helper `tier2_schedule_from(schedule)` in orchestrator/manifest.py
derives `[("clean", total_seconds)]` from the canonical schedule.
`tier3_schedule_from(schedule)` renders the legacy
`[(name, seconds)]` shape EpisodeConfig still expects.
Tier-2 demo (run_real_vm_demo.py) now calls tier2_schedule_from.
Tier-3 demo (run_tier3_demo.py) now calls tier3_schedule_from.
Drops the hardcoded DEFAULT_SCHEDULE constants from both — the
canonical manifest is the single source of truth (§4.1).
2. .gitignore now excludes /VERSION. install-lab-host.sh stamps
/opt/cis490/VERSION so episodes can record code provenance even
when /opt/cis490 carries no .git directory. But /opt/cis490 IS
typically a git checkout on lab hosts (auto-update.sh pulls into
it), so writing VERSION leaves the working tree dirty and every
episode records meta.code_version.dirty=true. Rule 4 of the
PIPELINE.md §4.6 acceptance gate would then reject every episode
unless CIS490_ALLOW_DIRTY=1 is set, which would break the data flow.
Now VERSION is .gitignored: install-lab-host.sh stamps it, git
status doesn't see it, dirty=false, gate rule 4 passes naturally.
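The dirty-tree reasoning above can be sketched as follows. This is a minimal illustration, assuming the provenance stamp derives `dirty` from `git status --porcelain` (which lists untracked files as `??` entries and omits ignored ones). The function names `porcelain_is_dirty`, `working_tree_dirty`, and `passes_dirty_rule` are hypothetical and not taken from the repository; only the `CIS490_ALLOW_DIRTY=1` override comes from the text.

```python
import subprocess
from pathlib import Path
from typing import Mapping


def porcelain_is_dirty(porcelain_output: str) -> bool:
    """Treat any non-blank porcelain line (modified, staged, or '??') as dirty."""
    return any(line.strip() for line in porcelain_output.splitlines())


def working_tree_dirty(repo: Path) -> bool:
    """Run `git status --porcelain` in `repo`; ignored files are not listed."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return porcelain_is_dirty(result.stdout)


def passes_dirty_rule(dirty: bool, env: Mapping[str, str]) -> bool:
    """Hypothetical stand-in for gate rule 4: clean trees pass, dirty ones
    pass only when CIS490_ALLOW_DIRTY=1 is set."""
    return (not dirty) or env.get("CIS490_ALLOW_DIRTY") == "1"
```

With /VERSION untracked and not ignored, the porcelain output would contain `?? VERSION` and the tree would count as dirty. Once the path is ignored the output is empty, and the rule passes without the override.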
Together, these two changes keep the data flowing AND honest. Tier-2
episodes pass with `phases=[clean]` and every collector emitting real
rows. Tier-3 episodes (none today; the catalog is empty) will walk the
full event-driven schedule once a verified module is re-admitted.
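For reference, the two schedule helpers from fix 1 might look roughly like the sketch below. The shape of the canonical schedule is an assumption: it is modelled here as a sequence of a hypothetical `Phase` dataclass with `name` and `seconds` fields, and the durations are illustrative only. The real code in orchestrator/manifest.py may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Phase:
    """Hypothetical canonical-schedule entry; the real manifest type may differ."""
    name: str
    seconds: int


def tier3_schedule_from(schedule: Sequence[Phase]) -> list[tuple[str, int]]:
    """Render the legacy [(name, seconds)] shape, one tuple per phase."""
    return [(phase.name, phase.seconds) for phase in schedule]


def tier2_schedule_from(schedule: Sequence[Phase]) -> list[tuple[str, int]]:
    """Collapse the schedule into a single `clean` phase of equal total length."""
    total_seconds = sum(phase.seconds for phase in schedule)
    return [("clean", total_seconds)]


# Illustrative durations only, not the project's real values.
canonical = [
    Phase("clean", 60),
    Phase("armed", 30),
    Phase("infecting", 20),
    Phase("infected_running", 90),
]

tier2 = tier2_schedule_from(canonical)
tier3 = tier3_schedule_from(canonical)
```

Because both helpers read the same canonical schedule, the Tier-2 total always equals the sum of the Tier-3 phase durations, which is what keeps episode lengths comparable across tiers.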
286 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Disk images and snapshots
*.iso
*.img
*.qcow2
*.qcow2.*
*.vmdk
*.vdi
*.raw
vm/images/
vm/snapshots/

# VERSION file is install-script-stamped (provenance for episodes
# generated from /opt/cis490 install copies). Tracking it would
# trigger spurious dirty-tree state on lab hosts and reject every
# episode at the §4.6 acceptance gate.
/VERSION

# Telemetry output
data/episodes/
data/campaign.json
data/campaign_done.marker
data/outbox/
data/shipped/
*.pcap
*.pcapng

# Malware samples — NEVER commit binaries
samples/store/
*.bin
*.elf
*.exe
*.dll
*.so.malware

# Python
__pycache__/
*.py[cod]
.venv/
venv/
.pytest_cache/
.mypy_cache/
.ruff_cache/
*.egg-info/
dist/
build/

# Editor
.vscode/
.idea/
*.swp
.DS_Store

# Local secrets (never commit)
.env
.env.local
secrets.toml
*.pat
*.token