orchestrator: snapshot_load event stamped after _write_meta I/O, t_mono_ns > 1ms — test_driver_events_persist_to_events_jsonl fails #7

Closed
opened 2026-04-30 00:47:33 -05:00 by mu · 1 comment
Member

Context

Running the full test suite after pulling 637fb06 (the big Tier-3/4 push).

What happened

tests/test_exploits.py::test_driver_events_persist_to_events_jsonl failed:

assert by_name["snapshot_load"]["t_mono_ns"] < 1_000_000  # <1ms after origin
E       assert 1048451 < 1000000

Root cause

In orchestrator/episode.py, run() sets _t_mono_origin_ns then immediately calls _write_meta() (which does file I/O: json.dump → write → os.replace) before emitting snapshot_load. By the time the event is stamped, 1+ ms have already elapsed.

# before fix
self._t_mono_origin_ns = time.monotonic_ns()
meta = self._initial_meta(started_at_wall)
self._write_meta(meta)                        # file I/O eats >1ms
self.emit_event("snapshot_load", ...)         # already late

What was tried / fix applied

Moved emit_event("snapshot_load") to fire immediately after the clock origin is set, before any file I/O. This also matches the semantic intent — snapshot_load should mark the moment the episode clock starts:

# after fix
self._t_mono_origin_ns = time.monotonic_ns()
self.emit_event("snapshot_load", ...)         # at origin
meta = self._initial_meta(started_at_wall)
self._write_meta(meta)

All 106 tests pass after the change.

Suggested next step

Commit the fix to orchestrator/episode.py and close this issue.

## Context Running the full test suite after pulling `637fb06` (the big Tier-3/4 push). ## What happened `tests/test_exploits.py::test_driver_events_persist_to_events_jsonl` failed: ``` assert by_name["snapshot_load"]["t_mono_ns"] < 1_000_000 # <1ms after origin E assert 1048451 < 1000000 ``` ## Root cause In `orchestrator/episode.py`, `run()` sets `_t_mono_origin_ns` then immediately calls `_write_meta()` (which does file I/O: json.dump → write → os.replace) *before* emitting `snapshot_load`. By the time the event is stamped, 1+ ms have already elapsed. ```python # before fix self._t_mono_origin_ns = time.monotonic_ns() meta = self._initial_meta(started_at_wall) self._write_meta(meta) # file I/O eats >1ms self.emit_event("snapshot_load", ...) # already late ``` ## What was tried / fix applied Moved `emit_event("snapshot_load")` to fire immediately after the clock origin is set, before any file I/O. This also matches the semantic intent — `snapshot_load` should mark the moment the episode clock starts: ```python # after fix self._t_mono_origin_ns = time.monotonic_ns() self.emit_event("snapshot_load", ...) # at origin meta = self._initial_meta(started_at_wall) self._write_meta(meta) ``` All 106 tests pass after the change. ## Suggested next step Commit the fix to `orchestrator/episode.py` and close this issue.
max closed this issue 2026-04-30 00:49:51 -05:00
Owner

Fixed in main — see commit on this issue. Thanks for the precise diagnosis (and the patch); applied verbatim. 106/106 tests pass on Pi5 too. Looking forward to your first real episode landing in index.jsonl.

Fixed in main — see commit on this issue. Thanks for the precise diagnosis (and the patch); applied verbatim. 106/106 tests pass on Pi5 too. Looking forward to your first real episode landing in `index.jsonl`.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: bolyai/CIS490#7
No description provided.