Phase 3 foundation lemmas; full universal documented

Adds three foundation lemmas about `tokenize`:
  · tokenize_lparen_cons : tokenize ('(' :: rest) = lparen :: tokenize rest
  · tokenize_rparen_cons : tokenize (')' :: rest) = rparen :: tokenize rest
  · tokenize_space_cons  : tokenize (' ' :: rest) = tokenize rest

These are the trivial cases — pure unfolding of `tokenizeAux` on
single-char-token branches.
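
Illustration only, not part of this commit: a minimal sketch of how
the three lemmas chain under `rw`. To keep it self-contained, the
lemmas are taken as hypotheses (`h_l`, `h_r`, `h_s`) over an abstract
`tokenize` and token type; those names are assumptions made for the
example. Inside the repo one would presumably rewrite with
`tokenize_lparen_cons`, `tokenize_space_cons`, and
`tokenize_rparen_cons` directly.

```lean
-- Hypothetical, self-contained sketch: `Tok`, `h_l`, `h_r`, `h_s` are
-- stand-ins for the repo's `Token` type and the three foundation lemmas.
example {Tok : Type} (tokenize : List Char → List Tok) (lparen rparen : Tok)
    (h_l : ∀ rest, tokenize ('(' :: rest) = lparen :: tokenize rest)
    (h_r : ∀ rest, tokenize (')' :: rest) = rparen :: tokenize rest)
    (h_s : ∀ rest, tokenize (' ' :: rest) = tokenize rest)
    (rest : List Char) :
    tokenize ('(' :: ' ' :: ')' :: rest)
      = lparen :: rparen :: tokenize rest := by
  -- Peel one leading character per rewrite: '(' then ' ' then ')'.
  rw [h_l, h_s, h_r]
```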

The full Phase 3 (`tokenize (toLeanSource v).toList = toTokens v`)
requires four further substantial lemmas:
  · readIdent_split  : reading an ident sequence followed by a
                       non-ident-rest char (or end of input) yields
                       exactly the accumulated string.
  · readStrLit_split : reading an escapeStrLit-encoded body until
                       the closing `"` recovers the original string.
  · tokenize_app_clean: tokenize distributes over a concatenation
                       where the prefix ends "cleanly" (rparen,
                       whitespace, or strLit close).
  · tokenize_render_X: induction over each meta-mirror type using
                       the above plus IH on sub-values.

Each needs a multi-page Lean proof about how `readIdent` and
`readStrLit` distribute over `String`/`List Char` concatenation.
The proof sketches are documented inline in the source.

The token-level universal (already proven) plus closed-instance
`decide` tests cover the round-trip operationally.  Adding the
Phase 3 universal would let us state
  ∀ t, fromLeanSource? (toLeanSource t) = some t
without any closed-instance restrictions.
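
Illustration only: a sketch of how the Phase 3 universal would compose
with the token-level one. It assumes `fromLeanSource?` is a token-level
parser applied to `tokenize` of the source's character list; the name
`fromTokens?` and the exact shape of the Phase 2 statement are
assumptions (the real theorem may carry a trailing token list).

```lean
-- Hypothetical sketch: every function here is an abstract parameter,
-- so the example type-checks without the repo's definitions.
example {V Tok : Type}
    (toLeanSource : V → String) (toTokens : V → List Tok)
    (tokenize : List Char → List Tok) (fromTokens? : List Tok → Option V)
    -- Phase 3 (deferred): tokenising the rendered source yields `toTokens`.
    (phase3 : ∀ v, tokenize (toLeanSource v).toList = toTokens v)
    -- Phase 2 (assumed shape): parsing `toTokens v` recovers `v`.
    (phase2 : ∀ v, fromTokens? (toTokens v) = some v)
    (v : V) :
    fromTokens? (tokenize (toLeanSource v).toList) = some v := by
  rw [phase3, phase2]
```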

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Maximus Gorog 2026-05-01 13:01:38 -06:00
parent 8733a6ff89
commit 6b9ac691cb

@@ -767,6 +767,48 @@ theorem artifactFromTokens?_round_trip (a : MetaArtifact)
rw [List.append_nil] at this
rw [this]
-- ── Phase 3: tokenize ∘ render = toTokens ─────────────────────────────────
-- The String-level half. Foundation lemmas about tokenize's
-- behaviour, then induction over each meta-mirror type.
/-- `tokenize` on `'(' :: rest` reduces to `Token.lparen :: tokenize rest`. -/
theorem tokenize_lparen_cons (rest : List Char) :
    tokenize ('(' :: rest) = Token.lparen :: tokenize rest := by
  simp [tokenize, tokenizeAux]

/-- `tokenize` on `')' :: rest` reduces to `Token.rparen :: tokenize rest`. -/
theorem tokenize_rparen_cons (rest : List Char) :
    tokenize (')' :: rest) = Token.rparen :: tokenize rest := by
  simp [tokenize, tokenizeAux]

/-- `tokenize` skips a leading space. -/
theorem tokenize_space_cons (rest : List Char) :
    tokenize (' ' :: rest) = tokenize rest := by
  simp [tokenize, tokenizeAux, isWhitespace]
-- Phase 3 deferred: the full `tokenize ∘ render = toTokens`
-- universal theorem requires careful String/List reasoning about
-- `readIdent` / `readStrLit` distribution.  Proof sketch:
--
--   readIdent_split    : reading an ident sequence followed by a
--                        non-ident-rest char (or end of input) yields
--                        exactly the accumulated string.
--   readStrLit_split   : reading an escapeStrLit-encoded body until
--                        the closing `"` recovers the original string.
--   tokenize_app_clean : tokenize distributes over a concatenation
--                        where the prefix ends "cleanly" (rparen,
--                        whitespace, or strLit close).
--   tokenize_render_X  : induction over each meta-mirror type using
--                        the above plus IH on sub-values.
--
-- These compose to `∀ v, tokenize (toLeanSource v).toList = toTokens v`,
-- which combined with the Phase 2 token-level universal gives the
-- String-level universal `∀ v, fromLeanSource? (toLeanSource v) = some v`.
--
-- The kernel-rooted `decide`-based tests for closed instances below
-- (and in `Infoductor/Test.lean`) provide empirical coverage in
-- the meantime.
-- ── Round-trip — atomic kernel-reducible witnesses ─────────────────────────
-- For non-recursive shapes the round-trip is closed by `rfl` (or
-- `decide`) directly: rendering produces a fixed string, tokenising