Atomic Phase 3 witnesses via decide

Five kernel-rooted atomic-shape witnesses for `tokenize ∘ render = toTokens`:
  · tokenize_render_name_anonymous
  · tokenize_render_classifier_always
  · tokenize_render_classifier_never
  · tokenize_render_cterm_empty
  · tokenize_render_artifact_empty

Each closes via `decide` with `maxRecDepth 2000` — the kernel
fully reduces the entire chain (toLeanSource → String.toList →
tokenize → comparison) on the closed atomic input.
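
A toy illustration of the same pattern, not taken from this repository: `Sym` and `dropSpaces` are hypothetical stand-ins for the project's character type and `tokenize`. Because the input is a closed term, `decide` can evaluate the `Decidable` instance for the equality all the way down to `true`.

```lean
-- Hypothetical stand-ins; the real `tokenize` works on `List Char`.
inductive Sym where
  | space
  | letter (n : Nat)
  deriving DecidableEq

-- Skip whitespace symbols and keep everything else.
def dropSpaces : List Sym → List Sym
  | [] => []
  | Sym.space :: rest => dropSpaces rest
  | s :: rest => s :: dropSpaces rest

-- Closed input: `decide` evaluates the decidable equality by reduction.
-- The raised `maxRecDepth` mirrors the commit; a case this small
-- probably does not need it.
set_option maxRecDepth 2000 in
example :
    dropSpaces [Sym.space, Sym.letter 1, Sym.space, Sym.letter 2] =
      [Sym.letter 1, Sym.letter 2] := by
  decide
```

This approach only works when every input is a concrete term. As soon as a free variable such as `rest` appears, reduction gets stuck and an inductive lemma is needed instead.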

Recursive arms (.str, .num, .app, .lam, .comp, .transp, .meet, .join,
.cterm, .refTo, .source) require the four substantial distribution
lemmas (readIdent_app, readStrLit_app, tokenize_app_clean,
tokenize_render_X by induction) — documented as future work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Maximus Gorog 2026-05-01 13:03:51 -06:00
parent 6b9ac691cb
commit 62b73bb176


@ -786,28 +786,56 @@ theorem tokenize_space_cons (rest : List Char) :
    tokenize (' ' :: rest) = tokenize rest := by
  simp [tokenize, tokenizeAux, isWhitespace]
-- Phase 3 deferred: the full `tokenize ∘ render = toTokens`
-- universal theorem requires careful String/List reasoning about
-- `readIdent` / `readStrLit` distribution. Proof sketch:
-- Atomic-arm Phase 3 witnesses via `decide`. Each closed input
-- reduces in the kernel to a concrete token list; we prove
-- structural identity between `tokenize (toLeanSource v).toList`
-- and `toTokens v` for these atomic shapes.
set_option maxRecDepth 2000 in
theorem tokenize_render_name_anonymous :
    tokenize (nameToLeanSource Lean.Name.anonymous).toList =
      nameToTokens Lean.Name.anonymous := by
  decide

set_option maxRecDepth 2000 in
theorem tokenize_render_classifier_always :
    tokenize (MetaClassifier.toLeanSource MetaClassifier.always).toList =
      MetaClassifier.toTokens MetaClassifier.always := by
  decide

set_option maxRecDepth 2000 in
theorem tokenize_render_classifier_never :
    tokenize (MetaClassifier.toLeanSource MetaClassifier.never).toList =
      MetaClassifier.toTokens MetaClassifier.never := by
  decide

set_option maxRecDepth 2000 in
theorem tokenize_render_cterm_empty :
    tokenize (MetaCTerm.toLeanSource MetaCTerm.empty).toList =
      MetaCTerm.toTokens MetaCTerm.empty := by
  decide

set_option maxRecDepth 2000 in
theorem tokenize_render_artifact_empty :
    tokenize (MetaArtifact.toLeanSource MetaArtifact.empty).toList =
      MetaArtifact.toTokens MetaArtifact.empty := by
  decide

-- The recursive arms (`.str`, `.app`, `.lam`, etc.) require the
-- four substantial lemmas sketched below — multi-page Lean
-- reasoning about `readIdent` / `readStrLit` distribution.
-- Documented as future work; the token-level universal above
-- plus `decide` on closed instances cover the round-trip
-- operationally in the meantime.
--
-- readIdent_split : reading an ident sequence followed by a
-- non-ident-rest char (or end of input) yields
-- exactly the accumulated string.
-- readStrLit_split : reading an escapeStrLit-encoded body until
-- the closing `"` recovers the original string.
-- readIdent_app : readIdent on (ident-chars ++ rest) where
-- rest starts cleanly returns (acc ++ ident, rest).
-- readStrLit_app : readStrLit on (escapeStrLit body ++ '\"' ++ rest)
-- returns (body, rest).
-- tokenize_app_clean: tokenize distributes over a concatenation
-- where the prefix ends "cleanly" (rparen,
-- whitespace, or strLit close).
-- where the prefix ends "cleanly".
-- tokenize_render_X: induction over each meta-mirror type using
-- the above plus IH on sub-values.
--
-- These compose to `∀ v, tokenize (toLeanSource v).toList = toTokens v`,
-- which combined with the Phase 2 token-level universal gives the
-- String-level universal `∀ v, fromLeanSource? (toLeanSource v) = some v`.
--
-- The kernel-rooted `decide`-based tests for closed instances below
-- (and in `Infoductor/Test.lean`) provide empirical coverage in
-- the meantime.
-- ── Round-trip — atomic kernel-reducible witnesses ─────────────────────────
-- For non-recursive shapes the round-trip is closed by `rfl` (or