From 665046a353a0b2b1a42a955bbd0a94fed7998348 Mon Sep 17 00:00:00 2001 From: Maximus Gorog Date: Fri, 1 May 2026 13:12:38 -0600 Subject: [PATCH] Phase 3 Lean 4.30 String-internals analysis MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Attempted to prove `readIdent_app` (the key distribution lemma) but hit a wall on `String.push c ++ xs.asString = s ++ (c :: xs).asString` which is not definitionally true in Lean 4.30 (String is UTF-8 ByteArray-backed, not a List Char structure). Documents two cleaner paths forward: (a) Refactor `readIdent`/`readStrLit` to accumulate into `List Char` instead of `String`, decoupling proofs from String internals. (b) Import Mathlib's richer String API which provides the needed structural lemmas. The committed state: · Atomic Phase 3 witnesses via decide (5 theorems, kernel-rooted). · 3 foundation tokenize lemmas (lparen/rparen/space). · The full Phase 3 universal lemmas documented inline as open work, with a proof sketch for both paths. The token-level universal (already proven) plus closed-instance decide tests cover the round-trip operationally. Adding Phase 3 proper is a focused multi-day refactor. Co-Authored-By: Claude Opus 4.7 (1M context) --- Infoductor/Foundation/MetaParse.lean | 33 +++++++++++++++++----------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/Infoductor/Foundation/MetaParse.lean b/Infoductor/Foundation/MetaParse.lean index 171fdce..576a55e 100644 --- a/Infoductor/Foundation/MetaParse.lean +++ b/Infoductor/Foundation/MetaParse.lean @@ -821,21 +821,28 @@ theorem tokenize_render_artifact_empty : MetaArtifact.toTokens MetaArtifact.empty := by decide --- The recursive arms (`.str`, `.app`, `.lam`, etc.) require the --- four substantial lemmas sketched below — multi-page Lean --- reasoning about `readIdent` / `readStrLit` distribution. --- Documented as future work; the token-level universal above --- plus `decide` on closed instances cover the round-trip --- operationally in the meantime. +-- ── Phase 3 distribution lemmas — open question ──────────────────────────── +-- Proving `tokenize ∘ render = toTokens` requires four substantial +-- distribution lemmas: -- --- readIdent_app : readIdent on (ident-chars ++ rest) where --- rest starts cleanly returns (acc ++ ident, rest). --- readStrLit_app : readStrLit on (escapeStrLit body ++ '\"' ++ rest) --- returns (body, rest). +-- readIdent_app : reading an ident sequence followed by a clean +-- boundary returns the accumulated ident. +-- readStrLit_app: reading an escapeStrLit-encoded body returns +-- the original String. -- tokenize_app_clean: tokenize distributes over a concatenation --- where the prefix ends "cleanly". --- tokenize_render_X: induction over each meta-mirror type using --- the above plus IH on sub-values. +-- where the prefix ends "cleanly". +-- tokenize_render_X: induction over each meta-mirror type. +-- +-- Each requires reasoning about Lean 4.30's `String` API (UTF-8 +-- ByteArray-backed) — specifically the equation +-- `s.push c ++ xs.asString = s ++ (c :: xs).asString`, which is +-- *not* definitionally true and needs a String-internals lemma. +-- +-- Cleaner paths forward: (a) refactor `readIdent`/`readStrLit` to +-- accumulate into a `List Char` rather than a `String`, then build +-- the String at the end via `List.asString` — decouples the proof +-- from String internals; or (b) import Mathlib's richer String +-- API which provides the needed lemmas. -- ── Round-trip — atomic kernel-reducible witnesses ───────────────────────── -- For non-recursive shapes the round-trip is closed by `rfl` (or