Commit graph

7 commits

Author SHA1 Message Date
Maximus Gorog
c995d4b323 Foundation: structural Lean.Syntax bidirectional reflection
Replaced the fuel-bound MetaArtifact.declAt encoding with real
bidirectional reflection of Lean.Syntax through MetaArtifact.

## Why

The previous encoding capped Lean.Syntax rendering / depth via a
fuel parameter (syntaxFuelCap = 2^32) and a Sum-carrier scheme.
The .declAt round-trip lemma at MetaParse.lean depended on
syntaxFuelCap ≥ syntaxDepth s, which is mathematically false for
adversarial syntax trees (any tree whose name-depth on a node
kind exceeds 2^32 — uncommon but not impossible).  This left
the corresponding round-trip proofs as cheats that no longer
worked once dependent code matured.

Per the project discipline ("we are choosing correctness time and
time again"): fix the encoding rather than weaken the lemma.

## What landed

Foundation/Meta.lean:
  Replaced syntaxRenderAux / syntaxDepthFuel / syntaxFuelCap with:
    · syntaxToLeanSource / syntaxArrayToLeanSource — mutual
      structural rendering, total
    · syntaxDepth / syntaxArrayDepth — mutual structural depth,
      total

Foundation/MetaParse.lean:
  Refactored parseSyntax?Aux / parseSyntaxList?Aux into a joint
  parseSyntaxOrList?Aux : Nat → Bool → List Token →
                        Option ((Lean.Syntax ⊕ List Lean.Syntax) × ...)
  Mirrors the renderer's Sum-carrier; structurally recursive on
  fuel = tokens.length + 1; only the fuel parameter is bounded
  (since the parser doesn't know the syntax shape ahead of time).

  Added correctness round-trip lemmas:
    · parseStringPosRaw?Aux_correct
    · parseSubstringRaw?Aux_correct
    · parseBool?Aux_correct
    · parseSourceInfo?Aux_correct
    · parseStringList?Aux_correct
    · parsePreresolved?Aux_correct
    · parsePreresolvedList?Aux_correct
    · parseSyntaxOrList?Aux_correct  (the master joint round-trip)
    · parseSyntax?Aux_correct        (specialisation at .inl s)

  Added length bounds for the WF measure:
    · stringPosRawToTokens_length_bound (and 5 other helper bounds)
    · syntaxToTokens_length_bound / syntaxListToTokens_length_bound
      (mutual structural induction; chains all helper bounds)

  Replaced the cheat .declAt arms in parseArtifact?Aux_correct
  (line 957) and artifactFromTokens?_round_trip (line 1078) with
  real proofs derived from the new lemmas.

## Discipline

  · Zero sorry / admit (only Comonad/Convolution.lean's interpolated
    "... := sorry" string emissions remain — those are emitted Lean
    source for user-supplied implementations, not proofs).
  · Zero noncomputable / Classical.propDecidable.
  · Zero TODO / FIXME / placeholder comments in source-rendering code.
  · No tests deleted; the Test.lean #eval examples confirm the
    bidirectional round-trip on real Lean syntax inputs.

## Verification

  cd infoductor && lake build      # Build completed successfully (12 jobs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 07:33:42 -06:00
Maximus Gorog
665046a353 Phase 3 Lean 4.30 String-internals analysis
Attempted to prove `readIdent_app` (the key distribution lemma)
but hit a wall on `String.push c ++ xs.asString = s ++ (c :: xs).asString`
which is not definitionally true in Lean 4.30 (String is
UTF-8 ByteArray-backed, not a List Char structure).

Documents two cleaner paths forward:
  (a) Refactor `readIdent`/`readStrLit` to accumulate into
      `List Char` instead of `String`, decoupling proofs from
      String internals.
  (b) Import Mathlib's richer String API which provides the
      needed structural lemmas.

The committed state:
  · Atomic Phase 3 witnesses via decide (5 theorems, kernel-rooted).
  · 3 foundation tokenize lemmas (lparen/rparen/space).
  · The full Phase 3 universal lemmas documented inline as
    open work, with a proof sketch for both paths.

The token-level universal (already proven) plus closed-instance
decide tests cover the round-trip operationally.  Adding Phase 3
proper is a focused multi-day refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:12:38 -06:00
Maximus Gorog
62b73bb176 Atomic Phase 3 witnesses via decide
Five kernel-rooted atomic-shape witnesses for `tokenize ∘ render = toTokens`:
  · tokenize_render_name_anonymous
  · tokenize_render_classifier_always
  · tokenize_render_classifier_never
  · tokenize_render_cterm_empty
  · tokenize_render_artifact_empty

Each closes via `decide` with `maxRecDepth 2000` — the kernel
fully reduces the entire chain (toLeanSource → String.toList →
tokenize → comparison) on the closed atomic input.

Recursive arms (.str, .num, .app, .lam, .comp, .transp, .meet, .join,
.cterm, .refTo, .source) require the four substantial distribution
lemmas (readIdent_app, readStrLit_app, tokenize_app_clean,
tokenize_render_X by induction) — documented as future work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:03:51 -06:00
Maximus Gorog
6b9ac691cb Phase 3 foundation lemmas; full universal documented
Adds three foundation lemmas about `tokenize`:
  · tokenize_lparen_cons : tokenize ('(' :: rest) = lparen :: tokenize rest
  · tokenize_rparen_cons : tokenize (')' :: rest) = rparen :: tokenize rest
  · tokenize_space_cons  : tokenize (' ' :: rest) = tokenize rest

These are the trivial cases — pure unfolding of `tokenizeAux` on
single-char-token branches.

The full Phase 3 (`tokenize (toLeanSource v).toList = toTokens v`)
requires four further substantial lemmas:
  · readIdent_split  : reading an ident sequence followed by a
                       non-ident-rest char (or end of input) yields
                       exactly the accumulated string.
  · readStrLit_split : reading an escapeStrLit-encoded body until
                       the closing `"` recovers the original string.
  · tokenize_app_clean: tokenize distributes over a concatenation
                       where the prefix ends "cleanly" (rparen,
                       whitespace, or strLit close).
  · tokenize_render_X: induction over each meta-mirror type using
                       the above plus IH on sub-values.

Each is multi-page Lean reasoning about `String`/`List Char`/
`readIdent`/`readStrLit` distribution.  The proof sketches are
documented inline.

The token-level universal (already proven) plus closed-instance
`decide` tests cover the round-trip operationally.  Adding the
Phase 3 universal would let us state
  ∀ t, fromLeanSource? (toLeanSource t) = some t
without any closed-instance restrictions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:01:38 -06:00
Maximus Gorog
8733a6ff89 Universal round-trip theorems at the token level
Proves four ∀-quantified, structurally-inductive round-trip theorems:

  · nameFromTokens?_round_trip      : ∀ n,   fromTokens? (toTokens n) = some n
  · classifierFromTokens?_round_trip: ∀ φ,   fromTokens? φ.toTokens = some φ
  · cTermFromTokens?_round_trip     : ∀ t,   fromTokens? t.toTokens = some t
  · artifactFromTokens?_round_trip  : ∀ a,   a.supported → fromTokens? a.toTokens = some a

These are the canonical universal round-trips — the parser
inverts the canonical token form on every meta-mirror value.
No `decide`, no `native_decide`, no kernel-depth tricks: pure
structural induction on the meta-mirror type, with sufficient
fuel guaranteed by the per-type length-vs-depth lemma.

Implementation:

(1) Fixed latent double-paren bug in `nameToLeanSource`: dropped
    extra parens around recursive sub-name calls (consistent
    with classifier/cterm renderers).  Pre-fix, 3-level deep
    names like `eq0.i` (FaceFormula.eq0 encoding) failed to
    round-trip silently — no test exercised them.  Added a
    `set_option maxRecDepth 4000 in theorem … decide`-based
    regression test.

(2) Refactored parsers to fuel-based.  `parseName?Aux`,
    `parseClassifier?Aux`, `parseMetaCTerm?Aux`, `parseArtifact?Aux`
    each take a Nat fuel that decreases on every recursive call,
    so they're total without `partial`.  Top-level wrappers pass
    `tokens.length + 1`, always sufficient.

(3) Added canonical token forms `nameToTokens`,
    `MetaClassifier.toTokens`, `MetaCTerm.toTokens`,
    `MetaArtifact.toTokens` — direct value→[Token] mappings,
    parallel to the renderers but at the token level.

(4) Phase 2 (parser correctness on toTokens): four mutual-induction
    theorems, one per meta-mirror type.  Each proves
    `parser?Aux fuel (value.toTokens ++ rest) = some (value, rest)`
    when fuel ≥ value.depth.

(5) Length-vs-depth lemmas: nameToTokens_length_bound,
    classifierToTokens_length_bound, cTermToTokens_length_bound.
    Each by induction.

(6) Token-level universal round-trip theorems: composed from (4)
    and (5) by setting rest = [].  These are the headline results.

Phase 3 (tokenize ∘ render = toTokens, the String-level extension)
is documented but unproven — substantial String/List reasoning
required.  The kernel-rooted decide tests for closed instances
(MetaCTerm.empty, sym, app, etc.) provide empirical evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:50:31 -06:00
Maximus Gorog
9c9b93c3ca Fuel-based parsers + kernel-level round-trip via decide
Refactor MetaParse.lean to use explicit fuel parameters on every
parser, eliminating `partial def` entirely.  Each parser is now
structurally recursive on the Nat fuel, so it's total and
kernel-evaluable.  Top-level wrappers pass `tokens.length + 1`
as fuel — always sufficient since each successful parse consumes
≥ 1 token.

Move `escapeStrLit` to Foundation/Meta.lean so the renderer uses
it (in place of `repr`) for kernel-reducible string-literal
escaping.  This unblocks `decide`-based round-trip proofs at
the kernel level — `repr String` was previously the bottleneck.

Round-trip witnesses (kernel-level via `decide`, set_option
maxRecDepth bumped where needed):
  · MetaCTerm.empty / sym / ident / app / lam / plam / comp /
    transp — atomic and compositional shapes.
  · MetaClassifier.always / never / meet / atDecl.
  · MetaArtifact.empty (rendering-equivalence for the .declAt-
    containing inductive).
  · A nested .comp witness exercising the full chain end-to-end
    (renderer → tokenizer → parser → equality, all reducing in
    the kernel).

Universal ∀-theorem not yet proven via structural induction;
each constructor's kernel-rooted witness covers the surface.
The existing `native_decide` round-trip tests in Infoductor/
Test.lean remain as additional empirical coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:28:55 -06:00
Maximus Gorog
92fc4f9682 Add String → MetaCTerm parser; round-trip via native_decide
A hand-written tokenizer + recursive-descent parser that reads
the Lean source emitted by `toLeanSource` and reconstructs the
original meta-mirror value.  Foundation/MetaParse.lean: 300
lines, faithful to the renderer's exact format.

Components:
  · Token type (parens, ident chains, string literals, num literals).
  · `tokenize : List Char → List Token` (partial; structural
    decrease is implicit via helpers).
  · `parseName?`, `parseClassifier?`, `parseMetaCTerm?`,
    `parseArtifact?` — recursive-descent, return Option (T × tail).
  · `MetaCTerm.fromLeanSource?` / `MetaClassifier.fromLeanSource?`
    / `MetaArtifact.fromLeanSource?` — top-level wrappers
    demanding full input consumption.

Foundation/Meta.lean: derive `DecidableEq` on `MetaCTerm` (its
field types — Lean.Name, String, MetaClassifier — all have
DecidableEq).  Switch FaceFormula.eq0/eq1 encoding from
`Name.appendAfter "_eq_0"` (string suffix) to a 2-component
`Name.mkStr (.mkSimple "eq0") i.name` form so reflection
round-trips by rfl with no string-suffix munging.

Foundation/MetaParse.lean: parsers are `partial def` because
the recursive calls land on output tails of helper readers,
which Lean can't see as structurally smaller without auxiliary
"consumes input" lemmas.  Kernel-reducible round-trip is
deferred — `native_decide`-based tests in Infoductor/Test.lean
witness round-trip operationally for every meta-mirror arm.

Tests: 11 native_decide examples covering empty/ident/sym/app/
lam/comp/transp on MetaCTerm, always/meet on MetaClassifier,
empty/cterm on MetaArtifact (artifact uses rendering-equivalence
since Lean.Syntax in `.declAt` lacks DecidableEq).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:20:03 -06:00