Commit graph

10 commits

Author SHA1 Message Date
Maximus Gorog
c995d4b323 Foundation: structural Lean.Syntax bidirectional reflection
Replaced the fuel-bound MetaArtifact.declAt encoding with real
bidirectional reflection of Lean.Syntax through MetaArtifact.

## Why

The previous encoding capped Lean.Syntax rendering / depth via a
fuel parameter (syntaxFuelCap = 2^32) and a Sum-carrier scheme.
The .declAt round-trip lemma at MetaParse.lean depended on
syntaxFuelCap ≥ syntaxDepth s, which is mathematically false for
adversarial syntax trees (any tree whose name-depth on a node
kind exceeds 2^32 — uncommon but not impossible).  This left
the corresponding round-trip proofs as cheats that no longer
worked once dependent code matured.

Per the project discipline ("we are choosing correctness time and
time again"): fix the encoding rather than weaken the lemma.

## What landed

Foundation/Meta.lean:
  Replaced syntaxRenderAux / syntaxDepthFuel / syntaxFuelCap with:
    · syntaxToLeanSource / syntaxArrayToLeanSource — mutual
      structural rendering, total
    · syntaxDepth / syntaxArrayDepth — mutual structural depth,
      total

Foundation/MetaParse.lean:
  Refactored parseSyntax?Aux / parseSyntaxList?Aux into a joint
  parseSyntaxOrList?Aux : Nat → Bool → List Token →
                        Option ((Lean.Syntax ⊕ List Lean.Syntax) × ...)
  Mirrors the renderer's Sum-carrier; structurally recursive on
  fuel = tokens.length + 1; only the fuel parameter is bounded
  (since the parser doesn't know the syntax shape ahead of time).

  Added correctness round-trip lemmas:
    · parseStringPosRaw?Aux_correct
    · parseSubstringRaw?Aux_correct
    · parseBool?Aux_correct
    · parseSourceInfo?Aux_correct
    · parseStringList?Aux_correct
    · parsePreresolved?Aux_correct
    · parsePreresolvedList?Aux_correct
    · parseSyntaxOrList?Aux_correct  (the master joint round-trip)
    · parseSyntax?Aux_correct        (specialisation at .inl s)

  Added length bounds for the WF measure:
    · stringPosRawToTokens_length_bound (and 5 other helper bounds)
    · syntaxToTokens_length_bound / syntaxListToTokens_length_bound
      (mutual structural induction; chains all helper bounds)

  Replaced the cheat .declAt arms in parseArtifact?Aux_correct
  (line 957) and artifactFromTokens?_round_trip (line 1078) with
  real proofs derived from the new lemmas.

## Discipline

  · Zero sorry / admit (only Comonad/Convolution.lean's interpolated
    "... := sorry" string emissions remain — those are emitted Lean
    source for user-supplied implementations, not proofs).
  · Zero noncomputable / Classical.propDecidable.
  · Zero TODO / FIXME / placeholder comments in source-rendering code.
  · No tests deleted; the Test.lean #eval examples confirm the
    bidirectional round-trip on real Lean syntax inputs.

## Verification

  cd infoductor && lake build      # Build completed successfully (12 jobs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 07:33:42 -06:00
Maximus Gorog
665046a353 Phase 3 Lean 4.30 String-internals analysis
Attempted to prove `readIdent_app` (the key distribution lemma)
but hit a wall on `String.push c ++ xs.asString = s ++ (c :: xs).asString`
which is not definitionally true in Lean 4.30 (String is
UTF-8 ByteArray-backed, not a List Char structure).

Documents two cleaner paths forward:
  (a) Refactor `readIdent`/`readStrLit` to accumulate into
      `List Char` instead of `String`, decoupling proofs from
      String internals.
  (b) Import Mathlib's richer String API which provides the
      needed structural lemmas.

The committed state:
  · Atomic Phase 3 witnesses via decide (5 theorems, kernel-rooted).
  · 3 foundation tokenize lemmas (lparen/rparen/space).
  · The full Phase 3 universal lemmas documented inline as
    open work, with a proof sketch for both paths.

The token-level universal (already proven) plus closed-instance
decide tests cover the round-trip operationally.  Adding Phase 3
proper is a focused multi-day refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:12:38 -06:00
Maximus Gorog
62b73bb176 Atomic Phase 3 witnesses via decide
Five kernel-rooted atomic-shape witnesses for `tokenize ∘ render = toTokens`:
  · tokenize_render_name_anonymous
  · tokenize_render_classifier_always
  · tokenize_render_classifier_never
  · tokenize_render_cterm_empty
  · tokenize_render_artifact_empty

Each closes via `decide` with `maxRecDepth 2000` — the kernel
fully reduces the entire chain (toLeanSource → String.toList →
tokenize → comparison) on the closed atomic input.

Recursive arms (.str, .num, .app, .lam, .comp, .transp, .meet, .join,
.cterm, .refTo, .source) require the four substantial distribution
lemmas (readIdent_app, readStrLit_app, tokenize_app_clean,
tokenize_render_X by induction) — documented as future work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:03:51 -06:00
Maximus Gorog
6b9ac691cb Phase 3 foundation lemmas; full universal documented
Adds three foundation lemmas about `tokenize`:
  · tokenize_lparen_cons : tokenize ('(' :: rest) = lparen :: tokenize rest
  · tokenize_rparen_cons : tokenize (')' :: rest) = rparen :: tokenize rest
  · tokenize_space_cons  : tokenize (' ' :: rest) = tokenize rest

These are the trivial cases — pure unfolding of `tokenizeAux` on
single-char-token branches.

The full Phase 3 (`tokenize (toLeanSource v).toList = toTokens v`)
requires four further substantial lemmas:
  · readIdent_split  : reading an ident sequence followed by a
                       non-ident-rest char (or end of input) yields
                       exactly the accumulated string.
  · readStrLit_split : reading an escapeStrLit-encoded body until
                       the closing `"` recovers the original string.
  · tokenize_app_clean: tokenize distributes over a concatenation
                       where the prefix ends "cleanly" (rparen,
                       whitespace, or strLit close).
  · tokenize_render_X: induction over each meta-mirror type using
                       the above plus IH on sub-values.

Each is multi-page Lean reasoning about `String`/`List Char`/
`readIdent`/`readStrLit` distribution.  The proof sketches are
documented inline.

The token-level universal (already proven) plus closed-instance
`decide` tests cover the round-trip operationally.  Adding the
Phase 3 universal would let us state
  ∀ t, fromLeanSource? (toLeanSource t) = some t
without any closed-instance restrictions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 13:01:38 -06:00
Maximus Gorog
8733a6ff89 Universal round-trip theorems at the token level
Proves four ∀-quantified, structurally-inductive round-trip theorems:

  · nameFromTokens?_round_trip      : ∀ n,   fromTokens? (toTokens n) = some n
  · classifierFromTokens?_round_trip: ∀ φ,   fromTokens? φ.toTokens = some φ
  · cTermFromTokens?_round_trip     : ∀ t,   fromTokens? t.toTokens = some t
  · artifactFromTokens?_round_trip  : ∀ a,   a.supported → fromTokens? a.toTokens = some a

These are the canonical universal round-trips — the parser
inverts the canonical token form on every meta-mirror value.
No `decide`, no `native_decide`, no kernel-depth tricks: pure
structural induction on the meta-mirror type, with sufficient
fuel guaranteed by the per-type length-vs-depth lemma.

Implementation:

(1) Fixed latent double-paren bug in `nameToLeanSource`: dropped
    extra parens around recursive sub-name calls (consistent
    with classifier/cterm renderers).  Pre-fix, 3-level deep
    names like `eq0.i` (FaceFormula.eq0 encoding) failed to
    round-trip silently — no test exercised them.  Added a
    `set_option maxRecDepth 4000 in theorem … decide`-based
    regression test.

(2) Refactored parsers to fuel-based.  `parseName?Aux`,
    `parseClassifier?Aux`, `parseMetaCTerm?Aux`, `parseArtifact?Aux`
    each take a Nat fuel that decreases on every recursive call,
    so they're total without `partial`.  Top-level wrappers pass
    `tokens.length + 1`, always sufficient.

(3) Added canonical token forms `nameToTokens`,
    `MetaClassifier.toTokens`, `MetaCTerm.toTokens`,
    `MetaArtifact.toTokens` — direct value→[Token] mappings,
    parallel to the renderers but at the token level.

(4) Phase 2 (parser correctness on toTokens): four mutual-induction
    theorems, one per meta-mirror type.  Each proves
    `parser?Aux fuel (value.toTokens ++ rest) = some (value, rest)`
    when fuel ≥ value.depth.

(5) Length-vs-depth lemmas: nameToTokens_length_bound,
    classifierToTokens_length_bound, cTermToTokens_length_bound.
    Each by induction.

(6) Token-level universal round-trip theorems: composed from (4)
    and (5) by setting rest = [].  These are the headline results.

Phase 3 (tokenize ∘ render = toTokens, the String-level extension)
is documented but unproven — substantial String/List reasoning
required.  The kernel-rooted decide tests for closed instances
(MetaCTerm.empty, sym, app, etc.) provide empirical evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:50:31 -06:00
Maximus Gorog
9c9b93c3ca Fuel-based parsers + kernel-level round-trip via decide
Refactor MetaParse.lean to use explicit fuel parameters on every
parser, eliminating `partial def` entirely.  Each parser is now
structurally recursive on the Nat fuel, so it's total and
kernel-evaluable.  Top-level wrappers pass `tokens.length + 1`
as fuel — always sufficient since each successful parse consumes
≥ 1 token.

Move `escapeStrLit` to Foundation/Meta.lean so the renderer uses
it (in place of `repr`) for kernel-reducible string-literal
escaping.  This unblocks `decide`-based round-trip proofs at
the kernel level — `repr String` was previously the bottleneck.

Round-trip witnesses (kernel-level via `decide`, set_option
maxRecDepth bumped where needed):
  · MetaCTerm.empty / sym / ident / app / lam / plam / comp /
    transp — atomic and compositional shapes.
  · MetaClassifier.always / never / meet / atDecl.
  · MetaArtifact.empty (rendering-equivalence for the .declAt-
    containing inductive).
  · A nested .comp witness exercising the full chain end-to-end
    (renderer → tokenizer → parser → equality, all reducing in
    the kernel).

Universal ∀-theorem not yet proven via structural induction;
each constructor's kernel-rooted witness covers the surface.
The existing `native_decide` round-trip tests in Infoductor/
Test.lean remain as additional empirical coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:28:55 -06:00
Maximus Gorog
92fc4f9682 Add String → MetaCTerm parser; round-trip via native_decide
A hand-written tokenizer + recursive-descent parser that reads
the Lean source emitted by `toLeanSource` and reconstructs the
original meta-mirror value.  Foundation/MetaParse.lean: 300
lines, faithful to the renderer's exact format.

Components:
  · Token type (parens, ident chains, string literals, num literals).
  · `tokenize : List Char → List Token` (partial; structural
    decrease is implicit via helpers).
  · `parseName?`, `parseClassifier?`, `parseMetaCTerm?`,
    `parseArtifact?` — recursive-descent, return Option (T × tail).
  · `MetaCTerm.fromLeanSource?` / `MetaClassifier.fromLeanSource?`
    / `MetaArtifact.fromLeanSource?` — top-level wrappers
    demanding full input consumption.

Foundation/Meta.lean: derive `DecidableEq` on `MetaCTerm` (its
field types — Lean.Name, String, MetaClassifier — all have
DecidableEq).  Switch FaceFormula.eq0/eq1 encoding from
`Name.appendAfter "_eq_0"` (string suffix) to a 2-component
`Name.mkStr (.mkSimple "eq0") i.name` form so reflection
round-trips by rfl with no string-suffix munging.

Foundation/MetaParse.lean: parsers are `partial def` because
the recursive calls land on output tails of helper readers,
which Lean can't see as structurally smaller without auxiliary
"consumes input" lemmas.  Kernel-reducible round-trip is
deferred — `native_decide`-based tests in Infoductor/Test.lean
witness round-trip operationally for every meta-mirror arm.

Tests: 11 native_decide examples covering empty/ident/sym/app/
lam/comp/transp on MetaCTerm, always/meet on MetaClassifier,
empty/cterm on MetaArtifact (artifact uses rendering-equivalence
since Lean.Syntax in `.declAt` lacks DecidableEq).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:20:03 -06:00
Maximus Gorog
ae675a1eed Add MetaCTerm/MetaClassifier/MetaArtifact source renderer
Each meta-mirror value renders to a Lean expression that, when
elaborated, reconstructs the same value.  The bridge's loop now
closes at the source level: an Edit's .cterm content can be
written to a .lean file and parsed back via Lean's own
parser/elaborator — no custom parser required.

  · nameToLeanSource — Lean.Name → constructor-call source.
  · MetaClassifier.toLeanSource — lattice → Infoductor.MetaClassifier.* calls.
  · MetaCTerm.toLeanSource — full structural mirror → constructor source.
  · MetaArtifact.toLeanSource — artifact layer wrapping the above.

Foundation.Restructure's EditOp.apply for .cterm now uses
MetaCTerm.toLeanSource instead of toString (repr t).  The headless
interpreter writes valid Lean source.

The recursive helper for Lean.Name lives at Infoductor.nameToLeanSource
rather than Lean.Name.toLeanSource — defining a Lean.Name.* function
inside `namespace Infoductor` would otherwise create an Infoductor.Lean
namespace and shadow the global Lean library.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:05:43 -06:00
Maximus Gorog
511d564a55 Add MetaCTerm structural mirror, .cterm artifact arm, MetaPosition.binder
Foundation.Meta gains an 8-constructor MetaCTerm inductive that mirrors
the cubical CTerm's generic shape (ident/sym/app/lam/plam plus dedicated
comp/transp arms; cubical-specific operators encode via .ident-headed
.app chains). MetaArtifact picks up a .cterm arm for structure-preserving
artifact content. MetaPosition gets an Option Lean.Name binder field so
the dim-binder of a comp/transp can be threaded structurally instead of
folded into the classifier.

These additions back the cubical-bridge's Embed.lean overhaul: a real
coreflection between the cubical universe (CType/CTerm/FaceFormula/
DimExpr) and the meta-mirror, with partial inverses and per-constructor
round-trip theorems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 09:08:01 -06:00
Maximus Gorog
ba0a49823b Phase A: Foundation lib — meta-mirror, Edit/Context, restructure, registries
Initial commit.  Six modules extracted from
cubical-transport-hott-lean4/CubicalTransport/Algebra/ and
re-namespaced as `Infoductor`:

- Foundation/Meta.lean         — MetaCType, MetaClassifier,
                                 MetaArtifact, MetaPosition
                                 (with manual mutual decEq for
                                 MetaClassifier's recursive
                                 lattice arms)
- Foundation/Edit.lean         — Edit monad + Context comonad +
                                 distributive law +
                                 MetaClassifier.atPosition
                                 with lattice laws as theorems
- Foundation/Restructure.lean  — universal `restructure` (5-field
                                 comp-shaped operation), 6 frozen
                                 aliases, headless apply +
                                 brokenRefs / selfConsistent /
                                 Edit.guarded
- Foundation/MacroAlias.lean   — @[macroAlias] attribute +
                                 EnvExtension registry
- Foundation/MetaPath.lean     — @[metaPath src target]
                                 attribute + registry +
                                 findPathsFromSource/ToTarget
- Foundation/Methodology.lean  — @[methodology classifier]
                                 attribute + registry +
                                 deriveByTransport (walks
                                 metaPath registry) +
                                 tryEntryAsClosed primitive +
                                 cubical_search reference
                                 demonstrator

Builds clean on lean4 v4.30.0-rc2, ten oleans.  No deps beyond
Lean stdlib.  Sub-libs Tactics / Pantograph / Runner are planned
under the same lakefile.toml but not yet implemented.

Pairs with Pantograph as the conductor sits atop the pantograph
on an electric train.  "Info-ductor" — conducts information
(structure / classifications / methodology) through a Lean
codebase.

Forgejo: http://maxgit.wg:3000/max/infoductor

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:20:36 -06:00