infoductor/Infoductor/Foundation/Meta.lean
Maximus Gorog 8733a6ff89 Universal round-trip theorems at the token level
Proves four ∀-quantified, structurally-inductive round-trip theorems:

  · nameFromTokens?_round_trip      : ∀ n,   fromTokens? (toTokens n) = some n
  · classifierFromTokens?_round_trip: ∀ φ,   fromTokens? φ.toTokens = some φ
  · cTermFromTokens?_round_trip     : ∀ t,   fromTokens? t.toTokens = some t
  · artifactFromTokens?_round_trip  : ∀ a,   a.supported → fromTokens? a.toTokens = some a

These are the canonical universal round-trips — the parser
inverts the canonical token form on every meta-mirror value.
No `decide`, no `native_decide`, no kernel-depth tricks: pure
structural induction on the meta-mirror type, with sufficient
fuel guaranteed by the per-type length-vs-depth lemma.

Implementation:

(1) Fixed latent double-paren bug in `nameToLeanSource`: dropped
    extra parens around recursive sub-name calls (consistent
    with classifier/cterm renderers).  Pre-fix, 3-level deep
    names like `eq0.i` (FaceFormula.eq0 encoding) failed to
    round-trip silently — no test exercised them.  Added a
    `set_option maxRecDepth 4000 in theorem … decide`-based
    regression test.

(2) Refactored parsers to fuel-based.  `parseName?Aux`,
    `parseClassifier?Aux`, `parseMetaCTerm?Aux`, `parseArtifact?Aux`
    each take a Nat fuel that decreases on every recursive call,
    so they're total without `partial`.  Top-level wrappers pass
    `tokens.length + 1`, always sufficient.

(3) Added canonical token forms `nameToTokens`,
    `MetaClassifier.toTokens`, `MetaCTerm.toTokens`,
    `MetaArtifact.toTokens` — direct value→[Token] mappings,
    parallel to the renderers but at the token level.

(4) Phase 2 (parser correctness on toTokens): four mutual-induction
    theorems, one per meta-mirror type.  Each proves
    `parser?Aux fuel (value.toTokens ++ rest) = some (value, rest)`
    when fuel ≥ value.depth.

(5) Length-vs-depth lemmas: nameToTokens_length_bound,
    classifierToTokens_length_bound, cTermToTokens_length_bound.
    Each by induction.

(6) Token-level universal round-trip theorems: composed from (4)
    and (5) by setting rest = [].  These are the headline results.

Phase 3 (tokenize ∘ render = toTokens, the String-level extension)
is documented but unproven — substantial String/List reasoning
required.  The kernel-rooted decide tests for closed instances
(MetaCTerm.empty, sym, app, etc.) provide empirical evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:50:31 -06:00

425 lines
20 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

/-
Infoductor.Foundation.Meta — meta-mirror types
=================================================
Phase A of `docs/ALGEBRA_PLAN.md`. Defines the meta-level
vocabulary on which the universal `restructure` macro is built —
the meta-mirror of the cubical AST (`CType`, `FaceFormula`,
`CTerm`).
These types are pure data; no semantic content is committed in
this module. Their semantics — how `restructure` consumes them
to emit `MakeEditLinkProps.ofReplaceRange` calls in the `Edit`
monad — lives in `Algebra/Restructure.lean`.
| Cubical-AST type | Meta-mirror | Role |
|------------------|-------------|--------------------------------|
| `CType` | `MetaCType` | meta-types of source artifacts |
| `FaceFormula` | `MetaClassifier` | "where in the codebase" |
| `CTerm` | `MetaArtifact` | the new content / fallback |
The point: every restructuring operation has the *same five
fields* as `comp i A φ u t`, with each field promoted from the
cubical CTerm world to the meta-Lean-source world.
See `docs/ALGEBRA_PLAN.md` §2.2 for the formal table.
-/
import Lean.Syntax
import Lean.Data.Name
namespace Infoductor
-- ── MetaCType — the meta-mirror of `CType` ──────────────────────────────────
-- Every artifact in a Lean source has a meta-type that classifies its
-- structural role. These are the "cubical-types" of the source-code
-- universe: a theorem is an artifact whose meta-type is `theorem`,
-- a `def` is an artifact whose meta-type is `definition`, etc.
/-- Meta-mirror of `CType`: classifies the structural role of an
artifact in Lean source code.
Each constructor names a kind of declaration / structural unit
that `restructure` can operate on. The list is closed: extending
it requires updating the universal macro to handle the new
artifact kind.
See `docs/ALGEBRA_PLAN.md` §2.2 for the design rationale. -/
inductive MetaCType where
/-- A `theorem` declaration: `theorem foo : T := proof`. -/
| theorem_ : MetaCType
/-- A `def` declaration: `def foo := body`. -/
| definition : MetaCType
/-- An `instance` declaration. -/
| instance_ : MetaCType
/-- A `structure` declaration. -/
| structure_ : MetaCType
/-- A Lean `inductive` declaration. -/
| inductive_ : MetaCType
/-- An entire file. -/
| file : MetaCType
/-- A `namespace` block. -/
| namespace_ : MetaCType
/-- A `class` declaration. -/
| class_ : MetaCType
/-- A logical "set of classifiers" — abstract collection used by
methodology dispatch. -/
| classifierSet : MetaCType
/-- An edge in the dependency graph between declarations. Used
by `transp`-along-MetaPath to identify what gets reorganised
together. -/
| dependencyEdge : MetaCType
/-- A custom-attribute annotation site (e.g., the position of a
`@[simp]` or `@[methodology]` annotation on a declaration). -/
| attributeSite : MetaCType
deriving Repr, Inhabited, DecidableEq
-- ── MetaClassifier — the meta-mirror of `FaceFormula` ──────────────────────
-- "Where in the codebase does this restructuring apply?" The face
-- lattice on dimensions has its analogue at the meta level: face = a
-- predicate over dimension-coordinates picking out a sub-cube; meta-
-- classifier = a predicate over codebase-coordinates picking out a
-- sub-region.
/-- Meta-mirror of `FaceFormula`: a predicate over codebase
positions. Combined via `meet` and `join` to form composite
"where in the codebase" predicates.
The lattice structure mirrors the cubical face lattice:
`meet` is intersection, `join` is union; `always` is the top
(`.top` analogue), `never` is the bottom (`.bot` analogue). -/
inductive MetaClassifier where
/-- "Everywhere" — the top of the meta-classifier lattice;
analogue of `FaceFormula.top`. -/
| always : MetaClassifier
/-- "Nowhere" — the bottom; analogue of `FaceFormula.bot`. -/
| never : MetaClassifier
/-- "At this declaration"; analogue of `FaceFormula.eq0`/`eq1`
on a specific dim coordinate. -/
| atDecl : Lean.Name → MetaClassifier
/-- "In this file." -/
| inFile : String → MetaClassifier
/-- "Under this attribute" — anywhere a particular attribute is
attached. -/
| underAttribute : Lean.Name → MetaClassifier
/-- "In any declaration that depends on this one." -/
| dependencyOf : Lean.Name → MetaClassifier
/-- "In the same namespace as this name." -/
| inNamespace : Lean.Name → MetaClassifier
/-- Conjunction; analogue of `FaceFormula.meet`. -/
| meet : MetaClassifier → MetaClassifier → MetaClassifier
/-- Disjunction; analogue of `FaceFormula.join`. -/
| join : MetaClassifier → MetaClassifier → MetaClassifier
deriving Repr, Inhabited
-- DecidableEq for MetaClassifier — manual mutual decision because
-- the type is recursive (meet/join arms) and mixes Lean.Name / String
-- carriers.
def MetaClassifier.decEq : (a b : MetaClassifier) → Decidable (a = b)
| .always, .always => isTrue rfl
| .never, .never => isTrue rfl
| .atDecl n, .atDecl m =>
if h : n = m then isTrue (by rw [h])
else isFalse (fun heq => h (by cases heq; rfl))
| .inFile s, .inFile t =>
if h : s = t then isTrue (by rw [h])
else isFalse (fun heq => h (by cases heq; rfl))
| .underAttribute n, .underAttribute m =>
if h : n = m then isTrue (by rw [h])
else isFalse (fun heq => h (by cases heq; rfl))
| .dependencyOf n, .dependencyOf m =>
if h : n = m then isTrue (by rw [h])
else isFalse (fun heq => h (by cases heq; rfl))
| .inNamespace n, .inNamespace m =>
if h : n = m then isTrue (by rw [h])
else isFalse (fun heq => h (by cases heq; rfl))
| .meet a₁ b₁, .meet a₂ b₂ =>
match MetaClassifier.decEq a₁ a₂, MetaClassifier.decEq b₁ b₂ with
| isTrue ha, isTrue hb => isTrue (by rw [ha, hb])
| isFalse h, _ => isFalse (fun heq => h (by cases heq; rfl))
| _, isFalse h => isFalse (fun heq => h (by cases heq; rfl))
| .join a₁ b₁, .join a₂ b₂ =>
match MetaClassifier.decEq a₁ a₂, MetaClassifier.decEq b₁ b₂ with
| isTrue ha, isTrue hb => isTrue (by rw [ha, hb])
| isFalse h, _ => isFalse (fun heq => h (by cases heq; rfl))
| _, isFalse h => isFalse (fun heq => h (by cases heq; rfl))
| .always, .never | .always, .atDecl _ | .always, .inFile _
| .always, .underAttribute _ | .always, .dependencyOf _
| .always, .inNamespace _ | .always, .meet _ _ | .always, .join _ _
| .never, .always | .never, .atDecl _ | .never, .inFile _
| .never, .underAttribute _ | .never, .dependencyOf _
| .never, .inNamespace _ | .never, .meet _ _ | .never, .join _ _
| .atDecl _, .always | .atDecl _, .never | .atDecl _, .inFile _
| .atDecl _, .underAttribute _ | .atDecl _, .dependencyOf _
| .atDecl _, .inNamespace _ | .atDecl _, .meet _ _ | .atDecl _, .join _ _
| .inFile _, .always | .inFile _, .never | .inFile _, .atDecl _
| .inFile _, .underAttribute _ | .inFile _, .dependencyOf _
| .inFile _, .inNamespace _ | .inFile _, .meet _ _ | .inFile _, .join _ _
| .underAttribute _, .always | .underAttribute _, .never
| .underAttribute _, .atDecl _ | .underAttribute _, .inFile _
| .underAttribute _, .dependencyOf _ | .underAttribute _, .inNamespace _
| .underAttribute _, .meet _ _ | .underAttribute _, .join _ _
| .dependencyOf _, .always | .dependencyOf _, .never
| .dependencyOf _, .atDecl _ | .dependencyOf _, .inFile _
| .dependencyOf _, .underAttribute _ | .dependencyOf _, .inNamespace _
| .dependencyOf _, .meet _ _ | .dependencyOf _, .join _ _
| .inNamespace _, .always | .inNamespace _, .never
| .inNamespace _, .atDecl _ | .inNamespace _, .inFile _
| .inNamespace _, .underAttribute _ | .inNamespace _, .dependencyOf _
| .inNamespace _, .meet _ _ | .inNamespace _, .join _ _
| .meet _ _, .always | .meet _ _, .never | .meet _ _, .atDecl _
| .meet _ _, .inFile _ | .meet _ _, .underAttribute _
| .meet _ _, .dependencyOf _ | .meet _ _, .inNamespace _
| .meet _ _, .join _ _
| .join _ _, .always | .join _ _, .never | .join _ _, .atDecl _
| .join _ _, .inFile _ | .join _ _, .underAttribute _
| .join _ _, .dependencyOf _ | .join _ _, .inNamespace _
| .join _ _, .meet _ _ =>
isFalse (fun heq => by cases heq)
instance : DecidableEq MetaClassifier := MetaClassifier.decEq
-- ── MetaCTerm — the structural meta-mirror of `CTerm` ──────────────────────
-- A faithful AST-level mirror of the cubical term language. Cubical
-- `CTerm` constructors map onto `MetaCTerm` constructors structurally,
-- so embedding round-trips through a partial inverse (`toCTerm?`).
-- See `Embed.lean` in the cubical bridge for the concrete maps and the
-- coreflection theorems.
--
-- Generative basis (8 constructors):
-- · `ident n` — closed reference to a Lean declaration.
-- · `sym s` — open leaf (string-named local or dim-binder).
-- · `app f a` — universal application; encodes `papp`, `fst`,
-- `snd`, `pair`, `glueIn`, `unglue`, schema
-- ctors, and eliminators via `.ident`-headed
-- chains.
-- · `lam x t` — term-level lambda (binds `x`).
-- · `plam i t` — path-level lambda (binds dim-name `i`).
-- · `comp i A φ u t` — five-field composition, mirroring `compⁱ A φ u t`.
-- · `transp i A φ t` — four-field transport, mirroring `transpⁱ A φ t`.
-- · `empty` — placeholder for deletion / missing content.
--
-- The `comp` and `transp` arms keep their face-formula slot as a
-- `MetaClassifier` (rather than a nested `MetaCTerm` encoding) so that
-- classifier-at-position semantics flow through unchanged from
-- `Edit.lean`.
inductive MetaCTerm where
/-- A reference to a Lean declaration — the closed leaf shape. -/
| ident : Lean.Name → MetaCTerm
/-- A symbolic name — open leaf (string-named local variable or
dim-binder). Distinct from `ident` so the round-trip through
`CTerm.var` preserves the original string. -/
| sym : String → MetaCTerm
/-- Application — the universal combinator. `papp`, `fst`, `snd`,
`pair`, schema ctors, and eliminators all encode as `.ident`-
headed `.app` chains. -/
| app : MetaCTerm → MetaCTerm → MetaCTerm
/-- Term-level lambda — binds `x` in the body. -/
| lam : String → MetaCTerm → MetaCTerm
/-- Path-level lambda — binds a dim-name `i`. Distinct from `lam`
because dim-binders inhabit a different kind of universe; the
structural distinction is preserved here for downstream
analyses that care. -/
| plam : String → MetaCTerm → MetaCTerm
/-- Five-field composition — mirrors `compⁱ A φ u t`. The dim-
binder is stored as a string (the `DimVar.name`); the body and
the witness are `MetaCTerm`s; the face is a `MetaClassifier`. -/
| comp : String → MetaCTerm → MetaClassifier
→ MetaCTerm → MetaCTerm → MetaCTerm
/-- Four-field transport — mirrors `transpⁱ A φ t`. -/
| transp : String → MetaCTerm → MetaClassifier
→ MetaCTerm → MetaCTerm
/-- The empty meta-term — placeholder for deletion or missing
content. Distinct from `MetaArtifact.empty` (which is at the
artifact level). -/
| empty : MetaCTerm
deriving Repr, DecidableEq
instance : Inhabited MetaCTerm := ⟨MetaCTerm.empty⟩
-- ── MetaArtifact — the meta-mirror of `CTerm` ───────────────────────────────
-- The "content" half of restructure. An artifact is what gets put in
-- the source: a chunk of raw text, a parsed syntax tree, a reference
-- to another decl, or "remove this" (the empty artifact).
/-- Meta-mirror of `CTerm`: the content placed at a meta-position by a
restructuring operation.
Four shapes: raw source text (string), parsed syntax (Lean.Syntax),
a reference to an existing decl by `Name`, and the empty artifact
(used for deletion).
`Lean.Syntax` is opaque-by-default in Lean 4 — its internal
structure is large. We don't derive `DecidableEq` on `MetaArtifact`
because comparing arbitrary Syntax trees structurally is heavy and
not needed by `restructure`'s dispatch logic. Use `MetaArtifact.toString`
for printable comparisons. -/
inductive MetaArtifact where
/-- Raw Lean source text (will be parsed at apply time). -/
| source : String → MetaArtifact
/-- A parsed Lean syntax tree. -/
| declAt : Lean.Syntax → MetaArtifact
/-- A structural meta-term — the AST-level representation of a
cubical-flavoured artifact. Preserves the cubical shape so
downstream tooling can decompose / reflect it without parsing.
The structural mirror is a real coreflection: every cubical
`CTerm` embeds losslessly via `CTerm.toMetaCTerm` (in the
cubical bridge) and reflects back via `MetaCTerm.toCTerm?`. -/
| cterm : MetaCTerm → MetaArtifact
/-- A reference to an existing declaration by name; resolved at
apply time. -/
| refTo : Lean.Name → MetaArtifact
/-- The empty artifact — "remove this position." -/
| empty : MetaArtifact
deriving Inhabited
/-- A printable summary of an artifact, useful for diagnostics and
widget rendering. -/
def MetaArtifact.toString : MetaArtifact → String
| .source s => s!"source({s})"
| .declAt _ => "declAt(<syntax>)"
| .cterm t => s!"cterm({repr t})"
| .refTo n => s!"refTo({n})"
| .empty => "empty"
instance : ToString MetaArtifact := ⟨MetaArtifact.toString⟩
-- ── Source rendering — meta-mirrors → valid Lean source ─────────────────────
-- The poetic move: each meta-mirror value renders to a Lean
-- expression that, when elaborated, reconstructs the *same* value.
-- This closes the bridge's loop at the source level: an `Edit`'s
-- `.cterm` content can be written to a `.lean` file and parsed
-- back via Lean's own parser/elaborator without us having to
-- write a parser. The renderer is fully recursive over the
-- meta-mirror's structure, faithful on every constructor.
/-- Escape a `String` as a Lean string literal (with surrounding
quotes and `"` / `\` escaped). Kernel-reducible alternative
to `repr` for strings — built from `String.toList` /
`String.join` / `Char.toString`, all of which the kernel
unfolds. Inverse of the parser's `readStrLit` on the inner
body. -/
def escapeStrLit (s : String) : String :=
let body := String.join (s.toList.map (fun c =>
match c with
| '"' => "\\\""
| '\\' => "\\\\"
| c => c.toString))
"\"" ++ body ++ "\""
/-- Render a `Lean.Name` as Lean source that reconstructs it via
the `Lean.Name` constructors. Faithful on `.anonymous`,
`.str`, and `.num` arms — the `.str` arm uses `escapeStrLit`
for kernel-reducible string-literal escaping.
Local helper inside `namespace Infoductor` (full name
`Infoductor.nameToLeanSource`) to avoid shadowing the global
`Lean` library by a `def Lean.Name.…` here.
Recursive sub-calls render *without* extra parens — the
`.str`/`.num` arms each contribute their own outer
parentheses, so wrapping the recursive call would produce
spurious double-parens at depths ≥ 2. Consistent with
`MetaClassifier.toLeanSource` and `MetaCTerm.toLeanSource`,
which also don't double-wrap. -/
def nameToLeanSource : Lean.Name → String
| .anonymous => "Lean.Name.anonymous"
| .str p s =>
s!"(Lean.Name.str {nameToLeanSource p} {escapeStrLit s})"
| .num p n =>
s!"(Lean.Name.num {nameToLeanSource p} {n})"
/-- Render a `MetaClassifier` as Lean source. Each lattice arm
becomes a constructor call in the `Infoductor.MetaClassifier`
namespace; recursive arms (`meet`, `join`) render their
children parenthesised. -/
def MetaClassifier.toLeanSource : MetaClassifier → String
| .always => "Infoductor.MetaClassifier.always"
| .never => "Infoductor.MetaClassifier.never"
| .atDecl n =>
s!"(Infoductor.MetaClassifier.atDecl {nameToLeanSource n})"
| .inFile s =>
s!"(Infoductor.MetaClassifier.inFile {escapeStrLit s})"
| .underAttribute n =>
s!"(Infoductor.MetaClassifier.underAttribute {nameToLeanSource n})"
| .dependencyOf n =>
s!"(Infoductor.MetaClassifier.dependencyOf {nameToLeanSource n})"
| .inNamespace n =>
s!"(Infoductor.MetaClassifier.inNamespace {nameToLeanSource n})"
| .meet a b =>
s!"(Infoductor.MetaClassifier.meet {toLeanSource a} {toLeanSource b})"
| .join a b =>
s!"(Infoductor.MetaClassifier.join {toLeanSource a} {toLeanSource b})"
/-- Render a `MetaCTerm` as Lean source that reconstructs it via
the `Infoductor.MetaCTerm` constructors. Every arm renders
to a parenthesised constructor call; sub-`MetaCTerm`s recurse,
sub-`MetaClassifier`s use `MetaClassifier.toLeanSource`,
sub-`Lean.Name`s use `nameToLeanSource`. -/
def MetaCTerm.toLeanSource : MetaCTerm → String
| .ident n =>
s!"(Infoductor.MetaCTerm.ident {nameToLeanSource n})"
| .sym s =>
s!"(Infoductor.MetaCTerm.sym {escapeStrLit s})"
| .app f a =>
s!"(Infoductor.MetaCTerm.app {toLeanSource f} {toLeanSource a})"
| .lam x t =>
s!"(Infoductor.MetaCTerm.lam {escapeStrLit x} {toLeanSource t})"
| .plam i t =>
s!"(Infoductor.MetaCTerm.plam {escapeStrLit i} {toLeanSource t})"
| .comp s A φ u t =>
s!"(Infoductor.MetaCTerm.comp {escapeStrLit s} {toLeanSource A} \
{MetaClassifier.toLeanSource φ} {toLeanSource u} {toLeanSource t})"
| .transp s A φ t =>
s!"(Infoductor.MetaCTerm.transp {escapeStrLit s} {toLeanSource A} \
{MetaClassifier.toLeanSource φ} {toLeanSource t})"
| .empty => "Infoductor.MetaCTerm.empty"
/-- Render a `MetaArtifact` as Lean source. The structural arm
(`cterm`) recurses through `MetaCTerm.toLeanSource`; `source`
arms wrap the raw string in a `.source` constructor; `declAt`
cannot be source-rendered (parsed Syntax is opaque) and
yields a placeholder. -/
def MetaArtifact.toLeanSource : MetaArtifact → String
| .source s => s!"(Infoductor.MetaArtifact.source {escapeStrLit s})"
| .declAt _ => "/- declAt with opaque Syntax — not source-renderable -/"
| .cterm m =>
s!"(Infoductor.MetaArtifact.cterm {MetaCTerm.toLeanSource m})"
| .refTo n =>
s!"(Infoductor.MetaArtifact.refTo {nameToLeanSource n})"
| .empty => "Infoductor.MetaArtifact.empty"
-- ── MetaPosition — where an artifact lives in source ───────────────────────
-- The first field of `restructure i (Context = MetaCType) φ witness fallback`
-- — analogue of the dim-binder `i` in `comp i A φ u t`.
/-- The position of an artifact within a Lean source file or
project. Used as the `MetaPosition` argument of `restructure`,
analogous to the dim-binder `i` of `comp`.
The `binder` field carries the dim-binder identity when the
position represents a comp-style operation; `none` for
non-cubical positions. This is the meta-mirror of the dim-
binder slot in `comp i A φ u t` — kept as a position-level
field rather than folded into the classifier so the binder
role is preserved structurally. -/
structure MetaPosition where
/-- The fully qualified declaration name (or namespace) the
position lives under. `Name.anonymous` for file-level. -/
declName : Lean.Name
/-- Path to the source file (or empty for in-memory). -/
filePath : String
/-- Optional byte offset / range info; `none` if positional via
declName alone is sufficient. -/
range : Option (Nat × Nat)
/-- Dim-binder identity for cubical-shaped positions. `none` for
non-cubical or unbound positions. Preserves the comp-style
binder slot structurally. -/
binder : Option Lean.Name := none
deriving Inhabited
instance : Repr MetaPosition where
reprPrec p _ := s!"MetaPosition(declName={p.declName}, filePath={p.filePath}, range={repr p.range}, binder={repr p.binder})"
end Infoductor