cubical-transport-hott-lean4/docs/NUMERICAL.md
Maximus Gorog d9f952fa6c
Some checks are pending
Lean Action CI / build (push) Waiting to run
Reorganize: move non-README docs into docs/ subfolder
Standard convention: README.md at root, everything else in docs/.

Engine docs/: FFI_DESIGN, FFI_COMPLETENESS, KERNEL_BOUNDARY, NUMERICAL,
PHASE1_HISTORY, ZIGZAG_PORT.  README.md links updated to docs/<name>.
Cross-repo reference in NUMERICAL.md (to topolei's STATUS.md) now
includes the relative path `../topolei/docs/STATUS.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 22:57:10 -06:00

21 KiB
Raw Permalink Blame History

Topolei — Numerical Implementation Principles

A reference for how numerical schemes, contracts, and realizations are structured so that mathematical content is never fused with execution context. Establishes the invariants a scheme must satisfy to be first-class in the registry. Read this before writing any numerical code in topolei — the separation discipline is the main thing that prevents construction faults downstream.


0. The Principle

A numerical implementation is a pair: a mathematical content (context-free, transport-stable, provably typed) and an execution context (precision regime, parallelism, memory layout, hardware calibration). These must be separated syntactically — by the FFI boundary — so they cannot leak into each other.

Corollary: any code that looks like numerics but mixes the two halves is not a scheme. It is an artifact that happens to produce numbers. Artifacts are allowed at the edge (e.g., calibration benchmarks), but they live outside the scheme registry.

The justification is operational, not aesthetic: numerical bugs that cost time to find are almost always context leaks — an fp16- specific stability assumption baked into a "generic" solver, a parallel-sum scheme that quietly depends on reduction order, a normalization step whose error budget is expressed in ULPs rather than in the target's structural norm. Keeping the two halves separated at the type level prevents these at the boundary, not after the fact in code review.


1. Mathematical Content

Definition. The mathematical content of a numerical scheme is everything that can be stated without reference to:

  • a specific floating-point format (fp16, fp32, fp64, bf16),
  • a specific reduction order or parallelism model,
  • a specific memory layout (row-major, column-major, blocked, strided),
  • a specific hardware cost (GFLOPS, bandwidth, latency),
  • a specific backend (CUDA, wgpu, CPU reference, vectorised SIMD),
  • a specific realization of its source/target types (dense array, sparse matrix, functional list, mesh data structure).

Positively: mathematical content is a CTerm with a HasType derivation, together with the propositions about its structural behavior (naturality, error bounds, Lipschitz constants, condition numbers). These propositions are stated as Path, Prop, or proposition-valued CTerms, never as tolerances-in-units.

Worked example. The mathematical content of "trapezoidal rule quadrature" is:

scheme : (A : CType) → (f : A → ) → (a b : A) → 
         + naturality witness w.r.t. affine reparametrisation of A
         + error bound: |scheme(f, a, b)  ∫ f| ≤ (ba)³/12 · sup |f″|
         + Lipschitz-in-f: |scheme(f, a, b)  scheme(g, a, b)|
                           ≤ (ba) · ‖f  g‖_∞

Not mathematical content: "trapezoidal with fp32 accumulator, SIMD- 4 wide, row-major input". That's a realization.


2. Execution Context

Definition. The execution context is the residual irreducible data that a realization must commit to in order to actually run. It has five axes, each of which is a separate concern:

Axis Examples
Precision regime fp16/fp32/fp64/bf16/posits/exact rationals
Parallelism model serial, SIMD-n, multi-threaded, SIMT/CUDA-like, distributed
Memory layout row/col-major, blocked, strided, compressed-sparse, mesh topology
Hardware cost calibration GFLOPS per backend, bandwidth per memory tier, wall-clock per abstract cost unit
Backend binding CPU reference, SPIR-V shader, WGSL shader, CUDA kernel, wgpu compute

Where it lives: entirely in the Rust FFI module and/or as Lean-side opaque types (NonemptyType wrappers around @[extern] declarations). The execution context never appears in a CTerm; the type system cannot mention fp16.

Why the split holds at the type level: CType has constructors univ | pi | path | glue — none of them can carry a precision label, a parallelism tag, or a hardware handle. The impossibility is structural, not by discipline. A scheme that needs to talk about precision must do so in the Rust side's ExecContext struct, and communicate with Lean only through the axiom interface.


3. The FFI Boundary as Firewall

The two halves are separated by a C ABI boundary. The boundary enforces three properties:

  1. Opacity. Execution-context types are opaque in Lean (opaque GPUContextPointer : NonemptyType), so Lean code cannot inspect them structurally.

  2. Axiomatic discharge. The Rust side provides @[extern] implementations for axiom-stated behaviors (@[extern "topolei_cubical_eval"] opaque cubicalEval : CEnv → CTerm → CVal). Each Rust function's behavior is specified by a Lean axiom it must satisfy (e.g., eval_var / eval_lam / … in CubicalTransport/Eval.lean, or readback_vneu / … in CubicalTransport/Readback.lean). Step-level axioms are derived from the eval/readback contract via the Week 7 bridge — see STATUS.md in the topolei repo (../topolei/docs/STATUS.md).

  3. IO tagging. Any effect (GPU dispatch, memory allocation, timing, parallelism) is wrapped in IO so it cannot leak into pure mathematical content. A pure CTerm is guaranteed to have no execution-context dependency.

The boundary pattern in Lean:

-- Mathematical content: pure, axiomatic, Lean-only
axiom scheme_error_bound (s : SchemeRecord) (x : s.source) :
    ‖ s.apply x  s.exact x ‖ ≤ s.error_bound x

-- Execution context: opaque, IO-wrapped, Rust-backed
opaque RealizationHandle : NonemptyType
@[extern "topolei_realize_scheme"]
opaque realize (s : @& SchemeRecord) (ctx : @& ExecContext) :
    IO RealizationHandle

The axiom speaks only of mathematical content; the opaque + extern speaks only of execution context; they meet nowhere structurally.


4. Schemes as Typed Morphisms

A scheme is a record with four required components:

structure Scheme where
  source   : CType                                -- structural input type
  target   : CType                                -- structural output type
  body     : CTerm                                -- the morphism itself
  body_ty  : HasType [] body (.pi source target)  -- typing witness

Mandatory additional fields for first-class status (see §10):

  nat_witness   : NaturalityWitness body source target
  error_bound   : ErrorContract source target
  cost_estimate : CostContract
  lipschitz     : Option LipschitzBound    -- if declared

Schemes compose by term-level application composition:

def Scheme.compose (g f : Scheme) (h : g.source = f.target) : Scheme := {
  source  := f.source,
  target  := g.target,
  body    := .lam "x" (.app g.body (.app f.body (.var "x"))),
  body_ty := ⟨... derived via HasType.lam + HasType.app ... ⟩,
  nat_witness   := nat_witness_compose g.nat_witness f.nat_witness,
  error_bound   := error_compose g.error_bound f.error_bound g.lipschitz,
  cost_estimate := cost_compose g.cost_estimate f.cost_estimate,
  ...
}

The non-trivial compositions are error_compose (needs Lipschitz / condition-number data) and nat_witness_compose (path concatenation via .comp on path types). These live in Cell/Contract.lean once that module is written.


5. Contracts

A contract is metadata attached to a scheme that declares its structural behavior. Four contract kinds exist:

5.1 Error contract

structure ErrorContract (src tgt : CType) where
  norm_on_target : CTerm                        -- norm inducing metric on tgt
  bound          : CTerm → CTerm                -- bound on ‖scheme(x)  exact(x)‖
  bound_proof    : ∀ x, WellTyped (bound x) ...

Critical rule: norm_on_target is a structural object, expressed in terms of the target CType — not in ULPs, fp32-epsilons, or any implementation-unit quantity. A realization may report additional implementation-specific error (rounding error ≤ n · ε_machine), but this is separate from the scheme's declared error bound.

Composition law: for g ∘ f, error(g ∘ f)(x) ≤ Lip(g, y=f(x)) · error(f)(x) + error(g)(f(x)).

5.2 Cost contract

structure CostContract where
  abstract_cost : CTerm → Nat                   -- in abstract cost units
  unit          : AbstractCostUnit              -- flops | memops | roundtrips

Critical rule: cost is expressed in abstract units. Wall-clock calibration (AbstractCostUnit → Float seconds) is a per-backend axiom discharged by the execution-context layer, never stored on the scheme itself. Same scheme, two backends, different wall-clock; the scheme doesn't care.

5.3 Lipschitz / condition-number contract

structure LipschitzBound (src tgt : CType) where
  constant  : CTerm                             -- K such that ‖s(x)s(y)‖ ≤ K·‖xy‖
  domain    : CTerm                             -- domain restriction, if any

Declared when relevant (continuous schemes) and used in error composition. Omitted for discrete / non-Lipschitz schemes; those compose via weaker bounds (e.g., sup-norm worst-case).

5.4 Naturality witness

structure NaturalityWitness (body : CTerm) (src tgt : CType) where
  on_representation_changes :
    (e_src : EquivData) → (e_tgt : EquivData) →
    PathWitness ... body (transport-adjusted body)

This is the transport-stability evidence. A scheme with a non-trivial on_representation_changes is natural (applies to any representative of its source); without one, the scheme is representation-specific and must be wrapped by a transport adapter before being used in a transport-stable context.


6. Realizations

A realization is a specific implementation of a scheme, pinned to an execution context. Multiple realizations of the same scheme coexist and are provably interchangeable up to declared precision differences.

structure Realization (s : Scheme) where
  backend       : Backend                       -- CPU ref | SPIR-V | WGSL | CUDA
  precision     : PrecisionRegime               -- fp16 | fp32 | fp64 | exact
  parallelism   : ParallelismModel
  memory_layout : MemoryLayout

  handle        : IO RealizationHandle          -- opaque, FFI-backed

  -- Specification of extra error beyond the scheme's own bound:
  impl_extra_error : CTerm → CTerm              -- e.g., n · ε_machine
  impl_extra_proof : ∀ x, ‖handle.run(x)  scheme.body(x)‖
                          ≤ impl_extra_error x

Interchangeability: two realizations r1, r2 of the same scheme satisfy:

‖r1.run(x)  r2.run(x)‖ ≤ r1.impl_extra_error x + r2.impl_extra_error x

This is a theorem, not an assumption: it falls out of the two impl_extra_proof axioms by triangle inequality. The scheme itself is the same object in both cases — only the realization varies.


7. Naturality and Transport-Stability

A scheme s : A → B is natural with respect to an equivalence e : A ≃ A' iff there exists a path-witness:

path : Path (A' → B) (transp (uaLine e A A') s) s'

where s' is the corresponding scheme on A'. When the witness is refl (the identity path), s commutes definitionally with transport; when it's a non-trivial path, s commutes up to a specified deformation, which is still enough for the registry's composition rules to work.

How to test naturality: pick a non-trivial e : A ≃ A' (e.g., a basis change on a vector type), transport s along uaLine e, and check the result equals s' (the manually-written scheme on A') up to the path witness. In Lean:

theorem my_scheme_natural (e : EquivData) :
    transp (uaLine e A A') s = s' :=
  ...

If no such theorem exists, the scheme is not natural and must be flagged in the registry as representation-specific.

Wrapping a representation-specific scheme: provide an adapter s_wrapped := s ∘ transp (uaLine e_R R R') where e_R is the representation equivalence. The wrapper is natural; the original isn't. Both coexist in the registry with different flags.


8. Preconditions as Propositions

Preconditions (regularity, topological constraints, algebraic properties) are propositions in the type system, not implicit runtime assumptions. At the Lean level this is native:

def divide (x y : ) (h : y ≠ 0) :  := x / y

At the embedded-cubical CTerm level, our current non-dependent Π limits direct encoding. Preconditions attach externally as a Prop parameter on the scheme record:

structure Scheme where
  ...
  precondition : CTerm → Prop
  apply_when_precondition :
    ∀ x, precondition x → WellTyped (.app body x) target

Anti-pattern: an unstated precondition that shows up as a runtime check inside the scheme body. Every precondition must be reflected in the scheme's type or its precondition field. Runtime checks are allowed in the realization (defense-in-depth), not in the scheme.


9. Order-Dependent Operations

Reductions, floating-point summation, parallel sums, and other operations whose result depends on execution order must declare their non-determinism in their contract:

structure OrderSensitivity where
  orders_considered : List ExecutionOrder
  max_spread        : CTerm                     -- ‖result(o₁)  result(o₂)‖ ≤ max_spread
  max_spread_proof  : ∀ o₁ o₂ ∈ orders_considered,
                      ‖run_with_order o₁  run_with_order o₂‖ ≤ max_spread

A scheme that touches parallel reductions without declaring order sensitivity is not first-class — it violates naturality silently by appearing order-invariant when it isn't. The registry rejects it on type-check.


10. Minimum Contract for First-Class Status

A scheme is first-class in the registry iff it declares:

  1. source : CType
  2. target : CType
  3. body : CTerm with HasType [] body (.pi source target)
  4. nat_witness : NaturalityWitness body source target
  5. error_bound : ErrorContract source target
  6. cost_estimate : CostContract
  7. precondition : CTerm → Prop (may be fun _ => True)
  8. order_sensitivity : Option OrderSensitivity (required non-none iff the scheme touches parallel reductions)

A scheme missing any of 17 is second-class — usable but not transport-stable and not composable in the registry's planner.

Any impl_extra_* / realization-specific information lives on Realization, not Scheme. A scheme with a PrecisionRegime field is wrongly constructed — it has leaked context into content.


11. Registry and Planner

Registry shape:

structure SchemeRegistry where
  schemes : HashMap (CType × CType) (List Scheme)
  -- plus: for each scheme, its declared contracts

Graph view: nodes = (CType, property-proof bundle); edges = schemes with contracts. A missing mapping from A to B is a disconnected subgraph — the planner will report it structurally.

Planner API:

inductive PlannerResult where
  | success (path : List Scheme) (total_cost : Nat) (total_error : CTerm)
  | gap (from_ : CType) (to : CType) (best_partial : List Scheme)
        (missing_edge_description : String)

The planner never returns none or throw. Either it finds a path satisfying the (cost, error) budget, or it returns a structured gap report naming the specific unreachable node and the last reachable node. Silent degradation is a construction fault; the planner's type signature forbids it.

Dual objective: minimise-cost-subject-to-error and minimise- error-subject-to-cost are the primal and dual of the same Lagrangian search. The planner takes a composite objective cost + λ · error and the caller picks λ from their budget; the same code path serves both modes.


12. Construction Faults to Avoid

The anti-patterns that the principle prevents — explicit so reviewers can name them:

F1. Context leak into content

A scheme whose type mentions fp16, CUDA, SIMD, row-major, or any execution-context term. Fix: move it to Realization.

F2. Implicit precondition

A scheme that works only for positive / non-zero / regular / convex / strictly-feasible inputs without saying so in its type or precondition field. Fix: state it.

F3. Error bound in wrong units

error ≤ 5 ULPs instead of error ≤ K · ‖x‖. Fix: express in the target's structural norm; the implementation's rounding error is impl_extra_error.

F4. Pseudo-natural scheme

A scheme that happens to be definable generically but whose NaturalityWitness was never constructed because "it's obviously natural". Obviousness is not a witness; write the path. Fix: if natural, write the proof; if not, mark representation-specific.

F5. Undeclared order-dependence

A scheme that sums in parallel without declaring OrderSensitivity. Fix: declare it or use a deterministic-order variant.

F6. Wall-clock cost

Cost contract in seconds instead of abstract cost units. Fix: abstract units in the scheme, wall-clock in the backend's calibration axiom.

F7. Silent planner degradation

Planner returning a "best effort" path when the budget isn't met, without reporting the gap. Fix: return a PlannerResult.gap; making the caller choose to proceed.

F8. Representation pretending to be math

A matrix type whose CType is really ArrayFP32 — the layout has crept into the structural level. Fix: factor as (Matrix_α : univ) × (rep : Rep Matrix_α), where rep is a Realization-level choice.

F9. Realization's impl_extra_error treated as scheme's bound

Using the realization's extra error budget as if it were the scheme's declared bound. Fix: separate them; reports must distinguish.

F10. Missing interchangeability proof

Two realizations that aren't provably interchangeable — they may not in fact compute the same function. Fix: write the impl_extra_proof for each; interchangeability is a theorem by triangle inequality, and you can't skip the hypothesis.


13. Where This Sits in Topolei's Phase Roadmap

Current state (2026-04-22): Phase 1 Cubical Core closed. The machinery required to build the numerical layer — CType, CTerm, HasType, uaLine, EquivData, transport axioms — is in place.

This document is the specification for the numerical layer that will sit in Cell/Numerical.lean (and satellites: Cell/Contract.lean, Cell/Registry.lean, Cell/Planner.lean). None of those modules exist yet; when written, they must satisfy §10's minimum contract and avoid §12's construction faults.

Blocking dependencies for first-class scheme support:

  1. Σ-types on CType (STATUS.md priority #5) — enables (Matrix_α : univ) × (rep : Rep α) factorization for F8, and the (B : univ) × (norm : B → B → Float) factorization for §5.1.
  2. Dependent Π on HasType — enables preconditions as native types (F2) rather than external Prop fields.
  3. Glue-transport rule — discharges transp_ua from axiom to theorem, which is needed for §7 naturality proofs to be computationally meaningful rather than axiom-level.

None of these require the Rust FFI. All three are pure-Lean extensions of the current axiom base.

Rust FFI dependencies (for execution, not formalization):

  1. Cubical evaluator backend — lets the planner actually run the schemes on real inputs.
  2. Backend-specific realizations — SPIR-V / WGSL / CPU-ref variants with calibrated wall-clock costs.

14. Checklist for a New Numerical Scheme

Before merging a scheme into the registry, verify:

  • source, target, body, body_ty all typed without reference to any execution-context axis (§1).
  • nat_witness provided, or scheme explicitly tagged representation-specific (§7).
  • error_bound in terms of target's structural norm, not implementation units (§5.1, F3).
  • cost_estimate in abstract cost units, not wall-clock (§5.2, F6).
  • precondition explicit for any non-total input (§8, F2).
  • order_sensitivity declared if reductions / parallelism involved (§9, F5).
  • No field of the scheme mentions precision, backend, layout, parallelism, or hardware (§1, F1, F8).
  • At least one Realization committed with impl_extra_proof (§6, F10).
  • If multiple realizations, interchangeability theorem derived (§6).
  • Planner integration: scheme appears as an edge in the registry graph (§11); if a planner search fails involving this scheme, the failure is a PlannerResult.gap (§11, F7).

If any checkbox is unchecked, the scheme is second-class. A second- class scheme may still be merged, but is not transport-stable and will not be used automatically by the planner.


This document is a principle, not a suggestion. Any future numerical work that deviates from it deviates deliberately, with a written argument, recorded as an exception in STATUS.md.