Implements the cells-spec vision: a computation space that preserves auditability, correctness, interactivity. Phase 1 (Lean kernel + naga-IR Rust backend) is closed; foundation hypothesis stack (Selection H1+H2, Subobject H3, Trace H5, Obs.Ctx C2, Cubical.Trace) landed. Highlights: - Cubical-HoTT syntax + value/eval/readback in Lean - naga-IR pipeline (no GLSL string crosses FFI; 17/17 probes pass) - Honesty audit: every non-transport (sealed cells, vertex shader, Y-flip, presentation conventions) is documented as such - Polymorphic Trace α as free monoid; Cubical.Trace gives CTerm → Trace CTerm by structural fold (homomorphism = definition) - Selection as Huet zipper; Subobject as Boolean algebra over WCell - All theorems proven; the proof IS the implementation See STATUS.md for the resume guide. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21 KiB
Topolei — Numerical Implementation Principles
A reference for how numerical schemes, contracts, and realizations are structured so that mathematical content is never fused with execution context. Establishes the invariants a scheme must satisfy to be first-class in the registry. Read this before writing any numerical code in topolei — the separation discipline is the main thing that prevents construction faults downstream.
0. The Principle
A numerical implementation is a pair: a mathematical content (context-free, transport-stable, provably typed) and an execution context (precision regime, parallelism, memory layout, hardware calibration). These must be separated syntactically — by the FFI boundary — so they cannot leak into each other.
Corollary: any code that looks like numerics but mixes the two halves is not a scheme. It is an artifact that happens to produce numbers. Artifacts are allowed at the edge (e.g., calibration benchmarks), but they live outside the scheme registry.
The justification is operational, not aesthetic: numerical bugs that
cost time to find are almost always context leaks — an fp16-
specific stability assumption baked into a "generic" solver, a
parallel-sum scheme that quietly depends on reduction order, a
normalization step whose error budget is expressed in ULPs rather
than in the target's structural norm. Keeping the two halves
separated at the type level prevents these at the boundary, not after
the fact in code review.
1. Mathematical Content
Definition. The mathematical content of a numerical scheme is everything that can be stated without reference to:
- a specific floating-point format (
fp16,fp32,fp64,bf16), - a specific reduction order or parallelism model,
- a specific memory layout (row-major, column-major, blocked, strided),
- a specific hardware cost (GFLOPS, bandwidth, latency),
- a specific backend (CUDA, wgpu, CPU reference, vectorised SIMD),
- a specific realization of its source/target types (dense array, sparse matrix, functional list, mesh data structure).
Positively: mathematical content is a CTerm with a HasType
derivation, together with the propositions about its structural
behavior (naturality, error bounds, Lipschitz constants, condition
numbers). These propositions are stated as Path, Prop, or
proposition-valued CTerms, never as tolerances-in-units.
Worked example. The mathematical content of "trapezoidal rule quadrature" is:
scheme : (A : CType) → (f : A → ℝ) → (a b : A) → ℝ
+ naturality witness w.r.t. affine reparametrisation of A
+ error bound: |scheme(f, a, b) − ∫ f| ≤ (b−a)³/12 · sup |f″|
+ Lipschitz-in-f: |scheme(f, a, b) − scheme(g, a, b)|
≤ (b−a) · ‖f − g‖_∞
Not mathematical content: "trapezoidal with fp32 accumulator, SIMD-
4 wide, row-major input". That's a realization.
2. Execution Context
Definition. The execution context is the residual irreducible data that a realization must commit to in order to actually run. It has five axes, each of which is a separate concern:
| Axis | Examples |
|---|---|
| Precision regime | fp16/fp32/fp64/bf16/posits/exact rationals |
| Parallelism model | serial, SIMD-n, multi-threaded, SIMT/CUDA-like, distributed |
| Memory layout | row/col-major, blocked, strided, compressed-sparse, mesh topology |
| Hardware cost calibration | GFLOPS per backend, bandwidth per memory tier, wall-clock per abstract cost unit |
| Backend binding | CPU reference, SPIR-V shader, WGSL shader, CUDA kernel, wgpu compute |
Where it lives: entirely in the Rust FFI module and/or as Lean-side
opaque types (NonemptyType wrappers around @[extern]
declarations). The execution context never appears in a CTerm; the
type system cannot mention fp16.
Why the split holds at the type level: CType has constructors
univ | pi | path | glue — none of them can carry a precision label,
a parallelism tag, or a hardware handle. The impossibility is
structural, not by discipline. A scheme that needs to talk about
precision must do so in the Rust side's ExecContext struct, and
communicate with Lean only through the axiom interface.
3. The FFI Boundary as Firewall
The two halves are separated by a C ABI boundary. The boundary enforces three properties:
-
Opacity. Execution-context types are
opaquein Lean (opaque GPUContextPointer : NonemptyType), so Lean code cannot inspect them structurally. -
Axiomatic discharge. The Rust side provides
@[extern]implementations for axiom-stated behaviors (@[extern "topolei_cubical_eval"] opaque cubicalEval : CEnv → CTerm → CVal). Each Rust function's behavior is specified by a Lean axiom it must satisfy (e.g.,eval_var/eval_lam/ … inCubical/Eval.lean, orreadback_vneu/ … inCubical/Readback.lean). Step-level axioms are derived from the eval/readback contract via the Week 7 bridge — seeSTATUS.md. -
IO tagging. Any effect (GPU dispatch, memory allocation, timing, parallelism) is wrapped in
IOso it cannot leak into pure mathematical content. A pureCTermis guaranteed to have no execution-context dependency.
The boundary pattern in Lean:
-- Mathematical content: pure, axiomatic, Lean-only
axiom scheme_error_bound (s : SchemeRecord) (x : s.source) :
‖ s.apply x − s.exact x ‖ ≤ s.error_bound x
-- Execution context: opaque, IO-wrapped, Rust-backed
opaque RealizationHandle : NonemptyType
@[extern "topolei_realize_scheme"]
opaque realize (s : @& SchemeRecord) (ctx : @& ExecContext) :
IO RealizationHandle
The axiom speaks only of mathematical content; the opaque + extern speaks only of execution context; they meet nowhere structurally.
4. Schemes as Typed Morphisms
A scheme is a record with four required components:
structure Scheme where
source : CType -- structural input type
target : CType -- structural output type
body : CTerm -- the morphism itself
body_ty : HasType [] body (.pi source target) -- typing witness
Mandatory additional fields for first-class status (see §10):
nat_witness : NaturalityWitness body source target
error_bound : ErrorContract source target
cost_estimate : CostContract
lipschitz : Option LipschitzBound -- if declared
Schemes compose by term-level application composition:
def Scheme.compose (g f : Scheme) (h : g.source = f.target) : Scheme := {
source := f.source,
target := g.target,
body := .lam "x" (.app g.body (.app f.body (.var "x"))),
body_ty := ⟨... derived via HasType.lam + HasType.app ... ⟩,
nat_witness := nat_witness_compose g.nat_witness f.nat_witness,
error_bound := error_compose g.error_bound f.error_bound g.lipschitz,
cost_estimate := cost_compose g.cost_estimate f.cost_estimate,
...
}
The non-trivial compositions are error_compose (needs Lipschitz /
condition-number data) and nat_witness_compose (path concatenation
via .comp on path types). These live in Cell/Contract.lean once
that module is written.
5. Contracts
A contract is metadata attached to a scheme that declares its structural behavior. Four contract kinds exist:
5.1 Error contract
structure ErrorContract (src tgt : CType) where
norm_on_target : CTerm -- norm inducing metric on tgt
bound : CTerm → CTerm -- bound on ‖scheme(x) − exact(x)‖
bound_proof : ∀ x, WellTyped (bound x) ...
Critical rule: norm_on_target is a structural object, expressed
in terms of the target CType — not in ULPs, fp32-epsilons, or any
implementation-unit quantity. A realization may report additional
implementation-specific error (rounding error ≤ n · ε_machine), but
this is separate from the scheme's declared error bound.
Composition law: for g ∘ f,
error(g ∘ f)(x) ≤ Lip(g, y=f(x)) · error(f)(x) + error(g)(f(x)).
5.2 Cost contract
structure CostContract where
abstract_cost : CTerm → Nat -- in abstract cost units
unit : AbstractCostUnit -- flops | memops | roundtrips
Critical rule: cost is expressed in abstract units. Wall-clock
calibration (AbstractCostUnit → Float seconds) is a per-backend axiom
discharged by the execution-context layer, never stored on the
scheme itself. Same scheme, two backends, different wall-clock; the
scheme doesn't care.
5.3 Lipschitz / condition-number contract
structure LipschitzBound (src tgt : CType) where
constant : CTerm -- K such that ‖s(x)−s(y)‖ ≤ K·‖x−y‖
domain : CTerm -- domain restriction, if any
Declared when relevant (continuous schemes) and used in error
composition. Omitted for discrete / non-Lipschitz schemes; those
compose via weaker bounds (e.g., sup-norm worst-case).
5.4 Naturality witness
structure NaturalityWitness (body : CTerm) (src tgt : CType) where
on_representation_changes :
(e_src : EquivData) → (e_tgt : EquivData) →
PathWitness ... body (transport-adjusted body)
This is the transport-stability evidence. A scheme with a
non-trivial on_representation_changes is natural (applies to any
representative of its source); without one, the scheme is
representation-specific and must be wrapped by a transport adapter
before being used in a transport-stable context.
6. Realizations
A realization is a specific implementation of a scheme, pinned to an execution context. Multiple realizations of the same scheme coexist and are provably interchangeable up to declared precision differences.
structure Realization (s : Scheme) where
backend : Backend -- CPU ref | SPIR-V | WGSL | CUDA
precision : PrecisionRegime -- fp16 | fp32 | fp64 | exact
parallelism : ParallelismModel
memory_layout : MemoryLayout
handle : IO RealizationHandle -- opaque, FFI-backed
-- Specification of extra error beyond the scheme's own bound:
impl_extra_error : CTerm → CTerm -- e.g., n · ε_machine
impl_extra_proof : ∀ x, ‖handle.run(x) − scheme.body(x)‖
≤ impl_extra_error x
Interchangeability: two realizations r1, r2 of the same scheme
satisfy:
‖r1.run(x) − r2.run(x)‖ ≤ r1.impl_extra_error x + r2.impl_extra_error x
This is a theorem, not an assumption: it falls out of the two
impl_extra_proof axioms by triangle inequality. The scheme itself
is the same object in both cases — only the realization varies.
7. Naturality and Transport-Stability
A scheme s : A → B is natural with respect to an equivalence
e : A ≃ A' iff there exists a path-witness:
path : Path (A' → B) (transp (uaLine e A A') s) s'
where s' is the corresponding scheme on A'. When the witness is
refl (the identity path), s commutes definitionally with
transport; when it's a non-trivial path, s commutes up to a
specified deformation, which is still enough for the registry's
composition rules to work.
How to test naturality: pick a non-trivial e : A ≃ A' (e.g., a
basis change on a vector type), transport s along uaLine e, and
check the result equals s' (the manually-written scheme on A') up
to the path witness. In Lean:
theorem my_scheme_natural (e : EquivData) :
transp (uaLine e A A') s = s' :=
...
If no such theorem exists, the scheme is not natural and must be flagged in the registry as representation-specific.
Wrapping a representation-specific scheme: provide an adapter
s_wrapped := s ∘ transp (uaLine e_R R R') where e_R is the
representation equivalence. The wrapper is natural; the original
isn't. Both coexist in the registry with different flags.
8. Preconditions as Propositions
Preconditions (regularity, topological constraints, algebraic properties) are propositions in the type system, not implicit runtime assumptions. At the Lean level this is native:
def divide (x y : ℝ) (h : y ≠ 0) : ℝ := x / y
At the embedded-cubical CTerm level, our current non-dependent Π
limits direct encoding. Preconditions attach externally as a Prop
parameter on the scheme record:
structure Scheme where
...
precondition : CTerm → Prop
apply_when_precondition :
∀ x, precondition x → WellTyped (.app body x) target
Anti-pattern: an unstated precondition that shows up as a runtime
check inside the scheme body. Every precondition must be reflected
in the scheme's type or its precondition field. Runtime checks are
allowed in the realization (defense-in-depth), not in the scheme.
9. Order-Dependent Operations
Reductions, floating-point summation, parallel sums, and other operations whose result depends on execution order must declare their non-determinism in their contract:
structure OrderSensitivity where
orders_considered : List ExecutionOrder
max_spread : CTerm -- ‖result(o₁) − result(o₂)‖ ≤ max_spread
max_spread_proof : ∀ o₁ o₂ ∈ orders_considered,
‖run_with_order o₁ − run_with_order o₂‖ ≤ max_spread
A scheme that touches parallel reductions without declaring order sensitivity is not first-class — it violates naturality silently by appearing order-invariant when it isn't. The registry rejects it on type-check.
10. Minimum Contract for First-Class Status
A scheme is first-class in the registry iff it declares:
source : CTypetarget : CTypebody : CTermwithHasType [] body (.pi source target)nat_witness : NaturalityWitness body source targeterror_bound : ErrorContract source targetcost_estimate : CostContractprecondition : CTerm → Prop(may befun _ => True)order_sensitivity : Option OrderSensitivity(required non-none iff the scheme touches parallel reductions)
A scheme missing any of 1–7 is second-class — usable but not transport-stable and not composable in the registry's planner.
Any impl_extra_* / realization-specific information lives on
Realization, not Scheme. A scheme with a PrecisionRegime field
is wrongly constructed — it has leaked context into content.
11. Registry and Planner
Registry shape:
structure SchemeRegistry where
schemes : HashMap (CType × CType) (List Scheme)
-- plus: for each scheme, its declared contracts
Graph view: nodes = (CType, property-proof bundle); edges =
schemes with contracts. A missing mapping from A to B is a
disconnected subgraph — the planner will report it structurally.
Planner API:
inductive PlannerResult where
| success (path : List Scheme) (total_cost : Nat) (total_error : CTerm)
| gap (from_ : CType) (to : CType) (best_partial : List Scheme)
(missing_edge_description : String)
The planner never returns none or throw. Either it finds a
path satisfying the (cost, error) budget, or it returns a structured
gap report naming the specific unreachable node and the last
reachable node. Silent degradation is a construction fault; the
planner's type signature forbids it.
Dual objective: minimise-cost-subject-to-error and minimise-
error-subject-to-cost are the primal and dual of the same Lagrangian
search. The planner takes a composite objective cost + λ · error
and the caller picks λ from their budget; the same code path serves
both modes.
12. Construction Faults to Avoid
The anti-patterns that the principle prevents — explicit so reviewers can name them:
F1. Context leak into content
A scheme whose type mentions fp16, CUDA, SIMD, row-major, or
any execution-context term. Fix: move it to Realization.
F2. Implicit precondition
A scheme that works only for positive / non-zero / regular / convex /
strictly-feasible inputs without saying so in its type or
precondition field. Fix: state it.
F3. Error bound in wrong units
error ≤ 5 ULPs instead of error ≤ K · ‖x‖. Fix: express in the
target's structural norm; the implementation's rounding error is
impl_extra_error.
F4. Pseudo-natural scheme
A scheme that happens to be definable generically but whose
NaturalityWitness was never constructed because "it's obviously
natural". Obviousness is not a witness; write the path. Fix: if
natural, write the proof; if not, mark representation-specific.
F5. Undeclared order-dependence
A scheme that sums in parallel without declaring OrderSensitivity.
Fix: declare it or use a deterministic-order variant.
F6. Wall-clock cost
Cost contract in seconds instead of abstract cost units. Fix: abstract units in the scheme, wall-clock in the backend's calibration axiom.
F7. Silent planner degradation
Planner returning a "best effort" path when the budget isn't met,
without reporting the gap. Fix: return a PlannerResult.gap;
making the caller choose to proceed.
F8. Representation pretending to be math
A matrix type whose CType is really ArrayFP32 — the layout has
crept into the structural level. Fix: factor as (Matrix_α : univ) × (rep : Rep Matrix_α), where rep is a Realization-level choice.
F9. Realization's impl_extra_error treated as scheme's bound
Using the realization's extra error budget as if it were the scheme's declared bound. Fix: separate them; reports must distinguish.
F10. Missing interchangeability proof
Two realizations that aren't provably interchangeable — they may not
in fact compute the same function. Fix: write the
impl_extra_proof for each; interchangeability is a theorem by
triangle inequality, and you can't skip the hypothesis.
13. Where This Sits in Topolei's Phase Roadmap
Current state (2026-04-22): Phase 1 Cubical Core closed. The
machinery required to build the numerical layer — CType, CTerm,
HasType, uaLine, EquivData, transport axioms — is in place.
This document is the specification for the numerical layer that
will sit in Cell/Numerical.lean (and satellites: Cell/Contract.lean,
Cell/Registry.lean, Cell/Planner.lean). None of those modules
exist yet; when written, they must satisfy §10's minimum contract and
avoid §12's construction faults.
Blocking dependencies for first-class scheme support:
- Σ-types on
CType(STATUS.md priority #5) — enables(Matrix_α : univ) × (rep : Rep α)factorization for F8, and the(B : univ) × (norm : B → B → Float)factorization for §5.1. - Dependent Π on
HasType— enables preconditions as native types (F2) rather than externalPropfields. - Glue-transport rule — discharges
transp_uafrom axiom to theorem, which is needed for §7 naturality proofs to be computationally meaningful rather than axiom-level.
None of these require the Rust FFI. All three are pure-Lean extensions of the current axiom base.
Rust FFI dependencies (for execution, not formalization):
- Cubical evaluator backend — lets the planner actually run the schemes on real inputs.
- Backend-specific realizations — SPIR-V / WGSL / CPU-ref variants with calibrated wall-clock costs.
14. Checklist for a New Numerical Scheme
Before merging a scheme into the registry, verify:
source,target,body,body_tyall typed without reference to any execution-context axis (§1).nat_witnessprovided, or scheme explicitly tagged representation-specific (§7).error_boundin terms of target's structural norm, not implementation units (§5.1, F3).cost_estimatein abstract cost units, not wall-clock (§5.2, F6).preconditionexplicit for any non-total input (§8, F2).order_sensitivitydeclared if reductions / parallelism involved (§9, F5).- No field of the scheme mentions precision, backend, layout, parallelism, or hardware (§1, F1, F8).
- At least one
Realizationcommitted withimpl_extra_proof(§6, F10). - If multiple realizations, interchangeability theorem derived (§6).
- Planner integration: scheme appears as an edge in the
registry graph (§11); if a planner search fails involving this
scheme, the failure is a
PlannerResult.gap(§11, F7).
If any checkbox is unchecked, the scheme is second-class. A second- class scheme may still be merged, but is not transport-stable and will not be used automatically by the planner.
This document is a principle, not a suggestion. Any future numerical work that deviates from it deviates deliberately, with a written argument, recorded as an exception in STATUS.md.