Umbrella repo gathering golang-lean (TGC), octive-lean (TOC),
tsm-lean (TSM), and common-lean (apex with cross-language
typeclasses). Subprojects arrive next via `git subtree add` so
per-file history survives the move.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symbolic computation via a persistent Python subprocess: new `.sym`
Value variant carries (srepr, pretty), `OctiveLean.SymPyBridge` owns
the subprocess and round-trips expressions, and `evalBinOp`/unary
negation route through SymPy when either operand is `.sym`. Corpus
adds sym_basic, sym_solve_simplify, sym_calc; demos add Lorenz,
Van der Pol, gravity, SymToolboxDemo, Lab7Interp.
DSL surface changes from `octave! ... octave_end` to `octave! { ... }`.
RosettaStone rewritten against the new syntax; PlotDemo updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A new Lake project with path dependencies on the three sibling kernels
(golang-lean, octive-lean, tsm-lean). Provides typeclass abstractions
that capture what's structurally shared.
Common/ — three files:
Lang.lean — two typeclasses:
BigStepLang: Term, Value, State, Eval, deterministic.
Captures TGC's and TOC's big-step shape.
SmallStepLang: State, step.
Captures TSM's small-step shape (determinism is
automatic from `step` being a function).
Generic theorems exported once:
BigStepLang.unique_value, BigStepLang.unique_state
SmallStepLang.step_deterministic
Plus a `CompileCorrect` structure for substrate-projection
theorems (the categorical kernel of CompCert-style results).
Instances.lean — three instances:
TGCLang : BigStepLang (State := Heap × Env)
TOCLang : BigStepLang (State := Env)
TSMLang : SmallStepLang (State := TSM.State)
Each instance bridges the local kernel's existing
`BigStep`/`step` + `deterministic` proof into the typeclass.
Demo.lean — three application examples:
The SAME `BigStepLang.unique_value` theorem applied to TGC and
TOC; `SmallStepLang.step_deterministic` to TSM. One abstract
definition, three concrete payoffs.
The factoring is real: any new theorem stated over the typeclasses
applies to all instances. Future theorems (compilation correctness,
type soundness, simulation) can be expressed abstractly and proved
once.
Zero sorries / axioms / admits across all four projects.
Side-by-side comparisons of Go syntax with two Lean encodings:
* Surface AST (GolangLean.Expr) — mirrors go/ast.Node from upstream.
* Tiny Go Core (GolangLean.Core.Term) — kernel calculus with proven
operational + type semantics.
Twelve sections covering: literals, arithmetic, comparison, conditional,
variables/let-binding, functions/application, references/deref/assign,
sequencing. Each example includes a `#eval` running the term through
the proven Core.eval; output matches what's expected at run time.
Demonstrated programs:
- `(5 + 3) * 2` → 16
- `let x = 3 in let y = 7 in x < y` → true
- `(λ x. x*2) 21` → 42
- `let double = λ x. x*2 in double (double 5)` → 20
- `let p = &42 in *p = 100; *p` → 100 (heap: [vInt 100])
- higher-order: `(λ f. λ x. f (f x)) double 3` → 12
Each #eval is a runtime witness for a Core.BigStep derivation. Apply
Core.eval_sound to get a proof of the BigStep relation, and
Core.preservation to get the value's type.
Supporting changes:
- Added Repr instances for Core.Value and Core.EnvList (manual,
handling the mutual inductive — derive doesn't compose across the
mutual block).
- Added RosettaStone as a lean_lib in lakefile.toml.
Compile signature changes from `Source.Expr -> Code` to
`Nat -> Source.Expr -> Code`, where the Nat is the absolute address
where the output is placed. For v0.4's constructs (intLit, boolLit,
add, sub, mul) the offset is just threaded — no compile output
depends on it. The infrastructure is now ready for control-flow
constructs (ifte) that need absolute jump addresses.
All 5 cases of compile_correct updated for the offset-aware signature.
Each case now references compile pre.size e instead of compile e.
For binops, the second sub-expression is compiled at offset
pre.size + (compile pre.size e1).size.
Generic helpers introduced:
getElem_at_offset (pre X suf : Code) (i)
: (pre ++ X ++ suf)[pre.size + i] = X[i]
- bridges outer-array lookup to inner-array lookup at index.
Compile_X_get_op now parameterized by offset.
Engineering notes for ifte (deferred to v0.5):
- The compile output for ifte is
cc ++ #[jmpFalse else_at] ++ ct ++ #[jmp end_at] ++ cf
where else_at, end_at are absolute addresses.
- Looking up the .jmpFalse or .jmp instructions inside this
nested-append structure requires 4-deep getElem_append_left/
getElem_append_right rewrites. Each rewrite has a bound proof
that must thread through dependent type matching. Lean's `rw`
fails with "motive not type correct" on the natural proof.
- The path forward is likely a different decomposition that
avoids dependent-rewrite chains - perhaps using Array.getElem?
(Option-form) which doesn't carry bounds, then bridging back to
Array.getElem at the leaf.
Zero sorries / axioms / admits in v0.4. Full build clean (26 jobs).
Source.Expr: intLit, boolLit, add, sub, mul.
Each new constructor follows the pattern from v0.2:
- one Source constructor + one Eval rule
- one compile arm
- one step_X lemma (push, pushB, add, sub, mul) - one line each
- one compile_X_get_op lemma, all delegating to a generic
getElem_at_op_boundary helper that handles the c1++c2++[op] shape
- one case in compile_correct (~50 lines, mostly mechanical)
The pattern is fully grooved: each new arithmetic op is now a copy-
paste of the add case with substituted constructor names. The
substantive proof work happened once (in v0.2 for add); subsequent
arithmetic ops add no new proof shapes.
Generic helpers introduced:
getElem_at_op_boundary (c1 c2 : Code) (op : Instr) (h) :
(c1 ++ c2 ++ #[op])[c1.size + c2.size]'h = op
- applies to add/sub/mul interchangeably; specialized via
one-line wrappers (compile_add_get_op etc.)
Zero sorries / axioms / admits. Full build: 26 jobs clean.
Source.Expr now has intLit and add. Compile and correctness theorem
both extend.
The add case of compile_correct exercises the compositional structure:
- IH on e1 (with extended suffix) gives the multistep for the first
operand's evaluation.
- IH on e2 (with extended prefix) gives the multistep for the second.
- A single .add step at the boundary closes the trace.
- Each intermediate state's PC is computed via array-size arithmetic
threaded through omega.
New supporting lemmas:
step_add - per-instruction step for .add
compile_add_get_op - the instruction at the end of compile (.add e1 e2)
is .add. Extracted so the dependent-rewrite issue
with array bound proofs is contained in one place.
Engineering knowledge gained (recurring patterns when extending):
- Array.getElem_append_left/right take the bound as an explicit positional
arg, not via (h := ...).
- rw on indices that appear in dependent bound proofs fails with "motive
not type correct"; factor the lookup into a separate lemma.
- convert tactic appears not to be available; rw + exact substitutes.
- simp + omega closes most arithmetic on Array.size after expansion.
- step lemmas with implicit args (a, b) need explicit (a := _) in calls
where context doesn't determine them.
Adding a constructor still follows the v0.1 recipe — one Source
constructor, one Eval rule, one compile arm, one step_X helper, one
compile_X_get_op lemma, one case in compile_correct's induction. Each
case is ~25-40 lines of proof.
Zero sorries / axioms / admits.
The CompCert-style substrate-projection theorem at miniature scale:
source-level evaluation and TSM-bytecode execution agree on the value
produced.
TsmLean/Compile/ — three files:
Source.lean - small expression language. v0.1 covers integer
literals only; the framework is structured so
arithmetic, comparison, control flow, and
variables extend mechanically.
Compile.lean - compile : Source.Expr -> TSM.Code
v0.1: intLit n -> #[push n]
Correctness.lean - theorem compile_correct:
Source.Eval e v ->
forall pre suf rest,
MultiStep
{ code := pre ++ compile e ++ suf,
pc := pre.size, stack := rest }
{ code := same,
pc := pre.size + (compile e).size,
stack := v :: rest }
Plus a standalone corollary for the no-prefix case.
The infrastructure is in place for compositional extension:
MultiStep.trans - transitive closure of multi-step
MultiStep.single - lift single step to multi-step
step_push - per-instruction step lemma (push)
getElem_compile - lookup-in-larger-code helper
Adding a constructor to Source (e.g., add) requires:
- one constructor in Source.Expr
- one rule in Source.Eval
- one match arm in compile
- one step_X helper (one-liner)
- one case in compile_correct's induction
Demonstrates the pipeline:
- Source language with big-step semantics
- Compiler producing TSM bytecode
- Correctness theorem bridging the two
Zero sorries / axioms / admits across the entire project.
Third concrete kernel, parallel to golang-lean's TGC and octive-lean's
TOC. The substrate-level asymmetry: TSM has values living by *position*
on a stack, not by name. This breaks the named-variable assumption that
TGC and TOC silently share.
Maps onto real bytecode targets: WebAssembly, JVM, CPython, .NET CIL,
SECD. Anything proved here transfers.
TsmLean/Core/ — seven files, parallel structure to TGC/TOC:
Syntax.lean - Instr (12 opcodes), Value (int/bool), Code
Semantics.lean - State, step (function), MultiStep (rel'n)
Determinism.lean - step_deterministic, MultiStep.deterministic
Eval.lean - fuel-bounded run + run_sound
Types.lean - Ty, StackTy, HasTypeInstr
(per-instruction stack-type transitions)
TypeSoundness.lean - HasTypeV, HasTypeStack
Preservation.lean - stack_preservation, progress
(canonical Pierce-style small-step type soundness)
Theorems proven, zero sorries / axioms / admits:
step_deterministic single-step is functional
MultiStep.deterministic multi-step paths to halt are unique
run_sound successful run -> MultiStep derivation
stack_preservation stack typing preserved by step
progress well-typed non-halt instructions step
Demo (Main.lean): (5 + 3) * 2 evaluated on the stack machine.
push 5; push 3; add; push 2; mul; halt
-> stack [vInt 16] at pc 5.
The structural asymmetry from TGC/TOC: TSM uses small-step semantics
with a function `step : State -> Option State`, where TGC/TOC used
big-step inductive relations `Env -> Term -> Value -> Env`. The
canonical type-soundness theorems also flip: TGC/TOC proved
preservation under big-step (which has no progress analogue);
TSM proves both progress AND preservation, each per-instruction.
This is the third datapoint that the cross-language factoring needs.
OctiveLean/Core/Preservation.lean — the TOC analogue of TGC's
preservation. Statement:
HasType Γ e T ∧ HasTypeEnv env Γ ∧ BigStep env e v env'
⟹ HasTypeV v T ∧ HasTypeEnv env' Γ
No heap-typing extension (Octave has no heap). Γ is unchanged across
big-steps (assign requires x already typed).
Three structural changes were required to make preservation provable
under env-mutation, all small:
* letIn semantics shifted to scope-restoring: BigStep.letInR now
returns env1 (the env after evaluating the bound expression)
rather than env2 (after the body). This drops body's mutations
at scope-end, matching the lambda-calculus tradition. Determinism
and Eval updated to match.
* HasTypeV.vClos uses a two-part premise (he_dom + he_typed)
instead of nested ∃ — the kernel rejects nested inductive
parameters with locally-bound vars. The two parts are equivalent
to HasTypeEnv via the new HasTypeV.vClos_to_env inversion lemma.
* Inversion via HasTypeV.vClos_to_env exposes the closure's typing
context as an existential — preservation's appR case uses this
to construct the body's HasTypeEnv via extend_letIn.
The cross-language symmetry that emerged:
TGC preservation : threads heap-typings, weakens via extension.
TOC preservation : threads env directly, no extension needed.
In both, the rule cases collapse into the same three structural
shapes — terminal, IH-chain, contradiction-collapse. The case bodies
differ in HOW state is propagated (heap-typing for TGC, env for TOC)
but the SHAPE of each case is identical. That's the cross-language
abstraction speaking.
Zero sorries / axioms / admits across both projects.
OctiveLean/Core/ — the kernel-level formal layer that octive-lean's
existing surface BigStep didn't have. Six files:
Syntax.lean — Term, BinOp. Twelve constructors: ten shared
with TGC plus assign (var mutation in env) and
whileT (loop). No refs (Octave has no &/*).
Semantics.lean — Value, EnvList, BigStep. Signature
`BigStep : Env -> Term -> Value -> Env -> Prop`
threading env (vs TGC's `Heap x Env`).
Determinism.lean — BigStep.deterministic. Same case shapes as TGC
for the shared ten constructors. The whileFR/whileTR
cross-case mirrors TGC's ifTR/ifFR via Bool.noConfusion.
Eval.lean — fuel-bounded eval + eval_sound. whileT recursion
consumes one fuel unit per iteration. Function
calls discard the body's post-env (Octave/MATLAB
local-scope semantics).
Types.lean — Ty (no ref), TyEnv, HasType. Twelve typing rules
mirroring the constructors.
TypeSoundness.lean — HasTypeV inductive, function-form HasTypeEnv.
vClos uses two-part formulation (he_dom + he_typed)
instead of nested existential — Lean's kernel rejects
the natural ∃-form due to nested-inductive parameter
restrictions. extend_typed (assign preservation) and
extend_letIn (letIn preservation) lemmas.
Cross-language symmetry surfaced this iteration:
* Determinism proof: ten cases mirror TGC line-for-line. Two new
cases (assign, while) follow the same three structural shapes.
* Eval signature: state type differs (Env vs Heap×Env), constructors'
eval recurrences are otherwise isomorphic.
* Type system: Ty diverges (no ref vs no whileT/assign), but the
typing-rule shapes are identical — variables, let, app, if, binop, seq.
* Typing of runtime data is where the languages most diverge:
TGC's structural HasTypeEnv works because env is scoped; TOC needs
function-form because env mutates. This is the *real* asymmetry that
a cross-language layer would have to abstract over.
Preservation deferred — has a soundness gap with letIn-shadowing-then-
assign that needs a freshness premise on letIn typing. To follow.
Zero sorries / axioms / admits. Full lake build clean (73 jobs).
GolangLean/Core/Preservation.lean:
theorem preservation:
HasType Γ e T -> HasTypeH ht h -> HasTypeEnv ht env Γ ->
BigStep h env e v h' ->
∃ ht', HeapTy.extends ht ht' /\ HasTypeH ht' h' /\
HasTypeV ht' v T /\ HasTypeEnv ht' env Γ
The standard big-step type-soundness result: well-typed terminating
programs produce well-typed values, with heap conformance preserved.
Proof is by induction on the big-step derivation, fourteen cases.
Supporting infrastructure:
HeapTy.extends_push - heap-typing extends across a push
HasTypeH.push - heap conformance preserved by push
HasTypeH.setIfInBounds - heap conformance preserved by in-bounds update
binop_apply_sound - operator typing matches operator semantics
The closure case (appR) uses the mutual HasTypeV/HasTypeEnv weakening
lemmas from TypeSoundness to thread heap-typings across the three
sub-derivations. The assign case (assignR) uses the heap-update
lemma to preserve conformance. The if-cases collapse the cross-rule
(ifT vs ifF) ambiguity via Bool.noConfusion on the condition's IH.
Zero sorries / axioms / admits across the project. Full lake build clean.
Phase A — GolangLean/Core/Eval.lean:
def eval : Nat -> Heap -> Env -> Term -> Option (Value x Heap)
Fuel-bounded recursive evaluator, total over the fuel.
theorem eval_sound: eval succeeds => BigStep holds.
Bridges executable computation to inductive specification.
Phase B — GolangLean/Core/Types.lean:
inductive Ty {unit, int, bool, arrow, ref}
TyEnv := List (String x Ty); BinOp.typeOf
inductive HasType : TyEnv -> Term -> Ty -> Prop
Standard simply-typed lambda calculus + ML-style references.
Phase B — GolangLean/Core/TypeSoundness.lean:
abbrev HeapTy := Array Ty
mutual inductive HasTypeV / HasTypeEnv (value & env typing under heap-typing)
def HasTypeH (heap conforms to heap-typing)
def HeapTy.extends (prefix-extension of heap-typings)
thm HeapTy.extends_refl, extends_trans
thm HasTypeV.weaken, HasTypeEnv.weaken (mutual; under heap-typing extension)
thm HasTypeEnv.lookup_correspondence (well-typed env yields well-typed values)
The preservation theorem itself
HasType /\ HasTypeH /\ HasTypeEnv /\ BigStep
==> ∃ ht', extends /\ HasTypeH' /\ HasTypeV' /\ HasTypeEnv'
is the next deliverable; the infrastructure here is what its proof
case-analysis depends on.
Zero sorries / axioms / admits across the project. Full lake build clean.
GolangLean/Core/ holds a small calculus that surface Go is intended to
desugar into. Three files:
Syntax.lean - Term, BinOp; thirteen syntactic forms covering
let-binding, lambda, application, references
(Go's & / *), conditionals, sequencing.
Semantics.lean - Value, EnvList, Heap, BinOp.apply, BigStep relation.
Heap is Array Value; references are indices.
Closures capture EnvList lexically, as in Go.
Fourteen big-step constructors, one per syntactic form
(with ifte split into ifTR / ifFR).
Determinism.lean - theorem BigStep.deterministic:
BigStep h env e v1 h1 -> BigStep h env e v2 h2 ->
v1 = v2 /\ h1 = h2
Proof by induction on the first derivation, case
analysis on the second. The ifTR/ifFR cross-cases
close by contradiction via Bool.noConfusion.
No sorries, no axioms, no admits. The kernel is small enough to extend
compositionally: each new syntactic form adds one constructor and one
case to each proof. Type system and concurrency layer come later.
Strategic note: this kernel is shaped so the same construction will
work for any sequential calculus. When octive-lean grows a parallel
Tiny Octave Core, the determinism proof's structure will line up
case-for-case where the languages share constructors. That alignment
is the seed of the cross-language layer.
Mirrors the layout of octive-lean: lakefile, justfile, gitignored
upstream clone, a top-level library module, and a Main entry point
that switches between REPL and file execution.
Module skeleton in GolangLean/:
Token, AST — real ports of go/token and go/ast
Scanner, Parser — stubs throwing notImpl, point at upstream
Value, Env, Error — runtime data shapes
Eval, Builtins, REPL — stubs that compile and run as a placeholder
PureEval, BigStep,
ValueEquiv — formal-semantics layer (mirroring octive-lean)
where cross-language proof eventually lives.
The proof layer is shaped identically to octive-lean's so that
theorems about Go semantics will share their form with the Octave
ones — that shared shape is the candidate for the future
cross-language core.
Upstream reference go-upstream/ (shallow clone of golang/go) is
gitignored.