Source.Expr now has intLit and add. compile and the correctness theorem
are both extended to cover the new constructor.
The add case of compile_correct exercises the compositional structure:
- IH on e1 (with extended suffix) gives the multistep for the first
  operand's evaluation.
- IH on e2 (with extended prefix) gives the multistep for the second.
- A single .add step at the boundary closes the trace.
- Each intermediate state's PC is computed via array-size arithmetic
  threaded through omega.
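
The PC bookkeeping in the add case can be sketched in isolation. This is
a minimal illustration, not the project's actual proof: it assumes
compile (.add e1 e2) lays out as compile e1, then compile e2, then a
single .add instruction. The names c1, c2 and op are hypothetical
stand-ins for the two compiled operands and the .add instruction, Nat
stands in for the instruction type, and .push op is used in place of
++ #[op] only to keep the rewrite set small.

```lean
-- After the IH on e2 the PC is (pre ++ c1).size + c2.size, and the single
-- .add step advances it by 1. The goal is that this equals the PC promised
-- by the theorem: pre.size plus the size of the whole compiled expression.
example (pre c1 c2 : Array Nat) (op : Nat) :
    (pre ++ c1).size + c2.size + 1
      = pre.size + ((c1 ++ c2).push op).size := by
  -- Expand every size into a sum of component sizes...
  simp only [Array.size_append, Array.size_push]
  -- ...then linear arithmetic over Nat finishes the goal.
  omega
```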
New supporting lemmas:
  step_add           - per-instruction step for .add
  compile_add_get_op - the instruction at the end of compile (.add e1 e2)
                       is .add. Extracted so the dependent-rewrite issue
                       with array bound proofs is contained in one place.
Engineering knowledge gained (recurring patterns when extending):
- Array.getElem_append_left/right take the bound as an explicit positional
  arg, not via (h := ...).
- rw on indices that appear in dependent bound proofs fails with "motive
  not type correct"; factor the lookup into a separate lemma.
- convert tactic appears not to be available; rw + exact substitutes.
- simp + omega closes most arithmetic on Array.size after expansion.
- step lemmas with implicit args (a, b) need explicit (a := _) in calls
  where context doesn't determine them.
Adding a constructor still follows the v0.1 recipe — one Source
constructor, one Eval rule, one compile arm, one step_X helper, one
compile_X_get_op lemma, one case in compile_correct's induction. Each
case is ~25-40 lines of proof.
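
The first three steps of the recipe can be sketched as follows. This is
a simplified illustration under stated assumptions, not the code in
TsmLean/Compile/: the namespace RecipeSketch, the two-opcode Instr, and
the use of Int in place of the project's Value type are all hypothetical,
and the operand order (e1, then e2, then the operator) is inferred from
the proof outline rather than copied from the source.

```lean
namespace RecipeSketch

/-- Hypothetical stand-in for the TSM instruction set (two opcodes only). -/
inductive Instr where
  | push (n : Int)
  | add

abbrev Code := Array Instr

/-- Source expressions: one constructor per language feature. -/
inductive Expr where
  | intLit (n : Int)
  | add (e1 e2 : Expr)

/-- Big-step evaluation: one rule per constructor. -/
inductive Eval : Expr → Int → Prop where
  | intLit (n : Int) : Eval (Expr.intLit n) n
  | add {e1 e2 : Expr} {a b : Int} :
      Eval e1 a → Eval e2 b → Eval (Expr.add e1 e2) (a + b)

/-- Compiler: one match arm per constructor; operands first, operator last. -/
def compile : Expr → Code
  | Expr.intLit n => #[Instr.push n]
  | Expr.add e1 e2 => compile e1 ++ compile e2 ++ #[Instr.add]

end RecipeSketch
```

The remaining steps (step_X, compile_X_get_op, and the compile_correct
case) depend on the machine's State and step definitions and are not
reproduced here.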
Zero sorries / axioms / admits.
The CompCert-style substrate-projection theorem at miniature scale:
source-level evaluation and TSM-bytecode execution agree on the value
produced.
TsmLean/Compile/ — three files:
  Source.lean      - small expression language. v0.1 covers integer
                     literals only; the framework is structured so
                     arithmetic, comparison, control flow, and
                     variables extend mechanically.
  Compile.lean     - compile : Source.Expr -> TSM.Code
                     v0.1: intLit n -> #[push n]
  Correctness.lean - theorem compile_correct:
                       Source.Eval e v ->
                       forall pre suf rest,
                         MultiStep
                           { code := pre ++ compile e ++ suf,
                             pc := pre.size, stack := rest }
                           { code := same,
                             pc := pre.size + (compile e).size,
                             stack := v :: rest }
                     Plus a standalone corollary for the no-prefix case.
The infrastructure is in place for compositional extension:
  MultiStep.trans  - transitive closure of multi-step
  MultiStep.single - lift single step to multi-step
  step_push        - per-instruction step lemma (push)
  getElem_compile  - lookup-in-larger-code helper
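
For orientation, the two MultiStep lemmas can be sketched over an
arbitrary state type. This is a generic reflexive-transitive closure
written under assumptions: the namespace ClosureSketch, the type
parameter σ, and the constructor names refl and cons are hypothetical,
and the project's own MultiStep may be defined differently.

```lean
namespace ClosureSketch

/-- Reflexive-transitive closure of a single-step function on states `σ`. -/
inductive MultiStep {σ : Type} (step : σ → Option σ) : σ → σ → Prop where
  | refl (s : σ) : MultiStep step s s
  | cons {s s' s'' : σ} :
      step s = some s' → MultiStep step s' s'' → MultiStep step s s''

/-- Lift one successful step to a multi-step derivation. -/
theorem MultiStep.single {σ : Type} {step : σ → Option σ} {s s' : σ}
    (h : step s = some s') : MultiStep step s s' :=
  MultiStep.cons h (MultiStep.refl s')

/-- Concatenate two derivations; this is what glues the IH traces together. -/
theorem MultiStep.trans {σ : Type} {step : σ → Option σ} {s₁ s₂ s₃ : σ}
    (h₁ : MultiStep step s₁ s₂) (h₂ : MultiStep step s₂ s₃) :
    MultiStep step s₁ s₃ := by
  induction h₁ with
  | refl _ => exact h₂
  | cons hstep _ ih => exact MultiStep.cons hstep (ih h₂)

end ClosureSketch
```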
Adding a constructor to Source (e.g., add) requires:
- one constructor in Source.Expr
- one rule in Source.Eval
- one match arm in compile
- one step_X helper (one-liner)
- one case in compile_correct's induction
Demonstrates the pipeline:
- Source language with big-step semantics
- Compiler producing TSM bytecode
- Correctness theorem bridging the two
Zero sorries / axioms / admits across the entire project.
Third concrete kernel, parallel to golang-lean's TGC and octive-lean's
TOC. The substrate-level asymmetry: TSM has values living by *position*
on a stack, not by name. This breaks the named-variable assumption that
TGC and TOC silently share.
Maps onto real bytecode targets: WebAssembly, JVM, CPython, .NET CIL,
SECD. Anything proved here transfers.
TsmLean/Core/ — seven files, parallel structure to TGC/TOC:
  Syntax.lean        - Instr (12 opcodes), Value (int/bool), Code
  Semantics.lean     - State, step (function), MultiStep (relation)
  Determinism.lean   - step_deterministic, MultiStep.deterministic
  Eval.lean          - fuel-bounded run + run_sound
  Types.lean         - Ty, StackTy, HasTypeInstr
                       (per-instruction stack-type transitions)
  TypeSoundness.lean - HasTypeV, HasTypeStack
  Preservation.lean  - stack_preservation, progress
                       (canonical Pierce-style small-step type soundness)
Theorems proven, zero sorries / axioms / admits:
  step_deterministic        single-step is functional
  MultiStep.deterministic   multi-step paths to halt are unique
  run_sound                 successful run -> MultiStep derivation
  stack_preservation        stack typing preserved by step
  progress                  well-typed non-halt instructions step
Demo (Main.lean): (5 + 3) * 2 evaluated on the stack machine.
  push 5; push 3; add; push 2; mul; halt
  -> stack [vInt 16] at pc 5.
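
The demo can be reproduced with a stripped-down machine. This is a
minimal sketch under assumptions, not the contents of Main.lean: the
namespace DemoSketch, the four-opcode Instr, and the use of Int in place
of Value are hypothetical simplifications, and run here returns the last
reachable state rather than mirroring the project's fuel-bounded run.

```lean
namespace DemoSketch

inductive Instr where
  | push (n : Int)
  | add
  | mul
  | halt

structure State where
  code  : Array Instr
  pc    : Nat
  stack : List Int

/-- One small step; `none` on halt, stack underflow, or an out-of-range PC. -/
def step (s : State) : Option State :=
  match s.code[s.pc]?, s.stack with
  | some (Instr.push n), st =>
      some { s with pc := s.pc + 1, stack := n :: st }
  | some Instr.add, b :: a :: st =>
      some { s with pc := s.pc + 1, stack := (a + b) :: st }
  | some Instr.mul, b :: a :: st =>
      some { s with pc := s.pc + 1, stack := (a * b) :: st }
  | _, _ => none

/-- Iterate `step` until it returns `none` or the fuel runs out. -/
def run : Nat → State → State
  | 0, s => s
  | fuel + 1, s =>
    match step s with
    | some s' => run fuel s'
    | none => s

/-- (5 + 3) * 2 as stack code. -/
def demo : Array Instr :=
  #[Instr.push 5, Instr.push 3, Instr.add, Instr.push 2, Instr.mul, Instr.halt]

-- Evaluates to (5, [16]): the PC rests on `halt` with 16 on the stack.
#eval
  let final := run 10 { code := demo, pc := 0, stack := [] }
  (final.pc, final.stack)

end DemoSketch
```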
The structural asymmetry from TGC/TOC: TSM uses small-step semantics
with a function `step : State -> Option State`, where TGC/TOC used
big-step inductive relations `Env -> Term -> Value -> Env`. The
canonical type-soundness theorems also flip: TGC/TOC proved
preservation under big-step (which has no progress analogue);
TSM proves both progress AND preservation, each per-instruction.
This is the third datapoint that the cross-language factoring needs.