Initial commit: Tiny Stack Machine (TSM) in Lean 4.

Third concrete kernel, parallel to golang-lean's TGC and octive-lean's
TOC. The substrate-level asymmetry: TSM has values living by *position*
on a stack, not by name. This breaks the named-variable assumption that
TGC and TOC silently share.

Maps onto real bytecode targets: WebAssembly, JVM, CPython, .NET CIL,
SECD. Anything proved here transfers.

TsmLean/Core/ — seven files, parallel structure to TGC/TOC:

  Syntax.lean        - Instr (12 opcodes), Value (int/bool), Code
  Semantics.lean     - State, step (function), MultiStep (rel'n)
  Determinism.lean   - step_deterministic, MultiStep.deterministic
  Eval.lean          - fuel-bounded run + run_sound
  Types.lean         - Ty, StackTy, HasTypeInstr
                       (per-instruction stack-type transitions)
  TypeSoundness.lean - HasTypeV, HasTypeStack
  Preservation.lean  - stack_preservation, progress
                       (canonical Pierce-style small-step type soundness)

Theorems proven, zero sorries / axioms / admits:

  step_deterministic           single-step is functional
  MultiStep.deterministic      multi-step paths to halt are unique
  run_sound                    successful run -> MultiStep derivation
  stack_preservation           stack typing preserved by step
  progress                     well-typed non-halt instructions step

Demo (Main.lean): (5 + 3) * 2 evaluated on the stack machine.
  push 5; push 3; add; push 2; mul; halt
  -> stack [vInt 16] at pc 5.

The structural asymmetry from TGC/TOC: TSM uses small-step semantics
with a function `step : State -> Option State`, where TGC/TOC used
big-step inductive relations `Env -> Term -> Value -> Env`. The
canonical type-soundness theorems also flip: TGC/TOC proved
preservation under big-step (which has no progress analogue);
TSM proves both progress AND preservation, each per-instruction.

This is the third datapoint that the cross-language factoring needs.
This commit is contained in:
Maximus Gorog 2026-05-10 05:12:10 -06:00
commit 987f205ce5
14 changed files with 574 additions and 0 deletions

1
.gitignore vendored Normal file
View file

@ -0,0 +1 @@
/.lake

22
Main.lean Normal file
View file

@ -0,0 +1,22 @@
import TsmLean
open TsmLean.Core in
def main : IO UInt32 := do
-- Demo: 5 + 3, then * 2 = 16
let prog : Array Instr := #[
.push 5,
.push 3,
.add,
.push 2,
.mul,
.halt
]
let s₀ : State := { code := prog, pc := 0, stack := [] }
match run 100 s₀ with
| some s_final =>
IO.println s!"final stack: {repr s_final.stack}"
IO.println s!"final pc: {s_final.pc}"
return 0
| none =>
IO.eprintln "execution did not terminate within fuel"
return 1

58
README.md Normal file
View file

@ -0,0 +1,58 @@
# tsm-lean
A Lean 4 formalization of a Tiny Stack Machine — third concrete kernel parallel to `golang-lean` (TGC) and `octive-lean` (TOC).
The substrate-level asymmetry: TGC and TOC have *named variables*. TSM has values living *by position* on a stack. Forces the cross-language abstraction to factor over "operand-access mechanism" instead of baking name-lookup into the framework. Maps directly to real bytecode targets — WebAssembly, JVM, CPython, .NET CIL, SECD.
## Build
```sh
lake build
```
## Run the demo
```sh
lake exe tsm-lean
# → final stack: [TsmLean.Core.Value.vInt 16] ((5 + 3) * 2)
# → final pc: 5
```
## Layout
| Path | What's there |
| --- | --- |
| `TsmLean/Core/Syntax.lean` | `Instr`, `Value`, `Code` |
| `TsmLean/Core/Semantics.lean` | `State`, `step` (function), `MultiStep` (relation) |
| `TsmLean/Core/Determinism.lean` | `step_deterministic`, `MultiStep.deterministic` |
| `TsmLean/Core/Eval.lean` | fuel-bounded `run` + `run_sound` |
| `TsmLean/Core/Types.lean` | `Ty`, `StackTy`, `HasTypeInstr` |
| `TsmLean/Core/TypeSoundness.lean` | `HasTypeV`, `HasTypeStack` |
| `TsmLean/Core/Preservation.lean` | `stack_preservation`, `progress` |
| `Main.lean` | demo program |
## Theorems proven
- **`step_deterministic`** — single-step is functional.
- **`MultiStep.deterministic`** — multi-step paths to halted states are unique.
- **`run_sound`** — successful fuel-bounded execution corresponds to a `MultiStep` derivation ending at a halted state.
- **`stack_preservation`** — if the stack matches an instruction's input type and the step succeeds, the post-stack matches its output type.
- **`progress`** — well-typed non-halt instructions always make a step.
The first three are the operational counterparts of the big-step theorems in TGC and TOC. The last two are the small-step type-soundness theorems (Pierce-style), which TGC/TOC's big-step formulations don't have direct analogues for.
Zero sorries, axioms, or admits.
## Status
**v0.1**: per-instruction (local) preservation. Global program-level type soundness — the JVM-style stackmap that ensures all reachable PCs have consistent stack types — is the next layer up.
## Instruction set
```
push n pushB b pop dup swap
add sub mul eq lt
jmp k jmpFalse k halt
```
Twelve instructions. No call / ret yet — direct jumps only. Adding function-call frames is a future extension.

7
TsmLean.lean Normal file
View file

@ -0,0 +1,7 @@
import TsmLean.Core.Syntax
import TsmLean.Core.Semantics
import TsmLean.Core.Determinism
import TsmLean.Core.Eval
import TsmLean.Core.Types
import TsmLean.Core.TypeSoundness
import TsmLean.Core.Preservation

View file

@ -0,0 +1,41 @@
import TsmLean.Core.Semantics
namespace TsmLean.Core
/-! # Determinism of TSM step.
`step` is a total function `State → Option State`, so single-step
determinism is *immediate*: two transitions from the same state yield
the same successor (or both fail).
Multi-step determinism follows by induction on the chain. We prove
that any two `MultiStep` derivations of the same length collapse to
the same trace. -/
theorem step_deterministic
{s s₁ s₂ : State}
(h₁ : step s = some s₁) (h₂ : step s = some s₂) :
s₁ = s₂ := by
rw [h₁] at h₂
exact Option.some.inj h₂
/-- Multi-step paths to halted states are deterministic. -/
theorem MultiStep.deterministic
{s s_a s_b : State}
(D_a : MultiStep s s_a) (D_b : MultiStep s s_b)
(halt_a : step s_a = none) (halt_b : step s_b = none) :
s_a = s_b := by
induction D_a generalizing s_b with
| refl =>
cases D_b with
| refl => rfl
| cons h₁ _ => rw [halt_a] at h₁; cases h₁
| cons h₁ _ ih =>
cases D_b with
| refl => rw [halt_b] at h₁; cases h₁
| cons h₁' D_b' =>
have heq := step_deterministic h₁ h₁'
subst heq
exact ih D_b' halt_a halt_b
end TsmLean.Core

44
TsmLean/Core/Eval.lean Normal file
View file

@ -0,0 +1,44 @@
import TsmLean.Core.Semantics
namespace TsmLean.Core
/-! # Fuel-bounded executable multi-step.
`run n s₀` executes up to `n` steps from `s₀`. Returns the final state
when execution halts (step returns `none`) within fuel, or `none` when
fuel is exhausted before halting.
Soundness: any successful run corresponds to a `MultiStep` derivation
ending at a halted state — same shape as TGC/TOC's eval_sound, but
phrased over the small-step closure rather than big-step. -/
def run : Nat → State → Option State
| 0, _ => none
| n + 1, s =>
match step s with
| none => some s -- halted
| some s' => run n s'
theorem run_sound :
∀ (n : Nat) (s s' : State),
run n s = some s' → MultiStep s s' ∧ step s' = none := by
intro n
induction n with
| zero =>
intros s s' heq
simp [run] at heq
| succ n ih =>
intros s s' heq
simp only [run] at heq
cases hstep : step s with
| none =>
rw [hstep] at heq
simp at heq
subst heq
exact ⟨.refl s, hstep⟩
| some s_next =>
rw [hstep] at heq
have ⟨hMS, hHalt⟩ := ih s_next s' heq
exact ⟨.cons hstep hMS, hHalt⟩
end TsmLean.Core

View file

@ -0,0 +1,203 @@
import TsmLean.Core.TypeSoundness
namespace TsmLean.Core
/-! # Preservation and progress for TSM.
Local (per-instruction) preservation: if the stack matches an
instruction's input type and that instruction succeeds, the post-stack
matches its output type.
Global type soundness — that *every* reachable PC has a consistent
stackmap — requires program-wide code typing (JVM-style stackmaps).
That's a layer above; this file proves the per-instruction theorem
on which the global one is built.
Progress: well-typed non-halt instructions always make a step. -/
theorem stack_preservation
{s s' : State} {in_ty out_ty : StackTy}
(h_pc : s.pc < s.code.size)
(h_typed : HasTypeInstr (s.code[s.pc]'h_pc) in_ty out_ty)
(h_stack : HasTypeStack s.stack in_ty)
(h_step : step s = some s') :
HasTypeStack s'.stack out_ty := by
unfold step at h_step
rw [dif_pos h_pc] at h_step
generalize h_at : s.code[s.pc]'h_pc = instr at h_typed h_step
generalize h_stk : s.stack = stk at h_stack h_step
cases h_typed with
| push n =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons (.vInt n) h_stack
| pushB b =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons (.vBool b) h_stack
| pop =>
cases h_stack with
| cons _ hs =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact hs
| dup =>
cases h_stack with
| cons hv hs =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons hv (.cons hv hs)
| swap =>
cases h_stack with
| cons hv1 h_rest =>
cases h_rest with
| cons hv2 hs =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons hv2 (.cons hv1 hs)
| add =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt a =>
cases h1 with
| cons hv2 hs =>
cases hv2 with
| vInt b =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons (.vInt _) hs
| sub =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt a =>
cases h1 with
| cons hv2 hs =>
cases hv2 with
| vInt b =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons (.vInt _) hs
| mul =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt a =>
cases h1 with
| cons hv2 hs =>
cases hv2 with
| vInt b =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons (.vInt _) hs
| eq_int =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt a =>
cases h1 with
| cons hv2 hs =>
cases hv2 with
| vInt b =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons (.vBool _) hs
| lt_int =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt a =>
cases h1 with
| cons hv2 hs =>
cases hv2 with
| vInt b =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact .cons (.vBool _) hs
| jmp =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact h_stack
| jmpFalse =>
cases h_stack with
| cons hv hs =>
cases hv with
| vBool b =>
cases b with
| false =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact hs
| true =>
simp at h_step
obtain ⟨_, rfl⟩ := h_step
exact hs
| halt =>
simp at h_step
theorem progress
{s : State} {in_ty out_ty : StackTy}
(h_pc : s.pc < s.code.size)
(h_typed : HasTypeInstr (s.code[s.pc]'h_pc) in_ty out_ty)
(h_stack : HasTypeStack s.stack in_ty)
(h_not_halt : s.code[s.pc]'h_pc ≠ .halt) :
∃ s', step s = some s' := by
unfold step
rw [dif_pos h_pc]
generalize h_at : s.code[s.pc]'h_pc = instr at h_typed h_not_halt
generalize h_stk : s.stack = stk at h_stack
cases h_typed with
| push n => exact ⟨_, rfl⟩
| pushB b => exact ⟨_, rfl⟩
| pop =>
cases h_stack with
| cons _ _ => exact ⟨_, rfl⟩
| dup =>
cases h_stack with
| cons _ _ => exact ⟨_, rfl⟩
| swap =>
cases h_stack with
| cons _ h1 => cases h1 with | cons _ _ => exact ⟨_, rfl⟩
| add =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt _ =>
cases h1 with
| cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩
| sub =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt _ =>
cases h1 with
| cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩
| mul =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt _ =>
cases h1 with
| cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩
| eq_int =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt _ =>
cases h1 with
| cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩
| lt_int =>
cases h_stack with
| cons hv1 h1 =>
cases hv1 with
| vInt _ =>
cases h1 with
| cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩
| jmp => exact ⟨_, rfl⟩
| jmpFalse =>
cases h_stack with
| cons hv _ => cases hv with | vBool b => cases b <;> exact ⟨_, rfl⟩
| halt => exact absurd rfl h_not_halt
end TsmLean.Core

View file

@ -0,0 +1,80 @@
import TsmLean.Core.Syntax
namespace TsmLean.Core
/-! # Small-step operational semantics for TSM.
State = `(Code, PC, Stack)`. The stack is `List Value` (top-of-stack at
the head). Step is a *function* `State → Option State`:
* `some s'` : the next state.
* `none` : halted, OOB, or stuck (type error).
Compare with TGC/TOC's big-step `Env → Term → Value → Env → Prop`:
TSM uses small-step because instructions are atomic. The reflexive-
transitive closure (`MultiStep`) is the analogue of big-step. -/
structure State where
code : Code
pc : Nat
stack : List Value
deriving Repr, Inhabited
def step (s : State) : Option State :=
if h : s.pc < s.code.size then
match s.code[s.pc] with
| .push n => some { s with pc := s.pc + 1, stack := .vInt n :: s.stack }
| .pushB b => some { s with pc := s.pc + 1, stack := .vBool b :: s.stack }
| .pop =>
match s.stack with
| _ :: rest => some { s with pc := s.pc + 1, stack := rest }
| [] => none
| .dup =>
match s.stack with
| v :: rest => some { s with pc := s.pc + 1, stack := v :: v :: rest }
| [] => none
| .swap =>
match s.stack with
| a :: b :: rest => some { s with pc := s.pc + 1, stack := b :: a :: rest }
| _ => none
| .add =>
match s.stack with
| .vInt a :: .vInt b :: rest =>
some { s with pc := s.pc + 1, stack := .vInt (b + a) :: rest }
| _ => none
| .sub =>
match s.stack with
| .vInt a :: .vInt b :: rest =>
some { s with pc := s.pc + 1, stack := .vInt (b - a) :: rest }
| _ => none
| .mul =>
match s.stack with
| .vInt a :: .vInt b :: rest =>
some { s with pc := s.pc + 1, stack := .vInt (b * a) :: rest }
| _ => none
| .eq =>
match s.stack with
| .vInt a :: .vInt b :: rest =>
some { s with pc := s.pc + 1, stack := .vBool (b == a) :: rest }
| _ => none
| .lt =>
match s.stack with
| .vInt a :: .vInt b :: rest =>
some { s with pc := s.pc + 1, stack := .vBool (b < a) :: rest }
| _ => none
| .jmp k => some { s with pc := k }
| .jmpFalse k =>
match s.stack with
| .vBool false :: rest => some { s with pc := k, stack := rest }
| .vBool true :: rest => some { s with pc := s.pc + 1, stack := rest }
| _ => none
| .halt => none
else none
/-- Reflexive-transitive closure of `step`. -/
inductive MultiStep : State → State → Prop where
| refl (s : State) : MultiStep s s
| cons {s s' s'' : State}
(h₁ : step s = some s') (h₂ : MultiStep s' s'') :
MultiStep s s''
end TsmLean.Core

39
TsmLean/Core/Syntax.lean Normal file
View file

@ -0,0 +1,39 @@
namespace TsmLean.Core
/-! # Tiny Stack Machine (TSM) — abstract syntax.
Third concrete kernel, parallel to golang-lean's TGC and octive-lean's
TOC. Where TGC and TOC have *named variables*, TSM has values living
*by position* on a stack — the deepest substrate-level asymmetry.
Instructions are atomic; programs are arrays of instructions. The PC
indexes into the array. Conditional/unconditional jumps use absolute
target addresses (not relative offsets — simpler to reason about).
Maps to real-world stack-based bytecodes: WebAssembly, JVM, CPython,
.NET CIL, SECD machines. Anything proved here transfers to those. -/
inductive Value where
| vInt : Int → Value
| vBool : Bool → Value
deriving Repr, BEq, Inhabited
inductive Instr where
| push : Int → Instr -- push integer literal
| pushB : Bool → Instr -- push bool literal
| pop : Instr
| dup : Instr -- duplicate top
| swap : Instr -- swap top two
| add : Instr
| sub : Instr
| mul : Instr
| eq : Instr -- pop two ints, push bool
| lt : Instr -- pop two ints, push bool
| jmp : Nat → Instr -- absolute jump
| jmpFalse : Nat → Instr -- pop bool; if false, jump
| halt : Instr
deriving Repr, BEq, Inhabited
abbrev Code := Array Instr
end TsmLean.Core

View file

@ -0,0 +1,22 @@
import TsmLean.Core.Types
import TsmLean.Core.Semantics
namespace TsmLean.Core
/-! # Stack-typing infrastructure.
`HasTypeV` types individual values (int / bool). `HasTypeStack` is the
pointwise lift to a list, length-aligned with a `StackTy`. -/
inductive HasTypeV : Value → Ty → Prop where
| vInt (n : Int) : HasTypeV (.vInt n) .int
| vBool (b : Bool) : HasTypeV (.vBool b) .bool
inductive HasTypeStack : List Value → StackTy → Prop where
| nil : HasTypeStack [] []
| cons {v vs T sty}
(hv : HasTypeV v T)
(hs : HasTypeStack vs sty) :
HasTypeStack (v :: vs) (T :: sty)
end TsmLean.Core

40
TsmLean/Core/Types.lean Normal file
View file

@ -0,0 +1,40 @@
import TsmLean.Core.Syntax
namespace TsmLean.Core
/-! # Static type system for TSM.
Types live on the *stack*, not on names — this is the substrate-level
asymmetry vs TGC and TOC. Each instruction transforms the *type* of
the stack it expects to its post-state.
Two base types: `int` and `bool`. A stack-type `StackTy` is a list of
types matching the stack's runtime contents top-to-tail. Per-instruction
typing `HasTypeInstr instr ty_in ty_out` is the abstract transition;
real code-typing (the JVM-style stackmap) requires that all reachable
PCs have consistent stack types — handled separately. -/
inductive Ty where
| int : Ty
| bool : Ty
deriving Repr, BEq, DecidableEq, Inhabited
abbrev StackTy := List Ty
inductive HasTypeInstr : Instr → StackTy → StackTy → Prop where
| push {sty} (n : Int) : HasTypeInstr (.push n) sty (.int :: sty)
| pushB {sty} (b : Bool) : HasTypeInstr (.pushB b) sty (.bool :: sty)
| pop {T sty} : HasTypeInstr .pop (T :: sty) sty
| dup {T sty} : HasTypeInstr .dup (T :: sty) (T :: T :: sty)
| swap {T₁ T₂ sty} : HasTypeInstr .swap (T₁ :: T₂ :: sty) (T₂ :: T₁ :: sty)
| add {sty} : HasTypeInstr .add (.int :: .int :: sty) (.int :: sty)
| sub {sty} : HasTypeInstr .sub (.int :: .int :: sty) (.int :: sty)
| mul {sty} : HasTypeInstr .mul (.int :: .int :: sty) (.int :: sty)
| eq_int {sty} : HasTypeInstr .eq (.int :: .int :: sty) (.bool :: sty)
| lt_int {sty} : HasTypeInstr .lt (.int :: .int :: sty) (.bool :: sty)
-- Jumps preserve the stack type (target's expected stack matches source's).
| jmp {k sty} : HasTypeInstr (.jmp k) sty sty
| jmpFalse {k sty} : HasTypeInstr (.jmpFalse k) (.bool :: sty) sty
| halt {sty} : HasTypeInstr .halt sty sty
end TsmLean.Core

6
lake-manifest.json Normal file
View file

@ -0,0 +1,6 @@
{"version": "1.2.0",
"packagesDir": ".lake/packages",
"packages": [],
"name": "«tsm-lean»",
"lakeDir": ".lake",
"fixedToolchain": false}

10
lakefile.toml Normal file
View file

@ -0,0 +1,10 @@
name = "tsm-lean"
version = "0.1.0"
defaultTargets = ["tsm-lean"]
[[lean_lib]]
name = "TsmLean"
[[lean_exe]]
name = "tsm-lean"
root = "Main"

1
lean-toolchain Normal file
View file

@ -0,0 +1 @@
leanprover/lean4:v4.30.0-rc2