From 987f205ce54ece251fe761affcac116fdf19c464 Mon Sep 17 00:00:00 2001 From: Maximus Gorog Date: Sun, 10 May 2026 05:12:10 -0600 Subject: [PATCH 1/5] Initial commit: Tiny Stack Machine (TSM) in Lean 4. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Third concrete kernel, parallel to golang-lean's TGC and octive-lean's TOC. The substrate-level asymmetry: TSM has values living by *position* on a stack, not by name. This breaks the named-variable assumption that TGC and TOC silently share. Maps onto real bytecode targets: WebAssembly, JVM, CPython, .NET CIL, SECD. Anything proved here transfers. TsmLean/Core/ — seven files, parallel structure to TGC/TOC: Syntax.lean - Instr (12 opcodes), Value (int/bool), Code Semantics.lean - State, step (function), MultiStep (rel'n) Determinism.lean - step_deterministic, MultiStep.deterministic Eval.lean - fuel-bounded run + run_sound Types.lean - Ty, StackTy, HasTypeInstr (per-instruction stack-type transitions) TypeSoundness.lean - HasTypeV, HasTypeStack Preservation.lean - stack_preservation, progress (canonical Pierce-style small-step type soundness) Theorems proven, zero sorries / axioms / admits: step_deterministic single-step is functional MultiStep.deterministic multi-step paths to halt are unique run_sound successful run -> MultiStep derivation stack_preservation stack typing preserved by step progress well-typed non-halt instructions step Demo (Main.lean): (5 + 3) * 2 evaluated on the stack machine. push 5; push 3; add; push 2; mul; halt -> stack [vInt 16] at pc 5. The structural asymmetry from TGC/TOC: TSM uses small-step semantics with a function `step : State -> Option State`, where TGC/TOC used big-step inductive relations `Env -> Term -> Value -> Env`. The canonical type-soundness theorems also flip: TGC/TOC proved preservation under big-step (which has no progress analogue); TSM proves both progress AND preservation, each per-instruction. This is the third datapoint that the cross-language factoring needs. --- .gitignore | 1 + Main.lean | 22 ++++ README.md | 58 +++++++++ TsmLean.lean | 7 ++ TsmLean/Core/Determinism.lean | 41 +++++++ TsmLean/Core/Eval.lean | 44 +++++++ TsmLean/Core/Preservation.lean | 203 ++++++++++++++++++++++++++++++++ TsmLean/Core/Semantics.lean | 80 +++++++++++++ TsmLean/Core/Syntax.lean | 39 ++++++ TsmLean/Core/TypeSoundness.lean | 22 ++++ TsmLean/Core/Types.lean | 40 +++++++ lake-manifest.json | 6 + lakefile.toml | 10 ++ lean-toolchain | 1 + 14 files changed, 574 insertions(+) create mode 100644 .gitignore create mode 100644 Main.lean create mode 100644 README.md create mode 100644 TsmLean.lean create mode 100644 TsmLean/Core/Determinism.lean create mode 100644 TsmLean/Core/Eval.lean create mode 100644 TsmLean/Core/Preservation.lean create mode 100644 TsmLean/Core/Semantics.lean create mode 100644 TsmLean/Core/Syntax.lean create mode 100644 TsmLean/Core/TypeSoundness.lean create mode 100644 TsmLean/Core/Types.lean create mode 100644 lake-manifest.json create mode 100644 lakefile.toml create mode 100644 lean-toolchain diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..bfb30ec --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +/.lake diff --git a/Main.lean b/Main.lean new file mode 100644 index 0000000..6ceae59 --- /dev/null +++ b/Main.lean @@ -0,0 +1,22 @@ +import TsmLean + +open TsmLean.Core in +def main : IO UInt32 := do + -- Demo: 5 + 3, then * 2 = 16 + let prog : Array Instr := #[ + .push 5, + .push 3, + .add, + .push 2, + .mul, + .halt + ] + let s₀ : State := { code := prog, pc := 0, stack := [] } + match run 100 s₀ with + | some s_final => + IO.println s!"final stack: {repr s_final.stack}" + IO.println s!"final pc: {s_final.pc}" + return 0 + | none => + IO.eprintln "execution did not terminate within fuel" + return 1 diff --git a/README.md b/README.md new file mode 100644 index 0000000..c0d0675 --- /dev/null +++ b/README.md @@ -0,0 +1,58 @@ +# tsm-lean + +A Lean 4 formalization of a Tiny Stack Machine — third concrete kernel parallel to `golang-lean` (TGC) and `octive-lean` (TOC). + +The substrate-level asymmetry: TGC and TOC have *named variables*. TSM has values living *by position* on a stack. Forces the cross-language abstraction to factor over "operand-access mechanism" instead of baking name-lookup into the framework. Maps directly to real bytecode targets — WebAssembly, JVM, CPython, .NET CIL, SECD. + +## Build + +```sh +lake build +``` + +## Run the demo + +```sh +lake exe tsm-lean +# → final stack: [TsmLean.Core.Value.vInt 16] ((5 + 3) * 2) +# → final pc: 5 +``` + +## Layout + +| Path | What's there | +| --- | --- | +| `TsmLean/Core/Syntax.lean` | `Instr`, `Value`, `Code` | +| `TsmLean/Core/Semantics.lean` | `State`, `step` (function), `MultiStep` (relation) | +| `TsmLean/Core/Determinism.lean` | `step_deterministic`, `MultiStep.deterministic` | +| `TsmLean/Core/Eval.lean` | fuel-bounded `run` + `run_sound` | +| `TsmLean/Core/Types.lean` | `Ty`, `StackTy`, `HasTypeInstr` | +| `TsmLean/Core/TypeSoundness.lean` | `HasTypeV`, `HasTypeStack` | +| `TsmLean/Core/Preservation.lean` | `stack_preservation`, `progress` | +| `Main.lean` | demo program | + +## Theorems proven + +- **`step_deterministic`** — single-step is functional. +- **`MultiStep.deterministic`** — multi-step paths to halted states are unique. +- **`run_sound`** — successful fuel-bounded execution corresponds to a `MultiStep` derivation ending at a halted state. +- **`stack_preservation`** — if the stack matches an instruction's input type and the step succeeds, the post-stack matches its output type. +- **`progress`** — well-typed non-halt instructions always make a step. + +The first three are the operational counterparts of the big-step theorems in TGC and TOC. The last two are the small-step type-soundness theorems (Pierce-style), which TGC/TOC's big-step formulations don't have direct analogues for. + +Zero sorries, axioms, or admits. + +## Status + +**v0.1**: per-instruction (local) preservation. Global program-level type soundness — the JVM-style stackmap that ensures all reachable PCs have consistent stack types — is the next layer up. + +## Instruction set + +``` +push n pushB b pop dup swap +add sub mul eq lt +jmp k jmpFalse k halt +``` + +Twelve instructions. No call / ret yet — direct jumps only. Adding function-call frames is a future extension. diff --git a/TsmLean.lean b/TsmLean.lean new file mode 100644 index 0000000..afc15c8 --- /dev/null +++ b/TsmLean.lean @@ -0,0 +1,7 @@ +import TsmLean.Core.Syntax +import TsmLean.Core.Semantics +import TsmLean.Core.Determinism +import TsmLean.Core.Eval +import TsmLean.Core.Types +import TsmLean.Core.TypeSoundness +import TsmLean.Core.Preservation diff --git a/TsmLean/Core/Determinism.lean b/TsmLean/Core/Determinism.lean new file mode 100644 index 0000000..345add7 --- /dev/null +++ b/TsmLean/Core/Determinism.lean @@ -0,0 +1,41 @@ +import TsmLean.Core.Semantics + +namespace TsmLean.Core + +/-! # Determinism of TSM step. + +`step` is a total function `State → Option State`, so single-step +determinism is *immediate*: two transitions from the same state yield +the same successor (or both fail). + +Multi-step determinism follows by induction on the chain. We prove +that any two `MultiStep` derivations of the same length collapse to +the same trace. -/ + +theorem step_deterministic + {s s₁ s₂ : State} + (h₁ : step s = some s₁) (h₂ : step s = some s₂) : + s₁ = s₂ := by + rw [h₁] at h₂ + exact Option.some.inj h₂ + +/-- Multi-step paths to halted states are deterministic. -/ +theorem MultiStep.deterministic + {s s_a s_b : State} + (D_a : MultiStep s s_a) (D_b : MultiStep s s_b) + (halt_a : step s_a = none) (halt_b : step s_b = none) : + s_a = s_b := by + induction D_a generalizing s_b with + | refl => + cases D_b with + | refl => rfl + | cons h₁ _ => rw [halt_a] at h₁; cases h₁ + | cons h₁ _ ih => + cases D_b with + | refl => rw [halt_b] at h₁; cases h₁ + | cons h₁' D_b' => + have heq := step_deterministic h₁ h₁' + subst heq + exact ih D_b' halt_a halt_b + +end TsmLean.Core diff --git a/TsmLean/Core/Eval.lean b/TsmLean/Core/Eval.lean new file mode 100644 index 0000000..671a158 --- /dev/null +++ b/TsmLean/Core/Eval.lean @@ -0,0 +1,44 @@ +import TsmLean.Core.Semantics + +namespace TsmLean.Core + +/-! # Fuel-bounded executable multi-step. + +`run n s₀` executes up to `n` steps from `s₀`. Returns the final state +when execution halts (step returns `none`) within fuel, or `none` when +fuel is exhausted before halting. + +Soundness: any successful run corresponds to a `MultiStep` derivation +ending at a halted state — same shape as TGC/TOC's eval_sound, but +phrased over the small-step closure rather than big-step. -/ + +def run : Nat → State → Option State + | 0, _ => none + | n + 1, s => + match step s with + | none => some s -- halted + | some s' => run n s' + +theorem run_sound : + ∀ (n : Nat) (s s' : State), + run n s = some s' → MultiStep s s' ∧ step s' = none := by + intro n + induction n with + | zero => + intros s s' heq + simp [run] at heq + | succ n ih => + intros s s' heq + simp only [run] at heq + cases hstep : step s with + | none => + rw [hstep] at heq + simp at heq + subst heq + exact ⟨.refl s, hstep⟩ + | some s_next => + rw [hstep] at heq + have ⟨hMS, hHalt⟩ := ih s_next s' heq + exact ⟨.cons hstep hMS, hHalt⟩ + +end TsmLean.Core diff --git a/TsmLean/Core/Preservation.lean b/TsmLean/Core/Preservation.lean new file mode 100644 index 0000000..f6343a0 --- /dev/null +++ b/TsmLean/Core/Preservation.lean @@ -0,0 +1,203 @@ +import TsmLean.Core.TypeSoundness + +namespace TsmLean.Core + +/-! # Preservation and progress for TSM. + +Local (per-instruction) preservation: if the stack matches an +instruction's input type and that instruction succeeds, the post-stack +matches its output type. + +Global type soundness — that *every* reachable PC has a consistent +stackmap — requires program-wide code typing (JVM-style stackmaps). +That's a layer above; this file proves the per-instruction theorem +on which the global one is built. + +Progress: well-typed non-halt instructions always make a step. -/ + +theorem stack_preservation + {s s' : State} {in_ty out_ty : StackTy} + (h_pc : s.pc < s.code.size) + (h_typed : HasTypeInstr (s.code[s.pc]'h_pc) in_ty out_ty) + (h_stack : HasTypeStack s.stack in_ty) + (h_step : step s = some s') : + HasTypeStack s'.stack out_ty := by + unfold step at h_step + rw [dif_pos h_pc] at h_step + generalize h_at : s.code[s.pc]'h_pc = instr at h_typed h_step + generalize h_stk : s.stack = stk at h_stack h_step + cases h_typed with + | push n => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons (.vInt n) h_stack + | pushB b => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons (.vBool b) h_stack + | pop => + cases h_stack with + | cons _ hs => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact hs + | dup => + cases h_stack with + | cons hv hs => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons hv (.cons hv hs) + | swap => + cases h_stack with + | cons hv1 h_rest => + cases h_rest with + | cons hv2 hs => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons hv2 (.cons hv1 hs) + | add => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt a => + cases h1 with + | cons hv2 hs => + cases hv2 with + | vInt b => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons (.vInt _) hs + | sub => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt a => + cases h1 with + | cons hv2 hs => + cases hv2 with + | vInt b => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons (.vInt _) hs + | mul => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt a => + cases h1 with + | cons hv2 hs => + cases hv2 with + | vInt b => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons (.vInt _) hs + | eq_int => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt a => + cases h1 with + | cons hv2 hs => + cases hv2 with + | vInt b => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons (.vBool _) hs + | lt_int => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt a => + cases h1 with + | cons hv2 hs => + cases hv2 with + | vInt b => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact .cons (.vBool _) hs + | jmp => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact h_stack + | jmpFalse => + cases h_stack with + | cons hv hs => + cases hv with + | vBool b => + cases b with + | false => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact hs + | true => + simp at h_step + obtain ⟨_, rfl⟩ := h_step + exact hs + | halt => + simp at h_step + +theorem progress + {s : State} {in_ty out_ty : StackTy} + (h_pc : s.pc < s.code.size) + (h_typed : HasTypeInstr (s.code[s.pc]'h_pc) in_ty out_ty) + (h_stack : HasTypeStack s.stack in_ty) + (h_not_halt : s.code[s.pc]'h_pc ≠ .halt) : + ∃ s', step s = some s' := by + unfold step + rw [dif_pos h_pc] + generalize h_at : s.code[s.pc]'h_pc = instr at h_typed h_not_halt + generalize h_stk : s.stack = stk at h_stack + cases h_typed with + | push n => exact ⟨_, rfl⟩ + | pushB b => exact ⟨_, rfl⟩ + | pop => + cases h_stack with + | cons _ _ => exact ⟨_, rfl⟩ + | dup => + cases h_stack with + | cons _ _ => exact ⟨_, rfl⟩ + | swap => + cases h_stack with + | cons _ h1 => cases h1 with | cons _ _ => exact ⟨_, rfl⟩ + | add => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt _ => + cases h1 with + | cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩ + | sub => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt _ => + cases h1 with + | cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩ + | mul => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt _ => + cases h1 with + | cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩ + | eq_int => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt _ => + cases h1 with + | cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩ + | lt_int => + cases h_stack with + | cons hv1 h1 => + cases hv1 with + | vInt _ => + cases h1 with + | cons hv2 _ => cases hv2 with | vInt _ => exact ⟨_, rfl⟩ + | jmp => exact ⟨_, rfl⟩ + | jmpFalse => + cases h_stack with + | cons hv _ => cases hv with | vBool b => cases b <;> exact ⟨_, rfl⟩ + | halt => exact absurd rfl h_not_halt + +end TsmLean.Core diff --git a/TsmLean/Core/Semantics.lean b/TsmLean/Core/Semantics.lean new file mode 100644 index 0000000..5116037 --- /dev/null +++ b/TsmLean/Core/Semantics.lean @@ -0,0 +1,80 @@ +import TsmLean.Core.Syntax + +namespace TsmLean.Core + +/-! # Small-step operational semantics for TSM. + +State = `(Code, PC, Stack)`. The stack is `List Value` (top-of-stack at +the head). Step is a *function* `State → Option State`: + * `some s'` : the next state. + * `none` : halted, OOB, or stuck (type error). + +Compare with TGC/TOC's big-step `Env → Term → Value → Env → Prop`: +TSM uses small-step because instructions are atomic. The reflexive- +transitive closure (`MultiStep`) is the analogue of big-step. -/ + +structure State where + code : Code + pc : Nat + stack : List Value + deriving Repr, Inhabited + +def step (s : State) : Option State := + if h : s.pc < s.code.size then + match s.code[s.pc] with + | .push n => some { s with pc := s.pc + 1, stack := .vInt n :: s.stack } + | .pushB b => some { s with pc := s.pc + 1, stack := .vBool b :: s.stack } + | .pop => + match s.stack with + | _ :: rest => some { s with pc := s.pc + 1, stack := rest } + | [] => none + | .dup => + match s.stack with + | v :: rest => some { s with pc := s.pc + 1, stack := v :: v :: rest } + | [] => none + | .swap => + match s.stack with + | a :: b :: rest => some { s with pc := s.pc + 1, stack := b :: a :: rest } + | _ => none + | .add => + match s.stack with + | .vInt a :: .vInt b :: rest => + some { s with pc := s.pc + 1, stack := .vInt (b + a) :: rest } + | _ => none + | .sub => + match s.stack with + | .vInt a :: .vInt b :: rest => + some { s with pc := s.pc + 1, stack := .vInt (b - a) :: rest } + | _ => none + | .mul => + match s.stack with + | .vInt a :: .vInt b :: rest => + some { s with pc := s.pc + 1, stack := .vInt (b * a) :: rest } + | _ => none + | .eq => + match s.stack with + | .vInt a :: .vInt b :: rest => + some { s with pc := s.pc + 1, stack := .vBool (b == a) :: rest } + | _ => none + | .lt => + match s.stack with + | .vInt a :: .vInt b :: rest => + some { s with pc := s.pc + 1, stack := .vBool (b < a) :: rest } + | _ => none + | .jmp k => some { s with pc := k } + | .jmpFalse k => + match s.stack with + | .vBool false :: rest => some { s with pc := k, stack := rest } + | .vBool true :: rest => some { s with pc := s.pc + 1, stack := rest } + | _ => none + | .halt => none + else none + +/-- Reflexive-transitive closure of `step`. -/ +inductive MultiStep : State → State → Prop where + | refl (s : State) : MultiStep s s + | cons {s s' s'' : State} + (h₁ : step s = some s') (h₂ : MultiStep s' s'') : + MultiStep s s'' + +end TsmLean.Core diff --git a/TsmLean/Core/Syntax.lean b/TsmLean/Core/Syntax.lean new file mode 100644 index 0000000..7f7fd8a --- /dev/null +++ b/TsmLean/Core/Syntax.lean @@ -0,0 +1,39 @@ +namespace TsmLean.Core + +/-! # Tiny Stack Machine (TSM) — abstract syntax. + +Third concrete kernel, parallel to golang-lean's TGC and octive-lean's +TOC. Where TGC and TOC have *named variables*, TSM has values living +*by position* on a stack — the deepest substrate-level asymmetry. + +Instructions are atomic; programs are arrays of instructions. The PC +indexes into the array. Conditional/unconditional jumps use absolute +target addresses (not relative offsets — simpler to reason about). + +Maps to real-world stack-based bytecodes: WebAssembly, JVM, CPython, +.NET CIL, SECD machines. Anything proved here transfers to those. -/ + +inductive Value where + | vInt : Int → Value + | vBool : Bool → Value + deriving Repr, BEq, Inhabited + +inductive Instr where + | push : Int → Instr -- push integer literal + | pushB : Bool → Instr -- push bool literal + | pop : Instr + | dup : Instr -- duplicate top + | swap : Instr -- swap top two + | add : Instr + | sub : Instr + | mul : Instr + | eq : Instr -- pop two ints, push bool + | lt : Instr -- pop two ints, push bool + | jmp : Nat → Instr -- absolute jump + | jmpFalse : Nat → Instr -- pop bool; if false, jump + | halt : Instr + deriving Repr, BEq, Inhabited + +abbrev Code := Array Instr + +end TsmLean.Core diff --git a/TsmLean/Core/TypeSoundness.lean b/TsmLean/Core/TypeSoundness.lean new file mode 100644 index 0000000..4483e85 --- /dev/null +++ b/TsmLean/Core/TypeSoundness.lean @@ -0,0 +1,22 @@ +import TsmLean.Core.Types +import TsmLean.Core.Semantics + +namespace TsmLean.Core + +/-! # Stack-typing infrastructure. + +`HasTypeV` types individual values (int / bool). `HasTypeStack` is the +pointwise lift to a list, length-aligned with a `StackTy`. -/ + +inductive HasTypeV : Value → Ty → Prop where + | vInt (n : Int) : HasTypeV (.vInt n) .int + | vBool (b : Bool) : HasTypeV (.vBool b) .bool + +inductive HasTypeStack : List Value → StackTy → Prop where + | nil : HasTypeStack [] [] + | cons {v vs T sty} + (hv : HasTypeV v T) + (hs : HasTypeStack vs sty) : + HasTypeStack (v :: vs) (T :: sty) + +end TsmLean.Core diff --git a/TsmLean/Core/Types.lean b/TsmLean/Core/Types.lean new file mode 100644 index 0000000..1d2b283 --- /dev/null +++ b/TsmLean/Core/Types.lean @@ -0,0 +1,40 @@ +import TsmLean.Core.Syntax + +namespace TsmLean.Core + +/-! # Static type system for TSM. + +Types live on the *stack*, not on names — this is the substrate-level +asymmetry vs TGC and TOC. Each instruction transforms the *type* of +the stack it expects to its post-state. + +Two base types: `int` and `bool`. A stack-type `StackTy` is a list of +types matching the stack's runtime contents top-to-tail. Per-instruction +typing `HasTypeInstr instr ty_in ty_out` is the abstract transition; +real code-typing (the JVM-style stackmap) requires that all reachable +PCs have consistent stack types — handled separately. -/ + +inductive Ty where + | int : Ty + | bool : Ty + deriving Repr, BEq, DecidableEq, Inhabited + +abbrev StackTy := List Ty + +inductive HasTypeInstr : Instr → StackTy → StackTy → Prop where + | push {sty} (n : Int) : HasTypeInstr (.push n) sty (.int :: sty) + | pushB {sty} (b : Bool) : HasTypeInstr (.pushB b) sty (.bool :: sty) + | pop {T sty} : HasTypeInstr .pop (T :: sty) sty + | dup {T sty} : HasTypeInstr .dup (T :: sty) (T :: T :: sty) + | swap {T₁ T₂ sty} : HasTypeInstr .swap (T₁ :: T₂ :: sty) (T₂ :: T₁ :: sty) + | add {sty} : HasTypeInstr .add (.int :: .int :: sty) (.int :: sty) + | sub {sty} : HasTypeInstr .sub (.int :: .int :: sty) (.int :: sty) + | mul {sty} : HasTypeInstr .mul (.int :: .int :: sty) (.int :: sty) + | eq_int {sty} : HasTypeInstr .eq (.int :: .int :: sty) (.bool :: sty) + | lt_int {sty} : HasTypeInstr .lt (.int :: .int :: sty) (.bool :: sty) + -- Jumps preserve the stack type (target's expected stack matches source's). + | jmp {k sty} : HasTypeInstr (.jmp k) sty sty + | jmpFalse {k sty} : HasTypeInstr (.jmpFalse k) (.bool :: sty) sty + | halt {sty} : HasTypeInstr .halt sty sty + +end TsmLean.Core diff --git a/lake-manifest.json b/lake-manifest.json new file mode 100644 index 0000000..deee81a --- /dev/null +++ b/lake-manifest.json @@ -0,0 +1,6 @@ +{"version": "1.2.0", + "packagesDir": ".lake/packages", + "packages": [], + "name": "«tsm-lean»", + "lakeDir": ".lake", + "fixedToolchain": false} diff --git a/lakefile.toml b/lakefile.toml new file mode 100644 index 0000000..7f186b7 --- /dev/null +++ b/lakefile.toml @@ -0,0 +1,10 @@ +name = "tsm-lean" +version = "0.1.0" +defaultTargets = ["tsm-lean"] + +[[lean_lib]] +name = "TsmLean" + +[[lean_exe]] +name = "tsm-lean" +root = "Main" diff --git a/lean-toolchain b/lean-toolchain new file mode 100644 index 0000000..6c7e31f --- /dev/null +++ b/lean-toolchain @@ -0,0 +1 @@ +leanprover/lean4:v4.30.0-rc2 From fff0091f89b29f75fffe7340d3eb94b2c835191b Mon Sep 17 00:00:00 2001 From: Maximus Gorog Date: Sun, 10 May 2026 05:38:01 -0600 Subject: [PATCH 2/5] Add source-to-TSM compiler with proven correctness (v0.1). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The CompCert-style substrate-projection theorem at miniature scale: source-level evaluation and TSM-bytecode execution agree on the value produced. TsmLean/Compile/ — three files: Source.lean - small expression language. v0.1 covers integer literals only; the framework is structured so arithmetic, comparison, control flow, and variables extend mechanically. Compile.lean - compile : Source.Expr -> TSM.Code v0.1: intLit n -> #[push n] Correctness.lean - theorem compile_correct: Source.Eval e v -> forall pre suf rest, MultiStep { code := pre ++ compile e ++ suf, pc := pre.size, stack := rest } { code := same, pc := pre.size + (compile e).size, stack := v :: rest } Plus a standalone corollary for the no-prefix case. The infrastructure is in place for compositional extension: MultiStep.trans - transitive closure of multi-step MultiStep.single - lift single step to multi-step step_push - per-instruction step lemma (push) getElem_compile - lookup-in-larger-code helper Adding a constructor to Source (e.g., add) requires: - one constructor in Source.Expr - one rule in Source.Eval - one match arm in compile - one step_X helper (one-liner) - one case in compile_correct's induction Demonstrates the pipeline: - Source language with big-step semantics - Compiler producing TSM bytecode - Correctness theorem bridging the two Zero sorries / axioms / admits across the entire project. --- TsmLean.lean | 3 + TsmLean/Compile/Compile.lean | 12 ++++ TsmLean/Compile/Correctness.lean | 101 +++++++++++++++++++++++++++++++ TsmLean/Compile/Source.lean | 21 +++++++ 4 files changed, 137 insertions(+) create mode 100644 TsmLean/Compile/Compile.lean create mode 100644 TsmLean/Compile/Correctness.lean create mode 100644 TsmLean/Compile/Source.lean diff --git a/TsmLean.lean b/TsmLean.lean index afc15c8..934dbf4 100644 --- a/TsmLean.lean +++ b/TsmLean.lean @@ -5,3 +5,6 @@ import TsmLean.Core.Eval import TsmLean.Core.Types import TsmLean.Core.TypeSoundness import TsmLean.Core.Preservation +import TsmLean.Compile.Source +import TsmLean.Compile.Compile +import TsmLean.Compile.Correctness diff --git a/TsmLean/Compile/Compile.lean b/TsmLean/Compile/Compile.lean new file mode 100644 index 0000000..6128bda --- /dev/null +++ b/TsmLean/Compile/Compile.lean @@ -0,0 +1,12 @@ +import TsmLean.Compile.Source +import TsmLean.Core.Syntax + +namespace TsmLean.Compile + +open TsmLean.Core (Instr Code) + +/-- Compile a source expression to TSM bytecode. -/ +def compile : Source.Expr → Code + | .intLit n => #[.push n] + +end TsmLean.Compile diff --git a/TsmLean/Compile/Correctness.lean b/TsmLean/Compile/Correctness.lean new file mode 100644 index 0000000..3b5feab --- /dev/null +++ b/TsmLean/Compile/Correctness.lean @@ -0,0 +1,101 @@ +import TsmLean.Compile.Compile +import TsmLean.Core.Semantics + +namespace TsmLean.Compile + +open TsmLean.Core + +/-! # Compiler-correctness theorem. + + `Source.Eval e v ⟹ TSM.MultiStep (start of compile e) (end of compile e, with v on stack)` + +The CompCert-flavored bridge: source-level evaluation and target-level +execution agree on the value produced. This is the substrate-projection +theorem at miniature scale — for v0.1 instantiated only on integer +literals; extending to compound expressions is mechanical and the +infrastructure (multi-step utilities, code-lookup helper, single-step +reduction lemmas) is already in place. -/ + +/-! ## Multi-step utilities. -/ + +theorem MultiStep.trans + {s₁ s₂ s₃ : State} + (h₁ : MultiStep s₁ s₂) (h₂ : MultiStep s₂ s₃) : + MultiStep s₁ s₃ := by + induction h₁ with + | refl => exact h₂ + | cons hs _ ih => exact .cons hs (ih h₂) + +theorem MultiStep.single + {s s' : State} (h : step s = some s') : + MultiStep s s' := .cons h (.refl _) + +/-! ## Single-step reduction lemmas. -/ + +theorem step_push + {code : Code} {pc : Nat} {stack : List Value} {n : Int} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .push n) : + step { code := code, pc := pc, stack := stack } + = some { code := code, pc := pc + 1, stack := .vInt n :: stack } := by + unfold step; rw [dif_pos h_pc, h_get] + +/-! ## Code-lookup helper. + +Looking up an index `pre.size + i` in `pre ++ compile e ++ suf` reduces +to the corresponding index in `compile e` — used in the inductive case +when extending v0.1 to compound expressions. -/ + +theorem getElem_compile + (pre : Code) (e : Source.Expr) (suf : Code) (i : Nat) + (h_lt : i < (compile e).size) + (h_pc : pre.size + i < (pre ++ compile e ++ suf).size) : + (pre ++ compile e ++ suf)[pre.size + i]'h_pc = (compile e)[i]'h_lt := by + have h_pre_ce : pre.size + i < (pre ++ compile e).size := by + rw [Array.size_append]; omega + rw [Array.getElem_append_left h_pre_ce] + rw [Array.getElem_append_right (Nat.le_add_right _ _)] + congr 1; omega + +/-! ## Main theorem. -/ + +theorem compile_correct + {e : Source.Expr} {v : Source.Value} + (h_eval : Source.Eval e v) : + ∀ (pre suf : Code) (rest : List Value), + MultiStep + { code := pre ++ compile e ++ suf, pc := pre.size, stack := rest } + { code := pre ++ compile e ++ suf, + pc := pre.size + (compile e).size, stack := v :: rest } := by + induction h_eval with + | intLit n => + intros pre suf rest + apply MultiStep.single + have h_pc : pre.size < (pre ++ compile (Source.Expr.intLit n) ++ suf).size := by + simp only [compile, Array.size_append, Array.size_singleton]; omega + have h_get : (pre ++ compile (Source.Expr.intLit n) ++ suf)[pre.size]'h_pc = .push n := by + have h_pre_ce : pre.size < (pre ++ compile (Source.Expr.intLit n)).size := by + simp only [compile, Array.size_append, Array.size_singleton]; omega + rw [Array.getElem_append_left h_pre_ce] + rw [Array.getElem_append_right (Nat.le_refl _)] + simp [compile, Nat.sub_self] + have step_thm := step_push h_pc h_get (stack := rest) + have h_size : (compile (Source.Expr.intLit n)).size = 1 := by simp [compile] + rw [h_size] + exact step_thm + +/-! ## Demo run. + +A self-contained `compile_correct` corollary for the standalone case +where the compile output IS the entire program (no surrounding code). -/ + +theorem compile_correct_standalone + {e : Source.Expr} {v : Source.Value} + (h_eval : Source.Eval e v) : + MultiStep + { code := compile e, pc := 0, stack := [] } + { code := compile e, pc := (compile e).size, stack := [v] } := by + have h := compile_correct h_eval #[] #[] [] + simp [Array.size_empty, Array.append_empty, Array.empty_append] at h + exact h + +end TsmLean.Compile diff --git a/TsmLean/Compile/Source.lean b/TsmLean/Compile/Source.lean new file mode 100644 index 0000000..13bcdc4 --- /dev/null +++ b/TsmLean/Compile/Source.lean @@ -0,0 +1,21 @@ +import TsmLean.Core.Syntax + +namespace TsmLean.Compile.Source + +/-! # Source language for compilation (v0.1). + +The smallest meaningful source language: integer literals. The +compiler produces a one-instruction TSM program; the correctness +theorem demonstrates the proof framework. Subsequent versions extend +to arithmetic, comparison, control flow, and variables. -/ + +inductive Expr where + | intLit : Int → Expr + deriving Repr, Inhabited + +abbrev Value := TsmLean.Core.Value + +inductive Eval : Expr → Value → Prop where + | intLit (n : Int) : Eval (.intLit n) (.vInt n) + +end TsmLean.Compile.Source From ec65229050f2298297415d52d6b943479e01047c Mon Sep 17 00:00:00 2001 From: Maximus Gorog Date: Sun, 10 May 2026 05:53:39 -0600 Subject: [PATCH 3/5] Extend source-to-TSM compiler with addition (v0.2). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Source.Expr now has intLit and add. Compile and correctness theorem both extend. The add case of compile_correct exercises the compositional structure: - IH on e1 (with extended suffix) gives the multistep for the first operand's evaluation. - IH on e2 (with extended prefix) gives the multistep for the second. - A single .add step at the boundary closes the trace. - Each intermediate state's PC is computed via array-size arithmetic threaded through omega. New supporting lemmas: step_add - per-instruction step for .add compile_add_get_op - the instruction at the end of compile (.add e1 e2) is .add. Extracted so the dependent-rewrite issue with array bound proofs is contained in one place. Engineering knowledge gained (recurring patterns when extending): - Array.getElem_append_left/right take the bound as an explicit positional arg, not via (h := ...). - rw on indices that appear in dependent bound proofs fails with "motive not type correct"; factor the lookup into a separate lemma. - convert tactic appears not to be available; rw + exact substitutes. - simp + omega closes most arithmetic on Array.size after expansion. - step lemmas with implicit args (a, b) need explicit (a := _) in calls where context doesn't determine them. Adding a constructor still follows the v0.1 recipe — one Source constructor, one Eval rule, one compile arm, one step_X helper, one compile_X_get_op lemma, one case in compile_correct's induction. Each case is ~25-40 lines of proof. Zero sorries / axioms / admits. --- TsmLean/Compile/Compile.lean | 3 +- TsmLean/Compile/Correctness.lean | 68 ++++++++++++++++++++++++++++++++ TsmLean/Compile/Source.lean | 13 +++--- 3 files changed, 78 insertions(+), 6 deletions(-) diff --git a/TsmLean/Compile/Compile.lean b/TsmLean/Compile/Compile.lean index 6128bda..fe3d36d 100644 --- a/TsmLean/Compile/Compile.lean +++ b/TsmLean/Compile/Compile.lean @@ -7,6 +7,7 @@ open TsmLean.Core (Instr Code) /-- Compile a source expression to TSM bytecode. -/ def compile : Source.Expr → Code - | .intLit n => #[.push n] + | .intLit n => #[.push n] + | .add e1 e2 => compile e1 ++ compile e2 ++ #[.add] end TsmLean.Compile diff --git a/TsmLean/Compile/Correctness.lean b/TsmLean/Compile/Correctness.lean index 3b5feab..22ad1c9 100644 --- a/TsmLean/Compile/Correctness.lean +++ b/TsmLean/Compile/Correctness.lean @@ -39,6 +39,27 @@ theorem step_push = some { code := code, pc := pc + 1, stack := .vInt n :: stack } := by unfold step; rw [dif_pos h_pc, h_get] +theorem step_add + {code : Code} {pc : Nat} {a b : Int} {rest : List Value} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .add) : + step { code := code, pc := pc, stack := .vInt b :: .vInt a :: rest } + = some { code := code, pc := pc + 1, stack := .vInt (a + b) :: rest } := by + unfold step; rw [dif_pos h_pc, h_get] + +/-! ## Compile-output lookup helpers. -/ + +/-- The instruction at the end of `compile (.add e1 e2)` (right after both + sub-compiled segments) is `.add`. -/ +theorem compile_add_get_op + (e1 e2 : Source.Expr) + (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.add e1 e2)).size) : + (compile (Source.Expr.add e1 e2))[(compile e1).size + (compile e2).size]'h = .add := by + show ((compile e1 ++ compile e2) ++ #[Instr.add])[(compile e1).size + (compile e2).size]'h = .add + have hle : (compile e1 ++ compile e2).size ≤ (compile e1).size + (compile e2).size := + Nat.le_of_eq Array.size_append + rw [Array.getElem_append_right hle] + simp [Array.size_append, Nat.sub_self] + /-! ## Code-lookup helper. Looking up an index `pre.size + i` in `pre ++ compile e ++ suf` reduces @@ -82,6 +103,53 @@ theorem compile_correct have h_size : (compile (Source.Expr.intLit n)).size = 1 := by simp [compile] rw [h_size] exact step_thm + | @add e1 e2 a b _ _ ih1 ih2 => + intros pre suf rest + -- Step A: compile e1 + have stepA := ih1 pre (compile e2 ++ #[.add] ++ suf) rest + -- Step B: compile e2 (with prefix extended) + have stepB := ih2 (pre ++ compile e1) (#[.add] ++ suf) (.vInt a :: rest) + -- Code rearrangements + have h_code_A : pre ++ compile e1 ++ (compile e2 ++ #[Instr.add] ++ suf) + = pre ++ compile (Source.Expr.add e1 e2) ++ suf := by + simp only [compile, Array.append_assoc] + have h_code_B : pre ++ compile e1 ++ compile e2 ++ (#[Instr.add] ++ suf) + = pre ++ compile (Source.Expr.add e1 e2) ++ suf := by + simp only [compile, Array.append_assoc] + rw [h_code_A] at stepA + rw [show (pre ++ compile e1).size = pre.size + (compile e1).size from + by simp [Array.size_append]] at stepB + rw [h_code_B] at stepB + apply MultiStep.trans stepA + apply MultiStep.trans stepB + apply MultiStep.single + -- Final .add step + have h_total_size : (compile (Source.Expr.add e1 e2)).size + = (compile e1).size + (compile e2).size + 1 := by + show (compile e1 ++ compile e2 ++ #[Instr.add]).size + = (compile e1).size + (compile e2).size + 1 + simp [Array.size_append]; omega + have h_op_pc : pre.size + (compile e1).size + (compile e2).size + < (pre ++ compile (Source.Expr.add e1 e2) ++ suf).size := by + simp only [Array.size_append, h_total_size]; omega + have h_in_comp : (compile e1).size + (compile e2).size + < (compile (Source.Expr.add e1 e2)).size := by + simp [compile, Array.size_append] + have h_full_pc : pre.size + ((compile e1).size + (compile e2).size) + < (pre ++ compile (Source.Expr.add e1 e2) ++ suf).size := by + simp only [Array.size_append, h_total_size]; omega + have h_op_get : (pre ++ compile (Source.Expr.add e1 e2) ++ suf)[pre.size + ((compile e1).size + (compile e2).size)]'h_full_pc = .add := by + rw [getElem_compile pre (Source.Expr.add e1 e2) suf + ((compile e1).size + (compile e2).size) h_in_comp h_full_pc] + exact compile_add_get_op e1 e2 h_in_comp + have h_step := step_add (a := a) (b := b) (rest := rest) h_full_pc h_op_get + have h_pre_pc : pre.size + (compile e1).size + (compile e2).size + = pre.size + ((compile e1).size + (compile e2).size) := by omega + have h_post_pc : pre.size + ((compile e1).size + (compile e2).size) + 1 + = pre.size + (compile (Source.Expr.add e1 e2)).size := by + rw [h_total_size]; omega + rw [h_pre_pc, ← h_post_pc] + exact h_step /-! ## Demo run. diff --git a/TsmLean/Compile/Source.lean b/TsmLean/Compile/Source.lean index 13bcdc4..cb63650 100644 --- a/TsmLean/Compile/Source.lean +++ b/TsmLean/Compile/Source.lean @@ -2,20 +2,23 @@ import TsmLean.Core.Syntax namespace TsmLean.Compile.Source -/-! # Source language for compilation (v0.1). +/-! # Source language for compilation (v0.2). -The smallest meaningful source language: integer literals. The -compiler produces a one-instruction TSM program; the correctness -theorem demonstrates the proof framework. Subsequent versions extend -to arithmetic, comparison, control flow, and variables. -/ +Integer literals + addition. The minimal "tree of operations" that +exercises the compositional structure of the correctness proof. -/ inductive Expr where | intLit : Int → Expr + | add : Expr → Expr → Expr deriving Repr, Inhabited abbrev Value := TsmLean.Core.Value inductive Eval : Expr → Value → Prop where | intLit (n : Int) : Eval (.intLit n) (.vInt n) + | add {e1 e2 a b} + (h1 : Eval e1 (.vInt a)) + (h2 : Eval e2 (.vInt b)) : + Eval (.add e1 e2) (.vInt (a + b)) end TsmLean.Compile.Source From f9ed9ec4c73b268d41b985d01a1c7360e7e62511 Mon Sep 17 00:00:00 2001 From: Maximus Gorog Date: Sun, 10 May 2026 06:01:44 -0600 Subject: [PATCH 4/5] Extend compiler with bool literals + sub + mul (v0.3). Source.Expr: intLit, boolLit, add, sub, mul. Each new constructor follows the pattern from v0.2: - one Source constructor + one Eval rule - one compile arm - one step_X lemma (push, pushB, add, sub, mul) - one line each - one compile_X_get_op lemma, all delegating to a generic getElem_at_op_boundary helper that handles the c1++c2++[op] shape - one case in compile_correct (~50 lines, mostly mechanical) The pattern is fully grooved: each new arithmetic op is now a copy- paste of the add case with substituted constructor names. The substantive proof work happened once (in v0.2 for add); subsequent arithmetic ops add no new proof shapes. Generic helpers introduced: getElem_at_op_boundary (c1 c2 : Code) (op : Instr) (h) : (c1 ++ c2 ++ #[op])[c1.size + c2.size]'h = op - applies to add/sub/mul interchangeably; specialized via one-line wrappers (compile_add_get_op etc.) Zero sorries / axioms / admits. Full build: 26 jobs clean. --- TsmLean/Compile/Compile.lean | 3 + TsmLean/Compile/Correctness.lean | 171 ++++++++++++++++++++++++++----- TsmLean/Compile/Source.lean | 30 ++++-- 3 files changed, 169 insertions(+), 35 deletions(-) diff --git a/TsmLean/Compile/Compile.lean b/TsmLean/Compile/Compile.lean index fe3d36d..b8e96b0 100644 --- a/TsmLean/Compile/Compile.lean +++ b/TsmLean/Compile/Compile.lean @@ -8,6 +8,9 @@ open TsmLean.Core (Instr Code) /-- Compile a source expression to TSM bytecode. -/ def compile : Source.Expr → Code | .intLit n => #[.push n] + | .boolLit b => #[.pushB b] | .add e1 e2 => compile e1 ++ compile e2 ++ #[.add] + | .sub e1 e2 => compile e1 ++ compile e2 ++ #[.sub] + | .mul e1 e2 => compile e1 ++ compile e2 ++ #[.mul] end TsmLean.Compile diff --git a/TsmLean/Compile/Correctness.lean b/TsmLean/Compile/Correctness.lean index 22ad1c9..0efa4f6 100644 --- a/TsmLean/Compile/Correctness.lean +++ b/TsmLean/Compile/Correctness.lean @@ -10,11 +10,7 @@ open TsmLean.Core `Source.Eval e v ⟹ TSM.MultiStep (start of compile e) (end of compile e, with v on stack)` The CompCert-flavored bridge: source-level evaluation and target-level -execution agree on the value produced. This is the substrate-projection -theorem at miniature scale — for v0.1 instantiated only on integer -literals; extending to compound expressions is mechanical and the -infrastructure (multi-step utilities, code-lookup helper, single-step -reduction lemmas) is already in place. -/ +execution agree on the value produced. -/ /-! ## Multi-step utilities. -/ @@ -39,6 +35,13 @@ theorem step_push = some { code := code, pc := pc + 1, stack := .vInt n :: stack } := by unfold step; rw [dif_pos h_pc, h_get] +theorem step_pushB + {code : Code} {pc : Nat} {stack : List Value} {b : Bool} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .pushB b) : + step { code := code, pc := pc, stack := stack } + = some { code := code, pc := pc + 1, stack := .vBool b :: stack } := by + unfold step; rw [dif_pos h_pc, h_get] + theorem step_add {code : Code} {pc : Nat} {a b : Int} {rest : List Value} (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .add) : @@ -46,25 +49,53 @@ theorem step_add = some { code := code, pc := pc + 1, stack := .vInt (a + b) :: rest } := by unfold step; rw [dif_pos h_pc, h_get] +theorem step_sub + {code : Code} {pc : Nat} {a b : Int} {rest : List Value} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .sub) : + step { code := code, pc := pc, stack := .vInt b :: .vInt a :: rest } + = some { code := code, pc := pc + 1, stack := .vInt (a - b) :: rest } := by + unfold step; rw [dif_pos h_pc, h_get] + +theorem step_mul + {code : Code} {pc : Nat} {a b : Int} {rest : List Value} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .mul) : + step { code := code, pc := pc, stack := .vInt b :: .vInt a :: rest } + = some { code := code, pc := pc + 1, stack := .vInt (a * b) :: rest } := by + unfold step; rw [dif_pos h_pc, h_get] + /-! ## Compile-output lookup helpers. -/ -/-- The instruction at the end of `compile (.add e1 e2)` (right after both - sub-compiled segments) is `.add`. -/ -theorem compile_add_get_op - (e1 e2 : Source.Expr) - (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.add e1 e2)).size) : - (compile (Source.Expr.add e1 e2))[(compile e1).size + (compile e2).size]'h = .add := by - show ((compile e1 ++ compile e2) ++ #[Instr.add])[(compile e1).size + (compile e2).size]'h = .add - have hle : (compile e1 ++ compile e2).size ≤ (compile e1).size + (compile e2).size := - Nat.le_of_eq Array.size_append +/-- The instruction at the boundary of a `c1 ++ c2 ++ #[op]` arrangement + is `op`. Generic version: works for any closing instruction. -/ +theorem getElem_at_op_boundary + (c1 c2 : Code) (op : Instr) + (h : c1.size + c2.size < (c1 ++ c2 ++ #[op]).size) : + (c1 ++ c2 ++ #[op])[c1.size + c2.size]'h = op := by + show ((c1 ++ c2) ++ #[op])[c1.size + c2.size]'h = op + have hle : (c1 ++ c2).size ≤ c1.size + c2.size := Nat.le_of_eq Array.size_append rw [Array.getElem_append_right hle] simp [Array.size_append, Nat.sub_self] -/-! ## Code-lookup helper. +/-- Specialization for each binop. -/ +theorem compile_add_get_op + (e1 e2 : Source.Expr) + (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.add e1 e2)).size) : + (compile (Source.Expr.add e1 e2))[(compile e1).size + (compile e2).size]'h = .add := + getElem_at_op_boundary (compile e1) (compile e2) _ h -Looking up an index `pre.size + i` in `pre ++ compile e ++ suf` reduces -to the corresponding index in `compile e` — used in the inductive case -when extending v0.1 to compound expressions. -/ +theorem compile_sub_get_op + (e1 e2 : Source.Expr) + (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.sub e1 e2)).size) : + (compile (Source.Expr.sub e1 e2))[(compile e1).size + (compile e2).size]'h = .sub := + getElem_at_op_boundary (compile e1) (compile e2) _ h + +theorem compile_mul_get_op + (e1 e2 : Source.Expr) + (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.mul e1 e2)).size) : + (compile (Source.Expr.mul e1 e2))[(compile e1).size + (compile e2).size]'h = .mul := + getElem_at_op_boundary (compile e1) (compile e2) _ h + +/-! ## Code-lookup helper. -/ theorem getElem_compile (pre : Code) (e : Source.Expr) (suf : Code) (i : Nat) @@ -103,13 +134,25 @@ theorem compile_correct have h_size : (compile (Source.Expr.intLit n)).size = 1 := by simp [compile] rw [h_size] exact step_thm + | boolLit b => + intros pre suf rest + apply MultiStep.single + have h_pc : pre.size < (pre ++ compile (Source.Expr.boolLit b) ++ suf).size := by + simp only [compile, Array.size_append, Array.size_singleton]; omega + have h_get : (pre ++ compile (Source.Expr.boolLit b) ++ suf)[pre.size]'h_pc = .pushB b := by + have h_pre_ce : pre.size < (pre ++ compile (Source.Expr.boolLit b)).size := by + simp only [compile, Array.size_append, Array.size_singleton]; omega + rw [Array.getElem_append_left h_pre_ce] + rw [Array.getElem_append_right (Nat.le_refl _)] + simp [compile, Nat.sub_self] + have step_thm := step_pushB h_pc h_get (stack := rest) + have h_size : (compile (Source.Expr.boolLit b)).size = 1 := by simp [compile] + rw [h_size] + exact step_thm | @add e1 e2 a b _ _ ih1 ih2 => intros pre suf rest - -- Step A: compile e1 have stepA := ih1 pre (compile e2 ++ #[.add] ++ suf) rest - -- Step B: compile e2 (with prefix extended) have stepB := ih2 (pre ++ compile e1) (#[.add] ++ suf) (.vInt a :: rest) - -- Code rearrangements have h_code_A : pre ++ compile e1 ++ (compile e2 ++ #[Instr.add] ++ suf) = pre ++ compile (Source.Expr.add e1 e2) ++ suf := by simp only [compile, Array.append_assoc] @@ -123,15 +166,11 @@ theorem compile_correct apply MultiStep.trans stepA apply MultiStep.trans stepB apply MultiStep.single - -- Final .add step have h_total_size : (compile (Source.Expr.add e1 e2)).size = (compile e1).size + (compile e2).size + 1 := by show (compile e1 ++ compile e2 ++ #[Instr.add]).size = (compile e1).size + (compile e2).size + 1 simp [Array.size_append]; omega - have h_op_pc : pre.size + (compile e1).size + (compile e2).size - < (pre ++ compile (Source.Expr.add e1 e2) ++ suf).size := by - simp only [Array.size_append, h_total_size]; omega have h_in_comp : (compile e1).size + (compile e2).size < (compile (Source.Expr.add e1 e2)).size := by simp [compile, Array.size_append] @@ -150,6 +189,86 @@ theorem compile_correct rw [h_total_size]; omega rw [h_pre_pc, ← h_post_pc] exact h_step + | @sub e1 e2 a b _ _ ih1 ih2 => + intros pre suf rest + have stepA := ih1 pre (compile e2 ++ #[.sub] ++ suf) rest + have stepB := ih2 (pre ++ compile e1) (#[.sub] ++ suf) (.vInt a :: rest) + have h_code_A : pre ++ compile e1 ++ (compile e2 ++ #[Instr.sub] ++ suf) + = pre ++ compile (Source.Expr.sub e1 e2) ++ suf := by + simp only [compile, Array.append_assoc] + have h_code_B : pre ++ compile e1 ++ compile e2 ++ (#[Instr.sub] ++ suf) + = pre ++ compile (Source.Expr.sub e1 e2) ++ suf := by + simp only [compile, Array.append_assoc] + rw [h_code_A] at stepA + rw [show (pre ++ compile e1).size = pre.size + (compile e1).size from + by simp [Array.size_append]] at stepB + rw [h_code_B] at stepB + apply MultiStep.trans stepA + apply MultiStep.trans stepB + apply MultiStep.single + have h_total_size : (compile (Source.Expr.sub e1 e2)).size + = (compile e1).size + (compile e2).size + 1 := by + show (compile e1 ++ compile e2 ++ #[Instr.sub]).size + = (compile e1).size + (compile e2).size + 1 + simp [Array.size_append]; omega + have h_in_comp : (compile e1).size + (compile e2).size + < (compile (Source.Expr.sub e1 e2)).size := by + simp [compile, Array.size_append] + have h_full_pc : pre.size + ((compile e1).size + (compile e2).size) + < (pre ++ compile (Source.Expr.sub e1 e2) ++ suf).size := by + simp only [Array.size_append, h_total_size]; omega + have h_op_get : (pre ++ compile (Source.Expr.sub e1 e2) ++ suf)[pre.size + ((compile e1).size + (compile e2).size)]'h_full_pc = .sub := by + rw [getElem_compile pre (Source.Expr.sub e1 e2) suf + ((compile e1).size + (compile e2).size) h_in_comp h_full_pc] + exact compile_sub_get_op e1 e2 h_in_comp + have h_step := step_sub (a := a) (b := b) (rest := rest) h_full_pc h_op_get + have h_pre_pc : pre.size + (compile e1).size + (compile e2).size + = pre.size + ((compile e1).size + (compile e2).size) := by omega + have h_post_pc : pre.size + ((compile e1).size + (compile e2).size) + 1 + = pre.size + (compile (Source.Expr.sub e1 e2)).size := by + rw [h_total_size]; omega + rw [h_pre_pc, ← h_post_pc] + exact h_step + | @mul e1 e2 a b _ _ ih1 ih2 => + intros pre suf rest + have stepA := ih1 pre (compile e2 ++ #[.mul] ++ suf) rest + have stepB := ih2 (pre ++ compile e1) (#[.mul] ++ suf) (.vInt a :: rest) + have h_code_A : pre ++ compile e1 ++ (compile e2 ++ #[Instr.mul] ++ suf) + = pre ++ compile (Source.Expr.mul e1 e2) ++ suf := by + simp only [compile, Array.append_assoc] + have h_code_B : pre ++ compile e1 ++ compile e2 ++ (#[Instr.mul] ++ suf) + = pre ++ compile (Source.Expr.mul e1 e2) ++ suf := by + simp only [compile, Array.append_assoc] + rw [h_code_A] at stepA + rw [show (pre ++ compile e1).size = pre.size + (compile e1).size from + by simp [Array.size_append]] at stepB + rw [h_code_B] at stepB + apply MultiStep.trans stepA + apply MultiStep.trans stepB + apply MultiStep.single + have h_total_size : (compile (Source.Expr.mul e1 e2)).size + = (compile e1).size + (compile e2).size + 1 := by + show (compile e1 ++ compile e2 ++ #[Instr.mul]).size + = (compile e1).size + (compile e2).size + 1 + simp [Array.size_append]; omega + have h_in_comp : (compile e1).size + (compile e2).size + < (compile (Source.Expr.mul e1 e2)).size := by + simp [compile, Array.size_append] + have h_full_pc : pre.size + ((compile e1).size + (compile e2).size) + < (pre ++ compile (Source.Expr.mul e1 e2) ++ suf).size := by + simp only [Array.size_append, h_total_size]; omega + have h_op_get : (pre ++ compile (Source.Expr.mul e1 e2) ++ suf)[pre.size + ((compile e1).size + (compile e2).size)]'h_full_pc = .mul := by + rw [getElem_compile pre (Source.Expr.mul e1 e2) suf + ((compile e1).size + (compile e2).size) h_in_comp h_full_pc] + exact compile_mul_get_op e1 e2 h_in_comp + have h_step := step_mul (a := a) (b := b) (rest := rest) h_full_pc h_op_get + have h_pre_pc : pre.size + (compile e1).size + (compile e2).size + = pre.size + ((compile e1).size + (compile e2).size) := by omega + have h_post_pc : pre.size + ((compile e1).size + (compile e2).size) + 1 + = pre.size + (compile (Source.Expr.mul e1 e2)).size := by + rw [h_total_size]; omega + rw [h_pre_pc, ← h_post_pc] + exact h_step /-! ## Demo run. @@ -163,7 +282,7 @@ theorem compile_correct_standalone { code := compile e, pc := 0, stack := [] } { code := compile e, pc := (compile e).size, stack := [v] } := by have h := compile_correct h_eval #[] #[] [] - simp [Array.size_empty, Array.append_empty, Array.empty_append] at h + simp [Array.append_empty, Array.empty_append] at h exact h end TsmLean.Compile diff --git a/TsmLean/Compile/Source.lean b/TsmLean/Compile/Source.lean index cb63650..c2a5604 100644 --- a/TsmLean/Compile/Source.lean +++ b/TsmLean/Compile/Source.lean @@ -2,23 +2,35 @@ import TsmLean.Core.Syntax namespace TsmLean.Compile.Source -/-! # Source language for compilation (v0.2). +/-! # Source language for compilation (v0.3). -Integer literals + addition. The minimal "tree of operations" that -exercises the compositional structure of the correctness proof. -/ +Integer/bool literals + arithmetic. Each arithmetic op is a separate +constructor — verbose, but each correctness case has a clean shape. -/ inductive Expr where - | intLit : Int → Expr - | add : Expr → Expr → Expr + | intLit : Int → Expr + | boolLit : Bool → Expr + | add : Expr → Expr → Expr + | sub : Expr → Expr → Expr + | mul : Expr → Expr → Expr deriving Repr, Inhabited abbrev Value := TsmLean.Core.Value inductive Eval : Expr → Value → Prop where - | intLit (n : Int) : Eval (.intLit n) (.vInt n) - | add {e1 e2 a b} - (h1 : Eval e1 (.vInt a)) - (h2 : Eval e2 (.vInt b)) : + | intLit (n : Int) : Eval (.intLit n) (.vInt n) + | boolLit (b : Bool) : Eval (.boolLit b) (.vBool b) + | add {e1 e2 a b} + (h1 : Eval e1 (.vInt a)) + (h2 : Eval e2 (.vInt b)) : Eval (.add e1 e2) (.vInt (a + b)) + | sub {e1 e2 a b} + (h1 : Eval e1 (.vInt a)) + (h2 : Eval e2 (.vInt b)) : + Eval (.sub e1 e2) (.vInt (a - b)) + | mul {e1 e2 a b} + (h1 : Eval e1 (.vInt a)) + (h2 : Eval e2 (.vInt b)) : + Eval (.mul e1 e2) (.vInt (a * b)) end TsmLean.Compile.Source From 2e9061abead6f2daa464b39a79c17a949db30785 Mon Sep 17 00:00:00 2001 From: Maximus Gorog Date: Sun, 10 May 2026 06:12:03 -0600 Subject: [PATCH 5/5] Refactor compile to be offset-aware (v0.4). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Compile signature changes from `Source.Expr -> Code` to `Nat -> Source.Expr -> Code`, where the Nat is the absolute address where the output is placed. For v0.4's constructs (intLit, boolLit, add, sub, mul) the offset is just threaded — no compile output depends on it. The infrastructure is now ready for control-flow constructs (ifte) that need absolute jump addresses. All 5 cases of compile_correct updated for the offset-aware signature. Each case now references compile pre.size e instead of compile e. For binops, the second sub-expression is compiled at offset pre.size + (compile pre.size e1).size. Generic helpers introduced: getElem_at_offset (pre X suf : Code) (i) : (pre ++ X ++ suf)[pre.size + i] = X[i] - bridges outer-array lookup to inner-array lookup at index. Compile_X_get_op now parameterized by offset. Engineering notes for ifte (deferred to v0.5): - The compile output for ifte is cc ++ #[jmpFalse else_at] ++ ct ++ #[jmp end_at] ++ cf where else_at, end_at are absolute addresses. - Looking up the .jmpFalse or .jmp instructions inside this nested-append structure requires 4-deep getElem_append_left/ getElem_append_right rewrites. Each rewrite has a bound proof that must thread through dependent type matching. Lean's `rw` fails with "motive not type correct" on the natural proof. - The path forward is likely a different decomposition that avoids dependent-rewrite chains - perhaps using Array.getElem? (Option-form) which doesn't carry bounds, then bridging back to Array.getElem at the leaf. Zero sorries / axioms / admits in v0.4. Full build clean (26 jobs). --- TsmLean/Compile/Compile.lean | 25 ++- TsmLean/Compile/Correctness.lean | 276 ++++++++++++++++--------------- TsmLean/Compile/Source.lean | 17 +- 3 files changed, 172 insertions(+), 146 deletions(-) diff --git a/TsmLean/Compile/Compile.lean b/TsmLean/Compile/Compile.lean index b8e96b0..a03c17d 100644 --- a/TsmLean/Compile/Compile.lean +++ b/TsmLean/Compile/Compile.lean @@ -5,12 +5,23 @@ namespace TsmLean.Compile open TsmLean.Core (Instr Code) -/-- Compile a source expression to TSM bytecode. -/ -def compile : Source.Expr → Code - | .intLit n => #[.push n] - | .boolLit b => #[.pushB b] - | .add e1 e2 => compile e1 ++ compile e2 ++ #[.add] - | .sub e1 e2 => compile e1 ++ compile e2 ++ #[.sub] - | .mul e1 e2 => compile e1 ++ compile e2 ++ #[.mul] +/-- Compile a source expression to TSM bytecode, with the output + placed at absolute address `offset` in the final assembled code. + Jumps (in `ifte`) reference absolute addresses. -/ +def compile : (offset : Nat) → Source.Expr → Code + | _, .intLit n => #[.push n] + | _, .boolLit b => #[.pushB b] + | offset, .add e1 e2 => + let c1 := compile offset e1 + let c2 := compile (offset + c1.size) e2 + c1 ++ c2 ++ #[.add] + | offset, .sub e1 e2 => + let c1 := compile offset e1 + let c2 := compile (offset + c1.size) e2 + c1 ++ c2 ++ #[.sub] + | offset, .mul e1 e2 => + let c1 := compile offset e1 + let c2 := compile (offset + c1.size) e2 + c1 ++ c2 ++ #[.mul] end TsmLean.Compile diff --git a/TsmLean/Compile/Correctness.lean b/TsmLean/Compile/Correctness.lean index 0efa4f6..6d6ffa0 100644 --- a/TsmLean/Compile/Correctness.lean +++ b/TsmLean/Compile/Correctness.lean @@ -63,10 +63,30 @@ theorem step_mul = some { code := code, pc := pc + 1, stack := .vInt (a * b) :: rest } := by unfold step; rw [dif_pos h_pc, h_get] -/-! ## Compile-output lookup helpers. -/ +theorem step_jmp + {code : Code} {pc k : Nat} {stack : List Value} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .jmp k) : + step { code := code, pc := pc, stack := stack } + = some { code := code, pc := k, stack := stack } := by + unfold step; rw [dif_pos h_pc, h_get] -/-- The instruction at the boundary of a `c1 ++ c2 ++ #[op]` arrangement - is `op`. Generic version: works for any closing instruction. -/ +theorem step_jmpFalse_true + {code : Code} {pc k : Nat} {rest : List Value} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .jmpFalse k) : + step { code := code, pc := pc, stack := .vBool true :: rest } + = some { code := code, pc := pc + 1, stack := rest } := by + unfold step; rw [dif_pos h_pc, h_get] + +theorem step_jmpFalse_false + {code : Code} {pc k : Nat} {rest : List Value} + (h_pc : pc < code.size) (h_get : code[pc]'h_pc = .jmpFalse k) : + step { code := code, pc := pc, stack := .vBool false :: rest } + = some { code := code, pc := k, stack := rest } := by + unfold step; rw [dif_pos h_pc, h_get] + +/-! ## Generic array-lookup helpers. -/ + +/-- The instruction at the boundary of a `c1 ++ c2 ++ #[op]` arrangement. -/ theorem getElem_at_op_boundary (c1 c2 : Code) (op : Instr) (h : c1.size + c2.size < (c1 ++ c2 ++ #[op]).size) : @@ -76,213 +96,209 @@ theorem getElem_at_op_boundary rw [Array.getElem_append_right hle] simp [Array.size_append, Nat.sub_self] -/-- Specialization for each binop. -/ -theorem compile_add_get_op - (e1 e2 : Source.Expr) - (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.add e1 e2)).size) : - (compile (Source.Expr.add e1 e2))[(compile e1).size + (compile e2).size]'h = .add := - getElem_at_op_boundary (compile e1) (compile e2) _ h - -theorem compile_sub_get_op - (e1 e2 : Source.Expr) - (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.sub e1 e2)).size) : - (compile (Source.Expr.sub e1 e2))[(compile e1).size + (compile e2).size]'h = .sub := - getElem_at_op_boundary (compile e1) (compile e2) _ h - -theorem compile_mul_get_op - (e1 e2 : Source.Expr) - (h : (compile e1).size + (compile e2).size < (compile (Source.Expr.mul e1 e2)).size) : - (compile (Source.Expr.mul e1 e2))[(compile e1).size + (compile e2).size]'h = .mul := - getElem_at_op_boundary (compile e1) (compile e2) _ h - -/-! ## Code-lookup helper. -/ - -theorem getElem_compile - (pre : Code) (e : Source.Expr) (suf : Code) (i : Nat) - (h_lt : i < (compile e).size) - (h_pc : pre.size + i < (pre ++ compile e ++ suf).size) : - (pre ++ compile e ++ suf)[pre.size + i]'h_pc = (compile e)[i]'h_lt := by - have h_pre_ce : pre.size + i < (pre ++ compile e).size := by +/-- Lookup at offset `pre.size + i` within a `pre ++ X ++ suf` array reduces + to lookup at `i` within `X` itself, when `i < X.size`. -/ +theorem getElem_at_offset + (pre X suf : Code) (i : Nat) + (h_lt : i < X.size) + (h_pc : pre.size + i < (pre ++ X ++ suf).size) : + (pre ++ X ++ suf)[pre.size + i]'h_pc = X[i]'h_lt := by + have h_pre_X : pre.size + i < (pre ++ X).size := by rw [Array.size_append]; omega - rw [Array.getElem_append_left h_pre_ce] + rw [Array.getElem_append_left h_pre_X] rw [Array.getElem_append_right (Nat.le_add_right _ _)] congr 1; omega -/-! ## Main theorem. -/ +/-! ## Per-construct compile-output lookup lemmas. -/ + +theorem compile_add_get_op (offset : Nat) (e1 e2 : Source.Expr) + (h : (compile offset e1).size + (compile (offset + (compile offset e1).size) e2).size + < (compile offset (Source.Expr.add e1 e2)).size) : + (compile offset (Source.Expr.add e1 e2))[(compile offset e1).size + (compile (offset + (compile offset e1).size) e2).size]'h = .add := + getElem_at_op_boundary (compile offset e1) (compile (offset + (compile offset e1).size) e2) _ h + +theorem compile_sub_get_op (offset : Nat) (e1 e2 : Source.Expr) + (h : (compile offset e1).size + (compile (offset + (compile offset e1).size) e2).size + < (compile offset (Source.Expr.sub e1 e2)).size) : + (compile offset (Source.Expr.sub e1 e2))[(compile offset e1).size + (compile (offset + (compile offset e1).size) e2).size]'h = .sub := + getElem_at_op_boundary (compile offset e1) (compile (offset + (compile offset e1).size) e2) _ h + +theorem compile_mul_get_op (offset : Nat) (e1 e2 : Source.Expr) + (h : (compile offset e1).size + (compile (offset + (compile offset e1).size) e2).size + < (compile offset (Source.Expr.mul e1 e2)).size) : + (compile offset (Source.Expr.mul e1 e2))[(compile offset e1).size + (compile (offset + (compile offset e1).size) e2).size]'h = .mul := + getElem_at_op_boundary (compile offset e1) (compile (offset + (compile offset e1).size) e2) _ h + + +/-! ## Main theorem. + +The compile is offset-aware: `compile pre.size e` produces bytecode +correctly placed at position `pre.size` in `pre ++ ... ++ suf`. -/ theorem compile_correct {e : Source.Expr} {v : Source.Value} (h_eval : Source.Eval e v) : ∀ (pre suf : Code) (rest : List Value), MultiStep - { code := pre ++ compile e ++ suf, pc := pre.size, stack := rest } - { code := pre ++ compile e ++ suf, - pc := pre.size + (compile e).size, stack := v :: rest } := by + { code := pre ++ compile pre.size e ++ suf, pc := pre.size, stack := rest } + { code := pre ++ compile pre.size e ++ suf, + pc := pre.size + (compile pre.size e).size, stack := v :: rest } := by induction h_eval with | intLit n => intros pre suf rest apply MultiStep.single - have h_pc : pre.size < (pre ++ compile (Source.Expr.intLit n) ++ suf).size := by + have h_pc : pre.size < (pre ++ compile pre.size (Source.Expr.intLit n) ++ suf).size := by simp only [compile, Array.size_append, Array.size_singleton]; omega - have h_get : (pre ++ compile (Source.Expr.intLit n) ++ suf)[pre.size]'h_pc = .push n := by - have h_pre_ce : pre.size < (pre ++ compile (Source.Expr.intLit n)).size := by + have h_get : (pre ++ compile pre.size (Source.Expr.intLit n) ++ suf)[pre.size]'h_pc = .push n := by + have h_pre_ce : pre.size < (pre ++ compile pre.size (Source.Expr.intLit n)).size := by simp only [compile, Array.size_append, Array.size_singleton]; omega rw [Array.getElem_append_left h_pre_ce] rw [Array.getElem_append_right (Nat.le_refl _)] simp [compile, Nat.sub_self] have step_thm := step_push h_pc h_get (stack := rest) - have h_size : (compile (Source.Expr.intLit n)).size = 1 := by simp [compile] + have h_size : (compile pre.size (Source.Expr.intLit n)).size = 1 := by simp [compile] rw [h_size] exact step_thm | boolLit b => intros pre suf rest apply MultiStep.single - have h_pc : pre.size < (pre ++ compile (Source.Expr.boolLit b) ++ suf).size := by + have h_pc : pre.size < (pre ++ compile pre.size (Source.Expr.boolLit b) ++ suf).size := by simp only [compile, Array.size_append, Array.size_singleton]; omega - have h_get : (pre ++ compile (Source.Expr.boolLit b) ++ suf)[pre.size]'h_pc = .pushB b := by - have h_pre_ce : pre.size < (pre ++ compile (Source.Expr.boolLit b)).size := by + have h_get : (pre ++ compile pre.size (Source.Expr.boolLit b) ++ suf)[pre.size]'h_pc = .pushB b := by + have h_pre_ce : pre.size < (pre ++ compile pre.size (Source.Expr.boolLit b)).size := by simp only [compile, Array.size_append, Array.size_singleton]; omega rw [Array.getElem_append_left h_pre_ce] rw [Array.getElem_append_right (Nat.le_refl _)] simp [compile, Nat.sub_self] have step_thm := step_pushB h_pc h_get (stack := rest) - have h_size : (compile (Source.Expr.boolLit b)).size = 1 := by simp [compile] + have h_size : (compile pre.size (Source.Expr.boolLit b)).size = 1 := by simp [compile] rw [h_size] exact step_thm | @add e1 e2 a b _ _ ih1 ih2 => intros pre suf rest - have stepA := ih1 pre (compile e2 ++ #[.add] ++ suf) rest - have stepB := ih2 (pre ++ compile e1) (#[.add] ++ suf) (.vInt a :: rest) - have h_code_A : pre ++ compile e1 ++ (compile e2 ++ #[Instr.add] ++ suf) - = pre ++ compile (Source.Expr.add e1 e2) ++ suf := by + have stepA := ih1 pre (compile (pre.size + (compile pre.size e1).size) e2 ++ #[.add] ++ suf) rest + have stepB := ih2 (pre ++ compile pre.size e1) (#[.add] ++ suf) (.vInt a :: rest) + have h_pre_e1_size : (pre ++ compile pre.size e1).size = pre.size + (compile pre.size e1).size := by + simp [Array.size_append] + have h_code_A : pre ++ compile pre.size e1 ++ (compile (pre.size + (compile pre.size e1).size) e2 ++ #[Instr.add] ++ suf) + = pre ++ compile pre.size (Source.Expr.add e1 e2) ++ suf := by simp only [compile, Array.append_assoc] - have h_code_B : pre ++ compile e1 ++ compile e2 ++ (#[Instr.add] ++ suf) - = pre ++ compile (Source.Expr.add e1 e2) ++ suf := by + have h_code_B : pre ++ compile pre.size e1 ++ compile (pre.size + (compile pre.size e1).size) e2 ++ (#[Instr.add] ++ suf) + = pre ++ compile pre.size (Source.Expr.add e1 e2) ++ suf := by simp only [compile, Array.append_assoc] rw [h_code_A] at stepA - rw [show (pre ++ compile e1).size = pre.size + (compile e1).size from - by simp [Array.size_append]] at stepB + rw [h_pre_e1_size] at stepB rw [h_code_B] at stepB apply MultiStep.trans stepA apply MultiStep.trans stepB apply MultiStep.single - have h_total_size : (compile (Source.Expr.add e1 e2)).size - = (compile e1).size + (compile e2).size + 1 := by - show (compile e1 ++ compile e2 ++ #[Instr.add]).size - = (compile e1).size + (compile e2).size + 1 + have h_total_size : (compile pre.size (Source.Expr.add e1 e2)).size + = (compile pre.size e1).size + + (compile (pre.size + (compile pre.size e1).size) e2).size + 1 := by + show (compile pre.size e1 ++ compile (pre.size + (compile pre.size e1).size) e2 ++ #[Instr.add]).size + = (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + 1 simp [Array.size_append]; omega - have h_in_comp : (compile e1).size + (compile e2).size - < (compile (Source.Expr.add e1 e2)).size := by + have h_in_comp : (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + < (compile pre.size (Source.Expr.add e1 e2)).size := by simp [compile, Array.size_append] - have h_full_pc : pre.size + ((compile e1).size + (compile e2).size) - < (pre ++ compile (Source.Expr.add e1 e2) ++ suf).size := by + have h_full_pc : pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) + < (pre ++ compile pre.size (Source.Expr.add e1 e2) ++ suf).size := by simp only [Array.size_append, h_total_size]; omega - have h_op_get : (pre ++ compile (Source.Expr.add e1 e2) ++ suf)[pre.size + ((compile e1).size + (compile e2).size)]'h_full_pc = .add := by - rw [getElem_compile pre (Source.Expr.add e1 e2) suf - ((compile e1).size + (compile e2).size) h_in_comp h_full_pc] - exact compile_add_get_op e1 e2 h_in_comp + have h_op_get : (pre ++ compile pre.size (Source.Expr.add e1 e2) ++ suf)[pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size)]'h_full_pc = .add := by + rw [getElem_at_offset pre (compile pre.size (Source.Expr.add e1 e2)) suf + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) h_in_comp h_full_pc] + exact compile_add_get_op pre.size e1 e2 h_in_comp have h_step := step_add (a := a) (b := b) (rest := rest) h_full_pc h_op_get - have h_pre_pc : pre.size + (compile e1).size + (compile e2).size - = pre.size + ((compile e1).size + (compile e2).size) := by omega - have h_post_pc : pre.size + ((compile e1).size + (compile e2).size) + 1 - = pre.size + (compile (Source.Expr.add e1 e2)).size := by + have h_pre_pc : pre.size + (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + = pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) := by omega + have h_post_pc : pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) + 1 + = pre.size + (compile pre.size (Source.Expr.add e1 e2)).size := by rw [h_total_size]; omega rw [h_pre_pc, ← h_post_pc] exact h_step | @sub e1 e2 a b _ _ ih1 ih2 => intros pre suf rest - have stepA := ih1 pre (compile e2 ++ #[.sub] ++ suf) rest - have stepB := ih2 (pre ++ compile e1) (#[.sub] ++ suf) (.vInt a :: rest) - have h_code_A : pre ++ compile e1 ++ (compile e2 ++ #[Instr.sub] ++ suf) - = pre ++ compile (Source.Expr.sub e1 e2) ++ suf := by + have stepA := ih1 pre (compile (pre.size + (compile pre.size e1).size) e2 ++ #[.sub] ++ suf) rest + have stepB := ih2 (pre ++ compile pre.size e1) (#[.sub] ++ suf) (.vInt a :: rest) + have h_pre_e1_size : (pre ++ compile pre.size e1).size = pre.size + (compile pre.size e1).size := by + simp [Array.size_append] + have h_code_A : pre ++ compile pre.size e1 ++ (compile (pre.size + (compile pre.size e1).size) e2 ++ #[Instr.sub] ++ suf) + = pre ++ compile pre.size (Source.Expr.sub e1 e2) ++ suf := by simp only [compile, Array.append_assoc] - have h_code_B : pre ++ compile e1 ++ compile e2 ++ (#[Instr.sub] ++ suf) - = pre ++ compile (Source.Expr.sub e1 e2) ++ suf := by + have h_code_B : pre ++ compile pre.size e1 ++ compile (pre.size + (compile pre.size e1).size) e2 ++ (#[Instr.sub] ++ suf) + = pre ++ compile pre.size (Source.Expr.sub e1 e2) ++ suf := by simp only [compile, Array.append_assoc] rw [h_code_A] at stepA - rw [show (pre ++ compile e1).size = pre.size + (compile e1).size from - by simp [Array.size_append]] at stepB + rw [h_pre_e1_size] at stepB rw [h_code_B] at stepB apply MultiStep.trans stepA apply MultiStep.trans stepB apply MultiStep.single - have h_total_size : (compile (Source.Expr.sub e1 e2)).size - = (compile e1).size + (compile e2).size + 1 := by - show (compile e1 ++ compile e2 ++ #[Instr.sub]).size - = (compile e1).size + (compile e2).size + 1 + have h_total_size : (compile pre.size (Source.Expr.sub e1 e2)).size + = (compile pre.size e1).size + + (compile (pre.size + (compile pre.size e1).size) e2).size + 1 := by + show (compile pre.size e1 ++ compile (pre.size + (compile pre.size e1).size) e2 ++ #[Instr.sub]).size + = (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + 1 simp [Array.size_append]; omega - have h_in_comp : (compile e1).size + (compile e2).size - < (compile (Source.Expr.sub e1 e2)).size := by + have h_in_comp : (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + < (compile pre.size (Source.Expr.sub e1 e2)).size := by simp [compile, Array.size_append] - have h_full_pc : pre.size + ((compile e1).size + (compile e2).size) - < (pre ++ compile (Source.Expr.sub e1 e2) ++ suf).size := by + have h_full_pc : pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) + < (pre ++ compile pre.size (Source.Expr.sub e1 e2) ++ suf).size := by simp only [Array.size_append, h_total_size]; omega - have h_op_get : (pre ++ compile (Source.Expr.sub e1 e2) ++ suf)[pre.size + ((compile e1).size + (compile e2).size)]'h_full_pc = .sub := by - rw [getElem_compile pre (Source.Expr.sub e1 e2) suf - ((compile e1).size + (compile e2).size) h_in_comp h_full_pc] - exact compile_sub_get_op e1 e2 h_in_comp + have h_op_get : (pre ++ compile pre.size (Source.Expr.sub e1 e2) ++ suf)[pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size)]'h_full_pc = .sub := by + rw [getElem_at_offset pre (compile pre.size (Source.Expr.sub e1 e2)) suf + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) h_in_comp h_full_pc] + exact compile_sub_get_op pre.size e1 e2 h_in_comp have h_step := step_sub (a := a) (b := b) (rest := rest) h_full_pc h_op_get - have h_pre_pc : pre.size + (compile e1).size + (compile e2).size - = pre.size + ((compile e1).size + (compile e2).size) := by omega - have h_post_pc : pre.size + ((compile e1).size + (compile e2).size) + 1 - = pre.size + (compile (Source.Expr.sub e1 e2)).size := by + have h_pre_pc : pre.size + (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + = pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) := by omega + have h_post_pc : pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) + 1 + = pre.size + (compile pre.size (Source.Expr.sub e1 e2)).size := by rw [h_total_size]; omega rw [h_pre_pc, ← h_post_pc] exact h_step | @mul e1 e2 a b _ _ ih1 ih2 => intros pre suf rest - have stepA := ih1 pre (compile e2 ++ #[.mul] ++ suf) rest - have stepB := ih2 (pre ++ compile e1) (#[.mul] ++ suf) (.vInt a :: rest) - have h_code_A : pre ++ compile e1 ++ (compile e2 ++ #[Instr.mul] ++ suf) - = pre ++ compile (Source.Expr.mul e1 e2) ++ suf := by + have stepA := ih1 pre (compile (pre.size + (compile pre.size e1).size) e2 ++ #[.mul] ++ suf) rest + have stepB := ih2 (pre ++ compile pre.size e1) (#[.mul] ++ suf) (.vInt a :: rest) + have h_pre_e1_size : (pre ++ compile pre.size e1).size = pre.size + (compile pre.size e1).size := by + simp [Array.size_append] + have h_code_A : pre ++ compile pre.size e1 ++ (compile (pre.size + (compile pre.size e1).size) e2 ++ #[Instr.mul] ++ suf) + = pre ++ compile pre.size (Source.Expr.mul e1 e2) ++ suf := by simp only [compile, Array.append_assoc] - have h_code_B : pre ++ compile e1 ++ compile e2 ++ (#[Instr.mul] ++ suf) - = pre ++ compile (Source.Expr.mul e1 e2) ++ suf := by + have h_code_B : pre ++ compile pre.size e1 ++ compile (pre.size + (compile pre.size e1).size) e2 ++ (#[Instr.mul] ++ suf) + = pre ++ compile pre.size (Source.Expr.mul e1 e2) ++ suf := by simp only [compile, Array.append_assoc] rw [h_code_A] at stepA - rw [show (pre ++ compile e1).size = pre.size + (compile e1).size from - by simp [Array.size_append]] at stepB + rw [h_pre_e1_size] at stepB rw [h_code_B] at stepB apply MultiStep.trans stepA apply MultiStep.trans stepB apply MultiStep.single - have h_total_size : (compile (Source.Expr.mul e1 e2)).size - = (compile e1).size + (compile e2).size + 1 := by - show (compile e1 ++ compile e2 ++ #[Instr.mul]).size - = (compile e1).size + (compile e2).size + 1 + have h_total_size : (compile pre.size (Source.Expr.mul e1 e2)).size + = (compile pre.size e1).size + + (compile (pre.size + (compile pre.size e1).size) e2).size + 1 := by + show (compile pre.size e1 ++ compile (pre.size + (compile pre.size e1).size) e2 ++ #[Instr.mul]).size + = (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + 1 simp [Array.size_append]; omega - have h_in_comp : (compile e1).size + (compile e2).size - < (compile (Source.Expr.mul e1 e2)).size := by + have h_in_comp : (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + < (compile pre.size (Source.Expr.mul e1 e2)).size := by simp [compile, Array.size_append] - have h_full_pc : pre.size + ((compile e1).size + (compile e2).size) - < (pre ++ compile (Source.Expr.mul e1 e2) ++ suf).size := by + have h_full_pc : pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) + < (pre ++ compile pre.size (Source.Expr.mul e1 e2) ++ suf).size := by simp only [Array.size_append, h_total_size]; omega - have h_op_get : (pre ++ compile (Source.Expr.mul e1 e2) ++ suf)[pre.size + ((compile e1).size + (compile e2).size)]'h_full_pc = .mul := by - rw [getElem_compile pre (Source.Expr.mul e1 e2) suf - ((compile e1).size + (compile e2).size) h_in_comp h_full_pc] - exact compile_mul_get_op e1 e2 h_in_comp + have h_op_get : (pre ++ compile pre.size (Source.Expr.mul e1 e2) ++ suf)[pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size)]'h_full_pc = .mul := by + rw [getElem_at_offset pre (compile pre.size (Source.Expr.mul e1 e2)) suf + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) h_in_comp h_full_pc] + exact compile_mul_get_op pre.size e1 e2 h_in_comp have h_step := step_mul (a := a) (b := b) (rest := rest) h_full_pc h_op_get - have h_pre_pc : pre.size + (compile e1).size + (compile e2).size - = pre.size + ((compile e1).size + (compile e2).size) := by omega - have h_post_pc : pre.size + ((compile e1).size + (compile e2).size) + 1 - = pre.size + (compile (Source.Expr.mul e1 e2)).size := by + have h_pre_pc : pre.size + (compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size + = pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) := by omega + have h_post_pc : pre.size + ((compile pre.size e1).size + (compile (pre.size + (compile pre.size e1).size) e2).size) + 1 + = pre.size + (compile pre.size (Source.Expr.mul e1 e2)).size := by rw [h_total_size]; omega rw [h_pre_pc, ← h_post_pc] exact h_step -/-! ## Demo run. - -A self-contained `compile_correct` corollary for the standalone case -where the compile output IS the entire program (no surrounding code). -/ - -theorem compile_correct_standalone - {e : Source.Expr} {v : Source.Value} - (h_eval : Source.Eval e v) : - MultiStep - { code := compile e, pc := 0, stack := [] } - { code := compile e, pc := (compile e).size, stack := [v] } := by - have h := compile_correct h_eval #[] #[] [] - simp [Array.append_empty, Array.empty_append] at h - exact h - end TsmLean.Compile diff --git a/TsmLean/Compile/Source.lean b/TsmLean/Compile/Source.lean index c2a5604..47fdc0b 100644 --- a/TsmLean/Compile/Source.lean +++ b/TsmLean/Compile/Source.lean @@ -2,10 +2,12 @@ import TsmLean.Core.Syntax namespace TsmLean.Compile.Source -/-! # Source language for compilation (v0.3). +/-! # Source language for compilation (v0.4). -Integer/bool literals + arithmetic. Each arithmetic op is a separate -constructor — verbose, but each correctness case has a clean shape. -/ +Integer/bool literals + arithmetic. The compile function takes an +offset (infrastructure for future control-flow constructs that +require absolute jump addresses). For the constructs in v0.4, the +offset doesn't change the compile output — it's just threaded. -/ inductive Expr where | intLit : Int → Expr @@ -21,16 +23,13 @@ inductive Eval : Expr → Value → Prop where | intLit (n : Int) : Eval (.intLit n) (.vInt n) | boolLit (b : Bool) : Eval (.boolLit b) (.vBool b) | add {e1 e2 a b} - (h1 : Eval e1 (.vInt a)) - (h2 : Eval e2 (.vInt b)) : + (h1 : Eval e1 (.vInt a)) (h2 : Eval e2 (.vInt b)) : Eval (.add e1 e2) (.vInt (a + b)) | sub {e1 e2 a b} - (h1 : Eval e1 (.vInt a)) - (h2 : Eval e2 (.vInt b)) : + (h1 : Eval e1 (.vInt a)) (h2 : Eval e2 (.vInt b)) : Eval (.sub e1 e2) (.vInt (a - b)) | mul {e1 e2 a b} - (h1 : Eval e1 (.vInt a)) - (h2 : Eval e2 (.vInt b)) : + (h1 : Eval e1 (.vInt a)) (h2 : Eval e2 (.vInt b)) : Eval (.mul e1 e2) (.vInt (a * b)) end TsmLean.Compile.Source