Commit graph

3 commits

Author SHA1 Message Date
Maximus Gorog
ec65229050 Extend source-to-TSM compiler with addition (v0.2).
Source.Expr now has intLit and add. Compile and correctness theorem
both extend.

The add case of compile_correct exercises the compositional structure:
  - IH on e1 (with extended suffix) gives the multistep for the first
    operand's evaluation.
  - IH on e2 (with extended prefix) gives the multistep for the second.
  - A single .add step at the boundary closes the trace.
  - Each intermediate state's PC is computed via array-size arithmetic
    threaded through omega.
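
The PC bookkeeping in the add case can be illustrated in isolation. This is a
minimal sketch, assuming the compiled code for an addition is laid out as
compile e1 ++ compile e2 ++ #[add]. The names pre, n1 and n2 are hypothetical
stand-ins for pre.size, (compile e1).size and (compile e2).size; they are not
identifiers from the project.

```lean
-- Hypothetical stand-ins: pre = pre.size, n1 = (compile e1).size,
-- n2 = (compile e2).size.
-- After e1 the PC is pre + n1, which is also the prefix size seen by the IH on e2.
-- After e2 it is (pre + n1) + n2, and the final .add step advances it by one.
-- The goal is the end PC demanded by the theorem: pre + (compile (.add e1 e2)).size.
example (pre n1 n2 : Nat) :
    (pre + n1) + n2 + 1 = pre + (n1 + n2 + 1) := by
  omega
```

In the real proof the sizes presumably come from simp lemmas about Array.size,
after which the remaining goal is linear arithmetic of this shape.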

New supporting lemmas:
  step_add               - per-instruction step for .add
  compile_add_get_op     - the instruction at the end of compile (.add e1 e2)
                           is .add. Extracted so the dependent-rewrite issue
                           with array bound proofs is contained in one place.

Engineering knowledge gained (recurring patterns when extending):
  - Array.getElem_append_left/right take the bound as an explicit positional
    arg, not via (h := ...).
  - rw on indices that appear in dependent bound proofs fails with "motive
    not type correct"; factor the lookup into a separate lemma.
  - convert (a Mathlib tactic) appears not to be available in this setup;
    rw + exact substitutes.
  - simp + omega closes most arithmetic on Array.size after expansion.
  - step lemmas with implicit args (a, b) need explicit (a := _) in calls
    where context doesn't determine them.

Adding a constructor still follows the v0.1 recipe, plus one new lemma:
one Source constructor, one Eval rule, one compile arm, one step_X helper,
one compile_X_get_op lemma (new in v0.2), and one case in compile_correct's
induction. Each case is roughly 25-40 lines of proof.
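
The source-side half of that recipe might look like the following. This is a
sketch under stated assumptions, not the project's code: values are plain Int
rather than the project's Value type, Instr is cut down to two opcodes, the
namespace RecipeSketch is hypothetical, and the operand order in compile is
inferred from the statement that .add is the last instruction emitted.

```lean
namespace RecipeSketch

-- Target instructions (only the two needed here; the real Instr has 12 opcodes).
inductive Instr where
  | push (n : Int)
  | add
  deriving Repr

-- 1. One Source constructor per language feature.
inductive Expr where
  | intLit (n : Int)
  | add (e1 e2 : Expr)

-- 2. One big-step Eval rule per constructor.
inductive Eval : Expr → Int → Prop where
  | intLit (n : Int) : Eval (Expr.intLit n) n
  | add {e1 e2 : Expr} {a b : Int} :
      Eval e1 a → Eval e2 b → Eval (Expr.add e1 e2) (a + b)

-- 3. One compile arm per constructor: both operands first, then the opcode.
def compile : Expr → Array Instr
  | Expr.intLit n => #[Instr.push n]
  | Expr.add e1 e2 => compile e1 ++ compile e2 ++ #[Instr.add]

-- Steps 4-6 (step_X, compile_X_get_op, the induction case) are proofs
-- and are not reproduced in this sketch.
#eval compile (Expr.add (Expr.intLit 5) (Expr.intLit 3))

end RecipeSketch
```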

Zero sorries / axioms / admits.
2026-05-10 05:53:39 -06:00
Maximus Gorog
fff0091f89 Add source-to-TSM compiler with proven correctness (v0.1).
The CompCert-style substrate-projection theorem at miniature scale:
source-level evaluation and TSM-bytecode execution agree on the value
produced.

TsmLean/Compile/ — three files:

  Source.lean       - small expression language. v0.1 covers integer
                      literals only; the framework is structured so
                      arithmetic, comparison, control flow, and
                      variables extend mechanically.

  Compile.lean      - compile : Source.Expr -> TSM.Code
                      v0.1: intLit n -> #[push n]

  Correctness.lean  - theorem compile_correct:
                        Source.Eval e v ->
                        forall pre suf rest,
                          MultiStep
                            { code := pre ++ compile e ++ suf,
                              pc := pre.size, stack := rest }
                            { code := same,
                              pc := pre.size + (compile e).size,
                              stack := v :: rest }
                      Plus a standalone corollary for the no-prefix case.
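
For readers who want the statement in checkable form, the v0.1 theorem can be
written out roughly as below. This is a sketch of the statement only, with no
proof attempted. It assumes Int values, a one-opcode Instr, and a hand-rolled
MultiStep with constructors refl and head; the namespace, the constructor
names and the name CompileCorrect are hypothetical.

```lean
namespace StatementSketch

inductive Instr where
  | push (n : Int)

inductive Expr where
  | intLit (n : Int)

inductive Eval : Expr → Int → Prop where
  | intLit (n : Int) : Eval (Expr.intLit n) n

def compile : Expr → Array Instr
  | Expr.intLit n => #[Instr.push n]

structure State where
  code  : Array Instr
  pc    : Nat
  stack : List Int

-- Small-step function: fetch the instruction at pc, if there is one.
def step (s : State) : Option State :=
  match s.code[s.pc]? with
  | some (Instr.push n) => some { s with pc := s.pc + 1, stack := n :: s.stack }
  | none => none

-- Reflexive-transitive closure of step.
inductive MultiStep : State → State → Prop where
  | refl (s : State) : MultiStep s s
  | head {s s' s'' : State} :
      step s = some s' → MultiStep s' s'' → MultiStep s s''

-- The statement as a Prop: running the compiled code from pre.size pushes v
-- and leaves the PC just past the compiled fragment, for any prefix and suffix.
def CompileCorrect : Prop :=
  ∀ (e : Expr) (v : Int), Eval e v →
    ∀ (pre suf : Array Instr) (rest : List Int),
      MultiStep
        { code := pre ++ compile e ++ suf, pc := pre.size, stack := rest }
        { code := pre ++ compile e ++ suf,
          pc := pre.size + (compile e).size,
          stack := v :: rest }

end StatementSketch
```

Quantifying over pre and suf is what makes the induction hypothesis reusable
when the compiled fragment sits inside a larger program.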

The infrastructure is in place for compositional extension:

  MultiStep.trans       - transitivity of multi-step (chains two traces)
  MultiStep.single      - lift single step to multi-step
  step_push             - per-instruction step lemma (push)
  getElem_compile       - lookup-in-larger-code helper
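
The two MultiStep helpers can be sketched generically. The version below
assumes MultiStep is defined as the reflexive-transitive closure of a step
function with constructors named refl and head; that definition, the
constructor names and the namespace are assumptions, and the project's actual
proofs may differ.

```lean
namespace ClosureSketch

-- Reflexive-transitive closure of an arbitrary step function.
inductive MultiStep {State : Type} (step : State → Option State) :
    State → State → Prop where
  | refl (s : State) : MultiStep step s s
  | head {s s' s'' : State} :
      step s = some s' → MultiStep step s' s'' → MultiStep step s s''

-- Lift a single step to a multi-step trace of length one.
theorem MultiStep.single {State : Type} {step : State → Option State}
    {s s' : State} (h : step s = some s') : MultiStep step s s' :=
  MultiStep.head h (MultiStep.refl s')

-- Chain two traces that meet at s2, by induction on the first trace.
theorem MultiStep.trans {State : Type} {step : State → Option State}
    {s1 s2 s3 : State}
    (h12 : MultiStep step s1 s2) (h23 : MultiStep step s2 s3) :
    MultiStep step s1 s3 := by
  induction h12 with
  | refl _ => exact h23
  | head hstep _ ih => exact MultiStep.head hstep (ih h23)

end ClosureSketch
```

With these two lemmas, a compound case such as add reduces to chaining the
operand traces with trans and closing with single on the final instruction.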

Adding a constructor to Source (e.g., add) requires:
  - one constructor in Source.Expr
  - one rule in Source.Eval
  - one match arm in compile
  - one step_X helper (one-liner)
  - one case in compile_correct's induction

Demonstrates the pipeline:
  - Source language with big-step semantics
  - Compiler producing TSM bytecode
  - Correctness theorem bridging the two

Zero sorries / axioms / admits across the entire project.
2026-05-10 05:38:01 -06:00
Maximus Gorog
987f205ce5 Initial commit: Tiny Stack Machine (TSM) in Lean 4.
Third concrete kernel, parallel to golang-lean's TGC and octive-lean's
TOC. The substrate-level asymmetry: TSM has values living by *position*
on a stack, not by name. This breaks the named-variable assumption that
TGC and TOC silently share.

Maps onto real bytecode targets: WebAssembly, JVM, CPython, .NET CIL,
SECD. Results proved here are intended to carry over to those stack-based targets.

TsmLean/Core/ — seven files, parallel structure to TGC/TOC:

  Syntax.lean        - Instr (12 opcodes), Value (int/bool), Code
  Semantics.lean     - State, step (function), MultiStep (relation)
  Determinism.lean   - step_deterministic, MultiStep.deterministic
  Eval.lean          - fuel-bounded run + run_sound
  Types.lean         - Ty, StackTy, HasTypeInstr
                       (per-instruction stack-type transitions)
  TypeSoundness.lean - HasTypeV, HasTypeStack
  Preservation.lean  - stack_preservation, progress
                       (canonical Pierce-style small-step type soundness)
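
A per-instruction stack-type transition can be pictured as a three-place
relation between an instruction, the stack shape before it, and the stack
shape after it. The sketch below assumes that reading of HasTypeInstr and
covers only two opcodes; the argument order, the constructor names and the
namespace are hypothetical.

```lean
namespace TypingSketch

inductive Instr where
  | push (n : Int)
  | add

inductive Ty where
  | int
  | bool

abbrev StackTy := List Ty

-- HasTypeInstr i before after: instruction i takes a stack of shape
-- `before` to a stack of shape `after`.
inductive HasTypeInstr : Instr → StackTy → StackTy → Prop where
  | push (n : Int) (σ : StackTy) :
      HasTypeInstr (Instr.push n) σ (Ty.int :: σ)
  | add (σ : StackTy) :
      HasTypeInstr Instr.add (Ty.int :: Ty.int :: σ) (Ty.int :: σ)

-- Pushing a literal onto the empty stack yields a one-element int stack.
example : HasTypeInstr (Instr.push 1) [] [Ty.int] :=
  HasTypeInstr.push 1 []

end TypingSketch
```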

Theorems proven, zero sorries / axioms / admits:

  step_deterministic           single-step is functional
  MultiStep.deterministic      multi-step paths to halt are unique
  run_sound                    successful run -> MultiStep derivation
  stack_preservation           stack typing preserved by step
  progress                     well-typed non-halt instructions step
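
Because step is a function into Option, single-step determinism needs nothing
beyond injectivity of some. A minimal sketch, stated for an arbitrary state
type and step function; the name step_deterministic_sketch is hypothetical and
the project's own proof may be phrased differently.

```lean
-- Generic over any State and any step function; nothing here is TSM-specific.
theorem step_deterministic_sketch {State : Type} (step : State → Option State)
    {s s1 s2 : State} (h1 : step s = some s1) (h2 : step s = some s2) :
    s1 = s2 :=
  -- h1.symm.trans h2 : some s1 = some s2
  Option.some.inj (h1.symm.trans h2)
```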

Demo (Main.lean): (5 + 3) * 2 evaluated on the stack machine.
  push 5; push 3; add; push 2; mul; halt
  -> stack [vInt 16] at pc 5.
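
The demo trace can be reproduced with a cut-down machine. This is a sketch
under assumptions, not the project's Main.lean: values are plain Int, only
four opcodes are modelled, halt is represented by step returning none, and run
returns the last state reached instead of an Option. The namespace DemoSketch
is hypothetical.

```lean
namespace DemoSketch

inductive Instr where
  | push (n : Int)
  | add
  | mul
  | halt
  deriving Repr

structure State where
  code  : Array Instr
  pc    : Nat
  stack : List Int
  deriving Repr

-- One machine step; none means halted or stuck (halt, bad pc, stack underflow).
def step (s : State) : Option State :=
  match s.code[s.pc]?, s.stack with
  | some (Instr.push n), st =>
      some { s with pc := s.pc + 1, stack := n :: st }
  | some Instr.add, b :: a :: st =>
      some { s with pc := s.pc + 1, stack := (a + b) :: st }
  | some Instr.mul, b :: a :: st =>
      some { s with pc := s.pc + 1, stack := (a * b) :: st }
  | _, _ => none

-- Fuel-bounded run: stop when fuel runs out or step returns none.
def run : Nat → State → State
  | 0, s => s
  | fuel + 1, s =>
    match step s with
    | some s' => run fuel s'
    | none => s

-- (5 + 3) * 2
def demo : Array Instr :=
  #[Instr.push 5, Instr.push 3, Instr.add, Instr.push 2, Instr.mul, Instr.halt]

#eval (run 10 { code := demo, pc := 0, stack := [] }).stack  -- [16]
#eval (run 10 { code := demo, pc := 0, stack := [] }).pc     -- 5

end DemoSketch
```

The machine stops at the halt instruction at index 5 with 16 on the stack,
matching the result quoted above.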

The structural asymmetry from TGC/TOC: TSM uses small-step semantics
with a function `step : State -> Option State`, where TGC/TOC used
big-step inductive relations `Env -> Term -> Value -> Env`. The
canonical type-soundness theorems also flip: TGC/TOC proved
preservation under big-step (which has no progress analogue);
TSM proves both progress AND preservation, each per-instruction.

This is the third datapoint that the cross-language factoring needs.
2026-05-10 05:12:10 -06:00