This PR adds `simpArrowTelescope`, a simproc that simplifies telescopes
of non-dependent arrows (p₁ → p₂ → ... → q) while avoiding quadratic
proof growth.
When using `Expr.forallE` to represent nested implications, each nesting
level bumps de Bruijn indices in subterms, destroying sharing even with
hash-consing. For example, a free variable `x` gets different de Bruijn
representations at each depth, causing proof terms to grow.
`simpArrowTelescope` works by:
- Converting arrows to `Arrow p q` (a definitional wrapper)
- Simplifying each component
- Converting back to `→` form
Since `Arrow` arguments are not under binders, subterms remain identical
across nesting levels and can be shared.
The `simp_4` benchmark demonstrates the improvement:
With `forallE`: ~160ms, proof_size ≈ 173k
With `Arrow`: ~43ms, proof_size ≈ 16k
Tradeoff: `simpArrowTelescope` misses simplifications that depend on the
arrow structure (e.g., `p → p` to `True`), since post-methods aren't
applied to intermediate arrows. Thus, it is not used by default. to use
it, one has to set `simpArrowTelescope` as a `pre`-method.
This PR adds an API for building symbolic simulation engines and
verification
condition generators that leverage `grind`. The API wraps `Sym`
operations to
work with `grind`'s `Goal` type, enabling lightweight symbolic execution
while
carrying `grind` state for discharge steps.
New operations on `Goal`:
- `mkGoal`: create a `Goal` from an `MVarId`
- `introN`, `intros`: introduce binders
- `apply`: apply backward rules
- `simp`, `simpIgnoringNoProgress`: simplify using `Sym.Simp`
- `internalize`, `internalizeAll`: add hypotheses to the E-graph
- `grind`: attempt to close the goal using `grind`
- `assumption`: close by matching a hypothesis
A new test demonstrates the API on a stateful program with conditionals,
using `grind` to discharge arithmetic side conditions.
This PR adds a new benchmark `shallow_add_sub_cancel.lean` that
demonstrates symbolic simulation using a shallow embedding into monadic
`do` notation, as opposed to the deep embedding approach in
`add_sub_cancel.lean`.
The shallow embedding approach:
- Uses Lean's `StateM` monad directly instead of a custom command
language
- Defines `Exec s k post` as a simple predicate: `post (k s).1 (k s).2`
- Proves helper theorems for reasoning about monadic operations (`pure`,
`bind`, `get`, `set`, `modify`, `ite`)
- Programs are written in actual `do`-notation rather than a custom AST
The benchmark solves goals using both the `MetaM` and `SymM` frameworks,
showing that the shallow embedding integrates well with the symbolic
simulation infrastructure. `SymM` is again way faster than `MetaM`
### Symbolic simulation benchmark — tactic time only
Problem size `n` corresponds to a program with `4·n` monadic actions.
| n | MetaM tactic (ms) | SymM tactic (ms) | Speedup |
|-----|-------------------|------------------|---------|
| 10 | 82.10 | 11.37 | ~7.2× |
| 20 | 176.21 | 17.71 | ~9.9× |
| 30 | 306.47 | 25.39 | ~12.1× |
| 40 | 509.52 | 34.53 | ~14.7× |
| 50 | 689.19 | 43.51 | ~15.8× |
| 60 | 905.86 | 52.47 | ~17.3× |
| 70 | 1172.31 | 62.50 | ~18.8× |
| 80 | 1448.48 | 70.65 | ~20.5× |
| 90 | 1787.15 | 80.89 | ~22.1× |
| 100 | 2128.12 | 90.77 | ~23.5× |
<img width="580" height="455" alt="image"
src="https://github.com/user-attachments/assets/3511aaab-4d53-4520-8302-65d2d100df4a"
/>
This PR adds a comparison between `MetaM` and `SymM` for a benchmark was
proposed during the Lean@Google Hackathon.
### Benchmark description
In this benchmark, we define the semantics of a very simple imperative
language using an inductive predicate
```
Exec prog events mem lctx post
```
The predicate holds if, when executing the program `prog` with an
initial list of events `events`, memory `mem`, and local context `lctx`,
the postcondition `post` holds.
We then consider the following program:
```
input b
a := b
a := a + a
a := a - b
...
a := a + a
a := a - b
```
That is, after reading an input value `b`, the program repeatedly
updates the variable `a` by doubling it and then subtracting `b`.
We prove that, for any initial memory `m` and local context `l`, and
starting from the empty list of events, the following postcondition
holds:
```
fun t' m' l' =>
m' = m ∧ -- memory did not change
∃ v : Word,
t' = [IOEvent.IN v] ∧ -- exactly one input event
l'.get "a" = some v -- `a` contains the input value
```
In other words, executing the program produces exactly one input event,
leaves the memory unchanged, and ensures that the final value of `a` is
equal to the input value.
### Symbolic simulation benchmark (problem size `n`, with `2·n + 2`
instructions)
| Problem size (n) | MetaM time (ms) | MetaM kernel (ms) | SymM time
(ms) | SymM kernel (ms) | Total speedup |
|------------------|------------------|-------------------|----------------|------------------|---------------|
| 10 | 94.83 | 6.60 | 7.04 | 6.18 | ~13.5× |
| 20 | 218.92 | 13.33 | 14.15 | 13.02 | ~15.5× |
| 30 | 375.10 | 22.95 | 26.51 | 19.81 | ~14.2× |
| 40 | 563.82 | 34.99 | 40.42 | 29.55 | ~14.0× |
| 50 | 815.89 | 53.78 | 60.84 | 42.25 | ~13.4× |
| 60 | 1081.09 | 73.46 | 80.99 | 53.52 | ~13.3× |
| 70 | 1400.80 | 102.70 | 106.02 | 68.61 | ~13.2× |
| 80 | 1772.19 | 126.65 | 134.23 | 87.64 | ~13.2× |
| 90 | 2203.41 | 161.68 | 168.26 | 115.52 | ~13.1× |
| 100 | 2474.09 | 191.23 | 209.13 | 143.86 | ~11.8× |
<img width="580" height="455" alt="image"
src="https://github.com/user-attachments/assets/bc7058fa-e71a-4c2c-be28-860f39166965"
/>
### Symbolic simulation with extra simplification (SymM)
Problem size `n` corresponds to a program with `2·n + 2` instructions.
| n | Total time (ms) | Kernel time (ms) | Non-kernel time (ms) |
|-----|------------------|------------------|----------------------|
| 10 | 6.33 | 3.97 | 2.36 |
| 20 | 10.30 | 5.59 | 4.71 |
| 30 | 13.72 | 7.38 | 6.34 |
| 40 | 17.85 | 8.84 | 9.01 |
| 50 | 21.90 | 10.63 | 11.27 |
| 60 | 27.00 | 12.56 | 14.44 |
| 70 | 32.02 | 14.04 | 17.98 |
| 80 | 37.25 | 15.76 | 21.49 |
| 90 | 42.55 | 17.95 | 24.60 |
| 100 | 49.30 | 20.03 | 29.27 |
| 200 | 125.56 | 38.21 | 87.36 |
| 300 | 293.58 | 66.79 | 226.79 |
| 400 | 361.87 | 78.96 | 282.91 |
| 500 | 518.51 | 102.51 | 416.00 |
| 600 | 716.63 | 122.81 | 593.82 |
This PR adds support for offset terms in `SymM`. This is essential for
handling equational theorems for functions that pattern match on natural
numbers in `Sym.simp`. Without this, it cannot handle simple examples
such as
```lean
def pw (n : Nat) : Nat :=
match n with
| 0 => 1
| n+1 => 2 * pw n
example : pw 4 = 16 := by
sym_simp [pw.eq_1, pw.eq_2]
example : pw (a + 2) = 2 * (2 * pw a) := by
sym_simp [pw.eq_2]
```
This PR optimizes congruence proof construction in `Sym.simp` by
avoiding
`inferType` calls on expressions that are less likely to be cached.
Instead of
inferring types of expressions like `@HAdd.hAdd Nat Nat Nat instAdd 5`,
we infer
the type of the function prefix `@HAdd.hAdd Nat Nat Nat instAdd` and
traverse
the forall telescope.
The key insight is that function prefixes are more likely shared across
many call sites
(e.g., all `Nat` additions use the same `@HAdd.hAdd Nat Nat Nat
instAdd`), so they
benefit from `inferType` caching.
Benchmark results show improvements on workloads with shared function
prefixes:
- `many_rewrites_5000`: 48.8ms → 43.1ms (-12%)
- `term_tree_5000`: 53.4ms → 30.5ms (-43%)
This PR implements a new strategy for simplifying `have`-telescopes in
`Sym.simp` that achieves linear kernel type-checking time instead of
quadratic.
## Problem
When simplifying deep `have`-telescopes, the previous approach using
`have_congr'` produced proofs that type-checked in quadratic time. The
simplifier itself was fast, but the kernel became the bottleneck for
large telescopes.
For example, at n=100:
- **Before**: simp = 2.4ms, kernel = **225ms**
- **After**: simp = 3.5ms, kernel = **10ms**
The quadratic behavior occurred because the kernel creates fresh free
variables for each binder when type-checking, destroying sharing and
producing O(n²) intermediate terms.
## Solution
We transform sequential `have`-telescopes into a parallel
beta-application form:
```
have x₁ := v₁; have x₂ := v₂[x₁]; b[x₁, x₂]
↓ (definitionally equal)
(fun x₁ x₂' => b[x₁, x₂' x₁]) v₁ (fun x₁ => v₂[x₁])
```
This parallel form leverages the efficient simplifier for lambdas in
`Sym.simp`. This form enables:
1. Independent simplification of each argument
2. Proof construction using standard congruence lemmas
3. Linear kernel type-checking time
The algorithm has three phases:
1. **`toBetaApp`**: Transform telescope → parallel beta-application
2. **`simpBetaApp`**: Simplify using `congr`/`congrArg`/`congrFun'` and
`simpLambda`
3. **`toHave`**: Convert back to `have` form
## Benchmark Results
### Benchmark 1: Chain with all variables used in body
| n | Before (simp) | Before (kernel) | After (simp) | After (kernel) |
|---|---------------|-----------------|--------------|----------------|
| 50 | 1.2ms | 32ms | 1.6ms | 4.4ms |
| 100 | 2.4ms | **225ms** | 3.5ms | **10ms** |
| 200 | 4.5ms | — | 8.4ms | 27ms |
| 500 | 11.7ms | — | 33.6ms | 128ms |
### Benchmark 3: Parallel declarations (simplified values)
| n | Before (simp) | Before (kernel) | After (simp) | After (kernel) |
|---|---------------|-----------------|--------------|----------------|
| 50 | 0.5ms | 24ms | 0.8ms | 1.8ms |
| 100 | 1.2ms | **169ms** | 1.8ms | **5.3ms** |
| 200 | 2.2ms | — | 3.9ms | 17ms |
| 500 | 5.9ms | — | 12.3ms | 93ms |
### Benchmark 5: Chain with single dependency
| n | Before (simp) | Before (kernel) | After (simp) | After (kernel) |
|---|---------------|-----------------|--------------|----------------|
| 100 | 1.6ms | 6.2ms | 1.8ms | 6.2ms |
| 200 | 2.8ms | 21.6ms | 4.4ms | 16.5ms |
| 500 | 7.3ms | **125ms** | 12.8ms | **72ms** |
Key observations:
- Kernel time is now **linear** in telescope depth (previously
quadratic)
- Simp time increases slightly due to the transformation overhead
- Total time (simp + kernel) is dramatically reduced for large
telescopes
- The improvement is most pronounced when the body depends on many
variables
## Trade-offs
- Proof sizes are larger (more congruence lemma applications)
- Simp time has ~1.5x overhead from the transformation
- For very small telescopes (n < 10), the overhead may not pay off
The optimization targets the critical path: kernel type-checking was the
bottleneck preventing scaling to realistic symbolic simulation
workloads.
This PR adds a new option to the function `simpHaveTelescope` in which
the `have` telescope is simplified in two passes:
* In the first pass, only the values and the body are simplified.
* In the second pass, unused declarations are eliminated.
This new mode eliminates **superlinear** behavior in the benchmark
`simp_3.lean`. Note that the kernel type checker still **exhibits**
quadratic behavior in this example, because it **does not have support**
for expanding a `have`/`let` telescope in a single step.
This PR reorganizes the monad hierarchy for symbolic computation in
Lean.
## Motivation
We want a clean layering where:
1. A foundational monad (`SymM`) provides maximally shared terms and
structural/syntactic `isDefEq`
2. `GrindM` builds on this foundation, adding E-graphs, congruence
closure, and decision procedures
3. Symbolic execution / VCGen uses `GrindM` directly without introducing
a third monad
## Changes
The core symbolic computation layer still lives in `Lean.Meta.Sym`. This
monad (`SymM`) provides:
- Maximally shared terms with pointer-based equality
- Structural/syntactic `isDefEq` and matching (no reduction, predictable
cost)
- Monotonic local contexts (no `revert` or `clear`), enabling O(1)
metavariable validation
- Efficient `intro`, `apply`, and `simp` implementations
The name "Sym" reflects that this is infrastructure for symbolic
computation: symbolic simulation, verification condition generation, and
decision procedures.
### Updated hierarchy
```
Lean.Meta.Sym -- SymM: shared terms, syntactic isDefEq, intro, apply, simp
Lean.Meta.Grind -- GrindM: E-graphs, congruence closure (extends SymM)
```
Symbolic execution is a usage pattern of `GrindM` operating on
`Grind.Goal`, not a separate monad. This keeps the API surface minimal:
users learn two monads, and VCGen is "how you use `GrindM`" (for users
that want to use `grind`) rather than a third abstraction to understand.
This PR adds a `done` flag to the result returned by `Simproc`s in
`Sym.simp`.
The `done` flag controls whether simplification should continue after
the result:
- `done = false` (default): Continue with subsequent simplification
steps
- `done = true`: Stop processing, return this result as final
## Use cases for `done = true`
### In `pre` simprocs
Skip simplification of certain subterms entirely:
```
def skipLambdas : Simproc := fun e =>
if e.isLambda then return .rfl (done := true)
else return .rfl
```
### In `post` simprocs
Perform single-pass normalization without recursive simplification:
```
def singlePassNormalize : Simproc := fun e =>
if let some (e', h) ← tryNormalize e then
return .step e' h (done := true)
else return .rfl
```
With `done = true`, the result `e'` won't be recursively simplified.
This PR adds support for simplifying lambda expressions in `Sym.simp`.
It is much more efficient than standard simp for very large lambda
expressions with many binders. The key idea is to generate a custom
function extensionality theorem for the type of the lambda being
simplified.
This technique is compatible with the standard `simp` tactic, and will
be ported in a separate PR.
<img width="581" height="455" alt="image"
src="https://github.com/user-attachments/assets/5911dc6c-03f0-48ed-843b-b8cb4f67ee61"
/>
### `lambda` benchmark summary
| Lambda size | MetaM (ms) | SymM (ms) | Speedup |
|-------------|------------|-----------|---------|
| 50 | 22.7 | 0.74 | ~31× |
| 100 | 120.5 | 1.75 | ~69× |
| 150 | 359.6 | 2.90 | ~124× |
| 200 | 809.5 | 4.51 | ~180× |
This PR ensures that `Sym.simp` checks thresholds for maximum recursion
depth and maximum number of steps. It also invokes `checkSystem`.
Additionally, this PR simplifies the main loop. Assigned metavariables
and `zetaDelta` reduction are now handled by installing `pre`/`post`
methods.
This PR adds `getMatch` and `getMatchWithExtra` for retrieving patterns
from
discrimination trees in the symbolic simulation framework.
The PR also adds uses `DiscrTree` to implement indexing in `Sym.simp`.