Commit graph

379 commits

Author SHA1 Message Date
Sebastian Graf
e6d9220eee
test: add dite and match splitting to sym-based MVCGen (#12903)
This PR generalizes the sym MVCGen's match splitting from `ite`-only to
`ite`, `dite`, and arbitrary matchers. Previously, only `ite` was
supported; `dite` and match expressions were rejected with an error.

`mkBackwardRuleForSplit` uses `SplitInfo.splitWith` to build the
splitting proof. Hypothesis types are discovered via `rwIfOrMatcher`
inside the splitter telescope, and `TransformAltFVars.all` provides the
proper fvars for `mkForallFVars`. Subgoal type metavariables use
`mkFreshExprSyntheticOpaqueMVar` so that `rwIfOrMatcher`'s internal
`assumption` tactic cannot assign them.

Adds `DiteSplit`, `MatchSplit`, and `MatchSplitState` test cases and a
`vcgen_match_split` benchmark.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:39:43 +00:00
Garmelon
6a2a884372
chore: migrate pkg tests (#12889)
Also refactor util.sh in the process, so test scripts become easier to
write (inspired in part by lake's test suite).
2026-03-11 18:55:46 +00:00
Sebastian Graf
b626c6d326
test: apply simp theorems in SymM mvcgen' (#12872)
This PR adds support for simp/equational spec theorems in the SymM-based
`mvcgen'` tactic,
catching up with a feature that the original `mvcgen` has supported for
a long time.
Users can write `@[spec] theorem : get (m := StateT σ m) = fun s => pure
(s, s) := rfl`
instead of manually specifying equivalent Hoare triples. The equational
form is more
concise and natural for specs that simply unfold definitions.

The universe level normalization (`normalizeLevelsExpr`) applied in
`work` and the backward
rule constructors is a workaround; ideally this should be integrated
into
`preprocessMVar`/`preprocessExpr` in the SymM framework so all users
benefit.

Changes:
- Add `SpecTheoremKind` to distinguish triple vs simp specs in
`SpecTheoremNew`
- Add `mkSpecTheoremNewFromSimpDecl?` to create spec entries from
equational lemmas, filtering no-op equations
- Add `mkBackwardRuleFromSimpSpec` to build backward rules via
`Eq.mpr`/`congrArg`, with instance synthesis, projection reduction, and
`unfoldReducible` on the RHS
- Migrate simp theorems from `SimpTheorems` database during
`migrateSpecTheoremsDatabase`
- Normalize universe levels so structural matching in
`BackwardRule.apply` succeeds when `max u v` vs `max v u` arise from
different code paths
- Simplify `mkSpecContext` by removing the mock `simp` context
construction
- Use `mkBackwardRuleFromExpr` instead of `mkAuxLemma` for triple specs,
since the proof may contain free variables from the goal context
- Add `AddSubCancelSimp` benchmark case and test exercising the simp
spec code path
- Change `AddSubCancel` spec proofs from `mvcgen` to `mvcgen'`
(dogfooding)


🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 17:15:04 +00:00
Sebastian Graf
49ed556479
test: add VCGen test suite for sym mvcgen benchmarks (#12855)
This PR extracts the example programs from the sym mvcgen benchmarks
into
shared `Cases.*` modules so that both benchmarks and a new fast test
suite
can reuse them. It also renames `vcgen_deep_add_sub_cancel` to
`vcgen_add_sub_cancel_deep` for consistency.

The test suite (`test_vcgen.lean`) runs all cases at n=10, completing in
~2s vs minutes for the full benchmarks. It is wired up as a `lake test`
driver and integrated with the lean4 test/bench infrastructure via
`run_test`/`run_bench` scripts registered in `CMakeLists.txt`.

Benchmark output now uses aligned `CaseName(n):` labels. The `run_bench`
script extracts per-case vcgen and kernel timings into
`measurements.jsonl`.
Benchmarks run single-threaded (`LEAN_NUM_THREADS=1`) for
reproducibility.
`vcgen_get_throw_set` is excluded from benchmarks due to pathological
`instantiateMVars` behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 13:32:13 +00:00
Sebastian Graf
e9e46f4199
chore: fix two semantic merge errors in SymM mvcgen (#12845) 2026-03-09 11:00:01 +00:00
Paul Reichert
68ea28c24f
feat: Array.mergeSort (#12385)
This PR implements a merge sort algorithm on arrays. It has been
measured to be about twice as fast as `List.mergeSort` for large arrays
with random elements, but for small or almost sorted ones, the list
implementation is faster. Compared to `Array.qsort`, it is stable and
has O(n log n) worst-case cost. Note: There is still a lot of potential
for optimization. The current implementation allocates O(n log n)
arrays, one per recursive call.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:18:13 +00:00
Kim Morrison
66bc9ae177
chore: deprecate levelZero and levelOne (#12720)
This PR deprecates `levelZero` in favor of `Level.zero` and `levelOne`
in favor of the new `Level.one`, and updates all usages throughout the
codebase. The `levelZero` alias was previously required for computed
field `data` to work, but this is no longer needed.

🤖 Prepared with Claude Code
2026-03-04 01:03:08 +00:00
Sebastian Graf
4cd7a85334
test: speed up Sym mvcgen by doing fewer redundant program matches (#12712)
This PR changes the spec lookup procedure in Sym-based mvcgen so that

1. Spec candidates are sorted first before being filtered
2. Instead of filtering the whole set of candidates using
`spec.pattern.match?`, we take the first match with the highest
priority.

The second point means we will do a lot fewer matches when the highest
priority spec matches immediately. In this case, the one match is still
partially redundant with the final application of the backward rule
application. It would be great if could somehow specialize the backward
rule after it has been created. Still, this yields some welcome
speedups. Before and after for each.

```
vcgen_add_sub_cancel:
goal_1000: 865 ms, 1 VCs by grind: 228 ms, kernel: 435 ms
goal_1000: 540 ms, 1 VCs by grind: 229 ms, kernel: 426 ms

vcgen_ping_pong:
goal_1000: 458 ms, 0 VCs, kernel: 431 ms
goal_1000: 454 ms, 0 VCs, kernel: 443 ms (unchanged, because there is only ever one candidate spec)

vcgen_deep_add_sub_cancel:
goal_1000: 986 ms, 1 VCs by grind: 234 ms, kernel: 735 ms
goal_1000: 728 ms, 1 VCs by grind: 231 ms, kernel: 708 ms

vcgen_reader_state:
goal_1000: 746 ms, 1 VCs by sorry: 1 ms, kernel: 803 ms
goal_1000: 525 ms, 1 VCs by sorry: 1 ms, kernel: 840 ms
```
2026-02-27 03:24:34 +00:00
Sebastian Graf
668f07039c
chore: do not use Sym.inferType in mvcgen if inputs are not shared (#12713) 2026-02-27 01:15:09 +00:00
Sebastian Graf
f2438a1830
test: support postcondition VCs in Sym VCGen (#12711)
This PR adds support for generating and discharging postcondition VCs in
Sym-based `mvcgen`. It also adds a new benchmark case
`vcgen_ping_pong.lean` that tests this functionality. This benchmark
required a more diligent approach to maintain maximal sharing in goal
preprocessing. Goal preprocessing was subsequently merged into the main
VC generation function.
2026-02-26 16:34:15 +00:00
Garmelon
08eb78a5b2
chore: switch to new test/bench suite (#12590)
This PR sets up the new integrated test/bench suite. It then migrates
all benchmarks and some related tests to the new suite. There's also
some documentation and some linting.

For now, a lot of the old tests are left alone so this PR doesn't become
even larger than it already is. Eventually, all tests should be migrated
to the new suite though so there isn't a confusing mix of two systems.
2026-02-25 13:51:53 +00:00
Marc Huisinga
168c125cf5
chore: relative lean-toolchains (#12652)
This PR changes all `lean-toolchain` to use relative toolchain paths
instead of `lean4` and `lean4-stage0` identifiers, which removes the
need for manually linking toolchains via Elan.

After this PR, at least Elan 4.2.0 and 0.0.224 of the Lean VS Code
extension will be needed to edit core.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 10:23:35 +00:00
Sebastian Graf
bb8d8da1af
test: add benchmark vcgen_reader_state (#12671)
This PR adds the benchmark vcgen_reader_state that is a variant of
vcgen_add_sub_cancel that takes the value to subtract from a `ReaderT`
layer. Measurements:
```
goal_100: 201 ms, 1 VCs by sorry: 0 ms, kernel: 52 ms
goal_500: 382 ms, 1 VCs by sorry: 0 ms, kernel: 327 ms
goal_1000: 674 ms, 1 VCs by sorry: 1 ms, kernel: 741 ms
```
Which suggests it scales linearly. The generated VC triggers superlinear
behavior in `grind`, though, hence it is discharged by `sorry`.
2026-02-24 13:19:15 +00:00
Sebastian Graf
8916246be5
test: speed up vcgen_get_throw_set.lean by partially evaluating specs (#12670)
This PR speeds up the vcgen_get_throw_set benchmark by a factor of 4 by
partially evaluating specs.
2026-02-24 13:10:42 +00:00
Wojciech Różowski
65f112a165
chore: rename prime filter benchmark and fix the merge sort benchmark (#12669)
This PR renames the "Eratosthenes' sieve" benchmark description to
"prime filter" in the speedcenter config (following the discussion in
https://leanprover.zulipchat.com/#narrow/channel/270676-lean4/topic/sieve.20of.20Eratosthenes.20benchmark/with/575310824),
and adds the missing `#eval runBenchmarks` call to the merge sort
benchmark so it actually executes.
2026-02-24 10:57:47 +00:00
Wojciech Różowski
722813105d
test: add System F cbv benchmark (#12623)
This PR adds a System F formalization as a `cbv` tactic benchmark. It is
a translation of the Rocq case study from:

*Definitional Proof Irrelevance Made Accessible* by Thiago Felicissimo,
Yann Leray, Loïc Pujet, Nicolas Tabareau, Éric Tanter, Théo Winterhalter

The authors have given permission to use their development.

The benchmark includes:
- A full System F formalization (substitution lemmas, confluence of
λ-calculus, strong normalization)
- A `pow2DoubleEq` benchmark that verifies 2^(n+1) = 2^n + 2^n via
normalization in System F, measuring both `cbv` tactic time and kernel
checking time for n = 0..6

Co-Authored-By: @david-christiansen

Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
2026-02-20 16:46:07 +00:00
Sebastian Graf
06f36b61b8
test: use Sym.simp to unfold in VCGen benchmarks (#12593)
This PR improves the Sym VCGen such that we can use Sym.simp to unfold
definitions in the benchmark driver. To do so, it adds support for
zeta-reduction in the VCGen and ensures that proof terms are maximally
shared before being sent to the kernel.
2026-02-19 14:42:54 +00:00
Wojciech Różowski
fad343d9ef
test: add List.mergeSort benchmark for cbv tactic (#12588)
This PR adds a benchmark for `cbv` tactic that involves evaluating
`List.mergeSort` on a reversed list on natural numbers.
2026-02-19 13:59:42 +00:00
Sebastian Graf
14c973db4e
test: use Sym.Patterns for discrimination tree matching in Sym VCGen (#12579) 2026-02-19 08:08:53 +00:00
Sebastian Graf
172c5c3ba8
test: flag to use Sym.simp in Sym mvcgen benchmark driver (#12578) 2026-02-19 07:37:01 +00:00
Sebastian Graf
e639b66d62
chore: rename SpecTheorems.add to SpecTheorems.insert, add SpecProof.getProof (#12574)
This PR renames `SpecTheorems.add` to `SpecTheorems.insert`
2026-02-19 07:04:27 +00:00
Sebastian Graf
c0b9ff1148
test: measure VC discharging separately in Sym mvcgen benchmarks (#12551)
This PR refactors the benchmark driver of the Sym mvcgen benchmarks such
that time spent for discharging VCs and instantiation of MVars is
measured separately from VC generation.

Example output:

```
baseline_add_sub_cancel
goal_100: 57 ms, 0 VCs, kernel: 22 ms
goal_500: 353 ms, 0 VCs, kernel: 160 ms
goal_1000: 755 ms, 0 VCs, kernel: 437 ms

vcgen_add_sub_cancel
goal_100: 36 ms, 1 VCs by grind: 21 ms, kernel: 35 ms
goal_500: 149 ms, 1 VCs by grind: 115 ms, kernel: 214 ms
goal_1000: 314 ms, 1 VCs by grind: 249 ms, kernel: 478 ms

vcgen_deep_add_sub_cancel
goal_100: 65 ms, 1 VCs by grind: 23 ms, kernel: 82 ms
goal_500: 262 ms, 1 VCs by grind: 123 ms, kernel: 539 ms
goal_1000: 611 ms, 1 VCs by grind: 292 ms, kernel: 1075 ms

vcgen_get_throw_set
goal_100: 87 ms, 101 VCs by sorry: 16 ms, kernel: 93 ms
goal_500: 332 ms, 501 VCs by sorry: 289 ms, instantiate > 1000ms: 23363 ms, kernel: 770 ms
goal_1000: 794 ms, 1001 VCs by sorry: 1332 ms, instantiate > 1000ms: 334614 ms, kernel: 1882 ms
```
2026-02-18 12:03:47 +00:00
Sebastian Graf
c5c0ddcc56
test: remove let handling from Sym mvcgen (#12505)
This PR removes the unnecessary and potentially broken handling of
`let`s by zeta-reduction in Sym-based `mvcgen`.
It turns out to be unnecessary for the benchmarks so far, so there is a
lack of motivation to publicize `betaRevS` which would be needed to fix
it.
2026-02-16 15:58:36 +00:00
Sebastian Graf
f084ce1497
test: share benchmark driver for Sym mvcgen; don't measure unfolding (#12501)
This PR shares the driver code from the Sym-based mvcgen benchmarks. It
also moves the `simp only [loop, step]` call out of the measured
section, so that we measure purely the overhead of VC generation.

The new benchmark results are as follows. All measurements for n=1000:

```
baseline_add_sub_cancel:   719.318425 ms, kernel: 382.708178 ms
vcgen_add_sub_cancel:      306.883079 ms, kernel: 455.050825 ms
vcgen_deep_add_sub_cancel: 543.350543 ms, kernel: 896.926298 ms
vcgen_get_throw_set:       669.566541 ms, kernel: 60754.202714 ms
```

Note that `vcgen_add_sub_cancel` sped up by 100% because we no longer
measure unfolding `loop` and `step`. The baseline didn't speed up as
much because it unfolded in the same `Sym.simp` call that also does
other rewrites, so there was no `simp` pass that could be eliminated.
2026-02-16 13:17:00 +00:00
Wojciech Różowski
f3b8f76ec4
test: add cbv benchmark for evaluating Decidable.decide (#12467)
This PR adds a benchmark for `cbv` tactic for evaluating
`Decidable.decide` for a `Decidable` instance for a problem of checking
if a number is not a prime power.

The test has been inspired by a recent discussion on the [leanprover
zulip](https://leanprover.zulipchat.com/#narrow/channel/287929-mathlib4/topic/Can't.20decide.20if.2015.20is.20a.20prime.20power/with/572513330).
2026-02-13 13:25:35 +00:00
Sebastian Graf
204555ba83
test: add Sym vcgen benchmarks get_throw_set and deep_add_sub_cancel (#12447)
This PR adds two more benchmarks for the Sym-based mvcgen prototype in
the style of `add_sub_cancel`.

The first is `deep_add_sub_cancel`, which is like `add_sub_cancel` but
with a much deeper monad stack:
```lean
abbrev M := ExceptT String <| ReaderT String <| ExceptT Nat <| StateT Nat <| ExceptT Unit <| StateM Unit
```
By specializing the specs for `get` and `set`, we get competitive
performance:
```
goal_100: 180.365086 ms, kernel: 79.634989 ms
goal_200: 313.465611 ms, kernel: 187.808631 ms
goal_300: 478.278585 ms, kernel: 270.210634 ms
goal_400: 638.884320 ms, kernel: 380.381127 ms
goal_500: 759.802772 ms, kernel: 472.662882 ms
goal_600: 933.575180 ms, kernel: 649.040746 ms
goal_700: 1174.367200 ms, kernel: 759.470010 ms
goal_800: 1298.866482 ms, kernel: 864.420171 ms
goal_900: 1475.315552 ms, kernel: 1008.662783 ms
goal_1000: 1627.957444 ms, kernel: 1078.627830 ms
```
Recall that `add_sub_cancel` had `goal_1000: 824.476962 ms, kernel:
477.069045 ms`, but that doesn't need to repeatedly unwrap 3 layers of
the monad.

The second benchmark is `get_throw_set`. Its kernel is
```lean
def step (lim : Nat) : ExceptT String (StateM Nat) Unit := do
  let s ← get
  if s > lim then
    throw "s is too large"
  set (s + 1)

def loop (n : Nat) : ExceptT String (StateM Nat) Unit := do
  match n with
  | 0 => pure ()
  | n+1 => loop n; step n

def Goal (n : Nat) : Prop := ⦃fun s => ⌜s = 0⌝⦄ loop n ⦃⇓_ s => ⌜s = n⌝⦄
```
It will generate `n+1` VCs. We get `n` VCs of the form 
```
s✝ : Nat
_ : ¬0 < s✝
...
_ : n < s✝ + 1 ...<n times>... + 1
⊢ ⌜s✝ = 0⌝ ⊢ₛ ⌜False⌝ (s✝ + ...<n times>...)
```
and one VC of the form
```
⌜s✝ = 0⌝ ⊢ₛ ⌜s✝ + 1 + <n times> ... + 1 = n⌝
```
which can be discharged by `grind`, but presently are discharged with
`sorry`.
Statistics:
```
goal_100: 209.435869 ms, kernel: 128.768919 ms
goal_200: 386.639441 ms, kernel: 482.244717 ms
goal_300: 559.795137 ms, kernel: 1251.777405 ms
goal_400: 753.243978 ms, kernel: 3020.878177 ms
goal_500: 1014.939522 ms, kernel: 5182.120327 ms
goal_600: 1229.173622 ms, kernel: 9296.551442 ms
goal_700: 1410.024180 ms, kernel: 16655.954682 ms
goal_800: 1684.059305 ms, kernel: 32065.951705 ms
goal_900: 1905.602401 ms, kernel: 55299.942894 ms
goal_1000: 2172.823244 ms, kernel: 84082.492485 ms
```

Need to look at kernel times here, but tactic time looks about alright.

Using `grind` to discharge just `n=100` goals took 8s.
2026-02-11 19:54:23 +00:00
Wojciech Różowski
9bfd16ef5e
refactor: main loop of the cbv tactic (#12417)
This PR refactors the main loop of the `cbv` tactic. Rather than using
multiple simprocs, a central pre simproc is introduced. Moreover, let
expressions are no longer immediately zeta-reduced due to performance on
one of the benchmarks (`leroy.lean`).

Stacked on top of #12416
2026-02-11 11:47:18 +00:00
Sebastian Graf
7a8e011603
test: support ite splitting and lifting through ExceptT to Sym mvcgen (#12392) 2026-02-09 13:41:35 +00:00
Paul Reichert
4070ee3b8e
fix: sigmaIterator benchmark (#12364)
This PR fixes breakage in the sigmaIterator benchmark.
2026-02-06 19:45:42 +00:00
Sebastian Graf
fa7f51038b
test: document Sym-based mvcGen and tidy up benchmark project (#12350) 2026-02-06 14:12:29 +00:00
Sebastian Graf
8b5293382b
test: copy shallow_add_sub_cancel into mvcgen benchmark for precompilation (#12347) 2026-02-06 13:07:25 +00:00
Wojciech Różowski
d2176cb5fb
test: add tests and benchmarks for cbv tactic (#12345)
This PR adds two benchmarks (sieve of Eratosthenes, removing duplicates
from the list) and one test (a function with sublinear complexity
defined via well-founded recursion evaluated on large naturals with up
to `60` digits).

The tests have been suggested by @b-mehta.
2026-02-06 11:41:03 +00:00
Wojciech Różowski
bd8c861316
test: add program evaluation benchmark to cbv (#12280)
This PR adds a benchmark based on Xavier Leroy's compiler verification
course to test call-by-value tactic.

Stacked on top of #12279
2026-02-04 15:37:00 +00:00
Sebastian Graf
00c1f0d3a9
test: create aux theorems for backward rules in SymM-based mvcgen (#12295)
This PR improves and simplifies the SymM-based mvcgen prototype by
creating `BackwardRule.apply`-ready auxiliary theorems for spec
theorems. These auxiliary theorems have types that have reducible
definitions unfolded and shared, just like the rest of the SymM world
assumes. Furthermore, in order to aid kernel checking times,
definitional reductions leave behind expected type hints. With #12290,
we get the following numbers:

```
goal_100: 100.671964 ms, kernel: 34.104676 ms
goal_200: 152.650808 ms, kernel: 70.653251 ms
goal_300: 222.973242 ms, kernel: 105.874266 ms
goal_400: 294.032333 ms, kernel: 150.025106 ms
goal_500: 366.748098 ms, kernel: 193.483843 ms
goal_600: 442.509542 ms, kernel: 236.845115 ms
goal_700: 517.527685 ms, kernel: 268.804230 ms
goal_800: 601.657910 ms, kernel: 310.765606 ms
goal_900: 681.020759 ms, kernel: 357.428032 ms
goal_1000: 762.212989 ms, kernel: 403.789517 ms
```

The baseline is `shallow_add_sub_cancel`:

```
goal_100: 62.721757 ms, kernel: 22.109237 ms
goal_200: 140.118652 ms, kernel: 45.219512 ms
goal_300: 241.077690 ms, kernel: 78.779379 ms
goal_400: 363.274462 ms, kernel: 128.951250 ms
goal_500: 517.350791 ms, kernel: 155.498217 ms
goal_600: 678.291416 ms, kernel: 212.325487 ms
goal_700: 881.479043 ms, kernel: 258.690695 ms
goal_800: 1092.357375 ms, kernel: 351.996079 ms
goal_900: 1247.759480 ms, kernel: 319.197608 ms
goal_1000: 1497.203628 ms, kernel: 364.532560 ms
```

The latter is with the main solving loop in interpreter mode, but the
kernel checking times are still representative.
Earlier experiments suggest that the precompiled baseline performs at
roughly 650ms for `goal_1000`, so the new mvcgen is getting close.
2026-02-03 17:43:33 +00:00
Sebastian Graf
7416309805
test: teach SymM mvcgen to recognize specialized theorem applications (#12256)
This PR recognizes certain kinds of composite proof terms of the form
`hpre.trans hspec |> (wp prog).mono _ _ hpost` and abstracts them into
bespoke theorems. This should yield smaller proof terms. Sadly, kernel
checking time is unaffected, even regressing a bit. The number of shared
terms stays almost the same (+- a constant). Hence I deactivate the code
path in this patch. We keep the code, though, because it might be useful
in the future, also there are a few other improvements.
2026-01-30 17:00:59 +00:00
Sebastian Graf
59f3abd0bd
test: add SymM mvcgen and port the add_sub_cancel benchmark (#12251)
This PR adds a clone of the `mvcgen` tactic based on `SymM` and
evaluates it based on a ported `add_sub_cancel` benchmark. Notably, it
can reuse all the existing `@[spec]`-annotated theorems to generate VCs.
(It doesn't do control-flow splitting, simp rules on the program
expression or handling of lets; we'll get there.)

It is quite fast already, with the kernel being the bottle-neck:

```
goal_50: 69.524305 ms, kernel: 155.327778 ms
goal_100: 93.834221 ms, kernel: 407.370786 ms
goal_150: 131.364098 ms, kernel: 762.936720 ms
goal_200: 169.577172 ms, kernel: 1181.199093 ms
goal_250: 206.421738 ms, kernel: 1707.539380 ms
```

```
goal_200: 169.458637 ms, kernel: 1186.221085 ms
goal_400: 322.819718 ms, kernel: 3791.613854 ms
goal_600: 474.929013 ms, kernel: 7763.373757 ms
goal_800: 634.379422 ms, kernel: 13107.810430 ms
```

It is best compared to the `solveUsingSym <n> false true` measurements
of the SymM `add_sub_cancel` benchmark (`false`: without intermediate
eager simplification). For `n=200`, it reports

```
goal_200: 779.482300 ms, kernel: 742.097404 ms
```

suggesting that the generated proof term could be improved for kernel
reduction. (TODO.)
I'm unsure whether `solveUsingSym` is run in interpreted mode, so take
the >400% speedup with a grain of salt.
We can definitely conclude that VC generation time is currently not a
bottleneck compared to kernel checking time.

Plot for discharging goals of sizes 100..800:
<img width="1000" height="600" alt="Code_Generated_Image(1)"
src="https://github.com/user-attachments/assets/90e76a45-fa46-4d02-912a-c3355e2aa094"
/>

Plot comparing Kernel and Goal time: 
<img width="1000" height="600" alt="Code_Generated_Image(2)"
src="https://github.com/user-attachments/assets/5849ba0f-1d83-4f2d-98dd-fa65b840bb4e"
/>
2026-01-30 15:12:54 +00:00
Paul Reichert
16919852d9
refactor: remove last appearances of allowNontermination (#12211)
This PR updates docstrings and function signatures in order to complete
the transition from `Iter.Partial` to `Iter.Total` (extrinsically
terminating by default). It also deprecates `allowNontermination` and
adds `Iter.Total.atIdxSlow?`.
2026-01-29 07:22:19 +00:00
Joachim Breitner
08f43acefb
perf: add introSubstEq shortcut (#12190)
This PR adds the `introSubstEq` MetaM tactic, as an optimization over
`intro h; subst h` that avoids introducing `h : a = b` if it can be
avoided,
which is the case when `b` can be reverted without reverting anything
else. Speeds up the generation of `injEq` theorem.
2026-01-28 12:33:14 +00:00
Leonardo de Moura
50bbf101be
test: delayed assignment performance issue (#12201)
This PR adds a benchmark that exposes a performance issue at
`instantiateMVars` when there are many nested delayed assignments.
2026-01-28 02:08:39 +00:00
Joachim Breitner
36bae38e6b
test: add big_struct_dep1 benchmark (#12191) 2026-01-27 14:36:09 +00:00
Sebastian Graf
e8870da205
chore: improve performance of mpure_intro and mvcgen by avoiding whnfD (#12165)
New measurements:

```
goal_10: 181.910200 ms, kernel: 37.241050 ms
goal_20: 386.540215 ms, kernel: 83.497428 ms
goal_30: 648.282057 ms, kernel: 117.038447 ms
goal_40: 946.733191 ms, kernel: 168.369124 ms
goal_50: 1325.846873 ms, kernel: 223.838786 ms
goal_60: 1734.175705 ms, kernel: 285.594486 ms
goal_70: 2199.522317 ms, kernel: 351.659865 ms
goal_80: 2700.077802 ms, kernel: 428.303337 ms
goal_90: 3260.446641 ms, kernel: 515.976499 ms
goal_100: 3865.503733 ms, kernel: 600.229962 ms
```

Previously, goal_100 took 7.8s.
2026-01-26 17:58:33 +00:00
Sebastian Graf
b44c7e161c
test: add two benchmarks for mvcgen in the style of SymM (#12163)
This PR adds two benchmarks for mvcgen in the style of Leo's SymM
benchmarks.

While performance on add_sub_cancel_StateM.lean is in the same order of
magnitude as the corresponding MetaM benchmark, add_if_sub_StateM.lean
is far slower.

Measurements for add_sub_cancel:
```
goal_10:   245.576221 ms, kernel: 134.134182 ms
goal_20:   613.945320 ms, kernel: 115.453811 ms
goal_30:  1074.053596 ms, kernel: 179.076070 ms
goal_40:  1680.678302 ms, kernel: 252.066677 ms
goal_50:  2457.209584 ms, kernel: 293.974096 ms
goal_60:  3271.773330 ms, kernel: 368.394386 ms
goal_70:  3981.247921 ms, kernel: 434.297822 ms
goal_80:  5077.300540 ms, kernel: 507.047772 ms
goal_90:  6486.990060 ms, kernel: 556.952095 ms
goal_100: 7791.399986 ms, kernel: 623.605163 ms
```

Measurements for add_if_sub:

```
goal_2: 89.762349 ms, kernel: 43.320205 ms
goal_3: 190.655546 ms, kernel: 38.888499 ms
goal_4: 434.461936 ms, kernel: 75.234581 ms
goal_5: 1110.295284 ms, kernel: 161.698707 ms
goal_6: 3241.383031 ms, kernel: 326.137173 ms
goal_7: 11675.609970 ms, kernel: 684.907188 ms
```

Much room for improvement.
2026-01-26 13:17:47 +00:00
Leonardo de Moura
45862d5486
feat: improves simpArrowTelescope simproc (#12153)
This PR improves the `simpArrowTelescope` simproc that simplifies
non-dependent arrow telescopes: `p₁ → p₂ → ... → q`.

The simproc now also applies telescope-specific simplifications:
- `False → q` to `True` (when `q : Prop`)
- `True → q` to `q` (when `q : Prop`)
- `p → True` to `True`
2026-01-25 22:29:38 +00:00
Leonardo de Moura
ba8c2ed4ee
feat: add simpArrowTelescope for compact proofs of arrow simplification (#12152)
This PR adds `simpArrowTelescope`, a simproc that simplifies telescopes
of non-dependent arrows (p₁ → p₂ → ... → q) while avoiding quadratic
proof growth.

When using `Expr.forallE` to represent nested implications, each nesting
level bumps de Bruijn indices in subterms, destroying sharing even with
hash-consing. For example, a free variable `x` gets different de Bruijn
representations at each depth, causing proof terms to grow.

`simpArrowTelescope` works by:

- Converting arrows to `Arrow p q` (a definitional wrapper)
- Simplifying each component
- Converting back to `→` form

Since `Arrow` arguments are not under binders, subterms remain identical
across nesting levels and can be shared.

The `simp_4` benchmark demonstrates the improvement:

With `forallE`: ~160ms, proof_size ≈ 173k
With `Arrow`: ~43ms, proof_size ≈ 16k
Tradeoff: `simpArrowTelescope` misses simplifications that depend on the
arrow structure (e.g., `p → p` to `True`), since post-methods aren't
applied to intermediate arrows. Thus, it is not used by default. to use
it, one has to set `simpArrowTelescope` as a `pre`-method.
2026-01-25 20:43:59 +00:00
Leonardo de Moura
e90f6f77db
test: local rewrite with Sym.simp (#12147)
This PR adds a new API for helping users write focused rewrites.
2026-01-25 01:32:50 +00:00
Leonardo de Moura
6de7100f69
feat: add Goal API for SymM + grind (#12143)
This PR adds an API for building symbolic simulation engines and
verification
condition generators that leverage `grind`. The API wraps `Sym`
operations to
work with `grind`'s `Goal` type, enabling lightweight symbolic execution
while
carrying `grind` state for discharge steps.

New operations on `Goal`:
- `mkGoal`: create a `Goal` from an `MVarId`
- `introN`, `intros`: introduce binders
- `apply`: apply backward rules
- `simp`, `simpIgnoringNoProgress`: simplify using `Sym.Simp`
- `internalize`, `internalizeAll`: add hypotheses to the E-graph
- `grind`: attempt to close the goal using `grind`
- `assumption`: close by matching a hypothesis

A new test demonstrates the API on a stateful program with conditionals,
using `grind` to discharge arithmetic side conditions.
2026-01-24 20:30:08 +00:00
Leonardo de Moura
4c1e4a77b4
test: MetaM vs SymM on do notation (#12134)
This PR adds a new benchmark `shallow_add_sub_cancel.lean` that
demonstrates symbolic simulation using a shallow embedding into monadic
`do` notation, as opposed to the deep embedding approach in
`add_sub_cancel.lean`.

The shallow embedding approach:
- Uses Lean's `StateM` monad directly instead of a custom command
language

- Defines `Exec s k post` as a simple predicate: `post (k s).1 (k s).2`

- Proves helper theorems for reasoning about monadic operations (`pure`,
`bind`, `get`, `set`, `modify`, `ite`)

- Programs are written in actual `do`-notation rather than a custom AST

The benchmark solves goals using both the `MetaM` and `SymM` frameworks,
showing that the shallow embedding integrates well with the symbolic
simulation infrastructure. `SymM` is again way faster than `MetaM`

### Symbolic simulation benchmark — tactic time only

Problem size `n` corresponds to a program with `4·n` monadic actions.

| n   | MetaM tactic (ms) | SymM tactic (ms) | Speedup |
|-----|-------------------|------------------|---------|
| 10  | 82.10  | 11.37 | ~7.2×  |
| 20  | 176.21 | 17.71 | ~9.9×  |
| 30  | 306.47 | 25.39 | ~12.1× |
| 40  | 509.52 | 34.53 | ~14.7× |
| 50  | 689.19 | 43.51 | ~15.8× |
| 60  | 905.86 | 52.47 | ~17.3× |
| 70  | 1172.31 | 62.50 | ~18.8× |
| 80  | 1448.48 | 70.65 | ~20.5× |
| 90  | 1787.15 | 80.89 | ~22.1× |
| 100 | 2128.12 | 90.77 | ~23.5× |

<img width="580" height="455" alt="image"
src="https://github.com/user-attachments/assets/3511aaab-4d53-4520-8302-65d2d100df4a"
/>
2026-01-24 03:38:02 +00:00
Leonardo de Moura
c81a8897a9
feat: improve Sym.simp APIs and new benchmark data (#12101)
This PR improves the the `Sym.simp` APIs. It is now easier to reuse the
simplifier cache between different simplification steps. We use the APIs
to improve the benchmark at #12100.

### Symbolic simulation with simplifier cache reuse (SymM)

Problem size `n` corresponds to a program with `2·n + 2` instructions.

| n   | Tactic time (ms) | Kernel time (ms) |
|-----|------------------|------------------|
| 10  | 4.53  | 4.29  |
| 20  | 5.56  | 6.91  |
| 30  | 6.46  | 8.67  |
| 40  | 8.07  | 11.20 |
| 50  | 9.37  | 13.63 |
| 60  | 11.89 | 15.43 |
| 70  | 12.43 | 18.28 |
| 80  | 14.07 | 20.72 |
| 90  | 15.62 | 23.41 |
| 100 | 17.39 | 24.80 |
| 200 | 30.35 | 48.39 |
| 300 | 45.41 | 72.84 |
| 400 | 59.17 | 97.67 |
| 500 | 79.63 | 138.99 |
| 600 | 100.05 | 173.67 |
| 700 | 119.77 | 208.80 |

<img width="571" height="455" alt="image"
src="https://github.com/user-attachments/assets/70da7ea2-b5d2-405e-985c-bfa358455afc"
/>
2026-01-22 03:37:16 +00:00
Leonardo de Moura
fa40491c78
test: benchmark MetaM vs SymM (#12100)
This PR adds a comparison between `MetaM` and `SymM` for a benchmark was
proposed during the Lean@Google Hackathon.

### Benchmark description

In this benchmark, we define the semantics of a very simple imperative
language using an inductive predicate

```
Exec prog events mem lctx post
```

The predicate holds if, when executing the program `prog` with an
initial list of events `events`, memory `mem`, and local context `lctx`,
the postcondition `post` holds.

We then consider the following program:

```
input b
a := b
a := a + a
a := a - b
...
a := a + a
a := a - b
```

That is, after reading an input value `b`, the program repeatedly
updates the variable `a` by doubling it and then subtracting `b`.

We prove that, for any initial memory `m` and local context `l`, and
starting from the empty list of events, the following postcondition
holds:

```
fun t' m' l' =>
  m' = m ∧                      -- memory did not change
  ∃ v : Word,
    t' = [IOEvent.IN v] ∧       -- exactly one input event
    l'.get "a" = some v         -- `a` contains the input value
```

In other words, executing the program produces exactly one input event,
leaves the memory unchanged, and ensures that the final value of `a` is
equal to the input value.

### Symbolic simulation benchmark (problem size `n`, with `2·n + 2`
instructions)

| Problem size (n) | MetaM time (ms) | MetaM kernel (ms) | SymM time
(ms) | SymM kernel (ms) | Total speedup |

|------------------|------------------|-------------------|----------------|------------------|---------------|
| 10  | 94.83  | 6.60  | 7.04  | 6.18  | ~13.5× |
| 20  | 218.92 | 13.33 | 14.15 | 13.02 | ~15.5× |
| 30  | 375.10 | 22.95 | 26.51 | 19.81 | ~14.2× |
| 40  | 563.82 | 34.99 | 40.42 | 29.55 | ~14.0× |
| 50  | 815.89 | 53.78 | 60.84 | 42.25 | ~13.4× |
| 60  | 1081.09 | 73.46 | 80.99 | 53.52 | ~13.3× | 
| 70  | 1400.80 | 102.70 | 106.02 | 68.61 | ~13.2× | 
| 80  | 1772.19 | 126.65 | 134.23 | 87.64 | ~13.2× |
| 90  | 2203.41 | 161.68 | 168.26 | 115.52 | ~13.1× | 
| 100 | 2474.09 | 191.23 | 209.13 | 143.86 | ~11.8× |

<img width="580" height="455" alt="image"
src="https://github.com/user-attachments/assets/bc7058fa-e71a-4c2c-be28-860f39166965"
/>

 ### Symbolic simulation with extra simplification (SymM)

Problem size `n` corresponds to a program with `2·n + 2` instructions.

| n   | Total time (ms) | Kernel time (ms) | Non-kernel time (ms) |
|-----|------------------|------------------|----------------------|
| 10  | 6.33  | 3.97 | 2.36 |
| 20  | 10.30 | 5.59 | 4.71 |
| 30  | 13.72 | 7.38 | 6.34 |
| 40  | 17.85 | 8.84 | 9.01 |
| 50  | 21.90 | 10.63 | 11.27 |
| 60  | 27.00 | 12.56 | 14.44 |
| 70  | 32.02 | 14.04 | 17.98 |
| 80  | 37.25 | 15.76 | 21.49 |
| 90  | 42.55 | 17.95 | 24.60 |
| 100 | 49.30 | 20.03 | 29.27 |
| 200 | 125.56 | 38.21 | 87.36 |
| 300 | 293.58 | 66.79 | 226.79 |
| 400 | 361.87 | 78.96 | 282.91 |
| 500 | 518.51 | 102.51 | 416.00 |
| 600 | 716.63 | 122.81 | 593.82 |
2026-01-22 01:38:56 +00:00
Leonardo de Moura
af438425d5
perf: avoid mkAppM in Sym.simp (#12099)
This PR ensures `Sym.simpGoal` does not use `mkAppM`. It also increases
the default number of maximum steps in `Sym.simp`.
2026-01-22 00:01:43 +00:00