lean4-htt/tests/bench
Kyle Miller 047f6aaf89
feat: efficient tactic configuration elaborators and configurability (#13651)
This PR replaces the previous tactic configuration system with a
significantly more efficient one that supports custom configuration
syntaxes and processing. On a simple benchmark, configuration evaluation
takes 6.2% of the time it used to. The `declare_config_elab` command
generates a configuration elaborator that now directly constructs
configuration objects; previously it relied on `Meta.evalExpr'`, which
involved running a configuration through the full term elaboration,
compilation, and evaluation processes. The generated configuration
elaborators now also have the capability to do direct `Syntax`
evaluation in common cases, skipping term elaboration. Furthermore, the
elaborator accepts configurations more liberally: any user-defined
syntax that has the form of an `optConfig`-style configuration or
configuration item (including, e.g., `namedArgument`s) is accepted.
Import `Lean.Elab.ConfigEval` to use the system; see this module for
some documentation in addition to the docstrings in
`Lean.Elab.ConfigEval.Commands`. Furthermore, the `simp` tactic now also
has `(user.optionName := ...)` user configuration options, which can be
declared using a global `tactic.simp.user.optionName` option; use
`getUserConfigOption` and `withUserConfig` to access and set these in
metaprograms.

Other features:
- `declare_config_elab` creates a function that exposes an `init`
parameter for the configuration that will be modified. It also now has a
`where` clause, enabling defining custom handlers for specific keys.
- Elaborators can be given additional binders, to make parameterized
elaborators. This is used by `simp` and `grind ` to support multiple
default configurations with the correct expected type for `(config :=
...)` elaboration.
- The `EvalTerm` class supports direct evaluation of `Syntax`, skipping
term elaboration. The system will attempt to automatically derive this
class when generating the elaborator.
- In case `EvalTerm` does not recognize the term, then the syntax is
elaborated to an expression and an `EvalExpr` instance is applied to
evaluate the expression. The system will similarly automatically derive
these instances if possible.
- Automatic derivation is *transitive*. It is able to seek instances
through other instances; e.g. if it needs an `EvalTerm (List T)`
instance it will be able to reduce this to seeking an `EvalTerm T`
instance.
- The system is designed to be flexible, and the various components can
be combined to construct a configuration elaborator. There are also now
`declare_core_config_elab` and `declare_term_config_elab` for
conveniently generating elaborators for `CoreM` and `TermElabM`. The
difference is that the first takes an explicit flag for whether to log
exceptions, and the second uses the current `errToSorry` state.
**Warning:** if you use the `TermElabM` one from `TacticM`, it will be
unaware of the current `recover` state. The only differences between
these macros are the ways error recovery is handled per monad.

Other changes:
- `#reduce` tactic configuration now makes use of this system and has
more options
- The module `Lean.Elab.Tactic.ConfigSetter` is removed; the
`declare_config_elab`-family macros subsume its functionality.
- The module `Lean.Elab.Tactic.Config` is deprecated and will be
removed; migration notes appear in the module docs there. Import
`Lean.Elab.ConfigEval` instead.
- One of the mvcgen benchmarks got significantly slower, but it turned
out to be caused by the new tactic configuration elaboration no longer
resetting the MetaM caches. Adding an explicit `resetCache` into the
test driver fixed the benchmark.

### Notes for metaprogram authors

If you are using the module system, you just need a `meta import
Lean.Elab.ConfigEval` to use the macros, and it should serve as a
drop-in replacement to the previous system so long as

1. your configuration type is a `structure` with no parameters, indices,
or universes (only `Type` is supported);
2. default values are self-contained and not dependent on other fields;
and
3. all fields have types that are composed from `Option`, `List`,
`Array`, `String`, `DataValue`, and inductive types in `Type` with no
parameters or indices, whose fields are similarly composed.

The macros synthesize a self-contained configuration elaboration
procedure, analyzing the `EvalTerm` and `EvalExpr` instances that are
currently available or can be automatically derived. These are the
components of the system:
- `EvalTerm` instances provide `Term → TermElabM α` functions for
evaluation of raw syntax; these handle the common cases where an option
value is a identifier, application, or other simple expression. They are
responsible for adding TermInfo when info is enabled, to support hovers.
One can invoke derivation of a `EvalTerm T` instance with the
`ensure_eval_term_instance T` command (after `open scoped
Lean.Elab.ConfigEval`).
- `EvalExpr` instances provide `Expr → TermElabM α` functions for
evaluation of elaborated expressions; these handle cases where an option
value may require reduction to evaluate. Similarly, one can invoke
derivation of an `EvalExpr T` instance with the
`ensure_eval_expr_instance T` command. If needed, there's also
`derive_eval_expr_instance_using_meta_eval T` for creating a
`Meta.evalExpr'`-based evaluator.
- Functions like `ConfigEval.evalExprWithElab` compose `EvalTerm` and
`EvalExpr` instances into a single procedure that first attempts
`EvalTerm`, and, if that fails, applies the standard term elaborator and
then attempts `EvalExpr`. This way term elaboration can be skipped in
all but uncommon cases.
- Configuration item interpretation is through `ConfigEval.foldConfigM`,
which is a procedure with a lax specification for what counts as a
configuration item, calling the provided function on each recognized
configuration item. The idea is:
  - Null nodes are lists of configurations
- One-argument nodes are considered to be wrappers like `optConfig` or
`configItem`
- Two-argument nodes of the form `("+"<|>"-") (atom<|>ident)` are
considered to be boolean options
- Five-argument nodes of the form `"(" (atom<|>ident) ":=" syntax ")"`
are considered to be general configuration items. (It only checks for
the presence of `(` and that there are two-to-five arguments.)
  - Bare atoms are considered to be positive boolean options
- Configuration evaluation then uses `EvalConfigItem.set` on each item
produced by the fold, for an `EvalConfigItem` structure defined for the
given configuration type. The `def_eval_config_item` command can be used
to generate this structure. It analyzes which `EvalTerm` and `EvalExpr`
instances exist and derives missing ones, then builds an efficient
procedure to process configuration items and apply evaluators.
- Lastly, there are the `declare_core_config_elab`,
`declare_term_config_elab`, `declare_config_elab`, and
`declare_command_config_elab` macros for conveniently running the
`def_eval_config_item` command and constructing a self-contained
elaboration function.

The derivation procedures analyze which `EvalTerm`/`EvalExpr` instances
already exist and only derive the "leaf" instances that are necessary to
construct `EvalTerm` and `EvalExpr` instances. The derived instances are
made `private local` — since configuration elaborators are meant to be
self-contained, we decided not to let the additional instances be a side
effect of the macros. The instances can be globally added by manually
using the `ensure_*` commands.

The macros support making parameterized elaborators with arbitrary
additional binders. See `make_elab_grind_config` and
`make_elab_simp_config` in core Lean for examples of generating a single
elaborator that's used with multiple default value configurations.

To see how to create a key handler that matches all configuration keys
with a given prefix, see `make_elab_simp_config`.

There is a todo item at `Lean.Elab.ConfigEval.ReflectConfigItems` for
reflecting configurations back to syntax, which is not yet supported.

### Performance evaluation

A legacy configuration parser was temporarily added to
`Lean.Elab.Tactic.Grind.Config` using `declare_term_config_elab_legacy
elabGrindConfigLegacy Grind.Config`, and then this file was used for
measuring elaboration time:

```lean
import Lean

open Lean Elab Meta Tactic Parser

def cfgs : Array Syntax := Unhygienic.run do
  return #[
    ← `(Tactic.optConfig| ),
    ← `(Tactic.optConfig| +clean),
    ← `(Tactic.optConfig| +trace +markInstances -lookahead -useSorry),
    ← `(Tactic.optConfig| (trace := true) (markInstances := true) (lookahead := false) (useSorry := false)),
    ← `(Tactic.optConfig| -trace (splits := 20) +revert (maxSuggestions := some 3) (ematch := 2)),
    ← `(Tactic.optConfig| (gen := 5) -reducible +splitImp -funCC),
    ← `(Tactic.optConfig| (config := { trace := true, lookahead := false, maxSuggestions := some 3 })),
  ]

def testGrindElab (cfgs : Array Syntax) (n : Nat) : TacticM Unit := do
  profileitM Exception "test grind elab" (← getOptions) do
    let mut ematch := 0
    for _ in [0:n] do
      for cfg in cfgs do
        let c ← Tactic.elabGrindConfig cfg
        ematch := ematch + c.ematch
    logInfo m!"sum = {ematch}"

def testGrindElabLegacy (cfgs : Array Syntax) (n : Nat) : TacticM Unit := do
  profileitM Exception "test grind elab legacy" (← getOptions) do
    let mut ematch := 0
    for _ in [0:n] do
      for cfg in cfgs do
        let c ← Tactic.elabGrindConfigLegacy cfg
        ematch := ematch + c.ematch
    logInfo m!"sum = {ematch}"

def runTest (info : Bool) (test : TacticM Unit) : TermElabM Unit := do
  withEnableInfoTree info do
    let mvar ← mkFreshExprMVar none
    discard <| Tactic.run mvar.mvarId! test

set_option maxHeartbeats 0
set_option profiler true
set_option profiler.threshold 1
def iters : Nat := 1000
#eval runTest false <| testGrindElab cfgs iters
#eval runTest true <| testGrindElab cfgs iters
#eval runTest false <| testGrindElabLegacy cfgs iters
#eval runTest true <| testGrindElabLegacy cfgs iters
```

A representative output is
```
test grind elab took 315ms
test grind elab took 333ms
test grind elab legacy took 5.22s
test grind elab legacy took 5.33s
```
Computing `(315.0 + 333.0) / (5220 + 5330)` and rounding up to the
nearest tenth gives the 6.2% figure.

---

The #13426 draft PR includes some LSP modifications to support
completions for `simp` user configuration options.
2026-05-14 17:20:57 +00:00
..
build chore: don't fail on running build bench on built stage3 (#13467) 2026-04-18 22:07:21 +00:00
mergeSort feat: Array.mergeSort (#12385) 2026-03-06 13:18:13 +00:00
mvcgen feat: efficient tactic configuration elaborators and configurability (#13651) 2026-05-14 17:20:57 +00:00
qsort chore: relative lean-toolchains (#12652) 2026-02-25 10:23:35 +00:00
size chore: improve how test suite interacts with stages (#12913) 2026-03-16 15:20:03 +00:00
sym chore: minor tweaks to Sym.simp test and benchmark (#13468) 2026-04-18 21:11:30 +00:00
.gitignore chore: switch to new test/bench suite (#12590) 2026-02-25 13:51:53 +00:00
accumulate_profile.py chore: add lakeprof benchmarks (#9709) 2025-08-06 11:25:45 +00:00
arith_eval.ml
binarytrees.ghc-6.hs doc: fix typos 2021-03-07 15:06:02 +01:00
binarytrees.ocaml-2.ml
binarytrees.st.hs test: add binarytrees.st benchmark 2023-01-19 14:44:20 +01:00
binarytrees.st.mlton-2.sml test: add binarytrees.st benchmark 2023-01-19 14:44:20 +01:00
binarytrees.st.sml test: add binarytrees.st benchmark 2023-01-19 14:44:20 +01:00
binarytrees.st.swift test: add binarytrees.st benchmark 2023-01-19 14:44:20 +01:00
binarytrees.swift feat(tests/bench): add safe binarytrees.swift from https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-swift-1.html 2019-05-30 19:33:38 +02:00
binarytrees5.ml test: add binarytrees.st benchmark 2023-01-19 14:44:20 +01:00
binarytrees5_multicore.ml chore: more benchmarking setup 2023-01-17 13:28:05 +01:00
compile.sh feat: LLVM backend (#1837) 2022-12-30 12:45:30 +01:00
const_fold.hs chore(tests/bench): rename benchmarks 2019-05-30 16:25:41 +02:00
const_fold.ml chore(tests/bench): rename benchmarks 2019-05-30 16:25:41 +02:00
const_fold.sml chore(tests/bench): rename benchmarks 2019-05-30 16:25:41 +02:00
const_fold.swift chore(tests/bench): rename benchmarks 2019-05-30 16:25:41 +02:00
cross.yaml chore: fix more typos in comments 2023-10-08 14:37:34 -07:00
dag_hassorry_issue.lean chore: deprecate levelZero and levelOne (#12720) 2026-03-04 01:03:08 +00:00
dag_hassorry_issue.lean.args chore: reduce stack space usage at instantiate_mvars_fn (#4931) 2024-08-06 17:38:59 +00:00
dag_hassorry_issue.lean.out.expected chore: improve how test suite interacts with stages (#12913) 2026-03-16 15:20:03 +00:00
delayed_assign.lean test: delayed assignment performance issue (#12201) 2026-01-28 02:08:39 +00:00
deriv.hs
deriv.ml
deriv.sml
deriv.swift test(tests/bench): add deriv.swift 2019-05-30 11:34:58 -07:00
flake.lock chore: update cross-bench setup 2024-04-15 10:59:07 +02:00
flake.nix chore: robustify Nix shell (#8141) 2025-04-28 15:08:32 +00:00
full-stdlib.exec.yaml feat: separate benchmark for profiling the stdlib per-file 2020-10-29 11:53:03 +01:00
ghc-gc.py
lean-gc.py
Makefile chore: update cross-bench setup 2024-04-15 10:59:07 +02:00
mlkit-gc.py
ocaml-gc.py chore: more benchmarking setup 2023-01-17 13:28:05 +01:00
perf.py chore: update benchmark suite 2022-05-25 18:26:36 +02:00
qsort.hs chore: update benchmark suite 2022-05-25 18:26:36 +02:00
qsort.ml test: more fair qsort.ml benchmark 2022-10-12 20:22:55 +02:00
qsort.sml
qsort.swift test: more fair qsort.ml benchmark 2022-10-12 20:22:55 +02:00
rbmap.hs chore: make rbmap.hs more similar to other implementations 2022-09-24 14:16:48 +02:00
rbmap.ml
rbmap.sml
rbmap.swift tests(tests/bench): add rbmap.swift 2019-05-30 14:47:06 -07:00
rbmap2.lean chore: remove command universes 2021-06-29 17:01:07 -07:00
rbmap3.lean chore: remove command universes 2021-06-29 17:01:07 -07:00
rbmap500k.lean chore: remove command universes 2021-06-29 17:01:07 -07:00
rbmap_checkpoint.hs chore: make rbmap.hs more similar to other implementations 2022-09-24 14:16:48 +02:00
rbmap_checkpoint.ml test(tests/bench/rbmap_checkpoint): OCaml version using myLen 2019-05-30 07:40:53 -07:00
rbmap_checkpoint.sml chore(tests/bench/rbmap_checkpoint): use myLean 2019-05-30 07:30:07 -07:00
rbmap_checkpoint.swift test(tests/bench/rbmap_checkpoint): add swift version 2019-05-30 14:35:58 -07:00
rbmap_checkpoint2.lean chore: remove command universes 2021-06-29 17:01:07 -07:00
rbmap_checkpoint2.sml
rbmap_checkpoint_cpp_lean3.cpp test(tests/bench): add C++ versions of rbmap benchmarks 2019-06-22 06:58:27 -07:00
rbmap_checkpoint_cpp_std.cpp test(tests/bench): add C++ versions of rbmap benchmarks 2019-06-22 06:58:27 -07:00
rbmap_cpp_lean3.cpp test(tests/bench): add C++ versions of rbmap benchmarks 2019-06-22 06:58:27 -07:00
rbmap_cpp_std.cpp test(tests/bench): add C++ versions of rbmap benchmarks 2019-06-22 06:58:27 -07:00
README.md chore: update cross-bench setup 2024-04-15 10:59:07 +02:00
report.py chore: safer bench script 2023-07-19 08:31:39 +02:00
run.sh
speedcenter.yaml chore: try refining some benchmark settings (#8377) 2025-05-16 11:24:11 +00:00
states35.lean chore: move states35 to bench directory 2022-04-09 15:46:28 -07:00
test_single.sh feat: LLVM backend (#1837) 2022-12-30 12:45:30 +01:00
unionfind_clean.lean chore(frontends/lean): use => instead of := in match-expressions 2019-07-04 11:38:38 -07:00

Lean Benchmark Suites

This folder contains multiple small Lean programs for benchmarking used by two separate benchmark suites based on the temci benchmarking tool:

  • The light-weight "Speedcenter" suite benchmarks the current build of Lean. It can be used for quick comparisons on the cmdline and powers the Lean Speedcenter website.
  • The heavy-weight "Cross" suite benchmarks multiple Lean configurations and other functional compilers against each other and generates CSV and HTML reports from that. It was created for the paper "Counting Immutable Beans - Reference Counting Optimized for Purely Functional Programming" (IFL19).

Speedcenter Suite

Requirements:

  • A local Lean build in ../../build/release. Build at least the bin target.
  • temci. Using Nix, open a nix-shell in the project root directory to add a compatible version to your PATH. Alternatively, try pip3 install git+https://github.com/parttimenerd/temci.git.

To execute the suite and save the results in base.yaml, run (in this folder)

temci exec --config speedcenter.yaml --out base.yaml

Other interesting exec flags:

  • use --runs N to modify the default number of 10 runs per benchmark
  • use --included_blocks fast to excluded slow benchmarks like the stdlib benchmark. You can replace fast with any benchmark name or label in speedcenter.exec.yaml.

If you have multiple saved result files, you can compare them with

temci report --config speedcenter.yaml report1.yaml report2.yaml ...

Cross Suite

We recommend using Nix for building/obtaining all Lean variants and used compilers in a reproducible way. After installing Nix, running the benchmarks is as easy as

nix develop
make

This will record 50 runs for each benchmark configuration (this can be changed with runs in cross.yaml), generate results in report_lean.csv and report_cross.csv, and print them to stdout in a tabulated format. It will also generate HTML reports in report/ comparing the time-based benchmarks.

In order to reduce noise in the benchmarking data, you may instead want to try calling make inside a temci shell:

temci short shell --sudo --preset usable --cpuset_active make

Using root powers, this will temporarily configure your machine similarly to the LLVM benchmarking recommendations and move all your other processes to a single CPU core.