lean4-htt/tests
Joachim Breitner 2e06fb5008
perf: fuse fvar substitution into instantiateMVars (#12233)
This PR replaces the default `instantiateMVars` implementation with a
two-pass variant that fuses fvar substitution into the traversal,
avoiding separate `replace_fvars` calls for delayed-assigned MVars and
preserving sharing. The old single-pass implementation is removed
entirely.

The previous implementation had quadratic complexity when instantiating
expressions with long chains of nested delayed-assigned MVars. Such
chains arise naturally from repeated `intro`/`apply` tactic sequences,
where each step creates a new delayed assignment wrapping the previous
one. The new two-pass approach resolves the entire chain in a single
traversal with a fused fvar substitution, reducing this to linear
complexity.

### Terminology (used in this PR and in the source)

* **Direct MVar**: an MVar that is not delayed-assigned.
* **Pending MVar**: the direct MVar stored in a
`DelayedMetavarAssignment`.
* **Assigned MVar**: a direct MVar with an assignment, or a
delayed-assigned MVar with an assigned pending MVar.
* **MVar DAG**: the directed acyclic graph of MVars reachable from the
expression.
* **Resolvable MVar**: an MVar where all MVars reachable from it
(including itself) are assigned.
* **Updateable MVar**: an assigned direct MVar, or a delayed-assigned
MVar that is resolvable but not reachable from any other resolvable
delayed-assigned MVar.

In the MVar DAG, the updateable delayed-assigned MVars form a cut (the
**updateable-MVar cut**) with only assigned MVars behind it and no
resolvable delayed-assigned MVars before it.

### Two-pass architecture

**Pass 1** (`instantiate_direct_fn`): Traverses all MVars and
expressions reachable from the initial expression and instantiates all
updateable direct MVars (updating their assignment with the result),
instantiates all level MVars, and determines if there are any updateable
delayed-assigned MVars.

**Pass 2** (`instantiate_delayed_fn`): Only run if pass 1 found
updateable delayed-assigned MVars. Has an **outer** and an **inner**
mode, depending on whether it has crossed the updateable-MVar cut.

In outer mode (empty fvar substitution), all MVars are either unassigned
direct MVars (left alone), non-updateable delayed-assigned MVars
(pending MVar traversed in outer mode and updated with the result), or
updateable delayed-assigned MVars. When a delayed-assigned MVar is
encountered, its MVar DAG is explored (via `is_resolvable_pending`) to
determine if it is resolvable (and thus updateable). Results are cached
across invocations.

If it is updateable, the substitution is initialized from its arguments
and traversal continues with the value of its pending MVar in inner
mode. In inner mode (non-empty substitution), all encountered
delayed-assigned MVars are, by construction, resolvable but not
updateable. The substitution is carried along and extended as we cross
such MVars. Pending MVars of these delayed-assigned MVars are NOT
updated with the result (as the result is valid only for this
substitution, not in general).

Applying the substitution in one go, rather than instantiating each
delayed-assigned MVar on its own from inside out, avoids the quadratic
overhead of that approach when there are long chains of delayed-assigned
MVars.

**Write-back behavior**: Pass 2 writes back the normalized pending MVar
values of delayed-assigned MVars above the updateable-MVar cut (the
non-resolvable ones whose children may have been resolved). This is
exactly the right set: these MVars are visited in outer mode, so their
normalized values are suitable for storing in the mctx. MVars below the
cut are visited in inner mode, so their intermediate values cannot be
written back.

### Pass 2 scope-tracked caching

A `scope_cache` data structure ensures that sharing is preserved even
across different delayed-assigned MVars (and hence with different
substitutions), when possible. Each `visit_delayed` call pushes a new
scope with fresh fvar bindings. The cache correctly handles cross-scope
reuse, fvar shadowing, and late-binding via generation counters and
scope-level tracking.

The `scope_cache` has been formally verified:
`tests/elab/scopeCacheProofs.lean` contains a complete Lean proof that
the lazy generation-based implementation refines the eager
specification, covering all operations (push, pop, lookup, insert)
including the rewind lazy cleanup with scope re-entry and degradation.
The key correctness invariant is inter-entry gen list consistency
(GensConsistent), which, unlike per-entry alignment with `currentGens`,
survives pop+push cycles.

### Behavioral differences from original `instantiateMVars`

The implementation matches the original single-pass `instantiateMVars`
behavior with one cosmetic difference: the new implementation
substitutes fvars inline during traversal rather than constructing
intermediate beta-redexes, producing more beta-reduced terms in some
edge cases. This changes the pretty-printed output for two elab tests
(`1179b`, `depElim1`) but all terms remain definitionally equal.

### Tests

Correctness and performance tests for the new implementation were added
in #12808.

### Files

- `src/library/instantiate_mvars.cpp` — C++ implementation of both
passes (replaces `src/kernel/instantiate_mvars.cpp`)
- `src/library/scope_cache.h` — scope-aware cache data structure
- `src/Lean/MetavarContext.lean` — exported accessors for
`DelayedMetavarAssignment` fields
- `tests/elab/scopeCacheProofs.lean` — formal verification of
`scope_cache` correctness
- `tests/elab/1179b.lean.out.expected`,
`tests/elab/depElim1.lean.out.expected` — updated expected output

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 17:05:21 +00:00
..
bench chore: fix two semantic merge errors in SymM mvcgen (#12845) 2026-03-09 11:00:01 +00:00
compile chore: fix ci for new test suite (#12704) 2026-02-27 23:25:37 +00:00
compile_bench chore: fix ci for new test suite (#12704) 2026-02-27 23:25:37 +00:00
docparse chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
elab perf: fuse fvar substitution into instantiateMVars (#12233) 2026-03-09 17:05:21 +00:00
elab_bench test: add instantiateMVars tests and benchmark for delayed assignments (#12808) 2026-03-06 10:59:13 +00:00
elab_fail chore: turn on new do elaborator in Core (#12656) 2026-03-09 12:38:33 +00:00
elabissues
ir
lake fix: lake: emit .nobuild trace only if .trace exists (#12835) 2026-03-07 01:25:28 +00:00
lake_bench/inundation chore: switch to new test/bench suite (#12590) 2026-02-25 13:51:53 +00:00
lean chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
misc chore: fix ci after new linter was added (#12733) 2026-02-28 03:05:07 +00:00
misc_bench chore: switch to new test/bench suite (#12590) 2026-02-25 13:51:53 +00:00
misc_dir chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
pkg feat: move instance-class check to declaration site (#12325) 2026-03-06 03:23:27 +00:00
playground
server chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
server_interactive chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
simpperf
.gitignore chore: switch to new test/bench suite (#12590) 2026-02-25 13:51:53 +00:00
CMakeLists.txt chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
combine.py chore: switch to new test/bench suite (#12590) 2026-02-25 13:51:53 +00:00
common.sh
env.sh.in chore: switch to new test/bench suite (#12590) 2026-02-25 13:51:53 +00:00
fix_expected.py chore: add --force option to fix_expected.py (#12847) 2026-03-09 12:21:04 +00:00
flake.lock chore: fix ci for new test suite (#12704) 2026-02-27 23:25:37 +00:00
flake.nix chore: fix ci for new test suite (#12704) 2026-02-27 23:25:37 +00:00
lakefile.toml
lean-toolchain chore: relative lean-toolchains (#12652) 2026-02-25 10:23:35 +00:00
lint.py chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
measure.py chore: switch to new test/bench suite (#12590) 2026-02-25 13:51:53 +00:00
README.md chore: migrate more tests to new test suite (#12809) 2026-03-06 16:52:01 +00:00
util.sh chore: fix test suite on macOS (#12780) 2026-03-03 20:59:08 +00:00

Test suite

This directory contains the lean test and benchmark suite. It is currently in the process of being migrated to the framework described in this file. Some tests still use the previous framework, which is partially documented in testing.md.

The test suite consists of two types of directories: Test directories and test piles.

A test directory is a directory containing a run_test and/or a run_bench script. It represents a single test or benchmark, depending on which script is present. The run scripts are executed once with their working directory set to the test directory.

A test pile is also a directory containing a run_test and/or a run_bench script. Here however, each file of a directory-specific extension (usually .lean) represents a single test or benchmark. The run scripts are executed once for each test file with their working directory set to the pile directory. Often, additional supplementary files are placed next to the test files and interpreted by the run scripts.

Directory structure

Benchmarks belonging to the old framework are not included in this description.

  • bench: A bunch of benchmarks and benchmarking related files, most of which are not part of the test suite.
    • build: A benchmark that builds the lean stdlib and measures the per-file performance.
    • size: A benchmark that measures the sizes of a few different kinds of files.
  • compile: Tests that compile lean files and then execute the resulting binary, verifying the resulting output. They also run the same lean file through the interpreter.
  • compile_bench: Benchmarks that compile lean files and measure the execution of the resulting binary, as well as optionally run the same lean file through the interpreter. These are also executed as part of the test suite, and .out.expected files are ignored when benchmarking.
  • docparse: Docstring parsing tests.
  • elab: Tests that elaborate lean files without executing them, verifying the resulting output.
  • elab_fail: Like elab, but expecting an exit code of 1 instead of 0.
  • elab_bench: Like elab, but measuring the elaboration performance. These are also executed as part of the test suite, and .out.expected files are ignored when benchmarking.
  • server, server_interactive: Test LSP server requests.
  • lake_bench: Benchmark directories that measure lake performance.
  • misc: A collection of miscellaneous small test scripts.
  • misc_bench: A collection of miscellaneous small benchmark scripts.

How to run the test suite?

Run all tests using

CTEST_PARALLEL_LEVEL="$(nproc)" CTEST_OUTPUT_ON_FAILURE=1 \
make -C build/release -j "$(nproc)" test

Or rerun only the failed tests using

CTEST_PARALLEL_LEVEL="$(nproc)" CTEST_OUTPUT_ON_FAILURE=1 \
make -C build/release -j "$(nproc)" test ARGS="--rerun-failed"

Run an individual test by cd-ing into its directory and then using

./run_test # in a test directory
./run_test testfile # in a test pile

How to run the bench suite?

Run the full benchmark suite using

make -C build/release -j "$(nproc)" bench # produces tests/measurements.jsonl

It is split into two roughly equal parts so it can be split among the benchmark runner machines. Run each individual part using

make -C build/release -j "$(nproc)" bench-part1 # produces tests/part1.measurements.jsonl
make -C build/release -j "$(nproc)" bench-part2 # produces tests/part2.measurements.jsonl

Make sure not to specify -j "$(nproc)" when running the bench suite manually inside build/release/stage<n>.

Run an individual benchmark by cd-ing into its directory and then using

./run_bench # in a test directory
./run_bench testfile # in a test pile

How to write a test?

If your test fits one of the existing test piles:

  1. Add your test file to the test pile.
  2. Document the test via doc comment inside the test file.
  3. Execute the test as documented above (or run the entire test suite).
  4. Run fix_expected.py to create an .out.expected or .out.ignored file for the test.
  5. Run lint.py.

If your test should be part of one of the existing test directories:

  1. Modify the test directory to include your test.
  2. Document the test via comment or README.md, following the test directory's conventions.

Otherwise, create a new test directory or pile:

  1. Decide on a place to put the new directory.
  2. Write a run_test and/or run_bench script.
  3. Add the directory to the CMakeLists.txt file, next to the other tests near the bottom.
  4. Document the new directory in this readme file by updating the directory structure section above.
  5. Optionally update lint.py if it makes sense.

How to write a benchmark?

When writing a benchmark, consider that most benchmarks are also executed as tests. You can check that this is the case if a run_test script exists next to the run_bench script in the directory. When run as benchmark, the problem instance should be large enough to result in reliable measurements. When run as test, the problem instance should be as small as possible, but large enough to still test the different code paths.

The main mechanism to scale problem instances is the TEST_BENCH environment variable. It is unset in tests and set to 1 in benchmarks. Inside your benchmark, check whether the variable exists and adjust the problem size accordingly.

Inside the compile_bench directory, there is a second mechanism: Using a .init.sh file to pass command line arguments to your test. This is useful if you also want to generate graphs for your parametric benchmarks. See tests/compile_bench/binarytrees.lean as an example.

If you want custom metrics aside from the usual instructions, wall-clock, ... inside the elab_bench or compile_bench directories, you can print them to stdout in the format measurement: <name> <value>[ <unit>], e.g. measurement: size 1337 B or measurement: iterations 42. See tests/compile_bench/ilean_roundtrip.lean as an example.

How to fix existing tests after your change breaks them?

If the tests break because the expected output differs from the actual output, don't blindly copy the produced output into the expected output file. Instead, execute fix_expected.py (you need to have meld installed). This script allows you to review the changes one-by-one.

If the test output is very brittle, either modify the test so the output becomes less brittle, or ignore the output by removing .out.expected, re-running fix_expected.py and choosing to ignore the output. Brittle output that should usually be ignored are detailed compiler debug traces or inherently nondeterministic things like multithreading.

Some test directories or test piles strip or modify certain flaky or nondeterministic outputs (e.g. benchmark timings, reference manual URLs).

How to write a test or bench run script?

Test and bench scripts must be named run_test and run_bench respectively. They must be executable and start with the shebang #!/usr/bin/env bash. Immediately afterwards, they must source env_test.sh or env_bench.sh respectively using a relative path.

The env_*.sh files set some build related environment variables, plus a set of test suite related environment variables document at the top of CMakeLists.txt. The most notable ones are:

  • TEST_DIR: Absolute path to the tests directory.
  • SCRIPT_DIR: Absolute path to the script directory.
  • TEST_BENCH: Set to 1 if we're currently executing a benchmark, unset otherwise.

Finally, the run script should source "$TEST_DIR/util.sh", which provides a few utility functions and also uses set to set sensible bash defaults. See util.sh for the available utility functions.

The run scripts are always executed with their working directory set to their surrounding directory. Inside a test pile, run_test and run_bench receive a relative path to the file under test as their first (and only) argument. Inside a test directory, they receive no arguments.

A test succeeds iff the run_test script exits with exit code 0. A benchmark additionally must produce a measurements file: Inside a test pile, run_bench testfile is expected to produce a testfile.measurments.jsonl file. Inside a test directory, run_bench is expected to produce a measurements.jsonl file.

The elab* test pile

These files are available to configure a test:

  • <file>.init.sh: This file is sourced at the start of the run script. Configure the run script by setting bash variables here.

  • <file>.before.sh: This file is executed before the test/benchmark. Create or set up temporary resources used by the test here. Usually, it is better to create temporary files or directories inside the test itself, so they're also available when opening the file in your editor.

  • <file>.after.sh: This file is executed after the test/benchmark. Delete temporary resources used by the test here.

  • <file>.out.expected: The test fails if its stdout and stderr doesn't match this file's contents. If this file isn't present, the test's output must be empty.

  • <file>.out.ignored: Ignore the test's output entirely; don't compare it to <file>.out.expected.

  • <file>.exit.expected: The test fails if its exit code doesn't match this file's contents. If this file isn't present, the pile's default exit code is used instead. If this file contains the text nonzero, the test's exit code must not be 0.

These bash variables (set via <file>.init.sh) are used by the run script:

  • TEST_LEAN_ARGS: A bash array of additional arguments to the lean command.

The compile* test pile

These files are available to configure a test:

  • <file>.(do|no)_(compile|interpret), <file>.(do|no)_(compile|interpret)_(test|bench): Enable or disable the compiler or interpreter during testing or benchmarking. The more specific files take precedence over the more generic files. Instead of disabling the compiler during tests, consider reducing the problem size by passing different command line parameters via <file>.init.sh.

  • <file>.init.sh: This file is sourced at the start of the run script. Configure the run script by setting bash variables here.

  • <file>.before.sh: This file is executed before the test/benchmark. Create or set up temporary resources used by the test here. Usually, it is better to create temporary files or directories inside the test itself, so they're also available when opening the file in your editor.

  • <file>.after.sh: This file is executed after the test/benchmark. Delete temporary resources used by the test here.

  • <file>.out.expected: The test fails if its stdout and stderr doesn't match this file's contents. If this file isn't present, the test's output must be empty.

  • <file>.out.ignored: Ignore the test's output entirely; don't compare it to <file>.out.expected.

  • <file>.exit.expected: The test fails if its exit code doesn't match this file's contents. If this file isn't present, the test's exit code must be 0. If this file contains the text nonzero, the test's exit code must not be 0.

These bash variables (set via <file>.init.sh) are used by the run script:

  • TEST_LEAN_ARGS: A bash array of additional arguments to the lean command used to compile the lean file.

  • TEST_LEANC_ARGS: A bash array of additional arguments to the leanc command used to compile the c file.

  • TEST_LEANI_ARGS: A bash array of additional arguments to the lean --run <file> command used to interpret the lean file.

  • TEST_ARGS: A bash array of arguments to the compiled (or interpreted) program. Check TEST_BENCH if you want to specify more intense parameters for benchmarks.

The interactive test pile

These tests are designed to test LSP server requests at a given position in the input file. Each .lean file contains comments that indicate how to simulate a client request at a position, using a --^ point to the line position.

Example:

open Foo in
theorem tst2 (h : a ≤ b) : a + 2 ≤ b + 2 :=
Bla.
  --^ completion

In this example, the test driver will simulate an auto-completion request at Bla.. The expected output is stored in the corresponding .out.expected file in the json format that is part of the Language Server Protocol.

This can also be used to test the following additional requests:

--^ textDocument/hover
--^ textDocument/typeDefinition
--^ textDocument/definition
--^ $/lean/plainGoal
--^ $/lean/plainTermGoal
--^ insert: ...
--^ collectDiagnostics