This PR adds two more benchmarks for the Sym-based mvcgen prototype in
the style of `add_sub_cancel`.
The first is `deep_add_sub_cancel`, which is like `add_sub_cancel` but
with a much deeper monad stack:
```lean
abbrev M := ExceptT String <| ReaderT String <| ExceptT Nat <| StateT Nat <| ExceptT Unit <| StateM Unit
```
By specializing the specs for `get` and `set`, we get competitive
performance:
```
goal_100: 180.365086 ms, kernel: 79.634989 ms
goal_200: 313.465611 ms, kernel: 187.808631 ms
goal_300: 478.278585 ms, kernel: 270.210634 ms
goal_400: 638.884320 ms, kernel: 380.381127 ms
goal_500: 759.802772 ms, kernel: 472.662882 ms
goal_600: 933.575180 ms, kernel: 649.040746 ms
goal_700: 1174.367200 ms, kernel: 759.470010 ms
goal_800: 1298.866482 ms, kernel: 864.420171 ms
goal_900: 1475.315552 ms, kernel: 1008.662783 ms
goal_1000: 1627.957444 ms, kernel: 1078.627830 ms
```
Recall that `add_sub_cancel` had `goal_1000: 824.476962 ms, kernel: 477.069045 ms`, roughly half of the numbers above, but it does not need to repeatedly unwrap three layers of the monad.
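To make the "three layers" concrete, the sketch below lifts the `Nat` `get` by hand from `StateT Nat` out to `M`, one `MonadLift.monadLift` per layer (`ExceptT Nat`, then `ReaderT String`, then `ExceptT String`). It assumes `M` from the block above is in scope and that the benchmark's `get`/`set` act on the `StateT Nat` layer. `getNatByHand` is a hypothetical name, not part of the benchmark. A plain `getThe Nat : M Nat` resolves through the same chain of `MonadLift` instances, which is presumably the unwrapping that the specialized `get`/`set` specs help mvcgen avoid.

```lean
-- Sketch, assuming `M` from above is in scope; `getNatByHand` is a
-- hypothetical name. Each `MonadLift.monadLift` crosses exactly one
-- transformer layer on the way from `StateT Nat` out to `M`.
def getNatByHand : M Nat :=
  -- The `Nat` state lives in the `StateT Nat` layer.
  let g₀ : StateT Nat (ExceptT Unit (StateM Unit)) Nat := getThe Nat
  -- Layer 1: lift through `ExceptT Nat`.
  let g₁ : ExceptT Nat (StateT Nat (ExceptT Unit (StateM Unit))) Nat :=
    MonadLift.monadLift g₀
  -- Layer 2: lift through `ReaderT String`.
  let g₂ : ReaderT String (ExceptT Nat (StateT Nat (ExceptT Unit (StateM Unit)))) Nat :=
    MonadLift.monadLift g₁
  -- Layer 3: lift through `ExceptT String`, which yields `M Nat`.
  MonadLift.monadLift g₂
```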
The second benchmark is `get_throw_set`. Its core program is:
```lean
def step (lim : Nat) : ExceptT String (StateM Nat) Unit := do
  let s ← get
  if s > lim then
    throw "s is too large"
  set (s + 1)

def loop (n : Nat) : ExceptT String (StateM Nat) Unit := do
  match n with
  | 0 => pure ()
  | n+1 => loop n; step n

def Goal (n : Nat) : Prop := ⦃fun s => ⌜s = 0⌝⦄ loop n ⦃⇓_ s => ⌜s = n⌝⦄
```
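As a quick sanity check of what `Goal` states, the program can simply be run. The sketch assumes `step` and `loop` from the block above are in scope; `runLoop` is a hypothetical helper, not part of the benchmark. Starting from state `0`, `loop n` should finish without throwing and leave `n` in the state; starting from `1`, the very first `step 0` sees `1 > 0` and throws, which is why the precondition matters.

```lean
-- Sketch, assuming `step` and `loop` from above are in scope; `runLoop`
-- is a hypothetical helper. It runs `loop n` from initial state `s₀` and
-- returns `some s` (the final state) if no `step` threw, `none` otherwise.
def runLoop (n s₀ : Nat) : Option Nat :=
  match Id.run (StateT.run (ExceptT.run (loop n)) s₀) with
  | (.ok (), s)   => some s
  | (.error _, _) => none

#eval runLoop 5 0  -- some 5: precondition `s = 0` holds, final state is `n`
#eval runLoop 5 1  -- none: the first `step 0` sees `1 > 0` and throws
```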
It generates `n+1` VCs: `n` VCs of the form
```
s✝ : Nat
_ : ¬0 < s✝
...
_ : n < s✝ + 1 ...<n times>... + 1
⊢ ⌜s✝ = 0⌝ ⊢ₛ ⌜False⌝ (s✝ + ...<n times>...)
```
and one VC of the form
```
⌜s✝ = 0⌝ ⊢ₛ ⌜s✝ + 1 ...<n times>... + 1 = n⌝
```
All of these VCs can be discharged by `grind`, but for now the benchmark closes them with `sorry`.
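The arithmetic behind these VCs is simple. For illustration, here are the three VCs for `n = 2`, restated as plain `Prop` goals: the `SPred` entailment `P ⊢ₛ Q` is flattened into an implication and the state argument is dropped. This flattening is an assumption made for exposition (it is not literal mvcgen output), and `omega` is used here only because it reliably closes linear `Nat` goals; the real VCs are the ones `grind` would discharge.

```lean
-- The three VCs for `n = 2`, flattened to plain `Prop` (an assumption for
-- exposition, not literal mvcgen output). The `_` hypotheses mirror the
-- anonymous ones in the VC display above.

-- `step 0` throws: its guard `0 < s` held.
example (s : Nat) (_ : 0 < s) : s = 0 → False := by
  intro _; omega

-- `step 0` passed, then `step 1` throws: its guard `1 < s + 1` held.
example (s : Nat) (_ : ¬0 < s) (_ : 1 < s + 1) : s = 0 → False := by
  intro _; omega

-- Both steps succeeded: the postcondition `s = n` must hold for the final state.
example (s : Nat) : s = 0 → s + 1 + 1 = 2 := by
  intro _; omega
```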
Statistics:
```
goal_100: 209.435869 ms, kernel: 128.768919 ms
goal_200: 386.639441 ms, kernel: 482.244717 ms
goal_300: 559.795137 ms, kernel: 1251.777405 ms
goal_400: 753.243978 ms, kernel: 3020.878177 ms
goal_500: 1014.939522 ms, kernel: 5182.120327 ms
goal_600: 1229.173622 ms, kernel: 9296.551442 ms
goal_700: 1410.024180 ms, kernel: 16655.954682 ms
goal_800: 1684.059305 ms, kernel: 32065.951705 ms
goal_900: 1905.602401 ms, kernel: 55299.942894 ms
goal_1000: 2172.823244 ms, kernel: 84082.492485 ms
```
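Kernel time needs a closer look: from `goal_100` to `goal_1000` it grows by a factor of about 650, while tactic time grows only about 10x, which looks fine.
Using `grind` to discharge the VCs of just the `n = 100` goal took 8 s.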
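To quantify how the two columns of the `get_throw_set` table scale, one can compute a two-point log-log slope from the `goal_100` and `goal_1000` rows: a slope near 1 means linear growth in `n`, near 3 means cubic. `loglogSlope` is a hypothetical helper; the inputs are copied from the table above.

```lean
-- Two-point log-log slope between n = 100 and n = 1000 (a 10x step):
-- a value near 1 indicates linear growth in n, near 3 cubic growth.
-- `loglogSlope` is a hypothetical helper; the inputs are the `goal_100`
-- and `goal_1000` rows of the `get_throw_set` table.
def loglogSlope (tAt100 tAt1000 : Float) : Float :=
  Float.log (tAt1000 / tAt100) / Float.log 10

#eval loglogSlope 209.435869 2172.823244   -- tactic time
#eval loglogSlope 128.768919 84082.492485  -- kernel time
```

For these inputs the slope comes out at roughly 1.0 for tactic time and roughly 2.8 for kernel time. Two points give only a coarse estimate, but they suggest kernel checking scales close to cubically in `n` over this range.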
# Lean Benchmark Suites
This folder contains multiple small Lean programs used for benchmarking by two separate benchmark suites, both based on the temci benchmarking tool:
- The lightweight "Speedcenter" suite benchmarks the current build of Lean. It can be used for quick comparisons on the command line and powers the Lean Speedcenter website.
- The heavyweight "Cross" suite benchmarks multiple Lean configurations and other functional compilers against each other and generates CSV and HTML reports from the results. It was created for the paper "Counting Immutable Beans - Reference Counting Optimized for Purely Functional Programming" (IFL19).
## Speedcenter Suite
Requirements:
- A local Lean build in `../../build/release`. Build at least the `bin` target.
- temci. Using Nix, open a `nix-shell` in the project root directory to add a compatible version to your `PATH`. Alternatively, try `pip3 install git+https://github.com/parttimenerd/temci.git`.
To execute the suite and save the results in `base.yaml`, run (in this folder):
```
temci exec --config speedcenter.yaml --out base.yaml
```
Other useful `exec` flags:
- use `--runs N` to change the default number of 10 runs per benchmark
- use `--included_blocks fast` to exclude slow benchmarks like the stdlib benchmark. You can replace `fast` with any benchmark name or label in `speedcenter.exec.yaml`.
If you have multiple saved result files, you can compare them with:
```
temci report --config speedcenter.yaml report1.yaml report2.yaml ...
```
## Cross Suite
We recommend using Nix to build or obtain all Lean variants and compilers used, in a reproducible way. After installing Nix, running the benchmarks is as easy as:
```
nix develop
make
```
This will record 50 runs for each benchmark configuration (this can be changed with `runs` in `cross.yaml`), generate results in `report_lean.csv` and `report_cross.csv`, and print them to stdout in a tabulated format.
It will also generate HTML reports in `report/` comparing the time-based benchmarks.
To reduce noise in the benchmarking data, you may instead want to call `make` inside a temci shell:
```
temci short shell --sudo --preset usable --cpuset_active make
```
Using root privileges, this will temporarily configure your machine similarly to the LLVM benchmarking recommendations and move all your other processes to a single CPU core.