lean4-htt/src/library
Henrik Böving 50d661610d
perf: LLVM backend, put all allocas in the first BB to enable mem2reg (#3244)
Again co-developed with @bollu.

Based on top of: #3225 

While hunting down the performance discrepancy on qsort.lean between C
and LLVM, we noticed a single, trivially optimizable alloca
(LLVM's stack memory allocation instruction) that had loads and stores in the
hot code path. We then found:
https://groups.google.com/g/llvm-dev/c/e90HiFcFF7Y.

TLDR: `mem2reg`, the pass responsible for getting rid of allocas where
possible, only triggers on an alloca if it is in the first BB (basic
block). The current implementation places each alloca right at the
location where it is needed, so mem2reg ignores them.

Thus we decided to add functionality that allows us to push all allocas
up into the first BB.
We initially wanted to write `buildPrologueAlloca` in a `withReader`
style, that is:
1. get the current position of the builder
2. jump to first BB and do the thing
3. revert position to the original

However, the LLVM C API does not expose an option to obtain the current
position of an IR builder. We therefore settled on the current
implementation, which resets the builder position to the end of the BB
that the function was called from. This is valid because we never
operate anywhere but the end of the current BB in the LLVM emitter.
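
The approach described above can be sketched with a small toy model. This is not the LLVM C API and not the code in the Lean emitter: `BasicBlock`, `Function`, `Builder`, `buildPrologueAllocaSketch` and `demo` are hypothetical names invented for illustration, and instructions are modelled as plain strings. The sketch only assumes what the text states: the builder's exact position cannot be read back, but the block it was called from is known, and emission always happens at the end of the current block.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for LLVM concepts; none of these are real LLVM types.
struct BasicBlock {
    std::string name;
    std::vector<std::string> instrs;  // instructions modelled as strings
};

struct Function {
    std::vector<BasicBlock> blocks;   // blocks[0] plays the role of the first BB
};

// A builder that tracks the current block and an insertion index in it.
struct Builder {
    Function* fn;
    std::size_t block;  // index of the current block
    std::size_t pos;    // insertion index inside that block

    void positionAtEnd(std::size_t b) {
        block = b;
        pos = fn->blocks[b].instrs.size();
    }
    void positionAtStart(std::size_t b) {
        block = b;
        pos = 0;
    }
    void emit(const std::string& instr) {
        std::vector<std::string>& v = fn->blocks[block].instrs;
        v.insert(v.begin() + static_cast<std::ptrdiff_t>(pos), instr);
        ++pos;
    }
};

// Hypothetical analogue of the idea behind `buildPrologueAlloca`:
// emit the alloca in the first block, then reset the builder to the END
// of the block we came from (only valid if callers always emit at the end).
void buildPrologueAllocaSketch(Builder& b, const std::string& name) {
    const std::size_t origBlock = b.block;  // the block is known, the exact position is not needed
    b.positionAtStart(0);
    b.emit("%" + name + " = alloca i64");
    b.positionAtEnd(origBlock);
}

// Builds a two-block function and requests an alloca while emitting into
// the second block.
Function demo() {
    Function f;
    f.blocks.push_back(BasicBlock{"entry", {}});
    f.blocks.push_back(BasicBlock{"loop", {}});

    Builder b{&f, 0, 0};
    b.positionAtEnd(0);
    b.emit("br label %loop");

    b.positionAtEnd(1);
    b.emit("%i = add i64 %a, 1");
    buildPrologueAllocaSketch(b, "tmp");  // lands in "entry", not in "loop"
    b.emit("store i64 %i, ptr %tmp");     // continues at the end of "loop"
    return f;
}
```

In this model, calling `demo()` leaves the alloca as the first instruction of the `entry` block, while the `store` that follows the call is appended to `loop` after the `add`, as if the builder had never moved.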

The change improved the numbers on the qsort benchmark as expected, but
we are not fully there yet:
```
C:
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.005 s ±  0.013 s    [User: 1.996 s, System: 0.003 s]
  Range (min … max):    1.993 s …  2.036 s    10 runs

LLVM before aligning the types
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.151 s ±  0.007 s    [User: 2.146 s, System: 0.001 s]
  Range (min … max):    2.142 s …  2.161 s    10 runs

LLVM after aligning the types
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.073 s ±  0.011 s    [User: 2.067 s, System: 0.002 s]
  Range (min … max):    2.060 s …  2.097 s    10 runs

LLVM after this
Benchmark 1: ./qsort.lean.out 400
  Time (mean ± σ):      2.038 s ±  0.009 s    [User: 2.032 s, System: 0.001 s]
  Range (min … max):    2.027 s …  2.052 s    10 runs
```

Note: If you wish to merge this PR independently of its predecessor,
there is no technical dependency between the two. I'm merely stacking
them so we can see the performance impact of each more clearly.
2024-02-13 14:54:40 +00:00
compiler perf: LLVM backend, put all allocas in the first BB to enable mem2reg (#3244) 2024-02-13 14:54:40 +00:00
constructions fix : make mk_no_confusion_type handle delta-reduction when generating telescope (#2501) 2023-10-14 17:18:37 +11:00
annotation.cpp chore: reduce src/include/lean 2021-09-07 08:24:54 -07:00
annotation.h chore: fix more typos in comments 2023-10-08 14:37:34 -07:00
aux_recursors.cpp chore: use aux recursor extension implemented in Lean 2019-11-02 11:48:02 -07:00
aux_recursors.h chore: use aux recursor extension implemented in Lean 2019-11-02 11:48:02 -07:00
bin_app.cpp
bin_app.h
class.cpp chore: remove dead code at Class.lean used by old frontend 2020-11-20 16:51:44 -08:00
class.h chore: remove dead code at Class.lean used by old frontend 2020-11-20 16:51:44 -08:00
CMakeLists.txt chore: remove dead code 2022-06-15 12:43:59 +02:00
constants.cpp fix: elim_scalar_array_cases 2022-07-24 14:46:46 -07:00
constants.h fix: elim_scalar_array_cases 2022-07-24 14:46:46 -07:00
constants.txt fix: elim_scalar_array_cases 2022-07-24 14:46:46 -07:00
expr_lt.cpp fix: dllexport functions not already annotated in header 2021-09-20 18:41:46 +02:00
expr_lt.h chore: remove dead code 2020-10-27 19:23:14 -07:00
expr_pair.h chore: reduce src/include/lean 2021-09-07 08:24:54 -07:00
expr_pair_maps.h
expr_unsigned_map.h
formatter.cpp chore: remove io_state & abstract_type_context 2021-01-12 09:51:14 -08:00
formatter.h chore: remove io_state & abstract_type_context 2021-01-12 09:51:14 -08:00
init_module.cpp chore: move pp_options.cpp to Lean 2021-01-27 14:16:12 +01:00
init_module.h
max_sharing.cpp chore: reduce src/include/lean 2021-09-07 08:24:54 -07:00
max_sharing.h
module.cpp feat: embed and check githash in .olean (#2766) 2023-11-27 10:24:43 +00:00
module.h chore: reduce src/include/lean 2021-09-07 08:24:54 -07:00
num.cpp chore: improve old pretty printer on numeric literals 2020-01-09 13:48:15 -08:00
num.h chore: reduce src/include/lean 2021-09-07 08:24:54 -07:00
print.cpp fix: pp projection indices starting at 1 2023-10-15 14:25:00 -07:00
print.h chore: fix more typos in comments 2023-10-08 14:37:34 -07:00
profiling.cpp fix: profiler threshold in C++ 2023-04-10 16:57:54 +02:00
profiling.h
projection.cpp refactor: move is_structure_like to inductive.cpp 2021-11-25 11:31:00 -08:00
projection.h refactor: move is_structure_like to inductive.cpp 2021-11-25 11:31:00 -08:00
protected.cpp chore: remove legacy support for modification objects 2020-10-26 08:10:51 -07:00
protected.h
reducible.cpp chore(library/init/lean): export as C functions 2019-08-16 20:15:30 -07:00
reducible.h feat(library/reducible): use new Lean implementation 2019-06-24 15:48:12 -07:00
replace_visitor.cpp chore: reduce src/include/lean 2021-09-07 08:24:54 -07:00
replace_visitor.h chore: remove Expr.localE constructor 2020-11-01 09:37:48 -08:00
suffixes.h
time_task.cpp feat: show typeclass and tactic names in profile output 2023-03-27 17:47:52 +02:00
time_task.h refactor: pos at time_task::time_task was a dead field 2021-01-30 11:10:18 -08:00
trace.cpp feat: additional options for Format.pretty (#3264) 2024-02-07 23:25:21 +00:00
trace.h chore: remove remnants of C++ format 2022-11-18 06:11:24 -08:00
util.cpp feat: System.Platform.target (#3207) 2024-01-24 12:11:00 +00:00
util.h refactor: move is_constructor_app to inductive.cpp 2021-11-25 11:31:00 -08:00