This PR enables code generation to proceed in parallel to further
elaboration.
It does not aim to make further refinements such as generating code for
different declarations in parallel or removing the dependency on kernel
checking.
This PR modifies the signature of the functions `Nat.fold`,
`Nat.foldRev`, `Nat.any`, `Nat.all`, so that the function is passed the
upper bound. This allows us to change runtime array bounds checks to
compile time checks in many places.
This PR avoids runtime array bounds checks in places where it can
trivially be done at compile time.
None of these changes are of particular consequence: I mostly wanted to
learn how much we do this, and what the obstacles are to doing it less.
This PR changes the signature of `Array.get` to take a Nat and a proof,
rather than a `Fin`, for consistency with the rest of the (planned)
Array API. Note that because of bootstrapping issues we can't provide
`get_elem_tactic` as an autoparameter for the proof. As users will
mostly use the `xs[i]` notation provided by `GetElem`, this hopefully
isn't a problem.
We may restore `Fin` based versions, either here or downstream, as
needed, but they won't be the "main" functions.
---------
Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
Obviously a link to the web docs isn't ideal, but having hovers
available on the symbol is much better than nothing.
---------
Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
Co-authored-by: Sebastian Ullrich <sebasti@nullri.ch>
Currently, `ll_infer_type` is responsible for telling the user about
`noncomputable` when a definition depends on one without executable
code. However, this is imperfect because type inference does not check
every subexpression. This leads to errors later on that users find to be
hard to interpret.
Now, `Lean.IR.checkDecls` has a friendlier error message when it
encounters constants without compiled definitions, suggesting to
consider using `noncomputable`. While this function is an internal IR
consistency check, it is also reasonable to have it give an informative
error message in this particular case. The suggestion to use
`noncomputable` is limited to just unknown constants.
Some alternatives would be to either (1) create another checker just for
missing constants, (2) change `ll_infer_type` to always visit every
subexpression no matter if they are necessary for inferring the type, or
(3) investigate whether `tests/lean/run/1785.lean` is due to a deeper
issue.
Closes#1785
This restores all of the imports of `Lean.Data.HashMap` and
`Lean.Data.HashSet` so that users actually see the deprecation warnings
instead of a "declaration not found" error.
Continuation of #3958. To ensure that lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the lean
model), we have to make sure that no lean objects are created with
invalid UTF-8. #3958 covers the case of lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.
To address this and minimize accidental missed validation, the
`(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or utf8 char count on hand. I reviewed
every function which leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.
This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions, they are just
not causing unsound behavior anymore. A subsequent PR will handle adding
better error reporting for bad UTF-8.
Now, only `(<- ...)`s occurring in the condition of a pure if-then-else
are lifted.
That is, `if (<- foo) then ... else ...` is ok, but `if ... then (<-
foo) else ...` is not. See #3713closes#3713
This PR also adjusts this repo. Note that some of the `(<- ...)` were
harmless since they were just accessing some
read-only state.
This coercion caused difficult-to-diagnose bugs sometimes. Because there
are some situations where converting a string to a name should be done
by parsing the string, and others where it should not, an explicit
choice seems better here.
---------
Co-authored-by: Mac Malone <tydeu@hatpress.net>
Again co-developed with @bollu.
Based on top of: #3225
While hunting down the performance discrepancy on qsort.lean between C
and LLVM we noticed there was a single, trivially optimizeable, alloca
(LLVM's stack memory allocation instruction) that had load/stores in the
hot code path. We then found:
https://groups.google.com/g/llvm-dev/c/e90HiFcFF7Y.
TLDR: `mem2reg`, the pass responsible for getting rid of allocas if
possible, only triggers on an alloca if it is in the first BB. The
allocas of the current implementation get put right at the location
where they are needed -> they are ignored by mem2reg.
Thus we decided to add functionality that allows us to push all allocas
up into the first BB.
We initially wanted to write `buildPrologueAlloca` in a `withReader`
style so:
1. get the current position of the builder
2. jump to first BB and do the thing
3. revert position to the original
However the LLVM C API does not expose an option to obtain the current
position of an IR builder. Thus we ended up at the current
implementation which resets the builder position to the end of the BB
that the function was called from. This is valid because we never
operate anywhere but the end of the current BB in the LLVM emitter.
The numbers on the qsort benchmark got improved by the change as
expected, however we are not fully there yet:
```
C:
Benchmark 1: ./qsort.lean.out 400
Time (mean ± σ): 2.005 s ± 0.013 s [User: 1.996 s, System: 0.003 s]
Range (min … max): 1.993 s … 2.036 s 10 runs
LLVM before aligning the types
Benchmark 1: ./qsort.lean.out 400
Time (mean ± σ): 2.151 s ± 0.007 s [User: 2.146 s, System: 0.001 s]
Range (min … max): 2.142 s … 2.161 s 10 runs
LLVM after aligning the types
Benchmark 1: ./qsort.lean.out 400
Time (mean ± σ): 2.073 s ± 0.011 s [User: 2.067 s, System: 0.002 s]
Range (min … max): 2.060 s … 2.097 s 10 runs
LLVM after this
Benchmark 1: ./qsort.lean.out 400
Time (mean ± σ): 2.038 s ± 0.009 s [User: 2.032 s, System: 0.001 s]
Range (min … max): 2.027 s … 2.052 s 10 runs
```
Note: If you wish to merge this PR independently from its predecessor,
there is no technical dependency between the two, I'm merely stacking
them so we can see the performance impacts of each more clearly.
Debugged and authored in collaboration with @bollu.
This PR fixes several performance regressions of the LLVM backend
compared to the C backend
as described in #3192. We are now at the point where some benchmarks
from `tests/bench` achieve consistently equal and sometimes ever so
slightly better performance when using LLVM instead of C. However there
are still a few testcases where we are lacking behind ever so slightly.
The PR contains two changes:
1. Using the same types for `lean.h` runtime functions in the LLVM
backend as in `lean.h` it turns out that:
a) LLVM does not throw an error if we declare a function with a
different type than it actually has. This happened on multiple occasions
here, in particular when the function used `unsigned`, as it was
wrongfully assumed to be `size_t` sized.
b) Refuses to inline a function to the call site if such a type mismatch
occurs. This means that we did not inline important functionality such
as `lean_ctor_set` and were thus slowed down compared to the C backend
which did this correctly.
2. While developing this change we noticed that LLVM does treat the
following as invalid: Having a function declared with a certain type but
called with integers of a different type. However this will manifest in
completely nonsensical errors upon optimizing the bitcode file through
`leanc` such as:
```
error: Invalid record (Producer: 'LLVM15.0.7' Reader: 'LLVM 15.0.7')
```
Presumably because the generate .bc file is invalid in the first place.
Thus we added a call to `LLVMVerifyModule` before serializing the module
into a bitcode file. This ended producing the expected type errors from
LLVM an aborting the bitcode file generation as expected.
We manually checked each function in `lean.h` that is mentioned in
`EmitLLVM.lean` to make sure that all of their types align correctly
now.
Quick overview of the fast benchmarks as measured on my machine, 2 runs
of LLVM and 2 runs of C to get a feeling for how far the averages move:
- binarytrees: basically equal performance
- binarytrees.st: basically equal performance
- const_fold: equal if not slightly better for LLVM
- deriv: LLVM has 8% more instructions than C but same wall clock time
- liasolver: basically equal performance
- qsort: LLVM is slower by 7% instructions, 4% time. We have identified
why the generated code is slower (there is a store/load in a hot loop in
LLVM that is not in C) but not figured out why that happens/how to
address it.
- rbmap: LLVM has 3% less instructions and 13% less wall-clock time than
C (woop woop)
- rbmap_1 and rbmap_10 show similar behavior
- rbmap_fbip: LLVM has 2% more instructions but 2% better wall time
- rbmap_library: equal if not slightly better for LLVM
- unionfind: LLVM has 5% more instructions but 4% better wall time
Leaving out benchmarks related to the compiler itself as I was too lazy
to keep recompiling it from scratch until we are on a level with C.
Summing things up, it appears that LLVM has now caught up or surpassed
the C backend in the microbenchmarks for the most part. Next steps from
our side are:
- trying to win the qsort benchmark
- figuring out why/how LLVM runs more instructions for less wall-clock
time. My current guesses would be measurement noise and/or better use of
micro architecture?
- measuring the larger benchmarks as well
This obviates the need to play weak linkage games when
we build `lean.h.bc` from `lean.h`. We perform the following steps:
1. We remove the `static` modifier from all definitions in `lean.h`.
This makes all definitions have `extern` linkage. Thus, when we build
a `lean.h.bc` using `clang`, we will actually get definitions
(instead of an empty file)
2. We build `lean.h.bc` from `lean.h` using `clang`.
3. When it comes time to link, we link
`current_module.bc := LLVMLinkModules2(current_module.bc, lean.h.bc)`.
4. We loop over every symbol that arrived from `lean.h.bc`
in `current_module.bc` and we then set this symbol to have
`internal` linkage. This simulates the effect of
`#include <lean.h>` where every definition in `lean.h`
has internal linkage.
This yajna, one hopes, pleases the linker gods.
* feat: add --target flag for LLVM backend to build objects of a different architecture
* chore: remove dead comment
* Update src/Lean/Compiler/IR/EmitLLVM.lean
Co-authored-by: Sebastian Ullrich <sebasti@nullri.ch>
* chore: normalize indentation in src/util/shell.cpp
* chore: strip trailing whitespace
Co-authored-by: Sebastian Ullrich <sebasti@nullri.ch>
Closes#2004.
In porting the bugfix from
6eb852e28f,
I noticed that the LLVM backend was incorrectly generating declaration
initializers (in `callIODeclInitFn`), by assuming the return type of the
initializer is the return type of the declaration. Rather, it must be be
`lean_object`, since the initializer returns an `IO a` value which must be unpacked.
TODO: stop using the `getOrCreateFunction` pattern pervasively.
perform the `create` at the right location, and the `get`
at the correct location.