This PR makes the compiler produce C code that statically initializes
close terms when possible. This change reduces startup time as the terms
are directly stored in the binary instead of getting computed at
startup.
The set of terms currently supported by this mechanism are:
- string literals
- ctors called with other statically initializeable arguments
- `Name.mkStrX` and other `Name` ctors as they require special support
due to their computed field and occur frequently due to name literals.
In core there are currently 152,524 closed terms and of these 103,929
(68%) get initialized statically with this PR. The remaining 48585 ones
are not extracted because they use (potentially transitively) various
non trivial pieces of code like `stringToMessageData` etc. We might
decide to add special support for these in the future but for the moment
this feels like it's overfitting too much for core.
This PR adds a symbol to the runtime for marking `Array`
non-linearities. This should allow users to
spot them more easily in profiles or hunt them down using a debugger.
This PR fixes a SIGFPE crash on x86_64 when evaluating `INT_MIN / -1` or
`INT_MIN % -1` for signed integer types.
On x86_64, the `idiv` instruction traps when the quotient overflows the
destination register. For signed integers, `INT_MIN / -1` produces a
result that overflows (e.g., `-2147483648 / -1 = 2147483648` which
doesn't fit in Int32). ARM64's `sdiv` instruction wraps instead of
trapping.
The fix:
- For Int8/Int16/Int32: widen to the next larger type before
dividing/modding, then truncate back
- For Int64: explicitly check for the overflow case and return the
wrapped result
Fixes#11612🤖 Prepared with Claude Code
This PR fixes fallout of the closure allocator changes in #10982. As far
as we know
this bug only meaningfully manifests in non default build configurations
without mimalloc such as:
`cmake --preset release -DUSE_MIMALLOC=OFF`
The issue is that I forgot to update the deallocation functions for
closures. However, this only
seems to matter if we disable mimalloc which is why this slipped through
testing.
This PR fixes a memleak caused by the Lean based `IO.waitAny`
implementation by reverting it.
This the faulty Lean implementation:
```lean
def IO.waitAny (tasks : @& List (Task α)) (h : tasks.length > 0 := by exact Nat.zero_lt_succ _) :
BaseIO α := do
have : Nonempty α := ⟨tasks[0].get⟩
let promise : IO.Promise α ← IO.Promise.new
tasks.forM <| fun t => BaseIO.chainTask (sync := true) t promise.resolve
return promise.result!.get
```
In a situation where we call this function repeatedly in a loop with a
pair of tasks `[t1, t2]`
where `t2` is a long lived task that we pass every time and `t1` is
fresh a short lived task, `t2` will
accumlate more and more children from `BaseIO.chainTask` that fill
memory over time. The old C++
implementation did not have this issue so we are reverting.
This PR changes the closure allocator to use the general allocator
instead of the small object one.
This is because users may create closures with a gigantic amount of
closed variables which in turn
boost the size of the closure beyond the small object threshold.
This issue was uncovered by #10979. Detecting that the small object
threshold is at fault requires
building mimalloc in debug mode at which point it yields:
```
mimalloc: assertion failed: at "/home/henrik/lean4/build/debug/mimalloc/src/mimalloc/src/alloc.c":132, mi_heap_malloc_small_zero
assertion: "size <= MI_SMALL_SIZE_MAX"
```
The generated code at fault here looks as follows:
```c
LEAN_EXPORT lean_object* l_initExec___at___00res_spec__0(lean_object* x_1) {
_start:
{
lean_object* x_2; lean_object* x_3; lean_object* x_4; lean_object* x_5; lean_object* x_6; lean_object* x_7; lean_object* x_8; lean_object* x_9; lean_object* x_10; lean_object* x_11; lean_object* x_12; lean_object* x_13; lean_object* x_14;
x_2 = lean_alloc_closure((void*)(l_initializer_ext___at___00initExec___at___00res_spec__0_spec__0___lam__0___boxed), 3, 0);
x_3 = l_initExec___redArg___closed__0;
x_4 = l_initExec___redArg___closed__1;
x_5 = l_instMonadLiftNonDetT___closed__0;
x_6 = l_initExec___redArg___closed__2;
x_7 = l_initExec___at___00res_spec__0___closed__0;
lean_inc_ref(x_2);
x_8 = lean_alloc_closure((void*)(l_initExec___at___00res_spec__0___lam__29___boxed), 213, 212);
lean_closure_set(x_8, 0, x_3);
lean_closure_set(x_8, 1, x_2);
lean_closure_set(x_8, 2, x_4);
lean_closure_set(x_8, 3, x_3);
lean_closure_set(x_8, 4, x_4);
lean_closure_set(x_8, 5, x_3);
lean_closure_set(x_8, 6, x_4);
lean_closure_set(x_8, 7, x_3);
lean_closure_set(x_8, 8, x_4);
lean_closure_set(x_8, 9, x_3);
lean_closure_set(x_8, 10, x_4);
lean_closure_set(x_8, 11, x_3);
lean_closure_set(x_8, 12, x_4);
lean_closure_set(x_8, 13, x_3);
lean_closure_set(x_8, 14, x_4);
lean_closure_set(x_8, 15, x_5);
lean_closure_set(x_8, 16, x_6);
lean_closure_set(x_8, 17, x_5);
lean_closure_set(x_8, 18, x_5);
lean_closure_set(x_8, 19, x_5);
lean_closure_set(x_8, 20, x_5);
lean_closure_set(x_8, 21, x_5);
lean_closure_set(x_8, 22, x_5);
...
```
With the crash happening in `lean_alloc_closure` where we
unconditionally invoke the small allocator
which cannot cope with closures this large. Hopefully changing this to
the general purpose allocator
doesn't have too much of an impact on performance.
Closes: #10979
This PR implements zero cost `BaseIO` by erasing the `IO.RealWorld`
parameter from argument lists and structures. This is a **major breaking
change for FFI**.
Concretely:
- `BaseIO` is defined in terms of `ST IO.RealWorld`
- `EIO` (and thus `IO`) is defined in terms of `EST IO.RealWorld`
- The opaque `Void` type is introduced and the trivial structure
optimization updated to account for it. Furthermore, arguments of type
`Void s` are removed from the argument lists of the C functions.
- `ST` is redefined as `Void s -> ST.Out s a` where `ST.Out` is a pair
of `Void s` and `a`
This together has the following major effects on our generated code:
- Functions that return `BaseIO`/`ST`/`EIO`/`IO`/`EST` now do not take
the dummy world parameter anymore. To account for this FFI code needs to
delete the dummy world parameter from the argument lists.
- Functions that return `BaseIO`/`ST` now return their wrapped value
directly. In particular `BaseIO UInt32` now returns a `uint32_t` instead
of a `lean_object*`. To account for this FFI code might have to change
the return type and does not need to call `lean_io_result_mk_ok` anymore
but can instead just `return` values right away (same with extracting
values from `BaseIO` computations.
- Functions that return `EIO`/`IO`/`EST` now only return the equivalent
of an `Except` node which reduces the allocation size. The
`lean_io_result_mk_ok`/`lean_io_result_mk_error` functions were updated
to account for this already so no change is required.
Besides improving performance by dropping allocation (sizes) we can now
also do fun new things such as:
```lean
@[extern "malloc"]
opaque malloc (size : USize) : BaseIO USize
```
This PR adds new variants of `Array.getInternal` and
`Array.get!Internal` that return their argument borrowed, i.e. without a
reference count increment. These are intended for use by the compiler in
cases where it can determine that the array will continue to hold a
valid reference to the element for the returned value's lifetime.
In the future, this will likely be replaced by a return value borrow
annotation, in which case the special variant of the functions could be
removed, with the compiler inserting an extra `inc` in the non-borrow
cases.
This PR re-implements `IO.waitAny` using Lean instead of C++. This is to
reduce the size and
complexity of `task_manager` in order to ease future refactorings.
There is an import behavioral change of `IO.waitAny` in this PR.
Consider a situation where we have
two promises `p1`, `p2` and call `IO.waitAny [p1.result!, p2.result!]`
and `p1` resolves instantly.
Previously this would just return the result of `p1` and require nothing
else. With the new
implementation if `p2` is released before being resolved this can cause
a panic, even if
`IO.waitAny` has already finished. I argue that this is reasonable
behavior, given that an
invocation of `result!` promises that the promise will eventually be
resolved.
glibc adds `__attribute__((nothrow))` to its declarations, at least for
those related to malloc. glibc has yet to introduce `free_sized`, but
when it does it would cause compilation errors. This is due to the fact
that if a function declarations has `__attribute__((nothrow))` and it is
re-declared or implemented in C++ it must also have
`__attribute__((nothrow))` or `noexcept`, otherwise the compilation will
fail.
This is a follow up to https://github.com/leanprover/lean4/pull/6598.
Signed-off-by: Justin King <jcking@google.com>
This PR optimizes lean_nat_shiftr for scalar operands. The new compiler
converts Nat divisions into right shifts, so this now shows up as hot in
some profiles.
This PR adds optimized division functions for `Int` and `Nat` when the
arguments are known to be divisible (such as when normalizing
rationals). These are backed by the gmp functions `mpz_divexact` and
`mpz_divexact_ui`. See also leanprover-community/batteries#1202.
This PR introduces TCP socket support using the LibUV library, enabling
asynchronous I/O operations with it.
---------
Co-authored-by: Henrik Böving <hargonix@gmail.com>
Co-authored-by: Markus Himmel <markus@himmel-villmar.de>
This PR adds a canonical syntax for linking to sections in the language
reference along with formatting of examples in docstrings according to
the docstring style guide.
Docstrings are now pre-processed as follows:
* Output included as part of examples is shown with leading line comment
indicators in hovers
* URLs of the form `lean-manual://section/section-id` are rewritten to
links that point at the corresponding section in the Lean reference
manual. The reference manual's base URL is configured when Lean is built
and can be overridden with the `LEAN_MANUAL_ROOT` environment variable.
This way, releases can point documentation links to the correct
snapshot, and users can use their own, e.g. for offline reading.
Manual URLs in docstrings are validated when the docstring is added. The
presence of a URL starting with `lean-manual://` that is not a
syntactically valid section link causes the docstring to be rejected.
This allows for future extensibility to the set of allowed links. There
is no validation that the linked-to section actually exists. To provide
the best possible error messages in case of validation failures,
`Lean.addDocString` now takes a `TSyntax ``docComment` instead of a
string; clients should adapt by removing the step that extracts the
string, or by calling the lower-level `addDocStringCore` in cases where
the docstring in question is obtained from the environment and has thus
already had its links validated.
A stage0 update is required to make the documentation site configurable
at build time and for releases. A local commit on top of a stage0 update
that will be sent in a followup PR includes the configurable reference
manual root and updates to the release checklist.
---------
Co-authored-by: Marc Huisinga <mhuisi@protonmail.com>
This PR adds `IntX.abs` functions. These are specified by `BitVec.abs`,
so they map `IntX.minValue` to `IntX.minValue`, similar to Rust's
`i8::abs`. In the future we might also have versions which take values
in `UIntX` and/or `Nat`.
This PR upstreams some UInt theorems from Batteries and adds more
`toNat`-related theorems. It also adds the missing `UInt8` and `UInt16`
to/from `USize` conversions so that the the interface is uniform across
the UInt types.
**Summary of all changes:**
* Upstreamed and added `toNat` constructors lemmas: `toNat_mk`,
`ofNat_toNat`, `toNat_ofNat`, `toNat_ofNatCore`, and
`USize.toNat_ofNat32`
* Upstreamed and added `toNat` canonicalization; `val_val_eq_toNat` and
`toNat_toBitVec_eq_toNat`
* Added injectivity iffs: `toBitVec_inj`, `toNat_inj`, and `val_inj`
* Added inequality iffs: `le_iff_toNat_le` and `lt_iff_toNat_lt`
* Upstreamed antisymmetry lemmas: `le_antisymm` and `le_antisymm_iff`
* Upstreamed missing `toNat` lemmas on arithmetic operations:
`toNat_add`, `toNat_sub`, `toNat_mul`
* Upstreamed and added missing conversion lemmas: `toNat_toUInt*` and
`toNat_USize`
* Added missing `USize` conversions: `USize.toUInt8`, `UInt8.toUSize`,
`USize.toUInt16`, `UInt16.toUSize`
This PR changes `lean_sharecommon_{eq,hash}` to only consider the
salient bytes of an object, and not any bytes of any
unspecified/uninitialized unused capacity.
Accessing uninitialized storage results in undefined behaviour.
This does not seem to have any semantics disadvantages: If objects
compare equal after this change, their salient bytes are still equal. By
contrast, if the actual identity of allocations needs to be
distinguished, that can be done by just comparing pointers to the
storage.
If we wanted to retain the current logic, we would need initialize the
otherwise unused parts to some specific value to avoid the undefined
behaviour.
Closes#5831
This PR adds raw transmutation of floating-point numbers to and from
`UInt64`. Floats and UInts share the same endianness across all
supported platforms. The IEEE 754 standard precisely specifies the bit
layout of floats. Note that `Float.toBits` is distinct from
`Float.toUInt64`, which attempts to preserve the numeric value rather
than the bitwise value.
closes#6071
This PR fixes a bug in the constant folding for the `Nat.ble` and
`Nat.blt` function in the old code generator, leading to a
miscompilation.
Closes#6086
This PR implements conversion functions from `Bool` to all `UIntX` and
`IntX` types.
Note that `Bool.toUInt64` already existed in previous versions of Lean.