This PR fixes fallout of the closure allocator changes in #10982. As far
as we know
this bug only meaningfully manifests in non default build configurations
without mimalloc such as:
`cmake --preset release -DUSE_MIMALLOC=OFF`
The issue is that I forgot to update the deallocation functions for
closures. However, this only
seems to matter if we disable mimalloc which is why this slipped through
testing.
This PR fixes a memleak caused by the Lean based `IO.waitAny`
implementation by reverting it.
This the faulty Lean implementation:
```lean
def IO.waitAny (tasks : @& List (Task α)) (h : tasks.length > 0 := by exact Nat.zero_lt_succ _) :
BaseIO α := do
have : Nonempty α := ⟨tasks[0].get⟩
let promise : IO.Promise α ← IO.Promise.new
tasks.forM <| fun t => BaseIO.chainTask (sync := true) t promise.resolve
return promise.result!.get
```
In a situation where we call this function repeatedly in a loop with a
pair of tasks `[t1, t2]`
where `t2` is a long lived task that we pass every time and `t1` is
fresh a short lived task, `t2` will
accumlate more and more children from `BaseIO.chainTask` that fill
memory over time. The old C++
implementation did not have this issue so we are reverting.
This PR changes the closure allocator to use the general allocator
instead of the small object one.
This is because users may create closures with a gigantic amount of
closed variables which in turn
boost the size of the closure beyond the small object threshold.
This issue was uncovered by #10979. Detecting that the small object
threshold is at fault requires
building mimalloc in debug mode at which point it yields:
```
mimalloc: assertion failed: at "/home/henrik/lean4/build/debug/mimalloc/src/mimalloc/src/alloc.c":132, mi_heap_malloc_small_zero
assertion: "size <= MI_SMALL_SIZE_MAX"
```
The generated code at fault here looks as follows:
```c
LEAN_EXPORT lean_object* l_initExec___at___00res_spec__0(lean_object* x_1) {
_start:
{
lean_object* x_2; lean_object* x_3; lean_object* x_4; lean_object* x_5; lean_object* x_6; lean_object* x_7; lean_object* x_8; lean_object* x_9; lean_object* x_10; lean_object* x_11; lean_object* x_12; lean_object* x_13; lean_object* x_14;
x_2 = lean_alloc_closure((void*)(l_initializer_ext___at___00initExec___at___00res_spec__0_spec__0___lam__0___boxed), 3, 0);
x_3 = l_initExec___redArg___closed__0;
x_4 = l_initExec___redArg___closed__1;
x_5 = l_instMonadLiftNonDetT___closed__0;
x_6 = l_initExec___redArg___closed__2;
x_7 = l_initExec___at___00res_spec__0___closed__0;
lean_inc_ref(x_2);
x_8 = lean_alloc_closure((void*)(l_initExec___at___00res_spec__0___lam__29___boxed), 213, 212);
lean_closure_set(x_8, 0, x_3);
lean_closure_set(x_8, 1, x_2);
lean_closure_set(x_8, 2, x_4);
lean_closure_set(x_8, 3, x_3);
lean_closure_set(x_8, 4, x_4);
lean_closure_set(x_8, 5, x_3);
lean_closure_set(x_8, 6, x_4);
lean_closure_set(x_8, 7, x_3);
lean_closure_set(x_8, 8, x_4);
lean_closure_set(x_8, 9, x_3);
lean_closure_set(x_8, 10, x_4);
lean_closure_set(x_8, 11, x_3);
lean_closure_set(x_8, 12, x_4);
lean_closure_set(x_8, 13, x_3);
lean_closure_set(x_8, 14, x_4);
lean_closure_set(x_8, 15, x_5);
lean_closure_set(x_8, 16, x_6);
lean_closure_set(x_8, 17, x_5);
lean_closure_set(x_8, 18, x_5);
lean_closure_set(x_8, 19, x_5);
lean_closure_set(x_8, 20, x_5);
lean_closure_set(x_8, 21, x_5);
lean_closure_set(x_8, 22, x_5);
...
```
With the crash happening in `lean_alloc_closure` where we
unconditionally invoke the small allocator
which cannot cope with closures this large. Hopefully changing this to
the general purpose allocator
doesn't have too much of an impact on performance.
Closes: #10979
This PR implements zero cost `BaseIO` by erasing the `IO.RealWorld`
parameter from argument lists and structures. This is a **major breaking
change for FFI**.
Concretely:
- `BaseIO` is defined in terms of `ST IO.RealWorld`
- `EIO` (and thus `IO`) is defined in terms of `EST IO.RealWorld`
- The opaque `Void` type is introduced and the trivial structure
optimization updated to account for it. Furthermore, arguments of type
`Void s` are removed from the argument lists of the C functions.
- `ST` is redefined as `Void s -> ST.Out s a` where `ST.Out` is a pair
of `Void s` and `a`
This together has the following major effects on our generated code:
- Functions that return `BaseIO`/`ST`/`EIO`/`IO`/`EST` now do not take
the dummy world parameter anymore. To account for this FFI code needs to
delete the dummy world parameter from the argument lists.
- Functions that return `BaseIO`/`ST` now return their wrapped value
directly. In particular `BaseIO UInt32` now returns a `uint32_t` instead
of a `lean_object*`. To account for this FFI code might have to change
the return type and does not need to call `lean_io_result_mk_ok` anymore
but can instead just `return` values right away (same with extracting
values from `BaseIO` computations.
- Functions that return `EIO`/`IO`/`EST` now only return the equivalent
of an `Except` node which reduces the allocation size. The
`lean_io_result_mk_ok`/`lean_io_result_mk_error` functions were updated
to account for this already so no change is required.
Besides improving performance by dropping allocation (sizes) we can now
also do fun new things such as:
```lean
@[extern "malloc"]
opaque malloc (size : USize) : BaseIO USize
```
This PR adds new variants of `Array.getInternal` and
`Array.get!Internal` that return their argument borrowed, i.e. without a
reference count increment. These are intended for use by the compiler in
cases where it can determine that the array will continue to hold a
valid reference to the element for the returned value's lifetime.
In the future, this will likely be replaced by a return value borrow
annotation, in which case the special variant of the functions could be
removed, with the compiler inserting an extra `inc` in the non-borrow
cases.
This PR re-implements `IO.waitAny` using Lean instead of C++. This is to
reduce the size and
complexity of `task_manager` in order to ease future refactorings.
There is an import behavioral change of `IO.waitAny` in this PR.
Consider a situation where we have
two promises `p1`, `p2` and call `IO.waitAny [p1.result!, p2.result!]`
and `p1` resolves instantly.
Previously this would just return the result of `p1` and require nothing
else. With the new
implementation if `p2` is released before being resolved this can cause
a panic, even if
`IO.waitAny` has already finished. I argue that this is reasonable
behavior, given that an
invocation of `result!` promises that the promise will eventually be
resolved.
glibc adds `__attribute__((nothrow))` to its declarations, at least for
those related to malloc. glibc has yet to introduce `free_sized`, but
when it does it would cause compilation errors. This is due to the fact
that if a function declarations has `__attribute__((nothrow))` and it is
re-declared or implemented in C++ it must also have
`__attribute__((nothrow))` or `noexcept`, otherwise the compilation will
fail.
This is a follow up to https://github.com/leanprover/lean4/pull/6598.
Signed-off-by: Justin King <jcking@google.com>
This PR optimizes lean_nat_shiftr for scalar operands. The new compiler
converts Nat divisions into right shifts, so this now shows up as hot in
some profiles.
This PR adds optimized division functions for `Int` and `Nat` when the
arguments are known to be divisible (such as when normalizing
rationals). These are backed by the gmp functions `mpz_divexact` and
`mpz_divexact_ui`. See also leanprover-community/batteries#1202.
This PR introduces TCP socket support using the LibUV library, enabling
asynchronous I/O operations with it.
---------
Co-authored-by: Henrik Böving <hargonix@gmail.com>
Co-authored-by: Markus Himmel <markus@himmel-villmar.de>
This PR adds a canonical syntax for linking to sections in the language
reference along with formatting of examples in docstrings according to
the docstring style guide.
Docstrings are now pre-processed as follows:
* Output included as part of examples is shown with leading line comment
indicators in hovers
* URLs of the form `lean-manual://section/section-id` are rewritten to
links that point at the corresponding section in the Lean reference
manual. The reference manual's base URL is configured when Lean is built
and can be overridden with the `LEAN_MANUAL_ROOT` environment variable.
This way, releases can point documentation links to the correct
snapshot, and users can use their own, e.g. for offline reading.
Manual URLs in docstrings are validated when the docstring is added. The
presence of a URL starting with `lean-manual://` that is not a
syntactically valid section link causes the docstring to be rejected.
This allows for future extensibility to the set of allowed links. There
is no validation that the linked-to section actually exists. To provide
the best possible error messages in case of validation failures,
`Lean.addDocString` now takes a `TSyntax ``docComment` instead of a
string; clients should adapt by removing the step that extracts the
string, or by calling the lower-level `addDocStringCore` in cases where
the docstring in question is obtained from the environment and has thus
already had its links validated.
A stage0 update is required to make the documentation site configurable
at build time and for releases. A local commit on top of a stage0 update
that will be sent in a followup PR includes the configurable reference
manual root and updates to the release checklist.
---------
Co-authored-by: Marc Huisinga <mhuisi@protonmail.com>
This PR adds `IntX.abs` functions. These are specified by `BitVec.abs`,
so they map `IntX.minValue` to `IntX.minValue`, similar to Rust's
`i8::abs`. In the future we might also have versions which take values
in `UIntX` and/or `Nat`.
This PR upstreams some UInt theorems from Batteries and adds more
`toNat`-related theorems. It also adds the missing `UInt8` and `UInt16`
to/from `USize` conversions so that the the interface is uniform across
the UInt types.
**Summary of all changes:**
* Upstreamed and added `toNat` constructors lemmas: `toNat_mk`,
`ofNat_toNat`, `toNat_ofNat`, `toNat_ofNatCore`, and
`USize.toNat_ofNat32`
* Upstreamed and added `toNat` canonicalization; `val_val_eq_toNat` and
`toNat_toBitVec_eq_toNat`
* Added injectivity iffs: `toBitVec_inj`, `toNat_inj`, and `val_inj`
* Added inequality iffs: `le_iff_toNat_le` and `lt_iff_toNat_lt`
* Upstreamed antisymmetry lemmas: `le_antisymm` and `le_antisymm_iff`
* Upstreamed missing `toNat` lemmas on arithmetic operations:
`toNat_add`, `toNat_sub`, `toNat_mul`
* Upstreamed and added missing conversion lemmas: `toNat_toUInt*` and
`toNat_USize`
* Added missing `USize` conversions: `USize.toUInt8`, `UInt8.toUSize`,
`USize.toUInt16`, `UInt16.toUSize`
This PR changes `lean_sharecommon_{eq,hash}` to only consider the
salient bytes of an object, and not any bytes of any
unspecified/uninitialized unused capacity.
Accessing uninitialized storage results in undefined behaviour.
This does not seem to have any semantics disadvantages: If objects
compare equal after this change, their salient bytes are still equal. By
contrast, if the actual identity of allocations needs to be
distinguished, that can be done by just comparing pointers to the
storage.
If we wanted to retain the current logic, we would need initialize the
otherwise unused parts to some specific value to avoid the undefined
behaviour.
Closes#5831
This PR adds raw transmutation of floating-point numbers to and from
`UInt64`. Floats and UInts share the same endianness across all
supported platforms. The IEEE 754 standard precisely specifies the bit
layout of floats. Note that `Float.toBits` is distinct from
`Float.toUInt64`, which attempts to preserve the numeric value rather
than the bitwise value.
closes#6071
This PR fixes a bug in the constant folding for the `Nat.ble` and
`Nat.blt` function in the old code generator, leading to a
miscompilation.
Closes#6086
This PR implements conversion functions from `Bool` to all `UIntX` and
`IntX` types.
Note that `Bool.toUInt64` already existed in previous versions of Lean.
In C, these are supported only as a vendor extension; they should
instead use proper C99 flexible array members.
In C++, both `[]` and `[0]` are vendor extensions.
Co-authored-by: Thomas Köppe <tkoeppe@google.com>
A more restrictive but efficient max sharing primitive.
**Motivation:** Some software verification proofs may contain
significant redundancy that can be eliminated using hash-consing (also
known as `shareCommon`). For example, [theorem
`sha512_block_armv8_test_4_sym`](460fe5d74c/Proofs/SHA512/SHA512Sym.lean (L29))
took a few seconds at [`addPreDefinitions`
](1a12f63f74/src/Lean/Elab/PreDefinition/Main.lean (L155))
and one second at `fixLevelParams` on a MacBook Pro (with M1 Pro). The
proof term initially had over 16 million subterms, but the redundancy
was indirectly and inefficiently eliminated using `Core.transform` at
`addPreDefinitions`. I tried to use `shareCommon` method to fix the
performance issue, but it was too inefficient. This PR introduces a new
`shareCommon'` method that, although less flexible (e.g., it uses only a
local cache and hash-consing table), is much more efficient. The new
procedure minimizes the number of RC operations and optimizes the
caching strategy. It is 20 times faster than the old `shareCommon`
procedure for theorem `sha512_block_armv8_test_4_sym`.
Continuation of #3958. To ensure that lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the lean
model), we have to make sure that no lean objects are created with
invalid UTF-8. #3958 covers the case of lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.
To address this and minimize accidental missed validation, the
`(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or utf8 char count on hand. I reviewed
every function which leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.
This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions, they are just
not causing unsound behavior anymore. A subsequent PR will handle adding
better error reporting for bad UTF-8.
Adds `IO.getTaskState` which returns the state of a `Task` in the Lean
runtime's task manager. The `TaskState` inductive has 3 constructors:
`waiting`, `running`, and `finished`. The `waiting` constructor
encompasses the waiting and queued states within the C task object
documentation, because the task object does not provide a low cost way
to distinguish these different forms of waiting. Furthermore, it seems
unlikely for consumers to wish to distinguish between these internal
states. The `running` constructor encompasses both the running and
promised states in C docs. While not ideal, the C implementation does
not provide a way to distinguish between a running `Task` and a waiting
`Promise.result` (they both have null closures).
`Nat.repr` was implemented by generating a list of `Chars`, each created
by a 10-way if-then-else. This can cause significant slow down in some
particular use cases.
Now `Nat.repr` is `implemented_by` a faster implementation that uses
C++’s `std::to_string` on small numbers (< USize.size) and maintains an
array of pre-allocated strings for the first 128 numbers.
The handling of big numbers (≥ USize.size) remains as before.
On Windows, we now compile all core `.o`s twice, once with and without
`dllexport`, for use in the shipped dynamic and static libraries,
respectively. On other platforms, we export always as before to avoid
the duplicate work.
---------
Co-authored-by: tydeu <tydeu@hatpress.net>