This PR extends the behavior of the `sync` flag for `Task.map/bind` etc.
to encompass synchronous execution even when they first have to wait on
completion of the first task, drastically lowering the overhead of such
tasks. Thus the flag is now equivalent to e.g. .NET's
`TaskContinuationOptions.ExecuteSynchronously`.
This PR changes `lean_sharecommon_{eq,hash}` to only consider the
salient bytes of an object, and not any bytes of any
unspecified/uninitialized unused capacity.
Accessing uninitialized storage results in undefined behaviour.
This does not seem to have any semantics disadvantages: If objects
compare equal after this change, their salient bytes are still equal. By
contrast, if the actual identity of allocations needs to be
distinguished, that can be done by just comparing pointers to the
storage.
If we wanted to retain the current logic, we would need initialize the
otherwise unused parts to some specific value to avoid the undefined
behaviour.
Closes#5831
This PR adds raw transmutation of floating-point numbers to and from
`UInt64`. Floats and UInts share the same endianness across all
supported platforms. The IEEE 754 standard precisely specifies the bit
layout of floats. Note that `Float.toBits` is distinct from
`Float.toUInt64`, which attempts to preserve the numeric value rather
than the bitwise value.
closes#6071
...unless we are about to kill the process anyway (which is not the
default)
Ensures panics are visible as regular messages in the language server
and properly ordered in relation to other messages on the cmdline
Continuation of #3958. To ensure that lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the lean
model), we have to make sure that no lean objects are created with
invalid UTF-8. #3958 covers the case of lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.
To address this and minimize accidental missed validation, the
`(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or utf8 char count on hand. I reviewed
every function which leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.
This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions, they are just
not causing unsound behavior anymore. A subsequent PR will handle adding
better error reporting for bad UTF-8.
Adds `IO.getTaskState` which returns the state of a `Task` in the Lean
runtime's task manager. The `TaskState` inductive has 3 constructors:
`waiting`, `running`, and `finished`. The `waiting` constructor
encompasses the waiting and queued states within the C task object
documentation, because the task object does not provide a low cost way
to distinguish these different forms of waiting. Furthermore, it seems
unlikely for consumers to wish to distinguish between these internal
states. The `running` constructor encompasses both the running and
promised states in C docs. While not ideal, the C implementation does
not provide a way to distinguish between a running `Task` and a waiting
`Promise.result` (they both have null closures).
Previously, there was a function `opaque fromUTF8Unchecked : ByteArray
-> String` which would convert a list of bytes into a string, but as the
name implies it does not validate that the string is UTF-8 before doing
so and as a result it produces unsound results in the compiler (because
the lean model of `String` indirectly asserts UTF-8 validity). This PR
replaces that function by
```lean
opaque validateUTF8 (a : @& ByteArray) : Bool
opaque fromUTF8 (a : @& ByteArray) (h : validateUTF8 a) : String
```
so that while the function is still "unchecked", we have a proof witness
that the string is valid. To recover the original, actually unchecked
version, use `lcProof` or other unsafe methods to produce the proof
witness.
Because this was the only `ByteArray -> String` conversion function, it
was used in several places in an unsound way (e.g. reading untrusted
input from IO and treating it as UTF-8). These have been replaced by
`fromUTF8?` or `fromUTF8!` as appropriate.
`Nat.repr` was implemented by generating a list of `Chars`, each created
by a 10-way if-then-else. This can cause significant slow down in some
particular use cases.
Now `Nat.repr` is `implemented_by` a faster implementation that uses
C++’s `std::to_string` on small numbers (< USize.size) and maintains an
array of pre-allocated strings for the first 128 numbers.
The handling of big numbers (≥ USize.size) remains as before.
In the new snapshot design, we have a tree of `Task`s that represents
the asynchronously processed document structure. When transforming this
tree in response to a user edit, we want to quickly run through
reusable, already computed nodes of the tree synchronously and then
spawn new tasks for the new parts. The new flag allows us to do such
mixed sync/async tree transformations uniformly. This flag exists as
e.g.
[`ExecuteSynchronously`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskcontinuationoptions?view=net-8.0)
in other runtimes.
Without this change, enqueuing multiple tasks before the first worker
was started led to only a single worker being created. Now the first
increment and decrement happen under the task manager mutex, so
effectively the worker is never idle until it is out of tasks.