The Scala/Clojure approach for persistent arrays works great with our
`reset/reuse`. We seem to be much more efficient than their
implementations because of `reset/reuse`. The new approach also seems
better than the old one implemented in the runtime, and has a few
advantages:
1- The reroot procedure used in the old approach required
synchronization for multi-threaded code, or we would need to perform
deep copies when sending `parray` objects between threads.
2- We don't need any runtime extension for the new approach.
3- The old approach used "trail lists" for undoing array updates.
This works well for bactracking search use cases, but it is bad
in use cases where we are simultaneously updating the persistent
arrays that have shared nodes.
In 64-bit machines, the max small nat value should now be (2^63 - 1), and on 32-bit
machines (2^32 - 1).
The main motivation for this modification are the array indexing
operations. With the new representation, if a Nat index is not small,
then it must not be a valid index. This was not true in 64-bit
machines. Example: an array of size 2^33 would fit in memory, and but
an index `i` > 2^32 - 1 would not be a small nat value.
Some of the primitives do not have optimal implementation.
@Kha Could you please check if everything we use in the parser has a
reasonable implementation?
- Lean strings (like std::string) may contain null characters. The
codebase was ignoring this issue.
- We now have a wrapper `string_ref` for wrapping Lean string objects in
C++. This wrapper also implements correctly the coercions std::string <-> string_ref.
Remark: I also found a few places where the code relies on the
following property which is not true
Forall s : std::string, std::string(s.c_str()) == s
- `name` object wrapper was assuming that all numerals were small
`nat` values. This is true in most cases, but the system would
crash when processing if it is a big number.
- The commit tries to make sure runtime/util/kernel are correct.
Modules that will be deleted contain many `TODO` comments
indicating they may crash and/or produce incorrect results
when strings contain null characters and numerals are big.
cc @kha
@kha: I thought about using `string` instead of `string_ref`.
We consistently use `std::string`. So, it should be fine, but I
was concerned about code readability.
After we bootstrap Lean4, we will be able to delete `lean::list`
template, and rename `lean::list_ref` to `lean::list`.
I am going to add `pair_ref` for wrapping Lean pair objects.
If we use `lean::string` instead of `lean::string_ref`, then
we should also use `lean::pair` instead of `lean::pair_ref`.
But, there is a problem in this case since we have
https://github.com/leanprover/lean4/blob/master/src/util/pair.h#L13
:(