sed -Ei 's/(g_\w+)\s*= new (name|expr|format|level|string_ref)\W.*;/\0\n mark_persistent(\1->raw());/' src/kernel/**/*.cpp src/util/**/*.cpp src/library/**/*.cpp
In 64-bit machines, the max small nat value should now be (2^63 - 1), and on 32-bit
machines (2^32 - 1).
The main motivation for this modification are the array indexing
operations. With the new representation, if a Nat index is not small,
then it must not be a valid index. This was not true in 64-bit
machines. Example: an array of size 2^33 would fit in memory, and but
an index `i` > 2^32 - 1 would not be a small nat value.
- Lean strings (like std::string) may contain null characters. The
codebase was ignoring this issue.
- We now have a wrapper `string_ref` for wrapping Lean string objects in
C++. This wrapper also implements correctly the coercions std::string <-> string_ref.
Remark: I also found a few places where the code relies on the
following property which is not true
Forall s : std::string, std::string(s.c_str()) == s
- `name` object wrapper was assuming that all numerals were small
`nat` values. This is true in most cases, but the system would
crash when processing if it is a big number.
- The commit tries to make sure runtime/util/kernel are correct.
Modules that will be deleted contain many `TODO` comments
indicating they may crash and/or produce incorrect results
when strings contain null characters and numerals are big.
cc @kha
@kha: I thought about using `string` instead of `string_ref`.
We consistently use `std::string`. So, it should be fine, but I
was concerned about code readability.
After we bootstrap Lean4, we will be able to delete `lean::list`
template, and rename `lean::list_ref` to `lean::list`.
I am going to add `pair_ref` for wrapping Lean pair objects.
If we use `lean::string` instead of `lean::string_ref`, then
we should also use `lean::pair` instead of `lean::pair_ref`.
But, there is a problem in this case since we have
https://github.com/leanprover/lean4/blob/master/src/util/pair.h#L13
:(
Reason:
- UTF8 encoding
- Lean strings may contain null char. That is, null char is not an end
of string delimiter like in C. Lean string objects are similar to std::string
@kha The runtime folder includes what is needed to link a
standalone Lean program. It is still contains some unnecessary files.
We will be able to remove them after we release Lean4.
memory_pool object introduces memory contention and unnecessary
complexity. Moreover, it actually reduces performance when we compile
Lean using JEMALLOC.
Here are the numbers for corelib
jemalloc with memory_pool: 13.83 secs
jemalloc without memory_pool: 13.60 secs
The procedure get_tagged_name_suffix violates the assertion.
Alternative solution: change return type of get_tagged_name_suffix to
optional<std::string> since it is only used for pretty printing.