- Lean strings (like std::string) may contain null characters. The
codebase was ignoring this issue.
- We now have a wrapper `string_ref` for wrapping Lean string objects in
C++. This wrapper also implements correctly the coercions std::string <-> string_ref.
Remark: I also found a few places where the code relies on the
following property which is not true
Forall s : std::string, std::string(s.c_str()) == s
- `name` object wrapper was assuming that all numerals were small
`nat` values. This is true in most cases, but the system would
crash when processing if it is a big number.
- The commit tries to make sure runtime/util/kernel are correct.
Modules that will be deleted contain many `TODO` comments
indicating they may crash and/or produce incorrect results
when strings contain null characters and numerals are big.
cc @kha
@kha: I thought about using `string` instead of `string_ref`.
We consistently use `std::string`. So, it should be fine, but I
was concerned about code readability.
After we bootstrap Lean4, we will be able to delete `lean::list`
template, and rename `lean::list_ref` to `lean::list`.
I am going to add `pair_ref` for wrapping Lean pair objects.
If we use `lean::string` instead of `lean::string_ref`, then
we should also use `lean::pair` instead of `lean::pair_ref`.
But, there is a problem in this case since we have
https://github.com/leanprover/lean4/blob/master/src/util/pair.h#L13
:(
@kha The runtime folder includes what is needed to link a
standalone Lean program. It is still contains some unnecessary files.
We will be able to remove them after we release Lean4.
memory_pool object introduces memory contention and unnecessary
complexity. Moreover, it actually reduces performance when we compile
Lean using JEMALLOC.
Here are the numbers for corelib
jemalloc with memory_pool: 13.83 secs
jemalloc without memory_pool: 13.60 secs
We use small_object_allocator to allocate vm_obj's.
However small_object_allocator is not thread safe. So, we need to copy
vm_obj's between threads. Moreover, in our experiments, we observed that
JEMALLOC is actually faster than the small_object_allocator.
Here are numbers for the reduced corelib.
small_object_allocator: 15.62 secs
gcc 4.9 allocator: 16.19 secs
jemalloc: 13.83 secs
@kha I have added a few performance counters.
I collect their values at each snapshot.
Right now, I am printing only the values in the last snapshot, but if we want
we can even display their progress over time.
Right now, I track the following information
- number of allocated closures
- number of allocated constructors/objects
- number of allocated big numbers
The equation compiler uses different strategies for processing
recursive equations. Some of them may produce unclear runtime cost
model. For example, the following fibonacci functions was running in
linear time instead of exponential time because the equation compiler
used the brec_on recursor.
def fib : nat → nat
| 0 := 1
| 1 := 1
| (n+2) := fib (n+1) + fib n
@dselsam and @jroesch have reported examples were the equation compiler
produces a negative performance impact. The new test (`eval` function)
captures the problem reported by @jroesch. In this example, the runtime
should not depend on the "amount of fuel".
This commit addresses this issue.
closes#1175
The types `string_imp` and `string.iterator_imp` were supposed to be
marked private, but we cannot do it because we need to provide
`string_imp.mk`, `string_imp.cases_on`, `string.iterator_imp.mk` and
`string.iterator_imp.cases_on` in the VM since we use a different
internal representation. Note that marking them as private does not
work since users can still access `string_imp.cases_on` using
meta-programming.
So, we need better support for private declarations.
Missing feature, char literals do not support non ASCII values.
That is, in the current implementation, we cannot write 'α'.
This will be implemented in the future.
The VM native implementation does not behave correctly for huge
strings (i.e., strings with more than 4G characters).
The problem is that the current implementation relies on
```
size_t force_to_size_t(vm_obj const & o, size_t def)
```
We may also have overflow problems in the string.iterator implementation
code. This is not a big deal right now, since I doubt we will try
to process string with more than 2^32 characters.
@Kha the `core_lib` and tests seem to be working correctly, but
we need more tests.
Older gcc compilers generate a warning when the attribute is used.
I found out that GCC 7 will not produce a warning if comments
such as /* fall-thru */ or /* FALLTHRU */ are used instead of the
attribute [[fallthrough]]
eval_expr creates auxiliary definitions in the VM. These auxiliary
definitions are gone after the VM finishes.
We store vm_obj's in the attribute_manager.
Before this commit, Lean was crashing in the following scenario:
1- A new caching_user_attribute is defined, and the user data structure
contains closures.
2- The closures are created using eval_expr.
3- When reusing the cached values, the system crashes when trying
to apply a closure created using eval_expr. The closure points to
an auxiliary definition that has already been deleted.
The new test exposes the problem. This is not a hypothetical scenario,
the new test is based on the Lean - Mathematica integration being
developed by @rlewis1988.
The fix consists in making sure we do not cache anything if
the VM environment has been updated by eval_expr.
I believe this is acceptable behavior. eval_expr is a very low level
tactic, and I don't see a good motivation for invoking it when
constructing the cache.
BTW, the test can be relaxed if the vm_attr does not contain closures.
However, it doesn't seem to pay off.
Another potential fix would be to propagate the definitions created
by eval_expr to the main environment. However, I think this is not
acceptable.
We will be flooding the main environment with useless temporary definitions
created by `eval_expr`.
This commit also stores the environment at caching time, and make
sure the cache is only reused if the current environment is a descendant
of the the one at caching time. This is fixing a different potential
bug.
This will minimize the size of the m_builtin_cases_vector.
It also indirectly prevents the crash decribed at 144d9096e2.
However, the fix used there is more robust.
We generate internal ids for builtin cases_on recursors.
These ids were being saved in the .olean files.
This was fine before commit 41e8a1712e because we had a separate
mapping for builtin cases_on recursors. Now, all ids are stored in the
same mapping. Thus, minor changes in the set of VM builtin operations
make lean crash when importing .olean files because they will change the
internal id for the builtin cases_on.
The problem can be reproduced in the following way:
0- Go to build/release
1- make clean-olean
2- make
Everything is fine after step 2
3- Comment the following line at tactic_state.cpp
DECLARE_VM_BUILTIN(name({"tactic", "open_namespaces"}), tactic_open_namespaces);
4- make
5- Lean will crash when executing the following command
../../bin/lean ../../library/init/meta/tactic.lean
I believe this bug is reponsible by the crash that @jroesch reported on Slack.
This commit fixes the problem by storing the name of the builtin
cases_on recursor in the .olean file.
@gebner, I have been experiencing crashes that are hard to reproduce.
I think one of the problems was that get_vm_name was returning a `name const &`.
I think this may produce a memory access violation in the following
scenario:
1- Thread 1 invokes get_vm_name, and gets a reference R. This is a
reference to a memory cell in the vector m_idx2name.
2- Thread 2 invokes get_vm_index, and it triggers a vector resize
operation. After the resize, reference R is invalid.
3- Thread 1 crashes trying to access R.