The new `partial def`s allow us to define `fix` in Lean, but the Lean
implementation is not as efficient as the native one. The native one
in C++ use weak pointers to prevent a closure allocation at every
recursive invocation.
This commit also fixes the `fixCore` helper functions that were broken
after we switched to camelCase.
We have updated the test `fix1.lean` to demonstrate the native
implementation is faster. Here are the numbers on my desktop.
```
./run.sh fix1.lean 24
721420279
Time for 'native fix': 816ms
721420279
Time for 'fix in lean': 1.34s
```
In 64-bit machines, the max small nat value should now be (2^63 - 1), and on 32-bit
machines (2^32 - 1).
The main motivation for this modification are the array indexing
operations. With the new representation, if a Nat index is not small,
then it must not be a valid index. This was not true in 64-bit
machines. Example: an array of size 2^33 would fit in memory, and but
an index `i` > 2^32 - 1 would not be a small nat value.
The idea is to avoid allocating tuples when creating the fixpoint of
nary functions. For example, consider the new tests:
- tests/playground/fix.lean
- tests/playground/fix_with_tuples.lean
The second one (`fix_with_tuples`) uses the `fix` operator and tuples. For input = 20,
it creates more than 1 million extra objects. The first implementation
(`fix.lean`) using `fix₂` avoids this overhead.
TODO: Add support for pattern #N with N > 9 at
```
def expand_extern_pattern_aux (args : list string) : nat → string.iterator → string → string
```
A C++ vtable at `external_object` is bad because it prevents users
from implementing external object in different programming languages.
Another problem was memory leaks because of the vtable in the
beginning of the object.
cc @kha
The "quick" filter `&s1 != &s2` was incorrect.
It was actually always false, since it just comparing the stack address
of `s1` and `s2`.
I incorporated the quick filter into `string_eq`.
I measured the impact using `lean --new-frontend core.lean` and checking
the number of instructions executed reported by Valgrind.
Before: 5,210,225,530
After: 4,891,642,264
@kha
In this commit, we replace the option `LEAN_DEFERRED_FREE` with
`LEAN_LAZY_RC`. The idea is to match the nomenclature used in the
literature. See paper:
https://dl.acm.org/citation.cfm?id=964019
The following slide deck summarizes the paper:
http://www.hboehm.info/popl04/refcnt.pdf
We also implement the very simple approach described on this paper
where a `del(o)` just puts `o` in the "to free" list, and each
allocation frees at most one object. As pointed out in the paper
above lazy RC may prevent a lot of memory from being reclaimed.
For now, I am keeping the new option disabled.
That being said, the test `arith_eval_nat.lean` is 29% faster when
using lazy RC, and beats the OCaml version.
In the following paper
https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/tm567-1.pdf
a separate thread keeps processing the "to free" list. However, I
think this approach is not compatible with our
`object_memory_kind::STHeap` trick.
Tomorrow, I will measure the space overhead when compiling the Lean
corelib using Lazy RC using my linux desktop
cc @kha
Some of the primitives do not have optimal implementation.
@Kha Could you please check if everything we use in the parser has a
reasonable implementation?