Example:
```
x = lean::box(0);
...
lean::inc(x);
...
lean::inc(x);
...
lean::dec(x);
...
```
In the example above, the `inc` and `dec` operations are unnecessary
since `x` is known to be a (boxed) scalar value. This commit fixes this.
`whnf_core(e)` uses `whnf` to reduce the major premise of recursors and
projections, and `whnf` unfolds arbitrary definitions.
This commit adds a new option (`cheap`) to `whnf_core`. When
`whnf_core(e, true)` is used, the type checker will not unfolding
definitions when reducing the major premises.
@kha I implemented a more complex example for the paper. The difference
in performance is awesome. On my Linux desktop:
Lean time: 3.34 secs
OCaml time: 11.72 secs
I believe that the difference in performance is due to destructive
updates happening in transformation functions such as `reassoc` and
`const_folding`. I will add a flag to Lean to disable `reuse/reset`
automatic insertion.
BTW, this test requires `ulimit -s unlimited` to avoid stack overflows.
In this commit, we replace the option `LEAN_DEFERRED_FREE` with
`LEAN_LAZY_RC`. The idea is to match the nomenclature used in the
literature. See paper:
https://dl.acm.org/citation.cfm?id=964019
The following slide deck summarizes the paper:
http://www.hboehm.info/popl04/refcnt.pdf
We also implement the very simple approach described on this paper
where a `del(o)` just puts `o` in the "to free" list, and each
allocation frees at most one object. As pointed out in the paper
above lazy RC may prevent a lot of memory from being reclaimed.
For now, I am keeping the new option disabled.
That being said, the test `arith_eval_nat.lean` is 29% faster when
using lazy RC, and beats the OCaml version.
In the following paper
https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/tm567-1.pdf
a separate thread keeps processing the "to free" list. However, I
think this approach is not compatible with our
`object_memory_kind::STHeap` trick.
Tomorrow, I will measure the space overhead when compiling the Lean
corelib using Lazy RC using my linux desktop
cc @kha
@kha I wrote a simple test in Lean and OCaml. Right now, the numbers on
my machine are
arith_eval.ml 8.13 secs
arith_eval_nat.lean 10.71 secs
OCaml is computing with machine boxed integers, and we are computing
with `nat`. Our version is more expensive since we have to check
whether the number is small or big, and whether the result needs to be a
mpz value or not.
Almost half of our runtime is spent deallocating the big object returned
by `mk_expr`. The deferred free feature does not help here because
we don't deallocate the object in the end but as we execute `eeval`.
So, we perform many small invocations to `del`. None of them take
long, but the overall cost is super high. I can use a different strategy
where `del(o)` just updates the `g_to_free` list, and we deallocate
at most `LEAN_DEFERRED_FREE_QUOTA` at each allocation. The current
deferred free approach would also work if we could use the borrowed
annotations in `eeval`. In this case, we would not delete the input
expression as we evaluate it.
As an experiment, I manually added a `lean::inc` before invoking
`eeval`. The idea was prevent memory deallocation. With this
modification, the program runs in 5.87 secs.
BTW, I also wrote a version using uint32 (arith_eval_uint32.lean),
but the current compiler generates poor code for it.
I know how to fix the performance problem.