@kha I wrote a simple test in Lean and OCaml. Right now, the numbers on
my machine are
arith_eval.ml 8.13 secs
arith_eval_nat.lean 10.71 secs
OCaml is computing with machine boxed integers, and we are computing
with `nat`. Our version is more expensive since we have to check
whether the number is small or big, and whether the result needs to be a
mpz value or not.
Almost half of our runtime is spent deallocating the big object returned
by `mk_expr`. The deferred free feature does not help here because
we don't deallocate the object in the end but as we execute `eeval`.
So, we perform many small invocations to `del`. None of them take
long, but the overall cost is super high. I can use a different strategy
where `del(o)` just updates the `g_to_free` list, and we deallocate
at most `LEAN_DEFERRED_FREE_QUOTA` at each allocation. The current
deferred free approach would also work if we could use the borrowed
annotations in `eeval`. In this case, we would not delete the input
expression as we evaluate it.
As an experiment, I manually added a `lean::inc` before invoking
`eeval`. The idea was prevent memory deallocation. With this
modification, the program runs in 5.87 secs.
BTW, I also wrote a version using uint32 (arith_eval_uint32.lean),
but the current compiler generates poor code for it.
I know how to fix the performance problem.