When `COMPRESSED_OBJECT_HEADER` is enabled, Lean uses a compressed
object header in which only 32 bits are reserved for the RC.
The motivation is performance: in our experiments, it is faster to
access a 32-bit counter than a 45-bit one.
With a smaller RC, we can also use 8 bits for the memory kind
information and speed up access to it.
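
To make the trade-off concrete, here is a minimal sketch of what such a header could look like. It is not the actual runtime definition: the field names, the use of the remaining 24 bits, and the position of the 45-bit counter in the packed variant are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compressed header (names and the layout of the last 24 bits
// are assumptions, not the real runtime struct).
struct CompressedHeader {
    std::uint32_t rc;        // 32-bit reference counter: plain aligned load/store
    std::uint8_t  mem_kind;  // 8-bit memory kind: readable with a single byte load
    std::uint8_t  tag;       // assumed: object tag
    std::uint16_t other;     // assumed: remaining header bits
};
static_assert(sizeof(CompressedHeader) == 8, "header fits in one 64-bit word");

inline void inc_ref(CompressedHeader& h) {
    assert(h.rc != UINT32_MAX);  // a 32-bit counter can overflow sooner
    ++h.rc;
}

inline bool is_shared(const CompressedHeader& h) { return h.rc > 1; }

// For comparison: a 45-bit counter packed into a 64-bit word needs a mask on
// every read. Placing it in the low bits is an assumption of this sketch.
constexpr std::uint64_t kPackedRcBits = 45;
constexpr std::uint64_t kPackedRcMask = (std::uint64_t{1} << kPackedRcBits) - 1;

inline std::uint64_t packed_rc(std::uint64_t word) { return word & kPackedRcMask; }
```

In this layout both the counter and the memory kind sit on natural byte boundaries, so neither access needs shifting or masking.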
The Scala/Clojure approach for persistent arrays works great with our
`reset/reuse`. We seem to be much more efficient than their
implementations because of `reset/reuse`. The new approach also seems
better than the old one implemented in the runtime, and has a few
advantages:
1- The reroot procedure used in the old approach required
synchronization for multi-threaded code, or we would need to perform
deep copies when sending `parray` objects between threads.
2- We don't need any runtime extension for the new approach.
3- The old approach used "trail lists" for undoing array updates.
This works well for backtracking search use cases, but it is bad
in use cases where we are simultaneously updating persistent
arrays that share nodes.
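
The interaction between the trie-based persistent array and `reset/reuse` can be sketched as follows. This is only an analogue, not the Lean implementation: it uses `std::shared_ptr` and `use_count()` to stand in for the RC check, a tiny branching factor, and hypothetical names (`Node`, `make_trie`, `get`, `set`). It is single-threaded only, since `use_count()` is not a reliable uniqueness test across threads.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

constexpr std::size_t kBits  = 2;               // assumed branching factor of 4
constexpr std::size_t kWidth = std::size_t{1} << kBits;
constexpr std::size_t kMask  = kWidth - 1;

struct Node {
    std::array<std::shared_ptr<Node>, kWidth> kids{};  // used by inner nodes
    std::array<int, kWidth> vals{};                    // used by leaves
};
using NodePtr = std::shared_ptr<Node>;

// Builds a full trie of the given height (height 0 is a single leaf).
NodePtr make_trie(std::size_t height) {
    auto n = std::make_shared<Node>();
    if (height > 0)
        for (auto& k : n->kids) k = make_trie(height - 1);
    return n;
}

int get(const NodePtr& root, std::size_t height, std::size_t i) {
    assert(i < (std::size_t{1} << (kBits * (height + 1))));
    const Node* n = root.get();
    for (std::size_t level = height; level > 0; --level)
        n = n->kids[(i >> (level * kBits)) & kMask].get();
    return n->vals[i & kMask];
}

// Takes ownership of `n`. A uniquely referenced node is updated in place
// (the reset/reuse case); a shared node is copied first (path copying).
NodePtr set(NodePtr n, std::size_t height, std::size_t i, int v) {
    if (n.use_count() > 1)
        n = std::make_shared<Node>(*n);  // copy this node; children stay shared
    if (height == 0) {
        n->vals[i & kMask] = v;
    } else {
        auto& slot = n->kids[(i >> (height * kBits)) & kMask];
        slot = set(std::move(slot), height - 1, i, v);
    }
    return n;
}
```

Calling `a = set(std::move(a), h, i, v)` on a uniquely owned array allocates nothing, while `b = set(a, h, i, v)` copies only the nodes on the path to index `i` and leaves `a` unchanged. No trail list or reroot step is involved, so two versions that share nodes can be updated independently.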
The modification introduces an overhead of 1.5% on the
execution time. Here is the time for compiling the corelib:
Before: 8.61 secs (avg of 3 runs)
After: 8.74 secs (avg of 3 runs)
On the other hand, the size of the compacted region for the command
`#compact_tst 10` is smaller.
Before: 176687728
After: 153794704
The size before this change was about 14.9% bigger.
For reference, using the old serializer we generate a buffer of size 105291117.
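
The percentages quoted above can be re-derived from the raw measurements. The helper below is a small sketch for doing so; the function and constant names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Relative increase of `after` over `before`, in percent.
inline double percent_increase(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

// Measurements copied from the text above.
constexpr double kTimeBefore = 8.61;  // secs, avg of 3 runs
constexpr double kTimeAfter  = 8.74;  // secs, avg of 3 runs
constexpr std::uint64_t kSizeBefore = 176687728;  // compacted region size
constexpr std::uint64_t kSizeAfter  = 153794704;  // compacted region size

// Execution-time overhead introduced by the change.
inline double time_overhead_percent() {
    return percent_increase(kTimeBefore, kTimeAfter);
}

// How much bigger the old compacted region was, relative to the new one.
inline double size_excess_percent() {
    return percent_increase(static_cast<double>(kSizeAfter),
                            static_cast<double>(kSizeBefore));
}
```

Note that the size figure is computed relative to the new (smaller) region, which is what "the size before was bigger" expresses.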
cc @kha