502 lines
26 KiB
Markdown
502 lines
26 KiB
Markdown
Design notes for Lean4
|
||
----------------------
|
||
|
||
# Goals
|
||
|
||
- Move more code from C++ to Lean.
|
||
- New compiler and C++ code generator.
|
||
- New runtime (support for unboxed values and FFI).
|
||
- New parser and macro expander (in Lean).
|
||
- New monad for accessing primitives that are only available in C++ (e.g., `type_context`).
|
||
- Fix critical issues (e.g., issue #1601).
|
||
- Fix language design issues.
|
||
- Reduce clutter in the core lib and code base.
|
||
|
||
# Plan
|
||
|
||
- Create Lean4 branch
|
||
- Disable most tests (they will be incrementally added back as we make progress on Lean4).
|
||
- Dramatically reduce the size of core lib. We should only keep the basics that
|
||
are needed to execute Lean programs. Remove most theorems and lemmas,
|
||
algebraic hierarchy, all non basic tactics, etc. Motivations: reduce clutter,
|
||
and increase agility. We will copy `library` into `old_library` and incrementally
|
||
rebuild core lib.
|
||
- Remove dead C++ code, and disable all but the most basic tactics.
|
||
- Add abstraction layers to isolate modules. Example: module manager should not depend on the parser;
|
||
equation compiler should not depend on the elaborator.
|
||
- Make Lean object thread safe (see Memory management section for performance issues),
|
||
and remove related clutter. Example: we will not need ts_vm_obj anymore.
|
||
Remove unnecessary closure objects that were added for the previous C++ code generator
|
||
that was discontinued.
|
||
- Split tactic state into backtrackable and non-backtrackable parts using the
|
||
new monad transformers. The plan is to have the following monads for meta programming:
|
||
a) `elab`: it is morally state_t and except_t where the state is `elab_state`.
|
||
`elab_state` is defined in C++, and contains: `environment`, cache data structures,
|
||
`name_generator`, `options` and `metavar_context`. Only `metavar_context` is backtrackable, and
|
||
we define `<|>` as
|
||
```
|
||
meta def elab.orelse {α : Type} (e₁ : elab α) (e₂ : elab α) : elab α :=
|
||
⟨λ s, let mctx := s.get_mvar_ctx in
|
||
match e₁.run s with
|
||
| elab_result.exception α ex s' := e₂.run (s'.set_mvar_ctx mctx)
|
||
| r := r
|
||
end⟩
|
||
```
|
||
Note that, `elab_state` is used linearly in the definition above, and we just save
|
||
`metavar_context`.
|
||
|
||
b) `tactic`: it built on top of `elab` using `state_t`. In the new state we store
|
||
the list of goals (i.e., metavariables) and the main metavariable.
|
||
|
||
We will implement (most of) the C++ primitive tactics in the `elab` monad.
|
||
One of the motivations for the approach above is that C++ and Lean code use `elab` in very similar ways.
|
||
For example, the `type_context` object in C++ just gets a reference for the `elab_state` object.
|
||
|
||
Remark: `elab` monad will be defined in Lean, but we will mark it private. All tactics that
|
||
have access to the private definition will use `elab_state` linearly. The idea is to make sure
|
||
we can use destructive updates when implementing C++ primitive tactics. This is a big
|
||
advantage with respect to the approach used in Lean3 where we keep creating (and deleting) `tactic_state` objects
|
||
all over the place.
|
||
|
||
We have considered splitting `elab` into two monads. The first one `elab_core` would have a state
|
||
without `metavar_context`. `elab_core` would morally be `except_t ex (state elab_core_state)`. And then,
|
||
`elab` would be defined as `state_t metavar_context elab_core`. So, `metavar_context` would be backtrackable
|
||
for free, and we would not need to define a custom `orelse`. This change is fine on the Lean side, but it would
|
||
be messy when programming in C++ because we would have to continue to propagate the `metavar_context` manually as
|
||
we do in Lean3. Recall that a few bugs in Lean3 are due to incorrect propagation of the `metavar_context`. These bugs
|
||
will all disappear with the new design.
|
||
We also have additional performance overhead with the `elab`+`elab_core` approach.
|
||
For example, we want to support the following API:
|
||
```
|
||
mk_type_context : elab type_context
|
||
infer_type : type_context -> expr -> elab expr
|
||
whnf : type_context -> expr -> elab expr
|
||
...
|
||
```
|
||
If we use `elab`+`elab_core` approach, the `infer_type` primitive would have to be implemented as:
|
||
```
|
||
vm_obj infer_type(vm_obj const & ctx, vm_obj const & e, vm_obj const & mctx, vm_obj const & s) {
|
||
try {
|
||
ctx.set_mctx(mctx); // copy current metavar_context to ctx
|
||
auto r = ctx.infer(e);
|
||
return mk_success_result(r, ctx.mctx(), s); // build result using updated metavar_context
|
||
} catch (exception & ex) {
|
||
....
|
||
} }
|
||
```
|
||
In the `elab` only approach, we save one argument `mctx`, and we don't have to copy the `metavar_context`.
|
||
|
||
- Implement support objects in Lean: options, format, structure trace messages, syntax object, etc.
|
||
- Add parser infrastructure in Lean.
|
||
- Compiler and C++ code generator. The C++ code generator will avoid many bootstrapping
|
||
problems we have. The idea is to write several Lean modules in Lean, emit C++ code, save
|
||
the generated C++ code in our repo.
|
||
- New IR with support for non uniform memory layout for Lean objects (see details on the #backend
|
||
Slack channel).
|
||
- Develop a tool in Lean that given a Lean inductive datatype (or structure) generates C++
|
||
code for retrieving fields and creating Lean objects. The goal is to isolate primitives implemented
|
||
in C++ from the way we represent Lean objects in memory. For example, most Lean functions implemented
|
||
in C++ use the C++ function `cfield` which assumes objects have a uniform memory layout.
|
||
- Develop a tool for generating glue code for interfacing Lean and C++ code. Again, the goal
|
||
is to isolate the primitives from the way we handle boxed/unboxed values.
|
||
For example, suppose we have a builtin function `foo` that takes two Lean `bool` values.
|
||
Right now, this function takes boxed values `c_foo(vm_obj const & a, vm_obj const & b)`.
|
||
It feels weird to have to box a Lean value to be able to invoke the builtin implementation for `foo`.
|
||
After we have the tool, we would write `c_foo(bool a, bool b)` and describe its signature in a Lean file.
|
||
The tool then generates the wrappers for invoking `c_foo` from the interpreter and generated C++ code.
|
||
|
||
# Language and library issues
|
||
|
||
- `private` declarations are not reliable. Users can easily subvert them
|
||
using meta programming. This is problematic for several optimizations
|
||
we want to use. For example, suppose we define a state-like monad
|
||
where every primitive uses the state linearly. The code
|
||
generator cannot rely on that since users can currently access
|
||
the internal implementation, and use the state in a non linear way.
|
||
This is just an example. We have many more.
|
||
|
||
- `parameter`s are currently simulated in Lean. For example,
|
||
when we declare `foo` in a section with a parameter `A`,
|
||
`foo` is automatically abstracted and an alias `foo => @foo A` is created.
|
||
This creates many problems, most of them are documented in the issue tracker.
|
||
This approach has one advantage: users can use the abstracted (`_root_.foo`)
|
||
and non-abstracted (`foo`) version simultaneously. Another advantage is that
|
||
we don't have to type check `foo` more than once in the kernel.
|
||
That being said, the disadvantages far outweigh the advantages.
|
||
We plan to go back to the approach used in Lean1 and Coq.
|
||
|
||
- Coercion resolution (see issue #1402).
|
||
|
||
- Name resolution for `[ ... ] tactic blocks.
|
||
The `[ ... ] notation allows us to use interactive tactic notation
|
||
when writing reusable tactics. This is very convenient, but the current
|
||
implementation uses dynamic name resolution, and is a source of many
|
||
bugs.
|
||
|
||
- `if-then-else` using `bool` instead of `Prop`.
|
||
As soon as we started programming with Lean (version 3), it became clear
|
||
that `if-then-else` with `Prop` creates more problems that it solves.
|
||
The elaborator already has support for a coercion from `Prop` to `bool`
|
||
(for decidable propositions). The dependent `if H : p then t else e`
|
||
may look cute, but it is unnecessary now that we have `match`.
|
||
|
||
- `decidable` type class. A recurrent problem in Lean occurs when
|
||
users perform dependent elimination on `decidable` instances.
|
||
The problem occurs when we have `[h : decidable p]` in the context
|
||
and a goal `G[@f p h]`, that is, a goal containing the term `@f p h`.
|
||
Then, we perform `cases h`, and obtain `G[@f p (dedicable.is_true h')]` in
|
||
one branch and `G[@f p (decidable.is_false h)]` in another. Then, we apply a
|
||
lemma that gives us `G[@f p h]` where `h` is the synthesized instance,
|
||
but we cannot use it to close the goal because we get a type error.
|
||
We have discussed this problem with Tahina and he pointed out that
|
||
we should never perform dependent elimination on type class instances in proofs,
|
||
and that this is an anti-idiom. He told us that every type class is a
|
||
structure in Coq. That is, everything is wrapped in a structure.
|
||
In Coq, they would use a custom eliminator for performing case analysis
|
||
on decidable instances. They don't face this problem because the custom eliminator
|
||
is more convenient to use them manually destructing the wrapper structure,
|
||
and then the actual data. He also strongly suggested that we
|
||
should decouple the program that computes whether a proposition is true or false
|
||
from the proof the result is correct. The Lean type class combines both in one single definition.
|
||
He said this will be a problem in the future for users that want to compute in the kernel.
|
||
The kernel computations will have to deal with these proof terms. In Coq, `decidable` is now defined as:
|
||
```
|
||
Class Decidable (P : Prop) := {
|
||
Decidable_witness : bool;
|
||
Decidable_spec : Decidable_witness = true <-> P
|
||
}.
|
||
```
|
||
He strongly recommended we define `decidable` using this approach.
|
||
He said this is not the original definition used in Coq. The first one was a structure wrapping a
|
||
sum type (which is closer to our definition), and the Coq developers had to change it
|
||
because of performance problems in proofs by reflection.
|
||
|
||
- Interactive tactics. In Lean4 we will have a much more extensible and flexible
|
||
parser, and it will be written in Lean. We will be able to write a custom parser for the tactic
|
||
interactive mode. So, the current argument-type driven parser we use will not
|
||
be needed anymore.
|
||
|
||
- Quotations. Do we need all of them? I find the one with three backticks very inconvenient to use.
|
||
Moreover, as soon as we implement the new parser, we will want quotations for building
|
||
the new syntactic object and Lean expressions.
|
||
|
||
# Compiler
|
||
|
||
- The new compiler will use a System-F like intermediate representation.
|
||
It will be similar to Haskell core language. Inductive datatypes will be represented
|
||
using a constant for each constructor and a `cases` eliminator. If `cases` is encoded
|
||
using a expr-macro, we can easily support `default/other` case.
|
||
|
||
- Code inlining will occur at the System-F level after we have applied
|
||
simplifications. This is relevant for the performance issues we have
|
||
observed when a long chain of functions need to be unfolded (e.g., new
|
||
monad transformer library).
|
||
|
||
- The first compilation step applies compiler specific simplification rules provided by users.
|
||
For example, we will be able to mark `map g (map f l) = map (g o f) l` as an optimization
|
||
rule for the compiler.
|
||
|
||
Issue: many opportunities for applying simplification rules only appear after we have inlined
|
||
definitions. However, we want to inline after we have converted into System-F and have
|
||
erased computationally irrelevant code and applied basic simplifications (e.g., erase trivial structures).
|
||
This is a problem since user provided simplification rules are not applicable here since they have
|
||
been described at the Lean level.
|
||
|
||
- Basic types (scalars, bool, char, uint32, uint64, int64, int32, ...) and C++ types
|
||
can be stored in unboxed form. The unboxed version are prefixed with `#` as they do
|
||
in Haskell.
|
||
|
||
- When we convert a Lean function to System-F, we will generate two versions: boxed and unboxed.
|
||
The boxed version is needed when passing this function to polymorphic higher-order functions.
|
||
As in Haskell, polymorphic functions always take boxed values.
|
||
Both versions are stored in the Lean environment as `meta` functions.
|
||
Example: the function `def inc (a : int32) := a + 1` is converted into two versions
|
||
`meta def _SystemF.boxed.inc (a : int32) := a + 1`, and `meta def _SystemF.unboxed.inc (a : #int32) := a #+ #1`.
|
||
Now, suppose we want to compile `twice inc a` where `def twice {A : Type} (f : A -> A) (a : A) := f (f a)`.
|
||
Then, since `twice` is polymorphic, we need to pass `inc` boxed version, and we generate
|
||
`@_SystemF.boxed.twice int32 _SystemF.boxed.inc a`.
|
||
|
||
- We want to implement monomorphisation as an additional optimization step. The idea is specialize functions like `twice`.
|
||
In the previous example, monomorphisation would generate `@_SystemF.unboxed.twice_int32 (f : #int32 -> #int32) (a : #int32)`.
|
||
We are considering caching monomorphised functions into the .olean files. If we do this, we have to consider the situation
|
||
where more than one .olean contains the same monomorphised function. We see two options: we have a canonical way to generate
|
||
names for monomorphised functions; we generate unique names, and accept the fact the environment will contain duplicates.
|
||
It is just a space issue.
|
||
Remark: see comment `closure` bullet point below. We may not use unboxed values in closures.
|
||
|
||
- In Lean3, `name`, `level` and `expr` are all implemented in C++. To expose these objects in Lean, we have to wrap them
|
||
using a subclass of `vm_external`. This generates a significant performance overhead. For example, suppose we have a lean
|
||
function that traverses a big expression; for each visited expression `e`, we need to create a `vm_expr` object for wrapping `e`.
|
||
Moreover, we have two layers of reference counting: one at `expr` and another at `vm_obj`.
|
||
This design decision is fine in Lean3 because the most expensive tasks are implemented in C++, and Lean code is only
|
||
used to "glue" together existing procedures implemented in C++. This is not the case in Lean4.
|
||
So, in Lean4, all these objects will be implemented directly in Lean.
|
||
As described above, we will have a tool that will generate C++ functions for creating and accessing these objects.
|
||
We believe this will not affect much how we code in C++, and it will eliminate a lot of boilerplate code we currently use.
|
||
|
||
# Object memory layout
|
||
|
||
- Constructor objects: they will contain pointers and unboxed data. We will use all-pointers first approach, and the header will contain 8 bytes:
|
||
a) reference counter: 4 bytes
|
||
b) kind: 1 byte
|
||
c) tag (aka constructor index): 1 byte
|
||
d) size (aka number of pointer objects): 2 bytes
|
||
After the header we have `size` pointers and then all unboxed data. In debug mode, we want to store the size of the space used for
|
||
unboxed data, and use this information for implementing sanity checks.
|
||
Remark: the header of all composite Lean objects start with the reference counter and `kind`.
|
||
Note that this representation supports only inductive datatypes with at most 256 constructors and 2^16 pointer fields.
|
||
This is sufficient for our needs.
|
||
If one day we want to support inductive datatypes with more than 256 constructors and/or 2^16 pointer fields, we can add
|
||
a new kind of constructor object, and add new opcodes for manipulating them. We say this new kind is a "fat" constructor object
|
||
since its header is bigger.
|
||
|
||
- Array: we support arrays of pointers and arrays of unboxed data. We will have opcodes for reading and updating arrays.
|
||
The updates are destructive when the array reference counter is 1.
|
||
|
||
- Closures: we need two kinds of closures: one that stores the bytecode id (for interpreted code); and another that stores a function pointer (for compiled code).
|
||
We need to decide whether we will support closures that store unboxed data or not. The simple solution is to support only boxed data
|
||
in closures. This may not be a huge performance overhead since many higher order functions such as `map` and `fold` will
|
||
be specialized during the monomorphisaton step, and we will not even create the closures.
|
||
A possible compromise is to support unboxed data only for closures that store function pointers. The idea is the following, whenever we emit C++ code for a function,
|
||
we also generate a `run` function for it s.t. given the closure data, it invokes the function. Then, we store the pointer to the `run` function in the closure.
|
||
For example, suppose we have generated
|
||
```
|
||
vm_obj foo(bool b1, vm_obj o, bool b2) { ... }
|
||
```
|
||
Then, we would also generate
|
||
```
|
||
vm_obj run_foo(closure_data d) {
|
||
return foo(get_closure_bool(8, d), get_closure_obj(0, d), get_closure_bool(9, d));
|
||
}
|
||
```
|
||
`get_closure_bool(offset, d)` retrieves the Boolean stored at the given offset in `d`.
|
||
Note that, as in constructor objects, unboxed data is stored after pointers at `closure_data`.
|
||
So, we use `get_closure_bool(8, d)` to retrieve `b` which is stored after `o` (the pointer to `o` consumes 8 bytes) in `d`.
|
||
It is not clear this approach is a good one since we would always need to create a `closure_data` object that contains all arguments
|
||
before executing a closure. It is not clear how to support the optimization we use in Lean3 that avoids the allocation of the last `closure_data`
|
||
in most cases (https://github.com/leanprover/lean/blob/master/src/library/vm/vm.cpp#L1698).
|
||
|
||
- MPZ (multiprecision integers)
|
||
|
||
- String. Remark: internally it is not an array of char, but an array of bytes encoded in UTF8. The key difference with respect to Lean3 is that it will
|
||
not use the `vm_external` wrapper approach, but have an object kind for strings.
|
||
|
||
# IR
|
||
|
||
- Register based.
|
||
- Explicit reference counting instructions. We use reference counting for: composite objects, closures and
|
||
C++ unboxed data (e.g., `expr`). Remark: for each C++ primitive the VM needs to know how to increase/decrease the reference counter.
|
||
- Support for unboxed values.
|
||
- Instructions for accessing unboxed values in non-uniform structures.
|
||
We will have instructions for operations such as `get_scalar_<sz>(obj, offset)`,
|
||
where `obj` is a (potentially non-uniform) Lean object, `offset` is the offset
|
||
inside of this object, and `sz` is the number of bytes needed to store the object.
|
||
In practice, we would have `get_scalar_1`, `get_scalar_2`, `get_scalar_4` and `get_scalar_8`.
|
||
These scalars are unboxed. We would have registers for storing the different kinds of scalars,
|
||
and basic operations on them (e.g., comparison, arithmetic, etc). So, we will
|
||
have instructions such as `GETS_<sz> r_o r_i offset` where `r_o` and `r_i` are registers, and
|
||
it corresponds to `r_o := get_scalar_<sz>(r_i, offset)`. Moreover, `r_o` must be a scalar
|
||
register of size `sz` and `r_i` is a register for storing Lean objects.
|
||
|
||
Remark: for each constructor datatype, we will have a table that maps fields to the
|
||
operation needed to retrieve them. We will use this table when converting the SystemF
|
||
representation into the IR.
|
||
|
||
Open issue: should we use SSA or SIL?
|
||
BTW, most of the benefits of SSA/SIL seem to be irrelevant for Lean.
|
||
The paper https://arxiv.org/pdf/1507.05762.pdf send by Nuno describes the pros/cons for SSA.
|
||
Most of them seem to be related to static analysis procedures. I think we will need very few
|
||
static analysis steps, most optimizations will be implemented at System-F before we convert the code into IR.
|
||
|
||
# VM
|
||
|
||
- We need a new VM for the new IR.
|
||
|
||
- The VM should be able to invoke primitives hand written in C++ and
|
||
C++ code emitted by the Lean compiler.
|
||
|
||
- A few hand written C++ primitives and C++ code emitted by the Lean compiler
|
||
need to invoke Lean functions. We should be careful to avoid a mismatch here where
|
||
a C++ function F for Lean version `X` is trying to invoke bytecode for
|
||
a Lean function G for version `X+1`. If we allow this to happen the system
|
||
may crash because the data representation for version `X+1` may be
|
||
different from version `X`.
|
||
|
||
We can try to address this issue by breaking core lib into two parts.
|
||
The first part (bootstrapping) contains all the infrastructure needed by the parser, compiler,
|
||
tactic framework, and Lean runtime. If we make a change here, we should
|
||
compile it again using the previously emitted C++ code, and then generate
|
||
a new version of the C++ code, compile it, and check whether it works or not.
|
||
In principle, it is not safe to invoke bytecode generated during the current compilation
|
||
from previously emitted C++ code since they may be using different representations.
|
||
Of course, the changes may be harmless, but to avoid problems we should minimize
|
||
the number of tactics used in this part of the core lib. Ideally, tactics should
|
||
not be used in the bootstrapping part.
|
||
|
||
We may also emit C++ code for non essential functionality that is implemented in Lean,
|
||
and then link it with the Lean executable. Example: a decision procedure, a parser extension.
|
||
The idea is to provide a more efficient version to users. Here we can use a more
|
||
relaxed approach since this functionality is not part of the compiler. We can store a hash code
|
||
for each of these functions. When we import the .olean file that generated the function
|
||
we compare whether the hash code there matches the one in the emitted C++ code.
|
||
If it does, we use the C++ version, otherwise we use the bytecode.
|
||
|
||
# Explicit reference counting
|
||
|
||
As described above, we will have opcodes for increasing/decreasing `vm_obj`s reference counters.
|
||
In Lean3, we perform too many unnecessary increments/decrements.
|
||
A function is responsible for decreasing the reference counter of each argument (i.e., it consumes the argument), and
|
||
for increasing the reference counter of the result. Each argument should be viewed as a resource that is consumed by the function.
|
||
Many optimizations are possible. For example, consider the function
|
||
```
|
||
def proj2 (a : A) (b : B) := b
|
||
```
|
||
A naive compilation into bytecode would produce:
|
||
```
|
||
def proj2 a b :=
|
||
r := b;
|
||
inc r;
|
||
dec a;
|
||
dec b;
|
||
return r
|
||
```
|
||
This can be optimized as
|
||
```
|
||
def proj2 a b :=
|
||
dec a;
|
||
return b
|
||
```
|
||
We need to explicitly decrement `a` because it was not use by `proj2`.
|
||
The function
|
||
```
|
||
def ex1 (x : X) (y : Y) :=
|
||
let v1 := f x x,
|
||
v2 := f x y
|
||
in h v1 v2
|
||
```
|
||
is compiled as
|
||
```
|
||
def ex1 x y :=
|
||
inc x;
|
||
inc x;
|
||
v1 := f x x;
|
||
v2 := f x y;
|
||
r := h v1 v2;
|
||
return r
|
||
```
|
||
The reference counter for `x` is incremented twice because it is used 3 times in this function.
|
||
Note that, we don't have to decrement `v1` nor `v2` since they are consumed by `h`.
|
||
We can an instruction `inc2 x` for executing `inc x; inc x` in a single step.
|
||
Now consider a map-like function
|
||
```
|
||
def my_map : list A -> list A
|
||
| [] := []
|
||
| (h::t) := g h :: my_map t
|
||
```
|
||
This function will be compiled as
|
||
```
|
||
def my_map t :=
|
||
switch (cidx t) {
|
||
case 0:
|
||
return t;
|
||
case 1:
|
||
h1 := get#0 t;
|
||
t1 := get#1 t;
|
||
dec t;
|
||
h2 := g h1;
|
||
t2 := my_map t2;
|
||
r := mk_list_cons h2 t2;
|
||
return r
|
||
}
|
||
```
|
||
The instruction `get#<idx> t` returns the field `idx` of the object `t`. It bumps the reference counter of the resulting object, but does not update the reference counter of `t`.
|
||
`mk_list_cons h2 t2` is a function that creates a constructor object with tag `#1` and two pointers `h2` and `t2`. We don't need to decrement
|
||
the reference counters of `h2` and `t2` since they were "consumed" by `mk_list_cons`. We can inline `mk_list_cons` too.
|
||
We can also add a special `dec` instruction for reusing memory cells, and write `my_map` as:
|
||
```
|
||
def my_map t :=
|
||
switch (cidx t) {
|
||
case 0:
|
||
return t;
|
||
case 1:
|
||
h1 := get#0 t;
|
||
t1 := get#1 t;
|
||
t_cell := dec_core t;
|
||
h2 := g h1;
|
||
t2 := my_map t2;
|
||
r := mk_list_cons_reusing h2 t2 t_cell;
|
||
return r
|
||
}
|
||
```
|
||
The `dec_core t` instruction decrements the reference counter of `t`, but does not delete the memory cell if it is zero.
|
||
Then, `mk_list_cons_reusing` will reuse `t`'s memory cell if the counter is 0, and will create a new cell otherwise.
|
||
With this trick, `my_map` will not allocate any constructor object if the input list is not shared.
|
||
To avoid memory leaks, we have to make sure that in each path after `dec_core t`, `t_cell` is used in a `mk_*_reusing` function
|
||
and/or we explicitly delete it using `del t_cell`.
|
||
When the reference counter of `t` is one in `case 1`, the `get#0 t` and `get#1 t` unnecessarily increase the reference counter of the result value just to decrease it again at `dec_core t`. We can avoid this overhead by using the following alternative formulation
|
||
```
|
||
def my_map t :=
|
||
switch (cidx t) {
|
||
case 0:
|
||
return t;
|
||
case 1:
|
||
if (ref_count t == 1) {
|
||
h1 := steal#0 t;
|
||
t1 := steal#1 t;
|
||
t_cell := t;
|
||
} else {
|
||
h1 := get#0 t;
|
||
t1 := get#1 t;
|
||
dec t;
|
||
t_cell := 0;
|
||
}
|
||
h2 := g h1;
|
||
t2 := my_map t2;
|
||
r := mk_constructor_reusing #1 2 t_cell;
|
||
set#0 r h2;
|
||
set#1 r t2;
|
||
return r
|
||
}
|
||
```
|
||
The instruction `steal#<idx> t` is similar to `get#<idx> t`, but it does not increase the reference counter of the resulting object.
|
||
`mk_constructor_reusing #1 2 t_cell` creates a constructor object with tag `#1` and size 2 and reusing `t_cell` if different from 0. The instructions `set#<idx>` are used to initialize the resulting fields.
|
||
The optimization above can be used whenever `t` is dead after the `case`, and an object of same size is created.
|
||
Note that if `t` is a list of arrays and it is not shared, then `g h1` will also be able to perform destructive updates.
|
||
Remark: suppose the `i`-th field in `case` branch is not used, then instead of using `steal#i t`, we use `dec#i t` to decrement the reference counter of the `i`-th field.
|
||
|
||
TODO: create experiments to check whether the optimization above is relevant or not.
|
||
|
||
# Tail recursion
|
||
|
||
TODO
|
||
|
||
# Unboxed products and sums
|
||
|
||
TODO
|
||
|
||
# Memory management
|
||
|
||
Lean3 VM objects are not thread safe: they do not use atomic
|
||
operations for updating the reference counter, and we use a small
|
||
object memory allocator. The main motivation was performance.
|
||
We evaluated these design decisions again, and did not observe any
|
||
performance impact when we used atomic operations for updating
|
||
the reference counter and removed the small object allocator.
|
||
The experiments were conducted using OSX and Linux.
|
||
We considered two benchmarks: core lib compilation, and a small Lean
|
||
program attached in the end of this section.
|
||
In both platforms and benchmarks no significant difference was observed.
|
||
Then, we disable the `memory_pool` object, and again no difference
|
||
in performance was observed.
|
||
We believe the memory allocators in the C++ runtime have been improved.
|
||
This is consistent with our observation that building Lean with `tcmalloc`
|
||
does not improve the performance significantly anymore.
|
||
However, it is not clear why using std::atomic does not impact performance
|
||
anymore.
|
||
|
||
```
|
||
def foo (n : nat) : nat :=
|
||
(((list.iota n).map (+10)).map (+30)).length
|
||
|
||
#eval nat.repeat (λ i _, foo i) 4000 0
|
||
```
|