chore(tmp/lean4.md): moved to google docs
This commit is contained in:
parent
80d68a7605
commit
54b45c19b3
1 changed files with 0 additions and 502 deletions
502
tmp/lean4.md
|
|
@ -1,502 +0,0 @@
|
|||
Design notes for Lean4
|
||||
----------------------
|
||||
|
||||
# Goals
|
||||
|
||||
- Move more code from C++ to Lean.
|
||||
- New compiler and C++ code generator.
|
||||
- New runtime (support for unboxed values and FFI).
|
||||
- New parser and macro expander (in Lean).
|
||||
- New monad for accessing primitives that are only available in C++ (e.g., `type_context`).
|
||||
- Fix critical issues (e.g., issue #1601).
|
||||
- Fix language design issues.
|
||||
- Reduce clutter in the core lib and code base.
|
||||
|
||||
# Plan
|
||||
|
||||
- Create Lean4 branch
|
||||
- Disable most tests (they will be incrementally added back as we make progress on Lean4).
|
||||
- Dramatically reduce the size of core lib. We should only keep the basics that
|
||||
are needed to execute Lean programs. Remove most theorems and lemmas,
|
||||
algebraic hierarchy, all non basic tactics, etc. Motivations: reduce clutter,
|
||||
and increase agility. We will copy `library` into `old_library` and incrementally
|
||||
rebuild core lib.
|
||||
- Remove dead C++ code, and disable all but the most basic tactics.
|
||||
- Add abstraction layers to isolate modules. Example: module manager should not depend on the parser;
|
||||
equation compiler should not depend on the elaborator.
|
||||
- Make Lean object thread safe (see Memory management section for performance issues),
|
||||
and remove related clutter. Example: we will not need ts_vm_obj anymore.
|
||||
Remove unnecessary closure objects that were added for the previous C++ code generator
|
||||
that was discontinued.
|
||||
- Split tactic state into backtrackable and non-backtrackable parts using the
|
||||
new monad transformers. The plan is to have the following monads for meta programming:
|
||||
a) `elab`: it is morally state_t and except_t where the state is `elab_state`.
|
||||
`elab_state` is defined in C++, and contains: `environment`, cache data structures,
|
||||
`name_generator`, `options` and `metavar_context`. Only `metavar_context` is backtrackable, and
|
||||
we define `<|>` as
|
||||
```
|
||||
meta def elab.orelse {α : Type} (e₁ : elab α) (e₂ : elab α) : elab α :=
|
||||
⟨λ s, let mctx := s.get_mvar_ctx in
|
||||
match e₁.run s with
|
||||
| elab_result.exception α ex s' := e₂.run (s'.set_mvar_ctx mctx)
|
||||
| r := r
|
||||
end⟩
|
||||
```
|
||||
Note that, `elab_state` is used linearly in the definition above, and we just save
|
||||
`metavar_context`.
|
||||
|
||||
b) `tactic`: it is built on top of `elab` using `state_t`. In the new state we store
|
||||
the list of goals (i.e., metavariables) and the main metavariable.
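
The backtracking behaviour of `elab.orelse` can be sketched on the C++ side as follows. This is only an illustration under assumed types: `elab_state`, `metavar_context`, and the `elab` alias below are simplified, hypothetical stand-ins, not the real C++ classes. The point is that only the `metavar_context` is saved and restored, while the rest of the state is updated destructively.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for the C++ objects named in the notes.
struct metavar_context {
    std::map<std::string, std::string> assignments;  // metavariable -> term
};

struct elab_state {
    metavar_context mctx;    // backtrackable part
    unsigned next_name = 0;  // stands in for name_generator (not backtrackable)
};

// An `elab` action updates the state in place; nullopt models an exception.
template <typename A>
using elab = std::function<std::optional<A>(elab_state &)>;

// orelse: run e1; if it fails, restore only the metavar_context and run e2.
template <typename A>
elab<A> orelse(elab<A> e1, elab<A> e2) {
    return [e1, e2](elab_state & s) -> std::optional<A> {
        metavar_context saved = s.mctx;  // the only part we save
        if (std::optional<A> r = e1(s)) return r;
        s.mctx = std::move(saved);       // undo assignments made by e1
        return e2(s);                    // other state changes survive
    };
}
```

With this shape, a failing first alternative leaves its `name_generator` changes in place but none of its metavariable assignments.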
|
||||
|
||||
We will implement (most of) the C++ primitive tactics in the `elab` monad.
|
||||
One of the motivations for the approach above is that C++ and Lean code use `elab` in very similar ways.
|
||||
For example, the `type_context` object in C++ just gets a reference to the `elab_state` object.
|
||||
|
||||
Remark: `elab` monad will be defined in Lean, but we will mark it private. All tactics that
|
||||
have access to the private definition will use `elab_state` linearly. The idea is to make sure
|
||||
we can use destructive updates when implementing C++ primitive tactics. This is a big
|
||||
advantage with respect to the approach used in Lean3 where we keep creating (and deleting) `tactic_state` objects
|
||||
all over the place.
|
||||
|
||||
We have considered splitting `elab` into two monads. The first one `elab_core` would have a state
|
||||
without `metavar_context`. `elab_core` would morally be `except_t ex (state elab_core_state)`. And then,
|
||||
`elab` would be defined as `state_t metavar_context elab_core`. So, `metavar_context` would be backtrackable
|
||||
for free, and we would not need to define a custom `orelse`. This change is fine on the Lean side, but it would
|
||||
be messy when programming in C++ because we would have to continue to propagate the `metavar_context` manually as
|
||||
we do in Lean3. Recall that a few bugs in Lean3 are due to incorrect propagation of the `metavar_context`. These bugs
|
||||
will all disappear with the new design.
|
||||
We also have additional performance overhead with the `elab`+`elab_core` approach.
|
||||
For example, we want to support the following API:
|
||||
```
|
||||
mk_type_context : elab type_context
|
||||
infer_type : type_context -> expr -> elab expr
|
||||
whnf : type_context -> expr -> elab expr
|
||||
...
|
||||
```
|
||||
If we use `elab`+`elab_core` approach, the `infer_type` primitive would have to be implemented as:
|
||||
```
|
||||
vm_obj infer_type(vm_obj const & ctx, vm_obj const & e, vm_obj const & mctx, vm_obj const & s) {
|
||||
try {
|
||||
ctx.set_mctx(mctx); // copy current metavar_context to ctx
|
||||
auto r = ctx.infer(e);
|
||||
return mk_success_result(r, ctx.mctx(), s); // build result using updated metavar_context
|
||||
} catch (exception & ex) {
|
||||
....
|
||||
} }
|
||||
```
|
||||
In the `elab` only approach, we save one argument `mctx`, and we don't have to copy the `metavar_context`.
|
||||
|
||||
- Implement support objects in Lean: options, format, structured trace messages, syntax object, etc.
|
||||
- Add parser infrastructure in Lean.
|
||||
- Compiler and C++ code generator. The C++ code generator will avoid many bootstrapping
|
||||
problems we have. The idea is to write several Lean modules in Lean, emit C++ code, save
|
||||
the generated C++ code in our repo.
|
||||
- New IR with support for non uniform memory layout for Lean objects (see details on the #backend
|
||||
Slack channel).
|
||||
- Develop a tool in Lean that given a Lean inductive datatype (or structure) generates C++
|
||||
code for retrieving fields and creating Lean objects. The goal is to isolate primitives implemented
|
||||
in C++ from the way we represent Lean objects in memory. For example, most Lean functions implemented
|
||||
in C++ use the C++ function `cfield` which assumes objects have a uniform memory layout.
|
||||
- Develop a tool for generating glue code for interfacing Lean and C++ code. Again, the goal
|
||||
is to isolate the primitives from the way we handle boxed/unboxed values.
|
||||
For example, suppose we have a builtin function `foo` that takes two Lean `bool` values.
|
||||
Right now, this function takes boxed values `c_foo(vm_obj const & a, vm_obj const & b)`.
|
||||
It feels weird to have to box a Lean value to be able to invoke the builtin implementation for `foo`.
|
||||
After we have the tool, we would write `c_foo(bool a, bool b)` and describe its signature in a Lean file.
|
||||
The tool then generates the wrappers for invoking `c_foo` from the interpreter and generated C++ code.
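
A minimal sketch of what such a generated wrapper could look like is shown here. The `vm_obj` struct, `box_bool`, `unbox_bool`, and the body of `c_foo` are hypothetical placeholders chosen for illustration; only the idea (the primitive sees unboxed values, the wrapper handles boxing) comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical boxed value: a tagged word standing in for `vm_obj`.
struct vm_obj { std::uintptr_t bits; };

inline vm_obj box_bool(bool b) { return vm_obj{static_cast<std::uintptr_t>(b)}; }
inline bool unbox_bool(vm_obj const & o) {
    assert(o.bits <= 1);  // a boxed bool is either 0 or 1 in this sketch
    return o.bits != 0;
}

// Hand-written primitive: takes and returns unboxed values only.
// The body is arbitrary and only serves the example.
inline bool c_foo(bool a, bool b) { return a && !b; }

// What the tool might generate for the interpreter: unbox, call, box.
inline vm_obj c_foo_boxed(vm_obj const & a, vm_obj const & b) {
    return box_bool(c_foo(unbox_bool(a), unbox_bool(b)));
}
```

Generated C++ code that already holds unboxed `bool` values would call `c_foo` directly and skip the wrapper.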
|
||||
|
||||
# Language and library issues
|
||||
|
||||
- `private` declarations are not reliable. Users can easily subvert them
|
||||
using meta programming. This is problematic for several optimizations
|
||||
we want to use. For example, suppose we define a state-like monad
|
||||
where every primitive uses the state linearly. The code
|
||||
generator cannot rely on that since users can currently access
|
||||
the internal implementation, and use the state in a non linear way.
|
||||
This is just an example. We have many more.
|
||||
|
||||
- `parameter`s are currently simulated in Lean. For example,
|
||||
when we declare `foo` in a section with a parameter `A`,
|
||||
`foo` is automatically abstracted and an alias `foo => @foo A` is created.
|
||||
This creates many problems, most of them are documented in the issue tracker.
|
||||
This approach has one advantage: users can use the abstracted (`_root_.foo`)
|
||||
and non-abstracted (`foo`) version simultaneously. Another advantage is that
|
||||
we don't have to type check `foo` more than once in the kernel.
|
||||
That being said, the disadvantages far outweigh the advantages.
|
||||
We plan to go back to the approach used in Lean1 and Coq.
|
||||
|
||||
- Coercion resolution (see issue #1402).
|
||||
|
||||
- Name resolution for `` `[ ... ] `` tactic blocks.
|
||||
The `` `[ ... ] `` notation allows us to use interactive tactic notation
|
||||
when writing reusable tactics. This is very convenient, but the current
|
||||
implementation uses dynamic name resolution, and is a source of many
|
||||
bugs.
|
||||
|
||||
- `if-then-else` using `bool` instead of `Prop`.
|
||||
As soon as we started programming with Lean (version 3), it became clear
|
||||
that `if-then-else` with `Prop` creates more problems than it solves.
|
||||
The elaborator already has support for a coercion from `Prop` to `bool`
|
||||
(for decidable propositions). The dependent `if H : p then t else e`
|
||||
may look cute, but it is unnecessary now that we have `match`.
|
||||
|
||||
- `decidable` type class. A recurrent problem in Lean occurs when
|
||||
users perform dependent elimination on `decidable` instances.
|
||||
The problem occurs when we have `[h : decidable p]` in the context
|
||||
and a goal `G[@f p h]`, that is, a goal containing the term `@f p h`.
|
||||
Then, we perform `cases h`, and obtain `G[@f p (decidable.is_true h')]` in
|
||||
one branch and `G[@f p (decidable.is_false h')]` in another. Then, we apply a
|
||||
lemma that gives us `G[@f p h]` where `h` is the synthesized instance,
|
||||
but we cannot use it to close the goal because we get a type error.
|
||||
We have discussed this problem with Tahina and he pointed out that
|
||||
we should never perform dependent elimination on type class instances in proofs,
|
||||
and that this is an anti-idiom. He told us that every type class is a
|
||||
structure in Coq. That is, everything is wrapped in a structure.
|
||||
In Coq, they would use a custom eliminator for performing case analysis
|
||||
on decidable instances. They don't face this problem because the custom eliminator
|
||||
is more convenient to use than manually destructing the wrapper structure,
|
||||
and then the actual data. He also strongly suggested that we
|
||||
should decouple the program that computes whether a proposition is true or false
|
||||
from the proof that the result is correct. The Lean type class combines both in one single definition.
|
||||
He said this will be a problem in the future for users that want to compute in the kernel.
|
||||
The kernel computations will have to deal with these proof terms. In Coq, `decidable` is now defined as:
|
||||
```
|
||||
Class Decidable (P : Prop) := {
|
||||
Decidable_witness : bool;
|
||||
Decidable_spec : Decidable_witness = true <-> P
|
||||
}.
|
||||
```
|
||||
He strongly recommended we define `decidable` using this approach.
|
||||
He said this is not the original definition used in Coq. The first one was a structure wrapping a
|
||||
sum type (which is closer to our definition), and the Coq developers had to change it
|
||||
because of performance problems in proofs by reflection.
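
For comparison, a direct transcription of the Coq definition above into Lean 3 syntax might look like the sketch below. The name `decidable_w` and its field names are hypothetical, and this is not a committed design; it only shows the shape of a definition that separates the Boolean witness from its correctness proof.

```lean
-- Hypothetical Lean 3 counterpart of the Coq class above.
class decidable_w (p : Prop) :=
(witness : bool)
(spec    : witness = tt ↔ p)
```

Kernel computation would then only need to evaluate `witness`, and the proof `spec` would be consulted only when the result is used in a proof.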
|
||||
|
||||
- Interactive tactics. In Lean4 we will have a much more extensible and flexible
|
||||
parser, and it will be written in Lean. We will be able to write a custom parser for the tactic
|
||||
interactive mode. So, the current argument-type driven parser we use will not
|
||||
be needed anymore.
|
||||
|
||||
- Quotations. Do we need all of them? I find the one with three backticks very inconvenient to use.
|
||||
Moreover, as soon as we implement the new parser, we will want quotations for building
|
||||
the new syntactic object and Lean expressions.
|
||||
|
||||
# Compiler
|
||||
|
||||
- The new compiler will use a System-F like intermediate representation.
|
||||
It will be similar to Haskell core language. Inductive datatypes will be represented
|
||||
using a constant for each constructor and a `cases` eliminator. If `cases` is encoded
|
||||
using an expr-macro, we can easily support a `default/other` case.
|
||||
|
||||
- Code inlining will occur at the System-F level after we have applied
|
||||
simplifications. This is relevant for the performance issues we have
|
||||
observed when a long chain of functions needs to be unfolded (e.g., new
|
||||
monad transformer library).
|
||||
|
||||
- The first compilation step applies compiler specific simplification rules provided by users.
|
||||
For example, we will be able to mark `map g (map f l) = map (g o f) l` as an optimization
|
||||
rule for the compiler.
|
||||
|
||||
Issue: many opportunities for applying simplification rules only appear after we have inlined
|
||||
definitions. However, we want to inline after we have converted into System-F and have
|
||||
erased computationally irrelevant code and applied basic simplifications (e.g., erase trivial structures).
|
||||
This is a problem since user provided simplification rules are not applicable here since they have
|
||||
been described at the Lean level.
|
||||
|
||||
- Basic types (scalars, bool, char, uint32, uint64, int64, int32, ...) and C++ types
|
||||
can be stored in unboxed form. The unboxed versions are prefixed with `#`, echoing the `#` marker used
|
||||
in Haskell.
|
||||
|
||||
- When we convert a Lean function to System-F, we will generate two versions: boxed and unboxed.
|
||||
The boxed version is needed when passing this function to polymorphic higher-order functions.
|
||||
As in Haskell, polymorphic functions always take boxed values.
|
||||
Both versions are stored in the Lean environment as `meta` functions.
|
||||
Example: the function `def inc (a : int32) := a + 1` is converted into two versions
|
||||
`meta def _SystemF.boxed.inc (a : int32) := a + 1`, and `meta def _SystemF.unboxed.inc (a : #int32) := a #+ #1`.
|
||||
Now, suppose we want to compile `twice inc a` where `def twice {A : Type} (f : A -> A) (a : A) := f (f a)`.
|
||||
Then, since `twice` is polymorphic, we need to pass `inc` boxed version, and we generate
|
||||
`@_SystemF.boxed.twice int32 _SystemF.boxed.inc a`.
|
||||
|
||||
- We want to implement monomorphisation as an additional optimization step. The idea is to specialize functions like `twice`.
|
||||
In the previous example, monomorphisation would generate `@_SystemF.unboxed.twice_int32 (f : #int32 -> #int32) (a : #int32)`.
|
||||
We are considering caching monomorphised functions into the .olean files. If we do this, we have to consider the situation
|
||||
where more than one .olean contains the same monomorphised function. We see two options: we have a canonical way to generate
|
||||
names for monomorphised functions; or we generate unique names, and accept the fact that the environment will contain duplicates.
|
||||
It is just a space issue.
|
||||
Remark: see the comment in the `Closures` bullet point below. We may not use unboxed values in closures.
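
The difference between the boxed, unboxed, and monomorphised versions of `inc` and `twice` can be illustrated with the following C++ sketch. The `boxed` type (a shared heap cell) and all function names are assumptions made for the example; the real runtime representation is described in the memory layout section.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical boxed representation: a heap cell holding the scalar.
using boxed = std::shared_ptr<const std::int32_t>;

inline boxed box(std::int32_t v) { return std::make_shared<const std::int32_t>(v); }
inline std::int32_t unbox(boxed const & b) {
    assert(b != nullptr);
    return *b;
}

// Plays the role of _SystemF.unboxed.inc: works on the machine integer.
inline std::int32_t inc_unboxed(std::int32_t a) { return a + 1; }

// Plays the role of _SystemF.boxed.inc: same function, uniform representation.
inline boxed inc_boxed(boxed const & a) { return box(inc_unboxed(unbox(a))); }

// Plays the role of _SystemF.boxed.twice: polymorphic, sees only boxed values.
inline boxed twice_boxed(boxed (*f)(boxed const &), boxed const & a) {
    return f(f(a));
}

// What monomorphisation could produce for A = int32: no boxing at all.
inline std::int32_t twice_int32(std::int32_t (*f)(std::int32_t), std::int32_t a) {
    return f(f(a));
}
```

In this sketch `twice_boxed(inc_boxed, box(40))` allocates a cell per call to `inc_boxed`, while `twice_int32(inc_unboxed, 40)` performs no allocation.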
|
||||
|
||||
- In Lean3, `name`, `level` and `expr` are all implemented in C++. To expose these objects in Lean, we have to wrap them
|
||||
using a subclass of `vm_external`. This generates a significant performance overhead. For example, suppose we have a Lean
|
||||
function that traverses a big expression; for each visited expression `e`, we need to create a `vm_expr` object for wrapping `e`.
|
||||
Moreover, we have two layers of reference counting: one at `expr` and another at `vm_obj`.
|
||||
This design decision is fine in Lean3 because the most expensive tasks are implemented in C++, and Lean code is only
|
||||
used to "glue" together existing procedures implemented in C++. This is not the case in Lean4.
|
||||
So, in Lean4, all these objects will be implemented directly in Lean.
|
||||
As described above, we will have a tool that will generate C++ functions for creating and accessing these objects.
|
||||
We believe this will not affect much how we code in C++, and it will eliminate a lot of boilerplate code we currently use.
|
||||
|
||||
# Object memory layout
|
||||
|
||||
- Constructor objects: they will contain pointers and unboxed data. We will use an all-pointers-first approach, and the header will contain 8 bytes:
|
||||
a) reference counter: 4 bytes
|
||||
b) kind: 1 byte
|
||||
c) tag (aka constructor index): 1 byte
|
||||
d) size (aka number of pointer objects): 2 bytes
|
||||
After the header we have `size` pointers and then all unboxed data. In debug mode, we want to store the size of the space used for
|
||||
unboxed data, and use this information for implementing sanity checks.
|
||||
Remark: the header of all composite Lean objects starts with the reference counter and `kind`.
|
||||
Note that this representation supports only inductive datatypes with at most 256 constructors and 2^16 pointer fields.
|
||||
This is sufficient for our needs.
|
||||
If one day we want to support inductive datatypes with more than 256 constructors and/or 2^16 pointer fields, we can add
|
||||
a new kind of constructor object, and add new opcodes for manipulating them. We say this new kind is a "fat" constructor object
|
||||
since its header is bigger.
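
The 8-byte header and the pointers-first layout can be written down as a C++ struct. This is a sketch: the struct and helper names are hypothetical, and the offsets assume the usual alignment rules of mainstream ABIs (checked by the `static_assert`s).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header matching the 8-byte layout listed above.
struct object_header {
    std::uint32_t rc;    // a) reference counter: 4 bytes
    std::uint8_t  kind;  // b) kind: 1 byte
    std::uint8_t  tag;   // c) constructor index: 1 byte
    std::uint16_t size;  // d) number of pointer fields: 2 bytes
};
static_assert(sizeof(object_header) == 8, "header must be exactly 8 bytes");
static_assert(offsetof(object_header, kind) == 4, "kind follows the counter");

// Byte offset of the i-th pointer field (pointers come first).
inline std::size_t ptr_offset(object_header const & h, std::size_t i) {
    assert(i < h.size);  // debug-mode sanity check
    return sizeof(object_header) + i * sizeof(void *);
}

// Byte offset of unboxed data located `off` bytes into the scalar area,
// which starts right after the `size` pointer fields.
inline std::size_t scalar_offset(object_header const & h, std::size_t off) {
    return sizeof(object_header) + h.size * sizeof(void *) + off;
}
```

A debug build could additionally store the size of the scalar area and check `off` against it, as suggested above.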
|
||||
|
||||
- Array: we support arrays of pointers and arrays of unboxed data. We will have opcodes for reading and updating arrays.
|
||||
The updates are destructive when the array reference counter is 1.
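
The "destructive when the reference counter is 1" rule can be sketched with `std::shared_ptr`, whose `use_count()` stands in for the Lean reference counter. The `array` alias and `array_write` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical array object; use_count() plays the role of the counter.
using array = std::shared_ptr<std::vector<int>>;

// Consumes `a` and returns an array equal to `a` except at index `i`.
inline array array_write(array a, std::size_t i, int v) {
    assert(a != nullptr && i < a->size());
    if (a.use_count() != 1) {
        // Shared: other owners must not observe the update, so copy first.
        a = std::make_shared<std::vector<int>>(*a);
    }
    (*a)[i] = v;  // unique at this point: update in place
    return a;
}
```

Passing the array with `std::move` models a caller that gives up its reference, which is what makes the in-place path reachable.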
|
||||
|
||||
- Closures: we need two kinds of closures: one that stores the bytecode id (for interpreted code); and another that stores a function pointer (for compiled code).
|
||||
We need to decide whether we will support closures that store unboxed data or not. The simple solution is to support only boxed data
|
||||
in closures. This may not be a huge performance overhead since many higher order functions such as `map` and `fold` will
|
||||
be specialized during the monomorphisation step, and we will not even create the closures.
|
||||
A possible compromise is to support unboxed data only for closures that store function pointers. The idea is the following, whenever we emit C++ code for a function,
|
||||
we also generate a `run` function for it s.t. given the closure data, it invokes the function. Then, we store the pointer to the `run` function in the closure.
|
||||
For example, suppose we have generated
|
||||
```
|
||||
vm_obj foo(bool b1, vm_obj o, bool b2) { ... }
|
||||
```
|
||||
Then, we would also generate
|
||||
```
|
||||
vm_obj run_foo(closure_data d) {
|
||||
return foo(get_closure_bool(8, d), get_closure_obj(0, d), get_closure_bool(9, d));
|
||||
}
|
||||
```
|
||||
`get_closure_bool(offset, d)` retrieves the Boolean stored at the given offset in `d`.
|
||||
Note that, as in constructor objects, unboxed data is stored after pointers at `closure_data`.
|
||||
So, we use `get_closure_bool(8, d)` to retrieve `b1` which is stored after `o` (the pointer to `o` consumes 8 bytes) in `d`.
|
||||
It is not clear this approach is a good one since we would always need to create a `closure_data` object that contains all arguments
|
||||
before executing a closure. It is not clear how to support the optimization we use in Lean3 that avoids the allocation of the last `closure_data`
|
||||
in most cases (https://github.com/leanprover/lean/blob/master/src/library/vm/vm.cpp#L1698).
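
The `run_foo` idea can be made concrete with the sketch below. Everything here is an assumption made for illustration: `closure_data` is modelled as a byte buffer, `vm_obj` is a toy struct, `foo` returns an `int` instead of a `vm_obj`, and the scalar area starts at `sizeof(vm_obj *)` (8 on a 64-bit target, matching the offsets used above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct vm_obj { int payload; };  // hypothetical boxed object

// closure_data: pointer fields first, then unboxed bytes.
struct closure_data { std::vector<unsigned char> bytes; };

inline vm_obj * get_closure_obj(std::size_t off, closure_data const & d) {
    vm_obj * p;
    assert(off + sizeof p <= d.bytes.size());
    std::memcpy(&p, d.bytes.data() + off, sizeof p);
    return p;
}

inline bool get_closure_bool(std::size_t off, closure_data const & d) {
    assert(off < d.bytes.size());
    return d.bytes[off] != 0;
}

// Compiled function with mixed boxed/unboxed parameters (toy body).
inline int foo(bool b1, vm_obj * o, bool b2) {
    return (b1 ? 1 : 0) + o->payload + (b2 ? 10 : 0);
}

// Generated alongside `foo`: unpacks the closure data and calls it.
inline int run_foo(closure_data const & d) {
    const std::size_t scalars = sizeof(vm_obj *);  // scalar area follows `o`
    return foo(get_closure_bool(scalars, d), get_closure_obj(0, d),
               get_closure_bool(scalars + 1, d));
}

// Packs all three arguments using the same layout.
inline closure_data mk_foo_closure(bool b1, vm_obj * o, bool b2) {
    closure_data d;
    d.bytes.resize(sizeof o + 2);
    std::memcpy(d.bytes.data(), &o, sizeof o);
    d.bytes[sizeof o] = b1;
    d.bytes[sizeof o + 1] = b2;
    return d;
}
```

The sketch also makes the stated drawback visible: `run_foo` can only run once a `closure_data` holding every argument has been built.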
|
||||
|
||||
- MPZ (multiprecision integers)
|
||||
|
||||
- String. Remark: internally it is not an array of char, but an array of bytes encoded in UTF8. The key difference with respect to Lean3 is that it will
|
||||
not use the `vm_external` wrapper approach, but have an object kind for strings.
|
||||
|
||||
# IR
|
||||
|
||||
- Register based.
|
||||
- Explicit reference counting instructions. We use reference counting for: composite objects, closures and
|
||||
C++ unboxed data (e.g., `expr`). Remark: for each C++ primitive the VM needs to know how to increase/decrease the reference counter.
|
||||
- Support for unboxed values.
|
||||
- Instructions for accessing unboxed values in non-uniform structures.
|
||||
We will have instructions for operations such as `get_scalar_<sz>(obj, offset)`,
|
||||
where `obj` is a (potentially non-uniform) Lean object, `offset` is the offset
|
||||
inside of this object, and `sz` is the number of bytes needed to store the object.
|
||||
In practice, we would have `get_scalar_1`, `get_scalar_2`, `get_scalar_4` and `get_scalar_8`.
|
||||
These scalars are unboxed. We would have registers for storing the different kinds of scalars,
|
||||
and basic operations on them (e.g., comparison, arithmetic, etc). So, we will
|
||||
have instructions such as `GETS_<sz> r_o r_i offset` where `r_o` and `r_i` are registers, and
|
||||
it corresponds to `r_o := get_scalar_<sz>(r_i, offset)`. Moreover, `r_o` must be a scalar
|
||||
register of size `sz` and `r_i` is a register for storing Lean objects.
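
As an illustration of the `get_scalar_<sz>` family, the following sketch reads a scalar of a given size from a byte offset inside an object. The raw `unsigned char const *` view of the object and the template helper are assumptions for the example, not the planned VM interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a scalar of type T stored `offset` bytes into `obj`.
template <typename T>
inline T get_scalar(unsigned char const * obj, std::size_t offset) {
    static_assert(sizeof(T) == 1 || sizeof(T) == 2 ||
                  sizeof(T) == 4 || sizeof(T) == 8,
                  "only 1, 2, 4 and 8 byte scalars are supported");
    assert(obj != nullptr);
    T r;
    std::memcpy(&r, obj + offset, sizeof(T));  // safe for unaligned offsets
    return r;
}

// The four concrete operations mentioned in the text.
inline std::uint8_t  get_scalar_1(unsigned char const * o, std::size_t off) { return get_scalar<std::uint8_t>(o, off); }
inline std::uint16_t get_scalar_2(unsigned char const * o, std::size_t off) { return get_scalar<std::uint16_t>(o, off); }
inline std::uint32_t get_scalar_4(unsigned char const * o, std::size_t off) { return get_scalar<std::uint32_t>(o, off); }
inline std::uint64_t get_scalar_8(unsigned char const * o, std::size_t off) { return get_scalar<std::uint64_t>(o, off); }
```

An instruction such as `GETS_4 r_o r_i offset` would then amount to `r_o = get_scalar_4(r_i, offset)` with `r_o` a 4-byte scalar register.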
|
||||
|
||||
Remark: for each constructor datatype, we will have a table that maps fields to the
|
||||
operation needed to retrieve them. We will use this table when converting the SystemF
|
||||
representation into the IR.
|
||||
|
||||
Open issue: should we use SSA or SIL?
|
||||
BTW, most of the benefits of SSA/SIL seem to be irrelevant for Lean.
|
||||
The paper https://arxiv.org/pdf/1507.05762.pdf sent by Nuno describes the pros/cons for SSA.
|
||||
Most of them seem to be related to static analysis procedures. I think we will need very few
|
||||
static analysis steps, most optimizations will be implemented at System-F before we convert the code into IR.
|
||||
|
||||
# VM
|
||||
|
||||
- We need a new VM for the new IR.
|
||||
|
||||
- The VM should be able to invoke primitives hand written in C++ and
|
||||
C++ code emitted by the Lean compiler.
|
||||
|
||||
- A few hand written C++ primitives and C++ code emitted by the Lean compiler
|
||||
need to invoke Lean functions. We should be careful to avoid a mismatch here where
|
||||
a C++ function F for Lean version `X` is trying to invoke bytecode for
|
||||
a Lean function G for version `X+1`. If we allow this to happen the system
|
||||
may crash because the data representation for version `X+1` may be
|
||||
different from version `X`.
|
||||
|
||||
We can try to address this issue by breaking core lib into two parts.
|
||||
The first part (bootstrapping) contains all the infrastructure needed by the parser, compiler,
|
||||
tactic framework, and Lean runtime. If we make a change here, we should
|
||||
compile it again using the previously emitted C++ code, and then generate
|
||||
a new version of the C++ code, compile it, and check whether it works or not.
|
||||
In principle, it is not safe to invoke bytecode generated during the current compilation
|
||||
from previously emitted C++ code since they may be using different representations.
|
||||
Of course, the changes may be harmless, but to avoid problems we should minimize
|
||||
the number of tactics used in this part of the core lib. Ideally, tactics should
|
||||
not be used in the bootstrapping part.
|
||||
|
||||
We may also emit C++ code for non essential functionality that is implemented in Lean,
|
||||
and then link it with the Lean executable. Example: a decision procedure, a parser extension.
|
||||
The idea is to provide a more efficient version to users. Here we can use a more
|
||||
relaxed approach since this functionality is not part of the compiler. We can store a hash code
|
||||
for each of these functions. When we import the .olean file that generated the function
|
||||
we compare whether the hash code there matches the one in the emitted C++ code.
|
||||
If it does, we use the C++ version, otherwise we use the bytecode.
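
The hash check described above could be sketched as follows. The table type, the `impl` enum, and `select_impl` are hypothetical names; how the hash codes are computed and stored is left open by these notes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

enum class impl { native, bytecode };

// Hypothetical table filled in when emitted C++ code is linked into the
// executable: function name -> hash code recorded at code generation time.
using native_table = std::unordered_map<std::string, std::uint64_t>;

// Chooses the implementation to use for `fn` when its .olean file is imported.
inline impl select_impl(native_table const & linked, std::string const & fn,
                        std::uint64_t olean_hash) {
    auto it = linked.find(fn);
    if (it != linked.end() && it->second == olean_hash)
        return impl::native;   // emitted C++ matches the imported definition
    return impl::bytecode;     // missing or stale: fall back to bytecode
}
```

A stale entry is treated exactly like a missing one, so an out-of-date native function is never invoked.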
|
||||
|
||||
# Explicit reference counting
|
||||
|
||||
As described above, we will have opcodes for increasing/decreasing `vm_obj`s reference counters.
|
||||
In Lean3, we perform too many unnecessary increments/decrements.
|
||||
A function is responsible for decreasing the reference counter of each argument (i.e., it consumes the argument), and
|
||||
for increasing the reference counter of the result. Each argument should be viewed as a resource that is consumed by the function.
|
||||
Many optimizations are possible. For example, consider the function
|
||||
```
|
||||
def proj2 (a : A) (b : B) := b
|
||||
```
|
||||
A naive compilation into bytecode would produce:
|
||||
```
|
||||
def proj2 a b :=
|
||||
r := b;
|
||||
inc r;
|
||||
dec a;
|
||||
dec b;
|
||||
return r
|
||||
```
|
||||
This can be optimized as
|
||||
```
|
||||
def proj2 a b :=
|
||||
dec a;
|
||||
return b
|
||||
```
|
||||
We need to explicitly decrement `a` because it was not used by `proj2`.
|
||||
The function
|
||||
```
|
||||
def ex1 (x : X) (y : Y) :=
|
||||
let v1 := f x x,
|
||||
v2 := f x y
|
||||
in h v1 v2
|
||||
```
|
||||
is compiled as
|
||||
```
|
||||
def ex1 x y :=
|
||||
inc x;
|
||||
inc x;
|
||||
v1 := f x x;
|
||||
v2 := f x y;
|
||||
r := h v1 v2;
|
||||
return r
|
||||
```
|
||||
The reference counter for `x` is incremented twice because it is used 3 times in this function.
|
||||
Note that, we don't have to decrement `v1` nor `v2` since they are consumed by `h`.
|
||||
We can add an instruction `inc2 x` for executing `inc x; inc x` in a single step.
|
||||
Now consider a map-like function
|
||||
```
|
||||
def my_map : list A -> list A
|
||||
| [] := []
|
||||
| (h::t) := g h :: my_map t
|
||||
```
|
||||
This function will be compiled as
|
||||
```
|
||||
def my_map t :=
|
||||
switch (cidx t) {
|
||||
case 0:
|
||||
return t;
|
||||
case 1:
|
||||
h1 := get#0 t;
|
||||
t1 := get#1 t;
|
||||
dec t;
|
||||
h2 := g h1;
|
||||
t2 := my_map t1;
|
||||
r := mk_list_cons h2 t2;
|
||||
return r
|
||||
}
|
||||
```
|
||||
The instruction `get#<idx> t` returns the field `idx` of the object `t`. It bumps the reference counter of the resulting object, but does not update the reference counter of `t`.
|
||||
`mk_list_cons h2 t2` is a function that creates a constructor object with tag `#1` and two pointers `h2` and `t2`. We don't need to decrement
|
||||
the reference counters of `h2` and `t2` since they were "consumed" by `mk_list_cons`. We can inline `mk_list_cons` too.
|
||||
We can also add a special `dec` instruction for reusing memory cells, and write `my_map` as:
|
||||
```
|
||||
def my_map t :=
|
||||
switch (cidx t) {
|
||||
case 0:
|
||||
return t;
|
||||
case 1:
|
||||
h1 := get#0 t;
|
||||
t1 := get#1 t;
|
||||
t_cell := dec_core t;
|
||||
h2 := g h1;
|
||||
t2 := my_map t1;
|
||||
r := mk_list_cons_reusing h2 t2 t_cell;
|
||||
return r
|
||||
}
|
||||
```
|
||||
The `dec_core t` instruction decrements the reference counter of `t`, but does not delete the memory cell if the counter reaches zero.
|
||||
Then, `mk_list_cons_reusing` will reuse `t`'s memory cell if the counter is 0, and will create a new cell otherwise.
|
||||
With this trick, `my_map` will not allocate any constructor object if the input list is not shared.
|
||||
To avoid memory leaks, we have to make sure that in each path after `dec_core t`, `t_cell` is used in a `mk_*_reusing` function
|
||||
and/or we explicitly delete it using `del t_cell`.
|
||||
When the reference counter of `t` is one in `case 1`, the `get#0 t` and `get#1 t` unnecessarily increase the reference counter of the result value just to decrease it again at `dec_core t`. We can avoid this overhead by using the following alternative formulation
|
||||
```
|
||||
def my_map t :=
|
||||
switch (cidx t) {
|
||||
case 0:
|
||||
return t;
|
||||
case 1:
|
||||
if (ref_count t == 1) {
|
||||
h1 := steal#0 t;
|
||||
t1 := steal#1 t;
|
||||
t_cell := t;
|
||||
} else {
|
||||
h1 := get#0 t;
|
||||
t1 := get#1 t;
|
||||
dec t;
|
||||
t_cell := 0;
|
||||
}
|
||||
h2 := g h1;
|
||||
t2 := my_map t1;
|
||||
r := mk_constructor_reusing #1 2 t_cell;
|
||||
set#0 r h2;
|
||||
set#1 r t2;
|
||||
return r
|
||||
}
|
||||
```
|
||||
The instruction `steal#<idx> t` is similar to `get#<idx> t`, but it does not increase the reference counter of the resulting object.
|
||||
`mk_constructor_reusing #1 2 t_cell` creates a constructor object with tag `#1` and size 2, reusing `t_cell` if it is different from 0. The instructions `set#<idx>` are used to initialize the resulting fields.
|
||||
The optimization above can be used whenever `t` is dead after the `case`, and an object of the same size is created.
|
||||
Note that if `t` is a list of arrays and it is not shared, then `g h1` will also be able to perform destructive updates.
|
||||
Remark: suppose the `i`-th field in `case` branch is not used, then instead of using `steal#i t`, we use `dec#i t` to decrement the reference counter of the `i`-th field.
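
The cell-reuse formulation of `my_map` can be tried out with the C++ sketch below. It is a simplification under stated assumptions: list heads are plain `int`s (so only the tail needs reference counting), `g` is an arbitrary function, and `cell`, `mk_cons`, `inc`, and `dec` are hypothetical helpers rather than the planned runtime API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical list cell; nullptr represents the empty list.
struct cell { std::uint32_t rc; int head; cell * tail; };

inline cell * mk_cons(int h, cell * t) { return new cell{1, h, t}; }
inline void inc(cell * c) { if (c) ++c->rc; }
inline void dec(cell * c) {
    // Release a reference; free cells (and their tails) that become dead.
    while (c && --c->rc == 0) { cell * t = c->tail; delete c; c = t; }
}

inline int g(int h) { return h + 1; }  // arbitrary element function

// Consumes one reference to `t`, returns a list owned by the caller.
inline cell * my_map(cell * t) {
    if (!t) return nullptr;            // case 0
    assert(t->rc > 0);
    int h1 = t->head;                  // scalar field: no counter to update
    cell * t1;
    cell * t_cell;
    if (t->rc == 1) {                  // unique: steal the tail, keep the cell
        t1 = t->tail;                  // steal#1
        t_cell = t;
    } else {                           // shared: take our own reference
        t1 = t->tail;                  // get#1
        inc(t1);
        --t->rc;                       // dec t (cannot reach zero here)
        t_cell = nullptr;
    }
    int h2 = g(h1);
    cell * t2 = my_map(t1);
    if (t_cell) {                      // reuse the old cell; its rc is still 1
        t_cell->head = h2;             // set#0
        t_cell->tail = t2;             // set#1
        return t_cell;
    }
    return mk_cons(h2, t2);
}
```

When the input list is unshared, the result occupies exactly the cells of the input; when it is shared, new cells are allocated and the original list is left untouched.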
|
||||
|
||||
TODO: create experiments to check whether the optimization above is relevant or not.
|
||||
|
||||
# Tail recursion
|
||||
|
||||
TODO
|
||||
|
||||
# Unboxed products and sums
|
||||
|
||||
TODO
|
||||
|
||||
# Memory management
|
||||
|
||||
Lean3 VM objects are not thread safe: they do not use atomic
|
||||
operations for updating the reference counter, and we use a small
|
||||
object memory allocator. The main motivation was performance.
|
||||
We evaluated these design decisions again, and did not observe any
|
||||
performance impact when we used atomic operations for updating
|
||||
the reference counter and removed the small object allocator.
|
||||
The experiments were conducted using OSX and Linux.
|
||||
We considered two benchmarks: core lib compilation, and a small Lean
|
||||
program attached at the end of this section.
|
||||
On both platforms and benchmarks no significant difference was observed.
|
||||
Then, we disabled the `memory_pool` object, and again no difference
|
||||
in performance was observed.
|
||||
We believe the memory allocators in the C++ runtime have been improved.
|
||||
This is consistent with our observation that building Lean with `tcmalloc`
|
||||
does not improve the performance significantly anymore.
|
||||
However, it is not clear why using std::atomic does not impact performance
|
||||
anymore.
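
For reference, a thread-safe counter of the kind evaluated above could look like the sketch below. The `object` struct and the choice of memory orders are assumptions for illustration; they are a common pattern for atomic reference counting, not a description of the code used in the experiments.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical object with an atomic reference counter.
struct object {
    std::atomic<std::uint32_t> rc{1};
};

// Taking a new reference needs no ordering with other memory operations.
inline void inc_ref(object & o) {
    o.rc.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the caller released the last reference, in which case
// the caller is responsible for freeing the object.
inline bool dec_ref(object & o) {
    std::uint32_t old = o.rc.fetch_sub(1, std::memory_order_acq_rel);
    assert(old > 0);  // a zero counter here would indicate a double release
    return old == 1;
}
```

The non-atomic variant would replace `std::atomic<std::uint32_t>` with a plain `std::uint32_t` and use ordinary increments and decrements.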
|
||||
|
||||
```
|
||||
def foo (n : nat) : nat :=
|
||||
(((list.iota n).map (+10)).map (+30)).length
|
||||
|
||||
#eval nat.repeat (λ i _, foo i) 4000 0
|
||||
```
|
||||