This feature was originally added to implement the state monad.
It is not needed anymore since we have a better encoding that doesn't
require this feature and avoids memory allocations.
This transformation is useful for caching the construction of closures
at runtime. For example, consider the following piece of code
```
λ α a,
@tree.cases_on a visit._main._lambda_1
(λ a_a a_a_1 a_a_2,
let _x_1 := visit._main ◾ a_a,
_x_2 := visit._main._lambda_5 a_a_1 a_a_2
in bind._main ◾◾◾◾ _x_1 _x_2)
```
where `visit._main._lambda_1` is of the form
```
visit._main._lambda_1 :=
λ _x, ...
```
At runtime, we will create a closure object for `visit._main._lambda_1`
since it has arity 1, but no arguments have been provided.
This commit implements a new transformation that creates an auxiliary
declaration with arity 0.
```
visit._main._closed_1 :=
visit._main._lambda_1
```
Its value is cached by the runtime. That is, the closure is created only
once.
@kha This optimizations reduces the number of closures by another 200k
at `parser1.lean`. We are now under 2million :)
`uint32` is a definition, and `type_checker::whnf` unfolds it.
To preserve the information at `erase_irrelevant`, we use a custom
`whnf_type` method that stops reduction as soon as a builtin runtime
type is found.
A few weeks ago, it was not feasible to inline `bind_mk_res` and
`orelse_mk_res` because the compilation time would increase a lot.
Since then I have improved the heuristics for deciding whether to float
`cases_on` or not.
So, I tried today to mark them with `@[inline]` again.
The corelib build time increased only 1.2 secs, but the `parser1.lean` runtime improved:
Before:
```
num. allocated objects: 18025367
num. allocated closures: 2988476
```
After:
```
num. allocated objects: 15774515
num. allocated closures: 2488695
```
I used my desktop to collect the numbers above.
We do not try to check whether code generation will succeed or not for
some declaration. In the future, we should probably rename it to
`nocode` or something similar.
cc @kha
We must make sure we do not accidentally change the arity of a join
point. The arity is the number of nested lambda expressions.
For example, suppose we have
```
let jp := fun (x : unit), t
```
where `jp` is a join point of arity 1, i.e., `t` is not a lambda.
All "jumps" will be of the form: `jp ()`.
Now, suppose we simplify `t` and it becomes a lambda `fun (y : nat), y`.
We should to represent `jp` as
```
let jp := fun (x : unit) (y : nat), y
```
Because we would be changing `jp`'s arity.
We have the same problem with `cases_on` applications in LCNF.
So, we fix the problem using the same approach: an auxiliary
`let`-declaration. The simplified join point above is encoded as
```
let jp := fun (x : unit),
let _z := fun (y : nat), y
in _z
```
cc @kha This is the bug that I mentioned on slack :)
We want them to be specialized for a given monad stack, but not
inlined. If we inline them, then every occurrence of `whitespace` and
`num` will specialize the nested `take_while?` application.
This is bad since we don't cache them.
Both `alternative` and `monad` implement `applicative`. However,
their default implementations for `seq_right` and `seq_left` are
different. The `alternative` implementation uses the inefficient default
version for `seq_right` available at `applicative`:
```
(seq_right := λ α β a b, const α id <$> a <*> b)
```
instead of the more efficient
```
(seq_right := λ α β x y, x >>= λ _, y)
```
defined at `monad` using the `bind` operator.
This commit makes sure the `applicative` instances for `reader_t`,
`state_t`, `option` and `parsec_t` use the efficient version.
I found the problem when inspecting the generated code for:
```
def symbol (s : string) : parsec' unit :=
(str s *> whitespace) <?> ("'" ++ s ++ "'")
```
The datastructures at `local_context` used to manage used user_names
introduce a lot of overhead. They do guarantee that `get_unused_name` is
`O(log(n))`, but they slowdown much more common operations such as:
local declaration creation/deletion. We create/delete local declarations
much more often than we use `get_unused_name`.
The corelib build time is now 34.18 secs on my desktop. It was 39.5 secs.