This PR adds the directory `Meta/DiscrTree` and reorganizes the code
into different files. Motivation: we are going to have new functions for
retrieving simplification theorems for the new structural simplifier.
This PR internalizes all arguments of Quot.lift during LCNF conversion,
preventing panics in certain
non trivial programs that use quotients.
Fixes#11719.
This PR enables the specializer to also recursively specialize in some
non trivial higher order situations.
The main motivation for this change is the upcoming changes to do
notation by sgraf. In there he uses combinators such as
```lean
@[specialize, expose]
def List.newForIn {α β γ} (l : List α) (b : β) (kcons : α → (β → γ) → β → γ) (knil : β → γ) : γ :=
match l with
| [] => knil b
| a :: l => kcons a (l.newForIn · kcons knil) b
```
in programs such as
```lean
def testing :=
let x := 42;
List.newForIn (β := Nat) (γ := Id Nat)
[1,2,3]
x
(fun i kcontinue s =>
let x := s;
List.newForIn
[i:10].toList x
(fun j kcontinue s =>
let x := s;
let x := x + i + j;
kcontinue x)
kcontinue)
pure
```
inspecting this IR right before we get to the specializer in the current
compiler we get:
```
[Compiler.eagerLambdaLifting] size: 22
def testing : Nat :=
fun _f.1 _y.2 : Nat :=
return _y.2;
let x := 42;
let _x.3 := 1;
fun _f.4 i kcontinue s : Nat :=
fun _f.5 j kcontinue s : Nat :=
let _x.6 := Nat.add s i;
let x := Nat.add _x.6 j;
let _x.7 := kcontinue x;
return _x.7;
let _x.8 := 10;
let _x.9 := Nat.sub _x.8 i;
let _x.10 := Nat.add _x.9 _x.3;
let _x.11 := 1;
let _x.12 := Nat.sub _x.10 _x.11;
let _x.13 := Nat.mul _x.3 _x.12;
let _x.14 := Nat.add i _x.13;
let _x.15 := @List.nil _;
let _x.16 := List.range'TR.go _x.3 _x.12 _x.14 _x.15;
let _x.17 := @List.newForIn _ _ _ _x.16 s _f.5 kcontinue;
return _x.17;
let _x.18 := 2;
let _x.19 := 3;
let _x.20 := @List.nil _;
let _x.21 := @List.cons _ _x.19 _x.20;
let _x.22 := @List.cons _ _x.18 _x.21;
let _x.23 := @List.cons _ _x.3 _x.22;
let _x.24 := @List.newForIn _ _ _ _x.23 x _f.4 _f.1;
return _x.24
```
Here the `kcontinue` higher order functions pose a special challenge
because they delay the discovery of new specialization opportunities.
Inspecting the IR after the current specializer (and a cleanup simp
step) we get functions that look as follows:
```
[simp] size: 7
def List.newForIn._at_.testing.spec_0 i kcontinue l b : Nat :=
cases l : Nat
| List.nil =>
let _x.1 := kcontinue b;
return _x.1
| List.cons head.2 tail.3 =>
let _x.4 := Nat.add b i;
let x := Nat.add _x.4 head.2;
let _x.5 := List.newForIn._at_.testing.spec_0 i kcontinue tail.3 x;
return _x.5
[simp] size: 14
def List.newForIn._at_.List.newForIn._at_.testing.spec_1.spec_1 _x.1 l b : Nat :=
cases l : Nat
| List.nil =>
return b
| List.cons head.2 tail.3 =>
fun _f.4 x.5 : Nat :=
let _x.6 := List.newForIn._at_.List.newForIn._at_.testing.spec_1.spec_1 _x.1 tail.3 x.5;
return _x.6;
let _x.7 := 10;
let _x.8 := Nat.sub _x.7 head.2;
let _x.9 := Nat.add _x.8 _x.1;
let _x.10 := 1;
let _x.11 := Nat.sub _x.9 _x.10;
let _x.12 := Nat.mul _x.1 _x.11;
let _x.13 := Nat.add head.2 _x.12;
let _x.14 := @List.nil _;
let _x.15 := List.range'TR.go _x.1 _x.11 _x.13 _x.14;
let _x.16 := List.newForIn._at_.testing.spec_0 head.2 _f.4 _x.15 b;
return _x.16
```
Observe that the specializer decided to abstract over `kcontinue`
instead of specializing further recursively. Thus this tight loop is now
going through an indirect call.
This PR now changes the specializer somewhat fundamentally to handle
situations like this. The most notable change is going to a fixpoint
loop of:
1. Specialize all current declarations in the worklist
2. If a declaration
- succeeded in specializing run the simplifier on it and put it back
onto the worklist
- if it didn't don't put it back onto the worklist anymore
3. Put all newly generated specialisations on the worklist
4. Recompute fixed parameters for the current SCC
5. Repeat until the worklist is empty
Furthermore, declarations that were already specialized:
- only consider `fixedHO` parameters for specialization, in order to
avoid termination issues with repeated specialization and abstraction of
type class parameters under binders
- recursively specialized declarations only allow specialization if at
least one of their fixedHO arguments is not a parameter itself. The
reason for allowing this in first generation specialization is that we
refrain from specializing inside the body of a declaration marked as
`@[specialize]`. Thus we need to specialize them even if their arguments
don't actually contain anything of interest in order to ensure that type
classes etc. are correctly cleaned up within their bodies.
There is one last trade-off to consider. When specializing code
generated by the new do elaborator we sometimes generate intermediate
specializations that are not actually part of any call graph after we
are done specializing. We could in principle detect these functions and
delete them but having them in cache is potentially helpful for further
specializations later. Once the new do elaborator lands we plan to test
this trade-off.
Closes#10924
This PR removes the old ElimDeadBranches pass and shifts the new one
past lambda lifting.
The reason for dropping the old one is its general unsoundness and the
fact that we want to do refactorings on the IR part. The reason for
shifting the current pass past lambda lifting, is that its analysis is
imprecise in the presence of local function symbols. I experimented with
the exact placement for a while and it seems like it is optimal here.
Overall we observe a slight regression in the amount of C code
generated, likely because we don't propagate information into lambdas
before lifting them anymore. But generally measure a slight performance
improvement in general.
This PR makes the LCNF simplifier eliminate cases where all alts are
`.unreach` to just an `.unreach`.
an `.unreach`
We considered dropping a cases in a situation like this but decided
against it because it might hinder reuse.
```
def test x : Bool :=
cases x : Bool
| Except.error a.1 =>
⊥
| Except.ok a.2 =>
let _x.3 := true;
return _x.3
```
This PR generalizes the `noConfusion` constructions to heterogeneous
equalities (assuming propositional equalities between the indices). This
lays ground work for better support for applying injection to
heterogeneous equalities in grind.
The `Meta.mkNoConfusion` app builder shields most of the code from these
changes.
Since the per-constructor noConfusion principles are now more
expressive, `Meta.mkNoConfusion` no longer uses the general one.
In `Init.Prelude` some proofs are more pedestrian because `injection`
now needs a bit more machinery.
This is a breaking change for whoever uses the `noConfusion` principle
manually and explicitly for a type with indices.
Fixes#11450.
This PR adapts the lambda lifter in LCNF to eta contract instead of
lambda lift if possible. This prevents the creation of a few hundred
unnecessary lambdas across the code base.
This PR sorts the declarations fed into ElimDeadBranches in increasing
size. This can improve performance when we are dealing with a lot of
iterations.
The motivation for this change is as follows. Currently the algorithm
for doing one step of abstract interpretation is:
```
for decl in scc do
interpDecl
if summaryChanged decl then
return true
return false
```
whenever we return true we run another step. Now suppose we are in a
situation where we have an SCC with one big decl in the front and then
`n` small ones afterwards. For each time that the small ones change
their summary, we will re-run analysis of the big one in the front.
Currently the ordering is basically at "random" based on how other
compilers inject things into the SCC. This change ensures the behavior
is consistent and at least somewhat intelligent. By putting the small
declarations first, whenever we trigger a rerun of the loop we bias
analyzing the small declarations first, thus decreasing run time.
Note that this change does not have much effect on the current pipeline
because: We usually construct the SCCs in a way such that small ones
happen to be in front anyways. However, with upcomping changes on
specialization this is about to change.
This PR is a followup of #11381 and enforces the invariants on ordering
of closed terms and constants required by the EmitC pass properly by
toposorting before saving the declarations into the Environment.
This PR fixes a bug where the closed term extraction does not respect
the implicit invariant of the
c emitter to have closed term decls first, other decls second, within an
SCC. This bug has not yet
been triggered in the wild but was unearthed during work on upcoming
modifications of the
specializer.
This PR accelerates termination of the ElimDeadBranches compiler pass.
The implementation addresses situations such as `choice [none, some
top]` which can be summarized to
`top` because `Option` has only two constructors and all constructor
arguments are `top`.
This PR makes the specializer (correctly) share more cache keys across
invocations, causing us to produce less code bloat.
We observed that in functions with lots of specialization, sometimes
cache keys are defeq but not BEq because one has unused let decls
(introduced by specialization) that the other doesn't. This PR resolves
this conflict by erasing unused let decls from specializer cache keys.
This PR removes all code that sets the `Option.Decl.group` field, which
is unused and has no clearly documented meaning.
The actual removal of the field would be #11305.
This PR replaces `MatcherInfo.numAltParams` with a more detailed data
structure that allows us, in particular, to distinguish between an
alternative for a constructor with a `Unit` field and the alternative
for a nullary constructor, where an artificial `Unit` argument is
introduced.
This PR fixes a bug in the LCNF simplifier unearthed while working on
#11078. In some situations caused by `unsafeCast`, the simplifier would
record incorrect information about `cases`, leading to further bugs down
the line.
Suppose we have `v : NonScalar` due to an `unsafeCast` and we run
`cases` on it, expecting `Prod.mk fst snd`. The current code attempts to
record both the arguments from the constructor application in the case
arm `fst`, `snd` and the parameters for the type by inspecting the discr
`v`. However, `NonScalar` does of course not have any parameters,
causing the simplifier to record wrong information. This patch makes the
`cases` infrastructure more cautious when extracting information from
the type of `v`.
This PR fixes the `reduceArity` compiler pass to consider
over-applications to functions that have their arity reduced.
Previously, this pass assumed that the amount of arguments to
applications was always the same as the number of parameters in the
signature. This is usually true, since the compiler eagerly introduces
parameters as long as the return type is a function type, resulting in a
function with a return type that isn't a function type. However, for
dependent types that sometimes are function types and sometimes not,
this assumption is broken, resulting in the additional parameters to be
dropped.
Closes#11131
This PR lets the match compilation procedure use sparse case analysis
when the patterns only match on some but not all constructors of an
inductive type. This way, less code is produce. Before, code handling
each of the other cases was then optimized and commoned-up by later
compilation pipeline, but that is wasteful to do.
In some cases this will prevent Lean from noticing that a match
statement is complete
because it performs less case-splitting for the unreachable case. In
this case, give explicit
patterns to perform the deeper split with `by contradiction` as the
right-hand side.
At least temporarily, there is also the option to disable this behaviour
with
```
set_option backwards.match.sparseCases false
```
This PR adds “sparse casesOn” constructions. They are similar to
`.casesOn`, but have arms only for some constructors and a catch-all
(providing `t.ctorIdx ≠ 42` assumptions). The compiler has native
support for these constructors and now (because of the similarity) also
the per-constructor elimination principles.
This PR enforces users of the constant folder API to provide proofs of
their algebraic properties,
thus hopefully avoiding bugs such as #11042 and #11043 in the future.
This PR fixes a case of overeager constant folding on Nat where the
compiler would mistakenly assume `0 - x = x` (see also #11042 for the
same bug on UInts).
This PR improves the detection of situations where we branch multiple
times on the same value in the
code generator. Previously this would only consider repeated branching
on function arguments, now on
arbitrary values.
Closes: #11018
This PR improves join point finding in the compiler through two means:
1. We now handle situations where a function `f` can only become a join
point when a function `g`
becomes a join point as well correctly.
2. We introduce a second join point finding pass after specialisation
and before the following
simplification pass, as the specialiser might have introduced new join
point opportunities for
the simplifier to exploit.
Notably in the code from #10995 we now correctly detect the missing join
point which required both
of these changes to be made.
Closes: #10995
This PR makes the eager lambda lifting heuristic more predictable by
blocking it from lifting from
any kind of inlineable function, not just `@[inline]`. It also adapts
the doc-string to describe
what is actually going on.
This PR performs more widening in ElimDeadBranches in an attempt to
improve performance in situations with a lot of local precision.
While this is not enough to make the compilation instant it pushes
compilation time from 12s to 3s for the example in #10857 and barely
introduces regressions so it seems like a good first step in this
direction.
Closes: #10857
This PR implements zero cost `BaseIO` by erasing the `IO.RealWorld`
parameter from argument lists and structures. This is a **major breaking
change for FFI**.
Concretely:
- `BaseIO` is defined in terms of `ST IO.RealWorld`
- `EIO` (and thus `IO`) is defined in terms of `EST IO.RealWorld`
- The opaque `Void` type is introduced and the trivial structure
optimization updated to account for it. Furthermore, arguments of type
`Void s` are removed from the argument lists of the C functions.
- `ST` is redefined as `Void s -> ST.Out s a` where `ST.Out` is a pair
of `Void s` and `a`
This together has the following major effects on our generated code:
- Functions that return `BaseIO`/`ST`/`EIO`/`IO`/`EST` now do not take
the dummy world parameter anymore. To account for this FFI code needs to
delete the dummy world parameter from the argument lists.
- Functions that return `BaseIO`/`ST` now return their wrapped value
directly. In particular `BaseIO UInt32` now returns a `uint32_t` instead
of a `lean_object*`. To account for this FFI code might have to change
the return type and does not need to call `lean_io_result_mk_ok` anymore
but can instead just `return` values right away (same with extracting
values from `BaseIO` computations.
- Functions that return `EIO`/`IO`/`EST` now only return the equivalent
of an `Except` node which reduces the allocation size. The
`lean_io_result_mk_ok`/`lean_io_result_mk_error` functions were updated
to account for this already so no change is required.
Besides improving performance by dropping allocation (sizes) we can now
also do fun new things such as:
```lean
@[extern "malloc"]
opaque malloc (size : USize) : BaseIO USize
```
This PR reduces the amount of symbols in our DLLs by cutting open a
linking cycle of the shape:
`Environment -> Compiler -> Meta -> Environment`
This is achieved by introducing a dynamic call to the compiler hidden
behind a `Ref` as previously
done in the pretty printer.
This PR "monomorphizes" the structure `Std.PRange shape α`, replacing it
with nine distinct structures `Std.Rcc`, `Std.Rco`, `Std.Rci` etc., one
for each possible shape of a range's bounds. This change was necessary
because the shape polymorphism is detrimental to attempts of automation.
**BREAKING CHANGE:** While range/slice notation itself is unchanged,
this essentially breaks the entire remaining (polymorphic) range and
slice API except for the dot-notation(`toList`, `iter`, ...). It is not
possible to deprecate old declarations that were formulated in a
shape-polymorphic way that is not available anymore.
This PR ensures that even if a type is marked as `irreducible` the
compiler can see through it in
order to discover functions hidden behind type aliases.
This PR adds the necessary infrastructure for recording elaboration
dependencies that may not be apparent from the resulting environment
such as notations and other metaprograms. An adapted version of `shake`
from Mathlib is added to `script/` but may be moved to another location
or repo in the future.
This PR resolves a potential bad interaction between the compiler and
the module system where references to declarations not imported are
brought into scope by inlining or specializing. We now proactively check
that declarations to be inlined/specialized only reference public
imports. The intention is to later resolve this limitation by moving out
compilation into a separate build step with its own import/incremental
system.
This PR fixes constant folding for UIntX in the code generator. This
optimization was previously simply dead code due to the way that uint
literals are encoded.