Typos were found with ``` pip install codespell --upgrade codespell --summary --ignore-words-list enew,forin,fro,happend,hge,ihs,iterm,spred --skip stage0 --check-filenames codespell --summary --ignore-words-list enew,forin,fro,happend,hge,ihs,iterm,spred --skip stage0 --check-filenames --regex '[A-Z][a-z]*' codespell --summary --ignore-words-list enew,forin,fro,happend,hge,ihs,iterm,spred --skip stage0 --check-filenames --regex "\b[a-z']*" ```
152 lines
10 KiB
Markdown
152 lines
10 KiB
Markdown
# Lean Build Bootstrapping
|
|
|
|
Lean is a bootstrapped program: the
|
|
frontend and compiler are written in Lean itself and thus need to be built before
|
|
building Lean itself - which is needed to again build those parts. This cycle is
|
|
broken by using pre-built C files checked into the repository (which ultimately
|
|
go back to a point where the Lean compiler was not written in Lean) in place of
|
|
these Lean inputs and then compiling everything in multiple stages up to a fixed
|
|
point. The build directory is organized in these stages:
|
|
|
|
```bash
|
|
stage0/
|
|
# Bootstrap binary built from stage0/src/.
|
|
# We do not use any other files from this directory in further stages.
|
|
bin/lean
|
|
stage1/
|
|
include/
|
|
config.h # config variables used to build `lean` such as used allocator
|
|
runtime/lean.h # runtime header, used by extracted C code, uses `config.h`
|
|
share/lean/
|
|
lean.mk # used by `leanmake`
|
|
lib/
|
|
lean/**/*.olean # the Lean library (incl. the compiler) compiled by the previous stage's `lean`
|
|
temp/**/*.{c,o} # the library extracted to C and compiled by `leanc`
|
|
libInit.a libLean.a # static libraries of the Lean library
|
|
libleancpp.a # a static library of the C++ sources of Lean
|
|
libleanshared.so # a dynamic library including the static libraries above
|
|
bin/
|
|
lean # the Lean compiler & server, a small executable that calls directly into libleanshared.so
|
|
leanc # a wrapper around a C compiler supplying search paths etc
|
|
leanmake # a wrapper around `make` supplying the Makefile above
|
|
stage2/...
|
|
stage3/...
|
|
```
|
|
|
|
Stage 0 can be viewed as a blackbox since it does not depend on any local
|
|
changes and is equivalent to downloading a bootstrapping binary as done in other
|
|
compilers. The build for any other stage starts by building the runtime and
|
|
standard library from `src/`, using the `lean` binary from the previous stage in
|
|
the latter case, which are then assembled into a new `bin/lean` binary.
|
|
|
|
Each stage can be built by calling `make stageN` in the root build folder.
|
|
Running just `make` will default to stage 1, which is usually sufficient for
|
|
testing changes on the test suite or other files outside of the stdlib. However,
|
|
it might happen that the stage 1 compiler is not able to load its own stdlib,
|
|
e.g. when changing the .olean format: the stage 1 stdlib will use the format
|
|
generated by the stage 0 compiler, but the stage 1 compiler will expect the new
|
|
format. In this case, we should continue with building and testing stage 2
|
|
instead, which will both build and expect the new format. Note that this is only
|
|
possible because when building a stage's stdlib, we use the previous compiler
|
|
but never load the previous stdlib (since everything is `prelude`). We can also
|
|
use stage 2 to test changes in the compiler or other "meta" parts, i.e. changes
|
|
that affect the produced (.olean or .c) code, on the stdlib and compiler itself.
|
|
We are not aware of any "meta-meta" parts that influence more than two stages of
|
|
compilation, so stage 3 should always be identical to stage 2 and only exists as
|
|
a sanity check.
|
|
|
|
In summary, doing a standard build via `make` internally involves these steps:
|
|
|
|
1. compile the `stage0/src` archived sources into `stage0/bin/lean`
|
|
1. use it to compile the current library (*including* your changes) into `stage1/lib`
|
|
1. link that and the current C++ code from `src/` into `stage1/bin/lean`
|
|
|
|
You now have a Lean binary and library that include your changes, though their
|
|
own compilation was not influenced by them, that you can use to test your
|
|
changes on test programs whose compilation *will* be influenced by the changes.
|
|
|
|
## Updating stage0
|
|
|
|
Finally, when we want to use new language features in the library, we need to
|
|
update the archived C source code of the stage 0 compiler in `stage0/src`.
|
|
|
|
The github repository will automatically update stage0 on `master` once
|
|
`src/stdlib_flags.h` and `stage0/src/stdlib_flags.h` are out of sync.
|
|
|
|
NOTE: A full rebuild of stage 1 will only be triggered when the *committed* contents of `stage0/` are changed.
|
|
Thus if you change files in it manually instead of through `update-stage0-commit` (see below) or fetching updates from git, you either need to commit those changes first or run `make -C build/release clean-stdlib`.
|
|
The same is true for further stages except that a rebuild of them is retriggered on any committed change, not just to a specific directory.
|
|
Thus when debugging e.g. stage 2 failures, you can resume the build from these failures on but may want to explicitly call `clean-stdlib` to either observe changes from `.olean` files of modules that built successfully or to check that you did not break modules that built successfully at some prior point.
|
|
|
|
If you have write access to the lean4 repository, you can also manually
|
|
trigger that process, for example to be able to use new features in the compiler itself.
|
|
You can do that on <https://github.com/leanprover/lean4/actions/workflows/update-stage0.yml>
|
|
or using Github CLI with
|
|
```
|
|
gh workflow run update-stage0.yml
|
|
```
|
|
|
|
Leaving stage0 updates to the CI automation is preferable, but should you need
|
|
to do it locally, you can use `make -C build/release update-stage0-commit` to
|
|
update `stage0` from `stage1` or `make -C build/release/stageN update-stage0-commit` to
|
|
update from another stage. This command will automatically stage the updated files
|
|
and introduce a commit, so make sure to commit your work before that.
|
|
|
|
If you rebased the branch (either onto a newer version of `master`, or fixing
|
|
up some commits prior to the stage0 update), recreate the stage0 update commits.
|
|
The script `script/rebase-stage0.sh` can be used for that.
|
|
|
|
The CI should prevent PRs with changes to stage0 (besides `stdlib_flags.h`)
|
|
from entering `master` through the (squashing!) merge queue, and label such PRs
|
|
with the `changes-stage0` label. Such PRs should have a cleaned up history,
|
|
with separate stage0 update commits; then coordinate with the admins to merge
|
|
your PR using rebase merge, bypassing the merge queue.
|
|
|
|
|
|
## Further Bootstrapping Complications
|
|
|
|
As written above, changes in meta code in the current stage usually will only
|
|
affect later stages. This is an issue in two specific cases.
|
|
|
|
* For the special case of *quotations*, it is desirable to have changes in builtin parsers affect them immediately: when the changes in the parser become active in the next stage, builtin macros implemented via quotations should generate syntax trees compatible with the new parser, and quotation patterns in builtin macros and elaborators should be able to match syntax created by the new parser and macros.
|
|
Since quotations capture the syntax tree structure during execution of the current stage and turn it into code for the next stage, we need to run the current stage's builtin parsers in quotations via the interpreter for this to work.
|
|
Caveats:
|
|
* We activate this behavior by default when building stage 1 by setting `-Dinternal.parseQuotWithCurrentStage=true`.
|
|
We force-disable it inside `macro/macro_rules/elab/elab_rules` via `suppressInsideQuot` as they are guaranteed not to run in the next stage and may need to be run in the current one, so the stage 0 parser is the correct one to use for them.
|
|
It may be necessary to extend this disabling to functions that contain quotations and are (exclusively) used by one of the mentioned commands. A function using quotations should never be used by both builtin and non-builtin macros/elaborators. Example: https://github.com/leanprover/lean4/blob/f70b7e5722da6101572869d87832494e2f8534b7/src/Lean/Elab/Tactic/Config.lean#L118-L122
|
|
* The parser needs to be reachable via an `import` statement, otherwise the version of the previous stage will silently be used.
|
|
* Only the parser code (`Parser.fn`) is affected; all metadata such as leading tokens is taken from the previous stage.
|
|
|
|
For an example, see https://github.com/leanprover/lean4/commit/f9dcbbddc48ccab22c7674ba20c5f409823b4cc1#diff-371387aed38bb02bf7761084fd9460e4168ae16d1ffe5de041b47d3ad2d22422R13
|
|
|
|
* For *non-builtin* meta code such as `notation`s or `macro`s in
|
|
`Notation.lean`, we expect changes to affect the current file and all later
|
|
files of the same stage immediately, just like outside the stdlib. To ensure
|
|
this, we build stage 1 using `-Dinterpreter.prefer_native=false` -
|
|
otherwise, when executing a macro, the interpreter would notice that there is
|
|
already a native symbol available for this function and run it instead of the
|
|
new IR, but the symbol is from the previous stage!
|
|
|
|
To make matters more complicated, while `false` is a reasonable default
|
|
incurring only minor overhead (`ParserDescr`s and simple macros are cheap to
|
|
interpret), there are situations where we *need* to set the option to `true`:
|
|
when the interpreter is executed from the native code of the previous stage,
|
|
the type of the value it computes must be identical to/ABI-compatible with the
|
|
type in the previous stage. For example, if we add a new parameter to `Macro`
|
|
or reorder constructors in `ParserDescr`, building the stage with the
|
|
interpreter will likely fail. Thus we need to set `interpreter.prefer_native`
|
|
to `true` in such cases to "freeze" meta code at their versions in the
|
|
previous stage; no new meta code should be introduced in this stage. Any
|
|
further stages (e.g. after an `update-stage0`) will then need to be compiled
|
|
with the flag set to `false` again since they will expect the new signature.
|
|
|
|
When enabling `prefer_native`, we usually want to *disable* `parseQuotWithCurrentStage` as it would otherwise make quotations use the interpreter after all.
|
|
However, there is a specific case where we want to set both options to `true`: when we make changes to a non-builtin parser like `simp` that has a builtin elaborator, we cannot have the new parser be active outside of quotations in stage 1 as the builtin elaborator from stage 0 would not understand them; on the other hand, we need quotations in e.g. the builtin `simp` elaborator to produce the new syntax in the next stage.
|
|
As this issue usually affects only tactics, enabling `debug.byAsSorry` instead of `prefer_native` can be a simpler solution.
|
|
|
|
For a `prefer_native` example, see https://github.com/leanprover/lean4/commit/da4c46370d85add64ef7ca5e7cc4638b62823fbb.
|
|
|
|
To modify either of these flags both for building and editing the stdlib, adjust
|
|
the code in `stage0/src/stdlib_flags.h`. The flags will automatically be reset
|
|
on the next `update-stage0` when the file is overwritten with the original
|
|
version in `src/`.
|