lean4-htt/doc/dev/bootstrap.md
Joachim Breitner 74adb0961c
chore: add ./script/rebase-stage0.sh (#3984)
heavily based on an script by Kim.
2024-05-02 12:26:25 +00:00

151 lines
8.4 KiB
Markdown

# Lean Build Bootstrapping
Since version 4, Lean is a partially bootstrapped program: most parts of the
frontend and compiler are written in Lean itself and thus need to be built before
building Lean itself - which is needed to again build those parts. This cycle is
broken by using pre-built C files checked into the repository (which ultimately
go back to a point where the Lean compiler was not written in Lean) in place of
these Lean inputs and then compiling everything in multiple stages up to a fixed
point. The build directory is organized in these stages:
```bash
stage0/
# Bootstrap binary built from stage0/src/.
# We do not use any other files from this directory in further stages.
bin/lean
stage1/
include/
config.h # config variables used to build `lean` such as used allocator
runtime/lean.h # runtime header, used by extracted C code, uses `config.h`
share/lean/
lean.mk # used by `leanmake`
lib/
lean/**/*.olean # the Lean library (incl. the compiler) compiled by the previous stage's `lean`
temp/**/*.{c,o} # the library extracted to C and compiled by `leanc`
libInit.a libLean.a # static libraries of the Lean library
libleancpp.a # a static library of the C++ sources of Lean
libleanshared.so # a dynamic library including the static libraries above
bin/
lean # the Lean compiler & server, a small executable that calls directly into libleanshared.so
leanc # a wrapper around a C compiler supplying search paths etc
leanmake # a wrapper around `make` supplying the Makefile above
stage2/...
stage3/...
```
Stage 0 can be viewed as a blackbox since it does not depend on any local
changes and is equivalent to downloading a bootstrapping binary as done in other
compilers. The build for any other stage starts by building the runtime and
standard library from `src/`, using the `lean` binary from the previous stage in
the latter case, which are then assembled into a new `bin/lean` binary.
Each stage can be built by calling `make stageN` in the root build folder.
Running just `make` will default to stage 1, which is usually sufficient for
testing changes on the test suite or other files outside of the stdlib. However,
it might happen that the stage 1 compiler is not able to load its own stdlib,
e.g. when changing the .olean format: the stage 1 stdlib will use the format
generated by the stage 0 compiler, but the stage 1 compiler will expect the new
format. In this case, we should continue with building and testing stage 2
instead, which will both build and expect the new format. Note that this is only
possible because when building a stage's stdlib, we use the previous compiler
but never load the previous stdlib (since everything is `prelude`). We can also
use stage 2 to test changes in the compiler or other "meta" parts, i.e. changes
that affect the produced (.olean or .c) code, on the stdlib and compiler itself.
We are not aware of any "meta-meta" parts that influence more than two stages of
compilation, so stage 3 should always be identical to stage 2 and only exists as
a sanity check.
In summary, doing a standard build via `make` internally involves these steps:
1. compile the `stage0/src` archived sources into `stage0/bin/lean`
1. use it to compile the current library (*including* your changes) into `stage1/lib`
1. link that and the current C++ code from `src/` into `stage1/bin/lean`
You now have a Lean binary and library that include your changes, though their
own compilation was not influenced by them, that you can use to test your
changes on test programs whose compilation *will* be influenced by the changes.
## Updating stage0
Finally, when we want to use new language features in the library, we need to
update the archived C source code of the stage 0 compiler in `stage0/src`.
The github repository will automatically update stage0 on `master` once
`src/stdlib_flags.h` and `stage0/src/stdlib_flags.h` are out of sync.
If you have write access to the lean4 repository, you can also also manually
trigger that process, for example to be able to use new features in the compiler itself.
You can do that on <https://github.com/leanprover/lean4/actions/workflows/update-stage0.yml>
or using Github CLI with
```
gh workflow run update-stage0.yml
```
Leaving stage0 updates to the CI automation is preferable, but should you need
to do it locally, you can use `make update-stage0-commit` in `build/release` to
update `stage0` from `stage1` or `make -C stageN update-stage0-commit` to
update from another stage. This command will automatically stage the updated files
and introduce a commit,so make sure to commit your work before that.
If you rebased the branch (either onto a newer version of `master`, or fixing
up some commits prior to the stage0 update, recreate the stage0 update commits.
The script `script/rebase-stage0.sh` can be used for that.
The CI should prevent PRs with changes to stage0 (besides `stdlib_flags.h`)
from entering `master` through the (squashing!) merge queue, and label such PRs
with the `changes-stage0` label. Such PRs should have a cleaned up history,
with separate stage0 update commits; then coordinate with the admins to merge
your PR using rebase merge, bypassing the merge queue.
## Further Bootstrapping Complications
As written above, changes in meta code in the current stage usually will only
affect later stages. This is an issue in two specific cases.
* For *non-builtin* meta code such as `notation`s or `macro`s in
`Notation.lean`, we expect changes to affect the current file and all later
files of the same stage immediately, just like outside the stdlib. To ensure
this, we need to build the stage using `-Dinterpreter.prefer_native=false` -
otherwise, when executing a macro, the interpreter would notice that there is
already a native symbol available for this function and run it instead of the
new IR, but the symbol is from the previous stage!
To make matters more complicated, while `false` is a reasonable default
incurring only minor overhead (`ParserDescr`s and simple macros are cheap to
interpret), there are situations where we *need* to set the option to `true`:
when the interpreter is executed from the native code of the previous stage,
the type of the value it computes must be identical to/ABI-compatible with the
type in the previous stage. For example, if we add a new parameter to `Macro`
or reorder constructors in `ParserDescr`, building the stage with the
interpreter will likely fail. Thus we need to set `interpreter.prefer_native`
to `true` in such cases to "freeze" meta code at their versions in the
previous stage; no new meta code should be introduced in this stage. Any
further stages (e.g. after an `update-stage0`) will then need to be compiled
with the flag set to `false` again since they will expect the new signature.
For an example, see https://github.com/leanprover/lean4/commit/da4c46370d85add64ef7ca5e7cc4638b62823fbb.
* For the special case of *quotations*, it is desirable to have changes in
built-in parsers affect them immediately: when the changes in the parser
become active in the next stage, macros implemented via quotations should
generate syntax trees compatible with the new parser, and quotation patterns
in macro and elaborators should be able to match syntax created by the new
parser and macros. Since quotations capture the syntax tree structure during
execution of the current stage and turn it into code for the next stage, we
need to run the current stage's built-in parsers in quotation via the
interpreter for this to work. Caveats:
* Since interpreting full parsers is not nearly as cheap and we rarely change
built-in syntax, this needs to be opted in using `-Dinternal.parseQuotWithCurrentStage=true`.
* The parser needs to be reachable via an `import` statement, otherwise the
version of the previous stage will silently be used.
* Only the parser code (`Parser.fn`) is affected; all metadata such as leading
tokens is taken from the previous stage.
For an example, see https://github.com/leanprover/lean4/commit/f9dcbbddc48ccab22c7674ba20c5f409823b4cc1#diff-371387aed38bb02bf7761084fd9460e4168ae16d1ffe5de041b47d3ad2d22422
(from before the flag defaulted to `false`).
To modify either of these flags both for building and editing the stdlib, adjust
the code in `stage0/src/stdlib_flags.h`. The flags will automatically be reset
on the next `update-stage0` when the file is overwritten with the original
version in `src/`.