lean4-htt/doc/dev/ffi.md
Sofia Rodrigues 0f2cb91336
feat: add lean_setup_libuv for initializing required LIBUV components (#8636)
This PR adds a function called `lean_setup_libuv` that initializes
required LIBUV components. It needs to be outside of
`lean_initialize_runtime_module` because it requires `argv` and `argc`
to work correctly.

---------

Co-authored-by: Markus Himmel <markus@lean-fro.org>
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
2025-06-27 11:11:17 +00:00

190 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Foreign Function Interface
NOTE: The current interface was designed for internal use in Lean and should be considered **unstable**.
It will be refined and extended in the future.
As Lean is written partially in Lean itself and partially in C++, it offers efficient interoperability between the two languages (or rather, between Lean and any language supporting C interfaces).
This support is however currently limited to transferring Lean data types; in particular, it is not possible yet to pass or return compound data structures such as C `struct`s by value from or to Lean.
There are two primary attributes for interoperating with other languages:
* `@[extern "sym"] constant leanSym : ...` binds a Lean declaration to the external symbol `sym`.
It can also be used with `def` to provide an internal definition, but ensuring consistency of both definitions is up to the user.
* `@[export sym] def leanSym : ...` exports `leanSym` under the unmangled symbol name `sym`.
For simple examples of how to call foreign code from Lean and vice versa, see <https://github.com/leanprover/lean4/blob/master/src/lake/examples/ffi> and <https://github.com/leanprover/lean4/blob/master/src/lake/examples/reverse-ffi>, respectively.
## The Lean ABI
The Lean Application Binary Interface (ABI) describes how the signature of a Lean declaration is encoded as a native calling convention.
It is based on the standard C ABI and calling convention of the target platform.
For a Lean declaration marked with either `@[extern "sym"]` or `@[export sym]` for some symbol name `sym`, let `α₁ → ... → αₙ → β` be the normalized declaration's type.
If `n` is 0, the corresponding C declaration is
```c
extern s sym;
```
where `s` is the C translation of `β` as specified in the next section.
In the case of an `@[extern]` definition, the symbol's value is guaranteed to be initialized only after calling the Lean module's initializer or that of an importing module; see [Initialization](#initialization).
If `n` is greater than 0, the corresponding C declaration is
```c
s sym(t, ..., tₘ);
```
where the parameter types `tᵢ` are the C translation of the `αᵢ` as in the next section.
In the case of `@[extern]` all *irrelevant* types are removed first; see next section.
### Translating Types from Lean to C
* The integer types `UInt8`, ..., `UInt64`, `USize` are represented by the C types `uint8_t`, ..., `uint64_t`, `size_t`, respectively
* `Char` is represented by `uint32_t`
* `Float` is represented by `double`
* An *enum* inductive type of at least 2 and at most 2^32 constructors, each of which with no parameters, is represented by the first type of `uint8_t`, `uint16_t`, `uint32_t` that is sufficient to represent all constructor indices.
For example, the type `Bool` is represented as `uint8_t` with values `0` for `false` and `1` for `true`.
* `Decidable α` is represented the same way as `Bool`
* An inductive type with a *trivial structure*, that is,
* it is none of the types described above
* it is not marked `unsafe`
* it has a single constructor with a single parameter of *relevant* type
is represented by the representation of that parameter's type.
For example, `{ x : α // p }`, the `Subtype` structure of a value of type `α` and an irrelevant proof, is represented by the representation of `α`.
Similarly, the signed integer types `Int8`, ..., `Int64`, `ISize` are also represented by the unsigned C types `uint8_t`, ..., `uint64_t`, `size_t`, respectively, because they have a trivial structure.
* `Nat` and `Int` are represented by `lean_object *`.
Their runtime values is either a pointer to an opaque bignum object or, if the lowest bit of the "pointer" is 1 (`lean_is_scalar`), an encoded unboxed natural number or integer (`lean_box`/`lean_unbox`).
* A universe `Sort u`, type constructor `... → Sort u`, or proposition `p : Prop` is *irrelevant* and is either statically erased (see above) or represented as a `lean_object *` with the runtime value `lean_box(0)`
* Any other type is represented by `lean_object *`.
Its runtime value is a pointer to an object of a subtype of `lean_object` (see the "Inductive types" section below) or the unboxed value `lean_box(cidx)` for the `cidx`th constructor of an inductive type if this constructor does not have any relevant parameters.
Example: the runtime value of `u : Unit` is always `lean_box(0)`.
#### Inductive types
For inductive types which are in the fallback `lean_object *` case above and not trivial constructors, the type is stored as a `lean_ctor_object`, and `lean_is_ctor` will return true. A `lean_ctor_object` stores the constructor index in the header, and the fields are stored in the `m_objs` portion of the object.
The memory order of the fields is derived from the types and order of the fields in the declaration. They are ordered as follows:
* Non-scalar fields stored as `lean_object *`
* Fields of type `USize`
* Other scalar fields, in decreasing order by size
Within each group the fields are ordered in declaration order. **Warning**: Trivial wrapper types still count toward a field being treated as non-scalar for this purpose.
* To access fields of the first kind, use `lean_ctor_get(val, i)` to get the `i`th non-scalar field.
* To access `USize` fields, use `lean_ctor_get_usize(val, n+i)` to get the `i`th usize field and `n` is the total number of fields of the first kind.
* To access other scalar fields, use `lean_ctor_get_uintN(val, off)` or `lean_ctor_get_usize(val, off)` as appropriate. Here `off` is the byte offset of the field in the structure, starting at `n*sizeof(void*)` where `n` is the number of fields of the first two kinds.
For example, a structure such as
```lean
structure S where
ptr_1 : Array Nat
usize_1 : USize
sc64_1 : UInt64
ptr_2 : { x : UInt64 // x > 0 } -- wrappers don't count as scalars
sc64_2 : Float -- `Float` is 64 bit
sc8_1 : Bool
sc16_1 : UInt16
sc8_2 : UInt8
sc64_3 : UInt64
usize_2 : USize
ptr_3 : Char -- trivial wrapper around `UInt32`
sc32_1 : UInt32
sc16_2 : UInt16
```
would get re-sorted into the following memory order:
* `S.ptr_1` - `lean_ctor_get(val, 0)`
* `S.ptr_2` - `lean_ctor_get(val, 1)`
* `S.ptr_3` - `lean_ctor_get(val, 2)`
* `S.usize_1` - `lean_ctor_get_usize(val, 3)`
* `S.usize_2` - `lean_ctor_get_usize(val, 4)`
* `S.sc64_1` - `lean_ctor_get_uint64(val, sizeof(void*)*5)`
* `S.sc64_2` - `lean_ctor_get_float(val, sizeof(void*)*5 + 8)`
* `S.sc64_3` - `lean_ctor_get_uint64(val, sizeof(void*)*5 + 16)`
* `S.sc32_1` - `lean_ctor_get_uint32(val, sizeof(void*)*5 + 24)`
* `S.sc16_1` - `lean_ctor_get_uint16(val, sizeof(void*)*5 + 28)`
* `S.sc16_2` - `lean_ctor_get_uint16(val, sizeof(void*)*5 + 30)`
* `S.sc8_1` - `lean_ctor_get_uint8(val, sizeof(void*)*5 + 32)`
* `S.sc8_2` - `lean_ctor_get_uint8(val, sizeof(void*)*5 + 33)`
### Borrowing
By default, all `lean_object *` parameters of an `@[extern]` function are considered *owned*, i.e. the external code is passed a "virtual RC token" and is responsible for passing this token along to another consuming function (exactly once) or freeing it via `lean_dec`.
To reduce reference counting overhead, parameters can be marked as *borrowed* by prefixing their type with `@&`.
Borrowed objects must only be passed to other non-consuming functions (arbitrarily often) or converted to owned values using `lean_inc`.
In `lean.h`, the `lean_object *` aliases `lean_obj_arg` and `b_lean_obj_arg` are used to mark this difference on the C side.
Return values and `@[export]` parameters are always owned at the moment.
## Initialization
When including Lean code as part of a larger program, modules must be *initialized* before accessing any of their declarations.
Module initialization entails
* initialization of all "constants" (nullary functions), including closed terms lifted out of other functions
* execution of all `[init]` functions
* execution of all `[builtin_init]` functions, if the `builtin` parameter of the module initializer has been set
The module initializer is automatically run with the `builtin` flag for executables compiled from Lean code and for "plugins" loaded with `lean --plugin`.
For all other modules imported by `lean`, the initializer is run without `builtin`.
Thus `[init]` functions are run iff their module is imported, regardless of whether they have native code available or not, while `[builtin_init]` functions are only run for native executable or plugins, regardless of whether their module is imported or not.
`lean` uses built-in initializers for e.g. registering basic parsers that should be available even without importing their module (which is necessary for bootstrapping).
The initializer for module `A.B` is called `initialize_A_B` and will automatically initialize any imported modules.
Module initializers are idempotent (when run with the same `builtin` flag), but not thread-safe.
**Important for process-related functionality**: If your application needs to use process-related functions from libuv, such as `Std.Internal.IO.Process.getProcessTitle` and `Std.Internal.IO.Process.setProcessTitle`, you must call `lean_setup_args(argc, argv)` (which returns a potentially modified `argv` that must be used in place of the original) **before** calling `lean_initialize()` or `lean_initialize_runtime_module()`. This sets up process handling capabilities correctly, which is essential for certain system-level operations that Lean's runtime may depend on.
Together with initialization of the Lean runtime, you should execute code like the following exactly once before accessing any Lean declarations:
```c
void lean_initialize_runtime_module();
void lean_initialize();
char ** lean_setup_args(int argc, char ** argv);
lean_object * initialize_A_B(uint8_t builtin, lean_object *);
lean_object * initialize_C(uint8_t builtin, lean_object *);
...
argv = lean_setup_args(argc, argv); // if using process-related functionality
lean_initialize_runtime_module();
//lean_initialize(); // necessary (and replaces `lean_initialize_runtime_module`) if you (indirectly) access the `Lean` package
lean_object * res;
// use same default as for Lean executables
uint8_t builtin = 1;
res = initialize_A_B(builtin, lean_io_mk_world());
if (lean_io_result_is_ok(res)) {
lean_dec_ref(res);
} else {
lean_io_result_show_error(res);
lean_dec(res);
return ...; // do not access Lean declarations if initialization failed
}
res = initialize_C(builtin, lean_io_mk_world());
if (lean_io_result_is_ok(res)) {
...
//lean_init_task_manager(); // necessary if you (indirectly) use `Task`
lean_io_mark_end_initialization();
```
In addition, any other thread not spawned by the Lean runtime itself must be initialized for Lean use by calling
```c
void lean_initialize_thread();
```
and should be finalized in order to free all thread-local resources by calling
```c
void lean_finalize_thread();
```
## `@[extern]` in the Interpreter
The interpreter can run Lean declarations for which symbols are available in loaded shared libraries, which includes `@[extern]` declarations.
Thus to e.g. run `#eval` on such a declaration, you need to
1. compile (at least) the module containing the declaration and its dependencies into a shared library, and then
1. pass this library to `lean --load-dynlib=` to run code `import`ing this module.
Note that it is not sufficient to load the foreign library containing the external symbol because the interpreter depends on code that is emitted for each `@[extern]` declaration.
Thus it is not possible to interpret an `@[extern]` declaration in the same file.
See [`tests/compiler/foreign`](https://github.com/leanprover/lean4/tree/master/tests/compiler/foreign/) for an example.