Commit graph

61 commits

Author SHA1 Message Date
Sebastian Ullrich
b4d4e371d2
chore: shake core (#12276) 2026-02-05 09:10:32 +00:00
Kim Morrison
d49e5d8a3d Revert "chore: temporarily disable proofs for bootstrap"
This reverts commit c56a5732a5a215f7b74d3f7a5cefd8612cf50474.
2026-02-05 13:41:34 +11:00
Kim Morrison
7b12b504df chore: temporarily disable proofs for bootstrap
This adds `set_option debug.byAsSorry true` and `decreasing_by sorry` to
various files to allow bootstrapping with Config structure changes. These
changes will be restored after the bootstrap dance is complete.
2026-02-05 13:41:34 +11:00
Markus Himmel
fa67f300f6
chore: rename String.ValidPos to String.Pos (#11240)
This PR renames `String.ValidPos` to `String.Pos`, `String.endValidPos`
to `String.endPos` and `String.startValidPos` to `String.startPos`.

Accordingly, the deprecations of `String.Pos` to `String.Pos.Raw` and
`String.endPos` to `String.rawEndPos` are removed early, after an
abbreviated deprecation cycle of two releases.
2025-11-24 16:40:21 +00:00
Markus Himmel
dda6885eae
refactor: String.foldl and String.isNat go through String.Slice (#11289)
This PR redefines `String.foldl`, `String.isNat` to use their
`String.Slice` counterparts.
2025-11-21 11:17:50 +00:00
Markus Himmel
2c2fcff4f8
refactor: do not use String.Iterator (#11127)
This PR removes all uses of `String.Iterator` from core, preferring
`String.ValidPos` instead.

In an upcoming PR, `String.Iterator` will be renamed to
`String.Legacy.Iterator`.
2025-11-11 11:46:58 +00:00
Markus Himmel
ba7798b389
chore: more reorganization of strings (#10928)
This PR splits more material out of `Init.Data.String.Basic`.
2025-10-23 11:56:11 +00:00
Markus Himmel
b5dc11e8d3
chore: move some material out of Init.Data.String.Basic (#10893)
This PR splits some low-hanging fruit out of `Init.Data.String.Basic`:
basic material about `String.Pos.Raw`, `String.Substrig`, and
`String.Iterator`.

More splitting required and the remaining material is quite unorganized,
but it's a start.
2025-10-22 16:31:08 +00:00
Markus Himmel
dad541265c
refactor: move operations on String.Pos.Raw to the String.Pos.Raw namespace (#10735)
This PR moves many operations involving `String.Pos.Raw` to a the
`String.Pos.Raw` namespace with the eventual aim of freeing up the
`String` namespace to contain operations using `String.ValidPos` (to be
renamed to `String.Pos`) instead.

This PR adds the `String.ValidPos.set` and `String.ValidPos.modify`
functions.

After this PR, `String.pos_lt_eq` is no longer a `simp` lemma. Add
`String.Pos.Raw.lt_iff` as a `simp` lemma if your proofs break.
2025-10-18 12:12:55 +00:00
Sebastian Ullrich
428355cf02
chore: remove redundant imports in core (#10750) 2025-10-16 20:27:46 +00:00
Markus Himmel
5bfbe2a875
refactor: incorporate UTF8 material from String.Extra into String.Basic (#10634)
This PR defines `ByteArray.validateUTF8`, uses it to show that
`ByteArray.IsValidUtf8` is decidable and redefines `String.fromUTF8` and
friends to use it.

The functions `String.validateUTF8` and `String.utf8DecodeChar?` are
deprecated in favor of the identically named functions in the
`ByteArray` namespace.
2025-10-01 11:33:29 +00:00
Markus Himmel
81ea922025
chore: rename String.Pos to String.Pos.Raw (#10624)
This PR renames `String.Pos` to `String.Pos.Raw`.

After an abbreviated deprecation cycle, we will then rename
`String.ValidPos` to `String.Pos`.
2025-10-01 07:45:24 +00:00
Markus Himmel
d6cd738ab4
feat: redefine String, part two (#10457)
This PR introduces safe alternatives to `String.Pos` and `Substring`
that can only represent valid positions/slices.

Specifically, the PR

- introduces the predicate `String.Pos.IsValid`;
- proves several nontrivial equivalent conditions for
`String.Pos.IsValid`;
- introduces `String.ValidPos`, which is a `String.Pos` with an
`IsValid` proof;
- introduces `String.Slice`, which is like `Substring` but made from
`String.ValidPos` instead of `Pos`;
- introduces `String.Pos.IsValidForSlice`, which is like
`String.Pos.IsValid` but for slices;
- introduces `String.Slice.Pos`, which is like `String.ValidPos` but for
slices;
- introduces various functions for converting between the two types of
positions.

The API added in this PR is not complete. It will be expanded in future
PRs with addional operations and verification.
2025-09-24 13:36:55 +00:00
Tom Levy
e42892cfb6
doc: fix comment about String.fromUTF8 replacing invalid chars (#10240)
Hi, the doc of `String.fromUTF8` previously said invalid characters are
replaced with 'A'. But the parameter `h : validateUTF8 a` guarantees
there are no invalid characters, so that explanation doesn't make sense
to me. This PR deletes that explanation (and fixes some unrelated
typos).

I also have a patch that uses `h` to prove each of the characters is
valid, eliminating the need for a default character
([pr/chore-String-fromUTF8-prove-valid](27f1ff36b2)),
would you be interested in merging that?

<details>
<summary>Notes on invalid characters from unchecked C++</summary>
I don't know if this function may be called from unchecked C++ with
invalid characters. If it may, I'm not sure what would happen with my
patched function... I'm not familiar with Lean's safety model, but it
seems like a bad idea to have a Lean function that takes a proof of a
proposition but is expected to operate in a certain way even if the
proposition is false. I think the safe approach is to have two functions
-- one that takes a proof and is only called from Lean, and another that
doesn't take a proof and replaces invalid chars (for use from C++, not
sure whether it's useful from Lean); I'd prefer to go even further and
report an error instead of silently replacing invalid characters (I'm
not sure if there is any easy way to report errors/panic in Lean code
called from C++).
</details>
2025-09-23 10:19:20 +00:00
Markus Himmel
197bc6cb66
feat: redefine String, part one (#10304)
This PR redefines `String` to be the type of byte arrays `b` for which
`b.IsValidUtf8`.

This moves the data model of strings much closer to the actual data
representation at runtime.

In the near future, we will

- provide variants of `String.Pos` and `Substring` that only allow for
valid positions
- redefine all `String` functions to be much closer to their C++
implementations

In the near-to-medium future we will then provide comprehensive
verification of `String` based on these refactors.
2025-09-18 11:36:52 +00:00
Markus Himmel
19bd0254c3
chore: move String.utf8EncodeChar to the prelude (#10264)
This PR moves `String.utf8EncodeChar` to the prelude to prepare for the
imminent redefinition of `String`.

The definition in the prelude uses modulo and division operations on
natural numbers. In `String.Extra`, a `csimp` lemma is provided, showing
that the new definition is equal to the previous one (which is now
called `utf8EncodeCharFast`) which uses bitwise operations on `UInt8`.
2025-09-07 12:42:53 +00:00
Kim Morrison
a06e6e7f4d
chore: make UInt.Lemmas a private import of String.Extra (#10115)
This PR makes the `Init.Data.UInt.Lemmas` import into
`Init.Data.String.Extra` private; previously this import was on the
rebuild critical path.
2025-08-25 16:46:22 +00:00
Sebastian Ullrich
0e8838df3b
chore: avoid confusing public import all combination (#10051) 2025-08-22 12:04:42 +00:00
Sebastian Ullrich
09a5b34931
feat: make private the default in module (#9044)
This PR adjusts the experimental module system to make `private` the
default visibility modifier in `module`s, introducing `public` as a new
modifier instead. `public section` can be used to revert the default for
an entire section, though this is more intended to ease gradual adoption
of the new semantics such as in `Init` (and soon `Std`) where they
should be replaced by a future decl-by-decl re-review of visibilities.
2025-06-28 16:30:53 +00:00
Joachim Breitner
be80a23281
chore: remove unused simp args (#8905)
This PR uses the linter from
https://github.com/leanprover/lean4/pull/8901 to clean up simp
arguments.
2025-06-20 22:34:30 +00:00
Rob23oba
e450a02621
fix: change show tactic to work as documented (#7395)
This PR changes the `show t` tactic to match its documentation.
Previously it was a synonym for `change t`, but now it finds the first
goal that unifies with the term `t` and moves it to the front of the
goal list.
2025-06-12 23:54:09 +00:00
Sebastian Ullrich
01dbbeed99
feat: do not export def bodies by default (#8221)
This PR adjusts the experimental module system to not export the bodies
of `def`s unless opted out by the new attribute `@[expose]` on the `def`
or on a surrounding `section`.

---------

Co-authored-by: Markus Himmel <markus@lean-fro.org>
2025-05-15 12:16:54 +00:00
Markus Himmel
68d9d14d44
chore: do not use the coercion α → Option α in Init and Std (#8085)
This PR moves the coercion `α → Option α` to the new file
`Init.Data.Option.Coe`. This file may not be imported anywhere in `Init`
or `Std`.
2025-04-24 13:35:01 +00:00
Sebastian Ullrich
7feb583b9e
feat: enable experimental module system in Init (#8047) 2025-04-23 17:21:33 +00:00
Markus Himmel
d66abc0fc0
feat: lemmas about operations on finite unsigned integers (#7484)
This PR adds some lemmas about operations defined on `UIntX`
2025-03-18 10:52:54 +00:00
David Thrane Christiansen
5d91ed01b7
doc: review String docstrings (#7506)
This PR adds missing `String` docstrings and makes the existing ones
consistent in style.
2025-03-18 04:36:49 +00:00
Kim Morrison
3a408e0e54
feat: change Array.get to take a Nat and a proof (#6032)
This PR changes the signature of `Array.get` to take a Nat and a proof,
rather than a `Fin`, for consistency with the rest of the (planned)
Array API. Note that because of bootstrapping issues we can't provide
`get_elem_tactic` as an autoparameter for the proof. As users will
mostly use the `xs[i]` notation provided by `GetElem`, this hopefully
isn't a problem.

We may restore `Fin` based versions, either here or downstream, as
needed, but they won't be the "main" functions.

---------

Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
2024-11-12 03:30:46 +00:00
Kim Morrison
ef05bdc449
chore: rename List.bind and Array.concatMap to flatMap (#5731) 2024-10-16 11:30:49 +00:00
Henrik Böving
19e06acc65
refactor: redefine unsigned fixed width integers in terms of BitVec (#5323)
I made a few choices so far that can probably be discussed:
- got rid of `modn` on `UInt`, nobody seems to use it apart from the
definition of `shift` which can use normal `mod`
- removed the previous defeq optimized definition of `USize.size` in
favor for a normal one. The motivation was to allow `OfNat` to work
which doesn't seem to be necessary anymore afaict.
- Minimized uses of `.val`, should we maybe mark it deprecated?
- Mostly got rid of `.val` in basically all theorems as the proper next
level of API would now be `.toBitVec`. We could probably re-prove them
but it would be more annoying given the change of definition.
- Did not yet redefine `log2` in terms of `BitVec` as this would require
a `log2` in `BitVec` as well, do we want this?
- I added a couple of theorems around the relation of `<` on `UInt` and
`Nat`. These were previously not needed because defeq was used all over
the place to save us. I did not yet generalize these to all types as I
wasn't sure if they are the appropriate lemma that we want to have.
2024-10-16 07:28:23 +00:00
Kim Morrison
aa2360a41d chore: rename List.join to List.flatten
one more

one more

one more

fix test
2024-10-14 22:28:12 +11:00
Mario Carneiro
0a1a855ba8
fix: validate UTF-8 at C++ -> Lean boundary (#3963)
Continuation of #3958. To ensure that lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the lean
model), we have to make sure that no lean objects are created with
invalid UTF-8. #3958 covers the case of lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.

To address this and minimize accidental missed validation, the
`(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or utf8 char count on hand. I reviewed
every function which leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.

This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions, they are just
not causing unsound behavior anymore. A subsequent PR will handle adding
better error reporting for bad UTF-8.
2024-06-19 14:05:48 +00:00
Kim Morrison
2a2b276ede
chore: unify String.csize : Nat and Char.utf8Size : UInt32 as Char.size : Nat (#4357)
It seems:
* there was no actual need for the UInt32 valued version
* downstream we were getting duplicative lemmas about both
* so lets reduce the API surface area!

If anyone would prefer the remaining function is still called
`Char.utf8Size` I will happily change it. (`size` is hopefully still
unambiguous, and it's helpful to rename here so we can give a
deprecation warning that explains the type signature change.)

---------

Co-authored-by: Mac Malone <tydeu@hatpress.net>
2024-06-11 02:51:18 +00:00
Kim Morrison
56adfb856d
chore: upstream basic String lemmas (#4354) 2024-06-05 21:28:43 +00:00
Kyle Miller
a7338c5ad8
feat: make frontend normalize line endings to LF (#3903)
To eliminate parsing differences between Windows and other platforms,
the frontend now normalizes all CRLF line endings to LF, like [in
Rust](https://github.com/rust-lang/rust/issues/62865).

Effects:
- This makes Lake hashes be faithful to what Lean sees (Lake already
normalizes line endings before computing hashes).
- Docstrings now have normalized line endings. In particular, this fixes
`#guard_msgs` failing multiline tests for Windows users using CRLF.
- Now strings don't have different lengths depending on the platform.
Before this PR, the following theorem is true for LF and false for CRLF
files.
```lean
example : "
".length = 1 := rfl
```

Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In
the elaborator, we reject loose `\r`'s that appear in whitespace. Rust
instead takes the approach of making the normalization routine fail.
They do this so that there's no downstream confusion about any `\r\n`
that appears.

Implementation note: the LSP maintains its own copy of a source file
that it updates when edit operations are applied. We are assuming that
edit operations never split or join CRLFs. If this assumption is not
correct, then the LSP copy of a source file can become slightly out of
sync. If this is an issue, there is some discussion
[here](https://github.com/leanprover/lean4/pull/3903#discussion_r1592930085).
2024-05-20 17:13:08 +00:00
Joachim Breitner
e2983e44ef
perf: use with_reducible in special-purpose decreasing_trivial macros (#3991)
Because of the last-added-tried-first rule for macros, all the special
purpose `decreasing_trivial` rules are tried for most recursive
definitions out there, and because they use `apply` and `assumption`
with default transparency may cause some definitoins to be unfolded over
and over again.

A quick test with one of the functions in the leansat project shows that
elaboration time goes down from 600ms to 375ms when using
```
decreasing_by all_goals decreasing_with with_reducible decreasing_trivial
```
instead of
```
decreasing_by all_goals decreasing_with decreasing_trivial
```

This change uses `with_reducible` in most of these macros.

This means that these tactics will no longer work when the
relations/definitions they look for is hidden behind a definition.
This affected in particular `Array.sizeOf_get`, which now has a
companion `sizeOf_getElem`.

In addition, there were three tactics using `apply` to apply Nat-related
lemmas
that we now expect `omega` to solve. We still need them when building
`Init` modules
that don’t have access to `omega`, but they now live in
`decreasing_trivial_pre_omega`,
meant to be only used internally.
2024-04-29 15:12:27 +00:00
Mario Carneiro
70a23945bf
feat: add model implementation for UTF8 enc/dec (#3961)
- [x] Depends on: #3958 
- [x] Depends on: #3960

This makes the UTF-8 encode and decode functions have lean definitions,
so that we can prove properties about them downstream.
2024-04-22 10:24:53 +00:00
Mario Carneiro
62cdb51ed5
feat: UTF-8 string validation (#3958)
Previously, there was a function `opaque fromUTF8Unchecked : ByteArray
-> String` which would convert a list of bytes into a string, but as the
name implies it does not validate that the string is UTF-8 before doing
so and as a result it produces unsound results in the compiler (because
the lean model of `String` indirectly asserts UTF-8 validity). This PR
replaces that function by
```lean
opaque validateUTF8 (a : @& ByteArray) : Bool

opaque fromUTF8 (a : @& ByteArray) (h : validateUTF8 a) : String
```
so that while the function is still "unchecked", we have a proof witness
that the string is valid. To recover the original, actually unchecked
version, use `lcProof` or other unsafe methods to produce the proof
witness.

Because this was the only `ByteArray -> String` conversion function, it
was used in several places in an unsound way (e.g. reading untrusted
input from IO and treating it as UTF-8). These have been replaced by
`fromUTF8?` or `fromUTF8!` as appropriate.
2024-04-20 18:36:37 +00:00
Scott Morrison
94d6286e5a
chore: reorganising to reduce imports (#3790)
[Before](https://github.com/leanprover/lean4/files/14772220/oi.pdf) and
[after](https://github.com/leanprover/lean4/files/14772226/oi2.pdf).

This gets `ByteArray`, `String.Extra`, `ToString.Macro` and `RCases` out
of the imports of `omega`. I'd hoped to get `Array.Subarray` too, but
it's tangled up in the list literal syntax. Further progress could come
from make `split` use available `Decidable` instances, so we could pull
out `Classical` (and possibly some of `PropLemmas`).
2024-03-27 11:15:01 +00:00
Joachim Breitner
23d3ac4760
refactor: reduced unsed imports (#3464) 2024-02-22 18:12:57 +00:00
Joe Hendrix
29244f32f6
chore: upstream solve_by_elim (#3408)
This upstreams the solve_by_elim tactic from Std.

It is a key tactic needed by library_search.
2024-02-21 01:16:04 +00:00
Joachim Breitner
b2d668c340
perf: Use flat ByteArrays in Trie (#2529) 2023-09-20 13:22:37 +02:00
Mario Carneiro
aa60791db3 feat: remove partial in Init.Data.String.Basic 2023-06-05 15:50:11 -07:00
E.W.Ayers
4ea4365354 doc: various String docstrings 2022-08-26 20:49:57 -07:00
Leonardo de Moura
eafd2a88ce chore: simplify Prelude.lean and Core.lean using elabAsElim 2022-07-29 18:13:56 -07:00
Leonardo de Moura
c341d8432f feat: remove leading spaces from docstrings 2022-07-18 22:18:15 -04:00
Leonardo de Moura
02c4e548df feat: replace constant with opaque 2022-06-14 17:02:59 -07:00
Leonardo de Moura
041827bed5 chore: unused variables 2022-06-07 17:54:10 -07:00
Leonardo de Moura
cae59c6916 chore: remove staging workarounds 2022-04-26 08:23:43 -07:00
Leonardo de Moura
6af1da450e feat: disable only eta for classes during TC resolution
closes #1123
2022-04-26 08:20:39 -07:00
Leonardo de Moura
e3dcce5320 chore: remove temporary workarounds 2022-04-09 12:13:37 -07:00