Commit graph

252 commits

Author SHA1 Message Date
Markus Himmel
6cbaada1bf
feat: verification of String.positions, String.chars, String.revPositions, String.revChars, ForIn m String Char (#12456)
This PR verifies all of the `String` iterators except for the bytes
iterator by relating them to `String.toList`.

Along the way we define `String.posLE` and `String.posLT` analogously to
`String.posGE` and `String.posGT` and redefine `String.prev` to go
through `String.posLT`.

We also define and verify `String.positionsFrom` and
`String.revPositionsFrom`, which are the obvious generalizations of
`String.positions` and `String.revPositions` starting at positions
other than the start/end.

Finally, we get various lemmas about strings and positions, including
some nice induction principles `String.Pos.next_induction` and
`String.Pos.prev_induction`.

Of course, we also have all of the analogous results for `String.Slice`.
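
As a rough illustration of what relating an iterator to `String.toList` means, the sketch below collects the characters of a string both ways. It assumes a toolchain that includes this PR and that `Iter.toList` is in scope after `import Std`; the names of the verification lemmas themselves are not shown here.

```lean
import Std

-- Characters collected through the `String.chars` iterator.
#eval "abc".chars.toList

-- Characters obtained directly from the string.
-- The verification in this PR is about these two results agreeing.
#eval "abc".toList
```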
2026-02-12 15:32:44 +00:00
Markus Himmel
01173b195f
chore: move string iteration to a new file (#12450)
This PR moves the `String.Slice`/`String` iterators out into their own
file, in preparation for verification.
2026-02-12 06:56:53 +00:00
Markus Himmel
7b29425361
chore: simplify Char.toString to String.singleton (#12449)
This PR marks `String.toString_eq_singleton` as a `simp` lemma.
2026-02-12 06:10:36 +00:00
Markus Himmel
6f51ec27ed
feat: verification of String.Slice.splitToSubslice (#12437)
This PR verifies the `String.Slice.splitToSubslice` function by relating
it to a model implementation `Model.split` based on a
`ForwardPatternModel`.

The proof is generic, so it works for splitting by characters, strings
etc.

From this, we will be able to give user-facing API lemmas for
`String.split` and friends in future PRs.

We also move the verification of string patterns from
`String.Slice.Pattern` to `String.Slice.Pattern.Model` to achieve better
separation between code that users run in their programs and code that
only supports the theory.
2026-02-11 14:29:26 +00:00
Markus Himmel
61cef96cd7
feat: verification of our KMP implementation (#12424)
This PR gives a proof of `LawfulToForwardSearcherModel` for `Slice`
patterns, which amounts to proving that our implementation of KMP is
correct.

Note that this PR also changes the KMP implementation to make it
slightly more efficient and easier to verify. I also have a correctness
proof for the old implementation, so there were no bugs in the old
implementation.
2026-02-11 08:20:20 +00:00
Markus Himmel
5ba467d920
feat: LawfulForwardPatternModel for string patterns (#12360)
This PR provides a `LawfulForwardPatternModel` instance for string
patterns, i.e., it proves correctness of the `dropPrefix?` and
`startsWith` functions for string patterns.

Note that this is "just" the correctness proof; there isn't a way to
actually use it yet. API lemmas will follow.
2026-02-09 09:27:47 +00:00
Markus Himmel
36dd57aa06
feat: verification of character (predicate) patterns (#12349)
This PR builds on #12333 and proves that `Char` and `Char -> Bool`
patterns are lawful.
2026-02-06 14:04:07 +00:00
Markus Himmel
57a6d89f57
feat: verification of BEq String.Slice (#12346)
This PR shows `s == t ↔ s.copy = t.copy` for `s t : String.Slice` and
establishes the right-hand side as the simp normal form.
2026-02-06 11:56:01 +00:00
Markus Himmel
52bc216351
feat: basic infrastructure for verification of string patterns (#12333)
This PR adds the basic typeclasses that will be used in the verification
of our string searching infrastructure.
2026-02-05 16:37:50 +00:00
Markus Himmel
f82c40857b
feat: String.Slice.Subslice (#12322)
This PR adds `String.Slice.Subslice`, which is an unbundled version of
`String.Slice`.

This type is of interest because it is the correct type for string
searching and splitting operations to land in.

This PR just adds the type with minimal API. Additional API and
subsequent refactoring of the searching and splitting API is left for
future PRs.
2026-02-05 10:09:04 +00:00
Sebastian Ullrich
b4d4e371d2
chore: shake core (#12276)
2026-02-05 09:10:32 +00:00
Markus Himmel
54cba90dc5
refactor: derive string searcher from string pattern (#12312)
This PR reverses the relationship between the `ForwardPattern` and
`ToForwardSearcher` classes.

Previously, it was possible to derive `ForwardPattern` (i.e.,
`dropPrefix?`) from `ToForwardSearcher` (i.e., get an iterator of
`SearchStep (s)`). Now, we give the default instance in the other
direction: it is now possible to derive `ToForwardSearcher` from
`ForwardPattern`. Since it is usually much easier to provide
`ForwardPattern` than `ToForwardSearcher`, this means more shared code,
which pays off double since we will give a correctness proof for the
default implementation in an upcoming PR.

This PR also adds some string lemmas.
2026-02-05 07:38:31 +00:00
Kim Morrison
d49e5d8a3d
Revert "chore: temporarily disable proofs for bootstrap"
This reverts commit c56a5732a5a215f7b74d3f7a5cefd8612cf50474.
2026-02-05 13:41:34 +11:00
Kim Morrison
7b12b504df
chore: temporarily disable proofs for bootstrap
This adds `set_option debug.byAsSorry true` and `decreasing_by sorry` to
various files to allow bootstrapping with Config structure changes. These
changes will be restored after the bootstrap dance is complete.
2026-02-05 13:41:34 +11:00
Markus Himmel
8a9cb6def0
feat: Slice.posGE and Slice.posGT (#12301)
This PR introduces the functions `(String|Slice).posGE` and
`(String|Slice).posGT` with full verification and deprecates
`Slice.findNextPos` in favor of `Slice.posGT`.

The KMP implementation is adapted to use these two new functions.

Various useful string and order lemmas are added along the way.

Also add a `simp` attribute to `Std.le_refl` and fix the resulting
fallout (yes, this would have been better as a separate PR).
2026-02-04 09:45:44 +00:00
Paul Reichert
e7b6bd6734
refactor: rename Iter(M).count to Iter(M).length (#12210)
This PR renames `Iter(M).count` to `Iter(M).length` and updates lots of
lemmas, adding deprecations.
2026-01-29 07:26:13 +00:00
Markus Himmel
ba0e755adc
feat: Std.Iter.first? (#12162)
This PR adds the function `Std.Iter.first?` and proves the specification
lemma `Std.Iter.first?_eq_match_step` if the iterator is productive.

The monadic variant on `Std.IterM` is also provided.

We use this new function to fix the default implementation for
`startsWith` and `dropPrefix` on `String` patterns, which used to fail
if the searcher returned a `skip` at the beginning. None of the patterns
we ship out of the box were affected by this, but user-defined patterns
were vulnerable.

---------

Co-authored-by: Paul Reichert <6992158+datokrat@users.noreply.github.com>
2026-01-27 12:10:16 +00:00
Joachim Breitner
9167b13afa
refactor: move String.ofList to the Prelude (#12029)
This PR moves `String.ofList` to `Init.Prelude`. It is a function that
the Lean kernel expects to be present and has special support for (when
reducing string literals). By moving this to `Init.Prelude`, all
declarations that are special to the kernel are in that single module.
2026-01-19 08:22:13 +00:00
Alok Singh
4c360d50fa
style: fix typos in Init/ and Std/ docstrings (#11864)
Typos in `Init/` and `Std/`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 07:24:07 +00:00
Henrik Böving
4eb5b5776d
perf: inline IsUTF8FirstByte (#11872)
This PR marks IsUTF8FirstByte as inline.

I have a use case where it shows up significantly in the profile.
2026-01-02 11:21:54 +00:00
Paul Reichert
1590a72913
feat: make FinitenessRelation part of the public API (#11789)
This PR makes the `FinitenessRelation` structure, which is helpful when
proving the finiteness of iterators, part of the public API. Previously,
it was marked internal and experimental.
2025-12-29 20:45:41 +00:00
Paul Reichert
5ef0207a85
refactor: remove IteratorCollect (#11706)
This PR removes the `IteratorCollect` type class and hereby simplifies
the iterator API. Its limited advantages did not justify the complexity
cost.
2025-12-17 23:02:33 +00:00
Paul Reichert
c79d74d9a1
refactor: move Iter and others from Std.Iterators to Std (#11446)
This PR moves many constants of the iterator API from `Std.Iterators` to
the `Std` namespace in order to make them more convenient to use. These
constants include, but are not limited to, `Iter`, `IterM` and
`IteratorLoop`. This is a breaking change. If something breaks, try
adding `open Std` in order to make these constants available again. If
some constants in the `Std.Iterators` namespace cannot be found, they
can be found directly in `Std` now.
2025-12-15 08:24:12 +00:00
Robert J. Simmons
f88e503f3d
feat: @[suggest_for] annotations for prompting easy-to-miss names (#11554)
This PR adds `@[suggest_for]` annotations to Lean, allowing Lean to
provide corrections for `.every` or `.some` methods in place of `.all`
or `.any` methods for most default-imported types (arrays, lists,
strings, substrings, subarrays, and vectors).

Due to the need for stage0 updates for new annotations, the
`suggest_for` annotation itself was introduced in previous PRs: #11367,
#11529, and #11590.

## Example
```
example := "abc".every (! ·.isWhitespace)
```

Error message:
```
Invalid field `every`: The environment does not contain `String.every`, so it is not possible to project the field `every` from an expression
  "abc"
of type `String`

Hint: Perhaps you meant `String.all` in place of `String.every`:
  .e̵v̵e̵r̵y̵a̲l̲l̲
```

(the hint is added by this PR)

## Additional changes

Adds suggestions that are not currently active but that can be used to
generate autocompletion suggestions in the reference manual:
 - `Either` -> `Except` and `Sum`
 - `Exception` -> `Except`
 - `ℕ` -> `Nat`
 - `Nullable` -> `Option` 
 - `Maybe` -> `Option`
 - `Optional` -> `Option`
 - `Result` -> `Except`
2025-12-10 22:50:45 +00:00
Paul Reichert
383c0caa91
feat: remove Finite conditions from iterator consumers relying on a new fixpoint combinator (#11038)
This PR introduces a new fixpoint combinator,
`WellFounded.extrinsicFix`. A termination proof, if provided at all, can
be given extrinsically, i.e., looking at the term from the outside, and
is only required if one intends to formally verify the behavior of the
fixpoint. The new combinator is then applied to the iterator API.
Consumers such as `toList` or `ForIn` no longer require a proof that the
underlying iterator is finite. If one wants to ensure the termination of
them intrinsically, there are strictly terminating variants available
as, for example, `it.ensureTermination.toList` instead of `it.toList`.
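
A minimal sketch of the two consumer styles described above, assuming a toolchain that includes this PR and that `List.iter` is available after `import Std`. The definition `xs` is a hypothetical example value, not part of the PR.

```lean
import Std
open Std

-- Hypothetical example data.
def xs : List Nat := [1, 2, 3]

-- Plain consumer: no finiteness proof is demanded at the call site.
#eval xs.iter.toList

-- Strictly terminating variant named in the message above.
#eval xs.iter.ensureTermination.toList
```

Both calls are expected to produce the same list; the difference lies in whether termination is established intrinsically.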
2025-12-08 16:03:22 +00:00
Henrik Böving
e11800d3c8
perf: annotate built-in functions with tagged_return (#11549)
This PR annotates built-in `extern` functions with `tagged_return`.
2025-12-08 13:10:55 +00:00
Kim Morrison
6cbcbce750
feat: support underscores in String.toNat? and String.toInt? (#11541)
This PR adds support for underscores as digit separators in
String.toNat?, String.toInt?, and related parsing functions. This makes
the string parsing functions consistent with Lean's numeric literal
syntax, which already supports underscores for readability (e.g.,
100_000_000).

The implementation validates that underscores:
- Cannot appear at the start or end of the number
- Cannot appear consecutively
- Are ignored when calculating the numeric value

This resolves a common source of friction when parsing user input from
command-line arguments, environment variables, or configuration files,
where users naturally expect to use the same numeric syntax they use in
source code.

## Examples

Before:
```lean
#eval "100_000_000".toNat?  -- none
```

After:
```lean
#eval "100_000_000".toNat?  -- some 100000000
#eval "1_000".toInt?        -- some 1000
#eval "-1_000_000".toInt?   -- some (-1000000)
```
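
The validation rules listed above can be probed with a few edge cases. This is only a sketch: according to those rules each of the following inputs should be rejected, but the results have not been checked here against a specific toolchain.

```lean
-- Leading underscore: not allowed at the start of the number.
#eval "_100".toNat?

-- Trailing underscore: not allowed at the end of the number.
#eval "100_".toNat?

-- Consecutive underscores: not allowed.
#eval "1__000".toNat?
```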

## Testing

Added comprehensive tests in
`tests/lean/run/string_toNat_underscores.lean` covering:
- Basic underscore support
- Edge cases (leading/trailing/consecutive underscores)
- Both `toNat?` and `toInt?` functions
- String, Slice, and Substring types

All existing tests continue to pass.

Closes #11538

🤖 Prepared with Claude Code

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-08 03:57:55 +00:00
Tom Levy
2ca3bc2859
chore: fix spelling (#11531)
Hi, these are just some spelling corrections.

There is one I wasn't completely sure about in
src/Init/Data/List/Lemmas.lean:

> See also
> ...
> Also
> \* \`Init.Data.List.Monadic\` for **addiation** _(additional?)_ lemmas
about \`List.mapM\` and \`List.forM\`
2025-12-06 13:54:27 +00:00
Joachim Breitner
d4c832ecb0
perf: de-fuel some recursive definitions in Core (#11416)
This PR follows up on #7965 and avoids manual fuel constructions in some
recursive definitions.
2025-12-05 16:16:31 +00:00
Markus Himmel
e548fa414c
fix: Char -> Bool as default instance for string search (#11503)
This PR marks `Char -> Bool` patterns as default instances for string
search. This means that things like `" ".find (·.isWhitespace)` can now
be elaborated without error.

Previously, it was necessary to write `" ".find Char.isWhitespace`.
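
The two spellings side by side, as a small sketch that assumes a toolchain including this PR:

```lean
-- Elaborates after this PR thanks to the default instance.
example := " ".find (·.isWhitespace)

-- The spelling that was required before; it should continue to work.
example := " ".find Char.isWhitespace
```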

Thank you to David Christiansen for the idea of using a default
instance.
2025-12-04 09:25:16 +00:00
Markus Himmel
1ae680c5e2
chore: minor String API improvements (#11439)
This PR performs minor maintenance on the String API:

- Rename `String.Pos.toCopy` to `String.Pos.copy` to adhere to the
naming convention
- Rename `String.Pos.extract` to `String.extract` to get sane dot
notation again
- Add `String.Slice.Pos.extract`
2025-12-01 11:44:14 +00:00
David Thrane Christiansen
34adc4d941
doc: add missing docstrings (#11364)
This PR adds missing docstrings for constants that occur in the
reference manual.

---------

Co-authored-by: Johannes Tantow <44068763+jt0202@users.noreply.github.com>
2025-11-26 15:00:50 +00:00
Markus Himmel
5fb25fff06
feat: grind instances for String.Pos and variants (#11384)
This PR adds the necessary instances for `grind` to reason about
`String.Pos.Raw`, `String.Pos` and `String.Slice.Pos`.
2025-11-26 13:59:01 +00:00
Markus Himmel
d8913f88dc
feat: move String positions between slices (#11380)
This PR renames `String.Slice.Pos.ofSlice` to `String.Pos.ofToSlice` to
adhere to the (yet-to-be-documented) naming convention for mapping
positions to positions. It then adds several new functions so that for
every way to construct a slice from a string and slice, there are now
functions for mapping positions forwards and backwards along this
construction.
2025-11-26 11:48:59 +00:00
Markus Himmel
5a5f8c4c2e
perf: unbundle needle from char/pred pattern (#11376)
This PR aims to improve the performance of `String.contains`,
`String.find`, etc. when using patterns of type `Char` or `Char -> Bool`
by moving the needle out of the iterator state and thus working around
missing unboxing in the compiler.
2025-11-26 07:30:29 +00:00
Markus Himmel
85d7f3321c
feat: String.Slice.toInt? (#11358)
This PR adds `String.Slice.toInt?` and variants.

Closes #11275.
2025-11-25 15:48:41 +00:00
Markus Himmel
d99c515b16
refactor: String functions foldr, all, any, contains to go through String.Slice (#11357)
This PR updates the `foldr`, `all`, `any` and `contains` functions on
`String` to be defined in terms of their `String.Slice` counterparts.

This is the last one in a long series of PRs. After this, all `String`
operations are polymorphic in the pattern, and no `String` operation
falls back to `String.Pos.Raw` internally (except those in the
`String.Pos.Raw` and `String.Substring.Raw` namespaces of course, which
still play a role in metaprogramming and will stay for the foreseeable
future).
2025-11-25 15:42:43 +00:00
Markus Himmel
29ac158fcf
feat: String.Pos.le_find (#11354)
This PR adds simple lemmas that show that searching from a position in a
string returns something that is at least that position.
2025-11-25 11:05:58 +00:00
Markus Himmel
151c034f4f
refactor: rename String.bytes to String.toByteArray (#11343)
This PR renames `String.bytes` to `String.toByteArray`.

This is for two reasons: first, `toByteArray` is a better name, and
second, we have something else that wants to use the name `bytes`,
namely the function that returns an iterator over the string's bytes.
2025-11-24 18:59:49 +00:00
Markus Himmel
96c4b9ee4d
feat: coercion from String to String.Slice (#11341)
This PR adds a coercion from `String` to `String.Slice`.

In our envisioned future, most functions operating on strings will
accept `String.Slice` parameters by default (like `str` in Rust), and
this enables calling such functions with arguments of type `String`.
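
A minimal sketch of what the coercion enables, assuming a toolchain that includes this PR. The names `isBlank` and `greeting` are hypothetical and only serve the illustration.

```lean
-- Hypothetical helper that accepts a slice rather than a `String`.
def isBlank (s : String.Slice) : Bool := s.isEmpty

-- Hypothetical example value of type `String`.
def greeting : String := "abc"

-- The `String` argument is coerced to `String.Slice` at the call site.
example := isBlank greeting
```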

Closes #11298.
2025-11-24 16:50:08 +00:00
Markus Himmel
fa67f300f6
chore: rename String.ValidPos to String.Pos (#11240)
This PR renames `String.ValidPos` to `String.Pos`, `String.endValidPos`
to `String.endPos` and `String.startValidPos` to `String.startPos`.

Accordingly, the deprecations of `String.Pos` to `String.Pos.Raw` and
`String.endPos` to `String.rawEndPos` are removed early, after an
abbreviated deprecation cycle of two releases.
2025-11-24 16:40:21 +00:00
Markus Himmel
e6a07ca6b1
refactor: deprecate String.posOf and variants in favor of unified String.find (#11276)
This PR cleans up the API around `String.find` and moves it uniformly to
the new position types `String.ValidPos` and `String.Slice.Pos`.

Overview:

- To search for a character, character predicate, string or slice in a
string or slice `s`, use `s.find?` or `s.find`.
- To do the same, but starting at a position `p` of a string or slice,
use `p.find?` or `p.find`.
- To do the same but between two positions `p` and `q`, construct the
slice from `p` to `q` and then use `find?` or `find` on that.
- To search backwards, all of the above applies, except that the
function is called `revFind?`, there is no non-question-mark version
(use `getD` if there is a sane default return value in your specific
application), and that you can only search for characters and character
predicates, not strings or slices.
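
The overview above can be sketched roughly as follows. This assumes a recent toolchain; in particular, `String.startPos` is the name after the later rename in #11240 (it was `String.startValidPos` at the time of this PR), and the exact return types are left to elaboration by using `example :=`.

```lean
-- Forward search in a whole string, for a character and for a string.
example := "a,b,c".find? ','
example := "a,b,c".find? "b,"

-- Forward search starting from a position of the string.
example := "a,b,c".startPos.find? ','

-- Backward search: only `revFind?`, and only for characters
-- and character predicates.
example := "a,b,c".revFind? ','
example := "a,b,c".revFind? Char.isAlpha
```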
2025-11-23 18:39:53 +00:00
Markus Himmel
fba166eea0
chore: expose more String.Slice functions on String (#11308)
This PR redefines `front` and `back` on `String` to go through
`String.Slice` and adds the new `String` functions `front?`, `back?`,
`positions`, `chars`, `revPositions`, `revChars`, `byteIterator`,
`revBytes`, `lines`.
2025-11-23 15:33:16 +00:00
Markus Himmel
dda6885eae
refactor: String.foldl and String.isNat go through String.Slice (#11289)
This PR redefines `String.foldl`, `String.isNat` to use their
`String.Slice` counterparts.
2025-11-21 11:17:50 +00:00
Markus Himmel
51b67385cc
refactor: better name for String.replaceStart and variants (#11290)
This PR renames `String.replaceStartEnd` to `String.slice`,
`String.replaceStart` to `String.sliceFrom`, and `String.replaceEnd` to
`String.sliceTo`, and similar for the corresponding functions on
`String.Slice`.
2025-11-20 16:42:27 +00:00
Markus Himmel
7267ed707a
feat: string patterns for decidable predicates on Char (#11285)
This PR adds `Std.Slice.Pattern` instances for `p : Char -> Prop` as
long as `DecidablePred p`, to allow things like `"hello".dropWhile (· =
'h')`.

To achieve this, we refactor `ForwardPattern` and friends to be
"non-uniform", i.e., the class is now `ForwardPattern pat`, not
`ForwardPattern ρ` (where `pat : ρ`).
2025-11-20 15:30:37 +00:00
Markus Himmel
f7ed158002
chore: introduce and immediately deprecate String.Slice.length (#11286)
This PR adds a function `String.Slice.length`, with the following
deprecation string: There is no constant-time length function on slices.
Use `s.positions.count` instead, or `isEmpty` if you only need to know
whether the slice is empty.
2025-11-20 14:31:46 +00:00
Markus Himmel
cf0e4441e8
chore: create alias String.Slice.any for String.Slice.contains (#11282)
This PR adds the alias `String.Slice.any` for `String.Slice.contains`.

It would probably be even better to only have one, but we don't have a
good mechanism for pointing people looking for one towards the other, so
an alias it is for now.
2025-11-20 13:21:30 +00:00
Markus Himmel
2c12bc9fdf
chore: more deprecations for string migration (#11281)
This PR adds a few deprecations for functions that never existed but
that are still helpful for people migrating their code post-#11180.
2025-11-20 13:09:52 +00:00
Henrik Böving
827a96ade3
fix: several memory leaks in the new String API (#11263)
This PR fixes several memory leaks in the new `String` API.

These leaks are mostly situations where we forgot to put borrowing
annotations. The single exception is the new `String` constructor
`ofByteArray`. It cannot take the `ByteArray` as a borrowed argument
anymore and must thus free it on its own.
2025-11-19 18:23:35 +00:00