Commit graph

210 commits

Author SHA1 Message Date
Markus Himmel
fba166eea0
chore: expose more String.Slice functions on String (#11308)
This PR redefines `front` and `back` on `String` to go through
`String.Slice` and adds the new `String` functions `front?`, `back?`,
`positions`, `chars`, `revPositions`, `revChars`, `byteIterator`,
`revBytes`, `lines`.
2025-11-23 15:33:16 +00:00
Markus Himmel
dda6885eae
refactor: String.foldl and String.isNat go through String.Slice (#11289)
This PR redefines `String.foldl`, `String.isNat` to use their
`String.Slice` counterparts.
2025-11-21 11:17:50 +00:00
Markus Himmel
51b67385cc
refactor: better name for String.replaceStart and variants (#11290)
This PR renames `String.replaceStartEnd` to `String.slice`,
`String.replaceStart` to `String.sliceFrom`, and `String.replaceEnd` to
`String.sliceTo`, and similar for the corresponding functions on
`String.Slice`.
2025-11-20 16:42:27 +00:00
Markus Himmel
7267ed707a
feat: string patterns for decidable predicates on Char (#11285)
This PR adds `Std.Slice.Pattern` instances for `p : Char -> Prop` as
long as `DecidablePred p`, to allow things like `"hello".dropWhile (· =
'h')`.

To achieve this, we refactor `ForwardPattern` and friends to be
"non-uniform", i.e., the class is now `ForwardPattern pat`, not
`ForwardPattern ρ` (where `pat : ρ`).
2025-11-20 15:30:37 +00:00
Markus Himmel
f7ed158002
chore: introduce and immediately deprecate String.Slice.length (#11286)
This PR adds a function `String.Slice.length`, with the following
deprecation string: There is no constant-time length function on slices.
Use `s.positions.count` instead, or `isEmpty` if you only need to know
whether the slice is empty.
2025-11-20 14:31:46 +00:00
Markus Himmel
cf0e4441e8
chore: create alias String.Slice.any for String.Slice.contains (#11282)
This PR adds the alias `String.Slice.any` for `String.Slice.contains`.

It would probably be even better to only have one, but we don't have a
good mechanism for pointing people looking for one towards the other, so
an alias it is for now.
2025-11-20 13:21:30 +00:00
Markus Himmel
2c12bc9fdf
chore: more deprecations for string migration (#11281)
This PR adds a few deprecations for functions that never existed but
that are still helpful for people migrating their code post-#11180.
2025-11-20 13:09:52 +00:00
Henrik Böving
827a96ade3
fix: several memory leaks in the new String API (#11263)
This PR fixes several memory leaks in the new `String` API.

These leaks are mostly situations where we forgot to put borrowing
annotations. The single
exception is the new `String` constructor `ofByteArray`. It cannot take
the `ByteArray` as
a borrowed argument anymore and must thus free it on its own.
2025-11-19 18:23:35 +00:00
Henrik Böving
52b687cab4
perf: less allocations when using string patterns (#11255)
This PR reduces the allocations when using string patterns. In
particular
`startsWith`, `dropPrefix?`, `endsWith`, `dropSuffix?` are optimized.
2025-11-19 13:06:27 +00:00
Markus Himmel
52d05b6972
refactor: use String.split instead of String.splitOn or String.splitToList (#11250)
This PR introduces a function `String.split` which is based on
`String.Slice.split` and therefore supports all pattern types and
returns a `Std.Iter String.Slice`.

This supersedes the functions `String.splitOn` and `String.splitToList`,
and we remove all all uses of these functions from core. They will be
deprecated in a future PR.

Migrating from `String.splitOn` and `String.splitToList` is easy: we
introduce functions `Iter.toStringList` and `Iter.toStringArray` that
can be used to conveniently go from `Std.Iter String.Slice` to `List
String` and `Array String`, so for example `s.splitOn "foo"` can be
replaced by `s.split "foo" |>.toStringList`.
2025-11-19 09:35:19 +00:00
Markus Himmel
59949f89ee
chore: add function String.Pos.extract (#11251)
This PR is a preparatory bootstrapping PR for #11240.
2025-11-19 08:05:28 +00:00
Markus Himmel
fa5d08b7de
refactor: use String.Slice in String.take and variants (#11180)
This PR redefines `String.take` and variants to operate on
`String.Slice`. While previously functions returning a substring of the
input sometimes returned `String` and sometimes returned
`Substring.Raw`, they now uniformly return `String.Slice`.

This is a BREAKING change, because many functions now have a different
return type. So for example, if `s` is a string and `f` is a function
accepting a string, `f (s.drop 1)` will no longer compile because
`s.drop 1` is a `String.Slice`. To fix this, insert a call to `copy` to
restore the old behavior: `f (s.drop 1).copy`.

Of course, in many cases, there will be more efficient options. For
example, don't write `f <| s.drop 1 |>.copy |>.dropEnd 1 |>.copy`, write
`f <| s.drop 1 |>.dropEnd 1 |>.copy` instead. Also, instead of `(s.drop
1).copy = "Hello"`, write `s.drop 1 == "Hello".toSlice` instead.
2025-11-18 16:13:48 +00:00
Markus Himmel
03eb2f73ac
chore: deprecate String.toSubstring (#11232)
This PR deprecates `String.toSubstring` in favor of
`String.toRawSubstring` (cf. #11154).
2025-11-18 13:50:50 +00:00
Wrenna Robson
36a6844625
feat: add Std.Trichotomous (#10945)
This PR adds `Std.Tricho r`, a typeclass for relations which identifies
them as trichotomous. This is preferred to `Std.Antisymm (¬ r · ·)` in
all cases (which it is equivalent to).
2025-11-18 13:20:53 +00:00
Markus Himmel
e301f86c6c
chore: add String.Pos.next (#11238)
This PR is split from a future PR and adds the function
`String.Pos.next`, an alias (and soon to be correct name) of
`String.ValidPos.next`.

This is for boring bootstrapping reasons.
2025-11-18 10:41:22 +00:00
Markus Himmel
f6a9059709
chore: rename String.offsetOfPos to String.Pos.Raw.offsetOfPos (#11218)
This PR renames `String.offsetOfPos` to `String.Pos.Raw.offsetOfPos` to
align with the other `String.Pos.Raw` operations.
2025-11-18 07:24:06 +00:00
Markus Himmel
bf60550ce5
chore: rename Substring to Substring.Raw (#11154)
This PR renames `Substring`  to `Substring.Raw`.

This is to signify its status as a second-class citizen (not deprecated,
but no real plans for verification, like `String.Pos.Raw`) and to free
up the name `Substring` for a possible future type `String.Substring :
String -> Type` so that `s.Substring` is the type of substrings of `s`.

The functions `String.toSubstring` and `String.toSubstring'` will remain
for now for bootstrapping reasons.
2025-11-16 09:30:04 +00:00
Markus Himmel
aca297d1c5
chore: some String API cleanup in Lake.Util.Version (#11160)
This PR performs some cleanup in `Lake.Util.Version`.

---------

Co-authored-by: Mac Malone <tydeu@hatpress.net>
2025-11-14 08:56:56 +00:00
Markus Himmel
eb01aaeee4
chore: rename String.Iterator to String.Legacy.Iterator (#11152)
This PR renames `String.Iterator` to `String.Legacy.Iterator`.

From the docstring of `String.Legacy.Iterator`:

> This is a no-longer-supported legacy API that will be removed in a
future release. You should use
> `String.ValidPos` instead, which is similar, but safer. To iterate
over a string `s`, start with
> `p : s.startValidPos`, advance it using `p.next`, access the current
character using `p.get` and
> check if the position is at the end using `p = s.endValidPos` or
`p.IsAtEnd`.
2025-11-13 13:46:22 +00:00
Markus Himmel
f1224277e2
perf: improve performance of String.ValidPos (#11142)
This PR aims to bring the performance of `String.ValidPos` closer to
that of `String.Pos.Raw` by adding/correcting `extern` annotations as
needed.

This is in response to a regression observed after #11127. The changes
to the `String` `Parsec` module lead to different compiler behavior for
functions like `strCore` and `natCore`. The new IR *looks* better than
the old IR, but the
[numbers](1e438647ba)
are a bit mixed.
2025-11-11 15:30:47 +00:00
Markus Himmel
2c2fcff4f8
refactor: do not use String.Iterator (#11127)
This PR removes all uses of `String.Iterator` from core, preferring
`String.ValidPos` instead.

In an upcoming PR, `String.Iterator` will be renamed to
`String.Legacy.Iterator`.
2025-11-11 11:46:58 +00:00
Markus Himmel
d24ece1396
feat: String.toList_map (#11021)
This PR adds more theory about `Splits` for strings and deduces the
first user-facing `String` lemma, `String.toList_map`.
2025-11-01 13:54:39 +00:00
Markus Himmel
377f149862
refactor: use String.ofList and String.toList for String <-> List Char conversion (#11017)
This PR establishes `String.ofList` and `String.toList` as the preferred
method for converting between strings and lists of characters and
deprecates the alternatives `String.mk`, `List.asString` and
`String.data`.
2025-10-31 14:41:23 +00:00
Markus Himmel
5af12df54b
chore: add String.ofList redefine String.toList (#11016)
This PR ensures that `String.toList` and `String.ofList` exist and have
the right `extern` annotations.
2025-10-30 07:07:12 +00:00
Markus Himmel
167429501b
refactor: redefine String.replace (#10986)
This PR defines `String.Slice.replace` and redefines `String.replace` to
use the `Slice` version.

The new implementation is generic in the pattern, so it supports things
like `"education".replace isVowel "☃!" = "☃!d☃!c☃!t☃!☃!n"`. Since it
uses the `ForwardSearcher` infrastructure, `String` patterns are
searched using KMP, unlike the previous implementation which had
quadratic runtime. As a side effect, the behavior when replacing an
empty string now matches that of most other programming languages,
namely `"abc".replace "" "k" = "kakbkck"`.
2025-10-29 07:48:33 +00:00
Markus Himmel
106b0fa661
fix: KMP implementation (#10998)
This PR fixes the KMP implementation, which did incorrect bookkeeping of
the backtracking process, leading to incorrect starting ranges of
matches.

The new implementation does not require `partial` anywhere.
2025-10-29 06:04:45 +00:00
Kim Morrison
335e34df19
chore: add deprecations for duplicated theorems (#10967) 2025-10-29 05:26:16 +00:00
Markus Himmel
d2f76ade61
fix: search for empty string (#10985)
This PR ensures that searching for an empty string returns the expected
pattern of alternating size-zero matches and size-one rejects.

In particular, splitting by an empty string returns an array formed of
the empty string, all of the string's characters as singleton strings,
followed by another empty string. This matches the [Rust
behavior](https://doc.rust-lang.org/std/primitive.str.html#method.split),
for example.
2025-10-27 13:05:33 +00:00
Markus Himmel
8fe260de55
feat: termination arguments for String.ValidPos and String.Slice.Pos (#10933)
This PR adds the basic infrastructure to perform termination proofs
about `String.ValidPos` and `String.Slice.Pos`.

We choose approach where the intended way to do termination arguments is
to argue about the position itself rather than some projection of it
like `remainingBytes`.

The types `String.ValidPos` and `String.Slice.Pos` are equipped with a
`WellFoundedRelation` instance given by the greater-than relation. This
means that if a function takes a position `p` and performs a recursive
call on `q`, then the decreasing obligation will be `p < q`. This works
well in the common case where `q` is `p.next h`, in which case the goal
`p < p.next h` is solved by the simplifier.

For stepping through a string backwards, we introduce a type synonym
with a `WellFoundedRelation` instance given by the less-than relation.
This means that if a function takes a position `p` and performs a
recursive call on `q` and specifies `termination_by p.down`, then the
decreasing obligation will be `q < p`. This works well in the case where
`q` is `p.prev h`, in which case the goal `p.prev h < p` is solved by
the simplifier.

For termination arguments invoving multiple strings, the lower-level
primitive `p.remainingBytes` (landing in `Nat`) is also available.

In a future PR, we will additionally provide the necessary typeclasses
instances to register `String.ValidPos` and `String.Slice.Pos` with
`grind` to make complex termination arguments more convenient in user
code.
2025-10-27 10:05:44 +00:00
Kim Morrison
a0e742be5e
chore: >6 month old deprecations (#10969) 2025-10-26 22:48:41 +00:00
Markus Himmel
59573646c2
chore: more minor String improvements (#10930)
This PR moves some more material out of `Init.Data.String.Basic` and
fixes the incorrect name
`String.Pos.Raw.IsValidForSlice.le_utf8ByteSize`.
2025-10-23 13:57:23 +00:00
Markus Himmel
ba7798b389
chore: more reorganization of strings (#10928)
This PR splits more material out of `Init.Data.String.Basic`.
2025-10-23 11:56:11 +00:00
Rob23oba
fad0e69cc7
fix: make name mangling unambiguous (#10727)
This PR fixes name mangling to be unambiguous / injective by adding `00`
for disambiguation where necessary. Additionally, the inverse function,
`Lean.Name.unmangle` has been added which can be used to unmangle a
mangled identifier. This unmangler has been added to demonstrate the
injectivity but also to allow unmangling identifiers e.g. for debugging
purposes.

Closes #10724
2025-10-23 07:18:07 +00:00
Markus Himmel
3ce7d4ef5c
chore: minor optimizations on the critical path (#10900)
This PR optimizes two `String` proofs and makes sure that
`MkIffOfInductiveProp` does not import `Lean.Elab.Tactic`, which
previously pushed it to the very end of the import graph.
2025-10-22 19:32:26 +00:00
Markus Himmel
b5dc11e8d3
chore: move some material out of Init.Data.String.Basic (#10893)
This PR splits some low-hanging fruit out of `Init.Data.String.Basic`:
basic material about `String.Pos.Raw`, `String.Substrig`, and
`String.Iterator`.

More splitting required and the remaining material is quite unorganized,
but it's a start.
2025-10-22 16:31:08 +00:00
Markus Himmel
6a1cc7d6b8
chore: minor String improvements (#10891)
This PR renames the cast functions on `String.ValidPos` for `set` and
`modify` to adhere to the established naming convention.

It also fixes two typos and very slighly tweaks the import graph,
shortening the critical path by a negligible amount.
2025-10-22 06:35:51 +00:00
Markus Himmel
b28daa6d60
chore: rename String.endPos -> String.rawEndPos (#10853)
This PR renames `String.endPos` to `String.rawEndPos`, as in a future
release the name `String.endPos` will be taken by the function that is
currently called `String.endValidPos`.
2025-10-21 11:25:30 +00:00
Markus Himmel
196d50156a
fix: logic error in String.Slice.takeWhile (#10868)
This PR fixes a bug in `String.Slice.takeWhile` which caused it to get
its bookkeeping wrong and panic. The new version only uses safe
operations on `String.Slice.Pos`.
2025-10-21 09:52:11 +00:00
Markus Himmel
c981ebc546
feat: split and splitInclusive iterators are finite (#10820)
This PR shows that the iterators returned by `String.Slice.split` and
`String.Slice.splitInclusive` are finite as long as the forward matcher
iterator for the pattern is finite (which we already know for all of our
patterns).

At actually also completely redefines the iterators to avoid the inner
loop in `Internal.nextMatch` which generates inefficient code. Instead,
when encountering a mismach from the matcher, we `skip` the split
iterator.
2025-10-20 10:21:21 +00:00
Markus Himmel
dad541265c
refactor: move operations on String.Pos.Raw to the String.Pos.Raw namespace (#10735)
This PR moves many operations involving `String.Pos.Raw` to a the
`String.Pos.Raw` namespace with the eventual aim of freeing up the
`String` namespace to contain operations using `String.ValidPos` (to be
renamed to `String.Pos`) instead.

This PR adds the `String.ValidPos.set` and `String.ValidPos.modify`
functions.

After this PR, `String.pos_lt_eq` is no longer a `simp` lemma. Add
`String.Pos.Raw.lt_iff` as a `simp` lemma if your proofs break.
2025-10-18 12:12:55 +00:00
Markus Himmel
ca7a8e18b7
refactor: rename String.split to String.splitToList (#10822)
This PR renames `String.split` to `String.splitToList`, because soon the
name `String.split` will be used by a new implementation which is
superior because it is polymorphic over the pattern kind and it returns
an iterator of slices instead of a list of strings.
2025-10-18 12:12:54 +00:00
Sebastian Ullrich
428355cf02
chore: remove redundant imports in core (#10750) 2025-10-16 20:27:46 +00:00
Paul Reichert
f58999a7a6
refactor: use Shrink stub in the iterator framework (#10725)
This PR introduces a no-op version of `Shrink`, a type that should allow
shrinking small types into smaller universes given a proof that the type
is small enough, and uses it in the iterator library. Because this type
would require special compiler support, the current version is just a
wrapper around the inner type so that the wrapper is equivalent, but not
definitionally equivalent.

While `Shrink` is unable to shrink universes right now, but introducing
it now will allow us to generalize the universes in the iterator library
with fewer breaking changes as soon as an actual `Shrink` is possible.
2025-10-14 10:22:14 +00:00
Markus Himmel
1dae353575
chore: duplicate some String functions ahead of deprecation (#10768)
This PR is split off from #10735 for boring bootstrapping reasons.
2025-10-14 07:36:05 +00:00
Markus Himmel
dca8d6d188
refactor: discipline around arithmetic of String.Pos.Raw (#10713)
This PR enforces rules around arithmetic of `String.Pos.Raw`.

Specifically, it adopts the following conventions:

- Byte indices ("ordinals") in strings should be represented using
`String.Pos.Raw`
- Amounts of bytes ("cardinals") in strings should be represented using
`Nat`.

For example, `String.Slice.utf8ByteSize` now returns `Nat` instead of
`String.Pos.Raw`, and there is a new function `String.Slice.rawEndPos`.

Finally, the `HAdd` and `HSub` instances for `String.Pos.Raw` are
reorganized. This is a **breaking change**.

The `HAdd/HSub String.Pos.Raw String.Pos.Raw String.Pos.Raw` instances
have been removed. For the use case of tracking positions relative to
some other position, we instead provide `offsetBy` and `unoffsetBy`
functions. For the use case of advancing/unadvancing a position by an
arbitrary number of bytes, we instead provide `increaseBy` and
`decreaseBy` functions. For
offsetting/unoffsetting/advancing/unadvancing a position `p` by the size
of a string `s` (resp. character `c`), use `s + p`/`p - s`/`p + s`/`p -
s` (resp. `c + p`/`p - c`/`p + c`/`p - c`).
2025-10-09 07:47:45 +00:00
Markus Himmel
d228cd3edd
feat: LT and LE instances on new position types (#10685)
This PR introduces `LT` and `LE` instances on `String.ValidPos` and
`String.Slice.Pos`.
2025-10-06 16:06:16 +00:00
Markus Himmel
5c707d936c
chore: rename Stream to Std.Stream (#10645)
This PR renames `Stream` to `Std.Stream` so that the name becomes
available to mathlib after a deprecation cycle.
2025-10-02 15:25:56 +00:00
David Thrane Christiansen
0b2193c771
chore: docstring review for ByteArray (#10632)
This PR adds missing docstrings for ByteArray and makes existing ones
consistent with our style.
2025-10-02 04:20:18 +00:00
David Thrane Christiansen
2c6576b269
chore: missing docstring + style updates for String docs (#10640)
This PR adds a missing docstring and applies our style guide to parts of
the String API.
2025-10-02 04:19:55 +00:00
Markus Himmel
2cca32ccc3
chore: use UTF8 instead of Utf8 in identifiers (#10636)
This PR renames `String.getUtf8Byte` to `String.getUTF8Byte` in order to
adhere to the standard library naming convention.
2025-10-01 17:57:32 +00:00