Commit graph

18 commits

Author SHA1 Message Date
Markus Himmel
5a5f8c4c2e
perf: unbundle needle from char/pred pattern (#11376)
This PR aims to improve the performance of `String.contains`,
`String.find`, etc. when using patterns of type `Char` or `Char -> Bool`
by moving the needle out of the iterator state and thus working around
missing unboxing in the compiler.
2025-11-26 07:30:29 +00:00
Markus Himmel
fa67f300f6
chore: rename String.ValidPos to String.Pos (#11240)
This PR renames `String.ValidPos` to `String.Pos`, `String.endValidPos`
to `String.endPos` and `String.startValidPos` to `String.startPos`.

Accordingly, the deprecations of `String.Pos` to `String.Pos.Raw` and
`String.endPos` to `String.rawEndPos` are removed early, after an
abbreviated deprecation cycle of two releases.
2025-11-24 16:40:21 +00:00
Markus Himmel
7267ed707a
feat: string patterns for decidable predicates on Char (#11285)
This PR adds `Std.Slice.Pattern` instances for `p : Char -> Prop` as
long as `DecidablePred p`, to allow things like `"hello".dropWhile (· =
'h')`.

To achieve this, we refactor `ForwardPattern` and friends to be
"non-uniform", i.e., the class is now `ForwardPattern pat`, not
`ForwardPattern ρ` (where `pat : ρ`).
2025-11-20 15:30:37 +00:00
Henrik Böving
52b687cab4
perf: less allocations when using string patterns (#11255)
This PR reduces the allocations when using string patterns. In
particular
`startsWith`, `dropPrefix?`, `endsWith`, `dropSuffix?` are optimized.
2025-11-19 13:06:27 +00:00
Markus Himmel
106b0fa661
fix: KMP implementation (#10998)
This PR fixes the KMP implementation, which did incorrect bookkeeping of
the backtracking process, leading to incorrect starting ranges of
matches.

The new implementation does not require `partial` anywhere.
2025-10-29 06:04:45 +00:00
Markus Himmel
d2f76ade61
fix: search for empty string (#10985)
This PR ensures that searching for an empty string returns the expected
pattern of alternating size-zero matches and size-one rejects.

In particular, splitting by an empty string returns an array formed of
the empty string, all of the string's characters as singleton strings,
followed by another empty string. This matches the [Rust
behavior](https://doc.rust-lang.org/std/primitive.str.html#method.split),
for example.
2025-10-27 13:05:33 +00:00
Markus Himmel
8fe260de55
feat: termination arguments for String.ValidPos and String.Slice.Pos (#10933)
This PR adds the basic infrastructure to perform termination proofs
about `String.ValidPos` and `String.Slice.Pos`.

We choose approach where the intended way to do termination arguments is
to argue about the position itself rather than some projection of it
like `remainingBytes`.

The types `String.ValidPos` and `String.Slice.Pos` are equipped with a
`WellFoundedRelation` instance given by the greater-than relation. This
means that if a function takes a position `p` and performs a recursive
call on `q`, then the decreasing obligation will be `p < q`. This works
well in the common case where `q` is `p.next h`, in which case the goal
`p < p.next h` is solved by the simplifier.

For stepping through a string backwards, we introduce a type synonym
with a `WellFoundedRelation` instance given by the less-than relation.
This means that if a function takes a position `p` and performs a
recursive call on `q` and specifies `termination_by p.down`, then the
decreasing obligation will be `q < p`. This works well in the case where
`q` is `p.prev h`, in which case the goal `p.prev h < p` is solved by
the simplifier.

For termination arguments invoving multiple strings, the lower-level
primitive `p.remainingBytes` (landing in `Nat`) is also available.

In a future PR, we will additionally provide the necessary typeclasses
instances to register `String.ValidPos` and `String.Slice.Pos` with
`grind` to make complex termination arguments more convenient in user
code.
2025-10-27 10:05:44 +00:00
Markus Himmel
59573646c2
chore: more minor String improvements (#10930)
This PR moves some more material out of `Init.Data.String.Basic` and
fixes the incorrect name
`String.Pos.Raw.IsValidForSlice.le_utf8ByteSize`.
2025-10-23 13:57:23 +00:00
Markus Himmel
196d50156a
fix: logic error in String.Slice.takeWhile (#10868)
This PR fixes a bug in `String.Slice.takeWhile` which caused it to get
its bookkeeping wrong and panic. The new version only uses safe
operations on `String.Slice.Pos`.
2025-10-21 09:52:11 +00:00
Markus Himmel
c981ebc546
feat: split and splitInclusive iterators are finite (#10820)
This PR shows that the iterators returned by `String.Slice.split` and
`String.Slice.splitInclusive` are finite as long as the forward matcher
iterator for the pattern is finite (which we already know for all of our
patterns).

At actually also completely redefines the iterators to avoid the inner
loop in `Internal.nextMatch` which generates inefficient code. Instead,
when encountering a mismach from the matcher, we `skip` the split
iterator.
2025-10-20 10:21:21 +00:00
Markus Himmel
dad541265c
refactor: move operations on String.Pos.Raw to the String.Pos.Raw namespace (#10735)
This PR moves many operations involving `String.Pos.Raw` to a the
`String.Pos.Raw` namespace with the eventual aim of freeing up the
`String` namespace to contain operations using `String.ValidPos` (to be
renamed to `String.Pos`) instead.

This PR adds the `String.ValidPos.set` and `String.ValidPos.modify`
functions.

After this PR, `String.pos_lt_eq` is no longer a `simp` lemma. Add
`String.Pos.Raw.lt_iff` as a `simp` lemma if your proofs break.
2025-10-18 12:12:55 +00:00
Paul Reichert
f58999a7a6
refactor: use Shrink stub in the iterator framework (#10725)
This PR introduces a no-op version of `Shrink`, a type that should allow
shrinking small types into smaller universes given a proof that the type
is small enough, and uses it in the iterator library. Because this type
would require special compiler support, the current version is just a
wrapper around the inner type so that the wrapper is equivalent, but not
definitionally equivalent.

While `Shrink` is unable to shrink universes right now, but introducing
it now will allow us to generalize the universes in the iterator library
with fewer breaking changes as soon as an actual `Shrink` is possible.
2025-10-14 10:22:14 +00:00
Markus Himmel
dca8d6d188
refactor: discipline around arithmetic of String.Pos.Raw (#10713)
This PR enforces rules around arithmetic of `String.Pos.Raw`.

Specifically, it adopts the following conventions:

- Byte indices ("ordinals") in strings should be represented using
`String.Pos.Raw`
- Amounts of bytes ("cardinals") in strings should be represented using
`Nat`.

For example, `String.Slice.utf8ByteSize` now returns `Nat` instead of
`String.Pos.Raw`, and there is a new function `String.Slice.rawEndPos`.

Finally, the `HAdd` and `HSub` instances for `String.Pos.Raw` are
reorganized. This is a **breaking change**.

The `HAdd/HSub String.Pos.Raw String.Pos.Raw String.Pos.Raw` instances
have been removed. For the use case of tracking positions relative to
some other position, we instead provide `offsetBy` and `unoffsetBy`
functions. For the use case of advancing/unadvancing a position by an
arbitrary number of bytes, we instead provide `increaseBy` and
`decreaseBy` functions. For
offsetting/unoffsetting/advancing/unadvancing a position `p` by the size
of a string `s` (resp. character `c`), use `s + p`/`p - s`/`p + s`/`p -
s` (resp. `c + p`/`p - c`/`p + c`/`p - c`).
2025-10-09 07:47:45 +00:00
Markus Himmel
d228cd3edd
feat: LT and LE instances on new position types (#10685)
This PR introduces `LT` and `LE` instances on `String.ValidPos` and
`String.Slice.Pos`.
2025-10-06 16:06:16 +00:00
David Thrane Christiansen
2c6576b269
chore: missing docstring + style updates for String docs (#10640)
This PR adds a missing docstring and applies our style guide to parts of
the String API.
2025-10-02 04:19:55 +00:00
Markus Himmel
2cca32ccc3
chore: use UTF8 instead of Utf8 in identifiers (#10636)
This PR renames `String.getUtf8Byte` to `String.getUTF8Byte` in order to
adhere to the standard library naming convention.
2025-10-01 17:57:32 +00:00
Markus Himmel
81ea922025
chore: rename String.Pos to String.Pos.Raw (#10624)
This PR renames `String.Pos` to `String.Pos.Raw`.

After an abbreviated deprecation cycle, we will then rename
`String.ValidPos` to `String.Pos`.
2025-10-01 07:45:24 +00:00
Henrik Böving
5fd8c1b94d
feat: new String.Slice API (#10514)
This PR defines the new `String.Slice` API.

Many of the core design principles of the API are taken over from Rust's
[string
library](https://doc.rust-lang.org/stable/std/string/struct.String.html).
2025-09-25 12:18:52 +00:00