Commit graph

143 commits

Author SHA1 Message Date
Markus Himmel
f1224277e2
perf: improve performance of String.ValidPos (#11142)
This PR aims to bring the performance of `String.ValidPos` closer to
that of `String.Pos.Raw` by adding/correcting `extern` annotations as
needed.

This is in response to a regression observed after #11127. The changes
to the `String` `Parsec` module lead to different compiler behavior for
functions like `strCore` and `natCore`. The new IR *looks* better than
the old IR, but the
[numbers](1e438647ba)
are a bit mixed.
2025-11-11 15:30:47 +00:00
Markus Himmel
2c2fcff4f8
refactor: do not use String.Iterator (#11127)
This PR removes all uses of `String.Iterator` from core, preferring
`String.ValidPos` instead.

In an upcoming PR, `String.Iterator` will be renamed to
`String.Legacy.Iterator`.
2025-11-11 11:46:58 +00:00
Markus Himmel
d24ece1396
feat: String.toList_map (#11021)
This PR adds more theory about `Splits` for strings and deduces the
first user-facing `String` lemma, `String.toList_map`.
2025-11-01 13:54:39 +00:00
Markus Himmel
377f149862
refactor: use String.ofList and String.toList for String <-> List Char conversion (#11017)
This PR establishes `String.ofList` and `String.toList` as the preferred
method for converting between strings and lists of characters and
deprecates the alternatives `String.mk`, `List.asString` and
`String.data`.
2025-10-31 14:41:23 +00:00
Markus Himmel
5af12df54b
chore: add String.ofList redefine String.toList (#11016)
This PR ensures that `String.toList` and `String.ofList` exist and have
the right `extern` annotations.
2025-10-30 07:07:12 +00:00
Kim Morrison
335e34df19
chore: add deprecations for duplicated theorems (#10967) 2025-10-29 05:26:16 +00:00
Markus Himmel
8fe260de55
feat: termination arguments for String.ValidPos and String.Slice.Pos (#10933)
This PR adds the basic infrastructure to perform termination proofs
about `String.ValidPos` and `String.Slice.Pos`.

We choose approach where the intended way to do termination arguments is
to argue about the position itself rather than some projection of it
like `remainingBytes`.

The types `String.ValidPos` and `String.Slice.Pos` are equipped with a
`WellFoundedRelation` instance given by the greater-than relation. This
means that if a function takes a position `p` and performs a recursive
call on `q`, then the decreasing obligation will be `p < q`. This works
well in the common case where `q` is `p.next h`, in which case the goal
`p < p.next h` is solved by the simplifier.

For stepping through a string backwards, we introduce a type synonym
with a `WellFoundedRelation` instance given by the less-than relation.
This means that if a function takes a position `p` and performs a
recursive call on `q` and specifies `termination_by p.down`, then the
decreasing obligation will be `q < p`. This works well in the case where
`q` is `p.prev h`, in which case the goal `p.prev h < p` is solved by
the simplifier.

For termination arguments invoving multiple strings, the lower-level
primitive `p.remainingBytes` (landing in `Nat`) is also available.

In a future PR, we will additionally provide the necessary typeclasses
instances to register `String.ValidPos` and `String.Slice.Pos` with
`grind` to make complex termination arguments more convenient in user
code.
2025-10-27 10:05:44 +00:00
Markus Himmel
59573646c2
chore: more minor String improvements (#10930)
This PR moves some more material out of `Init.Data.String.Basic` and
fixes the incorrect name
`String.Pos.Raw.IsValidForSlice.le_utf8ByteSize`.
2025-10-23 13:57:23 +00:00
Markus Himmel
ba7798b389
chore: more reorganization of strings (#10928)
This PR splits more material out of `Init.Data.String.Basic`.
2025-10-23 11:56:11 +00:00
Rob23oba
fad0e69cc7
fix: make name mangling unambiguous (#10727)
This PR fixes name mangling to be unambiguous / injective by adding `00`
for disambiguation where necessary. Additionally, the inverse function,
`Lean.Name.unmangle` has been added which can be used to unmangle a
mangled identifier. This unmangler has been added to demonstrate the
injectivity but also to allow unmangling identifiers e.g. for debugging
purposes.

Closes #10724
2025-10-23 07:18:07 +00:00
Markus Himmel
3ce7d4ef5c
chore: minor optimizations on the critical path (#10900)
This PR optimizes two `String` proofs and makes sure that
`MkIffOfInductiveProp` does not import `Lean.Elab.Tactic`, which
previously pushed it to the very end of the import graph.
2025-10-22 19:32:26 +00:00
Markus Himmel
b5dc11e8d3
chore: move some material out of Init.Data.String.Basic (#10893)
This PR splits some low-hanging fruit out of `Init.Data.String.Basic`:
basic material about `String.Pos.Raw`, `String.Substrig`, and
`String.Iterator`.

More splitting required and the remaining material is quite unorganized,
but it's a start.
2025-10-22 16:31:08 +00:00
Markus Himmel
6a1cc7d6b8
chore: minor String improvements (#10891)
This PR renames the cast functions on `String.ValidPos` for `set` and
`modify` to adhere to the established naming convention.

It also fixes two typos and very slighly tweaks the import graph,
shortening the critical path by a negligible amount.
2025-10-22 06:35:51 +00:00
Markus Himmel
b28daa6d60
chore: rename String.endPos -> String.rawEndPos (#10853)
This PR renames `String.endPos` to `String.rawEndPos`, as in a future
release the name `String.endPos` will be taken by the function that is
currently called `String.endValidPos`.
2025-10-21 11:25:30 +00:00
Markus Himmel
196d50156a
fix: logic error in String.Slice.takeWhile (#10868)
This PR fixes a bug in `String.Slice.takeWhile` which caused it to get
its bookkeeping wrong and panic. The new version only uses safe
operations on `String.Slice.Pos`.
2025-10-21 09:52:11 +00:00
Markus Himmel
dad541265c
refactor: move operations on String.Pos.Raw to the String.Pos.Raw namespace (#10735)
This PR moves many operations involving `String.Pos.Raw` to a the
`String.Pos.Raw` namespace with the eventual aim of freeing up the
`String` namespace to contain operations using `String.ValidPos` (to be
renamed to `String.Pos`) instead.

This PR adds the `String.ValidPos.set` and `String.ValidPos.modify`
functions.

After this PR, `String.pos_lt_eq` is no longer a `simp` lemma. Add
`String.Pos.Raw.lt_iff` as a `simp` lemma if your proofs break.
2025-10-18 12:12:55 +00:00
Markus Himmel
ca7a8e18b7
refactor: rename String.split to String.splitToList (#10822)
This PR renames `String.split` to `String.splitToList`, because soon the
name `String.split` will be used by a new implementation which is
superior because it is polymorphic over the pattern kind and it returns
an iterator of slices instead of a list of strings.
2025-10-18 12:12:54 +00:00
Sebastian Ullrich
428355cf02
chore: remove redundant imports in core (#10750) 2025-10-16 20:27:46 +00:00
Markus Himmel
1dae353575
chore: duplicate some String functions ahead of deprecation (#10768)
This PR is split off from #10735 for boring bootstrapping reasons.
2025-10-14 07:36:05 +00:00
Markus Himmel
dca8d6d188
refactor: discipline around arithmetic of String.Pos.Raw (#10713)
This PR enforces rules around arithmetic of `String.Pos.Raw`.

Specifically, it adopts the following conventions:

- Byte indices ("ordinals") in strings should be represented using
`String.Pos.Raw`
- Amounts of bytes ("cardinals") in strings should be represented using
`Nat`.

For example, `String.Slice.utf8ByteSize` now returns `Nat` instead of
`String.Pos.Raw`, and there is a new function `String.Slice.rawEndPos`.

Finally, the `HAdd` and `HSub` instances for `String.Pos.Raw` are
reorganized. This is a **breaking change**.

The `HAdd/HSub String.Pos.Raw String.Pos.Raw String.Pos.Raw` instances
have been removed. For the use case of tracking positions relative to
some other position, we instead provide `offsetBy` and `unoffsetBy`
functions. For the use case of advancing/unadvancing a position by an
arbitrary number of bytes, we instead provide `increaseBy` and
`decreaseBy` functions. For
offsetting/unoffsetting/advancing/unadvancing a position `p` by the size
of a string `s` (resp. character `c`), use `s + p`/`p - s`/`p + s`/`p -
s` (resp. `c + p`/`p - c`/`p + c`/`p - c`).
2025-10-09 07:47:45 +00:00
Markus Himmel
d228cd3edd
feat: LT and LE instances on new position types (#10685)
This PR introduces `LT` and `LE` instances on `String.ValidPos` and
`String.Slice.Pos`.
2025-10-06 16:06:16 +00:00
David Thrane Christiansen
0b2193c771
chore: docstring review for ByteArray (#10632)
This PR adds missing docstrings for ByteArray and makes existing ones
consistent with our style.
2025-10-02 04:20:18 +00:00
David Thrane Christiansen
2c6576b269
chore: missing docstring + style updates for String docs (#10640)
This PR adds a missing docstring and applies our style guide to parts of
the String API.
2025-10-02 04:19:55 +00:00
Markus Himmel
2cca32ccc3
chore: use UTF8 instead of Utf8 in identifiers (#10636)
This PR renames `String.getUtf8Byte` to `String.getUTF8Byte` in order to
adhere to the standard library naming convention.
2025-10-01 17:57:32 +00:00
Markus Himmel
29c2b86ef4
chore: String.getUTF8Byte (#10637)
This PR adds the function `String.getUTF8Byte` ahead of a more
comprehensive PR to use `UTF8` instead of `Utf8` in identifiers.
2025-10-01 13:59:42 +00:00
Markus Himmel
5bfbe2a875
refactor: incorporate UTF8 material from String.Extra into String.Basic (#10634)
This PR defines `ByteArray.validateUTF8`, uses it to show that
`ByteArray.IsValidUtf8` is decidable and redefines `String.fromUTF8` and
friends to use it.

The functions `String.validateUTF8` and `String.utf8DecodeChar?` are
deprecated in favor of the identically named functions in the
`ByteArray` namespace.
2025-10-01 11:33:29 +00:00
Markus Himmel
81ea922025
chore: rename String.Pos to String.Pos.Raw (#10624)
This PR renames `String.Pos` to `String.Pos.Raw`.

After an abbreviated deprecation cycle, we will then rename
`String.ValidPos` to `String.Pos`.
2025-10-01 07:45:24 +00:00
Mario Carneiro
9f41f3324a
fix: make Substring.beq reflexive (#10552)
This PR ensures that `Substring.beq` is reflexive, and in particular
satisfies the equivalence `ss1 == ss2 <-> ss1.toString = ss2.toString`.

Closes #10511.

Note: I also fixed a strange line in the `String.extract` documentation
which looks like it may have been a copypasta, and added another example
to show how invalid UTF8 positions work, but the doc also makes a point
of saying that it is unspecified so maybe it would be better not to have
the example? 🤷
2025-09-25 05:08:41 +00:00
Markus Himmel
d6cd738ab4
feat: redefine String, part two (#10457)
This PR introduces safe alternatives to `String.Pos` and `Substring`
that can only represent valid positions/slices.

Specifically, the PR

- introduces the predicate `String.Pos.IsValid`;
- proves several nontrivial equivalent conditions for
`String.Pos.IsValid`;
- introduces `String.ValidPos`, which is a `String.Pos` with an
`IsValid` proof;
- introduces `String.Slice`, which is like `Substring` but made from
`String.ValidPos` instead of `Pos`;
- introduces `String.Pos.IsValidForSlice`, which is like
`String.Pos.IsValid` but for slices;
- introduces `String.Slice.Pos`, which is like `String.ValidPos` but for
slices;
- introduces various functions for converting between the two types of
positions.

The API added in this PR is not complete. It will be expanded in future
PRs with addional operations and verification.
2025-09-24 13:36:55 +00:00
Markus Himmel
b6198434f2
fix: String regressions (#10523)
This PR fixes some regressions introduced by #10304.
2025-09-24 12:01:50 +00:00
Markus Himmel
197bc6cb66
feat: redefine String, part one (#10304)
This PR redefines `String` to be the type of byte arrays `b` for which
`b.IsValidUtf8`.

This moves the data model of strings much closer to the actual data
representation at runtime.

In the near future, we will

- provide variants of `String.Pos` and `Substring` that only allow for
valid positions
- redefine all `String` functions to be much closer to their C++
implementations

In the near-to-medium future we will then provide comprehensive
verification of `String` based on these refactors.
2025-09-18 11:36:52 +00:00
Markus Himmel
9402c307fe
chore: reorganize Init imports around strings (#10289)
This PR reorganizes the import hierarchy so that
`Init.Data.String.Basic` can import `Init.Data.UInt.Bitwise` and
`Init.Data.Array.Lemmas`.
2025-09-07 17:09:14 +00:00
Markus Himmel
aa0a31ae7d
chore: prepare for untangling strings (#10288)
This PR prepares for a future reorganization of the import hierarchy so
that `Init.Data.String.Basic` can import `Init.Data.UInt.Bitwise` and
`Init.Data.Array.Lemmas`.
2025-09-07 12:58:23 +00:00
David Thrane Christiansen
82932ec86a
feat: add stop position to parser (#10057)
This PR adds a stop position field to parser input contexts, allowing
the parser to be instructed to stop parsing prior to the end of a file.

This is step 1, prior to a stage0 update, to make run-time data
structures sufficiently compatible to avoid segfaults. After the update,
the actual code to stop parsing can be merged.
2025-08-22 17:04:04 +00:00
Markus Himmel
2e6c1a74e5
chore: move String.Pos operations out of Prelude (#9845)
This PR moves arithmetic of `String.Pos` out of the prelude.

Other `String` declarations are part of the prelude because they are
generated by macros, but this does not seem to be the case for these.
2025-08-18 09:23:02 +00:00
Kim Morrison
b676fb1164
fix: @[expose] String.firstDiffPos and String.extract (#9792)
This PR adds `@[expose]` to two definitions with `where` clauses that
Batteries proves theorems about.
2025-08-08 04:55:45 +00:00
Kim Morrison
6e06978961
chore: remove >6 month old deprecations (#9640) 2025-08-05 02:29:15 +00:00
Markus Himmel
3eab35ef22
chore: minor improvements (#9708)
This PR stylistically improves an internal hash map proof and fixes a
typo in the docsting of `String.join`.
2025-08-04 07:12:05 +00:00
Sebastian Ullrich
ff1d3138bf
refactor: module-ize Lean (#9330) 2025-07-25 12:02:51 +00:00
Rob23oba
b7f433c5b9
fix: behavior of String.prev (#9441)
This PR fixes the behavior of `String.prev`, aligning the runtime
implementation with the reference implementation. In particular, the
following statements hold now:
- `(s.prev p).byteIdx` is at least `p.byteIdx - 4` and at most
`p.byteIdx - 1`
- `s.prev 0 = 0`
- `s.prev` is monotone

Closes #9439
2025-07-21 10:50:14 +00:00
Sebastian Ullrich
09a5b34931
feat: make private the default in module (#9044)
This PR adjusts the experimental module system to make `private` the
default visibility modifier in `module`s, introducing `public` as a new
modifier instead. `public section` can be used to revert the default for
an entire section, though this is more intended to ease gradual adoption
of the new semantics such as in `Init` (and soon `Std`) where they
should be replaced by a future decl-by-decl re-review of visibilities.
2025-06-28 16:30:53 +00:00
Rob23oba
e450a02621
fix: change show tactic to work as documented (#7395)
This PR changes the `show t` tactic to match its documentation.
Previously it was a synonym for `change t`, but now it finds the first
goal that unifies with the term `t` and moves it to the front of the
goal list.
2025-06-12 23:54:09 +00:00
Joachim Breitner
803dc3e687
refactor: Init: expose lots of functions (#8501)
This PR adds the `@[expose]` attribute to many functions (and changes
some theorems to be by `:= (rfl)`) in preparation for the `@[defeq]`
attribute change in #8419.
2025-05-28 07:37:54 +00:00
Sebastian Ullrich
01dbbeed99
feat: do not export def bodies by default (#8221)
This PR adjusts the experimental module system to not export the bodies
of `def`s unless opted out by the new attribute `@[expose]` on the `def`
or on a surrounding `section`.

---------

Co-authored-by: Markus Himmel <markus@lean-fro.org>
2025-05-15 12:16:54 +00:00
Rob23oba
b5cfd86a89
fix: Substring.isNat for empty string (#8067)
This PR fixes the behavior of `Substring.isNat` to disallow empty
strings.

Closes #8005
2025-04-29 15:54:29 +00:00
Markus Himmel
68d9d14d44
chore: do not use the coercion α → Option α in Init and Std (#8085)
This PR moves the coercion `α → Option α` to the new file
`Init.Data.Option.Coe`. This file may not be imported anywhere in `Init`
or `Std`.
2025-04-24 13:35:01 +00:00
Sebastian Ullrich
7feb583b9e
feat: enable experimental module system in Init (#8047) 2025-04-23 17:21:33 +00:00
David Thrane Christiansen
fa2d28e2da
doc: docstring details (#7711)
This PR adds the last few missing docstrings that appear in the manual.
2025-03-28 22:30:53 +00:00
David Thrane Christiansen
b26516e33c
doc: docstring review for Substring (#7635)
This PR adds missing docstrings for `Substring` and makes the style of
`Substring` docstrings consistent.
2025-03-25 07:57:55 +00:00
David Thrane Christiansen
7e1ee70b7c
doc: add docstrings for String.drop and String.dropRight (#7607)
This PR adds docstrings for `String.drop` and `String.dropRight`.
2025-03-21 05:38:07 +00:00