Commit graph

165 commits

Author SHA1 Message Date
Markus Himmel
d228cd3edd
feat: LT and LE instances on new position types (#10685)
This PR introduces `LT` and `LE` instances on `String.ValidPos` and
`String.Slice.Pos`.
2025-10-06 16:06:16 +00:00
Markus Himmel
5c707d936c
chore: rename Stream to Std.Stream (#10645)
This PR renames `Stream` to `Std.Stream` so that the name becomes
available to mathlib after a deprecation cycle.
2025-10-02 15:25:56 +00:00
David Thrane Christiansen
0b2193c771
chore: docstring review for ByteArray (#10632)
This PR adds missing docstrings for ByteArray and makes existing ones
consistent with our style.
2025-10-02 04:20:18 +00:00
David Thrane Christiansen
2c6576b269
chore: missing docstring + style updates for String docs (#10640)
This PR adds a missing docstring and applies our style guide to parts of
the String API.
2025-10-02 04:19:55 +00:00
Markus Himmel
2cca32ccc3
chore: use UTF8 instead of Utf8 in identifiers (#10636)
This PR renames `String.getUtf8Byte` to `String.getUTF8Byte` in order to
adhere to the standard library naming convention.
2025-10-01 17:57:32 +00:00
Markus Himmel
29c2b86ef4
chore: String.getUTF8Byte (#10637)
This PR adds the function `String.getUTF8Byte` ahead of a more
comprehensive PR to use `UTF8` instead of `Utf8` in identifiers.
2025-10-01 13:59:42 +00:00
Markus Himmel
5bfbe2a875
refactor: incorporate UTF8 material from String.Extra into String.Basic (#10634)
This PR defines `ByteArray.validateUTF8`, uses it to show that
`ByteArray.IsValidUtf8` is decidable and redefines `String.fromUTF8` and
friends to use it.

The functions `String.validateUTF8` and `String.utf8DecodeChar?` are
deprecated in favor of the identically named functions in the
`ByteArray` namespace.
2025-10-01 11:33:29 +00:00
Markus Himmel
9dc1faf327
chore: add an internal String function (#10635)
This PR adds an internal `String` function ahead of an upcoming PR.
2025-10-01 11:12:35 +00:00
Markus Himmel
81ea922025
chore: rename String.Pos to String.Pos.Raw (#10624)
This PR renames `String.Pos` to `String.Pos.Raw`.

After an abbreviated deprecation cycle, we will then rename
`String.ValidPos` to `String.Pos`.
2025-10-01 07:45:24 +00:00
Markus Himmel
c039e29a3f
perf: shorten critical build path around String.Basic (#10614)
This PR cuts some edges from the import graph.

Specifically:
- `TreeMap` and `HashMap` no longer depend on `String`, so now the
expensive things are all in parallel instead of partially in sequence
- `Omega` no longer relies on `List` lemmas
- The section of the import graph between `Init.Omega` and
`Init.Data.Bitvec.Lemmas` is cleaned up a bit
2025-09-29 19:45:21 +00:00
Henrik Böving
5fd8c1b94d
feat: new String.Slice API (#10514)
This PR defines the new `String.Slice` API.

Many of the core design principles of the API are taken over from Rust's
[string library](https://doc.rust-lang.org/stable/std/string/struct.String.html).
2025-09-25 12:18:52 +00:00
Mario Carneiro
9f41f3324a
fix: make Substring.beq reflexive (#10552)
This PR ensures that `Substring.beq` is reflexive, and in particular
satisfies the equivalence `ss1 == ss2 <-> ss1.toString = ss2.toString`.

Closes #10511.

Note: I also fixed a strange line in the `String.extract` documentation
which looks like it may have been a copypasta, and added another example
to show how invalid UTF8 positions work, but the doc also makes a point
of saying that it is unspecified so maybe it would be better not to have
the example? 🤷
2025-09-25 05:08:41 +00:00
Markus Himmel
d6cd738ab4
feat: redefine String, part two (#10457)
This PR introduces safe alternatives to `String.Pos` and `Substring`
that can only represent valid positions/slices.

Specifically, the PR

- introduces the predicate `String.Pos.IsValid`;
- proves several nontrivial equivalent conditions for
`String.Pos.IsValid`;
- introduces `String.ValidPos`, which is a `String.Pos` with an
`IsValid` proof;
- introduces `String.Slice`, which is like `Substring` but made from
`String.ValidPos` instead of `Pos`;
- introduces `String.Pos.IsValidForSlice`, which is like
`String.Pos.IsValid` but for slices;
- introduces `String.Slice.Pos`, which is like `String.ValidPos` but for
slices;
- introduces various functions for converting between the two types of
positions.

The API added in this PR is not complete. It will be expanded in future
PRs with additional operations and verification.
2025-09-24 13:36:55 +00:00
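
The core idea of #10457, a raw position bundled with a proof that it is valid, can be illustrated with a small standalone model. This is only a sketch: `RawPos`, `RawPos.IsValidFor`, `CheckedPos` and `afterA` are hypothetical names invented for the illustration, and the toy validity predicate (membership in an explicit list of character boundaries) is an assumption, not the library's definition of `String.Pos.IsValid`.

```lean
/-- Hypothetical stand-in for a raw byte offset (not the library's `String.Pos`). -/
structure RawPos where
  byteIdx : Nat

/-- Toy validity predicate: the offset is one of the listed character boundaries. -/
def RawPos.IsValidFor (p : RawPos) (boundaries : List Nat) : Prop :=
  p.byteIdx ∈ boundaries

/-- A raw offset bundled with a proof that it is valid. -/
structure CheckedPos (boundaries : List Nat) where
  offset : RawPos
  isValid : offset.IsValidFor boundaries

-- "aé" occupies the bytes 61 C3 A9, so its character boundaries are 0, 1 and 3.
def afterA : CheckedPos [0, 1, 3] :=
  ⟨⟨1⟩, by simp [RawPos.IsValidFor]⟩

-- Offset 2 falls inside the two-byte encoding of 'é', so it cannot be validated.
example : ¬ (RawPos.mk 2).IsValidFor [0, 1, 3] := by
  simp [RawPos.IsValidFor]
```

Because the proof is a field of the structure, a function that receives a `CheckedPos` never has to handle an offset in the middle of a character.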
Markus Himmel
b6198434f2
fix: String regressions (#10523)
This PR fixes some regressions introduced by #10304.
2025-09-24 12:01:50 +00:00
Tom Levy
e42892cfb6
doc: fix comment about String.fromUTF8 replacing invalid chars (#10240)
Hi, the doc of `String.fromUTF8` previously said invalid characters are
replaced with 'A'. But the parameter `h : validateUTF8 a` guarantees
there are no invalid characters, so that explanation doesn't make sense
to me. This PR deletes that explanation (and fixes some unrelated
typos).

I also have a patch that uses `h` to prove each of the characters is
valid, eliminating the need for a default character
([pr/chore-String-fromUTF8-prove-valid](27f1ff36b2)),
would you be interested in merging that?

<details>
<summary>Notes on invalid characters from unchecked C++</summary>
I don't know if this function may be called from unchecked C++ with
invalid characters. If it may, I'm not sure what would happen with my
patched function... I'm not familiar with Lean's safety model, but it
seems like a bad idea to have a Lean function that takes a proof of a
proposition but is expected to operate in a certain way even if the
proposition is false. I think the safe approach is to have two functions
-- one that takes a proof and is only called from Lean, and another that
doesn't take a proof and replaces invalid chars (for use from C++, not
sure whether it's useful from Lean); I'd prefer to go even further and
report an error instead of silently replacing invalid characters (I'm
not sure if there is any easy way to report errors/panic in Lean code
called from C++).
</details>
2025-09-23 10:19:20 +00:00
Markus Himmel
197bc6cb66
feat: redefine String, part one (#10304)
This PR redefines `String` to be the type of byte arrays `b` for which
`b.IsValidUtf8`.

This moves the data model of strings much closer to the actual data
representation at runtime.

In the near future, we will

- provide variants of `String.Pos` and `Substring` that only allow for
valid positions
- redefine all `String` functions to be much closer to their C++
implementations

In the near-to-medium future we will then provide comprehensive
verification of `String` based on these refactors.
2025-09-18 11:36:52 +00:00
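
The byte-oriented view described in #10304 can be observed from ordinary code. The sketch below assumes a toolchain in which `String.length`, `String.toUTF8`, `ByteArray.size` and `String.fromUTF8?` are available under these names; it shows behavior only and says nothing about how the library defines them.

```lean
-- 'é' is a single character but takes two bytes (C3 A9) in UTF-8.
#eval "é".length        -- 1
#eval "é".toUTF8.size   -- 2

-- Decoding succeeds only for byte arrays that are valid UTF-8.
#eval String.fromUTF8? (ByteArray.mk #[0xC3, 0xA9]) == some "é"   -- true
#eval (String.fromUTF8? (ByteArray.mk #[0xC3])).isSome            -- false
```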
Markus Himmel
9402c307fe
chore: reorganize Init imports around strings (#10289)
This PR reorganizes the import hierarchy so that
`Init.Data.String.Basic` can import `Init.Data.UInt.Bitwise` and
`Init.Data.Array.Lemmas`.
2025-09-07 17:09:14 +00:00
Markus Himmel
aa0a31ae7d
chore: prepare for untangling strings (#10288)
This PR prepares for a future reorganization of the import hierarchy so
that `Init.Data.String.Basic` can import `Init.Data.UInt.Bitwise` and
`Init.Data.Array.Lemmas`.
2025-09-07 12:58:23 +00:00
Markus Himmel
19bd0254c3
chore: move String.utf8EncodeChar to the prelude (#10264)
This PR moves `String.utf8EncodeChar` to the prelude to prepare for the
imminent redefinition of `String`.

The definition in the prelude uses modulo and division operations on
natural numbers. In `String.Extra`, a `csimp` lemma is provided, showing
that the new definition is equal to the previous one (which is now
called `utf8EncodeCharFast`) which uses bitwise operations on `UInt8`.
2025-09-07 12:42:53 +00:00
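
The relationship between an arithmetic and a bitwise UTF-8 encoder, as described in #10264, can be sketched on plain natural numbers. Both functions below are hypothetical models written for this illustration (`encodeArith` and `encodeBits` are not library names), they return `List Nat` rather than bytes of type `UInt8`, and the agreement is only tested with `#eval`, whereas the library proves it as a `csimp` lemma.

```lean
/-- Hypothetical model: UTF-8 encode code point `c` using only `/` and `%`. -/
def encodeArith (c : Nat) : List Nat :=
  if c < 0x80 then
    [c]
  else if c < 0x800 then
    [0xC0 + c / 64, 0x80 + c % 64]
  else if c < 0x10000 then
    [0xE0 + c / 4096, 0x80 + c / 64 % 64, 0x80 + c % 64]
  else
    [0xF0 + c / 262144, 0x80 + c / 4096 % 64, 0x80 + c / 64 % 64, 0x80 + c % 64]

/-- The same encoding written with shifts and masks. -/
def encodeBits (c : Nat) : List Nat :=
  if c < 0x80 then
    [c]
  else if c < 0x800 then
    [0xC0 ||| (c >>> 6), 0x80 ||| (c &&& 0x3F)]
  else if c < 0x10000 then
    [0xE0 ||| (c >>> 12), 0x80 ||| ((c >>> 6) &&& 0x3F), 0x80 ||| (c &&& 0x3F)]
  else
    [0xF0 ||| (c >>> 18), 0x80 ||| ((c >>> 12) &&& 0x3F),
     0x80 ||| ((c >>> 6) &&& 0x3F), 0x80 ||| (c &&& 0x3F)]

-- U+20AC (the euro sign) encodes as E2 82 AC.
#eval encodeArith 0x20AC                                              -- [226, 130, 172]
-- The two models agree on every one- and two-byte code point.
#eval (List.range 0x800).all (fun c => encodeArith c == encodeBits c) -- true
```

The arithmetic form is easier to reason about in proofs, while the bitwise form is closer to what runs at runtime, which is the motivation the commit message gives for keeping both.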
Kim Morrison
a06e6e7f4d
chore: make UInt.Lemmas a private import of String.Extra (#10115)
This PR makes the `Init.Data.UInt.Lemmas` import into
`Init.Data.String.Extra` private; previously this import was on the
rebuild critical path.
2025-08-25 16:46:22 +00:00
David Thrane Christiansen
82932ec86a
feat: add stop position to parser (#10057)
This PR adds a stop position field to parser input contexts, allowing
the parser to be instructed to stop parsing prior to the end of a file.

This is step 1, prior to a stage0 update, to make run-time data
structures sufficiently compatible to avoid segfaults. After the update,
the actual code to stop parsing can be merged.
2025-08-22 17:04:04 +00:00
Sebastian Ullrich
0e8838df3b
chore: avoid confusing public import all combination (#10051)
2025-08-22 12:04:42 +00:00
Markus Himmel
2e6c1a74e5
chore: move String.Pos operations out of Prelude (#9845)
This PR moves arithmetic of `String.Pos` out of the prelude.

Other `String` declarations are part of the prelude because they are
generated by macros, but this does not seem to be the case for these.
2025-08-18 09:23:02 +00:00
Paul Reichert
0725349bbd
feat: high-level order typeclasses (#9729)
This PR introduces a canonical way to endow a type with an order
structure. The basic operations (`LE`, `LT`, `Min`, `Max`, and in later
PRs `BEq`, `Ord`, ...) and any higher-level property (a preorder, a
partial order, a linear order etc.) are then put in relation to `LE` as
necessary. The PR provides `IsLinearOrder` instances for many core types
and updates the signatures of some lemmas.

**BREAKING CHANGES:**

* The requirements of the `lt_of_le_of_lt`/`le_trans` lemmas for
`Vector`, `List` and `Array` are simplified. They now require an
`IsLinearOrder` instance. The new requirements are logically equivalent
to the old ones, but the `IsLinearOrder` instance is not automatically
inferred from the smaller typeclasses.
* Hypotheses of type `Std.Total (¬ · < · : α → α → Prop)` are replaced
with the equivalent class `Std.Asymm (· < · : α → α → Prop)`. Breakage
should be limited because there is now an instance that derives the
latter from the former.
* In `Init.Data.List.MinMax`, multiple theorem signatures are modified,
replacing explicit parameters for antisymmetry, totality, `min_ex_or`
etc. with corresponding instance parameters.
2025-08-11 14:55:17 +00:00
Kim Morrison
b676fb1164
fix: @[expose] String.firstDiffPos and String.extract (#9792)
This PR adds `@[expose]` to two definitions with `where` clauses that
Batteries proves theorems about.
2025-08-08 04:55:45 +00:00
Kim Morrison
6e06978961
chore: remove >6 month old deprecations (#9640)
2025-08-05 02:29:15 +00:00
Markus Himmel
3eab35ef22
chore: minor improvements (#9708)
This PR stylistically improves an internal hash map proof and fixes a
typo in the docstring of `String.join`.
2025-08-04 07:12:05 +00:00
Sebastian Ullrich
ff1d3138bf
refactor: module-ize Lean (#9330)
2025-07-25 12:02:51 +00:00
Rob23oba
b7f433c5b9
fix: behavior of String.prev (#9441)
This PR fixes the behavior of `String.prev`, aligning the runtime
implementation with the reference implementation. In particular, the
following statements hold now:
- `(s.prev p).byteIdx` is at least `p.byteIdx - 4` and at most
`p.byteIdx - 1`
- `s.prev 0 = 0`
- `s.prev` is monotone

Closes #9439
2025-07-21 10:50:14 +00:00
Sebastian Ullrich
09a5b34931
feat: make private the default in module (#9044)
This PR adjusts the experimental module system to make `private` the
default visibility modifier in `module`s, introducing `public` as a new
modifier instead. `public section` can be used to revert the default for
an entire section, though this is more intended to ease gradual adoption
of the new semantics such as in `Init` (and soon `Std`) where they
should be replaced by a future decl-by-decl re-review of visibilities.
2025-06-28 16:30:53 +00:00
Joachim Breitner
be80a23281
chore: remove unused simp args (#8905)
This PR uses the linter from
https://github.com/leanprover/lean4/pull/8901 to clean up simp
arguments.
2025-06-20 22:34:30 +00:00
Rob23oba
e450a02621
fix: change show tactic to work as documented (#7395)
This PR changes the `show t` tactic to match its documentation.
Previously it was a synonym for `change t`, but now it finds the first
goal that unifies with the term `t` and moves it to the front of the
goal list.
2025-06-12 23:54:09 +00:00
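
A minimal sketch of the goal-selecting behavior described in #7395, assuming a toolchain that includes this change. After `constructor` the goals are `1 = 1` and `2 = 2`, in that order; `show 2 = 2` is expected to move the second goal to the front.

```lean
example : 1 = 1 ∧ 2 = 2 := by
  constructor
  -- Select the goal that unifies with `2 = 2` and make it the main goal.
  show 2 = 2
  · rfl  -- closes `2 = 2`
  · rfl  -- closes the remaining goal `1 = 1`
```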
Joachim Breitner
803dc3e687
refactor: Init: expose lots of functions (#8501)
This PR adds the `@[expose]` attribute to many functions (and changes
some theorems to be by `:= (rfl)`) in preparation for the `@[defeq]`
attribute change in #8419.
2025-05-28 07:37:54 +00:00
Kim Morrison
efe2ab4c04
chore: remove duplicate instances (#8397)
This PR cleans up many duplicate instances (or, in some cases,
needlessly duplicated `def X := ...; instance Y := X`).
2025-05-19 04:36:06 +00:00
Sebastian Ullrich
01dbbeed99
feat: do not export def bodies by default (#8221)
This PR adjusts the experimental module system to not export the bodies
of `def`s unless opted out by the new attribute `@[expose]` on the `def`
or on a surrounding `section`.

---------

Co-authored-by: Markus Himmel <markus@lean-fro.org>
2025-05-15 12:16:54 +00:00
Rob23oba
b5cfd86a89
fix: Substring.isNat for empty string (#8067)
This PR fixes the behavior of `Substring.isNat` to disallow empty
strings.

Closes #8005
2025-04-29 15:54:29 +00:00
Markus Himmel
68d9d14d44
chore: do not use the coercion α → Option α in Init and Std (#8085)
This PR moves the coercion `α → Option α` to the new file
`Init.Data.Option.Coe`. This file may not be imported anywhere in `Init`
or `Std`.
2025-04-24 13:35:01 +00:00
Sebastian Ullrich
7feb583b9e
feat: enable experimental module system in Init (#8047)
2025-04-23 17:21:33 +00:00
David Thrane Christiansen
fa2d28e2da
doc: docstring details (#7711)
This PR adds the last few missing docstrings that appear in the manual.
2025-03-28 22:30:53 +00:00
David Thrane Christiansen
b26516e33c
doc: docstring review for Substring (#7635)
This PR adds missing docstrings for `Substring` and makes the style of
`Substring` docstrings consistent.
2025-03-25 07:57:55 +00:00
David Thrane Christiansen
7e1ee70b7c
doc: add docstrings for String.drop and String.dropRight (#7607)
This PR adds docstrings for `String.drop` and `String.dropRight`.
2025-03-21 05:38:07 +00:00
Markus Himmel
d66abc0fc0
feat: lemmas about operations on finite unsigned integers (#7484)
This PR adds some lemmas about operations defined on `UIntX`.
2025-03-18 10:52:54 +00:00
David Thrane Christiansen
5d91ed01b7
doc: review String docstrings (#7506)
This PR adds missing `String` docstrings and makes the existing ones
consistent in style.
2025-03-18 04:36:49 +00:00
Kim Morrison
ce138e1cec
fix: correct names in library lemmas (#7541)
This PR corrects names of a number of lemmas, where the incorrect name
was identified automatically by a
[tool](https://leanprover.zulipchat.com/#narrow/channel/270676-lean4/topic/automatic.20spelling.20generation.20.26.20comparison/near/505760384)
written by @Rob23oba.
2025-03-18 03:50:03 +00:00
David Thrane Christiansen
25179352b4
doc: review List docstrings for manual (#7452)
This PR makes the style of all `List` docstrings that appear in the
language reference consistent.

Relies on #7240 for links and example formatting.

---------

Co-authored-by: Kim Morrison <kim@tqft.net>
2025-03-13 16:10:06 +00:00
David Thrane Christiansen
1a0d2b6fc1
doc: Char docstring proofreading (#7198)
This PR makes the docstrings in the `Char` namespace follow the
documentation conventions.

---------

Co-authored-by: Markus Himmel <markus@himmel-villmar.de>
2025-03-08 22:17:01 +00:00
Kim Morrison
e06673e200
feat: lemmas about List/Array/Vector lexicographic order (#6423)
This PR adds missing lemmas about lexicographic order on
List/Array/Vector.
2024-12-20 06:16:27 +00:00
Kim Morrison
b4ff5455ba
feat: lemmas about lexicographic order on Array and Vector (#6399)
This PR adds basic lemmas about lexicographic order on Array and Vector,
achieving parity with List.

Many lemmas are still missing for all three, particularly about how
order interacts with `++`.
2024-12-19 10:36:50 +00:00
Kim Morrison
6893913683
feat: replace List.lt with List.Lex (#6379)
This PR replaces `List.lt` with `List.Lex`, from Mathlib, and adds the
new `Bool`-valued lexicographic comparison function `List.lex`. This
subtly changes the definition of `<` on Lists in some situations.

`List.lt` was a weaker relation: in particular if `l₁ < l₂`, then
`a :: l₁ < b :: l₂` may hold according to `List.lt` even if `a` and `b`
are merely incomparable
(that is, neither `a < b` nor `b < a`), whereas according to `List.Lex`
this would require `a = b`.

When `<` is total, in the sense that `¬ · < ·` is antisymmetric, then
the two relations coincide.

Mathlib was already overriding the order instances for `List α`,
so this change should not be noticed by anyone already using Mathlib.

We simultaneously add the boolean valued `List.lex` function,
parameterised by a `BEq` typeclass
and an arbitrary `lt` function. This will support the flexibility
previously provided for `List.lt`,
via a `==` function which is weaker than strict equality.
2024-12-15 08:22:39 +00:00
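
The difference between the two relations in #6379 shows up only when the head elements are incomparable but not equal. The following Boolean models are hypothetical and written for this illustration (`weakLt`, `strictLex` and `tensLt` are not library names, and the library's `List.lt`, `List.Lex` and `List.lex` are defined differently); `tensLt` compares natural numbers by their tens digit only, so `11` and `12` are incomparable.

```lean
/-- Hypothetical model of the old, weaker relation: incomparable heads are skipped. -/
def weakLt {α : Type} (lt : α → α → Bool) : List α → List α → Bool
  | [], [] => false
  | [], _ :: _ => true
  | _ :: _, [] => false
  | a :: as, b :: bs => lt a b || (not (lt b a) && weakLt lt as bs)

/-- Hypothetical model of the lexicographic relation: heads must be equal to continue. -/
def strictLex {α : Type} (eq lt : α → α → Bool) : List α → List α → Bool
  | [], [] => false
  | [], _ :: _ => true
  | _ :: _, [] => false
  | a :: as, b :: bs => lt a b || (eq a b && strictLex eq lt as bs)

/-- A partial comparison on `Nat` that only looks at the tens digit. -/
def tensLt (a b : Nat) : Bool := decide (a / 10 < b / 10)

-- 11 and 12 are incomparable under `tensLt`, and the tails satisfy [1] < [20].
#eval weakLt tensLt [11, 1] [12, 20]                            -- true
#eval strictLex (fun a b => a == b) tensLt [11, 1] [12, 20]     -- false
```

Passing a coarser `eq` (for example, equality of tens digits) to `strictLex` makes the two models agree on this input, which mirrors the flexibility the commit message attributes to the `BEq` parameter of `List.lex`.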
Kim Morrison
1c30c76e72
chore: remove >6 month old deprecations (#6057)
2024-11-13 23:21:23 +00:00