Commit graph

104 commits

Author SHA1 Message Date
Henrik Böving
95549f17da
feat: LeanSAT's LRAT parsers + SAT solver interface (#5100)
Step 5/6 in upstreaming LeanSAT.

---------

Co-authored-by: Markus Himmel <markus@lean-fro.org>
2024-08-20 11:42:26 +00:00
Mario Carneiro
0a1a855ba8
fix: validate UTF-8 at C++ -> Lean boundary (#3963)
Continuation of #3958. To ensure that lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the lean
model), we have to make sure that no lean objects are created with
invalid UTF-8. #3958 covers the case of lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.

To address this and minimize accidental missed validation, the
`(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or utf8 char count on hand. I reviewed
every function which leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.

This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions, they are just
not causing unsound behavior anymore. A subsequent PR will handle adding
better error reporting for bad UTF-8.
2024-06-19 14:05:48 +00:00
Kim Morrison
2a2b276ede
chore: unify String.csize : Nat and Char.utf8Size : UInt32 as Char.size : Nat (#4357)
It seems:
* there was no actual need for the UInt32 valued version
* downstream we were getting duplicative lemmas about both
* so lets reduce the API surface area!

If anyone would prefer the remaining function is still called
`Char.utf8Size` I will happily change it. (`size` is hopefully still
unambiguous, and it's helpful to rename here so we can give a
deprecation warning that explains the type signature change.)

---------

Co-authored-by: Mac Malone <tydeu@hatpress.net>
2024-06-11 02:51:18 +00:00
Kim Morrison
56adfb856d
chore: upstream basic String lemmas (#4354) 2024-06-05 21:28:43 +00:00
Austin Letson
644c1d4e36
doc: add docstrings and examples for String functions (#4332)
Add docstrings, usage examples, and doctests for `String.get'`,
`String.next'`, `String.posOf`, `String.revPosOf`.
2024-06-05 05:16:56 +00:00
Sebastian Ullrich
f97a7d4234
feat: incremental elaboration of definition headers, bodies, and tactics (#3940)
Extends Lean's incremental reporting and reuse between commands into
various steps inside declarations:
* headers and bodies of each (mutual) definition/theorem
* `theorem ... := by` for each contained tactic step, including
recursively inside supported combinators currently consisting of
  * `·` (cdot), `case`, `next`
  * `induction`, `cases`
  * macros such as `next` unfolding to the above

![Recording 2024-05-10 at 11 07
32](https://github.com/leanprover/lean4/assets/109126/c9d67b6f-c131-4bc3-a0de-7d63eaf1bfc9)

*Incremental reuse* means not recomputing any such steps if they are not
affected by a document change. *Incremental reporting* includes the
parts seen in the recording above: the progress bar and messages. Other
language server features such as hover etc. are *not yet* supported
incrementally, i.e. they are shown only when the declaration has been
fully processed as before.

---------

Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
2024-05-22 13:23:30 +00:00
Leonardo de Moura
8c03650359
feat: some Char, UInt, and Fin theorems (#4231)
for SSFT24 summer school: https://github.com/david-christiansen/ssft24

---------

Co-authored-by: Kim Morrison <kim@tqft.net>
Co-authored-by: Kim Morrison <scott.morrison@gmail.com>
Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
2024-05-21 06:11:23 +00:00
Austin Letson
2faa81d41f
doc: add docstrings and examples for String functions (#4166)
Add docstrings, usage examples, and doc tests for `String.prev`,
`.front`, `.back`, `.atEnd`.

Improve docstring examples for `String.next` based on discussion
examples for `String.prev`.

---------

Co-authored-by: Kim Morrison <kim@tqft.net>
2024-05-21 04:27:40 +00:00
Leonardo de Moura
f3ccd6b023
feat: some string simprocs (#4233)
For the SSFT24 summer school.
2024-05-20 22:53:10 +00:00
Kyle Miller
a7338c5ad8
feat: make frontend normalize line endings to LF (#3903)
To eliminate parsing differences between Windows and other platforms,
the frontend now normalizes all CRLF line endings to LF, like [in
Rust](https://github.com/rust-lang/rust/issues/62865).

Effects:
- This makes Lake hashes be faithful to what Lean sees (Lake already
normalizes line endings before computing hashes).
- Docstrings now have normalized line endings. In particular, this fixes
`#guard_msgs` failing multiline tests for Windows users using CRLF.
- Now strings don't have different lengths depending on the platform.
Before this PR, the following theorem is true for LF and false for CRLF
files.
```lean
example : "
".length = 1 := rfl
```

Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In
the elaborator, we reject loose `\r`'s that appear in whitespace. Rust
instead takes the approach of making the normalization routine fail.
They do this so that there's no downstream confusion about any `\r\n`
that appears.

Implementation note: the LSP maintains its own copy of a source file
that it updates when edit operations are applied. We are assuming that
edit operations never split or join CRLFs. If this assumption is not
correct, then the LSP copy of a source file can become slightly out of
sync. If this is an issue, there is some discussion
[here](https://github.com/leanprover/lean4/pull/3903#discussion_r1592930085).
2024-05-20 17:13:08 +00:00
Kim Morrison
799923d145
chore: move have to decreasing_by in substrEq.loop (#4143)
Currently this causes linter warnings downstream in proofs that unfold
substrEq.loop.
2024-05-13 06:18:44 +00:00
Austin Letson
b8e67d87a8
doc: add docstrings and usage examples in Init.Data.String.Basic (#4001)
Add docstrings and usage examples for `String.length`, `.push`,
`.append`, `.get?`, `.set`, `.modyify`, and `.next`. Update docstrings
and add usage examples for `String.toList`, `.get`, and `.get!`.

---------

Co-authored-by: Joachim Breitner <mail@joachim-breitner.de>
Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
2024-05-07 23:49:43 +00:00
Joachim Breitner
e2983e44ef
perf: use with_reducible in special-purpose decreasing_trivial macros (#3991)
Because of the last-added-tried-first rule for macros, all the special
purpose `decreasing_trivial` rules are tried for most recursive
definitions out there, and because they use `apply` and `assumption`
with default transparency may cause some definitoins to be unfolded over
and over again.

A quick test with one of the functions in the leansat project shows that
elaboration time goes down from 600ms to 375ms when using
```
decreasing_by all_goals decreasing_with with_reducible decreasing_trivial
```
instead of
```
decreasing_by all_goals decreasing_with decreasing_trivial
```

This change uses `with_reducible` in most of these macros.

This means that these tactics will no longer work when the
relations/definitions they look for is hidden behind a definition.
This affected in particular `Array.sizeOf_get`, which now has a
companion `sizeOf_getElem`.

In addition, there were three tactics using `apply` to apply Nat-related
lemmas
that we now expect `omega` to solve. We still need them when building
`Init` modules
that don’t have access to `omega`, but they now live in
`decreasing_trivial_pre_omega`,
meant to be only used internally.
2024-04-29 15:12:27 +00:00
Mario Carneiro
70a23945bf
feat: add model implementation for UTF8 enc/dec (#3961)
- [x] Depends on: #3958 
- [x] Depends on: #3960

This makes the UTF-8 encode and decode functions have lean definitions,
so that we can prove properties about them downstream.
2024-04-22 10:24:53 +00:00
Mario Carneiro
62cdb51ed5
feat: UTF-8 string validation (#3958)
Previously, there was a function `opaque fromUTF8Unchecked : ByteArray
-> String` which would convert a list of bytes into a string, but as the
name implies it does not validate that the string is UTF-8 before doing
so and as a result it produces unsound results in the compiler (because
the lean model of `String` indirectly asserts UTF-8 validity). This PR
replaces that function by
```lean
opaque validateUTF8 (a : @& ByteArray) : Bool

opaque fromUTF8 (a : @& ByteArray) (h : validateUTF8 a) : String
```
so that while the function is still "unchecked", we have a proof witness
that the string is valid. To recover the original, actually unchecked
version, use `lcProof` or other unsafe methods to produce the proof
witness.

Because this was the only `ByteArray -> String` conversion function, it
was used in several places in an unsound way (e.g. reading untrusted
input from IO and treating it as UTF-8). These have been replaced by
`fromUTF8?` or `fromUTF8!` as appropriate.
2024-04-20 18:36:37 +00:00
Mario Carneiro
aeacb7b69e
feat: String.Pos.isValid (#3959)
This adds a function that can be used to check whether a position is on
a UTF-8 byte boundary.
2024-04-20 14:57:35 +00:00
Mario Carneiro
e41cd310e9
fix: String.splitOn bug (#3832)
Fixes #3829. As reported on Zulip (both
[recently](https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/current.20definition.20of.20.60String.2EsplitOn.60.20is.20incorrect/near/430930535)
and [a year
ago](https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/should.20we.20redefine.20.60String.2EsplitOnAux.60.3F/near/365899332)),
`String.splitOn` has a bug when dealing with separators of more than one
character (which are luckily rare). The code change here is very small,
replacing a `i` with `i - j`, but it makes termination more complex so
that's where the rest of the line count goes.
2024-04-04 09:30:53 +00:00
Scott Morrison
94d6286e5a
chore: reorganising to reduce imports (#3790)
[Before](https://github.com/leanprover/lean4/files/14772220/oi.pdf) and
[after](https://github.com/leanprover/lean4/files/14772226/oi2.pdf).

This gets `ByteArray`, `String.Extra`, `ToString.Macro` and `RCases` out
of the imports of `omega`. I'd hoped to get `Array.Subarray` too, but
it's tangled up in the list literal syntax. Further progress could come
from make `split` use available `Decidable` instances, so we could pull
out `Classical` (and possibly some of `PropLemmas`).
2024-03-27 11:15:01 +00:00
Joachim Breitner
23d3ac4760
refactor: reduced unsed imports (#3464) 2024-02-22 18:12:57 +00:00
Joe Hendrix
29244f32f6
chore: upstream solve_by_elim (#3408)
This upstreams the solve_by_elim tactic from Std.

It is a key tactic needed by library_search.
2024-02-21 01:16:04 +00:00
Adrien Champion
a898aa18f3
chore: add documentation for the String.iterator API (#3300)
Adds documentation to the `String.Iterator` API, mentored by
@eric-wieser and @david-christiansen

---------

Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
2024-02-20 13:31:27 +00:00
Scott Morrison
904239ae61
feat: upstream some Syntax/Position helper functions used in code actions in Std (#3260)
Co-authored-by: David Thrane Christiansen <david@davidchristiansen.dk>
2024-02-09 10:50:19 +00:00
Joachim Breitner
368ead54b2 refactor: termination_by changes in stdlib 2024-01-10 17:27:35 +01:00
int-y1
ce4ae37c19 chore: fix more typos in comments 2023-10-08 14:37:34 -07:00
Joachim Breitner
b2d668c340
perf: Use flat ByteArrays in Trie (#2529) 2023-09-20 13:22:37 +02:00
Bulhwi Cha
367b38701f
refactor: simplify String.splitOnAux (#2271) 2023-07-19 11:50:27 +00:00
Mario Carneiro
bc841809c2 chore: remove intermediate 2023-06-05 15:50:11 -07:00
Mario Carneiro
2ae78f3c45 fix: tail-recursive String.foldr 2023-06-05 15:50:11 -07:00
Mario Carneiro
e68554b854 fix: use empty string instead of mk 2023-06-05 15:50:11 -07:00
Mario Carneiro
fd72fdf8f8 fix: incorrect utf8 in splitAux 2023-06-05 15:50:11 -07:00
Mario Carneiro
aa60791db3 feat: remove partial in Init.Data.String.Basic 2023-06-05 15:50:11 -07:00
Bulhwi Cha
8d0504b3b7 doc: add docstring to String.next' 2023-05-28 17:32:08 -07:00
Mario Carneiro
7f84bf07ba fix: bug in reference implementation of String.get? 2023-05-15 08:35:20 -07:00
Mario Carneiro
c9e84a6ad6 fix: remove private from string defs 2023-05-05 12:09:38 -07:00
Leonardo de Moura
3e33fcc4f8 chore: use lean_string_utf8_next_fast 2022-11-09 12:06:37 -08:00
Leonardo de Moura
92c03c0050 perf: prepare do add String.next' 2022-11-09 12:00:31 -08:00
Leonardo de Moura
20eeb4202f perf: fast String.get' without runtime bounds check
TODO: naming convention `String.get'` should be called `String.get`,
and we should rename the old `String.get`
2022-11-09 12:00:30 -08:00
Leonardo de Moura
5b1aac7b8f fix: avoid nontermination on non-utf8 input
This is not a perfect solution, but ensures the non-termination does
not happen. The changes also make it easier to prove termination in
the future.

TODO: validate UTF8 input?

closes #1690
2022-10-06 17:45:21 -07:00
E.W.Ayers
4ea4365354 doc: various String docstrings 2022-08-26 20:49:57 -07:00
Leonardo de Moura
eafd2a88ce chore: simplify Prelude.lean and Core.lean using elabAsElim 2022-07-29 18:13:56 -07:00
Leonardo de Moura
c341d8432f feat: remove leading spaces from docstrings 2022-07-18 22:18:15 -04:00
Leonardo de Moura
1caff852fb chore: remove getOp functions 2022-07-09 16:09:28 -07:00
Leonardo de Moura
757171db1f feat: add String.get! and s[i]! notation for String 2022-07-03 14:59:44 -07:00
Leonardo de Moura
e8935d996b chore: String.get?, String.getOp?, and remove String.getOp 2022-07-02 09:59:04 -07:00
Sebastian Ullrich
5a0c3b8d80 fix: String.isNat 2022-06-25 18:42:08 +02:00
Leonardo de Moura
02c4e548df feat: replace constant with opaque 2022-06-14 17:02:59 -07:00
Leonardo de Moura
041827bed5 chore: unused variables 2022-06-07 17:54:10 -07:00
Sebastian Ullrich
ae7b895f7a refactor: unname some unused variables 2022-06-07 16:37:45 -07:00
Leonardo de Moura
cae59c6916 chore: remove staging workarounds 2022-04-26 08:23:43 -07:00
Leonardo de Moura
6af1da450e feat: disable only eta for classes during TC resolution
closes #1123
2022-04-26 08:20:39 -07:00