I made a few choices so far that can probably be discussed:
- got rid of `modn` on `UInt`, nobody seems to use it apart from the
definition of `shift` which can use normal `mod`
- removed the previous defeq optimized definition of `USize.size` in
favor for a normal one. The motivation was to allow `OfNat` to work
which doesn't seem to be necessary anymore afaict.
- Minimized uses of `.val`, should we maybe mark it deprecated?
- Mostly got rid of `.val` in basically all theorems as the proper next
level of API would now be `.toBitVec`. We could probably re-prove them
but it would be more annoying given the change of definition.
- Did not yet redefine `log2` in terms of `BitVec` as this would require
a `log2` in `BitVec` as well, do we want this?
- I added a couple of theorems around the relation of `<` on `UInt` and
`Nat`. These were previously not needed because defeq was used all over
the place to save us. I did not yet generalize these to all types as I
wasn't sure if they are the appropriate lemma that we want to have.
This PR ensures that deprecated declarations are displayed with a
strikethrough markup in the completion popup of VS Code and that the
docstring of a completion item denotes the meta-data of the deprecation.
Resolve cases when the `To/FromJSON` type classes are used with `Empty`,
e.g. in the following motivating example.
```
import Lean
structure Foo (α : Type) where
y : Option α
deriving Lean.ToJson
#eval Lean.toJson (⟨none⟩ : Foo Empty) -- fails
```
This is a follow-up to this PR
https://github.com/leanprover/lean4/pull/5415, as suggested by
@eric-wieser. It expands on the original suggestion by also handling
`FromJSON`.
---------
Co-authored-by: Kyle Miller <kmill31415@gmail.com>
From the new doc-string:
```quote
In early versions of Lean, the typeclasses provided by `/` and `%`
were defined in terms of `tdiv` and `tmod`, and these were named simply as `div` and `mod`.
However we decided it was better to use `ediv` and `emod`,
as they are consistent with the conventions used in SMTLib, Mathlib,
and often mathematical reasoning is easier with these conventions.
At that time, we did not rename `div` and `mod` to `tdiv` and `tmod` (along with all their lemma).
In September 2024, we decided to do this rename (with deprecations in place),
and later we intend to rename `ediv` and `emod` to `div` and `mod`, as nearly all users will only
ever need to use these functions and their associated lemmas.
```
1. Remove the need to allocate an intermediate `String` for literally
every character in a JSON `String`.
2. Use a single `String` buffer in the entire `Json.compress` machinery.
3. Use `toListAppend`
Number 1 is doing most of the lifting in the perf diff, the rest are
some minor but measurable improvements.
This doesn't completely resolve the danger (only relevant in `prelude`
files) of importing `Init.Data.List.Basic` but not `Init.Data.List.Impl`
and thereby not having `@[csimp]` lemmas installed for some list
operations.
I'm going to address this better while working on `Array`.
This PR roughly halves the time needed to load the .ilean files by
optimizing the JSON parser and the conversion from JSON to Lean data
structures.
The code is optimized roughly as follows:
- String operations are inlined more aggressively
- Parsers are changed to use new `String.Iterator` functions `curr'` and
`next'` that receive a proof and hence do not need to perform an
additional check
- The `RefIdent` of .ilean files now uses a `String` instead of a `Name`
to avoid the expensive parse step from `String` to `Name` (despite the
fact that we only very rarely actually need a `Name` in downstream code)
- Instead of `List`s and `Subarray`s, the JSON to Lean conversion now
directly passes around arrays and array indices to avoid redundant
boxing
- Parsec's `peek?` sometimes generates redundant `Option` wrappers
because the generation of basic blocks interferes with the ctor-match
optimization, so it is changed to use an `isEof` check where possible
- Early returns and inline-do-blocks cause the code generator to
generate new functions, which then interfere with optimizations, so they
are now avoided
- Mutual defs are used instead of unspecialized passing of higher-order
functions to generate faster code
- The object parser is made tail-recursive
This PR also fixes a stack overflow in `Lean.Json.compress` that would
occur with long lists and adds a benchmark for the .ilean roundtrip
(compressed pretty-printing -> parsing).
I'm experimenting with changing the signature of `Ord.arrayOrd`; rather
than make a local synonym here, let's make a local instance so it
doesn't interact with the experiments.
This restores all of the imports of `Lean.Data.HashMap` and
`Lean.Data.HashSet` so that users actually see the deprecation warnings
instead of a "declaration not found" error.
For experimentation by @the-sofi-uwu.
I also have an efficient number parser in LeanSAT that I am planning to
upstream after we have sufficiently bikeshed this change.
This came up when watching new Lean users in a class situation. A number
of them were confused when they omitted a namespace on a constructor
name, and Lean treated the variable as a pattern that matches anything.
For example, this program is accepted but may not do what the user
thinks:
```
inductive Tree (α : Type) where
| leaf
| branch (left : Tree α) (val : α) (right : Tree α)
def depth : Tree α → Nat
| leaf => 0
```
Adding a `branch` case to `depth` results in a confusing message.
With this linter, Lean marks `leaf` with:
```
Local variable 'leaf' resembles constructor 'Tree.leaf' - write '.leaf' (with a dot) or 'Tree.leaf' to use the constructor.
note: this linter can be disabled with `set_option linter.constructorNameAsVariable false`
```
Additionally, the error message that occurs when invalid names are
applied in patterns now suggests similar names. This means that:
```
def length (list : List α) : Nat :=
match list with
| nil => 0
| cons x xs => length xs + 1
```
now results in the following warning on `nil`:
```
warning: Local variable 'nil' resembles constructor 'List.nil' - write '.nil' (with a dot) or 'List.nil' to use the constructor.
note: this linter can be disabled with `set_option linter.constructorNameAsVariable false`
```
and error on `cons`:
```
invalid pattern, constructor or constant marked with '[match_pattern]' expected
Suggestion: 'List.cons' is similar
```
The list of suggested constructors is generated before the type of the
pattern is known, so it's less accurate, but it truncates the list to
ten elements to avoid being overwhelming. This mostly comes up with
`mk`.
This `@[inline]` causes Lean to respecialize `RBMap.find?` to `NameMap`
at each call site of `NameMap.find?`, creating lots of unnecessary
duplicate IR.
To eliminate parsing differences between Windows and other platforms,
the frontend now normalizes all CRLF line endings to LF, like [in
Rust](https://github.com/rust-lang/rust/issues/62865).
Effects:
- This makes Lake hashes be faithful to what Lean sees (Lake already
normalizes line endings before computing hashes).
- Docstrings now have normalized line endings. In particular, this fixes
`#guard_msgs` failing multiline tests for Windows users using CRLF.
- Now strings don't have different lengths depending on the platform.
Before this PR, the following theorem is true for LF and false for CRLF
files.
```lean
example : "
".length = 1 := rfl
```
Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In
the elaborator, we reject loose `\r`'s that appear in whitespace. Rust
instead takes the approach of making the normalization routine fail.
They do this so that there's no downstream confusion about any `\r\n`
that appears.
Implementation note: the LSP maintains its own copy of a source file
that it updates when edit operations are applied. We are assuming that
edit operations never split or join CRLFs. If this assumption is not
correct, then the LSP copy of a source file can become slightly out of
sync. If this is an issue, there is some discussion
[here](https://github.com/leanprover/lean4/pull/3903#discussion_r1592930085).
Fixes#3270 by moving the deprecation check from
`Lean.Elab.Term.mkConsts` to `Lean.Elab.Term.mkConst`, so
`Lean.Elab.Term.mkBaseProjections`, `.elabAppLValsAux`, `.elabAppFn`,
and `.elabForIn` also hit the check. Not all of these really need to hit
the check, so I'll run `!bench` to see if it's a problem.
Because of the last-added-tried-first rule for macros, all the special
purpose `decreasing_trivial` rules are tried for most recursive
definitions out there, and because they use `apply` and `assumption`
with default transparency may cause some definitoins to be unfolded over
and over again.
A quick test with one of the functions in the leansat project shows that
elaboration time goes down from 600ms to 375ms when using
```
decreasing_by all_goals decreasing_with with_reducible decreasing_trivial
```
instead of
```
decreasing_by all_goals decreasing_with decreasing_trivial
```
This change uses `with_reducible` in most of these macros.
This means that these tactics will no longer work when the
relations/definitions they look for is hidden behind a definition.
This affected in particular `Array.sizeOf_get`, which now has a
companion `sizeOf_getElem`.
In addition, there were three tactics using `apply` to apply Nat-related
lemmas
that we now expect `omega` to solve. We still need them when building
`Init` modules
that don’t have access to `omega`, but they now live in
`decreasing_trivial_pre_omega`,
meant to be only used internally.
Adds a `--json` option to the `lean` CLI. When used, the Lean frontend
will print messages as JSON objects using the default `ToJson` encoding
for the `Message` structure. This allows consumers (such as Lake) to
handle Lean output in a more intelligent, well-structured way.
`Message` has been refactored into `BaseMessage`, `Message`, and
`SerialMessage` to enable deriving `ToJson`/ `FromJson` instances
automatically for `BaseMessage` / `SerialMessage`. `SerialMessage` is a
`Message` with its `MessageData` eagerly serialized to a `String`.
Previously, there was a function `opaque fromUTF8Unchecked : ByteArray
-> String` which would convert a list of bytes into a string, but as the
name implies it does not validate that the string is UTF-8 before doing
so and as a result it produces unsound results in the compiler (because
the lean model of `String` indirectly asserts UTF-8 validity). This PR
replaces that function by
```lean
opaque validateUTF8 (a : @& ByteArray) : Bool
opaque fromUTF8 (a : @& ByteArray) (h : validateUTF8 a) : String
```
so that while the function is still "unchecked", we have a proof witness
that the string is valid. To recover the original, actually unchecked
version, use `lcProof` or other unsafe methods to produce the proof
witness.
Because this was the only `ByteArray -> String` conversion function, it
was used in several places in an unsound way (e.g. reading untrusted
input from IO and treating it as UTF-8). These have been replaced by
`fromUTF8?` or `fromUTF8!` as appropriate.
While implementing #3925, I noticed that the performance of the
`textDocument/semanticTokens/full` request is *extremely* bad due to a
quadratic implementation. Specifically, on my machine, computing the
full semantic tokens for `Lean/Elab/Do.lean` took a full 5s. In
practice, this means that while elaborating the file, one core is
entirely busy with computing the semantic tokens for the file.
This PR fixes this performance bug by re-implementing the semantic token
handling, reducing the latency for `Lean/Elab/Do.lean` from 5s to 60ms.
As a result, the overly cautious refresh latency of 5s in #3925 can
easily be reduced to 2s again.
Since the previous semantic tokens implementation used a very brittle
hack to identify projections, this PR also changes the projection
notation elaboration to augment the `InfoTree` syntax for the field of a
projection with a special syntax node of kind
`Lean.Parser.Term.identProjKind`. With this syntax kind, projection
fields can now easily be identified in the `InfoTree`.
When `discharge?` failed, the `usedSimps` was being restored, but the
cache wasn't. This bug was exposed by issue #3710.
This PR makes the following changes:
- We restore the `cache` at `discharge?`. We use `SMap` to ensure the
operation is efficient.
- We don't need the field `dischargeDepth` anymore at `Simp.Result`.
- `UsedSimps` should use `PHashMap` since it is not used linearly.
closes#3710
---------
Co-authored-by: Mario Carneiro <di.gama@gmail.com>
This makes changes to the `GetElem` class so that it does not lead to
unnecessary overhead in container like `RBMap`.
The changes are to:
1. Make `getElem?` and `getElem!` part of the `GetElem` class so they
can be overridden in instances.
2. Introduce a `LawfulGetElem` class that contains correctness theorems
for `getElem?` and `getElem!` using the original definitions.
3. Reorganize definitions (e.g, by moving `GetElem` out of
`Init.Prelude`) so that the `GetElem` changes are feasible.
4. Provide `LawfulGetElem` instances to complement all existing
`GetElem` instances in Lean core.
To reduce the size of the PR, this doesn't do the work of providing new
`GetElem` instances for `RBMap`, `HashMap` etc. That will be done in a
separate PR (#3688) that depends on this.
---------
Co-authored-by: Mac Malone <tydeu@hatpress.net>
[Before](https://github.com/leanprover/lean4/files/14772220/oi.pdf) and
[after](https://github.com/leanprover/lean4/files/14772226/oi2.pdf).
This gets `ByteArray`, `String.Extra`, `ToString.Macro` and `RCases` out
of the imports of `omega`. I'd hoped to get `Array.Subarray` too, but
it's tangled up in the list literal syntax. Further progress could come
from make `split` use available `Decidable` instances, so we could pull
out `Classical` (and possibly some of `PropLemmas`).
`FileMap.lines` is an array that seems to be manually managed to have
the form `#[1, 2, ..., n-1, n-1]` with same length as
`FileMap.positions`. Remove this structure field in favour of
calculating the line number as `min(x+1, positions.size-1)` when needed.
Follow-up on #3221
This coercion caused difficult-to-diagnose bugs sometimes. Because there
are some situations where converting a string to a name should be done
by parsing the string, and others where it should not, an explicit
choice seems better here.
---------
Co-authored-by: Mac Malone <tydeu@hatpress.net>
Sends a diagnostic informing the user to run Restart File when a file
dependency is saved.
Based on #3014 because this feature was easier to implement with the new
architecture.
ToDo:
- [x] Adjust vscode-lean4 to display a notification when this diagnostic
appears in a non-annoying way
(https://github.com/leanprover/vscode-lean4/pull/393)
- [x] Use a file watcher to identify changes to files not tracked by VS
Code
- [x] Rebase onto master when #3014 is merged
- Removes the public definitions `Array.eraseIdxAux` and
`Array.eraseIdxSzAux` which were implementation details.
- Motivation: `Array.eraseIdxAux` and `Array.eraseIdxSzAux` were clearly
not intended to remain public, but simply making them private would make
it inconvenient to unfold them when writing proofs in Std.
- Adds documentation comments to the public `Array.eraseIdx`-related
definitions which remain.
- Removes `Array.eraseIdx'` which was just `Array.feraseIdx` wrapped in
a subtype and adds `Array.size_feraseIdx` to prove the subtype property
as a standalone theorem.
Co-Authored-By: Daniel Windham <daniel@atlascomputing.org>
Fixes#1170.
This PR adds the module name to `RefIdent` in order to distinguish
conflicting names from different files. This also fixes related issues
in find-references or the call hierarchy feature.
It also adds some docstrings and stylistically refactors a bunch of
code.