To eliminate parsing differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF, like [in Rust](https://github.com/rust-lang/rust/issues/62865).

Effects:
- Lake hashes are now faithful to what Lean sees (Lake already normalizes line endings before computing hashes).
- Docstrings now have normalized line endings. In particular, this fixes `#guard_msgs` failing multiline tests for Windows users using CRLF.
- Strings no longer have different lengths depending on the platform. Before this PR, the following theorem was true for LF files and false for CRLF files.

```lean
example : "
".length = 1 := rfl
```

Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In the elaborator, we reject loose `\r`'s that appear in whitespace. Rust instead takes the approach of making the normalization routine fail, so that there's no downstream confusion about any `\r\n` that appears.

Implementation note: the LSP maintains its own copy of a source file, which it updates when edit operations are applied. We are assuming that edit operations never split or join CRLFs. If this assumption is not correct, the LSP copy of a source file can become slightly out of sync. If this turns out to be an issue, there is some discussion [here](https://github.com/leanprover/lean4/pull/3903#discussion_r1592930085).
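The behavior described in the note can be illustrated with a minimal sketch. The name `normalizeCrlf` is hypothetical and this is not the frontend's actual routine; it only assumes that core `String.replace` is available, and it shows why `\r\r\n` still leaves a `\r\n` behind.

```lean
-- Hypothetical helper for illustration only; not the frontend's actual routine.
def normalizeCrlf (s : String) : String :=
  s.replace "\r\n" "\n"

-- A CRLF pair collapses to a single LF.
#eval normalizeCrlf "a\r\nb" == "a\nb"      -- true

-- Only the final `\r\n` of `\r\r\n` is rewritten, so a `\r\n` remains.
#eval normalizeCrlf "a\r\r\nb" == "a\r\nb"  -- true
```

A single left-to-right pass like this never re-examines its own output, which is why the leftover `\r` has to be rejected later rather than normalized away.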
import Lean.Parser.Extension
import Lean.Elab.Term

/-!
# Testing string gaps in string literals

String gaps are described in RFC #2838
-/

/-!
A string gap with no trailing space.
-/
/-- info: "ab" -/
#guard_msgs in
#eval "a\
b"

/-!
A string gap with trailing space before the `b`, which is consumed.
-/
/-- info: "ab" -/
#guard_msgs in
#eval "a\
      b"

/-!
A string gap with space before the gap, which is not consumed.
-/
/-- info: "a b" -/
#guard_msgs in
#eval "a \
b"

/-!
Multiple string gaps in a row.
-/
/-- info: "a b" -/
#guard_msgs in
#eval "a \
\
\
b"

/-!
Two tests from the RFC.
-/
/-- info: "this is a string" -/
#guard_msgs in
#eval "this is \
a string"
/-- info: "this is a string" -/
#guard_msgs in
#eval "this is \
a string"

/-!
Two examples of how spaces are accounted for in string gaps. `\x20` is a way to force a leading space.
-/
/-- info: "there are three spaces between the brackets <   >" -/
#guard_msgs in
#eval "there are three spaces between the brackets <   \
        >"
/-- info: "there are three spaces between the brackets <   >" -/
#guard_msgs in
#eval "there are three spaces between the brackets <\
        \x20  >"

/-!
Using `\n` to terminate a string gap, which is a technique suggested by Mario for using string gaps to write
multiline literals in an indented context.
-/
/-- info: "this is\n  a string with two space indent" -/
#guard_msgs in
#eval "this is\
       \n  a string with two space indent"

/-!
Similar tests but for interpolated strings.
-/
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
b"
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
b"
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
b"

/-!
The `{` terminates the string gap.
-/
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
{"b"}\
"

open Lean

/-!
## Testing whitespace handling with specific line terminators
-/

/-!
Standard string gap, with LF
-/
/-- info: "ab" -/
#guard_msgs in
#eval show MetaM String from do
  let stx ← ofExcept <| Parser.runParserCategory (← getEnv) `term "\"a\\\n b\""
  let some s := stx.isStrLit? | failure
  return s

/-!
Isolated CR, which is an error
-/
/-- error: <input>:1:3: invalid escape sequence -/
#guard_msgs (error, drop info) in
#eval show MetaM String from do
  let stx ← ofExcept <| Parser.runParserCategory (← getEnv) `term "\"a\\\r b\""
  let some s := stx.isStrLit? | failure
  return s

/-!
Not a string gap since there's no end-of-line.
-/
/-- error: <input>:1:3: invalid escape sequence -/
#guard_msgs (error, drop info) in
#eval show MetaM String from do
  let stx ← ofExcept <| Parser.runParserCategory (← getEnv) `term "\"a\\ b\""
  let some s := stx.isStrLit? | failure
  return s

/-!
## Scala-style stripMargin

This is a test that string gaps could be paired with a new string elaboration syntax
for indented multiline string literals.
-/

def String.dedent (s : String) : Option String :=
  let parts := s.split (· == '\n') |>.map String.trimLeft
  match parts with
  | [] => ""
  | [p] => p
  | p₀ :: parts =>
    if !parts.all (·.startsWith "|") then
      none
    else
      p₀ ++ "\n" ++ String.intercalate "\n" (parts.map fun p => p.drop 1)

elab "d!" s:str : term => do
  let some s := s.raw.isStrLit? | Lean.Elab.throwIllFormedSyntax
  let some s := String.dedent s | Lean.Elab.throwIllFormedSyntax
  pure $ Lean.mkStrLit s

/-- info: "this is line 1\n line 2, indented\nline 3" -/
#guard_msgs in
#eval d!"this is \
     line 1
    | line 2, indented
    |line 3"