lean4-htt/tests/lean/run/string_gaps.lean
Kyle Miller a7338c5ad8
feat: make frontend normalize line endings to LF (#3903)
To eliminate parsing differences between Windows and other platforms,
the frontend now normalizes all CRLF line endings to LF, like [in
Rust](https://github.com/rust-lang/rust/issues/62865).

Effects:
- This makes Lake hashes be faithful to what Lean sees (Lake already
normalizes line endings before computing hashes).
- Docstrings now have normalized line endings. In particular, this fixes
`#guard_msgs` failing multiline tests for Windows users using CRLF.
- Now strings don't have different lengths depending on the platform.
Before this PR, the following theorem is true for LF and false for CRLF
files.
```lean
example : "
".length = 1 := rfl
```

Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In
the elaborator, we reject loose `\r`'s that appear in whitespace. Rust
instead takes the approach of making the normalization routine fail.
They do this so that there's no downstream confusion about any `\r\n`
that appears.

Implementation note: the LSP maintains its own copy of a source file
that it updates when edit operations are applied. We are assuming that
edit operations never split or join CRLFs. If this assumption is not
correct, then the LSP copy of a source file can become slightly out of
sync. If this is an issue, there is some discussion
[here](https://github.com/leanprover/lean4/pull/3903#discussion_r1592930085).
2024-05-20 17:13:08 +00:00

166 lines
3.5 KiB
Text

import Lean.Parser.Extension
import Lean.Elab.Term
/-!
# Testing string gaps in string literals
String gaps are described in RFC #2838
-/
/-!
A string gap with no trailing space.
-/
/-- info: "ab" -/
#guard_msgs in
#eval "a\
b"
/-!
A string gap with trailing space before the `b`, which is consumed.
-/
/-- info: "ab" -/
#guard_msgs in
#eval "a\
b"
/-!
A string gap with space before the gap, which is not consumed.
-/
/-- info: "a b" -/
#guard_msgs in
#eval "a \
b"
/-!
Multiple string gaps in a row.
-/
/-- info: "a b" -/
#guard_msgs in
#eval "a \
\
\
b"
/-!
Two tests from the RFC.
-/
/-- info: "this is a string" -/
#guard_msgs in
#eval "this is \
a string"
/-- info: "this is a string" -/
#guard_msgs in
#eval "this is \
a string"
/-!
Two examples of how spaces are accounted for in string gaps. `\x20` is a way to force a leading space.
-/
/-- info: "there are three spaces between the brackets < >" -/
#guard_msgs in
#eval "there are three spaces between the brackets < \
>"
/-- info: "there are three spaces between the brackets < >" -/
#guard_msgs in
#eval "there are three spaces between the brackets <\
\x20 >"
/-!
Using `\n` to terminate a string gap, which is a technique suggested by Mario for using string gaps to write
multiline literals in an indented context.
-/
/-- info: "this is\n a string with two space indent" -/
#guard_msgs in
#eval "this is\
\n a string with two space indent"
/-!
Similar tests but for interpolated strings.
-/
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
b"
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
b"
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
b"
/-!
The `{` terminates the string gap.
-/
/-- info: "ab" -/
#guard_msgs in
#eval s!"a\
{"b"}\
"
open Lean
/-!
## Testing whitespace handling with specific line terminators
-/
/-!
Standard string gap, with LF
-/
/-- info: "ab" -/
#guard_msgs in
#eval show MetaM String from do
let stx ← ofExcept <| Parser.runParserCategory (← getEnv) `term "\"a\\\n b\""
let some s := stx.isStrLit? | failure
return s
/-!
Isolated CR, which is an error
-/
/-- error: <input>:1:3: invalid escape sequence -/
#guard_msgs (error, drop info) in
#eval show MetaM String from do
let stx ← ofExcept <| Parser.runParserCategory (← getEnv) `term "\"a\\\r b\""
let some s := stx.isStrLit? | failure
return s
/-!
Not a string gap since there's no end-of-line.
-/
/-- error: <input>:1:3: invalid escape sequence -/
#guard_msgs (error, drop info) in
#eval show MetaM String from do
let stx ← ofExcept <| Parser.runParserCategory (← getEnv) `term "\"a\\ b\""
let some s := stx.isStrLit? | failure
return s
/-!
## Scala-style stripMargin
This is a test that string gaps could be paired with a new string elaboration syntax
for indented multiline string literals.
-/
def String.dedent (s : String) : Option String :=
let parts := s.split (· == '\n') |>.map String.trimLeft
match parts with
| [] => ""
| [p] => p
| p₀ :: parts =>
if !parts.all (·.startsWith "|") then
none
else
p₀ ++ "\n" ++ String.intercalate "\n" (parts.map fun p => p.drop 1)
elab "d!" s:str : term => do
let some s := s.raw.isStrLit? | Lean.Elab.throwIllFormedSyntax
let some s := String.dedent s | Lean.Elab.throwIllFormedSyntax
pure $ Lean.mkStrLit s
/-- info: "this is line 1\n line 2, indented\nline 3" -/
#guard_msgs in
#eval d!"this is \
line 1
| line 2, indented
|line 3"