/- Copyright (c) 2019 Microsoft Corporation. All rights reserved. Released under Apache 2.0 license as described in the file LICENSE. Authors: Leonardo de Moura, Sebastian Ullrich -/ prelude import Lean.Parser.Types /-! # Basic Lean parser infrastructure The Lean parser was developed with the following primary goals in mind: * flexibility: Lean's grammar is complex and includes indentation and other whitespace sensitivity. It should be possible to introduce such custom "tweaks" locally without having to adjust the fundamental parsing approach. * extensibility: Lean's grammar can be extended on the fly within a Lean file, and with Lean 4 we want to extend this to cover embedding domain-specific languages that may look nothing like Lean, down to using a separate set of tokens. * losslessness: The parser should produce a concrete syntax tree that preserves all whitespace and other "sub-token" information for the use in tooling. * performance: The overhead of the parser building blocks, and the overall parser performance on average-complexity input, should be comparable with that of the previous parser hand-written in C++. No fancy optimizations should be necessary for this. Given these constraints, we decided to implement a combinatoric, non-monadic, lexer-less, memoizing recursive-descent parser. Using combinators instead of some more formal and introspectible grammar representation ensures ultimate flexibility as well as efficient extensibility: there is (almost) no pre-processing necessary when extending the grammar with a new parser. However, because all the results the combinators produce are of the homogeneous `Syntax` type, the basic parser type is not actually a monad but a monomorphic linear function `ParserState → ParserState`, avoiding constructing and deconstructing countless monadic return values. Instead of explicitly returning syntax objects, parsers push (zero or more of) them onto a syntax stack inside the linear state. Chaining parsers via `>>` accumulates their output on the stack. Combinators such as `node` then pop off all syntax objects produced during their invocation and wrap them in a single `Syntax.node` object that is again pushed on this stack. Instead of calling `node` directly, we usually use the macro `leading_parser p`, which unfolds to `node k p` where the new syntax node kind `k` is the name of the declaration being defined. The lack of a dedicated lexer ensures we can modify and replace the lexical grammar at any point, and simplifies detecting and propagating whitespace. The parser still has a concept of "tokens", however, and caches the most recent one for performance: when `tokenFn` is called twice at the same position in the input, it will reuse the result of the first call. `tokenFn` recognizes some built-in variable-length tokens such as identifiers as well as any fixed token in the `ParserContext`'s `TokenTable` (a trie); however, the same cache field and strategy could be reused by custom token parsers. Tokens also play a central role in the `prattParser` combinator, which selects a *leading* parser followed by zero or more *trailing* parsers based on the current token (via `peekToken`); see the documentation of `prattParser` for more details. Tokens are specified via the `symbol` parser, or with `symbolNoWs` for tokens that should not be preceded by whitespace. The `Parser` type is extended with additional metadata over the mere parsing function to propagate token information: `collectTokens` collects all tokens within a parser for registering. `firstTokens` holds information about the "FIRST" token set used to speed up parser selection in `prattParser`. This approach of combining static and dynamic information in the parser type is inspired by the paper "Deterministic, Error-Correcting Combinator Parsers" by Swierstra and Duponcheel. If multiple parsers accept the same current token, `prattParser` tries all of them using the backtracking `longestMatchFn` combinator. This is the only case where standard parsers might execute arbitrary backtracking. Repeated invocations of the same category or concrete parser at the same position are cached where possible; see `withCache`. Finally, error reporting follows the standard combinatoric approach of collecting a single unexpected token/... and zero or more expected tokens (see `Error` below). Expected tokens are e.g. set by `symbol` and merged by `<|>`. Combinators running multiple parsers should check if an error message is set in the parser state (`hasError`) and act accordingly. Error recovery is left to the designer of the specific language; for example, Lean's top-level `parseCommand` loop skips tokens until the next command keyword on error. -/ namespace Lean.Parser def dbgTraceStateFn (label : String) (p : ParserFn) : ParserFn := fun c s => let sz := s.stxStack.size let s' := p c s dbg_trace "{label} pos: {s'.pos} err: {s'.errorMsg} out: {s'.stxStack.extract sz s'.stxStack.size}" s' def dbgTraceState (label : String) : Parser → Parser := withFn (dbgTraceStateFn label) @[noinline]def epsilonInfo : ParserInfo := { firstTokens := FirstTokens.epsilon } def checkStackTopFn (p : Syntax → Bool) (msg : String) : ParserFn := fun _ s => if p s.stxStack.back then s else s.mkUnexpectedError msg def checkStackTop (p : Syntax → Bool) (msg : String) : Parser := { info := epsilonInfo, fn := checkStackTopFn p msg } def andthenFn (p q : ParserFn) : ParserFn := fun c s => let s := p c s if s.hasError then s else q c s @[noinline] def andthenInfo (p q : ParserInfo) : ParserInfo := { collectTokens := p.collectTokens ∘ q.collectTokens, collectKinds := p.collectKinds ∘ q.collectKinds, firstTokens := p.firstTokens.seq q.firstTokens } /-- The `andthen(p, q)` combinator, usually written as adjacency in syntax declarations (`p q`), parses `p` followed by `q`. The arity of this parser is the sum of the arities of `p` and `q`: that is, it accumulates all the nodes produced by `p` followed by the nodes from `q` into the list of arguments to the surrounding parse node. -/ def andthen (p q : Parser) : Parser where info := andthenInfo p.info q.info fn := andthenFn p.fn q.fn instance : AndThen Parser where andThen a b := andthen a (b ()) def nodeFn (n : SyntaxNodeKind) (p : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize let s := p c s s.mkNode n iniSz def trailingNodeFn (n : SyntaxNodeKind) (p : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize let s := p c s s.mkTrailingNode n iniSz @[noinline] def nodeInfo (n : SyntaxNodeKind) (p : ParserInfo) : ParserInfo := { collectTokens := p.collectTokens, collectKinds := fun s => (p.collectKinds s).insert n, firstTokens := p.firstTokens } def node (n : SyntaxNodeKind) (p : Parser) : Parser := { info := nodeInfo n p.info, fn := nodeFn n p.fn } def errorFn (msg : String) : ParserFn := fun _ s => s.mkUnexpectedError msg def error (msg : String) : Parser := { info := epsilonInfo, fn := errorFn msg } def errorAtSavedPosFn (msg : String) (delta : Bool) : ParserFn := fun c s => match c.savedPos? with | none => s | some pos => let pos := if delta then c.input.next pos else pos s.mkUnexpectedErrorAt msg pos /-- Generate an error at the position saved with the `withPosition` combinator. If `delta == true`, then it reports at saved position+1. This useful to make sure a parser consumed at least one character. -/ @[builtin_doc] def errorAtSavedPos (msg : String) (delta : Bool) : Parser := { fn := errorAtSavedPosFn msg delta } /-- Succeeds if `c.prec <= prec` -/ def checkPrecFn (prec : Nat) : ParserFn := fun c s => if c.prec <= prec then s else s.mkUnexpectedError "unexpected token at this precedence level; consider parenthesizing the term" def checkPrec (prec : Nat) : Parser := { info := epsilonInfo fn := checkPrecFn prec } /-- Succeeds if `c.lhsPrec >= prec` -/ def checkLhsPrecFn (prec : Nat) : ParserFn := fun _ s => if s.lhsPrec >= prec then s else s.mkUnexpectedError "unexpected token at this precedence level; consider parenthesizing the term" def checkLhsPrec (prec : Nat) : Parser := { info := epsilonInfo fn := checkLhsPrecFn prec } def setLhsPrecFn (prec : Nat) : ParserFn := fun _ s => if s.hasError then s else { s with lhsPrec := prec } def setLhsPrec (prec : Nat) : Parser := { info := epsilonInfo fn := setLhsPrecFn prec } private def addQuotDepth (i : Int) (p : Parser) : Parser := adaptCacheableContext (fun c => { c with quotDepth := c.quotDepth + i |>.toNat }) p def incQuotDepth (p : Parser) : Parser := addQuotDepth 1 p def decQuotDepth (p : Parser) : Parser := addQuotDepth (-1) p def suppressInsideQuot : Parser → Parser := adaptCacheableContext fun c => -- if we are already within a quotation, don't change anything if c.quotDepth == 0 then { c with suppressInsideQuot := true } else c def leadingNode (n : SyntaxNodeKind) (prec : Nat) (p : Parser) : Parser := checkPrec prec >> node n p >> setLhsPrec prec def trailingNodeAux (n : SyntaxNodeKind) (p : Parser) : TrailingParser := { info := nodeInfo n p.info fn := trailingNodeFn n p.fn } def trailingNode (n : SyntaxNodeKind) (prec lhsPrec : Nat) (p : Parser) : TrailingParser := checkPrec prec >> checkLhsPrec lhsPrec >> trailingNodeAux n p >> setLhsPrec prec def mergeOrElseErrors (s : ParserState) (error1 : Error) (iniPos : String.Pos) (mergeErrors : Bool) : ParserState := match s with | ⟨stack, lhsPrec, pos, cache, some error2, errs⟩ => if pos == iniPos then ⟨stack, lhsPrec, pos, cache, some (if mergeErrors then error1.merge error2 else error2), errs⟩ else s | other => other -- When `p` in `p <|> q` parses exactly one antiquotation, ... inductive OrElseOnAntiquotBehavior where | acceptLhs -- return it | takeLongest -- return result of `q` instead if it made more progress | merge -- ... and create choice node if both made the same progress deriving BEq def orelseFnCore (p q : ParserFn) (antiquotBehavior := OrElseOnAntiquotBehavior.merge) : ParserFn := fun c s => Id.run do let iniSz := s.stackSize let iniPos := s.pos let mut s := p c s match s.errorMsg with | some errorMsg => if s.pos == iniPos then mergeOrElseErrors (q c (s.restore iniSz iniPos)) errorMsg iniPos true else s | none => let pBack := s.stxStack.back if antiquotBehavior == .acceptLhs || s.stackSize != iniSz + 1 || !pBack.isAntiquots then return s let pPos := s.pos s := s.restore iniSz iniPos s := q c s if s.hasError then return s.restore iniSz pPos |>.pushSyntax pBack -- If `q` made more progress than `p`, we prefer its result. -- Thus `(structInstField| $id := $val) is interpreted as -- `(structInstField| $id:ident := $val:term), not -- `(structInstField| $id:structInstField . if s.pos > pPos then return s if s.pos < pPos || antiquotBehavior != .merge || s.stackSize != iniSz + 1 || !s.stxStack.back.isAntiquots then return s.restore iniSz pPos |>.pushSyntax pBack -- Pop off result of `q`, push result(s) of `p` and `q` in that order, turn them into a choice node let qBack := s.stxStack.back s := s.popSyntax let pushAntiquots stx s := if stx.isOfKind choiceKind then -- Flatten existing choice node { s with stxStack := s.stxStack ++ stx.getArgs } else s.pushSyntax stx s := pushAntiquots pBack s s := pushAntiquots qBack s s.mkNode choiceKind iniSz def orelseFn (p q : ParserFn) : ParserFn := orelseFnCore p q @[noinline] def orelseInfo (p q : ParserInfo) : ParserInfo := { collectTokens := p.collectTokens ∘ q.collectTokens collectKinds := p.collectKinds ∘ q.collectKinds firstTokens := p.firstTokens.merge q.firstTokens } /-- Run `p`, falling back to `q` if `p` failed without consuming any input. NOTE: In order for the pretty printer to retrace an `orelse`, `p` must be a call to `node` or some other parser producing a single node kind. Nested `orelse` calls are flattened for this, i.e. `(node k1 p1 <|> node k2 p2) <|> ...` is fine as well. -/ @[builtin_doc] def orelse (p q : Parser) : Parser where info := orelseInfo p.info q.info fn := orelseFn p.fn q.fn instance : OrElse Parser where orElse a b := orelse a (b ()) @[noinline] def noFirstTokenInfo (info : ParserInfo) : ParserInfo := { collectTokens := info.collectTokens collectKinds := info.collectKinds } def atomicFn (p : ParserFn) : ParserFn := fun c s => let iniPos := s.pos match p c s with | ⟨stack, lhsPrec, _, cache, some msg, errs⟩ => ⟨stack, lhsPrec, iniPos, cache, some msg, errs⟩ | other => other /-- The `atomic(p)` parser parses `p`, returns the same result as `p` and fails iff `p` fails, but if `p` fails after consuming some tokens `atomic(p)` will fail without consuming tokens. This is important for the `p <|> q` combinator, because it is not backtracking, and will fail if `p` fails after consuming some tokens. To get backtracking behavior, use `atomic(p) <|> q` instead. This parser has the same arity as `p` - it produces the same result as `p`. -/ @[builtin_doc] def atomic : Parser → Parser := withFn atomicFn /-- Information about the state of the parse prior to the failing parser's execution -/ structure RecoveryContext where /-- The position prior to the failing parser -/ initialPos : String.Pos /-- The syntax stack height prior to the failing parser's execution -/ initialSize : Nat deriving BEq, DecidableEq, Repr /-- Recover from errors in `p` using `recover` to consume input until a known-good state has appeared. If `recover` fails itself, then no recovery is performed. `recover` is provided with information about the failing parser's effects , and it is run in the state immediately after the failure. -/ def recoverFn (p : ParserFn) (recover : RecoveryContext → ParserFn) : ParserFn := fun c s => let iniPos := s.pos let iniSz := s.stxStack.size let s' := p c s if let some msg := s'.errorMsg then let s' := recover ⟨iniPos, iniSz⟩ c {s' with errorMsg := none} if s'.hasError then s' else {s' with pos := s'.pos, lhsPrec := s'.lhsPrec, cache := s'.cache, errorMsg := none, recoveredErrors := s'.recoveredErrors.push (s'.pos, s'.stxStack, msg) } else s' /-- Recover from errors in `parser` using `handler` to consume input until a known-good state has appeared. If `handler` fails itself, then no recovery is performed. `handler` is provided with information about the failing parser's effects , and it is run in the state immediately after the failure. The interactions between <|> and `recover'` are subtle, especially for syntactic categories that admit user extension. Consider avoiding it in these cases. -/ @[builtin_doc] def recover' (parser : Parser) (handler : RecoveryContext → Parser) : Parser where info := parser.info fn := recoverFn parser.fn fun s => handler s |>.fn /-- Recover from errors in `parser` using `handler` to consume input until a known-good state has appeared. If `handler` fails itself, then no recovery is performed. `handler` is run in the state immediately after the failure. The interactions between <|> and `recover` are subtle, especially for syntactic categories that admit user extension. Consider avoiding it in these cases. -/ @[builtin_doc] def recover (parser handler : Parser) : Parser := recover' parser fun _ => handler def optionalFn (p : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize let iniPos := s.pos let s := p c s let s := if s.hasError && s.pos == iniPos then s.restore iniSz iniPos else s s.mkNode nullKind iniSz @[noinline] def optionaInfo (p : ParserInfo) : ParserInfo := { collectTokens := p.collectTokens collectKinds := p.collectKinds firstTokens := p.firstTokens.toOptional } def optionalNoAntiquot (p : Parser) : Parser := { info := optionaInfo p.info fn := optionalFn p.fn } def lookaheadFn (p : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize let iniPos := s.pos let s := p c s if s.hasError then s else s.restore iniSz iniPos /-- `lookahead(p)` runs `p` and fails if `p` does, but it produces no parse nodes and rewinds the position to the original state on success. So for example `lookahead("=>")` will ensure that the next token is `"=>"`, without actually consuming this token. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def lookahead : Parser → Parser := withFn lookaheadFn def notFollowedByFn (p : ParserFn) (msg : String) : ParserFn := fun c s => let iniSz := s.stackSize let iniPos := s.pos let s := p c s if s.hasError then s.restore iniSz iniPos else let s := s.restore iniSz iniPos s.mkUnexpectedError s!"unexpected {msg}" /-- `notFollowedBy(p, "foo")` succeeds iff `p` fails; if `p` succeeds then it fails with the message `"unexpected foo"`. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def notFollowedBy (p : Parser) (msg : String) : Parser where fn := notFollowedByFn p.fn msg partial def manyAux (p : ParserFn) : ParserFn := fun c s => Id.run do let iniSz := s.stackSize let iniPos := s.pos let mut s := p c s if s.hasError then return if iniPos == s.pos then s.restore iniSz iniPos else s if iniPos == s.pos then return s.mkUnexpectedError "invalid 'many' parser combinator application, parser did not consume anything" if s.stackSize > iniSz + 1 then s := s.mkNode nullKind iniSz manyAux p c s def manyFn (p : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize let s := manyAux p c s s.mkNode nullKind iniSz def manyNoAntiquot (p : Parser) : Parser := { info := noFirstTokenInfo p.info fn := manyFn p.fn } def many1Fn (p : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize let s := andthenFn p (manyAux p) c s s.mkNode nullKind iniSz def many1NoAntiquot : Parser → Parser := withFn many1Fn private partial def sepByFnAux (p : ParserFn) (sep : ParserFn) (allowTrailingSep : Bool) (iniSz : Nat) (pOpt : Bool) : ParserFn := let rec parse (pOpt : Bool) (c s) := Id.run do let sz := s.stackSize let pos := s.pos let mut s := p c s if s.hasError then if s.pos > pos then return s.mkNode nullKind iniSz else if pOpt then s := s.restore sz pos return s.mkNode nullKind iniSz else -- append `Syntax.missing` to make clear that List is incomplete s := s.pushSyntax Syntax.missing return s.mkNode nullKind iniSz if s.stackSize > sz + 1 then s := s.mkNode nullKind sz let sz := s.stackSize let pos := s.pos s := sep c s if s.hasError then s := s.restore sz pos return s.mkNode nullKind iniSz if s.stackSize > sz + 1 then s := s.mkNode nullKind sz parse allowTrailingSep c s parse pOpt def sepByFn (allowTrailingSep : Bool) (p : ParserFn) (sep : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize sepByFnAux p sep allowTrailingSep iniSz true c s def sepBy1Fn (allowTrailingSep : Bool) (p : ParserFn) (sep : ParserFn) : ParserFn := fun c s => let iniSz := s.stackSize sepByFnAux p sep allowTrailingSep iniSz false c s @[noinline] def sepByInfo (p sep : ParserInfo) : ParserInfo := { collectTokens := p.collectTokens ∘ sep.collectTokens collectKinds := p.collectKinds ∘ sep.collectKinds } @[noinline] def sepBy1Info (p sep : ParserInfo) : ParserInfo := { collectTokens := p.collectTokens ∘ sep.collectTokens collectKinds := p.collectKinds ∘ sep.collectKinds firstTokens := p.firstTokens } def sepByNoAntiquot (p sep : Parser) (allowTrailingSep : Bool := false) : Parser := { info := sepByInfo p.info sep.info fn := sepByFn allowTrailingSep p.fn sep.fn } def sepBy1NoAntiquot (p sep : Parser) (allowTrailingSep : Bool := false) : Parser := { info := sepBy1Info p.info sep.info fn := sepBy1Fn allowTrailingSep p.fn sep.fn } /-- Apply `f` to the syntax object produced by `p` -/ def withResultOfFn (p : ParserFn) (f : Syntax → Syntax) : ParserFn := fun c s => let s := p c s if s.hasError then s else let stx := s.stxStack.back s.popSyntax.pushSyntax (f stx) @[noinline] def withResultOfInfo (p : ParserInfo) : ParserInfo := { collectTokens := p.collectTokens collectKinds := p.collectKinds } def withResultOf (p : Parser) (f : Syntax → Syntax) : Parser := { info := withResultOfInfo p.info fn := withResultOfFn p.fn f } def many1Unbox (p : Parser) : Parser := withResultOf (many1NoAntiquot p) fun stx => if stx.getNumArgs == 1 then stx.getArg 0 else stx partial def satisfyFn (p : Char → Bool) (errorMsg : String := "unexpected character") : ParserFn := fun c s => let i := s.pos if h : c.input.atEnd i then s.mkEOIError else if p (c.input.get' i h) then s.next' c.input i h else s.mkUnexpectedError errorMsg partial def takeUntilFn (p : Char → Bool) : ParserFn := fun c s => let i := s.pos if h : c.input.atEnd i then s else if p (c.input.get' i h) then s else takeUntilFn p c (s.next' c.input i h) def takeWhileFn (p : Char → Bool) : ParserFn := takeUntilFn (fun c => !p c) def takeWhile1Fn (p : Char → Bool) (errorMsg : String) : ParserFn := andthenFn (satisfyFn p errorMsg) (takeWhileFn p) variable (pushMissingOnError : Bool) in partial def finishCommentBlock (nesting : Nat) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then eoi s else let curr := input.get' i h let i := input.next' i h if curr == '-' then if h : input.atEnd i then eoi s else let curr := input.get' i h if curr == '/' then -- "-/" end of comment if nesting == 1 then s.next' input i h else finishCommentBlock (nesting-1) c (s.next' input i h) else finishCommentBlock nesting c (s.setPos i) else if curr == '/' then if h : input.atEnd i then eoi s else let curr := input.get' i h if curr == '-' then finishCommentBlock (nesting+1) c (s.next' input i h) else finishCommentBlock nesting c (s.setPos i) else finishCommentBlock nesting c (s.setPos i) where eoi s := s.mkUnexpectedError (pushMissing := pushMissingOnError) "unterminated comment" /-- Consume whitespace and comments -/ partial def whitespace : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then s else let curr := input.get' i h if curr == '\t' then s.mkUnexpectedError (pushMissing := false) "tabs are not allowed; please configure your editor to expand them" else if curr == '\r' then s.mkUnexpectedError (pushMissing := false) "isolated carriage returns are not allowed" else if curr.isWhitespace then whitespace c (s.next' input i h) else if curr == '-' then let i := input.next' i h let curr := input.get i if curr == '-' then andthenFn (takeUntilFn (fun c => c = '\n')) whitespace c (s.next input i) else s else if curr == '/' then let i := input.next' i h let curr := input.get i if curr == '-' then let i := input.next i let curr := input.get i if curr == '-' || curr == '!' then s -- "/--" and "/-!" doc comment are actual tokens else andthenFn (finishCommentBlock (pushMissingOnError := false) 1) whitespace c (s.next input i) else s else s def mkEmptySubstringAt (s : String) (p : String.Pos) : Substring := { str := s, startPos := p, stopPos := p } private def rawAux (startPos : String.Pos) (trailingWs : Bool) : ParserFn := fun c s => let input := c.input let stopPos := s.pos let leading := mkEmptySubstringAt input startPos let val := input.extract startPos stopPos if trailingWs then let s := whitespace c s let stopPos' := s.pos let trailing := { str := input, startPos := stopPos, stopPos := stopPos' : Substring } let atom := mkAtom (SourceInfo.original leading startPos trailing (startPos + val)) val s.pushSyntax atom else let trailing := mkEmptySubstringAt input stopPos let atom := mkAtom (SourceInfo.original leading startPos trailing (startPos + val)) val s.pushSyntax atom /-- Match an arbitrary Parser and return the consumed String in a `Syntax.atom`. -/ def rawFn (p : ParserFn) (trailingWs := false) : ParserFn := fun c s => let startPos := s.pos let s := p c s if s.hasError then s else rawAux startPos trailingWs c s def chFn (c : Char) (trailingWs := false) : ParserFn := rawFn (satisfyFn (fun d => c == d) ("'" ++ toString c ++ "'")) trailingWs def rawCh (c : Char) (trailingWs := false) : Parser := { fn := chFn c trailingWs } def hexDigitFn : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then s.mkEOIError else let curr := input.get' i h let i := input.next' i h if curr.isDigit || ('a' <= curr && curr <= 'f') || ('A' <= curr && curr <= 'F') then s.setPos i else s.mkUnexpectedError "invalid hexadecimal numeral" /-- Parses the whitespace after the `\` when there is a string gap. Raises an error if the whitespace does not contain exactly one newline character. -/ partial def stringGapFn (seenNewline : Bool) : ParserFn := fun c s => let i := s.pos if h : c.input.atEnd i then s -- let strLitFnAux handle the EOI error if !seenNewline else let curr := c.input.get' i h if curr == '\n' then if seenNewline then -- Having more than one newline in a string gap is visually confusing s.mkUnexpectedError "unexpected additional newline in string gap" else stringGapFn true c (s.next' c.input i h) else if curr.isWhitespace then stringGapFn seenNewline c (s.next' c.input i h) else if seenNewline then s else s.mkUnexpectedError "expecting newline in string gap" /-- Parses a string quotation after a `\`. - `isQuotable` determines which characters are valid escapes - `inString` enables features that are only valid within strings, in particular `"\" newline whitespace*` gaps. -/ def quotedCharCoreFn (isQuotable : Char → Bool) (inString : Bool) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then s.mkEOIError else let curr := input.get' i h if isQuotable curr then s.next' input i h else if curr == 'x' then andthenFn hexDigitFn hexDigitFn c (s.next' input i h) else if curr == 'u' then andthenFn hexDigitFn (andthenFn hexDigitFn (andthenFn hexDigitFn hexDigitFn)) c (s.next' input i h) else if inString && curr == '\n' then stringGapFn false c s else s.mkUnexpectedError "invalid escape sequence" def isQuotableCharDefault (c : Char) : Bool := c == '\\' || c == '\"' || c == '\'' || c == 'r' || c == 'n' || c == 't' def quotedCharFn : ParserFn := quotedCharCoreFn isQuotableCharDefault false /-- Like `quotedCharFn` but enables escapes that are only valid inside strings. In particular, string gaps (`"\" newline whitespace*`). -/ def quotedStringFn : ParserFn := quotedCharCoreFn isQuotableCharDefault true /-- Push `(Syntax.node tk )` onto syntax stack if parse was successful. -/ def mkNodeToken (n : SyntaxNodeKind) (startPos : String.Pos) : ParserFn := fun c s => Id.run do if s.hasError then return s let input := c.input let stopPos := s.pos let leading := mkEmptySubstringAt input startPos let val := input.extract startPos stopPos let s := whitespace c s let wsStopPos := s.pos let trailing := { str := input, startPos := stopPos, stopPos := wsStopPos : Substring } let info := SourceInfo.original leading startPos trailing stopPos s.pushSyntax (Syntax.mkLit n val info) def charLitFnAux (startPos : String.Pos) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then s.mkEOIError else let curr := input.get' i h let s := s.setPos (input.next' i h) let s := if curr == '\\' then quotedCharFn c s else s if s.hasError then s else let i := s.pos let curr := input.get i let s := s.setPos (input.next i) if curr == '\'' then mkNodeToken charLitKind startPos c s else s.mkUnexpectedError "missing end of character literal" partial def strLitFnAux (startPos : String.Pos) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then s.mkUnexpectedErrorAt "unterminated string literal" startPos else let curr := input.get' i h let s := s.setPos (input.next' i h) if curr == '\"' then mkNodeToken strLitKind startPos c s else if curr == '\\' then andthenFn quotedStringFn (strLitFnAux startPos) c s else strLitFnAux startPos c s /-- Raw strings have the syntax `r##...#"..."#...##` with zero or more `#`'s. If we are looking at a valid start to a raw string (`r##...#"`), returns true. We assume `i` begins at the position immediately after `r`. -/ partial def isRawStrLitStart (input : String) (i : String.Pos) : Bool := if h : input.atEnd i then false else let curr := input.get' i h if curr == '#' then isRawStrLitStart input (input.next' i h) else curr == '"' /-- Parses a raw string literal assuming `isRawStrLitStart` has returned true. The `startPos` is the start of the raw string (at the `r`). The parser state is assumed to be immediately after the `r`. -/ partial def rawStrLitFnAux (startPos : String.Pos) : ParserFn := initState 0 where /-- Gives the "unterminated raw string literal" error. -/ errorUnterminated (s : ParserState) := s.mkUnexpectedErrorAt "unterminated raw string literal" startPos /-- Parses the `#`'s and `"` at the beginning of the raw string. The `num` variable counts the number of `#`s after the `r`. -/ initState (num : Nat) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then errorUnterminated s else let curr := input.get' i h let s := s.setPos (input.next' i h) if curr == '#' then initState (num + 1) c s else if curr == '"' then normalState num c s else -- This should not occur, since we assume `isRawStrLitStart` succeeded. errorUnterminated s /-- Parses characters after the first `"`. If we need to start counting `#`'s to decide if we are closing the raw string literal, we switch to `closingState`. -/ normalState (num : Nat) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then errorUnterminated s else let curr := input.get' i h let s := s.setPos (input.next' i h) if curr == '\"' then if num == 0 then mkNodeToken strLitKind startPos c s else closingState num 0 c s else normalState num c s /-- Parses `#` characters immediately after a `"`. The `closingNum` variable counts the number of `#`s seen after the `"`. Note: `num > 0` since the `num = 0` case is entirely handled by `normalState`. -/ closingState (num : Nat) (closingNum : Nat) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then errorUnterminated s else let curr := input.get' i h let s := s.setPos (input.next' i h) if curr == '#' then if closingNum + 1 == num then mkNodeToken strLitKind startPos c s else closingState num (closingNum + 1) c s else if curr == '\"' then closingState num 0 c s else normalState num c s /-- Parses a sequence of the form `many (many '_' >> many1 digit)`, but if `needDigit` is true the parsed result must be nonempty. Note: this does not report that it is expecting `_` if we reach EOI or an unexpected character. Rationale: this error happens if there is already a `_`, and while sequences of `_` are allowed, it's a bit perverse to suggest extending the sequence. -/ partial def takeDigitsFn (isDigit : Char → Bool) (expecting : String) (needDigit : Bool) : ParserFn := fun c s => let input := c.input let i := s.pos if h : input.atEnd i then if needDigit then s.mkEOIError [expecting] else s else let curr := input.get' i h if curr == '_' then takeDigitsFn isDigit expecting true c (s.next' c.input i h) else if isDigit curr then takeDigitsFn isDigit expecting false c (s.next' c.input i h) else if needDigit then s.mkUnexpectedError "unexpected character" (expected := [expecting]) else s def decimalNumberFn (startPos : String.Pos) (c : ParserContext) : ParserState → ParserState := fun s => let s := takeDigitsFn (fun c => c.isDigit) "decimal number" false c s let input := c.input let i := s.pos if h : input.atEnd i then mkNodeToken numLitKind startPos c s else let curr := input.get' i h let j := input.next i if ∃ hj : ¬ input.atEnd j, curr = '.' && input.get' j hj = '.' then mkNodeToken numLitKind startPos c s else if curr == '.' || curr == 'e' || curr == 'E' then parseScientific s else mkNodeToken numLitKind startPos c s where parseScientific s := let s := parseOptDot s let s := parseOptExp s mkNodeToken scientificLitKind startPos c s parseOptDot s := let input := c.input let i := s.pos let curr := input.get i if curr == '.' then let i := input.next i let curr := input.get i if curr.isDigit then takeDigitsFn (fun c => c.isDigit) "decimal number" false c (s.setPos i) else s.setPos i else s parseOptExp s := let input := c.input let i := s.pos let curr := input.get i if curr == 'e' || curr == 'E' then let i := input.next i let i := if input.get i == '-' || input.get i == '+' then input.next i else i let curr := input.get i if curr.isDigit then takeDigitsFn (fun c => c.isDigit) "decimal number" false c (s.setPos i) else s.mkUnexpectedError "missing exponent digits in scientific literal" else s def binNumberFn (startPos : String.Pos) : ParserFn := fun c s => let s := takeDigitsFn (fun c => c == '0' || c == '1') "binary number" true c s mkNodeToken numLitKind startPos c s def octalNumberFn (startPos : String.Pos) : ParserFn := fun c s => let s := takeDigitsFn (fun c => '0' ≤ c && c ≤ '7') "octal number" true c s mkNodeToken numLitKind startPos c s def hexNumberFn (startPos : String.Pos) : ParserFn := fun c s => let s := takeDigitsFn (fun c => ('0' ≤ c && c ≤ '9') || ('a' ≤ c && c ≤ 'f') || ('A' ≤ c && c ≤ 'F')) "hexadecimal number" true c s mkNodeToken numLitKind startPos c s def numberFnAux : ParserFn := fun c s => let input := c.input let startPos := s.pos if h : input.atEnd startPos then s.mkEOIError else let curr := input.get' startPos h if curr == '0' then let i := input.next' startPos h let curr := input.get i if curr == 'b' || curr == 'B' then binNumberFn startPos c (s.next input i) else if curr == 'o' || curr == 'O' then octalNumberFn startPos c (s.next input i) else if curr == 'x' || curr == 'X' then hexNumberFn startPos c (s.next input i) else decimalNumberFn startPos c (s.setPos i) else if curr.isDigit then decimalNumberFn startPos c (s.next input startPos) else s.mkError "numeral" def isIdCont : String → ParserState → Bool := fun input s => let i := s.pos let curr := input.get i if curr == '.' then let i := input.next i if input.atEnd i then false else let curr := input.get i isIdFirst curr || isIdBeginEscape curr else false private def isToken (idStartPos idStopPos : String.Pos) (tk : Option Token) : Bool := match tk with | none => false | some tk => -- if a token is both a symbol and a valid identifier (i.e. a keyword), -- we want it to be recognized as a symbol tk.endPos ≥ idStopPos - idStartPos def mkTokenAndFixPos (startPos : String.Pos) (tk : Option Token) : ParserFn := fun c s => match tk with | none => s.mkErrorAt "token" startPos | some tk => if c.forbiddenTk? == some tk then s.mkErrorAt "forbidden token" startPos else let input := c.input let leading := mkEmptySubstringAt input startPos let stopPos := startPos + tk let s := s.setPos stopPos let s := whitespace c s let wsStopPos := s.pos let trailing := { str := input, startPos := stopPos, stopPos := wsStopPos : Substring } let atom := mkAtom (SourceInfo.original leading startPos trailing stopPos) tk s.pushSyntax atom def mkIdResult (startPos : String.Pos) (tk : Option Token) (val : Name) : ParserFn := fun c s => let stopPos := s.pos if isToken startPos stopPos tk then mkTokenAndFixPos startPos tk c s else let input := c.input let rawVal := { str := input, startPos := startPos, stopPos := stopPos : Substring } let s := whitespace c s let trailingStopPos := s.pos let leading := mkEmptySubstringAt input startPos let trailing := { str := input, startPos := stopPos, stopPos := trailingStopPos : Substring } let info := SourceInfo.original leading startPos trailing stopPos let atom := mkIdent info rawVal val s.pushSyntax atom partial def identFnAux (startPos : String.Pos) (tk : Option Token) (r : Name) : ParserFn := let rec parse (r : Name) (c s) := let input := c.input let i := s.pos if h : input.atEnd i then s.mkEOIError else let curr := input.get' i h if isIdBeginEscape curr then let startPart := input.next' i h let s := takeUntilFn isIdEndEscape c (s.setPos startPart) if h : input.atEnd s.pos then s.mkUnexpectedErrorAt "unterminated identifier escape" startPart else let stopPart := s.pos let s := s.next' c.input s.pos h let r := .str r (input.extract startPart stopPart) if isIdCont input s then let s := s.next input s.pos parse r c s else mkIdResult startPos tk r c s else if isIdFirst curr then let startPart := i let s := takeWhileFn isIdRest c (s.next input i) let stopPart := s.pos let r := .str r (input.extract startPart stopPart) if isIdCont input s then let s := s.next input s.pos parse r c s else mkIdResult startPos tk r c s else mkTokenAndFixPos startPos tk c s parse r private def isIdFirstOrBeginEscape (c : Char) : Bool := isIdFirst c || isIdBeginEscape c private def nameLitAux (startPos : String.Pos) : ParserFn := fun c s => let input := c.input let s := identFnAux startPos none .anonymous c (s.next input startPos) if s.hasError then s else let stx := s.stxStack.back match stx with | .ident info rawStr _ _ => let s := s.popSyntax s.pushSyntax (Syntax.mkNameLit rawStr.toString info) | _ => s.mkError "invalid Name literal" private def tokenFnAux : ParserFn := fun c s => let input := c.input let i := s.pos let curr := input.get i if curr == '\"' then strLitFnAux i c (s.next input i) else if curr == '\'' && getNext input i != '\'' then charLitFnAux i c (s.next input i) else if curr.isDigit then numberFnAux c s else if curr == '`' && isIdFirstOrBeginEscape (getNext input i) then nameLitAux i c s else if curr == 'r' && isRawStrLitStart input (input.next i) then rawStrLitFnAux i c (s.next input i) else let tk := c.tokens.matchPrefix input i identFnAux i tk .anonymous c s private def updateTokenCache (startPos : String.Pos) (s : ParserState) : ParserState := -- do not cache token parsing errors, which are rare and usually fatal and thus not worth an extra field in `TokenCache` match s with | ⟨stack, lhsPrec, pos, ⟨_, catCache⟩, none, errs⟩ => if stack.size == 0 then s else let tk := stack.back ⟨stack, lhsPrec, pos, ⟨{ startPos := startPos, stopPos := pos, token := tk }, catCache⟩, none, errs⟩ | other => other def tokenFn (expected : List String := []) : ParserFn := fun c s => let input := c.input let i := s.pos if input.atEnd i then s.mkEOIError expected else let tkc := s.cache.tokenCache if tkc.startPos == i then let s := s.pushSyntax tkc.token s.setPos tkc.stopPos else let s := tokenFnAux c s updateTokenCache i s def peekTokenAux (c : ParserContext) (s : ParserState) : ParserState × Except ParserState Syntax := let iniSz := s.stackSize let iniPos := s.pos let s := tokenFn [] c s if let some _ := s.errorMsg then (s.restore iniSz iniPos, .error s) else let stx := s.stxStack.back (s.restore iniSz iniPos, .ok stx) def peekToken (c : ParserContext) (s : ParserState) : ParserState × Except ParserState Syntax := let tkc := s.cache.tokenCache if tkc.startPos == s.pos then (s, .ok tkc.token) else peekTokenAux c s /-- Treat keywords as identifiers. -/ def rawIdentFn : ParserFn := fun c s => let input := c.input let i := s.pos if input.atEnd i then s.mkEOIError else identFnAux i none .anonymous c s def satisfySymbolFn (p : String → Bool) (expected : List String) : ParserFn := fun c s => Id.run do let iniPos := s.pos let s := tokenFn expected c s if s.hasError then return s if let .atom _ sym := s.stxStack.back then if p sym then return s -- this is a very hot `mkUnexpectedTokenErrors` call, so explicitly pass `iniPos` s.mkUnexpectedTokenErrors expected iniPos def symbolFnAux (sym : String) (errorMsg : String) : ParserFn := satisfySymbolFn (fun s => s == sym) [errorMsg] def symbolInfo (sym : String) : ParserInfo := { collectTokens := fun tks => sym :: tks firstTokens := FirstTokens.tokens [ sym ] } def symbolFn (sym : String) : ParserFn := symbolFnAux sym ("'" ++ sym ++ "'") def symbolNoAntiquot (sym : String) : Parser := let sym := sym.trim { info := symbolInfo sym fn := symbolFn sym } def checkTailNoWs (prev : Syntax) : Bool := match prev.getTailInfo with | .original _ _ trailing _ => trailing.stopPos == trailing.startPos | _ => false /-- Check if the following token is the symbol _or_ identifier `sym`. Useful for parsing local tokens that have not been added to the token table (but may have been so by some unrelated code). For example, the universe `max` Function is parsed using this combinator so that it can still be used as an identifier outside of universe (but registering it as a token in a Term Syntax would not break the universe Parser). -/ def nonReservedSymbolFnAux (sym : String) (errorMsg : String) : ParserFn := fun c s => Id.run do let s := tokenFn [errorMsg] c s if s.hasError then return s match s.stxStack.back with | .atom _ sym' => if sym == sym' then return s | .ident info rawVal _ _ => if sym == rawVal.toString then let s := s.popSyntax return s.pushSyntax (Syntax.atom info sym) | _ => () s.mkUnexpectedTokenError errorMsg def nonReservedSymbolFn (sym : String) : ParserFn := nonReservedSymbolFnAux sym ("'" ++ sym ++ "'") def nonReservedSymbolInfo (sym : String) (includeIdent : Bool) : ParserInfo := { firstTokens := if includeIdent then .tokens [ sym, "ident" ] else .tokens [ sym ] } def nonReservedSymbolNoAntiquot (sym : String) (includeIdent := false) : Parser := let sym := sym.trim { info := nonReservedSymbolInfo sym includeIdent, fn := nonReservedSymbolFn sym } partial def strAux (sym : String) (errorMsg : String) (j : String.Pos) :ParserFn := let rec parse (j c s) := if h₁ : sym.atEnd j then s else let i := s.pos let input := c.input if h₂ : input.atEnd i then s.mkError errorMsg else if sym.get' j h₁ != input.get' i h₂ then s.mkError errorMsg else parse (sym.next' j h₁) c (s.next' input i h₂) parse j def checkTailWs (prev : Syntax) : Bool := match prev.getTailInfo with | .original _ _ trailing _ => trailing.stopPos > trailing.startPos | _ => false def checkWsBeforeFn (errorMsg : String) : ParserFn := fun _ s => let prev := s.stxStack.back if checkTailWs prev then s else s.mkError errorMsg /-- The `ws` parser requires that there is some whitespace at this location. For example, the parser `"foo" ws "+"` parses `foo +` or `foo/- -/+` but not `foo+`. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def checkWsBefore (errorMsg : String := "space before") : Parser where info := epsilonInfo fn := checkWsBeforeFn errorMsg def checkTailLinebreak (prev : Syntax) : Bool := match prev.getTailInfo with | .original _ _ trailing _ => trailing.contains '\n' | _ => false def checkLinebreakBeforeFn (errorMsg : String) : ParserFn := fun _ s => let prev := s.stxStack.back if checkTailLinebreak prev then s else s.mkError errorMsg /-- The `linebreak` parser requires that there is at least one line break at this location. (The line break may be inside a comment.) This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def checkLinebreakBefore (errorMsg : String := "line break") : Parser where info := epsilonInfo fn := checkLinebreakBeforeFn errorMsg private def pickNonNone (stack : SyntaxStack) : Syntax := match stack.toSubarray.findRev? fun stx => !stx.isNone with | none => Syntax.missing | some stx => stx def checkNoWsBeforeFn (errorMsg : String) : ParserFn := fun _ s => let prev := pickNonNone s.stxStack if checkTailNoWs prev then s else s.mkError errorMsg /-- The `noWs` parser requires that there is *no* whitespace between the preceding and following parsers. For example, the parser `"foo" noWs "+"` parses `foo+` but not `foo +`. This is almost the same as `"foo+"`, but using this parser will make `foo+` a token, which may cause problems for the use of `"foo"` and `"+"` as separate tokens in other parsers. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def checkNoWsBefore (errorMsg : String := "no space before") : Parser := { info := epsilonInfo fn := checkNoWsBeforeFn errorMsg } def unicodeSymbolFnAux (sym asciiSym : String) (expected : List String) : ParserFn := satisfySymbolFn (fun s => s == sym || s == asciiSym) expected def unicodeSymbolInfo (sym asciiSym : String) : ParserInfo := { collectTokens := fun tks => sym :: asciiSym :: tks firstTokens := FirstTokens.tokens [ sym, asciiSym ] } def unicodeSymbolFn (sym asciiSym : String) : ParserFn := unicodeSymbolFnAux sym asciiSym ["'" ++ sym ++ "', '" ++ asciiSym ++ "'"] def unicodeSymbolNoAntiquot (sym asciiSym : String) : Parser := let sym := sym.trim let asciiSym := asciiSym.trim { info := unicodeSymbolInfo sym asciiSym fn := unicodeSymbolFn sym asciiSym } def mkAtomicInfo (k : String) : ParserInfo := { firstTokens := FirstTokens.tokens [ k ] } /-- Parses a token and asserts the result is of the given kind. `desc` is used in error messages as the token kind. -/ def expectTokenFn (k : SyntaxNodeKind) (desc : String) : ParserFn := fun c s => let s := tokenFn [desc] c s if !s.hasError && !(s.stxStack.back.isOfKind k) then s.mkUnexpectedTokenError desc else s def numLitFn : ParserFn := expectTokenFn numLitKind "numeral" def numLitNoAntiquot : Parser := { fn := numLitFn info := mkAtomicInfo "num" } def scientificLitFn : ParserFn := expectTokenFn scientificLitKind "scientific number" def scientificLitNoAntiquot : Parser := { fn := scientificLitFn info := mkAtomicInfo "scientific" } def strLitFn : ParserFn := expectTokenFn strLitKind "string literal" def strLitNoAntiquot : Parser := { fn := strLitFn info := mkAtomicInfo "str" } def charLitFn : ParserFn := expectTokenFn charLitKind "character literal" def charLitNoAntiquot : Parser := { fn := charLitFn info := mkAtomicInfo "char" } def nameLitFn : ParserFn := expectTokenFn nameLitKind "Name literal" def nameLitNoAntiquot : Parser := { fn := nameLitFn info := mkAtomicInfo "name" } def identFn : ParserFn := expectTokenFn identKind "identifier" def identNoAntiquot : Parser := { fn := identFn info := mkAtomicInfo "ident" } def rawIdentNoAntiquot : Parser := { fn := rawIdentFn } def identEqFn (id : Name) : ParserFn := fun c s => let s := tokenFn ["identifier"] c s if s.hasError then s else match s.stxStack.back with | .ident _ _ val _ => if val != id then s.mkUnexpectedTokenError s!"identifier '{id}'" else s | _ => s.mkUnexpectedTokenError "identifier" def identEq (id : Name) : Parser := { fn := identEqFn id info := mkAtomicInfo "ident" } def hygieneInfoFn : ParserFn := fun c s => Id.run do let input := c.input let finish pos str trailing s := -- Builds an actual hygieneInfo node from empty string `str` and trailing whitespace `trailing`. let info := SourceInfo.original str pos trailing pos let ident := mkIdent info str .anonymous let stx := mkNode hygieneInfoKind #[ident] s.pushSyntax stx -- If we are at the whitespace after a token, the last item on the stack -- will have trailing whitespace. We want to position this hygieneInfo -- item immediately after the last token, and reattribute the trailing whitespace -- to the hygieneInfo node itself. This allows combinators like `ws` to -- be unaffected by `hygieneInfo` parsers before or after, see `2262.lean`. if !s.stxStack.isEmpty then let prev := s.stxStack.back if let .original leading pos trailing endPos := prev.getTailInfo then let str := mkEmptySubstringAt input endPos -- steal the trailing whitespace from the last node and use it for this node let s := s.popSyntax.pushSyntax <| prev.setTailInfo (.original leading pos str endPos) return finish endPos str trailing s -- The stack can be empty if this is either the first token, or if we are in a fresh cache. -- In that case we just put the hygieneInfo at the current location. let str := mkEmptySubstringAt input s.pos finish s.pos str str s def hygieneInfoNoAntiquot : Parser := { fn := hygieneInfoFn info := nodeInfo hygieneInfoKind epsilonInfo } namespace ParserState def keepTop (s : SyntaxStack) (startStackSize : Nat) : SyntaxStack := let node := s.back s.shrink startStackSize |>.push node def keepNewError (s : ParserState) (oldStackSize : Nat) : ParserState := match s with | ⟨stack, lhsPrec, pos, cache, err, errs⟩ => ⟨keepTop stack oldStackSize, lhsPrec, pos, cache, err, errs⟩ def keepPrevError (s : ParserState) (oldStackSize : Nat) (oldStopPos : String.Pos) (oldError : Option Error) (oldLhsPrec : Nat) : ParserState := match s with | ⟨stack, _, _, cache, _, errs⟩ => ⟨stack.shrink oldStackSize, oldLhsPrec, oldStopPos, cache, oldError, errs⟩ def mergeErrors (s : ParserState) (oldStackSize : Nat) (oldError : Error) : ParserState := match s with | ⟨stack, lhsPrec, pos, cache, some err, errs⟩ => let newError := if oldError == err then err else oldError.merge err ⟨stack.shrink oldStackSize, lhsPrec, pos, cache, some newError, errs⟩ | other => other def keepLatest (s : ParserState) (startStackSize : Nat) : ParserState := match s with | ⟨stack, lhsPrec, pos, cache, _, errs⟩ => ⟨keepTop stack startStackSize, lhsPrec, pos, cache, none, errs⟩ def replaceLongest (s : ParserState) (startStackSize : Nat) : ParserState := s.keepLatest startStackSize end ParserState def invalidLongestMatchParser (s : ParserState) : ParserState := s.mkError "longestMatch parsers must generate exactly one Syntax node" /-- Auxiliary function used to execute parsers provided to `longestMatchFn`. Push `left?` into the stack if it is not `none`, and execute `p`. Remark: `p` must produce exactly one syntax node. Remark: the `left?` is not none when we are processing trailing parsers. -/ def runLongestMatchParser (left? : Option Syntax) (startLhsPrec : Nat) (p : ParserFn) : ParserFn := fun c s => Id.run do /- We assume any registered parser `p` has one of two forms: * a direct call to `leadingParser` or `trailingParser` * a direct call to a (leading) token parser In the first case, we can extract the precedence of the parser by having `leadingParser/trailingParser` set `ParserState.lhsPrec` to it in the very end so that no nested parser can interfere. In the second case, the precedence is effectively `max` (there is a `checkPrec` merely for the convenience of the pretty printer) and there are no nested `leadingParser/trailingParser` calls, so the value of `lhsPrec` will not be changed by the parser (nor will it be read by any leading parser). Thus we initialize the field to `maxPrec` in the leading case. -/ let mut s := { s with lhsPrec := if left?.isSome then startLhsPrec else maxPrec } let startSize := s.stackSize if let some left := left? then s := s.pushSyntax left s := p c s -- stack contains `[..., result ]` if s.stackSize == startSize + 1 then s -- success or error with the expected number of nodes else if s.hasError then -- error with an unexpected number of nodes. s.shrinkStack startSize |>.pushSyntax Syntax.missing else -- parser succeeded with incorrect number of nodes invalidLongestMatchParser s def longestMatchStep (left? : Option Syntax) (startSize startLhsPrec : Nat) (startPos : String.Pos) (prevPrio : Nat) (prio : Nat) (p : ParserFn) : ParserContext → ParserState → ParserState × Nat := fun c s => let score (s : ParserState) (prio : Nat) := (s.pos.byteIdx, if s.errorMsg.isSome then (0 : Nat) else 1, prio) let previousScore := score s prevPrio let prevErrorMsg := s.errorMsg let prevStopPos := s.pos let prevSize := s.stackSize let prevLhsPrec := s.lhsPrec let s := s.restore prevSize startPos let s := runLongestMatchParser left? startLhsPrec p c s match (let _ := @lexOrd; compare previousScore (score s prio)) with | .lt => (s.keepNewError startSize, prio) | .gt => (s.keepPrevError prevSize prevStopPos prevErrorMsg prevLhsPrec, prevPrio) | .eq => match prevErrorMsg with | none => -- it is not clear what the precedence of a choice node should be, so we conservatively take the minimum ({s with lhsPrec := s.lhsPrec.min prevLhsPrec }, prio) | some oldError => (s.mergeErrors prevSize oldError, prio) def longestMatchMkResult (startSize : Nat) (s : ParserState) : ParserState := if s.stackSize > startSize + 1 then s.mkNode choiceKind startSize else s def longestMatchFnAux (left? : Option Syntax) (startSize startLhsPrec : Nat) (startPos : String.Pos) (prevPrio : Nat) (ps : List (Parser × Nat)) : ParserFn := let rec parse (prevPrio : Nat) (ps : List (Parser × Nat)) := match ps with | [] => fun _ s => longestMatchMkResult startSize s | p::ps => fun c s => let (s, prevPrio) := longestMatchStep left? startSize startLhsPrec startPos prevPrio p.2 p.1.fn c s parse prevPrio ps c s parse prevPrio ps def longestMatchFn (left? : Option Syntax) : List (Parser × Nat) → ParserFn | [] => fun _ s => s.mkError "longestMatch: empty list" | [p] => fun c s => runLongestMatchParser left? s.lhsPrec p.1.fn c s | p::ps => fun c s => let startSize := s.stackSize let startLhsPrec := s.lhsPrec let startPos := s.pos let s := runLongestMatchParser left? s.lhsPrec p.1.fn c s longestMatchFnAux left? startSize startLhsPrec startPos p.2 ps c s def anyOfFn : List Parser → ParserFn | [], _, s => s.mkError "anyOf: empty list" | [p], c, s => p.fn c s | p::ps, c, s => orelseFn p.fn (anyOfFn ps) c s def checkColEqFn (errorMsg : String) : ParserFn := fun c s => match c.savedPos? with | none => s | some savedPos => let savedPos := c.fileMap.toPosition savedPos let pos := c.fileMap.toPosition s.pos if pos.column = savedPos.column then s else s.mkError errorMsg /-- The `colEq` parser ensures that the next token starts at exactly the column of the saved position (see `withPosition`). This can be used to do whitespace sensitive syntax like a `by` block or `do` block, where all the lines have to line up. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def checkColEq (errorMsg : String := "checkColEq") : Parser where fn := checkColEqFn errorMsg def checkColGeFn (errorMsg : String) : ParserFn := fun c s => match c.savedPos? with | none => s | some savedPos => let savedPos := c.fileMap.toPosition savedPos let pos := c.fileMap.toPosition s.pos if pos.column ≥ savedPos.column then s else s.mkError errorMsg /-- The `colGe` parser requires that the next token starts from at least the column of the saved position (see `withPosition`), but allows it to be more indented. This can be used for whitespace sensitive syntax to ensure that a block does not go outside a certain indentation scope. For example it is used in the lean grammar for `else if`, to ensure that the `else` is not less indented than the `if` it matches with. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def checkColGe (errorMsg : String := "checkColGe") : Parser where fn := checkColGeFn errorMsg info := epsilonInfo def checkColGtFn (errorMsg : String) : ParserFn := fun c s => match c.savedPos? with | none => s | some savedPos => let savedPos := c.fileMap.toPosition savedPos let pos := c.fileMap.toPosition s.pos if pos.column > savedPos.column then s else s.mkError errorMsg /-- The `colGt` parser requires that the next token starts a strictly greater column than the saved position (see `withPosition`). This can be used for whitespace sensitive syntax for the arguments to a tactic, to ensure that the following tactic is not interpreted as an argument. ``` example (x : False) : False := by revert x exact id ``` Here, the `revert` tactic is followed by a list of `colGt ident`, because otherwise it would interpret `exact` as an identifier and try to revert a variable named `exact`. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def checkColGt (errorMsg : String := "checkColGt") : Parser where fn := checkColGtFn errorMsg def checkLineEqFn (errorMsg : String) : ParserFn := fun c s => match c.savedPos? with | none => s | some savedPos => let savedPos := c.fileMap.toPosition savedPos let pos := c.fileMap.toPosition s.pos if pos.line == savedPos.line then s else s.mkError errorMsg /-- The `lineEq` parser requires that the current token is on the same line as the saved position (see `withPosition`). This can be used to ensure that composite tokens are not "broken up" across different lines. For example, `else if` is parsed using `lineEq` to ensure that the two tokens are on the same line. This parser has arity 0 - it does not capture anything. -/ @[builtin_doc] def checkLineEq (errorMsg : String := "checkLineEq") : Parser where fn := checkLineEqFn errorMsg /-- `withPosition(p)` runs `p` while setting the "saved position" to the current position. This has no effect on its own, but various other parsers access this position to achieve some composite effect: * `colGt`, `colGe`, `colEq` compare the column of the saved position to the current position, used to implement Python-style indentation sensitive blocks * `lineEq` ensures that the current position is still on the same line as the saved position, used to implement composite tokens The saved position is only available in the read-only state, which is why this is a scoping parser: after the `withPosition(..)` block the saved position will be restored to its original value. This parser has the same arity as `p` - it just forwards the results of `p`. -/ @[builtin_doc] def withPosition : Parser → Parser := withFn fun f c s => adaptCacheableContextFn ({ · with savedPos? := s.pos }) f c s def withPositionAfterLinebreak : Parser → Parser := withFn fun f c s => let prev := s.stxStack.back adaptCacheableContextFn (fun c => if checkTailLinebreak prev then { c with savedPos? := s.pos } else c) f c s /-- `withoutPosition(p)` runs `p` without the saved position, meaning that position-checking parsers like `colGt` will have no effect. This is usually used by bracketing constructs like `(...)` so that the user can locally override whitespace sensitivity. This parser has the same arity as `p` - it just forwards the results of `p`. -/ @[builtin_doc] def withoutPosition (p : Parser) : Parser := adaptCacheableContext ({ · with savedPos? := none }) p /-- `withForbidden tk p` runs `p` with `tk` as a "forbidden token". This means that if the token appears anywhere in `p` (unless it is nested in `withoutForbidden`), parsing will immediately stop there, making `tk` effectively a lowest-precedence operator. This is used for parsers like `for x in arr do ...`: `arr` is parsed as `withForbidden "do" term` because otherwise `arr do ...` would be treated as an application. This parser has the same arity as `p` - it just forwards the results of `p`. -/ @[builtin_doc] def withForbidden (tk : Token) (p : Parser) : Parser := adaptCacheableContext ({ · with forbiddenTk? := tk }) p /-- `withoutForbidden(p)` runs `p` disabling the "forbidden token" (see `withForbidden`), if any. This is usually used by bracketing constructs like `(...)` because there is no parsing ambiguity inside these nested constructs. This parser has the same arity as `p` - it just forwards the results of `p`. -/ @[builtin_doc] def withoutForbidden (p : Parser) : Parser := adaptCacheableContext ({ · with forbiddenTk? := none }) p def eoiFn : ParserFn := fun c s => let i := s.pos if c.input.atEnd i then s else s.mkError "expected end of file" def eoi : Parser := { fn := eoiFn } /-- A multimap indexed by tokens. Used for indexing parsers by their leading token. -/ def TokenMap (α : Type) := Std.TreeMap Name (List α) Name.quickCmp namespace TokenMap def insert (map : TokenMap α) (k : Name) (v : α) : TokenMap α := match map.get? k with | none => Std.TreeMap.insert map k [v] | some vs => Std.TreeMap.insert map k (v::vs) instance : Inhabited (TokenMap α) where default := Std.TreeMap.empty instance : EmptyCollection (TokenMap α) := ⟨Std.TreeMap.empty⟩ instance : ForIn m (TokenMap α) (Name × List α) := inferInstanceAs (ForIn _ (Std.TreeMap _ _ _) _) end TokenMap structure PrattParsingTables where leadingTable : TokenMap (Parser × Nat) := {} leadingParsers : List (Parser × Nat) := [] -- for supporting parsers we cannot obtain first token trailingTable : TokenMap (Parser × Nat) := {} trailingParsers : List (Parser × Nat) := [] -- for supporting parsers such as function application instance : Inhabited PrattParsingTables where default := {} /-- Specifies how the parsing table lookup function behaves for identifiers. The function `Lean.Parser.prattParser` uses two tables: one each for leading and trailing parsers. These tables map tokens to parsers. Because keyword tokens are distinct from identifier tokens, keywords and identifiers cannot be confused, even when they are syntactically identical. Specifying an alternative leading identifier behavior allows greater flexibility and makes it possible to avoid reserved keywords in some situations. When the leading token is syntactically an identifier, the current syntax category's `LeadingIdentBehavior` specifies how the parsing table lookup function behaves, and allows controlled “punning” between identifiers and keywords. This feature is used to avoid creating a reserved symbol for each built-in tactic (e.g., `apply` or `assumption`). As a result, tactic names can be used as identifiers. -/ inductive LeadingIdentBehavior where /-- If the leading token is an identifier, then the parser just executes the parsers associated with the auxiliary token “ident”, which parses identifiers. -/ | default /-- If the leading token is an identifier ``, and there are parsers `P` associated with the token ``, then the parser executes `P`. Otherwise, it executes only the parsers associated with the auxiliary token “ident”, which parses identifiers. -/ | symbol /-- If the leading token is an identifier ``, then it executes the parsers associated with token `` and parsers associated with the auxiliary token “ident”, which parses identifiers. -/ | both deriving Inhabited, BEq, Repr /-- Each parser category is implemented using a Pratt's parser. The system comes equipped with the following categories: `level`, `term`, `tactic`, and `command`. Users and plugins may define extra categories. The method ``` categoryParser `term prec ``` executes the Pratt's parser for category `term` with precedence `prec`. That is, only parsers with precedence at least `prec` are considered. The method `termParser prec` is equivalent to the method above. -/ structure ParserCategory where /-- The name of a declaration which will be used as the target of go-to-definition queries and from which doc strings will be extracted. This is a dummy declaration of type `Lean.Parser.Category` created by `declare_syntax_cat`, but for builtin categories the declaration is made manually and passed to `registerBuiltinParserAttribute`. -/ declName : Name /-- The list of syntax nodes that can parse into this category. This can be used to list all syntaxes in the category. -/ kinds : SyntaxNodeKindSet := {} /-- The parsing tables, which consist of a dynamic set of parser functions based on the syntaxes that have been declared so far. -/ tables : PrattParsingTables := {} /-- The `LeadingIdentBehavior`, which specifies how the parsing table lookup function behaves for the first identifier to be parsed. This is used by the `tactic` parser to avoid creating a reserved symbol for each builtin tactic (e.g., `apply`, `assumption`, etc.). -/ behavior : LeadingIdentBehavior deriving Inhabited abbrev ParserCategories := PersistentHashMap Name ParserCategory def indexed {α : Type} (map : TokenMap α) (c : ParserContext) (s : ParserState) (behavior : LeadingIdentBehavior) : ParserState × List α := let (s, stx) := peekToken c s let find (n : Name) : ParserState × List α := match map.get? n with | some as => (s, as) | _ => (s, []) match stx with | .ok (.atom _ sym) => find (.mkSimple sym) | .ok (.ident _ _ val _) => match behavior with | .default => find identKind | .symbol => match map.get? val with | some as => (s, as) | none => find identKind | .both => match map.get? val with | some as => if val == identKind then (s, as) -- avoid running the same parsers twice else match map.get? identKind with | some as' => (s, as ++ as') | _ => (s, as) | none => find identKind | .ok (.node _ k _) => find k | .ok _ => (s, []) | .error s' => (s', []) abbrev CategoryParserFn := Name → ParserFn builtin_initialize categoryParserFnRef : IO.Ref CategoryParserFn ← IO.mkRef fun (_ : Name) => whitespace builtin_initialize categoryParserFnExtension : EnvExtension CategoryParserFn ← registerEnvExtension $ categoryParserFnRef.get def categoryParserFn (catName : Name) : ParserFn := fun ctx s => let fn := categoryParserFnExtension.getState ctx.env fn catName ctx s def categoryParser (catName : Name) (prec : Nat) : Parser where fn := adaptCacheableContextFn ({ · with prec }) (withCacheFn catName (categoryParserFn catName)) -- Define `termParser` here because we need it for antiquotations def termParser (prec : Nat := 0) : Parser := categoryParser `term prec -- ================== /-! # Antiquotations -/ -- ================== /-- Fail if previous token is immediately followed by ':'. -/ @[builtin_doc] def checkNoImmediateColon : Parser := { fn := fun c s => let prev := s.stxStack.back if checkTailNoWs prev then let input := c.input let i := s.pos if h : input.atEnd i then s else let curr := input.get' i h if curr == ':' then s.mkUnexpectedError "unexpected ':'" else s else s } def setExpectedFn (expected : List String) (p : ParserFn) : ParserFn := fun c s => match p c s with | s'@{ errorMsg := some msg, .. } => { s' with errorMsg := some { msg with expected } } | s' => s' def setExpected (expected : List String) : Parser → Parser := withFn (setExpectedFn expected) def pushNone : Parser := { fn := fun _ s => s.pushSyntax mkNullNode } -- We support three kinds of antiquotations: `$id`, `$_`, and `$(t)`, where `id` is a term identifier and `t` is a term. def antiquotNestedExpr : Parser := node `antiquotNestedExpr (symbolNoAntiquot "(" >> decQuotDepth termParser >> symbolNoAntiquot ")") def antiquotExpr : Parser := identNoAntiquot <|> symbolNoAntiquot "_" <|> antiquotNestedExpr def tokenAntiquotFn : ParserFn := fun c s => Id.run do if s.hasError then return s let iniSz := s.stackSize let iniPos := s.pos let s := (checkNoWsBefore >> symbolNoAntiquot "%" >> symbolNoAntiquot "$" >> checkNoWsBefore >> antiquotExpr).fn c s if s.hasError then return s.restore iniSz iniPos s.mkNode (`token_antiquot) (iniSz - 1) def tokenWithAntiquot : Parser → Parser := withFn fun f c s => let s := f c s -- fast check that is false in most cases if c.input.get s.pos == '%' then tokenAntiquotFn c s else s def symbol (sym : String) : Parser := tokenWithAntiquot (symbolNoAntiquot sym) instance : Coe String Parser where coe := symbol def nonReservedSymbol (sym : String) (includeIdent := false) : Parser := tokenWithAntiquot (nonReservedSymbolNoAntiquot sym includeIdent) def unicodeSymbol (sym asciiSym : String) : Parser := tokenWithAntiquot (unicodeSymbolNoAntiquot sym asciiSym) /-- Define parser for `$e` (if `anonymous == true`) and `$e:name`. `kind` is embedded in the antiquotation's kind, and checked at syntax `match` unless `isPseudoKind` is true. Antiquotations can be escaped as in `$$e`, which produces the syntax tree for `$e`. -/ @[builtin_doc] def mkAntiquot (name : String) (kind : SyntaxNodeKind) (anonymous := true) (isPseudoKind := false) : Parser := let kind := kind ++ (if isPseudoKind then `pseudo else .anonymous) ++ `antiquot let nameP := node `antiquotName <| checkNoWsBefore ("no space before ':" ++ name ++ "'") >> symbol ":" >> nonReservedSymbol name -- if parsing the kind fails and `anonymous` is true, check that we're not ignoring a different -- antiquotation kind via `noImmediateColon` let nameP := if anonymous then nameP <|> checkNoImmediateColon >> pushNone else nameP -- antiquotations are not part of the "standard" syntax, so hide "expected '$'" on error leadingNode kind maxPrec <| atomic <| setExpected [] "$" >> manyNoAntiquot (checkNoWsBefore "" >> "$") >> checkNoWsBefore "no space before spliced term" >> antiquotExpr >> nameP def withAntiquotFn (antiquotP p : ParserFn) (isCatAntiquot := false) : ParserFn := fun c s => -- fast check that is false in most cases if c.input.get s.pos == '$' then -- Do not allow antiquotation choice nodes here as `antiquotP` is the strictly more general -- antiquotation than any in `p`. -- If it is a category antiquotation, do not backtrack into the category at all as that would -- run *all* parsers of the category, and trailing parsers will later be applied anyway. orelseFnCore (antiquotBehavior := if isCatAntiquot then .acceptLhs else .takeLongest) antiquotP p c s else p c s /-- Optimized version of `mkAntiquot ... <|> p`. -/ @[builtin_doc] def withAntiquot (antiquotP p : Parser) : Parser := { fn := withAntiquotFn antiquotP.fn p.fn info := orelseInfo antiquotP.info p.info } def withoutInfo (p : Parser) : Parser := { fn := p.fn } /-- Parse `$[p]suffix`, e.g. `$[p],*`. -/ @[builtin_doc] def mkAntiquotSplice (kind : SyntaxNodeKind) (p suffix : Parser) : Parser := let kind := kind ++ `antiquot_scope leadingNode kind maxPrec <| atomic <| setExpected [] "$" >> manyNoAntiquot (checkNoWsBefore "" >> "$") >> checkNoWsBefore "no space before spliced term" >> symbol "[" >> node nullKind p >> symbol "]" >> suffix private def withAntiquotSuffixSpliceFn (kind : SyntaxNodeKind) (suffix : ParserFn) : ParserFn := fun c s => Id.run do let iniSz := s.stackSize let iniPos := s.pos let s := suffix c s if s.hasError then return s.restore iniSz iniPos s.mkNode (kind ++ `antiquot_suffix_splice) (s.stxStack.size - 2) /-- Parse `suffix` after an antiquotation, e.g. `$x,*`, and put both into a new node. -/ @[builtin_doc] def withAntiquotSuffixSplice (kind : SyntaxNodeKind) (p suffix : Parser) : Parser where info := andthenInfo p.info suffix.info fn c s := let s := p.fn c s -- fast check that is false in most cases if !s.hasError && s.stxStack.back.isAntiquots then withAntiquotSuffixSpliceFn kind suffix.fn c s else s def withAntiquotSpliceAndSuffix (kind : SyntaxNodeKind) (p suffix : Parser) := -- prevent `p`'s info from being collected twice withAntiquot (mkAntiquotSplice kind (withoutInfo p) suffix) (withAntiquotSuffixSplice kind p suffix) def nodeWithAntiquot (name : String) (kind : SyntaxNodeKind) (p : Parser) (anonymous := false) : Parser := withAntiquot (mkAntiquot name kind anonymous) $ node kind p -- ========================= /-! # End of Antiquotations -/ -- ========================= def sepByElemParser (p : Parser) (sep : String) : Parser := withAntiquotSpliceAndSuffix `sepBy p (symbol (sep.trim ++ "*")) def sepBy (p : Parser) (sep : String) (psep : Parser := symbol sep) (allowTrailingSep : Bool := false) : Parser := sepByNoAntiquot (sepByElemParser p sep) psep allowTrailingSep def sepBy1 (p : Parser) (sep : String) (psep : Parser := symbol sep) (allowTrailingSep : Bool := false) : Parser := sepBy1NoAntiquot (sepByElemParser p sep) psep allowTrailingSep private def mkResult (s : ParserState) (iniSz : Nat) : ParserState := if s.stackSize == iniSz + 1 then s else s.mkNode nullKind iniSz -- throw error instead? def leadingParserAux (kind : Name) (tables : PrattParsingTables) (behavior : LeadingIdentBehavior) : ParserFn := fun c s => Id.run do let iniSz := s.stackSize let (s, ps) := indexed tables.leadingTable c s behavior if s.hasError then return s let ps := tables.leadingParsers ++ ps if ps.isEmpty then -- if there are no applicable parsers, consume the leading token and flag it as unexpected at this position let s := tokenFn [toString kind] c s if s.hasError then return s return s.mkUnexpectedTokenError (toString kind) let s := longestMatchFn none ps c s mkResult s iniSz def leadingParser (kind : Name) (tables : PrattParsingTables) (behavior : LeadingIdentBehavior) (antiquotParser : ParserFn) : ParserFn := withAntiquotFn (isCatAntiquot := true) antiquotParser (leadingParserAux kind tables behavior) def trailingLoopStep (tables : PrattParsingTables) (left : Syntax) (ps : List (Parser × Nat)) : ParserFn := fun c s => longestMatchFn left (ps ++ tables.trailingParsers) c s partial def trailingLoop (tables : PrattParsingTables) (c : ParserContext) (s : ParserState) : ParserState := Id.run do let iniSz := s.stackSize let iniPos := s.pos let (s, ps) := indexed tables.trailingTable c s LeadingIdentBehavior.default if s.hasError then -- Discard token parse errors and break the trailing loop instead. -- The error will be flagged when the next leading position is parsed, unless the token -- is in fact valid there (e.g. EOI at command level, no-longer forbidden token) return s.restore iniSz iniPos if ps.isEmpty && tables.trailingParsers.isEmpty then return s -- no available trailing parser let left := s.stxStack.back let s := s.popSyntax let s := trailingLoopStep tables left ps c s if s.hasError then -- Discard non-consuming parse errors and break the trailing loop instead, restoring `left`. -- This is necessary for fallback parsers like `app` that pretend to be always applicable. return if s.pos == iniPos then s.restore (iniSz - 1) iniPos |>.pushSyntax left else s trailingLoop tables c s /-- Implements a variant of Pratt's algorithm. In Pratt's algorithms tokens have a right and left binding power. In our implementation, parsers have precedence instead. This method selects a parser (or more, via `longestMatchFn`) from `leadingTable` based on the current token. Note that the unindexed `leadingParsers` parsers are also tried. We have the unidexed `leadingParsers` because some parsers do not have a "first token". Example: ``` syntax term:51 "≤" ident "<" term "|" term : index ``` Example, in principle, the set of first tokens for this parser is any token that can start a term, but this set is always changing. Thus, this parsing rule is stored as an unindexed leading parser at `leadingParsers`. After processing the leading parser, we chain with parsers from `trailingTable`/`trailingParsers` that have precedence at least `c.prec` where `c` is the `ParsingContext`. Recall that `c.prec` is set by `categoryParser`. Note that in the original Pratt's algorithm, precedences are only checked before calling trailing parsers. In our implementation, leading *and* trailing parsers check the precedence. We claim our algorithm is more flexible, modular and easier to understand. `antiquotParser` should be a `mkAntiquot` parser (or always fail) and is tried before all other parsers. It should not be added to the regular leading parsers because it would heavily overlap with antiquotation parsers nested inside them. -/ def prattParser (kind : Name) (tables : PrattParsingTables) (behavior : LeadingIdentBehavior) (antiquotParser : ParserFn) : ParserFn := fun c s => let s := leadingParser kind tables behavior antiquotParser c s if s.hasError then s else trailingLoop tables c s def fieldIdxFn : ParserFn := fun c s => let initStackSz := s.stackSize let iniPos := s.pos let curr := c.input.get iniPos if curr.isDigit && curr != '0' then let s := takeWhileFn (fun c => c.isDigit) c s mkNodeToken fieldIdxKind iniPos c s else s.mkErrorAt "field index" iniPos initStackSz def fieldIdx : Parser := withAntiquot (mkAntiquot "fieldIdx" `fieldIdx) { fn := fieldIdxFn info := mkAtomicInfo "fieldIdx" } def skip : Parser := { fn := fun _ s => s info := epsilonInfo } end Parser namespace Syntax section variable [Monad m] def foldArgsM (s : Syntax) (f : Syntax → β → m β) (b : β) : m β := s.getArgs.foldlM (flip f) b def foldArgs (s : Syntax) (f : Syntax → β → β) (b : β) : β := Id.run (s.foldArgsM (pure <| f · ·) b) def forArgsM (s : Syntax) (f : Syntax → m Unit) : m Unit := s.foldArgsM (fun s _ => f s) () end end Syntax end Lean