Continuation of #3958. To ensure that lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the lean
model), we have to make sure that no lean objects are created with
invalid UTF-8. #3958 covers the case of lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.
To address this and minimize accidental missed validation, the
`(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or utf8 char count on hand. I reviewed
every function which leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.
This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions, they are just
not causing unsound behavior anymore. A subsequent PR will handle adding
better error reporting for bad UTF-8.
Before, app unexpanders would only be applied to entire applications.
However, some notations produce functions, and these functions can be
given additional arguments. The solution so far has been to write app
unexpanders so that they can take an arbitrary number of additional
arguments. However, as reported in [this Zulip
thread](https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/pretty.20printer.20bug/near/420662236),
this leads to misleading hover information in the Infoview. For example,
while `HAdd.hAdd f g 1` pretty prints as `(f + g) 1`, hovering over `f +
g` shows `f`. There is no way to fix the situation from within an app
unexpander; the expression position for `HAdd.hAdd f g` is absent, and
app unexpanders cannot register TermInfo.
This commit changes the app delaborator to try running app unexpanders
on every prefix of an application, from longest to shortest prefix. For
efficiency, it is careful to only try this when app delaborators do in
fact exist for the head constant, and it also ensures arguments are only
delaborated once. Then, in `(f + g) 1`, the `f + g` gets TermInfo
registered for that subexpression, making it properly hoverable.
The app delaborator is also refactored, and there are some bug fixes:
- app unexpanders only run when `pp.explicit` is false
- trailing parameters in under-applied applications are now only
considered up to reducible & instance transparency, which lets, for
example, optional arguments for `IO`-valued functions to be omitted.
(`IO` is a reader monad, so it's hiding a pi type)
- app unexpanders will no longer run for delaborators that use
`withOverApp`
- auto parameters now always pretty print, since we are not verifying
that the provided argument equals the result of evaluating the tactic
Furthermore, the `notation` command has been modified to generate an app
unexpander that relies on the app delaborator's new behavior.
The change to app unexpanders is reverse-compatible, but it's
recommended to update `@[app_unexpander]`s in downstream projects so
that they no longer handle overapplication themselves.
@Kha we do that in Lean 3. It helps when the error is due to incorrect universe levels.
BTW, I had to update `tests/lean/server/content_diag.json` since the
error message is different, but a few other stuff changed too.
Could you please take a look whether the test is still correct?