lean4-htt/tests/lean/301.lean.expected.out at dfd3d185300df044fa087d3fa166cb6fb3bd8269 - max/lean4-htt


Mario Carneiro 0a1a855ba8

fix: validate UTF-8 at C++ -> Lean boundary (#3963)

Continuation of #3958. To ensure that Lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the Lean
model), we have to make sure that no Lean objects are created with
invalid UTF-8. #3958 covers the case of Lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.

To address this and minimize accidental missed validation, the
`(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or UTF-8 char count on hand. I reviewed
every function which leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.
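
The checked/unchecked split described above can be illustrated with a minimal, self-contained sketch. It is not the Lean runtime code: `SketchString`, `sketch_mk_string`, and `sketch_mk_string_unchecked` are hypothetical names, and replacing each invalid byte with U+FFFD is an assumption made here for illustration, not something this commit message states.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a runtime string object (not the Lean type).
struct SketchString {
    std::string bytes;      // invariant: always valid UTF-8
    std::size_t num_chars;  // number of Unicode scalar values
};

// Length in bytes (1-4) of the well-formed UTF-8 sequence starting at s[i],
// or 0 if the bytes at that position are not well-formed.
static std::size_t utf8_seq_len(const std::string& s, std::size_t i) {
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    if (b0 < 0x80) return 1;
    std::size_t len;
    unsigned cp;
    if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return 0;  // stray continuation byte or invalid lead byte
    if (i + len > s.size()) return 0;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    static const unsigned min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_cp[len]) return 0;              // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // surrogate code point
    if (cp > 0x10FFFF) return 0;                 // beyond Unicode range
    return len;
}

bool is_valid_utf8(const std::string& s) {
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = utf8_seq_len(s, i);
        if (len == 0) return false;
        i += len;
    }
    return true;
}

// "Unchecked" variant: the caller vouches for validity, so only debug
// builds pay for a check.
SketchString sketch_mk_string_unchecked(std::string s) {
    assert(is_valid_utf8(s));
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); i += utf8_seq_len(s, i)) ++n;
    return SketchString{std::move(s), n};
}

// Default variant for input of unclear provenance: always validates.
// ASSUMPTION: each invalid byte becomes U+FFFD (EF BF BD).
SketchString sketch_mk_string(const std::string& s) {
    SketchString out{std::string(), 0};
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = utf8_seq_len(s, i);
        if (len == 0) { out.bytes += "\xEF\xBF\xBD"; i += 1; }
        else { out.bytes.append(s, i, len); i += len; }
        ++out.num_chars;
    }
    return out;
}
```

In this sketch the validating constructor is the default and the unchecked one has to be chosen deliberately, which mirrors the stated goal of minimizing accidentally missed validation.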

This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions; it just no
longer causes unsound behavior. A subsequent PR will handle adding
better error reporting for bad UTF-8.

2024-06-19 14:05:48 +00:00


	`301.lean:1:5: error: expected token`
	`301.lean:1:21: error: expected token`