lean4-htt/src/Lean/Compiler/IR
Mario Carneiro 0a1a855ba8
fix: validate UTF-8 at C++ -> Lean boundary (#3963)
Continuation of #3958. To ensure that Lean code is able to uphold the
invariant that `String`s are valid UTF-8 (which is assumed by the Lean
model), we have to make sure that no Lean objects are created with
invalid UTF-8. #3958 covers the case of Lean code creating strings via
`fromUTF8Unchecked`, but there are still many cases where C++ code
constructs strings from a `const char *` or `std::string` with unclear
UTF-8 status.

To address this and minimize the risk of accidentally missing validation,
the `(lean_)mk_string` function is modified to validate UTF-8. The original
function is renamed to `mk_string_unchecked`, with several other
variants depending on whether we know the string is UTF-8 or ASCII and
whether we have the length and/or UTF-8 character count on hand. I reviewed
every function that leads to `mk_string` or its variants in the C code,
and used the appropriate validation function, defaulting to `mk_string`
if the provenance is unclear.
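
For illustration only, the kind of check a validating constructor has to perform at the boundary can be sketched as below. This is a minimal standalone sketch, not the code from this PR or from the Lean runtime: the name `utf8_char_count_sketch` is hypothetical, and what a caller does when validation fails is deliberately left open. It returns the code point count on success, since that count is one of the pieces of information the variants above may or may not have on hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper (not a Lean runtime function).
// Returns the number of code points if the n bytes at s are well-formed
// UTF-8, and std::nullopt otherwise.
std::optional<std::size_t> utf8_char_count_sketch(const char* s, std::size_t n) {
    // Smallest code point that legitimately needs a 2-, 3-, or 4-byte encoding.
    static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0, count = 0;
    while (i < n) {
        const std::uint8_t b0 = static_cast<std::uint8_t>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return std::nullopt;              // stray continuation or invalid lead byte
        if (n - i < len) return std::nullopt;  // sequence truncated by end of input
        for (std::size_t k = 1; k < len; ++k) {
            const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return std::nullopt;    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;   // surrogate code point
        if (cp > 0x10FFFF) return std::nullopt;                  // beyond Unicode range
        i += len;
        ++count;
    }
    return count;
}
```

A caller holding a `const char *` of unknown provenance would run a check of this kind before constructing a string object, and would skip it only when the bytes are already known to be valid UTF-8 or ASCII.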

This PR adds no new error handling paths, meaning that incorrect UTF-8
will still produce incorrect results in e.g. IO functions; it just
no longer causes unsound behavior. A subsequent PR will add
better error reporting for bad UTF-8.
2024-06-19 14:05:48 +00:00
Basic.lean fix: accidental ownership with specialization 2024-06-07 13:59:22 +02:00
Borrow.lean fix: accidental ownership with specialization 2024-06-07 13:59:22 +02:00
Boxing.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
Checker.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
CompilerM.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
CtorLayout.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
ElimDeadBranches.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
ElimDeadVars.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
EmitC.lean fix: validate UTF-8 at C++ -> Lean boundary (#3963) 2024-06-19 14:05:48 +00:00
EmitLLVM.lean fix: validate UTF-8 at C++ -> Lean boundary (#3963) 2024-06-19 14:05:48 +00:00
EmitUtil.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
ExpandResetReuse.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
Format.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
FreeVars.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
LiveVars.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
LLVMBindings.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
NormIds.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
PushProj.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
RC.lean fix: double reset bug at ResetReuse (#4028) 2024-04-29 23:26:07 +00:00
ResetReuse.lean feat: relaxed reset/reuse in the code generator (#4100) 2024-05-07 22:08:32 +00:00
SimpCase.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
Sorry.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00
UnboxResult.lean perf: add prelude to all Lean modules 2024-02-18 14:55:17 -08:00