From e42892cfb6ce409503caa01665d77e6744edb06d Mon Sep 17 00:00:00 2001 From: Tom Levy Date: Tue, 23 Sep 2025 22:19:20 +1200 Subject: [PATCH] doc: fix comment about String.fromUTF8 replacing invalid chars (#10240) Hi, the doc of `String.fromUTF8` previously said invalid characters are replaced with 'A'. But the parameter `h : validateUTF8 a` guarantees there are no invalid characters, so that explanation doesn't make sense to me. This PR deletes that explanation (and fixes some unrelated typos). I also have a patch that uses `h` to prove each of the characters is valid, eliminating the need for a default character ([pr/chore-String-fromUTF8-prove-valid](https://github.com/leanprover/lean4/commit/27f1ff36b24877ec0a4bfb129f0f0962106a995c)), would you be interested in merging that?
Notes on invalid characters from unchecked C++ I don't know if this function may be called from unchecked C++ with invalid characters. If it may, I'm not sure what would happen with my patched function... I'm not familiar with Lean's safety model, but it seems like a bad idea to have a Lean function that takes a proof of a proposition but is expected to operate in a certain way even if the proposition is false. I think the safe approach is to have two functions -- one that takes a proof and is only called from Lean, and another that doesn't take a proof and replaces invalid chars (for use from C++, not sure whether it's useful from Lean); I'd prefer to go even further and report an error instead of silently replacing invalid characters (I'm not sure if there is any easy way to report errors/panic in Lean code called from C++).
--- src/Init/Data/String/Extra.lean | 3 +-- src/Init/Tactics.lean | 4 ++-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/src/Init/Data/String/Extra.lean b/src/Init/Data/String/Extra.lean index b96b36fb23..89fd79abc7 100644 --- a/src/Init/Data/String/Extra.lean +++ b/src/Init/Data/String/Extra.lean @@ -104,8 +104,7 @@ where /-- Decodes an array of bytes that encode a string as [UTF-8](https://en.wikipedia.org/wiki/UTF-8) into -the corresponding string. Invalid UTF-8 characters in the byte array result in `(default : Char)`, -or `'A'`, in the string. +the corresponding string. -/ @[extern "lean_string_from_utf8_unchecked"] def fromUTF8 (a : @& ByteArray) (h : validateUTF8 a) : String := diff --git a/src/Init/Tactics.lean b/src/Init/Tactics.lean index 5588fcf532..5361aec984 100644 --- a/src/Init/Tactics.lean +++ b/src/Init/Tactics.lean @@ -1025,7 +1025,7 @@ The form ``` fun_induction f ``` -(with no arguments to `f`) searches the goal for an unique eligible application of `f`, and uses +(with no arguments to `f`) searches the goal for a unique eligible application of `f`, and uses these arguments. An application of `f` is eligible if it is saturated and the arguments that will become targets are free variables. @@ -1058,7 +1058,7 @@ The form ``` fun_cases f ``` -(with no arguments to `f`) searches the goal for an unique eligible application of `f`, and uses +(with no arguments to `f`) searches the goal for a unique eligible application of `f`, and uses these arguments. An application of `f` is eligible if it is saturated and the arguments that will become targets are free variables.