lean4-htt

Author	SHA1	Message	Date
Mario Carneiro	0a1a855ba8	fix: validate UTF-8 at C++ -> Lean boundary (#3963) Continuation of #3958. To ensure that lean code is able to uphold the invariant that `String`s are valid UTF-8 (which is assumed by the lean model), we have to make sure that no lean objects are created with invalid UTF-8. #3958 covers the case of lean code creating strings via `fromUTF8Unchecked`, but there are still many cases where C++ code constructs strings from a `const char *` or `std::string` with unclear UTF-8 status. To address this and minimize accidental missed validation, the `(lean_)mk_string` function is modified to validate UTF-8. The original function is renamed to `mk_string_unchecked`, with several other variants depending on whether we know the string is UTF-8 or ASCII and whether we have the length and/or utf8 char count on hand. I reviewed every function which leads to `mk_string` or its variants in the C code, and used the appropriate validation function, defaulting to `mk_string` if the provenance is unclear. This PR adds no new error handling paths, meaning that incorrect UTF-8 will still produce incorrect results in e.g. IO functions, they are just not causing unsound behavior anymore. A subsequent PR will handle adding better error reporting for bad UTF-8.	2024-06-19 14:05:48 +00:00
Mario Carneiro	70a23945bf	feat: add model implementation for UTF8 enc/dec (#3961) - [x] Depends on: #3958 - [x] Depends on: #3960 This makes the UTF-8 encode and decode functions have lean definitions, so that we can prove properties about them downstream.	2024-04-22 10:24:53 +00:00
Mario Carneiro	62cdb51ed5	feat: UTF-8 string validation (#3958) Previously, there was a function `opaque fromUTF8Unchecked : ByteArray -> String` which would convert a list of bytes into a string, but as the name implies it does not validate that the string is UTF-8 before doing so and as a result it produces unsound results in the compiler (because the lean model of `String` indirectly asserts UTF-8 validity). This PR replaces that function by `opaque validateUTF8 (a : @& ByteArray) : Bool` and `opaque fromUTF8 (a : @& ByteArray) (h : validateUTF8 a) : String` so that while the function is still "unchecked", we have a proof witness that the string is valid. To recover the original, actually unchecked version, use `lcProof` or other unsafe methods to produce the proof witness. Because this was the only `ByteArray -> String` conversion function, it was used in several places in an unsound way (e.g. reading untrusted input from IO and treating it as UTF-8). These have been replaced by `fromUTF8?` or `fromUTF8!` as appropriate.	2024-04-20 18:36:37 +00:00
Sebastian Ullrich	70f99ab655	chore: placate GCC	2021-09-23 16:31:41 +02:00
Leonardo de Moura	f9bc4b9b3a	feat: add missing APIs	2021-09-11 15:39:11 -07:00
Leonardo de Moura	c8406a301d	chore: reduce `src/include/lean`	2021-09-07 08:24:54 -07:00
Leonardo de Moura	c71eebde8c	chore: remove `util/buffer.h` dependency from `runtime`	2020-12-14 18:07:28 -08:00
Leonardo de Moura	2f1ec93289	chore: move runtime implementation to `src/runtime`	2020-05-22 14:35:16 -07:00
Leonardo de Moura	1a77ee4f89	chore: delete old runtime directory	2020-05-18 11:33:18 -07:00
Leonardo de Moura	8bdca35282	chore: use `#include <lean/runtime/...>` for runtime .h files	2020-05-18 11:30:07 -07:00
Leonardo de Moura	01b4983fa2	fix(runtime/object): `string_utf8_extract`	2019-03-09 12:57:51 -08:00
Leonardo de Moura	c862ce4a75	feat(runtime, library/init/data/string/basic): add `utf8_pos` `utf8_pos` is a low level alternative for `string.iterator`. TODO: implement `string.iterator` using it.	2019-03-09 12:30:19 -08:00
Leonardo de Moura	13c532d0d4	fix(*): truncation bugs - Lean strings (like std::string) may contain null characters. The codebase was ignoring this issue. - We now have a wrapper `string_ref` for wrapping Lean string objects in C++. This wrapper also implements correctly the coercions std::string <-> string_ref. Remark: I also found a few places where the code relies on the following property which is not true Forall s : std::string, std::string(s.c_str()) == s - `name` object wrapper was assuming that all numerals were small `nat` values. This is true in most cases, but the system would crash when processing if it is a big number. - The commit tries to make sure runtime/util/kernel are correct. Modules that will be deleted contain many `TODO` comments indicating they may crash and/or produce incorrect results when strings contain null characters and numerals are big. cc @kha @kha: I thought about using `string` instead of `string_ref`. We consistently use `std::string`. So, it should be fine, but I was concerned about code readability. After we bootstrap Lean4, we will be able to delete `lean::list` template, and rename `lean::list_ref` to `lean::list`. I am going to add `pair_ref` for wrapping Lean pair objects. If we use `lean::string` instead of `lean::string_ref`, then we should also use `lean::pair` instead of `lean::pair_ref`. But, there is a problem in this case since we have https://github.com/leanprover/lean4/blob/master/src/util/pair.h#L13 :(	2018-06-15 16:05:11 -07:00
Leonardo de Moura	fe2d416cde	fix(runtime,util,kernel): should not use strcmp to compare Lean string objects Reason: - UTF8 encoding - Lean strings may contain null char. That is, null char is not an end of string delimiter like in C. Lean string objects are similar to std::string	2018-06-15 16:05:11 -07:00
Leonardo de Moura	5d53eccb59	feat(runtime): string support	2018-05-17 13:11:47 -07:00
Leonardo de Moura	8ee2f4fea1	feat(*): basic runtime string support	2018-05-14 16:52:55 -07:00

16 commits
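
The commit messages above mention two byte-level pitfalls: strings with embedded null characters (`std::string(s.c_str()) == s` does not hold for every `s`, and `strcmp` is not a valid comparison), and strings of unclear UTF-8 status crossing the C++ -> Lean boundary. The following is a minimal standalone sketch of both, written against the C++ standard library only. The helper names `roundtrip_through_c_str`, `strcmp_equal`, and `is_valid_utf8` are hypothetical and are not functions from the Lean runtime; the validator illustrates what a UTF-8 check involves in general and is not the check that `mk_string` performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the Lean runtime): rebuilds a std::string
// from its C-string view. The copy stops at the first null byte, so any
// bytes after an embedded '\0' are lost.
std::string roundtrip_through_c_str(const std::string& s) {
    return std::string(s.c_str());
}

// Hypothetical helper: equality via strcmp. It also stops at the first null
// byte, so it can report two different strings as equal.
bool strcmp_equal(const std::string& a, const std::string& b) {
    return std::strcmp(a.c_str(), b.c_str()) == 0;
}

// Hypothetical validator: returns true if the bytes of `s` form well-formed
// UTF-8. It rejects stray continuation bytes, truncated sequences, overlong
// encodings, surrogate code points, and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }  // ASCII, including '\0'

        std::size_t len = 0;   // total bytes in this sequence
        std::uint32_t cp = 0;  // decoded code point
        std::uint32_t min = 0; // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;           // continuation byte or invalid lead byte

        if (i + len > n) return false;  // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp > 0x10FFFF) return false;                 // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

For example, with `std::string a("ab\0cd", 5)` and `std::string b("ab\0xy", 5)`, `roundtrip_through_c_str(a)` has length 2 rather than 5, and `strcmp_equal(a, b)` is true even though `a != b`. Both strings still pass `is_valid_utf8`, since a null byte is a valid one-byte UTF-8 sequence, which is why length-aware comparison and UTF-8 validation are separate concerns.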