Commit graph

15 commits

Author SHA1 Message Date
Mario Carneiro
70a23945bf
feat: add model implementation for UTF8 enc/dec (#3961)
- [x] Depends on: #3958 
- [x] Depends on: #3960

This makes the UTF-8 encode and decode functions have lean definitions,
so that we can prove properties about them downstream.
2024-04-22 10:24:53 +00:00
Mario Carneiro
62cdb51ed5
feat: UTF-8 string validation (#3958)
Previously, there was a function `opaque fromUTF8Unchecked : ByteArray
-> String` which would convert a list of bytes into a string, but as the
name implies it does not validate that the string is UTF-8 before doing
so and as a result it produces unsound results in the compiler (because
the lean model of `String` indirectly asserts UTF-8 validity). This PR
replaces that function by
```lean
opaque validateUTF8 (a : @& ByteArray) : Bool

opaque fromUTF8 (a : @& ByteArray) (h : validateUTF8 a) : String
```
so that while the function is still "unchecked", we have a proof witness
that the string is valid. To recover the original, actually unchecked
version, use `lcProof` or other unsafe methods to produce the proof
witness.

Because this was the only `ByteArray -> String` conversion function, it
was used in several places in an unsound way (e.g. reading untrusted
input from IO and treating it as UTF-8). These have been replaced by
`fromUTF8?` or `fromUTF8!` as appropriate.
2024-04-20 18:36:37 +00:00
Sebastian Ullrich
70f99ab655 chore: placate GCC 2021-09-23 16:31:41 +02:00
Leonardo de Moura
f9bc4b9b3a feat: add missing APIs 2021-09-11 15:39:11 -07:00
Leonardo de Moura
c8406a301d chore: reduce src/include/lean 2021-09-07 08:24:54 -07:00
Leonardo de Moura
c71eebde8c chore: remove util/buffer.h dependency from runtime 2020-12-14 18:07:28 -08:00
Leonardo de Moura
2f1ec93289 chore: move runtime implementation to src/runtime 2020-05-22 14:35:16 -07:00
Leonardo de Moura
1a77ee4f89 chore: delete old runtime directory 2020-05-18 11:33:18 -07:00
Leonardo de Moura
8bdca35282 chore: use #include <lean/runtime/...> for runtime .h files 2020-05-18 11:30:07 -07:00
Leonardo de Moura
01b4983fa2 fix(runtime/object): string_utf8_extract 2019-03-09 12:57:51 -08:00
Leonardo de Moura
c862ce4a75 feat(runtime, library/init/data/string/basic): add utf8_pos
`utf8_pos` is a low level alternative for `string.iterator`.
TODO: implement `string.iterator` using it.
2019-03-09 12:30:19 -08:00
Leonardo de Moura
13c532d0d4 fix(*): truncation bugs
- Lean strings (like std::string) may contain null characters. The
  codebase was ignoring this issue.

- We now have a wrapper `string_ref` for wrapping Lean string objects in
  C++. This wrapper also implements correctly the coercions std::string <-> string_ref.
  Remark: I also found a few places where the code relies on the
  following property which is not true
  Forall s : std::string, std::string(s.c_str()) == s

- `name` object wrapper was assuming that all numerals were small
  `nat` values. This is true in most cases, but the system would
  crash when processing if it is a big number.

- The commit tries to make sure runtime/util/kernel are correct.
  Modules that will be deleted contain many `TODO` comments
  indicating they may crash and/or produce incorrect results
  when strings contain null characters and numerals are big.

cc @kha

@kha: I thought about using `string` instead of `string_ref`.
We consistently use `std::string`. So, it should be fine, but I
was concerned about code readability.

After we bootstrap Lean4, we will be able to delete `lean::list`
template, and rename `lean::list_ref` to `lean::list`.

I am going to add `pair_ref` for wrapping Lean pair objects.
If we use `lean::string` instead of `lean::string_ref`, then
we should also use `lean::pair` instead of `lean::pair_ref`.
But, there is a problem in this case since we have
https://github.com/leanprover/lean4/blob/master/src/util/pair.h#L13
:(
2018-06-15 16:05:11 -07:00
Leonardo de Moura
fe2d416cde fix(runtime,util,kernel): should not use strcmp to compare Lean string objects
Reason:
- UTF8 encoding
- Lean strings may contain null char. That is, null char is not an end
  of string delimiter like in C. Lean string objects are similar to std::string
2018-06-15 16:05:11 -07:00
Leonardo de Moura
5d53eccb59 feat(runtime): string support 2018-05-17 13:11:47 -07:00
Leonardo de Moura
8ee2f4fea1 feat(*): basic runtime string support 2018-05-14 16:52:55 -07:00
Renamed from src/util/utf8.cpp (Browse further)