Commit graph

9 commits

Author SHA1 Message Date
Sebastian Ullrich
1a80ea9c8e fix(util/utf8): UTF8 decoding 2017-10-27 09:48:09 -07:00
Leonardo de Moura
9399ce8346 feat(library/vm/vm_string): provide native implementation of type string in the VM
closes #1175

The types `string_imp` and `string.iterator_imp` were supposed to be
marked private, but we cannot do it because we need to provide
`string_imp.mk`, `string_imp.cases_on`, `string.iterator_imp.mk` and
`string.iterator_imp.cases_on` in the VM since we use a different
internal representation. Note that marking them as private does not
work since users can still access `string_imp.cases_on` using
meta-programming.
So, we need better support for private declarations.

Missing feature, char literals do not support non ASCII values.
That is, in the current implementation, we cannot write 'α'.
This will be implemented in the future.

The VM native implementation does not behave correctly for huge
strings (i.e., strings with more than 4G characters).
The problem is that the current implementation relies on
```
size_t force_to_size_t(vm_obj const & o, size_t def)
```
We may also have overflow problems in the string.iterator implementation
code. This is not a big deal right now, since I doubt we will try
to process string with more than 2^32 characters.

@Kha the `core_lib` and tests seem to be working correctly, but
we need more tests.
2017-10-23 10:55:26 -07:00
Leonardo de Moura
28501a0e0e feat(library/init/data/string): string as a list of unicode scalar values, and iterator abstraction
TODO:
- Implement string primitives in the VM.
- Support for unicode char literals.
2017-10-23 10:55:26 -07:00
Sebastian Ullrich
f53fa97c4a feat(frontends/lean): escape identifiers when pretty-printing 2017-06-28 10:43:19 -07:00
Sebastian Ullrich
c9a8c02fdc feat(emacs,frontends/lean/parser,shell/server): more accurate info command using server-side parsing 2016-12-27 10:07:34 -08:00
Sebastian Ullrich
da08079af9 feat(frontends/lean): allow specifying notation spacing via quoted symbols
Unquoted tokens inherit their spacing from the respective reserved definition.
2015-09-30 17:36:32 -07:00
Sebastian Ullrich
189a300b11 feat(frontends/lean): improve pretty printing space insertion heuristic 2015-09-30 17:36:32 -07:00
Leonardo de Moura
656b642c4a fix(frontends/lean): identifier size when using unicode
see issue #756
2015-07-30 11:32:24 -07:00
Leonardo de Moura
469368f090 refactor(frontends/lean/scanner): move basic UTF8 procedures to separate module 2014-10-19 13:29:15 -07:00