Old `Nat.repeat` => `Nat.for`
Old `Nat.mrepeat` => `Nat.mfor`
The new `Nat.repeat` has type
```
def repeat {α : Type u} (f : α → α) (n : Nat) (a : α) : α :=
```
`List.repeat` => `List.replicate` (like in Haskell)
Avoid weird `ℕ` in List library
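A minimal sketch of the intended semantics, assuming only core Lean 4. To stay self-contained it defines hypothetical `repeatSketch` and `replicateSketch` instead of calling the library functions; they are meant to mirror the behavior of `Nat.repeat` and `List.replicate`, not to reproduce their actual definitions.

```lean
universe u

-- Hypothetical stand-in for the new `Nat.repeat` (same signature as above):
-- applies `f` to `a` exactly `n` times.
def repeatSketch {α : Type u} (f : α → α) (n : Nat) (a : α) : α :=
  match n with
  | 0     => a
  | n + 1 => f (repeatSketch f n a)

-- Hypothetical stand-in for `List.replicate` (Haskell's `replicate`):
-- a list holding `n` copies of `a`.
def replicateSketch {α : Type u} (n : Nat) (a : α) : List α :=
  match n with
  | 0     => []
  | n + 1 => a :: replicateSketch n a

#eval repeatSketch (fun x : Nat => x * 2) 3 1  -- 8
#eval replicateSketch 3 'x'                    -- ['x', 'x', 'x']
```

Since `n` counts applications of `f`, `repeatSketch f 0 a` is just `a`.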
@kha: I initially planned to use the UTF8 API only in very special
cases, but I found it to be super useful. It allows us to implement
an efficient String library mostly in Lean.
However, there was a problem: `abbrev String.Pos := USize`.
This definition is fine for a low-level API, but the API is not
low level anymore. With `String.Pos := USize`, we would not be able to
prove natural theorems about the `String` API. For example,
`String.map id s = s` did not hold. We would have to include the
artificial antecedent `s.length <= usizeMax` (or something like this).
I suspect it would be very painful.
So, this commit defines `String.Pos` as `Nat`. The performance
overhead seems to be very small.
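A small sketch of why `Nat` positions avoid the extra antecedents. It uses a hypothetical `Pos` abbreviation and a hypothetical `utf8Len` helper (the standard UTF-8 code point ranges), not the actual `String` API; `UInt8` stands in for `USize` only so that the wraparound is easy to see.

```lean
-- Hypothetical byte position, in the spirit of `String.Pos := Nat`.
abbrev Pos := Nat

-- Hypothetical helper: bytes needed to encode `c` in UTF-8.
def utf8Len (c : Char) : Nat :=
  if c.toNat < 0x80 then 1
  else if c.toNat < 0x800 then 2
  else if c.toNat < 0x10000 then 3
  else 4

-- Move a position past the character `c`.
def Pos.advance (p : Pos) (c : Char) : Pos :=
  p + utf8Len c

-- With unbounded `Nat` positions, advancing never moves backwards,
-- and no `p + k ≤ usizeMax`-style hypothesis is needed.
theorem Pos.le_advance (p : Pos) (c : Char) : p ≤ Pos.advance p c :=
  Nat.le_add_right p (utf8Len c)

-- A fixed-width offset wraps around instead (`UInt8` stands in for `USize`).
example : (255 : UInt8) + 1 = 0 := by decide
```

The analogous lemma for a fixed-width position would need a bound on `p`, which is exactly the kind of side condition the commit wants to avoid.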
The `offset` field is problematic because it prevents us from efficiently
moving back and forth between `String.Pos` and `String.Iterator`.
@kha I temporarily added `String.OldIterator` to make sure the
parser doesn't break. This temporary fix will be removed
after we replace `parsec`.
@kha I am finding the UTF8 API super useful. So, I am giving nice names
to it. The API is safe for users and the runtime implementation should
match the reference one.
@kha I've added
```
iterator.extract : iterator -> iterator -> option string
```
It returns `none` if the iterators are "incompatible".
If this function is inconvenient to use, we can change it and return the
empty string in these cases.
Given iterators `it1` and `it2`, if they share the same string
object in memory, then the cost is O(pos(it2) - pos(it1)).
If not, there is an extra O(N) step where we check whether the strings
being iterated by it1 and it2 are equal (`N` is the size of the strings).
In most applications, I believe the iterators will share the string
object.
I didn't test the code much. BTW, I found an unrelated bug in
vm_string.cpp, so I'm not very confident this code is rock solid.
closes #1175
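A reference-style sketch of the `extract` behavior described above, written against a hypothetical `Iter` structure that pairs the underlying characters with a character index. It assumes that "incompatible" means the iterators walk different strings or `it2` lies before `it1`; the real compatibility check and the shared-object fast path live in the VM and are not modeled here.

```lean
-- Hypothetical iterator: the characters being walked plus a position.
structure Iter where
  chars : List Char
  pos   : Nat

-- Characters between `it1` and `it2`, or `none` when the iterators are
-- incompatible (assumed: different strings, or `it2` before `it1`).
def Iter.extract (it1 it2 : Iter) : Option (List Char) :=
  -- The list comparison plays the role of the O(N) equality check;
  -- the O(1) "same object in memory" shortcut is not modeled.
  if it1.chars = it2.chars ∧ it1.pos ≤ it2.pos then
    some ((it1.chars.drop it1.pos).take (it2.pos - it1.pos))
  else
    none

#eval Iter.extract ⟨['l', 'e', 'a', 'n'], 1⟩ ⟨['l', 'e', 'a', 'n'], 3⟩  -- some ['e', 'a']
#eval Iter.extract ⟨['l', 'e', 'a', 'n'], 1⟩ ⟨['l', 'e', 'a', 'f'], 3⟩  -- none
```

Returning the empty list instead of `none` in the `else` branch would correspond to the alternative mentioned above of returning the empty string for incompatible iterators.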
The types `string_imp` and `string.iterator_imp` were supposed to be
marked private, but we cannot do it because we need to provide
`string_imp.mk`, `string_imp.cases_on`, `string.iterator_imp.mk` and
`string.iterator_imp.cases_on` in the VM since we use a different
internal representation. Note that marking them as private does not
work since users can still access `string_imp.cases_on` using
meta-programming.
So, we need better support for private declarations.
Missing feature: char literals do not support non-ASCII values.
That is, in the current implementation, we cannot write 'α'.
This will be implemented in the future.
The VM native implementation does not behave correctly for huge
strings (i.e., strings with more than 4G characters).
The problem is that the current implementation relies on
```
size_t force_to_size_t(vm_obj const & o, size_t def)
```
We may also have overflow problems in the string.iterator implementation
code. This is not a big deal right now, since I doubt we will try
to process strings with more than 2^32 characters.
@Kha the `core_lib` and tests seem to be working correctly, but
we need more tests.
See issue #1175
BTW, we may have to revise this decision in the future when we decide to
populate the string library with lemmas.
It is inconvenient to prove the lemmas in string/basic.lean since the
tactic framework has not been defined yet.
Anyway, I think it is worth keeping them private for now, and making sure
nobody relies on the implementation.
We want to make sure string users do not depend on the string
implementation. This is the first step.
We need this refactoring *now* to make sure it will not be
super painful to address issue #1175
This suggestion has been discussed on Slack.
We had decided to use #"c" as notation because we wanted to allow `'`
at the beginning of identifiers like in SML and F*. In particular,
we wanted to allow users to use 'a 'b 'c for naming type parameters
like in SML. However, nobody used this notation. In the Lean standard
library, we are using Greek letters for naming type parameters.
So, there is no real motivation for the ugly #"c" syntax.