@kha I renamed the homogeneous `map` to `hmap`, and added the
heterogeneous one as `map`. As soon as we add user-defined rewriting
rules, we will be able to replace `map` with `hmap` whenever the types
are the same.
Old `Nat.repeat` => `Nat.for`
Old `Nat.mrepeat` => `Nat.mfor`
New `Nat.repeat` has type
```
def repeat {α : Type u} (f : α → α) (n : Nat) (a : α) : α :=
``
`List.repeat` => `List.replicate` (like in Haskell)
Avoid weird `ℕ` in List library
The array read and write operations are now called:
- "Comfortable" version (with runtime bound checks):
`Array.get` and `Array.set` like OCaml.
It is also consistent with `Ref.get` and `Ref.put`,
and `get` and `set` for `MonadState`.
- `Fin` version (without runtime bound checks):
`Array.index` and `Array.update` like in F*.
- `USize` version (without runtime bound checks and unboxing):
`Array.idx` and `Array.updt`.
cc @kha
Without these annotations, Lean will timeout when trying to synthesize
the type class instance `decidable_eq uint32`. The type class resolution
problem will produce the unification problem:
```
decidable (@eq uint32 a b) =?= decidable (@eq usize ?x ?y)
```
which Lean tries to solve by assigning `?x := a`.
During the assignment, the types of `?x` and `a` are unified with "full
force". Thus, we get the constraint
```
usize_sz =?= uint32_sz
```
which will take forever to be solved when peforming the computation in
unary arithmetic.
Remark: this commit also makes sure that `type_context` will not unfold
irreducible definitions when trying to unify/match the types.
The new test `type_class_performance1.lean` exposes the problem fixed
by this commit.
Most efficient hash functions use uint32/uint64 and produce values
that do not fit in out small nat representation. Thus, GMP big numbers
would have to be created.
The array dimension is now stored inside the array.
The main motivation is that it reflects the actual runtime implementation.
We need to store the array size to be able to GC it.
So, it feels silly to have the array size stored in each array object,
but we cannot use this information.
For example, in the `hashmap` we implemented the bucket array using
`array`, and we store the `size` of the array.
Same for the Lean3 `buffer` object.
The `buffer` object doesn't even need to exist.
The actual `array` implementation is the `buffer`
All theorems are proved without using the tactic framework.
Thus, we can define `fin/uint32/uint64` types and their operations
before we define the tactic framework.
@kha I added the `d_array` type that we discussed today.
However, the VM implemantion is still using persistent arrays.
If we remove the persistent array support, then code using
hash_map will only be efficient if the hash_map is used linearly.
This is not the case in the reader module because we are planning
to support backtracking.
On the other hand, it is awkward we currently don't have a vanilla
array implementation in the VM. I suspect this will be a problem in
the future.
So, I see the following possibilities:
1- We implement a map data-structure using red-black trees in Lean.
We use this new data-structure to implement all maps in the new reader and
macro expander.
2- We implement a very simple map as a list of pairs.
Then, we replace it in the VM with an efficient implementation.
The VM implementation may use our internal red-black trees.
We may also use a persistent hash table implemented in C++,
but it would be awkward to ask the user to provide a hash function in the reference
implementation (i.e., the one using list of pairs), but not use it
anywhere :)
In contrast, if we use the red-black tree implementation we
would have to ask the user to provide a total order.
It is overkill for the list of pair reference implementation because
we just need an equality test, but, at least, the comparison function
will be used in the implementation.
3- Add types `d_parray` (dependent persistent array) and
`parray` (persistent array). In Lean, they would just wrap the
`d_array` and `array` types. In the VM, `d_array` and `array` would
be implemented using vanilla arrays and `d_parray` and `parray` would
be implemented using persistent arrays. Then, we could have
`d_hash_map`, `hash_map`, `d_phash_map` and `phash_map`. Argh, so many
versions :(
We would use `phash_map` to implement our reader and macro expander.
4- Add a `(persistent : bool := ff)` parameter to `d_array` and
`array` types. The disadvantage of this approach is that it has
a performance impact. The VM implementation would have to check
the `persisent` flag at runtime. The value of this flag is known
at compilation time, but we currently don't have a mechanism
for specializing native builtin C++ implementations for VM functions.
This modification was suggested by @kha.
TODO:
- Use `simp [-f]` instead of `simp without f`
- Allow users to remove hypothesis from `*`. Example: `simp [*, -h]`
for simplify using all hypotheses but `h`.