lean4-htt/src/Std/Data/Iterators.lean
Paul Reichert 90462e2551
feat: introduce iterator combinators filterMap, filter and map (#8451)
This PR provides the iterator combinator `filterMap` in a pure and
monadic version and specializations `map` and `filter`. This new
combinator allows to apply a function to the emitted values of a stream
while filtering out certain elements.

`map` should have an optimized `IteratorCollect` implementation but it
turns out that this is not possible without a major refactor of
`IteratorCollect`: `toArrayMapped` requires a proof that the iterator is
finite. If `it.mapM f` is `Finite` but `it` is not, then such a proof
does not exist. `IteratorCollect` needs to take a proof that the loop
will terminate for the given monadic function `f` instead. This will not
be done in this PR.
2025-05-30 13:43:41 +00:00

106 lines
4.5 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

/-
Copyright (c) 2025 Lean FRO, LLC. All rights reserved.
Released under Apache 2.0 license as described in the file LICENSE.
Authors: Paul Reichert
-/
prelude
import Std.Data.Iterators.Basic
import Std.Data.Iterators.Producers
import Std.Data.Iterators.Consumers
import Std.Data.Iterators.Combinators
import Std.Data.Iterators.Lemmas
import Std.Data.Iterators.PostConditionMonad
import Std.Data.Iterators.Internal
/-!
# Iterators
The `Std.Data.Iterators` module provides a uniform abstraction over data that can be iterated over
in a sequential way, with a focus on convenience and efficiency. It is heavily inspired by Rust's
iterator library and Haskell's `streamly`.
An iterator is an abstraction of a sequence of values that may or may not terminate. For example,
every list can be traversed with an iterator via `List.iter`.
Most users of the iterator API will just put together existing library functions that
create, combine and consume iterators. Consider a simple example:
```lean
-- [1, 2, 3].iter : Iter (α := ListIterator α) Nat (α being implicit)
#check [1, 2, 3].iter
-- 12
#eval [1, 2, 3].iter.map (· * 2) |>.fold (init := 0) (· + ·)
```
An iterator that emits values in `β` is an element of the type `Iter (α := ?) β`. The implicit
type argument `α` contains stateful information about the iterator. `IterM (α := ?) m β` represents
iterators over a monad `m`. In both cases, the implementation is provided by a typeclass
`Iterator α m β`, where `m` is a monad in which the iteration happens.
The heart of an iterator `it : Iter β` is its `it.step` function, which returns `it.Step α β`.
Here, `it.Step` is a type that either (1) provides an output value in `β` and a
successor iterator (`yield`), (2) provides only a successor iterator with no output (`skip`), or
(3) signals that the iterator has finished and will provide no more outputs (`done`).
For technical reasons related to termination proofs, the returned `it.Step` also contains proof
that it is a "plausible" step obtained from `it`.
The `step` function can also be used by hand:
```lean
def sumrec (l : List Nat) : Nat :=
go (l.iter.map (2 * ·)) 0
where
go it acc :=
match it.step with
| .yield it' n _ => go it' (acc + n)
| .skip it' _ => go it' acc
| .done _ => acc
termination_by it.finitelyManySteps
```
In general, iterators do not need to terminate after finitely many steps. This example
works because the iterator type at hand has an instance of the `Std.Iterators.Finite` typeclass.
Iterators that may not terminate but will not end up in an infinite sequence of `.skip`
steps are called productive. This behavior is encoded in the `Std.Iterators.Productive` typeclass.
## Stability
The API for the usage of iterators provided in this module can be considered stable, as well as
the API for the verification of programms using iterators, unless explicitly stated otherwise.
Contrarily, the API for implementation of new iterators, including the design of the `Iterator`
typeclass, is still experimental and will change in the future. It is already planned that there
will be a breaking change to make the iterators more flexible with regard to universes, a change
that needs to wait for a language feature.
## Module structure
A distinction that cuts through the whole module is that between pure and monadic
iterators. Each submodule contains a dedicated `Monadic` sub-submodule.
All of the following module names are prefixed with `Std.Data.Iterators`.
### Basic iterator API
* `Basic` defines `Iter` and `IterM` as well as `Iterator`, `Finite`
and `Productive` typeclasses.
* `Producers` provides iterator implementations for common data types.
* `Consumers` provides functions that take one or more iterators, iterate over them and potentially
return some result. Examples are the `toList` function and an instance for the `ForIn` typeclass.
These operations allow for what is also known as *internal iteration*, where the caller delegates
the control flow during the iteration to the called consumer.
* `Combinators` will provide operations that transform one or more iterators into a new iterator
as soon as the first such combinator has been implemented.
### Verification API
`Lemmas` provides the means to verify programs that use iterators.
### Implementation details
`Internal` contains code that should not be relied upon because it may change in the future.
This whole module is explicitly experimental and it is not advisable for downstream users to
expect stability to implement their own iterators at this point in time.
-/