Skip to content

Commit 0d5a5b1

Browse files
authored
Make support for single-use iterables explicit (#1235)
Hi, could you please consider patch to remove explicit support of single-use iterables in sequences. It fixes #1192. single-use iterables now throw a `TypeError`, when they are implicitly or explicitly coerced into a sequence. A new `iterator-seq` is introduced to explicitly convert them into a seq instead . See #1198 (comment) for the rationale. The `count` fn was also updated to behave the same. There were a few places where single use iterators were converted implicitly to a seq, which are now explicitly converted using a `iterator-seq` or the python equivalent instead. Tests are added for the same, as well as documentation section under Python Iterators. Thanks --------- Co-authored-by: ikappaki <[email protected]>
1 parent 39addb7 commit 0d5a5b1

File tree

13 files changed

+174
-36
lines changed

13 files changed

+174
-36
lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
### Added
99
* Added support for referring imported Python names as by `from ... import ...` (#1154)
1010

11+
### Changed
12+
* Removed implicit support for single-use iterables in sequences, and introduced `iterator-seq` to expliciltly handle them (#1192)
13+
1114
### Fixed
1215
* Fix a bug where protocols with methods with leading hyphens in the could not be defined (#1230)
1316
* Fix a bug where attempting to `:refer` a non-existent Var from another namespace would throw an unhelpful exception (#1231)

docs/pyinterop.rst

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -283,6 +283,41 @@ Type hints may be applied to :lpy:form:`def` names, function arguments and retur
283283
Return annotations are combined as by :external:py:obj:`typing.Union`, so ``typing.Union[str, str] == str``.
284284
The annotations for individual arity arguments are preserved in their compiled form, but they are challenging to access programmatically.
285285

286+
.. _python_iterators:
287+
288+
Python Iterators
289+
----------------
290+
291+
In Python, an **iterable** is an object like a list, range, or generator that can be looped over, while an **iterator** is the object that actually yields each item of the iterable one at a time using ``next()``. They are ubiquituous in Python, showing up in ``for`` loops, list comprehensions and many built in functions.
292+
293+
In Basilisp, iterables are treated as first-class sequences and are :lpy:fn:`basilisp.core/seq`-able, except for **single-use** iterables, which must be explicitly converted to a sequence using :lpy:fn:`basilisp.core/iterator-seq` before use.
294+
295+
Single-use iterables are those that return the same iterator every time one is requested.
296+
This becomes problematic when the single-use iterable is coerced to a sequence more than once. For example:
297+
298+
.. code-block:: clojure
299+
300+
(when (> (count iterable-coll) 0)
301+
(first iterable-coll))
302+
303+
304+
Here, both :lpy:fn:`basilisp.core/count` and :lpy:fn:`basilisp.core/first` internally request an iterator from ``iterable-coll``.
305+
If it is **re-iterable**, each call gets a fresh iterator beginning at the start of the collection, and the code behaves as expected.
306+
But if it is a **single-use** iterable, like a generator, both operations share the same iterator.
307+
As a result, ``count`` consumes all elements, and ``first`` returns ``nil``, which is wrong, since the iterator is already exhausted, leading to incorect behavior.
308+
309+
To prevent this subtle bug, Basilisp throws a :external:py:obj:`TypeError` when an iterator is requested from such functions.
310+
The correct approach is to use :lpy:fn:`basilisp.core/iterator-seq` to create a sequence from it:
311+
312+
.. code-block:: clojure
313+
314+
(let [s (iterator-seq iterable-coll)]
315+
(when (> (count s) 0)
316+
(first s)))
317+
318+
319+
This ensures ``count`` and ``first`` operate on the same stable sequence rather than consuming a shared iterator.
320+
286321
.. _python_decorators:
287322

288323
Python Decorators

src/basilisp/contrib/nrepl_server.lpy

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -299,8 +299,8 @@
299299
(create-ns (symbol ns))
300300
eval-ns)
301301
completions (when-not (str/blank? prefix)
302-
(seq (binding [*ns* completion-ns]
303-
(basilisp.lang.runtime/repl_completions prefix))))]
302+
(iterator-seq (binding [*ns* completion-ns]
303+
(basilisp.lang.runtime/repl_completions prefix))))]
304304
(send-fn request {"completions" (->> (map str completions)
305305
sort
306306
(map (fn [completion]

src/basilisp/contrib/sphinx/autodoc.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -88,7 +88,7 @@ def can_document_member(
8888
return False
8989

9090
def parse_name(self) -> bool:
91-
v = runtime.first(reader.read_str(self.name))
91+
v = runtime.first(runtime.to_iterator_seq(reader.read_str(self.name)))
9292
if isinstance(v, sym.Symbol) and v.ns is None:
9393
self.modname = v.name
9494
self.objpath: list[str] = []

src/basilisp/contrib/sphinx/domain.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -222,7 +222,7 @@ def handle_signature(self, sig: str, signode: desc_signature) -> tuple[str, str]
222222

223223
self._add_source_annotations(signode)
224224

225-
sig_sexp = runtime.first(reader.read_str(sig))
225+
sig_sexp = runtime.first(runtime.to_iterator_seq(reader.read_str(sig)))
226226
assert isinstance(sig_sexp, IPersistentList)
227227
fn_sym = runtime.first(sig_sexp)
228228
assert isinstance(fn_sym, sym.Symbol)
@@ -285,7 +285,7 @@ def get_signature_prefix(self, sig: str) -> str:
285285

286286
def get_index_text(self, modname: str, name: tuple[str, str]) -> str:
287287
sig, prefix = name
288-
sig_sexp = runtime.first(reader.read_str(sig))
288+
sig_sexp = runtime.first(runtime.to_iterator_seq(reader.read_str(sig)))
289289
if isinstance(sig_sexp, IPersistentList):
290290
sig = runtime.first(sig_sexp)
291291
return f"{sig} ({prefix} in {modname})"
@@ -560,7 +560,7 @@ def resolve_xref( # pylint: disable=too-many-arguments
560560
maybe_obj = self.forms.get(target)
561561
title = target
562562
else:
563-
obj_sym = runtime.first(reader.read_str(target))
563+
obj_sym = runtime.first(runtime.to_iterator_seq(reader.read_str(target)))
564564
assert isinstance(
565565
obj_sym, sym.Symbol
566566
), f"Symbol expected; not {obj_sym.__class__}"

src/basilisp/core.lpy

Lines changed: 31 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -197,6 +197,15 @@
197197
(fn seq [o]
198198
(basilisp.lang.runtime/to-seq o)))
199199

200+
(def ^{:doc "Coerce the python Iterable ``it`` to a Seq. Supports both re-iterable
201+
(as with ``seq``) as well as single-use iterables that always return
202+
the same iterator."
203+
:arglists '([it])}
204+
iterator-seq
205+
(fn iterator-seq [it]
206+
(basilisp.lang.runtime/to-iterator-seq it)))
207+
208+
200209
(def ^{:doc "Apply function ``f`` to the arguments provided.
201210

202211
The last argument must always be coercible to a Seq. Intermediate
@@ -326,6 +335,7 @@
326335
(.alter-meta #'vector? assoc :inline ^:safe-py-params (fn [o] `(instance? basilisp.lang.interfaces/IPersistentVector ~o)))
327336
(.alter-meta #'seq? assoc :inline ^:safe-py-params (fn [o] `(instance? basilisp.lang.interfaces/ISeq ~o)))
328337
(.alter-meta #'seq assoc :inline ^:safe-py-params (fn [o] `(basilisp.lang.runtime/to-seq ~o)))
338+
(.alter-meta #'iterator-seq assoc :inline ^:safe-py-params (fn [it] `(basilisp.lang.runtime/to-iterator-seq ~it)))
329339
(.alter-meta #'set assoc :inline ^:safe-py-params (fn [coll] `(basilisp.lang.runtime/to-set ~coll)))
330340
(.alter-meta #'vec assoc :inline ^:safe-py-params (fn [coll] `(basilisp.lang.runtime/vector ~coll)))
331341

@@ -1484,10 +1494,10 @@
14841494
(instance? basilisp.lang.interfaces/IReversible x))
14851495

14861496
(defn ^:inline seqable?
1487-
"Return ``true`` if an :py:class:`basilisp.lang.interfaces.ISeq` can be produced from
1488-
``x``\\."
1497+
"Return ``true`` if :lpy:fn:``seq`` is supported on``x``\\."
14891498
[x]
1490-
(instance? basilisp.lang.interfaces/ISeqable x))
1499+
(or (instance? basilisp.lang.interfaces/ISeqable x)
1500+
(basilisp.lang.runtime/is-reiterable-iterable x)))
14911501

14921502
(defn ^:inline sequential?
14931503
"Return ``true`` if ``x`` implements :py:class:`basilisp.lang.interfaces.ISequential`\\."
@@ -2514,7 +2524,8 @@
25142524
"Return a lazy sequence of the concatenation of ``colls``\\. None of the input
25152525
collections will be evaluated until it is needed."
25162526
[& colls]
2517-
`(concat ~@(map (fn [coll] `(lazy-seq ~coll)) colls)))
2527+
`(concat ~@(iterator-seq (python/map (fn [coll] `(lazy-seq ~coll)) colls))))
2528+
25182529

25192530
(defn dorun
25202531
"Force a lazy sequence be fully realized. Returns ``nil``\\.
@@ -3259,7 +3270,7 @@
32593270
sequence."
32603271
[v]
32613272
(lazy-seq
3262-
(when (and (or (seq? v) (seqable? v)) (seq v))
3273+
(when (and (sequential? v) (seq v))
32633274
(let [e (first v)
32643275
r (rest v)]
32653276
(if (or (seq? e) (seqable? e))
@@ -4570,17 +4581,17 @@
45704581
(read-seq {} stream))
45714582
([opts stream]
45724583
(let [read (:read opts basilisp.lang.reader/read)]
4573-
(seq (read stream
4574-
*resolver*
4575-
*data-readers*
4576-
(:eof opts)
4577-
(= (:eof opts) :eofthrow)
4578-
(:features opts)
4579-
(not= :preserve (:read-cond opts))
4580-
*default-data-reader-fn*
4581-
**
4582-
:init-line (:init-line opts)
4583-
:init-column (:init-column opts))))))
4584+
(iterator-seq (read stream
4585+
*resolver*
4586+
*data-readers*
4587+
(:eof opts)
4588+
(= (:eof opts) :eofthrow)
4589+
(:features opts)
4590+
(not= :preserve (:read-cond opts))
4591+
*default-data-reader-fn*
4592+
**
4593+
:init-line (:init-line opts)
4594+
:init-column (:init-column opts))))))
45844595

45854596
(defn read-string
45864597
"Read a string of Basilisp code.
@@ -5497,7 +5508,7 @@
54975508
string. Otherwise, return a vector with the string in the first position and the
54985509
match groups in the following positions."
54995510
[pattern s]
5500-
(lazy-re-seq (seq (re/finditer pattern s))))
5511+
(lazy-re-seq (iterator-seq (re/finditer pattern s))))
55015512

55025513
;;;;;;;;;;;;;;;;;
55035514
;; Hierarchies ;;
@@ -6125,7 +6136,7 @@
61256136
fmeta (merge (meta fn-decl) (meta name))
61266137
decorators (:decorators fmeta)]
61276138
(if decorators
6128-
(loop [wrappers (seq (python/reversed decorators))
6139+
(loop [wrappers (iterator-seq (python/reversed decorators))
61296140
final fn-decl]
61306141
(if (seq wrappers)
61316142
(recur (rest wrappers)
@@ -7640,13 +7651,13 @@
76407651
(mapcat (fn [^os/DirEntry entry]
76417652
(if (.is-dir entry)
76427653
;; immediate subdirectory
7643-
(os/scandir (.-path entry))
7654+
(iterator-seq (os/scandir (.-path entry)))
76447655
;; top level file
76457656
[entry])))
76467657
(filter #(.is-file %))
76477658
(map #(pathlib/Path (.-path %)))
76487659
(filter (comp #{"data_readers.lpy" "data_readers.cljc"} name)))
7649-
(eduction (os/scandir dir))))))
7660+
(eduction (iterator-seq (os/scandir dir)))))))
76507661
(group-by #(.-parent %))
76517662
vals
76527663
;; Only load one data readers file per directory and prefer

src/basilisp/lang/map.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@
2929
process_lrepr_kwargs,
3030
)
3131
from basilisp.lang.reduced import Reduced
32-
from basilisp.lang.seq import sequence
32+
from basilisp.lang.seq import iterator_sequence
3333
from basilisp.lang.vector import MapEntry
3434
from basilisp.util import partition
3535

@@ -354,7 +354,7 @@ def empty(self) -> "PersistentMap":
354354
def seq(self) -> Optional[ISeq[IMapEntry[K, V]]]:
355355
if len(self._inner) == 0:
356356
return None
357-
return sequence(MapEntry.of(k, v) for k, v in self._inner.items())
357+
return iterator_sequence((MapEntry.of(k, v) for k, v in self._inner.items()))
358358

359359
def to_transient(self) -> TransientMap[K, V]:
360360
return TransientMap(self._inner.mutate())

src/basilisp/lang/runtime.py

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1287,10 +1287,19 @@ def cons(o, seq) -> ISeq:
12871287

12881288
to_seq = lseq.to_seq
12891289

1290+
to_iterator_seq = lseq.iterator_sequence
1291+
1292+
1293+
def is_reiterable_iterable(x: Any) -> bool:
1294+
"""Return ``true`` if x is a re-iterable Iterable object."""
1295+
return isinstance(x, Iterable) and iter(x) is not x
1296+
12901297

12911298
def concat(*seqs: Any) -> ISeq:
12921299
"""Concatenate the sequences given by seqs into a single ISeq."""
1293-
return lseq.sequence(itertools.chain.from_iterable(filter(None, map(to_seq, seqs))))
1300+
return lseq.iterator_sequence(
1301+
itertools.chain.from_iterable(filter(None, map(to_seq, seqs)))
1302+
)
12941303

12951304

12961305
T_reduce_init = TypeVar("T_reduce_init")
@@ -1391,6 +1400,11 @@ def apply_kw(f, args):
13911400

13921401
@functools.singledispatch
13931402
def count(coll) -> int:
1403+
if isinstance(coll, Iterable) and iter(coll) is coll:
1404+
raise TypeError(
1405+
f"The count function is not supported on single-use iterable objects because it would exhaust them during counting. Please use iterator-seq to coerce them into sequences first. Iterable Object type: {type(coll)}"
1406+
)
1407+
13941408
try:
13951409
return sum(1 for _ in coll)
13961410
except TypeError as e:

src/basilisp/lang/seq.py

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -218,10 +218,20 @@ def empty(self):
218218
return EMPTY
219219

220220

221-
def sequence(s: Iterable[T]) -> ISeq[T]:
222-
"""Create a Sequence from Iterable s."""
221+
def sequence(s: Iterable[T], support_single_use: bool = False) -> ISeq[T]:
222+
"""Create a Sequence from Iterable `s`.
223+
224+
By default, raise a ``TypeError`` if `s` is a single-use
225+
Iterable, unless `fail_single_use` is ``True``.
226+
227+
"""
223228
i = iter(s)
224229

230+
if not support_single_use and i is s:
231+
raise TypeError(
232+
f"Can't create sequence out of single-use iterable object, please use iterator-seq instead. Iterable Object type: {type(s)}"
233+
)
234+
225235
def _next_elem() -> ISeq[T]:
226236
try:
227237
e = next(i)
@@ -233,6 +243,11 @@ def _next_elem() -> ISeq[T]:
233243
return LazySeq(_next_elem)
234244

235245

246+
def iterator_sequence(s: Iterable[T]) -> ISeq[T]:
247+
"""Create a Sequence from any iterable `s`."""
248+
return sequence(s, support_single_use=True)
249+
250+
236251
@overload
237252
def _seq_or_nil(s: None) -> None: ...
238253

src/basilisp/lang/vector.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@
2424
from basilisp.lang.obj import PrintSettings
2525
from basilisp.lang.obj import seq_lrepr as _seq_lrepr
2626
from basilisp.lang.reduced import Reduced
27-
from basilisp.lang.seq import sequence
27+
from basilisp.lang.seq import iterator_sequence, sequence
2828
from basilisp.util import partition
2929

3030
T = TypeVar("T")
@@ -213,7 +213,7 @@ def pop(self) -> "PersistentVector[T]":
213213
return self[:-1]
214214

215215
def rseq(self) -> ISeq[T]:
216-
return sequence(reversed(self))
216+
return iterator_sequence(reversed(self))
217217

218218
def to_transient(self) -> TransientVector:
219219
return TransientVector(self._inner.evolver())

0 commit comments

Comments
 (0)