We've learned before what an iterable is. It is an object we can iterate over - we can go over its items one by one.
An iterator is the object that does the actual iteration: it hands us the items of an iterable one at a time.
Python documentation says it is:
An object representing a stream of data
An iterator reads data from an iterable and returns the items one by one. The data might come from a container (list, tuple, etc.) or from other sources such as files or network connections.
So an iterator is an intermediary between the data source - the iterable - and the code that needs to iterate over its data.
Getting an iterator for an iterable
We use the iter() function to get an iterator for an iterable.
my_list = [1, 2, 3]
iter(my_list)
# outputs: <list_iterator object at 0x10bbadc90>
The iter() function calls the __iter__() method on the iterable to obtain an iterator.
The iterable's __iter__() method is responsible for creating and returning an iterator.
As we'll see later, the iterator needs access to the iterable to get data from it. So when the __iter__() method creates an iterator, it passes the iterable itself to the iterator.
(Another option is that the iterable has the __getitem__() method - we've discussed it in the article about iterables.)
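Both routes can be tried out in a short sketch. The class Squares is hypothetical and exists only to show the __getitem__() fallback; it is not part of the article's examples.

```python
class Squares:
    """Hypothetical class with __getitem__() but no __iter__()."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * index


numbers = [1, 2, 3]

# iter() calls the list's __iter__() method, so both give a list iterator.
via_function = iter(numbers)
via_method = numbers.__iter__()
print(type(via_function) is type(via_method))  # True

# Without __iter__(), iter() falls back to calling __getitem__()
# with 0, 1, 2, ... until IndexError is raised.
squares = list(iter(Squares()))
print(squares)  # [0, 1, 4]
```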
Getting data from the iterator
Once we have an iterator for our iterable, how do we get data from it?
We pass the iterator to the next() function.
# create our iterable
numbers = [1,2,3]
# we get the iterator
it = iter(numbers)
# we call next() to get item
next(it)
1
next(it)
2
The next() function tells the iterator to give us the next item from the iterable.
Every time we iterate over an iterable (using for, in, etc.), behind the scenes, Python uses the next() function to get items one by one.
If the object passed to the next() function is not an iterator, next() raises TypeError.
>>> next(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not an iterator
>>>
But how does next() know whether an object is an iterator, and how does it get data from it?
What makes an object an iterator?
The next() function checks if an object has the __next__() method.
If it does, it uses it to get the next item.
If it does not, next() raises TypeError.
So, an iterator is an object with the __next__() method.
The __next__() method is what makes an object an iterator. It is responsible for returning the next item from the iterable.
The next() function calls the __next__() method and returns what it returned.
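As a minimal sketch of that rule, here is a hypothetical class, Ticker, whose only method besides __init__() is __next__(). It never runs out of items, which is enough to show that next() only looks for __next__().

```python
class Ticker:
    """Hypothetical class: the only iteration method it has is __next__()."""

    def __init__(self):
        self.count = 0

    def __next__(self):
        self.count += 1
        return self.count


ticker = Ticker()
first = next(ticker)   # next() calls ticker.__next__()
second = next(ticker)
print(first, second)   # 1 2

# A list has __iter__() but no __next__(), so next() rejects it.
try:
    next([1, 2, 3])
    list_rejected = False
except TypeError:
    list_rejected = True
print(list_rejected)   # True
```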
We can also use the __next__() method directly:
numbers = [1,2,3]
it = iter(numbers)
it.__next__()
1
it.__next__()
2
But as with any dunder method, we should not. It's better to use the next() function.
We will write our own iterator with the __next__() method shortly.
Now, let's look at what happens when there is no next item in an iterable.
Exhaustion
When there is no next item, we say the iterator is exhausted, and the __next__() method must raise the StopIteration exception.
numbers = [1,2,3]
it = iter(numbers)
it.__next__()
1
it.__next__()
2
it.__next__()
3
# now we call __next__() again and we get an error
it.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Once the __next__() method raises StopIteration, it needs to continue to do so for subsequent calls. Otherwise, it's broken.
Let's call __next__() once more:
>>> it.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
We should always use the next() function instead of calling the __next__() method directly. When __next__() raises StopIteration, the next() function propagates that exception.
numbers = [1,2,3]
# get iterator
it = iter(numbers)
next(it)
1
next(it)
2
next(it)
3
next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
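When running out of items is an expected case rather than an error, next() also accepts an optional second argument: a default value that it returns instead of raising StopIteration. A short sketch, using None as an illustrative default:

```python
numbers = [1, 2, 3]
it = iter(numbers)

# Drain the iterator.
items = [next(it), next(it), next(it)]

# With a default, next() returns it instead of raising StopIteration.
after_end = next(it, None)
print(items, after_end)  # [1, 2, 3] None

# Without a default, the exception propagates and we can catch it.
try:
    next(it)
    raised = False
except StopIteration:
    raised = True
print(raised)  # True
```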
Once the iterator is exhausted, it is of no use anymore. We can't get any more items from the iterable using an exhausted iterator.
If we want to iterate over an iterable again, we need to get a new iterator from it using iter():
numbers = [1,2,3]
# get iterator
it = iter(numbers)
next(it)
1
next(it)
2
next(it)
3
next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
# ^^ we exhausted iterator `it`
# but we can get a new iterator
it2 = iter(numbers)
next(it2)
1
Getting a new iterator is not possible for all iterables, though. We'll explain that a bit later.
How does it all work together?
So how does all of this come together?
Every time we iterate over an iterable with a for loop, Python uses the iter() function (which uses the __iter__() method) and the next() function (which uses the __next__() method) behind the scenes.
It works like this:
1. Python uses the iter() function to get an iterator for an object.
2. The iter() function uses the __iter__() method of the object to get an iterator. If the object does not have an __iter__() method (in which case the object is not iterable) or the object returned from __iter__() is not an iterator, it raises TypeError.
3. Python then passes the iterator to the next() function.
4. The next() function uses the __next__() method of the iterator to get the next item. (next() checks that the iterator has the __next__() method and raises TypeError if it does not, although this was also checked before by the iter() function.)
5. If the __next__() method returns an item, it is used in the for loop, and we go back to step 3 - Python calls next() again.
6. If there is no next item, the __next__() method raises StopIteration, stopping the whole process.
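The two TypeError cases from the second step can be reproduced directly. In this sketch the class BadIterable is hypothetical: its __iter__() deliberately returns something that is not an iterator.

```python
class BadIterable:
    """Hypothetical class whose __iter__() returns a non-iterator."""

    def __iter__(self):
        return 42  # an int has no __next__() method


errors = []
for obj in (5, BadIterable()):
    try:
        iter(obj)
    except TypeError as error:
        # 5 has no __iter__(); BadIterable returns a non-iterator.
        errors.append(type(error).__name__)

print(errors)  # ['TypeError', 'TypeError']
```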
Here's the code that roughly corresponds to what is happening when we use a for loop:
# Say we have iterable `numbers` like this
numbers = [1, 2, 3]
# This is what Python does when we use a for loop:
# Python will get an iterator for `numbers`
it = iter(numbers)
# It starts a while loop with condition True
# so it will run forever unless it is stopped
while True:
    try:
        # it calls the next() function, passing it the iterator
        item = next(it)
        # if next() returns an item, the loop body uses it
        print(item)
    # otherwise next() raises StopIteration and this is where
    # it will break the while loop
    except StopIteration:
        break
Let's now build our own iterator and iterable from scratch.
Building an iterator
Let's create a simple class Bistro with three fields, waitress, chef and barman:
class Bistro:
    def __init__(self, waitress, chef, barman):
        self.waitress = waitress
        self.chef = chef
        self.barman = barman
Once we know how our class looks, we can create an iterator for it:
class BistroIterator:
    def __init__(self, bistro):
        self.bistro = bistro
        self.next_item = 'waitress'

    def __next__(self):
        if self.next_item == 'waitress':
            self.next_item = 'chef'
            return self.bistro.waitress
        elif self.next_item == 'chef':
            self.next_item = 'barman'
            return self.bistro.chef
        elif self.next_item == 'barman':
            self.next_item = None
            return self.bistro.barman
        else:
            raise StopIteration
Our BistroIterator needs a Bistro object for which it provides iteration capability so that it can access its data:
- When we create our iterator, it stores the Bistro object in the field bistro and sets which field it will return when it's asked for the next item - we chose to return the field waitress first.
- When the __next__() method is called, it checks the next_item field to know what to return.
- If the next_item field contains the value 'waitress', it sets next_item to 'chef' and returns the waitress field.
- If it contains the value 'chef', it sets next_item to 'barman' and returns the chef field.
- If it contains the value 'barman', it sets next_item to None and returns the barman field.
- If next_item is none of those, it raises a StopIteration exception.
Now we need to update our Bistro class with the __iter__() method, in which we create a BistroIterator object, passing the Bistro itself as a parameter, and return it.
class Bistro:
    def __init__(self, waitress, chef, barman):
        self.waitress = waitress
        self.chef = chef
        self.barman = barman

    def __iter__(self):
        # create a new iterator and give it this Bistro object
        return BistroIterator(self)
Our Bistro class has become iterable, and we can use a for loop to iterate over its values:
our_bistro = Bistro('Mary', 'John', 'Alvin')
for item in our_bistro:
    print(item)
# outputs:
# Mary
# John
# Alvin
And that is the iterator's job: take data from the iterable and return the items from the __next__() method one by one.
Usually, it is not very useful to iterate over objects like Bistro, but it proves that we can make an iterable out of almost any object.
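Because Bistro is now iterable, it also works with other code that consumes iterables, not just the for loop. The sketch below repeats the two classes so it runs on its own and then tries list(), the in operator and tuple unpacking; these three uses are additions for illustration.

```python
class BistroIterator:
    def __init__(self, bistro):
        self.bistro = bistro
        self.next_item = 'waitress'

    def __next__(self):
        if self.next_item == 'waitress':
            self.next_item = 'chef'
            return self.bistro.waitress
        elif self.next_item == 'chef':
            self.next_item = 'barman'
            return self.bistro.chef
        elif self.next_item == 'barman':
            self.next_item = None
            return self.bistro.barman
        else:
            raise StopIteration


class Bistro:
    def __init__(self, waitress, chef, barman):
        self.waitress = waitress
        self.chef = chef
        self.barman = barman

    def __iter__(self):
        return BistroIterator(self)


our_bistro = Bistro('Mary', 'John', 'Alvin')

staff = list(our_bistro)             # list() iterates over the bistro
has_john = 'John' in our_bistro      # `in` falls back to iteration
waitress, chef, barman = our_bistro  # unpacking iterates as well

print(staff)     # ['Mary', 'John', 'Alvin']
print(has_john)  # True
print(barman)    # Alvin
```

Each of these operations asks our_bistro for a fresh BistroIterator, which is why they can run one after another.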
Iterator is (almost always) iterable & iterable can be an iterator
We said before that an iterator is an object that has the __next__() method.
But...
Iterator protocol
The Python documentation says that an iterator is also required to have the __iter__() method in addition to the __next__() method to conform to the iterator protocol.
However, an iterator without __iter__() will still work because what makes the iterator work is the __next__() method - as we've seen before. Our BistroIterator has no __iter__() method, but it still works.
So an iterator does not have to conform to the iterator protocol to work.
When an iterator does have the __iter__() method, it must return the iterator object itself.
class SomeIterator:
    def __iter__(self):
        return self

    def __next__(self):
        ...
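One way to check whether an object conforms to the protocol is the standard library's collections.abc.Iterator, which isinstance() recognizes only when both __iter__() and __next__() are present. The classes OnlyNext and Conforming in this sketch are hypothetical stand-ins.

```python
from collections.abc import Iterable, Iterator


class OnlyNext:
    """Hypothetical iterator with __next__() but no __iter__()."""

    def __next__(self):
        raise StopIteration


class Conforming:
    """Hypothetical iterator with both methods."""

    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration


print(isinstance(OnlyNext(), Iterator))       # False
print(isinstance(Conforming(), Iterator))     # True
print(isinstance(Conforming(), Iterable))     # True
print(isinstance([1, 2, 3], Iterator))        # False
print(isinstance(iter([1, 2, 3]), Iterator))  # True
```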
However, most iterators do have the __iter__() method.
Why?
Iterator as iterable
We learned before that if an object has the __iter__() method, it is iterable.
So if an iterator has it, then it's also iterable!
And that means we can use an iterator with a for loop, with an in expression, and with functions that expect an iterable.
Let's try with a for loop:
numbers = [1, 2, 3]
# get iterator
it = iter(numbers)
# notice below we use iterator `it` not `numbers`
for item in it:
    print(item)
# outputs
# 1
# 2
# 3
It works because the for loop uses the iter() function to get an iterator for an iterable. Here we gave it an iterator. But the iterator for the list is also iterable - it has the __iter__() method, which returns self. So the for loop uses that iterator just as it would use one it obtained from numbers itself.
We can check that the list's iterator returns itself when we ask for its iterator:
numbers = [1, 2, 3]
# get iterator
it = iter(numbers)
it
# outputs:
# <list_iterator object at 0x10633f5b0>
# get iterator from iterator
it2 = iter(it)
it2
# outputs:
# <list_iterator object at 0x10633f5b0>
it is it2
# outputs:
# True
So when we give the for loop an iterator and the loop asks it for its iterator, the iterator returns itself. The loop then proceeds by calling next() until it raises StopIteration.
This would not be possible if the iterator had no __iter__() method. If we try the above with our BistroIterator, which doesn't have __iter__(), we'll get an error:
our_bistro = Bistro('Mary', 'John', 'Alvin')
it = iter(our_bistro)
for item in it:
    print(item)
# outputs:
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# TypeError: 'BistroIterator' object is not iterable
When we add the __iter__() method, it will work:
class BistroIterator:
    def __init__(self, bistro):
        self.next_item = 'waitress'
        self.bistro = bistro

    def __next__(self):
        if self.next_item == 'waitress':
            self.next_item = 'chef'
            return self.bistro.waitress
        elif self.next_item == 'chef':
            self.next_item = 'barman'
            return self.bistro.chef
        elif self.next_item == 'barman':
            self.next_item = None
            return self.bistro.barman
        else:
            raise StopIteration

    def __iter__(self):
        # an iterator returns itself when asked for its iterator
        return self

our_bistro = Bistro('Mary', 'John', 'Alvin')
it = iter(our_bistro)
for item in it:
    print(item)
# outputs:
# Mary
# John
# Alvin
Not all iterables are the same
Although we can use iterators with the __iter__() method where an iterable is expected, we need to be aware of exhaustion.
Once an iterator is exhausted, we can't use it anymore to get data from it.
We said in the Exhaustion section that if we want to iterate over an iterable multiple times, we could get a new iterator for each iteration.
But an iterator that is also iterable will not give us a new iterator when we ask for it, because its __iter__() method returns itself.
numbers = [1, 2, 3]
it = iter(numbers)
for item in it:
    print(item)
# outputs:
# 1
# 2
# 3
# Now `it` is exhausted.
# But `it` is also an iterable so
# let's get an iterator from it using `iter()`
it2_from_it = iter(it)
for item in it2_from_it:
    print(item)
# Doesn't output anything as `it` and `it2_from_it`
# are the same object
So a function or other code that expects an iterable must not assume it will be able to iterate over the iterable more than once. If the iterable is also an iterator, it will be exhausted after the first iteration. On further iterations, that code will not get any items from it, which might break it.
We can iterate only once over iterables whose __iter__() method always returns the same iterator. They're often iterators themselves.
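Here is a small sketch of how that can go wrong. The helper total_and_count() is hypothetical; it makes two passes over its argument, so it silently returns a wrong count when it is handed an iterator instead of a list.

```python
def total_and_count(items):
    """Hypothetical helper that iterates over `items` twice."""
    total = sum(items)             # first pass
    count = sum(1 for _ in items)  # second pass
    return total, count


numbers = [1, 2, 3]

from_list = total_and_count(numbers)
from_iterator = total_and_count(iter(numbers))

print(from_list)      # (6, 3)
print(from_iterator)  # (6, 0) - the second pass found nothing
```

One common way around this, when the data fits in memory, is to copy the argument into a list at the top of the function (items = list(items)) before making several passes.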
Different kinds of iterables
We've seen that some iterables will return a new iterator each time we ask for it. But others will not.
Which iterables return a new iterator every time?
When the iterable and the iterator are separate objects, the iterable will return a new iterator each time we ask for it.
Our iterable Bistro is a good example. It creates and returns a new BistroIterator object every time we ask it for an iterator. So we can iterate over our_bistro many times.
Other examples are container objects like list, tuple, dict, etc.
Each time we pass them to the iter() function (or call their __iter__() method), they produce a new iterator.
numbers = [1,2,3]
# get iterator
it = iter(numbers)
next(it)
1
next(it)
2
next(it)
3
next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
# we exhausted iterator `it`
# but we can get a new iterator
it2 = iter(numbers)
next(it2)
1
That is because, same as our Bistro object, they're separate objects from their iterators. They hold data and create an iterator - they have the __iter__() method. But the iterator is a different object, and only the iterator has the __next__() method. They don't.
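A quick sketch with hasattr() confirms this split for a list, and also shows that two iterators obtained from the same list are distinct objects that advance independently:

```python
numbers = [1, 2, 3]

# The list creates iterators but is not one itself.
print(hasattr(numbers, '__iter__'))  # True
print(hasattr(numbers, '__next__'))  # False

first = iter(numbers)
second = iter(numbers)
print(first is second)               # False
print(hasattr(first, '__next__'))    # True

# Advancing one iterator does not move the other.
next(first)
pair = (next(first), next(second))
print(pair)                          # (2, 1)
```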
Which iterables return the same iterator every time?
When an object is both an iterable and an iterator (it has both the __iter__() and __next__() methods), it will return the same iterator - self - each time we ask for it.
Usually, this is the case when an object doesn't store the data we want to iterate over.
One example is an iterator object with the __iter__() method, such as the list iterator we've seen before. It doesn't have any data by itself. It needs another object - the iterable (list) - to read data from. It is an iterator for another object, so it just returns itself when asked for its iterator. It doesn't create another iterator, as it already is one.
But there are other objects which are both iterable and iterator. They have __iter__() and __next__() but are not merely an iterator for another iterable.
These objects still don't store any data, but they do the work of getting data from somewhere.
For example, Python's file object (_io.TextIOWrapper).
It doesn't contain any data. The data is in a file. But it has the __iter__() method as well as the __next__() method. It's not an iterator for another iterable. It is a standalone object that deals with the file to get data, handles file closing, etc.
But when asked for an iterator, it will just return itself.
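We can observe this with a real file. The sketch below writes a small temporary file (the file name staff.txt and its contents are made up for the example), then checks that the file object is its own iterator and that a second pass over it yields nothing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'staff.txt')
    with open(path, 'w') as f:
        f.write('Mary\nJohn\nAlvin\n')

    with open(path) as f:
        # the file object returns itself when asked for an iterator
        same_object = iter(f) is f
        first_pass = [line.strip() for line in f]
        # the file is now at its end, so nothing is left to read
        second_pass = [line.strip() for line in f]

print(same_object)  # True
print(first_pass)   # ['Mary', 'John', 'Alvin']
print(second_pass)  # []
```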
We can iterate only once over objects that are both iterator and iterable.
So:
- Some objects are only iterables.
- Some objects are only iterators, but most iterators are also iterables (even though they are iterators for another iterable).
- And some objects are both iterable and iterator (without being an iterator for another iterable).
We can often use an iterable and an iterator interchangeably. But only when we understand how they work can we properly decide what to use and how to use it, and avoid surprises.
Summary
- An iterator is an object that provides iteration for an iterable.
- We get an iterator from an iterable using the iter() function, which calls the __iter__() method on the iterable.
- We get an item from the iterator by passing it to the next() function.
- The next() function uses the __next__() method of the iterator to get the next item in the iterable. The iterator has knowledge of and access to the iterable.
- An iterator is an object with the __next__() method.
- When there are no more items in the iterable, the iterator is exhausted; in this case, its __next__() method must raise StopIteration.
- A for loop uses the iter() and next() functions to iterate over the iterable.
- The iterator protocol requires the iterator to have the __iter__() method, but it will work without it anyway.
- Most iterators have it, though, as it makes the iterator an iterable, enabling us to use it in places where an iterable is expected.
- The iterator and the iterable can be separate objects.
- When they're separate objects, we can get a new iterator for the iterable when the current iterator is exhausted.
- But they can also be one object which is both iterator and iterable.
- When it's one object, and it is exhausted, we can't get a new iterator, so we can only iterate over such an iterable once.
Happy coding!