Lazy Iterator Interfaces
How iterators can help us develop flexible, reusable code by separating looping policy from object generation.
This article continues the story about iterators and lazy computation. We are going to iterate (pun intended) together over the interface of a function to explore how Python iterators, and the related functions in the standard library, can improve the versatility and power of your functions. We will see how to abstract away looping conditions and then loops themselves. The article concludes with some code that clearly separates the code inside a loop from the loop itself, so that both components can be reused independently.
Introduction
Let us assume that we are building a math-heavy application that will rely significantly on Newton's method to find roots (zeroes) of functions. Here is its code in Python:
def newton_update(f):
    def update(x):
        return x - f(x) / approx_derivative(f, x)
    return update

def approx_derivative(f, x, delta=1e-5):
    df = f(x + delta) - f(x)
    return df / delta
Credit: https://wizardforcel.gitbooks.io/sicp-in-python/content/6.html
newton_update is a higher-order function: it takes as input the function f whose zero we want to find, and returns a new function that computes one step of Newton's method. It is, in fact, an iterative method; the details can be found on Wikipedia.
The basic idea is that we start from a point, and the update step gives us a new point closer to the function's zero. This becomes the next input to the update step, and so on, until we reach a zero or a very close approximation.
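To make the update rule concrete, here is a minimal sketch of a single step, checked by hand for f(x) = x² - 16 (the definitions from above are reproduced so the snippet is self-contained; the starting point 3.0 is an arbitrary choice):

```python
def approx_derivative(f, x, delta=1e-5):
    # finite-difference approximation of f'(x)
    return (f(x + delta) - f(x)) / delta

def newton_update(f):
    def update(x):
        return x - f(x) / approx_derivative(f, x)
    return update

f = lambda x: x * x - 16   # one of its zeros is x = 4
step = newton_update(f)

x = 3.0
x = step(x)   # 3 - (9 - 16) / f'(3) = 3 - (-7) / 6 = 25/6, about 4.167
```

One step already moves us from 3.0 to within 0.17 of the zero; each further step roughly squares the error, which is why so few iterations are needed later in the article.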
Looping
Now, we can leave it as it is and let the client code loop over it. So one possibility would be:
def iter_sqrt16(x):
    # when the return value is 0, x is the square root of 16
    return x*x - 16

target = iter_sqrt16
f = newton_update(target)
init_point = 0.0
x0, x1 = init_point, f(init_point)
tol = 1e-4
while abs(x1 - x0) > tol and x1 != 0.0:  # when increments are too small, we stop
    new_val = f(x1)
    x0, x1 = x1, new_val
and this works, but since we are going to write this loop many times for different functions f, we may want to abstract it away.
def newton_solver0(target, init_point, tol=1e-4):
    f = newton_update(target)
    x0, x1 = init_point, f(init_point)
    while abs(x1 - x0) > tol and x1 != 0.0:
        new_val = f(x1)
        x0, x1 = x1, new_val
    return x1
>>> newton_solver0(iter_sqrt16, 0.0)
4.000000000000008
So far so good. Since it is an iterative method it does not provide exact results, but for most practical purposes our result is identical to 4.0. Now, let us suppose we have a complicated function for which the absolute tolerance is not a good criterion. We would rather use a relative tolerance: if a step is small relative to the current value, we consider it done. However, we still have to take the absolute tolerance into account, both in the arguments and in the logic.
def newton_solver1(target, init_point, atol=1e-4, rtol=0.0):
    f = newton_update(target)
    eps = 1e-8
    x0, x1 = init_point, f(init_point)
    while x1 != 0.0 and abs(x1 - x0) > atol and abs((x1 - x0) / (x0 + eps)) > rtol:
        new_val = f(x1)
        x0, x1 = x1, new_val
    return x1
>>> print(newton_solver1(iter_sqrt16, 0.0, atol=0.0, rtol=1e-4))
4.000000006523922
We see that things are starting to get out of control. Still, suppose we are asked to add yet another termination condition, one that stops the loop after a predetermined number of iterations. Here is the new code:
def newton_solver2(target, init_point, max_iter=50, atol=1e-4, rtol=0.0):
    f = newton_update(target)
    eps = 1e-8
    x0, x1 = init_point, f(init_point)
    steps = 0
    while x1 != 0.0 and steps < max_iter and abs(x1 - x0) > atol and abs((x1 - x0) / (x0 + eps)) > rtol:
        new_val = f(x1)
        x0, x1 = x1, new_val
        steps += 1
    return x1
>>> print(newton_solver2(iter_sqrt16, 0.0, max_iter=50, atol=0.0, rtol=0.0))
4.0
It turns out that just a few iterations are sufficient to zero out this function.
Now, I would like to invite you to observe the complexity of the stopping criterion, and consider that Newton's algorithm itself is not that complicated. Also, the real function would need to be even longer, because we would need to validate all the parameters; for instance, the tolerances and max_iter cannot be lower than 0. So, much more code is needed here.
If you are wondering who would ever write a function like this, please note that these are only a subset of the parameters accepted by scipy.optimize.newton, which is a more sophisticated version of this code.
Yet, this solution is not satisfactory. Can we do better? It turns out we can, by working at a higher level of abstraction.
Abstract the condition
Instead of hard-coding the condition, we can provide a clear interface for it and let the client code define it.
def newton_solver3(target, init_point, cond):
    # cond takes as arguments (x0, x1, steps) and returns a boolean
    # that is True when the loop should continue
    f = newton_update(target)
    x0, x1 = init_point, f(init_point)
    steps = 0
    while x1 != 0.0 and cond(x0, x1, steps):
        new_val = f(x1)
        x0, x1 = x1, new_val
        steps += 1
    return x1

def cond(x0, x1, steps):
    return steps < 100 and abs(x1 - x0) > 1e-8
>>> newton_solver3(iter_sqrt16, 1.0, cond)
4.0
The tradeoff we are making in this version is to keep the number of arguments under control, at the expense of some additional work by the client code. Reducing the number of arguments reduces the cognitive load for a reader of the function: they have to understand the meaning of fewer arguments and, most importantly, we remove arguments that would interact with each other. The function interface simply becomes easier to understand. On the other side, since a user of the function needs to write cond, it must be very clear how it should be defined, and having some example functions can help. We may provide the following default functions, corresponding to the previous conditions, and show users how to adapt them for use with newton_solver3:
def atol_cond(x0, x1, steps, atol=1e-8):
    return abs(x1 - x0) > atol

def rtol_cond(x0, x1, steps, rtol=1e-5, eps=1e-8):
    return abs((x1 - x0) / (x0 + eps)) > rtol

def maxsteps_cond(x0, x1, steps, maxiter=50):
    return steps < maxiter
# Adapt these functions with functools.partial if you want to replace the default values
#
# from functools import partial
# my_atol = partial(atol_cond, atol=1e-4)
# my_maxsteps = partial(maxsteps_cond, maxiter=100)
#
# Then you can combine them
# new_cond = lambda x0, x1, s: my_atol(x0, x1, s) and my_maxsteps(x0, x1, s)
#
# And finally use it with newton_solver3
# newton_solver3(iter_sqrt16, 1.0, new_cond)
Admittedly, this is a bit less immediate to use, but it offers more flexibility. For instance, the client code may define some closures to build conditions that depend on variables defined elsewhere.
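As a sketch of that idea, a condition can be built as a closure over parameters chosen by the client code (the names make_cond and my_cond below are illustrative, not part of any established API):

```python
def make_cond(maxiter, atol):
    # returns a condition function; maxiter and atol are captured
    # from the enclosing scope, so they can come from anywhere
    # (config files, command-line arguments, other computations...)
    def cond(x0, x1, steps):
        return steps < maxiter and abs(x1 - x0) > atol
    return cond

my_cond = make_cond(maxiter=100, atol=1e-8)

print(my_cond(0.0, 1.0, 0))   # large step, few iterations taken: keep looping
print(my_cond(4.0, 4.0, 0))   # step below atol: stop
```

The resulting my_cond has the exact (x0, x1, steps) signature that newton_solver3 expects, yet its behavior was configured elsewhere.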
But enough with the condition. Another way to achieve a higher level of abstraction is to abstract away the loop itself and let the client code decide entirely what to do with the values.
Abstract Loops with Iterators
Let us start with a couple of helper functions:
from itertools import islice
def iterate(f, x):
    "Return an infinite sequence of x, f(x), f(f(x)), f(f(f(x))) ..."
    while True:
        yield x
        x = f(x)

def take(n, iterable):
    "Return first n items of the iterable as a list. Taken from https://docs.python.org/3/library/itertools.html"
    return list(islice(iterable, n))
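As a quick sanity check, independent of Newton's method, here is how iterate and take compose on a trivial doubling function:

```python
from itertools import islice

def iterate(f, x):
    "Return an infinite sequence of x, f(x), f(f(x)), ..."
    while True:
        yield x
        x = f(x)

def take(n, iterable):
    "Return first n items of the iterable as a list."
    return list(islice(iterable, n))

# the infinite sequence 1, 2, 4, 8, ... is never materialized;
# take pulls only the five items we ask for
powers = take(5, iterate(lambda x: 2 * x, 1))
print(powers)  # [1, 2, 4, 8, 16]
```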
Now newton_solver can simply be defined in terms of iterate:
def newton_solver4(target, init_point):
    f = newton_update(target)
    yield from iterate(f, init_point)
Defining iterate is not strictly necessary here, but I wanted to show how a particular loop pattern can be abstracted away, so that in our function we can focus only on the details that are really important for that function.
Our newton_solver4 is now an iterator over an infinite sequence, so it is the client code's job to decide how many times to call it. Recall from our previous articles that iterators can produce infinite sequences because they are "lazy": they produce results on demand.
The client code can now produce an exact number of steps by using the take function defined above:
>>> take(10, newton_solver4(iter_sqrt16, 1.0))
[1.0, 8.4999624998022, 5.191163819072843, 4.136663168217681, 4.00225763674572, 4.000000639575587, 4.000000000000851, 4.0, 4.0, 4.0]
or just wait for the function value to become almost zero (not advisable):
def high_precision_solver(solver, f, thresh):
    # produce values till they are close enough (by thresh) to the
    # desired value; return the list with all the produced values
    l = []
    for x in solver:
        l.append(x)
        if abs(f(x)) < thresh:
            break
    return l
>>> high_precision_solver(newton_solver4(iter_sqrt16, 1.0), iter_sqrt16, thresh=1e-24)
[1.0, 8.4999624998022, 5.191163819072843, 4.136663168217681, 4.00225763674572, 4.000000639575587, 4.000000000000851, 4.0]
Once again, we have separated the looping logic from the production of the results.
Do we want to iterate based on a difference tolerance?
def tolerance_solver(solver, atol):
    x0 = next(solver)  # get the first value
    l = [x0]
    for x1 in solver:
        l.append(x1)
        if abs(x1 - x0) < atol:
            break
        x0 = x1
    return l
>>> tolerance_solver(newton_solver4(iter_sqrt16, 1.0), atol=1e-24)
[1.0, 8.4999624998022, 5.191163819072843, 4.136663168217681, 4.00225763674572, 4.000000639575587, 4.000000000000851, 4.0, 4.0]
This last function does not look much simpler than our original solver, except for one thing: it works on an iterable; it knows nothing about functions to optimize. This lets us update a single variable at a time, while the other is assigned from the iterable without explicit function calls (x0 = x1 vs x0, x1 = x1, f(x1)). It is simpler in the end, and it is not limited to our Newton solver, since all it does is retrieve values from an iterator.
And this is the main point of this article. Iterators are a powerful abstraction that allows for great reusability and enables lazy computation. Any kind of sequence computation can be hidden behind the iterator interface. Moreover, the take-home message is that they make it easy to separate the logic that generates objects from the logic that loops over them - the part that decides when and how to get a new item, or whether to stop.
The item-generation logic - in our example, the solver - can be reused with many loops. At the same time, different looping functions can be reused with different iterators. By searching for this kind of repetition in your code you can obtain great benefits, like reducing the code size and, consequently, testing your existing code more thoroughly. Code quality will surely benefit.