I was looking for some more information on how to construct lazy sequences and realized that there is little in the way of documentation on lazy sequences in Clojure. These design notes were the best I could find on the subject and while they are illuminating, I thought the concept could be explained better.
Lazy Sequences are an integral part of the Clojure programming language but one you don’t see in a lot of other programming languages. A sequence, in Clojure, is a sort of metatype or protocol. It signifies a class of datastructures which store items in a logical sequence, such as arrays or lists. Clojure has several types of sequences including classic Lisp-style lists, vectors (which correspond most closely to arrays in other languages,) and lazy sequences.
Lazy sequences are interesting datastructures in the sense that they aren’t really datastructures. Instead they are objects which conform to the sequence API, but which contain a functions which return the next item in the sequence. So while calling first
on a Clojure list, returns the item stored in the first cons cell in the list, calling the same on a lazy sequence instead calls a function which returns both a value and another function. The value is returned by first. Here’s a small subset of the sequence protocol functions and what they do:
Regular Sequence | Lazy Sequence | |
---|---|---|
first |
returns the first item in the sequence | calls a function which returns both a value and another function; returns the value |
rest |
returns a sequence with the first item removed | calls a function which returns both a value and another function; returns the new function |
second |
returns the second item in the sequence | calls the function returned by rest ; this function in turn returns both another value and another function; returns the function |
nth |
returns an item from the sequence n steps from the head | executes a chain of n functions and returns the last value returned |
drop |
returns a new sequence equivalent to the current sequence with the first n items removed | executes a chain of n functions and returns the last function returned |
last |
returns the last item in the sequence | calls every chained function until no more functions are returned; returns the last value returned |
take |
returns a sequence of the first n items in the sequence | executes a chain of n functions and returns a sequence of all the values returned |
It should be clear after reading this that lazy sequences are analogous to old style Lisp cons cells.1 Whereas in a cons cell, the head contains an item and the tail contains a pointer to the next cell in the list, with a lazy sequence, the function returns both an item and the next function in the sequence.
In fact, in Clojure, one can mix cons cell based lists with lazy sequences. A one point in the list the tail of a cons cell is set to the head of a lazy sequence so that the first portion of the sequence is a concrete list while the remainder is an ephemeral lazy sequence. You actually make this happen with the code below:
user> (def foo (conj (lazy-seq [1 2 3]) 0))
#'user/foo
user> (type foo)
clojure.lang.Cons
user> (type (drop 1 foo))
clojure.lang.LazySeq
It’s also worth noting that this can go the other way. A lazy sequence can also return a concrete sequence rather than a lazy sequence pair. This allows structural sharing with other sequence types.
This rather clearly summarized what lazy sequences are, but it doesn’t say much about why one would want to use them. The short answer is that lazy sequence make things possible that would be impossible without them and solve problems that would otherwise have to solved with statefull variables.2
A Clojure lazy sequence allows for an algorithmically generated logical sequence which doesn’t have to exist in memory. Among other things, this makes it possible to have infinite sequences, or to use sequences to represent objects which are too large to load into memory, (such as large files,) or even to represent abstract sequences which don’t exist as such. This also makes it possible to compose sequence operations before actually operating on sequences, which allows for a lot of performance enhancements, such as parallelizing an entire pipeline.
Take for example:
(iterate inc 1)
This will generate a sequence of numbers starting with one and incrementing by one onto infinity. You loop over this with another sequence and have index number available for each item in that sequence, like so:
user> (doall (map #(println (str %2 ": " %1)) persons (range)))
0: Irene
1: Joe
2: Steve
3: Fred
4: Thomas
5: Louise
6: Carlotta
7: Johnson
(nil nil nil nil nil nil nil nil)
That’s a simple example, but line-seq
let’s one operate on a file one line at a time without reading the whole file into memory. tree-seq
let’s one operate on a tree structure as a sequence, one node at a time, without having to convert the whole tree into a sequence.
You’ll notice that lazy sequences fulfill some of the uses of an iterator object in other languages.((In face Clojure has the iterator-seq
function which wraps Java iterators.) In that they create an abstract interface which allows one to loop over anything, without having to worry so much about memory usage or performance. That’s what lazy sequences do.
So how does one generate and use lazy sequence? Well the good news is that most Clojure which emit sequences emit lazy ones or will at least do so if they are passed a lazy sequence in the first place. This includes functions like map
, remove
, filter
, take
, drop
, concat
, and others. To generate a lazy sequence from scratch you can use one of the generator functions:
Some Clojure sequence generator functions | |
---|---|
` range` | A lazy sequence of numbers with an option starting value, maximum value, and step size |
` repeat` | An indefinitely repeating sequence of a constant |
` repeatedly` | An indefinite sequence consisting of the output of repeated calls to the same function |
` iterate` | A lazy sequence of output of repeated calls of a function, passed the output of the previous call to same function |
Each of the functions generates a lazy sequence from some precondition and one can expand on it using map
or similar function.
In addition, one can generate lazy sequences from scratch using the lazy-seq
macro. This macro takes a form which generates a sequences and wraps it in a function which is used as the tail of a lazy sequence. You can recursively generate a lazy sequence then, with a recursive function that wraps it’s body in a call to lazy-seq
. For example:
(defn iterate [fn val]
(cons val (lazy-seq (iterate fn (fn val)))))
Here, ‘iterate
’ returns a lazy sequence, which when invoked, returns val
and another call to iterate with (fn val)
as the input. This is in fact equivalent to the built-in iterate
function. You can use lazy-seq
like this as a wrapper for any function which recursively buildings a list to instead create a lazy sequence.3
- The building blocks of a tradition Lisp list. Read more here. ↩
- A problem in Clojure which attempts to restrict itself to stateless functional programming. This, among other things, helps with Clojure’s vaunted concurrency semantics. ↩
- This has the added benefit of not blowing the stack on long recursions. ↩