Lazy Sequences in Clojure

I was look­ing for some more infor­ma­tion on how to con­struct lazy sequences and real­ized that there is lit­tle in the way of doc­u­men­ta­tion on lazy sequences in Clo­jure. These design notes were the best I could find on the sub­ject and while they are illu­mi­nat­ing, I thought the con­cept could be explained bet­ter.

Lazy Sequences are an inte­gral part of the Clo­jure pro­gram­ming lan­guage but one you don’t see in a lot of other pro­gram­ming lan­guages. A sequence, in Clo­jure, is a sort of metatype or pro­to­col. It sig­ni­fies a class of datas­truc­tures which store items in a log­i­cal sequence, such as arrays or lists. Clo­jure has sev­eral types of sequences includ­ing clas­sic Lisp-style lists, vec­tors (which cor­re­spond most closely to arrays in other lan­guages,) and lazy sequences.

Lazy sequences are inter­est­ing datas­truc­tures in the sense that they aren’t really datas­truc­tures. Instead they are objects which con­form to the sequence API, but which con­tain a func­tions which return the next item in the sequence. So while call­ing first on a Clo­jure list, returns the item stored in the first cons cell in the list, call­ing the same on a lazy sequence instead calls a func­tion which returns both a value and another func­tion. The value is returned by first. Here’s a small sub­set of the sequence pro­to­col func­tions and what they do:

  Regular Sequence Lazy Sequence
first returns the first item in the sequence calls a function which returns both a value and another function; returns the value
rest returns a sequence with the first item removed calls a function which returns both a value and another function; returns the new function
second returns the second item in the sequence calls the function returned by rest; this function in turn returns both another value and another function; returns the function
nth returns an item from the sequence n steps from the head executes a chain of n functions and returns the last value returned
drop returns a new sequence equivalent to the current sequence with the first n items removed executes a chain of n functions and returns the last function returned
last returns the last item in the sequence calls every chained function until no more functions are returned; returns the last value returned
take returns a sequence of the first n items in the sequence executes a chain of n functions and returns a sequence of all the values returned

It should be clear after read­ing this that lazy sequences are anal­o­gous to old style Lisp cons cells.1 Whereas in a cons cell, the head con­tains an item and the tail con­tains a pointer to the next cell in the list, with a lazy sequence, the func­tion returns both an item and the next func­tion in the sequence.

In fact, in Clo­jure, one can mix cons cell based lists with lazy sequences. A one point in the list the tail of a cons cell is set to the head of a lazy sequence so that the first por­tion of the sequence is a con­crete list while the remain­der is an ephemeral lazy sequence. You actu­ally make this hap­pen with the code below:

user> (def foo (conj (lazy-seq [1 2 3]) 0))
user> (type foo)
user> (type (drop 1 foo))

It’s also worth not­ing that this can go the other way. A lazy sequence can also return a con­crete sequence rather than a lazy sequence pair. This allows struc­tural shar­ing with other sequence types.

This rather clearly sum­ma­rized what lazy sequences are, but it does­n’t say much about why one would want to use them. The short answer is that lazy sequence make things pos­si­ble that would be impos­si­ble with­out them and solve prob­lems that would oth­er­wise have to solved with state­full variables.2

A Clo­jure lazy sequence allows for an algo­rith­mi­cally gen­er­ated log­i­cal sequence which does­n’t have to exist in mem­o­ry. Among other things, this makes it pos­si­ble to have infi­nite sequences, or to use sequences to rep­re­sent objects which are too large to load into mem­o­ry, (such as large files,) or even to rep­re­sent abstract sequences which don’t exist as such. This also makes it pos­si­ble to com­pose sequence oper­a­tions before actu­ally oper­at­ing on sequences, which allows for a lot of per­for­mance enhance­ments, such as par­al­leliz­ing an entire pipeline.

Take for example:

(iterate inc 1)

This will gen­er­ate a sequence of num­bers start­ing with one and incre­ment­ing by one onto infin­i­ty. You loop over this with another sequence and have index num­ber avail­able for each item in that sequence, like so:

user> (doall (map #(println (str %2 ": " %1)) persons (range)))
0: Irene
1: Joe
2: Steve
3: Fred
4: Thomas
5: Louise
6: Carlotta
7: Johnson
(nil nil nil nil nil nil nil nil)

That’s a sim­ple exam­ple, but line-seq let’s one oper­ate on a file one line at a time with­out read­ing the whole file into mem­o­ry. tree-seq let’s one oper­ate on a tree struc­ture as a sequence, one node at a time, with­out hav­ing to con­vert the whole tree into a sequence.

You’ll notice that lazy sequences ful­fill some of the uses of an iter­a­tor object in other lan­guages.((In face Clo­jure has the iterator-seq func­tion which wraps Java iter­a­tors.) In that they cre­ate an abstract inter­face which allows one to loop over anything, with­out hav­ing to worry so much about mem­ory usage or per­for­mance. That’s what lazy sequences do.

So how does one gen­er­ate and use lazy sequence? Well the good news is that most Clo­jure which emit sequences emit lazy ones or will at least do so if they are passed a lazy sequence in the first place. This includes func­tions like map, remove, filter, take, drop, concat, and oth­ers. To gen­er­ate a lazy sequence from scratch you can use one of the gen­er­a­tor functions:

Some Clojure sequence generator functions  
` range` A lazy sequence of numbers with an option starting value, maximum value, and step size
` repeat` An indefinitely repeating sequence of a constant
` repeatedly` An indefinite sequence consisting of the output of repeated calls to the same function
` iterate` A lazy sequence of output of repeated calls of a function, passed the output of the previous call to same function

Each of the func­tions gen­er­ates a lazy sequence from some pre­con­di­tion and one can expand on it using map or sim­i­lar func­tion.

In addi­tion, one can gen­er­ate lazy sequences from scratch using the lazy-seq macro. This macro takes a form which gen­er­ates a sequences and wraps it in a func­tion which is used as the tail of a lazy sequence. You can recur­sively gen­er­ate a lazy sequence then, with a recur­sive func­tion that wraps it’s body in a call to lazy-seq. For example:

(defn iterate [fn val]
  (cons val (lazy-seq (iterate fn (fn val)))))

Here, ‘iterate’ returns a lazy sequence, which when invoked, returns val and another call to iter­ate with (fn val) as the input. This is in fact equiv­a­lent to the built-in iterate func­tion. You can use lazy-seq like this as a wrap­per for any func­tion which recur­sively build­ings a list to instead cre­ate a lazy sequence.3

  1. The building blocks of a tradition Lisp list. Read more here
  2. A problem in Clojure which attempts to restrict itself to stateless functional programming. This, among other things, helps with Clojure’s vaunted concurrency semantics. 
  3. This has the added benefit of not blowing the stack on long recursions. 

Last update: 29/11/2013

blog comments powered by Disqus