t3x.org / sketchy / vol1 / sl08.html

Sketchy LISP

  By Nils M Holm, 2006,2007,2008
Buy a copy at Lulu.com

An Introduction to Functional Programming in Scheme

1.8 Lists, Pairs, and History

A good definition of the list is still lacking. To give one, another type that already has occurred in this book will be explained first: the pair. A pair is what you get when you cons together two atoms. An atom, as the name suggests, is a type that cannot be decomposed. (Atomos is the Greek word for indivisible.) A Scheme object that cannot be decomposed is said to be atomic. The following objects are atomic:

In the previous sections, the second argument of cons has always been a list. So what happens when cons is applied to two atoms? Here is the answer:

(cons 'a 'b) => (a . b)

The two objects are glued together and form something that is called a dotted pair. To understand how his works, it is probably best to disclose some internal details. The cons procedure actually creates a special type of object that is sometimes referred to as an object of the type cons. Its only purpose is to glue together two objects. The dot of a dotted pair may be considered to be the external representation of a the cons object itself.

What is particularly interesting is what happens when the empty list is consed to an atom:

(cons 'a ()) => (a)

You might have expected an object of the form (a . ()) but what you get is a list with a single member. The empty list seems to have vanished, but it has not. In fact a single-element list is a pair whose cdr part is the empty list:

'(a . ())  =  (a)

Gathering all information about lists that have been given so far, the following rules for constructing lists can be devised:

The second case is in fact redundant, because () is a list and so case two is just a variant of case three. This method of constructing lists was used in the self-made append2 procedure at the beginning of this book. Append2 created some nested conses in order to create a new list:

(cons 'a (cons 'b '(c d))) => (a b c d)

Every list can be written in dotted pair notation. For example

(a b c)  =  (a b c . ())
         =  (a b . (c . ()))
         =  (a . (b . (c . ())))

The first line is particularly interesting, because it shows that the cdr part of the last element of a list is always (). This is why null? can be used to detect the end of a list. (Other dialects of LISP call the empty list NIL which means Not In List.) Above illustration also makes clear why taking the cdr part of a list results in a list. Dotted pair notation also can be used to show why consing a list to a list does not append the lists:

(cons '(a b) '(c d))
=> ((a b) . (c d))
=  ((a b) c d)

Above definition for the list implies that the last element of each list must be (). Is it possible, though, to append an atom to a list? Let us see:

(append '(a b c) 'd) => (a b c . d)

The result of this application is called an improper list. It is called improper just because it does not end with (), which every proper list must do. Because the last member of an improper list is not the empty list, the external representation of such a list includes a dot marking its last cons. You can create improper lists not only by appending an atom to a list, but also by just typing them in:

'(a b c . d) => (a b c . d)

However, most procedures of Scheme expect proper lists and reduce to bottom when an improper list is passed to them:

(reverse '(a b . c)) => bottom
(append '(a b . c) '(d)) => bottom

Improper lists are not useless, though. Even the Scheme language itself makes use of them. The details will be discussed in a later section.

1.8.1 Association Lists

A list that consists of pairs exclusively is called an association list (or alist). Alists are frequently used to form environments. Some Scheme implementations even use them to store their own environments, like the lexical environments of closures.

Each pair of an alist is called an association. The car part of an association is called the key and the cdr part is called the value of that association:

((key1 . value1) ... (keyn . valuen))

To retrieve an association from an alist, the assoc procedure is used. It returns the first association containing the given key. In case no matching association is found, it returns #f:

(define alist '((a . 1) (b . 2) (c . 3) (b . 4)))
(assoc 'b alist) => (b . 2)
(assoc 'x alist) => #f

Although assoc always returns a true value when it finds a matching association and false if it does not find one, it is not a predicate, because it returns any true value in case of success and not always #t.

Note that associations created by alists are independent from bindings of symbols. Although the above alist associates symbols with values, it does not change the bindings of any symbols. The symbols themselves serve as keys in alists:

(define x 'foo)
x => foo
(assoc 'x '((x . bar))) => (x . bar)
x => foo

The keys of alists are not really limited to symbols, but they can be of any type. There may even be different types of keys in the same alist:

(define alist '((#f . first)
                ("hello" . second)
                ((a b c) . third)))
      (assoc #f alist) => (#f . first)
 (assoc "hello" alist) => ("hello" . second)
(assoc '(a b c) alist) => ((a b c) . third)

1.8.2 Nested Lists

Another thing you might want to know about lists is how to access members of nested lists efficiently. Of course, the car and cdr procedures can be combined to achieve this, but nested applications of these procedures soon become a mess to read:

(define x '(define (square x) (* x x)))
(car x) => define
(car (cdr x)) => (square x)
(car (car (cdr x))) => square
(car (cdr (car (cdr x)))) => x

You also might want to know why Scheme still uses the names car and cdr (which are artifacts from the early LISP days of the 1950's) to refer to the heads and tails of lists. Other dialects of LISP use names like first and rest or head and tail, so why does Scheme stick with the traditional names? The answer is simple: because they can easily be combined to form new names for procedures that access members of nested lists. For example, a combination of car and cdr is used to access the second element of a list:

(car (cdr '(a b c))) => b

If you take the a of car and the d of cdr, you can use these letters to create the name cadr. This is the name of the Scheme procedure for accessing the car of the cdr of a list (or the head of the tail of a list (or the first member of the rest of a list)):

(cadr '(a b c)) => b

Scheme provides built-in procedures for accessing up to four levels of nesting, such as

              (cddr '(a b c)) => (c)
             (caddr '(a b c)) => c
 (caddr '((a b) (c d) (e f))) => (e f)
(caaddr '((a b) (c d) (e f))) => e
(cadadr '((a b) (c d) (e f))) => d

Figuring out which member a procedure like cadadr produces is simple. Read the a's and the d's of the name backward and perform a car for each a and a cdr for each d, as in the following example:

(cadadr '((a b) (c d) (e f)))  ; do a cdr
-> (cadar ((c d) (e f)))       ; do a car
-> (cadr (c d))                ; do a cdr
-> (car (d))                   ; do a car
=> d

Of course, these procedures are only useful when decomposing lists on the fly. If you plan to use lists as data structures, you would give more descriptive names to procedures extracting specific members. For example, a date could be stored in a list of the form

(year month day)

but of course, you would not use the names car, cadr and caddr to refer to the single fields of a date object. You would define accessor procedures like these:

(define (year date) (car date))
(define (month date) (cadr date))
(define (day date) (caddr date))

Of course, cadr and friends as well as custom names for accessing parts of structured lists are only suitable for handling rather flat and short lists. More complex list structures are typically handled by traversing them. The built-in length procedure, for instance, traverses a flat list and counts its members, resulting in the length of that list:

(length '(a b c)) => 3
(length '(a (b (c) d) (e (f (g))) h)) => 4

The (also built-in) list-ref procedure produces the nth member of a list, where the first member is at position zero:

(list-ref '(a b c) 0) => a
(list-ref '(a b c) 2) => c
(list-ref '(a b c) 3) => bottom

When no appropriate procedure is available, a specialized one must be coded. For example, Scheme has no procedure for computing the depth of an object (probably because it is not terribly useful). However, writing a depth procedure may serve as an interesting exercise. The depth of an object is the maximum number of times that car can be applied to that object. The cars do not have to be in a row, but they can be mixed with any number of cdrs. For example,

          (depth 'x) => 0
        (depth '(x)) => 1
      (depth '((x))) => 2
(depth '(u (v (x)))) => 3

So the depth of each member of a given list has to be computed and then the maximum of the results has to be chosen. To compute the depth of each sublist, depth has to recurse. The trivial case is simple: each atomic object has a depth of zero. The depth of non-atomic objects is the depth of the deepest object contained in it plus one:

(define (depth x)
  (if (not (pair? x))
      0
      (+ 1 (deepest-sublist x))))

The depth of the deeptest sublist is computed by applying depth to each sublist, collecting the results, and then returning the maximum of the results. To do so, a procedure for finding the maximum of a list is required:

(define (max-list a)
  (cond ((null? (cdr a)) (car a))
        ((> (car a) (cadr a))
          (max-list (cons (car a) (cddr a))))
        (else (max-list (cdr a)))))

(Note: the > procedure, which makes sure that its arguments are in numerically strict descending order, is a predicate, even if its name does not end with a question mark.)

The only procedure that is left to implement is deepest-sublist. This procedure has to transform a list of objects into a list of numbers, where each number indicates the depth of the object it replaces. After transforming the list, it is passed to max-list to extract the greatest depth found in the list. Here is the code:

(define (deepest-sublist x)
  (letrec
    ((transform
       (lambda (in out)
         (cond ((null? in) out)
               (else (transform (cdr in)
                                (cons (depth (car in))
                                      out)))))))
    (max-list (transform x '()))))

When looking at the transform procedure, you might notice an opportunity for an optimization in the following expression.

(cons (depth (car in)) out)

Got it? Before doing the next iteration in transform, the depth of the current sublist is consed to out. If the maximum depth found so far was bound to out instead of the list, the new maximum depth could be computed right at this point. In this case, the max-list procedure could be optimized away altogether. Of course, transform and out would have to get more descriptive names, too:

(define (deepest-sublist x)
  (letrec
    ((max-depth
       (lambda (in max-so-far)
         (cond ((null? in) max-so-far)
               (else (max-depth (cdr in)
                                (max (depth (car in))
                                     max-so-far)))))))
    (max-depth x 0)))

Optimizations like this spring to mind quite frequently. The first program rarely is the optimal solution (if such a thing exists anyway). The most important goal is to make an algorithm work at all. Even the first version of depth worked fine, and there is barely a (non-trivial) program that cannot be optimized in one or another way. Do you think that depth cannot be expressed in an even simpler way? We will see.

1.8.3 A Historical Note

The names car and cdr date back to the very first implementation of LISP in the late 1950's. This version of LISP ran on an IBM Type 704 Electronic Data Processing Machine, which was a vacuum tube-based computer.

The memory of that computer consisted of up to 32768 machine words of magnetic core, so 15 bits were sufficient to address each word of storage. A machine word on the IBM 704 had 36 bits, so a complete cons object could be stored in a single word or register.

The registers of the 704 had four parts named prefix, address, tag, and decrement. The address and decrement parts were used to store addresses, so they had a size of 15 bits each. Hence they were used to store the references to the car and cdr part of a cons cell.

The routine for extracting the head of a cons cell was called CAR, which was short for Content of Address part of Register. The routine to extract the tail was called CDR for Content of Decrement part of Register.

Although the technical details vanished when LISP was being ported to other machines, the names car and cdr persisted, and most dialects of LISP still keep them today, because they can easily be combined to form new procedure names like cadr.