Sketchy LISP |
Copyright (C) 2006,2007
Nils M Holm Buy a copy at Lulu.com |
An Introduction to Functional Programming in Scheme
| Previous Section | - Contents - Index - | Next Section |
A good definition of the list is still lacking. To give one, another
type that already has occurred in this book will be explained first: the
pair. A pair is what you get when you
cons together two atoms. An
atom, as the name suggests, is a type that cannot be decomposed.
(Atomos
is the Greek word for indivisible
.) A Scheme object
that cannot be decomposed is said to be atomic
. The following
objects are atomic:
In the previous sections, the second argument of
cons
has always been a list. So what happens when cons is
applied to two atoms? Here is the answer:
(cons 'a 'b) => (a . b)
The two objects are glued together
and form something that
is called a dotted pair. To
understand how his works, it is probably best to disclose some
internal details. The cons procedure actually creates
a special type of object that is sometimes referred to as an
object of the type cons.
Its only purpose is to glue together two objects. The dot of a
dotted pair may be considered to be the external representation of
a the cons object itself.
What is particularly interesting is what happens when the empty list is consed to an atom:
(cons 'a ()) => (a)
You might have expected an object of the form
(a . ()) but what you get is a list with a
single member. The empty list seems to have vanished, but it has
not. In fact a single-element list is a pair whose
cdr part is the empty list:
'(a . ()) = (a)
Gathering all information about lists that have been given so far, the following rules for constructing lists can be devised:
() is a list (obviously);(Cons x '()) is a list for any object x;(cons x y) is a list.The second case is in fact redundant, because () is
a list and so case two is just a variant of case three. This method
of constructing lists was used in the self-made append2
procedure at the beginning of this book. Append2 created
some nested conses in order to create a new list:
(cons 'a (cons 'b '(c d))) => (a b c d)
Every list can be written in dotted pair notation. For example
(a b c) = (a b c . ())
= (a b . (c . ()))
= (a . (b . (c . ())))
The first line is particularly interesting, because it shows that
the cdr part of the last element of a list is always ().
This is why null? can be used to detect the end of a
list. (Other dialects of LISP call the empty list NIL
which means Not In List
.) Above illustration also
makes clear why taking the cdr part of a list results in a list.
Dotted pair notation also can be used to show why consing a list
to a list does not append the lists:
(cons '(a b) '(c d)) => ((a b) . (c d)) = ((a b) c d)
Above definition for the list implies that the last element of
each list must be (). Is it possible, though, to
append an atom to a list? Let us see:
(append '(a b c) 'd) => (a b c . d)
The result of this application is called an
improper list. It is called
improper
just because it does not end with (),
which every proper list must do. Because the last member
of an improper list is not the empty list, the external representation
of such a list includes a dot marking its last cons. You can create
improper lists not only by appending an atom to a list, but also by
just typing them in:
'(a b c . d) => (a b c . d)
However, most procedures of Scheme expect proper lists and reduce to bottom when an improper list is passed to them:
(reverse '(a b . c)) => bottom (append '(a b . c) '(d)) => bottom
Improper lists are not useless, though. Even the Scheme language itself makes use of them. The details will be discussed in a later section.
A list that consists of pairs exclusively is called an association list (or alist). Alists are frequently used to form environments. Some Scheme implementations even use them to store their own environments, like the lexical environments of closures.
Each pair of an alist is called an association
. The car part
of an association is called the key
and the cdr part is called the
value
of that association:
((key1 . value1) ... (keyn . valuen))
To retrieve an association from an alist, the
assoc procedure
is used. It returns the first association containing the given key. In
case no matching association is found, it returns #f:
(define alist '((a . 1) (b . 2) (c . 3) (b . 4))) (assoc 'b alist) => (b . 2) (assoc 'x alist) => #f
Although assoc always returns a true value when it finds
a matching association and false if it does not find one, it is not a
predicate, because it returns any true value in case of success and not
always #t.
Note that associations created by alists are independent from bindings of symbols. Although the above alist associates symbols with values, it does not change the bindings of any symbols. The symbols themselves serve as keys in alists:
(define x 'foo) x => foo (assoc 'x '((x . bar))) => (x . bar) x => foo
The keys of alists are not really limited to symbols, but they can be of any type. There may even be different types of keys in the same alist:
(define alist '((#f . first)
("hello" . second)
((a b c) . third)))
(assoc #f alist) => (#f . first)
(assoc "hello" alist) => ("hello" . second)
(assoc '(a b c) alist) => ((a b c) . third)
Another thing you might want to know about lists is how to
access members of nested lists efficiently. Of course, the
car and cdr procedures can be combined
to achieve this, but nested applications of these procedures soon
become a mess to read:
(define x '(define (square x) (* x x))) (car x) => define (car (cdr x)) => (square x) (car (car (cdr x))) => square (car (cdr (car (cdr x)))) => x
You also might want to know why Scheme still uses the names
car and cdr (which are artifacts from
the early LISP days of the 1950's) to refer to the heads and
tails of lists. Other dialects of LISP use names like first
and rest
or head
and tail
, so why does Scheme
stick with the traditional names? The answer is simple: because
they can easily be combined to form new names for procedures that
access members of nested lists. For example, a combination of
car and cdr is used to access the
second element of a list:
(car (cdr '(a b c))) => b
If you take the a
of car and the d
of cdr, you can use these letters to create the name
cadr. This is the name of the Scheme procedure for
accessing the car of the cdr of a list (or the head of the tail
of a list (or the first member of the rest of a list)):
(cadr '(a b c)) => b
Scheme provides built-in procedures for accessing up to four levels of nesting, such as
(cddr '(a b c)) => (c)
(caddr '(a b c)) => c
(caddr '((a b) (c d) (e f))) => (e f)
(caaddr '((a b) (c d) (e f))) => e
(cadadr '((a b) (c d) (e f))) => d
Figuring out which member a procedure like
cadadr
produces is simple. Read the a's and the d's of the name backward
and perform a car for each a
and a cdr
for each d
, as in the following example:
(cadadr '((a b) (c d) (e f))) ; do a cdr -> (cadar ((c d) (e f))) ; do a car -> (cadr (c d)) ; do a cdr -> (car (d)) ; do a car => d
Of course, these procedures are only useful when decomposing lists on the fly. If you plan to use lists as data structures, you would give more descriptive names to procedures extracting specific members. For example, a date could be stored in a list of the form
(year month day)
but of course, you would not use the names car,
cadr and caddr to refer to the single
fields of a date object. You would define accessor procedures like
these:
(define (year date) (car date)) (define (month date) (cadr date)) (define (day date) (caddr date))
Of course, cadr and friends as well as custom names
for accessing parts of structured lists are only suitable for handling
rather flat and short lists. More complex list structures are typically
handled by traversing
them. The built-in
length
procedure, for instance, traverses a flat list and counts its members,
resulting in the length of that list:
(length '(a b c)) => 3 (length '(a (b (c) d) (e (f (g))) h)) => 4
The (also built-in)
list-ref
procedure produces the nth member of a list, where the first
member is at position zero:
(list-ref '(a b c) 0) => a (list-ref '(a b c) 2) => c (list-ref '(a b c) 3) => bottom
When no appropriate procedure is available, a specialized one
must be coded. For example, Scheme has no procedure for
computing the depth of an object (probably because it
is not terribly useful). However, writing a depth procedure may serve
as an interesting exercise. The depth of an object is the maximum
number of times that car can be applied to that object.
The cars do not have to be in a row, but they can be
mixed with any number of cdrs. For example,
(depth 'x) => 0
(depth '(x)) => 1
(depth '((x))) => 2
(depth '(u (v (x)))) => 3
So the depth of each member of a given list has to be computed and then the maximum of the results has to be chosen. To compute the depth of each sublist, depth has to recurse. The trivial case is simple: each atomic object has a depth of zero. The depth of non-atomic objects is the depth of the deepest object contained in it plus one:
(define (depth x)
(if (not (pair? x))
0
(+ 1 (deepest-sublist x))))
The depth of the deeptest sublist is computed by applying depth to each sublist, collecting the results, and then returning the maximum of the results. To do so, a procedure for finding the maximum of a list is required:
(define (max-list a)
(cond ((null? (cdr a)) (car a))
((> (car a) (cadr a))
(max-list (cons (car a) (cddr a))))
(else (max-list (cdr a)))))
(Note: the
> procedure,
which makes sure that its arguments are in numerically strict descending
order, is a predicate, even if its name does not end with a question mark.)
The only procedure that is left to implement is deepest-sublist. This procedure has to transform a list of objects into a list of numbers, where each number indicates the depth of the object it replaces. After transforming the list, it is passed to max-list to extract the greatest depth found in the list. Here is the code:
(define (deepest-sublist x)
(letrec
((transform
(lambda (in out)
(cond ((null? in) out)
(else (transform (cdr in)
(cons (depth (car in))
out)))))))
(max-list (transform x '()))))
When looking at the transform procedure, you might notice an opportunity for an optimization in the following expression.
(cons (depth (car in)) out)
Got it? Before doing the next iteration in transform, the depth of the current sublist is consed to out. If the maximum depth found so far was bound to out instead of the list, the new maximum depth could be computed right at this point. In this case, the max-list procedure could be optimized away altogether. Of course, transform and out would have to get more descriptive names, too:
(define (deepest-sublist x)
(letrec
((max-depth
(lambda (in max-so-far)
(cond ((null? in) max-so-far)
(else (max-depth (cdr in)
(max (depth (car in))
max-so-far)))))))
(max-depth x 0)))
Optimizations like this spring to mind quite frequently. The first program rarely is the optimal solution (if such a thing exists anyway). The most important goal is to make an algorithm work at all. Even the first version of depth worked fine, and there is barely a (non-trivial) program that cannot be optimized in one or another way. Do you think that depth cannot be expressed in an even simpler way? We will see.
The names
car
and cdr
date back to the very first implementation of LISP in the late
1950's. This version of LISP ran on an IBM Type 704
Electronic Data Processing Machine
, which was a vacuum
tube-based computer.
The memory of that computer consisted of up to 32768 machine words of magnetic core, so 15 bits were sufficient to address each word of storage. A machine word on the IBM 704 had 36 bits, so a complete cons object could be stored in a single word or register.
The registers of the 704 had four parts named prefix
,
address
, tag
, and decrement
. The address
and decrement parts were used to store addresses, so they had
a size of 15 bits each. Hence they were used to store the
references to the car and cdr part of a cons cell.
The routine for extracting the head of a cons cell was
called CAR, which was short for Content of
Address part of Register
. The routine to extract the
tail was called CDR for Content of Decrement
part of Register
.
Although the technical details vanished when LISP was being
ported to other machines, the names car and
cdr persisted, and most dialects of LISP still
keep them today, because they can easily be combined to form
new procedure names like cadr.
| Previous Section | - Contents - Index - | Next Section |