t3x.org / sketchy / vol1 / sl22.html

Sketchy LISP

  By Nils M Holm, 2006,2007,2008
Buy a copy at Lulu.com

An Introduction to Functional Programming in Scheme

4 Scheme in the Wild

This chapter illustrates what real-world Scheme code looks like by means of a small, self-contained, and heavily commented utility program.

There is a notation called box notation which is very useful for understanding Scheme data structures. Using box notation, forms like this one

(a b (c . d) e f)

are represented by diagrams like this one:

[o|o]---[o|o]---[o|o]---[o|o]---[o|/]
 |       |       |       |       |      
 a       b       |       e       f      
                 |      
                [o|o]--- d
                 |
                 c

In a box diagram each object of the type cons is represented by a box: [o|o]. The first o of the box points to the car part of the cons, the second one to its cdr part:

(car . cdr)  =  [o|o]--- cdr
                 |
                car

The empty list () is represented by a slash when it occurs at the end of a list:

(a () b)  =  [o|o]---[o|o]---[o|/]
              |       |       |
              a       ()      c

While drawing box diagrams may be a nice exercise, it is a cumbersome task that begs for automation. The program discussed in the following sections will accept any Scheme datum as input and write the corresponding box diagram to its output.

4.1 Drawing Box Diagrams

The draw-tree program, whose code will be shown in the following section, draws ASCII box diagrams that represent Scheme data structures, just like the diagrams introduced in the previous section. In fact draw-tree was used to render these diagrams.

When designing the program, the initial question is, of course, how to draw the diagrams.

So what does a typical Scheme data structure look like? Here is an example:

(a (b c . d) e)  =  [o|o]---[o|o]---[o|/]
                      |       |       |      
                      a       |       e      
                              |      
                             [o|o]---[o|o]--- d      
                              |       |      
                              b       c      

The spine of the structure, which is to be drawn first, is a series of boxes representing the cons cells of the corresponding list. The next line contains bars connecting the spine to the members of that list. The third line consists of the members themselves. Members may be lists, so the process recurses at this point.

One problem is to decide when to render sublists. In the above example, the sublist cannot be drawn directly below the spine, because doing so would allocate the space that is needed to render the e member.

The problem is solved by always rendering the rightmost member of a list first. After rendering e, we know that there is sufficient space to render the second-to-last member below. Atoms are the exception to this rule. They are always rendered as soon as they are encountered. This is why a is displayed immediately in the above example.

The next question to ask is how to represent Scheme data structures internally.

The above approach leads quite directly to the data structure used to store lists internally while rendering them. When a list is encountered, a series of conses is drawn and the list is converted into an internal representation:

(a (b c . d) e)   renders   [o|o]---[o|o]---[o|/]
                  gives     ((V) a (b c . d) e)

(V) is a tag that is used to mark internal structures. It indicates that this list is already being visited.

After drawing the bars, the internal form of the list is processed once more. The atoms a and e are rendered. The sublist (b c . d) is skipped because it is not the last member of the list being visited. A vertical bar is emitted in its place, so it can be rendered later. After displaying an atom, it is replaced with an (N) tag, which represents nothing:

((V) a (b c . d) e)   renders    a       |       e
                      gives     ((V) (N) (b c . d)))

Note that trailing (N)s can be removed immediately. This has been done above.

After drawing another series of bars, the structure is visited once again:

((V) (N) (b c . d))   renders   __________[o|o]---[o|o]--- d      
                      gives     ((V) (N) ((V) b c))

This time the embedded improper list (b c . d) is converted to internal representation and its spine is rendered. The trailing d of the improper list is removed after displaying it.

The underscores (_) represent the spaces that are used to render nothing ((N)), so the (N) tag makes sure that the embedded spine is properly indented.

After drawing another set of bars, the final iteration finishes the process:

((V) (N) ((V) b c))   renders   __________ b       c
                      gives     ((V) (N) ((V)))

Because ((V)) is essentially equal to (N):

  ((V) (N) ((V)))
= ((V) (N) (N))
= ((V))

((V)) represents an empty visited list internally, so there is nothing left to render and the program terminates.

The final question to answer is which algorithm to use. Given the above data structure, the algorithm is rather obvious. Here is a rough sketch:

  1. Draw a set of conses, mark the input list visited (((V) ...));
  2. Draw a set of spaces (for (N) members) and bars (for other members);
  3. Draw a set of objects
  4. Remove trailing (N)s and ((V))s;
  5. Repeat steps 2, 3, 4 until the input list is empty (((V))).

Of course, this algorithm works only as long as its input it a pair. Atomic input has to be handled separately.