t3x.org / sketchy / vol1 / sl12.html

Sketchy LISP

  By Nils M Holm, 2006,2007,2008
Buy a copy at Lulu.com

An Introduction to Functional Programming in Scheme

2.4 Dynamic Typing

Like most popular languages, Scheme is a typed language which means that there are different types of objects, like numbers, booleans, chars, strings, etc. Most contemporary languages like C or Java are statically typed languages, though, while Scheme uses a system called dynamic typing. In statically typed languages, types are associated with variables. A variable of a given type can be used to store objects of that type. The dynamic typing approach is much more flexible. It is made possible by the fact that Scheme does not store data in variables but only stores references to data.

Scheme variables do not have types. They are just symbolic names and you can bind any type of object to them. The type information is carried in the objects themselves. For example, the object 127 is of the type integer, the object "Hello, World!" is of the type string, and the object (a b c) is of the type pair. A value that carries its type information with it is also called a boxed value.

Many procedures accept only arguments of one specific type. For instance, the reverse procedure expects a list:

(reverse '(a b c)) => (c b a)

Passing any other type to the procedure will make it fail:

(reverse 'non-list) => bottom

The actual behavior of the reverse procedure depends on the Scheme implementation you are using. Most versions will tell you that you applied a procedure expecting a list to a symbol or print a similarly informative message. In an environment where reverse is a user-level procedure, though, the system may not perform any type checking at all when reverse is applied to an argument. When the procedure is applied to a non-list, it simply fails as soon as the first primitive is hit. In such an environment the reverse procedure may look like this:

(define (reverse a)
  (letrec
    ((reverse2
       (lambda (a b)
         (cond ((null? a) b)
               (else (reverse2 (cdr a)
                               (cons (car a) b)))))))
    (reverse2 a '())))

When this procedure is applied to the symbol non-list, the following happens:

(reverse 'non-list)
-> (reverse2 non-list ())
-> (cond ((null? non-list) b)
         (else (reverse2 (cdr non-list)
                         (cons (car non-list) b))))
-> (reverse2 (cdr non-list)
             (cons (car non-list) b))

At this point, Scheme attempts to take the cdr of a symbol:

(cdr non-list) => bottom

and because Scheme procedures are strict, reverse also reduces to bottom. The system will then print a message like this:

reverse2: cdr: expected a pair, but got: non-list

It tells you that the application of cdr to an object that is not a pair was attempted. The system also tells you that this happened inside of the reverse2 procedure and that the offending object was the symbol non-list.

It is a common practice in Scheme to let procedures fail like this. Of course, you could intercept such type errors by checking the argument using a predicate like list?, but all this would buy you was a slightly improved error message. This is why Scheme programmers normally do not bother with adding such redundant checks.

Note that adding type checks does not improve safety, because all primitive procedures of Scheme perform type checking internally. This is why the cdr procedure finally caught the bad type of the argument of reverse. All primitives abort evaluation immediately when an object of the wrong type is passed to them, and because all procedures are ultimately composed of primitives, a wrong type sooner or later will hit such a primitive. The following example will illustrate this principle a bit more. Let us assume that the char<=? predicate was a user-level procedure. It would probably be defined in such a way:

(define (char<=? a b) (not (char>? a b)))

It is based on the char>? predicate, which could be a user-level procedure, too:

(define (char>? a b) (char<? b a))

Char<?, finally, is a primitive procedure which expects two arguments of the type char. Now imagine a procedure called hex-digit? which is based upon char<=?:

(define (hex-digit? x)
  (or (char<=? #\0 x #\9)
      (char<=? #\a x #\f)))

When a non-char is passed to hex-digit?, evaluation continues until the primititive procedure char<? is applied to it. First Hex-digit? calls char<=?, which calls char>?, which in turn calls char<?. Char<?, finally, detects the wrong type and reduces to bottom.

But what happens if a type error is never caught at all? Type errors are feared by programmers all over the world. Once more the answer is: relax. A type error that is not caught in Scheme is not a type error at all. Either an object hits a primitive that does not know how to handle it, or the evaluation terminates normally and delivers a proper normal form.

In fact, there have been quite a few procedures in this book that do not depend on specific types. For example, the cons procedure can be used to glue together objects of any type:

    (cons 'a #f) => (a . #f)
(cons "foo" 123) => ("foo" . 123)
(cons #\x '(25)) => (#\x 25)

The eq? procedure also accepts arguments of any type, although it does not always reduce to a valid result. This is not a type error, though, but a deliberate design decision. If the set of types accepted by eq? was restricted, eq? would not work, as the following examples show:

(define (null? x) (eq? x '()))
(define (not x) (eq? x #f))

The above definitions of null? and not fully comply with the Scheme standard (but they are mostly implemented as primitives). Instead of saying that eq? allows type errors, it should be considered a procedure that does both type checking and checking for identity, because

(eq? x y) => #t

implies that x and y have the same type.

There are many other procedures and pseudo functions that accept any type, like quote and pair?, but one of these procedures is particularly interesting: equal?. It is interesting because it makes use of type predicates. A type predicate is used to check whether a given object has a given type. Pair?, for instance, returns truth only if its argument is a pair. Of course, type predicates have to accept any type of argument themselves, or they could not work. The following type predicates exist in Scheme:

boolean?  char?  number?  pair?  port?  procedure?
string?  symbol?  vector?

Any data object of the Scheme languages satisfies exactly one of the above predicates. No object may have more than one type and all types are different. This principle is called disjointness of types. In case you wonder why the list? predicate is not in the above list: The list is a special case of the pair, so it is sufficient to include the pair in the list of fundamental types. Furthermore, adding a special case to the list of fundamental types would break disjointness of types because

(list? '(a)) => #t
(pair? '(a)) => #t

The examples given in this section show that dynamic typing provides a high level of flexibility and a high level of safety at the same time. It is particularly suitable for rapid prototyping, because it does not force the programmer to make decisions about types where no such decision is required. Statically typed languages that employ strong typing (like Pascal) often lack flexibility, and languages employing weak typing (like C) typically sacrifice safety for flexibility. Dynamic typing combines the safety of strong static typing with the flexibility of weak typing.

Another thing most programmers are concerned about is efficiency, and boxed values decrease performance, because they do not fit in registers. However, optimizing Scheme compilers can decide to unbox values in contexts where their type information is not needed. For example, when compiling the distance-between-points procedure introduced earlier, a compiler may find out that the type information of the arguments is not used:

(define (distance-between-points x y x2 y2)
  (let ((dx (abs (- (unbox x2) (unbox x))))
        (dy (abs (- (unbox y2) (unbox y)))))
    (box (sqrt (+ (* dx dx)
                  (* dy dy))))))

Therefore it is sufficient to unbox the arguments when the procedure is entered and re-box the result before returning it. Calls to the fictitious procedures box and unbox have been added to the code above in order to illustrate this principle. An optimizing compiler would add code for boxing and unboxing values on its own. The values of the variables dx and dy do not have to be boxed at all, because they are local to the scope of the procedure. Hence all intermediate results can be kept in registers inside of distance-between-points.

By unboxing values and limiting type checking to primitive procedures, the overhead is reduced to a degree where there is not much of a difference between optimized Scheme code and optimized code generated by a compiler for a statically typed language. On the other hand, this small overhead buys a high degree of freedom and flexibility.