t3x.org / sketchy / vol1 / sl04.html

Sketchy LISP

  Copyright (C) 2006,2007 Nils M Holm
Buy a copy at Lulu.com

An Introduction to Functional Programming in Scheme

1.4 Basic Data Types

Up to this point only two data types have been introduced: the integer and the boolean. There are more types in Scheme, though. Some of them will be introduced in this section.

1.4.1 Integers

The integer is the only numeric type that is required to be supported by all Scheme implementations. Most implementations also provide floating point, rational and complex numbers, but these are not discussed in detail here. While Scheme uses a highly interesting and mathematically correct model for numeric computations involving floating point arithmetics, most of the numeric code in this book deals with integers.

One interesting property of Scheme arithmetics is that computations do not overflow. There are still some popular languages that will silently provide a wrong result for expressions like

1234567890 + 9876543210

For instance, the following C program will print -1773790788 (on a two's-complement 32-bit machine), which is plain wrong:

#include <stdio.h>
int main(void) {
        printf("%ld\n", 1234567890 + 9876543210);
        return 0;
}

Of course, modern computers have 64-bit integers, but there will always be a number that is too big to compute in a given number of bits, and there will always be a language which computes the wrong result without notice.

Scheme uses so-called bignum arithmetics. Bignum is short for big number. The implementation of this kind of arithmetics is independent from a specific kind of hardware. It can handle all numbers that fit in memory, so the only sort of overflow that theoretically can occur while doing bignum arithmetics is the memory overflow, and even such an event will reliably abort the computation in progress and not deliver a wrong result.

Some versions of Scheme use bignum arithmetics exclusively, but more efficient implementations use native arithmetic operations as long as the result fits in a machine word. When an overflow occurs, they switch to bignum mode. To the user there is no visible difference. Bignum integers look exactly the same as native integers. As long as the result of an operation fits in a machine word, Scheme is about as fast as any other language. When an overflow occurs, it does slow down, but still produces correct results.

1.4.2 Booleans

The boolean is used to represent logical truth and falsity. There are only two literals representing theses states, which is why this sub-section is rather short.

The object #t represents logical truth (or just truth) and #f represents logical falsity (falsity, the false value). There is only one false value, but any object except for #f is considered a true value. This is why the following expression works fine:

(cond (0 'foo)) => foo

Do not worry about the sub-expression 'foo - it will be explained real soon now. What is more interesting is the fact that 0 is considered to be true. This might look odd at a first glance, but it simplifies some code significantly.

For example, the member procedure searches a given object in a list. When it finds a matching member, it returns a sublist with that member at the first position. When it does not find a match, it returns #f. Because all non-#f values are considered true, you can write code like this:

(define (digit-type x)
  (cond ((member x '(1 3 5 7 9)) 'odd)
        ((member x '(0 2 4 6 8)) 'even)
        (else                    'not-a-number)))

1.4.3 Chars, Strings, and External Representation

Like most other languages, Scheme provides data types for holding single characters and sequences of characters. Single characters are represented by chars, and sequences of characters are represented by strings.

At this point, it is time to introduce something called external representation. The external representation of an object is what the programmer keys in in order to create an object of a given type. For instance, 123 is typed to create a numeric object with a value of 123. Of course, the computer does not really store the number 123 in this form, but it converts it to some internal representation that is more efficient from the machine's point of view.

Because the internal representation of 123 would be something like 01111010, which is hard to decode for a human being, each Scheme object maps to exactly one single external representation. Numbers map to their decimal forms, logical falsity maps to #f, etc.

The Scheme procedures read and write translate between internal and external representation. Read maps external representation to the internal one:

(read) 12345 => 12345

Because Scheme environments use write to output their results, though, you cannot see the internal representation. In fact, the internal representation is never of any interest to Scheme programmers. You do not have to care about things like byte ordering, memory layout, etc. Scheme hides these details from you and allows you to concentrate on real problems instead.

The external representation of a char consists of the prefix #\ and the character to represent, for example:

#\H #\e #\l #\l #\o
#\\ #\( #\5 #\+ #\ 

The last char in the above sample represents the blank character, but this representation is unfortunate, because the character to represent is invisible. This is why an alternative external representation of this character exists: #\space. The same is valid for the newline character, which is written #\newline.

A sequence of characters is called a string. The characters of a string are enclosed in double quotes:

"Hello, World!"

To include a quotation mark in a string, it must be prefixed with a backslash:

"He said \"hello\"."

To include a backslash, it must be prefixed with another backslash. The external representation of a string requires the backslash before the characters #\" and #\\. Therefore,

"\"Hi\"" => "\"Hi\""

If you want to output a string without the additional backslashes, you have to use the pretty-printing display procedure:

(display "A\\B")  writes  A\B

Note that the output of display is not an external representation. When discussing Scheme, the external representation is used exclusively to refer to objects. Neither the sequence A\B nor the sequence "A\B" is a valid object of the type string.

1.4.4 Lists, Symbols, and Quotation

In fact I was lying when I told you that you only had learned about two data types at the beginning of this section. Actually, it were four, because programs like

(define (fact n)
  (cond ((zero? n) 1)
    (else (* n (fact (- n 1))))))

are nothing but lists of integers, booleans, other lists, and symbols. So you already have seen quite a few lists and symbols in this book. You even may have understood how lists work by using your intuition. A simple explanation would be that a list is a sequence of objects enclosed by parentheses, like these:

(1 2 3)
(#t "hi" (+ 5 7) #\x)
(cond (a b) (#t #f))
(define (id x) x)
() ; <--- this is the empty list

These examples ilustrate two notable properties: The first one is that a list may contain lists. Lists may be nested to any depth. The second one is that Scheme programs are in fact lists. This is a particularly nice feature, because it allows Scheme programs to examine, decompose, and even write Scheme programs in a simple way, just by using procedures for manipulating lists. On the other hand, this may seem troublesome, because procedure applications look like lists, too:

(+ 5 7)

Is this supposed to be a list of three elements or an application of the + procedure to the numbers 5 and 7? At this point, quotation comes into play. Above expression is not quoted, so it is a procedure application:

(+ 5 7) => 12

To create a list containing the members +, 5, and 7, the quote pseudo function is used:

(quote (+ 5 7)) => (+ 5 7)

Quote is a very important and frequently used construct and albeit it appears to do nothing more than return its argument without any further evaluation, you should read about its implications carefully. Understanding quote is absolutely essential in order to understand Scheme.

Another type that frequently has to be quoted is the symbol. While symbols are common in most programming languages, quoted symbols are only present in a minority of them. As long as a symbol occurs without quotation in a Scheme program, it has exactly the same meaning as in most other languages: either it is a variable representing a value or it is a keyword.

Scheme has keywords, too. You already have seen a few in this book, namely define, cond, if, letrec, and lambda. Most other symbols you have seen were in fact variables, including, for instance, zero?, fact, and +.

As you can see, the naming conventions for symbols are rather lax in Scheme. You may include letters, numbers, and even a wide range of special characters, like + - * / : < > ? and others.

Characters you may never use in symbols are these:

( ) [ ] . ' ` , # " ;

You may not use sequences that could be mixed up with numbers, either (like 123, -1, etc).

The define pseudo function introduces a variable, binds it to a storage location, and makes that location refer to the normal form of an expression. In most contexts, it is also correct to say that it binds a variable to a value.

Note that variables do not have types in Scheme, so this is different from storing a value in a typed variable. When a value is stored in a typed variable, like in a C program, the variable must have the right type to hold that value. You cannot store a C string in a C int. A Scheme variable, however, is merely a reference to a value. By associating a variable with a value, you make the variable refer to that value.

Once a variable is associated with with a value, each reference to the variable evaluates to that value:

(define foo 5)
foo => 5
(+ foo foo) => 10
(+ foo (* foo foo)) => 30

It is an error to refer to a symbol that is not bound to any value:

unbound-symbol => bottom

Unlike most other popular languages, Scheme can refer to symbols themselves. In order to refer to a symbol rather than its value, the symbol is quoted:

foo => 5
(quote foo) => foo

Even unbound symbols can be referred to in this way:

(quote unbound-symbol) => unbound-symbol

To avoid misunderstandings, the term symbol will be used to denote quoted symbols and unbound symbols in this book. Unquoted and bound symbols will be referred to as variables or arguments or, if they are bound to procedures, even as procedures, as in the procedure f takes two arguments a and b.

An alternative notation for quoting expressions is:

'foo => foo

This is because typing (quote something) each time you want to refer to a symbol rather than its value or create a list rather than apply a procedure is a bit tedious. The two notations are perfectly equivalent:

'expression = (quote expression)

(Quoted) symbols do not have any value other than their name. They cannot be composed, decomposed, or manipulated in any other way. All you can do with them is to convert them to strings:

(symbol->string 'foo) => "foo"

or compare them:

(eq? 'foo 'foo) => #t
(eq? 'foo 'bar) => #f
(define baz 'foo)
(eq? 'foo baz) => #t

The eq? predicate tests whether its two arguments are identical. The term identical is much stronger than the terms equal or equivalent. It states that two appearances of an object are in fact one and the same. All occurrences of one symbol are identical. Applying eq? to two appearances of the same symbol always yields truth. Symbols, the empty list (), and the truth literals #t and #f are the only objects whose occurrences are identical.

Although there are many Scheme programs that do not make any use of symbols, they are essential to the language, because Scheme programs themselves are made of lists and symbols. For example, the expression

'(define (fact n)
   (cond ((zero? n) 1)
     (else (* n (fact (- n 1))))))

reduces to a list that resembles a Scheme program. Note that the quote character (') at the beginning quotes the entire expression. The expression consists of (lists containing) truth literals, integers, and lots of symbols. You can use list procedures to decompose the program and examine its components. For example, you can extract the first member of the list, define, and store it in a variable x. You can then use the expression

(eq? x 'define)

to find out whether the program defines something. This is how Scheme programs can examine Scheme programs. The necessary procedures for composing and decomposing lists will be introduced in the following section.