Sketchy LISP |
By Nils M Holm,
2006,2007,2008 Buy a copy at Lulu.com |
An Introduction to Functional Programming in Scheme
| Previous Section | - Contents - Index - | Next Section |
In Scheme, there are many procedures for testing for
equality,
but there is only one procedure for testing for
identity. As outlined
earlier, only appearances of the same object can be
identical. The definition of equality
is a much broader
one.
There are only three types of objects that can be tested for
identity: the symbol, the boolean, and the empty list. It would be
incorrect to state that two symbols are identical, if they have
the same name
, because identity is a relation that only exists
between an object and itself. If there are two or more symbols,
they cannot be identical. However, the name of one symbol can
be written down any number of times, like this:
foo foo foo foo foo
In this case, the name of the symbol exists five times,
but the symbol is still unique. Sometimes the name of a symbol is
referred to as a symbol, which is confusing with regard to
identity. It is better to state that all these names refer
to the same symbol.
The eq?
procedure tests whether two expressions reduce to the same symbol:
(eq? 'foo 'foo) => #t (eq? 'foo 'bar) => #f (eq? 'foo (car '(foo))) => #t
The empty list () shares the property of uniqueness with
symbols. There is only one empty list in Scheme (which is why it is
frequently referred to as the empty list), so eq?
can be applied to it safely:
(eq? '() '()) => #t (eq? 'foo '()) => #f
The truth literals #t and #f are unique,
too:
(eq? #f #f) => #t (eq? #t #t) => #t
If one of the two arguments of eq? is neither
a symbol nor an empty list nor a boolean, the result is negative:
(eq? 'foo 5) => #f (eq? #t #\x) => #f (eq? '() '(y)) => #f
So eq? performs some implied type checking. If the
types of its arguments do not match, it always returns falsity.
If both arguments of eq? are neither
symbols nor empty lists nor a booleans, the result is undefined:
(eq? 5 5) => bottom
(eq? #\a #\a) => bottom
(eq? '(x) '(x)) => bottom
Note that bottom really means undefined
in this case. Some implementations of Scheme may return truth when
eq? is applied to 5 and 5, some may return falsity, some
may return truth most of the time and falsity another time.
Just do not use eq? to compare with anything but
symbols and booleans, and you are on the safe side. Even to
compare with (), null? should be preferred.
There are lots of other types in Scheme which can be compared. However, these types are compared in terms of equality or equivalence. There is a whole bunch of different procedures for comparing different types of objects. For example, these procedures are used to compare numbers:
< <= = > >=
These are predicates, but because their names consist of special
characters exclusively, they have no question marks attached. The
meanings of these symbols are the same as in mathematics or in many
other programming languages:
<
means less than
,
>=
means greater than or equal to
, and
=
tests for numeric equivalence. There is no predicate to test whether a
sequence of values is not equivalent, though. The reason will
become clear soon.
The numeric predicates of Scheme are variadic procedures that take at least two arguments, and
(< a b c d)
actually tests whether a is less than b
and b is less than c and
c is less than d. In other words, <
tests whether its arguments are in monotonically ascending order.
The <=
predicate tests whether its arguments are in
monotonically non-descending order, and = tests whether
its arguments are all equivalent.
Given these definitions, what is a hypothetical not= (not-equivalent) predicate to do? To be in line with the above definitions
(not= a b c)
would mean that a is not equal to b and b is not equal to c. In this case, though, not= would have a meaning hat differs from
(not (= a b c))
The latter would denote that a is not equal to b or b is not equal to c.
Because each of the above definitions of negative equivalence is flawed in one or another way, the Scheme standard does the only right thing to do: it does not include such a definition at all.
There are predicates similar to = and friends that
are used to compare chars and strings. They work in the same way
as the numeric predicates, but compare chars and strings respectively.
These procedures are not required by the standard to accept more
than two arguments, but implementations may choose to make them variadic.
The names of string and
char predicates are formed
by attaching the prefixes string
or char
and appending
a question mark:
numeric char string < char<? string<? <= char<=? string<=? = char=? string=? > char>? string>? >= char>=? string>=?
A char a is considered less than
a char b,
if
The definitions of the other char predicates can be derived from the above rules using basic logic:
(char>? a b) = (char<? b a) (char<=? a b) = (not (char>? a b)) etc
When comparing letters, only the alphabetic part of the ASCII
definition is taken into consideration. The order of characters other than
letters and decimal digits (special
characters) is undefined, and
so are the relations between letters and digits, letters
and special characters, and digits and special characters:
(char>? #\x #\%) => bottom (char>? #\+ #\-) => bottom (char>? #\+ #\0) => bottom
The char equivalence predicate has no such limitations, though:
(char=? #\a #\0) => #f (char=? #\* #\*) => #t
When the name of a char predicate begins with char-ci
instead
of char
, then the predicate folds the case of its arguments
before comparing them. The abbreviation ci
means
case insensitive. For example,
(char<? #\a #\b) => #t (char<? #\a #\B) => bottom
but
(char-ci<? #\a #\B) => #t
Strings are sequences of chars, so the
string predicates basically
use the char predicates to compare the individual characters of two
strings.
String=?
evaluates to truth, if the strings passed
to it contain equal characters at corresponding positions:
(string=? "abcd" "abcd") => #t (string=? "dcba" "abcd") => #f
The other predicates skip over all characters that are equal in both strings and then apply the corresponding char predicate to the first characters that do not match, returning its result:
(string<? "abcd" "abcz") => #t (string>? "abcd" "abca") => #t
If two strings have different lengths and contain equal characters
up to the end of shorter string, the shorter string is considered
less than
the longer string:
(string<? "abc" "abcd") => #t
Like the char predicates, all of the string predicates have case
insensitive counterparts called
string-ci=?,
string-ci<?,
etc. They work in the same way as the case sensitive string predicates,
but apply the corresponding case insensitive char predicates instead.
Given all the different types of equivalence described in this section, a general predicate for testing all kinds of objects for equality would be desirable. Such a predicate is discussed in the following section.
The predicates discussed in the previous section express
identity and equivalence. Identity, the sameness of two objects,
already has been discussed before. Equivalence expresses that
two objects have the same value
. It is typically applied
to objects on which an ordering can be imposed, like numbers,
characters, or strings.
Scheme also provides a more general equivalence predicate that
can be used to compare objects of different types:
eqv? is a
less strict variant of eq? that can be used to
compare numbers and characters:
(eqv? 'x 'x) => #t
(eqv? #f #f) => #t
(eqv? '() '()) => #t
(eqv? #\Z #\Z) => #t
(eqv? -123 -123) => #t
For each pair of arguments for which eq? returns truth,
eqv? returns truth as well. Eqv? performs the
same implied type checking as eq?:
(eqv? 5 "x") => #f (eqv? 7 '(x)) => #f
But even eqv? cannot be used to compare two strings
or pairs:
(eqv? "abc" "abc") => bottom (eqv? '(x) '(x)) => bottom
While strings can be compared using string=? or
string-ci=?, none of the procedures discussed so
far could compare pairs.
Pairs have no order, so equivalence does not apply to them.
Because the application of eq? to pairs is undefined,
identity does not help either, which leaves
equality. To give a first
impression of general equality, only symbols and lists of
symbols will be taken into consideration. Two lists of symbols
are equal, if they contain the same symbol at corresponding
positions. Since the list is only a special case of the pair,
though, an approach based on the pair would be more promising.
Here is a first definition:
Two objects are equal, if
The conversion of this definition into code is quite straight-forward:
(define (eql? a b)
(if (pair? a)
(if (pair? b)
(if (eql? (car a) (car b))
(eql? (cdr a) (cdr b))
#f)
#f)
(eq? a b)))
The condition a and b are both pairs
and contain equal car and cdr parts
is implemented in a
rather awkward way in this procedure by using three nested
applications of if. Scheme provides a much better way
of expressing logical conjunctions. The expression a
is a pair and b is a pair
can be written as:
(and (pair? a) (pair? b))
The and
pseudo function does not just implement a
logical and
, though. In fact it implements conditional
evaluation in a similar way as cond or if.
And evaluates its arguments in sequence until either an
argument reduces to #f or it runs out of arguments. As
soon as one of its arguments evaluates to #f, it stops
instantly and returns falsity, ignoring the remaining arguments,
if any. Therefore, it is safe to write code like this:
(and (pair? a)
(pair? (car a))
(caar a))
If a is not a pair, and returns #f
immediately and the expressions (pair? (car a))
and (caar a) are not evaluated at all. Only if a
already turned out to be a pair, the next expression is evaluated, and
only if (car a) is a pair, too, the final expression is
reduced. In this case, the value of the final expression is the
value of the entire application of and. By skipping the
remaining expressions as soon as a predicate could not be satisfied,
the above application of and protects car
from taking the car part of a non-pair.
Formally and can be re-written using if
in the following way:
(and a b) = (if a b #f) (and a b c) = (if a (if b c #f) #f) etc
Using and, the above eql? predicate can be
simplified significantly:
(define (eql? a b)
(if (and (pair? a)
(pair? b))
(and (eql? (car a) (car b))
(eql? (cdr a) (cdr b)))
(eq? a b)))
Based on the eql? procedure, a general predicate
for testing the equality or equivalence or identity of any two
objects can be devised. Such a predicate could be used to compare
any type of objects without ever reducing to bottom (with one
exception that will be outlined later). In order to
compare any objects, some
type checking has to be
added. Using and makes this a piece of cake:
(define (equal? a b)
(cond ((number? a) (and (number? b) (= a b)))
((char? a) (and (char? b) (char=? a b)))
((string? a) (and (string? b) (string=? a b)))
((and (pair? a) (pair? b))
(and (equal? (car a) (car b))
(equal? (cdr a) (cdr b))))
((or (procedure? a) (procedure? b)) (bottom))
(else (eq? a b))))
Equal? uses
type predicates like
char? and number? to find out what type
an object has. A type predicate returns truth, if its argument
has a given type. Type predicates allow equal? to choose
the proper procedure for comparing the given objects. For example,
the clause
((number? a) (and (number? b) (= a b)))
is selected, if the argument a is numeric. In the
body of the clause, (= a b) is only
evaluated if b is also a number. If the
types of a and b are not equal,
#f is returned. And protects =
from improper types. The same
method is applied to the other types, so specialized predicates
are always applied to the matching types. The clause handling
pairs is based on the eql? procedure. The
default clause handles symbols, booleans, and empty lists, which
are checked for identity.
The equal? procedure can be used to check numbers,
strings, chars, booleans, lists of any objects and any combination
of these for equality, and it always delivers a meaningful result:
(equal? 'foo 'foo) => #t
(equal? 123 123) => #t
(equal? '(a (12) (b . c))
'(a (12) (b . c))) => #t
(equal? '(a . b) (a . c)) => #f
Equality can be considered a lax form of identity (every object
is equal to itself), so equal? delivers truth when
appearances of the same object are passed to it. Because
equal? is such a versatile procedure, is is part
of the Scheme standard.
One clause of equal? deserves a second glance for
two reasons:
((or (procedure? a) (procedure? b)) (bottom))
One reason is that it employs the
or pseudo function
which has not been discussed yet, and the other reason
is that is reduces to (bottom) without any further
checking when a procedure is passed to equal?.
Or works in a similar way as and,
implementing both the logical or and conditional evaluation. However,
or evaluates its arguments in sequence until one of them
evaluates to truth. As soon as an argument reduces to truth,
or evaluates to that argument and ignores the remaining
arguments. When all but the last argument reduce to falsity, it
evaluates to the normal form of the last argument. Or can
be re-written using cond like this:
(or a b) = (cond (a a) (else b)) (or a b c) = (cond (a a) (b b) (else c))
Applying a single argument to either and or or
is equivalent to the argument alone:
(or a) = a (and a) = a
Applying these pseudo functions to zero arguments yields the neutral element of the corresponding logic function:
(and) => #t (or) => #f
#T is the neutral element of the logical and
operation because adding another #t to a sequence of
expressions linked together using logical and does not change
the value of that expression, just like adding a zero does not
change a sum and an additional factor of one does not change a
product.
BTW: The arithmetic procedures
* and
+
indeed deliver the neutral elements of the product
and the sum respectively when they are applied to zero arguments.
The second question about the equal? predicate was
why the equality of procedures is undefined. This question is
perhaps best answered with another questions: Under which
circumstances should two procedures be considered equal?
Should they be equal, if their lambda expressions are equal? In
this case, the two following procedures would be different, although
they obviously do the same:
(lambda (x) x) (lambda (a) a)
Or should they be considered equal, if they do the same? In
this case, what would doing the same
mean? Deliver the
same values for each set of arguments? This would not only be
impossible to check, because there is an infinite number of
possible sets of arguments, it would also make quicksort and
bubblesort equal,
[Footnote: Quicksort and bubblesort are both sorting procedures,
but quicksort is known to be highly efficient while bubblesort is
known to be one of the worst sorting algorithms around.]
because both of them sort a set of objects.
And at this point, the topic of primitive procedures was not even
touched. When should two primitive procedures be considered equal?
When their machine codes are equal? What about implementations
for different CPUs?
Defining the equality of procedures is a task that is far beyond the scope of a simple utility procedure. This is why the Scheme standard leaves this point undefined.
The rules for using eq?, eqv?, and
equal? are simple. When in doubt, use equal?.
When you are sure about the types to compare, use eqv?.
When you are absolutely sure about the type, use the predicates specialized
in that type. For example, use = to compare numbers and
string=? or string-ci=? to compare
strings. Of course, equal? can only check for
equality, so if you want to examine the order of some objects, you
have to use more specialized predicates like < and
char>? anyway.
Only when dealing with symbols or truth values exclusively,
eq? is the predicate of choice, because it expresses
identity, which is what you actually want to check in this case,
and eq? is much more efficient than eqv?
or equal?. For instance, the logical not can be
implemented quite efficiently like this:
(eq? predicate #f)
Whenever predicate evaluates to truth, this expression
will reduce to falsity and vice versa. This actually is what the standard
procedure not
does.
When checking for the empty list, the null? predicate
should be used. Using
(eq? x '())
would be fine from a technical point of view, but using
null? probably expresses your intent more clearly.
There are some procedures which use equal? internally,
like assoc and member. This is why
assoc recognizes any type of object in the key fields
of association lists. To search an alist containing symbolic keys
exclusively, the
assq
procedure is much more efficient. To search an alist that contains
numbers or characters as keys, the
assv is
most efficient. Assq uses eq? in the
place of equal? and assv uses eqv?.
The member procedure also has counterparts that are
based on eq? and eqv?. They are called
memq and
memv.
Although they are highly efficient, their use is limited to the
types covered by their equivalence predicates. For example,
memq is limited to flat lists of symbols
(or truth values):
(memq 'y '(x y z)) => (y z)
It cannot be used to search non-symbolic members of lists:
(memq '(c d) '((a b) (c d) (e f))) => bottom
In this case, you have to use member instead. Above rule
of the thumb also applies here: when in doubt, use assoc
and member.
Remember: Equality is a more general case than identity. Wherever
eq?, assq, and memq return a
non-#f result, equal?, assoc,
and member will deliver the same result. The reverse
is not true, though: (==> denotes a logical implication)
(eq? a b) ==> (eqv? a b) (eqv? a b) ==> (equal? a b)
but
(equal? a b) =/=> (eqv? a b) (eqv? a b) =/=> (eq? a b)
| Previous Section | - Contents - Index - | Next Section |