T3X Part | LISP Part
This document describes the implementation of KILO LISP 22 a small, yet quite capable interpreter for purely symbolic LISP. The interpreter is written in a simple procedural language called T3X. Here is a quick reference for the language at ../../t3x/uman.html
Prelude | Parameters | Special Values | Some Useful Functions | Input and Output | Error Handling | Memory Allocation | More Useful Functions | Symbols | The Reader | The Printer | Image Files | Built-In Functions | Special Forms | The Macro Expander | The Evaluator | System Initialization | Main Program and REPL
! KILO LISP, a kilo byte-sized LISP system ! Nils M Holm, 2019, 2020 ! In the public domain
A module is a program. The KLISP module uses the classes T3X and TTYCtl. The T3X class contains methods for opening, closing, reading, and writing files, as well as for memory operations, such as copying, comparing, and scanning. The TTYCtl class is used to control a terminal by moving the cursor, clearing the screen, etc.
module klisp(t3x, ttyctl);
The classes have to instantiated, so messages can be sent to them.
object t[t3x], tty[ttyctl];
The NNODES constant specifies the total number of nodes in the system. Most nodes are cons cells. Atoms allocate one (atomic) node per two characters in their name.
const NNODES = 12288;
SYMLEN is the maximum length of a symbol/atom name. BUFLEN is the size of an input buffer for file and TTY I/O. EVDEPTH is the maximum number of times EVAL can be called recursively. This happens only during macro expansion. PRDEPTH is the maximum depth (number of nested CAR fields) in an S-expression when printing it. NLOAD is the maximum number of nested calls to LOAD. E.g., when NLOAD=3, a file can load a file that loads a file, but no more nesting is possible.
const SYMLEN = 64; const BUFLEN = 128; const EVDEPTH = 512; const PRDEPTH = 64; const NLOAD = 3;
These are special values. They are represented by integers that are not valid offsets into the node pool. NIL is NIL. EOT is the end of file, DOT is a dot read by the reader, RPAREN is a right parenthesis read by the reader. UNDEF is an undefined value. It is the value of undefined or unbound symbols, and it is also used internally for different purposes.
const SPCL = NNODES; const NIL = NNODES+1; const EOT = NNODES+2; const DOT = NNODES+3; const RPAREN = NNODES+4; const UNDEF = NNODES+5;
Here are some string functions. The module could import the STRING class, but, to keep things obvious, they are defined here.
STRLEN scans the first 32767 characters of S for NUL and returns the offset of the first occurrence, which coincides with the length of a NUL-terminated string. STREQU returns truth (−1; sometimes written %1 in T3X) when the strings A and B are equal. STRCOPY copies B to A.
strlen(s) return t.memscan(s, 0, 32767); strequ(a, b) return t.memcomp(a, b, strlen(a)+1) = 0; strcopy(a, b) return t.memcopy(a, b, strlen(b)+1);
NTOA writes the decimal representation of the signed integer N to the global buffer NTOAB. It returns a pointer to the first character of the resulting string.
var Ntoab::20; ntoa(n) do var q, p; p := @Ntoab::19; q := p; p::0 := 0; while (n \/ p = q) do p := p-1; p::0 := n mod 10 + '0'; n := n / 10; end return p; end
The global variables XPOS and YPOS keep track of the X and Y position of the cursor on the terminal screen.
var Xpos, Ypos;
CLEAR clears the terminal screen and updates XPOS/YPOS.
clear() do tty.clear(); Xpos := 0; Ypos := 0; end
When the variable PASS is true (non-zero), the interpreter will use the SYSIN/SYSOUT interface instead of the terminal screen. SYSIN/SYSOUT is basically the same as stdin/stdout in C.
var Pass;
The PR function writes the string S to the terminal screen and updates the XPOS and YPOS variables accordingly. It interprets carriage return and linefeed characters, handles wrap-around at the end of a line, and scrolls the screen up when the bottom line is reached.
Note that it reserves the bottom line itself for status messages. The bottom line does not scroll when the bottom of the screen is reached.
When PASS=%1, none of the above happens. The string S is simply written to T3X.SYSOUT (stdout).
pr(s) do var i; ie (Pass) do t.write(T3X.SYSOUT, s, strlen(s)); end else do i := 0; while (s::i) do ie (s::i = '\n') do Ypos := Ypos+ 1; tty.move(Xpos, Ypos); end else ie (s::i = '\r') do Xpos := 0; tty.move(Xpos, Ypos); end else do tty.writec(s::i); Xpos := Xpos + 1; end if (Xpos >= tty.columns()) do Xpos := 0; Ypos := Ypos + 1; tty.move(Xpos, Ypos); end if (Ypos >= tty.lines()-1) do tty.scroll(0, tty.lines()-2); Ypos := tty.lines() - 2; tty.move(Xpos, Ypos); end i := i+1; end end end
The NL function moves the cursor to the beginning of the next line.
nl() do var b::3; ie (Pass) pr(t.newline(b)); else pr("\r\n"); end
Input from files and the terminal is buffered, and the following is the structure of an input buffer. INPUT is the file descriptor being read, INBUF is a pointer to the vector in which input is being buffered; it points to BUFFER initially. INP is the offset of the next character to extract from the buffer, and INK (IN-K, not the liquid!) is the number of characters in the buffer. When INP=INK, the buffer is empty.
Note that INPUT is ignored unless LOADing a program or in PASS mode, and the terminal keyboard is read directly.
REJECTED is used to put characters back into the input buffer. When REJECTED is not EOT, the character in REJECTED will be returned by the next read operation.
var Input, Inp, Ink; var Inbuf, Buffer::BUFLEN; var Rejected;
FLUSHINP clears the input buffer, points it back to the default vector, and connects it to the default input descriptor, T3X.SYSIN (stdin).
flushinp() do Input := T3X.SYSIN; Inbuf := Buffer; Inp := 0; Ink := 0; Rejected := EOT; end
AUXB1 and AUXB2 store the two lines most recently submitted to the interpreter, thereby implementing a two-line history. The ROTATEBUFS function rotates the buffers through the parameter B, thereby loading a line from the history into B.
var Auxb1::BUFLEN, Auxb2::BUFLEN; rotatebufs(b) do var i, c; for (i=0, BUFLEN) do c := Auxb1::i; Auxb1::i := Auxb2::i; Auxb2::i := b::i; b::i := c; end end
READCON reads a line from the terminal into the buffer B. B must have a size of at least BUFLEN bytes. The function provides cursor movement inside of the line, deletion of characters, history, erasing the entire input line, and aborting input. See the code for details, it is very straight-forward.
The following variables are used: P0 = X-position of the first character in the line on the terminal. P=position of the character under the cursor. N=number of characters in the line. R=redraw line when not zero. M=maximum number of characters to accept. K=key pressed.
The difference between erasing the input line and aborting input is that aborting kills all previous lines of a multi-line expression, too, while erasing the line only erases the current line.
readcon(b) do var p0, p, r, m, k, n; m := tty.columns() - 4; p0 := Xpos; p := 0; r := 1; n := 0; b::0 := 0; while (1) do if (r) do tty.move(p0, Ypos); tty.writes(b); tty.writec('\s'); r := 0; end tty.move(p+p0, Ypos); k := tty.readc(); ie (k = 'B'-'@' \/ k = ttyctl.K_LEFT) do if (p > 0) p := p-1; end else ie (k = 'F'-'@' \/ k = ttyctl.K_RIGHT) do if (p < n) p := p+1; end else ie (k = 'A'-'@') do p := 0; end else ie (k = 'E'-'@') do p := n; end else ie (k = 'L'-'@') do clear(); if (p0) pr("* "); r := 1; end else ie (k = 'P'-'@' \/ k = ttyctl.K_UP) do rotatebufs(b); n := strlen(b); p := n; r := 1; tty.move(Xpos, Ypos); tty.clreol(); end else ie (k = '\r' \/ k = '\n') do if (n > 0) do if (Auxb1::0 \= 0) strcopy(Auxb2, Auxb1); strcopy(Auxb1, b); end b::n := k; n := n+1; b::n := 0; nl(); return n; end else ie (k = 'C'-'@') do nl(); b::0 := 0; error("abort", UNDEF); return 0; end else ie (k = 'U'-'@') do b::0 := 0; n := 0; p := 0; r := 1; tty.move(p0, Ypos); tty.clreol(); end else ie ((k = '\b' \/ k = ttyctl.K_BKSP \/ k = 127) /\ p > 0) do n := n-1; p := p-1; t.memcopy(@b::p, @b::(p+1), n-p); b::n := 0; r := 1; end else ie (k = 4 /\ n = 0) do b::n := 4; b::(n+1) := 0; pr("^D"); nl(); return n; end else if ('\s' <= k /\ k <= '~') do if (n >= m) loop; t.memcopy(@b::(p+1), @b::p, n-p); b::p := k; p := p+1; n := n+1; b::n := 0; r := 1; end end end
The RDCH function reads a character from the input, be it the terminal or an input file. It reads the input buffer and when it is empty it refills it by either reading a file descriptor (in PASS mode) or by calling the line editor (READCON). It returns the next character from the buffer or EOT when the end of input is reached. (You can press control-D in READCON to send an EOT.)
rdch() do var c; if (Rejected \= EOT) do c := Rejected; Rejected := EOT; return c; end if (Inp >= Ink) do ie (Input = T3X.SYSIN /\ \Pass) Ink := readcon(Inbuf); else Ink := t.read(Input, Inbuf, BUFLEN); if (Ink < 1) return EOT; Inp := 0; end c := Inbuf::Inp; Inp := Inp+1; if (c = 4) return EOT; return c; end
RDCHCI is the case-insensitive variant of RDCH. It converts all input to lower case.
rdchci() do var c; c := rdch(); if (c >= 'A' /\ c <= 'Z') return c+32; return c; end
The MESSAGE function writes the string S to the message line at the bottom of the screen. In PASS mode it does nothing. MESSAGE does not update XPOS/YPOS, but returns to the original position after writing the message.
message(s) do if (Pass) return; tty.move(0, tty.lines()-1); tty.clreol(); tty.writes(s); tty.move(Xpos, Ypos); end
The FINI function shuts down the TTYCtl interface (when not in PASS mode), exits, and returns an exit code to the operating system.
fini(x) do if (\Pass) do tty.reset(); tty.mode(0); tty.fini(); end ie (x) halt 1; else halt 0; end
The ERROR function writes the error message M to the terminal (or T3X.SYSOUT in PASS mode). When N does not equal UNDEF, it also prints a colon and the LISP object passed to it in N. It uses the forward-declared PRINT function to print the LISP object.
ERROR also flushes the input buffer and sets the variable ERRFLAG to true (−1) to notify the interpreter of the error. No further errors will be reported as long as ERRFLAG is set.
In PASS mode ERROR halts the interpreter after reporting an error.
var Errflag; decl print(1); error(m, n) do if (Errflag) return; pr("? "); pr(m); if (n \= UNDEF) do pr(": "); print(n); end nl(); flushinp(); Errflag := %1; if (Pass) fini(1); return NIL; end
FATAL reports an error and then halts the interpreter. There are not many fatal errors. Most are internal errors (which should never happen) or handle corrupt image files.
fatal(m) do message(""); error(m, UNDEF); pr("? aborting"); nl(); fini(1); end
Memory of the LISP system is organized in "nodes" or "cells". Each node consists of three fields called "car", "cdr", and "tag". The fields are kept in vectors and are addressed using their offsets in the vectors. For instance, given a LISP object N (which is an offset into the above vectors), CAR[N] would be the car of N and CDR[N] would be the cdr of N.
CDR[N] always points to another LISP object. CAR[N] points to a LISP object, if TAG::N does not have the ATOMB flag set. (X::N is T3X notation for X[N] when X is a byte vector.) When the ATOMB flag of TAG::N is set, CAR[N] contains a small integer value and not a LISP object.
A node with the ATOMB flag set is an "atomic node", which is used to build atoms/symbols. A node with the ATOMB flag cleared is a cons cell.
var Car[NNODES], Cdr[NNODES]; var Tag::NNODES;
There are two more flags in the TAG field of each node. They are used by the garbage collector and will be explained below.
const ATOMB = 0x01; const MARKB = 0x02; const SWAPB = 0x04;
The FREELIST contains a list of unused nodes that are linked together via their cdr fields. When FREELIST=NIL, memory is exhausted and a garbage collection will be initiated.
var Freelist;
There is a small difference between atoms and symbols in KILO LISP 22, because there are some atoms that are not symbols, like NIL, or the end-of-file object, or the "undefined" object. Hence all symbols are atoms, but not all atoms are symbols. ATOMP and SYMBOLP test whether the given object is an atom or symbol.
atomp(n) return n >= SPCL \/ Car[n] < SPCL /\ Tag::Car[n] & ATOMB; symbolp(n) return n < SPCL /\ Car[n] < SPCL /\ Tag::Car[n] & ATOMB;
The MARK function implements a Deutsch/Schorr/Waite graph marker. It traverses a tree of nodes N and sets the MARKB flags of all nodes in the tree. It performs the traversal in constant space, i.e. it only requires three integers worth of space addition to the tree itself.
The basic idea of the D/S/W algorithm is to reverse the direction of a link (edge) in the tree when following it, so the child points to the parent while the tree starting at the child is traversed. When returning from a subtree after traversal, the original direction is restored. The algorithm uses the MARKB and SWAPB flags to direct the traversal as follows (M=MARKB, S=SWAPB):
The algorithms starts at a tree root N with parent P=NIL. It terminates when returning to P=NIL. When the MARK function exits, all nodes in the tree N have their MARKB flag set and their SWAPB flag cleared.
mark(n) do var p, x; p := NIL; while (1) do ie (n >= SPCL \/ Tag::n & MARKB) do if (p = NIL) leave; ie (Tag::p & SWAPB) do x := Cdr[p]; Cdr[p] := Car[p]; Car[p] := n; Tag::p := Tag::p & ~SWAPB; n := x; end else do x := p; p := Cdr[x]; Cdr[x] := n; n := x; end end else ie (Tag::n & ATOMB) do x := Cdr[n]; Cdr[n] := p; p := n; n := x; Tag::p := Tag::p | MARKB; end else do x := Car[n]; Car[n] := p; Tag::n := Tag::n | MARKB; p := n; n := x; Tag::p := Tag::p | SWAPB; end end end
The "garbage collector" (GC) of a LISP system is the part that recycles allocated but no longer used objects, so that the space allocated by them can be reused for different objects. In order to find out whether an object is "unused", it marks all nodes that can be addressed by any program in memory. The nodes that are not marked after this step are inaccessible and hence can be recycled safely. The present implementation of the GC uses the D/S/W graph marker to mark trees.
The trees that are marked by the GC start at locations called the "GC roots". In KILO LISP 22, these locations are:
Don't worry — these will be explained in the following!
var Symbols; var Acc, Env; var Stack, Mstack; var Tmpcar, Tmpcdr; var Znode;
When the VERBOSE_GV variable is non-zero, the GC will print some interesting information after each collection.
var Verbose_GC;
The GC function performs a garbage collection. It first marks the GC roots and then adds all still-unmarked nodes to the freelist. It also unmarks the nodes marked during traversal, thereby restoring their original state.
gc() do var i, k; if (Verbose_GC) pr("GC: "); mark(Znode); mark(Acc); mark(Env); mark(Symbols); mark(Stack); mark(Mstack); mark(Tmpcar); mark(Tmpcdr); k := 0; Freelist := NIL; for (i=0, NNODES) do ie (Tag::i & MARKB) do Tag::i := Tag::i & ~MARKB; end else do Cdr[i] := Freelist; Freelist := i; k := k+1; end end if (Verbose_GC) do pr(ntoa(k)); pr(" NODES"); nl(); end return k; end
The CONS3 function is the principal memory allocator of KILO LISP 22. It allocates a new node by removing it from the freelist and initializes its fields with car=A, cdr=D and tag=TA.
When the freelist is empty initially, it performs a garbage collection first. In this case the variables A and D are temporarily saved in the GC roots TMPCAR and TMPCDR, so they cannot be recycled by the GC. Therefore expressions like
cons3(a, cons3(b, c))
do not need to temporarily protect the values B or C or the return value of the inner CONS. Only A has to be protected while cons(b,c) runs.
When there are no free nodes after a garbage collection, CONS3 reports an error and returns the last-resort node ZNODE, which is linked to the LISP object (NIL . NIL). This is done to cover the case where the function calling CONS3 wants to modify the car or cdr field of the allocated object.
cons3(a, d, ta) do var n; if (Freelist = NIL) do Tmpcdr := d; if (\ta) Tmpcar := a; gc(); Tmpcar := NIL; Tmpcdr := NIL; if (Freelist = NIL) do error("out of nodes", UNDEF); return Znode; end end n := Freelist; Freelist := Cdr[Freelist]; Car[n] := a; Cdr[n] := d; Tag::n := ta; return n; end
CONS allocates and returns a fresh cons cell.
cons(a, d) return cons3(a, d, 0);
CAAR and friends. Not all variants are used.
caar(x) return Car[Car[x]]; cadr(x) return Car[Cdr[x]]; cdar(x) return Cdr[Car[x]]; cddr(x) return Cdr[Cdr[x]]; caadr(x) return Car[Car[Cdr[x]]]; cadar(x) return Car[Cdr[Car[x]]]; caddr(x) return Car[Cdr[Cdr[x]]]; cdadr(x) return Cdr[Car[Cdr[x]]]; cddar(x) return Cdr[Cdr[Car[x]]]; caddar(x) return Car[Cdr[Cdr[Car[x]]]]; cdddar(x) return Cdr[Cdr[Cdr[Car[x]]]];
NREV reverses a list destructively and returns the reversed list.
nrev(n) do var m, h; m := NIL; while (n \= NIL) do h := Cdr[n]; Cdr[n] := m; m := n; n := h; end return m; end
SAVE saves a LISP object on the stack. UNSAVE removes K objects from the stack and returns the last one removed. It also reports a fatal error when the stack is empty. This should never happen.
save(n) Stack := cons(n, Stack); unsave(k) do var n; while (k) do if (Stack = NIL) fatal("stack empty"); n := Car[Stack]; Stack := Cdr[Stack]; k := k-1; end return n; end
MSAVE saves a small integer value on the "mode stack", which is used by the evaluator to save its current "mode" (state). Hence the name of this stack. MUNSAVE works like UNSAVE, but always removes and returns the top element.
msave(n) Mstack := cons3(n, Mstack, ATOMB); munsave() do var n; if (Mstack = NIL) fatal("mstack empty"); n := Car[Mstack]; Mstack := Cdr[Mstack]; return n; end
The STRSYM function creates a new (uninterned) symbol from a string. The internal structure of a symbol is a chain of atomic nodes consed to the value of the symbol. The chain of atomic nodes carries two characters of the name per node. When the number of characters is not even, the last node will be filled with NUL.
Here is the structure of the symbol named LAMBDA:
CONS +---+---+ | O | O-----> VALUE +-|-+---+ | V +----+---+ +----+---+ +----+---+ | LA | O----->| MB | O----->| DA |NIL| +----+---+ +----+---+ +----+---+ ATOM ATOM ATOM
Note that the resulting symbol does not have a value box and hence cannot be used as a variable. Value boxes will be added by the ADDSYM function, which follows further below.
strsym(s) do var i, n; i := 0; if (s::i = 0) return NIL; n := cons3(NIL, NIL, ATOMB); save(n); while (1) do Car[n] := s::i << 8 | s::(i+1); if (s::(i+1) = 0 \/ s::(i+2) = 0) leave; Cdr[n] := cons3(NIL, NIL, ATOMB); n := Cdr[n]; i := i + 2; end n := unsave(1); return cons(n, UNDEF); end
SYMSTR is the inverse function of STRSYM: given a symbol it returns a string containing the name of the symbol. The name is returned in a global buffer (SYMB) that will be overwritten each time SYMSTR is called.
var Symb::SYMLEN+2; symstr(n) do var i; i := 0; n := Car[n]; while (n \= NIL) do Symb::i := Car[n] >> 8; Symb::(i+1) := Car[n] & 0xff; i := i + 2; n := Cdr[n]; end Symb::i := 0; return Symb; end
FINDSYM locates a symbol in the symbol list (SYMBOLS) and returns it. It identifies symbols by comparing their names, hence it can be used to locate not-yet-interned symbols. When the given symbol is not in the list, FINDSYM returns NIL.
findsym(s) do var p; p := Symbols; while (p \= NIL) do if (strequ(s, symstr(Car[p]))) return Car[p]; p := Cdr[p]; end return NIL; end
ADDSYM interns a symbol S and adds a value box to it. "Interning a symbol" means to look the symbol up in the symbol list (SYMBOLS) and return the symbol in the list instead of S. When S is not in SYMBOLS, it is first added and then returned. The reader (READ) will return the return value of ADDSYM for symbols, hence the same symbol name will always return the same member of SYMBOLS. This is the reason why addresses of symbols are unique and can be compared by EQ.
ADDSYM also adds a "value box" to the symbol S and puts the value of V into the box, giving the following structure:
+---+---+ +---+---+ | O | O----->| O |NIL| +-|-+---+ +-|-+---+ | | V V SYMBOL VALUE NAME
When the special value SPCL is passed to ADDSYM as the value of V, the value box will refer to the symbol itself, thereby forming a "constant":
+--------------------+ | | V | +---+---+ +---+---+ | | O | O----->| O |NIL| | +-|-+---+ +-|-+---+ | | | | V +---------+ SYMBOL NAME
(Note that constants are not "constant" in the usual sense: nothing stops you from modifying them!)
When V=UNDEF, the resulting symbol will remain "unbound".
addsym(s, v) do var n; n := findsym(s); if (n \= NIL) return n; n := strsym(s); Symbols := cons(n, Symbols); ie (v = SPCL) Cdr[n] := cons(n, NIL); else Cdr[n] := cons(v, NIL); return n; end
The LOOKUP function looks up the value of the symbol X in the current "lexical environment" (ENV) as well as in the "global environment". Lexical bindings are independent from (global) symbol bindings! A lexical environment is a list of association lists. It has the following general form:
( ( (var-1,1 . (val-1,1)) ... (var-1,M . (val-1,M)) ) ... ( (var-N,1 . (val-N,1)) ... (var-N,K . (val-N,K)) )
I.e., each lexical environment consists of a list of N association lists, each containing some number (M,K,...) of associations. Each association
(var . (val))
consists of a variable name (symbol) "var" and a value box "(val)". The value box will be used to store values in the variable.
The LOOKUP function traverses the environment and locates the symbol X. Because symbols are interned at this point, the operation X=Y can be used to find out if X and Y are the same symbol. When LOOKUP finds a variable associated with the given symbol, it returns its value box (not its value).
When no variable with the given name is found in the current lexical environment, the global binding (symbol binding) of the symbol is returned (also as a value box).
This approach uses deep binding (looking up symbols in lists) for local (lexical) variables and shallow binding (symbols are bound directly to value boxes) for global bindings. Shallow binding is much faster and lexical environments are typically small, so this method is almost as fast as shallow binding.
In case you wonder why not to use shallow binding for lexical variables, too: In lexically scoped systems there can be multiple, independent variables with the same name, and shallow bindings allows only one binding per symbol. This is a problem that deep binding solves, but at the price of linear-time lookup instead of constant-time lookup.
lookup(x) do var a ,e; e := Env; while (e \= NIL) do a := Car[e]; while (a \= NIL) do if (caar(a) = x) return cdar(a); a := Cdr[a]; end e := Cdr[e]; end return Cdr[x]; end
The variable PARENS keeps track of the number of open left parentheses.
var Parens;
The RDLIST function reads the members of a list (which in turn may be lists) and returns it. It recurses through the READ function, which implements the LISP reader, in order to read each individual member of the list. Note that the name "READ" is not reserved for a system call in T3X, so it can be used to name the LISP reader.
RDLIST appends new elements to its result destructively by maintaining a pointer (A) that points to the last cons in the list. This is why it starts with LST=(NIL): the first member of the resulting list will be inserted in the place of the NIL. The append pointer (A) is initially NIL in order to identify the empty list, (). It will be pointed to the next free slot after reading and inserting the first member of a list. I.e., initially:
LST --> (NIL) A --> NIL
and after reading the first and second element:
A A | | V V LST --> (first NIL) LST --> (first second NIL)
The next element will always be inserted at car(A), another (NIL) will be attached at cdr(A), and then A will move to that (NIL).
RDLIST also reads dotted lists. There are lots of ugly edge cases to cover here, like:
decl read(0); rdlist() do var n, lst, a, badpair; badpair := "bad pair"; Parens := Parens+1; lst := cons(NIL, NIL); save(lst); a := NIL; while (1) do if (Errflag) return NIL; n := read(); if (n = EOT) return error("missing ')'", UNDEF); if (n = DOT) do if (a = NIL) return error(badpair, UNDEF); n := read(); Cdr[a] := n; if (n = RPAREN \/ read() \= RPAREN) return error(badpair, UNDEF); unsave(1); Parens := Parens-1; return lst; end if (n = RPAREN) leave; ie (a = NIL) a := lst; else a := Cdr[a]; Car[a] := n; Cdr[a] := cons(NIL, NIL); end Parens := Parens-1; if (a \= NIL) Cdr[a] := NIL; unsave(1); return a = NIL-> NIL: lst; end
SYMBOLIC returns truth, if the character in its parameter, C, can appear in a symbol name. It also returns truth for the slash ('/') character, which escapes (slashifies) special characters in symbol names.
symbolic(c) return c >= 'a' /\ c <= 'z' \/ c >= 'A' /\ c <= 'Z' \/ c >= '0' /\ c <= '9' \/ c = '-' \/ c = '/';
The following variables will be bound to internal symbols known to the KILO LISP 22 system. First special forms and then built-in functions (and the T symbol indicating truth).
var S_apply, S_if, S_ifnot, S_lambda, S_lamstar, S_macro, S_progn, S_quote, S_quasiquote, S_unquote, S_unquote_splice, S_setq; var S_t, S_binding, S_cons, S_car, S_cdr, S_atom, S_eq, S_gensym, S_it, S_suspend, S_gc, S_eofp, S_load, S_rplaca, S_rplacd, S_read, S_prin1, S_print, S_error, S_symbols;
RDSYM reads a symbol name, which may consist of letters, decimal digits, and minus signs, as indicated by the SYMBOLIC function, above. When a slash ('/') is found in a symbol name, the following character is included, no matter what it is. For instance, the text /+/− will result in a symbol consisting of the characters '+' and '−'.
RDSYM folds all characters to lower case. When the symbol read by it is "nil", it will return the NIL object. In all other cases it will intern the accepted symbol and return it.
rdsym(c) do var s::SYMLEN+1, i; i := 0; while (symbolic(c)) do if (c = '/') c := rdchci(); ie (SYMLEN = i) error("long symbol", UNDEF); else if (i < SYMLEN) do s::i := c; i := i+1; end c := rdchci(); end s::i := 0; Rejected := c; if (strequ(s, "nil")) return NIL; return addsym(s, UNDEF); end
The SYNTAX function is called in various places where a syntactically wrong object (X) is encountered. This includes expressions like (IF X Y), where the third argument is missing, but also lexically wrong text, like *FOO* (which is not a valid symbol).
syntax(x) error("syntax", x);
QUOTE is used to construct expressions of the form (Q N), where N is any expression and Q is one out of QUOTE, QUASIQUOTE, UNQUOTE, and UNQUOTE-SPLICE. Q does not have to be protected from GC, because it is a symbol and all symbols are protected through the symbol list.
quote(q, n) return cons(q, cons(n, NIL));
RDWORD reads a "word". In KILO LISP 22 a word is a list of single-character symbols. Words may be abbreviated (condensed) with a # character followed by the symbols they contain. For example
(f o o b a r b a z)
may be written as
#foobarbaz
There is no empty word, so there must be at least one symbolic character after the # sign. Non-symbolic characters may be included by slashifying them. E.g.:
#hello/,/ world/!
rdword() do var s::2, n, c; s::1 := 0; c := rdchci(); if (\symbolic(c)) syntax(UNDEF); n := cons(NIL, NIL); save(n); while (1) do if (c = '/') c := rdchci(); s::0 := c; Car[n] := addsym(s, UNDEF); c := rdchci(); ie (symbolic(c)) do Cdr[n] := cons(NIL, NIL); n := Cdr[n]; end else leave; end Rejected := c; n := unsave(1); return n; end
The READ function implements the LISP reader, which parses textual representations of LISP objects and translates them to trees of nodes. This is what the function does:
READ returns the tree it generated from the parsed object. It reports a syntax error when it fails to parse its input.
read() do var c; c := rdchci(); while (1) do while (c = ' ' \/ c = '\t' \/ c = '\n' \/ c = '\r') do if (Errflag) return NIL; c := rdchci(); end if (c \= ';') leave; while (c \= '\n' /\ c \= '\r') c := rdch(); end if (c = EOT \/ c = '%') return EOT; if (c = '(') do return rdlist(); end if (c = '\'') do return quote(S_quote, read()); end if (c = '@' \/ c = '`') do return quote(S_quasiquote, read()); end if (c = ',') do c := rdchci(); if (c = '@') return quote(S_unquote_splice, read()); Rejected := c; return quote(S_unquote, read()); end if (c = ')') do if (\Parens) error("extra paren", UNDEF); return RPAREN; end if (c = '.') do if (\Parens) error("free dot", UNDEF); return DOT; end if (symbolic(c)) do return rdsym(c); end if (c = '#') do return rdword(); end syntax(UNDEF); return UNDEF; end
The PRINT and PRINT2 functions implement the LISP printer, which turns LISP objects — trees of nodes — into textual representations. It is (almost) the inverse function of READ. Almost the inverse function, because there are some outputs of PRINT that READ cannot read (like the string *CLOSURE* that will print when passing a function to PRINT).
The printer will stop printing when the error flag is set or when the maximum print depth is exceeded. Therefore, the values of expressions like
(let ((x '(1))) (rplaca x x))
will not overflow the stack. (Lists that cycle through cdr will print indefinetely, though.)
print2(n, d) do if (d > PRDEPTH) error("print depth", UNDEF); if (Errflag) return error("stop", UNDEF); ie (n = NIL) do pr("nil"); end else ie (n = EOT) do pr("*eot*"); end else ie (n >= SPCL \/ Tag::n & ATOMB) do pr("*unprintable*"); end else ie (symbolp(n)) do pr(symstr(n)); end else do ! List if (\atomp(n) /\ Car[n] = S_lamstar) do pr("*closure*"); return; end pr("("); while (n \= NIL) do print2(Car[n], d+1); n := Cdr[n]; ie (symbolp(n)) do pr(" . "); print2(n, d+1); n := NIL; end else if (n \= NIL) do pr(" "); end end if (\Errflag) pr(")"); end end print(n) print2(n, 0);
An "image file" is a file containing the entire node pool of the LISP system plus some meta data. The layout of an image file is as follows ("W" is the size of a machine word in bytes, two-byte integers are stored in big-endian order):
Size (Bytes) | Content | Location |
---|---|---|
4 | Magic ID ("KL22") | MAGIC |
2 | Number of nodes in pool | NNODES |
2 | Offset of freelist | FREELIST |
2 | Offset of symbol list | SYMBOLS |
2 | Next GENSYM ID | ID |
W × NNODES | Car fields of node pool | CAR |
W × NNODES | Cdr fields of node pool | CDR |
NNODES | Tag fields of node pool | TAG |
By saving an image file and reading it back later, the previous state of the LISP system is restored.
The SUSPEND function writes an image to the file specified in the string S and the FASLOAD function loads an image file from the given file. Both functions will report a fatal error when encountering an I/O error. In addition, FASLOAD will report a fatal error when the image file passed to it has the wrong magic ID or the wrong pool size.
var Magic; var Id; dowrite(fd, b, k) if (t.write(fd, b, k) \= k) error("image write error", UNDEF); suspend(s) do var fd, k, buf::20; fd := t.open(s, T3X.OWRITE); if (fd < 0) error("suspend", strsym(s)); k := strlen(Magic); t.memcopy(buf, Magic, k); buf::(k) := (NNODES >> 8); buf::(k+1) := NNODES; buf::(k+2) := (Freelist >> 8); buf::(k+3) := Freelist; buf::(k+4) := (Symbols >> 8); buf::(k+5) := Symbols; buf::(k+6) := (Id >> 8); buf::(k+7) := Id; dowrite(fd, buf, k+8); dowrite(fd, Car, NNODES * t.bpw()); dowrite(fd, Cdr, NNODES * t.bpw()); dowrite(fd, Tag, NNODES); t.close(fd); end doread(fd, b, k) if (t.read(fd, b, k) \= k) fatal("image read error"); fasload(s) do var fd, k, n, m, buf::20; fd := t.open(s, T3X.OREAD); if (fd < 0) return; k := strlen(Magic); doread(fd, buf, k+8); n := buf::k << 8 | buf::(k+1); if (n \= NNODES \/ t.memcomp(buf, Magic, k)) fatal("bad image"); Freelist := buf::(k+2) << 8 | buf::(k+3); Symbols := buf::(k+4) << 8 | buf::(k+5); Id := buf::(k+6) << 8 | buf::(k+7); doread(fd, Car, NNODES * t.bpw()); doread(fd, Cdr, NNODES * t.bpw()); doread(fd, Tag, NNODES); t.close(fd); end
CHECK makes sure that the list X has at least K0 and at most KN elements. When KN=−1, it makes sure than X has at least K0 elements (and there is no upper limit). When neither case is given, it reports a syntax error. It also reports a syntax error when X is a dotted list (i.e. when it does not end with NIL).
The CHECK function is used to make sure that applications of built-in functions and special forms have the proper number of arguments. Therefore expressions like (CAR '1 '2) will be reported as syntax errors.
check(x, k0, kn) do var i, a; i := 0; a := x; while (\atomp(a)) do i := i+1; a := Cdr[a]; end; if (a \= NIL \/ i < k0 \/ (kn \= %1 /\ i > kn)) syntax(x); end
The LOAD function reads and evaluates all LISP expressions contained in the file specified in the string S. All expressions evaluate as if entered at the interpreter prompt. When the given file does not exist, an error is reported.
Most of the code of LOAD saves, modifies, and restores the variables that comprise the input buffer. The state of the buffer is stored in local variables, then the state is reset, the buffer is linked to the given file, and a new buffer area (in TMPBUF) is allocated. After loading the file, the previous state of the buffer is restored.
Because stack space may be limited, recursive loading is limited to NLOAD levels of nesting.
var Loads; var Tmpbuf::NLOAD*BUFLEN; decl eval(1); load(s) do var fd, ib, ip, ik, in, re; if (Loads >= NLOAD) return error("load limit", UNDEF); fd := t.open(s, T3X.OREAD); if (fd < 0) return error("load", strsym(s)); in := Input; ip := Inp; ik := Ink; ib := Inbuf; re := Rejected; Input := fd; Inbuf := @Tmpbuf::(Loads*BUFLEN); Inp := 0; Ink := 0; Rejected := EOT; Loads := Loads+1; while (\Errflag) do Acc := read(); if (Acc = EOT) leave; eval(Acc); end t.close(fd); Loads := Loads-1; Rejected := re; Inbuf := ib; Ink := ik; Inp := ip; Input := in; end
The TYPE function is called whenever the wrong type is passed to a built-in function in an expression X.
type(x) error("type", x);
The function BUILTIN receives a list X that represents the application of a built-in function, like
(cons x y)
BUILTIN then identifies the function in the first slot of the list and runs the code that implements the function. The code fragments that implement built-in functions all look similar:
For instance, the code implementing CAR would make sure that X has two elements (the name CAR and the argument of CAR). It would then make sure that the argument of CAR is not an atom and, finally, it would return (CAADR X), the CAR of the first argument in X. The code for CONS or EQ would skip the type check, because these functions accept any types of arguments.
While BUILTIN is (formally) a large function, it is really a collection of small functions of the above form.
builtin(x) do var s, n; ie (S_car = Car[x]) do check(x, 2, 2); n := cadr(x); if (n = NIL) return NIL; if (atomp(n)) type(x); return Car[n]; end else ie (S_cdr = Car[x]) do check(x, 2, 2); n := cadr(x); if (n = NIL) return NIL; if (atomp(n)) type(x); return Cdr[n]; end else ie (S_eq = Car[x]) do check(x, 3, 3); return cadr(x) = caddr(x)-> S_t: NIL; end else ie (S_atom = Car[x]) do check(x, 2, 2); return atomp(cadr(x))-> S_t: NIL; end else ie (S_cons = Car[x]) do check(x, 3, 3); return cons(cadr(x), caddr(x)); end else ie (S_rplaca = Car[x]) do check(x, 3, 3); if (atomp(cadr(x))) type(x); Car[cadr(x)] := caddr(x); return cadr(x); end else ie (S_rplacd = Car[x]) do check(x, 3, 3); if (atomp(cadr(x))) type(x); Cdr[cadr(x)] := caddr(x); return cadr(x); end else ie (S_gensym = Car[x]) do check(x, 1, 1); Id := Id+1; s := ntoa(Id); s := s-1; s::0 := 'G'; return addsym(s, UNDEF); end else ie (S_eofp = Car[x]) do check(x, 2, 2); return cadr(x) = EOT-> S_t: NIL; end else ie (S_read = Car[x]) do check(x, 1, 1); return read(); end else ie (S_prin1 = Car[x]) do check(x, 2, 2); print(cadr(x)); return cadr(x); end else ie (S_print = Car[x]) do check(x, 2, 2); print(cadr(x)); nl(); return cadr(x); end else ie (S_binding = Car[x]) do check(x, 2, 2); n := cadr(x); if (\symbolp(n)) type(x); n := lookup(n); if (Car[n] = UNDEF) return NIL; return n; end else ie (S_load = Car[x]) do check(x, 2, 2); if (\symbolp(cadr(x))) type(x); load(symstr(cadr(x))); return S_t; end else ie (S_error = Car[x]) do check(x, 2, 3); if (\symbolp(cadr(x))) type(x); ie (cddr(x) = NIL) error(symstr(cadr(x)), UNDEF); else error(symstr(cadr(x)), caddr(x)); return UNDEF; end else ie (S_gc = Car[x]) do check(x, 1, 2); if (Cdr[x] \= NIL) Verbose_GC := cadr(x) \= NIL; return strsym(ntoa(gc())); end else ie (S_suspend = Car[x]) do check(x, 2, 2); if (\symbolp(cadr(x))) type(x); suspend(symstr(cadr(x))); return S_t; end else ie (S_symbols = Car[x]) do check(x, 1, 1); return Symbols; end else do syntax(x); return UNDEF; end end
SPECIAL returns truth, if N is an atom denoting a special operator, like IF, PROGN, etc. S_LAMSTAR denotes the atom LAMBDA*, which is used internally to form closures. Obviously it is a bad idea to use LAMBDA/* in KILO LISP programs. Closures should only be generated by applications of LAMBDA.
specialp(n) return n = S_quote \/ n = S_if \/ n = S_progn \/ n = S_ifnot \/ n = S_lambda \/ n = S_lamstar \/ n = S_apply \/ n = S_setq \/ n = S_macro;
CKLAM checks the syntax of a LAMBDA form: it must have at least two arguments and the first one must be either an atom or a list of atoms. A dotted list is fine.
cklam(x) do var p; check(x, 3, %1); p := cadr(x); while (\atomp(p)) do if (\symbolp(Car[p])) syntax(x); p := Cdr[p]; end end
The following STRUCT enumerates the "states" or "modes" of the evaluator. STRUCT is like ENUM in C (but it is most often used to give structure to vectors in T3X). The modes are as follows:
MHALT | evaluation is complete |
---|---|
MEXPR | start evaluation of an expression |
MLIST | evaluate arguments of an application |
MBETA | apply an operator (special, built-in, or closure) |
MRETN | return from a closure application |
MAPPL | evaluate arguments of APPLY |
MPRED | select alternative or consequent of IF |
MNOTP | select predicate or alternative of IFNOT |
MSETQ | store a value in a variable |
MPROG | evaluate arguments of PROGN |
The modes will be explained in detail in the following sections.
struct MODES = MHALT, MEXPR, MLIST, MBETA, MRETN, MAPPL, MPRED, MNOTP, MSETQ, MPROG;
SPECIAL is like BUILTIN, but computes special forms. In addition to the special form in X it receives another argument, PM, which is an output parameter (pointer) that is used to update the evaluator mode. SPECIAL will return the (intermediate) result of its operation to the caller and also set the variable pointed to by PM to the mode of the subsequent computation.
All special forms change the mode of the evaluator. This is one of their defining qualities in KILO LISP. The other one is that special forms receive their arguments unevaluated. For instance, evaluating
(QUOTE x)
will pass exactly that form to SPECIAL; X will not be evaluated first. The code fragment interpreting QUOTE will then check the syntax of QUOTE and return the second element of the QUOTE form. It will also restore the previous evaluator state by removing it from the mode stack and storing it in PM[0]. When QUOTE is evaluated directly at the prompt, the previous evaluator state would be MHALT, causing the evaluator to halt and return the argument of QUOTE.
Evaluation of most special forms is more complicated, though.
For example, the IF special form saves the MPRED state on the mode stack, sets the mode to MEXPR, saves the cddr part (the second and third argument of IF) on the stack, and returns the first argument of the IF form. E.g. (IF A B C) would
Because the mode is set to MEXPR, the evaluator would then evaluate A, leaving the result in ACC. When evaluation of A finishes, the previous state is popped from the mode stack. IF left MPRED on the mode stack, so either B or C is selected, depending on the value of ACC. The state will then be changed back to MEXPR so the selected expression (B or C) evaluates.
Most special forms work in this way:
Many special forms set the mode to MEXPR, causing the value returned by them to be evaluated. Exceptions are QUOTE and LAMBDA*, which are self-quoting, and LAMBDA and MACRO, which return their results directly. These forms pop the previous mode from the mode stack.
special(x, pm) do if (S_quote = Car[x]) do check(x, 2, 2); pm[0] := munsave(); return cadr(x); end if (S_if = Car[x]) do check(x, 4, 4); msave(MPRED); pm[0] := MEXPR; save(cddr(x)); return cadr(x); end if (S_progn = Car[x]) do pm[0] := MEXPR; if (Cdr[x] = NIL) return NIL; if (cddr(x) = NIL) return cadr(x); msave(MPROG); save(cddr(x)); return cadr(x); end if (S_ifnot = Car[x]) do check(x, 3, 3); msave(MNOTP); pm[0] := MEXPR; save(caddr(x)); return cadr(x); end if (S_lambda = Car[x]) do cklam(x); pm[0] := munsave(); return cons(S_lamstar, cons(Env, Cdr[x])); end if (S_lamstar = Car[x]) do ! check(x, 3, %1); pm[0] := munsave(); return x; end if (S_apply = Car[x]) do check(x, 3, 3); msave(MAPPL); pm[0] := MEXPR; save(caddr(x)); save(UNDEF); return cadr(x); end if (S_macro = Car[x]) do check(x, 2, 2); if (atomp(cadr(x)) \/ caadr(x) \= S_lambda) syntax(x); cklam(cadr(x)); pm[0] := munsave(); return cons(S_macro, cons(cons(S_lamstar, cons(Env, cdadr(x))), NIL)); end if (S_setq = Car[x]) do check(x, 3, 3); if (\symbolp(cadr(x))) syntax(x); msave(MSETQ); pm[0] := MEXPR; save(cadr(x)); return caddr(x); end syntax(x); return UNDEF; end
The EXPAND function receives a LISP form in X and returns its expanded form. An expanded form is a form that contains only function applications and applications of "primitive" special forms, but no applications of "derived special forms". A primitive special form is a form that is being processed by the SPECIAL function, above. A derived special form is a list with the name of a macro (a "keyword") in its first slot.
The process of macro expansion searches derived special forms in the form X and "expands" them by applying the function associated with the keyword of the form to the arguments of the form. Arguments are not evaluated. When a derived special form
(keyword argument-1 ... argument-N)
is found in X, EXPAND transforms it to the form
(apply *closure* (quote (argument-1 ... argument-N)))
where *CLOSURE* is the function associated with the macro KEYWORD.
It then evaluates the resulting form and replaces the application of the derived special operator with its result. Derived special forms are expanded recursively, i.e. the result of macro expansion is expanded again before substituting the application of the special operator.
Macro expansion never expands atoms or quoted objects. The result of expanding a form X that does not contain any derived special forms is the form X itself. Hence macro expansion eventually reaches a fixed point where its result is identical to its argument (unless expansion does not terminate due to a faulty macro).
The first half of EXPAND deals with the trivial cases and performs the transformation described above. The second half applies EXPAND recursively to every element of X, thereby performing a depth-first traversal of X. Note that dotted lists are never expanded.
expand(x) do var n, m, p; if (atomp(x)) return x; if (Car[x] = S_quote) return x; n := symbolp(Car[x])-> Car[lookup(Car[x])]: UNDEF; if (\atomp(n) /\ Car[n] = S_macro) do m := cons(Cdr[x], NIL); m := cons(S_quote, m); m := cons(m, NIL); m := cons(cadr(n), m); m := cons(S_apply, m); save(m); n := eval(m); Car[Stack] := n; n := expand(n); unsave(1); return n; end p := x; while (\atomp(p)) p := Cdr[p]; if (symbolp(p)) return x; save(x); n := NIL; save(n); p := x; while (p \= NIL) do m := expand(Car[p]); n := cons(m, n); Car[Stack] := n; p := Cdr[p]; end n := nrev(unsave(1)); unsave(1); return n; end
The BINDARGS function binds the symbols in the list V to the values (arguments) in the list A. It does so by creating an association list with the elements of V as keys and value boxes containing the elements of A as values. The resulting association list is then added to the front to the current environment ENV. Existing variables in ENV which have the same names as symbols in V are thereby shadowed.
V and A must have the same length unless V is a dotted list. When V is a dotted list, there must be one corresponding element in A for each element of V that comes before the dot. Any excess elements of A will then be bound to the last symbol in V as a list. For example
V = (a b . c) A = (1 2 3 4 5)
will result in the association list
((a . (1)) (b . (2)) (c . ((3 4 5))))
When number of symbols in V and values in A does not match, BINDARGS reports an error. In case of success, it extends ENV with the new bindings.
bindargs(v, a) do var e, n; e := NIL; save(e); while (\atomp(v)) do if (a = NIL) return error("too few args", Acc); n := cons(Car[v], cons(Car[a], NIL)); e := cons(n, e); Car[Stack] := e; v := Cdr[v]; a := Cdr[a]; end ie (symbolp(v)) do n := cons(v, cons(a, NIL)); e := cons(n, e); Car[Stack] := e; end else if (a \= NIL) do return error("extra args", Acc); end Env := cons(e, Env); unsave(1); end
FUNAPP applies a function ("closure") to some arguments. It receives the application of a closure of the form
((lambda* vars env body ...) args ...)
where (LAMBDA* ...) is a closure as returned by the LAMBDA special form. A closure is a function containing the following components:
A closure is applied to the arguments ARGS by
When the top of the mode stack is MRETN when a function is applied, then the function application is a tail application. When the called (applied) function returns, the caller will itself return, so there is no need to carry along the saved environment of the caller. Therefore steps 1 and 4 will be omitted in the above list when performing a tail application. Subsequently tail-recursive functions will compute in constant space.
funapp(x) do Acc := x; if (atomp(Car[x]) \/ caar(x) \= S_lamstar) syntax(x); ie (Mstack \= NIL /\ Car[Mstack] = MRETN) do Env := cadar(Acc); bindargs(caddar(Acc), Cdr[Acc]); end else do save(Env); Env := cadar(Acc); bindargs(caddar(Acc), Cdr[Acc]); msave(MRETN); end return cons(S_progn, cdddar(x)); end
The EVAL function is the core of the LISP system. It receives a LISP expression and returns its "normal form", i.e. the result of interpreting the expression. There is a limit on the number of recursive applications of EVAL (EVDEPTH), but this limit only affects macro expansion. While evaluating an expression EVAL never recurses.
The first thing EVAL does is to macro-expand the expression (X) passed to it. The macro expander will in turn apply EVAL in order to expand any macros in X. This mutual recursion is what it limited to EVDEPTH iterations.
The evaluator has a current "state" or "mode" (M), which determines in which way it will transform the expression X. In principle the evaluator can be thought of as a set of functions, one for each mode, and the main loop of the interpreter passes data between these functions. When one internal function calls another internal function, the current mode is pushed to the mode stack. In order to return to the caller, the previous mode is popped from the stack. The mode stack is not only used to return to callers, though. Some functions push a "continuation" to the mode stack: a mode where the computation shall continue when completing the current mode. This has been illustrated in the section on "special forms".
The evaluator stores results (and intermediate results) in a register named the "accumulator" (ACC). This register is also used internally to pass data between the internal functions (modes).
Here follows a summary of the different modes of the evaluator.
When the expression ACC is a symbol, load the value bound to it into ACC and go the previous mode. When the symbol is not bound, report an error.
When ACC is a non-symbolic atom (NIL, the EOT object, etc), go to the previous state (keeping the atom in ACC).
When ACC is a special form (its car part is a special operator or macro name), go to state MBETA.
When ACC is a function application, save cdr(ACC) on the stack, load car(ACC) to ACC, push cdr(ACC) and NIL to the stack and MLIST to the mode stack. The state remains MEXPR, which causes car(ACC) to be evaluated next. After evaluation of car(ACC), MLIST will be popped from the stack, so MLIST is the continuation of this case.
In this state the remaining arguments to be processed will be in cadr(STACK) and the arguments processed so far will be in car(STACK) in reverse order. When there are no more arguments to process, MLIST will construct the final argument list by adding the last element (ACC) and reversing the list. The final argument list is then stored in ACC and both parameters are removed from the stack. The mode will then change to MBETA.
When there are more arguments to be processed, MLIST will cons the previously evaluated argument (ACC) to the top element on the stack and fetch the next element from cadr(STACK) and remove it from that list. It will then push MLIST to the mode stack again (because it has been popped by evaluating the current list element) and go to mode MEXPR in order to evaluate the next element.
This mode delegates application of special forms, built-in functions, and closures to the functions BUILTIN, SPECIAL, and FUNAPP. Special forms get their continuation mode from the function SPECIAL by passing the address of M (@M) to it. Built-in functions return to the previous mode after computing their result. Closures evaluate the body returned by FUNAPP by going to mode MEXPR. Note that FUNAPP pushes MRETN in order to return from the function later.
This mode is called MBETA because it loosely resembles the "beta" conversion of lambda calculus.
Returning from a function is done by restoring the environment (ENV) saved on the stack and the mode saved on the mode stack. Both are removed from their stacks.
The APPLY operator has two arguments that need to be evaluated. When MAPPL is entered for the first time, car(STACK) is UNDEF, cadr(STACK) contains the (not yet evaluated) second argument, and ACC contains the (already evaluated) first argument. The first argument is then stored in car(STACK) and the second argument is prepared for evaluation (like in MLIST).
When MAPPL is entered for the second time, it conses the first argument to the second one and goes to mode MBETA. At this point ACC will contain the form
(first second-1 ... second-N)
where "first" is the (evaluated) first argument of APPLY and each "second-i" is a member of the list forming its (evaluated) second argument. The above form is then evaluated without evaluating its arguments again, by changing the mode directly to MBETA instead of going through MLIST.
When MPRED is entered, the predicate of the IF special form
(IF predicate consequent alternative)
has been evaluated and its result is in ACC. Car(STACK) contains the "consequent" and "alternative" parts of the special form. MPRED loads the alternative into ACC when ACC is NIL and otherwise it loads the consequent. It then changes the mode to MEXPR in order to evaluate the selected branch.
This mode evaluates the IFNOT special form, which is probably has no counterpart in other LISPs. The IFNOT special form
(IFNOT predicate alternative)
evaluates the predicate and if the predicate it not NIL it just returns the value of the predicate (without evaluating it again). If the predicate is NIL it evaluates and returns the alternative.
When MNOTP is entered, the value saved in car(STACK) is the alternative branch. When ACC is not NIL, MNOTP returns to the previous mode. When it is NIL, it will submit car(STACK) for evaluation.
IFNOT is very handy for implementing the OR special form or predicate-only clauses of COND. See the LISP part of this document for examples.
When MSETQ is entered, the value to be assigned to a variable is in ACC and the symbol naming the variable is in car(STACK). MSETQ removes the symbol from the stack, stores the value in its value box, and returns to the previous mode.
MPROG evaluates a sequence of expressions (a "program"). It discards the values of all expressions except for the last one, which it returns. Empty sequences result in NIL, but this case is handled by the function SPECIAL, which has been discussed earlier.
MPROG keeps the list of expressions to evaluate in car(STACK). When car(STACK) contains one single expression, it removes the list from the stack and submits the expression for evaluation. When there are multiple expressions in the list, it removes and submits one expression from the list and pushes MPROG to the mode stack as its continuation.
This mode returns from EVAL and delivers the value in ACC to the caller.
Note that MHALT is on top of the mode stack when the EVAL is entered, so returning to the previous mode after finishing evaluation of the expression passed to EVAL will cause evaluation to halt.
Evaluation will also halt when an error is reported (ERRFLAG is set) or when control-C is pressed on the terminal keyboard.
var Evlev; eval(x) do var n, m; if (Evlev >= EVDEPTH) return error("expansion limit", UNDEF); Evlev := Evlev + 1; Acc := expand(x); msave(MHALT); m := MEXPR; while (\Errflag) do ie (m = MEXPR) do if (\Pass /\ tty.query() /\ tty.readc() = 3) error("stop", UNDEF); ie (symbolp(Acc)) do n := Car[lookup(Acc)]; if (n = UNDEF) return error("undefined", Acc); Acc := n; m := munsave(); end else ie (atomp(Acc)) do m := munsave(); end else ie (specialp(Car[Acc])) do m := MBETA; end else do save(Cdr[Acc]); Acc := Car[Acc]; save(NIL); msave(MLIST); end end else ie (m = MLIST) do ie (cadr(Stack) = NIL) do Acc := nrev(cons(Acc, unsave(1))); unsave(1); m := MBETA; end else do Car[Stack] := cons(Acc, Car[Stack]); Acc := caadr(Stack); Car[Cdr[Stack]] := cdadr(Stack); msave(MLIST); m := MEXPR; end end else ie (m = MBETA) do ie (specialp(Car[Acc])) do Acc := special(Acc, @m); end else ie (symbolp(Car[Acc])) do Acc := builtin(Acc); m := munsave(); end else do Acc := funapp(Acc); m := MEXPR; end end else ie (m = MRETN) do Env := unsave(1); m := munsave(); end else ie (m = MAPPL) do ie (Car[Stack] = UNDEF) do Car[Stack] := Acc; Acc := cadr(Stack); msave(MAPPL); m := MEXPR; end else do n := unsave(1); unsave(1); Acc := cons(n, Acc); m := MBETA; end end else ie (m = MPRED) do n := unsave(1); ie (Acc = NIL) Acc := cadr(n); else Acc := Car[n]; m := MEXPR; end else ie (m = MNOTP) do n := unsave(1); ie (Acc = NIL) do Acc := n; m := MEXPR; end else do m := munsave(); end end else ie (m = MSETQ) do n := unsave(1); Car[lookup(n)] := Acc; Acc := n; m := munsave(); end else ie (m = MPROG) do ie (cdar(Stack) = NIL) do Acc := Car[unsave(1)]; m := MEXPR; end else do Acc := caar(Stack); Car[Stack] := cdar(Stack); msave(MPROG); m := MEXPR; end end else if (m = MHALT) do Evlev := Evlev - 1; return Acc; end end return NIL; end
The RESET function resets some variables to their initial values. For instance, it clears the registers (ACC, ENV, TMPCAR, TMPCDR), empties the stacks, and resets the various counters and the error flag. It prepares the evaluation of an expression. This cannot be done in EVAL itself, because EVAL recurses during macro expansion.
The lexical environment (ENV) must be NIL (empty) initially, because there are no local bindings at the top level. Aborted computation may leave spurious associations in ENV, so it is cleared here. TMPCAR, TMPCDR and the links of ZNODE are cleared so that they do not protect random objects from being recycled.
reset() do Evlev := 0; Parens := 0; Loads := 0; Errflag := 0; Acc := NIL; Env := NIL; Stack := NIL; Mstack := NIL; Tmpcar := NIL; Tmpcdr := NIL; Car[Znode] := NIL; Cdr[Znode] := NIL; end
The INIT function initializes the state of the system. When not in PASS mode, it sets up the TTYCtl class and puts it in raw mode on Unix systems. It then sets up all internal variables: the freelist and symbol list are set to NIL, the history buffers of READCON are cleared, etc. The input buffer is flushed.
A large part of INIT creates symbols for internal functions, like CONS, CAR, ATOM, etc. All these symbols are defined as constants (symbols referring to themselves), because EQ is used to identify them in the evaluator. For instance, the ATOM in (ATOM 'X) evaluates to ATOM and is then identified using EQ in BUILTIN.
Special form symbols are bound to UNDEF, which leaves them unbound. This means that APPLY or SETQ, for instance, are undefined at the LISP level, but they exist internally and can still be identified by EQ, which is done in the function SPECIAL.
This section also creates the symbols QUASIQUOTE, UNQUOTE, and UNQUOTE-SPLICE, so that the reader can expand their abbreviations.
The canonical truth symbol, T, is also defined here.
init() do if (\Pass) do tty.init(); if (tty.columns() > BUFLEN) do clear(); fatal("screen too wide"); end tty.mode(1); clear(); end Magic := "KL22"; Symbols := NIL; Freelist := NIL; Znode := NIL; Auxb1::0 := 0; Auxb2::0 := 0; Id := 0; flushinp(); Verbose_GC := 0; t.memfill(Tag, 0, NNODES); Znode := cons(NIL, NIL); S_t := addsym("t", SPCL); S_apply := addsym("apply", UNDEF); S_if := addsym("if", UNDEF); S_ifnot := addsym("ifnot", UNDEF); S_lambda := addsym("lambda", UNDEF); S_lamstar := addsym("lambda*", UNDEF); S_macro := addsym("macro", UNDEF); S_progn := addsym("progn", UNDEF); S_quote := addsym("quote", UNDEF); S_quasiquote := addsym("quasiquote", UNDEF); S_unquote := addsym("unquote", UNDEF); S_unquote_splice := addsym("unquote-splice", UNDEF); S_setq := addsym("setq", UNDEF); S_it := addsym("it", UNDEF); S_binding := addsym("binding", SPCL); S_cons := addsym("cons", SPCL); S_car := addsym("car", SPCL); S_cdr := addsym("cdr", SPCL); S_atom := addsym("atom", SPCL); S_eq := addsym("eq", SPCL); S_eofp := addsym("eofp", SPCL); S_rplaca := addsym("rplaca", SPCL); S_rplacd := addsym("rplacd", SPCL); S_gensym := addsym("gensym", SPCL); S_read := addsym("read", SPCL); S_prin1 := addsym("prin1", SPCL); S_print := addsym("print", SPCL); S_error := addsym("error", SPCL); S_load := addsym("load", SPCL); S_gc := addsym("gc", SPCL); S_suspend := addsym("suspend", SPCL); S_symbols := addsym("symbols", SPCL); reset(); end
A statement block without an associated function is the main program and entry point of a T3X module.
The main program first extracts the first command line argument (using T.GETARG) and activates PASS mode when it equals "−". Hence "kl −" can be used to run KILO LISP 22 in batch ("pass") mode with I/O connected to the SYSIN and SYSOUT descriptors.
After initializing the internal state of the system an image file is loaded. When an image file name is specified as the first command line argument, then the given file is loaded. When there is no argument or the argument equals "−" then the image file "klisp" is loaded. A non-existent file will be ignored and no image will be loaded. A corrupt or incompatible image file will cause a fatal error.
Before entering the "REPL" (the read-eval-print loop), the interpreter is heralded and the number of free nodes is reported (unless in PASS mode).
The REPL itself is simple. It
do var a::14, c; c := t.getarg(1, a, 14); Pass := 0; if (c > 0 /\ strequ(a, "-")) Pass := %1; init(); fasload(c>0 /\ \Pass-> a: "klisp"); pr("KILO LISP "); pr(@Magic::2); nl(); if (\Pass) do pr(ntoa(gc())); pr(" NODES"); nl(); end while (1) do reset(); message(""); pr("*\s"); Acc := read(); if (Errflag) loop; if (Acc = EOT) leave; message("[EVAL]"); Acc := eval(Acc); if (Errflag) loop; print(Acc); Car[Cdr[S_it]] := Acc; nl(); end if (Pass) nl(); fini(0); end
The KILO LISP 22 source code continues with the LISP part of the system.