README - Previous Version

This source file is part of the SubC compiler, which is described in the book

Practical Compiler Construction.

You might prefer to download the compiler source code. It is in the public domain.


        SubC Compiler, Version 2014-04-25
        By Nils M Holm, 2011--2014
        Placed in the public domain


        SUMMARY

        SubC is a compiler for a (mostly) strict and sane subset of
        C as described in "The C Programming Language", 2nd Ed (also
        known informally as "ANSI C" or "C89").

        A previous version of the compiler is described in great detail
        in the book "Practical Compiler Construction", which can be
        purchased at Lulu.com. See  http://www.t3x.org/reload/  for
        ordering information.

        The SubC compiler can compile itself. Unlike many other small C
        compilers, it does not bend the rules, though. Its code passes
        "gcc -Wall -pedantic" with little or no warnings (depending on
        the gcc version used). Of course, you can also bootstrap it with
        other C compilers, such as Clang or PCC.

        SubC is fast and simple. Its output is typically small (due to
        a non-bloated library), but not very runtime-efficient, because
        it employs none of the optimization strategies explained in the
        book.


        SUPPORTED SYSTEMS

        SubC generates code for GAS, the GNU assembler (except for the
        DOS version, which emits TASM-style syntax). It targets the
        following processors and operating systems:

                FreeBSD         386+    armv6+  x86-64
                Linux           386+*   -       x86-64+*
                NetBSD          -       -       x86-64
                OpenBSD         386*    -       -
                Windows/MinGW   386*    -       -
                Darwin          -       -       x86-64*
                DOS             8086+!

                + does *not* use the syscall layer of the host libc
                * untested
                ! experimental

        Platforms tagged "untested" are not regularly tested by myself
        and are therefore subject to potential bit rot. You can help
        me improve SubC by running "make tests" on an "untested"
        platform and inform me about the result.

        Porting SubC to other 32-bit or 64-bit platforms should be
        quite straight-forward. See the file "Porting" and/or the book
        for a general road map.

        The DOS version brings its own toolchain, which can be found in
        the s86/ directory, so no pre-existing DOS assembler or linker
        is required to compile SubC programs on DOS.


        CHANGES TO THE BOOK VERSION

        Note: The book version runs on FreeBSD/386 exclusively.

        The current version uses an improved code generator, which
        emits about 10% smaller and 20% faster code than the compiler
        described in the book.

        This version of the SubC compiler adds support for the
        following parts of C language to the version described in
        "Practical Compiler Construction":

        o  &array is now valid syntax (you no longer have to write
           &array[0]).

        o  the auto, register, and volatile keywords are recognized
           (as no-ops). Yes, volatile is safe, because SubC does not
           have register variables.

        o  enums may now be local.

        o  extern identifiers may now be declared locally.

        o  Prototypes may have the static storage class.

        o  There is support for structs and unions.

        o  jmp_buf is now a struct; setjmp() and longjmp() must be
           called with &jmp_buf.

        o  FILEs are now structs and can no longer be mistaken for
           ints by the type checker.

        o  The #error, #line, and #pragma commands have been added.

        o  There is a (non-standard) kprintf() function, which is
           like fprintf(), but uses a file descriptor.

        o  There is now a (slightly incompatible) varargs mechanism.
           Here is how it works:

                #include <varargs.h>

                void p(void *last, ...) {
                        void    *ap;
                        int     first;

                        ap = _va_start(&last);
                        first = (int) _va_arg(&ap);
                        vprintf("other args: %d %d %d\n", ap);
                        _va_end(&ap);
                }

                int p(int x, int y, ...);

           There may only be one argument in front of the "...", but
           the signature of the function can later be modified with a
           prototype.

        o  The vprintf(), vfprintf(), and vsprintf() functions have
           been added to the runtime library.


        DIFFERENCES BETWEEN SUBC (THIS VERSION) AND FULL C89

        o  The following keywords are not recognized:
           const, double, float, goto, long, short, signed, typedef,
           unsigned.

        o  There are only two primitive data types: the signed int and
           the unsigned char; there are also void pointers, and there
           is limited support for int(*)() (pointers to functions
           of type int).

        o  No more than two levels of indirection are supported, and
           arrays are limited to one dimension, i.e. valid declarators
           are limited to x, x[], *x, *x[], **x (and (*x)()).

        o  K&R-style function declarations (with parameter
           declarations between the parameter list and function body)
           are not accepted.

        o  There are no ``const'' variables.

        o  There is no typedef.

        o  There are no unsigned integers, long integers, or signed
           chars.

        o  Struct/union declarations must be separate from the
           declarations of struct/union objects, i.e.
           ``struct p { int x, y; } q;'' will not work.

        o  Struct/union declarations must be global (struct and union
           objects may be declared locally, though).

        o  There is no support for bit fields.

        o  Only ints, chars and arrays of int and char can be
           initialized in their declarations; pointers can be
           initialized with 0 (but not with NULL).

        o  Local arrays cannot have initializers.

        o  Local declarations are limited to the beginnings of function
           bodies (they do not work in other compound statements).

        o  Arguments of prototypes must be named.

        o  There is no goto.

        o  There are no parameterized macros.

        o  The #if and #elif preprocessor commands are not recognized.

        o  The preprocessor does not accept multi-line commands.

        o  The preprocessor does not accept comments in (some) commands.

        o  The preprocessor does not recognize the # and ## operators.

        o  There may not be any blanks between the # that introduces
           a preprocessor command and the subsequent command (e.g.:
           "# define" would not be recognized as a valid command).

        o  The sizeof operator requires parentheses.

        o  Subscripting an integer with a pointer (e.g. 1["foo"]) is
           not supported.

        o  Function pointers are limited to one single type, int(*)(),
           and they have no argument types.

        o  There is no assert() due to the lack of parameterized macros.

        o  The atexit() mechanism is limited to one function (this may
           even be covered by TCPL2).

        o  The signal() function returns int due to the lack of a more
           sophisticated type system; the return value must be casted to
           int(*)() manually.

        o  Most of the time-related functions are missing, in particular:
           asctime(), gmtime(), localtime(), mktime(), and strftime().

        o  The clock() function is missing, because CLOCKS_PER_SEC
           varies among systems.

        o  The ctime() function ignores the time zone.

        o  The varargs mechanism is slightly incompatible.


        SELECTING A TARGET PLATFORM

        The easiest way to prepare a build is to run the configure
        script in this directory. Don't worry, it is just a simple
        script that will figure out the host platform via uname and
        link a few machine-dependent files into place.

        If you want to configure the compiler manually: select one of
        the target descriptions (cg*.c) files in src/targets/cg and
        symlink it to src/cg.c. Also link the corresponding header
        file into place:

                (cd src && ln -fs targets/cg/cg386.c cg.c)
                (cd src && ln -fs targets/cg/cg386.h cg.h)

        Next select the C startup (crt0) file for your OS and CPU type
        from src/targets/OS-CPU/ and link it to src/lib/crt0.s, e.g.:

                (cd src/lib && \
                 ln -fs ../targets/freebsd-386/crt0-freebsd-386.s \
                        crt0.s)

        If your OS/CPU combination is not supported, you might try
        to port the compiler. See the file "Porting" for details.

        You will also need some operating system-dependent
        definitions, which are kept in files named "sys-OS-CPU.h"
        in src/targets/OS-CPU/. Just symlink the appropriate file
        to src/sys.h:

                (cd src && \
                 ln -fs targets/freebsd-386/sys-freebsd-386.h sys.h)

        Finally, select a limits-*.h file from targets/include/ that
        reflects the machine word size of your target and link it to
        src/include/limits.h:

                (cd src/include && \
                 ln -fs ../targets/include/limits-32.h limits.h)


        COMPILING THE COMPILER

        The compiler sources are contained in the "src" directory,
        so all the subsequent steps assume that this is your current
        working directory. (I.e. do a "cd src" now.)

        On a supported system, just type "make".

        Without "make" the compiler can be bootstrapped by running:

                cc -o scc0 *.c

        To compile and package the runtime library:

                ./scc0 -c lib/*.c
                ar -rc lib/libscc.a lib/*.o
                ranlib lib/libscc.a

        To compile the startup module:

                as -o lib/crt0.o lib/crt0.s

        To test the compiler, either run "make test" or perform the
        following steps:

                ./scc0 -o scc1 *.c
                ./scc1 -o scc *.c
                cmp scc1 scc

        There should not be any differences between the scc1 and scc
        executables.


        INSTALLING THE COMPILER

        The easy way would be to set up the PREFIX (and optionally
        SCCDIR and BINDIR) variables in src/Makefile to suit your
        taste and then run

                make dirs       # to create the directories
                make install

        If you want to install the SubC compiler manually, you will
        have to change the SCCDIR variable in the compiler itself.
        It points to the base directory which will contain the SubC
        headers and runtime library. SCCDIR defaults to "." but can
        be overridden on the command line:

                ./scc1 -o scc -D 'SCCDIR="INSTALLDIR"' *.c

        (where INSTALLDIR is where the compiler will be installed.)

        You can place the 'scc' executable wherever you want, as long
        as its location is covered by the PATH environment variable.
        The headers (include/*) go to INSTALLDIR/include, the library
        'lib/libscc.a' and the startup module 'lib/crt0.o' go to
        INSTALLDIR/lib.

        To test the installation just re-compile the compiler:

                rm scc && scc -o scc *.c


        DOS SUPPORT

        Please see the file SubC-for-DOS.


        WINDOWS SUPPORT

        There is a build script (winbuild.bat) in the base directory
        of the package, but I have never tried it. Caveat utilitor!

        All Windows-related code in the runtime system has been
        generously supplied by Jean-Marc Lienher (http://cod5.org).
        I am afraid I am not able to answer any questions about it,
        because I know nothing about Windows.

        The Windows version of SubC requires the MinGW infrastructure,
        but, like the Linux version, does not use the GNU libc. To
        compile SubC on Windows, just run configure and make, like
        on a Unix system.

        In case you have to configure it manually, you also have to
        do the following:

        Replace or symlink the following files:

                ln -fs targets/lib/init-windows.c lib/init.c
                ln -fs targets/lib/system-windows.c lib/system.c

        Finally change the EXESFX (executable suffix) variable in
        src/Makefile to ".exe" (without quotes).

        After installing these changes, the compiler should bootstrap
        as usual and pass the triple test.


        THANKS

        To the Super Dimension Fortress (SDF.ORG) for providing
        free shell accounts on 64-bit NetBSD machines.

        To Bakul Shah for granting me remote access to a 64-bit
        FreeBSD system and a Linux VM.

        To "minux" for porting the runtime module to Linux/x86-64.

        To Jean-Marc Lienher (cod5.org) for porting the runtime module
        to MinGW Windows/386.

        To Romain LWPB for porting the runtime module to OpenBSD/386
        and Darwin/x86-64 as well as for modifying the x86-64 code
        generator to emit proper code for Darwin.

        To The Unknown Hacker for various minor and not so minor
        contributions.


        CONTACT

        Send feedback, suggestions, etc to:

        n m h @ t 3 x . o r g

        See http://t3x.org/contact.html for current ways through my
        spam filter.


contact