README - Stable Version


	SubC Compiler, Version 2013-05-18
	By Nils M Holm, 2011--2013
	Placed in the public domain


	SUMMARY

	SubC is a compiler for a (mostly) strict and sane subset of
	C as described in "The C Programming Language", 2nd Ed.
	The language is also known informally as "ANSI C" or "C89".

	A previous version of the compiler is described in great detail
	in the book "Practical Compiler Construction", which can be
	purchased at Lulu.com. See  http://www.t3x.org/reload/  for
	ordering information.

	The SubC compiler can compile itself. Unlike many other small C
	compilers, it does not bend the rules, though. Its code passes
	"gcc -Wall -pedantic" with few or no warnings (depending on
	the gcc version used).

	The compiler generates code for GAS, the GNU assembler. It
	targets the 386 and x86-64 processors and currently offers
	runtime support for the following platforms:

		FreeBSD/386
		FreeBSD/x86-64
		NetBSD/x86-64
		Linux/386
		Linux/x86-64
		Windows/386 (MinGW)

	Porting it to other 32-bit or 64-bit platforms should be quite
	straightforward. See the file "Porting" and/or the book for a
	general road map.

	There is also an in-progress port to DOS on 8086-based
	processors, but it does not emit working code at this point
	and has not been touched in months. Feel free to improve it,
	though!

	SubC is fast and simple. Its output is typically small (due
	to a non-bloated library), but not very runtime efficient,
	because it employs none of the optimization strategies
	explained in the book.

	There is now an experimental synthesizing back-end (as described
	in the book), which generates much better code than the original
	stack-based back-end. However, it has not been tested on all
	platforms. See "selecting a target platform" below for details.


	CHANGES TO THE BOOK VERSION

	Note: The book version runs on FreeBSD/386 exclusively.

	This version of the SubC compiler adds support for the
	following parts of the C language to the version described in
	"Practical Compiler Construction":

	o  &array is now valid syntax (you no longer have to write
	   &array[0]).

	o  the auto and register keywords are recognized (as no-ops).

	o  enums may now be local.

	o  extern identifiers may now be declared locally.

	o  Prototypes may have the static storage class.

	o  There is support for structs and unions.

	o  jmp_buf is now a struct; setjmp() and longjmp() must be
	   called with &jmp_buf.

	o  FILEs are now structs and can no longer be mistaken for
	   ints by the type checker.

	o  The #error, #line, and #pragma commands have been added.

	o  There is a (non-standard) kprintf() function, which is
	   like fprintf(), but uses a file descriptor.
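
	As a rough illustration, the fragment below uses a few of the
	additions listed above: a static prototype, a local enum, and
	the auto and register keywords. The function weight() and its
	constants are made up for this example, and the fragment has
	not been compiled with SubC itself; it is only a sketch based
	on the list above.

```c
/* Prototypes may have the static storage class. */
static int weight(int kind);

/* Hypothetical example function, not part of SubC. */
static int weight(int kind) {
	/* enums may now be local */
	enum { LIGHT, MEDIUM, HEAVY };
	/* auto and register are recognized, but have no effect */
	auto int	w;
	register int	k;

	k = kind;
	w = 0;
	if (k == LIGHT) w = 1;
	if (k == MEDIUM) w = 5;
	if (k == HEAVY) w = 20;
	return w;
}
```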


	DIFFERENCES BETWEEN SUBC (THIS VERSION) AND FULL C89

	o  The following keywords are not recognized:
	   const, double, float, goto, long, short, signed, typedef,
	   unsigned, volatile.

	o  There are only two primitive data types: the signed int and
	   the unsigned char; there are also void pointers, and there
	   is limited support for int(*)() (pointers to functions
	   of type int).

	o  No more than two levels of indirection are supported, and
	   arrays are limited to one dimension, i.e. valid declarators
	   are limited to x, x[], *x, *x[], **x (and (*x)()).

	o  K&R-style function declarations (with parameter
	   declarations between the parameter list and function body)
	   are not accepted.

	o  There are no ``volatile'' or ``const'' variables. No
	   register allocation takes place, so all variables are
	   implicitly ``volatile''.

	o  There is no typedef.

	o  There are no unsigned integers and no long integers.

	o  Struct/union declarations must be separate from the
	   declarations of struct/union objects, i.e.
	   ``struct p { int x, y; } q;'' will not work.

	o  Struct/union declarations must be global (struct and union
	   objects may be declared locally, though).

	o  Only ints, chars and arrays of int and char can be
	   initialized in their declarations; pointers can be
	   initialized with 0 (but not with NULL).

	o  Local arrays cannot have initializers.

	o  Local declarations are limited to the beginnings of function
	   bodies (they do not work in other compound statements).

	o  Arguments of prototypes must be named.

	o  There is no goto.

	o  There are no parameterized macros.

	o  The #if and #elif preprocessor commands are not recognized.

	o  The preprocessor does not accept multi-line commands.

	o  The preprocessor does not accept comments in commands.

	o  The preprocessor does not recognize the # and ## operators.

	o  There may not be any blanks between the # that introduces
	   a preprocessor command and the subsequent command (e.g.:
	   "# define" would not be recognized as a valid command).

	o  The sizeof operator requires parentheses.

	o  Subscripting an integer with a pointer (e.g. 1["foo"]) is
	   not supported.

	o  Function pointers are limited to one single type, int(*)(),
	   and they have no argument types.

	o  There is no assert() due to the lack of parameterized macros.

	o  The atexit() mechanism is limited to one function (this may
	   even be covered by TCPL2).

	o  The setjmp()/longjmp() functions must be called with &jmp_buf
	   due to the lack of typedef. This is a bug!

	o  The signal() function returns int due to the lack of a more
	   sophisticated type system; the return value must be cast to
	   int(*)() manually.

	o  Most of the time-related functions are missing, in particular:
	   asctime(), gmtime(), localtime(), mktime(), and strftime().

	o  The clock() function is missing, because CLOCKS_PER_SEC
	   varies among systems.

	o  The ctime() function ignores the time zone.
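
	To give an idea of what code that stays inside these limits
	might look like, here is a small sketch. It assumes nothing
	beyond the restrictions listed above; the struct "point" and
	the functions manhattan() and count_char() are invented for
	this example, and the code has not been compiled with SubC
	itself.

```c
/* Struct declarations must be global and must be kept separate */
/* from the declarations of struct objects. */
struct point {
	int	x;
	int	y;
};

/* Arguments of prototypes must be named. */
int manhattan(struct point *p, struct point *q);
int count_char(char *s, int c);

/* Hypothetical helper: Manhattan distance between two points. */
int manhattan(struct point *p, struct point *q) {
	/* Local declarations go at the beginning of the function */
	/* body; only int and char (and pointers) are available. */
	int	dx, dy;

	dx = p->x - q->x;
	dy = p->y - q->y;
	if (dx < 0) dx = -dx;
	if (dy < 0) dy = -dy;
	return dx + dy;
}

/* Hypothetical helper: count occurrences of C in the string S. */
int count_char(char *s, int c) {
	int	n;
	char	*p;

	n = 0;
	/* No goto, so loops and early returns must do. */
	for (p = s; *p; p++)
		if (*p == c) n++;
	return n;
}
```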


	SELECTING A TARGET PLATFORM

	The easiest way to prepare a build is to run the configure
	script in this directory. Don't worry, it is just a simple
	script that will figure out the host platform via uname and
	link a few machine-dependent files into place.

	If the build should fail the triple test, run "make clean",
	re-run configure with the '-old' option, and retry. Running

		./configure -old

	will select the naive, stack-based back-end, which generates
	worse code, but may be worth a try in case you have run
	into a compiler bug.

	If you want to configure the compiler manually: select one of
	the target description files (cg*.c) in src/targets and symlink
	it to src/cg.c. Also link the corresponding header file and
	code generator into place:

		(cd src && ln -fs targets/cg386-stk.c cg.c)
		(cd src && ln -fs targets/cg386.h cg.h)
		(cd src && ln -fs targets/stkgen.c gen.c)

	Use stkgen.c (the stack-based generator) for *-stk.c target
	descriptions and syngen.c (the synthesizing generator) for
	*-syn.c descriptions.

	Also select the C startup (crt0) file for your OS and CPU type
	from src/targets and link it to src/lib/crt0.s, e.g.:

		(cd src/lib && \
		 ln -fs ../targets/crt0-freebsd-386.s crt0.s)

	If your OS/CPU combination is not supported, you might try
	to port the compiler. See the file "Porting" for details.

	You will also need some operating system-dependent definitions,
	which are kept in files named <your-os>.h in src/targets/. Just
	symlink the appropriate file to sys.h:

		(cd src && ln -fs targets/freebsd.h sys.h)

	Finally, select the limits-*.h file from src/targets/ that
	reflects the machine word size of your target and link it to
	src/include/limits.h:

		(cd src/include && \
		 ln -fs ../targets/limits-32.h limits.h)


	COMPILING THE COMPILER

	The compiler sources are contained in the "src" directory,
	so all the subsequent steps assume that this is your current
	working directory. (I.e. do a "cd src" now.)

	On a supported system, just type "make".

	Without "make" the compiler can be bootstrapped by running:

		cc -o scc0 *.c

	To compile and package the runtime library:

		./scc0 -c lib/*.c
		ar -rc lib/libscc.a lib/*.o
		ranlib lib/libscc.a

	To compile the startup module:

		as -o lib/crt0.o lib/crt0.s

	To test the compiler, either run "make test" or perform the
	following steps:

		./scc0 -o scc1 *.c
		./scc1 -o scc *.c
		cmp scc1 scc

	There should not be any differences between the scc1 and scc
	executables.


	INSTALLING THE COMPILER

	The easy way would be to set up the SCCDIR and BINDIR variables
	in src/Makefile to suit your taste and then run

		make clean install

	If you want to install the SubC compiler manually, you will
	have to change the SCCDIR variable in the compiler itself.
	It points to the base directory which will contain the SubC
	headers and runtime library. SCCDIR defaults to "." and can
	be overridden on the command line:

		./scc1 -o scc -D 'SCCDIR="INSTALLDIR"' *.c

	(where INSTALLDIR is where the compiler will be installed.)
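
	The way a compile-time default like this can be overridden
	with -D may be sketched as follows. This is an illustration of
	the general mechanism, not a copy of the SubC sources: the
	helper header_path() is hypothetical, and the caller is assumed
	to supply a buffer that is large enough for the result.

```c
#include <string.h>

/* Use "." unless the command line supplied -D 'SCCDIR="..."'. */
#ifndef SCCDIR
#define SCCDIR	"."
#endif

/* Hypothetical helper: store "<SCCDIR>/include/<name>" in BUF. */
/* BUF must be large enough to hold the complete path. */
char *header_path(char *buf, char *name) {
	strcpy(buf, SCCDIR);
	strcat(buf, "/include/");
	strcat(buf, name);
	return buf;
}
```

	Without -D, header_path(buf, "stdio.h") yields
	"./include/stdio.h"; with SCCDIR set on the command line, the
	given base directory replaces the leading ".".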

	You can place the 'scc' executable wherever you want, as long
	as its location is covered by the PATH environment variable.
	The headers (include/*) go to INSTALLDIR/include, the library
	'lib/libscc.a' and the startup module 'lib/crt0.o' go to
	INSTALLDIR/lib.

	To test the installation just re-compile the compiler:

		rm scc && scc -o scc *.c


	WINDOWS SUPPORT

	All Windows-related code in the runtime system has been
	generously supplied by Jean-Marc Lienher (http://cod5.org).
	I am afraid I am not able to answer any questions about it,
	because I know nothing about Windows.

	The Windows version of SubC requires the MinGW infrastructure,
	but, like the Linux version, does not use the GNU libc. To
	compile SubC on Windows, just run configure and make, like
	on a Unix system.

	In case you have to configure it manually, you also have to
	do the following:

	Replace or symlink the following files:

		ln -fs targets/init-windows.c lib/init.c
		ln -fs targets/system-windows.c lib/system.c

	Also use the Windows Makefile instead of its Unix cousin:

		cp src/Makefile.windows src/Makefile

	After installing these files, the compiler should bootstrap
	as usual and pass the triple test.


	THANKS

	To the Super Dimension Fortress (SDF.ORG) for providing
	free shell accounts on 64-bit NetBSD machines.

	To Bakul Shah for granting me remote access to a 64-bit
	FreeBSD system and a Linux VM.

	To "minux" for porting the runtime module to Linux/x86-64.

	To Jean-Marc Lienher (cod5.org) for porting the runtime module
	to MinGW Windows/386.


	CONTACT

	Send feedback, suggestions, etc. to:

	n m h @ t 3 x . o r g

	See http://t3x.org/contact.html for current ways through my
	spam filter.

