OXCC.TXT Version: 1.433 -- preliminary 5 Nov 1995 by Norman D. Culver

    OXCC is a multipass interpreting C compiler with numerous language
    extensions (see c.grm).

    OXCC generates output in an Architecture Neutral Format (ANF).
	Sample backends are provided to guide the programmer in dealing with
	ANF and in writing additional backends for a specific purpose.

	The builtin interpreter provides for a great deal of flexibility
	but it does make the compiler a memory hog. The entire file under
	compilation is stored in memory as an Abstract Syntax Tree (AST).
	The AST can be printed to stdout with the -a option.

	Language extensions have been inspired by GCC, MSC, and Watcom C.
	OXCC is designed to produce 16 bit, 32 bit, 64 bit, segmented and 
	flat model code for any target architecture and operating system.

	OXCC can regenerate C source code after interpreting some or all of
	its input. Source regeneration properly handles `malloced' data
	containing pointers. The regenerated source code can be `shrouded'.
	Regenerated source files have the suffix .cr .

	The builtin interpreter can be run in fast or slow mode. Slow mode
	maintains elaborate pointer and initialization information which
	often is the only way to catch a subtle runtime bug.

    The compiler and include files are located in the file `oxcc.cff'. It
    is run from the command line by the skeleton program `oxcc.exe'
    (see skel.doc). If no switches are enabled, OXCC merely checks the
    input file(s) for errors.

	OXCC is reentrant; a set of C calls and a set of class based calls
	are provided. Multiple instances of OXCC can be run simultaneously.
	A program being compiled by OXCC can run OXCC as a subroutine.
	The program under compilation can gain access to the AST and
	symbol tables which describe it by using the calls __builtin_iv() and
	__builtin_root(). See: toxcc.c

Usage: oxcc [-adfoqrstuwABDEFGHILMOPRSTWY(] file...
   -a == print ast                   -s == print symbol table
   -t == print runtimes              -u == print memory usage
   -r == run the code                -f == if run code, go fast
   -L == produce listing             -E == preprocess only
   -S == shrouded source output      -T == trace lines while interpreting
   -P == Parse only
   "-(args..." == arguments for the interpreted start function (quote the whole switch)
   -dn == enable debug output, 1=parser 2=lexer 3=both
   -q == suppress printing gratuitous info
   -o outfile == name of output file, default is first infile
   -A == ansi_mode (suppress extensions, not fully implemented)
   -W == if run code and not fast mode, warn about address problems
   -w == suppress compiler warnings
   -R func == if run code, start at function `func'
   -Ipath == include path for the C preprocessor
   -Ddef == define something for the C preprocessor
   -Gx == generate output in format `x' (abdnmrs)
   -Ox == generate output for operating system `x' (dDwWCNoOUL)
   -Hx == generate output for hardware type `x' (iIPDHmM)
   -Bx == generate output for debugger `x' (vwbgd)
   -Fx == generate object file format `x' (oOPWBace)
   -Yx == generate assembler format `x' (ugmt)
   -Mx == use memory model `x' (tsmlchx)

OUTPUT OPTIONS
    -Gs   regenerate source (output file has .cr suffix)
    -SGs  regenerate shrouded source (output file has .cr suffix)
    -Gb   generate bytecodes (calls oxccb, output file has .byt suffix)
    -LGb  generate bytecode listing (calls oxccb, output file has .lst suffix)
    -Ga   generate assembler output (calls oxccaH, where H is hardware type)
    -Gd   generate readable ANF code (output file has .dbg suffix)
    -Gm   generate machine code (calls oxccmH, where H is hardware type)
    -Gn   generate ANF code (output file has .anf suffix)
    -Gr   generate RIP code (calls oxccr, output file has .rip suffix)

	(see oxanf.doc, oxanf.h)
    -Ox   placed in header slot `target_os'         default `D' DOS32PLAT
    -Hx   placed in header slot `target_hardware'   default `I' INTEL8632
    -Bx   placed in header slot `target_debugger'   default  0  NONE
    -Fx   placed in header slot `obj_format'        default `a' AOUTFORMAT
    -Yx   placed in header slot `target_assembler'  default `g' GAS
    -Mx   placed in header slot `memory_model'      default `x' MODFLAT


INCOMPATIBILITIES

    FUNCTION DECLARATIONS
    OXCC, being a multipass compiler, always chooses the `best' declaration
    for a function. The old style practice of hiding a function's real
    declaration behind one with an unspecified argument list, e.g.
    `int func();', just will not work. At the very least you will get a
    warning if a function is called with arguments that do not match the
    `best' declaration. OXCC will not assume that an undeclared function
    `func' is declared as `int func()'; you must explicitly declare all
    functions.


LANGUAGE EXTENSIONS

    RUNTIME INTERPRETATION
    The -r switch will cause OXCC to interpret the AST if it can find
    a function named `main' or, failing that, a function with the base
    name of the input file (e.g. a function named `test32' in test32.c).
    The user can specify a different starting function with the -R switch.
    Arguments can be passed to the starting function with the -( switch,
    provided that the starting function follows the argc, argv convention.
    e.g.:
       oxcc -r test32.c "-(23 hello 14 -W"
    Another way to cause runtime interpretation is to call the starting
    function from the right hand side of the last initialized outer variable.
    The only restriction is that the starting function must return a value.
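    A minimal sketch of a start function that follows the argc, argv
    convention, for a hypothetical test32.c that has no `main'. With
    `oxcc -r test32.c "-(23 hello 14 -W"' the words 23, hello, 14 and -W
    would be its arguments. The sketch assumes argv[0] holds a program
    name, as in hosted C; this file does not say what OXCC puts there.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical start function for test32.c. Numeric arguments are summed,
   other words are printed. argv[0] is ASSUMED to be a program name, as in
   hosted C, so the loop starts at 1. */
int test32(int argc, char **argv)
{
    int i, sum = 0;

    for (i = 1; i < argc; ++i) {
        char *end;
        long v = strtol(argv[i], &end, 10);

        if (end != argv[i] && *end == '\0')
            sum += (int)v;                  /* the whole word is a number */
        else
            printf("word: %s\n", argv[i]);
    }
    return sum;   /* returns a value, so it can also start from an outer initializer */
}
```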


    INTERPRETING OUTER DECLARATIONS INCLUDING INNER STATIC VARIABLES
	OXCC evaluates (interprets) non-constant expressions in outer declarations.
    Anything that can appear in a normal C program can contribute to the value
    that is stored in an initialized variable. Uninitialized variables can 
    become initialized as a side effect of a function call. Two reserved words
    `_ival' and `_ifunc' can be prepended to variables and functions 
    respectively in order to prevent them from appearing in the output.
    e.g.:
        double q = sin(2.0) / cos(4.3);
        void **ptr = malloc(50 * sizeof(void *)); // interpreted malloc acts like calloc
        static int x,y;
        _ifunc int initfunc()      // function `initfunc' will not be output
        {
            int i;
            x = 50;    // static variable x is initialized to 50.
            y = 25;    // static variable y is initialized to 25.
            for(i = 0; i < x; ++i)
                ptr[i] = malloc(y);   // initialize the array of pointers
            return 0;
        }
        int startfunc(int z)  // function `startfunc' will appear in output
        {
            x += z;     // static variable x is modified before output
            ...
            return 0;
        }
        _ival int z = initfunc();    // variable `z' will not be output
        char *ary[20] = {[2]=ptr[3], [3]=malloc(x), [18]=malloc(y)};
        _ival int dummy = startfunc(25); // variable `dummy' will not be output
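    For comparison, standard C only allows constant initializers at file
    scope, so the setup above has to be written as explicit startup code.
    A rough standard-C sketch follows. `init_globals' is a hypothetical
    name, and calloc is used because the text notes that interpreted
    malloc acts like calloc.

```c
#include <stdlib.h>

/* Standard C rejects malloc() etc. in file-scope initializers, so the
   globals start out zeroed and a (hypothetical) init routine fills them. */
void **ptr;
static int x, y;
char *ary[20];

int init_globals(void)
{
    int i;

    x = 50;
    y = 25;
    ptr = calloc((size_t)x, sizeof(void *));    /* zeroed, like OXCC's malloc */
    if (ptr == NULL)
        return -1;
    for (i = 0; i < x; ++i)
        if ((ptr[i] = calloc(1, (size_t)y)) == NULL)
            return -1;
    ary[2] = ptr[3];
    ary[3] = calloc(1, (size_t)x);
    ary[18] = calloc(1, (size_t)y);
    return 0;
}
```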


	AUTOMATIC VARIABLES (INNER DECLARATIONS)
    Automatic variables can be initialized with non-constant expressions.
    Static variables declared inside functions can also have non-constant
    initializers; they are initialized at outer declaration time.
    `alloca' is not a suitable initializer for a static variable inside
    a function; use `malloc'.


    DEFAULT ARGUMENTS FOR FUNCTIONS
    Functions can be declared with default args: follow a parameter with
    `=' and the default value.
    e.g.:
        int func(int a = 3, struct _a b = {2.3,4,1}, char *cp = "hello")
        {
        	....	
        }
    Functions with default args can be called with 0 or more actual args,
    and an argument can be passed by parameter name (`name: value').
    They can also be called normally.
    e.g.:
        func(cp: "goodbye"); // a and b take the default values
        func(3,B,ptr);       // a, b, and cp are fully specified
        func(3);             // b and cp take the default values
        func();              // a, b, and cp take the default values
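    Standard C has no default arguments. One common way to approximate
    them, shown here only as a sketch, bundles the parameters in a struct
    and uses a C99 compound literal whose later designated initializers
    override the defaults. `func_impl', `FUNC', `struct func_args' and
    `struct arg_b' (a stand-in for `struct _a', with an assumed layout)
    are all hypothetical.

```c
#include <string.h>

/* Stand-in for the document's `struct _a'; layout ASSUMED so {2.3,4,1} fits. */
struct arg_b { double q; int m; int n; };

/* All of func's parameters bundled into one struct. */
struct func_args { int a; struct arg_b b; const char *cp; };

/* Illustrative body: combine the arguments so the defaults are visible. */
int func_impl(struct func_args p)
{
    return p.a + p.b.m + p.b.n + (int)strlen(p.cp);
}

/* Defaults come first in the compound literal; designated initializers
   supplied by the caller come later in the same list and override them. */
#define FUNC(...) func_impl((struct func_args){ \
    .a = 3, .b = {2.3, 4, 1}, .cp = "hello", __VA_ARGS__ })
```

    Unlike OXCC, this emulation only works with named arguments such as
    FUNC(.a = 10); a positional FUNC(3) does not set `a'. GCC may warn
    about the overridden initializers under -Wextra.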


    LABELED IDENTIFIERS FOR INITIALIZING ARRAYS AND STRUCTURES
    This extension is inspired by GCC 2.6.x .
    e.g.:
        int array[200] = {[5]= 2, [123]= 45};
        int matrix[20][50] = {[3][12]= 6, [18][23]= 8};
        struct {
          int x;
          int y;
          double q;
          struct {
            int a;
            int b;
          } bb;
          struct {
            int a;
            int b;
          } cc;
        } aa = {.q=4.6, .cc={.b = 12}};  // everything else set to 0
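    These initializers were later standardized in C99 (the `[index]' and
    `.member' designators), so the examples can be checked with any C99
    compiler. Members and elements that are not named are zero-initialized.

```c
/* Valid C99: designated initializers; everything not named is zeroed. */
int array[200] = {[5] = 2, [123] = 45};
int matrix[20][50] = {[3][12] = 6, [18][23] = 8};

struct {
    int x;
    int y;
    double q;
    struct { int a; int b; } bb;
    struct { int a; int b; } cc;
} aa = {.q = 4.6, .cc = {.b = 12}};
```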


    COMPOUND EXPRESSIONS RETURN A VALUE
    Place parentheses around braces to create a compound expression,
    which consists of multiple statements and returns the value of its
    last expression statement.
    This extension is inspired by GCC.
    e.g.:
        int y = ({int i;
                  for(i=0; i<200; ++i)
                    if(i>x)
                      break;
                  i+x;   // the last expression is the value returned
                });      // y = i+x
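    To show what value the compound expression above produces, here is a
    standard-C sketch that moves its statements into a helper function
    (the name `compound_value' is hypothetical) and returns what the last
    expression statement computes.

```c
/* Standard-C rendering of the compound expression: the statements become
   a function body and the final expression becomes the return value. */
int compound_value(int x)
{
    int i;

    for (i = 0; i < 200; ++i)
        if (i > x)
            break;
    return i + x;        /* the value of the last expression, `i+x' */
}
```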


    NESTED FUNCTIONS
    Nested functions are functions defined inside other functions, in
    the place where automatic variables are declared, i.e. at the start
    of a block before its statements. All of the automatic variables of
    the enclosing function are within the scope of the nested function
    and do not have to be passed as arguments when it is called.

    OXCC implements flavor #1 of nested functions, in which the stack
    of the nested function coincides with the stack of the enclosing
    function. This is the most efficient way to deal with nested functions
    but precludes a nested function from being called recursively. The
    address of a nested function can be taken and passed to a synchronous
    callback. Asynchronous callbacks (such as might occur in an operating
    system like WINDOWS) will not work. Nested functions can call other
    nested functions which are within scope.

    Flavor #2 of nested functions requires that the nested function be
    extracted from its surroundings and given a stack of its own. A
    pointer to the stack frame of the enclosing function is passed invisibly
    whenever the nested function is called. Callbacks are implemented with
    thunks. This method produces a much slower nested function facility
    but is usually necessary when generating machine language. Asynchronous
    callbacks will not work.
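    The idea behind flavor #2 can be sketched in standard C by making the
    hidden frame pointer explicit: the enclosing function's automatics
    live in a struct, and the extracted function receives a pointer to it.
    All names below are hypothetical; a compiler would do this invisibly
    (using thunks when the address is taken), so this illustrates the
    technique rather than showing OXCC output.

```c
#include <stddef.h>

/* Hypothetical "frame" of the enclosing function: the automatic
   variables the nested function needs to reach. */
struct sum_frame {
    int total;
    int scale;
};

/* The nested function after extraction: the frame pointer that flavor #2
   passes invisibly is an explicit first parameter here. */
static void add_scaled(void *env, int v)
{
    struct sum_frame *f = env;
    f->total += v * f->scale;
}

/* A synchronous callback consumer: fn runs before for_each returns,
   so the caller's frame is still alive. */
static void for_each(const int *a, size_t n, void (*fn)(void *, int), void *env)
{
    size_t i;
    for (i = 0; i < n; ++i)
        fn(env, a[i]);
}

/* The enclosing function: its automatics live in `frame'. */
int scaled_sum(const int *a, size_t n, int scale)
{
    struct sum_frame frame;

    frame.total = 0;
    frame.scale = scale;
    for_each(a, n, add_scaled, &frame);
    return frame.total;
}
```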


    TYPEOF, ALIGNOF
    The type of an expression can be derived and applied wherever a normal
    type would be used.
    e.g.:
        typeof(x) y;
        typeof(*x) y;
    The alignment of an expression can be obtained.
    e.g.:
        int x = __alignof__(y);
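    typeof pairs naturally with compound expressions. The sketch below,
    assuming a GCC-compatible compiler, defines a hypothetical MAX macro
    that evaluates each argument exactly once. It uses the `__typeof__'
    spelling, which GCC and Clang also accept in strict ISO modes; whether
    OXCC accepts that spelling as well as `typeof' is not stated here.

```c
/* Each argument is copied into a temporary of its own type, so side
   effects such as n++ happen only once. */
#define MAX(a, b) ({                 \
    __typeof__(a) max_a_ = (a);      \
    __typeof__(b) max_b_ = (b);      \
    max_a_ > max_b_ ? max_a_ : max_b_; })
```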


    COMPUTED TYPEDEF
    A typedef can be computed.
    e.g.:
	    typedef XTYPE = x;
        XTYPE q;
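    With a GCC-compatible compiler the same effect can be had by combining
    typedef with typeof. A short sketch, with `x' assumed to be a double:

```c
double x = 1.5;
typedef __typeof__(x) XTYPE;   /* XTYPE is double, the type of x */
XTYPE q;
```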


    STRUCTURE ALIGNMENT AND PACKING
    Structures can be designated as packed with the `_Packed' keyword.
    OXCC also supports the awful __attribute__ constructions of GCC.
    OXCC also supports various forms of the #pragma pack(n) directive,
    but it is strongly suggested that these not be used, because source
    regeneration does not reproduce #pragma directives.
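    A sketch of GCC-style packing, assuming a GCC-compatible compiler.
    Where OXCC expects the `_Packed' keyword to appear is not shown in
    this file, so only the __attribute__ form is used; the struct names
    are hypothetical. Only the packed size is fixed; the unpacked size
    depends on the target's alignment of int.

```c
/* No padding is inserted between the members of a packed struct. */
struct __attribute__((packed)) rec_packed {
    char tag;
    int  value;
};

/* Same members, default layout: usually padded before `value'. */
struct rec_plain {
    char tag;
    int  value;
};
```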
   
  
    LOCAL LABELS
    Each block is a scope in which local labels can be declared; a local
    label goes out of scope at the end of its block. This is handy for
    macros, since each expansion can declare its own copy of a label.
    GCC inspired.
    e.g.:
      {
          __label__ l1;           // declares l1 to be a local label
          ...
          goto l1;
          ...
      l1: ;
      }
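    As a sketch of why this matters for macros, the hypothetical INDEX_OF
    macro below uses a label to leave its search loop. Because each
    expansion declares its own local `found', the macro can be used twice
    in the same function without a duplicate-label error. GCC and Clang
    accept this syntax.

```c
/* Store in `out' the index of the first element equal to `key', or -1.
   The __label__ declaration must come first in its block. */
#define INDEX_OF(arr, n, key, out)          \
    do {                                    \
        __label__ found;                    \
        int i_;                             \
        (out) = -1;                         \
        for (i_ = 0; i_ < (n); ++i_)        \
            if ((arr)[i_] == (key))         \
                goto found;                 \
        break;              /* not found */ \
    found:                                  \
        (out) = i_;                         \
    } while (0)
```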


    CASE RANGES
    Case values may be expressed in the form:
        case 2 ... 4:
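    A short sketch using GCC-compatible case ranges; it assumes a character
    set, such as ASCII, in which 'a' to 'z' and 'A' to 'Z' are contiguous.
    Keep the spaces around `...': GCC's documentation warns that `1...5'
    can be parsed wrongly. The function name is hypothetical.

```c
/* Classify a character with case ranges (ASCII letters ASSUMED contiguous). */
const char *classify(int c)
{
    switch (c) {
    case '0' ... '9':
        return "digit";
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return "letter";
    default:
        return "other";
    }
}
```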


    ARITHMETIC ON VOID POINTERS
    For the purpose of pointer arithmetic, a void* pointer is treated like
    a char* pointer, i.e. the object it points to is assumed to be 1 byte.
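    In standard C the same offset needs an explicit cast to char *. A
    minimal sketch; `byte_offset' is a hypothetical helper:

```c
#include <stddef.h>

/* Advance p by n bytes. OXCC (like GCC) also accepts `p + n' on a
   void *, with the same meaning. */
void *byte_offset(void *p, size_t n)
{
    return (char *)p + n;
}
```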


    MACROS WITH VARIABLE NUMBERS OF ARGUMENTS
    GCC inspired extension to the C preprocessor.
    e.g.:
        #define myprintf(format, args...) \
        fprintf(stderr, format, ## args)
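    C99 later standardized variadic macros with `...' and `__VA_ARGS__'.
    The sketch below is the C99 spelling of the macro above; the `##'
    that drops the preceding comma when no extra arguments are given is
    still a GCC extension, also accepted by Clang.

```c
#include <stdio.h>

/* C99 variadic macro; `##' before __VA_ARGS__ (GCC extension) removes
   the comma when myprintf is called with only a format string. */
#define myprintf(format, ...) fprintf(stderr, format, ##__VA_ARGS__)
```

    Since fprintf returns the number of characters written,
    myprintf("%d\n", 42) yields 3.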


    ZERO LENGTH ARRAYS
    Arrays of zero length are allowed within structures.
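    Zero-length arrays are typically used as the last member of a
    variable-length record. C99 standardized the same idea as the flexible
    array member (`data[]'), which the sketch below uses; OXCC and GCC
    also accept `data[0]' in that position. `struct msg' and `msg_new'
    are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct msg {
    size_t len;
    char data[];        /* flexible array member; OXCC/GCC also accept data[0] */
};

/* Allocate the header and the string in one block. */
struct msg *msg_new(const char *s)
{
    size_t n = strlen(s) + 1;               /* include the terminating NUL */
    struct msg *m = malloc(sizeof *m + n);

    if (m != NULL) {
        m->len = n - 1;
        memcpy(m->data, s, n);
    }
    return m;
}
```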


    CONDITIONALS WITH OMITTED OPERANDS
    The construction x ? : y
    is equivalent to x ? x : y 
    except that x is not evaluated a second time.
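    A minimal sketch, assuming a GCC-compatible compiler, that counts
    evaluations to show that the omitted middle operand is computed only
    once. The helper names are hypothetical.

```c
static int calls;               /* how many times next_value() has run */
static int next_result = 7;

static int next_value(void)
{
    ++calls;
    return next_result;
}

/* next_value() is evaluated once; its result is used if nonzero. */
int pick(int fallback)
{
    return next_value() ? : fallback;
}
```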
    

    DOUBLE WORD INTEGERS
    The long long type is supported.

    LONG DOUBLE
    The long double type is supported.

	FUNCTION TYPES
    Various keywords from the segmented DOS world are understood by OXCC.
    Currently OXCC does not do anything other than label the function
    for later processing. (see c.grm)
    

	SEGMENT INFORMATION
    The keywords `__segdef__' and `__seguse__' are used to specify
    segment info. The arguments to __segdef__ must be constant expressions.
    This info is passed along to back end code generators.
    e.g.:
        __segdef__ DATA16 arg1, arg2, arg3;	// 0 to 3 args
        __segdef__ DATA32 arg1, arg2, arg3;
        __segdef__ TEXT16 arg1, arg2, arg3;

        __seguse__ DATA32;
        int x,y,z;

        __seguse__ TEXT16;
        int func()
        {
        }
        __seguse__ TEXT32;
        int func1()
        {
        }


	BASED POINTERS
    Microsoft C defines based pointers and segment variables. OXCC currently
    parses and stores the information for later processing by back ends,
    but it does not yet know how to interpret this stuff. It can correctly
    regenerate source.


	NEAR FAR HUGE POINTERS
    Same as for based pointers (see c.grm).


    ASSEMBLER INSTRUCTIONS
    Various flavors of assembler code can be absorbed and regenerated by
    OXCC. Interpretation is out of the question and assembler instructions
    are not passed to back end code generators on the theory that portability
    can never be achieved. The OXCC solution is to provide an extensible
    facility for direct generation of ANF code. (see c.grm)


    ANF INSTRUCTION BLOCKS
    OXCC generates ANF code (see anf.doc, oxanf.h) from C instructions. The
    programmer can generate ANF code by enclosing it in a block.
    e.g.:
        __anf__ {
          mov x,y/2;		// divide y by 2 and store in x
          lsh y,z,3;		// shift z left by 3 and store in y
           ...
        }
    ANF blocks can be placed inside or outside of functions.
    The basic set of ANF instructions can be extended by programmers to
    achieve meaningful (I hope) methods of expressing concepts which
    normally require assembler code. Essentially, ANF instructions consist
    of an opcode followed by up to 3 arguments. The opcode set can be
    extended as follows:
       1. add new strings to oxanf.h
       2. compile oxanf.h
       3. insert in oxlib.cff with `cfar.exe' (see oxcc.mak)
    ANF arguments can be any valid C expression and are evaluated by OXCC with
    code generation where appropriate.


    NO-NAME STRUCTURES/UNIONS
    Inspired by Visual C++ 2.0
    In order to compile 32 bit Windows programs it is necessary to deal with
    un-named structures and unions which are members of named struct/unions.
    This feature permits the programmer to reference the members of the
    un-named struct/unions as if they were members of the enclosing named
    container. Just make sure that all of the member names are unique.
    Very nice idea.
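    C11 later adopted this feature as anonymous structures and unions, so
    a standard C11 compiler accepts the sketch below. The type and member
    names are hypothetical.

```c
/* An unnamed union inside a named struct: its members are used as if
   they belonged to struct value directly (v.i, v.d). */
struct value {
    int kind;            /* illustrative tag: 0 = integer, 1 = floating point */
    union {
        long   i;
        double d;
    };
};
```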


GLOBAL SUBROUTINES IN OXCC -- also callable by code being interpreted

    (see oxcc.h and toxcc.c)
	void *__builtin_iv(void);
	void *__builtin_root(void);
    void *oxcc_get_pg(void *iv);
    void oxcc_enable_trace(void *iv);
    void oxcc_disable_trace(void *iv);
    void oxcc_debug(void *iv, int bits);

    void oxcc_proc_ptr_info(void *iv, void (*func)());
        func(void*,void*,void*,long);
    void oxcc_proc_syms(void *iv, unsigned space, void (*func)());
        func(AstP node, long symb, void *container);
    void oxcc_proc_swtable(void *iv, void *swnode, void (*func)());
        func(long swval, AstP root);
    void oxcc_proc_mallocs(void *iv, void *func());
        func(void *loc, int size, Item *ip);

    void *oxcc_open_instance(void);
    void oxcc_set_options(void *iv, char *opts);
    int oxcc_preproc_file(void *iv, void *is, void *os, void *es,
                                           int argc, char **argv);
    int oxcc_parse_file(void *iv, void *is, void *es, char *filename);
    void oxcc_print_parse_errors(void *iv, void *es);
    int oxcc_check_ast_tree(void *iv, void *es, char *filename);
    int oxcc_init_outers(void *iv, void *es);
    int oxcc_run_tree(void *iv, void *es, char *fnam, char *arg, char *startf);
    int oxcc_gen_code(void *iv, void *es, char *filename, void *os);
    void oxcc_cleanup_parse(void *iv);
    void oxcc_close_codefile(void *iv);
    void oxcc_close_instance(void *iv);

    void oxcc_print_ast(void *iv, void *os, int flag);
    void *oxcc_get_ast_root(void *iv);
    int oxcc_eval_expr(void *iv, void *buf, double *result, void *es);

    void gSetup(void *self, void *str);
    int gPreProc(void *self, void *is, void *os, void *es,int argc,char **argv);
    int gParse(void *self, void *is, void *es, char *filename);
    void gPerror(void *self, void *es);
    int gCheckTree(void *self, void *es, char *filename);
    int gInitOuters(void *self, void *es);
    int gRunCode(void *self, void *es, char *filename, char *args);
    int gGenCode(void *self, void *es, void *os, char *filename);
    void gCleanup(void *self);
    void gCloseCode(void *self);
    void gPrtAst(void *self, void *es, int flag);
    void *gGetRoot(void *self);
    int gEval(void *self, void *buf, double *result, void *es);


TODO
  1.  Improved optimization
  2.  Inline functions
  3.  Flavor #2 for nested functions
  4.  Modify oxcc to be callable as a reentrant subroutine or class [DONE 25May]
  5.  True interpretation of segmented code and 16 bit code.
  6.  Interpret ANF instruction blocks.
  7.  Support long double and complex data types. [long double DONE 5Nov]
  8.  Write more back ends
  9.  Better documentation
  10. Better test program [test.bat for starters]
  11. Add a new type `enumstring' to avoid parallel tables
  12. Built in inheritance engine with COM, SOM, DCE compliance. [Coming up]
  13. Generate Java, RIP code (need to juice up the grammar a bit)
  14. Make ANF more general and text readable
  15. Suggestions ??
