.TH DATADEC L
.SH NAME
datadec \- C data declaration module constructor
.SH SYNOPSIS
datadec
.RB [\- vno ]
.I basename
.RI [ infile ]
.SH DESCRIPTION
.B Datadec
takes an input file (or standard input, if no input file is given)
containing a series of HOPE/Miranda-style recursive data declarations
with optional printing hints,
and builds a definition/implementation pair of files,
.I "basename.h"
and
.IR "basename.c" ,
containing data declarations,
constructor functions, deconstructor functions and printing functions.

.PP
The two files produced together form a module that implements the relevant
data types.

.SH "OPTIONS"
.TP 8
.B "\-v"
enter verbose mode:
.I "Datadec"
then displays the data types that it parses, along with various (almost
certainly useless) bits of information about optimization.
.TP
.B "\-n"
do not perform various optimizations.
.TP
.B "\-o"
perform optimizations (the default).

.SH "AN EXAMPLE"
.PP
The simplest use is to prepare an input file, such as
.I "data.in,"
which might (for example) contain:
.nf
TYPE {
	IntList =  Null or Cons( int first, IntList next );
	ILList  =  Null or Cons( IntList first, ILList next );
	IdTree  =  Leaf( string id )
		or Node( IdTree left, IdTree right );
}
.fi
To generate C code implementing these types, invoke:
.nf
     datadec eek data.in
.fi
which generates
.I "eek.h"
and
.IR "eek.c" .

.SH THE DATA DECLARATION LANGUAGE

The language accepted by datadec is split into two components:
the "outer language" is patterned after
the GMD compiler tools
.B "LALR"
and
.B "REX"
(similar to Yacc and Lex)
and allows you to specify four sections (only the last is compulsory):

.PP
.nf
.B "[ EXPORT { free_format_text } ]"
.br
.B "[ GLOBAL { free_format_text } ]"
.br
.B "[ BEGIN { free_format_text } ]"
.br
.B "TYPE { types }"
.fi

.PP
The contents of the
.I "export"
section are placed in the header file (the .h).
Commonly, you will want to add extern function declarations, public types and
external variable declarations,
which must be
placed at the top of the header file, and also to declare some additional
procedures whose signatures use the automatically generated types, which must
be placed after the type declarations!
To achieve this, place a `@@' in the export section: the text up
to that point is placed at the top of the header file, whereas the text
after it is placed at the bottom of the header file, after all the types
have been defined.

.PP
Similarly, the contents of the 
.I "global"
section are placed in the C file,
again with `@@' splitting the global section into "top of file" and
"bottom of file" pieces.

.PP
Finally, the contents of the
.I "begin"
section are placed in an initialization procedure, which the user of the
constructed module must remember to call at an appropriate point (e.g.
as soon as main starts).

.PP
The
.I "types"
section contains the type declarations themselves - the inner language.

.PP
The "inner language" - that of specifying the actual types section -
is closely modelled on Miranda or Hope, with printing rules added.
Here is the grammar:

.PP
.nf
types	= list(type)
.br
type 	= type_name '=' shape list( 'or' shape ) ';'
.br
shape	= constructor_name [ '(' params ')' ] [ print ]
.br
params	= param list( ',' param)
.br
param	= type_name param_name
.br
print	= list(element)
.br
element	= number | string_literal
.fi

.PP
Note that each data type is terminated by a semicolon,
and that (within one data type) each shape is separated from the next by 'or'
(just like the '|' in Miranda).
If a particular shape has parameters, they are separated from each other
by commas.
Each type name is simply an identifier.

.PP
.I "Datadec"
also generates routines to write each type to an open FILE *.
The method of printing each shape is governed by the presence or absence
of a print rule.  If no print rule is given, the constructor name is printed,
and then each parameter is written out using the appropriate print routine.

.PP
If a print rule is given, each print element
(these are syntactically separated by whitespace)
is used to generate the write routine as follows:
A literal string will simply be printed
(well, '\\n' is turned into a newline!),
whereas a number (e.g. 4) means that the
4th parameter is printed (by invoking the print function for its type).

.PP
For example, we could augment the
.I "IdTree"
type from the example given above with print rules:

.nf
TYPE {
IdTree  = Leaf( string id )			"leaf(" 1 ")"
	or Node( IdTree left, IdTree right )	"node(" 1 ",\\n" 2 ")";
}
.fi

.PP
Now, an IdTree constructed as
.nf
Node( Leaf( "hello" ), Node( Leaf( "there" ), Leaf( "world" ) ) )
.fi
would print as:
.nf
node(leaf("hello"),
node(leaf("there"),
leaf("world")))
.fi

.SH SEE ALSO
.nf
LALR, REX, Miranda Language Definition.
.fi

.SH BUGS
Some single-letter type names (e.g. "f" or "p") could clash with internal
parameter names in the print routines, leading to syntax errors when you
compile the files generated by datadec.
.PP
Someday I'll get it to generate routines to free values of the types too!
.PP
And, finally, one day I'll have to write the C++ and Java versions :-)

.SH "AUTHOR"
Duncan C. White, D.White@surrey.ac.uk.