.TH DATADEC L
.SH NAME
datadec \- ANSI C data declaration module constructor
.SH SYNOPSIS
datadec
.RB [\- vfno ]
.I basename
.RB [infile]
.SH DESCRIPTION
.B Datadec
takes an input file - or stdin if no input file is given -
containing a series of Haskell style recursive (or inductive) datatype
declarations - with optional hints on printing and freeing,
and builds a definition/implementation pair of ANSI C files \-
.I "basename.h"
and
.I "basename.c"
containing data declarations,
constructor functions, deconstructor functions and printing functions.

.PP
The two files produced together form a module that implements the relevant
data types.

.SH "OPTIONS"
.TP 8
.B "\-v"
enter verbose mode.
.I "Datadec"
now displays the data types that it parses, along with various almost
certainly useless bits of information about optimization.
.TP
.B "\-f"
Enable experimental generation of free_TYPE() functions.
See the later section on free functions.
.TP
.B "\-n"
do not perform various optimizations.
.TP
.B "\-o"
perform optimizations (the default).

.SH "AN EXAMPLE"
.PP
The simplest use is to prepare an input file, such as
.I "data.in,"
which might (for example) contain:
.nf
TYPE {
        intlist =  nil or cons( int first, intlist next );
        illist  =  nil or cons( intlist first, illist next );
        idtree  =  leaf( string id )
                or node( idtree left, idtree right );
}
.fi
To generate C code implementing these types, invoke:
.nf
     datadec eek data.in
.fi
which generates
.I "eek.h"
and
.IR "eek.c" .
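.PP
As a rough illustration of what such a module provides, here is a hand-written
sketch of the kind of C that could implement the intlist type: a data
declaration, one constructor per shape, and a deconstructor.
Every identifier in it (kind_of_intlist, intlist_str, intlist_nil,
intlist_cons, get_intlist_cons, sum_intlist) is an assumption made for this
example; the names and layout that datadec really emits may differ, so check
the generated header.
.nf
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout: not necessarily what datadec generates. */
typedef enum { intlist_is_nil, intlist_is_cons } kind_of_intlist;

typedef struct intlist_str *intlist;
struct intlist_str {
    kind_of_intlist kind;   /* which shape this value has */
    int first;              /* only meaningful for cons */
    intlist next;           /* only meaningful for cons */
};

/* Constructor for the shape: nil */
intlist intlist_nil(void)
{
    intlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = intlist_is_nil;
    l->first = 0;
    l->next = NULL;
    return l;
}

/* Constructor for the shape: cons( int first, intlist next ) */
intlist intlist_cons(int first, intlist next)
{
    intlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = intlist_is_cons;
    l->first = first;
    l->next = next;
    return l;
}

/* Deconstructor: break a cons value back into its parameters */
void get_intlist_cons(intlist l, int *first, intlist *next)
{
    assert(l->kind == intlist_is_cons);
    *first = l->first;
    *next = l->next;
}

/* Example client code: add up every element of a list */
int sum_intlist(intlist l)
{
    int total = 0;
    while (l->kind == intlist_is_cons) {
        int first;
        intlist next;
        get_intlist_cons(l, &first, &next);
        total += first;
        l = next;
    }
    return total;
}
```
.fi
.PP
Under these assumptions a client builds a list with
intlist_cons( 1, intlist_cons( 2, intlist_nil() ) )
and takes it apart again with get_intlist_cons().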

.SH THE DATA DECLARATION LANGUAGE

The language accepted by datadec is split into two components:
the "outer language" is patterned after
the GMD compiler tools
.B "LALR"
and
.B "REX"
(similar to Yacc and Lex)
and allows you to specify four sections (only the last is compulsory):

.PP
.nf
.B "[ EXPORT { free_format_text } ]"
.br
.B "[ GLOBAL { free_format_text } ]"
.br
.B "[ BEGIN { free_format_text } ]"
.br
.B "TYPE { types }"
.fi

.PP
The contents of the
.I "export"
section are placed in the header file (the .h).
Commonly, you may wish to add extern function declarations, public types and
external variable declarations, which must be
placed at the top of the header file, and also define some additional
procedures that use the automatically generated types, which must be placed
after the type declarations.
To achieve this, you should place a `@@' in the export section -- the text up
to that point is placed at the top of the header file, whereas the text
after it is placed at the bottom of the header file -- after all the types
have been defined.

.PP
Similarly, the contents of the 
.I "global"
section are placed in the C file,
again with `@@' being used to split the global section into "top of file" and
"bottom of file" pieces.

.PP
Similarly, the contents of the 
.I "begin"
section are placed in an initialization procedure, which the user of the
constructed module must remember to call at an appropriate juncture (e.g.
immediately when main starts).

.PP
The
.I "types"
section contains the type declarations themselves - the inner language.

.PP
The "inner language" - that of specifying the actual types section -
is closely modelled on Miranda or Hope, with printing rules added.
Here is the grammar:

.PP
.nf
types	= list(type)
.br
type 	= type_name '=' shape list( 'or' shape ) ';'
.br
shape	= constructor_name [ '(' params ')' ] [ print ]
.br
params	= param list( ',' param)
.br
param	= ['-'] type_name param_name
.br
print	= list(element)
.br
element	= number | string_literal
.fi

.PP
Note that each data type is terminated by a semicolon,
and that (within one data type) each shape is separated from the next by 'or'
(just like the '|' in Haskell).
If a particular shape has parameters, they are separated from each other
by commas.
Each type name is simply an identifier.

.PP
.I "Datadec"
also generates routines to write each type to an open FILE *.
The method of printing each shape is governed by the presence or absence
of a print rule.  If no print rule is given, the constructor name is printed,
then an open bracket,
then each parameter is written out using the appropriate print routine,
with commas separating the parameters, then a close bracket.

.PP
If a print rule is given, each print element
(these are syntactically separated by whitespace)
is used to generate the write routine as follows:
A literal string will simply be printed
(well, '\\n' is turned into a newline!),
whereas a number (e.g. 4) means that the
4th parameter is printed (invoking the print function for that parameter's type).

.PP
For example, we could augment the
.I "idtree"
type from the example given above with print rules:

.nf
TYPE {
idtree  =  leaf( string id )                    "leaf(" 1 ")"
        or node( idtree left, idtree right )    "node( l=" 1 ", r=" 2 " )";
}
.fi

.PP
Now, an idtree constructed as
.nf
node( leaf( "hello" ), node( leaf( "there" ), leaf( "how are you" ) ) )
.fi
would print as:
.nf
node( l=leaf("hello"), r=node( l=leaf("there"), r=leaf("how are you") ) )
.fi
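.PP
To make the translation from print rules to a write routine concrete, here is
a hand-written sketch of a routine that follows the two idtree rules above:
each string literal becomes a write of that text, and each number becomes a
call to the print routine for that parameter.
The names (idtree_str, idtree_leaf, idtree_node, print_string, print_idtree)
and the choice to print strings inside double quotes are assumptions for this
example, not a transcript of datadec's output.
.nf
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: not necessarily what datadec generates. */
typedef char *string;

typedef enum { idtree_is_leaf, idtree_is_node } kind_of_idtree;
typedef struct idtree_str *idtree;
struct idtree_str {
    kind_of_idtree kind;
    string id;              /* leaf( string id ) */
    idtree left, right;     /* node( idtree left, idtree right ) */
};

idtree idtree_leaf(string id)
{
    idtree t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = idtree_is_leaf;
    t->id = id;
    t->left = t->right = NULL;
    return t;
}

idtree idtree_node(idtree left, idtree right)
{
    idtree t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = idtree_is_node;
    t->id = NULL;
    t->left = left;
    t->right = right;
    return t;
}

/* Assumed print routine for the basic type string: writes it in quotes */
void print_string(FILE *out, string s)
{
    putc('"', out);
    fputs(s, out);
    putc('"', out);
}

/* One branch per shape, one statement per print element */
void print_idtree(FILE *out, idtree t)
{
    if (t->kind == idtree_is_leaf) {
        fputs("leaf(", out);            /* literal "leaf(" */
        print_string(out, t->id);       /* element 1 */
        fputs(")", out);                /* literal ")" */
    } else {
        fputs("node( l=", out);         /* literal "node( l=" */
        print_idtree(out, t->left);     /* element 1 */
        fputs(", r=", out);             /* literal ", r=" */
        print_idtree(out, t->right);    /* element 2 */
        fputs(" )", out);               /* literal " )" */
    }
}
```
.fi
.PP
Because the same rule is applied at every level, a nested node is written with
exactly the same spacing as the outermost one.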

.SH SEE ALSO
.nf
LALR, REX, Haskell Language Definition.
.fi

.SH FREE FUNCTIONS (EXPERIMENTAL)

May 2014: After saying for 20 years that
"I must implement the missing free_TYPE functions sometime",
I have now experimentally added support for this.
Currently you have to enable the generation of these via '\-f'.
.I "Datadec"
will then generate a series of free_TYPE( TYPE t ) functions.
Essentially, these perform a post-order tree traversal,
recursively freeing every non-basic parameter in each shape.
(Basic types, which do not need freeing, are things like int, long,
BOOL etc.)

.PP
However, in C you often share pointers (most obviously to read-only
string literals, but whole data structures can be shared),
so the user may or may not want to free these.  To handle this,
the above grammar has been extended to allow optional '-' hints
(meaning "never free this") on parameters within shapes.

.PP
So, for instance, given the type:

.nf
TYPE {
strlist  =  nil
        or cons( string h, strlist t )
}
.fi

the corresponding free_strlist() function will attempt to
.nf
free_string( head_of_list );
.fi

.PP
This assumes that in the GLOBAL section you provide
a function or macro to implement free_string(), as in:

.nf
#define free_string(s) free(s)
.fi

.PP
But if you prepend a '-' to the "string h" part of the type, i.e.:

.nf
TYPE {
strlist  =  nil
        or cons( -string h, strlist t )
}
.fi

.PP
then no call to free_string( head_of_list ) will be made.
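.PP
The difference between the two declarations can be sketched in C as follows.
This is a hand-written illustration rather than datadec's real output: the
type layout, the constructors, copy_string, the strings_freed counter and the
name free_strlist_keep_heads are all hypothetical, and free_string is written
as a function (instead of the macro shown above) only so that the sketch can
count its calls.
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: not necessarily what datadec generates. */
typedef char *string;

typedef enum { strlist_is_nil, strlist_is_cons } kind_of_strlist;
typedef struct strlist_str *strlist;
struct strlist_str {
    kind_of_strlist kind;
    string h;               /* only meaningful for cons */
    strlist t;              /* only meaningful for cons */
};

int strings_freed = 0;      /* instrumentation for this sketch only */

/* What the user might supply in the GLOBAL section */
void free_string(string s)
{
    free(s);
    strings_freed++;
}

/* Helper for the sketch: a heap copy that is safe to free later */
string copy_string(const char *s)
{
    string c = malloc(strlen(s) + 1);
    assert(c != NULL);
    strcpy(c, s);
    return c;
}

strlist strlist_nil(void)
{
    strlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = strlist_is_nil;
    l->h = NULL;
    l->t = NULL;
    return l;
}

strlist strlist_cons(string h, strlist t)
{
    strlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = strlist_is_cons;
    l->h = h;
    l->t = t;
    return l;
}

/* For cons( string h, strlist t ): free both parameters, then the node */
void free_strlist(strlist l)
{
    if (l->kind == strlist_is_cons) {
        free_string(l->h);
        free_strlist(l->t);
    }
    free(l);                /* post-order: the node itself goes last */
}

/* For cons( -string h, strlist t ): h is never freed */
void free_strlist_keep_heads(strlist l)
{
    if (l->kind == strlist_is_cons) {
        free_strlist_keep_heads(l->t);
    }
    free(l);
}
```
.fi
.PP
With the second form a list whose heads are string literals, or strings owned
by some other structure, can be freed safely; with the first form every head
must be a heap pointer that nothing else still uses.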

.PP
However, even with this mechanism, it is still incredibly easy to
share pointers when building data structures, such that you
.I sometimes
want to free a parameter, and sometimes don't.
I am still thinking about this.
For now, I recommend that you compile all datadec generated
modules (with free functions in) against a memory checking
system like memlib, or use a tool like valgrind's memcheck
to validate your memory allocation and deallocation.

.SH MISSING FEATURES
.PP
A cool extra would be a sprint_TYPE function to print into a string.
This should be a trivial modification of the code that prints to a file,
except that we don't know in advance how long the generated string will need to be.
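.PP
One possible way round the unknown length, sketched here purely as an idea and
not as anything datadec implements, is to reuse the file-printing routine:
write to a temporary file, ask how many bytes were written, then read them
back into a buffer of exactly that size.
All names below (intlist_str, print_intlist, sprint_intlist and the
constructors) are hypothetical, and print_intlist applies the default print
rule literally, so a parameterless shape is written as nil(); the real tool
may differ.
.nf
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: not necessarily what datadec generates. */
typedef enum { intlist_is_nil, intlist_is_cons } kind_of_intlist;
typedef struct intlist_str *intlist;
struct intlist_str {
    kind_of_intlist kind;
    int first;
    intlist next;
};

intlist intlist_nil(void)
{
    intlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = intlist_is_nil;
    l->first = 0;
    l->next = NULL;
    return l;
}

intlist intlist_cons(int first, intlist next)
{
    intlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = intlist_is_cons;
    l->first = first;
    l->next = next;
    return l;
}

/* Default print rule: name, open bracket, parameters, close bracket */
void print_intlist(FILE *out, intlist l)
{
    if (l->kind == intlist_is_nil) {
        fputs("nil()", out);
    } else {
        fprintf(out, "cons(%d,", l->first);
        print_intlist(out, l->next);
        fputs(")", out);
    }
}

/* Print into a freshly allocated string; the caller frees the result.
 * Returns NULL if the temporary file or the buffer cannot be obtained. */
char *sprint_intlist(intlist l)
{
    FILE *tmp = tmpfile();
    long len;
    char *s;

    if (tmp == NULL)
        return NULL;
    print_intlist(tmp, l);
    len = ftell(tmp);               /* number of bytes written so far */
    if (len < 0) {
        fclose(tmp);
        return NULL;
    }
    s = malloc((size_t)len + 1);
    if (s != NULL) {
        rewind(tmp);
        len = (long)fread(s, 1, (size_t)len, tmp);
        s[len] = 0;                 /* terminate the string */
    }
    fclose(tmp);
    return s;
}
```
.fi
.PP
This trades speed for simplicity: every call costs a temporary file, but the
existing print routines are reused unchanged.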

.SH BUGS
Some single letter typenames (e.g. "f" or "p") could clash with internal
parameter names in the print routines, leading to syntax errors when you
compile the files generated by datadec.
.PP
And, finally, one day I'll have to write the Perl, C++ and (perhaps) Java versions :-)

.SH "AUTHOR"
Duncan C. White, D.White@imperial.ac.uk.