.TH DATADEC L
.SH NAME
datadec \- ANSI C data declaration module constructor
.SH SYNOPSIS
datadec
.RB [\- vfno ]
.RB [\- s FUNCTIONNAME...]
.I basename
.RB infile
.PP
OR
.PP
datadec
.RB [\- m ]
.RB infile

.SH DESCRIPTION
.B Datadec
takes an input file containing a series of Haskell style recursive (or
inductive) datatype declarations - with optional hints on printing and freeing,
and builds a definition/implementation pair of ANSI C files \-
.I "basename.h"
and
.I "basename.c"
containing data declarations,
constructor functions, deconstructor functions and printing functions.

.PP
The two files produced together form a module that implements the relevant
data types.

.SH "OPTIONS"
.TP 8
.B "\-v"
enter verbose mode.
.I "Datadec"
now displays the data types that it parses, along with various almost
certainly useless bits of information about optimization.
.TP
.B "\-f"
Enable experimental generation of free_TYPE() functions.
See the later section on free functions.
.TP
.B "\-n"
do not perform various optimizations.
.TP
.B "\-o"
perform optimizations (the default).
.TP
.B "\-m"
Do \fBNOT\fP produce the normal module.  Instead, produce
.I "meta-data"
on stdout.  This lists, for each type and shape, the typename,
the shapename, and a comma-separated list of the shape parameter
types.
.TP
.B "\-s FUNCTIONNAME"
Add
.I FUNCTIONNAME
to the set of functions to be suppressed, i.e. not generated in the
output .c file.
Currently this is only implemented for print_TYPE functions,
but it would be trivial to implement for constructors, deconstructors
and free functions too.
The usual reason for using this suppression feature is because you
have provided a manually constructed optimized print function in the
GLOBAL section of the datadec input file.

.SH "AN EXAMPLE"
.PP
The simplest use is to prepare an input file, such as
.I "data.in,"
which might (for example) contain:
.nf
TYPE {
        intlist =  nil or cons( int first, intlist next );
        illist  =  nil or cons( intlist first, illist next );
        idtree  =  leaf( string id )
                or node( idtree left, idtree right );
}
.fi
To generate C code implementing these types, invoke:
.nf
     datadec eek data.in
.fi
which generates
.I "eek.h"
and
.I "eek.c"

.fi
If you wanted to suppress print_idtree and print_illist from eek.c
(while leaving their prototypes in eek.h), you would run:
.nf
     datadec -s print_idtree -s print_illist eek data.in
.fi
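.PP
As a rough illustration only, the following hand-written C sketches the kind
of code a module for the
.I "intlist"
type above might provide: a tagged record, one constructor per shape, and a
deconstructor.  Every identifier here (intlist_nil, intlist_cons,
get_intlist_cons, intlist_sum) is a hypothetical name chosen for this sketch.
The names and layout that datadec really generates may differ, so consult
the generated
.I "eek.h"
for the actual interface.
.nf
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: illustrative names, not datadec's own output. */
typedef struct intlist_s *intlist;

typedef enum { INTLIST_NIL, INTLIST_CONS } intlist_kind;

struct intlist_s {
    intlist_kind kind;      /* which shape this value has */
    int          first;     /* only meaningful for cons */
    intlist      next;      /* only meaningful for cons */
};

/* Constructor for the nil shape. */
intlist intlist_nil(void)
{
    intlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = INTLIST_NIL;
    l->first = 0;
    l->next = NULL;
    return l;
}

/* Constructor for the cons shape. */
intlist intlist_cons(int first, intlist next)
{
    intlist l = malloc(sizeof *l);
    assert(l != NULL);
    l->kind = INTLIST_CONS;
    l->first = first;
    l->next = next;
    return l;
}

/* Deconstructor: break a cons back into its parameters. */
void get_intlist_cons(intlist l, int *first, intlist *next)
{
    assert(l != NULL && l->kind == INTLIST_CONS);
    *first = l->first;
    *next = l->next;
}

/* Example client code: sum a list by repeatedly deconstructing it. */
int intlist_sum(intlist l)
{
    int total = 0;
    while (l->kind == INTLIST_CONS) {
        int first;
        intlist next;
        get_intlist_cons(l, &first, &next);
        total += first;
        l = next;
    }
    return total;
}
```
.fi
.PP
Client code never touches the record fields directly in this style: it builds
values with the constructors and takes them apart with the deconstructors.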

.SH THE DATA DECLARATION LANGUAGE

The language accepted by datadec is split into two components:
the "outer language" is patterned after
the GMD compiler tools
.B "LALR"
and
.B "REX"
(similar to Yacc and Lex)
and allows you to specify four sections (only the last is compulsory):

.PP
.nf
.B "[ EXPORT { free_format_text } ]"
.br
.B "[ GLOBAL { free_format_text } ]"
.br
.B "[ BEGIN { free_format_text } ]"
.br
.B "TYPE { types }"
.fi

.PP
The contents of the
.I "export"
section are placed in the header file (the .h).
Commonly, you may wish to add extern function declarations, public types and
external variable declarations, which must be
placed at the top of the header file, and also define some additional
procedures using the automatically generated types, which must be placed after
the type declarations.
To achieve this, you should place a `@@' in the export section -- the text up
to that point is placed at the top of the header file, whereas the text
after it is placed at the bottom of the header file -- after all the types
have been defined.

.PP
Similarly, the contents of the 
.I "global"
section are placed in the C file,
again with `@@' being used to split the global section into "top of file" and
"bottom of file" pieces.
Note that the "top of file" piece is placed ABOVE the #include "thismodule.h"
line, and thus CANNOT use the inductive data types themselves.  However, this
makes it the perfect place to add #includes that allow the data types to use
other types as fields in the shapes.  For example, should one of your inductive
data types use a "set h" parameter in a shape, #include "set.h" might need
to go into GLOBAL above the @@.

.PP
Similarly, the contents of the 
.I "begin"
section are placed in an initialization procedure, which the user of the
constructed module must remember to call at an appropriate juncture (eg.
immediately when main starts).

.PP
The
.I "types"
section contains the type declarations themselves - the inner language.

.PP
The "inner language" - that of specifying the actual types section -
is closely modelled on Haskell, Miranda or Hope, with printing rules added.
Here is the grammar:

.PP
.nf
types	= list(type)
.br
type 	= type_name '=' shape list( 'or' shape ) ';'
.br
shape	= constructor_name [ '(' params ')' ] [ print ]
.br
params	= param list( ',' param)
.br
param	= ['-'] type_name param_name
.br
print	= list(element)
.br
element	= number | string_literal
.fi

.PP
Note that each data type is terminated by a semicolon,
and that (within one data type) each shape is separated from the next by 'or'
(just like the '|' in Haskell).
If a particular shape has parameters, they are separated from each other
by commas.
Each type name is simply an identifier.

.PP
.I "Datadec"
also generates routines to write each type to an open FILE *.
The method of printing each shape is governed by the presence or absence
of a print rule.  If no print rule is given, the constructor name is printed,
then an open bracket,
then each parameter is written out using the appropriate print routine,
with commas separating the parameters, then a close bracket.
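.PP
The default layout can be pictured with a hand-written routine for the
.I "intlist"
type.  This is only a sketch under stated assumptions: the record layout,
the argument order of print_intlist and print_int, and the choice to print
a parameterless shape such as nil as its bare name are all assumptions made
for illustration, not a description of the code datadec emits.
.nf
```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of:
   intlist = nil or cons( int first, intlist next ); */
typedef struct intlist_s *intlist;
struct intlist_s {
    int     is_cons;    /* 0 for nil, 1 for cons */
    int     first;
    intlist next;
};

/* Print routine for the basic type int. */
void print_int(FILE *out, int n)
{
    fprintf(out, "%d", n);
}

/* Default layout: constructor name, open bracket, the parameters
   separated by commas, close bracket. */
void print_intlist(FILE *out, intlist l)
{
    assert(l != NULL);
    if (!l->is_cons) {
        fputs("nil", out);  /* assumption: no brackets when no parameters */
        return;
    }
    fputs("cons(", out);
    print_int(out, l->first);       /* parameter 1 */
    fputs(",", out);
    print_intlist(out, l->next);    /* parameter 2, recursively */
    fputs(")", out);
}
```
.fi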

.PP
If a print rule is given, each print element
(these are syntactically separated by whitespace)
is used to generate the write routine as follows:
A literal string will simply be printed
(well, '\\n' is turned into a newline!),
whereas a number (eg. 4) means that the
4th parameter is printed (invoking the print function for that parameter's type).

.PP
For example, we could augment the
.I "idtree"
type from the example given above with print rules:

.nf
TYPE {
idtree  =  leaf( string id )                    "leaf(" 1 ")"
        or node( idtree left, idtree right )    "node( l=" 1 ", r=" 2 " )";
}
.fi

.PP
Now, an idtree constructed as
.nf
node( leaf( "hello" ), node( leaf( "there" ), leaf( "how are you" ) ) )
.fi
would print as:
.nf
node( l=leaf("hello"), r=node( l=leaf("there"), r=leaf("how are you") ) )
.fi
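.PP
To make the effect of the two print rules concrete, here is a hand-written
routine that follows them element by element.  It is a sketch, not datadec
output: the record layout, the argument order, and in particular the
behaviour of print_string (assumed here to write the text inside double
quotes) are assumptions introduced for this illustration.
.nf
```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of the idtree type. */
typedef const char *string;
typedef struct idtree_s *idtree;
struct idtree_s {
    int    is_node;     /* 0 for leaf, 1 for node */
    string id;          /* leaf parameter 1 */
    idtree left;        /* node parameter 1 */
    idtree right;       /* node parameter 2 */
};

/* Assumed print routine for string: the text inside double quotes. */
void print_string(FILE *out, string s)
{
    fputc('"', out);
    fputs(s, out);
    fputc('"', out);
}

void print_idtree(FILE *out, idtree t)
{
    assert(t != NULL);
    if (!t->is_node) {
        /* rule: "leaf(" 1 ")" */
        fputs("leaf(", out);
        print_string(out, t->id);       /* element 1 */
        fputs(")", out);
    } else {
        /* rule: "node( l=" 1 ", r=" 2 " )" */
        fputs("node( l=", out);
        print_idtree(out, t->left);     /* element 1 */
        fputs(", r=", out);
        print_idtree(out, t->right);    /* element 2 */
        fputs(" )", out);
    }
}
```
.fi
.PP
Each string literal in a rule is written verbatim, so the spaces inside
"node( l=" and " )" appear at every level of nesting.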

.SH SEE ALSO
.nf
LALR, REX, Haskell Language Definition.
.fi

.SH FREE FUNCTIONS (EXPERIMENTAL)

May 2014: After saying for 20 years that
"I must implement the missing free_TYPE functions sometime",
I have now experimentally added support for this.  Currently
you have to enable the generation of these via '-f'.
.I "Datadec"
will then generate a series of free_TYPE( TYPE t ) functions.
Essentially these will perform a post-order tree traversal,
recursively freeing every non-basic parameter in each shape.
(Basic types, not needing freeing, are things like int, long,
BOOL etc).

.PP
However, in C you often share pointers (most obviously to readonly
string literals, but whole data structures can be shared),
so the user may or may not want to free these.  To handle this,
the above grammar has been extended to allow optional '-' hints
(meaning "never free this") on parameters within shapes.

.PP
So, for instance, given the type:

.nf
TYPE {
strlist  =  nil
        or cons( string h, strlist t );
}
.fi

the corresponding free_strlist() function will attempt to call:
.nf
free_string( head_of_list );
.fi

.PP
This assumes that in the GLOBAL section you provide
a function or macro to implement free_string(), such as:

.nf
#define free_string(s) free(s)
.fi

.PP
But if you prepend a '-' to the "string h" part of the type, i.e.:

.nf
TYPE {
strlist  =  nil
        or cons( -string h, strlist t );
}
.fi

.PP
then no call to free_string( head_of_list ) will be made.
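.PP
The difference between the two declarations can be sketched by hand.  The
code below is an illustration under assumptions, not the generated code:
the record layout, the name free_strlist_keep_heads, and the strings_freed
counter (added only so the effect can be observed) are all hypothetical.
.nf
```c
#include <assert.h>
#include <stdlib.h>

typedef char *string;
typedef struct strlist_s *strlist;
struct strlist_s {
    int     is_cons;    /* 0 for nil, 1 for cons */
    string  h;
    strlist t;
};

/* Hypothetical counter, present only to make the behaviour visible. */
int strings_freed = 0;

/* The kind of free_string() that the GLOBAL section would supply. */
void free_string(string s)
{
    free(s);
    strings_freed++;
}

/* As if declared: cons( string h, strlist t ) */
void free_strlist(strlist l)
{
    assert(l != NULL);
    if (l->is_cons) {
        free_string(l->h);      /* parameter 1 is freed */
        free_strlist(l->t);     /* parameter 2, recursively */
    }
    free(l);                    /* post-order: children first, then the node */
}

/* As if declared: cons( -string h, strlist t ) */
void free_strlist_keep_heads(strlist l)
{
    assert(l != NULL);
    if (l->is_cons) {
        /* no free_string( l->h ) here: the hint says never free it */
        free_strlist_keep_heads(l->t);
    }
    free(l);
}
```
.fi
.PP
With the second form a list whose heads point at string literals, or at
strings owned elsewhere, can be freed without touching those strings.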

.PP
However, even with this mechanism, it is still incredibly easy to
share pointers when building data structures, such that you
.I sometimes
want to free a parameter, and sometimes don't.
I am still thinking about this.
For now, I recommend that you compile all datadec generated
modules (with free functions in) against a memory checking
system like memlib, or use a tool like valgrind's memcheck
to validate your memory allocation and deallocation.

.SH MISSING FEATURES
.PP
A cool extra would be a sprint_TYPE function to print into a string.
This should be a trivial modification of the code that prints to a file,
although of course we don't know in advance how long the generated string
will need to be.
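.PP
One possible way around the unknown length, sketched here purely as an idea
and not as anything datadec provides, is to make two passes: first measure
how many characters the value needs, then allocate exactly that much and
write into it.  The names sprint_intlist, intlist_length and intlist_write,
the record layout, and the default-style output format are all assumptions
made for this sketch.
.nf
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of:
   intlist = nil or cons( int first, intlist next ); */
typedef struct intlist_s *intlist;
struct intlist_s {
    int     is_cons;    /* 0 for nil, 1 for cons */
    int     first;
    intlist next;
};

/* Pass 1: characters needed to print l, excluding the terminator. */
size_t intlist_length(intlist l)
{
    assert(l != NULL);
    if (!l->is_cons)
        return 3;                                   /* "nil" */
    /* "cons(" + digits + "," + rest of list + ")" */
    return 5 + (size_t)snprintf(NULL, 0, "%d", l->first)
             + 1 + intlist_length(l->next) + 1;
}

/* Pass 2: write l at p and return the position just after the text. */
char *intlist_write(char *p, intlist l)
{
    if (!l->is_cons) {
        memcpy(p, "nil", 3);
        return p + 3;
    }
    p += sprintf(p, "cons(%d,", l->first);
    p = intlist_write(p, l->next);
    *p++ = ')';
    return p;
}

/* Returns a freshly allocated string, or NULL if allocation fails.
   The caller is responsible for freeing it. */
char *sprint_intlist(intlist l)
{
    size_t n = intlist_length(l);
    char *s = malloc(n + 1);
    char *end;
    if (s == NULL)
        return NULL;
    end = intlist_write(s, l);
    *end = 0;           /* terminate the string */
    return s;
}
```
.fi
.PP
The cost is that every value is traversed twice.  The alternative of growing
a buffer on demand traverses once but needs a reallocation strategy.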

.SH BUGS
Some single letter typenames (eg. "f" or "p") could clash with internal
parameter names in the print routines, leading to syntax errors when you
compile the files generated by datadec.
.PP
And, finally, one day I'll have to write the Perl, C++ and (perhaps) Java versions :-)

.SH "AUTHOR"
Duncan C. White, D.White@imperial.ac.uk.