123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342 |
- .\" Copyright (C) 2005-2008 G.P. Halkes
- .\" This program is free software: you can redistribute it and/or modify
- .\" it under the terms of the GNU General Public License version 3, as
- .\" published by the Free Software Foundation.
- .\"
- .\" This program is distributed in the hope that it will be useful,
- .\" but WITHOUT ANY WARRANTY; without even the implied warranty of
- .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- .\" GNU General Public License for more details.
- .\"
- .\" You should have received a copy of the GNU General Public License
- .\" along with this program. If not, see <http://www.gnu.org/licenses/>.
- .TH "LLnextgen" "1" "31-12-2011" "Version 0.5.5" "LLnextgen parser generator"
- .hw /usr/share/doc/LLnextgen-0.5.5 http://os.ghalkes.nl/LLnextgen
- .SH NAME
- \fBLLnextgen\fP \- an Extended-LL(1) parser generator
- .SH SYNOPSIS
- \fBLLnextgen\fP [\fIOPTIONS\fP] [\fIFILES\fP]
- .SH DESCRIPTION
- \fBLLnextgen\fP is a (partial) reimplementation of the \fBLLgen\fP ELL(1)
- parser generator created by D.\~Grune and C.J.H.\~Jacobs (note: this is not the
- same as the \fBLLgen\fP parser generator by Fischer and LeBlanc). It takes an
- EBNF-like description of the grammar as input(s), and produces a parser in C.
- .PP
- Input files are expected to end in .g. The output files will have .g removed
- and .c and .h added. If the input file does not end in .g, the extensions .c
- and .h will simply be added to the name of the input file. Output files can also
- be given a different base name using the option \-\-base\-name (see below).
- .SH OPTIONS
- \fBLLnextgen\fP accepts the following options:
- .IP "\fB\-c\fP, \fB\-\-max\-compatibility\fP"
- Set options required for maximum source-level compatibility. This is different
- from running as \fBLLgen\fP, as all extensions are still allowed. LLreissue and
- the prototypes in the header file are still generated. This option turns on the
- \fB\-\-llgen\-arg\-style\fP, \fB\-\-llgen\-escapes\-only\fP and
- \fB\-\-llgen\-output\-style\fP options.
- .IP "\fB\-e\fP, \fB\-\-warnings\-as\-errors\fP"
- Treat warnings as errors.
- .IP "\fB\-E\fP\fInum\fP, \fB\-\-error\-limit\fP=\fInum\fP"
- Set the maximum number of errors, before \fBLLnextgen\fP aborts. If \fInum\fP
- is set 0, the error limit is set to infinity. This is to override the error
- limit option specified in the grammar file.
- .IP "\fB\-h\fP[\fIwhich\fP], \fB\-\-help\fP[=\fIwhich\fP]"
- Print out a help message, describing the options. The optional \fIwhich\fP
- argument allows selection of which options to print. \fIwhich\fP can be set to
- all, depend, error, and extra.
- .IP "\fB\-V\fP, \fB\-\-version\fP"
- Print the program version and copyright information, and exit.
- .IP "\fB\-v\fP[\fIlevel\fP], \fB\-\-verbose\fP[=\fIlevel\fP]"
- Increase (without explicit level) or set (with explicit level) the verbosity
- level. \fBLLnextgen\fP uses this option differently than \fBLLgen\fP. At
- level\~1, \fBLLnextgen\fP will output traces of the conflicts to standard
- error. At level\~2, \fBLLnextgen\fP will also write a file named LL.output with
- the rules containing conflicts. At level\~3, \fBLLnextgen\fP will include the
- entire grammar in LL.output.
- .br
- \fBLLgen\fP will write the LL.output file from level\~1, but cannot generate
- conflict traces. It also has an intermediate setting between \fBLLnextgen\fP
- levels\~2 and\~3.
- .IP "\fB\-w\fP[\fIwarnings\fP], \fB\-\-suppress\-warnings\fP[=\fIwarnings\fP]"
- Suppress all or selected warnings. Available warnings are: arg-separator,
- option-override, unbalanced-c, multiple-parser, eofile, unused[:<identifier>],
- datatype and unused-retval. The unused warning can suppress all warnings about
- unused tokens and non-terminals, or can be used to suppress warnings about
- specific tokens or non-terminals by adding a colon and a name. For example,
- to suppress warning messages about FOO not being used, use
- \fB\-wunused:FOO\fP. Several comma separated warnings can be specified with
- one option on the command line.
- .IP "\fB\-\-abort\fP"
- Generate the LLabort function.
- .IP "\fB\-\-base\-name\fP=\fIname\fP"
- Set the base name for the output files. Normally \fBLLnextgen\fP uses the name
- of the first input file without any trailing\~.g as the base name. This option
- can be used to override the default. The files created will be \fIname\fP.c
- and \fIname\fP.h.
- This option cannot be used in combination with
- \fB\-\-llgen\-output\-style\fP.
- .IP "\fB\-\-depend\fP[=\fImodifiers\fP]"
- Generate dependency information to be used by the \fBmake\fP(1) program. The
- modifiers can be used to change the make targets (targets:<targets>, and
- extra-targets:<targets>) and the output (file:<file>). The default are to use
- the output names as they would be created by running with the same arguments as
- targets, and to output to standard output. Using the targets modifier, the list
- of targets can be specified manually. The extra-targets modifier allows targets
- to be added to the default list of targets. Finally, the phony modifier will add
- phony targets for all dependencies to avoid \fBmake\fP(1) problems when removing
- or renaming dependencies. This is like the \fBgcc\fP(1) -MP option.
- .IP "\fB\-\-depend-cpp\fP"
- Dump all top-level C-code to standard out. This can be used to generate
- dependency information for the generated files by piping the output from
- \fBLLnextgen\fP through the C preprocessor with the appropriate options.
- .IP "\fB\-\-dump\-lexer\-wrapper\fP"
- Write the lexer wrapper function to standard output, and exit.
- .IP "\fB\-\-dump\-llmessage\fP"
- Write the default LLmessage function to standard output, and exit.
- .IP "\fB\-\-dump\-tokens\fP[=\fImodifier\fP]"
- Dump %token directives for unknown identifiers that match the
- \fB\-\-token\-pattern\fP pattern. The default is to generate a single %token
- directive with all the unknown identifiers separated by comma's. This default
- can be overridden by \fImodifier\fP. The modifier \fIseparate\fP produces a
- separate %token directive for each identifier, while \fIlabel\fP produces a
- %label directive. The text of the label will be the name of the identifier.
- If the \fIlabel\fP modifier and the \fB\-\-lowercase\-symbols\fP option are
- both specified the label will contain only lowercase characters.
- .br
- Note: this option is not always available. It requires the POSIX regex API. If
- the POSIX regex API is not available on your platform, or the \fBLLnextgen\fP
- binary was compiled without support for the API, you will not be able to use
- this option.
- .IP "\fB\-\-extensions\fP=\fIlist\fP"
- Specify the extensions to be used for the generated files. The list must be comma
- separated, and should not contain the . before the extension. The first item in
- the list is the C source file and the second item is the header file. You can
- omit the extension for the C source file and only specify the extension for the
- header file.
- .IP "\fB\-\-generate\-lexer\-wrapper\fP[=\fIyes|no\fP]"
- Indicate whether to generate a wrapper for the lexical analyser. As
- \fBLLnextgen\fP requires a lexical analyser to return the last token returned
- after detecting an error which requires inserting a token to repair, most
- lexical analysers require a wrapper to accommodate \fBLLnextgen\fP. As it is
- identical for almost each grammar, \fBLLnextgen\fP can provide one. Use
- \fB\-\-dump\-lexer\-wrapper\fP to see the code. If you do specifiy this option
- \fBLLnextgen\fP will generate a warning, to help remind you that a wrapper is
- required.
- .br
- If you do not want the automatically generate wrapper you should
- specifiy this option followed by \fB=no\fP.
- .IP "\fB\-\-generate\-llmessage\fP"
- Generate an \fILLmessage\fP function. \fBLLnextgen\fP requires programs to
- provide a function for informing the user about errors in the input. When
- developing a parser, it is often desirable to have a default \fILLmessage\fP.
- The provided \fILLmessage\fP is very simple and should be replaced by a more
- elaborate one, once the parser is beyond the first testing phase. Use
- \fB\-\-dump\-llmessage\fP to see the code. This option automatically
- turns on \fB\-\-generate\-symbol\-table\fP.
- .IP "\fB\-\-generate\-symbol\-table\fP"
- Generate a symbol table. The symbol table will contain strings for all
- tokens and character literals. By default, the symbol table contains the token
- name as specified in the grammar. To change the string, for both tokens and
- character literals, use the %label directive.
- .IP "\fB\-\-gettext\fP[=\fImacro,guard\fP]"
- Add gettext support. A macro call is added around symbol table entries
- generated from %label directives. The macro will expand to the string itself.
- This is meant to allow \fBxgettext\fP(1) to extract the strings. The default is
- N_, because that is what most people use. A guard will be included such that
- compilation without gettext is possible by not defining the guard. The guard
- is set to USE_NLS by default. Translations will be done automatically in
- LLgetSymbol in the generated parser through a call to gettext.
- .IP "\fB\-\-keep\-dir\fP"
- Do not remove directory component of the input file-name when creating the
- output file-name. By default, outputs are created in the current directory.
- This option will generate the output in the directory of the input.
- .IP "\fB\-\-llgen\-arg\-style\fP"
- Use semicolons as argument separators in rule headers. \fBLLnextgen\fP uses
- comma's by default, as this is what ANSI C does.
- .IP "\fB\-\-llgen\-escapes\-only\fP"
- Only allow the escape sequences defined by \fBLLgen\fP in character literals.
- By default \fBLLnextgen\fP also allows \\a, \\v, \\?, \\", and hexadecimal
- constants with \\x.
- .IP "\fB\-\-llgen\-output\-style\fP"
- Generate one .c output per input, and the files Lpars.c and Lpars.h, instead of
- one .c and one .h file based on the name of the first input.
- .IP "\fB\-\-lowercase\-symbols\fP"
- Convert the token names used for generating the symbol table to lower case.
- This only applies to tokens for which no %label directive has been specified.
- .IP "\fB\-\-no\-allow\-label\-create\fP"
- Do not allow the %label directive to create new tokens. Note that this requires
- that the token being labelled is either a character literal or a %token
- directive creating the named token has preceded the %label directive.
- .IP "\fB\-\-no\-arg\-count\fP"
- Do not check argument counts for rules. LLnextgen checks whether a rule is
- used with the same number of arguments as it is defined. LLnextgen also checks
- that any rules for which a %start directive is specified, the number of
- arguments is 0.
- .IP "\fB\-\-no\-eof\-zero\fP"
- Do not use 0 as end-of-file token. \fB(f)lex\fP(1) uses 0 as the
- end-of-file token. Other lexical-analyser generators may use \-1, and may
- use 0 for something else (e.g. the nul character).
- .IP "\fB\-\-no\-init\-llretval\fP"
- Do not initialise \fBLLretval\fP with 0 bytes. Note that you have to take care
- of initialisation of \fBLLretval\fP yourself when using this option.
- .IP "\fB\-\-no\-line\-directives\fP"
- Do not generate \fI#line\fP directives in the output. This means all errors will
- be reported relative to the output file. By default \fBLLnextgen\fP generates
- \fI#line\fP directives to make the C compiler generate errors relative to the
- \fBLLnextgen\fP input file.
- .IP "\fB\-\-no\-llreissue\fP"
- Do not generate the \fILLreissue\fP variable, which is used to indicate when a
- token should be reissued by the lexical analyser.
- .IP "\fB\-\-no\-prototypes\-header\fP"
- Do not generate prototypes for the parser and other functions in the header
- file.
- .IP "\fB\-\-not\-only\-reachable\fP"
- Do not only analyse reachable rules. \fBLLnextgen\fP by default does not take
- unreachable rules into account when doing conflict analysis, as these can cause
- spurious conflicts. However, if the unreachable rules will be used in the
- future, one might already want to be notified of problems with these rules.
- \fBLLgen\fP by default does analyse unreachable rules.
- .br
- Note: in the case where a rule is unreachable because the only alternative of
- another reachable rule that mentions it is never chosen (because of a %avoid
- directive), the rule is still deemed reachable for the analysis. The only way
- to avoid this behaviour is by doing the complete analysis twice, which is an
- excessive amount of work to do for a very rare case.
- .IP "\fB\-\-reentrant\fP"
- Generate a reentrant parser. By default, \fBLLnextgen\fP generates
- non-reentrant parsers. A reentrant parser can be called from itself, but not
- from another thread. Use \-\-thread\-safe to generate a thread-safe parser.
- .br
- Note that when multiple parsers are specified in one grammar (using multiple
- %start directives), and one of these parsers calls another, either the
- \-\-reentrant option or the \-\-thread-safe option is also required. If these
- parsers are only called when none of the others is running, the option is not
- necessary.
- .br
- Use only in combination with a reentrant lexical analyser.
- .IP "\fB\-\-show\-dir\fP"
- Show directory names of source files in error and warning messages. These are
- usually omitted for readability, but may sometimes be necessary for tracing
- errors.
- .IP "\fB\-\-thread\-safe\fP"
- Generate a thread-safe parser. Thread-safe parsers can be run in parallel in
- different threads of the same program. The interface of a thread-safe parser is
- different from the regular (and then reentrant) version. See the detailed manual
- for more details.
- .IP "\fB\-\-token\-pattern\fP=\fIpattern\fP"
- Specify a regular expression to match with unknown identifiers used in the
- grammar. If an unknown identifier matches, \fBLLnextgen\fP will generate a
- token declaration for the identifier. This option is primarily implemented to
- aid in the first stages of development, to allow for quick testing for conflicts
- without having to specify all the tokens yet. A list of tokens can be generated
- with the \fB\-\-dump\-tokens\fP option.
- .br
- Note: this option is not always available. It requires the POSIX regex API. If
- the POSIX regex API is not available on your platform, or the \fBLLnextgen\fP
- binary was compiled without support for the API, you will not be able to use
- this option.
- .PP
- By running \fBLLnextgen\fP using the name \fBLLgen\fP, \fBLLnextgen\fP goes
- into \fBLLgen\fP-mode. This is implemented by turning off all default extra
- functionality like \fILLreissue\fP, and disallowing all extensions to the
- \fBLLgen\fP language. When running as \fBLLgen\fP, \fBLLnextgen\fP accepts the
- following options from \fBLLgen\fP:
- .IP "\fB\-a\fP"
- Ignored. \fBLLnextgen\fP only generates ANSI C.
- .IP "\fB\-h\fP\fInum\fP"
- Ignored. \fBLLnextgen\fP leaves optimisation of jump tables entirely
- up to the C\-compiler.
- .IP "\fB\-j\fP[\fInum\fP]"
- Ignored. \fBLLnextgen\fP leaves optimisation of jump tables
- entirely up to the C\-compiler.
- .IP "\fB\-l\fP[\fInum\fP]"
- Ignored. \fBLLnextgen\fP leaves optimisation of jump tables entirely
- up to the C\-compiler.
- .IP "\fB\-v\fP"
- Increase the verbosity level. See the description of the \fB\-v\fP option
- above for details.
- .IP "\fB\-w\fP"
- Suppress all warnings.
- .IP "\fB\-x\fP"
- Ignored. \fBLLnextgen\fP will only generate token sets in LL.output.
- The extensive error-reporting mechanisms in \fBLLnextgen\fP make this feature
- obsolete.
- .PP
- \fBLLnextgen\fP cannot create parsers with non-correcting error-recovery.
- Therefore, using the \fB\-n\fP or \fB\-s\fP options will cause \fBLLnextgen\fP
- to print an error message and exit.
- .SH COMPATIBILITY WITH LLGEN
- At this time the basic \fBLLgen\fP functionality is implemented. This includes
- everything apart from the extended user error-handling with the %onerror
- directive and the non-correcting error-recovery.
- .PP
- Although I've tried to copy the behaviour of \fBLLgen\fP accurately, I have
- implemented some aspects slightly differently. The following is a list of the
- differences in behaviour between \fBLLgen\fP and \fBLLnextgen\fP:
- .IP "\fB\(bu\fP"
- \fBLLgen\fP generated both K&R style C code and ANSI C code. \fBLLnextgen\fP
- only supports generation of ANSI C code.
- .IP "\fB\(bu\fP"
- There is a minor difference in the determination of the default choices.
- \fBLLnextgen\fP simply chooses the first production with the shortest possible
- terminal production, while \fBLLgen\fP also takes the complexity in terms of
- non-terminals and terms into account. There is also a minor difference when
- there is more than one shortest alternative and some of them are marked with
- %avoid. Both differences are not very important as the user can specify
- which alternative should be the default, thereby circumventing the
- differences in the algorithms.
- .IP "\fB\(bu\fP"
- The default behaviour of generating one output C file per input and Lpars.c
- and Lpars.h has been changed in favour of generating one\~.c file and one\~.h
- file. The rationale given for creating multiple output files in the first
- place was that it would reduce the compilation time for the generated
- parser. As computation power has become much more abundant this feature is
- no longer necessary, and the difficult interaction with the make program
- makes it undesirable. The \fBLLgen\fP behaviour is still supported through a
- command-line switch.
- .IP "\fB\(bu\fP"
- in \fBLLgen\fP one could have a parser and a %first macro with the same name.
- \fBLLnextgen\fP forbids this, as it leads to name collisions in the new file
- naming scheme. For the old \fBLLgen\fP file naming scheme it could also easily
- lead to name collisions, although they could be circumvented by not mentioning
- the parser in any of the C code in the\~.g files.
- .IP "\fB\(bu\fP"
- \fBLLgen\fP names the labels it generates L_X, where X is a number.
- \fBLLnextgen\fP names these LL_X.
- .IP "\fB\(bu\fP"
- \fBLLgen\fP parsers are always reentrant. As this feature is not used very
- often, \fBLLnextgen\fP parsers are non-reentrant unless the option
- \fB\-\-reentrant\fP is used.
- .PP
- Furthermore, \fBLLnextgen\fP has many extended features, for easier development.
- .SH BUGS
- If you think you have found a bug, please check that you are using the latest
- version of \fBLLnextgen\fP [http://os.ghalkes.nl/LLnextgen]. When reporting
- bugs, please include a minimal grammar that demonstrates the problem.
- .SH AUTHOR
- G.P. Halkes <llnextgen@ghalkes.nl>
- .SH COPYRIGHT
- Copyright \(co 2005-2008 G.P. Halkes
- .br
- LLnextgen is licensed under the GNU General Public License version 3.
- .br
- For more details on the license, see the file COPYING in the documentation
- directory. On Un*x systems this is usually /usr/share/doc/LLnextgen-0.5.5.
- .SH SEE ALSO
- \fBLLgen\fP(1), \fBbison\fP(1), \fByacc\fP(1), \fBlex\fP(1), \fBflex\fP(1).
- .PP
- A detailed manual for \fBLLnextgen\fP is available as part of the distribution.
- It includes the syntax for the grammar files, details on how to use the
- generated parser in your programs, and details on the workings of the generated
- parsers. This manual can be found in the documentation directory. On Un*x
- systems this is usually /usr/share/doc/LLnextgen-0.5.5.
|