.\" Copyright (C) 2005-2008 G.P. Halkes
.\" This program is free software: you can redistribute it and/or modify
.\" it under the terms of the GNU General Public License version 3, as
.\" published by the Free Software Foundation.
.\"
.\" This program is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public License
.\" along with this program. If not, see <http://www.gnu.org/licenses/>.
.TH "LLnextgen" "1" "31-12-2011" "Version 0.5.5" "LLnextgen parser generator"
.hw /usr/share/doc/LLnextgen-0.5.5 http://os.ghalkes.nl/LLnextgen
.SH NAME
\fBLLnextgen\fP \- an Extended-LL(1) parser generator
.SH SYNOPSIS
\fBLLnextgen\fP [\fIOPTIONS\fP] [\fIFILES\fP]
.SH DESCRIPTION
\fBLLnextgen\fP is a (partial) reimplementation of the \fBLLgen\fP ELL(1)
parser generator created by D.\~Grune and C.J.H.\~Jacobs (note: this is not the
same as the \fBLLgen\fP parser generator by Fischer and LeBlanc). It takes an
EBNF-like description of the grammar as input(s), and produces a parser in C.
.PP
Input files are expected to end in .g. The output files will have .g removed
and .c and .h added. If the input file does not end in .g, the extensions .c
and .h will simply be added to the name of the input file. Output files can also
be given a different base name using the option \-\-base\-name (see below).
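.PP
As an illustration of the naming rules above, the following invocations show
which files would be written. The file names calc.g, grammar.txt and parser are
hypothetical and chosen only for this example:
.PP
.nf
.RS
LLnextgen calc.g                        # writes calc.c and calc.h
LLnextgen grammar.txt                   # writes grammar.txt.c and grammar.txt.h
LLnextgen \-\-base\-name=parser calc.g    # writes parser.c and parser.h
.RE
.fi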
.SH OPTIONS
\fBLLnextgen\fP accepts the following options:
.IP "\fB\-c\fP, \fB\-\-max\-compatibility\fP"
Set options required for maximum source-level compatibility. This is different
from running as \fBLLgen\fP, as all extensions are still allowed. LLreissue and
the prototypes in the header file are still generated. This option turns on the
\fB\-\-llgen\-arg\-style\fP, \fB\-\-llgen\-escapes\-only\fP and
\fB\-\-llgen\-output\-style\fP options.
.IP "\fB\-e\fP, \fB\-\-warnings\-as\-errors\fP"
Treat warnings as errors.
.IP "\fB\-E\fP\fInum\fP, \fB\-\-error\-limit\fP=\fInum\fP"
Set the maximum number of errors before \fBLLnextgen\fP aborts. If \fInum\fP
is set to 0, the error limit is set to infinity. This option overrides the
error limit option specified in the grammar file.
.IP "\fB\-h\fP[\fIwhich\fP], \fB\-\-help\fP[=\fIwhich\fP]"
Print out a help message, describing the options. The optional \fIwhich\fP
argument allows selection of which options to print. \fIwhich\fP can be set to
all, depend, error, and extra.
.IP "\fB\-V\fP, \fB\-\-version\fP"
Print the program version and copyright information, and exit.
.IP "\fB\-v\fP[\fIlevel\fP], \fB\-\-verbose\fP[=\fIlevel\fP]"
Increase (without explicit level) or set (with explicit level) the verbosity
level. \fBLLnextgen\fP uses this option differently than \fBLLgen\fP. At
level\~1, \fBLLnextgen\fP will output traces of the conflicts to standard
error. At level\~2, \fBLLnextgen\fP will also write a file named LL.output with
the rules containing conflicts. At level\~3, \fBLLnextgen\fP will include the
entire grammar in LL.output.
.br
\fBLLgen\fP will write the LL.output file from level\~1, but cannot generate
conflict traces. It also has an intermediate setting between \fBLLnextgen\fP
levels\~2 and\~3.
.IP "\fB\-w\fP[\fIwarnings\fP], \fB\-\-suppress\-warnings\fP[=\fIwarnings\fP]"
Suppress all or selected warnings. Available warnings are: arg-separator,
option-override, unbalanced-c, multiple-parser, eofile, unused[:<identifier>],
datatype and unused-retval. The unused warning can suppress all warnings about
unused tokens and non-terminals, or can be used to suppress warnings about
specific tokens or non-terminals by adding a colon and a name. For example,
to suppress warning messages about FOO not being used, use
\fB\-wunused:FOO\fP. Several comma-separated warnings can be specified with
one option on the command line.
.IP "\fB\-\-abort\fP"
Generate the LLabort function.
.IP "\fB\-\-base\-name\fP=\fIname\fP"
Set the base name for the output files. Normally \fBLLnextgen\fP uses the name
of the first input file without any trailing\~.g as the base name. This option
can be used to override the default. The files created will be \fIname\fP.c
and \fIname\fP.h.
This option cannot be used in combination with
\fB\-\-llgen\-output\-style\fP.
.IP "\fB\-\-depend\fP[=\fImodifiers\fP]"
Generate dependency information to be used by the \fBmake\fP(1) program. The
modifiers can be used to change the make targets (targets:<targets>, and
extra-targets:<targets>) and the output (file:<file>). The defaults are to use
the output names as they would be created by running with the same arguments as
targets, and to write to standard output. Using the targets modifier, the list
of targets can be specified manually. The extra-targets modifier allows targets
to be added to the default list of targets. Finally, the phony modifier will add
phony targets for all dependencies to avoid \fBmake\fP(1) problems when removing
or renaming dependencies. This is like the \fBgcc\fP(1) \-MP option.
.IP "\fB\-\-depend\-cpp\fP"
Dump all top-level C code to standard output. This can be used to generate
dependency information for the generated files by piping the output from
\fBLLnextgen\fP through the C preprocessor with the appropriate options.
.IP "\fB\-\-dump\-lexer\-wrapper\fP"
Write the lexer wrapper function to standard output, and exit.
.IP "\fB\-\-dump\-llmessage\fP"
Write the default LLmessage function to standard output, and exit.
.IP "\fB\-\-dump\-tokens\fP[=\fImodifier\fP]"
Dump %token directives for unknown identifiers that match the
\fB\-\-token\-pattern\fP pattern. The default is to generate a single %token
directive with all the unknown identifiers separated by commas. This default
can be overridden by \fImodifier\fP. The modifier \fIseparate\fP produces a
separate %token directive for each identifier, while \fIlabel\fP produces a
%label directive. The text of the label will be the name of the identifier.
If the \fIlabel\fP modifier and the \fB\-\-lowercase\-symbols\fP option are
both specified, the label will contain only lowercase characters.
.br
Note: this option is not always available. It requires the POSIX regex API. If
the POSIX regex API is not available on your platform, or the \fBLLnextgen\fP
binary was compiled without support for the API, you will not be able to use
this option.
.IP "\fB\-\-extensions\fP=\fIlist\fP"
Specify the extensions to be used for the generated files. The list must be
comma-separated, and should not contain the . before the extension. The first
item in the list is the C source file and the second item is the header file.
You can omit the extension for the C source file and only specify the extension
for the header file.
.IP "\fB\-\-generate\-lexer\-wrapper\fP[=\fIyes|no\fP]"
Indicate whether to generate a wrapper for the lexical analyser. As
\fBLLnextgen\fP requires a lexical analyser to return the last token returned
after detecting an error which requires inserting a token to repair, most
lexical analysers require a wrapper to accommodate \fBLLnextgen\fP. As it is
identical for almost every grammar, \fBLLnextgen\fP can provide one. Use
\fB\-\-dump\-lexer\-wrapper\fP to see the code. If you do not specify this
option, \fBLLnextgen\fP will generate a warning to help remind you that a
wrapper is required.
.br
If you do not want the automatically generated wrapper, you should
specify this option followed by \fB=no\fP.
.IP "\fB\-\-generate\-llmessage\fP"
Generate an \fILLmessage\fP function. \fBLLnextgen\fP requires programs to
provide a function for informing the user about errors in the input. When
developing a parser, it is often desirable to have a default \fILLmessage\fP.
The provided \fILLmessage\fP is very simple and should be replaced by a more
elaborate one, once the parser is beyond the first testing phase. Use
\fB\-\-dump\-llmessage\fP to see the code. This option automatically
turns on \fB\-\-generate\-symbol\-table\fP.
.IP "\fB\-\-generate\-symbol\-table\fP"
Generate a symbol table. The symbol table will contain strings for all
tokens and character literals. By default, the symbol table contains the token
name as specified in the grammar. To change the string, for both tokens and
character literals, use the %label directive.
.IP "\fB\-\-gettext\fP[=\fImacro,guard\fP]"
Add gettext support. A macro call is added around symbol table entries
generated from %label directives. The macro will expand to the string itself.
This is meant to allow \fBxgettext\fP(1) to extract the strings. The default
macro is N_, because that is what most people use. A guard will be included
such that compilation without gettext is possible by not defining the guard.
The guard is set to USE_NLS by default. Translations will be done automatically
in LLgetSymbol in the generated parser through a call to gettext.
.IP "\fB\-\-keep\-dir\fP"
Do not remove the directory component of the input file name when creating the
output file name. By default, outputs are created in the current directory.
This option will generate the output in the directory of the input.
.IP "\fB\-\-llgen\-arg\-style\fP"
Use semicolons as argument separators in rule headers. \fBLLnextgen\fP uses
commas by default, as this is what ANSI C does.
.IP "\fB\-\-llgen\-escapes\-only\fP"
Only allow the escape sequences defined by \fBLLgen\fP in character literals.
By default \fBLLnextgen\fP also allows \\a, \\v, \\?, \\", and hexadecimal
constants with \\x.
.IP "\fB\-\-llgen\-output\-style\fP"
Generate one .c output per input, and the files Lpars.c and Lpars.h, instead of
one .c and one .h file based on the name of the first input.
.IP "\fB\-\-lowercase\-symbols\fP"
Convert the token names used for generating the symbol table to lower case.
This only applies to tokens for which no %label directive has been specified.
.IP "\fB\-\-no\-allow\-label\-create\fP"
Do not allow the %label directive to create new tokens. Note that this requires
that the token being labelled is either a character literal or a token created
by a %token directive that precedes the %label directive.
.IP "\fB\-\-no\-arg\-count\fP"
Do not check argument counts for rules. By default, \fBLLnextgen\fP checks
whether a rule is used with the same number of arguments as it is defined with.
\fBLLnextgen\fP also checks that every rule for which a %start directive is
specified takes 0 arguments.
.IP "\fB\-\-no\-eof\-zero\fP"
Do not use 0 as end-of-file token. \fB(f)lex\fP(1) uses 0 as the
end-of-file token. Other lexical-analyser generators may use \-1, and may
use 0 for something else (e.g. the nul character).
.IP "\fB\-\-no\-init\-llretval\fP"
Do not initialise \fBLLretval\fP with 0 bytes. Note that you have to take care
of initialisation of \fBLLretval\fP yourself when using this option.
.IP "\fB\-\-no\-line\-directives\fP"
Do not generate \fI#line\fP directives in the output. This means all errors will
be reported relative to the output file. By default \fBLLnextgen\fP generates
\fI#line\fP directives to make the C compiler generate errors relative to the
\fBLLnextgen\fP input file.
.IP "\fB\-\-no\-llreissue\fP"
Do not generate the \fILLreissue\fP variable, which is used to indicate when a
token should be reissued by the lexical analyser.
.IP "\fB\-\-no\-prototypes\-header\fP"
Do not generate prototypes for the parser and other functions in the header
file.
.IP "\fB\-\-not\-only\-reachable\fP"
Do not only analyse reachable rules. \fBLLnextgen\fP by default does not take
unreachable rules into account when doing conflict analysis, as these can cause
spurious conflicts. However, if the unreachable rules will be used in the
future, one might already want to be notified of problems with these rules.
\fBLLgen\fP by default does analyse unreachable rules.
.br
Note: in the case where a rule is unreachable because the only alternative of
another reachable rule that mentions it is never chosen (because of a %avoid
directive), the rule is still deemed reachable for the analysis. The only way
to avoid this behaviour is by doing the complete analysis twice, which is an
excessive amount of work to do for a very rare case.
.IP "\fB\-\-reentrant\fP"
Generate a reentrant parser. By default, \fBLLnextgen\fP generates
non-reentrant parsers. A reentrant parser can be called from itself, but not
from another thread. Use \-\-thread\-safe to generate a thread-safe parser.
.br
Note that when multiple parsers are specified in one grammar (using multiple
%start directives), and one of these parsers calls another, either the
\-\-reentrant option or the \-\-thread\-safe option is also required. If these
parsers are only called when none of the others is running, the option is not
necessary.
.br
Use only in combination with a reentrant lexical analyser.
.IP "\fB\-\-show\-dir\fP"
Show directory names of source files in error and warning messages. These are
usually omitted for readability, but may sometimes be necessary for tracing
errors.
.IP "\fB\-\-thread\-safe\fP"
Generate a thread-safe parser. Thread-safe parsers can be run in parallel in
different threads of the same program. The interface of a thread-safe parser is
different from that of the regular (and the reentrant) version. See the detailed
manual for more details.
.IP "\fB\-\-token\-pattern\fP=\fIpattern\fP"
Specify a regular expression to match with unknown identifiers used in the
grammar. If an unknown identifier matches, \fBLLnextgen\fP will generate a
token declaration for the identifier. This option is primarily implemented to
aid in the first stages of development, to allow for quick testing for conflicts
without having to specify all the tokens yet. A list of tokens can be generated
with the \fB\-\-dump\-tokens\fP option.
.br
Note: this option is not always available. It requires the POSIX regex API. If
the POSIX regex API is not available on your platform, or the \fBLLnextgen\fP
binary was compiled without support for the API, you will not be able to use
this option.
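.PP
The following invocations sketch how some of the options above might be used.
The grammar name calc.g, the token name FOO, the file name calc.d and the
regular expression are hypothetical, and the pattern assumes that all-uppercase
identifiers are meant to be tokens:
.PP
.nf
.RS
# Suppress the unused warning for FOO, and the eofile warning.
LLnextgen \-wunused:FOO,eofile calc.g
.sp
# Prototyping: generate a default LLmessage and a lexer wrapper.
LLnextgen \-\-generate\-llmessage \-\-generate\-lexer\-wrapper calc.g
.sp
# Treat undeclared uppercase identifiers as tokens, then list them
# with one %token directive per identifier.
LLnextgen \-\-token\-pattern='^[A\-Z_][A\-Z_]*$' calc.g
LLnextgen \-\-token\-pattern='^[A\-Z_][A\-Z_]*$' \-\-dump\-tokens=separate calc.g
.sp
# Write make dependency information to calc.d instead of standard output.
LLnextgen \-\-depend=file:calc.d calc.g
.RE
.fi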
.PP
By running \fBLLnextgen\fP using the name \fBLLgen\fP, \fBLLnextgen\fP goes
into \fBLLgen\fP-mode. This is implemented by turning off all default extra
functionality like \fILLreissue\fP, and disallowing all extensions to the
\fBLLgen\fP language. When running as \fBLLgen\fP, \fBLLnextgen\fP accepts the
following options from \fBLLgen\fP:
.IP "\fB\-a\fP"
Ignored. \fBLLnextgen\fP only generates ANSI C.
.IP "\fB\-h\fP\fInum\fP"
Ignored. \fBLLnextgen\fP leaves optimisation of jump tables entirely
up to the C\-compiler.
.IP "\fB\-j\fP[\fInum\fP]"
Ignored. \fBLLnextgen\fP leaves optimisation of jump tables
entirely up to the C\-compiler.
.IP "\fB\-l\fP[\fInum\fP]"
Ignored. \fBLLnextgen\fP leaves optimisation of jump tables entirely
up to the C\-compiler.
.IP "\fB\-v\fP"
Increase the verbosity level. See the description of the \fB\-v\fP option
above for details.
.IP "\fB\-w\fP"
Suppress all warnings.
.IP "\fB\-x\fP"
Ignored. \fBLLnextgen\fP will only generate token sets in LL.output.
The extensive error-reporting mechanisms in \fBLLnextgen\fP make this feature
obsolete.
.PP
\fBLLnextgen\fP cannot create parsers with non-correcting error-recovery.
Therefore, using the \fB\-n\fP or \fB\-s\fP options will cause \fBLLnextgen\fP
to print an error message and exit.
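.PP
One common way to run a program under another name is through a symbolic link.
The sketch below assumes that \fBLLnextgen\fP recognises the name \fBLLgen\fP
when started through such a link, and that ~/bin is on the search path;
neither assumption is confirmed by this manual page, and calc.g is a
hypothetical grammar file:
.PP
.nf
.RS
# Make LLnextgen available under the name LLgen.
ln \-s "$(command \-v LLnextgen)" ~/bin/LLgen
.sp
# Run in LLgen mode with increased verbosity.
LLgen \-v calc.g
.RE
.fi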
.SH COMPATIBILITY WITH LLGEN
At this time the basic \fBLLgen\fP functionality is implemented. This includes
everything apart from the extended user error-handling with the %onerror
directive and the non-correcting error-recovery.
.PP
Although I've tried to copy the behaviour of \fBLLgen\fP accurately, I have
implemented some aspects slightly differently. The following is a list of the
differences in behaviour between \fBLLgen\fP and \fBLLnextgen\fP:
.IP "\fB\(bu\fP"
\fBLLgen\fP generated both K&R style C code and ANSI C code. \fBLLnextgen\fP
only supports generation of ANSI C code.
.IP "\fB\(bu\fP"
There is a minor difference in the determination of the default choices.
\fBLLnextgen\fP simply chooses the first production with the shortest possible
terminal production, while \fBLLgen\fP also takes the complexity in terms of
non-terminals and terms into account. There is also a minor difference when
there is more than one shortest alternative and some of them are marked with
%avoid. Neither difference is very important, as the user can specify
which alternative should be the default, thereby circumventing the
differences in the algorithms.
.IP "\fB\(bu\fP"
The default behaviour of generating one output C file per input and Lpars.c
and Lpars.h has been changed in favour of generating one\~.c file and one\~.h
file. The rationale given for creating multiple output files in the first
place was that it would reduce the compilation time for the generated
parser. As computation power has become much more abundant this feature is
no longer necessary, and the difficult interaction with the make program
makes it undesirable. The \fBLLgen\fP behaviour is still supported through a
command-line switch.
.IP "\fB\(bu\fP"
In \fBLLgen\fP one could have a parser and a %first macro with the same name.
\fBLLnextgen\fP forbids this, as it leads to name collisions in the new file
naming scheme. For the old \fBLLgen\fP file naming scheme it could also easily
lead to name collisions, although they could be circumvented by not mentioning
the parser in any of the C code in the\~.g files.
.IP "\fB\(bu\fP"
\fBLLgen\fP names the labels it generates L_X, where X is a number.
\fBLLnextgen\fP names these LL_X.
.IP "\fB\(bu\fP"
\fBLLgen\fP parsers are always reentrant. As this feature is not used very
often, \fBLLnextgen\fP parsers are non-reentrant unless the option
\fB\-\-reentrant\fP is used.
.PP
Furthermore, \fBLLnextgen\fP has many extended features, for easier development.
.SH BUGS
If you think you have found a bug, please check that you are using the latest
version of \fBLLnextgen\fP [http://os.ghalkes.nl/LLnextgen]. When reporting
bugs, please include a minimal grammar that demonstrates the problem.
.SH AUTHOR
G.P. Halkes <llnextgen@ghalkes.nl>
.SH COPYRIGHT
Copyright \(co 2005-2008 G.P. Halkes
.br
LLnextgen is licensed under the GNU General Public License version 3.
.br
For more details on the license, see the file COPYING in the documentation
directory. On Un*x systems this is usually /usr/share/doc/LLnextgen-0.5.5.
.SH SEE ALSO
\fBLLgen\fP(1), \fBbison\fP(1), \fByacc\fP(1), \fBlex\fP(1), \fBflex\fP(1).
.PP
A detailed manual for \fBLLnextgen\fP is available as part of the distribution.
It includes the syntax for the grammar files, details on how to use the
generated parser in your programs, and details on the workings of the generated
parsers. This manual can be found in the documentation directory. On Un*x
systems this is usually /usr/share/doc/LLnextgen-0.5.5.