- This is Info file bison.info, produced by Makeinfo-1.55 from the input
- file bison.texinfo.
- This file documents the Bison parser generator.
- Copyright (C) 1988, 1989, 1990, 1991, 1992 Free Software Foundation,
- Inc. Modified (1993) from bison-1.22 by Wilfred J. Hansen
- (wjh+@cmu.edu), Andrew Consortium, Carnegie Mellon University
- CARNEGIE MELLON UNIVERSITY DISCLAIMS ALL WARRANTIES WITH REGARD TO
- THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
- FITNESS. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY BE LIABLE FOR ANY
- SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
- RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
- CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
- CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided also
- that the sections entitled "GNU General Public License" and "Conditions
- for Using Bison" are included exactly as in the original, and provided
- that the entire resulting derived work is distributed under the terms
- of a permission notice identical to this one.
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that the sections entitled "GNU General Public
- License", "Conditions for Using Bison" and this permission notice may be
- included in translations approved by the Free Software Foundation
- instead of in the original English.
- File: bison.info, Node: Reduce/Reduce, Next: Mystery Conflicts, Prev: Parser States, Up: Algorithm
- Reduce/Reduce Conflicts
- =======================
- A reduce/reduce conflict occurs if there are two or more rules that
- apply to the same sequence of input. This usually indicates a serious
- error in the grammar.
- For example, here is an erroneous attempt to define a sequence of
- zero or more `word' groupings.
- sequence: /* empty */
- { printf ("empty sequence\n"); }
- | maybeword
- | sequence word
- { printf ("added word %s\n", $2); }
- ;
-
- maybeword: /* empty */
- { printf ("empty maybeword\n"); }
- | word
- { printf ("single word %s\n", $1); }
- ;
- The error is an ambiguity: there is more than one way to parse a single
- `word' into a `sequence'. It could be reduced to a `maybeword' and
- then into a `sequence' via the second rule. Alternatively,
- nothing-at-all could be reduced into a `sequence' via the first rule,
- and this could be combined with the `word' using the third rule for
- `sequence'.
- There is also more than one way to reduce nothing-at-all into a
- `sequence'. This can be done directly via the first rule, or
- indirectly via `maybeword' and then the second rule.
- You might think that this is a distinction without a difference,
- because it does not change whether any particular input is valid or
- not. But it does affect which actions are run. One parsing order runs
- the second rule's action; the other runs the first rule's action and
- the third rule's action. In this example, the output of the program
- changes.
- Bison resolves a reduce/reduce conflict by choosing to use the rule
- that appears first in the grammar, but it is very risky to rely on
- this. Every reduce/reduce conflict must be studied and usually
- eliminated. Here is the proper way to define `sequence':
- sequence: /* empty */
- { printf ("empty sequence\n"); }
- | sequence word
- { printf ("added word %s\n", $2); }
- ;
- Here is another common error that yields a reduce/reduce conflict:
- sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
-
- words: /* empty */
- | words word
- ;
-
- redirects:/* empty */
- | redirects redirect
- ;
- The intention here is to define a sequence which can contain either
- `word' or `redirect' groupings. The individual definitions of
- `sequence', `words' and `redirects' are error-free, but the three
- together make a subtle ambiguity: even an empty input can be parsed in
- infinitely many ways!
- Consider: nothing-at-all could be a `words'. Or it could be two
- `words' in a row, or three, or any number. It could equally well be a
- `redirects', or two, or any number. Or it could be a `words' followed
- by three `redirects' and another `words'. And so on.
- Here are two ways to correct these rules. First, to make it a
- single level of sequence:
- sequence: /* empty */
- | sequence word
- | sequence redirect
- ;
- Second, to prevent either a `words' or a `redirects' from being
- empty:
- sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
-
- words: word
- | words word
- ;
-
- redirects:redirect
- | redirects redirect
- ;
- File: bison.info, Node: Mystery Conflicts, Next: Stack Overflow, Prev: Reduce/Reduce, Up: Algorithm
- Mysterious Reduce/Reduce Conflicts
- ==================================
- Sometimes reduce/reduce conflicts can occur that don't look
- warranted. Here is an example:
- %token ID
-
- %%
- def: param_spec return_spec ','
- ;
- param_spec:
- type
- | name_list ':' type
- ;
- return_spec:
- type
- | name ':' type
- ;
- type: ID
- ;
- name: ID
- ;
- name_list:
- name
- | name ',' name_list
- ;
- It would seem that this grammar can be parsed with only a single
- token of look-ahead: when a `param_spec' is being read, an `ID' is a
- `name' if a comma or colon follows, or a `type' if another `ID'
- follows. In other words, this grammar is LR(1).
- However, Bison, like most parser generators, cannot actually handle
- all LR(1) grammars. In this grammar, two contexts, that after an `ID'
- at the beginning of a `param_spec' and likewise at the beginning of a
- `return_spec', are similar enough that Bison assumes they are the same.
- They appear similar because the same set of rules would be active--the
- rule for reducing to a `name' and that for reducing to a `type'. Bison
- is unable to determine at that stage of processing that the rules would
- require different look-ahead tokens in the two contexts, so it makes a
- single parser state for them both. Combining the two contexts causes a
- conflict later. In parser terminology, this occurrence means that the
- grammar is not LALR(1).
- In general, it is better to fix deficiencies than to document them.
- But this particular deficiency is intrinsically hard to fix; parser
- generators that can handle LR(1) grammars are hard to write and tend to
- produce parsers that are very large. In practice, Bison is more useful
- as it is now.
- When the problem arises, you can often fix it by identifying the two
- parser states that are being confused, and adding something to make them
- look distinct. In the above example, adding one rule to `return_spec'
- as follows makes the problem go away:
- %token BOGUS
- ...
- %%
- ...
- return_spec:
- type
- | name ':' type
- /* This rule is never used. */
- | ID BOGUS
- ;
- This corrects the problem because it introduces the possibility of an
- additional active rule in the context after the `ID' at the beginning of
- `return_spec'. This rule is not active in the corresponding context in
- a `param_spec', so the two contexts receive distinct parser states. As
- long as the token `BOGUS' is never generated by `yylex', the added rule
- cannot alter the way actual input is parsed.
- In this particular example, there is another way to solve the
- problem: rewrite the rule for `return_spec' to use `ID' directly
- instead of via `name'. This also causes the two confusing contexts to
- have different sets of active rules, because the one for `return_spec'
- activates the altered rule for `return_spec' rather than the one for
- `name'.
- param_spec:
- type
- | name_list ':' type
- ;
- return_spec:
- type
- | ID ':' type
- ;
- File: bison.info, Node: Stack Overflow, Prev: Mystery Conflicts, Up: Algorithm
- Stack Overflow, and How to Avoid It
- ===================================
- The Bison parser stack can overflow if too many tokens are shifted
- and not reduced. When this happens, the parser function `yyparse'
- returns a nonzero value, pausing only to call `yyerror' to report the
- overflow.
- By defining the macro `YYMAXDEPTH', you can control how deep the
- parser stack can become before a stack overflow occurs. Define the
- macro with a value that is an integer. This value is the maximum number
- of tokens that can be shifted (and not reduced) before overflow. It
- must be a constant expression whose value is known at compile time.
- The stack space allowed is not necessarily allocated. If you
- specify a large value for `YYMAXDEPTH', the parser actually allocates a
- small stack at first, and then makes it bigger by stages as needed.
- This increasing allocation happens automatically and silently.
- Therefore, you do not need to make `YYMAXDEPTH' painfully small merely
- to save space for ordinary inputs that do not need much stack.
- The default value of `YYMAXDEPTH', if you do not define it, is 10000.
- You can control how much stack is allocated initially by defining the
- macro `YYINITDEPTH'. This value too must be a compile-time constant
- integer. The default is 200.
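The staged growth described above can be pictured with a small, self-contained C sketch. It is not the code Bison generates: `INIT_DEPTH', `MAX_DEPTH', `struct stack' and `stack_push' are hypothetical names that only play the roles of `YYINITDEPTH', `YYMAXDEPTH' and the parser's internal stack, and the doubling policy is an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define INIT_DEPTH 200      /* plays the role of YYINITDEPTH */
#define MAX_DEPTH  10000    /* plays the role of YYMAXDEPTH */

struct stack
{
  short *base;       /* allocated storage; NULL until the first push */
  size_t size;       /* number of entries in use */
  size_t capacity;   /* number of entries allocated */
};

/* Push STATE onto S.  Return 0 on success, 1 on overflow (the stack
   already holds MAX_DEPTH entries) or on allocation failure.  */
static int
stack_push (struct stack *s, short state)
{
  assert (s != NULL);
  if (s->size == s->capacity)
    {
      size_t newcap;
      short *p;

      if (s->capacity >= MAX_DEPTH)
        return 1;                      /* overflow: the limit is reached */

      /* Start small, then grow by stages, never past the limit.  */
      newcap = s->capacity ? s->capacity * 2 : INIT_DEPTH;
      if (newcap > MAX_DEPTH)
        newcap = MAX_DEPTH;

      p = realloc (s->base, newcap * sizeof *p);
      if (p == NULL)
        return 1;
      s->base = p;
      s->capacity = newcap;
    }
  s->base[s->size++] = state;
  return 0;
}
```

A caller would start from a zero-initialized `struct stack', call `stack_push' for each shifted state, and treat a nonzero return the way `yyparse' treats overflow: report it and give up.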
- File: bison.info, Node: Error Recovery, Next: Context Dependency, Prev: Algorithm, Up: Top
- Error Recovery
- **************
- It is not usually acceptable to have a program terminate on a parse
- error. For example, a compiler should recover sufficiently to parse the
- rest of the input file and check it for errors; a calculator should
- accept another expression.
- In a simple interactive command parser where each input is one line,
- it may be sufficient to allow `yyparse' to return 1 on error and have
- the caller ignore the rest of the input line when that happens (and
- then call `yyparse' again). But this is inadequate for a compiler,
- because it forgets all the syntactic context leading up to the error.
- A syntax error deep within a function in the compiler input should not
- cause the compiler to treat the following line like the beginning of a
- source file.
- You can define how to recover from a syntax error by writing rules to
- recognize the special token `error'. This is a terminal symbol that is
- always defined (you need not declare it) and reserved for error
- handling. The Bison parser generates an `error' token whenever a
- syntax error happens; if you have provided a rule to recognize this
- token in the current context, the parse can continue.
- For example:
- stmnts: /* empty string */
- | stmnts '\n'
- | stmnts exp '\n'
- | stmnts error '\n'
- ;
- The fourth rule in this example says that an error followed by a
- newline makes a valid addition to any `stmnts'.
- What happens if a syntax error occurs in the middle of an `exp'? The
- error recovery rule, interpreted strictly, applies to the precise
- sequence of a `stmnts', an `error' and a newline. If an error occurs in
- the middle of an `exp', there will probably be some additional tokens
- and subexpressions on the stack after the last `stmnts', and there will
- be tokens to read before the next newline. So the rule is not
- applicable in the ordinary way.
- But Bison can force the situation to fit the rule, by discarding
- part of the semantic context and part of the input. First it discards
- states and objects from the stack until it gets back to a state in
- which the `error' token is acceptable. (This means that the
- subexpressions already parsed are discarded, back to the last complete
- `stmnts'.) At this point the `error' token can be shifted. Then, if
- the old look-ahead token is not acceptable to be shifted next, the
- parser reads tokens and discards them until it finds a token which is
- acceptable. In this example, Bison reads and discards input until the
- next newline so that the fourth rule can apply.
- The choice of error rules in the grammar is a choice of strategies
- for error recovery. A simple and useful strategy is simply to skip the
- rest of the current input line or current statement if an error is
- detected:
- stmnt: error ';' /* on error, skip until ';' is read */
- It is also useful to recover to the matching close-delimiter of an
- opening-delimiter that has already been parsed. Otherwise the
- close-delimiter will probably appear to be unmatched, and generate
- another, spurious error message:
- primary: '(' expr ')'
- | '(' error ')'
- ...
- ;
- Error recovery strategies are necessarily guesses. When they guess
- wrong, one syntax error often leads to another. In the above example,
- the error recovery rule guesses that an error is due to bad input
- within one `stmnt'. Suppose that instead a spurious semicolon is
- inserted in the middle of a valid `stmnt'. After the error recovery
- rule recovers from the first error, another syntax error will be found
- straightaway, since the text following the spurious semicolon is also
- an invalid `stmnt'.
- To prevent an outpouring of error messages, the parser will output
- no error message for another syntax error that happens shortly after
- the first; only after three consecutive input tokens have been
- successfully shifted will error messages resume.
- Note that rules which accept the `error' token may have actions, just
- as any other rules can.
- You can make error messages resume immediately by using the macro
- `yyerrok' in an action. If you do this in the error rule's action, no
- error messages will be suppressed. This macro requires no arguments;
- `yyerrok;' is a valid C statement.
- The previous look-ahead token is reanalyzed immediately after an
- error. If this is unacceptable, then the macro `yyclearin' may be used
- to clear this token. Write the statement `yyclearin;' in the error
- rule's action.
- For example, suppose that on a parse error, an error handling
- routine is called that advances the input stream to some point where
- parsing should once again commence. The next symbol returned by the
- lexical scanner is probably correct. The previous look-ahead token
- ought to be discarded with `yyclearin;'.
- The macro `YYRECOVERING' stands for an expression that has the value
- 1 when the parser is recovering from a syntax error, and 0 the rest of
- the time. A value of 1 indicates that error messages are currently
- suppressed for new syntax errors.
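The message-suppression behavior described in this chapter can be modeled with a counter. The sketch below is only an illustration under stated assumptions: `struct recovery', `on_syntax_error', `on_shift', `errok' and `recovering' are hypothetical names, not part of the parser Bison generates, and `on_shift' stands for the shifting of an ordinary input token.

```c
#include <assert.h>
#include <stdio.h>

struct recovery
{
  int errstatus;   /* input tokens still to shift before messages resume */
  int reported;    /* number of messages actually printed */
};

/* Nonzero while recovering; the analogue of what YYRECOVERING reports.  */
static int
recovering (const struct recovery *r)
{
  return r->errstatus != 0;
}

/* A syntax error was detected: print MSG unless messages are currently
   suppressed, then (re)start the three-token waiting period.  */
static void
on_syntax_error (struct recovery *r, const char *msg)
{
  assert (r != NULL && msg != NULL);
  if (!recovering (r))
    {
      fprintf (stderr, "%s\n", msg);
      r->reported++;
    }
  r->errstatus = 3;
}

/* An input token was shifted successfully.  */
static void
on_shift (struct recovery *r)
{
  if (r->errstatus > 0)
    r->errstatus--;
}

/* The analogue of `yyerrok;': resume messages immediately.  */
static void
errok (struct recovery *r)
{
  r->errstatus = 0;
}
```

In this model a second error arriving after only two shifted tokens is silent and restarts the wait, while calling `errok' from an error rule's action means the very next error is reported.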
- File: bison.info, Node: Context Dependency, Next: Debugging, Prev: Error Recovery, Up: Top
- Handling Context Dependencies
- *****************************
- The Bison paradigm is to parse tokens first, then group them into
- larger syntactic units. In many languages, the meaning of a token is
- affected by its context. Although this violates the Bison paradigm,
- certain techniques (known as "kludges") may enable you to write Bison
- parsers for such languages.
- * Menu:
- * Semantic Tokens:: Token parsing can depend on the semantic context.
- * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
- * Tie-in Recovery:: Lexical tie-ins have implications for how
- error recovery rules must be written.
- (Actually, "kludge" means any technique that gets its job done but is
- neither clean nor robust.)
- File: bison.info, Node: Semantic Tokens, Next: Lexical Tie-ins, Up: Context Dependency
- Semantic Info in Token Types
- ============================
- The C language has a context dependency: the way an identifier is
- used depends on what its current meaning is. For example, consider
- this:
- foo (x);
- This looks like a function call statement, but if `foo' is a typedef
- name, then this is actually a declaration of `x'. How can a Bison
- parser for C decide how to parse this input?
- The method used in GNU C is to have two different token types,
- `IDENTIFIER' and `TYPENAME'. When `yylex' finds an identifier, it
- looks up the current declaration of the identifier in order to decide
- which token type to return: `TYPENAME' if the identifier is declared as
- a typedef, `IDENTIFIER' otherwise.
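As a rough illustration of that lookup, here is a minimal C sketch. It is not taken from GNU C: the table, the function names `declare_typedef' and `classify_identifier', and the token codes 258 and 259 are all assumptions made for the example, and a real compiler would consult its scoped symbol table instead of a flat array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; a real parser gets them from Bison.  */
enum token_type { IDENTIFIER = 258, TYPENAME = 259 };

#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int n_typedefs;

/* Record NAME as a typedef name.  Return 0 on success, 1 if the
   table is full.  NAME must outlive the table.  */
static int
declare_typedef (const char *name)
{
  assert (name != NULL);
  if (n_typedefs == MAX_TYPEDEFS)
    return 1;
  typedef_names[n_typedefs++] = name;
  return 0;
}

/* Decide which token type the lexer should return for NAME:
   TYPENAME if it is currently declared as a typedef,
   IDENTIFIER otherwise.  */
static enum token_type
classify_identifier (const char *name)
{
  int i;

  assert (name != NULL);
  for (i = 0; i < n_typedefs; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```

The action for a typedef declaration would call something like `declare_typedef', and `yylex' would call `classify_identifier' each time it has scanned an identifier.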
- The grammar rules can then express the context dependency by the
- choice of token type to recognize. `IDENTIFIER' is accepted as an
- expression, but `TYPENAME' is not. `TYPENAME' can start a declaration,
- but `IDENTIFIER' cannot. In contexts where the meaning of the
- identifier is *not* significant, such as in declarations that can
- shadow a typedef name, either `TYPENAME' or `IDENTIFIER' is
- accepted--there is one rule for each of the two token types.
- This technique is simple to use if the decision of which kinds of
- identifiers to allow is made at a place close to where the identifier is
- parsed. But in C this is not always so: C allows a declaration to
- redeclare a typedef name provided an explicit type has been specified
- earlier:
- typedef int foo, bar, lose;
- static foo (bar); /* redeclare `bar' as static variable */
- static int foo (lose); /* redeclare `foo' as function */
- Unfortunately, the name being declared is separated from the
- declaration construct itself by a complicated syntactic structure--the
- "declarator".
- As a result, part of the Bison parser for C needs to be duplicated,
- with all the nonterminal names changed: once for parsing a declaration
- in which a typedef name can be redefined, and once for parsing a
- declaration in which that can't be done. Here is a part of the
- duplication, with actions omitted for brevity:
- initdcl:
- declarator maybeasm '='
- init
- | declarator maybeasm
- ;
-
- notype_initdcl:
- notype_declarator maybeasm '='
- init
- | notype_declarator maybeasm
- ;
- Here `initdcl' can redeclare a typedef name, but `notype_initdcl'
- cannot. The distinction between `declarator' and `notype_declarator'
- is the same sort of thing.
- There is some similarity between this technique and a lexical tie-in
- (described next), in that information which alters the lexical analysis
- is changed during parsing by other parts of the program. The
- difference is that here the information is global, and is used for other
- purposes in the program. A true lexical tie-in has a special-purpose
- flag controlled by the syntactic context.
- File: bison.info, Node: Lexical Tie-ins, Next: Tie-in Recovery, Prev: Semantic Tokens, Up: Context Dependency
- Lexical Tie-ins
- ===============
- One way to handle context-dependency is the "lexical tie-in": a flag
- which is set by Bison actions, whose purpose is to alter the way tokens
- are parsed.
- For example, suppose we have a language vaguely like C, but with a
- special construct `hex (HEX-EXPR)'. After the keyword `hex' comes an
- expression in parentheses in which all integers are hexadecimal. In
- particular, the token `a1b' must be treated as an integer rather than
- as an identifier if it appears in that context. Here is how you can do
- it:
- %{
- int hexflag;
- %}
- %%
- ...
- expr: IDENTIFIER
- | constant
- | HEX '('
- { hexflag = 1; }
- expr ')'
- { hexflag = 0;
- $$ = $4; }
- | expr '+' expr
- { $$ = make_sum ($1, $3); }
- ...
- ;
-
- constant:
- INTEGER
- | STRING
- ;
- Here we assume that `yylex' looks at the value of `hexflag'; when it is
- nonzero, all integers are parsed in hexadecimal, and tokens starting
- with letters are parsed as integers if possible.
- The declaration of `hexflag' shown in the C declarations section of
- the parser file is needed to make it accessible to the actions (*note
- The C Declarations Section: C Declarations.). You must also write the
- code in `yylex' to obey the flag.
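The part of `yylex' that obeys the flag might look like the following sketch. It assumes the lexer has already isolated one word of input; `classify_word' and the token codes 258 and 259 are hypothetical, and only `hexflag', `IDENTIFIER' and `INTEGER' come from the example above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; a real parser gets them from Bison.  */
enum token_type { IDENTIFIER = 258, INTEGER = 259 };

int hexflag;   /* set and cleared by the grammar actions */

/* Classify the already-scanned word TEXT.  When the result is
   INTEGER, the numeric value is stored in *VALUE.  */
static int
classify_word (const char *text, long *value)
{
  char *end;

  assert (text != NULL && value != NULL);
  if (hexflag)
    {
      /* Inside `hex (...)': a word made only of hexadecimal digits
         is an integer, even if it starts with a letter.  */
      if (isxdigit ((unsigned char) text[0]))
        {
          *value = strtol (text, &end, 16);
          if (*end == '\0')
            return INTEGER;
        }
    }
  else if (isdigit ((unsigned char) text[0]))
    {
      /* Outside: only decimal digit strings are integers.  */
      *value = strtol (text, &end, 10);
      if (*end == '\0')
        return INTEGER;
    }

  /* Anything else is treated as an identifier in this sketch; a real
     lexer would reject malformed words such as `12ab'.  */
  return IDENTIFIER;
}
```

With `hexflag' zero the word `a1b' is classified as `IDENTIFIER'; with `hexflag' nonzero the same word is classified as `INTEGER'.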
- File: bison.info, Node: Tie-in Recovery, Prev: Lexical Tie-ins, Up: Context Dependency
- Lexical Tie-ins and Error Recovery
- ==================================
- Lexical tie-ins make strict demands on any error recovery rules you
- have. *Note Error Recovery::.
- The reason for this is that the purpose of an error recovery rule is
- to abort the parsing of one construct and resume in some larger
- construct. For example, in C-like languages, a typical error recovery
- rule is to skip tokens until the next semicolon, and then start a new
- statement, like this:
- stmt: expr ';'
- | IF '(' expr ')' stmt { ... }
- ...
- | error ';'
- { hexflag = 0; }
- ;
- If there is a syntax error in the middle of a `hex (EXPR)'
- construct, this error rule will apply, and then the action for the
- completed `hex (EXPR)' will never run. So `hexflag' would remain set
- for the entire rest of the input, or until the next `hex' keyword,
- causing identifiers to be misinterpreted as integers.
- To avoid this problem the error recovery rule itself clears
- `hexflag'.
- There may also be an error recovery rule that works within
- expressions. For example, there could be a rule which applies within
- parentheses and skips to the close-parenthesis:
- expr: ...
- | '(' expr ')'
- { $$ = $2; }
- | '(' error ')'
- ...
- If this rule acts within the `hex' construct, it is not going to
- abort that construct (since it applies to an inner level of parentheses
- within the construct). Therefore, it should not clear the flag: the
- rest of the `hex' construct should be parsed with the flag still in
- effect.
- What if there is an error recovery rule which might abort out of the
- `hex' construct or might not, depending on circumstances? There is no
- way you can write the action to determine whether a `hex' construct is
- being aborted or not. So if you are using a lexical tie-in, you had
- better make sure your error recovery rules are not of this kind. Each
- rule must be such that you can be sure that it always will, or always
- won't, have to clear the flag.
- File: bison.info, Node: Debugging, Next: Invocation, Prev: Context Dependency, Up: Top
- Debugging Your Parser
- *********************
- If a Bison grammar compiles properly but doesn't do what you want
- when it runs, the `yydebug' parser-trace feature can help you figure
- out why.
- To enable compilation of trace facilities, you must define the macro
- `YYDEBUG' when you compile the parser. You could use `-DYYDEBUG=1' as
- a compiler option or you could put `#define YYDEBUG 1' in the C
- declarations section of the grammar file (*note The C Declarations
- Section: C Declarations.). Alternatively, use the `-t' option when you
- run Bison (*note Invoking Bison: Invocation.). We always define
- `YYDEBUG' so that debugging is always possible.
- The trace facility uses `stderr', so you must add
- `#include <stdio.h>' to the C declarations section unless it is already
- there.
- Once you have compiled the program with trace facilities, the way to
- request a trace is to store a nonzero value in the variable `yydebug'.
- You can do this by making the C code do it (in `main', perhaps), or you
- can alter the value with a C debugger.
- Each step taken by the parser when `yydebug' is nonzero produces a
- line or two of trace information, written on `stderr'. The trace
- messages tell you these things:
- * Each time the parser calls `yylex', what kind of token was read.
- * Each time a token is shifted, the depth and complete contents of
- the state stack (*note Parser States::.).
- * Each time a rule is reduced, which rule it is, and the complete
- contents of the state stack afterward.
- To make sense of this information, it helps to refer to the listing
- file produced by the Bison `-v' option (*note Invoking Bison:
- Invocation.). This file shows the meaning of each state in terms of
- positions in various rules, and also what each state will do with each
- possible input token. As you read the successive trace messages, you
- can see that the parser is functioning according to its specification
- in the listing file. Eventually you will arrive at the place where
- something undesirable happens, and you will see which parts of the
- grammar are to blame.
- The parser file is a C program and you can use C debuggers on it,
- but it's not easy to interpret what it is doing. The parser function
- is a finite-state machine interpreter, and aside from the actions it
- executes the same code over and over. Only the values of variables
- show where in the grammar it is working.
- The debugging information normally gives the token type of each token
- read, but not its semantic value. You can optionally define a macro
- named `YYPRINT' to provide a way to print the value. If you define
- `YYPRINT', it should take three arguments. The parser will pass a
- standard I/O stream, the numeric code for the token type, and the token
- value (from `yylval').
- Here is an example of `YYPRINT' suitable for the multi-function
- calculator (*note Declarations for `mfcalc': Mfcalc Decl.):
- #define YYPRINT(file, type, value) yyprint (file, type, value)
-
- /* Print the semantic value of a VAR or NUM token after its type.  */
- static void
- yyprint (FILE *file, int type, YYSTYPE value)
- {
-   if (type == VAR)
-     fprintf (file, " %s", value.tptr->name);
-   else if (type == NUM)
-     fprintf (file, " %g", value.val);   /* `val' is a double in mfcalc */
- }
- File: bison.info, Node: Invocation, Next: Table of Symbols, Prev: Debugging, Up: Top
- Invoking Bison
- **************
- The usual way to invoke Bison is as follows:
- bison INFILE
- Here INFILE is the grammar file name, which usually ends in `.y'.
- The parser file's name is made by replacing the `.y' with `.tab.c'.
- Thus, `bison foo.y' yields `foo.tab.c', and `bison hack/foo.y' yields
- `hack/foo.tab.c'.
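The naming rule can be expressed as a short C sketch. This is not Bison's own code: `derive_name' is a hypothetical helper, and its behavior for an input name that does not end in `.y' (the suffix is simply appended) is an assumption, since the text above only covers the `.y' case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build in OUT (of size OUTSIZE) the name obtained from INFILE by
   replacing a trailing ".y" with SUFFIX.  Return 0 on success,
   1 if the result does not fit in OUT.  */
static int
derive_name (const char *infile, const char *suffix,
             char *out, size_t outsize)
{
  size_t len;
  int n;

  assert (infile != NULL && suffix != NULL && out != NULL);
  len = strlen (infile);
  if (len >= 2 && strcmp (infile + len - 2, ".y") == 0)
    len -= 2;                          /* drop the ".y" */

  /* Copy the first LEN characters of INFILE, then the suffix.  */
  n = snprintf (out, outsize, "%.*s%s", (int) len, infile, suffix);
  return n < 0 || (size_t) n >= outsize;
}
```

Calling it with the suffix `.tab.c' gives the parser file name; the same helper with `.output' gives the name of the listing file written by the `-v' option.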
- * Menu:
- * Bison Options:: All the options described in detail,
- in alphabetical order by short options.
- * Option Cross Key:: Alphabetical list of long options.
- * VMS Invocation:: Bison command syntax on VMS.
- File: bison.info, Node: Bison Options, Next: Option Cross Key, Up: Invocation
- Bison Options
- =============
- Bison supports both traditional single-letter options and mnemonic
- long option names. Long option names are indicated with `--' instead of
- `-'. Abbreviations for option names are allowed as long as they are
- unique. When a long option takes an argument, like `--file-prefix',
- connect the option name and the argument with `='.
- Here is a list of options that can be used with Bison, alphabetized
- by short option. It is followed by a cross key alphabetized by long
- option.
- `-b FILE-PREFIX'
- `--file-prefix=PREFIX'
- Specify a prefix to use for all Bison output file names. The
- names are chosen as if the input file were named `PREFIX.c'.
- `-d'
- `--defines'
- Write an extra output file containing macro definitions for the
- token type names defined in the grammar and the semantic value type
- `YYSTYPE', as well as a few `extern' variable declarations.
- If the parser output file is named `NAME.c' then this file is
- named `NAME.h'.
- This output file is essential if you wish to put the definition of
- `yylex' in a separate source file, because `yylex' needs to be
- able to refer to token type codes and the variable `yylval'.
- *Note Semantic Values of Tokens: Token Values.
- `-k'
- `--token-table'
- This switch causes the .tab.c output to include a list of token
- names in order by their token numbers; this is defined in the
- array `yytname'. The first three elements are `"$"', `"error"',
- and `"$illegal"'; entries for single- and multiple-character
- symbols include their quotes: `"\'+\'"' and `"\"<=\""'. Also
- generated are #defines for `YYNTOKENS', `YYNNTS', `YYNRULES', and
- `YYNSTATES' giving, respectively, one more than the highest token
- number, the number of non-terminal symbols, the number of grammar
- rules, and the number of states.
- `-l'
- `--no-lines'
- Don't put any `#line' preprocessor commands in the parser file.
- Ordinarily Bison puts them in the parser file so that the C
- compiler and debuggers will associate errors with your source
- file, the grammar file. This option causes them to associate
- errors with the parser file, treating it as an independent source
- file in its own right.
- `-n'
- `--no-parser'
- Do not generate the parser code into the output; generate only
- declarations. The generated `y.tab.c' file will have only
- constant declarations. In addition, a FILENAME.act file is
- generated, holding a switch statement body with all the
- translated actions. (The FILENAME is taken from the input file or
- set in accordance with the -o switch.) The declarations in the
- FILENAME`.tab.c' file will all be static or #defined. Some
- symbols only appear if appropriate options are selected. These
- are #defined: YYLTYPE, YYFINAL, YYFLAG, YYNTBASE, YYTRANSLATE,
- YYLAST, YYNTOKENS, YYNNTS, YYNRULES, YYNSTATES, YYMAXUTOK. These
- are declared and given values: yyltype, yytranslate, yyprhs,
- yyrhs, yystos, yyrline, yytname, yytoknum, yyr1, yyr2, yydefact,
- yydefgoto, yypact, yypgoto, yytable, and yycheck. See the source
- file output.c for definitions of these variables.
- `-o OUTFILE'
- `--output-file=OUTFILE'
- Specify the name OUTFILE for the parser file.
- The other output files' names are constructed from OUTFILE as
- described under the `-v' and `-d' options.
- `-p PREFIX'
- `--name-prefix=PREFIX'
- Rename the external symbols used in the parser so that they start
- with PREFIX instead of `yy'. The precise list of symbols renamed
- is `yyparse', `yylex', `yyerror', `yylval', `yychar' and `yydebug'.
- For example, if you use `-p c', the names become `cparse', `clex',
- and so on.
- *Note Multiple Parsers in the Same Program: Multiple Parsers.
- `-r'
- `--raw'
- In the output to `NAME.h' the tokens are usually defined with Yacc
- compatible token numbers. If this switch is specified, the Bison
- assigned numbers are output instead. (Yacc numbers start at 257
- except for single character tokens; Bison assigns token numbers
- sequentially for all tokens starting at 3.)
- `-t'
- `--debug'
- Output a definition of the macro `YYDEBUG' into the parser file,
- so that the debugging facilities are compiled. *Note Debugging
- Your Parser: Debugging.
- `-v'
- `--verbose'
- Write an extra output file containing verbose descriptions of the
- parser states and what is done for each type of look-ahead token in
- that state.
- This file also describes all the conflicts, both those resolved by
- operator precedence and the unresolved ones.
- The file's name is made by removing `.tab.c' or `.c' from the
- parser output file name, and adding `.output' instead.
- Therefore, if the input file is `foo.y', then the parser file is
- called `foo.tab.c' by default. As a consequence, the verbose
- output file is called `foo.output'.
- `-V'
- `--version'
- Print the version number of Bison and exit.
- `-h'
- `--help'
- Print a summary of the command-line options to Bison and exit.
- `-y'
- `--yacc'
- `--fixed-output-files'
- Equivalent to `-o y.tab.c'; the parser output file is called
- `y.tab.c', and the other outputs are called `y.output' and
- `y.tab.h'. The purpose of this switch is to imitate Yacc's output
- file name conventions. Thus, the following shell script can
- substitute for Yacc:
- bison -y $*
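- The file-naming rules given under `-d', `-v' and `-y' can be
- sketched in C. This is only an illustration of the rules as stated
- above: the helper names `ends_with', `verbose_name' and
- `header_name' are hypothetical and are not part of Bison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `s` ends with `suffix`, else 0. */
static int ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Hypothetical helper: name of the `-v' report for a parser file.
   Removes `.tab.c' or `.c' from the name, then adds `.output'. */
static void verbose_name(const char *parser, char *out, size_t outsz)
{
    size_t stem;

    assert(parser != NULL && out != NULL && outsz > 0);
    stem = strlen(parser);
    if (ends_with(parser, ".tab.c"))
        stem -= strlen(".tab.c");
    else if (ends_with(parser, ".c"))
        stem -= strlen(".c");
    snprintf(out, outsz, "%.*s.output", (int)stem, parser);
}

/* Hypothetical helper: name of the `-d' header, NAME.c -> NAME.h. */
static void header_name(const char *parser, char *out, size_t outsz)
{
    size_t stem;

    assert(parser != NULL && out != NULL && outsz > 0);
    stem = strlen(parser);
    if (ends_with(parser, ".c"))
        stem -= strlen(".c");
    snprintf(out, outsz, "%.*s.h", (int)stem, parser);
}
```

- With the parser file `y.tab.c' selected by `-y', these helpers
- produce `y.output' and `y.tab.h', matching the names listed above.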
- File: bison.info, Node: Option Cross Key, Next: VMS Invocation, Prev: Bison Options, Up: Invocation
- Option Cross Key
- ================
- Here is a list of options, alphabetized by long option, to help you
- find the corresponding short option.
- --debug                       -t
- --defines                     -d
- --file-prefix=PREFIX          -b FILE-PREFIX
- --fixed-output-files --yacc   -y
- --help                        -h
- --name-prefix=PREFIX          -p PREFIX
- --no-lines                    -l
- --no-parser                   -n
- --output-file=OUTFILE         -o OUTFILE
- --raw                         -r
- --token-table                 -k
- --verbose                     -v
- --version                     -V
- File: bison.info, Node: VMS Invocation, Prev: Option Cross Key, Up: Invocation
- Invoking Bison under VMS
- ========================
- The command line syntax for Bison on VMS is a variant of the usual
- Bison command syntax--adapted to fit VMS conventions.
- To find the VMS equivalent for any Bison option, start with the long
- option, and substitute a `/' for the leading `--', and substitute a `_'
- for each `-' in the name of the long option. For example, the
- following invocation under VMS:
- bison /debug/name_prefix=bar foo.y
- is equivalent to the following command under POSIX:
- bison --debug --name-prefix=bar foo.y
- The VMS file system does not permit filenames such as `foo.tab.c'.
- In the above example, the output file would instead be named
- `foo_tab.c'.
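- The option rewriting rule above is mechanical, so it can be
- sketched as a small C function. The name `vms_option' is
- hypothetical, and leaving the text after `=' untouched is an
- assumption: the rule only mentions the name of the long option.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite one long option such as
   "--name-prefix=bar" into the VMS spelling "/name_prefix=bar".
   Returns 0 on success, -1 if `opt` is not a long option or if
   `out` is too small to hold the result. */
static int vms_option(const char *opt, char *out, size_t outsz)
{
    size_t j = 0;
    int in_name = 1;          /* still inside the option name? */
    const char *p;

    assert(opt != NULL && out != NULL);
    if (strncmp(opt, "--", 2) != 0)
        return -1;
    if (strlen(opt + 2) + 2 > outsz)   /* '/' + rest + NUL */
        return -1;

    out[j++] = '/';                    /* `/' replaces the leading `--' */
    for (p = opt + 2; *p != '\0'; p++) {
        if (*p == '=')
            in_name = 0;               /* assumption: value is copied as-is */
        out[j++] = (in_name && *p == '-') ? '_' : *p;
    }
    out[j] = '\0';
    return 0;
}
```

- Applied to `--debug' and `--name-prefix=bar', this yields the
- `/debug' and `/name_prefix=bar' of the example invocation.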
- File: bison.info, Node: Table of Symbols, Next: Parser Symbols, Prev: Invocation, Up: Top
- Bison Symbols
- *************
- `error'
- A token name reserved for error recovery. This token may be used
- in grammar rules so as to allow the Bison parser to recognize an
- error in the grammar without halting the process. In effect, a
- sentence containing an error may be recognized as valid. On a
- parse error, the token `error' becomes the current look-ahead
- token. Actions corresponding to `error' are then executed, and
- the look-ahead token is reset to the token that originally caused
- the violation. *Note Error Recovery::.
- `YYABORT'
- Macro to pretend that an unrecoverable syntax error has occurred,
- by making `yyparse' return 1 immediately. The error reporting
- function `yyerror' is not called. *Note The Parser Function
- `yyparse': Parser Function.
- `YYACCEPT'
- Macro to pretend that a complete utterance of the language has been
- read, by making `yyparse' return 0 immediately. *Note The Parser
- Function `yyparse': Parser Function.
- `YYBACKUP'
- Macro to discard a value from the parser stack and fake a
- look-ahead token. *Note Special Features for Use in Actions:
- Action Features.
- `YYERROR'
- Macro to pretend that a syntax error has just been detected: call
- `yyerror' and then perform normal error recovery if possible
- (*note Error Recovery::.), or (if recovery is impossible) make
- `yyparse' return 1. *Note Error Recovery::.
- `YYERROR_VERBOSE'
- Macro that you define with `#define' in the Bison declarations
- section to request verbose, specific error message strings when
- `yyerror' is called.
- `YYINITDEPTH'
- Macro for specifying the initial size of the parser stack. *Note
- Stack Overflow::.
- `YYLTYPE'
- Macro for the data type of `yylloc'; a structure with four
- members. *Note Textual Positions of Tokens: Token Positions.
- `yyltype'
- Default value for YYLTYPE.
- `YYMAXDEPTH'
- Macro for specifying the maximum size of the parser stack. *Note
- Stack Overflow::.
- `YYRECOVERING'
- Macro whose value indicates whether the parser is recovering from a
- syntax error. *Note Special Features for Use in Actions: Action
- Features.
- `YYSTYPE'
- Macro for the data type of semantic values; `int' by default.
- *Note Data Types of Semantic Values: Value Type.
- `yychar'
- External integer variable that contains the integer value of the
- current look-ahead token. (In a pure parser, it is a local
- variable within `yyparse'.) Error-recovery rule actions may
- examine this variable. *Note Special Features for Use in Actions:
- Action Features.
- `yyclearin'
- Macro used in error-recovery rule actions. It clears the previous
- look-ahead token. *Note Error Recovery::.
- `yydebug'
- External integer variable set to zero by default. If `yydebug' is
- given a nonzero value, the parser will output information on input
- symbols and parser action. *Note Debugging Your Parser: Debugging.
- `yyerrok'
- Macro to cause the parser to recover immediately to its normal mode
- after a parse error. *Note Error Recovery::.
- `yyerror'
- User-supplied function to be called by `yyparse' on error. The
- function receives one argument, a pointer to a character string
- containing an error message. *Note The Error Reporting Function
- `yyerror': Error Reporting.
- `yylex'
- User-supplied lexical analyzer function, called with no arguments
- to get the next token. *Note The Lexical Analyzer Function
- `yylex': Lexical.
- `yylval'
- External variable in which `yylex' should place the semantic value
- associated with a token. (In a pure parser, it is a local
- variable within `yyparse', and its address is passed to `yylex'.)
- *Note Semantic Values of Tokens: Token Values.
- `yylloc'
- External variable in which `yylex' should place the line and
- column numbers associated with a token. (In a pure parser, it is a
- local variable within `yyparse', and its address is passed to
- `yylex'.) You can ignore this variable if you don't use the `@'
- feature in the grammar actions. *Note Textual Positions of
- Tokens: Token Positions.
- `yynerrs'
- Global variable which Bison increments each time there is a parse
- error. (In a pure parser, it is a local variable within
- `yyparse'.) *Note The Error Reporting Function `yyerror': Error
- Reporting.
- `yyparse'
- The parser function produced by Bison; call this function to start
- parsing. *Note The Parser Function `yyparse': Parser Function.
- `%left'
- Bison declaration to assign left associativity to token(s). *Note
- Operator Precedence: Precedence Decl.
- `%nonassoc'
- Bison declaration to assign nonassociativity to token(s). *Note
- Operator Precedence: Precedence Decl.
- `%prec'
- Bison declaration to assign a precedence to a specific rule.
- *Note Context-Dependent Precedence: Contextual Precedence.
- `%pure_parser'
- Bison declaration to request a pure (reentrant) parser. *Note A
- Pure (Reentrant) Parser: Pure Decl.
- `%right'
- Bison declaration to assign right associativity to token(s).
- *Note Operator Precedence: Precedence Decl.
- `%start'
- Bison declaration to specify the start symbol. *Note The
- Start-Symbol: Start Decl.
- `%token'
- Bison declaration to declare token(s) without specifying
- precedence. *Note Token Type Names: Token Decl.
- `%type'
- Bison declaration to declare nonterminals. *Note Nonterminal
- Symbols: Type Decl.
- `%union'
- Bison declaration to specify several possible data types for
- semantic values. *Note The Collection of Value Types: Union Decl.
- These are the punctuation and delimiters used in Bison input:
- `%%'
- Delimiter used to separate the grammar rule section from the Bison
- declarations section or the additional C code section. *Note The
- Overall Layout of a Bison Grammar: Grammar Layout.
- `%{ %}'
- All code listed between `%{' and `%}' is copied directly to the
- output file uninterpreted. Such code forms the "C declarations"
- section of the input file. *Note Outline of a Bison Grammar:
- Grammar Outline.
- `/*...*/'
- Comment delimiters, as in C.
- `:'
- Separates a rule's result from its components. *Note Syntax of
- Grammar Rules: Rules.
- `;'
- Terminates a rule. *Note Syntax of Grammar Rules: Rules.
- `|'
- Separates alternate rules for the same result nonterminal. *Note
- Syntax of Grammar Rules: Rules.
- File: bison.info, Node: Parser Symbols, Next: Glossary, Prev: Table of Symbols, Up: Top
- Parser Symbols
- ==============
- Each symbol (either token or variable) receives a symbol number.
- Numbers 0 to ntokens-1 are for tokens, and ntokens to nsyms-1 are for
- variables. Symbol number zero is the end-of-input token. This token
- is counted in ntokens.
- The rules receive rule numbers 1 to nrules in the order they are
- written. Actions and guards are accessed via the rule number.
- The rules themselves are described by three arrays: rrhs, rlhs and
- ritem. rlhs[R] is the symbol number of the left hand side of rule R.
- The right hand side is stored as symbol numbers in a portion of ritem.
- rrhs[R] contains the index in ritem of the beginning of the portion for
- rule R.
- If rlhs[R] is -1, the rule has been thrown out by reduce.c and
- should be ignored.
- The length of the portion is one greater than the number of symbols
- in the rule's right hand side. The last element in the portion
- contains minus R, which identifies it as the end of a portion and says
- which rule it is for.
- The portions of ritem come in order of increasing rule number and are
- followed by an element which is zero to mark the end.
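- The layout of rrhs, rlhs and ritem can be illustrated with a short
- C sketch. The two-rule grammar, the array contents and the helper
- names `rule_length' and `count_rules' are made up for this example
- and are not taken from the Bison sources.

```c
#include <assert.h>

/* Hypothetical grammar, symbol numbers chosen for illustration:
     0 = end of input, 1 = NUM, 2 = '+'   (ntokens = 3)
     3 = expr                             (nsyms   = 4)
   Rule 1: expr -> expr '+' NUM
   Rule 2: expr -> NUM                                        */
static const short example_ritem[] = { 3, 2, 1, -1,   /* rule 1, ends with -1 */
                                       1, -2,         /* rule 2, ends with -2 */
                                       0 };           /* end of all portions  */
static const short example_rrhs[] = { 0, 0, 4 };      /* entry 0 is unused    */
static const short example_rlhs[] = { 0, 3, 3 };      /* entry 0 is unused    */

/* Number of right-hand-side symbols of rule r: walk the portion
   that starts at rrhs[r] until the negative terminator is found. */
static int rule_length(const short *ritem, const short *rrhs, int r)
{
    const short *p = ritem + rrhs[r];
    int n = 0;

    while (*p > 0) {
        n++;
        p++;
    }
    assert(*p == -r);   /* the terminator names the rule it closes */
    return n;
}

/* Number of rules stored in ritem: one negative entry per rule,
   with a zero element marking the end of the vector. */
static int count_rules(const short *ritem)
{
    const short *p;
    int n = 0;

    for (p = ritem; *p != 0; p++)
        if (*p < 0)
            n++;
    return n;
}
```

- Each portion is one element longer than the value `rule_length'
- returns, since the terminator minus R is stored with the rule.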
- These symbols are #defined:
- YYFINAL = the state number of the termination state.
- YYFLAG = most negative short int. Used to flag ??
- YYNTBASE = ntokens
- YYTRANSLATE = macro to translate token number from yacc to bison
- YYLAST = index of highest entry in yytable and yycheck
- YYNTOKENS = number of terminal symbols (same as YYNTBASE)
- YYNNTS = number of nonterminals
- YYNRULES = number of rules in the grammar
- YYNSTATES = number of states in parser
- YYMAXUTOK = highest user token number
- YYTRANSLATE(y) == if y <= YYMAXUTOK then yytranslate[y] else
- YYNTOKENS+YYNNTS
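- The YYTRANSLATE rule can be written out as a C macro over a small
- table. Everything concrete here is an assumption made for the
- example: the grammar with the tokens '+' and NUM, the constant
- values, the helpers `init_translate' and `translate', and the
- choice to map unused codes to symbol 2. The unsigned cast, which
- sends negative codes to the `else' branch, is also an addition.

```c
#include <assert.h>

/* Hypothetical values for a grammar with the tokens '+' and NUM. */
enum { YYMAXUTOK = 257, YYNTOKENS = 5, YYNNTS = 1 };

static unsigned char yytranslate[YYMAXUTOK + 1];

/* Fill the table: index = yacc/yylex token number,
   value = bison token number. */
static void init_translate(void)
{
    int i;

    for (i = 0; i <= YYMAXUTOK; i++)
        yytranslate[i] = 2;   /* assumption: unused codes are illegal */
    yytranslate[0]   = 0;     /* end of input                 */
    yytranslate[256] = 1;     /* error                        */
    yytranslate['+'] = 3;     /* single-character token       */
    yytranslate[257] = 4;     /* NUM, first named user token  */
}

/* if y <= YYMAXUTOK then yytranslate[y] else YYNTOKENS+YYNNTS */
#define YYTRANSLATE(y) \
    ((unsigned)(y) <= YYMAXUTOK ? yytranslate[y] : YYNTOKENS + YYNNTS)

/* Function form of the macro, with a range check on the result. */
static int translate(int y)
{
    int sym = YYTRANSLATE(y);

    assert(sym >= 0 && sym <= YYNTOKENS + YYNNTS);
    return sym;
}
```

- After `init_translate' has run, `translate(257)' gives the bison
- number 4 for NUM, and any code above YYMAXUTOK gives
- YYNTOKENS+YYNNTS.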
- The parser tables consist of the following tables. Starred ones
- are needed only for the semantic parser. Double-starred ones are
- output only if switches are set.
- yytranslate = vector mapping yylex's token numbers into bison's
- token numbers. The token numbers differ because (a) yacc/yylex
- utilize the first 256 token numbers to refer to individual
- characters and (b) the yacc grammar allows explicit assignment
- of token numbers. Bison token numbers are assigned
- sequentially from 0 to YYNTOKENS-1.
- ** yytname = vector of string-names indexed by bison token number
- (range 0...YYNTOKENS+YYNNTS, the last entry is ""). User
- token names are in (3...YYNTOKENS-1) and may be:
- identifier - all letters; an identifier represents either a
- reserved word or a token class
- character literal - 'x' (the apostrophes appear in the string)
- string literal - "xxx" (the quotes appear in the string)
- Non-terminal symbol names are in (YYNTOKENS...YYNTOKENS+YYNNTS-1).
- ** yytoknum = vector of yacc/yylex token numbers corresponding to
- entries in yytname (range 0...YYNTOKENS+YYNNTS, the last entry
- is 0)
- yyrline = vector of line-numbers of all rules. For yydebug
- printouts. (range 0...YYNRULES, the first entry is 0)
- ** yyrhs = vector of items of all rules. This is exactly
- what ritem contains. For yydebug and for semantic parser.
- ** yyprhs[r] = index in yyrhs of first item for rule r.
- (range 0...YYNRULES, the first entry is 0)
- yyr1[r] = symbol number of symbol that rule r derives.
- (range 0...YYNRULES, the first entry is 0)
- yyr2[r] = number of symbols composing right hand side of rule r.
- (range 0...YYNRULES, the first entry is 0)
- * yystos[s] = the symbol number of the symbol that leads to state s.
- (range 0...YYNSTATES-1)
- yydefact[s] = default rule to reduce with in state s,
- when yytable doesn't specify something else to do. Zero means
- the default is an error. (range 0...YYNSTATES-1)
- yydefgoto[i] = default state to go to after a reduction of a rule
- that generates variable ntokens + i, except when yytable
- specifies something else to do.
- yypact[s] = index in yytable of the portion describing state s.
- (range 0...YYNSTATES-1). The lookahead token's
- type is used to index that portion to find out what to do.
- If the value in yytable is positive, we shift the
- token and go to that state.
- If the value is negative, it is minus a rule number to reduce by.
- If the value is zero, the default action from yydefact[s] is
- used.
- yypgoto[i] = the index in yytable of the portion describing
- what to do after reducing a rule that derives variable i + ntokens.
- This portion is indexed by the parser state number as it
- was before the text for this nonterminal was read. The value from
- yytable is the state to go to.
- yytable = a vector filled with portions for different uses,
- found via yypact and yypgoto. (range 0...YYLAST)
- yycheck = a vector indexed in parallel with yytable. It
- indicates, in a roundabout way, the bounds of the portion you are
- trying to examine.
- Suppose that the portion of yytable starts at index p
- and the index to be examined within the portion is i. Then if
- yycheck[p+i] != i, i is outside the bounds of what is
- actually allocated, and the default (from yydefact or yydefgoto)
- should be used. Otherwise, yytable[p+i] should be used. (range
- 0...YYLAST)
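- The way yypact, yytable, yycheck and yydefact work together can be
- sketched as one lookup function. The tables below are hand-made
- to exercise each case and do not come from a real grammar. The
- names `lookup_action', `struct action' and the ACT_ constants are
- hypothetical, and the explicit bounds test on the index is an
- assumption added for safety.

```c
#include <assert.h>

enum { YYLAST = 4, YYNSTATES = 3 };

/* Hand-made tables; token types are 0, 1 and 2. */
static const short yypact[YYNSTATES]   = { 0, 1, 2 };
static const short yydefact[YYNSTATES] = { 0, 1, 0 };
static const short yytable[YYLAST + 1] = {  0, 1,  0, -2, 0 };
static const short yycheck[YYLAST + 1] = { -1, 1, -1,  2, 2 };

enum action_kind { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

struct action {
    enum action_kind kind;
    int arg;               /* new state for a shift, rule for a reduce */
};

/* Decide what to do in `state` when the look-ahead has type `token`. */
static struct action lookup_action(int state, int token)
{
    struct action a;
    int idx, v = 0;

    assert(state >= 0 && state < YYNSTATES);
    idx = yypact[state] + token;        /* p + i in the text above */

    /* Use yytable only if yycheck confirms the entry belongs to
       this token; otherwise v stays 0 and the default applies. */
    if (idx >= 0 && idx <= YYLAST && yycheck[idx] == token)
        v = yytable[idx];

    if (v > 0) {                        /* positive: shift, go to state v */
        a.kind = ACT_SHIFT;
        a.arg = v;
    } else if (v < 0) {                 /* negative: reduce by rule -v */
        a.kind = ACT_REDUCE;
        a.arg = -v;
    } else if (yydefact[state] != 0) {  /* zero: default reduction */
        a.kind = ACT_REDUCE;
        a.arg = yydefact[state];
    } else {                            /* default of zero: error */
        a.kind = ACT_ERROR;
        a.arg = 0;
    }
    return a;
}
```

- In these tables, state 0 with token 1 shifts to state 1, state 1
- with token 2 reduces by rule 2, and state 1 with token 0 falls
- back to the default reduction by rule 1.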
|