1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061071081091101111121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212222232242252262272282292302312322332342352362372382392402412422432442452462472482492502512522532542552562572582592602612622632642652662672682692702712722732742752762772782792802812822832842852862872882892902912922932942952962972982993003013023033043053063073083093103113123133143153163173183193203213223233243253263273283293303313323333343353363373383393403413423433443453463473483493503513523533543553563573583593603613623633643653663673683693703713723733743753763773783793803813823833843853863873883893903913923933943953963973983994004014024034044054064074084094104114124134144154164174184194204214224234244254264274284294304314324334344354364374384394404414424434444454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545555565575585595605615625635645655665675685695705715725735745755765775785795805815825835845855865875885895905915925935945955965975985996006016026036046056066076086096106116126136146156166176186196206216226236246256266276286296306316326336346356366376386396406416426436446456466476486496506516526536546556566576586596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126212631264126512661267126812691270127112721273127412751276127712781279128012811282128312841285128612871288128912901291129212931294129512961297129812991300130113021303 |
- Info file bison.info, produced by Makeinfo, -*- Text -*- from input
- file bison.texinfo.
- This file documents the Bison parser generator.
- Copyright (C) 1988, 1989 Free Software Foundation, Inc.
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided also
- that the sections entitled ``GNU General Public License'' and
- ``Conditions for Using Bison'' are included exactly as in the
- original, and provided that the entire resulting derived work is
- distributed under the terms of a permission notice identical to this
- one.
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that the sections entitled ``GNU General Public
- License'', ``Conditions for Using Bison'' and this permission notice
- may be included in translations approved by the Free Software
- Foundation instead of in the original English.
- File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File
- Multiple Parsers in the Same Program
- ====================================
- Most programs that use Bison parse only one language and therefore
- contain only one Bison parser. But what if you want to parse more
- than one language with the same program? Here is what you must do:
- * Make each parser a pure parser (*note Pure Decl::.). This gets
- rid of global variables such as `yylval' which would otherwise
- conflict between the various parsers, but it requires an
- alternate calling convention for `yylex' (*note Pure Calling::.).
- * In each grammar file, define `yyparse' as a macro, expanding
- into the name you want for that parser. Put this definition in
- the C declarations section (*note C Declarations::.). For
- example:
- %{
- #define yyparse parse_algol
- %}
- Then use the expanded name `parse_algol' in other source files
- to call this parser.
- * If you want different lexical analyzers for each grammar, you
- can define `yylex' as a macro, just like `yyparse'. Use the
- expanded name when you define `yylex' in another source file.
- If you define `yylex' in the grammar file itself, simply make it
- static, like this:
- %{
- static int yylex ();
- %}
- %%
- ... GRAMMAR RULES ...
- %%
- static int
- yylex (yylvalp, yyllocp)
- YYSTYPE *yylvalp;
- YYLTYPE *yyllocp;
- { ... }
- * If you want a different `yyerror' function for each grammar, you
- can use the same methods that work for `yylex'.
- File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top
- Parser C-Language Interface
- ***************************
- The Bison parser is actually a C function named `yyparse'. Here we
- describe the interface conventions of `yyparse' and the other
- functions that it needs to use.
- Keep in mind that the parser uses many C identifiers starting with
- `yy' and `YY' for internal purposes. If you use such an identifier
- (aside from those in this manual) in an action or in additional C
- code in the grammar file, you are likely to run into trouble.
- * Menu:
- * Parser Function:: How to call `yyparse' and what it returns.
- * Lexical:: You must supply a function `yylex' which reads tokens.
- * Error Reporting:: You must supply a function `yyerror'.
- * Action Features:: Special features for use in actions.
-
- File: bison.info, Node: Parser Function, Next: Lexical, Prev: Interface, Up: Interface
- The Parser Function `yyparse'
- =============================
- You call the function `yyparse' to cause parsing to occur. This
- function reads tokens, executes actions, and ultimately returns when
- it encounters end-of-input or an unrecoverable syntax error. You can
- also write an action which directs `yyparse' to return immediately
- without reading further.
- The value returned by `yyparse' is 0 if parsing was successful
- (return is due to end-of-input).
- The value is 1 if parsing failed (return is due to a syntax error).
- In an action, you can cause immediate return from `yyparse' by using
- these macros:
- `YYACCEPT'
- Return immediately with value 0 (to report success).
- `YYABORT'
- Return immediately with value 1 (to report failure).
- File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Function, Up: Interface
- The Lexical Analyzer Function `yylex'
- =====================================
- The "lexical analyzer" function, `yylex', recognizes tokens from the
- input stream and returns them to the parser. Bison does not create
- this function automatically; you must write it so that `yyparse' can
- call it. The function is sometimes referred to as a lexical scanner.
- In simple programs, `yylex' is often defined at the end of the Bison
- grammar file. If `yylex' is defined in a separate source file, you
- need to arrange for the token-type macro definitions to be available
- there. To do this, use the `-d' option when you run Bison, so that
- it will write these macro definitions into a separate header file
- `NAME.tab.h' which you can include in the other source files that
- need it. *Note Invocation::.
- * Menu:
- * Calling Convention:: How `yyparse' calls `yylex'.
- * Token Values:: How `yylex' must return the semantic value
- of the token it has read.
- * Token Positions:: How `yylex' must return the text position
- (line number, etc.) of the token, if the
- actions want that.
- * Pure Calling:: How the calling convention differs
- in a pure parser (*note Pure Decl::.).
-
- File: bison.info, Node: Calling Convention, Next: Token Values, Prev: Lexical, Up: Lexical
- Calling Convention for `yylex'
- ------------------------------
- The value that `yylex' returns must be the numeric code for the type
- of token it has just found, or 0 for end-of-input.
- When a token is referred to in the grammar rules by a name, that name
- in the parser file becomes a C macro whose definition is the proper
- numeric code for that token type. So `yylex' can use the name to
- indicate that type. *Note Symbols::.
- When a token is referred to in the grammar rules by a character
- literal, the numeric code for that character is also the code for the
- token type. So `yylex' can simply return that character code. The
- null character must not be used this way, because its code is zero
- and that is what signifies end-of-input.
- Here is an example showing these things:
- yylex()
- {
- ...
- if (c == EOF) /* Detect end of file. */
- return 0;
- ...
- if (c == '+' || c == '-')
- return c; /* Assume token type for `+' is '+'. */
- ...
- return INT; /* Return the type of the token. */
- ...
- }
- This interface has been designed so that the output from the `lex'
- utility can be used without change as the definition of `yylex'.
- File: bison.info, Node: Token Values, Next: Token Positions, Prev: Calling Convention, Up: Lexical
- Returning Semantic Values of Tokens
- -----------------------------------
- In an ordinary (nonreentrant) parser, the semantic value of the token
- must be stored into the global variable `yylval'. When you are using
- just one data type for semantic values, `yylval' has that type.
- Thus, if the type is `int' (the default), you might write this in
- `yylex':
- ...
- yylval = value; /* Put value onto Bison stack. */
- return INT; /* Return the type of the token. */
- ...
- When you are using multiple data types, `yylval''s type is a union
- made from the `%union' declaration (*note Union Decl::.). So when
- you store a token's value, you must use the proper member of the union.
- If the `%union' declaration looks like this:
- %union {
- int intval;
- double val;
- symrec *tptr;
- }
- then the code in `yylex' might look like this:
- ...
- yylval.intval = value; /* Put value onto Bison stack. */
- return INT; /* Return the type of the token. */
- ...
-
- File: bison.info, Node: Token Positions, Next: Pure Calling, Prev: Token Values, Up: Lexical
- Reporting Textual Positions of Tokens
- -------------------------------------
- If you are using the `@N'-feature (*note Action Features::.) in
- actions to keep track of the textual locations of tokens and
- groupings, then you must provide this information in `yylex'. The
- function `yyparse' expects to find the textual location of a token
- just parsed in the global variable `yylloc'. So `yylex' must store
- the proper data in that variable. The value of `yylloc' is a
- structure and you need only initialize the members that are going to
- be used by the actions. The four members are called `first_line',
- `first_column', `last_line' and `last_column'. Note that the use of
- this feature makes the parser noticeably slower.
- The data type of `yylloc' has the name `YYLTYPE'.
- File: bison.info, Node: Pure Calling, Prev: Token Positions, Up: Lexical
- Calling Convention for Pure Parsers
- -----------------------------------
- When you use the Bison declaration `%pure_parser' to request a pure,
- reentrant parser, the global communication variables `yylval' and
- `yylloc' cannot be used. (*Note Pure Decl::.) In such parsers the
- two global variables are replaced by pointers passed as arguments to
- `yylex'. You must declare them as shown here, and pass the
- information back by storing it through those pointers.
- yylex (lvalp, llocp)
- YYSTYPE *lvalp;
- YYLTYPE *llocp;
- {
- ...
- *lvalp = value; /* Put value onto Bison stack. */
- return INT; /* Return the type of the token. */
- ...
- }
- File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface
- The Error Reporting Function `yyerror'
- ======================================
- The Bison parser detects a "parse error" or "syntax error" whenever
- it reads a token which cannot satisfy any syntax rule. A action in
- the grammar can also explicitly proclaim an error, using the macro
- `YYERROR' (*note Action Features::.).
- The Bison parser expects to report the error by calling an error
- reporting function named `yyerror', which you must supply. It is
- called by `yyparse' whenever a syntax error is found, and it receives
- one argument. For a parse error, the string is always `"parse error"'.
- The parser can detect one other kind of error: stack overflow. This
- happens when the input contains constructions that are very deeply
- nested. It isn't likely you will encounter this, since the Bison
- parser extends its stack automatically up to a very large limit. But
- if overflow happens, `yyparse' calls `yyerror' in the usual fashion,
- except that the argument string is `"parser stack overflow"'.
- The following definition suffices in simple programs:
- yyerror (s)
- char *s;
- {
- fprintf (stderr, "%s\n", s);
- }
- After `yyerror' returns to `yyparse', the latter will attempt error
- recovery if you have written suitable error recovery grammar rules
- (*note Error Recovery::.). If recovery is impossible, `yyparse' will
- immediately return 1.
- The variable `yynerrs' contains the number of syntax errors
- encountered so far. Normally this variable is global; but if you
- request a pure parser (*note Pure Decl::.) then it is a local
- variable which only the actions can access.
- File: bison.info, Node: Action Features, Prev: Error Reporting, Up: Interface
- Special Features for Use in Actions
- ===================================
- Here is a table of Bison constructs, variables and macros that are
- useful in actions.
- `$$'
- Acts like a variable that contains the semantic value for the
- grouping made by the current rule. *Note Actions::.
- `$N'
- Acts like a variable that contains the semantic value for the
- Nth component of the current rule. *Note Actions::.
- `$<TYPEALT>$'
- Like `$$' but specifies alternative TYPEALT in the union
- specified by the `%union' declaration. *Note Action Types::.
- `$<TYPEALT>N'
- Like `$N' but specifies alternative TYPEALT in the union
- specified by the `%union' declaration. *Note Action Types::.
- `YYABORT;'
- Return immediately from `yyparse', indicating failure. *Note
- Parser Function::.
- `YYACCEPT;'
- Return immediately from `yyparse', indicating success. *Note
- Parser Function::.
- `YYEMPTY'
- Value stored in `yychar' when there is no look-ahead token.
- `YYERROR;'
- Cause an immediate syntax error. This causes `yyerror' to be
- called, and then error recovery begins. *Note Error Recovery::.
- `yychar'
- Variable containing the current look-ahead token. (In a pure
- parser, this is actually a local variable within `yyparse'.)
- When there is no look-ahead token, the value `YYERROR' is stored
- here. *Note Look-Ahead::.
- `yyclearin;'
- Discard the current look-ahead token. This is useful primarily
- in error rules. *Note Error Recovery::.
- `yyerrok;'
- Resume generating error messages immediately for subsequent
- syntax errors. This is useful primarily in error rules. *Note
- Error Recovery::.
- `@N'
- Acts like a structure variable containing information on the
- line numbers and column numbers of the Nth component of the
- current rule. The structure has four members, like this:
- struct {
- int first_line, last_line;
- int first_column, last_column;
- };
- Thus, to get the starting line number of the third component,
- use `@3.first_line'.
- In order for the members of this structure to contain valid
- information, you must make `yylex' supply this information about
- each token. If you need only certain members, then `yylex' need
- only fill in those members.
- The use of this feature makes the parser noticeably slower.
- File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top
- The Algorithm of the Bison Parser
- *********************************
- As Bison reads tokens, it pushes them onto a stack along with their
- semantic values. The stack is called the "parser stack". Pushing a
- token is traditionally called "shifting".
- For example, suppose the infix calculator has read `1 + 5 *', with a
- `3' to come. The stack will have four elements, one for each token
- that was shifted.
- But the stack does not always have an element for each token read.
- When the last N tokens and groupings shifted match the components of
- a grammar rule, they can be combined according to that rule. This is
- called "reduction". Those tokens and groupings are replaced on the
- stack by a single grouping whose symbol is the result (left hand
- side) of that rule. Running the rule's action is part of the process
- of reduction, because this is what computes the semantic value of the
- resulting grouping.
- For example, if the infix calculator's parser stack contains this:
- 1 + 5 * 3
- and the next input token is a newline character, then the last three
- elements can be reduced to 15 via the rule:
- expr: expr '*' expr;
- Then the stack contains just these three elements:
- 1 + 15
- At this point, another reduction can be made, resulting in the single
- value 16. Then the newline token can be shifted.
- The parser tries, by shifts and reductions, to reduce the entire
- input down to a single grouping whose symbol is the grammar's
- start-symbol (*note Language and Grammar::.).
- This kind of parser is known in the literature as a bottom-up parser.
- * Menu:
- * Look-Ahead:: Parser looks one token ahead when deciding what to do.
- * Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
- * Precedence:: Operator precedence works by resolving conflicts.
- * Contextual Precedence:: When an operator's precedence depends on context.
- * Parser States:: The parser is a finite-state-machine with stack.
- * Reduce/Reduce:: When two rules are applicable in the same situation.
-
- File: bison.info, Node: Look-Ahead, Next: Shift/Reduce, Prev: Algorithm, Up: Algorithm
- Look-Ahead Tokens
- =================
- The Bison parser does *not* always reduce immediately as soon as the
- last N tokens and groupings match a rule. This is because such a
- simple strategy is inadequate to handle most languages. Instead,
- when a reduction is possible, the parser sometimes ``looks ahead'' at
- the next token in order to decide what to do.
- When a token is read, it is not immediately shifted; first it becomes
- the "look-ahead token", which is not on the stack. Now the parser
- can perform one or more reductions of tokens and groupings on the
- stack, while the look-ahead token remains off to the side. When no
- more reductions should take place, the look-ahead token is shifted
- onto the stack. This does not mean that all possible reductions have
- been done; depending on the token type of the look-ahead token, some
- rules may choose to delay their application.
- Here is a simple case where look-ahead is needed. These three rules
- define expressions which contain binary addition operators and
- postfix unary factorial operators (`!'), and allow parentheses for
- grouping.
- expr: term '+' expr
- | term
- ;
-
- term: '(' expr ')'
- | term '!'
- | NUMBER
- ;
- Suppose that the tokens `1 + 2' have been read and shifted; what
- should be done? If the following token is `)', then the first three
- tokens must be reduced to form an `expr'. This is the only valid
- course, because shifting the `)' would produce a sequence of symbols
- `term ')'', and no rule allows this.
- If the following token is `!', then it must be shifted immediately so
- that `2 !' can be reduced to make a `term'. If instead the parser
- were to reduce before shifting, `1 + 2' would become an `expr'. It
- would then be impossible to shift the `!' because doing so would
- produce on the stack the sequence of symbols `expr '!''. No rule
- allows that sequence.
- The current look-ahead token is stored in the variable `yychar'.
- *Note Action Features::.
- File: bison.info, Node: Shift/Reduce, Next: Precedence, Prev: Look-Ahead, Up: Algorithm
- Shift/Reduce Conflicts
- ======================
- Suppose we are parsing a language which has if-then and if-then-else
- statements, with a pair of rules like this:
- if_stmt:
- IF expr THEN stmt
- | IF expr THEN stmt ELSE stmt
- ;
- (Here we assume that `IF', `THEN' and `ELSE' are terminal symbols for
- specific keyword tokens.)
- When the `ELSE' token is read and becomes the look-ahead token, the
- contents of the stack (assuming the input is valid) are just right
- for reduction by the first rule. But it is also legitimate to shift
- the `ELSE', because that would lead to eventual reduction by the
- second rule.
- This situation, where either a shift or a reduction would be valid,
- is called a "shift/reduce conflict". Bison is designed to resolve
- these conflicts by choosing to shift, unless otherwise directed by
- operator precedence declarations. To see the reason for this, let's
- contrast it with the other alternative.
- Since the parser prefers to shift the `ELSE', the result is to attach
- the else-clause to the innermost if-statement, making these two
- inputs equivalent:
- if x then if y then win(); else lose;
-
- if x then do; if y then win(); else lose; end;
- But if the parser chose to reduce when possible rather than shift,
- the result would be to attach the else-clause to the outermost
- if-statement, making these two inputs equivalent:
- if x then if y then win(); else lose;
-
- if x then do; if y then win(); end; else lose;
- The conflict exists because the grammar as written is ambiguous:
- either parsing of the simple nested if-statement is legitimate. The
- established convention is that these ambiguities are resolved by
- attaching the else-clause to the innermost if-statement; this is what
- Bison accomplishes by choosing to shift rather than reduce. (It
- would ideally be cleaner to write an unambiguous grammar, but that is
- very hard to do in this case.) This particular ambiguity was first
- encountered in the specifications of Algol 60 and is called the
- ``dangling `else''' ambiguity.
- To avoid warnings from Bison about predictable, legitimate
- shift/reduce conflicts, use the `%expect N' declaration. There will
- be no warning as long as the number of shift/reduce conflicts is
- exactly N. *Note Expect Decl::.
- File: bison.info, Node: Precedence, Next: Contextual Precedence, Prev: Shift/Reduce, Up: Algorithm
- Operator Precedence
- ===================
- Another situation where shift/reduce conflicts appear is in
- arithmetic expressions. Here shifting is not always the preferred
- resolution; the Bison declarations for operator precedence allow you
- to specify when to shift and when to reduce.
- * Menu:
- * Why Precedence:: An example showing why precedence is needed.
- * Using Precedence:: How to specify precedence in Bison grammars.
- * Precedence Examples:: How these features are used in the previous example.
- * How Precedence:: How they work.
-
- File: bison.info, Node: Why Precedence, Next: Using Precedence, Prev: Precedence, Up: Precedence
- When Precedence is Needed
- -------------------------
- Consider the following ambiguous grammar fragment (ambiguous because
- the input `1 - 2 * 3' can be parsed in two different ways):
- expr: expr '-' expr
- | expr '*' expr
- | expr '<' expr
- | '(' expr ')'
- ...
- ;
- Suppose the parser has seen the tokens `1', `-' and `2'; should it
- reduce them via the rule for the addition operator? It depends on
- the next token. Of course, if the next token is `)', we must reduce;
- shifting is invalid because no single rule can reduce the token
- sequence `- 2 )' or anything starting with that. But if the next
- token is `*' or `<', we have a choice: either shifting or reduction
- would allow the parse to complete, but with different results.
- To decide which one Bison should do, we must consider the results.
- If the next operator token OP is shifted, then it must be reduced
- first in order to permit another opportunity to reduce the sum. The
- result is (in effect) `1 - (2 OP 3)'. On the other hand, if the
- subtraction is reduced before shifting OP, the result is
- `(1 - 2) OP 3'. Clearly, then, the choice of shift or reduce
- should depend on the relative precedence of the operators `-' and OP:
- `*' should be shifted first, but not `<'.
- What about input like `1 - 2 - 5'; should this be `(1 - 2) - 5' or
- `1 - (2 - 5)'? For most operators we prefer the former, which is
- called "left association". The latter alternative, "right
- association", is desirable for assignment operators. The choice of
- left or right association is a matter of whether the parser chooses
- to shift or reduce when the stack contains `1 - 2' and the look-ahead
- token is `-': shifting makes right-associativity.
- File: bison.info, Node: Using Precedence, Next: Precedence Examples, Prev: Why Precedence, Up: Precedence
- How to Specify Operator Precedence
- ----------------------------------
- Bison allows you to specify these choices with the operator
- precedence declarations `%left' and `%right'. Each such declaration
- contains a list of tokens, which are operators whose precedence and
- associativity is being declared. The `%left' declaration makes all
- those operators left-associative and the `%right' declaration makes
- them right-associative. A third alternative is `%nonassoc', which
- declares that it is a syntax error to find the same operator twice
- ``in a row''.
- The relative precedence of different operators is controlled by the
- order in which they are declared. The first `%left' or `%right'
- declaration declares the operators whose precedence is lowest, the
- next such declaration declares the operators whose precedence is a
- little higher, and so on.
- File: bison.info, Node: Precedence Examples, Next: How Precedence, Prev: Using Precedence, Up: Precedence
- Precedence Examples
- -------------------
- In our example, we would want the following declarations:
- %left '<'
- %left '-'
- %left '*'
- In a more complete example, which supports other operators as well,
- we would declare them in groups of equal precedence. For example,
- `'+'' is declared with `'-'':
- %left '<' '>' '=' NE LE GE
- %left '+' '-'
- %left '*' '/'
- (Here `NE' and so on stand for the operators for ``not equal'' and so
- on. We assume that these tokens are more than one character long and
- therefore are represented by names, not character literals.)
- File: bison.info, Node: How Precedence, Prev: Precedence Examples, Up: Precedence
- How Precedence Works
- --------------------
- The first effect of the precedence declarations is to assign
- precedence levels to the terminal symbols declared. The second
- effect is to assign precedence levels to certain rules: each rule
- gets its precedence from the last terminal symbol mentioned in the
- components. (You can also specify explicitly the precedence of a
- rule. *Note Contextual Precedence::.)
- Finally, the resolution of conflicts works by comparing the
- precedence of the rule being considered with that of the look-ahead
- token. If the token's precedence is higher, the choice is to shift.
- If the rule's precedence is higher, the choice is to reduce. If they
- have equal precedence, the choice is made based on the associativity
- of that precedence level. The verbose output file made by `-v'
- (*note Invocation::.) says how each conflict was resolved.
- Not all rules and not all tokens have precedence. If either the rule
- or the look-ahead token has no precedence, then the default is to
- shift.
- File: bison.info, Node: Contextual Precedence, Next: Parser States, Prev: Precedence, Up: Algorithm
- Operators with Context-Dependent Precedence
- ===========================================
- Often the precedence of an operator depends on the context. This
- sounds outlandish at first, but it is really very common. For
- example, a minus sign typically has a very high precedence as a unary
- operator, and a somewhat lower precedence (lower than multiplication)
- as a binary operator.
- The Bison precedence declarations, `%left', `%right' and `%nonassoc',
- can only be used once for a given token; so a token has only one
- precedence declared in this way. For context-dependent precedence,
- you need to use an additional mechanism: the `%prec' modifier for
- rules.
- The `%prec' modifier declares the precedence of a particular rule by
- specifying a terminal symbol whose predecence should be used for that
- rule. It's not necessary for that symbol to appear otherwise in the
- rule. The modifier's syntax is:
- %prec TERMINAL-SYMBOL
- and it is written after the components of the rule. Its effect is to
- assign the rule the precedence of TERMINAL-SYMBOL, overriding the
- precedence that would be deduced for it in the ordinary way. The
- altered rule precedence then affects how conflicts involving that
- rule are resolved (*note Precedence::.).
- Here is how `%prec' solves the problem of unary minus. First,
- declare a precedence for a fictitious terminal symbol named `UMINUS'.
- There are no tokens of this type, but the symbol serves to stand for
- its precedence:
- ...
- %left '+' '-'
- %left '*'
- %left UMINUS
- Now the precedence of `UMINUS' can be used in specific rules:
- exp: ...
- | exp '-' exp
- ...
- | '-' exp %prec UMINUS
- File: bison.info, Node: Parser States, Next: Reduce/Reduce, Prev: Contextual Precedence, Up: Algorithm
- Parser States
- =============
- The function `yyparse' is implemented using a finite-state machine.
- The values pushed on the parser stack are not simply token type
- codes; they represent the entire sequence of terminal and nonterminal
- symbols at or near the top of the stack. The current state collects
- all the information about previous input which is relevant to
- deciding what to do next.
- Each time a look-ahead token is read, the current parser state
- together with the type of look-ahead token are looked up in a table.
- This table entry can say, ``Shift the look-ahead token.'' In this
- case, it also specifies the new parser state, which is pushed onto
- the top of the parser stack. Or it can say, ``Reduce using rule
- number N.'' This means that a certain of tokens or groupings are
- taken off the top of the stack, and replaced by one grouping. In
- other words, that number of states are popped from the stack, and one
- new state is pushed.
- There is one other alternative: the table can say that the look-ahead
- token is erroneous in the current state. This causes error
- processing to begin (*note Error Recovery::.).
- File: bison.info, Node: Reduce/Reduce, Prev: Parser States, Up: Algorithm
- Reduce/Reduce conflicts
- =======================
- A reduce/reduce conflict occurs if there are two or more rules that
- apply to the same sequence of input. This usually indicates a
- serious error in the grammar.
- For example, here is an erroneous attempt to define a sequence of
- zero or more `word' groupings.
- sequence: /* empty */
- { printf ("empty sequence\n"); }
- | word
- { printf ("single word %s\n", $1); }
- | sequence word
- { printf ("added word %s\n", $2); }
- ;
- The error is an ambiguity: there is more than one way to parse a
- single `word' into a `sequence'. It could be reduced directly via
- the second rule. Alternatively, nothing-at-all could be reduced into
- a `sequence' via the first rule, and this could be combined with the
- `word' using the third rule.
- You might think that this is a distinction without a difference,
- because it does not change whether any particular input is valid or
- not. But it does affect which actions are run. One parsing order
- runs the second rule's action; the other runs the first rule's action
- and the third rule's action. In this example, the output of the
- program changes.
- Bison resolves a reduce/reduce conflict by choosing to use the rule
- that appears first in the grammar, but it is very risky to rely on
- this. Every reduce/reduce conflict must be studied and usually
- eliminated. Here is the proper way to define `sequence':
- sequence: /* empty */
- { printf ("empty sequence\n"); }
- | sequence word
- { printf ("added word %s\n", $2); }
- ;
- Here is another common error that yields a reduce/reduce conflict:
- sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
-
- words: /* empty */
- | words word
- ;
-
- redirects:/* empty */
- | redirects redirect
- ;
- The intention here is to define a sequence which can contain either
- `word' or `redirect' groupings. The individual definitions of
- `sequence', `words' and `redirects' are error-free, but the three
- together make a subtle ambiguity: even an empty input can be parsed
- in infinitely many ways!
- Consider: nothing-at-all could be a `words'. Or it could be two
- `words' in a row, or three, or any number. It could equally well be
- a `redirects', or two, or any number. Or it could be a `words'
- followed by three `redirects' and another `words'. And so on.
- Here are two ways to correct these rules. First, to make it a single
- level of sequence:
- sequence: /* empty */
- | sequence word
- | sequence redirect
- ;
- Second, to prevent either a `words' or a `redirects' from being empty:
- sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
-
- words: word
- | words word
- ;
-
- redirects:redirect
- | redirects redirect
- ;
- File: bison.info, Node: Error Recovery, Next: Context Dependency, Prev: Algorithm, Up: Top
- Error Recovery
- **************
- It is not usually acceptable to have the program terminate on a parse
- error. For example, a compiler should recover sufficiently to parse
- the rest of the input file and check it for errors; a calculator
- should accept another expression.
- In a simple interactive command parser where each input is one line,
- it may be sufficient to allow `yyparse' to return 1 on error and have
- the caller ignore the rest of the input line when that happens (and
- then call `yyparse' again). But this is inadequate for a compiler,
- because it forgets all the syntactic context leading up to the error.
- A syntax error deep within a function in the compiler input should
- not cause the compiler to treat the following line like the beginning
- of a source file.
- You can define how to recover from a syntax error by writing rules to
- recognize the special token `error'. This is a terminal symbol that
- is always defined (you need not declare it) and reserved for error
- handling. The Bison parser generates an `error' token whenever a
- syntax error happens; if you have provided a rule to recognize this
- token in the current context, the parse can continue. For example:
- stmnts: /* empty string */
- | stmnts '\n'
- | stmnts exp '\n'
- | stmnts error '\n'
- The fourth rule in this example says that an error followed by a
- newline makes a valid addition to any `stmnts'.
- What happens if a syntax error occurs in the middle of an `exp'? The
- error recovery rule, interpreted strictly, applies to the precise
- sequence of a `stmnts', an `error' and a newline. If an error occurs
- in the middle of an `exp', there will probably be some additional
- tokens and subexpressions on the stack after the last `stmnts', and
- there will be tokens to read before the next newline. So the rule is
- not applicable in the ordinary way.
- But Bison can force the situation to fit the rule, by discarding part
- of the semantic context and part of the input. First it discards
- states and objects from the stack until it gets back to a state in
- which the `error' token is acceptable. (This means that the
- subexpressions already parsed are discarded, back to the last
- complete `stmnts'.) At this point the `error' token can be shifted.
- Then, if the old look-ahead token is not acceptable to be shifted
- next, the parser reads tokens and discards them until it finds a
- token which is acceptable. In this example, Bison reads and discards
- input until the next newline so that the fourth rule can apply.
- The choice of error rules in the grammar is a choice of strategies
- for error recovery. A simple and useful strategy is simply to skip
- the rest of the current input line or current statement if an error
- is detected:
- stmnt: error ';' /* on error, skip until ';' is read */
- It is also useful to recover to the matching close-delimiter of an
- opening-delimiter that has already been parsed. Otherwise the
- close-delimiter will probably appear to be unmatched, and generate
- another, spurious error message:
- primary: '(' expr ')'
- | '(' error ')'
- ...
- ;
- Error recovery strategies are necessarily guesses. When they guess
- wrong, one syntax error often leads to another. In the above
- example, the error recovery rule guesses that an error is due to bad
- input within one `stmnt'. Suppose that instead a spurious semicolon
- is inserted in the middle of a valid `stmnt'. After the error
- recovery rule recovers from the first error, another syntax error
- will be found straightaway, since the text following the spurious
- semicolon is also an invalid `stmnt'.
- To prevent an outpouring of error messages, the parser will output no
- error message for another syntax error that happens shortly after the
- first; only after three consecutive input tokens have been
- successfully shifted will error messages resume.
- Note that rules which accept the `error' token may have actions, just
- as any other rules can.
- You can make error messages resume immediately by using the macro
- `yyerrok' in an action. If you do this in the error rule's action,
- no error messages will be suppressed. This macro requires no
- arguments; `yyerrok;' is a valid C statement.
- The previous look-ahead token is reanalyzed immediately after an
- error. If this is unacceptable, then the macro `yyclearin' may be
- used to clear this token. Write the statement `yyclearin;' in the
- error rule's action.
- For example, suppose that on a parse error, an error handling routine
- is called that advances the input stream to some point where parsing
- should once again commence. The next symbol returned by the lexical
- scanner is probably correct. The previous look-ahead token ought to
- be discarded with `yyclearin;'.
- File: bison.info, Node: Context Dependency, Next: Debugging, Prev: Error Recovery, Up: Top
- Handling Context Dependencies
- *****************************
- The Bison paradigm is to parse tokens first, then group them into
- larger syntactic units. In many languages, the meaning of a token is
- affected by its context. Although this violates the Bison paradigm,
- certain techniques (known as "kludges") may enable you to write Bison
- parsers for such languages.
- * Menu:
- * Semantic Tokens:: Token parsing can depend on the semantic context.
- * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
- * Tie-in Recovery:: Lexical tie-ins have implications for how
- error recovery rules must be written.
- (Actually, ``kludge'' means any technique that gets its job done but
- is neither clean nor robust.)
- File: bison.info, Node: Semantic Tokens, Next: Lexical Tie-ins, Prev: Context Dependency, Up: Context Dependency
- Semantic Info in Token Types
- ============================
- The C language has a context dependency: the way an identifier is
- used depends on what its current meaning is. For example, consider
- this:
- foo (x);
- This looks like a function call statement, but if `foo' is a typedef
- name, then this is actually a declaration of `x'. How can a Bison
- parser for C decide how to parse this input?
- The method used in GNU C is to have two different token types,
- `IDENTIFIER' and `TYPENAME'. When `yylex' finds an identifier, it
- looks up the current declaration of the identifier in order to decide
- which token type to return: `TYPENAME' if the identifier is declared
- as a typedef, `IDENTIFIER' otherwise.
- The grammar rules can then express the context dependency by the
- choice of token type to recognize. `IDENTIFIER' is accepted as an
- expression, but `TYPENAME' is not. `TYPENAME' can start a
- declaration, but `IDENTIFIER' cannot. In contexts where the meaning
- of the identifier is *not* significant, such as in declarations that
- can shadow a typedef name, either `TYPENAME' or `IDENTIFIER' is
- accepted--there is one rule for each of the two token types.
- This technique is simple to use if the decision of which kinds of
- identifiers to allow is made at a place close to where the identifier
- is parsed. But in C this is not always so: C allows a declaration to
- redeclare a typedef name provided an explicit type has been specified
- earlier:
- typedef int foo, bar, lose;
- static foo (bar); /* redeclare `bar' as static variable */
- static int foo (lose); /* redeclare `foo' as function */
- Unfortunately, the name being declared is separated from the
- declaration construct itself by a complicated syntactic
- structure--the ``declarator''.
- As a result, the part of Bison parser for C needs to be duplicated,
- with all the nonterminal names changed: once for parsing a
- declaration in which a typedef name can be redefined, and once for
- parsing a declaration in which that can't be done. Here is a part of
- the duplication, with actions omitted for brevity:
- initdcl:
- declarator maybeasm '='
- init
- | declarator maybeasm
- ;
-
- notype_initdcl:
- notype_declarator maybeasm '='
- init
- | notype_declarator maybeasm
- ;
- Here `initdcl' can redeclare a typedef name, but `notype_initdcl'
- cannot. The distinction between `declarator' and `notype_declarator'
- is the same sort of thing.
- There is some similarity between this technique and a lexical tie-in
- (described next), in that information which alters the lexical
- analysis is changed during parsing by other parts of the program.
- The difference is here the information is global, and is used for
- other purposes in the program. A true lexical tie-in has a
- special-purpose flag controlled by the syntactic context.
- File: bison.info, Node: Lexical Tie-ins, Next: Tie-in Recovery, Prev: Semantic Tokens, Up: Context Dependency
- Lexical Tie-ins
- ===============
- One way to handle context-dependency is the "lexical tie-in": a flag
- which is set by Bison actions, whose purpose is to alter the way
- tokens are parsed.
- For example, suppose we have a language vaguely like C, but with a
- special construct `hex (HEX-EXPR)'. After the keyword `hex' comes an
- expression in parentheses in which all integers are hexadecimal. In
- particular, the token `a1b' must be treated as an integer rather than
- as an identifier if it appears in that context. Here is how you can
- do it:
- %{
- int hexflag;
- %}
- %%
- ...
- expr: IDENTIFIER
- | constant
- | HEX '('
- { hexflag = 1; }
- expr ')'
- { hexflag = 0;
- $$ = $4; }
- | expr '+' expr
- { $$ = make_sum ($1, $3); }
- ...
- ;
-
- constant:
- INTEGER
- | STRING
- ;
- Here we assume that `yylex' looks at the value of `hexflag'; when it
- is nonzero, all integers are parsed in hexadecimal, and tokens
- starting with letters are parsed as integers if possible.
- The declaration of `hexflag' shown in the C declarations section of
- the parser file is needed to make it accessible to the actions (*note
- C Declarations::.). You must also write the code in `yylex' to obey
- the flag.
- File: bison.info, Node: Tie-in Recovery, Prev: Lexical Tie-ins, Up: Context Dependency
- Lexical Tie-ins and Error Recovery
- ==================================
- Lexical tie-ins make strict demands on any error recovery rules you
- have. *Note Error Recovery::.
- The reason for this is that the purpose of an error recovery rule is
- to abort the parsing of one construct and resume in some larger
- construct. For example, in C-like languages, a typical error
- recovery rule is to skip tokens until the next semicolon, and then
- start a new statement, like this:
- stmt: expr ';'
- | IF '(' expr ')' stmt { ... }
- ...
- error ';'
- { hexflag = 0; }
- ;
- If there is a syntax error in the middle of a `hex (EXPR)' construct,
- this error rule will apply, and then the action for the completed
- `hex (EXPR)' will never run. So `hexflag' would remain set for the
- entire rest of the input, or until the next `hex' keyword, causing
- identifiers to be misinterpreted as integers.
- To avoid this problem the error recovery rule itself clears `hexflag'.
- There may also be an error recovery rule that works within expressions.
- For example, there could be a rule which applies within parentheses
- and skips to the close-parenthesis:
- expr: ...
- | '(' expr ')'
- { $$ = $2; }
- | '(' error ')'
- ...
- If this rule acts within the `hex' construct, it is not going to
- abort that construct (since it applies to an inner level of
- parentheses within the construct). Therefore, it should not clear
- the flag: the rest of the `hex' construct should be parsed with the
- flag still in effect.
- What if there is an error recovery rule which might abort out of the
- `hex' construct or might not, depending on circumstances? There is
- no way you can write the action to determine whether a `hex'
- construct is being aborted or not. So if you are using a lexical
- tie-in, you had better make sure your error recovery rules are not of
- this kind. Each rule must be such that you can be sure that it
- always will, or always won't, have to clear the flag.
- File: bison.info, Node: Debugging, Next: Invocation, Prev: Context Dependency, Up: Top
- Debugging Your Parser
- *********************
- If a Bison grammar compiles properly but doesn't do what you want
- when it runs, the `yydebug' parser-trace feature can help you figure
- out why.
- To enable compilation of trace facilities, you must define the macro
- `YYDEBUG' when you compile the parser. You could use `-DYYDEBUG' as
- a compiler option or you could put `#define YYDEBUG' in the C
- declarations section of the grammar file (*note C Declarations::.).
- Alternatively, use the `-t' option when you run Bison (*note
- Invocation::.). I always define `YYDEBUG' so that debugging is
- always possible.
- The trace facility uses `stderr', so you must add
- `#include <stdio.h>' to the C declarations section unless it is
- already there.
- Once you have compiled the program with trace facilities, the way to
- request a trace is to store a nonzero value in the variable `yydebug'.
- You can do this by making the C code do it (in `main', perhaps), or
- you can alter the value with a C debugger.
- Each step taken by the parser when `yydebug' is nonzero produces a
- line or two of trace information, written on `stderr'. The trace
- messages tell you these things:
- * Each time the parser calls `yylex', what kind of token was read.
- * Each time a token is shifted, the depth and complete contents of
- the state stack (*note Parser States::.).
- * Each time a rule is reduced, which rule it is, and the complete
- contents of the state stack afterward.
- To make sense of this information, it helps to refer to the listing
- file produced by the Bison `-v' option (*note Invocation::.). This
- file shows the meaning of each state in terms of positions in various
- rules, and also what each state will do with each possible input
- token. As you read the successive trace messages, you can see that
- the parser is functioning according to its specification in the
- listing file. Eventually you will arrive at the place where
- something undesirable happens, and you will see which parts of the
- grammar are to blame.
- The parser file is a C program and you can use C debuggers on it, but
- it's not easy to interpret what it is doing. The parser function is
- a finite-state machine interpreter, and aside from the actions it
- executes the same code over and over. Only the values of variables
- show where in the grammar it is working.
- File: bison.info, Node: Invocation, Next: Table of Symbols, Prev: Debugging, Up: Top
- Invocation of Bison; Command Options
- ************************************
- The usual way to invoke Bison is as follows:
- bison INFILE
- Here INFILE is the grammar file name, which usually ends in `.y'.
- The parser file's name is made by replacing the `.y' with `.tab.c'.
- Thus, `bison foo.y' outputs `foo.tab.c', and `bison hack/foo.y'
- outputs `hack/foo.tab.c'.
- These options can be used with Bison:
- `-d'
- Write an extra output file containing macro definitions for the
- token type names defined in the grammar and the semantic value
- type `YYSTYPE', as well as a few `extern' variable declarations.
- If the parser output file is named `NAME.c' then this file is
- named `NAME.h'.
- This output file is essential if you wish to put the definition
- of `yylex' in a separate source file, because `yylex' needs to
- be able to refer to token type codes and the variable `yylval'.
- *Note Token Values::.
- `-l'
- Don't put any `#line' preprocessor commands in the parser file.
- Ordinarily Bison puts them in the parser file so that the C
- compiler and debuggers will associate errors with your source
- file, the grammar file. This option causes them to associate
- errors with the parser file, treating it an independent source
- file in its own right.
- `-o OUTFILE'
- Specify the name OUTFILE for the parser file.
- The other output files' names are constructed from OUTFILE as
- described under the `-v' and `-d' switches.
- `-t'
- Output a definition of the macro `YYDEBUG' into the parser file,
- so that the debugging facilities are compiled. *Note Debugging::.
- `-v'
- Write an extra output file containing verbose descriptions of
- the parser states and what is done for each type of look-ahead
- token in that state.
- This file also describes all the conflicts, both those resolved
- by operator precedence and the unresolved ones.
- The file's name is made by removing `.tab.c' or `.c' from the
- parser output file name, and adding `.output' instead.
- Therefore, if the input file is `foo.y', then the parser file is
- called `foo.tab.c' by default. As a consequence, the verbose
- output file is called `foo.output'.
- `-y'
- Equivalent to `-o y.tab.c'; the parser output file is called
- `y.tab.c', and the other outputs are called `y.output' and
- `y.tab.h'. The purpose of this switch is to imitate Yacc's
- output file name conventions.
|