12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268368468568668768868969069169269369469569669769869970070170270370470570670770870971071171271371471571671771871972072172272372472572672772872973073173273373473573673773873974074174274374474574674774874975075175275375475575675775875976076176276376476576676776876977077177277377477577677777877978078178278378478578678778878979079179279379479579679779879980080180280380480580680780880981081181281381481581681781881982082182282382482582682782882983083183283383483583683783883984084184284384484584684784884985085185285385485585685785885986086186286386486586686786886987087187287387487587687787887988088188288388488588688788888989089189289389489589689789889990090190290390490590690790890991091191291391491591691791891992092192292392492592692792892993093193293393493593693793893994094194294394494594694794894995095195295395495595695795895996096196296396496596696796896997097197297397497597697797897998098198298398498598698798898999099199299399499599699799899910001001100210031004100510061007100810091010101110121013101410151016101710181019102010211022102310241025102610271028102910301031103210331034103510361037103810391040104110421043104410451046104710481049105010511052105310541055105610571058105910601061106210631064106510661067106810691070107110721073107410751076107710781079108010811082108310841085108610871088108910901091109210931094109510961097109810991100110111021103110411051106110711081109111011111112111311141115111611171118111911201121112211231124112511261127112811291130113111321133113411351136113711381139114011411142114311441145114611471148114911501151115211531154115511561157115811591160116111621163116411651166116711681169117011711172117311741175117611771178117911801181118211831184118511861187118811891190119111921193119411951196119711981199120012011202120312041205120612071208120912101211121212131214121512161217121812191220122112221223122412251226122712281229123012311232123312341235123612371238123912401241124212431244124512461247124812491250125112521253125412551256125712581259126012611262126312641265126612671268126912701271127212731274127512761277127812791280128112821283128412851286128712881289129012911292129312941295129612971298129913001301130213031304130513061307130813091310131113121313131413151316131713181319132013211322132313241325132613271328132913301331133213331334133513361337133813391340134113421343134413451346134713481349135013511352135313541355135613571358135913601361136213631364136513661367136813691370137113721373137413751376137713781379138013811382138313841385 |
- Info file bison.info, produced by Makeinfo, -*- Text -*- from input
- file bison.texinfo.
- This file documents the Bison parser generator.
- Copyright (C) 1988, 1989 Free Software Foundation, Inc.
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided also
- that the sections entitled ``GNU General Public License'' and
- ``Conditions for Using Bison'' are included exactly as in the
- original, and provided that the entire resulting derived work is
- distributed under the terms of a permission notice identical to this
- one.
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that the sections entitled ``GNU General Public
- License'', ``Conditions for Using Bison'' and this permission notice
- may be included in translations approved by the Free Software
- Foundation instead of in the original English.
- File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples
- Infix Notation Calculator: `calc'
- =================================
- We now modify rpcalc to handle infix operators instead of postfix.
- Infix notation involves the concept of operator precedence and the
- need for parentheses nested to arbitrary depth. Here is the Bison
- code for `calc.y', an infix desk-top calculator.
- /* Infix notation calculator--calc */
-
- %{
- #define YYSTYPE double
- #include <math.h>
- %}
-
- %token NUM
- %left '-' '+'
- %left '*' '/'
- %left NEG /* negation--unary minus */
- %right '^' /* exponentiation */
-
- /* Grammar follows */
- %%
- input: /* empty string */
- | input line
- ;
-
- line: '\n'
- | exp '\n' { printf("\t%.10g\n", $1); }
- ;
-
- exp: NUM { $$ = $1; }
- | exp '+' exp { $$ = $1 + $3; }
- | exp '-' exp { $$ = $1 - $3; }
- | exp '*' exp { $$ = $1 * $3; }
- | exp '/' exp { $$ = $1 / $3; }
- | '-' exp %prec NEG { $$ = -$2; }
- | exp '^' exp { $$ = pow ($1, $3); }
- | '(' exp ')' { $$ = $2; }
- ;
- %%
- The functions `yylex', `yyerror' and `main' can be the same as before.
- There are two important new features shown in this code.
- In the second section (Bison declarations), `%left' declares token
- types and says they are left-associative operators. The declarations
- `%left' and `%right' (right associativity) take the place of `%token'
- which is used to declare a token type name without associativity.
- (These tokens are single-character literals, which ordinarily don't
- need to be declared. We declare them here to specify the
- associativity.)
- Operator precedence is determined by the line ordering of the
- declarations; the lower the declaration, the higher the precedence.
- Hence, exponentiation has the highest precedence, unary minus (`NEG')
- is next, followed by `*' and `/', and so on. *Note Precedence::.
- The other important new feature is the `%prec' in the grammar section
- for the unary minus operator. The `%prec' simply instructs Bison
- that the rule `| '-' exp' has the same precedence as `NEG'--in this
- case the next-to-highest. *Note Contextual Precedence::.
- Here is a sample run of `calc.y':
- % calc
- 4 + 4.5 - (34/(8*3+-3))
- 6.880952381
- -56 + 2
- -54
- 3 ^ 2
- 9
- File: bison.info, Node: Simple Error Recovery, Next: Multi-function Calc, Prev: Infix Calc, Up: Examples
- Simple Error Recovery
- =====================
- Up to this point, this manual has not addressed the issue of "error
- recovery"--how to continue parsing after the parser detects a syntax
- error. All we have handled is error reporting with `yyerror'.
- Recall that by default `yyparse' returns after calling `yyerror'.
- This means that an erroneous input line causes the calculator program
- to exit. Now we show how to rectify this deficiency.
- The Bison language itself includes the reserved word `error', which
- may be included in the grammar rules. In the example below it has
- been added to one of the alternatives for `line':
- line: '\n'
- | exp '\n' { printf("\t%.10g\n", $1); }
- | error '\n' { yyerrok; }
- ;
- This addition to the grammar allows for simple error recovery in the
- event of a parse error. If an expression that cannot be evaluated is
- read, the error will be recognized by the third rule for `line', and
- parsing will continue. (The `yyerror' function is still called upon
- to print its message as well.) The action executes the statement
- `yyerrok', a macro defined automatically by Bison; its meaning is
- that error recovery is complete (*note Error Recovery::.). Note the
- difference between `yyerrok' and `yyerror'; neither one is a misprint.
- This form of error recovery deals with syntax errors. There are
- other kinds of errors; for example, division by zero, which raises an
- exception signal that is normally fatal. A real calculator program
- must handle this signal and use `longjmp' to return to `main' and
- resume parsing input lines; it would also have to discard the rest of
- the current line of input. We won't discuss this issue further
- because it is not specific to Bison programs.
- File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Simple Error Recovery, Up: Examples
- Multi-Function Calculator: `mfcalc'
- ===================================
- Now that the basics of Bison have been discussed, it is time to move
- on to a more advanced problem. The above calculators provided only
- five functions, `+', `-', `*', `/' and `^'. It would be nice to have
- a calculator that provides other mathematical functions such as
- `sin', `cos', etc.
- It is easy to add new operators to the infix calculator as long as
- they are only single-character literals. The lexical analyzer
- `yylex' passes back all non-number characters as tokens, so new
- grammar rules suffice for adding a new operator. But we want
- something more flexible: built-in functions whose syntax has this form:
- FUNCTION_NAME (ARGUMENT)
- At the same time, we will add memory to the calculator, by allowing
- you to create named variables, store values in them, and use them
- later. Here is a sample session with the multi-function calculator:
- % acalc
- pi = 3.141592653589
- 3.1415926536
- sin(pi)
- 0.0000000000
- alpha = beta1 = 2.3
- 2.3000000000
- alpha
- 2.3000000000
- ln(alpha)
- 0.8329091229
- exp(ln(beta1))
- 2.3000000000
- %
- Note that multiple assignment and nested function calls are permitted.
- * Menu:
- * Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
- * Rules: Mfcalc Rules. Grammar rules for the calculator.
- * Symtab: Mfcalc Symtab. Symbol table management subroutines.
-
- File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Prev: Multi-function Calc, Up: Multi-function Calc
- Declarations for `mfcalc'
- -------------------------
- Here are the C and Bison declarations for the multi-function
- calculator.
- %{
- #include <math.h> /* For math functions, cos(), sin(), etc */
- #include "calc.h" /* Contains definition of `symrec' */
- %}
- %union {
- double val; /* For returning numbers. */
- symrec *tptr; /* For returning symbol-table pointers */
- }
-
- %token <val> NUM /* Simple double precision number */
- %token <tptr> VAR FNCT /* Variable and Function */
- %type <val> exp
-
- %right '='
- %left '-' '+'
- %left '*' '/'
- %left NEG /* Negation--unary minus */
- %right '^' /* Exponentiation */
-
- /* Grammar follows */
-
- %%
- The above grammar introduces only two new features of the Bison
- language. These features allow semantic values to have various data
- types (*note Multiple Types::.).
- The `%union' declaration specifies the entire list of possible types;
- this is instead of defining `YYSTYPE'. The allowable types are now
- double-floats (for `exp' and `NUM') and pointers to entries in the
- symbol table. *Note Union Decl::.
- Since values can now have various types, it is necessary to associate
- a type with each grammar symbol whose semantic value is used. These
- symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations are
- augmented with information about their data type (placed between
- angle brackets).
- The Bison construct `%type' is used for declaring nonterminal
- symbols, just as `%token' is used for declaring token types. We have
- not used `%type' before because nonterminal symbols are normally
- declared implicitly by the rules that define them. But `exp' must be
- declared explicitly so we can specify its value type. *Note Type
- Decl::.
- File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc
- Grammar Rules for `mfcalc'
- --------------------------
- Here are the grammar rules for the multi-function calculator. Most
- of them are copied directly from `calc'; three rules, those which
- mention `VAR' or `FNCT', are new.
- input: /* empty */
- | input line
- ;
-
- line:
- '\n'
- | exp '\n' { printf ("\t%.10g\n", $1); }
- | error '\n' { yyerrok; }
- ;
-
- exp: NUM { $$ = $1; }
- | VAR { $$ = $1->value.var; }
- | VAR '=' exp { $$ = $3; $1->value.var = $3; }
- | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); }
- | exp '+' exp { $$ = $1 + $3; }
- | exp '-' exp { $$ = $1 - $3; }
- | exp '*' exp { $$ = $1 * $3; }
- | exp '/' exp { $$ = $1 / $3; }
- | '-' exp %prec NEG { $$ = -$2; }
- | exp '^' exp { $$ = pow ($1, $3); }
- | '(' exp ')' { $$ = $2; }
- ;
- /* End of grammar */
- %%
- File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc
- Managing the Symbol Table for `mfcalc'
- --------------------------------------
- The multi-function calculator requires a symbol table to keep track
- of the names and meanings of variables and functions. This doesn't
- affect the grammar rules (except for the actions) or the Bison
- declarations, but it requires some additional C functions for support.
- The symbol table itself consists of a linked list of records. Its
- definition, which is kept in the header `calc.h', is as follows. It
- provides for either functions or variables to be placed in the table.
- /* Data type for links in the chain of symbols. */
- struct symrec
- {
- char *name; /* name of symbol */
- int type; /* type of symbol: either VAR or FNCT */
- union {
- double var; /* value of a VAR */
- double (*fnctptr)(); /* value of a FNCT */
- } value;
- struct symrec *next; /* link field */
- };
-
- typedef struct symrec symrec;
-
- /* The symbol table: a chain of `struct symrec'. */
- extern symrec *sym_table;
-
- symrec *putsym ();
- symrec *getsym ();
- The new version of `main' includes a call to `init_table', a function
- that initializes the symbol table. Here it is, and `init_table' as
- well:
- #include <stdio.h>
-
- main()
- {
- init_table ();
- yyparse ();
- }
-
- yyerror (s) /* Called by yyparse on error */
- char *s;
- {
- printf ("%s\n", s);
- }
-
- struct init
- {
- char *fname;
- double (*fnct)();
- };
-
- struct init arith_fncts[]
- = {
- "sin", sin,
- "cos", cos,
- "atan", atan,
- "ln", log,
- "exp", exp,
- "sqrt", sqrt,
- 0, 0
- };
-
- /* The symbol table: a chain of `struct symrec'. */
- symrec *sym_table = (symrec *)0;
-
- init_table () /* puts arithmetic functions in table. */
- {
- int i;
- symrec *ptr;
- for (i = 0; arith_fncts[i].fname != 0; i++)
- {
- ptr = putsym (arith_fncts[i].fname, FNCT);
- ptr->value.fnctptr = arith_fncts[i].fnct;
- }
- }
- By simply editing the initialization list and adding the necessary
- include files, you can add additional functions to the calculator.
- Two important functions allow look-up and installation of symbols in
- the symbol table. The function `putsym' is passed a name and the
- type (`VAR' or `FNCT') of the object to be installed. The object is
- linked to the front of the list, and a pointer to the object is
- returned. The function `getsym' is passed the name of the symbol to
- look up. If found, a pointer to that symbol is returned; otherwise
- zero is returned.
- symrec *
- putsym (sym_name,sym_type)
- char *sym_name;
- int sym_type;
- {
- symrec *ptr;
- ptr = (symrec *) malloc (sizeof(symrec));
- ptr->name = (char *) malloc (strlen(sym_name)+1);
- strcpy (ptr->name,sym_name);
- ptr->type = sym_type;
- ptr->value.var = 0; /* set value to 0 even if fctn. */
- ptr->next = (struct symrec *)sym_table;
- sym_table = ptr;
- return ptr;
- }
-
- symrec *
- getsym (sym_name)
- char *sym_name;
- {
- symrec *ptr;
- for (ptr = sym_table; ptr != (symrec *) 0;
- ptr = (symrec *)ptr->next)
- if (strcmp (ptr->name,sym_name) == 0)
- return ptr;
- return 0;
- }
- The function `yylex' must now recognize variables, numeric values,
- and the single-character arithmetic operators. Strings of
- alphanumeric characters with a leading nondigit are recognized as
- either variables or functions depending on what the symbol table says
- about them.
- The string is passed to `getsym' for look up in the symbol table. If
- the name appears in the table, a pointer to its location and its type
- (`VAR' or `FNCT') is returned to `yyparse'. If it is not already in
- the table, then it is installed as a `VAR' using `putsym'. Again, a
- pointer and its type (which must be `VAR') is returned to `yyparse'.
- No change is needed in the handling of numeric values and arithmetic
- operators in `yylex'.
- #include <ctype.h>
- yylex()
- {
- int c;
-
- /* Ignore whitespace, get first nonwhite character. */
- while ((c = getchar ()) == ' ' || c == '\t');
-
- if (c == EOF)
- return 0;
-
- /* Char starts a number => parse the number. */
- if (c == '.' || isdigit (c))
- {
- ungetc (c, stdin);
- scanf ("%lf", &yylval.val);
- return NUM;
- }
-
- /* Char starts an identifier => read the name. */
- if (isalpha (c))
- {
- symrec *s;
- static char *symbuf = 0;
- static int length = 0;
- int i;
-
- /* Initially make the buffer long enough
- for a 40-character symbol name. */
- if (length == 0)
- length = 40, symbuf = (char *)malloc (length + 1);
-
- i = 0;
- do
- {
- /* If buffer is full, make it bigger. */
- if (i == length)
- {
- length *= 2;
- symbuf = (char *)realloc (symbuf, length + 1);
- }
- /* Add this character to the buffer. */
- symbuf[i++] = c;
- /* Get another character. */
- c = getchar ();
- }
- while (c != EOF && isalnum (c));
-
- ungetc (c, stdin);
- symbuf[i] = '\0';
-
- s = getsym (symbuf);
- if (s == 0)
- s = putsym (symbuf, VAR);
- yylval.tptr = s;
- return s->type;
- }
-
- /* Any other character is a token by itself. */
- return c;
- }
- This program is both powerful and flexible. You may easily add new
- functions, and it is a simple job to modify this code to install
- predefined variables such as `pi' or `e' as well.
- File: bison.info, Node: Exercises, Prev: Multi-function calc, Up: Examples
- Exercises
- =========
- 1. Add some new functions from `math.h' to the initialization list.
- 2. Add another array that contains constants and their values.
- Then modify `init_table' to add these constants to the symbol
- table. It will be easiest to give the constants type `VAR'.
- 3. Make the program report an error if the user refers to an
- uninitialized variable in any way except to store a value in it.
- File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top
- Bison Grammar Files
- *******************
- Bison takes as input a context-free grammar specification and
- produces a C-language function that recognizes correct instances of
- the grammar.
- The Bison grammar input file conventionally has a name ending in `.y'.
- * Menu:
- * Grammar Outline:: Overall layout of the grammar file.
- * Symbols:: Terminal and nonterminal symbols.
- * Rules:: How to write grammar rules.
- * Recursion:: Writing recursive rules.
- * Semantics:: Semantic values and actions.
- * Declarations:: All kinds of Bison declarations are described here.
- * Multiple Parsers:: Putting more than one Bison parser in one program.
-
- File: bison.info, Node: Grammar Outline, Next: Symbols, Prev: Grammar File, Up: Grammar File
- Outline of a Bison Grammar
- ==========================
- A Bison grammar file has four main sections, shown here with the
- appropriate delimiters:
- %{
- C DECLARATIONS
- %}
-
- BISON DECLARATIONS
-
- %%
- GRAMMAR RULES
- %%
-
- ADDITIONAL C CODE
- Comments enclosed in `/* ... */' may appear in any of the sections.
- * Menu:
- * C Declarations:: Syntax and usage of the C declarations section.
- * Bison Declarations:: Syntax and usage of the Bison declarations section.
- * Grammar Rules:: Syntax and usage of the grammar rules section.
- * C Code:: Syntax and usage of the additional C code section.
-
- File: bison.info, Node: C Declarations, Next: Bison Declarations, Prev: Grammar Outline, Up: Grammar Outline
- The C Declarations Section
- --------------------------
- The C DECLARATIONS section contains macro definitions and
- declarations of functions and variables that are used in the actions
- in the grammar rules. These are copied to the beginning of the
- parser file so that they precede the definition of `yylex'. You can
- use `#include' to get the declarations from a header file. If you
- don't need any C declarations, you may omit the `%{' and `%}'
- delimiters that bracket this section.
- File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: C Declarations, Up: Grammar Outline
- The Bison Declarations Section
- ------------------------------
- The BISON DECLARATIONS section contains declarations that define
- terminal and nonterminal symbols, specify precedence, and so on. In
- some simple grammars you may not need any declarations. *Note
- Declarations::.
- File: bison.info, Node: Grammar Rules, Next: C Code, Prev: Bison Declarations, Up: Grammar Outline
- The Grammar Rules Section
- -------------------------
- The "grammar rules" section contains one or more Bison grammar rules,
- and nothing else. *Note Rules::.
- There must always be at least one grammar rule, and the first `%%'
- (which precedes the grammar rules) may never be omitted even if it is
- the first thing in the file.
- File: bison.info, Node: C Code, Prev: Grammar Rules, Up: Grammar Outline
- The Additional C Code Section
- -----------------------------
- The ADDITIONAL C CODE section is copied verbatim to the end of the
- parser file, just as the C DECLARATIONS section is copied to the
- beginning. This is the most convenient place to put anything that
- you want to have in the parser file but which need not come before
- the definition of `yylex'. For example, the definitions of `yylex'
- and `yyerror' often go here. *Note Interface::.
- If the last section is empty, you may omit the `%%' that separates it
- from the grammar rules.
- The Bison parser itself contains many static variables whose names
- start with `yy' and many macros whose names start with `YY'. It is a
- good idea to avoid using any such names (except those documented in
- this manual) in the additional C code section of the grammar file.
- File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File
- Symbols, Terminal and Nonterminal
- =================================
- "Symbols" in Bison grammars represent the grammatical classifications
- of the language.
- A "terminal symbol" (also known as a "token type") represents a class
- of syntactically equivalent tokens. You use the symbol in grammar
- rules to mean that a token in that class is allowed. The symbol is
- represented in the Bison parser by a numeric code, and the `yylex'
- function returns a token type code to indicate what kind of token has
- been read. You don't need to know what the code value is; you can
- use the symbol to stand for it.
- A "nonterminal symbol" stands for a class of syntactically equivalent
- groupings. The symbol name is used in writing grammar rules. By
- convention, it should be all lower case.
- Symbol names can contain letters, digits (not at the beginning),
- underscores and periods. Periods make sense only in nonterminals.
- There are two ways of writing terminal symbols in the grammar:
- * A "named token type" is written with an identifier, like an
- identifier in C. By convention, it should be all upper case.
- Each such name must be defined with a Bison declaration such as
- `%token'. *Note Token Decl::.
- * A "character token type" (or "literal token") is written in the
- grammar using the same syntax used in C for character constants;
- for example, `'+'' is a character token type. A character token
- type doesn't need to be declared unless you need to specify its
- semantic value data type (*note Value Type::.), associativity,
- or precedence (*note Precedence::.).
- By convention, a character token type is used only to represent
- a token that consists of that particular character. Thus, the
- token type `'+'' is used to represent the character `+' as a
- token. Nothing enforces this convention, but if you depart from
- it, your program will confuse other readers.
- All the usual escape sequences used in character literals in C
- can be used in Bison as well, but you must not use the null
- character as a character literal because its ASCII code, zero,
- is the code `yylex' returns for end-of-input (*note Calling
- Convention::.).
- How you choose to write a terminal symbol has no effect on its
- grammatical meaning. That depends only on where it appears in rules
- and on when the parser function returns that symbol.
- The value returned by `yylex' is always one of the terminal symbols
- (or 0 for end-of-input). Whichever way you write the token type in
- the grammar rules, you write it the same way in the definition of
- `yylex'. The numeric code for a character token type is simply the
- ASCII code for the character, so `yylex' can use the identical
- character constant to generate the requisite code. Each named token
- type becomes a C macro in the parser file, so `yylex' can use the
- name to stand for the code. (This is why periods don't make sense in
- terminal symbols.) *Note Calling Convention::.
- If `yylex' is defined in a separate file, you need to arrange for the
- token-type macro definitions to be available there. Use the `-d'
- option when you run Bison, so that it will write these macro
- definitions into a separate header file `NAME.tab.h' which you can
- include in the other source files that need it. *Note Invocation::.
- The symbol `error' is a terminal symbol reserved for error recovery
- (*note Error Recovery::.); you shouldn't use it for any other purpose.
- In particular, `yylex' should never return this value.
- File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File
- Syntax of Grammar Rules
- =======================
- A Bison grammar rule has the following general form:
- RESULT: COMPONENTS...
- ;
- where RESULT is the nonterminal symbol that this rule describes and
- COMPONENTS are various terminal and nonterminal symbols that are put
- together by this rule (*note Symbols::.). For example,
- exp: exp '+' exp
- ;
- says that two groupings of type `exp', with a `+' token in between,
- can be combined into a larger grouping of type `exp'.
- Whitespace in rules is significant only to separate symbols. You can
- add extra whitespace as you wish.
- Scattered among the components can be ACTIONS that determine the
- semantics of the rule. An action looks like this:
- {C STATEMENTS}
- Usually there is only one action and it follows the components.
- *Note Actions::.
- Multiple rules for the same RESULT can be written separately or can
- be joined with the vertical-bar character `|' as follows:
- RESULT: RULE1-COMPONENTS...
- | RULE2-COMPONENTS...
- ...
- ;
- They are still considered distinct rules even when joined in this way.
- If COMPONENTS in a rule is empty, it means that RESULT can match the
- empty string. For example, here is how to define a comma-separated
- sequence of zero or more `exp' groupings:
- expseq: /* empty */
- | expseq1
- ;
-
- expseq1: exp
- | expseq1 ',' exp
- ;
- It is customary to write a comment `/* empty */' in each rule with no
- components.
- File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File
- Recursive Rules
- ===============
- A rule is called "recursive" when its RESULT nonterminal appears also
- on its right hand side. Nearly all Bison grammars need to use
- recursion, because that is the only way to define a sequence of any
- number of somethings. Consider this recursive definition of a
- comma-separated sequence of one or more expressions:
- expseq1: exp
- | expseq1 ',' exp
- ;
- Since the recursive use of `expseq1' is the leftmost symbol in the
- right hand side, we call this "left recursion". By contrast, here
- the same construct is defined using "right recursion":
- expseq1: exp
- | exp ',' expseq1
- ;
- Any kind of sequence can be defined using either left recursion or
- right recursion, but you should always use left recursion, because it
- can parse a sequence of any number of elements with bounded stack
- space. Right recursion uses up space on the Bison stack in
- proportion to the number of elements in the sequence, because all the
- elements must be shifted onto the stack before the rule can be
- applied even once. *Note Algorithm::, for further explanation of this.
- "Indirect" or "mutual" recursion occurs when the result of the rule
- does not appear directly on its right hand side, but does appear in
- rules for other nonterminals which do appear on its right hand side.
- For example:
- expr: primary
- | primary '+' primary
- ;
-
- primary: constant
- | '(' expr ')'
- ;
- defines two mutually-recursive nonterminals, since each refers to the
- other.
- File: bison.info, Node: Semantics, Next: Declarations, Prev: Recursion, Up: Grammar File
- The Semantics of the Language
- =============================
- The grammar rules for a language determine only the syntax. The
- semantics are determined by the semantic values associated with
- various tokens and groupings, and by the actions taken when various
- groupings are recognized.
- For example, the calculator calculates properly because the value
- associated with each expression is the proper number; it adds
- properly because the action for the grouping `X + Y' is to add the
- numbers associated with X and Y.
- * Menu:
- * Value Type:: Specifying one data type for all semantic values.
- * Multiple Types:: Specifying several alternative data types.
- * Actions:: An action is the semantic definition of a grammar rule.
- * Action Types:: Specifying data types for actions to operate on.
- * Mid-Rule Actions:: Most actions go at the end of a rule.
- This says when, why and how to use the exceptional
- action in the middle of a rule.
-
- File: bison.info, Node: Value Type, Next: Multiple Types, Prev: Semantics, Up: Semantics
- The Data Types of Semantic Values
- ---------------------------------
- In a simple program it may be sufficient to use the same data type
- for the semantic values of all language constructs. This was true in
- the RPN and infix calculator examples (*note RPN Calc::.).
- Bison's default is to use type `int' for all semantic values. To
- specify some other type, define `YYSTYPE' as a macro, like this:
- #define YYSTYPE double
- This macro definition must go in the C declarations section of the
- grammar file (*note Grammar Outline::.).
- File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics
- More Than One Type for Semantic Values
- --------------------------------------
- In most programs, you will need different data types for different
- kinds of tokens and groupings. For example, a numeric constant may
- need type `int' or `long', while a string constant needs type `char
- *', and an identifier might need a pointer to an entry in the symbol
- table.
- To use more than one data type for semantic values in one parser,
- Bison requires you to do two things:
- * Specify the entire collection of possible data types, with the
- `%union' Bison declaration (*note Union Decl::.).
- * Choose one of those types for each symbol (terminal or
- nonterminal) for which semantic values are used. This is done
- for tokens with the `%token' Bison declaration (*note Token
- Decl::.) and for groupings with the `%type' Bison declaration
- (*note Type Decl::.).
- File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics
- Actions
- -------
- An action accompanies a syntactic rule and contains C code to be
- executed each time an instance of that rule is recognized. The task
- of most actions is to compute a semantic value for the grouping built
- by the rule from the semantic values associated with tokens or
- smaller groupings.
- An action consists of C statements surrounded by braces, much like a
- compound statement in C. It can be placed at any position in the
- rule; it is executed at that position. Most rules have just one
- action at the end of the rule, following all the components. Actions
- in the middle of a rule are tricky and used only for special purposes
- (*note Mid-Rule Actions::.).
- The C code in an action can refer to the semantic values of the
- components matched by the rule with the construct `$N', which stands
- for the value of the Nth component. The semantic value for the
- grouping being constructed is `$$'. (Bison translates both of these
- constructs into array element references when it copies the actions
- into the parser file.)
- Here is a typical example:
- exp: ...
- | exp '+' exp
- { $$ = $1 + $3; }
- This rule constructs an `exp' from two smaller `exp' groupings
- connected by a plus-sign token. In the action, `$1' and `$3' refer
- to the semantic values of the two component `exp' groupings, which
- are the first and third symbols on the right hand side of the rule.
- The sum is stored into `$$' so that it becomes the semantic value of
- the addition-expression just recognized by the rule. If there were a
- useful semantic value associated with the `+' token, it could be
- referred to as `$2'.
- `$N' with N zero or negative is allowed for reference to tokens and
- groupings on the stack *before* those that match the current rule.
- This is a very risky practice, and to use it reliably you must be
- certain of the context in which the rule is applied. Here is a case
- in which you can use this reliably:
- foo: expr bar '+' expr { ... }
- | expr bar '-' expr { ... }
- ;
-
- bar: /* empty */
- { previous_expr = $0; }
- ;
- As long as `bar' is used only in the fashion shown here, `$0' always
- refers to the `expr' which precedes `bar' in the definition of `foo'.
- File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics
- Data Types of Values in Actions
- -------------------------------
- If you have chosen a single data type for semantic values, the `$$'
- and `$N' constructs always have that data type.
- If you have used `%union' to specify a variety of data types, then
- you must declare a choice among these types for each terminal or
- nonterminal symbol that can have a semantic value. Then each time
- you use `$$' or `$N', its data type is determined by which symbol it
- refers to in the rule. In this example,
- exp: ...
- | exp '+' exp
- { $$ = $1 + $3; }
- `$3' and `$$' refer to instances of `exp', so they all have the data
- type declared for the nonterminal symbol `exp'. If `$2' were used,
- it would have the data type declared for the terminal symbol `'+'',
- whatever that might be.
- Alternatively, you can specify the data type when you refer to the
- value, by inserting `<TYPE>' after the `$' at the beginning of the
- reference. For example, if you have defined types as shown here:
- %union {
- int itype;
- double dtype;
- }
- then you can write `$<itype>1' to refer to the first subunit of the
- rule as an integer, or `$<dtype>1' to refer to it as a double.
- File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics
- Actions in Mid-Rule
- -------------------
- Occasionally it is useful to put an action in the middle of a rule.
- These actions are written just like usual end-of-rule actions, but
- they are executed before the parser even recognizes the following
- components.
- A mid-rule action may refer to the components preceding it using
- `$N', but it may not refer to subsequent components because it is run
- before they are parsed.
- The mid-rule action itself counts as one of the components of the rule.
- This makes a difference when there is another action later in the
- same rule (and usually there is another at the end): you have to
- count the actions along with the symbols when working out which
- number N to use in `$N'.
- The mid-rule action can also have a semantic value. This can be set
- within that action by an assignment to `$$', and can referred to by
- later actions using `$N'. Since there is no symbol to name the
- action, there is no way to declare a data type for the value in
- advance, so you must use the `$<...>' construct to specify a data
- type each time you refer to this value.
- Here is an example from a hypothetical compiler, handling a `let'
- statement that looks like `let (VARIABLE) STATEMENT' and serves to
- create a variable named VARIABLE temporarily for the duration of
- STATEMENT. To parse this construct, we must put VARIABLE into the
- symbol table while STATEMENT is parsed, then remove it afterward.
- Here is how it is done:
- stmt: LET '(' var ')'
- { $<context>$ = push_context ();
- declare_variable ($3); }
- stmt { $$ = $6;
- pop_context ($<context>5); }
- As soon as `let (VARIABLE)' has been recognized, the first action is
- run. It saves a copy of the current semantic context (the list of
- accessible variables) as its semantic value, using alternative
- `context' in the data-type union. Then it calls `declare_variable'
- to add the new variable to that list. Once the first action is
- finished, the embedded statement `stmt' can be parsed. Note that the
- mid-rule action is component number 5, so the `stmt' is component
- number 6.
- After the embedded statement is parsed, its semantic value becomes
- the value of the entire `let'-statement. Then the semantic value
- from the earlier action is used to restore the prior list of
- variables. This removes the temporary `let'-variable from the list
- so that it won't appear to exist while the rest of the program is
- parsed.
- Taking action before a rule is completely recognized often leads to
- conflicts since the parser must commit to a parse in order to execute
- the action. For example, the following two rules, without mid-rule
- actions, can coexist in a working parser because the parser can shift
- the open-brace token and look at what follows before deciding whether
- there is a declaration or not:
- compound: '{' declarations statements '}'
- | '{' statements '}'
- ;
- But when we add a mid-rule action as follows, the rules become
- nonfunctional:
- compound: { prepare_for_local_variables (); }
- '{' declarations statements '}'
- | '{' statements '}'
- ;
- Now the parser is forced to decide whether to run the mid-rule action
- when it has read no farther than the open-brace. In other words, it
- must commit to using one rule or the other, without sufficient
- information to do it correctly. (The open-brace token is what is
- called the "look-ahead" token at this time, since the parser is still
- deciding what to do about it. *Note Look-Ahead::.)
- You might think that you could correct the problem by putting
- identical actions into the two rules, like this:
- compound: { prepare_for_local_variables (); }
- '{' declarations statements '}'
- | { prepare_for_local_variables (); }
- '{' statements '}'
- ;
- But this does not help, because Bison does not realize that the two
- actions are identical. (Bison never tries to understand the C code
- in an action.)
- If the grammar is such that a declaration can be distinguished from a
- statement by the first token (which is true in C), then one solution
- which does work is to put the action after the open-brace, like this:
- compound: '{' { prepare_for_local_variables (); }
- declarations statements '}'
- | '{' statements '}'
- ;
- Now the first token of the following declaration or statement, which
- would in any case tell Bison which rule to use, can still do so.
- Another solution is to bury the action inside a nonterminal symbol
- which serves as a subroutine:
- subroutine: /* empty */
- { prepare_for_local_variables (); }
- ;
-
- compound: subroutine
- '{' declarations statements '}'
- | subroutine
- '{' statements '}'
- ;
- Now Bison can execute the action in the rule for `subroutine' without
- deciding which rule for `compound' it will eventually use. Note that
- the action is now at the end of its rule. Any mid-rule action can be
- converted to an end-of-rule action in this way, and this is what
- Bison actually does to implement mid-rule actions.
- File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Semantics, Up: Grammar File
- Bison Declarations
- ==================
- The "Bison declarations" section of a Bison grammar defines the
- symbols used in formulating the grammar and the data types of
- semantic values. *Note Symbols::.
- All token type names (but not single-character literal tokens such as
- `'+'' and `'*'') must be declared. Nonterminal symbols must be
- declared if you need to specify which data type to use for the
- semantic value (*note Multiple Types::.).
- The first rule in the file also specifies the start symbol, by default.
- If you want some other symbol to be the start symbol, you must
- declare it explicitly (*note Language and Grammar::.).
- * Menu:
- * Token Decl:: Declaring terminal symbols.
- * Precedence Decl:: Declaring terminals with precedence and associativity.
- * Union Decl:: Declaring the set of all semantic value types.
- * Type Decl:: Declaring the choice of type for a nonterminal symbol.
- * Expect Decl:: Suppressing warnings about shift/reduce conflicts.
- * Start Decl:: Specifying the start symbol.
- * Pure Decl:: Requesting a reentrant parser.
- * Decl Summary:: Table of all Bison declarations.
-
- File: bison.info, Node: Token Decl, Next: Precedence Decl, Prev: Declarations, Up: Declarations
- Declaring Token Type Names
- --------------------------
- The basic way to declare a token type name (terminal symbol) is as
- follows:
- %token NAME
- Bison will convert this into a `#define' directive in the parser, so
- that the function `yylex' (if it is in this file) can use the name
- NAME to stand for this token type's code.
- Alternatively you can use `%left', `%right', or `%nonassoc' instead
- of `%token', if you wish to specify precedence. *Note Precedence
- Decl::.
- You can explicitly specify the numeric code for a token type by
- appending an integer value in the field immediately following the
- token name:
- %token NUM 300
- It is generally best, however, to let Bison choose the numeric codes
- for all token types. Bison will automatically select codes that
- don't conflict with each other or with ASCII characters.
- In the event that the stack type is a union, you must augment the
- `%token' or other token declaration to include the data type
- alternative delimited by angle-brackets (*note Multiple Types::.).
- For example:
- %union { /* define stack type */
- double val;
- symrec *tptr;
- }
- %token <val> NUM /* define token NUM and its type */
- File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations
- Declaring Operator Precedence
- -----------------------------
- Use the `%left', `%right' or `%nonassoc' declaration to declare a
- token and specify its precedence and associativity, all at once.
- These are called "precedence declarations". *Note Precedence::, for
- general information on operator precedence.
- The syntax of a precedence declaration is the same as that of
- `%token': either
- %left SYMBOLS...
- or
- %left <TYPE> SYMBOLS...
- And indeed any of these declarations serves the purposes of `%token'.
- But in addition, they specify the associativity and relative
- precedence for all the SYMBOLS:
- * The associativity of an operator OP determines how repeated uses
- of the operator nest: whether `X OP Y OP Z' is parsed by
- grouping X with Y first or by grouping Y with Z first. `%left'
- specifies left-associativity (grouping X with Y first) and
- `%right' specifies right-associativity (grouping Y with Z
- first). `%nonassoc' specifies no associativity, which means
- that `X OP Y OP Z' is considered a syntax error.
- * The precedence of an operator determines how it nests with other
- operators. All the tokens declared in a single precedence
- declaration have equal precedence and nest together according to
- their associativity. When two tokens declared in different
- precedence declarations associate, the one declared later has
- the higher precedence and is grouped first.
- File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations
- Declaring the Collection of Value Types
- ---------------------------------------
- The `%union' declaration specifies the entire collection of possible
- data types for semantic values. The keyword `%union' is followed by
- a pair of braces containing the same thing that goes inside a `union'
- in C. For example:
- %union {
- double val;
- symrec *tptr;
- }
- This says that the two alternative types are `double' and `symrec *'.
- They are given names `val' and `tptr'; these names are used in the
- `%token' and `%type' declarations to pick one of the types for a
- terminal or nonterminal symbol (*note Type Decl::.).
- Note that, unlike making a `union' declaration in C, you do not write
- a semicolon after the closing brace.
- File: bison.info, Node: Type Decl, Next: Expect Decl, Prev: Union Decl, Up: Declarations
- Declaring Value Types of Nonterminal Symbols
- --------------------------------------------
- When you use `%union' to specify multiple value types, you must
- declare the value type of each nonterminal symbol for which values
- are used. This is done with a `%type' declaration, like this:
- %type <TYPE> NONTERMINAL...
- Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the
- name given in the `%union' to the alternative that you want (*note
- Union Decl::.). You can give any number of nonterminal symbols in
- the same `%type' declaration, if they have the same value type. Use
- spaces to separate the symbol names.
- File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Type Decl, Up: Declarations
- Preventing Warnings about Conflicts
- -----------------------------------
- Bison normally warns if there are any conflicts in the grammar (*note
- Shift/Reduce::.), but most real grammars have harmless shift/reduce
- conflicts which are resolved in a predictable way and would be
- difficult to eliminate. It is desirable to suppress the warning
- about these conflicts unless the number of conflicts changes. You
- can do this with the `%expect' declaration.
- The declaration looks like this:
- %expect N
- Here N is a decimal integer. The declaration says there should be no
- warning if there are N shift/reduce conflicts and no reduce/reduce
- conflicts. The usual warning is given if there are either more or
- fewer conflicts, or if there are any reduce/reduce conflicts.
- In general, using `%expect' involves these steps:
- * Compile your grammar without `%expect'. Use the `-v' option to
- get a verbose list of where the conflicts occur. Bison will
- also print the number of conflicts.
- * Check each of the conflicts to make sure that Bison's default
- resolution is what you really want. If not, rewrite the grammar
- and go back to the beginning.
- * Add an `%expect' declaration, copying the number N from the
- number which Bison printed.
- Now Bison will stop annoying you about the conflicts you have
- checked, but it will warn you again if changes in the grammer result
- in additional conflicts.
- File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations
- Declaring the Start-Symbol
- --------------------------
- Bison assumes by default that the start symbol for the grammar is the
- first nonterminal specified in the grammar specification section.
- The programmer may override this restriction with the `%start'
- declaration as follows:
- %start SYMBOL
- File: bison.info, Node: Pure Decl, Next: Decl Summary, Prev: Start Decl, Up: Declarations
- Requesting a Pure (Reentrant) Parser
- ------------------------------------
- A "reentrant" program is one which does not alter in the course of
- execution; in other words, it consists entirely of "pure" (read-only)
- code. Reentrancy is important whenever asynchronous execution is
- possible; for example, a nonreentrant program may not be safe to call
- from a signal handler. In systems with multiple threads of control,
- a nonreentrant program must be called only within interlocks.
- The Bison parser is not normally a reentrant program, because it uses
- statically allocated variables for communication with `yylex'. These
- variables include `yylval' and `yylloc'.
- The Bison declaration `%pure_parser' says that you want the parser to
- be reentrant. It looks like this:
- %pure_parser
- The effect is that the the two communication variables become local
- variables in `yyparse', and a different calling convention is used
- for the lexical analyzer function `yylex'. *Note Pure Calling::, for
- the details of this. The variable `yynerrs' also becomes local in
- `yyparse' (*note Error Reporting::.). The convention for calling
- `yyparse' itself is unchanged.
- File: bison.info, Node: Decl Summary, Prev: Pure Decl, Up: Declarations
- Bison Declaration Summary
- -------------------------
- Here is a summary of all Bison declarations:
- `%union'
- Declare the collection of data types that semantic values may
- have (*note Union Decl::.).
- `%token'
- Declare a terminal symbol (token type name) with no precedence
- or associativity specified (*note Token Decl::.).
- `%right'
- Declare a terminal symbol (token type name) that is
- right-associative (*note Precedence Decl::.).
- `%left'
- Declare a terminal symbol (token type name) that is
- left-associative (*note Precedence Decl::.).
- `%nonassoc'
- Declare a terminal symbol (token type name) that is
- nonassociative (using it in a way that would be associative is a
- syntax error) (*note Precedence Decl::.).
- `%type'
- Declare the type of semantic values for a nonterminal symbol
- (*note Type Decl::.).
- `%start'
- Specify the grammar's start symbol (*note Start Decl::.).
- `%expect'
- Declare the expected number of shift-reduce conflicts (*note
- Expect Decl::.).
- `%pure_parser'
- Request a pure (reentrant) parser program (*note Pure Decl::.).
|