123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957958959960961962963964965966967968969970971972973974975976977978979980981982983984985986987988989990991992993994995996997998999100010011002100310041005100610071008100910101011101210131014101510161017101810191020102110221023102410251026102710281029103010311032103310341035103610371038103910401041104210431044104510461047104810491050105110521053105410551056105710581059106010611062106310641065106610671068106910701071107210731074107510761077107810791080108110821083108410851086108710881089109010911092109310941095109610971098109911001101110211031104110511061107110811091110111111121113111411151116111711181119112011211122112311241125112611271128112911301131113211331134113511361137113811391140114111421143114411451146114711481149115011511152115311541155115611571158115911601161116211631164116511661167116811691170117111721173117411751176117711781179118011811182118311841185118611871188118911901191119211931194119511961197119811991200120112021203120412051206120712081209121012111212121312141215121612171218121912201221122212231224122512261227122812291230123112321233123412351236123712381239124012411242124312441245124612471248124912501251125212531254125512561257125812591260126112621263126412651266126712681269127012711272127312741275127612771278127912801281128212831284128512861287128812891290129112921293129412951296129712981299130013011302130313041305130613071308130913101311131213131314131513161317131813191320132113221323132413251326132713281329133013311332133313341335133613371338133913401341134213431344134513461347134813491350135113521353135413551356135713581359136013611362136313641365136613671368136913701371137213731374137513761377137813791380138113821383138413851386138713881389139013911392139313941395139613971398139914001401140214031404140514061407140814091410141114121413141414151416141714181419142014211422 |
- Info file bison.info, produced by Makeinfo, -*- Text -*- from input
- file bison.texinfo.
- This file documents the Bison parser generator.
- Copyright (C) 1988, 1989, 1990 Free Software Foundation, Inc.
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided also
- that the sections entitled "GNU General Public License" and
- "Conditions for Using Bison" are included exactly as in the original,
- and provided that the entire resulting derived work is distributed
- under the terms of a permission notice identical to this one.
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that the sections entitled "GNU General Public
- License", "Conditions for Using Bison" and this permission notice may
- be included in translations approved by the Free Software Foundation
- instead of in the original English.
- File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples
- Infix Notation Calculator: `calc'
- =================================
- We now modify rpcalc to handle infix operators instead of postfix.
- Infix notation involves the concept of operator precedence and the
- need for parentheses nested to arbitrary depth. Here is the Bison
- code for `calc.y', an infix desk-top calculator.
- /* Infix notation calculator--calc */
-
- %{
- #define YYSTYPE double
- #include <math.h>
- %}
-
- /* BISON Declarations */
- %token NUM
- %left '-' '+'
- %left '*' '/'
- %left NEG /* negation--unary minus */
- %right '^' /* exponentiation */
-
- /* Grammar follows */
- %%
- input: /* empty string */
- | input line
- ;
-
- line: '\n'
- | exp '\n' { printf("\t%.10g\n", $1); }
- ;
-
- exp: NUM { $$ = $1; }
- | exp '+' exp { $$ = $1 + $3; }
- | exp '-' exp { $$ = $1 - $3; }
- | exp '*' exp { $$ = $1 * $3; }
- | exp '/' exp { $$ = $1 / $3; }
- | '-' exp %prec NEG { $$ = -$2; }
- | exp '^' exp { $$ = pow ($1, $3); }
- | '(' exp ')' { $$ = $2; }
- ;
- %%
- The functions `yylex', `yyerror' and `main' can be the same as before.
- There are two important new features shown in this code.
- In the second section (Bison declarations), `%left' declares token
- types and says they are left-associative operators. The declarations
- `%left' and `%right' (right associativity) take the place of `%token'
- which is used to declare a token type name without associativity.
- (These tokens are single-character literals, which ordinarily don't
- need to be declared. We declare them here to specify the
- associativity.)
- Operator precedence is determined by the line ordering of the
- declarations; the higher the line number of the declaration (lower on
- the page or screen), the higher the precedence. Hence,
- exponentiation has the highest precedence, unary minus (`NEG') is
- next, followed by `*' and `/', and so on. *Note Precedence::.
- The other important new feature is the `%prec' in the grammar section
- for the unary minus operator. The `%prec' simply instructs Bison
- that the rule `| '-' exp' has the same precedence as `NEG'--in this
- case the next-to-highest. *Note Contextual Precedence::.
- Here is a sample run of `calc.y':
- 500
- % calc
- 4 + 4.5 - (34/(8*3+-3))
- 6.880952381
- -56 + 2
- -54
- 3 ^ 2
- 9
- File: bison.info, Node: Simple Error Recovery, Next: Multi-function Calc, Prev: Infix Calc, Up: Examples
- Simple Error Recovery
- =====================
- Up to this point, this manual has not addressed the issue of "error
- recovery"--how to continue parsing after the parser detects a syntax
- error. All we have handled is error reporting with `yyerror'.
- Recall that by default `yyparse' returns after calling `yyerror'.
- This means that an erroneous input line causes the calculator program
- to exit. Now we show how to rectify this deficiency.
- The Bison language itself includes the reserved word `error', which
- may be included in the grammar rules. In the example below it has
- been added to one of the alternatives for `line':
- line: '\n'
- | exp '\n' { printf("\t%.10g\n", $1); }
- | error '\n' { yyerrok; }
- ;
- This addition to the grammar allows for simple error recovery in the
- event of a parse error. If an expression that cannot be evaluated is
- read, the error will be recognized by the third rule for `line', and
- parsing will continue. (The `yyerror' function is still called upon
- to print its message as well.) The action executes the statement
- `yyerrok', a macro defined automatically by Bison; its meaning is
- that error recovery is complete (*note Error Recovery::.). Note the
- difference between `yyerrok' and `yyerror'; neither one is a misprint.
- This form of error recovery deals with syntax errors. There are
- other kinds of errors; for example, division by zero, which raises an
- exception signal that is normally fatal. A real calculator program
- must handle this signal and use `longjmp' to return to `main' and
- resume parsing input lines; it would also have to discard the rest of
- the current line of input. We won't discuss this issue further
- because it is not specific to Bison programs.
- File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Simple Error Recovery, Up: Examples
- Multi-Function Calculator: `mfcalc'
- ===================================
- Now that the basics of Bison have been discussed, it is time to move
- on to a more advanced problem. The above calculators provided only
- five functions, `+', `-', `*', `/' and `^'. It would be nice to have
- a calculator that provides other mathematical functions such as
- `sin', `cos', etc.
- It is easy to add new operators to the infix calculator as long as
- they are only single-character literals. The lexical analyzer
- `yylex' passes back all non-number characters as tokens, so new
- grammar rules suffice for adding a new operator. But we want
- something more flexible: built-in functions whose syntax has this form:
- FUNCTION_NAME (ARGUMENT)
- At the same time, we will add memory to the calculator, by allowing
- you to create named variables, store values in them, and use them
- later. Here is a sample session with the multi-function calculator:
- % acalc
- pi = 3.141592653589
- 3.1415926536
- sin(pi)
- 0.0000000000
- alpha = beta1 = 2.3
- 2.3000000000
- alpha
- 2.3000000000
- ln(alpha)
- 0.8329091229
- exp(ln(beta1))
- 2.3000000000
- %
- Note that multiple assignment and nested function calls are permitted.
- * Menu:
- * Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
- * Rules: Mfcalc Rules. Grammar rules for the calculator.
- * Symtab: Mfcalc Symtab. Symbol table management subroutines.
-
- File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Prev: Multi-function Calc, Up: Multi-function Calc
- Declarations for `mfcalc'
- -------------------------
- Here are the C and Bison declarations for the multi-function
- calculator.
- %{
- #include <math.h> /* For math functions, cos(), sin(), etc. */
- #include "calc.h" /* Contains definition of `symrec' */
- %}
- %union {
- double val; /* For returning numbers. */
- symrec *tptr; /* For returning symbol-table pointers */
- }
-
- %token <val> NUM /* Simple double precision number */
- %token <tptr> VAR FNCT /* Variable and Function */
- %type <val> exp
-
- %right '='
- %left '-' '+'
- %left '*' '/'
- %left NEG /* Negation--unary minus */
- %right '^' /* Exponentiation */
-
- /* Grammar follows */
-
- %%
- The above grammar introduces only two new features of the Bison
- language. These features allow semantic values to have various data
- types (*note Multiple Types::.).
- The `%union' declaration specifies the entire list of possible types;
- this is instead of defining `YYSTYPE'. The allowable types are now
- double-floats (for `exp' and `NUM') and pointers to entries in the
- symbol table. *Note Union Decl::.
- Since values can now have various types, it is necessary to associate
- a type with each grammar symbol whose semantic value is used. These
- symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations are
- augmented with information about their data type (placed between
- angle brackets).
- The Bison construct `%type' is used for declaring nonterminal
- symbols, just as `%token' is used for declaring token types. We have
- not used `%type' before because nonterminal symbols are normally
- declared implicitly by the rules that define them. But `exp' must be
- declared explicitly so we can specify its value type. *Note Type
- Decl::.
- File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc
- Grammar Rules for `mfcalc'
- --------------------------
- Here are the grammar rules for the multi-function calculator. Most
- of them are copied directly from `calc'; three rules, those which
- mention `VAR' or `FNCT', are new.
- input: /* empty */
- | input line
- ;
-
- line:
- '\n'
- | exp '\n' { printf ("\t%.10g\n", $1); }
- | error '\n' { yyerrok; }
- ;
-
- exp: NUM { $$ = $1; }
- | VAR { $$ = $1->value.var; }
- | VAR '=' exp { $$ = $3; $1->value.var = $3; }
- | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); }
- | exp '+' exp { $$ = $1 + $3; }
- | exp '-' exp { $$ = $1 - $3; }
- | exp '*' exp { $$ = $1 * $3; }
- | exp '/' exp { $$ = $1 / $3; }
- | '-' exp %prec NEG { $$ = -$2; }
- | exp '^' exp { $$ = pow ($1, $3); }
- | '(' exp ')' { $$ = $2; }
- ;
- /* End of grammar */
- %%
- File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc
- The `mfcalc' Symbol Table
- -------------------------
- The multi-function calculator requires a symbol table to keep track
- of the names and meanings of variables and functions. This doesn't
- affect the grammar rules (except for the actions) or the Bison
- declarations, but it requires some additional C functions for support.
- The symbol table itself consists of a linked list of records. Its
- definition, which is kept in the header `calc.h', is as follows. It
- provides for either functions or variables to be placed in the table.
- /* Data type for links in the chain of symbols. */
- struct symrec
- {
- char *name; /* name of symbol */
- int type; /* type of symbol: either VAR or FNCT */
- union {
- double var; /* value of a VAR */
- double (*fnctptr)(); /* value of a FNCT */
- } value;
- struct symrec *next; /* link field */
- };
-
- typedef struct symrec symrec;
-
- /* The symbol table: a chain of `struct symrec'. */
- extern symrec *sym_table;
-
- symrec *putsym ();
- symrec *getsym ();
- The new version of `main' includes a call to `init_table', a function
- that initializes the symbol table. Here it is, and `init_table' as
- well:
- #include <stdio.h>
-
- main()
- {
- init_table ();
- yyparse ();
- }
-
- yyerror (s) /* Called by yyparse on error */
- char *s;
- {
- printf ("%s\n", s);
- }
-
- struct init
- {
- char *fname;
- double (*fnct)();
- };
-
- struct init arith_fncts[]
- = {
- "sin", sin,
- "cos", cos,
- "atan", atan,
- "ln", log,
- "exp", exp,
- "sqrt", sqrt,
- 0, 0
- };
-
- /* The symbol table: a chain of `struct symrec'. */
- symrec *sym_table = (symrec *)0;
-
- init_table () /* puts arithmetic functions in table. */
- {
- int i;
- symrec *ptr;
- for (i = 0; arith_fncts[i].fname != 0; i++)
- {
- ptr = putsym (arith_fncts[i].fname, FNCT);
- ptr->value.fnctptr = arith_fncts[i].fnct;
- }
- }
- By simply editing the initialization list and adding the necessary
- include files, you can add additional functions to the calculator.
- Two important functions allow look-up and installation of symbols in
- the symbol table. The function `putsym' is passed a name and the
- type (`VAR' or `FNCT') of the object to be installed. The object is
- linked to the front of the list, and a pointer to the object is
- returned. The function `getsym' is passed the name of the symbol to
- look up. If found, a pointer to that symbol is returned; otherwise
- zero is returned.
- symrec *
- putsym (sym_name,sym_type)
- char *sym_name;
- int sym_type;
- {
- symrec *ptr;
- ptr = (symrec *) malloc (sizeof(symrec));
- ptr->name = (char *) malloc (strlen(sym_name)+1);
- strcpy (ptr->name,sym_name);
- ptr->type = sym_type;
- ptr->value.var = 0; /* set value to 0 even if fctn. */
- ptr->next = (struct symrec *)sym_table;
- sym_table = ptr;
- return ptr;
- }
-
- symrec *
- getsym (sym_name)
- char *sym_name;
- {
- symrec *ptr;
- for (ptr = sym_table; ptr != (symrec *) 0;
- ptr = (symrec *)ptr->next)
- if (strcmp (ptr->name,sym_name) == 0)
- return ptr;
- return 0;
- }
- The function `yylex' must now recognize variables, numeric values,
- and the single-character arithmetic operators. Strings of
- alphanumeric characters with a leading nondigit are recognized as
- either variables or functions depending on what the symbol table says
- about them.
- The string is passed to `getsym' for look up in the symbol table. If
- the name appears in the table, a pointer to its location and its type
- (`VAR' or `FNCT') is returned to `yyparse'. If it is not already in
- the table, then it is installed as a `VAR' using `putsym'. Again, a
- pointer and its type (which must be `VAR') is returned to `yyparse'.
- No change is needed in the handling of numeric values and arithmetic
- operators in `yylex'.
- #include <ctype.h>
- yylex()
- {
- int c;
-
- /* Ignore whitespace, get first nonwhite character. */
- while ((c = getchar ()) == ' ' || c == '\t');
-
- if (c == EOF)
- return 0;
-
- /* Char starts a number => parse the number. */
- if (c == '.' || isdigit (c))
- {
- ungetc (c, stdin);
- scanf ("%lf", &yylval.val);
- return NUM;
- }
-
- /* Char starts an identifier => read the name. */
- if (isalpha (c))
- {
- symrec *s;
- static char *symbuf = 0;
- static int length = 0;
- int i;
-
- /* Initially make the buffer long enough
- for a 40-character symbol name. */
- if (length == 0)
- length = 40, symbuf = (char *)malloc (length + 1);
-
- i = 0;
- do
- {
- /* If buffer is full, make it bigger. */
- if (i == length)
- {
- length *= 2;
- symbuf = (char *)realloc (symbuf, length + 1);
- }
- /* Add this character to the buffer. */
- symbuf[i++] = c;
- /* Get another character. */
- c = getchar ();
- }
- while (c != EOF && isalnum (c));
-
- ungetc (c, stdin);
- symbuf[i] = '\0';
-
- s = getsym (symbuf);
- if (s == 0)
- s = putsym (symbuf, VAR);
- yylval.tptr = s;
- return s->type;
- }
-
- /* Any other character is a token by itself. */
- return c;
- }
- This program is both powerful and flexible. You may easily add new
- functions, and it is a simple job to modify this code to install
- predefined variables such as `pi' or `e' as well.
- File: bison.info, Node: Exercises, Prev: Multi-function calc, Up: Examples
- Exercises
- =========
- 1. Add some new functions from `math.h' to the initialization list.
- 2. Add another array that contains constants and their values.
- Then modify `init_table' to add these constants to the symbol
- table. It will be easiest to give the constants type `VAR'.
- 3. Make the program report an error if the user refers to an
- uninitialized variable in any way except to store a value in it.
- File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top
- Bison Grammar Files
- *******************
- Bison takes as input a context-free grammar specification and
- produces a C-language function that recognizes correct instances of
- the grammar.
- The Bison grammar input file conventionally has a name ending in `.y'.
- * Menu:
- * Grammar Outline:: Overall layout of the grammar file.
- * Symbols:: Terminal and nonterminal symbols.
- * Rules:: How to write grammar rules.
- * Recursion:: Writing recursive rules.
- * Semantics:: Semantic values and actions.
- * Declarations:: All kinds of Bison declarations are described here.
- * Multiple Parsers:: Putting more than one Bison parser in one program.
-
- File: bison.info, Node: Grammar Outline, Next: Symbols, Prev: Grammar File, Up: Grammar File
- Outline of a Bison Grammar
- ==========================
- A Bison grammar file has four main sections, shown here with the
- appropriate delimiters:
- %{
- C DECLARATIONS
- %}
-
- BISON DECLARATIONS
-
- %%
- GRAMMAR RULES
- %%
-
- ADDITIONAL C CODE
- Comments enclosed in `/* ... */' may appear in any of the sections.
- * Menu:
- * C Declarations:: Syntax and usage of the C declarations section.
- * Bison Declarations:: Syntax and usage of the Bison declarations section.
- * Grammar Rules:: Syntax and usage of the grammar rules section.
- * C Code:: Syntax and usage of the additional C code section.
-
- File: bison.info, Node: C Declarations, Next: Bison Declarations, Prev: Grammar Outline, Up: Grammar Outline
- The C Declarations Section
- --------------------------
- The C DECLARATIONS section contains macro definitions and
- declarations of functions and variables that are used in the actions
- in the grammar rules. These are copied to the beginning of the
- parser file so that they precede the definition of `yylex'. You can
- use `#include' to get the declarations from a header file. If you
- don't need any C declarations, you may omit the `%{' and `%}'
- delimiters that bracket this section.
- File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: C Declarations, Up: Grammar Outline
- The Bison Declarations Section
- ------------------------------
- The BISON DECLARATIONS section contains declarations that define
- terminal and nonterminal symbols, specify precedence, and so on. In
- some simple grammars you may not need any declarations. *Note
- Declarations::.
- File: bison.info, Node: Grammar Rules, Next: C Code, Prev: Bison Declarations, Up: Grammar Outline
- The Grammar Rules Section
- -------------------------
- The "grammar rules" section contains one or more Bison grammar rules,
- and nothing else. *Note Rules::.
- There must always be at least one grammar rule, and the first `%%'
- (which precedes the grammar rules) may never be omitted even if it is
- the first thing in the file.
- File: bison.info, Node: C Code, Prev: Grammar Rules, Up: Grammar Outline
- The Additional C Code Section
- -----------------------------
- The ADDITIONAL C CODE section is copied verbatim to the end of the
- parser file, just as the C DECLARATIONS section is copied to the
- beginning. This is the most convenient place to put anything that
- you want to have in the parser file but which need not come before
- the definition of `yylex'. For example, the definitions of `yylex'
- and `yyerror' often go here. *Note Interface::.
- If the last section is empty, you may omit the `%%' that separates it
- from the grammar rules.
- The Bison parser itself contains many static variables whose names
- start with `yy' and many macros whose names start with `YY'. It is a
- good idea to avoid using any such names (except those documented in
- this manual) in the additional C code section of the grammar file.
- File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File
- Symbols, Terminal and Nonterminal
- =================================
- "Symbols" in Bison grammars represent the grammatical classifications
- of the language.
- A "terminal symbol" (also known as a "token type") represents a class
- of syntactically equivalent tokens. You use the symbol in grammar
- rules to mean that a token in that class is allowed. The symbol is
- represented in the Bison parser by a numeric code, and the `yylex'
- function returns a token type code to indicate what kind of token has
- been read. You don't need to know what the code value is; you can
- use the symbol to stand for it.
- A "nonterminal symbol" stands for a class of syntactically equivalent
- groupings. The symbol name is used in writing grammar rules. By
- convention, it should be all lower case.
- Symbol names can contain letters, digits (not at the beginning),
- underscores and periods. Periods make sense only in nonterminals.
- There are two ways of writing terminal symbols in the grammar:
- * A "named token type" is written with an identifier, like an
- identifier in C. By convention, it should be all upper case.
- Each such name must be defined with a Bison declaration such as
- `%token'. *Note Token Decl::.
- * A "character token type" (or "literal token") is written in the
- grammar using the same syntax used in C for character constants;
- for example, `'+'' is a character token type. A character token
- type doesn't need to be declared unless you need to specify its
- semantic value data type (*note Value Type::.), associativity,
- or precedence (*note Precedence::.).
- By convention, a character token type is used only to represent
- a token that consists of that particular character. Thus, the
- token type `'+'' is used to represent the character `+' as a
- token. Nothing enforces this convention, but if you depart from
- it, your program will confuse other readers.
- All the usual escape sequences used in character literals in C
- can be used in Bison as well, but you must not use the null
- character as a character literal because its ASCII code, zero,
- is the code `yylex' returns for end-of-input (*note Calling
- Convention::.).
- How you choose to write a terminal symbol has no effect on its
- grammatical meaning. That depends only on where it appears in rules
- and on when the parser function returns that symbol.
- The value returned by `yylex' is always one of the terminal symbols
- (or 0 for end-of-input). Whichever way you write the token type in
- the grammar rules, you write it the same way in the definition of
- `yylex'. The numeric code for a character token type is simply the
- ASCII code for the character, so `yylex' can use the identical
- character constant to generate the requisite code. Each named token
- type becomes a C macro in the parser file, so `yylex' can use the
- name to stand for the code. (This is why periods don't make sense in
- terminal symbols.) *Note Calling Convention::.
- If `yylex' is defined in a separate file, you need to arrange for the
- token-type macro definitions to be available there. Use the `-d'
- option when you run Bison, so that it will write these macro
- definitions into a separate header file `NAME.tab.h' which you can
- include in the other source files that need it. *Note Invocation::.
- The symbol `error' is a terminal symbol reserved for error recovery
- (*note Error Recovery::.); you shouldn't use it for any other purpose.
- In particular, `yylex' should never return this value.
- File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File
- Syntax of Grammar Rules
- =======================
- A Bison grammar rule has the following general form:
- RESULT: COMPONENTS...
- ;
- where RESULT is the nonterminal symbol that this rule describes and
- COMPONENTS are various terminal and nonterminal symbols that are put
- together by this rule (*note Symbols::.). For example,
- exp: exp '+' exp
- ;
- says that two groupings of type `exp', with a `+' token in between,
- can be combined into a larger grouping of type `exp'.
- Whitespace in rules is significant only to separate symbols. You can
- add extra whitespace as you wish.
- Scattered among the components can be ACTIONS that determine the
- semantics of the rule. An action looks like this:
- {C STATEMENTS}
- Usually there is only one action and it follows the components.
- *Note Actions::.
- Multiple rules for the same RESULT can be written separately or can
- be joined with the vertical-bar character `|' as follows:
- RESULT: RULE1-COMPONENTS...
- | RULE2-COMPONENTS...
- ...
- ;
- They are still considered distinct rules even when joined in this way.
- If COMPONENTS in a rule is empty, it means that RESULT can match the
- empty string. For example, here is how to define a comma-separated
- sequence of zero or more `exp' groupings:
- expseq: /* empty */
- | expseq1
- ;
-
- expseq1: exp
- | expseq1 ',' exp
- ;
- It is customary to write a comment `/* empty */' in each rule with no
- components.
- File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File
- Recursive Rules
- ===============
- A rule is called "recursive" when its RESULT nonterminal appears also
- on its right hand side. Nearly all Bison grammars need to use
- recursion, because that is the only way to define a sequence of any
- number of somethings. Consider this recursive definition of a
- comma-separated sequence of one or more expressions:
- expseq1: exp
- | expseq1 ',' exp
- ;
- Since the recursive use of `expseq1' is the leftmost symbol in the
- right hand side, we call this "left recursion". By contrast, here
- the same construct is defined using "right recursion":
- expseq1: exp
- | exp ',' expseq1
- ;
- Any kind of sequence can be defined using either left recursion or
- right recursion, but you should always use left recursion, because it
- can parse a sequence of any number of elements with bounded stack
- space. Right recursion uses up space on the Bison stack in
- proportion to the number of elements in the sequence, because all the
- elements must be shifted onto the stack before the rule can be
- applied even once. *Note The Algorithm of the Bison Parser:
- Algorithm, for further explanation of this.
- "Indirect" or "mutual" recursion occurs when the result of the rule
- does not appear directly on its right hand side, but does appear in
- rules for other nonterminals which do appear on its right hand side.
- For example:
- expr: primary
- | primary '+' primary
- ;
-
- primary: constant
- | '(' expr ')'
- ;
- defines two mutually-recursive nonterminals, since each refers to the
- other.
- File: bison.info, Node: Semantics, Next: Declarations, Prev: Recursion, Up: Grammar File
- Defining Language Semantics
- ===========================
- The grammar rules for a language determine only the syntax. The
- semantics are determined by the semantic values associated with
- various tokens and groupings, and by the actions taken when various
- groupings are recognized.
- For example, the calculator calculates properly because the value
- associated with each expression is the proper number; it adds
- properly because the action for the grouping `X + Y' is to add the
- numbers associated with X and Y.
- * Menu:
- * Value Type:: Specifying one data type for all semantic values.
- * Multiple Types:: Specifying several alternative data types.
- * Actions:: An action is the semantic definition of a grammar rule.
- * Action Types:: Specifying data types for actions to operate on.
- * Mid-Rule Actions:: Most actions go at the end of a rule.
- This says when, why and how to use the exceptional
- action in the middle of a rule.
-
- File: bison.info, Node: Value Type, Next: Multiple Types, Prev: Semantics, Up: Semantics
- Data Types of Semantic Values
- -----------------------------
- In a simple program it may be sufficient to use the same data type
- for the semantic values of all language constructs. This was true in
- the RPN and infix calculator examples (*note RPN Calc::.).
- Bison's default is to use type `int' for all semantic values. To
- specify some other type, define `YYSTYPE' as a macro, like this:
- #define YYSTYPE double
- This macro definition must go in the C declarations section of the
- grammar file (*note Grammar Outline::.).
- File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics
- More Than One Value Type
- ------------------------
- In most programs, you will need different data types for different
- kinds of tokens and groupings. For example, a numeric constant may
- need type `int' or `long', while a string constant needs type `char
- *', and an identifier might need a pointer to an entry in the symbol
- table.
- To use more than one data type for semantic values in one parser,
- Bison requires you to do two things:
- * Specify the entire collection of possible data types, with the
- `%union' Bison declaration (*note Union Decl::.).
- * Choose one of those types for each symbol (terminal or
- nonterminal) for which semantic values are used. This is done
- for tokens with the `%token' Bison declaration (*note Token
- Decl::.) and for groupings with the `%type' Bison declaration
- (*note Type Decl::.).
- File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics
- Actions
- -------
- An action accompanies a syntactic rule and contains C code to be
- executed each time an instance of that rule is recognized. The task
- of most actions is to compute a semantic value for the grouping built
- by the rule from the semantic values associated with tokens or
- smaller groupings.
- An action consists of C statements surrounded by braces, much like a
- compound statement in C. It can be placed at any position in the
- rule; it is executed at that position. Most rules have just one
- action at the end of the rule, following all the components. Actions
- in the middle of a rule are tricky and used only for special purposes
- (*note Mid-Rule Actions::.).
- The C code in an action can refer to the semantic values of the
- components matched by the rule with the construct `$N', which stands
- for the value of the Nth component. The semantic value for the
- grouping being constructed is `$$'. (Bison translates both of these
- constructs into array element references when it copies the actions
- into the parser file.)
- Here is a typical example:
- exp: ...
- | exp '+' exp
- { $$ = $1 + $3; }
- This rule constructs an `exp' from two smaller `exp' groupings
- connected by a plus-sign token. In the action, `$1' and `$3' refer
- to the semantic values of the two component `exp' groupings, which
- are the first and third symbols on the right hand side of the rule.
- The sum is stored into `$$' so that it becomes the semantic value of
- the addition-expression just recognized by the rule. If there were a
- useful semantic value associated with the `+' token, it could be
- referred to as `$2'.
- `$N' with N zero or negative is allowed for reference to tokens and
- groupings on the stack *before* those that match the current rule.
- This is a very risky practice, and to use it reliably you must be
- certain of the context in which the rule is applied. Here is a case
- in which you can use this reliably:
- foo: expr bar '+' expr { ... }
- | expr bar '-' expr { ... }
- ;
-
- bar: /* empty */
- { previous_expr = $0; }
- ;
- As long as `bar' is used only in the fashion shown here, `$0' always
- refers to the `expr' which precedes `bar' in the definition of `foo'.
- File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics
- Data Types of Values in Actions
- -------------------------------
- If you have chosen a single data type for semantic values, the `$$'
- and `$N' constructs always have that data type.
- If you have used `%union' to specify a variety of data types, then
- you must declare a choice among these types for each terminal or
- nonterminal symbol that can have a semantic value. Then each time
- you use `$$' or `$N', its data type is determined by which symbol it
- refers to in the rule. In this example,
- exp: ...
- | exp '+' exp
- { $$ = $1 + $3; }
- `$1' and `$3' refer to instances of `exp', so they all have the data
- type declared for the nonterminal symbol `exp'. If `$2' were used,
- it would have the data type declared for the terminal symbol `'+'',
- whatever that might be.
- Alternatively, you can specify the data type when you refer to the
- value, by inserting `<TYPE>' after the `$' at the beginning of the
- reference. For example, if you have defined types as shown here:
- %union {
- int itype;
- double dtype;
- }
- then you can write `$<itype>1' to refer to the first subunit of the
- rule as an integer, or `$<dtype>1' to refer to it as a double.
- File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics
- Actions in Mid-Rule
- -------------------
- Occasionally it is useful to put an action in the middle of a rule.
- These actions are written just like usual end-of-rule actions, but
- they are executed before the parser even recognizes the following
- components.
- A mid-rule action may refer to the components preceding it using
- `$N', but it may not refer to subsequent components because it is run
- before they are parsed.
- The mid-rule action itself counts as one of the components of the rule.
- This makes a difference when there is another action later in the
- same rule (and usually there is another at the end): you have to
- count the actions along with the symbols when working out which
- number N to use in `$N'.
- The mid-rule action can also have a semantic value. This can be set
- within that action by an assignment to `$$', and can referred to by
- later actions using `$N'. Since there is no symbol to name the
- action, there is no way to declare a data type for the value in
- advance, so you must use the `$<...>' construct to specify a data
- type each time you refer to this value.
- Here is an example from a hypothetical compiler, handling a `let'
- statement that looks like `let (VARIABLE) STATEMENT' and serves to
- create a variable named VARIABLE temporarily for the duration of
- STATEMENT. To parse this construct, we must put VARIABLE into the
- symbol table while STATEMENT is parsed, then remove it afterward.
- Here is how it is done:
- stmt: LET '(' var ')'
- { $<context>$ = push_context ();
- declare_variable ($3); }
- stmt { $$ = $6;
- pop_context ($<context>5); }
- As soon as `let (VARIABLE)' has been recognized, the first action is
- run. It saves a copy of the current semantic context (the list of
- accessible variables) as its semantic value, using alternative
- `context' in the data-type union. Then it calls `declare_variable'
- to add the new variable to that list. Once the first action is
- finished, the embedded statement `stmt' can be parsed. Note that the
- mid-rule action is component number 5, so the `stmt' is component
- number 6.
- After the embedded statement is parsed, its semantic value becomes
- the value of the entire `let'-statement. Then the semantic value
- from the earlier action is used to restore the prior list of
- variables. This removes the temporary `let'-variable from the list
- so that it won't appear to exist while the rest of the program is
- parsed.
- Taking action before a rule is completely recognized often leads to
- conflicts since the parser must commit to a parse in order to execute
- the action. For example, the following two rules, without mid-rule
- actions, can coexist in a working parser because the parser can shift
- the open-brace token and look at what follows before deciding whether
- there is a declaration or not:
- compound: '{' declarations statements '}'
- | '{' statements '}'
- ;
- But when we add a mid-rule action as follows, the rules become
- nonfunctional:
- compound: { prepare_for_local_variables (); }
- '{' declarations statements '}'
- | '{' statements '}'
- ;
- Now the parser is forced to decide whether to run the mid-rule action
- when it has read no farther than the open-brace. In other words, it
- must commit to using one rule or the other, without sufficient
- information to do it correctly. (The open-brace token is what is
- called the "look-ahead" token at this time, since the parser is still
- deciding what to do about it. *Note Look-Ahead::.)
- You might think that you could correct the problem by putting
- identical actions into the two rules, like this:
- compound: { prepare_for_local_variables (); }
- '{' declarations statements '}'
- | { prepare_for_local_variables (); }
- '{' statements '}'
- ;
- But this does not help, because Bison does not realize that the two
- actions are identical. (Bison never tries to understand the C code
- in an action.)
- If the grammar is such that a declaration can be distinguished from a
- statement by the first token (which is true in C), then one solution
- which does work is to put the action after the open-brace, like this:
- compound: '{' { prepare_for_local_variables (); }
- declarations statements '}'
- | '{' statements '}'
- ;
- Now the first token of the following declaration or statement, which
- would in any case tell Bison which rule to use, can still do so.
- Another solution is to bury the action inside a nonterminal symbol
- which serves as a subroutine:
- subroutine: /* empty */
- { prepare_for_local_variables (); }
- ;
-
- compound: subroutine
- '{' declarations statements '}'
- | subroutine
- '{' statements '}'
- ;
- Now Bison can execute the action in the rule for `subroutine' without
- deciding which rule for `compound' it will eventually use. Note that
- the action is now at the end of its rule. Any mid-rule action can be
- converted to an end-of-rule action in this way, and this is what
- Bison actually does to implement mid-rule actions.
- File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Semantics, Up: Grammar File
- Bison Declarations
- ==================
- The "Bison declarations" section of a Bison grammar defines the
- symbols used in formulating the grammar and the data types of
- semantic values. *Note Symbols::.
- All token type names (but not single-character literal tokens such as
- `'+'' and `'*'') must be declared. Nonterminal symbols must be
- declared if you need to specify which data type to use for the
- semantic value (*note Multiple Types::.).
- The first rule in the file also specifies the start symbol, by default.
- If you want some other symbol to be the start symbol, you must
- declare it explicitly (*note Language and Grammar::.).
- * Menu:
- * Token Decl:: Declaring terminal symbols.
- * Precedence Decl:: Declaring terminals with precedence and associativity.
- * Union Decl:: Declaring the set of all semantic value types.
- * Type Decl:: Declaring the choice of type for a nonterminal symbol.
- * Expect Decl:: Suppressing warnings about shift/reduce conflicts.
- * Start Decl:: Specifying the start symbol.
- * Pure Decl:: Requesting a reentrant parser.
- * Decl Summary:: Table of all Bison declarations.
-
- File: bison.info, Node: Token Decl, Next: Precedence Decl, Prev: Declarations, Up: Declarations
- Token Type Names
- ----------------
- The basic way to declare a token type name (terminal symbol) is as
- follows:
- %token NAME
- Bison will convert this into a `#define' directive in the parser, so
- that the function `yylex' (if it is in this file) can use the name
- NAME to stand for this token type's code.
- Alternatively you can use `%left', `%right', or `%nonassoc' instead
- of `%token', if you wish to specify precedence. *Note Precedence
- Decl::.
- You can explicitly specify the numeric code for a token type by
- appending an integer value in the field immediately following the
- token name:
- %token NUM 300
- It is generally best, however, to let Bison choose the numeric codes
- for all token types. Bison will automatically select codes that
- don't conflict with each other or with ASCII characters.
- In the event that the stack type is a union, you must augment the
- `%token' or other token declaration to include the data type
- alternative delimited by angle-brackets (*note Multiple Types::.).
- For example:
- %union { /* define stack type */
- double val;
- symrec *tptr;
- }
- %token <val> NUM /* define token NUM and its type */
- File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations
- Operator Precedence
- -------------------
- Use the `%left', `%right' or `%nonassoc' declaration to declare a
- token and specify its precedence and associativity, all at once.
- These are called "precedence declarations". *Note Precedence::, for
- general information on operator precedence.
- The syntax of a precedence declaration is the same as that of
- `%token': either
- %left SYMBOLS...
- or
- %left <TYPE> SYMBOLS...
- And indeed any of these declarations serves the purposes of `%token'.
- But in addition, they specify the associativity and relative
- precedence for all the SYMBOLS:
- * The associativity of an operator OP determines how repeated uses
- of the operator nest: whether `X OP Y OP Z' is parsed by
- grouping X with Y first or by grouping Y with Z first. `%left'
- specifies left-associativity (grouping X with Y first) and
- `%right' specifies right-associativity (grouping Y with Z
- first). `%nonassoc' specifies no associativity, which means
- that `X OP Y OP Z' is considered a syntax error.
- * The precedence of an operator determines how it nests with other
- operators. All the tokens declared in a single precedence
- declaration have equal precedence and nest together according to
- their associativity. When two tokens declared in different
- precedence declarations associate, the one declared later has
- the higher precedence and is grouped first.
- File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations
- The Collection of Value Types
- -----------------------------
- The `%union' declaration specifies the entire collection of possible
- data types for semantic values. The keyword `%union' is followed by
- a pair of braces containing the same thing that goes inside a `union'
- in C. For example:
- %union {
- double val;
- symrec *tptr;
- }
- This says that the two alternative types are `double' and `symrec *'.
- They are given names `val' and `tptr'; these names are used in the
- `%token' and `%type' declarations to pick one of the types for a
- terminal or nonterminal symbol (*note Type Decl::.).
- Note that, unlike making a `union' declaration in C, you do not write
- a semicolon after the closing brace.
- File: bison.info, Node: Type Decl, Next: Expect Decl, Prev: Union Decl, Up: Declarations
- Nonterminal Symbols
- -------------------
- When you use `%union' to specify multiple value types, you must
- declare the value type of each nonterminal symbol for which values
- are used. This is done with a `%type' declaration, like this:
- %type <TYPE> NONTERMINAL...
- Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the
- name given in the `%union' to the alternative that you want (*note
- Union Decl::.). You can give any number of nonterminal symbols in
- the same `%type' declaration, if they have the same value type. Use
- spaces to separate the symbol names.
- File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Type Decl, Up: Declarations
- Suppressing Conflict Warnings
- -----------------------------
- Bison normally warns if there are any conflicts in the grammar (*note
- Shift/Reduce::.), but most real grammars have harmless shift/reduce
- conflicts which are resolved in a predictable way and would be
- difficult to eliminate. It is desirable to suppress the warning
- about these conflicts unless the number of conflicts changes. You
- can do this with the `%expect' declaration.
- The declaration looks like this:
- %expect N
- Here N is a decimal integer. The declaration says there should be no
- warning if there are N shift/reduce conflicts and no reduce/reduce
- conflicts. The usual warning is given if there are either more or
- fewer conflicts, or if there are any reduce/reduce conflicts.
- In general, using `%expect' involves these steps:
- * Compile your grammar without `%expect'. Use the `-v' option to
- get a verbose list of where the conflicts occur. Bison will
- also print the number of conflicts.
- * Check each of the conflicts to make sure that Bison's default
- resolution is what you really want. If not, rewrite the grammar
- and go back to the beginning.
- * Add an `%expect' declaration, copying the number N from the
- number which Bison printed.
- Now Bison will stop annoying you about the conflicts you have
- checked, but it will warn you again if changes in the grammer result
- in additional conflicts.
- File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations
- The Start-Symbol
- ----------------
- Bison assumes by default that the start symbol for the grammar is the
- first nonterminal specified in the grammar specification section.
- The programmer may override this restriction with the `%start'
- declaration as follows:
- %start SYMBOL
- File: bison.info, Node: Pure Decl, Next: Decl Summary, Prev: Start Decl, Up: Declarations
- A Pure (Reentrant) Parser
- -------------------------
- A "reentrant" program is one which does not alter in the course of
- execution; in other words, it consists entirely of "pure" (read-only)
- code. Reentrancy is important whenever asynchronous execution is
- possible; for example, a nonreentrant program may not be safe to call
- from a signal handler. In systems with multiple threads of control,
- a nonreentrant program must be called only within interlocks.
- The Bison parser is not normally a reentrant program, because it uses
- statically allocated variables for communication with `yylex'. These
- variables include `yylval' and `yylloc'.
- The Bison declaration `%pure_parser' says that you want the parser to
- be reentrant. It looks like this:
- %pure_parser
- The effect is that the two communication variables become local
- variables in `yyparse', and a different calling convention is used
- for the lexical analyzer function `yylex'. *Note Pure Calling::, for
- the details of this. The variable `yynerrs' also becomes local in
- `yyparse' (*note Error Reporting::.). The convention for calling
- `yyparse' itself is unchanged.
- File: bison.info, Node: Decl Summary, Prev: Pure Decl, Up: Declarations
- Bison Declaration Summary
- -------------------------
- Here is a summary of all Bison declarations:
- `%union'
- Declare the collection of data types that semantic values may
- have (*note Union Decl::.).
- `%token'
- Declare a terminal symbol (token type name) with no precedence
- or associativity specified (*note Token Decl::.).
- `%right'
- Declare a terminal symbol (token type name) that is
- right-associative (*note Precedence Decl::.).
- `%left'
- Declare a terminal symbol (token type name) that is
- left-associative (*note Precedence Decl::.).
- `%nonassoc'
- Declare a terminal symbol (token type name) that is
- nonassociative (using it in a way that would be associative is a
- syntax error) (*note Precedence Decl::.).
- `%type'
- Declare the type of semantic values for a nonterminal symbol
- (*note Type Decl::.).
- `%start'
- Specify the grammar's start symbol (*note Start Decl::.).
- `%expect'
- Declare the expected number of shift-reduce conflicts (*note
- Expect Decl::.).
- `%pure_parser'
- Request a pure (reentrant) parser program (*note Pure Decl::.).
- File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File
- Multiple Parsers in the Same Program
- ====================================
- Most programs that use Bison parse only one language and therefore
- contain only one Bison parser. But what if you want to parse more
- than one language with the same program? Then you need to avoid a
- name conflict between different definitions of `yyparse', `yylval',
- and so on.
- The easy way to do this is to use the option `-p PREFIX' (*note
- Invocation::.). This renames the interface functions and variables
- of the Bison parser to start with PREFIX instead of `yy'. You can
- use this to give each parser distinct names that do not conflict.
- The precise list of symbols renamed is `yyparse', `yylex', `yyerror',
- `yylval', `yychar' and `yydebug'. For example, if you use `-p c',
- the names become `cparse', `clex', and so on.
- *All the other variables and macros associated with Bison are not
- renamed.* These others are not global; there is no conflict if the
- same name is used in different parsers. For example, `YYSTYPE' is
- not renamed, but defining this in different ways in different parsers
- causes no trouble (*note Value Type::.).
- The `-p' option works by adding macro definitions to the beginning of
- the parser source file, defining `yyparse' as `PREFIXparse', and so
- on. This effectively substitutes one name for the other in the
- entire parser file.
|