1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061071081091101111121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212222232242252262272282292302312322332342352362372382392402412422432442452462472482492502512522532542552562572582592602612622632642652662672682692702712722732742752762772782792802812822832842852862872882892902912922932942952962972982993003013023033043053063073083093103113123133143153163173183193203213223233243253263273283293303313323333343353363373383393403413423433443453463473483493503513523533543553563573583593603613623633643653663673683693703713723733743753763773783793803813823833843853863873883893903913923933943953963973983994004014024034044054064074084094104114124134144154164174184194204214224234244254264274284294304314324334344354364374384394404414424434444454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545555565575585595605615625635645655665675685695705715725735745755765775785795805815825835845855865875885895905915925935945955965975985996006016026036046056066076086096106116126136146156166176186196206216226236246256266276286296306316326336346356366376386396406416426436446456466476486496506516526536546556566576586596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126212631264126512661267126812691270127112721273127412751276127712781279128012811282128312841285128612871288128912901291129212931294129512961297129812991300130113021303130413051306130713081309 |
- This is Info file bison.info, produced by Makeinfo-1.52 from the input
- file ./bison.texinfo.
- This file documents the Bison parser generator.
- Copyright (C) 1988, 1989, 1990, 1991, 1992 Free Software Foundation,
- Inc.
- Permission is granted to make and distribute verbatim copies of this
- manual provided the copyright notice and this permission notice are
- preserved on all copies.
- Permission is granted to copy and distribute modified versions of
- this manual under the conditions for verbatim copying, provided also
- that the sections entitled "GNU General Public License" and "Conditions
- for Using Bison" are included exactly as in the original, and provided
- that the entire resulting derived work is distributed under the terms
- of a permission notice identical to this one.
- Permission is granted to copy and distribute translations of this
- manual into another language, under the above conditions for modified
- versions, except that the sections entitled "GNU General Public
- License", "Conditions for Using Bison" and this permission notice may be
- included in translations approved by the Free Software Foundation
- instead of in the original English.
- File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Decls, Up: RPN Calc
- Grammar Rules for `rpcalc'
- --------------------------
- Here are the grammar rules for the reverse polish notation
- calculator.
- input: /* empty */
- | input line
- ;
-
- line: '\n'
- | exp '\n' { printf ("\t%.10g\n", $1); }
- ;
-
- exp: NUM { $$ = $1; }
- | exp exp '+' { $$ = $1 + $2; }
- | exp exp '-' { $$ = $1 - $2; }
- | exp exp '*' { $$ = $1 * $2; }
- | exp exp '/' { $$ = $1 / $2; }
- /* Exponentiation */
- | exp exp '^' { $$ = pow ($1, $2); }
- /* Unary minus */
- | exp 'n' { $$ = -$1; }
- ;
- %%
- The groupings of the rpcalc "language" defined here are the
- expression (given the name `exp'), the line of input (`line'), and the
- complete input transcript (`input'). Each of these nonterminal symbols
- has several alternate rules, joined by the `|' punctuator which is read
- as "or". The following sections explain what these rules mean.
- The semantics of the language is determined by the actions taken
- when a grouping is recognized. The actions are the C code that appears
- inside braces. *Note Actions::.
- You must specify these actions in C, but Bison provides the means for
- passing semantic values between the rules. In each action, the
- pseudo-variable `$$' stands for the semantic value for the grouping
- that the rule is going to construct. Assigning a value to `$$' is the
- main job of most actions. The semantic values of the components of the
- rule are referred to as `$1', `$2', and so on.
- * Menu:
- * Rpcalc Input::
- * Rpcalc Line::
- * Rpcalc Expr::
- File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules
- Explanation of `input'
- ----------------------
- Consider the definition of `input':
- input: /* empty */
- | input line
- ;
- This definition reads as follows: "A complete input is either an
- empty string, or a complete input followed by an input line". Notice
- that "complete input" is defined in terms of itself. This definition
- is said to be "left recursive" since `input' appears always as the
- leftmost symbol in the sequence. *Note Recursive Rules: Recursion.
- The first alternative is empty because there are no symbols between
- the colon and the first `|'; this means that `input' can match an empty
- string of input (no tokens). We write the rules this way because it is
- legitimate to type `Ctrl-d' right after you start the calculator. It's
- conventional to put an empty alternative first and write the comment
- `/* empty */' in it.
- The second alternate rule (`input line') handles all nontrivial
- input. It means, "After reading any number of lines, read one more
- line if possible." The left recursion makes this rule into a loop.
- Since the first alternative matches empty input, the loop can be
- executed zero or more times.
- The parser function `yyparse' continues to process input until a
- grammatical error is seen or the lexical analyzer says there are no more
- input tokens; we will arrange for the latter to happen at end of file.
- File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules
- Explanation of `line'
- ---------------------
- Now consider the definition of `line':
- line: '\n'
- | exp '\n' { printf ("\t%.10g\n", $1); }
- ;
- The first alternative is a token which is a newline character; this
- means that rpcalc accepts a blank line (and ignores it, since there is
- no action). The second alternative is an expression followed by a
- newline. This is the alternative that makes rpcalc useful. The
- semantic value of the `exp' grouping is the value of `$1' because the
- `exp' in question is the first symbol in the alternative. The action
- prints this value, which is the result of the computation the user
- asked for.
- This action is unusual because it does not assign a value to `$$'.
- As a consequence, the semantic value associated with the `line' is
- uninitialized (its value will be unpredictable). This would be a bug if
- that value were ever used, but we don't use it: once rpcalc has printed
- the value of the user's input line, that value is no longer needed.
- File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules
- Explanation of `expr'
- ---------------------
- The `exp' grouping has several rules, one for each kind of
- expression. The first rule handles the simplest expressions: those
- that are just numbers. The second handles an addition-expression,
- which looks like two expressions followed by a plus-sign. The third
- handles subtraction, and so on.
- exp: NUM
- | exp exp '+' { $$ = $1 + $2; }
- | exp exp '-' { $$ = $1 - $2; }
- ...
- ;
- We have used `|' to join all the rules for `exp', but we could
- equally well have written them separately:
- exp: NUM ;
- exp: exp exp '+' { $$ = $1 + $2; } ;
- exp: exp exp '-' { $$ = $1 - $2; } ;
- ...
- Most of the rules have actions that compute the value of the
- expression in terms of the value of its parts. For example, in the
- rule for addition, `$1' refers to the first component `exp' and `$2'
- refers to the second one. The third component, `'+'', has no meaningful
- associated semantic value, but if it had one you could refer to it as
- `$3'. When `yyparse' recognizes a sum expression using this rule, the
- sum of the two subexpressions' values is produced as the value of the
- entire expression. *Note Actions::.
- You don't have to give an action for every rule. When a rule has no
- action, Bison by default copies the value of `$1' into `$$'. This is
- what happens in the first rule (the one that uses `NUM').
- The formatting shown here is the recommended convention, but Bison
- does not require it. You can add or change whitespace as much as you
- wish. For example, this:
- exp : NUM | exp exp '+' {$$ = $1 + $2; } | ...
- means the same thing as this:
- exp: NUM
- | exp exp '+' { $$ = $1 + $2; }
- | ...
- The latter, however, is much more readable.
- File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc
- The `rpcalc' Lexical Analyzer
- -----------------------------
- The lexical analyzer's job is low-level parsing: converting
- characters or sequences of characters into tokens. The Bison parser
- gets its tokens by calling the lexical analyzer. *Note The Lexical
- Analyzer Function `yylex': Lexical.
- Only a simple lexical analyzer is needed for the RPN calculator.
- This lexical analyzer skips blanks and tabs, then reads in numbers as
- `double' and returns them as `NUM' tokens. Any other character that
- isn't part of a number is a separate token. Note that the token-code
- for such a single-character token is the character itself.
- The return value of the lexical analyzer function is a numeric code
- which represents a token type. The same text used in Bison rules to
- stand for this token type is also a C expression for the numeric code
- for the type. This works in two ways. If the token type is a
- character literal, then its numeric code is the ASCII code for that
- character; you can use the same character literal in the lexical
- analyzer to express the number. If the token type is an identifier,
- that identifier is defined by Bison as a C macro whose definition is
- the appropriate number. In this example, therefore, `NUM' becomes a
- macro for `yylex' to use.
- The semantic value of the token (if it has one) is stored into the
- global variable `yylval', which is where the Bison parser will look for
- it. (The C data type of `yylval' is `YYSTYPE', which was defined at
- the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc
- Decls..)
- A token type code of zero is returned if the end-of-file is
- encountered. (Bison recognizes any nonpositive value as indicating the
- end of the input.)
- Here is the code for the lexical analyzer:
- /* Lexical analyzer returns a double floating point
- number on the stack and the token NUM, or the ASCII
- character read if not a number. Skips all blanks
- and tabs, returns 0 for EOF. */
-
- #include <ctype.h>
-
- yylex ()
- {
- int c;
-
- /* skip white space */
- while ((c = getchar ()) == ' ' || c == '\t')
- ;
- /* process numbers */
- if (c == '.' || isdigit (c))
- {
- ungetc (c, stdin);
- scanf ("%lf", &yylval);
- return NUM;
- }
- /* return end-of-file */
- if (c == EOF)
- return 0;
- /* return single chars */
- return c;
- }
- File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc
- The Controlling Function
- ------------------------
- In keeping with the spirit of this example, the controlling function
- is kept to the bare minimum. The only requirement is that it call
- `yyparse' to start the process of parsing.
- main ()
- {
- yyparse ();
- }
- File: bison.info, Node: Rpcalc Error, Next: Rpcalc Gen, Prev: Rpcalc Main, Up: RPN Calc
- The Error Reporting Routine
- ---------------------------
- When `yyparse' detects a syntax error, it calls the error reporting
- function `yyerror' to print an error message (usually but not always
- `"parse error"'). It is up to the programmer to supply `yyerror'
- (*note Parser C-Language Interface: Interface.), so here is the
- definition we will use:
- #include <stdio.h>
-
- yyerror (s) /* Called by yyparse on error */
- char *s;
- {
- printf ("%s\n", s);
- }
- After `yyerror' returns, the Bison parser may recover from the error
- and continue parsing if the grammar contains a suitable error rule
- (*note Error Recovery::.). Otherwise, `yyparse' returns nonzero. We
- have not written any error rules in this example, so any invalid input
- will cause the calculator program to exit. This is not clean behavior
- for a real calculator, but it is adequate in the first example.
- File: bison.info, Node: Rpcalc Gen, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc
- Running Bison to Make the Parser
- --------------------------------
- Before running Bison to produce a parser, we need to decide how to
- arrange all the source code in one or more source files. For such a
- simple example, the easiest thing is to put everything in one file.
- The definitions of `yylex', `yyerror' and `main' go at the end, in the
- "additional C code" section of the file (*note The Overall Layout of a
- Bison Grammar: Grammar Layout.).
- For a large project, you would probably have several source files,
- and use `make' to arrange to recompile them.
- With all the source in a single file, you use the following command
- to convert it into a parser file:
- bison FILE_NAME.y
- In this example the file was called `rpcalc.y' (for "Reverse Polish
- CALCulator"). Bison produces a file named `FILE_NAME.tab.c', removing
- the `.y' from the original file name. The file output by Bison contains
- the source code for `yyparse'. The additional functions in the input
- file (`yylex', `yyerror' and `main') are copied verbatim to the output.
- File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Gen, Up: RPN Calc
- Compiling the Parser File
- -------------------------
- Here is how to compile and run the parser file:
- # List files in current directory.
- % ls
- rpcalc.tab.c rpcalc.y
-
- # Compile the Bison parser.
- # `-lm' tells compiler to search math library for `pow'.
- % cc rpcalc.tab.c -lm -o rpcalc
-
- # List files again.
- % ls
- rpcalc rpcalc.tab.c rpcalc.y
- The file `rpcalc' now contains the executable code. Here is an
- example session using `rpcalc'.
- % rpcalc
- 4 9 +
- 13
- 3 7 + 3 4 5 *+-
- -13
- 3 7 + 3 4 5 * + - n Note the unary minus, `n'
- 13
- 5 6 / 4 n +
- -3.166666667
- 3 4 ^ Exponentiation
- 81
- ^D End-of-file indicator
- %
- File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples
- Infix Notation Calculator: `calc'
- =================================
- We now modify rpcalc to handle infix operators instead of postfix.
- Infix notation involves the concept of operator precedence and the need
- for parentheses nested to arbitrary depth. Here is the Bison code for
- `calc.y', an infix desk-top calculator.
- /* Infix notation calculator--calc */
-
- %{
- #define YYSTYPE double
- #include <math.h>
- %}
-
- /* BISON Declarations */
- %token NUM
- %left '-' '+'
- %left '*' '/'
- %left NEG /* negation--unary minus */
- %right '^' /* exponentiation */
-
- /* Grammar follows */
- %%
- input: /* empty string */
- | input line
- ;
-
- line: '\n'
- | exp '\n' { printf ("\t%.10g\n", $1); }
- ;
-
- exp: NUM { $$ = $1; }
- | exp '+' exp { $$ = $1 + $3; }
- | exp '-' exp { $$ = $1 - $3; }
- | exp '*' exp { $$ = $1 * $3; }
- | exp '/' exp { $$ = $1 / $3; }
- | '-' exp %prec NEG { $$ = -$2; }
- | exp '^' exp { $$ = pow ($1, $3); }
- | '(' exp ')' { $$ = $2; }
- ;
- %%
- The functions `yylex', `yyerror' and `main' can be the same as before.
- There are two important new features shown in this code.
- In the second section (Bison declarations), `%left' declares token
- types and says they are left-associative operators. The declarations
- `%left' and `%right' (right associativity) take the place of `%token'
- which is used to declare a token type name without associativity.
- (These tokens are single-character literals, which ordinarily don't
- need to be declared. We declare them here to specify the
- associativity.)
- Operator precedence is determined by the line ordering of the
- declarations; the higher the line number of the declaration (lower on
- the page or screen), the higher the precedence. Hence, exponentiation
- has the highest precedence, unary minus (`NEG') is next, followed by
- `*' and `/', and so on. *Note Operator Precedence: Precedence.
- The other important new feature is the `%prec' in the grammar section
- for the unary minus operator. The `%prec' simply instructs Bison that
- the rule `| '-' exp' has the same precedence as `NEG'--in this case the
- next-to-highest. *Note Context-Dependent Precedence: Contextual
- Precedence.
- Here is a sample run of `calc.y':
- % calc
- 4 + 4.5 - (34/(8*3+-3))
- 6.880952381
- -56 + 2
- -54
- 3 ^ 2
- 9
- File: bison.info, Node: Simple Error Recovery, Next: Multi-function Calc, Prev: Infix Calc, Up: Examples
- Simple Error Recovery
- =====================
- Up to this point, this manual has not addressed the issue of "error
- recovery"--how to continue parsing after the parser detects a syntax
- error. All we have handled is error reporting with `yyerror'. Recall
- that by default `yyparse' returns after calling `yyerror'. This means
- that an erroneous input line causes the calculator program to exit.
- Now we show how to rectify this deficiency.
- The Bison language itself includes the reserved word `error', which
- may be included in the grammar rules. In the example below it has been
- added to one of the alternatives for `line':
- line: '\n'
- | exp '\n' { printf ("\t%.10g\n", $1); }
- | error '\n' { yyerrok; }
- ;
- This addition to the grammar allows for simple error recovery in the
- event of a parse error. If an expression that cannot be evaluated is
- read, the error will be recognized by the third rule for `line', and
- parsing will continue. (The `yyerror' function is still called upon to
- print its message as well.) The action executes the statement
- `yyerrok', a macro defined automatically by Bison; its meaning is that
- error recovery is complete (*note Error Recovery::.). Note the
- difference between `yyerrok' and `yyerror'; neither one is a misprint.
- This form of error recovery deals with syntax errors. There are
- other kinds of errors; for example, division by zero, which raises an
- exception signal that is normally fatal. A real calculator program
- must handle this signal and use `longjmp' to return to `main' and
- resume parsing input lines; it would also have to discard the rest of
- the current line of input. We won't discuss this issue further because
- it is not specific to Bison programs.
- File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Simple Error Recovery, Up: Examples
- Multi-Function Calculator: `mfcalc'
- ===================================
- Now that the basics of Bison have been discussed, it is time to move
- on to a more advanced problem. The above calculators provided only five
- functions, `+', `-', `*', `/' and `^'. It would be nice to have a
- calculator that provides other mathematical functions such as `sin',
- `cos', etc.
- It is easy to add new operators to the infix calculator as long as
- they are only single-character literals. The lexical analyzer `yylex'
- passes back all non-number characters as tokens, so new grammar rules
- suffice for adding a new operator. But we want something more
- flexible: built-in functions whose syntax has this form:
- FUNCTION_NAME (ARGUMENT)
- At the same time, we will add memory to the calculator, by allowing you
- to create named variables, store values in them, and use them later.
- Here is a sample session with the multi-function calculator:
- % acalc
- pi = 3.141592653589
- 3.1415926536
- sin(pi)
- 0.0000000000
- alpha = beta1 = 2.3
- 2.3000000000
- alpha
- 2.3000000000
- ln(alpha)
- 0.8329091229
- exp(ln(beta1))
- 2.3000000000
- %
- Note that multiple assignment and nested function calls are
- permitted.
- * Menu:
- * Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
- * Rules: Mfcalc Rules. Grammar rules for the calculator.
- * Symtab: Mfcalc Symtab. Symbol table management subroutines.
- File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Up: Multi-function Calc
- Declarations for `mfcalc'
- -------------------------
- Here are the C and Bison declarations for the multi-function
- calculator.
- %{
- #include <math.h> /* For math functions, cos(), sin(), etc. */
- #include "calc.h" /* Contains definition of `symrec' */
- %}
- %union {
- double val; /* For returning numbers. */
- symrec *tptr; /* For returning symbol-table pointers */
- }
-
- %token <val> NUM /* Simple double precision number */
- %token <tptr> VAR FNCT /* Variable and Function */
- %type <val> exp
-
- %right '='
- %left '-' '+'
- %left '*' '/'
- %left NEG /* Negation--unary minus */
- %right '^' /* Exponentiation */
-
- /* Grammar follows */
-
- %%
- The above grammar introduces only two new features of the Bison
- language. These features allow semantic values to have various data
- types (*note More Than One Value Type: Multiple Types.).
- The `%union' declaration specifies the entire list of possible types;
- this is instead of defining `YYSTYPE'. The allowable types are now
- double-floats (for `exp' and `NUM') and pointers to entries in the
- symbol table. *Note The Collection of Value Types: Union Decl.
- Since values can now have various types, it is necessary to
- associate a type with each grammar symbol whose semantic value is used.
- These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations
- are augmented with information about their data type (placed between
- angle brackets).
- The Bison construct `%type' is used for declaring nonterminal
- symbols, just as `%token' is used for declaring token types. We have
- not used `%type' before because nonterminal symbols are normally
- declared implicitly by the rules that define them. But `exp' must be
- declared explicitly so we can specify its value type. *Note
- Nonterminal Symbols: Type Decl.
- File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc
- Grammar Rules for `mfcalc'
- --------------------------
- Here are the grammar rules for the multi-function calculator. Most
- of them are copied directly from `calc'; three rules, those which
- mention `VAR' or `FNCT', are new.
- input: /* empty */
- | input line
- ;
-
- line:
- '\n'
- | exp '\n' { printf ("\t%.10g\n", $1); }
- | error '\n' { yyerrok; }
- ;
-
- exp: NUM { $$ = $1; }
- | VAR { $$ = $1->value.var; }
- | VAR '=' exp { $$ = $3; $1->value.var = $3; }
- | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); }
- | exp '+' exp { $$ = $1 + $3; }
- | exp '-' exp { $$ = $1 - $3; }
- | exp '*' exp { $$ = $1 * $3; }
- | exp '/' exp { $$ = $1 / $3; }
- | '-' exp %prec NEG { $$ = -$2; }
- | exp '^' exp { $$ = pow ($1, $3); }
- | '(' exp ')' { $$ = $2; }
- ;
- /* End of grammar */
- %%
- File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc
- The `mfcalc' Symbol Table
- -------------------------
- The multi-function calculator requires a symbol table to keep track
- of the names and meanings of variables and functions. This doesn't
- affect the grammar rules (except for the actions) or the Bison
- declarations, but it requires some additional C functions for support.
- The symbol table itself consists of a linked list of records. Its
- definition, which is kept in the header `calc.h', is as follows. It
- provides for either functions or variables to be placed in the table.
- /* Data type for links in the chain of symbols. */
- struct symrec
- {
- char *name; /* name of symbol */
- int type; /* type of symbol: either VAR or FNCT */
- union {
- double var; /* value of a VAR */
- double (*fnctptr)(); /* value of a FNCT */
- } value;
- struct symrec *next; /* link field */
- };
- typedef struct symrec symrec;
-
- /* The symbol table: a chain of `struct symrec'. */
- extern symrec *sym_table;
-
- symrec *putsym ();
- symrec *getsym ();
- The new version of `main' includes a call to `init_table', a
- function that initializes the symbol table. Here it is, and
- `init_table' as well:
- #include <stdio.h>
-
- main ()
- {
- init_table ();
- yyparse ();
- }
- yyerror (s) /* Called by yyparse on error */
- char *s;
- {
- printf ("%s\n", s);
- }
-
- struct init
- {
- char *fname;
- double (*fnct)();
- };
- struct init arith_fncts[]
- = {
- "sin", sin,
- "cos", cos,
- "atan", atan,
- "ln", log,
- "exp", exp,
- "sqrt", sqrt,
- 0, 0
- };
-
- /* The symbol table: a chain of `struct symrec'. */
- symrec *sym_table = (symrec *)0;
- init_table () /* puts arithmetic functions in table. */
- {
- int i;
- symrec *ptr;
- for (i = 0; arith_fncts[i].fname != 0; i++)
- {
- ptr = putsym (arith_fncts[i].fname, FNCT);
- ptr->value.fnctptr = arith_fncts[i].fnct;
- }
- }
- By simply editing the initialization list and adding the necessary
- include files, you can add additional functions to the calculator.
- Two important functions allow look-up and installation of symbols in
- the symbol table. The function `putsym' is passed a name and the type
- (`VAR' or `FNCT') of the object to be installed. The object is linked
- to the front of the list, and a pointer to the object is returned. The
- function `getsym' is passed the name of the symbol to look up. If
- found, a pointer to that symbol is returned; otherwise zero is returned.
- symrec *
- putsym (sym_name,sym_type)
- char *sym_name;
- int sym_type;
- {
- symrec *ptr;
- ptr = (symrec *) malloc (sizeof (symrec));
- ptr->name = (char *) malloc (strlen (sym_name) + 1);
- strcpy (ptr->name,sym_name);
- ptr->type = sym_type;
- ptr->value.var = 0; /* set value to 0 even if fctn. */
- ptr->next = (struct symrec *)sym_table;
- sym_table = ptr;
- return ptr;
- }
-
- symrec *
- getsym (sym_name)
- char *sym_name;
- {
- symrec *ptr;
- for (ptr = sym_table; ptr != (symrec *) 0;
- ptr = (symrec *)ptr->next)
- if (strcmp (ptr->name,sym_name) == 0)
- return ptr;
- return 0;
- }
- The function `yylex' must now recognize variables, numeric values,
- and the single-character arithmetic operators. Strings of alphanumeric
- characters with a leading nondigit are recognized as either variables or
- functions depending on what the symbol table says about them.
- The string is passed to `getsym' for look up in the symbol table. If
- the name appears in the table, a pointer to its location and its type
- (`VAR' or `FNCT') is returned to `yyparse'. If it is not already in
- the table, then it is installed as a `VAR' using `putsym'. Again, a
- pointer and its type (which must be `VAR') is returned to `yyparse'.
- No change is needed in the handling of numeric values and arithmetic
- operators in `yylex'.
- #include <ctype.h>
- yylex ()
- {
- int c;
-
- /* Ignore whitespace, get first nonwhite character. */
- while ((c = getchar ()) == ' ' || c == '\t');
-
- if (c == EOF)
- return 0;
- /* Char starts a number => parse the number. */
- if (c == '.' || isdigit (c))
- {
- ungetc (c, stdin);
- scanf ("%lf", &yylval.val);
- return NUM;
- }
- /* Char starts an identifier => read the name. */
- if (isalpha (c))
- {
- symrec *s;
- static char *symbuf = 0;
- static int length = 0;
- int i;
- /* Initially make the buffer long enough
- for a 40-character symbol name. */
- if (length == 0)
- length = 40, symbuf = (char *)malloc (length + 1);
-
- i = 0;
- do
- {
- /* If buffer is full, make it bigger. */
- if (i == length)
- {
- length *= 2;
- symbuf = (char *)realloc (symbuf, length + 1);
- }
- /* Add this character to the buffer. */
- symbuf[i++] = c;
- /* Get another character. */
- c = getchar ();
- }
- while (c != EOF && isalnum (c));
-
- ungetc (c, stdin);
- symbuf[i] = '\0';
- s = getsym (symbuf);
- if (s == 0)
- s = putsym (symbuf, VAR);
- yylval.tptr = s;
- return s->type;
- }
-
- /* Any other character is a token by itself. */
- return c;
- }
- This program is both powerful and flexible. You may easily add new
- functions, and it is a simple job to modify this code to install
- predefined variables such as `pi' or `e' as well.
- File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples
- Exercises
- =========
- 1. Add some new functions from `math.h' to the initialization list.
- 2. Add another array that contains constants and their values. Then
- modify `init_table' to add these constants to the symbol table.
- It will be easiest to give the constants type `VAR'.
- 3. Make the program report an error if the user refers to an
- uninitialized variable in any way except to store a value in it.
- File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top
- Bison Grammar Files
- *******************
- Bison takes as input a context-free grammar specification and
- produces a C-language function that recognizes correct instances of the
- grammar.
- The Bison grammar input file conventionally has a name ending in
- `.y'.
- * Menu:
- * Grammar Outline:: Overall layout of the grammar file.
- * Symbols:: Terminal and nonterminal symbols.
- * Rules:: How to write grammar rules.
- * Recursion:: Writing recursive rules.
- * Semantics:: Semantic values and actions.
- * Declarations:: All kinds of Bison declarations are described here.
- * Multiple Parsers:: Putting more than one Bison parser in one program.
- File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File
- Outline of a Bison Grammar
- ==========================
- A Bison grammar file has four main sections, shown here with the
- appropriate delimiters:
- %{
- C DECLARATIONS
- %}
-
- BISON DECLARATIONS
-
- %%
- GRAMMAR RULES
- %%
-
- ADDITIONAL C CODE
- Comments enclosed in `/* ... */' may appear in any of the sections.
- * Menu:
- * C Declarations:: Syntax and usage of the C declarations section.
- * Bison Declarations:: Syntax and usage of the Bison declarations section.
- * Grammar Rules:: Syntax and usage of the grammar rules section.
- * C Code:: Syntax and usage of the additional C code section.
- File: bison.info, Node: C Declarations, Next: Bison Declarations, Up: Grammar Outline
- The C Declarations Section
- --------------------------
- The C DECLARATIONS section contains macro definitions and
- declarations of functions and variables that are used in the actions in
- the grammar rules. These are copied to the beginning of the parser
- file so that they precede the definition of `yyparse'. You can use
- `#include' to get the declarations from a header file. If you don't
- need any C declarations, you may omit the `%{' and `%}' delimiters that
- bracket this section.
- File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: C Declarations, Up: Grammar Outline
- The Bison Declarations Section
- ------------------------------
- The BISON DECLARATIONS section contains declarations that define
- terminal and nonterminal symbols, specify precedence, and so on. In
- some simple grammars you may not need any declarations. *Note Bison
- Declarations: Declarations.
- File: bison.info, Node: Grammar Rules, Next: C Code, Prev: Bison Declarations, Up: Grammar Outline
- The Grammar Rules Section
- -------------------------
- The "grammar rules" section contains one or more Bison grammar
- rules, and nothing else. *Note Syntax of Grammar Rules: Rules.
- There must always be at least one grammar rule, and the first `%%'
- (which precedes the grammar rules) may never be omitted even if it is
- the first thing in the file.
- File: bison.info, Node: C Code, Prev: Grammar Rules, Up: Grammar Outline
- The Additional C Code Section
- -----------------------------
- The ADDITIONAL C CODE section is copied verbatim to the end of the
- parser file, just as the C DECLARATIONS section is copied to the
- beginning. This is the most convenient place to put anything that you
- want to have in the parser file but which need not come before the
- definition of `yyparse'. For example, the definitions of `yylex' and
- `yyerror' often go here. *Note Parser C-Language Interface: Interface.
- If the last section is empty, you may omit the `%%' that separates it
- from the grammar rules.
- The Bison parser itself contains many static variables whose names
- start with `yy' and many macros whose names start with `YY'. It is a
- good idea to avoid using any such names (except those documented in this
- manual) in the additional C code section of the grammar file.
- File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File
- Symbols, Terminal and Nonterminal
- =================================
- "Symbols" in Bison grammars represent the grammatical classifications
- of the language.
- A "terminal symbol" (also known as a "token type") represents a
- class of syntactically equivalent tokens. You use the symbol in grammar
- rules to mean that a token in that class is allowed. The symbol is
- represented in the Bison parser by a numeric code, and the `yylex'
- function returns a token type code to indicate what kind of token has
- been read. You don't need to know what the code value is; you can use
- the symbol to stand for it.
- A "nonterminal symbol" stands for a class of syntactically equivalent
- groupings. The symbol name is used in writing grammar rules. By
- convention, it should be all lower case.
- Symbol names can contain letters, digits (not at the beginning),
- underscores and periods. Periods make sense only in nonterminals.
- There are two ways of writing terminal symbols in the grammar:
- * A "named token type" is written with an identifier, like an
- identifier in C. By convention, it should be all upper case. Each
- such name must be defined with a Bison declaration such as
- `%token'. *Note Token Type Names: Token Decl.
- * A "character token type" (or "literal token") is written in the
- grammar using the same syntax used in C for character constants;
- for example, `'+'' is a character token type. A character token
- type doesn't need to be declared unless you need to specify its
- semantic value data type (*note Data Types of Semantic Values:
- Value Type.), associativity, or precedence (*note Operator
- Precedence: Precedence.).
- By convention, a character token type is used only to represent a
- token that consists of that particular character. Thus, the token
- type `'+'' is used to represent the character `+' as a token.
- Nothing enforces this convention, but if you depart from it, your
- program will confuse other readers.
- All the usual escape sequences used in character literals in C can
- be used in Bison as well, but you must not use the null character
- as a character literal because its ASCII code, zero, is the code
- `yylex' returns for end-of-input (*note Calling Convention for
- `yylex': Calling Convention.).
- How you choose to write a terminal symbol has no effect on its
- grammatical meaning. That depends only on where it appears in rules and
- on when the parser function returns that symbol.
- The value returned by `yylex' is always one of the terminal symbols
- (or 0 for end-of-input). Whichever way you write the token type in the
- grammar rules, you write it the same way in the definition of `yylex'.
- The numeric code for a character token type is simply the ASCII code for
- the character, so `yylex' can use the identical character constant to
- generate the requisite code. Each named token type becomes a C macro in
- the parser file, so `yylex' can use the name to stand for the code.
- (This is why periods don't make sense in terminal symbols.) *Note
- Calling Convention for `yylex': Calling Convention.
- If `yylex' is defined in a separate file, you need to arrange for the
- token-type macro definitions to be available there. Use the `-d'
- option when you run Bison, so that it will write these macro definitions
- into a separate header file `NAME.tab.h' which you can include in the
- other source files that need it. *Note Invoking Bison: Invocation.
- The symbol `error' is a terminal symbol reserved for error recovery
- (*note Error Recovery::.); you shouldn't use it for any other purpose.
- In particular, `yylex' should never return this value.
- File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File
- Syntax of Grammar Rules
- =======================
- A Bison grammar rule has the following general form:
- RESULT: COMPONENTS...
- ;
- where RESULT is the nonterminal symbol that this rule describes and
- COMPONENTS are various terminal and nonterminal symbols that are put
- together by this rule (*note Symbols::.).
- For example,
- exp: exp '+' exp
- ;
- says that two groupings of type `exp', with a `+' token in between, can
- be combined into a larger grouping of type `exp'.
- Whitespace in rules is significant only to separate symbols. You
- can add extra whitespace as you wish.
- Scattered among the components can be ACTIONS that determine the
- semantics of the rule. An action looks like this:
- {C STATEMENTS}
- Usually there is only one action and it follows the components. *Note
- Actions::.
- Multiple rules for the same RESULT can be written separately or can
- be joined with the vertical-bar character `|' as follows:
- RESULT: RULE1-COMPONENTS...
- | RULE2-COMPONENTS...
- ...
- ;
- They are still considered distinct rules even when joined in this way.
- If COMPONENTS in a rule is empty, it means that RESULT can match the
- empty string. For example, here is how to define a comma-separated
- sequence of zero or more `exp' groupings:
- expseq: /* empty */
- | expseq1
- ;
-
- expseq1: exp
- | expseq1 ',' exp
- ;
- It is customary to write a comment `/* empty */' in each rule with no
- components.
- File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File
- Recursive Rules
- ===============
- A rule is called "recursive" when its RESULT nonterminal appears
- also on its right hand side. Nearly all Bison grammars need to use
- recursion, because that is the only way to define a sequence of any
- number of somethings. Consider this recursive definition of a
- comma-separated sequence of one or more expressions:
- expseq1: exp
- | expseq1 ',' exp
- ;
- Since the recursive use of `expseq1' is the leftmost symbol in the
- right hand side, we call this "left recursion". By contrast, here the
- same construct is defined using "right recursion":
- expseq1: exp
- | exp ',' expseq1
- ;
- Any kind of sequence can be defined using either left recursion or
- right recursion, but you should always use left recursion, because it
- can parse a sequence of any number of elements with bounded stack
- space. Right recursion uses up space on the Bison stack in proportion
- to the number of elements in the sequence, because all the elements
- must be shifted onto the stack before the rule can be applied even
- once. *Note The Bison Parser Algorithm: Algorithm, for further
- explanation of this.
- "Indirect" or "mutual" recursion occurs when the result of the rule
- does not appear directly on its right hand side, but does appear in
- rules for other nonterminals which do appear on its right hand side.
- For example:
- expr: primary
- | primary '+' primary
- ;
-
- primary: constant
- | '(' expr ')'
- ;
- defines two mutually-recursive nonterminals, since each refers to the
- other.
- File: bison.info, Node: Semantics, Next: Declarations, Prev: Recursion, Up: Grammar File
- Defining Language Semantics
- ===========================
- The grammar rules for a language determine only the syntax. The
- semantics are determined by the semantic values associated with various
- tokens and groupings, and by the actions taken when various groupings
- are recognized.
- For example, the calculator calculates properly because the value
- associated with each expression is the proper number; it adds properly
- because the action for the grouping `X + Y' is to add the numbers
- associated with X and Y.
- * Menu:
- * Value Type:: Specifying one data type for all semantic values.
- * Multiple Types:: Specifying several alternative data types.
- * Actions:: An action is the semantic definition of a grammar rule.
- * Action Types:: Specifying data types for actions to operate on.
- * Mid-Rule Actions:: Most actions go at the end of a rule.
- This says when, why and how to use the exceptional
- action in the middle of a rule.
- File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics
- Data Types of Semantic Values
- -----------------------------
- In a simple program it may be sufficient to use the same data type
- for the semantic values of all language constructs. This was true in
- the RPN and infix calculator examples (*note Reverse Polish Notation
- Calculator: RPN Calc.).
- Bison's default is to use type `int' for all semantic values. To
- specify some other type, define `YYSTYPE' as a macro, like this:
- #define YYSTYPE double
- This macro definition must go in the C declarations section of the
- grammar file (*note Outline of a Bison Grammar: Grammar Outline.).
- File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics
- More Than One Value Type
- ------------------------
- In most programs, you will need different data types for different
- kinds of tokens and groupings. For example, a numeric constant may
- need type `int' or `long', while a string constant needs type `char *',
- and an identifier might need a pointer to an entry in the symbol table.
- To use more than one data type for semantic values in one parser,
- Bison requires you to do two things:
- * Specify the entire collection of possible data types, with the
- `%union' Bison declaration (*note The Collection of Value Types:
- Union Decl.).
- * Choose one of those types for each symbol (terminal or nonterminal)
- for which semantic values are used. This is done for tokens with
- the `%token' Bison declaration (*note Token Type Names: Token
- Decl.) and for groupings with the `%type' Bison declaration (*note
- Nonterminal Symbols: Type Decl.).
- File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics
- Actions
- -------
- An action accompanies a syntactic rule and contains C code to be
- executed each time an instance of that rule is recognized. The task of
- most actions is to compute a semantic value for the grouping built by
- the rule from the semantic values associated with tokens or smaller
- groupings.
- An action consists of C statements surrounded by braces, much like a
- compound statement in C. It can be placed at any position in the rule;
- it is executed at that position. Most rules have just one action at
- the end of the rule, following all the components. Actions in the
- middle of a rule are tricky and used only for special purposes (*note
- Actions in Mid-Rule: Mid-Rule Actions.).
- The C code in an action can refer to the semantic values of the
- components matched by the rule with the construct `$N', which stands for
- the value of the Nth component. The semantic value for the grouping
- being constructed is `$$'. (Bison translates both of these constructs
- into array element references when it copies the actions into the parser
- file.)
- Here is a typical example:
- exp: ...
- | exp '+' exp
- { $$ = $1 + $3; }
- This rule constructs an `exp' from two smaller `exp' groupings
- connected by a plus-sign token. In the action, `$1' and `$3' refer to
- the semantic values of the two component `exp' groupings, which are the
- first and third symbols on the right hand side of the rule. The sum is
- stored into `$$' so that it becomes the semantic value of the
- addition-expression just recognized by the rule. If there were a
- useful semantic value associated with the `+' token, it could be
- referred to as `$2'.
- If you don't specify an action for a rule, Bison supplies a default:
- `$$ = $1'. Thus, the value of the first symbol in the rule becomes the
- value of the whole rule. Of course, the default rule is valid only if
- the two data types match. There is no meaningful default action for an
- empty rule; every empty rule must have an explicit action unless the
- rule's value does not matter.
- `$N' with N zero or negative is allowed for reference to tokens and
- groupings on the stack *before* those that match the current rule.
- This is a very risky practice, and to use it reliably you must be
- certain of the context in which the rule is applied. Here is a case in
- which you can use this reliably:
- foo: expr bar '+' expr { ... }
- | expr bar '-' expr { ... }
- ;
-
- bar: /* empty */
- { previous_expr = $0; }
- ;
- As long as `bar' is used only in the fashion shown here, `$0' always
- refers to the `expr' which precedes `bar' in the definition of `foo'.
- File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics
- Data Types of Values in Actions
- -------------------------------
- If you have chosen a single data type for semantic values, the `$$'
- and `$N' constructs always have that data type.
- If you have used `%union' to specify a variety of data types, then
- you must declare a choice among these types for each terminal or
- nonterminal symbol that can have a semantic value. Then each time you
- use `$$' or `$N', its data type is determined by which symbol it refers
- to in the rule. In this example,
- exp: ...
- | exp '+' exp
- { $$ = $1 + $3; }
- `$1' and `$3' refer to instances of `exp', so they all have the data
- type declared for the nonterminal symbol `exp'. If `$2' were used, it
- would have the data type declared for the terminal symbol `'+'',
- whatever that might be.
- Alternatively, you can specify the data type when you refer to the
- value, by inserting `<TYPE>' after the `$' at the beginning of the
- reference. For example, if you have defined types as shown here:
- %union {
- int itype;
- double dtype;
- }
- then you can write `$<itype>1' to refer to the first subunit of the
- rule as an integer, or `$<dtype>1' to refer to it as a double.
|