  1. This is Info file bison.info, produced by Makeinfo-1.55 from the input
  2. file ./bison.texinfo.
  3. This file documents the Bison parser generator.
  4. Copyright (C) 1988, 89, 90, 91, 92, 93, 1995 Free Software
  5. Foundation, Inc.
  6. Permission is granted to make and distribute verbatim copies of this
  7. manual provided the copyright notice and this permission notice are
  8. preserved on all copies.
  9. Permission is granted to copy and distribute modified versions of
  10. this manual under the conditions for verbatim copying, provided also
  11. that the sections entitled "GNU General Public License" and "Conditions
  12. for Using Bison" are included exactly as in the original, and provided
  13. that the entire resulting derived work is distributed under the terms
  14. of a permission notice identical to this one.
  15. Permission is granted to copy and distribute translations of this
  16. manual into another language, under the above conditions for modified
  17. versions, except that the sections entitled "GNU General Public
  18. License", "Conditions for Using Bison" and this permission notice may be
  19. included in translations approved by the Free Software Foundation
  20. instead of in the original English.
  21. 
  22. File: bison.info, Node: Reduce/Reduce, Next: Mystery Conflicts, Prev: Parser States, Up: Algorithm
  23. Reduce/Reduce Conflicts
  24. =======================
  25. A reduce/reduce conflict occurs if there are two or more rules that
  26. apply to the same sequence of input. This usually indicates a serious
  27. error in the grammar.
  28. For example, here is an erroneous attempt to define a sequence of
  29. zero or more `word' groupings.
     sequence: /* empty */
                     { printf ("empty sequence\n"); }
             | maybeword
             | sequence word
                     { printf ("added word %s\n", $2); }
             ;

     maybeword: /* empty */
                     { printf ("empty maybeword\n"); }
             | word
                     { printf ("single word %s\n", $1); }
             ;
  41. The error is an ambiguity: there is more than one way to parse a single
  42. `word' into a `sequence'. It could be reduced to a `maybeword' and
  43. then into a `sequence' via the second rule. Alternatively,
  44. nothing-at-all could be reduced into a `sequence' via the first rule,
  45. and this could be combined with the `word' using the third rule for
  46. `sequence'.
  47. There is also more than one way to reduce nothing-at-all into a
  48. `sequence'. This can be done directly via the first rule, or
  49. indirectly via `maybeword' and then the second rule.
  50. You might think that this is a distinction without a difference,
  51. because it does not change whether any particular input is valid or
  52. not. But it does affect which actions are run. One parsing order runs
  53. the second rule's action; the other runs the first rule's action and
  54. the third rule's action. In this example, the output of the program
  55. changes.
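To make the difference concrete, the following C sketch replays the
text that the actions would print for the single-word input under each
of the two parse orders. The function names are hypothetical and exist
only for this illustration; they are not part of Bison or of the
grammar above.

```c
#include <stdio.h>

/* Hypothetical helpers: each writes into `out' the text that the
   actions of the erroneous grammar would print for one parse order.  */

/* Order 1: `word' is reduced to `maybeword' (which prints), and that is
   reduced to `sequence' by the second rule, which has no action.  */
static void trace_via_maybeword(const char *word, char *out, size_t size)
{
    snprintf(out, size, "single word %s\n", word);
}

/* Order 2: nothing-at-all is reduced to `sequence' by the first rule,
   then combined with `word' by the third rule; both actions print.  */
static void trace_via_empty_sequence(const char *word, char *out, size_t size)
{
    snprintf(out, size, "empty sequence\nadded word %s\n", word);
}
```

The same input is accepted either way, yet the two traces differ, which
is why the conflict cannot be ignored.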
  56. Bison resolves a reduce/reduce conflict by choosing to use the rule
  57. that appears first in the grammar, but it is very risky to rely on
  58. this. Every reduce/reduce conflict must be studied and usually
  59. eliminated. Here is the proper way to define `sequence':
     sequence: /* empty */
                     { printf ("empty sequence\n"); }
             | sequence word
                     { printf ("added word %s\n", $2); }
             ;
  65. Here is another common error that yields a reduce/reduce conflict:
  66. sequence: /* empty */
  67. | sequence words
  68. | sequence redirects
  69. ;
  70. words: /* empty */
  71. | words word
  72. ;
  73. redirects:/* empty */
  74. | redirects redirect
  75. ;
  76. The intention here is to define a sequence which can contain either
  77. `word' or `redirect' groupings. The individual definitions of
  78. `sequence', `words' and `redirects' are error-free, but the three
  79. together make a subtle ambiguity: even an empty input can be parsed in
  80. infinitely many ways!
  81. Consider: nothing-at-all could be a `words'. Or it could be two
  82. `words' in a row, or three, or any number. It could equally well be a
  83. `redirects', or two, or any number. Or it could be a `words' followed
  84. by three `redirects' and another `words'. And so on.
  85. Here are two ways to correct these rules. First, to make it a
  86. single level of sequence:
  87. sequence: /* empty */
  88. | sequence word
  89. | sequence redirect
  90. ;
  91. Second, to prevent either a `words' or a `redirects' from being
  92. empty:
  93. sequence: /* empty */
  94. | sequence words
  95. | sequence redirects
  96. ;
  97. words: word
  98. | words word
  99. ;
  100. redirects:redirect
  101. | redirects redirect
  102. ;
  103. 
  104. File: bison.info, Node: Mystery Conflicts, Next: Stack Overflow, Prev: Reduce/Reduce, Up: Algorithm
  105. Mysterious Reduce/Reduce Conflicts
  106. ==================================
  107. Sometimes reduce/reduce conflicts can occur that don't look
  108. warranted. Here is an example:
     %token ID

     %%
     def:    param_spec return_spec ','
             ;
     param_spec:
                  type
             |    name_list ':' type
             ;
     return_spec:
                  type
             |    name ':' type
             ;
     type:        ID
             ;
     name:        ID
             ;
     name_list:
                  name
             |    name ',' name_list
             ;
  129. It would seem that this grammar can be parsed with only a single
  130. token of look-ahead: when a `param_spec' is being read, an `ID' is a
  131. `name' if a comma or colon follows, or a `type' if another `ID'
  132. follows. In other words, this grammar is LR(1).
  133. However, Bison, like most parser generators, cannot actually handle
  134. all LR(1) grammars. In this grammar, two contexts, that after an `ID'
  135. at the beginning of a `param_spec' and likewise at the beginning of a
  136. `return_spec', are similar enough that Bison assumes they are the same.
  137. They appear similar because the same set of rules would be active--the
  138. rule for reducing to a `name' and that for reducing to a `type'. Bison
  139. is unable to determine at that stage of processing that the rules would
  140. require different look-ahead tokens in the two contexts, so it makes a
  141. single parser state for them both. Combining the two contexts causes a
  142. conflict later. In parser terminology, this occurrence means that the
  143. grammar is not LALR(1).
  144. In general, it is better to fix deficiencies than to document them.
  145. But this particular deficiency is intrinsically hard to fix; parser
  146. generators that can handle LR(1) grammars are hard to write and tend to
  147. produce parsers that are very large. In practice, Bison is more useful
  148. as it is now.
  149. When the problem arises, you can often fix it by identifying the two
  150. parser states that are being confused, and adding something to make them
  151. look distinct. In the above example, adding one rule to `return_spec'
  152. as follows makes the problem go away:
  153. %token BOGUS
  154. ...
  155. %%
  156. ...
  157. return_spec:
  158. type
  159. | name ':' type
  160. /* This rule is never used. */
  161. | ID BOGUS
  162. ;
  163. This corrects the problem because it introduces the possibility of an
  164. additional active rule in the context after the `ID' at the beginning of
  165. `return_spec'. This rule is not active in the corresponding context in
  166. a `param_spec', so the two contexts receive distinct parser states. As
  167. long as the token `BOGUS' is never generated by `yylex', the added rule
  168. cannot alter the way actual input is parsed.
  169. In this particular example, there is another way to solve the
  170. problem: rewrite the rule for `return_spec' to use `ID' directly
  171. instead of via `name'. This also causes the two confusing contexts to
  172. have different sets of active rules, because the one for `return_spec'
  173. activates the altered rule for `return_spec' rather than the one for
  174. `name'.
  175. param_spec:
  176. type
  177. | name_list ':' type
  178. ;
  179. return_spec:
  180. type
  181. | ID ':' type
  182. ;
  183. 
  184. File: bison.info, Node: Stack Overflow, Prev: Mystery Conflicts, Up: Algorithm
  185. Stack Overflow, and How to Avoid It
  186. ===================================
  187. The Bison parser stack can overflow if too many tokens are shifted
  188. and not reduced. When this happens, the parser function `yyparse'
  189. returns a nonzero value, pausing only to call `yyerror' to report the
  190. overflow.
  191. By defining the macro `YYMAXDEPTH', you can control how deep the
  192. parser stack can become before a stack overflow occurs. Define the
  193. macro with a value that is an integer. This value is the maximum number
  194. of tokens that can be shifted (and not reduced) before overflow. It
  195. must be a constant expression whose value is known at compile time.
  196. The stack space allowed is not necessarily allocated. If you
  197. specify a large value for `YYMAXDEPTH', the parser actually allocates a
  198. small stack at first, and then makes it bigger by stages as needed.
  199. This increasing allocation happens automatically and silently.
  200. Therefore, you do not need to make `YYMAXDEPTH' painfully small merely
  201. to save space for ordinary inputs that do not need much stack.
  202. The default value of `YYMAXDEPTH', if you do not define it, is 10000.
  203. You can control how much stack is allocated initially by defining the
  204. macro `YYINITDEPTH'. This value too must be a compile-time constant
  205. integer. The default is 200.
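The relationship between the two macros can be pictured with a small C
model. This is only a sketch under stated assumptions: the structure,
the function name and the doubling policy are hypothetical, chosen to
illustrate "small at first, bigger by stages", and are not the code
that Bison generates. The macro values shown are the defaults given
above.

```c
#include <stdlib.h>

#define YYINITDEPTH 200     /* initial allocation (documented default) */
#define YYMAXDEPTH  10000   /* hard limit (documented default) */

/* Hypothetical stand-in for the parser's state stack.  */
struct sketch_stack {
    short *base;
    size_t depth;      /* entries in use */
    size_t capacity;   /* entries allocated */
};

/* Push one state.  Returns 0 on success, or -1 on stack overflow
   (or if memory runs out).  */
static int sketch_push(struct sketch_stack *s, short state)
{
    if (s->depth == s->capacity) {
        size_t new_cap;
        short *p;

        if (s->capacity >= YYMAXDEPTH)
            return -1;                  /* overflow: limit reached */
        /* Assumed growth policy: start small, then double.  */
        new_cap = s->capacity ? s->capacity * 2 : YYINITDEPTH;
        if (new_cap > YYMAXDEPTH)
            new_cap = YYMAXDEPTH;
        p = realloc(s->base, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->base = p;
        s->capacity = new_cap;
    }
    s->base[s->depth++] = state;
    return 0;
}
```

In this model an input that never nests deeply never allocates more
than `YYINITDEPTH' entries, so a generous `YYMAXDEPTH' costs nothing
for ordinary inputs.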
  206. 
  207. File: bison.info, Node: Error Recovery, Next: Context Dependency, Prev: Algorithm, Up: Top
  208. Error Recovery
  209. **************
  210. It is not usually acceptable to have a program terminate on a parse
  211. error. For example, a compiler should recover sufficiently to parse the
  212. rest of the input file and check it for errors; a calculator should
  213. accept another expression.
  214. In a simple interactive command parser where each input is one line,
  215. it may be sufficient to allow `yyparse' to return 1 on error and have
  216. the caller ignore the rest of the input line when that happens (and
  217. then call `yyparse' again). But this is inadequate for a compiler,
  218. because it forgets all the syntactic context leading up to the error.
  219. A syntax error deep within a function in the compiler input should not
  220. cause the compiler to treat the following line like the beginning of a
  221. source file.
  222. You can define how to recover from a syntax error by writing rules to
  223. recognize the special token `error'. This is a terminal symbol that is
  224. always defined (you need not declare it) and reserved for error
  225. handling. The Bison parser generates an `error' token whenever a
  226. syntax error happens; if you have provided a rule to recognize this
  227. token in the current context, the parse can continue.
  228. For example:
     stmnts: /* empty string */
             | stmnts '\n'
             | stmnts exp '\n'
             | stmnts error '\n'
             ;
  233. The fourth rule in this example says that an error followed by a
  234. newline makes a valid addition to any `stmnts'.
  235. What happens if a syntax error occurs in the middle of an `exp'? The
  236. error recovery rule, interpreted strictly, applies to the precise
  237. sequence of a `stmnts', an `error' and a newline. If an error occurs in
  238. the middle of an `exp', there will probably be some additional tokens
  239. and subexpressions on the stack after the last `stmnts', and there will
  240. be tokens to read before the next newline. So the rule is not
  241. applicable in the ordinary way.
  242. But Bison can force the situation to fit the rule, by discarding
  243. part of the semantic context and part of the input. First it discards
  244. states and objects from the stack until it gets back to a state in
  245. which the `error' token is acceptable. (This means that the
  246. subexpressions already parsed are discarded, back to the last complete
  247. `stmnts'.) At this point the `error' token can be shifted. Then, if
  248. the old look-ahead token is not acceptable to be shifted next, the
  249. parser reads tokens and discards them until it finds a token which is
  250. acceptable. In this example, Bison reads and discards input until the
  251. next newline so that the fourth rule can apply.
  252. The choice of error rules in the grammar is a choice of strategies
  253. for error recovery. A simple and useful strategy is simply to skip the
  254. rest of the current input line or current statement if an error is
  255. detected:
  256. stmnt: error ';' /* on error, skip until ';' is read */
  257. It is also useful to recover to the matching close-delimiter of an
  258. opening-delimiter that has already been parsed. Otherwise the
  259. close-delimiter will probably appear to be unmatched, and generate
  260. another, spurious error message:
  261. primary: '(' expr ')'
  262. | '(' error ')'
  263. ...
  264. ;
  265. Error recovery strategies are necessarily guesses. When they guess
  266. wrong, one syntax error often leads to another. In the above example,
  267. the error recovery rule guesses that an error is due to bad input
  268. within one `stmnt'. Suppose that instead a spurious semicolon is
  269. inserted in the middle of a valid `stmnt'. After the error recovery
  270. rule recovers from the first error, another syntax error will be found
  271. straightaway, since the text following the spurious semicolon is also
  272. an invalid `stmnt'.
  273. To prevent an outpouring of error messages, the parser will output
  274. no error message for another syntax error that happens shortly after
  275. the first; only after three consecutive input tokens have been
  276. successfully shifted will error messages resume.
  277. Note that rules which accept the `error' token may have actions, just
  278. as any other rules can.
  279. You can make error messages resume immediately by using the macro
  280. `yyerrok' in an action. If you do this in the error rule's action, no
  281. error messages will be suppressed. This macro requires no arguments;
  282. `yyerrok;' is a valid C statement.
  283. The previous look-ahead token is reanalyzed immediately after an
  284. error. If this is unacceptable, then the macro `yyclearin' may be used
  285. to clear this token. Write the statement `yyclearin;' in the error
  286. rule's action.
  287. For example, suppose that on a parse error, an error handling
  288. routine is called that advances the input stream to some point where
  289. parsing should once again commence. The next symbol returned by the
  290. lexical scanner is probably correct. The previous look-ahead token
  291. ought to be discarded with `yyclearin;'.
  292. The macro `YYRECOVERING' stands for an expression that has the value
  293. 1 when the parser is recovering from a syntax error, and 0 the rest of
  294. the time. A value of 1 indicates that error messages are currently
  295. suppressed for new syntax errors.
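The interplay of message suppression, `yyerrok' and `YYRECOVERING' can
be modelled with a simple counter. The C sketch below is a hypothetical
model written for this explanation: the structure and function names
are invented, and it makes no claim about how the generated parser
stores this state internally.

```c
/* Hypothetical model of the suppression described above.  */
struct recovery {
    int shifts_needed;   /* > 0 while recovering from an error */
    int reported;        /* number of error messages issued so far */
};

/* Stand-in for the `YYRECOVERING' test: 1 while recovering, else 0.  */
static int recovering(const struct recovery *r)
{
    return r->shifts_needed > 0;
}

/* A syntax error was detected: report it unless still recovering,
   then require three successful shifts before reporting again.  */
static void on_syntax_error(struct recovery *r)
{
    if (!recovering(r))
        r->reported++;
    r->shifts_needed = 3;
}

/* An input token was shifted successfully.  */
static void on_shift(struct recovery *r)
{
    if (r->shifts_needed > 0)
        r->shifts_needed--;
}

/* Stand-in for `yyerrok;': leave recovery mode at once.  */
static void errok(struct recovery *r)
{
    r->shifts_needed = 0;
}
```

In this model a second error after only two shifts is silent, while an
error that follows a call to `errok' is reported immediately.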
  296. 
  297. File: bison.info, Node: Context Dependency, Next: Debugging, Prev: Error Recovery, Up: Top
  298. Handling Context Dependencies
  299. *****************************
  300. The Bison paradigm is to parse tokens first, then group them into
  301. larger syntactic units. In many languages, the meaning of a token is
  302. affected by its context. Although this violates the Bison paradigm,
  303. certain techniques (known as "kludges") may enable you to write Bison
  304. parsers for such languages.
  305. * Menu:
  306. * Semantic Tokens:: Token parsing can depend on the semantic context.
  307. * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
  308. * Tie-in Recovery:: Lexical tie-ins have implications for how
  309. error recovery rules must be written.
  310. (Actually, "kludge" means any technique that gets its job done but is
  311. neither clean nor robust.)
  312. 
  313. File: bison.info, Node: Semantic Tokens, Next: Lexical Tie-ins, Up: Context Dependency
  314. Semantic Info in Token Types
  315. ============================
  316. The C language has a context dependency: the way an identifier is
  317. used depends on what its current meaning is. For example, consider
  318. this:
  319. foo (x);
  320. This looks like a function call statement, but if `foo' is a typedef
  321. name, then this is actually a declaration of `x'. How can a Bison
  322. parser for C decide how to parse this input?
  323. The method used in GNU C is to have two different token types,
  324. `IDENTIFIER' and `TYPENAME'. When `yylex' finds an identifier, it
  325. looks up the current declaration of the identifier in order to decide
  326. which token type to return: `TYPENAME' if the identifier is declared as
  327. a typedef, `IDENTIFIER' otherwise.
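The lookup that `yylex' performs might be sketched in C as follows.
Everything here is an assumption made for illustration: the token code
values, the fixed-size table and the function names are hypothetical,
and a real compiler would consult its full symbol table (with scoping)
instead.

```c
#include <string.h>

/* Hypothetical token codes; in a real parser Bison assigns them.  */
enum { IDENTIFIER = 258, TYPENAME = 259 };

/* Hypothetical fixed-size table of names currently declared as
   typedefs.  The strings are not copied, so they must outlive it.  */
#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Record that `name' is now a typedef name.
   Returns 0 on success, or -1 if the table is full.  */
static int declare_typedef(const char *name)
{
    if (typedef_count == MAX_TYPEDEFS)
        return -1;
    typedef_names[typedef_count++] = name;
    return 0;
}

/* The token type `yylex' would return for an identifier just scanned.  */
static int classify_identifier(const char *name)
{
    int i;

    for (i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPENAME;
    return IDENTIFIER;
}
```

The action for a `typedef' declaration would call something like
`declare_typedef', so that later occurrences of the name come back
from the lexer as `TYPENAME'.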
  328. The grammar rules can then express the context dependency by the
  329. choice of token type to recognize. `IDENTIFIER' is accepted as an
  330. expression, but `TYPENAME' is not. `TYPENAME' can start a declaration,
  331. but `IDENTIFIER' cannot. In contexts where the meaning of the
  332. identifier is *not* significant, such as in declarations that can
  333. shadow a typedef name, either `TYPENAME' or `IDENTIFIER' is
  334. accepted--there is one rule for each of the two token types.
  335. This technique is simple to use if the decision of which kinds of
  336. identifiers to allow is made at a place close to where the identifier is
  337. parsed. But in C this is not always so: C allows a declaration to
  338. redeclare a typedef name provided an explicit type has been specified
  339. earlier:
  340. typedef int foo, bar, lose;
  341. static foo (bar); /* redeclare `bar' as static variable */
  342. static int foo (lose); /* redeclare `foo' as function */
  343. Unfortunately, the name being declared is separated from the
  344. declaration construct itself by a complicated syntactic structure--the
  345. "declarator".
As a result, part of the Bison parser for C needs to be duplicated,
  347. with all the nonterminal names changed: once for parsing a declaration
  348. in which a typedef name can be redefined, and once for parsing a
  349. declaration in which that can't be done. Here is a part of the
  350. duplication, with actions omitted for brevity:
  351. initdcl:
  352. declarator maybeasm '='
  353. init
  354. | declarator maybeasm
  355. ;
  356. notype_initdcl:
  357. notype_declarator maybeasm '='
  358. init
  359. | notype_declarator maybeasm
  360. ;
  361. Here `initdcl' can redeclare a typedef name, but `notype_initdcl'
  362. cannot. The distinction between `declarator' and `notype_declarator'
  363. is the same sort of thing.
  364. There is some similarity between this technique and a lexical tie-in
  365. (described next), in that information which alters the lexical analysis
  366. is changed during parsing by other parts of the program. The
difference is that here the information is global, and is used for other
  368. purposes in the program. A true lexical tie-in has a special-purpose
  369. flag controlled by the syntactic context.
  370. 
  371. File: bison.info, Node: Lexical Tie-ins, Next: Tie-in Recovery, Prev: Semantic Tokens, Up: Context Dependency
  372. Lexical Tie-ins
  373. ===============
  374. One way to handle context-dependency is the "lexical tie-in": a flag
  375. which is set by Bison actions, whose purpose is to alter the way tokens
  376. are parsed.
  377. For example, suppose we have a language vaguely like C, but with a
  378. special construct `hex (HEX-EXPR)'. After the keyword `hex' comes an
  379. expression in parentheses in which all integers are hexadecimal. In
  380. particular, the token `a1b' must be treated as an integer rather than
  381. as an identifier if it appears in that context. Here is how you can do
  382. it:
     %{
     int hexflag;
     %}
     %%
     ...
     expr:   IDENTIFIER
             | constant
             | HEX '('
                     { hexflag = 1; }
               expr ')'
                     { hexflag = 0;
                        $$ = $4; }
             | expr '+' expr
                     { $$ = make_sum ($1, $3); }
             ...
             ;

     constant:
               INTEGER
             | STRING
             ;
  403. Here we assume that `yylex' looks at the value of `hexflag'; when it is
  404. nonzero, all integers are parsed in hexadecimal, and tokens starting
  405. with letters are parsed as integers if possible.
  406. The declaration of `hexflag' shown in the C declarations section of
  407. the parser file is needed to make it accessible to the actions (*note
  408. The C Declarations Section: C Declarations.). You must also write the
  409. code in `yylex' to obey the flag.
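The part of `yylex' that obeys the flag might look like the C sketch
below. It is a minimal illustration under stated assumptions: the
token code values and the helper name `classify_word' are hypothetical,
the word is assumed to consist only of letters and digits, and
`hexflag' is defined here only so that the fragment stands alone (in a
real parser the definition in the grammar file is the one to use).

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; in a real parser Bison assigns them.  */
enum { IDENTIFIER = 258, INTEGER = 259 };

int hexflag;   /* set and cleared by the grammar actions */

/* Classify one alphanumeric word the way `yylex' might.  When the
   result is INTEGER, the number is stored in *value.  A word that
   cannot be read entirely as a number is treated as an identifier.  */
static int classify_word(const char *text, long *value)
{
    char *end;
    int base = hexflag ? 16 : 10;

    /* Without the flag, only words starting with a digit are tried as
       numbers; with it, words starting with letters are tried too.  */
    if (hexflag || isdigit((unsigned char) text[0])) {
        long v = strtol(text, &end, base);
        if (end != text && *end == '\0') {
            *value = v;
            return INTEGER;
        }
    }
    return IDENTIFIER;
}
```

With `hexflag' clear, `a1b' is an `IDENTIFIER'; once the action after
`HEX '('' has set the flag, the same text is returned as an `INTEGER'.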
  410. 
  411. File: bison.info, Node: Tie-in Recovery, Prev: Lexical Tie-ins, Up: Context Dependency
  412. Lexical Tie-ins and Error Recovery
  413. ==================================
  414. Lexical tie-ins make strict demands on any error recovery rules you
  415. have. *Note Error Recovery::.
  416. The reason for this is that the purpose of an error recovery rule is
  417. to abort the parsing of one construct and resume in some larger
  418. construct. For example, in C-like languages, a typical error recovery
  419. rule is to skip tokens until the next semicolon, and then start a new
  420. statement, like this:
     stmt:   expr ';'
             | IF '(' expr ')' stmt { ... }
             ...
             | error ';'
                     { hexflag = 0; }
             ;
  427. If there is a syntax error in the middle of a `hex (EXPR)'
  428. construct, this error rule will apply, and then the action for the
  429. completed `hex (EXPR)' will never run. So `hexflag' would remain set
  430. for the entire rest of the input, or until the next `hex' keyword,
  431. causing identifiers to be misinterpreted as integers.
  432. To avoid this problem the error recovery rule itself clears
  433. `hexflag'.
  434. There may also be an error recovery rule that works within
  435. expressions. For example, there could be a rule which applies within
  436. parentheses and skips to the close-parenthesis:
  437. expr: ...
  438. | '(' expr ')'
  439. { $$ = $2; }
  440. | '(' error ')'
  441. ...
  442. If this rule acts within the `hex' construct, it is not going to
  443. abort that construct (since it applies to an inner level of parentheses
  444. within the construct). Therefore, it should not clear the flag: the
  445. rest of the `hex' construct should be parsed with the flag still in
  446. effect.
  447. What if there is an error recovery rule which might abort out of the
  448. `hex' construct or might not, depending on circumstances? There is no
  449. way you can write the action to determine whether a `hex' construct is
  450. being aborted or not. So if you are using a lexical tie-in, you had
  451. better make sure your error recovery rules are not of this kind. Each
  452. rule must be such that you can be sure that it always will, or always
  453. won't, have to clear the flag.
  454. 
  455. File: bison.info, Node: Debugging, Next: Invocation, Prev: Context Dependency, Up: Top
  456. Debugging Your Parser
  457. *********************
  458. If a Bison grammar compiles properly but doesn't do what you want
  459. when it runs, the `yydebug' parser-trace feature can help you figure
  460. out why.
  461. To enable compilation of trace facilities, you must define the macro
  462. `YYDEBUG' when you compile the parser. You could use `-DYYDEBUG=1' as
  463. a compiler option or you could put `#define YYDEBUG 1' in the C
  464. declarations section of the grammar file (*note The C Declarations
  465. Section: C Declarations.). Alternatively, use the `-t' option when you
run Bison (*note Invoking Bison: Invocation.). We suggest that you always
define `YYDEBUG' so that debugging is always possible.
  468. The trace facility uses `stderr', so you must add
  469. `#include <stdio.h>' to the C declarations section unless it is already
  470. there.
  471. Once you have compiled the program with trace facilities, the way to
  472. request a trace is to store a nonzero value in the variable `yydebug'.
  473. You can do this by making the C code do it (in `main', perhaps), or you
  474. can alter the value with a C debugger.
  475. Each step taken by the parser when `yydebug' is nonzero produces a
  476. line or two of trace information, written on `stderr'. The trace
  477. messages tell you these things:
  478. * Each time the parser calls `yylex', what kind of token was read.
  479. * Each time a token is shifted, the depth and complete contents of
  480. the state stack (*note Parser States::.).
  481. * Each time a rule is reduced, which rule it is, and the complete
  482. contents of the state stack afterward.
  483. To make sense of this information, it helps to refer to the listing
  484. file produced by the Bison `-v' option (*note Invoking Bison:
  485. Invocation.). This file shows the meaning of each state in terms of
  486. positions in various rules, and also what each state will do with each
  487. possible input token. As you read the successive trace messages, you
  488. can see that the parser is functioning according to its specification
  489. in the listing file. Eventually you will arrive at the place where
  490. something undesirable happens, and you will see which parts of the
  491. grammar are to blame.
  492. The parser file is a C program and you can use C debuggers on it,
  493. but it's not easy to interpret what it is doing. The parser function
  494. is a finite-state machine interpreter, and aside from the actions it
  495. executes the same code over and over. Only the values of variables
  496. show where in the grammar it is working.
  497. The debugging information normally gives the token type of each token
  498. read, but not its semantic value. You can optionally define a macro
  499. named `YYPRINT' to provide a way to print the value. If you define
  500. `YYPRINT', it should take three arguments. The parser will pass a
  501. standard I/O stream, the numeric code for the token type, and the token
  502. value (from `yylval').
  503. Here is an example of `YYPRINT' suitable for the multi-function
  504. calculator (*note Declarations for `mfcalc': Mfcalc Decl.):
     #define YYPRINT(file, type, value)   yyprint (file, type, value)

     /* Print the semantic value of a VAR or NUM token on FILE.  */
     static void
     yyprint (FILE *file, int type, YYSTYPE value)
     {
       if (type == VAR)
         fprintf (file, " %s", value.tptr->name);
       else if (type == NUM)
         /* `val' is a `double' in the mfcalc `%union'.  */
         fprintf (file, " %g", value.val);
     }
  517. 
  518. File: bison.info, Node: Invocation, Next: Table of Symbols, Prev: Debugging, Up: Top
  519. Invoking Bison
  520. **************
  521. The usual way to invoke Bison is as follows:
  522. bison INFILE
  523. Here INFILE is the grammar file name, which usually ends in `.y'.
  524. The parser file's name is made by replacing the `.y' with `.tab.c'.
Thus, `bison foo.y' yields `foo.tab.c', and `bison hack/foo.y' yields
`hack/foo.tab.c'.
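The naming rule can be expressed as a small C helper, which may be
handy in a build script or wrapper program. The function
`parser_file_name' is hypothetical and is not Bison's own code; it
covers only the usual case of a grammar file name ending in `.y' and
reports failure for anything else.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Bison: write into `out' the default
   parser file name for the grammar file `infile'.  Returns 0 on
   success, or -1 if `infile' does not end in ".y" or `out' is too
   small to hold the result.  */
static int parser_file_name(const char *infile, char *out, size_t size)
{
    size_t len = strlen(infile);
    int n;

    if (len < 2 || strcmp(infile + len - 2, ".y") != 0)
        return -1;
    /* Copy everything before ".y", then append ".tab.c".  */
    n = snprintf(out, size, "%.*s.tab.c", (int) (len - 2), infile);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Any directory part of the name is carried over unchanged, as in the
`hack/foo.y' example.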
  527. * Menu:
  528. * Bison Options:: All the options described in detail,
  529. in alphabetical order by short options.
  530. * Option Cross Key:: Alphabetical list of long options.
  531. * VMS Invocation:: Bison command syntax on VMS.
  532. 
  533. File: bison.info, Node: Bison Options, Next: Option Cross Key, Up: Invocation
  534. Bison Options
  535. =============
  536. Bison supports both traditional single-letter options and mnemonic
  537. long option names. Long option names are indicated with `--' instead of
  538. `-'. Abbreviations for option names are allowed as long as they are
  539. unique. When a long option takes an argument, like `--file-prefix',
  540. connect the option name and the argument with `='.
  541. Here is a list of options that can be used with Bison, alphabetized
  542. by short option. It is followed by a cross key alphabetized by long
  543. option.
`-b PREFIX'
  545. `--file-prefix=PREFIX'
  546. Specify a prefix to use for all Bison output file names. The
  547. names are chosen as if the input file were named `PREFIX.c'.
  548. `-d'
  549. `--defines'
  550. Write an extra output file containing macro definitions for the
  551. token type names defined in the grammar and the semantic value type
  552. `YYSTYPE', as well as a few `extern' variable declarations.
  553. If the parser output file is named `NAME.c' then this file is
  554. named `NAME.h'.
  555. This output file is essential if you wish to put the definition of
  556. `yylex' in a separate source file, because `yylex' needs to be
  557. able to refer to token type codes and the variable `yylval'.
  558. *Note Semantic Values of Tokens: Token Values.
  559. `-l'
  560. `--no-lines'
  561. Don't put any `#line' preprocessor commands in the parser file.
  562. Ordinarily Bison puts them in the parser file so that the C
  563. compiler and debuggers will associate errors with your source
  564. file, the grammar file. This option causes them to associate
errors with the parser file, treating it as an independent source
  566. file in its own right.
  567. `-o OUTFILE'
  568. `--output-file=OUTFILE'
  569. Specify the name OUTFILE for the parser file.
  570. The other output files' names are constructed from OUTFILE as
  571. described under the `-v' and `-d' switches.
  572. `-p PREFIX'
  573. `--name-prefix=PREFIX'
  574. Rename the external symbols used in the parser so that they start
  575. with PREFIX instead of `yy'. The precise list of symbols renamed
  576. is `yyparse', `yylex', `yyerror', `yynerrs', `yylval', `yychar'
  577. and `yydebug'.
  578. For example, if you use `-p c', the names become `cparse', `clex',
  579. and so on.
  580. *Note Multiple Parsers in the Same Program: Multiple Parsers.
  581. `-t'
  582. `--debug'
  583. Output a definition of the macro `YYDEBUG' into the parser file,
  584. so that the debugging facilities are compiled. *Note Debugging
  585. Your Parser: Debugging.
  586. `-v'
  587. `--verbose'
  588. Write an extra output file containing verbose descriptions of the
  589. parser states and what is done for each type of look-ahead token in
  590. that state.
  591. This file also describes all the conflicts, both those resolved by
  592. operator precedence and the unresolved ones.
  593. The file's name is made by removing `.tab.c' or `.c' from the
  594. parser output file name, and adding `.output' instead.
  595. Therefore, if the input file is `foo.y', then the parser file is
  596. called `foo.tab.c' by default. As a consequence, the verbose
  597. output file is called `foo.output'.
  598. `-V'
  599. `--version'
  600. Print the version number of Bison and exit.
  601. `-h'
  602. `--help'
  603. Print a summary of the command-line options to Bison and exit.
  604. `-y'
  605. `--yacc'
  606. `--fixed-output-files'
  607. Equivalent to `-o y.tab.c'; the parser output file is called
  608. `y.tab.c', and the other outputs are called `y.output' and
  609. `y.tab.h'. The purpose of this switch is to imitate Yacc's output
  610. file name conventions. Thus, the following shell script can
  611. substitute for Yacc:
     bison -y "$@"
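The file name rules given under `-d' and `-v' can be summarized with a
short C sketch.  The helper names below are hypothetical and the code
only restates the rules described above; it is not taken from Bison.
Appending `.output' to a name that does not end in `.c' is an
assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

/* Return nonzero if S ends with SUFFIX.  */
static int
ends_with (const char *s, const char *suffix)
{
  size_t n = strlen (s), m = strlen (suffix);
  return n >= m && strcmp (s + n - m, suffix) == 0;
}

/* Hypothetical helper: name of the `-d' file, made by turning the
   trailing ".c" of the parser file name into ".h".  Returns 0 on
   success, -1 on a name without ".c" or a buffer that is too small.  */
static int
defines_file_name (const char *parser, char *out, size_t outsize)
{
  size_t len;

  assert (parser != NULL && out != NULL);
  len = strlen (parser);
  if (!ends_with (parser, ".c") || len + 1 > outsize)
    return -1;
  memcpy (out, parser, len + 1);   /* copy including the null */
  out[len - 1] = 'h';
  return 0;
}

/* Hypothetical helper: name of the `-v' file, made by removing
   ".tab.c" or ".c" and adding ".output".  Names with neither suffix
   simply get ".output" appended (an assumption).  */
static int
verbose_file_name (const char *parser, char *out, size_t outsize)
{
  size_t len;

  assert (parser != NULL && out != NULL);
  len = strlen (parser);
  if (ends_with (parser, ".tab.c"))
    len -= strlen (".tab.c");
  else if (ends_with (parser, ".c"))
    len -= strlen (".c");
  if (len + sizeof ".output" > outsize)
    return -1;
  memcpy (out, parser, len);
  strcpy (out + len, ".output");
  return 0;
}
```

For the parser file `y.tab.c' these helpers give `y.tab.h' and
`y.output', which matches the names listed under `-y'.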
  613. 
  614. File: bison.info, Node: Option Cross Key, Next: VMS Invocation, Prev: Bison Options, Up: Invocation
  615. Option Cross Key
  616. ================
  617. Here is a list of options, alphabetized by long option, to help you
  618. find the corresponding short option.
     --debug                               -t
     --defines                             -d
     --file-prefix=PREFIX                  -b FILE-PREFIX
     --fixed-output-files --yacc           -y
     --help                                -h
     --name-prefix=PREFIX                  -p PREFIX
     --no-lines                            -l
     --output-file=OUTFILE                 -o OUTFILE
     --verbose                             -v
     --version                             -V
  629. 
  630. File: bison.info, Node: VMS Invocation, Prev: Option Cross Key, Up: Invocation
  631. Invoking Bison under VMS
  632. ========================
  633. The command line syntax for Bison on VMS is a variant of the usual
  634. Bison command syntax--adapted to fit VMS conventions.
  635. To find the VMS equivalent for any Bison option, start with the long
  636. option, and substitute a `/' for the leading `--', and substitute a `_'
  637. for each `-' in the name of the long option. For example, the
  638. following invocation under VMS:
  639. bison /debug/name_prefix=bar foo.y
is equivalent to the following command under POSIX:
  641. bison --debug --name-prefix=bar foo.y
  642. The VMS file system does not permit filenames such as `foo.tab.c'.
  643. In the above example, the output file would instead be named
  644. `foo_tab.c'.
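The option translation described above is mechanical, so it can be
sketched as a small C function.  The name `vms_option' is hypothetical
and the code is only an illustration of the stated rule.  Leaving the
text after `=' unchanged follows from the rule applying to the *name*
of the long option, but treat that reading as an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: translate a long option such as
   "--name-prefix=bar" into the VMS form "/name_prefix=bar".
   Returns 0 on success, -1 if OPT does not start with "--" or the
   result does not fit in OUT.  */
static int
vms_option (const char *opt, char *out, size_t outsize)
{
  size_t i, len;
  int in_name = 1;

  assert (opt != NULL && out != NULL);
  if (strncmp (opt, "--", 2) != 0)
    return -1;

  len = strlen (opt) - 1;          /* "--" shrinks to "/" */
  if (len + 1 > outsize)
    return -1;

  out[0] = '/';
  for (i = 1; i < len; i++)
    {
      char c = opt[i + 1];
      if (c == '=')
        in_name = 0;               /* the argument is copied as is */
      out[i] = (in_name && c == '-') ? '_' : c;
    }
  out[len] = '\0';
  return 0;
}
```

Applied to `--debug' and `--name-prefix=bar', the function produces
`/debug' and `/name_prefix=bar', the two switches used in the VMS
example above.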
  645. 
  646. File: bison.info, Node: Table of Symbols, Next: Glossary, Prev: Invocation, Up: Top
  647. Bison Symbols
  648. *************
  649. `error'
  650. A token name reserved for error recovery. This token may be used
  651. in grammar rules so as to allow the Bison parser to recognize an
  652. error in the grammar without halting the process. In effect, a
  653. sentence containing an error may be recognized as valid. On a
  654. parse error, the token `error' becomes the current look-ahead
  655. token. Actions corresponding to `error' are then executed, and
  656. the look-ahead token is reset to the token that originally caused
  657. the violation. *Note Error Recovery::.
  658. `YYABORT'
  659. Macro to pretend that an unrecoverable syntax error has occurred,
  660. by making `yyparse' return 1 immediately. The error reporting
  661. function `yyerror' is not called. *Note The Parser Function
  662. `yyparse': Parser Function.
  663. `YYACCEPT'
  664. Macro to pretend that a complete utterance of the language has been
  665. read, by making `yyparse' return 0 immediately. *Note The Parser
  666. Function `yyparse': Parser Function.
  667. `YYBACKUP'
  668. Macro to discard a value from the parser stack and fake a
  669. look-ahead token. *Note Special Features for Use in Actions:
  670. Action Features.
  671. `YYERROR'
  672. Macro to pretend that a syntax error has just been detected: call
  673. `yyerror' and then perform normal error recovery if possible
  674. (*note Error Recovery::.), or (if recovery is impossible) make
  675. `yyparse' return 1. *Note Error Recovery::.
  676. `YYERROR_VERBOSE'
  677. Macro that you define with `#define' in the Bison declarations
  678. section to request verbose, specific error message strings when
  679. `yyerror' is called.
  680. `YYINITDEPTH'
  681. Macro for specifying the initial size of the parser stack. *Note
  682. Stack Overflow::.
  683. `YYLEX_PARAM'
  684. Macro for specifying an extra argument (or list of extra
  685. arguments) for `yyparse' to pass to `yylex'. *Note Calling
  686. Conventions for Pure Parsers: Pure Calling.
  687. `YYLTYPE'
  688. Macro for the data type of `yylloc'; a structure with four
  689. members. *Note Textual Positions of Tokens: Token Positions.
  690. `YYMAXDEPTH'
  691. Macro for specifying the maximum size of the parser stack. *Note
  692. Stack Overflow::.
  693. `YYPARSE_PARAM'
  694. Macro for specifying the name of a parameter that `yyparse' should
  695. accept. *Note Calling Conventions for Pure Parsers: Pure Calling.
  696. `YYRECOVERING'
  697. Macro whose value indicates whether the parser is recovering from a
  698. syntax error. *Note Special Features for Use in Actions: Action
  699. Features.
  700. `YYSTYPE'
  701. Macro for the data type of semantic values; `int' by default.
  702. *Note Data Types of Semantic Values: Value Type.
  703. `yychar'
  704. External integer variable that contains the integer value of the
  705. current look-ahead token. (In a pure parser, it is a local
  706. variable within `yyparse'.) Error-recovery rule actions may
  707. examine this variable. *Note Special Features for Use in Actions:
  708. Action Features.
  709. `yyclearin'
  710. Macro used in error-recovery rule actions. It clears the previous
  711. look-ahead token. *Note Error Recovery::.
  712. `yydebug'
  713. External integer variable set to zero by default. If `yydebug' is
  714. given a nonzero value, the parser will output information on input
  715. symbols and parser action. *Note Debugging Your Parser: Debugging.
  716. `yyerrok'
Macro to cause the parser to return immediately to its normal mode
  718. after a parse error. *Note Error Recovery::.
  719. `yyerror'
  720. User-supplied function to be called by `yyparse' on error. The
  721. function receives one argument, a pointer to a character string
  722. containing an error message. *Note The Error Reporting Function
  723. `yyerror': Error Reporting.
  724. `yylex'
  725. User-supplied lexical analyzer function, called with no arguments
  726. to get the next token. *Note The Lexical Analyzer Function
  727. `yylex': Lexical.
  728. `yylval'
  729. External variable in which `yylex' should place the semantic value
  730. associated with a token. (In a pure parser, it is a local
  731. variable within `yyparse', and its address is passed to `yylex'.)
  732. *Note Semantic Values of Tokens: Token Values.
  733. `yylloc'
  734. External variable in which `yylex' should place the line and
  735. column numbers associated with a token. (In a pure parser, it is a
  736. local variable within `yyparse', and its address is passed to
  737. `yylex'.) You can ignore this variable if you don't use the `@'
  738. feature in the grammar actions. *Note Textual Positions of
  739. Tokens: Token Positions.
  740. `yynerrs'
  741. Global variable which Bison increments each time there is a parse
  742. error. (In a pure parser, it is a local variable within
  743. `yyparse'.) *Note The Error Reporting Function `yyerror': Error
  744. Reporting.
  745. `yyparse'
  746. The parser function produced by Bison; call this function to start
  747. parsing. *Note The Parser Function `yyparse': Parser Function.
  748. `%left'
  749. Bison declaration to assign left associativity to token(s). *Note
  750. Operator Precedence: Precedence Decl.
  751. `%nonassoc'
  752. Bison declaration to assign nonassociativity to token(s). *Note
  753. Operator Precedence: Precedence Decl.
  754. `%prec'
  755. Bison declaration to assign a precedence to a specific rule.
  756. *Note Context-Dependent Precedence: Contextual Precedence.
  757. `%pure_parser'
  758. Bison declaration to request a pure (reentrant) parser. *Note A
  759. Pure (Reentrant) Parser: Pure Decl.
  760. `%right'
  761. Bison declaration to assign right associativity to token(s).
  762. *Note Operator Precedence: Precedence Decl.
  763. `%start'
  764. Bison declaration to specify the start symbol. *Note The
  765. Start-Symbol: Start Decl.
  766. `%token'
  767. Bison declaration to declare token(s) without specifying
  768. precedence. *Note Token Type Names: Token Decl.
  769. `%type'
  770. Bison declaration to declare nonterminals. *Note Nonterminal
  771. Symbols: Type Decl.
  772. `%union'
  773. Bison declaration to specify several possible data types for
  774. semantic values. *Note The Collection of Value Types: Union Decl.
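To show how the user-supplied pieces in this table fit together, here
is a minimal sketch of `yylex' and `yyerror' for a non-pure parser.
Several things are assumptions made so the sketch compiles on its own:
the token code `NUM' with the stand-in value 258, the choice of
`double' for `YYSTYPE', and the input buffer `lex_input'.  In a real
program the token codes and the definition of `yylval' come from
Bison's output.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumptions of this sketch; Bison normally supplies these.  */
#define NUM 258                    /* stand-in token code */
typedef double YYSTYPE;
YYSTYPE yylval;

/* Hypothetical input buffer read by the lexical analyzer.  */
static const char *lex_input = "";

/* Lexical analyzer, called with no arguments.  It skips blanks, then
   returns NUM with the number's value in `yylval', or the character
   itself for any other character, or 0 at the end of the input.  */
int
yylex (void)
{
  assert (lex_input != NULL);
  while (*lex_input == ' ' || *lex_input == '\t')
    lex_input++;
  if (*lex_input == '\0')
    return 0;
  if (isdigit ((unsigned char) *lex_input) || *lex_input == '.')
    {
      char *end;
      double v = strtod (lex_input, &end);
      if (end != lex_input)        /* a number was actually read */
        {
          yylval = v;
          lex_input = end;
          return NUM;
        }
    }
  return (unsigned char) *lex_input++;
}

/* Error reporting function: receives the message as its one argument.  */
void
yyerror (const char *s)
{
  fprintf (stderr, "%s\n", s);
}
```

With `lex_input' pointing at the text `12 + 3.5', successive calls to
`yylex' return `NUM', then the character code of `+', then `NUM'
again, and finally 0.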
  775. These are the punctuation and delimiters used in Bison input:
  776. `%%'
  777. Delimiter used to separate the grammar rule section from the Bison
  778. declarations section or the additional C code section. *Note The
  779. Overall Layout of a Bison Grammar: Grammar Layout.
  780. `%{ %}'
  781. All code listed between `%{' and `%}' is copied directly to the
  782. output file uninterpreted. Such code forms the "C declarations"
  783. section of the input file. *Note Outline of a Bison Grammar:
  784. Grammar Outline.
  785. `/*...*/'
  786. Comment delimiters, as in C.
  787. `:'
  788. Separates a rule's result from its components. *Note Syntax of
  789. Grammar Rules: Rules.
  790. `;'
  791. Terminates a rule. *Note Syntax of Grammar Rules: Rules.
  792. `|'
  793. Separates alternate rules for the same result nonterminal. *Note
  794. Syntax of Grammar Rules: Rules.
  795. 
  796. File: bison.info, Node: Glossary, Next: Index, Prev: Table of Symbols, Up: Top
  797. Glossary
  798. ********
  799. Backus-Naur Form (BNF)
  800. Formal method of specifying context-free grammars. BNF was first
used in the `ALGOL-60' report (1960; revised 1963). *Note Languages and
  802. Context-Free Grammars: Language and Grammar.
  803. Context-free grammars
  804. Grammars specified as rules that can be applied regardless of
  805. context. Thus, if there is a rule which says that an integer can
  806. be used as an expression, integers are allowed *anywhere* an
  807. expression is permitted. *Note Languages and Context-Free
  808. Grammars: Language and Grammar.
  809. Dynamic allocation
  810. Allocation of memory that occurs during execution, rather than at
  811. compile time or on entry to a function.
  812. Empty string
  813. Analogous to the empty set in set theory, the empty string is a
  814. character string of length zero.
  815. Finite-state stack machine
  816. A "machine" that has discrete states in which it is said to exist
  817. at each instant in time. As input to the machine is processed, the
  818. machine moves from state to state as specified by the logic of the
  819. machine. In the case of the parser, the input is the language
  820. being parsed, and the states correspond to various stages in the
  821. grammar rules. *Note The Bison Parser Algorithm: Algorithm.
  822. Grouping
  823. A language construct that is (in general) grammatically divisible;
  824. for example, `expression' or `declaration' in C. *Note Languages
  825. and Context-Free Grammars: Language and Grammar.
  826. Infix operator
  827. An arithmetic operator that is placed between the operands on
  828. which it performs some operation.
  829. Input stream
  830. A continuous flow of data between devices or programs.
  831. Language construct
  832. One of the typical usage schemas of the language. For example,
  833. one of the constructs of the C language is the `if' statement.
  834. *Note Languages and Context-Free Grammars: Language and Grammar.
  835. Left associativity
  836. Operators having left associativity are analyzed from left to
  837. right: `a+b+c' first computes `a+b' and then combines with `c'.
  838. *Note Operator Precedence: Precedence.
  839. Left recursion
  840. A rule whose result symbol is also its first component symbol; for
  841. example, `expseq1 : expseq1 ',' exp;'. *Note Recursive Rules:
  842. Recursion.
  843. Left-to-right parsing
  844. Parsing a sentence of a language by analyzing it token by token
  845. from left to right. *Note The Bison Parser Algorithm: Algorithm.
  846. Lexical analyzer (scanner)
  847. A function that reads an input stream and returns tokens one by
  848. one. *Note The Lexical Analyzer Function `yylex': Lexical.
  849. Lexical tie-in
  850. A flag, set by actions in the grammar rules, which alters the way
  851. tokens are parsed. *Note Lexical Tie-ins::.
  852. Look-ahead token
  853. A token already read but not yet shifted. *Note Look-Ahead
  854. Tokens: Look-Ahead.
  855. LALR(1)
  856. The class of context-free grammars that Bison (like most other
  857. parser generators) can handle; a subset of LR(1). *Note
  858. Mysterious Reduce/Reduce Conflicts: Mystery Conflicts.
  859. LR(1)
  860. The class of context-free grammars in which at most one token of
  861. look-ahead is needed to disambiguate the parsing of any piece of
  862. input.
  863. Nonterminal symbol
  864. A grammar symbol standing for a grammatical construct that can be
  865. expressed through rules in terms of smaller constructs; in other
  866. words, a construct that is not a token. *Note Symbols::.
  867. Parse error
  868. An error encountered during parsing of an input stream due to
  869. invalid syntax. *Note Error Recovery::.
  870. Parser
  871. A function that recognizes valid sentences of a language by
  872. analyzing the syntax structure of a set of tokens passed to it
  873. from a lexical analyzer.
  874. Postfix operator
  875. An arithmetic operator that is placed after the operands upon
  876. which it performs some operation.
  877. Reduction
  878. Replacing a string of nonterminals and/or terminals with a single
  879. nonterminal, according to a grammar rule. *Note The Bison Parser
  880. Algorithm: Algorithm.
  881. Reentrant
A reentrant subprogram is a subprogram which can be invoked any
  883. number of times in parallel, without interference between the
  884. various invocations. *Note A Pure (Reentrant) Parser: Pure Decl.
Reverse Polish notation
  886. A language in which all operators are postfix operators.
  887. Right recursion
  888. A rule whose result symbol is also its last component symbol; for
  889. example, `expseq1: exp ',' expseq1;'. *Note Recursive Rules:
  890. Recursion.
  891. Semantics
  892. In computer languages, the semantics are specified by the actions
  893. taken for each instance of the language, i.e., the meaning of each
  894. statement. *Note Defining Language Semantics: Semantics.
  895. Shift
A parser is said to shift when it makes the choice of analyzing
further input from the stream rather than immediately reducing some
already-recognized rule. *Note The Bison Parser Algorithm:
  899. Algorithm.
  900. Single-character literal
  901. A single character that is recognized and interpreted as is.
  902. *Note From Formal Rules to Bison Input: Grammar in Bison.
  903. Start symbol
  904. The nonterminal symbol that stands for a complete valid utterance
  905. in the language being parsed. The start symbol is usually listed
  906. as the first nonterminal symbol in a language specification.
  907. *Note The Start-Symbol: Start Decl.
  908. Symbol table
  909. A data structure where symbol names and associated data are stored
  910. during parsing to allow for recognition and use of existing
  911. information in repeated uses of a symbol. *Note Multi-function
  912. Calc::.
  913. Token
  914. A basic, grammatically indivisible unit of a language. The symbol
  915. that describes a token in the grammar is a terminal symbol. The
  916. input of the Bison parser is a stream of tokens which comes from
  917. the lexical analyzer. *Note Symbols::.
  918. Terminal symbol
  919. A grammar symbol that has no rules in the grammar and therefore is
  920. grammatically indivisible. The piece of text it represents is a
  921. token. *Note Languages and Context-Free Grammars: Language and
  922. Grammar.
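Several of the terms above (postfix operator, reverse Polish notation,
token, and the use of a stack) can be seen together in a small
hand-written evaluator.  The sketch below is not produced by Bison and
does not use a grammar; the function name `rpn_eval' and the stack
limit `STACK_MAX' are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 64               /* arbitrary limit for this sketch */

/* Evaluate a reverse Polish expression such as "1 2 + 3 *".
   Stores the value in *RESULT and returns 0, or returns -1 on an
   unknown operator, stack overflow, or a wrong number of operands.  */
static int
rpn_eval (const char *s, double *result)
{
  double stack[STACK_MAX];
  int depth = 0;

  assert (s != NULL && result != NULL);
  while (*s != '\0')
    {
      if (isspace ((unsigned char) *s))
        {
          s++;                     /* white space separates tokens */
        }
      else if (isdigit ((unsigned char) *s))
        {
          char *end;
          if (depth == STACK_MAX)
            return -1;
          stack[depth++] = strtod (s, &end);   /* operand: push it */
          s = end;
        }
      else
        {
          double a, b;
          if (depth < 2)
            return -1;
          b = stack[--depth];      /* postfix operator: pop operands */
          a = stack[--depth];
          switch (*s++)
            {
            case '+': stack[depth++] = a + b; break;
            case '-': stack[depth++] = a - b; break;
            case '*': stack[depth++] = a * b; break;
            case '/': stack[depth++] = a / b; break;
            default: return -1;
            }
        }
    }
  if (depth != 1)
    return -1;
  *result = stack[0];
  return 0;
}
```

The input `10 4 - 3 -' evaluates to 3, the same value that left
associativity gives the infix expression `10-4-3'.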