BISON-1.INF 48 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957958959960961962963964965966967968969970971972973974975976977978979980981982983984985986987988989990991992993994995996997998999100010011002100310041005100610071008100910101011101210131014101510161017101810191020102110221023102410251026102710281029103010311032103310341035103610371038103910401041104210431044104510461047104810491050105110521053105410551056105710581059106010611062106310641065106610671068106910701071107210731074107510761077107810791080108110821083108410851086108710881089109010911092109310941095109610971098109911001101110211031104110511061107110811091110111111121113111411151116111711181119112011211122112311241125112611271128112911301131113211331134113511361137113811391140114111421143114411451146114711481149115011511152115311541155115611571158115911601161116211631164116511661167116811691170117111721173117411751176117711781179118011811182118311841185118611871188118911901191
  1. Info file bison.info, produced by Makeinfo, -*- Text -*- from input
  2. file bison.texinfo.
  3. This file documents the Bison parser generator.
  4. Copyright (C) 1988, 1989 Free Software Foundation, Inc.
  5. Permission is granted to make and distribute verbatim copies of this
  6. manual provided the copyright notice and this permission notice are
  7. preserved on all copies.
  8. Permission is granted to copy and distribute modified versions of
  9. this manual under the conditions for verbatim copying, provided also
  10. that the sections entitled ``GNU General Public License'' and
  11. ``Conditions for Using Bison'' are included exactly as in the
  12. original, and provided that the entire resulting derived work is
  13. distributed under the terms of a permission notice identical to this
  14. one.
  15. Permission is granted to copy and distribute translations of this
  16. manual into another language, under the above conditions for modified
  17. versions, except that the sections entitled ``GNU General Public
  18. License'', ``Conditions for Using Bison'' and this permission notice
  19. may be included in translations approved by the Free Software
  20. Foundation instead of in the original English.
  21. 
  22. File: bison.info, Node: Top, Next: Introduction, Prev: (DIR), Up: (DIR)
  23. * Menu:
  24. * Introduction::
  25. * Conditions::
  26. * Copying:: The GNU General Public License says
  27. how you can copy and share Bison
  28. Tutorial sections:
  29. * Concepts:: Basic concepts for understanding Bison.
  30. * Examples:: Three simple explained examples of using Bison.
  31. Reference sections:
  32. * Grammar File:: Writing Bison declarations and rules.
  33. * Interface:: C-language interface to the parser function `yyparse'.
  34. * Algorithm:: How the Bison parser works at run-time.
  35. * Error Recovery:: Writing rules for error recovery.
  36. * Context Dependency::What to do if your language syntax is too
  37. messy for Bison to handle straightforwardly.
  38. * Debugging:: Debugging Bison parsers that parse wrong.
  39. * Invocation:: How to run Bison (to produce the parser source file).
  40. * Table of Symbols:: All the keywords of the Bison language are explained.
  41. * Glossary:: Basic concepts are explained.
  42. * Index:: Cross-references to the text.
  43. 
  44. File: bison.info, Node: Introduction, Next: Conditions, Prev: Top, Up: Top
  45. Introduction
  46. ************
  47. "Bison" is a general-purpose parser generator which converts a
  48. grammar description into a C program to parse that grammar. Once you
  49. are proficient with Bison, you may use it to develop a wide range of
  50. language parsers, from those used in simple desk calculators to
  51. complex programming languages.
  52. Bison is upward compatible with Yacc: all properly-written Yacc
  53. grammars ought to work with Bison with no change. Anyone familiar
  54. with Yacc should be able to use Bison with little trouble. You need
  55. to be fluent in C programming in order to use Bison or to understand
  56. this manual.
  57. We begin with tutorial chapters that explain the basic concepts of
  58. using Bison and show three explained examples, each building on the
  59. last. If you don't know Bison or Yacc, start by reading these
  60. chapters. Reference chapters follow which describe specific aspects
  61. of Bison in detail.
  62. Bison was basically written by Robert Corbett, and made
  63. Yacc-compatible by Richard Stallman.
  64. 
  65. File: bison.info, Node: Conditions, Next: Copying, Prev: Introduction, Up: Top
  66. Conditions for Using Bison
  67. **************************
  68. Bison grammars can be used only in programs that are free software.
  69. This is in contrast to what happens with the GNU C compiler and the
  70. other GNU programming tools.
  71. The reason Bison is special is that the output of the Bison
  72. utility--the Bison parser file--contains a verbatim copy of a sizable
  73. piece of Bison, which is the code for the `yyparse' function. (The
  74. actions from your grammar are inserted into this function at one
  75. point, but the rest of the function is not changed.)
  76. As a result, the Bison parser file is covered by the same copying
  77. conditions that cover Bison itself and the rest of the GNU system:
  78. any program containing it has to be distributed under the standard
  79. GNU copying conditions.
  80. Occasionally people who would like to use Bison to develop
  81. proprietary programs complain about this.
  82. We don't particularly sympathize with their complaints. The purpose
  83. of the GNU project is to promote the right to share software and the
  84. practice of sharing software; it is a means of changing society. The
  85. people who complain are planning to be uncooperative toward the rest
  86. of the world; why should they deserve our help in doing so?
  87. However, it's possible that a change in these conditions might
  88. encourage computer companies to use and distribute the GNU system.
  89. If so, then we might decide to change the terms on `yyparse' as a
  90. matter of the strategy of promoting the right to share. Such a
  91. change would be irrevocable. Since we stand by the copying
  92. permissions we have announced, we cannot withdraw them once given.
  93. We mustn't make an irrevocable change hastily. We have to wait until
  94. there is a complete GNU system and there has been time to learn how
  95. this issue affects its reception.
  96. 
  97. File: bison.info, Node: Copying, Next: Concepts, Prev: Conditions, Up: Top
  98. GNU GENERAL PUBLIC LICENSE
  99. **************************
  100. Version 1, February 1989
  101. Copyright (C) 1989 Free Software Foundation, Inc.
  102. 675 Mass Ave, Cambridge, MA 02139, USA
  103. Everyone is permitted to copy and distribute verbatim copies
  104. of this license document, but changing it is not allowed.
  105. Preamble
  106. =========
  107. The license agreements of most software companies try to keep users
  108. at the mercy of those companies. By contrast, our General Public
  109. License is intended to guarantee your freedom to share and change
  110. free software--to make sure the software is free for all its users.
  111. The General Public License applies to the Free Software Foundation's
  112. software and to any other program whose authors commit to using it.
  113. You can use it for your programs, too.
  114. When we speak of free software, we are referring to freedom, not
  115. price. Specifically, the General Public License is designed to make
  116. sure that you have the freedom to give away or sell copies of free
  117. software, that you receive source code or can get it if you want it,
  118. that you can change the software or use pieces of it in new free
  119. programs; and that you know you can do these things.
  120. To protect your rights, we need to make restrictions that forbid
  121. anyone to deny you these rights or to ask you to surrender the rights.
  122. These restrictions translate to certain responsibilities for you if
  123. you distribute copies of the software, or if you modify it.
  124. For example, if you distribute copies of a such a program, whether
  125. gratis or for a fee, you must give the recipients all the rights that
  126. you have. You must make sure that they, too, receive or can get the
  127. source code. And you must tell them their rights.
  128. We protect your rights with two steps: (1) copyright the software,
  129. and (2) offer you this license which gives you legal permission to
  130. copy, distribute and/or modify the software.
  131. Also, for each author's protection and ours, we want to make certain
  132. that everyone understands that there is no warranty for this free
  133. software. If the software is modified by someone else and passed on,
  134. we want its recipients to know that what they have is not the
  135. original, so that any problems introduced by others will not reflect
  136. on the original authors' reputations.
  137. The precise terms and conditions for copying, distribution and
  138. modification follow.
  139. TERMS AND CONDITIONS
  140. 1. This License Agreement applies to any program or other work
  141. which contains a notice placed by the copyright holder saying it
  142. may be distributed under the terms of this General Public
  143. License. The ``Program'', below, refers to any such program or
  144. work, and a ``work based on the Program'' means either the
  145. Program or any work containing the Program or a portion of it,
  146. either verbatim or with modifications. Each licensee is
  147. addressed as ``you''.
  148. 2. You may copy and distribute verbatim copies of the Program's
  149. source code as you receive it, in any medium, provided that you
  150. conspicuously and appropriately publish on each copy an
  151. appropriate copyright notice and disclaimer of warranty; keep
  152. intact all the notices that refer to this General Public License
  153. and to the absence of any warranty; and give any other
  154. recipients of the Program a copy of this General Public License
  155. along with the Program. You may charge a fee for the physical
  156. act of transferring a copy.
  157. 3. You may modify your copy or copies of the Program or any portion
  158. of it, and copy and distribute such modifications under the
  159. terms of Paragraph 1 above, provided that you also do the
  160. following:
  161. * cause the modified files to carry prominent notices stating
  162. that you changed the files and the date of any change; and
  163. * cause the whole of any work that you distribute or publish,
  164. that in whole or in part contains the Program or any part
  165. thereof, either with or without modifications, to be
  166. licensed at no charge to all third parties under the terms
  167. of this General Public License (except that you may choose
  168. to grant warranty protection to some or all third parties,
  169. at your option).
  170. * If the modified program normally reads commands
  171. interactively when run, you must cause it, when started
  172. running for such interactive use in the simplest and most
  173. usual way, to print or display an announcement including an
  174. appropriate copyright notice and a notice that there is no
  175. warranty (or else, saying that you provide a warranty) and
  176. that users may redistribute the program under these
  177. conditions, and telling the user how to view a copy of this
  178. General Public License.
  179. * You may charge a fee for the physical act of transferring a
  180. copy, and you may at your option offer warranty protection
  181. in exchange for a fee.
  182. Mere aggregation of another independent work with the Program
  183. (or its derivative) on a volume of a storage or distribution
  184. medium does not bring the other work under the scope of these
  185. terms.
  186. 4. You may copy and distribute the Program (or a portion or
  187. derivative of it, under Paragraph 2) in object code or
  188. executable form under the terms of Paragraphs 1 and 2 above
  189. provided that you also do one of the following:
  190. * accompany it with the complete corresponding
  191. machine-readable source code, which must be distributed
  192. under the terms of Paragraphs 1 and 2 above; or,
  193. * accompany it with a written offer, valid for at least three
  194. years, to give any third party free (except for a nominal
  195. charge for the cost of distribution) a complete
  196. machine-readable copy of the corresponding source code, to
  197. be distributed under the terms of Paragraphs 1 and 2 above;
  198. or,
  199. * accompany it with the information you received as to where
  200. the corresponding source code may be obtained. (This
  201. alternative is allowed only for noncommercial distribution
  202. and only if you received the program in object code or
  203. executable form alone.)
  204. Source code for a work means the preferred form of the work for
  205. making modifications to it. For an executable file, complete
  206. source code means all the source code for all modules it
  207. contains; but, as a special exception, it need not include
  208. source code for modules which are standard libraries that
  209. accompany the operating system on which the executable file
  210. runs, or for standard header files or definitions files that
  211. accompany that operating system.
  212. 5. You may not copy, modify, sublicense, distribute or transfer the
  213. Program except as expressly provided under this General Public
  214. License. Any attempt otherwise to copy, modify, sublicense,
  215. distribute or transfer the Program is void, and will
  216. automatically terminate your rights to use the Program under
  217. this License. However, parties who have received copies, or
  218. rights to use copies, from you under this General Public License
  219. will not have their licenses terminated so long as such parties
  220. remain in full compliance.
  221. 6. By copying, distributing or modifying the Program (or any work
  222. based on the Program) you indicate your acceptance of this
  223. license to do so, and all its terms and conditions.
  224. 7. Each time you redistribute the Program (or any work based on the
  225. Program), the recipient automatically receives a license from
  226. the original licensor to copy, distribute or modify the Program
  227. subject to these terms and conditions. You may not impose any
  228. further restrictions on the recipients' exercise of the rights
  229. granted herein.
  230. 8. The Free Software Foundation may publish revised and/or new
  231. versions of the General Public License from time to time. Such
  232. new versions will be similar in spirit to the present version,
  233. but may differ in detail to address new problems or concerns.
  234. Each version is given a distinguishing version number. If the
  235. Program specifies a version number of the license which applies
  236. to it and ``any later version'', you have the option of
  237. following the terms and conditions either of that version or of
  238. any later version published by the Free Software Foundation. If
  239. the Program does not specify a version number of the license,
  240. you may choose any version ever published by the Free Software
  241. Foundation.
  242. 9. If you wish to incorporate parts of the Program into other free
  243. programs whose distribution conditions are different, write to
  244. the author to ask for permission. For software which is
  245. copyrighted by the Free Software Foundation, write to the Free
  246. Software Foundation; we sometimes make exceptions for this. Our
  247. decision will be guided by the two goals of preserving the free
  248. status of all derivatives of our free software and of promoting
  249. the sharing and reuse of software generally.
  250. NO WARRANTY
  251. 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
  252. WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
  253. LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
  254. HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS''
  255. WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
  256. INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
  257. MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
  258. ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
  259. WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE
  260. COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
  261. 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
  262. WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
  263. MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
  264. LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
  265. INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
  266. INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS
  267. OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
  268. YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH
  269. ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
  270. ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
  271. END OF TERMS AND CONDITIONS
  272. Appendix: How to Apply These Terms to Your New Programs
  273. =======================================================
  274. If you develop a new program, and you want it to be of the greatest
  275. possible use to humanity, the best way to achieve this is to make it
  276. free software which everyone can redistribute and change under these
  277. terms.
  278. To do so, attach the following notices to the program. It is safest
  279. to attach them to the start of each source file to most effectively
  280. convey the exclusion of warranty; and each file should have at least
  281. the ``copyright'' line and a pointer to where the full notice is found.
  282. ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
  283. Copyright (C) 19YY NAME OF AUTHOR
  284. This program is free software; you can redistribute it and/or modify
  285. it under the terms of the GNU General Public License as published by
  286. the Free Software Foundation; either version 1, or (at your option)
  287. any later version.
  288. This program is distributed in the hope that it will be useful,
  289. but WITHOUT ANY WARRANTY; without even the implied warranty of
  290. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  291. GNU General Public License for more details.
  292. You should have received a copy of the GNU General Public License
  293. along with this program; if not, write to the Free Software
  294. Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  295. Also add information on how to contact you by electronic and paper
  296. mail.
  297. If the program is interactive, make it output a short notice like
  298. this when it starts in an interactive mode:
  299. Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
  300. Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
  301. This is free software, and you are welcome to redistribute it
  302. under certain conditions; type `show c' for details.
  303. The hypothetical commands `show w' and `show c' should show the
  304. appropriate parts of the General Public License. Of course, the
  305. commands you use may be called something other than `show w' and
  306. `show c'; they could even be mouse-clicks or menu items--whatever
  307. suits your program.
  308. You should also get your employer (if you work as a programmer) or
  309. your school, if any, to sign a ``copyright disclaimer'' for the
  310. program, if necessary. Here a sample; alter the names:
  311. Yoyodyne, Inc., hereby disclaims all copyright interest in the
  312. program `Gnomovision' (a program to direct compilers to make passes
  313. at assemblers) written by James Hacker.
  314. SIGNATURE OF TY COON, 1 April 1989
  315. Ty Coon, President of Vice
  316. That's all there is to it!
  317. 
  318. File: bison.info, Node: Concepts, Next: Examples, Prev: Copying, Up: Top
  319. The Concepts of Bison
  320. *********************
  321. This chapter introduces many of the basic concepts without which the
  322. details of Bison will not make sense. If you do not already know how
  323. to use Bison or Yacc, we suggest you start by reading this chapter
  324. carefully.
  325. * Menu:
  326. * Language and Grammar:: Languages and context-free grammars,
  327. as mathematical ideas.
  328. * Grammar in Bison:: How we represent grammars for Bison's sake.
  329. * Semantic Values:: Each token or syntactic grouping can have
  330. a semantic value (the value of an integer,
  331. the name of an identifier, etc.).
  332. * Semantic Actions:: Each rule can have an action containing C code.
  333. * Bison Parser:: What are Bison's input and output,
  334. how is the output used?
  335. * Stages:: Stages in writing and running Bison grammars.
  336. * Grammar Layout:: Overall structure of a Bison grammar file.
  337. 
  338. File: bison.info, Node: Language and Grammar, Next: Grammar in Bison, Prev: Concepts, Up: Concepts
  339. Languages and Context-Free Grammars
  340. ===================================
  341. In order for Bison to parse a language, it must be described by a
  342. "context-free grammar". This means that you specify one or more
  343. "syntactic groupings" and give rules for constructing them from their
  344. parts. For example, in the C language, one kind of grouping is
  345. called an `expression'. One rule for making an expression might be,
  346. ``An expression can be made of a minus sign and another expression''.
  347. Another would be, ``An expression can be an integer''. As you can
  348. see, rules are often recursive, but there must be at least one rule
  349. which leads out of the recursion.
  350. The most common formal system for presenting such rules for humans to
  351. read is "Backus-Naur Form" or ``BNF'', which was developed in order
  352. to specify the language Algol 60. Any grammar expressed in BNF is a
  353. context-free grammar. The input to Bison is essentially
  354. machine-readable BNF.
  355. In the formal grammatical rules for a language, each kind of
  356. syntactic unit or grouping is named by a "symbol". Those which are
  357. built by grouping smaller constructs according to grammatical rules
  358. are called "nonterminal symbols"; those which can't be subdivided are
  359. called "terminal symbols" or "token types". We call a piece of input
  360. corresponding to a single terminal symbol a "token", and a piece
  361. corresponding to a single nonterminal symbol a "grouping".
  362. We can use the C language as an example of what symbols, terminal and
  363. nonterminal, mean. The tokens of C are identifiers, constants
  364. (numeric and string), and the various keywords, arithmetic operators
  365. and punctuation marks. So the terminal symbols of a grammar for C
  366. include `identifier', `number', `string', plus one symbol for each
  367. keyword, operator or punctuation mark: `if', `return', `const',
  368. `static', `int', `char', `plus-sign', `open-brace', `close-brace',
  369. `comma' and many more. (These tokens can be subdivided into
  370. characters, but that is a matter of lexicography, not grammar.)
  371. Here is a simple C function subdivided into tokens:
  372. int /* keyword `int' */
  373. square (x) /* identifier, open-paren, */
  374. /* identifier, close-paren */
  375. int x; /* keyword `int', identifier, semicolon */
  376. { /* open-brace */
  377. return x * x; /* keyword `return', identifier, */
  378. /* asterisk, identifier, semicolon */
  379. } /* close-brace */
  380. The syntactic groupings of C include the expression, the statement,
  381. the declaration, and the function definition. These are represented
  382. in the grammar of C by nonterminal symbols `expression', `statement',
  383. `declaration' and `function definition'. The full grammar uses
  384. dozens of additional language constructs, each with its own
  385. nonterminal symbol, in order to express the meanings of these four.
  386. The example above is a function definition; it contains one
  387. declaration, and one statement. In the statement, each `x' is an
  388. expression and so is `x * x'.
  389. Each nonterminal symbol must have grammatical rules showing how it is
  390. made out of simpler constructs. For example, one kind of C statement
  391. is the `return' statement; this would be described with a grammar
  392. rule which reads informally as follows:
  393. A `statement' can be made of a `return' keyword, an `expression'
  394. and a `semicolon'.
  395. There would be many other rules for `statement', one for each kind of
  396. statement in C.
  397. One nonterminal symbol must be distinguished as the special one which
  398. defines a complete utterance in the language. It is called the
  399. "start symbol". In a compiler, this means a complete input program.
  400. In the C language, the nonterminal symbol `sequence of definitions
  401. and declarations' plays this role.
  402. For example, `1 + 2' is a valid C expression--a valid part of a C
  403. program--but it is not valid as an *entire* C program. In the
  404. context-free grammar of C, this follows from the fact that
  405. `expression' is not the start symbol.
  406. The Bison parser reads a sequence of tokens as its input, and groups
  407. the tokens using the grammar rules. If the input is valid, the end
  408. result is that the entire token sequence reduces to a single grouping
  409. whose symbol is the grammar's start symbol. If we use a grammar for
  410. C, the entire input must be a `sequence of definitions and
  411. declarations'. If not, the parser reports a syntax error.
  412. 
  413. File: bison.info, Node: Grammar in Bison, Next: Semantic Values, Prev: Language and Grammar, Up: Concepts
  414. From Formal Rules to Bison Input
  415. ================================
  416. A formal grammar is a mathematical construct. To define the language
  417. for Bison, you must write a file expressing the grammar in Bison
  418. syntax: a "Bison grammar" file. *Note Grammar File::.
  419. A nonterminal symbol in the formal grammar is represented in Bison
  420. input as an identifier, like an identifier in C. By convention, it
  421. should be in lower case, such as `expr', `stmt' or `declaration'.
  422. The Bison representation for a terminal symbol is also called a
  423. "token type". Token types as well can be represented as C-like
  424. identifiers. By convention, these identifiers should be upper case
  425. to distinguish them from nonterminals: for example, `INTEGER',
  426. `IDENTIFIER', `IF' or `RETURN'. A terminal symbol that stands for a
  427. particular keyword in the language should be named after that keyword
  428. converted to upper case. The terminal symbol `error' is reserved for
  429. error recovery. *Note Symbols::.
  430. A terminal symbol can also be represented as a character literal,
  431. just like a C character constant. You should do this whenever a
  432. token is just a single character (parenthesis, plus-sign, etc.): use
  433. that same character in a literal as the terminal symbol for that token.
  434. The grammar rules also have an expression in Bison syntax. For
  435. example, here is the Bison rule for a C `return' statement. The
  436. semicolon in quotes is a literal character token, representing part
  437. of the C syntax for the statement; the naked semicolon, and the
  438. colon, are Bison punctuation used in every rule.
  439. stmt: RETURN expr ';'
  440. ;
  441. *Note Rules::.
  442. 
  443. File: bison.info, Node: Semantic Values, Next: Semantic Actions, Prev: Grammar in Bison, Up: Concepts
  444. Semantic Values
  445. ===============
  446. A formal grammar selects tokens only by their classifications: for
  447. example, if a rule mentions the terminal symbol `integer constant',
  448. it means that *any* integer constant is grammatically valid in that
  449. position. The precise value of the constant is irrelevant to how to
  450. parse the input: if `x+4' is grammatical then `x+1' or `x+3989' is
  451. equally grammatical.
  452. But the precise value is very important for what the input means once
  453. it is parsed. A compiler is useless if it fails to distinguish
  454. between 4, 1 and 3989 as constants in the program! Therefore, each
  455. token in a Bison grammar has both a token type and a "semantic
  456. value". *Note Semantics::, for details.
  457. The token type is a terminal symbol defined in the grammar, such as
  458. `INTEGER', `IDENTIFIER' or `',''. It tells everything you need to
  459. know to decide where the token may validly appear and how to group it
  460. with other tokens. The grammar rules know nothing about tokens
  461. except their types.
  462. The semantic value has all the the rest of the information about the
  463. meaning of the token, such as the value of an integer, or the name of
  464. an identifier. (A token such as `','' which is just punctuation
  465. doesn't need to have any semantic value.)
  466. For example, an input token might be classified as token type
  467. `INTEGER' and have the semantic value 4. Another input token might
  468. have the same token type `INTEGER' but value 3989. When a grammar
  469. rule says that `INTEGER' is allowed, either of these tokens is
  470. acceptable because each is an `INTEGER'. When the parser accepts the
  471. token, it keeps track of the token's semantic value.
  472. Each grouping can also have a semantic value as well as its
  473. nonterminal symbol. For example, in a calculator, an expression
  474. typically has a semantic value that is a number. In a compiler for a
  475. programming language, an expression typically has a semantic value
  476. that is a tree structure describing the meaning of the expression.
  477. 
  478. File: bison.info, Node: Semantic Actions, Next: Bison Parser, Prev: Semantic Values, Up: Concepts
  479. Semantic Actions
  480. ================
  481. In order to be useful, a program must do more than parse input; it
  482. must also produce some output based on the input. In a Bison
  483. grammar, a grammar rule can have an "action" made up of C statements.
  484. Each time the parser recognizes a match for that rule, the action is
  485. executed. *Note Actions::. Most of the time, the purpose
  486. of an action is to compute the semantic value of the whole construct
  487. from the semantic values of its parts. For example, suppose we have
  488. a rule which says an expression can be the sum of two expressions.
  489. When the parser recognizes such a sum, each of the subexpressions has
  490. a semantic value which describes how it was built up. The action for
  491. this rule should create a similar sort of value for the newly
  492. recognized larger expression.
  493. For example, here is a rule that says an expression can be the sum of
  494. two subexpressions:
  495. expr: expr '+' expr { $$ = $1 + $3; }
  496. ;
  497. The action says how to produce the semantic value of the sum
  498. expression from the values of the two subexpressions.
  499. 
  500. File: bison.info, Node: Bison Parser, Next: Stages, Prev: Semantic Actions, Up: Concepts
  501. Bison Output: the Parser File
  502. =============================
  503. When you run Bison, you give it a Bison grammar file as input. The
  504. output is a C source file that parses the language described by the
  505. grammar. This file is called a "Bison parser". Keep in mind that
  506. the Bison utility and the Bison parser are two distinct programs: the
  507. Bison utility is a program whose output is the Bison parser that
  508. becomes part of your program.
  509. The job of the Bison parser is to group tokens into groupings
  510. according to the grammar rules--for example, to build identifiers and
  511. operators into expressions. As it does this, it runs the actions for
  512. the grammar rules it uses.
  513. The tokens come from a function called the "lexical analyzer" that
  514. you must supply in some fashion (such as by writing it in C). The
  515. Bison parser calls the lexical analyzer each time it wants a new
  516. token. It doesn't know what is ``inside'' the tokens (though their
  517. semantic values may reflect this). Typically the lexical analyzer
  518. makes the tokens by parsing characters of text, but Bison does not
  519. depend on this. *Note Lexical::.
  520. The Bison parser file is C code which defines a function named
  521. `yyparse' which implements that grammar. This function does not make
  522. a complete C program: you must supply some additional functions. One
  523. is the lexical analyzer. Another is an error-reporting function
  524. which the parser calls to report an error. In addition, a complete C
  525. program must start with a function called `main'; you have to provide
  526. this, and arrange for it to call `yyparse' or the parser will never
  527. run. *Note Interface::.
  528. Aside from the token type names and the symbols in the actions you
  529. write, all variable and function names used in the Bison parser file
  530. begin with `yy' or `YY'. This includes interface functions such as
  531. the lexical analyzer function `yylex', the error reporting function
  532. `yyerror' and the parser function `yyparse' itself. This also
  533. includes numerous identifiers used for internal purposes. Therefore,
  534. you should avoid using C identifiers starting with `yy' or `YY' in
  535. the Bison grammar file except for the ones defined in this manual.
  536. 
  537. File: bison.info, Node: Stages, Next: Grammar Layout, Prev: Bison Parser, Up: Concepts
  538. Stages in Using Bison
  539. =====================
  540. The actual language-design process using Bison, from grammar
  541. specification to a working compiler or interpreter, has these parts:
  542. 1. Formally specify the grammar in a form recognized by Bison
  543. (*note Grammar File::.). For each grammatical rule in the
  544. language, describe the action that is to be taken when an
  545. instance of that rule is recognized. The action is described by
  546. a sequence of C statements.
  547. 2. Write a lexical analyzer to process input and pass tokens to the
  548. parser. The lexical analyzer may be written by hand in C (*note
  549. Lexical::.). It could also be produced using Lex, but the use
  550. of Lex is not discussed in this manual.
  551. 3. Write a controlling function that calls the Bison-produced parser.
  552. 4. Write error-reporting routines.
  553. To turn this source code as written into a runnable program, you must
  554. follow these steps:
  555. 1. Run Bison on the grammar to produce the parser.
  556. 2. Compile the code output by Bison, as well as any other source
  557. files.
  558. 3. Link the object files to produce the finished product.
  559. 
  560. File: bison.info, Node: Grammar Layout, Prev: Stages, Up: Concepts
  561. The Overall Layout of a Bison Grammar
  562. =====================================
  563. The input file for the Bison utility is a "Bison grammar file". The
  564. general form of a Bison grammar file is as follows:
  565. %{
  566. C DECLARATIONS
  567. %}
  568. BISON DECLARATIONS
  569. %%
  570. GRAMMAR RULES
  571. %%
  572. ADDITIONAL C CODE
  573. The `%%', `%{' and `%}' are punctuation that appears in every Bison
  574. grammar file to separate the sections.
  575. The C declarations may define types and variables used in the actions.
  576. You can also use preprocessor commands to define macros used there,
  577. and use `#include' to include header files that do any of these things.
  578. The Bison declarations declare the names of the terminal and
  579. nonterminal symbols, and may also describe operator precedence and
  580. the data types of semantic values of various symbols.
  581. The grammar rules define how to construct each nonterminal symbol
  582. from its parts.
  583. The additional C code can contain any C code you want to use. Often
  584. the definition of the lexical analyzer `yylex' goes here, plus
  585. subroutines called by the actions in the grammar rules. In a simple
  586. program, all the rest of the program can go here.
  587. 
  588. File: bison.info, Node: Examples, Next: Grammar File, Prev: Concepts, Up: Top
  589. Examples
  590. ********
  591. Now we show and explain three sample programs written using Bison: a
  592. reverse polish notation calculator, an algebraic (infix) notation
  593. calculator, and a multi-function calculator. All three have been
  594. tested under BSD Unix 4.3; each produces a usable, though limited,
  595. interactive desk-top calculator.
  596. These examples are simple, but Bison grammars for real programming
  597. languages are written the same way.
  598. You can copy these examples out of the Info file and into a source
  599. file to try them.
  600. * Menu:
  601. * RPN Calc:: Reverse polish notation calculator;
  602. a first example with no operator precedence.
  603. * Infix Calc:: Infix (algebraic) notation calculator.
  604. Operator precedence is introduced.
  605. * Simple Error Recovery:: Continuing after syntax errors.
  606. * Multi-function Calc:: Calculator with memory and trig functions.
  607. It uses multiple data-types for semantic values.
  608. * Exercises:: Ideas for improving the multi-function calculator.
  609. 
  610. File: bison.info, Node: RPN Calc, Next: Infix Calc, Prev: Examples, Up: Examples
  611. Reverse Polish Notation Calculator
  612. ==================================
  613. The first example is that of a simple double-precision "reverse
  614. polish notation" calculator (a calculator using postfix operators).
  615. This example provides a good starting point, since operator
  616. precedence is not an issue. The second example will illustrate how
  617. operator precedence is handled.
  618. The source code for this calculator is named `rpcalc.y'. The `.y'
  619. extension is a convention used for Bison input files.
  620. * Menu:
  621. * Decls: Rpcalc Decls. Bison and C declarations for rpcalc.
  622. * Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
  623. * Input: Rpcalc Input. Explaining the rules for `input'.
  624. * Line: Rpcalc Line. Explaining the rules for `line'.
  625. * Expr: Rpcalc Expr. Explaining the rules for `expr'.
  626. * Lexer: Rpcalc Lexer. The lexical analyzer.
  627. * Main: Rpcalc Main. The controlling function.
  628. * Error: Rpcalc Error. The error reporting function.
  629. * Gen: Rpcalc Gen. Running Bison on the grammar file.
  630. * Comp: Rpcalc Compile. Run the C compiler on the output code.
  631. 
  632. File: bison.info, Node: Rpcalc Decls, Next: Rpcalc Rules, Prev: RPN calc, Up: RPN calc
  633. Declarations for Rpcalc
  634. -----------------------
  635. Here are the C and Bison declarations for the reverse polish notation
  636. calculator. As in C, comments are placed between `/*...*/'.
  637. /* Reverse polish notation calculator. */
  638. %{
  639. #define YYSTYPE double
  640. #include <math.h>
  641. %}
  642. %token NUM
  643. %% /* Grammar rules and actions follow */
  644. The C declarations section (*note C Declarations::.) contains two
  645. preprocessor directives.
  646. The `#define' directive defines the macro `YYSTYPE', thus specifying
  647. the C data type for semantic values of both tokens and groupings
  648. (*note Value Type::.). The Bison parser will use whatever type
  649. `YYSTYPE' is defined as; if you don't define it, `int' is the
  650. default. Because we specify `double', each token and each expression
  651. has an associated value, which is a floating point number.
  652. The `#include' directive is used to declare the exponentiation
  653. function `pow'.
  654. The second section, Bison declarations, provides information to Bison
  655. about the token types (*note Bison Declarations::.). Each terminal
  656. symbol that is not a single-character literal must be declared here.
  657. (Single-character literals normally don't need to be declared.) In
  658. this example, all the arithmetic operators are designated by
  659. single-character literals, so the only terminal symbol that needs to
  660. be declared is `NUM', the token type for numeric constants.
  661. 
  662. File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Input, Prev: Rpcalc Decls, Up: RPN Calc
  663. Grammar Rules for Rpcalc
  664. ------------------------
  665. Here are the grammar rules for the reverse polish notation calculator.
  666. input: /* empty */
  667. | input line
  668. ;
  669. line: '\n'
  670. | exp '\n' { printf ("\t%.10g\n", $1); }
  671. ;
  672. exp: NUM { $$ = $1; }
  673. | exp exp '+' { $$ = $1 + $2; }
  674. | exp exp '-' { $$ = $1 - $2; }
  675. | exp exp '*' { $$ = $1 * $2; }
  676. | exp exp '/' { $$ = $1 / $2; }
  677. /* Exponentiation */
  678. | exp exp '^' { $$ = pow ($1, $2); }
  679. /* Unary minus */
  680. | exp 'n' { $$ = -$1; }
  681. ;
  682. %%
  683. The groupings of the rpcalc ``language'' defined here are the
  684. expression (given the name `exp'), the line of input (`line'), and
  685. the complete input transcript (`input'). Each of these nonterminal
  686. symbols has several alternate rules, joined by the `|' punctuator
  687. which is read as ``or''. The following sections explain what these
  688. rules mean.
  689. The semantics of the language is determined by the actions taken when
  690. a grouping is recognized. The actions are the C code that appears
  691. inside braces. *Note Actions::.
  692. You must specify these actions in C, but Bison provides the means for
  693. passing semantic values between the rules. In each action, the
  694. pseudo-variable `$$' stands for the semantic value for the grouping
  695. that the rule is going to construct. Assigning a value to `$$' is
  696. the main job of most actions. The semantic values of the components
  697. of the rule are referred to as `$1', `$2', and so on.
  698. 
  699. File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Prev: Rpcalc Rules, Up: RPN Calc
  700. Explanation of `input'
  701. ......................
  702. Consider the definition of `input':
  703. input: /* empty */
  704. | input line
  705. ;
  706. This definition reads as follows: ``A complete input is either an
  707. empty string, or a complete input followed by an input line''.
  708. Notice that ``complete input'' is defined in terms of itself. This
  709. definition is said to be "left recursive" since `input' appears
  710. always as the leftmost symbol in the sequence. *Note Recursion::.
  711. The first alternative is empty because there are no symbols between
  712. the colon and the first `|'; this means that `input' can match an
  713. empty string of input (no tokens). We write the rules this way
  714. because it is legitimate to type `Ctrl-d' right after you start the
  715. calculator. It's conventional to put an empty alternative first and
  716. write the comment `/* empty */' in it.
  717. The second alternate rule (`input line') handles all nontrivial input.
  718. It means, ``After reading any number of lines, read one more line if
  719. possible.'' The left recursion makes this rule into a loop. Since
  720. the first alternative matches empty input, the loop can be executed
  721. zero or more times.
  722. The parser function `yyparse' continues to process input until a
  723. grammatical error is seen or the lexical analyzer says there are no
  724. more input tokens; we will arrange for the latter to happen at end of
  725. file.
  726. 
  727. File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: RPN Calc
  728. Explanation of `line'
  729. .....................
  730. Now consider the definition of `line':
  731. line: '\n'
  732. | exp '\n' { printf ("\t%.10g\n", $1); }
  733. ;
  734. The first alternative is a token which is a newline character; this
  735. means that rpcalc accepts a blank line (and ignores it, since there
  736. is no action). The second alternative is an expression followed by a
  737. newline. This is the alternative that makes rpcalc useful. The
  738. semantic value of the `exp' grouping is the value of `$1' because the
  739. `exp' in question is the first symbol in the alternative. The action
  740. prints this value, which is the result of the computation the user
  741. asked for.
  742. This action is unusual because it does not assign a value to `$$'.
  743. As a consequence, the semantic value associated with the `line' is
  744. uninitialized (its value will be unpredictable). This would be a bug
  745. if that value were ever used, but we don't use it: once rpcalc has
  746. printed the value of the user's input line, that value is no longer
  747. needed.
  748. 
  749. File: bison.info, Node: Rpcalc Expr, Next: Rpcalc Lexer, Prev: Rpcalc Line, Up: RPN Calc
  750. Explanation of `expr'
  751. .....................
  752. The `exp' grouping has several rules, one for each kind of expression.
  753. The first rule handles the simplest expressions: those that are just
  754. numbers. The second handles an addition-expression, which looks like
  755. two expressions followed by a plus-sign. The third handles
  756. subtraction, and so on.
  757. exp: NUM
  758. | exp exp '+' { $$ = $1 + $2; }
  759. | exp exp '-' { $$ = $1 - $2; }
  760. ...
  761. ;
  762. We have used `|' to join all the rules for `exp', but we could
  763. equally well have written them separately:
  764. exp: NUM ;
  765. exp: exp exp '+' { $$ = $1 + $2; } ;
  766. exp: exp exp '-' { $$ = $1 - $2; } ;
  767. ...
  768. Most of the rules have actions that compute the value of the
  769. expression in terms of the value of its parts. For example, in the
  770. rule for addition, `$1' refers to the first component `exp' and `$2'
  771. refers to the second one. The third component, `'+'', has no
  772. meaningful associated semantic value, but if it had one you could
  773. refer to it as `$3'. When `yyparse' recognizes a sum expression
  774. using this rule, the sum of the two subexpressions' values is
  775. produced as the value of the entire expression. *Note Actions::.
  776. You don't have to give an action for every rule. When a rule has no
  777. action, Bison by default copies the value of `$1' into `$$'. This is
  778. what happens in the first rule (the one that uses `NUM').
  779. The formatting shown here is the recommended convention, but Bison
  780. does not require it. You can add or change whitespace as much as you
  781. wish. For example, this:
  782. exp : NUM | exp exp '+' {$$ = $1 + $2; } | ...
  783. means the same thing as this:
  784. exp: NUM
  785. | exp exp '+' { $$ = $1 + $2; }
  786. | ...
  787. The latter, however, is much more readable.
  788. 
  789. File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Expr, Up: RPN Calc
  790. The Rpcalc Lexical Analyzer
  791. ---------------------------
  792. The lexical analyzer's job is low-level parsing: converting
  793. characters or sequences of characters into tokens. The Bison parser
  794. gets its tokens by calling the lexical analyzer. *Note Lexical::.
  795. Only a simple lexical analyzer is needed for the RPN calculator.
  796. This lexical analyzer skips blanks and tabs, then reads in numbers as
  797. `double' and returns them as `NUM' tokens. Any other character that
  798. isn't part of a number is a separate token. Note that the token-code
  799. for such a single-character token is the character itself.
  800. The return value of the lexical analyzer function is a numeric code
  801. which represents a token type. The same text used in Bison rules to
  802. stand for this token type is also a C expression for the numeric code
  803. for the type. This works in two ways. If the token type is a
  804. character literal, then its numeric code is the ASCII code for that
  805. character; you can use the same character literal in the lexical
  806. analyzer to express the number. If the token type is an identifier,
  807. that identifier is defined by Bison as a C macro whose definition is
  808. the appropriate number. In this example, therefore, `NUM' becomes a
  809. macro for `yylex' to use.
  810. The semantic value of the token (if it has one) is stored into the
  811. global variable `yylval', which is where the Bison parser will look
  812. for it. (The C data type of `yylval' is `YYSTYPE', which was defined
  813. at the beginning of the grammar; *note Rpcalc Decls::..)
  814. A token type code of zero is returned if the end-of-file is
  815. encountered. (Bison recognizes any nonpositive value as indicating
  816. the end of the input.)
  817. Here is the code for the lexical analyzer:
  818. /* Lexical analyzer returns a double floating point number on the
  819. stack and the token NUM, or the ASCII character read if not a
  820. number. Skips all blanks and tabs, returns 0 for EOF. */
  821. #include <ctype.h>
  822. yylex ()
  823. {
  824. int c;
  825. while ((c = getchar ()) == ' ' || c == '\t') /* skip white space */
  826. ;
  827. if (c == '.' || isdigit (c)) /* process numbers */
  828. {
  829. ungetc (c, stdin);
  830. scanf ("%lf", &yylval);
  831. return NUM;
  832. }
  833. if (c == EOF) /* return end-of-file */
  834. return 0;
  835. return c; /* return single chars */
  836. }
  837. 
  838. File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc
  839. The Controlling Function
  840. ------------------------
  841. In keeping with the spirit of this example, the controlling function
  842. is kept to the bare minimum. The only requirement is that it call
  843. `yyparse' to start the process of parsing.
  844. main ()
  845. {
  846. yyparse ();
  847. }
  848. 
  849. File: bison.info, Node: Rpcalc Error, Next: Rpcalc Gen, Prev: Rpcalc Main, Up: RPN Calc
  850. The Error Reporting Routine
  851. ---------------------------
  852. When `yyparse' detects a syntax error, it calls the error reporting
  853. function `yyerror' to print an error message (usually but not always
  854. `"parse error"'). It is up to the programmer to supply `yyerror'
  855. (*note Interface::.), so here is the definition we will use:
  856. #include <stdio.h>
  857. yyerror (s) /* Called by yyparse on error */
  858. char *s;
  859. {
  860. printf ("%s\n", s);
  861. }
  862. After `yyerror' returns, the Bison parser may recover from the error
  863. and continue parsing if the grammar contains a suitable error rule
  864. (*note Error Recovery::.). Otherwise, `yyparse' returns nonzero. We
  865. have not written any error rules in this example, so any invalid
  866. input will cause the calculator program to exit. This is not clean
  867. behavior for a real calculator, but it is adequate in the first
  868. example.
  869. 
  870. File: bison.info, Node: Rpcalc Gen, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc
  871. Running Bison to Make the Parser
  872. --------------------------------
  873. Before running Bison to produce a parser, we need to decide how to
  874. arrange all the source code in one or more source files. For such a
  875. simple example, the easiest thing is to put everything in one file.
  876. The definitions of `yylex', `yyerror' and `main' go at the end, in
  877. the ``additional C code'' section of the file (*note Grammar
  878. Layout::.).
  879. For a large project, you would probably have several source files,
  880. and use `make' to arrange to recompile them.
  881. With all the source in a single file, you use the following command
  882. to convert it into a parser file:
  883. bison FILE_NAME.y
  884. In this example the file was called `rpcalc.y' (for ``Reverse Polish
  885. CALCulator''). Bison produces a file named `FILE_NAME.tab.c',
  886. removing the `.y' from the original file name. The file output by
  887. Bison contains the source code for `yyparse'. The additional
  888. functions in the input file (`yylex', `yyerror' and `main') are
  889. copied verbatim to the output.
  890. 
  891. File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Gen, Up: RPN Calc
  892. Compiling the Parser File
  893. -------------------------
  894. Here is how to compile and run the parser file:
  895. # List files in current directory.
  896. % ls
  897. rpcalc.tab.c rpcalc.y
  898. # Compile the Bison parser.
  899. # `-lm' tells compiler to search math library for `pow'.
  900. % cc rpcalc.tab.c -lm -o rpcalc
  901. # List files again.
  902. % ls
  903. rpcalc rpcalc.tab.c rpcalc.y
  904. The file `rpcalc' now contains the executable code. Here is an
  905. example session using `rpcalc'.
  906. % rpcalc
  907. 4 9 +
  908. 13
  909. 3 7 + 3 4 5 *+-
  910. -13
  911. 3 7 + 3 4 5 * + - n Note the unary minus, `n'
  912. 13
  913. 5 6 / 4 n +
  914. -3.166666667
  915. 3 4 ^ Exponentiation
  916. 81
  917. ^D End-of-file indicator
  918. %