123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197 |
- # Purpose: To accept texts like these ($ stands for end of string):
- # package a.b.c;$
- # package p1.p2.p3;$
- # And reject texts which don't follow this form,
- # with a sed (Stream EDitor) implementation of the concepts exposed in LR Parsing, from A.V. Aho and S.C. Johnson.
- # Texts which span across lines, and/or contain whitespaces between unbreakable elements as well as at string edges, are also accepted:
- # package
- # a
- # .
- # b . c
- # ; $
- # TODO Add comments on how to read this program. Let it be sort of mapping from this program itself to the book LR Parsing.
- # 0: _package a.b.c;$
- # 1: PACKAGE_a.b.c;$
- # 2: PACKAGE ID_.b.c;$
- # 3: PACKAGE SEQ_.b.c;$
- # 4: PACKAGE SEQ '.' _b.c;$
- # 5: PACKAGE SEQ '.' ID_.c;$
- # 3: PACKAGE SEQ_.c;$
- # 4: PACKAGE SEQ '.' _c;$
- # 5: PACKAGE SEQ '.' ID_;$
- # 3: PACKAGE SEQ_;$
- # 6: PKGSTMT_;$
- # 7: PKGSTMT';'_$ => accept (currently this just means the program halt with its last significant operation being pushing one state atop ';' -the 'accept' just means the abscense of error until the program execution can't go ahead).
- # Keeping the code correct as more and more table entries are added (both parse action and goto table) usually means to take care of the following:
- # - The states that should be checked for acceptance can change due to new states taking the place of acceptance state. This would be modified in the code that executes only at last input line (the one surrounded with ${ and }).
- # - Coming to a missmatch after having looked up over all table entries is something indeed not expected. It would mean a serious problem in the code. Still, jump into an error routine (:error label) is commanded (or stated, as in statement) to prevent such a bad code from running away (in other words, it's like a failfast on unexpected conditions). The last table entry lookup is followed by this jump. All others jumps following table entry checks should point to the next one. Thus, adding a new state entiles chaining the former last entry with the new one and that one with the error routine. This is true for goto table.
- # - Adding a new state also entiles to re-chain the former with the new last entry and this one with the parse_action_else routine in the parse action table totally likewise to what's done for the error routine and the goto table.
- x
- :next_cycle
- x
- s/^\s\+//
- t L1
- :L1
- s/^$//
- T stick_with_input_line
- # Accept on last input line
- ${
- x
- s/^7/\0/ # check stack top for acceptance states
- T error
- }
- b
- :stick_with_input_line
- # Parse action table
- # State retrieval
- x
- :0
- s/^0/\0/
- T 1
- x
- :0_PACKAGE
- s/^package\b//
- T error
- # shift PACKAGE
- x
- s/^/PACKAGE\n/
- t goto_table
- :1
- s/^1/\0/
- T 2
- x
- :1_ID
- s/^[a-zA-Z_][a-zA-Z0-9_]*//
- T error
- # shift ID
- x
- s/^/ID\n/
- t goto_table
- :2
- s/^2/\0/
- T 3
- x
- :2_'.'
- s/^\./\0/
- T error
- # reduce by rule SEQ -> ID
- x
- s/^2\nID/SEQ/
- t goto_table
- :3
- s/^3/\0/
- T 4
- x
- :3_'.'
- s/^\.//
- T 3_SEMICOLON
- # shift '.'
- x
- s/^/'.'\n/
- t goto_table
- :3_SEMICOLON
- s/^;/\0/
- T error
- # reduce by rule PKGSTMT -> PACKAGE SEQ
- x
- s/^3\nSEQ\n1\nPACKAGE/PKGSTMT/
- t goto_table
- :4
- s/^4/\0/
- T 5
- x
- :4_ID
- s/^[a-zA-Z_][a-zA-Z0-9_]*//
- T error
- # shift ID
- x
- s/^/ID\n/
- t goto_table
- :5
- s/^5/\0/
- T 6
- x
- :5_'.'
- :5_SEMICOLON
- s/^[.;]/\0/
- T error
- # reduce by rule SEQ -> SEQ '.' ID
- x
- s/^5\nID\n4\n'\.'\n3\nSEQ/SEQ/
- t goto_table
- :6
- s/^6/\0/
- T parse_action_else # It should solely follow the last entry in the table lookup (checked for a match).
- x
- :6_SEMICOLON
- s/^;//
- T error
- # shift ';'
- x
- s/^/';'\n/
- t goto_table
- :parse_action_else
- # Before jumping to init_stack a check is done that the parser is not in an acceptance state in which more input is remaining but not allowed.
- # Having arrived the flow to here implicitly always means more input is remaining.
- # A clean and cheap way is to impose for init_stack to be jumped into, that the stack must be totally empty. This check is cheaper (more inexpensive) than checking arround acceptance states.
- s/^$//
- t init_stack
- b error
- :goto_table
- # It's trivial yet worthnoting, upon walking across goto table's 2nd. dimmension (where stack top is being transited from a symbol to a new state), utmost correctness rules would require to jump to the next entry from each one against which a missmatch has turned out. In practice, belonging these 2 types of elements (symbols and states) each on its separate domain, and especially being they of different form distinguishable one another, guarrantee is that at most one entry will hit while the others not.
- # Rightmost state
- :0g
- s/\`.*\n0/\0/m
- T 1g
- :0g_PACKAGE
- s/^PACKAGE/1\n\0/
- t next_cycle
- :0g_PKGSTMT
- s/^PKGSTMT/6\n\0/
- t next_cycle
- :1g
- s/\`.*\n1/\0/m
- T 3g
- :1g_ID
- s/^ID/2\n\0/
- t next_cycle
- :1g_SEQ
- s/^SEQ/3\n\0/
- t next_cycle
- :3g
- s/\`.*\n3/\0/m
- T 4g
- :3g_'.'
- s/^'\.'/4\n\0/
- t next_cycle
- :4g
- s/\`.*\n4/\0/m
- T 6g
- :4g_ID
- s/^ID/5\n\0/
- t next_cycle
- :6g
- s/\`.*\n6/\0/m
- T error # This is a goto table exit condidion. It must go only where every else table entry cannot be provided.
- :6g_SEMICOLON
- s/^';'/7\n\0/
- t next_cycle
- :init_stack
- s/^$/0/
- t next_cycle
- a\
- Oh oh!
- :error
- p
- a\
- Error
- x
- q 1 # GNU's extension
|