.TH REGEXP 7
.SH NAME
regexp \- Plan 9 regular expression notation
.SH DESCRIPTION
This manual page describes the regular expression
syntax used by the Plan 9 regular expression library
.IR regexp (3).
It is the form used by
.IR egrep (1)
before
.I egrep
got complicated.
.PP
A
.I "regular expression"
specifies
a set of strings of characters.
A member of this set of strings is said to be
.I matched
by the regular expression. In many applications
a delimiter character, commonly
.LR / ,
bounds a regular expression.
In the following specification for regular expressions
the word `character' means any character (rune) but newline.
.PP
The syntax for a regular expression
.B e0
is
.IP
.EX
e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'
e2:  e3
  |  e2 REP
REP: '*' | '+' | '?'
e1:  e2
  |  e1 e2
e0:  e1
  |  e0 '|' e1
.EE
.PP
A
.B literal
is any non-metacharacter, or a metacharacter
(one of
.BR .*+?[]()|\e^$ ),
or the delimiter
preceded by
.LR \e .
.PP
A
.B charclass
is a nonempty string
.I s
bracketed
.BI [ \|s\| ]
(or
.BI [^ s\| ]\fR);
it matches any character in (or not in)
.IR s .
A negated character class never
matches newline.
A substring
.IB a - b\f1,
with
.I a
and
.I b
in ascending
order, stands for the inclusive
range of
characters between
.I a
and
.IR b .
In
.IR s ,
the metacharacters
.LR - ,
.LR ] ,
an initial
.LR ^ ,
and the regular expression delimiter
must be preceded by a
.LR \e ;
other metacharacters
have no special meaning and
may appear unescaped.
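.PP
As an illustration only, the following Python sketch exercises
a few character classes.
It uses Python's
.I re
module, which is a different engine from
.IR regexp (3)
and is assumed here to agree with it for these simple classes.
One known difference: a negated class in Python can match newline,
which the notation described here does not allow.
The helper name
.B matches_whole
is hypothetical.
.IP
.EX
```python
import re

def matches_whole(pattern, text):
    """Return True if pattern matches all of text (Python re, not regexp(3))."""
    return re.fullmatch(pattern, text) is not None

# [a-c] is the inclusive range a, b, c.
assert matches_whole("[a-c]", "b")
assert not matches_whole("[a-c]", "d")

# [^a-c] matches any character outside that range.
assert matches_whole("[^a-c]", "d")
assert not matches_whole("[^a-c]", "a")

# Metacharacters such as . and * are ordinary characters inside brackets.
assert matches_whole("[.*]", "*")
assert not matches_whole("[.*]", "x")
```
.EE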
.PP
A
.L .
matches any character.
.PP
A
.L ^
matches the beginning of a line;
.L $
matches the end of the line.
.PP
The
.B REP
operators match zero or more
.RB ( * ),
one or more
.RB ( + ),
or zero or one
.RB ( ? )
instances, respectively, of the preceding regular expression
.BR e2 .
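.PP
The three repetition operators can be sketched as follows.
This is an illustration written with Python's
.I re
module, not with
.IR regexp (3);
the two are assumed to agree for these patterns, and the helper
.B matches_whole
is a hypothetical name.
.IP
.EX
```python
import re

def matches_whole(pattern, text):
    """Return True if pattern matches all of text (Python re, not regexp(3))."""
    return re.fullmatch(pattern, text) is not None

# ab*c: zero or more b between a and c.
assert matches_whole("ab*c", "ac")
assert matches_whole("ab*c", "abbbc")

# ab+c: at least one b is required.
assert not matches_whole("ab+c", "ac")
assert matches_whole("ab+c", "abc")

# ab?c: at most one b is allowed.
assert matches_whole("ab?c", "abc")
assert not matches_whole("ab?c", "abbc")

# The operator applies to the preceding e2, so (ab)* repeats the whole group.
assert matches_whole("(ab)*", "ababab")
assert not matches_whole("(ab)*", "aba")
```
.EE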
.PP
A concatenated regular expression,
.BR "e1\|e2" ,
matches a match to
.B e1
followed by a match to
.BR e2 .
.PP
An alternative regular expression,
.BR "e0\||\|e1" ,
matches either a match to
.B e0
or a match to
.BR e1 .
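.PP
In the grammar above, concatenation binds more tightly than
alternation, and parentheses override that.
The following sketch shows the effect; it uses Python's
.I re
module as a stand-in for
.IR regexp (3),
on the assumption that both parse these patterns the same way.
The helper
.B matches_whole
is a hypothetical name.
.IP
.EX
```python
import re

def matches_whole(pattern, text):
    """Return True if pattern matches all of text (Python re, not regexp(3))."""
    return re.fullmatch(pattern, text) is not None

# ab|cd is (ab) or (cd): concatenation is applied before alternation.
assert matches_whole("ab|cd", "ab")
assert matches_whole("ab|cd", "cd")
assert not matches_whole("ab|cd", "abd")
assert not matches_whole("ab|cd", "acd")

# Parentheses confine the alternation to the middle character.
assert matches_whole("a(b|c)d", "abd")
assert matches_whole("a(b|c)d", "acd")
assert not matches_whole("a(b|c)d", "ab")
```
.EE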
.PP
A match to any part of a regular expression
extends as far as possible without preventing
a match to the remainder of the regular expression.
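.PP
The rule above can be seen in a small sketch using repetition only.
It is written with Python's
.I re
module rather than
.IR regexp (3).
Python tries the alternatives of
.L |
in order, so it is not a reliable guide to this rule for
expressions that contain alternation; the patterns below avoid it.
The helper
.B first_match
is a hypothetical name.
.IP
.EX
```python
import re

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

# a+ takes every available a, not just the first one.
assert first_match("a+", "caaat") == "aaa"

# b* extends over all the b characters that follow the a.
assert first_match("ab*", "abbbc") == "abbb"

# a* gives back one a so that the trailing ab can still match.
assert first_match("a*ab", "xaaab") == "aaab"
```
.EE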
.SH "SEE ALSO"
.IR regexp (3)