fixity-declarations 14 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388
  1. Suggestion for how operator fixity should be specified
  2. ------------------------------------------------------
  3. Nils Anders Danielsson
  4. (By fixity I mean associativity and precedence.)
  5. The current scheme is a mess. With Unicode symbols and mixfix
  6. operators users (such as myself) tend to define more operators than in
  7. Haskell, and then the Haskell fixity handling is too limited. It is
  8. very hard to get an overview over a total ordering which specifies how
  9. tight every operator binds in comparison to every other.
  10. This note describes a way to avoid these problems. The solution is not
  11. perfect--some limitations are discussed towards the end--but it is
  12. quite lightweight, so it should be relatively easy to implement and
  13. try out.
  14. New approach
  15. ------------
  16. Associativity can be specified just as before. An operator is either
  17. left associative, right associative, or nonassociative. (Note that
  18. only infix operators can be left or right associative; pre- and
  19. postfix operators are always nonassociative.)
  20. The basic idea of the new approach to precedence handling is to
  21. abandon the current total order and instead have a partial order of
  22. precedences. This is an old idea, which is easy to understand. The
  23. basic difference compared to the current scheme is that two operators
  24. of noncomparable precedence cannot be used next to each other without
  25. inserting parentheses. The only crux is to find a good way of
  26. specifying the precedences.
  27. I believe that it is a good idea if the precedence of an operator can
  28. be understood locally, so I suggest that one should only be allowed to
  29. specify precedences at the binding site (in the defining module) of an
  30. operator, conservatively extended when new operators are defined.
  31. (This rules out having "first class precedences", where the
  32. precedences of an operator are free-standing entities which can be
  33. exported and imported separately from the operators themselves.)
  34. Precedences are defined for an operator • by relating it to
  35. previously defined operators. This can be done in three ways:
  36. * infix[lr ] • binds as ∘
  37. This means that • (which is left, right or nonassociative) binds in
  38. exactly the same way as ∘.
  39. * infix[lr ] • binds tighter-than (op₁…) looser-than (op₂…)
  40. This means that • binds strictly tighter than op₁… and strictly
  41. looser than op₂…. The two parts tighter-than (…) and looser-than (…)
  42. can be given in any order, and one of them can be omitted.
  43. This declaration is only valid if it does not change the relations
  44. of any previously declared operators, i.e., if the precedence
  45. relation before this declaration is denoted by ⊰ and the one after
  46. this declaration by ⊰′, then the following property must hold:
  47. ∀ op₁ ≠ •. ∀ op₂ ≠ •. op₁ ⊰ op₂ ⇔ op₁ ⊰′ op₂.
  48. This property ensures some degree of locality for the precedences:
  49. To see if/how two operators are related it is enough to inspect the
  50. fixity declarations of these two operators, plus those of the
  51. operators referred to in these declarations (transitively). It is
  52. impossible for an unrelated fixity declaration to change this
  53. relation.
  54. * infix[lr ] •
  55. With this declaration • becomes unrelated to all other operators.
  56. * No fixity declaration for • is the same as specifying "infix •",
  57. i.e. • becomes nonassociative and unrelated to all other operators.
  58. It should also be possible to combine the fixity declarations of
  59. several operators, for instance as follows:
  60. infixl _op₁_ _op₂_ _op₃ binds looser-than (_+_)
  61. This is equivalent to the following three declarations:
  62. infixl _op₁_ binds looser-than (_+_)
  63. infixl _op₂_ binds as _op₁_
  64. infixl _op₃ binds as _op₁_
  65. (Note the use of _op₁_ in the last two declarations.)
  66. Some minor details
  67. ------------------
  68. Some minor details (as compared to the current fixity handling in
  69. Agda):
  70. * Non-operator (function) symbols should still bind tighter than
  71. everything else.
  72. * Fixity declarations should of course be scope checked, and an error
  73. given if a fixity declaration is given for an operator which is not
  74. in scope.
  75. * It should be possible to give fixity declarations to record fields,
  76. for instance as follows:
  77. infix Setoid._≈_ binds as _≡_
  78. infixl Ring._-_ binds as Ring._+_
  79. * I do not like the fact that, for operators of the same precedence,
  80. the following sub-order of precedence is used:
  81. * postfix
  82. * prefix
  83. * infix right associative
  84. * infix left associative
  85. * infix nonassociative
  86. As an example, take the following fixity declarations:
  87. infixr 6 _∷_
  88. infixl 6 _+_
  89. infix 6 -_ _!
  90. Currently they result in 5 + 6 ∷ [], - 5 + 6 and - 2 ! parsing as (5
  91. + 6) ∷ [], (- 5) + 6 and - (2 !), even though these expressions
  92. should not parse at all.
  93. Limitations
  94. -----------
  95. The scheme outlined above has a limitation, demonstrated by the
  96. following example:
  97. Let us say that two libraries, one for sets and one for arithmetic,
  98. are developed independently. It is probably unreasonable (if one
  99. wants to keep unrelated code separate) for one of these libraries to
  100. depend on the other. Hence expressions such as the following won't
  101. parse: a + b ∈ c. Parentheses will have to be used: (a + b) ∈ c.
  102. To me this example is not very convincing, though. If the two
  103. libraries are really separate, then there should not be _any_
  104. connection between them. If, on the other hand, it is a requirement
  105. that _+_ should really bind tighter than _∈_, then the libraries are
  106. not unrelated, but one should import the other.
  107. Another possible problem with the scheme outlined above is its
  108. implementation. Currently mixfix operator parsing is implemented in
  109. Agda (more or less) as follows:
  110. * Expressions are first parsed as if every operator was a function
  111. symbol. This yields parse trees (rose trees) which need to be
  112. post-processed.
  113. * Operator parsing is then done as part of scope-checking. For every
  114. symbol sequence (list in the rose tree) in a parsed expression a
  115. dedicated parser is generated based on which operator parts are
  116. present in the sequence. Scope information is needed for this step,
  117. since the relative precedences of the operators and also their
  118. associativity are used to construct the parser. The symbol sequence
  119. is then parsed using this dedicated parser.
  120. It is currently unclear whether this implementation method can be made
  121. efficient for the fixity handling scheme outlined above.
  122. Conclusion
  123. ----------
  124. If the implementation can be made efficient, then I believe that the
  125. scheme outlined above is strictly better than the one we have. It is
  126. also easy to understand. In other words, I will start thinking about
  127. the implementation.
  128. ------------------------------------------------------------------------
  129. Improved syntax
  130. ------------------------------------------------------------------------
  131. infix [ε|left|right] <operators>
  132. [ε|binds [as <operator>
  133. |looser than <operators>
  134. |tighter than <operators>
  135. |looser than <operators> tighter than <operators>
  136. |tighter than <operators> looser than <operators>
  137. ]]
  138. ------------------------------------------------------------------------
  139. Refinement
  140. ------------------------------------------------------------------------
  141. Ulf commented that the scheme above is too inflexible. If (the already
  142. existing) library A defines _+_, and library B defines _&_ (which is
  143. unrelated to _+_), then it is impossible to define _==_ in library C
  144. in such a way that _+_ binds tighter than _==_, which in turn binds
  145. tighter than _&_. In order to accommodate this, let us drop
  146. transitivity.
  147. Details (based on discussions with Ulf):
  148. Precedence relations are DAGs, whose nodes are annotated with sets
  149. of operators. Let node(•) be the node of operator • (if any), and
  150. let n₁ ⊰ n₂ mean that there is an edge from node n₁ to node n₂.
  151. Fixity declarations get the following meanings:
  152. • binds as ∘:
  153. • is added to node(∘).
  154. • binds looser than ∘₁ tighter than ∘₂:
  155. A new node annotated with {•} is added,
  156. plus an edge from node(∘₁) and an edge from node n for all n with
  157. n ⊰ node(∘₁),
  158. plus an edge to node(∘₂) and an edge to node n for all n with
  159. node(∘₂) ⊰ n.
  160. Note that this does not create any new dependencies between ∘₁ and
  161. ∘₂, but • inherits earlier dependencies.
  162. A precedence relation now gives rise to a context free grammar in
  163. the following way:
  164. * The top-level production is as follows:
  165. expr ∷= <atom> | ⋁ {n | n is a node in the graph}
  166. * For every node n the following productions are added:
  167. n ∷= prefix-op⁺ n↑
  168. | n↑ postfix-op⁺
  169. | n↑ infix-non-assoc-op n↑
  170. | (n↑ infix-left-assoc-op)⁺ n↑
  171. | n↑ (infix-right-assoc-op n↑)⁺
  172. n↑ ∷= <atom> | ⋁ {n' | n ⊰ n'}
  173. x-op ∷= ⋁ {op-prod | op is an "x" operator annotating n}
  174. op-prod ∷= op₁ expr op₂ expr op₃ … op_n
  175. (where op_i are the name parts of the mixfix operator op)
  176. Note that if all operator name parts are unique, and <atom>s don't
  177. introduce any ambiguity, then the grammar is unambiguous. However,
  178. we don't want to require all operator name parts to be unique, since
  179. this can be rather inflexible. (Consider a DSEL containing both
  180. if_then_ and if_then_else_, for instance. Or the two operators ⟦_⟧_
  181. and ⟦_⟧'_.) All ambiguous parses will be rejected, in many cases
  182. with an error message listing all possible parses:
  183. Ambiguous parse. Could mean any of the following:
  184. if x then (if y then a) else b
  185. if x then (if y then a else b)
  186. We expect there to be rather few cases of ambiguity. A large number
  187. of potentially ambiguous operators will make it harder to write
  188. syntactically correct programs, and programmers will presumably be
  189. reluctant to subject themselves to this situation.
  190. ------------------------------------------------------------------------
  191. Sections
  192. ------------------------------------------------------------------------
  193. We can also support sections. Some examples will outline how this can
  194. be accomplished:
  195. If we have
  196. _+_ : ...
  197. then 5 +_ and _+ 3 are sections. They stand for
  198. \x -> 5 + x
  199. and
  200. \x -> x + 3,
  201. respectively. Note that +_ becomes a postfix operator, and _+ a
  202. prefix operator. Note also that _+_ can be viewed as a section, and
  203. does not need to be treated as a special case. (The qualified
  204. variant M._+_ still needs special treatment, though.)
  205. All mixfix operators can be sectioned. For instance, if we have
  206. if_then_else_ : ...
  207. then if_then x else y stands for
  208. \b -> if b then x else y.
  209. Parsing of sections is accomplished by letting the lexer distinguish
  210. different uses of '_':
  211. * As a wildcard.
  212. * At the beginning of an operator.
  213. * In the middle of an operator.
  214. * At the end of an operator.
  215. The different uses can be distinguished by examining surrounding
  216. white space.
  217. ------------------------------------------------------------------------
  218. Open questions
  219. ------------------------------------------------------------------------
  220. * What is the sub-class of DAGs that the declarations introduced above
  221. can give rise to? Not all DAGs can be constructed in this way. Take
  222. •⟶•⟶•⟶•, for instance. Could this be overly limiting?
  223. * Does the order of the declarations matter? If it does, then the
  224. scheme should be changed, since otherwise we would have a
  225. non-declarative language for specifying fixities. (It would not be
  226. very nice if the relative precedence of two operators depended on in
  227. which order two modules were imported, for instance.)
  228. Order does not matter for this simple example:
  229. infix _≡_
  230. infix left _+_ binds tighter than _≡_
  231. infix ¬_ binds looser than _≡_
  232. The declarations give rise to the following precedence graph:
  233. ╭─────╮
  234. │ _+_ ├⟵╮
  235. ╰──┬──╯ │
  236. ↑ │
  237. ╭──┴──╮ │
  238. │ _≡_ │ │
  239. ╰──┬──╯ │
  240. ↑ │
  241. ╭──┴──╮ │
  242. │ ¬_ ├─╯
  243. ╰─────╯
  244. If the order of the declarations is changed to
  245. infix _≡_
  246. infix ¬_ binds looser than _≡_
  247. infix left _+_ binds tighter than _≡_
  248. we still get the same graph. Is this generally true?
  249. ------------------------------------------------------------------------
  250. Summary of important "correctness" criteria
  251. ------------------------------------------------------------------------
  252. • Adding a new declaration should not change the relations between
  253. previously declared operators.
  254. • If declarations can be reordered, then the semantics must be
  255. independent of their order.
  256. ------------------------------------------------------------------------
  257. A possible problem with the scheme above
  258. ------------------------------------------------------------------------
  259. Consider the following (Agda-like) modules:
  260. module A where
  261. infix _*_
  262. module B where
  263. import A
  264. infix _^_ binds tighter-than (_*_)
  265. module C where
  266. import A
  267. infix _+_ binds looser-than (_*_)
  268. module D where
  269. import B; import C
  270. In D, do we have node(_+_) ⊰ node(_^_)? If not, then order of
  271. declarations does (in some sense) matter, since putting the two
  272. declarations in the same module would lead to a different result.
  273. However, if we do have node(_+_) ⊰ node(_^_), then the relationship
  274. between the two operators is not fixed until they are brought into the
  275. same scope. Neither scenario feels appealing.
  276. ------------------------------------------------------------------------
  277. Refinement of the refinement
  278. ------------------------------------------------------------------------
  279. I am compelled to remove the transitivity emulation from fixity
  280. declarations. It is too hard to understand. To start with we can
  281. require the user to specify every relationship explicitly. If this
  282. should turn out to require too much work, then the following
  283. extensions seem promising:
  284. • One could invent some notation for specifying the fixity of several
  285. operator groups at once, for instance:
  286. infix (_+_ _-_) < (_*_ _/_) < (_^_)
  287. The different groups in this kind of declaration would be
  288. transitively related.
  289. • One could specify a /module/ in an operator list; this would stand
  290. for all the operators exported from the module (top-level, plus
  291. perhaps records). Note that this may be a bit coarse. If Agda's open
  292. public was more like Haskell's re-exports, then it would be easy to
  293. use the module system to package operators for inclusion in fixity
  294. declarations, though.