123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388 |
- Suggestion for how operator fixity should be specified
- ------------------------------------------------------
- Nils Anders Danielsson
- (By fixity I mean associativity and precedence.)
- The current scheme is a mess. With Unicode symbols and mixfix
- operators users (such as myself) tend to define more operators than in
- Haskell, and then the Haskell fixity handling is too limited. It is
- very hard to get an overview over a total ordering which specifies how
- tight every operator binds in comparison to every other.
- This note describes a way to avoid these problems. The solution is not
- perfect--some limitations are discussed towards the end--but it is
- quite lightweight, so it should be relatively easy to implement and
- try out.
- New approach
- ------------
- Associativity can be specified just as before. An operator is either
- left associative, right associative, or nonassociative. (Note that
- only infix operators can be left or right associative; pre- and
- postfix operators are always nonassociative.)
- The basic idea of the new approach to precedence handling is to
- abandon the current total order and instead have a partial order of
- precedences. This is an old idea, which is easy to understand. The
- basic difference compared to the current scheme is that two operators
- of noncomparable precedence cannot be used next to each other without
- inserting parentheses. The only crux is to find a good way of
- specifying the precedences.
- I believe that it is a good idea if the precedence of an operator can
- be understood locally, so I suggest that one should only be allowed to
- specify precedences at the binding site (in the defining module) of an
- operator, conservatively extended when new operators are defined.
- (This rules out having "first class precedences", where the
- precedences of an operator are free-standing entities which can be
- exported and imported separately from the operators themselves.)
- Precedences are defined for an operator • by relating it to
- previously defined operators. This can be done in three ways:
- * infix[lr ] • binds as ∘
- This means that • (which is left, right or nonassociative) binds in
- exactly the same way as ∘.
- * infix[lr ] • binds tighter-than (op₁…) looser-than (op₂…)
- This means that • binds strictly tighter than op₁… and strictly
- looser than op₂…. The two parts tighter-than (…) and looser-than (…)
- can be given in any order, and one of them can be omitted.
- This declaration is only valid if it does not change the relations
- of any previously declared operators, i.e., if the precedence
- relation before this declaration is denoted by ⊰ and the one after
- this declaration by ⊰′, then the following property must hold:
- ∀ op₁ ≠ •. ∀ op₂ ≠ •. op₁ ⊰ op₂ ⇔ op₁ ⊰′ op₂.
- This property ensures some degree of locality for the precedences:
- To see if/how two operators are related it is enough to inspect the
- fixity declarations of these two operators, plus those of the
- operators referred to in these declarations (transitively). It is
- impossible for an unrelated fixity declaration to change this
- relation.
- * infix[lr ] •
- With this declaration • becomes unrelated to all other operators.
- * No fixity declaration for • is the same as specifying "infix •",
- i.e. • becomes nonassociative and unrelated to all other operators.
- It should also be possible to combine the fixity declarations of
- several operators, for instance as follows:
- infixl _op₁_ _op₂_ _op₃ binds looser-than (_+_)
- This is equivalent to the following three declarations:
- infixl _op₁_ binds looser-than (_+_)
- infixl _op₂_ binds as _op₁_
- infixl _op₃ binds as _op₁_
- (Note the use of _op₁_ in the last two declarations.)
- Some minor details
- ------------------
- Some minor details (as compared to the current fixity handling in
- Agda):
- * Non-operator (function) symbols should still bind tighter than
- everything else.
- * Fixity declarations should of course be scope checked, and an error
- given if a fixity declaration is given for an operator which is not
- in scope.
- * It should be possible to give fixity declarations to record fields,
- for instance as follows:
- infix Setoid._≈_ binds as _≡_
- infixl Ring._-_ binds as Ring._+_
- * I do not like the fact that, for operators of the same precedence,
- the following sub-order of precedence is used:
- * postfix
- * prefix
- * infix right associative
- * infix left associative
- * infix nonassociative
- As an example, take the following fixity declarations:
- infixr 6 _∷_
- infixl 6 _+_
- infix 6 -_ _!
- Currently they result in 5 + 6 ∷ [], - 5 + 6 and - 2 ! parsing as (5
- + 6) ∷ [], (- 5) + 6 and - (2 !), even though these expressions
- should not parse at all.
- Limitations
- -----------
- The scheme outlined above has a limitation, demonstrated by the
- following example:
- Let us say that two libraries, one for sets and one for arithmetic,
- are developed independently. It is probably unreasonable (if one
- wants to keep unrelated code separate) for one of these libraries to
- depend on the other. Hence expressions such as the following won't
- parse: a + b ∈ c. Parentheses will have to be used: (a + b) ∈ c.
- To me this example is not very convincing, though. If the two
- libraries are really separate, then there should not be _any_
- connection between them. If, on the other hand, it is a requirement
- that _+_ should really bind tighter than _∈_, then the libraries are
- not unrelated, but one should import the other.
- Another possible problem with the scheme outlined above is its
- implementation. Currently mixfix operator parsing is implemented in
- Agda (more or less) as follows:
- * Expressions are first parsed as if every operator was a function
- symbol. This yields parse trees (rose trees) which need to be
- post-processed.
- * Operator parsing is then done as part of scope-checking. For every
- symbol sequence (list in the rose tree) in a parsed expression a
- dedicated parser is generated based on which operator parts are
- present in the sequence. Scope information is needed for this step,
- since the relative precedences of the operators and also their
- associativity are used to construct the parser. The symbol sequence
- is then parsed using this dedicated parser.
- It is currently unclear whether this implementation method can be made
- efficient for the fixity handling scheme outlined above.
- Conclusion
- ----------
- If the implementation can be made efficient, then I believe that the
- scheme outlined above is strictly better than the one we have. It is
- also easy to understand. In other words, I will start thinking about
- the implementation.
- ------------------------------------------------------------------------
- Improved syntax
- ------------------------------------------------------------------------
- infix [ε|left|right] <operators>
- [ε|binds [as <operator>
- |looser than <operators>
- |tighter than <operators>
- |looser than <operators> tighter than <operators>
- |tighter than <operators> looser than <operators>
- ]]
- ------------------------------------------------------------------------
- Refinement
- ------------------------------------------------------------------------
- Ulf commented that the scheme above is too inflexible. If (the already
- existing) library A defines _+_, and library B defines _&_ (which is
- unrelated to _+_), then it is impossible to define _==_ in library C
- in such a way that _+_ binds tighter than _==_, which in turn binds
- tighter than _&_. In order to accommodate this, let us drop
- transitivity.
- Details (based on discussions with Ulf):
- Precedence relations are DAGs, whose nodes are annotated with sets
- of operators. Let node(•) be the node of operator • (if any), and
- let n₁ ⊰ n₂ mean that there is an edge from node n₁ to node n₂.
- Fixity declarations get the following meanings:
- • binds as ∘:
- • is added to node(∘).
- • binds looser than ∘₁ tighter than ∘₂:
- A new node annotated with {•} is added,
- plus an edge from node(∘₁) and an edge from node n for all n with
- n ⊰ node(∘₁),
- plus an edge to node(∘₂) and an edge to node n for all n with
- node(∘₂) ⊰ n.
- Note that this does not create any new dependencies between ∘₁ and
- ∘₂, but • inherits earlier dependencies.
- A precedence relation now gives rise to a context free grammar in
- the following way:
- * The top-level production is as follows:
- expr ∷= <atom> | ⋁ {n | n is a node in the graph}
- * For every node n the following productions are added:
- n ∷= prefix-op⁺ n↑
- | n↑ postfix-op⁺
- | n↑ infix-non-assoc-op n↑
- | (n↑ infix-left-assoc-op)⁺ n↑
- | n↑ (infix-right-assoc-op n↑)⁺
- n↑ ∷= <atom> | ⋁ {n' | n ⊰ n'}
- x-op ∷= ⋁ {op-prod | op is an "x" operator annotating n}
- op-prod ∷= op₁ expr op₂ expr op₃ … op_n
- (where op_i are the name parts of the mixfix operator op)
- Note that if all operator name parts are unique, and <atom>s don't
- introduce any ambiguity, then the grammar is unambiguous. However,
- we don't want to require all operator name parts to be unique, since
- this can be rather inflexible. (Consider a DSEL containing both
- if_then_ and if_then_else_, for instance. Or the two operators ⟦_⟧_
- and ⟦_⟧'_.) All ambiguous parses will be rejected, in many cases
- with an error message listing all possible parses:
- Ambiguous parse. Could mean any of the following:
- if x then (if y then a) else b
- if x then (if y then a else b)
- We expect there to be rather few cases of ambiguity. A large number
- of potentially ambiguous operators will make it harder to write
- syntactically correct programs, and programmers will presumably be
- reluctant to subject themselves to this situation.
- ------------------------------------------------------------------------
- Sections
- ------------------------------------------------------------------------
- We can also support sections. Some examples will outline how this can
- be accomplished:
- If we have
- _+_ : ...
- then 5 +_ and _+ 3 are sections. They stand for
- \x -> 5 + x
- and
- \x -> x + 3,
- respectively. Note that +_ becomes a postfix operator, and _+ a
- prefix operator. Note also that _+_ can be viewed as a section, and
- does not need to be treated as a special case. (The qualified
- variant M._+_ still needs special treatment, though.)
- All mixfix operators can be sectioned. For instance, if we have
- if_then_else_ : ...
- then if_then x else y stands for
- \b -> if b then x else y.
- Parsing of sections is accomplished by letting the lexer distinguish
- different uses of '_':
- * As a wildcard.
- * At the beginning of an operator.
- * In the middle of an operator.
- * At the end of an operator.
- The different uses can be distinguished by examining surrounding
- white space.
- ------------------------------------------------------------------------
- Open questions
- ------------------------------------------------------------------------
- * What is the sub-class of DAGs that the declarations introduced above
- can give rise to? Not all DAGs can be constructed in this way. Take
- •⟶•⟶•⟶•, for instance. Could this be overly limiting?
- * Does the order of the declarations matter? If it does, then the
- scheme should be changed, since otherwise we would have a
- non-declarative language for specifying fixities. (It would not be
- very nice if the relative precedence of two operators depended on in
- which order two modules were imported, for instance.)
- Order does not matter for this simple example:
- infix _≡_
- infix left _+_ binds tighter than _≡_
- infix ¬_ binds looser than _≡_
- The declarations give rise to the following precedence graph:
- ╭─────╮
- │ _+_ ├⟵╮
- ╰──┬──╯ │
- ↑ │
- ╭──┴──╮ │
- │ _≡_ │ │
- ╰──┬──╯ │
- ↑ │
- ╭──┴──╮ │
- │ ¬_ ├─╯
- ╰─────╯
- If the order of the declarations is changed to
- infix _≡_
- infix ¬_ binds looser than _≡_
- infix left _+_ binds tighter than _≡_
- we still get the same graph. Is this generally true?
- ------------------------------------------------------------------------
- Summary of important "correctness" criteria
- ------------------------------------------------------------------------
- • Adding a new declaration should not change the relations between
- previously declared operators.
- • If declarations can be reordered, then the semantics must be
- independent of their order.
- ------------------------------------------------------------------------
- A possible problem with the scheme above
- ------------------------------------------------------------------------
- Consider the following (Agda-like) modules:
- module A where
- infix _*_
- module B where
- import A
- infix _^_ binds tighter-than (_*_)
- module C where
- import A
- infix _+_ binds looser-than (_*_)
- module D where
- import B; import C
- In D, do we have node(_+_) ⊰ node(_^_)? If not, then order of
- declarations does (in some sense) matter, since putting the two
- declarations in the same module would lead to a different result.
- However, if we do have node(_+_) ⊰ node(_^_), then the relationship
- between the two operators is not fixed until they are brought into the
- same scope. Neither scenario feels appealing.
- ------------------------------------------------------------------------
- Refinement of the refinement
- ------------------------------------------------------------------------
- I am compelled to remove the transitivity emulation from fixity
- declarations. It is too hard to understand. To start with we can
- require the user to specify every relationship explicitly. If this
- should turn out to require too much work, then the following
- extensions seem promising:
- • One could invent some notation for specifying the fixity of several
- operator groups at once, for instance:
- infix (_+_ _-_) < (_*_ _/_) < (_^_)
- The different groups in this kind of declaration would be
- transitively related.
- • One could specify a /module/ in an operator list; this would stand
- for all the operators exported from the module (top-level, plus
- perhaps records). Note that this may be a bit coarse. If Agda's open
- public was more like Haskell's re-exports, then it would be easy to
- use the module system to package operators for inclusion in fixity
- declarations, though.
|