design.tex 16 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238
  1. \chapter{Program Design and Implementation}
  2. The design of an OpenMath/MathML interface must aim to keep the structure simple, extensible if needed and easy to maintain. This document will
  3. attempt to describe the structure of the overall system and the individual modules which compose it. A common interface coordinating the separate
  4. components will be analysed and defined.
  5. Furthermore we will explain why the system will be table based and what advantages this offers for our application. Because both OpenMath and
  6. MathML are XML languages, we must specify the requirements the translator's lexer and parser must follow. Finally we will see what new
  7. functionalities can be added to the interface in possible future extensions.
  8. \section{System architecture}
  9. The task of translating one language to another, as is the case of our OpenMath/MathML interface, can be compared to the task performed by a
  10. compiler when passing from a programming language to a computer executable representation.
  11. We will need to lex and parse an expression, represent it in some intermediate language which allows a certain degree of freedom for manipulation,
  12. and then from there an expression can be generated in the target language.
  13. Following this approach, the architecture of the REDUCE OpenMath to MathML interface is going to be composed of four independent modules. One for
  14. each of the following tasks:
  15. \begin{itemize}
  16. \item Passing MathML to the intermediate representation
  17. \item Passing OpenMath to the intermediate representation
  18. \item Passing from the intermediate representation to MathML
  19. \item Passing from the intermediate representation to OpenMath
  20. \end{itemize}
  21. Dividing the interface into these separate modules gives us the possibility to better understand the overall process of translation. It has the
  22. advantage of permitting efficient modifications to the system.
  23. If MathML syntax were to change, for instance, it would only be necessary to modify two of the four modules. This separation also makes it easy to
  24. add extensions. By implementing a module going from the intermediate representation to \LaTeX, it is possible to extend the interface's
  25. capabilities to offer OpenMath to \LaTeX~or MathML to \LaTeX~translations. Figure \ref{archi} illustrates the system architecture.
  26. \begin{figure}
  27. \begin{center}
  28. {\includegraphics{architecture.eps}}
  29. %{\includegraphics{architecture.jpg}}
  30. \caption{OpenMath/MathML Interface System Architecture}
  31. \label{archi}
  32. \end{center}
  33. \end{figure}
  34. \subsection{Module Requirements}
  35. Each of these modules has several requirements to respect. These requirements ensure the system is efficient and behaves satisfactorily. {\it (Here
  36. IR stands for intermediate representation)}
  37. \begin{description}
  38. \item[MathML to IR:] This module parses through MathML and generates an equivalent expression in the intermediate representation. It should ensure
  39. that the input given is not lexically or syntactically incorrect. In which case the translation process is aborted. Incorrect or unimportant
  40. attribute values should be ignored unless they compromise the translation process. Both MathML 1.0 and MathML 2.0 expressions must be accepted as
  41. valid and parsed. It should be designed so any modification in MathML can be easily adapted to.
  42. \item[IR to MathML:] This module generates valid MathML from the intermediate representation of an expression. The user should have the option to
  43. generate either MathML 2.0 or MathML 1.0, since most applications today are only MathML 1.0 compliant . In order to embed MathML into a web page
  44. for rendering by a plug-in, there should also be an option outputting the MathML inside HTML \verb|<embed>| tags.
  45. \item[OpenMath to IR:] This module reads in OpenMath expressions and transforms them into the intermediate representation. It should ensure that
  46. the input given is not lexically or syntactically incorrect. In which case the translation process is aborted. Symbols must be checked to see if
  47. they have a MathML equivalent. This means checking each symbol against the CD it belongs to and then looking up in a table to see whether a mapping
  48. is possible. If there is no equivalent, this module must encode the OpenMath symbol into the intermediate representation as an unknown symbol for
  49. inclusion in MathML \verb|<semantic>| tags.
  50. \item[IR to OpenMath:] This module generates valid OpenMath from the intermediate representation. It is important that all symbols generated appear
  51. next to the correct CD to which they belong. This is done by consulting a table containing this information.
  52. \end{description}
  53. Because it is important to specify which OpenMath CDs an application handles, Appendix A gives a comprehensive list of all the OpenMath CDs and
  54. elements which are supported by the translator.
  55. \section{The Intermediate Representation}
  56. If the breakdown of the system into separate modules is to be effective, we need a clean interface between all parts. An intermediate
  57. representation representing expressions in a generic way accomplishes this task. For an intermediate representation to be useful, it is important
  58. that it conveys and preserves all the information MathML and OpenMath objects are capable of representing. Let us look at the requirements such an
  59. intermediate representation must satisfy for use in our OpenMath/MathML interface.
  60. Both OpenMath and MathML build expressions by using prefix operators. REDUCE's symbolic representation of expressions also uses prefix operators to
  61. construct expressions. This connection motivates us to use prefix operators in our intermediate representation, thus allowing an uncomplicated
  62. mapping between the intermediate representation, OpenMath, MathML and REDUCE's representation of expressions. Subsequently the intermediate
  63. representation is closely related to the parse trees of each language.
  64. Given that MathML elements may take attributes changing their semantic meaning, it is necessary that attribute values are represented by the
  65. intermediate representation. Thus permitting MathML elements mapping to different OpenMath symbols (depending on their attribute values) to be
  66. correctly translated from one standard to the other. The attributes conveyed by the intermediate representation are then interpreted differently by
  67. the various modules according to the context they appear in.
  68. Considering that OpenMath extensibility is a key issue, our intermediate representation must be able to encode objects without MathML equivalent.
  69. The unsupported OpenMath symbol and its CD will be passed on from the {\bf OpenMath to IR} module to the {\bf IR to MathML} module so that the
  70. MathML extension mechanism is employed.
  71. Moreover, the intermediate representation will need to be simple to manipulate. Since RLISP is the programming language in which this interface is
  72. written, we must keep in mind the possibilities and limitations this language offers. Therefore the intermediate representation expressions will be
  73. structured as lists. Lists are the basic data structures in RLISP and there exist many commands permitting very easy and efficient manipulation of
  74. them.
  75. Because our intermediate representation is designed in terms of the syntactic structure of both OpenMath and MathML, and certain subroutines are
  76. attached to the MathML and OpenMath production rules to produce proper intermediate encoding, we can classify our methodology as {\it
  77. syntax-directed translation}~\cite{compilers}. Basically, the actions of the syntax analysis phase guide the translation. Thus the intermediate
  78. code is generated as syntax analysis takes place.
  79. \section{Use of Tables in the Translation Process}
  80. The complexity and diversity of MathML and OpenMath elements require that a translator has some way of keeping information concerning all elements.
  81. The parsing and generation of OpenMath requires a translator to have some way of knowing which content dictionaries symbols belong to. Similarly,
  82. the correct procedures must be employed upon each element encountered. This information must be stored in a readily accessible way. It is important
  83. to design these tables and that we understand how each module will use them to appropriately accomplish their tasks.
  84. The information guiding the translator can be either hand coded into the program or gathered into tables. Hand coding is complex and useful only in
  85. situations where an element needs to be handled in a very precise way. Tables however can contain organized information related to each element
  86. useful when parsing and generating expressions.
  87. We believe using a table-based system is more efficient for our application and can produce better and more compact code, thus improving code
  88. readability, extensibility and maintenance. Because a translator must deal with a variety of elements, most of similar structure, a table-based
  89. system permits the translator to relate an element to a set of functions and/or information. This way, any modifications of the MathML standard can
  90. be easily adapted to by modifying a table or adding a new entry to it.
  91. The idea is to gather in a few tables all the necessary information for properly handling all MathML and OpenMath recognized elements. Let us
  92. describe the main tables\footnote{These tables are defined in the file {\tt tables.red}} which are used by the interface. To better understand the
  93. system we will describe how they should be used by each module to accomplish the task.
  94. \subsubsection{MathML to IR Module}
  95. All MathML elements are stored in the tables {\tt constructors!*}, {\tt relations!*} and {\tt functions!*}. These tables determine what functions
  96. must be called for each MathML element encountered and what the equivalent intermediate representation operator is.
  97. When a MathML object is encountered, the first element will inform us of how the expression is constructed. We look this element up in the {\tt
  98. constructors!*} table to call the proper function which deals with objects constructed in this manner.
  99. If the expression constructor is the \verb|<reln>| element then the {\tt relations!*} table is used. This table will determine which function to
  100. call as well as containing the equivalent intermediate representation operator. The {\tt functions!*} table is the same as the {\tt relations!*}
  101. table only that it contains all operators appearing within \verb|<apply>|\ldots\verb|</apply>| instead.
  102. These tables together will inform the translator of how to deal with all MathML elements
  103. New MathML elements can be added to these tables to modify the translator's scope. An existing procedure can be related to the new element, or a
  104. new procedure can be implemented and added to the table next to the element's entry. An equivalent intermediate operator must also be defined here.
  105. \subsubsection{IR to MathML Module}
  106. When an intermediate representation expression must be translated to MathML the table {\tt ir2om\_mml!*} specifies which function to call for the
  107. translation of each intermediate representation operator. As an intermediate expression is parsed, this table will ensure that proper production of
  108. MathML is achieved. This table also contains the function to call when producing OpenMath.
  109. New operators are added to this table. The procedure name specifying how the new IR operator is translated to MathML is also added to the table.
  110. \subsubsection{OpenMath to IR Module}
  111. OpenMath objects must be thoroughly checked for various reasons. Firstly, not all OpenMath symbols have MathML equivalents. Table {\tt mmleq!*}
  112. contains all OpenMath symbols which easily translate to MathML. If a symbol is not contained within this table then it is searched inside tables
  113. {\tt special\_cases!*} and {\tt special\_cases2!*}.
  114. Table {\tt special\_cases!*} contains all OpenMath symbols which have a MathML equivalent but under a different name. It also deals with OpenMath
  115. symbols mapping to one MathML element but with different attribute values. This table will also specify where necessary the correct attribute types
  116. and values the MathML equivalent element must take.
  117. Table {\tt special\_cases2!*} contains all OpenMath symbols which require careful translation. For each element, a specific function is associated.
  118. These functions are specially designed to deal with these elements efficiently.
  119. If a symbol is not contained within any of these tables, then the elements is considered unknown and the MathML extension mechanism is used to
  120. produce a reasonable translation.
  121. \subsubsection{IR to OpenMath Module}
  122. Producing OpenMath from the intermediate representation follows a similar procedure as that described for generation of MathML. The table {\tt
  123. ir2om\_mml!*} contains the function to call for each intermediate representation operator to produce OpenMath.
  124. \section{XML Lexing and Parsing}
  125. Because there are no XML lexers or parsers for REDUCE, it is necessary to design and implement them. In order to do so it is important to establish
  126. what the requirements of such procedures are.
  127. \subsection{The Lexer}
  128. Both MathML and OpenMath are based on the structures defined by XML. The lexer must validate XML markup languages and extract the necessary tokens
  129. from the successive characters in the input source.
  130. Hence it is important that our lexer tokenizes XML elements as well as determining the different attribute types and values an element may possess.
  131. These requirements must be met in order to retrieve the different attributes contained in MathML elements or to find out what symbol and content
  132. dictionary is expressed by an OpenMath \verb|<OMS>| tag.
  133. An XML lexer must also be flexible with spaces, ignoring any amount of spaces or return carriages contained in the input source.
  134. \subsection{The Parser}
  135. The lexical analysis and the following phase, the syntax analysis, will be grouped together into the same pass. Under that pass the lexer operates
  136. under the control of the parser. The parser will ask the lexical analyzer for the next token whenever it needs one. The lexer will return this
  137. information as well as storing the attribute types and values of the current token parsed. The parser will not generate a parse tree explicitly but
  138. rather go to intermediate code directly as syntax analysis takes place.
  139. The parser will stop its task when a syntactical error or a misspelled or unrecognized token is encountered. It should not attempt to correct it.
  140. In some cases a constructive error message\footnote{The efficiency of this facility will depend on the time left to correctly implement it} will be
  141. printed to the user.
  142. The parser we will implement will follow the widely used LL(1) parsing method also known as {\it predictive recursive descent} parsing. The parser
  143. will use top-down parsing following the grammars defined in both the MathML and OpenMath standards and will only need to look at the next token in
  144. the token stream ~\cite{compilers}.
  145. \section{Possible Future Extensions}
  146. The desire to extend the OpenMath/MathML interface to include new functions or adapt to changes was paramount in the design process. Here we would
  147. like to mention some possible extensions which could be added in the future.
  148. \begin{description}
  149. \item[Evaluation of expressions:] It should be possible to extend the interface to allow evaluation of OpenMath and MathML expressions using
  150. REDUCE's computational power. This extension is possible because the intermediate representation was designed imitating REDUCE's internal
  151. representation of expressions. Without difficulty a procedure could be implemented which would evaluate an intermediate representation expression
  152. by mapping it to REDUCE's internal representation. The appropriate modules would then print out the evaluated intermediate representation as MathML
  153. or OpenMath.
  154. \item[Separate Interfaces to REDUCE:] For the same reasons as expressed above, it is possible to modify the interface so it offers a MathML to
  155. REDUCE interface and/or an OpenMath to REDUCE interface. This would allow a REDUCE user to import and export MathML or OpenMath expressions
  156. separately for use in calculations or for transmission on the Internet to other applications.
  157. \item[Interfaces to other Representations:] Because the system architecture is designed around the intermediate representation, it is possible to
  158. implement modules which transform the intermediate representation into other representations such as \LaTeX, \TeX, HTML, or WEB\TeX, thus allowing
  159. translation from MathML or OpenMath to any of the mentioned representations.
  160. \end{description}