\chapter{Introduction}

Nearly eight years after the appearance of the World Wide Web, it is still a difficult medium to use for the transmission
of mathematics and scientific material, in spite of its success in other areas. Sending mathematics via e-mail or reading
mathematics into a software package from a web page is not a simple task, depriving the scientific community of the
Internet as a powerful communications tool. Likewise, displaying mathematics on the Internet in a way that allows
editing and reuse has until now been impossible.

As the Internet continues to grow, it is becoming ever more important to facilitate the exchange of mathematics amongst
users and computer algebra software packages, offering automatic processing of expressions, searching, editing and reuse.
To overcome these difficulties, various companies and societies have joined together to produce standards for representing mathematics whilst
preserving mathematical meaning. The World Wide Web Consortium\index{World Wide Web Consortium}~\cite{w3c} and the
OpenMath Society\index{OpenMath Society}~\cite{openmath} have developed the two leading standards currently receiving most attention. These are MathML\index{MathML}
\cite{mathml} and OpenMath\index{OpenMath} \cite{openmathspec} respectively.

The chief purpose of OpenMath\index{OpenMath} is to facilitate consistent communication of mathematics between
mathematical applications. MathML\index{MathML}, however, concentrates on displaying mathematics on the web whilst
maintaining its meaning. The two standards are complementary and, used together, can expand our
ability to represent, encode and successfully communicate mathematical ideas across the Internet.

The primary aim of this project is to understand the differences and similarities between OpenMath\index{OpenMath} and
MathML\index{MathML}, to assess their interchangeability and to develop a way of mapping one standard to the other. The main
objective is ultimately to design and implement an interface running on REDUCE\index{REDUCE} which will translate
OpenMath\index{OpenMath} into MathML\index{MathML} and vice versa. This interface will provide REDUCE\index{REDUCE} with
the capability of exchanging mathematics with other applications, as well as displaying output on the World Wide Web and
reading from it, allowing REDUCE to join the MathML/OpenMath trend.

\chapter{Literature Review}

The notation of mathematics has constantly evolved with the appearance of new concepts and ideas. Modern mathematical
notation is the result of centuries of refinement. As a result, the sophisticated symbols with which we write
mathematics pose certain problems when they are brought onto the printed page. Publishing mathematics is a difficult task simply
because mathematics does not lend itself easily to publication.

Recently, advances in Internet publishing, following the expansion of the Internet, have added a new dimension to
mathematical publishing. New problems as well as new requirements must be dealt with. We want the Internet to be not only
a medium for displaying mathematics around the world, but also a communications tool for transmitting it.

How can we ensure that mathematics published on a web page is reusable? Editable? The output of one application should
be displayed on the Internet in a way that humans can understand and other applications can reuse. But because there is a
distinction between presenting mathematical objects and transmitting their content, merging both into one notation to
achieve this duality is a non-trivial task.

In order to fully understand the motivations of this project, as well as to appreciate its outcome, it is important to
illustrate the related issues carefully. We will look into the development of mathematical publishing and how it has
evolved with the growth of the Internet. This will permit us to better understand the need for mathematical representation
standards such as MathML\index{MathML} and OpenMath\index{OpenMath}, which we shall introduce. Finally we will discuss
the relation between these standards, the existing software supporting them, and their future.

With such an overview of the current situation, the necessity of a MathML\index{MathML} to OpenMath\index{OpenMath}
interface for REDUCE\index{REDUCE} will become clear.

\section{Mathematical Publishing}

Before the foundation of the World Wide Web, encoding of mathematical documents was already a widespread practice. Back in
the days when computers were starting to become popular, the ASCII\index{ASCII} character set (and encodings based on it)
was the only widely available encoding scheme. The restrictions of such a limited symbol set were soon apparent.

In the late seventies, Donald Knuth developed \TeX\index{\TeX}, from which variants such as \LaTeX\index{\LaTeX} stemmed. Layout and
typesetting of mathematics is extremely demanding, and Knuth's \TeX\index{\TeX} has addressed
these difficulties so successfully that the scientific community has made it a standard in scientific
publishing. \TeX\index{\TeX} has become the tool of choice for producing scientific and mathematical documents.

Despite its widespread use and the ease with which it is authored, \TeX\index{\TeX} does not preserve mathematical semantic
value, making it impractical for use in web documents and useless for transmission between applications. \TeX\index{\TeX}
is only concerned with describing the presentation of mathematics, not the content. Because people are interested in
transmitting their ideas and research via e-mail or web pages, it is fundamental that semantic value is kept.

While \TeX\index{\TeX} is mainly a UNIX-based application, PC applications dealing with mathematical encoding have also emerged. Generally these
are equipped with a graphical user interface, making them easier to use: Design Science\index{Design Science}'s MS Word Equation Editor,
FrameMaker\index{FrameMaker}, WordPerfect\index{WordPerfect} and ScientificWord\index{ScientificWord} are a few examples. All these
applications\footnote{It is worth noting that PC applications have not had the same success as \TeX\index{\TeX}.} deal only with displaying
mathematics and ignore semantic value. They are usually vendor-specific, making them impractical for use in mathematical web publishing.

\section{Mathematics and the Internet Challenge}
\subsection{Html and Mathematics}

In the early 1990s, the World Wide Web Consortium\index{World Wide Web Consortium}'s Html\index{Html} became the
standard markup language for publishing on the World Wide Web. It has since evolved and has become an extensible and very
powerful means of representing interactive Internet documents. In terms of representing mathematics, however, Html has
little support.

In the first versions of Html\index{Html}, no support for mathematics was included. It was not until 1993 that the first
attempt at embedding mathematics within Internet documents was made, in the Html+\index{Html!Html+} draft \cite{htmlp}
presented by the World Wide Web Consortium\index{World Wide Web Consortium}. Equations were represented directly in
Html+\index{Html!Html+} using an SGML\index{SGML} \cite{sgml} based notation, inspired by \LaTeX's\index{\LaTeX} approach.

In 1994, the World Wide Web Consortium\index{World Wide Web Consortium} went further in mathematical Internet publishing by
presenting the Html 3.0\index{Html!Html 3.0} draft \cite{html3} (which was later officially published as the Html
3.2\index{Html!Html 3.2} \cite{html3.2} specification with a few modifications), which offered more comprehensive support.
They claimed {\it ``Html math is powerful enough to describe the range of math expressions you can create in common word
processing packages, as well as being suitable for rendering to speech.''}

Nonetheless, both drafts failed because of a lack of interest from popular browser vendors. But even though the mathematical
ideas in the Html 3.2\index{Html!Html 3.2} specification were never fully deployed, people started thinking more carefully
about mathematics and how it could be represented on the WWW.

In the meantime, while the World Wide Web Consortium\index{World Wide Web Consortium} and other societies continued
working on developing mathematical support for Internet documents, other solutions for transmitting mathematics on the web
arose. The lack of a standard approach to representing mathematics uniformly on the Internet pushed mathematicians and
scientists to use a variety of different techniques to achieve this purpose. Let us give a brief overview of the main
ones.

\subsection{Embedded Graphics}

One way of displaying mathematics on the web is by the use of embedded graphics inside Html documents. Mathematical
equations are represented by graphical images (e.g. GIFs) which all browsers display without difficulty. Formulae can be
viewed in their original rendering, without the browser requiring additional fonts or external viewing programs.

Nevertheless, these images have low resolution, and printing them results in poor quality documents. There are also
problems with alignment and sizing. Because graphical images are generally slow to download, documents might take longer
than desired to render. Since we are only dealing with images, the equations cannot be edited or modified.
For the same reason they are not reusable, because semantic value is completely lost.

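
A minimal sketch of what this technique looks like in an Html page is given below. The file name \texttt{eq1.gif} and the \texttt{alt} text are hypothetical and serve only as an illustration; the point is that the browser receives a picture, with no machine-readable mathematical structure behind it.

\begin{verbatim}
<p>The solutions of the quadratic equation are
<img src="eq1.gif" alt="x = (-b +/- sqrt(b^2 - 4ac)) / 2a" align="middle">
</p>
\end{verbatim}

At best the \texttt{alt} attribute carries an informal textual hint of the formula; another application has nothing reliable to parse.
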
This method is widespread but not well regarded. In the Html 3.0\index{Html!Html 3.0} draft, the World Wide Web
Consortium\index{World Wide Web Consortium} specifically states its intention of helping users avoid the use of inline
images to display equations.

This is the approach used by programs such as \LaTeX\index{\LaTeX}2Html \cite{latex2html} or \TeX\index{\TeX}4ht
\cite{tex4ht}, which convert \LaTeX\index{\LaTeX} and \TeX\index{\TeX} documents to Html\index{Html} format for direct publication on the
Internet. \LaTeX\index{\LaTeX} markup is translated into Html while mathematical equations are converted into graphical
images. It is worth noting, however, that there exist programs such as TtM\index{TtM} \cite{TtM} which translate the
mathematical sections directly into MathML\index{MathML} presentation markup\index{MathML!presentation markup}.

\subsection{Graphical Page Display}

Another way of approaching the problem is by using graphical page displays. The page is rendered into a page-description
language such as PostScript\index{postscript} or PDF\index{PDF}. Internet browsers, aided by an external viewer or plug-in,
can then display the page in its entirety, including any mathematical formulae within it. When using this method,
documents are displayed with exactly the same layout as the originals, which could be \TeX\index{\TeX} documents
for instance. The printing resolution is also maintained at a high quality level.

But using an external viewer or plug-in requires every reader to possess a copy. A viewer also requires a large and verbose
file format including all the non-standard fonts used. Just as with embedded graphics, any
mathematics contained within these documents loses its semantic value, as well as the possibility of editing or modifying
it.

\section{OpenMath\index{OpenMath} and MathML\index{MathML}}

These interim solutions have only highlighted the need for a consistent, standardized methodology for the
transmission of mathematics via the World Wide Web. In view of the failure of existing methods, the significance of MathML and OpenMath\footnote{Describing these
standards in detail is beyond the scope of this report. We do encourage the reader to read carefully through both standard specifications
\cite{openmathspec,mathml} in order to better understand this report and its implications.} has increased. The two standards
are complementary, yet serve different purposes.

The primary aim of OpenMath\index{OpenMath} is to facilitate reliable communication of mathematical objects between mathematical applications. It
ensures semantic content is preserved within the notation. The semantic scope of OpenMath\index{OpenMath} is defined by its content
dictionaries\index{content dictionaries} (CDs), in which every symbol used is described and its semantic value defined. Related symbols and functions are
grouped into CD groups. It is expected that applications using OpenMath\index{OpenMath} declare which CD groups they understand.

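
As a rough illustration (not taken from the project itself), the expression $x^2 + 1$ might be written in the OpenMath XML encoding as sketched below. The symbols \texttt{plus} and \texttt{power} are assumed here to come from the \texttt{arith1} content dictionary; the CD that actually applies depends on the CD release in use.

\begin{verbatim}
<OMOBJ>
  <OMA>
    <OMS cd="arith1" name="plus"/>
    <OMA>
      <OMS cd="arith1" name="power"/>
      <OMV name="x"/>
      <OMI>2</OMI>
    </OMA>
    <OMI>1</OMI>
  </OMA>
</OMOBJ>
\end{verbatim}

Each \texttt{OMS} element names a symbol together with the CD that defines it, which is how the meaning of the object is fixed independently of any particular application.
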
MathML\index{MathML}, however, is World Wide Web oriented in that it seeks to display mathematics on web pages.
MathML\index{MathML} has two combinable kinds of markup, one encoding mathematical notation (presentation
markup\index{MathML!presentation markup}) and the other encoding mathematical meaning (content markup\index{content
markup}). Together they allow authors to encode both the notation which represents a mathematical object and the
mathematical structure of the object itself. Moreover, authors can mix both kinds of encoding in order to specify both the
presentation and content of a mathematical idea.

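
To make the distinction concrete, the following sketch shows one plausible encoding of $x^2 + 1$ in each kind of markup. It is an illustrative fragment only, not output of the interface described in this report.

\begin{verbatim}
<!-- Presentation markup: describes the layout -->
<mrow>
  <msup><mi>x</mi><mn>2</mn></msup>
  <mo>+</mo>
  <mn>1</mn>
</mrow>

<!-- Content markup: describes the structure -->
<apply>
  <plus/>
  <apply><power/><ci>x</ci><cn>2</cn></apply>
  <cn>1</cn>
</apply>
\end{verbatim}

The presentation form says only that a superscripted $x$ is followed by a plus sign and a $1$; the content form states explicitly that a sum is applied to a power and a number.
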
In fact there are strong links between the two recommendations. The communities developing them are closely
related, with some members belonging to both groups. This has resulted in the two standards overlapping, and each drawing
on the other, in some areas.

The {\it core} OpenMath\index{OpenMath} CD group is the principal CD group. It was designed on the basis of
MathML\index{MathML!MathML 1.0} 1.0, extending the set of symbols that MathML\index{MathML!MathML 1.0} 1.0 covers. It
is not intended to be very specific, covering only everyday and K-12 (kindergarten to high school level) mathematics, just
as MathML\index{MathML} does.

For completeness, a MathML\index{MathML} CD group was introduced in the OpenMath\index{OpenMath} standard. It is a subset
of the {\it core} CD group and has the same semantic scope as the content elements of MathML\index{MathML}. It is
expected that most applications will understand the {\it core} CD group, and hence automatically the
MathML\index{MathML} CD group.

The recently published MathML\index{MathML!MathML 2.0} 2.0 version has incorporated elements of the {\it core}
OpenMath\index{OpenMath} CD group which were not present in MathML\index{MathML!MathML 1.0} 1.0. But in order to keep the
scope of content markup\index{content markup} down to a reasonable size, the designers of MathML\index{MathML} have
restricted the mathematics that it attempts to cover to high school level mathematics, limiting MathML\index{MathML}'s
ability to convey mathematical meaning. Because OpenMath\index{OpenMath} is more powerful in this respect, the designers
of MathML\index{MathML} have introduced means of extension. It is possible to encode semantic information
inside MathML by embedding OpenMath\index{OpenMath} objects within MathML\index{MathML} code.

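
One such mechanism is the MathML \texttt{semantics} element, which pairs an expression with annotations in other encodings. The fragment below is a hedged sketch of how $x^2 + 1$ could carry an OpenMath annotation; the \texttt{encoding} value and the use of the \texttt{arith1} content dictionary are assumptions made for illustration.

\begin{verbatim}
<semantics>
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>+</mo>
    <mn>1</mn>
  </mrow>
  <annotation-xml encoding="OpenMath">
    <OMOBJ>
      <OMA>
        <OMS cd="arith1" name="plus"/>
        <OMA>
          <OMS cd="arith1" name="power"/>
          <OMV name="x"/>
          <OMI>2</OMI>
        </OMA>
        <OMI>1</OMI>
      </OMA>
    </OMOBJ>
  </annotation-xml>
</semantics>
\end{verbatim}

A renderer can display the first child and ignore the annotation, while an application that understands OpenMath can read the annotation and recover the meaning.
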
This demonstrates the close ties existing between the World Wide Web Consortium\index{World Wide Web Consortium} and
the OpenMath Society\index{OpenMath Society}. In the MathML\index{MathML!MathML 2.0} 2.0 specification one can read: {\it
``The MathML\index{MathML} content elements are heavily indebted to the OpenMath\index{OpenMath} project \ldots''}

\section{Current Support}

Both standards have received considerable attention and have mobilized many developers. Support for MathML\footnote{For a comprehensive list of software supporting MathML, see the W3C web site~\cite{w3c}.}\index{MathML}
and OpenMath\index{OpenMath} is being introduced in many areas now that their future is taking shape.

The dominance of Java\index{Java} on the Internet today has made it a good candidate for offering a solution to the
problem of publishing mathematics. The flexibility and power of Java\index{Java} applets can be used in conjunction with
MathML or OpenMath to display mathematical formulae.

This approach is currently best represented by WebEQ\index{WebEQ} \cite{webeq}. WebEQ\index{WebEQ} is a collection of programs and Java\index{Java}
programming libraries dealing with all aspects of putting mathematics on the Web. Because WebEQ\index{WebEQ} is based on MathML\index{MathML},
its tools can easily be combined with each other and with other MathML\index{MathML} software to accomplish a wide range of tasks.
The applet takes a representation of an equation as input and displays it. The representation has to be in a markup language which the applet
supports (MathML\index{MathML} or Web\TeX\index{WebTeX}). Another Java\index{Java} application is ICEBrowser \cite{ice}, a browser component
written in Java\index{Java} which renders MathML\index{MathML}.

By using a Java\index{Java} applet we encounter the same difficulties as when using embedded graphics. In addition,
Java\index{Java} applets have a larger initial download overhead, which can be off-putting to some users.
Java\index{Java} applets usually offer good equation display, but different vendors supply different solutions and markup
languages.

Another set of applications currently offering MathML support are plug-ins. The main distinction in principle between
using plug-ins and using Java\index{Java} applets is that plug-ins need to be pre-installed in the Internet browser for any
rendering to take place. IBM\index{IBM} Techexplorer\index{TechExplorer} \cite{ibm} is a representative example under
development. It currently supports MathML\index{MathML} encodings. IBM\index{IBM}'s approach to the problem is definitely
close to the solution the scientific community is hoping to see. Techexplorer can display MathML\index{MathML} and the
quality of display is acceptable. Hopefully, IBM\index{IBM}'s Techexplorer initiative will push other browser vendors and
companies to adopt MathML\index{MathML} as the leading standard.

But as with the other temporary solutions, plug-ins also have their limitations.
Plug-ins have trouble obtaining the font size and background color of the current Html document, and resizing the window to fit the display. Plug-ins such as IBM\index{IBM}'s are not
yet widespread, and most people are not familiar with downloading and installing plug-ins.

In the area of computer algebra, many packages should soon have interfaces to both standards. Examples
are the MathML\index{MathML} to REDUCE\index{REDUCE} interface available in REDUCE\index{REDUCE} 3.7 and the MathML
interface built into Mathematica Version 4.

Various programs convert \LaTeX~documents into MathML. This is important because of the large number of documents written
in \LaTeX\index{\LaTeX} until now. An example of a program accomplishing this task is TtM\index{TtM} \cite{TtM}.

Various equation editors such as MathType or Design Science\index{Design Science}'s MS equation editor also support
MathML\index{MathML}. They manipulate expressions and offer easy-to-use graphical user interfaces. It is possible to
export equations to MathML format.

Until now, however, neither Explorer\index{Explorer} nor Netscape\index{Netscape} has incorporated support for
MathML\index{MathML}, although both vendors have committed themselves to doing so in the near future. Because these are the most
popular browsers, it is important that they soon provide MathML\index{MathML} facilities in order to boost the use of
MathML\index{MathML}.

\newpage
\section{The Future}

\begin{quotation}
\emph{``While many in the mathematical and scientific community have already adopted \LaTeX~as the standard for writing
papers, it appears that MathML\index{MathML} is the future of scientific and mathematical notation on the Web.''} Bob
Henshaw, UNC.
\end{quotation}

Regardless of how efficient MathML\index{MathML} and OpenMath are in transmitting and displaying mathematics, it is clear
that they will only be of any use if all communities adopt them. It is expected, however, that most popular software companies
working on the Internet or on computer algebra packages will soon support MathML and OpenMath. It seems that MathML and
OpenMath will receive the necessary support, given the commitment that various big companies have already shown
(IBM\index{IBM}, Netscape\index{Netscape}, Microsoft\index{Microsoft}, Wolfram\index{Wolfram}, Design Science\index{Design Science}, and many others).

At the moment some browsers have already implemented MathML\index{MathML} rendering facilities (Amaya\index{Amaya}, for
instance), and soon other bigger browser vendors will join the trend. Mozilla has recently released its latest browser,
which does render MathML. Netscape should follow soon with Navigator 5\index{Netscape!Navigator 5}. Design
Science\index{Design Science} has released a new version of MathType incorporating various tools for dealing with MathML and OpenMath.
For those not familiar with Design Science\index{Design Science}, they also make MS Word's equation editor. Other
companies (mainly Stilo) are developing equation editors with MathML and OpenMath facilities which will soon reach the
market.

While substantial progress has been made, there are still areas in which more work is required before MathML can be
incorporated easily into the Internet. Further improvement in coordination between browsers and embedded elements will be
necessary. Furthermore, higher printing resolution must be achieved.

MathML and OpenMath are among the first XML\index{XML}-based markup languages to appear on the Internet. They will show the power and limitations of XML.
An example has been set for other specialist areas which also want to benefit from the Internet; areas such as Chemical Engineering and Music are
using XML to develop representation standards. Both standards have been received enthusiastically and it will surely not take long before they are
used widely by the scientific community.