
  1. ==============================================
  2. Design document for ODT parsing and generation
  3. ==============================================
  4. :Author: ts
  5. The scope of this document is to define the design for a first implementation of
  6. ODT (Open Document Text) support in the eZ Document component. The parts of the
  7. Document component designed in this document do not affect other Open Document
  8. formats like spreadsheets or graphics. The goal is to define the infrastructure
  9. for reading and writing ODT documents, i.e. to convert existing ODT documents
  10. into the internal representation of the Document component (DocBook XML) and to
  11. generate new ODT documents from the internal representation.
  12. ------------
  13. Requirements
  14. ------------
  15. The following sections describe the requirements for the ODT handling in the
  16. Document component. The first section defines requirements for reading ODT, the
  17. second for writing ODT and the third section defines requirements for later
  18. enhancements to be kept in mind during the initial implementation.
  19. Import
  20. ======
  21. The Document component should be able to parse existing ODT documents and to
  22. convert them to the internal format used by the Document component (DocBook
  23. XML). Requirements for the import process are:
  24. - Read plain XML ODT files
  25. - Parse all necessary structural ODT elements
  26. - Convert ODT elements properly into equivalent or similar DocBook representations
  27. - Maintain the content semantics provided by the ODT as well as possible
  28. - Maintain meta information provided by the ODT as well as possible
  29. - Develop a first heuristic approach for how ODT styling information can be
  30.   used to determine the semantics of an element.
  31. Export
  32. ======
  33. The Document component should be able to generate new ODT documents from an
  34. existing internal representation (DocBook XML). Requirements for this process
  35. are:
  36. - Write plain XML ODT files
  37. - Convert DocBook representation elements to their corresponding ODT
  38. representations
  39. - Maintain the document structure
  40. - Maintain content and metadata semantics as well as possible
  41. - Style ODT elements
  42. Later enhancements
  43. ==================
  44. In the first step of ODT integration only rudimentary features for import and
  45. export should be realized. The following ideas must be kept in mind during the
  46. design and implementation, to ensure future extensibility.
  47. - Reading / writing of ODT package files (ZIP)
  48.   - ODF can be presented either as a single XML file or as a ZIP package
  49.     containing multiple XML files and other related files (e.g. images) in
  50.     addition.
  51.   - Reading and writing this format is not necessary from the start, but since
  52.     it is the default way for users to store ODT, it should be supported later
  53.     on.
  54.   - The handling of ZIP files requires a tie-in with the Archive component or
  55.     similar.
  56. ------
  57. Design
  58. ------
  59. In the first development cycle, only the structural conversion between ODT and
  60. DocBook XML will be considered. In addition, rudimentary styling information
  61. will be taken into account. The reading and writing of ODF packages is not
  62. considered in this design.
  63. Import
  64. ======
  65. Three different steps are necessary to import an ODT document and convert it
  66. into DocBook XML:
  67. 1. Read the XML data
  68. 2. Preprocess the ODT representation
  69. 3. Actual conversion to DocBook XML representation
  70. Step 1 will be performed through the DOM extension in PHP; the internal
  71. representation of an ODT document will be a DOM tree. The second step performs
  72. pre-processing on this DOM tree. Pre-processing is e.g. needed to assign
  73. additional semantics to the ODT elements to achieve a better rendering.
  74. Finally, the pre-processed DOM tree will be visited, to achieve the actual
  75. creation of the DocBook XML representation.
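The three steps above can be sketched as a small pipeline. This is only an illustration in Python using ``xml.etree.ElementTree``; the component itself is designed around PHP's DOM extension. The function names ``read_odt``, ``preprocess`` and ``convert`` are hypothetical, and the pre-processing step is left as a stub.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# A minimal flat ODT document (single XML file), reduced to the text body.
FLAT_ODT = f"""<office:document xmlns:office="{OFFICE}" xmlns:text="{TEXT}">
  <office:body>
    <office:text>
      <text:h text:outline-level="1">Title</text:h>
      <text:p>First paragraph.</text:p>
    </office:text>
  </office:body>
</office:document>"""


def read_odt(xml_data):
    """Step 1: parse the flat ODT XML into a tree."""
    return ET.fromstring(xml_data)


def preprocess(root):
    """Step 2: stub for the filter chain; returns the tree unchanged here."""
    return root


def convert(root):
    """Step 3: walk the text body and emit a DocBook tree."""
    article = ET.Element("article")
    body = root.find(f"{{{OFFICE}}}body/{{{OFFICE}}}text")
    for child in body:
        if child.tag == f"{{{TEXT}}}h":
            ET.SubElement(article, "title").text = child.text
        elif child.tag == f"{{{TEXT}}}p":
            ET.SubElement(article, "para").text = child.text
    return article


docbook = convert(preprocess(read_odt(FLAT_ODT)))
print(ET.tostring(docbook, encoding="unicode"))
```

Keeping the steps as separate functions mirrors the design: the converter only sees whatever tree the pre-processing step hands over.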
  76. Pre-processing
  77. --------------
  78. The step of pre-processing the ODT representation is necessary to assign
  79. DocBook semantics to the ODT elements. ODT and DocBook XML have some
  80. similarities, but also differ widely in some parts. The pre-processing step
  81. performs manipulations on the ODT representation and potentially adds
  82. information which is utilized by the later conversion step to create a correct
  83. semantic representation.
  84. This process works similarly to filters in the XHTML document import. The class
  85. level design of this feature is inspired by the XHTML handling: Filters can be
  86. registered which pre-process the incoming ODT in the given order.
  87. A filter may perform the following steps on a DOMElement:
  88. - Add type information to an XML element to determine into which DocBook XML
  89. element the element will be converted
  90. - Add attribute information to determine the attributes in the DocBook XML
  91. representation
  92. - Add additional elements or element hierarchies
  93. The resulting DOM tree does not necessarily have to be valid ODT anymore, so
  94. that it can reflect the later DocBook structure in a better way.
  95. The first implemented filter will only perform rudimentary operations on the
  96. DOM to assign basic semantic information to the elements. A second
  97. implementation will be an additional filter which takes some styling
  98. information into account to enhance this information. Further filters can be
  99. implemented by third parties to extend or replace these mechanisms.
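A minimal sketch of such a filter chain, assuming annotations are stored as extra attributes on the ODT elements. The attribute name ``docbook-type``, the class names and the style mapping (``Quotations`` to ``blockquote``) are assumptions made for illustration; none of them is defined by this design.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Hypothetical attribute used to carry the target DocBook element name.
TYPE_ATTR = "docbook-type"


class ElementFilter:
    """Basic filter: assigns a DocBook type from the ODT element name."""

    MAPPING = {f"{{{TEXT}}}p": "para", f"{{{TEXT}}}h": "title"}

    def filter(self, element):
        docbook_type = self.MAPPING.get(element.tag)
        if docbook_type is not None:
            element.set(TYPE_ATTR, docbook_type)


class StyleFilter:
    """Second filter: refines the type based on the applied style name."""

    # Assumed heuristic: a paragraph style called "Quotations" marks a quote.
    STYLE_MAPPING = {"Quotations": "blockquote"}

    def filter(self, element):
        style_name = element.get(f"{{{TEXT}}}style-name")
        if style_name in self.STYLE_MAPPING:
            element.set(TYPE_ATTR, self.STYLE_MAPPING[style_name])


def apply_filters(root, filters):
    """Run every filter over every element, in registration order."""
    for registered in filters:
        for element in root.iter():
            registered.filter(element)
    return root


root = ET.fromstring(
    f'<text:section xmlns:text="{TEXT}">'
    '<text:p text:style-name="Quotations">Quoted text.</text:p>'
    "<text:p>Body text.</text:p>"
    "</text:section>"
)
apply_filters(root, [ElementFilter(), StyleFilter()])
types = [child.get(TYPE_ATTR) for child in root]
print(types)
```

Because filters run in the given order, the style-based filter can overwrite the type assigned by the basic one, which is why the registration order matters.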
  100. Conversion
  101. ----------
  102. The conversion process itself will mostly visit the DOM tree and utilize the
  103. information attached to the elements in the pre-processing step to generate a
  104. DocBook XML document with the corresponding content. The filter pre-processing
  105. step is responsible for annotating all significant elements properly so that
  106. the conversion can use them.
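The visiting step could look roughly like the following sketch. It assumes the filters left a hypothetical ``docbook-type`` attribute on every significant element. Namespaces are omitted from the input for brevity, and text of unannotated elements and tail text are ignored.

```python
import xml.etree.ElementTree as ET

TYPE_ATTR = "docbook-type"  # hypothetical annotation written by the filters


def convert(odt_element, docbook_parent):
    """Recursively visit the annotated tree and build DocBook elements."""
    docbook_type = odt_element.get(TYPE_ATTR)
    if docbook_type is None:
        # Not annotated: skip the element itself but still visit its children.
        target = docbook_parent
    else:
        target = ET.SubElement(docbook_parent, docbook_type)
        target.text = odt_element.text
    for child in odt_element:
        convert(child, target)
    return docbook_parent


annotated = ET.fromstring(
    "<body>"
    '<wrapper><h docbook-type="title">Scope</h></wrapper>'
    '<p docbook-type="para">Body text.</p>'
    "</body>"
)
article = convert(annotated, ET.Element("article"))
print(ET.tostring(article, encoding="unicode"))
```

The converter never inspects ODT element names itself; everything it needs comes from the annotations, which keeps the mapping logic inside the filters.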
  107. Flat ODT documents (consisting of only one XML file), the only kind handled in
  108. the first version of ODT support, may contain embedded image content. To
  109. extract it, the user may specify a target directory; otherwise the system temp
  110. dir will be used as the default. The content will then be referenced in DocBook
  111. from this location.
  112. .. note:: We should check if it is possible to define and handle data URLs in
  113. docbook. May be problematic with other formats though. (kn)
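A sketch of the image extraction, assuming embedded images appear as base64 text in ``office:binary-data`` children of ``draw:image`` elements. The function name, the file naming scheme and the ``.bin`` suffix are placeholders, and the fallback creates a fresh directory below the system temp dir rather than writing into it directly.

```python
import base64
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"


def extract_images(root, target_dir=None):
    """Write every embedded image to target_dir and return the file paths."""
    directory = Path(target_dir) if target_dir else Path(tempfile.mkdtemp())
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(root.iter(f"{{{DRAW}}}image")):
        data = image.find(f"{{{OFFICE}}}binary-data")
        if data is None or not data.text:
            continue  # the image is referenced, not embedded
        # Placeholder suffix: this sketch does not detect the image type.
        path = directory / f"image-{index}.bin"
        path.write_bytes(base64.b64decode(data.text))
        paths.append(path)
    return paths


payload = base64.b64encode(b"fake image bytes").decode("ascii")
root = ET.fromstring(
    f'<draw:frame xmlns:draw="{DRAW}" xmlns:office="{OFFICE}">'
    f"<draw:image><office:binary-data>{payload}</office:binary-data></draw:image>"
    "</draw:frame>"
)
paths = extract_images(root)
print(paths[0].read_bytes())
```

The returned paths are what the DocBook output would then reference.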
  114. Export
  115. ======
  116. .. note:: First sentence a bit unclear ;) (kn)
  117. The export process for ODT works similarly to PDF rendering, except that it is
  118. a little less strict. The internal DocBook representation is converted to the
  119. desired ODT representation according to its semantics.
  120. Based on the DocBook XML elements, the user can define styles using a
  121. simplified CSS syntax (see PDF). Each of the style definitions is converted to
  122. an automatic style in the resulting ODT document. ODT elements affected by a
  123. certain style get this style applied.
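As a rough illustration, style rules keyed by DocBook element name could be turned into automatic styles as follows. The rule dictionary stands in for already parsed CSS, and the mapping from DocBook element to style family and the generated names (``auto1``, ``auto2``) are assumptions. For simplicity all properties are written to ``style:text-properties``.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

for prefix, uri in (("office", OFFICE), ("style", STYLE), ("fo", FO)):
    ET.register_namespace(prefix, uri)

# Hypothetical, already parsed style rules keyed by DocBook element name.
RULES = {
    "para": {"font-size": "12pt"},
    "emphasis": {"font-style": "italic"},
}
# Assumed mapping from DocBook element to ODF style family.
FAMILIES = {"para": "paragraph", "emphasis": "text"}


def build_automatic_styles(rules):
    """Create one automatic style per rule; return the container and names."""
    container = ET.Element(f"{{{OFFICE}}}automatic-styles")
    names = {}
    for index, (element, properties) in enumerate(rules.items(), start=1):
        name = f"auto{index}"
        style = ET.SubElement(container, f"{{{STYLE}}}style", {
            f"{{{STYLE}}}name": name,
            f"{{{STYLE}}}family": FAMILIES[element],
        })
        props = ET.SubElement(style, f"{{{STYLE}}}text-properties")
        for key, value in properties.items():
            props.set(f"{{{FO}}}{key}", value)
        names[element] = name
    return container, names


automatic_styles, style_names = build_automatic_styles(RULES)
print(ET.tostring(automatic_styles, encoding="unicode"))
```

The returned name mapping is what the exporter would use to set the style name on each generated ODT element.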
  124. Styles
  125. ======
  126. A style is defined for each piece of styling information. Styling information
  127. is never assigned to layout elements directly; there is always a style in between.
  128. The <style/> element has the following properties:
  129. name
  130.     The internal name of the style. Must be unique over all styles, in
  131.     concatenation with the style:family.
  132. display-name
  133.     Name of the style to display in GUIs. If left out, the name is used.
  134. family
  135.     Family collection of the style. One of (in context of text documents):
  136.     text
  137.         Style that might be applied to any piece of text.
  138.     paragraph
  139.         Style for complete paragraphs and headings.
  140.     section
  141.         Style to be applied to sections of text in text documents (@TODO: Not
  142.         handled yet!).
  143.     ruby
  144.         Not handled yet.
  145.     table
  146.     table-column
  147.     table-row
  148.     table-cell
  149.     table-page
  150.     chart
  151.     default
  152.     graphic
  153. parent-style-name
  154.     Identifies a parent style. Style properties of the parent are inherited and
  155.     may be overwritten. If no parent style is specified, the default style for
  156.     the style's family will be the base for inheritance.
  157. next-style-name
  158.     Next paragraph style. If a new paragraph is started after the element this
  159.     style is applied to, this paragraph will have the style named in this
  160.     element. Only sensible for editing in a GUI.
  161. list-style-name
  162.     Style used in headings and paragraphs of lists contained in the styled
  163.     element, only if the lists have no list-style applied themselves.
  164. master-page-name
  165.     Styles with a master page applied will force a page break before the
  166.     element and load the styles from the master-page then.
  167. data-style-name
  168.     Styling of table cells (e.g. formulas, currencies, ...).
  169. class
  170.     Information for GUIs, to sort styles into categories.
  171. default-outline-level
  172.     "Transforms" a paragraph into some kind of heading, without making it a
  173.     heading itself. Senseless.
  174. Style mappings (replacing a style conditionally with another style) will not be
  175. taken into account yet.
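The inheritance rule for parent-style-name can be sketched as a walk up the parent chain that ends at the family's default style. The data layout (dictionaries keyed by family and name, with ``parent`` and ``props`` entries) is an assumption for illustration only.

```python
def resolve_style(name, family, styles, defaults):
    """Return the effective properties of a style.

    styles maps (family, name) to a dict with optional "parent" and "props";
    defaults maps a family to the properties of its default style.
    """
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError(f"cyclic parent chain at {name!r}")
        seen.add(name)
        style = styles[(family, name)]
        chain.append(style.get("props", {}))
        name = style.get("parent")
    # Start from the family default, then apply ancestors before descendants
    # so that more specific styles overwrite inherited values.
    effective = dict(defaults.get(family, {}))
    for props in reversed(chain):
        effective.update(props)
    return effective


styles = {
    ("paragraph", "Standard"): {"props": {"font-size": "12pt"}},
    ("paragraph", "Heading"): {
        "parent": "Standard",
        "props": {"font-weight": "bold"},
    },
    ("paragraph", "Heading_1"): {
        "parent": "Heading",
        "props": {"font-size": "18pt"},
    },
}
defaults = {"paragraph": {"font-family": "serif", "font-size": "10pt"}}

effective = resolve_style("Heading_1", "paragraph", styles, defaults)
print(effective)
```

Keying the lookup by family and name reflects that a style name only has to be unique in combination with its family.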
  176. Types of styles
  177. ---------------
  178. default-style
  179.     Default styles must be defined for each used style family. The default
  180.     style is always the base of inheritance for the style family.
  181. page-layout
  182.     Definition of the global page properties, such as the page format.
  183. header-style / footer-style
  184.     Styling of the header and footer area.
  185. master-page
  186.     Definition of a master page. Defines header / footer, forms, styles for the
  187.     page and more.
  188. Table templates
  189. ---------------
  190. Not yet handled.
  191. Font face declaration
  192. ---------------------
  193. Corresponds to the @font-face declaration of CSS2.
  194. Data styles
  195. -----------
  196. Not yet handled.
  197. List styles
  198. -----------
  199. Defines the properties of a list (not its content!), with one style for each
  200. list level. If no style exists for a specific level, the next lower level style
  201. is used. If none is defined, a default style is used. The name and display-name
  202. properties work as usual. The consecutive-numbering attribute can be defined to
  203. specify whether different list levels restart numbering or not.
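The fallback rule can be sketched as a simple lookup. It assumes that "next lower level" means the nearest smaller level number; the dictionary layout and the default bullet style are placeholders.

```python
# Assumed default used when no level style is defined at all.
DEFAULT_LEVEL_STYLE = {"kind": "bullet", "bullet-character": "\u2022"}


def level_style(list_style, level):
    """Return the style for a list level, falling back to lower levels."""
    for candidate in range(level, 0, -1):
        if candidate in list_style:
            return list_style[candidate]
    return DEFAULT_LEVEL_STYLE


# A list style that defines only levels 1 and 3.
list_style = {
    1: {"kind": "number", "num-format": "1"},
    3: {"kind": "bullet", "bullet-character": "-"},
}

print(level_style(list_style, 2))  # falls back to the level 1 style
print(level_style(list_style, 3))  # defined directly
print(level_style({}, 4))          # nothing defined: default style
```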
  211. List-level styles
  212. ^^^^^^^^^^^^^^^^^
  213. A list-level style commonly has a level attribute, defining to which list
  214. level the style is applied. All other attributes depend on the type of list. A
  215. list may contain different kinds of lists, depending on the depth of the
  216. level.
  217. Number level styles
  218. ~~~~~~~~~~~~~~~~~~~
  219. Defines an enumeration list level using a list-level-style-number element. It
  220. has the following attributes:
  221. style-name
  222.     Defines the text style for list item numbers.
  223. num-format
  224.     Defines the formatting of the list item numbers.
  225. display-levels
  226.     Defines how many level numberings to display (e.g. 1.2.3 or just 1.2).
  227. start-value
  228.     Defines the first number to be used by the very first element of the
  229.     defined level.
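How num-format, display-levels and start-value interact can be shown with a small label generator. This sketch supports only the formats ``1``, ``a`` and ``A`` (the latter two for values 1 to 26) and joins the displayed levels with a dot. The separator and the data layout are assumptions.

```python
import string


def format_number(value, num_format):
    """Format one counter; supports only the "1", "a" and "A" formats."""
    if num_format == "1":
        return str(value)
    if num_format in ("a", "A") and 1 <= value <= 26:
        letters = string.ascii_lowercase if num_format == "a" else string.ascii_uppercase
        return letters[value - 1]
    raise ValueError(f"unsupported format {num_format!r} for value {value}")


def item_label(positions, level_styles):
    """Build the label of a list item.

    positions holds the zero-based item index on each level, outermost first;
    level_styles holds one dict per level with "num-format" and optional
    "start-value" and "display-levels" entries.
    """
    current = level_styles[len(positions) - 1]
    shown = current.get("display-levels", 1)
    first = max(0, len(positions) - shown)
    parts = []
    for depth in range(first, len(positions)):
        style = level_styles[depth]
        value = style.get("start-value", 1) + positions[depth]
        parts.append(format_number(value, style["num-format"]))
    return ".".join(parts)


level_styles = [
    {"num-format": "1"},
    {"num-format": "1", "display-levels": 2},
    {"num-format": "a", "start-value": 3, "display-levels": 2},
]

print(item_label([4], level_styles))        # fifth item on level 1
print(item_label([0, 1], level_styles))     # two levels displayed
print(item_label([0, 1, 0], level_styles))  # level 3 starts counting at 3
```

With display-levels set to 2 on the third level, only the second and third level counters appear in the label.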
  230. Bullet level style
  231. ~~~~~~~~~~~~~~~~~~
  232. Attributes defining a list level to be an item list.
  233. text-style
  234.     Style for the bullet character.
  235. bullet-character
  236.     A Unicode character to be used as the bullet.
  237. num-format-prefix / num-format-suffix
  238.     Prefix and suffix to be placed before / after a bullet.
  239. bullet-relative-size
  240.     Relative size (percentage, integer) of the bullet with respect to the item
  241.     content.
  242. Image level style
  243. ~~~~~~~~~~~~~~~~~
  244. Creates items preceded by images. The image to be used is either referenced or
  245. stored using base64 encoded binary data.
  246. ..
  247. Local Variables:
  248. mode: rst
  249. fill-column: 79
  250. End:
  251. vim: et syn=rst tw=79