123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317 |
- ==============================================
- Design document for ODT parsing and generation
- ==============================================
- :Author: ts
- The scope of this document is to define the design for a first implementaion of
- ODT (Open Document Text) support in the eZ Document component. The parts of the
- Document component designed in this document do not affect other Open Document
- formats like spreadsheets or graphics. The goal is to define the infrastructure
- for reading and writing ODT documents, i.e. to convert existing ODT documents
- into the internal representation of the Document component (DocBook XML) and to
- generate new ODT documents from the internal representation.
- ------------
- Requirements
- ------------
- The following sections describe the requirements for the ODT handling in the
- Document component. The first section defines requirements for reading ODT, the
- second for writing ODT and the third section defines requirements for later
- enhancements to be kept in mind during the initial implementation.
- Import
- ======
- The Document component should be able to parse existing ODT documents and to
- convert them to the internal format used by the Document component (DocBook
- XML). Requirements for the import process are:
- - Read plain XML ODT files
- - Parse all necessary structural ODT elements
- - Convert ODT elements properly into equivalent or similar DocBook representations
- - Maintaining the content semantics provided by the ODT as good as possible
- - Maintain meta information provided by the ODT as good as possible
- - Develop a first heuristical approach of how ODT styling information can be
- used to determine semantics of an element.
- Export
- ======
- The Document component should be able to generate new ODT documents from an
- existing internal representations (DocBook XML). Requirements for this process
- are:
- - Write plain XML ODT files
- - Convert DocBook representation elements to their corresponding ODT
- representations
- - Maintain the document structure
- - Maintain content and metadata semantics as good as possible
- - Styling of ODT elements.
- Later enhancements
- ==================
- In the first step of ODT integration only rudimentary features for import and
- export should be realized. The following ideas must be kept in mind during the
- design and implementation, to ensure future extensibility.
- - Reading / writing of ODT package files (ZIP)
- - ODF can be presented either as a single XML file or as a ZIP package
- containg multiple XML files and other related files (e.g. images) in
- addition.
- - Reading and writing this format is not necessary from the start, but since
- it is the default way for users to store ODT, it should be supported later
- on.
- - The handling of ZIP files requires a tie-in with the Archive component or
- similar.
- ------
- Design
- ------
- In the first development cycle, only the structural conversion between ODT and
- DocBook XML will be considered. In addition, rudimentary styling information
- will be taken into account. The reading and writing of ODF packages is not
- considered in this design.
- Import
- ======
- Three different steps are necessary to import an ODT document and convert it
- into DocbookXml:
- 1. Read the XML data
- 2. Preprocess the ODT representation
- 3. Actual conversion to DocBook XML representation
- Step 1 will be performed through the DOM extension in PHP, the internal
- representation of an ODT will be a DOM treee. The second step performs
- pre-processing on this DOM tree. Pre-processing is e.g. needed to assign
- additional semantics to the ODT elements to achieve a better rendering.
- Finally, the pre-processed DOM tree will be visited, to achieve the actual
- creation of the DocBook XML representation.
- Pre-processing
- --------------
- The step of pre-processing the ODT representation is necessary to assign
- DocBoox semantics to the ODT elements. ODT and DocBook XML have some
- similarities, but also differ widely in some parts. The pre-processing step
- performs manipultations on the ODT representation and potentially adds
- information which is utilized by the latter conversion step to create a correct
- semantical representation.
- This process works similar to filters in the XHTML document import. The class
- level design of this feature is inspired by the XHTML handling: Filters can be
- registered which pre-process the incoming ODT in the given order.
- A filter may process the following steps on a DOMElement:
- - Add type information to an XML element to determine into which DocBook XML
- element the element will be converted
- - Add attribute information to determine the attributes in the DocBook XML
- representation
- - Add additional elements or element hierarchies
- The resulting DOM tree must not necessarily be valid ODT anymore, to reflect
- the latter DocBook structure in a better way.
- The first implemented filter will only perform rudimentary operations on the
- DOM to assign basic semantical information to the elements. A second
- implementation will be an additional filter which takes some styling
- information into account to enhance this information. Futher filters can be
- implemented by third parties to extend or replace these mechanisms.
- Conversion
- ----------
- The conversion process itself will mostly visit the DOM tree and utilize the
- information, attached to the elements in the pre-processing step, to generate a
- DocBook XML with the corresponding content. The filter pre-processing step is
- responsible to annotate all significant elements properly so that the
- conversion can use them.
- Flat ODT documents (consisting of only 1 XML file), which will purely be
- handled in the first version of ODT support, may contain image content embeded.
- To extract those, the user my specify a target directory or the system temp dir
- will be used as the default. The content will then be referenced in DocBook
- from this location.
- .. note:: We should check if it is possible to define and handle data URLs in
- docbook. May be problematic with other formats though. (kn)
- Export
- ======
- .. note:: First sentence a bit unclear ;) (kn)
- The export process for ODT works similar to PDF rendering, except for that is a
- little bit less strict. The internal DocBook representation is converted to the
- desired ODT representation according to its semantics.
- Based on the DocBook XML elements, the user can define styles using a
- simplified CSS syntax (see PDF). Each of the style definitions is converted to
- an automatic style in the resulting ODT document. ODT elements affected by a
- certain style get this style applied.
- Styles
- ======
- A style is defined for each styling information. There is no direct assignement
- of layouting elements to styling information, but always a style in between.
- The <style/> element has the following properties:
- name
- The internal name of the style. Must be unique over all styles, in
- concatenation with the style:family.
- displayname
- Name of the style to display in GUIs. If left out, the name is used.
- family
- Family collection of the style. One of (in context of text documents):
- text
- Style that might be applied to any piece of text.
- paragraph
- Style for complete paragraphs and headings.
- section
- Style to be applied to sections of text in text documents (@TODO: Not
- handled yet!).
- ruby
- Not handled, yet.
- table
- table-column
- table-row
- table-cell
- table-page
- chart
- default
- graphic
- parent-style-name
- Identifies a parten style. Style properties of the parent are inherited and
- maybe overwritten. If no parent style is specified, the default style for
- the styles family will be the base for inheritence.
- next-style-name
- Next paragraph style. If a new paragraph is started after the element this
- style is applied to, this paragraph will have the style named in this
- element. Only sensible for editing in a GUI.
- list-style-name
- Style used in headings and paragraphs of lists contained in the styled
- element, only if the lists have no list-style applied themselves.
- master-page-name
- Styles with a master page applied will force a page break before the
- element and load the styles from the master-page then.
- data-style-name
- Styling of table cells (e.g. formulas, currencies, ...).
- class
- Information for GUIs, to sort styles into categories.
- default-outline-level
- "Transforms" a paragraph into some kind of heading, without making it a
- heading itself. Senseless.
- Style mappings (replacing a style conditionally with another style) will not be
- taken into account, yet.
- Types of styles
- ---------------
- default-style
- Default styles must be defined for each used style family. The default
- style is always the base of inheritance for the style family.
- page-layout
- Definition of the global page properties, format and stuff.
- header-style / footer-style
- Styling of the header and footer area.
- master-page
- Definition of a master page. Defines header / footer, forms, styles for the
- page and more.
- Table templates
- ---------------
- Not yet handled.
- Font face declaration
- ---------------------
- Correspond to the @font-face declaration of CSS2.
- Data styles
- -----------
- Not yet handled.
- List styles
- -----------
- Define properties of a list (not its content!). A style for each list level. If
- no style exists for a specific level, the next lower level style is used. If
- none is defined, a default style is used. name and display-name properties as
- ususal. Can have the consecutive-numbering attribute defined, to specify if
- different list levels restart numbering or not
- List styles
- -----------
- Define properties of a list (not its content!). A style for each list level. If
- no style exists for a specific level, the next lower level style is used. If
- none is defined, a default style is used. name and display-name properties as
- ususal. Can have the consecutive-numbering attribute defined, to specify if
- different list levels restart numbering or not.
- List-level styles
- ^^^^^^^^^^^^^^^^^
- A list-level style commonly has a level attribute, defining, to which
- list-level the style is applied. All other attribute depend on the type of
- list. A list may contain different kinds of lists, depending on the depth of
- the level.
- Number level styles
- ~~~~~~~~~~~~~~~~~~~
- Defining an enumeration list level using a list-level-style-number element. Has
- the following attributes:
- style-name
- Defines the text style for list item numbers.
- num-format
- Defines the formatting of the list item numbers.
- display-levels
- Defines how many level numberings to display (e.g. 1.2.3 or just 1.2).
- start-value
- Defines the first number to be used by the very first element of the
- defined level.
- Bullet level style
- ~~~~~~~~~~~~~~~~~~
- Attributes defining a list level to be an item list.
- text-style
- Style for the bullet character.
- bullet-character
- A unicode character to be used as the bullet.
- num-format-prefix / num-format-suffix
- Prefix and suffix to be placed before / after a bullet.
- bullet-relative-size
- Relative size (percentage, integer) of the bullet in respect to the item
- content.
- Image level style
- ~~~~~~~~~~~~~~~~~
- Creates items preceeded by images. The image to be used is either referenced or
- stored using base64 encoded binary data.
- ..
- Local Variables:
- mode: rst
- fill-column: 79
- End:
- vim: et syn=rst tw=79
|