eZ publish Enterprise Component: Document, Requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Kirill Subbotin
:Revision: $Revision: $
:Date: $Date: $

Introduction
============

Description
-----------

The purpose of the Document component is to provide an easy and universal way to
deal with structured text documents of any format.

Current implementation
----------------------

In eZ publish 3 there is a datatype 'ezxmltext' that stores structured text in
its own internal XML format, but this format has some disadvantages. The main
weak point is that its schema is very different from XHTML, which makes
it hard to present in WYSIWYG editors and has a negative impact on
performance. The main differences are:

- All content is wrapped in paragraphs, even "block" elements like tables,
  which is not allowed in XHTML.
- Sections are used instead of plain headers as in XHTML 1.*.
- "line" tags wrap each line's content instead of a single "br" separating
  lines.
- The placement of a "custom" element in the schema depends on its attribute,
  which is not compatible with XML schema definitions.
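
To make the section/header mismatch concrete, the following Python sketch
flattens nested sections into XHTML-style heading levels. The element names
(``section``, ``header``, ``paragraph``) are taken from the description above
for illustration only and are not checked against the real 'ezxmltext' schema;
the function name ``flatten`` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative input only: element names follow the prose description and
# are an assumption, not the verified 'ezxmltext' schema.
SOURCE = """
<section>
  <header>Title</header>
  <paragraph>Intro</paragraph>
  <section>
    <header>Subtitle</header>
    <paragraph>Details</paragraph>
  </section>
</section>
"""


def flatten(section, depth=1, out=None):
    """Turn nested sections into a flat list of (tag, text) pairs."""
    if out is None:
        out = []
    for child in section:
        if child.tag == "header":
            # The nesting depth of the section decides the heading level.
            out.append((f"h{min(depth, 6)}", child.text or ""))
        elif child.tag == "paragraph":
            out.append(("p", child.text or ""))
        elif child.tag == "section":
            flatten(child, depth + 1, out)
    return out


flat = flatten(ET.fromstring(SOURCE))
```

Every conversion to or from XHTML needs a step of this kind, which is part of
the cost of a schema that differs structurally from XHTML.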

This datatype can take input in several formats, including 'simplified XML' and
basic HTML (for OE), and produces output in HTML and PDF using eZ publish's
templates.

There is also an ODF extension for eZ publish that can convert simple documents
in the OASIS OpenDocument format to the internal format of the 'ezxmltext'
datatype and back. It uses custom code plus OpenOffice.org in daemon mode for
the conversion.

Requirements
============

The main task of the component is to take input in one of the supported
formats and convert it to another. There should always be a way to convert
one format to another: either a direct conversion or a chain conversion
that involves one or more intermediate formats.
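
One way to picture chain conversion is as a path search over the available
direct converters. The Python sketch below is only an illustration of that
idea: the converter table ``DIRECT``, the format names in it, and the function
``find_chain`` are hypothetical and do not describe the component's actual
design.

```python
from collections import deque

# Hypothetical set of direct converters: source format -> target formats.
DIRECT = {
    "rest": ["docbook"],
    "xhtml": ["docbook"],
    "docbook": ["xhtml", "ezxml"],
    "ezxml": ["docbook"],
}


def find_chain(source, target, direct=DIRECT):
    """Return the shortest list of formats from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in direct.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no direct or chained conversion exists
```

With this table, ``find_chain("rest", "ezxml")`` goes through ``docbook`` as
the intermediate format, while ``find_chain("xhtml", "rest")`` finds no path
because no converter in the table produces ReST.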

There should be one special format that is recommended as the intermediate
format in complex conversions, unless there are reasons to pick another. This
format is called the "internal format".

Additionally, the following features should be implemented:

* Validate the input document against a given schema.
* Auto-fix (tidy) incorrect input.
* Generate/edit a document in the internal format using a PHP API.
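
As a rough idea of what generating a document through an API could look like,
here is a small sketch. It is written in Python rather than PHP purely for
illustration, it assumes a DocBook-like internal format (``article``,
``section``, ``title``, ``para``), and the helper names ``new_article`` and
``add_section`` are hypothetical, not part of any existing API.

```python
import xml.etree.ElementTree as ET


def new_article(title):
    """Create an empty DocBook-style article with a title."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    return article


def add_section(parent, title, paragraphs):
    """Append a section holding a title and plain-text paragraphs."""
    section = ET.SubElement(parent, "section")
    ET.SubElement(section, "title").text = title
    for text in paragraphs:
        ET.SubElement(section, "para").text = text
    return section


doc = new_article("Requirements")
add_section(doc, "Introduction", ["First paragraph.", "Second paragraph."])
xml_text = ET.tostring(doc, encoding="unicode")
```

The point of such an interface is that callers never write markup by hand, so
the result is well-formed by construction and only needs schema validation.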

The main formats the component deals with include:

* Simple text markup formats (ReST, BBcode, WikiText, 'simplified XML')
* XML-based formats (XHTML, DocBook, eZ publish 4 XML text)
* Complex formats that include styles and additional info (doc, odt, pdf)

The purpose of the component is to represent a document's structured content
only, so all styling information that comes from the input will be ignored.
However, if some semantics are encoded with styles in the input document, it
should be possible to convert them properly.
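
The difference between purely visual styling and semantics encoded as style
can be sketched as follows. The mapping table is an assumption made for this
example (bold and italic treated as emphasis, everything else dropped), and
the names ``STYLE_TO_ELEMENT``, ``parse_style`` and ``convert_span`` are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a style declaration to a semantic element.
STYLE_TO_ELEMENT = {
    ("font-weight", "bold"): ("emphasis", {"role": "strong"}),
    ("font-style", "italic"): ("emphasis", {}),
}


def parse_style(value):
    """Split 'a: b; c: d' into a set of (property, value) pairs."""
    pairs = set()
    for declaration in value.split(";"):
        if ":" in declaration:
            prop, val = declaration.split(":", 1)
            pairs.add((prop.strip().lower(), val.strip().lower()))
    return pairs


def convert_span(span):
    """Return a semantic element for a styled span, or None if purely visual."""
    for key in sorted(parse_style(span.get("style", ""))):
        if key in STYLE_TO_ELEMENT:
            tag, attrib = STYLE_TO_ELEMENT[key]
            element = ET.Element(tag, dict(attrib))
            element.text = span.text
            return element
    return None  # nothing semantic: the caller keeps only the text


bold = convert_span(
    ET.fromstring('<span style="color: red; font-weight: bold">Note</span>')
)
plain = convert_span(ET.fromstring('<span style="color: red">Note</span>'))
```

Here the colour is discarded in both cases, but the bold span survives as an
``emphasis`` element because the table treats bold as carrying meaning.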

Design goals
============

The component should be able to:

* Take an input and represent it as a DOM tree internally.
* Validate it against the schema of its format and either report errors or
  tidy the document automatically.
* Transform it to other formats using one of the available methods.
* Provide an interface to create/edit a document in the internal format.
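
The first two goals (tree representation and validation with error reporting)
can be illustrated with a deliberately tiny sketch. The "schema" here is just
a table of allowed child elements, which is far weaker than a real XML schema;
``ALLOWED_CHILDREN`` and ``validate`` are hypothetical names used only for
this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced schema: element name -> allowed children.
ALLOWED_CHILDREN = {
    "article": {"title", "section", "para"},
    "section": {"title", "section", "para"},
    "title": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}


def validate(element, path=""):
    """Collect readable errors for elements that violate ALLOWED_CHILDREN."""
    here = f"{path}/{element.tag}"
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return [f"{here}: unknown element"]
    errors = []
    for child in element:
        # Known element in a place where the parent does not accept it.
        if child.tag in ALLOWED_CHILDREN and child.tag not in allowed:
            errors.append(f"{here}: <{child.tag}> is not allowed here")
        errors.extend(validate(child, here))
    return errors


good = ET.fromstring(
    "<article><title>T</title><section><title>S</title>"
    "<para>x <emphasis>y</emphasis></para></section></article>"
)
bad = ET.fromstring("<article><title><para>x</para></title><table/></article>")
good_errors = validate(good)
bad_errors = validate(bad)
```

A tidy step could start from the same error list, for example by moving a
misplaced element to the nearest ancestor that accepts it.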

The internal format should have a schema that is rich enough to represent any
document. There should always be a way to convert from any of the supported
formats to the internal format. This rule guarantees that there is always a
conversion path between any two formats.

XML transformations can be done with XSL templates. Thus the XSL extension for
PHP 5 is required for this component.

The design document should contain a list of the formats we are willing to
support in the first release and the ways they can be processed.

Formats
=======

A good candidate for the internal document format is DocBook
(http://docbook.org/). It is an open format that has a lot of features while
not being too complex. There are also many existing tools that support this
format, so we can use some of them if their licenses allow it.

In the first release we will most likely support a subset of this format, so it
probably makes sense to use an existing subset called Simplified DocBook
(http://docbook.org/schemas/simplified).

Other document formats that will definitely be supported in some way:

- ReST
- Wikitext
- HTML
- XHTML
- OpenDocument
- PDF
- eZ publish 4 XML text
- eZ publish 4 simplified XML text

The attached diagram (document-formats.svg) shows which formats will be
supported in the first release of the component and the possible directions of
transformation from one format to another.