===============================
Requirements for PDF generation
===============================

Generation of PDF documents should be added to the document component. Those
PDF documents should be created from Docbook documents, which can be generated
from each markup available in the document component.

This document summarizes the requirements for PDF generation.

Central requirements
====================

The key requirements for the component are listed below.

Layout
------

The central requirement is to generate user-styled PDF documents from Docbook
markup. The customized styling should include, but is not limited to:

- Text

  - Fonts and text sizes
  - Line heights
  - Colors
  - Alignments

- Pages

  - Footers and headers
  - Background images
  - Multiple text columns per page
  - Page sizes
  - Margins and paddings

- Block level elements (tables, graphics, literal blocks, ...)

  - Borders, backgrounds, fonts, colors

It must be possible to assign different styles depending on the parent
elements in the Docbook markup, so that the titles in the following Docbook
document can be formatted differently::

    <?xml version="1.0"?>
    <article>
        <title>Article title</title>
        <section>
            <title>First heading</title>
            <section>
                <title>Second heading</title>
            </section>
        </section>
    </article>
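
One way to resolve such parent-dependent styles is to match the ancestor path
of each element against selector-like rules and let the most specific rule
win. The following Python sketch only illustrates that idea; the rule table
``STYLE_RULES``, the helper names and the font sizes are hypothetical and not
part of the document component.

```python
import xml.etree.ElementTree as ET

DOCBOOK = """<?xml version="1.0"?>
<article>
  <title>Article title</title>
  <section>
    <title>First heading</title>
    <section>
      <title>Second heading</title>
    </section>
  </section>
</article>"""

# Hypothetical rules: each key is the tail of an ancestor path.
STYLE_RULES = {
    ("title",): {"font-size": "12pt"},
    ("article", "title"): {"font-size": "24pt"},
    ("section", "title"): {"font-size": "18pt"},
    ("section", "section", "title"): {"font-size": "14pt"},
}


def style_for(path):
    """Return the style of the longest rule that matches the end of path."""
    best, best_length = {}, 0
    for selector, style in STYLE_RULES.items():
        length = len(selector)
        if length > best_length and tuple(path[-length:]) == selector:
            best, best_length = style, length
    return best


def collect_title_styles(element, path=()):
    """Walk the tree and pair every title text with its resolved font size."""
    path = path + (element.tag,)
    found = []
    if element.tag == "title":
        found.append((element.text, style_for(path)["font-size"]))
    for child in element:
        found.extend(collect_title_styles(child, path))
    return found


titles = collect_title_styles(ET.fromstring(DOCBOOK))
```

With these assumed rules the three titles of the example document resolve to
three different font sizes, although all of them use the same ``title``
element.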

It would be nice if the styles could be imported from and exported to an
easily readable and writable format.

Text formatting
---------------

Proper formatting of texts is most probably the biggest problem in the
implementation. The requirements include:

Hyphenation
^^^^^^^^^^^

Justified text, especially in narrow text columns, requires hyphenation of
words; otherwise the blanks between characters and words might increase too
much. A pluggable hyphenation mechanism is required, which can be adapted to
different languages, based on externally available dictionaries.
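
A pluggable mechanism could look roughly like the following Python sketch: an
abstract hyphenator, one implementation per dictionary, and a lookup by
language. All class and function names are hypothetical. The word list format,
with ``-`` marking the allowed break points, is an assumption made for
illustration; real hyphenation dictionaries (for example TeX hyphenation
patterns) are structured differently.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional, Tuple


class Hyphenator(ABC):
    """Interface every language specific hyphenator has to implement."""

    @abstractmethod
    def split_points(self, word: str) -> List[int]:
        """Return the character offsets at which the word may be broken."""


class NullHyphenator(Hyphenator):
    """Fallback for languages without a dictionary: never hyphenate."""

    def split_points(self, word: str) -> List[int]:
        return []


class WordListHyphenator(Hyphenator):
    """Hyphenator backed by entries such as 'hy-phen-a-tion' (assumed format)."""

    def __init__(self, entries: Iterable[str]):
        self._points: Dict[str, List[int]] = {}
        for entry in entries:
            offsets, position = [], 0
            for part in entry.lower().split("-")[:-1]:
                position += len(part)
                offsets.append(position)
            self._points[entry.lower().replace("-", "")] = offsets

    def split_points(self, word: str) -> List[int]:
        return self._points.get(word.lower(), [])


# Hypothetical registry; a real one would load external dictionaries.
HYPHENATORS: Dict[str, Hyphenator] = {
    "en": WordListHyphenator(["hy-phen-a-tion", "re-quire-ment"]),
}


def hyphenator_for(language: str) -> Hyphenator:
    return HYPHENATORS.get(language, NullHyphenator())


def break_word(word: str, max_chars: int, language: str) -> Optional[Tuple[str, str]]:
    """Break the word so that the first part plus the hyphen fits max_chars."""
    points = hyphenator_for(language).split_points(word)
    fitting = [point for point in points if point + 1 <= max_chars]
    if not fitting:
        return None
    cut = fitting[-1]
    return word[:cut] + "-", word[cut:]
```

The text wrapping code would only talk to ``Hyphenator``, so support for a new
language means registering another implementation, without touching the
wrapping logic.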

Widows and orphans
^^^^^^^^^^^^^^^^^^

See: http://en.wikipedia.org/wiki/Widows_and_orphans

There should be ways to configure the thresholds under which paragraphs are
considered widows or orphans, which should be avoided.
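
As an illustration of configurable thresholds, the following Python sketch
decides how many lines of a paragraph may stay on the current page. The
function name and the default values of two lines are assumptions. The
parameter names follow the common convention that ``orphans`` is the minimum
number of lines left at the bottom of a page and ``widows`` the minimum number
carried over to the top of the next page.

```python
def lines_before_break(line_count, lines_available, orphans=2, widows=2):
    """Return how many lines of a paragraph to place on the current page.

    A return value of 0 means the whole paragraph moves to the next page.
    """
    if line_count <= lines_available:
        # The paragraph fits completely, no break is needed.
        return line_count

    # Keep at least `widows` lines for the next page.
    first_part = min(lines_available, line_count - widows)

    # Fewer than `orphans` lines would remain here: do not split at all.
    if first_part < orphans:
        return 0
    return first_part
```

For a paragraph of ten lines with nine lines left on the page, this returns
eight, so that two lines instead of one move to the next page.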

Inline formatting
^^^^^^^^^^^^^^^^^

Depending on the font and styles used, inline formatting might have a serious
effect on the text width. This MUST be respected during text rendering.
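
The effect can be shown with a small Python sketch that measures a line as a
sequence of styled runs instead of one plain string. The table ``GLYPH_UNITS``
holds made-up average glyph widths per style, in thousandths of the font size;
a real implementation would read per-character widths from the font files.

```python
# Hypothetical average glyph widths, in 1/1000 of the font size.
GLYPH_UNITS = {"regular": 500, "bold": 560, "monospace": 600}


def run_width(text, style, font_size):
    """Width of one run of text that shares a single inline style."""
    return len(text) * GLYPH_UNITS[style] * font_size / 1000


def line_width(runs, font_size):
    """Width of a line given as a list of (text, style) runs."""
    return sum(run_width(text, style, font_size) for text, style in runs)


plain = line_width([("A bold word", "regular")], 10)
styled = line_width(
    [("A ", "regular"), ("bold", "bold"), (" word", "regular")], 10
)
```

With these assumed metrics the styled line is wider than the same text in the
regular font, so a wrapping decision based on the plain text could overflow
the column.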

LTR and RTL languages
^^^^^^^^^^^^^^^^^^^^^

The text wrapping must be able to work with left-to-right and right-to-left
languages.

Floating media objects
^^^^^^^^^^^^^^^^^^^^^^

For media objects which do not span the whole column width, it should be
possible to float text around the media objects. Detection of the actual image
borders is not required - the rectangular frame around the image should be
sufficient for text floating.
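
With a rectangular frame, text floating reduces to shortening every line that
vertically overlaps the frame. The following Python sketch shows this
computation for a single floated object; the ``FloatBox`` type, its fields and
the function name are hypothetical, and all lengths use one arbitrary unit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FloatBox:
    """Rectangular frame of a floated media object inside a text column."""
    top: int
    width: int
    height: int
    side: str  # "left" or "right"


def line_span(line_top: int, line_height: int, column_width: int,
              box: Optional[FloatBox]) -> Tuple[int, int]:
    """Return (x offset, available width) for the line starting at line_top."""
    if box is None:
        return 0, column_width

    overlaps = (line_top < box.top + box.height
                and line_top + line_height > box.top)
    if not overlaps:
        return 0, column_width

    if box.side == "left":
        # Text starts to the right of the frame.
        return box.width, column_width - box.width
    # Right float: text starts at the column edge but ends before the frame.
    return 0, column_width - box.width


image = FloatBox(top=24, width=80, height=30, side="left")
spans = [line_span(top, 12, 200, image) for top in range(0, 72, 12)]
```

In this example the lines starting at 24, 36 and 48 are shortened to a width
of 120, while the lines above and below the frame keep the full column width
of 200.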

Embedding of media
------------------

There are a lot of different media types which might be embedded into PDF.
The most common formats seem to be JPEG and EPS. JPEG is not suitable for
several types of graphics [1], and EPS can only be used properly for some
types of vector based images. Conversion options and supported formats must be
evaluated.

It might depend on the used driver which formats are supported.

PDI allows embedding of other PDFs inside the created PDF - this can be useful
when merging different generated documents.

.. [1] http://kore-nordmann.de/blog/image_formats.html

Metadata
--------

The document component already preserves metadata associated with documents.
PDF supports embedding additional document metadata. This should definitely
be embedded, but it might also be useful to offer an easily accessible API for
embedding of additional metadata. XMP is especially designed to embed metadata
using RDF.

Autogenerated contents
----------------------

Headers and footers often contain some fixed texts, but might also contain
autogenerated contents, like:

- Current page / number of pages / page orientation (left, right)
- Current section title
- Author, read from document metadata

It must be possible to define callbacks which generate those contents for the
page they are currently rendered on. The best possible markup used for
generation of those contents needs to be evaluated.
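
Such callbacks could receive a small context object describing the page being
rendered. The Python sketch below is only one possible shape: the
``PageContext`` type, the default generators and the rule that odd pages are
right-hand pages are all assumptions. It returns plain strings and leaves open
which markup the callbacks should produce.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PageContext:
    """Information handed to a callback for the page currently rendered."""
    number: int
    total: int
    section_title: str
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def side(self) -> str:
        # Assumption: odd page numbers are right-hand pages.
        return "right" if self.number % 2 else "left"


def default_footer(page: PageContext) -> str:
    return f"Page {page.number} of {page.total}"


def default_header(page: PageContext) -> str:
    parts = [page.section_title, page.metadata.get("author", "")]
    if page.side == "left":
        # Mirror the header on left-hand pages.
        parts.reverse()
    return " | ".join(part for part in parts if part)


def render_page_furniture(
    page: PageContext,
    header: Callable[[PageContext], str] = default_header,
    footer: Callable[[PageContext], str] = default_footer,
) -> Dict[str, str]:
    """Call the configured generators for one page."""
    return {"header": header(page), "footer": footer(page)}
```

A user would override a default by passing a custom callable, for example
``render_page_furniture(page, footer=lambda p: str(p.number))``.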

There are several elements which can require automatic generation. Those are
at least:

- Header / Footer
- Cover page
- Table of contents
- Back page

For most of those elements a predefined generator can be implemented which
creates meaningful default contents, and then can be extended by the user.

Especially for cover and back pages it might be useful to include them
directly from other PDF documents.

Driver infrastructure
---------------------

There are multiple ways to generate PDF documents, like:

- pecl/libharu
- FPDF
- TCPDF
- pdflib
- Zend_PDF

It might depend on the environment which one of those libraries is available
and performs best. A driver infrastructure should offer the user the choice
of selecting the best output driver for writing the actual PDF. Not all of
those drivers support proper text wrapping themselves, so this cannot be
handed over to the drivers.
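
The split of responsibilities could be sketched as follows, here in Python for
brevity: a driver only measures and draws text, while wrapping is done once in
driver-independent code. The ``Driver`` interface, the ``RecordingDriver``
stand-in with its fixed character width and the greedy wrapping strategy are
assumptions for illustration and do not reflect the API of any of the
libraries listed above.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class Driver(ABC):
    """Minimal primitives an output driver would have to provide."""

    @abstractmethod
    def text_width(self, text: str) -> float:
        """Width of the text in the currently selected font."""

    @abstractmethod
    def draw_text(self, x: float, y: float, text: str) -> None:
        """Write a single line of text at the given position."""


class RecordingDriver(Driver):
    """Stand-in driver that records calls instead of writing a PDF."""

    def __init__(self, char_width: float = 6.0):
        self.char_width = char_width
        self.calls: List[Tuple[float, float, str]] = []

    def text_width(self, text: str) -> float:
        return len(text) * self.char_width

    def draw_text(self, x: float, y: float, text: str) -> None:
        self.calls.append((x, y, text))


def wrap(driver: Driver, text: str, max_width: float) -> List[str]:
    """Greedy word wrapping that relies only on the driver's measurements."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and driver.text_width(candidate) > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def render_paragraph(driver, text, x, y, max_width, line_height):
    """Wrap the text and hand each finished line to the driver."""
    for index, line in enumerate(wrap(driver, text, max_width)):
        driver.draw_text(x, y + index * line_height, line)
```

Because ``wrap`` never calls a wrapping feature of the underlying library, the
same line breaks result for every driver that reports the same text widths.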

Optional requirements
=====================

Once PDF rendering is implemented correctly, including correct rendering and
wrapping of texts, it might be useful in similar cases, for example:

- SVG to PDF conversion

  The conversion of SVG to PDF is used for distribution of documents with
  heavily customized designs. With a proper rendering infrastructure the API
  should be kept flexible enough to support such conversions later.

- HTML to PDF conversion

  It might be useful to directly convert styled HTML to PDF - if the API stays
  flexible enough this should be possible to add later.

  One major problem might be the used markup for formatting of inline text
  elements.

Import of PDF pages
-------------------

For cover pages (or similar) of documents it might be useful to extract whole
pages from other PDFs and embed them in the generated PDF document.

This requires reading of existing PDF documents, though - which is not planned
to be implemented yet.

Signing PDFs / write protection
-------------------------------

It is common to make PDF documents write protected or sign PDF documents. If
the respective PDF creation library can handle that, it should be exposed in
the API of the PDF creation.