==================================
Design document for PDF generation
==================================

:Author: kn

This is the design document for the PDF generation in the eZ Components document
component. PDF documents should be created from Docbook documents, which can
be generated from each markup available in the document component.

The requirements addressed by this design are specified in the
pdf_requirements.txt document.

Layout directives
=================

The PDF document will be created from the Docbook document object. It will
already contain basic formatting rules for a meaningful default document
layout. Additional rules may be passed to the document in various ways.

Generally, a CSS-like approach will be used to encode layout information. This
allows both readable addressing of nodes in an XML tree, as known from CSS,
and human-readable formatting options.

A limited subset of CSS will be used for now for addressing elements inside the
Docbook XML tree. The grammar for those rules will be::

    Address     ::= Element ( Rule )*
    Rule        ::= '>'? Element
    Element     ::= ElementName ( '.' ClassName | '#' ElementId )?
    ClassName   ::= [A-Za-z_-]+
    ElementName ::= XMLName* | '*'
    ElementId   ::= XMLName

    * XMLName refers to http://www.w3.org/TR/REC-xml/#NT-Name

The semantics of this simple subset of addressing directives are the same as in
CSS. A second-level title could then, for example, be addressed by::

    section title

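To illustrate how an address could be split according to the grammar above,
here is a minimal sketch. It assumes PHP 8, approximates XMLName with a simple
ASCII pattern, and requires whitespace around ``>``. The function name
``parseAddress()`` and the keys of the returned arrays are hypothetical and not
part of the component.

```php
<?php
// Hypothetical sketch: splits an address into its element steps.
// XMLName is approximated by [A-Za-z_][A-Za-z0-9_-]* (an assumption).
function parseAddress(string $address): array
{
    $elementPattern = '/^(\*|[A-Za-z_][A-Za-z0-9_-]*)'
        . '(?:\.([A-Za-z_-]+)|#([A-Za-z_][A-Za-z0-9_-]*))?$/';

    $steps = [];
    $directChild = false;
    $tokens = preg_split('/\s+/', trim($address), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        if ($token === '>') {
            // '>' must follow an element and must not be doubled.
            if ($directChild || $steps === []) {
                throw new InvalidArgumentException("Unexpected '>' in: $address");
            }
            $directChild = true;
            continue;
        }
        if (!preg_match($elementPattern, $token, $match)) {
            throw new InvalidArgumentException("Invalid element: $token");
        }
        $steps[] = [
            'name'        => $match[1],
            'class'       => ($match[2] ?? '') !== '' ? $match[2] : null,
            'id'          => ($match[3] ?? '') !== '' ? $match[3] : null,
            'directChild' => $directChild,
        ];
        $directChild = false;
    }

    // An address needs at least one element and must not end with '>'.
    if ($directChild || $steps === []) {
        throw new InvalidArgumentException("Incomplete address: $address");
    }
    return $steps;
}
```

Calling ``parseAddress( 'article > section title' )`` yields three steps, of
which only ``section`` is marked as a direct child of its predecessor.
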
The formatting options are also mostly the same as in CSS, but again only a
subset of the definitions available in CSS is used, together with some
additional formatting options that are especially relevant for PDF rendering.
The supported formatting options depend on the renderer; unknown formatting
options may cause errors or warnings.

The PDF document wrapper class will implement Iterator and ArrayAccess to
access the layout directives, as the following example shows::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );
    $pdf->styles['article > section title']['font-size'] = '1.6em';

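The nested array syntax shown above only works if the outer lookup returns an
object rather than a plain array. The following is a minimal sketch of that
idea, assuming PHP 8. The class name ``SketchStyles`` is hypothetical, and for
brevity it implements IteratorAggregate where the design names Iterator.

```php
<?php
// Hypothetical sketch of a styles container; not the component's class.
final class SketchStyles implements ArrayAccess, IteratorAggregate
{
    /** @var array<string, ArrayObject> formatting options keyed by address */
    private array $directives = [];

    public function offsetExists(mixed $address): bool
    {
        return isset($this->directives[$address]);
    }

    // Returns an object so that nested writes like
    // $styles['section title']['font-size'] = '1em' reach the stored options.
    // Side effect: reading an unknown address creates an empty entry.
    public function offsetGet(mixed $address): mixed
    {
        return $this->directives[$address] ??= new ArrayObject();
    }

    // Assigning a whole option array overwrites only the options it names.
    public function offsetSet(mixed $address, mixed $options): void
    {
        if (!is_string($address) || !is_array($options)) {
            throw new InvalidArgumentException('Expected address => options.');
        }
        foreach ($options as $name => $value) {
            $this[$address][$name] = $value;
        }
    }

    public function offsetUnset(mixed $address): void
    {
        unset($this->directives[$address]);
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator($this->directives);
    }
}
```

With such a container, ``$styles['article > section title']['font-size'] =
'1.6em';`` stores the option without PHP's "indirect modification" notice.
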
Directives specified later will always overwrite earlier directives, for each
formatting option specified in the later directive. Unlike in CSS, the
overwriting of formatting options will NOT depend on the complexity of the
node addressing.

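The overwrite rule can be sketched as a simple ordered merge. This is only an
illustration, assuming PHP 8 and assuming that the directives matching a node
have already been collected in definition order; the function name
``resolveOptions()`` is hypothetical.

```php
<?php
// Hypothetical sketch: later directives win per formatting option,
// regardless of how specific their address is.
function resolveOptions(array $matchingDirectives): array
{
    $effective = [];
    foreach ($matchingDirectives as $options) {
        // array_replace() overwrites existing keys and appends new ones.
        $effective = array_replace($effective, $options);
    }
    return $effective;
}

$effective = resolveOptions([
    ['font-weight' => 'bold', 'font-size' => '1em'], // e.g. "article > title"
    ['font-weight' => 'normal'],                     // e.g. "title", defined later
]);
```

Here the later, less specific directive still replaces ``font-weight``, while
``font-size`` is kept because the later directive does not mention it.
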
Importing and exporting layout directives
-----------------------------------------

The layout directives can be exported to and imported from files, so that
users of the component may store a custom PDF layout. The storage format will
again look very much like a simplified variant of CSS::

    File       ::= Directive+
    Directive  ::= Address '{' Formatting* '}'
    Formatting ::= Name ':' '"' Value '"' ';'
    Name       ::= [A-Za-z-]+
    Value      ::= [^"]+

C-style comments are allowed anywhere in the definition file, like
``/* ... */`` and ``// ...``.

Importing styles may be accomplished by::

    $pdf->styles->load( 'styles.pcss' );

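As an illustration of the storage format, the following sketch reads such a
definition into a nested array. It assumes PHP 8 and is deliberately naive:
values containing ``}`` are not supported, and addresses are not validated.
The function name ``parseStyleFile()`` is hypothetical and not the
component's loader.

```php
<?php
// Hypothetical sketch: parses the simplified CSS-like format into
// array( address => array( option => value ) ).
function parseStyleFile(string $source): array
{
    // Remove /* ... */ and // ... comments, but keep quoted values intact
    // (a value such as "http://..." must not be cut at the "//").
    $source = preg_replace_callback(
        '~"[^"]*"|/\*.*?\*/|//[^\n]*~s',
        fn (array $m): string => $m[0][0] === '"' ? $m[0] : '',
        $source
    );

    $directives = [];
    preg_match_all('/([^{}]+)\{([^}]*)\}/', $source, $blocks, PREG_SET_ORDER);
    foreach ($blocks as $block) {
        // Normalize whitespace inside the address.
        $address = trim(preg_replace('/\s+/', ' ', $block[1]));
        preg_match_all(
            '/([A-Za-z-]+)\s*:\s*"([^"]+)"\s*;/',
            $block[2],
            $pairs,
            PREG_SET_ORDER
        );
        foreach ($pairs as $pair) {
            // Later definitions overwrite earlier ones per option.
            $directives[$address][$pair[1]] = $pair[2];
        }
    }
    return $directives;
}
```

The result could then be fed into whatever structure holds the layout
directives, one address at a time.
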
List of formatting options
--------------------------

Some formatting options will be processed just as they are defined in CSS,
and some are custom options. The options reused from CSS are:

- background-color
- background-image
- background-position
- background-repeat
- border-color
- border-width
- border-bottom-color
- border-bottom-width
- border-left-color
- border-left-width
- border-right-color
- border-right-width
- border-top-color
- border-top-width
- color
- direction
- font-family
- font-size
- font-style
- font-variant
- font-weight
- line-height
- list-style
- list-style-position
- list-style-type
- margin
- margin-bottom
- margin-left
- margin-right
- margin-top
- orphans
- padding
- padding-bottom
- padding-left
- padding-right
- padding-top
- page-break-after
- page-break-before
- text-align
- text-decoration
- text-indent
- white-space
- widows
- word-spacing

Custom properties are:

text-columns
    Number of text columns in one section.
text-column-spacing
    The margin between multiple text columns on one page.
page-size
    Size of pages.
page-orientation
    Orientation of pages.

Not all options can be applied to each element. The renderer might complain
about invalid options, depending on the configured error level.

Special layout elements
=======================

Footers & Headers
-----------------

Footers and headers are special layout elements, which can be rendered
manually by the user of the component. They can be considered small
sub-documents, but their renderer receives additional information about the
page they are currently rendered on.

They can be set like::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );
    $pdf->footer = new myDocumentPdfPart();

Each of those parts can render itself and calculate its bounding box.
There might be extensions of the basic PDFPart class, which either render small
Docbook sub-documents into a header or footer, or just take a string and
replace placeholders with page-dependent contents.

Possible implementations would be:

ezcDocumentPdfDocbookPart
    Receives a Docbook document and renders it using a defined style at the
    header or footer of the current page. Placeholders in the text,
    represented by entities, for example, might be replaced.
ezcDocumentPdfStringPart
    Receives a simple string, in which simple placeholders are replaced.

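The placeholder replacement of a string part could be sketched as follows.
This assumes PHP 8 and the ``%name`` placeholder syntax used in the full
example of this document; the function name ``renderStringPart()`` and the
shape of the page information array are hypothetical.

```php
<?php
// Hypothetical sketch: replaces %name placeholders with page-dependent values.
function renderStringPart(string $template, array $pageInfo): string
{
    $replacements = [];
    foreach ($pageInfo as $name => $value) {
        $replacements['%' . $name] = (string) $value;
    }
    // strtr() tries the longest keys first, so %pageCount is never
    // mistaken for a shorter placeholder with the same prefix.
    return strtr($template, $replacements);
}

$header = renderStringPart(
    '%title by %author - %pageNum / %pageCount',
    ['title' => 'Introduction', 'author' => 'kn', 'pageNum' => 3, 'pageCount' => 12]
);
```

Unknown placeholders are left untouched in this sketch; a real part might
instead report them according to the configured error level.
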
Other elements
--------------

There are various possible full-page elements, which might be rendered before or
after the actual contents. Those are for example:

- Cover page
- Bibliography
- Back page

To add those to one PDF document you can create a PDF set, which is then
rendered into one file::

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $docbook );

    $set = new ezcDocumentPdfSet();
    $set->parts = array(
        new ezcDocumentPdfPdfPart( 'title.pdf' ),
        $customTableOfContents,
        $pdf,
        $bibliography,
    );
    $set->render( 'my.pdf' );

Some of the documents aggregated in one set can of course again be documents
created from Docbook documents. Each element in the set may contain custom
layout directives.

For the inclusion of other document parts into a PdfSet you are expected to
extend the PDF base class and implement your custom functionality there.
This could mean generating indexes, or a bibliography from the content.

Drivers
=======

The actual PDF renderer calls methods on the driver, which abstract the quirks
of the respective implementations. There will be drivers for at least:

- pecl/libharu
- TCPDF

Renderer
========

The renderer will be responsible for the actual typesetting. It will receive a
Docbook document, apply the given layout directives and calculate the
appropriate calls to the driver from those.

The renderer optionally receives a set of helper objects, which perform relevant
parts of the typesetting, like:

Hyphenator
    Class implementing hyphenation for a specific language. We might provide a
    default implementation, which reads standard hyphenation files.

The renderer state will be shared using an object representing the page
currently processed, which contains information about the already covered
areas and therefore the still available space.

Using such a state object, the state can easily be shared between different
renderers for different aspects of the rendering process. This should allow us
to write simpler rendering classes, which should be easier to maintain than
one big renderer class whose methods would take care of all aspects.

Because the page state object knows about the free space on the current page,
it allows, for example, text to float around images spanning multiple
paragraphs. All renderers for the different aspects can reuse this knowledge
and base their rendering on it. The space already covered on a page will most
probably be represented by a list of bounding boxes.

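A page state based on bounding boxes could look roughly like the following
sketch. It assumes PHP 8, axis-aligned rectangles and a coordinate origin at
the top left of the page; the class name ``SketchPage`` and its methods are
hypothetical and only illustrate the idea of sharing covered areas between
renderers.

```php
<?php
// Hypothetical sketch of a page state object tracking covered areas.
final class SketchPage
{
    /** @var array<int, array<string, float>> bounding boxes already covered */
    private array $covered = [];

    public function __construct(
        private float $width,
        private float $height
    ) {
    }

    // Marks a rectangle as covered, for example after rendering an image.
    public function cover(float $x, float $y, float $width, float $height): void
    {
        $this->covered[] = compact('x', 'y', 'width', 'height');
    }

    // Tells whether a rectangle lies on the page and overlaps no covered box.
    public function isFree(float $x, float $y, float $width, float $height): bool
    {
        if ($x < 0 || $y < 0
            || $x + $width > $this->width
            || $y + $height > $this->height
        ) {
            return false;
        }
        foreach ($this->covered as $box) {
            $overlaps = $x < $box['x'] + $box['width']
                && $box['x'] < $x + $width
                && $y < $box['y'] + $box['height']
                && $box['y'] < $y + $height;
            if ($overlaps) {
                return false;
            }
        }
        return true;
    }
}
```

A paragraph renderer could ask ``isFree()`` for each candidate line box and
shorten or shift the line when an image already occupies part of that area.
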
Which renderer classes can be separated will become clear during
implementation, but those could for example be:

ezcDocumentPdfParagraphRenderer
    Takes care of rendering the Docbook inline markup inside one paragraph.
    Respects orphans and widows and might be required to split paragraphs.
ezcDocumentPdfTableRenderer
    Renders tables. It might be useful to split this up even further into a
    table row and a cell renderer.

Additional renderer features
----------------------------

If the used driver class implements the respective interfaces, the renderer will
also offer to sign PDF documents, or add write protection (or similar) to the
PDF document.

Example
=======

A full example for the creation of a PDF document from an HTML page could look
like::

    <?php

    $html = new ezcDocumentXhtml();
    $html->loadFile( 'http://ezcomponents.org/introduction' );

    $pdf = new ezcDocumentPdf();
    $pdf->createFromDocbook( $html->getAsDocbook() );

    // Load some custom layout directives
    $pdf->styles->load( 'my_styles.pcss' );
    $pdf->styles['article']['text-columns'] = 3;

    // Set a custom header
    $pdf->header = new ezcDocumentPdfStringPart(
        '%title by %author - %pageNum / %pageCount'
    );

    // Set a custom paragraph renderer
    $pdf->renderer->paragraph = new myPdfParagraphRenderer();

    // Use the hyphenator with a German dictionary
    $pdf->renderer->hyphenator = new myDictionaryHyphenator(
        '/path/to/german.dict'
    );

    // Store the generated PDF
    file_put_contents( 'my.pdf', $pdf );

    ?>

A file containing the layout directives could look like::

    article {
        page-size: "A4";
    }

    paragraph {
        font-family: "Bitstream Vera Sans";
        font-size: "1em";
    }

    article > title {
        font-weight: "bold";
    }

    section title {
        font-weight: "normal";
    }

Classes
=======

The classes implemented for the PDF generation are:

ezcDocumentPdf
    Base class, representing the PDF generation. Aggregates the style
    information, the Docbook source document, the renderer and page parts like
    footer and header.

ezcDocumentPdfSet
    Class aggregating multiple ezcDocumentPdf objects, to create one single
    PDF document from multiple parts, like a cover page, the actual content, a
    bibliography, etc.

ezcDocumentPdfStyles
    Class containing the PDF layout directives. Also implements loading and
    storing of those layout directives.

ezcDocumentPdfPart
    Abstract base class for page parts, like headers and footers. Renders the
    respective part and will be extended by multiple concrete
    implementations, which offer convenient rendering methods.

ezcDocumentPdfRenderer
    Basic renderer class, which aggregates renderers for distinct page
    elements, like paragraphs and tables, and dispatches the rendering to
    them. Also maintains the ezcDocumentPdfPage state object, which contains
    information about already covered parts of the pages.

ezcDocumentPdfParagraphRenderer
    Example of the concrete aspect-specific renderer classes, which only
    implement the rendering of small parts of a document, like single
    paragraphs, tables, or table cell contents.

ezcDocumentPdfPage
    State object describing the current state of a single page in the PDF
    document, like the still available space.

ezcDocumentPdfHyphenator
    Abstract base class for hyphenation implementations for more accurate word
    wrapping.
  276. ezcDocumentPdfHyphenator
  277. Abstract base class for hyphenation implementations for more accurate word
  278. wrapping.