========================
eZ Components - Document
========================

.. contents:: Table of Contents
   :depth: 3

Introduction
============

The document component offers transformations between different semantic markup
languages, like:

- `ReStructured text`__
- `XHTML`__
- `Docbook`__
- `eZ Publish XML markup`__
- Wiki markup languages, like: Creole__, Dokuwiki__ and Confluence__
- `Open Document Text`__ as used by `OpenOffice.org`__ and other office suites

As shown in figure 1, each format supports conversions from and to Docbook
as a central intermediate format and may implement additional shortcuts for
conversions from and to other formats. Not every format can express the same
semantics, so some information may be lost during a conversion, which is
`documented in a dedicated document`__.

.. figure:: img/document-architecture.png
   :alt: Conversion architecture in document component

   Figure 1: Conversion architecture in document component

There are central handler classes for each markup language, which follow the
common conversion interface ezcDocument and all implement the methods
getAsDocbook() and createFromDocbook().

Additionally, the document component can render documents in the following
output formats. These formats cannot be read, only generated:

- PDF

__ http://docutils.sourceforge.net/rst.html
__ http://www.w3.org/TR/xhtml1/
__ http://www.docbook.org/
__ http://doc.ez.no/eZ-Publish/Technical-manual/4.x/Reference/XML-tags
__ http://www.wikicreole.org/
__ http://www.dokuwiki.org/dokuwiki
__ http://confluence.atlassian.com/renderer/notationhelp.action?section=all
__ http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
__ http://www.openoffice.org/
__ Document_conversion.html

Markup languages
================

The following markup languages are currently handled by the document
component.

ReStructured text
-----------------

`ReStructuredText`__ (RST) is a simple text based markup language, intended
to be easy to read and write by humans. Examples can be found in the
`documentation of RST`__.

The transformation of a simple RST document to Docbook can be done like
this:

.. include:: tutorial/00_00_convert_rst.php
   :literal:

In line 3 the document is loaded and parsed into an internal abstract
syntax tree. In line 5 the internal structure is then transformed into a
Docbook document. In the last line the resulting document is returned as a
string, so that you can echo or store it.

__ http://docutils.sourceforge.net/rst.html
__ http://docutils.sourceforge.net/docs/user/rst/quickstart.html

Error handling
^^^^^^^^^^^^^^

By default each parsing or compiling error is transformed into an
exception, so that you are notified about those errors. The error reporting
settings can be modified like for all other document handlers::

    <?php
    $document = new ezcDocumentRst();
    $document->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;
    $document->loadFile( '../tutorial.txt' );
    $docbook = $document->getAsDocbook();
    echo $docbook->save();
    ?>

The setting in line 3 causes only warnings, errors and fatal errors to be
transformed into exceptions, while notices are only collected and otherwise
ignored. This setting affects both the parsing of the source document and the
compiling into the destination language.
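
The errorReporting option takes an ordinary PHP bitmask of E_* constants. The
following sketch is independent of the document component and only
illustrates the behaviour described above; the class ErrorCollector and the
messages are hypothetical. Levels contained in the mask are thrown as
exceptions, all other levels are only collected::

    <?php
    // Hypothetical sketch, not the component's implementation: messages whose
    // error level is contained in the reporting mask are thrown as exceptions,
    // all other messages are only collected.
    class ErrorCollector
    {
        protected $mask;

        public $collected = array();

        public function __construct( $mask )
        {
            $this->mask = $mask;
        }

        public function trigger( $level, $message )
        {
            if ( ( $level & $this->mask ) !== 0 )
            {
                throw new RuntimeException( $message );
            }
            $this->collected[] = $message;
        }
    }

    $collector = new ErrorCollector( E_PARSE | E_ERROR | E_WARNING );

    // E_NOTICE is not part of the mask, so this message is only collected.
    $collector->trigger( E_NOTICE, 'Hypothetical notice' );

    try
    {
        // E_WARNING is part of the mask, so this message becomes an exception.
        $collector->trigger( E_WARNING, 'Hypothetical warning' );
    }
    catch ( RuntimeException $e )
    {
        echo 'Exception: ', $e->getMessage(), "\n";
    }

With the mask from the example above, the notice ends up in the collected
list, while the warning is thrown.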

Directives
^^^^^^^^^^

`RST directives`__ are elements in RST documents with parameters, optional
named options and optional content. The document component implements a well
known subset of the `directives implemented in the docutils RST parser`__. You
may register custom directive handlers, or overwrite existing directive
handlers with your own implementation. A directive in RST markup with
parameters, options and content could look like::

    My document
    ===========

    The custom directive:

    .. my_directive:: parameters
       :option: value

       Some indented text...

For such a directive you should register a handler on the RST document, like::

    <?php
    $document = new ezcDocumentRst();
    $document->registerDirective( 'my_directive', 'myCustomDirective' );
    $document->loadFile( $from );
    $docbook = $document->getAsDocbook();
    $xml = $docbook->save();
    ?>

The class myCustomDirective must extend the class ezcDocumentRstDirective and
implement the method toDocbook(). For rendering you get access to the full AST,
the contents of the current directive and the base path where the document
resides in the file system, which is necessary for accessing external files.

__ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#directives
__ http://docutils.sourceforge.net/docs/ref/rst/directives.html

Directive example
`````````````````

A full example for a custom directive, where we want to embed real world
addresses into our RST document and maintain the semantics in the resulting
Docbook, could look like::

    Address example
    ===============

    .. address:: John Doe
       :street: Some Lane 42

We could add more information, like the ZIP code, city and state, but we
skip this to keep the code short. The implemented directive then just needs
to take this information and transform it into valid Docbook XML using the DOM
extension.

.. include:: tutorial/00_01_address_directive.php
   :literal:

The AST node which should be rendered is passed to the constructor of the
custom directive visitor and is available in the class property $node. The
complete DOMDocument and the current DOMNode are passed to the method. In this
case we just create an `address node`__ with the optional child nodes street and
personname, depending on the existence of the respective values.
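
The DOM part of such a directive can be tried out without the component. In
the following sketch the function buildAddress() is hypothetical: it receives
the directive parameter and options as plain PHP values instead of an AST node,
and it omits the Docbook namespace for brevity::

    <?php
    // Hypothetical standalone helper, not the component's directive API: builds
    // a Docbook <address> element from an already parsed directive parameter
    // (the name) and its options (here only "street").
    function buildAddress( DOMDocument $document, DOMElement $root, $name, array $options )
    {
        $address = $document->createElement( 'address' );
        $root->appendChild( $address );

        // Only create child elements for values which are actually present.
        if ( trim( $name ) !== '' )
        {
            $address->appendChild( $document->createElement( 'personname', trim( $name ) ) );
        }
        if ( isset( $options['street'] ) )
        {
            $address->appendChild( $document->createElement( 'street', trim( $options['street'] ) ) );
        }
        return $address;
    }

    $document = new DOMDocument();
    $section  = $document->appendChild( $document->createElement( 'section' ) );
    buildAddress( $document, $section, 'John Doe', array( 'street' => 'Some Lane 42' ) );

    echo $document->saveXML( $section ), "\n";

This prints the section with an address element containing a personname and a
street element; leaving out the street option results in an address with only
the personname child.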

You can now render the RST document after you registered your custom directive
handler as shown above:

.. include:: tutorial/00_02_custom_directive.php
   :literal:

The output will then look like::

    <?xml version="1.0"?>
    <article xmlns="http://docbook.org/ns/docbook">
      <section id="address_example">
        <sectioninfo/>
        <title>Address example</title>
        <address>
          <personname> John Doe</personname>
          <street> Some Lane 42</street>
        </address>
      </section>
    </article>

__ http://docbook.org/tdg/en/html/address.html

XHTML rendering
^^^^^^^^^^^^^^^

For RST a conversion shortcut has been implemented, so that you don't need to
convert the RST to Docbook and the Docbook to XHTML. This saves conversion time
and avoids information loss during multiple conversions::

    <?php
    $document = new ezcDocumentRst();
    $document->loadFile( $from );
    $xhtml = $document->getAsXhtml();
    $xml = $xhtml->save();
    ?>

The default XHTML compiler generates complete XHTML documents, including the
header and meta data in the header. If you want to inline the result, you may
specify another XHTML compiler, which just creates an XHTML block level element
that can be embedded in your source code::

    <?php
    $document = new ezcDocumentRst();
    $document->options->xhtmlVisitor = 'ezcDocumentRstXhtmlBodyVisitor';
    $document->loadFile( $from );
    $xhtml = $document->getAsXhtml();
    $xml = $xhtml->save();
    ?>

You can of course also use the predefined and custom directives for XHTML
rendering. The directives used during XHTML generation also need to implement
the interface ezcDocumentRstXhtmlDirective.

Modification of XHTML rendering
```````````````````````````````

You can modify the generated output of the XHTML visitor by creating a custom
visitor for the RST AST. The easiest way probably is to extend one of the
existing XHTML visitors and reuse it. For example, you may want to fill the
type attribute in bullet lists, as known from HTML, which isn't valid XHTML,
though::

    class myDocumentRstXhtmlVisitor extends ezcDocumentRstXhtmlVisitor
    {
        protected function visitBulletList( DOMNode $root, ezcDocumentRstNode $node )
        {
            $list = $this->document->createElement( 'ul' );
            $root->appendChild( $list );

            // Map the RST bullet characters to HTML list types
            $listTypes = array(
                '*'            => 'circle',
                '+'            => 'disc',
                '-'            => 'square',
                "\xe2\x80\xa2" => 'disc',
                "\xe2\x80\xa3" => 'circle',
                "\xe2\x81\x83" => 'square',
            );
            // Not allowed in XHTML strict
            $list->setAttribute( 'type', $listTypes[$node->token->content] );

            // Decorate list contents
            foreach ( $node->nodes as $child )
            {
                $this->visitNode( $list, $child );
            }
        }
    }

The structure used in the Docbook and XHTML visitors, although it is not
enforced for visitors in general, is to call a dedicated method for each node
type in the AST and to decorate the AST recursively. The method above is called
for all bullet list nodes in the AST, which contain the actual list items. As
the first parameter, the current position in the XHTML DOM tree is also
provided to the method.

To create the XHTML we can now just create a new list node (<ul>) in the
current DOMNode, set the new attribute, and recursively decorate all
descendants by calling the general visitor dispatching method visitNode() for
all children in the AST. So that the AST children are also rendered as children
in the XML tree, we pass the newly created DOMNode (<ul>) as the new root node
to the visitNode() method.
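
The dispatching pattern itself can be reproduced outside of the component. In
the following self-contained sketch the classes SketchNode and
SketchXhtmlVisitor are hypothetical stand-ins for the real AST and visitor
classes: visitNode() derives the method name from the node type, and each visit
method passes the DOM element it created as the new root for its children::

    <?php
    // Hypothetical, simplified AST node; the component uses its own AST node
    // classes (like ezcDocumentRstNode) instead.
    class SketchNode
    {
        public $type;
        public $content;
        public $nodes;

        public function __construct( $type, $content = '', array $nodes = array() )
        {
            $this->type    = $type;
            $this->content = $content;
            $this->nodes   = $nodes;
        }
    }

    // Hypothetical visitor: visitNode() dispatches to a dedicated visit<Type>()
    // method, and each method passes the DOM element it created as the new root
    // for the children of its AST node.
    class SketchXhtmlVisitor
    {
        public $document;

        public function __construct()
        {
            $this->document = new DOMDocument();
        }

        public function visitNode( DOMNode $root, SketchNode $node )
        {
            $method = 'visit' . $node->type;
            $this->$method( $root, $node );
        }

        protected function visitBulletList( DOMNode $root, SketchNode $node )
        {
            $list = $this->document->createElement( 'ul' );
            $root->appendChild( $list );

            // The new <ul> element becomes the root for all list items.
            foreach ( $node->nodes as $child )
            {
                $this->visitNode( $list, $child );
            }
        }

        protected function visitListItem( DOMNode $root, SketchNode $node )
        {
            $root->appendChild( $this->document->createElement( 'li', $node->content ) );
        }
    }

    $ast = new SketchNode( 'BulletList', '', array(
        new SketchNode( 'ListItem', 'First' ),
        new SketchNode( 'ListItem', 'Second' ),
    ) );

    $visitor = new SketchXhtmlVisitor();
    $visitor->visitNode( $visitor->document, $ast );
    echo $visitor->document->saveXML( $visitor->document->documentElement ), "\n";

Adding support for another node type only requires another visit method; the
dispatcher itself stays unchanged.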

After defining such a class, you can use the custom visitor as shown
above::

    <?php
    $document = new ezcDocumentRst();
    $document->options->xhtmlVisitor = 'myDocumentRstXhtmlVisitor';
    $document->loadFile( $from );
    $xhtml = $document->getAsXhtml();
    $xml = $xhtml->save();
    ?>

Now the lists in the generated XHTML will also have the type attribute set.

Writing RST
^^^^^^^^^^^

Writing an RST document from an existing Docbook document, or from an
ezcDocumentDocbook object generated from some other source, is trivial:

.. include:: tutorial/00_03_write_rst.php
   :literal:

Internally, the ezcDocumentDocbookToRstConverter class is used for the
conversion, and it can also be called directly, like::

    $converter = new ezcDocumentDocbookToRstConverter();
    $rst = $converter->convert( $docbook );

This way you can configure the converter to your wishes, or extend the
converter to handle Docbook elements which are not handled yet. As usual, the
converter is configured using its options property, and the options are defined
in the ezcDocumentDocbookToRstConverterOptions class. There you may configure
the header underlines used, the bullet types or the line wrapping.

Extending RST writing
`````````````````````

As said before, not all existing Docbook elements might already be handled by
the converter. But its handler based mechanism makes it easy to extend or
overwrite existing behaviour.

Similar to the example above, we can convert the <address> Docbook element back
to the address RST directive.

.. include:: tutorial/00_04_address_element.php
   :literal:

The handler classes are assigned to XML elements in some namespace, "docbook"
in this case. The handler is registered in line 18 for the element "address".
The class itself has to extend the ezcDocumentElementVisitorHandler class, which
in this case is already extended by ezcDocumentDocbookToRstBaseHandler, which
provides some convenience methods for RST creation, like renderDirective() used
in this example.

The handler is called whenever the element it has been registered for occurs
in the Docbook XML tree. In this case it has to append the generated RST part
for this element to the RST document, and it may call the general conversion
handler again for its child elements. This example converts the Docbook XML
shown above back to::

    .. _address_example:

    ===============
    Address example
    ===============

    .. address::

       John Doe
       Some Lane 42

For the simplicity of the example, this ignores any special address sub
elements. For more examples on element handlers check the existing
implementations.
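
To show what such a handler has to produce, here is a self-contained sketch of
turning the <address> element into the directive shown above. The function
renderDirectiveSketch() is hypothetical and only loosely modelled on what a
method like renderDirective() has to do; it is not the component's
implementation::

    <?php
    // Hypothetical helper, not the component's renderDirective() implementation.
    function renderDirectiveSketch( $name, $parameter, array $options, $content )
    {
        $rst = '.. ' . $name . '::' . ( $parameter !== '' ? ' ' . $parameter : '' ) . "\n";

        // Options directly follow the directive line, indented by three spaces.
        foreach ( $options as $option => $value )
        {
            $rst .= '   :' . $option . ': ' . $value . "\n";
        }

        // Content is separated by a blank line and indented as well.
        if ( $content !== '' )
        {
            $rst .= "\n";
            foreach ( explode( "\n", $content ) as $line )
            {
                $rst .= '   ' . $line . "\n";
            }
        }
        return $rst;
    }

    // Collect the text of all child elements of a Docbook <address> element,
    // ignoring which sub element it came from, like the example above does.
    $dom = new DOMDocument();
    $dom->loadXML( '<address><personname>John Doe</personname><street>Some Lane 42</street></address>' );

    $lines = array();
    foreach ( $dom->documentElement->childNodes as $child )
    {
        if ( $child instanceof DOMElement )
        {
            $lines[] = trim( $child->textContent );
        }
    }

    echo renderDirectiveSketch( 'address', '', array(), implode( "\n", $lines ) );

Passing the name as parameter and the street as option instead would restore
the original directive form with a :street: option.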

XHTML
-----

Converting XHTML or HTML to a document markup language is a non-trivial task,
because XHTML elements are often used for layout, ignoring the actual semantics
of the element. Therefore the document component allows you to stack a set of
filters, each of which performs a specific conversion task. The default filter
stack may work fine, but you may also want to implement custom filters,
depending on the contents of the filtered website, or to cover additional
sources of meta data, like RDF, Microformats or similar.

The available filters are:

- ezcDocumentXhtmlElementFilter

  This filter maintains the common semantics of XHTML elements by converting
  them to their Docbook equivalents. It ignores common class names. This filter
  is the most basic one and you probably always want to add it to the filter
  stack.

- ezcDocumentXhtmlXpathFilter

  The XPath filter takes an XPath expression to locate the root of the document
  contents. It makes no sense to use this one together with the content locator
  filter. This is a more static, but also more precise way to tell the
  converter where to find the actual contents.

- ezcDocumentXhtmlMetadataFilter

  This filter extracts common meta data from the XHTML head, and converts it
  into Docbook section info elements.

- ezcDocumentXhtmlTablesFilter

  HTML tables are especially often used for layout markup. This filter takes a
  threshold, and if the table text factor drops below this threshold the table
  is ignored. The same is true for stacked tables.

- ezcDocumentXhtmlContentLocatorFilter

  The content locator filter tries to find the actual article in the markup of
  a website, ignoring the surrounding layout markup. This seems to work well,
  for example, for common news sites.
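
The idea of a filter stack can be sketched in plain PHP. The interface
SketchXhtmlFilter, the class SketchXpathFilter and the function applyFilters()
below are hypothetical and not the component's filter API: each filter receives
the output of the previous one, and the XPath based filter reduces the document
to the content root. For brevity the example markup has no XHTML namespace;
real XHTML documents would require registering that namespace with DOMXPath::

    <?php
    // Hypothetical filter interface illustrating a filter stack.
    interface SketchXhtmlFilter
    {
        // Receives the current DOM and returns the (possibly reduced) DOM.
        public function filter( DOMDocument $document );
    }

    // Keeps only the first subtree matched by an XPath expression.
    class SketchXpathFilter implements SketchXhtmlFilter
    {
        protected $expression;

        public function __construct( $expression )
        {
            $this->expression = $expression;
        }

        public function filter( DOMDocument $document )
        {
            $xpath = new DOMXPath( $document );
            $nodes = $xpath->query( $this->expression );
            if ( $nodes->length === 0 )
            {
                // Nothing found: leave the document untouched.
                return $document;
            }
            $result = new DOMDocument();
            $result->appendChild( $result->importNode( $nodes->item( 0 ), true ) );
            return $result;
        }
    }

    // Applies each filter in order, passing the output of one to the next.
    function applyFilters( DOMDocument $document, array $filters )
    {
        foreach ( $filters as $filter )
        {
            $document = $filter->filter( $document );
        }
        return $document;
    }

    $html = new DOMDocument();
    $html->loadXML( '<html><body><ul id="nav"><li>Home</li></ul>'
        . '<div id="content"><p>Actual article</p></div></body></html>' );

    $filtered = applyFilters( $html, array( new SketchXpathFilter( '//div[@id="content"]' ) ) );
    echo $filtered->saveXML( $filtered->documentElement ), "\n";

The navigation list is dropped and only the content <div> remains, which is the
effect the XPath filter is used for in the example below.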

By default only the element and meta data filters are used. So the conversion
of a common website, like the `introduction article`__ from ezcomponents.org,
results in a Docbook document containing all the lists used for navigation,
etc.:

.. include:: tutorial/01_00_read_html.php
   :literal:

So let's additionally use the XPath filter to pass the location of the actual
content to the conversion:

.. include:: tutorial/01_01_read_html_filtered.php
   :literal:

With this additional filter, the contents are correctly located and converted.

__ http://ezcomponents.org/introduction

Writing XHTML
^^^^^^^^^^^^^

Writing XHTML from Docbook is very similar to the approach used for writing
RST: it uses the same handler based mechanism, so you may want to check that
chapter to learn how to extend it for unhandled Docbook elements.

.. include:: tutorial/01_02_write_html.php
   :literal:

As you can see, it works the same way as any other conversion from Docbook
to another format.

HTML styles
^^^^^^^^^^^

By default, inline CSS is embedded in all generated HTML to create a more
appealing default experience. This may of course be deactivated, and you may
also reference custom style sheets to be included in the generated HTML.

.. include:: tutorial/01_03_write_html_styled.php
   :literal:

For this we again use the converter directly, to be able to configure it as we
like.

eZ Xml
------

eZ XML describes the markup format used internally by `eZ Publish`__ for
storing markup in content objects. The format is roughly specified in the `eZ
Publish documentation`__.

Modules often register custom elements, which are not specified anywhere,
so there might be several elements which are not handled by default.

__ http://ez.no/ezpublish
__ http://ez.no/doc/ez_publish/technical_manual/4_0/reference/xml_tags

Reading eZ XML
^^^^^^^^^^^^^^

Reading eZ XML basically works the same as for all other formats:

.. include:: tutorial/02_00_read_ezxml.php
   :literal:

As always, the document object is constructed either from an input string or
from a file. To convert it into Docbook you may just use the method
getAsDocbook().

Link handling
`````````````

Inside eZ XML documents link URIs are replaced with IDs which reference the
links inside the eZ Publish database, to ensure that a changed link is updated
globally. The replacement of such links is handled by a class extending
ezcDocumentEzXmlLinkProvider. By default, dummy URLs are added to the documents.
URLs are referenced either directly by their ID, by a node ID, or by an object
ID. Those parameters are passed to the link provider, which then should return
a URL for them.

.. include:: tutorial/02_01_link_provider.php
   :literal:

The link provider is only implemented as a trivial stub, but you can establish
a database connection there and actually fetch the required data. In this case
the generated Docbook document looks like::

    <?xml version="1.0"?>
    <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
    <article xmlns="http://docbook.org/ns/docbook">
      <section>
        <title>Paragraph</title>
        <para>Some content, with a <ulink url="http://host/path/1">link</ulink>.</para>
      </section>
    </article>

The link provider is again set as an option of the converter. As shown for the
Docbook conversions of the other handlers, you can register element handlers
for eZ XML elements which are not handled yet on the converter, too.
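
A provider that does more than the trivial stub mainly needs one lookup per
reference type. The following class is hypothetical: it does not extend the
real ezcDocumentEzXmlLinkProvider, its method name resolve() is an assumption,
and it uses in-memory arrays with placeholder URLs in place of the eZ Publish
database::

    <?php
    // Hypothetical in-memory resolver; a real provider would extend
    // ezcDocumentEzXmlLinkProvider and query the eZ Publish database instead.
    class SketchLinkResolver
    {
        // Placeholder data, keyed by reference type and ID.
        protected $urls = array(
            'url'    => array( 1 => 'http://host/path/1' ),
            'node'   => array( 2 => 'http://host/node/2' ),
            'object' => array( 42 => 'http://host/object/42' ),
        );

        // $type is one of 'url', 'node' or 'object'; unknown IDs fall back to
        // a placeholder URL instead of failing.
        public function resolve( $type, $id )
        {
            if ( isset( $this->urls[$type][$id] ) )
            {
                return $this->urls[$type][$id];
            }
            return 'http://example.com/' . $type . '/' . $id;
        }
    }

    $resolver = new SketchLinkResolver();
    echo $resolver->resolve( 'url', 1 ), "\n";
    echo $resolver->resolve( 'object', 7 ), "\n";

Replacing the arrays with database queries keeps the calling code unchanged.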

Writing eZ XML
^^^^^^^^^^^^^^

Writing eZ XML works nearly the same as reading. It again uses an XML based
element handler mechanism, as shown in more detail for the Docbook to RST
conversion. For the link conversion an object extending
ezcDocumentEzXmlLinkConverter is used, which returns an array with the
attributes of the link in the eZ XML document.

Wiki markup
-----------

Wiki markup has no central standard; the term is used to describe a common
subset of markup with lots of different extensions. Most wiki markup languages
only support quite trivial markup with severe limitations on the nesting of
markup blocks. For example, hardly any of them supports tables containing
lists, let alone tables containing other tables.

The document component implements a generic parser to support multiple wiki
markup languages. For each markup syntax a tokenizer has to be implemented,
which converts the markup into a unified token stream that can then be handled
by the generic parser.
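
To illustrate what a unified token stream means, the following hypothetical
functions, which are not the component's tokenizer classes, each recognize a
heading in one syntax: Creole ("= Title ="), Dokuwiki ("====== Title ======" for
the top level, with fewer equal signs for lower levels) and Confluence
("h1. Title"). All three emit the same token structure, so one generic parser
could consume any of them::

    <?php
    // Hypothetical heading tokenizers producing one shared token format.
    function tokenizeCreoleHeading( $line )
    {
        // Creole: "= Title =" is level 1, "== Title ==" is level 2, ...
        if ( preg_match( '/^(=+)\s*(.*?)\s*=*\s*$/', $line, $match ) )
        {
            return array( 'type' => 'title', 'level' => strlen( $match[1] ), 'text' => $match[2] );
        }
        return null;
    }

    function tokenizeDokuwikiHeading( $line )
    {
        // Dokuwiki: six "=" mark level 1, five mark level 2, ... two mark level 5.
        if ( preg_match( '/^(={2,6})\s*(.*?)\s*={2,6}\s*$/', $line, $match ) )
        {
            return array( 'type' => 'title', 'level' => 7 - strlen( $match[1] ), 'text' => $match[2] );
        }
        return null;
    }

    function tokenizeConfluenceHeading( $line )
    {
        // Confluence: "h1. Title", "h2. Title", ...
        if ( preg_match( '/^h([1-6])\.\s+(.*)$/', $line, $match ) )
        {
            return array( 'type' => 'title', 'level' => (int) $match[1], 'text' => $match[2] );
        }
        return null;
    }

    // The same second level heading in all three syntaxes yields equal tokens.
    var_dump( tokenizeCreoleHeading( '== Usage ==' ) === tokenizeDokuwikiHeading( '===== Usage =====' ) );
    var_dump( tokenizeDokuwikiHeading( '===== Usage =====' ) === tokenizeConfluenceHeading( 'h2. Usage' ) );

Both comparisons print bool(true).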

The document component currently supports reading three wiki markup languages,
and new ones can be added easily by implementing another tokenizer. Supported
are:

- Creole__, developed by an initiative with the intention to create a unified
  wiki markup standard. This is the default wiki language, and currently the
  only one which can be written.

  Creole currently only supports a very limited set of markup__; all further
  markup additions are still under discussion.

- Dokuwiki__ is a popular wiki system, used for example on `wiki.php.net`__,
  with a quite different syntax and the most complete markup support, even
  including something like footnotes.

- Confluence__ is a common Java based wiki with an entirely different and quite
  uncommon syntax, which has mainly been implemented to prove the generic
  nature of the parser.

All markup languages are tested against all examples from the respective
markup language documentation, but there might still be cases where the
parsers of the original implementations behave slightly differently from the
implementation in the document component.

__ http://www.wikicreole.org/
__ http://www.wikicreole.org/wiki/Elements
__ http://www.dokuwiki.org/dokuwiki
__ http://wiki.php.net/
__ http://confluence.atlassian.com/renderer/notationhelp.action?section=all

Reading wiki markup
^^^^^^^^^^^^^^^^^^^

Reading wiki texts basically works like for any other markup language:

.. include:: tutorial/03_00_read_wiki.php
   :literal:

As mentioned, the Creole tokenizer is used by default. The same result can be
produced from Confluence markup by switching the tokenizer:

.. include:: tutorial/03_01_read_wiki_confluence.php
   :literal:

Writing wiki markup
^^^^^^^^^^^^^^^^^^^

Currently only writing Creole wiki markup is supported. Since Creole does not
support a lot of the markup available in Docbook, not all documents can be
converted properly. Because it does not even support explicit internal
references, footnotes cannot be simulated like in HTML.

If you want to add support for such conversions, it works exactly like the
Docbook to RST conversion and can be extended the same way.

.. include:: tutorial/03_02_write_wiki.php
   :literal:

PDF
---

PDF (Portable Document Format) has been developed to provide a document
format which can be presented independently of software and system. Because of
this it is often used as a pre-print document exchange format.

The document component can generate PDF documents from all other input formats
and offers a language very similar to CSS to apply custom styling to the
generated output. Additionally, it supports adding custom parts, like footers
and headers, to the PDF document.

Reading PDF
^^^^^^^^^^^

The document component currently does not support reading PDF documents.

Writing PDF
^^^^^^^^^^^

Writing PDF basically works like writing any other format supported by the
document component, as the basic example shows:

.. include:: tutorial/04_01_create_pdf.php
   :literal:

First we load an RST file and create a Docbook document from it, because, as
described before, Docbook is the central conversion format.

Afterwards the Docbook document is loaded by the PDF class and saved. When the
document is converted to a string, the PDF is rendered using the default
options and the default driver. The result of this rendering call can be
viewed here: `04_01_create_pdf.pdf`__.

__ 04_01_create_pdf.pdf

Output writers
``````````````

Since there are numerous PDF renderers in the PHP world, and the available ones
might depend on the current environment, the document component supports
different PDF drivers as wrappers around different existing libraries. For now
two implementations exist, for pecl/haru and for TCPDF, but it is fairly easy
to write another one for a different PDF library.

Haru
""""

libharu__ is an open source PDF generation library, written in C and wrapped
by the haru PHP extension, available from PECL__. If PEAR is correctly set up
on your machine, it should install as easily as::

    pear install pecl/haru

The Haru driver is pretty fast, but currently has issues with some special
characters. It is the default driver, but can be selected explicitly by setting
the driver option on the PDF class, like::

    $pdf = new ezcDocumentPdf();
    $pdf->options->driver = new ezcDocumentPdfHaruDriver();

__ http://libharu.org
__ http://pecl.php.net/package/haru

TCPDF
"""""

TCPDF is a pure PHP based PDF generation library, available from
`tcpdf.org`__. To use the TCPDF driver you need to download and include its
main class before rendering the PDF. It supports all aspects of PDF rendering
required by the document component, but has some bad coding practices, like:

- It throws lots of warnings and notices, which you might want to silence by
  temporarily changing the error reporting level.
- It reads and writes several global variables, which might or might not
  interfere with your application code.
- It uses eval() in several places, which results in non-cacheable opcodes.
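
Temporarily silencing warnings and notices can be wrapped in a small helper, so
that the previous level is restored even if rendering throws. The helper
withQuietErrors() below is hypothetical and not part of the document component;
it requires PHP 5.5 or later because of the finally block::

    <?php
    // Hypothetical helper: runs a callback with notices and warnings removed
    // from the error reporting level, and always restores the previous level.
    function withQuietErrors( $callback )
    {
        $previous = error_reporting( error_reporting() & ~( E_NOTICE | E_WARNING ) );
        try
        {
            return call_user_func( $callback );
        }
        finally
        {
            error_reporting( $previous );
        }
    }

    // Stand-in callback which only inspects the reporting level; in real code
    // it would render the PDF, for example by converting the ezcDocumentPdf
    // object configured with the TCPDF driver to a string.
    $result = withQuietErrors( function ()
    {
        return error_reporting() & E_NOTICE;
    } );

    var_dump( $result );

This prints int(0), because notices are excluded while the callback runs.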

The TCPDF driver can be used after including the TCPDF source code, using::

    $pdf = new ezcDocumentPdf();
    $pdf->options->driver = new ezcDocumentPdfTcpdfDriver();

__ http://tcpdf.org

Styling the PDF
```````````````

The PDF output can be styled using a CSS-like language, which assigns styles
based on the Docbook XML structure. The default styling rules are defined in
the `default.css`__.

__ https://svn.apache.org/repos/asf/incubator/zetacomponents/trunk/Document/src/pcss/style/default.css

The first and most relevant part are the general layout options, which can be
defined for the common article root node in the Docbook XML file. You can set
global font options there, like::

    article {
        // Basic font style definitions
        font-size: "12pt";
        font-family: "serif";
        font-weight: "normal";
        font-style: "normal";
        line-height: "1.4";
        text-align: "left";

        // Basic page layout definitions
        text-columns: "1";
        text-column-spacing: "10mm";

        // General text layout options
        orphans: "3";
        widows: "3";
    }

The meaning of the first set of options should be obvious from CSS. We require
each value to be wrapped in quotes for easier parsing, though.

The second set of options defines options for multi-column layouts, which are
not available on the web, but quite common in generated PDF documents. You can
specify the number of text columns, as well as the distance between the text
columns here.

The third set in this example defines lesser known text layout options, like
the handling of `orphans and widows`__, which control how many lines of a
paragraph must at least remain on either page when the paragraph is wrapped
across a page break.
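
A simplified sketch of such a rule follows; the function linesOnCurrentPage()
is hypothetical and not the component's renderer. Given the number of lines of
a paragraph and the lines still free on the current page, it decides how many
lines stay on the current page, using the orphans and widows values of 3 from
the example above::

    <?php
    // Hypothetical orphans/widows rule for a single page break: returns how many
    // lines of the paragraph stay on the current page; the rest moves on.
    function linesOnCurrentPage( $lines, $available, $orphans = 3, $widows = 3 )
    {
        if ( $lines <= $available )
        {
            // The paragraph fits completely.
            return $lines;
        }
        $split = $available;

        // Leave at least $widows lines for the top of the next page.
        if ( $lines - $split < $widows )
        {
            $split = $lines - $widows;
        }

        // Keep at least $orphans lines at the bottom of this page, otherwise
        // move the whole paragraph to the next page.
        return $split >= $orphans ? $split : 0;
    }

    echo linesOnCurrentPage( 10, 12 ), "\n";
    echo linesOnCurrentPage( 10, 8 ), "\n";
    echo linesOnCurrentPage( 10, 2 ), "\n";

The calls print 10 (the paragraph fits), 7 (one line is pulled over so that
three lines start the next page) and 0 (two lines alone would be orphans).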

You can, of course, apply those styles to any element in your document, using
the common CSS addressing rules, like::

    // Emphasis node anywhere in the document
    emphasis { ... }

    // Title element directly below a section element
    section > title { ... }

    // Title element anywhere below a section element
    section title { ... }

    // Title element with the ID "first_title"
    title#first_title { ... }

    // Title element with the class "foo"
    title.foo { ... }

    // Emphasis node directly below a title with class "foo", anywhere in a
    // section with the ID "first"
    section#first title.foo > emphasis { ... }

The values and `measures`__ for the properties are very similar to the
properties in CSS. For example, the margin and padding properties accept one- to
four-tuples of values, with the same respective meaning as in CSS.

Another central formatting element, which is special to the PDF generation, is
the virtual element "page"::

    page {
        page-size: "A4";
        page-orientation: "portrait";
        padding: "22mm 16mm";
    }

The page-size property accepts several known page size identifiers, and the
page-orientation property defines the orientation of a page. You can also
address any page directly by its ID, which will be "page_1" for the first page,
or by its class, which will be "right" or "left", depending on the current page
number.

A detailed description of all available `PDF style options`__ is available in
a separate document.

__ http://en.wikipedia.org/wiki/Widows_and_orphans
__ Measures_
__ Document_styles.html
Measures
""""""""

The properties in the PDF component accept different measures, which are:

- "mm", Millimeters, the default measure, if none is specified
- "pt", Points, 72 points per inch
- "px", Pixel, depends on the set resolution, by default 72 pixels per inch,
  so that one pixel equals one point
- "in", Inch

Points are most common for font sizes, while millimeters or inches are
probably more useful for page paddings. You are free to choose any of them and
can even combine different units in one tuple, like::

    para {
        // Top margin: 12 mm; right margin: .1 inch; bottom margin: 10 points;
        // left margin: 1 pixel
        margin: "12 .1in 10pt 1px";
    }

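The relations between the units can be written out as a small conversion sketch: 1 inch is 25.4 mm, a point is 1/72 inch, and a pixel is 1/72 inch at the default resolution. The ``toMillimeters()`` function and its ``$pixelsPerInch`` parameter are hypothetical and chosen for illustration. They are not the component's own measure handling.

```php
<?php
/**
 * Hypothetical helper: convert one measure such as "10pt", ".1in", "1px"
 * or "12" (no unit, meaning millimeters) into millimeters.
 * $pixelsPerInch stands in for the configurable resolution (default 72).
 */
function toMillimeters(string $measure, float $pixelsPerInch = 72.0): float
{
    // A number (e.g. "12", "1.5", ".1") followed by an optional unit.
    if (!preg_match('/^\s*(\d+(?:\.\d*)?|\.\d+)\s*(mm|pt|px|in)?\s*$/', $measure, $m)) {
        throw new InvalidArgumentException("Invalid measure: '$measure'");
    }
    $value = (float) $m[1];
    $unit = $m[2] ?? 'mm'; // no unit given: millimeters

    switch ($unit) {
        case 'in':
            return $value * 25.4;                  // 1 in = 25.4 mm
        case 'pt':
            return $value * 25.4 / 72;             // 72 pt per inch
        case 'px':
            return $value * 25.4 / $pixelsPerInch; // depends on resolution
        default:
            return $value;                         // already millimeters
    }
}

// The mixed-unit margin tuple from the example above, in millimeters.
$margins = array_map('toMillimeters', preg_split('/\s+/', '12 .1in 10pt 1px'));
printf("%.3f %.3f %.3f %.3f\n", ...$margins); // prints: 12.000 2.540 3.528 0.353
```
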
PDF parts
`````````

PDF parts are additional parts of a rendered document, like headers and
footers. You can implement and register them yourself, and they are activated
by different triggers, like:

- on document creation
- on page creation
- when a document has been finished

The default implementation for headers and footers is triggered on page
creation and renders the title of the document, its author and a page number
in the header or the footer. To develop a custom PDF part, extend the
ezcDocumentPdfPart class.

For the following document we use a set of custom styles, as well as a header
and a footer, to customize the rendered PDF document. The additional custom
CSS changes the default font and the page border:

.. include:: tutorial/custom.css
   :literal:

The code using the custom CSS and the headers and footers then looks like
this:

.. include:: tutorial/04_02_create_pdf_styled.php
   :literal:

The first part, creating a Docbook document from an RST document, is the same
as in the first example.

Afterwards we load the above-mentioned custom.css as an additional style. You
can load as many styles as you want. If multiple styles are loaded, styles
loaded later redefine the matching parts of the styles loaded before them.

After that, two custom PDF parts are registered, each configured through its
respective option class. The footer should only show the page number, while
the header should display everything except the page number, that is, the
title and the author.

At the end of the example the document is created as usual, and it looks like
this: `04_02_create_pdf_styled.pdf`__. Since the source document does not
include any author information, no author is rendered in the header.

__ 04_02_create_pdf_styled.pdf

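The way later styles override earlier ones can be pictured with a simple model in which each style set maps selectors to property arrays. The hypothetical ``mergeStyleSets()`` below models load order only. For the same selector and property, the later set wins, while properties it does not mention survive from earlier sets. It is not the component's implementation, and it ignores how selectors of different specificity are weighed against each other. The property values are illustrative.

```php
<?php
/**
 * Hypothetical model of loading several style sets in order. Each set maps
 * a selector to its properties; later sets override earlier ones property
 * by property, so a redefinition can be partial.
 */
function mergeStyleSets(array ...$styleSets): array
{
    $merged = [];
    foreach ($styleSets as $styleSet) {
        foreach ($styleSet as $selector => $properties) {
            // With string keys, array_merge() lets the later values win.
            $merged[$selector] = array_merge($merged[$selector] ?? [], $properties);
        }
    }
    return $merged;
}

// Illustrative values: default styles first, then a custom stylesheet.
$defaultStyles = [
    'page' => ['page-size' => 'A4', 'padding' => '22mm 16mm'],
];
$customStyles = [
    'page' => ['padding' => '10mm'],
    'para' => ['font-size' => '10pt'],
];
print_r(mergeStyleSets($defaultStyles, $customStyles));
```

In the result, the page keeps its A4 size from the first set but takes the padding from the second.
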
Hyphenating
```````````

Proper hyphenation is crucial for good text rendering, especially for
justified paragraphs. Since hyphenation is highly language dependent, you can
create and use your own custom hyphenator. The default hyphenator does not
hyphenate at all; it keeps every word as it is.

Custom hyphenators can be implemented by extending the abstract class
ezcDocumentPdfHyphenator. You only need to implement one method,
``splitWord()``, which should return the possible split points of the given
word, as documented in the ezcDocumentPdfHyphenator class.

The custom hyphenator can be configured in the ezcDocumentPdfOptions class,
like this::

    <?php
    $pdf = new ezcDocumentPdf();
    // myHyphenator is your own subclass of ezcDocumentPdfHyphenator
    $pdf->options->hyphenator = new myHyphenator();
    ?>

The hyphenator will then be used by all text renderers during the rendering
process.

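The logic inside a ``splitWord()`` implementation can be sketched independently of the class. The hypothetical function below uses a simple dictionary-based approach: it looks the word up in a table of pre-hyphenated words and turns each syllable boundary into a split point. The return shape is an assumption made for this sketch: a list of pairs of the first part (with a trailing hyphen) and the remainder, with the longest first part first. Check the ezcDocumentPdfHyphenator class documentation for the exact format the renderer expects.

```php
<?php
/**
 * Hypothetical dictionary-based splitting logic that a custom splitWord()
 * implementation could use. Assumed return shape: a list of
 * [first part with trailing hyphen, remainder] pairs, longest first part first.
 */
function splitWordByDictionary(string $word, array $dictionary): array
{
    $key = strtolower($word);
    if (!isset($dictionary[$key])) {
        return []; // unknown word: no split points, the word stays whole
    }

    $syllables = explode('-', $dictionary[$key]);
    $splits = [];
    $offset = 0;
    // Every boundary between two syllables is a possible split point.
    for ($i = 0; $i < count($syllables) - 1; ++$i) {
        $offset += strlen($syllables[$i]);
        // Cut the original word, so its capitalization is preserved.
        $splits[] = [substr($word, 0, $offset) . '-', substr($word, $offset)];
    }

    // Offer the split that keeps the most text on the current line first.
    return array_reverse($splits);
}

$dictionary = ['hyphenation' => 'hy-phen-ation'];
print_r(splitWordByDictionary('Hyphenation', $dictionary));
```

A ``myHyphenator`` class could delegate to such a function from its ``splitWord()`` method, using a dictionary for the document's language. Words missing from the table stay unsplit, just as with the default hyphenator.
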
Open Document Text
------------------

The Open Document Text (ODT) format is the native format of the
`OpenOffice.org`__ office application suite and is supported by other common
word processing tools. The Document component supports importing, exporting
and styling ODT files.

__ http://www.openoffice.org/

.. note:: Currently, only import and export of flat ODT (.fodt) files is
   supported. OpenOffice.org can process these files natively; to store a
   document as FODT, simply choose that file type in the save dialog.

Reading ODT
^^^^^^^^^^^

The ODT document class reads FODT files and converts them into the internal
Docbook representation of the Document component:

.. include:: tutorial/05_00_read_fodt.php
   :literal:

From the Docbook representation, you can generate any of the supported
document formats.

FODT files may contain embedded media files, usually images, which are
extracted during the import process. You can specify the directory where these
images are stored through the ``imageDir`` option::

    <?php
    $odt->options->imageDir = '/path/to/your/images';
    ?>

The default is your system's temporary directory.

Since Open Document contains little semantic information compared to Docbook,
the import mechanism uses heuristics to detect information like emphasized
text. This mechanism is still rudimentary and will be made available as a
public API once it has matured.

Writing ODT
^^^^^^^^^^^

FODT files can be written in the same way as any of the other formats
supported by the Document component:

.. include:: tutorial/05_01_write_fodt.php
   :literal:

Styling ODT
^^^^^^^^^^^

FODT output can be styled with a CSS-like language, in the same way as
described in `Styling the PDF`_. Using simplified CSS, you assign style rules
to Docbook XML elements, which are converted into automatic styles in the
resulting Open Document. The default styling rules (`default.css`__) are the
same as for PDF.

__ https://svn.apache.org/repos/asf/incubator/zetacomponents/trunk/Document/src/pcss/style/default.css

Applying custom styles can be done as follows:

.. include:: tutorial/05_02_write_fodt_styled.php
   :literal:

A detailed description of the available `style options`__ is available
`here`__.

__ Document_styles.html
__ Document_styles.html

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79