- ==================================
- Design document for PDF generation
- ==================================
- :Author: kn
- This is the design document for the PDF generation in the eZ Components document
- component. PDF documents should be created from Docbook documents, which can
- be generated from each markup available in the document component.
- The requirements which this design has to meet are specified in
- the pdf_requirements.txt document.
- Layout directives
- =================
- The PDF document will be created from the Docbook document object. It will
- already contain basic formatting rules for meaningful default document layout.
- Additional rules may be passed to the document in various ways.
- Generally, a CSS-like approach will be used to encode layout information. This
- allows both easily readable addressing of nodes in an XML tree, as already
- known from CSS, and human-readable formatting options.
- A limited subset of CSS will be used for now for addressing elements inside the
- Docbook XML tree. The grammar for those rules will be::
- Address ::= Element ( Rule )*
- Rule ::= '>'? Element
- Element ::= ElementName ( '.' ClassName | '#' ElementId )?
- ClassName ::= [A-Za-z_-]+
- ElementName ::= XMLName* | '*'
- ElementId ::= XMLName
- * XMLName refers to http://www.w3.org/TR/REC-xml/#NT-Name
- The semantics of this simple subset of addressing directives are the same as in
- CSS. A second-level title could, for example, be addressed by::
- section title
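- The grammar above can be turned into a small tokenizer. The following is a
- minimal sketch under stated assumptions, not the component's implementation:
- the function name parseAddress() and the returned array layout are
- hypothetical, the class or id suffix is treated as optional, and XMLName is
- approximated by an ASCII-only pattern that is narrower than the XML
- specification:

```php
<?php
/**
 * Hypothetical helper: split an address such as "article > section title"
 * into a list of steps. Each step records the element name, an optional
 * class or id, and whether it must be a direct child ('>') of the
 * previous step.
 */
function parseAddress( string $address ): array
{
    // Assumption: ASCII-only approximation of XMLName, without '.' so that
    // "para.note" is read as element "para" with class "note".
    $name  = '[A-Za-z_][A-Za-z0-9_-]*';
    $regex = '~\G\s*(?P<child>>)?\s*(?P<element>\*|' . $name . ')'
           . '(?:\.(?P<class>[A-Za-z_-]+)|#(?P<id>' . $name . '))?(?=[\s>]|$)~';

    $steps  = array();
    $offset = 0;
    $length = strlen( rtrim( $address ) );
    while ( $offset < $length )
    {
        if ( !preg_match( $regex, $address, $match, 0, $offset ) )
        {
            throw new InvalidArgumentException(
                "Invalid address at offset $offset: $address"
            );
        }

        $steps[] = array(
            'element' => $match['element'],
            'class'   => ( $match['class'] ?? '' ) !== '' ? $match['class'] : null,
            'id'      => ( $match['id'] ?? '' ) !== '' ? $match['id'] : null,
            'child'   => ( $match['child'] ?? '' ) === '>',
        );
        $offset += strlen( $match[0] );
    }

    // Address ::= Element ( Rule )*, so the first step may not start with '>'.
    if ( $steps === array() || $steps[0]['child'] )
    {
        throw new InvalidArgumentException( "Invalid address: $address" );
    }
    return $steps;
}

$steps = parseAddress( 'article > section title' );
```

- Here $steps holds three entries; only the second one ("section") is marked
- as a direct child, because it is preceded by '>'.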
- The formatting options are also mostly the same as in CSS, but again only
- using a subset of the definitions available in CSS and with some additional
- formatting options, relevant especially for PDF rendering. Which formatting
- options are supported depends on the renderer; unknown formatting options may
- raise errors or warnings.
- The PDF document wrapper class will implement Iterator and ArrayAccess to
- access the layout directives, as the following example shows::
- $pdf = new ezcDocumentPdf();
- $pdf->createFromDocbook( $docbook );
- $pdf->styles['article > section title']['font-size'] = '1.6em';
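- How such nested array access can work is sketched below. This is only an
- illustration under assumptions: the class name StyleCollectionSketch is
- hypothetical, and IteratorAggregate is used in place of Iterator to keep the
- example short. The point it shows is that offsetGet() has to return an object
- (here an ArrayObject), otherwise the nested write would not reach the stored
- data:

```php
<?php
/**
 * Hypothetical stand-in for the styles container. offsetGet() hands out an
 * ArrayObject per address, so that nested writes such as
 * $styles['section title']['font-size'] = '1.6em' modify the stored options.
 */
final class StyleCollectionSketch implements ArrayAccess, IteratorAggregate
{
    /** @var ArrayObject[] Formatting options, keyed by address. */
    private $directives = array();

    public function offsetExists( $address ): bool
    {
        return isset( $this->directives[$address] );
    }

    public function offsetGet( $address ): ArrayObject
    {
        // Create the directive on first access, so it can be written to.
        if ( !isset( $this->directives[$address] ) )
        {
            $this->directives[$address] = new ArrayObject();
        }
        return $this->directives[$address];
    }

    public function offsetSet( $address, $formats ): void
    {
        if ( $address === null )
        {
            throw new InvalidArgumentException( 'An address is required.' );
        }
        $this->directives[$address] = new ArrayObject( (array) $formats );
    }

    public function offsetUnset( $address ): void
    {
        unset( $this->directives[$address] );
    }

    public function getIterator(): Iterator
    {
        return new ArrayIterator( $this->directives );
    }
}

$styles = new StyleCollectionSketch();
$styles['article > section title']['font-size'] = '1.6em';

foreach ( $styles as $address => $formats )
{
    // Prints: article > section title => 1.6em
    echo $address, ' => ', $formats['font-size'], "\n";
}
```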
- Directives which are specified later will always overwrite earlier directives,
- for each formatting option specified in the later directive. Unlike in CSS,
- the overwriting of formatting options will NOT depend on the complexity
- (specificity) of the node addressing.
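- The overwrite rule can be illustrated with a short sketch. The function name
- resolveFormatting() and the 'address' / 'formats' array keys are assumptions
- made for this example; the input is the list of directives matching one
- element, in the order in which they were defined:

```php
<?php
/**
 * Hypothetical helper: compute the effective formatting options for one
 * element from the directives that match it. Later directives win per
 * option, regardless of how specific their address is.
 */
function resolveFormatting( array $matchingDirectives ): array
{
    $effective = array();
    foreach ( $matchingDirectives as $directive )
    {
        // array_merge() replaces only the options named in the later
        // directive and keeps all others.
        $effective = array_merge( $effective, $directive['formats'] );
    }
    return $effective;
}

$formats = resolveFormatting( array(
    array(
        'address' => 'article > section title',
        'formats' => array( 'font-size' => '1.6em', 'font-weight' => 'bold' ),
    ),
    array(
        'address' => 'title',
        'formats' => array( 'font-weight' => 'normal' ),
    ),
) );
```

- In this example the later, less specific "title" directive still overwrites
- font-weight, while font-size from the earlier directive is kept.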
- Importing and exporting layout directives
- -----------------------------------------
- The layout directives can be exported and imported to and from files, so that
- users of the component may store a custom PDF layout. The storage format will
- again look very much like a simplified variant of CSS::
- File ::= Directive+
- Directive ::= Address '{' Formatting* '}'
- Formatting ::= Name ':' '"' Value '"' ';'
- Name ::= [A-Za-z-]+
- Value ::= [^"]+
- C-style comments are allowed anywhere in the definition file, like
- ``/* ... */`` and ``// ...``.
- Importing and exporting styles may be accomplished by::
- $pdf->styles->load( 'styles.pcss' );
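- A reader for this storage format could look roughly like the following
- sketch. It is not the component's loader: the function name
- parseStyleDefinitions() and the returned structure are assumptions, malformed
- input is silently skipped instead of reported, and braces inside quoted
- values are not supported:

```php
<?php
/**
 * Hypothetical parser for the simplified CSS-like format. Returns the
 * directives in file order, because the order decides which one wins.
 */
function parseStyleDefinitions( string $source ): array
{
    // Remove /* ... */ and // ... comments, but leave quoted values alone,
    // so that a value such as "1//2" survives.
    $source = preg_replace_callback(
        '~"[^"]*"|/\*.*?\*/|//[^\n]*~s',
        function ( array $match )
        {
            return $match[0][0] === '"' ? $match[0] : '';
        },
        $source
    );

    $directives = array();
    // Directive ::= Address '{' Formatting* '}'
    preg_match_all( '~([^{}]+)\{([^{}]*)\}~', $source, $blocks, PREG_SET_ORDER );
    foreach ( $blocks as $block )
    {
        $formats = array();
        // Formatting ::= Name ':' '"' Value '"' ';'
        preg_match_all(
            '~([A-Za-z-]+)\s*:\s*"([^"]+)"\s*;~',
            $block[2],
            $pairs,
            PREG_SET_ORDER
        );
        foreach ( $pairs as $pair )
        {
            $formats[$pair[1]] = $pair[2];
        }

        $directives[] = array(
            'address' => trim( $block[1] ),
            'formats' => $formats,
        );
    }
    return $directives;
}

$definitions = <<<'PCSS'
/* page setup */
article {
    page-size: "A4";
}
// headings
section title {
    font-weight: "normal";
}
PCSS;

$directives = parseStyleDefinitions( $definitions );
```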
- List of formatting options
- --------------------------
- Some formatting options will be processed just as they are defined in CSS,
- and there will be some custom options. The options reused from CSS are:
- - background-color
- - background-image
- - background-position
- - background-repeat
- - border-color
- - border-width
- - border-bottom-color
- - border-bottom-width
- - border-left-color
- - border-left-width
- - border-right-color
- - border-right-width
- - border-top-color
- - border-top-width
- - color
- - direction
- - font-family
- - font-size
- - font-style
- - font-variant
- - font-weight
- - line-height
- - list-style
- - list-style-position
- - list-style-type
- - margin
- - margin-bottom
- - margin-left
- - margin-right
- - margin-top
- - orphans
- - padding
- - padding-bottom
- - padding-left
- - padding-right
- - padding-top
- - page-break-after
- - page-break-before
- - text-align
- - text-decoration
- - text-indent
- - white-space
- - widows
- - word-spacing
- Custom properties are:
- text-columns
- Number of text columns in one section.
- text-column-spacing
- The margin between multiple text columns on one page
- page-size
- Size of pages
- page-orientation
- Orientation of pages
- Not all options can be applied to each element. The renderer might complain about
- invalid options, depending on the configured error level.
- Special layout elements
- =======================
- Footers & Headers
- -----------------
- Footers and headers are special layout elements, which can be rendered
- manually by the user of the component. They can be considered as small
- sub-documents, but their renderer receives additional information about the
- current page they are rendered on.
- They can be set like::
- $pdf = new ezcDocumentPdf();
- $pdf->createFromDocbook( $docbook );
- $pdf->footer = new myDocumentPdfPart();
- Each of those parts can render itself and calculate its own bounding box.
- There might be extensions of the basic ezcDocumentPdfPart class, which render
- small Docbook sub-documents into a header or footer, or just take a string and
- replace placeholders with page-dependent contents.
- Possible implementations would be:
- ezcDocumentPdfDocbookPart
- Receives a Docbook document and renders it using a defined style at the
- header or footer of the current page. Placeholders in the text,
- represented by, for example, entities might be replaced.
- ezcDocumentPdfStringPart
- Receives a simple string, in which simple placeholders are replaced.
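- The placeholder replacement of such a string part could be as simple as the
- following sketch. The function name fillPlaceholders() is hypothetical; the
- %name placeholder syntax follows the header string used in the full example
- later in this document:

```php
<?php
/**
 * Hypothetical helper: replace %name placeholders in a header or footer
 * template with page-dependent values.
 */
function fillPlaceholders( string $template, array $values ): string
{
    $map = array();
    foreach ( $values as $name => $value )
    {
        $map['%' . $name] = (string) $value;
    }

    if ( $map === array() )
    {
        return $template;
    }

    // strtr() tries the longest keys first, so a placeholder that is a
    // prefix of another one (%page vs. %pageCount) does not clash.
    return strtr( $template, $map );
}

$header = fillPlaceholders(
    '%title by %author - %pageNum / %pageCount',
    array(
        'title'     => 'Introduction',
        'author'    => 'kn',
        'pageNum'   => 3,
        'pageCount' => 12,
    )
);
// $header is now "Introduction by kn - 3 / 12"
```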
- Other elements
- --------------
- There are various possible full-page elements, which might be rendered before or
- after the actual contents. Those are, for example:
- - Cover page
- - Bibliography
- - Back page
- To add those to one PDF document you can create a PDF set, which is then rendered
- into one file::
- $pdf = new ezcDocumentPdf();
- $pdf->createFromDocbook( $docbook );
- $set = new ezcDocumentPdfSet();
- $set->parts = array(
- new ezcDocumentPdfPdfPart( 'title.pdf' ),
- $customTableOfContents,
- $pdf,
- $bibliography,
- );
- $set->render( 'my.pdf' );
- Some of the documents aggregated in one set can of course again be documents
- created from Docbook documents. Each element in the set may contain custom
- layout directives.
- For the inclusion of other document parts into a PdfSet you are expected to
- extend the PDF base class and implement your custom functionality there.
- This could mean generating indexes or a bibliography from the content.
- Drivers
- =======
- The actual PDF renderer calls methods on the driver, which abstract the quirks
- of the respective implementations. There will be drivers for at least:
- - pecl/libharu
- - TCPDF
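- The separation between renderer and driver can be sketched as follows. The
- interface and its method names are purely illustrative assumptions, not the
- planned driver API. Instead of wrapping pecl/libharu or TCPDF, the example
- uses a driver that only records the calls it receives, which keeps the sketch
- self-contained:

```php
<?php
/** Hypothetical driver contract; the method names are illustrative only. */
interface PdfDriverSketch
{
    public function createPage( float $width, float $height ): void;

    public function drawText( string $text, float $x, float $y, float $size ): void;
}

/** Stand-in driver which records calls instead of producing PDF output. */
final class RecordingDriver implements PdfDriverSketch
{
    /** @var string[] Log of the calls received so far. */
    public $calls = array();

    public function createPage( float $width, float $height ): void
    {
        $this->calls[] = sprintf( 'page %.0fx%.0f', $width, $height );
    }

    public function drawText( string $text, float $x, float $y, float $size ): void
    {
        $this->calls[] = sprintf(
            'text "%s" at %.0f,%.0f size %.0f', $text, $x, $y, $size
        );
    }
}

/** Renderer-side code depends on the interface only, not on a library. */
function renderTitle( PdfDriverSketch $driver, string $title ): void
{
    $driver->createPage( 210, 297 );
    $driver->drawText( $title, 20, 30, 16 );
}

$driver = new RecordingDriver();
renderTitle( $driver, 'Introduction' );
```

- A driver for a concrete library would implement the same interface and
- translate each call into the respective library calls.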
- Renderer
- ========
- The renderer will be responsible for the actual typesetting. It will receive a
- Docbook document, apply the given layout directives and calculate the
- appropriate calls to the driver from those.
- The renderer optionally receives a set of helper objects, which perform relevant
- parts of the typesetting, like:
- Hyphenator
- Class implementing hyphenation for a specific language. We might provide a
- default implementation, which reads standard hyphenation files.
- The renderer state will be shared using an object representing the page
- currently processed, which contains information about the already covered
- areas and therefore the still available space.
- Using such a state object, the state can easily be shared between different
- renderers for different aspects of the rendering process. This should allow us
- to write simpler rendering classes, which should be more maintainable than
- one big renderer class whose methods would take care of all aspects.
- This page state object, knowing about the free space on the current page, for
- example allows text to float around images spanning multiple paragraphs,
- because the already covered space is recorded. All renderers for the
- different aspects can reuse this knowledge and base their rendering on it.
- The space already covered on a page will most probably be represented by
- a list of bounding boxes.
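- A page state built on a list of bounding boxes could look like the following
- sketch. The class name PageStateSketch and its methods are assumptions for
- illustration and not the interface of ezcDocumentPdfPage; units are arbitrary
- and the origin is assumed to be the top left corner of the page:

```php
<?php
/**
 * Hypothetical page state: remembers which rectangles of a page are already
 * covered and answers whether a requested rectangle is still free.
 */
final class PageStateSketch
{
    private $width;
    private $height;

    /** @var array[] Covered boxes as arrays with x, y, width and height. */
    private $covered = array();

    public function __construct( float $width, float $height )
    {
        $this->width  = $width;
        $this->height = $height;
    }

    /** Mark a rectangle as used, for example by an image or a paragraph. */
    public function cover( float $x, float $y, float $width, float $height ): void
    {
        $this->covered[] = array(
            'x' => $x, 'y' => $y, 'width' => $width, 'height' => $height,
        );
    }

    /** Check that a rectangle lies on the page and overlaps no covered box. */
    public function isFree( float $x, float $y, float $width, float $height ): bool
    {
        if ( $x < 0 || $y < 0
            || $x + $width > $this->width
            || $y + $height > $this->height )
        {
            return false;
        }

        foreach ( $this->covered as $box )
        {
            $overlaps = $x < $box['x'] + $box['width']
                && $box['x'] < $x + $width
                && $y < $box['y'] + $box['height']
                && $box['y'] < $y + $height;
            if ( $overlaps )
            {
                return false;
            }
        }
        return true;
    }
}

$page = new PageStateSketch( 210, 297 );
$page->cover( 10, 10, 80, 60 );                   // an image in the top left

$besideImage = $page->isFree( 100, 10, 100, 20 ); // true: text fits next to it
$acrossImage = $page->isFree( 10, 20, 190, 20 );  // false: would overlap it
$belowImage  = $page->isFree( 10, 80, 190, 20 );  // true: full width below it
```

- A paragraph renderer could use such checks to shorten lines next to an image
- and switch back to the full width below it.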
- Which renderer classes can be separated will become clear during implementation,
- but those could, for example, be:
- ezcDocumentPdfParagraphRenderer
- Takes care of rendering the Docbook inline markup inside one paragraph.
- Respects orphans and widows and might be required to split paragraphs.
- ezcDocumentPdfTableRenderer
- Renders tables. It might be useful to even split this up more into a table
- row and cell renderer.
- Additional renderer features
- ----------------------------
- If the driver class in use implements the respective interfaces, the renderer will
- also offer to sign PDF documents, or add write protection (or similar) to the
- PDF document.
- Example
- =======
- A full example for the creation of a PDF document from an HTML page could look
- like::
- <?php
- $html = new ezcDocumentXhtml();
- $html->loadFile( 'http://ezcomponents.org/introduction' );
- $pdf = new ezcDocumentPdf();
- $pdf->createFromDocbook( $html->getAsDocbook() );
- // Load some custom layout directives
- $pdf->styles->load( 'my_styles.pcss' );
- $pdf->styles['article']['text-columns'] = 3;
- // Set a custom header
- $pdf->header = new ezcDocumentPdfStringPart(
- '%title by %author - %pageNum / %pageCount'
- );
- // Set a custom paragraph renderer
- $pdf->renderer->paragraph = new myPdfParagraphRenderer();
- // Use the hyphenator with a German dictionary
- $pdf->renderer->hyphenator = new myDictionaryHyphenator(
- '/path/to/german.dict'
- );
- // Store the generated PDF
- file_put_contents( 'my.pdf', $pdf );
- ?>
- A file containing the layout directives could look like::
- article {
- page-size: "A4";
- }
- para {
- font-family: "Bitstream Vera Sans";
- font-size: "1em";
- }
- article > title {
- font-weight: "bold";
- }
- section title {
- font-weight: "normal";
- }
- Classes
- =======
- The classes implemented for the PDF generation are:
- ezcDocumentPdf
- Base class, representing the PDF generation. Aggregates the style
- information, the docbook source document, renderer and page parts like
- footer and header.
- ezcDocumentPdfSet
- Class aggregating multiple ezcDocumentPdf objects, to create one single
- PDF document from multiple parts, like a cover page, the actual content, a
- bibliography, etc.
- ezcDocumentPdfStyles
- Class containing the PDF layout directives, also implements loading and
- storing of those layout directives.
- ezcDocumentPdfPart
- Abstract base class for page parts, like headers and footers. Renders the
- respective part and will be extended by multiple concrete
- implementations, which offer convenient rendering methods.
- ezcDocumentPdfRenderer
- Basic renderer class, which aggregates renderers for distinct page
- elements, like paragraphs and tables, and dispatches the rendering to
- them. Also maintains the ezcDocumentPdfPage state object, which contains
- information about already covered parts of the pages.
- ezcDocumentPdfParagraphRenderer
- Example for the concrete aspect specific renderer classes, which only
- implement the rendering of small parts of a document, like single
- paragraphs, tables, or table cell contents.
- ezcDocumentPdfPage
- State object describing the current state of a single page in the PDF
- document, like still available space.
- ezcDocumentPdfHyphenator
- Abstract base class for hyphenation implementations for more accurate word
- wrapping.