- ===============================
- Requirements for PDF generation
- ===============================
- Generation of PDF documents should be added to the document component. Those
- PDF documents should be created from Docbook documents, which can be generated
- from any markup format available in the document component.
- This document summarizes the requirements for PDF generation.
- Central requirements
- ====================
- The key requirements for the component are listed below.
- Layout
- ------
- The central requirement is to generate user-styled PDF documents from Docbook
- markup. The customized styling should include, but is not limited to:
- - Text
-   - Fonts and text sizes
-   - Line heights
-   - Colors
-   - Alignments
- - Pages
-   - Headers and footers
-   - Background images
-   - Multiple text columns per page
-   - Page sizes
-   - Margins and paddings
- - Block level elements (tables, graphics, literal blocks, ...)
-   - Borders, backgrounds, fonts, colors
- It must be possible to assign different styles depending on the parent
- elements in the Docbook markup, so that the titles in the following Docbook
- document can be formatted differently::
-   <?xml version="1.0"?>
-   <article>
-     <title>Article title</title>
-     <section>
-       <title>First heading</title>
-       <section>
-         <title>Second heading</title>
-       </section>
-     </section>
-   </article>
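One way to picture parent-dependent styling is a rule table keyed by element paths, where the most specific matching rule wins. The following Python sketch is only an illustration under that assumption; the rule format, the `style_for` and `walk` helpers, and the font sizes are hypothetical and not part of the component.

```python
import xml.etree.ElementTree as ET

# Hypothetical style rules: each key is an element path (outermost first),
# each value a set of style properties. The longest rule that matches the
# end of an element's path wins.
RULES = {
    ("title",): {"font-size": "12pt"},
    ("article", "title"): {"font-size": "24pt"},
    ("section", "title"): {"font-size": "18pt"},
    ("section", "section", "title"): {"font-size": "14pt"},
}


def style_for(path):
    """Return the style of the most specific rule matching the path suffix."""
    best = ()
    for rule in RULES:
        if len(rule) > len(best) and tuple(path[-len(rule):]) == rule:
            best = rule
    return RULES.get(best, {})


def walk(element, path=()):
    """Yield (text, style) for every title element in document order."""
    path = path + (element.tag,)
    if element.tag == "title":
        yield element.text, style_for(path)
    for child in element:
        yield from walk(child, path)


DOCBOOK = """<article>
  <title>Article title</title>
  <section>
    <title>First heading</title>
    <section>
      <title>Second heading</title>
    </section>
  </section>
</article>"""

titles = list(walk(ET.fromstring(DOCBOOK)))
```

With these rules the three titles resolve to three different font sizes, because each title has a different chain of parent elements.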
- It would be nice if the styles could be imported from and exported to an
- easily readable and writable format.
- Text formatting
- ---------------
- Proper text formatting is most probably the biggest problem in the
- implementation. The requirements include:
- Hyphenation
- ^^^^^^^^^^^
- Justified text in narrow text columns in particular requires hyphenation,
- otherwise the gaps between characters and words might grow too large. A
- pluggable hyphenation mechanism is required, which can be adapted to
- different languages, based on externally available dictionaries.
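A pluggable mechanism could be as small as one interface with a per-language registry. The sketch below is written in Python for illustration only. The `Hyphenator` interface, the class names, and the one-word dictionary are assumptions, and the available room is counted in characters instead of real glyph widths.

```python
from typing import Dict, List, Protocol, Tuple


class Hyphenator(Protocol):
    """Interface every pluggable hyphenator is assumed to implement."""

    def split(self, word: str) -> List[str]:
        """Return the word cut into parts at its allowed break points."""
        ...


class NullHyphenator:
    """Fallback for languages without a dictionary: never breaks a word."""

    def split(self, word: str) -> List[str]:
        return [word]


class DictionaryHyphenator:
    """Looks up break points in an externally supplied word list."""

    def __init__(self, entries: Dict[str, str]) -> None:
        # Each value marks break points with "-", e.g. "hy-phen-a-tion".
        self._entries = {word.lower(): marked for word, marked in entries.items()}

    def split(self, word: str) -> List[str]:
        marked = self._entries.get(word.lower())
        if marked is None:
            return [word]
        parts, pos = [], 0
        for piece in marked.split("-"):
            # Slice the original word so its capitalisation is preserved.
            parts.append(word[pos:pos + len(piece)])
            pos += len(piece)
        return parts


def break_word(word: str, room: int, hyphenator: Hyphenator) -> Tuple[str, str]:
    """Split a word so that the head, including its hyphen, fits into `room`
    characters. Returns ("", word) when no break point fits."""
    parts = hyphenator.split(word)
    for i in range(len(parts) - 1, 0, -1):
        head = "".join(parts[:i]) + "-"
        if len(head) <= room:
            return head, "".join(parts[i:])
    return "", word


# Hypothetical registry: one hyphenator per language code.
HYPHENATORS: Dict[str, Hyphenator] = {
    "en": DictionaryHyphenator({"hyphenation": "hy-phen-a-tion"}),
}


def hyphenator_for(language: str) -> Hyphenator:
    return HYPHENATORS.get(language, NullHyphenator())
```

A text wrapper would call `break_word` only for the word that overflows the line, and fall back to moving the whole word when the returned head is empty.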
- Widows and orphans
- ^^^^^^^^^^^^^^^^^^
- See: http://en.wikipedia.org/wiki/Widows_and_orphans
- There should be ways to configure the thresholds below which the lines of a
- paragraph are considered widows or orphans, so that both can be avoided.
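The thresholds could be applied when a paragraph does not fit into the remaining space of a page. The Python sketch below assumes two configurable line counts, named `orphans` and `widows` after the CSS properties of the same names; the function name and the default values are illustrative only.

```python
def lines_on_current_page(total, room, orphans=2, widows=2):
    """Decide how many lines of a paragraph stay on the current page.

    total   -- number of lines in the paragraph
    room    -- number of lines that still fit on the current page
    orphans -- minimum number of lines left at the bottom of a page
    widows  -- minimum number of lines carried to the top of the next page
    """
    if total <= room:
        return total  # the paragraph fits completely, no break needed
    # Keep at least `widows` lines for the next page.
    fit = min(room, total - widows)
    # Too few lines would remain here: move the whole paragraph instead.
    if fit < orphans:
        return 0
    return fit
```

For a five-line paragraph with room for four lines, this keeps three lines on the current page, so that two lines instead of one start the next page.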
- Inline formatting
- ^^^^^^^^^^^^^^^^^
- Depending on the font and styles used, inline formatting might have a serious
- effect on the text width. This MUST be respected during text rendering.
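To illustrate why inline formatting matters for wrapping, the following Python sketch measures a line as a sequence of styled runs. The advance widths are made-up averages per style, not real font metrics; an actual implementation would have to ask the font or the driver for glyph widths.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical average glyph advances per style, in 1/1000 of the font size.
ADVANCE = {"normal": 500, "bold": 560, "monospace": 600}


@dataclass
class Run:
    """A stretch of text sharing one inline style."""
    text: str
    style: str = "normal"
    size: int = 10  # font size in points


def run_width(run: Run) -> int:
    """Width of one run in thousandths of a point."""
    return len(run.text) * ADVANCE[run.style] * run.size


def line_width(runs: Iterable[Run]) -> int:
    """Width of a whole line in thousandths of a point."""
    return sum(run_width(run) for run in runs)


def fits(runs: Iterable[Run], column_width_pt: int) -> bool:
    """Check whether the runs fit into a column of the given width in points."""
    return line_width(runs) <= column_width_pt * 1000


plain = [Run("plain text bold")]
mixed = [Run("plain text "), Run("bold", style="bold")]
```

Under these assumed metrics the same characters fit into a 76 point column as plain text, but no longer fit once the last word is set in bold.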
- LTR and RTL languages
- ^^^^^^^^^^^^^^^^^^^^^
- The text wrapping must be able to work with left-to-right and right-to-left
- languages.
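Before wrapping, a renderer needs to know the base direction of a paragraph. A simplified Python sketch follows: it picks the direction from the first strongly directional character, in the spirit of rules P2 and P3 of the Unicode Bidirectional Algorithm, but ignores directional isolates and does not reorder mixed-direction text. The function names are illustrative.

```python
import unicodedata


def base_direction(text, default="ltr"):
    """Return "ltr" or "rtl" based on the first strong character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Only neutral or weak characters (digits, spaces, punctuation).
    return default


def line_start(direction, column_left, column_right, line_width):
    """X position of the left edge of a line aligned to its start side."""
    if direction == "rtl":
        return column_right - line_width
    return column_left
```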
- Floating media objects
- ^^^^^^^^^^^^^^^^^^^^^^
- For media objects that do not span the whole column width, it should be
- possible to float text around them. Detection of the actual image borders
- is not required; the rectangular frame around the image should be
- sufficient for text floating.
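Floating around a rectangular frame reduces to shortening the lines that overlap the frame vertically. The Python sketch below assumes the media object sits at the top of the column, on its left or right edge; the function name, parameters, and units are hypothetical.

```python
def line_boxes(column_width, line_height, line_count,
               float_width, float_height, side="left", gap=0):
    """Return an (x_offset, width) pair for each line of text in a column
    whose top corner is occupied by a rectangular floating frame."""
    boxes = []
    for index in range(line_count):
        top = index * line_height
        if top < float_height:
            # The line overlaps the frame: shorten it by the frame and gap.
            width = column_width - float_width - gap
            x_offset = float_width + gap if side == "left" else 0
        else:
            # The line is below the frame and spans the whole column.
            width = column_width
            x_offset = 0
        boxes.append((x_offset, width))
    return boxes
```

The text wrapper can then break each line against its own width instead of a single column width.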
- Embedding of media
- ------------------
- There are a lot of different media types which might be embedded into a PDF.
- The most common formats seem to be JPEG and EPS. JPEG is not suitable for
- several types of graphics [1], and EPS can only be used properly for some
- types of vector-based images. Conversion options and supported formats must be
- evaluated.
- Which formats are supported might depend on the driver used.
- PDI allows embedding of other PDFs inside the created PDF - this can be useful
- when merging different generated documents.
- .. [1] http://kore-nordmann.de/blog/image_formats.html
- Metadata
- --------
- The document component already preserves metadata associated with documents.
- PDF supports embedding additional document metadata. This metadata should
- definitely be embedded, but it might also be useful to offer an easily
- accessible API for embedding additional metadata. XMP is designed
- specifically for embedding metadata using RDF.
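As an illustration of what such metadata looks like, the Python sketch below builds a minimal XMP document with a Dublin Core title and creator list. The `build_xmp` helper is hypothetical, and the sketch omits the `xpacket` wrapper that surrounds XMP when it is embedded in a file.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix, name):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], name)


def build_xmp(title, authors):
    """Serialize title and authors as an XMP/RDF document (without xpacket)."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    description = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: rdf:Alt with an x-default entry.
    alt = ET.SubElement(ET.SubElement(description, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list of names: rdf:Seq.
    seq = ET.SubElement(ET.SubElement(description, q("dc", "creator")), q("rdf", "Seq"))
    for author in authors:
        ET.SubElement(seq, q("rdf", "li")).text = author

    return ET.tostring(meta, encoding="unicode")
```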
- Autogenerated contents
- ----------------------
- Headers and footers often contain some fixed texts, but might also contain
- autogenerated contents, like:
- - Current page / number of pages / page orientation (left, right)
- - Current section title
- - Author, read from document metadata
- It must be possible to define callbacks which generate those contents for the
- page they are currently rendered on. The best possible markup used for
- generation of those contents needs to be evaluated.
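Such callbacks could receive a small context object describing the page being rendered. The Python sketch below is one possible shape; `PageContext`, the slot names, and the example callbacks are assumptions. Note that the total page count is only known once layout has finished, so callbacks that use it would have to run in a later pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PageContext:
    """Information handed to a callback for the page being rendered."""
    number: int
    total: int
    section_title: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def side(self) -> str:
        # Assumes the common book convention: odd pages are right-hand pages.
        return "right" if self.number % 2 else "left"


def page_counter(page: PageContext) -> str:
    return f"{page.number} / {page.total}"


def running_title(page: PageContext) -> str:
    # Section title on right-hand pages, author on left-hand pages.
    if page.side == "right":
        return page.section_title
    return page.metadata.get("author", "")


def render_slots(page: PageContext,
                 callbacks: Dict[str, Callable[[PageContext], str]]) -> Dict[str, str]:
    """Run every registered callback for the given page."""
    return {slot: callback(page) for slot, callback in callbacks.items()}


callbacks = {"header": running_title, "footer": page_counter}
```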
- Several elements can require automatic generation. Those are
- at least:
- - Header / Footer
- - Cover page
- - Table of contents
- - Back page
- For most of those elements a predefined generator can be implemented which
- creates meaningful default contents, and then can be extended by the user.
- Especially for cover and back pages it might be useful to include them
- directly from other PDF documents.
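A predefined table of contents generator, for example, could simply collect the nested section titles from the Docbook tree. The Python sketch below shows that default behaviour under simplifying assumptions: only `section` and `title` elements are considered, and page numbers, which depend on the finished layout, are left out.

```python
import xml.etree.ElementTree as ET


def table_of_contents(parent, depth=0):
    """Return (level, title) pairs for all nested sections below parent."""
    entries = []
    for section in parent.findall("section"):
        entries.append((depth + 1, section.findtext("title", default="")))
        entries.extend(table_of_contents(section, depth + 1))
    return entries


ARTICLE = ET.fromstring(
    "<article><title>Article title</title>"
    "<section><title>First heading</title>"
    "<section><title>Second heading</title></section>"
    "</section></article>"
)

toc = table_of_contents(ARTICLE)
```

A user-defined generator could replace or wrap this function to change the depth limit or the formatting of the entries.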
- Driver infrastructure
- ---------------------
- There are multiple ways to generate PDF documents, like:
- - pecl/libharu
- - FPDF
- - TCPDF
- - pdflib
- - Zend_PDF
- Which of those libraries is available and performs best might depend on the
- environment. A driver infrastructure should let the user select the best
- output driver for writing the actual PDF. Not all of those drivers support
- proper text wrapping themselves, so text wrapping cannot be handed over to
- the drivers.
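The split of responsibilities can be sketched as follows: the renderer wraps the text and hands each driver only already positioned lines. The Python code below is an illustration, not the component's API. All class and function names are hypothetical, and the two drivers are stand-ins that do not wrap any of the libraries listed above.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple, Type


class Driver(ABC):
    """Minimal interface a PDF output driver is assumed to provide."""

    @classmethod
    def is_available(cls) -> bool:
        """Report whether the backing library can be used in this environment."""
        return True

    @abstractmethod
    def draw_text(self, x: float, y: float, text: str) -> None:
        """Place one already wrapped line of text at the given position."""


class MissingBackendDriver(Driver):
    """Stands in for a driver whose library is not installed."""

    @classmethod
    def is_available(cls) -> bool:
        return False

    def draw_text(self, x: float, y: float, text: str) -> None:
        raise RuntimeError("backend not installed")


class RecordingDriver(Driver):
    """Stand-in driver that records calls instead of writing a PDF."""

    def __init__(self) -> None:
        self.calls: List[Tuple[float, float, str]] = []

    def draw_text(self, x: float, y: float, text: str) -> None:
        self.calls.append((x, y, text))


def select_driver(candidates: Sequence[Type[Driver]]) -> Driver:
    """Instantiate the first candidate that is available."""
    for candidate in candidates:
        if candidate.is_available():
            return candidate()
    raise RuntimeError("no PDF driver available")


def draw_lines(driver: Driver, lines: Sequence[str],
               x: float, y: float, line_height: float) -> None:
    """Wrapping happened in the renderer; the driver only positions lines."""
    for index, line in enumerate(lines):
        driver.draw_text(x, y + index * line_height, line)
```

Keeping the driver interface this narrow means a new backend only has to implement primitive drawing operations, while wrapping, hyphenation, and page breaking stay in one place.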
- Optional requirements
- =====================
- Once PDF rendering is implemented correctly, including correct rendering and
- wrapping of texts, it might be useful in similar cases, for example:
- - SVG to PDF conversion
-   The conversion of SVG to PDF is used for distribution of documents with
-   heavily customized designs. With a proper rendering infrastructure the API
-   should be kept flexible enough to support such conversions later.
- - HTML to PDF conversion
-   It might be useful to directly convert styled HTML to PDF. If the API stays
-   flexible enough, this should be possible to add later.
- One major problem might be the markup used for formatting inline text
- elements.
- Import of PDF pages
- -------------------
- For cover pages (or similar) of documents it might be useful to extract whole
- pages from other PDFs and embed them in the generated PDF document.
- This requires reading existing PDF documents, though, which is not planned
- to be implemented yet.
- Signing PDFs / write protection
- -------------------------------
- It is common to write-protect or sign PDF documents. If
- the respective PDF creation library can handle that, it should be exposed in
- the PDF creation API.