===============================
Requirements for PDF generation
===============================

PDF generation should be added to the Document component. The PDF documents
should be created from Docbook documents, which can be generated from every
markup format available in the Document component.

This document summarizes the requirements for PDF generation.

Central requirements
====================

The key requirements for the component are described in the following sections.

Layout
------

The central requirement is to generate PDF documents with user-defined styles
from Docbook markup. The customizable styling should include, but is not
limited to:

- Text

  - Fonts and text sizes
  - Line heights
  - Colors
  - Alignments

- Pages

  - Headers and footers
  - Background images
  - Multiple text columns per page
  - Page sizes
  - Margins and paddings

- Block-level elements (tables, graphics, literal blocks, ...)

  - Borders, backgrounds, fonts, colors

It must be possible to assign different styles depending on the parent
elements in the Docbook markup, so that the titles in the following Docbook
document can be formatted differently::

	<?xml version="1.0"?>
	<article>
		<title>Article title</title>
		<section>
			<title>First heading</title>
			<section>
				<title>Second heading</title>
			</section>
		</section>
	</article>
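
One way to meet this requirement is to match style rules against the chain of
ancestor elements, similar to CSS child selectors. The following is a minimal
sketch under that assumption; the rule format, the ``RULES`` table and the
functions ``computed_style()`` and ``styled_titles()`` are hypothetical. All
rules whose selector matches the end of an element's path are merged, and
longer (more specific) selectors override shorter ones:

.. code-block:: python

    import xml.etree.ElementTree as ET
    from typing import Dict, List, Tuple

    # Hypothetical style rules: each selector is a chain of element names that
    # must match the end of an element's ancestor path (like CSS "section > title").
    RULES: Dict[Tuple[str, ...], Dict[str, str]] = {
        ("title",): {"font-size": "12pt"},
        ("article", "title"): {"font-size": "24pt", "font-weight": "bold"},
        ("section", "title"): {"font-size": "18pt"},
        ("section", "section", "title"): {"font-size": "14pt"},
    }

    def computed_style(path: Tuple[str, ...]) -> Dict[str, str]:
        """Merge every rule matching the tail of path; longer selectors are applied last and win."""
        style: Dict[str, str] = {}
        for selector in sorted(RULES, key=len):
            if path[-len(selector):] == selector:
                style.update(RULES[selector])
        return style

    def styled_titles(xml_text: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return (title text, computed style) for every <title>, in document order."""
        result: List[Tuple[str, Dict[str, str]]] = []

        def walk(elem: ET.Element, path: Tuple[str, ...]) -> None:
            path = path + (elem.tag,)
            if elem.tag == "title":
                result.append((elem.text or "", computed_style(path)))
            for child in elem:
                walk(child, path)

        walk(ET.fromstring(xml_text), ())
        return result

Applied to the example document, the article title, the first heading and the
second heading receive 24pt (bold), 18pt and 14pt respectively.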

It would be useful if styles could be imported from and exported to an easily
readable and writable format.
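
A CSS-like text syntax is one candidate for such a format. The sketch below
assumes a hypothetical syntax of whitespace-separated element names followed by
``{ property: value; ... }``; ``parse_styles()`` and ``dump_styles()`` are
illustrative names, and nesting, comments and quoting are deliberately not
handled:

.. code-block:: python

    import re
    from typing import Dict, List, Tuple

    Rule = Tuple[Tuple[str, ...], Dict[str, str]]

    # One rule: a selector (anything but braces) followed by a {...} body.
    RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")

    def parse_styles(text: str) -> List[Rule]:
        """Parse 'name name { prop: value; ... }' blocks into (selector, properties) pairs."""
        rules: List[Rule] = []
        for selector, body in RULE.findall(text):
            props: Dict[str, str] = {}
            for declaration in body.split(";"):
                if ":" in declaration:
                    name, value = declaration.split(":", 1)
                    props[name.strip()] = value.strip()
            rules.append((tuple(selector.split()), props))
        return rules

    def dump_styles(rules: List[Rule]) -> str:
        """Write rules back in the same syntax, one rule per line."""
        lines = []
        for selector, props in rules:
            body = "; ".join(f"{name}: {value}" for name, value in props.items())
            lines.append(f"{' '.join(selector)} {{ {body} }}")
        return "\n".join(lines)

Because ``dump_styles()`` emits the syntax that ``parse_styles()`` reads, styles
can be round-tripped through a plain text file that users edit by hand.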

Text formatting
---------------

Proper text formatting is probably the biggest challenge of the
implementation. The requirements include:

Hyphenation
^^^^^^^^^^^

Justified text in narrow columns in particular requires hyphenation;
otherwise the spacing between words and characters might grow too large. A
pluggable hyphenation mechanism is required, which can be adapted to
different languages based on externally available dictionaries.
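
A common choice for such a mechanism is Liang's pattern-based algorithm, which
TeX uses and for which pattern files exist for many languages. The sketch below
assumes patterns in the TeX notation (digits between letters, an odd digit
allows a break, ``.`` marks a word boundary). The ``Hyphenator`` protocol, the
``PatternHyphenator`` class and its ``left_min``/``right_min`` parameters
(modelled on TeX's ``\lefthyphenmin``/``\righthyphenmin``) are hypothetical
names, and the handful of patterns in the usage example only covers one word:

.. code-block:: python

    import re
    from typing import Dict, Iterable, List, Protocol

    class Hyphenator(Protocol):
        """Pluggable interface: one implementation per language or dictionary."""
        def hyphenate(self, word: str) -> List[str]: ...

    class PatternHyphenator:
        """Liang's algorithm over TeX-style patterns such as 'hy3ph'."""

        def __init__(self, patterns: Iterable[str], left_min: int = 2, right_min: int = 2):
            self.left_min, self.right_min = left_min, right_min
            self.patterns: Dict[str, List[int]] = {}
            for pat in patterns:
                letters = re.sub(r"\d", "", pat)
                values = [0] * (len(letters) + 1)  # values[i]: digit before letters[i]
                i = 0
                for ch in pat:
                    if ch.isdigit():
                        values[i] = int(ch)
                    else:
                        i += 1
                self.patterns[letters] = values

        def hyphenate(self, word: str) -> List[str]:
            w = "." + word.lower() + "."  # dots mark the word boundaries
            points = [0] * (len(w) + 1)
            # Every substring that is a pattern raises the inter-letter values.
            for i in range(len(w)):
                for j in range(i + 1, len(w) + 1):
                    values = self.patterns.get(w[i:j])
                    if values:
                        for k, v in enumerate(values):
                            points[i + k] = max(points[i + k], v)
            parts, start = [], 0
            # points[m + 1] is the value between word[m - 1] and word[m].
            for m in range(self.left_min, len(word) - self.right_min + 1):
                if points[m + 1] % 2:
                    parts.append(word[start:m])
                    start = m
            parts.append(word[start:])
            return parts

With the patterns ``hy3ph``, ``he2n``, ``hena4``, ``hen5at``, ``1na``,
``n2at``, ``1tio``, ``2io`` and ``o2n``, ``hyphenate("hyphenation")`` returns
``["hy", "phen", "ation"]``. Another language would be supported by loading
that language's pattern file, or by plugging in a different ``Hyphenator``.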

Widows and orphans
^^^^^^^^^^^^^^^^^^

See: http://en.wikipedia.org/wiki/Widows_and_orphans

It should be possible to configure the thresholds, in number of lines, below
which the first or last lines of a paragraph split across pages count as
orphans or widows. Such orphans and widows should be avoided.
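
A minimal sketch of how configurable thresholds could be applied when a
paragraph reaches the end of a page, assuming the CSS ``orphans``/``widows``
convention (orphans: minimum lines left at the bottom of a page; widows:
minimum lines carried to the top of the next page). The function name and its
line-count interface are hypothetical:

.. code-block:: python

    def lines_on_current_page(total: int, available: int,
                              orphans: int = 2, widows: int = 2) -> int:
        """How many lines of a paragraph to set before a page break.

        orphans: minimum lines left at the bottom of the current page
        widows:  minimum lines carried over to the top of the next page
        Returns 0 when the whole paragraph has to move to the next page.
        """
        if total <= available:
            return total  # the paragraph fits, no break needed
        # Leave at least `widows` lines for the next page ...
        keep = min(available, total - widows)
        # ... and refuse to leave fewer than `orphans` lines behind.
        return keep if keep >= orphans else 0

For a 10-line paragraph with room for 9 lines and both thresholds at 2, 8 lines
stay on the current page and 2 move on. If the rest of the paragraph does not
fit on the next page either, the caller applies the function again there.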

Inline formatting
^^^^^^^^^^^^^^^^^

Depending on the fonts and styles used, inline formatting can significantly
affect the text width. This MUST be respected during text rendering.
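
To illustrate why each inline run has to be measured separately, the sketch
below breaks lines greedily while measuring every word in its own style. The
metrics table, ``Run``, ``width()`` and ``wrap()`` are hypothetical; it assumes
run boundaries coincide with word boundaries and uses a simple first-fit
strategy rather than an optimising one such as the Knuth and Plass algorithm:

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict, List

    # Hypothetical advance widths in 1/1000 em; a real renderer reads them from
    # the metrics of the selected font and style.
    METRICS: Dict[str, Dict[str, int]] = {
        "regular": {"default": 500, " ": 250},
        "bold": {"default": 560, " ": 250},
    }

    @dataclass
    class Run:
        text: str
        style: str
        size: float  # font size in pt

    def width(run: Run) -> float:
        """Width of a run in pt, using the metrics of its own style."""
        table = METRICS[run.style]
        return sum(table.get(ch, table["default"]) for ch in run.text) * run.size / 1000

    def wrap(runs: List[Run], max_width: float) -> List[List[Run]]:
        """Greedy line breaking that measures every word in its inline style."""
        lines: List[List[Run]] = []
        line: List[Run] = []
        used = 0.0
        for run in runs:
            for word in run.text.split():
                piece = Run(word, run.style, run.size)
                space = width(Run(" ", run.style, run.size)) if line else 0.0
                if line and used + space + width(piece) > max_width:
                    lines.append(line)  # word does not fit: start a new line
                    line, used, space = [], 0.0, 0.0
                line.append(piece)
                used += space + width(piece)
        if line:
            lines.append(line)
        return lines

With a maximum width of 35.5pt, ``[Run("aa bb", "regular", 10), Run("cc",
"bold", 10)]`` breaks before ``cc``, while the same text with ``cc`` in the
regular style fits on one line, because the bold glyphs in this hypothetical
table are wider.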

LTF and RTL languages
^^^^^^^^^^^^^^^^^^^^^

The text wrapping must be able to work with left-to-right and right-to-left
languages.
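
Line breaking itself can work on text in logical order, but the renderer needs
the base direction of each paragraph to decide on which side a line starts. A
minimal sketch using Python's ``unicodedata.bidirectional()`` and the "first
strong character" rule of the Unicode Bidirectional Algorithm (UAX #9, rules
P2 and P3). It ignores isolates and explicit embeddings, and it does not cover
the reordering of mixed-direction runs within a line, which the full algorithm
also requires. The function names are hypothetical:

.. code-block:: python

    import unicodedata

    def base_direction(text: str, default: str = "ltr") -> str:
        """Paragraph direction from the first strong character (UAX #9, P2/P3).

        Simplification: isolates and explicit embeddings are not skipped.
        """
        for ch in text:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-type letters
                return "rtl"
        return default  # no strong character, e.g. only digits and punctuation

    def line_start_x(direction: str, line_width: float, column_width: float) -> float:
        """Start-aligned placement: RTL lines hug the right edge of the column."""
        return column_width - line_width if direction == "rtl" else 0.0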

Floating media objects
^^^^^^^^^^^^^^^^^^^^^^

For media objects that do not span the whole column width, it should be
possible to float text around them. Detection of the actual image outline is
not required: the rectangular frame around the image is sufficient for text
floating.
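
Because only the rectangular frame is needed, the per-line calculation stays
simple: a line that vertically overlaps the frame gets a narrower measure and,
for a left float, an x offset. A minimal sketch with hypothetical names
(``FloatBox``, ``line_box()``) and an assumed 6pt gap between image and text:

.. code-block:: python

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class FloatBox:
        """Bounding rectangle of a floated media object (y grows downwards)."""
        top: float
        height: float
        width: float
        side: str  # "left" or "right"

    def line_box(y: float, line_height: float, column_width: float,
                 box: FloatBox, gap: float = 6.0) -> Tuple[float, float]:
        """Return (x offset, available width) for a line whose top edge is at y."""
        overlaps = y < box.top + box.height and y + line_height > box.top
        if not overlaps:
            return 0.0, column_width
        taken = box.width + gap  # the rectangle plus the assumed spacing
        if box.side == "left":
            return taken, column_width - taken
        return 0.0, column_width - taken

For a 120pt wide left float in a 300pt column, a line overlapping the float
starts at x = 126 and has 174pt available; lines above or below the float get
the full column width.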

Embedding of media
------------------

There are many different media types that might be embedded into a PDF. The
most common formats seem to be JPEG and EPS. JPEG is not suitable for several
types of graphics [1]_, and EPS can only be used properly for some types of
vector-based images. Conversion options and supported formats must be
evaluated.

Which formats are supported might depend on the driver used.

PDI (the PDF import extension of pdflib) allows embedding other PDFs inside
the created PDF; this can be useful when merging different generated
documents.
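
Since the supported formats depend on the driver, the renderer could detect the
actual format of a media file and decide per driver whether to embed it
directly or to convert it first. A minimal sketch, assuming detection by the
well-known file signatures (JPEG ``FF D8 FF``, PNG ``89 50 4E 47 0D 0A 1A 0A``,
PDF ``%PDF-``, PostScript/EPS ``%!PS-Adobe``); the function names and the
description of a driver as a set of format names are hypothetical, and the
conversion itself is left out:

.. code-block:: python

    from typing import Iterable, Set, Tuple

    SIGNATURES = [
        (b"\xff\xd8\xff", "jpeg"),
        (b"\x89PNG\r\n\x1a\n", "png"),
        (b"%PDF-", "pdf"),
        (b"%!PS-Adobe", "eps"),  # also matches plain PostScript; binary EPS is not handled
    ]

    def sniff_format(data: bytes) -> str:
        """Identify a media file by its leading bytes."""
        for signature, name in SIGNATURES:
            if data.startswith(signature):
                return name
        return "unknown"

    def plan_embedding(data: bytes, driver_formats: Set[str],
                       targets: Iterable[str] = ("png", "jpeg")) -> Tuple[str, str]:
        """Return ("embed", format) or ("convert", target format) for the given driver."""
        fmt = sniff_format(data)
        if fmt in driver_formats:
            return ("embed", fmt)
        for target in targets:  # preferred conversion targets, in order
            if target in driver_formats:
                return ("convert", target)
        raise ValueError(f"driver cannot embed {fmt!r} and offers no conversion target")

PNG is listed before JPEG as a conversion target here because, as noted above,
JPEG is unsuitable for several types of graphics.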

.. [1] http://kore-nordmann.de/blog/image_formats.html

Metadata
--------

The Document component already preserves metadata associated with documents,
and PDF supports embedding document metadata. This metadata should definitely
be embedded, but it might also be useful to offer an easily accessible API for
embedding additional metadata. XMP is specifically designed for embedding
metadata, using RDF.
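
An XMP packet is RDF/XML wrapped in an ``x:xmpmeta`` element and ``xpacket``
processing instructions; in a PDF it is stored as a metadata stream referenced
from the document catalog. The sketch below maps a title, a list of authors
and an optional description to the Dublin Core properties ``dc:title``,
``dc:creator`` and ``dc:description``. ``xmp_packet()`` is a hypothetical
helper, and writing the stream into the PDF is left to the driver:

.. code-block:: python

    from typing import Optional, Sequence
    from xml.sax.saxutils import escape

    def xmp_packet(title: str, creators: Sequence[str],
                   description: Optional[str] = None) -> str:
        """Build a minimal XMP packet with Dublin Core title, creators and description."""
        items = "".join(f"<rdf:li>{escape(name)}</rdf:li>" for name in creators)
        desc = ""
        if description:
            desc = ("<dc:description><rdf:Alt>"
                    f'<rdf:li xml:lang="x-default">{escape(description)}</rdf:li>'
                    "</rdf:Alt></dc:description>")
        return (
            # Packet wrapper: "begin" holds a BOM, the id is the fixed XMP value.
            '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
            '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
            '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
            '<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
            # dc:title is a language alternative, dc:creator an ordered list.
            f'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">{escape(title)}</rdf:li></rdf:Alt></dc:title>'
            f"<dc:creator><rdf:Seq>{items}</rdf:Seq></dc:creator>"
            f"{desc}"
            "</rdf:Description></rdf:RDF></x:xmpmeta>"
            '<?xpacket end="w"?>'
        )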

Autogenerated contents
----------------------

Headers and footers often contain some fixed text, but might also contain
autogenerated content, such as:

- Current page / number of pages / page side (left or right)
- Current section title
- Author, read from document metadata

It must be possible to define callbacks that generate this content for the
page it is currently rendered on. The most suitable markup for generating this
content still needs to be evaluated.
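
A minimal sketch of such callbacks, assuming a hypothetical ``PageContext``
that the renderer fills in for each page, and plain functions as generators.
The total number of pages is only known once the whole document has been laid
out, so a real implementation would typically need a second pass or a
placeholder that is filled in at the end:

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class PageContext:
        """Hypothetical per-page information handed to content generators."""
        number: int
        total: int
        section_title: str
        metadata: Dict[str, str] = field(default_factory=dict)

        @property
        def side(self) -> str:
            # Assumes page 1 is a right-hand page, as in most books.
            return "right" if self.number % 2 == 1 else "left"

    Generator = Callable[[PageContext], str]

    def page_numbers(ctx: PageContext) -> str:
        return f"Page {ctx.number} of {ctx.total}"

    def running_title(ctx: PageContext) -> str:
        return ctx.section_title

    def author(ctx: PageContext) -> str:
        return ctx.metadata.get("author", "")

    def render(ctx: PageContext, generators: List[Generator], separator: str = " | ") -> str:
        """Call each generator for the current page and join the non-empty results."""
        return separator.join(text for text in (g(ctx) for g in generators) if text)

For page 3 of 10 in the section "Layout" with the author "Jane Doe",
``render(ctx, [author, running_title, page_numbers])`` produces
``Jane Doe | Layout | Page 3 of 10``.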

Several elements may require automatic generation, at least:

- Header / Footer
- Cover page
- Table of contents
- Back page

For most of these elements, a predefined generator can be implemented that
creates meaningful default content and can be extended by the user. Cover and
back pages in particular might be included directly from other PDF documents.
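
Such a default generator could be a class whose formatting hooks the user
overrides. A minimal sketch for the table of contents, assuming Docbook without
namespaces (as in the example in the Layout section) and page numbers supplied
by the layout step in document order; ``TocGenerator`` and ``PlainToc`` are
hypothetical names:

.. code-block:: python

    import xml.etree.ElementTree as ET
    from typing import Iterator, List, Tuple

    class TocGenerator:
        """Default table-of-contents generator; override format_entry() to customise it."""

        def entries(self, root: ET.Element) -> List[Tuple[int, str]]:
            """(depth, title) for every <section>, in document order."""
            def walk(elem: ET.Element, depth: int) -> Iterator[Tuple[int, str]]:
                for section in elem.findall("section"):  # direct child sections only
                    yield depth, section.findtext("title", default="")
                    yield from walk(section, depth + 1)
            return list(walk(root, 0))

        def format_entry(self, depth: int, title: str, page: int) -> str:
            return "  " * depth + f"{title} .... {page}"

        def generate(self, root: ET.Element, pages: List[int]) -> List[str]:
            # pages: page number of each section in document order, known after layout
            return [self.format_entry(d, t, p) for (d, t), p in zip(self.entries(root), pages)]

    class PlainToc(TocGenerator):
        """Example user extension: drop the leader dots."""

        def format_entry(self, depth: int, title: str, page: int) -> str:
            return "  " * depth + f"{title}  {page}"

For the example article, ``TocGenerator().generate(root, [1, 2])`` yields
``["First heading .... 1", "  Second heading .... 2"]``; the subclass only
changes the entry format.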

Driver infrastructure
---------------------

There are multiple libraries for generating PDF documents, such as:

- pecl/libharu
- FPDF
- TCPDF
- pdflib
- Zend_PDF

Which of these libraries is available and performs best depends on the
environment. A driver infrastructure should let the user select the most
suitable output driver for writing the actual PDF. Not all of these libraries
support proper text wrapping themselves, so text wrapping cannot be delegated
to the drivers.
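
A minimal sketch of such a driver infrastructure, assuming the renderer does
all line breaking and positioning and the driver only receives positioned
primitives. ``PdfDriver``, its methods, the stand-in drivers and
``select_driver()`` are hypothetical names; a real driver would wrap one of the
libraries listed above, with ``available()`` checking whether that library is
installed:

.. code-block:: python

    from abc import ABC, abstractmethod
    from typing import List, Sequence, Tuple, Type

    class PdfDriver(ABC):
        """Receives positioned primitives only; text wrapping happens in the renderer."""

        @classmethod
        @abstractmethod
        def available(cls) -> bool:
            """Whether the backing library is usable in this environment."""

        @abstractmethod
        def new_page(self, width: float, height: float) -> None: ...

        @abstractmethod
        def draw_text(self, x: float, y: float, text: str, font: str, size: float) -> None: ...

        @abstractmethod
        def save(self) -> bytes: ...

    class RecordingDriver(PdfDriver):
        """Stand-in driver that records calls instead of writing a real PDF."""

        def __init__(self) -> None:
            self.calls: List[Tuple] = []

        @classmethod
        def available(cls) -> bool:
            return True

        def new_page(self, width, height):
            self.calls.append(("page", width, height))

        def draw_text(self, x, y, text, font, size):
            self.calls.append(("text", x, y, text, font, size))

        def save(self) -> bytes:
            return repr(self.calls).encode()

    class MissingLibraryDriver(RecordingDriver):
        """Simulates a driver whose library is not installed."""

        @classmethod
        def available(cls) -> bool:
            return False

    def select_driver(preferred: Sequence[Type[PdfDriver]]) -> PdfDriver:
        """Instantiate the first driver, in order of preference, that is available."""
        for driver_cls in preferred:
            if driver_cls.available():
                return driver_cls()
        raise RuntimeError("no PDF driver available")

The user passes the drivers in order of preference, and the first one whose
library is present is used.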

Optional requirements
=====================

Once PDF rendering, including correct rendering and wrapping of text, is
implemented, it might be reused for similar tasks, for example:

- SVG to PDF conversion

  SVG to PDF conversion is used to distribute documents with heavily
  customized designs. With a proper rendering infrastructure, the API should
  be kept flexible enough to support such conversions later.

- HTML to PDF conversion

  It might be useful to convert styled HTML directly to PDF; if the API stays
  flexible enough, this should be possible to add later.

  One major problem might be the markup used for formatting inline text
  elements.

Import of PDF pages
-------------------

For cover pages (or similar), it might be useful to extract whole pages from
other PDFs and embed them in the generated PDF document.

However, this requires reading existing PDF documents, which is not planned to
be implemented yet.

Signing PDFs / write protection
-------------------------------

It is common to write-protect or sign PDF documents. If the respective PDF
creation library supports this, it should be exposed in the API of the PDF
generation.
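
One way to expose such library-dependent features is to let each driver
declare its optional capabilities and to check them before use, so that an
unsupported request fails early with a clear error. A minimal sketch using
Python's ``enum.Flag``; ``Capability``, ``UnsupportedFeature`` and
``require()`` are hypothetical names:

.. code-block:: python

    from enum import Flag, auto

    class Capability(Flag):
        """Optional features a PDF driver may declare."""
        NONE = 0
        ENCRYPTION = auto()  # password / permission based write protection
        SIGNING = auto()     # digital signatures

    class UnsupportedFeature(Exception):
        pass

    def require(driver_caps: Capability, needed: Capability) -> None:
        """Raise if the selected driver lacks any of the needed capabilities."""
        missing = needed & ~driver_caps
        if missing:
            raise UnsupportedFeature(f"driver lacks: {missing}")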