Re: html to pdf conversion

From: Christian Wolfgang Hujer <Christian.Hujer@itcqis.com>
Date: Tue, 12 Feb 2002 11:30:39 +0100
To: phantom kr <me_the_phantom@yahoo.com>, www-html@w3c.org
Message-ID: <16aaG5-2AbiL2C@fmrl02.sul.t-online.com>


On Tuesday, 12 February 2002 07:06, phantom kr wrote:
> where can i find source code for converting html files
> to pdf format, source code being in java.

Several tools exist for that purpose.

I searched Google for "HTML Java PDF Conversion".

I found iText, an open source PDF library in Java, hosted on SourceForge, that
can already convert HTML to PDF.
There are also links to several other PDF engines in Java.

But the "modern art of HTML2PDF conversion" is the following:
1. Make sure the HTML files are valid XHTML (or at least well-formed XML).
2. Use a transformation stylesheet that transforms HTML to XSL Formatting 
Objects using XSL Transformation. Apply that stylesheet using an XSLT 
processor like xt, xalan, saxon...
3. Run a Formatting Objects engine (like FOP from James Tauber / Apache 
Group) that converts the generated FO-Tree from step 2 to PDF.
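
Steps 1 and 2 can be sketched in a few lines of Java. This is only a minimal
illustration, assuming a JDK that bundles the JAXP API (javax.xml.transform);
the class name XhtmlToFo, the method toFo and the tiny inline stylesheet are
made up for this example and handle nothing but <p> elements. Input that is
not well-formed XML is rejected by the parser, so the transformation fails
instead of producing output. Step 3 is not shown, because it needs an external
Formatting Objects engine such as FOP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlToFo {

    // Hypothetical, deliberately tiny stylesheet: one A4 page master,
    // and every XHTML <p> inside <body> becomes one fo:block.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:template match='/'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page'"
      + " page-width='210mm' page-height='297mm'>"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<xsl:apply-templates select='//h:body/h:p'/>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "<xsl:template match='h:p'>"
      + "<fo:block space-after='6pt'><xsl:value-of select='.'/></fo:block>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Step 2: apply the stylesheet to an XHTML document held in a string
    // and return the resulting FO document as a string.
    static String toFo(String xhtml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xhtml =
            "<html xmlns='http://www.w3.org/1999/xhtml'>"
          + "<head><title>Example</title></head>"
          + "<body><p>Hello, PDF.</p></body></html>";
        // Prints the FO document that a Formatting Objects engine would
        // take as input in step 3.
        System.out.println(toFo(xhtml));
    }
}
```

A real stylesheet would of course need templates for headings, lists, tables
and so on; the point here is only that the Java side of step 2 stays this
small, whatever the stylesheet looks like.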

Most XSLT processors and most Formatting Objects engines are written in Java, 
including all mentioned products (xt, xalan, saxon, fop) and come with their 
source code.

The following points must be kept in mind:
- Knowledge
XSLT and XSL:FO mean one to four new languages to learn, depending on your
point of view and prior knowledge. In my opinion there are four languages
altogether (XML, XPath, XSLT and XSL:FO), but they are quite easy.
- Development speed
XSLT stylesheets and XSL formatting are quite easy to learn, and it takes
very little time to write an XSLT stylesheet.
- Servlet usage
XSLT and XSL:FO can be used from within servlets.
- Performance
The performance of a native Java PDF library might be considerably better
than that of the XSLT and XSL:FO route.
- Configurability
XSLT and XSL:FO are highly configurable. You control nearly every pixel
(or rather, point) of the resulting PDF.

But I've never used the non-XSL:FO way, so I can't say much about that.

Just my 2 cents.


-- 
Christian Wolfgang Hujer
Managing Partner
Phone: +49 (089) 27 37 04 37
Fax: +49 (089) 27 37 04 39
E-Mail: mailto:Christian.Hujer@itcqis.com
WWW: http://www.itcqis.com/

Received on Tuesday, 12 February 2002 05:34:23 UTC
