Tuesday, May 29, 2012

Creating a PDF from Website Content

I recently had a requirement to convert website content to a PDF file. At first I thought it would take a lot of coding: I would have to make every page component look right in a printable document and then generate a clean PDF file from the whole page.

I did not want to spend too much time on it. There had to be an easier way. How else could the Save as PDF button in Safari do such a good job of printing a page without knowing anything about my web application?

Searching for a shortcut, I came across the Flying Saucer Project:
"Flying Saucer is an XML/CSS renderer, which means it takes XML files as input, applies formatting and styling using CSS, and generates a rendered representation of that XML as output. The output may go to a PDF file." -- Flying Saucer User's Guide
Very impressive work, especially considering how much it can accomplish with very little code and a print style sheet.

Using the Flying Saucer library in a servlet to access the website anonymously worked fine. But when I tried authenticated access to Magnolia CMS resources, everything stopped working. Fortunately, this was easily resolved with the callback class that the Flying Saucer Project offers. I must confess that I overlooked it at first. The callback class not only lets you decide how to access the website's resources but also how to render the images and styles.

The rest of the work, such as which elements to display, where to display them, the page size, and the page breaks, is specified in the style sheet using the W3C Paged Media syntax.
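To give an idea of what such a print style sheet can contain, here is a minimal sketch. It is not the print.css from the demo project: the ids and class names (#navigation, #search-box, div.article) are hypothetical, and the page size and margin are just example values. How well a given renderer supports each property is worth checking before relying on it.

```css
/* Illustrative print rules; selectors and values are assumptions,
   not taken from the demo project's print.css. */

/* Page size and printable area */
@page {
  size: A4 portrait;
  margin: 2cm;
}

/* Hide components that make no sense on paper (hypothetical ids) */
#navigation, #search-box {
  display: none;
}

/* When several site pages end up in one PDF, start every article
   after the first on a new sheet (hypothetical class) */
div.article + div.article {
  page-break-before: always;
}

/* Keep a heading together with the text that follows it */
h2, h3 {
  page-break-after: avoid;
}
```

The adjacent sibling selector is used for the forced break so that the first article does not ask for a break before any content has been printed.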

See the linked document for how the result looks when printing the About section of the demo project with all pages in the same PDF document.

For this example, I used the standard print.css that comes with the demo-project theme-pop, adding just one style for images so that a page break does not fall in the middle of an image:

img { page-break-inside: avoid; }

Many thanks to the Flying Saucer Project developers for providing this amazing utility!