So, the problem is this: you let your customers enter anything from a single sentence to 50 pages of text into a rich text area, complete with formatting, tables, images, etc. Then they want to output that free-flowing document into a template of sorts, with headers and footers, and, better yet, in PDF format for emailing to customers!
Well, HTML doesn't know anything about page breaks. And I don't know about you, but programming to the PDF standard, or even using FPDF, is a nightmare I wouldn't wish on any developer, especially when your users are creating content that later has to be saved in PDF format.
I had no luck with the HTML-to-PDF converters: they lack CSS support almost completely and don't handle images particularly well. And if an integration package that's supposed to make my life easier has me pulling my hair out after 6 hours of working on a simple sample they provided, I tend to look for alternative solutions.
Enter: Microsoft Word 2007.
Though previous "new releases" of Word brought little more than a nicer logo and five new features buried under 30 menus that no one needed anyway, Word 2007 appears to have introduced an actual improvement that will make my job here a lot easier: Open XML.
See, previously a Word document was stored in a proprietary binary format, so generating an actual Word document was nearly impossible, especially for those of us who don't use Microsoft programming languages. Word did, however, do a nice job of taking my HTML document (saved with a .doc extension) and making it look just like a Word document when opened in Word. There were some issues here, though (Save As... "Web Page"??, and no support for headers, footers, watermarks, etc.), which made it unattractive as a final format. In other words, it didn't add anything I couldn't already do with plain HTML.
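For comparison, the old trick amounts to something like the sketch below: write an ordinary HTML page to disk and give it a .doc extension. The function name and the minimal page wrapper are my own illustration, not a Microsoft API. Word recognizes the HTML when it opens the file, but what's on disk is still plain HTML.

```python
import html
from pathlib import Path


def write_html_as_doc(body_html: str, out_path: str, title: str = "Document") -> Path:
    """Save an HTML fragment as a complete HTML page under a .doc file name.

    Word renders the file like a document when opening it, but the bytes on
    disk are still ordinary HTML, not Word's own format.
    """
    page = (
        "<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>"
        f"<body>{body_html}</body></html>"
    )
    path = Path(out_path)
    path.write_text(page, encoding="utf-8")
    return path


# Example (the file name is a placeholder):
#   write_html_as_doc("<h1>Quote</h1><p>Thanks for your order.</p>", "quote.doc")
```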
Word 2007, however, has changed that landscape quite a bit. DOCX files (the new Word 2007 document format) are now just .zip files! Unzip one and you'll find the insides quite interesting: heavy use of XML, plus relationship files that point one XML part to another within the tiered directory structure inside your Word document. This structured format is called Office Open XML, or Open XML for short.
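To see this for yourself, a few lines of Python with the standard zipfile module are enough to list the parts and follow the relationships out of document.xml. This is a sketch that assumes a normal Word-generated package, where document.xml's relationships live in word/_rels/document.xml.rels; the function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by every OPC .rels file.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_docx_parts(docx_path: str) -> list[str]:
    """Return the sorted names of all parts inside a .docx (which is a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as zf:
        return sorted(zf.namelist())


def document_relationships(docx_path: str) -> dict[str, str]:
    """Map each relationship Id used by document.xml to the part it points at."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    return {rel.get("Id"): rel.get("Target") for rel in root.iter(f"{REL_NS}Relationship")}


# Example (the file name is a placeholder):
#   for name in list_docx_parts("template.docx"):
#       print(name)
#   print(document_relationships("template.docx"))
```

On a document with a header and footer, expect parts such as [Content_Types].xml, _rels/.rels, word/document.xml, word/styles.xml, word/header1.xml and word/footer1.xml, with the relationship map pointing Ids at targets like header1.xml.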
The old "mystery encoding" is gone, and we're left with something humans can understand... for the most part :-) The document.xml file (the body of the actual document), however, still describes your content in Word's own XML vocabulary (WordprocessingML), which would leave me in much the same position as writing to FPDF or some other non-HTML structure.
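To give a feel for that vocabulary, here's a sketch that builds a single bold paragraph the way document.xml spells it, roughly what HTML says with `<p><b>Hello</b></p>`. It uses Python's xml.etree.ElementTree; the helper names are hypothetical, and in a real document.xml paragraphs like this sit inside `<w:document><w:body>`.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace, printed with the usual "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def bold_paragraph(text: str) -> str:
    """Build the markup for one paragraph (w:p) holding one bold run (w:r)."""
    p = ET.Element(w("p"))
    run = ET.SubElement(p, w("r"))
    props = ET.SubElement(run, w("rPr"))  # run properties: formatting lives here
    ET.SubElement(props, w("b"))          # bold
    t = ET.SubElement(run, w("t"))        # the actual text
    t.text = text
    return ET.tostring(p, encoding="unicode")
```

Calling bold_paragraph("Hello") gives a w:p containing a w:r, whose w:rPr holds w:b and whose w:t holds the text: four levels of nesting for what HTML does with two tags.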
So Treff and I set out to figure out how Microsoft could move to an open standard like this, and support converting HTML to its native format (for display inside Word), without giving us some way to feed it HTML, say, by embedding the HTML inside the docx file. Were they really that... unhelpful?
Turns out, the answer is no... they weren't! It's actually quite easy to take normal HTML and merge it into a Word document with headers, footers and watermarks: those elements stay intact, while the body of the document becomes whatever HTML you added. AND... once the file is opened, Word converts the HTML into its document.xml "voodoo" syntax, and you can save it as a normal .docx document, in Word 2003/XP format... OR as a PDF!
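The example I originally linked for this isn't reproduced here, so treat the following as a sketch of one mechanism that matches the description: the w:altChunk element from the Open XML spec, which tells Word to import another part (here, an HTML file) into the body when the document is opened. It takes three edits: an altChunk reference in document.xml, a relationship of type aFChunk in word/_rels/document.xml.rels, and a content type for .html in [Content_Types].xml. The Id rIdHtmlBody and the part name body.html (stored as word/body.html) are hypothetical, and the code assumes Word-generated XML that declares the r: prefix and contains no tracked section changes.

```python
import re

# Relationship type for "alternative format import" parts (what altChunk points at).
AFCHUNK_REL = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"
)


def add_html_chunk(document_xml: str, rels_xml: str, content_types_xml: str,
                   rel_id: str = "rIdHtmlBody",
                   target: str = "body.html") -> tuple[str, str, str]:
    """Return patched (document.xml, document.xml.rels, [Content_Types].xml) text.

    rel_id and target are hypothetical defaults; target is resolved relative to
    the word/ folder, so the HTML file itself belongs at word/body.html.
    """
    # 1. Reference the HTML part from the body. The body-level w:sectPr (which
    #    points at the headers and footers) must stay the last child of w:body,
    #    so the chunk goes just before it. Assumes no w:sectPrChange, which
    #    would nest another w:sectPr inside it.
    section_tags = list(re.finditer(r"<w:sectPr[\s>/]", document_xml))
    if section_tags:
        anchor = section_tags[-1].start()
    else:
        anchor = document_xml.rfind("</w:body>")
    if anchor == -1:
        raise ValueError("document.xml has no w:body element")
    chunk = f'<w:altChunk r:id="{rel_id}"/>'
    document_xml = document_xml[:anchor] + chunk + document_xml[anchor:]

    # 2. Point the relationship Id at the HTML part.
    if "</Relationships>" not in rels_xml:
        raise ValueError("unexpected document.xml.rels layout")
    rel = f'<Relationship Id="{rel_id}" Type="{AFCHUNK_REL}" Target="{target}"/>'
    rels_xml = rels_xml.replace("</Relationships>", rel + "</Relationships>", 1)

    # 3. Declare the content type for .html parts (only once per package).
    if 'Extension="html"' not in content_types_xml:
        default = '<Default Extension="html" ContentType="text/html"/>'
        content_types_xml = content_types_xml.replace("</Types>", default + "</Types>", 1)

    return document_xml, rels_xml, content_types_xml
```

Because the chunk lands ahead of the final w:sectPr, which holds the header and footer references (Word keeps the watermark in the header part), the template's headers, footers and watermark are left alone. Paragraphs already in the template body are kept too, so a template with an empty body ends up containing just the imported HTML.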
It took me a while to get all the pieces to come together, but at this point the process of creating a PDF with headers, footers and a watermark from HTML input involves:
- Create your headers/footers/watermark in Word 2007, save as DOCX
- Unzip the document
- Follow this example for adding HTML to it
- Save all files, zip them back up, and change the extension to .docx
- Open the document in Word
- Save as PDF
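The unzip and re-zip steps are easy to script. Here's a minimal sketch using Python's zipfile and pathlib, with hypothetical function names; adding the HTML happens between the two calls. The main gotcha it handles is zipping the folder's contents rather than the folder itself, so that [Content_Types].xml sits at the root of the archive, where Word expects it.

```python
import zipfile
from pathlib import Path


def unzip_docx(docx_path: str, folder: str) -> None:
    """Extract every part of the template .docx into `folder`."""
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(folder)


def zip_folder_as_docx(folder: str, out_path: str) -> None:
    """Zip the (edited) parts back into a file Word can open as .docx.

    Archive names are taken relative to `folder`, so [Content_Types].xml lands
    at the root of the archive instead of under an extra top-level directory.
    Keep out_path outside `folder` so the output is not zipped into itself.
    """
    root = Path(folder)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    # Put [Content_Types].xml first, mirroring Word's own output; sort is
    # stable, so the remaining parts keep their sorted order.
    files.sort(key=lambda p: p.relative_to(root).as_posix() != "[Content_Types].xml")
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=path.relative_to(root).as_posix())


# Example run (file and folder names are placeholders):
#   unzip_docx("template.docx", "work")
#   ...add work/word/body.html and patch the XML parts...
#   zip_folder_as_docx("work", "letter.docx")
```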
If only PDF had an open standard like that, we could skip all the remaining silly steps.