Word To PDF

Why isn’t there a good, simple command line doc2pdf application? I just can’t find a command line program that faithfully produces a PDF document from a Word document. There are many commercial and some open source applications that can create a PDF document, but none of them is a simple command line tool. For example, PDFCreator is an open source application that creates a PDF document from Word when you ‘print’ the document to a virtual PDFCreator printer. Several commercial Word to PDF solutions work the same way, by installing a ‘printer’ that prints a document as a PDF. This approach is really a hack: it exploits the fact that a document sent to a printer has to be transformed into what is essentially PostScript. Once you have a document in PostScript form, you can create a PDF using Adobe Acrobat Distiller or Ghostscript’s ps2pdf.cmd batch file.
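
ps2pdf itself is only a thin wrapper that runs Ghostscript with its pdfwrite output device. The following Java sketch shows that final PostScript-to-PDF step as the kind of small command line tool I am asking for. It assumes Ghostscript is installed and that its console executable is on the PATH under the name you pass in (typically gs on Unix, gswin64c or gswin32c on Windows); the class name Ps2Pdf is made up for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of what ps2pdf does: run Ghostscript with the pdfwrite device
 * to turn a PostScript file into a PDF. The executable name is an
 * assumption supplied by the caller.
 */
final class Ps2Pdf {

    private Ps2Pdf() {}

    /** Builds the Ghostscript argument list without running it. */
    static List<String> buildCommand(String gsExecutable, String psFile, String pdfFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-dSAFER");               // restrict file access by the PostScript program
        cmd.add("-dBATCH");               // exit when done instead of waiting for input
        cmd.add("-dNOPAUSE");             // do not pause between pages
        cmd.add("-sDEVICE=pdfwrite");     // select the PDF output device
        cmd.add("-sOutputFile=" + pdfFile);
        cmd.add(psFile);
        return cmd;
    }

    /** Runs Ghostscript and returns its exit code (0 on success). */
    static int convert(String gsExecutable, String psFile, String pdfFile)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(gsExecutable, psFile, pdfFile))
                .inheritIO()              // show Ghostscript's messages on this console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Ps2Pdf input.ps output.pdf");
            System.exit(2);
            return;
        }
        // Hardcoded Unix name; on Windows pass "gswin64c" or "gswin32c" instead
        System.exit(convert("gs", args[0], args[1]));
    }
}
```

For example, `java Ps2Pdf report.ps report.pdf`. buildCommand is kept separate from convert so the argument list can be inspected or logged without actually launching Ghostscript.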

PDFCreator does not provide a nice command line interface, but it is easy to get past that limitation with a little Visual Basic. You can write a short VBScript that opens a Word document, sets the default printer to PDFCreator, and ‘prints’ the document, letting PDFCreator create the PDF for you. You will probably want to edit PDFCreator’s auto-save options; otherwise you will be prompted for where to save the new PDF. Here is some sample VBScript code that does just what I described above.

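' Usage: cscript doc2pdf.vbs C:\path\to\document.doc
Const wdDoNotSaveChanges = 0

' Resolve the command line argument to a full path before handing it to Word
Set fso = CreateObject("Scripting.FileSystemObject")
sMyDocumentFile = fso.GetAbsolutePathName(WScript.Arguments(0))

Set word = CreateObject("Word.Application")
Set docs = word.Documents

' Remember the current active printer (a string, so no Set)
sPrevPrinter = word.ActivePrinter

' Select PDFCreator as your printer
word.ActivePrinter = "PDFCreator"

' Open the Word document
Set doc = docs.Open(sMyDocumentFile)

' Print the document to PDFCreator; False disables background printing,
' so the job is fully spooled before the document is closed
doc.PrintOut False

doc.Close wdDoNotSaveChanges
word.ActivePrinter = sPrevPrinter
word.Quit wdDoNotSaveChanges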
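
Because the conversion is driven from the command line, converting a whole folder is just a loop around cscript. The Java sketch below assumes the VBScript above has been saved as doc2pdf.vbs in the working directory and that it reads the document path from its first argument (WScript.Arguments(0)); the class name Doc2PdfBatch and the script file name are hypothetical. //NoLogo is the standard cscript switch that suppresses its banner.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical batch driver: runs the doc2pdf.vbs script once for every
 * Word file in a folder.
 */
final class Doc2PdfBatch {

    private Doc2PdfBatch() {}

    /** Builds: cscript //NoLogo <script> <document> */
    static List<String> buildCommand(String scriptPath, String documentPath) {
        return Arrays.asList("cscript", "//NoLogo", scriptPath, documentPath);
    }

    /** True for .doc and .docx file names, ignoring case. */
    static boolean isWordFile(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".doc") || lower.endsWith(".docx");
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a directory: " + dir);
            System.exit(2);
            return;
        }
        for (File f : files) {
            if (!f.isFile() || !isWordFile(f.getName())) {
                continue;
            }
            // One conversion at a time: every run shares Word and the PDFCreator printer
            Process p = new ProcessBuilder(buildCommand("doc2pdf.vbs", f.getAbsolutePath()))
                    .inheritIO()
                    .start();
            int exit = p.waitFor();
            System.out.println(f.getName() + " -> exit code " + exit);
        }
    }
}
```

Run it as `java Doc2PdfBatch C:\docs`. The conversions run one after another rather than in parallel because each run switches Word’s active printer and feeds the same PDFCreator printer.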

For completeness’ sake, let me mention how to convert a Word document to PDF using the Apache POI project. POI itself does not produce PDF; instead, you use it to create an XSL-FO version of your document, which Apache FOP can then transform into a PDF. In my experience the results generated by POI are not perfect, but here is some code to get you started. The POI scratchpad jar contains a WordDocument class that creates an XSL-FO version of a Word document. WordDocument may have been intended only as a command line application, because it throws a NullPointerException when you use it from your own code, so you will have to modify the class. Once you fix the exception, the following two lines produce an XSL-FO file for a given Word document:

WordDocument file = new WordDocument(wordDocumentPath);
file.closeDoc();
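
If you have not worked with XSL-FO before: it is plain XML that describes page layouts and the content that flows into them. The Java sketch below builds the smallest useful XSL-FO document for a list of plain-text paragraphs. It only illustrates the structure FOP expects (a layout-master-set with a page master, then a page-sequence whose flow holds fo:block elements) and is not a reproduction of POI’s WordDocument output; the class name MinimalFo and the A4 page settings are my own choices.

```java
import java.util.List;

/**
 * Illustration only: the minimal XSL-FO skeleton for a list of
 * plain-text paragraphs, one fo:block per paragraph.
 */
final class MinimalFo {

    private MinimalFo() {}

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toFo(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n");
        // A single A4 page master; every page in the sequence uses it
        sb.append("  <fo:layout-master-set>\n");
        sb.append("    <fo:simple-page-master master-name=\"A4\" page-height=\"297mm\"")
          .append(" page-width=\"210mm\" margin=\"20mm\">\n");
        sb.append("      <fo:region-body/>\n");
        sb.append("    </fo:simple-page-master>\n");
        sb.append("  </fo:layout-master-set>\n");
        sb.append("  <fo:page-sequence master-reference=\"A4\">\n");
        sb.append("    <fo:flow flow-name=\"xsl-region-body\">\n");
        for (String p : paragraphs) {
            // Each paragraph becomes one block-level area
            sb.append("      <fo:block>").append(escape(p)).append("</fo:block>\n");
        }
        sb.append("    </fo:flow>\n");
        sb.append("  </fo:page-sequence>\n");
        sb.append("</fo:root>\n");
        return sb.toString();
    }
}
```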

Once you have the XSL-FO version of your document, you can transform it into a PDF using Apache FOP. One word of warning: the WordDocument class lives in the scratchpad jar, which holds POI’s less mature code, so it might not be as stable as you would expect.
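
The FOP distribution ships a command line launcher (fop on Unix, fop.bat on Windows) that takes the XSL-FO input with -fo and the PDF output with -pdf. This Java sketch wraps that call; it assumes FOP is installed and its launcher is on the PATH, and the class name FoToPdf is hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: render an XSL-FO file to PDF by calling the Apache FOP
 * command line launcher.
 */
final class FoToPdf {

    private FoToPdf() {}

    /** Builds: <launcher> -fo <input.fo> -pdf <output.pdf> */
    static List<String> buildCommand(String fopLauncher, String foFile, String pdfFile) {
        // -fo names the XSL-FO input, -pdf the PDF output
        return Arrays.asList(fopLauncher, "-fo", foFile, "-pdf", pdfFile);
    }

    /** Runs FOP and returns its exit code (0 on success). */
    static int render(String fopLauncher, String foFile, String pdfFile)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(fopLauncher, foFile, pdfFile))
                .inheritIO()              // show FOP's log output on this console
                .start();
        return p.waitFor();
    }
}
```

For example, `FoToPdf.render("fop", "report.fo", "report.pdf")` returns 0 when FOP succeeds. FOP can also be embedded in-process through its Java API (a FopFactory creating a Fop whose default handler receives the output of a JAXP Transformer), which avoids starting a new JVM per document; the command line route is used here only to keep the sketch free of library dependencies.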