Word To PDF

Why isn’t there a good, simple command line doc2pdf application? I just can’t find any command line program that faithfully produces a PDF document from a Word document. Plenty of commercial and some open source applications can create a PDF, but none of them is a simple command line tool. For example, PDFCreator is an open source application that lets you create a PDF from Word by ‘printing’ the document to a virtual PDFCreator printer. Several commercial Word to PDF solutions do the same thing: they install a ‘printer’ that prints a document as a PDF. This approach is really a hack that exploits the fact that documents sent to the printer need to be transformed into what is essentially PostScript. Once you have a document in PostScript format you can create a PDF using Adobe Acrobat Distiller or Ghostscript’s ps2pdf.cmd batch file.
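As a rough sketch of that last PostScript-to-PDF step, a small Java wrapper could hand a PostScript file to Ghostscript's ps2pdf script. This assumes Ghostscript is installed and ps2pdf is on the PATH; the class name Ps2Pdf and its methods are made up for illustration. On Windows the batch file would likely have to be launched through cmd /c rather than directly.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs "ps2pdf input.ps output.pdf" as an external process.
class Ps2Pdf {

    // Build the argument list; the executable name is an assumption about
    // how Ghostscript is installed on the machine.
    static List<String> buildCommand(String executable, Path ps, Path pdf) {
        return Arrays.asList(executable, ps.toString(), pdf.toString());
    }

    // Start ps2pdf, pass its output through to our console, and
    // return its exit code (0 normally means success).
    static int convert(String executable, Path ps, Path pdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(executable, ps, pdf))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Ps2Pdf input.ps output.pdf");
            System.exit(2);
        }
        System.exit(convert("ps2pdf", Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Keeping the command construction in its own method makes it easy to swap in a full path to the Ghostscript script without touching the process handling.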

PDFCreator does not provide a nice command line interface, but that limitation is easy to get past with a little Visual Basic script. You can write a short script that opens a Word document, sets Word’s active printer to PDFCreator, and ‘prints’ the document, letting PDFCreator create the PDF for you. You will probably want to enable PDFCreator’s auto-save options; otherwise you will be prompted for where to save the new PDF. Here is some sample code that does just what I described above.

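' Word's wdDoNotSaveChanges constant is not defined in VBScript, so declare it
Const wdDoNotSaveChanges = 0

' Full path to the Word document, taken here from the first script argument
sMyDocumentFile = WScript.Arguments(0)

Set word = CreateObject("Word.Application")
Set docs = word.Documents

' Remember current active printer (a string, so no Set)
sPrevPrinter = word.ActivePrinter

' Select the PDFCreator as your printer
word.ActivePrinter = "PDFCreator"

' Open the Word document
Set document = docs.Open(sMyDocumentFile)

' Print the document file to the PDFCreator. Background = False makes
' the script wait for the print job before Word is closed.
document.PrintOut False

document.Close wdDoNotSaveChanges
word.ActivePrinter = sPrevPrinter
word.Quit wdDoNotSaveChanges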

For completeness’ sake, let me mention how to create a PDF document using the Apache POI project. Using POI you can create an XSL-FO version of your document, which can then be transformed into a PDF using Apache FOP. In my experience the results generated by POI are not perfect, but here is some code to get you started. The POI scratch pad jar contains a WordDocument class that will create an XSL-FO version of the Word document. The WordDocument class might have been intended to be just a command line application, because it throws a NullPointerException if you try to use it in your own code, so you will have to modify the class. Once you fix the exception, the following two lines produce an XSL-FO file for a given Word document:

WordDocument file = new WordDocument(wordDocumentPath);
file.closeDoc();

Of course, once you have the XSL-FO version of your document you can transform it to a PDF using Apache FOP. One word of warning: the WordDocument class lives in the scratch pad jar and might not be as stable as you would hope.
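One way to run that FOP step from Java, sketched here under the assumption that the fop launcher script from the Apache FOP distribution is on the PATH, is to call its command line (fop -fo input.fo -pdf output.pdf) as an external process. The class FoToPdf and its method names are hypothetical, and embedding FOP through its Java API would work just as well.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: converts an XSL-FO file to PDF via the fop command line.
class FoToPdf {

    // Derive the PDF name from the XSL-FO name, e.g. report.fo -> report.pdf.
    // A name without an extension simply gets ".pdf" appended.
    static Path pdfPathFor(Path fo) {
        String name = fo.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return fo.resolveSibling(base + ".pdf");
    }

    // Arguments for "fop -fo <input> -pdf <output>"; the executable name
    // is an assumption about the local FOP installation.
    static List<String> buildCommand(String fopExecutable, Path fo, Path pdf) {
        return Arrays.asList(fopExecutable, "-fo", fo.toString(), "-pdf", pdf.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: java FoToPdf document.fo");
            System.exit(2);
        }
        Path fo = Paths.get(args[0]);
        Process process = new ProcessBuilder(buildCommand("fop", fo, pdfPathFor(fo)))
                .inheritIO()
                .start();
        System.exit(process.waitFor());
    }
}
```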


4 Comments so far

  1. Shradha on April 19th, 2007

    Hi, I need to convert a Word document to a PDF document before saving it in a database as a BLOB. I am a Java developer, so I want to know whether there is any API or jar that I can use to fulfil my requirement.

  2. Clark on May 14th, 2007

    Hi, I am a French speaker, to begin with. I have seen your code to convert a Word document to PDF. My problem is that I need to convert an Excel chart to PDF automatically with VBS code via PDFCreator.

    I have done it (I have created the PDF), but I want to save it under a new name and in a new folder. The problem is that I don’t know how.

    My code
    ——-

    Dim xlapp, classeur, feuille, Doc, pdf
    
    'I open Excel
    Set xlapp = CreateObject("Excel.Application")
    
    'It stay visible
    xlapp.Visible = True
    
    'I open an Excel Sheet
    Set Doc = xlApp.WorkBooks.open("C:\path\to\file.xls")
    
    'I define the printer which will be use
    ActivePrinter="PDFCreator"
    
    'I print the graph1 in the sheet of the open workbook Doc
    Doc.Sheets("Graph1").PrintOut
    
    'I define the automatic tasks that will run after
    'printing: automatically find the save path and
    'automatically rename the created PDF
    
    'THE PROBLEM IS HERE. THIS DOES NOT WORK NORMALLY.
    'THE FILE IS NOT RENAMED.
    'THE FIRST TIME IT FINDS THE AUTOSAVEDIRECTORY,
    'BUT WHEN I CHANGE IT IT STAY IN THE OLD DIRECTORY.
    
    'Directory in which the file will be saved
    AutosaveDirectory = "C:\path\to\autosave\directory"
    ' Name under which the file will be saved
    AutosaveFilename = "impdf"
    UseAutosave = 1
    UseAutosaveDirectory = 1
    UseAutosaveFilename = 1
    AutosaveFormat = 0 ' PDF
    
    'Close and Exit the application
    Doc.Close
    xlApp.quit
    

    Thanks in advance for your help

  3. TechKnow on May 14th, 2007

    Clark - Thanks for sharing your experience/code with PDFCreator. So you still want to place the created PDF document in a new location? Why don’t you just move the file where you need it after you quit Excel? Can’t you dynamically set the autosave directory? Unless I have missed something, those seem like good solutions.

  4. aj on June 12th, 2007

    A great tool for PDF on the cheap I’ve found is PDF995. A command line tool is included in the copies I have used (the underlying workings of the project are based on the open source Ghostscript), and it makes for simple, fast, easy PDF creation and collation, which is a daunting task in my experience. As shareware, you can try before you buy with only a pop-up ad as a limitation… well worth a look. www.pdf995.com
