The Word Is POI
POI stands for Poor Obfuscation Implementation. The POI subproject HSSF (Horrible Spread Sheet Format) can access MS Excel files, and HDF (Horrible Document Format), you guessed it, can access Word documents. I am working on a project and researching what POI can do for me given a Word document. I didn’t find a lot of examples online, so I decided to look into the POI source code. Using the WordDocument class from the org.apache.poi.hdf.extractor package, found in the poi-scratchpad 2.5.1.jar file, I was able to write out the contents of a Word document as plain text:
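import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.Writer;
import org.apache.poi.hdf.extractor.WordDocument;

// Open the Word file and dump all of its text to text.txt
WordDocument doc = new WordDocument("word.doc");
Writer writer = new BufferedWriter(new FileWriter("text.txt"));
doc.writeAllText(writer);
writer.close(); // close() also flushes the buffer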
You can also print all the content of the Word document as an XSL-FO file, except that you first have to fix a null pointer exception. The code for this is horrible to read, with few or no comments. In the end, what I want to do is generate a custom XML file from a Word document. I was able to hack together what I wanted, but I had to refactor the hell out of this code.
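As a rough illustration of the custom XML idea, here is a minimal sketch that wraps plain text in XML elements. It is not the refactored POI code; it uses only the JDK's XMLStreamWriter and assumes the extracted text has one paragraph per line, which may not hold for every document. The class name TextToXml and the element names document and para are made up for this example.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of POI.
class TextToXml {

    // Wraps each non-empty line in a <para> element under a <document> root.
    // Both element names are assumptions chosen only for this sketch.
    static String toXml(List<String> lines) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("document");
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue; // skip blank lines between paragraphs
            }
            xml.writeStartElement("para");
            xml.writeCharacters(text); // escapes <, > and & in the text
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for lines read back from the plain-text file
        List<String> lines = Arrays.asList("First paragraph", "", "a < b & c");
        System.out.println(toXml(lines));
    }
}
```

In practice the lines would come from the text file written in the earlier snippet, for example by reading it with a BufferedReader. Going through plain text loses all formatting, so this only helps when the XML needs the words and not the styles.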

Dear friend,
If you found a solution for fixing the null pointer exception, please help me with it.
@Amol - After investigating POI, we decided that we really didn’t want to go that route. The null pointer is obvious if you run WordDocument; just step through it with the debugger. Sorry, this wasn’t helpful.
This is excellent; the material you have given is very nice and understandable.
Please send me sample code for converting a JSP page to a Word document.