Showing posts with label microsoft. Show all posts

Sunday, January 01, 2012

Merging Word documents with docx4j

Recently I needed to merge some Word .docx documents, and the tool we chose for this was docx4j (www.docx4java.org), a Java library for working with Microsoft Open XML. There were two things that we needed to accomplish:

  1. Binding XML to various templates
  2. Merging documents as a result of the binding

I will write about merging documents, as binding is already explained well on the docx4j site. The author of docx4j also offers a commercial package for merging documents, but if you want to try it yourself, here are a couple of things that I managed to do with pretty decent results.

In docx4j version 2.7.1 you can work with java.io.File or java.io.InputStream. The first does a good job if the file is present on your drive, the second if you keep the content in a database (for example). When merging Word documents you have to take care of the relationships in the document itself. Several kinds of elements have relationships that can span the whole document, but we were interested in just a few of them (images, footers and headers). These are listed as resources and have references that you can use in your paragraphs. It is worth mentioning that if you miss one relationship, your document will be unreadable (in most cases).
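To make the idea of relationships more concrete, here is a minimal sketch that lists the image, header and footer relationships of a document read from an InputStream. It deliberately uses only the JDK (Java 9 or later), not docx4j, and relies on the fact that a .docx file is a zip archive whose main relationships live in word/_rels/document.xml.rels. The class name DocxRelationships and the sampleArchive helper are made up for this illustration; the sample archive is not a valid Word document, just enough to exercise the parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DocxRelationships {
    static final String RELS_PART = "word/_rels/document.xml.rels";
    static final String RELS_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";

    /** Returns relationship id -> target for image, header and footer relationships. */
    static Map<String, String> read(InputStream docx)
            throws IOException, ParserConfigurationException, SAXException {
        Map<String, String> found = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!RELS_PART.equals(entry.getName())) {
                    continue;
                }
                // Buffer the part first so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                NodeList rels = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml))
                        .getElementsByTagNameNS(RELS_NS, "Relationship");
                for (int i = 0; i < rels.getLength(); i++) {
                    Element rel = (Element) rels.item(i);
                    String type = rel.getAttribute("Type");
                    if (type.endsWith("/image") || type.endsWith("/header")
                            || type.endsWith("/footer")) {
                        found.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
                    }
                }
            }
        }
        return found;
    }

    /** Hypothetical stand-in for a real file: a zip holding only a relationships part. */
    static byte[] sampleArchive() throws IOException {
        String base = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";
        String xml = "<Relationships xmlns=\"" + RELS_NS + "\">"
                + "<Relationship Id=\"rId1\" Type=\"" + base + "styles\" Target=\"styles.xml\"/>"
                + "<Relationship Id=\"rId2\" Type=\"" + base + "image\" Target=\"media/image1.png\"/>"
                + "<Relationship Id=\"rId3\" Type=\"" + base + "header\" Target=\"header1.xml\"/>"
                + "<Relationship Id=\"rId4\" Type=\"" + base + "footer\" Target=\"footer1.xml\"/>"
                + "</Relationships>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(RELS_PART));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // With a real document, pass a FileInputStream (or a stream from the database).
        read(new ByteArrayInputStream(sampleArchive()))
                .forEach((id, target) -> System.out.println(id + " -> " + target));
    }
}
```

The ids returned here (rId2 and so on) are what the document body uses to point at an image, header or footer, which is why a merge that copies content without its relationship tends to produce a file Word cannot open.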

So, to start we would:

  1. Load our initial file into a WordprocessingMLPackage (this is the file to which we want to attach the rest of the files, so that in the end they look like one document)
  2. Create a unique section template
  3. Reset the sections (this removes all references from the existing template; remember that a section defines the page layout)
  4. Remove the body section (we can add it back at the end)
  5. Loop through the attachment files (if you do not have sections separating pages, you might add page breaks)
  6. Copy the relationships that you are interested in
  7. Copy the elements
  8. If you do not want page breaks, you can add an empty section
  9. Add the body section
  10. Reapply all headers and footers to the empty sections
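Steps 6 and 7 are where things usually go wrong, because relationship ids such as rId1 are only unique within one document, so the ids of an attached file can clash with ids already used in the initial file. The sketch below illustrates one way to handle that: give every copied relationship a fresh id and rewrite the references in the copied XML. It is plain Java (9 or later) working on XML strings, not docx4j code; with docx4j you would work on its object model instead. RelationshipRemapper and its methods are names invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RelationshipRemapper {
    // Matches r:embed="..." (images) and r:id="..." (e.g. header and footer references).
    private static final Pattern REF = Pattern.compile("(r:(?:embed|id)=\")([^\"]+)(\")");

    private final Set<String> usedIds;
    private int counter = 1;

    RelationshipRemapper(Set<String> idsAlreadyInTarget) {
        this.usedIds = new HashSet<>(idsAlreadyInTarget);
    }

    /** Returns an id of the form rIdN that is not yet used in the target document. */
    String nextFreeId() {
        String id;
        do {
            id = "rId" + counter++;
        } while (!usedIds.add(id)); // add() returns false when the id is already taken
        return id;
    }

    /** Assigns a fresh target id to every source id; returns old id -> new id. */
    Map<String, String> remap(Iterable<String> sourceIds) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String oldId : sourceIds) {
            mapping.put(oldId, nextFreeId());
        }
        return mapping;
    }

    /** Rewrites references in one pass; ids missing from the mapping are left untouched. */
    static String rewrite(String xml, Map<String, String> mapping) {
        Matcher m = REF.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String newId = mapping.getOrDefault(m.group(2), m.group(2));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + newId + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The initial file already uses rId1 and rId2; the attachment uses the same ids.
        RelationshipRemapper remapper =
                new RelationshipRemapper(new HashSet<>(Arrays.asList("rId1", "rId2")));
        Map<String, String> mapping = remapper.remap(Arrays.asList("rId1", "rId2"));
        String copied = "<a:blip r:embed=\"rId1\"/>"
                + "<w:headerReference w:type=\"default\" r:id=\"rId2\"/>";
        System.out.println(mapping);
        System.out.println(rewrite(copied, mapping));
    }
}
```

The rewrite is done in a single pass on purpose: replacing ids one after another could turn rId1 into rId3 and then rewrite that same rId3 again if the attachment also had an rId3 of its own.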

This all might sound complicated, but in the end, once you get to know the structure of the WordprocessingMLPackage, it becomes easier.

These are the code snippets that might be useful:

Note: All code displayed in the window above is the property of Sapiens North America