ODF
Auto-generate ODF documents
Marco Fioretti demonstrates how you can create the same office files that you can make with LibreOffice – but without LibreOffice!
Part One!
OUR EXPERT
Marco Fioretti
is a long-time open source trainer and writer, and an aspiring polymath.
Credit: https://groups.oasis-open.org
QUICK TIPS
Whatever format they are in, if you are going to process lots of files with shell scripts, give them names without spaces. It will make the scripts more robust, and easier to write or debug.
So-called office documents (the collective name for richly formatted text files, spreadsheets and presentations) are a necessary evil of modern life. Creating such files without any programming skill is easy with suites such as LibreOffice. However, using such tools to manually create large numbers of documents, spreadsheets or slideshows can be really time-consuming. The same goes for creating many versions of the same document, each with different values of one or more variables.
This two-part tutorial introduces a quick and dirty, but very flexible and general, approach to this very problem, based on the OpenDocument Format (ODF) for office documents and really simple scripts. The approach is important for two reasons. First, as you will see shortly, the data inserted in office documents with this method can be text (including whole files) and images of all sorts, generated on the fly or automatically extracted from all conceivable sources, from databases to email archives or scraped web pages. Second, it is simple, which places it well within the reach of anybody with little time but a basic knowledge of scripting languages.
The first part of this tutorial explains what ODF is and why it’s made to order for the automatic creation of large quantities of similar documents. It then demonstrates, with a practical example, how to create from the command line multiple text documents identical to those you could have produced with LibreOffice, Microsoft Office and similar programs. The second part will demonstrate how to extend the same approach to spreadsheets and presentations.
Internal ODF structure
To make a long, fascinating and very important story short (see the ODF history boxout, page 59, for more), ODF is the native format of LibreOffice and OpenOffice for texts, spreadsheets and slideshows, and a format that Microsoft would have loved to destroy. You may think that a format suitable for such sophisticated programs would be very complex, and that ODF files would be black boxes full of indecipherable bits. That’s not true, and this is the source of ODF’s relevance. In a nutshell, and despite its extension, any ODF file is nothing more than a standard ZIP archive of a few folders and files, most of them written in the very verbose but plain-text eXtensible Markup Language, or XML for short.
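Because an ODF file is an ordinary ZIP archive, any ZIP-aware tool can look inside it. The following is a minimal sketch in Python, using only the standard zipfile module, of how you might list the members of such an archive and read one of them as text. The function names are our own, chosen for illustration. The member name content.xml is where ODF packages keep the document body, a detail not covered in the text above. To keep the sketch runnable without a real document, it builds a tiny stand-in archive in memory; that stand-in is not a valid ODF file, only a ZIP with the same kind of layout.

```python
import io
import zipfile


def list_odf_members(source):
    """Return the names of all files stored inside a ZIP-based container.

    `source` can be a path to a file or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def read_odf_member(source, member="content.xml"):
    """Return one member of the archive, decoded as UTF-8 text."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(member).decode("utf-8")


def make_stand_in_archive():
    """Build a tiny in-memory ZIP that mimics an ODF layout.

    ASSUMPTION: this is a demonstration container only. It is NOT a
    complete or valid ODF document.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", "<?xml version='1.0'?><doc>Hello ODF</doc>")
    return buffer


if __name__ == "__main__":
    demo = make_stand_in_archive()
    for name in list_odf_members(demo):
        print(name)
    print(read_odf_member(demo))
```

To try it on a real document, pass a path instead of the in-memory buffer, for example list_odf_members("report.odt"), where report.odt is a hypothetical file saved from LibreOffice.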