Part One!
Don’t miss next issue, subscribe on page 16!
OUR EXPERT
Marco Fioretti
is a long-time open source trainer and writer, and an aspiring polymath.
Credit: https://groups.oasis-open.org
QUICK TIPS
Whatever format they are in, if you are going to process lots of files with shell scripts, give them names without spaces. It will make the scripts more robust, and easier to write or debug.
So-called office documents, the collective name given to complex text files with rich formatting, spreadsheets and presentations, are a necessary evil of modern life. Creating such files without any programming skill is easy with suites such as LibreOffice. However, using such tools to manually create large numbers of documents, spreadsheets or slideshows can be really time-consuming. Ditto for creating many versions of the same document, each with different values of one or more variables.
This two-part tutorial introduces a quick and dirty, but very flexible, general approach to this very problem, based on the OpenDocument Format (ODF) for office documents and really simple scripts. The approach matters for two reasons. First, as you will see shortly, the data inserted in office documents with this method can be text (including whole files) and images of all sorts, generated on the fly or automatically extracted from almost any source, from databases to email archives or scraped web pages. Second, it is simple, which places it well within the reach of anybody with little time but a basic knowledge of scripting languages.
The first part of this tutorial explains what ODF is and why it’s made to order for the automatic creation of large quantities of similar documents. It then demonstrates, with a practical example, how to create from the command line multiple text documents identical to those you could have produced with LibreOffice, Microsoft Office and similar programs. The second part will demonstrate how to extend the same approach to spreadsheets and presentations.
Internal ODF structure
To make a long, fascinating and very important story short (see the ODF history boxout, page 59, for more), ODF is the native format of LibreOffice and OpenOffice for texts, spreadsheets and slideshows, which Microsoft would have loved to destroy. You may think that a format suitable for such sophisticated programs would be very complex, and that ODF files would be black boxes full of indecipherable bits. That’s not true, and this is exactly what makes ODF so relevant. In a nutshell, and despite its file extension, any ODF file is nothing more than a standard ZIP archive of a few folders and files, for the most part formatted in the very verbose but plain text eXtensible Markup Language, or XML for short.
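To see the "ZIP archive of XML files" idea in action, here is a minimal sketch in Python, using only the standard library. It builds a tiny ODF-style package in memory, lists its members and reads the paragraphs straight out of content.xml. The function names and the sample text are made up for illustration, and the sample package is deliberately incomplete: a real file saved by LibreOffice also carries entries such as META-INF/manifest.xml, styles.xml and meta.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Media type of an ODF text document, stored in the "mimetype" entry.
MIMETYPE = "application/vnd.oasis.opendocument.text"

# Stripped-down content.xml: an illustrative stand-in, not a complete document.
CONTENT_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:p>Hello, ODF</text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
)


def build_sample_package() -> bytes:
    """Return the bytes of a tiny ODF-style ZIP package (hypothetical sample)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ODF packages keep "mimetype" as the first entry, uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", CONTENT_XML)
    return buf.getvalue()


def list_members(package: bytes) -> list:
    """List the files inside the package, in archive order."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def paragraphs(package: bytes) -> list:
    """Extract the text of every <text:p> element found in content.xml."""
    ns = {"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0"}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text for p in root.iterfind(".//text:p", ns)]
```

Passing the bytes of a real .odt file to list_members() should show the same kind of listing, only longer, which is a quick way to confirm that nothing in there is a black box.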
Figure 1: A simple ODF text document template, created with LibreOffice, that contains only the placeholders for the text and images to be replaced automatically.
To learn how ODF works in the simplest and shortest possible way, let’s start from Figure 1 (above), which shows the template file created with LibreOffice as the starting point for our practical example. It’s a one-page text, containing just placeholders for a title, a paragraph of text, and finally an image, which for consistency is a screenshot of an IMAGE_HERE string, but could be anything you want, as long as it’s an image. The example will show how fewer than 30 lines of code can clone that file countless times, replacing TITLE_HERE, TEXT_HERE and the image with different texts and images each time.
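The cloning idea can be sketched in a few lines of Python before looking at the real script. This is not the tutorial's own code, only an illustration under stated assumptions: the template is a simplified in-memory stand-in for the file in Figure 1, the names make_template() and fill_template() are hypothetical, and each placeholder is assumed to appear as one unbroken string in content.xml (office suites sometimes split typed text across several XML elements, in which case a plain search and replace would miss it).

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Simplified stand-in for the content.xml of the template in Figure 1.
TEMPLATE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:p>TITLE_HERE</text:p>'
    '<text:p>TEXT_HERE</text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
)


def make_template() -> bytes:
    """Build a tiny ODF-style template package in memory (hypothetical sample)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" goes first and uncompressed, as in real ODF packages.
        zf.writestr(
            "mimetype",
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        zf.writestr("content.xml", TEMPLATE_XML)
    return buf.getvalue()


def fill_template(template: bytes, values: dict) -> bytes:
    """Return a copy of the package with the placeholders in content.xml replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy every member in its original order, editing only content.xml.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                for placeholder, value in values.items():
                    # Escape &, < and > so the result is still well-formed XML.
                    text = text.replace(placeholder, escape(value))
                data = text.encode("utf-8")
            # Keep each member's original compression (stored for "mimetype").
            dst.writestr(info.filename, data, compress_type=info.compress_type)
    return out.getvalue()
```

Calling fill_template() in a loop, once per set of values, and writing each result to its own file (with a name without spaces, as the Quick Tip suggests) yields as many documents as there are data sets. An embedded image could in principle be swapped in the same loop, by substituting the bytes of the archive member that holds it.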