Localising Linux
LOCAL LINUX FOR LOCAL USERS
You’re not from around here, says Aaron Peters, but that’s fine, as he’s here to explain how your distro of choice gets to work in any language.
Linux is the result of the work of thousands of developers worldwide. It’s only right that it has support for the languages all those volunteers speak, to say nothing of its users. The multilingual question is how open source software supports not just translations for all the different languages of the world, but also their varied alphabets and notations. How do you represent decimal places or thousands separators, for instance?
Translation work is a vital aspect of enabling the take-up of open source software and Linux-based distros around the world. While English comprehension is often a perceived requirement for dev work, the truth is much of the world doesn’t speak English, let alone read it. So knowing how FOSS handles translations, more often called localisation, is key to growing wider adoption. Being a good open source citizen means being supportive of all these brethren, so let’s explore the localisation features of Linux!
The first of the interconnected components is character encoding. Since computers ultimately understand only zeros and ones, there needs to be a convention for converting groups of these zeros and ones into more meaningful characters. The character encoding is this convention.
Standards, standards, standards
The ASCII standard is one of the earliest encoding standards. In ASCII each group of seven bits represents a single character, or more correctly, a keypress. If you examine the characters in this encoding scheme, you’ll see common keystrokes such as Return (or Line Feed, 10 in ASCII) and Escape (27 in ASCII) alongside characters like uppercase M (77), lowercase g (103) and the closing brace: }, or 125 in ASCII.
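To check the codes quoted above for yourself, here is a minimal Python sketch. The helper name ascii_code is our own invention for illustration and isn't part of any library; it relies only on Python's built-in ord() function.

```python
def ascii_code(ch):
    """Return the 7-bit ASCII code for a single character (hypothetical helper)."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return code

# The keystrokes and characters mentioned in the text.
samples = [
    ("Line Feed", "\n"),
    ("Escape", "\x1b"),
    ("M", "M"),
    ("g", "g"),
    ("}", "}"),
]

for name, ch in samples:
    code = ascii_code(ch)
    # Show the decimal code next to its seven-bit binary pattern.
    print(f"{name:>9}: {code:3d} = {code:07b}")
```

Each line of output pairs a decimal code with the seven zeros and ones that stand for it, which is the whole job of a character encoding.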
So how does this pertain to localising your Linux system? Well, consider that ASCII only contains the lower- and uppercase forms of the 26 “standard” Roman letters, along with some specific punctuation. But you may speak a language that uses diacritics, such as Spanish. This in turn requires more entries in the standard to represent the letter ‘a’ both with its accent marks (á and à) and without them (a).
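The limitation is easy to demonstrate. The following Python sketch assumes nothing beyond the standard library, and fits_in_ascii is a made-up helper name used purely for illustration. It shows that plain ‘a’ encodes as ASCII while the accented forms do not, and that an eight-bit encoding with extra entries can hold them in a single byte.

```python
def fits_in_ascii(text):
    """Return True if every character in text has an ASCII code (hypothetical helper)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

for letter in ["a", "á", "à"]:
    verdict = "fits in ASCII" if fits_in_ascii(letter) else "needs a bigger encoding"
    print(f"{letter}: {verdict}")

# An eight-bit encoding has room for 128 more entries than ASCII,
# so the accented letter fits in one byte here.
print("á".encode("iso-8859-1"))  # b'\xe1'
```

The accented letters are rejected only because ASCII has no entry for them, not because there is anything unusual about the characters themselves.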
The need for these extra characters resulted in a large number of standards within the ISO/IEC 8859 series (commonly written ISO-8859), which primarily cover the languages of Europe. Some examples are as follows:
ISO-8859-1 covers the majority of Romance languages, Scandinavian languages and Celtic tongues, as well as some languages of Africa and Asia.
ISO-8859-2 supports Eastern European languages based on the Latin alphabet, such as Polish, Czech and Hungarian.