On 4 June 1996, the maiden flight of the Ariane 5 launcher didn’t go well. About 40 seconds after take-off, the massive rocket suddenly veered from its flight path and exploded. The cause was a tiny software error: a floating-point number represented using 64 bits was converted to a 16-bit signed integer, but the conversion failed because the number was larger than 32,767, the maximum that a 16-bit signed integer can represent. This overflow error shut down the rocket’s inertial reference system, which then sent out debugging data that the flight computer treated as real flight data and used to steer the rocket’s engines. The backup unit did no better, as it ran the same software and failed in the same way, with the result that the rocket lost control and came to a fiery end.
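To make the failed conversion concrete, here is a minimal Python sketch of squeezing a 64-bit float into a 16-bit signed integer. It is not the Ariane flight software: the function names and the value 40000.0 are hypothetical, chosen only because the value is larger than 32,767.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a 16-bit signed integer


def to_int16_checked(x: float) -> int:
    """Convert x to a 16-bit signed integer, refusing values that don't fit."""
    n = int(x)  # drop the fractional part (truncates toward zero)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in 16 bits")
    return n


def to_int16_wrapped(x: float) -> int:
    """Convert x the unchecked way: keep only the low 16 bits."""
    n = int(x) & 0xFFFF  # discard everything above bit 15
    return n - 0x10000 if n >= 0x8000 else n  # reinterpret as signed


value = 40000.0  # hypothetical reading, larger than 32,767

print(to_int16_wrapped(value))  # -25536: silently wrong

try:
    to_int16_checked(value)
except OverflowError as err:
    print("conversion failed:", err)
```

Neither outcome is good on its own. The unchecked version quietly produces nonsense, while the checked version raises an error that something still has to handle sensibly.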
In 2015 it was reported that tests had revealed a similar overflow error that could cut the electrical power on Boeing 787 aircraft if their generator control units were left switched on for 248 days continuously. Under these circumstances their software counters reached 2,147,483,647, the maximum value for a 32-bit signed register. Turning the units off and on again resets the counter so that they work again and, luckily, the flaw never led to a disaster in the way the much faultier software of the 737 Max did three years later.
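The 248-day figure can be checked with a little arithmetic. The sketch below assumes the counter ticks 100 times a second; that rate is an assumption, not something stated above, but it is the one that makes the numbers agree. The names are hypothetical.

```python
INT32_MAX = 2**31 - 1      # 2,147,483,647
TICKS_PER_SECOND = 100     # assumption: one tick every hundredth of a second
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_overflow(ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Days of continuous running before a 32-bit signed counter is full."""
    return INT32_MAX / ticks_per_second / SECONDS_PER_DAY


def increment_int32(counter: int) -> int:
    """Add 1 the way a 32-bit signed register does, wrapping at the top."""
    counter = (counter + 1) & 0xFFFFFFFF  # keep only 32 bits
    return counter - 2**32 if counter >= 2**31 else counter


print(f"{days_until_overflow():.2f} days")  # 248.55 days
print(increment_int32(INT32_MAX))           # -2147483648
```

One tick past the maximum, the counter jumps to a large negative number, which is why a counter that simply keeps counting is a hazard in a system that is never switched off.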
While overflow errors like these are similar to rounding errors, there’s a subtle difference. A rounding error arises not because a number is too big, but because it can’t be stored exactly in a fixed number of binary digits. For example, the results of some calculations are irrational numbers, like the number pi (3.14159265…). Its digits never end, so we have to approximate its value, perhaps as just 3.142. Even the result of a simple calculation such as 2/3 can’t be written down precisely in decimal, and in binary it may have to be stored as the equivalent of 0.667. Continue to perform calculations like this and the tiny errors accumulate until they add up to something significant.
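A short Python sketch shows how such errors pile up. It uses exact fractions to compare 2/3 with its rounded stand-in 0.667, then repeats the idea with ordinary binary floating-point numbers. The function name and the step counts are illustrative choices.

```python
from fractions import Fraction

exact = Fraction(2, 3)
rounded = Fraction(667, 1000)  # 2/3 rounded to three decimal places


def accumulated_error(steps: int) -> Fraction:
    """Add the exact and the rounded value `steps` times and return the gap."""
    total_exact = Fraction(0)
    total_rounded = Fraction(0)
    for _ in range(steps):
        total_exact += exact
        total_rounded += rounded
    return total_rounded - total_exact


print(accumulated_error(1))     # 1/3000, a tiny error
print(accumulated_error(3000))  # 1, no longer tiny

# The same effect in binary floating point: 0.1 cannot be stored exactly,
# so adding it ten times does not give exactly 1.0.
total = sum([0.1] * 10)
print(total == 1.0)  # False
```

Each individual error is only 1/3000, but after 3,000 additions the running total is off by a whole unit.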