A couple of years ago I found myself working somewhere I’d previously worked about seven years before that, and in close proximity to some software that I’d written then, which was still being used, essentially unchanged from the day I left. Which was cool, but slightly disconcerting — it’s not as if the software couldn’t have done with some spring cleaning in the intervening years.
Anyway, one day, one of the people using the software came to me with a strange problem — as if, even after seven years away, I was naturally the one to come to first. The job of the software was to process data from an earth-observation satellite, generating products that scientists could then do science with; the products were typically very large, multi-dimensional images or otherwise gridded data, so handling the storage of these products on output media was an issue. I’d set up a simple but reasonably robust system whereby the software would spread the products across a bank of available hard-drives, leaving enough space on each for the other needs of the system.
Some huge new multi-Terabyte drives had just been installed, but the system refused to write any products to them, insisting that they were already full. I did some poking around, and realised, with a combination of amazement and amusement, that the drives were simply too big for the machines the software was running on to handle. There are typically limits, imposed by a combination of the hardware and the operating system, to how large a number a system can represent, without either resorting to using floating-point arithmetic (with a loss of precision in very large numbers), or implementing some very clever software hack. Since these were still 32-bit machines, the largest integer that they could comfortably represent was (2 to the power of 32) minus 1, which is a bit more than 4 billion.
But even this range wasn’t available to me, because reaching 4 billion means giving up the ability to represent negative numbers. That trade is worth making when you know that what you’re representing is always going to be non-negative, like the available free space on a hard-drive. The system routine I was calling to figure out how much space was available on each drive (the only way to do this job) was returning a signed integer as its result, ranging, potentially, from about -2 billion to about +2 billion.
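To put exact figures on those two ranges, here is a minimal sketch in Python. It assumes the usual two’s-complement layout for signed integers, which the original system may or may not have used, and the constant names are purely illustrative.

```python
BITS = 32

# Unsigned: every bit pattern counts upward from zero.
UNSIGNED_MAX = 2**BITS - 1

# Signed (assuming two's complement): half the bit patterns are
# spent on negative numbers, so the positive range is halved.
SIGNED_MIN = -(2**(BITS - 1))
SIGNED_MAX = 2**(BITS - 1) - 1

print(f"unsigned 32-bit: 0 .. {UNSIGNED_MAX:,}")
print(f"signed 32-bit:   {SIGNED_MIN:,} .. {SIGNED_MAX:,}")
```

The unsigned maximum comes out at 4,294,967,295 and the signed maximum at 2,147,483,647, which are the "about 4 billion" and "about 2 billion" of the story.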
I remember all of us gathering around the first Gigabyte hard-drives we got hold of, at the time the software was first written, cooing and marvelling at how huge they were. The size of these state-of-the-art hard-drives was so far within what the system could address that it was inconceivable (in the Princess Bride sense of the word) that any larger drives we might use later would ever exceed it. And yet here we were. Brand-new, multi-Terabyte hard-drives were appearing to the system as full, because it couldn’t represent properly how big they were.
The problem is called ‘overflow’. It happens when a number gets too big to be represented, and literally overflows in some way the space that’s allocated for it. Exactly how the overflow manifests itself depends on how the number is being represented. When it’s a signed integer, and it gets too large and positive (as was the case here), the number overflows into the topmost bit of the word (the chunk of memory) it’s stored in. Since that topmost bit is typically used as a sign-bit, which identifies whether the number is positive or negative, integer overflow often causes the number to flip from large and positive to large and negative. That’s what happened with the new drives; they were empty, yet they appeared to have a large and negative amount of free space on them, because their size couldn’t be represented in a signed 32-bit integer without overflow.
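The sign flip is easy to reproduce. Python’s own integers never overflow, so this sketch imitates a signed 32-bit word by hand, again assuming two’s complement. The helper `to_int32`, the free-space figure and the reserve are all hypothetical, invented for illustration; the original routine and its units aren’t specified here.

```python
def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF        # keep only the bits that fit in the word
    if value >= 0x80000000:    # top (sign) bit set: the word reads back as negative
        value -= 0x100000000
    return value

# Hypothetical numbers: free space on an empty drive, and the space
# that must be left over for the rest of the system.
true_free = 3_000_000_000
reserve = 1_000_000

reported = to_int32(true_free)
print(reported)                 # a large negative number
print(reported > reserve)       # False: the empty drive looks full
```

Here 3,000,000,000 reads back as -1,294,967,296, so any check of the form "is there more free space than the reserve?" fails, even though the drive is empty.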
Since the transition from Kilo to Mega, Mega to Giga, Giga to Tera and so on, is only ever a matter of time in digital circles, you might see our failure to plan for multi-Terabyte hard-drives as a kind of Y2K bug, and I think you’d be mostly right about that. (The solution we implemented, by the way, was to reformat/repartition the drives so that they appeared to be smaller than they actually were — small enough that their size was representable within the 32-bit words the system was built around.)
Courtesy of Danny O’Brien’s characteristically sarcastic account [scroll down to ‘The Voting Machines’] of this piece in the Palm Beach Post, here’s basically the very same effect, on a smaller scale: integer overflow in electronic voting machines. The reason that the overflow occurs at about 32,000, rather than about 4 billion — and the source of Danny’s sarcasm — is that the programmer had allocated only 16-bit words to the integer vote tallies, and (2 to the power of 15) minus 1 is 32,767. Why only 16-bit words, now that even 64-bit machines (with integer ranges somewhere in the billion billions) sit on desktops? No idea. It’s obviously spectacularly clueless programming — and would remain so even if the problem hadn’t been known as early as 2002.
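The same wrap-around at 16 bits can be sketched with a generic helper. This illustrates what a signed 16-bit counter does under the two’s-complement assumption; it is not a claim about how the actual voting-machine code behaved. The name `wrap_signed` and the starting tally are made up for the example.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a two's-complement signed integer."""
    value &= (1 << bits) - 1     # discard anything that doesn't fit in the word
    if value >> (bits - 1):      # sign bit set
        value -= 1 << bits
    return value

# Hypothetical tally sitting just below the 16-bit limit.
tally = 32_766
for _ in range(3):
    tally = wrap_signed(tally + 1, 16)
    print(tally)

# Largest signed value for a few common word sizes.
for bits in (16, 32, 64):
    print(bits, f"{2**(bits - 1) - 1:,}")
```

The tally goes 32,767, then -32,768, then -32,767. The last loop shows how much headroom wider words buy: the signed 64-bit maximum is 9,223,372,036,854,775,807.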
I don’t have the heart to try to present this in some eloquent form. Voting machines shouldn’t use software, because software fucks up, and software fucks up because people fuck up. Moreover, software hides fuck-ups within the density of its code and the blithe confidence with which it presents results both correct and incorrect, and treats those two impostors just the same. The solution isn’t software engineering rigour. It’s not more programmers, and it’s not more funding. The solution is to ditch machines entirely for this job. It matters too much to compromise.
So why are electronic voting machines becoming more and more popular? As with most questions one might ask about America, dig deep enough and the answer will be a simple one: money. No-one’s going to build a successful business from paper ballots and stubby pencils.