### Last digits analysis of UK and Iranian elections

After all the Benford's Law posts I've made, I read an interesting article by Alexandra Scacco and Bernd Beber in the Washington Post about analyzing the last two digits of election results.

Their thesis is that the last two digits should be random (i.e. equally likely) in genuine results, and would be non-random in faked results because people have biases about which numbers they come up with when thinking of 'random' numbers.

Scacco and Beber have an annotated version of the article available and their actual paper.

I thought this was interesting enough that I decided to run my own analysis on the 2001 UK General Election and the 2009 Iranian Presidential Election. Scacco and Beber used a smaller set of Iranian data (the first set that I used), so I ran my reanalysis against the larger set from per-country returns.

Start with the UK. The following chart shows the expected distribution of last and second-to-last digits (i.e. a uniform distribution: all digits are equally likely) and the actual counts. A quick application of the chi-squared test shows that there's a good fit: we can't reject the hypothesis that the UK digits are uniformly distributed (i.e. random).
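For anyone who wants to try this at home, here is a minimal sketch of how the digit counts behind such a chart could be produced. The function name `digit_counts` and the sample numbers are my own inventions for illustration (they are not election data), and I'm assuming that vote counts too short to have a second-to-last digit are simply skipped.

```python
from collections import Counter

def digit_counts(vote_counts, position=1):
    """Count how often each digit 0-9 appears at `position` from the right.

    position=1 is the last digit, position=2 the second-to-last digit.
    Numbers too short to have a digit at that position are skipped
    (an assumption made for this sketch).
    """
    place = 10 ** (position - 1)
    counts = Counter()
    for n in vote_counts:
        if n >= place:
            counts[(n // place) % 10] += 1
    # Return a list indexed by digit so missing digits show up as 0.
    return [counts[d] for d in range(10)]

# Made-up numbers purely for illustration, not real election returns.
sample = [12345, 678, 9012, 45, 7]
last = digit_counts(sample, position=1)
second_to_last = digit_counts(sample, position=2)
print(last)
print(second_to_last)
```

With real returns you would pass in every candidate's vote count for every constituency or district.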

Now switch to Iran. Once again I show the exact same analysis of the vote counts across all candidates across the country. And once again the chi-squared test gives no reason to reject the hypothesis that the digits are random.

Both digits look uniformly distributed, and the chi-squared test is consistent with that: the actual values are 11.125 for the last digit and 4.875 for the second-to-last digit. With 9 degrees of freedom the critical cut-off point (at the 5% significance level) is 16.92, and neither value exceeds it, so we cannot reject the hypothesis that the Iranian digits are uniformly distributed.
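The test itself is only a few lines. What follows is a sketch of the standard chi-squared goodness-of-fit statistic against a uniform distribution, not the exact code I ran; the `observed` counts are hypothetical, and the only number taken from this post is the 16.92 cut-off for 9 degrees of freedom.

```python
def chi_squared_uniform(observed):
    """Chi-squared statistic for observed counts against a uniform distribution."""
    total = sum(observed)
    expected = total / len(observed)  # every digit is equally likely
    return sum((o - expected) ** 2 / expected for o in observed)

# Critical value for 9 degrees of freedom (10 digits minus 1) at the 5% level.
CRITICAL_9DF = 16.92

# Hypothetical digit counts for digits 0..9, for illustration only.
observed = [12, 8, 10, 9, 11, 10, 7, 13, 10, 10]

stat = chi_squared_uniform(observed)
reject_uniform = stat > CRITICAL_9DF
print(f"chi-squared = {stat:.3f}, reject uniformity: {reject_uniform}")
```

A statistic below the cut-off does not prove the digits are random; it only means this test gives no grounds to reject uniformity.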

It's an intriguing idea that just by looking at the numbers it would be possible to detect election fraud, but it equally seems to me that you could cherry-pick your data to support whatever viewpoint you already hold.

For example, in my analysis the UK election is not Benford's Law distributed but the Iranian one is. Which is fraudulent? Either, both, neither?
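As a reminder of what the Benford's Law comparison involves: the expected share of numbers whose first digit is d is log10(1 + 1/d). A short sketch of those expected frequencies follows; the helper names are mine, and this shows only the expected distribution, not the analysis of either election.

```python
import math

def benford_expected(d):
    """Expected proportion of numbers whose leading digit is d (1..9) under Benford's Law."""
    return math.log10(1 + 1 / d)

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

probs = {d: benford_expected(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, round(p, 3))
```

Unlike the last-digit test, the expected distribution here is far from uniform: a leading 1 should turn up about 30% of the time.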

Also, my analysis shows that both the UK and the Iranian elections have randomly distributed last digits. Is either fraudulent? Or neither?

I think what's needed is a large scale analysis of election results to see where and when different mathematical tests work. Otherwise the correct preconditions aren't established (e.g. in the UK election Benford's Law probably fails because of the redistribution of constituencies) and you can end up finding your favorite conclusion in the data.

Labels: pseudo-randomness

## 3 Comments:

Even if you used the same set of data, you wouldn't find fraud using your method. You, correctly, test the null hypothesis "the last digit is picked from a uniform distribution." They, incorrectly, see which numbers show up more and less frequently than others, and then calculate the probability of that occurrence. Identifying a rare event in a random sequence is simple, and it's trivial to think of dozens of events that are equivalent to the one observed in this article.

If you look at the second-to-last digit in their US 2008 data set you'll see how ridiculous the article is.

I think the Iranian election was tainted with some degree of fraud; I'm sure this article is either fraudulent or totally misguided.

Check the comments section on that article in the Washington Post. Your point, and others as well, are made. Alack and alas, no one seems to care. The article is spreading all over the planet without anyone actually reading what those with some knowledge have to say.

What's sad is that "some knowledge" is a generous description when it comes to me. I've never taken a course in statistics; this is a textbook example of how it's dangerous for an investigator to "know" the answer. Send this article to anyone who's ever applied a statistical test (and had a clue what they were doing) with just the numbers and the political context stripped out and it would be instantly recognized for what it is.
