### The Scacco/Beber analysis of the Iranian election is bogus

OK, I wasn't going to write another blog entry about the 2009 Iranian election, but the article in the Washington Post that supposedly gives statistical evidence for vote fraud just won't die in the blogosphere and just got a boost from a tweet by Tim O'Reilly.

The trouble is the analysis is bogus.

The authors propose a simple hypothesis: the last and second-to-last digits of vote counts should be random. In statistical terms this is often called uniformly distributed, which just means that they are each equally likely. So you'd expect to see 10% 0s, 10% 1s, 10% 2s, and so on.

Of course, you only expect to see that if you had an infinite number of vote counts because the point about random processes is that they only 'even out' to the expected probabilities in the long run. So if you've got a short run of numbers you have to be careful because they won't actually be exactly uniform.

To confirm that try tossing a coin six times. Did it come up with exactly 3 heads and 3 tails? Probably not, but that doesn't mean it's unfair.
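If you'd rather not toss the coin, the odds can be worked out directly. This is a minimal sketch, assuming a fair coin and independent tosses; the helper names `choose` and `prob_heads` are made up for the illustration.

```perl
use strict;
use warnings;

# Number of ways to choose $k items out of $n (the binomial coefficient).
sub choose {
    my ($n, $k) = @_;
    my $r = 1;
    $r = $r * ($n - $_ + 1) / $_ for 1 .. $k;
    return $r;
}

# Probability of exactly $k heads in $n tosses of a fair coin.
sub prob_heads {
    my ($n, $k) = @_;
    return choose($n, $k) / 2**$n;
}

printf "P(exactly 3 heads in 6 tosses) = %.4f\n", prob_heads(6, 3);
```

That comes to 20 out of 64 equally likely outcomes, or 0.3125, so a perfectly fair coin fails to split 3 and 3 more than two times out of three.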

Now, given some run of numbers (vote counts for example), the right thing to do is ask the statistical question "Could these numbers have occurred from a random process?" If they couldn't then you can go looking for some other reason (e.g. fraud).

The assumption that the answer to that question is yes (the numbers did come from a random process) is given the ugly name the 'null hypothesis' by stats-heads. That just means the thing you are testing.

More concretely, the Scacco/Beber null hypothesis is "the last and second-to-last digits in the vote counts are random". What you want to know is with what confidence you can reject this, and for Scacco/Beber rejecting it means fraud.

Now, what you don't do is go count the last and second-to-last digits, look for some that have counts that deviate from what you expect (the exactly 10% figure) and then try to work out how often that happens. That's like tossing a coin a few times, noticing that heads has come up more than 50% of the time and then starting to think the coin is biased.

Unfortunately, that's essentially what Scacco/Beber did. They picked on two numbers that lay outside their expected value and went off to calculate how frequently that would occur. That's cherrypicking the data.

What you do do is apply a chi-square test to figure out whether the numbers you are seeing could have been generated by a random process. And you use that test because it gives you the probability with which you can reject your null hypothesis.
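For reference, the statistic in question is the standard Pearson chi-square for a uniform expectation. The symbols here are introduced only for this note: $O_d$ is the number of times digit $d$ was observed, $N$ is the total number of vote counts examined, and $E$ is the count expected for each digit if all ten are equally likely.

$$
\chi^2 = \sum_{d=0}^{9} \frac{(O_d - E)^2}{E}, \qquad E = \frac{N}{10}
$$

The further the observed counts stray from $E$, the larger $\chi^2$ gets; the test asks whether it is larger than chance alone would plausibly produce.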

To prevent you, dear reader, from having to run the test I've done it for you. I took their data and wrote a little program to do the calculation against the last and second-to-last digits. Here's the program:

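```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new();

# Tallies of how often each digit 0..9 appears as the last (%la)
# and second-to-last (%sl) digit of a vote count.
my %la;
my %sl;
foreach my $i (0..9) {
    $la{$i} = 0;
    $sl{$i} = 0;
}

my $count = 0;

open my $in, '<', 'i.csv' or die "Cannot open i.csv: $!";
while (<$in>) {
    chomp;
    $csv->parse($_);
    my @cols = $csv->fields();

    # Columns 1..4 of each row are read as vote counts.
    for my $i (@cols[1..4]) {
        # Reverse the digits so $d[0] is the last and $d[1] the second-to-last.
        my @d = reverse split( //, $i );
        $la{$d[0]}++;
        $sl{$d[1]}++;
        $count++;
    }
}
close $in;

print "Count: $count\n";

# Under the null hypothesis each digit is expected count/10 times.
my $e = $count/10;
my $slchi = 0;
my $lachi = 0;
foreach my $i (0..9) {
    print "$i,$e,$sl{$i},$la{$i}\n";
    $slchi += ( $sl{$i} - $e ) * ( $sl{$i} - $e ) / $e;
    $lachi += ( $la{$i} - $e ) * ( $la{$i} - $e ) / $e;
}

print "slchi: $slchi\n";
print "lachi: $lachi\n";
```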

Here's a little CSV table that you can steal to do your own analysis:

```
Digit,Expected Count,Second-to-last Count,Last Count
0,11.6,10,9
1,11.6,9,11
2,11.6,15,8
3,11.6,6,9
4,11.6,11,10
5,11.6,11,5
6,11.6,14,14
7,11.6,18,20
8,11.6,13,17
9,11.6,9,13
```

And true enough I get the same figures as Scacco/Beber. The number 7 does occur 17% of the time in the last digit, and the number 5 only occurs 4% of the time. But I don't care. What I want to know is: is the null hypothesis wrong? Could these results have occurred from a random process, and with what likelihood?
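Those percentages can be checked straight from the last-digit column of the CSV table. A small sketch, with the counts typed in by hand from that table and `digit_shares` being a name invented for the example:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Last-digit counts for digits 0..9, copied from the CSV table above.
my @last = (9, 11, 8, 9, 10, 5, 14, 20, 17, 13);

# Convert raw counts into percentages of the total.
sub digit_shares {
    my @counts = @_;
    my $total  = sum(@counts);
    return map { 100 * $_ / $total } @counts;
}

my @share = digit_shares(@last);
printf "digit %d: %4.1f%%\n", $_, $share[$_] for 0 .. 9;
```

With 116 vote counts in total, 7 comes out at 20/116 (about 17.2%) and 5 at 5/116 (about 4.3%).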

So here's where I avoid staring at the numbers (which can get to be borderline numerology) and do the chi-square test.

For the last digit the magic chi-square number is (drum roll, please): 15.55 and for the second-to-last digit it's 9.34. Then I go to my chi-square table and I look at the row for 9 degrees of freedom (that corresponds to the 10 possible digits; if you want to know why it's 9 and not 10 go read up on the subject) and I see that the critical value at the 5% significance level is 16.92.

If either of my numbers exceeded 16.92 then I'd have high confidence (greater than 95%) that the digit counts were not random. But neither does. I cannot with confidence reject the null hypothesis, I cannot with confidence say that these numbers are not random, and I cannot with confidence, therefore, conclude that the vote counts are fraudulent.

What this means is that there is no 'statistically significant' difference between the Iranian results and randomness. So, what we learn is that this statistical analysis tells us nothing.

It doesn't mean that the numbers weren't fiddled, it just means that we haven't found evidence of fiddling.
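If you don't have the raw vote counts to hand, the comparison can be reproduced from the CSV table alone. This is a sketch rather than the program above: the counts are typed in from the table, `chi_square` is a name chosen for the example, and 16.92 is the critical value quoted in the text.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Digit counts for digits 0..9, copied from the CSV table above.
my @last        = (9, 11, 8, 9, 10, 5, 14, 20, 17, 13);
my @second_last = (10, 9, 15, 6, 11, 11, 14, 18, 13, 9);

# Pearson chi-square statistic against a uniform expectation.
sub chi_square {
    my @obs = @_;
    my $e = sum(@obs) / scalar(@obs);    # expected count per digit
    return sum( map { ( $_ - $e )**2 / $e } @obs );
}

my $critical = 16.92;    # 9 degrees of freedom, 5% level (from the text)

for my $case ( [ 'last', \@last ], [ 'second-to-last', \@second_last ] ) {
    my ( $name, $counts ) = @$case;
    my $chi = chi_square(@$counts);
    printf "%-15s chi-square = %.2f: %s\n", $name, $chi,
        $chi > $critical
        ? 'reject the null hypothesis'
        : 'cannot reject the null hypothesis';
}
```

Both statistics land below the critical value, so neither digit position gives grounds for rejecting randomness at the 5% level.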

PS In the notes added to their annotated version of the article Scacco/Beber mention that they did the chi-square test and got a p-value of 0.077. This is above the 'statistical significance' cut-off of 0.05 and so their results are (as I find) not statistically significant.

30 June 2009 Update: I've removed a paragraph that tried to put the 0.077 p-value in context, because that interpretation of the p-value is arguably inaccurate and if you are a statistician you'd probably shout at me about it. Doesn't change the fact that the data says the Iranian result is not statistically significant; it just says that my attempt to do a 'layman's version' is faulty.

To come up with a better layman's version I ran a little simulation to find out how often you'd expect to see one digit occurring more than 17% of the time with another occurring less than 4% of the time (as in the Iranian election). The answer is about 1.48% of the time, or in about 1 in 67 fair elections.
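The details of that simulation aren't spelled out here, so what follows is only a sketch of how one could be set up, not a reconstruction of it. It assumes 116 digits per 'election' (the total in the CSV table), draws each digit uniformly at random, and counts a trial as a hit when the most common digit exceeds 17% and the least common falls below 4%. The name `simulate_extremes`, the seed and the trial count are all made up for the example, and because the criteria are guessed the figure it prints need not match 1.48%.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Fraction of simulated fair elections in which some digit's share is
# above $hi_frac while some other digit's share is below $lo_frac.
sub simulate_extremes {
    my ( $n_digits, $hi_frac, $lo_frac, $trials ) = @_;
    my $hits = 0;
    for ( 1 .. $trials ) {
        my @count = (0) x 10;
        $count[ int( rand(10) ) ]++ for 1 .. $n_digits;
        $hits++
            if max(@count) / $n_digits > $hi_frac
            && min(@count) / $n_digits < $lo_frac;
    }
    return $hits / $trials;
}

srand(2009);    # fixed seed so repeated runs agree (arbitrary choice)
my $fraction = simulate_extremes( 116, 0.17, 0.04, 20_000 );
printf "Fraction of fair elections with both extremes: %.4f\n", $fraction;
```

Changing the thresholds, or requiring the extremes in a particular digit position, will move the answer around, which is one more reason to lean on the chi-square test rather than on any single hand-picked pattern.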

Labels: rants and raves

## 3 Comments:

Thanks for following up on this; I did some similar analysis here and show a counterexample here.

The Post should correct the article for simply having bad math (0.0015 instead of 0.005) and at least publish critical letters given the faulty logic. It's really disheartening to see so many seemingly reasonable, accredited folks passing it on uncritically. I'm attributing that to the underlying assumption that the election was a sham (which I suspect) and any analysis that points in that direction must be correct.

If I were the head of some dodgy government and I wanted to fiddle with the election results, I wouldn't just make up numbers. I would take a few boxes with ballots in each district and count every ballot from that box as if it had a vote for my favourite candidate. That's just one of many ways of fiddling with the results that is very hard to detect by statistical methods.

Following up on Zach's comment, you don't have to assume the election was legitimate to find problems with this analysis. A nice comment on this at:

http://www.analyticpolitics.org/2009/06/devil-is-in-statistics.html
