Monitoring the Potential for Afghan Election Fraud: Leveraging Open Data for Transparency
September 1st, 2009 by Sean Gorman
In the course of working with Todd Huffman on the Afghanistan elections we came across the Afghan Independent Election Commission (IEC) results page. As results have been posted we’ve kept a running tally at each release, and have updated the results on our dashboard accordingly. Working with each release (10%, 17.2%, 35% and 47.8% reporting) we started to notice some irregularities in the numbers. In some polling locations the number of counted ballots actually decreased. This led us to believe there could be instances where ballot stuffing had been caught and the vote tally decreased. The two maps below illustrate where vote tallies have decreased for the two main candidates, Karzai and Abdullah:
The areas in blue are provinces whose vote totals decreased between the two reporting periods. While overall vote totals increased as the count went from 35% to 47.8% reporting, the totals in these locations decreased.
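The comparison behind these maps can be sketched in a few lines of Python. This is only an illustration of the idea, not the code we ran: the function name `find_decreases`, the dictionary layout, and the province names and numbers are all made up for the example and are not IEC data.

```python
def find_decreases(earlier, later):
    """Return {province: (earlier_count, later_count)} where the tally fell.

    Both arguments map a province name to one candidate's cumulative vote
    count in a given release. Provinces missing from either release are
    skipped rather than guessed at.
    """
    return {
        name: (earlier[name], later[name])
        for name in earlier
        if name in later and later[name] < earlier[name]
    }


# Hypothetical counts for one candidate in two consecutive releases.
release_35 = {"Province A": 12000, "Province B": 8000, "Province C": 500}
release_48 = {"Province A": 15000, "Province B": 7400, "Province C": 500}

decreases = find_decreases(release_35, release_48)
print(decreases)  # {'Province B': (8000, 7400)}
```

Since each release is cumulative, any province that shows up in the result is worth a closer look: a running total should only go down if previously counted ballots were removed.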
At the same time the news has been rife with reports of voter fraud in Afghanistan. We began to wonder if there was a statistical method we could use to analyze the potential for fraud in the data we’d collected from the IEC. Raj had played with Benford’s law when we were mapping Iranian election data, and it looked like the right approach for an Afghan analysis. Benford’s law states “that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way.” Benford’s law is often used as a technique to detect fraud because artificially altering numbers would disrupt the specific non-uniform distribution.
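For readers who want to try this themselves, here is a minimal Python sketch of a first-digit test. Under Benford’s law the expected share of leading digit d is log10(1 + 1/d), so about 30.1% of numbers should start with 1 and only about 4.6% with 9. The function names are hypothetical, and the Pearson chi-square statistic used here is just one common way to score the deviation; it is an assumption for illustration and not necessarily the measure behind our maps.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """First decimal digit of a non-zero integer count."""
    return int(str(abs(int(n)))[0])


def benford_deviation(counts):
    """Pearson chi-square distance between observed and expected digits.

    Zero counts are dropped because they have no leading digit.
    Larger values mean the counts look less Benford-like.
    """
    digits = [leading_digit(c) for c in counts if int(c) != 0]
    if not digits:
        raise ValueError("need at least one non-zero count")
    n = len(digits)
    tally = Counter(digits)
    expected = benford_expected()
    return sum(
        (tally.get(d, 0) - n * expected[d]) ** 2 / (n * expected[d])
        for d in range(1, 10)
    )


print(round(benford_expected()[1], 3))  # 0.301

# Powers of two are a textbook Benford-like sequence; numbers that all
# start with 9 are about as far from Benford as it gets.
benford_like = [2 ** k for k in range(1, 200)]
all_nines = [900 + i for i in range(100)]
print(benford_deviation(benford_like) < benford_deviation(all_nines))  # True
```

To apply it to election returns, you would pass in the list of vote counts for one candidate within one area and compare the score across areas. With only a handful of counts per area the statistic is noisy, so small samples deserve extra caution.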
Specifically, Benford’s law was used in a paper by Roukema to detect potential fraud in the 2009 Iranian elections. That use has not been without contention, and has been critiqued here and here. The criticism centered on the distribution of votes not being a power law, and thus Benford’s law not applying. In this case expected voter turnout per province has an 80% match to a power law distribution.
While not perfect, it does make a good case for Benford’s law being applicable as a predictor. An additional caveat: only 47.8% of the vote has been totaled thus far. In some areas the vote count is far lower, some below 10%. This can skew the statistics, so this is all preliminary analysis and should be caveated as such. The map of Benford coefficients is below:
The darker orange areas are locations that have high Benford coefficients, which could indicate a higher probability of potential fraud.
As discussed above, there are tons of caveats to this analysis and it should be viewed with the utmost skepticism. Until all the results are in I’d put more weight on the maps showing vote counts decreasing, which at this point are a better indicator of potential fraud. There is some visual correlation between the two maps. A good follow-on analysis correlating these two, plus looking at correlations with votes for the two leading candidates, could be intriguing as well. All of that will need to wait until we get a full vote count, but there is no doubt that opening the data up creates many opportunities.
The biggest takeaway for me is that open data truly does create opportunities for better transparency. The amount and level of granularity of the Afghan election data has been great. It truly opens the door for better transparency around the elections and the government that will be put into power by them. In that spirit we’ve uploaded our own analysis plus the IEC data to GeoCommons for other folks to use and question.
December 22nd, 2009 at 6:45 pm
I read this interesting presentation about the databrowser of the Afghan election.
What do you make of it ?
http://developmentseed.org/blog/2009/dec/17/opening-afghanistans-election-data-open-source-data-browser
Did you know that http://www.iec.org.af/ is now password protected and that Development Seed had to process PDF files?
April 29th, 2010 at 3:29 pm
[...] Affected by DHD (Data Hugging Disorder)? Get cured at Beer4Data tonight at FourCourts in Arlington at 7PM. Following the success of the Beer 4 Data program in Jalalabad, Afghanistan we want to encourage those of you in the US to also share your data. In Afghanistan’s elections in 2009, the Beer 4 Data program provided valuable information in monitoring and security. [...]
June 8th, 2010 at 1:26 pm
[...] FortiusOne we have a history of mapping elections. Previously in this blog Sean Gorman discussed “Leveraging Open Data for Transparency” during the 2009 Afghanistan elections. This blog also looked at rates of violence and its [...]
June 9th, 2010 at 12:37 pm
[...] you can remix, reuse and share the data. Data and the government agency that supplies it are not transparent if you can’t download the raw data. PDF’s and web services don’t count. They can be useful [...]