About the Author: Sean Gorman founded FortiusOne in 2005 to bring location-based analytics to the mass market. Sean brings over 10 years of experience at the forefront of the geospatial revolution as a researcher, practitioner, and entrepreneur. Through both academic and entrepreneurial efforts, he has been working since 1997 to make geographic data more accessible to the public, culminating in the creation of GeoCommons, a crowd-sourced repository of statistical data and social feeds that can be easily mapped, remixed and reused by non-technical users. Sean has been featured in media such as Wired, Der Spiegel, ABC, the Washington Post, Business 2.0, MSNBC, CBS and CNN. He also holds a PhD in Public Policy from George Mason University, where he was the Provost’s High Potential Scholar and the recipient of the Fischer Prize. He has published dozens of articles on geographic data sharing and analysis, and authored the book Networks, Complexity and Security: The Role of Public Policy in Critical Infrastructure Protection.


In the course of working with Todd Huffman on the Afghanistan elections, we came across the Afghan Independent Election Commission (IEC) results page. As results have been posted, we’ve kept a running tally at each release and updated the results on our dashboard accordingly. Working with each release (10%, 17.2%, 35% and 47.8% reporting), we started to notice some irregularities in the numbers: in some polling locations the number of counted ballots actually decreased. This led us to believe there could be instances where ballot stuffing had been caught and the vote tally reduced. The two maps below illustrate where vote tallies have decreased for the two main candidates, Karzai and Abdullah:

[Map: provinces where Karzai's vote tally decreased]

[Map: provinces where Abdullah's vote tally decreased]

The areas in blue are provinces whose vote totals decreased between the two reporting periods. While overall vote totals increased as the count went from 35% to 47.8% of votes, the totals in these locations went down.
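The post does not show how the decreases were spotted, but the check itself is simple: cumulative counts should only grow from one release to the next. Below is a minimal sketch, assuming each IEC release has been saved as a hypothetical nested dict of the form {location: {candidate: votes}}; the function name and data layout are illustrative, not the IEC's format.

```python
def find_decreases(earlier, later):
    """Compare two cumulative releases and list every tally that went down.

    Each release is assumed (hypothetically) to be a dict of the form
    {location: {candidate: votes}}.
    Returns sorted (location, candidate, earlier_votes, later_votes) tuples.
    """
    decreases = []
    for location, tallies in later.items():
        previous = earlier.get(location, {})
        for candidate, votes in tallies.items():
            before = previous.get(candidate)
            # Cumulative counts should only grow as more ballots are counted,
            # so any drop is a candidate for closer inspection.
            if before is not None and votes < before:
                decreases.append((location, candidate, before, votes))
    return sorted(decreases)
```

Run against the 35% and 47.8% snapshots, a function like this would list each location and candidate whose count fell between those two releases, which is the information the blue areas on the maps summarize.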

At the same time, the news has been rife with reports of voter fraud in Afghanistan. We began to wonder if there was a statistical method we could use to analyze the potential for fraud in the data we’d collected from the IEC. Raj had experimented with Benford’s law when we were mapping Iranian election data, and it looked like the right approach for an Afghan analysis. Benford’s law states “that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way.” Benford’s law is often used as a technique to detect fraud because artificially altered numbers tend to disrupt that specific non-uniform distribution.
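Concretely, Benford’s law says the share of numbers whose leading digit is d is log10(1 + 1/d), so roughly 30.1% of values should start with 1 and only about 4.6% with 9. The sketch below computes that expected distribution and the observed leading-digit shares for a list of vote counts; it assumes the counts are positive integers, and the function names are illustrative.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d = 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """First digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_shares(counts):
    """Observed share of each leading digit, ignoring zero or negative counts."""
    digits = [leading_digit(c) for c in counts if c > 0]
    tally = Counter(digits)
    total = len(digits)
    return {d: tally[d] / total if total else 0.0 for d in range(1, 10)}
```

Comparing the two dictionaries digit by digit shows where a set of counts departs from the Benford pattern; the expected shares sum to 1 because the logarithms telescope to log10(10).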

Specifically, Benford’s law was used in a paper by Roukema to detect potential fraud in the 2009 Iranian elections. The approach has not been without contention; it was critiqued here and here. The criticism centered on the argument that vote counts do not follow a power-law distribution, in which case Benford’s law would not apply. In this case, expected voter turnout per province has an 80% match to a power-law distribution.

[Chart: fit of expected voter turnout per province to a power-law distribution]
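The post does not say how the 80% match was measured. One quick (and admittedly crude) diagnostic is to sort the values, plot log(value) against log(rank), and take the R² of a least-squares line: a power law appears as a straight line on those axes. The sketch below is that assumed approach, not necessarily the one used here; more rigorous maximum-likelihood tests for power laws exist.

```python
import math


def loglog_fit(values):
    """Fit log(value) = a + b * log(rank) by least squares on a rank-size plot.

    Returns (slope b, R^2). A straight line on log-log axes (R^2 near 1)
    is consistent with, but does not prove, a power law.
    """
    ranked = sorted((v for v in values if v > 0), reverse=True)
    if len(ranked) < 3:
        raise ValueError("need at least three positive values")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        raise ValueError("all values are equal; the fit is undefined")
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared
```

An R² of 0.8 on per-province turnout would read as "mostly, but not cleanly, power-law shaped", which matches the hedged tone of the analysis.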

While not perfect, an 80% match makes a good case for Benford’s law being applicable as a predictor. A further caveat: only 47.8% of the vote has been counted so far, and in some areas the count is far lower, some below 10%. This can skew the statistics, so this is all preliminary analysis and should be treated as such. The map of Benford coefficients is below:


The darker orange areas are locations with high Benford coefficients, which could indicate a higher likelihood of fraud.
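The post does not define how its Benford coefficient was computed. As a stand-in, the sketch below scores each province with Pearson’s chi-square statistic, comparing the leading digits of its polling-station counts against Benford’s expected shares; the input layout {province: [station counts]} and all names are hypothetical. Larger scores mean a worse fit. The 15.51 threshold is the approximate 5% critical value for chi-square with 8 degrees of freedom.

```python
import math
from collections import Counter

# Benford's expected share for each leading digit 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Approximate 5% critical value of chi-square with 8 degrees of freedom.
CRITICAL_5PCT = 15.51


def benford_chi_square(counts):
    """Pearson chi-square statistic comparing the leading digits of the
    positive integer vote counts with Benford's law."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no positive counts to test")
    observed = Counter(digits)
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def score_provinces(station_counts):
    """Map province -> chi-square score, skipping provinces with no positive counts."""
    return {
        province: benford_chi_square(counts)
        for province, counts in station_counts.items()
        if any(c > 0 for c in counts)
    }
```

The chi-square approximation is only trustworthy when every expected cell count is at least about 5; since the digit 9 has an expected share of about 4.6%, that means roughly 110 or more stations per province, which reinforces the caveat about partially counted areas.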

As discussed above, there are tons of caveats to this analysis, and it should be viewed with the utmost skepticism. Until all the results are in, I’d put more weight on the maps showing vote counts decreasing, which at this point are a better indicator of potential fraud. There is some visual correlation between the two sets of maps. A good follow-on analysis would correlate the two, and looking at correlations with votes for the two leading candidates could be intriguing as well. All of this will need to wait until we get a full vote count, but there is no doubt that opening the data up creates many opportunities.
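The follow-on analysis suggested above could start with a plain correlation across provinces. A minimal sketch, assuming two hypothetical per-province dictionaries (for example, a Benford score and the number of votes that disappeared between releases); only provinces present in both are compared.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlate_by_province(a, b):
    """Correlate two {province: value} dicts over the provinces they share."""
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 3:
        raise ValueError("need at least three shared provinces")
    return pearson([a[p] for p in shared], [b[p] for p in shared])
```

With skewed data like partial vote counts, a rank-based (Spearman) correlation may be more robust; Pearson is used here only to keep the sketch short.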

The biggest takeaway for me is that open data truly does create opportunities for better transparency. The amount and granularity of the Afghan election data has been great. It opens the door to better transparency around the elections and the government they will put into power. In that spirit, we’ve uploaded our own analysis plus the IEC data to GeoCommons for other folks to use and question.


4 Responses to “Monitoring the Potential for Afghan Election Fraud: Leveraging Open Data for Transparency”

  1. jb Says:

    I read this interesting presentation about the databrowser of the Afghan election.

    What do you make of it?

    http://developmentseed.org/blog/2009/dec/17/opening-afghanistans-election-data-open-source-data-browser

    Did you know that http://www.iec.org.af/ is now password protected and that Development Seed had to process PDF files?

  2. Beer for Data - Arlington | Off the Map - Official Blog of FortiusOne Says:

    [...] Affected by DHD (Data Hugging Disorder)? Get cured at Beer4Data tonight at FourCourts in Arlington at 7PM. Following the success of the Beer 4 Data program in Jalalabad, Afghanistan we want to encourage those of you in the US to also share your data. In Afghanistan’s elections in 2009, the Beer 4 Data program provided valuable information in monitoring and security. [...]

  3. GeoCommons and the Georgian Election | Off the Map - Official Blog of FortiusOne Says:

    [...] FortiusOne we have a history of mapping elections. Previously in this blog Sean Gorman discussed “Leveraging Open Data for Transparency” during the 2009 Afghanistan elections. This blog also looked at rates of violence and its [...]

  4. An Open Data Litmus Test: Is There a Download Button | Off the Map - Official Blog of FortiusOne Says:

    [...] you can remix, reuse and share the data. Data and the government agency that supplies it are not transparent if you can’t download the raw data. PDFs and web services don’t count. They can be useful [...]
