Huffman’s Three Principles for Data Sharing

September 16th, 2009 by Sean Gorman


About the Author: Sean Gorman founded FortiusOne in 2005 to bring location-based analytics to the mass market. Sean brings over 10 years of experience at the forefront of the geospatial revolution as a researcher, practitioner, and entrepreneur at FortiusOne. Through both academic and entrepreneurial efforts, he has been working to make geographic data more accessible to the public since 1997, culminating in the creation of GeoCommons, a crowd-sourced repository of statistical data and social feeds that can be easily mapped, remixed, and reused by non-technical users. Sean has been featured in media such as Wired, Der Spiegel, ABC, the Washington Post, Business 2.0, MSNBC, CBS, and CNN. He also holds a PhD in Public Policy from George Mason University, where he was the Provost’s High Potential Scholar and the recipient of the Fischer Prize. He has published dozens of articles on geographic data sharing and analysis and authored the book Networks, Complexity and Security: The Role of Public Policy in Critical Infrastructure Protection.


Todd came into town for the Gov2.0 Summit last week, and in addition to dropping off a terabyte’s worth of data from Afghanistan, he talked a bit about what has made the “beer for data” program work at the Taj. Beyond the universal thirst for beer, data-sharing success boiled down to three basic principles:

1) Create immediate value for anyone contributing data: when users contribute data, they should get an immediate return on that investment. In the Afghan pilot, that meant seeing your contributed data on a map of high-resolution satellite imagery as soon as you uploaded it. The imagery for Afghanistan was made available by NGA, then tiled and served up by a Fusion Server graciously donated by Google.

2) Make contributors’ data available back to them with improvements: any data that goes in should be available to download back out again. Further, the data should come back better than when it went in. In the Afghan pilot, this meant that if you shared data into the platform as a spreadsheet, you could get it back out as KML, shapefile, Atom, JSON, SpatiaLite, etc. (Addendum to principle 2: PDFs are evil, and they make parsing and extracting data into a shareable format complete misery.)
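Part of “coming back better” is simply format conversion: a flat spreadsheet goes in, and a map-ready format comes out. As a minimal sketch, assuming a spreadsheet exported as CSV with hypothetical `lat` and `lon` columns, the Python below turns each row into a GeoJSON point feature, one common JSON encoding for map data. This is not GeoIQ’s actual export code; the function name, column names, and sample row are illustrative.

```python
import csv
import io
import json


def rows_to_geojson(csv_text, lat_field="lat", lon_field="lon"):
    """Convert spreadsheet-style CSV text into a GeoJSON FeatureCollection.

    Hypothetical helper: assumes every row has numeric latitude/longitude
    columns named by lat_field/lon_field. All other columns are carried
    over unchanged as feature properties.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Pull the coordinate columns out so they don't repeat as properties.
        lat = float(row.pop(lat_field))
        lon = float(row.pop(lon_field))
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Illustrative row only, not data from the Afghan pilot.
    sample = "name,lat,lon\nClinic A,34.43,70.45\n"
    print(json.dumps(rows_to_geojson(sample), indent=2))
```

The same row-by-row pattern extends to other outputs (KML placemarks, Atom entries) by swapping the serializer, which is what makes a spreadsheet a reasonable lowest-common-denominator input, unlike a PDF.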

3) Share derivative works back with the data sharing community: urge users who create derivative works from shared data to contribute their data products back to the group. In the Afghan pilot, researchers were taking the detailed data from the field and feeding it into their sophisticated models and simulations. They would then upload the results into GeoIQ to share those derivative works back with the data sharing community. This meant that agencies and individuals that shared data once again got a better product back by contributing. The researchers get better data to feed their models, and a self-perpetuating feedback loop is created that sustains ever-increasing data sharing.

While these sound like simple principles, it is amazing how often they are not followed, blunting effective data sharing. Too often data sharing, especially with government and corporations, is a black hole: data goes in but never comes back out. It is also rare to see the positive feedback loop of researchers sharing their work products back with the data sharing community. Too often researchers get wrapped around the axle over their products being proprietary or sensitive. While that can be the case, there is a huge benefit in feeding results back to gauge their veracity and accuracy. I’ve definitely seen way too many models that look great in the lab and completely fall apart in reality because researchers would not feed results back to the field for verification and error bounding. I’m hoping we’ll have more opportunities to implement these principles in future projects and that we can see the success of Todd’s work in Jalalabad duplicated hundreds of times over.

