Quality Assurance for Crowdsourced GeoData: Icons and Comments?
December 16th, 2008 by Sean Gorman
Whenever we present GeoCommons there are always questions about the accuracy and validity of crowdsourced data. The standard answer has been that the data is only as good as its source, and we provide multiple levels of citation to clearly identify the source. Sometimes the source is an individual who created their own data, and there is no citation other than that Bob made a spreadsheet or took his GPS out on the town. More frequently the data comes from an existing source like the OECD, the United Nations, the US Dept. of Transportation, etc., and there is a link back to the source URL where the data was found. Lastly, there is GIS data that has a full metadata specification (FGDC or ISO 19115), which can be included as a link.
While this information is all available on any metadata page in Finder, there is nothing that really covers whether the data has been quality checked. One of the dirty secrets of all data is that there are inherently errors and mistakes. If anyone tells you their data is perfect they are most likely fibbing, and also believe their armpits never smell.
The challenge of data accuracy was reinforced recently in two different blog posts where readers identified errors on maps that we posted. One was a map our data team created on “College Coaches Salaries”, which had geocoding errors, and the second was Steve Chilton’s OSM coverage map, which had Monaco in place of Munich.
If you’ve spent a lot of time with geospatial data you’ll know these errors happen quite easily. Errors happen frequently with geocoding software, and it is easy to overlook a misplaced city name when going over hundreds of columns. I’ve been thinking about how we can introduce better quality assurance into the data we contribute, and also help users of GeoCommons identify issues in their shared data.
For inspiration I looked into two existing projects: Wikipedia and Swivel. Wikipedia probably has the most advanced quality assurance mechanisms in place for a crowdsourced project, but it is focused on text. Swivel, on the other hand, deals directly with data, although not geospatial data.
One of the most useful approaches I’ve seen in Wikipedia is a common set of icons for labeling articles that have issues (no citation, too long, reads biased, needs verification, etc.). With the icons and text I can quickly see issues that exist with an article, which can help me gauge the extent to which I should trust the text. While the Wikipedia taxonomy is quite thorough it is geared around articles and not geospatial data.
One of the great things about data is that many organizations release it into the public domain, so copying data does not have the same issues that copying text has (plagiarism). This provides the opportunity to have data come directly from an “official source”. Swivel had the great idea of formalizing this by creating partnerships with organizations to share their data with the community as an “official source”. This again helps users decide on the level of confidence they have in a particular data set.
So my conclusion, after spending some time looking at both, is that a set of icons and labels telling users how well a dataset has been vetted could be useful when combined with a clear labeling of a data set as “official source” or transcribed by someone else. Here are a few possible labels and icons for data.
Geocoding Error
Needs Citation
Data Needs Cleanup
Data has been QA’d by an Editor
Then there are the icons that Swivel has created for “official source” data managed by Swivel and “official source” data uploaded by the source organization.
These are the tags that seemed to be most relevant. Are there other tags that folks think would be useful, or does anyone see an issue with these? If there is general consensus around labels and icons to tag the level of validation for a given data set, they could be used by anyone who has a need.
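As a rough sketch of how the labels above might be attached to a dataset record, here is a minimal Python example. The class names, field names, and the example dataset are all hypothetical and are not part of GeoCommons or Swivel; only the label text comes from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class QALabel(Enum):
    # Label text taken from the proposed list; the enum itself is hypothetical.
    GEOCODING_ERROR = "Geocoding Error"
    NEEDS_CITATION = "Needs Citation"
    NEEDS_CLEANUP = "Data Needs Cleanup"
    EDITOR_QA = "Data has been QA'd by an Editor"


class Provenance(Enum):
    OFFICIAL_SOURCE = "official source"
    TRANSCRIBED = "transcribed by someone else"


@dataclass
class Dataset:
    title: str
    provenance: Provenance
    labels: set = field(default_factory=set)

    def open_issues(self):
        """Return the labels that flag a problem, sorted by display text."""
        issues = [label for label in self.labels if label is not QALabel.EDITOR_QA]
        return sorted(issues, key=lambda label: label.value)


# Hypothetical dataset used only for illustration.
example = Dataset("Example city populations", Provenance.TRANSCRIBED)
example.labels.add(QALabel.NEEDS_CITATION)
example.labels.add(QALabel.GEOCODING_ERROR)

for label in example.open_issues():
    print(f"{example.title}: {label.value}")
```

Keeping provenance separate from the issue labels mirrors the split in the post: one field says where the data came from, the other says how well it has been vetted.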
The other bit of this that I think is critical is creating a feedback loop to identify what the issues are with a particular data set. That opens the question of whether these should be georeferenced annotations indicating where on the map the error is, comments on the metadata page explaining the problem, or a combination of both. This requires a bit more engineering effort than the icons, but my first take is that a combination of the two could work well. Any other suggestions out there for providing better QA on geospatial data? Would love to hear them.
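One way to picture the combined approach is a single feedback record whose location is optional: with coordinates it is a map annotation, without them it is a comment on the metadata page. The sketch below is only an illustration under that assumption; the `Feedback` class, its fields, and the dataset id are hypothetical, and the coordinates for Munich are approximate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    dataset_id: str
    comment: str
    lat: Optional[float] = None
    lon: Optional[float] = None

    def __post_init__(self):
        # A location needs both coordinates, and they must be valid WGS84 values.
        if (self.lat is None) != (self.lon is None):
            raise ValueError("lat and lon must be given together")
        if self.lat is not None and not (
            -90 <= self.lat <= 90 and -180 <= self.lon <= 180
        ):
            raise ValueError("coordinates out of range")

    @property
    def is_georeferenced(self):
        return self.lat is not None


def split_feedback(items):
    """Separate map annotations from metadata-page comments."""
    on_map = [f for f in items if f.is_georeferenced]
    on_page = [f for f in items if not f.is_georeferenced]
    return on_map, on_page


# Hypothetical feedback on a hypothetical dataset id.
items = [
    Feedback("osm-coverage", "Labeled Monaco, but this point is Munich", 48.14, 11.58),
    Feedback("osm-coverage", "Source URL is missing from the metadata"),
]
on_map, on_page = split_feedback(items)
print(len(on_map), "map annotation(s),", len(on_page), "page comment(s)")
```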

December 16th, 2008 at 6:33 pm
I think this is a really great idea and that the icons work fine. But what exactly is the QA process and how does that affect data credibility?
December 17th, 2008 at 10:57 am
Hi Amenity -
The QA process is still a bit of an open question. We have a data team here that goes through and does a check on data, and it could serve as a kernel for an editor system like Wikipedia’s. If this succeeds, though, I think they would mostly be assessing the comments of the community in order to put a stamp on the data set. Anyone in the community could take on the editor function, but it will be nice to have a full-time staff to get it going.
There is also the question of what should be QA’d. The most obvious errors are usually geocoding errors, and those would probably be the bulk of the work. There are also the issues of copyrighted work being posted or someone maliciously posting bad data, but neither of those has been a problem to date.
We are not really in a position to check the veracity of data numbers (i.e. is that the correct population for Malawi), but we can check georeferencing, use of the correct borders, typos, etc.
Are there other issues that people see often in geospatial data that could be QA’d?
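As an illustration of the kind of georeferencing check mentioned above, here is a minimal sketch that flags points falling outside an expected bounding box, which catches common geocoding slips such as swapped latitude and longitude. The function name is hypothetical, and the Malawi bounding box and Lilongwe coordinates are rough approximations used only for the example.

```python
def outside_bbox(points, bbox):
    """Return the names of points whose coordinates fall outside bbox.

    points: dict of name -> (lat, lon)
    bbox: (min_lat, min_lon, max_lat, max_lon)
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [
        name
        for name, (lat, lon) in points.items()
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon)
    ]


# Rough, illustrative bounding box for Malawi (approximate values).
MALAWI_BBOX = (-17.2, 32.6, -9.3, 36.0)

points = {
    "Lilongwe": (-13.98, 33.78),
    "Lilongwe (lat/lon swapped)": (33.78, -13.98),
}
suspect = outside_bbox(points, MALAWI_BBOX)
print(suspect)
```

A check like this only says a point is suspicious; someone still has to look at the flagged rows and decide what the right location is.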
December 20th, 2008 at 4:03 pm
Where did these icons come from?
December 21st, 2008 at 10:36 am
I borrowed them from Wikipedia as placeholders. Game for doing some icon design? How is the trip going?
February 11th, 2009 at 11:29 am
Hi, I just want to ask if you know of any specific tools for checking road maps to avoid cartographic errors. If anyone knows a way or method to analyze maps that makes mistakes easier to spot visually, I’d appreciate it. Thanks
February 12th, 2009 at 10:14 am
Hi Alejandro -
Not off the top of my head, but if anyone would know it would be the http://www.openstreetmap.org folks. I know they’ve done accuracy comparisons between themselves and other street providers, but I’m not sure if they had an automated tool for it.
best,
sean