Quality Assurance for Crowdsourced GeoData: Icons and Comments?
December 16th, 2008 by Sean Gorman
Whenever we present GeoCommons, there are always questions about the accuracy and validity of crowdsourced data. Our standard answer has been that the data is only as good as its source, and we provide multiple levels of citation to clearly identify that source. Sometimes the source is an individual who created their own data, and there is no citation other than “Bob made a spreadsheet” or “Bob took his GPS out on the town.” More frequently the data comes from an existing source like the OECD, the United Nations, the US Dept. of Transportation, etc., and there is a link back to the source URL where the data was found. Lastly, there is GIS data with a full metadata specification (FGDC or ISO 19115), which can be included as a link.
While all of this information is available on the metadata page for any dataset in Finder, nothing really indicates whether the data has been quality checked. One of the dirty secrets of all data is that it inherently contains errors and mistakes. If anyone tells you their data is perfect, they are most likely fibbing, and probably also believe their armpits never smell.
The challenges of data accuracy were reinforced recently by two different blog posts where readers identified errors on maps we had posted. The first was a map our data team created on “College Coaches Salaries” that had geocoding errors, and the second was Steve Chilton’s OSM coverage map, which had Monaco in place of Munich.
If you’ve spent a lot of time with geospatial data, you’ll know how easily these errors happen. Geocoding software produces them frequently, and it is easy to overlook a misplaced city name when going over hundreds of columns. I’ve been thinking about how we can introduce better quality assurance into the data we contribute and help users of GeoCommons identify issues in their shared data.
For inspiration I looked at two existing projects, Wikipedia and Swivel. Wikipedia probably has the most advanced quality assurance mechanisms of any crowdsourced project, but it is focused on text. Swivel, on the other hand, deals directly with data, although not geospatial data.
One of the most useful approaches I’ve seen in Wikipedia is a common set of icons for labeling articles that have issues (no citation, too long, reads biased, needs verification, etc.). With the icons and text I can quickly see issues that exist with an article, which can help me gauge the extent to which I should trust the text. While the Wikipedia taxonomy is quite thorough it is geared around articles and not geospatial data.
One of the great things about data is that many organizations release it into the public domain, so copying data does not have the same issues that copying text has (plagiarism). This provides the opportunity to have data come directly from an “official source”. Swivel had the great idea of formalizing this by creating partnerships with organizations to share their data with the community as an “official source”. This again helps users decide on the level of confidence they have in a particular data set.
So my conclusion, after spending some time looking at both, was that a set of icons and labels showing a dataset’s level of vetting could be useful, especially when combined with clear labeling of a dataset as “official source” or transcribed by someone else. Here are a few possible labels and icons for data:
Geocoding Error
Needs Citation
Data Needs Cleanup
Data has been QA’d by an Editor
Then there are the icons that Swivel has created for “official source” data managed by Swivel and “official source” data uploaded by the source organization.
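To make the labels machine-readable as well as visual, the taxonomy could be stored as a small fixed vocabulary attached to each dataset’s metadata. The Python sketch below is only one possible encoding; the class names (`QALabel`, `Provenance`, `DatasetQA`) and the rule that a dataset counts as vetted only when an editor has signed off and no problem labels remain are hypothetical, not part of GeoCommons or Swivel.

```python
from dataclasses import dataclass, field
from enum import Enum


class QALabel(Enum):
    # Hypothetical encoding of the labels proposed above.
    GEOCODING_ERROR = "Geocoding Error"
    NEEDS_CITATION = "Needs Citation"
    NEEDS_CLEANUP = "Data Needs Cleanup"
    EDITOR_QA = "Data has been QA'd by an Editor"


class Provenance(Enum):
    # Hypothetical categories modelled on Swivel's "official source" distinction.
    OFFICIAL_FROM_SOURCE = "Official source, uploaded by the source organization"
    OFFICIAL_MANAGED = "Official source, managed by the site"
    TRANSCRIBED = "Transcribed by someone else"


@dataclass
class DatasetQA:
    title: str
    provenance: Provenance
    labels: set[QALabel] = field(default_factory=set)

    def open_issues(self) -> list[QALabel]:
        # Every label except the editor sign-off flags an unresolved problem.
        return sorted((lbl for lbl in self.labels if lbl is not QALabel.EDITOR_QA),
                      key=lambda lbl: lbl.value)

    def is_vetted(self) -> bool:
        # Vetted only if an editor signed off and no problem labels remain.
        return QALabel.EDITOR_QA in self.labels and not self.open_issues()


example = DatasetQA("example-dataset", Provenance.TRANSCRIBED,
                    {QALabel.GEOCODING_ERROR, QALabel.NEEDS_CITATION})
print([lbl.value for lbl in example.open_issues()])  # ['Geocoding Error', 'Needs Citation']
print(example.is_vetted())  # False
```

Keeping the labels as a closed set, rather than free-text tags, is what would let anyone filter or badge datasets consistently.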
These are the tags that seemed most relevant. Are there other tags folks think would be useful, or does anyone see issues with these? If there is general consensus around labels and icons for the level of validation of a given dataset, they could be used by anyone who needs them.
The other piece that I think is critical is a feedback loop to identify what the issues with a particular dataset are. That raises the question of whether these should be georeferenced annotations indicating where on the map the error is, comments on the metadata page explaining the problem, or a combination of both. This requires more engineering effort than the icons, but my first take is that a combination of the two could work well. Any other suggestions out there for providing better QA on geospatial data? Would love to hear them.
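One way to support both options is a single issue record whose coordinates are optional: with a location it could be drawn as a map annotation, without one it would appear as a comment on the metadata page. This is a hypothetical sketch (`IssueReport` and `split_reports` are illustrative names, not an existing GeoCommons API), using the Monaco/Munich mix-up as an example; the Monaco coordinates are approximate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueReport:
    """One user-reported problem with a dataset (hypothetical structure).

    With lon/lat set, the report is a georeferenced annotation that could be
    drawn on the map; without them, it is a plain metadata-page comment.
    """
    dataset_id: str
    text: str
    lon: Optional[float] = None
    lat: Optional[float] = None

    def __post_init__(self) -> None:
        # A location needs both coordinates, or neither.
        if (self.lon is None) != (self.lat is None):
            raise ValueError("set both lon and lat, or neither")
        if self.lat is not None and not (-90 <= self.lat <= 90 and -180 <= self.lon <= 180):
            raise ValueError("coordinates out of range")

    @property
    def is_georeferenced(self) -> bool:
        return self.lon is not None


def split_reports(reports: list[IssueReport]) -> tuple[list[IssueReport], list[IssueReport]]:
    """Separate reports into map annotations and metadata-page comments."""
    on_map = [r for r in reports if r.is_georeferenced]
    on_page = [r for r in reports if not r.is_georeferenced]
    return on_map, on_page


reports = [
    IssueReport("example-dataset", "Point labelled Munich sits in Monaco", lon=7.42, lat=43.73),
    IssueReport("example-dataset", "Source URL is missing"),
]
on_map, on_page = split_reports(reports)
print(len(on_map), len(on_page))  # 1 1
```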
Dataset of the Day: International Unemployment
December 16th, 2008 by Emily Sciarillo
Global economic crisis! Record level unemployment in the U.S.!
With our latest dataset from the U.S. Department of Labor on unemployment levels for select countries from 1995 to 2008, I decided to take a look at what has been happening to unemployment in this economic environment.
The next three maps show unemployment levels for three different years at the same scale.
Then, to see the shorter-term effects of the current crisis on unemployment rates, I made a map of the percent change in unemployment rates from the first quarter of 2008 to the third quarter of 2008.
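For reference, a sketch of the calculation behind that map, assuming “percent change” here means the relative change in each country’s rate u rather than the difference in percentage points:

```latex
\%\Delta u = \frac{u_{\mathrm{Q3\,2008}} - u_{\mathrm{Q1\,2008}}}{u_{\mathrm{Q1\,2008}}} \times 100
```

Under this definition a rate rising from 5% to 6% is a 20% increase, even though it is only a 1 percentage-point rise, so countries starting from low rates can show large percent changes.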
It is clear that globally things have worsened since 2000; however, they still have not reached the levels seen in 1995. Also, the U.S. still has much lower unemployment rates than many European countries, such as Spain, France, Portugal, Germany, Greece and Italy (this may change with the latest figures for the fourth quarter of 2008).
Although the U.S. has comparatively lower unemployment rates than many European countries, it is important to note that it has a much weaker safety net for the unemployed (in health care, for example), so the social effects may be just as devastating.
The U.S. is also one of the countries with the largest percent increase in unemployment rates since the beginning of 2008. Only Spain, Portugal and Ireland have had larger increases than the U.S. (Italy has no data after the second quarter of 2008). Since this data is self-reported by each country, figures may be inflated or deflated, as in the case of the U.S. It is also important to note that this data does not cover poorer countries, where rising unemployment may be even more devastating.
Take a look at these maps yourself or go to Maker! and make your own maps from the dataset.
Dataset of the Day: Holiday Shopping, Let’s Save Some Money
December 15th, 2008 by Kevin Burke
It’s the holidays, and what is one thing on everyone’s mind? Shopping! This year, with the economy slumping, people are trying to find not only the perfect gift but the perfectly priced gift. As I pondered this myself, a thought entered my head: what if I were to do my shopping in a state that has no general sales tax? These states do exist, and Finder! and Maker! have a dataset that shows sales tax rates across the USA by state. The map is below:

The states shaded a very light cream color (Oregon, Montana, Delaware, New Hampshire) are the states with no general sales tax. The darker the state, the higher its sales tax rate.
Now my next question is this: if I go to one of these states to shop, will I really end up saving money? I may not be paying sales tax, but I will certainly be spending more on gas to travel the extra distance. I will set up a hypothetical situation using Finder! and Maker! to find the answer.
Let’s say I live in the lovely state of Washington, in the city of Castle Rock. In Washington the sales tax rate is 6.5%. Next door is my neighbor Oregon, which has a 0% sales tax. In Finder! I can load the major shopping centers in my area. There are two major shopping centers relatively close to me off Interstate 5: one in Centralia, WA (Centralia Shopping Center, 34.4 miles away) and the other in Portland, OR (Jantzen Beach SuperCenter, 50.4 miles away). These are the two places I will compare; the map below shows both, with Castle Rock in the middle:

Now let’s do some math. My holiday shopping expenses look like this:
Wife = $70, Mom = $60, Dad = $60, Sisters = $120, for a total of $310 on gifts
In Washington, sales tax adds $310 x 6.5% = $20.15, so the gifts cost $310 + $20.15 = $330.15. So the difference between the two states is $20.15.
Now let’s look at gas expenses:
Let’s say gas in Castle Rock is $2.00 a gallon and my car averages 25 mpg. If my round trip from Castle Rock to Centralia is 68.8 miles and my round trip to Portland is 100.8 miles, then my gas costs look like this:
Castle Rock to Centralia: 68.8/25 = 2.75 g x $2 = $5.50
Castle Rock to Portland: 100.8/25 = 4.03 g x $2 = $8.06
By going to Centralia I would save $8.06 - $5.50 = $2.56 on gas.
Putting these two figures together, we see that overall the trip to Portland is the wiser choice. You will spend more on fuel ($2.56), but you will save much more on your shopping ($20.15), for a net savings of $17.59.
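The comparison above can be sketched as a few small Python functions, assuming the same simplifications as the post (one pump price, constant mileage, no county or municipal taxes). The function names are hypothetical; `break_even_extra_miles` is an added illustration of how many extra round-trip miles the tax saving would pay for before the tax-free trip stops being worth it.

```python
def round_trip_fuel_cost(one_way_miles: float, mpg: float, price_per_gallon: float) -> float:
    """Gallons for the round trip (2 x distance / mpg) times the pump price."""
    return 2 * one_way_miles / mpg * price_per_gallon


def net_saving(gift_total: float, tax_rate: float, near_miles: float,
               far_miles: float, mpg: float, price_per_gallon: float) -> float:
    """Sales tax avoided at the far, tax-free store minus the extra fuel to get there."""
    tax_saved = gift_total * tax_rate
    extra_fuel = (round_trip_fuel_cost(far_miles, mpg, price_per_gallon)
                  - round_trip_fuel_cost(near_miles, mpg, price_per_gallon))
    return tax_saved - extra_fuel


def break_even_extra_miles(tax_saved: float, mpg: float, price_per_gallon: float) -> float:
    """Extra round-trip miles at which the added fuel cost equals the tax saved."""
    return tax_saved / price_per_gallon * mpg


# Figures from the Castle Rock example: $310 of gifts, 6.5% tax,
# Centralia 34.4 miles away, Portland 50.4 miles away, 25 mpg, $2.00/gallon.
saving = net_saving(310, 0.065, 34.4, 50.4, 25, 2.00)
print(f"${saving:.2f}")  # $17.59
```

With these inputs the $20.15 tax saving would cover roughly 252 extra round-trip miles (20.15 / 2.00 x 25), so the 32 extra miles to Portland fall well within it.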
I would like to mention that this is very hypothetical. Other circumstances (county taxes, municipal taxes, tolls, different mpg on the two trips, and many others) may enter the equation and change the figures. All in all, this might be a way to save money, so create your own hypotheticals using Finder! and Maker! and see if it helps. Below are links to Finder! datasets that show major shopping centers (malls, outlet malls) in a few 0% sales tax states. Happy Holidays and good luck shopping!
OpenStreetMap vs. Google/TeleAtlas Street Coverage
December 12th, 2008 by Sean Gorman
Steve Chilton of Middlesex University recently created a cool map in GeoCommons comparing street coverage for OpenStreetMap (OSM) and Google/TeleAtlas in several cities across the globe. It provides a fascinating perspective, and we thought it would be cool to share it with the community.
The project began with work by Bernard Zwischenbrugger to visually compare coverages between OSM and Google/TeleAtlas. Then Alex Mauer picked up the ball and did a numerical analysis of coverage. Steve then took Bernard’s original visual comparison (location data) and Alex’s scoring (numerical comparison) and produced a map to visualize the results of the comparison:
The sizes of the circles are proportional to the coverage values for both, so small circles mean poor coverage and large circles mean good coverage. The overlap of the circles shows who appears to be doing better: where orange/brown shows, OSM is doing better; where blue shows, Google is. OSM is the top layer, so a tie will make OSM look better, but you can toggle the layers on and off to see both views of the coverage.
Alex’s original assessment was that OSM is slightly ahead of Google/TeleAtlas worldwide and in Africa and Asia. In Europe, OSM is well ahead. Google is slightly ahead in Oceania, and well ahead in North and especially South America.
Steve would have liked to show the results on a combined scale from +5 (OSM 5, Google 0) to -5 (OSM 0, Google 5), with 0 for equal, but we do not yet have a bipolar colour scale for point data in the software. It is a great suggestion for future development.
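Until such a scale exists in the software, the idea can be sketched outside it: compute the combined score as the OSM score minus the Google score, then blend linearly from white towards orange/brown (OSM ahead) or blue (Google ahead), i.e. a diverging colour ramp. The sketch assumes each score runs from 0 to 5 as in Steve’s description; the RGB values and function names are hypothetical choices, not GeoCommons’ colour scheme.

```python
# Hypothetical endpoint colours following the map's convention:
# orange/brown where OSM leads, blue where Google leads, white for a tie.
OSM_COLOUR = (204, 102, 0)
GOOGLE_COLOUR = (0, 102, 204)
NEUTRAL = (255, 255, 255)


def coverage_difference(osm_score: float, google_score: float) -> float:
    """Combined score: +5 means OSM 5 / Google 0, -5 means OSM 0 / Google 5."""
    return osm_score - google_score


def bipolar_colour(diff: float, max_abs: float = 5.0) -> tuple[int, int, int]:
    """Linearly blend from NEUTRAL towards the leader's colour as |diff| grows."""
    t = max(-1.0, min(1.0, diff / max_abs))  # clamp to [-1, 1]
    end = OSM_COLOUR if t >= 0 else GOOGLE_COLOUR
    w = abs(t)
    return tuple(round(n + (e - n) * w) for n, e in zip(NEUTRAL, end))


print(bipolar_colour(coverage_difference(5, 0)))  # (204, 102, 0)
print(bipolar_colour(coverage_difference(3, 3)))  # (255, 255, 255)
```

A tie now renders as neutral white instead of favouring whichever layer happens to be drawn on top.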
It will be interesting to see how Google’s launch of MapMaker for 162 countries will impact this comparison in the future. Many thanks to Steve for loading the data into Finder and making cool maps with it.