Myths of Crowdsourcing
November 3rd, 2007 by Sean Gorman
I figured I would keep the crowdsourced data theme going with some myths I’ve seen crop up in people’s perception of crowdsourced data and its reliability. First, let’s take a step back and look at a definition of crowdsourcing: “[the] act of taking a job traditionally performed by an employee or contractor, and outsourcing it to an undefined, generally large group of people, in the form of an open call.” The fact that this “group” is not paid or under contract leads many to believe that what it produces cannot be trusted. I think this general assumption leads to a number of myths about crowdsourced data.
Crowdsourced Data and Official Data are Mutually Exclusive
There is a common perception (especially among traditional data providers) that data either comes from an official source and is guaranteed accurate, or is crowdsourced and you have no clue whether it is accurate. Encyclopedia Britannica articles come from an official source and Wikipedia is crowdsourced. NAVTEQ street data comes from an official source and OpenStreetMap is crowdsourced. We can trust Encyclopedia Britannica and NAVTEQ because we pay them to provide us an accurate product, but we are not sure we can trust Wikipedia and OpenStreetMap because we do not have a contract with them and any willy-nilly crazy person could enter bad data. The issue is seen in black and white - trusted and non-trusted.
In reality, crowdsourcing is a tool for collecting data. Sometimes it is an end in and of itself, as with Wikipedia and OpenStreetMap. Other times it is an enabler - like voting on news stories from third-party sources on Digg. Digg’s users do not generate the stories; Digg crowdsources the determination of which stories are most worth reading. More recently TomTom has used crowdsourced data to enhance its official base data. Perhaps the greatest potential of the crowdsourcing model is as a hybrid working with traditional/official data sets: not only mixing the two together, but using crowdsourcing to enhance the accuracy and validity of existing official data. For instance, a map of toxic dumping sites from the EPA is interesting by itself, but it is eminently more valuable if you can add your own data on the schools, playgrounds, and friends’ houses your kids play at. Likewise, if you add evidence to the map supporting the damage caused by a dumping site, or evidence showing the site has been cleaned up, then everyone has better context for the original data set and its validity. In both cases crowdsourcing is being used to enhance existing data and does not stand by itself.
Official Data is Automatically Accurate and Crowdsourced Data is Not
There is a pervasive myth that if data comes from an official source or has official metadata then it must be accurate and, conversely, that if it is crowdsourced it must be inaccurate. The truth of the matter is that official data and metadata have inaccuracies and crowdsourced data has inaccuracies. In fact, the vast majority of data in the world has inaccuracies. To quote Chris (our beloved Heretic Alpaca and CTO), “your data sucks and my data sucks - now that we have that settled we can go do something.” The fact that people think crowdsourced data is inaccurate is truly a good thing, because they think about what they are consuming and look to see if there are problems. The beauty is that when they find problems they can actually go and fix them. The worst thing about official data is that we blindly assume everything is perfect, and when we find that perfection lacking there is no recourse to fix it.
Metadata is the Panacea
Many a GIS wonk has preached that without metadata, geographic information is just content. Once there is metadata, the professionals have entered the room and all concerns evaporate. When people ask me about metadata in GeoCommons, especially our government customers, I say sure, we can include your metadata. We can even make metadata mandatory before inclusion if that is your preference, but we do not think that just having metadata is sufficient. Metadata can often be anonymous, and there are seldom repercussions if you are sloppy and quick putting in your metadata, or rewards if you are thorough and diligent. When you fuse metadata with a crowdsourcing approach there can now be accountability. I create and contribute the data and that data is attached to me. You can click on the source and you get my profile. If the data rocks - kudos and praise for me; if the data blows - everyone knows I was the slacker who put it in.
Recently I did some digging back into the arguments around FGDC metadata when it came out in the early nineties. The standard was not without criticism and suggestions for improvement (Dutton 1994): “The metadata standard is per force formulated from a producer’s perspective. It is, one assumes, the responsibility of data producers to document published datasets, and there is not much consumers can do other than to offer feedback on the adequacy of the organization, usability and quality of datasets they acquire.” We now have the technological means to address what could not be addressed then, yet we are too ensconced in the status quo and dogma to embrace the opportunity to improve the system.
Crowdsourcing is the Wild West of Data
Crowdsourcing is often conflated with “no rules” or “anything goes”, which leads to a perception that it is not trustworthy. While you can crowdsource with no rules, that does not mean you are not allowed to have rules. Further, those rules can result in highly trusted content. Think of academic publishing, one of the most successful crowdsourced experiments of all time. Anyone can submit an article, have it reviewed anonymously by a group of peers, and see it published in some of the most trusted publications on the planet. No one pays me to publish an article. I have no economic incentive for the data in my paper to be accurate, yet I would trust information from “The New England Journal of Medicine” way before I would trust anything out of the Encyclopedia Britannica. But…you say…academic journals are written by professionals! Not necessarily true - anyone can submit to an academic journal. You need no pedigree, and articles have been published by undergraduates who have no degree at all - the same kid with a Facebook page and 254 friends. Academic journals are trusted because of the peer-driven culture that surrounds them, not because of economic incentives or accuracy standards that must be adhered to. A crowdsourced system can be highly trustworthy depending on how it is structured and the rules that are put in place. I do believe there is a trade-off between the number of rules and requirements and the level of participation and innovation in a crowdsourced system. The more rules and requirements, the higher the level of trust, but the lower the participation and possible innovation. Those who can maximize both trust and participation in a crowdsourced application will be the ones who succeed.
Conclusion
In short, I think crowdsourced data and tools are often unfairly stereotyped. People tend to lump it all together instead of looking at opportunities to leverage a new tool to enhance their competitiveness. I think this is often the result of fearful knee-jerk reactions. Crowdsourcing does have the ability to disintermediate marketplaces, but those who figure out how to harness that to their advantage will be the ones who succeed. Defensive criticism is usually a sign you are strategically headed in the wrong direction.
November 7th, 2007 at 8:17 am
[…] Gorman at FortiusOne has a great post on the Myths of Crowdsourcing. Some of the myths: Crowdsourced Data and Official Data are Mutually Exclusive & Crowdsourcing […]