The Possibilities of Collective Statistical Intelligence
January 9th, 2009 by Sean Gorman
I was reading Kevin Burke’s post today on the relationship between political affiliation and charitable giving, and thought it was a great example of “collective statistical intelligence”. In the post Kevin runs a set of correlations between political affiliation and a generosity index, then posts the results.
While the post was fascinating and great content, the comments were even more engaging. There is a great discussion of the data used, how the results could be interpreted, and what some of the potential pitfalls are - like the ecological fallacy. One of the most challenging aspects of doing a statistical analysis is interpreting the results. Running an analysis is fairly straightforward, but arriving at the right conclusion from that analysis can be quite challenging. Interpretation can go wrong because a user does not know the theory well enough, or does not know the subject matter well enough (academically or through “on the ground” experience).
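For readers who have not run into the ecological fallacy before, here is a minimal sketch of how it can arise. The numbers are synthetic and invented purely for illustration (they are not Kevin's data), and the three regions and the `pearson` helper are hypothetical names. The point is only that a correlation computed on group averages can have the opposite sign from the correlation among the individuals inside each group.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean(values):
    return sum(values) / len(values)


# Synthetic (x, y) pairs for individuals in three hypothetical regions.
# These values are made up to make the effect easy to see.
regions = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Correlation computed on region averages (the aggregate view).
region_mean_x = [mean([x for x, _ in points]) for points in regions.values()]
region_mean_y = [mean([y for _, y in points]) for points in regions.values()]
aggregate_r = pearson(region_mean_x, region_mean_y)

# Correlation computed among individuals within each region.
within_r = {
    name: pearson([x for x, _ in points], [y for _, y in points])
    for name, points in regions.items()
}

print("aggregate r:", aggregate_r)
print("within-region r:", within_r)
```

In this toy data the region averages rise together while, inside every region, y falls as x rises. Concluding something about individuals from the aggregate correlation alone would get the direction wrong, which is exactly the kind of pitfall the commenters were flagging.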
The response to Kevin’s post, I thought, really showed the potential of “crowdsourcing” better statistical intelligence. When you open up the results of an analysis as well as the data used to perform the analysis, there is a great opportunity for real collaboration: the type of discussion and conjecture that can lead to better decisions with statistical data. Since this discussion all takes place within a connected platform (i.e. the Web), the results can be harnessed over time and mined to see trends and macro correlations that help validate findings.
If we think about the way this is done traditionally it revolves around academic peer review. I have a hypothesis (that variable “x” could be an explanation of phenomenon “y”). I read the literature to see if there is theory to back up my hypothesis. I look at other studies to see what variables they used to explain phenomenon “y”. Then I build my model, run my results, write up my findings and send them off in hopes of being published. The journal takes my paper and sends it to other academic experts and they critique my research based on their experience and the relevant literature in the field. If I do my job well the paper is published and those with access to the journal can consume my research and hopefully be informed by it.
The problem is that this is a very long process - on the order of years. It can take over a year just to go through the submittal, peer review and publication process. So, while the approach is great for validating research and producing meaningful results, it is rarely done outside of academia in a rigorous way. What if that same process could be done in minutes/hours/days instead of years? We see a little bit of this in blogs every day - massively distributed peer review - but it is peer review of opinion 99% of the time. Kevin’s post showed something different: peer review of data. Not just reviewing “is the data accurate”, but “is the analysis of the data correct”. Over the course of a day the post has received a really solid peer review of the analysis. To be honest, it is better than many of the peer reviews I’ve gotten from academic journals.
If we go the next step and begin to harness this analysis, making it discoverable for the next user who runs an analysis with political affiliation or charitable giving, it becomes yet more interesting. There are lots of directions this can go, and I would love to get people’s thoughts on what they would find useful. If you’ve used GeoCommons a bit, it is probably obvious that the scatter plot screenshots look awfully similar to the Maker user interface. That is no coincidence, and we hope to have more details on a whole new set of GeoCommons functionality here shortly - stay tuned.

January 10th, 2009 at 10:19 am
Thanks for a great post, Sean
January 12th, 2009 at 7:10 am
I agree - in theory. The strength of academic review is that it’s a controlled environment. Therefore the results contain a certain amount of credibility. The internet, on the other hand, is remarkably (and, in my not-so-humble opinion, beautifully) devoid of control. Therefore, any ‘peer review’ produced on the web is automatically suspect. While a comment thread may, indeed, produce valid information, you stand an equal chance of just getting a bunch of political opinions or trolls telling you how great their Macs are.
January 12th, 2009 at 1:01 pm
Hi Terry,
Good point. I think that for the idea to be successful it would need to live within a social network, so that a person was not anonymous and their credentials were accessible. Not all peers are equal, and there would need to be a good system to separate constructive peer review from trolls.
best,
sean
January 14th, 2009 at 8:22 am
Even though the discussions could and should be a little more robust, both the Swivel and Many Eyes sites provide (IMHO) the right kind of web 2.0 tools to encourage robust debate over interpretation of statistics.