Crowdsourcing to Create Resilience: Why Security through Obscurity will Never Work
October 15th, 2007 by Sean Gorman
NPR ran a story on Monday’s Morning Edition entitled “Security Officials Seek to Block Some Online Maps.” The story centered on local government officials refusing to release electronic maps of what they call “critical infrastructure,” such as water mains and fire hydrants. Specifically, it told the story of Steven Whitaker’s futile quest to obtain infrastructure data from the Greenwich, CT local GIS repository. As part of the story, NPR came by to ask my opinion on the matter because of our history of creating security concerns using open source data.
The story has a nice quote of me saying it was an impossible task to try to control all the geodata out there and who has access to it. The part that did not air is that no one even knows what data is and is not accessible to the public. While we do have a good index and census of most of the web pages that exist, we have much less understanding of the databases, including geospatial databases, connected to the Web (often called the Deep Web). The indexes run by Google and others do a great job finding web pages, but databases are a different game. A Cal Berkeley study by Bergman found that, “the deep web consists of about 91,000 terabytes. By contrast, the surface web, which is easily reached by search engines, is only about 167 terabytes.” While it is uncertain how much of this data is geospatial in nature, it is fair to assume it is a considerable amount of data that we largely have little clue about. Oftentimes government agencies do not even realize what data they have online available to the public, and we definitely do not have a comprehensive way to understand the entire universe of geospatial data. What raised so much alarm with our original research was the authorities realizing that the data was available open source. Everyone clamored that the work should be classified, but the source data is all still out there, hidden in myriad local, state, federal and NGO data repositories. This raises the question: how are we going to control a world of data that we have so little comprehension of?
In order to move towards greater security I believe we actually need to open up more so that the entirety of geospatial data can be indexed. We will have no true idea as to what geospatial data available to the public is potentially dangerous until we know what is out there. The move towards making KML an OGC standard is a great first step, since it gives the Web a standard geospatial data format. KML natively, though, is geared towards providing a geographic framework for text, HTML, pictures, etc., and not structured information like databases. We’ve been working on changing that by ensuring a mechanism exists by which to include feature attribute data in the schema tag of KML. Some of this work has carried over into KML 2.2 as “extended data”.
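To make the “extended data” idea concrete, here is a minimal sketch of how feature attributes can ride along with a placemark using the untyped `ExtendedData`/`Data` elements of KML 2.2. It relies only on Python's standard library. The function name, the sample attribute names and the coordinates are all made up for illustration, and this is not the tooling we used ourselves.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_with_attributes(name, lon, lat, attributes):
    """Return a KML 2.2 string holding one Placemark with untyped ExtendedData.

    `attributes` is a plain dict of feature attributes (hypothetical example
    fields); each entry becomes <Data name="..."><value>...</value></Data>.
    """
    ET.register_namespace("", KML_NS)  # serialize with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(pm, f"{{{KML_NS}}}name").text = name

    # Attribute data goes before the geometry inside the Placemark.
    ext = ET.SubElement(pm, f"{{{KML_NS}}}ExtendedData")
    for key, value in attributes.items():
        data = ET.SubElement(ext, f"{{{KML_NS}}}Data", name=key)
        ET.SubElement(data, f"{{{KML_NS}}}value").text = str(value)

    point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
    # KML coordinates are written longitude first, then latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical record: an accident location with two attributes attached.
print(placemark_with_attributes(
    "Example incident", -77.0, 38.9,
    {"incident_type": "leak", "year": 2005},
))
```

Once attributes travel inside the KML like this, an indexer can read them as structured fields rather than scraping them out of a free-text description balloon.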
Once you begin to index the geospatial data out there you are in a much better position to have a logical debate about what data is a security threat and what data contributes to the public good. For instance, you may want to know where there have been hazardous pipeline accidents, but not divulge where critical pipeline routing junctures are. By opening up geospatial data, not only do we have a foundation to better ensure dangerous data stays out of the hands of bad guys, but we also have the positive externality of a whole wealth of data being made available to the public to solve a wide range of problems.
Potential next steps are even more interesting. Once you have an open and indexable pool of geospatial data you can begin to leverage the power of crowdsourcing as discussed in the last blog post. In that post the discussion centered around crowdsourcing as a tool to improve the accuracy of data, but I believe it also has potential to create greater security through more resilient infrastructures. One of the lessons we learned from our infrastructure research was that by adding a small amount of diversity you could greatly increase resiliency. Take this example from Iraq of routes carrying goods to a series of destinations.
To connect up the locations convoys have to travel 433.5635 miles and they repeatedly use the same roads 33.13% of the time. Each time they repeatedly use the same path the vehicles are further exposed to IEDs and snipers. If we run a little Monte Carlo simulation we can diversify the routes so the same roads are not used repeatedly and we only increase the distance traveled fractionally.
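For readers who want to see the shape of such a simulation, here is a toy sketch. It is not the model behind the Iraq figures: the road network, the trip list, and the `jitter` and `budget` parameters are all invented for illustration. Each trial randomly inflates road lengths, recomputes every trip's shortest path under those noisy lengths, and keeps the candidate route set with the lowest road reuse, provided its true mileage stays within a small budget of the baseline.

```python
import heapq
import random


def shortest_path(graph, weights, src, dst):
    """Dijkstra on an undirected graph; weights maps frozenset({u, v}) -> cost.

    Assumes dst is reachable from src.
    """
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + weights[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def score(routes, lengths):
    """Return (total miles, share of road traversals that repeat a used road)."""
    traversed = [frozenset(p) for r in routes for p in zip(r, r[1:])]
    total = sum(lengths[e] for e in traversed)
    return total, 1 - len(set(traversed)) / len(traversed)


def diversify(graph, lengths, trips, trials=2000, jitter=0.5, budget=0.02, seed=0):
    """Monte Carlo search for a less repetitive route set within a mileage budget."""
    rng = random.Random(seed)
    baseline = [shortest_path(graph, lengths, s, t) for s, t in trips]
    base_total, best_reuse = score(baseline, lengths)
    best = baseline
    for _ in range(trials):
        candidate = []
        for s, t in trips:
            # Randomly inflate each road's length so alternative roads can win.
            noisy = {e: w * (1 + jitter * rng.random()) for e, w in lengths.items()}
            candidate.append(shortest_path(graph, noisy, s, t))
        total, reuse = score(candidate, lengths)  # judged on true lengths
        if total <= base_total * (1 + budget) and reuse < best_reuse:
            best, best_reuse = candidate, reuse
    return baseline, best


# Hypothetical network: a depot D, two parallel roads to a hub H, three destinations.
roads = {("D", "X"): 5.0, ("X", "H"): 5.0, ("D", "Y"): 5.0, ("Y", "H"): 5.1,
         ("H", "T1"): 3.0, ("H", "T2"): 4.0, ("H", "T3"): 5.0}
lengths = {frozenset(k): v for k, v in roads.items()}
graph = {}
for u, v in roads:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

trips = [("D", "T1"), ("D", "T2"), ("D", "T3")]
baseline, diversified = diversify(graph, lengths, trips)
print("baseline   (miles, reuse):", score(baseline, lengths))
print("diversified (miles, reuse):", score(diversified, lengths))
```

The reuse measure here (repeat traversals divided by all traversals) is just one reasonable definition. The point is the structure of the trade: accept a small, bounded distance penalty in exchange for spreading trips over more roads.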
With the new set of routes the distance traveled is 436.8805 miles and the same roads are only used 25.03% of the time: an 8.1 percentage point diversity improvement at only a 0.77% efficiency cost. This works well when you are in a centrally planned military environment, but what happens in the “rent seeking” world of civilian commuter traffic?
Here is where I think there is real potential for “crowdsourcing” to enable not only resilience but greater efficiency. Traffic congestion is typically caused by everyone using the same shortest route repeatedly and trying to maximize their own personal position at others’ expense (yes, the tailgaters). Roughly, this is why roads hit a phase transition (congestion) at only 15% of carrying capacity. What if we could add a little diversity to the routes people take?
So let’s take the argument full circle. Once you’ve opened and indexed a good chunk of geodata you can create a common base and road map that users can annotate or automatically ping. Then when traffic becomes congested, a road is closed or an accident happens, a user could add data to the map (either automatically or manually) from their car or mobile. All users could then have the option of requesting a diversified route that avoids the road problems that are being reported by the crowd (including the location of “the crowd” itself). There are technological leaps in this scenario, but it is a concept you could easily employ today with existing technologies.
Let’s take Google Maps as an example. Users are requesting driving directions and also dragging those directions to create new routes. Just as with the Iraq routing map, the same roads are being used repeatedly. You could add an option of avoiding the heavily crowdsourced routes, which takes that input as another variable in calculating the best route (in addition to speed, distance, number of turns, etc.). At the end of the day you have not only a more efficient system but a more resilient system. When a major catastrophe happens, the same principles allow destroyed roads and other infrastructure failures to be quickly communicated and routed around. I believe enabling adaptive resiliency through technology and crowdsourcing is infinitely more valuable than our current infrastructure protection mindset, where we invest in guns, gates and guards. The private sector has never bought into this (and they own 85% of infrastructure), but if you show a means by which they can be more efficient and more resilient then you have a case worth listening to.
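The “avoid heavily crowdsourced routes” option could be sketched roughly as follows. This is an illustration under stated assumptions, not how Google Maps or any real routing service works: the function name, the `crowd_load` values (a 0 to 1 score per road segment, imagined as coming from user reports), and the `crowd_penalty` knob are all hypothetical. The idea is simply that crowd usage inflates a road's cost before an ordinary shortest-path search runs.

```python
import heapq


def best_route(roads, crowd_load, src, dst, crowd_penalty=0.0):
    """Dijkstra where each road's cost grows with how heavily the crowd uses it.

    roads:         {(u, v): travel_minutes} for undirected road segments.
    crowd_load:    {(u, v): value in [0, 1]} from user reports (hypothetical);
                   roads with no reports count as 0.
    crowd_penalty: 0 gives the plain fastest route; larger values steer
                   drivers away from crowded segments.
    Returns the list of nodes on the route, or None if dst is unreachable.
    """
    adj = {}
    for (u, v), minutes in roads.items():
        cost = minutes * (1 + crowd_penalty * crowd_load.get((u, v), 0.0))
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))

    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Made-up commute: the highway is faster on paper but heavily used.
roads = {("home", "highway"): 10, ("highway", "office"): 10,
         ("home", "side_street"): 12, ("side_street", "office"): 11}
crowd_load = {("home", "highway"): 0.9, ("highway", "office"): 0.8}

print(best_route(roads, crowd_load, "home", "office"))                     # ignores the crowd
print(best_route(roads, crowd_load, "home", "office", crowd_penalty=1.0))  # avoids it
```

A closed or destroyed road fits the same scheme: drop it from `roads` (or give it a very large load) and the search routes around it.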
Roads and traffic are just one possible area where crowdsourced geospatial data could make a huge difference. Once you’ve opened up geospatial data there is the ability to build upon those data sets not only to generate more context, but also to allow that data to respond as situations change, like traffic. I think there is a very solid case that what little security we gain by locking up and trying to control geospatial data is greatly outweighed by the public benefits of opening up geospatial data.

October 16th, 2007 at 7:19 pm
[...] Gorman chimes in at Moving Past Push Pins on Crowdsourcing and why the Security through Obscurity model for data management will not [...]
March 5th, 2009 at 8:53 pm
Blogs like this are what we blog addicts are looking for, will visit often.