Why Geocoding Should be a Commodity

July 13th, 2009 by Sean Gorman

Arguably the largest positive externality that geospatial technologies can provide to the Web ecosystem is more linked, geo-enabled data. The beauty is that the externalities work both ways. Not only does the Web get more useful content, we also create more reasons for the public to use geospatial tools and software. Without the ability to georeference data, none of our collective mapping brilliance is terribly useful. Yet we put all sorts of obstacles in the way of the most basic geo-enabling capability - namely geocoding. We treat geocoding as a precious resource that needs to be metered and monetized. In short, we put a stranglehold on the lifeblood of our business, geo-enabled data. Without geo-enabled content our relevance to the larger Web diminishes immensely.

The major providers all put restrictions around geocoding, making it especially difficult to run the batch geocoding operations needed to geo-enable large chunks of data. Google, Yahoo and Microsoft’s geocoders are all geared to single address lookups, not mass data geo-enablement. There are services like batchgeocode.com that get around some of the limitations but are still restricted by the providers’ TOS.

The second big issue with current geocoding is further upstream. All the geocoding APIs depend on street data from NAVTEQ, TeleAtlas and a few other providers to geocode against. So, if the street data companies don’t think a country has a big enough market, you can’t geocode there. This especially limits the ability to geocode data in developing countries.

Our thought is that the best solution to this problem is an open source geocoder. There have been other open source geocoder projects, some of which have been criticized as bad business decisions.

We’ve taken a slightly different approach. One, we enlisted the brilliant help of Schuyler to evolve his work from Geocoder.us, to best take advantage of the work and community that already exist. Two, we decided to make the geocoder street data neutral, meaning that you can plug whatever street data source you want into the geocoder and have it work - sometimes with a bit of tweaking. In the first go we’ve set up the geocoder to work with TIGER data and NAVTEQ. We chose these two mainly because they both use all CAPS for their names.
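To make "street data neutral" a bit more concrete, here is a minimal Python sketch of the idea. It is not the actual geocoder code: the `StreetSegment` schema, the `StreetSource` interface, and the `InMemorySource` adapter are all hypothetical names invented for illustration, and the sketch assumes the common address-range approach of interpolating a house number along a street segment.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol, Tuple


@dataclass
class StreetSegment:
    """One block of a street with its address range (hypothetical schema)."""
    name: str
    from_number: int
    to_number: int
    start: Tuple[float, float]  # (lat, lon) at from_number
    end: Tuple[float, float]    # (lat, lon) at to_number


class StreetSource(Protocol):
    """Any data source that can return segments by street name."""
    def segments(self, street_name: str) -> Iterable[StreetSegment]: ...


class InMemorySource:
    """Stand-in for a TIGER, NAVTEQ, or OSM adapter (illustrative only)."""
    def __init__(self, rows: Iterable[StreetSegment]):
        self._rows = list(rows)

    def segments(self, street_name: str) -> Iterable[StreetSegment]:
        wanted = street_name.strip().upper()
        return [s for s in self._rows if s.name.upper() == wanted]


def geocode(source: StreetSource, number: int,
            street_name: str) -> Optional[Tuple[float, float]]:
    """Interpolate a house number along the first segment whose range contains it."""
    for seg in source.segments(street_name):
        low, high = sorted((seg.from_number, seg.to_number))
        if low <= number <= high:
            span = seg.to_number - seg.from_number
            t = 0.0 if span == 0 else (number - seg.from_number) / span
            lat = seg.start[0] + t * (seg.end[0] - seg.start[0])
            lon = seg.start[1] + t * (seg.end[1] - seg.start[1])
            return (lat, lon)
    return None  # no segment covers this address
```

The point of the sketch is that `geocode` only talks to the `StreetSource` interface, so supporting a new data set means writing one more adapter rather than touching the lookup logic.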

The hope is that with the community’s help we can extend the geocoder to work with a large number of other data sources. As Andrew mentioned in his post, OpenStreetMap is top of the list. Integrating OSM data will be key to enabling geocoding in developing countries and other areas overlooked by current commercial providers. I think this is one of many areas where the OSM community is really going to show its power. While the geocoder is currently only accessible to developers through github, stay tuned because we’ll be exposing it as a web application in GeoCommons shortly. We want everyone to be able to geo-enable their data and access it in whatever format meets their needs. Data wants to be free and we all win when the gates are unlocked.


A group of UCLA geographers published a paper yesterday in the MIT International Review entitled “Finding Osama bin Laden: An Application of Biogeographic Theories and Satellite Imagery”. The UCLA team used purely open source data, including “Landsat ETM+, Shuttle Radar Topography Mission, Defense Meteorological Satellite, QuickBird”. They then used a variety of common geographic analysis techniques, “distance-decay theory, island biogeography theory, and life history characteristics”, to predict high probability locations for Osama Bin Laden. The story has already been picked up by 90 media outlets and has been popping up on the front page of several major news outlets.

It would never make it out of the labyrinth of classification schemas in the US government, but it would be fascinating to see what a crowdsourced search for Bin Laden would turn up if better data were made available from the intel/defense community. Since the government data will never be released, we thought we could at least help make the open source data easily accessible. So, we took the available data in the MIT article plus relevant data on Afghanistan and pushed it into GeoCommons. We’ve embedded a map with our own take below.

To view this map in GeoCommons Maker! click here.

In addition to the UCLA data we’ve added gridded population data for the area. A big part of the UCLA thesis was that Osama would be “in a larger town rather than a smaller and more isolated town where extinction rate would be higher”. So, the gridded data gives a rough view of population densities in the remote Tora Bora region.

Source data for the maps is here:

Structure Locations of Possible Hiding Spots of Osama Bin Laden, Parachinar, Pakistan, 2009
Tora Bora 10 KM Buffer Rings
Gridded Population Data, Afghanistan and Pakistan border near Tora Bora
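
For anyone who wants to rebuild something like the 10 km buffer rings from the raw points, here is a rough Python sketch. It is an assumption about how such rings could be computed, not a description of how the GeoCommons layer was made: it uses the standard haversine great-circle distance, and the function names and the coordinates in the usage example are made-up placeholders rather than real locations from the data sets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ring_index(center, point, ring_width_km=10.0):
    """Return 0 for points within the first ring, 1 for the second, and so on."""
    distance = haversine_km(center[0], center[1], point[0], point[1])
    return int(distance // ring_width_km)


# Placeholder coordinates, for illustration only.
center = (34.0, 70.0)
candidate = (34.1, 70.0)
print(ring_index(center, candidate))
```

Grouping candidate structures by `ring_index` gives a quick way to count how many fall in each band out from the center point.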

Would be great to see what other folks can do with the data to promote other perspectives. Also a nice opportunity to show the power of opening data up for better analysis, QA and alternative perspectives. Kudos to the UCLA team - great to see geographers in the news for doing what they do best.


OpenStreetMap vs. Google/TeleAtlas Street Coverage

December 12th, 2008 by Sean Gorman

Steve Chilton of Middlesex University recently created a cool map in GeoCommons comparing street coverage for OpenStreetMap (OSM) and Google/TeleAtlas in several cities across the globe. It provides a fascinating perspective, and I thought it would be cool to share it with the community.

The project began with work by Bernard Zwischenbrugger to visually compare coverages between OSM and Google/TeleAtlas. Then Alex Mauer picked up the ball and did a numerical analysis of coverage. Steve then took Bernard’s original visual comparison (location data) and Alex’s scoring (numerical comparison) and produced a map to visualize the results of the comparison:

The size of the circles is proportional to the values for both, so small circles mean poor coverage and large circles mean good coverage. The overlap of the circles shows who appears to be doing better (orangey/brown showing means that OSM is doing better, blue means Google). OSM is the top layer so a tie will have OSM looking better, but you can click the layers on and off to see both views of the coverage.

Alex’s original assessment was that OSM is slightly ahead of Google/TeleAtlas worldwide and in Africa and Asia. In Europe, OSM is well ahead. Google is slightly ahead in Oceania, and well ahead in North and especially South America.

Steve would have liked to be able to show results on a combined scale from +5 (for OSM 5, Google 0) to -5 (OSM 0, Google 5), with 0 for equal, but we do not yet have a bi-polar colour scale for point data in the software. A great suggestion for future development.
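
The combined scale itself is simple to compute outside the software. Here is a small Python sketch, assuming both scores sit on the 0 to 5 scale implied above. The function names and the specific hex colours are illustrative choices, not anything in GeoCommons; the orange and blue ends just echo the colours used for OSM and Google on the map.

```python
def coverage_difference(osm_score: int, google_score: int) -> int:
    """Combined score: +5 means OSM 5 / Google 0, -5 the reverse, 0 a tie."""
    for score in (osm_score, google_score):
        if not 0 <= score <= 5:
            raise ValueError("scores are expected on a 0 to 5 scale")
    return osm_score - google_score


def diverging_colour(diff: int) -> str:
    """Blend from white (tie) toward orange (OSM ahead) or blue (Google ahead)."""
    neutral = (255, 255, 255)
    target = (217, 95, 2) if diff > 0 else (31, 120, 180)  # assumed palette
    t = abs(diff) / 5
    rgb = tuple(round(n + t * (c - n)) for n, c in zip(neutral, target))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# A city where OSM scores 4 and Google scores 1 lands at +3 on the scale.
diff = coverage_difference(4, 1)
print(diff, diverging_colour(diff))
```

A single diverging value per city would also sidestep the layer-ordering problem, since a tie would show as neutral instead of favouring whichever layer is drawn on top.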

It will be interesting to see how Google’s launch of MapMaker for 162 countries will impact this comparison in the future. Many thanks to Steve for loading the data into Finder and making cool maps with it.


With the elections over I’ve had a little time to think about what the new administration could mean for the GeoWeb. For those who follow the GeoWanking listserv there has been a raging debate on neogeography versus paleogeography. Some of the rhetoric reminds me of the just finished election and how we strive to create a binary world - blue state/red state or neo/paleo. In the spirit of moving beyond stereotypes and on to solving problems, I thought I’d take a closer look at the potential impact of Obama’s technology platform on the GeoWeb. It might be a good diversion from our own self reflection - despite the fact I’ve added plenty of fuel to that fire ;-)

You can read Obama’s technology platform overview here. The plank that really grabbed my attention was the promise to “Open Up Government to its Citizens”. The idea is that data about government (Congressional voting records) and data created by the government (census data) should be easily available to the public. Specifically:

“Making government data available online in universally accessible formats to allow citizens to make use of that data to comment, derive value, and take action in their own communities. Greater access to environmental data, for example, will help citizens learn about pollution in their communities, provide information about local conditions back to government and empower people to protect themselves.”

The beauty is that we (the collective GeoWeb) have so many of these tools already built. The ability to deliver the data once it is made easily available has great promise. For instance here is EPA data on power plant emissions from GeoCommons:

From the map above you can see which power plants are producing the most poisonous CO2 emissions (click the down caret on the layers box for the filter) or zoom into your specific neighborhood to see the plant and the type of environment around it. (We are still refining the embed capability, but this is an example of how data can be virally spread.)

The report goes on to recommend that the federal government should:

Establishing pilot programs to open up government decision-making and involve the public in the work of agencies, not simply by soliciting opinions, but by tapping into the vast and distributed expertise of the American citizenry to help government make more informed decisions.

This strikes again at the heart of the GeoWeb - enabling collaboration of experts and citizens across the country. Several projects and companies have pioneered dynamic collaboration around maps. Below is a Google MyMap with feedback around the GeoCommons power plant data in Florida:



The blue push pins are the user generated feedback linking to expert opinion and photos from the field. This is just the tip of the iceberg of what is possible with collaboration around maps. These approaches can also be leveraged inside of government agencies, which is another plank in the Obama technology platform:

“Employing technologies, including blogs, wikis and social networking tools, to modernize internal, cross-agency, and public communication and information sharing to improve government decision making.”

We’ve seen a lot of this type of work going on in the intelligence community with Intelink, Intellipedia, and A-Space. There are also data fusion and sharing concepts, like the EPA’s Central Data Exchange. I’d love to hear about other projects that fit in with the three planks, and more importantly existing or planned GeoWeb technologies that could help enable the new vision. I’ve really only highlighted two and I know there are tons more out there.
