I got a bit wrapped up trying to finish a side project yesterday, so I’ll just skip to day three of ETech. The morning’s opening speakers were better than Day Two’s, although the sessions so far have been a bit below Day Two’s. We kicked off the morning with an abbreviated talk by John McCarthy (father of LISP) on a new language called Elephant that he has been working on for several years. The name comes from the fact that an elephant never forgets, and the broad concept is a semantic programming language that can create structured relationships from natural language. Unfortunately he ran out of time before he really got into the guts of it, but there were some fascinating concepts about how natural language can be leveraged in a structured way to do computation. It is definitely worth looking into more, and it reminded me a lot of our own thoughts about a context-driven architecture and natural language for data, although we were looking to turn quantitative data into natural language rather than turning natural language into data.

Following McCarthy’s talk there were some interesting bits on open source personal robots, then an informal launch of Yahoo’s Fire Eagle. Fire Eagle has taken some flak in the blogs for having minimal or “zero” functionality. I think this misses the point of what Fire Eagle is intended to do. My impression was that Fire Eagle is not meant to be a standalone consumer application but a straightforward tool that does one simple thing very well: provide a platform for sharing your location online. The functionality folks are clamoring for is left to users and developers, and I think there are a good number of fun possibilities here. For instance, with GeoCommons we have a big pile o’ data, and it would be very useful to personalize delivery of that data to a user’s location, or to let users comment on that data from their location and have those comments geolocated. This creates a dependency on clever users, but from what I’ve heard around ETech there are a good number of clever ideas floating around.

The last session of the day I attended was Mikel and Jesse’s presentation on “Disaster Tech”. I’d seen Mikel’s presentation on open source disaster technology at the State of the Map conference, and it was cool to see how the project has evolved. The whole topic is close to us, especially since we got up close doing disaster response in the wake of the London bombings and Hurricane Katrina. The presentation had some great examples of OpenStreetMap, Twitter and Google Maps being used in creative ways during disasters. Mikel gave a nice example that combined the USGS GeoRSS earthquake feed, the EU lightweight tsunami propagation model, and a feed that republishes the resulting polygons as GeoRSS. With this approach they can churn out a polygon warning area in under a minute. A similar concept can be seen in the United Nations GDAC application.
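To make the GeoRSS step a little more concrete, here is a minimal Python sketch of the republishing end of such a pipeline. It does not reproduce the EU propagation model; as a loud assumption, a fixed-radius circle around the epicenter stands in for the model’s output. The function names `circle_polygon` and `warning_entry` are hypothetical. The only things taken from GeoRSS-Simple are the `georss:polygon` element, its whitespace-separated “lat lon” pairs, and the rule that the ring closes on its first point.

```python
import math
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("", ATOM_NS)
ET.register_namespace("georss", GEORSS_NS)

KM_PER_DEG_LAT = 111.32  # rough average length of one degree of latitude


def circle_polygon(lat, lon, radius_km, n_points=16):
    """Approximate a circular warning area as a closed ring of (lat, lon) pairs.

    Placeholder for a real propagation model. Uses a flat-earth approximation,
    so it is only reasonable for modest radii away from the poles.
    """
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(lat))
    ring = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        dlat = radius_km * math.sin(theta) / KM_PER_DEG_LAT
        dlon = radius_km * math.cos(theta) / km_per_deg_lon
        ring.append((round(lat + dlat, 4), round(lon + dlon, 4)))
    ring.append(ring[0])  # GeoRSS polygons must end on their first point
    return ring


def warning_entry(title, ring):
    """Wrap the ring in an Atom <entry> as a GeoRSS-Simple polygon."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    polygon = ET.SubElement(entry, f"{{{GEORSS_NS}}}polygon")
    # GeoRSS-Simple expects whitespace-separated "lat lon" pairs
    polygon.text = " ".join(f"{la} {lo}" for la, lo in ring)
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    ring = circle_polygon(38.3, 142.4, radius_km=200)  # hypothetical epicenter
    print(warning_entry("Hypothetical tsunami warning area", ring))
```

A real version would read epicenters from the USGS feed and swap `circle_polygon` for the model’s output polygon; the rest of the republishing step stays the same.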

All great stuff for ad hoc implementations that are cost effective and not over-engineered. There was lots of good discussion of how to take the information produced by technology and effectively transmit it to non-technical or completely unconnected people. Jesse and Mikel also had a nice bit at the end of the presentation on anti-patterns, i.e. what happens when there is no champion for the technology to create repeatable and successful implementations. Specifically, in the search for Steve Fossett the crowdsourced effort to find him actually slowed down the search and rescue teams, who had to deal with all the input. That experience led to the emergence of champions like InternetSAR, which creates a structure that can be replicated and used effectively in search and rescue. Lots of good thought on an important topic.


We’ve been doing work recently integrating GeoServer with GeoCommons to provide more hooks and capabilities for our platform. I was catching up on the GeoServer blog and saw a new demo of the map annotation tools they have in development.

The map only has a base street map for NYC, but the annotation features and presentation are quite nice. You can add annotations and pictures to the map, and it all works very smoothly. The ability to create annotations and layer them on top of structured data like crime rates or toxic release points is very compelling. Users can then not only see where a statistical phenomenon is happening but also comment on it, whether to confirm or to criticize. For instance, someone could add a photo of dead fish in green bubbling ooze at a toxic release point.

We had some fun with the concept about a year ago after a trip to NYC, mapping the locations of bars and single women and then testing out the hot spots. It was less altruistic than the example above, but it again demonstrates the value of adding qualitative comments to quantitative data. For fun I added the heat map we made of the bars and singles to the GeoServer demo. If you go to the Lower East Side, it is the yellow marker on 6th St.

[Image: NYC singles and bars heat map]

We look forward to seeing if we can make use of the new GeoServer collaboration tools, and props to them for all the good work.


The folks at Pushpin posted a great comment on our last blog entry on “free public data”. I thought there was enough interesting content to expand on the comment thread with another blog post. The Pushpin team did a great job providing far more nuanced thoughts on the issues of “for fee” data. At the end of the day my issue is really with governments for not providing the data in easy-to-use formats, or even in open, non-proprietary standard formats. In an open market anyone is free to take government-supplied data, make it easy to use, and charge a price the market is willing to pay. In addition to making the data easy to use, many vendors also add a layer of quality assurance and often value-added data derivatives like forecasts.

There are many instances where vendor-supplied data truly adds value and is worth the money an end user pays, but there are also situations where it does not and there is a better alternative. Take for instance the 2000 Census data ESRI provides to Pushpin to resell: the added work there is taking the boundary files provided by the Census and joining them to the data tables provided by the Census. I’ll be the first to admit that doing all the database joins is tedious and requires pricey GIS software, but in my opinion the ratio of value added to price is way out of whack.
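For a sense of how small that step is, here is a hedged Python sketch of the join itself. Plain dicts stand in for the boundary file’s attribute records and the data table; in practice they would come from a Census shapefile and a downloaded table. The `join_census` helper, the `GEOID` key name and the 5-digit width (county FIPS codes) are assumptions for illustration. Zero-padding the key guards against a common snag where a code like 01001 loses its leading zero after a trip through a spreadsheet.

```python
def normalize_key(value, width):
    """Census FIPS codes are zero-padded strings; spreadsheets often strip the padding."""
    return str(value).strip().zfill(width)


def join_census(boundaries, table, key="GEOID", width=5):
    """Attach data-table columns to each boundary record that shares a key.

    Returns (joined_records, keys_of_unmatched_boundaries).
    """
    rows_by_key = {normalize_key(row[key], width): row for row in table}
    joined, unmatched = [], []
    for feature in boundaries:
        k = normalize_key(feature[key], width)
        row = rows_by_key.get(k)
        if row is None:
            unmatched.append(k)
            continue
        # Merge the two records, keeping the normalized key
        joined.append({**feature, **row, key: k})
    return joined, unmatched


if __name__ == "__main__":
    # Placeholder records; real ones would come from Census boundary files and tables
    boundaries = [
        {"GEOID": "01001", "geometry": "<boundary geometry>"},
        {"GEOID": "11001", "geometry": "<boundary geometry>"},
    ]
    table = [{"GEOID": 1001, "total_population": 123}]  # made-up value; leading zero lost
    joined, unmatched = join_census(boundaries, table)
    print(joined)     # one joined record for 01001
    print(unmatched)  # ['11001']
```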

That is the philosophical difference with GeoCommons. If you have a community of people willing to put in that little bit of work to extract the data from places like the Census and share it with the community, you get a network effect. Since the data goes in under a Creative Commons license, anyone can take that data and combine it with their own data or anyone else’s contributed data, allowing any user to make something new and innovative with the collective data. Any time you work to create a dataset or database, there is value created and work done. Every OpenStreetMap member GPS-tracing roads has put in solid sweat equity, but they choose to contribute it to the community because the collective value of that data is far greater than the value of their piece alone.

In the end I believe this helps the data vendors, because there is more data the market can mash up with the vendor data (vendors benefit from the network effect too). There is also a larger market of people who realize the value of the data, because the barrier to experiencing it has been removed. That said, I believe it also means the data providers are really going to have to add true value and not just do a few database joins. The real value comes from the technology, not the raw data itself; the data is what makes the technology more valuable.

Tim O’Reilly states that one of the key value drivers for Web 2.0 is “Data is the Intel Inside”. Specifically, O’Reilly cites NAVTEQ’s proprietary database of streets as a big value driver for many GeoWeb applications. I agree that databases (i.e. SQL is the new HTML) are creating new value propositions, but now the value is in having data on the “outside”, not the “inside”. The walled proprietary gardens of “inside” data are being trumped by open source “outside” data that allows a network effect to be created. With data on the “outside”, not only can new combinations (data mashups) be created, but the data itself can adapt (as with OpenStreetMap and TomTom). In response to Brady’s post on Nokia’s acquisition of NAVTEQ, O’Reilly comments, “the real question is going to be whether there’s a web 2.0 answer (i.e. a user-generated content) answer to the expensive data development and curation currently employed by Navteq.” I think the answer is a resounding yes, and as standards like KML 3.0 progress and technologies evolve around them, the power users have to contribute meaningful data and context is only going to increase. The real value is in the technology that allows the data to be delivered, mashed up, and interconnected.


NPR ran a story on Monday’s Morning Edition entitled “Security Officials Seek to Block Some Online Maps”. The story centered on local government officials refusing to release electronic maps of what they call “critical infrastructure,” such as water mains and fire hydrants, and specifically on Steven Whitaker’s futile quest to obtain infrastructure data from the Greenwich, CT local GIS repository. As part of the story NPR came by to ask my opinion on the matter because of our history of raising security concerns using open source data.

The story has a nice quote of me saying it is an impossible task to try to control all the geodata out there and who has access to it. The part that did not air is that no one even knows what data is and is not accessible to the public. While we have a good index and census of most of the web pages that exist, we have much less understanding of the databases, including geospatial databases, connected to the Web (often called the Deep Web). The indexes run by Google and others do a great job of finding web pages, but databases are a different game. A UC Berkeley study, drawing on Bergman’s research, found that “the deep web consists of about 91,000 terabytes. By contrast, the surface web, which is easily reached by search engines, is only about 167 terabytes.” While it is uncertain how much of this data is geospatial in nature, it is fair to assume it is a considerable amount of data that we have little clue about. Oftentimes government agencies do not even realize what data they have online and available to the public, and we definitely do not have a comprehensive way to understand the entire universe of geospatial data. What raised so much alarm about our original research was the authorities realizing that the data was openly available. Everyone clamored for the work to be classified, but the source data is all still out there, hidden in myriad local, state, federal and NGO data repositories. This raises the question: how are we going to control a world of data that we comprehend so little of?

In order to move towards greater security, I believe we actually need to open up more, so that the entirety of geospatial data can be indexed. We will have no true idea what publicly available geospatial data is potentially dangerous until we know what is out there. The move towards making KML an OGC standard is a great first step towards a standard geospatial data format for the Web, although KML is natively geared towards providing a geographic framework for text, HTML, pictures, etc., and not for structured information like databases. We’ve been working on changing that by ensuring a mechanism exists to include feature attribute data in the schema tag of KML. Some of this work has carried over into KML 2.2 as “extended data”.
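As an illustration of where feature attributes can live, here is a minimal Python sketch that writes a Placemark using the KML 2.2 `ExtendedData`/`Data`/`value` elements (untyped name/value pairs; KML 2.2 also has typed `Schema`/`SchemaData`, not shown here) and reads the attributes back. The helper names and the sample attributes are hypothetical, and this is not the exact encoding we used with the schema tag.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)


def q(tag):
    """Qualify a tag name with the KML 2.2 namespace."""
    return f"{{{KML_NS}}}{tag}"


def placemark_with_attributes(name, lon, lat, attributes):
    """Build a Placemark whose attributes travel as ExtendedData name/value pairs."""
    pm = ET.Element(q("Placemark"))
    ET.SubElement(pm, q("name")).text = name
    ext = ET.SubElement(pm, q("ExtendedData"))
    for key, value in attributes.items():
        data = ET.SubElement(ext, q("Data"), name=key)
        ET.SubElement(data, q("value")).text = str(value)
    point = ET.SubElement(pm, q("Point"))
    # KML coordinates are longitude,latitude[,altitude]
    ET.SubElement(point, q("coordinates")).text = f"{lon},{lat}"
    return pm


def read_attributes(pm):
    """Recover the attribute dict from a Placemark (values come back as strings)."""
    ext = pm.find(q("ExtendedData"))
    if ext is None:
        return {}
    return {d.get("name"): d.findtext(q("value")) for d in ext.findall(q("Data"))}


def to_kml_document(placemarks):
    """Wrap Placemarks in <kml><Document> and serialize to a string."""
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    doc.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    pm = placemark_with_attributes(
        "Hypothetical site", -73.98, 40.72, {"pipeline_incidents": 3}
    )
    print(to_kml_document([pm]))
```

Once attributes travel in a predictable structure like this, an indexer can read them without parsing free-form description text.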

Once you begin to index the geospatial data out there, you are in a much better position to have a logical debate about what data is a security threat and what data contributes to the public good. For instance, you may want to know where there have been hazardous pipeline accidents, but not divulge where critical pipeline routing junctures are. By opening up geospatial data, not only do we have a foundation to better ensure dangerous data stays out of the hands of bad guys, but we also get the positive externality of a wealth of data being made available to the public to solve a wide range of problems.


Andrew Turner has a great series of blog posts on the future of KML that came out of meetings on the topic at the OGC a week or so ago. There is lots of interesting content in Andrew’s series, but the part most near and dear to us is the discussion on metadata. Chris made it out to the meeting with Andrew to throw in our two cents and convey his thoughts on the schema tag and how attribute data can be embedded in it. As Sean Gillies points out in response to Andrew’s post, we should not confuse adding attribute data to KML with adding metadata to KML. Both are important but serve two distinct functions.

Our use of the schema tag is to allow additional data to be added to KML to describe a location on the map. Natively, KML supports adding a description and a Z coordinate to a location. So you can describe a push pin with text, HTML and/or a picture, then add a Z coordinate that attaches a metric to that push pin. This allows you to do many things and has produced a lot of great KML, but there are limits. Namely, you can really only add two attributes: a description and a metric. Many location descriptions, and data in general, are multidimensional.

Let’s take a simple example: one of the first Google “My Maps” mashups, of the 2004 US Presidential Election. The election mashup is a nice thematic map of Bush (red states) versus Kerry (blue states) votes, and when you click on a state it shows you the percentage of votes for each candidate. The percentages for Bush and Kerry are placed in the description field of the KML, which requires the user to color code each state to create the thematic map. This is quite a bit of work, since you are using a qualitative data field to try to do something quantitative.
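With the percentages stored as numbers rather than prose in the description, the coloring can be computed instead of done by hand. Here is a minimal Python sketch, assuming the two percentages are available as numeric fields (for example in `ExtendedData`). It produces KML color strings, which KML writes as `aabbggrr` hex (alpha, blue, green, red). Shading by margin, the 20-point saturation cap and the function names are our own arbitrary choices for illustration, not anything from the original mashup.

```python
def kml_color(red, green, blue, alpha=255):
    """KML writes colors as hex in aabbggrr order (alpha, blue, green, red)."""
    return f"{alpha:02x}{blue:02x}{green:02x}{red:02x}"


def state_color(bush_pct, kerry_pct, full_margin=20.0, alpha=200):
    """Red for a Bush plurality, blue for Kerry; lighter as the margin narrows.

    A margin of `full_margin` points or more gets the fully saturated color;
    a tie comes out white (and falls on the red branch).
    """
    strength = min(abs(bush_pct - kerry_pct) / full_margin, 1.0)
    fade = int(round(255 * (1 - strength)))  # 0 = saturated, 255 = white
    if bush_pct >= kerry_pct:
        return kml_color(255, fade, fade, alpha)
    return kml_color(fade, fade, 255, alpha)


def poly_style(style_id, color):
    """A minimal KML Style whose PolyStyle fills polygons with `color`."""
    return (f'<Style id="{style_id}"><PolyStyle>'
            f"<color>{color}</color></PolyStyle></Style>")


if __name__ == "__main__":
    # Hypothetical percentages, not actual 2004 results
    print(poly_style("example_state", state_color(55.0, 44.0)))
```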

This is something we would like to change by making it a lot easier for anyone to create KML that handles quantitative data. The geoweb, to date, has done a great job of opening up mapping by allowing anyone to create a qualitative description (text, HTML, pictures) of a location. This is what KML is currently geared to support, but an increasing number of people would like to attach more quantitative data than a single Z attribute allows.

In his post Andrew pointed to our use of the schema tag to enable thematic mapping, and that is accurate, but it is only the tip of the iceberg of what is possible. Once you have access to multiple data descriptors about a location, a whole range of decision-making tools becomes possible. KML currently reflects the “read-write” functionality of Web 2.0, but in order to evolve to a “read-write-execute” web it will need to support quantitative functions that give users decision support.

Since things are always clearer with examples, and our favorite example is finding bars and singles (men/women), let me give it a shot. Currently we would search for bars and get back KML that describes each bar: name, address, user comments, maybe a user rating. The KML and current applications cover this very well; we can “read” and “write” back to the KML, very Web 2.0. What is missing is any analysis of those bars that tells me the best one to go to.

Let’s say the application already knows a few things about me: I am 33 years old, single, male, work in IT, and I am a Taurus. This information and much more could easily be picked up from a social network profile like Facebook or MySpace. If I now did a search on bars and the KML had embedded feature attribute data for the bars and the surrounding contextual data, I could be directed to the bars with the highest correlation with women who are single, in an adjacent age bracket, and work in IT. If I had a good experience at the bar, I could post a comment back to the bar, further reinforcing that quantitative correlation with user-generated validation. Now my KML has enabled a “read-write-execute” application that is both qualitative and quantitative. That, I believe, is the long-term value proposition for KML 3.0.
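Here is a deliberately simple Python sketch of the “execute” step. To be plain about it, this is a weighted sum of attributes, a stand-in for the correlation analysis described above rather than an implementation of it. The attribute names (`single_women`, `age_30_36`, `it_workers`), their values as shares between 0 and 1, and the weights are all hypothetical; in the scenario above they would come from the bars’ embedded attribute data and the user’s profile.

```python
from dataclasses import dataclass, field


@dataclass
class Bar:
    name: str
    # Hypothetical shares in [0, 1], e.g. read from the bar's ExtendedData
    attributes: dict = field(default_factory=dict)


def score(bar, weights):
    """Weighted sum of the attributes the user cares about; missing ones count as 0."""
    return sum(w * bar.attributes.get(k, 0.0) for k, w in weights.items())


def rank_bars(bars, weights, top_n=3):
    """Return the top_n bars by score, best first."""
    return sorted(bars, key=lambda b: score(b, weights), reverse=True)[:top_n]


if __name__ == "__main__":
    bars = [
        Bar("A", {"single_women": 0.2, "age_30_36": 0.5, "it_workers": 0.1}),
        Bar("B", {"single_women": 0.4, "age_30_36": 0.3, "it_workers": 0.3}),
        Bar("C", {"single_women": 0.1, "age_30_36": 0.1, "it_workers": 0.0}),
    ]
    # Hypothetical weights derived from a user profile
    weights = {"single_women": 0.5, "age_30_36": 0.3, "it_workers": 0.2}
    print([b.name for b in rank_bars(bars, weights)])
```

User feedback posted back to a bar could then adjust its attributes or add a rating term to the weights; how to weigh that feedback is left open here.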
