Chiming in on KML Metadata

April 23rd, 2007 by Chris Ingrassia

Everyone else is doing it, so I thought I might as well jump on the bandwagon with everyone’s new favorite geodata topic: KML Metadata.

I’m going to try and keep this short and sweet, since so many other people are already commenting on it.  Notably, the folks at Platial propose its use for attribution information, and there’s also Google’s knowledge base article on the topic.  I think Paul Ramsey summed up some of the various opinions and comments on the matter well in his recent blog post.

Long story short, yes, there are real issues that need to be addressed, but let’s think before this turns into a monster.  KML was already becoming the standard way to store and transport geodata around the web, and its recent introduction into the OGC standards process has further cemented that.  One of the reasons that FortiusOne ultimately decided to support KML import in the short term, instead of GML or the GML Simple Features profile, was that it was simple.  It was one file that you needed for import, and you didn’t need to pull down a separate application-specific schema for more complex files.

KML’s Schema element handles this nicely.  The only restriction is that, as of right now, according to the documentation: “… the only value for parent is ‘Placemark.’”

This works wonderfully for our purposes, namely importing geodata with attribute structures on each individual record.  This does not explicitly solve the attribution issue, and, frankly, I think the suggestion put forward by Platial re: the use of the Metadata element is a sound one.  For something as universal as attribution information, a top level element or elements to handle it explicitly might be called for, but whatever gets the job done, really.

What sort of scares me is in Google’s own example.  By their definition, the contents of the Metadata element are ignored by the viewer, but preserved in the file otherwise.  And their example is the tagging of air temperature data (using the ObsKML schema) to a Placemark with a polygon geometry inside of it…. huh?  I don’t know if this was just some example that got pulled out of a hat and was not intended to be overly serious, but please, please, nobody start doing that.

As it stands right now, things aren’t exactly perfect in the world of geodata and geospatial analysis, but let’s face it, they could be a lot worse.  A lot of us, FortiusOne included, are trying to make things better and open things up in such a way that these sorts of problems can be approached by as many people as possible.

Starting a trend wherein the de-facto standard is for publishers to use their own application-specific derivation of the KML schema in such a way that the application-specific pieces of the data are inherently incomprehensible to generic data viewers doesn’t help a whole lot of people, IMHO.  Right now, I can do something like:

[other kml goes here ...]
<Schema name="City" parent="Placemark">
  <SimpleField name="Name" type="string"/>
  <SimpleField name="Population" type="int"/>
</Schema>
<City>
  <Name>Nowheretown</Name>
  <Population>3</Population>
</City>
[other kml goes here ...]

And what I will get is:

  • Structured data that a generic KML consumer like GeoCommons can use in a productive way
  • Something that still shows up properly in Google Earth (you even get nice little lists with the field names and their values)
  • A single file that defines what little custom schema it needs
  • Something that is SIMPLE
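
To make the first bullet a bit more concrete, here is a minimal sketch of how a generic consumer might read that snippet using nothing but the Schema it carries. This is not how GeoCommons actually does it; the `Document` wrapper, the function name and the two-entry type table are my assumptions for illustration, and the XML namespace that a real KML file declares is left out, just as it is in the snippet above.

```python
import xml.etree.ElementTree as ET

# The City example from above, wrapped in a <Document> element (an assumption)
# so that it parses as a single XML tree. No KML namespace is declared here.
KML = """<Document>
  <Schema name="City" parent="Placemark">
    <SimpleField name="Name" type="string"/>
    <SimpleField name="Population" type="int"/>
  </Schema>
  <City>
    <Name>Nowheretown</Name>
    <Population>3</Population>
  </City>
</Document>"""

# Only the two field types used in the example are handled.
CONVERTERS = {"string": str, "int": int}


def read_typed_records(kml_text):
    """Return one dict per element whose tag matches a Schema name."""
    root = ET.fromstring(kml_text)
    records = []
    for schema in root.iter("Schema"):
        # Field name -> converter, taken straight from the in-file Schema.
        fields = {
            field.get("name"): CONVERTERS[field.get("type")]
            for field in schema.findall("SimpleField")
        }
        for element in root.iter(schema.get("name")):
            record = {}
            for name, convert in fields.items():
                text = element.findtext(name)
                if text is not None:
                    record[name] = convert(text)
            records.append(record)
    return records


print(read_typed_records(KML))
```

The point is that everything the reader needs to interpret `Population` as a number travels in the same file as the data.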

Key word there: SIMPLE.  Yes, there probably is a place for more structured and complex application schemas, but those are really the exception to the rule in my mind.  Don’t we want to stick with something that solves as many problems for as many people as is humanly possible without making everyone’s lives harder?

There are still many uses for something like the KML Metadata tag, but let’s be very wary of making things unduly complex, so that everyone benefits.

Flexibility is good, but just because you can do something doesn’t mean you should.  My car has a luggage rack that might make a nice mounting point for some JATO rockets, and using it for such could potentially cut my commute time down.  However, it probably isn’t a terribly great idea for me to do so in the grand scheme of things. (editor’s note: apologies if that example was a little nonsensical and out of left field, but I really just wanted to work a rocket car into this somehow)

After all, what good is publishing your data if the only ones who can read it are you, or somebody or some machine that had to explicitly learn how?

Just my $0.02, flame away, I’m off to find some JATO rockets :)


I spent the last four days out in San Francisco at the Location Intelligence conference, which was a great time, but I wish I could have done the Web 2.0 Expo as well. Lousy scheduling. While there were lots of interesting technologies presented at the conference, my biggest takeaway was the tension between traditional geospatial vendors and emerging Web 2.0 technologies and startups. Actually, Web 2.0 and mashups are the wrong labels, because now every vendor claims they are Web 2.0 and enabling “enterprise mashups”. Of course you are using their proprietary closed application, with proprietary data, and not enabling any kind of open community discourse, but I digress.

The tension that was palpable at the conference was between open source things that are free and enterprise things that cost herds of money. While this is nothing new for software, the issue is now data. The fear of, and sometimes animosity towards, open source data really showed itself in the comments of enterprise people when asked about OpenStreetMap.org. For those of you not familiar, OpenStreetMap (OSM) is a project to provide free street data for web mapping apps. In the US we take free TIGER line data from the Census for granted, but everywhere else in the world you pay big bucks for street data, especially in the UK where OSM started. OSM has turned into a thriving community rapidly mapping large chunks of geography with quality street map data.

When the enterprise folks were asked about OSM they were dismissive in a rather hostile way, with statements like “30 inches matter in street maps and can be the difference between life and death. Do you want to trust open source data for this?” or “we spend millions of dollars creating the world’s most accurate data – you can’t duplicate that with a community of volunteers”. The entertaining response from open source folks was “how many times has a proprietary street map led you to the wrong place, not had your house on it, missed a road etc.?” and “how successful were you in getting that changed in the system?”

The interesting bit was the proprietary enterprise folks singing an O’Reilly mantra, “Data is the Intel inside”, for Web 2.0 and the future. “All those applications that use publicly available data are gonna be sunk.”  While I agree that data and content are going to be huge drivers of value, I do not believe that walled gardens of proprietary data are going to be the future.

Data benefits from Metcalfe’s law just as any application or technology does.  The power of one data set pales in comparison to a network of data sets that can be mashed together to create new datasets and relationships.  Also, using a community to police data has all the benefits of open source software.  The big difference is that open source data creates a much larger community than open source software.  The number of people who can work code versus the number of people who can use and look at data creates a whole new game.  Blogs and wikis started the two way conversation, and there is no reason, with some hard work and innovative technology, this cannot be extended to data.  OSM is a great example of early success, as is Swivel.  The acquisition of GapMinder by Google also points to this future.  GapMinder is all about extracting public open source statistical data and making it available to the masses.
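
A rough back-of-the-envelope way to see the Metcalfe’s law point: if any two data sets can be combined, the number of possible pairings grows roughly with the square of the number of data sets. Counting only pairwise combinations is my simplification for illustration, not a measure of actual value, and the function name is made up.

```python
from math import comb


def pairwise_combinations(n_datasets):
    """Number of distinct pairs of data sets that could be mashed together."""
    return comb(n_datasets, 2)


# One isolated data set has nothing to connect to; a hundred open ones
# offer thousands of possible pairings.
for n in (1, 2, 10, 100):
    print(n, pairwise_combinations(n))
```

Going from 10 to 100 connectable data sets takes the count of possible pairings from 45 to 4950, which is the gap a silo never gets to exploit.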

I think this is really at the heart of why people are struggling to find lasting value in mashups, especially map mashups.  You end up with a lot of proprietary data silos that are not connected.  I can look at tons of different map mashups with different data, but there is no way to combine them to solve problems.  The more data you can interconnect, the more problems you can solve, and the emphasis and value come from the technology and not from walled gardens of overpriced proprietary data.


A heckuva lot is being written these days about the blockbuster or lackluster fundraising performance of the presidential hopefuls, as well as, in the cases of Obama and Clinton, the numbers of individuals giving to their campaigns. It seems strange, though, that little has been said about where the individual support for each candidate is coming from.

The FortiusOne data team decided to see what they could find on this question, and discovered some data from the FEC on total campaign contributions raised from individuals, by zip code, in the fourth quarter of last year.

After a less-than-straightforward processing effort, the team created a shapefile from this data and loaded it into GeoCommons, where the GIS-challenged like myself can then have at it. I created a couple of heatmaps to see from where Republican candidates John McCain and Rudy Giuliani are drawing their grassroots support.

Not surprisingly, the map below shows that most of Giuliani’s individual contributors for Q4 ’06 live in New York City, with another cluster in Dallas.
Individual Contributions - Giuliani’s Campaign Q4 2006

McCain’s Q4 ’06 individual support, in contrast, is coming from a much more geographically diverse base, as you can see on the next map.
Individual Contributions - McCain’s Campaign Q4 2006

We have just begun to scratch the surface of the FEC data, and are hoping to get maps of Clinton, Obama and Romney’s individual contributor support bases next. It will be cool to look at these alongside the demographics of their support bases too. So, stay tuned for more on the geography of campaign finance!


The release of Google My Maps caused quite a stir in the office and on the blogs, not to mention calls from investors asking if this was trouble for us, and how exactly we are different. Considering the buzz on the blogs, that was a fair question to ask. From GigaOM: “The consequences of Google’s announcement could be quite dire for a gaggle of map mashup start-ups including Platial, Frappr, Flagr and Plazes, to name a few — that have raised millions of dollars in venture capital.”

While it is a very good question, what I found far more interesting was a sentence at the end of Om’s post:

“Google, like its peers, is realizing that in the future when digital content explodes exponentially, context will become more important. Especially, when it comes to local search. MyMaps are a quick way to provide some context. It will only be a matter of time before these Google-hosted map mash-ups start showing up next to local search results.”

The concept of how you provide context to maps and local search has been one of our driving missions here. It was also the basis of our last blog post about how our approach to delivering context to maps had been tagged as a Web 3.0 thing (which we are still not so sure about). Needless to say, we were pretty stoked about “context” being part of the future for local search, which is a nice segue to describing how we are different from My Maps. The big difference is that we are trying to organize and make consumable the world of geospatial data, and to creatively combine that with user generated data like what is being mashed up in My Maps.

One of the tough things with geospatial data is that it is technical and not user friendly. While it often provides great context about a location it usually needs context for your average user to understand it in the first place. Mookie has built a great little widget to do this which we’ll be offering up as an API for developers to play with in the near future. Probably the easiest way to explain it is with an example.

Let’s say that a user is interested in knowing about the air quality for a location. They may live there, be planning on visiting or moving there, or just want to compare two places to see which has the better air to breathe. EPA has some great data on air quality, but it is fairly terrible to work with, if you can even find the GIS data to download. On GeoCommons you just type in “air quality” or “air” etc. and it pops up. With Mookie’s new widget you can go beyond just making a map to creating an intelligent contextual description of the data. For instance, if a user chooses to look at average air quality index and clicks on Denver or types in a Denver address, they would get:

“The mile high city - Denver, CO

The Median Air Quality Index (AQI) is 40. The Median Air Quality Index (AQI) is a measure of how unhealthy the air is in the area. High AQI values are bad. This index was produced by the EPA. This value is average (66th percentile).”

The user now has a short story describing the air quality in Denver and context to understand it in. It goes towards our metaphor (Minh’s actually) of building technology that enables people to tell stories with maps. It also goes back to allowing people to create contexts about locations they are interested in. In this case we are just describing the context of one variable of one data set, but the system is built to allow people to explore a wide range of contexts for all sorts of user defined locations. This is our take on the future of local search, and My Maps is a definite reminder of the need to keep innovating and to avoid commodity mashups that can easily be duplicated. If anyone happens to be out at Location Intelligence in San Francisco next week, we’ll be showing off some of the technology, so please stop by and give us some feedback.
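
For the curious, the Denver description above reads like a filled-in template, so here is a minimal sketch of how that kind of text could be assembled from a value, a bit of dataset metadata and a percentile rank. This is not Mookie’s widget or its API; the function names, the argument list and especially the low/average/high cut-offs are assumptions made up for illustration.

```python
def ordinal(n):
    """Format an integer as 1st, 2nd, 3rd, 4th, ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def rank_label(percentile):
    """Turn a percentile into a plain-English label (cut-offs are hypothetical)."""
    if percentile < 33:
        return "low"
    if percentile <= 67:
        return "average"
    return "high"


def describe(name, value, meaning, guidance, source, percentile):
    """Build a short contextual description of one variable at one location."""
    return (
        f"The {name} is {value}. "
        f"The {name} is {meaning}. "
        f"{guidance} "
        f"This index was produced by the {source}. "
        f"This value is {rank_label(percentile)} ({ordinal(percentile)} percentile)."
    )


print(describe(
    name="Median Air Quality Index (AQI)",
    value=40,
    meaning="a measure of how unhealthy the air is in the area",
    guidance="High AQI values are bad.",
    source="EPA",
    percentile=66,
))
```

The same template would work for any variable in any data set, as long as the data set carries a sentence or two of metadata about what the number means.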
