Chiming in on KML Metadata

April 23rd, 2007 by Chris Ingrassia

Everyone else is doing it, so I thought I might as well jump on the bandwagon with everyone’s new favorite geodata topic: KML Metadata.

I’m going to try to keep this short and sweet, since so many other people are already commenting on it, notably the folks at Platial, who propose its use for attribution information. There’s also Google’s knowledge base article on the topic, although I think Paul Ramsey summed up the various opinions and comments on the matter well in his recent blog post.

Long story short, yes, there are real issues that need to be addressed, but let’s think before this turns into a monster. KML was already becoming the standard way to store and transport geodata around the web, and its recent introduction into the OGC standards process has further cemented that. One of the reasons that FortiusOne ultimately decided to support KML import in the short term, instead of GML or the GML Simple Features profile, was that it was simple. It was one file that you needed for import, and you didn’t need to pull down a separate application-specific schema for more complex files.

KML’s Schema element handles this nicely.  The only restriction is that, as of right now, according to the documentation: “… the only value for parent is ‘Placemark.’”

This works wonderfully for our purposes, namely importing geodata with attribute structures on each individual record. It does not explicitly solve the attribution issue, and, frankly, I think Platial’s suggestion to use the Metadata element is a sound one. For something as universal as attribution information, a top-level element or elements to handle it explicitly might be called for, but whatever gets the job done, really.

What sort of scares me is in Google’s own example.  By their definition, the contents of the Metadata element are ignored by the viewer, but preserved in the file otherwise.  And their example is the tagging of air temperature data (using the ObsKML schema) to a Placemark with a polygon geometry inside of it…. huh?  I don’t know if this was just some example that got pulled out of a hat and was not intended to be overly serious, but please, please, nobody start doing that.

As it stands right now, things aren’t exactly perfect in the world of geodata and geospatial analysis, but let’s face it, they could be a lot worse. A lot of us, FortiusOne included, are trying to make things better and open things up in such a way that these sorts of problems can be approached by as many people as possible.

Starting a trend wherein the de-facto standard is for publishers to use their own application-specific derivation of the KML schema in such a way that the application-specific pieces of the data are inherently incomprehensible to generic data viewers doesn’t help a whole lot of people, IMHO.  Right now, I can do something like:

[other kml goes here …]
<Schema name="City" parent="Placemark">
  <SimpleField name="Name" type="string"/>
  <SimpleField name="Population" type="int"/>
</Schema>
<City>
  <Name>Nowheretown</Name>
  <Population>3</Population>
</City>
[other kml goes here …]

And what I will get is:

  • Structured data that a generic KML consumer like GeoCommons can use in a productive way
  • Something that still shows up properly in Google Earth (you even get nice little lists with the field names and their values)
  • A single file that defines what little custom schema it needs
  • Something that is SIMPLE
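To make the first bullet concrete, here is a minimal sketch of how a generic consumer might read the City snippet above using nothing but the Schema declared in the same file. It is not how GeoCommons actually does it. The `<kml><Document>` wrapper, the `CONVERTERS` table, and the `read_records` and `local` helpers are all assumptions made for illustration, and a real file would also carry a KML namespace and geometry.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the City snippet from the post.
KML = """<kml>
  <Document>
    <Schema name="City" parent="Placemark">
      <SimpleField name="Name" type="string"/>
      <SimpleField name="Population" type="int"/>
    </Schema>
    <City>
      <Name>Nowheretown</Name>
      <Population>3</Population>
    </City>
  </Document>
</kml>"""

# Assumed mapping from SimpleField type names to Python converters;
# unknown types fall back to str.
CONVERTERS = {"string": str, "int": int, "float": float, "double": float}


def local(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_records(kml_text):
    """Return one dict per element whose tag matches a declared Schema name."""
    root = ET.fromstring(kml_text)

    # First pass: collect each Schema and the declared type of its fields.
    schemas = {}
    for el in root.iter():
        if local(el.tag) == "Schema":
            fields = {}
            for child in el:
                if local(child.tag) == "SimpleField":
                    convert = CONVERTERS.get(child.get("type"), str)
                    fields[child.get("name")] = convert
            schemas[el.get("name")] = fields

    # Second pass: turn every schema-typed element into a typed record.
    records = []
    for el in root.iter():
        fields = schemas.get(local(el.tag))
        if fields is None:
            continue
        record = {"schema": local(el.tag)}
        for child in el:
            name = local(child.tag)
            if name in fields:
                record[name] = fields[name]((child.text or "").strip())
        records.append(record)
    return records


print(read_records(KML))
# → [{'schema': 'City', 'Name': 'Nowheretown', 'Population': 3}]
```

The point is that the consumer never needs to know what a City is ahead of time: the field names and types travel with the data in the same file.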

Key word there: SIMPLE. Yes, there probably is a place for more structured and complex application schemas, but those are really the exception to the rule in my mind. Don’t we want to stick with something that solves as many problems for as many people as is humanly possible without making everyone’s lives harder?

There are still many uses for something like the KML Metadata tag, but if everyone is going to benefit, let’s be very wary of making things unduly complex.

Flexibility is good, but just because you can do something doesn’t mean you should. My car has a luggage rack that might make a nice mounting point for some JATO rockets, and using it that way could potentially cut my commute time down. However, it probably isn’t a terribly great idea for me to do so in the grand scheme of things. (editor’s note: apologies if that example was a little nonsensical and out of left field, but I really just wanted to work a rocket car into this somehow)

After all, what good is publishing your data if the only ones who can read it are you and whatever person or machine had to explicitly learn how?

Just my $0.02, flame away, I’m off to find some JATO rockets :)


I spent the last four days out in San Francisco at the Location Intelligence conference, which was a great time, but I wish I could have done the Web 2.0 Expo as well. Lousy scheduling. While there were lots of interesting technologies presented, my biggest takeaway from the conference was the tension between traditional geospatial vendors and emerging Web 2.0 technologies and startups. Actually, Web 2.0 and mashups are the wrong labels, because now every vendor claims they are Web 2.0 and enabling “enterprise mashups”. Of course, you are using their proprietary closed application, with proprietary data, and not enabling any kind of open community discourse, but I digress.

The tension that was palpable at the conference was between open source things that are free and enterprise things that cost herds of money. While this is nothing new for software, the issue is now data. The fear of, and sometimes animosity towards, open source data really showed itself in the comments of enterprise people when asked about OpenStreetMap.org. For those of you not familiar, OpenStreetMap (OSM) is a project to provide free street data for web mapping apps. In the US we take free TIGER line data from the Census for granted, but everywhere else in the world you pay big bucks for street data, especially in the UK, where OSM started. OSM has turned into a thriving community rapidly mapping large chunks of geography with quality street map data.

When the enterprise folks were asked about OSM, they were dismissive in a rather hostile way, with statements like “30 inches matter in street maps and can be the difference between life and death. Do you want to trust open source data for this?” or “we spend millions of dollars creating the world’s most accurate data - you can’t duplicate that with a community of volunteers”. The entertaining response from the open source folks was “how many times has a proprietary street map led you to the wrong place, not had your house on it, missed a road, etc.?” and “how successful were you in getting that changed in the system?”

The interesting bit was the proprietary enterprise folks singing an O’Reilly mantra, “Data is the Intel inside”, for Web 2.0 and the future: “All those applications that use publicly available data are gonna be sunk.” While I agree that data and content are going to be huge drivers of value, I do not believe that walled gardens of proprietary data are going to be the future.

Data benefits from Metcalfe’s law just as any application or technology does. The power of one data set pales in comparison to a network of data sets that can be mashed together to create new datasets and relationships. Using a community to police data also has all the benefits of open source software. The big difference is that open source data creates a much larger community than open source software. The number of people who can use and look at data dwarfs the number who can work with code, and that creates a whole new game. Blogs and wikis started the two-way conversation, and there is no reason that, with some hard work and innovative technology, this cannot be extended to data. OSM is a great example of early success, as is Swivel. The acquisition of GapMinder by Google also points to this future. GapMinder is all about extracting public open source statistical data and making it available to the masses.

I think this is really at the heart of why people are struggling to find lasting value in mashups, especially map mashups. You end up with a lot of proprietary data silos that are not connected. I can look at tons of different map mashups with different data, but there is no way to combine them to solve problems. The more data you can interconnect, the more problems you can solve, and the emphasis and value shift to the technology rather than to walled gardens of overpriced proprietary data.


A heckuva lot is being written these days about the blockbuster or lackluster fundraising performance of the presidential hopefuls, as well as, in the cases of Obama and Clinton, the numbers of individuals giving to their campaigns. It seems strange, though, that little has been said about where the individual support for each candidate is coming from.

The FortiusOne data team decided to see what they could find on this question, and discovered some data from the FEC on total campaign contributions raised from individuals, by zip code, in the fourth quarter of last year.

After a less-than-straightforward processing effort, the team created a shapefile from this data and loaded it into GeoCommons, where the GIS-challenged like myself can then have at it. I created a couple of heatmaps to see from where Republican candidates John McCain and Rudy Giuliani are drawing their grassroots support.

Not surprisingly, the map below shows that most of Giuliani’s individual contributors for Q4 ‘06 live in New York City, with another cluster in Dallas.
Individual Contributions - Giuliani’s Campaign Q4 2006

McCain’s Q4 ‘06 individual support, in contrast, is coming from a much more geographically diverse base, as you can see on the next map.
Individual Contributions - McCain’s Campaign Q4 2006

We have just begun to scratch the surface of the FEC data, and are hoping to get maps of Clinton’s, Obama’s and Romney’s individual contributor support bases next. It will be cool to look at these alongside the demographics of their support bases too. So, stay tuned for more on the geography of campaign finance!
