Structured Feature Data in KML, part one

June 6th, 2007 by Chris Ingrassia



With the official launch of GeoCommons at Where 2.0 last week, along with some of the recent buzz surrounding the OGC’s involvement in the future of KML, I thought it would be prudent to outline some of FortiusOne’s thoughts and concerns on some of these topics.

Specifically, I want to cover the use (or lack thereof) of structured feature data in KML, how we’re approaching the problem now, our short-term suggestions for maintaining the KML spec, and some of our ideas for the future.

Since there is a lot to cover, this is the first post in what will likely turn into a two or three part series on the topic.


What is a feature, anyway?

Off the bat, let me provide my simple definition of a “feature” in the context of these posts in the hopes of avoiding an argument over what the canonical definition of a geographic feature is.

I am defining a feature as, simply, a location with attributes.  So, something like “POINT(0 0)” is a location, while something like “POINT(0 0)” + “Cool factor = 100” would be a feature.

Why do we need features?

Simply stated, you need that information in a format that can be quickly and easily parsed by a machine (that “machine” part is important, more on that in a bit) to help you dig deeper into your locational data.  This is one of the fundamental concepts that GeoCommons is built around.  Knowing where a set of things is located is step one.  Knowing why they’re there and how they relate to other locations, so that you can reach a meaningful conclusion, learn something you didn’t know before, or make a decision, usually requires a bit more information.

Haven’t other people been doing this for a while?

Well, yes and no.  There are several KML files floating around that have attribute information packed into the description field.  Unfortunately, these are written more for presentation than for indexing and analysis.  In general they are not uniform across multiple KML files, and they are not machine-parsable in a generic fashion.

Another common practice is to use the z (altitude) coordinate in 3D space to rank your features by a particular attribute.  I have to give credit where credit is due: this is a pretty clever way of classifying your locational data, specifically for display in Google Earth.  There are several downsides to this, though.  Namely:

  • If you want to classify by multiple attributes, you’ll need to generate separate KML files, or duplicate your geometries in other Placemarks/folders.
  • If your data is actually three-dimensional, this technique won’t be terribly effective.
  • It’s difficult for a machine to determine, in a generic way, just what that z coordinate means.  It’s great for me as a human because I can see a fairly intuitive 3D representation of locations, read the description to figure out what it’s ranked by, and go on my merry way.

That “machine” part is especially important, particularly for us: when a machine can’t tell what the data means, it is considerably harder to extract information from your geodata for any sort of deeper analytical task.

The presentation problem

The examples I outlined above bring up a very significant point.  Embedding data beyond locations inside of KML has taken a very presentation-oriented slant in the larger community of KML publishers.  This isn’t surprising, and indeed speaks to the power and ease of use that Keyhole, and later Google, brought to the table in the form of Google Earth, Google Maps, and the KML spec.

I do not think that this aspect should be compromised or downplayed at all.  The presentation/styling information embedded in KML is not of great immediate concern to what we’re doing at FortiusOne, and specifically with GeoCommons, but that’s not to say it never will be.  The beauty of the communities that have sprung up around Google Maps and Google Earth and are publishing KML is that geographic information can be portrayed in a very intuitive, visually appealing way.

It’s a bit outside the scope of this initial post, but in my next one I plan on offering some suggestions for how the data and presentation could be unified in a way that makes it relatively easy for everyone (hopefully).

The Schema tag is dead.  Long live the Schema tag!

Luckily enough, there is a great way of dealing with feature data in KML right now, through the use of the KML schema tag.  Let me just come straight out and say that I think the schema tag is a thing of beauty.  It is pretty much the ideal way of representing structured feature data in KML without having to give up too much in return.  And this largely has to do with two key points that are crucial to the success of any further standardization efforts in this context:

1. It’s EASY
2. Everything is in one file, which doesn’t require me to pull additional resources or have prior knowledge of the custom structure, which makes it EASY
3. Did I mention it’s easy?

And by easy, I mean both for humans and machines.  As a regular human being, it’s pretty easy for me to open up a KML file with a text editor and add in schema information.  Or, if I’m a slightly less regular human being who perhaps writes a bit of code on the side, generating and parsing schema-KML is pretty simple.  No external schema files to pull in/validate, the syntax is simple and intuitive, and it gives me just about everything I’m probably going to need the vast majority of the time.

And I really can’t stress enough how important ease of use is.  It is definitely worth it to me to sacrifice some flexibility or extra whiz-bang features to have a format that is easily consumable and usable by as many people as possible.  Without it, the barriers to entry and adoption rise for the larger community, and that’s not good for us, and I don’t think it’s good for anyone interested in publishing or using geographic data quickly and easily.  If the “standard” way of doing things is too complex or doesn’t make sense to the larger community, this will lead to two things, in my opinion:

1. Nobody will use it
2. Somebody will come up with their own standard that is easier to use, and you’re going to end up with the usual dichotomy of the “serious” folks with their one format to rule them all that only a handful of people use and nobody else feels particularly inclined to deal with, and the leotard-wearing neogeographers who will independently come to some sort of consensus on a better way of doing it that will be made fun of by the other camp, but will ultimately see larger and quicker adoption.

For those of you unfamiliar with the use of the Schema tag in KML, it simply allows you to extend the Placemark element to add your own fields.  So, if I wanted to publish a KML file with feature data relating to my albino pigeon research, I could do this in KML thusly:

<kml …>
<Document>

<Schema parent="Placemark" name="AlbinoPigeon">
  <SimpleField name="Freakiness" type="int"/>
  <SimpleField name="email" type="string"/>
</Schema>
<AlbinoPigeon>
  <name>Bob</name>
  <description>Bob, a moderately freaky albino pigeon who also has an email address for some reason</description>
  <Freakiness>12</Freakiness>
  <email>bob@albinopigeon.org</email>
  <Point>
    <coordinates>-77.065285,38.904343,0</coordinates>
  </Point>
</AlbinoPigeon>

</Document>
</kml>

And I now have basically your plain, average KML file suitable for viewing in Google Earth or sharing with other albino pigeon fanciers, with the added side benefit that it is now easy for a machine to extract some of the feature data to perform some sort of analysis on albino pigeon freakiness (and who wouldn’t want a heatmap of albino pigeon freakiness?), or, I could relatively easily write a script to pick out the top 10 freakiest albino pigeons and send them emails (they probably wouldn’t respond, though).
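
To give a rough idea of what such a script might look like, here is a minimal Python sketch. It is not how GeoCommons does it. It assumes the KML 2.1 namespace URI on the root element, adds a second, entirely hypothetical pigeon named Alice so there is something to rank, and maps only a small assumed subset of SimpleField types. The names read_features, top_n, local, and CASTS are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document. The namespace URI and the second pigeon (Alice) are
# assumptions added for illustration; only Bob comes from the post.
KML = """<kml xmlns="http://earth.google.com/kml/2.1">
<Document>
<Schema parent="Placemark" name="AlbinoPigeon">
  <SimpleField name="Freakiness" type="int"/>
  <SimpleField name="email" type="string"/>
</Schema>
<AlbinoPigeon>
  <name>Bob</name>
  <Freakiness>12</Freakiness>
  <email>bob@albinopigeon.org</email>
  <Point><coordinates>-77.065285,38.904343,0</coordinates></Point>
</AlbinoPigeon>
<AlbinoPigeon>
  <name>Alice</name>
  <Freakiness>47</Freakiness>
  <email>alice@albinopigeon.org</email>
  <Point><coordinates>-77.03,38.89,0</coordinates></Point>
</AlbinoPigeon>
</Document>
</kml>"""

# Assumed subset of SimpleField types; anything else is kept as a string.
CASTS = {"int": int, "float": float, "string": str}


def local(tag):
    """Reduce '{namespace}Name' to 'Name' so namespaced and bare tags match."""
    return tag.rsplit("}", 1)[-1]


def read_features(kml_text):
    """Return one dict per element whose tag matches a declared Schema name."""
    root = ET.fromstring(kml_text)

    # Pass 1: collect every Schema and the type of each of its SimpleFields.
    schemas = {}
    for el in root.iter():
        if local(el.tag) == "Schema":
            schemas[el.get("name")] = {
                sf.get("name"): CASTS.get(sf.get("type"), str)
                for sf in el
                if local(sf.tag) == "SimpleField"
            }

    # Pass 2: turn each schema-typed element into a flat feature dict.
    features = []
    for el in root.iter():
        fields = schemas.get(local(el.tag))
        if fields is None:
            continue
        feature = {"schema": local(el.tag)}
        for child in el:
            name = local(child.tag)
            if name in fields:
                feature[name] = fields[name]((child.text or "").strip())
            elif name == "name":
                feature["name"] = child.text
            elif name == "Point":
                coords = next(c for c in child if local(c.tag) == "coordinates")
                lon, lat, *_ = (float(v) for v in coords.text.strip().split(","))
                feature["lon"], feature["lat"] = lon, lat
        features.append(feature)
    return features


def top_n(features, field, n=10):
    """Rank features by one custom field, highest value first."""
    return sorted(features, key=lambda f: f[field], reverse=True)[:n]
```

Nothing in read_features knows anything about pigeons: the field names and types come from the Schema element in the same file, which is the whole point. Calling top_n(read_features(KML), "Freakiness") ranks Alice ahead of Bob, and the email field is right there in each dict if you insist on writing to them.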

Now, as I said, I think the Schema tag is a thing of beauty, because it makes it so simple to leverage the extensive communities already publishing locational data in KML without requiring anyone to do too much more work than they already are.  But, I also realize it isn’t perfect.  Google Earth will display your custom fields in the markers that get displayed in a Schema-based KML file, but it’s still up to you to make a pretty-looking description if you want one, and if you want that description to include some of the data in your custom fields, you’ll have to duplicate the information, which isn’t ideal.
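
If you are generating the KML from code anyway, the duplication can at least be automated. The following Python sketch writes each custom value twice, once as its own element and once into a plain-text description. The add_feature helper and its description format are assumptions made for this example and are not part of any KML tooling.

```python
import xml.etree.ElementTree as ET


def add_feature(doc, schema_name, name, lon, lat, attrs):
    """Append one schema-typed feature to doc, mirroring attrs into <description>."""
    feature = ET.SubElement(doc, schema_name)
    ET.SubElement(feature, "name").text = name
    # Human-readable copy of the custom fields, for display purposes.
    ET.SubElement(feature, "description").text = ", ".join(
        f"{key}: {value}" for key, value in attrs.items()
    )
    # Machine-readable copy: one element per custom field.
    for key, value in attrs.items():
        ET.SubElement(feature, key).text = str(value)
    point = ET.SubElement(feature, "Point")
    ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return feature
```

For example, add_feature(ET.Element("Document"), "AlbinoPigeon", "Bob", -77.065285, 38.904343, {"Freakiness": 12}) yields an AlbinoPigeon element whose description and Freakiness child carry the same value. The data is still stored twice in the file; the script only saves you from typing it twice.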

However, given that this exists right now, I see it as a perfectly valid short-term solution, and would very, very much like to see it not deprecated, or at least not removed from whatever the next version of the KML spec ends up being (either from Google, the OGC, or both).

Other concerns and suggestions

There are other things that are not completely ideal with the Schema tag solution.  For one thing, as far as I know, GeoCommons is the only application around that is actively using it and publishing Schema-based KML (if there are others, I would love to know about them, please comment).  This makes getting data that is already in this form a bit more difficult, but one of the things I would like to achieve here is trying to convince people that something along the lines that I’ve just been talking about is valuable right now, and is worth using more extensively either because of the sorts of things it allows you to do with your data in GeoCommons, or in other applications that I hope will spring up independently or around GeoCommons.

Additionally, you are somewhat limited in the complexity of your custom tags.  You are limited to simple, flat structures, and the only information you’re going to know about your custom fields is what you’ve named them, and what type they are.

I would argue that this covers the vast majority of what you really need initially.  But I also realize that sometimes you do need a little more.  My suggestion would be to turn away from the path of requiring externally defined custom schemas/namespaces to be pulled into KML, as this breaks the whole “it’s easy”/ease of adoption/only have to parse a single file ideal.

What would perhaps work better is to reserve external schemas, and the extra legwork they require, for the cases where you need or want to convey more information, context, or meaning than is possible in a simple Schema tag-like structure.

So, for example, don’t do this (taken from a Google developer knowledge base article):

<Placemark>
    <name>Ocean Temperatures off Marin, CA</name>
    <Metadata>
      <obsList>
        <obs>
          <obsType>air_temperature</obsType>
          <uomType>celsius</uomType>
          <value>19</value>
          <elev>1</elev>
        </obs>
        <obs>
          <obsType>water_temperature</obsType>
          <uomType>celsius</uomType>
          <value>13</value>
          <elev>-10</elev>
        </obs>
      </obsList>
    </Metadata>
    <Polygon>

If I wanted to use this data in a generic geodata related application (say… GeoCommons), it would require the software parsing it to know ahead of time what obs was, and would require custom parsing to be done for each separate type of metadata structure.  Yes, if you embed a link to the obs schema in your document, use XML namespaces, etc., it is technically possible for a machine to extract at least most of the information automatically.  But this is an awful lot of extra work to be going through, especially for a casual developer or KML publisher, when most of what I really need to get across is probably simply that the air temperature at that point in the ocean is 19 degrees Celsius, and the water temperature is 13 degrees Celsius.
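
To make the “custom parsing” point concrete, here is a hedged Python sketch of what a consumer would have to write just for this one structure. The geometry is left out of the sample because the excerpt above stops at the Polygon tag, and the flatten_obs function and its output key names are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# The Metadata portion of the excerpt, without the truncated geometry.
PLACEMARK = """<Placemark>
  <name>Ocean Temperatures off Marin, CA</name>
  <Metadata>
    <obsList>
      <obs>
        <obsType>air_temperature</obsType>
        <uomType>celsius</uomType>
        <value>19</value>
        <elev>1</elev>
      </obs>
      <obs>
        <obsType>water_temperature</obsType>
        <uomType>celsius</uomType>
        <value>13</value>
        <elev>-10</elev>
      </obs>
    </obsList>
  </Metadata>
</Placemark>"""


def flatten_obs(placemark_xml):
    """Flatten obs metadata into plain attributes.

    Every path and tag name below is specific to this one metadata layout;
    a different Metadata structure would need its own function.
    """
    placemark = ET.fromstring(placemark_xml)
    attrs = {}
    for obs in placemark.iterfind("Metadata/obsList/obs"):
        kind = obs.findtext("obsType")
        attrs[kind] = float(obs.findtext("value"))
        attrs[kind + "_uom"] = obs.findtext("uomType")
        attrs[kind + "_elev"] = float(obs.findtext("elev"))
    return attrs
```

The function is short, but the knowledge it encodes (that obsType names the attribute, that value holds the reading, that uomType is the unit) lives in the code rather than in the file, so it can’t be reused for the next publisher’s metadata.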

Instead, I would suggest a less intrusive use of the flexibility that XML and XML namespaces can give us:

<Schema parent="Placemark" name="OBS" xmlns:obs="http://carocoops.org/obskml/1.0.0/obskml_simple.xsd">
  <SimpleField type="int" name="air_temperature" obs:type="air_temperature"/>
  <SimpleField type="string" name="uom" obs:type="uomType"/>
  <SimpleField type="int" name="elev" obs:type="elev"/>
  <!-- water_temperature stuff left out because I’m too lazy to type it -->
</Schema>
<OBS>
  <air_temperature>19</air_temperature>
  <uom>celsius</uom>
  <elev>1</elev>
</OBS>

That’s not a perfect example, but what it does give you is a reference back to additional information about data that falls under the obs domain if you want it, and is not terribly far off from what we’re doing now.  I do not think it’s the ideal long-term solution, either (more suggestions on that in the next post).  However, I think it’s a good starting point for discussions of how to keep everyone happy.
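
For a consumer that does care about the obs domain, picking up that reference takes only a few extra lines. This Python sketch reads the namespaced obs:type attribute alongside the ordinary name and type; the describe_fields function and the obs_type key are assumptions made up for the example, and a generic reader could simply ignore the extra attribute.

```python
import xml.etree.ElementTree as ET

OBS_NS = "http://carocoops.org/obskml/1.0.0/obskml_simple.xsd"

SCHEMA = f"""<Schema parent="Placemark" name="OBS" xmlns:obs="{OBS_NS}">
  <SimpleField type="int" name="air_temperature" obs:type="air_temperature"/>
  <SimpleField type="string" name="uom" obs:type="uomType"/>
  <SimpleField type="int" name="elev" obs:type="elev"/>
</Schema>"""


def describe_fields(schema_xml):
    """Map each SimpleField name to its declared type and its obs:type hint."""
    schema = ET.fromstring(schema_xml)
    return {
        sf.get("name"): {
            "type": sf.get("type"),
            # ElementTree exposes a prefixed attribute as '{namespace-uri}local'.
            "obs_type": sf.get(f"{{{OBS_NS}}}type"),
        }
        for sf in schema.iter("SimpleField")
    }
```

Here describe_fields(SCHEMA)["uom"] comes back as {"type": "string", "obs_type": "uomType"}: the flat field is usable on its own, and the pointer back into the obs vocabulary is available to anyone who wants to follow it.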

And that’s it for the first post in the series.  In my next one, I will offer some additional suggestions and thoughts on longer-term solutions to some of the issues mentioned above, and go into more detail on some of the topics I glossed over in this post.

I would encourage anyone who made it this far to share thoughts and comments on anything I just said.  These are just my thoughts on the topic in the context of what we’re doing here at FortiusOne, and I would be very interested to hear others’ perspectives.

7 Responses to “Structured Feature Data in KML, part one”

  1. Cam W. Says:

    KML has always struck me as the PowerPoint of geo-spatial data exchange. It’s an outstanding format to use when you want to ‘present’ data to others quickly and easily. If you want to exchange data you might be better off with a different data format. To further the analogy would you store tables of data in a PowerPoint file? You could, but I wouldn’t recommend it.

    The million dollar question is what data format you should use to store the data. I really think GeoRSS (another format working its way through the OGC) is better suited to the job, but I fear it has fractured into too many sub-formats already. I believe there are something like 6 different versions. I’m curious to see what you would like to see for a long term solution.

  2. chris Says:

    Thanks for the feedback, Cam.

    Some of the things I’m writing up to propose as better long term solutions lie somewhere in the middle of fully structured geodata and things like GeoRSS or just plain ‘ol XHTML in the description field. I hope to have that post written up in the next couple days and hopefully have it spark some interesting discussion.

    I actually agree with you fully re: KML being the “powerpoint of geo-spatial data” — but that’s kind of the kicker. At least for our purposes, we need something that is widely accepted and used, otherwise we’re asking our users to convert their data into something that they might not have used otherwise, and the more work that is entailed, the harder it is to see widespread use and adoption, and we do really feel that making the data and the analysis as open and accessible to as many people as possible benefits everyone.

    I guess the thing to walk away with from that post is essentially “Presentation is good, feature data is good, leveraging a widely-adopted format like KML has many benefits, and we’re not too far off from having perhaps not the best of both worlds, but quite enough to do 90% of what we need 90% of the time quickly and easily.”

  3. Richard K Says:

    Chris,

    Bravo on your statement: “it’s easy/ease of adoption/only have to parse a single file ideal.” If the OGC gets it right, I can see KML becoming the SINGLE file that could replace all the files that make up a shapefile, that ubiquitous GIS format all GIS pros are familiar with.

    The .SHP, .DBF, .SBN, .SBX, .SHX, .PRJ, .XML, .AVL, and .LYR. Just imagine it! Geometry, attributes, the index between the two, metadata, and symbology all wrapped into one! Talk about REAL geopublication!

    Now the geodatabase may bring this eventually, and I believe it’s been deemed an “open” format, but I have yet to see anything of real “WOW” potential.

    I’ll be following your posts. Keep up the good work.

  4. Ron Lake Says:

    Hi,

    This is a most cool discussion. Please see the blog rollover at http:www.geoblog.org.

    We are evaluating putting GML inside the KML Metadata tag as a means of structuring the content. Very simple approaches are suggested to start with, i.e. GML Application Schemas that are flat (e.g. simple properties) and which may not even use geometry or other core elements of GML unless you want them. Why do this? Well, it constrains the XML Schema so that it is very simple: GML schemas (hence the data) just look like named property lists.

    3
    gravel

    This would easily ride along as a KML payload and like KML also can use xlink:ref to reference other content.

  5. Ron Lake Says:

    Last blog did not work – let me fix that:

    The escaping mechanism for angle brackets eluded me ..

    So that 3 gravel should read without the angle brackets.

    Road gml:id=”t21″
    numberLanes 3
    surfaceType gravel

    AND I mean including GML within the Metadata tag.

    Sorry for the creaky posting!

  6. Ron Lake Says:

    NOT A GOOD DAY, but the GOOD NEWS is that the GeoWeb 2007 (http://www.geoweb.org) blog roll is up at http://www.geowebblog.org.

  7. Random Nodes » I Heart KML Schema!: Jason Birch's geospatial ramblings Says:

    [...] Chris over at FortiusOne has written a couple articles, focusing largely on the Schema tag in KML. One thing in particular from this article got [...]
