Structured Feature Data in KML, part two

June 27th, 2007 by Chris Ingrassia




In my last post I started making a case for maintaining a simple way of publishing simple feature data in KML and talked up the KML Schema tag quite a bit.

The key requirements I had for any such endeavor going forward were:

  • Simplicity
  • Everything you need to get the basic feature data out should be in one file
  • Simplicity
  • We shouldn’t discount the benefits of being able to embed styling logic into a geographic data interchange format
  • Simplicity

In this post, I intend to put forth a suggestion that I think represents a good combination of the points outlined above, and still allows KML publishers and consumers to go about their business as usual if they have no need for more complex feature data in their documents. Additionally, I think the sort of approach I’m about to suggest could have wide-ranging applications across multiple other formats as well, which is an added benefit.

Read past the jump for the full writeup, but my proposal centers largely around the use of microformats.


I’m pretty new to microformats myself, but have been keeping an eye on them for a while. In short (and I trust any of the more informed members of the microformat community out there to correct me if I’m getting this wrong), microformats provide a way to embed structured data in standard formats, primarily through the extension of the HTML schema. Or, to use an excellent phrase from the official “About microformats” page: “Designed for humans first and machines second…,” which I quite like. I encourage anyone who hasn’t already to go check out the content on their site; there’s quite a bit of it, and I found it to be very informative and well written.

There is actually already a geo microformat, which is primarily for the representation of spatial coordinate data (as far as I can tell), and ties in nicely with the relatively widely used hCard microformat.

As I’m talking specifically about the representation of feature data in KML here, the coordinate data part is pretty much taken care of in this context, but would still be extremely useful in other contexts, or if you wanted to copy the description of a KML feature into a non-KML document, although such examples are probably outside the scope of this post.

So why microformats? Well, some of the talented engineers here at FortiusOne and I have been thinking about this problem for a while. Specifically, we want to let the vast community of existing KML publishers add more to their data without completely changing the way they operate. The existing tools that consume KML (Google Earth, and I think I remember hearing about a little thing called GeoCommons that might do something with it as well) should keep working without any modification, and should need only slight modification (hopefully) in order to support the richer format.

This sort of approach could also still work with data that requires more complex structure, as you can still embed it at a different level of the document to provide additional information or context to your features, but could still leverage the simpler representation as well so that your data could be consumed by the widest variety of clients possible.

So, enough chit-chat, let me dive into some examples. Again, I hope that anyone more in the know than myself can jump in and correct me on any of the finer (or perhaps not so fine) points in here that I might be off on. I would very much like to solicit some feedback from the community on this and publish it as a new microformat specification, but I want to make sure I’m doing things correctly according to best practices/style/etc., as well as making sure that there’s at least a relatively good chance that somebody will use it if given the opportunity, otherwise this is all for naught.

A quick, simple example

As a hypothetical example, let’s say I had a list of the locations of people who expected the Spanish Inquisition (this is obviously hypothetical, since nobody expects the Spanish Inquisition) that I wanted to publish in KML.

I might start off with something like this:

…
<Placemark>
<name>Nobody</name>
<description>
This person was expecting 3 inquisitors, led by Cardinal Richelieu to employ their chief weapons of surprise, fear, and ruthless efficiency. The use of the comfy chair was also a possibility.
</description>
<Point>
<coordinates>-74.684669,40.498951</coordinates>
</Point>
</Placemark>

…

Now, if I wanted to use the KML Schema format to do so, it might be something like:

<Schema name="ExpectsInquisition" parent="Placemark">
<SimpleField name="NumInquisitors" type="int"/>
<SimpleField name="Leader" type="string"/>
<SimpleField name="ChiefWeapons" type="string"/>
<SimpleField name="TortureDevice" type="string"/>
</Schema>
<ExpectsInquisition>

<name>Nobody</name>
<description>
This person was expecting 3 inquisitors, led by Cardinal Richelieu to employ their chief weapons of surprise, fear, and ruthless efficiency. The use of the comfy chair was also a possibility.
</description>
<NumInquisitors>3</NumInquisitors>
<Leader>Cardinal Richelieu</Leader>
<ChiefWeapons>surprise, fear, ruthless efficiency</ChiefWeapons>
<TortureDevice>comfy chair</TortureDevice>
<Point>
<coordinates>-74.684669,40.498951</coordinates>
</Point>

</ExpectsInquisition>

Which is all fine and good, and works today. If you had a collection of such features, you could upload them to GeoCommons right now and share maps with everyone based around the sorts of locations where people were expecting the Spanish Inquisition.

As I mentioned before, I love the schema tag. It makes things easy, you can parse everything in a single file, and it’s there and working right now.
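To make the “parse everything in a single file” point a bit more concrete, here is a rough Python sketch of how a consumer might read the Schema declaration and then pull typed values out of the custom elements. Treat it as an illustration under assumptions, not as how Google Earth or GeoCommons actually do it: the fragment is namespace-free and leaves out the Point, the wrapping Document element, the `CONVERTERS` table, and the `read_schema_features` helper are all names I made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free document modelled on the fragment above
# (the Point geometry is left out to keep the sample short).
KML = """\
<Document>
  <Schema name="ExpectsInquisition" parent="Placemark">
    <SimpleField name="NumInquisitors" type="int"/>
    <SimpleField name="Leader" type="string"/>
    <SimpleField name="ChiefWeapons" type="string"/>
    <SimpleField name="TortureDevice" type="string"/>
  </Schema>
  <ExpectsInquisition>
    <name>Nobody</name>
    <description>This person was expecting 3 inquisitors, led by Cardinal Richelieu.</description>
    <NumInquisitors>3</NumInquisitors>
    <Leader>Cardinal Richelieu</Leader>
    <ChiefWeapons>surprise, fear, ruthless efficiency</ChiefWeapons>
    <TortureDevice>comfy chair</TortureDevice>
  </ExpectsInquisition>
</Document>
"""

# Assumed mapping from SimpleField type names to Python converters;
# only the two types used in the example are covered.
CONVERTERS = {"int": int, "string": str}


def read_schema_features(kml_text):
    """Return one dict per feature whose element name matches a declared Schema."""
    root = ET.fromstring(kml_text)
    features = []
    for schema in root.iter("Schema"):
        # Field name -> declared type, taken straight from the Schema tag.
        fields = {f.get("name"): f.get("type") for f in schema.findall("SimpleField")}
        # The Schema's name doubles as the element name of the features.
        for elem in root.iter(schema.get("name")):
            attrs = {}
            for field_name, field_type in fields.items():
                text = elem.findtext(field_name)
                if text is not None:
                    convert = CONVERTERS.get(field_type, str)
                    attrs[field_name] = convert(text.strip())
            features.append({"name": elem.findtext("name"), "attributes": attrs})
    return features


features = read_schema_features(KML)
```

Because the Schema comes first, the reader knows every field name and type before touching a single feature, which is the property I like about this approach.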

However, there are some problems with the above KML fragment. Namely, if you intend to use it as a presentation medium, and all of the text in your description tag is important for that purpose (even though not all of it consists of attributes that make up the feature), you’re duplicating data in your document, which is generally undesirable.

So, let’s try this with a microformat approach:

<Placemark>
<name>Nobody</name>
<description>
<div class="feature">
This person was expecting <span class="integer-value" title="NumInquisitors">3</span> inquisitors, led by <span class="string-value" title="Leader">Cardinal Richelieu</span> to employ their chief weapons of <span class="string-value" title="ChiefWeapons">surprise, fear, and ruthless efficiency</span>. The use of the <span class="string-value" title="TortureDevice">comfy chair</span> was also a possibility.
</div>
</description>

</Placemark>

And now I have my feature data embedded straight into the HTML of my description with only some minor modifications and no duplication of data. There are some additional design patterns that could be followed as well to take care of potentially more complex examples. Let’s say, for instance, that the value for “NumInquisitors” is the integer “3”, but I want the text to have it written out as “three.” In that case, if I’m understanding microformat and semantic HTML etiquette somewhat correctly, part of that previous example could be rewritten as:

<span class="integer-value" title="NumInquisitors"><abbr title="3">three</abbr></span>

Actually, the nested span/abbr there might not be the correct way of doing it, but it should be close.
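For what it’s worth, here is a rough Python sketch of how a consumer might pull the name/value pairs back out of a description marked up this way, including the nested abbr case. It leans on several assumptions of mine: the description is well-formed XHTML, the class attribute holds exactly one of the class names used in this post, and an abbr title (when present) carries the machine-readable value. The `CASTS` table and the `extract_features` helper are hypothetical names, not part of any microformat spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical description body, shortened from the example above and
# assumed to be well-formed XHTML so a plain XML parser can read it.
DESCRIPTION = """\
<div class="feature">
This person was expecting
<span class="integer-value" title="NumInquisitors"><abbr title="3">three</abbr></span>
inquisitors, led by
<span class="string-value" title="Leader">Cardinal Richelieu</span>.
</div>
"""

# Assumed mapping from the class names used in this post to Python types.
CASTS = {"integer-value": int, "string-value": str}


def extract_features(description_html):
    """Return one dict of attribute name -> typed value per div.feature."""
    root = ET.fromstring(description_html)
    features = []
    for div in root.iter("div"):
        if div.get("class") != "feature":
            continue
        attrs = {}
        for span in div.iter("span"):
            cast = CASTS.get(span.get("class"))
            if cast is None:
                continue  # not one of our value spans
            abbr = span.find("abbr")
            if abbr is not None and abbr.get("title") is not None:
                # Machine-readable value lives in the abbr title ("3"),
                # while the human-readable text stays as written ("three").
                raw = abbr.get("title")
            else:
                raw = "".join(span.itertext())
            attrs[span.get("title")] = cast(raw.strip())
        features.append(attrs)
    return features


features = extract_features(DESCRIPTION)
```

The human-readable sentence is left untouched; the parser only looks at the spans it recognizes and ignores everything else.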

And if you wanted to do some even crazier stuff, such as provide additional data/structure/etc. above and beyond simple name/value pairs for features, you should be able to use some combination of existing KML, or your own data packed into something like the Metadata tag, linked into the microformat pieces via id references and using the microformat include-pattern or something similar.

Really, you’re getting a whole lot of flexibility and power to expand the use of your data without giving up much at all, and not cutting off too many other paths.

Certainly, the examples above are not fully complete and probably not 100% correct within the context of microformat best practices, but I think this sort of approach definitely has a place in the future of geodata markup and conforms very well with the patterns already in use by the larger community.

One of the potential issues with the specific examples I gave above that I would expect somebody out there to call me out on is that the way I’ve written it, you won’t know for sure which attributes are present in the features until you parse all of them, or at least run some sort of XPath query over the document to figure it out. I probably don’t have as much of a problem with this as others might, but I can certainly see some very valid arguments against doing things this way. To solve that problem, a hybrid Schema/microformat approach might be called for, such that at the top level of the document I can define my schema using the Schema tag, and use that definition to more easily gain some knowledge about the feature structure I can expect later in the document. The problem I have with that is that if people get used to doing things that way, it becomes a little bit harder to extract microformat KML descriptions and place them elsewhere, even though it would still work if you didn’t require the schema to be predefined.
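To show what that “parse all of them” step might look like in practice, here is a small Python sketch that walks a whole document and collects every attribute name it finds, using the limited XPath subset that ElementTree supports. The two-Placemark document, the second feature “Somebody”, and the `discover_fields` helper are all invented for the illustration, and the sketch again assumes namespace-free KML with well-formed XHTML in the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two features that carry different attributes.
KML = """\
<Document>
  <Placemark>
    <name>Nobody</name>
    <description>
      <div class="feature">
        Expecting <span class="integer-value" title="NumInquisitors">3</span> inquisitors,
        led by <span class="string-value" title="Leader">Cardinal Richelieu</span>.
      </div>
    </description>
  </Placemark>
  <Placemark>
    <name>Somebody</name>
    <description>
      <div class="feature">
        Feared the <span class="string-value" title="TortureDevice">comfy chair</span>.
      </div>
    </description>
  </Placemark>
</Document>
"""


def discover_fields(kml_text):
    """Scan every feature and return attribute name -> value class."""
    root = ET.fromstring(kml_text)
    fields = {}
    # Every titled span inside any div marked as a feature, anywhere in the file.
    for span in root.findall(".//div[@class='feature']//span[@title]"):
        # Keep the first class seen for each attribute name.
        fields.setdefault(span.get("title"), span.get("class"))
    return fields


fields = discover_fields(KML)
```

Note that the full field list only exists once the last feature has been read, which is exactly the cost a predeclared Schema would avoid.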

I would very much like to entertain some discussion and argument over all of this with interested members of the community in the hopes that we can either decide that this is just an awful idea and not waste any time with it, or, I hope, nail it down a little more and get to the point of being able to use it as quickly as possible. So please leave comments or otherwise respond if you have any thoughts or input into the matter, we’d love to hear them.


9 Responses to “Structured Feature Data in KML, part two”

  1. Bill Thorp Says:

    I’ll get this one out of the way. I won’t comment on your idea, but your example is certainly bunk. The RFC 2426 3.4.2 GEO Type Definition says:

    “The value specifies latitude and longitude, in that order (i.e., “LAT LON” ordering). The longitude represents the location east and west of the prime meridian as a positive or negative real number, respectively. The latitude represents the location north and south of the equator as a positive or negative real number, respectively. The longitude and latitude values MUST be specified as decimal degrees and should be specified to six decimal places. This will allow for granularity within a meter of the geographical position.”

    While this clearly specifies a coordinate system, and even an easting and northing, this does not specify an ellipsoid (and therefore spatial reference system). You’d need to know the SRS (NAD83, WGS84, etc.) for lat/long *not* to be meaningless. That “one meter” reference may be true given a particular SRS, but is totally off otherwise. If your standard clearly defines this, fine, but generally this is why we don’t look to Netscape for geodata standards.

    http://www.sharpgis.net/2007/05/05/SpatialReferencesCoordinateSystemsProjectionsDatumsEllipsoidsConfusing.aspx

  2. Bill Thorp Says:

    I hastily said SRS here a couple of times when I should have said datum, just to simplify what I was saying. The point is, you need more than lat/long.

  3. chris Says:

    Bill,

    Thanks for the comment. Although I’m not 100% sure I’m on the same page here. The coordinates given in my examples were intended to be KML document fragments, and while it’s quite possible that I screwed something up in my haste, they appear to be more or less correct according to KML. The examples were intended to show the use of feature markup inside of KML, and not the markup of locational/coordinate data specifically.

    I could probably launch into a rant re: projections/SRS/etc. but this probably isn’t the time or place.

    I might have also completely misinterpreted your comment, if so, please step in and correct me.

    -Chris

  4. Matthew Perry Says:

    “We shouldn’t discount the benefits of being able to embed styling logic into a geographic data interchange format”

    The flip-side, of course, is that you shouldn’t discount the benefits of keeping styling logic *separated* from data. For example being able to easily reclassify data and apply the same styling rules to other datasets.

  5. Bill Thorp Says:

    I agree that microformats seem useful for human readable text with embedded machine readable information. I only meant to say “be careful what corners you cut when making complex things human readable.” Sometimes metadata is better off hidden. Just imagine the public running into something like:

    “The dog is located at 40.498951,74.684669 in the spatial reference system of EPSG:4326”.

    Mostly I just wanted to make fun of that RFC.

  6. mookie Says:

    I think this is an interesting suggestion that may have some very seriously detrimental effects if it catches on.

    I wrote up a little response here:

    http://www.fortiusforge.com/2007/6/28/kml-with-microformats-what-would-we-lose

    Let me know what you think.

  7. chris Says:

    Guess I should at least chime in briefly here on some of the comments…

    Firstly, there’s definitely going to have to be a “part three” of this now, as this post clearly raised a lot of questions and has generated more confusion than I intended (although that’s not all bad).

    Despite the fact that I was alright with the examples and content of the post after reading through it a few times and publishing it, I have since decided I really don’t like it, not because I think the premise I put forward is invalid, but because I don’t think I gave enough context, background, and explanation.

    In response to Matthew’s comment re: presentational data alongside strictly geographic data — I agree. One of the other things I had intended to bring forward (which will probably make its way into part three) is that if we’re going to have potentially multi-file/more complex parsing logic or various external, potentially complex schemas to deal with, they should be kept on the presentation side as much as possible — your average “styling client” — if you will, is running client side, is probably doing a bunch of stuff behind the scenes anyway, and has not nearly the problems that, say, a server side process handling requests for gobs of simultaneous users would have in needing to parse multiple documents to extract what it needs out of any given set of data.

    That being said, I’m still very strongly advocating that everything either side of the fence really needs, strictly speaking, should be contained in a single file.

    In response to Mookie, Re: “what we lose with microformats” –

    I’m pretty much in agreement with a lot of what you said, which I see as a failing of my original post, although I’m not sure I’m quite catching the point about Cardinal Richelieu losing his identity. My examples could have been better, but if I use my microformat example and change “Leader” to “Inquisitor” (which doesn’t really break the meaning of the feature, whatever that might mean in this context), I can still XPathily deduce that there are several Inquisitors with parent features in the document, and what their names are.

    This does, however, lead to the potential problem I cited in “…you won’t know for sure which attributes are present in the features until you parse all of them, or at least run some sort of XPath query over the document to figure it out”.

    Clearly, this isn’t the most efficient way of doing things from many points of view. And I would be fully in favor of a hybrid Schema/Microformat/Something else approach, although I’m still a little wary of requiring the hybrid approach.

    The rationale behind this is that marking up these features should be about as simple and brainless as possible in the majority of cases, and not potentially break existing workflows where that is unnecessary. The beauty of microformats is that you’re still pretty much just making HTML, but just like you might italicize a word for emphasis, you can similarly throw another tag in there to… umm… “featurize” a geographic location you were textually describing already anyway.

    I’m also fully aware we’re still a ways off from having a solid solution, and I’m basically on the verge of changing my mind about the requiring of an explicit schema definition, but I haven’t quite jumped off that ledge yet :)

    Thanks for all the feedback and comments so far, keep ’em coming!

    Incidentally, I submitted a proposal for FOSS4G earlier today that uses my last two blog posts on this topic as a starting point for discussion, so if you’re interested in exploring the topic further, you should be able to review and vote for it soon for a chance to hear me blab incoherently in person!

  8. Random Nodes Says:

    I Heart KML Schema!

    Recently, Chris over at FortiusOne has written a couple articles, focusing largely on the Schema tag in KML. One thing in particular from this article got me thinking:
    Google Earth will display your custom fields in the markers that get displayed in a…

  9. Mac Conner Says:

    The point is, you need more than lat/long.
    Been looking for a blog like this one for a while.
    The rationale behind this is that marking up these features should be about as simple and brainless as possible in the majority of cases.
