A Proposal for a GeoWeb Metadata Implementation
April 1st, 2008 by Sean Gorman
One of the criticisms we received when we launched GeoCommons was the lack of metadata for the content we had collected. Since then we’ve been looking into what would be a reasonable approach to implement metadata for the GeoWeb.
When it comes to GIS data the existing standard is the FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM). The standard calls for 335 metadata elements to describe a geospatial data set, which covers a wide variety of descriptions for the data. The one thing that became clear very quickly was that the FGDC CSDGM is far too onerous and outdated for the GeoWeb. For instance, in the FAQ provided by the USGS they recommend you hire a full time person to create your CSDGM compliant metadata:
Who should create metadata?
“Data managers who are either technically-literate scientists or scientifically-literate computer specialists. Creating correct metadata is like library cataloging, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don’t assume that every -ologist or -ographer needs to be able to create proper metadata. They will complain that it is too hard and they won’t see the benefits. But ensure that there is good communication between the metadata producer and the data producer; the former will have to ask questions of the latter….”
In a GeoWeb where self publication is a key innovation the model of having a full time metadata guru is antiquated. A specification with 335 elements is antiquated. The mantra that “certainly if there is no pain, there will likely be no gain” when it comes to metadata is antiquated. The end result of these draconian approaches to metadata is that there is close to zero likelihood the GeoWeb will implement them.
This is a shame because metadata is very useful, especially when it comes to describing, finding and federating data. This is one of the shortcomings of KML – little or no metadata (although several argue metadata has no place in either KML or GeoRSS). GeoRSS has limited metadata support with “Feature Type Tag” and “Relationship Tag”, which are useful but fairly confined.
The question we faced with rebuilding GeoCommons – is there a middle ground between 335 elements and two elements? Fortunately we were not the first to look at this issue. In 1995 a bunch of librarians got together to devise an approach that “provides a simple and standardized set of conventions for describing things online in ways that make them easier to find”. The fifteen-element standard they devised is called Dublin Core and is widely implemented across the web. If the librarians could come up with 15 core elements then surely the GeoWeb can, and even make those map to the Dublin Core standard and the FGDC CSDGM standard. So, after a good bit of work, here is what we would like to implement as a lightweight core set of metadata for GeoWeb data:
This covers seventeen elements, about half of which we trap automatically. You can map them to either FGDC or Dublin Core, thus giving you the ability to expose your data to the GIS world and general web community in a straightforward manner. As with any metadata standard you do not need all seventeen elements, but the more you populate the more useful the data becomes. The metadata could be exposed as microformats, enabling a number of possibilities for discovery and potential federation. This could be particularly interesting with Yahoo! opening up their search to support Dublin Core vocabularies and microformats. Our feeling is that the more data we can make available on the web the more problems everyone can solve. We’ll be testing this out when we launch the next iteration of GeoCommons at Where 2.0, and it would be great to get feedback and thoughts on the approach.
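To make the mapping idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the field names on the left of the table and the sample record are hypothetical stand-ins, not the seventeen proposed elements. The right-hand names are Dublin Core element names and CSDGM short names, and the `DC.` prefixed meta tag is one common way of writing Dublin Core into HTML rather than the approach GeoCommons settled on.

```python
from html import escape

# Hypothetical lightweight field -> (Dublin Core element, FGDC CSDGM short name).
# The left-hand names are assumptions made for this example only.
FIELD_MAP = {
    "title": ("title", "title"),
    "author": ("creator", "origin"),
    "description": ("description", "abstract"),
    "published": ("date", "pubdate"),
    "tags": ("subject", "themekey"),
}


def to_dublin_core_meta(record):
    """Render the populated fields of a record as Dublin Core style meta tags."""
    lines = []
    for field, (dc_name, _fgdc_name) in FIELD_MAP.items():
        value = record.get(field)
        if not value:
            continue  # every element is optional, so skip unpopulated ones
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)
        content = escape(str(value), quote=True)
        lines.append('<meta name="DC.%s" content="%s">' % (dc_name, content))
    return lines


def to_fgdc_pairs(record):
    """Return (CSDGM short name, value) pairs for the populated fields."""
    return [
        (fgdc_name, record[field])
        for field, (_dc_name, fgdc_name) in FIELD_MAP.items()
        if record.get(field)
    ]


# Hypothetical record with only some of the fields populated.
sample = {"title": "Example dataset", "author": "Jane Doe", "tags": ["roads", "transport"]}

for line in to_dublin_core_meta(sample):
    print(line)
```

Keeping a single lookup table means one record can be written out for either audience, and unpopulated elements simply drop out of both views.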
Semantics, Semantics, Everywhere, Nor Any Drop to Drink
March 20th, 2008 by Sean Gorman
It seems like it is a daily dose of semantic web on the tech blogs of late. Today it was Textwise’s Million Dollar Semantic Hacker Challenge and a few days ago it was Yahoo opening their search platform to support a wide variety of semantic web standards. This has led to a good bit of proselytizing, mostly in the comments, that this heralds the arrival of the Semantic Web, or Web 3.0 or the Next Generation Web. All of which sounds like the circling of the marketing bandwagons.
Unfortunately when the wagons circle everything starts picking up the label – in this case semantic. This is especially dangerous when you have a word like “semantics” that can be defined so many different ways. Just look at the definition tree created by Wikipedia:
* Semantics is the study of meaning in communication.
* In computer science semantics reflects the meaning of programs or functions.
* The Semantic Web refers to the extension of the World Wide Web through the embedding of additional semantic metadata.
More often I see folks labeling things semantic that are really syntax. “Syntax” being the rules to construct and define something like a sentence or line of code and “semantics” the meaning of those rules or definitions. Syntax is fairly easy and semantics are fairly hard, as most folks in artificial intelligence would argue. Some even go so far as to say all programming languages other than LISP are syntax and not semantic.
This is a bit more clear with an example. Let’s take the Textwise announcement – a technology that will parse plain text on a website or elsewhere and categorize it into predefined topics. One example in the Techcrunch comments was the following:
input text:
Call us crazy, but we think there are some brilliant minds out there that can find some really amazing uses for this incredibly powerful and scalable technology. Think you’re up to the Challenge? We think you are!
categories (ranked from 0 (worst) to 100 (best)):
Shopping/Health/Alternative/Hypnotherapy/Audio_and_Video 43
Business/Telecommunications/Services/Wireless/Software 33
Arts/Music/Bands_and_Artists/311/Tablature 28
Computers/Internet/Consultants/Research 26
Shopping/Health/Alternative/Meditation/Audio_and_Video 25
The output is really not telling me anything about the meaning of the text, just setting up rules to provide categorization. So I would definitely put this in the syntax and not semantic category. I would also say what Yahoo! is doing is really more syntax than semantics, although there is the possibility of building truly semantic technologies on top of what they are enabling. They’ve created a set of rules based on rich standards to allow applications to be built. It remains to be seen what will come of it, but in the rush of market buzz I think it is easy to miss that building truly semantic technologies is quite hard. Some folks in AI (the Chinese room) would argue machines are not even capable of semantic meaning or understanding.
From this perspective I think we’ll see a lot of people building applications based on syntax that reorganize and categorize content by giving the “page web” a bit of structure. Oddly, it’s like we’ve gone full circle back to DMOZ. While these technologies may be clever and useful I do not think they will fundamentally change the Web. In the other category I think we’ll see a few companies pushing towards something more sophisticated (call it a semantic, implicit, computational web) where new data and services are mixed with existing web content to provide answers to users’ questions.
Are Push Pins Inescapable?
March 12th, 2008 by Sean Gorman
It is only fitting that the day after I posted “Moving Push Pins Off the Map” I saw the post on Ogle Earth about a new geotagging icon….which is?

A GIANT PUSH PIN!
With my interest piqued we did a little digging and found another geotagging icon:

ANOTHER GIANT PUSH PIN (actually when I dug into it this icon was a first version that evolved into the red one.)
I of course blame this all on the Google monolith for perpetuating push pin mania. Last time I saw Mike Jones he even had a push pin tie tack. Joking aside the reason for creating a geotagging icon itself is worth discussing.
The stated purpose on the GeoTagIcons.com website is “The Geotag Icon is intended as a web “standard” icon for identifying geotagged content to humans.” So, if a photo or blog post has been geotagged then there is an icon on it to let you know. The thought is that geotags are often hidden in microformats or the URL, and thus not visible to the user.
This seems like a straightforward approach to the problem, but also seems to have overlap with existing icons such as KML and GeoRSS. The tutorial on GeoTagIcons has examples of using it for links to both KML and GeoRSS content. This could lead to some ambiguity and confusion for users.
One of the most interesting parts of the pitch for using the GeoTagIcon is, “Reason 4: It encourages development of the semantic web”. On first blush this got me excited, but reading a bit deeper I realized they meant it acts as an advertisement for linked content that could help support an evolving semantic web. This in and of itself is a worthy cause, and advertising has been directed at far less useful goals.
The link between geotagging and the semantic web does bring up a good topic for debate. How will all these geotagged objects (KML, GeoRSS, geo-microformats, GPX, etc.) be tied together in a method that creates semantic meaning? What questions will the semantic technologies answer? The GeoTagIcon site provides examples such as “Show me a plot of other bloggers in my vicinity”, “I’d like to see a map showing which of my friends have also visited Australia”, “Who else has photographed this location?”, etc.
While these are interesting I think the examples and the direction many folks are taking geotagging miss the real potential of the semantic web. The geotagging premise is based on doing increasingly sophisticated things with geo-coded annotations – 99% of the time taking the form of a pushpin. In each of the examples above users or a screen scraper and geo-coder (most likely) have added a latitude and longitude to a piece of unstructured data (bloggers, my friends, photos). While this is all useful information it is often relegated to only answering trivial questions.
There is only so much you can do with a bit of unstructured text or html that has geographic coordinates. You can measure vicinity (bloggers nearby), intersection (friends that have visited Australia) and union (show me all photos from a location). There might be a few that I am missing but it is a fairly small universe of questions that can be answered, and the semantic web is all about answering questions. Hopefully a very large universe of questions.
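To show how small that universe is, the three kinds of question above fit in a few lines of Python. This is a rough sketch under stated assumptions: the sample points, the function names and the bounding box used for Australia are all made up for illustration, and vicinity is measured with the haversine great-circle formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(items, lat, lon, radius_km):
    """Vicinity: items within radius_km of a point."""
    return [i for i in items if haversine_km(lat, lon, i["lat"], i["lon"]) <= radius_km]


def within_box(items, south, west, north, east):
    """Intersection: items whose point falls inside a bounding box."""
    return [i for i in items if south <= i["lat"] <= north and west <= i["lon"] <= east]


def union_by_name(*groups):
    """Union: merge several result sets, dropping duplicates by name."""
    merged = {}
    for group in groups:
        for item in group:
            merged.setdefault(item["name"], item)
    return list(merged.values())


# Hypothetical geotagged items; names and coordinates are illustrative only.
POINTS = [
    {"name": "blogger_dc", "lat": 38.90, "lon": -77.04},
    {"name": "photo_sydney", "lat": -33.87, "lon": 151.21},
    {"name": "friend_perth", "lat": -31.95, "lon": 115.86},
]
AUSTRALIA_BOX = (-44.0, 112.0, -10.0, 154.0)  # very rough south, west, north, east

close_by = nearby(POINTS, 38.88, -77.10, 25)
in_australia = within_box(POINTS, *AUSTRALIA_BOX)
everything = union_by_name(close_by, in_australia)
print([p["name"] for p in everything])
```

Each question reduces to a distance or containment test on a single latitude and longitude pair, which is about all that a coordinate attached to unstructured text can support.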
From my limited perspective the semantic web is all about bringing vast data resources to the web in an easy and intuitive way. While turning unstructured text into geocoded annotations already on the web is important, I think the bigger challenge is blending existing structured data (largely in databases and not directly on the page web) with organized unstructured data through the web in a seamless way like we have for text, pictures and video.
Metaweb has done some compelling work with Freebase. They have even been doing some interesting geo work with their database. To date Freebase has largely been working with conceptual data, but from the look of their GIS app could be getting into more quantitative data.
As you get into quantitative data the power and tools available for asking sophisticated questions increase exponentially. Unfortunately so do the technical challenges, both computational and creating an intuitive user experience for something not intuitive to most people – numbers, math, statistics, etc. Despite the challenges I think this is where some of the greatest potential awaits for the emerging semantic web. That said I do think the new icons are quite nice and serve a useful function – despite the push pin.






