Links List 4.11.08

April 11th, 2008 by Sean Gorman

Brett Taylor says that we need a Wikipedia for data. He points out how hard it is for an everyday programmer to get access to even the most basic factual data, which is a barrier to innovation.

Dave Bouwman shows us the National Geographic MetaLens service with Virtual Earth. MetaLens is a geospatial content management and archival system that National Geographic uses to secure and manage its content.

Dan Catt from Geobloggers and Flickr shares the new Flickr video and geo-tagging option.

James Fee shares how to leverage Google App Engine for GIS applications. He also reviews the confusing differences in commercial licensing between Microsoft Virtual Earth MapCruncher and the MSR edition.

According to GISuser, General Dynamics has completed testing of GeoEye-1, an earth imaging satellite. GeoEye-1 is part of the National Geospatial-Intelligence Agency (NGA) NextView program. The NextView program is designed to ensure that the NGA has access to commercial imagery in support of its mission to provide timely, relevant and accurate geospatial intelligence in support of national security.

GISLounge shares top causes of errors in online mapping systems, including inaccurate base data, accuracy of geocoding, lag time to incorporate newly developed areas and difficulty in interpreting variations on addresses.


GeoWeb Metadata Follow Up

April 2nd, 2008 by Sean Gorman

First off, I want to thank the folks who commented on the last post. There was lots of useful feedback, and it also highlighted a bit of confusion I created with the first post. The purpose of the first post was not to propose a new metadata standard. It was simply a proposal for how we could map the metadata we collect in GeoCommons to existing standards.

From that standpoint, the proposal is for an implementation, not a standard. We have just about 5,000 unique datasets and about 70,000 data layers, and it would be great to expose useful metadata for them. The data runs the gamut, from EPA toxic release sites to the number of Facebook users by city. The system and its metadata requirements need to be flexible enough to accommodate both a user uploading Facebook data and one uploading EPA data.

While GIS users might not be intimidated by a metadata form with 75 or even 335 elements, your average Web/GeoWeb user definitely will be. The goal with GeoCommons is to provide a destination where both communities can consume and share data, and I think both will find tools and data there that are useful.

As for the metadata elements we proposed to map to in the last post, we were looking for ones that both technical and non-technical users would understand, while also automatically trapping as many additional elements as possible. For technical users who have a full complement of metadata, the plan is to have an element where you can provide a link to a full metadata specification.

The comments directing us to the ISO 19115 standard were very useful, and we are looking at which elements we are missing to map to that standard as we evolve. The thing we want to make sure we get right is finding the best set of metadata elements to request from users, balanced against the fact that if we ask for a huge number of elements, most people are going to go running for the hills.

Right now it looks like we’ll have 17-20 elements that map to Dublin Core, FGDC, and, in a later release, ISO 19115. So for each dataset in GeoCommons you’ll have a page that lists those 17-20 elements in the metadata formats technical folks are used to seeing. This should also give us a way to explore federating the data with other applications and search approaches.
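The post doesn't list the 17-20 elements or their exact mappings, so the following is only an illustrative sketch of what such a crosswalk could look like in Python. The short field names on the left (`title`, `author`, `tags`, `published`, `metadata_url`) are hypothetical, not the actual GeoCommons elements; the right-hand sides are standard Dublin Core element names and FGDC CSDGM short-name paths. `metadata_url` stands in for the link to a full metadata specification mentioned earlier, and mapping it to Dublin Core `relation` is our own assumption.

```python
# Hypothetical lightweight element -> (Dublin Core element, FGDC CSDGM path).
# None means "no equivalent in that vocabulary".
CROSSWALK = {
    "title": ("title", "idinfo/citation/citeinfo/title"),
    "author": ("creator", "idinfo/citation/citeinfo/origin"),
    "description": ("description", "idinfo/descript/abstract"),
    "tags": ("subject", "idinfo/keywords/theme/themekey"),
    "published": ("date", "idinfo/citation/citeinfo/pubdate"),
    "metadata_url": ("relation", None),  # link to a full metadata record (assumed mapping)
}

# Column index into each CROSSWALK tuple.
TARGETS = {"dc": 0, "fgdc": 1}


def to_vocabulary(record, target):
    """Rename the populated fields of `record` into Dublin Core ("dc") or FGDC ("fgdc") terms."""
    column = TARGETS[target]
    mapped = {}
    for field, value in record.items():
        terms = CROSSWALK.get(field)
        if terms is None or terms[column] is None or value in (None, ""):
            continue  # unknown field, no equivalent in this vocabulary, or left blank
        mapped[terms[column]] = value
    return mapped


record = {
    "title": "EPA Toxic Release Sites",
    "tags": "toxics, EPA",
    "metadata_url": "http://example.com/full-metadata.xml",  # placeholder URL
}
print(to_vocabulary(record, "dc"))
print(to_vocabulary(record, "fgdc"))
```

Keeping both target vocabularies in one table means a later ISO 19115 mapping could, in this sketch, be added as a third column rather than a separate code path.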

The goal here is to create a bridge between content being created for the GeoWeb and content created for the GIS world and make both usable and remixable by the web community as a whole. I fully respect the motivations and requirements for the GIS metadata specifications out there, and I hope we can leverage them to create an implementation that will see a high level of adoption.

Without adoption, standards are pretty hollow, as we’ve seen with all the work that went into GML versus the much lighter specifications for KML and GeoRSS. While both have their place, it is clear which one the market is supporting. As more geospatial data is created outside of government, there will be no government mandate to force metadata creation, and what the market accepts is going to become increasingly critical - IMHO. I look forward to getting more feedback as we get ready to launch.


A Proposal for a GeoWeb Metadata Implementation

April 1st, 2008 by Sean Gorman

One of the criticisms we received when we launched GeoCommons was the lack of metadata for the content we had collected. Since then we’ve been looking into what would be a reasonable approach to implement metadata for the GeoWeb.

When it comes to GIS data, the existing standard is the FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM). The standard calls for 335 metadata elements to describe a geospatial dataset, covering a wide variety of descriptions of the data. One thing became clear very quickly: the FGDC CSDGM is far too onerous and outdated for the GeoWeb. For instance, the FAQ provided by the USGS recommends hiring a full-time person to create your CSDGM-compliant metadata:

Who should create metadata?
“Data managers who are either technically-literate scientists or scientifically-literate computer specialists. Creating correct metadata is like library cataloging, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don’t assume that every -ologist or -ographer needs to be able to create proper metadata. They will complain that it is too hard and they won’t see the benefits. But ensure that there is good communication between the metadata producer and the data producer; the former will have to ask questions of the latter….”

In a GeoWeb where self-publication is a key innovation, the model of having a full-time metadata guru is antiquated. A specification with 335 elements is antiquated. The mantra that “certainly if there is no pain, there will likely be no gain” when it comes to metadata is antiquated. The end result of these draconian approaches to metadata is close to zero likelihood that the GeoWeb will implement them.

This is a shame because metadata is very useful, especially when it comes to describing, finding and federating data. Little or no metadata is one of the shortcomings of KML, and GeoRSS has only limited metadata support with its “Feature Type Tag” and “Relationship Tag”, which are useful but fairly confined (although several argue metadata has no place in either of these formats).

The question we faced in rebuilding GeoCommons: is there a middle ground between 335 elements and two elements? Fortunately, we were not the first to look at this issue. In 1995 a group of librarians got together to devise an approach that “provides a simple and standardized set of conventions for describing things online in ways that make them easier to find”. The fifteen-element standard they devised is called Dublin Core and is widely implemented across the web. If the librarians could come up with 15 core elements, then surely the GeoWeb can, and can even make them map to both the Dublin Core and the FGDC CSDGM standards. So, after a good bit of work, here is what we would like to implement as a lightweight core set of metadata for GeoWeb data:

[Table: proposed lightweight core metadata elements for GeoWeb data]

This covers seventeen elements, about half of which we trap automatically. You can map them to either FGDC or Dublin Core, giving you a straightforward way to expose your data to both the GIS world and the general web community. As with any metadata standard, you do not need all seventeen elements, but the more you populate, the more useful the data becomes. The metadata could be exposed as microformats, enabling a number of possibilities for discovery and potential federation. This could be particularly interesting now that Yahoo! is opening up its search to support Dublin Core vocabularies and microformats. Our feeling is that the more data we can make available on the web, the more problems everyone can solve. We’ll be testing this out when we launch the next iteration of GeoCommons at Where 2.0, and it would be great to get feedback and thoughts on the approach.
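The post doesn't say exactly what markup GeoCommons would emit. As one possible illustration, the sketch below renders a record that has already been mapped to Dublin Core element names as HTML `<link>`/`<meta>` tags, following the DC-HTML convention (`<meta name="DC.title" ...>`) rather than a true microformat. The function name `dc_meta_tags` and the sample values are hypothetical.

```python
from html import escape

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(dc_record):
    """Render {dc_element: value} as HTML <link>/<meta> tags in the DC-HTML style."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element in sorted(dc_record):
        # Escape &, <, > and quotes so values are safe inside an attribute.
        value = escape(str(dc_record[element]), quote=True)
        lines.append(f'<meta name="DC.{element}" content="{value}">')
    return "\n".join(lines)


print(dc_meta_tags({"title": "Facebook users by city", "subject": "social, population"}))
```

Tags like these would sit in the `<head>` of each dataset page, where a crawler that understands Dublin Core could pick them up; how any particular search engine consumes them is outside what this sketch shows.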


Content for ArcGIS Explorer vs. Google Earth

March 25th, 2008 by Sean Gorman

I thought it would be fun to take a different angle on the virtual globe competition and look at the content and data made available by two of the players - Google and ESRI. From a technical perspective, I think most would agree that ArcGIS Explorer is a pale imitation of Google Earth, especially when it comes to user experience. I’ll put aside my gripes about difficult setup and version updates with Explorer and focus on content. ESRI supporters have argued that ArcGIS Explorer is about access to GIS content and has a different mission than Google Earth. Actually, I would say the missions (exposing geographic data to larger audiences) are the same, but the approaches are very different.

I’ll assume that someday ESRI will nail the technical side of Explorer, and look a little further down the road: at the content available for Google’s and ESRI’s thick-client applications and how easy it is to access. As the GeoWeb evolves and we render ever more amazing three-dimensional worlds, I think an increasing premium will be placed on the scope of data and content that can be delivered to these applications.

So where do the two ends of the GeoWeb spectrum sit on the topic? From my observations, pretty far apart. For practical purposes I’ll break the comparison into four topics: 1) data formats, 2) data sources, 3) data search, and 4) data packaging.

Data Formats

This is one category where ArcGIS Explorer cleanly one-ups Google Earth. Explorer allows you to load ArcGIS Explorer files (.nmf; why they felt the need to create another proprietary file format that works with nothing else is beyond me), servers (WMS, IMS, file server), geodatabases, shapefiles, raster, and KML. This is an impressive list, although a bit less impressive when you consider that half are open standards (KML, WMS and raster) and the rest are, at the end of the day, ESRI proprietary formats. On the upside, the raster support is quite extensive (30 or so different formats), although it requires a spatial reference file.

On the Google side you have KML and KMZ (once proprietary, now turned over to the OGC; the jury is still out on how open it will be) in the free version of Google Earth, and, within the free Google Earth 3D Warehouse, support for SketchUp (.skp) and COLLADA (.dae). The Google Earth Pro and Enterprise versions add vector support for ESRI shapefiles and MapInfo .tab, and imagery support for TIFF (.tif) including GeoTIFF, National Imagery Transmission Format (.ntf), Erdas Imagine Images (.img), Atlantis MFF Raster (.hdr), PCIDSK Database File (.pix), Portable Pixmap Format (.pnm), and Device Independent Bitmap (.bmp). While this is an extensive list, you have to pay to get the file support; in ArcGIS Explorer the support is currently free, so ESRI does have a distinct edge in this category.

Data Sources

In addition to supporting different file formats, both applications have baked-in content. For Explorer you can access ArcGIS Online directly through the file server option, which gives you a file directory of content. A second option is the “Resource Center” website, reachable through the help tab, where you can download content in the .nmf file format. The “Resource Center” is definitely the more user-friendly of the two, with a nice interface that sorts content into useful categories like “imagery, street, physical features”.

On the downside, the content is very limited: twenty-four layers supplied by ESRI and four contributed by the community. ArcGIS Online has more content, but it was a pain in the ass to access. You have to get a user name and password from ESRI, work through incorrect directions for accessing it, and then you get a list of UNIX-style titles with abbreviations and underscores, like UNEP_WCMC_WDPA2006_2D. Not exactly user friendly, but you do get close to fifty additional layers of data once you jump through the hoops.

In Google Earth there are two sources of data baked into the application: “Layers” (terrain, geoweb, roads, traffic, 3D buildings, borders and labels, gallery, global awareness, and places of interest) and the ability to search for businesses. You can also click the “Help” tab to be taken to the Google Earth Community web page. On the Google Earth Community site alone there are 638,213 KML or KMZ files.

While finding this content (it is all file attachments to bulletin board posts) is pretty clunky and often frustrating, it is a LOT of content, especially compared to the four user-generated files on the ESRI equivalent. The quality and source of this content varies wildly, and it is difficult to tell what is good and what is bad, but the potential is there. Actually, the metadata support for both is pretty sparse. This is ESRI’s metadata for a .nmf imagery layer from the “Resource Center”:

“Displays satellite and aerial imagery at a 15m minimum resolution worldwide, and 1m resolution for the U.S. World boundaries, place names, and transportation layers are also included. Use this map to view man-made and natural features, or as a base map for overlaying associated data layers.”

For an ArcGIS Online layer, it is this:

Layer Name: Imagery
Layer Source: ArcGIS Globe Service Layer
Layer Type: Draped
URL: http://services.arcgisonline.com/v92
Service: I3_Imagery_Prime_World
Sub Service: Imagery
LOD Tile Fetch: True
Hidden: False

Neither is super useful.

Data Search

Neither application directly supports data search, but both have communities or services built around them that do. On the ESRI side there are a huge number of geospatial data repositories holding shapefiles, and their search capabilities vary widely. Probably the largest is the Geospatial One-Stop, which connects to most Federal geospatial data, although in reality it more often provides access to the metadata than to the data itself. That is still a large amount of content that could conceivably be viewed in ArcGIS Explorer, but it is scattered across a large number of disconnected repositories.

Google Earth has not only a good number of community aggregators, like the official GE Community site above and unofficial ones like Mapufacture, GE Library and GLayers, but also the ability to search all KML files indexed on the web. I’ve heard numbers north of 10 million KML files indexed this way, but have nothing official. One way to search this content base is to type filetype:kml into the standard Google search box. You get a good amount of content in the results, but figuring out what that content is and means is pretty sketchy. Here is the result of a search for “sharks”:

[Screenshot: Google KML search results for “sharks”]

Until you download it and open it in Google Earth, you really have no clue, and even then you still might not.
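To get a rough sense of what a downloaded KML file contains without opening Google Earth, a few lines of standard-library Python can pull out its first `<name>`, first `<description>`, and the number of Placemarks. This is a sketch under stated assumptions: it handles a plain .kml file (not a zipped .kmz), and it simply reuses whatever namespace the root `<kml>` element declares (the OGC KML 2.2 namespace, an older Google namespace, or none). The function name `summarize_kml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_kml(kml_text):
    """Return the first <name>, first <description>, and Placemark count of a KML document."""
    root = ET.fromstring(kml_text)  # accepts str or bytes
    # Reuse the namespace declared on the root element, e.g. "{http://www.opengis.net/kml/2.2}".
    ns = root.tag.split("}")[0] + "}" if root.tag.startswith("{") else ""

    def first_text(tag):
        # First matching element anywhere in the document, in document order.
        element = root.find(f".//{ns}{tag}")
        return element.text.strip() if element is not None and element.text else None

    return {
        "name": first_text("name"),
        "description": first_text("description"),
        "placemarks": len(root.findall(f".//{ns}Placemark")),
    }
```

Usage, with a hypothetical file name: `with open("sharks.kml", "rb") as f: print(summarize_kml(f.read()))`. Of course, this only surfaces whatever the author chose to put in `<name>` and `<description>`, which is exactly the metadata gap described here.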

So with GIS/ESRI data you get great metadata and context, but no unified search. On the Google front you get great unified search and community content but no metadata or context for the data.

Data Packaging

The last topic is short and sweet. ArcGIS Explorer gives you a blank globe with just one layer of base imagery (it looks like Blue Marble), and then it is up to you to populate the globe with data to fit your needs. Google Earth, on the other hand, comes packaged with a wide variety of layers already populated on the map. One is geared towards a professional audience and the other towards the mass consumer. Still, I would argue that if ESRI truly wants to create GIS for everyone, they are going to need to package up content and GIS data so anyone can hit the ground running.

Even as a GIS geek, it took me way too long to get the whole rig going to create something useful. Having all the options to bring in a wide variety of content was great, but I think there is still a lot to be learned from Google about how to package content to appeal to a much broader audience. At the end of the day I’d say ESRI wins the content variety category and Google wins the content volume and packaging categories. Lots of good things are being done by both from opposite directions, but I believe they will inevitably run up against each other. How and when this happens will be interesting to watch.


It seems like there is a daily dose of semantic web on the tech blogs of late. Today it was Textwise’s Million Dollar Semantic Hacker Challenge, and a few days ago it was Yahoo opening its search platform to support a wide variety of semantic web standards. This has led to a good bit of proselytizing, mostly in the comments, that this heralds the arrival of the Semantic Web, or Web 3.0, or the Next Generation Web. All of which sounds like the marketing bandwagons circling.

Unfortunately, when the wagons circle, everything starts picking up the label, in this case “semantic”. This is especially dangerous with a word like “semantics” that can be defined so many different ways. Just look at the definition tree on Wikipedia:

*Semantics is the study of meaning in communication.
*In computer science semantics reflects the meaning of programs or functions.
*The Semantic Web refers to the extension of the World Wide Web through the embedding of additional semantic metadata

More often I see folks labeling things semantic that are really syntax, “syntax” being the rules for constructing and defining something like a sentence or a line of code, and “semantics” the meaning of those rules or definitions. Syntax is fairly easy and semantics are fairly hard, as most folks in artificial intelligence would argue. Some even go so far as to say that all programming languages other than LISP are syntax and not semantics.
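In the narrow programming-language sense from the definitions above, Python itself shows the gap: a statement can satisfy the grammar (syntax) and still have no defined meaning when run (semantics). A minimal illustration; the function name `check` is hypothetical, and only `TypeError` is treated as a "meaning" failure here.

```python
import ast


def check(src):
    """Classify a snippet: grammar error, runtime meaning error, or ok."""
    try:
        ast.parse(src)  # syntax: does the text follow Python's grammar?
    except SyntaxError:
        return "syntax error"
    try:
        # semantics: what does the statement actually do when executed?
        exec(compile(src, "<example>", "exec"), {})
    except TypeError as exc:
        return f"valid syntax, no defined meaning: {exc}"
    return "ok"


print(check("depth = (6"))            # syntax error
print(check('depth = "sharks" / 3'))  # valid syntax, no defined meaning: unsupported operand type(s) for /: 'str' and 'int'
print(check("depth = 6 / 3"))         # ok
```

The parser happily accepts dividing a word by a number; only execution reveals that the statement means nothing, which is the same distinction being drawn between syntax and semantics.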

This is a bit clearer with an example. Let’s take the Textwise announcement: a technology that parses plain text on a website or elsewhere and categorizes it into predefined topics. One example in the TechCrunch comments was the following:

input text:
Call us crazy, but we think there are some brilliant minds out there that can find some really amazing uses for this incredibly powerful and scalable technology. Think you’re up to the Challenge? We think you are!

categories (ranked from 0 (worst) to 100 (best)):
Shopping/Health/Alternative/Hypnotherapy/Audio_and_Video 43
Business/Telecommunications/Services/Wireless/Software 33
Arts/Music/Bands_and_Artists/311/Tablature 28
Computers/Internet/Consultants/Research 26
Shopping/Health/Alternative/Meditation/Audio_and_Video 25

The output doesn’t really tell me anything about the meaning of the text; it just applies rules to produce a categorization. So I would definitely put this in the syntax rather than the semantic category. I would also say what Yahoo! is doing is really more syntax than semantics, although there is the possibility of building truly semantic technologies on top of what they are enabling. They’ve created a set of rules based on rich standards that allow applications to be built. It remains to be seen what will come of it, but in the rush of market buzz I think it is easy to miss that building truly semantic technologies is quite hard. Some folks in AI (see Searle’s Chinese room argument) would argue machines are not even capable of semantic meaning or understanding.
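To make the point concrete, here is a toy rule-based categorizer in Python. It is not how Textwise works (its method isn't described here); the categories and trigger words are made up. It scores each category by the fraction of its trigger words present in the text, which yields ranked, confident-looking categories without representing anything about what the text means.

```python
import re

# Hypothetical categories, each defined only by a set of trigger words.
RULES = {
    "Business/Software": {"scalable", "technology", "software"},
    "Arts/Music": {"band", "album", "tablature"},
    "Recreation/Competitions": {"challenge", "contest", "prize"},
}


def categorize(text, rules=RULES):
    """Score each category 0-100 by the share of its trigger words found in text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {
        category: round(100 * len(triggers & words) / len(triggers))
        for category, triggers in rules.items()
    }
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


text = ("Call us crazy, but we think there are some brilliant minds out there that can "
        "find some really amazing uses for this incredibly powerful and scalable "
        "technology. Think you're up to the Challenge? We think you are!")
for category, score in categorize(text):
    print(category, score)
# Business/Software 67
# Recreation/Competitions 33
# Arts/Music 0
```

Swapping in different trigger words changes the ranking completely, while nothing in the code ever models what “brilliant minds” or “the Challenge” refer to.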

From this perspective, I think we’ll see a lot of people building applications based on syntax that reorganize and categorize content by giving the “page web” a bit of structure. Oddly, it’s like we’ve come full circle back to DMOZ. While these technologies may be clever and useful, I do not think they will fundamentally change the Web. In the other category, I think we’ll see a few companies pushing towards something more sophisticated (call it a semantic, implicit, or computational web) where new data and services are mixed with existing web content to provide answers to users’ questions.
