Links List 4.18.08

April 18th, 2008 by Sean Gorman

Moxie designed a demonstration to show how integrated geospatial services, RIA technology, location-based services and digital maps can make life easier. From that, a geospatial service was developed that enabled a Flex Yahoo! AS3 map application.

Virtual Earth has been updated to include new imagery, new 3D buildings, direct support for MapCruncher, movie capture, export to KML and GPX files, and more.

Geomantic shares some coverage of geospatial topics in the Washington Post over the past week, including a story on Yahoo! Maps Live and a mashup from the Center for Neighborhood Technology.

GISLounge and the Daily ACK announced that KML is now an Open Geospatial Consortium standard. This means that Google will no longer be responsible for maintaining the KML file format, which will instead be handled by OGC. KML (Keyhole Markup Language) is an XML-based file format for managing geographic information.

Flowing Data provides a list of data visualization blogs you may not know about, including Strange Maps, Well-formed Data, Random Etc, Serial Consign, and AnyGeo.


When we started the very first iteration of GeoCommons in 2005, folksonomies were all the rage and we jumped on board, using tags to organize the geospatial data that was pushed into the new platform. While we had the prototype deployed we ran into many of the same issues other applications have found with folksonomies:

1) people’s tags may be difficult for others to understand,
2) people may have tagged items inappropriately for others’ needs.

In short, your users will not always implement tags in ways that are productive for the community - in the extreme resulting in Flickr’s 20 million unique tags. How many of those 20 million tags are misspelled words, or so far off the path they never get found?

In addition to the problems you encounter with folksonomies in general, you have the further complications of geospatial data. All geospatial data sets have location tags, but adding them in an unstructured way creates enough chaos that it is very difficult to leverage location tags in a thorough way. Secondly, many potential users do not know the variety of geodata available. Put more simply, they do not know what to search for, so the ability to browse through data by topic is appealing.

Despite the downsides of folksonomies, they are incredibly powerful and have been hugely effective in organizing vast amounts of data on the web. So, as we worked on the next iteration of GeoCommons, we started looking at possible hybrid approaches to folksonomies and hierarchies.

Specifically, we looked at the two problems specific to geospatial data listed above: 1) place tags and 2) organizing data for browsing. Solving the problems required both short term and long term solutions.

Fortunately we had a small advantage over many crowd sourced projects in that we have a full time data team. They are a great group of folks who spend their day finding cool geodata and coming up with clever ways to organize it.

Through the data team and the other community members that contributed data to the first iteration of GeoCommons, we had a big pool of data with a wide variety of tags to examine. What we found were some distinct trends in the tagging and titling of data. Across the data there was a common set of tags that broke the data up into a useful set of distinct categories, but there were also many data sets tagged with elements that often made them undiscoverable. After the analysis we started to look at structures we could establish to help create self similarity in tagging that still had the flexibility to be adaptive.

The result was the creation of a location and topical taxonomy based on our existing corpus of data that has the intelligence to adapt as the content grows and evolves. I can’t go into the technical details in depth, but fundamentally the concept is to intelligently leverage the taxonomies and structures to provide suggestions to users to tag their data better.

In many cases this can be very simple - like providing tips on how to tag and title effectively to make your data more valuable to the community. For instance, with titles we found that across GeoCommons there were four key pieces of information used for datasets in the past:

1) Source name, 2) Original name of dataset from source (or short description of dataset), 3) Geographic area, 4) Time period of data

Examples:

  • OECD, Information and Communication Technology, Global, 2007
  • USGS, Earthquake Records, Worldwide, 1998-2007
  • NOAA, Hurricane Track Data, North America, 1851-2004

Communicating this effectively to users is a great way to get better consistency across data contributions, while still allowing flexibility for users to be creative and bring in information that does not fit the rigid mold of a hierarchy. Of course this is the simplest approach, and you can get far more clever.
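The four-part title convention above can be checked mechanically before nudging a contributor with a tip. The following is only a minimal sketch: the names `DatasetTitle` and `parse_title` are hypothetical, and it assumes the four fields are separated by commas and that the time period is a single year or a year range, as in the examples.

```python
import re
from typing import NamedTuple, Optional


class DatasetTitle(NamedTuple):
    source: str
    name: str
    area: str
    period: str


# Assumption: the period is a single year or a range such as "1998-2007".
_PERIOD = re.compile(r"^\d{4}(-\d{4})?$")


def parse_title(title: str) -> Optional[DatasetTitle]:
    """Split a title into the four fields, or return None if it does not fit."""
    parts = [p.strip() for p in title.split(",")]
    if len(parts) != 4 or not all(parts):
        return None
    if not _PERIOD.match(parts[3]):
        return None
    return DatasetTitle(*parts)


if __name__ == "__main__":
    print(parse_title("USGS, Earthquake Records, Worldwide, 1998-2007"))
    print(parse_title("Earthquakes"))
```

A `None` result would be a cue to show the titling tip rather than to reject the upload, which keeps the flexibility described above. A dataset name that itself contains a comma would not parse under this simple rule.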

    Del.icio.us for instance has a great feature that notifies a user when they are entering a tag no one has used before and asks if that is what they meant to do. You can also suggest tags from your taxonomy that are semantically related to the data the user is contributing. This creates a consistency across tags that makes data easier to find as the system scales to larger volumes.
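A rough sketch of the "new tag" prompt might look like the following. It is not how Del.icio.us or GeoCommons does it: the function name `check_tag` and the sample tag list are hypothetical, and plain spelling similarity (Python's `difflib`) stands in here for the semantic matching mentioned above.

```python
import difflib
from typing import Dict, List, Union


def check_tag(tag: str, known_tags: List[str], cutoff: float = 0.8) -> Dict[str, Union[str, bool, List[str]]]:
    """Flag tags nobody has used yet and offer similar existing tags."""
    normalized = tag.strip().lower()
    if normalized in known_tags:
        return {"tag": normalized, "is_new": False, "suggestions": []}
    # Spelling similarity only; a real system could consult the taxonomy instead.
    suggestions = difflib.get_close_matches(normalized, known_tags, n=3, cutoff=cutoff)
    return {"tag": normalized, "is_new": True, "suggestions": suggestions}


if __name__ == "__main__":
    known = ["earthquake", "hurricane", "census"]  # hypothetical existing tags
    print(check_tag("earthqake", known))
    print(check_tag("Census ", known))
```

When `is_new` is true, the interface can ask whether the user meant one of the suggestions, which catches misspellings without blocking a genuinely new tag.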

    The nice thing about taxonomies as opposed to folksonomies is that they can be structured as trees, which means you can compute across them quite easily. With a solid and adaptive taxonomy in place you can go a long way in intelligently guiding users towards creating better and more consistent tags. At least that is what we think, and it will be fun to see how it works out after the launch.
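To make "compute across them" concrete, here is a small sketch of a taxonomy stored as a child-to-parent map. The topic names are invented for illustration and are not the GeoCommons taxonomy; `ancestors` and `descendants` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each tag points at its parent, None marks a root.
PARENT: Dict[str, Optional[str]] = {
    "hazards": None,
    "earthquakes": "hazards",
    "hurricanes": "hazards",
    "hurricane tracks": "hurricanes",
    "economy": None,
    "telecommunications": "economy",
}


def ancestors(tag: str) -> List[str]:
    """Return the path from a tag up to its root, nearest parent first."""
    path = []
    parent = PARENT.get(tag)
    while parent is not None:
        path.append(parent)
        parent = PARENT.get(parent)
    return path


def descendants(tag: str) -> List[str]:
    """Return every tag that sits below the given tag, sorted by name."""
    return sorted(t for t in PARENT if tag in ancestors(t))


if __name__ == "__main__":
    print(ancestors("hurricane tracks"))
    print(descendants("hazards"))
```

Walking up the tree lets a search for a broad topic pick up data tagged with a narrower one, and walking down gives the browse-by-topic view.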


    I promised Andrew a comparison of the big three map creation applications by feature and functionality, so here it goes. The story of how lightweight web based map creation applications came to be is interesting in and of itself. I think looking at how the three applications evolved historically will provide a bit of insight.

    Before the GeoWeb came into mainstream popularity both Microsoft and Yahoo! had mapping applications. Microsoft offered their browser based Terraserver which hooked up USGS imagery for the map tiles. Microsoft launched Terraserver in June of 1998 - practically prehistoric. ;-)

    Microsoft had also been active in the mapping space with products like MapPoint (both desktop application and web services). Yahoo! also was an early adopter of mapping applications in conjunction with their local search destination (although I completely failed at finding a date for when they first added maps). Despite the early adoption of web based mapping applications by Yahoo! and Microsoft it was arguably the launch of Google Maps in 2005 that jump started both the GeoWeb and the mash up craze.

    Shortly after Google Maps launched, Paul Rademacher hacked the application to allow it to display Craigslist rental listings on the Google slippy map. Shortly thereafter Adrian Holovaty followed suit, mashing up Chicago crime statistics with Google Maps. Google quickly released an API to allow developers to do the same thing seamlessly and we were off to the races. Microsoft quickly created Virtual Earth and Yahoo! pushed out Yahoo! Maps. Microsoft created compelling innovations with bird’s eye imagery and Yahoo! launched several popular GeoWeb services like free geocoding and Flash based mapping APIs.

    Microsoft Collections

    Through all these innovations there was a constant one way flow of content creation - developers could create unique maps and users could view them. Microsoft changed this when they launched Collections May 23, 2006:

    Collections. Social networking functionality allows customers to create lists of favorite landmarks and locations, attach personal photos and save them to a Scratchpad. Collections can be saved, recalled later, “permalinked,” and shared with friends and community in e-mail or through their MSN® Spaces blog.

    While not well publicized the “Collections” concept fundamentally changed the work flow for creating maps. No longer did you need to be a developer or GIS pro to create a basic map and share it with other people. The Virtual Earth folks even gave users a decent amount of cartographic power and options:

    Customized pushpins. A pushpin is essentially a marker indicating points of interest on a map view. A customized pushpin can easily be added with a simple right click, anywhere on a map, which will display a small red dot and a pop-up menu. A pushpin title or note of up to 200 characters can be added that will appear with the pushpin whenever a mouse hovers over it. Pushpins can easily be edited or deleted. When a pushpin is removed, whether customized or standard, the remaining pushpins will be automatically renumbered.

    2-D drawings in Collections. Users can add lines and drawings in a variety of colors, shapes and styles to personalize their Collection. They also can draw lines and shade areas that they want to mark on the map (such as for marking a running or bike trail, or neighborhood boundaries).

    MyMaps

    Despite the potential of the innovation the new functionality did not get much coverage in the press or massive levels of adoption. The TechCrunch article on it was lumped in with other new features from Yahoo! Maps.

    Just short of a year later Google launched Google MyMaps on April 4th 2007 to big headlines across the blogs, including MyMaps being the death knell of popular map mashups like Platial, Frappr and Tagzania.

    Fundamentally the functionality and features of MyMaps were not remarkably different from Collections, but the buzz around it was at least tenfold. So why was the attention so skewed towards Google for fundamentally the same innovation Microsoft had launched a year earlier? A few guesses:

  • better user experience for Google - “so easy a cave man can do it”
  • it was launched as a stand alone application instead of as a new feature
  • more effective blog outreach
  • Google halo effect

    MapMixer

    Yahoo! was not too far behind, launching their own map creation application, Yahoo! Mapmixer, on September 13th 2007. Mapmixer took a different angle on map creation by allowing users to put static maps on top of the Yahoo! Maps application. For instance, after the Cosco Busan oil spill in the San Francisco Bay last year I made a lot of calls trying to get the raw data on the location of the spills for GeoCommons, but had no luck.

    I did find a PDF with a map of the oil spills, so I saved it as a PNG and uploaded it to Yahoo! Mapmixer, which took me through three easy steps to georeference the map on Yahoo! Maps. I thought the user experience was the best of the three, and there were lots of great social features that let me give a short description of the map and let other users comment on it. Much like Microsoft Collections, though, the application did not generate the buzz Google MyMaps did, and the gallery only features 38 user submitted maps today. Interestingly, in concept it is quite similar to Microsoft’s MapCruncher, although MapCruncher is a download and supports a wider variety of raster based formats that must already be georeferenced.

    Since the launch of map creation applications by the three big players there have been two noticeable waves of enhancement: 1) support for external data and 2) collaboration features. Microsoft put themselves out as being the first to support loading KML: “The October 07 release of Live Maps was the first to support KML viewing and import to Collections”. On November 27th 2007 Google added KML, KMZ and GeoRSS support to MyMaps. Google followed this up with social features, like commenting, rating and open collaboration invitations for MyMaps.

    Performance Trials

    That covers features and functionality from a historic evolution stand point, but how do they perform? We did a very informal, one user, stress test. Create push pins as quickly as possible and see when the map application maxes out or gets sluggish. For Yahoo! Mapmixer this was pretty easy. You can overlay one picture or map onto the application, so you max out at one.

    In the process of loading and georeferencing the image you get speedy performance and predictable response times. For MyMaps and Collections we had a bit more to stress. We’ll start with Collections, where we created 200 push pins with good response time and then got the following message: “You cannot add more than 200 items to a collection. To add more items, create another collection.”

    When we ran the same test on MyMaps, we did high rate push pin creation and after about 30 the system got a bit sluggish, and sometimes it would create a listing for a push pin in the left-hand pane but not create the push pin on the map. The caveat here is that we were doing this at high speed, and when we slowed down to a more deliberate pace the system handled it fine.

    MyMaps also maxes out at 200 push pins on the map, but instead of providing a warning it paginates the continuing set of push pins. So when you click on the first page you get a map with the first 200 push pins, and when you click on the second page you get the next 200 push pins on a new map in the same browser and tab. Oddly, the numbering stops at 820 push pins and starts back over at number one, but you can keep adding push pins to the map.

    What’s Next?

    That pretty much wraps it up for a comparison of the big three, how they evolved in a competitive environment, and a very ad hoc test of their limits.

    I believe the most interesting part will be where they evolve to next. What is the next set of functionality that will distinguish one from the other? Can Microsoft or Yahoo! introduce the next killer functionality that will catch up to the 7 million maps that have been created with MyMaps?


    A Proposal for a GeoWeb Metadata Implementation

    April 1st, 2008 by Sean Gorman

    One of the criticisms we received when we launched GeoCommons was the lack of metadata for the content we had collected. Since then we’ve been looking into what would be a reasonable approach to implement metadata for the GeoWeb.

    When it comes to GIS data the existing standard is the FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM). The standard calls for 335 metadata elements to describe a geospatial data set, which covers a wide variety of descriptions for the data. The one thing that became clear very quickly was that the FGDC CSDGM is far too onerous and outdated for the GeoWeb. For instance, in the FAQ provided by the USGS they recommend you hire a full time person to create your CSDGM compliant metadata:

    Who should create metadata?
    “Data managers who are either technically-literate scientists or scientifically-literate computer specialists. Creating correct metadata is like library cataloging, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don’t assume that every -ologist or -ographer needs to be able to create proper metadata. They will complain that it is too hard and they won’t see the benefits. But ensure that there is good communication between the metadata producer and the data producer; the former will have to ask questions of the latter….”

    In a GeoWeb where self publication is a key innovation the model of having a full time metadata guru is antiquated. A specification with 335 elements is antiquated. The mantra that “certainly if there is no pain, there will likely be no gain” when it comes to metadata is antiquated. The end result of these draconian approaches to metadata is about a zero likelihood the GeoWeb will implement them.

    This is a shame because metadata is very useful, especially when it comes to describing, finding and federating data. This is one of the shortcomings of KML: little or no metadata (although several argue metadata has no place in either KML or GeoRSS). GeoRSS has limited metadata support with “Feature Type Tag” and “Relationship Tag”, which are useful but fairly confined.

    The question we faced with rebuilding GeoCommons: is there a middle ground between 335 elements and two elements? Fortunately we were not the first to look at this issue. In 1995 a bunch of librarians got together to devise an approach that “provides a simple and standardized set of conventions for describing things online in ways that make them easier to find”. The fifteen-element standard they devised is called Dublin Core and is widely implemented across the web. If the librarians could come up with 15 core elements then surely the GeoWeb can, and even make those map to the Dublin Core standard and the FGDC CSDGM standard. So, after a good bit of work, here is what we would like to implement as a lightweight core set of metadata for GeoWeb data:

    [Image: metadata_table]

    This covers seventeen elements, about half of which we trap automatically. You can map them to either FGDC or Dublin Core, thus giving you the ability to expose your data to the GIS world and general web community in a straightforward manner. As with any metadata standard you do not need all seventeen elements, but the more you populate the more useful the data becomes. The metadata could be exposed as microformats, enabling a number of possibilities for discovery and potential federation. This could be particularly interesting with Yahoo! opening up their search to support Dublin Core vocabularies and microformats. Our feeling is that the more data we can make available on the web the more problems everyone can solve. We’ll be testing this out when we launch the next iteration of GeoCommons at Where 2.0, and it would be great to get feedback and thoughts on the approach.
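As a rough illustration of mapping a lightweight record onto Dublin Core, consider the sketch below. The left-hand field names are hypothetical placeholders, not the seventeen GeoCommons elements (the table is not reproduced here); only the right-hand names are actual Dublin Core elements. The `dc:` prefix is a common convention and is used here as an assumption about how the output would be labeled.

```python
from typing import Dict

# Hypothetical lightweight field name -> Dublin Core element name.
TO_DUBLIN_CORE: Dict[str, str] = {
    "title": "title",
    "description": "description",
    "author": "creator",
    "tags": "subject",
    "source_url": "source",
    "geographic_area": "coverage",
    "license": "rights",
    "created": "date",
}


def to_dublin_core(record: Dict[str, str]) -> Dict[str, str]:
    """Rename populated, mapped fields to dc:-prefixed Dublin Core elements."""
    return {
        "dc:" + TO_DUBLIN_CORE[field]: value
        for field, value in record.items()
        if field in TO_DUBLIN_CORE and value
    }


if __name__ == "__main__":
    record = {
        "title": "USGS, Earthquake Records, Worldwide, 1998-2007",
        "author": "USGS",
        "geographic_area": "Worldwide",
        "license": "",  # left blank: not every element has to be populated
    }
    print(to_dublin_core(record))
```

Empty fields are simply skipped, which matches the idea that not every element is required. A second lookup table of the same shape could target FGDC CSDGM element names.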


    The folks at Pushpin had a great comment they posted to our last blog entry on “free public data”. I thought there was enough interesting content to expand on the comment thread with another blog post. The Pushpin team did a great job providing far more nuanced thoughts on the issues of “for fee” data. At the end of the day my issue is truly with governments for not providing the data in easy to use formats or even open standard non-proprietary formats. In an open market anyone is free to take that government supplied data, make it easy to use, and charge a price the market is willing to pay. In addition to making the data easy to use, many vendors also add an additional layer of quality assurance and many times value added data derivatives like forecasts.

    There are many instances where vendor supplied data is truly value added and worth the money an end user pays, but there are also situations where it is not and there is a better alternative. Take for instance the 2000 Census data ESRI provides to Pushpin to resell - the added work there is taking the boundary files provided by Census and joining them to the data tables provided by the Census. I’ll be the first to admit it is tedious to do all the database joins, and it requires having pricey GIS software, but in my opinion the ratio of value add to price is way out of whack.

    That is the philosophical difference with GeoCommons. If you have a community of people willing to put in that little bit of work to extract the data from places like Census and share it with the community, you get a network effect. Since the data goes in under Creative Commons, anyone can take that data and combine it with their data or anyone else’s contributed data, allowing any user to make something new and innovative with the collective data. Anytime you work to create a dataset/database there is value created and work done. Every member of OpenStreetMap GPS-tracing roads has put in solid sweat equity, but they choose to contribute that to the community because the collective value of that data is far greater than its value alone.

    In the end I believe this helps the data vendors because there is more data the market can mashup with the vendor data (vendors benefit from the network effect also). There is also a larger market of people that realize the value of the data because the barrier to entry to experience it has been removed. That said, I believe it also means the data providers are really going to have to add true value and not just do a few database joins. The real value comes in the technology and not the raw data itself. The data is what enables the technology to be more valuable.

    Tim O’Reilly states that one of the key value drivers for Web 2.0 is “Data is the Intel Inside”. Specifically O’Reilly cites NAVTEQ’s proprietary database of streets as a big value driver for many GeoWeb applications. I agree that databases (i.e. SQL is the new HTML) are creating new value propositions, but now the value is having data on the “outside” not the “inside”. The walled proprietary gardens of “inside” data are being trumped by open source “outside” data that allows a network effect to be created. With data on the “outside” not only can new combinations (data mashups) be created, but the data itself can adapt (like OpenStreetMap and TomTom). In response to Brady’s post on the Nokia acquisition of NAVTEQ O’Reilly comments, “the real question is going to be whether there’s a web 2.0 answer (i.e. a user-generated content) answer to the expensive data development and curation currently employed by Navteq.” I think the answer is a resounding yes, and as standards like KML 3.0 progress and technologies evolve around them, the power given to users to contribute meaningful data and context is only going to increase. The real value is in the technology that allows the data to be delivered, mashed up, and interconnected.
