When we started the very first iteration of GeoCommons in 2005, folksonomies were all the rage, and we jumped on board, using tags to organize the geospatial data that was pushed into the new platform. While the prototype was deployed we ran into many of the same issues other applications have found with folksonomies:

1) people’s tags may be difficult for others to understand,
2) people may have tagged items inappropriately for others’ needs.

In short, your users will not always implement tags in ways that are productive for the community - in the extreme resulting in Flickr’s 20 million unique tags. How many of those 20 million tags are misspelled words, or so far off the beaten path that they never get found?

In addition to the problems you encounter with folksonomies in general, you have the further complications of geospatial data. First, all geospatial data sets have location tags, but adding them in an unstructured way creates enough chaos that it is very difficult to leverage location tags in a thorough way. Second, many potential users do not know the variety of geodata available. Put more simply, they do not know what to search for, so the ability to browse through data by topic is appealing.

Despite their downsides, folksonomies are incredibly powerful and have been hugely effective in organizing vast amounts of data on the web. So, as we worked on the next iteration of GeoCommons, we started looking at possible hybrid approaches to folksonomies and hierarchies.

Specifically, we looked at the two problems specific to geospatial data listed above: 1) place tags and 2) organizing data for browsing. Solving them required both short-term and long-term solutions.

Fortunately, we had a small advantage over many crowdsourced projects in that we have a full-time data team. They are a great group of folks who spend their days finding cool geodata and coming up with clever ways to organize it.

Through the data team and the other community members who contributed data to the first iteration of GeoCommons, we had a big pool of data with a wide variety of tags to examine. What we found were some distinct trends in the tagging and titling of data. Across the data there was a common set of tags that broke the data up into a useful set of distinct categories, but there were also many data sets tagged with elements that often made them undiscoverable. After the analysis we started to look at structures we could establish to encourage self-similarity in tagging while keeping the flexibility to adapt.

The result was the creation of a location and topical taxonomy based on our existing corpus of data that has the intelligence to adapt as the content grows and evolves. I can’t go into the technical details in depth, but fundamentally the concept is to intelligently leverage the taxonomies and structures to provide suggestions to users to tag their data better.

In many cases this can be very simple - like providing tips on how to tag and title effectively to make your data more valuable to the community. For instance, looking at titles across GeoCommons, we found four key pieces of information that had been used for datasets in the past:

1) Source name,
2) Original name of dataset from source (or short description of dataset),
3) Geographic area,
4) Time period of data

Examples:

  • OECD, Information and Communication Technology, Global, 2007
  • USGS, Earthquake Records, Worldwide, 1998-2007
  • NOAA, Hurricane Track Data, North America, 1851-2004
Communicating this effectively to users is a great way to get better consistency across data contributions, while still allowing flexibility for users to be creative and bring in information that does not fit the rigid mold of a hierarchy. Of course this is the simplest approach, and you can get far more clever.
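As a rough illustration of the four-part title convention, here is a minimal sketch of how a title could be composed and checked. The names `DatasetTitle` and `parse_title` are hypothetical and invented for this example; this is not how GeoCommons itself handles titles.

```python
from dataclasses import dataclass


@dataclass
class DatasetTitle:
    """The four pieces of information found in well-formed titles."""
    source: str   # e.g. "USGS"
    name: str     # original dataset name or short description
    area: str     # geographic area
    period: str   # time period of the data

    def format(self) -> str:
        # Join the parts in the order used by the examples above.
        return ", ".join([self.source, self.name, self.area, self.period])


def parse_title(title: str) -> DatasetTitle:
    """Split a "Source, Name, Area, Period" title into its parts.

    The source is taken before the first comma and the area and period
    after the last two commas, so the dataset name may itself contain
    commas. Raises ValueError if the title has fewer than four parts.
    """
    source, rest = title.split(",", 1)
    name, area, period = rest.rsplit(",", 2)
    return DatasetTitle(source.strip(), name.strip(), area.strip(), period.strip())
```

A contribution form could call `parse_title` on what the user typed and, on a `ValueError`, show the tip about which pieces are expected rather than rejecting the title outright.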

    Del.icio.us, for instance, has a great feature that notifies a user when they are entering a tag no one has used before and asks if that is what they meant to do. You can also suggest tags from your taxonomy that are semantically related to the data the user is contributing. This creates a consistency across tags that makes data easier to find as the system scales to larger volumes.
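To make the "new tag" prompt concrete, here is a minimal sketch using fuzzy string matching from Python's standard library. The function `check_tag`, its return format and the 0.8 cutoff are assumptions made for illustration; this is neither Del.icio.us's implementation nor ours, and plain string similarity only catches misspellings, not semantically related tags.

```python
from difflib import get_close_matches


def check_tag(tag, known_tags, cutoff=0.8):
    """Flag a tag nobody has used before and suggest close existing tags.

    known_tags is a set of lowercase tags already in the system.
    """
    normalized = tag.strip().lower()
    if normalized in known_tags:
        return {"tag": normalized, "is_new": False, "suggestions": []}
    # Sort so the candidate order (and therefore tie-breaking) is stable.
    suggestions = get_close_matches(
        normalized, sorted(known_tags), n=3, cutoff=cutoff
    )
    return {"tag": normalized, "is_new": True, "suggestions": suggestions}
```

When `is_new` is true and `suggestions` is non-empty, the interface can ask "did you mean ...?" before accepting the tag; when `suggestions` is empty, the tag is probably a genuinely new concept and can simply be added.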

    The nice thing about taxonomies as opposed to folksonomies is that they can be structured as trees, which means you can compute across them quite easily. With a solid and adaptive taxonomy in place you can go a long way in intelligently guiding users towards creating better and more consistent tags. At least that is what we think, and it will be fun to see how it works out after the launch.
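To show what "computing across" a tree can mean, here is a minimal sketch of a taxonomy node that can list its broader and narrower categories. The class `TaxonomyNode` and the category names in the usage note are hypothetical; they are not the actual GeoCommons taxonomy.

```python
class TaxonomyNode:
    """One category in a tree-shaped taxonomy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        """Names of all broader categories, nearest first."""
        path = []
        node = self.parent
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

    def descendants(self):
        """Names of all narrower categories, in no particular order."""
        found = []
        stack = list(self.children)
        while stack:
            node = stack.pop()
            found.append(node.name)
            stack.extend(node.children)
        return found
```

For example, with `env = TaxonomyNode("Environment")`, `hazards = TaxonomyNode("Natural Hazards", env)` and `quakes = TaxonomyNode("Earthquakes", hazards)`, calling `quakes.ancestors()` walks up to the broader categories, so a dataset tagged "Earthquakes" can also show up when someone browses "Natural Hazards" or "Environment".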


    Links List 4.11.08

    April 11th, 2008 by Sean Gorman

    Bret Taylor says that we need a Wikipedia for data. He points out how hard it is for an everyday programmer to get access to even the most basic factual data, which is a barrier to innovation.

    Dave Bouwman shows us the National Geographic MetaLens service with Virtual Earth. MetaLens is a geospatial content management and archival system that National Geographic uses to secure and manage its content.

    Dan Catt from Geobloggers and Flickr shares the new Flickr video and geo-tagging option.

    James Fee shares how to leverage Google App Engine with GIS applications. He also reviews the confusing difference between the commercial licenses for Microsoft Virtual Earth MapCruncher and the MSR edition.

    According to GISuser, General Dynamics has completed testing of GeoEye-1, an earth imaging satellite. GeoEye-1 is part of the National Geospatial-Intelligence Agency (NGA) NextView program. The NextView program is designed to ensure that the NGA has access to commercial imagery in support of its mission to provide timely, relevant and accurate geospatial intelligence in support of national security.

    GISLounge shares top causes of errors in online mapping systems, including inaccurate base data, accuracy of geocoding, lag time to incorporate newly developed areas and difficulty in interpreting variations on addresses.


    Are Push Pins Inescapable?

    March 12th, 2008 by Sean Gorman

    It is only fitting that the day after I posted “Moving Push Pins Off the Map” I saw the post on Ogle Earth about a new geotagging icon….which is?

    [Image: geotag-icon]

    A GIANT PUSH PIN!

    With my interest piqued, we did a little digging and found another geotagging icon:

    [Image: geotag-icon2]

    ANOTHER GIANT PUSH PIN (actually when I dug into it this icon was a first version that evolved into the red one.)

    I of course blame this all on the Google monolith for perpetuating push pin mania. Last time I saw Mike Jones he even had a push pin tie tack. Joking aside, the reason for creating a geotagging icon is itself worth discussing.

    The stated purpose on the GeoTagIcons.com website is “The Geotag Icon is intended as a web “standard” icon for identifying geotagged content to humans.” So, if a photo or blog post has been geotagged, there is an icon on it to let you know. The thinking is that geotags are often hidden in microformats or the URL, and thus not visible to the user.

    This seems like a straightforward approach to the problem, but it also seems to overlap with existing icons such as those for KML and GeoRSS. The tutorial on GeoTagIcons has examples of using it for links to both KML and GeoRSS content. This could lead to some ambiguity and confusion for users.

    One of the most interesting parts of the pitch for using the GeoTagIcon is, “Reason 4: It encourages development of the semantic web”. At first blush this got me excited, but reading a bit deeper I realized they meant it acts as an advertisement for linked content that could help support an evolving semantic web. This in and of itself is a worthy cause, and advertising has been directed at far less useful goals.

    The link between geotagging and the semantic web does bring up a good topic for debate. How will all these geotagged objects (KML, GeoRSS, geo-microformats, GPX, etc.) be tied together in a way that creates semantic meaning? What questions will the semantic technologies answer? The GeoTagIcon site provides examples such as “Show me a plot of other bloggers in my vicinity”, “I’d like to see a map showing which of my friends have also visited Australia”, “Who else has photographed this location?”, etc.

    While these are interesting, I think the examples, and the direction many folks are taking geotagging, miss the real potential of the semantic web. The geotagging premise is based on doing increasingly sophisticated things with geo-coded annotations - 99% of the time taking the form of a pushpin. In each of the examples above, users or (most likely) a screen scraper and geo-coder have added a latitude and longitude to a piece of unstructured data (bloggers, my friends, photos). While this is all useful information, it is often relegated to answering only trivial questions.

    There is only so much you can do with a bit of unstructured text or HTML that has geographic coordinates. You can measure vicinity (bloggers nearby), intersection (friends that have visited Australia) and union (show me all photos from a location). There might be a few that I am missing, but it is a fairly small universe of questions that can be answered, and the semantic web is all about answering questions. Hopefully a very large universe of questions.
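To show how little machinery these questions need, here is a minimal sketch of vicinity and intersection queries over a list of geotagged items. The function names, the dictionary layout with `lat` and `lon` keys, and the spherical-earth radius are assumptions made for illustration; a bounding box is also only a crude stand-in for a real country boundary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, treating the earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearby(points, lat, lon, radius_km):
    """Vicinity: items within radius_km of the given location."""
    return [p for p in points
            if haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km]


def within_bbox(points, south, west, north, east):
    """Intersection: items falling inside a bounding box.

    Assumes the box does not cross the antimeridian.
    """
    return [p for p in points
            if south <= p["lat"] <= north and west <= p["lon"] <= east]
```

"Bloggers in my vicinity" is a call to `nearby`, and "friends who have visited Australia" is roughly `within_bbox` with a box drawn around the continent. Union is just concatenating the results of several such queries. That is about the whole toolkit when all you have is a coordinate attached to unstructured text.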

    From my limited perspective, the semantic web is all about bringing vast data resources to the web in an easy and intuitive way. While turning unstructured text already on the web into geocoded annotations is important, I think the bigger challenge is blending existing structured data (largely in databases and not directly on the web page) with organized unstructured data through the web in a seamless way, like we have for text, pictures and video.

    Metaweb has done some compelling work with Freebase. They have even been doing some interesting geo work with their database. To date Freebase has largely been working with conceptual data, but from the look of their GIS app they could be getting into more quantitative data.

    As you get into quantitative data, the power and tools available for asking sophisticated questions increase exponentially. Unfortunately so do the technical challenges, both in computation and in creating an intuitive user experience for something not intuitive to most people - numbers, math, statistics, etc. Despite the challenges, I think this is where some of the greatest potential awaits for the emerging semantic web. That said, I do think the new icons are quite nice and serve a useful function - despite the push pin. ;-)

