When we started the very first iteration of GeoCommons in 2005, folksonomies were all the rage, and we jumped on board, using tags to organize the geospatial data that was pushed into the new platform. While the prototype was deployed we ran into many of the same issues other applications have found with folksonomies:

1) people’s tags may be difficult for others to understand,
2) people may have tagged items inappropriately for others’ needs.

In short, your users will not always implement tags in ways that are productive for the community - in the extreme resulting in Flickr’s 20 million unique tags. How many of those 20 million tags are misspelled words, or so far off the path they never get found?

In addition to the problems you encounter with folksonomies in general, you have the further complications of geospatial data. First, all geospatial data sets have location tags, but adding them in an unstructured way creates enough chaos that it is very difficult to leverage location tags in a thorough way. Second, many potential users do not know the variety of geodata available. Put more simply, they do not know what to search for, so the ability to browse through data by topic is appealing.

Despite the downsides of folksonomies, they are incredibly powerful and have been hugely effective in organizing vast amounts of data on the web. So, as we worked on the next iteration of GeoCommons, we started looking at possible hybrid approaches to folksonomies and hierarchies.

Specifically, we looked at the two problems specific to geospatial data listed above: 1) place tags and 2) organizing data for browsing. Solving them required both short term and long term solutions.

Fortunately we had a small advantage over many crowd sourced projects in that we have a full time data team. They are a great group of folks who spend their days finding cool geodata and coming up with clever ways to organize it.

Through the data team and the other community members that contributed data to the first iteration of GeoCommons we had a big pool of data with a wide variety of tags to examine. What we found were some distinct trends in the tagging and titling of data. Across the data there was a common set of tags that broke the data up into a useful set of distinct categories, but there were also many data sets tagged in ways that often made them undiscoverable. After the analysis we started to look at structures we could establish to help create self similarity in tagging while keeping the flexibility to adapt.

The result was the creation of a location and topical taxonomy based on our existing corpus of data that has the intelligence to adapt as the content grows and evolves. I can’t go into the technical details in depth, but fundamentally the concept is to intelligently leverage the taxonomies and structures to provide suggestions to users to tag their data better.

In many cases this can be very simple - like providing tips on how to tag and title effectively to make your data more valuable to the community. For instance with titles we found across GeoCommons there were four key pieces of information used for datasets in the past.

1) Source name, 2) Original Name of Dataset from Source (or short description of dataset) 3) Geographic Area, 4) Time period of data

Examples:

  • OECD, Information and Communication Technology, Global, 2007
  • USGS, Earthquake Records, Worldwide, 1998-2007
  • NOAA, Hurricane Track Data, North America, 1851-2004
Communicating this effectively to users is a great way to get better consistency across data contributions, while still allowing flexibility for users to be creative and bring in information that does not fit the rigid mold of a hierarchy. Of course this is the simplest approach, and you can get far more clever.
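The four-part title convention above is simple enough to check mechanically. What follows is a minimal sketch in Python, not GeoCommons code: the function names `build_title` and `missing_title_parts` are hypothetical, and the comma-separated layout is assumed from the example titles.

```python
# Labels for the four pieces of information, in the order used by the examples.
TITLE_PARTS = ("source", "dataset name", "geographic area", "time period")


def build_title(source, name, area, period):
    """Join the four parts into a title in the style of the examples above."""
    return ", ".join(part.strip() for part in (source, name, area, period))


def missing_title_parts(title):
    """Return the labels of the parts that appear to be missing from a title.

    Assumes the parts are comma-separated and in the conventional order,
    so a dataset name that itself contains a comma would confuse it.
    """
    fields = [field.strip() for field in title.split(",")]
    # Pad short titles with empty fields so every label has a slot to check.
    fields += [""] * (len(TITLE_PARTS) - len(fields))
    return [label for label, value in zip(TITLE_PARTS, fields) if not value]
```

A contribution form could call `missing_title_parts` on whatever the user typed and turn the result into a gentle tip rather than a hard validation error, which keeps the flexibility described above.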

    Del.icio.us, for instance, has a great feature that notifies users when they are entering a tag no one has used before and asks whether that is what they meant to do. You can also suggest tags from your taxonomy that are semantically related to the data the user is contributing. This creates a consistency across tags that makes data easier to find as the system scales to larger volumes.
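A "did you mean this new tag?" prompt of the kind described above can be approximated with fuzzy string matching. This is a sketch under assumptions, not how Del.icio.us or GeoCommons implements it: `check_tag` is a hypothetical helper, `known_tags` is assumed to be a set of lowercase tags already in use, and the 0.8 similarity cutoff is an arbitrary starting point.

```python
from difflib import get_close_matches


def check_tag(tag, known_tags, max_suggestions=3):
    """Return (is_new, suggestions) for a tag the user has typed.

    is_new is True when nobody has used the tag before; suggestions lists
    existing tags that look like what the user may have meant to type.
    """
    normalized = tag.strip().lower()
    if normalized in known_tags:
        return False, []
    # Sort so the candidate order (and therefore the result) is deterministic.
    suggestions = get_close_matches(
        normalized, sorted(known_tags), n=max_suggestions, cutoff=0.8
    )
    return True, suggestions
```

String similarity only catches misspellings and near duplicates. Suggesting semantically related tags would need the taxonomy itself.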

    The nice thing about taxonomies as opposed to folksonomies is that they can be structured as trees, which means you can compute across them quite easily. With a solid and adaptive taxonomy in place you can go a long way in intelligently guiding users towards creating better and more consistent tags. At least that is what we think, and it will be fun to see how it works out after the launch.
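To illustrate what "computing across a tree" can look like, here is a small Python sketch. The categories in `TAXONOMY` are invented for the example and are not the GeoCommons taxonomy, and `find_path` and `suggest_tags` are hypothetical helpers.

```python
# A toy topical taxonomy as nested dicts: each key is a topic and its value
# holds the child topics. These categories are made up for illustration.
TAXONOMY = {
    "environment": {
        "natural hazards": {"earthquakes": {}, "hurricanes": {}},
        "climate": {},
    },
    "economy": {"technology": {}},
}


def find_path(tree, term, path=()):
    """Depth-first search; return the path from the root to term, or None."""
    for node, children in tree.items():
        here = path + (node,)
        if node == term:
            return here
        found = find_path(children, term, here)
        if found:
            return found
    return None


def suggest_tags(tree, term):
    """Suggest the ancestors of a matched term as broader tags."""
    path = find_path(tree, term)
    return list(path[:-1]) if path else []
```

With this structure, a user who tags a dataset "hurricanes" can be offered "natural hazards" and "environment" as broader tags, so the dataset also shows up for people browsing by topic.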


    I promised Andrew a comparison of the big three map creation applications by feature and functionality, so here it goes. The story of how lightweight web based map creation applications came to be is interesting in and of itself. I think looking at how the three applications evolved historically will provide a bit of insight.

    Before the GeoWeb came into mainstream popularity both Microsoft and Yahoo! had mapping applications. Microsoft offered their browser based Terraserver which hooked up USGS imagery for the map tiles. Microsoft launched Terraserver in June of 1998 - practically prehistoric. ;-)

    Microsoft had also been active in the mapping space with products like MapPoint (both desktop application and web services). Yahoo! also was an early adopter of mapping applications in conjunction with their local search destination (although I completely failed at finding a date for when they first added maps). Despite the early adoption of web based mapping applications by Yahoo! and Microsoft it was arguably the launch of Google Maps in 2005 that jump started both the GeoWeb and the mash up craze.

    Shortly after Google Maps launched, Paul Rademacher hacked the application to display Craigslist rental listings on the Google slippy map. Shortly thereafter Adrian Holovaty followed suit, mashing up Chicago crime statistics with Google Maps. Google quickly released an API to allow developers to do the same thing seamlessly and we were off to the races. Microsoft quickly created Virtual Earth and Yahoo! pushed out Yahoo! Maps. Microsoft created compelling innovations with bird’s eye imagery and Yahoo! launched several popular GeoWeb services like free geocoding and Flash based mapping APIs.

    Microsoft Collections

    Through all these innovations there was a constant one way flow of content creation - developers could create unique maps and users could view them. Microsoft changed this when they launched Collections May 23, 2006:

    Collections. Social networking functionality allows customers to create lists of favorite landmarks and locations, attach personal photos and save them to a Scratchpad. Collections can be saved, recalled later, “permalinked,” and shared with friends and community in e-mail or through their MSN® Spaces blog.

    While not well publicized the “Collections” concept fundamentally changed the work flow for creating maps. No longer did you need to be a developer or GIS pro to create a basic map and share it with other people. The Virtual Earth folks even gave users a decent amount of cartographic power and options:

    Customized pushpins. A pushpin is essentially a marker indicating points of interest on a map view. A customized pushpin can easily be added with a simple right click, anywhere on a map, which will display a small red dot and a pop-up menu. A pushpin title or note of up to 200 characters can be added that will appear with the pushpin whenever a mouse hovers over it. Pushpins can easily be edited or deleted. When a pushpin is removed, whether customized or standard, the remaining pushpins will be automatically renumbered.

    2-D drawings in Collections. Users can add lines and drawings in a variety of colors, shapes and styles to personalize their Collection. They also can draw lines and shade areas that they want to mark on the map, such as for marking a running or bike trail, or neighborhood boundaries.

    MyMaps

    Despite the potential of the innovation the new functionality did not get much coverage in the press or massive levels of adoption. The TechCrunch article on it was lumped in with other new features from Yahoo! Maps.

    Just short of a year later Google launched Google MyMaps on April 4th 2007 to big headlines across the blogs, including MyMaps being the death knell of popular map mashups like Platial, Frappr and Tagzania.

    Fundamentally the functionality and features of MyMaps were not remarkably different from those of Collections, but the buzz around it was at least tenfold. So why was the attention so skewed towards Google for fundamentally the same innovation Microsoft had launched a year earlier? A few guesses:

  • better user experience for Google - “so easy a cave man can do it”
  • it was launched as a stand alone application instead of as a new feature
  • more effective blog outreach
  • Google halo effect

    MapMixer

    Yahoo! was not too far behind, launching their own map creation application, Yahoo! Mapmixer, on September 13th 2007. Mapmixer took a different angle on map creation by allowing users to put static maps on top of the Yahoo! Maps application. For instance, after the Cosco Busan oil spill in the San Francisco Bay last year I made a lot of calls trying to get the raw data on the location of the spills for GeoCommons, but had no luck.

    I did find a PDF with a map of the oil spills, so I saved it as a PNG and uploaded it to Yahoo! Mapmixer, which took me through three easy steps to georeference the map on Yahoo! Maps. I thought the user experience was the best of the three, and there were lots of great social features for me to give a short description of the map and for other users to comment on it. As with Microsoft’s Collections, though, the application did not generate anything like the buzz of Google MyMaps, and the gallery only features 38 user submitted maps today. Interestingly, it is quite similar in concept to Microsoft’s MapCruncher, although MapCruncher is a download and supports a wider variety of raster based formats that must already be georeferenced.

    Since the launch of map creation applications by the three big players there have been two noticeable waves of enhancement: 1) support for external data and 2) collaboration features. Microsoft put themselves out as being the first to support loading KML: “The October 07 release of Live Maps was the first to support KML viewing and import to Collections”. On November 27th 2007 Google added KML, KMZ and GeoRSS support to MyMaps. Google followed this up with social features, like commenting, rating and open collaboration invitations for MyMaps.

    Performance Trials

    That covers features and functionality from a historic evolution standpoint, but how do they perform? We did a very informal, one user stress test: create push pins as quickly as possible and see when the map application maxes out or gets sluggish. For Yahoo! Mapmixer this was pretty easy. You can overlay one picture or map onto the application, so you max out at one.

    In the process of loading and georeferencing the image you get speedy performance and predictable response times. For MyMaps and Collections we had a bit more to stress. We’ll start with Collections, where we created 200 push pins with good response time and then got the following message: “You cannot add more than 200 items to a collection. To add more items, create another collection.”

    When we ran the same test on MyMaps, we created push pins at a high rate, and after about 30 the system got a bit sluggish. Sometimes it would create a listing for a push pin in the left hand pane but not create the push pin on the map. The caveat here is that we were doing this at high speed; when we slowed down to a more deliberate pace the system handled it fine.

    MyMaps also maxes out at 200 push pins on the map, but instead of providing a warning it paginates the push pins that follow. So when you click on the first page you get a map with the first 200 push pins, and when you click on the second page you get the next 200 push pins on a new map in the same browser tab. Oddly, the numbering stops at 820 push pins and starts over at number one, but you can keep adding push pins to the map.
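The paging behaviour we observed is easy to picture in code. This is only a sketch of what we saw from the outside, not Google's implementation; `paginate` is a hypothetical helper and the page size of 200 comes from our informal test.

```python
def paginate(pins, page_size=200):
    """Split a list of push pins into consecutive pages of at most page_size."""
    return [pins[i:i + page_size] for i in range(0, len(pins), page_size)]


# 450 dummy pins end up on three map pages: 200, 200 and 50.
pages = paginate(list(range(450)))
page_sizes = [len(page) for page in pages]
```

Collections takes the other route and refuses the 201st item outright, which is simpler for the application but pushes the splitting work onto the user.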

    What’s Next?

    That pretty much wraps it up for a comparison of the big three, how they evolved in a competitive environment, and a very ad hoc test of their limits.

    I believe the most interesting part will be where they evolve to next. What is the next set of functionality that will distinguish one from the other? Can Microsoft or Yahoo! introduce the next killer functionality that will let them catch up to the 7 million maps that have been created with MyMaps?


    The winner-take-all Republican primary awarded McCain 57 delegates. This was his second win over Romney in nearly as many weeks. A map of each candidate’s share of the vote at the county level (see below) shows an interesting pattern; brighter hues indicate a higher share of votes for a candidate relative to all other candidates. McCain won more votes than any other candidate in 47 counties (shown in red), Romney had a plurality of votes in 17 counties (green), and Huckabee, who finished fourth overall, managed to claim Holmes County (blue) in the Florida panhandle. Giuliani, on the other hand, who invested so heavily in a one-state strategy and ignored all the early primaries and caucuses, came in a distant third without a single county to show for his concentrated efforts and resources in Florida. Late today, Giuliani announced that he was ending his presidential bid.

    Share of total votes by County by Candidate

    Color key: McCain counties = Red, Romney counties = Green and Huckabee counties = Blue

    County-level vote counts for McCain and Giuliani show a very strong correlation coefficient of 0.999048, indicating that both were going after a similar constituency within the Republican party and that McCain’s gain was Giuliani’s loss. Below are vote counts for the top five counties for each of the four candidates.
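For readers who want to reproduce this kind of figure, the Pearson correlation coefficient can be computed in a few lines of Python. The sketch below is not the analysis behind the 0.999048 value, which used every county. As sample input it takes only the approximate counts quoted in this post for the five counties that top both candidates' lists, so its result will differ. The `pearson` function name is our own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized since the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Approximate counts quoted in this post for Miami-Dade, Pinellas, Broward,
# Palm Beach and Hillsborough (five counties only, not the full dataset).
mccain = [75500, 44000, 40660, 38480, 37800]
giuliani = [40250, 19280, 18660, 15975, 15850]
r = pearson(mccain, giuliani)
```

Running the same function over the full county table downloaded from GeoCommons is the way to check the published coefficient.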

    McCain’s top five counties by vote counts are: Miami-Dade (~75,500), Pinellas (~44,000), Broward (~40,660), Palm Beach (~38,480), Hillsborough (~37,800).
    Giuliani’s top five counties by vote counts are: Miami-Dade (~40,250), Pinellas (~19,280), Broward (~18,660), Palm Beach (15,975), Hillsborough (~15,850).
    Romney’s top five counties by vote counts are: Duval (~36,650), Pinellas (~34,970), Lee (~34,140), Hillsborough (~30,450), Palm Beach (~29,230).
    Huckabee’s top five counties by vote counts are: Orange (~15,710), Duval (~13,830), Hillsborough (~12,840), Pinellas (~12,350), Brevard (~11,580).
    Overall, out of 1.9 million votes cast in the closed primary, McCain won 36% of the vote (~689,000) to Romney’s 31% (~593,000), followed by Giuliani (~279,880) and Huckabee (~258,300).

    For detailed datasets go to GeoCommons, search with the keywords “Florida Republican Primary”, and explore interactive thematic maps and heatmaps of vote counts for each of the top four Republican candidates.


    The economy was the number 1 issue for Michigan Republicans and they voted in large numbers for Mr. Romney. Michigan has one of the highest rates of unemployment in the U.S. Huge job losses in the manufacturing sector, mainly due to the downturn in Michigan’s auto industry, have voters worried about the future. Romney’s “optimistic” message that he would fight to bring those jobs back to Michigan resonated with voters, as opposed to McCain’s “straight talk” message that the lost jobs are never coming back!

    Romney vs. McCain: Michigan primary vote

    Note: Brighter hues = Higher vote count, Darker hues = Lower vote count

    This is Romney’s first win (not including his win in the Wyoming caucuses) and probably saved him from dropping out of the primaries after finishing 2nd in both Iowa and New Hampshire. McCain and Huckabee finished 2nd and 3rd respectively. With Romney’s win in Michigan, the GOP has no clear cut front runner. Romney got nearly 38.9% of the vote (~337,700) to McCain’s 29.7% (~257,400), followed by Huckabee at 16% (~139,600).

    The DNC (Democratic National Committee) decided to punish Michigan for violating primary rules by moving forward its primary date, and stripped Michigan of all its delegates to the national convention. As a result, both Obama and Edwards withdrew their names, while Clinton’s name remained on the ballot and she won the primary. According to some, Clinton would have won the primary anyway.

    Clinton vs. Mr. Uncommitted: Michigan primary vote


    Note: Brighter hues = Higher vote count, Darker hues = Lower vote count

    Interestingly, many who wanted to support either Obama or Edwards voted “Uncommitted”. Clinton won more than 55% of the vote (~327,300) compared to 40% for Mr. “Uncommitted” (~236,900). According to the exit polls, much of the “Uncommitted” vote came from African Americans and young voters. This does not bode well for the Clinton camp as the Democratic primaries move south, where African Americans are a major constituency. More on this in future posts.

    In the meantime you may want to explore all of the maps shown above at GeoCommons. Search with the keywords “Michigan” or “Primary” to discover dynamic, interactive maps with zoom-in and pan.


    Clinton’s stunning come-from-behind victory in the New Hampshire primary makes her the New Comeback Kid. Below is a heatmap of the spatial distribution of Clinton’s votes by cities, towns and places in southern New Hampshire.

    We at FortiusOne further analyzed voting patterns to find the spatial distribution of where Obama, the New Kid on the block, won more votes than Clinton in the New Hampshire primary.

    Explore the interactive heatmaps, along with tons of data on presidential politics, on GeoCommons.
