Using MapShaper to Create Smaller Shapefiles and KML through Finder!
May 24th, 2008 by Sean Gorman
We’ve been doing a lot of data migration and new data uploads with Finder! and our data team often runs into data and mapping headaches. One that we commonly encounter is largish shapefiles that make for really bloated KML when we convert them (for instance a 2 MB shapefile for US counties becomes a 5.4 MB KML file). The end result is big files that completely kill browser based applications like Virtual Earth and Google Maps, or load really slowly in thick client applications like Google Earth and ESRI AGX.
There are three factors that constitute file bloat for any vector based geospatial data:
1) The number of attributes (how many columns)
2) The number of features (how many rows)
3) The complexity of the geometry (how much needs to be drawn)
You can do some clever things to manage the first two at a low level - although you still are going to have bloat when you convert to a standard file format. The third factor, geometry complexity, is interesting because you can also do some low level tricks whose savings can be passed along to standard file formats. Reducing the complexity of geometry is often called “map generalization” in academic circles.
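As a rough illustration of why the same geometry weighs more in KML than in a shapefile, here is a minimal sketch comparing the per-vertex cost of the two encodings. Shapefiles store each X and Y as an 8-byte double, while KML writes coordinates as text. The function names and the six-decimal precision are assumptions for illustration; real converters often write more digits, which makes the gap larger.

```python
import struct


def shapefile_vertex_bytes(lon, lat):
    # Shapefile geometry stores X and Y as little-endian 8-byte doubles.
    return len(struct.pack("<2d", lon, lat))


def kml_vertex_bytes(lon, lat, precision=6):
    # KML writes each vertex as a "lon,lat,alt" text tuple followed by whitespace.
    # The precision used here is an assumption, not a property of any converter.
    text = f"{lon:.{precision}f},{lat:.{precision}f},0 "
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    lon, lat = -77.036873, 38.907192
    print("binary bytes per vertex:", shapefile_vertex_bytes(lon, lat))
    print("text bytes per vertex:", kml_vertex_bytes(lon, lat))
```

Either way the cost scales with the number of vertices, which is why trimming geometry helps in both formats.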
The general concept is that you remove details from the map without losing the message and context of the map. All maps have some form of generalization; otherwise they would be a perfect reflection of reality. Academics have used algorithms to heuristically derive a map generalization. This is probably best explained with a few examples. Below is a map of Europe in full detail:
Next is a map generalization that removes some of the detail but still keeps the context of Europe and the country boundaries:
Last, a more extreme example with even greater detail removed:
Pulling off these nifty computational tricks used to require some fairly sophisticated desktop software, but Matt Bloch and Mark Harrower at the University of Wisconsin figured out a clever way to enable real-time WYSIWYG map generalization. The resulting application is called MapShaper. You can upload a shapefile and run different generalization routines (with a high level of control if you choose) then export the result back out as a shapefile or an EPS file. The shapefile export is down at the moment, but hopefully will be back in action soon.
I think these kinds of technologies and mathematics are going to be increasingly important as we need to make ever larger datasets available, especially when the receiving devices are increasingly mobile with even smaller data handling capabilities.
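To give a feel for what a generalization routine does, here is a minimal sketch of one classic method, Douglas-Peucker line simplification. It is an illustration only and not MapShaper's code. The function names are hypothetical, and it assumes planar coordinates (distances on raw longitude/latitude are only approximate).

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # start and end coincide, so fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Drop vertices that deviate from the overall shape by at most tolerance."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior vertex farthest from the start-end line.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # Every interior vertex is close enough: keep only the endpoints.
        return [start, end]
    # Otherwise keep the farthest vertex and simplify each half recursively.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


if __name__ == "__main__":
    coastline = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
    print(simplify(coastline, tolerance=0.5))
```

A larger tolerance removes more vertices, which corresponds to the more extreme generalization shown in the examples above.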
Map Creation Apps - Google vs. Microsoft vs. Yahoo
April 9th, 2008 by Sean Gorman
I promised Andrew a comparison of the big three map creation applications by feature and functionality, so here it goes. The story of how lightweight web based map creation applications came to be is interesting in and of itself. I think looking at how the three applications evolved historically will provide a bit of insight.
Before the GeoWeb came into mainstream popularity both Microsoft and Yahoo! had mapping applications. Microsoft offered their browser based Terraserver which hooked up USGS imagery for the map tiles. Microsoft launched Terraserver in June of 1998 - practically prehistoric.
Microsoft had also been active in the mapping space with products like MapPoint (both desktop application and web services). Yahoo! also was an early adopter of mapping applications in conjunction with their local search destination (although I completely failed at finding a date for when they first added maps). Despite the early adoption of web based mapping applications by Yahoo! and Microsoft it was arguably the launch of Google Maps in 2005 that jump started both the GeoWeb and the mash up craze.
Shortly after Google Maps launched, Paul Rademacher hacked the application to allow it to display Craigslist rental listings on the Google slippy map. Shortly thereafter Adrian Holovaty followed suit, mashing up Chicago crime statistics with Google Maps. Google quickly released an API to allow developers to do the same thing seamlessly and we were off to the races. Microsoft quickly created Virtual Earth and Yahoo! pushed out Yahoo! Maps. Microsoft created compelling innovations with bird’s eye imagery and Yahoo! launched several popular GeoWeb services like free geocoding and Flash based mapping APIs.
Microsoft Collections
Through all these innovations there was a constant one way flow of content creation - developers could create unique maps and users could view them. Microsoft changed this when they launched Collections on May 23, 2006:
Collections. Social networking functionality allows customers to create lists of favorite landmarks and locations, attach personal photos and save them to a Scratchpad. Collections can be saved, recalled later, “permalinked,” and shared with friends and community in e-mail or through their MSN® Spaces blog.
While not well publicized, the “Collections” concept fundamentally changed the workflow for creating maps. No longer did you need to be a developer or GIS pro to create a basic map and share it with other people. The Virtual Earth folks even gave users a decent amount of cartographic power and options:
Customized pushpins. A pushpin is essentially a marker indicating points of interest on a map view. A customized pushpin can easily be added with a simple right click, anywhere on a map, which will display a small red dot and a pop-up menu. A pushpin title or note of up to 200 characters can be added that will appear with the pushpin whenever a mouse hovers over it. Pushpins can easily be edited or deleted. When a pushpin is removed, whether customized or standard, the remaining pushpins will be automatically renumbered.
2-D drawings in Collections. Users can add lines and drawings in a variety of colors, shapes and styles to personalize their Collection. They also can draw lines and shade areas that they want to mark on the map, such as for marking a running or bike trail, or neighborhood boundaries.
MyMaps
Despite the potential of the innovation the new functionality did not get much coverage in the press or massive levels of adoption. The TechCrunch article on it was lumped in with other new features from Yahoo! Maps.
Just short of a year later Google launched Google MyMaps on April 4th 2007 to big headlines across the blogs, including MyMaps being the death knell of popular map mashups like Platial, Frappr and Tagzania.
Fundamentally the functionality and features of MyMaps were not remarkably different from those of Collections, but the buzz around it was at least tenfold. So why was the attention so skewed towards Google for fundamentally the same innovation Microsoft had launched a year earlier? A few guesses:
MapMixer
Yahoo! was not too far behind, launching their own map creation application, Yahoo! Mapmixer, on September 13th 2007. Mapmixer took a different angle on map creation by allowing users to put static maps on top of the Yahoo! Maps application. For instance, after the Cosco Busan oil spill in the San Francisco Bay last year I made a lot of calls trying to get the raw data on the location of the spills for GeoCommons, but had no luck.
I did find a PDF with a map of the oil spills, so I saved it as a PNG then uploaded it to Yahoo! Mapmixer and they took me through three easy steps to georeference the map on Yahoo! Maps. The user experience I thought was the best of the three, and there were lots of great social features for me to give a short description of the map and for other users to comment on the map. Much like Microsoft’s Collections, though, the application did not generate anything like the buzz of Google MyMaps, and the gallery only features 38 user submitted maps today. Interestingly, in concept, it is quite similar to Microsoft’s MapCruncher, although MapCruncher is a download and supports a wider variety of raster based formats that must already be georeferenced.
Since the launch of map creation applications by the three big players there have been two noticeable waves of enhancement: 1) support for external data and 2) collaboration features. Microsoft put themselves out as being the first to support loading KML: “The October 07 release of Live Maps was the first to support KML viewing and import to Collections”. On November 27th 2007 Google added KML, KMZ and GeoRSS support to MyMaps. Google followed this up with social features, like commenting, rating and open collaboration invitations for MyMaps.
Performance Trials
That covers features and functionality from a historic evolution standpoint, but how do they perform? We did a very informal, one user stress test: create push pins as quickly as possible and see when the map application maxes out or gets sluggish. For Yahoo! Mapmixer this was pretty easy. You can overlay one picture or map onto the application, so you max out at one.
In the process of loading and georeferencing the image you get speedy performance and predictable response times. For MyMaps and Collections we had a bit more to stress. We’ll start with Collections, where we created 200 push pins with good response time then got the following message: “You cannot add more than 200 items to a collection. To add more items, create another collection.”
When we ran the same test on MyMaps, we did high rate push pin creation and after about 30 the system got a bit sluggish, and sometimes it would create a listing for a push pin in the left hand pane but not create the push pin on the map. The caveat here is we were doing this at high speed, and when we slowed down to a more deliberate pace the system handled it fine.
MyMaps also maxes out at 200 push pins on the map, but instead of providing a warning it generates a pagination for a continuing set of push pins. So when you click on the first page you get a map with the first 200 push pins and when you click on the second page you get the next 200 push pins on a new map in the same browser and tab. Oddly the numbering stops at 820 push pins and starts back over at number one, but you can keep adding push pins to the map.
What’s Next?
That pretty much wraps it up for a comparison of the big three, how they evolved in a competitive environment, and a very ad hoc test of their limits.
I believe the most interesting part will be where they evolve to next. What is the next set of functionality that will distinguish one from the other? Can Microsoft or Yahoo! introduce the next killer functionality that will catch up to the 7 million maps that have been created with MyMaps?
MIT’s GeoWeb Repository of Data
March 16th, 2008 by Sean Gorman
We came across a small blurb in the MIT news today about the release of “MIT GeoWeb”:
“… a new interface to the MIT Geodata Repository, enables users to access Geographic Information Systems (GIS) data, once accessible only in ArcGIS, through a standard web browser.”

The MIT GeoWeb provides a Google Maps interface to their extensive repository of geodata in shapefile format. In short you can search the MIT repository of data by geographic region or keyword, or browse, then visualize the file that you find on Google Maps in the same browser. If you like what you find you can check out the metadata and/or download the shapefile. While the user interface is not the prettiest thing I’ve ever seen, it looks to be effective and has a nice array of data you can browse. The quick visualization of lines, points and polygons is also a very nice feature.
On the downside you can’t click on the data rendered in Google Maps to see the information behind it. You also can’t download the data in a file format other than shapefile, so accessing the data is still restricted to GIS applications. The biggest kicker, though, is that to access the application at all you have to be an MIT student or employee. That puts a bit of a damper on the whole thing, but it is still a clever implementation further pushing the frontier of open data access.
There is a nice screencast of the application here.
ETech Day Three - Elephants, Fire Eagles and Disaster Tech
March 5th, 2008 by Sean Gorman
I got a bit wrapped up trying to get a side project finished up yesterday, so I’ll just skip to day three of ETech. The morning opening speakers were better than Day Two’s, although the sessions thus far have been a bit below Day Two’s. We kicked off the morning with an abbreviated talk by John McCarthy (father of LISP) on a new language he’s been working on for several years called Elephant. The Elephant name comes from the fact that it never forgets, and the broad concept is a semantic programming language that can create structured relationships from natural language. Unfortunately he ran out of time before he really got into the guts of it, but there were some fascinating concepts with how natural language can be leveraged in a structured way to do computation. Definitely something worth looking into more, and it reminded me a lot of our thoughts about a context driven architecture and natural language for data, although we were looking to turn quantitative data into natural language versus turning natural language into data.
Following McCarthy’s talk there were some interesting bits on open source personal robots, then an informal launch of Yahoo’s Fire Eagle. Fire Eagle has taken some flak in the blogs for having minimal or “zero” functionality. I think this misses the point of what Fire Eagle is intended to do. My impression was that Fire Eagle is not meant to be a stand alone consumer application but a straightforward tool that does a simple thing very well, that simple thing being a platform for sharing your location online. The functionality folks are clamoring for is left to the users and developers, and I think there are a good number of fun possibilities here. For instance with GeoCommons we have a big pile o’ data and it would be very useful to personalize that data delivery to a user’s location, or to let users comment on that data from their location and have that comment geo-located. This creates a dependency on clever users, but from what I’ve heard floating around ETech there seem to be a good number of clever ideas out there.
The last session of the day I attended was Mikel and Jesse’s presentation on “Disaster Tech”. I’d seen Mikel’s presentation at the State of the Map conference on open source disaster technology, and it was cool to see how the project has evolved. The whole topic is something close to us, especially after getting up close doing disaster response after the London Bombings and Hurricane Katrina. The presentation had some great examples of OpenStreetMap, Twitter and Google Maps being used in creative ways during disasters. Mikel gave a nice example of using the USGS GeoRSS earthquake feed, the EU lightweight tsunami propagation model and a feed to republish the resulting polygons as GeoRSS. With this approach they can churn out a polygon warning area in under a minute. A similar concept is seen at the United Nations - GDAC application.
All great stuff for ad hoc implementation that is cost effective and not over engineered. There was lots of good discussion of how to take the information produced by technology and effectively transmit it to non-technical or completely unconnected people. Jesse and Mikel also had a nice bit at the end of the presentation on anti-patterns - i.e. what happens when you don’t have a champion for the technology to create repeatable and successful implementations. Specifically, in the case of the search for Steve Fossett, the crowd sourced help to find him actually slowed down the search and rescue teams, who had to deal with all the input. That resulted in the emergence of champions like InternetSAR, which creates a structure that could be replicated and effective for search and rescue. Lots of good thought on an important topic.
Bumping up Against the Limits of Google MyMaps
February 26th, 2008 by Sean Gorman
Yesterday we posted a blog about the international fiber cuts a few weeks ago. While I am interested in the geography of fiber and failures in general, we thought it would be a good opportunity to put Google MyMaps through its paces for creating a substantive data driven map. After 25 or so hours of collective labor I thought it would be useful to give the postmortem on our experience.
While there are many positive qualities to Google MyMaps, the biggest complaint is that we spent 40 hours mucking about with it. The goal for the last blog post was to create a map that had 1) the fiber routes and landings for impacted carriers, 2) the location of the fiber failures, and 3) the countries that lost connectivity because of the failure. Seemed like a straightforward set of goals and I naively thought we could bang it out in a few hours. So, what ate up our time? Could we just be cartographically challenged?
1) Creating country boundaries - tracing all the countries with outlines so we could make polygons for the failed states was a big sink hole of time. The worst part was that, before we were quite finished, we hit the limit for the number of points a MyMap could support. Thus it was unfinished and did not make it to the blog post. If you are curious at what point MyMaps bonked here is the map:
I’m trying to convince someone to count all the points so we have a numeric threshold but I think I need to offer more beer to get the bribe to work. The limit I’ve seen for the number of points a MyMap can support is 150, but it looks as if we exceeded that for drawing polygons.
2) Dealing with multiple layers - since there were three distinct layers to our MyMap we thought it would be useful to separate them out so the map would be easier to understand. The issue is that you can’t embed a Google MyMap with multiple distinct layers; they have to be created as one continuous set. This was almost a deal breaker since we had broken up the work between three people (Bill, Emily and myself). Fortunately we found a workaround where we saved each of our maps/layers as KML then imported all three onto a new map (except Emily’s countries since it was the limit busting bonking layer).
3) Little bit of cartographic love - while push pins and drawing tools are great for posting pictures of my summer vacation, some basic cartographic tools would have made life far easier. Dealing with the lack of a legend is challenging for conveying the story the map is telling. In MyMaps you get a list of every point on the map running down the right pane and with the embed you get nothing.
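The layer workaround above (save each layer as KML, then bring them together on one map) can also be approximated offline. Below is a minimal sketch that merges the placemarks from several KML documents into a single one using Python's standard library. The function name merge_kml is hypothetical, and the sketch assumes every input uses the OGC KML 2.2 namespace; shared styles and folder structure are not carried over.

```python
import xml.etree.ElementTree as ET

# Assumption: all inputs declare the OGC KML 2.2 namespace.
KML_NS = "http://www.opengis.net/kml/2.2"


def merge_kml(kml_strings, name="Merged map"):
    """Combine the Placemarks of several KML documents into one Document."""
    ET.register_namespace("", KML_NS)
    root = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(root, f"{{{KML_NS}}}Document")
    ET.SubElement(document, f"{{{KML_NS}}}name").text = name
    for text in kml_strings:
        layer = ET.fromstring(text)
        # Collect placemarks at any depth; folders are flattened.
        for placemark in layer.iter(f"{{{KML_NS}}}Placemark"):
            document.append(placemark)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    template = (
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        "<Placemark><name>{}</name>"
        "<Point><coordinates>{},0</coordinates></Point>"
        "</Placemark></Document></kml>"
    )
    routes = template.format("Cable landing", "29.9,31.2")
    failures = template.format("Fiber cut", "55.3,25.3")
    print(merge_kml([routes, failures], name="Fiber cuts"))
```

Merging this way does nothing about the point limit described in item 1, so a very large layer would still need its geometry trimmed before import.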
The conclusion at the end of it is MyMaps is a phenomenal drawing tool for maps - simple and intuitive. On the other hand if you want to create a data intensive map be prepared to run up against some technological limits, but more importantly be prepared to invest a good chunk of time. A large number of these limitations (need for enhancement) have been suggested in the MyMaps Google Group and it will be interesting to see if any are picked up in future releases.
* When I searched for other blog posts that have talked about the pros and cons of MyMaps I came up with zilch - making cross linking pretty tough. Interestingly the only comparison I found was for mapping services, but no one has compared the newer map creation tools. Maybe a topic for next time.