We’ve been doing a lot of data migration and new data uploads with Finder!, and our data team often runs into data and mapping headaches. One we commonly encounter is largish shapefiles that turn into really bloated KML when we convert them (for instance, a 2 MB shapefile of US counties becomes a 5.4 MB KML file). The result is big files that completely kill browser-based applications like Virtual Earth and Google Maps, or load really slowly in thick-client applications like Google Earth and ESRI AGX.

There are three factors that contribute to file bloat in any vector-based geospatial data:

1) The number of attributes (how many columns)
2) The number of features (how many rows)
3) The complexity of the geometry (how much needs to be drawn)

You can do some clever things to manage the first two at a low level, although you are still going to have bloat when you convert to a standard file format. The third factor, geometry complexity, is interesting because the low-level tricks that reduce it produce savings that carry over into standard file formats. Reducing the complexity of geometry is often called “map generalization” in academic circles.
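To see why geometry complexity dominates, it helps to compare how a single vertex is stored in each format. The sketch below is a back-of-the-envelope model, not a real converter: it assumes a .shp file stores each vertex as two 8-byte doubles (true of the shapefile format, though we ignore record headers, bounding boxes and the .dbf attribute table), and that KML writes each vertex as a "lon,lat,alt" text tuple (we ignore all surrounding XML tags). The circular test polygon and its center and radius are hypothetical.

```python
import math

Point = tuple[float, float]


def shp_coordinate_bytes(n_vertices: int) -> int:
    # A .shp record stores each vertex as two 8-byte doubles (X and Y),
    # so the coordinate payload is a fixed 16 bytes per vertex.
    return 16 * n_vertices


def kml_coordinates(ring: list[Point]) -> str:
    # KML writes each vertex as a whitespace-separated "lon,lat,alt" text tuple.
    # Python's default float formatting uses the shortest round-trip form,
    # which is often 15 to 17 significant digits for computed coordinates.
    return " ".join(f"{lon},{lat},0" for lon, lat in ring)


def kml_coordinate_bytes(ring: list[Point]) -> int:
    return len(kml_coordinates(ring).encode("utf-8"))


def sample_ring(n: int, center: Point = (-77.0, 38.9), radius: float = 0.5) -> list[Point]:
    # Hypothetical test polygon: a closed circle of n vertices (first == last).
    cx, cy = center
    pts = [
        (cx + radius * math.cos(2 * math.pi * i / n), cy + radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]
    return pts + [pts[0]]


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        ring = sample_ring(n)
        shp = shp_coordinate_bytes(len(ring))
        kml = kml_coordinate_bytes(ring)
        print(f"{len(ring):>6} vertices: shp {shp:>8} B, KML text {kml:>8} B ({kml / shp:.1f}x)")
```

Both sizes grow linearly with the vertex count, which is why cutting vertices (generalization) shrinks the file whichever format you export to, while trimming attributes or features only helps with the other two factors.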

The general concept is that you remove details from the map without losing its message and context. All maps involve some form of generalization; otherwise they would be a perfect reflection of reality. Academics have developed algorithms that derive map generalizations heuristically. This is probably best explained with a few examples. Below is a map of Europe in full detail:

[Image: Europe in full detail, shown in MapShaper]

Next is a generalization that removes some of the detail but still keeps the context of Europe and the country boundaries:

[Image: Europe with moderate generalization, shown in MapShaper]

Last, a more extreme example with even more detail removed:

[Image: Europe with heavy generalization, shown in MapShaper]

Pulling off these nifty computational tricks used to require some fairly sophisticated desktop software, but Matt Bloch and Mark Harrower at the University of Wisconsin figured out a clever way to enable real-time WYSIWYG map generalization. The resulting application is called MapShaper. You can upload a shapefile, run different generalization routines (with a high level of control if you choose), and then export the result as a shapefile or an EPS file. The shapefile export is down at the moment, but hopefully it will be back in action soon.
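For a feel of what a generalization routine actually does, here is a minimal sketch of one classic line-simplification algorithm, Ramer-Douglas-Peucker. This is not a claim about MapShaper's internals; it is simply a well-known example of the technique. It keeps a line's endpoints, finds the vertex farthest from the straight chord between them, and either drops everything in between (if that vertex is within `tolerance`) or keeps it and recurses on both halves. The tolerance is in the same units as the coordinates (degrees, for unprojected lon/lat data), and the jagged "coastline" in the demo is made up.

```python
import math

Point = tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate chord (e.g. a closed ring where first == last): use point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points: list[Point], tolerance: float) -> list[Point]:
    """Simplify a polyline, keeping only vertices that deviate more than `tolerance`."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    # Find the interior vertex farthest from the straight chord a-b.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # Every interior vertex is within tolerance: the chord alone is close enough.
        return [a, b]
    # Otherwise keep the farthest vertex and simplify each half recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the split vertex duplicated in both halves


if __name__ == "__main__":
    # Hypothetical jagged "coastline" zigzagging around the line y = x / 10.
    coast = [(x, x / 10 + (0.05 if x % 2 else -0.05)) for x in range(21)]
    for tol in (0.01, 0.1, 1.0):
        kept = douglas_peucker(coast, tol)
        print(f"tolerance {tol}: kept {len(kept)} of {len(coast)} vertices")
```

Raising the tolerance trades detail for size, which is essentially the slider you move between the "detail", "medium" and "sparse" maps of Europe.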

I think these kinds of technologies and mathematics are going to be increasingly important as we need to make ever larger datasets available, especially as the receiving devices become increasingly mobile, with even more limited data-handling capabilities.


Links List 5.23.08

May 23rd, 2008 by Sean Gorman

Computing, GIS and Archeology in the UK shares a tip for importing UK Mastermap data into Postgres or shapefiles. The Mastermap importer is a free, easy-to-use tool that loads Mastermap data into OGR formats such as shapefiles, or into a PostgreSQL database.

Glenn at the AnyGeo Blog poses an interesting question: who owns data? A recent article in Forbes prompted the question, and we’re really interested to know. We think that the more open data is shared, the better off everyone will be. Data ownership is all about creating a bigger network effect for data.

Last week we talked about Google’s Map API for Flash. But does it really work? FlexRIA tries to work it out.

Free Geography Tools maps out PolicyMap, which is designed to easily display data on a map. With data and information from real estate to crime to health to schools, the possibilities are endless!


James Fee got another great discussion going on the ESRI / Google partnership by asking the fundamental question: how do you monetize your information “in such a world”? Specifically, a world where Google is giving away data for free.

The Google answer to this at Where 2.0 and WhereCamp was to post data that leads new users to your “for fee” content and services. This general concept was well debated and summed up fairly well in this comment. While there are many short-term issues and decisions for folks to work through regarding Google’s geo-index, I believe that in the long run it will be beneficial. This may sound odd since we just launched Finder!, which provides searching, sharing and organization of geodata.

My philosophy is based on the power of a network effect for data. Just like “network effects” with technologies like cell phones and faxes, each additional data set that is made available to combine with other data sets increases the overall value of the network of data. This is especially true as you begin to make semantic associations between a variety of disparate data sets. The same concept is what landed Metaweb and their Freebase application $42 million in funding. It is a powerful concept, and Google, of course, is looking to be the center of the network.
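One simple way to put a number on that claim, borrowed from the reasoning behind Metcalfe’s law for phone and fax networks, is to count the possible pairwise combinations among n data sets. This assumes every pair could in principle be mashed up, and it counts opportunities rather than measuring value:

```latex
C(n) = \binom{n}{2} = \frac{n(n-1)}{2}

C(n+1) - C(n) = \frac{(n+1)n}{2} - \frac{n(n-1)}{2} = \frac{n\big((n+1) - (n-1)\big)}{2} = n
```

So the marginal data set opens up as many new pairings as there are data sets already in the network: the 101st data set adds 100 possible combinations, while the 11th adds only 10.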

The upside is you do not have to be Google to benefit from the network effect. All you need is the ability to interconnect your data with the new content being produced. This does not mean giving away all your data to Google. In fact, you could do the opposite and just pull in data from the Google geo index API (although I’m not sure whether the TOS restricts usage to publicly viewable websites).

The trick I believe will be to leverage the interconnectivity of Google to increase the value of your own data and services.

  • Leverage the ability to remix content opened by Google with your own data
  • Use the index to drive users to your premium data and value added services
  • Create semantics and rich metadata that differentiates and creates value around your data
I think what is missing in this new equation is the ability to cleanly segment public and private data while still retaining the ability to seamlessly mash them up. One of the things we did with Finder! was to allow data to be flagged public or private. This allows users to take advantage of the network effect without having to share confidential data. That way, the option to expose data to web services is not binary (everyone gets it or no one gets it).

This also opens up the possibility of data marketplaces. Paul Bisset has some great comments in the Fee thread on how they have done this with WeoGeo. I believe we’ll see several creative new ways to deliver “for fee” content to the GeoWeb (hopefully interconnected and federated), especially as tools and applications develop that can leverage the data. The beauty is that we are opening a whole new market for data and services that did not exist before.

As GIS data and services enter the IT mainstream, there will be a new market of customers to win. I think at the end of the day that will be a win for everyone, just as a larger network of interconnected data (both free and for fee) will be.
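Finder!’s idea of flagging each data set public or private can be illustrated with a small sketch. The `Dataset` record, its field names, and the `visible_to` / `mashup` helpers are hypothetical, chosen only to show the concept, and are not Finder!’s actual schema or API. The point is that the flag lives on each data set rather than on the whole catalog, so exposure is not all-or-nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    # Hypothetical record; field names are illustrative, not Finder!'s schema.
    name: str
    owner: str
    public: bool


def visible_to(datasets: list[Dataset], user: str) -> list[Dataset]:
    # Public data sets are visible to everyone; private ones only to their owner.
    return [d for d in datasets if d.public or d.owner == user]


def mashup(datasets: list[Dataset], user: str) -> list[str]:
    # A user's mashup combines every layer they are allowed to see, so their
    # private data joins the shared public layers without leaking to others.
    return sorted(d.name for d in visible_to(datasets, user))


if __name__ == "__main__":
    catalog = [
        Dataset("us_counties", "census", public=True),
        Dataset("store_sites", "acme", public=False),
        Dataset("parcels", "county_gis", public=True),
    ]
    print(mashup(catalog, "acme"))
    print(mashup(catalog, "someone_else"))
```

Here the owner of the private "store_sites" layer gets it mashed up with the public layers, while everyone else sees only the public ones, which is the network effect without the confidentiality cost.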


Links List 5.16.08

May 16th, 2008 by Sean Gorman

Mashable reports that Google Maps has a new API that uses Flash graphics for tile layers, markers and info windows. This means you can create more dynamic map mashups.

Not only does Google Maps have Flash graphics, but it has also added video sharing, Wikipedia entries, real estate listings, and geocoded photos.

Google Earth and David Rumsey have formed a partnership under which historical map collections are available through a Google Earth layer. More data means more mashups!

It’s interesting that Where 2.0 had a Twitter account that wasn’t followed nearly as much as it should have been.

All Points Blog also provides a “plain-English” explanation of the Google / ESRI announcement.
