Links List 6.20.08
June 20th, 2008 by Sean Gorman
Data Transfer Solutions (DTS) developed an application for the Texas Forest Service called the Texas Wildfire Data Browser. The application lets users view wildfire threats, fuel hazards and fire locations.
In light of the Iowa flooding, MSNBC posted an interactive map that lets users track flooding locations in the Midwest. Most of the points on the map show historical river levels, and others link to related news stories.
Google Earth’s text gets a makeover with a new option for displaying KML text on the map. Designed by Sergey Devytakov, the new tool, called Labels, lets you specify fonts, shadows, outlines, icons and more.
Maps and text combine at Kvisu.com. This unique search engine takes text-based results and aligns them with a surface map using visualized keywords.
Zimbabwe gets on the map. Google Maps has been used to track the political campaign of Morgan Tsvangirai and the unfortunate terror occurring in the country.
Google Earth API for the Web Browser
May 28th, 2008 by Sean Gorman
Frank at the Google Earth Blog just leaked that Google will be announcing an API for Google Earth that runs in a browser. In short, you will be able to run GE’s 3D rendering capabilities and KML support in a browser. The first release will be Windows only, but will support IE, Firefox and other Mozilla flavors.
This looks to be a direct shot at Microsoft’s 3D Virtual Earth, which also runs in the browser. The open question in my mind is whether the Google Earth version will have the same performance issues as MSVE. It is also interesting that Google released an API instead of a new version of GE that runs in a browser. Will this be a case of Google testing the waters with the API and then releasing a product?
From a personal perspective, I’ll be very interested to see how the new Google Earth API handles KML. Frank says the new API will be a “subset of the Google Earth 3D graphics rendering engine and interfaces with KML support”. The question is whether that KML support will be robust like Google Earth, which can draw thousands of geometries, or less robust like Google Maps, where you are limited to the low hundreds. I’m sure we’ll see soon enough, but congrats to Google on porting the technology to a browser, surely not an easy task. It does raise one last question: does this herald the end of thick client geobrowsers?
Using MapShaper to Create Smaller Shapefiles and KML through Finder!
May 24th, 2008 by Sean Gorman
We’ve been doing a lot of data migration and new data uploads with Finder!, and our data team often runs into data and mapping headaches. One we commonly encounter is largish shapefiles that turn into really bloated KML when we convert them (for instance, a 2 MB shapefile of US counties becomes a 5.4 MB KML file). The end result is big files that completely kill browser-based applications like Virtual Earth and Google Maps, or load really slowly in thick client applications like Google Earth and ESRI AGX.
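Part of the shapefile-to-KML growth comes from how coordinates are stored. A shapefile keeps each vertex as two little-endian 8-byte doubles, while KML writes every vertex as text inside a `<coordinates>` element. The sketch below compares the two for a single vertex; the 7 decimal places and the trailing altitude are assumptions (converters differ), and it ignores shapefile record headers, the .dbf attribute file, and the XML tags KML wraps around every feature and attribute, so it only illustrates the per-vertex part of the bloat.

```python
import struct

def shapefile_vertex_bytes(x: float, y: float) -> int:
    """Bytes for one vertex in a shapefile geometry: two little-endian 8-byte doubles."""
    return len(struct.pack("<2d", x, y))

def kml_vertex_bytes(lon: float, lat: float, alt: float = 0.0, decimals: int = 7) -> int:
    """Bytes for one vertex written as a 'lon,lat,alt ' tuple in a KML <coordinates> element.

    `decimals` and the altitude value are assumptions; real converters vary.
    """
    text = f"{lon:.{decimals}f},{lat:.{decimals}f},{alt:g} "
    return len(text.encode("ascii"))

print(shapefile_vertex_bytes(-77.0365298, 38.8976763))  # 16
print(kml_vertex_bytes(-77.0365298, 38.8976763))        # 25
```

In this example the text form is roughly 1.6 times larger per vertex before any markup is added, which is why trimming vertices pays off directly in KML size.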
Three factors contribute to file bloat in any vector-based geospatial data:
1) The number of attributes (how many columns)
2) The number of features (how many rows)
3) The complexity of the geometry (how much needs to be drawn)
You can do some clever things to manage the first two at a low level, although you are still going to have bloat when you convert to a standard file format. The third factor, geometry complexity, is interesting because you can also do some low-level tricks whose savings carry over to standard file formats. Reducing the complexity of geometry is often called “map generalization” in academic circles.
The general concept is that you remove detail from the map without losing its message and context. All maps involve some form of generalization; otherwise they would be perfect reflections of reality. Academics have developed algorithms that derive a map generalization heuristically. This is probably best explained with a few examples. Below is a map of Europe in full detail:
Next is a generalization that removes some of the detail but still keeps the context of Europe and the country boundaries:
Last, a more extreme example with even more detail removed:
Pulling off these nifty computational tricks used to require fairly sophisticated desktop software, but Matt Bloch and Mark Harrower at the University of Wisconsin figured out a clever way to enable real-time WYSIWYG map generalization. The resulting application is called MapShaper. You can upload a shapefile and run different generalization routines (with a high level of control if you choose), then export the result as a shapefile or an EPS file. The shapefile export is down at the moment, but hopefully it will be back in action soon.
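To give a feel for what a generalization routine does, here is a minimal Python sketch of Douglas-Peucker line simplification, a classic textbook algorithm. It is shown for illustration only and is not MapShaper’s code or necessarily its default method. It keeps a line’s endpoints, finds the interior vertex farthest from the straight chord between them, and recurses on both halves only if that vertex lies farther than `tolerance`; otherwise all interior vertices are dropped.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b (or to a, if a == b)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    if (x1, y1) == (x2, y2):
        return math.hypot(x - x1, y - y1)
    return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / math.hypot(x2 - x1, y2 - y1)

def douglas_peucker(points: Sequence[Point], tolerance: float) -> List[Point]:
    """Simplify a polyline, keeping only vertices that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord first -> last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every interior vertex is within tolerance: the chord alone is enough.
        return [first, last]
    # Otherwise keep the farthest vertex and simplify each side independently.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex

line = [(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]
print(douglas_peucker(line, tolerance=1.0))  # [(0, 0), (2, 5), (4, 0)]
```

The tolerance is in the same units as the coordinates (degrees, for longitude and latitude). Running this separately on each polygon can open gaps or overlaps along shared borders such as county lines, which is one of the things a dedicated generalization tool has to handle.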
I think these kinds of technologies and mathematics are going to be increasingly important as we need to make ever larger datasets available, especially as the receiving devices become increasingly mobile, with even smaller data handling capabilities.
Power Law Distributions of Google Indexed KML: Is the Long Tail the Wrong Tail for the GeoWeb?
April 29th, 2008 by Sean Gorman
We made the cross-country hop out to Santa Clara to attend Location Intelligence today. The weather is awesome and we just finished the morning workshops. I sat in on Lior Ron and David Minogue’s talk on “Searching the GeoWeb”.
The talk produced many interesting insights on Google’s approach to searching geodata, but one statistic really grabbed my attention: of the millions of KML files Google has indexed, roughly 95% have only a single feature. In other words, the vast majority of KML indexed by Google consists of single place marks like “this is my house” or “this is an airplane in flight”.
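As a rough way to check that kind of statistic on your own KML, the sketch below counts `Placemark` elements with Python’s standard `xml.etree.ElementTree` and reports what fraction of a set of documents holds exactly one. It matches on the element’s local name so it works regardless of which KML namespace (or none) a file declares. Treating “feature” as “Placemark” is an assumption: it ignores network links, overlays and the parts of a MultiGeometry, and Google may count features differently.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

def count_placemarks(kml_text: str) -> int:
    """Count Placemark elements, ignoring any XML namespace prefix."""
    root = ET.fromstring(kml_text)
    # A namespaced tag looks like '{namespace}Placemark'; keep only the local name.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Placemark")

def single_feature_share(kml_docs: Iterable[str]) -> float:
    """Fraction of documents that contain exactly one Placemark."""
    counts = [count_placemarks(doc) for doc in kml_docs]
    return sum(1 for c in counts if c == 1) / len(counts)

house = ('<kml xmlns="http://www.opengis.net/kml/2.2">'
         '<Placemark><name>My house</name></Placemark></kml>')
pair = '<kml><Document><Placemark/><Placemark/></Document></kml>'
print(count_placemarks(house), count_placemarks(pair))  # 1 2
print(single_feature_share([house, pair]))              # 0.5
```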
Many single place marks carry more useful data, too, and Lior did a great job presenting several workflows that pulled up very relevant place marks for things like finding a place to windsurf in the Bay Area or a place to hike in Austria.
What I found fascinating was Google’s focus on the long tail of data, which has been a popular meme in Web 2.0 in general. The long tail refers to the tail end of a statistical distribution that covers a large number of entities, each with a small number of observations.
You can also think of this as the 80/20 rule, where 20% of the people have 80% of the wealth and the other 80% of the people have only 20% of the wealth. In this situation the long tail of the distribution is the 80% of the people with 20% of the wealth: a large number of people, each with only a small number of observations (wealth).
This is also called the Pareto principle, and it often manifests itself as a power law distribution, which is commonly referenced in the study of complex systems (http://en.wikipedia.org/wiki/Complex_system) to describe self-organizing systems and networks.
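In its standard rank-order form (a textbook formulation, not something presented in the talk), a power law says the size s of the item at rank r falls off as a power of the rank, which turns into a straight line on log-log axes:

$$ s(r) = C\,r^{-\beta} \qquad\Longrightarrow\qquad \log s(r) = \log C - \beta \log r $$

Here C = s(1) is the size of the largest item and β > 0 controls how steeply sizes drop with rank. The R² of a straight-line fit in log-log space is a common, if informal, check of how well this form describes a dataset.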
Google’s indexing of KML on the GeoWeb is fundamentally a self-organizing system of user-generated content, and not surprisingly it looks to fit a power law distribution: specifically, one where 95% of the KML files have a single feature and the other 5% have very large numbers of features that account for a disproportionate share of the total features in the database. Without the raw data it is just a hunch on my part, but I would bet a bar tab on the R² of a power law fit being above 0.85 on a rank-order distribution of KML by file size or number of features.
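For anyone who does have per-file numbers, here is a minimal sketch of the check described above: sort the values largest-first, fit a straight line to log(value) versus log(rank) by ordinary least squares, and report the R² of that fit, along with the share of the total held by the top few percent of files. The function names are hypothetical, the example data is synthetic rather than Google’s, and more rigorous power-law testing uses maximum-likelihood estimation rather than least squares on log-log axes.

```python
import math
from typing import List, Tuple

def rank_order_power_law_fit(values: List[float]) -> Tuple[float, float, float]:
    """Fit s(r) = C * r**(-beta) to values ranked largest-first.

    Uses ordinary least squares on log(s) versus log(r) and returns
    (C, beta, r_squared). Needs at least two values, all positive.
    """
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(intercept), -slope, r_squared

def top_share(values: List[float], fraction: float) -> float:
    """Share of the total held by the largest `fraction` of items (at least one item)."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Synthetic data that follows an exact power law with beta = 1.5.
sizes = [1000 * r ** -1.5 for r in range(1, 51)]
C, beta, r2 = rank_order_power_law_fit(sizes)
print(round(beta, 6))                                 # 1.5
print(round(top_share([100] + [1] * 19, 0.05), 3))    # 0.84
```

Whether real KML clears the 0.85 threshold is exactly the open question; the code only shows how one would measure it.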
So, geeking out on statistics and complexity theory aside, why does this matter? It matters because I believe it ignores the power of the short tail. From a computational perspective the long tail is easy to deal with: the file sizes are small and rendering small numbers of place marks is easy. This keeps everything very manageable and scalable.
The downside is that it leaves out many of the most interesting datasets potentially available, because they are large and complex and sit on the short tail. Another popular Web 2.0 meme is that “data is the Intel inside”, positing that large, complex datasets are one of the key differentiators on the Web. So it would seem in this case that the focus on the “long tail” and positioning “data as the Intel inside” are in conflict. This may also be another indicator of where the semantic web (or whatever you want to call the next evolution of things) diverges distinctly from Web 2.0. Until the GeoWeb can solve the problem of dealing with large, complex datasets, I think it will be difficult to answer the deeper questions for users that create substantive value.
Talking with Lior and Dave after the workshop, we agreed it was a tough problem, but one with big potential if solved well. Dave did bring up the thorny issue of how you know you are answering questions correctly. That is another can of worms that will have to wait for another blog post, but it will be hugely important as things evolve.
As a side note, apologies to everyone for the issues we’ve been having with the date on the blog. Our virtual machine decided it wanted to peer into the future and run its system clock faster than reality. It looks like we have it fixed, but it blew away this blog post and several recent comments. I’ve done my best at rewriting this one, but sadly it looks like we’ve lost the comments. Fortunately most of them were letting us know the date had done gone crazy. On the upside, if anyone wants to know what the weather is going to be like this weekend or how the primaries will turn out, just let me know.









