James Fee Gives His Two Cents on GIS and GeoWeb
June 4th, 2008 by Sean Gorman
We had the opportunity to catch up with James Fee at Spatially Adjusted and Planet Geospatial to get his opinion on the current and future state of GIS and the GeoWeb. James is a certified GIS Professional (GISP) and GIS developer, analyst, and consultant and has spent the last decade implementing, developing and consulting on GIS projects. He has experience with almost all of the large players in the geospatial field such as ESRI, MapInfo, Manifold, OSGeo (MapServer, GDAL, QGIS, OpenLayers), MapDotNet, Oracle, Microsoft and Google.
FortiusOne: Where do you see GIS going in the next 10 years?
James Fee: I think data and collaboration will be huge in the next 10 years. The explosion of Neogeography and projects such as OpenStreetMap have brought many new faces into GIS. Not only are we seeing spatial data being pushed out to the public at large, but this data is beginning to be integrated into GIS workflows. We’ll begin to see metadata and documentation of these datasets as well, making them very valuable.
While freely sharing data has been great, the next logical step is allowing companies to monetize their datasets and share them just as easily. The ability to pay for and use data services should revolutionize the industry. The price of data hasn’t really been a limiting factor yet, but the difficulty of integrating these datasets into online mapping or even desktop GIS has hurt adoption. In addition, the speed of geospatial data services has been poor, so moving these services into the Cloud should improve performance and increase profitability given that there is no need for large overhead (such as servers or bandwidth).
FortiusOne: Will there be convergence between GIS and the GeoWeb to the point that they become indistinguishable?
James Fee: Possibly. I think this has been the holy grail that everyone has been trying to attain. GIS by its nature is complex and you generally need complex solutions to complex problems. That said, I think we’ll see many operations that were the domain of GIS begin to be part of the GeoWeb. Basic geoprocessing over the web via an easy-to-use interface can satisfy a vast number of use cases of general users without hitting them over the head with a steep learning curve.
Usually moving GIS to general users has been done by giving them the kitchen sink and expecting them to figure it out. Simple solutions to their “simple” problems are how we’ll see GIS and the GeoWeb converge. Over time more and more “complex” analysis will be available for use by just about anyone with a computer, but I’m not sure we’ll see that in the next 10 years.
FortiusOne: Do datasharing and crowdsourcing have a place in GIS?
James Fee: Yes, but the problem is how you give GIS professionals the ability to use the data and make decisions about its accuracy. I guess it brings up the question: do you trust a Biologist in the field with a GPS more than a hobbyist? I’d guess most GIS professionals would pick the Biologist, but a degree in Biology doesn’t mean the data is necessarily good.
Datasharing and crowdsourcing are great ideas, but for GIS to use them, they need metadata, documentation, and possibly a rating system. A “marketplace” should allow users to rate the quality and accuracy of the data, which both helps others make decisions about the data and gives feedback to the creator on how they can improve their dataset. OpenStreetMap has been a great example of how “experts” can help “novices” grow to be experts in data collection.
FortiusOne: Should there be a marketplace for online geodata?
James Fee: Totally, I think there has to be. First off, you need some place users can feel comfortable buying data. Second, you need a place where data can be rated and reviewed. Third, you need a place where data providers can put their information in the cloud for quick and easy access by everyone. If someone is investing time and energy into creating their data, I don’t see any reason they can’t be rewarded for this.
I think some data will be available via micropayments and other data will be very expensive (or the ability to pay for read only data vs editable data). Having some place where users can go to both sell and buy data, search for data, and review data is critical today. Sure Google will index spatial data, but being able to go to a focused marketplace will put buyers and sellers together quickly. And at least today, any site that sells data should be compatible with ESRI software. Offering up data types that aren’t compatible with ESRI will limit any marketplace.
FortiusOne: What emerging technology trend will have the biggest impact on GIS?
James Fee: I think putting a GPS in so many “ordinary” things is going to impact GIS immensely. Walking around with a GPS in your phone should give you access to many GIS applications, and digital cameras and video cameras with GPS will spatially enable tons of datasets.
FortiusOne: What is your reaction to the Google – ESRI announcement?
James Fee: We’ll have to see what impact this really has. The idea that Google might index GIS servers isn’t revolutionary; the hard part is getting all these traditional ESRI clients to open up their data. They’ll need to see the benefit of allowing users to view their data in any way they choose rather than through the traditional hard-to-use ESRI web mapping front end.
FortiusOne: What impact will Google have on GIS?
James Fee: Google has already had a huge impact on GIS. At a minimal level, it has already allowed GIS users to search for data sets. Google Maps has totally changed how web mapping is used and displayed on the internet, Google Earth has pushed 3D GIS to the mainstream, and now with their geo search 2.0 and geo sitemaps they are pushing spatial searching. Google has been really good about getting spatial data in front of everyone in a way everyone can use it. GIS has learned much from this, and the new tools coming out from ESRI, Autodesk, etc. are all very “Google-like”.
Dataset of the Day: Health Care in Cuba
June 3rd, 2008 by Emily Sciarillo
Cuba has been in the spotlight lately as Raúl Castro officially takes over as President, ending the 49-year rule of his brother Fidel Castro. What will be the legacy of Fidel Castro and the socialist revolution that he led since 1959? One of the most acclaimed successes for the Cuban government has been its progress in health and health care, particularly in the rural areas in the eastern part of the island. Whether or not health care in Cuba is what the government claims it to be is strongly debated. See for yourself the state of health and health care in Cuba using Finder!
The Cuban government provides in-depth statistics on the health of its population by province, and Finder! has these data available for the years 1996 to 2006 with more than 80 health and health care related attributes. Whether you are interested in the change in infant mortality over the last decade, which provinces have more doctors per resident, or what the leading cause of death is in each province, this dataset will help illustrate what the situation is on the island.
Here is an example of what these data can be used for. This map shows the number of family doctors per inhabitant in 2006. Provinces in red have fewer doctors and the green ones have more.

See data for:
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
Using MapShaper to Create Smaller Shapefiles and KML through Finder!
May 24th, 2008 by Sean Gorman
We’ve been doing a lot of data migration and new data uploads with Finder! and oftentimes our data team runs into data and mapping headaches. One that we commonly encounter is largish shapefiles that make for really bloated KML when we convert them (for instance a 2 MB shapefile for US counties becomes a 5.4 MB KML file). The end result is big files that completely kill browser based applications like Virtual Earth and Google Maps, or load really slowly in thick client applications like Google Earth and ESRI AGX.
There are three factors that constitute file bloat for any vector based geospatial data:
1) The number of attributes (how many columns)
2) The number of features (how many rows)
3) The complexity of the geometry (how much needs to be drawn)
You can do some clever things to manage the first two at a low level - although you still are going to have bloat when you convert to a standard file format. The third factor, geometry complexity, is interesting because you can also do some low level tricks whose savings can be passed along to standard file formats. Reducing the complexity of geometry is often called “map generalization” in academic circles.
The general concept is that you remove details from the map without losing the message and context of the map. All maps have some form of generalization; otherwise they would be a perfect reflection of reality. Academics have used algorithms to heuristically derive a map generalization. This is probably best explained with a few examples. Below is a map of Europe in full detail:
Next is a map generalization that removes some of the detail but still keeps the context of Europe and the country boundaries:
Last, a more extreme example with even greater detail removed:
Pulling off these nifty computational tricks used to require some fairly sophisticated desktop software, but Matt Bloch and Mark Harrower at the University of Wisconsin figured out a clever way to enable real-time WYSIWYG map generalization. The resulting application is called MapShaper. You can upload a shapefile and run different generalization routines (with a high level of control if you choose), then export the result back out as a shapefile or an EPS file. The shapefile export is down at the moment, but hopefully will be back in action soon.
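To give a feel for what a generalization routine does under the hood, here is a minimal sketch of Douglas-Peucker line simplification, one classic algorithm in this family. It is an illustration only, not MapShaper's own code; the function names and the tolerance parameter are my assumptions, and coordinates are treated as plain planar (x, y) pairs.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring whose ends coincide).
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: keep only vertices that deviate more than
    `tolerance` from the chord between the kept neighbours."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the ends.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Everything in between is "detail" below the tolerance: drop it.
        return [points[0], points[-1]]
    # Otherwise keep the farthest vertex and recurse on both halves.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

Raising the tolerance removes more vertices, which is roughly the knob you turn when going from the full-detail map to the more extreme examples. Note that simplifying each polygon independently like this can open gaps between neighbouring boundaries, so treat it as a sketch of the idea rather than a production approach.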
I think these kinds of technologies and mathematics are going to be increasingly important as we need to make ever larger datasets available. Especially when the receiving devices are increasingly mobile with even smaller data handling capabilities.
Power Law Distributions of Google Indexed KML: Is the Long Tail the Wrong Tail for the GeoWeb
April 29th, 2008 by Sean Gorman
We made the cross country hop out to Santa Clara to attend Location Intelligence today. The weather is awesome and we just finished the morning workshops. I sat in on Lior Ron and David Minogue’s talk on “Searching the GeoWeb“.
The talk produced many interesting insights on Google’s approach to searching geodata, but one statistic really grabbed my attention. Of the millions of KML files Google has indexed, roughly 95% have only a single feature, meaning the vast majority of KML indexed by Google consists of single place marks like “this is my house” or “this is an airplane in flight“.
There are also many single place marks that have more useful data as well, and Lior did a great job presenting several work flows pulling up very relevant place marks for things like finding a place to windsurf in the Bay Area or a place to hike in Austria.
What I found fascinating was Google’s focus on the long tail of data, which has been a popular meme in Web 2.0 in general. The long tail refers to the tail end of a statistical distribution that covers a large number of entities with a small number of observations.
You can also think of this as the 80/20 rule, where 20% of the people have 80% of the wealth and the other 80% of the people have only 20% of the wealth. In this situation the long tail of the distribution is the 80% of the people with 20% of the wealth - where there are a large number of people with only small numbers of observations (wealth).
This is also called the Pareto principle and often manifests itself as a power law distribution; such distributions are commonly referenced in complex systems research (http://en.wikipedia.org/wiki/Complex_system) to describe self organizing systems and networks.
Google’s indexing of KML on the GeoWeb is fundamentally a self organizing system of user generated content and, not surprisingly, it looks to fit a power law distribution: specifically, one where 95% of the KML has a single feature and the other 5% has a very large number of features that accounts for a disproportionate amount of the total features in the database. Without the raw data it is just a hunch on my part, but I would bet a bar tab on the R-squared of a power law fit being above .85 on a rank order distribution of KML on file size or number of features.
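For anyone who does get hold of the raw numbers, checking that hunch could look something like the sketch below: rank the files by feature count, take logs of both rank and count, and fit a straight line by least squares. The function name is hypothetical and the data is synthetic, made up purely so the example runs; it says nothing about Google's actual index. A log-log regression is also only a rough diagnostic, not a rigorous test for a power law.

```python
import math


def power_law_fit(counts):
    """Fit log(count) = intercept + slope * log(rank) by least squares.

    `counts` are positive per-file values (e.g. features per KML file)
    and must not all be equal.
    Returns (slope, intercept, r_squared) for the rank order distribution.
    """
    ranked = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic counts for illustration only (NOT real KML statistics).
synthetic_counts = [1000.0 / rank ** 1.5 for rank in range(1, 101)]
slope, intercept, r_squared = power_law_fit(synthetic_counts)
print(f"slope={slope:.2f}, R^2={r_squared:.3f}")
```

With real data the interesting number is `r_squared`: the bar tab bet is that it comes out above .85.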
So, geeking out on statistics and complexity theory aside, why does this matter? It matters because I believe it ignores the power of the short tail. The long tail is easy from a computational perspective to deal with - the file sizes are small and rendering small numbers of place marks is easy. This keeps everything very manageable and scalable.
The downside is it leaves out many of the most interesting datasets potentially available, because they are large and complex - sitting on the short tail. Another popular Web 2.0 meme is that “data is the Intel inside” - positing that large complex data sets are one of the key differentiators on the Web. So, it would seem in this case that the focus on the “long tail” and positioning “data as the Intel inside” are in conflict. This also may be another indicator of where the semantic web (or whatever you want to call the next evolution of things) diverges distinctly from Web 2.0. Until the GeoWeb can solve the problem of dealing with large complex datasets I think it will be difficult to answer deeper questions for users that create substantive value.
Talking with Lior and Dave after the workshop we agreed it was a tough problem, but definitely had big potential if solved well. Although Dave brought up the thorny issue of how do you know you are answering questions correctly. That is another can of worms that will have to wait for another blog post, but will be hugely important as things evolve.
As a side note, apologies to everyone for the issues we’ve been having with the date on the blog. Our virtual machine decided it wanted to peer into the future and run its system clock faster than reality. Looks like we have it fixed but it blew away this blog post and several recent comments. I’ve done my best at rewriting this one but sadly it looks like we’ve lost the comments. Fortunately most of them were letting us know the date had done gone crazy. On the upside, if anyone wants to know what the weather is going to be like this weekend or how the primaries will turn out just let me know.
Virtual Earth vs. Google MyMaps KML Support
April 26th, 2008 by Sean Gorman
As we’ve been putting GeoCommons through its paces I’ve been testing KML files we generate in different applications. The most interesting comparison by far has been between Virtual Earth and Google MyMaps. I did a high level comparison of the two plus Yahoo! MapMixer a few blog posts back, but after testing several KML files in each I thought it would make for a good follow up. Especially after Michael Jones’ comments to James Fee’s post about KML being the HTML of the GeoWeb.
The good news is that both Virtual Earth and Google Maps support KML, and we are seeing a greater number of applications supporting it and GeoRSS as GeoWeb standards. As the standards get picked up it will be interesting to see how they are supported and how applications differentiate themselves in doing so. Already we can see this beginning between the two titans (Microsoft and Google) expressing how their support of KML has advantages over the other. So, I thought I’d share what our experience was testing with both applications.
Google KML Support
For testing purposes I started off with a polygon data set of the 100 most polluted counties in the United States. The upload process for Google MyMaps was straightforward and my uploaded KML (or GeoRSS) file prepopulated a title and description field. Then, after a bit of chugging, it rendered the KML file on the map. You can see the map I created embedded below:
If you look closely you’ll notice that there are not 100 counties on the map (only about 44). Google MyMaps will support 200 pushpins on a map, but when you add in complex polygons the number of polygons and associated pushpins it will support goes down significantly. In the MyMaps application it gets around this problem by paginating the KML file into multiple maps each supporting the maximum number of pushpins, lines or polygons. Unfortunately you can only embed one map page at a time, so the map above only shows the first set of polygons.
An interesting observation in the Microsoft blog post about KML support noted that, “on Google Maps the polygons representing the parks didn’t load at all”. Our KML rendered the polygons fine, but we took an extra step in GeoCommons to generate our polygons as multigeometries where a pushpin with the data is included inside the polygon and highlights when you mouse over (at least in Google Earth). So, my hunch is that in order to get polygon KML to render in Google MyMaps you need to structure it as a multigeometry, or they’ve added the functionality since then. It would be great not to have to add the pushpin to get the data, and to enable clickable polygons in both Google Earth and Google Maps.
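To make the multigeometry structure concrete, here is a rough sketch of a Placemark that wraps both a Point (the pushpin) and a Polygon inside a single MultiGeometry element. This is not the GeoCommons generator; the function names are hypothetical, the caller has to supply the pushpin location, and the namespace shown is the OGC KML 2.2 one, which is an assumption about what a given consumer expects.

```python
import xml.etree.ElementTree as ET

# Assumed namespace (OGC KML 2.2); older files may use a Google one.
KML_NS = "http://www.opengis.net/kml/2.2"


def polygon_with_pushpin(name, ring, pin):
    """Build a Placemark whose MultiGeometry holds a Point and a Polygon.

    ring: list of (lon, lat) tuples, first and last vertex identical.
    pin:  (lon, lat) for the pushpin, chosen by the caller.
    """
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    multi = ET.SubElement(placemark, "MultiGeometry")

    # The pushpin that carries the clickable data.
    point = ET.SubElement(multi, "Point")
    ET.SubElement(point, "coordinates").text = f"{pin[0]},{pin[1]},0"

    # The polygon outline itself.
    polygon = ET.SubElement(multi, "Polygon")
    outer = ET.SubElement(polygon, "outerBoundaryIs")
    linear_ring = ET.SubElement(outer, "LinearRing")
    ET.SubElement(linear_ring, "coordinates").text = " ".join(
        f"{lon},{lat},0" for lon, lat in ring
    )
    return placemark


def to_kml(placemarks):
    """Wrap placemarks in a kml/Document envelope and serialize to text."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    document.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")
```

A multi-polygon entity would simply add more Polygon children to the same MultiGeometry, which is the same container doing a different job.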
On the plus side Google MyMaps does a good job handling multi-polygons. A multi-polygon is when you have multiple polygons representing one geographic entity. For instance the United States of America consists of several separate polygons, including Alaska, the Hawaiian Islands, and the contiguous states. Several of the counties in our test data set had multi-polygons and you can see those rendered in detail in the embedded map below:
A second plus for Google MyMaps is balloon support for the data that shows all the attributes in a nicely parsed list. Even when I loaded up a census data set with 74 attributes it listed them all out with a scroll bar. So to recap:
Advantages = prepopulated title and description, quick load, multi-polygon support, full listing of data attributes.
Disadvantages = limited number of polygons rendered on one map, requires multigeometry KML to support clickable polygons, slow rendering of polygons, no ability to export KML or other standard.
Virtual Earth KML Support
Virtual Earth KML support is provided through the “Collections” feature. When you click “Import Collection” you are given the option to add a KML file (or GeoRSS or GPX). I uploaded the same county pollution file and Virtual Earth chugged along for a bit then gave me a message saying, “100 out 100 items uploaded”. I’ve tried this with other files and if the file has more than 200 features it will not upload all of them - just the first 200, then stop. Also, if your KML file is over 2 MB it will tell you it is too large. Overall this is a nice feature that lets you know the bounds of the system and what will work and what will not.
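Since those bounds are known, you could check a file against them before uploading. The sketch below counts Placemark elements and measures the file size using the 200 feature and 2 MB figures observed above; the function names are hypothetical, reading "2mb" as 2 MiB is my assumption, and none of this is an official Virtual Earth API.

```python
import xml.etree.ElementTree as ET

MAX_FEATURES = 200             # upload cutoff observed in testing
MAX_BYTES = 2 * 1024 * 1024    # "2mb" read as 2 MiB (an assumption)


def count_placemarks(kml_bytes):
    """Count Placemark elements, ignoring whatever namespace they use."""
    root = ET.fromstring(kml_bytes)
    return sum(
        1 for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "Placemark"
    )


def check_upload(kml_bytes):
    """Return a list of human-readable problems; empty means it should fit."""
    problems = []
    if len(kml_bytes) > MAX_BYTES:
        problems.append(
            f"file is {len(kml_bytes)} bytes, over the {MAX_BYTES} byte limit"
        )
    features = count_placemarks(kml_bytes)
    if features > MAX_FEATURES:
        problems.append(
            f"{features} features, only the first {MAX_FEATURES} would load"
        )
    return problems
```

Usage would be along the lines of `check_upload(open("counties.kml", "rb").read())`, where the file name is just a placeholder.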
The second nice part is that all 100 counties made it on one map instead of just 44 as with Google. A second bonus was that Virtual Earth did not need the multigeometries to support the clickable polygons rendered on the map. In fact the multigeometries we included in our KML generation caused both a pushpin to be drawn and a second square that gets highlighted when you mouse over the polygon. You can check out the map here and see the screen shot below:
Sadly Virtual Earth does not support embeds, so just the screen shot and link. Another small ding, as you can see in the screen shot, is that Virtual Earth does not support multiple polygons. The spots where you see push pins instead of polygons are indicative of multiple polygons representing a county, like Galveston, that could not be rendered, so a push pin was placed there instead. It still gets the job done, but there is still something dissatisfying about America’s or any other political unit’s borders being replaced by a push pin. The last complaint is that Virtual Earth only supports a limited number of characters for attributes, so when I tested a census file with 74 attributes I only got the first twenty or so and they were not well formatted. So to recap:
Advantages = ability to render more polygons, ability to render polygons faster, ability to support clickable polygons without multigeometries, ability to export KML (and other formats)
Disadvantages = inability to support multi-polygons, slow to load KML, limited support of data attributes, no support of balloon styling
Overall I would give a slight edge to Virtual Earth when it comes to KML support from our unique perspective. Specifically, the ability to load a larger number of polygons on a map and make those easily clickable allows more of our content to be leveraged at this point. It will be interesting to see how Google, Microsoft and others continue to enhance KML support to make more data available. I believe there is still a long way to go, and the vast majority of the datasets in GeoCommons are too large for either to handle at this point. As the GeoWeb and the data it interconnects become more sophisticated I think it will be a necessity to greatly increase the amount and complexity of data that can be handled in a browser based map. Hopefully the market pushes Microsoft, Google and others to innovate in that direction.