The GeoCoding Quest and 2006 VC Investment Data from Rob Finn
March 6th, 2007 by Sean Gorman
A little while back I got a call from Rob Finn at Edison Ventures asking about our little startup and for some help with a side project he was working on for his blog, VentureBlogalist. Rob had collected a list of 2006 venture deals by the location of each funded company and wanted to create a heat map of it. He was nice enough to send along the list, and I figured it would be a good chance to see how hard it was to morph that spreadsheet into geospatial data and put it on a map. There were 1,082 funded companies on the list, so I started testing geocoders to see which one would get the job done.
I took my first shot with Batchgeocode.com, which has a straightforward web interface that lets you cut and paste a tab-delimited list of addresses, have them geocoded, and export the result as KML. Sweet, just what I needed. I saved the Excel file as a tab-delimited .txt file, opened it in Notepad, cut and pasted it into the box, and I was off and running. I clicked the button and hit a roadblock: “recommend you do not geocode more than 500 addresses at a time”. Breaking the data set up into chunks of 500 did not work for me, and I had over a thousand addresses, so I chose “run it anyways”. The app auto-picked the street, city, zip and state fields, and all I had to do was pick the name and description fields. Good to go. So I set the batch geocoder running and waited to see if it would handle all 1,082 addresses. The first thing I noticed was my computer, a beastly Alienware laptop, really straining, with all my Firefox windows barely functional. About two hours later it finished, and the bad news is the KML file was empty and unusable. I'm not bashing Batchgeocode.com; just don't exceed the 500 limit, because it no workie. Otherwise I thought the process was pretty simple and intuitive.
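For anyone who does want to stay under a 500-address limit, the chunking itself is easy to script. This is a minimal sketch, assuming the spreadsheet was exported as tab-delimited text with a header row; the function name `split_tab_delimited` is hypothetical and has nothing to do with Batchgeocode.com itself. Each chunk repeats the header row so it can be pasted into a web form on its own.

```python
import csv
import io
from typing import List

# Batchgeocode.com recommended staying at or under 500 addresses per run.
MAX_ROWS_PER_BATCH = 500


def split_tab_delimited(text: str, batch_size: int = MAX_ROWS_PER_BATCH) -> List[str]:
    """Split tab-delimited text (header row + data rows) into chunks.

    Every chunk starts with the original header row and holds at most
    `batch_size` data rows. Returns the chunks as tab-delimited strings.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(data), batch_size):
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(data[start:start + batch_size])
        chunks.append(out.getvalue())
    return chunks
```

For a 1,082-row list this yields three chunks of 500, 500 and 82 data rows.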
The bad news was that I still did not have any KML to show Rob pretty heat maps with. So I went back to the drawing board and downloaded the Juice Analytics geocoder. Once I enabled macros in Excel it was not too bad a process. You need a Yahoo! ID and an application ID for the Yahoo! geocoder, but otherwise it was pretty smooth sailing. It is a bit of work to cut and paste your street address information, and you can usually only get 100 or so addresses geocoded in one stretch before you lose the connection to the Yahoo! server (do not hit Debug on the macro script; just hit End, highlight another chunk of addresses and geocode those).
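The “hit End and highlight another chunk” routine is essentially a resumable batch loop. Here is a minimal Python sketch of that pattern, not the Juice Analytics macro itself: `geocode` is a hypothetical stand-in for whatever geocoding service you call (no real Yahoo! API is reproduced here), and it is assumed to raise `ConnectionError` when the connection drops. Results accumulate in a dictionary, so re-running the function picks up where it left off.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

LatLon = Tuple[float, float]


def geocode_in_batches(
    addresses: Iterable[str],
    geocode: Callable[[str], LatLon],
    results: Dict[str, LatLon],
    batch_size: int = 100,
    max_retries: int = 3,
    pause: float = 1.0,
) -> List[str]:
    """Geocode addresses chunk by chunk, skipping ones already in `results`.

    `geocode` is a hypothetical callable (address -> (lat, lon)) assumed to
    raise ConnectionError when the service drops the connection. Successful
    lookups are written into `results`, so a second call resumes the job.
    Returns the addresses that still failed after `max_retries` attempts.
    """
    pending = [a for a in addresses if a not in results]
    failed: List[str] = []
    for start in range(0, len(pending), batch_size):
        for address in pending[start:start + batch_size]:
            for attempt in range(max_retries):
                try:
                    results[address] = geocode(address)
                    break
                except ConnectionError:
                    time.sleep(pause * attempt)  # wait a little longer each retry
            else:
                failed.append(address)  # every attempt raised ConnectionError
        time.sleep(pause)  # short breather between chunks
    return failed
```

Calling it again with the same `results` dictionary only retries the addresses that are still missing.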
Once I got through my list, I clicked “launch in Google Earth”, then right-clicked on the data set and saved it as KML. From there I uploaded it into GeoCommons, added it to my workspace and made some pretty heat maps to send back to Rob. So if you want to see where the hottest VC activity is, just look below:

I got a nice heat map of all the venture deals (green squares) Rob had collected across the USA. The obvious hotspots showed up in the Bay Area and the Boston–Washington corridor. The activity in Denver is somewhat new compared with previous tech spurts, but it amazes me how much things stay the same when you track innovation and the diffusion of new technology. Back in school we did lots of research on this, and the trends are remarkably similar over time. Richard Florida, now at GMU, found strikingly similar patterns of VC investment back in the eighties. During the last Internet boom, Matt Zook found that the largest explanatory factor in the geographic concentration of domain name registrations was the presence of venture capital in the region; the maps looked the same as above. In general it is a fascinating topic, but I won't bore everyone to death by getting too academic about it. Usually best to stick with the pretty pictures. So, here are some close-ups of the hotspots:

The usual suspects in the Bay Area: Silicon Valley innovation is not going anywhere

A cool close-up of where the funded startups are in downtown San Francisco

Boston is still taking the lead along the BoWash Corridor, but the DC metro region is making a strong showing
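One simple way to turn geocoded points into heat-map intensities is to bin them into a regular latitude/longitude grid and scale the counts. This is a generic sketch of grid binning, not necessarily how GeoCommons builds its heat maps; the 0.5-degree cell size and the function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def grid_counts(points: Iterable[Tuple[float, float]], cell_deg: float = 0.5) -> Counter:
    """Count (lat, lon) points per square grid cell of `cell_deg` degrees.

    Keys are (row, col) indices; a cell's south-west corner sits at
    (row * cell_deg, col * cell_deg).
    """
    counts: Counter = Counter()
    for lat, lon in points:
        counts[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return counts


def intensities(counts: Counter) -> Dict[Cell, float]:
    """Scale counts to 0..1 so the busiest cell gets the hottest color."""
    if not counts:
        return {}
    peak = max(counts.values())
    return {cell: n / peak for cell, n in counts.items()}
```

With half-degree cells, a company in Boston and one in Cambridge land in the same cell, while San Francisco and Palo Alto fall into neighboring ones; finer cells give the kind of neighborhood-level close-ups shown above at the cost of sparser counts.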
While mapping the venture hotspots was fun stuff, the whole process illustrated a fundamental need for easier tools to get unstructured geospatial data onto maps. We've been working on supporting .csv and .txt uploads with geocoding in GeoCommons, but today's experiment gave me several ideas about how the process could be a whole lot simpler. The potential of tapping into all the data sitting in spreadsheets with street addresses, zip codes, counties or states, and turning it into information that is easy to map, share and consume, gets us excited for the upcoming releases. We'll see how easy and fluid we can make it.
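Once addresses have coordinates, writing them out as KML (the format the Google Earth export step produced) takes only Python's standard library. A minimal sketch, assuming each geocoded row is a dict with hypothetical keys `name`, `lat`, `lon` and an optional `description`; it emits KML 2.2 placemarks and is not GeoCommons' upload code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML as the default namespace


def _tag(name: str) -> str:
    """Qualify a KML element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def rows_to_kml(rows: Iterable[Dict]) -> str:
    """Build a KML document with one Point placemark per geocoded row."""
    kml = ET.Element(_tag("kml"))
    doc = ET.SubElement(kml, _tag("Document"))
    for row in rows:
        placemark = ET.SubElement(doc, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = str(row["name"])
        if row.get("description"):
            ET.SubElement(placemark, _tag("description")).text = str(row["description"])
        point = ET.SubElement(placemark, _tag("Point"))
        # KML orders coordinates as longitude,latitude
        ET.SubElement(point, _tag("coordinates")).text = f"{row['lon']},{row['lat']}"
    return ET.tostring(kml, encoding="unicode")
```

The longitude-first coordinate order is the usual trip-up when hand-building KML from a spreadsheet that stores latitude first.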

March 7th, 2007 at 10:47 am
Fantastic work but is there an actual link to the data set?
March 7th, 2007 at 11:01 am
Thanks Niki. The data set belongs to Rob Finn, and you can contact him through his blog at http://www.ventureblogalist.com/ to ask about it. We are currently beta testing GeoCommons with about 15 people. When it goes live you'll be able to download any of its datasets as KML, or export the mashup and analysis as a live map for your web page or blog. We are working on making the current version more user-friendly and getting the bugs worked out. Also a little database upgrade: MySQL gets grumpy when you start pushing over a billion different spatial geometries. The good news is there will be lots of data sets to choose from.
March 7th, 2007 at 2:08 pm
[...] I feel like the kid who had his parents do his science fair project for him. Background: I wanted to create a map of the 1,080 VC investments in software and internet related companies. The plan was to use GeoIQ, an interesting web service with API to add heatmaps and other analytics to mapping applications like Google Maps. Unfortunately, I chose the wrong day to work with Google’s API (abridged version). Fortunately, the folks at FortiusOne, proud parents of GeoIQ, were able to come to my rescue. The below heat maps are the end result. Sean Gorman, CEO of FortiusOne, describes the entire process here. FortiusOne will be the source for DIY mapping/mashups/analytics for the masses. It’s already fairly easy. [...]
March 14th, 2007 at 10:36 pm
Would you be able to make a close-up map of the Boston area similar to the San Francisco one, but including 128 and 495?