Data is the Public Good. Data is the Infrastructure. Data is the Stimulus
January 28th, 2009 by Sean Gorman
In the last post I whinged about what I thought was wrong with the various “geo” stimulus proposals, so in this post I will talk about what I think is right. In short, the value is in the data and in making the data available to the public. Making data available and transparent to the public is what the administration wants to do, and they are writing the checks. This is different from making technology, infrastructure or even maps available to the public.
Make government data a public good at all levels - local, tribal, state, and federal. Expose the data on the Web in commonly used formats. If that format is a standard, great, but it does not necessarily need to be. Shapefile is not a standard, but data should be made available in that format because it is commonly used and many different technologies can consume it. The same goes for GeoRSS.
I think the best return for the cost can come from something this simple. Sean Gillies is correct that providing access to the raw data is more valuable than web services. The technology community is going to do a much better job of making the data available as either web services or web destinations than the government will. We, the community, complain about how lousy government mapping sites are, yet we, the taxpayers, continue to pour tons of cash into building and maintaining them.
The list of examples of great technology being built on top of open data grows longer every day - EveryBlock (municipal data), Cloudmade (OpenStreetMap), Apps for Democracy (DC govt. data), Apps for America (Sunlight Foundation), Google Transit (municipal data), FixMyStreet and TravelMaps (MySociety.org), NAVTEQ and TeleAtlas (TIGER line data). I’m sure folks can add several more; this is just the tip of the iceberg. In each one of these examples companies or non-profits have sprung up on top of open data. Each one of these companies creates jobs, innovation, and competition in the marketplace. The data is no less valuable if one company uses it or a hundred do. It is a classic public good, which means the government can invest in it without tilting the market.
The New York Times has a great piece about providing stimulus to the technology sector. Robert Hall, an economist at Stanford, warns that providing stimulus to niche fields (which geo surely is) pushes funding to “a bunch of specialists, where if we raised spending quickly, the limited number of competent suppliers would be in short supply and get increased incomes.” The result is an unbalanced playing field that generates only a few jobs among a minimal number of vendors.
While I believe this all looks good on paper, there are still some challenges to efficiently making government data available as a public good. The biggest issue will be making it easily discoverable, since government data inherently resides in stovepipes across the Web. Many will pop in at this point and say, Spatial Data Infrastructure! Then we are right back to square one. Instead we could go for an easily implemented method allowing government agencies to expose their data for indexing and federation. Let the private sector build the infrastructure to discover the data and deliver it to the consumer. Do most people find local data through their municipal GIS map portal or through Google/Yahoo/MS/Yelp/EveryBlock? My greatest fear is that a national SDI/GIS will just be a glorified municipal data portal.
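As a rough sketch of what such an easily implemented method might look like: each agency publishes a small machine-readable index of its datasets, and anyone in the private sector can crawl and merge those indexes. The index layout, the field names, the function names, and the example.gov URLs below are all assumptions made for illustration; none of this is an existing government format or a proposal from the search summit.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class DatasetEntry:
    """One row in a hypothetical agency data index."""
    title: str
    url: str       # where the raw data can be downloaded (illustrative URL)
    format: str    # e.g. "shapefile", "georss", "kml"
    updated: str   # ISO 8601 date, so plain string sorting is chronological


def build_index(agency: str, entries: List[DatasetEntry]) -> str:
    """What an agency would publish: a JSON document listing its datasets."""
    return json.dumps(
        {"agency": agency, "datasets": [asdict(e) for e in entries]},
        indent=2,
    )


def federate(index_documents: List[str]) -> List[Dict[str, str]]:
    """What an indexer would do: merge many agency indexes into one list,
    most recently updated datasets first."""
    merged = []
    for document in index_documents:
        parsed = json.loads(document)
        for dataset in parsed["datasets"]:
            merged.append({"agency": parsed["agency"], **dataset})
    return sorted(merged, key=lambda d: d["updated"], reverse=True)


def find_by_format(catalog: List[Dict[str, str]], fmt: str) -> List[Dict[str, str]]:
    """Simple discovery query over the merged catalog."""
    return [d for d in catalog if d["format"] == fmt]


if __name__ == "__main__":
    city = build_index("City GIS Office", [
        DatasetEntry("Parcels", "https://example.gov/parcels.zip", "shapefile", "2009-01-15"),
    ])
    state = build_index("State DOT", [
        DatasetEntry("Road closures", "https://example.gov/closures.xml", "georss", "2009-01-27"),
    ])
    for dataset in federate([city, state]):
        print(dataset["updated"], dataset["agency"], dataset["title"])
```

The point of the sketch is the division of labor: the agency only has to keep a small file current next to its raw data, while discovery, search, and delivery are left to whoever wants to build them.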
The same will hold true for data at every other level of government. A lot of people have spoken on making raw government data publicly available. It is not a new problem, and that is exactly the point. Let’s not invent a problem that does not exist and hope we can get a chunk of the bailout. Instead, solve a problem that already exists and use the rare opportunity of “collective will” within the government to jump-start it. The cost is minimal and the potential economic stimulus is large. The market remains open and balanced.
January 28th, 2009 at 11:47 am
Hi Sean,
Good insights. We diverge on some of the finer points but there appears to be great convergence in our thinking regarding the availability of open data.
This “a glorified municipal data portal” made me chuckle a bit…
Jason
January 28th, 2009 at 1:15 pm
The issue with just providing data (e.g. shapefiles) is that they require download/conversion/etc… a process. In this process, how often do you update? Do you have/provide adequate metadata to know whether or not it’s the most current data? Do you then need to build a refresh process, to schedule a mechanism to perform the download and update on your end? Are there going to be dozens of other stakeholders all making redundant investments in the same type of refresh processes?
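To make the refresh question concrete, here is a minimal sketch of the check every downstream user would end up writing. It assumes the publisher exposes a "last updated" timestamp in ISO 8601 form with an explicit UTC offset; the function name and the timestamps are made up for illustration.

```python
from datetime import datetime


def needs_refresh(published_updated: str, local_fetched: str) -> bool:
    """Return True when the publisher's copy is newer than our local copy.

    Both arguments are ISO 8601 timestamps with an explicit UTC offset,
    e.g. "2009-01-28T12:00:00+00:00" (an assumed convention).
    """
    published = datetime.fromisoformat(published_updated)
    fetched = datetime.fromisoformat(local_fetched)
    return published > fetched


if __name__ == "__main__":
    # Publisher updated the dataset after our last download: time to re-fetch.
    print(needs_refresh("2009-01-28T12:00:00+00:00", "2009-01-27T09:00:00+00:00"))
```

Without that published timestamp the check cannot be written at all, which is the metadata concern raised above.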
With Apps for Democracy et al, it was beyond just “data” but specifically directly-mashable data feeds - and this can be a means of providing and ensuring currentness, via KML network links, live GeoRSS feeds et al.
Part of my concern is in economies of scale (why not build it once, use it many times) and in potential liabilities, e.g. folks who might not be diligent in routinely updating the datasets that feed their apps.
Easiest solution would be to just publish a live feed. Have agencies provide direct data access via KML network link, GeoRSS, WxS services, tile services, e.g. GeoServer. With a modicum of infrastructure planning, this could be quite scalable and robust, and serve a vast majority of need across the entire community. And, the data would reside in-place with each steward, in a federated NSDI. This is basic stuff, not complicated star-wars physics.
The flipside of the equation is in data collection efforts - e.g. EPA’s Exchange Network, which collects data from all 50 states, tribes and other participants. Or… you have OAM, a great idea for crowdsourced data, but what happened here? Again, an infrastructure crunch, needing sponsorship and funding.
“Just do it” is all fine and good, but definitely has its practical limits, particularly when dealing with an entire national dataset and applications which require cross-agency and inter-agency data.
January 28th, 2009 at 1:33 pm
It’s also important to note that “just give us the data” still doesn’t quite work, as there is still a lot of useful data residing in various agencies which isn’t yet geo-enabled, which isn’t structured, which might not be consistent, which might not be documented, and so on - the challenge goes beyond “just step away from the data and let us have at it” - many agencies themselves are still trying to get that data available, even for their own use. In some cases, even the geodata isn’t readily sharable, as it may reside in operational data stores, which are being continually updated, but which need to be vetted for quality/completeness/accuracy prior to public access, along with data which might need to be scrubbed due to various security considerations, such as personal data, confidential business data, sensitive/critical infrastructure security, and so on.
For many reasons, that geo infrastructure needs to be improved with the various stewards themselves, as opposed to just turning it over as-is.
January 28th, 2009 at 3:08 pm
Hi Dave,
I think we are agreeing in many ways but probably differ on execution. I agree that federation is the path forward, but not in the ways that SDI’s have been done to date. I definitely do not think that the federation and infrastructure is something that should be built or managed by the government.
Go back to the discussion before there was a bailout - http://highearthorbit.com/ogc-geospatial-search-summit/
The community is making good progress on solving the problem we are talking about, in a vendor-neutral way. All I am saying is let’s continue down that path and use the opportunity to make more government data available within a system the market is already embracing.
The stated requirement by the administration is to make government data available to the public. So, data that is confidential, has privacy concerns, etc. is off the list to begin with.
Solving the problem of making the “right” data available to the public (cleaning it up, segmenting it, etc.) is exactly where the investment should be instead of a massive SDI. If you do not execute the data basics, what is the value or point of the SDI in the first place? I have no issue with data feeds versus data downloads just so long as the raw data is available for remixing and repurposing. It should not be trapped in the tech delivery. A good federation and indexing approach solves the problem of currency, conversion, and refresh (we’ll post details along the lines of what came out of the search summit in the near future).
I think the difference here really comes down to who should put the infrastructure in place and run it - the government or the market. When it comes to technology management and deployment I know where I would invest my money.
best,
sean
January 28th, 2009 at 3:10 pm
Mr. Hall, who is a senior fellow at the Hoover Institution, a conservative research group.
What a highly unfortunate name to have for a research group advising on this particular topic at this particular time.
January 28th, 2009 at 5:17 pm
Solving the problem of making the “right” data available to the public (cleaning it up, segmenting it, etc.) is exactly where the investment should be instead of a massive SDI.
And… this is exactly what the three proposals are seeking. Providing means by which the flows can be improved - not some huge, monolithic rehash of GOS and centralized GIS data.
I think folks at this point are far less concerned about the prior OMB A-16 NSDI mandate, “NSDI 1.0” as it were, of FGDC and Geospatial One-Stop, than they are about bringing data assets online and completing various national data initiatives already underway, such as Imagery For The Nation, EPA’s Exchange Network, and so on. And again, in some cases, such as Exchange Network, it’s a partnership of federal/state/tribal/industry, and so on. IFTN is similarly partner-driven. And these are the types of efforts for which funding is sought - not necessarily massive “big-box” vendor contracts, but instead, within some of the proposals offered, these opportunities exist, all across all segments - where in many instances the market can fully engage and participate.
January 29th, 2009 at 2:54 pm
Hi Dave,
I look at this as an opportunity for our industry/community to share new ideas and innovations to help solve problems.
You are working within the current system to innovate and that is admirable. I happen to think we need to step out in a new direction.
To be honest, there is probably not a big chance that what I and others espouse will be implemented. We have no big companies behind it, no lobbyists, no connections to the established status quo, just ideas and what we’ve built to demonstrate them.
I do hope the points made and the debate help inform decision makers on the potential of open government data and making it available to the public for remixing and innovation. If we get any closer to this goal it has been a good day.
best,
sean
January 29th, 2009 at 5:55 pm
I agree that technology is among those sectors where new investment should be focused, as it is one of the few areas where we are actually capable of innovating or producing these days. But frankly, I don’t see how a national data repository can even be feasible. In my view, the task of streamlining the organization, administration, and distribution of geodata across the board is simply far too broad.
Anyone who works with GIS would agree that the data glut out there is truly mind-blowing. The data within even a single agency is already sprawled and fragmented beyond comprehension, probably even to the creators (perhaps necessarily so). There are also massive data redundancy issues, divergence on data identification needs, and unique uses for geodata between different agencies that just cannot be reconciled without having to systemically address how agencies interact with one another.
It’s just too much and everything just feels too hurried. Imposing a top-down hierarchy is not going to work for data that, so far, has been by definition bottom-up. I would suggest starting from “the middle”, namely spearheading state initiatives and commercial/non-profit interests who can best assess the shortcomings of geodata on both the demand and supply side. As you guys know, this middle approach is already happening but probably not quite ready for primetime, and that’s why all effort and investment should be geared toward precisely that.
Yes, something needs to be done. Even if they end up throwing tons of money to a few firms at the top, the key to success is to make sure the top-down model is structured so as to facilitate and not undermine the current “middle way” approach.
February 2nd, 2009 at 11:20 pm
Miles, you may want to look at the “other” NSDI proposal (as distinct from the ESRI/BAH and Autodesk/MSFT/Oracle/Google ones) posted to http://www.nsdi2.net, which DOES acknowledge that the data originates at the local level, and which essentially proposes exactly what you are suggesting, dovetailing into efforts such as the EPA’s CDX effort, which ties together states, tribes, and other stakeholders, or efforts like Imagery For The Nation, which provides statewide frameworks wherein counties can collaborate and leverage their investments, to provide economies of scale and buy-ups toward better-quality data and better coverage.
February 4th, 2009 at 8:28 pm
Again, I’m seeing so much neo-geographer speak here. I am one myself, big into everything FortiusOne and the geoweb is doing, and I’ve attended every Where 2.0 conference. But where ESRI makes its money, where the 100s of billions of dollars in GIS are spent every year, is not here (yet!).
The market, the demand for this, is still in the people using traditional GIS: the land surveyors. The counties who track cadastral data. Do you design and implement a freeway system in Google Earth? Do you use OSM? NO. You use Leica TS System 1200 and Arc. You NEED to get access to county records.
Do you know, it costs about $10,000 right now to have a piece of property divided into two parcels for sale. Most of that cost is having a certified survey company survey the land, enter that data into 10 different city, county, regional and state databases and draw up a legal document. Same with planning a road, bridge, housing or commercial development. These are HUGE $$ industries and a National GIS would be a boon for them.
Just my $0.02, (from the other side of the fence, or rather straddling it).
February 6th, 2009 at 9:11 pm
[…] . Off the Map: Data is a public good (in English). Sean Gorman, a geospatial technology specialist, has just published a rather stimulating post titled “Data is the Public Good. Data is the Infrastructure. Data is the Stimulus”. In it he explains that what matters for an administration is making data accessible and transparent to the public, and that it is up to the private sector to build the infrastructure that makes it accessible. […]
February 9th, 2009 at 7:46 pm
One perspective on this that I haven’t seen yet is the usefulness of having simple access to raw data in times of emergency, rather than having the data delivered through some clever, special, fancy, and slow system.
There was an Australian fire maps site that had an on-site viewer for maps; it was crushed by the demand for this information in the recent Melbourne fires.
http://sentinel.ga.gov.au/acres/sentinel/index.shtml
“Please note - due to unprecedented demand on this website, Geoscience Australia has had to temporarily remove access to the interactive application.”
Now it turns out that this is MODIS data and a bunch of other sites around the world have the same dataset in a variety of formats, some unencumbered by the burden of also hosting an interactive application.
February 11th, 2009 at 3:09 pm
[…] a time when debate (after debate, after debate) takes place on geospatial infrastructures being pitched to various […]
February 5th, 2010 at 10:44 am
[…] stimulus response. This is something we’ve discussed as a positive externality of open data as a public good, but now we could have the momentum to seize upon […]