With the elections over I’ve had a little time to think about what the new administration could mean for the GeoWeb. For those who follow the GeoWanking listserv there has been a raging debate on neogeography versus paleogeography. Some of the rhetoric reminds me of the just-finished election and how we strive to create a binary world - blue state/red state or neo/paleo. In the spirit of moving beyond stereotypes and on to solving problems, I thought I would take a closer look at the potential impact of Obama’s technology platform on the GeoWeb. It might be a good diversion from our own self-reflection - despite the fact I’ve added plenty of fuel to that fire ;-)

You can read Obama’s technology platform overview here. The plank that really grabbed my attention was the promise to “Open Up Government to its Citizens”. The idea is that data about government (Congressional voting records) and data created by the government (census data) should be easily available to the public. Specifically:

“Making government data available online in universally accessible formats to allow citizens to make use of that data to comment, derive value, and take action in their own communities. Greater access to environmental data, for example, will help citizens learn about pollution in their communities, provide information about local conditions back to government and empower people to protect themselves.”

The beauty is that we (the collective GeoWeb) have so many of these tools already built. The ability to deliver the data once it is made easily available has great promise. For instance here is EPA data on power plant emissions from GeoCommons:

From the map above you can see which power plants are producing the most poisonous CO2 emissions (click the down caret on the layers box for the filter) or zoom into your specific neighborhood to see the plant and the type of environment around it. (We are still refining the embed capability, but it is an example of how data can be virally spread.)

The report goes on to recommend:

Establishing pilot programs to open up government decision-making and involve the public in the work of agencies, not simply by soliciting opinions, but by tapping into the vast and distributed expertise of the American citizenry to help government make more informed decisions.

This strikes again at the heart of the GeoWeb - enabling collaboration of experts and citizens across the country. Several projects and companies have pioneered dynamic collaboration around maps. Below is a Google MyMap with feedback around the GeoCommons power plant data in Florida:



The blue push pins are the user generated feedback linking to expert opinion and photos from the field. This is just the tip of the iceberg of what is possible with collaboration around maps. These approaches can also be leveraged inside of government agencies, which is another plank in the Obama technology platform:

“Employing technologies, including blogs, wikis and social networking tools, to modernize internal, cross-agency, and public communication and information sharing to improve government decision making.”

We’ve seen a lot of this type of work going on in the intelligence community with Intelink, Intellipedia, and A-Space. There are also data fusion and sharing concepts, like the EPA’s Central Data Exchange. I’d love to hear about other projects that fit in with the three planks, and more importantly about existing or planned GeoWeb technologies that could help enable the new vision. I’ve really only highlighted two and I know there are tons more out there.


On one of many flights this week I was asked the question, “what would you do with the $700 billion of bailout money?” It is not an easy question to answer and there has been lots of armchair quarterbacking on the topic. I’m hardly an expert on financial policy, but in short this was my layover-induced answer.

There seem to be two fundamental problems, of many, worsening our current economic quagmire. 1) The housing bubble pushed home prices to levels most working Americans could not afford and to keep the bubble going the financial community became very creative with mortgages and how the risk associated with them was calculated. The end result was lots of people in houses they could not really afford and very little transparency in the risk this created in the financial markets. There is a lot more to the story but for the sake of brevity we’ll leave it at that. 2) Credit liquidity in the current market has almost ossified causing our collective economic gears to come to a rattling halt. Wall Street freaks…the media freaks…the consumer freaks (no spending)…sales of goods plummet…Wall Street freaks again…media fuels more freaking…rinse and repeat.

To break the cycle it would seem logical that liquidity needs to be injected into the market. A lot of pundits have looked at this being solved by the government buying up the bad assets, giving capital to the banks in return for equity stakes, and several other derivative plans. While all these ideas have their merits and risks, the idea I espoused on the plane was slightly different. Back to the core issues - I saw the biggest failings as a lack of market transparency and a fundamental mismatch between supply and demand in the housing market. So how could we restore transparency to the market while getting people into homes they can actually afford, thus freeing capital for consumer spending and financial investment?

My answer was a foreclosure clearing house. This may be Pollyanna and not feasible, but it made for a fun intellectual exercise. There has been lots of talk around providing bailouts to people whose homes are in foreclosure, but even this will be short term and will not solve the fundamental problem that they are in a home they cannot afford. The only real solution is to put these individuals and families into homes they can afford. The easy credit and risk shell game that banks ran let people buy housing with purchasing power they did not really have, creating a basic mismatch between supply and real demand.

The clearing house is a simple idea: provide a transparent marketplace where people can trade down to houses they can afford and have new loans guaranteed to do so. The loans could be guaranteed by the government but competed for by the banks. Banks that already hold the mortgages on existing properties could have the choice of refinancing the house so the owner could afford the payments (that would be their own risk calculation) or entering the home into the clearing house. The homeowner could also have the choice to enter their home into the clearing house if they would like to trade down voluntarily.

The clearing house itself could run like many of the existing online real estate marketplaces that match buyers and sellers (Zillow, Trulia, Redfin, etc.). In fact the government could probably contract with one of the sites to run the technology side of the clearing house at a reasonable cost. Once a person’s home was identified for purchase they would then be free to look for a new home in the clearing house they could afford. The government backing would allow loans to be made so the individual, now free of the foreclosed home, could buy a new home they could afford. Banks would still compete to provide the best rate and terms to the new owner, but the risk would all be transparent to the government, since it would be providing the financial backing, and to the owners, so they were not misled into buying more house than they could afford (again).

In theory this should introduce liquidity back into the market and with a little time put liquidity back into the consumer market since the majority of a person’s paycheck would no longer be going to a mortgage. The market would be transparent again but not run or partially owned by the government. I would argue that it was not capitalism or the market economy that broke during this financial crisis, but a loss of transparency and a resulting hiding of risk. In fixing the crisis the government’s role should be ensuring transparency in the marketplace so that it can function effectively. My idea is most likely off the deep end, but I do hope government action is centered around restoring transparency and restoring liquidity to the market. If you were Sec. Paulson for a day what would you do with $700 billion? There is no shortage of smart people around the globe. Can we crowdsource an answer?


While getting ready to launch Finder! we had an internal debate about whether or not to put limits on dataset downloading. There were several options, ranging from requiring a user to be logged in before downloading to limiting the number of downloads a user could make in a day. A lot of the argument centered around the value of raw data - echoing the O’Reilly manifesto that “data is the Intel inside”. This belief holds that the value of the NAVTEQs and Tele Atlases of the world is derived from the proprietary data they have collected.
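To make the per-day limit option concrete, here is a minimal sketch of what such a check might look like. It is an illustration only, not how Finder! works: the `DailyDownloadQuota` class, its in-memory counter, and the limit value are all hypothetical.

```python
from collections import defaultdict
from datetime import date


class DailyDownloadQuota:
    """Track downloads per user per calendar day (hypothetical helper)."""

    def __init__(self, max_per_day):
        self.max_per_day = max_per_day
        # Maps (user_id, day) -> number of downloads recorded so far.
        self._counts = defaultdict(int)

    def try_download(self, user_id, today=None):
        """Record a download and return True, or return False if over quota."""
        day = today or date.today()
        key = (user_id, day)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True
```

A real deployment would need persistent storage and a notion of identity for anonymous users, which is part of why a limit like this is tied to the login requirement mentioned above.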

One side of the company felt that by not limiting access to data we were giving away the family jewels. The other side felt that open access was the best way to create a network effect for data by making it as accessible as possible. At the end of the day the open access philosophy prevailed, and from the sound of comments to James Fee’s post after GeoWeb, access to data is still an important facet to both GIS and GeoWeb users.

Now that Finder! has been out for a little while we’ve begun to see a big surge in downloads. I noted last week we hit 18,000 downloads and just a week later we are now over 28,000. This has caused us to take a second look at our access policies. “Knock on wood”, the system has scaled like a champ handling the traffic, but as we get ready to launch Maker! some concerns have come up about potential abuse and its effect on user experience.

The biggest concern is around systematic downloading of data and the potential for that to impact other users’ experience on the site. The question is how to make the content available without impinging on the collective user experience. Wikipedia approaches this by making content available as one big tarball and asks users “Please do not use a web crawler to download large numbers of articles. Aggressive crawling of the server can cause a dramatic slow-down of Wikipedia. Our robots.txt blocks many ill-behaved bots.”

I’m not sure a giant tarball of data is the best way to go for us, especially since the data is available in a variety of formats. A second option is to provide third party access to the data via an API. This API could work for both download and upload. Andrei had an interesting suggestion in our last post:

“The two-way API will definitely help with the number of uploads. The cool thing to do, would be to add (“Add to Finder!”) a URL request:

…finder.com/add?file=file.kml&type=kml&name…”
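As a rough sketch of the idea in Andrei's comment, a client could assemble such a request with standard URL encoding and the server could read the parameters back. The parameter names `file`, `type` and `name` come from the comment; the host in `BASE_URL`, the function names, and the example values are hypothetical, since the real URL is elided above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the real host and path are elided in the comment above.
BASE_URL = "https://finder.example/add"


def build_add_url(file_url, file_type, name):
    """Build an 'Add to Finder!' style request URL with escaped parameters."""
    query = urlencode({"file": file_url, "type": file_type, "name": name})
    return f"{BASE_URL}?{query}"


def parse_add_url(url):
    """Server side: recover the parameters from such a URL."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}
```

Encoding the parameters matters because the `file` value is itself a URL and the `name` value may contain spaces.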

If people have other ideas on how they could better access the data in bulk without impinging on performance we’d love to hear them. We would also welcome thoughts on where the line is between fair use of content and abuse of the commons. It is a bit of a gray line in my mind. Is systematic downloading (manually hitting every dataset) abusive? Is scraping datasets with bots abusive? The main goal in my mind is to provide the best service possible without creating a “tragedy of the commons”.


GeoCommons Metadata Implementation Screenshots

April 22nd, 2008 by Sean Gorman

We got such useful feedback from the last metadata post I thought I would add some screen shots of how it is starting to come together. Unfortunately we were not able to get all the suggestions in because of the time crunch hitting our release date, but please keep posting the feedback and we’ll work it in as we have more time.

The first screen shot is of the data details page, which contains the metadata information for the data set. In this case 2000 US Census data at the tract level for Alabama:

[Screenshot: finder_data_page]

Here you can see the major elements we are capturing in a user-friendly graphical layout. One of the cool new bits is that the system automatically calculates statistics when you upload the data. Being able to data mine and run statistics on the fly is one of the new developments we are particularly excited about.
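For a sense of what calculating statistics at upload time can involve, here is a small sketch that summarizes the numeric columns of an uploaded CSV. This is an assumption about the general approach, not the actual Finder! code; the function name and the choice of statistics are made up for illustration.

```python
import csv
import io
import statistics


def summarize_numeric_columns(csv_text):
    """Return min, max, mean and median for every fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    if not rows:
        return summary
    for column in rows[0]:
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # Skip columns with text or missing values.
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
    return summary
```

Columns that contain any non-numeric value, such as a tract identifier, are simply left out of the summary in this sketch.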

All the metadata on the data details page is exposed as Dublin Core elements which should make them machine readable to the rest of the world:
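One common way to expose Dublin Core elements in a web page is as `<meta>` tags with a `DC.` prefix plus a `<link>` declaring the schema. The sketch below assumes that convention; the post does not say exactly how the Finder! pages are marked up, and the `dublin_core_meta` helper and sample field values are hypothetical.

```python
from html import escape

# Namespace URI of the Dublin Core element set.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(fields):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, value in fields.items():
        lines.append(
            f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
        )
    return "\n".join(lines)
```

Calling `dublin_core_meta({"title": "...", "coverage": "Alabama"})` yields one tag per element, which a crawler can read without parsing the visible page layout.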

[Screenshot: finder_view_source]

Also there are links to FGDC and ISO 19115 metadata mappings which take you to simple text pages with the indicated information. We probably need another pass to get these completely correct, but the infrastructure is all in place to do so.

FGDC looks like this:

[Screenshot: Finder_FGDC]

ISO 19115 looks like this:

[Screenshot: Finder_ISO]

Hopefully this will help make the data in GeoCommons useful to multiple geospatial workflows. We hope having the ability to get data out as shapefile, KML, and CSV (spreadsheets) will create more cross-fertilization between GeoWeb and GIS users. With some luck it can help get more geospatial data out to the public that has been difficult to access in the past. A couple of examples are below.

US Census Tract Data for Alabama

[Map: Alabama Census Tract]

Global Maritime Shipping Lanes

[Map: Global Shipping Lanes]

Zillow Neighborhoods and Shipping Lanes (just because it looked kinda cool)

[Map: SF_neighborhoods]

Thanks again for the feedback from folks on the metadata and we’ll keep iterating on getting it spot on.


When we started the very first iteration of GeoCommons in 2005, folksonomies were all the rage and we jumped on board, using tags to organize the geospatial data that was pushed into the new platform. During the time we had the prototype deployed we ran into many of the same issues other applications have found with folksonomies:

1) people’s tags may be difficult for others to understand,
2) people may have tagged items inappropriately for others’ needs.

In short, your users will not always implement tags in ways that are productive for the community - in the extreme resulting in Flickr’s 20 million unique tags. How many of those 20 million tags are misspelled words, or so far off the path they never get found?

In addition to the problems you encounter with folksonomies in general, you have the further complications of geospatial data. First, all geospatial data sets have location tags, but adding them in an unstructured way creates enough chaos that it is very difficult to leverage location tags in a thorough way. Second, many potential users do not know the variety of geodata available. Put more simply, they do not know what to search for, and having the ability to browse through data by topic is appealing.

Despite the downsides of folksonomies they are incredibly powerful and have been hugely effective in organizing vast amounts of data on the web. So, as we worked on the next iteration of GeoCommons we started looking at possible hybrid approaches to folksonomies and hierarchies.

Specifically we looked at the two problems specific to geospatial data listed above: 1) place tags and 2) organizing data for browsing. Solving the problems required both short term and long term solutions.

Fortunately we had a small advantage over many crowdsourced projects in that we have a full-time data team. They are a great group of folks who spend their day finding cool geodata and coming up with clever ways to organize it.

Through the data team and the other community members that contributed data to the first iteration of GeoCommons we had a big pool of data with a wide variety of tags to examine. What we found were some distinct trends in the tagging and titling of data. Across the data there was a common set of tags that broke the data up into a useful set of distinct categories, but there were also many data sets tagged with elements that often made them undiscoverable. After the analysis we started to look at structures we could establish to help create self-similarity in tagging that still had the flexibility to be adaptive.

The result was the creation of a location and topical taxonomy based on our existing corpus of data that has the intelligence to adapt as the content grows and evolves. I can’t go into the technical details in depth, but fundamentally the concept is to intelligently leverage the taxonomies and structures to provide suggestions to users to tag their data better.

In many cases this can be very simple - like providing tips on how to tag and title effectively to make your data more valuable to the community. For instance with titles we found across GeoCommons there were four key pieces of information used for datasets in the past.

1) Source name, 2) Original name of dataset from source (or short description of dataset), 3) Geographic area, 4) Time period of data

Examples:

  • OECD, Information and Communication Technology, Global, 2007
  • USGS, Earthquake Records, Worldwide, 1998-2007
  • NOAA, Hurricane Track Data, North America, 1851-2004
Communicating this effectively to users is a great way to get better consistency across data contributions, while still allowing flexibility for users to be creative and bring in information that does not fit the rigid mold of a hierarchy. Of course this is the simplest approach and you can get far more clever.
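As one illustration of how the four-part title convention could feed a tip to the contributor, the sketch below checks whether a title splits into source, name, area and time period. It is a simplification made up for this example (the `parse_title` function and the year pattern are assumptions), and it would reject valid names that themselves contain commas.

```python
import re

# Matches a single year ("2007") or a year range ("1998-2007").
PERIOD_PATTERN = re.compile(r"^\d{4}(-\d{4})?$")


def parse_title(title):
    """Split 'Source, Name, Area, Period' into a dict, or return None."""
    parts = [part.strip() for part in title.split(",")]
    if len(parts) != 4 or not all(parts):
        return None
    source, name, area, period = parts
    if not PERIOD_PATTERN.match(period):
        return None
    return {"source": source, "name": name, "area": area, "period": period}
```

A `None` result would be the cue to show the user a hint about the recommended title format rather than to block the upload.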

Del.icio.us for instance has a great feature that notifies a user when they are putting in a new tag no one has used before and asks if that is what they meant to do. You can also suggest tags from your taxonomy that are semantically related to the data the user is contributing. This creates a consistency across tags that makes data easier to find as the system scales to larger volumes.
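A new-tag check of this kind can be approximated with simple string similarity. The sketch below is not how Del.icio.us or GeoCommons does it; `check_tag` and its cutoff are hypothetical, and it only catches likely misspellings, not semantically related tags.

```python
import difflib


def check_tag(new_tag, known_tags, cutoff=0.8):
    """Return (is_new, suggestions) for a tag a user is about to add."""
    normalized = new_tag.strip().lower()
    known = {tag.lower() for tag in known_tags}
    if normalized in known:
        return False, []
    # Close matches catch likely misspellings of existing tags.
    suggestions = difflib.get_close_matches(
        normalized, sorted(known), n=3, cutoff=cutoff
    )
    return True, suggestions
```

When `is_new` is true and there are suggestions, the interface can ask the user whether they meant one of the existing tags before creating a new one.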

The nice thing about taxonomies as opposed to folksonomies is that they can be structured as trees, which means you can compute across them quite easily. With a solid and adaptive taxonomy in place you can go a long way in intelligently guiding users towards creating better and more consistent tags. At least that is what we think and it will be fun to see how it works out after the launch.
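To show why a tree is easy to compute across, here is a minimal sketch of a taxonomy node that can report its path from the root and walk everything beneath it. The class and the sample topics are illustrative assumptions, not the GeoCommons taxonomy.

```python
class TaxonomyNode:
    """One topic in a tree-shaped taxonomy (illustrative structure only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return topic names from the root down to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def descendants(self):
        """Yield every topic below this node, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()
```

With this structure, browsing a topic means listing `descendants()`, and suggesting broader tags for a dataset means reading off `path()`, neither of which is possible with a flat bag of free-form tags.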
