Dataset of the Day: The Housing Bubble and Spatial Irrational Exuberance
August 13th, 2008 by Laurie Schintler
The wild stock market run of the 90’s and the more recent boom in the housing market share a similar property: irrational exuberance. In his book “Irrational Exuberance”, Yale Economics Professor Robert Shiller describes the phenomenon “as a situation in which news of price increases spurs investor enthusiasm, which spreads by psychological contagion from person to person, in the process of amplifying stories that might justify the price increases and bringing in a larger and larger class of investors, who, despite doubts about the real value of an investment are drawn to it partly through envy of others’ successes and partly through a gambler’s excitement.”
In the statistics world, irrational exuberance manifests itself in the numbers as something called serial autocorrelation. Simply put, that just means that the prices we see today were largely determined by what we saw yesterday. But what about the role of space, particularly in real estate markets, where location is such a prominent force in dictating the price of assets? In a speculative housing bubble, is there a spatial dimension to the psychological contagion that Shiller says is a defining characteristic of irrational exuberance?
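To make serial autocorrelation a little more concrete, here is a minimal Python sketch of the lag-1 version: the correlation between each value in a series and the value just before it. The function name and the toy series are made up for illustration; they are not taken from the HPI data.

```python
def lag1_autocorrelation(series):
    """Lag-1 serial autocorrelation of a sequence of numbers.

    Values near +1 mean today's value tends to follow yesterday's;
    values near -1 mean the series tends to flip back and forth.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / n
    # Total squared deviation from the mean (the denominator).
    total = sum((x - mean) ** 2 for x in series)
    if total == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Cross-products of each observation with the one before it.
    cross = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return cross / total


# Hypothetical toy series, not real price data.
steadily_rising = [1, 2, 3, 4, 5]
flip_flopping = [1, -1, 1, -1]

print(lag1_autocorrelation(steadily_rising))  # 0.4
print(lag1_autocorrelation(flip_flopping))    # -0.75
```

A series that keeps building on its own past comes out positive, which is the statistical fingerprint the paragraph above describes.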
To begin to explore this, we look at an index of housing prices that the Office of Federal Housing Enterprise Oversight (OFHEO) publishes quarterly for a large number of MSAs across the United States. The index, called the HPI, reflects the average price changes in repeat sales or refinancing of single-family homes whose mortgages were financed by Fannie Mae or Freddie Mac. While the HPI has been criticized for providing a dampened perspective of market prices, owing to the $400k cap on Fannie and Freddie loans, it does offer the best spatio-temporal coverage out of all publicly available sources of information on housing prices.
The maps in the slideshow below, which show percent changes in the third-quarter HPI by MSA going back to 1988-89, do provide some evidence of spatial contagion. (Note: on each map, warmer hues indicate price appreciation, while at the other end of the continuum the darker colors mark areas of depreciation. All maps are on the same scale, i.e., in reference to the minimum and maximum change in HPI over the 19 time periods.) Towards the beginning of this decade, areas of northern California and the Northeast begin to heat up, quickly spilling over into more locations along the east and west coasts and Florida, and then to cities inland. The once-hot markets eventually take a sharp downturn in 2006-2007.
How can we measure the spatial contagion? One way is through the Global Moran’s I statistic, which is the spatial counterpart to a measure of serial autocorrelation. It is a number that generally ranges from -1 to 1, where values in negative territory indicate a checkerboard pattern in the phenomenon being analyzed and those on the positive side just the opposite: spatial clustering of units with similar values, whether high or low.
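For readers who want to see the arithmetic, here is a minimal Python sketch of Global Moran's I. It assumes a simple binary "neighbor / not neighbor" weights matrix and four made-up units arranged in a row; the function name, the weights, and the values are all illustrative assumptions, not the MSA data or the weighting scheme behind the figure in this post.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix.

    weights[i][j] > 0 means unit j counts as a neighbor of unit i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights (often written S0).
    s0 = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighboring units.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    # Total squared deviation from the mean.
    total = sum(d * d for d in dev)
    if s0 == 0 or total == 0:
        raise ValueError("need some neighbors and some variation in values")
    return (n * cross) / (s0 * total)


# Four hypothetical units in a row; each is a neighbor of the unit next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1, 1, 5, 5]     # similar values sit side by side
checkerboard = [1, 5, 1, 5]  # high and low values alternate

print(morans_i(clustered, chain))     # positive (about 0.33)
print(morans_i(checkerboard, chain))  # -1.0
```

Real analyses usually lean on a spatial statistics package and a carefully chosen weights matrix, but the toy case shows why clustering pushes the number up and a checkerboard pushes it down.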
Shown in the figure below is the Moran’s I statistic computed for each time period on the lags of the changes in HPI. It is important to keep in mind that the values of the statistic during the bubble may be lower than they should be, owing to the limitations the loan cap places on the HPI. Even so, there is a sharp and rapid rise during that period, suggesting that irrational exuberance in real estate markets may in fact involve a spatial dimension.
More data can be found here:
OFHEO Housing Price Index (HPI) by MSA/CBSA - 3d Quarter (1988-2007)
Housing Variables (Prices/Supply/Demand) by MSA (2000-2007)
Dataset of the Day: The New Clear Solution
August 12th, 2008 by Brian Gopalan
With so much talk about energy policies on the Presidential campaign trail, I decided to look into the nuclear option that both candidates support to different degrees. Lo and behold, Finder! had a listing of nuclear power plants. Now that we have all these nuclear power plants, with promises of more to come, what’s next? Hmmm, yes, we need to store the highly radioactive spent fuel somewhere.
Senator McCain’s most frequent example, that the US Navy uses nuclear-powered vessels safely all the time, was somewhat debunked by news that emerged from the USS Houston, a globe-trotting nuclear-powered submarine that has been leaking radiation for the past several years. Listening to Senator McCain here on YouTube describe how we will have nuclear waste lying around in a puddle of water on our street corner reminded me of the Simpsons movie, where Mr. Burns could roll down nuclear waste in a truck and dump it into the lake just outside Springfield.
Of course the next thing I wanted to check was where we dump the spent nuclear waste. I added this dataset to Finder! with all the spent fuel storage installations.
The map below indicates where these dumping grounds (a.k.a. storage installations) are located. It also shows the locations of nuclear power plants from the dataset I found in Finder!
Given the spatial concentration of most of the nuclear power plants, as well as the current storage sites, on the Eastern seaboard, it is intriguing to note that millions of taxpayer dollars have been spent on building a mega-storage facility at Yucca Mountain in Nevada. This facility is still not ready, long past its scheduled 1998 opening date. Once it does open, then comes the question of transporting all these highly radioactive wastes across the country. This article points to how Senator McCain says “no” to allowing nuclear waste to be transported through Arizona, the state that got him elected, but supports building the Yucca Mountain site in Nevada.
Dataset of the Day: Beijing’s Good and Bad Air Days
August 10th, 2008 by Raj Kulkarni
Sean mentioned in his blog how, by pooling their efforts, Andrew, Sean, Bill et al. (the Fortifacture/MapuCommons folks) were able to bring you near-real-time pollution data from Beijing in record time. As we were working on this, we realized that there is a huge difference between the perceptions of the host nation and those of most of the western world/media on what constitutes a severe air quality problem. For example, see the two pics below, both dated 5th August, 2008. One shows Beijing with “Clear skies” while the other has haze/smog blanketing Beijing. One wonders whether they are talking about different places and different days!
Xinhua Photo

The photo taken on Aug. 5, 2008 shows the clear sky above the National Stadium, namely the Bird’s Nest, in Beijing, capital of China. (Xinhua Photo/Li Ziheng)
BBC Photo

5 August PM10 reading: 104 micrograms per cubic metre. The World Health Organisation guideline maximum is 50 micrograms per cubic metre, averaged over 24 hours.
Knowing that many countries in Asia, including India and China, share the dubious distinction of having the most polluted cities in the world, it should come as no surprise that the media are obsessed with hazy skies and that much of the media coverage of the Beijing Olympics has been about the quality of the air. See, for example, this split picture of the Beijing skyline on a clear and a hazy day on the BBC’s Beijing Pollution: Facts and Figures.
For the last several weeks, the BBC has posted a daily pic of the Beijing skyline, with a running commentary on the hazy conditions, on their Beijing Pollution Watch site. So we at FortiusOne/Mapufacture decided to generate a daily map of the official stats on PM10 published by the Beijing Municipal Environmental Protection Bureau (BMEPB) and compare it with the BBC’s Beijing Pollution Watch. PM10, the airborne particles of size 10 microns or less consisting of dust from construction, landfill sites, vehicle exhaust, industrial sources etc., is the main culprit behind the hazy skies/bad air days in Beijing.
The map below is based on the air quality monitors spread across dozens of Beijing districts, along with the locations of Olympic events (red circles). The six slices of each pie chart show the share of PM10 at each location on each day from 5th to 10th Aug, 2008.
The second map shows today’s readings of PM10 (purple proportional circles) for each of the air quality monitoring stations, along with a pie chart showing the shares of SO2, PM10 and NO2.

For comparison, see BBC’s pic of the same day below.

BBC: 10 August PM10 reading: 278 micrograms per cubic metre. We test for 10 minutes at midday from a seventh floor balcony in central Beijing.
While nearly half a dozen official air quality monitoring stations nearby show readings near 90, the air quality has apparently not had an adverse effect on the athletes thus far in the games. As the BBC offers daily pics of the smog, we will have daily updates on the air quality all through the Olympics. In the meantime, you may explore on Finder! the air quality data (SO2, NO2 and PM10) for the last six days, i.e., 5th to 10th August, 2008, the road network, and the district polys, as well as the Olympic athletic venues and Olympic village. Search using the keyword “Olympics.” You are welcome to download, add, update and upload these data back to Finder!
Dataset of the Day: Here Come the Olympics!
August 1st, 2008 by Kevin Burke
The 2008 Summer Olympics are coming to Beijing, China on 8.8.08 and USA athletes are poised to place very well in the international competition. For the past four years athletes have been practicing for the games and now the time has come to represent the USA.
A dataset was created on Finder! that maps the hometowns of all the USA Summer Olympic athletes competing this summer.
“USA 2008 Summer Olympic Athlete Hometowns”
The map above shows points that represent individual athletes and their hometowns in the lower 48 states. From looking at the map you can see that there are a few “Olympic Athlete Hotspots.” Some include the Los Angeles area, the San Francisco area, and the Philadelphia area.
You can also use this map to see if any of the athletes are from your own hometown. If any are, you can then cheer for your hometown athletes as they compete in Beijing.
There are also a few other datasets on Finder! that deal with the Olympic games. They include:
“All-Time Medal Count by Country, Global, 1900-2006”
“US Olympic Gold Medals Per State, USA”
“US Olympic Gold Medal Winners – Track and Field – by Hometown, USA”
All these datasets show us a unique way to look at sports data through the use of maps.
Ethics of Crowdsourcing - What Constitutes an Abuse of the Commons
July 29th, 2008 by Sean Gorman
While getting ready to launch Finder! we had an internal debate about whether or not to put limits on dataset downloading. There were several options, ranging from requiring a user to be logged in before they downloaded to limiting the number of downloads a user could make in a day. A lot of the argument centered around the value of raw data, echoing the O’Reilly manifesto that “data is the Intel inside”. This belief holds that the value of the NAVTEQs and TeleAtlases of the world is derived from the proprietary data they collected.
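For the curious, the "limit downloads per user per day" option we debated could be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the in-memory counter, and the cap of two downloads are all hypothetical, and it is not how Finder! is actually built.

```python
from collections import defaultdict
from datetime import date


class DailyDownloadLimiter:
    """Hypothetical per-user daily download cap, kept in memory."""

    def __init__(self, max_per_day):
        self.max_per_day = max_per_day
        # Maps (user_id, day) to the number of downloads made that day.
        self._counts = defaultdict(int)

    def try_download(self, user_id, today=None):
        """Record one download and return True, or return False if over the cap."""
        day = today if today is not None else date.today()
        key = (user_id, day)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True


# Illustrative usage with a made-up cap of two downloads per day.
limiter = DailyDownloadLimiter(max_per_day=2)
day_one = date(2008, 7, 29)
print(limiter.try_download("alice", day_one))  # True
print(limiter.try_download("alice", day_one))  # True
print(limiter.try_download("alice", day_one))  # False
```

A real service would need to persist the counts and decide what to do about anonymous users, which is part of why a policy like this is a bigger commitment than it first looks.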
One side of the company felt that by not limiting access to data we were giving away the family jewels. The other side felt that open access was the best way to create a network effect for data by making it as accessible as possible. At the end of the day the open access philosophy prevailed, and from the sound of comments to James Fee’s post after GeoWeb, access to data is still an important facet to both GIS and GeoWeb users.
Now that Finder! has been out for a little while we’ve begun to see a big surge in downloads. I noted last week we hit 18,000 downloads and just a week later we are now over 28,000. This has caused us to take a second look at our access policies. “Knock on wood”, the system has scaled like a champ handling the traffic, but as we get ready to launch Maker! some concerns have come up about potential abuse and its effect on user experience.
The biggest concern is around systematic downloading of data and the potential for that to impact other users’ experiences on the site. The question is how to make the content available without impinging on the collective user experience. Wikipedia approaches this by making content available as one big tarball and asks users “Please do not use a web crawler to download large numbers of articles. Aggressive crawling of the server can cause a dramatic slow-down of Wikipedia. Our robots.txt blocks many ill-behaved bots.”
I’m not sure a giant tarball of data is the best way to go for us, especially since the data is available in a variety of formats. A second option is to provide third party access to the data via an API. This API could also work for both download and upload. Andrei had an interesting suggestion in our last post:
“The two-way API will definitely help with the number of uploads. The cool thing to do, would be to add (”Add to Finder!”) a URL request:
…finder.com/add?file=file.kml&type=kml&name…”
If people have other ideas on how they could better access the data in bulk without impinging on performance, we’d love to hear them. We’d also welcome thoughts on where the line is between fair use of content and abuse of the commons. It is a bit of a gray line in my mind. Is systematic downloading (manually hitting every dataset) abusive? Is scraping datasets with bots abusive? The main goal in my mind is to provide the best service possible without creating a “tragedy of the commons”.
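On the bot side of that line, the robots.txt approach Wikipedia mentions gives well-behaved crawlers a way to check the rules before fetching anything. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the example.org URLs are all made up for illustration; they are not Finder!'s or Wikipedia's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one bot is banned outright, everyone else is
# asked to stay out of /download/ and to wait 5 seconds between requests.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /download/
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite bot may browse dataset pages but not the bulk download paths.
print(parser.can_fetch("PoliteBot", "https://example.org/datasets/123"))      # True
print(parser.can_fetch("PoliteBot", "https://example.org/download/123.kml"))  # False
# The banned bot is not allowed anywhere.
print(parser.can_fetch("BadBot", "https://example.org/datasets/123"))         # False
# Requested pause between requests, in seconds.
print(parser.crawl_delay("PoliteBot"))                                        # 5
```

Of course, robots.txt only works for bots that choose to honor it, so it addresses the etiquette question more than the enforcement one.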