A Proposal for a GeoWeb Metadata Implementation

April 1st, 2008 by Sean Gorman

One of the criticisms we received when we launched GeoCommons was the lack of metadata for the content we had collected. Since then we’ve been looking into what would be a reasonable approach to implement metadata for the GeoWeb.

When it comes to GIS data, the existing standard is the FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM). The standard calls for 335 metadata elements to describe a geospatial data set, covering a wide variety of descriptions for the data. One thing became clear very quickly: the FGDC CSDGM is far too onerous and outdated for the GeoWeb. For instance, in the FAQ provided by the USGS, they recommend you hire a full-time person to create your CSDGM-compliant metadata:

Who should create metadata?
“Data managers who are either technically-literate scientists or scientifically-literate computer specialists. Creating correct metadata is like library cataloging, except the creator needs to know more of the scientific information behind the data in order to properly document them. Don’t assume that every -ologist or -ographer needs to be able to create proper metadata. They will complain that it is too hard and they won’t see the benefits. But ensure that there is good communication between the metadata producer and the data producer; the former will have to ask questions of the latter….”

In a GeoWeb where self-publication is a key innovation, the model of having a full-time metadata guru is antiquated. A specification with 335 elements is antiquated. The mantra that “certainly if there is no pain, there will likely be no gain” when it comes to metadata is antiquated. The end result of these draconian approaches to metadata is close to zero likelihood that the GeoWeb will implement them.

This is a shame because metadata is very useful, especially when it comes to describing, finding and federating data. This is one of the shortcomings of KML, which has little or no metadata (although several people argue that metadata has no place in either KML or GeoRSS). GeoRSS has limited metadata support through its “Feature Type Tag” and “Relationship Tag”, which are useful but fairly confined.

The question we faced in rebuilding GeoCommons: is there a middle ground between 335 elements and two? Fortunately, we were not the first to look at this issue. In 1995 a bunch of librarians got together to devise an approach that “provides a simple and standardized set of conventions for describing things online in ways that make them easier to find”. The fifteen-element standard they devised is called Dublin Core, and it is widely implemented across the web. If the librarians could come up with 15 core elements, then surely the GeoWeb can too, and can even map those elements to both the Dublin Core and the FGDC CSDGM standards. So, after a good bit of work, here is what we would like to implement as a lightweight core set of metadata for GeoWeb data:

[Table: proposed core metadata elements for GeoWeb data, with their FGDC and Dublin Core mappings]

This covers seventeen elements, about half of which we trap automatically. You can map them to either FGDC or Dublin Core, giving you a straightforward way to expose your data to both the GIS world and the general web community. As with any metadata standard, you do not need all seventeen elements, but the more you populate, the more useful the data becomes. The metadata could be exposed as microformats, enabling a number of possibilities for discovery and potential federation. This could be particularly interesting now that Yahoo! is opening up its search to support Dublin Core vocabularies and microformats. Our feeling is that the more data we can make available on the web, the more problems everyone can solve. We’ll be testing this out when we launch the next iteration of GeoCommons at Where 2.0, and it would be great to get feedback and thoughts on the approach.
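The post suggests exposing the metadata as microformats. A closely related, well-documented option is the DCMI convention for embedding Dublin Core in HTML meta elements, and the sketch below uses that convention rather than a microformat proper. It is a minimal illustration under stated assumptions, not GeoCommons code: the function name, the sample record and its values are hypothetical. Fields that are not among the fifteen Dublin Core elements, such as a bounding box, are returned in a separate list, since Dublin Core would have to carry spatial extent through its coverage element.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def dublin_core_meta_tags(record: dict[str, str]) -> tuple[str, list[str]]:
    """Render the Dublin Core fields of `record` as HTML <meta> tags.

    Follows the DC-in-HTML convention: a <link rel="schema.DC"> line declares
    the "DC." prefix used in each <meta name>. Keys that are not Dublin Core
    element names (compared case-insensitively) are not rendered; they are
    returned in a second list so they can be exposed some other way.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    skipped: list[str] = []
    for key, value in record.items():
        element = key.lower()
        if element not in DUBLIN_CORE_ELEMENTS:
            skipped.append(key)
            continue
        # quote=True also escapes double quotes, so the value is safe in the attribute.
        lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(lines), skipped


# Hypothetical record; the values and the "Bounding Box" key are illustrative only.
record = {
    "Title": "County boundaries",
    "Creator": "Example Agency",
    "Format": "Shapefile",
    "Bounding Box": "-79.5,37.9,-75.0,39.7",
}
markup, skipped = dublin_core_meta_tags(record)
print(markup)
print(skipped)  # ['Bounding Box']
```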


20 Responses to “A Proposal for a GeoWeb Metadata Implementation”

  1. Greg L Says:

    Great post, I would agree with you - the concept that “You will give us all metadata content or nothing!” - or even just providing too many options can intimidate people from even starting to document their information.

    BTW, I noticed your list leaves off the Dublin Core element “TYPE”, which might be combined with FORMAT into something valuable, e.g. TYPE: Colour Orthophoto with FORMAT: TIFF. I see you also reference the element COVERAGE twice.

    It is difficult to determine which COVERAGE reference is more valuable: BOUNDING BOX or PLACE/LOCATION tags. Depending on the type of data and service provided, you might forgo “bounding box”, since it can be determined from the data or the capabilities info.

  2. Sean Gorman Says:

    Thanks for the feedback. I think type would blend well with format and cover both; good catch and recommendation. The bit I liked about bounding box is that you can trap it automatically and do not have to ask the user for it. Overall we are trapping about half of the elements automatically, so the user workload stays pretty light. You can do the same with place tags as well, although a hybrid approach is probably best.

  3. Archaeogeek Says:

    Forgive me if I’m wrong, but isn’t there already a reduced set of elements based on Dublin Core, and part of the ISO 19115 standard (which your mega FGDC set is based on)? It would seem a little counterproductive to come up with yet another set of elements. I’m sitting looking at a table in a book, but it’s also here: http://portal.opengeospatial.org/files/?artifact_id=1094 (table 3).

    In short it contains extra elements like spatial resolution and lineage, which are surely essential for an assessment of suitability of the data and a paper trail as to where it came from.

    Best

    Jo

  4. Bruce Westcott Says:

    By making no reference to developments over the past 5 years in standardizing on ISO-19115 metadata, community profiles of this standard, and the supporting ISO-19139 schema, the author merits a grade of “Incomplete - more research required.”

    The ISO wizards have drawn on the legacies of Dublin Core and FGDC, and have a standard perfectly applicable for forward compatibility (of legacy FGDC and DC metadata), for profiling (including the “core” profile mentioned in another comment), and for global interoperability.

    And I hope we don’t waste more electrons bemoaning the onerous details standardized in FGDC or ISO: we’ve all heard it before. The ability to profile ISO-19115 to specific needs and subsets offers the way forward.

  5. Peter Schweitzer Says:

    What matters is whether you can answer the questions people need to have answers for. See http://geology.usgs.gov/tools/metadata/tools/doc/ctc/
    Let your users decide which of those questions they don’t need answered, that’ll tell you what you can leave out.

    Everyone who has actually worked with metadata knows that FGDC metadata records only use 75 to 100 or so fields, many of which are things like your name, your phone number, the bounding coordinates. So your complaint that the FGDC standard is too hard is a bit wearisome. It just isn’t true.

    Another thing to note is that strict conformance with the standard (that is, having all mandatory elements) is not necessary for many applications. If your metadata lacks Logical_Consistency_Report, that doesn’t mean it’s worthless or that it won’t work in a software system built to handle metadata. It only means you could have told users about some inconsistencies in the data, and now they’ll have to discover them on their own.

  6. Sean Gorman Says:

    Thanks for the feedback; I was hoping some folks from the GIS side of the fence would weigh in. I found references to the Denver Core when doing research, but not the list referred to by Archaeogeek. Looking at that list, there seems to be a lot of overlap, so at least it seems we are headed in the right direction. Although after a quick search I could not find anything that mapped those elements to Dublin Core.

    While I respect the work that has gone into the various GIS metadata standards, and understand the reasoning behind them, the simple fact is that no one is using them on the GeoWeb. So we are hoping to find a happy medium that is lightweight enough to foster adoption but structured enough to map to existing efforts.

    Sounds like adding and mapping to ISO-19115 would be a good step going forward, and we appreciate the feedback pointing it out.

  7. Lynda Wayne Says:

    It is a bit contradictory to complain that there are too many elements in the CSDGM and, in the same document, suggest that new elements are needed. The many elements in the CSDGM were added by individuals, much like the author, who required that the metadata address a specific data type (imagery, GIS files, surveying documents, databases with geo-references), metadata function (archive, discovery, accountability, liability) and/or application type (digital map, data model, website, etc.). The standard does not ‘call for 335 metadata elements to describe a geospatial data set’; instead it provides a set of standardized options that enable you to describe the data in a manner that best suits your documentation objectives. As the author demonstrates with the proposed ‘core set of metadata for GeoWeb data’, it doesn’t take much study (pain?) to determine which elements are of most value (gain?) to a particular mission.

    As stated by several others, if you want to document geospatial resources for the web then look to the more contemporary ISO 19115 standard and the US/CA joint profile of 19115 for guidance.

  8. GeoCommons metadata proposal Says:

    […] Read Sean’s original post here […]

  9. Sean Gorman Says:

    I’ve written up a longer response in a follow on blog post, but in short we are not proposing a new standard nor are we proposing new elements. We are proposing an implementation that will map data we collect to current standards. Just looking to make sure we collect the right elements and map them appropriately.

  10. Peter Schweitzer Says:

    The “Denver core” was never a serious proposal. It was just a discussion piece for a meeting in 1995.

    I don’t think anybody minds that you don’t want to use complete metadata–much of my own metadata omits some minor elements or even some major ones (quality reports for example) when we simply don’t have much to say in them.

    Our point is that you should try to make your stuff as compatible as possible with what others have done in the past rather than invent your own or redefine existing element names. So don’t put attribute information into Supplemental_Information, because there’s a better place for the same sort of thing (Entity_and_Attribute_Overview).
    If it’s structurally and semantically compatible, people can use your info in ways you might not anticipate now.

  11. Sean Gorman Says:

    Thanks for the Entity_and_Attribute_Overview tip. The good news is that we can update the mappings fairly easily. The tougher thing is adding new information requests to the online forms if you want consistency across datasets. It will be a big task updating our existing datasets with the new elements and we’d like to only have to do it once. Are there any elements that are missing people feel are super critical or exceptionally useful?

  12. Peter Schweitzer Says:

    Here are a few technical notes on your mapping, on the assumption that you might see data that had FGDC metadata, and you might want to mine the metadata record for info to put into your simplified scheme.

    FGDC has a Publisher field–that would map to Dublin Core’s Publisher more closely than Originator would; Originator is for the authors of the data. In most scientific data authorship is important for both good and bad reasons.

    So I would say DC Publisher -> FGDC Publisher

    Citation in FGDC is the bibliographic info for the item; it includes much more than the Dublin Core Creator field. The latter probably maps to the Originator fields within the Citation.

    So I would say DC Creator -> FGDC Originator(s)

    For User Name, your FGDC element needs to be Contact_Person within the Point_of_Contact (Contact_Information is used four different places in FGDC: Point_of_Contact, Process_Contact, Distributor, and Metadata_Contact)

    So I would say User Name -> FGDC Point_of_Contact:Contact_Person

    For Date like “date of these data” you probably want the Publication_Date in the Citation rather than the Time_Period_of_Content, though the complexity allowed by the latter illustrates how messy this concept can get.

    So I would say Date -> Publication_Date

    The best single FGDC element matching “Name of Source” would probably be Source_Citation_Abbreviation. I like to use a scientific-style citation there, like “Smith (1994)”, though it’s possible to use a much more cryptic value (like numbering them 1, 2, 3, etc.).

    So maybe Name of Source -> Source_Citation_Abbreviation

    If by Features you mean “are there points, arcs, polys, multigeometries, or what”, then the answer in FGDC is stored in SDTS_Point_and_Vector_Object_Type (and there can be more than one of these). FGDC doesn’t assume that the documentation describes only one geometry type; it’s often helpful to lump things under the same metadata record (makes it easier to maintain the metadata and easier for people to read too).

    So maybe Features -> SDTS_Point_and_Vector_Object_Type(s)

  13. James Says:

    You may be interested in a piece of work that my organization is currently undertaking on behalf of the UK academic sector. We are reviewing DC and ISO and essentially creating a DC profile that will leverage the existing ISO work. Given the way the wider world is going, it is important that whatever is used can play nicely with ISO, and we have a draft profile available here:
    http://www.ukoln.ac.uk/repositories/digirep/images/e/ef/Geospatial_Application_Profile.doc

  14. Sean Gorman Says:

    Thanks James - appreciate the pointer. After the blog post we did add an ISO mapping to the data page. It will be good to get feedback on whether it is done correctly. Hope to get up some screen shots later this week.

  15. Sean Gorman Says:

    Sorry Peter I completely missed this response. Greatly appreciate the mapping suggestions. Later this week we are going to post up some screen shots with our metadata pages and the different mappings we put together. We’ve altered what we originally put in the blog post so I’ll have to see how the suggestions map to the new stuff. It will all get posted up and would be great to get your thoughts.

    thanks,
    sean

  16. GeoCommons Metadata Implementation Screenshots | Off the Map - Official Blog of FortiusOne Says:

    […] got such useful feedback from the last metadata post I thought I would add some screen shots of how it is starting to come together. Unfortunately we […]

  17. Some notes and links for early summer Says:

    […] is an important addition to the stack on the strength of its metadata editing capabilities. As Sean Gorman pointed out a while ago, the existing specs for geospatial metadata are rather unwieldy for use on the geoweb. […]

  18. FortiusOne GeoCommons Finder is now Public Beta Says:

    […] saying that the great GeoCommons Finder! is now public beta. Many were very interested when it was announced so if you’ve been waiting for an invite to the private beta, you no longer have to wait. Sign […]

  19. Quality Assurance for Crowdsourced GeoData: Icons and Comments? | Off the Map - Official Blog of FortiusOne Says:

    […] this information is all available on any metadata page in Finder there is nothing that really covers if the data has been quality checked. One of the […]

  20. Kendall Lockerz Says:

    Greetings. 1st I desire to say that I actually like your webpage, just observed it last week but I’ve been following it constantly since then.

    I seem to come to an agreement with most of your thinkings and beliefs and this post is no exception. entirely

    Thank you for any great website and I hope you keep up the beneficial operate. If you do I will continue to read it.

    Have a very excellent evening.
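Two suggestions from the comments above fit together in one sketch: Sean’s point that the bounding box can be trapped automatically instead of being asked of the user, and Peter Schweitzer’s proposed crosswalk from the simplified elements to FGDC CSDGM elements. The dictionary transcribes Peter’s suggested mappings; the function names, the sample data, and the choice to return unmapped keys for review are illustrative assumptions, not part of GeoCommons. The bounding-box helper assumes decimal-degree (longitude, latitude) pairs that do not cross the antimeridian, and it writes its results under FGDC’s West/East/North/South_Bounding_Coordinate element names.

```python
# Crosswalk transcribed from Peter Schweitzer's comment:
# simplified element name -> FGDC CSDGM element name.
# "Parent:Child" for User Name follows the comment's notation, not an FGDC encoding.
SUGGESTED_FGDC_MAPPING = {
    "Publisher": "Publisher",
    "Creator": "Originator",  # may hold several authors
    "User Name": "Point_of_Contact:Contact_Person",
    "Date": "Publication_Date",
    "Name of Source": "Source_Citation_Abbreviation",
    "Features": "SDTS_Point_and_Vector_Object_Type",  # may hold several types
}


def bounding_coordinates(points: list[tuple[float, float]]) -> dict[str, float]:
    """Derive FGDC bounding coordinates from (longitude, latitude) pairs.

    Assumes decimal degrees and data that does not cross the antimeridian.
    Raises ValueError if `points` is empty.
    """
    lons, lats = zip(*points)
    return {
        "West_Bounding_Coordinate": min(lons),
        "East_Bounding_Coordinate": max(lons),
        "North_Bounding_Coordinate": max(lats),
        "South_Bounding_Coordinate": min(lats),
    }


def to_fgdc(
    record: dict[str, object], points: list[tuple[float, float]]
) -> tuple[dict[str, object], list[str]]:
    """Rename a simplified record's keys to FGDC element names.

    The bounding box is computed from `points` instead of being asked of the
    user. Keys without a suggested mapping are returned in a second list so
    they can be reviewed rather than silently dropped.
    """
    fgdc: dict[str, object] = dict(bounding_coordinates(points))
    unmapped: list[str] = []
    for key, value in record.items():
        fgdc_name = SUGGESTED_FGDC_MAPPING.get(key)
        if fgdc_name is None:
            unmapped.append(key)
        else:
            fgdc[fgdc_name] = value
    return fgdc, unmapped


# Hypothetical record and coordinates, for illustration only.
record = {"Creator": ["Smith", "Jones"], "Date": "1994", "Notes": "draft"}
points = [(-77.04, 38.90), (-76.61, 39.29), (-77.43, 39.41)]
fgdc, unmapped = to_fgdc(record, points)
print(fgdc["West_Bounding_Coordinate"], fgdc["Originator"], unmapped)
# -77.43 ['Smith', 'Jones'] ['Notes']
```

Returning the unmapped keys, rather than discarding them, follows Peter’s advice to keep records as structurally and semantically compatible with FGDC as possible: a person can then decide where each leftover field really belongs.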
