Global Ecosystems Database
GED User's Guide

About the Project

  • Project Credits
  • Background
  • Data distribution
  • Data integration
  • Documentation and metadata
  • Quality Assurance
  • GIS operability
  • Review of the prototype
  • Give us your feedback


    Background

    The Global Ecosystems Database (GED) project began in 1990 as an interagency project between the National Geophysical Data Center (NGDC) of the U.S. National Oceanic and Atmospheric Administration (NOAA) and the U.S. Environmental Protection Agency's (EPA) Environmental Research Laboratory in Corvallis, Oregon (ERL-C). The primary objective of the project was to produce a spatially integrated database (including observational time sequences and model simulation outputs) with high-quality metadata to support characterization and modeling within the U.S. Global Change Research Program (USGCRP). Datasets were selected for publication to meet research priorities established by EPA/ERL-C and to expand the use of datasets of potential value to the modeling and applied research community.

    Following prototype development, the first publicly released volume of the database was GED "Disk A" (September 1992). In 1993, datasets for Disk B were distributed for review. The final version of Disk B, published in 1997, completed the review, update, and publication process with new or revised dataset versions plus several important additional datasets contributed through 1997. The current online GED combines the datasets from Disk A and Disk B; Disk A has been updated from its original version to meet the same metadata standards as Disk B. This combined database, referred to as "GED Version 2," allows users to download data and documentation directly from the web page and is also available as a two-CD set.

    The Goals of the project are to:


    Data Distribution

    All datasets are provided in integrated Geographic Information System (GIS) form. Where NGDC has authority to distribute the original dataset, the complete source version is also provided with the data product, in the form in which it was contributed. The project thus meets two product requirements: it provides an integrated version of commonly used global and regional research datasets related to landscape ecological characterization and modeling, for use across a variety of disciplines; and it distributes the original versions of such datasets contributed to NGDC for public distribution.

    Data Integration

    The ability to compare datasets where they coincide in both space and time can greatly assist their evaluation and use in multi-thematic applications. The GED provides multi-thematic datasets that may be used in spatial and temporal combination.

    To allow their intercomparison (comparison between datasets) and use in combination, the datasets are provided in geographically and temporally compatible vector and raster GIS form (i.e., registered to common origins, cell boundary conventions, projection, and in comparable time steps). By performing this basic integration prior to distribution, it is possible to incorporate the most appropriate means of representation under reviewed and quality controlled conditions, often in collaboration with the Principal Investigators. In most cases this involves decisions about GIS structure, vector vs. raster representation, the statistical, temporal, and spatial meaning of cell and point values, registration (origin) corrections, precision vs. resolution, and other important issues for representing and documenting the nature and quality of the dataset.
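
    As a minimal sketch of what registration to a common origin and cell-boundary convention means in practice, the Python below maps a longitude/latitude pair to the row and column of a global grid, and a row and column back to the cell center. The origin at (-180, 90), the north-to-south row order, and the rule that a cell includes its west and north edges are assumptions chosen for illustration; the conventions actually used by each GED database are those given in the User's Guide and the Dataset Documentation.

```python
import math

# Hypothetical grid convention (an assumption for illustration): the origin is the
# upper-left corner at (min_lon, max_lat), row 0 is the northernmost band, and each
# cell includes its west and north edges but not its east and south edges.
# Points exactly on the east (lon = 180) or south (lat = -90) limits therefore
# fall outside the grid under this convention.

def lonlat_to_cell(lon, lat, cell_deg, min_lon=-180.0, max_lat=90.0):
    """Return (row, col) of the grid cell containing the point (lon, lat)."""
    col = math.floor((lon - min_lon) / cell_deg)
    row = math.floor((max_lat - lat) / cell_deg)
    return row, col


def cell_center(row, col, cell_deg, min_lon=-180.0, max_lat=90.0):
    """Return the (lon, lat) of the center of cell (row, col)."""
    lon = min_lon + (col + 0.5) * cell_deg
    lat = max_lat - (row + 0.5) * cell_deg
    return lon, lat


# Two datasets registered to the same origin and cell size place the same point
# in the same cell, so their values can be compared cell by cell.
print(lonlat_to_cell(0.25, 0.25, 0.5))   # (179, 360)
print(cell_center(179, 360, 0.5))        # (0.25, 0.25)
```

    Fixing the edge rule matters: without it, a point lying exactly on a cell boundary could be assigned to different cells in different datasets.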

    In some rare cases, integration may also require a different representation of the sampling design (that is, the way data are aggregated spatially, temporally, or statistically) to allow combined or comparative use. For example, this may involve resampling the data to correspond to conventional cell boundaries (i.e., the GED multiple "nested" grids), or to achieve spatial consistency for mapping purposes (especially for satellite data). In such cases it is considered advantageous to determine an appropriate resampling method with the involvement of the investigator or author of the dataset, and the method is carefully documented in the corresponding Dataset Documentation.
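
    Where resampling to nested grids is needed, one simple option is block averaging: each coarse cell takes the mean of the fine cells it contains, which works cleanly only when coarse cell boundaries coincide with fine cell boundaries. The sketch below assumes an integer aggregation factor and an optional, hypothetical no-data value; it illustrates one possible method, not necessarily the one used for any particular GED dataset.

```python
def aggregate_nested(grid, factor, nodata=None):
    """Block-average a fine grid (list of rows) into a coarser nested grid.

    Each output cell is the mean of the factor x factor input cells it covers.
    Cells equal to `nodata` are ignored; an output cell whose inputs are all
    `nodata` is set to `nodata`.
    """
    nrows, ncols = len(grid), len(grid[0])
    if nrows % factor or ncols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    coarse = []
    for r0 in range(0, nrows, factor):
        out_row = []
        for c0 in range(0, ncols, factor):
            # Collect the valid fine cells that fall inside this coarse cell.
            block = [grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)
                     if grid[r][c] != nodata]
            out_row.append(sum(block) / len(block) if block else nodata)
        coarse.append(out_row)
    return coarse


fine = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(aggregate_nested(fine, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

    Averaging suits continuous variables; for categorical layers such as vegetation classes, a majority (mode) rule would be the analogous choice.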

    Recognizing, however, that some users, especially those involved in research and modeling, may have different integration requirements, the GED provides individual ("source") examples so that users can evaluate the conversion process by comparison or test other methods. Full citation and availability information for source datasets is also provided, should the user wish to obtain the original datasets from their respective distribution points and apply different integration methods. If NGDC is the source distribution point, the full source version is included in the "source" directory.

    Integrated datasets are provided within geographically defined databases. Each database has a common geographical "window," spatial reference system, and projection. The global database (GLGEO) and the Southeast Asia database (SEAGEO) use a flat "geographic" projection (the longitude/latitude reference system mapped directly to a Cartesian coordinate system, also referred to as "plate carrée"). The U.S.A. regional database (USALB) uses an Albers equal-area projection (reference system in meters).
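
    To illustrate how a meter-based reference system differs from a longitude/latitude one, the sketch below implements the standard spherical form of the Albers equal-area conic projection, turning longitude/latitude in degrees into x/y in meters. The standard parallels (29.5°N and 45.5°N), the origin (23°N, 96°W), and the sphere radius are assumptions borrowed from a common conterminous-U.S. setup, not the documented USALB parameters; a production conversion would normally use the ellipsoidal formulas and the parameters stated in the dataset's metadata.

```python
import math

# Assumed parameters (hypothetical; not the documented USALB settings).
R = 6370997.0               # sphere radius in meters
PHI1, PHI2 = 29.5, 45.5     # standard parallels, degrees north
PHI0, LAM0 = 23.0, -96.0    # latitude of origin and central meridian, degrees


def albers_forward(lon, lat, phi1=PHI1, phi2=PHI2, phi0=PHI0, lam0=LAM0, r=R):
    """Project (lon, lat) in degrees to Albers equal-area (x, y) in meters on a sphere."""
    p1, p2, p0, p = (math.radians(v) for v in (phi1, phi2, phi0, lat))
    n = (math.sin(p1) + math.sin(p2)) / 2.0           # cone constant
    c = math.cos(p1) ** 2 + 2.0 * n * math.sin(p1)
    rho = r * math.sqrt(c - 2.0 * n * math.sin(p)) / n    # radius at lat
    rho0 = r * math.sqrt(c - 2.0 * n * math.sin(p0)) / n  # radius at origin
    theta = n * math.radians(lon - lam0)
    x = rho * math.sin(theta)
    y = rho0 - rho * math.cos(theta)
    return x, y


print(albers_forward(-96.0, 23.0))   # (0.0, 0.0) at the projection origin
```

    Because the projection preserves area, a grid cell of fixed size in meters covers the same ground area anywhere in the database, which is the usual reason for choosing an equal-area projection for a regional grid.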


    Documentation and Metadata

    An important emphasis of the project is on documentation and metadata development to ensure adequate presentation of the nature and limitations of each dataset. Rather than treating documentation and metadata as ancillary aspects of datasets, or as reference material, this project presents them as integral parts of the dataset. For this reason, the documentation is provided in as complete a form as possible (given source material) in an HTML (HyperText Markup Language) user interface that can be "navigated" (using common Web browsers) to access the documentation or link directly to metadata and dataset files representing elements of the dataset. For specific information about auxiliary software capabilities on this database, please see the section in the User's Guide on Functionality.

    A documentation "template" was developed for the GED in 1991 and was modified in 1997 in response to reviewer comments. Development of this template and interface has continued since then, and all the datasets in GED Version 2 have been updated to the common template. The GED content standards predate the Federal Geographic Data Committee (FGDC) "metadata content standards" but are generally compatible with them and seek the same goal. Full FGDC form and content compatibility, recognizing and employing the hierarchical nature of metadata, is under development and may be anticipated in future products.


    Quality Assurance

    Many datasets in common use in the research and modeling community contain uncertainties that are acceptable for some applications and unacceptable for others. For this reason, emphasis is placed here on quality assurance, spatial structuring (topology and attributing), and thorough documentation ("metadata" in today's parlance; the terms are sometimes used interchangeably). These "intelligent" data structures and "data about data" provide strong internal checks on quality, interoperability, and appropriate geographical use. While it may be impossible to provide truly "definitive" data, or to guard against all forms of inappropriate use, appropriate structuring and documentation can help users understand the value, structure, and limitations of the data for each application. This is considered especially important for ecosystem variables and descriptors that depend greatly on particular research purposes and designs (as compared to datasets that conform to national or other standards, or that describe extremely well-defined environmental variables, as is more common in the physical disciplines).

    GIS environments in general, and especially those developed for analytical research, support concepts and capabilities for verification and quality assessment. Where feasible, investigators are encouraged to develop and provide quality control data layers to accompany their datasets (the GED contains a number of exemplary cases). These may document resolution and source data issues, error ranges, reliability indices, and other measures of quality. By taking advantage of GIS structure and the analytical and statistical capabilities built into many research-oriented GISs (and related display, visualization, and analysis software), one can perform error analysis to assess the value of spatial data combinations (static spatial/temporal models, surrogate data produced to drive models, etc.) and the level of cumulative uncertainty.
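
    As one example of such error analysis, the sketch below multiplies two gridded layers cell by cell and carries along an uncertainty layer using first-order propagation for independent errors: for a sum z = x + y, sz = sqrt(sx^2 + sy^2); for a product z = x * y, sz is approximately sqrt((y*sx)^2 + (x*sy)^2). Treating the errors as independent and small is an assumption; correlated or large errors need a fuller treatment (Monte Carlo simulation, for instance). The layers and values are hypothetical.

```python
import math


def sum_error(sx, sy):
    """Standard error of z = x + y for independent errors sx, sy."""
    return math.hypot(sx, sy)


def product_error(x, sx, y, sy):
    """First-order standard error of z = x * y for independent errors sx, sy."""
    return math.hypot(y * sx, x * sy)


def combine_product(x_layer, sx_layer, y_layer, sy_layer):
    """Multiply two gridded layers cell by cell and propagate their error layers.

    All four inputs are lists of rows with identical shape, i.e. they are
    assumed to be registered to the same grid already.
    """
    z_layer, sz_layer = [], []
    for xr, sxr, yr, syr in zip(x_layer, sx_layer, y_layer, sy_layer):
        z_layer.append([x * y for x, y in zip(xr, yr)])
        sz_layer.append([product_error(x, sx, y, sy)
                         for x, sx, y, sy in zip(xr, sxr, yr, syr)])
    return z_layer, sz_layer


# Hypothetical 1 x 2 layers: values and their standard errors.
z, sz = combine_product([[10.0, 4.0]], [[0.5, 0.4]], [[6.0, 2.0]], [[0.4, 0.0]])
print(z)    # [[60.0, 8.0]]
print(sz)   # [[5.0, 0.8]]
```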


    GIS operability

    The GED project adopted a Geographic Information System (GIS) operability requirement early in its design for data integration and quality assurance purposes. In doing so, it recognizes the contemporary spatial context of most landscape characterization and analysis (i.e., geographic object-orientation, topological structures, attributing conventions, etc.).

    Each dataset contained here is fully operable using Idrisi software (Clark University). Idrisi is a GIS that employs "open" (documented and non-proprietary) data and metadata formats and is part of a continuing research effort in geospatial analysis methods and technology. As such it met the operability requirements of the GED and was selected in 1990 as the main development environment. This ensured compatibility with a comprehensive set of readily available data management and analysis tools that helped the project define and meet publication standards for research data. However, adopting this operable environment was intended to assure a high quality and widely usable data product, not to imply or favor the use of any particular software. GIS or analytical software, if desired, must be obtained independently of this online database.

    To the extent feasible within the constraints of time and budget, steps were taken to aid conversion between various software systems. As discussed above, the use of relatively common GIS conventions in structuring the data is an important first step. The data are also stored in relatively simple binary formats with ASCII metadata (accessible through the metadata interface). The native data formats are carefully documented in the User's Guide to aid programming-level access to the data and other format translation techniques. Additionally, format translation software (Newform) and a manual explaining its use are available from NGDC for technically oriented users.
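
    Because the data files are plain binary grids described by ASCII metadata, programming-level access can be short. The sketch below assumes Idrisi-style documentation lines of the form "key : value" (for example "rows", "columns", "data type") and three data types, byte (unsigned 8-bit), integer (signed 16-bit), and real (32-bit float), stored little-endian row by row from the top of the image. These layout details are assumptions to check against the format description in the User's Guide before relying on them.

```python
import struct

# Assumed mapping from "data type" values to little-endian struct codes.
_TYPE_CODES = {"byte": "B", "integer": "h", "real": "f"}


def parse_doc(text):
    """Parse 'key : value' metadata lines into a dict (keys lower-cased, stripped)."""
    meta = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)   # split at the first colon only
        meta[key.strip().lower()] = value.strip()
    return meta


def read_raster(data, meta):
    """Decode a binary raster (bytes) into a list of rows using parsed metadata."""
    if meta.get("file type", "binary").lower() != "binary":
        raise ValueError("only binary rasters are handled in this sketch")
    code = _TYPE_CODES[meta["data type"].lower()]
    rows, cols = int(meta["rows"]), int(meta["columns"])
    # struct.unpack raises struct.error if the byte count does not match.
    values = struct.unpack("<%d%s" % (rows * cols, code), data)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]


doc = """file format : IDRISI Raster A.1
data type   : integer
file type   : binary
columns     : 3
rows        : 2
"""
meta = parse_doc(doc)
raw = struct.pack("<6h", 1, 2, 3, -4, 5, 6)   # stands in for open(path, "rb").read()
print(read_raster(raw, meta))   # [[1, 2, 3], [-4, 5, 6]]
```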


    Project Credits

    Principal Investigator:
    John J. Kineman (Project lead), NGDC

    Management:

    Herbert N. Meyers and Allan M. Hittelmann (Division management), NGDC
    David A. Hastings (Program leadership), NGDC

    Project Advisor:
    Dr. Bradley O. Parks (Program analysis), CIRES

    Development Team (Version II, 1993-2000):
    Craig Anderson (data processing and analysis)
    Julie Burland (technical assistance)
    John Dietz (data processing)
    Henry Fisher (data processing)
    Amy Hannaughan (data management and technical assistance)
    Patrick J. Hayes (software implementation, data processing and management)
    David A. Hastings (scientific and technical development, data processing and analysis)
    John Kineman (project design, scientific and technical development, data processing and analysis)
    Margaret Kelly (data processing and management)
    Joshua Klaus (data processing and technical assistance)
    Joshua Knight (data processing and technical assistance)
    Lori Krager (technical assistance and data management)
    Mark A. Ohrenschall (software development and data processing)
    Karen Fay O'Loughlin (data processing and technical editing)
    David C. Schoolcraft (data processing and management)

    Development Team (Version I, 1990-1993):
    Susan Boyle
    Michael Callaghan
    Jeffrey Colby
    Sivana Delamana
    Liping Di
    Kevin Gomolski
    Doug Green
    David Hastings
    Steve Hochberg
    Warren Holquist
    Joy Ikelman
    Greg Johnson
    John Kineman
    Leah Eicher Lewis
    Andrew Locher
    Andrea Mealey
    Lane Middleton
    David Mellon
    Laura Nigro
    Mark Ohrenschall
    John Panskowitz
    Stuart Racey
    Brandon Roake
    James Ross
    Lee Row
    Joel Schacter
    David Schoolcraft
    Paul Weschler

    Institutions:

    National Oceanic and Atmospheric Administration
    National Geophysical Data Center (NGDC)
    325 S. Broadway
    Boulder, Colorado 80303-3337
    http://www.ngdc.noaa.gov/

    University of Colorado
    Cooperative Institute for Research in the Environmental Sciences (CIRES)
    Campus Box 216
    Boulder, CO 80309-0216
    http://cires.colorado.edu/

    University of Colorado
    Work-Study Program
    Office of Financial Aid
    Boulder, CO 80303
    http://www.colorado.edu/finaid/

    US Environmental Protection Agency
    Environmental Research Laboratory (ERL-C)
    (and ManTech Environmental Technology, Inc.)
    200 S.W. 35th St.
    Corvallis, OR 97333-4902
    http://www.epa.gov/wed

    Acknowledgements: We wish to acknowledge the founding inspiration and support of Stanley Ruttenberg, Chairman of the ICSU Panel on World Data Centres; the financial and programmatic leadership of Dr. Daniel Marks, USEPA-Corvallis; and the collaborative support of Dr. Michael Gwynn, Director of the UNEP Global Environmental Monitoring Program, Dr. Harvey Croze, Director of the UNEP Global Resources Information Database Program, and Dr. Istiaque Rasool, Director of the IGBP Data and Information System.

    Data Contributors:
    See Dataset Documentation sections