GED User's Guide
Dataset Selection

DOCUMENTATION TEMPLATE DEFINITIONS

Each dataset has its own Dataset Documentation file, in Hypertext Markup Language (HTML) format, which follows a standard template developed in the Global Ecosystems Database (GED) project. The Dataset Documentation file resides in the dataset directory with the data files. It is named with the two- or three-letter dataset code (the same letters that identify the dataset directory and begin each data file name), followed by the file extension ".htm".
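
The naming convention can be expressed as a short Python sketch. This is an illustration only: the function names, the case-insensitive comparison, and the dataset code "abc" are assumptions made for the example and are not part of the GED.

```python
from pathlib import Path


def documentation_path(dataset_dir: str, dataset_code: str) -> Path:
    """Return the expected location of a dataset's documentation file."""
    code = dataset_code.strip()
    # The guide specifies a two- or three-letter dataset code.
    if not (2 <= len(code) <= 3 and code.isalpha()):
        raise ValueError(f"expected a two- or three-letter code, got {dataset_code!r}")
    return Path(dataset_dir) / f"{code}.htm"


def belongs_to_dataset(file_name: str, dataset_code: str) -> bool:
    """True if a file name begins with the dataset code.

    Comparing case-insensitively is an assumption of this sketch.
    """
    return file_name.lower().startswith(dataset_code.lower())


# "abc" is a hypothetical dataset code used only for illustration.
doc_file = documentation_path("abc", "abc")
print(doc_file.name)
print(belongs_to_dataset("ABC01.IMG", "abc"))
```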

This section describes the Dataset Documentation Template and gives definitions for each of the content elements.


Documentation Template Definitions

Identification

Each Dataset Documentation file begins with an identification section which contains links to the other sections and the following information:
Dataset Name
This is the full title of the dataset using a convention of combining the author's (or editor's) last name with the dataset title. Author names are in regular type and the title (generally created by the authors) is in italics. The format is {Principal Investigator(s)} followed by {dataset name} in italics, preceded by editor/analyst names if the data have been significantly reworked. If the name used is that of an editor, authorship attribution is generally indicated in the title that follows. Principal Investigators are the scientists responsible for the actual numerical or classed data values represented in the dataset, and/or the agency or institution responsible for data production, if relevant. Editors/analysts are those who have produced a new or revised version of original data, for example by digitizing it from analog maps. This field serves as a heading for the entire document.
Principal Investigator(s)
Names, addresses, and contact information for the primary authors / principal investigators.
Summary
Brief abstract or summary describing the dataset and its intended purpose.
Primary Reference(s)
These are citations of the primary scientific literature relating to the production or release of the data. These may include unpublished documents and published articles. The references may be divided into sub-categories relating to the various dataset elements. One of the goals of this compilation is to provide reprints of all Primary References in addition to the GED documentation. Due to availability and copyright restrictions, however, some documents may not have been available at the time of publication. If hyperlinked, reprints of these papers are included on the website as scanned image-files (in GIF format).
Display Image
A graphic image created from individual or combined dataset files is presented to give a general visual impression of the nature of the data.

DATASET DESCRIPTION

This section describes the integrated dataset, its "experimental" design, relevant information about the specific source that was used to produce the integrated version for the GED, additional references, and dataset file lists. The purpose of this section is to describe everything that comprises the complete dataset as provided here, including key documentation references, and to give technical and statistical information about the data structure.

Dataset Description

Integrated Dataset

This section describes the integrated version of the dataset, as provided by this GED project.  While every attempt has been made to preserve the full content and nature of the original data, some alterations may have been necessary to achieve a common structure and geographic registration. For raster data, this may involve interpolation to one of the conventional "nested" grids, or various forms of re-registration of the grid, or perhaps both. To aid in assessing the appropriateness or accuracy of interpolation methods, all interpolated datasets have corresponding examples of the original form of the data in the SOURCE directory. The original grid representation is clearly documented in the preceding section (Original Design), and interpolation methods are indicated in this section (Data Integration). The user must understand that values represented on a new grid still retain the statistical meaning from their original grid. The information provided here is thus important for proper interpretation and use of the datasets.
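
The point about statistical meaning can be illustrated with a minimal Python sketch. It assumes the simplest possible resampling, cell replication onto a finer nested grid; this is not presented as the method used for any GED dataset, and the function name and sample values are hypothetical.

```python
def replicate_to_finer_grid(grid, factor):
    """Resample a coarse grid onto a finer nested grid by cell replication.

    Each coarse cell is copied into factor x factor fine cells. The fine
    cells look like higher-resolution data, but every value is still the
    statistic (e.g., a cell average) of the original, coarser cell.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    fine = []
    for row in grid:
        # Repeat each value 'factor' times along the row...
        fine_row = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row 'factor' times down the grid.
        for _ in range(factor):
            fine.append(list(fine_row))
    return fine


# Hypothetical averages for two adjacent coarse cells.
coarse = [[12.0, 15.0]]
fine = replicate_to_finer_grid(coarse, 2)
print(fine)
```

In this sketch the four fine cells derived from each coarse cell carry identical values, which is why the original sampling design should be consulted before the data are interpreted at the finer resolution.
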
Dataset Citation:
The recommended way of referring to the integrated dataset as published in the GED. This is not a literature reference, but a unique citation for the digital dataset that distinguishes it from other versions, including the source. The suggested format is:
{Principal Investigator}. {current publication date}. {dataset name, including geographic and temporal coverage}. Digital {type, e.g., "Raster" or "Vector"} Data on a {cell size, if standard} {projection, e.g., "Cartesian Geodetic (lat/long)"} {grid dimension} grid. In: Global Ecosystems Database Disc {disc number}, Boulder CO: NOAA National Geophysical Data Center. {number of independent spatial layers and attributes} on {number and type of media}, {volume}.

If the dataset was previously published, add: [first published in {date}]. For example:

Fedorova, I.T., Y.A. Volkova, and D.L. Varlyguin. 1994. World Vegetation Cover. Digital Raster Data on a 10-minute Geodetic (lat/long) 1080x2160 grid. In: Global Ecosystems Database Version 2.0. Boulder, CO: NOAA National Geophysical Data Center. 3 independent single-attribute spatial layers and one tabular attribute data file on CD-ROM, 11,075,991 MB.
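
As an illustration, the citation format can be assembled programmatically. The following Python sketch is hypothetical: the function and parameter names are assumptions, the field order follows the example above, and the grid-dimension helper assumes a global lat/long grid with square cells measured in arc-minutes.

```python
def grid_dimension(cell_minutes: int) -> str:
    """Rows x columns of a global lat/long grid with square cells."""
    if cell_minutes <= 0 or (180 * 60) % cell_minutes != 0:
        raise ValueError("cell size must divide 180 degrees evenly")
    rows = 180 * 60 // cell_minutes  # latitude span, in cells
    cols = 360 * 60 // cell_minutes  # longitude span, in cells
    return f"{rows}x{cols}"


def dataset_citation(investigators, year, name, data_type, cell_size,
                     projection, grid, release, contents, size):
    """Assemble a citation string in the order used by the example above."""
    return (
        f"{investigators}. {year}. {name}. "
        f"Digital {data_type} Data on a {cell_size} {projection} {grid} grid. "
        f"In: Global Ecosystems Database {release}. "
        "Boulder, CO: NOAA National Geophysical Data Center. "
        f"{contents}, {size}."
    )


# Field values are taken from the example citation above.
citation = dataset_citation(
    investigators="Fedorova, I.T., Y.A. Volkova, and D.L. Varlyguin",
    year=1994,
    name="World Vegetation Cover",
    data_type="Raster",
    cell_size="10-minute",
    projection="Geodetic (lat/long)",
    grid=grid_dimension(10),
    release="Version 2.0",
    contents=("3 independent single-attribute spatial layers and "
              "one tabular attribute data file on CD-ROM"),
    size="11,075,991 MB",
)
print(citation)
```

A 10-minute cell divides 180 degrees of latitude into 1080 rows and 360 degrees of longitude into 2160 columns, which matches the grid dimension given in the example.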

Projection
The region, projection or mapping, and coordinate system for the integrated dataset.
Spatial Representation
The spatial interval (between grid or vector values) and other spatial characteristics of the integrated version of the dataset, including the type of spatial object and the numerical statistic (e.g., vector point, line, or polygon unit with various attributes; or grid point sample, cell average, mode, etc.). This may differ from the original sampling design due to registration differences or aggregation from finer resolution into a standard grid dimension. If re-sampling was performed, this field indicates the method used.
Temporal Representation
The time step and other temporal characteristics of the integrated version of the dataset. This may differ from the original sampling design due to phase differences or aggregation from finer intervals into standard time sequences. If temporal re-sampling was performed, this field indicates the method used.
Data Representation
The form of representing the data values in the integrated dataset, including any numerical type conversion or re-classification that was performed. Usually this will involve only type conversions, although occasionally actual numerical changes may have been necessary. The units and precision are also indicated (e.g., "Real numbers expressed to .001 inches/month," or "Byte integers representing units of 0.1 degree," or "Two-byte integers representing units of meters above sea-level, rounded to the nearest 30 meters").
Layers and Attributes
Spatial Layers and Attributes. The number of geographically different data layers (i.e., independently distributed spatial data layers) represented, and the number of associated attributes, regardless of file structure. For example, multiple data mapped on political units may in reality have only one geographic distribution but many attributes.
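
The Data Representation examples above describe integers that stand for scaled physical units. The Python sketch below shows one way such values might be decoded or produced. The function names and sample values are assumptions, as is the choice to round ties upward; the actual representation of each dataset is stated in its own documentation.

```python
from decimal import Decimal


def decode_scaled(stored_values, unit):
    """Convert stored integers to physical values.

    'unit' is the physical quantity represented by one integer count,
    given as a string (e.g., "0.1" for units of 0.1 degree) so that
    Decimal arithmetic stays exact.
    """
    step = Decimal(unit)
    return [value * step for value in stored_values]


def round_to_step(value: int, step: int) -> int:
    """Round an integer to the nearest multiple of 'step' (ties go upward)."""
    return (value + step // 2) // step * step


# Hypothetical byte integers representing units of 0.1 degree.
print(decode_scaled([0, 25, 255], "0.1"))

# Hypothetical elevation in meters, rounded to the nearest 30 meters.
print(round_to_step(1234, 30))
```
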

DESIGN

This section refers to the nature of the data before integration into the global database structure. This is important information when considering the reliability and potential application of the dataset to new problems, perhaps not foreseen by the original investigators. It is also useful for those who wish to track back to the original data for quality control or verification purposes, or to compare with the integrated form of the data, which may bear changes that are important for a given application.
Variables
The specific environmental/thematic measurements included in the dataset, the units used, and the numerical or class precision (class precision is a qualitative or descriptive indicator, e.g., "species", "major types", "primary/secondary classes", etc.).
Origin
Description of instrument, data sources, and/or method of original investigation or observation.
Geographic Reference
The coordinate system or projection, and projection parameters (e.g., grid orientation, origin, central meridian, zone, etc.), for the original dataset.
Geographic Coverage
Keyword and geographical limits for the dataset coverage.
Spatial Sampling
The original spatial interval or sampling resolution of the data, the type of spatial object, and the numerical statistic (e.g., vector point, line, or polygon unit with various attributes; or grid point sample, cell average, mode, etc.).
Time Period
The time period represented in the dataset. In the case of time series, this indicates the beginning and ending of the data series. In the case of long-term averages, this field indicates the period from which data were combined.
Temporal Sampling
The original time interval or sampling resolution of the data and the type of statistic (e.g., discrete sample, peak values, running average, typical or average period, etc.).

SOURCE

This section refers to the source data acquired for this project, documentation references, and the full lineage up to integration into the GED. This information clearly identifies the version of the data used and gives proper citation of the actual numerical data (regardless of format, media, etc.) and its principal investigators.
Source Data Citation
Citation of the particular dataset used as a source in the GED. This is not a literature reference, but a citation of the source version of the digital dataset. The format of this citation is: {Principal Investigator}. {Availability date}. {Dataset description or name, including geographic and temporal coverage}. {"Digital" or "Analog"} {type, e.g., "Raster," "Vector," "Map," etc.} Data on a {cell size, if standard} {projection, e.g., "Geographic (lat/long)"} {grid dimension} grid. {City, State}: {Institution /publisher}. {number of files} on {media}, {size}. For example:


NCDC Satellite Data Services Division. 1985-1988. Weekly Plate Carrée (uncalibrated) Global Vegetation Index Product from NOAA-9 (APR 1985 - DEC 1988). Digital Raster Data on a Geographic (lat/long) 904x2500 grid. Washington DC: NOAA National Climatic Data Center. 199 files on five 9-track tapes, 425 MB.

Contributor(s)
Person(s) or institution responsible for disposition of the data, and for releasing the data into the public domain. In most cases this will be the PI; however, in some cases data have corporate or institutional ownership prior to release, or are released through an indirect route. This field also provides contact information, if available.
Distributor(s)
Data or research center(s) with official responsibility for distributing earlier versions of the dataset, up to and including the source used for the GED. To work cooperatively with other distribution centers, NGDC will refer requests for source data to these official distribution points. Information for each of the institution abbreviations referenced in this field follows this section.
The approximate date(s) of the project (digitizing work) creating the dataset represented in the current release. This will precede the publication date and may be later than the period described by the data and the date the data were collected. Continuing projects should be noted.
Lineage
Chronological list of the previous versions of the dataset from the original data up to the source version used for the GED (including data processing to produce the integrated version for the GED). Sufficient information should be given to identify the previous versions and their source, including the primary persons involved, their role, address, and contact information.

ADDITIONAL REFERENCES

This section lists key references to the nature and/or application of the database, or other information that may be especially useful to the user. The references may be divided into sub-categories relating to the various dataset elements.

FILE LISTS

This section contains hyperlinks to the dataset directory, the reprints directory, and the source directory of the database.  Most browsers will produce a directory listing, from which the user can copy files or obtain information about them.
Dataset Files
A complete listing of all files comprising the integrated dataset. Each dataset file will begin with the two or three letter identifier for the dataset.
Reprint Files
A complete listing of all bit-mapped (scanned) image files of previously published documentation articles. These files are provided in GIF format for users who have appropriate software to display them (for example, a Web browser). Each file contains one page of the scanned article.
Source Example Files
Listing of all source data files provided as examples for comparison and experimentation. Some datasets required changes, such as re-gridding, to fully integrate them with the database. Although the integration methods are described, these example source files can be used to verify the results, or to test other methods. The location, names, number, and sizes of all files are given.

DATASET ELEMENT DESCRIPTION

The Dataset Element Description section refers to the actual data files (after processing) that are included in the database, linking to technical information contained in the ASCII system metadata files in the dataset directory. Each element of the database (i.e., an "entity-attribute" combination which represents a unique spatial-temporal variable or theme) contains a data file-pair, or a series of file-pairs, each consisting of a spatial data file and its corresponding metadata file. For Raster (i.e., "image") data, these have the file extensions ".IMG" and ".DOC", respectively. Vector data and metadata file extensions are ".VEC" and ".DVC", respectively. A data element may include Attribute data for re-classing or re-labeling the data file. Attribute data file extensions are ".VAL" or ".MDB", and the corresponding metadata file extension is ".DVL". A data element may also be accompanied by specific color palette (".PAL" and ".SMP") files and time-series (".TS") files. Version 1 of the database employs the Idrisi 4.0 (and Idrisi for Windows 2.0) file structure that was jointly developed with Clark University (described in Structure and Formats).
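
The data/metadata file pairing described above can be summarized in a short Python sketch. The extension pairs are taken from the text; the function name, the upper-casing of extensions, and the file name "abc01.IMG" are assumptions made for illustration.

```python
from pathlib import PurePath

# Data file extension -> metadata file extension, as listed in the text.
METADATA_EXTENSION = {
    ".IMG": ".DOC",  # raster ("image") data
    ".VEC": ".DVC",  # vector data
    ".VAL": ".DVL",  # attribute data
    ".MDB": ".DVL",  # attribute data
}


def metadata_file_for(data_file: str) -> str:
    """Return the name of the metadata file paired with a data file."""
    path = PurePath(data_file)
    extension = path.suffix.upper()  # case-insensitive match is an assumption
    if extension not in METADATA_EXTENSION:
        raise ValueError(f"no metadata pairing known for {path.suffix!r} files")
    return path.with_suffix(METADATA_EXTENSION[extension]).name


# "abc01.IMG" is a hypothetical file name used only for illustration.
print(metadata_file_for("abc01.IMG"))
```

Palette (".PAL", ".SMP") and time-series (".TS") files are left out of the mapping because the text does not describe paired metadata files for them.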

The file descriptions are organized as indicated below. To avoid redundancy in the file system, only the first metadata file of a series is shown, followed by a table of "series parameters" that show only what changes through the series.

Data Element
The variable or theme represented in this part of the dataset, and the units used.
Structure
Raster or Vector topology (e.g., nested grid-cell or grid-point, arc/node vector-line, etc.)
Series
Number and type of data series (e.g., "45 month time-series", etc.)
System Files
Tabulation of all files associated with this element, as provided in the dataset directory, with hyperlinks to the metadata files (or first and last of a series). The File Lists (earlier) may be used to link to other metadata files in a series.
Notes
Any notes or additional information for the data element, for example notes about any color "palette" files or time-series files included with the dataset.

TECHNICAL REPORT

This section presents technical reports supplied by the investigator (if relevant) and an Integration Report produced by the analyst responsible for integrating the dataset into the GED. Contributed reports may be on various topics and for various purposes. The Integration Report will contain a narrative description of the methods and significant procedures used in the integration process, such as re-gridding, re-projecting, registration changes, and temporal compositing, as well as information on quality assessment and control procedures.