GED
User's Guide
Dataset Selection
DOCUMENTATION TEMPLATE DEFINITIONS
Each dataset has its own Dataset Documentation file in Hypertext
Markup Language (HTML) format, which follows a standard template developed
in the Global Ecosystems Database (GED) project. The Dataset Documentation
file is in the dataset directory with the data files and is named with
the two or three letter dataset code (the same letters that identify the dataset
directory and begin each data file name) and the file extension ".htm".
This section describes the Dataset Documentation Template and
gives definitions for each of the content elements.
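The naming rule above can be sketched in a few lines of Python. This is only an illustration: the directory "data/abc", the file "ABCVEG.IMG", and both function names are hypothetical, and treating the names as case-insensitive is an assumption that the guide does not state.

```python
from pathlib import PurePosixPath


def documentation_path(dataset_dir: str) -> PurePosixPath:
    """Return the expected Dataset Documentation path for a dataset directory."""
    directory = PurePosixPath(dataset_dir)
    code = directory.name  # the directory name doubles as the dataset code
    if not (2 <= len(code) <= 3 and code.isalpha()):
        raise ValueError(f"expected a two or three letter dataset code, got {code!r}")
    return directory / f"{code}.htm"


def belongs_to_dataset(filename: str, code: str) -> bool:
    """True if a file name begins with the dataset code (case-insensitive, by assumption)."""
    return filename.lower().startswith(code.lower())


print(documentation_path("data/abc"))
print(belongs_to_dataset("ABCVEG.IMG", "abc"))
```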
Documentation Template Definitions
Identification
Each Dataset Documentation file begins with an identification section
which contains links to the other sections and the following information:
-
Dataset Name
-
This is the full title of the dataset using a convention of combining the
author's (or editor's) last name with the dataset title. Author names are
in regular type and the title (generally created by the authors) is in
italics. The format is {Principal Investigator(s)} and {dataset name} in
italics, preceded by editor/analyst names, if the data have been significantly
re-worked. If the name used is that of an editor, authorship attribution
is generally indicated in the title that follows. Principal Investigators
are the scientists responsible for the actual numerical or classed data
values represented in the dataset, and/or the agency or institution responsible
for data production, if relevant. Editors/analysts are those who have produced
a new or revised version of original data, for example by digitizing it from
analog maps. This field serves as a heading for the entire document.
-
Principal Investigator(s)
-
Names, addresses, and contact information for the primary authors / principal
investigators.
-
Summary
-
Brief abstract or summary describing the dataset and its intended purpose.
-
Primary Reference(s)
-
These are citations of the primary scientific literature relating to the
production or release of the data. These may include unpublished documents
and published articles. The references may be divided into sub-categories
relating to the various dataset elements. One of the goals of this compilation
is to provide reprints of all Primary References in addition to
the GED documentation. Due to availability and copyright restrictions,
however, some documents may not have been available at the time of publication.
If hyperlinked, reprints of these papers are included on the website as
scanned image-files (in GIF format).
-
Display Image
-
A graphic image created from individual or combined dataset files is presented
to give a general visual impression of the nature of the data.
Documentation Template Definitions
DATASET DESCRIPTION
This section describes the integrated dataset,
its "experimental" design, relevant information about
the specific source that was used to produce the
integrated version for the GED, additional references,
and dataset file lists. The purpose of this section
is to describe everything that comprises the complete dataset as provided
here, including key documentation references; and to give technical and
statistical information about the data structure.
Dataset Description
Integrated Dataset
This section describes the integrated version of the dataset, as provided
by this GED project. While every attempt has been made to preserve
the full content and nature of the original data, some alterations may
have been necessary to achieve a common structure and geographic registration.
For raster data, this may involve interpolation to one of the conventional
"nested" grids, or various forms of re-registration of the grid, or perhaps
both. To aid in assessing the appropriateness or accuracy of interpolation
methods, all interpolated datasets have corresponding examples of the original
form of the data in the SOURCE directory. The original grid representation
is clearly documented in the Design section (Original Design), and interpolation
methods are indicated in this section (Data Integration). The user must
understand that values represented on a new grid still retain the statistical
meaning from their original grid. The information provided here is thus
important for proper interpretation and use of the datasets.
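A small sketch may help make the last caution concrete. It assumes that a "nested" grid is one whose finer cells divide each coarse cell exactly, and that re-registration is done by simple replication. Neither assumption comes from the guide, and the actual method used for a given dataset is the one recorded in its documentation. The function name is hypothetical.

```python
def replicate_to_finer_grid(coarse, factor):
    """Copy each coarse cell into a factor-by-factor block of finer cells.

    The finer grid has more cells, but every value is still the statistic
    (for example, a cell average) computed at the coarse resolution.
    """
    fine = []
    for row in coarse:
        # Repeat each value 'factor' times along the row...
        fine_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row 'factor' times down the grid.
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


coarse_grid = [[1, 2],
               [3, 4]]
for line in replicate_to_finer_grid(coarse_grid, 2):
    print(line)
```

In this sketch the output grid is 4x4, yet it carries only four independent values, which is why the original grid representation matters when interpreting the integrated data.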
-
Dataset Citation:
-
The recommended way of referring to the integrated dataset as published
in the GED. This is not a literature reference, but a unique citation for
the digital dataset that distinguishes it from other versions, including
the source. The suggested format is:
{Principal Investigator}. {current publication date}. {dataset name,
including geographic and temporal coverage}. Digital {type, e.g., "Raster,"
or "Vector"} Data on a {cell size, if standard} {projection, e.g., "Cartesian
Geodetic (lat/long)"} {grid dimension} grid. In: Global Ecosystems Database
Disc {disc number}, Boulder CO: NOAA National Geophysical Data Center.
{number of independent spatial layers and attributes} on {number and type
of media}, {volume}.
If the dataset was previously published, add: [first published in
{date}]. For example:
Fedorova, I.T., Y.A. Volkova, and D.L. Varlyguin. 1994. World Vegetation
Cover. Digital Raster Data on a 10-minute Geodetic (lat/long) 1080x2160
grid. In: Global Ecosystems Database Version 2.0. Boulder, CO: NOAA National
Geophysical Data Center. 3 independent single-attribute spatial layers
and one tabular attribute data file on CD-ROM, 11,075,991 bytes.
-
Projection
-
The region, projection or mapping, and coordinate system for the integrated
dataset.
-
Spatial Representation
-
The spatial interval (between grid or vector values) and other spatial
characteristics of the integrated version of the dataset, including the
type of spatial object, and the numerical statistic (for example, vector point,
line, or polygon unit with various attributes; or grid point sample, cell
average, or mode). This may differ from the original sampling design
due to registration differences or aggregation from finer resolution into
a standard grid dimension. If re-sampling was performed, this field indicates
the method used.
-
Temporal Representation
-
The time step and other temporal characteristics of the integrated version
of the dataset. This may differ from the original sampling design due to
phase differences or aggregation from finer intervals into standard time
sequences. If temporal re-sampling was performed, this field indicates
the method used.
-
Data Representation
-
The form of representing the data values in the integrated dataset, including
any numerical type conversion or re-classification that was performed.
Usually this will involve only type conversions, although occasionally
actual numerical changes may have been necessary. The units and precision
are also indicated (e.g., "Real numbers expressed to .001 inches/month,"
or "Byte integers representing units of 0.1 degree," or "Two-byte integers
representing units of meters above sea-level, rounded to the nearest 30
meters").
-
Layers and Attributes
-
Spatial Layers and Attributes. The number of geographically different data
layers (i.e., independently distributed spatial data layers) represented,
and the number of associated attributes, regardless of file structure.
For example, multiple data mapped on political units may in reality have
only one geographic distribution but many attributes.
-
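The suggested Dataset Citation format given earlier in this section can be assembled mechanically from its fields. The sketch below is one possible rendering, not a GED tool: the class and field names are hypothetical, and the disc number "X" and volume "N MB" in the usage lines are placeholders rather than real values. The other values are taken from the example citation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Fields of the suggested citation format (field names are illustrative)."""
    investigators: str
    publication_date: str
    dataset_name: str
    data_type: str              # e.g. "Raster" or "Vector"
    cell_size: str
    projection: str
    grid_dimension: str
    disc_number: str
    layers_and_attributes: str
    media: str
    volume: str
    first_published: Optional[str] = None

    def format(self) -> str:
        text = (
            f"{self.investigators}. {self.publication_date}. {self.dataset_name}. "
            f"Digital {self.data_type} Data on a {self.cell_size} {self.projection} "
            f"{self.grid_dimension} grid. In: Global Ecosystems Database Disc "
            f"{self.disc_number}, Boulder CO: NOAA National Geophysical Data Center. "
            f"{self.layers_and_attributes} on {self.media}, {self.volume}."
        )
        if self.first_published:
            # Only added when the dataset was previously published.
            text += f" [first published in {self.first_published}]"
        return text


citation = DatasetCitation(
    investigators="Fedorova, I.T., Y.A. Volkova, and D.L. Varlyguin",
    publication_date="1994",
    dataset_name="World Vegetation Cover",
    data_type="Raster",
    cell_size="10-minute",
    projection="Geodetic (lat/long)",
    grid_dimension="1080x2160",
    disc_number="X",            # placeholder, not a real disc number
    layers_and_attributes=(
        "3 independent single-attribute spatial layers "
        "and one tabular attribute data file"
    ),
    media="CD-ROM",
    volume="N MB",              # placeholder, not a real volume
)
print(citation.format())
```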
Dataset Description
DESIGN
This section refers to the nature of the data before integration into the
global database structure. This is important information when considering
the reliability and potential application of the dataset to new problems,
perhaps not foreseen by the original investigators. It is also useful for
those who wish to track back to the original data for quality control or
verification purposes, or to compare with the integrated form of the data,
which may bear changes that are important for a given application.
-
Variables
-
The specific environmental/thematic measurements included in the dataset,
the units used, and the numerical or class precision (class precision is
a qualitative or descriptive indicator, e.g., "species", "major types",
"primary/secondary classes", etc.)
-
Origin
-
Description of instrument, data sources, and/or method of original investigation
or observation.
-
Geographic Reference
-
The coordinate system or projection, and projection parameters (i.e., grid
orientation, origin, central meridian, zone, etc.), for the original dataset.
-
Geographic Coverage
-
Keyword and geographical limits for the dataset coverage.
-
Geographic Sampling
-
The original spatial interval or sampling resolution of the data, the type
of spatial object, and the numerical statistic (for example, vector point, line,
or polygon unit with various attributes; or grid point sample, cell average,
or mode).
-
Time Period
-
The time period represented in the dataset. In the case of time series,
this indicates the beginning and ending of the data series. In the case
of long-term averages, this field indicates the period from which data
were combined.
-
Temporal Sampling
-
The original time interval or sampling resolution of the data and the type
of statistic (for example, discrete sample, peak values, running average, or
typical or average period).
-
Dataset Description
SOURCE
This section refers to the source data acquired for this project, documentation
references, and the full lineage up to integration into the GED. This information
clearly identifies the version of the data used and gives proper citation
of the actual numerical data (regardless of format, media, etc.) and its
principal investigators.
-
Source Data Citation
-
Citation of the particular dataset used as a source in the GED. This is
not a literature reference, but a citation of the source version of the
digital dataset. The format of this citation is: {Principal Investigator}.
{Availability date}. {Dataset description or name, including geographic
and temporal coverage}. {"Digital" or "Analog"} {type, e.g., "Raster,"
"Vector," "Map," etc.} Data on a {cell size, if standard} {projection,
e.g., "Geographic (lat/long)"} {grid dimension} grid. {City, State}: {Institution
/publisher}. {number of files} on {media}, {size}. For example:
NCDC Satellite Data Services Division. 1985-1988. Weekly Plate
Carrée (uncalibrated) Global Vegetation Index Product from NOAA-9
(APR 1985 - DEC 1988). Digital Raster Data on a Geographic (lat/long) 904x2500
grid. Washington DC: NOAA National Climatic Data Center. 199 files on five
9-track tapes, 425MB.
-
Contributor(s)
-
Person(s) or institution responsible for disposition of the data, and for
releasing the data into the public domain. In most cases this will be the
PI; however, in some cases data have corporate or institutional ownership
prior to release, or are released through an indirect route. This field
also provides contact information, if available.
-
Distributor(s)
-
Data or research center(s) with official responsibility for distributing
earlier versions of the dataset, up to and including the source used for
the GED. To work cooperatively with other distribution centers, NGDC will
refer requests for source data to these official distribution points. Information
for each of the institution abbreviations referenced in this field follows
this section.
-
The approximate date(s) of the project (digitizing work) creating the dataset
represented in the current release. This will precede the publication date
and may be later than the period described by the data and the date the
data were collected. Continuing projects should be noted.
-
Lineage
-
Chronological list of the previous versions of the dataset from the original
data up to the source version used for the GED (including data processing
to produce the integrated version for the GED). Sufficient information
should be given to identify the previous versions and their source, including
the primary persons involved, their role, address, and contact information.
Dataset Description
ADDITIONAL REFERENCES
This section lists key references to the nature and/or application of the
database, or other information that may be especially useful to the user.
The references may be divided into sub-categories relating to the various
dataset elements.
Dataset Description
FILE LISTS
-
This section contains hyperlinks to the dataset directory, the reprints
directory, and the source directory of the database. Most browsers
will produce a directory listing, from which the user can copy files or
obtain information about them.
-
Dataset Files
-
A complete listing of all files comprising the integrated dataset. Each
dataset file will begin with the two or three letter identifier for the
dataset.
-
Reprint Files
-
A complete listing of all bit-mapped (scanned) image files of previously
published documentation articles. These files are provided in GIF format
for users who have appropriate software to display them (for example, a
Web browser). Each file contains one page of the scanned article.
-
Source Example Files
-
Listing of all source data files provided as examples for comparison and
experimentation. Some datasets required changes, such as re-gridding, to
fully integrate them with the database. Although the integration methods
are described, these example source files can be used to verify the results,
or to test other methods. The location, names, number,
and sizes of all files are given.
-
Documentation Template Definitions
DATASET ELEMENT DESCRIPTION
The Dataset Element Description section refers to the actual data files
(after processing) that are included in the database, linking to technical information
contained in the ASCII system metadata files in the dataset directory. Each element
of the database (i.e., "entity-attribute" combination which represents a unique
spatial-temporal variable or theme) contains data file-pairs, or a series of file-pairs,
each consisting of a spatial data file and its corresponding metadata file. For
Raster (i.e., 'image') data, these have the file extensions ".IMG" and
".DOC" respectively. Vector data and metadata file extensions are ".VEC"
and ".DVC" respectively. A data element may include Attribute data for
re-classing or re-labeling the data file. Attribute data file extensions are ".VAL"
or ".MDB", and the corresponding metadata file extension is ".DVL".
A data element may also be accompanied by specific color palette (".PAL" and ".SMP")
files, and time-series (".TS") files. Version 1 of the database employs the Idrisi
4.0 (and Idrisi for Windows 2.0) file structure that was jointly developed with
Clark University (described in Structure and Formats).
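The file pairings described above can be written down as a simple lookup table. The sketch below is illustrative only: the function name and the file name "abcveg.IMG" are hypothetical, and mapping both ".VAL" and ".MDB" to ".DVL" is one reading of the paragraph above rather than a confirmed rule.

```python
from pathlib import PurePosixPath

# Data file extension -> metadata file extension, as read from the text above.
METADATA_EXTENSION = {
    ".IMG": ".DOC",  # raster data
    ".VEC": ".DVC",  # vector data
    ".VAL": ".DVL",  # attribute data
    ".MDB": ".DVL",  # attribute data (assumed to share the same metadata extension)
}


def metadata_file_for(data_file: str) -> str:
    """Return the name of the metadata file expected to accompany a data file."""
    path = PurePosixPath(data_file)
    try:
        metadata_extension = METADATA_EXTENSION[path.suffix.upper()]
    except KeyError:
        raise ValueError(f"no metadata pairing known for {path.suffix!r}") from None
    return str(path.with_suffix(metadata_extension))


print(metadata_file_for("abcveg.IMG"))
print(metadata_file_for("abcveg.VEC"))
```

Palette (".PAL", ".SMP") and time-series (".TS") files are left out of the table because the text does not describe a metadata pairing for them.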
The file descriptions are organized as indicated below. To avoid redundancy
in the file system, only the first metadata file of a series is shown,
followed by a table of "series parameters" that show only what changes
through the series.
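The idea of a "series parameters" table can be sketched as follows: keep the first metadata record whole, then list only the fields whose values differ somewhere in the series. The function name and the metadata field names ("title", "rows", "max value") are hypothetical, and the sketch assumes every metadata file in a series has the same set of fields.

```python
def series_parameters(metadata_series):
    """Split a series of metadata records into (first record, table of changes).

    The table has one row per file and contains only the fields whose
    value is not constant through the series.
    """
    first = metadata_series[0]
    changing = [
        key
        for key in first
        if any(record.get(key) != first[key] for record in metadata_series[1:])
    ]
    table = [{key: record.get(key) for key in changing} for record in metadata_series]
    return first, table


series = [
    {"title": "January", "rows": 180, "max value": 10},
    {"title": "February", "rows": 180, "max value": 12},
]
first_record, parameters = series_parameters(series)
print(first_record)
for row in parameters:
    print(row)
```

Here "rows" is constant, so it appears only in the first record, while "title" and "max value" appear in the table.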
-
Data Element
-
The variable or theme represented in this part of the dataset, and the
units used.
-
Structure
-
Raster or Vector topology (e.g., nested grid-cell or grid-point, arc/node
vector-line, etc.)
-
Series
-
Number and type of data series (e.g., "45 month time-series", etc.)
-
System Files
-
Tabulation of all files associated with this element, as provided in the
dataset directory, with hyperlinks to the metadata files (or first and
last of a series). The File Lists (earlier) may be used to link to other
metadata files in a series.
-
Notes
-
Any notes or additional information for the data element, for example notes
about any color "palette" files or time-series files included with the
dataset.
Documentation Template Definitions
TECHNICAL REPORT
This section contains technical reports provided by the investigator (if
relevant) and an Integration Report produced by the analyst responsible
for integration of the dataset into the GED. Contributed reports may be
on various topics and for various purposes. The Integration Report will
contain a narrative description of methods and significant procedures used
in the integration process, such as re-gridding, re-projecting, registration
changes, temporal compositing, etc.; as well as information on quality
assessment and control procedures.