MetadataHarvestingInstructions

From NGDC Wiki
== Harvesting with Web Accessible Folder (WAF) ==
# Go to Geospatial One-Stop (GOS) at http://geodata.gov
# Sign up and log in
# Go to My Tools > Configure Harvesting > Register New Metadata Site
# Select the Web Accessible Folder radio button
# Copy and paste the URL of your published records into the Host URL field
## e.g. ''http://www.ngdc.noaa.gov/metadata/published/NESDIS_Products/''
## e.g. ''http://www.ngdc.noaa.gov/metadata/published/NOAA/NESDIS/NGDC/MGG/DEM/fgdc/xml''
## Note: in the NMMR, an underscore replaces each space in the record set name
# If you are using ISO Topic Category keywords, select: Theme Keyword = Already inserted in the metadata
# Choose your harvest frequency, e.g. once per week
# Select email notification if you want it

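The note on record set names can be written as a small helper for composing the Host URL. This is a minimal sketch, assuming the WAF folder name is simply the record set name with each space replaced by an underscore; `waf_url` is a hypothetical name and is not part of GOS or the NMMR.

```python
from urllib.parse import quote


def waf_url(base_url: str, record_set_name: str) -> str:
    """Build a WAF Host URL for a record set (hypothetical helper).

    Assumes the record set is published in a folder named after the
    record set, with each space replaced by an underscore.
    """
    folder = record_set_name.strip().replace(" ", "_")
    # Percent-encode any other unsafe characters; '/' stays literal so
    # nested paths such as 'NOAA/NESDIS/NGDC' are preserved.
    folder = quote(folder, safe="/")
    if not base_url.endswith("/"):
        base_url += "/"
    return base_url + folder + "/"


if __name__ == "__main__":
    # Prints http://www.ngdc.noaa.gov/metadata/published/NESDIS_Products/
    print(waf_url("http://www.ngdc.noaa.gov/metadata/published", "NESDIS Products"))
```

For a nested record set such as the DEM example, pass the whole relative path as the name; the slashes are kept and a trailing slash is added.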
=== WAF Observations ===
* Automatic harvesting fails if there is an index.html or default.html file in the WAF.
* The Title is the unique identifier GOS uses to recognize metadata records.
** Different records with the same Title cannot be stored.
*** A record silently replaces any existing record with the same Title (no error message is given).
*** A duplicate record is created if a Title changes from one week to the next.
* Previously harvested records that are obsolete or unpublished are not deleted from the GOS repository.
** Marten Hogeweg of ESRI is aware of this, and ESRI may provide an option to address it.
** For now, delete old records manually in My Tools > Manage My Metadata.

==== WAF Harvesting of NGDC Record Set ====
* As of 2008-10-06, NGDC switched GOS harvesting of its records from Z39.50 to WAF.
* This configuration change unfortunately created a duplicate set of records on top of those previously harvested by Z39.50.
* GOS was contacted and deleted the earlier Z39.50 set.
* From now on, all harvests will be synchronized with the NGDC WAF: records added to, changed in, or removed from the WAF will be reflected in the weekly GOS harvests. If problems persist, please contact the NGDC Data Administrator to investigate.
* Duplicate issues
** Some duplicates may persist from earlier harvest testing or from other agencies submitting the same records; this should be rare for NGDC.
** Duplicates usually indicate that the two records have different owners. To help track down the problem, please send the NGDC Data Administrator the following for both records:
*** Data Set Title
*** Published Document ID, also known as the UUID, found in the ESRI Metadata section of the record, e.g. {A59A12E0-11A1-A5F7-7BD9-1DCA20AED864}
** This information will be passed to GOS, who will identify the owner of the record. NGDC can then contact the other owner to determine whether a duplicate exists and which record should be retained.

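Collecting the two values requested above can be scripted for each record. This is a minimal sketch. It assumes the Published Document ID is stored at Esri/PublishedDocID in the record (verify this path against your own records), and `duplicate_report` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Braced GUID as shown in the ESRI Metadata section,
# e.g. {A59A12E0-11A1-A5F7-7BD9-1DCA20AED864}.
GUID_RE = re.compile(
    r"^\{[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\}$",
    re.IGNORECASE,
)


def duplicate_report(xml_text: str) -> dict:
    """Collect the Data Set Title and Published Document ID of one record.

    Assumption: the Published Document ID lives at Esri/PublishedDocID
    under the <metadata> root; adjust the path if your records differ.
    """
    root = ET.fromstring(xml_text)
    title = (root.findtext("idinfo/citation/citeinfo/title") or "").strip()
    doc_id = (root.findtext("Esri/PublishedDocID") or "").strip()
    return {
        "title": title,
        "published_doc_id": doc_id,
        # Flags IDs that were copied incompletely before they are sent on.
        "doc_id_well_formed": bool(GUID_RE.match(doc_id)),
    }


if __name__ == "__main__":
    sample = (
        "<metadata><idinfo><citation><citeinfo><title>Sample</title>"
        "</citeinfo></citation></idinfo>"
        "<Esri><PublishedDocID>{A59A12E0-11A1-A5F7-7BD9-1DCA20AED864}</PublishedDocID></Esri>"
        "</metadata>"
    )
    print(duplicate_report(sample))
```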
==Harvesting (Z39.50)==
<font color="red">DEPRECATED, USE WAF INSTEAD</font>

==Searching in GOS==
===Observations===

* Temporal search
** Results are inconsistent.
** Temporal ranges in GOS do not allow 'Present', an FGDC-accepted value, as the end date.
* The 'What' search field seems to search only the /idinfo/citation/citeinfo (Citation Information) and /idinfo/descript (Description) sections of the metadata.
** The online linkage, /idinfo/citation/citeinfo/onlink, is searchable.
*** e.g. "www.class.noaa.gov" returns all CLASS metadata in the repository.
** The unique identifier in the Remote Sensing Extension (RSE) element <datsetid> is not searchable.

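The observed search behavior can be approximated locally to predict whether a record will be found. The sketch below is a hypothetical model, not the GOS implementation: it treats a 'What' search as a case-insensitive substring match over the citeinfo and descript sections, and it shows why an end date of 'Present' needs special handling in a date-range filter (here it is mapped to a supplied "today" date). Placing datsetid directly under idinfo in the sample record is an assumption.

```python
import calendar
import xml.etree.ElementTree as ET
from datetime import date

# Sections the 'What' field appears to search, per the observations above.
WHAT_SECTIONS = ("idinfo/citation/citeinfo", "idinfo/descript")


def what_search_text(root: ET.Element) -> str:
    """Concatenate all text under the sections 'What' seems to search."""
    parts = []
    for path in WHAT_SECTIONS:
        for elem in root.findall(path):
            parts.extend(t.strip() for t in elem.itertext() if t.strip())
    return " ".join(parts)


def matches_what(root: ET.Element, query: str) -> bool:
    """Approximate a 'What' search as a case-insensitive substring match."""
    return query.lower() in what_search_text(root).lower()


def parse_fgdc_date(value: str, today: date, end: bool = False) -> date:
    """Parse an FGDC calendar date (YYYY, YYYYMM or YYYYMMDD) or 'Present'.

    Partial dates widen to the start of the period for a begin date and to
    the end of the period for an end date. Other values, such as
    'Unknown', raise ValueError.
    """
    value = value.strip()
    if value.lower() == "present":
        return today
    year = int(value[0:4])
    if len(value) == 4:
        return date(year, 12, 31) if end else date(year, 1, 1)
    month = int(value[4:6])
    if len(value) == 6:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day) if end else date(year, month, 1)
    return date(year, month, int(value[6:8]))


def overlaps(root: ET.Element, start: date, end: date, today: date) -> bool:
    """True if the record's range of dates overlaps the search window."""
    rng = root.find("idinfo/timeperd/timeinfo/rngdates")
    if rng is None:
        return False
    begin = parse_fgdc_date(rng.findtext("begdate", ""), today)
    finish = parse_fgdc_date(rng.findtext("enddate", ""), today, end=True)
    return begin <= end and finish >= start


if __name__ == "__main__":
    record = ET.fromstring(
        "<metadata><idinfo>"
        "<citation><citeinfo><title>Sample</title>"
        "<onlink>http://www.class.noaa.gov/</onlink></citeinfo></citation>"
        "<descript><abstract>Sample abstract.</abstract></descript>"
        "<timeperd><timeinfo><rngdates><begdate>1998</begdate>"
        "<enddate>Present</enddate></rngdates></timeinfo></timeperd>"
        "<datsetid>SAMPLE-ID-001</datsetid>"
        "</idinfo></metadata>"
    )
    print(matches_what(record, "www.class.noaa.gov"))  # True: onlink is inside citeinfo
    print(matches_what(record, "SAMPLE-ID-001"))  # False: datsetid is outside both sections
    print(overlaps(record, date(2010, 1, 1), date(2010, 12, 31), today=date(2013, 4, 30)))  # True
```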
==Search Results==
===Observations===

* In 'Full Metadata', the profile and extension element names appear as the XML short names in parentheses.
** GOS intends to incorporate FGDC-endorsed profiles and extensions into the stylesheet rendering of the full metadata record in the future.
* The 'Go to website' button uses the first URL from XPath /idinfo/citation/citeinfo/onlink.
** This is a repeatable field, so GOS should offer more than one 'Go to website' link.
* 'View Summary' shows unexpected content.
** Under the Coverage Area header, GOS appears to pull content from the place keyword thesaurus element; the place keywords themselves would be more appropriate.
** e.g. it displays: Coverage Area: NASA/GCMD Location Keywords

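Both observations come down to which elements are read. This sketch, using Python's standard ElementTree, lists every onlink (the repeatable field behind 'Go to website') and separates the place keyword thesaurus (placekt) from the place keywords (placekey). The function names and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def _texts(root: ET.Element, path: str) -> list:
    """Non-empty, stripped text of every element matching `path`."""
    return [e.text.strip() for e in root.findall(path) if e.text and e.text.strip()]


def website_links(root: ET.Element) -> list:
    """All online linkages; GOS is observed to show only the first one."""
    return _texts(root, "idinfo/citation/citeinfo/onlink")


def coverage_area(root: ET.Element) -> dict:
    """Separate place keyword thesauri from the place keywords themselves."""
    return {
        "thesauri": _texts(root, "idinfo/keywords/place/placekt"),
        "keywords": _texts(root, "idinfo/keywords/place/placekey"),
    }


if __name__ == "__main__":
    record = ET.fromstring(
        "<metadata><idinfo>"
        "<citation><citeinfo>"
        "<onlink>http://example.org/data</onlink>"
        "<onlink>http://example.org/map</onlink>"
        "</citeinfo></citation>"
        "<keywords><place><placekt>NASA/GCMD Location Keywords</placekt>"
        "<placekey>Pacific Ocean</placekey></place></keywords>"
        "</idinfo></metadata>"
    )
    links = website_links(record)
    print("GOS 'Go to website':", links[0])
    print("All candidate links:", links)
    print("Coverage Area could list:", coverage_area(record)["keywords"])
```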
===Questions/Further Research===

* TO DO: study the functionality of each search option:
** Which metadata elements are searched, and how well does the search work?
** 'What:' search field
** 'Where:' search field
** 'My Geography' option
** 'Time Frame' fields
** 'Content Type'
** 'Data Category'
** 'Spatial Frame' option

==Managing Metadata==
===Observations===

* When records become obsolete or unpublished, the metadata manager has to delete them manually from the GOS repository.
** Workaround 1: clear the whole collection of harvested records; the next harvest will retrieve only published records.
*** There is no multiple-delete capability, so this is very tedious if a collection holds more than a few records.
** Workaround 2 (easier): go to the last page of the repository, which is already sorted by date, and delete each record with an older date.
* Harvested records can be updated or edited through an online interface.
** This interface seems to enforce GOS validation rules that differ slightly from FGDC validation rules.

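Workaround 2 relies on the repository listing being sorted by date. As a minimal sketch, assuming each listed record can be reduced to its Title and the date shown in My Tools, and that records still present in the WAF are re-dated by each harvest, stale records are those dated before the latest harvest. A cross-check against the Titles currently in the WAF is also shown. `RepoRecord` and both functions are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class RepoRecord(NamedTuple):
    """One row of the My Tools listing (fields assumed for this sketch)."""
    title: str
    updated: date


def stale_by_date(records: list) -> list:
    """Workaround 2: records dated before the most recent harvest.

    Assumes every record still published in the WAF is re-dated by the
    latest harvest, so anything older was not harvested again.
    """
    if not records:
        return []
    latest = max(r.updated for r in records)
    return sorted((r for r in records if r.updated < latest), key=lambda r: r.updated)


def stale_by_title(records: list, waf_titles: set) -> list:
    """Cross-check: repository records whose Title no longer appears in the WAF."""
    return sorted(r.title for r in records if r.title not in waf_titles)


if __name__ == "__main__":
    listing = [
        RepoRecord("Record A", date(2013, 4, 29)),
        RepoRecord("Record B", date(2013, 4, 22)),
        RepoRecord("Record C", date(2013, 4, 15)),
    ]
    for r in stale_by_date(listing):
        print("candidate for deletion:", r.title, r.updated)
    print(stale_by_title(listing, {"Record A", "Record B"}))
```

Because GOS keys records on Title, the two checks should agree; a record flagged by only one of them is worth inspecting by hand before deleting it.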
==Other Limitations==

* 'Content type': each metadata record has exactly one content type. For example, a dataset could be both 'downloadable data' and 'live data and maps', but in GOS it is searchable under only one of them, such as 'live data and maps'.
* 'Data category': each metadata record has exactly one data category.
** The data category is taken from the first value under the ISO Topic Category keywords.

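The data-category rule can be reproduced to see which category GOS will assign before a record is harvested. This sketch assumes ISO topic categories are recorded as FGDC theme keywords whose thesaurus (themekt) is 'ISO 19115 Topic Category'; adjust that string to match your records. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Thesaurus name assumed for ISO topic categories in FGDC records;
# match the exact string used in your own records.
ISO_TOPIC_THESAURUS = "ISO 19115 Topic Category"


def iso_topic_categories(root: ET.Element) -> list:
    """ISO topic category keywords, in document order."""
    categories = []
    for theme in root.findall("idinfo/keywords/theme"):
        if (theme.findtext("themekt") or "").strip() != ISO_TOPIC_THESAURUS:
            continue
        for key in theme.findall("themekey"):
            if key.text and key.text.strip():
                categories.append(key.text.strip())
    return categories


def gos_data_category(root: ET.Element):
    """The single data category GOS keeps: the first ISO topic category."""
    categories = iso_topic_categories(root)
    return categories[0] if categories else None


if __name__ == "__main__":
    record = ET.fromstring(
        "<metadata><idinfo><keywords>"
        "<theme><themekt>ISO 19115 Topic Category</themekt>"
        "<themekey>elevation</themekey><themekey>oceans</themekey></theme>"
        "</keywords></idinfo></metadata>"
    )
    print("GOS data category:", gos_data_category(record))
    print("Not used as data category:", iso_topic_categories(record)[1:])
```

If the first category is not the one users should browse by, reordering the themekey values in the record changes which one GOS picks up.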

Latest revision as of 14:55, 30 April 2013

out of date content - see history tab