MetadataHarvestingInstructions

From NGDCWiki

(Difference between revisions)
m (Harvesting (Z39.50))
Current revision (14:55, 30 April 2013) (view source)
m (Replacing page with 'out of date content - see history tab')
 
Line 1: Line 1:
-
== Harvesting with Web Accessible Folder (WAF) ==
+
out of date content - see history tab
-
# Go to GOS at http://geodata.gov
+
-
# Sign up and log in
+
-
# Go to My Tools, Configure Harvesting, Register New Metadata Site
+
-
# Select radio button Web Accessible Folder
+
-
# Copy and paste the URL for your published records into the Host URL field
+
-
## e.g. ''http://www.ngdc.noaa.gov/metadata/published/NESDIS_Products/''
+
-
## e.g. ''http://www.ngdc.noaa.gov/metadata/published/NOAA/NESDIS/NGDC/MGG/DEM/fgdc/xml''
+
-
## note: in the NMMR an underscore replaces a space in the record set name
+
-
# If you are using the ISO Topic Category Keywords, select: Theme Keyword=Already inserted in the metadata
+
-
# Choose your harvest frequency: e.g. Once per week
+
-
# Select email notification if you want it
+
-
 
+
-
=== WAF Observations ===
+
-
* automatic harvesting fails if there is an index.html or default.html in the WAF
+
-
* the Title is the unique identifier for recognizing metadata records
+
-
** GOS cannot store different records with the same Title.
+
-
*** a newly harvested record silently replaces an existing record with the same Title (no error message is given)
+
-
*** creates duplicate records if a Title changes from one week to the next
+
-
* previously harvested records that are obsolete or in an unpublished state are not removed from the GOS repository
+
-
** Marten Hogeweg of ESRI is aware of this. They may provide an option to address this.
+
-
** For now, delete old records manually in My Tools, Manage My Metadata
+
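The WAF observations above can be checked locally before registering a folder. The following is a minimal sketch, not part of GOS: it assumes the records are FGDC XML with the title at idinfo/citation/citeinfo/title, and the function names <code>find_index_pages</code> and <code>find_duplicate_titles</code> are hypothetical. Whether GOS ignores surrounding whitespace or letter case when comparing Titles is not known; the trimming here is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# File names reported above as breaking automatic WAF harvesting.
INDEX_PAGES = {"index.html", "default.html"}


def find_index_pages(filenames):
    """Return the file names in a WAF listing that match a known index page."""
    return sorted(name for name in filenames if name.lower() in INDEX_PAGES)


def find_duplicate_titles(records):
    """Group records by Title and return only Titles used more than once.

    records: mapping of file name -> FGDC XML text.
    Returns: {title: [file names]}.
    """
    by_title = defaultdict(list)
    for name, xml_text in records.items():
        root = ET.fromstring(xml_text)
        # Path relative to the <metadata> root element.
        title = root.findtext("idinfo/citation/citeinfo/title", default="")
        # Assumption: surrounding whitespace is not significant.
        by_title[title.strip()].append(name)
    return {t: sorted(names) for t, names in by_title.items() if len(names) > 1}
```

Any Title returned by <code>find_duplicate_titles</code> marks records that would overwrite one another in GOS, so they are worth renaming before the next harvest.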
-
 
+
-
==== WAF Harvesting of NGDC Record Set ====
+
-
* As of 2008-10-06, NGDC changed over to harvesting from WAF.
+
-
* This configuration change unfortunately created a duplicate set of records on top of the ones previously harvested by Z39.50.
+
-
* GOS was contacted and deleted the earlier Z39.50 set.
+
-
* From now on, all subsequent harvests will be synchronized with the NGDC WAF: added, changed, and removed records (from the WAF) will be reflected in the weekly GOS harvests.  If problems persist, please contact the NGDC Data Administrator to investigate. 
+
-
* Duplicate Issues
+
-
** Some duplicates may still persist from earlier harvest testing or from other agencies submitting the same records; this should be rare for NGDC records.
+
-
** If duplicates are found, this usually indicates that the duplicate records have different owners. Please provide the NGDC Data Administrator with the following information to track down this problem:
+
-
*** Data Set Title
+
-
*** Published Document ID for both records, otherwise known as the UUID found in the ESRI Metadata section of the metadata record, such as {A59A12E0-11A1-A5F7-7BD9-1DCA20AED864}.
+
-
** This information will be passed along to GOS, who will identify the owner of the record. Once the owner is known, NGDC can contact them to determine whether a duplicate exists and which record should be retained.
+
-
 
+
-
==Harvesting (Z39.50)==
+
-
<font color="red">DEPRECATED, USE WAF INSTEAD</font>
+
-
 
+
-
==Searching in GOS==
+
-
===Observations===
+
-
 
+
-
* temporal search
+
-
** Inconsistent results
+
-
** the temporal range fields in GOS do not allow the FGDC-accepted value 'Present' as an end date
+
-
* 'what' search field seems to search only in /idinfo/citation/citeinfo > Citation Info and /idinfo/descript > Description sections of metadata
+
-
** /idinfo/citation/citeinfo/onlink > online linkage is searchable
+
-
*** e.g. "www.class.noaa.gov" returns all CLASS metadata in repository
+
-
** unique identifier in RSE element: <datsetid> is not searchable
+
-
 
+
-
 
+
-
==Search Results==
+
-
===Observations===
+
-
 
+
-
* in 'Full Metadata' the profiles and extension element names appear as the xml/short name in parenthesis
+
-
** GOS intends to incorporate FGDC-endorsed profiles and extensions into the stylesheet rendition of the full metadata record in the future
+
-
* 'Go to website' button - uses the first URL from XPath /idinfo/citation/citeinfo/onlink
+
-
** this is a repeatable field, so GOS should offer more than one 'Go to website' link
+
-
* strange content in 'View Summary'
+
-
** Under the Coverage Area header, GOS appears to pull content from the place keyword thesaurus element; the place keywords themselves would be more appropriate
+
-
** e.g. the summary shows: Coverage Area: NASA/GCMD Location Keywords
+
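To see which URL GOS is likely to use for the 'Go to website' button, the online linkages of a record can be listed in document order. This is a minimal sketch assuming an FGDC XML record with a <code>metadata</code> root element; the function name <code>online_linkages</code> is hypothetical and nothing here calls GOS itself.

```python
import xml.etree.ElementTree as ET


def online_linkages(xml_text):
    """Return every citation onlink value of an FGDC record, in document order."""
    root = ET.fromstring(xml_text)
    # onlink is repeatable, so findall returns one element per linkage.
    links = root.findall("idinfo/citation/citeinfo/onlink")
    return [(link.text or "").strip() for link in links]
```

Per the observation above, only the first entry of the returned list would appear as a button, so the most useful link should be placed first in the record.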
-
 
+
-
===Questions/Further Research===
+
-
 
+
-
* TO DO: study the functionality of each search option:
+
-
** what metadata elements are searched and how well does it work?
+
-
** 'What:' search field
+
-
** 'Where:' search field
+
-
** 'My Geography' option
+
-
** 'Time Frame' fields
+
-
** 'Content Type'
+
-
** 'Data Category'
+
-
** 'Spatial Frame' option
+
-
 
+
-
 
+
-
==Managing Metadata==
+
-
===Observations===
+
-
 
+
-
* when records become obsolete or unpublished, the metadata manager has to manually delete them from the GOS repository
+
-
** workaround 1: clear the whole collection of harvested records; the next harvest will then retrieve only published records
+
-
*** there is no multiple-delete capability, so this is very tedious when a collection holds more than a few records
+
-
** workaround 2 (easier): go to the last page of the repository, which is already sorted by date, and delete each record with an older date
+
-
* there is an option to update/edit a harvested record through an online interface
+
-
** this interface seems to enforce GOS validation rules that are slightly different from FGDC validation rules
+
-
 
+
-
 
+
-
 
+
-
==Other Limitations==
+
-
 
+
-
* 'content type' - one-to-one relationship between content type and a metadata record, e.g. a dataset could be both 'downloadable data' and 'live data and maps', but in GOS it will be searchable by only one of them, such as 'live data and maps'
+
-
* 'data category' - one-to-one relationship between 'data category' and a metadata record
+
-
** the data category is taken from the first keyword value under ISO Topic Category Keyword
+
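The keyword that would end up as the GOS data category can be previewed from a record. This is a minimal sketch under stated assumptions: the record is FGDC XML, the ISO Topic Category keywords sit in an idinfo/keywords/theme block, and that block's thesaurus name (themekt) contains the text "ISO 19115 Topic". The exact thesaurus wording varies between records, so the match string and the function name <code>first_iso_topic_keyword</code> are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: thesaurus names for ISO topic categories contain this text.
ISO_THESAURUS_HINT = "iso 19115 topic"


def first_iso_topic_keyword(xml_text):
    """Return the first keyword of the first ISO topic theme block, or None."""
    root = ET.fromstring(xml_text)
    for theme in root.findall("idinfo/keywords/theme"):
        thesaurus = (theme.findtext("themekt") or "").lower()
        if ISO_THESAURUS_HINT in thesaurus:
            keys = [(key.text or "").strip() for key in theme.findall("themekey")]
            return keys[0] if keys else None
    return None
```

Because only the first value is used, reordering the ISO keywords in the record is the only way to influence which data category a record is searchable under.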

Current revision

out of date content - see history tab
