SAWG MTG Minutes 2010-12-10
- Dan Kowal
- Phil Jones
- Anna Milan
- Scott McCormick
- Jeremy Throwe
- Tess Brandon
Data Center Activities
- A number of project charters are in various stages of review and approval.
- Ingest of terabytes of model data started this week. Volumes are fairly high: > 2 TB/day. NCDC will ramp up the volume to about 4 TB. This is CFSRR data: 64 TB initially, 400 TB in total. One of the challenges is scheduling that and preparing for increased volumes.
- Upgrading network/infrastructure for NPP: 4-5 TB/day. Need to factor in a number of TB/day. What do we do at launch time? Pause model ingest? Don't risk the real-time programs. NASA and the NPP program requested a system freeze in August. We have to see what that means in real terms. Does it mean we can't reconfigure for a new data set?
- In August, got the "clear sky" product approved to use CLASS. During discussion Rick said that it could be done in the margins, but nothing has been done. There are issues at the COPB with prioritizing data sets for funding. ACSPO has been resolved (on Nov. 26th Nancy R. sent out an email regarding this). Sounds like COPB is going to have to develop a procedure for what they will approve during a given year. A bunch of things have been approved in the margins, but the funding has run out.
- Received R_to_A for SAR Ocean Winds Products from STAR. Checking to see how it fits in the mission. Next summer.
- NO is going through all of the metadata records in CLASS and starting to take responsibility for it.
- At the end of the FY, had a mad rush to archive CDR products (internally and externally). It was a challenge: the people doing the work didn't have a lot of lead time to make it happen. But working out a plan: 11 CDR products for CLASS in FY11. Still up in the air about bringing CLASS into the ingest and archive.
- Reanalysis Model - focusing on the reforecast part to send. First looking at a subset; still sorting out details. Finding out info at the last minute. Not ideal. This is the first data set we've documented in the CLASS ICD.
- Working on CDMP data project for old marine data logs.
- CLASS Charter received for "Total Precip."; Try to get on the COWG Agenda this January.
- Looking at capacity for taking in model data at 8-10 TB/day through our pre-ingester to CLASS. Only part of the picture. Special processing needs for packaging up data prior to delivery to CLASS would be another issue to size up.
- VIPIR charter estimate under review.
- AIS (Nationwide Automatic Identification System from USCG) has been requested for archive. Serves a lot of NOAA constituencies. Not directly proposed by the provider. Possibly some bandwidth issues.
- Put in a work request to CLASS to have them change their metadata support from FGDC to ISO. However, testing of the ISO format was a very low priority and the results were not great. If they put some more effort into it, it can be achieved. Still waiting.
- Supporting FGDC and ISO at this point. Would rather not keep doing both versions much longer. Once ISO is supported in CLASS, will only do ISO.
- Phil: With use of a translator, can't we generate one from the other?
- Anna: Yes. We do have one that produces minimal FGDC. We're testing it now to make sure that all of the correct XPaths will work for CLASS - simple discovery metadata.
- A couple of weeks ago, someone needed to use the NMMR for creating metadata for datasets at NCDC (CSF model data). She was told to put it in the NMMR. Didn't want it used on the CLASS front page.
- Phil provided Shirley Briscoe with metadata. Ping is more involved with cal/val data for NPP; the metadata should be more focussed on this. Do we still want the metadata to go through the NMMR or through the NCEP at NCDC? Phil prefers it to go through NCDC. If NMMR is there and we are using it, they publish to the NCDC WAF.
- Anna: This is something we need to work on. Where should CLASS get the metadata from? Centralized or distributed?
- Phil: John Caron from Unidata visited NCDC and gave a Common Model - CF metadata presentation; he mentioned the netCDF attributes for data discovery. Implications for GOES-R to add more metadata into the netCDF.
- Phil: Attended a Tech. Interchange Mtg. yesterday for GOES-R. PD does not have any details planned for writing metadata to the netCDF. They are going to use netCDF 4, but will be using the netCDF 3 standard. They don't have a good idea of how to write metadata into netCDF.
- Is this a DEWG matter? Jeremy: "Yes." Nothing has been decided yet. There could be a mix, some using the current model, some using the advanced. Best to deal with this issue sooner rather than later.
- Phil: DEWG should work on the development of both the XML and the netCDF.
- Ananth mentioned the new data group (asked Phil and Dan to join) where these types of topics could be discussed. Dan Wilkinson will represent NGDC. Dan will forward Jeremy a team description.
Steps for New Campaigns
- Dan used the Flow Chart for the What to Archive Procedure as the basis for discussion.
- Dan mentioned the voluminous marine video collections that NODC has been collecting and contemplating for CLASS; they want to know the path for moving them forward. The suggestion has been to get metrics on the data set and backing from the director on whether it is worth pursuing a full-blown appraisal, compared with other data sets that are up for proposal to CLASS.
- What steps are not well defined that should be part of the workflow?
- Phil: Could add CLASS Project Charter.
- Phil: Don't know if the level of detail in this diagram is necessary for the data provider (requestor). All they need to know is what to submit for the appraisal. NCDC would like to have a set of archive guidelines that describe what information a requestor needs to provide about their data set. Want to make the process efficient, give ideas on acceptable formats and a timeline.
- Scott: One thing to fit in the Workflow Chart is defining the basic set of requirements for providing data to CLASS; the process would identify the CLASS needs for unique file names, manifests, and submitted documentation. This is more than what the data centers want, but perhaps there's a way to merge the two.
- Tess: NODC is working on a set of best practices for formatting per type of data: what attributes and conventions should be used, etc.
- Dan: Based on the conversation so far, trying to figure out what's missing. NGDC has a preliminary appraisal through its initial Request to Archive Questionnaire that provides some initial dimensions of the data. NODC has the SIF; even NCDC has a questionnaire. Aren't these enough to capture preliminary information for the proposed data sets?
- Scott: We have a ROM estimate Spreadsheet that's based on volume. Has a general complexity level for the amount of labor involved.
- Dan: Funding issue; do we have resources to support the request?
- Tess: We need a first-pass estimate. The SIF incorporates the 28 questions. If it has CLASS identified as the backend archive, it is submitted to the Director. We decide whether to do it in-house or in CLASS after running it through our cost estimation process. We decide on CLASS primarily based on existing relationships and volume.
- Phil: Testing before ingest is another piece to add to the Flow Chart.
- Dan showed the SPIWG Agenda PPT and asked about how to convey the tracking of requests coming from them.
- ATRAC: point providers to that site; use it as the front page for submitting an archive request. Still in early development. Some holes and bugs. Intermediate release soon. Hasn't had a lot of use yet.
- Tess: They have an internal tracking system on a wiki; could easily migrate to ATRAC.
Next topics, structure, meeting times
- 2nd Friday of the month at 11:30 AM EST/9:30 AM MST
- Next Mtg. January 14, 2011.
- Dan will create a place on the wiki for people to post topics for the agenda.
- Will keep to the structure used above, choosing one topic to discuss in more depth. Will discuss at the next meeting whether additional meetings are required given the range of issues we cover.