Pathfinder Data Set Product Notes

The following information about the Pathfinder AVHRR Land Data Set is intended to explain to Pathfinder data users some of the unique features of the data set. These notes contain very important information about the data which all investigators should review before using Pathfinder data for research purposes. The notes deal exclusively with the geophysical data; information specific to scaling, reading, subsetting, and remapping the data can be obtained from the Goddard DAAC through the on-line system, the Pathfinder Data User's Manual, or the User Services Office. This information has been prepared by the Pathfinder algorithm development team and can also be found in the User's Manual.

A. Uncharted Territory

Prior to the Pathfinder data set, global, daily, mapped NDVI and associated channel data were available to very few researchers. Rather than creating a highly refined, single channel NDVI product, the Pathfinder seeks to maximize available data to foster new data application research and new AVHRR data derivation and compositing methods. Having a daily data set introduces many data conditions which will be unfamiliar to users: orbital overlap, solar zenith cutoffs, missing data, etc., all have characteristic patterns which need to be understood. Similarly, the fact that thermal, reflectance, cloud, and geometry information are available for research in addition to NDVI presents many problems and opportunities. Data which, at first glance, appear "noisy" or "suspect" may appear in one or more data layers, but it is hoped that after reviewing this information you will understand the causes and consequences of these conditions as found in the Pathfinder data sets.

When beginning an investigation with Pathfinder data, it is critical to spend a little time looking at the data which you'll be using in your research.
You should also review these notes and the "Data Set Details" in the Pathfinder Data User's Manual. The following sections describe some of the known data conditions which may result in unexpected and undesired research results. Most of the data issues described in these notes are being investigated to determine the effect of possible modifications in future reprocessing.

B. Missing Data

1. Missing scans and orbits. If you are working with daily data, watch for missing orbits and scans. Many daily data sets contain data gaps ranging from a few scan lines to an orbit or more. In some cases, this is because orbits were lost in acquiring the Pathfinder 1b input data stream. In other cases, part of an orbit may be fine, but the rest of the orbit may not be processed due to unrecoverable errors such as bad scan times (a scan with an invalid scan time cannot be navigated). In our experience, about 1 out of 20 orbits contains some bad scan time or date information, and about 1 out of 12 days is missing an orbit. Users can determine if data are missing either by viewing the quality control comments (available on-line at the DAAC and in the data file) or by viewing the browse data.

2. Missing data "wedges". When a single orbit is missing, the mapped result will have a wedge shape, since the amount of overlap between orbits increases as latitude increases. As a result, poleward of about 55 degrees latitude (north or south), the area of the missing orbit(s) will be filled in with data from adjacent orbits.

3. Seasonal "saw tooth" pattern. The precision of the AVHRR visible channels degrades rapidly in twilight areas. For this reason, all data with high solar zenith angles (i.e., where the sun is setting) are discarded in processing. Specifically, all data with a solar zenith angle greater than 80 degrees are discarded. As a result, in winter, where areas are already in twilight at the satellite's overpass time (between about 14:40 and 17:30 local time), the data are discarded.
However, because the solar zenith angle changes along a scan, only part of the data (the easternmost part) is discarded. The resulting pattern is a "saw tooth" of missing data. This is of course most pronounced at the summer and winter solstices. This feature becomes extreme in the latter part of the NOAA-9 data (1987/88) and the NOAA-11 data (1993/94) when, due to orbital drift, the local time of equator crossing is close to 5 PM and the satellite approaches darkness at relatively low latitudes. This also results in a strong west-to-east contrast along the orbit for the visible channels.

4. "Filled" data gaps. In mapping from satellite scan coordinates to the output 8 km mapped data, the center latitude and longitude of the scan pixel are used to determine the output pixel. Because the pixels at the edge of scan may be 6 km x 12 km or larger, there are some output bins to which no scan data map. To generate an uninterrupted data field, data gaps of 1 or 2 pixels are "filled in" as a final production step. This is done by simply checking for missing data over land (data value of 0 at x,y) and filling in the value of the adjacent pixel to the east (x-1,y). If the pixel (x-1,y) is also missing, the next pixel to the east (x-2,y) is copied to both missing pixels. No value is copied to fill more than two pixels. In all cases, the QC layer bit is updated to indicate "filled data." In order to maintain physical values derived from the satellite, no data smoothing or interpolation is performed - pixel values are simply replicated. This may enhance the "checker-board" pattern described later in these notes. If this will affect your research, you may want to consider restricting your data to near-nadir observations (using information from the scan angle layer in the Pathfinder data set) or, if applicable, running a noise filter.

C. Quality Control Flags

Users should watch these flags as they may be indicators of major problems.
For example, bad calibration information will often (but not always) be identified by a NOAA QC flag (numeric value 16), and pixels with "out of range" flags may be invalid. Since the flags do not indicate which data layer has a problem, it is best to either view the data you are using or, at a minimum, read the qc_comments (which may be viewed in the DAAC online Information Management System, through the browse system, or by extracting metadata from the file), as the comments will generally indicate which layers have a large number of QC flags or bad values. If your analyses are sensitive to noise, you may choose to mask out all values with QC flags, particularly the NOAA QC and the DATA OUT OF RANGE flags.

SCALE the QC flags! You must subtract 1 from the data set value to determine the actual QC flag value (in 8 bit layers, it's easy to forget to do this!).

D. Navigation and Land/Sea Mask Errors

The land/sea mask used in Pathfinder processing is the NOAA "operational" land/sea mask which was acquired by the Pathfinder team. This mask has nominal 6 km resolution, but due to the geometry of mapping from the mapped 8 km bin center to the NOAA file, a general shift of one or two pixels is observed, particularly along northern coastlines. This shift is consistent throughout processing and therefore is not thought to be a navigation error. Exceptional shifts of 2 to 3 pixels are noted in the QC comments metadata; larger shifts are assumed to be navigational errors and are investigated and corrected. This shift is not likely to be detected in most continental or global scale analyses; however, users may wish to use the land/sea mask in the ancillary data to "grow" a larger mask if contaminated pixels might affect research.

Another apparent navigation error occurs at the edge of scan pixels. These pixels are far larger than the 8 km output bins, and any spacecraft attitude adjustments (e.g., jitter) will result in the most extreme misnavigation at the edge of the scan.
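As an illustration of "growing" a mask to guard against the systematic coastline shift described in this section, the following minimal Python sketch dilates a boolean mask by a chosen number of 8 km bins. It is not part of the Pathfinder processing. The function name grow_mask, the list-of-lists mask layout, and the choice to grow the water class are assumptions made for the example, and the sketch ignores the map projection and treats grid edges as hard boundaries.

```python
def grow_mask(mask, iterations=1):
    """Dilate a 2-D boolean mask (list of rows) by one pixel per iteration.

    Every pixel that touches a True pixel (8-connected neighbourhood)
    becomes True. The input mask is left unmodified.
    """
    rows, cols = len(mask), len(mask[0])
    current = [list(row) for row in mask]
    for _ in range(iterations):
        grown = [list(row) for row in current]
        for y in range(rows):
            for x in range(cols):
                if not current[y][x]:
                    continue
                # Mark all neighbours that fall inside the grid.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols:
                            grown[ny][nx] = True
        current = grown
    return current


# Hypothetical 5 x 5 water mask with a single water pixel in the centre.
water = [[False] * 5 for _ in range(5)]
water[2][2] = True

# Growing by 2 bins covers the one or two pixel shift noted above.
water_grown = grow_mask(water, iterations=2)
```

Land pixels that fall inside the grown water mask can then be excluded from an analysis that is sensitive to coastal contamination.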
Occasionally in the data you may observe large coastline shifts of 3 or so pixels (which are different from the systematic coastline shift described above), and in most cases these are at the edge of scan. In areas where there are sharp changes in surface type (e.g., along the Nile valley), the selection of the greenest pixel, combined with orbital overlap at the edge of scan, may result in "ghosting" where the surface edge appears twice (e.g., the Nile would appear twice, separated by one or two 8 km bins). Because these data are outside the 42 degree scan angle limit, this will have minimal effect on studies of vegetation index or surface characteristics using the Composite Data Set.

In addition, a few obvious errors exist in the land/sea mask (e.g., several ocean pixels near the Kamchatka peninsula are flagged as land), so again, check the area in which you are working. It was decided that no changes would be made to the land/sea flag in order to maintain a consistent data set throughout the Pathfinder period.

E. "Bad" scans and pixels

Occasionally, the counts in Channel 1 or 2 are incorrect. Frequently a NOAA QC flag is set a scan or two after the bad scans, but there is no indication of a problem for the specific bad pixel. When converted to reflectance, some of the bad pixels will be out of range (0-100%) and will be flagged as such, but essentially the only way to determine if this condition exists is to inspect the data before analysis or read the qc_comments. These occasional bad values are the reason that pixels flagged as out of range are not used in compositing; however, many pixels which are within the valid range are still incorrect (e.g., pixels with NDVIs of 0.8 in the middle of deserts or ice sheets). One artifact of compositing with the highest NDVI value is that bad pixels with unusually high values will be preferentially brought forward into the composites.
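The compositing artifact described above can be illustrated with a small Python sketch. It is a simplified stand-in, not the Pathfinder compositing code: the function name composite_max_ndvi, the list-based data layout, and the sample values are all hypothetical, and the scan angle restriction used for the Composite Data Set is omitted. The sketch shows that excluding pixels flagged out of range removes some bad values, while a bad value that stays within the valid range is still selected because it is the highest NDVI.

```python
def composite_max_ndvi(days_ndvi, days_flagged):
    """Pick, for each pixel, the highest NDVI among unflagged days.

    days_ndvi    -- list of days, each a list of NDVI values (None = missing)
    days_flagged -- same shape, True where the pixel is flagged out of range
    Returns (values, day_indices); both hold None where no day qualified.
    """
    n_pixels = len(days_ndvi[0])
    best_val = [None] * n_pixels
    best_day = [None] * n_pixels
    for day, (ndvi_row, flag_row) in enumerate(zip(days_ndvi, days_flagged)):
        for i, (value, flagged) in enumerate(zip(ndvi_row, flag_row)):
            if value is None or flagged:
                continue  # missing or flagged pixels never enter the composite
            if best_val[i] is None or value > best_val[i]:
                best_val[i] = value
                best_day[i] = day
    return best_val, best_day


# Two hypothetical pixels over three days.
# Pixel 0: desert, with a bad but in-range value of 0.80 on day 1 (not flagged).
# Pixel 1: vegetated, with a bad value on day 1 that IS flagged out of range.
days_ndvi = [[0.05, 0.30],
             [0.80, 1.50],
             [0.06, 0.35]]
days_flagged = [[False, False],
                [False, True],
                [False, False]]

values, days = composite_max_ndvi(days_ndvi, days_flagged)
# values is [0.80, 0.35]: the flagged 1.50 is rejected, but the unflagged
# bad desert value of 0.80 is carried into the composite.
```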
Schemes to detect bad counts and calibration are being developed; however, all currently proposed schemes involve re-generation of the 1b orbital data.

At times, calibration data for the thermal channels are either incorrect or unavailable. Rather than a calibration value, NOAA may provide a flag value, and because early in the processing (the data year 1987) this flag was not checked, bad temperature values (i.e., extremely low) were derived from these incorrectly calibrated data. These pixels generally contain a QC flag indicating that a NOAA QC bit was set. The valid range for brightness temperatures used in the Pathfinder (160 K - 340 K) was chosen as a compromise based on the experience of NOAA data users and the ISCCP processing team.

F. Edge of scan "checkered" pattern

In the daily data, pixel selection is based on the greenest (i.e., highest) NDVI, with preference being given to pixels within 42 degrees of nadir. When an orbit is mapped to the output 8 km bins, where there are no existing pixels the incoming pixels are binned to the output grid regardless of scan angle or reflectance. When the following orbit is mapped, data from the second orbit, which was imaged from a different viewing angle and 101 minutes later, are considered for binning. In the area of orbital overlap, the pixel from the second orbit will be binned if it has a larger NDVI value or if a data gap existed as a result of binning edge of scan pixels to the 8 km output. Frequently, a checkered pattern will result due to a) different viewing geometry (bi-directional effects), or b) cloud fronts moving into or out of an area in the 101 minutes between orbits. Since this area of orbital overlap is outside 42 degrees, these data are not included in the Composite Data Set.

G. CLAVR

The basic approach to the CLAVR layer should be caveat emptor. The CLAVR algorithm was chosen as an experimental cloud layer.
This algorithm is very sensitive to surface inhomogeneity, and a single set of reflectance and thermal thresholds is used for the whole globe. The algorithm is presently not validated globally, so when using this cloud layer, you should read about the CLAVR algorithm and investigate which specific cloud flags will be important to your research. CLAVR uses all 5 channels, so if any QC flags are set, it is wise to be suspicious of CLAVR - this is particularly true of data in which NOAA QC flags are set (as this is often an indication of bad thermal calibration). The Pathfinder implementation of CLAVR currently does not explicitly check for missing or fill values in tests using thermal channels, and as such may calculate incorrect values where a data gap of a few scan lines is found. Also, the CLAVR values for 1987 data are slightly different from those in other data years due to an error in implementation which was not discovered due to the lack of available validation data sets.

When comparing a CLAVR value to the actual channel data of a pixel and surrounding pixels, the data may not match the expected thresholds found in CLAVR. There are two reasons for this. First, the CLAVR value is calculated based on pixels in satellite scan resolution - when mapping to the output 8 km bins, some of the adjacent pixels may not be retained in the output data set. Second, CLAVR uses top of the atmosphere reflectances normalized for solar illumination (i.e., reflectance divided by cos(solar zenith)), while the Channel 1 and 2 reflectances given in the Pathfinder data set are corrected only for Rayleigh scattering and ozone absorption.

Finally, you must SCALE the CLAVR flags. You must subtract 1 from the data set value to determine the actual CLAVR value.

H. Incorrect Nadir Values

A known problem exists with the Pathfinder Processing System which results in the calculation of an incorrect scan angle and solar zenith angle at nadir for some pixels.
This only occurs with the nadir pixels north of 55 degrees north latitude. Because these almost always have a scan angle of greater than 42 degrees, these data are not included in the composites. Once this problem has been investigated and corrected, the Processing Changes report will be updated to identify the time period of data potentially affected.

I. Compositing

The Pathfinder data are composited for three "ten-day" periods per month. The day with the highest NDVI value is selected for the Composite Data Set; however, because data at the edge of scan may contain distortion and/or bi-directional effect biases, only data within 42 degrees of nadir are used in the composite. It is important to note that in generating the 10-day composite, pixels flagged out of range (in the QC layer) are not included in the composite. This reduces (but does not totally eliminate) the "compositing in" of data with abnormally high NDVIs resulting from bad Channel 1 and 2 calibration. However, there are frequent cases of good Channel 1 and 2 data where the thermal channels are missing or incorrect due to lack of calibration coefficients. If you are using the thermal channels of the composite, it is important to check the QC flags. If there are residual areas of bad thermal data, and this might affect your results, you are advised to either generate a custom composite or use a different time period.

J. Browse

Browse data are useful for a quick check of cloud cover and missing data in daily images, or cloud/snow contamination in the 10-day composite. The browse files are generated by subsampling every 8th pixel of every 8th line of two layers of the original data set. In the daily browse image, Channel 2 reflectances and Channel 4 brightness temperatures are used, and for the 10-day composite browse, the NDVI and Channel 4 are used.
For the daily browse images, a histogram is calculated for each subsampled image, 2% of the data points from each tail of the histogram are removed, and an equalization stretch is performed on the rest of the values. The values at the tails of the histogram are then set to the remaining lowest/highest 8 bit gray level and the remaining data are scaled to 8 bit. Because of this, gray levels in any one image do NOT correspond to the gray levels in any other image, and browse data should not be compared over a time series. In the 10-day composite browse, Channel 4 is scaled by setting all data below 273 K to a flag value and all data above 315 K to another flag value, and then performing an equalization stretch on the remaining values. In composite browse images a color palette is included in which greener areas indicate more vegetation.

Some of the data conditions described in these notes may be observable in the browse. Among these are areas of missing data in the daily data set, bad visible or thermal calibration in the daily files, bad thermal channels associated with good NDVI values in the composite file, and large areas of residual cloud and snow in the composites.
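The daily browse processing described in section J can be sketched in Python to show why gray levels are not comparable between images. This is only an approximation under stated assumptions: the function names subsample and stretch_to_8bit are hypothetical, and a simple linear stretch between the 2% tail cutoffs is used in place of the equalization stretch actually applied, whose details are not given in these notes.

```python
def subsample(grid, step=8):
    """Keep every `step`-th pixel of every `step`-th line of a 2-D grid."""
    return [row[::step] for row in grid[::step]]


def stretch_to_8bit(values, tail_fraction=0.02):
    """Clip the lowest and highest `tail_fraction` of values, then scale
    the remainder linearly to 0-255 (a linear stand-in for the
    equalization stretch named in the notes)."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * tail_fraction)           # number of points in each tail
    lo, hi = ordered[k], ordered[n - 1 - k]
    if hi == lo:
        return [0 for _ in values]       # flat image: nothing to stretch
    stretched = []
    for v in values:
        clipped = min(max(v, lo), hi)    # tails take the lowest/highest level
        stretched.append(round(255 * (clipped - lo) / (hi - lo)))
    return stretched


# Two hypothetical daily images (flattened) with different value ranges.
image_a = list(range(0, 100))     # values 0..99
image_b = list(range(40, 140))    # values 40..139

gray_a = stretch_to_8bit(image_a)
gray_b = stretch_to_8bit(image_b)

# The same physical value (50) lands on different gray levels because each
# image is stretched against its own histogram.
level_in_a = gray_a[image_a.index(50)]
level_in_b = gray_b[image_b.index(50)]
```

Because each image is scaled against its own histogram, the same physical value can map to quite different gray levels from one day to the next, which is why browse images should be used only for a quick visual check and not for quantitative comparison.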