Technical Explanation & Further Documentation
of the Data Hypercube of DSDP Legacy Data
Return to the Data Hypercube
Why a Data Hypercube?
The Deep Sea Drilling Project (DSDP) led to many revolutionary advances in our understanding of earth and ocean history over the past 200 million years. Much of the observational data that underpinned that science is contained in this project. The project has restructured the data to allow bulk analysis and visualisation of trends in global-scale earth history, as seen through lithologies.
This was best achieved by forming a multidimensional data structure - a hypercube. Until now, re-processing, large-scale computer visualization and analysis of the data were not possible. The data were held page by page as a type of written core log, unsuitable for spreadsheets, geographic information systems and databases. To make the dataset available for use in such applications it has had to be brought into a cellular format, and the descriptive data have had to be parsed linguistically. The other major issues are data sparsity (the sheer number of null values across the samples*parameters matrix), database granularity, and data quality control.
By describing the data as a 'hypercube' we want to convey that the whole of these geological data can now be cut, viewed and analysed in many different planes, by the XYZT coordinates (longitude, latitude, depth bSL, depth bSF, geologic time) and also by one parameter against another. If people think of a multi-dimensional cube of information, admittedly with many gaps, then they will be correct.
Of course, a poly-dimensional data hypercube (this one is 4-dimensional at minimum, XYZT) cannot truly be imagined. Likewise, data products derived from that concept of the data have to live within the reality of common software applications. Hypercubes are rendered to humans by operations such as projection onto planes or volumes, and 'splatting' (e.g., Yang 2003). We happen to render the data in ways that are strongly spatially directed, but inter-parameter splats are equally possible with the set.
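As a concrete sketch of this idea (the field ordering and names below are illustrative, not the project's actual schema), the sparse hypercube can be treated as a set of point records carrying XYZT coordinates plus a parameter, and sliced along any axis:

```python
# Hypothetical records: (lon, lat, depth_bsf_m, age_my, parameter, value).
records = [
    (-15.2, 28.1,  12.0,  1.8, "CaCO3_pct", 62.0),
    (-15.2, 28.1, 140.0, 30.1, "CaCO3_pct", 88.0),
    (140.5, -9.7,  55.0, 15.3, "Gravel_pct", 3.0),
]

def slice_hypercube(records, axis, lo, hi):
    """Project the sparse cube onto a slab lo <= coord < hi along one axis."""
    idx = {"lon": 0, "lat": 1, "depth": 2, "age": 3}[axis]
    return [r for r in records if lo <= r[idx] < hi]

# A "time slice": all observations between 10 and 20 Ma.
miocene_slab = slice_hypercube(records, "age", 10.0, 20.0)
```

The same call with `axis="depth"` or `axis="lat"` gives the spatially directed cuts, and filtering on the parameter field gives the inter-parameter views mentioned above.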
In the NGDC CDROM, the best organized collection of DSDP data, the database granularity remained at drillsite "hole" level, unless items were extracted manually from the page-long hole descriptions. With this project, the data per core section are broken out into separate data items, a granularity of approximately 1.5 m vertically. In many instances, however, we have been able to discern and treat observations on individual small segregations and fractions within the core sections, giving a granularity on the scale of centimetres.
The data sparsity of this project is considerable. Only about ##% of the parameter*sample matrix holds non-null values. This is partly because not all observations are made on all samples in the on-board or laboratory programs. However, it is also due to the fact that not all the lithologic descriptions could be parsed successfully, especially where the prose was irregular. Some of the analytical results will also have failed processing quality filters.
The illustration alongside this introduction, and others at THIS_PAGE, show what is now possible using the hypercube. Basically, with this in place, it is possible to voxelize aspects of ocean lithologic history, akin to gridding on flat maps.
Geological time in the original DSDP data was given in biostratigraphic and chronostratigraphic terms: period, stage, and zone. Only rarely were absolute isotopic or paleomagnetic ages attached to materials. Unfortunately, but inevitably, the geologic time scales in use evolved during the DSDP, and on-board age determinations were interim. Lazarus et al. (1995) developed age-depth models for 88 of the drilled holes according to one time scale, and those models are now refined, extended and served through Chronos (2007). The revisions of time scales and time terms were not propagated systematically through the DSDP data, though some post-cruise datings were merged in during creation of the CDROM compilation.
We have taken the CDROM age determinations such as "Early Oligocene" and applied the International Commission on Stratigraphy (ICS) timescale (Gradstein et al. 2004) to those names. This is admittedly a simplistic approach, but we look to qualified geochronologists to replace these ages with better calibrated values in the future. The assessed scale of error in the method is of the order of <1 My (exceptionally up to 4 My), to judge from successive revisions of stage absolute ages (Gradstein et al. 2004).
The method of parsing the age terms was as follows. Age values are encountered in the CDROM 'AGEPROF' or 'PALEO' lines, given usually as a stage name, perhaps with a division like "early". In the dbSEABED dictionary the chronological unit names are assigned absolute values of youngest age, oldest age, youngest uncertainty and oldest uncertainty (e.g., the entry "rupeln,Rupelian Stage,date,28.4,33.9,0.1,0.1"), in units of millions of years (My). The parser uses the youngest/oldest limits to create a code like "28.4y:o33.9" (with the uncertainties, e.g., "28.4[0.1]y:o33.9[0.1]"). An analysed age such as a K-Ar dating will appear in EXT (e.g., "0.0023[0.0001]y:o"), a biostratigraphic age in PRS. Where an age range is given, such as "upper_oligocene to lower_miocene", the two age ranges are combined, giving in this case the result "15.97t:b28.4".
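The parsing steps above can be sketched as follows. The dictionary line and the code formats follow the examples in the text, but the function names are our own, and the Oligocene/Miocene bounds used in the range example are the Gradstein et al. (2004) values:

```python
def parse_dict_entry(line):
    """Parse a dbSEABED chronology dictionary line into
    (youngest, oldest, youngest_unc, oldest_unc), in My."""
    fields = [f.strip() for f in line.split(",")]
    # fields: key, full name, 'date', then the four numeric values
    return tuple(float(v) for v in fields[3:7])

def age_code(youngest, oldest, y_unc=None, o_unc=None):
    """Build a single-unit code like '28.4[0.1]y:o33.9[0.1]'."""
    y = f"{youngest:g}" + (f"[{y_unc:g}]" if y_unc is not None else "")
    o = f"{oldest:g}" + (f"[{o_unc:g}]" if o_unc is not None else "")
    return f"{y}y:o{o}"

def combined_code(unit_a, unit_b):
    """Combine two (youngest, oldest) ranges, as for
    'upper_oligocene to lower_miocene'."""
    top = min(unit_a[0], unit_b[0])
    base = max(unit_a[1], unit_b[1])
    return f"{top:g}t:b{base:g}"

entry = parse_dict_entry("rupeln,Rupelian Stage,date,28.4,33.9,0.1, 0.1")
print(age_code(*entry))                              # 28.4[0.1]y:o33.9[0.1]
print(combined_code((15.97, 23.03), (23.03, 28.4)))  # 15.97t:b28.4
```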
So that the data can be plotted in GIS, the code is reduced to a single central value in the preparation of the DSD_***n and Shapefile filesets. So that all samples have a time coordinate, just as they have a geographic coordinate, an age-depth index was built and used to spread the age values throughout the entire DSD_***n and Shapefile filesets. Undated samples took the age of the sample next above.
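A minimal sketch of these two steps, under the assumption (our own) that the central value is the midpoint of the age range and that samples are ordered from shallow to deep:

```python
def central_age(youngest, oldest):
    """Single central value of an age range, in My, for GIS plotting."""
    return (youngest + oldest) / 2.0

def fill_down(ages):
    """Give undated samples (None) the age of the nearest dated
    sample above; input is ordered top (shallow) to bottom (deep)."""
    filled, last = [], None
    for a in ages:
        if a is not None:
            last = a
        filled.append(last)
    return filled

# Two dated samples bracketing two undated ones in a hole.
hole_ages = [central_age(28.4, 33.9), None, None, central_age(33.9, 37.2)]
print(fill_down(hole_ages))
```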
- The vertical datums used during data collection and archiving have been an impediment to creating a globally analysable structure from ocean lithologic data. In this project we retain the original values, but the prime vertical coordinate is altitude relative to present sea level. By using altitudes we keep the proper handedness of the data. Of course, sea level is an inexact datum, but the variations are unlikely to be an issue except for closely spaced or re-occupied DSDP holes.
- We attach a sequential number - the Sample Key - to each observed unit, segregation, sample, phase or fraction: in short, to each different analysed material. Some samples are subject to many different analyses, and then one key applies to all those analyses. This is of high value because it allows inter-parameter comparisons. When an observation is made at a different scale, such as a visual description versus a smear slide, it counts as a different material and key.
- A code for the DSDP Leg, Site, Hole, Core and Section is given for each material (e.g., "DSDP:23:310:A:15:6") and can be used in relational databases. Other details on the sectioning and labelling of the core materials are provided by NOAA (2000).
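The hierarchical code splits cleanly into its named parts for use as a relational key; a small sketch (the field names here are our own choice):

```python
def parse_sample_code(code):
    """Split a code like 'DSDP:23:310:A:15:6' into its
    Leg, Site, Hole, Core and Section components."""
    project, leg, site, hole, core, section = code.split(":")
    return {"project": project, "leg": int(leg), "site": int(site),
            "hole": hole, "core": int(core), "section": int(section)}

print(parse_sample_code("DSDP:23:310:A:15:6"))
```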
A feature of dbSEABED outputs is the "DataType" or Audit Code. In first-level outputs it holds a record of the data themes that contribute to an output record, for instance "LTH.COL.GTC" for lithology, colour, geotechnical. It will differ between extracted and parsed outputs. On merging these, as is done for the ONE and WWD output levels, DataType records whether a parameter is extracted (i.e., analysed, numeric), parsed (i.e., descriptive, word-based), or specially calculated (estimated). A sequence like "PPPxPPxxxxPEEEPExxxPE" shows the EXT, PRS or CLC origins of the next 20 parameters, from 'Gravel' to 'GeolAge'.
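Reading the per-parameter audit string can be sketched as below. 'E' (extracted), 'P' (parsed) and 'x' (no value) appear in the example above; the 'C' letter for calculated values, and the shortened parameter list, are our assumptions:

```python
def decode_audit(audit, parameters):
    """Map each character of the audit string onto its parameter,
    naming the origin of each value."""
    origin = {"E": "extracted",   # analysed, numeric (EXT)
              "P": "parsed",      # descriptive, word-based (PRS)
              "C": "calculated",  # assumed letter for CLC values
              "x": "no value"}
    return {p: origin[c] for p, c in zip(parameters, audit)}

params = ["Gravel", "Sand", "Mud", "Porosity"]   # shortened, illustrative
print(decode_audit("PPxE", params))
```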
- Detailed documentation of dbSEABED methods, standards and outputs can be found on the web, especially under the usSEABED EEZ-mapping project. Good point-of-entry URLs are the Processing methods and FAQ web pages of Jenkins (2005a,b).
- A document describing details of the processing of the DSDP data is available at NOAA (2000b).