Adaptation of Darwin Core Standards and Development of New Standards for Geologic Specimens

Biodiversity Information Science and Standards ◽

10.3897/biss.2.25929 ◽

2018 ◽

Vol 2 ◽

pp. e25929

Author(s):

Christina Byrd

Keyword(s):

Natural History ◽

Primary Objective ◽

Biological Information ◽

Data Sets ◽

Data Standards ◽

Data Set ◽

Core Data ◽

Management Standards ◽

Darwin Core ◽

Term Basis

The Darwin Core data standard has rapidly become the go-to standard for biological and paleontological specimens. In order to accommodate all of the timescale data for paleontology specimens, standards for geologic age were developed and incorporated into Darwin Core. At the Sternberg Museum of Natural History (FHSM), digitization of the paleontology collection has been a primary objective. The adoption of the Darwin Core standard for FHSM’s paleontology data spurred the idea to use Darwin Core for the geology collection as well. There are currently no widely accepted data standards for geology specimens, but there are some organizations who have uploaded their data management standards online. Even though Darwin Core was developed for the dissemination of biological information, many of the data fields are applicable to geology. FHSM is working to adopt and adapt Darwin Core standards for its geology collection. FHSM currently has 84 fields to record geology data. Approximately sixty percent of these data fields directly correspond with Darwin Core terms and have been adopted with the corresponding data format. Seven percent of the fields correspond with Darwin Core terms but require adaptation by adding new shared language within the term. These fields include the classification of rocks and minerals and the addition of “geologicSpecimen” for the Darwin Core term “Basis Of Record”. Fortunately, minerals have a classification system that loosely resembles animal taxonomy. For example, quartz is a mineral species that is part of a group called Tectosilicates, which is subsequently grouped into Silicates. One quarter of the FHSM fields are specific to geology and do not fit within the current Darwin Core data set. When determining terminology for these fields, FHSM staff utilized the terms and standards set by the Open Geospatial Consortium (OGC), an international organization for making open standards for the global geospatial community. The terms adopted from the OGC come from a category called “EarthMaterial.” The remaining fields are specific to FHSM recordkeeping. In order to share these terms with others and hopefully start a larger conversation about data standards for this area of natural history, the terms and definitions will be made available on the FHSM website in the geology section. Using the same terms, formats, and overall standard across the disciplines at FHSM increases usability and uniformity of the different data sets, increases workflow efficiency, and simplifies development of the relational database for paleontological and geological specimens at FHSM.

Download Full-text

A proposed polarity standard for multicomponent seismic data

Geophysics ◽

10.1190/1.1500363 ◽

2002 ◽

Vol 67 (4) ◽

pp. 1028-1037 ◽

Cited By ~ 15

Author(s):

R. James Brown ◽

Robert R. Stewart ◽

Don C. Lawton

Keyword(s):

Ground Motion ◽

Pressure Change ◽

Seismic Profile ◽

Primary Objective ◽

Data Sets ◽

Ocean Bottom ◽

Data Set ◽

Field Recording ◽

Sea Bottom ◽

Cable Lines

This paper proposes a multicomponent acquisition and preprocessing polarity standard that will apply generally to the three Cartesian geophone components and the hydrophone or microphone components of a 2‐D or 3‐D multicomponent survey on land, at the sea bottom, acquired as a vertical seismic profile, vertical‐cable, or marine streamer survey. We use a four‐component ocean‐bottom data set for purposes of illustration and example. A primary objective is a consistent system of polarity specifications to facilitate consistent horizon correlation among multicomponent data sets and enable determination of correct reflectivity polarity. The basis of this standard is the current SEG polarity standard, first enunciated as a field‐recording standard for vertical geophone data and hydrophone streamer data. It is founded on a right‐handed coordinate system: z positive downward; x positive in the forward line direction in a 2‐D survey, or a specified direction in a 3‐D survey, usually that of the receiver‐cable lines; and y positive in the direction 90° clockwise from x. The polarities of these axes determine the polarity of ground motion in any component direction (e.g., downward ground motion recording as positive values on the vertical‐geophone trace). According also to this SEG standard, a pressure decrease is to be recorded as positive output on the hydrophone trace. We also recommend a cyclic indexing convention, [W, X, Y, Z] or [0, 1, 2, 3], to denote hydrophone or microphone (pressure), inline (radial) geophone, crossline (transverse) geophone, and vertical geophone, respectively. We distinguish among three kinds of polarity standard: acquisition, preprocessing, and final‐display standards. The acquisition standard (summarized in the preceding paragraph) relates instrument output solely to sense of ground motion (geophones) and of pressure change (hydrophones). Polarity considerations beyond this [involving, e.g., source type, wave type (P or S), direction of arrival, anisotropy, tap‐test adjustments, etc.] fall under preprocessing polarity standards. We largely defer any consideration of a display standard.

Download Full-text

Comparison of silhouette-based reallocation methods for vegetation classification

10.1101/630384 ◽

2019 ◽

Cited By ~ 1

Author(s):

Attila Lengyel ◽

David W. Roberts ◽

Zoltán Botta-Dukát

Keyword(s):

Simulated Data ◽

Primary Objective ◽

Vegetation Classification ◽

Data Sets ◽

Data Set ◽

Number Of Clusters ◽

Silhouette Width ◽

Diagnostic Species ◽

Order Of Magnitude ◽

Initial Classification

AbstractAimsTo introduce REMOS, a new iterative reallocation method (with two variants) for vegetation classification, and to compare its performance with OPTSIL. We test (1) how effectively REMOS and OPTSIL maximize mean silhouette width and minimize the number of negative silhouette widths when run on classifications with different structure; (2) how these three methods differ in runtime with different sample sizes; and (3) if classifications by the three reallocation methods differ in the number of diagnostic species, a surrogate for interpretability.Study areaSimulation; example data sets from grasslands in Hungary and forests in Wyoming and Utah, USA.MethodsWe classified random subsets of simulated data with the flexible-beta algorithm for different values of beta. These classifications were subsequently optimized by REMOS and OPTSIL and compared for mean silhouette widths and proportion of negative silhouette widths. Then, we classified three vegetation data sets of different sizes from two to ten clusters, optimized them with the reallocation methods, and compared their runtimes, mean silhouette widths, numbers of negative silhouette widths, and the number of diagnostic species.ResultsIn terms of mean silhouette width, OPTSIL performed the best when the initial classifications already had high mean silhouette width. REMOS algorithms had slightly lower mean silhouette width than what was maximally achievable with OPTSIL but their efficiency was consistent across different initial classifications; thus REMOS was significantly superior to OPTSIL when the initial classification had low mean silhouette width. REMOS resulted in zero or a negligible number of negative silhouette widths across all classifications. OPTSIL performed similarly when the initial classification was effective but could not reach as low proportion of misclassified objects when the initial classification was inefficient. REMOS algorithms were typically more than an order of magnitude faster to calculate than OPTSIL. There was no clear difference between REMOS and OPTSIL in the number of diagnostic species.ConclusionsREMOS algorithms may be preferable to OPTSIL when (1) the primary objective is to reduce or eliminate negative silhouette widths in a classification, (2) the initial classification has low mean silhouette width, or (3) when the time efficiency of the algorithm is important because of the size of the data set or the high number of clusters.

Download Full-text

Phylogenetic position of Pelusios williamsi and a critique of current GenBank procedures (Reptilia: Testudines: Pelomedusidae)

Amphibia-Reptilia ◽

10.1163/156853812x627204 ◽

2012 ◽

Vol 33 (1) ◽

pp. 150-154 ◽

Cited By ~ 4

Author(s):

Uwe Fritz ◽

Mario Vargas-Ramírez ◽

Pavel Široký

Keyword(s):

Natural History ◽

Genetic Divergence ◽

Best Practice ◽

Nuclear Data ◽

Phylogenetic Position ◽

Published Data ◽

Data Sets ◽

Natural History Museums ◽

Data Set ◽

History Museums

We re-examine the phylogenetic position of Pelusios williamsi by merging new sequences with an earlier published data set of all Pelusios species, except the possibly extinct P. seychellensis, and the nine previously identified lineages of the closely allied genus Pelomedusa (2054 bp mtDNA, 2025 bp nDNA). Furthermore, we include new sequences of Pelusios broadleyi, P. castanoides, P. gabonensis and P. marani. Individual and combined analyses of the mitochondrial and nuclear data sets indicate that P. williamsi is sister to P. castanoides, as predicted by morphology. This provides evidence for the misidentification of GenBank sequences allegedly representing P. williamsi. Such mislabelled GenBank sequences contribute to continued confusion, because only the original submitter can revise their identification; an impractical procedure impeding the rectification of obvious mistakes. We recommend implementing another option for revising taxonomic identifications, paralleling the century-old best practice of natural history museums for new determinations of specimens. Within P. broadleyi, P. gabonensis and P. marani, there is only shallow genetic divergence, while some phylogeographic structuring is present in the wide-ranging species P. castaneus and P. castanoides.

Download Full-text

BIG: a large-scale data integration tool for renal physiology

AJP Renal Physiology ◽

10.1152/ajprenal.00249.2016 ◽

2016 ◽

Vol 311 (4) ◽

pp. F787-F792 ◽

Cited By ~ 8

Author(s):

Yue Zhao ◽

Chin-Rang Yang ◽

Viswanathan Raghuram ◽

Jaya Parulekar ◽

Mark A. Knepper

Keyword(s):

Big Data ◽

Large Scale ◽

Data Science ◽

Relevant Information ◽

Biological Information ◽

Renal Physiology ◽

Data Sets ◽

Data Set ◽

Large Scale Data ◽

Quantify Gene Expression

Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: “How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?” This is the type of problem that has motivated the “Big-Data” revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/ .

Download Full-text

Facts for Policymakers: Complex Trauma and Mental Health of Children Placed in Foster Care: Highlights from the National Center for Child Traumatic Stress (NCCTS) Core Data Set

PsycEXTRA Dataset ◽

10.1037/e617062012-001 ◽

2011 ◽

Keyword(s):

Mental Health ◽

Foster Care ◽

Traumatic Stress ◽

Complex Trauma ◽

Data Set ◽

Core Data ◽

Child Traumatic Stress

Download Full-text

The social wasp Vespula germanica (Fabricius) (Hymenoptera: Vespidae) population dynamics in England over 39 years.

The Entomologist s monthly magazine ◽

10.31184/m00138908.1542.3906 ◽

2018 ◽

Vol 154 (2) ◽

pp. 149-155

Author(s):

Michael Archer

Keyword(s):

Population Dynamics ◽

Population Dynamic ◽

Ecological Factors ◽

Social Wasp ◽

Data Sets ◽

Data Set ◽

Vespula Germanica ◽

The Social ◽

Minimum Number ◽

Suction Traps

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.

Download Full-text

Predictive and Descriptive CoMFA Models: The Effect of Variable Selection

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180212162028 ◽

2018 ◽

Vol 21 (2) ◽

pp. 117-124 ◽

Cited By ~ 4

Author(s):

Bakhtyar Sepehri ◽

Nematollah Omidikia ◽

Mohsen Kompany-Zareh ◽

Raouf Ghavami

Keyword(s):

Variable Selection ◽

Predictive Power ◽

Selection Method ◽

Data Sets ◽

Data Set ◽

Comfa Model ◽

Variable Selection Method

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets including 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors were modelled by CoMFA. First of all, for all three data sets, CoMFA models with all CoMFA descriptors were created then by applying each variable selection method a new CoMFA model was developed so for each data set, 9 CoMFA models were built. Obtained results show noisy and uninformative variables affect CoMFA results. Based on created models, applying 5 variable selection approaches including FFD, SRD-FFD, IVE-PLS, SRD-UVEPLS and SPA-jackknife increases the predictive power and stability of CoMFA models significantly. Result & Conclusion: Among them, SPA-jackknife removes most of the variables while FFD retains most of them. FFD and IVE-PLS are time consuming process while SRD-FFD and SRD-UVE-PLS run need to few seconds. Also applying FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS protect CoMFA countor maps information for both fields.

Download Full-text

Human Activity Recognition using Fourier Transform Inspired Deep Learning Combination Model

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327908666180727123657 ◽

2019 ◽

Vol 9 (1) ◽

pp. 16-31

Author(s):

Kyungkoo Jun

Keyword(s):

Fourier Transform ◽

Deep Learning ◽

Short Term Memory ◽

Window Size ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Labeling Scheme

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is helped by Long Short-Term Memory (LSTM) which captures the temporal dependency from the signal and then produces encoded sequences. The sequences, once arranged into the 2D array, can represent the fingerprints of the signals. The benefit of such transformation is that we can exploit the recent advances of the deep learning models for the image classification such as Convolutional Neural Network (CNN). Results: The proposed model, as a result, is the combination of LSTM and CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms previous works or at least equal. In the case of the second data set, we devise the schemes to generate training and testing data by changing the parameters of the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of the parameters on the performance.

Download Full-text

An Algorithm for the Removal of Cosmic Ray Artifacts in Spectral Data Sets

Applied Spectroscopy ◽

10.1177/0003702819839098 ◽

2019 ◽

Vol 73 (8) ◽

pp. 893-901

Author(s):

Sinead J. Barton ◽

Bryan M. Hennelly

Keyword(s):

Cosmic Ray ◽

Data Sets ◽

Biological Cells ◽

Statistical Classification ◽

Signal To Noise ◽

Multivariate Statistical ◽

Data Set ◽

Artefact Removal ◽

Single Capture ◽

Acquisition Method

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an affect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of comic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artefact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio and is applied to various data sets of Raman spectra recorded from biological cells.

Download Full-text

Imbalanced Data Detection Kernel Method in Closed Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3652 ◽

2013 ◽

Vol 756-759 ◽

pp. 3652-3658

Author(s):

You Li Lu ◽

Jun Luo

Keyword(s):

Kernel Methods ◽

Kernel Method ◽

Imbalanced Data ◽

Data Detection ◽

Data Sets ◽

System Call ◽

Data Set ◽

Imbalanced Data Sets ◽

Lower Complexity ◽

Closed Systems

Under the study of Kernel Methods, this paper put forward two improved algorithm which called R-SVM & I-SVDD in order to cope with the imbalanced data sets in closed systems. R-SVM used K-means algorithm clustering space samples while I-SVDD improved the performance of original SVDD by imbalanced sample training. Experiment of two sets of system call data set shows that these two algorithms are more effectively and R-SVM has a lower complexity.

Download Full-text