Third-party Annotations: Linking PlutoF platform and the ELIXIR Contextual Data ClearingHouse for the reporting of source material annotation gaps and inaccuracies

Author(s):  
Kessy Abarenkov ◽  
Allan Zirk ◽  
Kadri Põldmaa ◽  
Timo Piirmann ◽  
Raivo Pöhönen ◽  
...  

Third-party annotations are a valuable resource to improve the quality of public DNA sequences. For example, sequences in the International Nucleotide Sequence Database Collaboration (INSDC) databases often lack important features such as taxon interactions, species-level identification, and information associated with habitat, locality, country, coordinates, etc. Therefore, initiatives to mine additional information from publications and link it to the public DNA sequences have become common practice (e.g. Tedersoo et al. 2011, Nilsson et al. 2014, Groom et al. 2021). However, third-party annotations have their own specific challenges. For example, annotations can be inaccurate and therefore must be open for permanent data management. Further, every DNA sequence (except sequences from type material) can carry different species names, which must be databased as equal scientific hypotheses. The PlutoF platform provides such data management services for third-party annotations. PlutoF is an online data management platform and computing service provider for biology and related disciplines. Registered users can enter and manage a wide range of data, e.g., taxon occurrences, metabarcoding data, taxon classifications, traits, and lab data. It also features an annotation module where third-party annotations (on material source, geolocation and habitat, taxonomic identifications, interacting taxa, etc.) can be added to any collection specimen, living culture or DNA sequence record. The UNITE Community is using these services to annotate and improve the quality of INSDC rDNA Internal Transcribed Spacer (ITS) sequence datasets. The National Center for Biotechnology Information (NCBI) is linking its ITS sequences with their annotations in PlutoF. However, an automated solution for linking annotations in PlutoF with any sequence and sample record stored in INSDC databases is still missing.
One of the ambitions of the BiCIKL Project is to solve this through operating the ELIXIR Contextual Data ClearingHouse (CDCH). CDCH offers a light and simple RESTful Application Programming Interface (API) to enable extension, correction and improvement of publicly available annotations on sample and sequence records available in ELIXIR data resources. It facilitates feeding improved or corrected annotations from databases (such as secondary databases, e.g., PlutoF, which consume and curate data from repositories) back to primary repositories (databases of the three INSDC collaborative partners). In the Biodiversity Community Integrated Knowledge Library (BiCIKL) Project, the University of Tartu Natural History Museum is leading the task of linking the two components—the web interface provided by the PlutoF platform and CDCH APIs—to allow user-friendly and effortless reporting of errors and gaps in sequenced material source annotations. The API and web interface will be promoted to those communities (such as taxonomists, those abstracting from the literature, and those already using the community curated data) with the appropriate knowledge and tools who will be encouraged to report their enhanced annotations back to primary repositories.
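As a rough illustration of what reporting a gap or a correction to a clearinghouse-style REST API could involve, the sketch below builds a JSON body for one annotation report. Every field name, the `AnnotationReport` class, and the example values are hypothetical placeholders chosen for this sketch; they are not taken from the real CDCH schema or from PlutoF, and no network call is made.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AnnotationReport:
    """One proposed change to a public record (all field names are hypothetical)."""
    record_id: str             # accession of the sequence or sample record
    attribute: str             # annotated attribute, e.g. "country"
    old_value: Optional[str]   # value currently in the primary repository, None if absent
    new_value: str             # value proposed by the third-party annotation
    evidence: str              # free-text pointer to the supporting evidence

    def kind(self) -> str:
        """Report a gap as an 'extension' and a changed value as a 'correction'."""
        return "extension" if self.old_value is None else "correction"


def to_json(report: AnnotationReport) -> str:
    """Serialize the report, including its derived kind, as a JSON request body."""
    body = asdict(report)
    body["kind"] = report.kind()
    return json.dumps(body, sort_keys=True)


# Illustrative values only; the accession is a placeholder, not a real record.
example = AnnotationReport(
    record_id="ACCESSION_PLACEHOLDER",
    attribute="country",
    old_value=None,
    new_value="Estonia",
    evidence="placeholder reference to a publication",
)
print(to_json(example))
```

A real client would have to follow the field names, authentication, and endpoints defined by the service itself.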

Author(s):  
Johannes Felix Simon Brachmann ◽  
Andreas Baumgartner ◽  
Peter Gege

The Calibration Home Base (CHB) is an optical laboratory designed for the calibration of imaging spectrometers for the VNIR/SWIR wavelength range. Radiometric, spectral and geometric calibration as well as the characterization of sensor signal dependency on polarization are realized in a precise and highly automated fashion. This allows a wide range of time-consuming measurements to be carried out efficiently. The implementation of ISO 9001 standards in all procedures ensures a traceable quality of results. Spectral measurements in the wavelength range 380–1000 nm are performed to a wavelength uncertainty of ±0.1 nm, while an uncertainty of ±0.2 nm is reached in the wavelength range 1000–2500 nm. Geometric measurements are performed at increments of 1.7 µrad across track and 7.6 µrad along track. Radiometric measurements reach an absolute uncertainty of ±3% (k=1). Sensor artifacts, such as those caused by stray light, will be characterizable and correctable in the near future. For now, the CHB is suitable for the characterization of pushbroom sensors, spectrometers and cameras. However, it is planned to extend the CHB's capabilities in the near future such that snapshot hyperspectral imagers can be characterized as well. The calibration services of the CHB are open to third-party customers from research institutes as well as industry.


Bioinformatics, now a well-known field of study, originated in the context of biological sequence analysis. Recently, graphical representation has been adopted in research on DNA sequences. Research on biological sequences focuses mainly on their function and structure. Bioinformatics finds a wide range of applications, specifically in the domain of molecular biology, which focuses on the analysis of molecules such as DNA, RNA and proteins. In this review, we mainly deal with similarity analysis between sequences and the graphical representation of DNA sequences.
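To make the idea of a graphical representation concrete, here is a minimal sketch of one simple family of such representations: a 2D walk in which each nucleotide moves a point by one unit step, with two walks then compared through their centroids. The step assignment in `STEPS`, the function names, and the centroid-based distance are assumptions made for this illustration; the review does not specify which representations or similarity measures it covers.

```python
from math import dist

# Assumed step assignment: one unit step in the plane per nucleotide.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def dna_walk(seq):
    """Return the list of points visited by the walk, starting at the origin."""
    x = y = 0
    points = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS[base]
        x += dx
        y += dy
        points.append((x, y))
    return points


def centroid(points):
    """Geometric center of the visited points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def walk_distance(seq_a, seq_b):
    """Crude dissimilarity: Euclidean distance between the two walk centroids."""
    return dist(centroid(dna_walk(seq_a)), centroid(dna_walk(seq_b)))


print(dna_walk("ACGT"))
print(walk_distance("ACGT", "AAAA"))
```

A single centroid discards most of the curve's shape, so this is only the simplest possible descriptor; richer numerical descriptors of the same curve can be substituted in `walk_distance`.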


2015 ◽  
Vol 4 (2) ◽  
pp. 203-213 ◽  
Author(s):  
M. B. Krassovski ◽  
J. S. Riggs ◽  
L. A. Hook ◽  
W. R. Nettles ◽  
P. J. Hanson ◽  
...  

Abstract. Ecosystem-scale manipulation experiments represent large science investments that require well-designed data acquisition and management systems to provide reliable, accurate information to project participants and third-party users. The SPRUCE project (Spruce and Peatland Responses Under Climatic and Environmental Change, http://mnspruce.ornl.gov) is such an experiment funded by the Department of Energy's (DOE) Office of Science, Terrestrial Ecosystem Science (TES) Program. The SPRUCE experimental mission is to assess ecosystem-level biological responses of vulnerable, high-carbon terrestrial ecosystems to a range of climate warming manipulations and an elevated CO2 atmosphere. SPRUCE provides a platform for testing mechanisms controlling the vulnerability of organisms, biogeochemical processes, and ecosystems to climatic change (e.g., thresholds for organism decline or mortality, limitations to regeneration, biogeochemical limitations to productivity, and the cycling and release of CO2 and CH4 to the atmosphere). The SPRUCE experiment will generate a wide range of continuous and discrete measurements. To successfully manage SPRUCE data collection, achieve SPRUCE science objectives, and support broader climate change research, the research staff has designed a flexible data system using proven network technologies and software components. The primary SPRUCE data system components are the following: 1. data acquisition and control system – set of hardware and software to retrieve biological and engineering data from sensors, collect sensor status information, and distribute feedback to control components; 2. data collection system – set of hardware and software to deliver data to a central depository for storage and further processing; 3. data management plan – set of plans, policies, and practices to control consistency, protect data integrity, and deliver data.
This publication presents our approach to meeting the challenges of designing and constructing an efficient data system for managing high volume sources of in situ observations in a remote, harsh environmental location. The approach covers data flow starting from the sensors and ending at the archival/distribution points, discusses types of hardware and software used, examines design considerations that were used to choose them, and describes the data management practices chosen to control and enhance the value of the data.


2021 ◽  
Vol 12 ◽  
Author(s):  
Xue Li ◽  
Ziqi Wei ◽  
Bin Wang ◽  
Tao Song

DNA computing is a new method based on molecular biotechnology to solve complex problems. The design of DNA sequences is a multi-objective optimization problem in DNA computing, whose objective is to obtain optimized sequences that satisfy multiple constraints to improve the quality of the sequences. However, previously optimized DNA sequences reacted with each other, which reduced the number of DNA sequences that could be used for molecular hybridization in the solution and thus reduced the accuracy of DNA computing. In addition, a DNA sequence and its complement follow the principle of complementary pairing, and a sequence with G or C bases at both ends is more stable. To address these problems, the Pairing Sequences Constraint (PSC) and the Close-ending constraint, along with the Improved Chaos Whale (ICW) optimization algorithm, were proposed to construct a DNA sequence set that satisfies the combination of constraints. The ICW optimization algorithm adds a new predator–prey strategy and sine and cosine functions under the action of chaos. Compared with other algorithms, among the 23 benchmark functions, the new algorithm obtained the minimum value for one-third of the functions and two-thirds of the current minimum value. The DNA sequences satisfying the constraint combination obtained the minimum of fitness values and had stable and usable structures.
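The two constraints named in the abstract can be sketched as simple predicates on candidate sequences. The readings used here are assumptions, since the abstract gives no formal definitions: `close_ending` takes "Close-ending" to mean that both terminal bases are G or C, and `pairs_only_with_own_complement` is a stand-in for the PSC that requires a sequence to stay at least `min_mismatches` Hamming mismatches away from the reverse complement of every other sequence in the set. The function names and the threshold are illustrative, not the authors' formulation.

```python
# Translation table for base complementation (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def close_ending(seq):
    """Assumed reading of Close-ending: both terminal bases are G or C."""
    return seq[0] in "GC" and seq[-1] in "GC"


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pairs_only_with_own_complement(seq, others, min_mismatches):
    """Assumed stand-in for PSC: seq must not closely match the reverse
    complement of any other sequence in the set."""
    return all(
        hamming(seq, reverse_complement(other)) >= min_mismatches
        for other in others
    )


print(reverse_complement("GATTC"))
print(close_ending("GATTC"), close_ending("AATTC"))
print(pairs_only_with_own_complement("GATTC", ["CCCCC"], min_mismatches=3))
```

In an optimizer, predicates like these would typically be turned into penalty terms of the fitness function rather than hard filters.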


2020 ◽  
Author(s):  
John Michael Adrian Wojahn

Humans have become a major factor in reshaping the Earth’s biosphere. One of the major effects of human changes to the environment is an increase in the rate of species extinction as compared to background rates. Biodiversity hotspots are areas whose species assemblages are very rich (50% of the world’s plants and 42% of land vertebrates) yet very threatened with extinction (> 70% habitat destruction), and which ought to be foci for conservation efforts. The intense peril facing the flora of these endangered regions requires an equally intense response from the scientific community. This study investigated the benefits of adding genomic information to voucher specimens to alleviate the Linnaean (lack of species description), Wallacean (lack of data on species distribution) and Darwinian (lack of data on species evolution) shortfalls. An open-source R bioinformatic pipeline was developed to determine the percentage of vascular plant species present in biodiversity hotspots with at least one reproducible DNA sequence deposited on GenBank. Reproducible DNA sequences were defined as being underpinned by traceable material and methods and accurate taxonomic identifications. A vascular plant species checklist for the 36 biodiversity hotspots was inferred using 32,914,892 GBIF occurrences, comprising 204,044 species. A total of 736,532 GenBank accessions (representing DNA barcodes) were downloaded for those species. Associated abstracts and metadata were mined from 3,127 publications deposited on PubMed to assess DNA sequence reproducibility. The reproducibility of each study was tested by a sentiment analysis (natural language processing). Overall, the analyses indicated that the reproducibility crisis also extended to the realm of biodiversity. There was a significant shortfall in genetic information available for biodiversity hotspots, where 80.3% of the sequences produced (591,431) were not reproducible.
This meant that only 19.7% of sequences, representing only 37,637 species (18% of the total), were reproducible. This phenomenon was named the Wu-Meyersian shortfall to recognize that we are critically lacking DNA sequence data for threatened biodiversity. This shortfall was named in honor of Ray Wu (the father of DNA sequencing; 1928-2008) and Norman Meyers (a pioneer in establishing biodiversity hotspots; 1934-2019). Working on this shortfall could contribute to alleviating the Linnaean, Wallacean and Darwinian shortfalls and support conservation. Information was particularly lacking in tropical biodiversity hotspots, but no biodiversity hotspot other than Japan had > 50% of its flora reproducibly sequenced. Older biodiversity hotspots were less known than those established more recently. This is concerning since those are among the most diverse and threatened (e.g. Madagascar, Sundaland). From a DNA region perspective, ITS (23,422 species), matK (17,164 species), and rbcL (16,509 species) were the most commonly used barcodes. From a lineage perspective, gymnosperms (N=895) are exceptionally well-sequenced, with three quarters of their species having been reproducibly sequenced. Angiosperms are comparatively poorly sequenced (18%), but this may be explained by their extreme diversity (N=195,433). Finally, ferns and their allies (N=7,716) are poorly sequenced (22%). This is especially troubling because extinction of these species would represent the loss of hundreds of millions of years of unique evolutionary history. Finally, this study proposed best practices to maximize the reproducibility of DNA sequences produced by the scientific community. The bioinformatic pipeline can be applied to systems at multiple geographical scales and to any taxonomic group and is therefore appealing to a wide range of stakeholders. We recommend using it periodically to monitor progress towards alleviating the Wu-Meyersian shortfall.
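The headline percentages can be rechecked from the counts reported in the abstract. The sketch below is only that arithmetic, written in Python for illustration; it is not part of the R pipeline described above, and the variable names are chosen for this example.

```python
def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


# Counts quoted in the abstract.
total_accessions = 736_532       # GenBank accessions downloaded
non_reproducible = 591_431       # accessions judged not reproducible
total_species = 204_044          # species in the hotspot checklist
reproducible_species = 37_637    # species with a reproducible sequence

non_repro_pct = round(percent(non_reproducible, total_accessions), 1)
species_pct = round(percent(reproducible_species, total_species), 1)

print(non_repro_pct)                    # share of non-reproducible sequences
print(round(100.0 - non_repro_pct, 1))  # share of reproducible sequences
print(species_pct)                      # share of species reproducibly sequenced
```

Run as written, the three values come out as 80.3, 19.7 and 18.4, which agrees with the figures in the abstract (the last one is reported there rounded to 18%).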


Author(s):  
Barbara Trask ◽  
Susan Allen ◽  
Anne Bergmann ◽  
Mari Christensen ◽  
Anne Fertitta ◽  
...  

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.


2020 ◽  
Vol 7 (2) ◽  
pp. 34-41
Author(s):  
VLADIMIR NIKONOV ◽  
ANTON ZOBOV

The construction and selection of a suitable bijective function, that is, a substitution, is becoming an important applied task, particularly for building block encryption systems. Many articles have suggested different approaches to determining the quality of a substitution, but most of them are highly computationally complex. Solving this problem will significantly expand the range of methods for constructing and analyzing schemes in information protection systems. The purpose of the research is to find easily measurable characteristics of substitutions that allow their quality to be evaluated, as well as measures of the proximity of a particular substitution to a random one, or its distance from it. For this purpose, several characteristics were proposed in this work: difference and polynomial, and their mathematical expectation was found, as well as variance for the difference characteristic. This allows us to draw a conclusion about the quality of a particular substitution by comparing the value of the characteristic calculated for it with the calculated mathematical expectation. From a computational point of view, the theses of the article are of exceptional interest due to the simplicity of the algorithm for quantifying the quality of bijective function substitutions. By its nature, the operation of calculating the difference characteristic carries out a simple summation of integer terms in a fixed and small range. Such an operation, both in the modern and in the prospective element base, is embedded in the logic of a wide range of functional elements, especially when implementing computational actions in the optical range, or on other carriers related to the field of nanotechnology.
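The general recipe described above (compute a cheap integer characteristic of a substitution, then compare it with its expectation over random substitutions) can be sketched as follows. The abstract does not define the difference characteristic, so `difference_sum` below is a stand-in assumed for this example: the sum of absolute differences between neighbouring outputs. The closed-form expectation is a standard result for that stand-in, not a formula from the article.

```python
from fractions import Fraction
from itertools import permutations


def difference_sum(perm):
    """Stand-in characteristic: sum of |perm[i+1] - perm[i]| over neighbours."""
    return sum(abs(perm[i + 1] - perm[i]) for i in range(len(perm) - 1))


def expected_difference_sum(n):
    """Expectation over uniformly random permutations of 0..n-1.

    Each neighbouring pair is a uniform ordered pair of distinct values,
    whose mean absolute difference is (n + 1) / 3; there are n - 1 pairs.
    """
    return Fraction((n - 1) * (n + 1), 3)


def brute_force_mean(n):
    """Exact mean of the characteristic over all n! permutations (small n only)."""
    perms = list(permutations(range(n)))
    return Fraction(sum(difference_sum(p) for p in perms), len(perms))


identity = (0, 1, 2, 3)
print(difference_sum(identity), expected_difference_sum(4))
print(brute_force_mean(4) == expected_difference_sum(4))
```

Here the identity substitution scores 3 against an expectation of 5, so even this simple stand-in flags it as far from a typical random substitution; how far counts as significant would depend on the variance, which the article derives for its own characteristic.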


2019 ◽  
pp. 462-471
Author(s):  
Lyudmila Shirokova

The historical polyethnicity of Slovak society and the related problems of the interrelations of cultures, ethics and interpersonal relations are reflected in the works of modern Slovak prose. They are represented most clearly in the novels of the middle-generation writers P. Rankov, S. Lavrík and P. Krištúfek. These novels dwell upon the dramatic events of the 20th century. They cover a wide range of problems, from the fruitful coexistence of various ethnic groups and their representatives to national contradictions and racial repressions. The artistic quality of these works, their composition, the manner of narration and the type of the main character can be rated highly. For example, in the novel by P. Rankov the plot, in spite of its linearity, is a chain of episodes spanning 30 years in the life of the main characters. It reflects not only their fates, but also the historical and political changes of the world they live in. The main female character of S. Lavrík's novel narrates the everyday life and tragedies of the inhabitants of a Slovak town in the Slovak Republic during the war. P. Krištúfek in his novel focuses on several decades in the life of a Slovak-Jewish family and the inhabitants of a Slovak provincial society, with types and relations specific to this milieu.

