Efficient duplicate rate estimation from subsamples of sequencing libraries

2015 ◽  
Author(s):  
Christopher Schröder ◽  
Sven Rahmann

In high-throughput sequencing (HTS) projects, the sequenced fragments’ duplicate rate is a key quality metric. A high duplicate rate may arise from a low amount of input DNA and many PCR cycles. Many methods for downstream analyses require that duplicates be removed. If the duplicate rate is high, most of the sequencing effort and money spent would have been in vain. Therefore, it is of considerable interest to estimate the duplicate rate after sequencing only a small subsample at low depth (multiplexed with other libraries) for quality control before running the full experiment. In this article, we provide an elementary mathematical framework and an efficient computational approach based on quadratic and linear optimization to estimate the true duplicate rate from a small subsample. Our method is based on up-sampling the occupancy distribution of the reads’ copy numbers. Compared to an existing approach, we use an explicit and easily explained mathematical model that accurately inverts the sub-sampling process. We evaluate the performance of our approach in comparison to that of the existing method on several artificial and real datasets. The same ideas can be used for diversity estimation in general. Software implementing our approach is available under the MIT license.
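As a rough illustration of the quantities this abstract refers to, the following sketch computes a duplicate rate from an occupancy distribution and applies a forward sub-sampling step. It is not the authors' software: the function names, the example numbers, and the assumption that each read is kept independently with probability p (binomial thinning) are all introduced here for illustration only.

```python
from math import comb


def duplicate_rate(occupancy):
    """Fraction of reads that are duplicates.

    occupancy[k] = number of distinct fragments observed exactly k times (k >= 1).
    """
    total_reads = sum(k * n for k, n in occupancy.items())
    distinct = sum(occupancy.values())
    return 1.0 - distinct / total_reads


def expected_subsample(occupancy, p):
    """Expected occupancy after sub-sampling.

    Assumption (not from the abstract): each read is kept independently
    with probability p, so a fragment with k copies keeps j copies with
    binomial probability C(k, j) * p**j * (1 - p)**(k - j).
    """
    out = {}
    for k, n in occupancy.items():
        for j in range(1, k + 1):
            out[j] = out.get(j, 0.0) + n * comb(k, j) * p**j * (1 - p) ** (k - j)
    return out


# Hypothetical full library: 500 singletons, 300 fragments seen twice, 200 seen four times.
full = {1: 500, 2: 300, 4: 200}
sub = expected_subsample(full, 0.1)

print(f"full library duplicate rate: {duplicate_rate(full):.3f}")
print(f"10% subsample duplicate rate: {duplicate_rate(sub):.3f}")
```

In this toy example the subsample shows a much lower duplicate rate than the full library, which is why the observed subsample rate cannot be used directly and the sub-sampling step has to be inverted, as the abstract describes.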


2015 ◽  
Author(s):  
David Miguez

The understanding of the regulatory processes that orchestrate stem cell maintenance is a cornerstone in developmental biology. Here, we present a mathematical model based on a branching process formalism that predicts average rates of proliferative and differentiative divisions in a given stem cell population. In the context of vertebrate spinal neurogenesis, the model predicts complex non-monotonic variations in the rates of pp, pd and dd modes of division as well as in cell cycle length, in agreement with experimental results. Moreover, the model shows that the differentiation probability follows a binomial distribution, allowing us to develop equations to predict the rates of each mode of division. A phenomenological simulation of the developing spinal cord informed with the average cell cycle length and division rates predicted by the mathematical model reproduces the correct dynamics of proliferation and differentiation in terms of average numbers of progenitors and differentiated cells. Overall, the present mathematical framework represents a powerful tool to unveil the changes in the rate and mode of division of a given stem cell pool by simply quantifying numbers of cells at different times.
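One way to read the binomial statement in this abstract is sketched below. The abstract does not give the equations, so this assumes that each of the two daughter cells differentiates independently with the same probability q; the function names and the numbers are hypothetical and the paper's actual formulation may differ.

```python
def division_mode_rates(q):
    """Rates of the pp, pd and dd division modes.

    Assumption: each of the two daughters differentiates independently
    with probability q, giving a binomial distribution over the number
    of differentiated daughters (0, 1 or 2).
    """
    pp = (1 - q) ** 2      # both daughters remain progenitors
    pd = 2 * q * (1 - q)   # one progenitor, one differentiated cell
    dd = q ** 2            # both daughters differentiate
    return pp, pd, dd


def expected_progenitors(p0, q, divisions):
    """Expected progenitor count after a number of synchronous division rounds.

    Each division yields on average 2 * (1 - q) progenitors
    (equivalently 2 * pp + pd), the mean of a simple branching process.
    """
    return p0 * (2 * (1 - q)) ** divisions


pp, pd, dd = division_mode_rates(0.3)
print(f"pp={pp:.2f} pd={pd:.2f} dd={dd:.2f}")
print(expected_progenitors(100, 0.3, 3))
```

Under this assumption the progenitor pool grows when q is below 0.5, stays constant on average at q = 0.5, and shrinks above it.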


Plants ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 2120
Author(s):  
Jessica Frigerio ◽  
Giulia Agostinetto ◽  
Valerio Mezzasalma ◽  
Fabrizio De Mattia ◽  
Massimo Labra ◽  
...  

Medicinal plants have been widely used in traditional medicine due to their therapeutic properties. Although they are mostly used as herbal infusions and tinctures, their employment as ingredients of food supplements is increasing. However, fraud and adulteration are widespread issues. In our study, we aimed at evaluating DNA metabarcoding as a tool to identify product composition. To accomplish this, we analyzed fifteen commercial products with DNA metabarcoding, using two barcode regions: psbA-trnH and ITS2. Results showed that on average, 70% (44–100) of the declared ingredients were identified. The ITS2 marker appears to identify more species (n = 60) than psbA-trnH (n = 35), with an ingredients’ identification rate of 52% versus 45%, respectively. Some species were identified by only one of the two markers. Additionally, to evaluate the quantitative ability of high-throughput sequencing (HTS) to compare the plant component to the corresponding assigned sequences, we created six mock mixtures of plants in the laboratory, starting both from biomass and gDNA. Our analysis also supports the application of DNA metabarcoding for a relative quantitative analysis. These results move towards the application of HTS analysis for studying the composition of herbal teas for medicinal plants’ traceability and quality control.


Author(s):  
Muddasar Anwar ◽  
Toufik Al Khawli ◽  
Irfan Hussain ◽  
Dongming Gan ◽  
Federico Renda

Purpose: This paper aims to present a soft closed-chain modular gripper for robotic pick-and-place applications. The proposed biomimetic gripper design is inspired by the Fin Ray effect, derived from fish fin physiology. It is composed of three axisymmetric fingers, actuated with a single actuator. Each finger has a modular under-actuated closed-chain structure. The finger structure is compliant in the contact normal direction, with stiff crossbeams reorienting to help the finger structure conform around objects. Design/methodology/approach: Starting with the design and development of the proposed gripper, a consequent mathematical representation consisting of closed-chain forward and inverse kinematics is detailed. The proposed mathematical framework is validated through finite element modeling simulations. Additionally, a set of experiments was conducted to compare the simulated and prototype finger trajectories, as well as to assess qualitative grasping ability. Findings: The key findings are the presented mathematical model for closed-loop chain mechanisms, as well as design and optimization guidelines to develop controlled closed-chain grippers. Research limitations/implications: The proposed methodology and mathematical model could be taken as a fundamental modular base block to explore similar distributed degrees of freedom (DOF) closed-chain manipulators and grippers. The enhanced kinematic model contributes to optimized dynamics and control of soft closed-chain grasping mechanisms. Practical implications: The approach is aimed at improving the development of soft grippers that are required to grasp complex objects found in human–robot cooperation and collaborative robot (cobot) applications. Originality/value: The proposed closed-chain mathematical framework is based on distributed DOFs instead of the conventional lumped joint approach. This is to better optimize and understand the kinematics of soft robotic mechanisms.


1992 ◽  
Vol 114 (1) ◽  
pp. 174-179 ◽  
Author(s):  
N. P. Juster ◽  
P. M. Dew ◽  
A. de Pennington

One of the tests carried out by designers in an attempt to check whether an assembly of components will function correctly is tolerance analysis. Tolerance analysis, although relatively straightforward, is liable to be time consuming and error prone. It cannot be automated unless a suitable mathematical framework is developed to model the variations introduced by the manufacturing process. The designer allows for the variations by means of tolerances attached to the dimensions. This paper describes a suitable mathematical model and shows how it may be used to automate linear worst case tolerance analysis across assemblies. Experimental software has been written, based on the theory.
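For readers unfamiliar with linear worst-case tolerance analysis, a minimal sketch of the standard stack-up calculation follows. It is not the experimental software mentioned in the abstract; the function name, the tuple layout, and the example dimensions are hypothetical, and symmetric tolerances (nominal ± tolerance) are assumed.

```python
def worst_case_stack(chain):
    """Worst-case limits of a linear dimension chain.

    chain: list of (direction, nominal, tolerance) tuples, where direction
    is +1 or -1 depending on whether the dimension adds to or subtracts
    from the resulting gap. Tolerances are assumed symmetric.
    """
    nominal = sum(d * n for d, n, _ in chain)
    # In the worst case every tolerance contributes with the same sign,
    # so the spreads add regardless of direction.
    spread = sum(t for _, _, t in chain)
    return nominal - spread, nominal + spread


# Hypothetical assembly: a 50 mm housing slot holding two parts.
chain = [
    (+1, 50.0, 0.10),   # housing slot
    (-1, 20.0, 0.05),   # first part
    (-1, 29.5, 0.05),   # second part
]
low, high = worst_case_stack(chain)
print(f"gap between {low:.2f} mm and {high:.2f} mm")
```

A positive lower limit indicates that the parts fit even when every dimension sits at its least favourable extreme.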


2017 ◽  
Vol 63 (6) ◽  
pp. 502-515 ◽  
Author(s):  
Natalie P. Blain ◽  
Bobbi L. Helgason ◽  
James J. Germida

The Bitumount Provincial Historic site is the location of 2 of the world’s first oil-extracting and -refining operations. Despite hydrocarbon levels ranging from 330 to 24 700 mg·(kg soil)⁻¹, plants have been able to recolonize the site through natural revegetation. This study was designed to achieve a better understanding of the plant-root-associated bacterial partnerships occurring within naturally revegetated hydrocarbon-contaminated soils. Root endophytic bacterial communities were characterized from representative plant species throughout the site by both high-throughput sequencing and culturing techniques. Population abundance of rhizosphere and root endosphere bacteria was significantly influenced (p < 0.05) by plant species and sampling location. In general, members of the orders Actinomycetales, Rhizobiales, Pseudomonadales, Burkholderiales, and Sphingomonadales were the most commonly identified. Community structure of root-associated bacteria was influenced by both plant species and sampling location. Quantitative real-time polymerase chain reaction was used to determine the potential functional diversity of the root endophytic bacteria. The gene copy numbers of 16S rRNA and 2 hydrocarbon-degrading genes (CYP153 and alkB) were significantly affected (p < 0.05) by the interaction of plant species and sampling location. Our findings suggest that some of the bacterial communities detected are known to exhibit plant growth promotion characteristics.


2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13890 ◽  
Author(s):  
Changjin Hong ◽  
Solaiappan Manimaran ◽  
William Evan Johnson

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .

