Simulating the Dynamics of Targeted Capture Sequencing with CapSim

Mapping Intimacies ◽

10.1101/134510 ◽

2017 ◽

Cited By ~ 1

Author(s):

Minh Duc Cao ◽

Devika Ganesamoorthy ◽

Lachlan J.M. Coin

Keyword(s):

Statistical Power ◽

Simulated Data ◽

Targeted Sequencing ◽

Design Parameters ◽

Probe Design ◽

Analysis Pipeline ◽

A Genome ◽

Targeted Capture ◽

Sequencing Platforms ◽

Sequencing Process

Abstract. Motivation: Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions, which gives greater statistical power in downstream analysis. However, because of the dynamics of the hybridisation process, it is difficult to evaluate the efficiency of a probe design prior to the experiments, which are time-consuming and costly. Results: We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridisation, and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimise the efficacy of the experiments. Availability: CapSim is publicly available under the BSD license at https://github.com/mdcao/capsim.

AFLAP: assembly-free linkage analysis pipeline using k-mers from genome sequencing data

Genome Biology ◽

10.1186/s13059-021-02326-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Kyle Fletcher ◽

Lin Zhang ◽

Juliana Gil ◽

Rongkui Han ◽

Keri Cavanaugh ◽

...

Keyword(s):

Linkage Analysis ◽

Genome Sequencing ◽

Genome Assembly ◽

Simulated Data ◽

Genetic Maps ◽

Sequencing Data ◽

Analysis Pipeline ◽

A Genome ◽

Genotype By Sequencing ◽

Genome Assemblies

Abstract: Our assembly-free linkage analysis pipeline (AFLAP) identifies segregating markers as k-mers in the raw reads without using a reference genome assembly for calling variants and provides genotype tables for the construction of unbiased, high-density genetic maps without a genome assembly. AFLAP is validated and contrasted to a conventional workflow using simulated data. AFLAP is applied to whole genome sequencing and genotype-by-sequencing data of F1, F2, and recombinant inbred populations of two different plant species, producing genetic maps that are concordant with genome assemblies. The AFLAP-based genetic map for Bremia lactucae enables the production of a chromosome-scale genome assembly.
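
The idea of finding markers as k-mers directly in raw reads can be illustrated with a toy sketch. This is not AFLAP's implementation: the function names, the `min_count` threshold, and the rule "present in one parent, absent in the other" are assumptions made only for illustration.

```python
from collections import Counter
from typing import Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer (forward strand only) across a set of reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def parent_specific_kmers(reads_a: Iterable[str], reads_b: Iterable[str],
                          k: int, min_count: int = 1) -> Set[str]:
    """Hypothetical marker rule: k-mers seen at least `min_count` times
    in parent A and never in parent B."""
    a = count_kmers(reads_a, k)
    b = count_kmers(reads_b, k)
    return {kmer for kmer, c in a.items() if c >= min_count and kmer not in b}


if __name__ == "__main__":
    # Two toy "parents" that differ at the last base.
    print(parent_specific_kmers(["ACGTA"], ["ACGTT"], k=4))
```

Such candidate k-mers could then be scored as present or absent in each progeny sample to build a genotype table; a real pipeline would also need to handle sequencing errors and reverse complements, which this sketch ignores.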

Building a Genome Analysis Pipeline to Predict Disease Risk and Prevent Disease

Journal of Molecular Biology ◽

10.1016/j.jmb.2013.07.038 ◽

2013 ◽

Vol 425 (21) ◽

pp. 3993-4005 ◽

Cited By ~ 27

Author(s):

Y. Bromberg

Keyword(s):

Genome Analysis ◽

Disease Risk ◽

Analysis Pipeline ◽

A Genome

Optimal Sample Allocation Under Unequal Costs in Cluster-Randomized Trials

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998620912418 ◽

2020 ◽

Vol 45 (4) ◽

pp. 446-474

Author(s):

Zuchao Shen ◽

Benjamin Kelcey

Keyword(s):

Optimal Design ◽

Statistical Power ◽

Randomized Trials ◽

Intraclass Correlation ◽

R Package ◽

Design Parameters ◽

Cluster Randomized Trials ◽

Cost Structures ◽

Cluster Randomized ◽

Sample Allocation

Conventional optimal design frameworks consider a narrow range of sampling cost structures that thereby constrict their capacity to identify the most powerful and efficient designs. We relax several constraints of previous optimal design frameworks by allowing for variable sampling costs in cluster-randomized trials. The proposed framework introduces additional design considerations and has the potential to identify designs with more statistical power, even when some parameters are constrained due to immutable practical concerns. The results also suggest that the gains in efficiency introduced through the expanded framework are fairly robust to misspecifications of the expanded cost structure and concomitant design parameters (e.g., intraclass correlation coefficient). The proposed framework is implemented in the R package odr.

GenMap: ultra-fast computation of genome mappability

Bioinformatics ◽

10.1093/bioinformatics/btaa222 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3687-3692 ◽

Cited By ~ 1

Author(s):

Christopher Pockrandt ◽

Mai Alzamel ◽

Costas S Iliopoulos ◽

Knut Reinert

Keyword(s):

Source Code ◽

Probe Design ◽

Fast Method ◽

Biological Applications ◽

Fast Computation ◽

Genomic Position ◽

Guide Rna ◽

Binary Output ◽

A Genome ◽

Reciprocal Value

Abstract. Motivation: Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging. However, it is crucial for many biological applications such as the design of guide RNA for CRISPR experiments. More formally, the uniqueness or (k, e)-mappability can be described for every position as the reciprocal value of how often this k-mer occurs approximately in the genome, i.e. with up to e mismatches. Results: We present a fast method GenMap to compute the (k, e)-mappability. We extend the mappability algorithm, such that it can also be computed across multiple genomes where a k-mer occurrence is only counted once per genome. This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. GenMap supports different formats such as binary output, wig and bed files as well as csv files to export the location of all approximate k-mers for each genomic position. Availability and implementation: GenMap can be installed via bioconda. Binaries and C++ source code are available on https://github.com/cpockrandt/genmap.
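
The definition of (k, e)-mappability given in the abstract can be made concrete with a brute-force sketch. This is a quadratic-time illustration of the definition only, not GenMap's algorithm; the function names are hypothetical and only the forward strand of a single sequence is considered.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mappability(genome: str, k: int, e: int) -> List[float]:
    """Naive (k, e)-mappability: for each start position, the reciprocal of
    the number of k-mers in the genome within Hamming distance e of the
    k-mer at that position (the k-mer itself is always counted)."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    values = []
    for query in kmers:
        occurrences = sum(1 for target in kmers if hamming(query, target) <= e)
        values.append(1.0 / occurrences)
    return values


if __name__ == "__main__":
    # "ACGT" occurs twice in this toy genome, so positions 0 and 4 get 0.5.
    print(mappability("ACGTACGT", k=4, e=0))
```

A value of 1.0 marks a k-mer that is unique under the chosen mismatch budget, which is the property one looks for when picking probe or guide RNA candidates.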

A genome-wide scan for a simulated data set using two newly developed methods

Genetic Epidemiology ◽

10.1002/gepi.13701707101 ◽

1999 ◽

Vol 17 (S1) ◽

pp. S621-S626

Author(s):

Li Hsu ◽

Corinne Aragaki ◽

Filemon Quiaoit ◽

Xiangjing Wang ◽

Xiubin Xu ◽

...

Keyword(s):

Simulated Data ◽

Data Set ◽

Genome Wide ◽

A Genome ◽

Genome Wide Scan

An ancestral recombination graph of human, Neanderthal, and Denisovan genomes

Science Advances ◽

10.1126/sciadv.abc0776 ◽

2021 ◽

Vol 7 (29) ◽

pp. eabc0776

Author(s):

Nathan K. Schaefer ◽

Beth Shapiro ◽

Richard E. Green

Keyword(s):

Incomplete Lineage Sorting ◽

Simulated Data ◽

Modern Human ◽

Ancestral Recombination Graph ◽

Lineage Sorting ◽

Human Genomes ◽

Genome Wide ◽

A Genome ◽

Graph Inference ◽

And Function

Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.

A Bayesian nonparametric semi-supervised model for integration of multiple single-cell experiments

10.1101/2020.01.14.906313 ◽

2020 ◽

Author(s):

Archit Verma ◽

Barbara Engelhardt

Keyword(s):

Single Cell ◽

Latent Variable ◽

Environmental Variability ◽

Simulated Data ◽

Joint Analysis ◽

Variable Model ◽

Manifold Alignment ◽

Multiple Data Sets ◽

Sequencing Platforms ◽

Low Dimensional

Joint analysis of multiple single cell RNA-sequencing (scRNA-seq) data is confounded by technical batch effects across experiments, biological or environmental variability across cells, and different capture processes across sequencing platforms. Manifold alignment is a principled, effective tool for integrating multiple data sets and controlling for confounding factors. We demonstrate that the semi-supervised t-distributed Gaussian process latent variable model (sstGPLVM), which projects the data onto a mixture of fixed and latent dimensions, can learn a unified low-dimensional embedding for multiple single cell experiments with minimal assumptions. We show the efficacy of the model as compared with state-of-the-art methods for single cell data integration on simulated data, pancreas cells from four sequencing technologies, induced pluripotent stem cells from male and female donors, and mouse brain cells from both spatial seqFISH+ and traditional scRNA-seq. Code and data are available at https://github.com/architverma1/sc-manifold-alignment

A Universal Analysis Pipeline for Hybrid Capture-Based Targeted Sequencing Data with Unique Molecular Indexes

Genomics & Informatics ◽

10.5808/gi.2018.16.4.e29 ◽

2018 ◽

Vol 16 (4) ◽

pp. e29 ◽

Cited By ~ 1

Author(s):

Min-Jung Kim ◽

Si-Cho Kim ◽

Young-Joon Kim

Keyword(s):

Targeted Sequencing ◽

Sequencing Data ◽

Analysis Pipeline ◽

Hybrid Capture

A Clustering Approach for Motif Discovery in ChIP-Seq Dataset

Entropy ◽

10.3390/e21080802 ◽

2019 ◽

Vol 21 (8) ◽

pp. 802

Author(s):

Chun-xiao Sun ◽

Yu Yang ◽

Hua Wang ◽

Wen-hu Wang

Keyword(s):

Dna Sequences ◽

Motif Discovery ◽

Simulated Data ◽

Data Set ◽

Genome Wide ◽

A Genome ◽

Wide Scale ◽

Clustering Approach ◽

Ap Clustering ◽

Generation Sequencing

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To effectively and efficiently discover TFBSs in the thousand or more DNA sequences generated by a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and further filter the cluster subsets. Then, we use Affinity Propagation (AP) clustering on the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that the AP-ChIP algorithm is able to make an almost accurate prediction of TFBSs in a reasonable time. Also, the validity of the AP-ChIP algorithm is tested on a real ChIP-Seq data set.

Sequencing of the black rockfish chromosomal genome provides insight into sperm storage in the female ovary

DNA Research ◽

10.1093/dnares/dsz023 ◽

2019 ◽

Vol 26 (6) ◽

pp. 453-464 ◽

Cited By ~ 2

Author(s):

Qinghua Liu ◽

Xueying Wang ◽

Yongshuang Xiao ◽

Haixia Zhao ◽

Shihong Xu ◽

...

Keyword(s):

Gasterosteus Aculeatus ◽

Glucose Transporter ◽

Sperm Storage ◽

Internal Fertilization ◽

Black Rockfish ◽

Kegg Pathways ◽

A Genome ◽

Sequencing Platforms ◽

Female Ovary

Abstract: Black rockfish (Sebastes schlegelii) is an economically important viviparous marine teleost in Japan, Korea, and China. It is characterized by internal fertilization, long-term sperm storage in the female ovary, and a high abortion rate. To better understand the mechanism of fertilization and gestation, it is essential to establish a reference genome for viviparous teleosts. Herein, we used a combination of Pacific Biosciences Sequel, Illumina sequencing platforms, 10× Genomics, and Hi-C technology to obtain a genome assembly size of 848.31 Mb comprising 24 chromosomes, and contig and scaffold N50 lengths of 2.96 and 35.63 Mb, respectively. We predicted 39.98% repetitive elements, and 26,979 protein-coding genes. S. schlegelii diverged from Gasterosteus aculeatus ∼32.1-56.8 million years ago. Furthermore, sperm remained viable within the ovary for up to 6 months. The glucose transporter SLC2 showed significantly positive genomic selection, and carbohydrate metabolism-related KEGG pathways were significantly up-regulated in ovaries after copulation. In vitro suppression of glycolysis with sodium iodoacetate reduced sperm longevity significantly. The results indicated the importance of carbohydrates in maintaining sperm survivability. Decoding the S. schlegelii genome not only provides new insights into sperm storage but is also highly valuable for marine researchers and reproduction biologists.
