Dual indexed design of in-Drop single-cell RNA-seq libraries improves sequencing quality and throughput

Mapping Intimacies ◽

10.1101/835488 ◽

2019 ◽

Author(s):

Austin N. Southard Smith ◽

Alan J. Simmons ◽

Bob Chen ◽

Angela L. Jones ◽

Marisol A. Ramirez Solano ◽

...

Keyword(s):

Single Cell ◽

High Throughput ◽

Cost Effective ◽

Quality Data ◽

Sequencing Data ◽

High Quality ◽

High Data ◽

Sequencing Technologies ◽

Effective Manner ◽

Sequencing Quality

Abstract The increasing demand for single-cell RNA-sequencing (scRNA-seq) experiments, in both the number of experiments and the number of cells queried per experiment, necessitates higher sequencing depth coupled to high data quality. New high-throughput sequencers, such as the Illumina NovaSeq 6000, enable this demand to be filled in a cost-effective manner. However, current scRNA-seq library designs present compatibility challenges with newer sequencing technologies, such as index-hopping, and their ability to generate high quality data has yet to be systematically evaluated. Here, we engineered a new dual-indexed library structure, called TruDrop, on top of the inDrop scRNA-seq platform to solve these compatibility challenges, such that TruDrop libraries and standard Illumina libraries can be sequenced alongside each other on the NovaSeq. We overcame the index-hopping issue, demonstrated significant improvements in base-calling accuracy, and provided an example of multiplexing twenty-four scRNA-seq libraries simultaneously. We showed favorable comparisons in transcriptional diversity of TruDrop compared with prior library structures. Our approach enables cost-effective, high throughput generation of sequencing data with high quality, which should enable more routine use of scRNA-seq technologies.


PgRC: Pseudogenome based Read Compressor

10.1101/710822 ◽

2019 ◽

Author(s):

Tomasz Kowalski ◽

Szymon Grabowski

Keyword(s):

High Throughput ◽

Compression Ratio ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Quality ◽

Link Type ◽

Sequencing Technologies ◽

Significant Interest ◽

The One ◽

Shortest Common Superstring

Abstract Motivation The amount of sequencing data from High-Throughput Sequencing technologies grows at a pace exceeding the one predicted by Moore’s law. One of the basic requirements is to efficiently store and transmit such huge collections of data. Despite significant interest in designing FASTQ compressors, they are still imperfect in terms of compression ratio or decompression resources. Results We present Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of building an approximation of the shortest common superstring over high-quality reads. Experiments show that PgRC wins in compression ratio over its main competitors, SPRING and Minicom, by up to 18 and 21 percent on average, respectively, while being at least comparably fast in decompression. Availability PgRC can be downloaded from https://github.com/kowallus/[email protected]
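
The idea of approximating a shortest common superstring over reads can be illustrated with the classic greedy overlap-merge heuristic. This is only a minimal sketch of that general technique, not PgRC's actual pseudogenome construction, which the abstract does not describe in detail; the function names `overlap` and `greedy_superstring` and the toy reads are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Greedy approximation: repeatedly merge the pair with the largest overlap."""
    # Drop exact duplicates, then reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = reads[i] + reads[j][k:]  # append only the non-overlapping tail
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads[0] if reads else ""


# Toy reads (assumed for illustration); every read is a substring of the result.
toy_reads = ["ACGTAC", "TACGGA", "GGATT"]
pseudogenome = greedy_superstring(toy_reads)
```

Once such a superstring exists, each read can in principle be stored as an offset into it rather than as raw bases, which is where the compression gain would come from. The quadratic pair search here is for clarity only and would not scale to real sequencing data.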


W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data

10.1101/110999 ◽

2017 ◽

Cited By ~ 9

Author(s):

Bernardo J. Clavijo ◽

Gonzalo Garcia Accinelli ◽

Jonathan Wright ◽

Darren Heavens ◽

Katie Barr ◽

...

Keyword(s):

De Novo ◽

Low Cost ◽

Cost Effective ◽

Data Generation ◽

Sequencing Data ◽

High Quality ◽

Crop Species ◽

Short Read ◽

Link Type ◽

Sequencing Technologies

Abstract Producing high-quality whole-genome shotgun de novo assemblies from plant and animal species with large and complex genomes using low-cost short read sequencing technologies remains a challenge. But when the right sequencing data, with appropriate quality control, is assembled using approaches focused on robustness of the process rather than maximization of a single metric such as the usual contiguity estimators, good quality assemblies with informative value for comparative analyses can be produced. Here we present a complete method, described from data generation and QC all the way to scaffolding of complex genomes using Illumina short reads, and its application to plant and human datasets. We show how to use the w2rap pipeline following a metric-guided approach to produce cost-effective assemblies. The assemblies are highly accurate, provide good coverage of the genome and show good short range contiguity. Our pipeline has already enabled the rapid, cost-effective generation of de novo genome assemblies from large, polyploid crop species with a focus on comparative genomics. Availability w2rap is available under MIT license, with some subcomponents under GPL licenses. A ready-to-run Docker image with all software pre-requisites and example data is also available. http://github.com/bioinfologics/w2rap http://github.com/bioinfologics/w2rap-contigger


Detecting selection in low-coverage high-throughput sequencing data using principal component analysis

BMC Bioinformatics ◽

10.1186/s12859-021-04375-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jonas Meisner ◽

Anders Albrechtsen ◽

Kristian Hanghøj

Keyword(s):

Principal Component Analysis ◽

High Throughput ◽

East Asian ◽

Principal Component ◽

Component Analysis ◽

Human Populations ◽

Population Genetic Study ◽

Sequencing Data ◽

High Quality ◽

Low Coverage

Abstract Background Identification of selection signatures between populations is often an important part of a population genetic study. Leveraging high-throughput DNA sequencing, larger sample sizes of populations with similar ancestries have become increasingly common. This has led to the need for methods capable of identifying signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently challenging to group into meaningful units, which is why existing methods rely on principal components analysis for inference of the selection signals. These existing methods require called genotypes as input, which is problematic for studies based on low-coverage sequencing data. Materials and methods We have extended two principal component analysis based selection statistics to genotype likelihood data and applied them to low-coverage sequencing data from the 1000 Genomes Project for populations with European and East Asian ancestry to detect signals of selection in samples with continuous population structure. Results Here, we present two selection statistics which we have implemented in the framework. These methods account for genotype uncertainty, opening the opportunity to conduct selection scans in continuous populations from low and/or variable coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestries and show that the implemented selection statistics can control the false positive rate and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software using high quality called genotypes. Conclusion We show that selection scans of low-coverage sequencing data of populations with similar ancestry perform on par with those obtained from high quality genotype data. Moreover, we demonstrate that these methods outperform selection statistics obtained from called genotypes from low-coverage sequencing data without the need for ad-hoc filtering.


Identifying tumor clones in sparse single-cell mutation data

Bioinformatics ◽

10.1093/bioinformatics/btaa449 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i186-i193

Author(s):

Matthew A Myers ◽

Simone Zaccaria ◽

Benjamin J Raphael

Keyword(s):

Single Cell ◽

Genome Sequencing ◽

Whole Genome ◽

Sequencing Data ◽

Single Nucleotide ◽

Sequencing Coverage ◽

Sequencing Technologies ◽

Low Coverage ◽

Clonal Composition ◽

Cancer Studies

Abstract Motivation Recent single-cell DNA sequencing technologies enable whole-genome sequencing of hundreds to thousands of individual cells. However, these technologies have ultra-low sequencing coverage (<0.5× per cell) which has limited their use to the analysis of large copy-number aberrations (CNAs) in individual cells. While CNAs are useful markers in cancer studies, single-nucleotide mutations are equally important, both in cancer studies and in other applications. However, ultra-low coverage sequencing yields single-nucleotide mutation data that are too sparse for current single-cell analysis methods. Results We introduce SBMClone, a method to infer clusters of cells, or clones, that share groups of somatic single-nucleotide mutations. SBMClone uses a stochastic block model to overcome sparsity in ultra-low coverage single-cell sequencing data, and we show that SBMClone accurately infers the true clonal composition on simulated datasets with coverage as low as 0.2×. We applied SBMClone to single-cell whole-genome sequencing data from two breast cancer patients obtained using two different sequencing technologies. On the first patient, sequenced using the 10X Genomics CNV solution with sequencing coverage ≈0.03×, SBMClone recovers the major clonal composition when incorporating a small amount of additional information. On the second patient, where pre- and post-treatment tumor samples were sequenced using DOP-PCR with sequencing coverage ≈0.5×, SBMClone shows that tumor cells are present in the post-treatment sample, contrary to published analysis of this dataset. Availability and implementation SBMClone is available on the GitHub repository https://github.com/raphael-group/SBMClone. Supplementary information Supplementary data are available at Bioinformatics online.


Genome sequences of horticultural plants: past, present, and future

Horticulture Research ◽

10.1038/s41438-019-0195-6 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 14

Author(s):

Fei Chen ◽

Yunfeng Song ◽

Xiaojiang Li ◽

Junhao Chen ◽

Lan Mo ◽

...

Keyword(s):

Data Storage ◽

Genome Sequencing ◽

Herbal Medicines ◽

Genome Project ◽

Plant Genome ◽

Quality Data ◽

Sequencing Data ◽

Sequencing Technologies ◽

Horticultural Plant ◽

Horticultural Plants

Abstract Horticultural plants play various and critical roles for humans by providing fruits, vegetables, materials for beverages, and herbal medicines and by acting as ornamentals. They have also shaped human art, culture, and environments and thereby have influenced the lifestyles of humans. With the advent of sequencing technologies, there has been a dramatic increase in the number of sequenced genomes of horticultural plant species in the past decade. The genomes of horticultural plants are highly diverse and complex, often with a high degree of heterozygosity and a high ploidy due to their long and complex history of evolution and domestication. Here we summarize the advances in the genome sequencing of horticultural plants, the reconstruction of pan-genomes, and the development of horticultural genome databases. We also discuss past, present, and future studies related to genome sequencing, data storage, data quality, data sharing, and data visualization to provide practical guidance for genomic studies of horticultural plants. Finally, we propose a horticultural plant genome project as well as the roadmap and technical details toward three goals of the project.


Measuring Costs and Outcomes of Tele-Intervention When Serving Families of Children who are Deaf/Hard-of-Hearing

International Journal of Telerehabilitation ◽

10.5195/ijt.2013.6129 ◽

2013 ◽

Vol 5 (2) ◽

pp. 3-10 ◽

Cited By ~ 21

Author(s):

Kristina M. Blaiser ◽

Diane Behl ◽

Catherine Callow-Heusser ◽

Karl R. White

Keyword(s):

Early Intervention ◽

Hard Of Hearing ◽

Home Visit ◽

Rating Scales ◽

Child Outcomes ◽

Cost Effective ◽

High Quality ◽

Early Intervention Services ◽

Effective Manner ◽

Intervention Services

Background: Optimal outcomes for children who are deaf/hard-of-hearing (DHH) depend on access to high quality, specialized early intervention services. Tele-intervention (TI), the delivery of early intervention services via telehealth technology, has the potential to meet this need in a cost-effective manner. Method: Twenty-seven families of infants and toddlers with varying degrees of hearing loss participated in a randomized study, receiving their services primarily through TI or via traditional in-person home visits. Pre- and post-test measures of child outcomes, family and provider satisfaction, and costs were collected. Results: The TI group scored statistically significantly higher on the expressive language measure than the in-person group (p = .03). A measure of home visit quality revealed that the TI group scored statistically significantly better on the Parent Engagement subscale of the Home Visit Rating Scales-Adapted & Extended (HOVRS-A+; Roggman, et al., 2012). Cost savings associated with providing services via TI increased as the intensity of service delivery increased. Although most providers and families were positive about TI, there was great variability in their perceptions. Conclusions: Tele-intervention is a promising cost-effective method for delivering high quality early intervention services to families of children who are DHH.


Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Cell Types ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell Rna Sequencing ◽

Multiple Cell

Abstract As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
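
The gamma-Poisson idea behind a simulation such as Splat can be sketched in a few lines: draw a mean expression level per gene from a gamma distribution, then draw each cell's count from a Poisson distribution with that mean. The sketch below is written in Python with the standard library only and is not the Splatter API (Splatter is an R/Bioconductor package); the function names, the parameters `shape` and `mean_scale`, and their default values are assumptions chosen for illustration, and the real Splat model includes further components this sketch omits.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small means used in this sketch; not meant for large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n_genes, n_cells, shape=2.0, mean_scale=1.5, seed=0):
    """Return a genes x cells matrix of gamma-Poisson counts (list of lists)."""
    rng = random.Random(seed)
    # Hypothetical parameters: one gamma-distributed mean per gene.
    gene_means = [rng.gammavariate(shape, mean_scale) for _ in range(n_genes)]
    # Each cell's count for a gene is Poisson-distributed around that gene's mean.
    return [[sample_poisson(mu, rng) for _ in range(n_cells)] for mu in gene_means]


counts = simulate_counts(n_genes=5, n_cells=8)
```

Mixing a Poisson over a gamma-distributed rate yields counts with more variance than a plain Poisson, which is the property that makes this family a common choice for count data of this kind.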


DR2S: An Integrated Algorithm Providing Reference-Grade Haplotype Sequences from Heterozygous Samples

10.1101/2020.11.09.374140 ◽

2020 ◽

Author(s):

Steffen Klasberg ◽

Alexander H. Schmidt ◽

Vinzenz Lange ◽

Gerhard Schöfl

Keyword(s):

Allelic Variation ◽

R Package ◽

Full Length ◽

Reference Sequence ◽

Read Length ◽

Sequencing Data ◽

High Quality ◽

Reference Allele ◽

Sequencing Technologies ◽

Generation Sequencing

Abstract Background High resolution HLA genotyping of donors and recipients is a crucially important prerequisite for haematopoietic stem-cell transplantation and relies heavily on the quality and completeness of immuno-genetic reference sequence databases of allelic variation. Results Here, we report on DR2S, an R package that leverages the strengths of two sequencing technologies – the accuracy of next-generation sequencing with the read length of third-generation sequencing technologies like PacBio’s SMRT sequencing or ONT’s nanopore sequencing – to reconstruct fully-phased high-quality full-length haplotype sequences. Although optimised for HLA and KIR genes, DR2S is applicable to all loci with known reference sequences provided that full-length sequencing data is available for analysis. In addition, DR2S integrates supporting tools for easy visualisation and quality control of the reconstructed haplotype to ensure suitability for submission to public allele databases. Conclusions DR2S is a largely automated workflow designed to create high-quality fully-phased reference allele sequences for highly polymorphic gene regions such as HLA or KIR. It has been used by biologists to successfully characterise and submit more than 500 HLA alleles and more than 500 KIR alleles to the IPD-IMGT/HLA and IPD-KIR databases.


G2S3: a gene graph-based imputation method for single-cell RNA sequencing data

10.1101/2020.04.01.020586 ◽

2020 ◽

Author(s):

Weimiao Wu ◽

Qile Dai ◽

Yunqing Liu ◽

Xiting Yan ◽

Zuoheng Wang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sequencing Data ◽

High Data ◽

Study Gene Expression ◽

Single Cell Rna Sequencing ◽

Novel Method

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.


The role and impact of high throughput biomimetic measurements in drug discovery

ADMET & DMPK ◽

10.5599/admet.530 ◽

2018 ◽

Vol 6 (2) ◽

pp. 74-84 ◽

Cited By ~ 5

Author(s):

Shenaz Bunally ◽

Robert J Young

Keyword(s):

High Performance Liquid Chromatography ◽

Liquid Chromatography ◽

Drug Discovery ◽

Physicochemical Property ◽

Physicochemical Properties ◽

High Throughput ◽

High Performance ◽

Cost Effective ◽

High Quality ◽

The Right

During the early phase of drug discovery, it is becoming increasingly important to acquire the full physicochemical profile of molecules. For this purpose, there is a strong interest in developing efficient and cost-effective platforms for fast and reliable measurements of physicochemical properties. We have developed an automated physchem platform which ensures that consistent, comprehensive, and high-quality physicochemical property measurements and derived property information for hundreds of compounds per week are available alongside potency data at the right time to guide compound progression decisions. We discuss the routine assessments of biomimetic properties using high throughput automated high-performance liquid chromatography (HPLC) platforms, with details of the methods and hardware employed, along with illustrations of the quality and impact of the data generated.
