Utilities for High-Throughput Analysis of B-Cell Clonal Lineages

Journal of Immunology Research, DOI 10.1155/2015/323506, 2015, Vol 2015, pp. 1-9, Cited By ~ 7

Author(s): William D. Lees, Adrian J. Shepherd

Keyword(s): B Cell, Large Scale, Cell Lineage, Next Generation Sequencing Data, Sequencing Data, High Throughput Analysis, Large Scale Analysis, Automated Pipeline, Lineage Trees, Generation Sequencing

Few tools are currently available to assist with the determination and analysis of B-cell lineage trees from next-generation sequencing data. Here we present two utilities that support automated large-scale analysis and the creation of publication-quality results. The tools are available on the web and can also be downloaded for integration into an automated pipeline. Critically, and in contrast to previously published tools, these utilities can be used with any suitable phylogenetic inference method and with any antibody germline library, and hence are species-independent.


Speeding Up Large-Scale Next Generation Sequencing Data Analysis with pBWA

Journal of Applied Bioinformatics & Computational Biology, DOI 10.4172/2329-9533.1000101, 2017, Vol 01 (01), Cited By ~ 4

Author(s): Darren Peters, Xuemei Luo, Ke Qiu, Ping Liang

Keyword(s): Data Analysis, Next Generation Sequencing, Large Scale, Next Generation Sequencing Data, Next Generation, Sequencing Data, Generation Sequencing, Sequencing Data Analysis


ASCOT identifies key regulators of neuronal subtype-specific splicing

Nature Communications, DOI 10.1038/s41467-019-14020-5, 2020, Vol 11 (1), Cited By ~ 2

Author(s): Jonathan P. Ling, Christopher Wilks, Rone Charles, Patrick J. Leavey, Devlina Ghosh, ...

Keyword(s): RNA Splicing, Large Scale, Splice Variants, Next Generation Sequencing Data, Data Sets, Cell Type, Sequencing Data, Large Scale Analysis, Cell Type Specific, Public Archives

Public archives of next-generation sequencing data are growing exponentially, but the difficulty of marshaling this data has led to its underutilization by scientists. Here, we present ASCOT, a resource that uses annotation-free methods to rapidly analyze and visualize splice variants across tens of thousands of bulk and single-cell data sets in the public archive. To demonstrate the utility of ASCOT, we identify novel cell type-specific alternative exons across the nervous system and leverage ENCODE and GTEx data sets to study the unique splicing of photoreceptors. We find that PTBP1 knockdown and MSI1 and PCBP2 overexpression are sufficient to activate many photoreceptor-specific exons in HepG2 liver cancer cells. This work demonstrates how large-scale analysis of public RNA-Seq data sets can yield key insights into cell type-specific control of RNA splicing and underscores the importance of considering both annotated and unannotated splicing events.


NGSPERL: a semi-automated framework for large scale next generation sequencing data analysis

International Journal of Computational Biology and Drug Design, DOI 10.1504/ijcbdd.2015.072082, 2015, Vol 8 (3), pp. 203

Author(s): Quanhu Sheng, Shilin Zhao, Mingsheng Guo, Yu Shyr

Keyword(s): Data Analysis, Next Generation Sequencing, Large Scale, Next Generation Sequencing Data, Next Generation, Sequencing Data, Generation Sequencing, Sequencing Data Analysis


A fully automated pipeline for quantitative genotype calling from next generation sequencing data in autopolyploids

BMC Bioinformatics, DOI 10.1186/s12859-018-2433-6, 2018, Vol 19 (1), Cited By ~ 15

Author(s): Guilherme S. Pereira, Antonio Augusto F. Garcia, Gabriel R. A. Margarido

Keyword(s): Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Genotype Calling, Automated Pipeline, Generation Sequencing


GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets

Briefings in Bioinformatics, DOI 10.1093/bib/bbaa033, 2020

Author(s): Alba Gutiérrez-Sacristán, Carlos De Niz, Cartik Kothari, Sek Won Kong, Kenneth D Mandl, ...

Keyword(s): Next Generation Sequencing, Web Application, Large Scale, Human Subjects, Next Generation Sequencing Data, Next Generation, Sequencing Data, Phenotypic Data, Data Repositories, Generation Sequencing

Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient’s individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine’s main objective (ensuring the optimum diagnosis, treatment and prognosis for each individual), investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many of them remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review meet six inclusion criteria: they (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) are accessible through a website or collaboration with investigators; and (vi) make access information available in English. Using these criteria, we identified 30 datasets, reviewed them and provided results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
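The six inclusion criteria listed in this abstract can be read as a simple predicate over dataset metadata. The following is a minimal sketch of that reading, not the catalog's actual code: the `Dataset` record and all of its field names are hypothetical and were chosen only to mirror criteria (i) to (vi).

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical metadata record; field names are illustrative assumptions.
    name: str
    n_subjects: int
    has_genotypic: bool
    has_phenotypic: bool
    has_wgs_or_wes: bool
    n_phenotypic_variables: int
    accessible: bool              # via a website or collaboration
    access_info_in_english: bool


def meets_inclusion_criteria(d: Dataset) -> bool:
    """Return True when a dataset satisfies criteria (i) to (vi) as stated in the abstract."""
    return (
        d.n_subjects > 500                           # (i) more than 500 subjects
        and d.has_genotypic and d.has_phenotypic     # (ii) both data types, same subjects
        and d.has_wgs_or_wes                         # (iii) WGS or WES data
        and d.n_phenotypic_variables >= 100          # (iv) at least 100 variables
        and d.accessible                             # (v) accessible
        and d.access_info_in_english                 # (vi) access info in English
    )


if __name__ == "__main__":
    candidates = [
        Dataset("toy-cohort-a", 1200, True, True, True, 250, True, True),
        Dataset("toy-cohort-b", 500, True, True, True, 250, True, True),
    ]
    print([d.name for d in candidates if meets_inclusion_criteria(d)])
```

Note that criterion (i) is a strict inequality ("more than 500"), whereas criterion (iv) is inclusive ("at least 100").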


Proteogenomic strategies for identification of aberrant cancer peptides using large-scale next-generation sequencing data

PROTEOMICS, DOI 10.1002/pmic.201400206, 2014, Vol 14 (23-24), pp. 2719-2730, Cited By ~ 45

Author(s): Sunghee Woo, Seong Won Cha, Seungjin Na, Clark Guest, Tao Liu, ...

Keyword(s): Next Generation Sequencing, Large Scale, Next Generation Sequencing Data, Next Generation, Sequencing Data, Generation Sequencing


DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

DNA Research, DOI 10.1093/dnares/dst017, 2013, Vol 20 (4), pp. 383-390, Cited By ~ 51

Author(s): H. Nagasaki, T. Mochizuki, Y. Kodama, S. Saruhashi, S. Morizaki, ...

Keyword(s): Cloud Computing, Next Generation Sequencing, High Throughput, Next Generation Sequencing Data, Next Generation, Sequencing Data, High Throughput Analysis, Annotation Pipeline, Throughput Analysis, Generation Sequencing


Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data

PLoS Computational Biology, DOI 10.1371/journal.pcbi.1007977, 2020, Vol 16 (6), pp. e1007977, Cited By ~ 1

Author(s): Nima Nouri, Steven H. Kleinstein

Keyword(s): Next Generation Sequencing, B Cell, Somatic Hypermutation, Next Generation Sequencing Data, Next Generation, Sequencing Data, Generation Sequencing


rmvPFBAM: Removing Primers from BAM Files Based on Amplicon-Based Next-Generation Sequencing and Cloud Computing When Analyzing Personal Genome Data

Scientific Programming, DOI 10.1155/2021/6536470, 2021, Vol 2021, pp. 1-6

Author(s): Yanjun Ma

Keyword(s): Next Generation Sequencing, False Positive, Large Scale, Genomic Data, Next Generation Sequencing Data, Personal Genome, Next Generation, Sequencing Data, Personal Genomic, Generation Sequencing

Personal genomic data constitute an important part of personal health data. However, because next-generation sequencing produces personal genomic data in large volumes, special tools are needed to analyze them. In this article, we explore a tool for analyzing cloud-based, large-scale genome sequencing data. Identifying genomic variations from amplicon-based next-generation sequencing data is necessary for the clinical diagnosis and treatment of cancer patients. When processing such data, one essential step is removing primer sequences from the reads to avoid detecting false-positive mutations introduced by nonspecific primer binding and primer extension reactions. At present, primer-removal tools usually discard primer sequences from the FASTQ file rather than the BAM file, but this can cause problems in downstream analysis. Only one tool (BAMClipper) removes primer sequences from BAM files, but it only modifies the CIGAR value in the BAM file, so false-positive mutations falling in the primer region can still be detected in its output. We therefore developed a primer-trimming tool (rmvPFBAM) that removes primer sequences from the BAM file; the mutations detected in BAM files processed by rmvPFBAM are highly credible. In addition, rmvPFBAM runs faster than other tools, such as cutPrimers and BAMClipper.
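The distinction the abstract draws between changing only the CIGAR value and removing the primer bases can be illustrated with a toy model. This is a sketch under strong assumptions and does not reproduce rmvPFBAM or BAMClipper: the `Read` class and both functions are hypothetical, the CIGAR is limited to a single match run such as `10M`, and the read is assumed to start at or after the primer start on the forward strand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    # Toy alignment record (hypothetical, not a real BAM record).
    pos: int    # 0-based reference position of the first aligned base
    seq: str    # read bases
    cigar: str  # toy model: a single "<n>M" run only


def primer_overlap(read: Read, primer_end: int) -> int:
    """Number of leading read bases lying before primer_end (exclusive)."""
    return max(0, min(len(read.seq), primer_end - read.pos))


def soft_clip_primer(read: Read, primer_end: int) -> Optional[Read]:
    """CIGAR-only edit: primer bases are marked soft-clipped but stay in seq."""
    n = primer_overlap(read, primer_end)
    if n == len(read.seq):
        return None  # read lies entirely inside the primer
    if n == 0:
        return read
    return Read(read.pos + n, read.seq, f"{n}S{len(read.seq) - n}M")


def remove_primer(read: Read, primer_end: int) -> Optional[Read]:
    """Base removal: primer bases are deleted from seq and the CIGAR shrinks."""
    n = primer_overlap(read, primer_end)
    if n == len(read.seq):
        return None  # nothing would remain after trimming
    if n == 0:
        return read
    return Read(read.pos + n, read.seq[n:], f"{len(read.seq) - n}M")


if __name__ == "__main__":
    r = Read(pos=100, seq="ACGTACGTAC", cigar="10M")
    print(soft_clip_primer(r, primer_end=104))
    print(remove_primer(r, primer_end=104))
```

In the soft-clipped record the primer bases are still present in the sequence field, whereas in the trimmed record they are gone; a real implementation would also have to handle base qualities, reverse-strand reads, and CIGAR strings with insertions, deletions, and existing clips.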


Compression of Next-Generation Sequencing Data and of DNA Digital Files

Algorithms, DOI 10.3390/a13060151, 2020, Vol 13 (6), pp. 151

Author(s): Bruno Carpentieri

Keyword(s): Next Generation Sequencing, DNA Sequences, Network Traffic, Large Scale, Genomic Data, Biological Data, Next Generation Sequencing Data, Next Generation, Sequencing Data, Generation Sequencing

The memory and network traffic consumed by newly sequenced biological data have recently grown substantially. Genomic projects such as HapMap and 1000 Genomes have contributed to a very large rise in databases and network traffic related to genomic data, and to the development of new, efficient technologies. The large-scale sequencing of DNA samples has attracted new attention and produced new research, and the scientific community's interest in genomic data has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support research in genomics. In this paper, we analyze different approaches for compressing digital files that are generated by Next-Generation Sequencing tools and contain nucleotide sequences, and we evaluate the compression performance of generic compression algorithms by comparing them with Quip, a system designed by Jones et al. specifically for genomic file compression. Moreover, we present a simple but effective technique for the compression of DNA sequences in which we consider only the relevant DNA data, and we experimentally evaluate its performance.
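The kind of comparison described in this abstract can be sketched with general-purpose compressors from the Python standard library. This is only an illustration under assumptions: the abstract does not specify the paper's technique or which generic algorithms it tested, Quip is not used here, and "considering only the relevant DNA data" is interpreted, as an assumption, as keeping just the nucleotide line of each four-line FASTQ record. The tiny `FASTQ` sample is made up.

```python
import bz2
import lzma
import zlib


def sequence_lines(fastq_text: str) -> str:
    """Keep only the nucleotide line (2nd of every 4 lines) of each FASTQ record."""
    lines = fastq_text.splitlines()
    return "\n".join(lines[1::4])


def compressed_sizes(data: bytes) -> dict:
    """Size in bytes of the input under several generic compressors."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


# Made-up two-record FASTQ sample, for illustration only.
FASTQ = (
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nTTGACCAGTA\n+\nIIIIHHHHGG\n"
)

if __name__ == "__main__":
    print("whole file:", compressed_sizes(FASTQ.encode("ascii")))
    print("bases only:", compressed_sizes(sequence_lines(FASTQ).encode("ascii")))
```

On inputs this small the compressors' fixed overhead dominates, so meaningful comparisons need realistically sized files.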
