The State of Software in Evolutionary Biology

10.1101/031930 ◽

2015 ◽

Cited By ~ 2

Author(s):

Diego Darriba ◽

Tomas Flouri ◽

Alexandros Stamatakis

Keyword(s):

Software Quality ◽

Evolutionary Biology ◽

Current Data ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Scientific Software ◽

Code Quality ◽

Generation Sequencing ◽

Highly Cited

With Next Generation Sequencing (NGS) data coming of age and being routinely used, evolutionary biology is transforming into a data-driven science. As a consequence, researchers have to rely on a growing number of increasingly complex software tools. All widely used tools in our field have grown considerably, both in the number of features and in lines of code. In addition, analysis pipelines now include substantially more components than 5-10 years ago. A topic that has received little attention in this context is the code quality of widely used tools. Unfortunately, the majority of users tend to blindly trust software and the results it produces. We therefore assessed the code quality of 15 highly cited tools (e.g., MrBayes, MAFFT, SweepFinder) from the broader area of evolutionary biology that are used in current data analysis pipelines. We also discuss little-known problems associated with floating-point arithmetic for representing real numbers on computer systems. Since the software quality of the tools we analyzed is rather mediocre, we provide a list of best practices for improving the quality of existing tools, and also list techniques that can be deployed for developing reliable, high-quality scientific software from scratch. Finally, we discuss journal and science policy as well as funding issues that need to be addressed to improve software quality and to ensure support for developing new and maintaining existing software. Our intention is to raise the awareness of the community regarding software quality issues and to emphasize the substantial lack of funding for scientific software development.
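The floating-point problems mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the paper: the numbers and variable names (such as `site_likelihoods`) are hypothetical and chosen only to show three well-known effects, namely non-associative addition, drift in naive summation, and underflow when many small probabilities are multiplied.

```python
import math

# Illustrative values only; none of these numbers come from the paper.

# 1) Floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# 2) Naive accumulation drifts; math.fsum compensates for lost low-order bits.
values = [0.1] * 10
naive_total = 0.0
for v in values:
    naive_total += v
exact_total = math.fsum(values)
print(naive_total == 1.0, exact_total == 1.0)  # False True

# 3) Multiplying many small probabilities underflows to zero,
#    whereas summing their logarithms stays representable.
site_likelihoods = [1e-5] * 100  # hypothetical per-site likelihoods
product = 1.0
for p in site_likelihoods:
    product *= p
log_likelihood = sum(math.log(p) for p in site_likelihoods)
print(product)         # 0.0
print(log_likelihood)  # a finite negative number
```

Working in log space, as in the third case, is a common way to keep likelihood computations numerically stable; whether a given tool does so is something only its source code can confirm.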

A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

BMC Genomics ◽

10.1186/s12864-015-2192-y ◽

2015 ◽

Vol 16 (1) ◽

Cited By ~ 6

Author(s):

Young Jin Kim ◽

Juyoung Lee ◽

Bong-Jo Kim ◽

Taesung Park

Keyword(s):

Next Generation Sequencing ◽

Rare Variants ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Chip Data ◽

New Strategy ◽

Exome Chip ◽

Generation Sequencing

SeqSQC: A Bioconductor Package for Evaluating the Sample Quality of Next-generation Sequencing Data

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2018.07.006 ◽

2019 ◽

Vol 17 (2) ◽

pp. 211-218 ◽

Cited By ~ 1

Author(s):

Qian Liu ◽

Qiang Hu ◽

Song Yao ◽

Marilyn L. Kwan ◽

Janise M. Roh ◽

...

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Bioconductor Package ◽

Next Generation ◽

Sequencing Data ◽

Sample Quality ◽

Generation Sequencing

Systematic Comparison of the Performances of De Novo Genome Assemblers for Oxford Nanopore Technology Reads From Piroplasm

Frontiers in Cellular and Infection Microbiology ◽

10.3389/fcimb.2021.696669 ◽

2021 ◽

Vol 11 ◽

Author(s):

Jinming Wang ◽

Kai Chen ◽

Qiaoyun Ren ◽

Ying Zhang ◽

Junlong Liu ◽

...

Keyword(s):

De Novo ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Coverage Depth ◽

Sequence Coverage ◽

Long Reads ◽

Oxford Nanopore ◽

Generation Sequencing ◽

Easy Operation

Background: Emerging long-read sequencing technology has greatly changed the landscape of whole-genome sequencing, enabling scientists to contribute to decoding the genetic information of non-model species. The sequences generated by PacBio or Oxford Nanopore Technology (ONT) must be assembled de novo before further analyses. Several de novo genome assemblers have been developed to assemble long reads generated by ONT. However, genome assembly is still a challenging task, and the performance of these assemblers has not been completely investigated. Methods and Results: We systematically evaluated the performance of nine de novo assemblers for ONT on datasets of different coverage depths. Several metrics were measured to determine the performance of these tools, including N50 length, sequence coverage, runtime, ease of operation, genome accuracy, and genomic completeness at varying depths of coverage. Based on the results of our assessments, the performances of these tools are summarized as follows: 1) coverage depth has a significant effect on genome quality; 2) the level of contiguity of the assembled genome varies dramatically among different de novo tools; 3) the correctness of an assembled genome is closely related to the completeness of the genome. More than 30× nanopore data can be assembled into a relatively complete genome, the quality of which is highly dependent on polishing with next generation sequencing data. Conclusion: Considering the results of our investigation, the advantages and disadvantages of each tool are summarized, and guidelines for selecting assembly tools under specific conditions are provided.
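The abstract above lists N50 length among its contiguity metrics without defining it. Under the commonly used definition, N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below assumes that definition; the function name `n50` and the example contig lengths are hypothetical and do not come from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumes the common definition: sort contigs from longest to shortest
    and return the length at which the running total first reaches at
    least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (total 54, half 27): 10 + 9 + 8 reaches 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 depends only on contig lengths, it says nothing about correctness, which is why the study reports accuracy and completeness alongside it.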

Faculty Opinions recommendation of VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718272765.793499663 ◽

2014 ◽

Author(s):

Gary Bader ◽

Mohamed Helmy

Keyword(s):

Next Generation Sequencing ◽

Network Analysis ◽

Next Generation Sequencing Data ◽

Cancer Genes ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Faculty Opinions recommendation of Bioinformatory-assisted analysis of next-generation sequencing data for precision medicine in pancreatic cancer.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727775566.793536095 ◽

2017 ◽

Author(s):

Steve Pereira

Keyword(s):

Pancreatic Cancer ◽

Next Generation Sequencing ◽

Precision Medicine ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Assisted Analysis ◽

Generation Sequencing

NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genomes|Genetics ◽

10.1093/g3journal/jkab174 ◽

2021 ◽

Author(s):

Anne Krogh Nøhr ◽

Kristian Hanghøj ◽

Genis Garcia Erill ◽

Zilong Li ◽

Ida Moltke ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Research ◽

Likelihood Estimation ◽

Software Tool ◽

Estimation Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Ngs Data ◽

Generation Sequencing

Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture all require genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub: https://github.com/KHanghoj/NGSremix.

recoup: flexible and versatile signal visualization from next generation sequencing

BMC Bioinformatics ◽

10.1186/s12859-020-03902-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Panagiotis Moulos

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Special Focus ◽

Next Generation ◽

Sequencing Data ◽

User Friendliness ◽

Computational Environment ◽

Level Data ◽

Data Signal ◽

Generation Sequencing

Background: The relentless emergence of new genomic sequencing protocols and the resulting generation of ever larger datasets continue to challenge the meaningful summarization and visualization of the underlying signal generated to answer important qualitative and quantitative biological questions. As a result, the need for novel software able to reliably produce quick, comprehensive, and easily repeatable genomic signal visualizations in a user-friendly manner is rapidly re-emerging. Results: recoup is a Bioconductor package for quick, flexible, versatile, and accurate visualization of genomic coverage profiles generated from Next Generation Sequencing data. Coupled with a database of precalculated genomic regions for multiple organisms, recoup offers processing mechanisms for quick, efficient, and multi-level data interrogation with minimal effort, while at the same time creating publication-quality visualizations. Special focus is given to plot reusability, reproducibility, and real-time exploration and formatting options, operations rarely supported in depth by similar visualization tools. recoup was assessed using several qualitative user metrics and found to balance the tradeoff between important package features, including speed, visualization quality, overall friendliness, and the reusability of the results with minimal additional calculations. Conclusion: While some existing solutions for the comprehensive visualization of NGS data signal offer satisfying results, they are often compromised regarding issues such as effortless tracking of processing and preparation steps under a common computational environment, visualization quality, and user friendliness. recoup is a unique package presenting a balanced tradeoff for a combination of assessment criteria while remaining fast and friendly.

Clinical Implications of Copy Number Alteration Detection using Panel-Based Next-Generation Sequencing Data in Myelodysplastic Syndrome

Leukemia Research ◽

10.1016/j.leukres.2021.106540 ◽

2021 ◽

pp. 106540

Author(s):

Yoo-Jin Kim ◽

Seung-Hyun Jung ◽

Eun-Hye Hur ◽

Eun-Ji Choi ◽

Kyoo-Hyung Lee ◽

...

Keyword(s):

Next Generation Sequencing ◽

Myelodysplastic Syndrome ◽

Copy Number ◽

Copy Number Alteration ◽

Next Generation Sequencing Data ◽

Clinical Implications ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

BIGpre: A Quality Assessment Package for Next-Generation Sequencing Data

Genomics Proteomics & Bioinformatics ◽

10.1016/s1672-0229(11)60027-2 ◽

2011 ◽

Vol 9 (6) ◽

pp. 238-244 ◽

Cited By ~ 21

Author(s):

Tongwu Zhang ◽

Yingfeng Luo ◽

Kan Liu ◽

Linlin Pan ◽

Bing Zhang ◽

...

Keyword(s):

Next Generation Sequencing ◽

Quality Assessment ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

Genome Biology ◽

10.1186/gb-2010-11-10-r99 ◽

2010 ◽

Vol 11 (10) ◽

Cited By ~ 53

Author(s):

Nils Homer ◽

Stanley F Nelson

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Short Read ◽

Variant Discovery ◽

Generation Sequencing
