jackalope: a swift, versatile phylogenomic and high-throughput sequencing simulator

2019 ◽  
Author(s):  
Lucas A. Nell

Abstract High-throughput sequencing (HTS) is central to the study of population genomics and has an increasingly important role in constructing phylogenies. Choices in research design for sequencing projects can include a wide range of factors, such as sequencing platform, depth of coverage, and bioinformatic tools. Simulating HTS data better informs these decisions. However, current standalone HTS simulators cannot generate genomic variants under even somewhat complex evolutionary scenarios, which greatly reduces their usefulness for fields such as population genomics and phylogenomics. Here I present the R package jackalope that simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. Genomic variants can be simulated using phylogenies, gene trees, coalescent-simulation output, population-genomic summary statistics, and Variant Call Format (VCF) files. jackalope can simulate single, paired-end, or mate-pair Illumina reads, as well as reads from Pacific Biosciences. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. It can read reference genomes from FASTA files and can simulate new ones, and all outputs can be written to standard file formats. jackalope is available for Mac, Windows, and Linux systems.

F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1466 ◽  
Author(s):  
Erik Fasterius ◽  
Cristina Al-Khalili Szigyarto

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, demonstrating that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed revealing a high median concordance of 97.5%.


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1466 ◽  
Author(s):  
Erik Fasterius ◽  
Cristina Al-Khalili Szigyarto

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, corroborating the original authors' conclusions that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed, revealing a high median concordance of 97.5%. SeqCAT is open-source software under an MIT licence, available at https://bioconductor.org/packages/release/bioc/html/seqCAT.html.
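The notion of genetic similarity and concordance between samples mentioned in the abstract can be illustrated with a minimal Python sketch. This is not seqCAT's implementation or API (seqCAT is an R package); the function name `genotype_concordance` and the dictionary-based profile layout are hypothetical and chosen only for illustration. The sketch assumes each sample is summarised as a mapping from a variant site to a genotype string, and defines concordance as the fraction of shared sites with identical genotypes.

```python
def genotype_concordance(profile_a, profile_b):
    """Return the fraction of shared variant sites with identical genotypes.

    Each profile is a dict mapping a site key, e.g. (chromosome, position),
    to a genotype string. This layout is an assumption for illustration.
    Returns None when the two profiles share no sites.
    """
    shared_sites = profile_a.keys() & profile_b.keys()
    if not shared_sites:
        return None
    matches = sum(
        1 for site in shared_sites if profile_a[site] == profile_b[site]
    )
    return matches / len(shared_sites)


# Two toy profiles that overlap at two sites and agree at one of them.
sample_dna = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr2", 50): "T/T"}
sample_rna = {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr3", 5): "G/G"}
concordance = genotype_concordance(sample_dna, sample_rna)
```

Restricting the comparison to shared sites matters when comparing DNA- and RNA-based calls, since RNA data only covers expressed regions.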


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xue Lin ◽  
Yingying Hua ◽  
Shuanglin Gu ◽  
Li Lv ◽  
Xingyu Li ◽  
...  

Abstract Background Localized hypermutation regions have been found in cancer genomes and are reported to be related to prognosis. This localized hypermutation differs markedly from ordinary somatic mutation in its frequency of occurrence and genomic density. It resembles a “violent storm” of mutations, which is what the Greek word “kataegis” means. Results There is a need for a lightweight and simple-to-use toolkit to identify and visualize localized hypermutation regions in the genome, so we developed the R package “kataegis” to meet this need. The package identifies genomic hypermutation regions in three steps: i) read in the variation files in standard formats; ii) calculate the inter-mutational distances; iii) identify the hypermutation regions with appropriate parameters. One further step visualizes the nucleotide contents and spectra of both the foci and flanking regions, and the genomic landscape of these regions. Conclusions The kataegis package is available on Bioconductor/GitHub (https://github.com/flosalbizziae/kataegis) and provides a lightweight and simple-to-use toolkit for quickly identifying and visualizing genomic hypermutation regions.
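The identification steps listed in the abstract (compute inter-mutational distances, then flag regions of closely spaced mutations) can be sketched in Python. This is not the kataegis package's algorithm or API; the function names and the thresholds `max_distance=1000` and `min_mutations=6` are assumptions chosen for illustration, and the input is assumed to be already parsed into `(chromosome, position)` pairs rather than read from a variant file.

```python
from collections import defaultdict


def intermutation_distances(positions):
    """Return distances between consecutive mutation positions, after sorting."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def find_hypermutation_regions(mutations, max_distance=1000, min_mutations=6):
    """Scan (chrom, pos) pairs for runs of closely spaced mutations.

    A run grows while each mutation lies within max_distance of the previous
    one, and is reported when it holds at least min_mutations mutations.
    Both thresholds are illustrative assumptions, not package defaults.
    Returns a list of (chrom, start, end, n_mutations) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)

    regions = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom])
        run = [ordered[0]]
        for pos in ordered[1:]:
            if pos - run[-1] <= max_distance:
                run.append(pos)
            else:
                if len(run) >= min_mutations:
                    regions.append((chrom, run[0], run[-1], len(run)))
                run = [pos]
        # Report the final run on this chromosome if it is dense enough.
        if len(run) >= min_mutations:
            regions.append((chrom, run[0], run[-1], len(run)))
    return regions


# Toy input: six mutations 200 bp apart on chr1, plus scattered mutations.
toy_mutations = [("chr1", p) for p in (100, 300, 500, 700, 900, 1100, 50000, 90000)]
toy_mutations += [("chr2", 10), ("chr2", 5000)]
toy_regions = find_hypermutation_regions(toy_mutations)
```

A simple run-based scan like this is easy to reason about; published kataegis callers often use segmentation of the inter-mutational distance profile instead.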


2018 ◽  
Vol 24 (9_suppl) ◽  
pp. 94S-103S ◽  
Author(s):  
Qi Wang ◽  
Lijuan Cao ◽  
Guangying Sheng ◽  
Hongjie Shen ◽  
Jing Ling ◽  
...  

Inherited thrombocytopenia is a group of hereditary diseases with a reduction in platelet count as the main clinical manifestation. Clinically, there is an urgent need for a convenient and rapid diagnosis method. We introduced a high-throughput, next-generation sequencing (NGS) platform into the routine diagnosis of patients with unexplained thrombocytopenia and analyzed the gene sequencing results to evaluate the value of NGS technology in the screening and diagnosis of inherited thrombocytopenia. From a cohort of 112 patients with thrombocytopenia, we screened 43 patients with hereditary features. For the blood samples of these 43 patients, a gene sequencing platform for hemorrhagic and thrombotic diseases comprising 89 genes was used to perform gene detection using NGS technology. When we combined the screening results with clinical features and other findings, 15 (34.9%) of 43 patients were diagnosed with inherited thrombocytopenia. In addition, 19 pathogenic variants, including 8 previously unreported variants, were identified in these patients. Through the use of this detection platform, we expect to establish a more effective diagnostic approach to such disorders.


Author(s):  
Anthony Federico ◽  
Stefano Monti

Abstract Summary Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation The most recent version of the package is available at https://github.com/montilab/hypeR.
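As an illustration of what a geneset enrichment test computes, the following Python sketch implements a one-sided hypergeometric over-representation test, one common enrichment method. The abstract does not specify which methods the package uses, so this shows the general technique rather than hypeR's code; the function name and arguments are hypothetical, and the sketch assumes that both the signature and the geneset are drawn from a background of `background_size` genes.

```python
from math import comb


def hypergeom_enrichment(signature, geneset, background_size):
    """Return (overlap, p_value) for a one-sided over-representation test.

    p_value is P(X >= overlap), where X counts geneset members in a random
    draw of len(signature) genes from a background of background_size genes.
    Assumes signature and geneset are both subsets of that background.
    """
    signature = set(signature)
    geneset = set(geneset)
    overlap = len(signature & geneset)
    n_drawn = len(signature)
    n_in_set = len(geneset)

    total_draws = comb(background_size, n_drawn)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_in_set, i) * comb(background_size - n_in_set, n_drawn - i)
        for i in range(overlap, min(n_drawn, n_in_set) + 1)
    )
    return overlap, tail / total_draws


# Toy example: 3-gene signature, 4-gene geneset, 10-gene background.
toy_overlap, toy_p = hypergeom_enrichment(
    {"g1", "g2", "g3"}, {"g1", "g2", "g4", "g5"}, background_size=10
)
```

When many genesets are tested against one signature, the resulting p-values would normally be corrected for multiple testing before interpretation.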


2016 ◽  
Vol 44 (14) ◽  
pp. e123-e123 ◽  
Author(s):  
Yun Zheng ◽  
Bo Ji ◽  
Renhua Song ◽  
Shengpeng Wang ◽  
Ting Li ◽  
...  

Blood ◽  
2016 ◽  
Vol 127 (23) ◽  
pp. 2791-2803 ◽  
Author(s):  
Ilenia Simeoni ◽  
Jonathan C. Stephens ◽  
Fengyuan Hu ◽  
Sri V. V. Deevi ◽  
Karyn Megy ◽  
...  

Key Points Developed a targeted sequencing platform covering 63 genes linked to heritable bleeding, thrombotic, and platelet disorders. The ThromboGenomics platform provides a sensitive genetic test to obtain molecular diagnoses in patients with a suspected etiology.


mSphere ◽  
2018 ◽  
Vol 3 (2) ◽  
Author(s):  
Rafal Tokarz ◽  
Stephen Sameroff ◽  
Teresa Tagliafierro ◽  
Komal Jain ◽  
Simon H. Williams ◽  
...  

ABSTRACT Ticks carry a wide range of known human and animal pathogens and are postulated to carry others with the potential to cause disease. Here we report a discovery effort wherein unbiased high-throughput sequencing was used to characterize the virome of 2,021 ticks, including Ixodes scapularis (n = 1,138), Amblyomma americanum (n = 720), and Dermacentor variabilis (n = 163), collected in New York, Connecticut, and Virginia in 2015 and 2016. We identified 33 viruses, including 24 putative novel viral species. The most frequently detected viruses were phylogenetically related to members of the Bunyaviridae and Rhabdoviridae families, as well as the recently proposed Chuviridae. Our work expands our understanding of tick viromes and underscores the high viral diversity that is present in ticks. IMPORTANCE The incidence of tick-borne disease is increasing, driven by rapid geographical expansion of ticks and the discovery of new tick-associated pathogens. The examination of the tick microbiome is essential in order to understand the relationship between microbes and their tick hosts and to facilitate the identification of new tick-borne pathogens. Genomic analyses using unbiased high-throughput sequencing platforms have proven valuable for investigations of tick bacterial diversity, but the examination of tick viromes has historically not been well explored. By performing a comprehensive virome analysis of the three primary tick species associated with human disease in the United States, we gained substantial insight into tick virome diversity and can begin to assess a potential role of these viruses in the tick life cycle.

