jackalope: a swift, versatile phylogenomic and high-throughput sequencing simulator

Mapping Intimacies, 2019. doi:10.1101/650747

Author(s): Lucas A. Nell

Keyword(s): High Throughput, High Throughput Sequencing, Population Genomics, R Package, Gene Trees, Sequencing Platform, Genomic Variants, Pacific Biosciences, Wide Range, Reference Genomes

Abstract. High-throughput sequencing (HTS) is central to the study of population genomics and plays an increasingly important role in constructing phylogenies. Research-design choices for sequencing projects span a wide range of factors, such as sequencing platform, depth of coverage, and bioinformatic tools. Simulating HTS data can better inform these decisions. However, current standalone HTS simulators cannot generate genomic variants under even moderately complex evolutionary scenarios, which greatly limits their usefulness for fields such as population genomics and phylogenomics. Here I present the R package jackalope, which simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. Genomic variants can be simulated using phylogenies, gene trees, coalescent-simulation output, population-genomic summary statistics, and Variant Call Format (VCF) files. jackalope can simulate single-end, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. jackalope can read reference genomes from FASTA files or simulate new ones, and all outputs can be written to standard file formats. jackalope is available for Mac, Windows, and Linux systems.
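
The Python sketch below gives a rough illustration of the read-simulation side. It samples fixed-length single-end reads uniformly from a reference sequence and introduces substitution errors at a constant per-base rate. This is a hypothetical toy, not jackalope's API or error model. The function name `simulate_reads`, the uniform read placement, and the substitution-only errors are all assumptions made for illustration. jackalope itself is an R package, and per the abstract it also covers mapping qualities, multiplexing, and optical/PCR duplicates.

```python
from __future__ import annotations

import random

BASES = "ACGT"


def simulate_reads(reference: str, n_reads: int, read_len: int,
                   error_rate: float, seed: int = 0) -> list[str]:
    """Hypothetical toy simulator: uniform read starts, substitution errors only."""
    if not 0 < read_len <= len(reference):
        raise ValueError("read_len must be between 1 and len(reference)")
    rng = random.Random(seed)  # seeded so simulations are reproducible
    reads = []
    for _ in range(n_reads):
        # Pick a start position so that the whole read fits on the reference.
        start = rng.randrange(len(reference) - read_len + 1)
        read = list(reference[start:start + read_len])
        for i, base in enumerate(read):
            # Each base is independently miscalled with probability error_rate,
            # and is replaced by one of the other bases.
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append("".join(read))
    return reads


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTAC" * 10
    for r in simulate_reads(ref, n_reads=3, read_len=20, error_rate=0.01):
        print(r)
```

With `n_reads` reads of length `read_len`, the expected depth of coverage is roughly `n_reads * read_len / len(reference)`. Depth of coverage is one of the design factors the abstract mentions.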

seqCAT: a Bioconductor R-package for variant analysis of high throughput sequencing data

F1000Research, 2018, Vol 7, pp. 1466. doi:10.12688/f1000research.16083.1

Cited By ~ 2

Author(s): Erik Fasterius, Cristina Al-Khalili Szigyarto

Keyword(s): Genetic Variation, Liver Cancer, High Throughput, High Throughput Sequencing, R Package, Ease Of Use, Sequencing Data, DNA and RNA, High Throughput Sequencing Data, Wide Range

High-throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into, e.g., genetic variation, but their analysis requires extensive bioinformatic expertise. There is thus a need for simple yet effective software that can analyse both existing and novel data and provide interpretable biological results to users with little bioinformatic expertise. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high-throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information on genetic similarities between an arbitrary number of samples, validate specific variants, and define functionally similar variant groups for further downstream analyses. Its ease of use and installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, demonstrating that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample, although it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed, revealing a high median concordance of 97.5%.
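
A small sketch can make a median DNA/RNA concordance figure concrete. Here each call set is modelled as a Python dict mapping a site key `(chrom, pos, ref, alt)` to a genotype string. Concordance is assumed to be the fraction of sites called in both DNA and RNA at which the genotypes agree. seqCAT's own definition, data structures, and filtering may differ, and the function names below are hypothetical.

```python
from __future__ import annotations

import math
from statistics import median

# Hypothetical site key: (chromosome, position, ref allele, alt allele).
Site = tuple[str, int, str, str]


def concordance(dna: dict[Site, str], rna: dict[Site, str]) -> float:
    """Fraction of sites called in both sets whose genotypes agree (NaN if none shared)."""
    shared = dna.keys() & rna.keys()
    if not shared:
        return float("nan")
    matches = sum(dna[site] == rna[site] for site in shared)
    return matches / len(shared)


def median_concordance(samples: list[tuple[dict[Site, str], dict[Site, str]]]) -> float:
    """Median per-sample concordance, skipping samples with no shared sites.

    Raises statistics.StatisticsError if no sample has any shared site.
    """
    values = [concordance(dna, rna) for dna, rna in samples]
    values = [v for v in values if not math.isnan(v)]
    return median(values)
```

Restricting the comparison to sites called in both datasets is one design choice among several. For example, RNA-based calls only cover expressed genes, so counting DNA-only sites as discordant would penalise expression rather than genotype disagreement.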

seqCAT: a Bioconductor R-package for variant analysis of high throughput sequencing data

F1000Research, 2019, Vol 7, pp. 1466. doi:10.12688/f1000research.16083.2

Cited By ~ 1

Author(s): Erik Fasterius, Cristina Al-Khalili Szigyarto

Keyword(s): Genetic Variation, Liver Cancer, High Throughput, High Throughput Sequencing, R Package, Ease Of Use, Sequencing Data, DNA and RNA, High Throughput Sequencing Data, Wide Range

High-throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into, e.g., genetic variation, but their analysis requires extensive bioinformatic expertise. There is thus a need for simple yet effective software that can analyse both existing and novel data and provide interpretable biological results to users with little bioinformatic expertise. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high-throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information on genetic similarities between an arbitrary number of samples, validate specific variants, and define functionally similar variant groups for further downstream analyses. Its ease of use and installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, corroborating the original authors' conclusion that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample, although it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed, revealing a high median concordance of 97.5%. SeqCAT is open-source software released under an MIT licence and is available at https://bioconductor.org/packages/release/bioc/html/seqCAT.html.

kataegis: an R package for identification and visualization of the genomic localized hypermutation regions using high-throughput sequencing

BMC Genomics, 2021, Vol 22 (1). doi:10.1186/s12864-021-07696-x

Author(s): Xue Lin, Yingying Hua, Shuanglin Gu, Li Lv, Xingyu Li, ...

Keyword(s): High Throughput, High Throughput Sequencing, Somatic Mutations, R Package, Frequency Of Occurrence, Link Type, Genomic Landscape, One Step, Flanking Regions

Abstract. Background: Genomic localized hypermutation regions have been found in cancers and have been reported to be related to cancer prognosis. This localized hypermutation differs markedly from ordinary somatic mutations in its frequency of occurrence and genomic density. It resembles a “violent storm” of mutations, which is exactly what the Greek word “kataegis” means. Results: There is a need for a lightweight, simple-to-use toolkit to identify and visualize localized hypermutation regions in the genome, so we developed the R package “kataegis” to meet this need. The package uses only three steps to identify genomic hypermutation regions: i) read in variation files in standard formats; ii) calculate the inter-mutational distances; iii) identify the hypermutation regions with appropriate parameters. A single further step visualizes the nucleotide contents and spectra of both the foci and the flanking regions, as well as the genomic landscape of these regions. Conclusions: The kataegis package is available on Bioconductor/GitHub (https://github.com/flosalbizziae/kataegis) and provides a lightweight, simple-to-use toolkit for quickly identifying and visualizing genomic hypermutation regions.
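
Steps ii) and iii) can be sketched in a few lines of Python for mutation positions on a single chromosome. The thresholds used here (at least 6 mutations, consecutive gaps of at most 1,000 bp) are assumptions loosely modelled on commonly used kataegis definitions. The criterion is also simplified to require every consecutive gap to fall under the threshold. The kataegis package's actual parameters and segmentation method may differ, and the function names are hypothetical.

```python
from __future__ import annotations


def inter_mutational_distances(positions: list[int]) -> list[int]:
    """Step ii: distances (bp) between consecutive mutations after sorting."""
    pos = sorted(positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def hypermutation_regions(positions: list[int], min_mutations: int = 6,
                          max_distance: int = 1000) -> list[tuple[int, int, int]]:
    """Step iii (simplified): runs of >= min_mutations mutations in which every
    consecutive inter-mutational distance is <= max_distance.

    Returns (first position, last position, mutation count) for each run.
    """
    pos = sorted(positions)
    regions = []
    run_start = 0
    for i in range(1, len(pos) + 1):
        # A run ends at the last mutation or where the next gap is too large.
        if i == len(pos) or pos[i] - pos[i - 1] > max_distance:
            count = i - run_start
            if count >= min_mutations:
                regions.append((pos[run_start], pos[i - 1], count))
            run_start = i
    return regions


if __name__ == "__main__":
    # Hypothetical positions on one chromosome: a tight cluster plus scattered mutations.
    muts = [50_000, 1_000, 1_200, 1_500, 1_900, 2_500, 3_000, 100_000]
    print(inter_mutational_distances(muts))  # [200, 300, 400, 600, 500, 47000, 50000]
    print(hypermutation_regions(muts))       # [(1000, 3000, 6)]
```

For a whole genome, the same functions would be applied separately to each chromosome's positions, because distances between mutations on different chromosomes are meaningless.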

Application of High-Throughput Sequencing in the Diagnosis of Inherited Thrombocytopenia

Clinical and Applied Thrombosis/Hemostasis, 2018, Vol 24 (9_suppl), pp. 94S-103S. doi:10.1177/1076029618790696

Cited By ~ 3

Author(s): Qi Wang, Lijuan Cao, Guangying Sheng, Hongjie Shen, Jing Ling, ...

Keyword(s): High Throughput, High Throughput Sequencing, Gene Sequencing, Hereditary Diseases, Sequencing Platform, Pathogenic Variants, Diagnosis Method, Inherited Thrombocytopenia, Thrombotic Diseases, Next-Generation Sequencing (NGS)

Inherited thrombocytopenia is a group of hereditary diseases whose main clinical manifestation is a reduced platelet count. Clinically, there is an urgent need for a convenient and rapid diagnostic method. We introduced a high-throughput, next-generation sequencing (NGS) platform into the routine diagnosis of patients with unexplained thrombocytopenia and analyzed the gene sequencing results to evaluate the value of NGS in screening for and diagnosing inherited thrombocytopenia. From a cohort of 112 patients with thrombocytopenia, we identified 43 patients with hereditary features. Blood samples from these 43 patients were sequenced by NGS using a gene panel for hemorrhagic and thrombotic diseases comprising 89 genes. When the screening results were combined with clinical features and other findings, 15 (34.9%) of the 43 patients were diagnosed with inherited thrombocytopenia. In addition, 19 pathogenic variants, including 8 previously unreported variants, were identified in these patients. Using this detection platform, we expect to establish a more effective diagnostic approach to such disorders.
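
The reported proportions follow directly from the cohort counts. As a quick check of the arithmetic (the 38.4% share of the cohort with hereditary features is computed here and is not stated in the abstract):

$$
\frac{43}{112} \approx 0.384 = 38.4\%, \qquad \frac{15}{43} \approx 0.349 = 34.9\%
$$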

hypeR: an R package for geneset enrichment workflows

Bioinformatics, 2019. doi:10.1093/bioinformatics/btz700

Cited By ~ 3

Author(s): Anthony Federico, Stefano Monti

Keyword(s): High Throughput, High Throughput Sequencing, R Package, Use Cases, Sequencing Data, Wide Audience, Popular Method, High Throughput Sequencing Data, One Stop, Recent Version

Abstract. Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools lack the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods, in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution for performing geneset enrichment for a wide audience and range of use cases. Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR.
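
A common way to score geneset enrichment is an over-representation test based on the hypergeometric distribution. The sketch below computes the upper-tail p-value for the overlap between a signature and a geneset, given a background of N genes. It is a generic illustration, not hypeR's implementation. The function name and the choice of background size are assumptions.

```python
from __future__ import annotations

from math import comb


def hypergeometric_pvalue(signature: set[str], geneset: set[str],
                          background_size: int) -> float:
    """Upper-tail P(X >= k) for X ~ Hypergeometric(N, K, n), where
    N = background_size, K = |geneset|, n = |signature|, k = |signature & geneset|.
    Assumes both sets are drawn from the same background of N genes."""
    N, K, n = background_size, len(geneset), len(signature)
    k = len(signature & geneset)
    total = comb(N, n)  # ways to draw a signature of size n from the background
    # Sum the probability mass of every overlap at least as large as the observed one;
    # math.comb returns 0 when asked to choose more items than are available.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total


if __name__ == "__main__":
    sig = {"a", "b", "x", "y"}
    gs = {"a", "b", "c"}
    print(hypergeometric_pvalue(sig, gs, background_size=10))  # 70/210, about 0.333
```

When many genesets or signatures are tested, as in the multi-signature workflows described above, the resulting p-values would also need a multiple-testing correction such as Benjamini-Hochberg.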

Accurate detection for a wide range of mutation and editing sites of microRNAs from small RNA high-throughput sequencing profiles

Nucleic Acids Research, 2016, Vol 44 (14), pp. e123-e123. doi:10.1093/nar/gkw471

Cited By ~ 24

Author(s): Yun Zheng, Bo Ji, Renhua Song, Shengpeng Wang, Ting Li, ...

Keyword(s): High Throughput, Small RNA, High Throughput Sequencing, Accurate Detection, Wide Range

HTSSIP: An R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments

PLoS ONE, 2018, Vol 13 (1), pp. e0189616. doi:10.1371/journal.pone.0189616

Cited By ~ 13

Author(s): Nicholas D. Youngblut, Samuel E. Barnett, Daniel H. Buckley

Keyword(s): Nucleic Acid, Stable Isotope, High Throughput, High Throughput Sequencing, R Package, Stable Isotope Probing, Sequencing Data, High Throughput Sequencing Data, Acid Stable

High-throughput sequencing platform established by sensor measurement technology for the detection of TSC1 and TSC2 genes in prenatal diagnosis

Measurement, 2020, Vol 160, pp. 107828. doi:10.1016/j.measurement.2020.107828

Author(s): Qiuxia Xu, Min Wang, Sujing Huang, Lin Xu, Hongqiong Guan, ...

Keyword(s): Prenatal Diagnosis, High Throughput, High Throughput Sequencing, Measurement Technology, Sequencing Platform, Sensor Measurement

A high-throughput sequencing test for diagnosing inherited bleeding, thrombotic, and platelet disorders

Blood, 2016, Vol 127 (23), pp. 2791-2803. doi:10.1182/blood-2015-12-688267

Cited By ~ 102

Author(s): Ilenia Simeoni, Jonathan C. Stephens, Fengyuan Hu, Sri V. V. Deevi, Karyn Megy, ...

Keyword(s): High Throughput, High Throughput Sequencing, Genetic Test, Targeted Sequencing, Sequencing Platform, Key Points, Platelet Disorders

Key Points: A targeted sequencing platform covering 63 genes linked to heritable bleeding, thrombotic, and platelet disorders was developed. The ThromboGenomics platform provides a sensitive genetic test to obtain molecular diagnoses in patients with a suspected etiology.

Identification of Novel Viruses in Amblyomma americanum, Dermacentor variabilis, and Ixodes scapularis Ticks

mSphere, 2018, Vol 3 (2). doi:10.1128/msphere.00614-17

Cited By ~ 35

Author(s): Rafal Tokarz, Stephen Sameroff, Teresa Tagliafierro, Komal Jain, Simon H. Williams, ...

Keyword(s): New York, High Throughput, High Throughput Sequencing, Ixodes Scapularis, The United States, Dermacentor Variabilis, Amblyomma Americanum, Animal Pathogens, Wide Range, Sequencing Platforms

Abstract. Ticks carry a wide range of known human and animal pathogens and are postulated to carry others with the potential to cause disease. Here we report a discovery effort in which unbiased high-throughput sequencing was used to characterize the virome of 2,021 ticks, including Ixodes scapularis (n = 1,138), Amblyomma americanum (n = 720), and Dermacentor variabilis (n = 163), collected in New York, Connecticut, and Virginia in 2015 and 2016. We identified 33 viruses, including 24 putative novel viral species. The most frequently detected viruses were phylogenetically related to members of the Bunyaviridae and Rhabdoviridae families, as well as the recently proposed Chuviridae. Our work expands our understanding of tick viromes and underscores the high viral diversity present in ticks. Importance: The incidence of tick-borne disease is increasing, driven by the rapid geographical expansion of ticks and the discovery of new tick-associated pathogens. Examining the tick microbiome is essential for understanding the relationship between microbes and their tick hosts and for facilitating the identification of new tick-borne pathogens. Genomic analyses using unbiased high-throughput sequencing platforms have proven valuable for investigating tick bacterial diversity, but tick viromes have historically been poorly explored. By performing a comprehensive virome analysis of the three primary tick species associated with human disease in the United States, we gained substantial insight into tick virome diversity and can begin to assess a potential role for these viruses in the tick life cycle.
