Physiological RNA dynamics in RNA-Seq analysis

Zhongneng Xu; Shuichi Asakawa

doi:10.1093/bib/bby045

Briefings in Bioinformatics ◽ 10.1093/bib/bby045 ◽ 2018 ◽ Vol 20 (5) ◽ pp. 1725-1733 ◽ Cited By ~ 1

Author(s): Zhongneng Xu ◽ Shuichi Asakawa

Keyword(s): RNA Degradation ◽ Decay Rates ◽ Sequence Length ◽ Nucleotide Polymorphisms ◽ RNA Seq ◽ Sequencing Data ◽ Single Nucleotide ◽ RNA Dynamics ◽ RNA Quantification ◽ RNA Accumulation

Abstract Physiological RNA dynamics cause problems in transcriptome analysis. Physiological RNA accumulation affects the analysis of RNA quantification, and physiological RNA degradation affects the analysis of RNA sequence length, feature sites and quantification. In the present article, we review the effects of physiological degradation and accumulation of RNA on the analysis of RNA sequencing data. Physiological RNA accumulation and degradation have probably led to such phenomena as incorrect estimations of transcription quantification, differential expression, co-expression, RNA decay rates, alternative splicing, boundaries of transcription, novel genes, new single-nucleotide polymorphisms, small RNAs and gene fusion. Thus, the transcriptomic data obtained to date warrant further scrutiny. New and improved techniques and bioinformatics software are needed to produce accurate data in transcriptome research.

BAMixChecker: an automated checkup tool for matched sample pairs in NGS cohort

Bioinformatics ◽ 10.1093/bioinformatics/btz479 ◽ 2019 ◽ Vol 35 (22) ◽ pp. 4806-4808 ◽ Cited By ~ 2

Author(s): Hein Chun ◽ Sangwoo Kim

Keyword(s): Genomic Analysis ◽ Supplementary Information ◽ Nucleotide Polymorphisms ◽ RNA Seq ◽ Sequencing Data ◽ Single Nucleotide ◽ Frequent Problem ◽ Generation Sequencing ◽ User Intervention ◽ Genotype Concordance

Abstract Summary: Mislabeling in the process of next generation sequencing is a frequent problem that can cause an entire genomic analysis to fail, and a regular cohort-level checkup is needed to ensure that it has not occurred. We developed a new, automated tool (BAMixChecker) that accurately detects sample mismatches from a given BAM file cohort with minimal user intervention. BAMixChecker uses a flexible, data-specific set of single-nucleotide polymorphisms and detects orphan (unpaired) and swapped (mispaired) samples based on genotype-concordance score and entropy-based file name analysis. BAMixChecker shows ∼100% accuracy in real WES, RNA-Seq and targeted sequencing data cohorts, even for small panels (<50 genes). BAMixChecker provides an HTML-style report that graphically outlines the sample matching status in tables and heatmaps, with which users can quickly inspect any mismatch events. Availability and implementation: BAMixChecker is available at https://github.com/heinc1010/BAMixChecker. Supplementary information: Supplementary data are available at Bioinformatics online.
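The abstract does not give the formula behind the genotype-concordance score, so the following is only a minimal Python sketch of the general idea, not BAMixChecker's implementation. It assumes each sample is a dictionary of SNP site to genotype string, scores a pair by the fraction of shared sites with identical genotypes, and treats a sample with no partner above a threshold as an orphan. The function names, the toy genotypes and the 0.7 threshold are all hypothetical.

```python
from itertools import combinations


def concordance(genotypes_a, genotypes_b):
    """Fraction of SNP sites called in both samples that have identical genotypes."""
    shared = [site for site in genotypes_a if site in genotypes_b]
    if not shared:
        return 0.0
    matches = sum(genotypes_a[site] == genotypes_b[site] for site in shared)
    return matches / len(shared)


def best_matches(samples, threshold=0.7):
    """Map each sample to its most concordant partner, or None (orphan) if no
    partner reaches the threshold. The threshold is an illustrative assumption."""
    partner = {name: None for name in samples}
    best_score = {name: threshold for name in samples}
    for a, b in combinations(samples, 2):
        score = concordance(samples[a], samples[b])
        for x, y in ((a, b), (b, a)):
            if score >= best_score[x]:
                best_score[x] = score
                partner[x] = y
    return partner


# Hypothetical toy cohort: two files from patient P1 and one unmatched file.
samples = {
    "P1_tumor": {"chr1:100": "AG", "chr2:200": "CC", "chr3:300": "TT"},
    "P1_normal": {"chr1:100": "AG", "chr2:200": "CC", "chr3:300": "TT"},
    "P2_tumor": {"chr1:100": "AA", "chr2:200": "CT", "chr3:300": "GT"},
}
pairs = best_matches(samples)
```

Comparing `pairs` with the pairing implied by the file names would then flag swapped samples, which is the role the abstract assigns to the file name analysis.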

Machine learning as an effective method for identifying true SNPs in polyploid plants

10.1101/274407 ◽ 2018 ◽ Cited By ~ 1

Author(s): Walid Korani ◽ Josh P. Clevenger ◽ Ye Chu ◽ Peggy Ozias-Akins

Keyword(s): Machine Learning ◽ Sequence Data ◽ SNP Array ◽ Real Data ◽ Large Set ◽ Nucleotide Polymorphisms ◽ RNA Seq ◽ Sequencing Data ◽ Single Nucleotide ◽ Accuracy Rates

Abstract Single Nucleotide Polymorphisms (SNPs) have many advantages as molecular markers since they are ubiquitous and co-dominant. However, the discovery of true SNPs, especially in polyploid species, is difficult. Peanut is an allopolyploid, which has a very low rate of true SNP calling. A large set of true and false SNPs identified from the Arachis 58k Affymetrix array was leveraged to train machine learning models to select true SNPs straight from sequence data. These models achieved accuracy rates of above 80% using real peanut RNA-seq and whole genome shotgun (WGS) re-sequencing data, which is higher than previously reported for polyploids. A 48K SNP array, Axiom Arachis2, was designed using the approach which revealed 75% accuracy of calling SNPs from different tetraploid peanut genotypes. Using the method to simulate SNP variation in peanut, cotton, wheat, and strawberry, we show that models built with our parameter sets achieve above 98% accuracy in selecting true SNPs. Additionally, models built with simulated genotypes were able to select true SNPs at above 80% accuracy using real peanut data, demonstrating that our model can be used even if real data are not available to train the models. This work demonstrates an effective approach for calling highly reliable SNPs from polyploids using machine learning. A novel tool was developed for predicting true SNPs from sequence data, designated as SNP-ML (SNP-Machine Learning, pronounced “snip mill”), using the described models. SNP-ML additionally provides functionality to train new models not included in this study for customized use, designated SNP-MLer (SNP-Machine Learner, pronounced “snip miller”). SNP-ML is freely available for public use.

Risk prediction and marker selection in nonsynonymous single nucleotide polymorphisms using whole genome sequencing data

Animal Cells and Systems ◽ 10.1080/19768354.2020.1860125 ◽ 2020 ◽ Vol 24 (6) ◽ pp. 321-328

Author(s): Young-Sup Lee ◽ KyeongHye Won ◽ Donghyun Shin ◽ Jae-Don Oh

Keyword(s): Single Nucleotide Polymorphisms ◽ Whole Genome Sequencing ◽ Risk Prediction ◽ Genome Sequencing ◽ Whole Genome Sequencing Data ◽ Whole Genome ◽ Nucleotide Polymorphisms ◽ Sequencing Data ◽ Single Nucleotide ◽ Marker Selection

Investigation of allele specific expression in various tissues of broiler chickens using the detection tool VADT

Scientific Reports ◽ 10.1038/s41598-021-83459-8 ◽ 2021 ◽ Vol 11 (1)

Author(s): M. Joseph Tomlinson ◽ Shawn W. Polson ◽ Jing Qiu ◽ Juniper A. Lake ◽ William Lee ◽ ...

Keyword(s): Broiler Chickens ◽ Nucleotide Polymorphisms ◽ RNA Seq ◽ Specific Expression ◽ Single Nucleotide ◽ Allele Specific Expression ◽ Detection Tool ◽ Commercial Broiler ◽ Significant Phenomenon ◽ Allele Specific

Abstract Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assists in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large sub-set of the samples (n = 68). ASE analysis was performed using custom software called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~24,000 (~14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
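The binomial test mentioned in this abstract can be illustrated with a short Python sketch. This is not VADT's code: it assumes a null hypothesis of equal expression of both alleles (p = 0.5) at a heterozygous biallelic SNP and computes an exact two-sided p-value from reference and alternate read counts. The function name and the read counts in the usage lines are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for allelic read counts at one SNP.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null of balanced expression (assumed p = 0.5).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical read counts: balanced expression versus a strong allelic skew.
p_balanced = binomial_two_sided_p(5, 5)
p_skewed = binomial_two_sided_p(10, 0)
```

This direct summation is only suitable for moderate read depths, because the floating-point powers underflow for very deep coverage; a library routine such as `scipy.stats.binomtest` would be the safer choice there. In a genome-wide analysis the per-SNP p-values would also need a multiple-testing correction, which the abstract does not describe.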

Genome-wide profiling in colorectal cancer identifies PHF19 and TBC1D16 as oncogenic super enhancers

Nature Communications ◽ 10.1038/s41467-021-26600-5 ◽ 2021 ◽ Vol 12 (1)

Author(s): Qing-Lan Li ◽ Xiang Lin ◽ Ya-Li Yu ◽ Lin Chen ◽ Qi-Xin Hu ◽ ...

Keyword(s): Colorectal Cancer ◽ Colorectal Cancer Patient ◽ Nucleotide Polymorphisms ◽ RNA Seq ◽ Motif Analysis ◽ Single Nucleotide ◽ Super Enhancer ◽ Genome Wide ◽ Functional Factors ◽ Cancer Tissues

Abstract Colorectal cancer is one of the most common cancers in the world. Although genomic mutations and single nucleotide polymorphisms have been extensively studied, the epigenomic status in colorectal cancer patient tissues remains elusive. Here, together with genomic and transcriptomic analysis, we use ChIP-Seq to profile active enhancers at the genome wide level in colorectal cancer paired patient tissues (tumor and adjacent tissues from the same patients). In total, we sequence 73 pairs of colorectal cancer tissues and generate 147 H3K27ac ChIP-Seq, 144 RNA-Seq, 147 whole genome sequencing and 86 H3K4me3 ChIP-Seq samples. Our analysis identifies 5590 gain and 1100 lost variant enhancer loci in colorectal cancer, and 334 gain and 121 lost variant super enhancer loci. Multiple key transcription factors in colorectal cancer are predicted with motif analysis and core regulatory circuitry analysis. Further experiments verify the function of the super enhancers governing PHF19 and TBC1D16 in regulating colorectal cancer tumorigenesis, and KLF3 is identified as an oncogenic transcription factor in colorectal cancer. Taken together, our work provides an important epigenomic resource and functional factors for epigenetic studies in colorectal cancer.

Ancient introgression between distantly related white oaks (Quercus sect. Quercus) shows evidence of climate-associated asymmetric gene exchange

Journal of Heredity ◽ 10.1093/jhered/esab053 ◽ 2021

Author(s): Scott T O’Donnell ◽ Sorel T Fitz-Gibbon ◽ Victoria L Sork

Keyword(s): Gene Flow ◽ Genotyping By Sequencing ◽ Nucleotide Polymorphisms ◽ Sequencing Data ◽ Single Nucleotide ◽ Scrub Oak ◽ Population Genetic Inference ◽ Genetic Inference ◽ California Floristic Province ◽ Python Package

Abstract Ancient introgression can be an important source of genetic variation that shapes the evolution and diversification of many taxa. Here, we estimate the timing, direction and extent of gene flow between two distantly related oak species in the same section (Quercus sect. Quercus). We estimated these demographic events using genotyping by sequencing data (GBS), which generated 25,702 single nucleotide polymorphisms (SNPs) for 24 individuals of California scrub oak (Quercus berberidifolia) and 23 individuals of Engelmann oak (Q. engelmannii). We tested several scenarios involving gene flow between these species using the diffusion approximation-based population genetic inference framework and model-testing approach of the Python package DaDi. We found that the most likely demographic scenario includes a bottleneck in Q. engelmannii that coincides with asymmetric gene flow from Q. berberidifolia into Q. engelmannii. Given that the timing of this gene flow coincides with the advent of a Mediterranean-type climate in the California Floristic Province, we propose that changing precipitation patterns and seasonality may have favored the introgression of climate-associated genes from the endemic into the non-endemic California oak.

Bacsnp: Using Single Nucleotide Polymorphism (SNP) Specificities and Frequencies to Identify Genotype Composition in Baculoviruses

Viruses ◽ 10.3390/v12060625 ◽ 2020 ◽ Vol 12 (6) ◽ pp. 625 ◽ Cited By ~ 1

Author(s): Jörg T. Wennmann ◽ Jiangbin Fan ◽ Johannes A. Jehle

Keyword(s): Nucleotide Polymorphisms ◽ Downstream Process ◽ Sequencing Data ◽ Nucleotide Polymorphism ◽ Single Nucleotide ◽ Natural Isolates ◽ Genotype Composition ◽ R Programming ◽ dsDNA Viruses ◽ Virus Isolates

Natural isolates of baculoviruses (as well as other dsDNA viruses) generally consist of homogeneous or heterogeneous populations of genotypes. The number and positions of single nucleotide polymorphisms (SNPs) from sequencing data are often used as suitable markers to study their genotypic composition. Identifying and assigning the specificities and frequencies of SNPs from high-throughput genome sequencing data can be very challenging, especially when comparing several sequenced isolates or samples. In this study, the new tool “bacsnp”, written in the R programming language, was developed as a downstream process, enabling the detection of SNP specificities across several virus isolates. The basis of this analysis is the use of a common, closely related reference to which the sequencing reads of an isolate are mapped. Thereby, the specificities of SNPs are linked and their frequencies can be used to analyze the genetic composition across the sequenced isolate. Here, the downstream process and analysis of detected SNP positions are demonstrated using three baculovirus isolates as an example, showing the fast and reliable detection of a mixed sequenced sample.

1202. Multimodal Sequencing of a Clonal Case Cluster of Carbapenem-Resistant Citrobacter Reveals Unexpectedly Rapid Dynamics of KPC3-Containing Plasmids

Open Forum Infectious Diseases ◽ 10.1093/ofid/ofy210.1035 ◽ 2018 ◽ Vol 5 (suppl_1) ◽ pp. S364-S364

Author(s): Roby Bhattacharyya ◽ Alejandro Pironti ◽ Bruce J Walker ◽ Abigail Manson ◽ Virginia Pierce ◽ ...

Keyword(s): Point Mutations ◽ Illumina MiSeq ◽ Nucleotide Polymorphisms ◽ Sequencing Data ◽ Single Nucleotide ◽ Carbapenem Resistant ◽ Oxford Nanopore ◽ Close Relationship ◽ Long Read ◽ Carbapenem Resistant Enterobacteriaceae

Abstract Background: Carbapenem-resistant Enterobacteriaceae (CRE) are a major public health threat. We report four clonally related Citrobacter freundii isolates harboring the blaKPC-3 carbapenemase in April–May 2017 that are nearly identical to a strain from 2014 at the same institution. Despite differing by ≤5 single nucleotide polymorphisms (SNPs), these isolates exhibited dramatic differences in carbapenemase plasmid architecture. Methods: We sequenced four carbapenem-resistant C. freundii isolates from 2017 and compared them with an ongoing CRE surveillance project at our institution. SNPs were identified from Illumina MiSeq data aligned to a reference genome using the variant caller Pilon. Plasmids were assembled from Illumina and Oxford Nanopore sequencing data using Unicycler. Results: The four 2017 isolates differed from one another by 0–5 chromosomal SNPs; two were identical. With one exception, these isolates differed by >38,000 SNPs from 25 C. freundii isolates sequenced from 2013 to 2017 at the same institution for CRE surveillance. The exception was a 2014 isolate that differed by 13–16 SNPs from each 2017 isolate, with 13 SNPs common to all four. Each C. freundii isolate harbored wild-type blaKPC-3. Despite the close relationship among the 2017 cluster, the plasmids harboring the blaKPC-3 genes differed dramatically: the carbapenemase occurred in one of the two different plasmids, with rearrangements between these plasmids across isolates. The related 2014 isolate harbored both plasmids, each with a separate copy of blaKPC-3. No transmission chains were found between any of the affected patients. Conclusion: WGS confirmed clonality among four contemporaneous blaKPC-3-containing C. freundii isolates, and marked similarity with a 2014 isolate, within an institution. That only 13–16 SNPs varied between the 2014 and 2017 isolates suggests durable persistence of the blaKPC-3 gene within this lineage in a hospital ecosystem. The plasmids harboring these carbapenemase genes proved remarkably plastic, with plasmid loss and rearrangements occurring on the same time scale as two to three chromosomal point mutations. Combining short and long-read sequencing in a case cluster uniquely revealed unexpectedly rapid dynamics of carbapenemase plasmids, providing critical insight into their manner of spread. Disclosures: M. J. Ferraro, SeLux Diagnostics: Scientific Advisor and Shareholder, Consulting fee. D. C. Hooper, SeLux Diagnostics: Scientific Advisor, Consulting fee.
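The pairwise SNP differences reported in this abstract come from variant calls against a shared reference genome. As a rough illustration of that comparison step only (not the authors' Pilon-based workflow), the Python sketch below assumes each isolate is represented as a dictionary of reference position to called allele, with absent positions taken to match the reference. The function names, isolate labels, positions and alleles are hypothetical.

```python
from itertools import combinations


def snp_distance(calls_a, calls_b):
    """Count reference positions at which two isolates carry different alleles.

    A position missing from a dictionary is treated as matching the reference.
    """
    positions = set(calls_a) | set(calls_b)
    return sum(calls_a.get(pos) != calls_b.get(pos) for pos in positions)


def pairwise_distances(isolates):
    """SNP distance for every unordered pair of isolates, keyed by sorted name pair."""
    return {
        (a, b): snp_distance(isolates[a], isolates[b])
        for a, b in combinations(sorted(isolates), 2)
    }


# Hypothetical variant calls (position -> alternate allele) for three isolates.
isolates = {
    "iso_A": {1200: "T", 56000: "G"},
    "iso_B": {1200: "T", 56000: "G"},
    "iso_C": {1200: "T", 91000: "A"},
}
distances = pairwise_distances(isolates)
```

A real analysis would also have to mask positions with low coverage or ambiguous calls before counting differences, a step this sketch leaves out.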

Evidence GDF15 Plays a Role in Familial and Recurrent Hyperemesis Gravidarum

Geburtshilfe und Frauenheilkunde ◽ 10.1055/a-0661-0287 ◽ 2018 ◽ Vol 78 (09) ◽ pp. 866-870 ◽ Cited By ~ 6

Author(s): Marlena Fejzo ◽ Daria Arzy ◽ Rayna Tian ◽ Kimber MacGibbon ◽ Patrick Mullin

Keyword(s): Genome Wide Association Study ◽ Risk Allele ◽ Hyperemesis Gravidarum ◽ Nucleotide Polymorphisms ◽ Sequencing Data ◽ Severe Nausea ◽ Single Nucleotide ◽ Genome Wide ◽ History Of ◽ Study Support

Abstract Introduction: Hyperemesis gravidarum (HG), a pregnancy complication characterized by severe nausea and vomiting in pregnancy, occurs in up to 2% of pregnancies. It is associated with both maternal and fetal morbidity. HG is highly heritable and recurs in approximately 80% of women. In a recent genome-wide association study, it was shown that placentation, appetite, and the cachexia gene GDF15 are linked to HG. The purpose of this study was to explore whether GDF15 alleles linked to overexpression of GDF15 protein segregate with the condition in families, and whether the GDF15 risk allele is associated with recurrence of HG. Methods: We analyzed GDF15 overexpression alleles for segregation with disease using exome-sequencing data from 5 HG families. We compared the allele frequency of the GDF15 risk allele, rs16982345, in patients who had recurrence of HG with its frequency in those who did not have recurrence. Results: Single nucleotide polymorphisms (SNPs) linked to higher levels of GDF15 segregated with disease in HG families. The GDF15 risk allele, rs16982345, was associated with an 8-fold higher risk of recurrence of HG. Conclusion: The findings of this study support the hypothesis that GDF15 is involved in the pathogenesis of both familial and recurrent cases of HG. The findings may be applicable when counseling women with a familial history of HG or recurrent HG. The GDF15-GFRAL brainstem-activated pathway was recently identified and therapies to treat conditions of abnormal appetite are under development. Based on our findings, patients carrying GDF15 variants associated with GDF15 overexpression should be included in future studies of GDF15-GFRAL-based therapeutics. If safe, this approach could reduce maternal and fetal morbidity.

Analytical Validation of the Tag-It High-Throughput Microsphere-Based Universal Array Genotyping Platform: Application to the Multiplex Detection of a Panel of Thrombophilia-Associated Single-Nucleotide Polymorphisms

Clinical Chemistry ◽ 10.1373/clinchem.2004.035071 ◽ 2004 ◽ Vol 50 (11) ◽ pp. 2028-2036 ◽ Cited By ~ 47

Author(s): Susan Bortolin ◽ Margot Black ◽ Hemanshu Modi ◽ Ihor Boszko ◽ Daniel Kobler ◽ ...

Keyword(s): Single Nucleotide Polymorphisms ◽ High Throughput ◽ Nucleotide Polymorphisms ◽ Sequencing Data ◽ Single Nucleotide ◽ Direct DNA Sequencing ◽ Array Platform ◽ Genotyping Platform ◽ Universal Array ◽ Primer Sets

Abstract Background: We have developed a novel, microsphere-based universal array platform referred to as the Tag-It™ platform. This platform is suitable for high-throughput clinical genotyping applications and was used for multiplex analysis of a panel of thrombophilia-associated single-nucleotide polymorphisms (SNPs). Methods: Genomic DNA from 132 patients was amplified by multiplex PCR using 6 primer sets, followed by multiplex allele-specific primer extension using 12 universally tagged genotyping primers. The products were then sorted on the Tag-It array and detected by use of the Luminex xMAP™ system. Genotypes were also determined by sequencing. Results: Empirical validation of the universal array showed that the highest nonspecific signal was 3.7% of the specific signal. Patient genotypes showed 100% concordance with direct DNA sequencing data for 736 SNP determinations. Conclusions: The Tag-It microsphere-based universal array platform is a highly accurate, multiplexed, high-throughput SNP-detection platform.