Deciphering transcriptional control mechanisms in hematopoiesis—The impact of high-throughput sequencing technologies

Taxon appearance from extraction and amplification steps demonstrates the value of multiple controls in tick microbiota analysis

DOI: 10.1101/714030, 2019. Cited By ~ 2

Author(s): Emilie Lejal, Agustín Estrada-Peña, Maud Marsot, Jean-François Cosson, Olivier Rué, ...

Keyword(s): Microbial Communities, DNA Extraction, High Throughput, High Throughput Sequencing, DNA Amplification, Community Diversity, Analytical Process, Sequencing Technologies, And Gender, The Impact

Abstract

Background: The development of high-throughput sequencing technologies has substantially improved the analysis of bacterial community diversity, composition, and function. Over the last decade, high-throughput sequencing has been used extensively to characterize the diversity and composition of tick microbial communities. However, a growing number of studies warn about the impact of contamination introduced at the different steps of the analytical process, from DNA extraction to amplification. In low-biomass samples, such as individual ticks, these contaminants may represent a large part of the obtained sequences and thus generate considerable errors in downstream analyses and in the interpretation of results. Most studies of tick microbiota either do not mention the inclusion of controls during the DNA extraction or amplification steps, or treat the lack of an electrophoresis signal as evidence that there is no contamination. In this context, we aimed to assess the proportion of contaminant sequences resulting from these steps. We analyzed the microbiota of individual Ixodes ricinus ticks and included several categories of controls throughout the analytical process: crushing, DNA extraction, and DNA amplification.

Results: Controls yielded a substantial number of sequences (a mean of 1,126 to 13,198 sequences, depending on the control category). Some operational taxonomic units (OTUs) detected in these controls belong to genera reported in previous tick microbiota studies. In this study, these OTUs accounted for 50.9% of the total number of sequences in our samples and were considered contaminants. Contamination levels (i.e., the percentage of sequences belonging to OTUs identified as contaminants) varied with tick stage and gender: 76.3% of nymphs and 75% of males showed contamination above 50%, while most females (65.7%) had rates below 20%. Contamination mainly corresponded to OTUs detected in crushing and DNA extraction controls, highlighting the importance of carefully controlling these steps.

Conclusion: We showed that contaminant OTUs from extraction and amplification steps can represent more than half of the total sequence yield in sequencing runs and can lead to unreliable results when characterizing tick microbial communities. We therefore strongly advise the routine use of negative controls in tick microbiota studies and, more generally, in studies involving low-biomass samples.
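
The contamination level used above is a per-sample ratio: sequences assigned to contaminant OTUs divided by all sequences in the sample. A minimal Python sketch of that calculation follows. It assumes a dictionary of OTU read counts per sample. It also flags as a contaminant any OTU with at least one read in any control. That flagging rule is a hypothetical simplification, and the study's own criteria may differ. The OTU names and counts are invented for illustration.

```python
from typing import Dict, Iterable, Set


def contaminants_from_controls(control_samples: Iterable[Dict[str, int]]) -> Set[str]:
    """Hypothetical rule: flag every OTU with at least one read in any control."""
    flagged: Set[str] = set()
    for control in control_samples:
        flagged.update(otu for otu, n in control.items() if n > 0)
    return flagged


def contamination_level(otu_counts: Dict[str, int], contaminants: Iterable[str]) -> float:
    """Percentage of a sample's sequences that belong to contaminant OTUs."""
    contaminant_set = set(contaminants)
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no sequences")
    contaminated = sum(n for otu, n in otu_counts.items() if otu in contaminant_set)
    return 100.0 * contaminated / total


# Invented counts for illustration only.
controls = [{"OTU_3": 120, "OTU_7": 15}, {"OTU_3": 40}]
tick = {"OTU_1": 300, "OTU_3": 500, "OTU_7": 200}
level = contamination_level(tick, contaminants_from_controls(controls))
print(f"{level:.1f}%")  # 70.0%
```

Per-sample levels computed this way can then be grouped by stage and sex. This yields summaries like the share of nymphs above 50%.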

In Silico Estimation of the Abundance and Phylogenetic Significance of the Composite Oct4-Sox2 Binding Motifs within a Wide Range of Species

Data, 2020, Vol 5 (4), pp. 111. DOI: 10.3390/data5040111

Author(s): Arman Kulyyassov, Ruslan Kalendar

Keyword(s): High Throughput, DNA Sequences, High Throughput Sequencing, Regulatory Elements, Scoring Method, Sequencing Technologies, Wide Range, Multiple Species, Eukaryotic Organisms, The Impact

High-throughput sequencing technologies have greatly accelerated progress in genomics, transcriptomics, and metagenomics. Large amounts of genomic data from various organisms are currently being generated, and the volume increases every year. Methods that allow the rapid search and analysis of DNA sequences are therefore urgently needed. Here, we present a novel motif-based, high-throughput sequence-scoring method for extracting information from genome sequences. We identified Utf1-like, Fgf4-like, and Hoxb1-like motifs, which are cis-regulatory elements for the pluripotency transcription factors Sox2 and Oct4, within the genomes of different eukaryotic organisms. We performed a genome-wide analysis of these motifs to understand the impact of their diversification on mammalian genome evolution, and evaluated the diversity of Utf1-like, Fgf4-like, and Hoxb1-like motifs across genomes from multiple species.
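
The abstract does not spell out the scoring method. As a hedged illustration of the basic counting step that any motif-abundance comparison needs, the Python sketch below counts exact, overlapping matches of a motif on both strands and normalizes the count per megabase. The caller supplies the motif string. The motifs used in the tests are placeholders, not the Utf1-like, Fgf4-like, or Hoxb1-like consensus sequences. Real composite Oct4-Sox2 scans typically also allow degenerate positions and variable spacers, which this sketch does not handle.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(genome: str, motif: str) -> int:
    """Count exact, overlapping matches of `motif` on both strands of `genome`."""
    genome = genome.upper()
    motif = motif.upper()
    # A set avoids counting a reverse-complement palindrome twice.
    patterns = {motif, reverse_complement(motif)}
    total = 0
    for pattern in patterns:
        # A zero-width lookahead lets overlapping occurrences each be counted.
        total += len(re.findall(f"(?={re.escape(pattern)})", genome))
    return total


def motif_density_per_mb(genome: str, motif: str) -> float:
    """Motif occurrences per million bases of sequence."""
    return count_motif(genome, motif) * 1_000_000 / len(genome)
```

Normalizing per megabase, rather than reporting raw counts, keeps the figures comparable across genomes of very different sizes.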

Impact of human gene annotations on RNA-seq differential expression analysis

BMC Genomics, 2021, Vol 22 (1). DOI: 10.1186/s12864-021-08038-7

Author(s): Yu Hamaguchi, Chao Zeng, Michiaki Hamada

Keyword(s): Differential Expression, High Throughput, High Throughput Sequencing, Human Gene, Gene Annotation, Differential Expression Analysis, RNA-Seq, Gene Annotations, Sequencing Technologies, The Impact

Abstract

Background: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Different sets of gene annotations are available for the human genome and are continually updated, a process made more complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear.

Results: Using “mappability”, a metric of the complexity of gene annotation, we compared three distinct human gene annotations, GENCODE, RefSeq, and NONCODE, and evaluated how mappability affected DE analysis. We found that mappability differed significantly among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis. The impact of mappability was mainly evident in the quantification step and propagated systematically through the downstream steps of DE analysis.

Conclusions: Using mappability, we assessed how the complexity of gene annotations affects DE analysis. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis. This suggests that an approach that excludes unnecessary gene models from gene annotations would improve the performance of DE analysis.
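
The abstract does not define how mappability is computed. One common notion, assumed here for illustration only, is k-mer uniqueness: the fraction of a transcript's k-mers that occur nowhere else in the annotation. The Python sketch below implements that assumed definition. It shows why adding overlapping gene models, such as extra isoforms that share exons, lowers mappability. It ignores reverse-complement matches and the genome outside annotated transcripts, however, and should not be read as the paper's exact metric.

```python
from collections import Counter
from typing import Dict, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping k-mers of `seq`, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def transcript_mappability(transcripts: Dict[str, str], k: int) -> Dict[str, float]:
    """Fraction of each transcript's k-mers that occur exactly once in the annotation.

    Assumed definition for illustration: a k-mer position is 'uniquely mappable'
    if that k-mer appears at only one position across all transcript sequences.
    """
    counts: Counter = Counter()
    for seq in transcripts.values():
        counts.update(kmers(seq, k))
    result: Dict[str, float] = {}
    for name, seq in transcripts.items():
        positions = list(kmers(seq, k))
        if not positions:
            result[name] = 0.0  # transcript shorter than k
            continue
        unique = sum(1 for km in positions if counts[km] == 1)
        result[name] = unique / len(positions)
    return result
```

In the tests, adding a redundant copy of `t1` drops its score from 1.0 to 0.0. This is a toy version of how redundant gene models reduce the share of sequence that can be uniquely assigned.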

Assessing genotyping errors in mammalian museum study skins using high-throughput genotyping-by-sequencing

Conservation Genetics Resources, 2021. DOI: 10.1007/s12686-021-01213-8

Author(s): Stella C. Yuan, Eric Malekos, Melissa T. R. Hawkins

Keyword(s): High Throughput, High Throughput Sequencing, Massively Parallel Sequencing, Massively Parallel, Museum Specimens, Museum Specimen, Genotyping Errors, Allelic Dropout, Parallel Sequencing, Sequencing Technologies

Abstract

The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short tandem repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they have traditionally suffered from a host of issues, including length homoplasy, high costs, low throughput, and poor reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but DNA derived from museum specimens suffers from significant fragmentation and exogenous DNA contamination. Combating these issues requires extra stringency in the lab and during data analysis, yet no high-throughput sequencing study has evaluated microsatellite allelic dropout in DNA extracted from museum specimens. In this study, we use high-throughput sequencing to evaluate genotyping errors across PCR replicates for previously characterized microsatellites in DNA extracted from mammalian museum skins. We found it useful to classify samples by DNA concentration, which determined the rate at which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci depended on sample quantity: high-concentration museum specimens performed as well as the frozen tissue sample and recovered quality metrics nearly as high. Based on our results, we provide a set of best practices for quality assurance and for incorporating reliable genotypes from museum specimens.
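
Allelic dropout (ADO) is commonly quantified per locus as the proportion of replicate genotypes at a known heterozygote in which one of the two alleles failed to appear. The Python sketch below implements that common definition. It assumes a trusted reference genotype, for instance a consensus across replicates or the frozen-tissue genotype. Genotypes are encoded as pairs of allele lengths. The allele sizes in the tests are hypothetical, and the study's own error metrics may be computed differently.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # pair of allele lengths in bp, e.g. (152, 160)


def is_dropout(reference: Genotype, replicate: Genotype) -> bool:
    """True if a heterozygous reference was called homozygous for one of its own alleles."""
    a, b = reference
    if a == b:
        return False  # dropout is only observable at heterozygous loci
    x, y = replicate
    return x == y and x in (a, b)


def allelic_dropout_rate(reference: Genotype, replicates: List[Genotype]) -> float:
    """Fraction of replicate PCRs of a heterozygote in which one allele was lost."""
    if reference[0] == reference[1]:
        raise ValueError("ADO rate is undefined for a homozygous reference genotype")
    if not replicates:
        raise ValueError("no replicates supplied")
    dropouts = sum(is_dropout(reference, r) for r in replicates)
    return dropouts / len(replicates)
```

A replicate showing an allele absent from the reference, such as (152, 164), is not counted as dropout here. That pattern points to a false allele, which is usually tallied as a separate error type.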

Detection of novel allelic variations in soybean mutant population using Tilling by Sequencing

DOI: 10.1101/711440, 2019

Author(s): Reneth Millas, Mary Espina, CM Sabbir Ahmed, Angelina Bernardini, Ekundayo Adeleke, ...

Keyword(s): Fatty Acid, High Throughput, Reverse Genetics, High Throughput Sequencing, Fatty Acid Biosynthesis, Induced Mutations, Mutant Population, Sequencing Technologies, Allelic Variations, TILLING by Sequencing

Abstract

Mutagenesis is one of the most important tools in genetic improvement: it induces genetic and phenotypic variation for trait improvement and the discovery of novel genes. A JTN-5203 (MG V) mutant population was generated by ethyl methanesulfonate (EMS) mutagenesis. It was used to detect induced mutations in the FAD2-1A and FAD2-1B genes through a reverse genetics approach. An optimum concentration of EMS was used to treat 15,000 bulk JTN-5203 seeds, producing an M2 population of 1,820 individuals. DNA was extracted from these individuals, normalized, and pooled. Specific primers were designed for the FAD2-1A and FAD2-1B genes, which are involved in the fatty acid biosynthesis pathway, for analysis by next-generation sequencing. High-throughput mutation discovery through a TILLING-by-Sequencing approach was used to detect novel allelic variations in this population. Several high-impact mutations and allelic variations were detected in FAD2-1A and FAD2-1B, including GC-to-AT transition mutations in FAD2-1A (20%) and FAD2-1B (69%). The mutation density of this population is estimated at about 1 mutation per 136 kb. Through mutagenesis and high-throughput sequencing technologies, novel alleles underlying the mutations observed in mutants with reduced polyunsaturated fatty acid content will be identified. These mutants can then be used to breed soybean lines with an improved fatty acid profile, thereby developing heart-healthy soybeans.
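
Mutation density in TILLING screens is conventionally reported as one mutation per X kb. X is the total screened sequence (individuals multiplied by amplicon length) divided by the number of mutations found, expressed in kilobases. The abstract does not give the amplicon lengths or mutation counts behind the 1/136 kb estimate, so the Python sketch below uses illustrative inputs only. It also assumes full coverage of each amplicon in every individual, whereas published estimates often trim primer regions or poorly covered amplicon ends.

```python
def mutation_density_kb(individuals: int, amplicon_bp: int, mutations: int) -> float:
    """Return X in '1 mutation per X kb' for a TILLING screen.

    Total screened sequence = individuals x amplicon length (bp), assuming every
    individual is covered across the full amplicon.
    """
    if mutations <= 0:
        raise ValueError("at least one mutation is needed to estimate density")
    screened_bp = individuals * amplicon_bp
    return screened_bp / mutations / 1000


# Illustrative numbers only, not from the study: 100 plants, a 2,000 bp amplicon, 4 mutations.
print(f"1 mutation per {mutation_density_kb(100, 2000, 4):.0f} kb")  # 1 mutation per 50 kb
```

For several amplicons, the amplicon lengths would be summed before multiplying by the number of individuals.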

Adenosine-to-inosine RNA editing may be implicated in human pathogenesis

Bulletin of Russian State Medical University, 2019, pp. 22-25. DOI: 10.24075/brsmu.2019.028

Author(s): AA Kliuchnikova, SA Moshkovskii

Keyword(s): Immune Responses, RNA Editing, High Throughput, High Throughput Sequencing, Common Mechanism, Adenosine Deaminases, Human Transcriptome, Sequencing Technologies

Adenosine-to-inosine (A-to-I) RNA editing is a common mechanism of post-transcriptional modification in many metazoans, including vertebrates; the process is catalyzed by adenosine deaminases acting on RNA (ADARs). High-throughput sequencing technologies have revealed thousands of RNA editing sites throughout the human transcriptome; however, their functions remain poorly understood. The aim of this brief review is to draw the attention of clinicians and biomedical researchers to the phenomenon of ADAR-mediated RNA editing and its possible implication in the development of neuropathologies, antiviral immune responses, and cancer.
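
A-to-I editing is visible in sequencing data because reverse transcriptase reads inosine as guanosine. Edited sites therefore show A-to-G mismatches relative to the genome on the transcribed strand, which appear as T-to-C in plus-strand coordinates for minus-strand genes. The Python sketch below uses that principle to estimate an editing level at a single site from base counts. The function name, the minimum-coverage default, and the input format are hypothetical. Real pipelines also filter known SNPs, sequencing errors, and mapping artifacts before calling a site edited.

```python
from typing import Dict, Optional

# Expected (unedited base, edited base) in plus-strand coordinates, keyed by transcript strand.
_EDIT_SIGNATURE = {"+": ("A", "G"), "-": ("T", "C")}


def editing_level(
    ref_base: str,
    base_counts: Dict[str, int],
    strand: str = "+",
    min_coverage: int = 10,
) -> Optional[float]:
    """Fraction of reads showing the A-to-I (read as A-to-G) signature at one site.

    `strand` is the strand of the transcript; bases are given in plus-strand
    coordinates. Returns None if the reference base cannot carry the signature
    or if coverage of the two relevant bases is below `min_coverage`.
    """
    if strand not in _EDIT_SIGNATURE:
        raise ValueError("strand must be '+' or '-'")
    unedited_base, edited_base = _EDIT_SIGNATURE[strand]
    if ref_base.upper() != unedited_base:
        return None
    unedited = base_counts.get(unedited_base, 0)
    edited = base_counts.get(edited_base, 0)
    coverage = unedited + edited
    if coverage < min_coverage:
        return None
    return edited / coverage
```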

Overview of High Throughput Sequencing Technologies to Elucidate Molecular Pathways in Cardiovascular Diseases

Circulation Research, 2013, Vol 112 (12), pp. 1613-1623. DOI: 10.1161/circresaha.113.300939. Cited By ~ 49

Author(s): Jared M. Churko, Gary L. Mantalas, Michael P. Snyder, Joseph C. Wu

Keyword(s): Cardiovascular Diseases, High Throughput, High Throughput Sequencing, Molecular Pathways, Sequencing Technologies

Minirmd: accurate and fast duplicate removal tool for short reads via multiple minimizers

Bioinformatics, 2020. DOI: 10.1093/bioinformatics/btaa915

Author(s): Yuansheng Liu, Xiaocai Zhang, Quan Zou, Xiangxiang Zeng

Keyword(s): High Throughput, High Throughput Sequencing, De Novo, Supplementary Information, Supplementary Data, Complementary Strand, Short Reads, Sequencing Technologies, Computational Resources

Abstract

Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources required by downstream applications. Here we develop minirmd, a de novo tool that removes duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand.

Availability and implementation: https://github.com/yuansliu/minirmd.

Supplementary information: Supplementary data are available at Bioinformatics online.
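
The following simplified Python sketch of minimizer-based near-duplicate removal is for illustration only; it is not minirmd's implementation. It uses canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), so a read and its reverse complement produce the same minimizer set. Reads are bucketed by that set, and candidates are confirmed with a Hamming-distance check in both orientations. The parameters `k`, `w`, and `max_mismatches` are hypothetical defaults. A single mismatch can change a read's minimizer set, so exact bucketing misses some near-duplicates. Repeating the bucketing with several `k` values is one way to recover them, loosely analogous to the multiple rounds with different minimizer lengths described above.

```python
from typing import Dict, List, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMP)[::-1]


def canonical_minimizers(read: str, k: int, w: int) -> Tuple[str, ...]:
    """Sorted set of (w, k)-minimizers over canonical k-mers (lexicographic order)."""
    if len(read) < k:
        raise ValueError("read is shorter than k")
    canon = [min(read[i:i + k], revcomp(read[i:i + k])) for i in range(len(read) - k + 1)]
    n_windows = max(1, len(canon) - w + 1)
    return tuple(sorted({min(canon[i:i + w]) for i in range(n_windows)}))


def is_near_duplicate(a: str, b: str, max_mismatches: int) -> bool:
    """Equal-length reads within `max_mismatches` in either orientation."""
    if len(a) != len(b):
        return False
    forward = sum(x != y for x, y in zip(a, b))
    reverse = sum(x != y for x, y in zip(a, revcomp(b)))
    return min(forward, reverse) <= max_mismatches


def remove_near_duplicates(
    reads: List[str], k: int = 5, w: int = 4, max_mismatches: int = 1
) -> List[str]:
    """Keep the first read of each near-duplicate group, comparing only within a bucket."""
    buckets: Dict[Tuple[str, ...], List[str]] = {}
    kept: List[str] = []
    for read in reads:
        representatives = buckets.setdefault(canonical_minimizers(read, k, w), [])
        if any(is_near_duplicate(read, rep, max_mismatches) for rep in representatives):
            continue
        representatives.append(read)
        kept.append(read)
    return kept
```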

High-Throughput Sequencing is a Crucial Tool to Investigate the Contribution of Human Endogenous Retroviruses (HERVs) to Human Biology and Development

Viruses, 2020, Vol 12 (6), pp. 633. DOI: 10.3390/v12060633. Cited By ~ 1

Author(s): Maria Paola Pisano, Nicole Grandi, Enzo Tramontano

Keyword(s): High Throughput, High Throughput Sequencing, Developmental Stages, Large Fraction, Expression Patterns, Cell Types, Endogenous Retroviruses, Human Endogenous Retroviruses, Retroviral Infections, The Impact

Human endogenous retroviruses (HERVs) are remnants of ancient retroviral infections that represent a large fraction of our genome. Their transcriptional activity is finely regulated in early developmental stages, and their expression is modulated in different cell types and tissues. This activity has an impact on human physiology and pathology that is, to date, only partially understood. Novel high-throughput sequencing tools have recently enabled great advances in elucidating the various HERV expression patterns in different tissues and the mechanisms controlling their transcription. Overall, they have helped provide a more comprehensive understanding of the impact of HERVs on host biology.
