Single Nucleotide Polymorphism Calling: Latest Research Papers

Parallel computing for genome sequence processing

Briefings in Bioinformatics ◽

10.1093/bib/bbab070 ◽

2021 ◽

Author(s):

You Zou ◽

Yuejie Zhu ◽

Yaohang Li ◽

Fang-Xiang Wu ◽

Jianxin Wang

Keyword(s):

Parallel Computing ◽

Data Storage ◽

Genome Sequence ◽

High Performance ◽

Programming Model ◽

Algorithm Design ◽

Sequence Processing ◽

Genome Data ◽

Sequencing Technologies ◽

Single Nucleotide Polymorphism Calling

Abstract The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data and complex computing requirements, researchers have proposed many methods and tools, which can be divided into three types: big data storage, efficient algorithm design and parallel computing. The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. Three common parallel computing models are introduced according to their hardware architectures; each is classified into two or three types, whose features are further analyzed. Then, parallel computing for genome sequence processing is discussed through four common applications: genome sequence alignment, single nucleotide polymorphism calling, genome sequence preprocessing, and pattern detection and searching. For each kind of application, its background is first introduced, and then a list of tools or algorithms is summarized in terms of principle, hardware platform and computing efficiency. The programming model of each hardware platform and application provides a reference for researchers choosing high-performance computing tools. Finally, we discuss the limitations and future trends of parallel computing technologies.

Single nucleotide polymorphism calling and imputation strategies for cost‐effective genotyping in a tropical maize breeding program

Crop Science ◽

10.1002/csc2.20255 ◽

2020 ◽

Vol 60 (6) ◽

pp. 3066-3082

Author(s):

Amanda Avelar Oliveira ◽

Lauro José Moreira Guimarães ◽

Claudia Teixeira Guimarães ◽

Paulo Evaristo de Oliveira Guimarães ◽

Marcos de Oliveira Pinto ◽

...

Keyword(s):

Single Nucleotide Polymorphism ◽

Cost Effective ◽

Breeding Program ◽

Tropical Maize ◽

Nucleotide Polymorphism ◽

Maize Breeding ◽

Single Nucleotide ◽

Single Nucleotide Polymorphism Calling

Coexpression Clusters and Allele-Specific Expression in Metabolism-Based Herbicide Resistance

Genome Biology and Evolution ◽

10.1093/gbe/evaa191 ◽

2020 ◽

Vol 12 (12) ◽

pp. 2267-2278 ◽

Cited By ~ 1

Author(s):

Darci A Giacomini ◽

Eric L Patterson ◽

Anita Küpper ◽

Roland Beffa ◽

Todd A Gaines ◽

...

Keyword(s):

Transcriptome Assembly ◽

Dichlorophenoxyacetic Acid ◽

Specific Expression ◽

Single Nucleotide ◽

Allele Specific Expression ◽

Single Nucleotide Polymorphism Calling ◽

Allele Specific ◽

Amaranthus Tuberculatus ◽

2,4 Dichlorophenoxyacetic Acid ◽

Two Populations

Abstract In the last decade, Amaranthus tuberculatus has evolved resistance to 2,4-dichlorophenoxyacetic acid (2,4-D) and 4-hydroxyphenylpyruvate dioxygenase inhibitors in multiple states across the midwestern United States. Two populations resistant to both mode-of-action groups, one from Nebraska (NEB) and one from Illinois (CHR), were studied using an RNA-seq approach on F2 mapping populations to identify the genes responsible for resistance. Using both an A. tuberculatus transcriptome assembly and a high-quality grain amaranth (A. hypochondriacus) genome as references, differential transcript and gene expression analyses were conducted to identify genes that were significantly over- or underexpressed in resistant plants. When these differentially expressed genes (DEGs) were mapped on the A. hypochondriacus genome, physical clustering of the DEGs was apparent along several of the 16 A. hypochondriacus scaffolds. Furthermore, single-nucleotide polymorphism calling to look for resistant-specific (R) variants, and subsequent mapping of these variants, also found similar patterns of clustering. Specifically, regions biased toward R alleles overlapped with the DEG clusters. Within one of these clusters, allele-specific expression of cytochrome P450 81E8 was observed for 2,4-D resistance in both the CHR and NEB populations, and phylogenetic analysis indicated a common evolutionary origin of this R allele in the two populations.

Transcriptome-Wide Comparisons and Virulence Gene Polymorphisms of Host-Associated Genotypes of the Cnidarian Parasite Ceratonova shasta in Salmonids

Genome Biology and Evolution ◽

10.1093/gbe/evaa109 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1258-1276 ◽

Cited By ~ 2

Author(s):

Gema Alama-Bermejo ◽

Eli Meyer ◽

Stephen D Atkinson ◽

Astrid S Holzer ◽

Monika M Wiśniewska ◽

...

Keyword(s):

Pacific Northwest ◽

High Throughput Sequencing ◽

Coho Salmon ◽

Proteolytic Enzymes ◽

Evolutionary Relationship ◽

Model Organism ◽

Virulence Gene ◽

Genetic Distances ◽

Targeted Interventions ◽

Single Nucleotide Polymorphism Calling

Abstract Ceratonova shasta is an important myxozoan pathogen affecting the health of salmonid fishes in the Pacific Northwest of North America. Ceratonova shasta exists as a complex of host-specific genotypes, some with low to moderate virulence, and one that causes a profound, lethal infection in susceptible hosts. High throughput sequencing methods are powerful tools for discovering the genetic basis of these host/virulence differences, but deep sequencing of myxozoans has been challenging due to extremely fast molecular evolution of this group, yielding strongly divergent sequences that are difficult to identify, and unavoidable host contamination. We designed and optimized different bioinformatic pipelines to address these challenges. We obtained a unique set of comprehensive, host-free myxozoan RNA-seq data from C. shasta genotypes of varying virulence from different salmonid hosts. Analyses of transcriptome-wide genetic distances and maximum likelihood multigene phylogenies elucidated the evolutionary relationship between lineages and demonstrated the limited resolution of the established Internal Transcribed Spacer marker for C. shasta genotype identification, as this marker fails to differentiate between biologically distinct genotype II lineages from coho salmon and rainbow trout. We further analyzed the data sets based on polymorphisms in two gene groups related to virulence: cell migration and proteolytic enzymes including their inhibitors. The developed single-nucleotide polymorphism-calling pipeline identified polymorphisms between genotypes and demonstrated that variations in both motility and protease genes were associated with different levels of virulence of C. shasta in its salmonid hosts. The prospective use of proteolytic enzymes as promising candidates for targeted interventions against myxozoans in aquaculture is discussed. 
We developed host-free transcriptomes of a myxozoan model organism from strains that exhibited different degrees of virulence, as a unique source of data that will foster functional gene analyses and serve as a base for the development of potential therapeutics for efficient control of these parasites.

Whole Genome Sequencing Results Associated with Minimum Inhibitory Concentrations of 14 Anti-Tuberculosis Drugs among Rifampicin-Resistant Isolates of Mycobacterium Tuberculosis from Iran

Journal of Clinical Medicine ◽

10.3390/jcm9020465 ◽

2020 ◽

Vol 9 (2) ◽

pp. 465 ◽

Cited By ~ 5

Author(s):

Jalil Kardan-Yamchi ◽

Hossein Kazemian ◽

Simone Battaglia ◽

Hamidreza Abtahi ◽

Abbas Rahimi Foroushani ◽

...

Keyword(s):

Drug Resistance ◽

Mycobacterium Tuberculosis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Drug Susceptibility ◽

Drug Susceptibility Testing ◽

Whole Genome ◽

Second Line ◽

Single Nucleotide Polymorphism Calling ◽

Broth Microdilution Method

Accurate and timely detection of drug resistance can minimize the risk of further resistance development and lead to effective treatment. The aim of this study was to determine the resistance to first/second-line anti-tuberculosis drugs in rifampicin/multidrug-resistant Mycobacterium tuberculosis (RR/MDR-MTB) isolates. Molecular epidemiology of strains was determined using whole genome sequencing (WGS)-based genotyping. A total of 35 RR/MDR-MTB isolates were subjected to drug susceptibility testing against first/second-line drugs using 7H9 Middlebrook in broth microdilution method. Illumina technology was used for paired-end WGS applying a Maxwell 16 Cell DNA Purification kit and the NextSeq platform. Data analysis and single nucleotide polymorphism calling were performed using MTBseq pipeline. The genome-based resistance to each drug among the resistant phenotypes was as follows: rifampicin (97.1%), isoniazid (96.6%), ethambutol (100%), levofloxacin (83.3%), moxifloxacin (83.3%), amikacin (100%), kanamycin (100%), capreomycin (100%), prothionamide (100%), D-cycloserine (11.1%), clofazimine (20%), bedaquiline (0.0%), and delamanid (44.4%). There was no linezolid-resistant phenotype, and a bedaquiline-resistant strain was wild type for related genes. The Beijing, Euro-American, and Delhi-CAS were the most populated lineage/sublineages. Drug resistance-associated mutations were mostly linked to minimum inhibitory concentration results. However, the role of well-known drug-resistant genes for D-cycloserine, clofazimine, bedaquiline, and delamanid was found to be more controversial.

Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism–calling pipelines

GigaScience ◽

10.1093/gigascience/giaa007 ◽

2020 ◽

Vol 9 (2) ◽

Cited By ~ 17

Author(s):

Stephen J Bush ◽

Dona Foster ◽

David W Eyre ◽

Emily L Clark ◽

Nicola De Maio ◽

...

Keyword(s):

Reference Genome ◽

Simulated Data ◽

Real Data ◽

Genomic Diversity ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide ◽

Snp Calling ◽

Single Nucleotide Polymorphism Calling ◽

Nucleotide Divergence

Abstract Background: Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella. Results: We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis. Conclusions: The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.
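The sensitivity and precision mentioned in this abstract can be illustrated with a minimal sketch. It assumes that SNP calls are represented as (contig, position, alternate base) tuples and that a known truth set is available, as with simulated data; the function name `evaluate_calls` and the toy coordinates are hypothetical and are not taken from the study.

```python
def evaluate_calls(truth, called):
    """Return (precision, sensitivity) for two sets of SNP calls.

    Each call is a hypothetical (contig, position, alt_base) tuple.
    """
    true_positives = len(truth & called)    # calls present in both sets
    false_positives = len(called - truth)   # called but not in the truth set
    false_negatives = len(truth - called)   # true SNPs that were missed
    called_total = true_positives + false_positives
    truth_total = true_positives + false_negatives
    precision = true_positives / called_total if called_total else 0.0
    sensitivity = true_positives / truth_total if truth_total else 0.0
    return precision, sensitivity


# Toy data, invented for illustration only.
truth = {("contig1", 101, "A"), ("contig1", 250, "T"),
         ("contig1", 399, "G"), ("contig1", 512, "C")}
called = {("contig1", 101, "A"), ("contig1", 250, "T"),
          ("contig1", 640, "G")}

precision, sensitivity = evaluate_calls(truth, called)
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f}")
```

In this toy case two of the three calls are correct and two of the four true SNPs are recovered, so the two metrics differ; a real benchmark would also need to normalize variant representation before comparing sets.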

Methylation content sensitive enzyme ddRAD (MCSeEd): a reference-free, whole genome profiling system to address cytosine/adenine methylation changes

Scientific Reports ◽

10.1038/s41598-019-51423-2 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Gianpiero Marconi ◽

Stefano Capomaccio ◽

Cinzia Comino ◽

Alberto Acquadro ◽

Ezio Portis ◽

...

Keyword(s):

Large Scale ◽

Reference Genome ◽

Cost Effective ◽

Whole Genome ◽

High Coverage ◽

Reduced Representation ◽

Large Scale Analysis ◽

Genome Methylation ◽

Cost Effective Approach ◽

Single Nucleotide Polymorphism Calling

Abstract Methods for investigating DNA methylation nowadays either require a reference genome and high coverage, or investigate only CG methylation. Moreover, no large-scale analysis can be performed for N6-methyladenosine (6mA) at an affordable price. Here we describe the methylation content sensitive enzyme double-digest restriction-site-associated DNA (ddRAD) technique (MCSeEd), a reduced-representation, reference-free, cost-effective approach for characterizing whole genome methylation patterns across different methylation contexts (e.g., CG, CHG, CHH, 6mA). MCSeEd can also detect genetic variations among hundreds of samples. MCSeEd is based on parallel restrictions carried out by combinations of methylation insensitive and sensitive endonucleases, followed by next-generation sequencing. Moreover, we present a robust bioinformatic pipeline (available at https://bitbucket.org/capemaster/mcseed/src/master/) for differential methylation analysis combined with single nucleotide polymorphism calling without or with a reference genome.

Methylation content sensitive enzyme ddRAD (MCSeEd): a reference-free, whole genome profiling system to address cytosine/adenine methylation changes

10.1101/616532 ◽

2019 ◽

Author(s):

Gianpiero Marconi ◽

Stefano Capomaccio ◽

Cinzia Comino ◽

Alberto Acquadro ◽

Ezio Portis ◽

...

Keyword(s):

Large Scale ◽

Reference Genome ◽

Cost Effective ◽

Whole Genome ◽

High Coverage ◽

Reduced Representation ◽

Large Scale Analysis ◽

Genome Methylation ◽

Cost Effective Approach ◽

Single Nucleotide Polymorphism Calling

Abstract Methods for investigating DNA methylation nowadays either require a reference genome and high coverage, or investigate only CG methylation. Moreover, no large-scale analysis can be performed for N6-methyladenosine (6mA). Here we describe the methylation content sensitive enzyme double-digest restriction-site-associated DNA (ddRAD) technique (MCSeEd), a reduced-representation, reference-free, cost-effective approach for characterizing whole genome methylation patterns across different methylation contexts (e.g., CG, CHG, CHH, 6mA). MCSeEd can also detect genetic variations among hundreds of samples. MCSeEd is based on parallel restrictions carried out by combinations of methylation insensitive and sensitive endonucleases, followed by next-generation sequencing. Moreover, we present a robust bioinformatic pipeline (available at https://bitbucket.org/capemaster/mcseed/src/master/) for differential methylation analysis combined with single nucleotide polymorphism calling without or with a reference genome.

Evaluation of F-Measure and Feature Analysis of C5.0 Implementation on Single Nucleotide Polymorphism Calling

Indonesian Journal of Artificial Intelligence and Data Mining ◽

10.24014/ijaidm.v1i1.4616 ◽

2018 ◽

Vol 1 (1) ◽

pp. 1

Author(s):

Lailan Sahrina Hasibuan ◽

Sita Nabila ◽

Nurul Hudachair ◽

Muhammad Abrar Istiadi

Keyword(s):

Single Nucleotide Polymorphism ◽

The Other ◽

Training Dataset ◽

Final Decision ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Rule Based ◽

Snp Calling ◽

Single Nucleotide Polymorphism Calling ◽

F Measure

Data in molecular biology have grown rapidly since Next-Generation Sequencing (NGS), the latest technology used to sequence DNA with high throughput, was introduced in 2000. A Single Nucleotide Polymorphism (SNP) is a DNA-based marker that can be used to identify an organism specifically. SNPs are usually exploited to optimize parent selection when producing high-quality seed in plant breeding. This paper discusses SNP calling on NGS data of cultivated soybean (Glycine max [L.] Merr.) using C5.0, an improved version of the rule-based algorithm C4.5. The evaluation showed that C5.0 outperforms the other rule-based algorithm, CART, on F-measure: the F-measure values for C5.0 and CART are 0.63 and 0.58, respectively. In addition, C5.0 is robust to imbalanced training datasets up to a ratio of 1:17, but it suffers on large training datasets. C5.0's performance may be improved by applying bagging or another ensemble technique, as CART was improved by applying bagging in the final decision. It is also important to use appropriate features to represent SNP candidates. Based on the information gain from C5.0, this paper recommends error probability, homopolymer left, mismatch alt and mean nearby qual as features for SNP calling.
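The F-measure used to compare the classifiers above is conventionally the harmonic mean of precision and recall. A minimal sketch follows, assuming the balanced (F1) form; the abstract does not say which variant was used, and the input values below are illustrative, not the paper's results.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # defined as 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only; these are not values reported in the paper.
example = f_measure(0.7, 0.6)
print(round(example, 3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high F-measure by maximizing only precision or only recall, which is relevant for imbalanced SNP/non-SNP training data.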
