single nucleotide polymorphism calling
Recently Published Documents



Author(s):  
You Zou ◽  
Yuejie Zhu ◽  
Yaohang Li ◽  
Fang-Xiang Wu ◽  
Jianxin Wang

Abstract The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data and complex computing requirements, researchers have proposed many methods and tools, which can be divided into three types: big data storage, efficient algorithm design, and parallel computing. The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. Three common parallel computing models are introduced according to their hardware architectures; each is classified into two or three types and further analyzed in terms of its features. Then, parallel computing for genome sequence processing is discussed through four common applications: genome sequence alignment, single nucleotide polymorphism calling, genome sequence preprocessing, and pattern detection and searching. For each application, its background is first introduced, and then a list of tools or algorithms is summarized in terms of principle, hardware platform, and computing efficiency. The programming model of each hardware platform and application provides a reference for researchers choosing high-performance computing tools. Finally, we discuss the limitations and future trends of parallel computing technologies.


Crop Science ◽  
2020 ◽  
Vol 60 (6) ◽  
pp. 3066-3082
Author(s):  
Amanda Avelar Oliveira ◽  
Lauro José Moreira Guimarães ◽  
Claudia Teixeira Guimarães ◽  
Paulo Evaristo de Oliveira Guimarães ◽  
Marcos de Oliveira Pinto ◽  
...  

2020 ◽  
Vol 12 (12) ◽  
pp. 2267-2278 ◽  
Author(s):  
Darci A Giacomini ◽  
Eric L Patterson ◽  
Anita Küpper ◽  
Roland Beffa ◽  
Todd A Gaines ◽  
...  

Abstract In the last decade, Amaranthus tuberculatus has evolved resistance to 2,4-dichlorophenoxyacetic acid (2,4-D) and 4-hydroxyphenylpyruvate dioxygenase inhibitors in multiple states across the midwestern United States. Two populations resistant to both mode-of-action groups, one from Nebraska (NEB) and one from Illinois (CHR), were studied using an RNA-seq approach on F2 mapping populations to identify the genes responsible for resistance. Using both an A. tuberculatus transcriptome assembly and a high-quality grain amaranth (A. hypochondriacus) genome as references, differential transcript and gene expression analyses were conducted to identify genes that were significantly over- or underexpressed in resistant plants. When these differentially expressed genes (DEGs) were mapped on the A. hypochondriacus genome, physical clustering of the DEGs was apparent along several of the 16 A. hypochondriacus scaffolds. Furthermore, single-nucleotide polymorphism calling to look for resistant-specific (R) variants, and subsequent mapping of these variants, also found similar patterns of clustering. Specifically, regions biased toward R alleles overlapped with the DEG clusters. Within one of these clusters, allele-specific expression of cytochrome P450 81E8 was observed for 2,4-D resistance in both the CHR and NEB populations, and phylogenetic analysis indicated a common evolutionary origin of this R allele in the two populations.


2020 ◽  
Vol 12 (8) ◽  
pp. 1258-1276 ◽  
Author(s):  
Gema Alama-Bermejo ◽  
Eli Meyer ◽  
Stephen D Atkinson ◽  
Astrid S Holzer ◽  
Monika M Wiśniewska ◽  
...  

Abstract Ceratonova shasta is an important myxozoan pathogen affecting the health of salmonid fishes in the Pacific Northwest of North America. Ceratonova shasta exists as a complex of host-specific genotypes, some with low to moderate virulence, and one that causes a profound, lethal infection in susceptible hosts. High throughput sequencing methods are powerful tools for discovering the genetic basis of these host/virulence differences, but deep sequencing of myxozoans has been challenging due to extremely fast molecular evolution of this group, yielding strongly divergent sequences that are difficult to identify, and unavoidable host contamination. We designed and optimized different bioinformatic pipelines to address these challenges. We obtained a unique set of comprehensive, host-free myxozoan RNA-seq data from C. shasta genotypes of varying virulence from different salmonid hosts. Analyses of transcriptome-wide genetic distances and maximum likelihood multigene phylogenies elucidated the evolutionary relationship between lineages and demonstrated the limited resolution of the established Internal Transcribed Spacer marker for C. shasta genotype identification, as this marker fails to differentiate between biologically distinct genotype II lineages from coho salmon and rainbow trout. We further analyzed the data sets based on polymorphisms in two gene groups related to virulence: cell migration and proteolytic enzymes including their inhibitors. The developed single-nucleotide polymorphism-calling pipeline identified polymorphisms between genotypes and demonstrated that variations in both motility and protease genes were associated with different levels of virulence of C. shasta in its salmonid hosts. The prospective use of proteolytic enzymes as promising candidates for targeted interventions against myxozoans in aquaculture is discussed. 
We developed host-free transcriptomes of a myxozoan model organism from strains that exhibited different degrees of virulence, as a unique source of data that will foster functional gene analyses and serve as a base for the development of potential therapeutics for efficient control of these parasites.


2020 ◽  
Vol 9 (2) ◽  
pp. 465 ◽  
Author(s):  
Jalil Kardan-Yamchi ◽  
Hossein Kazemian ◽  
Simone Battaglia ◽  
Hamidreza Abtahi ◽  
Abbas Rahimi Foroushani ◽  
...  

Accurate and timely detection of drug resistance can minimize the risk of further resistance development and lead to effective treatment. The aim of this study was to determine the resistance to first/second-line anti-tuberculosis drugs in rifampicin/multidrug-resistant Mycobacterium tuberculosis (RR/MDR-MTB) isolates. Molecular epidemiology of strains was determined using whole genome sequencing (WGS)-based genotyping. A total of 35 RR/MDR-MTB isolates were subjected to drug susceptibility testing against first/second-line drugs using Middlebrook 7H9 broth in the broth microdilution method. Illumina technology was used for paired-end WGS, applying a Maxwell 16 Cell DNA Purification kit and the NextSeq platform. Data analysis and single nucleotide polymorphism calling were performed using the MTBseq pipeline. The genome-based resistance to each drug among the resistant phenotypes was as follows: rifampicin (97.1%), isoniazid (96.6%), ethambutol (100%), levofloxacin (83.3%), moxifloxacin (83.3%), amikacin (100%), kanamycin (100%), capreomycin (100%), prothionamide (100%), D-cycloserine (11.1%), clofazimine (20%), bedaquiline (0.0%), and delamanid (44.4%). There was no linezolid-resistant phenotype, and a bedaquiline-resistant strain was wild type for related genes. Beijing, Euro-American, and Delhi-CAS were the most common lineages/sublineages. Drug resistance-associated mutations were mostly linked to minimum inhibitory concentration results. However, the role of well-known drug-resistant genes for D-cycloserine, clofazimine, bedaquiline, and delamanid was found to be more controversial.


GigaScience ◽  
2020 ◽  
Vol 9 (2) ◽  
Author(s):  
Stephen J Bush ◽  
Dona Foster ◽  
David W Eyre ◽  
Emily L Clark ◽  
Nicola De Maio ◽  
...  

Abstract Background Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella. Results We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis. Conclusions The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.
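The two metrics this abstract reports, sensitivity and precision, can be illustrated with a minimal sketch that compares a set of SNP calls against a truth set. This is not the study's evaluation code: the function name `evaluate_snp_calls`, the tuple representation of a SNP, and the toy coordinates are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation of one SNP:
# (contig, 1-based position, reference allele, alternate allele)
Snp = Tuple[str, int, str, str]


def evaluate_snp_calls(called: Iterable[Snp], truth: Iterable[Snp]) -> Dict[str, float]:
    """Compare called SNPs with a truth set by exact tuple match."""
    called_set = set(called)
    truth_set = set(truth)

    tp = len(called_set & truth_set)   # called and true
    fp = len(called_set - truth_set)   # called but not true
    fn = len(truth_set - called_set)   # true but missed

    # Guard against empty sets to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Toy data (invented for illustration only).
truth = [("chr1", 101, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr1", 400, "G", "A"), ("chr1", 777, "T", "C")]
called = [("chr1", 101, "A", "G"), ("chr1", 250, "C", "T"),
          ("chr1", 400, "G", "A"), ("chr1", 900, "A", "T")]

metrics = evaluate_snp_calls(called, truth)
print(metrics)
```

Exact tuple matching is the simplest possible comparison; a real benchmark would also need to normalize variant representation before comparing, which this sketch deliberately leaves out.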


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Gianpiero Marconi ◽  
Stefano Capomaccio ◽  
Cinzia Comino ◽  
Alberto Acquadro ◽  
Ezio Portis ◽  
...  

Abstract Methods for investigating DNA methylation nowadays either require a reference genome and high coverage, or investigate only CG methylation. Moreover, no large-scale analysis can be performed for N6-methyladenosine (6mA) at an affordable price. Here we describe the methylation content sensitive enzyme double-digest restriction-site-associated DNA (ddRAD) technique (MCSeEd), a reduced-representation, reference-free, cost-effective approach for characterizing whole genome methylation patterns across different methylation contexts (e.g., CG, CHG, CHH, 6mA). MCSeEd can also detect genetic variations among hundreds of samples. MCSeEd is based on parallel restrictions carried out by combinations of methylation insensitive and sensitive endonucleases, followed by next-generation sequencing. Moreover, we present a robust bioinformatic pipeline (available at https://bitbucket.org/capemaster/mcseed/src/master/) for differential methylation analysis combined with single nucleotide polymorphism calling without or with a reference genome.


2019 ◽  
Author(s):  
Gianpiero Marconi ◽  
Stefano Capomaccio ◽  
Cinzia Comino ◽  
Alberto Acquadro ◽  
Ezio Portis ◽  
...  

Abstract Methods for investigating DNA methylation nowadays either require a reference genome and high coverage, or investigate only CG methylation. Moreover, no large-scale analysis can be performed for N6-methyladenosine (6mA). Here we describe the methylation content sensitive enzyme double-digest restriction-site-associated DNA (ddRAD) technique (MCSeEd), a reduced-representation, reference-free, cost-effective approach for characterizing whole genome methylation patterns across different methylation contexts (e.g., CG, CHG, CHH, 6mA). MCSeEd can also detect genetic variations among hundreds of samples. MCSeEd is based on parallel restrictions carried out by combinations of methylation insensitive and sensitive endonucleases, followed by next-generation sequencing. Moreover, we present a robust bioinformatic pipeline (available at https://bitbucket.org/capemaster/mcseed/src/master/) for differential methylation analysis combined with single nucleotide polymorphism calling without or with a reference genome.


Author(s):  
Lailan Sahrina Hasibuan ◽  
Sita Nabila ◽  
Nurul Hudachair ◽  
Muhammad Abrar Istiadi

Data in molecular biology have grown rapidly since the introduction in 2000 of Next-Generation Sequencing (NGS), the latest technology used to sequence DNA with high throughput. A single nucleotide polymorphism (SNP) is a DNA-based marker that can be used to identify an organism specifically. SNPs are usually exploited to optimize parent selection when producing high-quality seed in plant breeding. This paper discusses SNP calling from NGS data of cultivated soybean (Glycine max [L.] Merr.) using C5.0, an improved version of the rule-based algorithm C4.5. The evaluation showed that C5.0 outperforms CART, another rule-based algorithm, on F-measure: the F-measure values for C5.0 and CART are 0.63 and 0.58, respectively. In addition, C5.0 is robust to imbalanced training datasets up to a ratio of 1:17, but it suffers on large training datasets. C5.0’s performance may be improved by applying bagging or another ensemble technique, as CART was improved by applying bagging to the final decision. It is also important to use appropriate features to represent SNP candidates. Based on the information gain of C5.0, this paper recommends error probability, homopolymer left, mismatch alt, and mean nearby qual as features for SNP calling.
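To show why F-measure, rather than plain accuracy, is a sensible yardstick on imbalanced SNP candidate sets, here is a minimal sketch of both metrics. The helper names `f_measure` and `accuracy` and all counts are invented for illustration and are not taken from the paper; the 1:17 toy ratio simply mirrors the imbalance level the abstract mentions.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts (beta=1 gives the usual F1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all candidates that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Toy 1:17 imbalance: one true SNP for every 17 non-SNP candidates,
# scored for a degenerate classifier that never calls a SNP.
always_negative = {"tp": 0, "fp": 0, "fn": 1, "tn": 17}
print("accuracy :", accuracy(**always_negative))   # high despite finding nothing
print("F-measure:", f_measure(tp=0, fp=0, fn=1))   # exposes the failure

# A second invented example with non-trivial precision and recall.
print("F-measure:", f_measure(tp=50, fp=25, fn=50))
```

The degenerate classifier scores well on accuracy because true negatives dominate, while its F-measure is zero; this is the usual reason to report F-measure for rare-positive problems such as SNP calling.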

