The In Silico Genotyper (ISG): an open-source pipeline to rapidly identify and annotate nucleotide variants for comparative genomics applications

10.1101/015578 ◽

2015 ◽

Cited By ~ 20

Author(s):

Jason W Sahl ◽

Stephen M Beckstrom-Sternberg ◽

James Babic-Sternberg ◽

John D Gillece ◽

Crystal M Hepp ◽

...

Keyword(s):

Comparative Genomics ◽

Open Source ◽

In Silico ◽

Sequence Data ◽

Source Code ◽

Whole Genome Sequence ◽

Nucleotide Polymorphisms ◽

Bacterial Genomes ◽

Single Nucleotide ◽

General Public License

The identification and annotation of nucleotide variants, including insertions/deletions and single nucleotide polymorphisms (SNPs), from whole genome sequence data is important for studies of bacterial evolution, comparative genomics, and phylogeography. The In Silico Genotyper (ISG) is a parallel, tested, open source tool that performs these functions and scales well to thousands of bacterial genomes. ISG is written in Java and requires MUMmer (Delcher et al., 2003), BWA (Li and Durbin, 2009), and GATK (McKenna et al., 2010) for full functionality. The source code and compiled binaries are freely available from https://github.com/TGenNorth/ISGPipeline under a GNU General Public License. Benchmark comparisons demonstrate that ISG is faster and more flexible than comparable tools.

Single-Nucleotide Polymorphisms in the Whole-Genome Sequence Data of Shiga Toxin-Producing Escherichia coli O157:H7/H- Strains by Cultivation

Current Microbiology ◽

10.1007/s00284-017-1208-z ◽

2017 ◽

Vol 74 (4) ◽

pp. 425-430 ◽

Cited By ~ 3

Author(s):

Eiji Yokoyama ◽

Shinichiro Hirai ◽

Taichiro Ishige ◽

Satoshi Murakami

Keyword(s):

Escherichia Coli ◽

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

Shiga Toxin ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Escherichia Coli O157 ◽

Nucleotide Polymorphisms ◽

Single Nucleotide

An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

Comparative and Functional Genomics ◽

10.1155/2007/35604 ◽

2007 ◽

Vol 2007 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

B. Jayashree ◽

Manindra S. Hanspal ◽

Rajgopal Srinivasan ◽

R. Vigneshwaran ◽

Rajeev K. Varshney ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Open Source ◽

Open Source Software ◽

Large Scale ◽

Sequence Data ◽

Snp Genotyping ◽

Model Organisms ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Web Interfaces

The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.

qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots

10.1101/005165 ◽

2014 ◽

Cited By ~ 364

Author(s):

Stephen D. Turner

Keyword(s):

Association Studies ◽

Source Code ◽

R Package ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Human Trait ◽

General Public License

Summary: Genome-wide association studies (GWAS) have identified thousands of human trait-associated single nucleotide polymorphisms. Here, I describe a freely available R package for visualizing GWAS results using Q-Q and manhattan plots. The qqman package enables the flexible creation of manhattan plots, both genome-wide and for single chromosomes, with optional highlighting of SNPs of interest. Availability: qqman is released under the GNU General Public License, and is freely available on the Comprehensive R Archive Network (http://cran.r-project.org/package=qqman). The source code is available on GitHub (https://github.com/stephenturner/qqman).
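As a rough illustration of what a Q-Q plot of GWAS results compares, the sketch below pairs each observed p-value with the p-value expected under a uniform null, both on the -log10 scale. It is written in Python rather than R and is not the qqman implementation; the function name `qq_points` and the plotting position `(rank - 0.5) / n` are assumptions chosen for the example.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values for a Q-Q plot.

    Assumption: expected quantiles use the plotting position (rank - 0.5) / n,
    one common convention; qqman itself may use a different one.
    """
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    n = len(p_values)
    observed = sorted(p_values)  # smallest p-value first
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n  # uniform quantile under the null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points
```

Points that rise above the diagonal (observed larger than expected) indicate more small p-values than the null predicts.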

Utilizing Big Data to Identify Tiny Toxic Components: Digitalis

Foods ◽

10.3390/foods10081794 ◽

2021 ◽

Vol 10 (8) ◽

pp. 1794

Author(s):

Elizabeth Sage Hunter ◽

Robert Literman ◽

Sara M. Handy

Keyword(s):

Single Nucleotide Polymorphisms ◽

Dietary Supplements ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Sequence Data ◽

Genus Level

The botanical genus Digitalis is equal parts colorful, toxic, and medicinal, and its bioactive compounds have a long history of therapeutic use. However, with an extremely narrow therapeutic range, even trace amounts of Digitalis can cause adverse effects. Using chemical methods, the United States Food and Drug Administration traced a 1997 case of Digitalis toxicity to a shipment of Plantago (a common ingredient in dietary supplements marketed to improve digestion) contaminated with Digitalis lanata. With increased accessibility to next generation sequencing technology, here we ask whether this case could have been cracked rapidly using shallow genome sequencing strategies (e.g., genome skims). Using a modified implementation of the Site Identification from Short Read Sequences (SISRS) bioinformatics pipeline with whole-genome sequence data, we generated over 2 M genus-level single nucleotide polymorphisms in addition to species-informative single nucleotide polymorphisms. We simulated dietary supplement contamination by spiking low quantities (0–10%) of Digitalis whole-genome sequence data into a background of commonly used ingredients in products marketed for “digestive cleansing” and reliably detected Digitalis at the genus level while also discriminating between Digitalis species. This work serves as a roadmap for the development of novel DNA-based assays to quickly and reliably detect the presence of toxic species such as Digitalis in food products or dietary supplements using genomic methods and highlights the power of harnessing the entire genome to identify botanical species.

Impact of single nucleotide polymorphisms in HBB gene causing haemoglobinopathies: in silico analysis

New Biotechnology ◽

10.1016/j.nbt.2009.01.004 ◽

2009 ◽

Vol 25 (4) ◽

pp. 214-219 ◽

Cited By ~ 10

Author(s):

C. George Priya Doss ◽

Sethumadhavan Rao

Keyword(s):

Single Nucleotide Polymorphisms ◽

In Silico ◽

In Silico Analysis ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Silico Analysis

Whole genome characterization of strains belonging to the Ralstonia solanacearum species complex and in silico analysis of TaqMan assays for detection in this heterogenous species complex

European Journal of Plant Pathology ◽

10.1007/s10658-020-02190-8 ◽

2021 ◽

Author(s):

Viola Kurm ◽

Ilse Houwers ◽

Claudia E. Coipan ◽

Peter Bonants ◽

Cees Waalwijk ◽

...

Keyword(s):

Ralstonia Solanacearum ◽

In Silico ◽

Species Complex ◽

Sequence Data ◽

In Silico Analysis ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequences ◽

Pcr Assays

Identification and classification of members of the Ralstonia solanacearum species complex (RSSC) is challenging due to the heterogeneity of this complex. Whole genome sequence data of 225 strains were used to classify strains based on average nucleotide identity (ANI) and multilocus sequence analysis (MLSA). Based on the ANI score (>95%), 191 out of 192 (99.5%) RSSC strains could be grouped into the three species R. solanacearum, R. pseudosolanacearum, and R. syzygii, and into the four phylotypes within the RSSC (I, II, III, and IV). R. solanacearum phylotype II could be split into two groups (IIA and IIB), of which IIB clustered into three subgroups (IIBa, IIBb and IIBc). This division by ANI was in accordance with MLSA. The IIB subgroups found by ANI and MLSA also differed in the number of SNPs in the primer and probe sites of various assays. An in-silico analysis of eight TaqMan and 11 conventional PCR assays was performed using the whole genome sequences. Based on this analysis, several cases of potential false positives or false negatives can be expected when these assays are used for their intended target organisms. Two TaqMan assays and two PCR assays targeting the 16S rDNA sequence should be able to detect all phylotypes of the RSSC. We conclude that the increasing availability of whole genome sequences is not only useful for classification of strains, but also shows potential for selection and evaluation of clade-specific nucleic acid-based amplification methods within the RSSC.
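The idea of grouping strains by an ANI threshold can be sketched as follows. This is a minimal illustration, assuming pairwise ANI values have already been computed and that groups are formed by single linkage (any pair above the threshold ends up in the same group); the abstract does not state the clustering procedure, and the function name, strain labels, and ANI values are hypothetical.

```python
def group_by_ani(strains, pairwise_ani, threshold=95.0):
    """Group strains so that any pair with ANI above `threshold` shares a group.

    Assumption: single-linkage grouping via union-find; `pairwise_ani` maps
    (strain_a, strain_b) tuples to ANI percentages.
    """
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the group representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), ani in pairwise_ani.items():
        if ani > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical strains and ANI values, for illustration only.
example = group_by_ani(
    ["s1", "s2", "s3", "s4"],
    {("s1", "s2"): 98.7, ("s2", "s3"): 96.1, ("s1", "s4"): 91.0, ("s3", "s4"): 90.2},
)
```

With these made-up values, `s1`, `s2`, and `s3` fall into one group and `s4` stands alone.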

In Silico Identification of Novel Acute Myeloid Leukemia Associated Missense SNPs in Human CEBPA Gene

Abasyn Journal Life Sciences ◽

10.34091/ajls.3.2.2 ◽

2020 ◽

pp. 10-24

Keyword(s):

Acute Myeloid Leukemia ◽

In Silico ◽

Myeloid Leukemia ◽

Gene Interaction ◽

Nucleotide Polymorphisms ◽

Future Perspectives ◽

Single Nucleotide ◽

In Silico Identification ◽

And Function ◽

Acute Myeloid

Single nucleotide polymorphisms (SNPs) in the CEBPA gene have been found to be associated with cancer, especially Acute Myeloid Leukemia (AML). Therefore, the identification of functional and structural polymorphisms in CEBPA is important for studying and discovering therapeutic targets and potential malfunctioning. For this purpose, several bioinformatics tools were used to identify disease-associated nsSNPs that might be vital for the structure and function of CEBPA. In silico tools used in this study included SIFT, PROVEAN, PolyPhen2, SNP&GO and PhD-SNP, followed by ConSurf and I-Mutant. Protein 3D modelling was carried out using I-TASSER and MODELLER v9.22, while GeneMANIA and STRING were used for the prediction of gene-gene interactions. We found that the L345P, R333C, R339Q, V328G, R327W, L317Q, N292S, E284A, R156W, Y108N and F82L mutations were the most crucial SNPs. Additionally, the gene-gene interaction analysis showed genes correlated with CEBPA through co-expression and their importance in several pathways. In future, these 11 mutations should be investigated when studying diseases related to CEBPA, especially AML. As the first study of its kind, it proposes future perspectives that will help in precision medicine. Animal models are of great significance in determining the effects of CEBPA in disease.

In-Silico and In-vitro Analysis of Human SOS1 Protein Causing Noonan Syndrome – A Novel Approach to Explore the Molecular Pathways

Current Genomics ◽

10.2174/1389202922666211130144221 ◽

2021 ◽

Vol 22 ◽

Author(s):

Vinoth Sigamani ◽

Sheeja Rajasingh ◽

Narasimman Gurusamy ◽

Arunima Panda ◽

Johnson Rajasingh

Keyword(s):

Noonan Syndrome ◽

In Silico ◽

Genetic Disorder ◽

In Vitro Study ◽

Human Skin Fibroblast ◽

Nucleotide Polymorphisms ◽

Gene Expressions ◽

Single Nucleotide ◽

Nucleotide Mutation

Aims: Noonan syndrome (NS) is an autosomal dominant genetic disorder caused by single nucleotide mutations in the PTPN11, SOS1, RAF1, and KRAS genes. Background: We hypothesize that in-silico analysis of human SOS1 mutations would be a promising predictor in identifying the pathogenic effect of NS. Methods: Here, we computationally analyzed the SOS1 gene to identify the pathogenic non-synonymous single nucleotide polymorphisms (nsSNPs) that cause NS. The variant information of SOS1 was collected from the SNP database (dbSNP). The variants were further analyzed with the in-silico tools I-Mutant, iPTREE-STAB, and MutPred to elucidate their structural and functional characteristics. Results: We found that 11 nsSNPs of SOS1 were more likely to be pathogenic and cause NS. The 3D modeling of the wild type and the 11 nsSNPs was performed using I-TASSER and validated via ERRAT and RAMPAGE. SOS1-interacting proteins were analysed through STRING, which showed that SOS1 interacted with the cardiac proteins GATA4, TNNT2, and ACTN2. During these interactions, GRB2 and HRAS act as intermediate molecules between SOS1 and the cardiac proteins. These in-silico analyses were validated using induced cardiomyocytes (iCMCs) derived from NS patients carrying the SOS1 gene variant c.1654A>G (NS-iCMCs) and compared with control human skin fibroblast-derived iCMCs (C-iCMCs). Our in vitro data further confirmed that SOS1, GRB2 and HRAS gene expression, as well as activated ERK protein, were significantly decreased in NS-iCMCs compared to C-iCMCs. Conclusion: This is the first in-silico and in vitro study demonstrating that 11 nsSNPs of SOS1 play a deleterious pathogenic role in causing NS.

148 Multiple Dysregulated Novel Pathways and Genes in Aleutian Mink Disease Revealed by Selection Signatures and Gene Network Analyses Using Whole-genome Sequence Data

Journal of Animal Science ◽

10.1093/jas/skab235.137 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 76-76

Author(s):

Seyed Milad Vahedi ◽

Karim Karimi ◽

Siavash Salek Ardestani ◽

Younes Miar

Keyword(s):

Sequence Data ◽

American Mink ◽

Enrichment Analysis ◽

Whole Genome Sequence ◽

Fixation Index ◽

Pathway Enrichment Analysis ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Network Analyses ◽

Genome Level

Aleutian disease (AD) is a chronic persistent infection in domestic mink caused by Aleutian mink disease virus (AMDV). Reduced female fertility and pelt quality are the main reasons for AD’s negative economic impact on the mink industry. A total of 79 American mink from the Canadian Center for Fur Animal Research at Dalhousie University (Truro, NS, Canada) were classified based on the results of counter immunoelectrophoresis (CIEP) tests into two groups, positive (n = 48) and negative (n = 31). Whole-genome sequences comprising 4,176 scaffolds and 8,039,737 single nucleotide polymorphisms (SNPs) were used to trace the selection footprints for response to AMDV infection at the genome level. Window-based fixation index (Fst) and nucleotide diversity (θπ) statistics were estimated to compare the genomes of positive and negative animals. The genomic windows in the top 1% of both statistics were considered potential regions under selection pressure. A total of 98 genomic regions harboring 33 candidate genes were detected as selective signals. Most of the identified genes were involved in the development and functions of the immune system (PPP3CA, SMAP2, TNFRSF21, SKIL, and AKIRIN2), musculoskeletal system (COL9A2, PPP1R9A, ANK2, AKAP9, and STRIT1), nervous system (ASCL1, ZFP69B, SLC25A27, MCF2, and SLC7A14), reproductive system (CAMK2D, GJB7, SSMEM1, C6orf163), liver (PAH and DPYD), and lung (SLC35A1). Gene-expression network analysis showed the interactions among 27 identified genes. Moreover, pathway enrichment analysis of the constructed gene network revealed significant oxytocin (KEGG: hsa04921) and GnRH signaling (KEGG: hsa04912) pathways, which are likely to be impaired by AMDV, leading to reduced fecundity in dams. These results provide a perspective on the genetic architecture of response to AD in American mink and novel insight into the pathogenesis of AMDV.
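The step of keeping only windows that rank in the top 1% of both statistics can be sketched as below. This is an illustrative outline under stated assumptions, not the authors' pipeline: per-window values are assumed to be precomputed and aligned by index, larger values are assumed to be more extreme for both statistics (how θπ was contrasted between groups is not given in the abstract), and the function names are hypothetical.

```python
import math

def top_fraction_indices(values, fraction=0.01):
    """Return the set of window indices whose value is in the top `fraction`."""
    if not values:
        return set()
    k = max(1, math.ceil(len(values) * fraction))  # keep at least one window
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(ranked[:k])

def overlapping_top_windows(fst, pi_stat, fraction=0.01):
    """Return sorted indices of windows in the top `fraction` of BOTH statistics.

    Assumption: `fst` and `pi_stat` hold one value per genomic window, in the
    same window order, and higher means a stronger signal for each.
    """
    if len(fst) != len(pi_stat):
        raise ValueError("both statistics must cover the same windows")
    shared = top_fraction_indices(fst, fraction) & top_fraction_indices(pi_stat, fraction)
    return sorted(shared)
```

Requiring agreement between two statistics is a conservative choice: a window that is extreme in only one of them is dropped.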

Characterization and risk association of polymorphisms in Aurora kinases A, B and C with genetic susceptibility to gastric cancer development

BMC Cancer ◽

10.1186/s12885-019-6133-z ◽

2019 ◽

Vol 19 (1) ◽

Cited By ~ 1

Author(s):

Aner Mesic ◽

Marija Rogar ◽

Petra Hudler ◽

Nurija Bilalovic ◽

Izet Eminovic ◽

...

Keyword(s):

Gastric Cancer ◽

Transcription Factors ◽

In Silico ◽

In Silico Analysis ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Mitotic Kinases ◽

Genes Encoding ◽

Silico Analysis ◽

Control Study

Background: Single nucleotide polymorphisms (SNPs) in genes encoding mitotic kinases could influence development and progression of gastric cancer (GC). Methods: Case-control study of nine SNPs in mitotic genes was conducted using qPCR. The study included 116 GC patients and 203 controls. In silico analysis was performed to evaluate the effects of polymorphisms on transcription factors binding sites. Results: The AURKA rs1047972 genotypes (CT vs. CC: OR, 1.96; 95% CI, 1.05–3.65; p = 0.033; CC + TT vs. CT: OR, 1.94; 95% CI, 1.04–3.60; p = 0.036) and rs911160 (CC vs. GG: OR, 5.56; 95% CI, 1.24–24.81; p = 0.025; GG + CG vs. CC: OR, 5.26; 95% CI, 1.19–23.22; p = 0.028), were associated with increased GC risk, whereas certain rs8173 genotypes (CG vs. CC: OR, 0.60; 95% CI, 0.36–0.99; p = 0.049; GG vs. CC: OR, 0.38; 95% CI, 0.18–0.79; p = 0.010; CC + CG vs. GG: OR, 0.49; 95% CI, 0.25–0.98; p = 0.043) were protective. Association with increased GC risk was demonstrated for AURKB rs2241909 (GG + AG vs. AA: OR, 1.61; 95% CI, 1.01–2.56; p = 0.041) and rs2289590 (AC vs. AA: OR, 2.41; 95% CI, 1.47–3.98; p = 0.001; CC vs. AA: OR, 6.77; 95% CI, 2.24–20.47; p = 0.001; AA+AC vs. CC: OR, 4.23; 95% CI, 1.44–12.40; p = 0.009). Furthermore, AURKC rs11084490 (GG + CG vs. CC: OR, 1.71; 95% CI, 1.04–2.81; p = 0.033) was associated with increased GC risk. A combined analysis of five SNPs, associated with an increased GC risk, detected polymorphism profiles where all the combinations contribute to the higher GC risk, with an OR increased 1.51-fold for the rs1047972(CT)/rs11084490(CG + GG) to 2.29-fold for the rs1047972(CT)/rs911160(CC) combinations. In silico analysis for rs911160 and rs2289590 demonstrated that different transcription factors preferentially bind to polymorphic sites, indicating that AURKA and AURKB could be regulated differently depending on the presence of particular allele. Conclusions: Our results revealed that AURKA (rs1047972 and rs911160), AURKB (rs2241909 and rs2289590) and AURKC (rs11084490) are associated with a higher risk of GC susceptibility. Our findings also showed that the combined effect of these SNPs may influence GC risk, thus indicating the significance of assessing multiple polymorphisms, jointly. The study was conducted on a less numerous but ethnically homogeneous Bosnian population, therefore further investigations in larger and multiethnic groups and the assessment of functional impact of the results are needed to strengthen the findings.
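For readers unfamiliar with how an odds ratio (OR) and its 95% confidence interval are obtained in a case-control comparison, the sketch below computes both from a 2×2 table of genotype counts using the standard log-odds (Woolf) approximation. It is an illustration only: the abstract does not report genotype counts or state how its intervals were estimated (a regression model may have been used), and the function name and parameter names are assumptions.

```python
import math

def odds_ratio_ci(case_exposed, case_unexposed, control_exposed, control_unexposed,
                  z=1.959964):
    """Return (odds_ratio, ci_lower, ci_upper) for a 2x2 case-control table.

    Assumption: Woolf's method, i.e. a normal approximation on log(OR) with
    standard error sqrt(1/a + 1/b + 1/c + 1/d); z = 1.959964 gives a 95% interval.
    """
    cells = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level; here "exposed" would mean carrying the genotype of interest and "unexposed" the reference genotype.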
