A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data

2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Yu Bai ◽  
Yuki Iwasaki ◽  
Shigehiko Kanaya ◽  
Yue Zhao ◽  
Toshimichi Ikemura

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on one map. By modifying the conventional SOM, we previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, depending solely on oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a “genome signature,” and the regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
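The input to an oligonucleotide map of this kind is one composition vector per sequence fragment. The following is a minimal sketch of how such vectors could be computed, not the authors' implementation: the function names `kmer_frequencies` and `windows` are hypothetical, windows containing ambiguous bases are simply skipped, and no merging of complementary oligonucleotides is attempted (all assumptions made for illustration).

```python
from itertools import product


def kmer_frequencies(seq, k=5):
    """Return normalized k-mer frequencies for one fragment (k=5: pentanucleotides).

    Assumption: k-mers containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    # Vector order follows the lexicographic order of the k-mers (AAAAA, AAAAC, ...).
    return [counts[km] / total for km in kmers]


def windows(genome, size=100_000):
    """Yield consecutive non-overlapping fragments (100 kb by default)."""
    for start in range(0, len(genome) - size + 1, size):
        yield genome[start:start + size]
```

Each 100 kb fragment then becomes a 1024-dimensional vector (4^5 pentanucleotides), for example `[kmer_frequencies(w) for w in windows(genome)]`, which is the kind of high-dimensional data a SOM places on a single map.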

2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Akihito Kikuchi ◽  
Toshimichi Ikemura ◽  
Takashi Abe

With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype, depending solely on oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM needs large computational resources. We have developed Self-Compressing BLSOM (SC-BLSOM) to reduce computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to construct BLSOMs hierarchically according to data class, such as phylotype. The first-layer BLSOMs were each constructed from one of the divided input data pieces, each representing a data subclass such as a phylotype division, which compresses the number of data pieces. The second-layer BLSOM was constructed from all of the weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and clustered the sequences according to phylotype with high accuracy, showing the method’s suitability for efficient knowledge discovery from big sequence data.
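The two-layer strategy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published SC-BLSOM: `batch_som` is a generic one-dimensional batch-update SOM with random initialization and a linearly shrinking Gaussian neighborhood, and both function names and all parameter defaults are hypothetical.

```python
import math
import random


def batch_som(data, n_nodes, epochs=20, seed=0):
    """Generic 1-D batch-learning SOM (illustrative); returns node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Neighborhood radius shrinks linearly, with a floor of 0.5 nodes.
        radius = max(0.5, (n_nodes / 2) * (1 - epoch / epochs))
        # Batch step: assign every input to its best-matching node first.
        bmus = [min(range(n_nodes), key=lambda j: math.dist(x, weights[j]))
                for x in data]
        new_weights = []
        for j in range(n_nodes):
            num = [0.0] * dim
            den = 0.0
            for x, b in zip(data, bmus):
                h = math.exp(-((j - b) ** 2) / (2 * radius ** 2))
                den += h
                for d in range(dim):
                    num[d] += h * x[d]
            # Keep the old weight if every neighborhood term underflowed to zero.
            new_weights.append([v / den for v in num] if den > 0 else weights[j])
        weights = new_weights
    return weights


def sc_blsom(data_by_class, first_nodes=4, second_nodes=6):
    """Layer 1: one small map per data class. Layer 2: a map of the pooled weights."""
    pooled = []
    for vectors in data_by_class.values():
        pooled.extend(batch_som(vectors, first_nodes))
    return batch_som(pooled, second_nodes)
```

The saving comes from the second layer seeing only `first_nodes` weight vectors per class instead of every input vector, which is the compression of the number of data pieces that the abstract refers to.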


2021 ◽  
Vol 20 (7) ◽  
pp. 911-927
Author(s):  
Lucia Muggia ◽  
Yu Quan ◽  
Cécile Gueidan ◽  
Abdullah M. S. Al-Hatmi ◽  
Martin Grube ◽  
...  

Abstract: Lichen thalli provide a long-lived and stable habitat for colonization by a wide range of microorganisms. Increased interest in these lichen-associated microbial communities has revealed an impressive diversity of fungi, including several novel lineages which still await formal taxonomic recognition. Among these, members of the Eurotiomycetes and Dothideomycetes usually occur asymptomatically in the lichen thalli, even if they share ancestry with fungi that may be parasitic on their host. Mycelia of the isolates are characterized by melanized cell walls and the fungi display exclusively asexual propagation. Their taxonomic placement requires, therefore, the use of DNA sequence data. Here, we consider recently published sequence data from lichen-associated fungi and characterize and formally describe two new, individually monophyletic lineages at family, genus, and species levels. The Pleostigmataceae fam. nov. and Melanina gen. nov. both comprise rock-inhabiting fungi that associate with epilithic, crust-forming lichens in subalpine habitats. The phylogenetic placement and the monophyly of Pleostigmataceae lack statistical support, but the family was resolved as sister to the order Verrucariales. This family comprises the species Pleostigma alpinum sp. nov., P. frigidum sp. nov., P. jungermannicola, and P. lichenophilum sp. nov. The placement of the genus Melanina is supported as a lineage within the Chaetothyriales. To date, this genus comprises the single species M. gunde-cimermaniae sp. nov. and forms a sister group to a large lineage including Herpotrichiellaceae, Chaetothyriaceae, Cyphellophoraceae, and Trichomeriaceae. The new phylogenetic analysis of the subclass Chaetothyriomycetidae provides new insight into genus and family level delimitation and classification of this ecologically diverse group of fungi.


2022 ◽  
pp. 096703352110618
Author(s):  
Orlando CH Tavares ◽  
Tiago R Tavares ◽  
Carlos R Pinheiro Junior ◽  
Luciélio M da Silva ◽  
Paulo GS Wadt ◽  
...  

The southwestern region of the Amazon has great environmental variability and a great complexity of pedoenvironments, owing both to its rich variety of geological and geomorphological environments and to its being a transition region with two other Brazilian biomes. In this study, the use of pedometric tools (the Algorithms for Quantitative Pedology (AQP) R package and diffuse reflectance spectroscopy) was evaluated for the characterization of 15 soil profiles in southwestern Amazon. The AQP statistical package, which evaluates the soil in depth based on slicing functions, indicated a wide range of variation in soil attributes, especially in the superficial horizons. In addition, the results obtained in the similarity analysis corroborated the in-depth description of physical and chemical components and oxide contents, aiding the classification of soil profiles. The in-depth characterization of visible-near infrared spectra allowed inference of the pedogenetic processes of some profiles, setting precedents for future work aiming to establish analytical strategies for soil classification in southwestern Amazon based on spectral data.


mSphere ◽  
2018 ◽  
Vol 3 (2) ◽  
Author(s):  
Xuan Qin ◽  
Chuan Zhou ◽  
Danielle M. Zerr ◽  
Amanda Adler ◽  
Amin Addetia ◽  
...  

ABSTRACT: Clinical isolates of Pseudomonas aeruginosa from patients with cystic fibrosis (CF) are known to differ from those associated with non-CF hosts by colony morphology, drug susceptibility patterns, and genomic hypermutability. Pseudomonas aeruginosa isolates from CF patients have long been recognized for their overall reduced rate of antimicrobial susceptibility, but their intraclonal MIC heterogeneity has long been overlooked. Using two distinct cohorts of clinical strains (n = 224 from 56 CF patients, n = 130 from 68 non-CF patients) isolated in 2013, we demonstrated profound Etest MIC heterogeneity in CF P. aeruginosa isolates in comparison to non-CF P. aeruginosa isolates. On the basis of whole-genome sequencing of 19 CF P. aeruginosa isolates from 9 patients with heterogeneous MICs, the core genome phylogenetic tree confirmed the within-patient CF P. aeruginosa clonal lineage along with considerable coding sequence variability. No extrachromosomal DNA elements or previously characterized antibiotic resistance mutations could account for the wide divergence in antimicrobial MICs between P. aeruginosa coisolates, though many heterogeneous mutations in efflux and porin genes and their regulators were present. A unique OprD sequence was conserved among the majority of isolates of CF P. aeruginosa analyzed, suggesting a pseudomonal response to selective pressure that is common to the isolates. Genomic sequence data also suggested that CF pseudomonal hypermutability was not entirely due to mutations in mutL, mutS, and uvr. We conclude that the net effect of hundreds of adaptive mutations, both shared between clonally related isolate pairs and unshared, accounts for their highly heterogeneous MIC variances. We hypothesize that this heterogeneity is indicative of the pseudomonal syntrophic-like lifestyle under conditions of being “locked” inside a host focal airway environment for prolonged periods.

IMPORTANCE: Patients with cystic fibrosis endure “chronic focal infections” with a variety of microorganisms. One microorganism, Pseudomonas aeruginosa, adapts to the host and develops resistance to a wide range of antimicrobials. Interestingly, as the infection progresses, multiple isogenic strains of P. aeruginosa emerge and coexist within the airways of these patients. Despite a common parental origin, the multiple strains of P. aeruginosa develop vastly different susceptibility patterns to actively used antimicrobial agents, a phenomenon we define as “heterogeneous MICs.” By sequencing pairs of P. aeruginosa isolates displaying heterogeneous MICs, we observed widespread isogenic gene lesions in drug transporters, DNA mismatch repair machinery, and many other structural or cellular functions. Coupled with the heterogeneous MICs, these genetic lesions demonstrated a symbiotic response to host selection and suggested evolution of a multicellular syntrophic bacterial lifestyle. Current laboratory standard interpretive criteria do not address the emergence of heterogeneous growth and susceptibilities in vitro with treatment implications.


2017 ◽  
Vol 107 (5) ◽  
pp. 519-527 ◽  
Author(s):  
Paul A. Langlois ◽  
Jacob Snelling ◽  
John P. Hamilton ◽  
Claude Bragard ◽  
Ralf Koebnik ◽  
...  

Prevalence of Xanthomonas translucens, which causes cereal leaf streak (CLS) in cereal crops and bacterial wilt in forage and turfgrass species, has increased in many regions in recent years. Because the pathogen is seedborne in economically important cereals, it is a concern for international and interstate germplasm exchange and, thus, reliable and robust protocols for its detection in seed are needed. However, historical confusion surrounding the taxonomy within the species has complicated the development of accurate and reliable diagnostic tools for X. translucens. Therefore, we sequenced genomes of 15 X. translucens strains representing six different pathovars and compared them with additional publicly available X. translucens genome sequences to obtain a genome-based phylogeny for robust classification of this species. Our results reveal three main clusters: one consisting of pv. cerealis, one consisting of pvs. undulosa and translucens, and a third consisting of pvs. arrhenatheri, graminis, phlei, and poae. Based on genomic differences, diagnostic loop-mediated isothermal amplification (LAMP) primers were developed that clearly distinguish strains that cause disease on cereals, such as pvs. undulosa, translucens, hordei, and secalis, from strains that cause disease on noncereal hosts, such as pvs. arrhenatheri, cerealis, graminis, phlei, and poae. Additional LAMP assays were developed that selectively amplify strains belonging to pvs. cerealis and poae, distinguishing them from other pathovars. These primers will be instrumental in diagnostics when implementing quarantine regulations to limit further geographic spread of X. translucens pathovars.


2009 ◽  
Vol 21 (3) ◽  
pp. 306-320 ◽  
Author(s):  
E. Scott Weber ◽  
Thomas B. Waltzek ◽  
Devon A. Young ◽  
Erica L. Twitchell ◽  
Amy E. Gates ◽  
...  

Iridoviruses infect food and ornamental fish species from a wide range of freshwater to marine habitats across the globe. The objective of the current study was to characterize an iridovirus causing systemic infection of wild-caught Pterapogon kauderni Koumans 1933 (Banggai cardinalfish). Freshly frozen and fixed specimens were processed for histopathologic evaluation, transmission electron microscopic examination, virus culture, molecular virologic testing, microbiology, and in situ hybridization (ISH) using riboprobes. Basophilic granular cytoplasmic inclusions were identified in cytomegalic cells often found beneath endothelium, and hexagonal virus particles typical of iridovirus were identified in the cytoplasm of enlarged cells by transmission electron microscopy. Attempts at virus isolation in cell culture were unsuccessful; however, polymerase chain reaction (PCR)-based molecular testing resulted in amplification and sequencing of regions of the DNA polymerase and major capsid protein genes, along with the full-length ATPase gene of the putative iridovirus. Virus gene sequences were then used to infer phylogenetic relationships of the P. kauderni agent to other known systemic iridoviruses from fishes. Riboprobes, which were transcribed from a cloned PCR amplification product from the viral genome, generated hybridization signals from inclusions within cytomegalic cells in histologic sections tested in ISH experiments. To the authors' knowledge, this is the first report of a systemic iridovirus from P. kauderni. The pathologic changes induced and the genomic sequence data confirm placement of the Banggai cardinalfish iridovirus in the genus Megalocytivirus, family Iridoviridae. The ISH provides an additional molecular diagnostic technique for confirmation of presumptive infections detected in histologic sections from infected fish.


Author(s):  
Gurjit S. Randhawa ◽  
Maximillian P.M. Soltysiak ◽  
Hadi El Roz ◽  
Camila P.E. de Souza ◽  
Kathleen A. Hill ◽  
...  

Abstract: As of February 20, 2020, the 2019 novel coronavirus (renamed to COVID-19) spread to 30 countries with 2130 deaths and more than 75500 confirmed cases. COVID-19 is being compared to the infamous SARS coronavirus, which resulted, between November 2002 and July 2003, in 8098 confirmed cases worldwide with a 9.6% death rate and 774 deaths. Though COVID-19 has a death rate of 2.8% as of 20 February, the 75752 confirmed cases in a few weeks (December 8, 2019 to February 20, 2020) are alarming, with cases likely being under-reported given the comparatively longer incubation period. Such outbreaks demand elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. This paper identifies an intrinsic COVID-19 genomic signature and uses it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 genomes. The proposed method combines supervised machine learning with digital signal processing for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman’s rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp. Our results support a hypothesis of a bat origin and classify COVID-19 as Sarbecovirus, within Betacoronavirus. Our method achieves high levels of classification accuracy and discovers the most relevant relationships among over 5,000 viral genomes within a few minutes, ab initio, using raw DNA sequence data alone, and without any specialized biological knowledge, training, gene or genome annotations. This suggests that, for novel viral and pathogen genome sequences, this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
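The two death rates quoted in the abstract follow directly from the case and death counts it gives. A short check, using only those figures (the helper name `case_fatality_rate` is hypothetical):

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Deaths as a percentage of confirmed cases."""
    return 100.0 * deaths / confirmed_cases


# Counts taken from the abstract above.
sars = case_fatality_rate(774, 8098)     # SARS, November 2002 to July 2003
covid = case_fatality_rate(2130, 75752)  # COVID-19, as of February 20, 2020

print(f"SARS: {sars:.1f}%")       # SARS: 9.6%
print(f"COVID-19: {covid:.1f}%")  # COVID-19: 2.8%
```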


Author(s):  
Maodong Zhang ◽  
Yanyun Huang ◽  
Dale L. Godson ◽  
Champika Fernando ◽  
Trevor W. Alexander ◽  
...  

Abstract: High-throughput sequencing is currently revolutionizing the genomics field and providing new approaches to the detection and characterization of microorganisms. The objective of this study was to assess the detection of influenza D virus (IDV) in bovine respiratory tract samples using two sequencing platforms (MiSeq and Nanopore (GridION)), and species-specific qPCR. An IDV-specific qPCR was performed on 232 samples (116 nasal swabs and 116 tracheal washes) that had been previously subject to virome sequencing using MiSeq. Nanopore sequencing was performed on 19 samples positive for IDV by either MiSeq or qPCR. Nanopore sequence data was analyzed by two bioinformatics methods: What’s In My Pot (WIMP, on the EPI2ME platform), and an in-house developed analysis pipeline. The agreement of IDV detection between qPCR and MiSeq was 82.3%, between qPCR and Nanopore was 57.9% (in-house) and 84.2% (WIMP), and between MiSeq and Nanopore was 89.5% (in-house) and 73.7% (WIMP). IDV was detected by MiSeq in 14 of 17 IDV qPCR-positive samples with Cq (cycle quantification) values below 31, despite multiplexing 50 samples for sequencing. When qPCR was regarded as the gold standard, the sensitivity and specificity of MiSeq sequence detection were 28.3% and 98.9%, respectively. We conclude that both MiSeq and Nanopore sequencing are capable of detecting IDV in clinical specimens with a range of Cq values. Sensitivity may be further improved by optimizing sequence data analysis, improving virus enrichment, or reducing the degree of multiplexing.
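Sensitivity, specificity, and agreement against a gold standard are usually derived from a 2×2 table of test results. The sketch below shows the standard definitions; the abstract does not report the underlying counts, so the numbers in the example are toy values chosen for illustration only, and the function name `diagnostic_metrics` is hypothetical. The abstract also does not say how its agreement figures were computed; plain percent agreement is assumed here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Metrics for a test compared with a gold standard (here, qPCR).

    tp/fn: gold-standard positives that the test detected / missed.
    tn/fp: gold-standard negatives that the test called negative / positive.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "agreement": (tp + tn) / total,  # assumed: simple percent agreement
    }


# Toy counts, NOT the study's data.
example = diagnostic_metrics(tp=3, fp=1, fn=7, tn=89)
```

With these toy counts the test misses most gold-standard positives (low sensitivity) yet still agrees with the gold standard on most samples, because negatives dominate the sample set.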

