Oligonucleotide Composition: Latest Research Papers

Time-series trend of pandemic SARS-CoV-2 variants visualized using batch-learning self-organizing map for oligonucleotide compositions

DOI: 10.1101/2021.04.15.439956 (2021)

Author(s): Takashi Abe, Ryuki Furukawa, Yuki Iwasaki, Toshimichi Ikemura

Keyword(s): Time Series, New Technologies, Sequence Data, Geographic Region, Time Dependent, Self Organizing Map, Population Frequency, Batch Learning, Oligonucleotide Composition, Self Organizing

To confront the global threat of coronavirus disease 2019 (COVID-19), a massive number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequences have been decoded, and the results have been promptly released through the GISAID database. Based on variant types, eight clades have already been defined in GISAID, but the actual diversity may be far greater. Owing to the explosive increase in available sequences, it is important to develop new technologies that can easily grasp the whole picture of these big-sequence data and support efficient knowledge discovery. The ability to efficiently clarify detailed time-series changes in genome-wide mutation patterns will enable us to promptly identify and characterize dangerous variants whose population frequency increases rapidly. Here, we collectively analyzed over 150,000 SARS-CoV-2 genomes to understand their overall features and time-dependent changes using a batch-learning self-organizing map (BLSOM), an unsupervised machine learning method, applied to oligonucleotide composition. BLSOM can separate the clades defined by GISAID with high precision, and it subdivides each clade into clusters that show differential increase/decrease patterns depending on geographic region and time. This allowed us to identify the prevalent strains in each region and to show the commonality and diversity among them. Comprehensive characterization of the oligonucleotide composition of SARS-CoV-2, together with elucidation of time-series trends in the population frequency of variants, can clarify the viral adaptation processes after invasion into the human population and the time-dependent trends of prevalent epidemic strains across regions such as continents.
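The input to BLSOM here is each genome's oligonucleotide composition. A minimal sketch of that representation follows, assuming tetranucleotides (k = 4) counted as simple relative frequencies; the k value, the skipping of ambiguous bases, and the function names `oligo_composition` and `composition_vector` are illustrative assumptions, not the settings used in the paper.

```python
from __future__ import annotations

from itertools import product


def oligo_composition(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every k-mer over the alphabet A/C/G/T.

    Windows that contain any other symbol (N, gaps, IUPAC codes) are
    skipped, so the returned frequencies sum to 1 over the valid windows.
    """
    seq = seq.upper()
    # Fixed vocabulary of all 4**k words: every genome maps to a vector
    # of the same length with the same coordinate order.
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return {w: 0.0 for w in counts}
    return {w: c / total for w, c in counts.items()}


def composition_vector(seq: str, k: int = 4) -> list[float]:
    """Composition as a plain list in lexicographic k-mer order."""
    comp = oligo_composition(seq, k)
    return [comp[w] for w in sorted(comp)]
```

For example, `oligo_composition("ACGTACGTNACGT", k=2)["AC"]` is 0.3: the two windows touching `N` are dropped, leaving 10 valid dinucleotides, 3 of which are `AC`. Fixing the vocabulary up front is what makes vectors from genomes of different lengths directly comparable.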

Unsupervised explainable artificial intelligence for molecular evolutionary studies of over forty thousand SARS-CoV-2 genomes

DOI: 10.21203/rs.3.rs-106139/v1 (2020)

Author(s): Toshimichi Ikemura, Kennosuke Wada, Yoshiko Wada, Yuki Iwasaki, Takashi Abe

Keyword(s): Public Health, Artificial Intelligence, Big Data, Knowledge Discovery, Prior Knowledge, Genome Sequence, Evolutionary Studies, Explainable Artificial Intelligence, Explainable AI, Oligonucleotide Composition

Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge, and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate the various aspects of its genome sequence changes. We previously established an unsupervised AI, BLSOM (batch-learning SOM), which can analyze five million genomic sequences simultaneously. The present study applied BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes. Although only the oligonucleotide composition was given as input, the resulting clusters of genomes corresponded primarily to the known main clades and to internal divisions within those clades. Because BLSOM is an explainable AI, it reveals which features of the oligonucleotide composition are responsible for the clade clustering. BLSOM also has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes.
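To make "batch-learning SOM" concrete, here is a small sketch of a generic batch self-organizing map (Kohonen's batch update: find all winners with the weights frozen, then move every unit to the neighbourhood-weighted mean of the data). It is not the authors' BLSOM implementation. The function names `train_batch_som` and `best_matching_unit`, the Gaussian neighbourhood, the linear radius schedule, and the evenly spaced initialization are all assumptions; the BLSOM papers describe a PCA-based initialization instead.

```python
import math


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_batch_som(data, rows, cols, epochs=20, sigma0=None):
    """Train a rows x cols batch SOM on a list of equal-length vectors.

    Returns the weight vectors; unit j sits at grid position
    (j // cols, j % cols).
    """
    n_units = rows * cols
    dim = len(data[0])
    # Assumption: deterministic initialization from evenly spaced inputs.
    weights = [list(data[j * len(data) // n_units]) for j in range(n_units)]
    coords = [(j // cols, j % cols) for j in range(n_units)]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Gaussian neighbourhood radius shrinks linearly, floored at 0.5.
        sigma = max(0.5, sigma0 * (1 - epoch / max(1, epochs - 1)))
        num = [[0.0] * dim for _ in range(n_units)]
        den = [0.0] * n_units
        # Batch step: winners are found with the weights held fixed...
        for x in data:
            br, bc = coords[best_matching_unit(weights, x)]
            for j, (r, c) in enumerate(coords):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                den[j] += h
                for t in range(dim):
                    num[j][t] += h * x[t]
        # ...then each unit moves to the neighbourhood-weighted data mean
        # (a unit with no weight at all, after underflow, keeps its vector).
        weights = [[v / den[j] for v in num[j]] if den[j] > 0 else weights[j]
                   for j in range(n_units)]
    return weights
```

On two toy clusters, `train_batch_s om`-style training such as `train_batch_som([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2, 2)` maps the two groups to different units. Because each weight vector lives in the same space as the composition vectors, comparing the weights of units that attract different clades shows which oligonucleotides differ between them, which is one plausible reading of how the abstract calls BLSOM explainable. Since the batch update does not depend on input order, results are reproducible for a given initialization.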

Unsupervised explainable AI for molecular evolutionary study of forty thousand SARS-CoV-2 genomes

DOI: 10.21203/rs.3.rs-91227/v1 (2020)

Author(s): Toshimichi Ikemura, Kennosuke Wada, Yoshiko Wada, Yuki Iwasaki, Takashi Abe

Keyword(s): Public Health, Artificial Intelligence, Big Data, Knowledge Discovery, Prior Knowledge, Genome Sequence, Image Display, Genomic Sequences, Explainable AI, Oligonucleotide Composition

Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge, and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate the various aspects of its genome sequence changes. We previously established an unsupervised AI, BLSOM (batch-learning SOM), which can analyze five million genomic sequences simultaneously. The present study applied BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes. Although only the oligonucleotide composition was given as input, the resulting clusters of genomes corresponded primarily to the known main clades and to internal divisions within those clades. Because BLSOM is an explainable AI, it reveals which features of the oligonucleotide composition are responsible for the clade clustering. BLSOM also has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes.

Unsupervised explainable AI for simultaneous molecular evolutionary study of forty thousand SARS-CoV-2 genomes

DOI: 10.1101/2020.10.11.335406 (2020)

Author(s): Toshimichi Ikemura, Kennosuke Wada, Yoshiko Wada, Yuki Iwasaki, Takashi Abe

Keyword(s): Public Health, Artificial Intelligence, Big Data, Knowledge Discovery, Prior Knowledge, Genome Sequence, Image Display, Genomic Sequences, Explainable AI, Oligonucleotide Composition

Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge, and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate the various aspects of its genome sequence changes. We previously established an unsupervised AI, BLSOM (batch-learning SOM), which can analyze five million genomic sequences simultaneously. The present study applied BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes. Although only the oligonucleotide composition was given as input, the resulting clusters of genomes corresponded primarily to the known main clades and to internal divisions within those clades. Because BLSOM is an explainable AI, it reveals which features of the oligonucleotide composition are responsible for the clade clustering. BLSOM also has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes.

MetaBCC-LR: metagenomics binning by coverage and composition for long reads

Bioinformatics, Vol 36 (Supplement_1), pp. i3-i11 (2020). DOI: 10.1093/bioinformatics/btaa441

Author(s): Anuradha Wickramarachchi, Vijini Mallawaarachchi, Vaibhav Rajan, Yu Lin

Keyword(s): Error Rates, Supplementary Information, Metagenomic Data, Sequencing Technologies, Input Size, Long Reads, Wide Range, Long Read, Oligonucleotide Composition, Species Specific

Motivation: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal, and the emergence of long-read sequencing technologies offers opportunities to surmount them. However, most current metagenomic binning tools have been developed for short reads. The few tools that can process long reads either do not scale with increasing input size or require a database of reference genomes, which are often unknown. In this article, we present MetaBCC-LR, a scalable reference-free binning method that clusters long reads directly based on their k-mer coverage histograms and oligonucleotide composition.

Results: We evaluate MetaBCC-LR on multiple simulated and real metagenomic long-read datasets with varying coverages and error rates. Our experiments demonstrate that MetaBCC-LR substantially outperforms state-of-the-art reference-free binning tools, achieving ∼13% improvement in F1-score and ∼30% improvement in ARI (adjusted Rand index) compared to the best previous tools. Moreover, we show that using MetaBCC-LR before long-read assembly helps enhance assembly quality while significantly reducing the assembly cost in terms of time and memory usage. The efficiency and accuracy of MetaBCC-LR pave the way for more effective long-read-based metagenomics analyses to support a wide range of applications.

Availability and implementation: The source code is freely available at: https://github.com/anuradhawick/MetaBCC-LR.

Supplementary information: Supplementary data are available at Bioinformatics online.
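A k-mer coverage histogram summarizes, for one read, how abundant its k-mers are across the whole read set; reads from the same high- or low-coverage species tend to have similar histograms. The sketch below illustrates that idea only. The k value, bin width, normalization, the overflow bin, and the names `kmer_counts` and `coverage_histogram` are assumptions, not MetaBCC-LR's actual parameters or code, and it ignores reverse complements, which a real pipeline would likely need to handle.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across the whole read set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(read, counts, k, bin_width, n_bins):
    """Histogram of global k-mer counts for the k-mers in one read.

    Bin b holds counts in [b * bin_width + 1, (b + 1) * bin_width];
    larger counts fall into the last bin, and unseen k-mers (count 0)
    into the first. The result is normalized to sum to 1 so reads of
    different lengths are comparable.
    """
    hist = [0] * n_bins
    n = 0
    for i in range(len(read) - k + 1):
        c = counts[read[i:i + k]]  # Counter returns 0 for unseen k-mers
        b = min(max(c - 1, 0) // bin_width, n_bins - 1)
        hist[b] += 1
        n += 1
    return [h / n for h in hist] if n else [0.0] * n_bins
```

Concatenating this histogram with a composition vector gives one feature vector per read that a clustering step could consume; how MetaBCC-LR actually combines the two signals is described in the paper, not here.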

Determination of Plasmid pSN1216-29 Host Range and the Similarity in Oligonucleotide Composition Between Plasmid and Host Chromosomes

Frontiers in Microbiology, Vol 11 (2020). DOI: 10.3389/fmicb.2020.01187

Author(s): Maho Tokuda, Haruo Suzuki, Kosuke Yanagiya, Masahiro Yuki, Kengo Inoue, ...

Keyword(s): Host Range, Oligonucleotide Composition

IslandCafe: Compositional Anomaly and Feature Enrichment Assessment for Delineation of Genomic Islands

G3: Genes|Genomes|Genetics, Vol 9 (10), pp. 3273-3285 (2019). DOI: 10.1534/g3.119.400562

Cited by: ~2

Author(s): Mehul Jani, Rajeev K. Azad

Keyword(s): Genomic Island, Bacterial Species, Bacterial Genome, Genomic Islands, Biological Factors, Phyletic Pattern, Evolutionary Forces, Synthetic Test, Oligonucleotide Composition, Horizontal Acquisition

One of the evolutionary forces driving bacterial genome evolution is the acquisition of clusters of genes through horizontal gene transfer (HGT). These genomic islands may confer adaptive advantages on the recipient bacteria, such as the ability to thwart antibiotics, become virulent or hypervirulent, or acquire novel metabolic traits. Methods for detecting genomic islands either search for markers or features typical of islands, or examine anomalies in oligonucleotide composition against the genome background. The former tend to underestimate, missing islands whose markers have been lost or degraded, while the latter tend to overestimate, because they cannot discriminate compositional atypicality arising from HGT from atypicality that is a consequence of other biological factors. We propose here a framework that exploits the strengths of both approaches while bypassing the pitfalls of either. Genomic islands lacking markers are identified by their association with genomic islands that carry markers. This was made possible by performing marker enrichment and phyletic pattern analyses within an integrated framework of recursive segmentation and clustering. The proposed method, IslandCafe, compared favorably with frequently used methods for genomic island detection on synthetic test datasets and on a test set of known islands from 15 well-characterized bacterial species. Furthermore, IslandCafe identified novel islands with imprints of likely horizontal acquisition.
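The abstract contrasts marker-based detection with detection of compositional anomaly against the genome background. The sketch below shows only that second, baseline family (the one the abstract says tends to overestimate), not IslandCafe's recursive segmentation and marker-enrichment framework. The sliding-window layout, the use of dinucleotides, the total-variation distance, the threshold, and the names `kmer_freqs` and `anomalous_windows` are all illustrative assumptions.

```python
from __future__ import annotations

from itertools import product


def kmer_freqs(seq: str, k: int) -> list[float]:
    """Relative k-mer frequencies over A/C/G/T in lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other symbols
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(words)
    return [c / total for c in counts]


def anomalous_windows(genome: str, k: int = 2, window: int = 1000,
                      step: int = 500, threshold: float = 0.3):
    """Flag windows whose k-mer composition departs from the whole genome.

    The distance is total variation (half the L1 distance), so it lies
    in [0, 1]. Returns (start, end, distance) tuples for flagged windows.
    """
    genome = genome.upper()
    background = kmer_freqs(genome, k)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        local = kmer_freqs(genome[start:start + window], k)
        d = 0.5 * sum(abs(a - b) for a, b in zip(local, background))
        if d > threshold:
            hits.append((start, start + window, d))
    return hits
```

As a toy check, inserting a 400 bp poly-A stretch into an `"ACGT"` repeat and scanning with `window=400, step=400` flags only the window covering the insert. Any region with unusual composition for non-HGT reasons (for example, highly expressed genes or rRNA operons) would be flagged just as readily, which is exactly the overestimation the abstract describes.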

Manipulating Immune Activation of Macrophages by Tuning the Oligonucleotide Composition of Gold Nanoparticles

Bioconjugate Chemistry, Vol 30 (7), pp. 2032-2037 (2019). DOI: 10.1021/acs.bioconjchem.9b00316

Cited by: ~16

Author(s): Roger M. Pallares, Priscilla Choo, Lisa E. Cole, Chad A. Mirkin, Andrew Lee, ...

Keyword(s): Gold Nanoparticles, Immune Activation, Oligonucleotide Composition

Grid-Assembly: An oligonucleotide composition-based partitioning strategy to aid metagenomic sequence assembly

Journal of Bioinformatics and Computational Biology, Vol 13 (03), pp. 1541004 (2015). DOI: 10.1142/s0219720015410048

Cited by: ~1

Author(s): Tarini Shankar Ghosh, Varun Mehra, Sharmila S. Mande

Keyword(s): Genome Assembly, Three Dimensional, Metagenomic Sequence, Overlapping Grids, 3D Space, Assembly Method, Single Genome, Usage Patterns, Oligonucleotide Composition

The metagenomics approach involves extraction, sequencing, and characterization of the genomic content of the entire community of microbes present in a given environment. In contrast to single-genome data, accurate assembly of metagenomic sequences is a challenging task. Given the huge volume and diverse taxonomic origin of metagenomic sequences, directly applying single-genome assembly methods to metagenomes is likely not only to greatly increase the required computational infrastructure, but also to produce chimeric contigs. One strategy to address this challenge is to partition metagenomic sequence datasets into clusters and to assemble the sequences in each cluster separately using any single-genome assembly method. The current study presents such an approach, which uses tetranucleotide usage patterns to first represent sequences as points in a three-dimensional (3D) space. The 3D space is subsequently partitioned into "Grids". Sequences within overlapping grids are then progressively assembled using any available assembler. We demonstrate the applicability of the Grid-Assembly method using various categories of assemblers as well as different simulated metagenomic datasets. Validation results indicate that the Grid-Assembly approach helps improve the overall quality of assembly, in terms of the purity and volume of the assembled contigs.
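The partitioning step can be pictured as binning 3D points into cubic cells that are widened slightly so that neighbouring cells share boundary sequences. This sketch covers only that step under stated assumptions: the mapping from tetranucleotide usage to 3D coordinates is not specified in the abstract, so the coordinates are taken as given inputs, and the cubic cell shape, the `cell` and `overlap` parameters, and the name `grid_partition` are hypothetical rather than Grid-Assembly's actual scheme.

```python
import math
from collections import defaultdict
from itertools import product


def grid_partition(points, cell, overlap=0.0):
    """Assign 3D points to cubic grid cells that may overlap.

    points: dict mapping a sequence id to an (x, y, z) tuple.
    cell: edge length of each grid cell.
    overlap: how far each cell is extended on every side; with
        overlap > 0 a point near a boundary also joins the neighbouring
        cell(s).
    Returns a dict mapping a cell index (i, j, k) to a sorted id list.
    """
    if not 0 <= overlap < cell:
        raise ValueError("overlap must satisfy 0 <= overlap < cell")
    grids = defaultdict(list)
    for seq_id, p in points.items():
        per_axis = []
        for x in p:
            base = math.floor(x / cell)
            # Because overlap < cell, only the home cell and its two
            # neighbours along this axis can contain x.
            per_axis.append([i for i in (base - 1, base, base + 1)
                             if i * cell - overlap <= x < (i + 1) * cell + overlap])
        for idx in product(*per_axis):
            grids[idx].append(seq_id)
    return {idx: sorted(ids) for idx, ids in grids.items()}
```

With `cell=1.0` and `overlap=0.1`, a point at x = 0.95 lands in both cell 0 and cell 1 along that axis; such shared members are what would let contigs assembled in adjacent grids be joined later.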

Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data

BioMed Research International, Vol 2015, pp. 1-8 (2015). DOI: 10.1155/2015/506052

Cited by: ~2

Author(s): Akihito Kikuchi, Toshimichi Ikemura, Takashi Abe

Keyword(s): High Performance, Large Scale, Genomic Sequence, Sequence Data, Bacterial Genome, Computation Time, Comprehensive Analysis, Self Organizing Map, Genome Sequences, Oligonucleotide Composition

With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of the available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype based solely on oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM requires substantial computational resources. We have developed Self-Compressing BLSOM (SC-BLSOM) to reduce computation time, which allows us to carry out comprehensive analysis of big sequence data without high-performance supercomputers. The strategy of SC-BLSOM is to construct BLSOMs hierarchically according to data class, such as phylotype. A first-layer BLSOM was constructed for each subset of the divided input data, each subset representing a data subclass such as a phylotype division, which compressed the number of data points. The second-layer BLSOM was then constructed from all of the weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and clustered the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data.
