manta - a clustering algorithm for weighted ecological networks

Mapping Intimacies ◽

10.1101/807511 ◽

2019 ◽

Cited By ~ 1

Author(s):

Lisa Röttjers ◽

Karoline Faust

Keyword(s):

Network Inference ◽

Clustering Algorithm ◽

Synthetic Data ◽

Parameter Tuning ◽

Network Clustering ◽

Sequencing Data ◽

Cluster Membership ◽

Microbial Network ◽

Successful Approach ◽

Require Parameter

Abstract Microbial network inference and analysis has become a successful approach to generate biological hypotheses from microbial sequencing data. Network clustering is a crucial step in this analysis. Here, we present a novel heuristic flow-based network clustering algorithm, which equals or outperforms existing algorithms on noise-free synthetic data. manta comes with unique strengths such as the ability to identify nodes that represent an intermediate between clusters, to exploit negative edges and to assess the robustness of cluster membership. manta does not require parameter tuning, is straightforward to install and run, and can easily be combined with existing microbial network inference tools.

manta: a Clustering Algorithm for Weighted Ecological Networks

mSystems ◽

10.1128/msystems.00903-19 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Lisa Röttjers ◽

Karoline Faust

Keyword(s):

Network Inference ◽

Clustering Algorithm ◽

Transition Period ◽

Parameter Tuning ◽

Network Clustering ◽

Sequencing Data ◽

Cluster Assignment ◽

Real World Data ◽

Data Set ◽

Microbial Network

ABSTRACT Microbial network inference and analysis have become successful approaches to extract biological hypotheses from microbial sequencing data. Network clustering is a crucial step in this analysis. Here, we present a novel heuristic network clustering algorithm, manta, which clusters nodes in weighted networks. In contrast to existing algorithms, manta exploits negative edges while differentiating between weak and strong cluster assignments. For this reason, manta can tackle gradients and is able to avoid clustering problematic nodes. In addition, manta assesses the robustness of cluster assignment, which makes it more robust to noisy data than most existing tools. On noise-free synthetic data, manta equals or outperforms existing algorithms, while it identifies biologically relevant subcompositions in real-world data sets. On a cheese rind data set, manta identifies groups of taxa that correspond to intermediate moisture content in the rinds, while on an ocean data set, the algorithm identifies a cluster of organisms that were reduced in abundance during a transition period but did not correlate strongly to biochemical parameters that changed during the transition period. These case studies demonstrate the power of manta as a tool that identifies biologically informative groups within microbial networks. IMPORTANCE manta comes with unique strengths, such as the abilities to identify nodes that represent an intermediate between clusters, to exploit negative edges, and to assess the robustness of cluster membership. manta does not require parameter tuning, is straightforward to install and run, and can be easily combined with existing microbial network inference tools.

Modeling Microbial Community Networks: Methods and Tools.

Current Genomics ◽

10.2174/1389202921999200905133146 ◽

2020 ◽

Vol 21 ◽

Author(s):

Marco Cappellato ◽

Giacomo Baruzzo ◽

Ilaria Patuzzi ◽

Barbara Di Camillo

Keyword(s):

Microbial Community ◽

Network Inference ◽

Composition Studies ◽

Synthetic Data ◽

Computational Method ◽

Sequencing Data ◽

Generative Process ◽

Sequencing Technologies ◽

Key Points ◽

Generation Sequencing

In the current research landscape, microbiota composition studies are of extreme interest, since it has been widely shown that resident microorganisms affect and shape the ecological niche they inhabit. This complex micro-world is characterized by different types of interactions. Understanding these relationships provides a useful tool for decoding the causes and effects of communities’ organization. Next-Generation Sequencing technologies make it possible to reconstruct the internal composition of the whole microbial community present in a sample. Sequencing data can then be investigated through statistical and computational methods from network theory to infer the network of interactions among microbial species. Since there are several network inference approaches in the literature, in this paper we tried to shed light on their main characteristics and challenges, providing a useful tool not only to those interested in using the methods, but also to those who want to develop new ones. In addition, we focused on the frameworks used to produce synthetic data, starting from the simulation of network structures up to their integration with abundance models, with the aim of clarifying the key points of the entire generative process.

Development of a biomarker database toward performing disease classification and finding disease interrelations

Database ◽

10.1093/database/baab011 ◽

2021 ◽

Vol 2021 ◽

Author(s):

Shaikh Farhad Hossain ◽

Ming Huang ◽

Naoaki Ono ◽

Aki Morita ◽

Shigehiko Kanaya ◽

...

Keyword(s):

Rapid Detection ◽

Molecular Mechanisms ◽

Clustering Algorithm ◽

Disease Diagnosis ◽

Disease Classification ◽

Network Clustering ◽

Medical Field ◽

Database Development ◽

Abnormal State ◽

Comprehensive Information

Abstract A biomarker is a measurable indicator of a disease or abnormal state of a body that plays an important role in disease diagnosis, prognosis and treatment. The biomarker has become a significant topic due to its versatile usage in the medical field and in rapid detection of the presence or severity of some diseases. The volume of biomarker data is rapidly increasing and the identified data are scattered. To provide comprehensive information, the explosively growing data need to be recorded in a single platform. There is no open-source freely available comprehensive online biomarker database. To fulfill this purpose, we have developed a human biomarker database as part of the KNApSAcK family databases which contain a vast quantity of information on the relationships between biomarkers and diseases. We have classified the diseases into 18 disease classes, mostly according to the National Center for Biotechnology Information definitions. Apart from this database development, we also have performed disease classification by separately using protein and metabolite biomarkers based on the network clustering algorithm DPClusO and hierarchical clustering. Finally, we reached a conclusion about the relationships among the disease classes. The human biomarker database can be accessed online and the inter-disease relationships may be helpful in understanding the molecular mechanisms of diseases. To our knowledge, this is one of the first approaches to classify diseases based on biomarkers. Database URL: http://www.knapsackfamily.com/Biomarker/top.php

A Hard C-Means Clustering Algorithm Incorporating Membership KL Divergence and Local Data Information for Noisy Image Segmentation

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141850012x ◽

2017 ◽

Vol 32 (04) ◽

pp. 1850012 ◽

Cited By ~ 5

Author(s):

R. R. Gharieb ◽

G. Gendy ◽

H. Selim

Keyword(s):

Image Segmentation ◽

Membership Function ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cluster Center ◽

Local Data ◽

Cluster Membership ◽

Kl Divergence ◽

Clustering Approach ◽

Center Distance

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally smoothed value of the membership in the pixel vicinity. The fuzzification weight is a function of the pixel-to-cluster-center distances. The pixel-to-cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally smoothed membership function of this pixel multiplied by an exponential function of the negative pixel distance relative to the minimum distance provided by the nearest cluster center to the pixel. Therefore, because it incorporates the locally smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images, and its results are compared with those of several FCM-based clustering algorithms.
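
The membership rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `relative_distance_membership`, the scalar `scale` parameter (standing in for the distance-dependent fuzzification weight, whose exact form the abstract does not give), and the reading of "relative to the minimum distance" as the difference d - d_min are all assumptions made for illustration.

```python
import math


def relative_distance_membership(distances, smoothed, scale=1.0):
    """Memberships of one pixel across all clusters (illustrative only).

    distances: pixel-to-cluster-center distances, one per cluster
    smoothed:  locally smoothed memberships of the pixel, one per cluster
    scale:     ASSUMED constant replacing the paper's fuzzification weight
    """
    # Distance to the nearest cluster center; only differences to it matter,
    # so adding a constant offset to every distance changes nothing.
    d_min = min(distances)
    # Smoothed membership times a decaying exponential of the relative
    # distance (ASSUMED to be the difference d - d_min).
    raw = [s * math.exp(-(d - d_min) / scale)
           for d, s in zip(distances, smoothed)]
    total = sum(raw)
    # Normalize so the memberships of the pixel sum to one.
    return [r / total for r in raw]


# Example: two clusters, equal smoothed memberships, cluster 0 is nearer.
u = relative_distance_membership([1.0, 3.0], [0.5, 0.5])
u_shifted = relative_distance_membership([11.0, 13.0], [0.5, 0.5])
```

Under these assumptions, `u` and `u_shifted` agree, which reflects the abstract's point that a relative distance is less sensitive to an additive offset than an absolute one.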

Analysis on Network Clustering Algorithm of Data Mining Methods Based on Rough Set Theory

2011 Fourth International Symposium on Knowledge Acquisition and Modeling ◽

10.1109/kam.2011.85 ◽

2011 ◽

Author(s):

Xiao-rong Ye

Keyword(s):

Data Mining ◽

Set Theory ◽

Rough Set ◽

Clustering Algorithm ◽

Rough Set Theory ◽

Network Clustering ◽

Mining Methods

CASS: A distributed network clustering algorithm based on structure similarity for large-scale network

PLoS ONE ◽

10.1371/journal.pone.0203670 ◽

2018 ◽

Vol 13 (10) ◽

pp. e0203670 ◽

Cited By ~ 1

Author(s):

Jungrim Kim ◽

Mincheol Shin ◽

Jeongwoo Kim ◽

Chihyun Park ◽

Sujin Lee ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Network Clustering ◽

Distributed Network ◽

Large Scale Network ◽

Scale Network ◽

Structure Similarity

RobOMP: Robust variants of Orthogonal Matching Pursuit for sparse representations

10.7287/peerj.preprints.27482v1 ◽

2019 ◽

Author(s):

Carlos A Loza

Keyword(s):

Mean Squared Error ◽

Matching Pursuit ◽

Synthetic Data ◽

Optimal Solution ◽

Parameter Tuning ◽

Weight Vector ◽

Orthogonal Matching Pursuit ◽

Main Mode ◽

Observation Matrix ◽

And Performance

Sparse coding aims to find a parsimonious representation of an example given an observation matrix or dictionary. In this regard, Orthogonal Matching Pursuit (OMP) provides an intuitive, simple and fast approximation of the optimal solution. However, its main building block is anchored on the minimization of the Mean Squared Error (MSE) cost function. This approach is only optimal if the errors follow a Gaussian distribution without samples that strongly deviate from the main mode, i.e. outliers. If this assumption is violated, the sparse code will likely be biased and performance will degrade accordingly. In this paper, we introduce five robust variants of OMP (RobOMP) fully based on the theory of M-Estimators under a linear model. The proposed framework exploits efficient Iteratively Reweighted Least Squares (IRLS) techniques to mitigate the effect of outliers and emphasize the samples corresponding to the main mode of the data. This is done adaptively via a learned weight vector that models the distribution of the data in a robust manner. Experiments on synthetic data under several noise distributions and image recognition under different combinations of occlusion and missing pixels thoroughly detail the superiority of RobOMP over MSE-based approaches and similar robust alternatives. We also introduce a denoising framework based on robust, sparse and redundant representations that opens the door to potential further applications of the proposed techniques. The five different variants of RobOMP do not require parameter tuning from the user and, hence, constitute principled alternatives to OMP.
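
The IRLS idea mentioned in this abstract can be shown on a much simpler problem than sparse coding: robustly estimating a single location parameter with Huber weights. The sketch below is not RobOMP. The function names, the tuning constant `c = 1.345` and the assumption that the data have unit scale are illustrative choices, not details taken from the paper.

```python
def huber_weight(residual, c):
    """Huber IRLS weight: 1 inside the threshold, c/|r| outside it."""
    a = abs(residual)
    return 1.0 if a <= c else c / a


def irls_location(samples, c=1.345, max_iter=100, tol=1e-10):
    """Robust location estimate via iteratively reweighted least squares.

    ASSUMES the samples have roughly unit scale; a practical version would
    also estimate the scale (for example with the median absolute deviation).
    """
    ordered = sorted(samples)
    n = len(ordered)
    # Start from the median, which is itself resistant to outliers.
    if n % 2:
        mu = ordered[n // 2]
    else:
        mu = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    for _ in range(max_iter):
        # Down-weight samples whose residual is far from the current estimate.
        weights = [huber_weight(x - mu, c) for x in samples]
        # Weighted least-squares solution for a constant model.
        new_mu = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Five inliers near 1 and one gross outlier.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
plain_mean = sum(data) / len(data)
robust_estimate = irls_location(data)
```

Here the ordinary mean is pulled far from the bulk of the data by the single outlier, while the reweighted estimate stays close to the inliers, which is the effect the abstract attributes to the learned weight vector.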

Ad-hoc Network Clustering Algorithm Based on Node Data Value

Proceedings of the 2018 International Conference on Computer Modeling, Simulation and Algorithm (CMSA 2018) ◽

10.2991/cmsa-18.2018.26 ◽

2018 ◽

Author(s):

Jingmin Tang ◽

Zhangbao Gao

Keyword(s):

Ad Hoc Network ◽

Clustering Algorithm ◽

Ad Hoc ◽

Network Clustering ◽

Data Value

Single-cell regulatory network inference and clustering from high-dimensional sequencing data

2019 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata47090.2019.9006016 ◽

2019 ◽

Author(s):

Aristidis G. Vrahatis ◽

Georgios N. Dimitrakopoulos ◽

Sotiris K. Tasoulis ◽

Spiros V. Georgakopoulos ◽

Vassilis P. Plagianakos

Keyword(s):

Single Cell ◽

Regulatory Network ◽

Network Inference ◽

High Dimensional ◽

Sequencing Data

Secure and stable Vehicular Ad Hoc Network clustering algorithm based on hybrid mobility similarities and trust management scheme

Vehicular Communications ◽

10.1016/j.vehcom.2018.08.001 ◽

2018 ◽

Vol 13 ◽

pp. 128-138 ◽

Cited By ~ 9

Author(s):

Sarah Oubabas ◽

Rachida Aoudjit ◽

Joel J. P. C. Rodrigues ◽

Said Talbi

Keyword(s):

Ad Hoc Network ◽

Trust Management ◽

Clustering Algorithm ◽

Ad Hoc ◽

Vehicular Ad Hoc Network ◽

Network Clustering ◽

Management Scheme ◽

Trust Management Scheme
