Capturing dynamics of time-varying data via topology

Lu Xian; Henry Adams; Chad M. Topaz; Lori Ziegelmeier

doi:10.3934/fods.2021033

Foundations of Data Science ◽

10.3934/fods.2021033 ◽

2021 ◽

Vol 0 (0) ◽

pp. 0

Author(s):

Lu Xian ◽

Henry Adams ◽

Chad M. Topaz ◽

Lori Ziegelmeier

Keyword(s):

Machine Learning ◽

Algebraic Topology ◽

Metric Spaces ◽

Applied Mathematics ◽

Topological Data Analysis ◽

Complex Data ◽

Time Varying ◽

Identification Task ◽

Static Data ◽

Biological Aggregations

One approach to understanding complex data is to study its shape through the lens of algebraic topology. While the early development of topological data analysis focused primarily on static data, in recent years, theoretical and applied studies have turned to data that varies in time. A time-varying collection of metric spaces, as formed, for example, by a moving school of fish or flock of birds, can contain a vast amount of information. There is often a need to simplify or summarize the dynamic behavior. We provide an introduction to topological summaries of time-varying metric spaces including vineyards [19], crocker plots [55], and multiparameter rank functions [37]. We then introduce a new tool to summarize time-varying metric spaces: a crocker stack. Crocker stacks are convenient for visualization, amenable to machine learning, and satisfy a desirable continuity property which we prove. We demonstrate the utility of crocker stacks for a parameter identification task involving an influential model of biological aggregations [57]. Altogether, we aim to bring the broader applied mathematics community up-to-date on topological summaries of time-varying metric spaces.
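To make the idea of a crocker plot concrete, here is a minimal sketch, not the authors' implementation. It assumes a crocker plot records a Betti number as a function of time and scale, restricts attention to Betti-0 (connected components), and adopts the convention that two points are joined when their distance is at most `eps`. The function names and the toy data are hypothetical.

```python
import numpy as np

def betti0(points, eps):
    """Count connected components of the graph joining points at distance <= eps."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components

def crocker_matrix(frames, scales):
    """Rows index scales, columns index time steps; entries are Betti-0 values."""
    return np.array([[betti0(frame, eps) for frame in frames] for eps in scales])

def toy_frames(num_steps=4):
    """Hypothetical data: two small clusters that drift toward each other."""
    base = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
    frames = []
    for t in range(num_steps):
        gap = 4.0 - t  # horizontal offset of the second cluster: 4, 3, 2, 1
        frames.append(np.vstack([base, base + np.array([gap, 0.0])]))
    return frames

frames = toy_frames()
scales = [0.05, 0.5, 1.5]
crocker = crocker_matrix(frames, scales)
print(crocker)
```

Plotting `crocker` as a contour or heat map over the time and scale axes gives the kind of picture a crocker plot summarizes; higher Betti numbers would require a persistent homology library rather than union-find.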

Identifying homogeneous subgroups of patients and important features: a topological machine learning approach

BMC Bioinformatics ◽

10.1186/s12859-021-04360-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ewan Carr ◽

Mathieu Carrière ◽

Bertrand Michel ◽

Frédéric Chazal ◽

Raquel Iniesta

Keyword(s):

Machine Learning ◽

Topological Data Analysis ◽

Mixed Data ◽

Complex Data ◽

Data Types ◽

Topological Features ◽

Machine Learning Approach ◽

Recent Developments ◽

Dimensional Graph ◽

Selection Of

Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper. Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline.

Big Data to Knowledge: Application of Machine Learning to Predictive Modeling of Therapeutic Response in Cancer.

Current Genomics ◽

10.2174/1389202921999201224110101 ◽

2020 ◽

Vol 21 ◽

Author(s):

Sukanya Panja ◽

Sarra Rahem ◽

Cassandra J. Chu ◽

Antonina Mitrofanova

Keyword(s):

Machine Learning ◽

Missing Values ◽

Therapeutic Response ◽

Patient Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Human Machine Interaction ◽

Data Repositories ◽

Response Modeling

Background: In recent years, the availability of high-throughput technologies, the establishment of large molecular patient data repositories, and advances in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We give an overview of their mathematical foundations and discuss their limitations and alternative approaches, all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.

Deep Transfer Learning Based Intrusion Detection System for Electric Vehicular Networks

Sensors ◽

10.3390/s21144736 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4736

Author(s):

Sk. Tanzir Mehedi ◽

Adnan Anwar ◽

Ziaur Rahman ◽

Kawsar Ahmed

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Intrusion Detection ◽

Real Time ◽

Transfer Learning ◽

Security Requirements ◽

Detection Accuracy ◽

Area Network ◽

Complex Data ◽

Network Intrusion

The Controller Area Network (CAN) bus is an important protocol in real-time In-Vehicle Network (IVN) systems because of its simple, suitable, and robust architecture. IVN devices nevertheless remain insecure and vulnerable because their complex, data-intensive architectures greatly increase accessibility to unauthorized networks and the possibility of various types of cyberattacks. Therefore, the detection of cyberattacks in IVN devices has attracted growing interest. With the rapid development of IVNs and evolving threat types, traditional machine learning-based intrusion detection systems (IDS) must be updated to cope with the security requirements of the current environment. The progress of deep learning and deep transfer learning, and their impact in several areas, has made them an effective solution for network intrusion detection. This manuscript proposes a deep transfer learning-based IDS model for IVN with improved performance in comparison to several existing models. The unique contributions include effective attribute selection best suited to identifying malicious CAN messages and accurately detecting normal and abnormal activities, the design of a deep transfer learning-based LeNet model, and an evaluation on real-world data. To this end, an extensive experimental performance evaluation has been conducted. The architecture, along with empirical analyses, shows that the proposed IDS greatly improves detection accuracy over mainstream machine learning, deep learning, and benchmark deep transfer learning models and demonstrates better performance for real-time IVN security.

Graph Learning for Combinatorial Optimization: A Survey of State-of-the-Art

Data Science and Engineering ◽

10.1007/s41019-021-00155-3 ◽

2021 ◽

Author(s):

Yun Peng ◽

Byron Choi ◽

Jianliang Xu

Keyword(s):

Machine Learning ◽

Combinatorial Optimization ◽

Graph Embedding ◽

Partial Solution ◽

Complex Data ◽

Learning Methods ◽

Graph Learning ◽

Second Stage ◽

End To End ◽

Embedding Methods

Graphs have been widely used to represent complex data in many applications, such as e-commerce, social networks, and bioinformatics. Efficient and effective analysis of graph data is important for graph-based applications. However, most graph analysis tasks are combinatorial optimization (CO) problems, which are NP-hard. Recent studies have focused on the potential of using machine learning (ML) to solve graph-based CO problems. Most recent methods follow a two-stage framework. The first stage is graph representation learning, which embeds the graphs into low-dimensional vectors. The second stage uses machine learning to solve the CO problems using the embeddings of the graphs learned in the first stage. The works for the first stage can be classified into two categories: graph embedding methods and end-to-end learning methods. For graph embedding methods, the learning of the embeddings of the graphs has its own objective, which may not rely on the CO problems to be solved. The CO problems are solved by independent downstream tasks. For end-to-end learning methods, the learning of the embeddings of the graphs does not have its own objective and is an intermediate step of the learning procedure for solving the CO problems. The works for the second stage can also be classified into two categories: non-autoregressive methods and autoregressive methods. Non-autoregressive methods predict a solution for a CO problem in one shot. A non-autoregressive method predicts a matrix that denotes the probability of each node/edge being a part of a solution of the CO problem. The solution can be computed from the matrix using search heuristics such as beam search. Autoregressive methods iteratively extend a partial solution step by step. At each step, an autoregressive method predicts a node/edge conditioned on the current partial solution, which is then extended with that node/edge.
In this survey, we provide a thorough overview of recent studies of the graph learning-based CO methods. The survey ends with several remarks on future research directions.
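The autoregressive pattern described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not a method from the survey: the CO problem is taken to be finding an independent set, and `score` is a hypothetical stand-in for a learned model that rates each candidate node given the current partial solution. Here it is replaced by a simple hand-written degree heuristic.

```python
def autoregressive_independent_set(adjacency, score):
    """Grow an independent set one node at a time.

    adjacency: dict mapping each node to the set of its neighbours.
    score(node, partial): hypothetical stand-in for a learned model that
    rates a candidate node conditioned on the current partial solution.
    """
    solution = []
    candidates = set(adjacency)
    while candidates:
        # Pick the best-scoring candidate; sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda v: score(v, solution))
        solution.append(best)
        # Remove the chosen node and its neighbours so the set stays independent.
        candidates -= {best} | adjacency[best]
    return solution

# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

def degree_score(node, partial):
    # Placeholder for a learned scorer: prefer low-degree nodes.
    return -len(graph[node])

result = autoregressive_independent_set(graph, degree_score)
print(result)  # [4, 0]
```

A non-autoregressive method would instead produce all node scores in one shot and then derive a solution from them with a search heuristic.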

Classification of apatite structures via topological data analysis: a framework for a ‘Materials Barcode’ representation of structure maps

Scientific Reports ◽

10.1038/s41598-021-90070-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Scott Broderick ◽

Ruhil Dongol ◽

Tianmu Zhang ◽

Krishna Rajan

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Crystal Chemistry ◽

Persistent Homology ◽

Hierarchical Classification ◽

Topological Data Analysis ◽

Learning Tool ◽

Coordination Polyhedra ◽

Machine Learning Tool ◽

Topological Data

This paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using the apatite chemistry as a template, we use persistent homology to track the topological connectivity of input crystal chemistry descriptors in defining similarity between different stoichiometries of apatites. It is shown that TDA automatically identifies a hierarchical classification scheme within apatites based on the commonality of the number of discrete coordination polyhedra that constitute the structural building units common among the compounds. This information is presented in the form of a visualization scheme of a barcode of homology classifications, where the persistence of similarity between compounds is tracked. Unlike traditional perspectives of structure maps, this new “Materials Barcode” schema serves as an automated exploratory machine learning tool that can uncover structural associations from crystal chemistry databases and provide a more nuanced insight into what defines similarity among homologous compounds.

Heart disease prediction using machine learning techniques: a survey

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10557 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 684 ◽

Cited By ~ 12

Author(s):

V V. Ramalingam ◽

Ayantan Dandapath ◽

M Karthik Raja

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Complex Data ◽

Learning Techniques ◽

Vector Machines ◽

Supervised Learning Algorithms ◽

Life Threatening

Heart-related diseases, or Cardiovascular Diseases (CVDs), have been the main cause of a huge number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. There is therefore a need for a reliable, accurate, and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers have recently been using machine learning techniques to help the health care industry and professionals in the diagnosis of heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF), and ensemble models are found to be very popular among researchers.

Automated Feature Selection and Classification for High-Dimensional Biomedical Data

10.21203/rs.3.rs-563410/v1 ◽

2021 ◽

Author(s):

Tammo P.A. Beishuizen ◽

Joaquin Vanschoren ◽

Peter A.J. Hilbers ◽

Dragan Bošnački

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Automated System ◽

Complex Data ◽

Biomedical Data ◽

Selection Methods ◽

Model Predictions ◽

Automated Machine Learning ◽

Feature Selection Techniques ◽

Best Fit

Background: Automated machine learning aims to automate the building of accurate predictive models, including the creation of complex data preprocessing pipelines. Although successful in many fields, automated machine learning tools struggle to produce good results on biomedical datasets, especially given the high dimensionality of the data. Result: In this paper, we explore the automation of feature selection in these scenarios. We analyze which feature selection techniques are ideally included in an automated system, determine how to efficiently find the ones that best fit a given dataset, integrate this into an existing AutoML tool (TPOT), and evaluate it on four very different yet representative types of biomedical data: microarray, mass spectrometry, clinical, and survey datasets. We focus on feature selection rather than latent feature generation since we often want to explain the model predictions in terms of the intrinsic features of the data. Conclusion: Our experiments show that for none of these datasets do we need more than 200 features to accurately explain the output. Additional features did not increase the quality significantly. We also find that the automated machine learning results are significantly improved after adding feature selection methods and prior knowledge on how to select and tune them.

AI and Machine Learning In Nuclear Fusion

10.31224/osf.io/3nwsc ◽

2020 ◽

Author(s):

Andrew Kamal

Keyword(s):

Machine Learning ◽

Algebraic Topology ◽

Potential Role ◽

Nuclear Fusion ◽

Maximum Point ◽

Maximum Heat ◽

Mathematical Proofs ◽

Maximum Heat Transfer ◽

Entropy Transfer ◽

Transfer Mechanisms

With the emergence of regressional mathematics and algebraic topology come advancements in the field of artificial intelligence and machine learning. Such advancements, when applied to problems such as nuclear fusion and entropy, can be utilized to analyze unsolved abnormalities in fusion-related research. Proof theory will be utilized throughout this paper. For logical mathematical proofs, n represents an unknown number, e represents the point of entropy, m represents the maximum point, and f represents fusion. This paper will analyze nuclear fusion and its unsolved problems as hardness problems and attempt to formulate computational proofs in relation to entropy, fusion maximum, heat transfer, and entropy transfer mechanisms. This paper will be centered not only around logical proofs but also around computational mechanisms such as distributed computing and its potential role in analyzing computational hardness in relation to fusion-related problems. We will summarize a proposal for experimentation utilizing further logical proof formalities and the decentralized-internet SDK for a computational pipeline in order to solve fusion-related hardness problems.

Topological Segmentation of Time-Varying Functional Connectivity Highlights the Role of Preferred Cortical Circuits

10.1101/2020.09.06.285130 ◽

2020 ◽

Author(s):

Jacob Billings ◽

Manish Saggar ◽

Shella Keilholz ◽

Giovanni Petri

Keyword(s):

Functional Connectivity ◽

Brain Function ◽

Persistent Homology ◽

Brain Regions ◽

Topological Data Analysis ◽

Time Varying ◽

Imaging Data ◽

Experimental Conditions ◽

Common Time ◽

Brain Imaging Data

Functional connectivity (FC) and its time-varying analogue (TVFC) leverage brain imaging data to interpret brain function as patterns of coordinating activity among brain regions. While many questions remain regarding the organizing principles through which brain function emerges from multi-regional interactions, advances in the mathematics of Topological Data Analysis (TDA) may provide new insights into the brain’s spontaneous self-organization. One tool from TDA, “persistent homology”, observes the occurrence and the persistence of n-dimensional holes present in the metric space over a dataset. The occurrence of n-dimensional holes within the TVFC point cloud may denote conserved and preferred routes of information flow among brain regions. In the present study, we compare the use of persistent homology versus more traditional TVFC metrics at the task of segmenting brain states that differ across a common time-series of experimental conditions. We find that the structures identified by persistent homology more accurately segment the stimuli, more accurately segment volunteer performance during experimentally defined tasks, and generalize better across volunteers. Finally, we present empirical and theoretical observations that interpret brain function as a topological space defined by cyclic and interlinked motifs among distributed brain regions, especially the attention networks.

Static Data Anonymization Part II: Complex Data Structures

Data Privacy ◽

10.1201/9781315370910-12 ◽

2016 ◽

pp. 85-104

Keyword(s):

Data Structures ◽

Complex Data ◽

Data Anonymization ◽

Static Data
