Machine learning meets genome assembly

2018 ◽  
Vol 20 (6) ◽  
pp. 2116-2129 ◽  
Author(s):  
Kleber Padovani de Souza ◽  
João Carlos Setubal ◽  
André Carlos Ponce de Leon F. de Carvalho ◽  
Guilherme Oliveira ◽  
Annie Chateau ◽  
...  

Abstract
Motivation: With recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible to researchers, enabling several advances, especially in the health sciences. However, many challenges arising from the complexity of sequencing projects remain unsolved. Among them is the task of assembling DNA fragments from previously unsequenced organisms, an NP-hard (nondeterministic polynomial time hard) problem for which no efficient computational solution with reasonable execution time is known. Nevertheless, several tools that produce approximate solutions have been used with results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been among the approaches explored in recent years to find better solutions to the DNA fragment assembly problem, though still on a small scale. Results: This paper presents a broad review of pioneering literature on artificial intelligence-based DNA assemblers, particularly those that use machine learning, to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.
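To make the assembly task concrete, the following toy sketch illustrates the overlap-based idea behind many approximate assemblers: greedily merge the pair of reads with the longest suffix-prefix overlap. This is an illustrative example only, not the method of any tool reviewed above.

```python
# Toy greedy fragment assembly: repeatedly merge the pair of reads with
# the longest suffix-prefix overlap until one contig remains.

def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list) -> str:
    reads = reads[:]
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, index i, index j)
        for i in range(len(reads)):
            for j in range(len(reads)):
                if i != j:
                    k = overlap(reads[i], reads[j])
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return reads[0]

print(greedy_assemble(["ATTAGAC", "GACCTG", "CTGTTA"]))  # ATTAGACCTGTTA
```

The greedy strategy is a heuristic: it can merge reads in an order that blocks the globally best layout, which is one intuition for why exact assembly is NP-hard.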

Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 187
Author(s):  
Aaron Barbosa ◽  
Elijah Pelofske ◽  
Georg Hahn ◽  
Hristo N. Djidjev

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality due to imperfections of the current generation of quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance, and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer for solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics, such as the number of edges in the graph, or annealing parameters, such as the D-Wave's chain strength, we are able to rank certain features in order of their contribution to the solution hardness, and we present a simple decision tree which allows one to predict whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by D-Wave.
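The QUBO form mentioned above is easy to write down for maximum clique: reward each selected vertex and penalize every selected non-adjacent pair. The sketch below uses the standard formulation with an assumed penalty weight of 2, and brute force stands in for the annealer; it is not the paper's code.

```python
# Maximum clique as a QUBO: minimize H(x) = -sum_i x_i + 2 * sum over
# non-edges (i, j) of x_i * x_j. Any penalty > 1 makes infeasible picks
# (two non-adjacent vertices) more costly than the vertex reward.
import itertools
import numpy as np

def max_clique_qubo(n, edges):
    Q = -np.eye(n)                       # diagonal: reward for each vertex
    E = {frozenset(e) for e in edges}
    for i, j in itertools.combinations(range(n), 2):
        if frozenset((i, j)) not in E:
            Q[i, j] += 2.0               # penalty for a non-adjacent pair
    return Q

def brute_force_min(Q):
    """Exhaustive minimizer, standing in for the annealer on tiny instances."""
    n = Q.shape[0]
    best_x, best_e = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        e = x @ Q @ x
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Triangle {0,1,2} plus a pendant vertex 3: the maximum clique is {0,1,2}.
Q = max_clique_qubo(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
x, e = brute_force_min(Q)
print(x, e)  # [1 1 1 0] -3.0
```

The minimum energy equals minus the clique size, which is what the regression model described above learns to predict from instance features.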


Author(s):  
Hannah Bolinger ◽  
David Tran ◽  
Kenneth Harary ◽  
George C. Paoli ◽  
Giselle Guron ◽  
...  

Traditional microbiological testing methods are slow, and many molecular-based techniques rely on culture-based enrichment to overcome low limits of detection. Recent advancements in sequencing technologies may make it possible to utilize machine learning (ML) to identify patterns in microbiome data to potentially predict the presence or absence of pathogens. In this study, 299 poultry rinsate samples from various points in the processing chain were analyzed to determine if microbiota could inform about a sample's risk of containing Salmonella. Samples were culture confirmed as Salmonella-positive or -negative following modified USDA MLG protocols. The culture confirmation result was used as a reference to compare with 16S sequencing data. Pre-chill samples tested positive (71/82) at a higher frequency than post-chill samples (30/217) and contained greater microbial diversity. Due to their larger sample size, post-chill samples were analyzed more deeply. Analysis of variance (ANOVA) identified a significant effect of chilling on the number of genera (p<0.001), and analysis of similarities (ANOSIM) provided evidence for microbial dissimilarity between pre- and post-chill samples (p=0.001, R=0.443). Various ML models were trained using post-chill samples to predict if a sample contained Salmonella based on the samples' microbiota pre-enrichment. The optimal model was a Random Forest-based model with the following performance: accuracy (88%), sensitivity (85%), specificity (90%). While the algorithms described in this paper are prototypes, these risk-based algorithms demonstrate the potential and the need for further studies to provide insight alongside diagnostic tests. Combining risk-based information with diagnostic tools can help poultry processors make informed decisions to help identify and prevent the spread of Salmonella. These data add to the growing body of literature exploring novel ways to utilize microbiome data for predictive food safety.
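The risk-model idea above can be sketched in a few lines with scikit-learn. The data below are entirely synthetic stand-ins for genus-abundance profiles; feature dimensions and effect sizes are invented for illustration, and the printed accuracy is not comparable to the study's.

```python
# Minimal sketch: a random forest trained on (synthetic) microbiome
# abundance features to flag samples likely to contain a pathogen.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 300
# Invented "genus abundance" features, shifted upward for positive samples.
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + y[:, None] * 1.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(round(accuracy_score(y_te, clf.predict(X_te)), 2))
```

In practice the features would be 16S-derived genus counts (typically normalized or rarefied), and sensitivity/specificity would matter more than raw accuracy for a food-safety screen.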


Author(s):  
Lidong Wu

The No-Free-Lunch theorem is an interesting and important theoretical result in machine learning. Based on the philosophy of the No-Free-Lunch theorem, we discuss at length the limitations of data-driven approaches to solving NP-hard problems.
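The core No-Free-Lunch intuition can be demonstrated exhaustively on a tiny domain: averaged over all possible target functions, any fixed prediction rule does no better off the training set than its opposite. This toy enumeration (my own illustration, not from the paper) shows the tie exactly.

```python
# NFL toy: on a 3-point domain, train on points 0 and 1, predict point 2.
# Averaged over ALL 2^3 labelings, a learner and its negation tie at 0.5.
import itertools

DOMAIN_SIZE = 3  # three points; the first two form the training set

def learner_a(x, train):
    """Predicts the majority training label."""
    labels = [y for _, y in train]
    return int(sum(labels) >= len(labels) / 2)

def learner_b(x, train):
    """Always predicts the opposite of learner_a."""
    return 1 - learner_a(x, train)

def avg_ots_accuracy(learner):
    """Off-training-set accuracy averaged over every possible labeling."""
    total = 0
    for labeling in itertools.product([0, 1], repeat=DOMAIN_SIZE):
        train = [(0, labeling[0]), (1, labeling[1])]
        total += learner(2, train) == labeling[2]
    return total / 2 ** DOMAIN_SIZE

print(avg_ots_accuracy(learner_a), avg_ots_accuracy(learner_b))  # 0.5 0.5
```

Uniform averaging over all problem instances is exactly the assumption under which data-driven heuristics for NP-hard problems lose their edge; real instance distributions are what give them room to work.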


Author(s):  
Miss. Samiksha Arvind Kale ◽  
Prof. Dr. A. B. Gadicha

The heart plays a significant role in living organisms. Diagnosis and prediction of heart-related diseases require precision and correctness, because even a slight mistake can cause serious harm or the death of a person; deaths related to heart disease are numerous and their count is rising rapidly. To address this problem, there is an essential need for a prediction system that raises awareness about these diseases. Machine learning, a branch of artificial intelligence (AI), provides valuable support in predicting events by training on past observations. In this paper, we evaluate the accuracy of machine learning algorithms for predicting heart disease, namely k-nearest neighbors, decision tree, linear regression and support vector machine (SVM), using a UCI repository dataset for training and testing. For the Python implementation, the Anaconda (Jupyter) notebook is a suitable tool, offering many libraries that make the work more accurate and precise.
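A comparison of this kind is straightforward with scikit-learn. The sketch below uses synthetic data in place of the UCI dataset, and logistic regression stands in for the "linear regression" named above, as the usual classification analogue; the printed accuracies are illustrative only.

```python
# Train four classifiers on one shared split and report test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in; the UCI heart disease data has 13 features as well.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.2f}")
```

For medical data, a fixed train/test split like this should at minimum be replaced by cross-validation, and features should be scaled before k-NN and SVM.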


2022 ◽  
Vol 65 (1) ◽  
pp. 76-85
Author(s):  
Lance Fortnow

Advances in algorithms, machine learning, and hardware can help tackle many NP-hard problems once thought impossible.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Beatrix C. Hiesmayr

Abstract
Entanglement detection in high-dimensional systems is an NP-hard problem, since no efficient procedure for it is known. Given a bipartite quantum state of interest, free entanglement can be detected efficiently by the PPT criterion (Peres-Horodecki criterion), in contrast to bound entanglement, a curious form of entanglement that also cannot be distilled into maximally (free) entangled states. Only a few bound entangled states have been found, typically by constructing dedicated entanglement witnesses, so the question naturally arises: how large is the volume of such states? We define a large family of magically symmetric states of bipartite qutrits for which we find 82% to be free entangled, 2% to be certainly separable and as much as 10% to be bound entangled, which shows that this kind of entanglement is not rare. Via various machine learning algorithms we confirm that the remaining 6% of states are more likely to belong to the set of separable states than to the set of bound entangled states. Most importantly, we find via dimension-reduction algorithms that there is a strong two-dimensional (linear) sub-structure in the set of bound entangled states. This revealed structure opens a novel path to finding and characterizing bound entanglement, towards solving the long-standing problem of what the existence of bound entanglement implies.
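The PPT criterion mentioned above is a short computation: take the partial transpose of the density matrix and check for a negative eigenvalue. The sketch below uses two qubits for brevity (the paper studies qutrits, d=3, which the same function handles via its dimension parameter).

```python
# PPT (Peres-Horodecki) test: a state rho is PPT if rho^{T_B}, the
# partial transpose over the second subsystem, has no negative
# eigenvalue. An NPT state is necessarily (free) entangled; bound
# entangled states are PPT yet entangled, so PPT alone cannot find them.
import numpy as np

def partial_transpose(rho, d=2):
    # View the (d*d) x (d*d) matrix with indices (i, j, k, l) and swap
    # the second-subsystem indices j and l: (i, j, k, l) -> (i, l, k, j).
    return rho.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

def is_ppt(rho, d=2):
    return bool(np.min(np.linalg.eigvalsh(partial_transpose(rho, d))) >= -1e-12)

# Two-qubit singlet (|01> - |10>)/sqrt(2): NPT, hence free entangled.
psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
rho = np.outer(psi, psi)
print(is_ppt(rho))                # False (entangled)
print(is_ppt(np.eye(4) / 4))      # True (maximally mixed, separable)
```

For qutrits, a PPT result is inconclusive: the state may be separable or bound entangled, which is why the study above needs witnesses and machine learning to separate those two sets.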


2021 ◽  
Vol 10 (5) ◽  
pp. 2857-2865
Author(s):  
Moanda Diana Pholo ◽  
Yskandar Hamam ◽  
Abdel Baset Khalaf ◽  
Chunling Du

Available literature reports several lymphoma cases misdiagnosed as tuberculosis, especially in countries with a heavy TB burden. This frequent misdiagnosis is due to the fact that the two diseases can present with similar symptoms. The present study therefore aims to analyse and explore TB as well as lymphoma case reports using Natural Language Processing tools and evaluate the use of machine learning to differentiate between the two diseases. As a starting point in the study, case reports were collected for each disease using web scraping. Natural language processing tools and text clustering were then used to explore the created dataset. Finally, six machine learning algorithms were trained and tested on the collected data, which contained 765 lymphoma and 546 tuberculosis case reports. Each method was evaluated using various performance metrics. The results indicated that the multi-layer perceptron model achieved the best accuracy (93.1%), recall (91.9%) and precision score (93.7%), thus outperforming other algorithms in terms of correctly classifying the different case reports.
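The pipeline described above (text features plus a multi-layer perceptron) can be sketched as follows. The toy documents are invented stand-ins for the scraped case reports, and TF-IDF is an assumed vectorization choice; the real study's preprocessing may differ.

```python
# TF-IDF vectorization + MLP classifier on toy "case report" snippets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

docs = [
    "chronic cough night sweats positive sputum smear",
    "painless lymphadenopathy reed sternberg cells on biopsy",
    "weight loss cavitary lung lesion acid fast bacilli",
    "mediastinal mass b symptoms nodular sclerosis histology",
]
labels = ["tuberculosis", "lymphoma", "tuberculosis", "lymphoma"]

clf = make_pipeline(TfidfVectorizer(),
                    MLPClassifier(max_iter=2000, random_state=0))
clf.fit(docs, labels)
print(clf.predict(["night sweats and acid fast bacilli in sputum"]))
```

With only four documents this merely shows the plumbing; the study's 1,311 case reports and held-out evaluation are what make the reported 93.1% accuracy meaningful.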


2021 ◽  
Vol 1 (1) ◽  
pp. 146-176
Author(s):  
Israa Nadher ◽  
Mohammad Ayache ◽  
Hussein Kanaan

Abstract: Information decision support systems are becoming more widely used as we live in the era of digital data and the rise of artificial intelligence. Heart disease, as one of the best-known and most dangerous diseases, receives very important attention; this attention is translated into digital prediction systems that detect the presence of the disease from the available data and information. Such systems have faced many problems since they first appeared, but now, with the development of the machine learning field, we use them to build new models to detect the presence of this disease. Besides the algorithms, data is also central, forming the heart of prediction systems: prediction algorithms take decisions, those decisions must be based on facts, and those facts are extracted from data, so data is the starting point of every system. In this paper we propose a heart disease prediction system using machine learning algorithms. For the data we used the Cleveland dataset; this dataset is normalized and then divided into three training/testing scenarios: 80%-20%, 50%-50% and 30%-70%. These three scenarios apply whether the dataset is normalized or not. For each scenario we used three machine learning algorithms, namely SVM, SMO and MLP, and for these algorithms we used two different kernels to compare the results. These kernel variants are added to the collection of scenarios above, so that at the top level we have two types, normalized and unnormalized dataset; for each we have three variants according to the training/testing proportions; and for each of these we have two variants according to the type of kernel, giving 30 scenarios in total. Our proposed system has shown a dominance in accuracy over previous works.
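The scenario grid described above ({normalized, raw} x {80/20, 50/50, 30/70} x two kernels) can be enumerated mechanically. In this sketch, synthetic data replaces the Cleveland dataset and the MLP is omitted; SMO is the solver LibSVM (and hence scikit-learn's SVC) already uses internally, so SVC stands in for both SVM variants here.

```python
# Enumerate {normalization} x {split} x {kernel} scenarios for an SVM.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in; the Cleveland dataset has 13 clinical features.
X_raw, y = make_classification(n_samples=300, n_features=13, random_state=0)

results = {}
for norm in (False, True):
    X = StandardScaler().fit_transform(X_raw) if norm else X_raw
    for test_size in (0.2, 0.5, 0.7):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=0)
        for kernel in ("linear", "rbf"):
            acc = SVC(kernel=kernel).fit(X_tr, y_tr).score(X_te, y_te)
            results[(norm, test_size, kernel)] = acc

for key, acc in sorted(results.items()):
    print(key, round(acc, 2))
```

Running the grid this way makes the comparison reproducible: every scenario shares one random seed, so accuracy differences reflect normalization, split size and kernel rather than sampling noise.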

