Identifying and Fusing Duplicate Features for Data Mining

Mapping Intimacies ◽

10.5753/sbbd.2020.13631 ◽

2020 ◽

Author(s):

Hortênsia Costa Barcelos ◽

Mariana Recamonde Mendoza ◽

Viviane Pereira Moreira

Keyword(s):

Machine Learning ◽

Predictive Power ◽

Ground Truth ◽

Mortality Prediction ◽

Simple Method ◽

Training Time ◽

Duplicate Detection ◽

Original Dataset ◽

Prediction Test ◽

Small Set

This work addresses the problem of identifying and fusing duplicate features in machine learning datasets. Our goal is to evaluate the hypothesis that fusing duplicate features can improve the predictive power of the data while reducing training time. We propose a simple method for duplicate detection and fusion based on a small set of features. An evaluation comparing the duplicate detection against a manually generated ground truth obtained an F1 of 0.91. Then, the effects of fusion were measured on a mortality prediction test. The results were inferior to those obtained with the original dataset. Thus, we concluded that the investigated hypothesis does not hold.
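As an illustration of how a duplicate-detection result can be scored against a manually built ground truth, the following minimal sketch computes F1 over unordered feature pairs. The abstract does not describe the detection method itself, so only the evaluation step is shown; the feature names and the pair-based representation are assumptions made for the example.

```python
def f1_score(predicted_pairs, truth_pairs):
    """F1 of predicted duplicate-feature pairs against a ground-truth set."""
    # Treat each pair as unordered so ("a", "b") matches ("b", "a").
    predicted = {frozenset(p) for p in predicted_pairs}
    truth = {frozenset(p) for p in truth_pairs}
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical feature names, for illustration only (not from the paper).
predicted = [("heart_rate", "hr"), ("temp_c", "temperature"), ("age", "weight")]
truth = [("hr", "heart_rate"), ("temperature", "temp_c"), ("sbp", "systolic_bp")]

score = f1_score(predicted, truth)
print(f"F1 = {score:.2f}")
```

Here two of the three predicted pairs are correct and two of the three true pairs are found, so precision and recall are both 2/3.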

Nanosecond Photodynamics Simulations of a Cis-Trans Isomerization Are Enabled by Machine Learning

10.26434/chemrxiv.13047863 ◽

2020 ◽

Author(s):

Jingbai Li ◽

Patrick Reiser ◽

André Eberhard ◽

Pascal Friederich ◽

Steven Lopez

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Excited State ◽

Adaptive Sampling ◽

Computational Cost ◽

Ground Truth ◽

Absolute Error ◽

Photochemical Reactions ◽

Computational Techniques ◽

Full Potential

Photochemical reactions are being increasingly used to construct complex molecular architectures with mild and straightforward reaction conditions. Computational techniques are increasingly important to understand the reactivities and chemoselectivities of photochemical isomerization reactions because they offer molecular bonding information along the excited-state(s) of photodynamics. These photodynamics simulations are resource-intensive and are typically limited to 1–10 picoseconds and 1,000 trajectories due to high computational cost. Most organic photochemical reactions have excited-state lifetimes exceeding 1 picosecond, which places them outside possible computational studies. Westermayr et al. demonstrated that a machine learning approach could significantly lengthen photodynamics simulation times for a model system, methylenimmonium cation (CH2NH2+). We have developed a Python-based code, Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics (PyRAI2MD), to accomplish the unprecedented 10 ns cis-trans photodynamics of trans-hexafluoro-2-butene (CF3–CH=CH–CF3) in 3.5 days. The same simulation would take approximately 58 years with ground-truth multiconfigurational dynamics. We proposed an innovative scheme combining Wigner sampling, geometrical interpolations, and short-time quantum chemical trajectories to effectively sample the initial data, facilitating the adaptive sampling to generate an informative and data-efficient training set with 6,232 data points. Our neural networks achieved chemical accuracy (mean absolute error of 0.032 eV). Our 4,814 trajectories reproduced the S1 half-life (60.5 fs), the photochemical product ratio (trans:cis = 2.3:1), and autonomously discovered a pathway towards a carbene. The neural networks have also shown the capability of generalizing the full potential energy surface with chemically incomplete data (trans → cis but not cis → trans pathways) that may offer future automated photochemical reaction discoveries.
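The reported timings (3.5 days with machine learning versus an estimated 58 years with multiconfigurational dynamics) imply a speedup that can be worked out directly. The short calculation below is only a back-of-the-envelope check; the year length of 365.25 days is an assumption, and the result should be read as an order of magnitude.

```python
DAYS_PER_YEAR = 365.25   # assumption: average Julian year

ml_days = 3.5            # reported wall time for the 10 ns simulation
reference_years = 58.0   # reported estimate for ground-truth dynamics
simulated_ns = 10.0      # reported simulation length

# Ratio of the two wall times, both expressed in days.
speedup = reference_years * DAYS_PER_YEAR / ml_days

# Cost per simulated nanosecond under each approach.
ml_days_per_ns = ml_days / simulated_ns
reference_years_per_ns = reference_years / simulated_ns

print(f"speedup is roughly {speedup:,.0f}x")
print(f"{ml_days_per_ns:.2f} days/ns vs {reference_years_per_ns:.1f} years/ns")
```

Under these assumptions the speedup comes out at roughly 6,000-fold.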

Using artificial neural network condensation to facilitate adaption of machine learning in medical settings by reducing computational burden (Preprint)

10.2196/preprints.20767 ◽

2020 ◽

Author(s):

Dianbo Liu

Keyword(s):

Neural Network ◽

Machine Learning ◽

Third World ◽

Mortality Prediction ◽

Neural Net ◽

Medical Settings ◽

Hidden Layer ◽

Applications Of Machine Learning ◽

Computational Resources ◽

Developed Nations

BACKGROUND Applications of machine learning (ML) in health care can have a great impact on people’s lives. At the same time, medical data is usually large, requiring a significant amount of computational resources. Although this might not be a problem for wide adoption of ML tools in developed nations, the availability of computational resources can be limited in third-world nations and on mobile devices. This can prevent many people from benefiting from advances in ML applications for health care. OBJECTIVE In this paper we explored three methods to increase the computational efficiency of either a recurrent neural network (RNN) or a feedforward (deep) neural network (DNN) without compromising its accuracy. We used in-patient mortality prediction on an intensive care dataset as our case analysis. METHODS We reduced the size of the RNN and DNN by pruning “unused” neurons. Additionally, we modified the RNN structure by adding a hidden layer to the RNN cell while reducing the total number of recurrent layers, which reduced the total number of parameters in the network. Finally, we implemented quantization on the DNN, forcing the weights to be 8 bits instead of 32 bits. RESULTS We found that all methods increased implementation efficiency, including training speed, memory size, and inference speed, without reducing the accuracy of mortality prediction. CONCLUSIONS These improvements allow the implementation of sophisticated NN algorithms on devices with lower computational resources.
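To make the quantization step concrete, the sketch below maps floating-point weights to signed 8-bit integers and back. The abstract does not state which quantization scheme was used; symmetric linear quantization with a single shared scale is an assumption chosen for simplicity, and the weight values are made up.

```python
def quantize_int8(weights):
    """Map float weights to signed 8-bit integers using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from their 8-bit representation."""
    return [q * scale for q in quantized]


# Hypothetical layer weights, for illustration only.
weights = [0.82, -1.27, 0.05, 0.0, 0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q, f"max error = {max_error:.4f}")
```

Storing each weight in 8 bits instead of 32 cuts weight memory by a factor of four, at the cost of a rounding error of at most half a quantization step per weight.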

Prediction and Analysis of the Severity and Number of Suburban Accidents Using Logit Model, Factor Analysis and Machine Learning: A case study in a developing country

SN Applied Sciences ◽

10.1007/s42452-020-04081-3 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Meisam Ghasedi ◽

Maryam Sarfjoo ◽

Iraj Bargegol

Keyword(s):

Machine Learning ◽

Factor Analysis ◽

Logit Model ◽

Predictive Power ◽

Dominant Role ◽

Learning Approaches ◽

Lighting Condition ◽

Rainy Weather ◽

Pedestrian Accidents ◽

Effective Principal

Abstract The purpose of this study is to investigate the factors affecting vehicle and pedestrian accidents on the busiest suburban highway of Guilan Province in the north of Iran and to provide the most accurate prediction model. To this end, the effective principal variables and the probability of occurrence of each category of crashes are analyzed and computed using factor analysis, logit, and machine learning approaches simultaneously. This method not only contributes to achieving a comprehensive and efficient model for identifying the major contributing factors, but can also provide officials with suggestions for taking more precise measures to lessen accident impacts and improve road safety. Both the factor analysis and the logit model show the significant roles of exceeding the lawful speed, rainy weather, and driver age (30–50) in the severity of vehicle accidents. On the other hand, rainy weather and lighting condition, as the most contributing factors in pedestrian accident severity, underline the dominant role of environmental factors in the severity of all vehicle-pedestrian accidents. Moreover, compared with both of the other methods, the machine learning model has higher predictive power in all cases, especially for pedestrian accidents, with a 41.6% increase in predictive power for fatal accidents and 12.4% for all accidents. Thus, the Artificial Neural Network model is chosen as the superior approach for predicting the number and severity of crashes. In addition, the good performance and validity of the machine learning model are confirmed through performance and sensitivity analysis.

Federated Quantum Machine Learning

Entropy ◽

10.3390/e23040460 ◽

2021 ◽

Vol 23 (4) ◽

pp. 460

Author(s):

Samuel Yen-Chi Chen ◽

Shinjae Yoo

Keyword(s):

Machine Learning ◽

Data Privacy ◽

Research Direction ◽

Future Research ◽

Quantum Computers ◽

Training Time ◽

Quantum Neural Network ◽

Distributed Training ◽

Machine Learning Model ◽

Quantum Machine Learning

Distributed training across several quantum computers could significantly improve training time, and if we could share the learned model rather than the data, it could also improve data privacy, as the training would happen where the data is located. One potential scheme to achieve this is federated learning (FL), which consists of several clients or local nodes learning on their own data and a central node that aggregates the models collected from those local nodes. However, to the best of our knowledge, no work has yet been done on quantum machine learning (QML) in a federated setting. In this work, we present federated training of hybrid quantum-classical machine learning models, although our framework could be generalized to pure quantum machine learning models. Specifically, we consider a quantum neural network (QNN) coupled with a classical pre-trained convolutional model. Our distributed federated learning scheme achieved almost the same level of trained model accuracy while training significantly faster. It demonstrates a promising future research direction for scaling and privacy aspects.
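The aggregation step performed by the central node can be sketched as a weighted average of the clients' parameter vectors. The abstract does not specify the aggregation rule; weighting by local dataset size, as in the common federated averaging approach, is an assumption, and the parameter values below are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical parameters from three local nodes (illustration only).
clients = [[0.10, 0.50], [0.30, 0.70], [0.20, 0.60]]
sizes = [100, 300, 100]  # number of local training examples per node

global_weights = federated_average(clients, sizes)
print(global_weights)
```

Only the parameter vectors reach the central node in this sketch; the local training data never leaves the clients, which is the privacy property the abstract refers to.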

A Simple Method of Predicting Autumn Leaf Coloring Date Using Machine Learning with Spring Leaf Unfolding Date

Asia-Pacific Journal of Atmospheric Sciences ◽

10.1007/s13143-021-00251-4 ◽

2021 ◽

Author(s):

Sehyun Lee ◽

Sujong Jeong ◽

Chang-Eui Park ◽

Jongho Kim

Keyword(s):

Machine Learning ◽

Simple Method ◽

Leaf Unfolding ◽

Autumn Leaf ◽

Spring Leaf ◽

Leaf Unfolding Date

A NOVEL EXTENSIVE EX-VIVO OCT DATABASE FROM MURINE MODELS OF COLORECTAL CANCER

British Journal of Surgery ◽

10.1093/bjs/znab160.030 ◽

2021 ◽

Vol 108 (Supplement_3) ◽

Author(s):

J Bote ◽

J F Ortega-Morán ◽

C L Saratxaga ◽

B Pagador ◽

A Picón ◽

...

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Structural Information ◽

Ex Vivo ◽

Ground Truth ◽

Colon Polyps ◽

Learning Methods ◽

Non Invasive ◽

Machine Learning Methods ◽

In Situ Methods

Abstract INTRODUCTION New non-invasive technologies for improving early diagnosis of colorectal cancer (CRC) are demanded by clinicians. Optical Coherence Tomography (OCT) provides sub-surface structural information and offers diagnosis capabilities of colon polyps, further improved by machine learning methods. Databases of OCT images are necessary to facilitate algorithm development and testing. MATERIALS AND METHODS A database has been acquired from rat colonic samples with a Thorlabs OCT system with 930 nm centre wavelength that provides 1.2 kHz A-scan rate, 7 μm axial resolution in air, 4 μm lateral resolution, 1.7 mm imaging depth in air, 6 mm x 6 mm FOV, and 107 dB sensitivity. The colon from anaesthetised animals has been excised and samples have been extracted and preserved for ex-vivo analysis with the OCT equipment. RESULTS This database consists of OCT 3D volumes (C-scans) and 2D images (B-scans) of murine samples from: 1) healthy tissue, for ground-truth comparison (18 samples; 66 C-scans; 17,478 B-scans); 2) hyperplastic polyps, obtained from an induced colorectal hyperplastic murine model (47 samples; 153 C-scans; 42,450 B-scans); 3) neoplastic polyps (adenomatous and adenocarcinomatous), obtained from clinically validated Pirc F344/NTac-Apcam1137 rat model (232 samples; 564 C-scans; 158,557 B-scans); and 4) unknown tissue (polyp adjacent, presumably healthy) (98 samples; 157 C-scans; 42,070 B-scans). CONCLUSIONS A novel extensive ex-vivo OCT database of a murine CRC model has been obtained and will be openly published for the research community. It can be used for classification/segmentation machine learning methods, for correlation between OCT features and histopathological structures, and for developing new non-invasive in-situ methods of diagnosis of colorectal cancer.
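The overall size of the database can be derived from the four per-group counts given in the RESULTS. The short tally below simply sums the reported figures; the group labels are shortened names introduced for this example.

```python
# (samples, C-scans, B-scans) per tissue group, as reported in the abstract.
groups = {
    "healthy": (18, 66, 17_478),
    "hyperplastic": (47, 153, 42_450),
    "neoplastic": (232, 564, 158_557),
    "unknown": (98, 157, 42_070),
}

# Sum each column across the four groups.
samples, c_scans, b_scans = (sum(col) for col in zip(*groups.values()))

print(f"{samples} samples, {c_scans} C-scans, {b_scans:,} B-scans")
```

Neoplastic polyps account for more than half of the samples, which is worth keeping in mind when using the database for classification.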

Application of machine learning-based models to boost the predictive power of the SPAN index

International Journal of Neuroscience ◽

10.1080/00207454.2021.1881092 ◽

2021 ◽

pp. 1-11

Author(s):

Chen-Chih Chung ◽

Oluwaseun Adebayo Bamodu ◽

Chien-Tai Hong ◽

Lung Chan ◽

Hung-Wen Chiu

Keyword(s):

Machine Learning ◽

Predictive Power

Experimental Evaluation of Computer Vision and Machine Learning-Based UAV Detection and Ranging

Drones ◽

10.3390/drones5020037 ◽

2021 ◽

Vol 5 (2) ◽

pp. 37

Author(s):

Bingsheng Wei ◽

Martin Barczyk

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Tracking System ◽

Ground Truth ◽

White Background ◽

Cascade Classifier ◽

Detection Algorithms ◽

Squared Error ◽

Test Conditions ◽

Video Feed

We consider the problem of vision-based detection and ranging of a target UAV using the video feed from a monocular camera onboard a pursuer UAV. Our previously published work in this area employed a cascade classifier algorithm to locate the target UAV, which was found to perform poorly in complex background scenes. We thus study the replacement of the cascade classifier algorithm with newer machine learning-based object detection algorithms. Five candidate algorithms are implemented and quantitatively tested in terms of their efficiency (measured as frames per second processing rate), accuracy (measured as the root mean squared error between ground truth and detected location), and consistency (measured as mean average precision) in a variety of flight patterns, backgrounds, and test conditions. Assigning relative weights of 20%, 40% and 40% to these three criteria, we find that when flying over a white background, the top three performers are YOLO v2 (76.73 out of 100), Faster RCNN v2 (63.65 out of 100), and Tiny YOLO (59.50 out of 100), while over a realistic background, the top three performers are Faster RCNN v2 (54.35 out of 100), SSD MobileNet v1 (51.68 out of 100) and SSD Inception v2 (50.72 out of 100), leading us to recommend Faster RCNN v2 as the preferred solution. We then provide a roadmap for further work in integrating the object detector into our vision-based UAV tracking system.
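The 20%/40%/40% weighting can be illustrated with a small scoring function. The abstract does not say how each criterion is normalized before weighting, so this sketch assumes every sub-score is already on a 0 to 100 scale; the sub-score values are hypothetical and do not correspond to any detector in the study.

```python
# Relative weights stated in the abstract.
WEIGHTS = {"efficiency": 0.20, "accuracy": 0.40, "consistency": 0.40}


def overall_score(sub_scores):
    """Weighted sum of per-criterion scores, each assumed to be in [0, 100]."""
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical normalized sub-scores for one detector (illustration only).
example = {"efficiency": 90.0, "accuracy": 70.0, "consistency": 80.0}
total = overall_score(example)

print(f"overall score: {total:.2f} out of 100")
```

Because accuracy and consistency together carry 80% of the weight, a fast but imprecise detector cannot rank highly under this scheme.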

Minimizing Training Time of Distributed Machine Learning by Reducing Data Communication

IEEE Transactions on Network Science and Engineering ◽

10.1109/tnse.2021.3073897 ◽

2021 ◽

pp. 1-1

Author(s):

Yubin Duan ◽

Ning Wang ◽

Jie Wu

Keyword(s):

Machine Learning ◽

Data Communication ◽

Training Time ◽

Distributed Machine Learning

CAFD: Context-Aware Fault Diagnostic Scheme towards Sensor Faults Utilizing Machine Learning

Sensors ◽

10.3390/s21020617 ◽

2021 ◽

Vol 21 (2) ◽

pp. 617

Author(s):

Umer Saeed ◽

Young-Doo Lee ◽

Sana Ullah Jan ◽

Insoo Koo

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Diagnostic System ◽

Machine Learning Algorithms ◽

Support Vector ◽

Context Aware ◽

Sensor Faults ◽

Training Time ◽

Low Intensity ◽

Fault Diagnostic

As key components of Cyber-Physical Systems, sensors are susceptible to failures due to complex environments, low-quality production, and aging. When defective, sensors either stop communicating or convey incorrect information. These unsteady situations threaten the safety, economy, and reliability of a system. The objective of this study is to construct a lightweight machine learning-based fault detection and diagnostic system within the limited energy resources, memory, and computation of a Wireless Sensor Network (WSN). In this paper, a Context-Aware Fault Diagnostic (CAFD) scheme is proposed based on an ensemble learning algorithm called Extra-Trees. To evaluate the performance of the proposed scheme, a realistic WSN scenario composed of humidity and temperature sensor observations is replicated with extremely low-intensity faults. Six commonly occurring types of sensor fault are considered: drift, hard-over/bias, spike, erratic/precision degradation, stuck, and data-loss. The proposed CAFD scheme is able to accurately detect and diagnose low-intensity sensor faults in a timely manner. Moreover, the efficiency of the Extra-Trees algorithm in terms of diagnostic accuracy, F1-score, ROC-AUC, and training time is demonstrated by comparison with cutting-edge machine learning algorithms: a Support Vector Machine and a Neural Network.
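To give a feel for what some of the listed fault types look like in a signal, the sketch below injects four of them into a short series of readings. The abstract does not define the faults mathematically; the definitions used here (constant offset for bias, linearly growing offset for drift, a frozen value for stuck, a single large deviation for spike) are common conventions and should be treated as assumptions, as should all function names and values.

```python
def inject_bias(signal, offset):
    """Hard-over/bias: every reading is shifted by a constant offset."""
    return [x + offset for x in signal]


def inject_drift(signal, slope):
    """Drift: the error grows linearly with the sample index."""
    return [x + slope * i for i, x in enumerate(signal)]


def inject_stuck(signal, start):
    """Stuck: from `start` onward the sensor repeats its last good value."""
    return signal[:start] + [signal[start]] * (len(signal) - start)


def inject_spike(signal, index, magnitude):
    """Spike: one isolated reading deviates sharply."""
    return [x + magnitude if i == index else x for i, x in enumerate(signal)]


# Hypothetical clean temperature readings in degrees Celsius.
clean = [20.0, 20.5, 21.0, 21.5, 22.0]

biased = inject_bias(clean, 0.5)
drifting = inject_drift(clean, 0.1)
stuck = inject_stuck(clean, 2)
spiked = inject_spike(clean, 3, 5.0)

print(stuck)
```

Labelled series produced this way could serve as training data for a classifier; low-intensity faults correspond to small offsets, slopes, and magnitudes, which is what makes them hard to detect.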
