Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes

Deep neural networks are the dominant approach in many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora and the capability of deep architectures to shape internal language mechanisms through self-supervised learning (also known as “pre-training”), versatile, high-performing models are released continuously for every new network design. These networks, in some way, learn a probability distribution of words and relations across the training collection, inheriting the potential flaws, inconsistencies, and biases contained in that collection. As pre-trained models have proven very useful for transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias formally and explore how it has been treated in several networks, in terms of both detection and correction. In addition, we identify available resources and propose a strategy for dealing with bias in deep NLP.

Fuzzy based feature engineering architecture for sentiment analysis of medical discussion over online social networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202874 ◽

2021 ◽

pp. 1-13

Author(s):

C S Pavan Kumar ◽

L D Dhinesh Babu

Keyword(s):

Machine Learning ◽

Social Networks ◽

Social Media ◽

Sentiment Analysis ◽

Membership Function ◽

Online Social Networks ◽

Learning Model ◽

Feature Engineering ◽

Machine Learning Model ◽

Social Media Platforms

Sentiment analysis is widely used to retrieve the hidden sentiments in medical discussions on online social networking platforms such as Twitter, Facebook, and Instagram. People often convey their feelings about their medical problems on social media platforms. Practitioners and health care workers have started to observe these discussions to assess the impact of health-related issues on people. This helps in providing better care and improving quality of life. Dementia is a serious disease in western countries such as the United States of America and the United Kingdom, and the respective governments provide facilities to the people affected. There is much chatter on social media platforms concerning patient care, the healthy measures to follow to avoid the disease, and how to check for early indications. This chatter has to be carefully monitored to help officials take the necessary precautions for the betterment of those affected. A novel feature engineering architecture, organized as a pipeline that involves a feature split, is proposed for sentiment analysis of medical chatter on online social networks; it can be used with any machine learning model. The proposed model uses a fuzzy membership function to refine the outputs: the sentiment score obtained by the machine learning model is fuzzified using a trapezoidal membership function and defuzzified using the center-of-sums method. Three datasets are used to compare the proposed model with the regular model. The proposed approach delivered better results than the regular approach and proved to be effective for sentiment analysis of medical discussions on online social networks.
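The abstract names a trapezoidal membership function for fuzzification and the center-of-sums method for defuzzification, but it does not give the fuzzy sets. The Python sketch below illustrates both steps under stated assumptions: the sentiment score lies in [-1, 1], and the three sets in `FUZZY_SETS` and their breakpoints are hypothetical choices, not the authors' parameters. Each set is clipped at the degree the score received, the clipped curves are added, and their weighted mean over a discretized universe gives the refined score.

```python
# Hypothetical fuzzy sets over a sentiment score in [-1, 1]; (a, b, c, d) breakpoints
# are illustrative assumptions, not values from the paper.
FUZZY_SETS = {
    "negative": (-1.0, -1.0, -0.6, -0.1),
    "neutral": (-0.4, -0.1, 0.1, 0.4),
    "positive": (0.1, 0.6, 1.0, 1.0),
}


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear on the shoulders."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge; here a <= x < b, so b > a
    return (d - x) / (d - c)  # falling edge; here c < x <= d, so d > c


def fuzzify(score):
    """Degree to which a crisp sentiment score belongs to each fuzzy set."""
    return {label: trapezoid(score, *abcd) for label, abcd in FUZZY_SETS.items()}


def defuzzify_center_of_sums(degrees, steps=2001):
    """Center of sums: clip each set at its degree, add the clipped curves,
    and return the x-weighted mean over a discretized [-1, 1] universe."""
    numerator = denominator = 0.0
    for i in range(steps):
        x = -1.0 + 2.0 * i / (steps - 1)
        # Overlapping clipped sets are summed rather than max-ed (the COS convention)
        total = sum(min(degrees[label], trapezoid(x, *abcd))
                    for label, abcd in FUZZY_SETS.items())
        numerator += x * total
        denominator += total
    return numerator / denominator if denominator > 0 else 0.0


raw_score = 0.8  # hypothetical machine learning model output
print(defuzzify_center_of_sums(fuzzify(raw_score)))
```

With these hypothetical sets, a raw score of 0.8 fully activates `positive` and is defuzzified to about 0.66, the centroid of that trapezoid, while a score of -0.3 partially activates both `negative` and `neutral` and is pulled to a negative value. Summing overlapping areas instead of taking their union is what distinguishes center of sums from the centroid method and makes it cheaper to compute.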

Validating Deep Neural Networks for Online Decoding of Motor Imagery Movements from EEG Signals

Sensors ◽

10.3390/s19010210 ◽

2019 ◽

Vol 19 (1) ◽

pp. 210 ◽

Cited By ~ 32

Author(s):

Zied Tayeb ◽

Juri Fedjaev ◽

Nejla Ghaboosi ◽

Christoph Richter ◽

Lukas Everding ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Convolutional Neural Network ◽

Motor Imagery ◽

Classification Performance ◽

Feature Engineering ◽

Learning Models ◽

Eeg Signals ◽

Learning Methods

Non-invasive, electroencephalography (EEG)-based brain-computer interfaces (BCIs) for motor imagery movements translate the subject’s motor intention into control signals by classifying the EEG patterns caused by different imagination tasks, e.g., hand movements. This type of BCI has been widely studied and used as an alternative mode of communication and environmental control for disabled patients, such as those suffering from a brainstem stroke or a spinal cord injury (SCI). Notwithstanding the success of traditional machine learning methods in classifying EEG signals, these methods still rely on hand-crafted features. Extracting such features is difficult because of the high non-stationarity of EEG signals, which is a major cause of the stagnating progress in classification performance. Remarkable advances in deep learning methods allow end-to-end learning without any feature engineering, which could benefit BCI motor imagery applications. We developed three deep learning models: (1) a long short-term memory (LSTM) network; (2) a spectrogram-based convolutional neural network (CNN); and (3) a recurrent convolutional neural network (RCNN), for decoding motor imagery movements directly from raw EEG signals without any manual feature engineering. Results were evaluated on our own publicly available EEG data, collected from 20 subjects, and on an existing dataset known as the 2b EEG dataset from “BCI Competition IV”. Overall, better classification performance was achieved with the deep learning models than with state-of-the-art machine learning techniques, which could chart a route ahead for developing new robust techniques for EEG signal decoding. We underpin this point by demonstrating successful real-time control of a robotic arm using our CNN-based BCI.
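For the spectrogram-based CNN, each raw EEG channel is first turned into a time-frequency image. The abstract gives no window parameters, so the pure-Python sketch below is only illustrative: the 250 Hz sampling rate, 1 s Hann window, and 50% overlap are assumptions, and `stft_magnitude` is a hypothetical helper that uses a direct DFT for readability where a real pipeline would call an FFT library. With a window of N samples at sampling rate fs, bin k corresponds to k·fs/N Hz.

```python
import cmath
import math


def stft_magnitude(signal, window_size, hop):
    """Magnitude spectrogram: one row per window position, one column per
    frequency bin k = 0 .. window_size // 2 (bin k is k * fs / window_size Hz)."""
    # Symmetric Hann window tapers frame edges to reduce spectral leakage
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
              for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_size)]
        row = []
        for k in range(window_size // 2 + 1):
            # Direct DFT of bin k: O(N^2) per frame, acceptable for a sketch
            coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                        for n in range(window_size))
            row.append(abs(coeff))
        frames.append(row)
    return frames


FS = 250  # assumed sampling rate in Hz (hypothetical, not from the paper)
eeg = [math.sin(2 * math.pi * 10 * t / FS) for t in range(2 * FS)]  # synthetic 10 Hz signal
spectrogram = stft_magnitude(eeg, window_size=FS, hop=FS // 2)
```

On this synthetic 2 s input, the result has three frames of 126 bins each, and every frame peaks in bin 10 (10 Hz), inside the roughly 8 to 13 Hz mu band commonly associated with motor imagery. Stacking such frames per electrode yields the 2-D input a CNN can consume.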

Feature engineering of machine-learning chemisorption models for catalyst design

Catalysis Today ◽

10.1016/j.cattod.2016.04.013 ◽

2017 ◽

Vol 280 ◽

pp. 232-238 ◽

Cited By ~ 67

Author(s):

Zheng Li ◽

Xianfeng Ma ◽

Hongliang Xin

Keyword(s):

Machine Learning ◽

Catalyst Design ◽

Feature Engineering

Automated Feature Engineering and Hyperparameter optimization for Machine Learning

2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS) ◽

10.1109/icaccs51430.2021.9441668 ◽

2021 ◽

Author(s):

Mihir Gada ◽

Zenil Haria ◽

Arnav Mankad ◽

Kaustubh Damania ◽

Smita Sankhe

Keyword(s):

Machine Learning ◽

Feature Engineering ◽

Hyperparameter Optimization

Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2011-000776 ◽

2012 ◽

Vol 19 (5) ◽

pp. 824-832 ◽

Cited By ~ 38

Author(s):

Yan Xu ◽

Kai Hong ◽

Junichi Tsujii ◽

Eric I-Chao Chang

Keyword(s):

Machine Learning ◽

Information Extraction ◽

Feature Engineering ◽

Rule Based ◽

Structured Information ◽

Discharge Summaries

An Efficient SMOTE-Based Deep Learning Model for Heart Attack Prediction

Scientific Programming ◽

10.1155/2021/6621622 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Muhammad Waqar ◽

Hassan Dawood ◽

Hussain Dawood ◽

Nadeem Majeed ◽

Ameen Banjar ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Heart Attack ◽

High Reliability ◽

Learning Algorithms ◽

Research Work ◽

Machine Learning Algorithms ◽

Feature Engineering ◽

Unequal Distribution ◽

The Given

Cardiac disease treatment often involves the acquisition and analysis of vast quantities of digital cardiac data. These data can be used for various beneficial purposes, and using them becomes even more important for critical diseases such as heart attack, where the patient’s life is often at stake. Machine learning and deep learning are two popular techniques that help make raw data useful. Some of the biggest problems that arise from using these techniques are massive resource utilization, extensive data preprocessing, the need for feature engineering, and ensuring reliable classification results. The proposed research work presents a cost-effective solution that predicts heart attack with high accuracy and reliability. It uses a UCI dataset to predict heart attack with various machine learning algorithms, without any feature engineering. Moreover, the dataset has an unequal distribution of positive and negative classes, which can reduce performance; the proposed work uses the synthetic minority oversampling technique (SMOTE) to handle this imbalance. Because the proposed system eliminates the need for feature engineering in classifying the dataset, and feature engineering often proves to be a costly process, the result is an efficient solution. The results show that, among all the machine learning algorithms tested, the SMOTE-based artificial neural network, when tuned properly, outperformed all other models and many existing systems. The high reliability of the proposed system ensures that it can be used effectively to predict heart attack.
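SMOTE itself is a standard technique: each synthetic sample lies on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The abstract does not say which implementation or neighbour count the authors used, so the pure-Python sketch below is a generic illustration; the function name `smote`, the default `k=5`, and drawing the base sample at random on each step are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new minority samples, each interpolated between a
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)  # seeded for reproducible oversampling
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

To balance a training set with, say, 10 majority rows and 4 minority rows, one would call `smote(minority, 6)`. Oversampling should be applied only to the training split, so that synthetic points never leak into the evaluation data.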

Prioritizing Small Molecule as Candidates for Drug Repositioning using Machine Learning

10.1101/331975 ◽

2018 ◽

Author(s):

Khader Shameer ◽

Kipp W. Johnson ◽

Benjamin S. Glicksberg ◽

Rachel Hodos ◽

Ben Readhead ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Small Molecule ◽

Chemical Space ◽

Drug Repositioning ◽

Chemical Properties ◽

Support Vector ◽

Feature Engineering ◽

Connectivity Map ◽

Molecular Features

Drug repositioning, i.e., identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that continues to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already proven to be safe in humans. In previous work, we have shown that drugs that have been successfully repositioned have different chemical properties from those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning approach that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is the chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. ChemVec features were used to train a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines, and an ensemble model combining the three algorithms). Models were created using different datasets: a Connectivity Map-based model, a DrugBank approved-compounds model, and a DrugBank full-compound-set model; of these, the Random Forest trained on Connectivity Map data performed best (AUC = 0.674). In brief, our study represents a novel approach to evaluating small molecules for drug repositioning opportunities and may further improve the discovery of pleiotropic drugs, i.e., those that treat multiple indications.
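The abstract reports model quality as AUC. As a reading aid, the sketch below computes ROC AUC directly from its probabilistic (Mann-Whitney) definition: the chance that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.674 in context. `roc_auc` is a hypothetical helper written in O(P·N) form for clarity, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive is scored above a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))
```

For example, with toy labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.3, 0.1]`, three of the four positive/negative pairs are ranked correctly, so `roc_auc` returns 0.75.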