Chemical Gas Sensors: Recent Developments, Challenges, and the Potential of Machine Learning—A Review

Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2877
Author(s):  
Usman Yaqoob ◽  
Mohammad I. Younis

Nowadays, there is increasing interest in fast, accurate, and highly sensitive smart gas sensors with excellent selectivity, boosted by the high demand for environmental safety and healthcare applications. Significant research has been conducted to develop sensors based on novel, highly sensitive and selective materials. Computational and experimental studies have been explored in order to identify the key factors in providing the maximum number of active sites for gas molecule adsorption, including bandgap tuning through nanostructures, metal/metal oxide catalytic reactions, and nanojunction formation. However, great challenges remain, specifically in terms of selectivity, which raises the need to combine interdisciplinary fields to build smarter, high-performance gas/chemical sensing devices. This review discusses the current major methods for enhancing gas sensing performance, along with their advantages and limitations, especially in terms of selectivity and long-term stability. The discussion then establishes a case for the use of smart machine learning techniques, which offer effective data processing approaches, in the development of highly selective smart gas sensors. We highlight the effectiveness of static, dynamic, and frequency-domain feature extraction techniques. Cross-validation methods are also covered; in particular, the choice of k in k-fold cross-validation is discussed for accurately training a model on the available datasets. We summarize different chemiresistive and field-effect transistor (FET) gas sensors, highlight their shortcomings, and then propose machine learning as a feasible option. The review concludes that machine learning is very promising for building the future generation of smart, sensitive, and selective sensors.
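The k-fold choice the review discusses can be sketched as follows. This is a minimal illustration on synthetic data, not the review's own datasets: the three features standing in for static, dynamic and frequency-domain descriptors, the class shifts, and the classifier choice are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Hypothetical sensor features (a static response magnitude, a dynamic
# slope, a frequency-domain component) for exposures to 3 target gases.
X = rng.normal(size=(120, 3))
y = rng.integers(0, 3, size=120)
X[y == 1] += 1.5  # shift class means so the toy problem is learnable
X[y == 2] -= 1.5

# Varying k trades training-set size per fold against the variance of
# the accuracy estimate -- the adjustment a small dataset forces.
results = {}
for k in (3, 5, 10):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
    results[k] = scores.mean()
    print(f"k={k}: mean accuracy {results[k]:.2f}")
```

With only a few hundred exposures per gas, a larger k leaves more data for training each fold at the cost of a noisier estimate; this is the trade-off behind tuning k to the dataset.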

2021 ◽  
Vol 15 ◽  
Author(s):  
Jinyu Zang ◽  
Yuanyuan Huang ◽  
Lingyin Kong ◽  
Bingye Lei ◽  
Pengfei Ke ◽  
...  

Recently, machine learning techniques have been widely applied in discriminative studies of schizophrenia (SZ) patients with multimodal magnetic resonance imaging (MRI); however, the effects of brain atlases and machine learning methods remain largely unknown. In this study, we collected MRI data for 61 first-episode SZ patients (FESZ), 79 chronic SZ patients (CSZ) and 205 normal controls (NC) and calculated four MRI measurements, including regional gray matter volume (GMV), regional homogeneity (ReHo), amplitude of low-frequency fluctuation and degree centrality. We systematically analyzed the performance of two classifications (SZ vs NC; FESZ vs CSZ) based on combinations of three brain atlases, five classifiers, two cross-validation methods and three dimensionality reduction algorithms. Our results showed that the groupwise whole-brain atlas with 268 ROIs outperformed the other two brain atlases. In addition, leave-one-out cross-validation was the best cross-validation method for selecting the best hyperparameter set, but the classification performances of different classifiers and dimensionality reduction algorithms were quite similar. Importantly, the contributions of input features to both classifications were higher for the GMV and ReHo features of brain regions in the prefrontal and temporal gyri. Furthermore, an ensemble learning method was used to establish an integrated model, which improved classification performance. Taken together, these findings indicate the effects of these factors in constructing effective classifiers for psychiatric diseases and show that the integrated model has the potential to improve the clinical diagnosis and treatment evaluation of SZ.
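Using leave-one-out cross-validation to select a hyperparameter set, as this study does, can be sketched like so. The clinical data is not public, so a small synthetic two-class problem stands in for the MRI features, and the SVM classifier and C grid are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

# Hypothetical stand-in for the MRI-derived features (GMV, ReHo, ...).
X, y = make_classification(n_samples=60, n_features=10, random_state=0)

# Leave-one-out CV holds out one subject at a time, which suits small
# clinical samples at the cost of n model fits per hyperparameter set.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=LeaveOneOut())
search.fit(X, y)
print(search.best_params_)
```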


Author(s):  
Satya Kiranmai Tadepalli ◽  
P.V. Lakshmi

Infertility results from a combination of factors that prevent pregnancy, and selecting the best embryo to lead to a successful pregnancy requires great care and expertise. Assistive reproductive technology (ART) helps to solve this issue, and in vitro fertilization (IVF) is one of the most popular ART methods. Artificial intelligence is expected to bring a digital revolution and manifold advances to the field of reproductive medicine and will eventually provide immense benefits to infertile patients. The main aim of this article is to focus on methods that can predict the likelihood of pregnancy without human intervention. It surveys successful studies conducted using machine learning techniques, which enables doctors to easily understand the behavior of the attributes relevant to treatment. Blastocyst images can be deployed for the detection and prediction of the embryo with the maximum chance of a successful pregnancy. This pioneering work gives one a view into how this field could benefit future generations.


2018 ◽  
Vol 25 (10) ◽  
pp. 1339-1350 ◽  
Author(s):  
Justin Mower ◽  
Devika Subramanian ◽  
Trevor Cohen

Abstract Objective The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring. Methods Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database. Results The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions. Discussion and Conclusion Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.
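The composition step, pairing a drug embedding with a side-effect embedding to form a classifier input, can be sketched as below. The embedding dimension, the elementwise-product composition operator, and the random vectors and labels are all assumptions for illustration; the paper derives its embeddings from SemRep triples and its labels from reference standards.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
dim = 32
# Hypothetical pretrained embeddings for drugs and side effects;
# random vectors stand in for the literature-derived ones.
drug_vecs = rng.normal(size=(50, dim))
effect_vecs = rng.normal(size=(20, dim))

def compose(d, e):
    # One plausible composition operator: the elementwise product of
    # the two embeddings (the paper's exact operator may differ).
    return drug_vecs[d] * effect_vecs[e]

pairs = [(rng.integers(50), rng.integers(20)) for _ in range(200)]
X = np.array([compose(d, e) for d, e in pairs])
y = rng.integers(0, 2, size=200)  # placeholder labels from a reference set

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```

The key point is that each drug/side-effect pair becomes a single fixed-length vector, so any standard supervised classifier can be applied on top of the unsupervised pretraining.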


2021 ◽  
Vol 4 (2) ◽  
pp. 126-139
Author(s):  
Suman Ghosal ◽  
Swati Dey ◽  
Partha Pratim Chattopadhyay ◽  
Shubhabrata Datta ◽  
...  

Catalytic noble metals or their alloys have long been used as electrode materials to enhance the sensing performance of semiconducting oxide based gas sensors. In the present paper, the design of an optimized ternary metal alloy electrode using a machine learning methodology, trained on a database of pure and binary alloy compositions, is reported for the detection of CH4 gas as a test case. Earlier researchers investigated pure noble metals or their binary alloys as electrodes on semiconducting ZnO sensing layers to enhance the sensitivity towards CH4. Based on those research findings, artificial neural network (ANN) models were developed for the three main performance measures of the gas sensor devices, viz. response magnitude, response time and recovery time, as functions of ZnO particle size and the composition of the catalytic alloy. A novel methodology was introduced in which the ANN models guide the search for an optimized ternary alloy through a multi-objective genetic algorithm (GA), with the generated Pareto front used to select candidate compositions. This prescriptive data analytics methodology offers reasonably convincing evidence to guide future experimental studies.
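The Pareto-front selection step at the heart of the multi-objective GA can be sketched as follows. The candidate count, the two normalised objectives (response time and recovery time, both to be minimised), and the random scores standing in for ANN surrogate output are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical candidate ternary-alloy electrodes scored by an ANN
# surrogate on two objectives to minimise; random values stand in
# for the model's predictions.
objectives = rng.uniform(size=(30, 2))

def pareto_front(points):
    # A candidate is on the front if no other candidate is at least as
    # good in every objective and strictly better in at least one.
    front = []
    for i, p in enumerate(points):
        dominated = any(np.all(q <= p) and np.any(q < p) for q in points)
        if not dominated:
            front.append(i)
    return front

front = pareto_front(objectives)
print(f"{len(front)} non-dominated candidates")
```

The GA evolves compositions toward this front; the front's members are the trade-off designs handed to the experimentalist.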


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Noor Afiza Mat Razali ◽  
Nuraini Shamsaimon ◽  
Khairul Khalil Ishak ◽  
Suzaimah Ramli ◽  
Mohd Fahmi Mohamad Amran ◽  
...  

Abstract The development of the Internet of Things (IoT) has produced new innovative solutions, such as smart cities, which enable humans to have a more efficient, convenient and smarter way of life. The Intelligent Transportation System (ITS) is one of several smart city applications, where it enhances the processes of transportation and commuting. ITS aims to solve traffic problems, mainly traffic congestion. In recent years, new models and frameworks for predicting traffic flow have been rapidly developed to enhance the performance of traffic flow prediction, alongside the implementation of Artificial Intelligence (AI) methods such as machine learning (ML). To better understand how ML implementations can enhance traffic flow prediction, it is important to have a comprehensive view of the research that has been conducted. The objective of this paper is to present a comprehensive and systematic review of the literature, involving 39 articles published from 2016 onwards and extracted from four main databases: Scopus, ScienceDirect, SpringerLink and Taylor & Francis. The extracted information includes the gaps, approaches, evaluation methods, variables, datasets and results of each reviewed study, based on the methodology and algorithms used for the purpose of predicting traffic flow. Based on our findings, the machine learning techniques most frequently applied for traffic flow prediction are Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. The performance of the proposed techniques was compared with existing baseline models to determine their effectiveness. This paper is limited to literature drawn from these common databases; accordingly, the discussion is focused on the techniques found in the reviewed articles.
The aim of this paper is to provide a comprehensive understanding of the application of ML and deep learning (DL) techniques for improving traffic flow prediction, contributing to the betterment of ITS in smart cities. For future endeavours, experimental studies that apply the most used techniques in the articles reviewed in this study (such as CNN, LSTM or a combination of both) can be undertaken to enhance traffic flow prediction, and the results compared with baseline studies to determine the accuracy of these techniques.


Diagnostics ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 421
Author(s):  
Satyabrata Aich ◽  
Jinyoung Youn ◽  
Sabyasachi Chakraborty ◽  
Pyari Mohan Pradhan ◽  
Jin-han Park ◽  
...  

Fluctuations in motor symptoms are commonly observed in Parkinson’s disease (PD) patients. These fluctuations are inevitable and can affect the quality of life of the patients, yet it is difficult to collect precise data on their characteristics using self-reported data from PD patients. Therefore, it is necessary to develop a suitable technology that can automatically detect the medication state, also termed the “On”/“Off” state, using wearable devices that can also be used in the home environment. Recently, wearable devices, in combination with powerful machine learning techniques, have shown the potential to be effectively used in critical healthcare applications. In this study, an algorithm is proposed that can detect the medication state automatically using wearable gait signals. A combination of statistical features and spatiotemporal gait features is used as input to four different classifiers: random forest, support vector machine, K-nearest neighbour, and naïve Bayes. In total, 20 PD subjects with definite motor fluctuations were evaluated by comparing the performance of the proposed algorithm with each of the four classifiers. Random forest outperformed the other classifiers with an accuracy of 96.72%, a recall of 97.35%, and a precision of 96.92%.
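Comparing the four classifiers the study names can be sketched as below. The wearable-sensor recordings are not public, so a synthetic two-class problem stands in for the statistical and spatiotemporal gait features, and the 5-fold evaluation is an assumption for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical stand-in for the gait features of "On"/"Off" states.
X, y = make_classification(n_samples=200, n_features=12, random_state=0)

classifiers = {
    "random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
}
results = {name: cross_val_score(clf, X, y, cv=5).mean()
           for name, clf in classifiers.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```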


2018 ◽  
Author(s):  
Michiel Stock ◽  
Tapio Pahikkala ◽  
Antti Airola ◽  
Willem Waegeman ◽  
Bernard De Baets

Abstract Motivation: Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using the model to predict new interactions in a given network and using it to predict interactions for a new vertex not present in the original network. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. Results: We present a series of leave-one-out cross-validation shortcuts to rapidly estimate the performance of state-of-the-art kernel-based network inference techniques. Availability: The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package.
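The flavour of such algebraic shortcuts can be shown with the classical leave-one-out identity for ridge regression (a simpler relative of the kernel-based shortcuts in the paper, not the RLScore implementation itself): the leave-one-out residual for sample i is e_i / (1 - H_ii), so all n folds come from a single fit instead of n refits. The data sizes and regularisation value below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=40)
lam = 1.0

# Shortcut: with hat matrix H = X (X'X + lam*I)^-1 X', the LOO
# residual for sample i is e_i / (1 - H_ii), from one single fit.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
e = y - H @ y
loo_resid = e / (1 - np.diag(H))

# Brute-force check on fold 0 -- the naive refit the shortcut avoids.
mask = np.ones(len(y), dtype=bool)
mask[0] = False
w = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(X.shape[1]),
                    X[mask].T @ y[mask])
print(np.isclose(loo_resid[0], y[0] - X[0] @ w))
```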


2016 ◽  
Author(s):  
Michael Powell ◽  
Mahan Hosseini ◽  
John Collins ◽  
Chloe Callahan-Flintoft ◽  
William Jones ◽  
...  

Abstract Machine learning is a powerful set of techniques that has enhanced the ability of neuroscientists to interpret information collected through EEG, fMRI, and MEG data. With these powerful techniques comes the danger of overfitting hyper-parameters, which can render results invalid and cause a failure to generalize beyond the data set. We refer to this problem as ‘over-hyping’ and show that it is pernicious despite commonly used precautions. In particular, over-hyping occurs when an analysis is run repeatedly with slightly different analysis parameters and one set of results is selected on the basis of those analyses. When this is done, the resulting method is unlikely to generalize to a new dataset, rendering the result partially, or perhaps even completely, spurious and invalid outside of the data used in the original analysis. While it is commonly assumed that cross-validation is an effective protection against such spurious results generated through over-hyping, this is not actually true. In this article, we show that both one-shot and iterative optimization of an analysis are prone to over-hyping despite the use of cross-validation. We demonstrate that non-generalizable results can be obtained even on non-informative (i.e. random) data by modifying hyper-parameters in seemingly innocuous ways. We recommend a number of techniques for limiting over-hyping, such as lock-boxes, blind analyses, pre-registration, and nested cross-validation. These techniques are common in other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques in the neurosciences.
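One of the recommended safeguards, nested cross-validation, can be sketched as follows. The synthetic dataset, SVM classifier and C grid are assumptions for the example; the point is the structure, in which hyperparameter tuning happens entirely inside each outer training fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=20, random_state=0)

# Nested CV: the inner loop tunes hyperparameters, the outer loop
# scores the whole tuning procedure on folds it never saw, so repeated
# tweaking cannot leak into the reported estimate.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)
outer_scores = cross_val_score(inner, X, y, cv=5)
print(f"nested-CV accuracy: {outer_scores.mean():.2f}")
```

A flat (non-nested) scheme that tunes C on the same folds used for the reported score is exactly the over-hyping pattern the article warns against.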


2021 ◽  
Author(s):  
Andrew M V Dadario ◽  
Christian Espinoza ◽  
Wellington Araujo Nogueira

Objective: Anticipating fetal risk is a major factor in reducing child and maternal mortality and suffering. In this context, cardiotocography (CTG) is a low-cost, well-established procedure that has been in use for decades, despite a lack of consensus regarding its impact on outcomes. Machine learning has emerged as an option for the automatic classification of CTG records: previous studies showed expert-level results, but often at the price of reduced generalization potential. With that in mind, the present study sought to improve the statistical rigor of evaluation towards real-world application. Materials and Methods: In this study, a dataset of 2126 CTG recordings labeled as normal, suspect or pathological by the consensus of three expert obstetricians was used to create a baseline random forest model. This was followed by a LightGBM model tuned using Gaussian process regression and post-processed using cross-validation ensembling. Performance was assessed using the area under the precision-recall curve (AUPRC) over 100 experiment executions, each using a testing set comprising 30% of the data, stratified by class label. Results: The best model was a cross-validation ensemble of LightGBM models that yielded 95.82% AUPRC. Conclusions: The model is shown to produce consistent expert-level performance at negligible cost. At an estimated 0.78 USD per million predictions, the model can generate value in settings with CTG-qualified personnel, and all the more in their absence.
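The cross-validation ensembling step can be sketched as below. To keep the sketch dependency-free, scikit-learn's gradient boosting stands in for LightGBM, and the synthetic binary dataset and 5-fold split are assumptions; the paper's task is three-class and its tuning pipeline is more elaborate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the CTG feature table.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_test = X[:50]  # pretend these are held-out recordings

# Cross-validation ensembling: fit one model per fold on that fold's
# training split, then average predicted probabilities at test time.
fold_models = []
splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, _ in splitter.split(X, y):
    model = GradientBoostingClassifier(random_state=0)
    fold_models.append(model.fit(X[train_idx], y[train_idx]))

avg_proba = np.mean([m.predict_proba(X_test) for m in fold_models], axis=0)
print(avg_proba.shape)
```

Averaging the fold models' probabilities reuses the fits produced during cross-validation as a free ensemble, which is the post-processing the study applies.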


2019 ◽  
Vol 8 (9) ◽  
pp. 382 ◽  
Author(s):  
Marcos Ruiz-Álvarez ◽  
Francisco Alonso-Sarria ◽  
Francisco Gomariz-Castillo

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machines and Random Forest, are compared with Multiple Linear Regression and ordinary kriging. Several geographic, remote sensing and time variables are used as predictors. The validation is carried out using two different approaches, a leave-one-out cross-validation in the spatial domain and a spatio-temporal k-block cross-validation, and four different statistics on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest produces the best results (R² = 0.888 ± 0.026, root mean square error = 3.01 ± 0.325 using k-block cross-validation). The regression methods (Support Vector Machine, Random Forest and Multiple Linear Regression) are calibrated with MODIS data and several predictors easily calculated from a Digital Elevation Model. The most important variables in the Random Forest model were satellite temperature, potential irradiation and cdayt, a cosine transformation of the Julian day.
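The block cross-validation idea, holding out whole spatial groups so autocorrelated observations never straddle the train/test boundary, can be sketched with scikit-learn's GroupKFold. The four synthetic predictors, the linear signal, and the hypothetical station ids are assumptions for the example; the paper's k-block scheme also blocks in time.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
# Hypothetical daily records: four predictors standing in for satellite
# temperature, potential irradiation, elevation and cdayt, grouped by
# a hypothetical weather-station id.
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.25]) + rng.normal(scale=0.1, size=200)
stations = rng.integers(0, 10, size=200)

# Block CV: all records from a station are held out together, so
# within-station correlation cannot inflate the score.
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                         groups=stations, cv=GroupKFold(n_splits=5),
                         scoring="r2")
print(f"mean R2: {scores.mean():.2f}")
```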

