Correlation between Indoor Environmental Data and Biometric Parameters for the Impact Assessment of a Living Wall in a ZEB Lab

Francesco Salamone; Benedetta Barozzi; Ludovico Danza; Matteo Ghellere; Italo Meroni

doi:10.3390/s20092523

Correlation between Indoor Environmental Data and Biometric Parameters for the Impact Assessment of a Living Wall in a ZEB Lab

Sensors ◽

10.3390/s20092523 ◽

2020 ◽

Vol 20 (9) ◽

pp. 2523

Author(s):

Francesco Salamone ◽

Benedetta Barozzi ◽

Ludovico Danza ◽

Matteo Ghellere ◽

Italo Meroni

Keyword(s):

Environmental Parameters ◽

Environmental Data ◽

Gradient Boosting ◽

View Factor ◽

Accuracy Score ◽

Biometric Data ◽

Extreme Gradient Boosting ◽

Living Wall ◽

Plant Configuration ◽

The Impact

Users’ satisfaction in indoor spaces plays a key role in building design. In recent years, scientific research has focused increasingly on the effects of greenery solutions in indoor environments. In this study, the Internet of Things (IoT) concept is used to define an effective solution to monitor indoor environmental parameters, along with the biometric data of users involved in an experimental campaign conducted in a Zero Energy Building laboratory where a living wall has been installed. The growing interest in the IoT allows for the development of promising frameworks used to create datasets that are usually managed with Machine Learning (ML) approaches. Following this trend, the dataset derived from the proposed in-field research has been analyzed with different ML algorithms in order to identify the most suitable model and the most influential variables, among the environmental and biometric ones, that can be used to identify the plant configuration. The results highlight that the eXtreme Gradient Boosting (XGBoost)-based model obtains the best average accuracy score in predicting the plant configuration when both a selection of environmental parameters and biometric data are used as input values. Moreover, the XGBoost model identified the users with the highest accuracy when a combination of selected biometric and environmental features was considered. Finally, a new Green View Factor index has been introduced to characterize the impact of greenery on the indoor space; it can be used to compare different studies in which green elements have been used.


The impact of environmental variables on the spread of COVID-19 in the Republic of Korea

Scientific Reports ◽

10.1038/s41598-021-85493-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yong Kwan Lim ◽

Oh Joo Kweon ◽

Hye Ryoun Kim ◽

Tae-Hyoung Kim ◽

Mi-Kyung Lee

Keyword(s):

Environmental Factors ◽

Virus Disease ◽

Environmental Parameters ◽

Environmental Data ◽

Republic Of Korea ◽

Metropolitan Region ◽

Health Concern ◽

Ozone Level ◽

The Republic ◽

The Impact

Abstract: Coronavirus disease 2019 (COVID-19) has been declared a global pandemic and is a major public health concern worldwide. In this study, we aimed to determine the role of environmental factors, such as climate and air pollutants, in the transmission of COVID-19 in the Republic of Korea. We collected epidemiological and environmental data from two regions of the Republic of Korea, namely the Seoul metropolitan region (SMR) and the Daegu-Gyeongbuk region (DGR), from February 2020 to July 2020. The data were then analyzed to identify correlations between each environmental factor and confirmed daily COVID-19 cases. Among the various environmental parameters, the duration of sunshine and the ozone level were found to correlate positively with COVID-19 cases in both regions. However, the association of temperature variables with COVID-19 transmission revealed contradictory results when comparing the data from SMR and DGR. Moreover, statistical bias may have arisen due to the extensive epidemiological investigation and altered socio-behaviors that occurred in response to the COVID-19 outbreak. Nevertheless, our results suggest that various environmental factors may play a role in COVID-19 transmission.


Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

Scientific Reports ◽

10.1038/s41598-021-86327-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Arturo Moncada-Torres ◽

Marissa C. van Maaren ◽

Mathijs P. Hendriks ◽

Sabine Siesling ◽

Gijs Geleijnse

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Explicit Knowledge ◽

Cox Regression ◽

Metastatic Breast ◽

Gradient Boosting ◽

Support Vector ◽

Netherlands Cancer Registry ◽

Extreme Gradient Boosting ◽

The Impact

Abstract: Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as those of classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the $c$-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression ($c$-index $\sim 0.63$), and in the case of XGB even better ($c$-index $\sim 0.73$). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
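
The abstract compares models by their c-index but does not spell the metric out. As a rough illustration only, the sketch below computes Harrell's concordance index for right-censored data in plain Python. The function name, argument names, and toy data are assumptions made for this example and are not taken from the paper, whose implementation may differ (for instance in how ties are handled).

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable pairs that the risk scores rank correctly.

    times       -- follow-up time per subject
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the shorter one ended in an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed earlier: correct ordering
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half (an assumed convention)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: the third subject is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

Under this definition a value of 0.5 corresponds to uninformative scores and 1.0 to a perfect ranking, which gives some context for the reported values of about 0.63 and 0.73.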


Exploring the Mechanism of Crashes with Autonomous Vehicles Using Machine Learning

Mathematical Problems in Engineering ◽

10.1155/2021/5524356 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Hengrui Chen ◽

Hong Chen ◽

Ruiyu Zhou ◽

Zhizhen Liu ◽

Xiaoke Sun

Keyword(s):

Machine Learning ◽

Autonomous Vehicles ◽

Classification And Regression Tree ◽

Gradient Boosting ◽

Support Vector ◽

Crash Severity ◽

Apriori Algorithm ◽

Driving Mode ◽

Extreme Gradient Boosting ◽

The Impact

Safety has become a critical obstacle that cannot be ignored in the marketization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to explore the causal relationship between multiple factors and thereby the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze crash severity. In addition, we apply Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both XGBoost and the Apriori algorithm effectively provided meaningful insights about AV-involved crash characteristics and their relationships. Among all the features, vehicle damage, weather conditions, accident location, and driving mode are the most critical. We found that most rear-end crashes involve conventional vehicles bumping into the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, and insufficient light. Drivers should also be careful when driving near intersections, especially in the autonomous driving mode.
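
The abstract reports recall and G-mean without defining them. The sketch below assumes the common binary definition, in which G-mean is the geometric mean of recall (sensitivity) and specificity. The paper may use a multi-class variant, and the confusion-matrix counts here are hypothetical values chosen only to make the example concrete.

```python
import math

def recall_specificity_gmean(tp, fn, tn, fp):
    """Return (recall, specificity, G-mean) from binary confusion-matrix counts."""
    recall = tp / (tp + fn)            # share of positive cases that were detected
    specificity = tn / (tn + fp)       # share of negative cases that were rejected
    g_mean = math.sqrt(recall * specificity)
    return recall, specificity, g_mean

# Hypothetical counts, not taken from the paper.
recall, specificity, g_mean = recall_specificity_gmean(tp=3, fn=1, tn=3, fp=2)
```

G-mean is often preferred over plain accuracy on imbalanced data because it drops toward zero whenever either class is poorly recognized.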


Efficiency of Extreme Gradient Boosting for Imbalanced Land Cover Classification Using an Extended Margin and Disagreement Performance

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8070315 ◽

2019 ◽

Vol 8 (7) ◽

pp. 315 ◽

Cited By ~ 1

Author(s):

Fei Sun ◽

Run Wang ◽

Bo Wan ◽

Yanjun Su ◽

Qinghua Guo ◽

...

Keyword(s):

Land Cover ◽

Error Component ◽

Training Data ◽

Gradient Boosting ◽

Correct Classification ◽

Imbalanced Learning ◽

Minority Class ◽

Extreme Gradient Boosting ◽

Spectral Separability ◽

The Impact

Imbalanced learning is a methodological challenge in remote sensing communities, especially in complex areas where spectral similarity exists between land covers. Obtaining high-confidence classification results for imbalanced classes is highly important in practice. In this paper, extreme gradient boosting (XGB), a novel tree-based ensemble system, is employed to classify the land cover types in very-high-resolution (VHR) images with imbalanced training data. We introduce an extended margin criterion and disagreement performance to evaluate the efficiency of XGB in imbalanced learning situations and examine the effect of minority class spectral separability on model performance. The results suggest that the uncertainty of XGB associated with correct classification is stable. The average probability-based margin of correct classification provided by XGB is 0.82, which is about 46.30% higher than that of the random forest (RF) method (0.56). Moreover, the performance uncertainty of XGB is insensitive to spectral separability after the sample imbalance reaches a certain level (minority:majority > 10:100). The impact of sample imbalance on the minority class is also related to its spectral separability, and XGB performs better than RF in terms of user accuracy for the minority class with imperfect separability. The disagreement components of XGB are better and more stable than those of RF with imbalanced samples, especially for complex areas with more types. In addition, appropriate sample imbalance helps to improve the trade-off between the recognition accuracy of XGB and the sample cost. According to our analysis, this margin-based uncertainty assessment and disagreement performance can help users identify the confidence level and error component in similar classification performance (overall, producer, and user accuracies).
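
The abstract does not define its extended margin criterion. As a loose illustration, the sketch below uses the standard probability-based margin of a classifier (probability of the true class minus the largest probability among the other classes), averaged over correctly classified samples. This is an assumed stand-in, not the paper's extended criterion, and the function names and probability rows are hypothetical.

```python
def probability_margin(probs, true_label):
    """Margin of one sample: P(true class) minus the largest P(any other class)."""
    best_other = max(p for k, p in enumerate(probs) if k != true_label)
    return probs[true_label] - best_other

def mean_margin_of_correct(prob_rows, labels):
    """Average margin over samples whose highest-probability class is the true one."""
    margins = [
        probability_margin(probs, y)
        for probs, y in zip(prob_rows, labels)
        if max(range(len(probs)), key=probs.__getitem__) == y
    ]
    if not margins:
        raise ValueError("no correctly classified samples")
    return sum(margins) / len(margins)

def relative_gain(value, baseline):
    """Relative difference of value over baseline, as a fraction."""
    return (value - baseline) / baseline

# Hypothetical class-probability rows for three samples; the third is misclassified.
rows = [[0.90, 0.05, 0.05], [0.20, 0.70, 0.10], [0.60, 0.30, 0.10]]
labels = [0, 1, 1]
avg_margin = mean_margin_of_correct(rows, labels)

# Relative gain computed from the two rounded margins quoted in the abstract.
gain = relative_gain(0.82, 0.56)
```

With the rounded margins quoted above, the relative gain comes out near 46.4%; the reported 46.30% presumably derives from unrounded values.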


XGBoost-Based Day-Ahead Load Forecasting Algorithm Considering Behind-the-Meter Solar PV Generation

Energies ◽

10.3390/en15010128 ◽

2021 ◽

Vol 15 (1) ◽

pp. 128

Author(s):

Dong-Jin Bae ◽

Bo-Sung Kwon ◽

Kyung-Bin Song

Keyword(s):

Load Forecasting ◽

Rapid Expansion ◽

Gradient Boosting ◽

Base Temperature ◽

Electric Load ◽

Solar Pv ◽

Model Case ◽

Extreme Gradient Boosting ◽

Pv Generation ◽

The Impact

With the rapid expansion of renewable energy, the penetration rate of behind-the-meter (BTM) solar photovoltaic (PV) generators is increasing in South Korea. Because BTM solar PV generation is not metered in real time, it distorts the electric load and increases load forecasting errors. To overcome the problems caused by BTM solar PV generation, an extreme gradient boosting (XGBoost) load forecasting algorithm is proposed. The capacity of the BTM solar PV generators is estimated based on an investigation of the deviation of load using a grid search. The influence of external factors is considered by using the fluctuation of the load used by lighting appliances and data filtering based on base temperature; as a result, the capacity of the BTM solar PV generators is accurately estimated. The distortion of the electric load is eliminated by the reconstituted load method, which adds the estimated BTM solar PV generation to the electric load, and load forecasting is conducted using the XGBoost model. Case studies are performed to demonstrate the prediction accuracy of the proposed method. The accuracy of the proposed algorithm improved by 21% and 29% in 2019 and 2020, respectively, compared with the MAPE of the LSTM model that does not reflect the impact of BTM solar PV.
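
The reconstituted load method described above amounts to adding the estimated BTM solar PV generation back onto the metered load, with MAPE as the error measure. The sketch below shows both steps in their simplest form. The function names and the hourly values are hypothetical, and the paper's capacity estimation and XGBoost forecasting steps are not reproduced.

```python
def reconstitute_load(metered_load, btm_pv_estimate):
    """Add estimated behind-the-meter PV generation back onto the metered load."""
    if len(metered_load) != len(btm_pv_estimate):
        raise ValueError("series must have the same length")
    return [load + pv for load, pv in zip(metered_load, btm_pv_estimate)]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical hourly values in MW: night, morning, and midday.
metered = [100.0, 95.0, 90.0]
pv_estimate = [0.0, 5.0, 15.0]
reconstituted = reconstitute_load(metered, pv_estimate)

# Hypothetical forecast scored against hypothetical actual values.
error_pct = mape([100.0, 200.0], [110.0, 180.0])
```

In this reading, a forecaster trained on the reconstituted series no longer has to learn the midday dip caused by unmetered PV output.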


Remote Diagnosis and Triaging Model for Skin Cancer Using EfficientNet and Extreme Gradient Boosting

Complexity ◽

10.1155/2021/5591614 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Talha Anwar ◽

Sumayh S. Aljameel ◽

Mohib Ullah ◽

...

Keyword(s):

Skin Cancer ◽

Skin Lesion ◽

Clinical Data ◽

Cancer Diagnosis ◽

Gradient Boosting ◽

Automated Diagnosis ◽

Data Set ◽

Diagnosis System ◽

Extreme Gradient Boosting ◽

The Impact

Due to the successful application of machine learning techniques in several fields, the use of automated diagnosis systems in healthcare has been increasing at a high rate. The aim of the study is to propose an automated skin cancer diagnosis and triaging model, to explore the impact of integrating clinical features into the diagnosis, and to improve on the outcomes reported in the literature. We used an ensemble-learning framework, consisting of the EfficientNetB3 deep learning model for skin lesion analysis and Extreme Gradient Boosting (XGB) for clinical data. The study used the PAD-UFES-20 data set, consisting of six unbalanced categories of skin cancer. To overcome the data imbalance, we used data augmentation. Experiments were conducted using skin lesions alone and the combination of skin lesions and clinical data. We found that integrating clinical data with skin lesions enhances automated diagnosis accuracy. Moreover, the proposed model outperformed the results achieved by the previous study for the PAD-UFES-20 data set, with an accuracy of 0.78, precision of 0.89, recall of 0.86, and F1 of 0.88. In conclusion, the study provides an improved automated diagnosis system to aid healthcare professionals and patients in skin cancer diagnosis and remote triaging.


An Interpretable Extreme Gradient Boosting Model to Predict Ash Fusion Temperatures

Minerals ◽

10.3390/min10060487 ◽

2020 ◽

Vol 10 (6) ◽

pp. 487

Author(s):

Maciej Rzychoń ◽

Alina Żogała ◽

Leokadia Róg

Keyword(s):

Coefficient Of Determination ◽

Gradient Boosting ◽

Important Indicator ◽

Upper Silesian Coal Basin ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The Impact ◽

Partial Dependence ◽

Individual Input

The hemispherical temperature (HT) is the most important indicator representing ash fusion temperatures (AFTs) in the Polish industry for assessing the suitability of coal for combustion as well as gasification purposes. For safe operation and energy saving, it is important to know or to be able to predict the value of this parameter. In this study, a non-linear model predicting the HT value, based on the ash oxide content of 360 coal samples from the Upper Silesian Coal Basin, was developed. The proposed model was established using a machine learning method, the extreme gradient boosting (XGBoost) regressor. An important feature of models based on the XGBoost algorithm is the ability to determine the impact of individual input parameters on the predicted value using the feature importance (FI) technique. This method allowed the determination of the ash oxides having the greatest impact on the projected HT. Then, the partial dependence plots (PDP) technique was used to visualize the effect of individual oxides on the predicted value. The results indicate that the proposed model can estimate the value of HT with high accuracy. The coefficient of determination (R²) of the prediction reached a satisfactory value of 0.88.
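
For reference, the coefficient of determination quoted above is usually defined as follows. The symbols are assumptions introduced here and do not come from the abstract: $y_i$ is the measured HT of sample $i$, $\hat{y}_i$ is the model prediction, and $\bar{y}$ is the mean measured HT over the $n$ evaluated samples.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this definition, a value of 0.88 means the residual sum of squares is 12% of the total variance in HT over the evaluated samples.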


Evaluating Risk-Stratified HPV Catch-up Vaccination Strategies: Should We Go beyond Age 26?

Medical Decision Making ◽

10.1177/0272989x211042894 ◽

2021 ◽

pp. 0272989X2110428

Author(s):

Fan Wang ◽

Kristen N. Jozkowski ◽

Shengfan Zhang

Keyword(s):

Simulation Model ◽

Clinical Outcomes ◽

Risk Model ◽

Hpv Vaccination ◽

The United States ◽

Gradient Boosting ◽

Transmitted Infection ◽

Extreme Gradient Boosting ◽

Catch Up ◽

The Impact

Background: Human papillomavirus (HPV) is the most common sexually transmitted infection in the United States. HPV can cause genital warts and multiple types of cancers in females. HPV vaccination is recommended to youth age 11 or 12 years before sexual initiation to prevent onset of HPV-related diseases. For females who have not been vaccinated previously, catch-up vaccines are recommended through age 26. The extent to which catch-up vaccines are beneficial in terms of disease prevention and cost-effectiveness is questionable given that some women may have been exposed to HPV before receiving the catch-up vaccination. This study aims to examine whether the cutoff age of catch-up vaccination should be determined based on an individual woman’s risk characteristic instead of a one-size-fits-all age 26. Methods: We developed a microsimulation model to evaluate multiple clinical outcomes of HPV vaccination for different women based on a number of personal attributes. We modeled the impact of HPV vaccination at different ages on every woman and tracked her course of life to estimate the clinical outcomes that resulted from receiving vaccines. As the simulation model is risk stratified, we used extreme gradient boosting to build an HPV risk model estimating every woman’s dynamic HPV risk over time for the lifetime simulation model. Results: Our study shows that catch-up vaccines still benefit all women after age 26 from the perspective of clinical outcomes. Women facing high risk of HPV infection are expected to gain more health benefits compared with women with low HPV risk. Conclusions: From a cancer prevention perspective, this study suggests that the catch-up vaccine after age 26 should be deliberately considered.


Predicting Suitable Habitats of Melia Azedarach L. Using Data Mining

10.21203/rs.3.rs-1004808/v1 ◽

2021 ◽

Author(s):

Lei Feng ◽

Xiangni Tian ◽

Yousry A. El-Kassaby ◽

Jian Qiu ◽

Ze Feng ◽

...

Keyword(s):

Data Mining ◽

Species Distribution ◽

Mean Annual Precipitation ◽

Gradient Boosting ◽

Melia Azedarach ◽

Support Vector ◽

Suitable Habitat ◽

Degree Days ◽

Extreme Gradient Boosting ◽

The Impact

Abstract. Background: Melia azedarach L. is a globally distributed tree species of economic importance; however, it is unclear how the species distribution will respond to future climate changes. Methods: We aimed to select the most accurate of seven data mining models to predict the species' suitable contemporary and future habitats. These models include maximum entropy (MaxEnt), support vector machine (SVM), generalized linear model (GLM), random forest (RF), naive Bayesian model (NBM), extreme gradient boosting (XGBoost), and gradient boosting machine (GBM). A total of 906 M. azedarach locations were identified, and sixteen climate predictors were used for model building. The models’ validity was assessed using three measures (area under the curve (AUC), kappa, and accuracy). Results: We found that the RF provided the most outstanding performance in prediction power and generalization capacity. The top climate factors affecting the species distribution were mean coldest month temperature (MCMT), followed by the number of frost-free days (NFFD), degree-days above 18°C (DD>18), temperature difference between MWMT and MCMT, or continentality (TD), mean annual precipitation (MAP), and degree-days below 18°C (DD<18). We projected that the future suitable habitat of this species would increase under both the RCP4.5 and RCP8.5 scenarios for the 2020s, 2050s, and 2080s. Conclusion: Our findings are expected to assist in better understanding the impact of climate change on the species and provide a scientific basis for its planting and conservation.


Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning

Frontiers in Pharmacology ◽

10.3389/fphar.2020.606668 ◽

2020 ◽

Vol 11 ◽

Author(s):

Liangxu Xie ◽

Lei Xu ◽

Ren Kong ◽

Shan Chang ◽

Xiaojun Xu

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Molecular Descriptor ◽

Predictive Performance ◽

Gradient Boosting ◽

Support Vector ◽

Quantitative Structure ◽

Structure Activity ◽

Extreme Gradient Boosting ◽

The Impact

Accurate prediction of the physical properties and bioactivity of drug molecules in deep learning depends on how molecules are represented. Many types of molecular descriptors have been developed for quantitative structure-activity/property relationships (QSPR). However, each molecular descriptor is optimized for a specific application with its own encoding preference. Considering that standalone featurization methods may cover only part of the information of the chemical molecules, we proposed building a conjoint fingerprint by combining two supplementary fingerprints. The impact of the conjoint fingerprint and of each standalone fingerprint on predictive performance was systematically evaluated in predicting the logarithm of the partition coefficient (logP) and protein-ligand binding affinity by using machine learning/deep learning (ML/DL) methods, including random forest (RF), support vector regression (SVR), extreme gradient boosting (XGBoost), long short-term memory network (LSTM), and deep neural network (DNN). The results demonstrated that the conjoint fingerprint yielded improved predictive performance, even outperforming the consensus model using two standalone fingerprints in four out of the five examined methods. Given that the conjoint fingerprint scheme shows easy extensibility and high applicability, we expect that the proposed conjoint scheme will create new opportunities for continuously improving the predictive performance of deep learning by harnessing the complementarity of various types of fingerprints.
