Short-term forecasting of individual household electricity loads with investigating impact of data resolution and forecast horizon

Renewable Energy and Environmental Sustainability ◽

10.1051/rees/2018003 ◽

2018 ◽

Vol 3 ◽

pp. 3 ◽

Cited By ~ 6

Author(s):

Baran Yildiz ◽

Jose I. Bilbao ◽

Jonathon Dore ◽

Alistair B. Sproul

Keyword(s):

Standard Deviation ◽

Smart Grid ◽

Support Vector ◽

Smart Meter ◽

Forecast Horizon ◽

Data Set ◽

Forecast Performance ◽

Data Resolution ◽

Individual Household ◽

Evaluating Forecasts

Smart grid components such as smart home and battery energy management systems, high penetration of renewable energy systems, and demand response activities require accurate electricity demand forecasts for the successful operation of electricity distribution networks. For example, in order to optimize residential PV generation and electricity consumption and to plan battery charge-discharge regimes by scheduling household appliances, forecasts need to be tailored to individual household electricity loads. The recent uptake of smart meters allows easier access to electricity readings at very fine resolutions; hence, it is possible to utilize this source of available data to create forecast models. In this paper, models which predominantly use smart meter data alongside weather variables, or smart meter based models (SMBM), are implemented to forecast individual household loads. Well-known machine learning models such as artificial neural networks (ANN), support vector machines (SVM) and least-squares SVM are implemented within the SMBM framework and their performance is compared. The analysed household stock consists of 14 households from the state of New South Wales, Australia, with at least a year's worth of 5 min resolution data. In order for the results to be comparable between different households, the study first investigates household load profiles according to their volatility and reveals the relationship between load standard deviation and forecast performance. The analysis extends previous research by evaluating forecasts over four different data resolutions (5, 15, 30 and 60 min), each analysed for four different horizons (1, 6, 12 and 24 h ahead). Both data resolution and forecast horizon proved to have a significant impact on forecast performance, and the obtained results provide important insights for the operation of various smart grid applications. Finally, it is shown that the load profiles of some households vary significantly across different days; as a result, a single model for the entire period may deliver limited performance. Using a pre-clustering step, similar daily load profiles are grouped together according to their standard deviation, and instead of applying one SMBM to the entire data set of a particular household, separate SMBMs are applied to each of the clusters. This preliminary clustering step increases the complexity of the analysis; however, it results in significant improvements in forecast performance.
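
The change of data resolution and the pre-clustering by standard deviation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample readings and the single standard-deviation threshold are assumptions made here for illustration, and the abstract does not state how readings are aggregated or how the clusters are formed.

```python
from statistics import mean, pstdev

def downsample(readings, factor):
    """Average consecutive groups of `factor` readings (5 min -> 15 min with factor=3)."""
    usable = len(readings) - len(readings) % factor  # drop an incomplete trailing group
    return [mean(readings[i:i + factor]) for i in range(0, usable, factor)]

def cluster_days_by_std(daily_profiles, threshold):
    """Split daily load profiles into a low- and a high-volatility group.

    A single threshold on the daily standard deviation is an assumed,
    simplified stand-in for the pre-clustering step.
    """
    clusters = {"low": [], "high": []}
    for day, profile in daily_profiles.items():
        key = "low" if pstdev(profile) < threshold else "high"
        clusters[key].append(day)
    return clusters

# Hypothetical 5 min readings in kW (six values cover 30 minutes).
five_min = [0.2, 0.4, 0.6, 1.0, 1.0, 1.0]
fifteen_min = downsample(five_min, 3)

# Hypothetical daily profiles: one flat day and one volatile day.
profiles = {"mon": [0.5, 0.5, 0.5, 0.5], "tue": [0.1, 2.0, 0.1, 2.0]}
groups = cluster_days_by_std(profiles, threshold=0.5)
```

In a setup like this, a separate forecast model would then be trained on the days in each group rather than on the whole series.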

Calibration-Free Cuffless Blood Pressure Estimation Based on a Population With a Diverse Range of Age and Blood Pressure

Frontiers in Medical Technology ◽

10.3389/fmedt.2021.695356 ◽

2021 ◽

Vol 3 ◽

Author(s):

Syunsuke Yamanaka ◽

Koji Morikawa ◽

Hiroshi Morita ◽

Ji Young Huh ◽

Osamu Yamamura

Keyword(s):

Blood Pressure ◽

Standard Deviation ◽

Transit Time ◽

Cuff Pressure ◽

Estimation Method ◽

Pulse Transit Time ◽

International Standards ◽

Support Vector ◽

Data Set ◽

Diverse Range

This study presents a new blood pressure (BP) estimation algorithm utilizing machine learning (ML). A cuffless device that can measure BP without calibration would be valuable for its portability, continuous measurement, and comfort, but unfortunately, it does not currently exist. Conventional BP measurement with a cuff is standard, but this method has various problems such as inaccurate BP measurement, poor portability, and painful cuff pressure. To overcome these disadvantages, many researchers have developed cuffless BP estimation devices. However, these devices are not clinically applicable because they require advance preparation before use, such as calibration, do not follow international standards (81060-1:2007), or have been designed using insufficient data sets. The present study was conducted to address these issues. We recruited 127 participants and obtained 878 raw datasets. In accordance with international standards, our diverse data set included participants from different age groups with a wide variety of blood pressures. We utilized ML to formulate a BP estimation method that did not require calibration. The present study also conformed to the method required by international standards when calculating the level of error in BP estimation. Two essential methods were applied in this study: (a) grouping the participants into five subsets based on the relationship between the pulse transit time and systolic BP by a support vector machine ensemble with bagging, and (b) applying the information from the wavelet transformation of the pulse wave and the electrocardiogram to the linear regression BP estimation model for each group. For systolic BP, the standard deviation of error for the proposed BP estimation results with cross-validation was 7.74 mmHg, which was an improvement from 17.05 mmHg, as estimated by the conventional pulse-transit-time-based methods. For diastolic BP, the standard deviation of error was 6.42 mmHg for the proposed BP estimation, which was an improvement from 14.05 mmHg. The purpose of the present study was to demonstrate and evaluate the performance of the newly developed BP estimation ML method that meets the international standard for non-invasive sphygmomanometers in a population with a diverse range of age and BP.
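
The headline figures in this abstract are standard deviations of the estimation error. As a minimal sketch of how such a figure can be computed (not the study's code), the snippet below takes paired estimated and reference readings and returns the mean error and its standard deviation. The readings are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption.

```python
from statistics import mean, stdev

def error_stats(estimated, reference):
    """Return (mean error, sample standard deviation of error) in mmHg."""
    errors = [e - r for e, r in zip(estimated, reference)]
    return mean(errors), stdev(errors)

# Hypothetical systolic readings in mmHg: model estimate vs. cuff reference.
estimated = [122, 118, 135, 141]
reference = [120, 121, 130, 145]
mean_error, sd_error = error_stats(estimated, reference)
```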

A machine learning approach to water quality forecasts and sensor network expansion: Case study in the Wabash River Basin, USA

10.22541/au.163255065.56919800/v1 ◽

2021 ◽

Author(s):

Tyler Balson ◽

Adam Ward

Keyword(s):

Machine Learning ◽

Water Quality ◽

Sensor Network ◽

River Basin ◽

Synthetic Data ◽

Support Vector ◽

Data Set ◽

Forecast Performance ◽

Wabash River

Midwestern cities require forecasts of surface nitrate loads to bring additional treatment processes online or activate alternative water supplies. Concurrently, networks of nitrate monitoring stations are being deployed in river basins, co-locating water quality observations with established stream gauges. Here, we construct a synthetic data set of stream discharge and nitrate for the Wabash River Basin - one of the U.S.’s most nutrient-polluted basins - using the established Agro-IBIS model. While real-world observations are limited in space and time, particularly for nitrate, the synthetic data set allows for sufficiently long periods to train machine learning models and assess their performance. Using the synthetic data, we established baseline 1-day forecasts for surface water nitrate at 12 cities in the basin using support vector machine regression (SVMR; RMSE 0.48-3.3 mg/L). Next, we used the SVMRs to evaluate the improvement in forecast performance associated with deployment of additional sensors. Synthetic data enable us to quantitatively assess the expected value of an additional nitrate sensor being deployed, which is, of course, not possible if we are limited to the present observational network. We identified the optimal sensor placement to improve forecasts at each city, and the relative value of sensors at all possible locations. Finally, we assessed the co-benefit realized by other cities when a sensor is deployed to optimize a forecast at one city, finding significant positive externalities in all cases. Ultimately, our study explores the potential for AI to make short-term predictions and provide an unbiased assessment of the marginal benefit and co-benefits to an expanded sensor network. While we use water quality in the Wabash River Basin as a case study, this approach could be readily applied to any problem where the future value of sensors and network design are being evaluated.

A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽

10.2174/0929866526666191002104735 ◽

2020 ◽

Vol 27 (4) ◽

pp. 329-336 ◽

Cited By ~ 1

Author(s):

Lei Xu ◽

Guangmin Liang ◽

Baowen Chen ◽

Xu Tan ◽

Huaikun Xiang ◽

...

Keyword(s):

Support Vector Machine ◽

Cell Wall ◽

Experimental Results ◽

Computational Method ◽

Lytic Enzyme ◽

Support Vector ◽

Lytic Enzymes ◽

Data Set ◽

Optimal Feature ◽

Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way of curing bacterial infections. With the use of antibiotics, the problem of drug resistance is becoming more serious. Therefore, cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them is the purpose for which the cell wall is broken. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzymes are helpful for killing bacteria, so it is meaningful to study their type. However, it is time-consuming to detect the type of a cell lytic enzyme by experimental methods. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with the larger confidence degree are selected. Finally, the classifier is trained on the labeled vectors using a support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy can reach 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9%). The performance of our proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptides optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptides optimal feature set. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins. In this paper, a support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
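
The tripeptide composition mentioned in this abstract is a standard sequence encoding: the normalized frequency of each of the 20^3 = 8000 possible three-residue windows. The sketch below shows one plausible way to compute it in Python; the function name and the example sequence are hypothetical, and skipping windows that contain non-standard residues is an assumption rather than something the abstract specifies.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 8000 entries
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}

def tripeptide_composition(sequence):
    """Return the 8000-dimensional tripeptide frequency vector of a protein."""
    vector = [0.0] * len(TRIPEPTIDES)
    windows = [sequence[i:i + 3] for i in range(len(sequence) - 2)]
    valid = [w for w in windows if w in INDEX]  # assumed: skip non-standard residues
    for w in valid:
        vector[INDEX[w]] += 1.0
    if valid:
        vector = [v / len(valid) for v in vector]
    return vector

# Hypothetical toy sequence with overlapping windows MKA, KAM, AMK, MKA.
features = tripeptide_composition("MKAMKA")
```

A feature selection step would then keep only a small subset of these 8000 values before training the classifier.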

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.

Rational Design of Colchicine Derivatives as anti-HIV Agents via QSAR and Molecular Docking

Medicinal Chemistry ◽

10.2174/1573406414666180924163756 ◽

2019 ◽

Vol 15 (4) ◽

pp. 328-340 ◽

Cited By ~ 3

Author(s):

Apilak Worachartcheewan ◽

Napat Songtawee ◽

Suphakit Siriwong ◽

Supaluk Prachayasittikul ◽

Chanin Nantasenamat ◽

...

Keyword(s):

Molecular Docking ◽

Rational Design ◽

External Validation ◽

Rational Drug Design ◽

Support Vector ◽

Data Set ◽

Qsar Models ◽

Anti Hiv Agents ◽

Anti Hiv ◽

Colchicine Derivatives

Background: Human immunodeficiency virus (HIV) is an infective agent that causes acquired immunodeficiency syndrome (AIDS). Therefore, the rational design of inhibitors for preventing the progression of the disease is required. Objective: This study aims to construct quantitative structure-activity relationship (QSAR) models, perform molecular docking and rationally design new colchicine derivatives with anti-HIV activity. Methods: A data set of 24 colchicine and derivatives with anti-HIV activity was employed to develop the QSAR models using machine learning methods (e.g. multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM)), and to perform a molecular docking study. Results: The significant descriptors relating to the anti-HIV activity included the JGI2, Mor24u, Gm and R8p+ descriptors. The models showed acceptable statistical quality, as observed by the correlation coefficient ($Q^2$) and root mean square error (RMSE) of leave-one-out cross-validation (LOO-CV) and external sets. In particular, the ANN method outperformed the MLR and SVM methods, with $Q^2_{\text{LOO-CV}}$ and $\text{RMSE}_{\text{LOO-CV}}$ of 0.7548 and 0.5735 for the LOO-CV set, and $Q^2_{\text{Ext}}$ of 0.8553 and $\text{RMSE}_{\text{Ext}}$ of 0.6999 for external validation. In addition, the molecular docking of the virus-entry molecule (gp120 envelope glycoprotein) revealed the key interacting residues of the protein (cellular receptor, CD4) and the site-moiety preferences of colchicine derivatives as HIV entry inhibitors for binding to the HIV structure. Furthermore, a new rational design of colchicine derivatives using the informative QSAR and molecular docking results was proposed. Conclusion: These findings serve as a guideline for rational drug design as well as the potential development of novel anti-HIV agents.
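
For readers unfamiliar with the leave-one-out statistics quoted in this abstract, the following Python sketch computes a cross-validated Q2 and RMSE for a one-descriptor linear model. It is illustrative only: the data are hypothetical, a single-descriptor least-squares line stands in for the MLR/ANN/SVM models of the study, and Q2 is taken as 1 - PRESS/SS_tot around the overall mean, which is one common convention and an assumption here.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def loo_cv(xs, ys):
    """Leave-one-out cross-validation: refit without sample i, then predict it."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # predictive residual sum of squares
    ss_tot = sum((y - mean(ys)) ** 2 for y in ys)
    q2 = 1 - press / ss_tot
    rmse = (press / len(ys)) ** 0.5
    return q2, rmse

# Hypothetical descriptor values and activities for five compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]
q2_loo, rmse_loo = loo_cv(descriptor, activity)
```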

QSAR Study of PARP Inhibitors by GA-MLR, GA-SVM and GA-ANN Approaches

Current Analytical Chemistry ◽

10.2174/1573411016999200518083359 ◽

2020 ◽

Vol 16 (8) ◽

pp. 1088-1105

Author(s):

Nafiseh Vahedi ◽

Majid Mohammadhosseini ◽

Mehdi Nekoei

Keyword(s):

Present Report ◽

Principal Component ◽

Parp Inhibitors ◽

Support Vector ◽

Ann Model ◽

Statistical Parameters ◽

Qsar Study ◽

Data Set ◽

Test Set ◽

Non Linear

Background: The poly(ADP-ribose) polymerases (PARP) are a nuclear enzyme superfamily present in eukaryotes. Methods: In the present report, several efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machine (SVM) and artificial neural networks (ANN), were successfully used to develop quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set and selection of the training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors that contribute most significantly to the overall inhibitory activity. Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that non-linear modeling approaches, including SVM and ANN, could provide much better prediction capabilities. Conclusion: Among the constructed models, and in terms of root mean square error of prediction (RMSEP), cross-validation coefficients ($Q^2_{\text{LOO}}$ and $Q^2_{\text{LGO}}$), as well as $R^2$ and F-statistic values for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were more favorable with the GA-ANN model.

Identification of Smart Grid Attacks via State Vector Estimator and Support Vector Machine Methods

2020 Intermountain Engineering, Technology and Computing (IETC) ◽

10.1109/ietc47856.2020.9249125 ◽

2020 ◽

Author(s):

Wanghao Fei ◽

Paul Moses ◽

Chad Davis

Keyword(s):

Support Vector Machine ◽

Smart Grid ◽

State Vector ◽

Support Vector

Comparison of Spectroscopic Techniques Combined with Chemometrics for Cocaine Powder Analysis

Journal of Analytical Toxicology ◽

10.1093/jat/bkaa101 ◽

2020 ◽

Vol 44 (8) ◽

pp. 851-860

Author(s):

Joy Eliaerts ◽

Natalie Meert ◽

Pierre Dardenne ◽

Vincent Baeten ◽

Juan-Antonio Fernandez Pierna ◽

...

Keyword(s):

Gas Chromatography ◽

Near Infrared ◽

Evaluation Criteria ◽

Classification Model ◽

Support Vector ◽

Spectroscopic Techniques ◽

Data Set ◽

Promising Tool ◽

Powder Analysis ◽

Mir Spectra

Spectroscopic techniques combined with chemometrics are a promising tool for analysis of seized drug powders. In this study, the performance of three spectroscopic techniques [Mid-InfraRed (MIR), Raman and Near-InfraRed (NIR)] was compared. In total, 364 seized powders were analyzed, consisting of 276 cocaine powders (with concentrations ranging from 4 to 99 w%) and 88 powders without cocaine. A classification model (using Support Vector Machines [SVM] discriminant analysis) and a quantification model (using SVM regression) were constructed with each spectral dataset in order to discriminate cocaine powders from other powders and quantify cocaine in powders classified as cocaine positive. The performances of the models were compared with gas chromatography coupled with mass spectrometry (GC–MS) and gas chromatography with flame-ionization detection (GC–FID). Different evaluation criteria were used: number of false negatives (FNs), number of false positives (FPs), accuracy, root mean square error of cross-validation (RMSECV) and determination coefficients (R2). Ten colored powders were excluded from the classification data set due to the fluorescence background observed in Raman spectra. For the classification, the best accuracy (99.7%) was obtained with MIR spectra. With Raman and NIR spectra, the accuracy was 99.5% and 98.9%, respectively. For the quantification, the best results were obtained with NIR spectra. The cocaine content was determined with an RMSECV of 3.79% and an R2 of 0.97. The performance of MIR and Raman in predicting cocaine concentrations was lower than that of NIR, with RMSECV of 6.76% and 6.79%, respectively, and both with an R2 of 0.90. The three spectroscopic techniques can be applied for both classification and quantification of cocaine, but some differences in performance were detected. The best classification was obtained with MIR spectra. For quantification, however, the RMSECV of MIR and Raman was twice as high in comparison with NIR. Spectroscopic techniques combined with chemometrics can reduce the workload for confirmation analysis (e.g., chromatography based) and therefore save time and resources.
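
The classification criteria listed in this abstract (false negatives, false positives, accuracy) can be computed from predicted and reference labels as in the short Python sketch below. The labels are hypothetical and the function name is an assumption; in the study the reference labels would come from the chromatographic methods.

```python
def classification_criteria(predicted, actual):
    """Count false negatives and false positives and compute accuracy.

    Labels are booleans: True means "cocaine positive".
    """
    pairs = list(zip(predicted, actual))
    false_negatives = sum(1 for p, a in pairs if a and not p)
    false_positives = sum(1 for p, a in pairs if p and not a)
    correct = sum(1 for p, a in pairs if p == a)
    return {
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "accuracy": correct / len(pairs),
    }

# Hypothetical labels for five powders.
actual = [True, True, True, False, False]
predicted = [True, True, False, False, True]
criteria = classification_criteria(predicted, actual)
```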

Approach to hand posture recognition based on hand shape features for human–robot interaction

Complex & Intelligent Systems ◽

10.1007/s40747-021-00333-w ◽

2021 ◽

Author(s):

Jing Qi ◽

Kun Xu ◽

Xilun Ding

Keyword(s):

Gaussian Mixture ◽

Human Robot Interaction ◽

Polar Coordinates ◽

Support Vector ◽

Hand Posture ◽

Data Set ◽

Hand Shape ◽

Hand Posture Recognition ◽

Hand Segmentation ◽

Posture Recognition

Hand segmentation is the initial step for hand posture recognition. To reduce the effect of variable illumination in the hand segmentation step, a new CbCr-I component Gaussian mixture model (GMM) is proposed to detect the skin region. The hand region is selected as a region of interest from the image using the skin detection technique based on the presented CbCr-I component GMM and a new adaptive threshold. A new hand shape distribution feature described in polar coordinates is proposed to extract hand contour features, to solve the false recognition problem in some shape-based methods and to effectively recognize the hand posture in cases where different hand postures have the same number of outstretched fingers. A multiclass support vector machine classifier is utilized to recognize the hand posture. Experiments were carried out on our data set to verify the feasibility of the proposed method. The results showed the effectiveness of the proposed approach compared with other methods.

Correlation between the structure and skin permeability of compounds

Scientific Reports ◽

10.1038/s41598-021-89587-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ruolan Zeng ◽

Jiyong Deng ◽

Limin Dang ◽

Xinliang Yu

Keyword(s):

Large Data ◽

Qsar Model ◽

Coefficient Of Determination ◽

Support Vector ◽

Skin Permeability ◽

Data Set ◽

Test Set ◽

Svm Algorithm ◽

Svm Model ◽

Toxicity Relationship

A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R2 of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R2 of 0.872 and rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while dealing with more samples in the test set. Therefore, applying an SVM algorithm to develop a nonlinear QSAR model for skin permeability was successful.
