Prediction of Drug Side Effects with a Refined Negative Sample Selection Strategy

2020 ◽  
Vol 2020 ◽  
pp. 1-16 ◽  
Author(s):  
Haiyan Liang ◽  
Lei Chen ◽  
Xian Zhao ◽  
Xiaolin Zhang

Drugs are an important means of treating various diseases. However, they inevitably produce side effects, posing great risks to human bodies and pharmaceutical companies. Predicting the side effects of drugs has therefore become one of the essential problems in drug research, and designing efficient computational methods is an alternative way to address it. Some studies paired a drug and a side effect as a sample, thereby modeling the problem as a binary classification problem. However, the selection of negative samples is a key issue in this setting. In this study, a novel negative sample selection strategy was designed for obtaining high-quality negative samples. This strategy applied the random walk with restart (RWR) algorithm on a chemical-chemical interaction network to select, as negative samples, pairs of drugs and side effects such that the drugs were unlikely to have the corresponding side effects. Through several tests with a fixed feature extraction scheme and different machine-learning algorithms, models with the selected negative samples produced high performance; the best model even yielded nearly perfect performance. These models performed much better than those built without this strategy or with an alternative selection strategy. Furthermore, under this strategy it is not necessary to balance the numbers of positive and negative samples.
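The core of the strategy, RWR over an interaction network, can be sketched as follows. This is a minimal illustration on a toy four-drug network with an invented adjacency matrix and an assumed restart probability, not the paper's actual chemical-chemical network:

```python
import numpy as np

def rwr(adj, seed, restart=0.7, tol=1e-8, max_iter=1000):
    """Random walk with restart on an interaction network.

    adj     : (n, n) adjacency matrix of the chemical-chemical network
    seed    : index of the starting drug
    restart : probability of jumping back to the seed at each step
    Returns the stationary probability of the walker visiting each node.
    """
    # Column-normalize so each column is a transition distribution
    W = adj / adj.sum(axis=0, keepdims=True)
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# Toy 4-drug network: drugs 0-1-2 form a chain, drug 3 hangs off drug 2
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 1]], dtype=float)
scores = rwr(adj, seed=0)
# Drugs with low scores are weakly associated with the seed drug; pairing
# such drugs with the seed's side effects yields candidate negative samples.
```

The probabilities sum to one, and nodes far from the seed in the network receive low scores, which is the property the selection strategy exploits.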

Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3784 ◽  
Author(s):  
Morteza Homayounfar ◽  
Amirhossein Malekijoo ◽  
Aku Visuri ◽  
Chelsea Dobbins ◽  
Ella Peltonen ◽  
...  

Smartwatch battery limitations are one of the biggest hurdles to their acceptability in the consumer market. To our knowledge, despite promising studies analyzing smartwatch battery data, there has been little research analyzing the battery usage of a diverse set of smartwatches in a real-world setting. To address this challenge, this paper utilizes a smartwatch dataset collected from 832 real-world users, spanning different smartwatch brands and geographic locations. First, we employ clustering to identify common patterns of smartwatch battery utilization; second, we introduce a transparent low-parameter convolutional neural network model, which allows us to identify the latent patterns of smartwatch battery utilization. Our model casts the battery consumption rate as a binary classification problem, i.e., low versus high consumption. It achieves 85.3% accuracy in predicting high battery discharge events, outperforming other machine learning algorithms used in state-of-the-art research. Besides this, information can be extracted from the learned filters of the model's feature extractor, which is not possible with the other models. Third, we introduce an indexing method, including a longitudinal study, to quantify smartwatch battery quality changes over time. Our novel findings can assist device manufacturers, vendors and application developers, as well as end-users, to improve smartwatch battery utilization.
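Casting a consumption rate as a binary low/high target can be illustrated as below; the battery readings and the cutoff are invented for illustration and are not thresholds taken from the paper:

```python
import numpy as np

# Hypothetical battery readings: charge percentage sampled once per minute
levels = np.array([100, 99, 99, 97, 96, 96, 92, 90], dtype=float)

# Discharge rate between consecutive samples (percentage points per minute)
rates = -np.diff(levels)

# Binarize: label 1 = "high consumption" when the rate exceeds a cutoff
HIGH_CUTOFF = 1.5  # assumed threshold, not taken from the paper
labels = (rates > HIGH_CUTOFF).astype(int)
```

These binary labels are then what a classifier such as the paper's CNN is trained to predict from preceding usage windows.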


When an event depends on many risk factors, logistic regression is used to predict its probability. Medical researchers increasingly apply logistic analysis to binary and ordinal data. Logistic regression is also used in several classification problems, such as spam detection, diabetes prediction, predicting whether a customer will purchase a specific product or switch to a competitor, and predicting whether a customer will click on a given advertisement link. For two-class classification, logistic regression is one of the simplest and most common machine learning algorithms: it is easy to apply as a baseline for any binary classification problem, and it is also a fundamental building block of deep learning. Logistic regression measures and describes the relationship between a dependent binary variable and one or more independent variables.
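A minimal from-scratch sketch of logistic regression for two-class data, fitted by plain gradient descent on the log-loss; the single-feature toy dataset is invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit binary logistic regression by plain gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the log-loss
    return w

def predict_proba(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return sigmoid(Xb @ w)

# Toy data: a single risk factor and a binary outcome
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w = fit_logistic(X, y)
p = predict_proba(w, X)  # fitted probabilities of the positive class
```

Thresholding the fitted probabilities at 0.5 recovers the class labels on this separable toy set.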


2020 ◽  
Vol 6 (2) ◽  
pp. 4-11
Author(s):  
Silvija Vlah Jerić

Abstract The main objective of this analysis is to evaluate and compare various classification algorithms for the automatic identification of favourable days for intraday trading, using data on the Croatian stock index CROBEX. Intraday trading refers to the acquisition and sale of financial instruments on the same trading day. If the increase between the opening price and the closing price of the same day is substantial enough to earn a profit by purchasing at the opening price and selling at the closing price, the day is considered favourable for intraday trading. The goal is to discover the relation between selected financial indicators on a given day and the market situation on the following day, i.e., to determine whether a day is favourable for day trading or not. The problem is modelled as a binary classification problem. The idea is to test different algorithms and to give greater attention to those that are used more rarely than traditional statistical methods. Thus, the following algorithms are used: neural network, support vector machine and random forest, as well as the more common k-nearest neighbours and naïve Bayes classifiers. The work extends the authors' previous work, in which the algorithms were compared on resamples resulting from tuning, while here, each derived model is used to make predictions on new data. The results should add to the increasing corpus of stock market prediction research and help fill some gaps in this field for the Croatian market, in particular by using machine learning algorithms.
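The labelling rule described above, a day is favourable when the open-to-close gain covers a profit margin, can be sketched as follows; the prices and the cost margin are hypothetical, not CROBEX data:

```python
import numpy as np

# Hypothetical daily open/close prices for the index
opens  = np.array([100.0, 101.0, 99.5, 102.0])
closes = np.array([101.5, 100.8, 101.0, 102.1])

# A day is "favourable" (label 1) when the intraday return exceeds a
# margin meant to cover round-trip trading costs
COST_MARGIN = 0.005  # 0.5%, an assumed cost level, not from the paper
returns = (closes - opens) / opens
favourable = (returns > COST_MARGIN).astype(int)
```

The classifiers are then trained to predict this label for the following day from the selected financial indicators of the current day.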


2021 ◽  
Vol 11 (12) ◽  
pp. 5632
Author(s):  
Riku Iikura ◽  
Makoto Okada ◽  
Naoki Mori

The understanding of narrative stories by computer is an important task for their automatic generation. To date, high-performance neural-network technologies such as BERT have been applied to tasks such as the Story Cloze Test and Story Completion. In this study, we focus on the segmentation of novels into paragraphs, an important writing technique that helps readers deepen their understanding of the text. This type of segmentation, which we call “paragraph boundary recognition”, can be considered a binary classification problem in terms of the presence or absence of a paragraph boundary between target sentences. However, in this case, data imbalance becomes a bottleneck because the number of paragraphs is generally smaller than the number of sentences. To deal with this problem, we introduced into BERT several cost-sensitive loss functions, namely focal loss, dice loss, and anchor loss, which are robust to imbalanced classification. In addition, introducing the threshold-moving technique into the model was effective in estimating paragraph boundaries. In experiments on three newly created datasets, BERT with dice loss and threshold moving obtained a higher F1 than the original BERT trained with cross-entropy loss (76% to 80%, 50% to 54%, 59% to 63%).
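Two of the ingredients above, soft dice loss and threshold moving, can be sketched in numpy on an invented imbalanced toy example (the cutoffs are illustrative, not the tuned values from the paper):

```python
import numpy as np

def dice_loss(probs, targets, eps=1.0):
    """Soft dice loss for binary targets (smaller is better).

    Unlike cross-entropy, it is driven by the overlap with the rare
    positive class, which makes it robust to class imbalance.
    """
    inter = (probs * targets).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + targets.sum() + eps)

def predict_with_threshold(probs, threshold):
    """Threshold moving: lower the cutoff to catch the rare boundary class."""
    return (probs >= threshold).astype(int)

# Imbalanced toy case: only one of five positions is a paragraph boundary
targets = np.array([0, 0, 1, 0, 0])
probs   = np.array([0.1, 0.2, 0.4, 0.1, 0.3])

loss = dice_loss(probs, targets)
default  = predict_with_threshold(probs, 0.5)   # misses the boundary
adjusted = predict_with_threshold(probs, 0.35)  # recovers it
```

With the default 0.5 cutoff the under-confident boundary is missed entirely; moving the threshold down recovers it without retraining the model.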


Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 65 ◽  
Author(s):  
Kanadpriya Basu ◽  
Treena Basu ◽  
Ron Buckmire ◽  
Nishu Lal

Every year, academic institutions invest considerable effort and substantial resources to influence, predict and understand the decision-making choices of applicants who have been offered admission. In this study, we applied several supervised machine learning techniques to four years of data on 11,001 students, each with 35 associated features, admitted to a small liberal arts college in California to predict student college commitment decisions. By treating the question of whether a student offered admission will accept it as a binary classification problem, we implemented a number of different classifiers and then evaluated the performance of these algorithms using the metrics of accuracy, precision, recall, F-measure and area under the receiver operating characteristic curve. The results from this study indicate that the logistic regression classifier performed best in modeling the student college commitment decision problem, i.e., predicting whether a student will accept an admission offer, with an AUC score of 79.6%. The significance of this research is that it demonstrates that many institutions could use machine learning algorithms to improve the accuracy of their estimates of entering class sizes, thus allowing more optimal allocation of resources and better control over net tuition revenue.
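The evaluation metrics named above all derive from the confusion-matrix counts; a small from-scratch sketch on invented predictions (label 1 = student accepts the offer):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure from the confusion matrix."""
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    acc  = (tp + tn) / len(y_true)
    prec = tp / (tp + fp)
    rec  = tp / (tp + fn)
    f1   = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

# Invented predictions for eight admitted students
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

AUC, the remaining metric, is computed from the ranking of predicted probabilities rather than from hard labels, which is why it is reported separately in the study.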


2018 ◽  
Vol 38 (6) ◽  
Author(s):  
Binbin Wang ◽  
Li Xiao ◽  
Yang Liu ◽  
Jing Wang ◽  
Beihong Liu ◽  
...  

There is a disparity between the increasing application of digital retinal imaging to neonatal ocular screening and the slowly growing number of pediatric ophthalmologists. Assistant tools that can automatically detect ocular disorders may therefore be needed. In the present study, we developed a deep convolutional neural network (DCNN) for automated classification and grading of retinal hemorrhage. We used 48,996 digital fundus images from 3770 newborns with retinal hemorrhage of different severity (grades 1, 2 and 3) and normal controls from a large cross-sectional investigation in China. The DCNN was trained for automated grading of retinal hemorrhage (a multiclass classification problem: hemorrhage-free and grades 1, 2 and 3) and then validated for its performance level. The DCNN yielded an accuracy of 97.85 to 99.96%, and the area under the receiver operating characteristic curve was 0.989–1.000 in the binary classification of neonatal retinal hemorrhage (i.e., one class vs. the others). The overall accuracy on the multiclass classification problem was 97.44%. This is the first study to show that a DCNN can detect and grade neonatal retinal hemorrhage at high performance levels. Artificial intelligence is set to play an increasing role in the ocular healthcare of newborns and children.
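The one-class-vs.-the-others binary evaluation of a multiclass grading task amounts to binarizing the labels once per class; a minimal sketch with invented grades:

```python
import numpy as np

# Invented multiclass grades: 0 = hemorrhage-free, 1-3 = hemorrhage grades
grades = np.array([0, 1, 2, 3, 0, 2])

# One-vs-rest binarization: for each grade g, build a binary target where
# 1 means "this image is grade g" and 0 means "any other grade"
binary = {g: (grades == g).astype(int) for g in range(4)}
```

Each of these binary targets then yields its own accuracy and ROC curve, which is how the per-class 0.989–1.000 AUC range above is obtained.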


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Abhigyan Nath ◽  
André Leier

Abstract Background Cytokines act by binding to specific receptors in the plasma membrane of target cells. Knowledge of cytokine–receptor interaction (CRI) is very important for understanding the pathogenesis of various human diseases—notably autoimmune, inflammatory and infectious diseases—and identifying potential therapeutic targets. Recently, machine learning algorithms have been used to predict CRIs. “Gold standard” negative datasets are still lacking, and strong biases in negative datasets can significantly affect the training of learning algorithms and their evaluation. To mitigate the unrepresentativeness and bias inherent in negative sample selection (non-interacting proteins), we propose a clustering-based approach for representative negative sample selection. Results We used deep autoencoders to investigate the effect of different sampling approaches for non-interacting pairs on the training and the performance of machine learning classifiers. Using the anomaly detection capabilities of deep autoencoders, we deduced the effects of different categories of negative samples on the training of learning algorithms. Random sampling of non-interacting pairs results in either over- or under-representation of hard- or easy-to-classify instances. When K-means-based sampling of negative datasets is applied to mitigate the inadequacies of random sampling, random forest (RF) together with the combined feature set of atomic composition, physicochemical 2-grams and two different representations of evolutionary information performs best. Average model performances, based on leave-one-out cross-validation (LOOCV) over the ten different negative sample sets each model was trained with, show that RF models significantly outperform the previous best CRI predictor in terms of accuracy (+5.1%), specificity (+13%), MCC (+0.1) and g-means value (+5.1). Evaluations using tenfold CV and training/testing splits confirm the competitive performance.
Conclusions A comparative analysis was performed to assess the effect of three different sampling methods (random, K-means and uniform sampling) on the training of learning algorithms using different evaluation methods. Models trained on K-means sampled datasets generally show a significantly improved performance compared to those trained on random selections—with RF seemingly benefiting most in our particular setting. Our findings on the sampling are highly relevant and apply to many applications of supervised learning approaches in bioinformatics.
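A minimal sketch of K-means-based representative selection of negative samples, using a tiny from-scratch Lloyd's algorithm on invented 2-D features (the study's actual feature vectors are far higher-dimensional):

```python
import numpy as np

def kmeans_representatives(X, k, iters=20):
    """Cluster candidate negative samples and return the index of one
    representative per cluster (the point closest to each centroid),
    giving a negative set that covers the feature space evenly."""
    # Deterministic farthest-point initialization
    idx = [0]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None] - X[idx][None], axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = X[idx].astype(float)
    # Standard Lloyd iterations
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    return np.unique(d.argmin(axis=0))  # indices of the representatives

# Hypothetical 2-D feature vectors of non-interacting pairs, forming
# two clumps of "easy" and "hard" candidate negatives
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]])
reps = kmeans_representatives(X, k=2)
```

Because one representative is drawn from each cluster, both clumps contribute to the negative set, whereas random sampling could draw all negatives from a single clump.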


2019 ◽  
Vol 11 (16) ◽  
pp. 1933 ◽  
Author(s):  
Yangyang Li ◽  
Ruoting Xing ◽  
Licheng Jiao ◽  
Yanqiao Chen ◽  
Yingte Chai ◽  
...  

Polarimetric synthetic aperture radar (PolSAR) image classification is a recent technology with great practical value in the field of remote sensing. However, due to the time-consuming and labor-intensive data collection, there are few labeled datasets available. Furthermore, most available state-of-the-art classification methods heavily suffer from the speckle noise. To solve these problems, in this paper, a novel semi-supervised algorithm based on self-training and superpixels is proposed. First, the Pauli-RGB image is over-segmented into superpixels to obtain a large number of homogeneous areas. Then, features that can mitigate the effects of the speckle noise are obtained using spatial weighting in the same superpixel. Next, the training set is expanded iteratively utilizing a semi-supervised unlabeled sample selection strategy that elaborately makes use of spatial relations provided by superpixels. In addition, a stacked sparse auto-encoder is self-trained using the expanded training set to obtain classification results. Experiments on two typical PolSAR datasets verified its capability of suppressing the speckle noise and showed excellent classification performance with limited labeled data.
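The iterative training-set expansion at the heart of self-training can be sketched generically as below; this toy version substitutes a nearest-centroid classifier and an assumed distance-based confidence cutoff for the paper's superpixel-based selection strategy and stacked sparse auto-encoder:

```python
import numpy as np

def nearest_centroid_predict(X, centroids):
    """Predict the nearest class centroid and return the distance to it."""
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

# Labeled seeds (one per class) and unlabeled samples
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
X_unl = np.array([[0.2, 0.1], [3.9, 4.1], [2.0, 2.0]])

X_train, y_train = X_lab.copy(), y_lab.copy()
for _ in range(3):  # a few self-training rounds
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    preds, dist = nearest_centroid_predict(X_unl, centroids)
    confident = dist < 0.5  # assumed confidence cutoff, for illustration
    if not confident.any():
        break  # nothing left that the model is sure about
    # Move confident pseudo-labeled samples into the training set
    X_train = np.vstack([X_train, X_unl[confident]])
    y_train = np.hstack([y_train, preds[confident]])
    X_unl = X_unl[~confident]
```

Only the confidently pseudo-labeled points are absorbed; the ambiguous mid-point is left unlabeled, which is what keeps self-training from reinforcing its own mistakes.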


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1550
Author(s):  
Alexandros Liapis ◽  
Evanthia Faliagka ◽  
Christos P. Antonopoulos ◽  
Georgios Keramidas ◽  
Nikolaos Voros

Physiological measurements have been widely used by researchers and practitioners to address the stress detection challenge. So far, various datasets for stress detection have been recorded and are available to the research community for testing and benchmarking. The majority of the available stress-related datasets have been recorded while users were exposed to intense stressors, such as songs, movie clips, major hardware/software failures, image datasets, and gaming scenarios. However, it remains an open research question whether such datasets can be used for creating models that will effectively detect stress in different contexts. This paper investigates the performance of the publicly available physiological dataset named WESAD (wearable stress and affect detection) in the context of user experience (UX) evaluation. More specifically, electrodermal activity (EDA) and skin temperature (ST) signals from WESAD were used to train three traditional machine learning classifiers and a simple feed-forward deep learning artificial neural network combining continuous variables and entity embeddings. On the binary classification problem (stress vs. no stress), high accuracy (up to 97.4%) was achieved for both training approaches (deep learning and machine learning). Regarding the stress detection effectiveness of the created models in another context, namely UX evaluation, the results were quite impressive: the deep learning model achieved a rather high agreement when a user-annotated dataset was used for validation.

