Study of different data science methods for demand prediction and replenishment forecasting at retail network

Author(s): Aleksei Iurasov, Giedre Stanelyte

Demand prediction is becoming an essential tool for retail businesses that want to stay in, or even lead, the competition. A well-built demand prediction model can help a retailer track inventory levels, orders and sales as effectively as possible, so that the best results can be achieved. However, there are many different methods and opinions on how to create a demand prediction model. In this paper, we analyse the most commonly used methods (Linear Regression, Logistic Regression, Probabilistic Neural Network, Bayesian Additive Regression Trees, Random Forest and Fuzzy Logic), together with their specifications and limitations as reported in previous studies. After the review, all methods are compared according to selected characteristics. Moreover, to obtain more practical results, the accuracy of the Logistic Regression and Random Forest methods is compared on milk sales data collected from a retail network. To construct a decision support system for a retail network, we need to go beyond one-step demand prediction to replenishment forecasting. It was concluded that there is no single best method for forecasting replenishment and that results can differ depending on the data and conditions analysed. In every situation, authors seek to select the method with the highest accuracy and the fewest errors possible. Limitations of the research: only a limited number of goods and stores were included in the modelling.
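The abstract distinguishes one-step demand prediction from replenishment forecasting. As a minimal sketch of that extra step, assuming a simple periodic-review, order-up-to rule (not a method described in the paper), a daily demand forecast from any model, such as Logistic Regression or Random Forest, can be turned into an order quantity for a product like milk. The function name, its parameters and all figures are hypothetical.

```python
import math

def order_up_to_quantity(daily_forecast, lead_time_days, review_days,
                         safety_stock, on_hand, on_order):
    """Hypothetical periodic-review (order-up-to) replenishment rule.

    daily_forecast: predicted unit sales per day, starting tomorrow.
    Returns the number of units to order now (never negative).
    """
    horizon = lead_time_days + review_days
    if len(daily_forecast) < horizon:
        raise ValueError("forecast must cover lead time plus review period")
    # Demand expected before the order after this one can arrive.
    expected_demand = sum(daily_forecast[:horizon])
    # Target stock = expected demand plus a buffer against forecast error.
    target_level = expected_demand + safety_stock
    # Inventory position counts stock on hand and stock already ordered.
    inventory_position = on_hand + on_order
    return max(0, math.ceil(target_level - inventory_position))

# Example with hypothetical milk-sales figures (units per day).
forecast = [10, 12, 11, 9, 10, 13, 8]
print(order_up_to_quantity(forecast, lead_time_days=2, review_days=3,
                           safety_stock=8, on_hand=20, on_order=15))  # prints 25
```

Keeping the forecast and the ordering rule separate means the demand model can be swapped without changing the replenishment logic, which matches the abstract's conclusion that the best forecasting method depends on the data.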

Healthcare, 2021, Vol 9 (7), pp. 853
Author(s): Jee-Yun Kim, Jeong Yee, Tae-Im Park, So-Youn Shin, Man-Ho Ha, ...

Predicting the clinical progression of intensive care unit (ICU) patients is crucial for survival and prognosis. Therefore, this retrospective study aimed to develop a mortality risk scoring system and a prediction model of ICU length of stay (LOS) for patients admitted to the ICU. Data were collected from ICU patients aged 18 years or older who received parenteral nutrition support for ≥50% of their daily calorie requirement between February 2014 and January 2018. In-hospital mortality and log-transformed LOS were analyzed by logistic regression and linear regression, respectively. Risk score points were derived from the coefficients of the regression model. Of 445 patients, 97 died in the ICU, an observed mortality rate of 21.8%. Based on logistic regression analysis, APACHE II score (15–29: 1 point; 30 or higher: 2 points), qSOFA score ≥ 2 (2 points), serum albumin level < 3.4 g/dL (1 point), and infectious or respiratory disease (1 point) were incorporated into the mortality risk scoring system; patients with 0, 1, 2–4, and 5–6 points had approximately 10%, 20%, 40%, and 65% risk of death, respectively. For LOS, linear regression analysis yielded the following prediction equation: log(LOS) = 0.01 × (APACHE II) + 0.04 × (total bilirubin) − 0.09 × (admission diagnosis of gastrointestinal disease or injury, poisoning, or other external cause) + 0.970. Our study provides a mortality risk score and an LOS prediction equation that could help clinicians identify patients at risk and optimize ICU management.
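The scoring rules and the LOS equation can be written directly as functions. This is a minimal sketch under stated assumptions: an APACHE II score below 15 is taken to score 0 points, "log" in the LOS equation is taken to be the natural logarithm with LOS in days, and total bilirubin is entered in whatever units the study used (the abstract does not state them). The function names are hypothetical.

```python
import math

def mortality_points(apache_ii, qsofa, albumin_g_dl, infectious_or_respiratory):
    """Points from the reported scoring rules (APACHE II < 15 assumed to score 0)."""
    points = 0
    if apache_ii >= 30:
        points += 2
    elif apache_ii >= 15:
        points += 1
    if qsofa >= 2:
        points += 2
    if albumin_g_dl < 3.4:
        points += 1
    if infectious_or_respiratory:
        points += 1
    return points  # ranges from 0 to 6

def approx_mortality_risk(points):
    """Approximate risk of death for each reported point band."""
    if not 0 <= points <= 6:
        raise ValueError("score must be between 0 and 6")
    if points == 0:
        return 0.10
    if points == 1:
        return 0.20
    if points <= 4:
        return 0.40
    return 0.65

def predicted_los_days(apache_ii, total_bilirubin, gi_injury_or_external_dx):
    """LOS from the reported equation, assuming log is the natural logarithm."""
    log_los = (0.01 * apache_ii
               + 0.04 * total_bilirubin
               - 0.09 * (1 if gi_injury_or_external_dx else 0)
               + 0.970)
    return math.exp(log_los)

score = mortality_points(apache_ii=32, qsofa=2, albumin_g_dl=3.0,
                         infectious_or_respiratory=True)
print(score, approx_mortality_risk(score))  # prints 6 0.65
```

For a hypothetical patient with APACHE II 20, total bilirubin 1.0 and no qualifying admission diagnosis, `predicted_los_days(20, 1.0, False)` returns exp(1.21), about 3.35 days under the natural-log assumption.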


2019, Vol 8 (10), pp. 1709
Author(s): Tsung-Lun Tsai, Min-Hsin Huang, Chia-Yen Lee, Wu-Wei Lai

In addition to traditional indices such as biochemistry, arterial blood gas, the rapid shallow breathing index (RSBI) and the acute physiology and chronic health evaluation (APACHE) II score, this study proposes a data science framework for extubation prediction in the surgical intensive care unit (SICU) and investigates the value of the information that the prediction model provides. The framework, comprising variable selection (e.g., multivariate adaptive regression splines, stepwise logistic regression and random forest), prediction models (e.g., support vector machine, boosting logistic regression and backpropagation neural network (BPN)) and decision analysis (e.g., Bayesian method), is designed to identify the important variables and support the extubation decision. An empirical study at a leading hospital in Taiwan in 2015–2016 was conducted to validate the proposed framework. The results show that APACHE II and white blood cells (WBC) are the two most critical variables, followed in priority by eye opening, heart rate, glucose, sodium and hematocrit. BPN with the selected variables shows better prediction performance (sensitivity: 0.830; specificity: 0.890; accuracy: 0.860) than APACHE II or RSBI. Further analysis of the value of information shows that the expected value of experimentation (EVE) amounts to 0.652 days of ICU stay saved compared with current clinical experience. Furthermore, the maximal value of information occurs at a failure rate of around 7.1%, which reveals the “best applicable condition” of the proposed prediction model. The results validate the decision quality and the useful information provided by the prediction model.
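The three reported BPN metrics are linked: accuracy is a prevalence-weighted average of sensitivity and specificity, accuracy = p × sensitivity + (1 − p) × specificity, where p is the share of positive cases in the evaluation set. The sketch below computes the metrics from a confusion matrix and inverts that identity. With the reported values it gives p ≈ 0.5, which would suggest a roughly balanced evaluation set if the rounded figures are taken at face value; the abstract does not report the class balance, and the counts in the example are hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

def implied_positive_share(sensitivity, specificity, accuracy):
    """Solve accuracy = p*sens + (1 - p)*spec for the positive share p."""
    if sensitivity == specificity:
        raise ValueError("p is not identifiable when sensitivity == specificity")
    return (specificity - accuracy) / (specificity - sensitivity)

# Hypothetical counts chosen to reproduce the reported BPN figures.
print(binary_metrics(tp=83, fn=17, tn=89, fp=11))
print(round(implied_positive_share(0.830, 0.890, 0.860), 3))
```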


This study examines pricing factors in the short-term rental market. Airbnb, a worldwide platform for finding and renting accommodation, was chosen as the object of study. At the beginning of 2021, the company offered 7 million homes in more than 220 countries. Data science methods play a significant role in the company's success, and one of its key algorithms is the pricing algorithm. Using the "Price Recommendations" feature, a homeowner can see which dates are most likely to be booked at the current price and which are not, which helps in forming an attractive offer. The system calculates the recommended price of housing based on hundreds of parameters; some of them are easy to recognize, but less obvious factors can also affect demand. The paper proposes an algorithm for identifying implicit pricing factors in the short-term rental market using machine learning methods, which includes: 1) data mining and data preparation; 2) building and analysis of linear regression models; 3) building and analysis of nonlinear regression models. The study was based on listings from the Airbnb site in Washington and New York, analysed with scripts developed in Python. The following models were built and analysed: simple linear regression, multiple linear regression, polynomial regression, decision trees, random forest and boosting. The results showed that the most important factors are accommodates, cleaning_fee, room_type and bedrooms. However, according to the model evaluation criteria, the models cannot be used for implementation: the linear models are of low quality, while the random forest, boosting and decision tree models are overfitted. Nevertheless, the results can be used in business analysis.
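The abstract reports linear models of low quality and tree-based models that overfit. One common way to detect overfitting is to compare R² on the training data with R² on held-out data. The sketch below uses only the Python standard library to fit a simple linear regression of price on one feature such as accommodates, compute R², and flag a large train/test gap. The data, the 0.1 gap threshold and the function names are hypothetical and are not taken from the paper.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def looks_overfitted(r2_train, r2_test, max_gap=0.1):
    """Flag a model whose test R^2 falls far below its training R^2."""
    return (r2_train - r2_test) > max_gap

# Hypothetical listings: guests accommodated vs nightly price (USD).
accommodates = [1, 2, 3, 4, 5, 6]
price = [50, 70, 85, 110, 125, 150]
b0, b1 = fit_simple_ols(accommodates, price)
fitted = [b0 + b1 * a for a in accommodates]
print(round(b1, 2), round(r_squared(price, fitted), 3))
```

In practice the R² used for the overfitting check would be computed on listings held out from fitting; a tree ensemble with a near-perfect training R² and a much lower test R² is the pattern the abstract describes.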


2021, Vol 44 (4), pp. 1-12
Author(s): Ratchainant Thammasudjarit, Punnathorn Ingsathit, Sigit Ari Saputro, Atiporn Ingsathit, Ammarin Thakkinstian

Background: Chronic kidney disease (CKD) consumes huge amounts of resources for treatment. Early detection of patients with a risk prediction model should be useful for identifying at-risk patients and providing early treatment. Objective: To compare the performance of traditional logistic regression with machine learning (ML) in predicting the risk of CKD in the Thai population. Methods: This study used Thai Screening and Early Evaluation of Kidney Disease (SEEK) data. Seventeen features were initially considered in constructing prediction models using logistic regression and 4 ML methods (Random Forest, Naïve Bayes, Decision Tree and Neural Network). Data were split into training and test sets in a 70:30 ratio. Model performance was assessed by estimating recall, C statistics, accuracy, F1 and precision. Results: Seven of the 17 features were included in the prediction models. The logistic regression model discriminated CKD from non-CKD patients well, with C statistics of 0.79 and 0.78 in the training and test data. Among the ML models, the Neural Network performed best, followed by Random Forest, Naïve Bayes and Decision Tree, with corresponding C statistics of 0.82, 0.80, 0.78 and 0.77 in the training data set. The performance of these models in the test data decreased by about 5%, 3%, 1% and 2%, respectively, compared with a decrease of about 2% for the logistic model. Conclusions: A CKD risk prediction model constructed with the logit equation may yield better discrimination and a lower tendency to overfit than ML models, including the Neural Network and Random Forest.
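The C statistic used here is the area under the ROC curve. It equals the probability that a randomly chosen CKD patient receives a higher predicted risk than a randomly chosen non-CKD patient, with ties counted as one half. Below is a minimal standard-library sketch of that pairwise definition, together with a seeded 70:30 split of row indices. The function names, the seed and the example scores are hypothetical.

```python
import random

def split_70_30(n_rows, seed=42):
    """Shuffle row indices reproducibly and cut them 70:30 into train/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = n_rows * 7 // 10
    return indices[:cut], indices[cut:]

def c_statistic(scores, labels):
    """Concordance (ROC AUC) over all positive/negative pairs; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

# Hypothetical predicted risks for 3 CKD (1) and 3 non-CKD (0) patients.
print(c_statistic([0.9, 0.8, 0.4, 0.3, 0.5, 0.1], [1, 1, 1, 0, 0, 0]))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; rank-based formulas give the same value more efficiently on large data sets.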


2021
Author(s): Hemlata Jain, Ajay Khunteta, Sumit Private Shrivastav

Machine learning and deep learning classification have become important topics in telecom churn prediction. Researchers have conducted very effective churn prediction experiments and have given the telecommunication industry a new direction for retaining its customers. Companies are eagerly developing churn prediction models and working to retain potential churners. For a better churn prediction model, therefore, identifying the factors of churn is very important. This study aims to find the factors of user churn by evaluating users' past service usage details. For this purpose, the study uses feature importance, feature normalisation, feature correlation and feature extraction. After feature selection and extraction, the study performs seven different experiments on the dataset to obtain the best results and compare the techniques. The first experiment uses a hybrid model of Decision Tree and Logistic Regression; the second combines PCA with Logistic Regression and LogitBoost; the third uses a deep learning technique, CNN-VAE (Convolutional Neural Network with Variational Autoencoder); and the fourth, fifth, sixth and seventh experiments use Logistic Regression, LogitBoost, XGBoost and Random Forest, respectively. The first four experiments are hybrid models, and the rest use standalone techniques. The Orange dataset used in this study has 3,333 subscriber entries and 21 features. In addition, these experiments are compared with existing models developed in the literature. Performance was evaluated using accuracy, precision, recall, F-measure, confusion matrix, macro average and weighted average. The study reports better results than the earlier models. Random Forest performed best, achieving 95% accuracy, and all other experiments also produced very good results.
The study highlights the importance of data mining techniques for churn prediction and proposes a comparison framework in which standalone machine learning techniques, a deep learning technique and hybrid models with feature extraction are all applied to the same dataset, so that the performance of the techniques can be evaluated more fairly.
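The evaluation reports both macro and weighted averages, which differ when classes are imbalanced, as churn data usually are. Macro averaging gives each class equal weight, while weighted averaging weights each class by its support (its number of true instances). A standard-library sketch for the F-measure follows; the function names, labels and predictions are hypothetical.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """F1 score for every label that appears in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores

def macro_and_weighted_f1(y_true, y_pred):
    """Macro = unweighted mean over classes; weighted = support-weighted mean."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(f1)
    weighted = sum(f1[c] * support[c] for c in f1) / len(y_true)
    return macro, weighted

# Hypothetical labels: 1 = churner, 0 = retained subscriber.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(macro_and_weighted_f1(y_true, y_pred))
```

Here the churner class has F1 = 2/3 and the retained class F1 = 0.8, so the macro average (about 0.733) sits below the weighted average (0.75) because the larger class scores higher.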


2021
Author(s): Muayad Alali, Anoop Mayampurath, Yangyang Dai, Allison H. Bartlett

Objectives: Febrile neutropenia (FN) is a common condition in children receiving chemotherapy. Our goal in this study was to develop a model for predicting blood stream infection (BSI) and transfer to intensive care (TIC) at the time of presentation in pediatric cancer patients with FN. Methods: We conducted an observational cohort analysis of pediatric and adolescent cancer patients younger than 24 years admitted for fever and chemotherapy-induced neutropenia over a 7-year period. We excluded stem cell transplant recipients who developed FN after transplant, as well as febrile non-neutropenic episodes. The primary outcome was onset of BSI, as determined by a positive blood culture within 7 days of FN onset. The secondary outcome was TIC within 14 days of FN onset. Predictor variables included demographic, clinical and laboratory measures at initial presentation for FN. Data were divided into independent derivation (2009-2015) and prospective validation (2015-2016) cohorts. Prediction models were built for both outcomes using logistic regression and random forest and compared with the Hakim model. Performance was assessed using the area under the receiver operating characteristic curve (AUC). Results: A total of 505 FN episodes (FNEs) were identified in 230 patients. BSI was diagnosed in 106 (21%) episodes and TIC occurred in 56 (10.6%) episodes. The most common oncologic diagnosis with FN was acute lymphoblastic leukemia (ALL), and the highest rate of BSI was in patients with acute myeloid leukemia (AML). Patients with BSI had a higher maximum temperature, higher rates of prior BSI and a higher incidence of hypotension than patients without BSI. FN patients who were transferred to intensive care had a higher temperature and a higher incidence of hypotension at presentation than FN patients who were not. We compared 3 models: (1) random forest, (2) logistic regression and (3) the Hakim model.
The AUCs for BSI prediction were 0.79, 0.65 and 0.64 for models 1, 2 and 3, respectively (P < 0.05), and the AUCs for TIC prediction were 0.88, 0.76 and 0.65, respectively (P < 0.05). The random forest model demonstrated higher accuracy in predicting BSI and TIC and showed a negative predictive value (NPV) of 0.91 and 0.97 for BSI and TIC, respectively, at the best cutoff point as determined by Youden's index. The likelihood ratios (LRs) of the random forest model, which determine post-test probability, suggest potential utility for identifying patients at low risk of BSI and TIC (LR 0.24 and 0.12, respectively) and at high risk (LR 3.5 and 6.8, respectively). Conclusions: Our prediction model shows good diagnostic performance for both BSI and TIC in FN patients at the time of presentation. It can be used to identify individuals at low risk of BSI who may benefit from early discharge and a reduced length of stay, and it can also identify FN patients at high risk of complications who might benefit from more intensive therapy at presentation.
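Youden's index, J = sensitivity + specificity − 1, selects the cutoff. The likelihood ratios then follow from sensitivity and specificity at that cutoff: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. A likelihood ratio converts a pre-test probability into a post-test probability through odds. The sketch below is illustrative only. The scores are hypothetical, and using the 21% BSI rate as the pre-test probability is an assumption, not a calculation reported in the study.

```python
import math

def youden_best_cutoff(scores, labels):
    """Return (J, cutoff, sensitivity, specificity) maximising Youden's J.

    A case is called positive when its score >= cutoff. Both classes
    must be present in labels.
    """
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best

def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios (inf when undefined)."""
    lr_pos = sens / (1 - spec) if spec < 1 else math.inf
    lr_neg = (1 - sens) / spec if spec > 0 else math.inf
    return lr_pos, lr_neg

def post_test_probability(pre_test, lr):
    """Pre-test probability -> odds, times LR, back to probability."""
    post_odds = pre_test / (1 - pre_test) * lr
    return post_odds / (1 + post_odds)

print(youden_best_cutoff([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.5, 0.35, 1.0, 0.5)
print(round(post_test_probability(0.21, 0.24), 3))  # low-risk LR reported for BSI
print(round(post_test_probability(0.21, 3.5), 3))   # high-risk LR reported for BSI
```

Under the 21% assumption, an LR of 0.24 lowers the probability of BSI to about 6%, while an LR of 3.5 raises it to about 48%.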


2021, Vol 1 (1), pp. 21-32
Author(s): Mawaddah Harahap, Yusniar Lubis, Zakarias Situmorang

In digital marketing activities, data science (DS) plays an important role in understanding the performance of the marketing industry before digital marketing techniques are applied to product marketing. This is because each customer responds differently to each offer. Customer behavior also changes over time, because customers may have different needs in different situations. This paper focuses on presenting a business analysis that applies DS to explore behavior patterns and to predict how customers will respond to different offers. Exploratory data analysis is also applied to answer several business questions. The observations yield five customer groups, presented as visualizations, and the Random Forest Classifier model achieved the best prediction accuracy score of 91%, followed by the K Neighbors Classifier and Logistic Regression.


2021, Vol 1 (1), pp. 8-13
Author(s): Amir Mahmud Husein, Mawaddah Harahap

Customer churn is a phenomenon in which a company's customers stop buying from or interacting with it, so it is very important for companies, especially banks, to predict the likelihood of customer churn; the results can be used to support customer retention and as part of company strategy. This paper presents an analysis and prediction of customer churn using five different models: KNeighbors Classifier, Logistic Regression, Linear SVC, Random Tree Classifier and Random Forest Classifier. According to the test results, the Random Forest Classifier and KNeighbors Classifier models perform better than the other models, with accuracies of 86% and 84%, respectively. Feature engineering with the ANOVA and Chi-Square approaches has a significant effect on improving the performance of the prediction models.
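The abstract credits Chi-Square feature engineering with improving the models but does not describe the exact procedure. As a minimal sketch, assuming a Pearson chi-square test of independence between each categorical feature and the churn label, candidate features can be ranked by their chi-square statistic. The feature names and values below are hypothetical.

```python
from collections import Counter

def chi_square_statistic(feature, target):
    """Pearson chi-square statistic for a feature/target contingency table."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    f_counts = Counter(feature)
    t_counts = Counter(target)
    chi2 = 0.0
    for f in f_counts:
        for t in t_counts:
            # Expected count if feature and target were independent.
            expected = f_counts[f] * t_counts[t] / n
            observed = joint.get((f, t), 0)
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def rank_features(columns, target):
    """Feature names sorted from strongest to weakest association."""
    scores = {name: chi_square_statistic(values, target)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical bank customers: 1 = churned, 0 = stayed.
churn = [1, 1, 0, 0]
columns = {
    "has_complaint": ["yes", "yes", "no", "no"],  # perfectly associated
    "branch_region": ["x", "y", "x", "y"],        # independent of churn
}
print(rank_features(columns, churn))  # ['has_complaint', 'branch_region']
```

A larger statistic means the feature's distribution differs more between churners and non-churners; in practice the statistic would be compared with a chi-square distribution (or a top-k cutoff) before a feature is kept.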


2019, Vol 11 (18), pp. 5039
Author(s): Georgia Ellina, Garyfalos Papaschinopoulos, Basil Papadopoulos

As a variable system, Lake Kastoria is a good example of the pattern of Mediterranean shallow lakes. This study investigates the lake's eutrophication by using fuzzy logic to analyze the relations among the basic factors that affect this phenomenon. Since propositions can take values in the closed interval [0,1], many fuzzy implications could be used; in the proposed method, we investigate which implication is most appropriate for the studied water body. We propose a method for evaluating fuzzy implications by constructing triangular non-asymptotic fuzzy numbers for each of the studied parameters from experimental data. This is achieved with fuzzy estimators and fuzzy linear regression. In this way, we gain a better understanding of the mechanisms and functions that regulate this ecosystem.
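The two building blocks named in the abstract are triangular fuzzy numbers and fuzzy implications. The sketch below shows the standard triangular membership function and four well-known implication operators on [0,1]: Łukasiewicz, Kleene-Dienes, Reichenbach and Gödel. It does not reproduce the paper's construction of non-asymptotic fuzzy numbers from fuzzy estimators and fuzzy linear regression, and the choice of these four implications, the dictionary name and the numeric inputs are illustrative assumptions.

```python
def triangular_membership(x, a, b, c):
    """Membership of x in the triangular fuzzy number (a, b, c), a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge, reaches 1 at the peak b
    return (c - x) / (c - b)       # falling edge

# Hypothetical selection of standard fuzzy implications I(p, q) on [0, 1].
IMPLICATIONS = {
    "Lukasiewicz": lambda p, q: min(1.0, 1.0 - p + q),
    "Kleene-Dienes": lambda p, q: max(1.0 - p, q),
    "Reichenbach": lambda p, q: 1.0 - p + p * q,
    "Goedel": lambda p, q: 1.0 if p <= q else q,
}

# p, q: memberships of two measured lake parameters (hypothetical values).
p = triangular_membership(5, 2, 6, 10)   # 0.75
q = triangular_membership(8, 2, 6, 10)   # 0.5
for name, implication in IMPLICATIONS.items():
    print(name, implication(p, q))
```

Different implications give different truth values for the same rule "if p then q" (here 0.75, 0.5, 0.625 and 0.5), which is why the abstract's question of which implication best fits the observed data matters.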

