A Brief Analysis of Key Machine Learning Methods for Predicting Medicare Payments Related to Physical Therapy Practices in the United States

Shrirang A. Kulkarni; Jodh S. Pannu; Andriy V. Koval; Gabriel J. Merrin; Varadraj P. Gurupur; Ayan Nasir; Christian King; Thomas T. H. Wan

doi:10.3390/info12020057

A Brief Analysis of Key Machine Learning Methods for Predicting Medicare Payments Related to Physical Therapy Practices in the United States

Information ◽

10.3390/info12020057 ◽

2021 ◽

Vol 12 (2) ◽

pp. 57

Author(s):

Shrirang A. Kulkarni ◽

Jodh S. Pannu ◽

Andriy V. Koval ◽

Gabriel J. Merrin ◽

Varadraj P. Gurupur ◽

...

Keyword(s):

United States ◽

Machine Learning ◽

Physical Therapy ◽

Random Forest ◽

Generalized Additive Model ◽

Additive Model ◽

The United States ◽

Random Forest Regression ◽

Key Variables ◽

Machine Learning Models

Background and objectives: Machine learning approaches using random forest have been effectively used to provide decision support in health and medical informatics. This is especially true when predicting variables associated with Medicare reimbursements. However, more work is needed to analyze and predict data associated with reimbursements through Medicare and Medicaid services for physical therapy practices in the United States. The key objective of this study is to analyze different machine learning models to predict key variables associated with Medicare standardized payments for physical therapy practices in the United States. Materials and Methods: This study employs five methods, namely, multiple linear regression, decision tree regression, random forest regression, K-nearest neighbors, and linear generalized additive model (GAM), to predict key variables associated with Medicare payments for physical therapy practices in the United States. Results: The study described in this article adds to the body of knowledge on the effective use of random forest regression and linear generalized additive model in predicting Medicare standardized payments. Random forest regression may have an edge over the other methods employed for this purpose. Conclusions: The study provides useful insight into the comparative performance of the aforementioned methods, identifies a few intricate details associated with predicting Medicare costs, and finds that the linear generalized additive model and random forest regression are the most suitable machine learning models for predicting key variables associated with standardized Medicare payments.
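As a rough illustration of one of the five methods named in this abstract, the following is a minimal K-nearest neighbors regression sketch in plain Python. It is not the authors' implementation: the function name `knn_predict`, the two features, and all numbers are made-up assumptions chosen only to show the mechanics.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training points."""
    # Euclidean distance from the query point to every training row
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    ]
    # Keep the k closest rows and average their targets
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical toy data: (beneficiaries, services) -> payment amount
train_X = [(10, 100), (12, 110), (50, 600), (55, 640), (100, 1200)]
train_y = [1000.0, 1200.0, 6000.0, 6500.0, 12000.0]

prediction = knn_predict(train_X, train_y, (11, 105), k=2)
print(prediction)
```

In practice the features would normally be scaled before computing distances, since K-nearest neighbors is sensitive to the units of each variable.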


Explainable machine learning models to understand determinants of COVID-19 mortality in the United States

10.1101/2020.05.23.20110189 ◽

2020 ◽

Author(s):

Piyush Mathur ◽

Tavpritesh Sethi ◽

Anya Mathur ◽

Kamal Maheshwari ◽

Jacek B Cywinski ◽

...

Keyword(s):

United States ◽

Machine Learning ◽

Random Forest ◽

Population Density ◽

The United States ◽

Median Income ◽

Relative Importance ◽

Learning Models ◽

Categorical Models ◽

Machine Learning Models

Background: COVID-19 is now one of the leading causes of mortality amongst adults in the United States for the year 2020. Multiple epidemiological models have been built, often based on limited data, to understand the spread and impact of the pandemic. However, many geographic and local factors may have played an important role in higher morbidity and mortality in certain populations. Objective: The goal of this study was to develop machine learning models to understand the relative association of socioeconomic, demographic, travel, and health care characteristics of different states across the United States and COVID-19 mortality. Methods: Using multiple public data sets, 24 variables linked to COVID-19 disease were chosen to build the models. Two independent machine learning models using CatBoost regression and random forest were developed. SHAP feature importance and a Boruta algorithm were used to elucidate the relative importance of features on COVID-19 mortality in the United States. Results: Feature importances from both the categorical models, i.e., CatBoost and random forest, consistently showed that a high population density, number of nursing homes, number of nursing home beds and foreign travel were the strongest predictors of COVID-19 mortality. Percentage of African Americans amongst the population was also found to be of high importance in prediction of COVID-19 mortality whereas racial majority (primarily, Caucasian) was not. Both models fitted the data well with a training R² of 0.99 and 0.88 respectively. The effect of median age, median income, climate and disease mitigation measures on COVID-19 related mortality remained unclear. Conclusions: COVID-19 policy making will need to take population density, pre-existing medical care and state travel policies into account. Our models identified and quantified the relative importance of each of these for mortality predictions using machine learning.


Machine Learning Models of COVID-19 Cases in the United States: A Study of Initial Lockdown and Reopen Regimes

Applied Sciences ◽

10.3390/app112311227 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11227

Author(s):

Arnold Kamis ◽

Yudan Ding ◽

Zhenzhen Qu ◽

Chenchen Zhang

Keyword(s):

United States ◽

Machine Learning ◽

Additive Model ◽

Regression Tree ◽

Predictor Variable ◽

The United States ◽

Predictor Variables ◽

Future Research ◽

Machine Learning Methods ◽

Variance Explained

The purpose of this paper is to model the cases of COVID-19 in the United States from 13 March 2020 to 31 May 2020. Our novel contribution is that we have obtained highly accurate models focused on two different regimes, lockdown and reopen, modeling each regime separately. The predictor variables include aggregated individual movement as well as state population density, health rank, climate temperature, and political color. We apply a variety of machine learning methods to each regime: Multiple Regression, Ridge Regression, Elastic Net Regression, Generalized Additive Model, Gradient Boosted Machine, Regression Tree, Neural Network, and Random Forest. We discover that Gradient Boosted Machines are the most accurate in both regimes. The best models achieve a variance explained of 95.2% in the lockdown regime and 99.2% in the reopen regime. We describe the influence of the predictor variables as they change from regime to regime. Notably, we identify individual person movement, as tracked by GPS data, to be an important predictor variable. We conclude that government lockdowns are an extremely important de-densification strategy. Implications and questions for future research are discussed.


Explainable machine learning models to understand determinants of COVID-19 mortality in the United States (Preprint)

10.2196/preprints.20511 ◽

2020 ◽

Author(s):

Piyush Mathur ◽

Tavpritesh Sethi ◽

Anya Mathur ◽

Kamal Maheshwari ◽

Jacek Cywinski ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

The United States ◽

Training Data ◽

Mitigation Measures ◽

Data Set ◽

Administrative Action ◽

The Us ◽

Feature Importance ◽

Machine Learning Models

UNSTRUCTURED Introduction: The COVID-19 pandemic exhibits an uneven geographic spread, which leads to a locational mismatch of testing, mitigation measures and allocation of healthcare resources (human, equipment, and infrastructure).(1) In the absence of effective treatment, understanding and predicting the spread of COVID-19 is unquestionably valuable for public health and hospital authorities to plan for and manage the pandemic. While there have been many models developed to predict mortality, the authors sought to develop a machine learning prediction model that provides an estimate of the relative association of socioeconomic, demographic, travel, and health care characteristics of COVID-19 disease mortality among states in the United States (US). Methods: State-wise data were collected for all the features predicting COVID-19 mortality and for deriving feature importance (eTable 1 in the Supplement).(2) Key feature categories include demographic characteristics of the population, pre-existing healthcare utilization, travel, weather, socioeconomic variables, racial distribution and timing of disease mitigation measures (Figure 1 & 2). Two machine learning models, CatBoost regression and random forest, were trained independently to predict mortality in states on data partitioned into a training (80%) and test (20%) set.(3) Accuracy of the models was assessed by R² score. Importance of the features for prediction of mortality was calculated via two machine learning algorithms: SHAP (SHapley Additive exPlanations) calculated upon the CatBoost model, and Boruta, a random forest based method trained with 10,000 trees for calculating statistical significance (3-5). Results: Results are based on 60,604 total deaths in the US, as of April 30, 2020. Actual number of deaths ranged widely from 7 (Wyoming) to 18,909 (New York). The CatBoost regression model obtained an R² score of 0.99 on the training data set and 0.50 on the test set. The random forest model obtained an R² score of 0.88 on the training data set and 0.39 on the test set. Nine out of twenty variables were significantly higher than the maximum variable importance achieved by the shadow dataset in Boruta regression (Figure 2). Both models showed high feature importance for pre-existing high healthcare utilization, reflected in nursing home beds per capita and doctors per 100,000 population. Overall population characteristics such as total population and population density also correlated positively with the number of deaths. Notably, both models revealed a high positive correlation of deaths with percentage of African Americans. Direct flights from China, especially Wuhan, were also significant in both models as predictors of death, therefore reflecting early spread of the disease. Associations between deaths and weather patterns, hospital bed capacity, median age, and timing of administrative action to mitigate disease spread, such as the closure of educational institutions or stay-at-home orders, were not significant. The lack of some associations, e.g., with administrative action, may reflect delayed outcomes of interventions which were not yet reflected in the data. Discussion: COVID-19 disease has varied spread and mortality across communities amongst different states in the US. While our models show that high population density, pre-existing need for medical care and foreign travel may increase transmission and thus COVID-19 mortality, the effect of geographic, climate and racial disparities on COVID-19 related mortality is not clear. The purpose of our study was not state-wise accurate prediction of deaths in the US, which has already been challenging.(6) Location-based understanding of key determinants of COVID-19 mortality is critically needed for focused targeting of mitigation and control measures. Risk assessment-based understanding of determinants affecting COVID-19 outcomes, using a dynamic and scalable machine learning model such as the two proposed, can help guide resource management and policy framework.
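The gap between the training and test R² scores reported in this abstract (0.99 versus 0.50 for CatBoost) is easier to read with the definition in hand: R² = 1 − SS_res / SS_tot. The sketch below computes it in plain Python on made-up numbers. The helper name `r2_score` and the toy values are assumptions for illustration, not the study's data or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictions on data the model was fitted to
train_true = [1.0, 2.0, 3.0, 4.0]
train_pred = [1.1, 1.9, 3.0, 4.1]

# Hypothetical predictions on held-out data
test_true = [2.0, 4.0, 6.0]
test_pred = [3.0, 3.0, 7.0]

r2_train = r2_score(train_true, train_pred)
r2_test = r2_score(test_true, test_pred)
print(r2_train, r2_test)
```

A training score far above the test score, as in this toy case, is the usual sign that a model has fitted noise in the training set, which matters when only about fifty state-level rows are available.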


Analysis of Covid-19 in the United States using Machine Learning

Machine Learning and Applications An International Journal ◽

10.5121/mlaij.2021.8102 ◽

2020 ◽

Vol 8 (1) ◽

pp. 15-21

Author(s):

James G. Koomson

Keyword(s):

United States ◽

Machine Learning ◽

The United States ◽

Global Level ◽

Learning Models ◽

The World ◽

Day By Day ◽

Machine Learning Models

The unprecedented outbreak of COVID-19, also known as the coronavirus, has caused a pandemic like none seen before in this century. Its impact has been massive on a global level. The deadly virus has forced nations around the world to increase their efforts to fight its spread after the stress it has put on resources. With the number of new cases increasing day by day around the world, the objective of this paper is to contribute towards the analysis of the virus by leveraging machine learning models to understand its behavior and predict future patterns in the United States (US) based on data obtained from the COVID-19 Tracking Project.


Prediction for the Risk of Multiple Chronic Conditions among Working Population in the United States with Machine Learning Models

IEEE Open Journal of Engineering in Medicine and Biology ◽

10.1109/ojemb.2021.3117872 ◽

2021 ◽

pp. 1-1

Author(s):

Jingmei Yang ◽

Xinglong Ju ◽

Feng Liu ◽

Onur Asan ◽

Timothy S Church ◽

...

Keyword(s):

United States ◽

Machine Learning ◽

Chronic Conditions ◽

The United States ◽

Multiple Chronic Conditions ◽

Learning Models ◽

Working Population ◽

Machine Learning Models


Forecasting American COVID-19 Cases and Deaths through Machine Learning

10.1101/2020.08.13.20174631 ◽

2020 ◽

Author(s):

Anaiy Somalwar

Keyword(s):

United States ◽

Machine Learning ◽

Curve Fitting ◽

The United States ◽

Learning Model ◽

Learning Models ◽

Gaussian Curve ◽

Machine Learning Model ◽

Gradient Based ◽

Machine Learning Models

COVID-19 has become a great national security problem for the United States and many other countries, where public policy and healthcare decisions are based on several models that predict future COVID-19 deaths and cases. While the most commonly used models for COVID-19 include epidemiological models and Gaussian curve-fitting models, recent literature has indicated that these models could be improved by incorporating machine learning. However, within this research on potential machine learning models for COVID-19 forecasting, there has been a large emphasis on providing an array of different types of machine learning models rather than optimizing a single one. In this research, we suggest and optimize a linear machine learning model with a gradient-based optimizer for the prediction of future COVID-19 cases and deaths in the United States. We also suggest that a hybrid of a machine learning model for shorter range predictions and a Gaussian curve-fitting model or an epidemiological model for longer range predictions could greatly increase the accuracy of COVID-19 forecasting.
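The abstract does not say which gradient-based optimizer or features were used, so the following is only a generic sketch of the idea: a one-variable linear model fitted by plain batch gradient descent on mean squared error. The function name `fit_linear_gd`, the learning rate, the epoch count, and the case counts are all assumptions made for illustration.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical, exactly linear toy series: cases = 2 * day + 10
days = [0, 1, 2, 3, 4]
cases = [10, 12, 14, 16, 18]

w, b = fit_linear_gd(days, cases)
forecast = w * 5 + b  # short-range forecast for the next day
print(w, b, forecast)
```

A linear extrapolation like this is only plausible over a short horizon, which is consistent with the abstract's suggestion to hand longer-range predictions to a curve-fitting or epidemiological model.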


Forecasting American COVID-19 Cases and Deaths through Machine Learning (Preprint)

10.2196/preprints.23605 ◽

2020 ◽

Author(s):

Anaiy Somalwar

Keyword(s):

United States ◽

Machine Learning ◽

Curve Fitting ◽

The United States ◽

Learning Model ◽

Learning Models ◽

Gaussian Curve ◽

Machine Learning Model ◽

Gradient Based ◽

Machine Learning Models

UNSTRUCTURED COVID-19 has become a great national security problem for the United States and many other countries, where public policy and healthcare decisions are based on several models that predict future COVID-19 deaths and cases. While the most commonly used models for COVID-19 include epidemiological models and Gaussian curve-fitting models, recent literature has indicated that these models could be improved by incorporating machine learning. However, within this research on potential machine learning models for COVID-19 forecasting, there has been a large emphasis on providing an array of different types of machine learning models rather than optimizing a single one. In this research, we suggest and optimize a linear machine learning model with a gradient-based optimizer for the prediction of future COVID-19 cases and deaths in the United States. We also suggest that a hybrid of a machine learning model for shorter range predictions and a Gaussian curve-fitting model or an epidemiological model for longer range predictions could greatly increase the accuracy of COVID-19 forecasting. INTERNATIONAL REGISTERED REPORT RR2-https://doi.org/10.1101/2020.08.13.20174631


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life. With polarity, people can find out whether a given document has positive or negative sentiment, which helps them in choosing and making decisions. Sentiment analysis is usually done manually; therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis in the Amazon food review case. This research explores several feature extraction methods, namely Bag of Words, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector, with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation such as removing HTML tags, punctuation and special characters, and using Snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
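To make the TF-IDF step concrete, here is a minimal plain-Python sketch of the textbook weighting, term frequency multiplied by log(N / document frequency). It is an assumption-laden illustration, not the paper's pipeline: the function name `tf_idf` and the three toy reviews are made up, tokenization is a bare whitespace split, and library implementations typically add smoothing and normalization that this version omits.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-cleaned toy reviews
reviews = ["great tasty coffee", "bad stale coffee", "great snack"]
weights = tf_idf(reviews)
print(weights[0])
```

Terms that occur in fewer documents, such as "tasty" here, receive a higher weight than terms shared across reviews, such as "coffee". The resulting vectors are what a classifier like SVM would then consume.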


Descriptors of Cytochrome Inhibitors and Useful Machine Learning Based Methods for the Design of Safer Drugs

Pharmaceuticals ◽

10.3390/ph14050472 ◽

2021 ◽

Vol 14 (5) ◽

pp. 472

Author(s):

Tyler C. Beck ◽

Kyle R. Beck ◽

Jordan Morningstar ◽

Menny M. Benjamin ◽

Russell A. Norris

Keyword(s):

United States ◽

Machine Learning ◽

Drug Interactions ◽

The United States ◽

Structural Features ◽

Physiochemical Properties ◽

Drug Dosing ◽

Therapeutic Outcomes ◽

Cyp Inhibition ◽

Cyp Inhibitors

Roughly 2.8% of annual hospitalizations are a result of adverse drug interactions in the United States, representing more than 245,000 hospitalizations. Drug–drug interactions commonly arise from major cytochrome P450 (CYP) inhibition. Various approaches are routinely employed in order to reduce the incidence of adverse interactions, such as altering drug dosing schemes and/or minimizing the number of drugs prescribed; however, often, a reduction in the number of medications cannot be achieved without impacting therapeutic outcomes. Nearly 80% of drugs fail in development due to pharmacokinetic issues, outlining the importance of examining cytochrome interactions during preclinical drug design. In this review, we examined the physiochemical and structural properties of small molecule inhibitors of CYPs 3A4, 2D6, 2C19, 2C9, and 1A2. Although CYP inhibitors tend to have distinct physiochemical properties and structural features, these descriptors alone are insufficient to predict major cytochrome inhibition probability and affinity. Machine learning based in silico approaches may be employed as a more robust and accurate way of predicting CYP inhibition. These various approaches are highlighted in the review.


A novel framework for designing a multi-DoF prosthetic wrist control using machine learning

Scientific Reports ◽

10.1038/s41598-021-94449-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chinmay P. Swami ◽

Nicholas Lenhard ◽

Jiyeon Kang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Upper Limb ◽

Daily Living ◽

Machine Learning Algorithms ◽

Data Sets ◽

Random Forest Regression ◽

Prosthetic Devices ◽

Upper Limb Function ◽

The Neural Network

Prosthetic arms can significantly increase the upper limb function of individuals with upper limb loss; however, despite the development of various multi-DoF prosthetic arms, the rate of prosthesis abandonment is still high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller leveraging machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). The data were collected during ADL tasks from ten individuals wearing a wrist brace emulating the absence of wrist function. Using these data, a neural network classifies the movement and then random forest regression computes the desired velocity of the prosthetic wrist. The models were trained and tested with ADLs, and their robustness was tested using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (F-1 score of 99% for the classifier and Pearson’s correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework to develop an intuitive control for multi-DoF prosthetic devices.
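The two metrics quoted in this abstract can be computed as sketched below in plain Python. The functions `f1_score` and `pearson_r` and all toy values are illustrative assumptions. The F-1 version shown is the binary form, whereas a movement classifier with several classes would average per-class scores in a way the abstract does not specify.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """Binary F-1: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical classifier labels and regression outputs
f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(f1, r)
```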
