%svy_logistic_regression: A generic SAS® macro for simple and multiple logistic regression and creating quality publication-ready tables using survey or non-survey data

2019 ◽  
Author(s):  
Jacques Muthusi ◽  
Samuel Mwalili ◽  
Peter Young

Abstract Introduction: Reproducible research is increasingly gaining interest in the research community. Automating the production of research manuscript tables from statistical software can help increase the reproducibility of findings. Logistic regression is used in studying disease prevalence and associated factors in epidemiological studies and can be easily performed using widely available software including SAS, SUDAAN, Stata or R. However, output from these packages must be processed further to make it readily presentable. A number of procedures have been developed to organize regression output, though many of them suffer from limited flexibility, complexity, a lack of validation checks for input parameters, and an inability to incorporate survey design. Methods: We developed a SAS macro, %svy_logistic_regression, for fitting simple and multiple logistic regression models. The macro also creates quality publication-ready tables using survey or non-survey data, which aims to increase transparency of data analyses. It further significantly reduces turn-around time for conducting analysis and preparing output tables while also addressing the limitations of existing procedures. Results: We demonstrate the use of the macro in the analysis of the 2013-2014 National Health and Nutrition Examination Survey (NHANES), a complex survey designed to assess the health and nutritional status of adults and children in the United States. The output presented here is directly from the macro and is consistent with how regression results are often presented in the epidemiological and biomedical literature, with unadjusted and adjusted model results presented side by side. Conclusions: The SAS code presented in this macro is comprehensive and easy to follow, manipulate, and extend to other areas of interest. It can also be incorporated quickly by the statistician for immediate use. It is an especially valuable tool for generating quality, easy-to-review tables which can be incorporated directly in a publication.
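
As a rough illustration of the "unadjusted" column in such a table, the sketch below computes an odds ratio and Wald 95% confidence interval from a 2x2 table, which matches what a simple logistic regression with a single binary predictor would report. It is written in Python rather than SAS, ignores survey weights and design entirely, and is not the macro's code; the function name and the counts are hypothetical.

```python
import math

def unadjusted_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )

# Hypothetical counts, for illustration only (not NHANES data).
or_, lo, hi = unadjusted_odds_ratio(30, 70, 15, 85)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An adjusted odds ratio would instead come from a multiple logistic regression fitted with the other covariates included, which this sketch does not attempt.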

2020 ◽  
Author(s):  
Elizabeth Lerner Papautsky ◽  
Dylan R Rice ◽  
Hana Ghoneima ◽  
Anna Laura W McKowen ◽  
Nicholas Anderson ◽  
...  

BACKGROUND The COVID-19 pandemic has broader geographic spread and potentially longer lasting effects than those of previous disasters. Necessary preventive precautions against the transmission of COVID-19 have resulted in delays for in-person health care services, especially at the outset of the pandemic. OBJECTIVE Among a US sample, we examined the rates of delays (defined as cancellations and postponements) in health care at the outset of the pandemic and characterized the reasons for such delays. METHODS As part of an internet-based survey that was distributed on social media in April 2020, we asked a US-based convenience sample of 2570 participants about delays in their health care resulting from the COVID-19 pandemic. Participant demographics and self-reported worries about general health and the COVID-19 pandemic were explored as potential determinants of health care delays. In addition to all delays, we focused on the following three main types of delays, which were the primary outcomes in this study: dental, preventive, and diagnostic care delays. For each outcome, we used bivariate statistical tests (t tests and chi-square tests) and multiple logistic regression models to determine which factors were associated with health care delays. RESULTS The top reported barrier to receiving health care was the fear of SARS-CoV-2 infection (126/374, 33.6%). Almost half (1227/2570, 47.7%) of the participants reported experiencing health care delays. Among those who experienced health care delays and further clarified the type of delay they experienced (921/1227, 75.1%), the top three reported types of care that were affected by delays included dental (351/921, 38.1%), preventive (269/921, 29.2%), and diagnostic (151/921, 16.4%) care. 
The logistic regression models showed that age (P<.001), gender identity (P<.001), education (P=.007), and self-reported worry about general health (P<.001) were significantly associated with experiencing health care delays. Self-reported worry about general health was negatively related to experiencing delays in dental care. However, this predictor was positively associated with delays in diagnostic testing based on the logistic regression model. Additionally, age was positively associated with delays in diagnostic testing. No factors remained significant in the multiple logistic regression for delays in preventive care, and although there was a trend between race and delays (people of color experienced fewer delays than White participants), it was not significant (P=.06). CONCLUSIONS The lessons learned from the initial surge of COVID-19 cases can inform systemic mitigation strategies for potential future disruptions. This study addresses the demand side of health care delays by exploring the determinants of such delays. More research on health care delays during the pandemic is needed, including research on their short- and long-term impacts on patient-level outcomes such as mortality, morbidity, mental health, people’s quality of life, and the experience of pain.


2015 ◽  
Vol 32 (1) ◽  
pp. 288 ◽  
Author(s):  
Daniel Lapresa ◽  
Javier Arana ◽  
M.Teresa Anguera ◽  
J.Ignacio Pérez-Castellanos ◽  
Mario Amatria

This study shows how simple and multiple logistic regression can be used in observational methodology and, more specifically, in the fields of physical activity and sport. We demonstrate this in a study designed to determine whether three-a-side futsal or five-a-side futsal is more suited to the needs and potential of children aged 6 to 8 years. We constructed a multiple logistic regression model to analyze use of space (depth of play) and three simple logistic regression models to determine which game format is more likely to potentiate effective technical and tactical performance.


2014 ◽  
Vol 10 (2) ◽  
pp. 90-99 ◽  
Author(s):  
Darcy White ◽  
Rob Stephenson

As the rate of HIV infection continues to rise among men who have sex with men (MSM) in the United States, a focus of current prevention efforts is to encourage frequent HIV testing. Although levels of lifetime testing are high, low levels of routine testing among MSM are concerning. Using data from an online sample of 768 MSM, this article explores how perceptions of HIV prevalence are associated with HIV testing behavior. Ordinal logistic regression models were fitted to examine correlates of perceived prevalence, and binary logistic regression models were fitted to assess associations between perceived prevalence and HIV testing. The results indicate that perceptions of higher prevalence among more proximal reference groups such as friends and sex partners are associated with greater odds of HIV testing. Perceptions of HIV prevalence were nonuniform across the sample; these variations point to groups to target with strategic messaging and interventions to increase HIV testing among MSM.


2012 ◽  
Vol 39 (1-2) ◽  
pp. 63 ◽  
Author(s):  
S.M. Mostafa Kamal

This paper examines the factors affecting adolescent motherhood in Bangladesh using the 2007 Bangladesh Demographic and Health Survey data. Overall, 69.3 per cent of the married adolescents began childbearing. Among them, 56.4 per cent were already mothers and 12.9 per cent were pregnant for the first time. Of the adult married women age 20–49, 62.1 per cent initiated childbearing before age 19. The multiple logistic regression analyses revealed that women’s education, husband’s education, place of residence, ever use of contraceptive method, religion, wealth and region are important determinants of adolescent motherhood in Bangladesh.


2009 ◽  
Vol 110 (1) ◽  
pp. 89-94 ◽  
Author(s):  
Eric B. Rosero ◽  
Adebola O. Adesanya ◽  
Carlos H. Timaran ◽  
Girish P. Joshi

Background Malignant hyperthermia (MH) is a potentially fatal pharmacogenetic disorder with an estimated mortality of less than 5%. The purpose of this study was to evaluate the current incidence of MH and the predictors associated with in-hospital mortality in the United States. Methods The Nationwide Inpatient Sample, which is the largest all-payer inpatient database in the United States, was used to identify patients discharged with a diagnosis of MH during the years 2000-2005. The weighted exact Cochran-Armitage test and multivariate logistic regression analyses were used to assess trends in the incidence and risk-adjusted mortality from MH, taking into account the complex survey design. Results From 2000 to 2005, the number of cases of MH increased from 372 to 521 per year. The occurrence of MH increased from 10.2 to 13.3 patients per million hospital discharges (P = 0.001). Mortality rates from MH ranged from 6.5% in 2005 to 16.9% in 2001 (P < 0.0001). The median age of patients with MH was 39 (interquartile range, 23-54 yr). Only 17.8% of the patients were children, who had lower mortality than adults (0.7% vs. 14.1%, P < 0.0001). Logistic regression analyses revealed that risk-adjusted in-hospital mortality was associated with increasing age, female sex, comorbidity burden, source of admission to hospital, and geographic region of the United States. Conclusions The incidence of MH in the United States has increased in recent years. The in-hospital mortality from MH remains elevated and higher than previously reported. The results of this study should enable the identification of areas requiring increased focus in MH-related education.
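
For readers unfamiliar with the trend test named in the Methods, the following is a minimal Python sketch of the ordinary asymptotic Cochran-Armitage test for a linear trend in proportions across ordered groups (for example, calendar years). It is unweighted and not exact, so it is not the weighted exact version the study reports using and does not account for survey design; the function name, the default scores, and the counts in the example are hypothetical.

```python
import math

def cochran_armitage_trend(cases, totals, scores=None):
    """Asymptotic Cochran-Armitage test for trend in proportions.

    cases:  number of events in each ordered group
    totals: number of observations in each ordered group
    scores: numeric score per group (defaults to 0, 1, 2, ...)
    Returns (z, two_sided_p).
    """
    if scores is None:
        scores = list(range(len(cases)))
    if not (len(cases) == len(totals) == len(scores)):
        raise ValueError("cases, totals and scores must have equal length")
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled event proportion
    # Score-weighted sum of observed minus expected events.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    if variance <= 0:
        raise ValueError("variance is zero; the test is undefined")
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# Hypothetical counts for three ordered groups, for illustration only.
z, p = cochran_armitage_trend([5, 10, 20], [100, 100, 100])
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z indicates that the event proportion rises with the group score; a negative z indicates a decline.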


Stats ◽  
2021 ◽  
Vol 4 (3) ◽  
pp. 665-681
Author(s):  
Luca Insolia ◽  
Ana Kenney ◽  
Martina Calovi ◽  
Francesca Chiaromonte

High-dimensional classification studies have become widespread across various domains. The large dimensionality, coupled with the possible presence of data contamination, motivates the use of robust, sparse estimation methods to improve model interpretability and ensure the majority of observations agree with the underlying parametric model. In this study, we propose a robust and sparse estimator for logistic regression models, which simultaneously tackles the presence of outliers and/or irrelevant features. Specifically, we propose the use of L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem in a framework that allows one to pursue optimality guarantees. We use our proposal to investigate the main drivers of honey bee (Apis mellifera) loss through the annual winter loss survey data collected by the Pennsylvania State Beekeepers Association. Previous studies mainly focused on predictive performance; however, our approach produces a more interpretable classification model and provides evidence for several outlying observations within the survey data. We compare our proposal with existing heuristic methods and non-robust procedures, demonstrating its effectiveness. In addition to the application to honey bee loss, we present a simulation study where our proposal outperforms other methods across most performance measures and settings.


2020 ◽  
Author(s):  
Lisa M. Kuhns ◽  
Brookley Rogers ◽  
Katie Greeley ◽  
Abigail L. Muldoon ◽  
Niranjan Karnik ◽  
...  

Abstract Background: Despite recent reductions, youth substance use continues to be a concern in the United States. Structured primary care substance use screening among adolescents is recommended, but not widely implemented. The purpose of this study was to describe the distribution and characteristics of adolescent substance use screening in outpatient clinics in a large academic medical center and assess related factors (i.e., patient age, race/ethnicity, gender, and insurance type) to inform and improve the quality of substance use screening in practice. Methods: We abstracted a random sample of 127 records of patients aged 12-17 and coded clinical notes (e.g., converted open-ended notes to discrete values) to describe screening cases and related characteristics (e.g., which substances screened, how screened). We then analyzed descriptive patterns within the data to calculate screening rates and characteristics of screening, and used multiple logistic regression to identify related factors. Results: Among 127 records, rates of screening by providers were 72% (each) for common substances (alcohol, marijuana, tobacco). The primary method of screening was use of clinical mnemonic cues rather than standardized screening tools. A total of 6% of patients reported substance use during screening. Older age and racial/ethnic minority status were associated with provider screening in multiple logistic regression models. Conclusions: Despite recommendations, low rates of structured screening in primary care persist. Failure to use a standardized screening tool may contribute to low screening rates and biased screening. These findings may be used to inform implementation of standardized and structured screening in the clinical environment.

