The #MeToo Movement in the United States: Text Analysis of Early Twitter Conversations (Preprint)

Mapping Intimacies, 2019. DOI: 10.2196/preprints.13837

Author(s): Sepideh Modrek, Bozhidar Chakalov

Keyword(s): United States; Machine Learning; Sexual Assault; Sexual Harassment; Early Life; English Language; Life Experiences; The United States; Learning Methods; Machine Learning Methods

BACKGROUND The #MeToo movement sparked an international debate on sexual harassment, abuse, and assault and has taken many directions since its inception in October 2017. Much of the early conversation took place on public social media sites such as Twitter, where the hashtag movement began.

OBJECTIVE The aim of this study is to document, characterize, and quantify early public discourse of the #MeToo movement on Twitter in the United States. We focus on posts with public first-person revelations of sexual assault or abuse, including early life experiences of such events.

METHODS We purchased full tweets and associated metadata from the Twitter Premium application programming interface for October 14 to 21, 2017 (ie, the first week of the movement). We examined the content of novel English-language tweets containing the word “MeToo” posted from within the United States (N=11,935). We used machine learning methods, namely least absolute shrinkage and selection operator (LASSO) regression and support vector machine (SVM) models, to summarize and classify individual tweets that revealed sexual assault and abuse and those that revealed early life experiences of sexual assault and abuse.

RESULTS We found that the most predictive words formed a vivid archetype of revelations of sexual assault and abuse. We estimated that in the first week of the movement, 11% of novel English-language tweets containing the word “MeToo” revealed details of the poster’s own experience of sexual assault or abuse, and 5.8% revealed early life experiences of such events. Examining the demographic composition of users who posted such revelations, we found that white women aged 25 to 50 years were overrepresented relative to their share of Twitter users. Furthermore, the mass sharing of personal experiences of sexual assault and abuse had a large reach: an estimated 6 to 34 million Twitter users may have seen such a first-person revelation from someone they followed during the first week of the movement.

CONCLUSIONS These data illustrate that the revelations shared went beyond acknowledging experiences of sexual harassment and often included vivid and traumatic descriptions of early life experiences of assault and abuse. These findings and methods underscore the value of content analysis, supported by novel machine learning methods, for understanding how widespread the revelations were; their prevalence likely amplified the spread and salience of the #MeToo movement.
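The abstract names LASSO regression as the tool that surfaced the most predictive words. As a minimal, standard-library sketch of that idea (not the authors' code or preprocessing), the block below fits an L1-penalized logistic regression to binary bag-of-words features by proximal gradient descent (ISTA). The whitespace tokenizer, the penalty `lam=0.05`, the step size, the iteration count, and the six toy tweets are hypothetical; the SVM classifier the study also used is not shown.

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (real tweet preprocessing would do more)."""
    return text.lower().split()


def build_vocab(docs):
    """Sorted vocabulary of all tokens in the documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def vectorize(doc, vocab):
    """Binary bag-of-words: 1.0 if the vocabulary word occurs in the document."""
    present = set(tokenize(doc))
    return [1.0 if word in present else 0.0 for word in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward zero and returns 0.0 inside [-t, t]."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, step=0.5, iters=2000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum_j |w_j|; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        # Gradient step on the smooth loss, then L1 shrinkage on the word weights only.
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical toy tweets; label 1 = first-person revelation, 0 = other #MeToo content.
docs = [
    ("metoo it happened to me when i was young", 1),
    ("metoo happened to me at work", 1),
    ("metoo when i was a teen it happened", 1),
    ("metoo support all women speaking out", 0),
    ("metoo this movement matters", 0),
    ("metoo proud of everyone speaking out", 0),
]
vocab = build_vocab(text for text, _ in docs)
X = [vectorize(text, vocab) for text, _ in docs]
y = [label for _, label in docs]
w, b = fit_lasso_logistic(X, y)
# Words the L1 penalty kept (nonzero weight), with rounded coefficients.
selected = {word: round(wj, 2) for word, wj in zip(vocab, w) if wj != 0.0}
```

Because the soft-thresholding step sets weak coefficients to exactly zero, most words in the toy vocabulary should end up unselected, while “happened”, the only word present in every positive example and in no negative one, should receive the largest positive weight. Raising `lam` shrinks the selected vocabulary further; in practice it would be tuned, for example by cross-validation.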

The #MeToo Movement in the United States: Text Analysis of Early Twitter Conversations

Journal of Medical Internet Research, 2019, Vol 21 (9), pp. e13837. DOI: 10.2196/13837

Cited by ~2

Author(s): Sepideh Modrek, Bozhidar Chakalov

Machine Learning Methods to Identify Missed Cases of Bladder Cancer in Population-Based Registries

JCO Clinical Cancer Informatics, 2021, pp. 641-653. DOI: 10.1200/cci.20.00170

Author(s): Anne-Michelle Noone, Clara J. K. Lam, Angela B. Smith, Matthew E. Nielsen, Eric Boyd, ...

Keyword(s): United States; Machine Learning; Bladder Cancer; Cancer Incidence; Cancer Registries; The United States; Population Based; Learning Methods; Machine Learning Methods; Classification And Regression

PURPOSE Population-based incidence rates of bladder cancer may be underestimated, and accurate estimates are needed to understand the burden of bladder cancer in the United States. We developed a machine learning–based classifier to identify bladder cancer cases missed by cancer registries, evaluated its feasibility, and estimated the rate of potentially missed bladder cancer cases.

METHODS Data were from a population-based cohort of 37,940 bladder cancer cases aged 65 years and older in the SEER cancer registries linked with Medicare claims (2007-2013). Cases with other urologic cancers, abdominal cancers, and unrelated cancers were included as control groups, and a cohort of cancer-free controls was selected from the Medicare 5% random sample. We used five supervised machine learning methods to predict bladder cancer: classification and regression trees, random forest, logic regression, support vector machines, and logistic regression.

RESULTS Registry linkages yielded 37,940 bladder cancer cases and 766,303 cancer-free controls. Using health insurance claims, classification and regression trees distinguished bladder cancer cases from noncancer controls with very high accuracy (95%). Bacille Calmette-Guérin, cystectomy, and mitomycin were the most important predictors for identifying bladder cancer. We estimated that from 2007 to 2013, up to 3,300 bladder cancer cases in the United States may have been missed by the SEER registries, which would correspond to an average increase of 3.5% in the reported incidence rate.

CONCLUSION SEER cancer registries may miss bladder cancer cases during routine reporting. These missed cases can be identified by leveraging Medicare claims and data analytics, leading to more accurate estimates of bladder cancer incidence.
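Classification and regression trees, the best-performing method here, grow a tree by repeatedly choosing the split that most reduces class impurity. As a hedged illustration of a single such step (not the study's model), the sketch below scores binary claim indicators by weighted Gini impurity. The names `bcg`, `cystectomy`, and `mitomycin` echo the predictors named in the abstract, but `office_visit`, the toy rows, and the labels are hypothetical, and a full CART implementation would recurse, handle non-binary features, and prune.

```python
from collections import Counter


def gini(labels):
    """Gini impurity, 1 - sum_k p_k**2, of a list of class labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(rows, labels, features):
    """Pick the 0/1 feature whose split gives the lowest weighted Gini impurity.

    rows: list of dicts mapping feature name -> 0 or 1 (e.g. whether a claim code appears).
    Returns (feature, weighted_impurity).
    """
    n = len(rows)
    best = None
    for f in features:
        left = [lab for row, lab in zip(rows, labels) if row[f] == 1]
        right = [lab for row, lab in zip(rows, labels) if row[f] == 0]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or impurity < best[1]:
            best = (f, impurity)
    return best


# Hypothetical claims indicators; label 1 = bladder cancer case, 0 = control.
features = ["bcg", "cystectomy", "mitomycin", "office_visit"]
rows = [
    {"bcg": 1, "cystectomy": 0, "mitomycin": 1, "office_visit": 1},
    {"bcg": 1, "cystectomy": 1, "mitomycin": 0, "office_visit": 1},
    {"bcg": 1, "cystectomy": 0, "mitomycin": 0, "office_visit": 0},
    {"bcg": 0, "cystectomy": 0, "mitomycin": 1, "office_visit": 1},
    {"bcg": 0, "cystectomy": 0, "mitomycin": 0, "office_visit": 1},
    {"bcg": 0, "cystectomy": 0, "mitomycin": 0, "office_visit": 0},
]
labels = [1, 1, 1, 0, 0, 0]
split = best_binary_split(rows, labels, features)  # ("bcg", 0.0)
```

In these toy rows `bcg` separates cases from controls perfectly, so its weighted impurity is 0.0 and it is chosen as the first split; `mitomycin`, which also appears in a control, scores worse.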

Exploring the Relationship Between Chlorophyll-a and Other Water Quality Parameters by Using Machine Learning Methods: A Case Study of Lake Erie

2021. DOI: 10.5194/egusphere-egu21-14933

Author(s): Xue Hu, Jinhui Jeanne Huang, Yu Li

Keyword(s): Neural Network; United States; Machine Learning; Water Quality; Chlorophyll A; Lake Erie; The United States; Learning Methods; Machine Learning Methods; Input Variables

Chlorophyll a (CHLA) is a key water quality indicator of eutrophication in Lake Erie. To better predict CHLA concentration, this study divided Lake Erie into U.S. and Canadian portions along the national boundary and identified the input variables most relevant to CHLA in each. The most relevant variable was total phosphorus (TP) in the U.S. portion and total nitrogen (TN) in the Canadian portion, and the study attributes the excessive TP and TN content to industrial and agricultural pollution around the lake. Machine learning methods were used to model the water quality of the two portions separately, using Lake Erie data from 2000 to 2018 obtained from the Canadian Environment and Climate Change Agency. The methods included standard neural network (NN) models, simple recurrent neural network (SRN) models, backpropagation neural network (BPNN) models, jump connection neural network (JCNN) models, random forest (RF), and support vector machine (SVM). The study also identified the most suitable combination of input variables for CHLA prediction: TP, TN, DO, and T for the U.S. portion, and TP, TN, pH, and DO for the Canadian portion. Combining these results with the environmental protection policies of the United States and Canada, the study proposes recommendations for reducing pollutant content in Lake Erie, which will help reduce the risk of eutrophication.

Machine Learning Models of COVID-19 Cases in the United States: A Study of Initial Lockdown and Reopen Regimes

Applied Sciences, 2021, Vol 11 (23), pp. 11227. DOI: 10.3390/app112311227

Author(s): Arnold Kamis, Yudan Ding, Zhenzhen Qu, Chenchen Zhang

Keyword(s): United States; Machine Learning; Additive Model; Regression Tree; Predictor Variable; The United States; Predictor Variables; Future Research; Machine Learning Methods; Variance Explained

The purpose of this paper is to model COVID-19 cases in the United States from 13 March 2020 to 31 May 2020. Our novel contribution is a set of highly accurate models for two different regimes, lockdown and reopen, with each regime modeled separately. The predictor variables include aggregated individual movement as well as state population density, health rank, climate temperature, and political color. We apply a variety of machine learning methods to each regime: Multiple Regression, Ridge Regression, Elastic Net Regression, Generalized Additive Model, Gradient Boosted Machine, Regression Tree, Neural Network, and Random Forest. We find that Gradient Boosted Machines are the most accurate in both regimes. The best models explain 95.2% of the variance in the lockdown regime and 99.2% in the reopen regime. We describe how the influence of the predictor variables changes from one regime to the other. Notably, we identify individual movement, as tracked by GPS data, as an important predictor variable. We conclude that government lockdowns are an extremely important de-densification strategy. Implications and questions for future research are discussed.
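Variance explained is commonly computed as the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The short sketch below assumes that definition (the abstract does not say whether it was computed on training or held-out data) and scores each regime separately, mirroring the separate lockdown and reopen models; the case counts and predictions are hypothetical.

```python
from collections import defaultdict


def variance_explained(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot; observed values must not all be equal."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def variance_explained_by_regime(rows):
    """rows: (regime, observed, predicted) triples; returns {regime: R^2} per regime."""
    groups = defaultdict(lambda: ([], []))
    for regime, obs, pred in rows:
        groups[regime][0].append(obs)
        groups[regime][1].append(pred)
    return {regime: variance_explained(obs, pred) for regime, (obs, pred) in groups.items()}


# Hypothetical case counts and model predictions for two regimes.
rows = [
    ("lockdown", 10, 12), ("lockdown", 20, 18), ("lockdown", 30, 33), ("lockdown", 40, 37),
    ("reopen", 5, 5), ("reopen", 15, 14), ("reopen", 25, 26),
]
scores = variance_explained_by_regime(rows)  # lockdown ≈ 0.948, reopen ≈ 0.99
```

Scoring each regime on its own data means a model is judged only against the variance within that regime, which is consistent with modeling the regimes separately.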

Age of Migration and the Health Status of Older Latinos: Findings From the Health and Retirement Study

Innovation in Aging, 2021, Vol 5 (Supplement_1), p. 454. DOI: 10.1093/geroni/igab046.1752

Author(s): Blakelee Kemp, Marc Garcia

Keyword(s): United States; Health Outcomes; Activities Of Daily Living; Early Life; Daily Living; Life Experiences; The United States; Latino Men; Physical Health Outcomes; Retirement Study

Life course research emphasizes the importance of considering how early life experiences set individuals on specific trajectories over time, with implications across multiple health domains. The life experiences of older Latinos are shaped by where they were born and, for the foreign-born, when they immigrated to the United States. Prior research on the extent to which age of migration is associated with health has largely been limited to regional studies. To address this gap, we use nationally representative data from the Health and Retirement Study to examine associations between age of migration and multiple physical health outcomes among older Latinos residing in the United States. We examine 2010 prevalence and follow-up incidence to 2016 of cardiovascular issues, diabetes, difficulty with one or more activities of daily living (ADLs), difficulty with one or more instrumental activities of daily living (IADLs), cognitive issues, and mortality. Preliminary results indicate similar health profiles across Latinos who migrated in early life (<18 years), during adulthood (18-34 years), and during later adulthood (35+ years). Most health profiles were similar among Latino men and women, except for the prevalence and incidence of difficulty with at least one ADL. Latino women who migrated in later adulthood have a higher prevalence of ADL difficulty, and women who migrated early in life (<18 years) have a higher ADL incidence, than Latino men who migrated during the same life course periods. A greater understanding of how immigrant experiences influence physical health outcomes offers important insights for developing actionable and culturally appropriate social and health policies.

Evaluating early life risk factors for late life health outcome using probabilistic matching of early and late life cohorts.

2020. DOI: 10.1101/2020.07.21.20158857

Author(s): Adina Zeki Al Hazzouri, Katrina Kezios, Scott Zimmerman, Sebastian Calonico, M. Maria Glymour

Keyword(s): United States; Risk Factors; Health Outcome; Early Life; Life Experiences; Late Life; The United States; Matching Method; Synthetic Cohort

Research on Alzheimer's disease and related dementias (ADRD) is hampered by the absence of studies with prospective follow-up from early life through the older ages at which ADRD is diagnosed. This gap is notable in the United States and impedes research on life course determinants of ADRD and ADRD disparities, many of which appear attributable to early life experiences. In this simulation project, we evaluate a matching method that creates a synthetic life course cohort by merging early and late life cohorts on a set of harmonized covariates. We evaluate its performance under several causal scenarios for the association between exposure and outcome, and under varying characteristics of the matching method. In scenarios where a measure is available along all pathways linking exposure and outcome, the synthetic cohort performs well, with bias approaching null as the number of matching levels increases. This approach may create novel opportunities to rigorously evaluate early- and mid-life determinants of ADRD and ADRD disparities.
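The abstract does not spell out the matching algorithm. One simple reading, sketched below under that assumption, is stratified probabilistic matching: group early-life records by their harmonized covariate values and, for each late-life record, draw an early-life donor at random from the same stratum. Adding covariates makes the strata finer, which is one plausible interpretation of more matching levels. The field names (`sex`, `birth_decade`, `childhood_ses`, `dementia`) and the records are hypothetical.

```python
import random
from collections import defaultdict


def synthetic_cohort(early, late, covariates, seed=0):
    """Link each late-life record to a random early-life record in the same covariate stratum.

    early, late: lists of dicts; covariates: harmonized keys present in both cohorts.
    Late-life records with no early-life donor in their stratum are dropped.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for rec in early:
        donors[tuple(rec[c] for c in covariates)].append(rec)
    merged = []
    for rec in late:
        pool = donors.get(tuple(rec[c] for c in covariates))
        if pool:
            # Probabilistic step: draw one donor uniformly from the matching stratum.
            merged.append({**rng.choice(pool), **rec})  # late-life values win on shared keys
    return merged


# Hypothetical records: childhood_ses exists only in the early-life cohort,
# dementia only in the late-life cohort.
early = [
    {"sex": "F", "birth_decade": 1940, "childhood_ses": "low"},
    {"sex": "M", "birth_decade": 1940, "childhood_ses": "high"},
]
late = [
    {"sex": "F", "birth_decade": 1940, "dementia": 1},
    {"sex": "M", "birth_decade": 1950, "dementia": 0},
]
cohort = synthetic_cohort(early, late, covariates=["sex", "birth_decade"])
```

Here the second late-life record has no early-life donor born in the 1950s, so it is dropped; the merged record pairs an early-life exposure with a late-life outcome that no single real cohort observed together.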

Unlocking GOES: A Statistical Framework for Quantifying the Evolution of Convective Structure in Tropical Cyclones

Journal of Applied Meteorology and Climatology, 2020, Vol 59 (10), pp. 1671-1689. DOI: 10.1175/jamc-d-19-0286.1

Author(s): Trey McNeely, Ann B. Lee, Kimberly M. Wood, Dorit Hammerling

Keyword(s): Machine Learning; Tropical Cyclones; Satellite Imagery; The United States; Intensity Change; Complex Data; Learning Methods; Machine Learning Methods; Convective Structure; Structure Patterns

Tropical cyclones (TCs) rank among the most costly natural disasters in the United States, and accurate forecasts of track and intensity are critical for emergency response. Intensity guidance has improved steadily but slowly, because the processes that drive intensity change are not fully understood. Because most TCs develop far from land-based observing networks, geostationary satellite imagery is critical for monitoring these storms. However, these complex data can be challenging to analyze in real time, and off-the-shelf machine-learning algorithms have limited applicability on this front because of their “black box” structure. This study presents analytic tools that quantify convective structure patterns in infrared satellite imagery for over-ocean TCs, yielding lower-dimensional but rich representations that support analysis and visualization of how these patterns evolve during rapid intensity change. The proposed feature suite targets the global organization, radial structure, and bulk morphology (ORB) of TCs. By combining ORB and empirical orthogonal functions, we arrive at an interpretable and rich representation of convective structure patterns that serves as input to machine-learning methods. This study uses the logistic lasso, a penalized generalized linear model, to relate predictors to rapid intensity change. Using ORB alone, binary classifiers identifying the presence (vs absence) of such intensity-change events can achieve accuracy comparable to that of classifiers using environmental predictors alone, and a combined predictor set improves classification accuracy in some settings. More complex nonlinear machine-learning methods did not perform better than the linear logistic lasso model on the current data.
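Empirical orthogonal functions are the eigenvectors of the covariance matrix of a set of anomaly fields, equivalent to principal components. The sketch below is a standard-library illustration, not the study's pipeline: it computes the leading EOF of hypothetical radial profiles by power iteration and projects each profile onto it. The profile values, the iteration count, and the uniform starting vector are assumptions; the resulting scalar coefficients are the kind of low-dimensional inputs a logistic lasso could then take.

```python
import math


def anomalies(profiles):
    """Subtract the sample mean at each radius from every profile."""
    n, m = len(profiles), len(profiles[0])
    means = [sum(p[j] for p in profiles) / n for j in range(m)]
    return [[p[j] - means[j] for j in range(m)] for p in profiles]


def leading_eof(profiles, iters=200):
    """Leading EOF (unit-length eigenvector of the sample covariance) by power iteration.

    The sign of an EOF is arbitrary; this returns whichever sign the iteration reaches.
    """
    anoms = anomalies(profiles)
    n, m = len(anoms), len(anoms[0])
    cov = [[sum(a[i] * a[j] for a in anoms) / (n - 1) for j in range(m)] for i in range(m)]
    v = [1.0] * m  # assumed start; must not be orthogonal to the leading EOF
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate covariance or unlucky starting vector")
        v = [x / norm for x in w]
    return v


def eof_coefficients(profiles, eof):
    """Project each profile's anomaly onto the EOF: one scalar feature per profile."""
    return [sum(a_j * e_j for a_j, e_j in zip(a, eof)) for a in anomalies(profiles)]


# Hypothetical profiles (e.g. brightness temperature at three radii) varying along one pattern.
profiles = [[t, 2.0 * t, 2.0 * t] for t in (0.0, 1.0, 2.0, 3.0)]
eof = leading_eof(profiles)               # close to [1/3, 2/3, 2/3]
coeffs = eof_coefficients(profiles, eof)  # close to [-4.5, -1.5, 1.5, 4.5]
```

Power iteration keeps the sketch dependency-free; with real imagery one would normally take several EOFs at once from a singular value decomposition of the anomaly matrix.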

A gap in the sport management curriculum: An analysis of sexual harassment and sexual assault education in the United States

Journal of Hospitality Leisure Sport & Tourism Education, 2017, Vol 20, pp. 65-75. DOI: 10.1016/j.jhlste.2017.04.004

Cited by ~7

Author(s): Elizabeth A. Taylor, Robin Hardin

Keyword(s): United States; Sexual Assault; Sexual Harassment; Sport Management; The United States; Management Curriculum

Toward the use of neural networks for influenza prediction at multiple spatial resolutions

Science Advances, 2021, Vol 7 (25), pp. eabb1237. DOI: 10.1126/sciadv.abb1237

Author(s): Emily L. Aiken, Andre T. Nguyen, Cecile Viboud, Mauricio Santillana

Keyword(s): Neural Network; Machine Learning; Real Time; The United States; Network Approach; Internet Search; Learning Methods; Neural Network Approach; Machine Learning Methods; Search Data

Mitigating the effects of disease outbreaks with timely and effective interventions requires accurate real-time surveillance and forecasting of disease activity, but traditional health care–based surveillance systems are limited by inherent reporting delays. Machine learning methods have the potential to fill this temporal “data gap,” but work to date in this area has focused on relatively simple methods and coarse geographic resolutions (state level and above). We evaluate the predictive performance of a gated recurrent unit neural network approach in comparison with baseline machine learning methods for estimating influenza activity in the United States at the state and city levels, and we experiment with including real-time Internet search data. We find that the neural network approach improves upon baseline models at longer prediction horizons but is not improved by real-time Internet search data. We conduct a thorough analysis of feature importance in all models considered, for interpretability.
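The forecaster evaluated here is a gated recurrent unit (GRU) network. As a hedged, standard-library sketch of what one GRU layer computes (not the authors' architecture, features, or training setup), the code below applies the standard GRU update, with reset gate r, update gate z, and a tanh candidate state, over a short sequence of weekly inputs and reads out a one-step forecast with a linear layer. The weights are random placeholders and the input sequence and readout weights are hypothetical; in practice all weights would be learned, typically with a deep learning library.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def init_params(input_size, hidden_size, seed=0):
    """Random placeholder weights for the reset (r), update (z), and candidate (h) paths."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in ("r", "z", "h"):
        p["W_" + g] = mat(hidden_size, input_size)
        p["U_" + g] = mat(hidden_size, hidden_size)
        p["b_" + g] = [0.0] * hidden_size
    return p


def gru_step(x, h, p):
    """One GRU update: gates r and z, candidate state, then interpolation."""
    def pre(g, state):
        # W_g x + U_g state + b_g
        return [a + b + c for a, b, c in zip(matvec(p["W_" + g], x), matvec(p["U_" + g], state), p["b_" + g])]

    r = [sigmoid(v) for v in pre("r", h)]
    z = [sigmoid(v) for v in pre("z", h)]
    cand = [math.tanh(v) for v in pre("h", [ri * hi for ri, hi in zip(r, h)])]
    # z near 1 keeps the old state; z near 0 adopts the candidate.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(sequence, p, hidden_size):
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(x, h, p)
    return h


# Hypothetical weekly inputs: [influenza-like-illness rate, search-volume index].
weeks = [[1.2, 0.3], [1.5, 0.4], [2.1, 0.6], [2.8, 0.9]]
params = init_params(input_size=2, hidden_size=4, seed=0)
h_final = run_gru(weeks, params, hidden_size=4)
readout = [0.1, -0.2, 0.3, 0.05]  # placeholder linear readout weights
forecast = sum(wt * hv for wt, hv in zip(readout, h_final))
```

Because each new state is a convex combination of the previous state and a tanh output, the hidden state stays inside (-1, 1), which is part of what makes GRUs stable over long input sequences.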

The Dynamics of Political Incivility on Twitter

SAGE Open, 2020, Vol 10 (2), pp. 215824402091944. DOI: 10.1177/2158244020919447

Cited by ~1

Author(s): Yannis Theocharis, Pablo Barberá, Zoltán Fazekas, Sebastian Adrian Popa

Keyword(s): United States; Machine Learning; Political Communication; The United States; Time Span; Supervised Machine Learning; Machine Learning Methods; Policy Debates; Political Events; Descriptive Account

Online incivility and harassment in political communication have become an important topic of concern among politicians, journalists, and academics. This study provides a descriptive account of uncivil interactions between citizens and politicians on Twitter. We develop a conceptual framework for understanding the dynamics of incivility at three distinct levels: macro (temporal), meso (contextual), and micro (individual). Using longitudinal data on Twitter communication mentioning Members of Congress in the United States over a time span of more than a year, and relying on supervised machine learning methods and topic models, we offer new insights into the prevalence and dynamics of incivility toward legislators. We find that uncivil tweets consistently represent around 18% of all tweets mentioning legislators, with spikes that correspond to controversial policy debates and political events. Although we find evidence of coordinated attacks, our analysis reveals that the use of uncivil language is common to a large number of users.
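The roughly 18% figure is a share of classifier-labeled tweets, and the spikes are periods when that share rises. The sketch below assumes per-tweet binary labels from some upstream supervised classifier, computes the daily share of uncivil tweets, and flags spike days with a simple median-multiple rule. The rule, the `factor=1.5` threshold, and the dates are hypothetical and are not the authors' method.

```python
import statistics
from collections import defaultdict


def daily_incivility_share(tweets):
    """tweets: iterable of (day, is_uncivil) pairs, e.g. a classifier's labels. Returns {day: share}."""
    totals, uncivil = defaultdict(int), defaultdict(int)
    for day, flag in tweets:
        totals[day] += 1
        uncivil[day] += int(flag)
    return {day: uncivil[day] / totals[day] for day in sorted(totals)}


def flag_spikes(shares, factor=1.5):
    """Hypothetical rule: flag days whose share exceeds `factor` times the median daily share."""
    median = statistics.median(shares.values())
    return [day for day, share in shares.items() if share > factor * median]


# Hypothetical labels: 10 tweets per day with 2, 1, and 5 flagged as uncivil.
tweets = (
    [("2018-03-01", i < 2) for i in range(10)]
    + [("2018-03-02", i < 1) for i in range(10)]
    + [("2018-03-03", i < 5) for i in range(10)]
)
shares = daily_incivility_share(tweets)  # {"2018-03-01": 0.2, "2018-03-02": 0.1, "2018-03-03": 0.5}
spikes = flag_spikes(shares)             # ["2018-03-03"]
```

Using the median rather than the mean as the baseline keeps a single extreme day from raising the threshold it is compared against.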
