Discovering Cohorts of Pregnant Women From Social Media for Safety Surveillance and Analysis (Preprint)

Mapping Intimacies ◽

10.2196/preprints.8164 ◽

2017 ◽

Author(s):

Abeed Sarker ◽

Pramod Chandrashekar ◽

Arjun Magge ◽

Haitao Cai ◽

Ari Klein ◽

...

Keyword(s):

Social Media ◽

Pregnant Women ◽

Language Processing ◽

Classification System ◽

Future Research ◽

Sources Of Information ◽

Blind Test ◽

Other Information ◽

Positive Class ◽

The Many

BACKGROUND: Pregnancy exposure registries are the primary sources of information about the safety of maternal usage of medications during pregnancy. Such registries enroll pregnant women in a voluntary fashion early on in pregnancy and follow them until the end of pregnancy or longer to systematically collect information regarding specific pregnancy outcomes. Although the model of pregnancy registries has distinct advantages over other study designs, they are faced with numerous challenges and limitations such as low enrollment rate, high cost, and selection bias. OBJECTIVE: The primary objectives of this study were to systematically assess whether social media (Twitter) can be used to discover cohorts of pregnant women and to develop and deploy a natural language processing and machine learning pipeline for the automatic collection of cohort information. In addition, we also attempted to ascertain, in a preliminary fashion, what types of longitudinal information may potentially be mined from the collected cohort information. METHODS: Our discovery of pregnant women relies on detecting pregnancy-indicating tweets (PITs), which are statements posted by pregnant women regarding their pregnancies. We used a set of 14 patterns to first detect potential PITs. We manually annotated a sample of 14,156 of the retrieved user posts to distinguish real PITs from false positives and trained a supervised classification system to detect real PITs. We optimized the classification system via cross validation, with features and settings targeted toward optimizing precision for the positive class. For users identified to be posting real PITs via automatic classification, our pipeline collected all their available past and future posts from which other information (eg, medication usage and fetal outcomes) may be mined. RESULTS: Our rule-based PIT detection approach retrieved over 200,000 posts over a period of 18 months. Manual annotation agreement for three annotators was very high at kappa (κ)=.79. On a blind test set, the implemented classifier obtained an overall F1 score of 0.84 (0.88 for the pregnancy class and 0.68 for the nonpregnancy class). Precision for the pregnancy class was 0.93, and recall was 0.84. Feature analysis showed that the combination of dense and sparse vectors for classification achieved optimal performance. Employing the trained classifier resulted in the identification of 71,954 users from the collected posts. Over 250 million posts were retrieved for these users, which provided a multitude of longitudinal information about them. CONCLUSIONS: Social media sources such as Twitter can be used to identify large cohorts of pregnant women and to gather longitudinal information via automated processing of their postings. Considering the many drawbacks and limitations of pregnancy registries, social media mining may provide beneficial complementary information. Although the cohort sizes identified over social media are large, future research will have to assess the completeness of the information available through them.
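The pregnancy-class figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is illustrative and is not taken from the paper's pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the pregnancy class, as reported in the abstract.
pregnancy_f1 = f1_score(0.93, 0.84)
print(round(pregnancy_f1, 2))
```

Under that definition, a precision of 0.93 and a recall of 0.84 give a value that rounds to the 0.88 reported for the pregnancy class.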


From Dark to Light: The Many Shades of Sharing Misinformation Online

Media and Communication ◽

10.17645/mac.v9i1.3409 ◽

2021 ◽

Vol 9 (1) ◽

pp. 134-143 ◽

Cited By ~ 1

Author(s):

Miriam J. Metzger ◽

Andrew J. Flanagin ◽

Paul Mena ◽

Shan Jiang ◽

Christo Wilson

Keyword(s):

Social Networks ◽

Social Media ◽

Language Processing ◽

Large Scale ◽

Future Research ◽

Online Networks ◽

Wide Range ◽

Report Data ◽

Preliminary Study ◽

The Many

Research typically presumes that people believe misinformation and propagate it through their social networks. Yet, a wide range of motivations for sharing misinformation might impact its spread, as well as people’s belief in it. By examining research on motivations for sharing news information generally, and misinformation specifically, we derive a range of motivations that broaden current understandings of the sharing of misinformation to include factors that may to some extent mitigate the presumed dangers of misinformation for society. To illustrate the utility of our viewpoint, we report data from a preliminary study of people’s dis/belief reactions to misinformation shared on social media using natural language processing. Analyses of over 2.5 million comments demonstrate that misinformation on social media is often disbelieved. These insights are leveraged to propose directions for future research that incorporate a more inclusive understanding of the various motivations and strategies for sharing misinformation socially in large-scale online networks.


Categorizing Blogs as Information Sources for Libraries and Information Science

Encyclopedia of Information Science and Technology, Third Edition ◽

10.4018/978-1-4666-5888-2.ch475 ◽

2015 ◽

pp. 4833-4845

Author(s):

Mark-Shane Scale ◽

Anabel Quan-Haase

Keyword(s):

Social Media ◽

Information Needs ◽

Information Sources ◽

Information Science ◽

Information Source ◽

Future Research ◽

Collection Development ◽

Sources Of Information ◽

Additional Information ◽

Source Category

Blogs are important sources of information currently used in the work of professionals, institutions and academics. Nevertheless, traditional information needs and uses research has not yet discussed where blogs fit in the existing typologies of information sources. Blogs and other types of social media have several characteristics that blur the lines of distinction between traditional information source categories. This chapter brings this research problem to the fore. Not only do we examine why blogs do not neatly fit into existing information source categories, but we also deliberate the implications for libraries in terms of the need to consider blogs as an information source to be included in collection development. We discuss the opportunities and possibilities for blogs to be integrated into the collection development efforts of academic and public libraries to better serve patrons. In order to accommodate blogs and other types of social media as information sources, we propose the introduction of an additional information source category. We suggest new avenues of future research that investigate how blogs are being used to meet information needs in various social settings, such as corporations, health care and educational settings (e.g., higher education and schools). In this chapter, we develop a framework of how blogs may function as information sources to provide libraries with a better understanding of how blogs are integrated into the context of everyday information seeking. By grouping the ways in which people employ blogs to acquire information, we propose that blogs provide information sources along a continuum ranging from non-fiction to fictional information.


Proactively Discouraging Cyberbullying Activities

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38496 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1601-1607

Author(s):

Puneetha KR

Keyword(s):

Social Media ◽

Language Processing ◽

Leisure Activities ◽

Cyber Bullying ◽

Future Research ◽

Internet Users ◽

The Social ◽

Social Media Platforms ◽

Root Word ◽

Cyberbullying Detection

Abstract: Research into cyberbullying detection has increased in recent years, due in part to the proliferation of cyberbullying across social media and its detrimental effect on young people. Cyberbullying is one of the most common problems faced by internet users and makes the internet a vulnerable space, so social media platforms need some form of detection. Detecting bullies online as early as possible makes these platforms safer for users, so that the internet can serve as a platform for sharing information and for other leisure activities. Although there has been some research on implementing detection and prevention of cyberbullying, it is not completely feasible because of certain limitations. In this paper, the lexicon-based approach of NLTK's SentiWordNet is used to differentiate positive and negative words and produce results. Positive words are given values greater than zero, and negative words are given values less than zero. Lexicon-based systems utilize word lists and use the presence of words within the lists to detect cyberbullying. Lemmatization is used to find the root word. This paper essentially maps out the state of the art in cyberbullying detection research and serves as a resource for researchers to determine where best to direct their future research efforts in this field. Keywords: Abuse and crime involving computers, natural language processing, sentiment analysis, social networking
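The lexicon-based idea described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: it does not call NLTK or SentiWordNet, and the word scores, the lemma table, the threshold, and the names `TOY_LEXICON`, `TOY_LEMMAS`, `score_message`, and `is_flagged` are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical toy lexicon: positive words score above zero, negative words below.
TOY_LEXICON = {"kind": 0.6, "great": 0.8, "stupid": -0.7, "ugly": -0.6, "loser": -0.8}

# Hypothetical stand-in for a real lemmatizer: maps inflected forms to root words.
TOY_LEMMAS = {"losers": "loser", "kinder": "kind", "uglier": "ugly"}


def score_message(text: str) -> float:
    """Sum the lexicon scores of the root words found in the message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    roots = [TOY_LEMMAS.get(token, token) for token in tokens]
    return sum(TOY_LEXICON.get(root, 0.0) for root in roots)


def is_flagged(text: str, threshold: float = -0.5) -> bool:
    """Flag a message whose total score is at or below an assumed threshold."""
    return score_message(text) <= threshold


print(is_flagged("You are stupid losers"))
print(is_flagged("You are great and kind"))
```

A real system would replace the toy dictionaries with a full sentiment lexicon and a proper lemmatizer; the sketch only shows how word-level scores and root-word lookup combine into a message-level decision.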


A Review on the Detection of Offensive Content in Social Media Platforms

FUOYE Journal of Engineering and Technology ◽

10.46792/fuoyejet.v6i1.591 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Solomon Akinboro ◽

Oluwadamilola Adebusoye ◽

Akintoye Onamade

Keyword(s):

Social Media ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Hybrid Methods ◽

Future Research ◽

Learning Approaches ◽

Hybrid Approaches ◽

Use Of Social Media

Offensive content refers to messages which are socially unacceptable, including vulgar or derogatory messages. As the use of social media increases worldwide, social media administrators face the challenge of tackling offensive content to ensure clean, non-abusive and inoffensive conversations on the platforms they provide. This work organizes and describes techniques used in recent times for the automated detection of offensive language in social media content, providing a structured overview of previous approaches, including the algorithms, methods and main features used. Selection was from peer-reviewed articles on Google Scholar. Search terms included: profane words, natural language processing, multilingual context, hybrid methods for detecting profane words and deep learning approach for detecting profane words. Exclusions were made based on some criteria. The initial search returned 203 studies, of which only 40 met the inclusion criteria; 6 were on natural language processing, 6 studies were on deep learning approaches, 5 reports analysed hybrid approaches, multi-level classification/multi-lingual classification appeared in 13 reports, while 10 reports were on other related methods. The limitations of previous efforts to tackle the challenges of detecting offensive content are highlighted to aid future research in this area. Keywords: algorithm, offensive content, profane words, social media, texts


Social Media in State Governments

Advances in Electronic Government, Digital Divide, and Regional Development - E-Government Implementation and Practice in Developing Countries ◽

10.4018/978-1-4666-4090-0.ch006 ◽

2013 ◽

pp. 128-146 ◽

Cited By ~ 6

Author(s):

Rodrigo Sandoval-Almazan ◽

J. Ramon Gil-Garcia

Keyword(s):

Social Media ◽

Empirical Studies ◽

Future Research ◽

State Governments ◽

Government Officials ◽

Depth Analysis ◽

Other Information ◽

Social Media Tools ◽

State And Local ◽

Local Levels

More than other information technology, social media has the potential to improve communication, participation, and collaboration between governments and citizens. The widespread use of Facebook, YouTube, Twitter, and blogs among citizens has forced government officials to use these technologies to reach citizens, interact with them, and legitimate policies and public decisions. Despite this great potential and the relevance of social media in today’s society, there is still a relatively limited number of empirical studies that attempt to understand how governments are using these tools, particularly at the state and local levels. The main objective of this research is to understand how state governments are using Web 2.0 technologies and to provide some conceptual elements for future research in this area. Based on a longitudinal review of the 32 state Websites in Mexico and a more in-depth analysis of two cases, this chapter provides preliminary results on how state governments are using two of the most well known social media tools: Facebook and Twitter. The chapter highlights some differences and similarities among state governments. It also provides some initial ideas about how to develop a more comprehensive strategy for using social media tools and applications in state governments.


Location impact on source and linguistic features for information credibility of social media

Online Information Review ◽

10.1108/oir-03-2018-0087 ◽

2019 ◽

Vol 43 (1) ◽

pp. 89-112 ◽

Cited By ~ 4

Author(s):

Suliman Aladhadh ◽

Xiuzhen Zhang ◽

Mark Sanderson

Keyword(s):

Social Media ◽

Information Source ◽

Physical Distance ◽

Future Research ◽

Semantic Features ◽

Content Type ◽

Other Information ◽

Social Media Platforms ◽

The Impact ◽

The Relationship

Purpose: Social media platforms provide a source of information about events. However, this information may not be credible, and the distance between an information source and the event may impact on that credibility. Therefore, the purpose of this paper is to address an understanding of the relationship between sources, physical distance from that event and the impact on credibility in social media. Design/methodology/approach: In this paper, the authors focus on the impact of location on the distribution of content sources (informativeness and source) for different events, and identify the semantic features of the sources and the content of different credibility levels. Findings: The study found that source location impacts on the number of sources across different events. Location also impacts on the proportion of semantic features in social media content. Research limitations/implications: This study illustrated the influence of location on credibility in social media. The study provided an overview of the relationship between content types including semantic features, the source and event locations. However, the authors will include the findings of this study to build the credibility model in the future research. Practical implications: The results of this study provide a new understanding of reasons behind the overestimation problem in current credibility models when applied to different domains: such models need to be trained on data from the same place of event, as that can make the model more stable. Originality/value: This study investigates several events – including crisis, politics and entertainment – with steady methodology. This gives new insights about the distribution of sources, credibility and other information types within and outside the country of an event. Also, this study used the power of location to find alternative approaches to assess credibility in social media.


A Systematic Literature Review of Personality Trait Classification from Textual Content

Open Computer Science ◽

10.1515/comp-2020-0188 ◽

2020 ◽

Vol 10 (1) ◽

pp. 175-193

Author(s):

Hussain Ahmad ◽

Muhammad Zubair Asghar ◽

Alam Sher Khan ◽

Anam Habib

Keyword(s):

Social Media ◽

Literature Review ◽

Language Processing ◽

Systematic Literature Review ◽

Personality Trait ◽

Personal Data ◽

Research Area ◽

Future Research ◽

Detection Techniques ◽

Social Media Networks

Abstract: The day-to-day use of digital devices with Internet access, such as tablets and smartphones, has increased exponentially in recent years and this has had a consequent effect on the usage of the Internet and social media networks. When using social networks, people share personal data that is broadcast between users, which provides useful information for organizations. This means that characterizing users through their social media activity is an emerging research area in the field of Natural Language Processing (NLP) and this paper will present a review of how personality can be detected using online content. Approach: A systematic literature review identified 30 papers published between 2007 and 2019, while particular inclusion and exclusion criteria were used to select the most relevant articles. Outcomes: This review describes a variety of challenges and trends, as well as providing ideas for the direction of future research. In addition, personality trait identification and techniques were classified into different types, including deep learning, machine learning (ML) and semi-supervised/hybrid. Implications: This paper’s outcomes will not only facilitate insight into the various personality types and models but will also provide knowledge about the relevant detection techniques. Novelty: While prior studies have conducted literature reviews in the personality trait detection field, the systematic literature review in this paper provides specific answers to the proposed research questions. This is novel to this field as this particular type of study has not been conducted before.


On the Alert for Share Price Manipulation and Inadvertent Disclosure in Social Media Channels

Advances in Media, Entertainment, and the Arts - Handbook of Research on Deception, Fake News, and Misinformation Online ◽

10.4018/978-1-5225-8535-0.ch015 ◽

2019 ◽

pp. 265-280

Author(s):

Darren P. Ingram

Keyword(s):

Social Media ◽

Status Quo ◽

Share Price ◽

Future Research ◽

Exploratory Research ◽

Social Media Networks ◽

Price Manipulation ◽

Other Information ◽

The Status ◽

Media Channels

Social media networks offer companies a tremendous opportunity to disseminate financial and other information globally. They can be immensely useful for stakeholders and investors too. So far, their permitted use as a primary disclosure channel is restricted. Some risks also exist through inadvertent disclosure of information, as well as potential share price manipulation, yet are companies necessarily aware and armed to handle the risks? This chapter reports exploratory research into the attitudes of Nordic companies, in a region where social media primary disclosure is not permitted, to analyze the status quo and consider any risks that may prevail. Possible action changes and future research opportunities are also examined.


Social and Informational Affordances of Social Media in Music Learning and Teaching

The Oxford Handbook of Social Media and Music Learning ◽

10.1093/oxfordhb/9780190660772.013.25 ◽

2020 ◽

pp. 425-442

Author(s):

Anabel Quan-Haase

Keyword(s):

Social Media ◽

Music Education ◽

Future Research ◽

Positive Outcomes ◽

Music Learning ◽

Learning And Teaching ◽

The Social ◽

21St Century Learners ◽

The Many

This chapter examines the role of social media in music learning and teaching with the aim of discerning the affordances created by specific features and functions. While much scholarship has outlined the many merits and possibilities of including social media in formal and informal music education, not much is known about what aspects of social media lead to positive outcomes. Music education is defined broadly here and includes both learning about music and learning with the purpose of achieving classroom goals. The majority of research either tends to focus on single platforms or discusses social media more generally. The present chapter starts with a close look at the affordance concept, tracing its historical roots and problematizing its definition. The chapter then discusses how various affordances can contribute to different aspects of music education. Much of the literature on social media has examined the social affordances of social media and neglected to consider the informational affordances. The chapter argues that both social and informational affordances are important in investigations of social media for music education. Finally, conclusions are discussed for 21st-century learners, and the advantages of employing the affordances framework in studies of music education and social media are outlined. Directions for future research based on the affordances framework, which could examine what features and functions of social media are beneficial for music learning and teaching, are also examined, including discussion of a series of constraints placed on learners and teachers by the technology.


Social Media as an Emerging Data Resource for Epidemiologic Research: Characteristics of Regular and Nonregular Social Media Users in Nurses’ Health Study II

American Journal of Epidemiology ◽

10.1093/aje/kwz224 ◽

2019 ◽

Vol 189 (2) ◽

pp. 156-161 ◽

Cited By ~ 1

Author(s):

Eric S Kim ◽

Peter James ◽

Emily S Zevon ◽

Claudia Trudel-Fitzgerald ◽

Laura D Kubzansky ◽

...

Keyword(s):

Social Media ◽

Confidence Interval ◽

Psychosocial Factors ◽

Language Processing ◽

Odds Ratio ◽

Statistical Significance ◽

Effect Sizes ◽

Health Study ◽

Future Research ◽

Data Resource

Abstract: With advances in natural language processing and machine learning, researchers are leveraging social media as a low-cost, low-burden method for measuring various psychosocial factors. However, it is unclear whether information derived from social media is generalizable to broader populations, especially middle-aged and older adults. Using data on women aged 53–70 years from Nurses’ Health Study II (2017–2018; n = 49,045), we assessed differences in sociodemographic characteristics, health conditions, behaviors, and psychosocial factors between regular and nonregular users of Facebook (Facebook, Inc., Menlo Park, California). We evaluated effect sizes with phi (φ) coefficients (categorical data) or Cohen’s d (continuous data) and calculated odds ratios with 95% confidence intervals. While most comparisons between regular and nonregular users achieved statistical significance in this large sample, effect sizes were mostly “very small” (conventionally defined as φ or d <0.01) (e.g., optimism score: mean for regular users = 19 vs. mean for nonregular users = 19 (d = −0.03); physical activity: mean for regular users = 24 metabolic equivalent of task (MET)-hours/week vs. mean for nonregular users = 24 MET-hours/week (d = 0.01)). Some factors had slightly larger differences for regular users versus nonregular users (e.g., depression: 28% vs. 23% (φ = 0.05); odds ratio = 1.27 (95% confidence interval: 1.22, 1.33); obesity: 34% vs. 26% (φ = 0.07); odds ratio = 1.42 (95% confidence interval: 1.36, 1.48)). Results suggest that regular Facebook users were similar to nonregular users across sociodemographic and psychosocial factors, with modestly worse health regarding obesity and depressive symptoms. In future research, investigators should evaluate other demographic groups.
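As a rough arithmetic illustration of the odds ratios quoted above, the sketch below computes a crude (unadjusted) odds ratio from two proportions. The function name `odds_ratio` is illustrative. The inputs are the rounded percentages given in the abstract, so the results only approximate the reported values of 1.27 and 1.42; the assumption here is that the published figures were derived from unrounded data or a fitted model, which the abstract does not specify.

```python
def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Crude odds ratio from two proportions strictly between 0 and 1."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Rounded percentages from the abstract: regular vs. nonregular Facebook users.
depression_or = odds_ratio(0.28, 0.23)  # reported odds ratio: 1.27
obesity_or = odds_ratio(0.34, 0.26)     # reported odds ratio: 1.42
print(round(depression_or, 2), round(obesity_or, 2))
```

The crude values land close to, but slightly above, the reported odds ratios, which is consistent with the inputs being rounded percentages.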
