Variationist sociolinguistics and corpus-based variationist linguistics: overlap and cross-pollination potential

Benedikt Szmrecsanyi

doi:10.1017/cnj.2017.34

The Canadian Journal of Linguistics / La revue canadienne de linguistique ◽

10.1017/cnj.2017.34 ◽

2017 ◽

Vol 62 (4) ◽

pp. 685-701 ◽

Cited By ~ 7

Author(s):

Benedikt Szmrecsanyi

Keyword(s):

Corpus Linguistics ◽

Cross Pollination ◽

Research Areas ◽

Variation Patterns ◽

Variationist Sociolinguistics ◽

Usage Patterns ◽

Probabilistic Grammars ◽

Formally Trained ◽

Usage Data

Abstract: The paper surveys overlap between corpus linguistics and variationist sociolinguistics. Corpus linguistics is customarily defined as a methodology that bases claims about language on usage patterns in collections of naturalistic, authentic speech or text. Because this is what is typically done in variationist sociolinguistics work, I argue that variationist sociolinguists are by definition corpus linguists, though of course the reverse is not true: the variationist method entails more than merely analyzing usage data, and not all corpus analysts are interested in variation. But that being said, a considerable and arguably increasing number of corpus linguists not formally trained in variationist sociolinguistics are explicitly concerned with variation and engage in what I call corpus-based variationist linguistics (CVL). I first discuss what unites or divides work in CVL and in variationist sociolinguistics. In a plea to cross subdisciplinary boundaries, I subsequently identify three research areas where variationist sociolinguists may draw inspiration from work in CVL: conducting multi-variable research, paying more attention to probabilistic grammars, and taking more seriously the register-sensitivity of variation patterns.

Personality traits, adjectives and gender

Journal of Language and Discrimination ◽

10.1558/jld.40370 ◽

2020 ◽

Vol 4 (1) ◽

pp. 16-50

Author(s):

Heiko Motschenbacher ◽

Eka Roivainen

Keyword(s):

Personality Traits ◽

Personality Trait ◽

Corpus Linguistics ◽

American English ◽

Psychological Analysis ◽

Usage Patterns ◽

Google Books ◽

And Gender ◽

Interdisciplinary Study ◽

The Relationship

There have been linguistic studies on the gendering mechanisms of adjectives and psychological studies on the relationship between personality traits and gender, but the two fields have never entered into a dialogue on these issues. This article seeks to address this gap by presenting an interdisciplinary study that explores the gendering mechanisms associated with personality traits and personality trait-denoting adjectives. The findings of earlier work in this area and basic gendering mechanisms relevant to adjectives and personality traits are outlined. This is followed by a linguistic and a psychological analysis of the usage patterns of a set of personality trait adjectives. The linguistic section draws on corpus linguistics to explore the distribution of these adjectives with female, male and gender-neutral personal nouns in the Corpus of Contemporary American English. The psychological analysis relates the usage frequencies of personality trait adjectives with the nouns man, woman and person in the Google Books corpus to desirability ratings of the adjectives.

462 Redefining Positive Airway Pressure Adherence Phenotypes Utilizing Deep Neural Networks and Unsupervised Clustering

SLEEP ◽

10.1093/sleep/zsab072.461 ◽

2021 ◽

Vol 44 (Supplement_2) ◽

pp. A182-A182

Author(s):

Yoav Nygate ◽

Sam Rusk ◽

Chris Fernandez ◽

Nick Glattard ◽

Nathaniel Watson ◽

...

Keyword(s):

Airway Pressure ◽

Clustering Algorithm ◽

Positive Airway Pressure ◽

Treatment Success ◽

Unsupervised Clustering ◽

Additional Patient ◽

Apnea Hypopnea Index ◽

Obstructive Sleep ◽

Usage Patterns ◽

Usage Data

Abstract: Introduction: Improving positive airway pressure (PAP) adherence is crucial to obstructive sleep apnea (OSA) treatment success. We have previously shown the potential of utilizing Deep Neural Network (DNN) models to accurately predict future PAP usage, based on predefined compliance phenotypes, to enable early patient outreach and interventions. These phenotypes were limited, based solely on usage patterns. We propose an unsupervised learning methodology for redefining these adherence phenotypes in order to assist with the creation of more precise and personalized patient categorization. Methods: We trained a DNN model to predict PAP compliance based on daily usage patterns, where compliance was defined as the requirement for 4 hours of PAP usage a night on over 70% of the recorded nights. The DNN model was trained on N=14,000 patients with 455 days of daily PAP usage data. The latent dimension of the trained DNN model was used as a feature vector containing rich usage pattern information content associated with overall PAP compliance. Along with the 455 days of daily PAP usage data, our dataset included additional patient demographics such as age, sex, apnea-hypopnea index, and BMI. These parameters, along with the extracted usage patterns, were applied together as inputs to an unsupervised clustering algorithm. The clusters that emerged from the algorithm were then used as indicators for new PAP compliance phenotypes. Results: Two main clusters emerged: highly compliant and highly non-compliant. Furthermore, in the transition between the two main clusters, a sparse cluster of struggling patients emerged. This method allows for the continuous monitoring of patients as they transition from one cluster to the other. Conclusion: In this research, we have shown that by utilizing historical PAP usage patterns along with additional patient information we can identify PAP specific adherence phenotypes. Clinically, this allows focus of PAP adherence program resources to be targeted early on to patients susceptible to treatment non-adherence. Furthermore, the transition between the two main phenotypes can also indicate when personalized intervention is necessary to maximize treatment success and outcomes. Lastly, providers can transition patients in the highly non-compliant group more quickly to alternative therapies.
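
The compliance rule stated in the Methods (4 hours of PAP usage a night on over 70% of the recorded nights) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `is_compliant`, its parameters, and the reading of "4 hours" as "at least 4 hours" are assumptions made for illustration only.

```python
def is_compliant(nightly_hours, min_hours=4.0, min_fraction=0.70):
    """Hypothetical helper: apply the abstract's compliance rule to one patient.

    nightly_hours: hours of PAP use for each recorded night.
    A night counts when usage is at least min_hours (assumed reading of "4 hours").
    The patient is compliant when MORE than min_fraction of nights count.
    """
    if not nightly_hours:
        raise ValueError("no recorded nights")
    nights_met = sum(1 for hours in nightly_hours if hours >= min_hours)
    return nights_met / len(nightly_hours) > min_fraction
```

Under these assumptions, a patient who reaches the threshold on 8 of 10 nights is labeled compliant, while one who reaches it on exactly 7 of 10 is not, because the rule says "over 70%".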

Why very good in India might be pretty good in North America

International Journal of Corpus Linguistics ◽

10.1075/ijcl.17063.wag ◽

2019 ◽

Vol 24 (4) ◽

pp. 445-489

Author(s):

Susanne Wagner

Keyword(s):

North America ◽

Southeast Asia ◽

North American ◽

Corpus Linguistics ◽

Indian Subcontinent ◽

World Englishes ◽

Web Based ◽

Variationist Sociolinguistics ◽

The Future ◽

Trajectories Of Change

Abstract: Situated at the interface of several sub-disciplines (corpus linguistics, World Englishes, variationist sociolinguistics), this study investigates patterns of adjectival amplification (very good, so glad, pretty cool) in the Corpus of Global Web-Based English (GloWbE). It highlights regional distributions/preferences of amplifier-adjective 2-grams and the idiosyncratic status of certain bigrams according to their frequency status. Globally, clear regional preferences in amplification patterns as well as possible trends concerning change are identified. Regionally, L1 varieties contrast starkly with some regions (Africa, Indian subcontinent) but – maybe unexpectedly – not with others (Southeast Asia). The results offer insights into current trajectories of change concerning the investigated amplifiers in certain regions and 2-grams: North American varieties are leading a trend away from very towards so and possibly pretty in the future.

0636 Using AI To Predict Future CPAP Adherence and the Impact of Behavioral and Technical Interventions

SLEEP ◽

10.1093/sleep/zsaa056.632 ◽

2020 ◽

Vol 43 (Supplement_1) ◽

pp. A243-A243

Author(s):

W Hevener ◽

B Beine ◽

J Woodruff ◽

D Munafo ◽

C Fernandez ◽

...

Keyword(s):

Effect Size ◽

Statistical Significance ◽

Post Intervention ◽

Cpap Adherence ◽

Usage Patterns ◽

Patient Outreach ◽

Cohen's D ◽

The Impact ◽

Usage Data

Abstract: Introduction: Clinical management of CPAP adherence remains an ongoing challenge. Behavioral and technical interventions such as patient outreach, coaching, troubleshooting, and resupply may be deployed to positively impact adherence. Previous authors have described adherence phenotypes that retrospectively categorize patients by discrete usage patterns. We design an AI model that predictively categorizes patients into previously studied adherence phenotypes and analyzes the statistical significance and effect size of several types of interventions on subsequent CPAP adherence. Methods: We collected a cross-sectional cohort of subjects (N = 13,917) with 455 days of daily CPAP usage data acquired. Patient outreach notes and resupply data were temporally synchronized with daily CPAP usage. Each 30-day period of usage was categorized into one of four adherence phenotypes as defined by Aloia et al. (2008): Good Users, Variable Users, Occasional Attempters, and Non-Users. Cross-validation was used to train and evaluate a Recurrent Neural Network model for predicting future adherence phenotypes based on the dynamics of prior usage patterns. Two-sided 95% bootstrap confidence intervals and Cohen’s d statistic were used to analyze the significance and effect size of changes in usage behavior 30 days before and after administration of several resupply interventions. Results: The AI model predicted the next 30-day adherence phenotype with an average of 90% sensitivity, 96% specificity, 95% accuracy, and 0.83 Cohen’s Kappa. The AI model predicted the number of days of CPAP non-use, use under 4 hours, and use over 4 hours for the next 30 days with OLS Regression R-squared values of 0.94, 0.88, and 0.95 compared to ground truth. Ten resupply interventions were associated with statistically significant increases in adherence, and were ranked by adherence effect size using Cohen’s d. The most impactful were new cushions or masks, with a mean post-intervention CPAP adherence increase of 7-14% observed in Variable User, Occasional Attempter, and Non-User groups. Conclusion: The AI model applied past CPAP usage data to predict future adherence phenotypes and usage with high sensitivity and specificity. We identified resupply interventions that were associated with significant increases in adherence for struggling patients. This work demonstrates a novel application for AI to aid clinicians in maintaining CPAP adherence.
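
As a rough illustration of the effect-size measure named in the Methods, the sketch below computes Cohen's d for usage before and after an intervention. The abstract does not say which variant of Cohen's d was used, so the pooled-standard-deviation form for two groups is an assumption here, and the function name `cohens_d` and its arguments are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(before, after):
    """Hypothetical helper: Cohen's d with a pooled standard deviation.

    before, after: per-patient usage values (e.g. mean nightly hours) in the
    30 days before and after an intervention. Each needs at least two values,
    and the pooled variance must be non-zero.
    """
    n_before, n_after = len(before), len(after)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = (
        (n_before - 1) * variance(before) + (n_after - 1) * variance(after)
    ) / (n_before + n_after - 2)
    # Positive d means usage was higher after the intervention.
    return (mean(after) - mean(before)) / sqrt(pooled_var)
```

A paired-samples variant (dividing the mean change by the standard deviation of the per-patient changes) would be an equally plausible reading of a before/after design; the abstract does not settle which was applied.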

Wireless water usage monitoring system for home / small premises

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i2.pp704-713 ◽

2019 ◽

Vol 15 (2) ◽

pp. 704

Author(s):

W.L. Chee Wei ◽

A.S. Ab Ghafar ◽

N.N. Hairul Rozi ◽

F. A. Saparudin

Keyword(s):

Monitoring System ◽

Sensor Node ◽

Industrial Revolution ◽

Light Emitting Diode ◽

Graphical Form ◽

Sink Node ◽

Water Usage ◽

Usage Patterns ◽

Sd Card ◽

Usage Data

The fourth Industrial Revolution has led to tremendous change in industrial automation. A measurement system is an important tool in many fields because it gives access to essential data from the environment or a desired location. One of the essential measurement systems in industry, companies and homes is water usage monitoring. Water usage monitoring is the regular collection of information on the total amount of water drawn from sources during a given period. It enables a company or industry to understand water usage patterns and identify potential inefficiencies. For instance, a hotel may want to monitor its water usage on a per-room basis. Monitoring is also essential for setting water reduction targets. The paper presents the development of a wireless water usage monitoring system. The system consists of two nodes: a sensor node and a sink node. The sensor node collects the water usage data and sends it to the sink node. An ultrasonic sensor, a Light-Emitting Diode (LED) and a buzzer are attached to the sensor node as an alert system for the user in case of water wastage. The sink node receives data from the sensor node wirelessly, timestamps it by referring to a Real Time Clock (RTC), and stores it in the database. The database is attached to the sink node through a Secure Digital (SD) card module. Furthermore, a Graphical User Interface (GUI) displays the water usage data in graphical form for easier interpretation. The proposed wireless water usage monitoring system is suitable for homes and small premises.

Zur linguistischen Sinnsuche in- und ausserhalb von schriftsprachlichen Korpora. Eine Replik auf Wolfgang Teuberts Beitrag

Linguistik Online ◽

10.13092/lo.28.611 ◽

2006 ◽

Vol 28 (3) ◽

Author(s):

Raphael Berthele

Keyword(s):

Corpus Linguistics ◽

Value Added ◽

Oral Communication ◽

Metalinguistic Awareness ◽

The World ◽

Usage Patterns ◽

Concept Of Word ◽

Comprehensive Study

In this reply to Wolfgang Teubert's contribution, some of the central tenets of his position are questioned. The primacy of literacy for linguistic investigations as advocated by Teubert is challenged, since the most important function of language is oral communication, and most of the world's languages exist exclusively in oral use. Counter-evidence to Teubert's claim according to which oral languages do not develop metalinguistic awareness (as e.g. the concept of "word") is given. The author scrutinizes the value added by Teubert's programmatic suggestion of "hermeneutic corpus linguistics" to the linguistic disciplines, since the sole focus on "discourse objects" unnecessarily narrows down the scope of linguistic investigation. Moreover, it seems to lead to backgrounding and neglect of what linguistics is all about: the comprehensive study of the structures, meaning potentials and usage patterns of the languages and varieties of the world, and the study of their usage-based emergence and evolution.

MULTI-FACTORIAL PATTERNS OF ONLINE HOMEWORK USAGE IN ENGINEERING: A PILOT STUDY

Proceedings of the Canadian Engineering Education Association (CEEA) ◽

10.24908/pceea.vi0.14122 ◽

2020 ◽

Author(s):

Agnes D’Entremont ◽

Jonathan Verrett ◽

ShunFu Hu ◽

Juan Abelló ◽

Negar M. Harandi ◽

...

Keyword(s):

Pilot Study ◽

Mechanical Engineering ◽

Latent Profile Analysis ◽

Profile Analysis ◽

Online Homework ◽

Usage Patterns ◽

Second Year ◽

Usage Data ◽

Latent Profile

WeBWorK online homework usage data for a second-year, 130-student mechanical engineering course was analyzed using latent profile analysis (LPA) to identify student usage patterns and their relation to test/exam grades. Ten WeBWorK usage variables were used by LPA to identify three distinct student sub-groups having particular usage patterns. The resulting three sub-groups were found to have statistically significant differences in test/exam grades. Lower grades corresponded to fewer WeBWorK sessions and questions attempted, with a higher number of attempts and questions attempted per session; lower grades also corresponded to lower collaboration metrics and later first submissions of correct answers. These results might be used by instructors to inform and encourage online homework usage practices that are related to higher grades.

Persuasionsstrategien in deutschen rechtsorientierten Zeitungen.

Linguistik Online ◽

10.13092/lo.97.5597 ◽

2019 ◽

Vol 97 (4) ◽

pp. 89-109

Author(s):

Carolina Flinz

Keyword(s):

Corpus Linguistics ◽

Language Usage ◽

Different Types ◽

Usage Patterns ◽

The One

Corpus linguistics has often proved fruitful for examining different types of discourse, including refugee discourse. The aim of the paper is to show how patterns of language usage can be brought into focus with the help of techniques grounded in corpus linguistics, yielding information about themes and topoi. After showing what types of words (keywords, collocations) and what types of phenomena (topoi, metaphors and frames) are considered in the article, the focus shifts to the methodology and the criteria adopted. After the primary corpus (articles from right-oriented newspapers) and the comparison corpus (articles from Die Zeit) are introduced, the main results of the analysis are presented and reflected on.

The field-specific citation and usage patterns of book literature in the Book Citation Index

Research Evaluation ◽

10.1093/reseval/rvz037 ◽

2020 ◽

Vol 29 (2) ◽

pp. 203-214

Author(s):

Pei-Shan Chi

Keyword(s):

Social Sciences ◽

Citation Index ◽

Document Type ◽

Social Sciences And Humanities ◽

The Social ◽

Usage Patterns ◽

Data Source ◽

Social Sciences Citation Index ◽

Different Sources ◽

Usage Data

Abstract: The usage data provided by Web of Science Core Collection (WoS) implies the scholarly interest of researchers through full text accesses and record saves on the platform. The WoS usage count has been studied for journal papers alongside citations at different levels of journal, country, and field. To extend the results of the previous studies, this study explores the WoS usage counts for book literature in the Book Citation Index (BKCI) to determine the usefulness of the usage statistics provided by the new data source and their different patterns across fields as well as document types. The correlations between WoS citations and usage counts are from weak to moderate in six selected fields. Edited books have stronger correlations between the two metrics than the other two document type groups. Usage data of aggregated book volumes in the sciences correlate with citations significantly and show higher utilization rates than citations. Their usage counts on the same platform are the supplement of WoS citations in the fields. In contrast, book publications in the social sciences and humanities (SSH) present a different pattern of their usage to reduce its ability to coordinate citations. In addition, the low usage of books in SSH may indicate the limited access of the BKCI-SSH and probably lower effectiveness of its usage data compared to the Social Sciences Citation Index (SSCI). However, the further investigation of altmetric usage metrics from different sources confirms an overall lower usage for books in the social sciences than in the sciences.

Extracting Usage Patterns from Power Usage Data of Homes' Appliances in Smart Home using Big Data Platform

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2016040103 ◽

2016 ◽

Vol 11 (2) ◽

pp. 39-50 ◽

Cited By ~ 10

Author(s):

Ali Reza Honarvar ◽

Ashkan Sami

Keyword(s):

Big Data ◽

Smart City ◽

Smart Home ◽

Pattern Mining ◽

Greenhouse Gas Emission ◽

Smart Cities ◽

Sequence Pattern ◽

Sequence Of Events ◽

Usage Patterns ◽

Usage Data

Advances in sensing techniques and IoT have made it possible to gain precise information about devices in smart home and smart city environments. Analyzing data from sensors and devices may help us develop friendlier systems for smart cities and smart homes. Sequence pattern mining extracts interesting sequence patterns from data. Electricity usage does follow a sequence of events. In this study the authors investigate this issue and extract valuable sequence patterns from a real appliance power usage dataset using PrefixSpan. The experiments in this research are implemented on Spark, a novel distributed and parallel big data processing platform, on two different clusters, and interesting findings are obtained. These findings show the importance of extracting sequence patterns from power usage data for various applications, such as decreasing CO2 and greenhouse gas emissions by decreasing electricity usage. The findings also show the need to bring big data platforms to bear on processing this kind of data, which is captured in smart homes and smart cities.
