A simple technique for improving the quality of parameter estimates in learning hierarchy validation studies

Psychometrika ◽  
1980 ◽  
Vol 45 (2) ◽  
pp. 269-271 ◽  
Author(s):  
Alan R. Barton

SLEEP ◽  
2020 ◽  
Author(s):  
Luca Menghini ◽  
Nicola Cellini ◽  
Aimee Goldstone ◽  
Fiona C Baker ◽  
Massimiliano de Zambotti

Abstract: Sleep-tracking devices, particularly within the consumer sleep technology (CST) space, are increasingly used in both research and clinical settings, providing new opportunities for large-scale data collection in highly ecological conditions. Owing to the fast pace of the CST industry and the lack of a standardized framework for evaluating the performance of sleep trackers, their accuracy and reliability in measuring sleep remain largely unknown. Here, we provide a step-by-step analytical framework for evaluating the performance of sleep trackers (including standard actigraphy) against gold-standard polysomnography (PSG) or other reference methods. The analytical guidelines are based on recent recommendations for evaluating and using CST from our group and others (de Zambotti and colleagues; Depner and colleagues), and cover raw data organization as well as critical analytical procedures, including discrepancy analysis, Bland–Altman plots, and epoch-by-epoch analysis. Analytical steps are accompanied by open-source R functions (available at https://sri-human-sleep.github.io/sleep-trackers-performance/AnalyticalPipeline_v1.0.0.html). In addition, an empirical sample dataset is used to describe and discuss the main outcomes of the proposed pipeline. The guidelines and the accompanying functions aim to standardize the testing of CST performance, not only to increase the replicability of validation studies but also to provide ready-to-use tools to researchers and clinicians. All in all, this work can help to increase the efficiency, interpretation, and quality of validation studies, and to improve the informed adoption of CST in research and clinical settings.
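The discrepancy and epoch-by-epoch steps of such a pipeline reduce to a small amount of arithmetic. The following is a minimal Python sketch (not the authors' R functions; the data, function names, and scoring labels are invented for illustration) of two core computations: Bland–Altman bias with 95% limits of agreement for summary sleep measures, and epoch-by-epoch accuracy, sensitivity, and specificity against PSG.

```python
# Illustrative sketch of two core device-vs-PSG performance metrics.
import statistics

def bland_altman(device, reference):
    """Bias and 95% limits of agreement between paired measurements."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def epoch_by_epoch(device_epochs, psg_epochs, positive="sleep"):
    """Accuracy, sensitivity (sleep), specificity (wake) across scored epochs."""
    tp = fp = tn = fn = 0
    for d, p in zip(device_epochs, psg_epochs):
        if p == positive:
            tp += d == positive
            fn += d != positive
        else:
            tn += d != positive
            fp += d == positive
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Invented example: total sleep time (minutes) across five nights
device_tst = [400, 380, 420, 390, 410]
psg_tst = [390, 385, 400, 395, 405]
bias, loa = bland_altman(device_tst, psg_tst)

# Invented example: four 30-second epochs scored by device and PSG
metrics = epoch_by_epoch(["sleep", "sleep", "wake", "sleep"],
                         ["sleep", "wake", "wake", "sleep"])
```

A positive bias here would indicate the device overestimates total sleep time relative to PSG; the limits of agreement bound the discrepancy expected for most nights.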


2019 ◽  
Vol 11 (3) ◽  
pp. 481-492 ◽  
Author(s):  
Amir Ghiasi ◽  
Grigorios Fountas ◽  
Panagiotis Anastasopoulos ◽  
Fred Mannering

Purpose: Unlike many other quantitative characteristics used to determine higher education rankings, opinion-based peer assessment scores and the factors that may influence them are not well understood. Using peer scores of US colleges of engineering as reported annually in US News and World Report (USNews) rankings, the purpose of this paper is to provide some insight into peer assessments by statistically identifying the factors that influence them.
Design/methodology/approach: With highly detailed data, a random parameters linear regression is estimated to statistically identify the factors determining a college of engineering's average USNews peer assessment score.
Findings: The findings show that a wide variety of college- and university-specific attributes influence average peer impressions of a university's college of engineering, including the size of the faculty, the quality of admitted students, and the quality of the faculty as measured by their citation data, among other factors.
Originality/value: The paper demonstrates that average peer assessment scores can be readily and accurately predicted from observable data on the college of engineering and the university as a whole. In addition, the individual parameter estimates from the statistical modeling provide insight into how specific college and university attributes can help guide policies to improve an individual college's average peer assessment score and its overall ranking.
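The paper's model is a random parameters linear regression, in which coefficients may vary across observations. As a much simpler fixed-parameter analogue, the sketch below fits a one-predictor least-squares regression of peer score on a single hypothetical attribute (mean citations per faculty member); all numbers and variable names are invented for illustration, not taken from the paper.

```python
# Simple closed-form OLS fit: peer score regressed on one attribute.
def ols_fit(x, y):
    """Simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

citations = [2.1, 3.5, 4.2, 5.0]   # hypothetical mean citations per faculty
peer_score = [2.4, 3.1, 3.6, 4.3]  # hypothetical USNews-style 1-5 peer scores

intercept, slope = ols_fit(citations, peer_score)
predicted = [intercept + slope * c for c in citations]
```

A positive estimated slope would correspond to the paper's finding that faculty citation quality raises average peer impressions; the random parameters approach additionally lets such a slope differ across colleges.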


2020 ◽  
pp. 1-2
Author(s):  
A. Geerinck ◽  
M. Locquet ◽  
J.-Y. Reginster

The Sarcopenia Quality of Life (SarQoL®) questionnaire was developed in 2015 to fill the need for a specific instrument to measure quality of life in sarcopenia. Since then, its validity and reliability have been evaluated in multiple languages, and it is now available in 30 language-specific versions. In multiple validation studies, the SarQoL® has demonstrated its ability to discriminate between sarcopenic and non-sarcopenic subjects when diagnosed according to the EWGSOP criteria (1). However, these criteria have now been updated, and the discriminative power of the SarQoL® questionnaire should be reaffirmed using the EWGSOP2 criteria (2). The analysis presented below aims to establish whether the SarQoL® questionnaire can discriminate between sarcopenic, probably sarcopenic (low grip strength in the EWGSOP2 algorithm) and non-sarcopenic participants.
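Discrimination between three groups on an ordinal quality-of-life score is commonly tested with a Kruskal-Wallis test. Below is a hedged, self-contained Python sketch of that kind of analysis; the group sizes and SarQoL-like scores are invented, and this is not the analysis code from the study.

```python
# Kruskal-Wallis H statistic for comparing scores across k groups.
def ranks(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied 1-based ranks
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_wallis_h(groups):
    pooled = [v for g in groups for v in g]
    r = ranks(pooled)
    n = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rg = r[start:start + len(g)]
        start += len(g)
        h += len(g) * (sum(rg) / len(g) - (n + 1) / 2) ** 2
    return 12 / (n * (n + 1)) * h

# Invented SarQoL-like scores (0-100) for the three EWGSOP2 groups
sarcopenic = [45.0, 50.2, 48.1]
probable = [60.5, 58.3, 62.0]
non_sarcopenic = [75.4, 80.1, 78.2]
H = kruskal_wallis_h([sarcopenic, probable, non_sarcopenic])
```

With two degrees of freedom, an H above the chi-squared critical value of about 5.99 would indicate a significant difference in quality-of-life scores across the three groups at the 0.05 level.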


2020 ◽  
Vol 102-B (12) ◽  
pp. 1599-1607
Author(s):  
Ben A. Marson ◽  
Simon Craxford ◽  
Sandeep R. Deshmukh ◽  
Douglas J. C. Grindlay ◽  
Joseph C. Manning ◽  
...  

Aims: This study evaluates the quality of patient-reported outcome measures (PROMs) reported in childhood fracture trials and recommends outcome measures for assessing and reporting physical function, functional capacity, and quality of life, using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) standards.
Methods: A Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-compliant systematic review of OVID Medline, Embase, and Cochrane CENTRAL was performed to identify all PROMs reported in trials. A search of OVID Medline, Embase, and PsycINFO was performed to identify all PROMs with validation studies in childhood fractures. Development studies were identified through hand-searching. Data extraction was undertaken by two reviewers. Study quality and risk of bias were evaluated according to COSMIN guidelines and recorded on standardized checklists.
Results: Searches yielded 13,672 studies, which were screened to identify 124 trials and two validation studies. Review of the 124 trials identified 16 reported PROMs, of which two had validation studies. The development papers were retrieved for all PROMs. The quality of the original development studies was adequate for the Patient-Reported Outcomes Measurement Information System (PROMIS) Mobility and Upper Extremity measures and doubtful for the EuroQol Five Dimension Youth questionnaire (EQ-5D-Y). All other PROMs were found to have inadequate development studies. No content validity studies were identified. Reviewer-rated content validity was acceptable for six PROMs: the Activity Scale for Kids (ASK), the Childhood Health Assessment Questionnaire, PROMIS Upper Extremity, PROMIS Mobility, EQ-5D-Y, and the Pediatric Quality of Life Inventory (PedsQL4.0). The Modified Disabilities of the Arm, Shoulder, and Hand (DASH) questionnaire was shown to have indeterminate reliability and convergent validity in one study, and PROMIS Upper Extremity had insufficient convergent validity in one study.
Conclusion: There is insufficient evidence to strongly recommend the use of any single PROM to assess and report physical function or quality of life following childhood fractures. There is a need to conduct validation studies for PROMs. In the absence of these studies, we cautiously recommend the use of the PROMIS measures or ASK-P for physical function and the PedsQL4.0 or EQ-5D-Y for quality of life. Cite this article: Bone Joint J 2020;102-B(12):1599–1607.


Author(s):  
Angela Lisibach ◽  
Valérie Benelli ◽  
Marco Giacomo Ceppi ◽  
Karin Waldner-Knogler ◽  
Chantal Csajka ◽  
...  

Abstract: Purpose: Older people are at risk of anticholinergic side effects due to changes affecting drug elimination and a higher sensitivity to drugs' side effects. Anticholinergic burden scales (ABS) were developed to quantify the anticholinergic drug burden (ADB). We aim to identify all published ABS, to compare them systematically, and to evaluate their associations with clinical outcomes.
Methods: We conducted a literature search in MEDLINE and EMBASE to identify all published ABS, and a Web of Science (WoS) citation analysis to track validation studies reporting clinical outcomes. The quality of the ABS was assessed using an adapted AGREE II tool. For the validation studies, we used the Newcastle-Ottawa Scale and the Cochrane RoB 2.0 tool. The validation studies were categorized into six evidence levels, based on the propositions of the Oxford Centre for Evidence-Based Medicine, with respect to their quality. At least two researchers independently performed screening and quality assessments.
Results: Out of 1,297 records, we identified 19 ABS and 104 validation studies. Despite differences in quality, all ABS were recommended for use. The Anticholinergic Cognitive Burden (ACB) scale and the German Anticholinergic Burden Scale (GABS) achieved the highest percentage in quality. Most ABS are validated, yet validation studies for newer scales are lacking. Only two studies compared eight ABS simultaneously. The four most investigated clinical outcomes (delirium, cognition, mortality, and falls) showed contradictory results.
Conclusion: There is a need for good-quality validation studies comparing multiple scales, in order to define the best scale and to conduct a meta-analysis assessing their clinical impact.
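Most anticholinergic burden scales share the same mechanic: each drug receives a scale-specific rating (typically 0-3) and a patient's burden is the sum over their medication list. The sketch below illustrates that mechanic only; the ratings dictionary is invented for illustration and is not any published ABS.

```python
# Minimal illustration of summing per-drug anticholinergic ratings.
HYPOTHETICAL_RATINGS = {  # invented 0-3 ratings, not a real scale
    "diphenhydramine": 3,
    "paroxetine": 3,
    "ranitidine": 1,
    "metoprolol": 0,
}

def anticholinergic_burden(medications, ratings):
    """Sum the scale ratings; drugs absent from the scale contribute 0."""
    return sum(ratings.get(drug.lower(), 0) for drug in medications)

patient_meds = ["Diphenhydramine", "Ranitidine", "Lisinopril"]
burden = anticholinergic_burden(patient_meds, HYPOTHETICAL_RATINGS)
```

Because each scale assigns different ratings (and covers different drugs), the same medication list can yield different burden scores across scales, which is one reason the review finds contradictory associations with clinical outcomes.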


2017 ◽  
Vol 35 (4) ◽  
pp. 477-499 ◽  
Author(s):  
Ute Knoch ◽  
Carol A. Chapelle

Argument-based validation requires test developers and researchers to specify what is entailed in test interpretation and use. Doing so has been shown to yield advantages (Chapelle, Enright, & Jamieson, 2010), but it also requires an analysis of how the concerns of language testers can be conceptualized in the terms used to construct a validity argument. This article presents one such analysis by examining how issues associated with the rating of test takers’ linguistic performance can be included in a validity argument. Through a manual search of published language testing research, we gathered examples of research studies investigating the quality of rating processes and products. We then analyzed them in terms of how the research could be framed within a validity argument. Drawing on Kane’s (2001, 2006, 2013) conceptualization of inferences, warrants, and assumptions, we show that the relevance of research about the rating of test performances extends beyond one or two inferences about rater reliability. Such research results, for example, provide backing for assumptions about the correspondence of the rating scale to the test construct (explanation inference) and the context of extrapolation as well as the decisions made based on the ratings and their consequences. Our analysis reveals a picture of the extensive reach of the rating process into many aspects of test score meaning as well as concrete suggestions for integrating rating issues into future argument-based validation studies.
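One concrete product of the rating-quality research surveyed here is an inter-rater agreement statistic. As a side sketch (the ratings are invented, and this is only one of many statistics such studies report), Cohen's kappa corrects raw two-rater agreement for the agreement expected by chance:

```python
# Cohen's kappa for two raters assigning ordinal band scores.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same scripts."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented band scores (1-5) from two raters over eight scripts
a = [3, 4, 4, 2, 5, 3, 4, 2]
b = [3, 4, 3, 2, 5, 3, 4, 3]
kappa = cohens_kappa(a, b)
```

In the article's terms, a high kappa backs only the evaluation inference's reliability assumptions; the authors' point is that rating research also bears on explanation, extrapolation, and utilization inferences.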

