
2017 ◽  
Vol 1 (S1) ◽  
pp. 18-19
Author(s):  
Ram Gouripeddi ◽  
Mollie Cummins ◽  
Randy Madsen ◽  
Bernie LaSalle ◽  
Andrew Middleton Redd ◽  
...  

OBJECTIVES/SPECIFIC AIMS: Key factors causing irreproducibility of research include those related to inappropriate study design methodologies and statistical analysis. In modern statistical practice, irreproducibility can arise from statistical issues (false discoveries, p-hacking, overuse/misuse of p-values, low power, poor experimental design) and computational issues (data, code, and software management). Addressing these requires understanding the processes and workflows practiced by an organization, and developing and using metrics to quantify reproducibility. METHODS/STUDY POPULATION: Within the Foundation of Discovery – Population Health Research, Center for Clinical and Translational Science, University of Utah, we are undertaking a project to streamline the study design and statistical analysis workflows and processes. As a first step, we met with key stakeholders to understand current practices by eliciting example statistical projects, and then developed process information models for different types of statistical needs using Lucidchart. We then reviewed these with the Foundation’s leadership and the Standards Committee to arrive at ideal workflows and models, and defined key measurement points (such as those around study design, analysis plan, final report, requirements for quality checks, and double coding) for assessing reproducibility. As next steps, we are using our findings to embed analytical and infrastructural approaches within the statisticians’ workflows. These will include data and code dissemination platforms such as Box, Bitbucket, and GitHub; documentation platforms such as Confluence; and workflow tracking platforms such as Jira. These tools will simplify and automate the capture of communications as a statistician works through a project. Data-intensive processes will use process-workflow management platforms such as Activiti, Pegasus, and Taverna. RESULTS/ANTICIPATED RESULTS: The anticipated outcome is a set of strategies for sharing and publishing study protocols, data, code, and results across the spectrum, together with active collaboration with the research team, automation of key steps, and decision support. DISCUSSION/SIGNIFICANCE OF IMPACT: This analysis of statistical processes, together with computational methods to automate them, helps ensure the quality of statistical methods and the reproducibility of research.
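A concrete way to work with such measurement points is to give each project a simple checklist score. The sketch below is a hypothetical Python illustration; the field names and the scoring rule are assumptions made for illustration, not the Foundation's actual reproducibility metric.

```python
# Hypothetical sketch (not the authors' implementation): scoring a statistical
# project against the kinds of measurement points described above.
from dataclasses import dataclass


@dataclass
class ProjectChecklist:
    """Binary flags for key reproducibility measurement points (assumed names)."""
    study_design_reviewed: bool = False
    analysis_plan_registered: bool = False
    code_under_version_control: bool = False   # e.g. Bitbucket/GitHub
    quality_check_performed: bool = False
    double_coded: bool = False

    def score(self) -> float:
        """Fraction of measurement points satisfied (0.0 to 1.0)."""
        flags = [self.study_design_reviewed, self.analysis_plan_registered,
                 self.code_under_version_control, self.quality_check_performed,
                 self.double_coded]
        return sum(flags) / len(flags)


project = ProjectChecklist(study_design_reviewed=True, code_under_version_control=True)
print(f"Reproducibility score: {project.score():.2f}")  # 0.40
```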

Author(s):  
T.B. Aldongar ◽  
F.U. Malikova ◽  
G.B. Issayeva ◽  
B.R. Absatarova ◽  
...  

The creation of information models requires the use of known methods and the development of new methods for formalizing the pre-design research process. The modeling process consists of four stages: collection of data on the object of management (pre-project research); creation of a graphical model of the business processes taking place in the enterprise; development of a formal model of the business processes; and study of the business by optimizing the formal model. To support the creation of workflow management services and systems, the complex offers methodologies, standards, and specialized software that make up the developer's toolset. This can be ensured only by modern automated methods based on information systems. It is important that the collected information is structured to meet the needs of potential users and stored in a form that allows the use of modern access technologies. Before discussing the effectiveness of FIM, it should be noted that the basic concept of information itself is still not settled. In pragmatic terms, information is a set of messages in the form of documents important to the system. Information can be evaluated not only by volume but also by various parameters, the most important of which are timeliness, relevance, value, aging, and accuracy; in addition, information may be clear, probable, or accurate. The methods of its reception and processing differ in each case.
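As a rough illustration of evaluating a piece of information along several of the parameters named above (timeliness via aging, relevance, value, accuracy), the sketch below defines a hypothetical record type; the attribute names and scales are assumptions for illustration, not part of the methodology described here.

```python
# Hypothetical record type for evaluating information along several parameters.
from dataclasses import dataclass
from datetime import date


@dataclass
class InformationRecord:
    """A message/document evaluated along the quality parameters named above."""
    received: date
    relevance: float   # 0..1, fit to the current task
    value: float       # 0..1, usefulness to the decision maker
    accuracy: float    # 0..1, agreement with verified facts

    def age_days(self, today: date) -> int:
        """Aging: how long ago the information was received."""
        return (today - self.received).days


record = InformationRecord(received=date(2024, 1, 10),
                           relevance=0.8, value=0.6, accuracy=0.9)
print(record.age_days(date(2024, 3, 1)))  # 51
```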


Children ◽  
2021 ◽  
Vol 8 (2) ◽  
pp. 143
Author(s):  
Julie Sommet ◽  
Enora Le Roux ◽  
Bérengère Koehl ◽  
Zinedine Haouari ◽  
Damir Mohamed ◽  
...  

Background: Many pediatric studies describe the association between biological parameters (BP) and severity of sickle cell disease (SCD), using different methods to collect or to analyze BP. This article assesses the methods used for collection and subsequent statistical analysis of BP, and how these impact prognostic results in cohort studies of children with SCD. Methods: First, we identified the collection and statistical methods used in published SCD cohort studies. Second, these methods were applied to our cohort of 375 children with SCD to evaluate the association of BP with cerebral vasculopathy (CV). Results: In 16 cohort studies, BP were collected either once or several times during follow-up. The methods identified in the statistical analyses were: (1) one baseline value per patient; (2) last known value; (3) mean of all values; (4) modelling of all values in a two-stage approach. When these four statistical methods were applied to our cohort, the results and the interpretation of the association between BP and CV differed depending on the method used. Conclusion: The prognostic value of BP depends on the chosen statistical analysis method. Appropriate statistical analyses of prognostic factors in cohort studies should be considered and should enable valuable and reproducible conclusions.
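For the first three of these approaches, the per-patient summaries can be computed directly from longitudinal records. The sketch below uses pandas with illustrative column names and values, not the study's dataset; the two-stage approach is only noted in a comment.

```python
# Hypothetical sketch of the first three per-patient summaries of a repeated
# biological parameter (column names and values are assumptions, not the study data).
import pandas as pd

bp = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(["2015-01-01", "2016-02-01", "2017-03-01",
                                  "2015-06-01", "2016-07-01"]),
    "hemoglobin": [8.1, 7.9, 8.4, 9.2, 9.0],
})
bp = bp.sort_values(["patient_id", "visit_date"])

summaries = bp.groupby("patient_id")["hemoglobin"].agg(
    baseline="first",    # (1) one baseline value per patient
    last_known="last",   # (2) last known value
    mean_value="mean",   # (3) mean of all values
)
print(summaries)
# (4) The two-stage approach would instead fit a per-patient model (e.g. a mixed
# model) to all values and carry the patient-level estimates into the prognostic model.
```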


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
C Kapelios ◽  
H Naci ◽  
P Vardas ◽  
E Mossialos

Abstract Introduction Preregistration of study protocols in publicly accessible databases is required for publication of study results in high-impact medical journals. Nonetheless, data on the characteristics of clinical trials registered in these databases and their outcomes, in terms of result reporting and publication, are limited. Methods The purpose of this study was to perform a comprehensive analysis of the characteristics of late-stage cardiovascular disease (CVD) trials registered in ClinicalTrials.gov. We searched for interventional, late-phase (annotated as phase III) CVD studies in adults first posted after 1/1/2013 and completed up to 31/12/2018. Data on study design, result reporting, result spinning, and publication were collected, and potential associations with a pre-defined set of explanatory factors were examined. Results The search yielded 352 studies. One hundred were excluded from further analysis because they were misclassified as CVD studies, and 2 were excluded as duplicate entries. In total, 250 CVD trials were included in the analysis. The most commonly studied fields were hypertension, coronary artery disease, and heart failure. Of these trials, 193 (77.2%) were randomized, 99 (39.6%) had open-label designs, and 126 (50.4%) had industry as the main sponsor. 179 trials (71.6%) evaluated the effect of drugs and 27 (10.8%) evaluated devices. Industry-funded trials focused on patent-protected drugs and devices more often than non-industry-funded trials (72.0% vs. 30.6%, P<0.001 and 55.0% vs. 26.3%, P=0.033, respectively). Sixty-three studies (25.2%) had results posted on ClinicalTrials.gov, and 116 (46.4%) had results published in the scientific literature. No clear indication of result spinning was found in 96 (85%) of published studies. In multivariate analysis, industry sponsorship was statistically significantly associated with results posting (OR: 3.56; 95% CI: 1.67–7.60, P=0.001) and publication (OR: 0.41; 95% CI: 0.23–0.75, P=0.004). Result spinning was associated with confirmation of the primary hypothesis (OR: 0.23; 95% CI: 0.07–0.75, P=0.015) and results posting (OR: 0.08; 95% CI: 0.01–0.65, P=0.018). Conclusions Among late-stage cardiovascular trials, only one quarter had their results posted on ClinicalTrials.gov and less than half had results published. Industry sponsors were more likely to invest in research on patent-protected drugs and devices than were non-industry sponsors. Having industry as a sponsor was independently associated with increased likelihood of results posting but decreased likelihood of results publication. Results reporting was significantly associated with lower risk of result spinning. Funding Acknowledgement Type of funding source: None
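The multivariate analysis reported here is the kind of logistic regression that can be sketched as follows; the variable names and the toy data are illustrative assumptions rather than the study's dataset, and the model is shown only to indicate how odds ratios and confidence intervals of this form are obtained.

```python
# Illustrative logistic regression relating trial characteristics to results
# posting (toy data; not the study's dataset or exact covariate set).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

trials = pd.DataFrame({
    "industry_sponsor": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "randomized":       [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "posted_results":   [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0],
})

model = smf.logit("posted_results ~ industry_sponsor + randomized",
                  data=trials).fit(disp=False)
print(np.exp(model.params))      # odds ratios
print(np.exp(model.conf_int()))  # 95% confidence intervals for the odds ratios
```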


2018 ◽  
Vol 34 (3) ◽  
pp. 633-645
Author(s):  
Cornel Samoilă ◽  
Doru Ursuţiu ◽  
Vlad Jinga

Abstract The appearance of MOOCs has, in a first phase, produced more discussion than contribution. Despite pessimistic opinions, including catastrophic predictions that accepting MOOCs will end classic education, the authors consider that, as happens whenever a field is reformed, an assessment should simply be made instead of criticism or catastrophic prediction. A MOOC will not become better or worse by being discussed and dissected; it can be tested in action, perfected by results, or abandoned if it has no prospects. Without testing, no decision is valid. A parallel can be drawn between the appearance of MOOCs and the appearance of the idea of flying machines heavier than air. In the case of flight, the first reaction was strong denial (including at the level of the academies), and only the first independent flight with an apparatus heavier than air shifted the orientation from denial to contribution. Practical tests thus settled the battle between ideas. The authors of this article encourage the idea of testing and assessment and therefore designed and propose software to quickly assess whether a MOOC produces changes in knowledge when courses are simply transferred from the face-to-face environment to the virtual one. The Keppel method was chosen from among the statistical methods for analyzing changes in student behavior; it underpins the assessment method of this work and is applied in both its one-variable and three-variable versions. This attempt is intended to pave the way for other rapid assessments of MOOC effects using other statistical methods. We believe that this is the only approach that can lead either to improvement of the system or to its abandonment.
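The Keppel method referenced above is, broadly, an analysis-of-variance design. As a generic illustration of the single-variable case (an assumption about the analysis, not the authors' software), a one-way ANOVA comparing hypothetical test scores could look like this:

```python
# Generic one-way ANOVA sketch comparing face-to-face and MOOC cohorts
# (hypothetical scores; not the authors' software or data).
from scipy import stats

face_to_face = [72, 68, 75, 80, 66, 74]   # hypothetical test scores
mooc         = [70, 65, 78, 82, 64, 71]

f_stat, p_value = stats.f_oneway(face_to_face, mooc)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```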


Radiocarbon ◽  
2013 ◽  
Vol 55 (2) ◽  
pp. 720-730 ◽  
Author(s):  
Christopher Bronk Ramsey ◽  
Sharen Lee

OxCal is a widely used software package for the calibration of radiocarbon dates and the statistical analysis of 14C and other chronological information. The program aims to make statistical methods easily available to researchers and students working in a range of different disciplines. This paper will look at the recent and planned developments of the package. The recent additions to the statistical methods are primarily aimed at providing more robust models, in particular through model averaging for deposition models and through different multiphase models. The paper will look at how these new models have been implemented and explore the implications for researchers who might benefit from their use. In addition, a new approach to the evaluation of marine reservoir offsets will be presented. As the quantity and complexity of chronological data increase, it is also important to have efficient methods for the visualization of such extensive data sets; methods for the presentation of spatial and geographical data, to be embedded within planned future versions of OxCal, will also be discussed.
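As background to what a calibration program does at its core, the toy sketch below compares a single radiocarbon determination against a made-up segment of a calibration curve using a normal error model; it is an illustrative simplification under assumed values, not OxCal code or a real calibration curve.

```python
# Toy illustration of the basic calibration step (not OxCal code).
import numpy as np

# Assumed fragment of a calibration curve: calendar year BP -> (14C age, error)
cal_years = np.arange(3000, 3101)              # calendar years BP
curve_c14 = 2850 + 0.5 * (cal_years - 3000)    # assumed 14C ages along the curve
curve_err = np.full_like(cal_years, 15.0, dtype=float)

det_c14, det_err = 2875.0, 20.0                # measured 14C age and 1-sigma error

# Likelihood of each calendar year given the measurement (normal error model)
sigma = np.sqrt(det_err**2 + curve_err**2)
likelihood = np.exp(-0.5 * ((det_c14 - curve_c14) / sigma) ** 2) / sigma
posterior = likelihood / likelihood.sum()      # flat prior over this range

print(cal_years[posterior.argmax()])           # most probable calendar year BP
```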


2012 ◽  
Vol 610-613 ◽  
pp. 1033-1040
Author(s):  
Wei Dai ◽  
Jia Qi Gao ◽  
Bo Wang ◽  
Feng Ouyang

Effects of weather conditions, including temperature, relative humidity, wind speed, and wind direction, on PM2.5 were studied using statistical methods. PM2.5 samples were collected during the summer and the winter in a suburb of Shenzhen. Correlations, hypothesis tests, and statistical distributions of PM2.5 and meteorological data were then analyzed with IBM SPSS predictive analytics software. Seasonal and daily variations of PM2.5 were found, and these mainly resulted from weather effects.
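The correlation step can also be reproduced with open-source tools; the sketch below uses illustrative values rather than the Shenzhen measurements (the study itself used IBM SPSS).

```python
# Illustrative correlation of PM2.5 with meteorological variables
# (made-up values; not the study's measurements).
import pandas as pd
from scipy import stats

obs = pd.DataFrame({
    "pm25":        [35, 42, 28, 55, 60, 31, 25, 48],           # ug/m3
    "temperature": [29, 31, 27, 18, 16, 28, 30, 20],           # degrees C
    "humidity":    [78, 70, 82, 60, 55, 80, 85, 62],           # percent
    "wind_speed":  [2.1, 1.8, 2.5, 1.2, 1.0, 2.3, 2.8, 1.4],   # m/s
})

for col in ["temperature", "humidity", "wind_speed"]:
    r, p = stats.pearsonr(obs["pm25"], obs[col])
    print(f"PM2.5 vs {col}: r = {r:.2f}, p = {p:.3f}")
```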


Author(s):  
Estara Arrant

Estara Arrant examines categories of Torah codices from the Cairo Genizah that have not been afforded sufficient scholarly attention, namely ‘near-model’ codices, a term coined by Arrant. The study analyses almost three hundred fragments using a statistically based methodology, and shows how statistical methods can be employed to reveal sub-types of Torah fragments that share linguistic and codicological features.
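One common way to surface such sub-types is to cluster fragments on binary codicological and linguistic features; the sketch below is a generic illustration of that idea, not Arrant's actual methodology or data.

```python
# Generic sketch of clustering fragments by binary features
# (illustrative features and values; not the study's data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# rows = fragments, columns = assumed features (e.g. vocalization, ruling, large format)
features = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
], dtype=bool)

dist = pdist(features, metric="jaccard")        # dissimilarity between fragments
tree = linkage(dist, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)                                    # sub-type assignment per fragment
```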


2021 ◽  
Vol 28 ◽  
pp. 146-150
Author(s):  
L. A. Atramentova

Using the data obtained in a cytogenetic study as an example, we consider the typical errors made when performing statistical analysis. Widespread but flawed statistical analysis inevitably produces biased results and increases the likelihood of incorrect scientific conclusions. Errors occur when the study design and the structure of the analyzed data are not taken into account. The article shows how numerical imbalance of the data set leads to a biased result and, using a dataset as an example, explains how to balance the complex. It shows the advantage of presenting sample indicators with confidence intervals instead of statistical errors. Attention is drawn to the need to take into account the size of the analyzed proportions when choosing a statistical method. The article shows how the same data set can be analyzed in different ways depending on the purpose of the study, and describes the algorithm of correct statistical analysis and the form of the tabular presentation of the results. Keywords: data structure, numerically unbalanced complex, confidence interval.
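As an illustration of reporting a sample proportion with a confidence interval rather than only a standard error, the sketch below uses a Wilson score interval on hypothetical counts; it is not the article's dataset or its recommended algorithm.

```python
# Reporting a proportion with a 95% confidence interval instead of only a
# standard error (hypothetical counts of aberrant cells).
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


aberrant, total = 12, 200
p = aberrant / total
se = math.sqrt(p * (1 - p) / total)
low, high = wilson_ci(aberrant, total)
print(f"p = {p:.3f}, SE = {se:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```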


2019 ◽  
Author(s):  
Mia Partlow ◽  
Karen Ciccone ◽  
Margaret Peak

Presentation given at TRLN Annual Meeting, Durham, North Carolina, July 1, 2019. The Hunt Library Dataspace was launched in August 2018 to provide students with access to the tools and support they need to develop critical data skills and perform data-intensive tasks. It is outfitted with specialized computing hardware and software and staffed by graduate student Data Science Consultants who provide drop-in support for programming, data analysis, statistical analysis, visualization, and other data-related topics. Prior to launching the Dataspace, the Libraries’ Director of Planning and Research worked with the Data & Visualization Services department to develop a plan for assessing the new Dataspace services. The process began with identifying relevant goals based on NC State University and the NC State University Libraries’ strategic priorities. Next, we identified measures that would assess our success in relation to those goals. This talk describes the assessment planning process, the measures and methods employed, the outcomes, and how this information will be used to improve our services and inform new service development.


Author(s):  
Leonid Zaslavsky ◽  
Tiejun Cheng ◽  
Asta Gindulyte ◽  
Siqian He ◽  
Sunghwan Kim ◽  
...  

The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, summarizing these in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.
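A minimal sketch of co-occurrence counting with a simple relevance score (pointwise mutual information) is shown below; the entities and the scoring rule are illustrative assumptions, not PubChem's actual entity-matching or relevance-estimation pipeline.

```python
# Toy co-occurrence counting and a simple relevance score for entity pairs
# found in abstracts (illustrative only; not PubChem's scoring).
import math
from collections import Counter
from itertools import combinations

# Each abstract reduced to the set of recognized entities it mentions
abstracts = [
    {"aspirin", "myocardial infarction"},
    {"aspirin", "stroke"},
    {"aspirin", "myocardial infarction", "clopidogrel"},
    {"clopidogrel", "stroke"},
]

entity_counts = Counter()
pair_counts = Counter()
for entities in abstracts:
    entity_counts.update(entities)
    pair_counts.update(frozenset(p) for p in combinations(sorted(entities), 2))

n = len(abstracts)
for pair, c in pair_counts.items():
    a, b = tuple(pair)
    pmi = math.log2((c / n) / ((entity_counts[a] / n) * (entity_counts[b] / n)))
    print(f"{a} -- {b}: co-occurrences={c}, PMI={pmi:.2f}")
```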

