Clinical Research Informatics: Contributions from 2016

C. Daniel; R. Choquet

doi:10.15265/iy-2017-024

Clinical Research Informatics: Contributions from 2016

Yearbook of Medical Informatics ◽

10.1055/s-0037-1606504 ◽

2017 ◽

Vol 26 (01) ◽

pp. 209-211

Author(s):

C. Daniel ◽

R. Choquet

Keyword(s):

Machine Learning ◽

Clinical Research ◽

Scientific Data ◽

Machine Learning Techniques ◽

Editorial Team ◽

Real World Data ◽

Double Blind ◽

Patient Reported ◽

Clinical Research Informatics ◽

Privacy Breaches

Summary Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select the best papers published in 2016. Methods: A bibliographic search using a combination of MeSH and free terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. A consensus meeting between the two section editors and the editorial team was organized to finally conclude on the selection of best papers. Results: Among the 452 papers published in 2016 in the various areas of CRI and returned by the query, the full review process selected four best papers. The authors of the first paper utilized a comprehensive representation of the patient medical record and semi-automatically labeled training sets to create phenotype models via a machine learning process. The second selected paper describes an open source tool chain securely connecting ResearchKit compatible applications (Apps) to the widely used clinical research infrastructure Informatics for Integrating Biology and the Bedside (i2b2). The third selected paper describes the FAIR Guiding Principles for scientific data management and stewardship. The fourth selected paper focuses on the evaluation of the risk of privacy breaches in releasing genomics datasets. Conclusions: A major trend in the 2016 publications is the variety of research on “real-world data” - healthcare-generated data, person health data, and patient-reported outcomes - highlighting the opportunities provided by new machine learning techniques as well as new potential risks of privacy breaches.


Clinical Research Informatics Contributions from 2015

Yearbook of Medical Informatics ◽

10.15265/iy-2016-044 ◽

2016 ◽

Vol 25 (01) ◽

pp. 219-223

Author(s):

R. Choquet ◽

C. Daniel ◽

Keyword(s):

Clinical Research ◽

Data Privacy ◽

Design Stage ◽

Editorial Team ◽

Legal Requirements ◽

Double Blind ◽

Blind Review ◽

Healthcare Data ◽

Clinical Research Informatics ◽

Electronic Health

Summary Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2015. Method: A bibliographic search of PubMed using a combination of MeSH and free terms on Clinical Research Informatics (CRI) was performed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. A consensus meeting between the two section editors and the editorial team was finally organized to conclude on the selection of best papers. Results: Among the 579 returned papers published in the past year in the various areas of Clinical Research Informatics (CRI) - i) methods supporting clinical research, ii) data sharing and interoperability, iii) re-use of healthcare data for research, iv) patient recruitment and engagement, v) data privacy, security and regulatory issues and vi) policy and perspectives - the full review process selected four best papers. The first selected paper evaluates the capability of the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) to support the representation of case report forms (in both the design stage and with patient level data) during a complete clinical study lifecycle. The second selected paper describes a prototype for secondary use of electronic health records data captured in non-standardized text. The third selected paper presents a privacy-preserving electronic health record linkage tool, and the last selected paper describes how big data use in the US relies on access to health information governed by varying and often misunderstood legal requirements and ethical considerations. Conclusions: A major trend in the 2015 publications is the analysis of observational, “nonexperimental” information and the potential biases and confounding factors hidden in the data that will have to be carefully taken into account to validate new predictive models.
In addition, researchers have to understand complicated and sometimes contradictory legal requirements and consider ethical obligations in order to balance privacy against the promotion of discovery.


Clinical Research Informatics

Yearbook of Medical Informatics ◽

10.1055/s-0040-1702007 ◽

2020 ◽

Vol 29 (01) ◽

pp. 203-207

Author(s):

Christel Daniel ◽

Dipak Kalra ◽

Keyword(s):

Clinical Research ◽

Real World ◽

Reporting System ◽

Quality Data ◽

Editorial Team ◽

Common Data Model ◽

Free Text ◽

Real World Data ◽

World Data ◽

Clinical Research Informatics

Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2019. Method: A bibliographic search using a combination of MeSH descriptors and free-text terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. After peer-review ranking, a consensus meeting between the two section editors and the editorial team was organized to finally conclude on the selection of the three best papers. Results: Among the 517 papers published in 2019 that were returned by the search and fell within the scope of the various areas of CRI, the full review process selected three best papers. The first best paper describes the use of a homomorphic encryption technique to enable federated analysis of real-world data while complying more easily with data protection requirements. The authors of the second best paper demonstrate the evidence value of federated data networks by reporting a large real-world data study related to the first-line treatment for hypertension. The third best paper reports the migration of the US Food and Drug Administration (FDA) adverse event reporting system database to the OMOP common data model. This work opens the way to the combined analysis of both spontaneous reporting system and electronic health record (EHR) data for pharmacovigilance. Conclusions: The most significant research efforts in the CRI field are currently focusing on real-world evidence generation and especially the reuse of EHR data. With the progress achieved this year in the areas of phenotyping, data integration, semantic interoperability, and data quality assessment, real-world data is becoming more accessible and reusable.
High-quality data sets are key assets not only for large-scale observational studies or for changing the way clinical trials are conducted but also for developing or evaluating artificial intelligence algorithms guiding clinical decisions for more personalized care. Lastly, security and confidentiality, ethical and regulatory issues, and more generally data governance are still active research areas this year.


Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach

INFORMS Journal on Computing ◽

10.1287/ijoc.2020.1019 ◽

2021 ◽

Author(s):

Gediminas Adomavicius ◽

Yaqiong Wang

Keyword(s):

Machine Learning ◽

General Purpose ◽

Reliability Estimation ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Learning Techniques ◽

Reliability Indicator ◽

Machine Learning Approach ◽

Prediction Reliability

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
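
The reframing described in this abstract, treating reliability estimation as a second numeric prediction problem whose target is the absolute error of the first model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the synthetic data, the one-variable least-squares learner, and the names fit_line, outcome_model, and reliability_model are hypothetical and are not taken from the paper, which allows any learner in either role.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Evaluate a fitted (intercept, slope) pair at x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical synthetic data: the noise magnitude grows with x,
# so predictions at large x should be judged less reliable.
signs = [1, -1, -1, 1]
xs = [float(i) for i in range(1, 41)]
ys = [2.0 * x + 0.5 * x * signs[i % 4] for i, x in enumerate(xs)]

# Split: fit the outcome model on one half, measure errors on the other.
train_x, train_y = xs[0::2], ys[0::2]
hold_x, hold_y = xs[1::2], ys[1::2]

# Step 1: the outcome prediction model.
outcome_model = fit_line(train_x, train_y)

# Step 2: absolute prediction errors on held-out data become the new target.
abs_errors = [abs(y - predict(outcome_model, x)) for x, y in zip(hold_x, hold_y)]

# Step 3: the reliability model is an ordinary numeric predictor of that target.
reliability_model = fit_line(hold_x, abs_errors)

# Estimated absolute error for two new inputs.
low_x_error = predict(reliability_model, 2.0)
high_x_error = predict(reliability_model, 40.0)
```

Because the reliability target is just another numeric column, the second model could be swapped for any regression learner without changing the outcome model, which is the general-purpose property the abstract emphasizes.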


Clinical Research Informatics: Contributions from 2017

Yearbook of Medical Informatics ◽

10.1055/s-0038-1641220 ◽

2018 ◽

Vol 27 (01) ◽

pp. 177-183 ◽

Cited By ~ 1

Author(s):

Christel Daniel ◽

Dipak Kalra ◽

Keyword(s):

Electronic Health Records ◽

Clinical Research ◽

Association Studies ◽

Bias Reduction ◽

Lessons Learned ◽

Editorial Team ◽

Private Industry ◽

Health Records ◽

Clinical Research Informatics ◽

Electronic Health

Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2017. Method: A bibliographic search using a combination of MeSH descriptors and free terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. A consensus meeting between the two section editors and the editorial team was organized to finally conclude on the selection of best papers. Results: Among the 741 returned papers published in 2017 in the various areas of CRI, the full review process selected five best papers. The first best paper reports on the implementation of consent management considering patient preferences for the use of de-identified data of electronic health records for research. The second best paper describes an approach using natural language processing to extract symptoms of severe mental illness from clinical text. The authors of the third best paper describe the challenges and lessons learned when leveraging the EHR4CR platform to support patient inclusion in academic studies in the context of an important collaboration between private industry and public health institutions. The fourth best paper describes a method and an interactive tool for case-crossover analyses of electronic medical records for patient safety. The last best paper proposes a new method for bias reduction in association studies using electronic health records data. Conclusions: Research in the CRI field continues to accelerate and to mature, leading to tools and platforms deployed at national or international scales with encouraging results. Beyond securing these new platforms for exploiting large-scale health data, another major challenge is the limitation of biases related to the use of “real-world” data. Controlling these biases is a prerequisite for the development of learning health systems.


A Precipitation Nowcasting Mechanism for Real-World Data Based on Machine Learning

Mathematical Problems in Engineering ◽

10.1155/2020/8408931 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Yanfei Xiang ◽

Jianbing Ma ◽

Xi Wu

Keyword(s):

Machine Learning ◽

Optical Flow ◽

Weather Prediction ◽

Radar Data ◽

Machine Learning Techniques ◽

Economic Losses ◽

Model Parameters ◽

Real World Data ◽

Flow Method ◽

Optical Flow Method

Unpredicted precipitation, even when mild, may cause severe economic losses to many businesses. Precipitation nowcasting is therefore important for making correct decisions in a timely manner. For traditional methods such as numerical weather prediction (NWP), accuracy is limited because the scale of strong convective weather is smaller than the minimum scale that the model can capture, and NWP often requires a supercomputer. The optical flow method has also been shown to be usable for precipitation nowcasting. However, it is difficult to determine the model parameters because the two steps of tracking and extrapolation are separate. Meanwhile, current machine learning applications are based on well-selected complete datasets, ignoring the fact that real datasets quite often contain missing data requiring extra consideration. In this paper, we used a real Hubei dataset in which some radar echo data are missing and proposed a mechanism to deal with this situation. Furthermore, we proposed a novel mechanism for radar reflectivity data with single altitudes or cumulative altitudes using machine learning techniques. From the experimental results, we conclude that our method can predict future precipitation with high accuracy when some data are missing, and that it outperforms the traditional optical flow method. In addition, our model can be used for various types of radar data with type-specific feature extraction, which makes the method more flexible and suitable for most situations.


Malicious web domain identification using online credibility and performance data by considering the class imbalance issue

Industrial Management & Data Systems ◽

10.1108/imds-02-2018-0072 ◽

2019 ◽

Vol 119 (3) ◽

pp. 676-696 ◽

Cited By ~ 5

Author(s):

Zhongyi Hu ◽

Raymond Chiong ◽

Ilung Pranata ◽

Yukun Bao ◽

Yuqing Lin

Keyword(s):

Machine Learning ◽

Class Imbalance ◽

Performance Data ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Content Type ◽

Domain Identification ◽

Learning Techniques ◽

And Performance

Purpose Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones). Design/methodology/approach The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling. Findings By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.
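
The oversampling half of the approach in this abstract can be sketched as follows. This is a simplified illustration of the general SMOTE idea only (interpolating between a minority sample and one of its nearest minority neighbours); the function name smote, its parameters, and the toy points are hypothetical, and the PSO-based undersampling step that the authors combine with it is not shown.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    minority: list of feature tuples from the minority class.
    k: number of nearest minority neighbours considered per sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class points with two features each.
malicious = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2), (1.8, 2.1)]
new_points = smote(malicious, n_new=10)
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class is enlarged without simply duplicating existing rows.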


Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System

Sensors ◽

10.3390/s19102266 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2266 ◽

Cited By ~ 1

Author(s):

Nikolaos Sideris ◽

Georgios Bardis ◽

Athanasios Voulodimos ◽

Georgios Miaoulis ◽

Djamchid Ghazanfarpour

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Random Forests ◽

Real World ◽

Performance Metrics ◽

World City ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Real World Data

The constantly increasing amount and availability of urban data derived from varying sources lead to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach to address these challenges with the aid of machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to the random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
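
The six performance metrics named in this abstract can all be derived from a confusion matrix. The sketch below assumes the binary case and the common definitions (F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of recall and specificity); the abstract does not state which variants the authors used, and the function name binary_metrics and the counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Hypothetical counts: 8 true positives, 1 false positive,
# 9 true negatives, 2 false negatives.
example = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Reporting specificity and G-mean alongside accuracy is useful when classes are unevenly represented, since accuracy alone can hide poor performance on the smaller class.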


Symptom clusters among cancer survivors: what can machine learning techniques tell us?

BMC Medical Research Methodology ◽

10.1186/s12874-021-01352-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Koen I. Neijenhuijs ◽

Carel F. W. Peeters ◽

Henk van Weert ◽

Pim Cuijpers ◽

Irma Verdonck-de Leeuw

Keyword(s):

Machine Learning ◽

High Risk ◽

Cancer Survivors ◽

Well Being ◽

Physical Symptoms ◽

Symptom Clusters ◽

Machine Learning Techniques ◽

Risk Scores ◽

Learning Techniques ◽

Patient Reported

Abstract Purpose Knowledge regarding symptom clusters may inform targeted interventions. The current study investigated symptom clusters among cancer survivors, using machine learning techniques on a large data set. Methods Data consisted of self-reports of cancer survivors who used a fully automated online application ‘Oncokompas’ that supports them in their self-management. This is done by 1) monitoring their symptoms through patient reported outcome measures (PROMs); and 2) providing a personalized overview of supportive care options tailored to their scores, aiming to reduce symptom burden and improve health-related quality of life. In the present study, data on 26 generic symptoms (physical and psychosocial) were used. Results of the PROM of each symptom are presented to the user as a no well-being risk, moderate well-being risk, or high well-being risk score. Data of 1032 cancer survivors were analysed using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) on high risk scores and moderate-to-high risk scores separately. Results When analyzing the high risk scores, seven clusters were extracted: one main cluster which contained the most frequently occurring physical and psychosocial symptoms, and six subclusters with different combinations of these symptoms. When analyzing moderate-to-high risk scores, three clusters were extracted: two main clusters, which separated physical symptoms (and their consequences) from psychosocial symptoms, and one subcluster with only body weight issues. Conclusion There appears to be an inherent difference in the co-occurrence of symptoms depending on symptom severity. Among survivors with high risk scores, the data showed a clustering of more connections between physical and psychosocial symptoms in separate subclusters. Among survivors with moderate-to-high risk scores, we observed fewer connections in the clustering between physical and psychosocial symptoms.


Pan-Cancer Metastasis Prediction Based on Graph Deep Learning Method

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.675978 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yining Xu ◽

Xinran Cui ◽

Yadong Wang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Real World ◽

Tumor Metastasis ◽

Machine Learning Techniques ◽

Expression Data ◽

Real World Data ◽

Convolutional Network ◽

Learning Techniques

Tumor metastasis is the major cause of mortality from cancer. From this perspective, detecting cancer gene expression and transcriptome changes is important for exploring tumor metastasis molecular mechanisms and cellular events. Precisely estimating a patient’s cancer state and prognosis is the key challenge in developing a patient’s therapeutic schedule. In recent years, a variety of machine learning techniques have contributed widely to analyzing real-world gene expression data and predicting tumor outcomes. In this area, data mining and machine learning techniques have contributed widely to gene expression data analysis by supplying computational models to support decision-making on real-world data. Nevertheless, the limitations of real-world data severely restrict model predictive performance, and the complexity of the data makes it difficult to extract vital features. In addition, the efficacy of standard machine learning pipelines is far from satisfactory even though diverse feature selection strategies have been applied. To address these problems, we developed a directed relation-graph convolutional network to provide an advanced feature extraction strategy. We first constructed a gene regulation network and extracted gene expression features based on the relational graph convolutional network method. The high-dimensional features of each sample were regarded as an image pixel, and a convolutional neural network was implemented to predict the risk of metastasis for each patient. Ten cross-validations on 1,779 cases from The Cancer Genome Atlas show that our model’s performance (area under the curve, AUC = 0.837; area under precision recall curve, AUPRC = 0.717) exceeds that of an existing network-based method (AUC = 0.707, AUPRC = 0.555).
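
For readers unfamiliar with the AUC figures quoted in this abstract, the sketch below shows one standard way to compute the area under the ROC curve: as the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as half. The function name roc_auc and the toy labels and scores are hypothetical and unrelated to the paper's data; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g. metastasis), 0 otherwise.
    scores: predicted risk for each case, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: one positive case (0.4) is ranked below one negative case (0.5).
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values.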
