Re-examination of Rule-Based Methods in Deidentification of Electronic Health Records: Algorithm Development and Validation

10.2196/17622 ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. e17622
Author(s):  
Zhenyu Zhao ◽  
Muyun Yang ◽  
Buzhou Tang ◽  
Tiejun Zhao

Background: Deidentification of clinical records is a critical step before their publication. It is usually treated as a sequence labeling task, and ensemble learning is one of the best performing solutions. Within a multi-learner ensemble, the value of including a rule-based learner remains an open issue.

Objective: The aim of this study is to investigate whether a rule-based learner is useful in a hybrid deidentification system and to offer suggestions on how to build and integrate such a learner.

Methods: We chose a data-driven rule learner named transformation-based error-driven learning (TBED) and integrated it into the best performing hybrid system for this task.

Results: On the popular Informatics for Integrating Biology and the Bedside (i2b2) deidentification data set, experiments showed that TBED offers high performance with its generated rules, and that integrating the rule-based model into an ensemble framework reached an F1 score of 96.76%, the best performance reported in the community.

Conclusions: We proved that the rule-based method makes an effective contribution to the current ensemble learning approach for the deidentification of clinical records. Such a rule system can be learned automatically by TBED, avoiding the high cost and low reliability of manual rule composition. In particular, boosting the ensemble model with rules produced the best reported performance for the deidentification of clinical records.
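To make the idea of transformation-based error-driven learning concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It starts from a baseline labeling and repeatedly picks the rule that removes the most remaining errors. The rule template (previous token plus current label), the function names, the toy sentence, and the `NAME`/`O` labels are all assumptions made for illustration.

```python
def apply_rule(rule, tokens, labels):
    """Apply one rule everywhere: if the previous token is `prev` and the
    current label is `old`, relabel the current token as `new`."""
    prev, old, new = rule
    out = list(labels)
    for i in range(1, len(tokens)):
        if tokens[i - 1] == prev and labels[i] == old:
            out[i] = new
    return out


def count_errors(labels, gold):
    """Number of positions where the labeling disagrees with the gold labels."""
    return sum(a != b for a, b in zip(labels, gold))


def learn_rules(tokens, gold, initial, max_rules=10):
    """Greedy error-driven rule learning on a single token sequence."""
    current = list(initial)
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per remaining error (hypothetical template).
        candidates = {
            (tokens[i - 1], current[i], gold[i])
            for i in range(1, len(tokens))
            if current[i] != gold[i]
        }
        base = count_errors(current, gold)
        best, best_gain = None, 0
        for rule in sorted(candidates):
            # Net gain counts errors fixed minus correct labels broken.
            gain = base - count_errors(apply_rule(rule, tokens, current), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces the error count any further
            break
        rules.append(best)
        current = apply_rule(best, tokens, current)
    return rules, current


# Toy data, invented for this sketch.
tokens = ["Dr.", "Smith", "saw", "Dr.", "Jones", "on", "Monday"]
gold = ["O", "NAME", "O", "O", "NAME", "O", "O"]
initial = ["O"] * len(tokens)  # baseline: label everything as non-identifying

rules, final = learn_rules(tokens, gold, initial)
print(rules)
print(final == gold)
```

On this toy input a single rule keyed on the preceding token "Dr." fixes both errors. A practical system would use many rule templates and a full training corpus, and would feed the rule learner's output into the ensemble alongside the other learners.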


Author(s):  
C. Sauer ◽  
F. Bagusat ◽  
M.-L. Ruiz-Ripoll ◽  
C. Roller ◽  
M. Sauer ◽  
...  

Abstract: This work aims at the characterization of a modern concrete material. For this purpose, we perform two experimental series of inverse planar plate impact (PPI) tests with the ultra-high performance concrete B4Q, using two different witness plate materials. Hugoniot data in the range of particle velocities from 180 to 840 m/s and stresses from 1.1 to 7.5 GPa are derived from both series. Within the experimental accuracy, they can be seen as one consistent data set. Moreover, we conduct corresponding numerical simulations and find reasonably good agreement between simulated and experimentally obtained curves. From the simulated curves, we derive numerical Hugoniot results that serve as a homogenized, mean shock response of B4Q and add further consistency to the data set. Additionally, the comparison of simulated and experimentally determined results allows us to identify experimental outliers. Furthermore, we perform a parameter study which shows that a significant influence of the applied pressure-dependent strength model on the derived equation of state (EOS) parameters is unlikely. In order to compare the current results to our own partially reevaluated previous work and selected recent results from the literature, we use simulations to numerically extrapolate the Hugoniot results. Considering the inhomogeneous nature of these materials, a consistent picture emerges for the shock response of the discussed concrete and high-strength mortar materials. Hugoniot results from this and earlier work are presented for further comparisons. In addition, a full parameter set for B4Q, including validated EOS parameters, is provided for application in simulations of impact and blast scenarios.
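For orientation, Hugoniot data of this kind are commonly related through the standard Rankine-Hugoniot jump conditions. The relations below are the textbook forms, written in the frame of the unshocked material and assuming it is initially stress free. The symbols $\rho_0$ (initial density), $U_s$ (shock velocity), $u_p$ (particle velocity), $\sigma$ (longitudinal stress), $c_0$ and $S$ are introduced here for illustration. The linear $U_s$-$u_p$ fit is a frequently used EOS form and is an assumption, since the abstract does not state which form the authors use.

```latex
\begin{align}
  \rho_0 U_s &= \rho \,(U_s - u_p) && \text{(conservation of mass)} \\
  \sigma     &= \rho_0 U_s u_p     && \text{(conservation of momentum)} \\
  U_s        &= c_0 + S\, u_p      && \text{(assumed linear fit)}
\end{align}
```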


2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management because it helps users discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing removes unnecessary and redundant words from the documents. The documents are then indexed by the cluster-based inverted indexing algorithm, which integrates the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. Once the documents are indexed, user queries are matched against them using the Bhattacharyya distance. Finally, query optimisation is performed with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm achieves a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
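Two of the quantities named in this abstract can be sketched in a few lines. The example below checks that the reported F-measure is the harmonic mean of the reported precision and recall, and shows the Bhattacharyya distance between two discrete distributions. How the article builds those distributions from queries and documents is not stated in the abstract, so the normalized term-frequency vectors used here are an assumption, as are the function names.

```python
import math


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability distributions.

    Both inputs are assumed to be non-negative, of equal length, and to sum to 1.
    """
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0:
        return math.inf  # the distributions share no support
    return -math.log(coefficient)


# Consistency check on the figures reported in the abstract.
print(round(f_measure(1.0, 0.70), 4))  # 0.8235

# Hypothetical normalized term-frequency vectors for a query and a document.
query = [0.5, 0.5, 0.0]
document = [0.25, 0.25, 0.5]
print(bhattacharyya_distance(query, document))
```

A smaller distance means the query and document distributions overlap more, so candidates would be ranked in ascending order of distance.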

