Filtered BERT: Similarity Filter-Based Augmentation with Bidirectional Transfer Learning for Protected Health Information Prediction in Clinical Documents

Min Kang; Kye Hwa Lee; Youngho Lee

doi:10.3390/app11083668

Applied Sciences ◽

2021 ◽

Vol 11 (8) ◽

pp. 3668

Keyword(s):

Health Information ◽

Transfer Learning ◽

Data Augmentation ◽

Protected Health Information ◽

Limited Data ◽

Secondary Use ◽

Data Environment

For the secondary use of clinical documents, the protected health information (PHI) they contain must be de-identified. The difficulty is that few publicly available documents carry PHI annotations. To address this problem, we propose a method based on filtered bidirectional encoder representations from transformers (BERT): BERT predicts a masked word, and the predicted word is then validated by a similarity filter before it is used to construct an augmented sentence. The results show that augmentation based on filtered BERT improved model performance, which suggests that the method can effectively improve performance in limited-data environments.
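
The augmentation step described above has two parts: a masked language model proposes replacements for a masked word, and a similarity filter keeps only replacements close enough to the original word. The following is a minimal sketch under stated assumptions, not the authors' implementation: the candidate list stands in for BERT fill-mask output, the word vectors are hypothetical toy embeddings, the filter is assumed to be cosine similarity, and the threshold of 0.7 and the name filtered_augment are illustrative choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def filtered_augment(tokens, mask_index, candidates, embeddings, threshold=0.7):
    """Build augmented sentences by replacing tokens[mask_index] with masked-LM
    candidates that pass a cosine-similarity filter against the original word.

    candidates: words proposed for the masked slot (hypothetical stand-in for
                BERT fill-mask predictions, best first).
    embeddings: word -> vector lookup (hypothetical toy embeddings).
    """
    original = tokens[mask_index]
    augmented = []
    for word in candidates:
        if word == original or word not in embeddings:
            continue  # skip the unchanged word and words without a vector
        if cosine(embeddings[original], embeddings[word]) >= threshold:
            new_tokens = list(tokens)  # copy so the input sentence is not mutated
            new_tokens[mask_index] = word
            augmented.append(" ".join(new_tokens))
    return augmented

# Hypothetical 3-dimensional embeddings, for illustration only.
emb = {
    "admitted": [0.9, 0.1, 0.0],
    "hospitalized": [0.85, 0.2, 0.05],
    "discharged": [0.1, 0.9, 0.2],
}
sentence = "patient was admitted on monday".split()
print(filtered_augment(sentence, 2, ["hospitalized", "discharged", "admitted"], emb))
# → ['patient was hospitalized on monday']
```

Replacing exactly one token keeps the sentence length unchanged, so token-level PHI labels stay aligned with the augmented sentence; that is a design choice of this sketch rather than something the abstract specifies.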

Protected Health Information De-Identification on Visual and Textual Features using Transfer Learning

2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS) ◽

10.1109/iciccs48265.2020.9120961 ◽

2020 ◽

Author(s):

Munipalle Sai Nikhila ◽

Vinay Kornapalli ◽

Pradeep Singh

Keyword(s):

Health Information ◽

Transfer Learning ◽

Protected Health Information ◽

Textual Features

Image Classification using ImageNet Classifiers in Environments with Limited Data

10.21203/rs.3.rs-428416/v1 ◽

2021 ◽

Author(s):

Anirvin Sharma ◽

Abhinav Singh ◽

Tanupriya Choudhury ◽

Tanmay Sarkar

Keyword(s):

Image Classification ◽

Transfer Learning ◽

Data Augmentation ◽

Classification Algorithms ◽

Limited Data ◽

Training Time ◽

Learning Framework ◽

Rare Phenomena ◽

Fully Connected ◽

Compare And Contrast

In this research, we compare and contrast various image classification algorithms and assess how effective they are on problems where data may be scarce, such as the prediction of rare phenomena (for example, natural calamities) and enterprise solutions. We employ several state-of-the-art algorithms, each regarded as among the best classifiers at the time of its introduction. These classifiers are also suspected of overfitting to the datasets on which they were originally evaluated, namely ImageNet and Common Objects in Context (COCO); within a transfer learning framework, we test to what extent they generalize to new data that we provide. We apply transfer learning to the ImageNet classifiers to adapt them to our smaller dataset and examine techniques such as data augmentation, batch normalization and dropout to mitigate overfitting. All the classifiers follow a standard fully connected architecture. The result is intended to give the reader an overall analysis of which algorithm or approach to use when data are limited, together with a brief overview of the progress of image classification algorithms since their advent. We also analyze the effectiveness of data augmentation on limited datasets by reporting results obtained with and without it. In our experiments, MobileNet (whose lightweight design keeps computational costs low) and InceptionV3 (owing to its lower training time) were the best-performing classifiers for transfer learning on limited datasets among those we studied. This paper aims to establish preemptive standards for evaluating models for object recognition and image classification on problems with limited data.
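
The abstract reports results with and without data augmentation but does not name the transformations used. As an illustration only, the sketch below assumes two common label-preserving image augmentations, a random horizontal flip and a random crop after zero-padding, implemented in NumPy on N x H x W x C arrays; the function names, the padding size and the seed handling are hypothetical choices, not the study's pipeline.

```python
import numpy as np

def random_flip(image, rng):
    """Flip an H x W x C image left-right with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

def random_crop(image, pad, rng):
    """Zero-pad by `pad` pixels on each side, then crop back to the original
    size at a random offset (equivalent to a small random translation)."""
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

def augment_batch(images, pad=4, seed=0):
    """Apply flip then crop independently to each image in an N x H x W x C batch."""
    rng = np.random.default_rng(seed)
    return np.stack([random_crop(random_flip(img, rng), pad, rng) for img in images])
```

Because the augmented batch keeps the original shape and dtype, it can be fed to the same fine-tuning loop as the unaugmented data, which is what makes a with/without comparison straightforward.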

Study on National Protected Health Information for Secondary Use and De-identification

Asia-pacific Journal of Multimedia services convergent with Art Humanities and Sociology ◽

10.14257/ajmahs.2016.08.11 ◽

2016 ◽

pp. 15-23

Author(s):

Cheoljung Kim ◽

Kwangsoo Yeo ◽

Pilwoo Lee ◽

Hanjin In ◽

Byeongjoo Moon ◽

...

Keyword(s):

Health Information ◽

Protected Health Information ◽

Secondary Use

4.C. Round table: Joining forces: frameworks for international and multi-sectoral collaborations in health information

European Journal of Public Health ◽

10.1093/eurpub/ckaa165.141 ◽

2020 ◽

Vol 30 (Supplement_5) ◽

Keyword(s):

Public Health ◽

Big Data ◽

Health Information ◽

Population Health ◽

Round Table ◽

Secondary Use ◽

Use Of Data ◽

National Initiatives ◽

Information Research

Countries differ widely in lifestyles, environmental exposures and health(care) systems, which together form a large natural experiment to be investigated. Pan-European comparative studies can explore the underlying determinants of population health and provide rich new insights into the dynamics of population health and care, such as the safety, quality, effectiveness and costs of interventions. Additionally, in the big data era, secondary use of data has become one of the major cornerstones of digital transformation for health systems improvement. Several countries are reviewing governance models and regulatory frameworks for data reuse. Precision medicine and public health intelligence share the same population-based approach; as such, aligning secondary-use-of-data initiatives will increase the cost-efficiency of the data conversion value chain by ensuring that different stakeholders' needs are accounted for from the beginning. At EU level, the European Commission has been raising awareness of the need to create adequate data ecosystems for innovative use of big data for health, especially ensuring responsible development and deployment of data science and artificial intelligence technologies in the medical and public health sectors. To this end, the Joint Action on Health Information (InfAct) is setting up the Distributed Infrastructure on Population Health (DIPoH). DIPoH provides a framework for international and multi-sectoral collaborations in health information. More specifically, DIPoH facilitates the sharing of research methods, data and results through the participation of countries and existing research networks. DIPoH's efforts include harmonization and interoperability, strengthening research capacity in Member States (MSs), and providing European and worldwide perspectives on national data.
To become embedded in the health information landscape, DIPoH aims to interact with existing (inter)national initiatives to identify common interfaces, avoid duplication of work and establish a sustainable long-term health information research infrastructure. In this workshop, InfAct lays down DIPoH's core elements in coherence with national and European initiatives and actors, namely To-Reach, eHAction, the French Health Data Hub and ECHO. Pitch presentations on DIPoH and its national nodes will set the scene. In a round-table format, possible collaborations with existing initiatives at (inter)national level will be debated with the audience. Synergies will be sought, community needs will be reflected on, and expectations on services will be discussed. The workshop will increase delegates' knowledge of the latest health information infrastructures and initiatives that strive for better public health and health systems in countries. It also serves as a capacity-building activity to promote cooperation between initiatives and actors in the field.
Key messages:
DIPoH is an infrastructure aiming to interact with existing (inter)national initiatives to identify common interfaces, avoid duplication and enable a long-term health information research infrastructure.
National nodes can improve coordination, communication and cooperation between health information stakeholders in a country, potentially reducing overlap and duplication of research and field-work.

Deep transfer learning with limited data for machinery fault diagnosis

Applied Soft Computing ◽

10.1016/j.asoc.2021.107150 ◽

2021 ◽

Vol 103 ◽

pp. 107150

Author(s):

Te Han ◽

Chao Liu ◽

Rui Wu ◽

Dongxiang Jiang

Keyword(s):

Fault Diagnosis ◽

Transfer Learning ◽

Limited Data ◽

Machinery Fault Diagnosis

Classification of Space Objects by Using Deep Learning with Micro-Doppler Signature Images

Sensors ◽

10.3390/s21134365 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4365

Author(s):

Kwangyong Jung ◽

Jae-In Lee ◽

Nammoon Kim ◽

Sunjin Oh ◽

Dong-Wook Seo

Keyword(s):

Transfer Learning ◽

Data Augmentation ◽

Doppler Frequency ◽

Extraction Methods ◽

Classification Performance ◽

Reconstruction Method ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Space Objects ◽

Target Characteristics

Radar target classification is an important task in missile defense systems. State-of-the-art studies have used micro-Doppler frequency to classify space object targets. However, existing studies rely heavily on feature extraction methods, so the generalization performance of the classifier is limited and there is room for improvement. Recently, popular approaches to improving classification performance have been to build a convolutional neural network (CNN) architecture with the help of transfer learning and to use a generative adversarial network (GAN) to enlarge the training dataset. However, these methods still have drawbacks. First, they use only one feature to train the network, so they cannot guarantee that the classifier learns more robust target characteristics. Second, when data augmentation is performed via a GAN instead of simulation, it is difficult to obtain large amounts of data that accurately mimic real-world target features. To mitigate these problems, we propose a transfer learning-based parallel network that takes the spectrogram and the cadence velocity diagram (CVD) as inputs. In addition, we obtain a dataset based on electromagnetic (EM) simulation. The radar-received signal is simulated under a variety of dynamics using the shooting and bouncing rays concept with relative aspect angles, rather than the scattering center reconstruction method. The proposed model is evaluated on our generated dataset, where it achieved about 0.01 to 0.39% higher accuracy than pre-trained networks with a single input feature.
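
The two network inputs are related: a CVD is commonly obtained by taking, for each Doppler bin, the Fourier transform of the spectrogram along the time axis, which exposes the repetition rate (cadence) of micro-motions. A minimal NumPy sketch of computing both inputs, assuming a complex baseband radar return, a Hann window of 64 samples with a hop of 16, and a hypothetical sinusoidally phase-modulated test signal in place of the paper's EM-simulated data; the window sizes and function names are illustrative, and the CNN branches themselves are not shown.

```python
import numpy as np

def spectrogram(signal, win_len=64, hop=16):
    """Magnitude STFT of a complex baseband signal.
    Rows are Doppler bins (zero Doppler centred via fftshift); columns are time frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.fftshift(np.fft.fft(frames, axis=0), axes=0))

def cadence_velocity_diagram(spec):
    """FFT of each Doppler bin's time history (along the time axis), magnitude only.
    Rows stay Doppler bins; columns become cadence frequencies (zero cadence centred)."""
    return np.abs(np.fft.fftshift(np.fft.fft(spec, axis=1), axes=1))

fs = 1000.0                       # assumed sample rate in Hz
t = np.arange(2048) / fs
beta, f_m = 20.0, 2.0             # hypothetical modulation index and micro-motion rate
echo = np.exp(1j * beta * np.sin(2 * np.pi * f_m * t))
spec = spectrogram(echo)
cvd = cadence_velocity_diagram(spec)
print(spec.shape, cvd.shape)      # (64, 125) (64, 125)
```

Since both arrays share the Doppler axis, each can be resized and passed to its own pre-trained branch of a parallel network; how the paper fuses the branches is not described in the abstract.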

Olympic Games Event Recognition via Transfer Learning with Photobombing Guided Data Augmentation

Journal of Imaging ◽

10.3390/jimaging7020012 ◽

2021 ◽

Vol 7 (2) ◽

pp. 12

Author(s):

Yousef I. Mohamad ◽

Samah S. Baraheem ◽

Tam V. Nguyen

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Data Augmentation ◽

Olympic Games ◽

Event Recognition ◽

Surveillance Systems ◽

Video Captioning ◽

Practical Applications ◽

Sport Events ◽

The Olympic Games

Automatic event recognition in sports photos is an interesting and valuable research topic in computer vision and deep learning. With the rapid growth and explosive spread of data captured at every moment, fast and precise access to the right information has become a challenging task of considerable importance for many practical applications, e.g., sports image and video search, sports data analysis, healthcare monitoring applications, monitoring and surveillance systems for indoor and outdoor activities, and video captioning. In this paper, we evaluate different deep learning models for recognizing and interpreting sports events in the Olympic Games. To this end, we collect a dataset, dubbed the Olympic Games Event Image Dataset (OGED), covering 10 different sports events scheduled for the Olympic Games Tokyo 2020. Transfer learning is then applied to three popular deep convolutional neural network architectures, namely AlexNet, VGG-16 and ResNet-50, together with various data augmentation methods. Extensive experiments show that ResNet-50 with the proposed photobombing guided data augmentation achieves 90% accuracy.

Multiscale Object Detection from Drone Imagery Using Ensemble Transfer Learning

Drones ◽

10.3390/drones5030066 ◽

2021 ◽

Vol 5 (3) ◽

pp. 66

Author(s):

Rahee Walambe ◽

Aboli Marathe ◽

Ketan Kotecha

Keyword(s):

Object Detection ◽

Transfer Learning ◽

Data Augmentation ◽

Test Time ◽

Complex Task ◽

Open Domain ◽

End User ◽

Aerial Vehicle ◽

Uav Images ◽

Voting Strategy

Object detection in uncrewed aerial vehicle (UAV) images has been a longstanding challenge in computer vision. Object detection in drone images is particularly complex because objects appear at widely varying scales, such as humans, buildings, water bodies and hills. In this paper, we present an implementation of ensemble transfer learning that enhances the performance of the base models for multiscale object detection in drone imagery. Combined with a test-time augmentation pipeline, the algorithm combines different models and applies voting strategies to detect objects of various scales in UAV images. The data augmentation also helps address the scarcity of drone image datasets. We experimented with two open-domain datasets: the VisDrone dataset and the AU-AIR dataset. Our approach is more practical and efficient because it uses transfer learning and a two-level voting-strategy ensemble instead of training custom models on entire datasets. The experiments show a significant improvement in mAP on both the VisDrone and AU-AIR datasets with the ensemble transfer learning method. Furthermore, the voting strategies further increase the reliability of the ensemble, as the end user can select and trace the effects of the mechanism for bounding box predictions.
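
The abstract names voting strategies but not their rules. Strategies of the affirmative / consensus / unanimous kind appear in the ensemble object-detection literature: a fused box is kept if at least one model, a majority of models, or every model detects it. A minimal single-class sketch, assuming (x1, y1, x2, y2) boxes, an IoU threshold of 0.5, greedy grouping with at most one box per model, and coordinate averaging for fused boxes; the function name vote and these rules are illustrative assumptions, not the authors' two-level implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def vote(detections_per_model, strategy="consensus", iou_thr=0.5):
    """Fuse single-class boxes from several models.

    Boxes are grouped greedily around a seed box (IoU >= iou_thr, at most one box
    per model). A group survives if its number of supporting models meets the
    strategy: affirmative >= 1, consensus > half, unanimous = all models.
    Surviving groups are returned as the coordinate-wise mean box.
    """
    n_models = len(detections_per_model)
    required = {"affirmative": 1,
                "consensus": n_models // 2 + 1,
                "unanimous": n_models}[strategy]
    pool = [(m, box) for m, boxes in enumerate(detections_per_model) for box in boxes]
    used = [False] * len(pool)
    fused = []
    for i, (m_i, seed) in enumerate(pool):
        if used[i]:
            continue
        used[i] = True
        group, models = [seed], {m_i}
        for j in range(i + 1, len(pool)):
            m_j, box = pool[j]
            if not used[j] and m_j not in models and iou(seed, box) >= iou_thr:
                used[j] = True
                group.append(box)
                models.add(m_j)
        if len(models) >= required:
            fused.append(tuple(sum(c) / len(group) for c in zip(*group)))
    return fused

# Hypothetical detections from three models on one image.
dets = [[(0, 0, 10, 10), (50, 50, 60, 60)], [(1, 1, 11, 11)], [(0, 1, 10, 11), (100, 100, 110, 110)]]
for s in ("affirmative", "consensus", "unanimous"):
    print(s, len(vote(dets, s)))  # affirmative 3, consensus 1, unanimous 1
```

The abstract does not say how its two voting levels are arranged; this sketch shows a single level. Affirmative voting favours recall, unanimous voting favours precision, and consensus sits between them.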

A novel data augmentation approach for mask detection using deep transfer learning

Intelligence-Based Medicine ◽

10.1016/j.ibmed.2021.100037 ◽

2021 ◽

pp. 100037

Author(s):

Manas Ranjan Prusty ◽

Vaibhav Tripathi ◽

Anmol Dubey

Keyword(s):

Transfer Learning ◽

Data Augmentation

A DICOM dataset for evaluation of medical image de-identification

Scientific Data ◽

10.1038/s41597-021-00967-y ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Michael Rutherford ◽

Seong K. Mun ◽

Betty Levine ◽

William Bennett ◽

Kirk Smith ◽

...

Keyword(s):

Health Information ◽

National Cancer Institute ◽

Medical Image ◽

Cancer Imaging ◽

Protected Health Information ◽

Dicom Standard ◽

Clinical Imaging ◽

X Ray ◽

Evaluation Dataset ◽

Data Elements

We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM objects (a total of 1,693 CT, MRI, PET, and digital X-ray images) were selected from datasets published in The Cancer Imaging Archive (TCIA). Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM Attributes to mimic typical clinical imaging exams. The DICOM Standard and TCIA curation audit logs guided the insertion of synthetic PHI into standard and non-standard DICOM data elements. A TCIA curation team tested the utility of the evaluation dataset. With this publication, the evaluation dataset (containing synthetic PHI) and the de-identified evaluation dataset (the result of TCIA curation) are released on TCIA in advance of a competition, sponsored by the National Cancer Institute (NCI), for algorithmic de-identification of medical image datasets. The competition will use a much larger evaluation dataset constructed in the same manner. This paper describes the creation of the evaluation datasets and guidelines for their use.
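
Because the synthetic PHI values inserted into each object are known, one natural use of such a dataset is to check whether any of them survive a de-identification algorithm. A minimal sketch of that check, assuming each DICOM header is represented as a plain dict from attribute keyword to value (a real pipeline would typically read the files with a DICOM library such as pydicom) and that exact substring matching is sufficient; the function names, the example values and the removal-rate score are hypothetical and are not the competition's official metric.

```python
def leaked_phi(deidentified_header, inserted_phi):
    """Return (keyword, value) pairs whose inserted synthetic PHI value still
    appears anywhere in the de-identified header, including in other fields."""
    values_as_text = [str(v) for v in deidentified_header.values()]
    leaks = []
    for keyword, phi_value in inserted_phi.items():
        if any(str(phi_value) in text for text in values_as_text):
            leaks.append((keyword, phi_value))
    return leaks

def phi_removal_rate(deidentified_header, inserted_phi):
    """Fraction of inserted PHI values that no longer appear in the header."""
    if not inserted_phi:
        return 1.0
    return 1 - len(leaked_phi(deidentified_header, inserted_phi)) / len(inserted_phi)

# Hypothetical synthetic PHI and a de-identified header in which one value
# survived inside a free-text attribute.
inserted = {"PatientName": "DOE^JANE", "PatientID": "MRN-558201", "PatientBirthDate": "19700101"}
header = {"PatientName": "ANON", "PatientID": "ANON01", "PatientBirthDate": "",
          "StudyDescription": "CT CHEST MRN-558201", "Modality": "CT"}
print(leaked_phi(header, inserted))  # [('PatientID', 'MRN-558201')]
```

Scanning every attribute value, not just the one the PHI was inserted into, mirrors the abstract's point that PHI can appear in non-standard data elements; substring matching would still miss reformatted values such as a date written with separators.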