Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources (Preprint)

2017
Author(s):
Yingxiang Huang
Junghye Lee
Shuang Wang
Jimeng Sun
Hongfang Liu
...

BACKGROUND Data sharing has been a major challenge in biomedical informatics because of privacy concerns. Contextual embedding models have demonstrated a strong capability to represent medical concepts and their context, and they have shown promise as an alternative way to support deep-learning applications without the need to disclose original data. However, contextual embedding models acquired from individual hospitals cannot be directly combined because their embedding spaces differ, and naive pooling renders the combined embeddings useless. OBJECTIVE The aim of this study was to present a novel approach to address these issues and to promote sharing representations without sharing data. Without sacrificing privacy, we also aimed to build a global model from representations learned from local private data and to synchronize information from multiple sources. METHODS We propose a methodology that harmonizes different local contextual embeddings into a global model. We used Word2Vec to generate contextual embeddings from each source and Procrustes analysis to fuse the different vector models into one common space, using a list of corresponding pairs as anchor points. We then performed prediction analysis with the harmonized embeddings. RESULTS We used sequential medical events extracted from the Medical Information Mart for Intensive Care III database to evaluate the proposed methodology in predicting the next likely diagnosis of a new patient using either structured or unstructured data. Under different experimental scenarios, we confirmed that the global model built from harmonized local models achieves more accurate predictions than local models and than global models built from naive pooling. CONCLUSIONS Such aggregation of local models using our unique harmonization can serve as a proxy for a global model, combining information from a wide range of institutions and information sources. It allows information unique to a certain hospital to become available to other sites, increasing the fluidity of information flow in health care.
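As a rough sketch of the harmonization step described in the METHODS, the snippet below aligns a local Word2Vec model to a reference model with orthogonal Procrustes, using shared concepts as anchor pairs; the model objects, anchor list, and gensim/SciPy usage are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from gensim.models import Word2Vec

def harmonize(reference: Word2Vec, local: Word2Vec, anchors: list[str]) -> np.ndarray:
    """Rotate the local embedding space onto the reference space.

    anchors: concepts present in both vocabularies, used as corresponding pairs.
    Returns the local embedding matrix expressed in the reference space.
    """
    # Stack anchor vectors from both models (row i corresponds to the same concept).
    ref_mat = np.vstack([reference.wv[w] for w in anchors])
    loc_mat = np.vstack([local.wv[w] for w in anchors])
    # Orthogonal Procrustes: find the rotation R minimizing ||loc_mat @ R - ref_mat||_F.
    rotation, _ = orthogonal_procrustes(loc_mat, ref_mat)
    # Apply the rotation to every vector of the local model.
    return local.wv.vectors @ rotation
```

The rotated local vectors can then be pooled with the reference vectors to approximate the global model described in the abstract.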

2021
Vol 2021
pp. 1-9
Author(s):
Renjie Tang
Junzhou Luo
Junbo Qian
Jiahui Jin

Electrocardiogram (ECG) data classification is an active research area because of its applications in medical information processing. However, insufficient data, privacy preservation, and local deployment remain challenging. To address these problems, a novel personalized federated learning method for ECG classification is proposed in this paper. First, a global model is trained within a federated learning framework on multiple local data clients. Then, the global model and private data are used to train the local model. To reduce the feature inconsistency between the global and private local data and to better fit the private local data, a novel "feature alignment" module is devised to guarantee uniformity; it contains two parts, global alignment and local alignment. For global alignment, the graph metric of batch data is used to constrain the dissimilarity between features generated by the global model and the local model. For local alignment, a triplet loss is adopted to increase discriminative ability on local private data. Comprehensive experiments were conducted on our collected dataset. The results show that the proposed method adapts better to local data and exhibits superior generalization ability.
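For the local-alignment part, the abstract names a standard triplet loss; a minimal PyTorch sketch of that ingredient is shown below, where the encoder and margin are illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

# Triplet loss used for local alignment: pull an anchor ECG feature toward a
# positive example of the same class and push it away from a negative example.
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def local_alignment_loss(encoder: nn.Module,
                         anchor: torch.Tensor,
                         positive: torch.Tensor,
                         negative: torch.Tensor) -> torch.Tensor:
    """Compute the triplet loss on features produced by the local encoder."""
    return triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
```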


Author(s):  
Adnan Alam Khan
Dr. Asadullah Shah
Saghir Muhammad

Telemedicine is one of the fastest-emerging technologies in the applied medical sciences. Medical information related to patients is transmitted and stored for reference and consultation. Medical images occupy a large amount of space, which can delay image transmission at critical times. Image compression techniques offer a solution to bandwidth scarcity: the same image can be transmitted with far lower bandwidth requirements and more quickly, while maintaining quality. In this paper, a differential image compression method is developed in which medical images taken from a wounded patient are compressed to reduce their bit rate. Results indicate that, on average, 25% compression is achieved with a mean opinion score (MOS) of 3.5 given by medical doctors and other paramedical staff, the ultimate users of the images.
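The abstract does not detail the codec, but the core idea of differential compression can be sketched as storing only the signed difference between a reference image and a follow-up image; the NumPy snippet below is a simplified illustration under that assumption, not the authors' method.

```python
import numpy as np

def diff_encode(reference: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Store only the signed difference between the current and reference image."""
    return current.astype(np.int16) - reference.astype(np.int16)

def diff_decode(reference: np.ndarray, diff: np.ndarray) -> np.ndarray:
    """Reconstruct the current image from the reference and the stored difference."""
    return np.clip(reference.astype(np.int16) + diff, 0, 255).astype(np.uint8)
```

Because consecutive images of the same wound are highly similar, the difference array is mostly near zero and compresses far better than the raw image.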


2021
Vol 5 (EICS)
pp. 1-23
Author(s):
Markku Laine
Yu Zhang
Simo Santala
Jussi P. P. Jokinen
Antti Oulasvirta

Over the past decade, responsive web design (RWD) has become the de facto standard for adapting web pages to the wide range of devices used for browsing. While RWD has improved the usability of web pages, it is not without drawbacks and limitations: designers and developers must manually design web layouts for multiple screen sizes and implement the associated adaptation rules, and its "one responsive design fits all" approach lacks support for personalization. This paper presents a novel approach for the automated generation of responsive and personalized web layouts. Given an existing web page design and preferences related to design objectives, our integer programming-based optimizer generates a consistent set of web designs. Where relevant data is available, these can be further automatically personalized for the user and browsing device. The paper also presents techniques for runtime adaptation of the generated designs into a fully responsive grid layout for web browsing. Results from our ratings-based online studies with end users (N = 86) and designers (N = 64) show that the proposed approach can automatically create high-quality responsive web layouts for a variety of real-world websites.
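As a toy illustration of how an integer-programming formulation can choose among layout options, the sketch below selects page elements for a 12-column grid row while maximizing a preference score; the elements, widths, scores, and use of the PuLP solver are hypothetical and much simpler than the paper's optimizer.

```python
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

elements = ["header", "nav", "article", "sidebar"]               # hypothetical page elements
widths = {"header": 12, "nav": 3, "article": 6, "sidebar": 3}    # grid columns each needs
score = {"header": 5, "nav": 2, "article": 9, "sidebar": 3}      # assumed designer preference

prob = LpProblem("toy_layout", LpMaximize)
keep = {e: LpVariable(f"keep_{e}", cat=LpBinary) for e in elements}

# Objective: maximize the total preference score of the elements shown in this row.
prob += lpSum(score[e] * keep[e] for e in elements)
# Constraint: a narrow viewport offers only 12 grid columns in this row.
prob += lpSum(widths[e] * keep[e] for e in elements) <= 12

prob.solve()
print([e for e in elements if keep[e].value() == 1])
```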


Author(s):  
Ying Wang
Yiding Liu
Minna Xia

Big data is characterized by multiple sources and heterogeneity. Based on the Hadoop and Spark big data platforms, a hybrid forest fire analysis system is built in this study. The platform combines big data analysis and processing technology and draws on research results from different technical fields, such as forest fire monitoring. In this system, Hadoop's HDFS is used to store all kinds of data, the Spark module provides various big data analysis methods, and visualization tools such as ECharts, ArcGIS, and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the system and to provide guidance for follow-up research and for establishing a forest fire monitoring and visualized early-warning big data platform. However, the experiment has two shortcomings: more data types should be selected, and compatibility would be better if the original data were converted to XML format. We expect these problems to be solved in follow-up research.
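A minimal PySpark sketch of the kind of pipeline described, assuming hotspot records are stored on HDFS as CSV with region, brightness, and confidence columns; the path, schema, and threshold are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("forest-fire-analysis").getOrCreate()

# Hypothetical hotspot data stored on HDFS (path and columns are assumptions).
hotspots = spark.read.csv("hdfs:///data/fire/hotspots.csv",
                          header=True, inferSchema=True)

# Keep only high-confidence detections and aggregate them per region,
# producing a small result table that a tool such as ECharts could visualize.
fire_points = (hotspots
               .filter(F.col("confidence") >= 80)
               .groupBy("region")
               .agg(F.count("*").alias("detections"),
                    F.avg("brightness").alias("avg_brightness")))

fire_points.show()
```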


2021
Vol 21 (1)
Author(s):
Bo Sun
Fei Zhang
Jing Li
Yicheng Yang
Xiaolin Diao
...  

Abstract Background With the development and application of medical information systems, semantic interoperability is essential for accurate and advanced health-related computing and for electronic health record (EHR) information sharing. The openEHR approach can improve semantic interoperability; one key advantage is that it allows the reuse of existing archetypes. The crucial problem is how to improve precision and resolve ambiguity in archetype retrieval. Method Based on query expansion technology and the Word2Vec model in Natural Language Processing (NLP), we propose finding synonyms as substitutes for original search terms in archetype retrieval. Test sets at different medical professional levels are used to verify the feasibility. Result Applying the approach to each original search term (n = 120) in the test sets, a total of 69,348 substitutes were constructed. Precision at 5 (P@5) was improved by 0.767 on average; for the best result, P@5 reached 0.975. Conclusions We introduce a novel approach that uses NLP technology and a corpus to find synonyms as substitutes for original search terms. Compared to simply mapping the elements contained in openEHR to an external dictionary, this approach can greatly improve precision and resolve ambiguity in retrieval tasks. This is helpful for promoting the application of openEHR and advancing EHR information sharing.
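A minimal sketch of the query-expansion idea with gensim's Word2Vec, where the toy corpus, model parameters, and number of substitutes are illustrative assumptions rather than the study's configuration.

```python
from gensim.models import Word2Vec

# Toy corpus of tokenized clinical phrases; a real run would use a large medical corpus.
sentences = [
    ["patient", "with", "hypertension", "and", "diabetes"],
    ["elevated", "blood", "pressure", "suggests", "hypertension"],
    ["diabetes", "managed", "with", "insulin"],
]
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, seed=1)

def expand_query(term: str, topn: int = 5) -> list[str]:
    """Return the original term plus near-synonym substitutes for archetype retrieval."""
    if term not in model.wv:
        return [term]
    return [term] + [word for word, _ in model.wv.most_similar(term, topn=topn)]

# Expand a clinical search term before querying the archetype repository.
print(expand_query("hypertension"))
```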


2021
Vol 15 (5)
pp. 1-32
Author(s):
Quang-huy Duong
Heri Ramampiaro
Kjetil Nørvåg
Thu-lan Dam

Dense subregion (subgraph & subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions because of the complexity of exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection and perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. Methods that can estimate multiple subtensors, on the other hand, give a density guarantee with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both a theoretical and a practical solution for estimating multiple dense subtensors in tensor data with a higher lower bound on density. In particular, we prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach showing that there are multiple dense subtensors whose guaranteed density exceeds the lower bound used in state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
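For context, the greedy 2-approximation mentioned above is usually the peeling procedure of Charikar: repeatedly remove a minimum-degree vertex and keep the intermediate subgraph with the highest average edge density. The sketch below implements that baseline for an undirected graph given as an adjacency dictionary; it illustrates the baseline only, not the authors' multi-subtensor method.

```python
import heapq

def densest_subgraph_greedy(adj):
    """Greedy peeling 2-approximation for the densest subgraph (edges / vertices).

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    Returns the vertex set of the densest intermediate subgraph encountered.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    alive = set(adj)
    heap = [(len(adj[v]), v) for v in adj]
    heapq.heapify(heap)
    best_density, best_set = edges / max(len(alive), 1), set(alive)
    while alive:
        deg, v = heapq.heappop(heap)
        if v not in alive or deg != len(adj[v]):
            continue                                  # stale heap entry
        # Remove the minimum-degree vertex and its incident edges.
        alive.remove(v)
        edges -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
            heapq.heappush(heap, (len(adj[u]), u))
        adj[v].clear()
        if alive and edges / len(alive) > best_density:
            best_density, best_set = edges / len(alive), set(alive)
    return best_set

print(densest_subgraph_greedy({1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}))
```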


Author(s):  
Damian Clarke
Joseph P. Romano
Michael Wolf

When considering multiple-hypothesis tests simultaneously, standard statistical techniques will lead to overrejection of null hypotheses unless the multiplicity of the testing framework is explicitly considered. In this article, we discuss the Romano–Wolf multiple-hypothesis correction and document its implementation in Stata. The Romano–Wolf correction (asymptotically) controls the familywise error rate, that is, the probability of rejecting at least one true null hypothesis among a family of hypotheses under test. This correction is considerably more powerful than earlier multiple-testing procedures, such as the Bonferroni and Holm corrections, given that it takes into account the dependence structure of the test statistics by resampling from the original data. We describe a command, rwolf, that implements this correction and provide several examples based on a wide range of models. We document and discuss the performance gains from using rwolf over other multiple-testing procedures that control the familywise error rate.
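For readers interested in the mechanics behind the command, the sketch below implements the stepdown max-T adjustment at the core of the Romano-Wolf correction, assuming the observed statistics and null-centered bootstrap statistics have already been computed; it is a simplified Python illustration, not the rwolf implementation.

```python
import numpy as np

def romano_wolf_stepdown(t_obs, t_boot):
    """Stepdown max-T adjusted p-values (simplified Romano-Wolf sketch).

    t_obs : (S,) observed test statistics, one per hypothesis.
    t_boot: (B, S) bootstrap test statistics re-centered under the null,
            with each row drawn jointly so the dependence structure is kept.
    Returns adjusted p-values in the original hypothesis order.
    """
    t_obs = np.abs(np.asarray(t_obs, dtype=float))
    t_boot = np.abs(np.asarray(t_boot, dtype=float))
    order = np.argsort(-t_obs)                 # most significant hypothesis first
    p_adj = np.empty(t_obs.size)
    prev = 0.0
    for step, idx in enumerate(order):
        # Max over the hypotheses not yet handled at earlier steps.
        max_boot = t_boot[:, order[step:]].max(axis=1)
        p = (1.0 + np.sum(max_boot >= t_obs[idx])) / (1.0 + t_boot.shape[0])
        prev = max(prev, p)                    # enforce stepdown monotonicity
        p_adj[idx] = prev
    return p_adj
```

In Stata, the rwolf command wraps the resampling and reporting that produce these adjusted p-values.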


2010
Author(s):
Y. P. Yang
H. Castner
N. Kapustka

Two distortion modeling methods, plastic-strain mapping and lump-pass modeling, were developed and validated for predicting distortion in large welded structures while reducing computation time. The plastic-strain mapping method requires two kinds of models: local models and a global model. The local models are analyzed to predict plastic strains, and the global model is analyzed, with those plastic strains mapped onto it, to predict distortion. The lump-pass modeling method includes two kinds of analyses: a thermal analysis and a thermomechanical analysis. The thermal analysis is conducted to predict the temperature history, and the thermomechanical analysis takes that temperature history as input to predict distortion.
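As a rough illustration of the mapping step in the plastic-strain method, the snippet below interpolates nodal plastic strains from a refined local weld model onto the coarser global mesh; the array layout and the nearest-neighbour fallback are illustrative assumptions, not the validated workflow.

```python
import numpy as np
from scipy.interpolate import griddata

def map_plastic_strain(local_nodes: np.ndarray,
                       local_strain: np.ndarray,
                       global_nodes: np.ndarray) -> np.ndarray:
    """Interpolate plastic strain from local-model nodes onto global-model nodes.

    local_nodes  : (n, 3) coordinates of the local (weld-region) mesh nodes.
    local_strain : (n,) equivalent plastic strain at those nodes.
    global_nodes : (m, 3) coordinates of the global structural mesh nodes.
    """
    strain = griddata(local_nodes, local_strain, global_nodes, method="linear")
    # Global nodes outside the local model's convex hull get no value from
    # linear interpolation; fall back to nearest-neighbour values there.
    nearest = griddata(local_nodes, local_strain, global_nodes, method="nearest")
    return np.where(np.isnan(strain), nearest, strain)
```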


2022
Vol 54 (7)
pp. 1-38
Author(s):
Lynda Tamine
Lorraine Goeuriot

The explosive growth and widespread accessibility of medical information on the Internet have led to a surge of research activity in a wide range of scientific communities, including health informatics and information retrieval (IR). One of the common concerns of this research, across these disciplines, is how to design either clinical decision support systems or medical search engines capable of providing adequate support for both novices (e.g., patients and their next-of-kin) and experts (e.g., physicians, clinicians) tackling complex tasks (e.g., searching for a diagnosis or a treatment). However, despite significant multi-disciplinary research advances, current medical search systems exhibit low levels of performance. This survey provides an overview of the state of the art in IR and health informatics and, bridging these disciplines, shows how semantic search techniques can facilitate medical IR. First, we give a broad picture of semantic search and medical IR and highlight the major scientific challenges. Second, focusing on the semantic gap challenge, we discuss representative state-of-the-art work on feature-based as well as semantic-based representation and matching models that support medical search systems. In addition to seminal works, we present recent works that build on advances in deep learning. Third, we provide a thorough cross-model analysis with findings and lessons learned. Finally, we discuss open issues and promising directions for future research.


2021
Author(s):
Francis Lee
Joseph Man Chan

This book analyzes how collective memory of the 1989 Beijing student movement and the Tiananmen crackdown was produced, contested, sustained, and transformed in Hong Kong between 1989 and 2019. Drawing on data gathered from multiple sources, such as news reports, digital media content, vigil onsite surveys, population surveys, and in-depth interviews with activists, rally participants, and other stakeholders, it identifies six key processes in the dynamics of social remembering: memory formation, memory mobilization, memory institutionalization, intergenerational transfer, memory repair, and memory balkanization. Memories of Tiananmen demonstrates how a socially dominant collective memory, even one the state finds politically irritating, can be generated and maintained through constant negotiation and effort by a wide range of actors. While the book mainly focuses on the interplay between political changes and Tiananmen commemoration in the historical period during which the society enjoyed a significant degree of civil liberties, it also discusses how the trajectory of the collective memory may take a drastic turn as Hong Kong's autonomy is abridged. The book promises to be a key reference for anyone interested in collective memory studies, social movement research, political communication, and China and Hong Kong studies.

