vertically partitioned data Latest Research Papers

Accurate Training of The Cox Proportional Hazards Model on Vertically-Partitioned Data While Preserving Privacy

10.21203/rs.3.rs-602219/v1 ◽

2021 ◽

Author(s):

Bart Kamphorst ◽

Thomas Rooijakkers ◽

Thijs Veugen ◽

Matteo Cellamare ◽

Daan Knoors

Keyword(s):

Survival Analysis ◽

Proportional Hazards ◽

Computation Time ◽

Privacy Preserving ◽

Cox Proportional Hazards ◽

Medical Data ◽

Security And Privacy ◽

Partitioned Data ◽

Vertically Partitioned Data ◽

Technical Challenges

Abstract Background: Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. Some privacy-preserving methods are known for analyzing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, dealing with data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained. Methods: We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. In order to securely compute the log-partial likelihood in each iteration, we run into several technical challenges to preserve the efficiency and security of our solution. To tackle these technical challenges, we generalize a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to get insight into the computational and communication effort that is needed. Results: Our secure solution is implemented in a setting with three different machines, each presenting a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient. Conclusions: Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably larger, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising high level of security and privacy.

Download Full-text

Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study

JMIR Medical Informatics ◽

10.2196/26598 ◽

2021 ◽

Vol 9 (6) ◽

pp. e26598

Author(s):

Dongchul Cha ◽

MinDong Sung ◽

Yu-Rang Park

Keyword(s):

Domain Knowledge ◽

Characteristic Curve ◽

Feature Space ◽

Original Data ◽

Training Data ◽

Data Governance ◽

Domain Specific Knowledge ◽

Partitioned Data ◽

Vertically Partitioned Data ◽

Latent Representations

Background Machine learning (ML) is now widely deployed in our everyday lives. Building robust ML models requires a massive amount of data for training. Traditional ML algorithms require training data centralization, which raises privacy and data governance issues. Federated learning (FL) is an approach to overcome this issue. We focused on applying FL on vertically partitioned data, in which an individual’s record is scattered among different sites. Objective The aim of this study was to perform FL on vertically partitioned data to achieve performance comparable to that of centralized models without exposing the raw data. Methods We used three different datasets (Adult income, Schwannoma, and eICU datasets) and vertically divided each dataset into different pieces. Following the vertical division of data, overcomplete autoencoder-based model training was performed for each site. Following training, each site’s data were transformed into latent data, which were aggregated for training. A tabular neural network model with categorical embedding was used for training. A centrally based model was used as a baseline model, which was compared to that of FL in terms of accuracy and area under the receiver operating characteristic curve (AUROC). Results The autoencoder-based network successfully transformed the original data into latent representations with no domain knowledge applied. These altered data were different from the original data in terms of the feature space and data distributions, indicating appropriate data security. The loss of performance was minimal when using an overcomplete autoencoder; accuracy loss was 1.2%, 8.89%, and 1.23%, and AUROC loss was 1.1%, 0%, and 1.12% in the Adult income, Schwannoma, and eICU dataset, respectively. Conclusions We proposed an autoencoder-based ML model for vertically incomplete data. Since our model is based on unsupervised learning, no domain-specific knowledge is required in individual sites. Under the circumstances where direct data sharing is not available, our approach may be a practical solution enabling both data protection and building a robust model.

Download Full-text

Multi-Tier Federated Learning for Vertically Partitioned Data

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9415026 ◽

2021 ◽

Author(s):

Anirban Das ◽

Stacy Patterson

Keyword(s):

Partitioned Data ◽

Vertically Partitioned Data

Download Full-text

Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study

JMIR Medical Informatics ◽

10.2196/21459 ◽

2021 ◽

Vol 9 (4) ◽

pp. e21459

Author(s):

Qoua Her ◽

Thomas Kent ◽

Yuji Samizo ◽

Aleksandra Slavkovic ◽

Yury Vilk ◽

...

Keyword(s):

Regression Analysis ◽

Real World ◽

Data Sources ◽

Patient Privacy ◽

File Transfer ◽

Level Information ◽

Partitioned Data ◽

Analysis Center ◽

Vertically Partitioned Data ◽

Technical Capability

Background In clinical research, important variables may be collected from multiple data sources. Physical pooling of patient-level data from multiple sources often raises several challenges, including proper protection of patient privacy and proprietary interests. We previously developed an SAS-based package to perform distributed regression—a suite of privacy-protecting methods that perform multivariable-adjusted regression analysis using only summary-level information—with horizontally partitioned data, a setting where distinct cohorts of patients are available from different data sources. We integrated the package with PopMedNet, an open-source file transfer software, to facilitate secure file transfer between the analysis center and the data-contributing sites. The feasibility of using PopMedNet to facilitate distributed regression analysis (DRA) with vertically partitioned data, a setting where the data attributes from a cohort of patients are available from different data sources, was unknown. Objective The objective of the study was to describe the feasibility of using PopMedNet and enhancements to PopMedNet to facilitate automatable vertical DRA (vDRA) in real-world settings. Methods We gathered the statistical and informatic requirements of using PopMedNet to facilitate automatable vDRA. We enhanced PopMedNet based on these requirements to improve its technical capability to support vDRA. Results PopMedNet can enable automatable vDRA. We identified and implemented two enhancements to PopMedNet that improved its technical capability to perform automatable vDRA in real-world settings. The first was the ability to simultaneously upload and download multiple files, and the second was the ability to directly transfer summary-level information between the data-contributing sites without a third-party analysis center. Conclusions PopMedNet can be used to facilitate automatable vDRA to protect patient privacy and support clinical research in real-world settings.

Download Full-text

Learning algorithms for vector quantization using vertically partitioned data with IoT

Artificial Life and Robotics ◽

10.1007/s10015-021-00683-1 ◽

2021 ◽

Author(s):

Hirofumi Miyajima ◽

Noritaka Shigei ◽

Hiromi Miyajima ◽

Norio Shiratori

Keyword(s):

Vector Quantization ◽

Learning Algorithms ◽

Partitioned Data ◽

Vertically Partitioned Data

Download Full-text

Privacy-Preserving Kernel Computation For Vertically Partitioned Data

10.14428/esann/2021.es2021-152 ◽

2021 ◽

Author(s):

Mirko Polato ◽

Alberto Gallinaro ◽

Fabio Aiolli

Keyword(s):

Privacy Preserving ◽

Partitioned Data ◽

Vertically Partitioned Data

Download Full-text

Simplified security learning using vertically partitioned data with IoT

Nonlinear Theory and Its Applications IEICE ◽

10.1587/nolta.12.412 ◽

2021 ◽

Vol 12 (3) ◽

pp. 412-423

Author(s):

Hirofumi Miyajima ◽

Noritaka Shigei ◽

Hiromi Miyajima ◽

Norio Shiratori

Keyword(s):

Partitioned Data ◽

Vertically Partitioned Data

Download Full-text

Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study (Preprint)

10.2196/preprints.26598 ◽

2020 ◽

Author(s):

Dongchul Cha ◽

MinDong Sung ◽

Yu-Rang Park

Keyword(s):

Domain Knowledge ◽

Characteristic Curve ◽

Feature Space ◽

Original Data ◽

Training Data ◽

Data Governance ◽

Domain Specific Knowledge ◽

Partitioned Data ◽

Vertically Partitioned Data ◽

Latent Representations

BACKGROUND Machine learning (ML) is now widely deployed in our everyday lives. Building robust ML models requires a massive amount of data for training. Traditional ML algorithms require training data centralization, which raises privacy and data governance issues. Federated learning (FL) is an approach to overcome this issue. We focused on applying FL on vertically partitioned data, in which an individual’s record is scattered among different sites. OBJECTIVE The aim of this study was to perform FL on vertically partitioned data to achieve performance comparable to that of centralized models without exposing the raw data. METHODS We used three different datasets (Adult income, Schwannoma, and eICU datasets) and vertically divided each dataset into different pieces. Following the vertical division of data, overcomplete autoencoder-based model training was performed for each site. Following training, each site’s data were transformed into latent data, which were aggregated for training. A tabular neural network model with categorical embedding was used for training. A centrally based model was used as a baseline model, which was compared to that of FL in terms of accuracy and area under the receiver operating characteristic curve (AUROC). RESULTS The autoencoder-based network successfully transformed the original data into latent representations with no domain knowledge applied. These altered data were different from the original data in terms of the feature space and data distributions, indicating appropriate data security. The loss of performance was minimal when using an overcomplete autoencoder; accuracy loss was 1.2%, 8.89%, and 1.23%, and AUROC loss was 1.1%, 0%, and 1.12% in the Adult income, Schwannoma, and eICU dataset, respectively. CONCLUSIONS We proposed an autoencoder-based ML model for vertically incomplete data. Since our model is based on unsupervised learning, no domain-specific knowledge is required in individual sites. Under the circumstances where direct data sharing is not available, our approach may be a practical solution enabling both data protection and building a robust model.

Download Full-text

Privacy-preserving two-parties logistic regression on vertically partitioned data using asynchronous gradient sharing

Peer-to-Peer Networking and Applications ◽

10.1007/s12083-020-01017-x ◽

2020 ◽

Author(s):

Qianjun Wei ◽

Qiang Li ◽

Zhipeng Zhou ◽

ZhengQiang Ge ◽

Yonggang Zhang

Keyword(s):

Logistic Regression ◽

Privacy Preserving ◽

Partitioned Data ◽

Vertically Partitioned Data

Download Full-text

An Implementation of Privacy Preserving “IF THEN ELSE” Rules for Vertically Partitioned Data

Rising Threats in Expert Applications and Solutions - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-6014-9_6 ◽

2020 ◽

pp. 45-56

Author(s):

Kamlesh Ahuja ◽

Navneet Sharma

Keyword(s):

Privacy Preserving ◽

Partitioned Data ◽

Vertically Partitioned Data

Download Full-text

vertically partitioned data
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Accurate Training of The Cox Proportional Hazards Model on Vertically-Partitioned Data While Preserving Privacy

Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study

Multi-Tier Federated Learning for Vertically Partitioned Data

Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study

Learning algorithms for vector quantization using vertically partitioned data with IoT

Privacy-Preserving Kernel Computation For Vertically Partitioned Data

Simplified security learning using vertically partitioned data with IoT

Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study (Preprint)

Privacy-preserving two-parties logistic regression on vertically partitioned data using asynchronous gradient sharing

An Implementation of Privacy Preserving “IF THEN ELSE” Rules for Vertically Partitioned Data

Export Citation Format

vertically partitioned dataRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Accurate Training of The Cox Proportional Hazards Model on Vertically-Partitioned Data While Preserving Privacy

Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study

Multi-Tier Federated Learning for Vertically Partitioned Data

Automatable Distributed Regression Analysis of Vertically Partitioned Data Facilitated by PopMedNet: Feasibility and Enhancement Study

Learning algorithms for vector quantization using vertically partitioned data with IoT

Privacy-Preserving Kernel Computation For Vertically Partitioned Data

Simplified security learning using vertically partitioned data with IoT

Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study (Preprint)

Privacy-preserving two-parties logistic regression on vertically partitioned data using asynchronous gradient sharing

An Implementation of Privacy Preserving “IF THEN ELSE” Rules for Vertically Partitioned Data

vertically partitioned data
Recently Published Documents