Unsupervised Shape Completion via Deep Prior in the Neural Tangent Kernel Perspective

Lei Chu; Hao Pan; Wenping Wang

doi:10.1145/3459234

ACM Transactions on Graphics, 2021, Vol 40 (3), pp. 1-17. doi:10.1145/3459234

Author(s): Lei Chu, Hao Pan, Wenping Wang

Keyword(s): Missing Data, Deep Neural Networks, Feature Space, Training Dataset, Learning Mechanisms, Reconstruction Methods, Novel Approach, Flexible Adaptation, Shape Completion, 3D Shapes

We present a novel approach for completing and reconstructing 3D shapes from incomplete scanned data by using deep neural networks. Rather than being trained on supervised completion tasks and applied to a test shape, the network is optimized from scratch on the single test shape to fully adapt to the shape and complete the missing data using contextual guidance from the known regions. The ability of an untrained neural network to complete missing data is usually referred to as the deep prior. In this article, we interpret the deep prior from a neural tangent kernel (NTK) perspective and show that the shape patches completed by the trained CNN are naturally similar to existing patches, as they are proximate in the kernel feature space induced by NTK. The interpretation allows us to design more efficient network structures and learning mechanisms for the shape completion and reconstruction task. Being more aware of structural regularities than both traditional and other unsupervised learning-based reconstruction methods, our approach completes large missing regions with plausible shapes and complements supervised learning-based methods that use database priors by requiring no extra training dataset and showing flexible adaptation to a particular shape instance.
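To make the kernel view concrete, the following is a minimal sketch of the empirical neural tangent kernel, Θ(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩, for a toy two-layer tanh network. The network, its width, and the random inputs are assumptions chosen for illustration; this is not the CNN or the training procedure used in the article.

```python
import numpy as np

def network_output(x, W, v):
    """Toy two-layer network f(x) = v . tanh(W x) / sqrt(m) (illustrative assumption)."""
    return v @ np.tanh(W @ x) / np.sqrt(v.size)

def param_gradient(x, W, v):
    """Gradient of f(x) with respect to all parameters (W first, then v), flattened."""
    m = v.size
    h = np.tanh(W @ x)
    grad_v = h / np.sqrt(m)
    grad_W = np.outer(v * (1.0 - h**2), x) / np.sqrt(m)
    return np.concatenate([grad_W.ravel(), grad_v])

def empirical_ntk(X, W, v):
    """Kernel matrix whose (i, j) entry is the inner product of parameter gradients."""
    J = np.stack([param_gradient(x, W, v) for x in X])
    return J @ J.T

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 3))   # hidden-layer weights (hypothetical sizes)
v = rng.standard_normal(256)        # output weights
X = rng.standard_normal((5, 3))     # five hypothetical input points
K = empirical_ntk(X, W, v)          # 5 x 5 symmetric, positive semi-definite
```

Two inputs with a large kernel value have similar parameter gradients, so a gradient step that fits one also moves the output at the other. That is the sense in which points are "proximate" in the kernel feature space.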

The development and validation of prognostic models for overall survival in the presence of missing data in the training dataset: a strategy with a detailed example

Diagnostic and Prognostic Research, 2021, Vol 5 (1). doi:10.1186/s41512-021-00103-9

Author(s): Kara-Louise Royle, David A. Cairns

Keyword(s): Overall Survival, Missing Data, Prognostic Model, Prognostic Index, Risk Groups, Prognostic Models, Training Dataset, Validation Process, Test Dataset, Development And Validation

Background: The United Kingdom Myeloma Research Alliance (UK-MRA) Myeloma Risk Profile is a prognostic model for overall survival. It was trained and tested on clinical trial data, aiming to improve the stratification of transplant ineligible (TNE) patients with newly diagnosed multiple myeloma. Missing data is a common problem which affects the development and validation of prognostic models, where decisions on how to address missingness have implications on the choice of methodology.

Methods:

Model building: The training and test datasets were the TNE pathways from two large randomised multicentre, phase III clinical trials. Potential prognostic factors were identified by expert opinion. Missing data in the training dataset was imputed using multiple imputation by chained equations. Univariate analysis fitted Cox proportional hazards models in each imputed dataset with the estimates combined by Rubin's rules. Multivariable analysis applied penalised Cox regression models, with a fixed penalty term across the imputed datasets. The estimates from each imputed dataset and bootstrap standard errors were combined by Rubin's rules to define the prognostic model.

Model assessment: Calibration was assessed by visualising the observed and predicted probabilities across the imputed datasets. Discrimination was assessed by combining the prognostic separation D-statistic from each imputed dataset by Rubin's rules.

Model validation: The D-statistic was applied in a bootstrap internal validation process in the training dataset and an external validation process in the test dataset, where acceptable performance was pre-specified.

Development of risk groups: Risk groups were defined using the tertiles of the combined prognostic index, obtained by combining the prognostic index from each imputed dataset by Rubin's rules.

Results: The training dataset included 1852 patients, 1268 (68.47%) with complete case data. Ten imputed datasets were generated. Five hundred twenty patients were included in the test dataset. The D-statistic for the prognostic model was 0.840 (95% CI 0.716–0.964) in the training dataset and 0.654 (95% CI 0.497–0.811) in the test dataset and the corrected D-statistic was 0.801.

Conclusion: The decision to impute missing covariate data in the training dataset influenced the methods implemented to train and test the model. To extend current literature and aid future researchers, we have presented a detailed example of one approach. Whilst our example is not without limitations, a benefit is that all of the patient information available in the training dataset was utilised to develop the model.

Trial registration: Both trials were registered; Myeloma IX-ISRCTN68454111, registered 21 September 2000. Myeloma XI-ISRCTN49407852, registered 24 June 2009.
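Since the abstract repeatedly combines per-imputation results by Rubin's rules, a small sketch of the standard pooling formulas may help. The pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the numbers below are hypothetical and are not taken from the study.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets using Rubin's rules.

    estimates: per-dataset point estimates (for example, a log hazard ratio)
    variances: per-dataset squared standard errors
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    pooled = q.mean()                 # average of the point estimates
    within = u.mean()                 # average within-imputation variance
    between = q.var(ddof=1)           # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical values for five imputed datasets (illustration only).
estimates = [0.50, 0.55, 0.45, 0.52, 0.48]
variances = [0.010, 0.012, 0.011, 0.009, 0.010]
pooled, total_var = pool_rubin(estimates, variances)
# pooled is 0.50 and total_var is 0.0104 + 1.2 * 0.00145 = 0.01214
```

The between-imputation term is what carries the extra uncertainty due to missingness; dropping it would understate the standard error.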

Automatic detection and segmentation of adenomatous colorectal polyps during colonoscopy using Mask R-CNN

Open Life Sciences, 2020, Vol 15 (1), pp. 588-596. doi:10.1515/biol-2020-0055. Cited By ~ 1

Author(s): Jie Meng, Linyan Xue, Ying Chang, Jianguang Zhang, Shilong Chang, ...

Keyword(s): Deep Neural Networks, Alimentary Tract, Multiple Scale, Colorectal Polyps, Training Dataset, Cad System, Multicenter Trials, Testing Dataset, Adenoma Detection, Aided Diagnosis

Colorectal cancer (CRC) is one of the main alimentary tract system malignancies affecting people worldwide. Adenomatous polyps are precursors of CRC, and therefore, preventing the development of these lesions may also prevent subsequent malignancy. However, the adenoma detection rate (ADR), a measure of the ability of a colonoscopist to identify and remove precancerous colorectal polyps, varies significantly among endoscopists. Here, we attempt to use a convolutional neural network (CNN) to generate a unique computer-aided diagnosis (CAD) system by exploring in detail the multiple-scale performance of deep neural networks. We applied this system to 3,375 hand-labeled images from the screening colonoscopies of 1,197 patients; of these images, 3,045 were assigned to the training dataset and 330 to the testing dataset. Each image was diagnosed simply as either an adenomatous or a non-adenomatous polyp. When applied to the testing dataset, our CNN-CAD system achieved a mean average precision of 89.5%. We conclude that the proposed framework could increase the ADR and decrease the incidence of interval CRCs, although further validation through large multicenter trials is required.

Evaluation of Power Insulator Detection Efficiency with the Use of Limited Training Dataset

Applied Sciences, 2020, Vol 10 (6), pp. 2104. doi:10.3390/app10062104

Author(s): Michał Tomaszewski, Paweł Michalski, Jakub Osuchowski

Keyword(s): Neural Network, Neural Networks, Object Detection, Convolutional Neural Network, Deep Neural Networks, Detection Efficiency, Training Data, Training Dataset, Training Set, Convolutional Network

This article presents an analysis of the effectiveness of object detection in digital images when only a limited quantity of input data is available. Using a limited set of learning data was made possible by developing a detailed scenario of the task, which strictly defined the conditions of detector operation in the considered case of a convolutional neural network. The described solution utilizes known architectures of deep neural networks in the process of learning and object detection. The article compares the detection results of the most popular deep neural networks when trained on a limited training set composed of a specific number of selected images from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines. The object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) could be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not trivial. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-convolutional neural network (faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model achieved a lower AP; however, its results depend significantly less on the number of input samples than those of other CNN models, which, in the authors' assessment, is a desired feature in the case of a limited training set.
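Because the comparison rests on average precision (AP), here is a minimal sketch of one common way to compute it: rank detections by confidence and sum precision over each increase in recall. The abstract does not say which AP variant was used (benchmarks differ in interpolation and matching rules), so the uninterpolated form, the function name, and the toy inputs below are assumptions.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Uninterpolated AP: sum of precision times the recall gained at each rank.

    scores: detection confidences
    is_true_positive: 1 if the detection matched a ground-truth object, else 0
    num_ground_truth: total number of ground-truth objects
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    ranks = np.arange(1, tp.size + 1)
    precision = cum_tp / ranks
    recall = cum_tp / num_ground_truth
    recall_steps = np.diff(np.concatenate([[0.0], recall]))
    return float(np.sum(recall_steps * precision))

# Hypothetical detections: four boxes, four ground-truth insulators.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[1, 1, 0, 1],
    num_ground_truth=4,
)
# ap = 0.25 * 1 + 0.25 * 1 + 0 + 0.25 * 0.75 = 0.6875
```

Only ranks where a true positive appears contribute, so false positives lower AP indirectly by reducing the precision at later ranks.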

Discovery of Highly Polymorphic Organic Materials: A New Machine Learning Approach

2019. doi:10.26434/chemrxiv.9524219

Author(s): Zied Hosni, Annalisa Riccardi, Stephanie Yerdelen, Alan R. G. Martin, Deborah Bowering, ...

Keyword(s): Machine Learning, Structure Prediction, External Validation, New Drugs, Training Dataset, Validation Dataset, Machine Learning Classification, Novel Approach, Physical Form, Machine Learning Approach

Polymorphism is the capacity of a molecule to adopt different conformations or molecular packing arrangements in the solid state. This is a key property to control during pharmaceutical manufacturing because it can impact a range of properties including stability and solubility. In this study, a novel approach based on machine learning classification methods is used to predict the likelihood for an organic compound to crystallise in multiple forms. A training dataset of drug-like molecules was curated from the Cambridge Structural Database (CSD) and filtered according to entries in the Drug Bank database. The number of separate forms in the CSD for each molecule was recorded. A metaclassifier was trained using this dataset to predict the expected number of crystalline forms from the compound descriptors. This approach was used to estimate the number of crystallographic forms for an external validation dataset. These results suggest this novel methodology can be used to predict the extent of polymorphism of new drugs or not-yet experimentally screened molecules. This promising method complements expensive ab initio methods for crystal structure prediction and, as an integral part of experimental physical form screening, may identify systems with unexplored potential.

Sparse Signal Recovery from Modulo Observations

2020. doi:10.21203/rs.3.rs-42731/v1

Author(s): Viraj Shah, Chinmay Hegde

Keyword(s): Phase Retrieval, Dynamic Range, Signal Reconstruction, Real Data, Superior Performance, Signal Recovery, Reconstruction Methods, Novel Approach, Sparsity Constraints, Improved Performance

We consider the problem of reconstructing a signal from under-determined modulo observations (or measurements). This observation model is inspired by a relatively less well-known imaging mechanism called modulo imaging, which can be used to extend the dynamic range of imaging systems; variations of this model have also been studied under the category of phase unwrapping. Signal reconstruction in the under-determined regime with modulo observations is a challenging ill-posed problem, and existing reconstruction methods cannot be used directly. In this paper, we propose a novel approach to solving the inverse problem limited to two modulo periods, inspired by recent advances in algorithms for phase retrieval under sparsity constraints. We show that given a sufficient number of measurements, our algorithm perfectly recovers the underlying signal and provides improved performance over other existing algorithms. We also provide experiments validating our approach on both synthetic and real data to demonstrate its superior performance.
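The observation model described above can be sketched as y = mod(Ax, R) with fewer measurements than unknowns and a sparse x. The code below only simulates this forward model; the Gaussian measurement matrix, the dimensions, the period R, and the function name are assumptions, and unlike the setting in the abstract the number of wraps here is not limited to two periods. No recovery algorithm is shown.

```python
import numpy as np

def modulo_measure(A, x, period):
    """Forward model: linear measurements wrapped into the interval [0, period)."""
    return np.mod(A @ x, period)

rng = np.random.default_rng(0)
n, m, s, period = 100, 40, 5, 1.0          # hypothetical sizes, m < n

x = np.zeros(n)                            # s-sparse signal
support = rng.choice(n, size=s, replace=False)
x[support] = rng.standard_normal(s)

A = rng.standard_normal((m, n))            # assumed Gaussian measurement matrix
y = modulo_measure(A, x, period)

# The unknown integer wrap counts are what make the inverse problem hard:
# A @ x = y + period * wraps, and only y is observed.
wraps = (A @ x - y) / period
```

Recovering x requires estimating both the sparse support and the wrap counts, which is why standard sparse recovery methods cannot be applied to y directly.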

Multi-View Clustering in Latent Embedding Space

Proceedings of the AAAI Conference on Artificial Intelligence, 2020, Vol 34 (04), pp. 3513-3520. doi:10.1609/aaai.v34i04.5756. Cited By ~ 2

Author(s): Man-Sheng Chen, Ling Huang, Chang-Dong Wang, Dong Huang

Keyword(s): Structure Learning, Clustering Algorithms, Feature Space, Global Structure, Optimization Scheme, Optimization Framework, Novel Approach, Indicator Matrix, Original Feature

Previous multi-view clustering algorithms mostly partition the multi-view data in their original feature space, the efficacy of which heavily and implicitly relies on the quality of the original feature presentation. In light of this, this paper proposes a novel approach termed Multi-view Clustering in Latent Embedding Space (MCLES), which is able to cluster the multi-view data in a learned latent embedding space while simultaneously learning the global structure and the cluster indicator matrix in a unified optimization framework. Specifically, in our framework, a latent embedding representation is firstly discovered which can effectively exploit the complementary information from different views. The global structure learning is then performed based on the learned latent embedding representation. Further, the cluster indicator matrix can be acquired directly with the learned global structure. An alternating optimization scheme is introduced to solve the optimization problem. Extensive experiments conducted on several real-world multi-view datasets have demonstrated the superiority of our approach.

Predictive Analytics with Strategically Missing Data

INFORMS Journal on Computing, 2020. doi:10.1287/ijoc.2019.0947

Author(s): Juheng Zhang, Xiaoping Liu, Xiao-Bai Li

Keyword(s): Missing Data, Financial Reporting, Real World, Missing Values, Predictive Analytics, Support Vector, Real World Data, Novel Approach, Strategic Behaviors, Job Application

We study strategically missing data problems in predictive analytics with regression. In many real-world situations, such as financial reporting, college admission, job application, and marketing advertisement, data providers often conceal certain information on purpose in order to gain a favorable outcome. It is important for the decision-maker to have a mechanism to deal with such strategic behaviors. We propose a novel approach to handle strategically missing data in regression prediction. The proposed method derives imputation values of strategically missing data based on the Support Vector Regression models. It provides incentives for the data providers to disclose their true information. We show that with the proposed method imputation errors for the missing values are minimized under some reasonable conditions. An experimental study on real-world data demonstrates the effectiveness of the proposed approach.

A Novel Discriminating and Relative Global Spatial Image Representation with Applications in CBIR

Applied Sciences, 2018, Vol 8 (11), pp. 2242. doi:10.3390/app8112242. Cited By ~ 16

Author(s): Bushra Zafar, Rehan Ashraf, Nouman Ali, Muhammad Iqbal, Muhammad Sajid, ...

Keyword(s): Image Classification, Spatial Information, Image Representation, Feature Space, Research Problem, Visual Words, Spatial Image, User Query, Novel Approach, Image Representations

The requirement for effective image search, which motivates the use of Content-Based Image Retrieval (CBIR) and the search of similar multimedia contents on the basis of user query, remains an open research problem for computer vision applications. The application domains for Bag of Visual Words (BoVW) based image representations are object recognition, image classification and content-based image analysis. Interest point detectors are quantized in the feature space and the final histogram or image signature does not retain any detail about co-occurrences of features in the 2D image space. This spatial information is crucial, as its loss adversely affects the performance of an image classification-based model. The most notable contribution in this context is Spatial Pyramid Matching (SPM), which captures the absolute spatial distribution of visual words. However, SPM is sensitive to image transformations such as rotation, flipping and translation. When images are not well-aligned, SPM may lose its discriminative power. This paper introduces a novel approach to encoding the relative spatial information for histogram-based representation of the BoVW model. This is established by computing the global geometric relationship between pairs of identical visual words with respect to the centroid of an image. The proposed approach is evaluated on five different datasets. Comprehensive experiments demonstrate the robustness of the proposed image representation compared with state-of-the-art methods in terms of precision and recall values.

Statistical image reconstruction methods in PET with compensation for missing data

1996 IEEE Nuclear Science Symposium. Conference Record, 2002. doi:10.1109/nssmic.1996.587908

Author(s): P.E. Kinahan, J.A. Fessler, J.S. Karp

Keyword(s): Missing Data, Image Reconstruction, Statistical Image Reconstruction, Reconstruction Methods

Polynomial Matrix Completion for Missing Data Imputation and Transductive Learning

Proceedings of the AAAI Conference on Artificial Intelligence, 2020, Vol 34 (04), pp. 3842-3849. doi:10.1609/aaai.v34i04.5796

Author(s): Jicong Fan, Yuqian Zhang, Madeleine Udell

Keyword(s): Missing Data, Matrix Completion, Synthetic Data, Full Rank, Feature Space, Optimization Method, Transductive Learning, Intrinsic Dimension, Missing Data Imputation, New Formulation

This paper develops new methods to recover the missing entries of a high-rank or even full-rank matrix when the intrinsic dimension of the data is low compared to the ambient dimension. Specifically, we assume that the columns of a matrix are generated by polynomials acting on a low-dimensional intrinsic variable, and wish to recover the missing entries under this assumption. We show that we can identify the complete matrix of minimum intrinsic dimension by minimizing the rank of the matrix in a high dimensional feature space. We develop a new formulation of the resulting problem using the kernel trick together with a new relaxation of the rank objective, and propose an efficient optimization method. We also show how to use our methods to complete data drawn from multiple nonlinear manifolds. Comparative studies on synthetic data, subspace clustering with missing data, motion capture data recovery, and transductive learning verify the superiority of our methods over the state-of-the-art.
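The premise that a full-rank matrix can become low-rank in a polynomial feature space can be checked numerically. In the sketch below, the columns lie on the curve (z, z², z³) driven by a one-dimensional variable z, so the 3 × 50 data matrix has full rank 3. Under a degree-2 polynomial kernel, however, the 10 feature coordinates are all polynomials in z of degree at most 6, so the 50 × 50 kernel matrix has rank at most 7. The curve, the kernel degree, and the tolerance are assumptions chosen for illustration; this is not the authors' completion algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-dimensional intrinsic variable and a polynomial map into 3-D ambient space.
z = rng.uniform(-1.0, 1.0, size=50)
X = np.vstack([z, z**2, z**3])            # 3 x 50 data matrix, one column per point

# Degree-2 polynomial kernel; its implicit feature space on R^3 has dimension 10.
K = (X.T @ X + 1.0) ** 2                  # 50 x 50 kernel (Gram) matrix

rank_X = np.linalg.matrix_rank(X)         # full rank in the ambient space
rank_K = np.linalg.matrix_rank(K, tol=1e-8)  # rank in feature space, explicit tolerance
```

The gap between the feature-space dimension (10) and the attainable rank (at most 7) is the kind of redundancy that a rank-minimizing objective in feature space can exploit, and the kernel trick avoids forming the features explicitly.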
