Unsupervised Learning from Videos for Object Discovery in Single Images

Symmetry ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 38
Author(s):  
Dong Zhao ◽  
Baoqing Ding ◽  
Yulin Wu ◽  
Lei Chen ◽  
Hongchao Zhou

This paper proposes a method for discovering the primary objects in single images by learning from videos in a purely unsupervised manner: the learning process is based on videos, but the resulting network is able to discover objects from a single input image. The key idea is that an image typically consists of multiple object instances (such as the foreground and background) that undergo spatial transformations across video frames and can be sparsely represented. By exploring the sparse representation of a video with a neural network, one may learn the features of each object instance without any labels, which can then be used to discover, recognize, or distinguish object instances in a single image. In this paper, we consider a relatively simple scenario, where each image roughly consists of a foreground and a background. Our proposed method is based on encoder-decoder structures that sparsely represent the foreground, background, and segmentation mask, which together reconstruct the original image. We apply the feed-forward network trained from videos for object discovery in single images, in contrast to previous co-segmentation methods that require videos or collections of images as the input for inference. The experimental results on various object segmentation benchmarks demonstrate that the proposed method extracts primary objects accurately and robustly, which suggests that unsupervised image learning tasks can benefit from the sparsity of images and the inter-frame structure of videos.
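The mask-based recomposition described above can be sketched as follows; the function names, the L2 reconstruction term, and the sparsity weight are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def reconstruct(foreground, background, mask):
    """Compose an image from per-pixel foreground/background estimates
    blended by a soft segmentation mask."""
    return mask * foreground + (1.0 - mask) * background

def reconstruction_loss(image, foreground, background, mask, sparsity_weight=0.01):
    """L2 reconstruction error plus an L1 sparsity penalty on the mask
    (a simple stand-in for the sparse-representation objective)."""
    recon = reconstruct(foreground, background, mask)
    return np.mean((image - recon) ** 2) + sparsity_weight * np.mean(np.abs(mask))

# Toy check: with a perfect mask the reconstruction error term vanishes.
fg = np.ones((4, 4))                       # "object" appearance
bg = np.zeros((4, 4))                      # background appearance
m = np.zeros((4, 4)); m[1:3, 1:3] = 1.0    # square object mask
img = reconstruct(fg, bg, m)
```

In training, the encoder-decoder would predict `fg`, `bg`, and `m` from the input, and the same object representation learned across video frames is then applied to a single image.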

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Haisheng Hui ◽  
Xueying Zhang ◽  
Zelin Wu ◽  
Fenlian Li

For the segmentation of stroke lesions, the attention U-Net model, based on the self-attention mechanism, can suppress irrelevant regions in an input image while highlighting salient features useful for specific tasks. However, when the lesion is small and its contour is blurred, attention U-Net may generate wrong attention coefficient maps, leading to incorrect segmentation results. To cope with this issue, we propose a dual-path attention compensation U-Net (DPAC-UNet) network, which consists of a primary path network and an auxiliary path network. Both are attention U-Net models and identical in structure. The primary path network is the core network that performs accurate lesion segmentation and outputs the final segmentation result. The auxiliary path network generates auxiliary attention compensation coefficients and sends them to the primary path network to compensate for and correct possible attention coefficient errors. To realize the compensation mechanism of DPAC-UNet, we propose a weighted binary cross-entropy Tversky (WBCE-Tversky) loss to train the primary path network for accurate segmentation, and another compound loss function, called tolerance loss, to train the auxiliary path network to generate auxiliary compensation attention coefficient maps with an expanded coverage area. We conducted segmentation experiments on the 239 MRI scans of the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset to evaluate the performance and effectiveness of our method. The experimental results show that the DSC score of the proposed DPAC-UNet network is 6% higher than that of the single-path attention U-Net, and also higher than those of existing segmentation methods in the related literature. Our method therefore demonstrates strong performance in stroke lesion segmentation.
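A WBCE-Tversky-style compound loss can be sketched as below; the positive-class weight and the Tversky alpha/beta values here are illustrative assumptions, not the values used in the paper:

```python
import numpy as np

def weighted_bce(pred, target, pos_weight=2.0, eps=1e-7):
    """Binary cross-entropy with an up-weighted positive (lesion) class,
    useful when lesions occupy few pixels."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(pos_weight * target * np.log(pred)
                    + (1 - target) * np.log(1 - pred))

def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """1 - Tversky index; alpha > beta penalises false negatives more,
    which favours recall on small lesions."""
    tp = np.sum(pred * target)
    fn = np.sum((1 - pred) * target)
    fp = np.sum(pred * (1 - target))
    return 1.0 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def wbce_tversky(pred, target):
    """Compound loss: weighted BCE plus Tversky loss."""
    return weighted_bce(pred, target) + tversky_loss(pred, target)

target = np.array([[1., 0.], [0., 1.]])
loss_perfect = wbce_tversky(target, target)   # near zero for a perfect prediction
```

The tolerance loss of the auxiliary path would trade these terms off differently so that its attention maps over-cover the lesion rather than under-cover it.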


2000 ◽  
Author(s):  
Masaaki Okuma ◽  
Ward Heylen ◽  
Hisayoshi Matsuoka ◽  
Paul Sas

Abstract This paper presents the results of using an experimental spatial matrix identification method to predict the dynamics of a frame structure under a different boundary condition. The single-input-multiple-output frequency response functions (FRFs) of the test structure under the free-free boundary condition are measured by hammer testing. Using the FRFs, a set of spatial matrices is determined by the method to represent the structure's dynamic characteristics. Then, using the identified spatial matrices, the dynamic characteristics of the test structure under the boundary condition of clamping four points are predicted. The prediction is accurate enough for practical use. The result of the prediction demonstrates that the spatial matrices identified by the method can be used for structural modification and substructure synthesis in the field of computer-aided mechanical engineering.


2001 ◽  
Vol 123 (3) ◽  
pp. 390-394 ◽  
Author(s):  
Masaaki Okuma ◽  
Ward Heylen ◽  
Hisayoshi Matsuoka ◽  
Paul Sas

This paper presents the results of using an experimental spatial matrix identification method to predict the dynamics of a frame structure under various boundary conditions. The single-input-multiple-output frequency response functions (FRFs) of the test structure under the free-free boundary condition are measured by hammer testing. Using the FRFs, a set of spatial matrices is constructed by the new method to represent the structural dynamic characteristics of the system. Using these spatial matrices, the dynamic characteristics of the test structure under the boundary condition of clamping four points are predicted. The prediction is adequately accurate for practical application. The results demonstrate that the spatial matrices identified by this method can be used for structural modification and substructure synthesis in the field of computer-aided mechanical engineering.
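The boundary-condition prediction step can be illustrated on a toy system; this sketch assumes the mass and stiffness matrices have already been identified from the free-free FRFs, and simply removes the clamped degrees of freedom before solving the reduced eigenproblem. The 3-DOF chain and its values are invented for illustration:

```python
import numpy as np

def clamped_frequencies(M, K, clamped_dofs):
    """Natural frequencies (rad/s) after clamping: delete the rows and
    columns of the clamped DOFs, then solve K x = w^2 M x on the rest."""
    keep = [i for i in range(M.shape[0]) if i not in set(clamped_dofs)]
    Mr = M[np.ix_(keep, keep)]
    Kr = K[np.ix_(keep, keep)]
    eigvals = np.linalg.eigvals(np.linalg.solve(Mr, Kr))
    return np.sqrt(np.sort(np.real(eigvals)))

# Toy 3-DOF spring-mass chain standing in for the identified matrices.
m, k = 1.0, 100.0
M = np.eye(3) * m
K = np.array([[ k,   -k,   0.0],
              [-k,  2*k,  -k ],
              [ 0.0, -k,   k ]])
w = clamped_frequencies(M, K, clamped_dofs=[0])   # clamp the first DOF
```

The same reduction applied to the experimentally identified spatial matrices yields the predicted clamped-structure dynamics.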


1994 ◽  
Vol 03 (02) ◽  
pp. 157-185 ◽  
Author(s):  
DERSHUNG YANG ◽  
LARRY A. RENDELL ◽  
JULIE L. WEBSTER ◽  
DORIS S. SHAW ◽  
JAMES H. GARRETT

A new neural network called AUGURS is designed to assist users of a Computer-Aided Design (CAD) system in utilizing standard graphic symbols. With AUGURS, the CAD user can avoid searching for standard symbols in a large library and instead rely on AUGURS to automatically retrieve the symbols resembling the user's drawing. More specifically, AUGURS takes as input a bitmap image normalized with respect to location, size, and orientation, and outputs a list of standard symbols ranked by its assessment of the similarity between each symbol and the input image. Only the top-ranked symbols are presented to the user for selection. AUGURS encodes geometric knowledge into its network structure and carefully balances discriminant power against noise tolerance. The encoded knowledge enables AUGURS to learn reasonably well despite a limited number of training examples, the most serious challenge in the CAD domain. We have compared AUGURS with the Zipcode Net, a traditional layered feed-forward network with an unconstrained structure, and with a network that takes either Zernike or pseudo-Zernike moments as input. The experimental results show that AUGURS achieves the best recognition performance among all compared networks, with reasonable recognition and learning efficiency.
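The retrieval workflow (ranking library symbols against a drawing's feature vector) can be sketched as below; the cosine-similarity ranking and the symbol names are illustrative assumptions, not the AUGURS network itself:

```python
import numpy as np

def rank_symbols(query, library, top_k=3):
    """Rank library symbols by cosine similarity to the query feature
    vector and return the top_k (name, score) pairs."""
    names = list(library)
    feats = np.array([library[n] for n in names], dtype=float)
    q = query / (np.linalg.norm(query) + 1e-12)
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    scores = f @ q
    order = np.argsort(-scores)[:top_k]
    return [(names[i], float(scores[i])) for i in order]

# Hypothetical 3-dimensional symbol features (stand-ins for network outputs).
library = {"valve": np.array([1.0, 0.0, 0.0]),
           "pump":  np.array([0.0, 1.0, 0.0]),
           "motor": np.array([0.7, 0.7, 0.0])}
ranked = rank_symbols(np.array([0.9, 0.1, 0.0]), library)
```

In AUGURS the similarity scores come from the trained network's outputs rather than raw cosine similarity; only the presentation of a short ranked list to the user is the same.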


2021 ◽  
Author(s):  
Yuchen Yue ◽  
Hua Li ◽  
Jianhua Luo

Establishing structured reconstruction models and efficient reconstruction algorithms according to practical engineering needs is a central concern in applied research on Compressed Sensing (CS) theory. Targeting the problems of high-speed video capture, this paper proposes a video CS scheme based on intra-frame and inter-frame constraints and a Genetic Algorithm (GA). First, it employs the intra-frame and inter-frame correlations of the video signal as prior information, creating a video CS reconstruction model based on temporal and spatial similarity constraints. It then uses an overcomplete Ridgelet dictionary to divide the video frames into three structure types: smooth, single-oriented, and multijointed. The video frames are clustered according to structure using the Affinity Propagation (AP) algorithm, and finally the clusters are reconstructed using an evolutionary algorithm. Experiments demonstrate the efficiency of the scheme in terms of reconstruction quality.
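One plausible form of a reconstruction model combining the temporal and spatial similarity constraints described above, written here as an assumption rather than the paper's exact objective, is:

```latex
\min_{x_t}\;
  \underbrace{\|y_t - \Phi x_t\|_2^2}_{\text{measurement fidelity}}
  \;+\; \lambda_s \underbrace{\|\Psi^{\top} x_t\|_1}_{\text{intra-frame sparsity}}
  \;+\; \lambda_t \underbrace{\|x_t - x_{t-1}\|_2^2}_{\text{inter-frame similarity}}
```

where $y_t$ is the compressed measurement of frame $t$, $\Phi$ is the sensing matrix, $\Psi$ is the overcomplete Ridgelet dictionary, and $\lambda_s$, $\lambda_t$ weight the spatial and temporal priors. In the paper's scheme, this optimization is carried out per cluster by the evolutionary (GA-based) algorithm rather than by a convex solver.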


Author(s):  
V. V. Moskalenko ◽  
M. O. Zaretsky ◽  
A. S. Moskalenko ◽  
A. O. Panych ◽  
V. V. Lysyuk

Context. A model and training method for observational context classification in CCTV sewer inspection video frames was developed and researched. The object of research is the process of detecting temporal-spatial context during CCTV sewer inspections. The subjects of the research are a machine learning model and a training method for classification analysis of CCTV video sequences under the constraint of a limited and imbalanced training dataset. Objective. The stated research goal is to develop an efficient context classifier model and training algorithm for CCTV sewer inspection video frames under the constraint of a limited and imbalanced labeled training set. Methods. A four-stage training algorithm for the classifier is proposed. The first stage involves training with a soft triplet loss and a regularisation component that penalises the network's binary output code rounding error. The next stage determines the binary code for each class according to the principles of error-correcting output codes, accounting for intra- and inter-class relationships. The resulting reference vector for each class is then used as a sample label for subsequent training with a Joint Binary Cross-Entropy Loss. The last stage is related to decision rule parameter optimization according to an information criterion, determining the boundaries of deviation of the binary representation of observations for each class from the corresponding reference vector. A 2D convolutional frame feature extractor combined with a temporal network for inter-frame dependency analysis is considered, with variants based on a 1D Dilated Regular Convolutional Network, a 1D Dilated Causal Convolutional Network, an LSTM Network, and a GRU Network. Model efficiency is compared on the basis of the micro-averaged F1 score calculated on the test dataset. Results.
Results obtained on the dataset provided by Ace Pipe Cleaning, Inc confirm the suitability of the model and method for practical use, with a resulting accuracy of 92%. Comparison of the training outcome of the proposed method against conventional methods indicated a 4% advantage in micro-averaged F1 score. Further analysis of the confusion matrix showed that the most significant increase in accuracy over the conventional methods is achieved for complex classes that combine both camera orientation and sewer pipe construction features. Conclusions. The scientific novelty of the work lies in the new models and methods for classification analysis of the temporal-spatial context when automating CCTV sewer inspections under imbalanced and limited training dataset conditions. Training results obtained with the proposed method were compared with results obtained with the conventional method, and the proposed method showed a 4% advantage in micro-averaged F1 score. It was empirically shown that the regular convolutional temporal network architecture is the most efficient at utilizing inter-frame dependencies. The resulting accuracy is suitable for practical use, as additional error correction can be made using the odometer data.
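The error-correcting-output-codes decision rule of the last training stage can be sketched as follows; the class names, code length, and deviation threshold are invented for illustration:

```python
import numpy as np

def ecoc_decide(output, codes, max_dev=1):
    """Round the network's output to a binary code, assign the class with
    the smallest Hamming distance to its reference code, and reject the
    observation if that distance exceeds the allowed deviation boundary."""
    bits = (np.asarray(output) >= 0.5).astype(int)
    dists = {cls: int(np.sum(bits != np.asarray(code)))
             for cls, code in codes.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] <= max_dev else None   # None = rejected

# Hypothetical per-class reference codes (the real ones are learned).
codes = {"joint":     [1, 0, 1, 0],
         "lateral":   [0, 1, 1, 0],
         "camera_up": [0, 0, 0, 1]}
label = ecoc_decide([0.9, 0.2, 0.8, 0.1], codes)
```

The per-class deviation boundary `max_dev` plays the role of the decision-rule parameter optimized by the information criterion in the fourth stage.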


Author(s):  
Debanjan Konar ◽  
Suman Kalyan Kar

This chapter proposes a quantum multi-layer neural network (QMLNN) architecture suitable for real-time handwritten character recognition, assisted by quantum backpropagation of errors calculated from a quantum-inspired fuzziness measure of the network output states. It is composed of three second-order neighborhood-topology-based interconnected layers of neurons represented by qubits, known as the input, hidden, and output layers. The QMLNN architecture is a feed-forward network with a standard quantum backpropagation algorithm for adjusting its weighted interconnections. QMLNN self-organizes the quantum fuzzy input image information by means of quantum backpropagated errors at the intermediate and output layers of the architecture. The interconnection weights are described using rotation gates. After the network has stabilized, a quantum observation at the output layer collapses the superposition of quantum states in order to obtain true binary outputs.
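A rotation-gate weight, the parameterization mentioned above, can be illustrated with a toy qubit update; the real-valued state vector and the fixed rotation angle here are a simplified sketch, not the chapter's full QMLNN:

```python
import numpy as np

def rotation_gate(theta):
    """2x2 rotation matrix: the rotation-gate form of an interconnection
    weight, adjusted during quantum backpropagation by changing theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def update_qubit(state, delta):
    """Apply a rotation-gate weight update to a (cos t, sin t) qubit state;
    the rotation preserves the state's unit norm."""
    return rotation_gate(delta) @ state

state = np.array([1.0, 0.0])            # |0>-like initial state
state = update_qubit(state, np.pi / 4)  # rotate by 45 degrees
prob0 = state[0] ** 2                   # probability of observing "0"
```

A final observation (as in the chapter's output layer) would sample a binary outcome from these probabilities, collapsing the superposition to a definite value.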

