Near-Duplicate Image Detection System Using Coarse-to-Fine Matching Scheme Based on Global and Local CNN Features

Zhili Zhou; Kunde Lin; Yi Cao; Ching-Nung Yang; Yuling Liu

doi:10.3390/math8040644

Near-Duplicate Image Detection System Using Coarse-to-Fine Matching Scheme Based on Global and Local CNN Features

Mathematics ◽

10.3390/math8040644 ◽

2020 ◽

Vol 8 (4) ◽

pp. 644 ◽

Cited By ~ 2

Author(s):

Zhili Zhou ◽

Kunde Lin ◽

Yi Cao ◽

Ching-Nung Yang ◽

Yuling Liu

Keyword(s):

Real Time ◽

High Efficiency ◽

Feature Matching ◽

Saliency Detection ◽

Great Success ◽

Feature Maps ◽

Image Detection ◽

Global And Local ◽

Coarse To Fine ◽

Duplicate Image Detection

Due to the great success of convolutional neural networks (CNNs) in the area of computer vision, the existing methods tend to match the global or local CNN features between images for near-duplicate image detection. However, global CNN features are not robust enough to combat background clutter and partial occlusion, while local CNN features lead to high computational complexity in the step of feature matching. To achieve high efficiency while maintaining good accuracy, we propose a coarse-to-fine feature matching scheme using both global and local CNN features for real-time near-duplicate image detection. In the coarse matching stage, we implement the sum-pooling operation on convolutional feature maps (CFMs) to generate the global CNN features, and match these global CNN features between a given query image and database images to efficiently filter most of irrelevant images of the query. In the fine matching stage, the local CNN features are extracted by using maximum values of the CFMs and the saliency map generated by the graph-based visual saliency detection (GBVS) algorithm. These local CNN features are then matched between images to detect the near-duplicate versions of the query. Experimental results demonstrate that our proposed method not only achieves a real-time detection, but also provides higher accuracy than the state-of-the-art methods.

Download Full-text

Integrating SIFT and CNN Feature Matching for Partial-Duplicate Image Detection

IEEE Transactions on Emerging Topics in Computational Intelligence ◽

10.1109/tetci.2019.2909936 ◽

2020 ◽

Vol 4 (5) ◽

pp. 593-604

Author(s):

Zhili Zhou ◽

Q. M. Jonathan Wu ◽

Shaohua Wan ◽

Wendi Sun ◽

Xingming Sun

Keyword(s):

Feature Matching ◽

Image Detection ◽

Duplicate Image Detection

Download Full-text

Erratum to: Real-time, large-scale duplicate image detection method based on multi-feature fusion

Journal of Real-Time Image Processing ◽

10.1007/s11554-017-0673-8 ◽

2017 ◽

Vol 16 (5) ◽

pp. 1881-1881

Author(s):

Ming Chen ◽

Yuhua Li ◽

Zhifeng Zhang ◽

Ching-Hsien Hsu ◽

Shangguang Wang

Keyword(s):

Real Time ◽

Large Scale ◽

Detection Method ◽

Feature Fusion ◽

Image Detection ◽

Duplicate Image Detection

Download Full-text

Real-time, large-scale duplicate image detection method based on multi-feature fusion

Journal of Real-Time Image Processing ◽

10.1007/s11554-016-0632-9 ◽

2016 ◽

Vol 13 (3) ◽

pp. 557-570 ◽

Cited By ~ 6

Author(s):

Ming Chen ◽

Yuhua Li ◽

Zhifeng Zhang ◽

Ching-Hsien Hsu ◽

Shangguang Wang

Keyword(s):

Real Time ◽

Large Scale ◽

Detection Method ◽

Feature Fusion ◽

Image Detection ◽

Duplicate Image Detection

Download Full-text

Efficient coarse-to-fine near-duplicate image detection in riemannian manifold

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2012.6288048 ◽

2012 ◽

Cited By ~ 1

Author(s):

Ligang Zheng ◽

Guoping Qiu ◽

Jiwu Huang

Keyword(s):

Riemannian Manifold ◽

Image Detection ◽

Coarse To Fine ◽

Duplicate Image Detection

Download Full-text

Safety Monitoring and Warning System for Subway Construction Workers using Wearable Technology

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191127111248 ◽

2019 ◽

Vol 13 ◽

Author(s):

Jun-hua Chen ◽

Da-hu Wang ◽

Cun-yuan Sun

Keyword(s):

Real Time ◽

Video Surveillance ◽

Early Warning ◽

Warning System ◽

Safety Monitoring ◽

Wearable Technology ◽

Construction Workers ◽

Great Success ◽

Subway Construction ◽

Monitoring And Early Warning

Objective: This study focused on the application of wearable technology in the safety monitoring and early warning for subway construction workers. Methods: With the help of real-time video surveillance and RFID positioning which was applied in the construction has realized the real-time monitoring and early warning of on-site construction to a certain extent, but there are still some problems. Real-time video surveillance technology relies on monitoring equipment, while the location of the equipment is fixed, so it is difficult to meet the full coverage of the construction site. However, wearable technologies can solve this problem, they have outstanding performance in collecting workers’ information, especially physiological state data and positioning data. Meanwhile, wearable technology has no impact on work and is not subject to the inference of dynamic environment. Results and conclusion: The first time the system applied to subway construction was a great success. During the construction of the station, the number of occurrences of safety warnings was 43 times, but the number of occurrences of safety accidents was 0, which showed that the safety monitoring and early warning system played a significant role and worked out perfectly.

Download Full-text

Low-Cost Embedded System Using Convolutional Neural Networks-Based Spatiotemporal Feature Map for Real-Time Human Action Recognition

Applied Sciences ◽

10.3390/app11114940 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4940

Author(s):

Jinsoo Kim ◽

Jeongho Cho

Keyword(s):

Embedded System ◽

Real Time ◽

Action Recognition ◽

Processing Speed ◽

Recognition Accuracy ◽

Low Cost ◽

Human Action Recognition ◽

Human Action ◽

Video Data ◽

Feature Maps

The field of research related to video data has difficulty in extracting not only spatial but also temporal features and human action recognition (HAR) is a representative field of research that applies convolutional neural network (CNN) to video data. The performance for action recognition has improved, but owing to the complexity of the model, some still limitations to operation in real-time persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real-time is proposed. The proposed model extracts spatial feature maps by applying CNN to the images that develop the video and uses the frame change rate of sequential images as time information. Spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into multilayer perceptrons, which have a relatively lower complexity than other HAR models; thus, our method has high utility in a single embedded system connected to CCTV. The results of evaluating action recognition accuracy and data processing speed through challenging action recognition benchmark UCF-101 showed higher action recognition accuracy than the HAR model using long short-term memory with a small amount of video frames and confirmed the real-time operational possibility through fast data processing speed. In addition, the performance of the proposed weighted mean-based HAR model was verified by testing it in Jetson NANO to confirm the possibility of using it in low-cost GPU-based embedded systems.

Download Full-text

A Global-Local Blur Disentangling Network for Dynamic Scene Deblurring

Applied Sciences ◽

10.3390/app11052174 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2174

Author(s):

Xiaoguang Li ◽

Feifan Yang ◽

Jianglu Huang ◽

Li Zhuo

Keyword(s):

Local Features ◽

Attention Mechanism ◽

Experimental Results ◽

Dynamic Scene ◽

Feature Maps ◽

Training Scheme ◽

Real Scene ◽

Global And Local

Images captured in a real scene usually suffer from complex non-uniform degradation, which includes both global and local blurs. It is difficult to handle the complex blur variances by a unified processing model. We propose a global-local blur disentangling network, which can effectively extract global and local blur features via two branches. A phased training scheme is designed to disentangle the global and local blur features, that is the branches are trained with task-specific datasets, respectively. A branch attention mechanism is introduced to dynamically fuse global and local features. Complex blurry images are used to train the attention module and the reconstruction module. The visualized feature maps of different branches indicated that our dual-branch network can decouple the global and local blur features efficiently. Experimental results show that the proposed dual-branch blur disentangling network can improve both the subjective and objective deblurring effects for real captured images.

Download Full-text

Real-Time Environment Monitoring Using a Lightweight Image Super-Resolution Network

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18115890 ◽

2021 ◽

Vol 18 (11) ◽

pp. 5890

Author(s):

Qiang Yu ◽

Feiqiang Liu ◽

Long Xiao ◽

Zitao Liu ◽

Xiaomin Yang

Keyword(s):

Deep Learning ◽

Real Time ◽

Super Resolution ◽

Model Complexity ◽

Practical Application ◽

Single Image ◽

Feature Maps ◽

Benchmark Datasets ◽

Image Super Resolution ◽

Single Image Super Resolution

Deep-learning (DL)-based methods are of growing importance in the field of single image super-resolution (SISR). The practical application of these DL-based models is a remaining problem due to the requirement of heavy computation and huge storage resources. The powerful feature maps of hidden layers in convolutional neural networks (CNN) help the model learn useful information. However, there exists redundancy among feature maps, which can be further exploited. To address these issues, this paper proposes a lightweight efficient feature generating network (EFGN) for SISR by constructing the efficient feature generating block (EFGB). Specifically, the EFGB can conduct plain operations on the original features to produce more feature maps with parameters slightly increasing. With the help of these extra feature maps, the network can extract more useful information from low resolution (LR) images to reconstruct the desired high resolution (HR) images. Experiments conducted on the benchmark datasets demonstrate that the proposed EFGN can outperform other deep-learning based methods in most cases and possess relatively lower model complexity. Additionally, the running time measurement indicates the feasibility of real-time monitoring.

Download Full-text

Evolution of a Web-Scale Near Duplicate Image Detection System

Proceedings of The Web Conference 2020 ◽

10.1145/3366423.3380031 ◽

2020 ◽

Author(s):

Andrey Gusev ◽

Jiajing Xu

Keyword(s):

Detection System ◽

Image Detection ◽

Duplicate Image Detection

Download Full-text

A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5

Remote Sensing ◽

10.3390/rs13091619 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1619

Author(s):

Bin Yan ◽

Pan Fan ◽

Xiaoyan Lei ◽

Zhijie Liu ◽

Fuzeng Yang

Keyword(s):

Real Time ◽

Apple Tree ◽

Detection Method ◽

Recognition Algorithm ◽

Medium Size ◽

Feature Maps ◽

Detection Algorithms ◽

Fusion Mode ◽

Visual Attention Mechanism ◽

Tree Image

The apple target recognition algorithm is one of the core technologies of the apple picking robot. However, most of the existing apple detection algorithms cannot distinguish between the apples that are occluded by tree branches and occluded by other apples. The apples, grasping end-effector and mechanical picking arm of the robot are very likely to be damaged if the algorithm is directly applied to the picking robot. Based on this practical problem, in order to automatically recognize the graspable and ungraspable apples in an apple tree image, a light-weight apple targets detection method was proposed for picking robot using improved YOLOv5s. Firstly, BottleneckCSP module was improved designed to BottleneckCSP-2 module which was used to replace the BottleneckCSP module in backbone architecture of original YOLOv5s network. Secondly, SE module, which belonged to the visual attention mechanism network, was inserted to the proposed improved backbone network. Thirdly, the bonding fusion mode of feature maps, which were inputs to the target detection layer of medium size in the original YOLOv5s network, were improved. Finally, the initial anchor box size of the original network was improved. The experimental results indicated that the graspable apples, which were unoccluded or only occluded by tree leaves, and the ungraspable apples, which were occluded by tree branches or occluded by other fruits, could be identified effectively using the proposed improved network model in this study. Specifically, the recognition recall, precision, mAP and F1 were 91.48%, 83.83%, 86.75% and 87.49%, respectively. The average recognition time was 0.015 s per image. Contrasted with original YOLOv5s, YOLOv3, YOLOv4 and EfficientDet-D0 model, the mAP of the proposed improved YOLOv5s model increased by 5.05%, 14.95%, 4.74% and 6.75% respectively, the size of the model compressed by 9.29%, 94.6%, 94.8% and 15.3% respectively. The average recognition speeds per image of the proposed improved YOLOv5s model were 2.53, 1.13 and 3.53 times of EfficientDet-D0, YOLOv4 and YOLOv3 and model, respectively. The proposed method can provide technical support for the real-time accurate detection of multiple fruit targets for the apple picking robot.

Download Full-text