Small-Scale Face Detection Based on Improved R-FCN

Chaowei Tang; Shiyu Chen; Xu Zhou; Shuai Ruan; Haotian Wen

doi:10.3390/app10124177

Applied Sciences ◽

10.3390/app10124177 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4177

Author(s):

Chaowei Tang ◽

Shiyu Chen ◽

Xu Zhou ◽

Shuai Ruan ◽

Haotian Wen

Keyword(s):

Receptive Field ◽

Face Detection ◽

Feature Fusion ◽

Selection Method ◽

Local Information ◽

Small Scale ◽

Average Precision ◽

Convolutional Network ◽

High Layer ◽

Novel Method

Face detection is an important basic technique for face-related applications, such as face analysis, recognition, and reconstruction. Images in unconstrained scenes may contain many small-scale faces. The features that the detector can extract from small-scale faces are limited, which will cause missed detection and greatly reduce the precision of face detection. Therefore, this study proposes a novel method to detect small-scale faces based on region-based fully convolutional network (R-FCN). First, we propose a novel R-FCN framework with the ability of feature fusion and receptive field adaptation. Second, a bottom-up feature fusion branch is established to enrich the local information of high-layer features. Third, a receptive field adaptation block (RFAB) is proposed to ensure that the receptive field can be adaptively selected to strengthen the expression ability of features. Finally, we improve the anchor setting method and adopt soft non-maximum suppression (SoftNMS) as the selection method of candidate boxes. Experimental results show that average precision for small-scale face detection of R-FCN with feature fusion branch and RFAB (RFAB-f-R-FCN) is improved by 0.8%, 2.9%, and 11% on three subsets of Wider Face compared with that of R-FCN.
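The abstract names SoftNMS as the candidate-box selection method but does not give its variant or parameters. The following is only a minimal sketch of the Gaussian form of Soft-NMS, assuming boxes in `(x1, y1, x2, y2)` format; the function names, `sigma`, and `score_threshold` are illustrative choices, not values from the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS (sigma and threshold are assumed defaults).

    Instead of discarding every box that overlaps the current best box,
    the scores of overlapping boxes are decayed by exp(-iou^2 / sigma).
    Returns (box_index, final_score) pairs in the order they were selected.
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the candidate with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda p: remaining[p][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        # Decay the scores of the other candidates by their overlap with it.
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_threshold:
                decayed.append((idx, new_score))
        remaining = decayed
    return kept
```

With hard NMS, a heavily overlapped neighbouring face would be dropped outright; here it survives with a reduced score, which is the property that makes this kind of selection attractive for dense, small faces.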


SE-IYOLOV3: An Accurate Small Scale Face Detector for Outdoor Security

Mathematics ◽

10.3390/math8010093 ◽

2020 ◽

Vol 8 (1) ◽

pp. 93 ◽

Cited By ~ 2

Author(s):

Zhenrong Deng ◽

Rui Yang ◽

Rushi Lan ◽

Zhenbing Liu ◽

Xiaonan Luo

Keyword(s):

Receptive Field ◽

Face Detection ◽

Network Structure ◽

Difficult Problem ◽

Detection Performance ◽

Experimental Results ◽

Small Scale ◽

Detection Accuracy ◽

Novel Method ◽

Face Detector

Small-scale face detection is a very difficult problem. To achieve higher detection accuracy, we propose a novel method, termed SE-IYOLOV3, for small-scale face detection. In SE-IYOLOV3, we first improve YOLOV3: anchor boxes with a higher average intersection ratio are obtained by combining niche technology with the k-means algorithm. An upsampling scale is added to form a face network structure that is suitable for detecting dense small-scale faces. The number of prediction boxes is five times more than that of the YOLOV3 network. To further improve the detection performance, we adopt the SENet structure to enhance the global receptive field of the network. The experimental results on the WIDER FACE dataset show that the IYOLOV3 network embedded in the SENet structure can significantly improve the detection accuracy of dense small-scale faces.
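The anchor selection described above builds on k-means clustering of box sizes. The sketch below shows only the plain k-means part, using IoU between `(width, height)` pairs as the similarity measure, which is a common convention for YOLO-style anchors and is assumed here. The niche technology mentioned in the abstract is not reproduced, and the function names and the deterministic initialisation are hypothetical.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iterations=50):
    """Cluster (width, height) pairs into k anchor sizes.

    Each box is assigned to the centre it overlaps most (highest IoU),
    and each centre is then moved to the mean size of its members.
    """
    # Deterministic initialisation (an assumption): spread the initial
    # centres evenly over the boxes sorted by area.
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    step = len(ordered) / k
    centres = [ordered[int((i + 0.5) * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for size in sizes:
            best = max(range(k), key=lambda c: wh_iou(size, centres[c]))
            clusters[best].append(size)
        new_centres = []
        for c, members in enumerate(clusters):
            if members:
                mean_w = sum(m[0] for m in members) / len(members)
                mean_h = sum(m[1] for m in members) / len(members)
                new_centres.append((mean_w, mean_h))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def average_iou(sizes, centres):
    """Mean IoU between each box and its best-matching anchor."""
    return sum(max(wh_iou(s, c) for c in centres) for s in sizes) / len(sizes)
```

The `average_iou` helper corresponds to the "average intersection ratio" that the abstract uses to judge anchor quality: a higher value means the anchors match the ground-truth box sizes more closely.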


Feature Extraction and Fusion Using Deep Convolutional Neural Networks for Face Detection

Mathematical Problems in Engineering ◽

10.1155/2017/1376726 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 8

Author(s):

Xiaojun Lu ◽

Xu Duan ◽

Xiuping Mao ◽

Yuanyuan Li ◽

Xiangde Zhang

Keyword(s):

Feature Extraction ◽

Face Detection ◽

Feature Fusion ◽

Binary Classification ◽

Recall Rate ◽

Feature Representation ◽

Svm Classifier ◽

Computation Complexity ◽

Average Precision ◽

Deep Convolutional Neural Networks

This paper proposes a method that uses feature fusion to represent images better for face detection after feature extraction by deep convolutional neural network (DCNN). First, with Clarifai net and VGG Net-D (16 layers), we learn features from data, respectively; then we fuse features extracted from the two nets. To obtain more compact feature representation and mitigate computation complexity, we reduce the dimension of the fused features by PCA. Finally, we conduct face classification by SVM classifier for binary classification. In particular, we exploit offset max-pooling to extract features with sliding window densely, which leads to better matches of faces and detection windows; thus the detection result is more accurate. Experimental results show that our method can detect faces with severe occlusion and large variations in pose and scale. In particular, our method achieves 89.24% recall rate on FDDB and 97.19% average precision on AFW.


Densely Connected Pyramidal Dilated Convolutional Network for Hyperspectral Image Classification

Remote Sensing ◽

10.3390/rs13173396 ◽

2021 ◽

Vol 13 (17) ◽

pp. 3396

Author(s):

Feng Zhao ◽

Junjie Zhang ◽

Zhe Meng ◽

Hanqiang Liu

Keyword(s):

Receptive Field ◽

Spatial Information ◽

Hyperspectral Image ◽

Feature Fusion ◽

Receptive Fields ◽

Classification Performance ◽

Convolutional Network ◽

Dilated Convolution ◽

Spatial Features ◽

Good Classification Performance

Recently, with the extensive application of deep learning techniques in the hyperspectral image (HSI) field, particularly the convolutional neural network (CNN), research on HSI classification has stepped into a new stage. To avoid the problem that the receptive field of naive convolution is small, dilated convolution has been introduced into the field of HSI classification. However, dilated convolution usually generates blind spots in the receptive field, so the spatial information obtained is discontinuous. To solve this problem, a densely connected pyramidal dilated convolutional network (PDCNet) is proposed in this paper. Firstly, a pyramidal dilated convolutional (PDC) layer that integrates different numbers of sub-dilated convolutional layers is proposed, where the dilated factor of the sub-dilated convolution increases exponentially, achieving multi-scale receptive fields. Secondly, the number of sub-dilated convolutional layers increases in a pyramidal pattern with the depth of the network, thereby capturing more comprehensive hyperspectral information in the receptive field. Furthermore, a feature fusion mechanism combining pixel-by-pixel addition and channel stacking is adopted to extract more abstract spectral–spatial features. Finally, to reuse the features of the previous layers more effectively, dense connections are applied in densely pyramidal dilated convolutional (DPDC) blocks. Experiments on three well-known HSI datasets indicate that the proposed PDCNet has good classification performance compared with other popular models.
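The blind-spot problem and the effect of exponentially increasing dilation factors can be illustrated in one dimension. This is a rough sketch, assuming stride-1 convolutions with an odd kernel size that are stacked in sequence; the abstract does not say how the sub-dilated layers are wired, and both function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 dilated convolutions."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


def sampled_offsets(kernel_size, dilations):
    """Input offsets that actually contribute to a single output position.

    If this set is smaller than the receptive field, some positions inside
    the field are never read: these are the blind spots.
    """
    half = kernel_size // 2  # assumes an odd kernel size
    offsets = {0}
    for dilation in dilations:
        offsets = {o + t * dilation
                   for o in offsets
                   for t in range(-half, half + 1)}
    return offsets


# Exponentially increasing dilation factors: every position is covered.
dense = sampled_offsets(3, [1, 2, 4])
# A constant dilation factor of 2: only every second position is covered.
sparse = sampled_offsets(3, [2, 2, 2])
```

In this toy setting, dilations `[1, 2, 4]` give a receptive field of 15 with all 15 positions sampled, whereas `[2, 2, 2]` gives a receptive field of 13 with only 7 positions sampled.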


A Vehicle and Pedestrian Detection Method Based on Improved YOLOv4-Tiny

International Journal of Science and Engineering Applications ◽

10.7753/ijsea1101.1003 ◽

2022 ◽

Vol 11 (01) ◽

pp. 22-26

Author(s):

Hui Xiang ◽

Junyan Han ◽

Hanqing Wang ◽

Hao Li ◽

Shangqing Li ◽

...

Keyword(s):

Detection Method ◽

Feature Fusion ◽

Pedestrian Detection ◽

Mean Average Precision ◽

Detection Methods ◽

Small Scale ◽

Detection Accuracy ◽

Improved Method ◽

Average Precision ◽

The Mean

To address the low detection accuracy and poor recognition of small-scale targets in traditional vehicle and pedestrian detection methods, a vehicle and pedestrian detection method based on improved YOLOv4-Tiny is proposed. On the basis of YOLOv4-Tiny, an 8-fold downsampling feature layer was added for feature fusion, the PANet structure was used to perform bidirectional fusion of the deep and shallow features from the output feature layers of the backbone network, and a detection head for small targets was added. The results show that the mean average precision of the improved method reaches 85.93%, and the detection performance is similar to that of YOLOv4. Compared with YOLOv4-Tiny, the mean average precision of the improved method is increased by 24.45%, and the detection speed reaches 67.83 FPS, which means that the detection effect is significantly improved and can meet real-time requirements.


Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation

Sensors ◽

10.3390/s20041010 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1010 ◽

Cited By ~ 13

Author(s):

Yiqing Zhang ◽

Jun Chu ◽

Lu Leng ◽

Jun Miao

Keyword(s):

Receptive Field ◽

Spatial Information ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Experimental Results ◽

Small Scale ◽

Feature Maps ◽

Segmentation Accuracy ◽

Instance Segmentation

With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks, such as object detection and tracking, are entering a new phase. Accordingly, more challenging comprehensive tasks, including instance segmentation, can develop rapidly. Most state-of-the-art network frameworks for instance segmentation are based on Mask R-CNN (mask region-convolutional neural network). However, the experimental results confirm that Mask R-CNN does not always successfully predict instance details. The scale-invariant fully convolutional network structure of Mask R-CNN ignores the difference in spatial information between receptive fields of different sizes. A large-scale receptive field focuses more on detailed information, whereas a small-scale receptive field focuses more on semantic information. So the network cannot consider the relationship between the pixels at the object edge, and these pixels will be misclassified. To overcome this problem, Mask-Refined R-CNN (MR R-CNN) is proposed, in which the stride of ROIAlign (region of interest align) is adjusted. In addition, the original fully convolutional layer is replaced with a new semantic segmentation layer that realizes feature fusion by constructing a feature pyramid network and summing the forward and backward transmissions of feature maps of the same resolution. The segmentation accuracy is substantially improved by combining the feature layers that focus on the global and detailed information. The experimental results on the COCO (Common Objects in Context) and Cityscapes datasets demonstrate that the segmentation accuracy of MR R-CNN is about 2% higher than that of Mask R-CNN using the same backbone. The average precision of large instances reaches 56.6%, which is higher than those of all state-of-the-art methods. In addition, the proposed method requires low time cost and is easily implemented. The experiments on the Cityscapes dataset also prove that the proposed method has great generalization ability.


SACN: A Novel Rotating Face Detector Based on Architecture Search

Electronics ◽

10.3390/electronics10050558 ◽

2021 ◽

Vol 10 (5) ◽

pp. 558

Author(s):

Anping Song ◽

Xiaokang Xu ◽

Xinyi Zhai

Keyword(s):

Face Detection ◽

Human Face ◽

Angle Error ◽

Rotation Invariant ◽

Convolutional Network ◽

Data Set ◽

Practical Applications ◽

Model Size ◽

Average Angle ◽

Face Detector

Rotation-Invariant Face Detection (RIPD) has been widely used in practical applications; however, the problem of adjusting the rotation-in-plane (RIP) angle of the human face remains. Recently, several methods based on neural networks have been proposed to solve the RIP angle problem. However, these methods have various limitations in detection speed, model size, and detection accuracy. To solve these problems, we propose a new network, called the Searching Architecture Calibration Network (SACN), which utilizes architecture search, a fully convolutional network (FCN), and bounding box center cluster (CC). SACN was tested on the challenging Multi-Oriented Face Detection Data Set and Benchmark (MOFDDB) and achieved a higher detection accuracy and almost the same speed as existing detectors. Moreover, the average angle error is reduced from the current 12.6° to 10.5°.


A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning

Cancers ◽

10.3390/cancers13092111 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2111

Author(s):

Bo-Wei Zhao ◽

Zhu-Hong You ◽

Lun Hu ◽

Zhen-Hao Guo ◽

Lei Wang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Computational Models ◽

Structural Information ◽

Characteristic Curve ◽

Representation Learning ◽

Graph Representation ◽

Convolutional Network ◽

Novel Method

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of features are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
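For readers unfamiliar with the AUROC figure quoted above, the metric can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The sketch below computes it directly from that definition; it is an illustration only, the function name is hypothetical, and it says nothing about how the authors computed their reported value.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive example, 0 for a negative one.
    scores: classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of examples, so it is suitable only for small illustrations; rank-based formulations give the same value more efficiently.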


Improved SSD-assisted algorithm for surface defect detection of electromagnetic luminescence

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x21995388 ◽

2021 ◽

pp. 1748006X2199538

Author(s):

Zhenying Xu ◽

Ziqian Wu ◽

Wei Fan

Keyword(s):

Defect Detection ◽

Feature Fusion ◽

Recognition Rate ◽

Detection Methods ◽

Small Scale ◽

Detection Accuracy ◽

Single Shot ◽

Surface Defect Detection ◽

Feature Pyramid ◽

Small Feature

Defect detection of electromagnetic luminescence (EL) cells is the core step in the production and preparation of solar cell modules to ensure conversion efficiency and long service life of batteries. However, because it lacks the capability to extract features of small defects, the traditional single shot multibox detector (SSD) algorithm does not achieve high accuracy in EL defect detection. Consequently, an improved SSD algorithm with modified feature fusion, within the framework of deep learning, is proposed to improve the recognition rate of EL multi-class defects. A dataset containing images with four different types of defects is established for the EL through rotation, denoising, and binarization. The proposed algorithm can greatly improve the detection accuracy of small-scale defects with the idea of feature pyramid networks. An experimental study on the detection of the EL defects shows the effectiveness of the proposed algorithm. Moreover, a comparison study shows that the proposed method outperforms other traditional detection methods, such as SIFT, Faster R-CNN, and YOLOv3, in detecting EL defects.


Driver Drowsiness Estimation Based on Factorized Bilinear Feature Fusion and a Long-Short-Term Recurrent Convolutional Network

Information ◽

10.3390/info12010003 ◽

2020 ◽

Vol 12 (1) ◽

pp. 3

Author(s):

Shuang Chen ◽

Zengcai Wang ◽

Wenxin Chen

Keyword(s):

Short Term Memory ◽

Feature Fusion ◽

Detection Methods ◽

Video Frame ◽

Estimation Model ◽

Short Term ◽

Convolutional Network ◽

Drowsiness Detection ◽

Driver Drowsiness ◽

Time Information

The effective detection of driver drowsiness is an important measure to prevent traffic accidents. Most existing drowsiness detection methods use only a single facial feature to identify fatigue status, ignoring the complex correlation between fatigue features and the time information of fatigue features, which reduces the recognition accuracy. To solve these problems, we propose a driver sleepiness estimation model based on factorized bilinear feature fusion and a long-short-term recurrent convolutional network to detect driver drowsiness efficiently and accurately. The proposed framework includes three models: fatigue feature extraction, fatigue feature fusion, and driver drowsiness detection. First, we used a convolutional neural network (CNN) to effectively extract the deep representation of eye and mouth-related fatigue features from the face area detected in each video frame. Then, based on the factorized bilinear feature fusion model, we performed a nonlinear fusion of the deep feature representations of the eyes and mouth. Finally, we input a series of fused frame-level features into a long-short-term memory (LSTM) unit to obtain the time information of the features and used the softmax classifier to detect sleepiness. The proposed framework was evaluated with the National Tsing Hua University drowsy driver detection (NTHU-DDD) video dataset. The experimental results showed that this method had better stability and robustness compared with other methods.


Building Multi-Feature Fusion Refined Network for Building Extraction from High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs13142794 ◽

2021 ◽

Vol 13 (14) ◽

pp. 2794

Author(s):

Shuhao Ran ◽

Xianjun Gao ◽

Yuanwei Yang ◽

Shaohua Li ◽

Guangbin Zhang ◽

...

Keyword(s):

Feature Fusion ◽

Small Scale ◽

Automatic Extraction ◽

Learning Approaches ◽

Building Extraction ◽

Visual Interpretation ◽

Learning Capacity ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Great Progress

Deep learning approaches have been widely used in automatic building extraction tasks and have made great progress in recent years. However, missed and false detections caused by spectrum confusion are still a great challenge. The existing fully convolutional networks (FCNs) cannot effectively distinguish whether the feature differences are from one building or from the building and its adjacent non-building objects. To overcome these limitations, a building multi-feature fusion refined network (BMFR-Net) is presented in this paper to extract buildings accurately and completely. BMFR-Net is based on an encoding and decoding structure, mainly consisting of two parts: the continuous atrous convolution pyramid (CACP) module and the multiscale output fusion constraint (MOFC) structure. The CACP module is positioned at the end of the contracting path, and it minimizes the loss of effective information in multiscale feature extraction and fusion by using parallel continuous small-scale atrous convolution. To improve the ability to aggregate semantic information from the context, the MOFC structure performs predictive output at each stage of the expanding path and integrates the results into the network. Furthermore, the multilevel joint weighted loss function effectively updates parameters far away from the output layer, enhancing the learning capacity of the network for low-level abstract features. The experimental results demonstrate that the proposed BMFR-Net outperforms the other five state-of-the-art approaches in both visual interpretation and quantitative evaluation.
