scholarly journals Deep Learning for Facial Beauty Prediction

Information ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 391
Author(s):  
Kerang Cao ◽  
Kwang-nam Choi ◽  
Hoekyung Jung ◽  
Lini Duan

Facial beauty prediction (FBP) is a burgeoning issue for attractiveness evaluation, which aims to make assessment consistent with human opinion. Since FBP is a regression problem, to handle this issue, there are data-driven methods for finding the relations between facial features and beauty assessment. Recently, deep learning methods have shown its amazing capacity for feature representation and analysis. Convolutional neural networks (CNNs) have shown tremendous performance on facial recognition and comprehension, which are proved as an effective method for facial feature exploration. Lately, there are well-designed networks with efficient structures investigated for better representation performance. However, these designs concentrate on the effective block but do not build an efficient information transmission pathway, which led to a sub-optimal capacity for feature representation. Furthermore, these works cannot find the inherent correlations of feature maps, which also limits the performance. In this paper, an elaborate network design for FBP issue is proposed for better performance. A residual-in-residual (RIR) structure is introduced to the network for passing the gradient flow deeper, and building a better pathway for information transmission. By applying the RIR structure, a deeper network can be established for better feature representation. Besides the RIR network design, an attention mechanism is introduced to exploit the inner correlations among features. We investigate a joint spatial-wise and channel-wise attention (SCA) block to distribute the importance among features, which finds a better representation for facial information. Experimental results show our proposed network can predict facial beauty closer to a human’s assessment than state-of-the-arts.

2019 ◽  
Vol 11 (11) ◽  
pp. 1382 ◽  
Author(s):  
Daifeng Peng ◽  
Yongjun Zhang ◽  
Haiyan Guan

Change detection (CD) is essential to the accurate understanding of land surface changes using available Earth observation data. Due to the great advantages in deep feature representation and nonlinear problem modeling, deep learning is becoming increasingly popular to solve CD tasks in remote-sensing community. However, most existing deep learning-based CD methods are implemented by either generating difference images using deep features or learning change relations between pixel patches, which leads to error accumulation problems since many intermediate processing steps are needed to obtain final change maps. To address the above-mentioned issues, a novel end-to-end CD method is proposed based on an effective encoder-decoder architecture for semantic segmentation named UNet++, where change maps could be learned from scratch using available annotated datasets. Firstly, co-registered image pairs are concatenated as an input for the improved UNet++ network, where both global and fine-grained information can be utilized to generate feature maps with high spatial accuracy. Then, the fusion strategy of multiple side outputs is adopted to combine change maps from different semantic levels, thereby generating a final change map with high accuracy. The effectiveness and reliability of our proposed CD method are verified on very-high-resolution (VHR) satellite image datasets. Extensive experimental results have shown that our proposed approach outperforms the other state-of-the-art CD methods.


2020 ◽  
Vol 16 (6) ◽  
pp. 3721-3730 ◽  
Author(s):  
Xiaofeng Yuan ◽  
Jiao Zhou ◽  
Biao Huang ◽  
Yalin Wang ◽  
Chunhua Yang ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5312
Author(s):  
Yanni Zhang ◽  
Yiming Liu ◽  
Qiang Li ◽  
Jianzhong Wang ◽  
Miao Qi ◽  
...  

Recently, deep learning-based image deblurring and deraining have been well developed. However, most of these methods fail to distill the useful features. What is more, exploiting the detailed image features in a deep learning framework always requires a mass of parameters, which inevitably makes the network suffer from a high computational burden. We propose a lightweight fusion distillation network (LFDN) for image deblurring and deraining to solve the above problems. The proposed LFDN is designed as an encoder–decoder architecture. In the encoding stage, the image feature is reduced to various small-scale spaces for multi-scale information extraction and fusion without much information loss. Then, a feature distillation normalization block is designed at the beginning of the decoding stage, which enables the network to distill and screen valuable channel information of feature maps continuously. Besides, an information fusion strategy between distillation modules and feature channels is also carried out by the attention mechanism. By fusing different information in the proposed approach, our network can achieve state-of-the-art image deblurring and deraining results with a smaller number of parameters and outperform the existing methods in model complexity.


Author(s):  
Hung Phuoc Truong ◽  
Thanh Phuong Nguyen ◽  
Yong-Guk Kim

AbstractWe present a novel framework for efficient and robust facial feature representation based upon Local Binary Pattern (LBP), called Weighted Statistical Binary Pattern, wherein the descriptors utilize the straight-line topology along with different directions. The input image is initially divided into mean and variance moments. A new variance moment, which contains distinctive facial features, is prepared by extracting root k-th. Then, when Sign and Magnitude components along four different directions using the mean moment are constructed, a weighting approach according to the new variance is applied to each component. Finally, the weighted histograms of Sign and Magnitude components are concatenated to build a novel histogram of Complementary LBP along with different directions. A comprehensive evaluation using six public face datasets suggests that the present framework outperforms the state-of-the-art methods and achieves 98.51% for ORL, 98.72% for YALE, 98.83% for Caltech, 99.52% for AR, 94.78% for FERET, and 99.07% for KDEF in terms of accuracy, respectively. The influence of color spaces and the issue of degraded images are also analyzed with our descriptors. Such a result with theoretical underpinning confirms that our descriptors are robust against noise, illumination variation, diverse facial expressions, and head poses.


Author(s):  
Qiang Yu ◽  
Feiqiang Liu ◽  
Long Xiao ◽  
Zitao Liu ◽  
Xiaomin Yang

Deep-learning (DL)-based methods are of growing importance in the field of single image super-resolution (SISR). The practical application of these DL-based models is a remaining problem due to the requirement of heavy computation and huge storage resources. The powerful feature maps of hidden layers in convolutional neural networks (CNN) help the model learn useful information. However, there exists redundancy among feature maps, which can be further exploited. To address these issues, this paper proposes a lightweight efficient feature generating network (EFGN) for SISR by constructing the efficient feature generating block (EFGB). Specifically, the EFGB can conduct plain operations on the original features to produce more feature maps with parameters slightly increasing. With the help of these extra feature maps, the network can extract more useful information from low resolution (LR) images to reconstruct the desired high resolution (HR) images. Experiments conducted on the benchmark datasets demonstrate that the proposed EFGN can outperform other deep-learning based methods in most cases and possess relatively lower model complexity. Additionally, the running time measurement indicates the feasibility of real-time monitoring.


Symmetry ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 266 ◽  
Author(s):  
Yifeng Wang ◽  
Zhijiang Zhang ◽  
Ning Zhang ◽  
Dan Zeng

The one-shot multiple object tracking (MOT) framework has drawn more and more attention in the MOT research community due to its advantage in inference speed. However, the tracking accuracy of current one-shot approaches could lead to an inferior performance compared with their two-stage counterparts. The reasons are two-fold: one is that motion information is often neglected due to the single-image input. The other is that detection and re-identification (ReID) are two different tasks with different focuses. Joining detection and re-identification at the training stage could lead to a suboptimal performance. To alleviate the above limitations, we propose a one-shot network named Motion and Correlation-Multiple Object Tracking (MAC-MOT). MAC-MOT introduces a motion enhance attention module (MEA) and a dual correlation attention module (DCA). MEA performs differences on adjacent feature maps which enhances the motion-related features while suppressing irrelevant information. The DCA module focuses on decoupling the detection task and re-identification task to strike a balance and reduce the competition between these two tasks. Moreover, symmetry is a core design idea in our proposed framework which is reflected in Siamese-based deep learning backbone networks, the input of dual stream images, as well as a dual correlation attention module. Our proposed approach is evaluated on the popular multiple object tracking benchmarks MOT16 and MOT17. We demonstrate that the proposed MAC-MOT can achieve a better performance than the baseline state of the arts (SOTAs).


2021 ◽  
Vol 13 (11) ◽  
pp. 2171
Author(s):  
Yuhao Qing ◽  
Wenyi Liu ◽  
Liuyan Feng ◽  
Wanjia Gao

Despite significant progress in object detection tasks, remote sensing image target detection is still challenging owing to complex backgrounds, large differences in target sizes, and uneven distribution of rotating objects. In this study, we consider model accuracy, inference speed, and detection of objects at any angle. We also propose a RepVGG-YOLO network using an improved RepVGG model as the backbone feature extraction network, which performs the initial feature extraction from the input image and considers network training accuracy and inference speed. We use an improved feature pyramid network (FPN) and path aggregation network (PANet) to reprocess feature output by the backbone network. The FPN and PANet module integrates feature maps of different layers, combines context information on multiple scales, accumulates multiple features, and strengthens feature information extraction. Finally, to maximize the detection accuracy of objects of all sizes, we use four target detection scales at the network output to enhance feature extraction from small remote sensing target pixels. To solve the angle problem of any object, we improved the loss function for classification using circular smooth label technology, turning the angle regression problem into a classification problem, and increasing the detection accuracy of objects at any angle. We conducted experiments on two public datasets, DOTA and HRSC2016. Our results show the proposed method performs better than previous methods.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Maiki Higa ◽  
Shinya Tanahara ◽  
Yoshitaka Adachi ◽  
Natsumi Ishiki ◽  
Shin Nakama ◽  
...  

AbstractIn this report, we propose a deep learning technique for high-accuracy estimation of the intensity class of a typhoon from a single satellite image, by incorporating meteorological domain knowledge. By using the Visual Geometric Group’s model, VGG-16, with images preprocessed with fisheye distortion, which enhances a typhoon’s eye, eyewall, and cloud distribution, we achieved much higher classification accuracy than that of a previous study, even with sequential-split validation. Through comparison of t-distributed stochastic neighbor embedding (t-SNE) plots for the feature maps of VGG with the original satellite images, we also verified that the fisheye preprocessing facilitated cluster formation, suggesting that our model could successfully extract image features related to the typhoon intensity class. Moreover, gradient-weighted class activation mapping (Grad-CAM) was applied to highlight the eye and the cloud distributions surrounding the eye, which are important regions for intensity classification; the results suggest that our model qualitatively gained a viewpoint similar to that of domain experts. A series of analyses revealed that the data-driven approach using only deep learning has limitations, and the integration of domain knowledge could bring new breakthroughs.


Sign in / Sign up

Export Citation Format

Share Document