Facial Expression Recognition Based on Weighted-Cluster Loss and Deep Transfer Learning Using a Highly Imbalanced Dataset

Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2639
Author(s):  
Quan T. Ngo ◽  
Seokhoon Yoon

Facial expression recognition (FER) is a challenging problem in the fields of pattern recognition and computer vision. The recent success of convolutional neural networks (CNNs) in object detection and object segmentation tasks has shown promise in building an automatic deep CNN-based FER model. However, in real-world scenarios, performance degrades dramatically owing to the great diversity of factors unrelated to facial expressions, a lack of training data, and an intrinsic imbalance in the existing facial emotion datasets. To tackle these problems, this paper not only applies deep transfer learning techniques, but also proposes a novel loss function called weighted-cluster loss, which is used during the fine-tuning phase. Specifically, the weighted-cluster loss function simultaneously improves the intra-class compactness and the inter-class separability by learning a class center for each emotion class. It also takes the imbalance in a facial expression dataset into account by giving each emotion class a weight based on its proportion of the total number of images. In addition, a recent, successful deep CNN architecture, pre-trained on the task of face identification with the VGGFace2 database from the Visual Geometry Group at Oxford University, is employed and fine-tuned using the proposed loss function to recognize eight basic facial emotions from the AffectNet database of facial expression, valence, and arousal computing in the wild. Experiments on the AffectNet real-world facial dataset demonstrate that our method outperforms the baseline CNN models that use either weighted-softmax loss or center loss.
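A minimal plain-Python sketch of how such a class-weighted, center-style loss might look (the inverse-frequency weighting form and all names here are assumptions; in the paper the class centers are learned jointly during fine-tuning, not given as fixed inputs):

```python
def inverse_proportion_weights(class_counts):
    """Give each class a weight based on its share of the dataset:
    rarer classes receive larger weights (assumed inverse-frequency form)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: total / (k * n) for c, n in class_counts.items()}

def weighted_cluster_loss(features, labels, centers, weights):
    """Mean class-weighted squared distance from each feature vector to
    its class center: pulling features toward their centers improves
    intra-class compactness, and the weights counteract class imbalance."""
    loss = 0.0
    for x, y in zip(features, labels):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
        loss += 0.5 * weights[y] * d2
    return loss / len(features)
```

With counts {0: 3, 1: 1}, the minority class 1 gets weight 2.0 and the majority class 0 gets 2/3, so errors on rare emotions are penalized more heavily.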

Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1487 ◽  
Author(s):  
Asad Ullah ◽  
Jing Wang ◽  
M. Shahid Anwar ◽  
Usman Ahmad ◽  
Uzair Saeed ◽  
...  

Automatic facial expression recognition is an emerging field, and interest has increased with the transition from laboratory-controlled conditions to in-the-wild scenarios. Most research has been conducted on non-occluded faces in constrained environments, while automatic facial expression recognition under partial occlusion in real-world conditions remains less understood and less widely implemented. Our research also aims to tackle overfitting (caused by a shortage of adequate training data) and to alleviate expression-unrelated, intra-class, nonlinear facial variations such as head pose, eye gaze, intensity, and micro-expressions. In our research, we control the magnitude of each Action Unit (AU) and combine several AU combinations to leverage learning from both generative and discriminative representations for automatic FER. We have also addressed the diversification of expressions from lab-controlled to real-world scenarios through a cross-database study, and we propose a model that enhances the discriminative power of deep features by increasing inter-class scatter while preserving locality closeness. Furthermore, a facial expression consists of an expressive component as well as a neutral component, so we propose a generative model capable of generating a neutral expression from an input image using a conditional GAN (cGAN). The expressive component is filtered and passed to the intermediate layers, a process called De-expression Residue Learning; the residue retained in these intermediate layers is essential for learning from the expressive component. Finally, we validate the effectiveness of our method (DLP-DeRL) through qualitative and quantitative experiments on four databases. Our method is more accurate and robust, and outperforms existing methods (both hand-crafted features and deep learning) on images in the wild.
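The de-expression idea can be illustrated with a toy sketch (hypothetical names; here the cGAN generator's output is stood in for by a given neutral image, and the residue is taken over raw feature values rather than the generator's intermediate layers as in the paper):

```python
def deexpression_residue(expressive, neutral):
    """Elementwise residue between an expressive face representation and
    its generated neutral counterpart. In De-expression Residue Learning
    this residue, captured at intermediate layers of the generator, is
    what the expression classifier learns from."""
    return [e - n for e, n in zip(expressive, neutral)]
```

A neutral input minus itself yields a zero residue, while any expressive deviation survives the subtraction and carries the expression signal.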


2021 ◽  
Vol 14 (2) ◽  
pp. 127-135
Author(s):  
Fadhil Yusuf Rahadika ◽  
Novanto Yudistira ◽  
Yuita Arum Sari

During the COVID-19 pandemic, many offline activities moved to online video meetings to prevent the spread of the COVID-19 virus. In online video meetings, some micro-interactions are missing compared to direct social interaction. Using machines to assist facial expression recognition in online video meetings is expected to increase understanding of the interactions among users. Many studies have shown that CNN-based neural networks are quite effective and accurate in image classification. In this study, several open facial expression datasets, totalling 342,497 training images, were used to train CNN-based neural networks. The best results were obtained using a ResNet-50 architecture with the Mish activation function and an Accuracy Booster Plus block, trained with the Ranger optimizer and Gradient Centralization for 60,000 steps with a batch size of 256. The best model achieves an accuracy of 0.5972 on the AffectNet validation set, 0.8636 on the FERPlus validation set, 0.8488 on the FERPlus test set, and 0.8879 on the RAF-DB test set. The proposed method outperformed plain ResNet in all test scenarios without transfer learning, and there is potential for better performance with pre-training. The code is available at https://github.com/yusufrahadika-facial-expressions-essay.
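The Mish activation used above can be sketched as a scalar reference implementation in plain Python (training uses a batched GPU version; this is only the mathematical form, x·tanh(softplus(x))):

```python
import math

def softplus(x):
    """Numerically stable softplus: ln(1 + e^x), avoiding overflow
    for large positive x."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Smooth and non-monotonic,
    approaching the identity for large positive inputs and zero for
    large negative inputs."""
    return x * math.tanh(softplus(x))
```

Unlike ReLU, Mish lets small negative values pass through smoothly, which is often credited with improving gradient flow in deep ResNets.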


2020 ◽  
Vol 28 (1) ◽  
pp. 97-111
Author(s):  
Nadir Kamel Benamara ◽  
Mikel Val-Calvo ◽  
Jose Ramón Álvarez-Sánchez ◽  
Alejandro Díaz-Morcillo ◽  
Jose Manuel Ferrández-Vicente ◽  
...  

Facial emotion recognition (FER) has been extensively researched over the past two decades due to its direct impact on the computer vision and affective robotics fields. However, the datasets available to train these models often include mislabelled data, caused by labeller bias, which drives models to learn incorrect features. In this paper, a facial emotion recognition system is proposed that addresses automatic face detection and facial expression recognition separately; the latter is performed by an ensemble of only four deep convolutional neural networks, while a label-smoothing technique is applied to deal with the mislabelled training data. The proposed system takes only 13.48 ms on a dedicated graphics processing unit (GPU) and 141.97 ms on a CPU to recognize facial emotions, and reaches current state-of-the-art performance on the challenging FER2013, SFEW 2.0, and ExpW databases, with recognition accuracies of 72.72%, 51.97%, and 71.82%, respectively.
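A minimal sketch of the label-smoothing technique mentioned above, assuming the common uniform-smoothing form (the ε value and function name are illustrative, not taken from the paper):

```python
def smooth_one_hot(true_index, num_classes, eps=0.1):
    """Soften a hard one-hot target: eps worth of probability mass is
    spread uniformly over all K classes, so the true class keeps
    1 - eps + eps/K and every other class receives eps/K. This limits
    the penalty incurred by confidently mislabelled training examples."""
    off = eps / num_classes
    return [1.0 - eps + off if k == true_index else off
            for k in range(num_classes)]
```

Because the target never reaches exactly 1.0, the network is discouraged from producing extremely confident logits for labels that may in fact be wrong.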


Optik ◽  
2018 ◽  
Vol 158 ◽  
pp. 1016-1025 ◽  
Author(s):  
Asim Munir ◽  
Ayyaz Hussain ◽  
Sajid Ali Khan ◽  
Muhammad Nadeem ◽  
Sadia Arshid

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 108906-108915 ◽  
Author(s):  
Keyu Yan ◽  
Wenming Zheng ◽  
Tong Zhang ◽  
Yuan Zong ◽  
Chuangao Tang ◽  
...  
