Patch Attention Layer of Embedding Handcrafted Features in CNN for Facial Expression Recognition

Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 833
Author(s):  
Xingcan Liang ◽  
Linsen Xu ◽  
Jinfu Liu ◽  
Zhipeng Liu ◽  
Gaoxin Cheng ◽  
...  

Facial expression recognition has attracted much attention due to its broad range of applications in human–computer interaction systems. Although facial representation is crucial to final recognition accuracy, traditional handcrafted representations reflect only shallow characteristics, and it is uncertain whether convolutional layers can extract better ones. In addition, the policy of sharing weights across a whole image is improper for structured face images. To overcome these limitations, a novel method based on patches of interest, the Patch Attention Layer (PAL) of embedding handcrafted features, is proposed to learn the local shallow facial features of each patch of a face image. Firstly, a handcrafted feature, the Gabor surface feature (GSF), is extracted by convolving the input face image with a set of predefined Gabor filters. Secondly, the generated feature is segmented into non-overlapping patches, which capture local shallow features through a strategy of applying different filters to different local patches. Then, the weighted shallow features are fed into the remaining convolutional layers to capture high-level features. The method can be applied directly to a static image without facial landmark information, and the preprocessing step is very simple. Experiments on four databases show very competitive performance (Extended Cohn–Kanade database (CK+): 98.93%; Oulu-CASIA: 97.57%; Japanese Female Facial Expressions database (JAFFE): 93.38%; RAF-DB: 86.8%) compared to other state-of-the-art methods.
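
The GSF extraction and patch segmentation described above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the filter parameters, bank size, patch size, and 48×48 input are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5):
    """Real part of a Gabor filter; parameter values here are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def conv2d_same(img, kernel):
    """'Same'-size 2-D filtering with edge padding (pure NumPy)."""
    pad = kernel.shape[0] // 2
    windows = sliding_window_view(np.pad(img, pad, mode="edge"), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def gabor_surface_feature(img, n_orientations=4):
    """Convolve the face image with a small predefined Gabor filter bank."""
    bank = [gabor_kernel(theta=k * np.pi / n_orientations) for k in range(n_orientations)]
    return np.stack([conv2d_same(img, k) for k in bank])   # (n_orient, H, W)

def to_patches(feat, patch=8):
    """Segment each response map into non-overlapping patch x patch blocks."""
    c, h, w = feat.shape
    feat = feat[:, :h - h % patch, :w - w % patch]
    return feat.reshape(c, h // patch, patch, w // patch, patch).swapaxes(2, 3)

img = np.random.rand(48, 48)        # stand-in for a cropped face image
gsf = gabor_surface_feature(img)
patches = to_patches(gsf)
print(gsf.shape, patches.shape)     # (4, 48, 48) (4, 6, 6, 8, 8)
```

In the paper the per-patch features are then weighted by the attention layer before entering the remaining convolutional layers; here the sketch stops at the patch grid.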

2019 ◽  
Vol 1 (1) ◽  
pp. 25-31
Author(s):  
Arif Budi Setiawan ◽  
Kaspul Anwar ◽  
Laelatul Azizah ◽  
Adhi Prahara

During an interview, a psychologist should pay attention to every verbal and nonverbal gesture and response made by the client. Psychologists have a limited capacity to recognize every gesture and response that indicates a lie, especially when interpreting nonverbal behaviors, which usually occur within a short time. In this research, real-time facial expression recognition is proposed to track nonverbal behaviors and help the psychologist stay informed about changes in facial expression that indicate a lie. The method tracks eye gaze, wrinkles on the forehead, and false smiles using a combination of face detection and facial landmark recognition to find the facial features, and image-processing methods to track the nonverbal behaviors in those features. Every nonverbal behavior is recorded and logged against the video timeline to help the psychologist analyze the client's behavior. The tracking of nonverbal facial behaviors is accurate, and the system is expected to be a useful assistant for psychologists.
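
The recording-and-logging step against the video timeline could look like the following minimal sketch. The behavior labels, scores, and field names are illustrative assumptions, not the paper's data model; the detectors themselves are out of scope here.

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorLog:
    """Chronological log of detected nonverbal behaviors, keyed to the
    video timeline. Labels and scores below are hypothetical examples."""
    events: list = field(default_factory=list)

    def record(self, t_sec: float, behavior: str, score: float) -> None:
        """Append one detection with its video timestamp in seconds."""
        self.events.append({"time": round(t_sec, 2), "behavior": behavior, "score": score})

    def report(self):
        """Time-ordered summary lines for the psychologist to review."""
        return [f"{e['time']:6.2f}s  {e['behavior']} ({e['score']:.2f})"
                for e in sorted(self.events, key=lambda e: e["time"])]

log = BehaviorLog()
log.record(3.1, "gaze-aversion", 0.82)      # hypothetical detector outputs
log.record(1.4, "forehead-wrinkle", 0.67)
print("\n".join(log.report()))
```

Sorting at report time (rather than insert time) keeps recording cheap during real-time tracking.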


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2003 ◽  
Author(s):  
Xiaoliang Zhu ◽  
Shihao Ye ◽  
Liang Zhao ◽  
Zhicheng Dai

Improving performance on the AFEW (Acted Facial Expressions in the Wild) dataset, a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), is a popular benchmark for emotion recognition under various unconstrained conditions, including uneven illumination, head deflection, and varied facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, in a video sequence, faces in each frame are detected, and the corresponding face ROI (region of interest) is extracted to obtain the face images. Then, the face images in each frame are aligned based on the positions of the facial feature points. Second, the aligned face images are input to a residual neural network to extract the spatial features of the facial expressions. The spatial features are input to the hybrid attention module to obtain fused facial expression features. Finally, the fused features are input to a gated recurrent unit to extract the temporal features of the facial expressions, and the temporal features are input to a fully connected layer to classify and recognize them. Experiments on the CK+ (Extended Cohn–Kanade), Oulu-CASIA (Institute of Automation, Chinese Academy of Sciences), and AFEW datasets obtained recognition accuracy rates of 98.46%, 87.31%, and 53.44%, respectively. This demonstrates that the proposed method not only achieves performance comparable to state-of-the-art methods but also improves on them by more than 2% on the AFEW dataset, showing markedly better facial expression recognition in natural environments.


2021 ◽  
Vol 8 (5) ◽  
pp. 949
Author(s):  
Fitra A. Bachtiar ◽  
Muhammad Wafi

Human–machine interaction, especially facial behavior, is increasingly considered as a means of user personalization. A combination of feature extraction and a classification method can enable a machine to recognize facial expressions; however, it is not yet known which classification base method is best suited. This study compares three classification methods for facial expression recognition. The JAFFE dataset is used, with a total of 213 facial images showing seven facial expressions: anger, disgust, fear, happy, neutral, sadness, and surprised. Facial landmarks are used as the facial features. The classification models compared are ELM, SVM, and k-NN. The best hyperparameters for each model are searched using 80% of the data with 5-fold cross-validation, and testing is done on the remaining 20%, evaluated with accuracy, F1 score, and computation time. The best hyperparameters are 40 hidden neurons for ELM, a parameter value of 10^5 with 200 iterations for SVM, and k = 3 neighbors for k-NN. Using these parameters, ELM outperforms the other two classification models, achieving an accuracy of 0.76 and an F1 score of 0.76 with a computation time of 6.97×10^-3 seconds.
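
The 5-fold cross-validation hyperparameter search described above can be sketched for the k-NN model in pure NumPy. The data here is a synthetic two-class toy set, not JAFFE, and the candidate k values are illustrative.

```python
import numpy as np
rng = np.random.default_rng(42)

def knn_predict(X_tr, y_tr, X_te, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_tr[nn]])

def cv_accuracy(X, y, k, folds=5):
    """Mean 5-fold cross-validation accuracy for one candidate k."""
    idx = rng.permutation(len(X))
    accs = []
    for fold in np.array_split(idx, folds):
        mask = np.ones(len(X), bool); mask[fold] = False
        pred = knn_predict(X[mask], y[mask], X[fold], k)
        accs.append((pred == y[fold]).mean())
    return float(np.mean(accs))

# toy stand-in for landmark features: two well-separated classes
X = np.vstack([rng.normal(0, 1, (60, 10)), rng.normal(2, 1, (60, 10))])
y = np.repeat([0, 1], 60)
best_k = max([1, 3, 5, 7], key=lambda k: cv_accuracy(X, y, k))
print("best k:", best_k)
```

The same grid-search loop applies to the ELM hidden-neuron count or the SVM parameter; only the inner model changes.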


Author(s):  
Siu-Yeung Cho ◽  
Teik-Toe Teoh ◽  
Yok-Yen Nguwi

Facial expression recognition is a challenging task. A facial expression is formed by contracting or relaxing different facial muscles on the human face, resulting in temporarily deformed facial features such as a wide-open mouth or raised eyebrows. Such a system must address several issues. For instance, lighting conditions are very difficult to constrain and regulate. Real-time processing is also challenging, since many facial features must be extracted and processed, and conventional classifiers are not always effective at handling those features and producing good classification performance. This chapter discusses how advanced feature selection techniques, together with good classifiers, can play a vital role in real-time facial expression recognition. Several feature selection methods and classifiers are discussed, and their evaluations for real-time facial expression recognition are presented. The chapter aims to open a discussion about building a real-time system that reads and responds to people's emotions from their facial expressions.
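
One common family of feature selection techniques in this setting is the filter approach, which ranks features before classification. The sketch below ranks features by absolute correlation with a binary class label; this is one illustrative criterion among many, not necessarily the chapter's exact method, and the data is synthetic.

```python
import numpy as np

def rank_features(X, y, top_k):
    """Filter-style feature selection: rank features by the absolute
    correlation between each feature column and the class label,
    then keep the top_k indices."""
    Xc = X - X.mean(0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(0) / (np.sqrt((Xc**2).sum(0) * (yc**2).sum()) + 1e-12)
    return np.argsort(-np.abs(corr))[:top_k]

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 20))     # 20 candidate features, mostly noise
X[:, 3] += 2.0 * y                 # plant an informative feature
X[:, 7] -= 1.5 * y                 # and a second, weaker one
print(rank_features(X, y, 2))
```

Because ranking is done once before classification, filters like this are cheap enough for the real-time constraint the chapter emphasizes; wrapper methods trade that speed for accuracy.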


Author(s):  
Dinesh Kumar P ◽  
Dr. B. Rosiline Jeetha

Facial expression, as one of the most significant means for human beings to show their emotions and intentions during communication, plays a significant role in human interfaces. In recent years, facial expression recognition has been under especially intensive investigation, due to its vital applications in various fields including virtual reality, intelligent tutoring systems, health care, and data-driven animation. The main target of facial expression recognition is to identify the human emotional state (e.g., anger, contempt, disgust, fear, happiness, sadness, and surprise) from the given facial images. This paper deals with facial expression detection and recognition through the Viola-Jones algorithm and an HCNN with LSTM. The approach improves recognition performance while greatly reducing computational cost. For feature matching, the authors propose a hybrid Scale-Invariant Feature Transform (SIFT) with double δ-LBP (Dδ-LBP), which uses fixed facial landmark localization and SIFT's orientation assignment to obtain features that are invariant to illumination and pose. For face detection, the Viola-Jones algorithm is used, which also recognizes occluded faces. Feature selection is then performed with the whale optimization algorithm after compression, minimizing the feature vector fed into the hybrid Convolutional Neural Network (HCNN) and Long Short-Term Memory (LSTM) model for efficient facial expression recognition. The experimental results confirm that the HCNN-LSTM model beats traditional deep-learning and machine-learning techniques with respect to precision, recall, F-measure, and accuracy on the CK+ database.
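
The Dδ-LBP descriptor builds on the standard local binary pattern (LBP) operator, which encodes each pixel by thresholding its 3×3 neighborhood against the center. The sketch below shows only that basic operator; the paper's double-δ variant adds a tolerance band to the center comparison, which is omitted here.

```python
import numpy as np

def lbp_codes(img):
    """Plain 3x3 LBP over an image's interior pixels: each of the 8
    neighbors contributes one bit, set when neighbor >= center."""
    c = img[1:-1, 1:-1]
    neigh = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],   # top row
             img[1:-1, 2:], img[2:, 2:],    img[2:, 1:-1],  # right, bottom-right, bottom
             img[2:, :-2],  img[1:-1, :-2]]                 # bottom-left, left
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neigh):
        code |= (n >= c).astype(np.uint8) << bit
    return code

img = np.arange(25, dtype=np.uint8).reshape(5, 5)   # tiny test image
print(lbp_codes(img).shape)  # (3, 3)
```

A histogram of these codes over face patches is what typically feeds the downstream classifier.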


2013 ◽  
Vol 347-350 ◽  
pp. 3780-3785
Author(s):  
Jing Jie Yan ◽  
Ming Han Xin

Although spatio-temporal (ST) features have recently been developed and shown to be effective for facial expression recognition and behavior recognition in videos, the standard approach directly flattens each cuboid into a vector for recognition, which makes the resulting vector potentially sensitive to small cuboid perturbations or noise. To overcome this drawback, we propose a novel method called fused spatio-temporal features (FST), which uses separable linear filters to detect interest points and fuses two cuboid representations for the descriptor: a local histogrammed gradient descriptor and the flattened cuboid vector. The proposed FST method is robust to small cuboid perturbations and noise while preserving both spatial and temporal positional information. Experimental results on two video-based facial expression databases demonstrate the effectiveness of the proposed method.
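
The two cuboid representations and their fusion can be sketched as follows. The cuboid shape, histogram bin count, and the simple concatenation used for fusion are illustrative assumptions; the interest-point detection with separable linear filters is not shown.

```python
import numpy as np

def flatten_descriptor(cuboid):
    """Naive descriptor: flatten the spatio-temporal cuboid into one vector
    (the representation identified above as noise-sensitive)."""
    return cuboid.ravel()

def gradient_histogram_descriptor(cuboid, bins=8):
    """Local histogrammed-gradient descriptor: magnitude-weighted histogram
    of spatial gradient orientations, pooled over the whole cuboid."""
    gy, gx = np.gradient(cuboid.astype(float), axis=(1, 2))
    ang = np.arctan2(gy, gx).ravel()
    mag = np.hypot(gy, gx).ravel()
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def fst_descriptor(cuboid):
    """FST-style fusion (illustrative): concatenate both representations so
    the flattened part keeps positional detail while the histogram part
    adds robustness to small perturbations."""
    return np.concatenate([flatten_descriptor(cuboid),
                           gradient_histogram_descriptor(cuboid)])

cuboid = np.random.rand(5, 9, 9)     # (time, y, x) patch around an interest point
print(fst_descriptor(cuboid).shape)  # (413,)
```

The trade-off is dimensionality: fusion carries both vectors, so a downstream classifier sees 5×9×9 + 8 = 413 components per interest point in this toy configuration.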

