A Multimodal Approach to Improve Performance Evaluation of Call Center Agent

Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2720
Author(s):  
Abdelrahman Ahmed ◽  
Khaled Shaalan ◽  
Sergio Toral ◽  
Yasser Hifny

The paper proposes three modeling techniques to improve the performance evaluation of call center agents. The first technique is speech processing of the agent’s recorded calls, supported by an attention layer. Sixty-five speech features are extracted with the Open-Smile toolkit to determine the context of the call. The second technique uses the Max Weights Similarity (MWS) approach instead of the Softmax function in the attention layer to improve classification accuracy. The MWS function replaces the Softmax function to fine-tune the output of the attention layer when processing text. It is formed by determining the similarity in the distance of the attention layer’s input weights to the weights of the max vectors. The third technique combines the agent’s recorded call speech with the corresponding transcribed text for binary classification. The speech and text models are based on combinations of Convolutional Neural Networks (CNNs) and Bi-directional Long Short-Term Memory networks (BiLSTMs). The classification results for each model (text versus speech) are presented and compared with the results of the multimodal approach. The multimodal classification provided an improvement of 0.22% over the acoustic model and 1.7% over the text model.
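As a point of reference for the attention layer mentioned above, the following is a minimal sketch of standard Softmax attention pooling over a sequence of feature frames. It illustrates the baseline that the abstract says MWS replaces; it is not the authors' implementation, and MWS itself is not reproduced because the abstract does not specify it in enough detail. The names `attention_pool`, `frames`, and `w` are hypothetical, and the random inputs are placeholders.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_pool(frames, w):
    """Pool a (T, D) sequence into a single D-dimensional vector.

    frames: array of shape (T, D), one feature vector per time step.
    w:      array of shape (D,), a (normally learned) scoring vector.
    Returns the pooled context vector and the attention weights.
    """
    scores = frames @ w            # one scalar score per time step
    weights = softmax(scores)      # non-negative, sums to 1
    context = weights @ frames     # weighted average of the frames
    return context, weights


# Placeholder data: 5 time steps, 4 features each (not real call features).
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 4))
w = rng.normal(size=4)
context, weights = attention_pool(frames, w)
```

In a trained model the scoring vector would be learned jointly with the CNN and BiLSTM layers; here it is random so that the block stays self-contained.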

Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5489
Author(s):  
Abdelrahman Ahmed ◽  
Sergio Toral ◽  
Khaled Shaalan ◽  
Yasser Hifny

Measuring the productivity of an agent in a call center domain is a challenging task. Subjective measures are commonly used for evaluation in current systems. In this paper, we propose an objective framework for modeling agent productivity for real estate call centers based on speech signal processing. The problem is formulated as a binary classification task using deep learning methods. We explore several designs for the classifier based on convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and an attention layer. The corpus consists of seven hours of calls collected and annotated from three different call centers. The results show that the speech-based approach can lead to significant improvements (1.57% absolute) over a robust text baseline system.


Author(s):  
P.L. Nikolaev

This article deals with a method for the binary classification of images containing small text. The classification is based on the fact that the text can have two orientations: it can be positioned horizontally and read from left to right, or it can be turned 180 degrees, so that the image must be rotated to read it. This type of text can be found on the covers of a variety of books, so when recognizing covers it is necessary to determine the orientation of the text before recognizing the text itself. The article proposes a deep neural network for determining the text orientation in the context of book cover recognition. The results of training and testing a convolutional neural network on synthetic data are presented, along with examples of the network operating on real data.


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3430
Author(s):  
Jean Mário Moreira de Lima ◽  
Fábio Meneghetti Ugulino de Araújo

Soft sensors based on deep learning have been growing in industrial process applications, inferring hard-to-measure but crucial quality-related variables. However, applications may present strong non-linearity, dynamicity, and a lack of labeled data. To deal with these problems, the extraction of relevant features is becoming a field of interest in soft-sensing. A novel deep representative learning soft-sensor modeling approach is proposed based on stacked autoencoder (SAE), mutual information (MI), and long short-term memory (LSTM). The SAE is trained layer by layer, with MI evaluated between the extracted features and the targeted output to assess the relevance of the representation learned in each layer. This approach highlights relevant information and eliminates irrelevant information from the current layer. Thus, deep output-related representative features are retrieved. In the supervised fine-tuning stage, an LSTM is coupled to the tail of the SAE to address the system's inherent dynamic behavior. Also, a k-fold cross-validation ensemble strategy is applied to enhance the soft-sensor reliability. Two real-world industrial non-linear processes are employed to evaluate the performance of the proposed method. The obtained results show improved prediction performance in comparison to other traditional and state-of-the-art methods. Compared to the other methods, the proposed model improves RMSE by more than 38.6% and 39.4% for the two analyzed industrial cases.
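The k-fold ensemble idea and the RMSE comparison can be sketched as follows. This is only an illustration under stated assumptions: a plain least-squares linear model stands in for the SAE-LSTM network, the data are synthetic, and the function names (`rmse`, `relative_improvement`, `kfold_ensemble_predict`) are hypothetical rather than taken from the paper.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally sized sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction of RMSE relative to a baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100.0


def kfold_ensemble_predict(X, y, X_new, k=5, seed=0):
    """Train one model per fold (each leaving that fold out) and
    average their predictions on X_new.

    A linear least-squares model is an assumed stand-in for the
    soft-sensor network described in the abstract.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    all_idx = np.arange(len(X))
    preds = []
    for held_out in folds:
        train = np.setdiff1d(all_idx, held_out)
        A = np.c_[X[train], np.ones(len(train))]        # add bias column
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        preds.append(np.c_[X_new, np.ones(len(X_new))] @ coef)
    return np.mean(preds, axis=0)                       # ensemble average


# Synthetic, noise-free data: y = 2x + 1 (placeholder, not process data).
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
X_new = np.array([[0.5], [2.0]])
y_hat = kfold_ensemble_predict(X, y, X_new)
```

Averaging the per-fold models is one common way to build such an ensemble; the abstract does not say how the fold models are combined, so the averaging step is an assumption.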


2021 ◽  
Vol 43 ◽  
pp. 102574
Author(s):  
Zixu Yang ◽  
Hansong Xiao ◽  
Wenxing Shi ◽  
Mingsheng Zhang ◽  
Baolong Wang

2021 ◽  
Vol 11 (11) ◽  
pp. 4753
Author(s):  
Gen Ye ◽  
Chen Du ◽  
Tong Lin ◽  
Yan Yan ◽  
Jack Jiang

(1) Background: Deep learning has become ubiquitous due to its impressive performance in domains as varied as computer vision, natural language and speech processing, and game-playing. In this work, we investigated the performance of recent deep learning approaches on the laryngopharyngeal reflux (LPR) diagnosis task. (2) Methods: Our dataset is composed of 114 subjects with 37 pH-positive cases and 77 control cases. In contrast to prior work based on either reflux finding score (RFS) or pH monitoring, we directly take laryngoscope images as inputs to neural networks, as laryngoscopy is the most common and simple diagnostic method. The diagnosis task is formulated as a binary classification problem. We first tested a powerful backbone network that incorporates residual modules, an attention mechanism and data augmentation. Furthermore, recent methods in transfer learning and few-shot learning were investigated. (3) Results: On our dataset, the best test classification accuracy is 73.4%, while the best AUC value is 76.2%. (4) Conclusions: This study demonstrates that deep learning techniques can be applied to classify LPR images automatically. Although the number of pH-positive images used for training is limited, deep networks are still capable of learning discriminative features with the help of these techniques.
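For readers unfamiliar with the two reported metrics, the sketch below computes classification accuracy and AUC for a binary task. AUC is computed from its pairwise definition (the probability that a positive case scores above a negative case, with ties counted as one half). The labels and scores are made-up placeholders, not values from the study, and the function names are hypothetical.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for l, p in zip(labels, preds) if l == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of
    positive and negative scores (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Placeholder labels (1 = positive, 0 = control) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
acc_value = accuracy(labels, scores)
auc_value = auc(labels, scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two numbers can differ.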


2018 ◽  
Author(s):  
William Schueller ◽  
Vittorio Loreto ◽  
Pierre-Yves Oudeyer

In the process of collectively inventing new words for new concepts in a population, conflicts can quickly become numerous, in the form of synonymy and homonymy. Remembering all of them could cost too much memory, and remembering too few may slow down the overall process. Is there an efficient behavior that could help balance the two? The Naming Game is a multi-agent computational model for the emergence of language, focusing on the negotiation of new lexical conventions, where a common lexicon self-organizes but going through a phase of high complexity. Previous work has been done on the control of complexity growth in this particular model, by allowing agents to actively choose what they talk about. However, those strategies were relying on ad hoc heuristics highly dependent on fine-tuning of parameters. We define here a new principled measure and a new strategy, based on the beliefs of each agent on the global state of the population. The measure does not rely on heavy computation, and is cognitively plausible. The new strategy yields an efficient control of complexity growth, along with a faster agreement process. Also, we show that short-term memory is enough to build relevant beliefs about the global lexicon.
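The basic dynamics of the Naming Game can be sketched with the well-known minimal, single-object variant below. This is a simplified baseline given for orientation only: it has one object, no homonymy, and no active topic choice, so it does not implement the measure or strategy proposed in the paper. The function name `naming_game` and its parameters are hypothetical.

```python
import random


def naming_game(n_agents=10, max_steps=100_000, seed=0):
    """Minimal single-object Naming Game.

    Each agent keeps a set of candidate words for the one object.
    Returns the number of interactions played and the final inventories.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0  # words are plain integers, invented on demand

    for step in range(1, max_steps + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)

        # A speaker with no word invents a new one.
        if not inventories[speaker]:
            inventories[speaker].add(next_word)
            next_word += 1

        word = rng.choice(sorted(inventories[speaker]))

        if word in inventories[hearer]:
            # Success: both agents drop all competing synonyms.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failure: the hearer memorizes the new word.
            inventories[hearer].add(word)

        # Stop once every agent holds the same single word.
        if (all(len(inv) == 1 for inv in inventories)
                and len(set.union(*inventories)) == 1):
            return step, inventories

    return max_steps, inventories


steps, final_inventories = naming_game()
```

The total number of words held across agents typically rises before collapsing to a shared convention, which is the memory cost the abstract refers to as the phase of high complexity.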


Author(s):  
Garima Devnani ◽  
Ayush Jaiswal ◽  
Roshni John ◽  
Rajat Chaurasia ◽  
Neha Tirpude

Fine-tuning of a model is most often required to cater to the users’ explicit requirements. But the question remains whether the model is accurate enough to be used for a certain application. This paper presents the metrics used for performance evaluation of a Convolutional Neural Network (CNN) model. The evaluation is based on the training process, which provides intermediate models after every 1000 iterations. Because 1000 iterations are not substantial over the range of 490k iterations, the models are grouped in intervals of 100k iterations each. The intention was to compare the recorded metrics to evaluate the model in terms of accuracy. The model was trained on a set of specific categories chosen from the Microsoft Common Objects in Context (MS COCO) dataset, while allowing users to test the model’s accuracy with their own externally available images. Our trained model ensured that all objects present in the image are detected, to depict the effect of precision.


Author(s):  
L. Lu ◽  
C. Cai ◽  
A. H. Soni

For an arbitrarily shaped object manipulated by a robot hand, this paper presents a procedure for analyzing the position and rotation ranges of the object, and a procedure for designing the kinematic parameters of a hand to meet given requirements on the motion ranges. Rotation dexterity index, dexterity charts, and a dexterity scalar characterizing both position range and rotation range are introduced for the performance evaluation of a robot hand. Least-square-error iteration and steps are detailed for the kinematic parameter determination of a robot hand.

