Evaluating Low-Level Speech Features Against Human Perceptual Data

2017
Vol 5
pp. 425-440
Author(s):  
Caitlin Richter ◽  
Naomi H. Feldman ◽  
Harini Salgado ◽  
Aren Jansen

We introduce a method for measuring the correspondence between low-level speech features and human perception, using a cognitive model of speech perception implemented directly on speech recordings. We evaluate two speaker normalization techniques using this method and find that in both cases, speech features that are normalized across speakers predict human data better than unnormalized speech features, consistent with previous research. Results further reveal differences across normalization methods in how well each predicts human data. This work provides a new framework for evaluating low-level representations of speech on their match to human perception, and lays the groundwork for creating more ecologically valid models of speech perception.
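The abstract does not name the two speaker normalization techniques it evaluates; purely as an illustrative sketch, one widely used approach is per-speaker z-scoring of each feature dimension (the idea behind cepstral mean and variance normalization), assuming features arrive as a frames-by-dimensions matrix with one speaker label per frame:

```python
import numpy as np

def normalize_per_speaker(features, speaker_ids):
    """Z-score each feature dimension within each speaker's frames.

    features: (n_frames, n_dims) array of low-level speech features
    speaker_ids: length-n_frames array of speaker labels
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0)
        sigma[sigma == 0] = 1.0  # guard constant dimensions against divide-by-zero
        normalized[mask] = (features[mask] - mu) / sigma
    return normalized
```

Normalizing within each speaker removes speaker-specific offsets and scales, which is the kind of property the study finds brings features closer to human perceptual data.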

2021
Author(s):  
Meng Liu ◽  
Wenshan Dong ◽  
Shaozheng Qin ◽  
Tom Verguts ◽  
Qi Chen

Abstract Human perception and learning are thought to rely on a hierarchical generative model that is continuously updated via precision-weighted prediction errors (pwPEs). However, the neural basis of this cognitive process, and how it unfolds during decision making, remains poorly understood. To investigate this question, we combined a hierarchical Bayesian model (the Hierarchical Gaussian Filter, HGF) with electroencephalographic (EEG) recording while participants performed a probabilistic reversal learning task in alternating stable and volatile environments. Behaviorally, the HGF fit significantly better than two non-hierarchical control models. Neurally, low-level and high-level pwPEs were independently encoded by the P300 component. Low-level pwPEs were reflected in the theta (4-8 Hz) frequency band, but high-level pwPEs were not. Furthermore, the expression of high-level pwPEs was stronger for participants with a better HGF fit. These results indicate that the brain employs hierarchical learning, encoding both low- and high-level learning signals separately and adaptively.
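The precision weighting the abstract refers to can be made concrete with the standard belief-update equation from the HGF literature (not stated in the abstract itself): at level $i$ of the hierarchy on trial $k$, the belief mean is nudged by the prediction error from the level below, weighted by a ratio of precisions:

```latex
\mu_i^{(k)} = \mu_i^{(k-1)} + \frac{\hat{\pi}_{i-1}^{(k)}}{\pi_i^{(k)}} \, \delta_{i-1}^{(k)}
```

Here $\delta_{i-1}^{(k)}$ is the prediction error from level $i-1$, $\hat{\pi}_{i-1}^{(k)}$ is the precision of the prediction about that level, and $\pi_i^{(k)}$ is the precision of the updated belief; the precision-weighted product is the pwPE whose neural correlates the study tracks.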


Author(s):  
Richard Stone ◽  
Minglu Wang ◽  
Thomas Schnieders ◽  
Esraa Abdelall

Human-robot interaction systems are increasingly being integrated into industrial, commercial, and emergency service agencies. It is critical that human operators understand and trust automation when these systems support, and even make, important decisions. The following study focused on a human-in-the-loop telerobotic system performing a reconnaissance operation. Twenty-four subjects were divided into groups based on level of automation: Low-Level Automation (LLA) and High-Level Automation (HLA). Results indicated a significant difference in hit rate between the low and high levels of control when a permanent error occurred. In the LLA group, the type of error had a significant effect on hit rate. In general, the high level of automation outperformed the low level, especially when it was more reliable, suggesting that subjects in the HLA group could rely on the automation to perform the task more effectively and more accurately.


2020
Vol 8
pp. 199-214
Author(s):  
Xi (Leslie) Chen ◽  
Sarah Ita Levitan ◽  
Michelle Levine ◽  
Marko Mandic ◽  
Julia Hirschberg

Humans rarely perform better than chance at lie detection. To better understand human perception of deception, we created a game framework, LieCatcher, to collect ratings of perceived deception using a large corpus of deceptive and truthful interviews. We analyzed the acoustic-prosodic and linguistic characteristics of language trusted and mistrusted by raters and compared these to characteristics of actual truthful and deceptive language to understand how perception aligns with reality. With this data we built classifiers to automatically distinguish trusted from mistrusted speech, achieving an F1 of 66.1%. We next evaluated whether the strategies raters said they used to discriminate between truthful and deceptive responses were in fact useful. Our results show that, although several prosodic and lexical features were consistently perceived as trustworthy, they were not reliable cues. Also, the strategies that judges reported using in deception detection were not helpful for the task. Our work sheds light on the nature of trusted language and provides insight into the challenging problem of human deception detection.
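For readers unfamiliar with the reported metric, F1 is the harmonic mean of precision and recall; a minimal sketch for the binary case (the convention 1 = mistrusted speech is hypothetical, not from the paper):

```python
def f1_score_binary(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

An F1 of 66.1% thus means the classifier balances precision and recall well above the 50% chance level that the human raters hover around.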


2018
Vol 2018
pp. 1-11
Author(s):  
Hai Wang ◽  
Lei Dai ◽  
Yingfeng Cai ◽  
Long Chen ◽  
Yong Zhang

Traditional salient object detection models are divided into several classes based on low-level features and contrast between pixels. In this paper, we propose a model based on a multilevel deep pyramid (MLDP), which fuses multiple features at different levels. Firstly, the MLDP uses the original image as the input to a VGG16 model to extract high-level features and form an initial saliency map. Next, the MLDP further extracts high-level features to form a saliency map based on a deep pyramid. Then, the MLDP obtains a saliency map fused with superpixels by extracting low-level features. After that, the MLDP applies background noise filtering to the superpixel-fused saliency map in order to filter out the interference of background noise and form a saliency map based on the foreground. Lastly, the MLDP combines the superpixel-fused saliency map with the foreground-based saliency map to produce the final saliency map. Because it fuses multiple features, the MLDP is not limited to low-level features and achieves good results when extracting salient targets. As shown in our experiments, the MLDP outperforms seven state-of-the-art models across three public saliency datasets. The MLDP therefore offers superior performance and wide applicability in the extraction of salient targets.
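The MLDP's exact fusion weights and filtering rules are given in the paper itself; purely as an illustrative sketch, the generic pattern of normalizing per-level saliency maps, suppressing low-saliency background, and combining the levels might look like this (the threshold and simple averaging are assumptions for illustration, not the authors' choices):

```python
import numpy as np

def fuse_saliency_maps(maps, bg_threshold=0.2):
    """Illustrative fusion of per-level saliency maps.

    maps: list of 2-D arrays, one saliency map per feature level
    bg_threshold: values below this (after scaling) are treated as background
    """
    scaled = []
    for m in maps:
        m = np.asarray(m, dtype=float)
        m = (m - m.min()) / (m.max() - m.min() + 1e-12)  # scale to [0, 1]
        m[m < bg_threshold] = 0.0                        # filter background noise
        scaled.append(m)
    return np.mean(scaled, axis=0)                       # combine the levels
```

Regions that are salient at several levels survive the averaging, while single-level background responses are attenuated, which is the intuition behind multilevel fusion.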


2010
Vol 25 (1)
pp. 173-189
Author(s):  
J. Brotzge ◽  
K. Hondl ◽  
B. Philips ◽  
L. Lemon ◽  
E. J. Bass ◽  
...  

Abstract The Center for Collaborative Adaptive Sensing of the Atmosphere (CASA) is a multiyear engineering research center established by the National Science Foundation for the development of small, inexpensive, low-power radars designed to improve the scanning of the lowest levels (<3 km AGL) of the atmosphere. Instead of sensing autonomously, CASA radars are designed to operate as a network, collectively adapting to the changing needs of end users and the environment; this network approach to scanning is known as distributed collaborative adaptive sensing (DCAS). DCAS optimizes the low-level volume coverage scanning and maximizes the utility of each scanning cycle. A test bed of four prototype CASA radars was deployed in southwestern Oklahoma in 2006 and operated continuously in DCAS mode from March through June of 2007. This paper analyzes three convective events observed during April–May 2007, during CASA's intensive operations period (IOP), with a special focus on evaluating the benefits and weaknesses of the CASA radar system deployment and the DCAS scanning strategy for detecting and tracking low-level circulations. Data collected from nearby Weather Surveillance Radar-1988 Doppler (WSR-88D) and CASA radars are compared for mesocyclones, misocyclones, and low-level vortices. Initial results indicate that the dense, overlapping coverage at low levels provided by the CASA radars and the high temporal (60 s) resolution provided by DCAS give forecasters more detailed feature continuity and tracking. Moreover, the CASA system is able to resolve a whole class of circulations—misocyclones—far better than the WSR-88Ds; in fact, many of these are probably missed completely by the WSR-88D. The impacts of this increased detail on severe weather warnings are under investigation.
Ongoing efforts include enhancing the DCAS data quality and scanning strategy, improving the DCAS data visualization, and developing a robust infrastructure to better support forecast and warning operations.


Author(s):  
Mark Last ◽  
Yael Mendelson ◽  
Sugato Chakrabarty ◽  
Karishma Batra

Car manufacturers are interested in detecting evolving problems in a car fleet as early as possible, so they can take preventive action and deal with problems before they become widespread. The vast amount of warranty claims recorded by car dealers makes manual analysis of this data hardly feasible. This chapter describes a fuzzy-based methodology for automated detection of evolving maintenance problems in massive streams of car warranty data. The empirical distributions of time-to-failure and mileage-to-failure are monitored over time using an advanced fuzzy approach to the comparison of frequency distributions. The authors' fuzzy-based early-warning tool builds upon automated interpretation of the differences between consecutive histogram plots, using a cognitive model of human perception rather than "crisp" statistical models. They demonstrate the effectiveness and efficiency of the proposed tool on warranty data that is very similar to actual data gathered from a database within General Motors.
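The chapter's fuzzy comparison method is its own contribution; purely as an illustrative sketch of the general idea, one could grade the per-bin change between consecutive histograms through a fuzzy membership function instead of a crisp pass/fail test (the tolerance and slope parameters below are invented for illustration):

```python
import numpy as np

def fuzzy_drift_degree(hist_prev, hist_curr, tolerance=0.05, slope=0.15):
    """Illustrative fuzzy comparison of two consecutive frequency histograms.

    Returns a degree in [0, 1] that the distribution has "changed":
    per-bin absolute differences of relative frequencies are passed
    through a piecewise-linear fuzzy membership function that is 0 below
    `tolerance` and saturates at 1 beyond `tolerance + slope`.
    """
    p = np.asarray(hist_prev, dtype=float)
    q = np.asarray(hist_curr, dtype=float)
    p = p / p.sum()  # convert counts to relative frequencies
    q = q / q.sum()
    diff = np.abs(p - q)
    degree = np.clip((diff - tolerance) / slope, 0.0, 1.0)
    return float(degree.max())  # worst bin drives the warning level
```

A monitoring loop would raise an early warning when the drift degree stays high across several consecutive time windows, rather than alarming on a single noisy histogram.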


1988
Vol 132
pp. 45-48
Author(s):  
J. L. Coutures ◽  
G. Boucharlat

Photodiode linear arrays are perfectly adapted to spectral analysis. The TH 7832CDZ bilinear array is a new device specially adapted to low-level detection (exposure < 7 nJ/cm2), with a photodiode-signal reading efficiency better than 97% over the entire dynamic range (> 70 dB).


Author(s):  
David B. Pisoni ◽  
Susannah V. Levi

This article examines how new approaches—coupled with previous insights—provide a new framework for questions that deal with the nature of phonological and lexical knowledge and representation, processing of stimulus variability, and perceptual learning and adaptation. First, it outlines the traditional view of speech perception and identifies some problems with assuming such a view, in which only abstract representations exist. The article then discusses some new approaches to speech perception that retain detailed information in the representations. It also considers a view which rejects abstraction altogether, but shows that such a view has difficulty dealing with a range of linguistic phenomena. After providing a brief discussion of some new directions in linguistics that encode both detailed information and abstraction, the article concludes by discussing the coupling of speech perception and spoken word recognition.

