Non-Contact Speech Recovery Technology Using a 24 GHz Portable Auditory Radar and Webcam

Yue Ma; Hong Hong; Hui Li; Heng Zhao; Yusheng Li; Li Sun; Chen Gu; Xiaohua Zhu

doi:10.3390/rs12040653

Remote Sensing ◽

10.3390/rs12040653 ◽

2020 ◽

Vol 12 (4) ◽

pp. 653

Author(s):

Yue Ma ◽

Hong Hong ◽

Hui Li ◽

Heng Zhao ◽

Yusheng Li ◽

...

Keyword(s):

Information Exchange ◽

Continuous Wave ◽

Vocal Tract ◽

Vibration Signal ◽

Human Communication ◽

Formant Frequency ◽

Vocal Tract Resonance ◽

High Consistency ◽

Source Excitation ◽

24 GHz

Language has been one of the most effective ways of human communication and information exchange. To solve the problem of non-contact robust speech recognition, recovery, and surveillance, this paper presents a speech recovery technology based on a 24 GHz portable auditory radar and webcam. The continuous-wave auditory radar is utilized to extract the vocal vibration signal, and the webcam is used to obtain the fitted formant frequency. The traditional formant speech synthesizer is selected to synthesize and recover speech, using the vocal vibration signal as the sound source excitation and the fitted formant frequency as the vocal tract resonance characteristics. Experiments on reading single English characters and words are carried out. Using microphone records as a reference, the effectiveness of the proposed speech recovery technology is verified. Mean opinion scores show a relatively high consistency between the synthesized speech and original acoustic speech.
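The source-filter idea in this abstract (an excitation signal shaped by vocal tract resonances) can be illustrated with a minimal sketch of a cascade formant synthesizer. It is not the authors' implementation. The impulse-train excitation stands in for the radar-derived vocal vibration signal, the formant frequencies and bandwidths are hypothetical values rather than webcam-fitted ones, and the function names are invented for illustration. The resonator uses the standard second-order form found in Klatt-style synthesizers.

```python
import math

def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator modelling a single formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, duration_s, sample_rate):
    """Stand-in excitation (assumption): one unit pulse per pitch period."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize(excitation, formants, sample_rate):
    """Pass the excitation through one resonator per (frequency, bandwidth) pair."""
    out = list(excitation)
    for freq_hz, bandwidth_hz in formants:
        out = resonator(out, freq_hz, bandwidth_hz, sample_rate)
    return out

SAMPLE_RATE = 8000  # Hz, assumed
excitation = impulse_train(120.0, 0.05, SAMPLE_RATE)
# Hypothetical formant (frequency, bandwidth) pairs in Hz, not taken from the paper.
formants = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
speech = synthesize(excitation, formants, SAMPLE_RATE)
```

In a system like the one described, the excitation list would presumably be replaced by the measured vibration signal and the formant list would vary frame by frame; this sketch keeps both fixed for brevity.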

Contributions of Voice and Nonverbal Communication to Perceived Masculinity–Femininity for Cisgender and Transgender Communicators

Journal of Speech Language and Hearing Research ◽

10.1044/2019_jslhr-19-00387 ◽

2020 ◽

Vol 63 (4) ◽

pp. 931-947

Author(s):

Teresa L. D. Hardy ◽

Carol A. Boliek ◽

Daniel Aalto ◽

Justin Lewicke ◽

Kristopher Wells ◽

...

Keyword(s):

Fundamental Frequency ◽

Sound Pressure Level ◽

Sound Pressure ◽

Vocal Tract ◽

Presentation Mode ◽

Pressure Level ◽

Presentation Modes ◽

Audiovisual Stimuli ◽

Vocal Tract Resonance ◽

Point Light

Purpose: The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity–femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators. Method: The voices and gestures of a group of cisgender men and women (n = 10 of each) and transgender women (n = 20) communicators were recorded while they recounted the story of a cartoon using acoustic and motion capture recording systems. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers (n = 20) rated each communicator's masculinity–femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audiovisual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity–femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings. Results: Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity–femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio than the audiovisual mode and unreliably in the visual-only mode. Conclusions: Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.

Impact of Vocal Tract Resonance on the Perception of Voice Quality Changes Caused by Varying Vocal Fold Stiffness

Acta Acustica united with Acustica ◽

10.3813/aaa.918937 ◽

2016 ◽

Vol 102 (2) ◽

pp. 209-213 ◽

Cited By ~ 2

Author(s):

Rosario Signorello ◽

Zhaoyan Zhang ◽

Bruce Gerratt ◽

Jody Kreiman

Keyword(s):

Vocal Fold ◽

Vocal Tract ◽

Voice Quality ◽

Quality Changes ◽

Vocal Tract Resonance

Vocal tract area functions from two point acoustic measurements with formant frequency constraints

IEEE Transactions on Acoustics Speech and Signal Processing ◽

10.1109/tassp.1984.1164455 ◽

1984 ◽

Vol 32 (6) ◽

pp. 1122-1135 ◽

Cited By ~ 9

Author(s):

P. Milenkovic

Keyword(s):

Vocal Tract ◽

Acoustic Measurements ◽

Formant Frequency ◽

Frequency Constraints

Effects of Long-Term Tracheostomy on Spectral Characteristics of Vowel Production

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3405.1057 ◽

1991 ◽

Vol 34 (5) ◽

pp. 1057-1065 ◽

Cited By ~ 25

Author(s):

Ruth Saletsky Kamen ◽

Ben C. Watson

Keyword(s):

Vocal Tract ◽

Spectral Characteristics ◽

Group Differences ◽

Vowel Production ◽

Formant Frequency ◽

Motor Patterns ◽

Spectral Parameters ◽

Oral Motor ◽

Vowel Space

This study investigated the effects of long-term tracheostomy on the development of speech. Eight children who underwent tracheotomy during the prelingual period were compared to matched controls on selected spectral parameters of the speech acoustic signal and standard measures of oral-motor, phonologic, and articulatory proficiency. Analysis of formant frequency values revealed significant between-group differences. Children with histories of long-term tracheostomy showed reduced acoustic vowel space, as defined by group formant frequency values. This suggests that these children were limited in their ability to produce extreme vocal tract configurations for vowels /a,i,u/ postdecannulation. Oral motor patterns were less mature, and sound substitutions were not only more variable for this group, but also reflected a persistent overlay of maladaptive compensations developed during cannulation.

Vocal Tract Steadiness

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3504.761 ◽

1992 ◽

Vol 35 (4) ◽

pp. 761-768 ◽

Cited By ~ 30

Author(s):

Petra Zwirner ◽

Gary J. Barnes

Keyword(s):

Motor Control ◽

Fundamental Frequency ◽

Vocal Tract ◽

Upper Airway ◽

Normal Subjects ◽

Subject Group ◽

Formant Frequency ◽

Control Subjects ◽

Acoustic Analyses ◽

The Relationship

Acoustic analyses of upper airway and phonatory stability were conducted on samples of sustained phonation to evaluate the relation between laryngeal and articulomotor stability for 31 patients with dysarthria and 12 non-dysarthric control subjects. Significantly higher values were found for the variability in fundamental frequency and formant frequency of patients who have Huntington’s disease compared with normal subjects and patients with Parkinson’s disease. No significant correlations were found between formant frequency variability and the variability of the fundamental frequency for any subject group. These findings are discussed as they pertain to the relationship between phonatory and upper airway subsystems and the evaluation of vocal tract motor control impairments in dysarthria.

Speed Calibration and Traceability for Train-Borne 24 GHz Continuous-Wave Doppler Radar Sensor

Sensors ◽

10.3390/s20041230 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1230

Author(s):

Lei Du ◽

Qiao Sun ◽

Jie Bai ◽

Xiaolei Wang ◽

Tianqi Xu

Keyword(s):

Continuous Wave ◽

Doppler Radar ◽

Single Channel ◽

Full Range ◽

Calibration Method ◽

Simulation System ◽

Radar Sensor ◽

Dual Channel ◽

Calibration Methods ◽

24 GHz

The 24 GHz continuous-wave (CW) Doppler radar sensor (DRS) is widely used for non-contact measurement of the instantaneous speed of moving objects, and in recent years it has begun to be used for train-borne speed measurement in China because of its advanced performance. The architecture and working principle of train-borne DRSs with different structures are first introduced, including single-channel DRSs used for freight train speed measurements on dedicated freight lines and dual-channel DRSs used for speed measurements of high-speed and urban rail trains on dedicated passenger lines. Then, the disadvantages of two traditional speed calibration methods for train-borne DRSs are described, and a new speed calibration method is proposed that simulates the Doppler shift signal by imposing a signal modulation on the incident CW microwave signal. A 24 GHz CW radar target simulation system for a train-borne DRS was built to verify the proposed speed calibration method, and the traceability and performance evaluation of the simulated speed were taken into account. The simulated speed range of the simulation system was 5 to 500 km/h when the simulated incident angle was within (45 ± 8)°, and the maximum permissible error (MPE) of the simulated speed was ±0.05 km/h. Finally, the calibration and uncertainty evaluation results of two typical train-borne dual-channel DRS samples validated the effectiveness and feasibility of the proposed speed calibration approach for a train-borne DRS over the full range, in the laboratory as well as in the field.
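The relation between speed, incident angle, and Doppler shift that such a calibration relies on can be sketched with the standard monostatic CW Doppler formula, f_d = 2 v cos(θ) f_0 / c. The sketch below is illustrative only and is not taken from the paper: it assumes a carrier of exactly 24 GHz, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s
CARRIER_HZ = 24.0e9             # assumed carrier frequency (exactly 24 GHz)

def doppler_shift_hz(speed_kmh, incident_angle_deg, carrier_hz=CARRIER_HZ):
    """Doppler shift seen by a monostatic CW radar for a given ground speed."""
    speed_ms = speed_kmh / 3.6
    radial_speed = speed_ms * math.cos(math.radians(incident_angle_deg))
    return 2.0 * radial_speed * carrier_hz / SPEED_OF_LIGHT

def speed_from_doppler_kmh(shift_hz, incident_angle_deg, carrier_hz=CARRIER_HZ):
    """Invert the formula: ground speed implied by a measured Doppler shift."""
    radial_speed = shift_hz * SPEED_OF_LIGHT / (2.0 * carrier_hz)
    return radial_speed / math.cos(math.radians(incident_angle_deg)) * 3.6

# Shift a target simulator would need to impose to emulate each end of the
# 5 to 500 km/h range at a 45 degree incident angle.
low_shift = doppler_shift_hz(5.0, 45.0)
high_shift = doppler_shift_hz(500.0, 45.0)
```

Because the shift scales with cos(θ), an error in the assumed incident angle translates directly into a speed error, which is presumably why the simulated angle range is stated alongside the speed range.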

Vocal Tract Resonance Analysis Using LTAS in the Context of the Singer’s Level of Advancement

Hard and Soft Computing for Artificial Intelligence, Multimedia and Security - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-319-48429-7_23 ◽

2016 ◽

pp. 249-257 ◽

Cited By ~ 1

Author(s):

Edward Półrolniczak ◽

Michał Kramarczyk

Keyword(s):

Vocal Tract ◽

Resonance Analysis ◽

Vocal Tract Resonance

Method And Apparatus For Vocal Tract Resonance Tracking Using Nonlinear Predictor And Target-Guided Temporal Restraint

The Journal of the Acoustical Society of America ◽

10.1121/1.3600957 ◽

2011 ◽

Vol 129 (6) ◽

pp. 4097

Author(s):

Li Deng

Keyword(s):

Vocal Tract ◽

Vocal Tract Resonance

Phonetics of Singing in Western Classical Style

Oxford Research Encyclopedia of Linguistics ◽

10.1093/acrefore/9780199384655.013.412 ◽

2018 ◽

Author(s):

Johan Sundberg

Keyword(s):

Fundamental Frequency ◽

Vocal Fold ◽

Vocal Tract ◽

Formant Frequency ◽

Vowel Identification ◽

Fundamental Frequencies ◽

Vocal Fold Vibration ◽

Syllable Duration ◽

Classical Singing ◽

The Voice

The function of the voice organ is basically the same in classical singing as in speech. However, loud orchestral accompaniment has necessitated the use of the voice in an economical way. As a consequence, the vowel sounds tend to deviate considerably from those in speech. Male voices cluster formants three, four, and five, so that a marked peak is produced in the spectrum envelope near 3,000 Hz. This helps them to be heard through a loud orchestral accompaniment. They seem to achieve this effect by widening the lower pharynx, which makes the vowels more centralized than in speech. Singers often sing at fundamental frequencies higher than the normal first formant frequency of the vowel in the lyrics. In such cases they raise the first formant frequency so that it gets somewhat higher than the fundamental frequency. This is achieved by reducing the degree of vocal tract constriction or by widening the lip and jaw openings, constricting the vocal tract in the pharyngeal end and widening it in the mouth. These deviations from speech cause difficulties in vowel identification, particularly at high fundamental frequencies. Actually, vowel identification is almost impossible above 700 Hz (pitch F5). Another great difference between vocal sound produced in speech and the classical singing tradition concerns female voices, which need to reduce the timbral differences between voice registers. Females normally speak in modal or chest register, and the transition to falsetto tends to happen somewhere above 350 Hz. The great timbral differences between these registers are avoided by establishing control over the register function, that is, over the vocal fold vibration characteristics, so that seamless transitions are achieved. In many other respects, there are more or less close similarities between speech and singing. Thus, marking phrase structure, emphasizing important events, and emotional coloring are common principles, which may make vocal artists deviate considerably from the score’s nominal description of fundamental frequency and syllable duration.
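The frequency-to-pitch correspondences quoted in this abstract (for example, 700 Hz as pitch F5) can be checked with a short sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz and scientific pitch notation, neither of which is stated in the abstract; the function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_pitch(freq_hz, a4_hz=440.0):
    """Name of the equal-tempered pitch closest to freq_hz (assumes A4 = 440 Hz)."""
    # MIDI note 69 is A4; each semitone is a factor of 2**(1/12) in frequency.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # scientific pitch notation: MIDI 60 is C4
    return f"{name}{octave}"

print(nearest_pitch(700.0))  # F5
print(nearest_pitch(350.0))  # F4
```

Under these assumptions the 350 Hz register transition mentioned above lies close to F4, one octave below the 700 Hz limit for vowel identification.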

Low Cost and Compact FMCW 24 GHz Radar Applications for Snowpack and Ice Thickness Measurements

Sensors ◽

10.3390/s20143909 ◽

2020 ◽

Vol 20 (14) ◽

pp. 3909

Author(s):

Patrick Pomerleau ◽

Alain Royer ◽

Alexandre Langlois ◽

Patrick Cliche ◽

Bruno Courtemanche ◽

...

Keyword(s):

Snow Depth ◽

Continuous Wave ◽

Snow Water Equivalent ◽

Low Cost ◽

Ice Thickness ◽

Lake Ice ◽

Wide Range ◽

Thickness Measurements ◽

24 GHz

Monitoring the evolution of snow on the ground and lake ice, two of the most important components of the changing northern environment, is essential. In this paper, we describe a lightweight, compact and autonomous 24 GHz frequency-modulated continuous-wave (FMCW) radar system for freshwater ice thickness and snow mass (snow water equivalent, SWE) measurements. Although FMCW radars have a long-established history, the novelty of this research lies in taking advantage of the availability of a new generation of low-cost, low-power units that facilitate the monitoring of snow and ice at remote locations. Test performance (accuracy and limitations) is presented for five different applications, all using an automatic operating mode with improved signal processing: (1) in situ lake ice thickness measurements giving 2 cm accuracy up to ≈1 m ice thickness and a radar resolution of 4 cm; (2) remotely piloted aircraft-based lake ice thickness from low-altitude flight at 5 m; (3) in situ dry SWE measurements based on known snow depth, giving 13% accuracy (RMSE 20%) over boreal forest, subarctic taiga and Arctic tundra, with a measurement capability of up to 3 m in snowpack thickness; (4) continuous monitoring of surface snow density under particular Antarctic conditions; (5) continuous SWE monitoring through the winter with a synchronized and collocated snow depth sensor (ultrasonic or LiDAR sensor), giving 13.5% bias and 25 mm root mean square difference (RMSD) (10%) for dry snow. The need for detection processing for wet snow, which strongly absorbs radar signals, is discussed. An appendix provides 24 GHz simulated effective refractive index and penetration depth as a function of a wide range of density, temperature and wetness for ice and snow.
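How an FMCW radar turns a beat frequency into a layer thickness can be sketched with the textbook relations: range resolution ΔR = c / (2 B n), and thickness d = c τ / (2 n) with two-way delay τ = f_b T / B. This is a generic illustration, not the paper's processing chain. The bandwidth, sweep time, and refractive index used below are hypothetical values chosen only to exercise the functions, and the function names are invented.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def range_resolution_m(bandwidth_hz, refractive_index=1.0):
    """Smallest resolvable separation between two interfaces in a medium."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz * refractive_index)

def thickness_m(beat_frequency_hz, bandwidth_hz, sweep_time_s, refractive_index):
    """Layer thickness from the beat-frequency difference between its two interfaces."""
    chirp_slope = bandwidth_hz / sweep_time_s          # Hz per second
    two_way_delay = beat_frequency_hz / chirp_slope    # seconds spent inside the layer
    return SPEED_OF_LIGHT * two_way_delay / (2.0 * refractive_index)

# Hypothetical radar and medium parameters (not taken from the paper).
BANDWIDTH_HZ = 2.0e9
SWEEP_TIME_S = 1.0e-3
ICE_INDEX = 1.78  # assumed refractive index for freshwater ice

resolution_in_ice = range_resolution_m(BANDWIDTH_HZ, ICE_INDEX)
example_thickness = thickness_m(20_000.0, BANDWIDTH_HZ, SWEEP_TIME_S, ICE_INDEX)
```

The refractive index appears in both expressions, which suggests why the abstract's appendix on the effective refractive index of ice and snow matters for converting radar delays into thickness or SWE.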
