Non-Contact Speech Recovery Technology Using a 24 GHz Portable Auditory Radar and Webcam

2020 ◽  
Vol 12 (4) ◽  
pp. 653
Author(s):  
Yue Ma ◽  
Hong Hong ◽  
Hui Li ◽  
Heng Zhao ◽  
Yusheng Li ◽  
...  

Language has long been one of the most effective means of human communication and information exchange. To address the problem of non-contact robust speech recognition, recovery, and surveillance, this paper presents a speech recovery technology based on a 24 GHz portable auditory radar and a webcam. The continuous-wave auditory radar is used to extract the vocal vibration signal, and the webcam is used to obtain the fitted formant frequency. A traditional formant speech synthesizer is used to synthesize and recover speech, with the vocal vibration signal as the sound source excitation and the fitted formant frequency as the vocal tract resonance characteristics. Experiments on reading single English characters and words are carried out. Using microphone recordings as a reference, the effectiveness of the proposed speech recovery technology is verified. Mean opinion scores show a relatively high consistency between the synthesized speech and the original acoustic speech.
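The source-filter idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the impulse-train excitation stands in for the radar-derived vocal vibration signal, the formant frequencies and bandwidths are hypothetical values roughly typical of an /a/-like vowel, and the cascade of Klatt-style second-order resonators is one common form of formant synthesizer chosen here as an assumption.

```python
import math

def resonator(signal, freq_hz, bandwidth_hz, fs):
    """Second-order digital resonator (Klatt-style) centred on freq_hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain to 1 at 0 Hz
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, duration_s, fs):
    """Hypothetical stand-in for the vocal vibration signal (source excitation)."""
    period = int(round(fs / f0_hz))
    n = int(round(duration_s * fs))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize(excitation, formants, fs):
    """Pass the excitation through one resonator per (frequency, bandwidth) pair."""
    out = excitation
    for freq_hz, bandwidth_hz in formants:
        out = resonator(out, freq_hz, bandwidth_hz, fs)
    return out

FS = 8000  # sample rate in Hz (assumed)
excitation = impulse_train(120.0, 0.1, FS)
# Hypothetical formant (frequency, bandwidth) pairs in Hz, not taken from the paper.
speech = synthesize(excitation, [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)], FS)
```

In this sketch the excitation sets the pitch and the resonators shape the spectrum, which mirrors the division of labour the abstract describes between the radar signal and the webcam-derived formants.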

2020 ◽  
Vol 63 (4) ◽  
pp. 931-947
Author(s):  
Teresa L. D. Hardy ◽  
Carol A. Boliek ◽  
Daniel Aalto ◽  
Justin Lewicke ◽  
Kristopher Wells ◽  
...  

Purpose: The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity–femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators. Method: The voices and gestures of a group of cisgender men and women (n = 10 of each) and transgender women (n = 20) communicators were recorded while they recounted the story of a cartoon using acoustic and motion capture recording systems. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers (n = 20) rated each communicator's masculinity–femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audiovisual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity–femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings. Results: Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity–femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio than the audiovisual mode and unreliably in the visual-only mode. Conclusions: Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.


2016 ◽  
Vol 102 (2) ◽  
pp. 209-213 ◽  
Author(s):  
Rosario Signorello ◽  
Zhaoyan Zhang ◽  
Bruce Gerratt ◽  
Jody Kreiman

1991 ◽  
Vol 34 (5) ◽  
pp. 1057-1065 ◽  
Author(s):  
Ruth Saletsky Kamen ◽  
Ben C. Watson

This study investigated the effects of long-term tracheostomy on the development of speech. Eight children who underwent tracheotomy during the prelingual period were compared to matched controls on selected spectral parameters of the speech acoustic signal and standard measures of oral-motor, phonologic, and articulatory proficiency. Analysis of formant frequency values revealed significant between-group differences. Children with histories of long-term tracheostomy showed reduced acoustic vowel space, as defined by group formant frequency values. This suggests that these children were limited in their ability to produce extreme vocal tract configurations for vowels /a,i,u/ postdecannulation. Oral motor patterns were less mature, and sound substitutions were not only more variable for this group, but also reflected a persistent overlay of maladaptive compensations developed during cannulation.
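One common way to quantify an acoustic vowel space is the area of the triangle spanned by the corner vowels /a, i, u/ in the (F1, F2) plane. The sketch below assumes that definition, which the abstract does not confirm, and uses hypothetical formant values purely for illustration.

```python
def triangle_area(p1, p2, p3):
    """Area of a triangle from three (F1, F2) points, via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

# Hypothetical group-mean (F1, F2) values in Hz; not data from the study.
formants_hz = {
    "a": (850.0, 1300.0),
    "i": (300.0, 2500.0),
    "u": (350.0, 900.0),
}

vowel_space_area = triangle_area(formants_hz["a"], formants_hz["i"], formants_hz["u"])
```

Under this definition, a group that cannot reach extreme vocal tract configurations would have corner vowels closer together and therefore a smaller area in Hz squared.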


1992 ◽  
Vol 35 (4) ◽  
pp. 761-768 ◽  
Author(s):  
Petra Zwirner ◽  
Gary J. Barnes

Acoustic analyses of upper airway and phonatory stability were conducted on samples of sustained phonation to evaluate the relation between laryngeal and articulomotor stability for 31 patients with dysarthria and 12 non-dysarthric control subjects. Significantly higher values were found for the variability in fundamental frequency and formant frequency of patients who have Huntington’s disease compared with normal subjects and patients with Parkinson’s disease. No significant correlations were found between formant frequency variability and the variability of the fundamental frequency for any subject group. These findings are discussed as they pertain to the relationship between phonatory and upper airway subsystems and the evaluation of vocal tract motor control impairments in dysarthria.


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1230
Author(s):  
Lei Du ◽  
Qiao Sun ◽  
Jie Bai ◽  
Xiaolei Wang ◽  
Tianqi Xu

The 24 GHz continuous-wave (CW) Doppler radar sensor (DRS) is widely used for non-contact measurement of the instantaneous speed of moving objects and, because of its advanced performance, has in recent years begun to be used for train-borne speed measurement in China. The architecture and working principle of train-borne DRSs with different structures are first introduced, including single-channel DRSs used for freight train speed measurements on dedicated railway freight lines and dual-channel DRSs used for speed measurements of high-speed and urban rail trains on dedicated railway passenger lines. Then, the disadvantages of two traditional speed calibration methods for train-borne DRSs are described, and a new speed calibration method is proposed that simulates the Doppler shift signal by imposing a signal modulation on the incident CW microwave signal. A 24 GHz CW radar target simulation system for a train-borne DRS was built specifically to verify the proposed speed calibration method, taking into account the traceability and performance evaluation of the simulated speed. The simulation system covered a simulated speed range of 5 to 500 km/h for simulated incident angles within (45 ± 8)°, with a maximum permissible error (MPE) of the simulated speed of ±0.05 km/h. Finally, the calibration and uncertainty evaluation results of two typical train-borne dual-channel DRS samples validated the effectiveness and feasibility of the proposed speed calibration approach for a train-borne DRS over the full range, both in the laboratory and in the field.
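The relation between train speed and Doppler shift that such a sensor relies on can be sketched with the standard CW Doppler formula, f_d = 2 v f_0 cos(θ) / c. The carrier frequency of exactly 24.0 GHz and the function names below are assumptions for illustration; the abstract does not give the sensor's exact carrier or processing chain.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum
CARRIER_HZ = 24.0e9        # assumed carrier; the real sensor may differ slightly

def doppler_shift_hz(speed_kmh, incident_angle_deg, carrier_hz=CARRIER_HZ):
    """Doppler shift for a target moving at speed_kmh, seen at the given incident angle."""
    v = speed_kmh / 3.6  # km/h to m/s
    return 2.0 * v * carrier_hz * math.cos(math.radians(incident_angle_deg)) / C_M_PER_S

def speed_kmh_from_doppler(shift_hz, incident_angle_deg, carrier_hz=CARRIER_HZ):
    """Invert the Doppler formula to recover speed in km/h."""
    v = shift_hz * C_M_PER_S / (2.0 * carrier_hz * math.cos(math.radians(incident_angle_deg)))
    return v * 3.6

shift = doppler_shift_hz(100.0, 45.0)          # roughly 3.1 kHz under these assumptions
recovered = speed_kmh_from_doppler(shift, 45.0)
```

Because the shift scales with cos(θ), the same speed produces a noticeably different shift at 37° than at 53°, which is one reason a calibration setup would need to control the simulated incident angle as well as the simulated speed.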


Author(s):  
Johan Sundberg

The function of the voice organ is basically the same in classical singing as in speech. However, loud orchestral accompaniment has necessitated the use of the voice in an economical way. As a consequence, the vowel sounds tend to deviate considerably from those in speech. Male voices cluster formants three, four, and five, so that a marked peak is produced in the spectrum envelope near 3,000 Hz. This helps them to be heard through a loud orchestral accompaniment. They seem to achieve this effect by widening the lower pharynx, which makes the vowels more centralized than in speech. Singers often sing at fundamental frequencies higher than the normal first formant frequency of the vowel in the lyrics. In such cases they raise the first formant frequency so that it gets somewhat higher than the fundamental frequency. This is achieved by reducing the degree of vocal tract constriction or by widening the lip and jaw openings, constricting the vocal tract in the pharyngeal end and widening it in the mouth. These deviations from speech cause difficulties in vowel identification, particularly at high fundamental frequencies. Actually, vowel identification is almost impossible above 700 Hz (pitch F5). Another great difference between vocal sound produced in speech and the classical singing tradition concerns female voices, which need to reduce the timbral differences between voice registers. Females normally speak in modal or chest register, and the transition to falsetto tends to happen somewhere above 350 Hz. The great timbral differences between these registers are avoided by establishing control over the register function, that is, over the vocal fold vibration characteristics, so that seamless transitions are achieved. In many other respects, there are more or less close similarities between speech and singing. Thus, marking phrase structure, emphasizing important events, and emotional coloring are common principles, which may make vocal artists deviate considerably from the score’s nominal description of fundamental frequency and syllable duration.
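The correspondence between the frequencies quoted above and musical pitch names can be checked with a short sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz, a tuning reference the text does not state, and the function name is illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Name of the equal-tempered note closest to freq_hz (scientific pitch notation)."""
    # MIDI note 69 is A4; each semitone is a factor of 2**(1/12) in frequency.
    midi = int(round(69 + 12 * math.log2(freq_hz / a4_hz)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

print(nearest_note(700.0))  # F5
print(nearest_note(350.0))  # F4
```

Under these assumptions, 700 Hz lands on F5, matching the pitch given in the text, and the 350 Hz register transition sits one octave lower at F4.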


Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3909
Author(s):  
Patrick Pomerleau ◽  
Alain Royer ◽  
Alexandre Langlois ◽  
Patrick Cliche ◽  
Bruno Courtemanche ◽  
...  

Monitoring the evolution of snow on the ground and lake ice, two of the most important components of the changing northern environment, is essential. In this paper, we describe a lightweight, compact and autonomous 24 GHz frequency-modulated continuous-wave (FMCW) radar system for freshwater ice thickness and snow mass (snow water equivalent, SWE) measurements. Although FMCW radars have a long-established history, the novelty of this research lies in taking advantage of the availability of a new generation of low-cost, low-power units that facilitate the monitoring of snow and ice at remote locations. Test performance (accuracy and limitations) is presented for five different applications, all using an automatic operating mode with improved signal processing: (1) in situ lake ice thickness measurements giving 2 cm accuracy up to ≈1 m ice thickness and a radar resolution of 4 cm; (2) remotely piloted aircraft-based lake ice thickness from low-altitude flight at 5 m; (3) in situ dry SWE measurements based on known snow depth, giving 13% accuracy (RMSE 20%) over boreal forest, subarctic taiga and Arctic tundra, with a measurement capability of up to 3 m in snowpack thickness; (4) continuous monitoring of surface snow density under particular Antarctic conditions; (5) continuous SWE monitoring through the winter with a synchronized and collocated snow depth sensor (ultrasonic or LiDAR sensor), giving 13.5% bias and 25 mm root mean square difference (RMSD) (10%) for dry snow. The need for detection processing for wet snow, which strongly absorbs radar signals, is discussed. An appendix provides 24 GHz simulated effective refractive index and penetration depth as a function of a wide range of density, temperature and wetness for ice and snow.

