compressed speech
Recently Published Documents


TOTAL DOCUMENTS

217
(FIVE YEARS 7)

H-INDEX

22
(FIVE YEARS 0)

Author(s):  
Iman Qays Abduljaleel ◽  
Amal Hameed Khaleel

Compression and encryption of speech signals are essential multimedia technologies. In the speech domain, they are needed to meet the security and confidentiality requirements of transferring large speech signals over a network, and to reduce the storage space needed for rapid retrieval. In this paper, we propose an algorithm that applies a hybrid transformation to analyze the frequencies of the speech signal. After low-intensity frequencies are removed, the signal is compressed to produce a well-compressed speech signal while preserving speech quality. The compressed speech is then fed into a proposed two-level scrambling algorithm. The external scramble divides the speech into segments using Fuzzy C-Means clustering and shuffles their positions; the internal scramble scatters the values within each block according to the pattern of a Sudoku puzzle and a quadratic map. The scrambled speech is finally encrypted with the Threefish algorithm. Based on the approved statistical measures, the proposed algorithm proved highly efficient at compressing and encrypting the speech signal.
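The two-level scrambling stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the segment grouping uses a chaotic (logistic/quadratic-map) ordering as a stand-in for the paper's Fuzzy C-Means step, and the within-segment permutation stands in for the Sudoku-pattern scatter; the function names and the key parameter are hypothetical.

```python
import numpy as np

def quadratic_map_sequence(x0, n, r=4.0):
    """Generate a chaotic sequence with the quadratic (logistic) map x -> r*x*(1-x)."""
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        seq[i] = x
    return seq

def scramble(signal, n_segments, key=0.3141):
    """Two-level scramble: shuffle segments externally, then permute samples internally.

    `key` seeds the chaotic map and plays the role of a scrambling key (hypothetical).
    """
    segs = np.array_split(signal, n_segments)
    # External scramble: reorder whole segments by ranking a chaotic sequence
    # (stand-in for the paper's Fuzzy C-Means grouping).
    order = np.argsort(quadratic_map_sequence(key, n_segments))
    segs = [segs[i] for i in order]
    # Internal scramble: permute samples within each segment using a fresh
    # chaotic sequence per segment (stand-in for the Sudoku-pattern scatter).
    out = []
    for j, seg in enumerate(segs):
        perm = np.argsort(quadratic_map_sequence(key + 1e-3 * (j + 1), len(seg)))
        out.append(seg[perm])
    return np.concatenate(out), order
```

Because both levels are pure permutations, the scrambled signal contains exactly the original sample values in a different order, so a receiver holding the same key can invert each step; the final Threefish encryption would then be applied to the scrambled samples.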


2021 ◽  
Author(s):  
Karen Banai ◽  
Hanin Karawani ◽  
Limor Lavie ◽  
Yizhar Lavner

Abstract Perceptual learning, defined as long-lasting changes in the ability to extract information from the environment, occurs following either brief exposure or prolonged practice. Whether these two types of experience yield qualitatively distinct patterns of learning is not clear. We used a time-compressed speech task to assess perceptual learning following either rapid exposure or additional training. We report that both experiences yielded robust and long-lasting learning. Individual differences in rapid learning explained unique variance in performance on independent speech tasks (natural-fast speech and speech-in-noise), with no additional contribution from training-induced learning (Experiment 1). Finally, similar factors appear to influence the specificity of the two types of learning (Experiments 1 and 2). We suggest that rapid learning is key to understanding the role of perceptual learning in speech recognition under adverse conditions, whereas longer training could serve to strengthen and stabilize learning.


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Chuanpeng Guo ◽  
Wei Yang ◽  
Mengxia Shuai ◽  
Liusheng Huang

Traditional machine learning-based steganalysis methods on compressed speech have achieved great success in the field of communication security. However, previous studies lacked mathematical modeling of the correlation between codewords, and there is still room for improvement in steganalysis of small-sized and low-embedding-rate samples. To address this challenge, we use Bayesian networks to measure different types of correlations between codewords in linear prediction code and present F3SNet, a four-step strategy of embedding, encoding, attention, and classification for quantization index modulation steganalysis of compressed speech based on the hierarchical attention network. Among these steps, embedding converts codewords into high-density numerical vectors; encoding uses the memory characteristics of LSTM to retain more information by distributing it among all its vectors; and attention further determines which vectors have a greater impact on the final classification result. To evaluate the performance of F3SNet, we make a comprehensive comparison of F3SNet with existing steganalysis methods. Experimental results show that F3SNet surpasses the state-of-the-art methods, particularly for small-sized and low-embedding-rate samples.
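The embed-encode-attend-classify pipeline summarized above can be sketched as a single forward pass. This is a hedged, minimal NumPy illustration, not F3SNet itself: a random linear-plus-tanh encoder stands in for the LSTM, the weights are untrained, and all names (`AttentionClassifier`, `w_enc`, `query`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttentionClassifier:
    """Minimal embed -> encode -> attend -> classify forward pass (untrained sketch)."""

    def __init__(self, vocab_size, dim):
        self.embed = rng.normal(size=(vocab_size, dim))          # codeword embedding table
        self.w_enc = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # toy encoder (stands in for the LSTM)
        self.query = rng.normal(size=dim)                        # attention query vector
        self.w_out = rng.normal(size=dim)                        # binary classifier weights

    def forward(self, codewords):
        x = self.embed[codewords]             # (T, dim): dense vector per codeword (embedding step)
        h = np.tanh(x @ self.w_enc)           # (T, dim): encoded sequence (encoding step)
        a = softmax(h @ self.query)           # (T,): weight per position (attention step)
        ctx = a @ h                           # (dim,): attention-pooled representation
        logit = ctx @ self.w_out              # scalar score (classification step)
        return 1.0 / (1.0 + np.exp(-logit))   # probability that the sample carries hidden data
```

The attention weights `a` are what lets the model emphasize the codeword positions most indicative of quantization index modulation embedding; in the real network they would be learned jointly with the LSTM encoder.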


2021 ◽  
Vol 68 (2) ◽  
pp. 1565-1574
Author(s):  
Peng Liu ◽  
Songbin Li ◽  
Qiandong Yan ◽  
Jingang Wang ◽  
Cheng Zhang

2020 ◽  
Author(s):  
Maram Tarabeih-Ghanayim ◽  
Yizhar Lavner ◽  
Karen Banai

Many auditory skills improve with practice, but the generalization of this learning to untrained materials is limited. Here, we asked whether the type of practice (semantic or accent judgment) and talker variability (defined as the number of different talkers encountered during practice, two or six) influenced the perceptual learning of time-compressed speech and its generalization to unpracticed materials. Four groups of participants trained on the four task/talker-variability combinations, and their pre- and post-training recognition of time-compressed speech was compared to that of a group of untrained participants (n = 14-16 participants/group). Across groups, training led to substantial learning of the trained tokens and to generalization to new talkers producing previously encountered sentences (compared to the untrained control group). However, neither type of training had a significant effect on the recognition of new sentences, even for familiar talkers. Semantic training yielded more learning and better retention of learning over a 2-week interval than accent training. The number of talkers had only marginal effects. These results suggest that learning of time-compressed speech is robust and only partially task specific, but its generalization to untrained tokens following brief practice is limited. In contrast to other types of speech training, here exposure to a variety of talkers during training did not contribute to the transfer of learning to new materials.


2020 ◽  
Vol 63 (4) ◽  
pp. 1083-1092
Author(s):  
Eric M. Johnson ◽  
Shae D. Morgan ◽  
Sarah Hargus Ferguson

Purpose This preliminary investigation compared effects of time compression on intelligibility for male versus female talkers. We hypothesized that time compression would have a greater effect for female talkers. Method Sentence materials from four talkers (two males) were time compressed, and original-speed and time-compressed speech materials were presented in a background of 12-talker babble to young adult listeners with normal hearing. Each talker/processing condition was heard by eight listeners (total N = 64). Generalized linear mixed-effects models were used to determine the effects of and interaction between processing condition and talker sex on keyword intelligibility. Additional post hoc analyses examined whether processing condition effects were related to talker vowel space and word frequency. Results For original-speed sentences, female and male talkers were essentially equally intelligible. Time compression reduced intelligibility for all talkers, but the effect was significantly greater for the female talkers. Supplementary analyses revealed that the effect of time compression depended on both talker vowel space and word frequency: The detrimental effect decreased significantly as word frequency and vowel space increased. Word frequency effects were also greater overall for talkers with larger vowel spaces. Conclusions While the small talker sample limits conclusions about the effects of talker sex, the secondary analyses suggest that intelligibility of talkers with larger vowel spaces is less susceptible to the negative effect of time compression, especially for high-frequency words.

