Latent space visualization, characterization, and generation of diverse vocal communication signals

2019 ◽  
Author(s):  
Tim Sainburg ◽  
Marvin Thielk ◽  
Timothy Q Gentner

ABSTRACT: Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species’ vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present here a set of computational methods that center around projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from data. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates, enabling high-powered comparative analyses of unbiased acoustic features in the communicative repertoires across species. Latent projections uncover complex features of data in visually intuitive and quantifiable ways. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication. Finally, we show how systematic sampling from latent representational spaces of vocalizations enables comprehensive investigations of perceptual and neural representations of complex and ecologically relevant acoustic feature spaces.
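The pipeline the abstract describes (vocal elements, fixed-size spectrogram features, a low-dimensional projection learned from the data) can be sketched minimally. The paper uses learned embeddings such as UMAP; the dependency-free stand-in below projects toy "spectrogram" vectors to 2-D with PCA via power iteration, so the data and the choice of PCA are illustrative, not the paper's method.

```python
import math, random

def mean_center(X):
    # Subtract the per-dimension mean so PCA captures variance, not offset.
    d = len(X[0])
    mu = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [[row[j] - mu[j] for j in range(d)] for row in X]

def top_component(X, iters=200):
    # Power iteration: v <- X^T (X v), normalized, converges to the top
    # principal direction of X.
    random.seed(0)
    d = len(X[0])
    v = [random.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        Xv = [sum(r[j] * v[j] for j in range(d)) for r in X]
        w = [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v

def project_2d(X):
    X = mean_center(X)
    v1 = top_component(X)
    # Deflate: remove the first component, then find the second.
    X1 = [[r[j] - sum(r[k] * v1[k] for k in range(len(v1))) * v1[j]
           for j in range(len(v1))] for r in X]
    v2 = top_component(X1)
    return [(sum(r[j] * v1[j] for j in range(len(v1))),
             sum(r[j] * v2[j] for j in range(len(v2)))) for r in X]

# Toy "spectrogram" vectors for two call types with different energy profiles.
calls = [[1, 0, 0, 0.1 * i] for i in range(5)] + \
        [[0, 1, 1, 0.1 * i] for i in range(5)]
coords = project_2d(calls)
```

In the 2-D coordinates, the two call types separate along the first component, which is the kind of visually intuitive cluster structure the latent projections in the paper reveal.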

2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

Variational graph autoencoder, which can encode structural information and attribute information in the graph into low-dimensional representations, has become a powerful method for studying graph-structured data. However, most existing methods based on variational (graph) autoencoders assume that the prior over latent variables is the standard normal distribution, which encourages all nodes to gather around 0 and prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge therefore becomes a challenge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we adopt a noninformative prior as the prior distribution of the latent variables. This prior enables the posterior distribution parameters to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to each block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
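A minimal illustration (not the paper's model) of why a standard-normal prior crowds latent codes around 0: the KL term of a variational autoencoder penalizes a posterior q = N(mu, sigma^2) in proportion to mu^2, pulling every node's code toward the origin. A flatter, noninformative prior weakens that pull, which is the motivation the abstract gives.

```python
import math

def kl_to_std_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ), the per-dimension VAE regularizer.
    return 0.5 * (sigma**2 + mu**2 - 1.0 - math.log(sigma**2))

near = kl_to_std_normal(0.1, 1.0)  # code near the origin: tiny penalty
far = kl_to_std_normal(3.0, 1.0)   # code far from the origin: large penalty
```

The penalty grows quadratically with distance from the origin, so nodes that would be most informative when spread out are exactly the ones the standard-normal prior pushes back toward 0.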


Animals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 1026
Author(s):  
Robin Walb ◽  
Lorenzo von Fersen ◽  
Theo Meijer ◽  
Kurt Hammerschmidt

Studies in animal communication have shown that many species have individually distinct calls. These individually distinct vocalizations can play an important role in animal communication because they can carry important information about the age, sex, personality, or social role of the signaler. Although we have good knowledge regarding the importance of individual vocalizations in socially living mammals, it is less clear to what extent solitary-living mammals possess individually distinct vocalizations. To answer this question, we recorded and analyzed the vocalizations of 14 captive adult Malayan tapirs (Tapirus indicus), six females and eight males. We investigated whether familiarity or relatedness had an influence on call similarity. In addition to sex-related differences, we found significant differences between all subjects, comparable to the individual differences found in highly social species. Surprisingly, kinship appeared to have no influence on call similarity, whereas familiar subjects exhibited significantly higher similarity in their harmonic calls compared to unfamiliar or related subjects. The results support the view that solitary animals can have individually distinct calls, like highly social animals. Therefore, it is likely that non-social factors, such as low visibility, influence call individuality. The increasing knowledge of their behavior will help to protect this endangered species.


2017 ◽  
Vol 284 (1855) ◽  
pp. 20170451 ◽  
Author(s):  
Henrik Brumm ◽  
Sue Anne Zollinger

Sophisticated vocal communication systems of birds and mammals, including human speech, are characterized by a high degree of plasticity in which signals are individually adjusted in response to changes in the environment. Here, we present, to our knowledge, the first evidence for vocal plasticity in a reptile. Like birds and mammals, tokay geckos (Gekko gecko) increased the duration of brief call notes in the presence of broadcast noise compared to quiet conditions, a behaviour that facilitates signal detection by receivers. By contrast, they did not adjust the amplitudes of their call syllables in noise (the Lombard effect), which is in line with the hypothesis that the Lombard effect has evolved independently in birds and mammals. However, the geckos used a different strategy to increase signal-to-noise ratios: instead of increasing the amplitude of a given call type when exposed to noise, the subjects produced more high-amplitude syllable types from their repertoire. Our findings demonstrate that reptile vocalizations are much more flexible than previously thought, including elaborate vocal plasticity that is also important for the complex signalling systems of birds and mammals. We suggest that signal detection constraints are one of the major forces driving the evolution of animal communication systems across different taxa.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Stefano Recanatesi ◽  
Matthew Farrell ◽  
Guillaume Lajoie ◽  
Sophie Deneve ◽  
Mattia Rigotti ◽  
...  

Abstract: Artificial neural networks have recently achieved many successes in solving sequential processing and planning tasks. Their success is often ascribed to the emergence of the task’s low-dimensional latent structure in the network activity – i.e., in the learned neural representations. Here, we investigate the hypothesis that a means for generating representations with easily accessed low-dimensional latent structure, possibly reflecting an underlying semantic organization, is through learning to predict observations about the world. Specifically, we ask whether and when network mechanisms for sensory prediction coincide with those for extracting the underlying latent variables. Using a recurrent neural network model trained to predict a sequence of observations we show that network dynamics exhibit low-dimensional but nonlinearly transformed representations of sensory inputs that map the latent structure of the sensory environment. We quantify these results using nonlinear measures of intrinsic dimensionality and linear decodability of latent variables, and provide mathematical arguments for why such useful predictive representations emerge. We focus throughout on how our results can aid the analysis and interpretation of experimental data.
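One half of the abstract's quantification, "linear decodability of latent variables", can be sketched as fitting a linear readout from unit activity to the latent variable and scoring it with R^2. The activity below is synthetic; the paper applies this to recurrent-network units, alongside nonlinear intrinsic-dimensionality estimates that are not sketched here.

```python
import random

random.seed(1)
latents = [i / 19 for i in range(20)]  # true latent variable, 0..1
# One synthetic "unit" whose activity carries the latent variable linearly
# plus a little noise (a stand-in for a recorded or simulated neuron).
activity = [0.8 * z + 0.02 * random.gauss(0, 1) for z in latents]

def linear_decode_r2(x, y):
    # Ordinary least squares y ~ w*x + b, scored by coefficient of
    # determination R^2 (1.0 = perfectly linearly decodable).
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    w = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
         / sum((a - xbar) ** 2 for a in x))
    b0 = ybar - w * xbar
    pred = [w * a + b0 for a in x]
    ss_res = sum((p - t) ** 2 for p, t in zip(pred, y))
    ss_tot = sum((t - ybar) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

r2 = linear_decode_r2(activity, latents)  # close to 1: linearly decodable
```

A latent variable that is present but nonlinearly entangled in the activity would score a much lower R^2 under this readout, which is why the paper pairs linear decodability with nonlinear dimensionality measures.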


2021 ◽  
Author(s):  
Nasim Winchester Vahidi

The mechanisms underlying how single auditory neurons and neuron populations encode natural and acoustically complex vocal signals, such as human speech or bird songs, are not well understood. Classical models focus on individual neurons, whose spike rates vary systematically as a function of change in a small number of simple acoustic dimensions. However, neurons in the caudal medial nidopallium (NCM), an auditory forebrain region in songbirds that is analogous to the secondary auditory cortex in mammals, have composite receptive fields (CRFs) that comprise multiple acoustic features tied to both increases and decreases in firing rates. Here, we investigated the anatomical organization and temporal activation patterns of auditory CRFs in European starlings exposed to natural vocal communication signals (songs). We recorded extracellular electrophysiological responses to various bird songs at auditory NCM sites, including both single and multiple neurons, and we then applied a quadratic model to extract large sets of CRF features that were tied to excitatory and suppressive responses at each measurement site. We found that the superset of CRF features yielded spatially and temporally distributed, generalizable representations of a conspecific song. Individual sites responded to acoustically diverse features, as there was no discernable organization of features across anatomically ordered sites. The CRF features at each site yielded broad, temporally distributed responses that spanned the entire duration of many starling songs, which can last for 50 s or more. Based on these results, we estimated that a nearly complete representation of any conspecific song, regardless of length, can be obtained by evaluating populations as small as 100 neurons. 
We conclude that natural acoustic communication signals drive a distributed yet highly redundant representation across the songbird auditory forebrain, in which adjacent neurons contribute to the encoding of multiple diverse and time-varying spectro-temporal features.
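The quadratic model referred to above can be sketched in its generic form (this is an assumed textbook parameterization, not the dissertation's fitted model): the response to a stimulus vector s is r(s) = b + w·s + sᵀQs, and the eigenvectors of Q with positive eigenvalues act as excitatory CRF features while those with negative eigenvalues act as suppressive features.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quadratic_response(s, b, w, feats):
    # feats: list of (eigenvalue, unit-vector feature); Q = sum lam * f f^T,
    # so s^T Q s = sum lam * (f . s)^2.
    return b + dot(w, s) + sum(lam * dot(f, s) ** 2 for lam, f in feats)

b, w = 1.0, [0.0, 0.0]
feats = [(+2.0, [1.0, 0.0]),   # excitatory CRF feature
         (-2.0, [0.0, 1.0])]   # suppressive CRF feature

r_exc = quadratic_response([1.0, 0.0], b, w, feats)  # drives firing up
r_sup = quadratic_response([0.0, 1.0], b, w, feats)  # pushes firing down
```

A single site can thus carry several excitatory and suppressive features at once, which is what lets the superset of fitted CRF features tile a long song in a distributed way.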


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4291
Author(s):  
Xuejiao Gong ◽  
Bo Tang ◽  
Ruijin Zhu ◽  
Wenlong Liao ◽  
Like Song

Due to the strong concealment of electricity theft and the limitation of inspection resources, the number of electricity-theft samples available to the power department is insufficient, which limits the accuracy of theft detection. Therefore, a data augmentation method for electricity theft detection based on the conditional variational autoencoder (CVAE) is proposed. Firstly, the stealing power curves are mapped into low-dimensional latent variables by an encoder composed of convolutional layers, and new stealing power curves are reconstructed by a decoder composed of deconvolutional layers. Then, five typical attack models are proposed, and a convolutional neural network is constructed as a classifier according to the data characteristics of the stealing power curves. Finally, the effectiveness and adaptability of the proposed method are verified on a smart-meter dataset from London. The simulation results show that the CVAE can account for both the shapes and the distribution characteristics of samples, and that the generated stealing power curves improve classifier performance more than traditional augmentation methods such as random oversampling, the synthetic minority over-sampling technique, and the conditional generative adversarial network. Moreover, the method is suitable for different classifiers.
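The structural idea of a CVAE can be sketched without training: the encoder sees the curve together with its condition label, the decoder sees the sampled latent z together with the same label, so sampling different z under a fixed attack-model label generates new curves of that type. All layer functions below are toy stand-ins for the paper's (de)convolutional networks; only the wiring and the reparameterization trick are the point.

```python
import math, random

random.seed(0)

def encoder(curve, label):
    # Stand-in for the convolutional encoder: returns (mu, log_var) of
    # the conditional posterior q(z | x, y).
    s = sum(curve) / len(curve) + label
    return [s, 0.5 * s], [-2.0, -2.0]

def reparameterize(mu, log_var):
    # z = mu + sigma * eps: sampling expressed as a deterministic function
    # of noise, so gradients can flow through during training.
    return [m + math.exp(0.5 * lv) * random.gauss(0, 1)
            for m, lv in zip(mu, log_var)]

def decoder(z, label, length):
    # Stand-in for the deconvolutional decoder: maps (z, y) back to a curve.
    base = sum(z) / len(z)
    return [base + 0.01 * label] * length

curve = [0.4, 0.6, 0.5, 0.7]     # toy load curve
label = 2                         # one of the five attack models
mu, log_var = encoder(curve, label)
z = reparameterize(mu, log_var)
new_curve = decoder(z, label, len(curve))
```

Conditioning on the label is what distinguishes the CVAE from a plain VAE here: it lets the augmentation target exactly the under-represented theft class rather than the data distribution as a whole.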


2005 ◽  
Vol 94 (4) ◽  
pp. 2970-2975 ◽  
Author(s):  
Rajiv Narayan ◽  
Ayla Ergün ◽  
Kamal Sen

Although auditory cortex is thought to play an important role in processing complex natural sounds such as speech and animal vocalizations, the specific functional roles of cortical receptive fields (RFs) remain unclear. Here, we study the relationship between a behaviorally important function (the discrimination of natural sounds) and the structure of cortical RFs. We examine this problem in the model system of songbirds, using a computational approach. First, we constructed model neurons based on the spectral temporal RF (STRF), a widely used description of auditory cortical RFs. We focused on delayed inhibitory STRFs, a class of STRFs experimentally observed in primary auditory cortex (ACx) and its analog in songbirds (field L), which consist of an excitatory subregion and a delayed inhibitory subregion cotuned to a characteristic frequency. We quantified the discrimination of birdsongs by model neurons, examining both the dynamics and temporal resolution of discrimination, using a recently proposed spike distance metric (SDM). We found that single model neurons with delayed inhibitory STRFs can discriminate accurately between songs. Discrimination improves dramatically when the temporal structure of the neural response at fine timescales is considered. When we compared discrimination by model neurons with and without the inhibitory subregion, we found that the presence of the inhibitory subregion can improve discrimination. Finally, we modeled a cortical microcircuit with delayed synaptic inhibition, a candidate mechanism underlying delayed inhibitory STRFs, and showed that blocking inhibition in this model circuit degrades discrimination.
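The abstract does not spell out which spike distance metric was used; as an illustration, the van Rossum distance is one standard kernel-based choice. Each spike train is convolved with a causal exponential of time constant tau, and the distance is the L2 norm of the difference; for exponential kernels that integral has the closed form below. Varying tau is what lets such a metric probe the temporal resolution of discrimination.

```python
import math

def _kernel_sum(a, b, tau):
    # Sum of exp(-|t - s| / tau) over all spike pairs; the closed-form
    # building block of the squared van Rossum distance.
    return sum(math.exp(-abs(t - s) / tau) for t in a for s in b)

def van_rossum_distance(train1, train2, tau):
    d2 = 0.5 * (_kernel_sum(train1, train1, tau)
                + _kernel_sum(train2, train2, tau)
                - 2.0 * _kernel_sum(train1, train2, tau))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding error

# Two trains differing by a 2 ms jitter (times in ms).
a = [10.0, 20.0, 30.0]
b = [12.0, 22.0, 32.0]
fine = van_rossum_distance(a, b, tau=1.0)    # fine resolution: jitter counts
coarse = van_rossum_distance(a, b, tau=50.0) # coarse resolution: it barely does
```

At small tau the metric behaves like a coincidence detector and the jittered trains look far apart; at large tau it approaches a rate comparison, mirroring the paper's finding that discrimination improves when fine temporal structure is considered.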


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yoshihiro Nagano ◽  
Ryo Karakida ◽  
Masato Okada

Abstract: Deep neural networks are good at extracting low-dimensional subspaces (latent spaces) that represent the essential features of a high-dimensional dataset. Deep generative models, represented by variational autoencoders (VAEs), can generate and infer high-quality datasets such as images. In particular, VAEs can eliminate the noise contained in an image by repeatedly mapping between the latent and data spaces. To clarify the mechanism of such denoising, we numerically analyzed how the activity pattern of trained networks changes in the latent space during inference. We treated the time development of the activity pattern for a specific input as one trajectory in the latent space and investigated the collective behavior of these inference trajectories across many data. Our study revealed that when a cluster structure exists in the dataset, the trajectory rapidly approaches the center of the cluster. This behavior is qualitatively consistent with the concept retrieval reported in associative memory models. Additionally, the larger the noise contained in the data, the closer the trajectory moved toward the center of a more global cluster. We demonstrated that increasing the number of latent variables enhances the tendency to approach a cluster center and improves the generalization ability of the VAE.
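A toy illustration (not the paper's trained VAE) of the reported dynamics: if one encode-decode pass acts, near a cluster, like a contraction toward the cluster center c (x maps to c + k * (x - c) with 0 < k < 1), then repeating the mapping drives any noisy input along a trajectory that converges to c, which is the denoising behavior the study observed in latent space.

```python
def encode_decode(x, center, k=0.5):
    # Stand-in for one pass through a trained VAE near a cluster: a
    # contraction with factor k toward the cluster center.
    return [c + k * (xi - c) for xi, c in zip(x, center)]

center = [1.0, -2.0]
x = [5.0, 4.0]          # noisy input far from the cluster center
trajectory = [x]
for _ in range(10):
    x = encode_decode(x, center)
    trajectory.append(x)

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

dist_start = dist(trajectory[0], center)
dist_end = dist(trajectory[-1], center)
```

Each iteration halves the distance to the center, so the trajectory converges geometrically, the same "rapid approach to the cluster center" the inference trajectories in the paper exhibit.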


2015 ◽  
Vol 370 (1664) ◽  
pp. 20140097 ◽  
Author(s):  
Martin Rohrmeier ◽  
Willem Zuidema ◽  
Geraint A. Wiggins ◽  
Constance Scharff

Human language, music and a variety of animal vocalizations constitute ways of sonic communication that exhibit remarkable structural complexity. While the complexities of language and possible parallels in animal communication have been discussed intensively, reflections on the complexity of music and animal song, and their comparisons, are underrepresented. In some ways, music and animal songs are more comparable to each other than to language, as propositional semantics cannot be used as an indicator of communicative success or well-formedness, and notions of grammaticality are less easily defined. This review brings together accounts of the principles of structure building in music and animal song. It relates them to corresponding models in formal language theory, the extended Chomsky hierarchy (CH), and their probabilistic counterparts. We further discuss common misunderstandings and shortcomings concerning the CH and suggest ways to move beyond it. We discuss language, music and animal song in the context of their function and motivation and further integrate problems and issues that are less commonly addressed in the context of language, including continuous event spaces, features of sound and timbre, representation of temporality and interactions of multiple parallel feature streams. We discuss these aspects in the light of recent theoretical, cognitive, neuroscientific and modelling research in the domains of music, language and animal song.
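A small, standard example of the Chomsky-hierarchy distinctions the review builds on: the pattern a^n b^n (perfectly matched nesting) is context-free but not regular, so recognizing it requires counting or a stack, whereas the superficially similar pattern a+b+ ("some a's then some b's") is regular and needs only a finite-state scan. The strings here are illustrative stand-ins for nested vs. merely sequential structure in song or music.

```python
def is_anbn(s):
    # Context-free language {a^n b^n, n >= 1}: membership requires counting.
    n = len(s)
    if n == 0 or n % 2:
        return False
    half = n // 2
    return s[:half] == "a" * half and s[half:] == "b" * half

def is_a_plus_b_plus(s):
    # Regular language a+b+: a single left-to-right finite-state scan suffices.
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    if i == 0 or i == len(s):
        return False
    return all(ch == "b" for ch in s[i:])

ok = is_anbn("aaabbb")   # balanced: accepted by both recognizers
bad = is_anbn("aaabb")   # unbalanced: a+b+ accepts it, a^n b^n rejects it
```

Much of the debate the review surveys turns on whether a given birdsong or musical corpus genuinely requires the counting-style recognizer or can be explained by the finite-state one.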


2019 ◽  
Vol 94 (Suppl. 1-4) ◽  
pp. 51-60
Author(s):  
Julie E. Elie ◽  
Susanne Hoffmann ◽  
Jeffery L. Dunning ◽  
Melissa J. Coleman ◽  
Eric S. Fortune ◽  
...  

Acoustic communication signals are typically generated to influence the behavior of conspecific receivers. In songbirds, for instance, such cues are routinely used by males to influence the behavior of females and rival males. There is remarkable diversity in vocalizations across songbird species, and the mechanisms of vocal production have been studied extensively, yet there has been comparatively little emphasis on how the receiver perceives those signals and uses that information to direct subsequent actions. Here, we emphasize the receiver as an active participant in the communication process. The roles of sender and receiver can alternate between individuals, resulting in an emergent feedback loop that governs the behavior of both. We describe three lines of research that are beginning to reveal the neural mechanisms that underlie the reciprocal exchange of information in communication. These lines of research focus on the perception of the repertoire of songbird vocalizations, evaluation of vocalizations in mate choice, and the coordination of duet singing.

