Latent space visualization, characterization, and generation of diverse vocal communication signals

2019 ◽  
Author(s):  
Tim Sainburg ◽  
Marvin Thielk ◽  
Timothy Q Gentner

ABSTRACT: Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species’ vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present here a set of computational methods that center around projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from data. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates, enabling high-powered comparative analyses of unbiased acoustic features in the communicative repertoires across species. Latent projections uncover complex features of data in visually intuitive and quantifiable ways. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication. Finally, we show how systematic sampling from latent representational spaces of vocalizations enables comprehensive investigations of perceptual and neural representations of complex and ecologically relevant acoustic feature spaces.
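The pipeline the abstract describes (vocal elements, fixed-size spectrogram features, a low-dimensional projection learned from the data) can be sketched minimally. The paper uses learned embeddings such as UMAP; the dependency-free stand-in below projects toy "spectrogram" vectors to 2-D with PCA via power iteration, so the data and the choice of PCA are illustrative, not the paper's method.

```python
import math, random

def mean_center(X):
    # Subtract the per-dimension mean so PCA captures variance, not offset.
    d = len(X[0])
    mu = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [[row[j] - mu[j] for j in range(d)] for row in X]

def top_component(X, iters=200):
    # Power iteration: v <- X^T (X v), normalized, converges to the top
    # principal direction of X.
    random.seed(0)
    d = len(X[0])
    v = [random.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        Xv = [sum(r[j] * v[j] for j in range(d)) for r in X]
        w = [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v

def project_2d(X):
    X = mean_center(X)
    v1 = top_component(X)
    # Deflate: remove the first component, then find the second.
    X1 = [[r[j] - sum(r[k] * v1[k] for k in range(len(v1))) * v1[j]
           for j in range(len(v1))] for r in X]
    v2 = top_component(X1)
    return [(sum(r[j] * v1[j] for j in range(len(v1))),
             sum(r[j] * v2[j] for j in range(len(v2)))) for r in X]

# Toy "spectrogram" vectors for two call types with different energy profiles.
calls = [[1, 0, 0, 0.1 * i] for i in range(5)] + \
        [[0, 1, 1, 0.1 * i] for i in range(5)]
coords = project_2d(calls)
```

In the 2-D coordinates, the two call types separate along the first component, which is the kind of visually intuitive cluster structure the latent projections in the paper reveal.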

2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

Variational graph autoencoder, which can encode structural information and attribute information in the graph into low-dimensional representations, has become a powerful method for studying graph-structured data. However, most existing methods based on variational (graph) autoencoders assume that the prior over latent variables is the standard normal distribution, which encourages all nodes to gather around 0 and prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge therefore becomes a challenge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we adopt a noninformative prior as the prior distribution of the latent variables. This prior enables the posterior distribution parameters to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to each block, thereby improving the interpretability of the model. The correlation within and between blocks is described by a block–block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
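A minimal illustration (not the paper's model) of why a standard-normal prior crowds latent codes around 0: the KL term of a variational autoencoder penalizes a posterior q = N(mu, sigma^2) in proportion to mu^2, pulling every node's code toward the origin. A flatter, noninformative prior weakens that pull, which is the motivation the abstract gives.

```python
import math

def kl_to_std_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ), the per-dimension VAE regularizer.
    return 0.5 * (sigma**2 + mu**2 - 1.0 - math.log(sigma**2))

near = kl_to_std_normal(0.1, 1.0)  # code near the origin: tiny penalty
far = kl_to_std_normal(3.0, 1.0)   # code far from the origin: large penalty
```

The penalty grows quadratically with distance from the origin, so nodes that would be most informative when spread out are exactly the ones the standard-normal prior pushes back toward 0.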


Animals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 1026
Author(s):  
Robin Walb ◽  
Lorenzo von Fersen ◽  
Theo Meijer ◽  
Kurt Hammerschmidt

Studies in animal communication have shown that many species have individually distinct calls. These individually distinct vocalizations can play an important role in animal communication because they can carry important information about the age, sex, personality, or social role of the signaler. Although we have good knowledge regarding the importance of individual vocalizations in socially living mammals, it is less clear to what extent solitary-living mammals possess individually distinct vocalizations. To answer this question, we recorded and analyzed the vocalizations of 14 captive adult Malayan tapirs (Tapirus indicus), six females and eight males. We investigated whether familiarity or relatedness had an influence on call similarity. In addition to sex-related differences, we found significant differences between all subjects, comparable to the individual differences found in highly social species. Surprisingly, kinship appeared to have no influence on call similarity, whereas familiar subjects exhibited significantly higher similarity in their harmonic calls compared to unfamiliar or related subjects. The results support the view that solitary animals can have individually distinct calls, like highly social animals. Therefore, it is likely that non-social factors, such as low visibility, influence call individuality. The increasing knowledge of their behavior will help to protect this endangered species.


2017 ◽  
Vol 284 (1855) ◽  
pp. 20170451 ◽  
Author(s):  
Henrik Brumm ◽  
Sue Anne Zollinger

Sophisticated vocal communication systems of birds and mammals, including human speech, are characterized by a high degree of plasticity in which signals are individually adjusted in response to changes in the environment. Here, we present, to our knowledge, the first evidence for vocal plasticity in a reptile. Like birds and mammals, tokay geckos (Gekko gecko) increased the duration of brief call notes in the presence of broadcast noise compared to quiet conditions, a behaviour that facilitates signal detection by receivers. By contrast, they did not adjust the amplitudes of their call syllables in noise (the Lombard effect), which is in line with the hypothesis that the Lombard effect has evolved independently in birds and mammals. However, the geckos used a different strategy to increase signal-to-noise ratios: instead of increasing the amplitude of a given call type when exposed to noise, the subjects produced more high-amplitude syllable types from their repertoire. Our findings demonstrate that reptile vocalizations are much more flexible than previously thought, including elaborate vocal plasticity that is also important for the complex signalling systems of birds and mammals. We suggest that signal detection constraints are one of the major forces driving the evolution of animal communication systems across different taxa.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Stefano Recanatesi ◽  
Matthew Farrell ◽  
Guillaume Lajoie ◽  
Sophie Deneve ◽  
Mattia Rigotti ◽  
...  

Abstract: Artificial neural networks have recently achieved many successes in solving sequential processing and planning tasks. Their success is often ascribed to the emergence of the task’s low-dimensional latent structure in the network activity – i.e., in the learned neural representations. Here, we investigate the hypothesis that a means for generating representations with easily accessed low-dimensional latent structure, possibly reflecting an underlying semantic organization, is through learning to predict observations about the world. Specifically, we ask whether and when network mechanisms for sensory prediction coincide with those for extracting the underlying latent variables. Using a recurrent neural network model trained to predict a sequence of observations we show that network dynamics exhibit low-dimensional but nonlinearly transformed representations of sensory inputs that map the latent structure of the sensory environment. We quantify these results using nonlinear measures of intrinsic dimensionality and linear decodability of latent variables, and provide mathematical arguments for why such useful predictive representations emerge. We focus throughout on how our results can aid the analysis and interpretation of experimental data.
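One half of the abstract's quantification, "linear decodability of latent variables", can be sketched as fitting a linear readout from unit activity to the latent variable and scoring it with R^2. The activity below is synthetic; the paper applies this to recurrent-network units, alongside nonlinear intrinsic-dimensionality estimates that are not sketched here.

```python
import random

random.seed(1)
latents = [i / 19 for i in range(20)]  # true latent variable, 0..1
# One synthetic "unit" whose activity carries the latent variable linearly
# plus a little noise (a stand-in for a recorded or simulated neuron).
activity = [0.8 * z + 0.02 * random.gauss(0, 1) for z in latents]

def linear_decode_r2(x, y):
    # Ordinary least squares y ~ w*x + b, scored by coefficient of
    # determination R^2 (1.0 = perfectly linearly decodable).
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    w = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
         / sum((a - xbar) ** 2 for a in x))
    b0 = ybar - w * xbar
    pred = [w * a + b0 for a in x]
    ss_res = sum((p - t) ** 2 for p, t in zip(pred, y))
    ss_tot = sum((t - ybar) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

r2 = linear_decode_r2(activity, latents)  # close to 1: linearly decodable
```

A latent variable that is present but nonlinearly entangled in the activity would score a much lower R^2 under this readout, which is why the paper pairs linear decodability with nonlinear dimensionality measures.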


2021 ◽  
Author(s):  
Nasim Winchester Vahidi

The mechanisms underlying how single auditory neurons and neuron populations encode natural and acoustically complex vocal signals, such as human speech or bird songs, are not well understood. Classical models focus on individual neurons, whose spike rates vary systematically as a function of change in a small number of simple acoustic dimensions. However, neurons in the caudal medial nidopallium (NCM), an auditory forebrain region in songbirds that is analogous to the secondary auditory cortex in mammals, have composite receptive fields (CRFs) that comprise multiple acoustic features tied to both increases and decreases in firing rates. Here, we investigated the anatomical organization and temporal activation patterns of auditory CRFs in European starlings exposed to natural vocal communication signals (songs). We recorded extracellular electrophysiological responses to various bird songs at auditory NCM sites, including both single and multiple neurons, and we then applied a quadratic model to extract large sets of CRF features that were tied to excitatory and suppressive responses at each measurement site. We found that the superset of CRF features yielded spatially and temporally distributed, generalizable representations of a conspecific song. Individual sites responded to acoustically diverse features, as there was no discernable organization of features across anatomically ordered sites. The CRF features at each site yielded broad, temporally distributed responses that spanned the entire duration of many starling songs, which can last for 50 s or more. Based on these results, we estimated that a nearly complete representation of any conspecific song, regardless of length, can be obtained by evaluating populations as small as 100 neurons. 
We conclude that natural acoustic communication signals drive a distributed yet highly redundant representation across the songbird auditory forebrain, in which adjacent neurons contribute to the encoding of multiple diverse and time-varying spectro-temporal features.
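The quadratic model referred to above can be sketched in its generic form (this is an assumed textbook parameterization, not the dissertation's fitted model): the response to a stimulus vector s is r(s) = b + w·s + sᵀQs, and the eigenvectors of Q with positive eigenvalues act as excitatory CRF features while those with negative eigenvalues act as suppressive features.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def quadratic_response(s, b, w, feats):
    # feats: list of (eigenvalue, unit-vector feature); Q = sum lam * f f^T,
    # so s^T Q s = sum lam * (f . s)^2.
    return b + dot(w, s) + sum(lam * dot(f, s) ** 2 for lam, f in feats)

b, w = 1.0, [0.0, 0.0]
feats = [(+2.0, [1.0, 0.0]),   # excitatory CRF feature
         (-2.0, [0.0, 1.0])]   # suppressive CRF feature

r_exc = quadratic_response([1.0, 0.0], b, w, feats)  # drives firing up
r_sup = quadratic_response([0.0, 1.0], b, w, feats)  # pushes firing down
```

A single site can thus carry several excitatory and suppressive features at once, which is what lets the superset of fitted CRF features tile a long song in a distributed way.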


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4291
Author(s):  
Xuejiao Gong ◽  
Bo Tang ◽  
Ruijin Zhu ◽  
Wenlong Liao ◽  
Like Song

Due to the strong concealment of electricity theft and the limitation of inspection resources, the number of electricity-theft samples available to the power department is insufficient, which limits the accuracy of theft detection. Therefore, a data augmentation method for electricity theft detection based on the conditional variational autoencoder (CVAE) is proposed. Firstly, the stealing power curves are mapped into low-dimensional latent variables by an encoder composed of convolutional layers, and new stealing power curves are reconstructed by a decoder composed of deconvolutional layers. Then, five typical attack models are proposed, and a convolutional neural network is constructed as a classifier according to the data characteristics of the stealing power curves. Finally, the effectiveness and adaptability of the proposed method are verified on a smart-meter dataset from London. The simulation results show that the CVAE can account for both the shapes and the distribution characteristics of samples, and that the generated stealing power curves improve classifier performance more than traditional augmentation methods such as random oversampling, the synthetic minority over-sampling technique, and the conditional generative adversarial network. Moreover, the method is suitable for different classifiers.
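The structural idea of a CVAE can be sketched without training: the encoder sees the curve together with its condition label, the decoder sees the sampled latent z together with the same label, so sampling different z under a fixed attack-model label generates new curves of that type. All layer functions below are toy stand-ins for the paper's (de)convolutional networks; only the wiring and the reparameterization trick are the point.

```python
import math, random

random.seed(0)

def encoder(curve, label):
    # Stand-in for the convolutional encoder: returns (mu, log_var) of
    # the conditional posterior q(z | x, y).
    s = sum(curve) / len(curve) + label
    return [s, 0.5 * s], [-2.0, -2.0]

def reparameterize(mu, log_var):
    # z = mu + sigma * eps: sampling expressed as a deterministic function
    # of noise, so gradients can flow through during training.
    return [m + math.exp(0.5 * lv) * random.gauss(0, 1)
            for m, lv in zip(mu, log_var)]

def decoder(z, label, length):
    # Stand-in for the deconvolutional decoder: maps (z, y) back to a curve.
    base = sum(z) / len(z)
    return [base + 0.01 * label] * length

curve = [0.4, 0.6, 0.5, 0.7]     # toy load curve
label = 2                         # one of the five attack models
mu, log_var = encoder(curve, label)
z = reparameterize(mu, log_var)
new_curve = decoder(z, label, len(curve))
```

Conditioning on the label is what distinguishes the CVAE from a plain VAE here: it lets the augmentation target exactly the under-represented theft class rather than the data distribution as a whole.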


2005 ◽  
Vol 94 (4) ◽  
pp. 2970-2975 ◽  
Author(s):  
Rajiv Narayan ◽  
Ayla Ergün ◽  
Kamal Sen

Although auditory cortex is thought to play an important role in processing complex natural sounds such as speech and animal vocalizations, the specific functional roles of cortical receptive fields (RFs) remain unclear. Here, we study the relationship between a behaviorally important function (the discrimination of natural sounds) and the structure of cortical RFs. We examine this problem in the model system of songbirds, using a computational approach. First, we constructed model neurons based on the spectral temporal RF (STRF), a widely used description of auditory cortical RFs. We focused on delayed inhibitory STRFs, a class of STRFs experimentally observed in primary auditory cortex (ACx) and its analog in songbirds (field L), which consist of an excitatory subregion and a delayed inhibitory subregion cotuned to a characteristic frequency. We quantified the discrimination of birdsongs by model neurons, examining both the dynamics and temporal resolution of discrimination, using a recently proposed spike distance metric (SDM). We found that single model neurons with delayed inhibitory STRFs can discriminate accurately between songs. Discrimination improves dramatically when the temporal structure of the neural response at fine timescales is considered. When we compared discrimination by model neurons with and without the inhibitory subregion, we found that the presence of the inhibitory subregion can improve discrimination. Finally, we modeled a cortical microcircuit with delayed synaptic inhibition, a candidate mechanism underlying delayed inhibitory STRFs, and showed that blocking inhibition in this model circuit degrades discrimination.
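The abstract does not spell out which spike distance metric was used; as an illustration, the van Rossum distance is one standard kernel-based choice. Each spike train is convolved with a causal exponential of time constant tau, and the distance is the L2 norm of the difference; for exponential kernels that integral has the closed form below. Varying tau is what lets such a metric probe the temporal resolution of discrimination.

```python
import math

def _kernel_sum(a, b, tau):
    # Sum of exp(-|t - s| / tau) over all spike pairs; the closed-form
    # building block of the squared van Rossum distance.
    return sum(math.exp(-abs(t - s) / tau) for t in a for s in b)

def van_rossum_distance(train1, train2, tau):
    d2 = 0.5 * (_kernel_sum(train1, train1, tau)
                + _kernel_sum(train2, train2, tau)
                - 2.0 * _kernel_sum(train1, train2, tau))
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding error

# Two trains differing by a 2 ms jitter (times in ms).
a = [10.0, 20.0, 30.0]
b = [12.0, 22.0, 32.0]
fine = van_rossum_distance(a, b, tau=1.0)    # fine resolution: jitter counts
coarse = van_rossum_distance(a, b, tau=50.0) # coarse resolution: it barely does
```

At small tau the metric behaves like a coincidence detector and the jittered trains look far apart; at large tau it approaches a rate comparison, mirroring the paper's finding that discrimination improves when fine temporal structure is considered.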


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yoshihiro Nagano ◽  
Ryo Karakida ◽  
Masato Okada

Abstract: Deep neural networks are good at extracting low-dimensional subspaces (latent spaces) that represent the essential features of a high-dimensional dataset. Deep generative models, represented by variational autoencoders (VAEs), can generate and infer high-quality datasets such as images. In particular, VAEs can eliminate the noise contained in an image by repeatedly mapping between the latent and data spaces. To clarify the mechanism of such denoising, we numerically analyzed how the activity pattern of trained networks changes in the latent space during inference. We treated the time development of the activity pattern for a specific input as one trajectory in the latent space and investigated the collective behavior of these inference trajectories across many data. Our study revealed that when a cluster structure exists in the dataset, the trajectory rapidly approaches the center of the cluster. This behavior is qualitatively consistent with the concept retrieval reported in associative memory models. Additionally, the larger the noise contained in the data, the closer the trajectory moved toward the center of a more global cluster. We demonstrated that increasing the number of latent variables enhances the tendency to approach a cluster center and improves the generalization ability of the VAE.
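A toy illustration (not the paper's trained VAE) of the reported dynamics: if one encode-decode pass acts, near a cluster, like a contraction toward the cluster center c (x maps to c + k * (x - c) with 0 < k < 1), then repeating the mapping drives any noisy input along a trajectory that converges to c, which is the denoising behavior the study observed in latent space.

```python
def encode_decode(x, center, k=0.5):
    # Stand-in for one pass through a trained VAE near a cluster: a
    # contraction with factor k toward the cluster center.
    return [c + k * (xi - c) for xi, c in zip(x, center)]

center = [1.0, -2.0]
x = [5.0, 4.0]          # noisy input far from the cluster center
trajectory = [x]
for _ in range(10):
    x = encode_decode(x, center)
    trajectory.append(x)

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

dist_start = dist(trajectory[0], center)
dist_end = dist(trajectory[-1], center)
```

Each iteration halves the distance to the center, so the trajectory converges geometrically, the same "rapid approach to the cluster center" the inference trajectories in the paper exhibit.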


2015 ◽  
Vol 370 (1664) ◽  
pp. 20140097 ◽  
Author(s):  
Martin Rohrmeier ◽  
Willem Zuidema ◽  
Geraint A. Wiggins ◽  
Constance Scharff

Human language, music and a variety of animal vocalizations constitute ways of sonic communication that exhibit remarkable structural complexity. While the complexities of language and possible parallels in animal communication have been discussed intensively, reflections on the complexity of music and animal song, and their comparisons, are underrepresented. In some ways, music and animal songs are more comparable to each other than to language, as propositional semantics cannot be used as an indicator of communicative success or well-formedness, and notions of grammaticality are less easily defined. This review brings together accounts of the principles of structure building in music and animal song. It relates them to corresponding models in formal language theory, the extended Chomsky hierarchy (CH), and their probabilistic counterparts. We further discuss common misunderstandings and shortcomings concerning the CH and suggest ways to move beyond it. We discuss language, music and animal song in the context of their function and motivation and further integrate problems and issues that are less commonly addressed in the context of language, including continuous event spaces, features of sound and timbre, representation of temporality and interactions of multiple parallel feature streams. We discuss these aspects in the light of recent theoretical, cognitive, neuroscientific and modelling research in the domains of music, language and animal song.
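A small, standard example of the Chomsky-hierarchy distinctions the review builds on: the pattern a^n b^n (perfectly matched nesting) is context-free but not regular, so recognizing it requires counting or a stack, whereas the superficially similar pattern a+b+ ("some a's then some b's") is regular and needs only a finite-state scan. The strings here are illustrative stand-ins for nested vs. merely sequential structure in song or music.

```python
def is_anbn(s):
    # Context-free language {a^n b^n, n >= 1}: membership requires counting.
    n = len(s)
    if n == 0 or n % 2:
        return False
    half = n // 2
    return s[:half] == "a" * half and s[half:] == "b" * half

def is_a_plus_b_plus(s):
    # Regular language a+b+: a single left-to-right finite-state scan suffices.
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    if i == 0 or i == len(s):
        return False
    return all(ch == "b" for ch in s[i:])

ok = is_anbn("aaabbb")   # balanced: accepted by both recognizers
bad = is_anbn("aaabb")   # unbalanced: a+b+ accepts it, a^n b^n rejects it
```

Much of the debate the review surveys turns on whether a given birdsong or musical corpus genuinely requires the counting-style recognizer or can be explained by the finite-state one.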


2019 ◽  
Vol 94 (Suppl. 1-4) ◽  
pp. 51-60
Author(s):  
Julie E. Elie ◽  
Susanne Hoffmann ◽  
Jeffery L. Dunning ◽  
Melissa J. Coleman ◽  
Eric S. Fortune ◽  
...  

Acoustic communication signals are typically generated to influence the behavior of conspecific receivers. In songbirds, for instance, such cues are routinely used by males to influence the behavior of females and rival males. There is remarkable diversity in vocalizations across songbird species, and the mechanisms of vocal production have been studied extensively, yet there has been comparatively little emphasis on how the receiver perceives those signals and uses that information to direct subsequent actions. Here, we emphasize the receiver as an active participant in the communication process. The roles of sender and receiver can alternate between individuals, resulting in an emergent feedback loop that governs the behavior of both. We describe three lines of research that are beginning to reveal the neural mechanisms that underlie the reciprocal exchange of information in communication. These lines of research focus on the perception of the repertoire of songbird vocalizations, evaluation of vocalizations in mate choice, and the coordination of duet singing.

