A solution to the learning dilemma for recurrent networks of spiking neurons

Mapping Intimacies ◽

10.1101/738385 ◽

2019 ◽

Cited By ~ 8

Author(s):

Guillaume Bellec ◽

Franz Scherr ◽

Anand Subramoney ◽

Elias Hajek ◽

Darjan Salaj ◽

...

Keyword(s):

Recurrent Neural Networks ◽

Energy Efficient ◽

Gradient Descent ◽

Spiking Neurons ◽

Recurrent Networks ◽

Network Learning ◽

Backpropagation Through Time ◽

New Learning ◽

On Chip ◽

The Brain

Abstract: Recurrently connected networks of spiking neurons underlie the astounding information processing capabilities of the brain. But in spite of extensive research, it has remained open how they can learn through synaptic plasticity to carry out complex network computations. We argue that two pieces of this puzzle were provided by experimental data from neuroscience. A new mathematical insight tells us how these pieces need to be combined to enable biologically plausible online network learning through gradient descent, in particular deep reinforcement learning. This new learning method – called e-prop – approaches the performance of BPTT (backpropagation through time), the best known method for training recurrent neural networks in machine learning. In addition, it suggests a method for powerful on-chip learning in novel energy-efficient spike-based hardware for AI.

STDP Forms Associations between Memory Traces in Networks of Spiking Neurons

Cerebral Cortex ◽

10.1093/cercor/bhz140 ◽

2019 ◽

Vol 30 (3) ◽

pp. 952-968

Author(s):

Christoph Pokorny ◽

Matias J Ison ◽

Arjun Rao ◽

Robert Legenstein ◽

Christos Papadimitriou ◽

...

Keyword(s):

Brain Function ◽

Spatial Information ◽

Spike Timing ◽

Spiking Neurons ◽

Recurrent Networks ◽

Neural Codes ◽

Memory Traces ◽

Conflicting Evidence ◽

Excitability Of Neurons ◽

The Brain

Abstract: Memory traces and associations between them are fundamental for cognitive brain function. Neuron recordings suggest that distributed assemblies of neurons in the brain serve as memory traces for spatial information, real-world items, and concepts. However, there is conflicting evidence regarding neural codes for associated memory traces. Some studies suggest the emergence of overlaps between assemblies during an association, while others suggest that the assemblies themselves remain largely unchanged and new assemblies emerge as neural codes for associated memory items. Here we study the emergence of neural codes for associated memory items in a generic computational model of recurrent networks of spiking neurons with a data-constrained rule for spike-timing-dependent plasticity. The model depends critically on 2 parameters, which control the excitability of neurons and the scale of initial synaptic weights. By modifying these 2 parameters, the model can reproduce both experimental data from the human brain on the fast formation of associations through emergent overlaps between assemblies, and rodent data where new neurons are recruited to encode the associated memories. Hence, our findings suggest that the brain can use both of these 2 neural codes for associations, and dynamically switch between them during consolidation.

Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks

10.1101/2021.03.22.436372 ◽

2021 ◽

Author(s):

Bojian Yin ◽

Federico Corradi ◽

Sander M. Bohté

Keyword(s):

Neural Networks ◽

Time Domain ◽

Recurrent Neural Networks ◽

State Of The Art ◽

Spiking Neurons ◽

Recurrent Networks ◽

Computationally Efficient ◽

Hardware Implementations ◽

Comparable Performance ◽

The Time Domain

Abstract: Inspired by more detailed modeling of biological neurons, spiking neural networks (SNNs) have been investigated both as more biologically plausible and potentially more powerful models of neural computation, and also with the aim of extracting biological neurons’ energy efficiency; the performance of such networks, however, has remained lacking compared to classical artificial neural networks (ANNs). Here, we demonstrate how a novel surrogate gradient combined with recurrent networks of tunable and adaptive spiking neurons yields state-of-the-art performance for SNNs on challenging benchmarks in the time domain, like speech and gesture recognition. This also exceeds the performance of standard classical recurrent neural networks (RNNs) and approaches that of the best modern ANNs. As these SNNs exhibit sparse spiking, we show that they are theoretically one to three orders of magnitude more computationally efficient than RNNs with comparable performance. Together, this positions SNNs as an attractive solution for AI hardware implementations.

Daphne: Data Parallelism Neural Network Simulator

International Journal of Modern Physics C ◽

10.1142/s0129183193000045 ◽

1993 ◽

Vol 04 (01) ◽

pp. 17-28

Author(s):

Paolo Frasconi ◽

Marco Gori ◽

Giovanni Soda

Keyword(s):

Recurrent Neural Networks ◽

Network Architecture ◽

Training Data ◽

Data Parallelism ◽

Network Simulator ◽

Recurrent Networks ◽

Feedforward Networks ◽

Backpropagation Through Time ◽

Connection Machine ◽

Execution Model

In this paper we describe the guideline of Daphne, a parallel simulator for supervised recurrent neural networks trained by Backpropagation through time. The simulator has a modular structure, based on a parallel training kernel running on the CM-2 Connection Machine. The training kernel is written in CM Fortran in order to exploit some advantages of the slicewise execution model. The other modules are written in serial C code. They are used for designing and testing the network, and for interfacing with the training data. A dedicated language is available for defining the network architecture, which allows the use of linked modules. The implementation of the learning procedures is based on training example parallelism. This dimension of parallelism has been found to be effective for learning static patterns using feedforward networks. We extend training example parallelism for learning sequences with full recurrent networks. Daphne is mainly conceived for applications in the field of Automatic Speech Recognition, though it can also serve for simulating feedforward networks.

Local online learning in recurrent networks with random feedback

10.1101/458570 ◽

2018 ◽

Author(s):

James M. Murray

Keyword(s):

Online Learning ◽

Computational Neuroscience ◽

Learning Rule ◽

Recurrent Networks ◽

Biological Features ◽

Mathematical Arguments ◽

Learning Rules ◽

New Learning ◽

Gradient Based ◽

The Brain

Abstract: A longstanding challenge for computational neuroscience has been the development of biologically plausible learning rules for recurrent neural networks (RNNs) enabling the production and processing of time-dependent signals such as those that might drive movement or facilitate working memory. Classic gradient-based algorithms for training RNNs have been available for decades, but they are inconsistent with known biological features of the brain, such as causality and locality. In this work we derive an approximation to gradient-based learning that comports with these biologically motivated constraints. Specifically, the online learning rule for the synaptic weights involves only local information about the pre- and postsynaptic activities, in addition to a random feedback projection of the RNN output error. In addition to providing mathematical arguments for the effectiveness of the new learning rule, we show through simulations that it can be used to train an RNN to successfully perform a variety of tasks. Finally, to overcome the difficulty of training an RNN over a very large number of timesteps, we propose an augmented circuit architecture that allows the RNN to concatenate short-duration patterns into sequences of longer duration.
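The rule described in this abstract (local pre- and postsynaptic information combined with a random feedback projection of the output error) can be illustrated with a rough NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the leaky tanh units, the low-pass filtered eligibility trace `P`, the time constant `tau`, the single learning rate `eta`, the fixed input weights, the frequency-doubling toy task, and the function name `train_local_random_feedback`.

```python
import numpy as np

def train_local_random_feedback(n_hidden=30, n_trials=100, T=100,
                                tau=5.0, eta=0.01, seed=0):
    """Online training of a small RNN with a local, random-feedback rule (sketch)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = 2, 1
    W = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
    W_in = rng.normal(size=(n_hidden, n_in))    # kept fixed in this sketch
    W_out = np.zeros((n_out, n_hidden))
    B = rng.normal(size=(n_hidden, n_out))      # fixed random feedback matrix

    # Hypothetical toy task: inputs sin/cos of a phase, target sin of twice the phase.
    phase = 2 * np.pi * np.arange(T) / T
    X = np.stack([np.sin(phase), np.cos(phase)], axis=1)
    Y = np.sin(2 * phase)[:, None]

    errors = []
    for _ in range(n_trials):
        h = np.zeros(n_hidden)
        P = np.zeros_like(W)                    # eligibility trace, one entry per synapse
        sq_err = 0.0
        for t in range(T):
            h_prev = h
            u = W @ h_prev + W_in @ X[t]
            h = (1 - 1 / tau) * h_prev + (1 / tau) * np.tanh(u)
            err = Y[t] - W_out @ h
            # Local trace: postsynaptic gain times presynaptic activity, low-pass filtered.
            P = (1 - 1 / tau) * P + (1 / tau) * np.outer(1 - np.tanh(u) ** 2, h_prev)
            # Recurrent update: randomly projected error times the local trace.
            W += eta * (B @ err)[:, None] * P
            # Readout update: plain delta rule.
            W_out += eta * np.outer(err, h)
            sq_err += float(err @ err)
        errors.append(sq_err / T)
    return np.array(errors), W, W_out

errors, W, W_out = train_local_random_feedback()
```

The update is causal (it uses only quantities available at the current time step) and each recurrent synapse needs only its own trace plus one feedback value per postsynaptic unit. Plotting `errors` over trials shows whether this particular toy setup learns; no performance claim is made here.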

Weight Perturbation: An Optimal Architecture and Learning Technique for Analog VLSI Feedforward and Recurrent Multilayer Networks

Neural Computation ◽

10.1162/neco.1991.3.4.546 ◽

1991 ◽

Vol 3 (4) ◽

pp. 546-565 ◽

Cited By ~ 20

Author(s):

Marwan Jabri ◽

Barry Flower

Keyword(s):

Gradient Descent ◽

Analog Vlsi ◽

Multilayer Perceptrons ◽

Recurrent Networks ◽

Discrete Level ◽

Hardware Complexity ◽

Learning Technique ◽

Analog Implementation ◽

On Chip ◽

Direct Approximation

Previous work on analog VLSI implementation of multilayer perceptrons with on-chip learning has mainly targeted the implementation of algorithms like backpropagation. Although backpropagation is efficient, its implementation in analog VLSI requires excessive computational hardware. In this paper we show that, for analog parallel implementations, the use of gradient descent with direct approximation of the gradient using “weight perturbation” instead of backpropagation significantly reduces hardware complexity. Gradient descent by weight perturbation eliminates the need for derivative and bidirectional circuits for on-chip learning, and access to the output states of neurons in hidden layers for off-chip learning. We also show that weight perturbation can be used to implement recurrent networks. A discrete level analog implementation showing the training of an XOR network as an example is described.
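As a software illustration of gradient descent with a directly approximated gradient, the sketch below perturbs one weight at a time and uses the resulting change in error as the gradient estimate, on an XOR task like the one mentioned in the abstract. The 2-2-1 sigmoid architecture, the perturbation size `delta`, the learning rate, and all function names are assumptions chosen for the example; this is not the analog circuit described in the paper.

```python
import numpy as np

def forward(w, x):
    """2-2-1 sigmoid network; w is a flat vector of 9 parameters (weights and biases)."""
    W1 = w[:4].reshape(2, 2)
    b1 = w[4:6]
    W2 = w[6:8]
    b2 = w[8]
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def loss(w, X, y):
    """Mean squared error over the training set."""
    return float(np.mean((forward(w, X) - y) ** 2))

def wp_gradient(w, X, y, delta=1e-4):
    """Estimate dE/dw_i as (E(w + delta * e_i) - E(w)) / delta, one weight at a time."""
    base = loss(w, X, y)
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_pert = w.copy()
        w_pert[i] += delta
        grad[i] = (loss(w_pert, X, y) - base) / delta
    return grad

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

rng = np.random.default_rng(0)
w = rng.normal(size=9)
initial_loss = loss(w, X, y)
for _ in range(2000):
    w -= 0.5 * wp_gradient(w, X, y)   # gradient descent using only forward passes
final_loss = loss(w, X, y)
```

Only forward evaluations of the error are needed, so no derivative or backward pass has to be computed. The cost is one extra error evaluation per weight and per update.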

Energy Complexity of Recurrent Neural Networks

Neural Computation ◽

10.1162/neco_a_00579 ◽

2014 ◽

Vol 26 (5) ◽

pp. 953-973 ◽

Cited By ~ 10

Author(s):

Jiří Šíma

Keyword(s):

Neural Network ◽

Recurrent Neural Networks ◽

Time Instant ◽

Optimal Size ◽

Recurrent Networks ◽

Deterministic Finite Automaton ◽

Trade Off ◽

Energy Trade ◽

The Brain ◽

Time Overhead

Recently a new so-called energy complexity measure has been introduced and studied for feedforward perceptron networks. This measure is inspired by the fact that biological neurons require more energy to transmit a spike than not to fire, and the activity of neurons in the brain is quite sparse, with only about 1% of neurons firing. In this letter, we investigate the energy complexity of recurrent networks, which counts the number of active neurons at any time instant of a computation. We prove that any deterministic finite automaton with m states can be simulated by a neural network of optimal size [Formula: see text] with the time overhead of [Formula: see text] per one input bit, using the energy O(e), for any e such that [Formula: see text] and e=O(s), which shows the time-energy trade-off in recurrent networks. In addition, for the time overhead [Formula: see text] satisfying [Formula: see text], we obtain the lower bound of [Formula: see text] on the energy of such a simulation for some constant c>0 and for infinitely many s.
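To make the idea of counting active neurons concrete, the following sketch simulates a tiny recurrent network of binary threshold units and reports the largest number of simultaneously active units during the run. Taking the maximum over time steps, the synchronous update, and the 4-neuron ring are assumptions made for this example; the construction from the paper is not reproduced.

```python
import numpy as np

def run_threshold_network(W, theta, state, steps):
    """Synchronously update binary threshold neurons and return all visited states."""
    states = [state.copy()]
    for _ in range(steps):
        state = (W @ state >= theta).astype(int)
        states.append(state)
    return states

def energy(states):
    """Largest number of simultaneously active neurons over the run."""
    return max(int(s.sum()) for s in states)

# Ring of 4 neurons: neuron i is driven by neuron i-1, so one spike travels around.
W = np.roll(np.eye(4, dtype=int), 1, axis=0)
theta = np.ones(4, dtype=int)
states = run_threshold_network(W, theta, np.array([1, 0, 0, 0]), steps=8)
ring_energy = energy(states)
```

In this toy ring exactly one neuron is active at every step, so the run is maximally sparse in the sense the abstract describes.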

Learning State Space Trajectories in Recurrent Neural Networks

Neural Computation ◽

10.1162/neco.1989.1.2.263 ◽

1989 ◽

Vol 1 (2) ◽

pp. 263-269 ◽

Cited By ~ 397

Author(s):

Barak A. Pearlmutter

Keyword(s):

Recurrent Neural Networks ◽

Gradient Descent ◽

Recurrent Network ◽

Network Computing ◽

Neural Network Learning ◽

Network Learning ◽

Error Functional ◽

Continuous Domains ◽

Processing Control ◽

Temporal Trajectory

Many neural network learning procedures compute gradients of the errors on the output layer of units after they have settled to their final values. We describe a procedure for finding ∂E/∂wij, where E is an error functional of the temporal trajectory of the states of a continuous recurrent network and wij are the weights of that network. Computing these quantities allows one to perform gradient descent in the weights to minimize E. Simulations in which networks are taught to move through limit cycles are shown. This type of recurrent network seems particularly suited for temporally continuous domains, such as signal processing, control, and speech.

Weight perturbation learning outperforms node perturbation on broad classes of temporally extended tasks

10.1101/2021.10.04.463055 ◽

2021 ◽

Author(s):

Paul Manfred Züge ◽

Christian Klos ◽

Raoul-Martin Memmesheimer

Keyword(s):

Biologically Relevant ◽

Network Learning ◽

Learning Tasks ◽

Learning Rules ◽

New Learning ◽

Error Dynamics ◽

Low Dimensional ◽

Task Types ◽

The Brain ◽

Better Than

Biological constraints often impose restrictions for plausible plasticity rules such as locality and reward-based rather than supervised learning. Two learning rules that comply with these restrictions are weight (WP) and node (NP) perturbation. NP is often used in learning studies, in particular as a benchmark; it is considered to be superior to WP and more likely neurobiologically realized, as the number of weights and therefore their perturbation dimension typically massively exceed the number of nodes. Here we show that this conclusion no longer holds when we take two biologically relevant properties into account: First, tasks extend in time. This increases the perturbation dimension of NP but not WP. Second, tasks are low dimensional, with many weight configurations providing solutions. We analytically delineate regimes where these properties let WP perform as well as or better than NP. Further, we find qualitative features of the weight and error dynamics that allow one to distinguish which of the rules underlies a learning process: in WP, but not NP, weights mediating zero input diffuse, and gathering batches of subtasks in a trial decreases the number of trials required. The insights suggest new learning rules, which combine for specific task types the advantages of WP and NP. Using numerical simulations, we generalize the results to networks with various architectures solving biologically relevant and standard network learning tasks. Our findings suggest WP and NP as similarly plausible candidates for learning in the brain and as similarly important benchmarks.
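The difference in perturbation dimension between the two rules can be seen in a minimal sketch for a single linear layer and a single static input. This setting is an assumption chosen for brevity and is simpler than the temporally extended tasks the abstract analyzes; the layer sizes, the noise scale `sigma`, and the function names are likewise hypothetical. WP draws one noise value per weight, NP draws one per output node, and both scale the noise by the resulting change in error.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, sigma = 10, 5, 1e-3
W = rng.normal(size=(n_out, n_in))
x = rng.normal(size=n_in)
target = rng.normal(size=n_out)

def error(y):
    return 0.5 * np.sum((y - target) ** 2)

E0 = error(W @ x)
true_grad = np.outer(W @ x - target, x)   # exact dE/dW, used only for comparison

def wp_estimate():
    """Weight perturbation: noise in weight space (n_out * n_in dimensions)."""
    xi = rng.normal(size=W.shape)
    dE = error((W + sigma * xi) @ x) - E0
    return (dE / sigma) * xi

def np_estimate():
    """Node perturbation: noise on the outputs (n_out dimensions), credited via the input."""
    xi = rng.normal(size=n_out)
    dE = error(W @ x + sigma * xi) - E0
    return (dE / sigma) * np.outer(xi, x)

def cosine(a, b):
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

n_samples = 2000
wp_mean = np.mean([wp_estimate() for _ in range(n_samples)], axis=0)
np_mean = np.mean([np_estimate() for _ in range(n_samples)], axis=0)
wp_alignment = cosine(wp_mean, true_grad)
np_alignment = cosine(np_mean, true_grad)
```

A weight update would subtract a small multiple of either estimate. Averaged over many samples both estimates point along the true gradient; single-sample estimates are noisier for the rule with the larger perturbation dimension, which in this static case is WP.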

Contrastive Learning and Neural Oscillations

Neural Computation ◽

10.1162/neco.1991.3.4.526 ◽

1991 ◽

Vol 3 (4) ◽

pp. 526-545 ◽

Cited By ~ 24

Author(s):

Pierre Baldi ◽

Fernando Pineda

Keyword(s):

Gradient Descent ◽

Learning Algorithms ◽

Learning Rule ◽

Unified Framework ◽

Contrast Function ◽

New Learning ◽

Different Types ◽

Free Network ◽

Two Phases ◽

The Brain

The concept of Contrastive Learning (CL) is developed as a family of possible learning algorithms for neural networks. CL is an extension of Deterministic Boltzmann Machines to more general dynamical systems. During learning, the network oscillates between two phases. One phase has a teacher signal and one phase has no teacher signal. The weights are updated using a learning rule that corresponds to gradient descent on a contrast function that measures the discrepancy between the free network and the network with a teacher signal. The CL approach provides a general unified framework for developing new learning algorithms. It also shows that many different types of clamping and teacher signals are possible. Several examples are given and an analysis of the landscape of the contrast function is proposed with some relevant predictions for the CL curves. An approach that may be suitable for collective analog implementations is described. Simulation results and possible extensions are briefly discussed together with a new conjecture regarding the function of certain oscillations in the brain. In the appendix, we also examine two extensions of contrastive learning to time-dependent trajectories.

Local online learning in recurrent networks with random feedback

eLife ◽

10.7554/elife.43299 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 5

Author(s):

James M Murray

Keyword(s):

Synaptic Weight ◽

Learning Rule ◽

Local Information ◽

Recurrent Networks ◽

Biological Features ◽

Mathematical Arguments ◽

Large Numbers ◽

New Learning ◽

Gradient Based ◽

The Brain

Recurrent neural networks (RNNs) enable the production and processing of time-dependent signals such as those involved in movement or working memory. Classic gradient-based algorithms for training RNNs have been available for decades, but are inconsistent with biological features of the brain, such as causality and locality. We derive an approximation to gradient-based learning that comports with these constraints by requiring synaptic weight updates to depend only on local information about pre- and postsynaptic activities, in addition to a random feedback projection of the RNN output error. In addition to providing mathematical arguments for the effectiveness of the new learning rule, we show through simulations that it can be used to train an RNN to perform a variety of tasks. Finally, to overcome the difficulty of training over very large numbers of timesteps, we propose an augmented circuit architecture that allows the RNN to concatenate short-duration patterns into longer sequences.
