Brain Inspired Intelligence
For further interest, please refer to Deng et al. (ICLR 2021), Li et al. (ICLR 2021), and Li and Deng et al. (ICML 2021).
SNN model
Spiking neural networks (SNNs) are biologically inspired networks that have received increasing attention due to their efficient computation. During their development, many mathematical models have been proposed to describe neuron behavior. The most widely used neuron model for SNNs is the Leaky Integrate-and-Fire (LIF) model, which uses simple differential equations to describe the membrane potential of a neuron. Its explicit iterative form is governed by

v^l(t+1) = τ v^l(t) + W^l s^(l-1)(t),    s^l(t+1) = H(v^l(t+1) − θ^l),

where v^l is the membrane potential of layer l, τ the leak (decay) factor, W^l the synaptic weights, s^l the binary spike output, H(·) the Heaviside step function, and θ^l the firing threshold.
When the membrane potential exceeds the pre-defined threshold θ, the neuron fires a spike and the potential is reset. Two reset mechanisms are commonly used:

soft reset: v(t+1) ← v(t+1) − θ, i.e., the threshold is subtracted and any residual potential is kept;
hard reset: v(t+1) ← v_reset (usually 0), i.e., the potential is forced back to the resting value.
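As a concrete reference, here is a minimal sketch of one discrete LIF update step with both reset modes; the function name lif_step and the decay constant tau=0.5 are illustrative choices, not taken from the papers.

# Minimal sketch of one discrete LIF update step (illustrative names, not from the papers).
import torch

def lif_step(v, x, tau=0.5, threshold=1.0, soft_reset=True):
    """One time step: leak + integrate, fire, then reset."""
    v = tau * v + x                        # leaky integration of the input current
    spike = (v >= threshold).float()       # fire when the potential crosses the threshold
    if soft_reset:
        v = v - spike * threshold          # soft reset: subtract the threshold
    else:
        v = v * (1.0 - spike)              # hard reset: set the potential back to 0
    return spike, v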
Questions
The major bottleneck of spiking neural networks is how to obtain a well-performing SNN, especially on complex datasets, since directly applying the backpropagation algorithm is not suitable for SNNs. Currently, there are two effective ways to obtain excellent SNNs: surrogate gradients and ANN-to-SNN conversion. ANN-to-SNN conversion means converting a well-trained source ANN into a target SNN. Surrogate gradient methods replace the hard step function with a soft relaxed function and train the SNN by backpropagation through time (BPTT).
ANN to SNN
In this subsection, we use the soft reset and the Integrate-and-Fire (IF) model (i.e., the LIF model without leakage). The standard threshold-balancing conversion proceeds in three steps (a code sketch follows the list):
- Train the source ANN, which contains only convolutional, average-pooling, and fully connected layers with ReLU activations. Record the maximum activation of each layer.
- Copy the network parameters to the target SNN and replace each ReLU with an IF neuron. Set the threshold of each layer to the recorded maximum activation.
- Run the target SNN for a sufficiently long simulation (enough time steps) to reach acceptable accuracy.
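Below is a minimal PyTorch-style sketch of this pipeline for a purely sequential ANN. The helper names (IFNeuron, record_max_activations, convert_to_snn, run_snn), the constant-input coding, and the assumption that the data loader yields (input, label) batches are illustrative; this is not code from the papers.

# Threshold-balancing ANN-to-SNN conversion sketch (illustrative, for a purely sequential model).
import torch
import torch.nn as nn

class IFNeuron(nn.Module):
    """Integrate-and-Fire neuron with soft reset, used in place of ReLU."""
    def __init__(self, threshold: float):
        super().__init__()
        self.threshold = threshold
        self.mem = None  # membrane potential, created lazily

    def forward(self, x):
        if self.mem is None:
            self.mem = torch.zeros_like(x)
        self.mem = self.mem + x                       # integrate the input current
        spike = (self.mem >= self.threshold).float()  # fire when the threshold is crossed
        self.mem = self.mem - spike * self.threshold  # soft reset: subtract the threshold
        return spike * self.threshold                 # scaled spike approximates the ANN activation

def record_max_activations(ann: nn.Sequential, loader):
    """Run a few batches through the ANN and record the maximum output of every ReLU."""
    max_act, hooks = {}, []
    for idx, m in enumerate(ann):
        if isinstance(m, nn.ReLU):
            def hook(_, __, out, idx=idx):
                max_act[idx] = max(max_act.get(idx, 0.0), out.max().item())
            hooks.append(m.register_forward_hook(hook))
    ann.eval()
    with torch.no_grad():
        for x, _ in loader:
            ann(x)
    for h in hooks:
        h.remove()
    return max_act

def convert_to_snn(ann: nn.Sequential, max_act):
    """Reuse the ANN's layers (weights included) and swap every ReLU for an IF neuron."""
    layers = [IFNeuron(max_act[i]) if isinstance(m, nn.ReLU) else m
              for i, m in enumerate(ann)]
    return nn.Sequential(*layers)

def run_snn(snn: nn.Sequential, x, T: int = 256):
    """Simulate the converted SNN for T steps and return the time-averaged output."""
    for m in snn:
        if isinstance(m, IFNeuron):
            m.mem = None  # reset the neuron state between inputs
    out = 0
    for _ in range(T):
        out = out + snn(x)  # constant-input coding: feed the same analog input every step
    return out / T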
The converted SNN typically needs thousands of simulation time steps to reach the same accuracy as the source ANN, which defeats its efficiency advantage. Our work therefore explores how to obtain converted SNNs with higher accuracy and lower inference latency.
Layer-wise conversion error (ICLR 2021)
We split the conversion error into a clipping error and a flooring error. When the source ANN activation is larger than the SNN threshold θ^l, the SNN output saturates at θ^l, producing the clipping error; below the threshold, the SNN firing rate can only take the discrete values kθ^l/T (k = 0, 1, ..., T), so continuous ANN activations are rounded down, producing the flooring error.
Through analyzing the output error between the source ANN and the converted SNN, we decompose the conversion error into the output error of each layer. As a result, we can make the converted SNN closer to the source ANN simply by reducing the output error of each layer. Here we propose a method to reduce the flooring error: increase the SNN's bias in each layer by θ^l/(2T), i.e., half of the quantization step, as in the pseudocode below.
# Copy the ANN parameters into the SNN and shift each bias to reduce the flooring error
for l in range(1, L + 1):
    SNN.layer[l].thresh = ANN.layer[l].maximum_activation
    SNN.layer[l].weight = ANN.layer[l].weight
    # shift the bias by half of the quantization step thresh / T
    SNN.layer[l].bias = ANN.layer[l].bias + SNN.layer[l].thresh / (2 * T)
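To see why this θ^l/(2T) shift helps, the toy script below (illustrative, not from the paper) compares the mean absolute error of a soft-reset IF neuron's firing rate with and without the shift, for random scalar inputs.

# Toy check of the flooring error and the theta/(2T) bias shift (illustrative, not from the paper).
import numpy as np

def if_rate(a, theta=1.0, T=16):
    """Average output of a soft-reset IF neuron driven by constant input a for T steps."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += a
        if v >= theta:
            spikes += 1
            v -= theta
    return spikes * theta / T

theta, T = 1.0, 16
inputs = np.random.uniform(0, theta, 10000)
err_plain = np.mean([abs(if_rate(a, theta, T) - a) for a in inputs])
err_shift = np.mean([abs(if_rate(a + theta / (2 * T), theta, T) - a) for a in inputs])
print(err_plain, err_shift)  # the shifted version yields a smaller mean absolute error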
Layer-wise Calibration (ICML 2021)
Adaptive threshold. We found that threshold balancing causes a considerable flooring error, especially when the simulation length is short, since it uses the maximum activation as the SNN's threshold. In practice, an appropriate reduction of the threshold improves SNN performance. Here, we choose each layer's threshold by minimizing the discrepancy between the ANN activation and its clipped-and-floored SNN counterpart:

θ^l = argmin_θ ‖ clipfloor(a^l; θ, T) − a^l ‖²,

where clipfloor(·) clips the activation a^l to [0, θ] and quantizes it to multiples of θ/T.
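As a rough illustration of how such a threshold could be found, the sketch below grid-searches candidate thresholds and keeps the one with the smallest MSE between the ANN activations and their clipped-and-floored counterparts. The search range and the helper names (clip_floor, search_threshold) are assumptions, not the paper's exact procedure.

# Grid-search sketch for an adaptive threshold (assumed procedure, not the paper's exact one).
import torch

def clip_floor(a, theta, T):
    """Simulated IF response: clip activations to [0, theta], then quantize to multiples of theta/T."""
    return torch.clamp(torch.floor(a * T / theta), 0, T) * theta / T

def search_threshold(act, T=32, num_grid=100):
    """Pick the threshold minimizing the MSE between ANN activations and their SNN counterpart."""
    best_theta, best_mse = act.max().item(), float("inf")
    for ratio in torch.linspace(0.5, 1.0, num_grid):
        theta = ratio.item() * act.max().item()
        mse = ((clip_floor(act, theta, T) - act) ** 2).mean().item()
        if mse < best_mse:
            best_theta, best_mse = theta, mse
    return best_theta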
The previous method does not rely on real data statistics; it rests on the strong assumption that activations are uniformly distributed. In fact, we can obtain a better correction by analyzing the activation distribution of a few training samples. Here, we propose corrections that calibrate the SNN's bias (or initial potential) and weights, respectively, layer by layer.
Bias correction (BC). In order to calibrate the bias, we first define a reduced mean function

μ_c(e) = (1 / (N·H·W)) Σ_{n,h,w} e_{n,c,h,w},

which averages a tensor over the batch and spatial dimensions while keeping the channel dimension. The bias of each layer is then shifted by the reduced mean of the layer-wise error between the ANN activation and the SNN firing rate (see the pseudocode below).
Potential correction (PC). Potential correction is similar to bias correction. Instead of changing the bias, we directly set the initial membrane potential so that it absorbs the layer-wise error averaged over the batch dimension only (keeping the channel and spatial dimensions), as shown in the pseudocode below.
for l in range(1, L + 1):
    # average firing rate over the T simulation steps (time is the last tensor dimension)
    SNN.layer[l].frequency = SNN.layer[l].output.sum(4) / T
    # layer-wise error between the ANN activation and the SNN firing rate
    error = ANN.layer[l].output - SNN.layer[l].frequency
    # bias correction: add the error averaged over batch and spatial dimensions
    SNN.layer[l].bias = SNN.layer[l].bias + error.mean(dim=(0, 2, 3))
    # potential correction: absorb the error averaged over the batch dimension into the initial potential
    SNN.layer[l].mem = SNN.layer[l].mem + error.mean(0)
Weight calibration (WC). The layer-wise conversion can also be written as an optimization over the layer weights: we minimize the squared error between the source ANN activation a^l and the SNN layer's output when it is driven by the incoming firing rates r^(l-1), and update the weights W^l with a few gradient steps per layer.
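A rough sketch of what such a layer-wise weight calibration step could look like in PyTorch is given below; the optimizer choice, learning rate, and step count are assumptions for illustration rather than the paper's exact settings.

# Layer-wise weight calibration sketch (optimizer and iteration count are assumptions).
import torch

def calibrate_weights(snn_layer, snn_input_rate, ann_output, steps=100, lr=1e-4):
    """Tune one SNN layer's weights so its response to the SNN input rates matches the ANN activation."""
    optimizer = torch.optim.Adam(snn_layer.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        out = snn_layer(snn_input_rate)          # SNN layer driven by the previous layer's firing rates
        loss = ((out - ann_output) ** 2).mean()  # match the source ANN's activation
        loss.backward()
        optimizer.step()
    return snn_layer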
Surrogate gradient
Surrogate gradient methods replace the hard step function with a soft relaxed function and train the SNN by BPTT. Surrogate gradients come in many shapes, such as rectangular, exponential, and triangular. But is a single, fixed surrogate gradient always optimal throughout training?
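For concreteness, the snippet below shows a common way to implement a surrogate gradient in PyTorch: the forward pass keeps the hard step function, while the backward pass substitutes a rectangular window around the threshold. The threshold value and window width here are illustrative choices.

# Rectangular surrogate gradient for the spiking nonlinearity (threshold and window width are illustrative).
import torch

class RectSpike(torch.autograd.Function):
    """Forward: hard step at the threshold. Backward: rectangular surrogate gradient."""
    threshold = 1.0

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v >= RectSpike.threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        v, = ctx.saved_tensors
        # pass gradients only within a window of width 1 around the threshold
        surrogate = (torch.abs(v - RectSpike.threshold) < 0.5).float()
        return grad_output * surrogate

A spiking layer then calls RectSpike.apply(v) on the membrane potential, so BPTT can propagate gradients through the otherwise non-differentiable firing decision.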
Dspike for SNN Flexibility (NeurIPS 2021)
Here we propose a new family of Differentiable Spike (Dspike) functions that can adaptively evolve during training to
find the optimal shape and smoothness for gradient estimation. Mathematically, Dspike is a tanh-shaped function whose steepness is controlled by a temperature parameter; tuning this temperature during training adjusts the shape and smoothness of the gradient estimator.
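As an illustration of the idea rather than the paper's exact definition, the sketch below builds a tanh-shaped spike estimator on [0, 1] whose steepness is set by a temperature b; as b grows, the curve approaches the hard step function. The function name dspike_like and the exact scaling are assumptions.

# Tanh-based Dspike-style function (an illustrative form; see the paper for the exact definition).
import torch

def dspike_like(x, b=5.0):
    """Smooth, tanh-shaped approximation of the step function on [0, 1].
    It maps 0 -> 0 and 1 -> 1; a larger temperature b makes the curve steeper."""
    x = torch.clamp(x, 0.0, 1.0)
    return (torch.tanh(b * (x - 0.5)) / torch.tanh(torch.tensor(b / 2)) + 1.0) / 2.0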