Issues using an LMU with event data

Hi Guys,

I'm trying to use a fully spiking LMU to process some event data that was sampled at 250 ps, but during training the MAE loss either starts in the billions or goes straight to NaN (I imagine due to the very small dt). I was wondering if there is anything I could do to improve this?
So far I've tried scaling down some of the initial weights (e.g. the encoding vectors ex, eh, and em), changing the tau_ref of the RegularSpiking(nengo.Tanh(~)) ensemble, and playing around with which weights are trainable (A/B) or even exist. I feel like I'm probably just missing something obvious.
Any insight you can offer would be greatly appreciated.

Many Thanks,
Jack

Network and some data if required:
https://uoe-my.sharepoint.com/:f:/g/personal/s2110140_ed_ac_uk/EofSYG7FJgxGvkiJObaPvWcBAdoI9Or0Yo8E-2Os1r3yag?e=6Sfvtc

Hi @jack9117, and welcome back to the Nengo forums. :smiley:

I spoke to the LMU experts, and looking at your code, I think we may have identified what is causing the NaNs. In your code, where you compute the A, B matrices, the dt in the cont2discrete call is relative to the units of theta (and is independent of the simulation dt). Since they are typically in the same units, you can use dt=1.0 there, regardless of what the simulation timestep is. You can see that we do the same in our KerasLMU library code here.
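For reference, here is a minimal sketch of that computation (adapted from the NengoDL LMU example; `order` and `theta` below are placeholder values, with theta expressed in the same units as the discretization step):

```python
import numpy as np
from scipy.signal import cont2discrete

order = 4      # number of Legendre coefficients (placeholder)
theta = 100.0  # memory window length, in the same units as the dt below

# Continuous-time LMU state-space matrices
Q = np.arange(order, dtype=np.float64)
R = (2 * Q + 1)[:, None] / theta
j, i = np.meshgrid(Q, Q)
A = np.where(i < j, -1, (-1.0) ** (i - j + 1)) * R
B = (-1.0) ** Q[:, None] * R
C = np.ones((1, order))
D = np.zeros((1,))

# Discretize with dt=1.0 -- this dt is relative to theta's units,
# not the simulation timestep
A, B, _, _, _ = cont2discrete((A, B, C, D), dt=1.0, method="zoh")
```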

If that doesn't fix the issue, I recommend using rate neurons until you identify which part of the code is causing the NaNs. You might also want to change the tau_ref of your neuron model to something slightly bigger than the simulation dt (maybe 2 * dt?) to see if that fixes the issue.
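For instance, something along these lines (a rough sketch with made-up ensemble parameters; `dt` stands for whatever simulation timestep you are using):

```python
import nengo

dt = 0.001  # your simulation timestep (placeholder value)

with nengo.Network() as net:
    # For debugging: swap in the rate version, nengo.Tanh(), until the NaNs are gone.
    # For the spiking version: keep tau_ref a bit larger than the simulation dt.
    neuron_type = nengo.RegularSpiking(nengo.Tanh(tau_ref=2 * dt))
    hidden = nengo.Ensemble(n_neurons=100, dimensions=1, neuron_type=neuron_type)
```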

Hi @xchoo,

I gave both approaches a try, and while they did have an impact on the initial loss, I still get NaNs within the first epoch. I've switched back to using the TensorNode for the hidden state in the meantime, and I will see if I can make some improvements there first.

Regards,
Jack

Hi @xchoo,

So I changed the hidden state neuron to the rate version of the tanh like you suggested, and after playing around with it a bit, I managed to get the MAE down from >1e9 to about 11 over 100 epochs. So I'm currently assuming I just need to let it run for longer. I have now switched back to regular spiking neurons like in the original code, and hopefully I can produce a similar result, if not a better one.


Hi @jack9117,

Sounds like you are making progress. :smiley:
Unfortunately, debugging and tuning ML networks can be a bit of a black art where you just have to keep trying things until they work. If you run into more issues, I can message our LMU "experts" to see if they can provide more detailed suggestions to help you.


Hi @xchoo,

I have been playing around with the network for the last week and I can't quite seem to replicate the results of the rate node with the regular spiking neurons. The best I managed to achieve was an MAE of ~4 million, which isn't ideal. I was wondering if you had any additional tips, if it isn't too much trouble.

Hi @jack9117,

I can't quite recall how your network is implemented, and the link you provided has now expired. Regardless, many of the crucial concepts for optimizing spiking neural networks are explained in this NengoDL example.

Here are some other things to keep in mind:

Spiking networks only transmit information when the neurons spike. This means that the number of spikes produced in a given timeframe should be proportional to how quickly your data changes. You can increase the number of spikes produced by increasing the spike rate of the neurons, or by increasing the number of neurons in your network. Note that some spiking neuron models (iirc, nengo.SpikingRectifiedLinear is an example) can "spike" multiple times in one timestep, which can also work to increase the number of spikes produced in the network. Also note that most spiking neuron models have a refractory time during which the neuron can't spike, regardless of how large the input to the neuron is. You can reduce the refractory time to increase the spike rate of the neurons.
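As a rough illustration of those knobs (all values below are placeholders, not recommendations):

```python
import nengo

with nengo.Network() as net:
    # More neurons, higher max firing rates, and a shorter refractory
    # period all increase the number of spikes per unit time
    ens = nengo.Ensemble(
        n_neurons=500,
        dimensions=1,
        max_rates=nengo.dists.Uniform(200, 400),
        neuron_type=nengo.LIF(tau_ref=0.001),
    )

    # SpikingRectifiedLinear has no refractory period and can emit more
    # than one spike per timestep when its input is large enough
    ens2 = nengo.Ensemble(
        n_neurons=500,
        dimensions=1,
        neuron_type=nengo.SpikingRectifiedLinear(),
    )
```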

Because information is only transmitted through spikes, most spiking networks use synaptic filters to smooth out the information (think of it as “averaging” the information over time). However, depending on the synapse model used, they can have unintended side effects on the network. The default synapse in Nengo is the exponential synapse, which functions as a low-pass filter. This has two effects on the network:

  1. It adds a propagation “delay” for each layer in the network.
  2. It smooths out high-frequency changes in the input to the network. This means that if your input stimulus changes faster than the cutoff frequency of the synaptic filter, it will be attenuated. One way to test whether this is the case with your data is to implement a rate network, but use the same synapse that you would use in a spiking network between each layer (see the sketch after this list). If this "filtered" rate network is able to train and produce acceptable results, then synapses with these values shouldn't affect your spiking network's performance.
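Here is a minimal sketch of that test (placeholder input, neuron counts, and time constant; the key point is just that the connections between rate ensembles keep the same Lowpass synapse you plan to use in the spiking network):

```python
import numpy as np
import nengo

tau = 0.01  # synaptic time constant you plan to use in the spiking network

with nengo.Network() as net:
    stim = nengo.Node(lambda t: np.sin(20 * t))  # stand-in for your event data

    # Rate neurons, but with the "spiking" synapse kept on the connections
    a = nengo.Ensemble(100, 1, neuron_type=nengo.Tanh())
    b = nengo.Ensemble(100, 1, neuron_type=nengo.Tanh())

    nengo.Connection(stim, a, synapse=None)
    nengo.Connection(a, b, synapse=nengo.Lowpass(tau))
    probe = nengo.Probe(b, synapse=nengo.Lowpass(tau))
```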