Validating NengoDL models during learning changes results

Hello everyone,
This is my first post in this community, and I hope that I am not horribly ignorant of the forum guidelines by posting my question here.

I have been using nengo and nengo_dl in my latest research project, and based on my limited experience so far, I generally think both are great tools. I come from an AI background and have worked on several deep learning projects (primarily PyTorch) in the past. I am hoping to use nengo to develop a spiking network for model predictive control (MPC). In my typical workflow, when I train a model, I also evaluate its performance continuously during the training process on a separate chunk of the dataset. My standard training loop looks something like this (sketched in code after the list):

  1. Initialize the model and training / validation data
  2. Load latest weights, train for an epoch and save the new weights
  3. Load latest weights and evaluate the model
  4. Repeat steps 2 and 3 until some condition is reached (max epochs or loss below X)
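
In NengoDL terms, the kind of loop I have in mind looks roughly like the sketch below. This is only a sketch: net, inp, probe, max_epochs, the data arrays, and the "./params" path are placeholders for my actual setup, and I reuse one optimizer object throughout (more on that below).

```python
import nengo_dl
import tensorflow as tf

opt = tf.optimizers.Adam(learning_rate=1e-3)  # one optimizer object, reused throughout

for epoch in range(max_epochs):
    # step 2: load the latest weights, train for one epoch, save the new weights
    with nengo_dl.Simulator(net, minibatch_size=32) as sim:
        if epoch > 0:
            sim.load_params("./params")
        sim.compile(optimizer=opt, loss=tf.losses.MeanSquaredError())
        sim.fit(x={inp: train_x}, y={probe: train_y}, epochs=1)
        sim.save_params("./params")

    # step 3: load the latest weights and evaluate on the held-out validation data
    with nengo_dl.Simulator(net, minibatch_size=32) as sim:
        sim.load_params("./params")
        sim.compile(loss=tf.losses.MeanSquaredError())
        val_loss = sim.evaluate(x={inp: val_x}, y={probe: val_y})["loss"]
```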

When trying to apply this method to my nengo model, however, I noticed that when I train for some epochs, say 5, and then want to continue training with the same parameters using another simulator object, I start at a higher loss than where I left off. Below, I added an image of these results, where I first train for 50 epochs and then train the same model again for another 50 epochs. These results were obtained with the Adam optimizer; the effect also happens with SGD, but to a lesser extent. It also occurs when I do not do an evaluation (step 3) in between. I think this might have something to do with the state of the optimizer? But I do use the exact same optimizer object for both training steps.

[Image: "weirdness", training loss over two runs of 50 epochs, showing the jump when the second simulator object takes over]

In the end, I would like to be able to look at validation results during training without this influencing the results I get. In other words, I expect the same outcome whether I train once for 100 epochs in a single simulator object, or 100 times for 1 epoch with individual simulator objects. I use separate objects because my current understanding is that they “close” and are not “reusable” in nengo.

I have created a working example that reproduces these results in a Jupyter notebook, and it is available here:

https://drive.google.com/file/d/1vXJj0JD_f6odaUiCQdB_4KP49KJANhAS/view?usp=sharing

I would be very happy about any sort of insight on 1. why this behavior occurs, and 2. how I can avoid it. Any assistance in this regard is highly appreciated. Perhaps other people have run into this or a similar issue before and there is a “standard” learning loop that works in nengo_dl that I am unaware of? The examples given in the official documentation do not seem to cover this; they only show how to save the weights and use them later, which I believe I implemented correctly.

Thank you very much for your help and stay safe,

Justus

Hi @jhuebotter, and welcome to the Nengo forums! :smiley:

I took a quick look at your notebook, and while I haven’t done an in-depth analysis, I can guess at what is happening here. In your notebook, you are using nengo.Ensemble objects for the layers. One of the features of Nengo (and by extension, NengoDL) is that the parameters (gains and biases) of a neural ensemble are randomly generated every time a simulator (nengo.Simulator or nengo_dl.Simulator) object is created.

What this means is that when you create a different simulator object at epoch 50, you are using the same connection weights as in the first simulator object, but the neuron parameters are different. I believe this is the cause of the sudden increase in loss at that point (I’m actually amazed the loss doesn’t jump higher).
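
To make this concrete, here is a small sketch (not taken from your notebook) that builds one unseeded ensemble with two separate simulator objects and compares the generated gains:

```python
import nengo

with nengo.Network() as net:
    ens = nengo.Ensemble(n_neurons=10, dimensions=1)  # no seed set

# each simulator build draws new gain and bias values for the ensemble
with nengo.Simulator(net) as sim1:
    gain1 = sim1.data[ens].gain
with nengo.Simulator(net) as sim2:
    gain2 = sim2.data[ens].gain

print(gain1 - gain2)  # generally nonzero: the two builds differ
```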

To prevent this problem, you can either:

  • Use a seed when creating the Nengo ensemble.
  • Or use predefined gain and bias values when creating the Nengo ensemble (both options are sketched below).
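
Both options look roughly like this (a sketch with arbitrary sizes and values):

```python
import numpy as np
import nengo

with nengo.Network() as net:
    # option 1: fix the random generation with a seed
    # (a seed on the enclosing nengo.Network works as well)
    ens_a = nengo.Ensemble(n_neurons=10, dimensions=1, seed=0)

    # option 2: give explicit gain and bias values, one per neuron
    ens_b = nengo.Ensemble(n_neurons=10, dimensions=1,
                           gain=np.ones(10), bias=np.zeros(10))
```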

Dear @xchoo,
Thank you very much for taking the time to look at my issue. I would like to note that I do use a seed when creating the Nengo ensemble. However, I wanted to double-check whether the parameters of the model indeed remain unchanged when using a new simulator object, and this seems to be the case. For this purpose, I created a slightly adapted version of my code above and hosted it here: https://drive.google.com/file/d/144ifDVs-zwcWXRX025gUMmphxS2K5Za0/view?usp=sharing
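
Roughly, the check I mean looks like the sketch below (not the exact notebook code; net, ens, and conn stand in for my seeded network, one of its ensembles, and a trained connection, and I am assuming sim.data also reflects the parameters loaded with load_params):

```python
import numpy as np
import nengo_dl

with nengo_dl.Simulator(net) as sim1:
    sim1.load_params("./params")
    gain1, bias1, w1 = sim1.data[ens].gain, sim1.data[ens].bias, sim1.data[conn].weights

with nengo_dl.Simulator(net) as sim2:
    sim2.load_params("./params")
    gain2, bias2, w2 = sim2.data[ens].gain, sim2.data[ens].bias, sim2.data[conn].weights

# with a seeded model and the same saved parameters, all of these should be True
print(np.allclose(gain1, gain2), np.allclose(bias1, bias2), np.allclose(w1, w2))
```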

It seems that there is an indication that SGD and Adam are updating different sets of parameters, even though everything else about the two training runs is identical. In my understanding, this should not be the case. Additionally, when evaluating the same model multiple times with separate simulator objects (before or after learning), the loss is always the same, so there seems to be no randomness introduced by the simulator initialization itself.
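
One way I can think of to check which parameters actually change is to diff the trainable weights before and after a single call to fit, roughly as below (a sketch; reaching the variables through sim.keras_model is my assumption of how to inspect them, and net, inp, probe, and the data arrays are placeholders as before):

```python
import numpy as np
import tensorflow as tf
import nengo_dl

def changed_parameter_indices(optimizer):
    """Return the indices of the trainable weight arrays that one
    training epoch with the given optimizer actually modifies."""
    with nengo_dl.Simulator(net, minibatch_size=32) as sim:
        sim.compile(optimizer=optimizer, loss=tf.losses.MeanSquaredError())
        before = [w.numpy().copy() for w in sim.keras_model.trainable_weights]
        sim.fit(x={inp: train_x}, y={probe: train_y}, epochs=1)
        after = [w.numpy() for w in sim.keras_model.trainable_weights]
    return [i for i, (b, a) in enumerate(zip(before, after)) if not np.allclose(b, a)]

print(changed_parameter_indices(tf.optimizers.SGD(learning_rate=1e-3)))
print(changed_parameter_indices(tf.optimizers.Adam(learning_rate=1e-3)))
```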

Both of these issues seem very strange to me: 1. the jump in the loss when continuing learning with a new simulator object, and 2. the fact that Adam and SGD optimize different sets of parameters. They currently prevent me from using NengoDL in my research projects, and I would still appreciate any insight into why this is happening and how to prevent it.

Best wishes,
Justus

I can see how the observations in my last post might not fit the initial problem description well. Please let me know if you think I should post this as a separate thread.
In any case, I would still very much appreciate any input on both the initial problem (the increase in loss with a new simulator) and the second observation (different parameters being updated depending on the choice of optimizer).

Best,
Justus

Hi @jhuebotter,

Just an update on this. I have been investigating your notebook and have been able to replicate the behaviour you are reporting. It seems that the issue only occurs with the Adam optimizer, but I am not familiar enough with TensorFlow to determine why this is the case. I’ve messaged the NengoDL dev about it, but they are currently out of the office. I’ll let you know when they respond.