[Nengo DL]: Building a ConvNet with kernel regularizers, dropout, and max pooling layers

zerone · July 23, 2020, 4:28am

Hello everyone!

Edit: Please refer to the Questions below to get straight to the discussion. I have moved the code after the questions; it can be used as an example to reproduce the warnings.

I am building a TF-Keras ConvNet whose layers use kernel regularizers and max pooling, and I also wish to add dropout later. I am also using a loss function defined in tensorflow_addons, i.e. not one of those defined in tensorflow. Also, please note that my end goal is to run such a network entirely on nengo-loihi.

Questions:

1> The warning UserWarning: Layer '<class 'tensorflow. ... Overwriting." % keras_layer every time I import nengo_dl seems innocuous to me. Is it?

2> If I set max_to_avg_pool = True, it degrades the performance of the network very much (compared to the TF-Keras network). Please note that I am using nengo.SpikingRectifiedLinear() during conversion. My questions are:
a) Will such a nengo-dl network with max_to_avg_pool = False run entirely on loihi with spiking neurons?
b) Or will some parts of the network (i.e. MaxPooling) run on CPU/GPU due to the part being a TensorNode?
c) If it will partially run on loihi, then how can I get it running entirely on loihi? Do I train my TF-Keras network with no MaxPooling layer, rather with AveragePooling3D to compare the performance (assuming that AveragePooling3D runs on loihi with spiking neurons)?

3> If I set inference_only = True, the warnings UserWarning: conv3d.kernel_regularizer ... if error_msg else "") disappear but it again degrades the performance of the network compared to TF-Keras. It is probably because when inference_only = False, the neurons are still RectifiedLinear (even after conversion with SpikingRectifiedLinear) and thus better performing than SpikingRectifiedLinear. Is it? Thus, if inference_only = False, then all the layers are still TensorNodes and I guess… they won’t be running on Nengo-Loihi with spiking neurons.

4> If I include dropout in between dense layer and output layer, nengo warns UserWarning: Layer type <class 'tensorflow.python.keras.layers.core.Dropout'> does not have a registered converter. Falling back to TensorNode. % (error_msg + ". " if error_msg else "") => dropout layer is not supported. Is it? If I run such a net on nengo-loihi, will it have the same implication of partially running on loihi and partially on CPU/GPU?

5> If I happen to train the converted nengo-dl network (with tensorflow_addons loss function), it fails (I don’t remember exactly) due to no support of tensorflow_addons loss functions. I guess I have to declare my own custom loss function… is it?

Following is my code:

import nengo
import nengo_dl
import tensorflow as tf
import tensorflow_addons as tfa

def _get_cnn_block(conv, num_filters, ker_params, include_pooling=True,
                   rf=5e-5, pool_depth=2):
  conv = tf.keras.layers.Conv3D(
      num_filters, ker_params, padding="same", data_format="channels_last",
      activation='relu', kernel_initializer='he_uniform',
      kernel_regularizer=tf.keras.regularizers.l2(rf))(conv)
  if include_pooling:
    conv = tf.keras.layers.MaxPool3D(
        pool_size=(pool_depth, 2, 2), data_format="channels_last")(conv)

  return conv

def _get_dense_block(block, nn_dlyr, actvn="relu", rf=5e-5):
  dense = tf.keras.layers.Dense(
      nn_dlyr, activation=actvn, kernel_initializer="he_uniform",
      kernel_regularizer=tf.keras.regularizers.l2(rf))(block)

  return dense

def get_3d_cnn_model(inpt_shape, num_neurons_dlyr, num_clss, lr, rf):

  inpt = tf.keras.Input(shape=inpt_shape)
  
  conv0 = _get_cnn_block(inpt, 64, (3, 3, 3), pool_depth=1, rf=rf)
  
  conv1 = _get_cnn_block(conv0, 128, (3, 3, 3), pool_depth=2, rf=rf)

  flat = tf.keras.layers.Flatten(data_format="channels_last")(conv1)

  dense0 = _get_dense_block(flat, num_neurons_dlyr, rf=rf)

  output = _get_dense_block(dense0, num_clss, actvn="softmax", rf=rf)

  model = tf.keras.Model(inputs=inpt, outputs=output)
  
  model.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
      loss=tfa.losses.focal_loss.sigmoid_focal_crossentropy, 
      metrics=["accuracy"])
  
  return model

inpt_shape = (16, 36, 64, 3)
model = get_3d_cnn_model(inpt_shape, 2048, 12, 1e-4, 5e-5)

nengo_model = nengo_dl.Converter(
    model, swap_activations={tf.keras.activations.relu: nengo.SpikingRectifiedLinear()},
    scale_firing_rates=10, synapse=0.005,
    max_to_avg_pool=False, inference_only=False)

While importing nengo-dl I get the following warning:

UserWarning: Layer '<class 'tensorflow.python.keras.layers.normalization_v2.BatchNormalization'>' already has a converter. Overwriting.
  "Layer '%s' already has a converter. Overwriting." % keras_layer

And after converting the TF-Keras network to Nengo-DL type model, I get the following warnings,

UserWarning: conv3d.kernel_regularizer has value <tensorflow.python.keras.regularizers.L1L2 object at 0x7f3f6d700e10> != None, which is not supported (unless inference_only=True). Falling back to TensorNode.
  % (error_msg + ". " if error_msg else "")
UserWarning: Cannot convert max pooling layers to native Nengo objects; consider setting max_to_avg_pool=True to use average pooling instead. Falling back to TensorNode.
  % (error_msg + ". " if error_msg else "")
UserWarning: conv3d_1.kernel_regularizer has value <tensorflow.python.keras.regularizers.L1L2 object at 0x7f3f6d512d50> != None, which is not supported (unless inference_only=True). Falling back to TensorNode.
  % (error_msg + ". " if error_msg else "")
UserWarning: dense.kernel_regularizer has value <tensorflow.python.keras.regularizers.L1L2 object at 0x7f3f6c00aad0> != None, which is not supported (unless inference_only=True). Falling back to TensorNode.
  % (error_msg + ". " if error_msg else "")
UserWarning: dense_1.kernel_regularizer has value <tensorflow.python.keras.regularizers.L1L2 object at 0x7f3f6bff27d0> != None, which is not supported (unless inference_only=True). Falling back to TensorNode.
  % (error_msg + ". " if error_msg else "")

and I have a few questions related to them. Please clarify.

(Note: As you can see in the code, I am not setting max_to_avg_pool and inference_only to True, and hence the warnings.)

Please correct me if I am wrong anywhere and let me know your suggestions in the light of running the entire network on loihi. Thanks!

pblouw · July 29, 2020, 8:59pm

Hi!
Thanks for the questions. I’m not able to reproduce the UserWarning you are observing using the current developer installation of Nengo DL, but you can see where the warning is being thrown here in case that offers any insight.

Regarding max pooling, there’s no native Nengo equivalent to this operation, so you won’t be able to easily port a model onto Loihi using this operation. Regarding the inference_only argument, setting inference_only=True doesn’t guarantee that the model will be composed of native Nengo objects, but it helps, and it allows the converter to be more aggressive in the conversion process when trying to eliminate non-Nengo model components. More generally, anytime you have TensorNodes in a model, they will not be portable onto Loihi, as they will run using TensorFlow under the hood. Dropout won’t translate directly onto Loihi for similar reasons, though since you likely wouldn’t be running a backprop-like training algorithm on Loihi, it might not be necessary to include this. Alternatively, you might be able to mimic dropout by generating spikes that inhibit particular neurons at specific time intervals.
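On the dropout point: a Keras Dropout layer only changes activations in training mode and passes its input through unchanged at inference, which is why dropping it from the network that gets converted should not change predictions. A minimal plain-Python sketch of that behavior (inverted dropout; the function `dropout` and its arguments are illustrative names, not a Keras or NengoDL API):

```python
import random


def dropout(values, rate, training, seed=0):
    """Illustrative inverted dropout on a list of activations.

    In training mode each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate); at inference the input is
    returned unchanged.
    """
    if not training:
        return list(values)
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


activations = [0.5, 1.0, 2.0, 4.0]   # made-up activations
train_out = dropout(activations, rate=0.5, training=True)
infer_out = dropout(activations, rate=0.5, training=False)
print("training :", train_out)
print("inference:", infer_out)
```

Since the inference-mode output equals the input, a network trained with dropout on GPU can in principle be rebuilt without the Dropout layer (keeping the trained weights) before conversion.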

If your ultimate goal is to get a model running on Loihi, it might be useful to start with some of the examples we have of convolutional networks running on the hardware: https://www.nengo.ai/nengo-loihi/examples.html. There are also some good tips for training deep spiking networks here: https://www.nengo.ai/nengo-dl/tips.html#training-a-spiking-deep-network. You will eventually want to have a model that is fully comprised of native Nengo objects in order to start porting things onto Loihi.

Anyway, hopefully this offers some helpful clarification, but let us know if you have any further questions!

zerone · August 2, 2020, 4:09am

Hello @pblouw! Pardon the late response; I was caught up in some urgent work. And thank you for the detailed response.

BTW, the warning I am facing is in:

>>> nengo_dl.__version__
'3.2.0'

With respect to ultimately running my model on loihi, following statement:

resolves most of my doubts for now. Since using max pooling, setting inference_only=False, and using Dropout results in a TensorNode, thus not a native nengo Object… they won’t be running on nengo-loihi. So if I want a network which runs flawlessly on GPU as well as on loihi, I will have to design one without MaxPooling and Dropout, and layers with no kernel_regularizers. Please correct if I am wrong anywhere.

zerone · September 6, 2020, 6:56pm

Hello all, I had two questions after further experiments.

1> Upon replacing MaxPooling with AveragePooling, the performance of my network degraded in the TF environment. I was wondering if there’s a way to keep MaxPooling layers during training in TF and then replace those with a Nengo object which could perform the same operation as MaxPooling in the Nengo-DL environment?

2> Conv layers with kernel regularizers get converted to spiking layers only in “inference_only” mode, thus they are good only for inference or testing (and thus can probably be deployed on Loihi too, just for inference). On the contrary, if any layer is still a TensorNode after conversion (which happens for layers with kernel_regularizers and inference_only=False), then, as mentioned above by @pblouw, it cannot be executed on Loihi, be it in training mode or testing mode. Therefore, it’s probably better to get rid of kernel_regularizers if doing so doesn’t significantly degrade the performance. Please correct me if I am wrong anywhere.

Eric · September 8, 2020, 1:19pm

With regards to max pooling: It’s hard to do max pooling with spikes, since you want to take the max across the firing rates, not just whatever neuron is currently spiking. There’s at least one example of people doing max pooling in spikes here: https://dgyblog.com/projects-term/spike-max-pool.html (see the Frontiers paper particularly). Basically, they just filter the neuron activities and use that to choose the max. I’ve never got into that, because in my experience you can do just as well with average pooling (or even just strided convolution), and most modern networks (e.g. ResNet, Inception) have gotten rid of max pooling. If you’re seeing a drop in performance when you replace MaxPooling with AveragePooling, you might want to try a different architecture. The All-Conv net is a good place to start (https://arxiv.org/abs/1412.6806, https://github.com/MateLabs/All-Conv-Keras).

Kernel regularizers only make a difference during training (they add a regularization term to the loss function). As you note, we need to use a TensorNode to implement them; if inference_only=True, we know we’re not doing training, so we can get rid of them. Loihi can’t do backprop anyway, so it doesn’t really matter that kernel regularizers can’t run on Loihi, because you’re not going to be training on Loihi anyway (the only learning NengoLoihi currently supports is the PES learning rule for individual connections).
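To make the "regularization term" concrete: an L2 kernel regularizer with factor rf adds rf * sum(w ** 2) over the layer's kernel to the training loss, and plays no role once training is over. A small plain-Python sketch (the weights and data loss are made-up numbers, and `l2_penalty` is an illustrative helper, not a library function):

```python
def l2_penalty(weights, rf):
    """L2 penalty in the style of an L2 kernel regularizer: rf * sum(w ** 2)."""
    return rf * sum(w * w for w in weights)


kernel = [0.5, -1.0, 2.0]   # made-up kernel weights
data_loss = 0.25            # made-up loss from the data term alone
rf = 5e-5                   # regularization factor used in the original post

penalty = l2_penalty(kernel, rf)
train_loss = data_loss + penalty   # what the optimizer minimizes in training
print("penalty:", penalty)
print("training loss:", train_loss)
```

The penalty depends only on the weights, not on the inputs or the neuron activities, so it can be discarded for inference without changing the network's outputs.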

That said, there’s no reason you can’t train your network on GPU with kernel regularizers, and then convert to Loihi with inference_only=True for testing. To do this, just save the parameters to a file after training:

import nengo_dl
import nengo_loihi

converter = nengo_dl.Converter(keras_model)

with nengo_dl.Simulator(converter.net) as train_sim:
    # train on GPU (or CPU) here, e.g. `train_sim.fit(...)`

    # save the parameters to file
    train_sim.save_params("my_params_file")

with nengo_dl.Simulator(converter.net, inference_only=True) as test_sim:
    test_sim.load_params("my_params_file")

    # do testing here, e.g. `result = test_sim.predict(...)`

    # if running on Loihi, freeze the parameters back to the network
    test_sim.freeze_params(converter.net)

with nengo_loihi.Simulator(converter.net) as loihi_sim:
    # run on Loihi, e.g. `loihi_sim.run(...)`
    pass

See https://www.nengo.ai/nengo-loihi/examples/keras-to-loihi.html for more details on this type of workflow. Let me know if you have any problems, because it is occasionally possible for inference_only=True to change the number of parameters in the model, which causes problems for load_params.

One thing to note is that kernel regularizers will only regularize the weights; they will not have any (direct) control on the activity (firing rate) of the neurons. When training to run on Loihi, we typically regularize the activities directly. There’s not a perfect example of regularizing activities with the NengoDL Keras converter. The example here https://www.nengo.ai/nengo-dl/examples/keras-to-snn.html#Regularizing-during-training does do regularization, but it tries to push all rates towards the target, which isn’t optimal. Here https://www.nengo.ai/nengo-loihi/examples/cifar10-convnet.html#Train-network-using-NengoDL I use the percentile_l2_loss_range function to regularize the rates, which works a lot better, since it forces e.g. the 99th-percentile firing rates (essentially the max rates minus outliers) to be within a particular range (rather than forcing all rates). Between the two of those, hopefully it’s understandable how you might regularize firing rates in NengoDL.
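As a rough illustration of the percentile-based idea described above, here is a plain-Python sketch that penalizes each neuron only when a high percentile of its firing rates falls outside a target range. It is a sketch of the concept under my own assumptions, not the `percentile_l2_loss_range` implementation from the linked example; the helper names, the interpolation rule, the 0.5 factor, and the rates are all illustrative.

```python
import math


def percentile(sorted_vals, pct):
    """Percentile with linear interpolation between neighbouring ranks."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def percentile_range_loss(rates, min_rate, max_rate, pct=99.0):
    """Penalize neurons whose pct-th percentile rate leaves [min_rate, max_rate].

    `rates[i][j]` is the firing rate (Hz) of neuron j on example i.
    """
    n_neurons = len(rates[0])
    loss = 0.0
    for j in range(n_neurons):
        p = percentile(sorted(row[j] for row in rates), pct)
        low_err = max(0.0, min_rate - p)    # top rates too low
        high_err = max(0.0, p - max_rate)   # top rates too high
        loss += 0.5 * (low_err + high_err) ** 2
    return loss


# Made-up rates: 5 examples x 3 neurons (in range, too high, too low).
rates = [
    [10.0, 50.0, 5.0],
    [150.0, 400.0, 20.0],
    [80.0, 300.0, 0.0],
    [120.0, 100.0, 10.0],
    [0.0, 0.0, 0.0],
]
loss = percentile_range_loss(rates, min_rate=100.0, max_rate=200.0)
print("range loss:", loss)
```

Unlike a loss that pushes every rate toward one target, the first neuron contributes nothing here even though most of its rates are low, because only its near-maximum rate is compared against the range.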

zerone · September 9, 2020, 4:30pm

Thanks @Eric for a detailed response. With respect to kernel regularization all my doubts for now stand resolved. I am yet to have access to Loihi but when I do and in case I run into issues, I will ping here.

However, I have the following two questions with respect to a MaxPooling equivalent in Nengo-DL.

1> Since the network is simulated for n_steps milliseconds and the spiking neurons output spikes (during the interval of n_steps), cannot we simply calculate firing rates (i.e. # spikes / n_steps) and then take a max of those values at the end of n_steps? Or is it that the calculation of max function is not possible with spiking neurons? I am probably missing something fundamental here…

2> If calculation of max function isn’t possible with spiking neurons, and there’s another function which is similar to max operation and it can be calculated with spiking neurons, how do we incorporate its implementation in Nengo-DL converted network?

Please let me know.

Eric · September 9, 2020, 6:45pm

It is possible with spiking neurons, but you have to sum over time (or filter) as you described. This is because we can’t tell which neuron has the highest firing rate from the instantaneous output (i.e. whether each neuron is spiking or not); we need the firing rates themselves, and those take time to estimate. The more time you use to compute the firing rate, the more accurate your max pooling will be, but the longer it will take to compute the output of your network. If you use e.g. 100 steps to average over at your max pooling layer, this means that at each max pooling layer, you need to wait an additional 100 steps. So if you’ve got a deeper network, this can add a lot of latency (e.g. 10 max pooling layers waiting 100 steps each adds 1000 steps of latency). Since energy is proportional to time, the more time it takes to compute the output, the more energy your network uses.
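The accuracy/latency tradeoff can be seen in a toy plain-Python simulation. It assumes simplified integrate-and-fire neurons driven at constant, made-up rates with a 1 ms timestep; the helper names are illustrative and this is not how NengoDL or Loihi implement anything. The "max" is taken as the neuron with the most spikes in a counting window.

```python
def spike_counts(rates_hz, n_steps, dt=0.001):
    """Count spikes from simple integrate-and-fire neurons driven at fixed rates."""
    counts = []
    for rate in rates_hz:
        voltage, count = 0.0, 0
        for _ in range(n_steps):
            voltage += rate * dt
            if voltage >= 1.0:      # threshold crossing emits one spike
                voltage -= 1.0
                count += 1
        counts.append(count)
    return counts


def spiking_max_index(rates_hz, n_steps):
    """Index of the neuron with the most spikes in the window (ties -> first)."""
    counts = spike_counts(rates_hz, n_steps)
    return counts.index(max(counts))


pool_rates = [90.0, 100.0, 80.0, 95.0]   # made-up rates in one 2x2 pool (Hz)
true_max = pool_rates.index(max(pool_rates))

for n_steps in (5, 20, 100, 1000):
    picked = spiking_max_index(pool_rates, n_steps)
    print(n_steps, spike_counts(pool_rates, n_steps), picked == true_max)

# Latency cost from the post: each max pooling layer waits for its own window.
n_pool_layers, window_steps = 10, 100
added_latency_steps = n_pool_layers * window_steps
print("added latency (steps):", added_latency_steps)
```

With very short windows the spike counts tie (or are all zero), so the selected neuron is arbitrary; only longer windows separate rates as close as the ones used here, and every pooling layer pays that waiting time again.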

I’ve honestly never looked at max pooling in spiking networks any more than this. You might be able to find a tradeoff where the increased accuracy is worth the additional latency. As I said, though, most modern architectures have just gotten rid of max pooling entirely (i.e. they’ve decided it’s not really necessary for good accuracy), so that’s the route I’ve gone in all my work as well.

I don’t know anything that’s similar to the max function but easier to compute.

zerone · September 10, 2020, 1:08am

Hello @Eric, thanks for the clarification. I totally get your point about the increased latency from incorporating max pooling. BTW, with strided convolution my network performs close to the MaxPooling one (but not better) in the TF environment (experiments with spiking neurons are in the queue). I guess I will rest the case of MaxPooling for now, since it doesn’t make much sense to trade latency and power consumption for accuracy (when both of these are the focus of my research). Thank you for your inputs!

msanch35 · October 27, 2020, 9:35pm

@zerone Did you ever find a solution to the BatchNormalization warning? I am training a non-spiking network with and without nengo and getting very different results, so I’m trying to figure out if this “overwriting” is the culprit.

zerone · November 16, 2020, 6:48pm

Hello @msanch35! Sorry for the late response… I visited this forum after many days. I did not find any way to suppress the BatchNormalization warning, but I did notice that with the latest version of Nengo-DL (3.3.0) this warning no longer occurs. BTW, I don’t think this warning is the reason for the discrepancy in your results with and without nengo. In my experiments I was able to get similar spiking and non-spiking results by appropriately setting scale_firing_rates, n_steps, and synapse. You may want to tinker with these params.
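For anyone tinkering with these parameters, here is a toy plain-Python sketch of why scale_firing_rates and n_steps matter. It assumes a simplified integrate-and-fire neuron whose input is multiplied by `scale` and whose spikes are divided by `scale`, with a plain average over n_steps standing in for the synaptic filter. The function and values are illustrative only and do not reproduce NengoDL's implementation.

```python
def spiking_relu_estimate(x, scale, n_steps, dt=0.001):
    """Estimate relu(x) from spikes of a simplified integrate-and-fire neuron.

    The input is scaled up by `scale` (more spikes), each spike is scaled
    down by 1 / scale, and the output is averaged over `n_steps` timesteps.
    """
    voltage, n_spikes = 0.0, 0
    drive = max(x, 0.0) * scale
    for _ in range(n_steps):
        voltage += drive * dt
        while voltage >= 1.0:      # allow several spikes per step at high rates
            voltage -= 1.0
            n_spikes += 1
    return n_spikes / (n_steps * dt) / scale


x = 3.7   # made-up ReLU activation to be transmitted with spikes
for scale in (1, 10, 100):
    for n_steps in (30, 300):
        est = spiking_relu_estimate(x, scale, n_steps)
        print(f"scale={scale:<4} n_steps={n_steps:<4} estimate={est:.3f} "
              f"error={abs(est - x):.3f}")
```

In this toy model the estimate is quantized in steps of 1 / (scale * n_steps * dt), so raising either the scale or the number of steps shrinks the gap to the non-spiking value, at the cost of more spikes or more simulation time.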