[Nengo-DL]: Why is the output of `converter.net.all_nodes` different upon inclusion/exclusion of kernel regularizer?

Hello everyone,

I am including below a small script to reproduce the issue I am facing.

import nengo
import numpy as np
import tensorflow as tf

import nengo_dl

seed = 0
np.random.seed(seed)
tf.random.set_seed(seed)

def get_model(include_kr=False):
    inp = tf.keras.Input(shape=(28, 28, 1))
    
    # convolutional layers
    if include_kr:
        conv0 = tf.keras.layers.Conv2D(
            filters=32,
            kernel_size=3,
            activation=tf.nn.relu,
            kernel_regularizer=tf.keras.regularizers.l2(1e-3),
        )(inp)
        
        conv1 = tf.keras.layers.Conv2D(
            filters=64,
            kernel_size=3,
            strides=2,
            activation=tf.nn.relu,
            kernel_regularizer=tf.keras.regularizers.l2(1e-3),
        )(conv0)
    else:
        conv0 = tf.keras.layers.Conv2D(
            filters=32,
            kernel_size=3,
            activation=tf.nn.relu,
        )(inp)
        
        conv1 = tf.keras.layers.Conv2D(
            filters=64,
            kernel_size=3,
            strides=2,
            activation=tf.nn.relu,
        )(conv0)
    
    flatten = tf.keras.layers.Flatten()(conv1)
    
    # fully connected layers
    if include_kr:
        dense = tf.keras.layers.Dense(
            units=32,
            activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(1e-3),
        )(flatten)
        dense = tf.keras.layers.Dense(
            units=64,
            activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(1e-3),
        )(dense)
    else:
        dense = tf.keras.layers.Dense(units=32, activation="relu")(flatten)
        dense = tf.keras.layers.Dense(units=64, activation="relu")(dense)

    # output layer
    dense = tf.keras.layers.Dense(units=10, activation="softmax")(dense)
    
    model = tf.keras.Model(inputs=inp, outputs=dense)
    model.summary()
    return model

As you can see, depending on the value of include_kr, get_model() returns the model with or without kernel regularizers on the Conv and intermediate Dense layers. Upon converting the model and inspecting the output of converter.net.all_nodes, I see the two different outputs below. Is this expected behaviour? Why are there bias_relay nodes (in Nengo-DL v3.4.0; these appear as unlabeled nodes in Nengo-DL v3.2.0), as well as nodes with different labels (i.e. ".0.bias" appended to layer names)?

With include_kr=False:

model = get_model()
converter = nengo_dl.Converter(model)
converter.net.all_nodes

Output:

[<Node "input_1" at 0x2abef3fa8150>,
 <Node "conv2d.0.bias" at 0x2abef157c410>,
 <Node "conv2d.0.bias_relay" at 0x2abef3fc64d0>,
 <Node "conv2d_1.0.bias" at 0x2abef3fc6a50>,
 <Node "conv2d_1.0.bias_relay" at 0x2abef3fc6ad0>,
 <TensorNode "dense_2.0" at 0x2abef154d110>,
 <Node "dense_2.0.bias" at 0x2abefff284d0>]

With include_kr=True:

model = get_model(include_kr=True)
converter = nengo_dl.Converter(model)
converter.net.all_nodes

Output:

[<Node "input_1" at 0x2b993bee9910>,
 <TensorNode "conv2d" at 0x2b993bf29d10>,
 <TensorNode "conv2d_1" at 0x2b993bf4b2d0>,
 <TensorNode "dense" at 0x2b993bf4bcd0>,
 <TensorNode "dense_1" at 0x2b993bf4bd50>,
 <TensorNode "dense_2.0" at 0x2b993bf55390>,
 <Node "dense_2.0.bias" at 0x2b993bf559d0>]

Please resolve.

Hi @zerone

The purpose of NengoDL’s converter is to attempt to convert a TF model into a Nengo network. Part of this process involves matching TF layers with corresponding Nengo objects. However, since Nengo has no native implementation of regularizers, when NengoDL attempts to convert a Conv2D layer with a kernel regularizer, it is unable to construct the Nengo equivalent, and thus defaults to using a nengo_dl.TensorNode object to implement the Conv2D (with regularizer) layer.

With your example code, you will see that NengoDL has reported this substitution, with the following message appearing in the console output:

...\nengo_dl\converter.py:326: UserWarning: conv2d.kernel_regularizer has value <tensorflow.python.keras.regularizers.L2 object at 0x000001FD02C0A2B0> != None, which is not supported (unless inference_only=True). Falling back to TensorNode.
  warnings.warn(
...\nengo_dl\converter.py:326: UserWarning: conv2d_1.kernel_regularizer has value <tensorflow.python.keras.regularizers.L2 object at 0x000001FD02C0A880> != None, which is not supported (unless inference_only=True). Falling back to TensorNode.
  warnings.warn(

If your goal is to do the model training within NengoDL, then the difference in network architecture between the regularizer / non-regularizer cases is unavoidable. There is, unfortunately, no way to stop the NengoDL converter from falling back to nengo_dl.TensorNode objects for the Conv2D (with regularizer) layers.

However, if you are able to structure your code such that the training is done in TensorFlow, and you are only looking to convert the trained model (without performing further training involving the regularizers in NengoDL), then you can use the inference_only=True flag in the nengo_dl.Converter call to ignore the regularizers:

model = get_model(include_kr=True)
converter = nengo_dl.Converter(model, inference_only=True)
converter.net.all_nodes

Hello @xchoo! Thank you for your explanation. I have been using kernel_regularizer while training the network in TF mode, and then using nengo_dl.Converter() with inference_only=True to convert the TF-trained network to a spiking one for testing, as you suggested. This has caused me no issues in getting a network with all Nengo objects (except the output layer with softmax, which can be ignored in my use case).

For further context, let us not set inference_only=True, and consider training the Nengo-DL network (instead of the TF network), because that's where I am facing issues.

I am more concerned about the difference in label names and the extra Node (unlabeled) at <memory-address> entries, which caused my code to break (it was expecting the same label names irrespective of the presence or absence of the regularizer; of course, I can manually create the bias inputs too). The presence of the extra Node (unlabeled) at <memory-address> entries in the list is surprising to me. BTW, upon updating my nengo-dl version (from 3.2.0 to 3.4.0) I saw that the unlabeled nodes are now named conv2d.0.bias_relay (please see the updated code and output in the question). Can you please explain their use case, or what bias_relay means here?

In addition, I noticed one other odd thing. ReLU Dense layers with no kernel regularizers simply do not have a corresponding label (or bias label) in the all_nodes output, which forced me to omit their bias inputs (e.g. np.ones((batch_size, 64, 1), dtype=np.int32)) from the input dict during batch training. So there are two problems I am facing right now while training the Nengo-DL network (with rate neurons, of course):

1. Presence of extra conv2d.0.bias_relay labels depending on the presence or absence of kernel_regularizer. Why so?
2. Absence of bias labels for the ReLU Dense layers depending on the presence or absence of kernel_regularizer. Or is it that the ReLU Dense layers' bias inputs are not required when kernel_regularizer is not present?

Please let me know.

Architecturally, the converted Nengo network is different in the kernel-regularizer vs no-regularizer cases. In the no-regularizer case, the converted network consists of a mix of Nengo objects (like nengo.Ensemble, nengo.Node) and TensorNode objects. However, in the kernel-regularizer case, pretty much the entire network consists of TensorNode objects because there are no Nengo native objects that support the regularizer implementation.

In your test network, both the Conv2D and Dense layers contain bias values. When converted to a Nengo network, these bias values are implemented as a projection from a nengo.Node to the corresponding Nengo object. For example, a Dense layer would be constructed like so:

(input) --+--> nengo.Ensemble --> (output)
(bias)  --'

or, alternatively, in Nengo (pseudo)code, where <input> stands for the previous layer's output and bias_weights for the layer's trained bias values:

dense = nengo.Ensemble(...)       # the Dense layer's neurons
bias = nengo.Node(output=1)       # constant-one bias source
nengo.Connection(<input>, dense)  # feedforward input
nengo.Connection(bias, dense, transform=bias_weights)  # per-neuron bias values

For the Conv2D layer, it gets a little more complicated. Each filter's bias value is shared across all of the spatial positions in that filter's output channel, so a bias_relay node is added to project the output of the bias weights across each channel:

bias (nengo.Node) --> bias weights --> bias_relay (nengo.Node) --> Conv2D channel (nengo.Ensemble)
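
In rough Nengo pseudocode, that bias path would look something like the following. This is an illustrative sketch only, not the converter's actual internals; n_filters, bias_weights, and conv are placeholder names:

bias = nengo.Node(output=1)                              # single constant-one source
bias_relay = nengo.Node(output=None, size_in=n_filters)  # passthrough node
nengo.Connection(bias, bias_relay, transform=bias_weights)  # one bias value per filter
nengo.Connection(bias_relay, conv.neurons, transform=...)   # duplicate each filter's bias
                                                            # across its spatial positions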

When you use the kernel regularizer, the NengoDL converter is unable to convert the Conv2D or Dense layers into their respective nengo.Ensemble and nengo.Node objects, so each such layer gets implemented inside its own nengo_dl.TensorNode. The bias logic is included inside the TensorNode, so there is no need for the extra bias or bias_relay nodes.

As I described above, the NengoDL converter converts TensorFlow layers into constituent Nengo objects. When the kernel regularizer is not used, it is able to do so successfully, and the Dense layers (as well as the Conv2D layers) get converted into nengo.Ensemble objects. The converter.net.all_nodes attribute only lists nengo.Nodes (and subclasses), so nengo.Ensemble objects do not appear in this list. To display the list of all nengo.Ensemble objects in a network, you’ll need to use the .all_ensembles attribute:

model = get_model(include_kr=False)  # or include_kr=True for the regularizer case
converter = nengo_dl.Converter(model, inference_only=False)
print("All nodes:", converter.net.all_nodes)
print("All ensembles:", converter.net.all_ensembles)

Doing so for your example network yields the following results. For the no-regularizer case, it looks like this:

All nodes: [<Node 'input_1' at 0x20e9a29af10>, 
            <Node 'conv2d.0.bias' at 0x20e9a2bb250>, 
            <Node 'conv2d.0.bias_relay' at 0x20e9a2bb2b0>, 
            <Node 'conv2d_1.0.bias' at 0x20e9a27b220>, 
            <Node 'conv2d_1.0.bias_relay' at 0x20e9a27b370>, 
            <TensorNode 'dense_2.0' at 0x20e9a264d30>, 
            <Node 'dense_2.0.bias' at 0x20e9a264eb0>]
All ensembles: [<Ensemble 'conv2d.0' at 0x20e9a2bb160>, 
                <Ensemble 'conv2d_1.0' at 0x20e9a27b310>, 
                <Ensemble 'dense.0' at 0x20e9a26d1f0>, 
                <Ensemble 'dense_1.0' at 0x20e9a26d550>]

As a side note, I believe dense_2.0 is a TensorNode because the activation function for that layer is softmax, which is not a supported Nengo neuron type.

Likewise, if you include the kernel regularizers, you get a different Nengo network entirely:

All nodes: [<Node 'input_1' at 0x24aa1a0f370>, 
           <TensorNode 'conv2d' at 0x24aa1a0fd90>, 
           <TensorNode 'conv2d_1' at 0x24aa1a60d90>, 
           <TensorNode 'dense' at 0x24aa1a7bbb0>, 
           <TensorNode 'dense_1' at 0x24aa1a84550>, 
           <TensorNode 'dense_2.0' at 0x24aa1a84a30>, 
           <Node 'dense_2.0.bias' at 0x24aa1a84df0>]
All ensembles: []

Here, because the NengoDL converter is unable to convert the regularizers into Nengo objects, everything gets converted into TensorNodes, and there are no nengo.Ensemble objects created (the list is empty).

Thank you for the detailed explanation of the difference between the outputs of converter.net.all_nodes in the absence/presence of the kernel regularizer. This resolves my doubt. However, during batch training of the Nengo-DL network (not the direct TF one; I am aware that it uses TF underneath to train the network), we have to provide the bias inputs in the input dict passed to sim.fit(batches) (where batches = (input_dict, output_dict)). Example below (for a different but similar architecture):

input_dict = {
    "input_1": imgs[start:start+batch_size],
    "n_steps": np.ones((batch_size, 1), dtype=np.int32),
    "conv2d.0.bias": np.ones((batch_size, 32, 1), dtype=np.int32),
    "conv2d_1.0.bias": np.ones((batch_size, 64, 1), dtype=np.int32),
    "conv2d_2.0.bias": np.ones((batch_size, 64, 1), dtype=np.int32),
    "conv2d_3.0.bias": np.ones((batch_size, 96, 1), dtype=np.int32),
    "conv2d_4.0.bias": np.ones((batch_size, 128, 1), dtype=np.int32),
    "dense.0.bias": np.ones((batch_size, 10, 1), dtype=np.int32),
    "dense_1.0.bias": np.ones((batch_size, 10, 1), dtype=np.int32),
    "dense_2.0.bias": np.ones((batch_size, 10, 1), dtype=np.int32),
}
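
As an aside, those bias entries could also be built programmatically from the converted network rather than hard-coded. This is a hedged sketch, assuming node.size_out matches the per-layer sizes (32, 64, ...) used above; the .endswith(".bias") filter skips the input node and the *.bias_relay nodes:

bias_inputs = {
    node.label: np.ones((batch_size, node.size_out, 1), dtype=np.int32)
    for node in converter.net.all_nodes
    if node.label is not None and node.label.endswith(".bias")
}
input_dict = {
    "input_1": imgs[start:start+batch_size],
    "n_steps": np.ones((batch_size, 1), dtype=np.int32),
    **bias_inputs,  # one entry per *.0.bias node in the converted network
}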

I was previously building keys like nodes[i].label + ".0.bias" with values np.ones((batch_size, 32, 1), dtype=np.int32) in my input_dict, where nodes = converter.net.all_nodes, back when my network layers had kernel regularizers. But the code failed (as I pointed out earlier) when I later removed the kernel regularizers; thus, I had to provide the bias inputs for each layer (except the input layer) explicitly by its string name.

Now that you have pointed out the different representations of layers (i.e. Ensembles and TensorNodes) in converter.net.all_nodes and converter.net.all_ensembles, I think I have to consider all_ensembles too when building my input_dict. This brings me to a follow-up question. Is it necessary to explicitly provide the <layer_name>.0.bias input (as np.ones()) for each layer, or does Nengo-DL take care of the biases for some layers? Upon experimentation with no kernel regularizer in any of the layers (on a different but similar architecture), what I observed was that sim.fit(batches) requires bias inputs for the Conv layers and the last Dense layer (which is a TensorNode due to its softmax activation), but did not complain about the absence of bias inputs for the two intermediate Dense layers (with just ReLU neurons). Does Nengo-DL take care of the bias input internally for them?

For the different architecture in question above, below is the output of converter.net.all_nodes:

[<Node "input_1" at 0x2b6cf1cc49d0>,
 <Node "conv2d.0.bias" at 0x2b6cf1cc4ad0>,
 <Node "conv2d.0.bias_relay" at 0x2b6cf1cc4bd0>,
 <Node "conv2d_1.0.bias" at 0x2b6cf1cf2310>,
 <Node "conv2d_1.0.bias_relay" at 0x2b6cf1cf2350>,
 <Node "conv2d_2.0.bias" at 0x2b6cf1cf2a90>,
 <Node "conv2d_2.0.bias_relay" at 0x2b6cf1cf2ad0>,
 <Node "conv2d_3.0.bias" at 0x2b6cf1cfa190>,
 <Node "conv2d_3.0.bias_relay" at 0x2b6cf1cfa1d0>,
 <Node "conv2d_4.0.bias" at 0x2b6cf1cfa890>,
 <Node "conv2d_4.0.bias_relay" at 0x2b6cf1cfa8d0>,
 <TensorNode "dense_2.0" at 0x2b6cf2099350>,
 <Node "dense_2.0.bias" at 0x2b6cf2099410>]

and the output of converter.net.all_ensembles:

[<Ensemble "conv2d.0" at 0x2b6cf1cc4a10>,
 <Ensemble "conv2d_1.0" at 0x2b6cf1cf2110>,
 <Ensemble "conv2d_2.0" at 0x2b6cf1cf2890>,
 <Ensemble "conv2d_3.0" at 0x2b6cf1cf2f50>,
 <Ensemble "conv2d_4.0" at 0x2b6cf1cfa650>,
 <Ensemble "dense.0" at 0x2b6cf1cfac90>,
 <Ensemble "dense_1.0" at 0x2b6cf1cfae50>]

It looks like one only needs to provide bias inputs for entries in the converter.net.all_nodes output (and even then, only for the *.0.bias nodes, not the *.0.bias_relay ones). Please confirm.

It strikes me as odd that you have to specify the bias inputs during the training process with NengoDL. The NengoDL simulator should handle the bias weights training for you, and the bias nodes should already be outputting a value of np.ones(). When NengoDL creates the bias nodes, it creates them with an output of 1, which is then fed through the bias weights to give the appropriate bias values for each neuron in the layer.

If you could provide more context as to what you are attempting to achieve with and without the kernel regularizer, that would be helpful. From my quick analysis of your network, you really should only need to provide the input to input_1.

nengo-forum.ipynb (7.0 KB)

Please find attached the file in which I have batch-wise trained and tested a simple Nengo-DL model. If you comment out any of "n_steps", "conv2d.0.bias", "conv2d_1.0.bias", or "dense_2.0.bias" in input_dict (in cell 4), the training process will throw an error complaining that there's no input for them (they are all listed in converter.net.all_nodes). Interestingly, it doesn't require bias inputs for "dense.0.bias" or "dense_1.0.bias", and I find the corresponding entries (i.e. "dense.0", "dense_1.0") in converter.net.all_ensembles.

It looks like Nengo-DL doesn't require any input for the bias_relay nodes. Please also note that I am not that concerned about including or excluding kernel regularizers as of now; I just observed a difference and that's why I pointed it out - and you have already resolved it! The model in the uploaded file doesn't use a kernel regularizer. One extra small question though: does setting nengo_dl.configure_settings(stateful=False) during training affect the training process in any way?

Edit: I am on

nengo.__version__, nengo_dl.__version__, tf.__version__
('3.1.0', '3.4.0', '2.2.0')

Hmmm. Yes, I see the issue. I’m definitely not a TensorFlow expert, and have not used data generators with the sim.fit function before, but from my quick poking around, it looks like the issue is that if you use a generator to generate the data for the fit function, TensorFlow (or the TensorFlow magic interpretation code) expects to get data for every single Input (i.e., all of the Nengo nodes) in the network.

That’s my suspicion so far, because if I comment out the generator and just do:

sim.fit({converter.inputs[inp]: train_x}, {"probe": train_y})

everything works fine. I’ll have to double check with the NengoDL dev on Monday to see if this is an oversight or bug, or perhaps there’s another way to use the data generator with the sim.fit function within NengoDL.

Some other observations:
If I modify the generator code to use converter.inputs[inp] as the key for the input_dict, I get this error:

ValueError: Unsupported value type Node returned by IteratorSpec._serialize

This seems to indicate that when you use the generator, it expects the key to be a string, which has to be the name of the layer. This means that, the way the code is written, it might fail on the second "run" of a cell (only in a Jupyter notebook though), because on subsequent runs the Input layer gets named "input_2", then "input_3", etc. So, to fix this, you'll need to give the input layer a fixed label.
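
For example (a minimal sketch; the fixed name is arbitrary, it just has to match the string keys used in your input_dict):

inp = tf.keras.Input(shape=(28, 28, 1), name="input_1")
# build the rest of the model as before; the converted input node keeps this
# label, so the "input_1" key in input_dict stays valid across notebook re-runs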

In Nengo, there is a slight difference between the bias node and the bias_relay node. The bias node only outputs a value, and does not accept any input values. E.g., like so:

A = nengo.Node(output=1)

# Can't do this (node doesn't accept inputs because `size_in` is not specified):
nengo.Connection(ens, A)

# Can do this:
nengo.Connection(A, ens)

The bias_relay node, however, is a special-case node called a "passthrough node". This type of node accepts an input value and duplicates that input as its output value. E.g.:

B = nengo.Node(output=None, size_in=1)  # output=None makes this a passthrough node

# Can do this:
nengo.Connection(ens, B)

# Can also do this:
nengo.Connection(B, ens)

It’s likely that when NengoDL passes the "input layers" to the TensorFlow backend, it only considers the pure output nodes (those that accept no input) as input layers, and thus the TensorFlow backend only expects input data definitions for those nodes.

Here’s where I learned how to do batch training; maybe this will help you get some context.

It fails exactly as you mentioned, so I need to restart the kernel and run it afresh. Labeling the layers will help.

Got your point. Thanks! BTW, I guess the following small question was probably missed in the details.

Please let me know.

This is something else I’ll have to double check, but if I remember correctly, stateful is only useful if you:

  • Want to continue a simulation run (and maybe training) from where you last left off.
  • Have some sort of recurrence in the network that stores some state.

In every other case, I think setting stateful=False will speed up the simulation / training as opposed to leaving it on.

Sure… Thanks @xchoo. Got it.

Hi @zerone,
I checked with our NengoDL devs and, regarding the use of data generators, the behaviour (where you have to specify the input data for all input nodes) is expected. If you want to read more about it, you can find a blurb about this in the sim.fit documentation:

For dynamic input types (e.g., tf.data pipelines or generators), NengoDL tries to avoid introspecting/altering the data before the simulation starts, as this may have unintended side-effects. So data must be specified via one of the standard Keras methods (arrays, list of arrays, or string name dictionary; using a dictionary of Node objects is not supported). In addition, data must be explicitly provided for all input nodes (it will not be automatically generated if data is not specified).

In addition, when using dynamic inputs, data must be provided for the special "n_steps" input. This specifies the number of timesteps that the simulation will run for. Technically this is just a single scalar value (e.g., 10). But Keras requires that all input data be batched, so that input value needs to be duplicated into an array with size (batch_size, 1) (where all entries have the same value, e.g. 10).
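
For example (a minimal sketch for a 10-timestep simulation):

input_dict["n_steps"] = np.full((batch_size, 1), 10, dtype=np.int32)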

Regarding the stateful attribute, the dev informed me that stateful=True is useful if your network includes anything that requires a state to be tracked. This may include (but is not limited to) recurrent connections, synapses, and neuron types that maintain some sort of state. Regarding runtimes, setting stateful=False will generally improve the runtime of all of the sim.* calls. It is also important to note that, by default, stateful=True is only set for sim.run; for all other sim calls, stateful defaults to False unless you manually configure it in your code (i.e., by doing nengo_dl.configure_settings(stateful=False)).
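
For example (a sketch, assuming you are training the converted network from earlier in this thread, with input_dict and output_dict as before):

with converter.net:
    nengo_dl.configure_settings(stateful=False)  # disable state tracking by default

with nengo_dl.Simulator(converter.net, minibatch_size=batch_size) as sim:
    sim.fit(x=input_dict, y=output_dict)  # training proceeds as before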

Thanks a lot @xchoo for getting back at it! This helps.