Using weight decay

kstandvoss · June 9, 2018, 11:53am

Hi, I am trying to reimplement a DL model in nengo_dl and am struggling to get weight decay / L2 regularisation to work. To implement it, I have to add the L2 norm of the weights to the loss during training. I am using tf.layers.dense layers in my network, and TensorFlow conveniently allows a regularisation parameter in the layer. However, all that does is register the L2 norm of the weights in a collection, which then has to be added to the loss manually. I define the regular loss as an objective, which I pass to the sim.train function of nengo_dl. The problem is that the training loop seems to be executed in a different tf.Session than the network construction, so I cannot access the L2 loss in my objective. Has anyone dealt with this and can give me a hint on how to get around it? Maybe I am just not completely understanding the tf.Session mechanism, but in any case any advice would be very welcome.

drasmuss · June 11, 2018, 6:55pm

Sorry for the slow response, I missed your question earlier. Here’s a simple example of one method for applying L2 weight regularization (I tried to make it as TensorFlow-like as possible so it would look familiar; you’d do things differently if you wanted to regularize parameters in a more Nengo-style network).

import nengo
import numpy as np
import nengo_dl
import tensorflow as tf

with nengo.Network() as net:
    stim = nengo.Node([1])

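    class DenseLayer(object):
        def pre_build(self, shape_in, shape_out):
            # create the weights once, outside the simulation loop, and
            # register an L2 regularization loss for them
            self.W = tf.get_variable(
                "weights", shape=(shape_in[1], shape_out[1]),
                regularizer=tf.contrib.layers.l2_regularizer(0.1))

        def __call__(self, t, x):
            # apply the weights on every simulation timestep
            return tf.matmul(x, self.W)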


    a = nengo_dl.TensorNode(DenseLayer(), size_in=1, size_out=10)
    nengo.Connection(stim, a)

    p = nengo.Probe(a)

with nengo_dl.Simulator(net) as sim:
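    def my_objective(outputs, targets):
        # mean squared error plus the sum of all registered regularization
        # losses (here just the one L2 term created in DenseLayer.pre_build)
        return (tf.reduce_mean(tf.square(outputs - targets)) +
                tf.add_n(tf.get_collection(
                    tf.GraphKeys.REGULARIZATION_LOSSES)))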


    for _ in range(10):
        sim.train({stim: np.ones((1, 1, 1))}, {p: np.ones((1, 1, 10))},
                  tf.train.GradientDescentOptimizer(1.0),
                  objective=my_objective)
        print(np.linalg.norm(sim.sess.run(a.tensor_func.W)))

You should see the norm of the weights decreasing every time we call sim.train (note that normally you’d just call sim.train with n_epochs=10 for a case like this, but I separated it out so that you can see the weights going down with each training iteration).
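To see why an L2 term makes the weight norm shrink, here is a minimal NumPy sketch (not NengoDL or TensorFlow code). It assumes plain gradient descent and a penalty of the form scale * sum(W ** 2) / 2, which is the form tf.contrib.layers.l2_regularizer computes. The function name train, the learning rate, and the zero input are illustrative assumptions; the zero input just removes the data term so that only the penalty acts on the weights.

```python
import numpy as np


def train(W, x, target, scale, lr=0.1, n_steps=10):
    """Gradient descent on MSE + scale * sum(W ** 2) / 2; returns weight norms."""
    norms = [np.linalg.norm(W)]
    for _ in range(n_steps):
        err = x @ W - target
        grad_mse = 2.0 * x.T @ err / err.size  # gradient of the mean squared error
        grad_l2 = scale * W                    # gradient of the L2 penalty
        W = W - lr * (grad_mse + grad_l2)
        norms.append(np.linalg.norm(W))
    return W, norms


rng = np.random.RandomState(0)
W0 = rng.uniform(-1, 1, size=(1, 10))
x = np.zeros((1, 1))        # assumption: zero input isolates the penalty term
target = np.ones((1, 10))

_, norms_decay = train(W0.copy(), x, target, scale=0.1)  # with L2 penalty
_, norms_plain = train(W0.copy(), x, target, scale=0.0)  # without L2 penalty
```

In this sketch each step multiplies the weights by (1 - lr * scale), so norms_decay falls geometrically while norms_plain stays constant.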

The main complication you’ll note is that we have to define our own layer function instead of using tf.layers.dense. This doesn’t actually have anything to do with the regularization, it’s just a feature of the tf.layers implementation. tf.layers combines the creation of the parameter variables (e.g. connection weights) and the application of those parameters to some input x in a single step. Normally this is convenient, and works great. However, when you use tf.layers inside a tf.while_loop (which we do in NengoDL, since we want to simulate models over time), this puts the parameters inside the while loop scope. Again, normally this is fine, unless you also want to use those parameters outside the while loop scope (e.g., when doing something like computing the L2 regularization loss). TensorFlow doesn’t let you use things from inside the while loop scope on the outside. So, long story short, you can’t use tf.layers functions to do regularization if you’re using them inside a tf.while_loop.

So that’s why we create our own version of tf.layers.dense, the DenseLayer TensorNode. The key feature is that we can separate the parameter creation (inside the pre_build function, which runs outside the tf.while_loop scope) from the application of those parameters (inside the __call__ function). That lets us use tf.contrib.layers.l2_regularizer even though we’ll be running things in a while loop. (Note that I didn’t include biases in DenseLayer, just to keep things simple, but they’d work in the same way.)
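The create-once, apply-many-times split can be sketched without TensorFlow. The following is a NumPy stand-in, not the NengoDL API: the class name DenseLayerSketch, the random initialization, and the plain Python loop standing in for the tf.while_loop are all assumptions made for illustration.

```python
import numpy as np


class DenseLayerSketch:
    """NumPy stand-in for the DenseLayer TensorNode (illustration only)."""

    def pre_build(self, shape_in, shape_out):
        # Called once, before the simulation loop: create the parameters here.
        rng = np.random.RandomState(0)
        self.W = rng.uniform(-1, 1, size=(shape_in[1], shape_out[1]))

    def __call__(self, t, x):
        # Called on every timestep inside the loop: only apply the parameters.
        return x @ self.W


layer = DenseLayerSketch()
layer.pre_build((1, 1), (1, 10))

x = np.ones((1, 1))
outputs = [layer(t, x) for t in range(3)]  # stands in for the simulation loop

# Because W was created outside the loop, it is still reachable afterwards,
# e.g. to compute an L2 penalty (0.1 is the scale used in the example above).
l2_loss = 0.1 * 0.5 * np.sum(layer.W ** 2)
```

The point of the design is that anything created in pre_build lives outside the loop, so code that runs after the loop (such as a regularization term in the objective) can still refer to it.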

drasmuss · June 11, 2018, 7:10pm

Also, just to clarify this specifically: the training does run in the same Session/Graph as the network construction. However, you need to be careful about what the current “default graph” is when you’re working with TensorFlow, as many functions operate with respect to that default graph. For example, tf.get_collection returns items from the current default graph (I’m guessing this is what you were running into). Inside the nengo_dl.Simulator scope, the default graph is set to the correct NengoDL graph. But outside that scope we don’t control the default graph, so it will depend on what else is going on in your script. For example:

key = tf.GraphKeys.REGULARIZATION_LOSSES
tf.get_collection(key)  # <--- returns items from the default graph, whatever that is
with nengo_dl.Simulator(net) as sim:
    tf.get_collection(key)  # <--- returns items from the nengo_dl.Simulator graph