How to predict a single image with nengo_dl / simulator?

Sorry if this question has been asked before. I’ve been working with the nengo_dl MNIST classification example. As discussed in the tutorial, I’ve been able to train the network with SoftLIF neurons and then swap in LIF neurons for inference, resulting in a ~2% error rate.

My question now is: how can I pass a single image to this network? I would like to read a single image from my camera and make a prediction with NengoDL. (The images are not necessarily handwritten digits.)

I have the simulator:

  sim = nengo_dl.Simulator(self.net, minibatch_size=1, unroll_simulation=10)
  sim.load_params("./mnist_params")

I see that sim.run_steps is good for feeding minibatches of images into the network, but it doesn’t seem appropriate for a single image. How can I get a prediction for a single image? Sorry if this is covered somewhere; I wasn’t able to find it!

We can use the same method to feed in a single image as we would for a batch of images (effectively just using a batch size of one). Using the same variable names as in the MNIST example, this would look something like:

import numpy as np

img = <load image>

# add batch/time dimensions (with size 1)
img = np.reshape(img, (1, 1, <number of pixels in image>))

# tile the image along the time dimension for however many timesteps we need
img = np.tile(img, (1, n_steps, 1))

# run the simulation and feed in our image
sim.run_steps(n_steps, input_feeds={inp: img})

# view the output results for the image
print(sim.data[out_p])
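
After the run, the probe data has shape (minibatch_size, n_steps, dimensions). Assuming out_p probes the 10-dimensional class output, as in the MNIST example, the predicted class for our single image would be something like

# take the output at the last timestep of the first (and only) batch item,
# and pick the most active class
predicted_class = np.argmax(sim.data[out_p][0, -1])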

Thanks for your response, that’s very helpful.

I’ve been exploring an idea where I would like to learn an encoding for an image using nengo_dl, then use reinforcement learning to learn a behavior for a robot based on the observed input / encoding.

I’m still in the process of working through the examples that were provided in this discussion.

Do you have any other tips or tricks that you can suggest on doing reinforcement learning using an image as input?

You’ll probably want to look into deep reinforcement learning methods. Deep Q-learning (DQN) would be a good place to start, as it is essentially the same as normal TD learning, just with a few tricks added to make learning more stable.
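
For a rough sense of what that looks like, here is a minimal sketch of the temporal-difference target that DQN trains towards. The function and variable names here are hypothetical, and the separate target network is one of the stabilizing tricks mentioned above:

import numpy as np

def dqn_targets(rewards, q_target_next, dones, gamma=0.99):
    # TD target: y = r + gamma * max_a' Q_target(s', a'),
    # with no bootstrapping on terminal transitions
    max_next = q_target_next.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next

# q_target_next: Q-values from a slowly-updated "target" network for the
# next states, shape (batch_size, n_actions). The online Q-network is then
# trained to regress Q(s, a) toward these targets (e.g. with an MSE loss),
# typically on transitions sampled from a replay buffer.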

Hello @drasmuss, I was reading your answer, as it is helpful for an issue I have. I have a CNN in Keras which takes an input of shape (224, 224, 3), an RGB frame. First I converted my model to a Nengo one. Then I tried to use the reshape/tile code from your answer above to produce the input for my converted Nengo model.

I did not manage to replace <number of pixels in image> with something like a tensor of the desired dimensions without getting an error. Is something like that possible? Or should I just replace it with the total number of pixels, 224x224x3?

Just in case, I tried the latter: the network predicts correctly, but I get the following warning, which I assume is produced by the different dimensions of the input tensor ((1, 1, 224x224x3) instead of (224, 224, 3)).

WARNING:tensorflow:11 out of the last 11 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7fbb48d57040> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

Thank you for your time

That’s correct. If your image contains multiple values per pixel, you’ll need to flatten the entire thing into a single vector. When you do, you’ll also need to adjust the shapes of the layers appropriately to account for this.
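
For example, a minimal sketch of that flattening for a 224x224 RGB frame (the zero image and the value of n_steps here are just stand-ins):

import numpy as np

img = np.zeros((224, 224, 3))  # stand-in for a real RGB camera frame
n_steps = 30                   # an assumed number of presentation timesteps

# flatten all pixel values into one vector, add batch/time dimensions,
# then tile along the time dimension as before
img = np.reshape(img, (1, 1, 224 * 224 * 3))
img = np.tile(img, (1, n_steps, 1))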

If you look through this NengoLoihi example, you’ll see that we work with colour images there. The example goes through, step by step, how the colour images were pre-processed and how the network was designed.