Insufficient memory for VGG-16 simulation with NengoDL

Hi! I am new to NengoDL and am working through a conversion example using a pre-trained Keras VGG16 model. Following the converter examples in the NengoDL documentation, I am able to convert the model without issue. However, when I attempt to simulate it on Google Colab, which should have over 10 GB of RAM, the runtime quickly exhausts the available memory and crashes. The main code steps are included below.

Do you have any recommendations for converting a model such as VGG16 to an SNN? Verifying with the Converter.verify() function also resulted in the following mismatch:


ValueError                                Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 converter.verify()

/usr/local/lib/python3.6/dist-packages/nengo_dl/converter.py in verify(self, training, inputs, atol, rtol)
    272                 logger.info("Nengo:\n%s", nengo_vals[fails])
    273                 raise ValueError(
--> 274                     "Output of Keras model does not match output of converted "
    275                     "Nengo network"
    276                 )

ValueError: Output of Keras model does not match output of converted Nengo network

Thank you kindly for your help and insight!


%matplotlib inline

from urllib.request import urlretrieve

import nengo
%tensorflow_version 2.x
import tensorflow as tf
device_name = tf.test.gpu_device_name()
# Set through Edit > Notebook Settings > Enable GPU
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
import numpy as np
import matplotlib.pyplot as plt

import nengo_dl

from tensorflow.keras.applications import vgg16

vgg_model = vgg16.VGG16(weights='imagenet')

(train_images, train_labels), (test_images, test_labels) = (
    tf.keras.datasets.mnist.load_data())

# flatten images
train_images = train_images.reshape((train_images.shape[0], -1))
test_images = test_images.reshape((test_images.shape[0], -1))

for i in range(3):
    plt.figure()
    plt.imshow(np.reshape(train_images[i], (28, 28)),
               cmap="gray")
    plt.axis('off')
    plt.title(str(train_labels[i]));
    
from tensorflow.keras.activations import relu

# Change the final activation function to ReLU, since the SNN conversion
# does not support softmax
vgg_model.layers[-1].activation = relu

converter = nengo_dl.Converter(
    vgg_model,
    max_to_avg_pool=True,
    swap_activations={tf.keras.activations.relu: nengo.SpikingRectifiedLinear()},
)

converter.verify()

minibatch_size = 64

with nengo_dl.Simulator(converter.net, minibatch_size=minibatch_size) as sim:
    sim.step()

Insufficient Memory:
NengoDL tends to consume more memory than the original Keras application, since it has to set up extra functionality (e.g., simulating the model over time). A network as large as VGG-16 with minibatch_size=64 requires significant memory, so the out-of-memory error is not surprising. To begin with, I would decrease minibatch_size in order to fit the model on the available hardware.
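For example (the minibatch_size of 8 below is just an illustration; pick whatever value fits in the available memory):

minibatch_size = 8  # illustrative value; tune to the available memory

with nengo_dl.Simulator(converter.net, minibatch_size=minibatch_size) as sim:
    sim.step()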
There is also some functionality that can be disabled through nengo_dl.configure_settings if it is not needed, e.g.

with converter.net:
    nengo_dl.configure_settings(
        stateful=False,
        inference_only=True,
        use_loop=False,
    )
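As I understand the documentation: inference_only=True skips the setup needed for training, stateful=False avoids preserving simulator state between calls, and use_loop=False builds the model without the outer simulation loop (with the trade-off that it can then only run for a fixed number of timesteps). Note that these settings need to be applied before the Simulator is created.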

For the converter.verify error:
Converting the model with nengo_dl.Converter(vgg_model, max_to_avg_pool=True, swap_activations={tf.keras.activations.relu: nengo.SpikingRectifiedLinear()}) changes the converted model, so its output will not match the original Keras model. However, converter.verify is also failing with a plain nengo_dl.Converter(vgg_model) when the last layer's activation function is updated. We are looking into this issue.

Thank you for the advice and helpful suggestions. I will look at reducing the size of the model.

Very much appreciate you looking into the converter.verify error.

Hi Bluedot,

I looked into this a bit more, and the verification failure is just due to minor floating point rounding differences (which relu is more sensitive to than softmax). If you increase the tolerances slightly, e.g. converter.verify(atol=1e-6), then the verification will pass.
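That is, something like the following (for the conversion without swap_activations; as noted above, the spiking version is still expected to differ from the Keras model):

converter = nengo_dl.Converter(vgg_model)
# loosen the absolute tolerance slightly to absorb floating point
# rounding differences between the Keras and Nengo outputs
converter.verify(atol=1e-6)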