Out of memory error: nengo_dl.Simulator for nengo_spa model

Hello.

I use nengo-spa and have been running my model with nengo_ocl.Simulator so far. However, I wanted to try a deep learning model with CUDA, so I cleaned up my computer (uninstalled OpenCL) and set up a CUDA environment.

Then I ran my model with nengo_dl.Simulator(), but the following error occurred:

Optimization finished in 0:23:10                                                                                                   
Construction finished in 0:01:41
2019-05-09 19:34:16.435524: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 112.79MiB.  Current allocation summary follows.
2019-05-09 19:34:16.436212: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **************************************************************************************xxxxxxxxxxxxxx
2019-05-09 19:34:16.436364: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 112.79MiB.  Current allocation summary follows.
2019-05-09 19:34:16.436966: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **************************************************************************************xxxxxxxxxxxxxx
2019-05-09 19:34:16.488728: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 112.79MiB.  Current allocation summary follows.

These messages continue, and then the following message is shown:

Simulation finished in 0:01:09                                                                                                     
Traceback (most recent call last):
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: _arg_Node_-STATEMENT13_ph_0_5/_149 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2779__arg_Node_-STATEMENT13_ph_0_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
	 [[Node: transpose_2/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2852_transpose_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 124, in <module>
    questionAnswering(input_sentences, path, unbinding_list, cue_list, answer, sim_i, verbs, nouns)
  File "/home/iwao/Desktop/research/datas/gakkai/codes/simulation.py", line 157, in questionAnswering
    sim.run(24)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/nengo_dl/simulator.py", line 321, in run
    self.run_steps(steps, **kwargs)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/nengo_dl/simulator.py", line 408, in run_steps
    callback=callback, profile=profile)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/nengo_dl/simulator.py", line 897, in run_batch
    raise e  # pragma: no cover
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/nengo_dl/simulator.py", line 890, in run_batch
    options=run_options, run_metadata=run_metadata)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/iwao/.pyenv/versions/anaconda3-4.2.0/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: _arg_Node_-STATEMENT13_ph_0_5/_149 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2779__arg_Node_-STATEMENT13_ph_0_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
	 [[Node: transpose_2/_529 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2852_transpose_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Sorry for these messy error messages.
I think the cause is running out of memory.

However, this out-of-memory error never occurred before, when I was running my model with nengo_ocl.
So my question is:
Should I use nengo_dl.Simulator when using CUDA instead of nengo_ocl? Please note that my model is a nengo-spa model.

import nengo_spa as spa
import nengo_dl

model = spa.Network()

with nengo_dl.Simulator(model) as sim:
    sim.run(1.0)

Is this code incorrect?

I would appreciate any advice. Thank you.

It seems like something else on your computer is holding all the GPU memory (since I would assume you have plenty more than 112MiB available). Do you have multiple TensorFlow processes running at the same time, perhaps? By default TensorFlow will reserve all the GPU memory when it runs, so you can’t run multiple models at the same time.

You can change that default behaviour by setting these options: https://www.tensorflow.org/guide/using_gpu#allowing_gpu_memory_growth. In NengoDL you can do that via nengo_dl.configure_settings(session_config={"gpu_options.allow_growth": True}).
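
For example, a minimal sketch of how that could look (configure_settings has to be called inside the network context, and the exact option names can depend on your NengoDL/TensorFlow versions):

import nengo_spa as spa
import nengo_dl

model = spa.Network()
with model:
    # let TensorFlow grow its GPU memory usage as needed,
    # instead of reserving all of it up front
    nengo_dl.configure_settings(
        session_config={"gpu_options.allow_growth": True})

with nengo_dl.Simulator(model) as sim:
    sim.run(1.0)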

Another thing to check: can you run just a standard TensorFlow example (not using NengoDL)? Something like https://www.tensorflow.org/tutorials/keras/basic_classification.

As you say, my GPU should have plenty of memory: 7.92 GiB in total.

When I ran a sample program right after running another one, I confirmed that the free memory was reduced because the previous run was still holding GPU memory, e.g. totalMemory: 7.92GiB freeMemory: 164.38MiB.

So I restarted my PC and ran the same code again, but similar messages were still displayed repeatedly.

Construction finished in 0:01:59                                                                                                                                                                                                                                        
2019-05-10 17:28:50.376188: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 112.79MiB.  Current allocation summary follows.
2019-05-10 17:28:50.376826: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **************xxx***************************************************************************xxxxxxxx
2019-05-10 17:28:50.958895: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 19.55MiB.  Current allocation summary follows.
2019-05-10 17:28:50.959495: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **************xxx***************************************************************************xxxxxxxx
2019-05-10 17:28:50.959566: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[5124450,1]
2019-05-10 17:28:50.959632: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 112.79MiB.  Current allocation summary follows.
2019-05-10 17:28:50.960172: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **************xxx***************************************************************************xxxxxxxx
2019-05-10 17:28:51.118463: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 75.32MiB.  Current allocation summary follows.
2019-05-10 17:28:51.119053: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **************xxx***************************************************************************xxxxxxxx
2019-05-10 17:29:00.960426: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 19.55MiB.  Current allocation summary follows.
2019-05-10 17:29:00.961025: W tensorflow/core/common_runtime/bfc_allocator.cc:277] **************xxx***************************************************************************xxxxxxxx
2019-05-10 17:29:00.961087: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[5124450,1]

In my experiment, I use about 1000 semantic pointers with 1232 dimensions; this number of dimensions was determined experimentally.

Regarding the other thing to check, I ran the following program:

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

The result is:

Epoch 1/5
2019-05-11 16:59:09.742919: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2019-05-11 16:59:09.976845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:82:00.0
totalMemory: 7.92GiB freeMemory: 7.36GiB
2019-05-11 16:59:09.976968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:82:00.0, compute capability: 6.1)
60000/60000 [==============================] - 10s - loss: 0.2197 - acc: 0.9345       
Epoch 2/5
60000/60000 [==============================] - 8s - loss: 0.0981 - acc: 0.9690      
Epoch 3/5
60000/60000 [==============================] - 8s - loss: 0.0690 - acc: 0.9785      
Epoch 4/5
60000/60000 [==============================] - 9s - loss: 0.0535 - acc: 0.9830      
Epoch 5/5
60000/60000 [==============================] - 9s - loss: 0.0433 - acc: 0.9861      
 9952/10000 [============================>.] - ETA: 0s

When I ran my code under a different condition, with 608 dimensions and 300 words (semantic pointers), it worked well.

Ah yes, with a model that large I actually wouldn't be surprised if you ran out of memory. I think the error message "(GPU_0_bfc) ran out of memory trying to allocate 112.79MiB" is saying that the 112.79MiB allocation is the final straw that triggered the OOM error (rather than the total amount that was being allocated).

To verify, you could try running your model on the CPU, by setting nengo_dl.Simulator(..., device="/cpu:0"). Then you can observe the memory usage, and see if it exceeds your available GPU memory (it won’t be an exact match, but gives you an idea).
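
For example, with the minimal model from above (just a sketch, adding only the device argument):

with nengo_dl.Simulator(model, device="/cpu:0") as sim:
    sim.run(1.0)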

Another thing you could try is breaking up your run into smaller chunks. For example, instead of doing

with nengo_dl.Simulator(model) as sim:
    sim.run(1.0)

do

with nengo_dl.Simulator(model) as sim:
    for i in range(10):
        sim.run(0.1)
        # save the data you are interested in

That will reduce the amount of input/output data that you need to store on the GPU, which might let you run a larger model.
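
For instance, if you are saving probe data, that pattern could look roughly like this (just a sketch; the State and Probe here are placeholders for whatever is in your actual model, and sim.data[probe] accumulates data across run calls):

import numpy as np
import nengo
import nengo_spa as spa
import nengo_dl

with spa.Network() as model:
    state = spa.State(vocab=64)  # placeholder for your actual model
    probe = nengo.Probe(state.output, synapse=0.01)

with nengo_dl.Simulator(model) as sim:
    for i in range(10):
        sim.run(0.1)  # simulate in 0.1 s chunks
        # sim.data[probe] contains everything simulated so far,
        # so you can save or process it incrementally here
    np.save("probe_data.npy", sim.data[probe])  # the full 1.0 s of probe data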
