Install Nengo-dl for GPU

TKC_RC · January 11, 2019, 11:14pm

Hello, all,
I’ve installed nengo-dl (pip install nengo-dl) on three computers. All of them work well on the CPU, but not with device=GPU. I’ve tried installing directly on the system (with the NVIDIA driver, CUDA, and cuDNN installed), in Anaconda (in a conda env), and with nvidia-docker.
I’ve also tried installing tensorflow-gpu first, then starting Python and checking the GPU device:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

Both the CPU and the GPU are listed correctly (full listing at the end).

I then exit Python and install nengo-dl (pip install nengo-dl). There are no warnings during installation.
When I check the devices again, only the CPU is available.

Thank you very much; any suggestions are most appreciated. TKC

output with only TF installed.

2019-01-11 22:50:44.116485: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-11 22:50:44.210567: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-11 22:50:44.210956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Quadro P1000 major: 6 minor: 1 memoryClockRate(GHz): 1.4805
pciBusID: 0000:01:00.0
totalMemory: 3.92GiB freeMemory: 2.97GiB
2019-01-11 22:50:44.210971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-11 22:50:44.399572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-11 22:50:44.399621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-11 22:50:44.399628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-11 22:50:44.399737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 2680 MB memory) → physical GPU (device: 0, name: Quadro P1000, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: “/device:CPU:0”
device_type: “CPU”
memory_limit: 268435456
locality {
}
incarnation: 3484944782282267657
, name: “/device:XLA_GPU:0”
device_type: “XLA_GPU”
memory_limit: 17179869184
locality {
}
incarnation: 2208473880247278608
physical_device_desc: “device: XLA_GPU device”
, name: “/device:XLA_CPU:0”
device_type: “XLA_CPU”
memory_limit: 17179869184
locality {
}
incarnation: 11322029864051568140
physical_device_desc: “device: XLA_CPU device”
, name: “/device:GPU:0”
device_type: “GPU”
memory_limit: 2810839040
locality {
bus_id: 1
links {
}
}
incarnation: 5410470045191159611
physical_device_desc: “device: 0, name: Quadro P1000, pci bus id: 0000:01:00.0, compute capability: 6.1”
]

output after nengo-dl installed:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()
2019-01-11 22:52:30.420519: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[name: “/device:CPU:0”
device_type: “CPU”
memory_limit: 268435456
locality {
}
incarnation: 16464390469410501328
, name: “/device:XLA_CPU:0”
device_type: “XLA_CPU”
memory_limit: 17179869184
locality {
}
incarnation: 15558487920954708056
physical_device_desc: “device: XLA_CPU device”
]
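
The device check used in this post can be reduced to a small helper that reports only real GPU devices. This is a minimal sketch, assuming the TensorFlow 1.x `device_lib` module shown above; the helper names `gpu_device_names` and `local_gpu_names` are hypothetical and not part of TensorFlow or nengo-dl.

```python
def gpu_device_names(devices):
    """Return the names of devices whose device_type is exactly "GPU".

    XLA_GPU and XLA_CPU entries are skipped, because they show up as
    separate device types in the listing.
    """
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpu_names():
    """Ask TensorFlow for its local devices and keep only the GPUs."""
    # Imported lazily so the filter above can be used without TensorFlow.
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

Calling `local_gpu_names()` before and after `pip install nengo-dl` should make it easy to see whether the GPU entry has disappeared; an empty list corresponds to the second listing above.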

drasmuss · January 11, 2019, 11:22pm

Hmm that is very odd. If I had to guess there’s something weird going on with your pip environment, so when you do pip install nengo-dl it is (for some reason) uninstalling your existing tensorflow-gpu installation and then reinstalling tensorflow. Could you do

> pip install tensorflow-gpu
> pip install nengo-dl

and post the output here?

drasmuss · January 11, 2019, 11:31pm

Also if you just want to get things working quickly, I suspect doing

pip install nengo-dl
pip install tensorflow-gpu

(i.e., install tensorflow-gpu after installing nengo-dl) will work for you.

drasmuss · January 11, 2019, 11:43pm

Actually digging into this a bit more, it looks like this is due to some change in tensorflow. I’m not sure when this happened, but it used to be that if you did

> pip install tensorflow-gpu
> pip install tensorflow

then you would still end up with GPU support (i.e. installing tensorflow did not overwrite tensorflow-gpu). Now it seems that that is no longer the case, and installing tensorflow overtop of tensorflow-gpu removes the GPU support.

Unfortunately there isn’t much we can do to fix that. We have to choose either tensorflow or tensorflow-gpu as an install requirement for nengo-dl. If we picked tensorflow-gpu then it would be broken for anyone without GPU support set up. So we picked tensorflow, as that at least will work for everyone.

Note that if you are installing nengo-dl from source, then we check if you have tensorflow or tensorflow-gpu installed, and handle either one correctly. But when you’re installing a .whl from pip, the choice has to be built into the .whl file. The only solution I can think of is that in the future we will have to not ship .whl files, and only use sdist.

In the meantime, the above suggestion

pip install nengo-dl
pip install tensorflow-gpu

should work for you.
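
To see which of the two TensorFlow distributions pip currently has registered, something like the following may help. It is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the function name `installed_versions` is hypothetical.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(names=("tensorflow", "tensorflow-gpu", "nengo-dl")):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Note that both distributions install into the same `tensorflow` package directory, so both can be reported as installed while the files on disk come from whichever one was installed last.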

drasmuss · January 13, 2019, 2:12am

There’s a new sdist-only release out now, so installing tensorflow-gpu and nengo-dl in any order should now work correctly.
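
The reason an sdist helps is that its setup script runs on the user's machine at install time, so the requirement can be chosen there, whereas a .whl has the choice fixed when it is built. The following is only an illustrative sketch of that idea, not nengo-dl's actual setup code; `choose_tf_requirement` is a hypothetical name.

```python
def choose_tf_requirement(installed):
    """Pick the TensorFlow requirement string for install_requires.

    `installed` is a collection of distribution names already present
    in the environment (how it is gathered is left out of this sketch).
    """
    if "tensorflow-gpu" in installed:
        # Keep the existing GPU build instead of pulling in the CPU one.
        return "tensorflow-gpu"
    # Default that works for everyone, including machines without a GPU.
    return "tensorflow"


print(choose_tf_requirement({"numpy", "tensorflow-gpu"}))
print(choose_tf_requirement({"numpy"}))
```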

TKC_RC · January 14, 2019, 2:38pm

Thank you very much for all of your kind help.
I installed nengo and nengo-dl in a fresh Docker environment; only the CPU was available to TensorFlow. I then installed tensorflow-gpu (see the output below). After that, nengo_dl, tensorflow, and tensorflow_gpu could not be imported (see the errors further down; all of them give the same error message).
Installing tensorflow after tensorflow-gpu reports that all of the requirements are already satisfied (see the end).

Messages from installing tensorflow-gpu after installing nengo-dl:
$ pip install tensorflow-gpu

Collecting tensorflow-gpu
Downloading https://files.pythonhosted.org/packages/55/7e/bec4d62e9dc95e828922c6cec38acd9461af8abe749f7c9def25ec4b2fdb/tensorflow_gpu-1.12.0-cp36-cp36m-manylinux1_x86_64.whl (281.7MB)
100% |████████████████████████████████| 281.7MB 342kB/s
Requirement already satisfied: termcolor>=1.1.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (1.1.0)
Requirement already satisfied: six>=1.10.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (1.12.0)
Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (1.13.3)
Requirement already satisfied: tensorboard<1.13.0,>=1.12.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (1.12.2)
Requirement already satisfied: gast>=0.2.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (0.2.1.post0)
Requirement already satisfied: keras-applications>=1.0.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (1.0.6)
Requirement already satisfied: grpcio>=1.8.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (1.17.1)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (1.0.5)
Requirement already satisfied: absl-py>=0.1.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (0.6.1)
Requirement already satisfied: protobuf>=3.6.1 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (3.6.1)
Requirement already satisfied: wheel>=0.26 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (0.32.3)
Requirement already satisfied: astor>=0.6.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow-gpu) (0.7.1)
Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow-gpu) (3.0.1)
Requirement already satisfied: werkzeug>=0.11.10 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow-gpu) (0.14.1)
Requirement already satisfied: h5py in /opt/conda/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow-gpu) (2.7.1)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.6/site-packages (from protobuf>=3.6.1->tensorflow-gpu) (40.6.3)
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-1.12.0

Error message when importing nengo_dl after installing tensorflow-gpu:
$ python
Python 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.

import nengo_dl
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.6/site-packages/nengo_dl/__init__.py", line 30, in <module>
    from nengo_dl import (
  File "/opt/conda/lib/python3.6/site-packages/nengo_dl/op_builders.py", line 12, in <module>
    import tensorflow as tf
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See Build and install error messages | TensorFlow

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Installing tensorflow after installing tensorflow-gpu:
$ pip install tensorflow

Requirement already satisfied: tensorflow in /opt/conda/lib/python3.6/site-packages (1.12.0)
Requirement already satisfied: tensorboard<1.13.0,>=1.12.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (1.12.2)
Requirement already satisfied: protobuf>=3.6.1 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (3.6.1)
Requirement already satisfied: grpcio>=1.8.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (1.17.1)
Requirement already satisfied: absl-py>=0.1.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (0.6.1)
Requirement already satisfied: gast>=0.2.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (0.2.1.post0)
Requirement already satisfied: keras-applications>=1.0.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (1.0.6)
Requirement already satisfied: six>=1.10.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (1.12.0)
Requirement already satisfied: astor>=0.6.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (0.7.1)
Requirement already satisfied: termcolor>=1.1.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (1.1.0)
Requirement already satisfied: numpy>=1.13.3 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (1.13.3)
Requirement already satisfied: wheel>=0.26 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (0.32.3)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /opt/conda/lib/python3.6/site-packages (from tensorflow) (1.0.5)
Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow) (3.0.1)
Requirement already satisfied: werkzeug>=0.11.10 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.13.0,>=1.12.0->tensorflow) (0.14.1)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.6/site-packages (from protobuf>=3.6.1->tensorflow) (40.6.3)
Requirement already satisfied: h5py in /opt/conda/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow) (2.7.1)

drasmuss · January 14, 2019, 2:48pm

That looks like an error related to the TensorFlow installation (it can’t find the libcublas file), independent of nengo-dl. If you just do

pip install tensorflow-gpu
python -c "import tensorflow"

do you get the same error?
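
A quick way to test the missing-library hypothesis without importing TensorFlow at all is to ask the dynamic loader directly. This is a minimal sketch using the standard `ctypes` module; the helper name `can_load` is hypothetical, and the library name is taken from the traceback above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True


lib = "libcublas.so.9.0"  # the file named in the ImportError
print(lib, "found" if can_load(lib) else "missing")
```

If this reports the library as missing inside the container, the problem lies in the CUDA installation or library path rather than in nengo-dl.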

TKC_RC · January 14, 2019, 4:08pm

Thanks for replying.
There is no error message if only tensorflow-gpu is installed.

Here is a summary of the installations on a fresh system using pip install:

1. tensorflow-gpu only: CPU yes, GPU yes (see the warning note at the end)
2. tensorflow-gpu, then nengo-dl: CPU yes, GPU no
3. nengo-dl only: CPU yes, GPU no
4. nengo-dl, then tensorflow-gpu: unable to import tensorflow or nengo-dl (can’t find the libcublas file)

Warning message for case 1 (only tensorflow-gpu installed, no nengo-dl):

tf.test.is_gpu_available()
2019-01-14 15:36:23.928789: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-14 15:36:24.013728: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

Does anybody else see the same message? Does this (NUMA node = -1) prevent nengo-dl from recognizing the GPU?

drasmuss · January 14, 2019, 4:26pm

Could you do

> pip install tensorflow-gpu
> pip install nengo-dl

in a clean environment and post the full output here? Definitely something weird going on if you still aren’t getting GPU support with that.

Sorry for making you do all the legwork, but I can’t reproduce this on my end so have to test your environment specifically.

“Does anybody else see the same message? Does this (NUMA node = -1) prevent nengo-dl from recognizing the GPU?”

I’ve never seen that message before, but it doesn’t seem like that should have anything to do with nengo-dl. nengo-dl doesn’t have any GPU logic of its own, everything is done through tensorflow, so if your tensorflow-gpu is working correctly with that warning message then nengo-dl should work as well.

TKC_RC · January 15, 2019, 4:36pm

I have fixed the problem now. Thank you, everybody, for the help.
The root cause was CUDA 10 on the host system.
With nvidia-docker, the container reports CUDA version 9.0.
tensorflow-gpu worked fine before nengo-dl was installed.
However, after installing nengo-dl, both tensorflow-gpu and nengo-dl only worked in CPU mode.
After downgrading CUDA on the host system to version 9.0, everything works fine.
Thanks again.