Installing Jupyter in a virtualenv

Took me a few tries to get this working, since I started incorrectly and also did not want anaconda.

Install

Activate your virtualenv:
$ cd tensorflow
$ source bin/activate
(tensorflow)$ pip install --upgrade pip
(tensorflow)$ pip install ipython
(tensorflow)$ pip install jupyter

Test

(tensorflow)$ which python
/home/taufiq/tensorflow/bin/python
(tensorflow)$ which ipython
/home/taufiq/tensorflow/bin/ipython
(tensorflow)$ which jupyter-notebook
/usr/local/bin/jupyter-notebook
(I accidentally removed /usr/local/bin/jupyter, which I suspect was a link to /usr/local/bin/jupyter-notebook)
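
The `which` checks above can also be scripted. Here is a minimal sketch, assuming Python 3 (for `shutil.which`) and that it is run with the virtualenv's own interpreter; the helper name `inside_env` is made up for illustration. It reports whether each command resolves inside the active environment, which would flag the /usr/local/bin/jupyter-notebook case above.

```python
import os
import shutil
import sys


def inside_env(executable_path, env_prefix):
    """Return True if executable_path lives under env_prefix (hypothetical helper)."""
    if executable_path is None:
        return False
    # abspath rather than realpath: bin/python in a virtualenv may be a symlink
    # to the system interpreter, and we care about where the command was found.
    exe = os.path.abspath(executable_path)
    prefix = os.path.abspath(env_prefix)
    return os.path.commonpath([exe, prefix]) == prefix


if __name__ == "__main__":
    # sys.prefix points at the virtualenv when run with its interpreter.
    for name in ("python", "ipython", "jupyter-notebook"):
        path = shutil.which(name)
        status = "in env" if inside_env(path, sys.prefix) else "NOT in env"
        print(name, path, status)
```

A command reported as "NOT in env" is being picked up from the system install rather than from the virtualenv.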

Add Kernel

(tensorflow)$ ipython kernelspec install-self --user
This will create a kernelspec for your virtualenv and tell you where it is:
[InstallNativeKernelSpec] Installed kernelspec pythonX in /home/username/.local/share/jupyter/kernels/pythonX
Where pythonX will match the version of Python in your virtualenv.

Copy the new kernelspec somewhere useful. Choose a kernel_name for your new kernel that is not python2 or python3 or one you’ve used before and then:

(tensorflow)$ mkdir -p ~/.ipython/kernels
(tensorflow)$ mv ~/.local/share/jupyter/kernels/pythonX ~/.ipython/kernels/<kernel_name>
To change the name of the kernel that IPython shows, edit ~/.ipython/kernels/<kernel_name>/kernel.json and change the JSON key called display_name to tensorflow.
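
If you would rather not edit the JSON by hand, the same change can be scripted. This is a minimal sketch using only the standard json module; the function name `set_display_name` is made up, and the directory name `tensorflow` in the usage part is an assumption standing in for whatever <kernel_name> you chose.

```python
import json
import os


def set_display_name(kernel_json_path, display_name):
    """Rewrite the display_name key of a kernel.json file in place."""
    with open(kernel_json_path) as f:
        spec = json.load(f)
    spec["display_name"] = display_name
    with open(kernel_json_path, "w") as f:
        json.dump(spec, f, indent=1)
    return spec


if __name__ == "__main__":
    # "tensorflow" as the kernel directory is an assumption; use your <kernel_name>.
    path = os.path.expanduser("~/.ipython/kernels/tensorflow/kernel.json")
    if os.path.exists(path):
        print(set_display_name(path, "tensorflow"))
```

All other keys in kernel.json (such as argv and language) are written back unchanged.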

You should now be able to see your kernel in the IPython notebook menu: Kernel -> Change kernel and be able to switch to it (you may need to refresh the page before it appears in the list). IPython will remember which kernel to use for that notebook from then on.

(thanks to pythonanywhere: http://help.pythonanywhere.com/pages/IPythonNotebookVirtualenvs)

Run

(tensorflow)$ jupyter-notebook
This will open a browser tab in Chrome.
In the browser, run a simple test:
screenshot-from-2016-12-18-112222

 

Testing TensorFlow

MNIST

All TensorFlow packages, including the demo models, are installed in the Python library. The exact location of the Python library depends on your system, but is usually one of:

/usr/local/lib/python2.7/dist-packages/tensorflow
/usr/local/lib/python2.7/site-packages/tensorflow

You can find out the directory with the following command (make sure to use the Python you installed TensorFlow to, for example, use python3 instead of python if you installed for Python 3):

$ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
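
The one-liner above imports TensorFlow just to find its directory, which is slow and prints the CUDA loader messages. A minimal sketch of an alternative, assuming Python 3.4 or newer (for `importlib.util.find_spec`); the function name `package_dir` is made up. It locates a top-level package without importing it.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints None if tensorflow is not installed for this interpreter.
    print(package_dir("tensorflow"))
```

As with the one-liner, run it with the interpreter you installed TensorFlow into.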

The simple demo model for classifying handwritten digits from the MNIST dataset is in the sub-directory models/image/mnist/convolutional.py. You can run it from the command line as follows (make sure to use the Python you installed TensorFlow with):

# Using 'python -m' to find the program in the python search path:
$ python -m tensorflow.models.image.mnist.convolutional

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1060
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:01:00.0
Total memory: 2.94GiB
Free memory: 2.88GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0)
Initialized!

And I got this error, due to an incorrect cuDNN installation:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Aborted (core dumped)
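
The numbers in that message are easier to read once decoded. A minimal sketch, assuming cuDNN packs its version as major*1000 + minor*100 + patch; that matches the mnistCUDNN output elsewhere in this post, where 5005 is printed as 5.0.5. The comparison function is only an illustration of the "compatibility version" idea in the log, not TensorFlow's actual check.

```python
def decode_cudnn_version(code):
    """Split a cuDNN version integer into (major, minor, patch)."""
    return code // 1000, (code % 1000) // 100, code % 100


def same_major_minor(loaded, compiled):
    """Compare major.minor only, like the 'compatibility version' in the log."""
    return decode_cudnn_version(loaded)[:2] == decode_cudnn_version(compiled)[:2]


print(decode_cudnn_version(5005))    # (5, 0, 5)
print(decode_cudnn_version(5105))    # (5, 1, 5)
print(same_major_minor(5005, 5105))  # False
```

So the runtime was loading cuDNN 5.0.5 while TensorFlow had been built against 5.1.5.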

Then I started a new terminal window, activated the virtualenv and ran it again, and it worked!

It took under 1 minute: 8500 steps, with 0.8% validation and test error.

LSTM

$ wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
$ tar -xvf simple-examples.tgz
$ python tensorflow/tensorflow/models/rnn/ptb/ptb_word_lm.py --data=./simple-examples/data --model=small

See more at: http://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-running-jobs.html#sthash.err4u2QJ.dpuf

This one gave an error, apparently an incompatibility with the latest version of TF:

AttributeError: 'module' object has no attribute 'deprecated'

INCEPTION V3

Google’s Inception v3 network is a cutting-edge convolutional network designed for image classification. Training this model from scratch is very intensive and can take from several days up to weeks of training time. An alternative approach is to download the model pre-trained, and then re-train it on another dataset.

First, download the pre-trained Inception v3 model, which includes the checkpoint file model.ckpt-157585:

$ mkdir /home/taufiq/data
$ export INCEPTION_DIR=/home/taufiq/data/
$ cd $INCEPTION_DIR
$ curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
$ tar -xvf inception-v3-2016-03-01.tar.gz

Next, clone the TensorFlow models repository:

$ git clone https://github.com/tensorflow/models.git tensorflow-models
$ cd tensorflow-models/inception

A dataset containing labeled images of flowers will be used to re-train the network. Follow these steps to download and preprocess the 218 MB flowers dataset:

$ export FLOWERS_DIR=/home/taufiq/data/flowers
$ mkdir -p $FLOWERS_DIR/data
$ bazel build inception/download_and_preprocess_flowers
$ bazel-bin/inception/download_and_preprocess_flowers $FLOWERS_DIR/data
# Ignore error "…/build_image_data: No such file or directory"
$ python inception/data/build_image_data.py --train_directory=$FLOWERS_DIR/data/raw-data/train/ --validation_directory=$FLOWERS_DIR/data/raw-data/validation/ --output_directory=$FLOWERS_DIR/data --labels_file=$FLOWERS_DIR/data/raw-data/labels.txt

Finished writing all 500 images in data set.

Finished writing all 3170 images in data set.
$ cd -

This will download the 218 MB flowers image dataset and then preprocess it into training and validation sets. The re-training procedure can then be executed using these steps (note that the additional commands are to avoid a dependency on the Bazel build system):

$ mkdir -p $FLOWERS_DIR/train
$ bazel build inception/flowers_train
$ cd ~/data/tensorflow-models/inception/inception/slim
$ edit ops.py
      gamma = variables.variable('gamma',
                                 params_shape,
-                                 initializer=tf.ones_initializer,
+                                 initializer=tf.ones_initializer(),
                                 trainable=trainable,
                                 restore=restore)
$ export FLOWERS_DIR=/home/taufiq/data/flowers
$ export INCEPTION_DIR=/home/taufiq/data/
$ cd ~/data/tensorflow-models/inception
$ bazel-bin/inception/flowers_train --train_dir=$FLOWERS_DIR/train --data_dir=$FLOWERS_DIR/data --pretrained_model_checkpoint_path=$INCEPTION_DIR/inception-v3/model.ckpt-157585 --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1 --max_steps=500 --num_gpus 1 --batch_size=64
And I got this error, which the ops.py edit above fixes:
File "/home/taufiq/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 665, in <lambda>
shape.as_list(), dtype=dtype, partition_info=partition_info)
TypeError: ones_initializer() got multiple values for keyword argument 'dtype'
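
The error makes more sense with a toy version of the pattern. The sketch below is plain Python with stand-in functions, not TensorFlow code: `ones_initializer` here is a made-up factory that returns the real initializer, and `make_variable` is a made-up stand-in for the framework side that calls whatever it is given as initializer(shape, dtype=..., partition_info=...), as the traceback shows. Passing the factory itself, without the parentheses, sends the shape into the dtype slot.

```python
def ones_initializer(dtype="float32"):
    """Stand-in factory: returns the function that actually builds the values."""
    def _initializer(shape, dtype=dtype, partition_info=None):
        count = 1
        for dim in shape:
            count *= dim
        return [1.0] * count
    return _initializer


def make_variable(shape, initializer):
    """Stand-in for the framework: always calls initializer(shape, dtype=..., ...)."""
    return initializer(shape, dtype="float32", partition_info=None)


# Factory called first: works.
print(make_variable([2, 2], ones_initializer()))  # [1.0, 1.0, 1.0, 1.0]

# Factory passed uncalled: the shape lands in the dtype slot.
try:
    make_variable([2, 2], ones_initializer)
except TypeError as err:
    print("TypeError:", err)
```

That is the same one-character-pair difference as in the ops.py diff: `tf.ones_initializer` versus `tf.ones_initializer()`.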

step 450, loss = 1.21 (21.7 examples/sec; 2.951 sec/batch)
step 460, loss = 1.19 (21.6 examples/sec; 2.964 sec/batch)
step 470, loss = 1.07 (21.8 examples/sec; 2.931 sec/batch)
step 480, loss = 1.11 (21.7 examples/sec; 2.950 sec/batch)
step 490, loss = 1.24 (21.7 examples/sec; 2.956 sec/batch)

(Training can also be run on multiple GPUs by adding the --num_gpus=N option). The re-trained model can now be evaluated on the validation dataset:

$ mkdir -p $FLOWERS_DIR/eval
$ bazel build inception/flowers_eval
$ bazel-bin/inception/flowers_eval --eval_dir=$FLOWERS_DIR/eval --data_dir=$FLOWERS_DIR/data --subset=validation --num_examples=500 --checkpoint_dir=$FLOWERS_DIR/train --input_queue_memory_factor=1 --run_once
Successfully loaded model from /data/flowers/train/model.ckpt-499 at step=499.
starting evaluation on (validation).
precision @ 1 = 0.8574 recall @ 5 = 0.9980 [512 examples]

Here the top-1 (i.e., single guess) classification accuracy is 85% after retraining the model for 500 steps. The accuracy can be improved further by training for more steps. For more details on using the Inception v3 model, see the README document.
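
To make "top-1" and "top-5" concrete, here is a minimal pure-Python sketch of top-k accuracy (Python 3 assumed for the division). The scores and labels are made-up toy values, not output from the flowers model, and this is not the evaluation code used by flowers_eval.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for 3 examples over 3 classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 2))
```

Only the first example is right on the single best guess, while the second is also counted once two guesses are allowed, which is why the "@ 5" figure is always at least as high as the "@ 1" figure.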

– See more at: http://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-running-jobs.html#sthash.err4u2QJ.dpuf

Testing TensorFlow and installing in a virtualenv

Now that I have Ubuntu 14.04 with CUDA 8.0, cuDNN 5.1 and TensorFlow built, it's time to test.

More instructions here: https://www.tensorflow.org/versions/r0.12/get_started/os_setup.html

Since I installed into a virtualenv, activate the environment first:

$ source ~/tensorflow/bin/activate  # If using bash
(tensorflow)$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.0rc0-cp27-none-linux_x86_64.whl

Finally install TensorFlow:

# Python 2
pip install --upgrade $TF_BINARY_URL

Successfully installed tensorflow-gpu protobuf wheel mock numpy six setuptools funcsigs pbr
Cleaning up…
When you are done using TensorFlow, deactivate the environment.

(tensorflow)$ deactivate

$  # Your prompt should change back

To see TF version:
python -c 'import tensorflow as tf; print(tf.__version__)'
0.12.0-rc0

Now to test TensorFlow

$ python

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally

>>> hello = tf.constant("Hello, TensorFlow!")
>>> sess = tf.Session()
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1060
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:01:00.0
Total memory: 2.94GiB
Free memory: 2.88GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0)

>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a + b))
42

Installing and testing CUDA

In a previous post I installed CUDA 7.5 on my GTX 1060 card, then found out that the Pascal cards (1060, 1080, Titan X, ...) need at least CUDA 8.0.
So now to install CUDA 8.0 and cuDNN 5.1.
Again thanks to Donald for an excellent succinct blog: https://www.pugetsystems.com/labs/hpc/Install-Ubuntu-16-04-or-14-04-and-CUDA-8-and-7-5-for-NVIDIA-Pascal-GPU-825/

I'm assuming all the dependencies are installed.  If not, see the blog above.

CUDA toolkit installs

Download the “.run” install files from NVIDIA (you will need to be registered as a developer to get the 8.0rc version)

You want to run these install scripts and NOT install the bundled display drivers!

You can run the scripts and answer the prompts or you can do,

./cuda_7.5.18_linux.run --help

to see the script options. Then, if you trust me, you can do the following,

chmod 755 cuda_*
./cuda_8.0.27_linux.run --silent --toolkit --samples --samplespath=/usr/local/cuda-8.0/samples --override

That will give you both CUDA toolkit versions with the sample code directories where they belong.

There will be a symbolic link at /usr/local/cuda pointing to /usr/local/cuda-8.0.

I like to have base development system tools like CUDA on the default bin and lib path so I create the following files,

gedit /etc/profile.d/cuda.sh

export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export GLPATH=/usr/lib

and for libs,

gedit /etc/ld.so.conf.d/cuda.conf

/usr/local/cuda/lib64

Run “ldconfig” after adding that last file.
I got this error:

/sbin/ldconfig.real: /usr/local/cuda/lib64/libcudnn.so.5 is not a symbolic link

And I added this to the bottom of ~/.bashrc

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export PATH="/usr/local/cuda/bin:$PATH"
export PATH="$PATH:/home/taufiq/bin"

source /home/taufiq/.bazel/bin/bazel-complete.bash
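
After opening a new terminal it is worth confirming the exports took effect. A minimal sketch; the helper name `on_search_path` is made up, and the two directories checked are the ones from the exports above.

```python
import os


def on_search_path(directory, path_value):
    """Return True if directory is one of the entries in a PATH-style string."""
    target = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    print(on_search_path("/usr/local/cuda/bin", os.environ.get("PATH", "")))
    print(on_search_path("/usr/local/cuda/lib64",
                         os.environ.get("LD_LIBRARY_PATH", "")))
```

Both lines should print True once ~/.bashrc has been re-read.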

 

 

Installing nVidia drivers

In a previous blog, I went through the long way of installing nVidia drivers.  Thanks to the blog from Donald, there is an easier way: https://www.pugetsystems.com/labs/hpc/Install-Ubuntu-16-04-or-14-04-and-CUDA-8-and-7-5-for-NVIDIA-Pascal-GPU-825/

Install the NVIDIA display driver

I’ve been using the well maintained “graphics-drivers” ppa for adding the NVIDIA display drivers. These have been up-to-date and well packaged. Using this will give you a convenient update path for new drivers. So far I haven’t had any trouble with new drivers rebuilding against kernel source using dkms.

add-apt-repository ppa:graphics-drivers/ppa
apt-get update
apt-get install nvidia-367

Once installed, press ctrl-alt-F1 to get to a terminal, and type

sudo nvidia-smi

You should see something like this

fullsizerender-1

ctrl-alt-F7 to return to gnome

 

 

Installing TensorFlow

Instructions adapted from here, I chose the virtualenv: https://www.tensorflow.org/versions/r0.12/get_started/os_setup.html#optional-install-cuda-gpus-on-linux

# Ubuntu/Linux 64-bit
$ sudo apt-get install python-pip python-dev python-virtualenv

$ virtualenv --system-site-packages ~/tensorflow

Activate the environment:

$ source ~/tensorflow/bin/activate  # If using bash
$ source ~/tensorflow/bin/activate.csh  # If using csh
(tensorflow)$  # Your prompt should change
$ sudo apt-get install git
$ sudo apt-get install python-numpy python-dev python-wheel

$ cd ~/tensorflow
$ git clone https://github.com/tensorflow/tensorflow
$ sudo apt-get update

To build TensorFlow, you’ll need to install Bazel

Install Bazel

Follow instructions here to install the dependencies for bazel. Then download the latest stable bazel version using the installer for your system and run the installer as mentioned there:

$ chmod +x PATH_TO_INSTALL.SH
$ ./PATH_TO_INSTALL.SH --user

Remember to replace PATH_TO_INSTALL.SH with the location where you downloaded the installer.

Finally, follow the instructions in that script to place bazel into your binary path.

1. Install JDK 8

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Note: You might need to sudo apt-get install software-properties-common if you don't have the add-apt-repository command. See here.

2. Add Bazel distribution URI as a package source (one time setup)

$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

3. Update and install Bazel

$ sudo apt-get update && sudo apt-get install bazel

Once installed, you can upgrade to newer version of Bazel with:

$ sudo apt-get upgrade bazel

Are these next steps necessary? (Bazel seems to have been installed above.)

Get the bazel installer from here:  https://github.com/bazelbuild/bazel/releases/download/0.4.1/bazel-0.4.1-installer-linux-x86_64.sh

$ chmod +x PATH_TO_INSTALL.SH
$ ./PATH_TO_INSTALL.SH --user

Remember to replace PATH_TO_INSTALL.SH with the location where you downloaded the installer.

Finally, follow the instructions in that script to place bazel into your binary path.

Bazel is now installed!

Make sure you have “/home/taufiq/bin” in your path. You can also activate bash
completion by adding the following line to your ~/.bashrc:
source /home/taufiq/.bazel/bin/bazel-complete.bash

Then:

$ cd 
$ source .bashrc

Check by running bazel to get the usage:

$ bazel

Configure TensorFlow

$ cd tensorflow/tensorflow/    # since it's a virtualenv
~/tensorflow/tensorflow$ ./configure
~/tensorflow/tensorflow ~/tensorflow/tensorflow
Please specify the location of python. [Default is /home/taufiq/tensorflow/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/home/taufiq/tensorflow/lib/python2.7/site-packages
Please input the desired Python library path to use. Default is [/home/taufiq/tensorflow/lib/python2.7/site-packages]
Using python library path: /home/taufiq/tensorflow/lib/python2.7/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: “3.5,5.2”]: 3.5,5.2,6.1       (replace 6.1 with your compute capability)
…….
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
…….
INFO: Downloading from http://github.com/google/protobuf/archive/008b5a228b37c054f46ba478ccafa5e855cb16db.tar.gz: 0B
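
If you are not sure which compute capability to type at that prompt, it can be read off the "major: 6 minor: 1" line that TensorFlow prints when it finds the GPU (shown in the logs elsewhere in this post). A minimal sketch with made-up helper names; it only does string handling and does not query the GPU itself.

```python
import re


def compute_capability(device_log_line):
    """Extract 'major.minor' from a TensorFlow device log line, or None."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", device_log_line)
    if match is None:
        return None
    return "{}.{}".format(match.group(1), match.group(2))


def capability_list(defaults, extra):
    """Append extra to the comma-separated defaults unless already present."""
    items = defaults.split(",")
    if extra not in items:
        items.append(extra)
    return ",".join(items)


line = "major: 6 minor: 1 memoryClockRate (GHz) 1.6705"
print(capability_list("3.5,5.2", compute_capability(line)))  # 3.5,5.2,6.1
```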

Build TensorFlow

# To build with GPU support:
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# The name of the .whl file will depend on your platform.
$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.12.0rc0-py2-none-any.whl
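
The wheel file name encodes what it was built for, which is why it varies by platform. A minimal sketch that splits a name into its fields, assuming the standard wheel naming scheme (distribution, version, optional build tag, Python tag, ABI tag, platform tag); the function name `parse_wheel_name` is made up.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: " + filename)
    # The last three fields are always Python tag, ABI tag, platform tag.
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


print(parse_wheel_name("tensorflow-0.12.0rc0-py2-none-any.whl"))
print(parse_wheel_name("tensorflow_gpu-0.12.0rc0-cp27-none-linux_x86_64.whl"))
```

The first name is the wheel built from source above; the second is the binary wheel from the TF_BINARY_URL used elsewhere in this post, which is tied to CPython 2.7 on 64-bit Linux.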


If your version of CUDA does not support your hardware, the build fails with:
nvcc fatal : Unsupported gpu architecture 'compute_61'

To print version of TF:
python -c 'import tensorflow as tf; print(tf.__version__)'
0.12.0-rc0

 

cuDNN

https://developer.nvidia.com/rdp/cudnn-download

cuDNN v5.1 Library for Linux

$ cd ~/Downloads
$ tar -zxf cudnn-8.0-linux-x64-v5.1.tgz
$ cd cuda
$ sudo cp lib64/* /usr/local/cuda/lib64/
$ sudo cp include/* /usr/local/cuda/include/
$ cd ~/Downloads
$ tar -zxf cudnn-sample-v5.tgz 
$ cd mnistCUDNN/
$ make
$ ./mnistCUDNN 
cudnnGetVersion() : 5005 , CUDNN_VERSION from cudnn.h : 5005 (5.0.5)
Host compiler version : GCC 4.8.4
There are 1 CUDA capable devices on your machine :
device 0 : sms 10 Capabilities 6.1, SmClock 1670.5 Mhz, MemSize (Mb) 3013, MemClock 4004.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision

img_0398
 If you type nvidia-smi in another terminal, you will see the process running:
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 12958 C ./mnistCUDNN 109MiB |
+-----------------------------------------------------------------------------+

Installing and Testing CUDA (deprecated)

Proceeding with Jonas’ blog at: http://berge.io/deep-blog/2016/9/7/how-to-install-tensorflow-with-gpu-support-on-a-machine-with-ubuntu

I find the CUDA version required from the TensorFlow page at:
https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html
Which gives me  cuda 7.5 and cuDNN 5

And Linux specific installation is here:
https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#optional-install-cuda-gpus-on-linux

NVIDIA documentation is here: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4RwTR5w9k

Make sure to answer NO when the installation asks if you want to do the driver installation part of the CUDA installation.

So let's get started. Some optional tests first:

$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04

Verify hardware is there:
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1c20 (rev a1)

$ gcc --version
$ uname -r

Now start the CUDA install

0) Download your relevant CUDA.run file
Note, that once again this install is if you purely want to use your graphics card (Titan X) for GPU/CUDA purposes and not for rendering.

7.5 is here:  https://developer.nvidia.com/cuda-75-downloads-archive
Get the runfile
sudo sh cuda_7.5.18_linux.run

Do NOT install the NVIDIA drivers! (See the second line below.)

Do you accept the previously read EULA? (accept/decline/quit): accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): n
Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
Enter CUDA Samples Location [ default is /home/taufiq ]:
Installing the CUDA Toolkit in /usr/local/cuda-7.5 …

Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Missing recommended library: libGL.so

Installing the CUDA Samples in /home/taufiq …
Copying samples to /home/taufiq/NVIDIA_CUDA-7.5_Samples now…
Finished copying samples.

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-7.5
Samples: Installed in /home/taufiq, but missing recommended libraries

Please make sure that
- PATH includes /usr/local/cuda-7.5/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-7.5/lib64, or, add /usr/local/cuda-7.5/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-7.5/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-7.5/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 352.00 is required for CUDA 7.5 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_11270.log

Now to add in the paths as described in the install:
gedit  ~/.bashrc
export CUDA_HOME=/usr/local/cuda-7.5
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
PATH=${CUDA_HOME}/bin:${PATH}
export PATH
$ source ~/.bashrc

Check for nvidia devices:

$ ls -l /dev/nv*
crw-rw-rw- 1 root root 195, 0 Dec 4 18:45 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec 4 18:45 /dev/nvidiactl
They have the correct (0666) file permissions
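
The same permission check can be done without reading the ls output by eye. A minimal sketch using the standard os and stat modules; the helper names are made up, and the two device paths are the ones listed above.

```python
import os
import stat


def permission_bits(path):
    """Return the permission bits of path as an octal string such as '0666'."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return "0{:03o}".format(mode)


def world_read_write(path):
    """True if owner, group and others can all read and write path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o666 == 0o666


if __name__ == "__main__":
    for node in ("/dev/nvidia0", "/dev/nvidiactl"):
        if os.path.exists(node):
            print(node, permission_bits(node), world_read_write(node))
```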

Testing CUDA

SDK Samples

Now you can copy the SDK samples into your home directory, and build a test sample.

cuda-install-samples-7.5.sh ~
cd ~/NVIDIA_CUDA-7.5_Samples
cd 1_Utilities/deviceQuery 
make
This executes nvcc

If everything goes well, you should be able to verify your CUDA installation by running the deviceQuery sample.

$ ./deviceQuery

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 1060
Result = PASS
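
Note the two version numbers in that summary line: the driver supports CUDA 8.0 while the samples were built against the 7.5 runtime. That combination is fine, since the driver only needs to be at least as new as the runtime. A minimal sketch that pulls both numbers out of the line and compares them; the function names are made up.

```python
import re


def parse_versions(summary_line):
    """Pull the driver and runtime versions out of the deviceQuery summary line."""
    driver = re.search(r"CUDA Driver Version = (\d+)\.(\d+)", summary_line)
    runtime = re.search(r"CUDA Runtime Version = (\d+)\.(\d+)", summary_line)
    if driver is None or runtime is None:
        return None
    return ((int(driver.group(1)), int(driver.group(2))),
            (int(runtime.group(1)), int(runtime.group(2))))


def driver_supports_runtime(summary_line):
    """True if the driver's CUDA version is at least the runtime's."""
    versions = parse_versions(summary_line)
    if versions is None:
        return False
    driver, runtime = versions
    return driver >= runtime


line = ("deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, "
        "CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 1060")
print(parse_versions(line))           # ((8, 0), (7, 5))
print(driver_supports_runtime(line))  # True
```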

img_0394

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.20 Tue Nov 15 16:49:10 PST 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)

$ cd ../bandwidthTest
$ make
$ ./bandwidthTest

FullSizeRender.jpg

 

Installing nvidia Drivers (Deprecated)

This was far more difficult than it needed to be.  Nouveau has to be blacklisted.
Some of this taken from Emil’s blog (thanks Emil)

1. Driver installation

  1. Go to NVIDIA's driver download page and download the appropriate package for your GPU: http://www.nvidia.com/Download/index.aspx?lang=en-us
  2. Open a terminal: ctrl + alt + t
  3. $ sudo apt-get install build-essential
  4. $ sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
    blacklist nouveau
    options nouveau modeset=0
    # lbm-nouveau  # optional, haven't tried with this

Save and close the file.

$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.4.0-31-generic
$ sudo service nouveau stop
  1. Press ctrl + alt + F1 to enter the console.
  2. Go to the directory where your driver is located and run: $ chmod a+x .
  3. $ sudo service lightdm stop
  4. Install the drivers by: $ sudo bash NVIDIA-*run --no-opengl-files
  5. You might get an error during the installation but continue if possible.
  6. I answered no to installing nVidia drivers under X.  Apparently you can enable this later if you want by running nvidia-xconfig
  7. Installation should be complete. Now check if device nodes are present:
  8. Check if /dev/nvidia* files exist. If they don't, do:  (this didn't work for me)
    $ sudo modprobe nvidia
  9. Reboot computer
  10. $ apt-cache search nvidia | grep -P '^nvidia-[0-9]+\s'
    this gave me:

    nvidia-173 - NVIDIA legacy binary driver - version 173.14.39
    nvidia-310 - Transitional package for nvidia-310
    nvidia-319 - Transitional package for nvidia-319
    nvidia-304 - NVIDIA legacy binary driver - version 304.132
    nvidia-331 - Transitional package for nvidia-331
    nvidia-340 - NVIDIA binary driver - version 340.98
    nvidia-346 - Transitional package for nvidia-346
    nvidia-352 - Transitional package for nvidia-367
    nvidia-367 - NVIDIA binary driver - version 367.57
  11. $ nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 375.20 Driver Version: 375.20 |
    |-------------------------------+----------------------+----------------------+
    | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
    | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
    |===============================+======================+======================|
    | 0 GeForce GTX 1060 Off | 0000:01:00.0 Off | N/A |
    | N/A 47C P0 25W / N/A | 0MiB / 3045MiB | 0% Default |
    
    | Processes: GPU Memory |
    |GPU PID Type Process name Usage |
    | No running processes found |
    
    
  12. $ nvidia-debugdump -l
    Found 1 NVIDIA devices
    Device ID: 0
    Device name: GeForce GTX 1060
    GPU internal ID: GPU-00c74f5d-66d2-6f5b-9fcc-1614ac8b8db3 
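
The grep pattern in step 10 keeps only lines that start with nvidia-, then digits, then whitespace, so numbered driver packages pass and everything else is dropped. A minimal sketch of the same filter with Python's re module; the first two sample lines are from the output above, and the last two are illustrative non-matching lines added here, not part of my actual output.

```python
import re

DRIVER_PACKAGE = re.compile(r"^nvidia-[0-9]+\s")


def driver_packages(apt_cache_output):
    """Return package names from lines matching ^nvidia-[0-9]+ plus whitespace."""
    names = []
    for line in apt_cache_output.splitlines():
        if DRIVER_PACKAGE.match(line):
            names.append(line.split()[0])
    return names


sample = """nvidia-340 - NVIDIA binary driver - version 340.98
nvidia-367 - NVIDIA binary driver - version 367.57
nvidia-settings - Tool for configuring the NVIDIA graphics driver
nvidia-cuda-toolkit - NVIDIA CUDA development toolkit"""
print(driver_packages(sample))  # ['nvidia-340', 'nvidia-367']
```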

Some more info here: https://devtalk.nvidia.com/default/topic/878117/cuda-setup-and-installation/-solved-titan-x-for-cuda-7-5-login-loop-error-ubuntu-14-04-/

 

Testing TensorFlow

$ sudo python -m tensorflow.models.image.mnist.convolutional
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 2.3 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 105.7 ms


...

Minibatch error: 0.0%
Validation error: 0.7%
Step 8400 (epoch 9.77), 104.8 ms
Minibatch loss: 1.596, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8500 (epoch 9.89), 105.1 ms
Minibatch loss: 1.618, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Test error: 0.8%
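
The learning rate in that log drops from 0.010000 to 0.006302 over the run. Those two values are consistent with a staircase exponential decay of 0.95 per completed epoch, which is what I believe the demo script uses, but treat the 0.95 factor as an assumption rather than something confirmed here. A minimal sketch of the arithmetic (the function name is made up):

```python
def decayed_learning_rate(base_rate, decay_rate, epoch):
    """Staircase exponential decay: one decay step per completed epoch."""
    return base_rate * decay_rate ** int(epoch)


print("%.6f" % decayed_learning_rate(0.01, 0.95, 0.00))  # 0.010000
print("%.6f" % decayed_learning_rate(0.01, 0.95, 9.89))  # 0.006302
```

At epoch 9.89 nine full epochs have completed, so the rate is 0.01 * 0.95^9.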


$