Knowledge Base/CUDA/Installation
Acknowledgements
Most of the original work for getting CUDA up and running on Linux was done by Jeremy Cohen. He produced the original instructions that were later edited by Paul Bilokon.
Background
What is CUDA?
NVIDIA® CUDA™ is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA graphics processing units (GPUs) to solve many complex computational problems. It includes the CUDA Instruction Set Architecture (ISA) and the parallel compute engine in the GPU.
For more information, you should definitely visit NVIDIA's excellent CUDA Zone website: http://www.nvidia.com/object/cuda_home.html
Which programming languages are supported?
In order to use the CUDA architecture programmers need to use C. NVIDIA are planning to add support for other programming languages, including FORTRAN and C++.
Who is using CUDA?
NVIDIA have sold over 100 million CUDA-enabled GPUs so far (29 August, 2009).
Which GPUs are CUDA-enabled?
Here is the list so far (29 August, 2009):
- NVIDIA GeForce 8, 9, 100, 200-series GPUs with a minimum of 256MB of local graphics memory. These are generally regarded as lower price gaming cards, though they are still very powerful. They are often used by NVIDIA to roll out the newest GPUs and architectures like the GeForce GTX 295 card.
- GeForce GTX 295
- GeForce GTX 285
- GeForce GTX 285 for Mac
- GeForce GTX 280: supports double precision. The multiprocessor has eight single-precision floating point ALUs (one per core) but only one double-precision ALU (shared by the eight cores). Thus, for applications whose execution time is dominated by floating point computations, switching from single precision to double precision will increase runtime by a factor of approximately eight. For applications which are memory bound, enabling double precision will only decrease performance by a factor of about two.
- GeForce GTX 275
- GeForce GTX 260: supports double precision, with the same single-precision versus double-precision trade-off as described for the GeForce GTX 280 above.
- GeForce GTS 250
- GeForce GTS 240
- GeForce GT 220
- GeForce G210
- GeForce GTS 150
- GeForce GT 130
- GeForce GT 120
- GeForce G100
- GeForce 9800 GX2
- GeForce 9800 GTX+
- GeForce 9800 GTX
- GeForce 9800 GT
- GeForce 9600 GSO
- GeForce 9600 GT
- GeForce 9500 GT
- GeForce 9400 GT
- GeForce 8800 Ultra
- GeForce 8800 GTX
- GeForce 8800 GTS
- GeForce 8800 GT
- GeForce 8800 GS
- GeForce 8600 GTS
- GeForce 8600 GT
- GeForce 8500 GT
- GeForce 8400 GS
- GeForce 9400 mGPU
- GeForce 9300 mGPU
- GeForce 8300 mGPU
- GeForce 8200 mGPU
- GeForce 8100 mGPU
- NVIDIA GeForce mobile products. — NVIDIA GeForce variants for mobile devices.
- GeForce GTX 280M
- GeForce GTX 260M
- GeForce GTS 260M
- GeForce GTS 250M
- GeForce GTS 160M
- GeForce GTS 150M
- GeForce GT 240M
- GeForce GT 230M
- GeForce GT 130M
- GeForce G210M
- GeForce G110M
- GeForce G105M
- GeForce G102M
- GeForce 9800M GTX
- GeForce 9800M GT
- GeForce 9800M GTS
- GeForce 9800M GS
- GeForce 9700M GTS
- GeForce 9700M GT
- GeForce 9650M GS
- GeForce 9600M GT
- GeForce 9600M GS
- GeForce 9500M GS
- GeForce 9500M G
- GeForce 9400M G
- GeForce 9300M GS
- GeForce 9300M G
- GeForce 9200M GS
- GeForce 9100M G
- GeForce 8800M GTS
- GeForce 8700M GT
- GeForce 8600M GT
- GeForce 8600M GS
- GeForce 8400M GT
- GeForce 8400M GS
- NVIDIA Quadro. — Compared with NVIDIA GeForce, these products offer corporate pricing, better CAD support, more thorough testing, and more memory. However, they tend to use the same GPUs (e.g. Quadro FX 5800 is using the same GPU as GeForce GTX 280).
- Quadro FX 5800
- Quadro FX 5600
- Quadro FX 4800 — see NVIDIA Quadro FX 4800: Workstation Graphics At Its Finest?, an article by Uwe Scheffel.
- Quadro FX 4800 for Mac
- Quadro FX 4700 X2
- Quadro FX 4600
- Quadro FX 3800
- Quadro FX 3700
- Quadro FX 1800
- Quadro FX 1700
- Quadro FX 580
- Quadro FX 570
- Quadro FX 470
- Quadro FX 380
- Quadro FX 370
- Quadro FX 370 Low Profile
- Quadro CX
- Quadro NVS 450
- Quadro NVS 420
- Quadro NVS 295
- Quadro NVS 290
- Quadro Plex 2100 D4
- Quadro Plex 2200 D2
- Quadro Plex 2100 S4
- Quadro Plex 1000 Model IV
- NVIDIA Quadro mobile products. — NVIDIA Quadro variants for mobile devices.
- Quadro FX 3700M
- Quadro FX 3600M
- Quadro FX 2700M
- Quadro FX 1700M
- Quadro FX 1600M
- Quadro FX 770M
- Quadro FX 570M
- Quadro FX 370M
- Quadro FX 360M
- Quadro NVS 320M
- Quadro NVS 160M
- Quadro NVS 150M
- Quadro NVS 140M
- Quadro NVS 135M
- Quadro NVS 130M
- NVIDIA Tesla. — These products are CUDA computing cards with no video output. Tesla C1060, for example, is about the same as GeForce GTX 280, and pretty much exactly the same as Quadro FX 5800.
- Tesla S1070
- Tesla C1060
- Tesla C870
- Tesla D870
- Tesla S870
- NVIDIA ION. These products were designed for compact, low-power PCs with CPUs like Intel Atom; NVIDIA claim that these GPUs deliver performance up to 10X faster than similar systems on under-performing PC designs.
This list is bound to go out of date very quickly, so you should check the list it was taken from: http://www.nvidia.com/object/cuda_learn_products.html
Installation on Linux: bob05.doc.ic.ac.uk
We shall now describe the installation and setup process for NVIDIA CUDA software on the Department of Computing, Imperial College London, machines that we used for the 11 September, 2009, Thalesian Workshop. We believe that these notes may be of use to others, so we publish them here. Of course, many issues that we have faced are configuration-specific.
The entire installation in this case could be performed remotely. So we logged into bob05.doc.ic.ac.uk from a Windows machine using PuTTY.
Our System
We are running Ubuntu 8.04.
bob05% uname --all
gives us
Linux bob05 2.6.24-19-generic #1 SMP Fri Jul 11 23:41:49 UTC 2008 i686 GNU/Linux
bob05% cat /proc/cpuinfo
tells us, among other things,
model name : Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz
cpu MHz : 1600.000
cache size : 3072 KB
and
bob05% cat /proc/meminfo
tells us, among other things,
MemTotal: 3368008 kB
MemFree: 2882276 kB
What about the video card?
bob05% lspci
tells us that we have
01:00.0 VGA compatible controller: nVidia Corporation Unknown device 06e0 (rev a1)
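The device is reported as "Unknown device 06e0" because lspci takes device names from its PCI ID database, and the copy on this system does not list this card; the name does not depend on the installed driver. Can we find out which NVIDIA driver, if any, is loaded?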
bob05% lsmod | grep nv
shows us
nvidia 7825536 24
agpgart 34760 1 nvidia
i2c_core 24832 1 nvidia
We have found an NVIDIA driver. Which version is this?
bob05% /sbin/modprobe -l nvidia
We have
/lib/modules/2.6.24-19-generic/volatile/nvidia.ko
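The path does not give us the driver version itself: 2.6.24-19-generic is the version of the running kernel. It does tell us that the module lives in the kernel's volatile directory, which is where Ubuntu's linux-restricted-modules package places the drivers that it ships. This driver is not CUDA-enabled, so we have to install a new one.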
Installing a CUDA-enabled driver
We are going to make a directory for all installation packages:
bob05% mkdir -p ~pb401/thalesians/workshops/2009-09-11/install
Downloading the driver
Let us download the CUDA-enabled driver from http://www.nvidia.com/object/cuda_get.html and put it under this directory.
We have chosen to download the CUDA Driver, with
- Operating System: Linux 32-bit
- Linux Version: Ubuntu 8.04
We have downloaded CUDA 2.2 NVIDIA Driver for Linux (Ubuntu 8.04) 185.18.14, NVIDIA-Linux-x86-185.18.14-pkg1.run.
Stopping the X server
We need to stop the X server if one is running. Is it running?
bob05% ps aux | grep /usr/bin/X
Yes, it is:
root 7189 0.1 0.6 811904 20568 tty7 SLs+ 21:38 0:02 /usr/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
We need to kill the X server as root, so
bob05% ksu
which will show something like
bob05% ksu
WARNING: Your password may be exposed if you enter it here and are logged
in remotely using an unsecure (non-encrypted) channel.
Kerberos password for pb401/root@DOC.IC.AC.UK: :
Authenticated pb401/root@DOC.IC.AC.UK
Account root: authorization for pb401/root@DOC.IC.AC.UK successful
Changing uid to root (0)
Now let's stop the X server:
root@bob05:/homes/pb401# /etc/init.d/gdm stop
and...
* Stopping GNOME Display Manager... [ OK ]
just as we wanted.
When we check again with
bob05% ps aux | grep /usr/bin/X
we see that there is no such process.
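One caveat about this check: ps aux | grep /usr/bin/X may list the grep process itself, because its own command line contains the search string. A minimal sketch of a stricter check follows. The function name x_server_running is our own invention, and the sketch assumes that the X binary shows up as /usr/bin/X in the process list, as it does on bob05.

```shell
# x_server_running: read `ps aux` output on standard input and succeed
# (exit status 0) if it lists an X server started as /usr/bin/X.
# The function name is hypothetical; the path is the one seen on bob05.
x_server_running() {
    # [/] is a one-character bracket expression, so the pattern matches the
    # text "/usr/bin/X" but not a grep command line that merely contains
    # the pattern "[/]usr/bin/X".
    grep -q '[/]usr/bin/X'
}
```

It could be used as ps aux | x_server_running && echo "X is still running".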
Driver installation
Next,
root@bob05:/homes/pb401# cd /homes/pb401/thalesians/workshops/2009-09-11/install/
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# sh NVIDIA-Linux-x86-185.18.14-pkg1.run
You should see
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86 185.18.14........
................................................................................
................................................................................
................................................................................
................................................
Then a console application ("NVIDIA Software Installer for Unix/Linux") will come up with a text-based message box:
Please read the following LICENSE and then select either "Accept" to accept the license and continue with the installation, or select "Do Not Accept" to abort the installation.
We choose "Accept" (navigate using arrow keys and press [Enter]).
Another message will appear:
No precompiled kernel interface was found to match your kernel; would you like the installer to attempt to download a kernel interface for your kernel from the NVIDIA ftp site (ftp://download.nvidia.com)?
Our answer is "Yes".
We get
No matching precompiled kernel interface was found on the NVIDIA ftp site; this means that the installer will need to compile a kernel interface for your kernel.
Well, so be it. We select "OK" (there is, of course, no other option).
Then it will go through a number of steps:
- Building kernel module
- Searching for conflicting X files
- Searching for conflicting OpenGL files
- Installing NVIDIA Accelerated Graphics Driver for Linux-x86 (185.18.14)
And then another message box will appear:
Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
We select "Yes".
Hopefully you will then see (as we did):
Your X configuration file has been successfully updated. Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86 (version: 185.18.14) is now complete.
and select "OK".
Driver status check
So far, so good. However,
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /sbin/modprobe -l nvidia
still shows
/lib/modules/2.6.24-19-generic/volatile/nvidia.ko
/lib/modules/2.6.24-19-generic/kernel/drivers/video/nvidia.ko
so it looks as though the old driver is still in place and a reboot will be needed.
However, if we were to reboot now, the system would still pick up the old NVIDIA driver, and we would be told that there is no CUDA-capable card in the machine when trying to run the NVIDIA CUDA examples (more on them later). We can verify this by running
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /sbin/modprobe -nv nvidia
If the response is
install /sbin/lrm-video nvidia
then the configuration is still pointing to the old driver. This brings us to our next step...
Disabling the original NVIDIA driver
Let us remove the current driver module:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /sbin/modprobe -r nvidia
And unmount the "volatile" filesystem containing the old drivers:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# umount /lib/modules/2.6.24-19-generic/volatile
And
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /usr/sbin/update-rc.d -f linux-restricted-modules-common remove
which should result in
Removing any system startup links for /etc/init.d/linux-restricted-modules-common ...
/etc/rc0.d/S01linux-restricted-modules-common
/etc/rc6.d/S01linux-restricted-modules-common
/etc/rcS.d/S07linux-restricted-modules-common
The file lrm-video in /etc/modprobe.d tells modprobe to run the /sbin/lrm-video script when loading the driver. Remove this file and then run depmod to update the module dependencies so that the new CUDA driver is picked up:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# rm /etc/modprobe.d/lrm-video
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /sbin/depmod
Now running
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /sbin/modprobe -nv nvidia
we shall see that modprobe would install a different module, the CUDA-enabled driver. The response will be
insmod /lib/modules/2.6.24-19-generic/kernel/drivers/i2c/i2c-core.ko
insmod /lib/modules/2.6.24-19-generic/ubuntu/char/intel-agp-ich9m/agpgart.ko
insmod /lib/modules/2.6.24-19-generic/kernel/drivers/video/nvidia.ko NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=44 NVreg_DeviceFileMode=0660
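The two possible answers from the dry run can be told apart mechanically, which is handy when the same procedure has to be repeated on several machines. The following is only a sketch: the function name nvidia_modprobe_target and the labels old, new and unknown are our own, and the patterns are taken from the two responses seen on bob05, so they may need adjusting elsewhere.

```shell
# nvidia_modprobe_target: read the output of `/sbin/modprobe -nv nvidia` on
# standard input and print "old", "new" or "unknown".
# Hypothetical helper; the patterns come from the output seen on bob05.
nvidia_modprobe_target() {
    result=unknown
    while IFS= read -r line; do
        case "$line" in
            "install /sbin/lrm-video"*)
                # modprobe would still run the restricted-modules script
                result=old ;;
            insmod*/kernel/drivers/video/nvidia.ko*)
                # modprobe would load the module put there by the installer
                result=new ;;
        esac
    done
    echo "$result"
}
```

For example, /sbin/modprobe -nv nvidia | nvidia_modprobe_target should print new once lrm-video has been removed and depmod has been run.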
Adding the driver to /etc/modules
Rebooting the machine at this stage would result in X failing to start, so we make sure that the NVIDIA driver is loaded on boot by adding nvidia to /etc/modules:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# echo "nvidia" >> /etc/modules
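Note that echo with >> appends a new line every time it is run, so repeating the step leaves duplicate entries in the file. A minimal sketch of an idempotent variant follows; append_line_once is a hypothetical helper name of our own, not a standard tool.

```shell
# append_line_once FILE LINE: append LINE to FILE unless FILE already
# contains exactly that line. Hypothetical helper for illustration.
append_line_once() {
    file=$1
    line=$2
    # -x: match whole lines only, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

As root, append_line_once /etc/modules nvidia would then be safe to repeat. The same helper would fit any other step that appends a single line to a system file.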
Now we are ready to reboot.
Rebooting the machine
Reboot the machine:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /sbin/reboot
Broadcast message from pb401@bob05
(/dev/pts/0) at 22:34 ...
The system is going down for reboot NOW!
Another driver status check
We have rebooted bob05. Let's check:
bob05% /sbin/modprobe -l nvidia
This is now showing
/lib/modules/2.6.24-19-generic/kernel/drivers/video/nvidia.ko
instead of
/lib/modules/2.6.24-19-generic/volatile/nvidia.ko
Looks like we are in business.
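The module path tells us where the driver came from, but not its version. Once the NVIDIA kernel module is loaded, the driver normally reports its version in /proc/driver/nvidia/version. The sketch below was not part of our session on bob05: the function name is hypothetical and the line format it parses is an assumption about how the driver usually reports itself.

```shell
# nvidia_driver_version: read the text of /proc/driver/nvidia/version on
# standard input and print the driver version number, if one is found.
# ASSUMPTION: the first line has the form
#   NVRM version: NVIDIA UNIX x86 Kernel Module  <version>  <build date>
# This format was not captured on bob05; adjust the pattern if it differs.
nvidia_driver_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}
```

If the assumption holds, nvidia_driver_version < /proc/driver/nvidia/version would be expected to print the version of the driver installed above.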
However, a proper test would be to run the CUDA Toolkit examples on this machine. Therefore we proceed to our next step.
Installing the CUDA Toolkit
Downloading the toolkit
Go back to http://www.nvidia.com/object/cuda_get.html and download the CUDA Toolkit.
Select
- Operating System: Linux 32-bit
- Linux Version: Ubuntu 8.04
and download CUDA Toolkit 2.2 for Linux (Ubuntu 8.04).
We have saved the file, cudatoolkit_2.2_linux_32_ubuntu8.04.run, under ~pb401/thalesians/workshops/2009-09-11/install.
Running the installer
Again we need to
bob05% ksu
WARNING: Your password may be exposed if you enter it here and are logged
in remotely using an unsecure (non-encrypted) channel.
Kerberos password for pb401/root@DOC.IC.AC.UK: :
Authenticated pb401/root@DOC.IC.AC.UK
Account root: authorization for pb401/root@DOC.IC.AC.UK successful
Changing uid to root (0)
Now we can
root@bob05:/homes/pb401# cd ~pb401/thalesians/workshops/2009-09-11/install/
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# sh cudatoolkit_2.2_linux_32_ubuntu8.04.run
Verifying archive integrity... All good.
Uncompressing NVIDIA CUDA.......................................................
................................................................................
.......................................
Enter install path (default /usr/local/cuda, '/cuda' will be appended):
Accept the default install path, /usr/local/cuda (simply press [Enter]).
You will then see
...
`man/man1' -> `/usr/local/cuda/man/man1'
`man/man1/nvcc.1' -> `/usr/local/cuda/man/man1/nvcc.1'
`lib' -> `/usr/local/cuda/lib'
`lib/libcufftemu.so.2.2' -> `/usr/local/cuda/lib/libcufftemu.so.2.2'
`lib/libcublasemu.so.2.2' -> `/usr/local/cuda/lib/libcublasemu.so.2.2'
`lib/libcublasemu.so.2' -> `/usr/local/cuda/lib/libcublasemu.so.2'
`lib/libcufftemu.so' -> `/usr/local/cuda/lib/libcufftemu.so'
`lib/libcublas.so.2' -> `/usr/local/cuda/lib/libcublas.so.2'
`lib/libcufftemu.so.2' -> `/usr/local/cuda/lib/libcufftemu.so.2'
`lib/libcudart.so.2' -> `/usr/local/cuda/lib/libcudart.so.2'
`lib/libcublas.so' -> `/usr/local/cuda/lib/libcublas.so'
`lib/libcufft.so' -> `/usr/local/cuda/lib/libcufft.so'
`lib/libcufft.so.2' -> `/usr/local/cuda/lib/libcufft.so.2'
`lib/libcufft.so.2.2' -> `/usr/local/cuda/lib/libcufft.so.2.2'
`lib/libcublasemu.so' -> `/usr/local/cuda/lib/libcublasemu.so'
`lib/libcudart.so.2.2' -> `/usr/local/cuda/lib/libcudart.so.2.2'
`lib/libcudart.so' -> `/usr/local/cuda/lib/libcudart.so'
`lib/libcublas.so.2.2' -> `/usr/local/cuda/lib/libcublas.so.2.2'
`include' -> `/usr/local/cuda/include'
`include/texture_types.h' -> `/usr/local/cuda/include/texture_types.h'
`include/cudaGL.h' -> `/usr/local/cuda/include/cudaGL.h'
`include/driver_types.h' -> `/usr/local/cuda/include/driver_types.h'
`include/cufft.h' -> `/usr/local/cuda/include/cufft.h'
`include/math_functions_dbl_ptx1.h' -> `/usr/local/cuda/include/math_functions_dbl_ptx1.h'
`include/sm_12_atomic_functions.h' -> `/usr/local/cuda/include/sm_12_atomic_functions.h'
`include/common_functions.h' -> `/usr/local/cuda/include/common_functions.h'
`include/cuda.h' -> `/usr/local/cuda/include/cuda.h'
`include/host_defines.h' -> `/usr/local/cuda/include/host_defines.h'
`include/cublas.h' -> `/usr/local/cuda/include/cublas.h'
`include/common_types.h' -> `/usr/local/cuda/include/common_types.h'
`include/device_types.h' -> `/usr/local/cuda/include/device_types.h'
`include/driver_functions.h' -> `/usr/local/cuda/include/driver_functions.h'
`include/cuda_runtime_api.h' -> `/usr/local/cuda/include/cuda_runtime_api.h'
`include/sm_11_atomic_functions.h' -> `/usr/local/cuda/include/sm_11_atomic_functions.h'
`include/cuComplex.h' -> `/usr/local/cuda/include/cuComplex.h'
`include/builtin_types.h' -> `/usr/local/cuda/include/builtin_types.h'
`include/host_config.h' -> `/usr/local/cuda/include/host_config.h'
`include/cuda_runtime.h' -> `/usr/local/cuda/include/cuda_runtime.h'
`include/channel_descriptor.h' -> `/usr/local/cuda/include/channel_descriptor.h'
`include/math_constants.h' -> `/usr/local/cuda/include/math_constants.h'
`include/vector_functions.h' -> `/usr/local/cuda/include/vector_functions.h'
`include/vector_types.h' -> `/usr/local/cuda/include/vector_types.h'
`include/crt' -> `/usr/local/cuda/include/crt'
`include/crt/device_runtime.h' -> `/usr/local/cuda/include/crt/device_runtime.h'
`include/crt/storage_class.h' -> `/usr/local/cuda/include/crt/storage_class.h'
`include/crt/host_runtime.h' -> `/usr/local/cuda/include/crt/host_runtime.h'
`include/crt/func_macro.h' -> `/usr/local/cuda/include/crt/func_macro.h'
`include/device_functions.h' -> `/usr/local/cuda/include/device_functions.h'
`include/math_functions.h' -> `/usr/local/cuda/include/math_functions.h'
`include/__cudaFatFormat.h' -> `/usr/local/cuda/include/__cudaFatFormat.h'
`include/texture_fetch_functions.h' -> `/usr/local/cuda/include/texture_fetch_functions.h'
`include/math_functions_dbl_ptx3.h' -> `/usr/local/cuda/include/math_functions_dbl_ptx3.h'
`include/device_launch_parameters.h' -> `/usr/local/cuda/include/device_launch_parameters.h'
`include/sm_13_double_functions.h' -> `/usr/local/cuda/include/sm_13_double_functions.h'
`include/cuda_gl_interop.h' -> `/usr/local/cuda/include/cuda_gl_interop.h'
`include/cuda_texture_types.h' -> `/usr/local/cuda/include/cuda_texture_types.h'
`open64' -> `/usr/local/cuda/open64'
`open64/lib' -> `/usr/local/cuda/open64/lib'
`open64/lib/bec' -> `/usr/local/cuda/open64/lib/bec'
`open64/lib/inline' -> `/usr/local/cuda/open64/lib/inline'
`open64/lib/be' -> `/usr/local/cuda/open64/lib/be'
`open64/lib/gfec' -> `/usr/local/cuda/open64/lib/gfec'
`open64/bin' -> `/usr/local/cuda/open64/bin'
`open64/bin/nvopencc' -> `/usr/local/cuda/open64/bin/nvopencc'
`bin' -> `/usr/local/cuda/bin'
`bin/bin2c' -> `/usr/local/cuda/bin/bin2c'
`bin/cudafe++' -> `/usr/local/cuda/bin/cudafe++'
`bin/cudafe' -> `/usr/local/cuda/bin/cudafe'
`bin/ptxvars.cu' -> `/usr/local/cuda/bin/ptxvars.cu'
`bin/nvcc' -> `/usr/local/cuda/bin/nvcc'
`bin/ptxas' -> `/usr/local/cuda/bin/ptxas'
`bin/fatbin' -> `/usr/local/cuda/bin/fatbin'
`bin/nvcc.profile' -> `/usr/local/cuda/bin/nvcc.profile'
`bin/filehash' -> `/usr/local/cuda/bin/filehash'
`src' -> `/usr/local/cuda/src'
`src/fortran.c' -> `/usr/local/cuda/src/fortran.c'
========================================
* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /usr/local/cuda/lib
* or add /usr/local/cuda/lib to /etc/ld.so.conf and run ldconfig as root
* Please read the release notes in /usr/local/cuda/doc/
* To uninstall CUDA, delete /usr/local/cuda
* Installation Complete
Configuring the toolkit
First, we need to add the lib directory, /usr/local/cuda/lib, to /etc/ld.so.conf:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# echo "/usr/local/cuda/lib" >> /etc/ld.so.conf
then run
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# /sbin/ldconfig
so this change is picked up.
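To confirm that the linker cache really did pick up the new directory, the cache can be listed with ldconfig -p and searched for the CUDA runtime library. The helper below is a sketch with a hypothetical name; it assumes the usual ldconfig -p line layout of "name (type) => path".

```shell
# cuda_runtime_in_cache [LIBDIR]: read `ldconfig -p` output on standard
# input and succeed if it lists libcudart under LIBDIR.
# Hypothetical helper; LIBDIR defaults to the toolkit's default lib path.
cuda_runtime_in_cache() {
    libdir=${1:-/usr/local/cuda/lib}
    # First keep only libcudart entries, then look for the directory as a
    # fixed string so that dots in the path are not treated as wildcards.
    grep 'libcudart\.so' | grep -qF "=> $libdir/"
}
```

It could be used as /sbin/ldconfig -p | cuda_runtime_in_cache && echo "libcudart is in the linker cache".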
Fixing the file permissions
You may have problems with the default permissions assigned to the installed files. The executable and library files must be accessible by non-root users to run the CUDA examples. As a quick workaround, you should make at least the following changes in order to run the examples and build CUDA applications:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11/install# cd /usr/local/cuda
root@bob05:/usr/local/cuda# find . -perm 700 -exec chmod 755 {} \;
root@bob05:/usr/local/cuda# find . -perm 600 -exec chmod 644 {} \;
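The two find commands only touch entries whose mode is exactly 700 or 600. An alternative that we did not use on bob05, shown here only as a sketch, is chmod with the symbolic mode a+rX, which handles any starting mode. The function name is hypothetical.

```shell
# open_tree_for_reading DIR: make a directory tree readable by all users.
# Hypothetical helper, shown as an alternative to the two find commands.
open_tree_for_reading() {
    # a+r : add read permission for user, group and others
    # a+X : add execute/search permission only to directories and to files
    #       that already have at least one execute bit set
    chmod -R a+rX "$1"
}
```

Run as root, open_tree_for_reading /usr/local/cuda turns 700 into 755 and 600 into 644, like the find commands, without making plain data files executable.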
Addressing the device node issue
We are almost there but there is still a minor problem: the device nodes /dev/nvidiactl and /dev/nvidia0 that are used to access the video card have permissions set to 660. This results in an error if the user attempting to run CUDA code is not a member of the group video. (The mode and group appear to correspond to the NVreg_DeviceFileMode=0660 and NVreg_DeviceFileGID=44 options in the modprobe output shown earlier.) One way to fix this is to add oneself to the video group by editing /etc/group. However, these changes will be undone when /etc/group is overwritten by the Imperial College London Department of Computing system maintenance processes. An alternative is to chmod 666 both device nodes, although these changes are also undone on a reboot. For now, we
root@bob05:/usr/local/cuda# chmod 666 /dev/nvidiactl
root@bob05:/usr/local/cuda# chmod 666 /dev/nvidia0
being aware that these changes will be undone on the next reboot. (A permanent solution to this problem is yet to be found.)
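Since the chmod is lost on every reboot, it helps to be able to check quickly whether the device nodes are currently open to all users. The following sketch inspects the mode string printed by ls -l; the function name mode_is_world_rw is our own.

```shell
# mode_is_world_rw MODE: succeed if an `ls -l` style mode string (for
# example "crw-rw-rw-") grants read and write access to "others".
# Hypothetical helper for illustration.
mode_is_world_rw() {
    case "$1" in
        # 1 type character + 3 user + 3 group characters, then "rw"
        ???????rw*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For each of /dev/nvidiactl and /dev/nvidia0, one could run mode_is_world_rw "$(ls -l /dev/nvidiactl | cut -c1-10)" || echo "chmod 666 needed again".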
Installing the CUDA SDK code examples
Downloading the examples
Let's go back to http://www.nvidia.com/object/cuda_get.html to download the CUDA SDK code examples. Select
- Operating System: Linux 32-bit
- Linux Version: Ubuntu 8.04
and download the "CUDA SDK 2.2.1 code samples for Linux (Ubuntu 8.04)".
We place the file cudasdk_2.21_linux.run under /homes/pb401/thalesians/workshops/2009-09-11/install.
Running the installer
Next:
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11# exit
exit
bob05% cd ~pb401/thalesians/workshops/2009-09-11/
bob05% mkdir -p NVIDIA/CUDA/SDK/V2.21
bob05% cd install
bob05% sh cudasdk_2.21_linux.run
Verifying archive integrity... All good.
Uncompressing NVIDIA CUDA SDK...................................................
................................................................................
..............................................................
Enter install path (default ~/NVIDIA_CUDA_SDK):
We enter
~/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21
Then:
Located CUDA at /usr/local/cuda
If this is correct, choose the default below.
If it is not correct, enter the correct path to CUDA
Enter CUDA install path (default /usr/local/cuda):
Indeed, this is correct (we installed CUDA Toolkit to the default location), so we press [Enter].
Then a lot of files will be installed and eventually
`sdk/releaseNotesData/GEF9_2D_wte.gif' ->
`/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/releaseNotesData/GEF9_2D_wte.gif'
`sdk/releaseNotesData/tesla.gif' ->
`/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/releaseNotesData/tesla.gif'
`sdk/releaseNotesData/GEFGTX200_2D_wte.gif' ->
`/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/releaseNotesData/GEFGTX200_2D_wte.gif'
`sdk/tools' ->
`/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/tools'
`sdk/tools/CUDA_Occupancy_calculator.xls' ->
`/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/tools/CUDA_Occupancy_calculator.xls'
========================================
Configuring SDK Makefile (/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/common/common.mk)...
========================================
* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /usr/local/cuda/lib
* To uninstall the NVIDIA CUDA SDK, please delete /homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21
* Installation Complete
Configuring the environment
To ensure that /usr/local/cuda/bin has been added to PATH and /usr/local/cuda/lib has been added to LD_LIBRARY_PATH, we
bob05% emacs .cshrc
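(as we are using the tcsh shell in this case) to make sure that we have the following. Note that tcsh keeps its search path in the lowercase path shell variable, which it mirrors into PATH, and that LD_LIBRARY_PATH has to be set with setenv so that it is exported to the programs we run:
set path = ( \
. \
/usr/bin \
. . .
/usr/local/cuda/bin \
)
setenv LD_LIBRARY_PATH /usr/local/cuda/lib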
Then we close our PuTTY window and reconnect to bob05 to make sure that everything is clean.
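After reconnecting, it is worth confirming that both variables really contain the CUDA directories. The sketch below tests a colon-separated list for an exact entry; path_list_contains is a hypothetical name of our own. It is written for sh, so under tcsh it would have to live in a small sh script, which inherits the exported PATH and LD_LIBRARY_PATH.

```shell
# path_list_contains LIST DIR: succeed if the colon-separated LIST has an
# entry that is exactly DIR. Hypothetical helper for illustration.
path_list_contains() {
    # Wrapping both sides in colons means /usr/local/cuda/bin does not
    # match a longer entry such as /usr/local/cuda/bin2.
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Typical checks would be path_list_contains "$PATH" /usr/local/cuda/bin and path_list_contains "$LD_LIBRARY_PATH" /usr/local/cuda/lib.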
Prerequisite packages
These packages are now present on all the Workshop machines, including bob05. However, in the past we worked out that in order to make the examples we need...
root@bob05:/homes/pb401/thalesians/workshops/2009-09-11# apt-get install build-essential g++ libsdl1.2debian libsdl1.2-dev libgl1-mesa-dev libglu1-mesa-dev libsdl-image1.2 libsdl-image1.2-dev libxi-dev libXmu-dev glutg3-dev
For convenience, here is a list of the packages that you may need in order to make all the examples successfully (our next step):
- build-essential — Informational list of build-essential packages. If you do not plan to build Debian packages, you don't need this package. Moreover this package is not required for building Debian packages. This package contains an informational list of packages which are considered essential for building Debian packages. This package also depends on the packages on that list, to make it easy to have the build-essential packages installed. If you have this package installed, you only need to install whatever a package specifies as its build-time dependencies to build the package. Conversely, if you are determining what your package needs to build-depend on, you can always leave out the packages this package depends on. This package is NOT the definition of what packages are build-essential; the real definition is in the Debian Policy Manual. This package contains merely an informational list, which is all most people need. However, if this package and the manual disagree, the manual is correct.
- g++ — a freely redistributable C++ compiler. It is part of GCC, the GNU compiler collection.
- libsdl1.2debian — SDL is a library that allows programs portable low level access to a video framebuffer, audio output, mouse, and keyboard.
- libsdl1.2-dev — This package contains the files needed to compile and link programs which use SDL.
- libgl1-mesa-swx11 (optional?) — Mesa is a 3-D graphics library with an API which is very similar to that of OpenGL. To the extent that Mesa utilizes the OpenGL command syntax or state machine, it is being used with authorization from Silicon Graphics, Inc. However, the author makes no claim that Mesa is in any way a compatible replacement for OpenGL or associated with Silicon Graphics, Inc.
- libgl1-mesa-dev — This package includes headers and static libraries for compiling programs with Mesa.
- libglu1-mesa (optional?) — GLU offers simple interfaces for building mipmaps; checking for the presence of extensions in the OpenGL (or other libraries which follow the same conventions for advertising extensions); drawing piecewise-linear curves, NURBS, quadrics and other primitives (including, but not limited to, teapots); tesselating surfaces; setting up projection matrices and unprojecting screen coordinates to world coordinates.
- libglu1-mesa-dev — Includes headers and static libraries for compiling programs with GLU.
- libsdl-image1.2 — This is a simple library to load images of various formats as SDL surfaces. This library currently supports BMP, PPM, PCX, GIF, JPEG, PNG, TIFF, and XPM formats.
- libsdl-image1.2-dev — This package contains the include files and static libraries required to support development using the SDL 1.2 image loading library.
- libxi6 (optional?) — libXi provides an X Window System client interface to the XINPUT extension to the X protocol. The Input extension allows setup and configuration of multiple input devices, and will soon allow hotplugging of input devices; to be added and removed on the fly.
- libxi-dev — This package contains the development headers for the library found in libxi6.
- libXmu6 (optional?) — libXmu provides a set of miscellaneous utility convenience functions for X libraries to use.
- libXmu-dev — This package contains the development headers for the library found in libxmu6.
- glutg3 (optional?) — GLUT (as in "gluttony") is a window system independent toolkit for writing OpenGL programs. It implements a simple windowing API, which makes life considerably easier when learning about and exploring OpenGL programming.
- glutg3-dev — This package contains the development headers for the library found in glutg3.
make-ing the SDK examples
bob05% cd /homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21
bob05% make
Eventually you should see
make -C projects/MonteCarlo/
make[1]: Entering directory `/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/projects/MonteCarlo'
make[1]: Leaving directory `/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/projects/MonteCarlo'
make -C projects/BlackScholes/
make[1]: Entering directory `/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/projects/BlackScholes'
make[1]: Leaving directory `/homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/projects/BlackScholes'
Finished building all
Testing
It's time to test our installation.
deviceQuery
bob05% cd /homes/pb401/thalesians/workshops/2009-09-11/NVIDIA/CUDA/SDK/V2.21/bin/linux/release
bob05% ./deviceQuery
Here is the output that we got:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "GeForce 9300 GE"
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 267714560 bytes
Number of multiprocessors: 1
Number of cores: 8
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: No
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit...
Excellent!
BlackScholes
Another test:
bob05% ./BlackScholes
Here is the output:
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 23.268410 msec
Effective memory bandwidth: 3.438138 GB/s
Gigaoptions per second : 0.343814
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.991672E-07
Max absolute error: 1.239777E-05
TEST PASSED
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
Press ENTER to exit...
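Both examples report success with a line containing "Test PASSED" or "TEST PASSED", so the check can be scripted when several machines have to be verified. This is a sketch only: the function name is hypothetical, and it simply searches the captured output for that phrase.

```shell
# sdk_test_passed: read the output of an SDK example on standard input and
# succeed if it reports a passed test. Hypothetical helper.
sdk_test_passed() {
    # -i: deviceQuery prints "Test PASSED", BlackScholes prints "TEST PASSED"
    grep -qi 'test passed'
}
```

For example, ./deviceQuery < /dev/null | sdk_test_passed && echo "deviceQuery OK". Redirecting standard input from /dev/null should stop the example from waiting at the "Press ENTER to exit..." prompt, although we have not verified this for every example in the SDK.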
oceanFFT
Now let's run a test that produces graphical output. Please note that we are using the PC X Server Xming 6.9.0.31 and we enabled X11 forwarding in PuTTY as per our Knowledge Base instructions.
However, when we try
bob05% ./oceanFFT
we get a blank X server window and
[CUDA FFT Ocean Simulation]
Left mouse button - rotate
Middle mouse button - pan
Left + middle mouse button - zoom
'w' key - toggle wireframe
[CUDA FFT Ocean Simulation]
freeglut (./oceanFFT): Unable to create direct context rendering for window 'CUDA FFT Ocean Simulation'
This may hurt performance.
ERROR: Support for necessary OpenGL extensions missing.
Press ENTER to exit...
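The reason for this error is not yet clear, although the freeglut message shows that no direct rendering context could be created over the forwarded X11 connection.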
The example runs successfully locally.