CUDA
Hardware/Emulation
NVIDIA maintains a list of supported hardware. Otherwise, there's emulation...
```
[recombinator](0) $ ~/local/cuda/C/bin/linux/emurelease/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is no device supporting CUDA.

Device 0: "Device Emulation (CPU)"
  CUDA Driver Version:                           2.30
  CUDA Runtime Version:                          2.30
  CUDA Capability Major revision number:         9999
  CUDA Capability Minor revision number:         9999
  Total amount of global memory:                 4294967295 bytes
  Number of multiprocessors:                     16
  Number of cores:                               128
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     1
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.35 GHz
  Concurrent copy and execution:                 No
  Run time limit on kernels:                     No
  Integrated:                                    Yes
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

Test PASSED
```
Each device has a compute capability, though this does not encompass all differentiated capabilities (see also deviceOverlap and canMapHostMemory...).
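These properties can be queried at runtime via cudaGetDeviceProperties(). A minimal sketch (my own illustration, not SDK code; error handling abbreviated):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(void){
	int devcount;

	if(cudaGetDeviceCount(&devcount) != cudaSuccess){
		return EXIT_FAILURE;
	}
	for(int d = 0 ; d < devcount ; ++d){
		cudaDeviceProp prop;

		if(cudaGetDeviceProperties(&prop, d) != cudaSuccess){
			return EXIT_FAILURE;
		}
		// Compute capability, plus two flags it does not encompass
		printf("%d: %s, compute capability %d.%d\n",
			d, prop.name, prop.major, prop.minor);
		printf("   deviceOverlap: %d canMapHostMemory: %d\n",
			prop.deviceOverlap, prop.canMapHostMemory);
	}
	return EXIT_SUCCESS;
}
```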
CUDA model
- Each multiprocessor has a register file:
  - 8192 registers for compute capability <= 1.1
  - 16384 registers for compute capabilities 1.2 and 1.3
- A given host thread can execute code on only one device at once (but multiple host threads can execute code on the same device)
- A group of threads which share a memory and can "synchronize their execution to coördinate accesses to memory" (use a barrier) form a block. Each thread has a threadId within its (three-dimensional) block.
- For a block of dimensions <Dx, Dy, Dz>, the threadId of the thread having index <x, y, z> is (x + y * Dx + z * Dy * Dx). (This and the blockId formula below are worked through in the kernel sketch following this list.)
- A group of blocks which share a kernel form a grid. Each block (and each thread within that block) has a blockId within its (two-dimensional) grid.
- For a grid of dimensions <Dx, Dy>, the blockId of the block having index <x, y> is (x + y * Dx).
- Thus, a given thread's <blockId X threadId> dyad is unique across the device. All the threads of a block share a blockId, and corresponding threads of various blocks share a threadId.
- Each time the kernel is launched, new grid and block dimensions may be provided.
- A block's threads, starting from threadId 0, are broken up into contiguous warps having some warp size number of threads.
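Here's a minimal sketch of the indexing formulas above (the kernel and all names are mine, not from the SDK):

```cuda
#include <cuda_runtime.h>

// Each thread computes its threadId within its block and its block's
// blockId within the grid, per the formulas above, then records the
// device-unique index derived from its <blockId, threadId> dyad.
__global__ void whereami(int *out){
	const int tid = threadIdx.x + threadIdx.y * blockDim.x +
		threadIdx.z * blockDim.y * blockDim.x;
	const int bid = blockIdx.x + blockIdx.y * gridDim.x;
	const int tperb = blockDim.x * blockDim.y * blockDim.z;

	out[bid * tperb + tid] = bid * tperb + tid;
}

int main(void){
	cudaSetDevice(0);	// this host thread now drives device 0 only
	dim3 dblock(4, 2, 2);	// Dx=4, Dy=2, Dz=2: 16 threads per block
	dim3 dgrid(2, 3);	// Dx=2, Dy=3: 6 blocks
	int *out;

	cudaMalloc((void **)&out, 6 * 16 * sizeof(*out));
	// Grid and block dimensions are supplied anew at each launch.
	whereami<<<dgrid, dblock>>>(out);
	cudaThreadSynchronize();	// CUDA 2.x-era synchronization call
	cudaFree(out);
	return 0;
}
```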
Memory type | Replication | Device access | Host access |
---|---|---|---|
Registers | Per-thread | Read-write | None |
Local memory | Per-thread | Read-write | None |
Shared memory | Per-block | Read-write | None |
Global memory | Per-grid | Read-write | Read-write |
Constant memory | Per-grid | Read | Read-write |
Texture memory | Per-grid | Read | Read-write |
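As a rough illustration of where declarations land in these spaces (a sketch; the kernel and names are mine, and texture memory, which must be bound via the texture reference API, is omitted):

```cuda
#include <cuda_runtime.h>

__constant__ float coeffs[128];	// constant memory: per-grid, device read-only

// Assumes launch with a one-dimensional block of exactly 128 threads.
__global__ void memspaces(const float *gin, float *gout){
	__shared__ float tile[128];	// shared memory: per-block, read-write
	float acc;			// registers (or local memory, if spilled): per-thread
	const int tid = threadIdx.x;

	tile[tid] = gin[tid];		// gin/gout point into global memory: per-grid
	__syncthreads();		// the intra-block barrier mentioned above
	acc = tile[tid] * coeffs[tid];
	gout[tid] = acc;
}
```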
Installation on Debian
libcuda-dev packages exist in the non-free archive area, and supply the core library libcuda.so. Together with the upstream toolkit and SDK from NVIDIA, this provides a full CUDA development environment for 64-bit Debian Unstable systems. I installed CUDA 2.3 on 2010-01-25 (hand-rolled 2.6.32.6 kernel, built with gcc-4.4). This machine did not have CUDA-compatible hardware (it uses Intel 965).
- Download the Ubuntu 9.04 files from NVIDIA's "CUDA Zone".
- Run the toolkit installer (sh cudatoolkit_2.3_linux_64_ubuntu9.04.run)
- For a user-mode install, supply $HOME/local or somesuch
```
* Please make sure your PATH includes /home/dank/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH
*   for 32-bit Linux distributions includes /home/dank/local/cuda/lib
*   for 64-bit Linux distributions includes /home/dank/local/cuda/lib64
* OR
*   for 32-bit Linux distributions add /home/dank/local/cuda/lib
*   for 64-bit Linux distributions add /home/dank/local/cuda/lib64
* to /etc/ld.so.conf and run ldconfig as root
* Please read the release notes in /home/dank/local/cuda/doc/
* To uninstall CUDA, delete /home/dank/local/cuda
* Installation Complete
```
- Run the SDK installer (sh cudasdk_2.3_linux.run)
- I just installed it to the same directory as the toolkit, which seems to work fine.
```
========================================
Configuring SDK Makefile (/home/dank/local/cuda/shared/common.mk)...
========================================

* Please make sure your PATH includes /home/dank/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /home/dank/local/cuda/lib
* To uninstall the NVIDIA GPU Computing SDK, please delete /home/dank/local/cuda
* Installation Complete
```
Building
nvcc flags
- --ptxas-options=-v displays per-thread register usage (e.g., nvcc --ptxas-options=-v -c kernel.cu)
SDK's common.mk
This assumes use of the SDK's common.mk, as recommended by the documentation.
- Add the library path to LD_LIBRARY_PATH, assuming CUDA's been installed to a non-standard directory.
- Set the CUDA_INSTALL_PATH and ROOTDIR (yeargh!) environment variables if building outside the SDK tree.
- I keep the following in bin/cudasetup of my home directory. Source it, using sh's . cudasetup syntax:
```sh
CUDA="$HOME/local/cuda/"
export CUDA_INSTALL_PATH="$CUDA"
export ROOTDIR="$CUDA/C/common/"
if [ -n "$LD_LIBRARY_PATH" ] ; then
	export "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA/lib64"
else
	export "LD_LIBRARY_PATH=$CUDA/lib64"
fi
unset CUDA
```
- Set EXECUTABLE in your Makefile, and include $CUDA_INSTALL_PATH/C/common/common.mk
Unit testing
The .DEFAULT_GOAL special variable of GNU Make can be used:
```make
.PHONY: test
.DEFAULT_GOAL:=test

include $(CUDA_INSTALL_PATH)/C/common/common.mk

test: $(TARGET)
	$(TARGET)
```