PTX

Revision as of 08:06, 14 December 2010 by Dank (talk | contribs)

PTX is the virtual ISA to which CUDA's nvcc compiles source code. Once the CUDA runtime has loaded a PTX module, the driver JIT-compiles it into architecture-specific machine code, which can then be scheduled for execution on CUDA devices. From Version 2.1 of the PTX ISA Reference:

PTX defines a virtual machine and ISA for general purpose parallel thread execution. PTX programs are translated at install time to the target hardware instruction set. The PTX-to-GPU translator and driver enable NVIDIA GPUs to be used as programmable parallel computers.

Versions

Changes by PTX version, with the corresponding CUDA Toolkit version in parentheses:

PTX 2.2 (CUDA Toolkit 3.2)
  • New kernel parameter directives for pointer arguments
  • Flat address space for constants (backwards compatibility for constant banks)
  • Texture changes for OpenCL, bilerp (bilinear interpolation) and high-bandwidth loads

PTX 2.1 (CUDA Toolkit 3.1)
  • Stack-based ABI, indirect branches and function pointers for sm_2x targets
  • .branchtargets, .calltargets, and .callprototype directives
  • 32 driver-specific execution environment registers %envreg0..%envreg31
  • New instruction rcp.approx.ftz.f64 for fast approximate reciprocal

PTX 2.0 (CUDA Toolkit 3.0)

Cooperative Thread Arrays

  • A CTA is equivalent to a block in CUDA: its threads are broken up into warps and can communicate with one another. CTAs can be grouped into a grid, and each grid runs one kernel
  • %tid: thread ID within CTA
  • %ntid: 3D shape of CTA
  • %ctaid: CTA ID within grid
  • %nctaid: 3D shape of grid
  • %gridid: grid ID
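
The correspondence between these special registers and the CUDA C built-ins can be checked with a little inline PTX. The following is only a sketch: the helper and kernel names (ptx_tid_x, check_registers, count_mismatches) are invented for illustration, only the .x components are compared, and it assumes the usual mapping of %tid to threadIdx, %ntid to blockDim, %ctaid to blockIdx and %nctaid to gridDim. %gridid is left out, since it has no obvious CUDA C built-in to compare against.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers: each reads the .x component of one PTX special
// register via inline PTX. "%%" is how a literal '%' is written inside
// a CUDA asm() string.
__device__ unsigned ptx_tid_x()    { unsigned v; asm("mov.u32 %0, %%tid.x;"    : "=r"(v)); return v; }
__device__ unsigned ptx_ntid_x()   { unsigned v; asm("mov.u32 %0, %%ntid.x;"   : "=r"(v)); return v; }
__device__ unsigned ptx_ctaid_x()  { unsigned v; asm("mov.u32 %0, %%ctaid.x;"  : "=r"(v)); return v; }
__device__ unsigned ptx_nctaid_x() { unsigned v; asm("mov.u32 %0, %%nctaid.x;" : "=r"(v)); return v; }

// Each thread compares the PTX registers with the CUDA built-ins and
// increments *mismatches if any pair disagrees.
__global__ void check_registers(int *mismatches)
{
    bool same = ptx_tid_x()    == threadIdx.x   // thread ID within CTA
             && ptx_ntid_x()   == blockDim.x    // shape of CTA
             && ptx_ctaid_x()  == blockIdx.x    // CTA ID within grid
             && ptx_nctaid_x() == gridDim.x;    // shape of grid
    if (!same)
        atomicAdd(mismatches, 1);
}

// Launch `ctas` CTAs of `threads` threads each. Returns the number of
// threads that observed a mismatch, or -1 if a CUDA call failed.
int count_mismatches(unsigned ctas, unsigned threads)
{
    int *d_count = nullptr;
    int h_count = -1;
    if (cudaMalloc(&d_count, sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMemset(d_count, 0, sizeof(int)) == cudaSuccess) {
        check_registers<<<ctas, threads>>>(d_count);
        // cudaMemcpy waits for the kernel on the default stream to finish.
        if (cudaMemcpy(&h_count, d_count, sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess)
            h_count = -1;
    }
    cudaFree(d_count);
    return h_count;
}

int main()
{
    // A grid of 4 CTAs, 64 threads per CTA (arbitrary sizes).
    printf("mismatching threads: %d\n", count_mismatches(4, 64));
    return 0;
}
```

Built with something like nvcc -o ctaregs ctaregs.cu (the file name is arbitrary), this should report no mismatching threads if the assumed mapping holds. Running nvcc -ptx on the same file emits the generated PTX, where the built-ins should show up as reads of the same special registers.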

Tools

  • Marcin Wilhelm Kościelnicki's nv50dis, a disassembler