PTX
Latest revision as of 09:23, 14 December 2010
PTX is the ISA to which CUDA's nvcc compiles source code. The hardware driver JIT-compiles it into architecture-specific machine language once a PTX module has been loaded through the CUDA runtime, after which it can be scheduled for execution on CUDA devices. From Version 2.1 of the PTX ISA Reference:
- PTX defines a virtual machine and ISA for general purpose parallel thread execution. PTX programs are translated at install time to the target hardware instruction set. The PTX-to-GPU translator and driver enable NVIDIA GPUs to be used as programmable parallel computers.
All PTX instructions can be predicated. Predicated instances of the branch and call instructions are the only way to effect conditional branching.
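
To make the predication rule concrete, here is a minimal host-side sketch in C++. It is a toy model, not real PTX: the `Op` enum, the `Instr` layout, and the `run` and `count_up_to` helpers are all hypothetical names invented for illustration. Every instruction carries an optional guard predicate, and a loop is built from nothing more than a guarded branch, roughly what PTX would write as a `setp` followed by `@p bra`.

```cpp
#include <cstddef>
#include <vector>

// Toy instruction set (an assumption for illustration, not PTX itself).
enum class Op { SetpLt, Add, Bra, Exit };

struct Instr {
    Op op;
    int guard;  // index of the guarding predicate register, or -1 if unguarded
    int a, b;   // operands; their meaning depends on op
};

// Interprets the program with r0 preloaded; returns r0 when Exit executes
// or when execution runs off the end of the program.
int run(const std::vector<Instr>& prog, int r0) {
    int r[1] = {r0};
    bool p[1] = {false};
    std::size_t pc = 0;
    while (pc < prog.size()) {
        const Instr& in = prog[pc];
        ++pc;
        // A false guard turns the instruction into a no-op.
        if (in.guard >= 0 && !p[in.guard]) continue;
        switch (in.op) {
            case Op::SetpLt: p[in.a] = r[0] < in.b; break;  // p[a] = (r0 < b)
            case Op::Add:    r[in.a] += in.b; break;        // r[a] += b
            case Op::Bra:    pc = static_cast<std::size_t>(in.a); break;
            case Op::Exit:   return r[0];
        }
    }
    return r[0];
}

// A do-while loop: the only conditional control flow is the guarded Bra.
int count_up_to(int start, int limit) {
    const std::vector<Instr> prog = {
        {Op::Add,    -1, 0, 1},      // 0: r0 += 1
        {Op::SetpLt, -1, 0, limit},  // 1: p0 = (r0 < limit)
        {Op::Bra,     0, 0, 0},      // 2: @p0 branch back to instruction 0
        {Op::Exit,   -1, 0, 0},      // 3: exit
    };
    return run(prog, start);
}
```

Calling `count_up_to(0, 5)` loops until the guard turns false, while `count_up_to(7, 5)` runs the body once and falls through the skipped branch.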
Versions
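PTX Version | CUDA Toolkit Version | Changes |
---|---|---|
2.2 | 3.2 | New kernel parameter directives for pointer arguments; flat address space for constants (backwards compatibility for constant banks); texture changes for OpenCL, bilerp (bilinear interpolation), and high-bandwidth loads |
2.1 | 3.1 | |
2.0 | 3.0 | Added special registers nsmid, lanemask_*, clock64 |
1.3 | | Added special registers laneid, warpid, smid, pm0-3; added instructions sub.cc and subc |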
Cooperative Thread Arrays
- Equivalent to a block in CUDA: a CTA is broken up into warps, its threads can communicate with one another, and CTAs can be grouped into a grid, with one kernel per grid
- tid: thread ID within CTA
- ntid: 3D shape of CTA
- ctaid: CTA ID within grid
- nctaid: 3D shape of grid
- gridid: grid ID
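
The registers above correspond to the CUDA C built-ins `threadIdx` (tid), `blockDim` (ntid), `blockIdx` (ctaid), and `gridDim` (nctaid). As a rough illustration of how they combine, the following host-side C++ sketch computes common indices from them. The `Dim3` struct and the three function names are hypothetical, chosen only for this example, and the x-fastest flattening order is the usual convention rather than something this page specifies.

```cpp
#include <cstdint>

// Stand-in for a three-component special register such as tid or ntid.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Flattened index of a thread within its CTA, with x varying fastest.
std::uint32_t flat_tid(Dim3 tid, Dim3 ntid) {
    return tid.x + ntid.x * (tid.y + ntid.y * tid.z);
}

// Global x coordinate of a thread across the whole grid.
std::uint32_t global_x(Dim3 tid, Dim3 ntid, Dim3 ctaid) {
    return ctaid.x * ntid.x + tid.x;
}

// Total number of threads launched: threads per CTA times CTAs per grid.
std::uint64_t threads_in_grid(Dim3 ntid, Dim3 nctaid) {
    const std::uint64_t per_cta =
        static_cast<std::uint64_t>(ntid.x) * ntid.y * ntid.z;
    const std::uint64_t ctas =
        static_cast<std::uint64_t>(nctaid.x) * nctaid.y * nctaid.z;
    return per_cta * ctas;
}
```

For a one-dimensional launch, `global_x` is the familiar `blockIdx.x * blockDim.x + threadIdx.x` expression from CUDA C kernels.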
Tools
- Marcin Wilhelm Kościelnicki's nv50dis, a disassembler