PTX: Difference between revisions

Revision as of 07:41, 18 July 2010

PTX (Parallel Thread Execution) is the virtual ISA to which CUDA's nvcc compiles source code. When a PTX module is loaded through the CUDA runtime, the driver JIT-compiles it into architecture-specific machine code, which can then be scheduled for execution on CUDA devices. From Version 2.1 of the PTX ISA Reference:

PTX defines a virtual machine and ISA for general purpose parallel thread execution. PTX programs are translated at install time to the target hardware instruction set. The PTX-to-GPU translator and driver enable NVIDIA GPUs to be used as programmable parallel computers.

Versions

PTX Version	CUDA Toolkit Version	Changes
2.1	3.1	Stack-based ABI, indirect branches and function pointers for sm_2x targets; .branchtargets, .calltargets, and .callprototype directives; 32 driver-specific execution environment registers %envreg0..%envreg31; new instruction rcp.approx.ftz.f64 for fast approximate reciprocal
2.0	3.0
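
Every PTX module opens with a .version directive, so the table above can be used to work out which toolkit release a module was written against. The following is a minimal sketch, not part of any NVIDIA tool: the names PTX_TO_CUDA_TOOLKIT, SAMPLE_PTX, ptx_version and toolkit_for are hypothetical, the sample module is a made-up header, and the mapping holds only the two pairings listed in the table.

```python
import re

# Pairings copied from the Versions table; releases not listed there are unknown here.
PTX_TO_CUDA_TOOLKIT = {"2.1": "3.1", "2.0": "3.0"}

# Hypothetical sample module: only the .version directive is inspected.
SAMPLE_PTX = """\
// illustrative header only
.version 2.1
.target sm_20
"""

# Match a .version directive at the start of a line, ignoring leading whitespace.
_VERSION_RE = re.compile(r"^\s*\.version\s+(\d+)\.(\d+)", re.MULTILINE)


def ptx_version(ptx_source):
    """Return the 'major.minor' string from the .version directive, or None."""
    match = _VERSION_RE.search(ptx_source)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def toolkit_for(ptx_source):
    """Return the CUDA Toolkit version paired with the module's PTX version.

    Returns None when the directive is missing or the version is not in the table.
    """
    return PTX_TO_CUDA_TOOLKIT.get(ptx_version(ptx_source))


if __name__ == "__main__":
    print(ptx_version(SAMPLE_PTX), toolkit_for(SAMPLE_PTX))
```

A lookup table is used rather than arithmetic on the version numbers because the table gives no rule relating the two numbering schemes, only the individual pairings.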

Tools

Marcin Wilhelm Kościelnicki's nv50dis, a disassembler

@@ Line 1: / Line 1: @@
 PTX (Parallel Thread Execution) is the virtual ISA to which [[CUDA]]'s nvcc compiles source code. When a PTX module is loaded through the CUDA runtime, the driver JIT-compiles it into architecture-specific machine code, which can then be scheduled for execution on CUDA devices. From Version 2.1 of the [http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/ptx_isa_2.1.pdf PTX ISA Reference]:
 :''PTX defines a virtual machine and ISA for general purpose parallel thread execution. PTX programs are translated at install time to the target hardware instruction set. The PTX-to-GPU translator and driver enable NVIDIA GPUs to be used as programmable parallel computers.''
+==Versions==
+{|
+|-
+! PTX Version
+! CUDA Toolkit Version
+! Changes
+|-
+| 2.1
+| 3.1
+|
+* Stack-based ABI, indirect branches and function pointers for sm_2x targets
+* .branchtargets, .calltargets, and .callprototype directives
+* 32 driver-specific execution environment registers %envreg0..%envreg31
+* New instruction rcp.approx.ftz.f64 for fast approximate reciprocal
+|-
+| 2.0
+| 3.0
+|
+|-
+|}
 ==Tools==
 * Marcin Wilhelm Kościelnicki's [http://0x04.net/cgit/index.cgi/nv50dis/ nv50dis], a disassembler
 [[CATEGORY:GPGPU]]
