Libcudest: Difference between revisions

Revision as of 23:40, 22 April 2010

Reverse engineering of CUDA ioctls in the 3.0 SDK (195.36.15 driver, GTS 360M, amd64). CUDA primarily communicates with the NVIDIA closed-source driver via undocumented ioctl()s.

Maps

Ordered from highest to lowest locations in x86 memory. These are architecture-, and to a lesser degree driver- and kernel version-specific. Applications and libraries can of course create many more maps than these.

vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
Userspace stack. read-write-private, many pages, high in memory
Anonymous map, 3 read-write-private pages, high in memory.
- Possibly associated with nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686)
Two sets of /dev/nvidiaX maps for each bound device. Sets are usually contiguous, and contain:
- an anonymous page, read-write-private
- several mappings of the device, having variable number of pages, all read-write-shared
Libraries. variable, middle of memory.
Userspace heap. read-write-private, many pages, low in memory
Application (data region). read-write-private, variable, low in memory
Application (text region). read-execute-private, variable, usually lowest mapping
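A layout like the one listed above can be inspected on Linux by reading /proc/<pid>/maps. The following is a minimal sketch, not part of the original investigation: the names Mapping, parse_maps_line and parse_maps are hypothetical, and a 4 KiB page size is assumed. It sorts mappings from highest to lowest address, matching the ordering used in this section.

```python
from typing import List, NamedTuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is typical on x86


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str   # e.g. "rw-p" (read-write-private) or "rw-s" (read-write-shared)
    path: str    # "" for anonymous maps

    @property
    def pages(self) -> int:
        return (self.end - self.start) // PAGE_SIZE


def parse_maps_line(line: str) -> Mapping:
    # Format: start-end perms offset dev inode [pathname]
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def parse_maps(text: str) -> List[Mapping]:
    maps = [parse_maps_line(l) for l in text.splitlines() if l.strip()]
    # Highest addresses first, as in the list above.
    return sorted(maps, key=lambda m: m.start, reverse=True)
```

Calling parse_maps(open("/proc/self/maps").read()) inside a process, or on the maps file of a running CUDA process, would give a list that can be filtered for paths starting with /dev/nvidia or for anonymous three-page rw-p maps.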

ioctls

An ioctl request code (on x86) is 32 bits wide. The following definition comes from linux/asm-generic/ioctl.h in a 2.6.34 kernel:

Bit 31: Read?
Bit 30: Write?
Bits 29-16: Parameter size
Bits 15-8: Type (module)
Bits 7-0: Number (command)
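The bit layout above can be applied mechanically to any request code seen in strace output. This is a small illustrative sketch; decode_ioctl is a hypothetical helper name, and the example value 0x80044501 is the evdev EVIOCGVERSION request (_IOR('E', 0x01, int)), chosen only because it is a familiar non-NVIDIA code.

```python
def decode_ioctl(request: int) -> dict:
    """Split a 32-bit ioctl request code into its asm-generic fields."""
    return {
        "read": bool((request >> 31) & 0x1),   # bit 31
        "write": bool((request >> 30) & 0x1),  # bit 30
        "size": (request >> 16) & 0x3FFF,      # bits 29-16 (14 bits)
        "type": (request >> 8) & 0xFF,         # bits 15-8
        "nr": request & 0xFF,                  # bits 7-0
    }


if __name__ == "__main__":
    fields = decode_ioctl(0x80044501)
    print({k: (hex(v) if not isinstance(v, bool) else v) for k, v in fields.items()})
```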

Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the two most significant bits (read and write) will always be '11'. Both of these deductions are borne out by observing the strace output of a CUDA process.

Code	Param size	Param location(s)	Driver API call sites	Notes
0xd2	0x048	stack	cuInit	Performed immediately following opening of the nvidiactl device
0xca	0x004	anonymous page	cuInit
0xc8	0x600	anonymous page	cuInit	Largest parameter by far. Possibly scaled? Shifted 3 bits left (0x600 << 3), this is 0x3000, the size of the amd64 anonymous mapping (3 * 4096).
0x22	0x00c	stack	cuInit
0x2a	0x020	stack	cuInit
0x4d	0x048	stack	cuInit	Performed following opening of nvidiaX device
0x2d	0x14
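Assuming every entry is declared with _IOWR() and the 'F' type code, as described above, the full 32-bit request numbers one would expect to see in strace can be reconstructed from the table's code and size columns. The sketch below is a derivation under that assumption, not a list of captured values; iowr, OBSERVED and REQUESTS are hypothetical names.

```python
NV_IOCTL_MAGIC = ord("F")  # 0x46, the type code reported for the driver

_IOC_WRITE, _IOC_READ = 1, 2  # direction bits from asm-generic/ioctl.h


def iowr(magic: int, nr: int, size: int) -> int:
    """Python equivalent of _IOWR(magic, nr, <type of `size` bytes>)."""
    direction = _IOC_READ | _IOC_WRITE
    return (direction << 30) | (size << 16) | (magic << 8) | nr


# (command number, parameter size) pairs copied from the table above.
OBSERVED = [
    (0xD2, 0x048), (0xCA, 0x004), (0xC8, 0x600), (0x22, 0x00C),
    (0x2A, 0x020), (0x4D, 0x048), (0x2D, 0x014),
]

REQUESTS = {nr: iowr(NV_IOCTL_MAGIC, nr, size) for nr, size in OBSERVED}

if __name__ == "__main__":
    for nr, req in REQUESTS.items():
        print(f"nr=0x{nr:02x} request=0x{req:08x}")
```

Because the size field is only 14 bits wide, any parameter larger than 0x3fff bytes could not be encoded directly, which is one reason a scaled interpretation of the 0x600 entry is worth considering.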

