Libcudest: Difference between revisions

Revision as of 23:12, 22 April 2010

Reverse engineering of CUDA ioctls in the 3.0 SDK (195.36.15 driver, GTS 360M, amd64). CUDA primarily communicates with the NVIDIA closed-source driver via undocumented ioctl()s.

Maps

Listed from highest to lowest address in the x86 process address space. The layout is architecture-specific and, to a lesser degree, specific to the driver and kernel versions. Applications and libraries can of course create many more maps than these.

vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
Userspace stack. read-write-private, many pages, high in memory
Two sets of /dev/nvidiaX maps for each bound device. Sets are usually contiguous, and contain:
- an anonymous page, read-write-private
- several mappings of the device, having variable number of pages, all read-write-shared
Map of nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686), high in memory
Libraries. variable, middle of memory.
Userspace heap. read-write-private, many pages, low in memory
Application (data region). read-write-private, variable, low in memory
Application (text region). read-execute-private, variable, usually lowest mapping
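
A layout like the one above can be checked on a live process by reading /proc/self/maps. The following is a minimal sketch, not part of libcudest: the names map_entry, parse_maps_line and dump_maps are made up for illustration, and the parser assumes the usual Linux line format (start-end perms offset major:minor inode path).

```c
#include <stdio.h>

/* One parsed line of /proc/<pid>/maps. Field names are illustrative. */
struct map_entry {
    unsigned long start;   /* first address of the mapping */
    unsigned long end;     /* one past the last address */
    char perms[5];         /* e.g. "rw-s": read, write, no exec, shared */
    char path[256];        /* backing file, or "" for anonymous maps */
};

/* Parse one maps line; returns 1 on success, 0 if the line is malformed. */
int parse_maps_line(const char *line, struct map_entry *m)
{
    unsigned long offset, inode;
    unsigned int dev_major, dev_minor;
    int n;

    m->path[0] = '\0';
    /* Assumed format: start-end perms offset major:minor inode [path] */
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
               &m->start, &m->end, m->perms,
               &offset, &dev_major, &dev_minor, &inode, m->path);
    return n >= 7;   /* the path is optional (anonymous mappings) */
}

/* Print the calling process's mappings in file order (ascending address,
 * i.e. the reverse of the list above). Returns the count, or -1 on error. */
int dump_maps(FILE *out)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    struct map_entry m;
    int count = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (!parse_maps_line(line, &m))
            continue;
        fprintf(out, "%lx-%lx %s %lu KiB %s\n", m.start, m.end, m.perms,
                (m.end - m.start) / 1024, m.path);
        count++;
    }
    fclose(f);
    return count;
}
```

Calling dump_maps(stdout) from a process that has opened a CUDA context should show the /dev/nvidiaX mappings as rw-s entries; filtering on m.path is left to the caller.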

ioctls

An ioctl request code (on x86) is 32 bits wide. The following layout comes from linux/asm-generic/ioctl.h in a 2.6.34 kernel:

Bit 31: Read?
Bit 30: Write?
Bits 29-16: Parameter size
Bits 15-8: Type (module)
Bits 7-0: Number (command)
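
The bit layout above can be unpacked with plain shifts and masks. This is a sketch under the assumption that the layout is exactly as listed; the struct and function names are hypothetical, and any request code fed to it in testing (for example 0xc0484600) is an arbitrary illustrative value, not a documented NVIDIA ioctl.

```c
#include <stdint.h>

/* Fields of a 32-bit ioctl request code, per the layout listed above. */
struct ioctl_fields {
    unsigned int read;   /* bit 31 */
    unsigned int write;  /* bit 30 */
    unsigned int size;   /* bits 29-16: parameter size in bytes */
    unsigned int type;   /* bits 15-8: type (module) */
    unsigned int nr;     /* bits 7-0: number (command) */
};

/* Split a request code into its fields using shifts and masks. */
struct ioctl_fields decode_ioctl(uint32_t code)
{
    struct ioctl_fields f;

    f.read  = (code >> 31) & 0x1;
    f.write = (code >> 30) & 0x1;
    f.size  = (code >> 16) & 0x3fff;  /* 14 bits */
    f.type  = (code >> 8) & 0xff;
    f.nr    = code & 0xff;
    return f;
}
```

Applied to the request codes that strace prints for a CUDA process, a decoder of this kind gives the parameter size and command number directly.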

Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the top two bits (31 and 30) will always be '11'. Both of these deductions are borne out by observing strace output of a CUDA process.
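
The effect of declaring an ioctl with _IOWR() can be reproduced with the standard Linux macros from <sys/ioctl.h>. In the sketch below, the parameter struct, the command number 0x2a and every EXAMPLE_ name are invented for illustration; only the magic value 'F' is taken from the text, and nothing here is copied from the NVIDIA driver source.

```c
#include <sys/ioctl.h>

/* 'F' is the value the text reports for NV_IOCTL_MAGIC. */
#define EXAMPLE_IOCTL_MAGIC 'F'

/* Hypothetical 72-byte parameter block; not a real driver structure. */
struct example_params {
    unsigned int words[18];
};

/* Hypothetical command number 0x2a, declared the way the text describes. */
#define EXAMPLE_IOCTL _IOWR(EXAMPLE_IOCTL_MAGIC, 0x2a, struct example_params)

/* Returns 1 if a request code has both direction bits set and type 'F',
 * which is the pattern the text expects of every NVIDIA ioctl. */
int looks_like_nv_ioctl(unsigned long code)
{
    return _IOC_DIR(code) == (_IOC_READ | _IOC_WRITE) &&
           _IOC_TYPE(code) == EXAMPLE_IOCTL_MAGIC;
}
```

A check of this kind can be used to separate candidate NVIDIA requests from unrelated ioctls in a trace, although framebuffer ioctls share the same type code and may also match.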

Code	Param size	Param location(s)	Driver API call sites	Purpose

