Libcudest: Difference between revisions

Revision as of 23:27, 22 April 2010

Reverse engineering of CUDA ioctls in the 3.0 SDK (195.36.15 driver, GTS 360M, amd64). CUDA primarily communicates with the NVIDIA closed-source driver via undocumented ioctl()s.

Maps

Ordered from highest to lowest locations in x86 memory. These are architecture-, and to a lesser degree driver- and kernel version-specific. Applications and libraries can of course create many more maps than these.

vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
Userspace stack. read-write-private, many pages, high in memory
Two sets of /dev/nvidiaX maps for each bound device. Sets are usually contiguous, and contain:
- an anonymous page, read-write-private
- several mappings of the device, having variable number of pages, all read-write-shared
Map of nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686), high in memory
Libraries. variable, middle of memory.
Userspace heap. read-write-private, many pages, low in memory
Application (data region). read-write-private, variable, low in memory
Application (text region). read-execute-private, variable, usually lowest mapping
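
The layout above can be checked against a live process by reading /proc/<pid>/maps. What follows is a minimal sketch, not part of the original notes: the names map_entry, parse_maps_line, is_nvidia_device_map and dump_nvidia_maps are hypothetical, the 4096-byte page size is an assumption, and paths containing spaces are truncated at the first space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ASSUMED_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* One parsed line of /proc/<pid>/maps (hypothetical helper type). */
struct map_entry {
    unsigned long long start, end;
    char perms[5];  /* e.g. "rw-s": read, write, no exec, shared */
    char path[256]; /* empty string for anonymous mappings */
};

/* Parse one maps line. Returns 1 on success, 0 on malformed input. */
int parse_maps_line(const char *line, struct map_entry *e)
{
    assert(line != NULL && e != NULL);
    e->path[0] = '\0';
    /* Fields: start-end perms offset dev inode [path] */
    int n = sscanf(line, "%llx-%llx %4s %*s %*s %*s %255s",
                   &e->start, &e->end, e->perms, e->path);
    return n >= 3;
}

/* True when the mapping is backed by a /dev/nvidia* device node. */
int is_nvidia_device_map(const struct map_entry *e)
{
    return strncmp(e->path, "/dev/nvidia", 11) == 0;
}

/* Print every /dev/nvidia* mapping found in an already-open maps stream. */
void dump_nvidia_maps(FILE *maps, FILE *out)
{
    char line[512];
    struct map_entry e;

    while (fgets(line, sizeof line, maps) != NULL) {
        if (parse_maps_line(line, &e) && is_nvidia_device_map(&e)) {
            fprintf(out, "%llx-%llx %s %llu page(s) %s\n",
                    e.start, e.end, e.perms,
                    (e.end - e.start) / ASSUMED_PAGE_SIZE, e.path);
        }
    }
}
```

A caller would typically open /proc/self/maps with fopen() after cuInit has run and pass the stream to dump_nvidia_maps; the shared device mappings then show up with "rw-s" permissions.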

ioctls

An ioctl request number (on x86) is 32 bits wide. The following layout comes from include/asm-generic/ioctl.h in a 2.6.34 kernel:

Bit 31: Read?
Bit 30: Write?
Bits 29-16: Parameter size
Bits 15-8: Type (module)
Bits 7-0: Number (command)
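
The bit layout above can be applied to a request number captured from strace. This is a minimal sketch using plain shifts and masks rather than the kernel's own _IOC_* macros; the names ioc_fields, ioc_decode and ioc_print are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded fields of a 32-bit ioctl request number (hypothetical type). */
struct ioc_fields {
    unsigned read;  /* bit 31 */
    unsigned write; /* bit 30 */
    uint32_t size;  /* bits 29-16, parameter size in bytes */
    uint32_t type;  /* bits 15-8, module "magic" */
    uint32_t nr;    /* bits 7-0, command number */
};

/* Split a request number into its fields according to the layout above. */
struct ioc_fields ioc_decode(uint32_t req)
{
    struct ioc_fields f;

    f.read  = (req >> 31) & 0x1u;
    f.write = (req >> 30) & 0x1u;
    f.size  = (req >> 16) & 0x3fffu; /* 14 bits */
    f.type  = (req >> 8) & 0xffu;
    f.nr    = req & 0xffu;
    return f;
}

/* Print the decoded fields in a single line. */
void ioc_print(FILE *out, uint32_t req)
{
    assert(out != NULL);
    struct ioc_fields f = ioc_decode(req);

    fprintf(out, "0x%08x: read=%u write=%u size=0x%03x type=0x%02x nr=0x%02x\n",
            (unsigned)req, f.read, f.write,
            (unsigned)f.size, (unsigned)f.type, (unsigned)f.nr);
}
```

For example, ioc_decode(0xc04846d2) yields read=1, write=1, size=0x48, type=0x46 and nr=0xd2.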

Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the top two bits (31 and 30) will always be '11'. Both of these deductions are borne out by observing strace output of a CUDA process.
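
To recognise these requests in strace output, the full 32-bit number can be rebuilt from a command code and parameter size. This is a sketch that mirrors what _IOWR() produces under the layout above, written with explicit shifts so that it does not depend on kernel headers; nv_iowr is a hypothetical name, and only NV_IOCTL_MAGIC and its value 'F' come from the driver source.

```c
#include <assert.h>
#include <stdint.h>

#define NV_IOCTL_MAGIC 'F' /* 0x46, as used by the 195.36.15 driver */

/*
 * Build a read+write request number: direction bits 31-30 set to '11',
 * parameter size in bits 29-16, type in bits 15-8, command in bits 7-0.
 */
uint32_t nv_iowr(uint32_t nr, uint32_t size)
{
    assert(nr < (1u << 8));    /* command is an 8-bit field */
    assert(size < (1u << 14)); /* size is a 14-bit field */

    return (3u << 30)
         | (size << 16)
         | ((uint32_t)NV_IOCTL_MAGIC << 8)
         | nr;
}
```

With the first ioctl observed during cuInit (code 0xd2, parameter size 0x48), nv_iowr(0xd2, 0x48) evaluates to 0xc04846d2, which is the value to look for in the second argument of ioctl() in the trace.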

Code	Param size	Param location(s)	Driver API call sites	Notes
0xd2	0x048	stack	cuInit	Performed immediately following opening of the nvidiactl device
0xca	0x004	anonymous page	cuInit
0xc8	0x600	anonymous page	cuInit	Largest parameter by far. Possibly scaled? Shifted 3 bits left, this is 0x3000, the size of the amd64 anonymous mapping.
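
The scaling guess in the 0xc8 row is easy to check arithmetically. The sketch below only verifies the numbers; whether the driver actually scales the size field is still speculation. The helper names and the 4096-byte page size are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumption: 4 KiB pages */

/* Scale an ioctl size field by a left shift (hypothetical helper). */
uint32_t scaled_size(uint32_t param_size, unsigned shift)
{
    assert(shift < 32);
    return param_size << shift;
}

/* Whole pages covered by nbytes, or 0 when nbytes is not page-aligned. */
uint32_t whole_pages(uint32_t nbytes)
{
    return (nbytes % PAGE_BYTES == 0) ? nbytes / PAGE_BYTES : 0;
}
```

scaled_size(0x600, 3) gives 0x3000, and whole_pages(0x3000) gives 3, matching the 3 * 4096 stack mapping listed for amd64 under Maps. The other two parameter sizes (0x048 and 0x004) do not scale to a whole number of pages under the same shift.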

