Check out my first novel, midnight's simulacra!

Libcudest

From dankwiki
Revision as of 23:40, 22 April 2010 by Dank (talk | contribs) (→‎ioctls)

Reverse engineering of CUDA ioctls in the 3.0 SDK (195.36.15 driver, GTS 360M, amd64). CUDA primarily communicates with the NVIDIA closed-source driver via undocumented ioctl()s.

Maps

Ordered from highest to lowest locations in x86 memory. These are architecture-, and to a lesser degree driver- and kernel version-specific. Applications and libraries can of course create many more maps than these.

  • vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
  • VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
  • Userspace stack. read-write-private, many pages, high in memory
  • Anonymous map, 3 read-write-private pages, high in memory.
    • Possibly associated with nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686)
  • Two sets of /dev/nvidiaX maps for each bound device. Sets are usually continguous, and contain:
    • an anonymous page, read-write-private
    • several mappings of the device, having variable number of pages, all read-write-shared
  • Libraries. variable, middle of memory.
  • Userspace heap. read-write-private, many pages, low in memory
  • Application (data region). read-write-private, variable, low in memory
  • Application (text region). read-execute-private, variable, usually lowest mapping

ioctls

An ioctl (on x86) is 32 bits. The following definition comes from <texttt>linux/asm-generic/ioctl.h</texttt> in a 2.6.34 kernel:

  • Bit 31: Read?
  • Bit 30: Write?
  • Bits 29-16: Parameter size
  • Bits 15-8: Type (module)
  • Bits 7-0: Number (command)

Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the first two bits will always be '11'. Both of these deductions are borne out observing strace output of a CUDA process.

Code Param size Param location(s) Driver API call sites Notes
0xd2 0x048 stack cuInit Performed immediately following opening of the nvidiactl device
0xca 0x004 anonymous page cuInit
0xc8 0x600 anonymous page cuInit Largest parameter by far. Possibly scaled? Shifted 3 bits left, this is 0x3000, the size of the amd64 anonymous mapping.
0x22 0x00c stack cuInit
0x2a 0x020 stack cuInit
0x4d 0x048 stack cuInit Performed following opening of nvidiaX device
0x2d 0x14

See Also