
Libcudest

From dankwiki

Revision as of 23:27, 22 April 2010

Reverse engineering of CUDA ioctls in the 3.0 SDK (195.36.15 driver, GTS 360M, amd64). CUDA primarily communicates with the NVIDIA closed-source driver via undocumented ioctl()s.

Maps

Ordered from highest to lowest locations in x86 memory. These are architecture-, and to a lesser degree driver- and kernel version-specific. Applications and libraries can of course create many more maps than these.

  • vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
  • VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
  • Userspace stack. read-write-private, many pages, high in memory
  • Two sets of /dev/nvidiaX maps for each bound device. Sets are usually contiguous, and contain:
    • an anonymous page, read-write-private
    • several mappings of the device, having variable number of pages, all read-write-shared
  • Map of the NVIDIA driver's NV_STACK_SIZE stack. read-write-private, 3 * 4096 bytes on amd64, 2 * 4096 bytes on i686, high in memory
  • Libraries. variable, middle of memory.
  • Userspace heap. read-write-private, many pages, low in memory
  • Application (data region). read-write-private, variable, low in memory
  • Application (text region). read-execute-private, variable, usually lowest mapping
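
The ordering above can be compared against a live process by reading /proc/<pid>/maps. The following is a minimal sketch, assuming the standard Linux maps line format (address range, permissions, offset, device, inode, optional path). The names `Mapping`, `parse_maps` and `pages`, and the contents of `SAMPLE`, are made up for illustration and are not taken from a real CUDA process.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the mapping
    end: int     # one past the last address
    perms: str   # e.g. "rw-p" (private) or "rw-s" (shared)
    path: str    # backing file, "[stack]"-style tag, or "" if anonymous


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<pid>/maps text; return mappings from highest to lowest."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16), fields[1], path))
    return sorted(mappings, key=lambda m: m.start, reverse=True)


def pages(m: Mapping, page_size: int = 4096) -> int:
    """Number of pages covered by a mapping, assuming 4 KiB pages."""
    return (m.end - m.start) // page_size


# Hypothetical sample text, shaped like /proc/<pid>/maps output.
SAMPLE = """\
00400000-00401000 r-xp 00000000 08:01 1234 /usr/bin/app
7f0000000000-7f0000003000 rw-p 00000000 00:00 0
7fff00000000-7fff00021000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
"""

for m in parse_maps(SAMPLE):
    print(f"{m.start:016x} {m.perms} {pages(m):4d} pages {m.path or '(anonymous)'}")
```

On a Linux machine, passing `open("/proc/self/maps").read()` instead of `SAMPLE` lists the current process's own maps in the same highest-to-lowest order.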

ioctls

An ioctl request code (on x86) is 32 bits wide. The following layout comes from linux/asm-generic/ioctl.h in a 2.6.34 kernel:

  • Bit 31: Read flag (_IOC_READ)
  • Bit 30: Write flag (_IOC_WRITE)
  • Bits 29-16: Parameter size
  • Bits 15-8: Type (module)
  • Bits 7-0: Number (command)

Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the top two bits (31 and 30) will always be '11'. Both of these deductions are borne out by observing the strace output of a CUDA process.
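
The bit layout above can be applied mechanically to any request value seen in strace output. This is a minimal sketch: the helper name `decode_ioctl` is made up for illustration, and the sample value 0xC04846D2 was not captured from a trace. It is simply what an _IOWR() request with type 'F', number 0xd2 and a 0x48-byte parameter would look like under this layout.

```python
# Field positions as laid out in asm-generic/ioctl.h (generic x86 layout).
_IOC_NRSHIFT = 0      # bits 7-0: number (command)
_IOC_TYPESHIFT = 8    # bits 15-8: type (module)
_IOC_SIZESHIFT = 16   # bits 29-16: parameter size
_IOC_DIRSHIFT = 30    # bits 31-30: direction flags
_IOC_WRITE = 1        # bit 30 once shifted into place
_IOC_READ = 2         # bit 31 once shifted into place


def decode_ioctl(request: int) -> dict:
    """Split a 32-bit ioctl request code into its four fields."""
    direction = (request >> _IOC_DIRSHIFT) & 0x3
    return {
        "read": bool(direction & _IOC_READ),
        "write": bool(direction & _IOC_WRITE),
        "size": (request >> _IOC_SIZESHIFT) & 0x3FFF,
        "type": chr((request >> _IOC_TYPESHIFT) & 0xFF),
        "nr": (request >> _IOC_NRSHIFT) & 0xFF,
    }


# Illustrative value only: built by hand from the layout, not taken from strace.
fields = decode_ioctl(0xC04846D2)
print(fields)
```

A request decoded this way should show both direction flags set and type 'F' if the deductions about _IOWR() and NV_IOCTL_MAGIC hold.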

Code  Param size  Param location(s)  Driver API call sites  Notes
0xd2  0x048       stack              cuInit                 Performed immediately following opening of the nvidiactl device
0xca  0x004       anonymous page     cuInit
0xc8  0x600       anonymous page     cuInit                 Largest parameter by far. Possibly scaled? Shifted 3 bits left, this is 0x3000, the size of the amd64 anonymous mapping.
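
For matching against strace output, the full 32-bit request values for the rows above can be reconstructed from the code and parameter size. This is a sketch under the assumptions stated earlier (every ioctl declared with _IOWR(), type 'F'); the helper name `iowr` is made up for illustration, and the printed values are computed, not observed.

```python
def iowr(type_char: str, nr: int, size: int) -> int:
    """Build a request code the way _IOWR() does: both direction bits set."""
    return (0x3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# (code, parameter size) pairs from the table.
ROWS = [(0xD2, 0x048), (0xCA, 0x004), (0xC8, 0x600)]

for nr, size in ROWS:
    print(f"code {nr:#04x}, size {size:#05x} -> request {iowr('F', nr, size):#010x}")

# The scaling observation from the notes: 0x600 shifted left by 3 bits is
# 0x3000, i.e. three 4096-byte pages.
print(hex(0x600 << 3), (0x600 << 3) == 3 * 4096)
```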

See Also