
Libcudest

From dankwiki

Revision as of 23:12, 22 April 2010

Reverse engineering of CUDA ioctls in the 3.0 SDK (195.36.15 driver, GTS 360M, amd64). CUDA primarily communicates with the NVIDIA closed-source driver via undocumented ioctl()s.

Maps

Ordered from highest to lowest address in x86 memory. The layout is architecture-specific and, to a lesser degree, specific to the driver and kernel versions. Applications and libraries can of course create many more maps than these.

  • vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
  • VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
  • Userspace stack. read-write-private, many pages, high in memory
  • Two sets of /dev/nvidiaX maps for each bound device. Sets are usually contiguous, and contain:
    • an anonymous page, read-write-private
    • several mappings of the device, each with a variable number of pages, all read-write-shared
  • Map of nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686), high in memory
  • Libraries. variable, middle of memory.
  • Userspace heap. read-write-private, many pages, low in memory
  • Application (data region). read-write-private, variable, low in memory
  • Application (text region). read-execute-private, variable, usually lowest mapping
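The layout above can be checked against a live CUDA process by reading /proc/<pid>/maps. What follows is only a minimal sketch, assuming the standard Linux maps line format (address range, permissions, offset, device, inode, optional path) and 4 KiB pages. The names Mapping, parse_maps and nvidia_maps are hypothetical helpers made up for illustration; they are not part of libcudest or any NVIDIA tool.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str  # e.g. "rw-s": read, write, no execute, shared
    path: str   # "" for anonymous mappings

    @property
    def pages(self) -> int:
        # Assumes 4 KiB pages, as on x86 without huge pages.
        return (self.end - self.start) // 4096


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<pid>/maps text, returning maps from highest to lowest address."""
    maps = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        maps.append(Mapping(start, end, fields[1], path))
    return sorted(maps, key=lambda m: m.start, reverse=True)


def nvidia_maps(maps: List[Mapping]) -> List[Mapping]:
    """Keep only the mappings backed by a /dev/nvidia* device node."""
    return [m for m in maps if m.path.startswith("/dev/nvidia")]
```

For example, nvidia_maps(parse_maps(open("/proc/self/maps").read())) run inside a process that has bound a device should list the read-write-shared device mappings described above.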

ioctls

An ioctl request code (on x86) is 32 bits wide. The following layout comes from linux/asm-generic/ioctl.h in a 2.6.34 kernel:

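  • Bit 31: Read flag (userspace reads the parameter back from the kernel)
  • Bit 30: Write flag (userspace writes the parameter to the kernel)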
  • Bits 29-16: Parameter size
  • Bits 15-8: Type (module)
  • Bits 7-0: Number (command)
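The bit layout above translates directly into shifts and masks. A minimal sketch of a decoder follows; IoctlFields and decode_ioctl are hypothetical names chosen for illustration. The worked value 0x80044501 is EVIOCGVERSION from linux/input.h, defined as _IOR('E', 0x01, int), used here only because its fields are easy to verify by hand.

```python
from typing import NamedTuple


class IoctlFields(NamedTuple):
    read: bool   # bit 31
    write: bool  # bit 30
    size: int    # bits 29-16, parameter size in bytes
    type: int    # bits 15-8, type (module) code
    nr: int      # bits 7-0, command number


def decode_ioctl(req: int) -> IoctlFields:
    """Split a 32-bit ioctl request code into its asm-generic fields."""
    return IoctlFields(
        read=bool((req >> 31) & 1),
        write=bool((req >> 30) & 1),
        size=(req >> 16) & 0x3FFF,
        type=(req >> 8) & 0xFF,
        nr=req & 0xFF,
    )


# EVIOCGVERSION: read-only, 4-byte parameter, type 'E' (0x45), command 1.
fields = decode_ioctl(0x80044501)
```

Feeding the request codes printed by strace through such a decoder is a quick way to tabulate parameter sizes and command numbers.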

Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the two most significant bits (31 and 30) will always be '11'. Both of these deductions are borne out by observing strace output of a CUDA process.
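These two deductions give a cheap filter for picking driver ioctls out of a trace. The sketch below mirrors the _IOWR() macro in Python and tests a request code for both properties. The command number 0x2A and parameter size 0x20 are made-up values for illustration, not real NVIDIA ioctls, and looks_like_nvidia_ioctl is a hypothetical helper name.

```python
IOC_WRITE = 1  # _IOC_WRITE in asm-generic/ioctl.h
IOC_READ = 2   # _IOC_READ in asm-generic/ioctl.h
NV_IOCTL_MAGIC = ord("F")  # 0x46, per the driver's OS interface source


def ioc(direction: int, type_: int, nr: int, size: int) -> int:
    """Python equivalent of the kernel's _IOC() macro."""
    return (direction << 30) | (size << 16) | (type_ << 8) | nr


def iowr(type_: int, nr: int, size: int) -> int:
    """Python equivalent of _IOWR(), taking the size in bytes directly."""
    return ioc(IOC_READ | IOC_WRITE, type_, nr, size)


def looks_like_nvidia_ioctl(req: int) -> bool:
    """True if both direction bits are set and the type code is 'F'."""
    both_dirs = ((req >> 30) & 0b11) == 0b11
    return both_dirs and ((req >> 8) & 0xFF) == NV_IOCTL_MAGIC


# Hypothetical command 0x2A with a 0x20-byte parameter (made-up values).
example = iowr(NV_IOCTL_MAGIC, 0x2A, 0x20)
```

The direction check also helps with the type-code overlap: a framebuffer request such as FBIOGET_VSCREENINFO is defined in linux/fb.h as the literal 0x4600, with neither direction bit set, so it fails the test even though its type byte is also 0x46. This is only a heuristic; any other _IOWR() ioctl declared with type 'F' would pass it as well.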

Code | Param size | Param location(s) | Driver API call sites | Purpose

See Also