Check out my first novel, midnight's simulacra!
Libcudest
Reverse engineering of CUDA ioctls in the 3.0 SDK (195.36.15 driver, GTS 360M, amd64). CUDA primarily communicates with the NVIDIA closed-source driver via undocumented ioctl()s.
Maps
Ordered from highest to lowest locations in x86 memory. These are architecture-, and to a lesser degree driver- and kernel version-specific. Applications and libraries can of course create many more maps than these.
- vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
- VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
- Userspace stack. read-write-private, many pages, high in memory
- Two sets of /dev/nvidiaX maps for each bound device. Sets are usually continguous, and contain:
- an anonymous page, read-write-private
- several mappings of the device, having variable number of pages, all read-write-shared
- Map of nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686), high in memory
- Libraries. variable, middle of memory.
- Userspace heap. read-write-private, many pages, low in memory
- Application (data region). read-write-private, variable, low in memory
- Application (text region). read-execute-private, variable, usually lowest mapping
ioctls
An ioctl (on x86) is 32 bits. The following definition comes from <texttt>linux/asm-generic/ioctl.h</texttt> in a 2.6.34 kernel:
- Bit 31: Read?
- Bit 30: Write?
- Bits 29-16: Parameter size
- Bits 15-8: Type (module)
- Bits 7-0: Number (command)
Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the first two bits will always be '11'. Both of these deductions are borne out observing strace output of a CUDA process.
Code | Param size | Param location(s) | Driver API call sites | Notes |
---|---|---|---|---|
0xd2 | 0x048 | stack | cuInit | Performed immediately following opening of the nvidiactl device |
0xca | 0x004 | anonymous page | cuInit | |
0xc8 | 0x600 | anonymous page | cuInit | Largest parameter by far. Possibly scaled? Shifted 3 bits left, this is 0x3000, the size of the amd64 anonymous mapping. |
See Also
- Kernel ioctl numbering documentation