Libcudest
__INDEX__
Reverse engineering of the [[CUDA]] system. CUDA primarily communicates with the NVIDIA closed-source driver via several dozen undocumented ioctl()s. My open source implementation, libcudest, is located at [http://github.com/dankamongmen/libcudest GitHub]. Sundry utilities for reverse engineering are also within this repository, though recent modifications to [http://kadu.net/~joi/valgrind-mmt.git/ valgrind-mmt] have rather superseded my tools.
libcudest began as a project for Hyesoon Kim's [[Grad school|CS4803DGC]] at the Georgia Institute of Technology.
==Driver versions==
Newer drivers can be used with older CUDA versions, but the converse is not true. The "CUDA macroversion" listed below is the first CUDA release designed explicitly for use with the listed drivers.
{| border="1"
! Version
! CUDA macroversion
! Notes
|-
| 195.36.15
| 3.0
|
|-
| 195.36.24
| 3.0
|
|-
| 195.36.31
| 3.0
|
|-
| 256.22
| 3.1-beta
|
|-
| 256.29
| 3.1-beta
|
|-
| 256.35
| 3.1-beta
|
|-
|}
==CUDA Environment variables==
Discovered via binary analysis and a shimmed <tt>getenv(3)</tt>. Effects determined via blackbox and binary analyses:
{| border="1"
! Variable
! Notes
! Documented?
! Effects
|-
| __RM_NO_VERSION_CHECK
|
| N
| Also checked by nvidia-smi
|-
| COMPUTE_PROFILE
|
| Y
| If set to 1, profiling will be performed. Implies CUDA_LAUNCH_BLOCKING.
|-
| COMPUTE_PROFILE_CONFIG
|
| Y
| Specifies a profiler configuration file. Only checked if COMPUTE_PROFILE is set.
|-
| COMPUTE_PROFILE_CSV
|
| Y
| If set to 1, profiling data will be written in CSV format. Only checked if COMPUTE_PROFILE is set.
|-
| COMPUTE_PROFILE_LOG
|
| Y
| Specifies the profiler output file (default: "./cuda_profile.log"). Only checked if COMPUTE_PROFILE is set.
|-
| CUDA_AMODEL_DLL
|
| N
|
|-
| CUDA_AMODEL_GPU
|
| N
|
|-
| CUDA_API_TRACE_PTR
|
| N
|
|-
| CUDA_CACHE_DISABLE
|
| Y
| If this is unset, the code cache will be used.
|-
| CUDA_CACHE_MAXSIZE
|
| Y
|
|-
| CUDA_CACHE_PATH
|
| Y
| If this is set, it overrides the code cache's default path of $HOME/.nv/ComputeCache
|-
| CUDA_DEVCODE_CACHE
|
| Y
| PTX compilation cache.
|-
| CUDA_DEVCODE_PATH
|
| Y
| Search path for fat binaries.
|-
| CUDA_EMULATION_MODE
|
|
|
|-
| CUDA_FORCE_PTX_JIT
|
|
|
|-
| CUDA_HEAP_RANGE
| Checked each time a context is created
|
|
|-
| CUDA_INJECTION64_PATH
|
|
|
|-
| CUDA_LAUNCH_BLOCKING
|
| Y (CUDA 3.0 Programmer's Guide, 3.2.6.1)
| Makes kernel launches synchronous: the host thread blocks until each kernel completes.
|-
| CUDA_MEMCHECK
| Checked each time a context is created
|
|
|-
| CUDA_MEMORY_LOG
| Checked each time a context is created
|
|
|-
| CUDA_VISIBLE_DEVICES
|
|
|
|-
|}
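The variable list above was gathered in part with a shimmed <tt>getenv(3)</tt>. Below is a minimal sketch of such a shim. It assumes a Linux/glibc system and that the shim is loaded with <tt>LD_PRELOAD</tt>. The file name <tt>getenv_shim.c</tt> is hypothetical, and this is not necessarily the shim actually used. Lookups made through <tt>secure_getenv(3)</tt> or from inside glibc itself bypass it.

```c
/* getenv_shim.c (hypothetical name): log every getenv(3) lookup.
 * Build: cc -shared -fPIC -o getenv_shim.so getenv_shim.c -ldl
 * Use:   LD_PRELOAD=./getenv_shim.so ./some_cuda_program        */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef char *(*getenv_fn)(const char *);

char *getenv(const char *name)
{
    static getenv_fn real_getenv;

    if (real_getenv == NULL) {
        /* Find the next definition of getenv, normally glibc's own. */
        real_getenv = (getenv_fn)dlsym(RTLD_NEXT, "getenv");
        if (real_getenv == NULL) {
            return NULL; /* cannot forward the call; act as if unset */
        }
    }
    char *value = real_getenv(name);
    /* Log to stderr so the traced program's stdout is left untouched. */
    fprintf(stderr, "getenv(\"%s\") = %s%s%s\n", name,
            value ? "\"" : "", value ? value : "NULL", value ? "\"" : "");
    return value;
}
```

Forwarding through <tt>RTLD_NEXT</tt> keeps the traced program's behaviour unchanged, so variables read only under certain conditions (for example, the COMPUTE_PROFILE_* family) show up only when those conditions hold.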
==Maps==
Ordered from highest to lowest locations in x86 memory. These are architecture-specific and, to a lesser degree, driver- and kernel-version-specific. Applications and libraries can of course create many more maps than these.
* vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
* VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
* Userspace stack. read-write-private, many pages, high in memory
* Anonymous map, 3 read-write-private pages, high in memory.
** Possibly associated with the nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686)
* Two sets of /dev/nvidiaX maps for each bound device. Sets are usually contiguous, and contain:
** an anonymous page, read-write-private
** several mappings of the device, each with a variable number of pages, all read-write-shared
* Libraries. variable, middle of memory.
* Userspace heap. read-write-private, many pages, low in memory
* Application (data region). read-write-private, variable, low in memory
* Application (text region). read-execute-private, variable, usually lowest mapping
===mmap()s===
{| border="1"
|-
! offset
! size
! notes
! [http://nouveau.freedesktop.org/wiki/HwIntroduction Nouveau name]
! block range
|-
| reg_addr + 0x0000
| 0x2000
| not mapped by libcuda
| PMC functional block
| 0x000000--0x001fff
|-
| reg_addr + 0x9000
| 0x1000
| [Rwxs] mapped in cuInit(). first mapping. per-device.
| PTIMER functional block
| 0x009000--0x009fff
|-
| reg_addr + 0xc0a000 / 0xc0c000
| 0x1000
| [RWxs] location is acquired from ioctl <tt>4e</tt>
| PFIFO command submission interface
| 0xc00000--0xcfffff
|-
|}
==ioctls==
An ioctl request number (on x86) is 32 bits. The following definition comes from <tt>linux/asm-generic/ioctl.h</tt> in a 2.6.34 kernel:
* Bit 31: Read?
* Bit 30: Write?
* Bits 29-16: Parameter size
* Bits 15-8: Type (module)
* Bits 7-0: Number (command)
Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the two most significant bits will always be '11'. Both of these deductions are borne out by observing strace output of a CUDA process.
{| border="1" class="sortable"
! Code
! Param size
! Param location(s)
! Driver API call sites
! Notes
|-
! COLSPAN="5" style="background:#efefef;" | /dev/nvidiactl
|-
| 0xc8
NV_ESC_CARD_INFO
| 0x600 (1536)
| anonymous page
| cuInit
|
* Largest parameter by far.
** Possibly scaled? Shifted 3 bits left, this is 0x3000, the size of the amd64 anonymous mapping.
** More likely we support returning up to 32x 48-byte descriptors, and...
* Wants the first 32 bits to be 1, all others 0.
** ...this is most likely a mask indicating which card IDs we want information for!
<pre>typedef struct nv_ioctl_card_info
{
    NvU16 flags;               /* see below */
    NvU8  bus;                 /* bus number (PCI, AGP, etc) */
    NvU8  slot;                /* card slot */
    NvU16 vendor_id;           /* PCI vendor id */
    NvU16 device_id;
    NvU16 interrupt_line;
    NvU64 reg_address NV_ALIGN_BYTES(8);
    NvU64 reg_size    NV_ALIGN_BYTES(8);
    NvU64 fb_address  NV_ALIGN_BYTES(8);
    NvU64 fb_size     NV_ALIGN_BYTES(8);
} nv_ioctl_card_info_t;</pre>
* Returns (all subsequent bytes are 0):
<pre>0x00010001 0x0cb110de 0x00000026 0x00000000
0xf2000000 0x00000000 0x01000000 0x00000000
0xe0000000 0x00000000 0x10000000 0x00000000</pre>
* 0x0001: flags (NV_IOCTL_CARD_INFO_FLAG_PRESENT)
* 0x0001: bus 0x01, slot 0x00 (little-endian bytes)
* 0x0cb110de: vendor ID 0x10de, device ID 0x0cb1
** lspci -n: <tt>01:00.0 0300: 10de:0cb1 (rev a2)</tt>
** lspci -t -v: <tt> \-[0000:00]-+-03.0-[01]--+-00.0 nVidia Corporation GT215 [GeForce GTS 360M]</tt>
* 0x26: IRQ line (here, #38)
* 0x00000000f2000000: reg_address
* 0x0000000001000000: reg_size (16 MiB)
* 0x00000000e0000000: fb_address
* 0x0000000010000000: fb_size (256 MiB)
** each 64-bit value is stored little-endian, low word first. These are physical addresses and sizes of the card's PCI BARs; see <tt>/proc/iomem</tt>:
<pre>  e0000000-f30fffff : PCI Bus 0000:01
    e0000000-efffffff : 0000:01:00.0
    f0000000-f1ffffff : 0000:01:00.0
    f2000000-f2ffffff : 0000:01:00.0
      f2000000-f2ffffff : nvidia
    f3000000-f307ffff : 0000:01:00.0
    f3080000-f3083fff : 0000:01:00.1
      f3080000-f3083fff : ICH HD audio</pre>
|-
| 0xca
NV_ESC_ENV_INFO
| 0x004
| anonymous page
| cuInit
|
* Seems to ignore input value.
* Writes result value (0x00000001), i.e. pat_supported = 1.
<pre>typedef struct nv_ioctl_env_info
{
    NvU32 pat_supported;
} nv_ioctl_env_info_t;</pre>
|-
| 0xce
NV_ESC_ALLOC_OS_EVENT
| 0x14
|
|
|
|-
| 0xcf
NV_ESC_FREE_OS_EVENT
|
|
|
|
|-
| 0xd1
NV_ESC_STATUS_CODE
|
|
|
|
|-
| 0xd2
NV_ESC_CHECK_VERSION_STR
| 0x048
| stack
| cuInit
|
* Performed immediately after the nvidiactl device is opened
<pre>typedef struct nv_ioctl_rm_api_version
{
    NvU32 cmd;
    NvU32 reply;
    char versionString[NV_RM_API_VERSION_STRING_LENGTH];
} nv_ioctl_rm_api_version_t;
#define NV_RM_API_VERSION_CMD_STRICT         0
#define NV_RM_API_VERSION_CMD_RELAXED        '1'
#define NV_RM_API_VERSION_CMD_OVERRIDE       '2'
#define NV_RM_API_VERSION_REPLY_UNRECOGNIZED 0
#define NV_RM_API_VERSION_REPLY_RECOGNIZED   1</pre>
* 0x312e 3633 2e35 3931 35ull == 195.36.15
** '1' '.' '6' '3' '.' '5' '9' '1', '5'
** i.e. the ASCII string "195.36.15" in versionString. The first eight bytes, printed as a little-endian 64-bit word, appear reversed, and the ninth ('5') follows.
* All other bytes are 0.
* Writes result to first 8 bytes (0x00000001), leaves others untouched
|-
| 0x22
| 0x00c
| stack
| cuInit
|
* Inputs set to 0.
* Outputs (example):
<pre>3251635025 65 0</pre>
* First value (here 0xc1d00351) is used as the first input word to the majority of subsequent ioctls
* Second value ranges over (at least) 41--65...
* '''Not sent in 256.22/3.10...'''
|-
| 0x2a
| 0x020
| stack
| cuInit
|
* [[#GPU methods|GPU method]] invocation. The second and third words specify the method being called, the fifth and sixth the address of the parameter buffer, and the seventh and eighth its size.
Sample inputs:
<pre>0x7fffffffd310: 3251635025 3251635025 533 0
0x7fffffffd320: 4294955888 32767 132 0</pre>
* The first and second words are ''not'' always equivalent.
* Outputs are usually unchanged, but not always:
<pre>ioctl 2a, 32-byte param, fd 3 0xc1d04214 0x5c000002 0x2080012f 0x00000000
0x0010 0x950713f0 0x00007fff 0x000000a8 0x00000000
GPU method 0x5c000002:2080012f 0x00000000 0x00000000 0x00000000 0x00000000
0x0010 0x00000000 0x00000000 0x00000000 0x00000000
0x0020 0x00000000 0x00000000 0x00000000 0x00000000
0x0030 0x00000000 0x00000000 0x00000000 0x00000000
0x0040 0x00000000 0x00000000 0x00000000 0x00000000
0x0050 0x00000000 0x00000000 0x00000000 0x00000000
0x0060 0x00000000 0x00000000 0x00000000 0x00000000
0x0070 0x00000000 0x00000000 0x00000000 0x00000000
0x0080 0x00000000 0x00000000 0x00000000 0x00000000
0x0090 0x00000000 0x00000000 0x00000000 0x00000000
0x00a0 0x00000000 0x00000000
RESULT: 0 0xc1d04214 0x5c000002 0x2080012f 0x00000000
0x0010 0x950713f0 0x00007fff 0x000000a8 0x00000029
GPU method 0x5c000002:2080012f **************MODIFICATION FROM CALL
0x00000000 0x00000000 0x00000000 0x00000000
0x0010 0x00000000 0x00000000 0x00000000 0x00000000
0x0020 0x00000000 0x00000000 0x00000000 0x00000000
0x0030 0x00000000 0x00000000 0x00000000 0x00000000
0x0040 0x00000000 0x00000000 0x00000000 0x00000000
0x0050 0x00000000 0x00000000 0x00000000 0x00000000
0x0060 0x00000000 0x00000000 0x00000000 0x00000000
0x0070 0x00000000 0x00000000 0x00000000 0x00000000
0x0080 0x00000000 0x00000000 0x00000000 0x00000000
0x0090 0x00000000 0x00000000 0x00000000 0x00000000
0x00a0 0x00000000 0x00000000 </pre>
|-
| 0x2b
| 0x020
| stack
| cuInit
|
* GPU object creation(?)
|-
| 0x4d
| 0x048
| stack
| cuInit
|
* Performed after the nvidiaX device is opened
|-
| 0x2d
| 0x014
| stack
| cuInit
|
* Performed after a read of /proc/interrupts
|-
| 0x4e
| 0x030
|
| cuInit
|
* Immediately prior to first mmap()
|-
| 0x4f
| 0x020
|
| cuInit
|
* Invoked if mmap() returns MAP_FAILED, prior to failing out
|-
| 0x54
| 0x30
|
|
|
|-
| 0x57
| 0x038
|
|
|
|-
| 0x58
| 0x28
|
|
|
|-
| 0x59
| 0x10
|
|
|
|-
! colspan="5" style="background:#ffdead;" | /dev/nvidiaX
|-
| 0x32
| 0x014
| stack
| cuInit
|
* Performed several times in succession
|-
| 0x37
| 0x020
| stack
| cuInit
|
* Follows a burst of three 0x32 calls, then is interleaved with bursts of 0x2a calls
|-
|}
==GPU methods==
{| border="1" class="sortable"
! Code
! Param size
! Notes
|-
! COLSPAN="3" style="background:#efefef;" | 0x5c000002 (per-device)
|-
| 0x20800110
| 0x84
|
* Retrieves the device name:
<pre>RESULT: 0 0xc1d04277 0x5c000002 0x20800110 0x00000000
0x0010 0x73be4970 0x00007fff 0x00000084 0x00000000
GPU method 0x5c000002:20800110 0x00000000 0x6f466547 0x20656372 0x20535447
0x0010 0x4d303633 0x00000000 0x00000000 0x00000000 </pre>
* Bytes 4-19 of the result, read in memory (little-endian) order, are the NUL-terminated string "GeForce GTS 360M", matching lspci. Reading each word as a big-endian hex value instead gives the scrambled "oFeG ecr STGM063".
|-
|}
==disassembly==
These disassemblies make use of <tt>libcuda.so.195.36.15</tt> (0867d66be617faab3782fa0ba19ec9ba, 7404990 bytes). Symbols were extracted via <tt>objdump -T</tt>.
* AMD64 ABI:
** Integer arguments via RDI, RSI, RDX, RCX, R8 and R9, then stack
** FP arguments in XMM0..XMM7, then stack
** Integer return value in RAX
* [[libcuda traces]]
==See Also==
* Kernel [http://www.mjmwired.net/kernel/Documentation/ioctl-number.txt ioctl numbering] documentation
* My [[CUDA]] and [[CUBAR]] pages
* I developed [[ptracer]] to get traces for this project
** Some [[CUDA traces|traces]]
[[CATEGORY: GPGPU]]
[[CATEGORY: Projects]]
Latest revision as of 22:18, 22 August 2011