Libcudest

__INDEX__
Reverse engineering of the [[CUDA]] system. CUDA primarily communicates with the NVIDIA closed-source driver via several dozen undocumented ioctl()s. My open source implementation, libcudest, is located at [http://github.com/dankamongmen/libcudest GitHub]. Sundry utilities for reverse engineering are also within this repository, though recent modifications to [http://kadu.net/~joi/valgrind-mmt.git/ valgrind-mmt] have rather superseded my tools.
 
libcudest began as a project for Hyesoon Kim's [[Grad school|CS4803DGC]] at the Georgia Institute of Technology.
==Driver versions==
Newer drivers can be used with older CUDA versions, but the converse is not true. The "CUDA macroversion" listed below is the first CUDA release designed explicitly for use with the listed drivers.
{| border="1"
! Version
! CUDA macroversion
! Notes
|-
| 195.36.15
| 3.0
|
|-
| 195.36.24
| 3.0
|
|-
| 195.36.31
| 3.0
|
|-
| 256.22
| 3.1-beta
|
|-
| 256.29
| 3.1-beta
|
|-
| 256.35
| 3.1-beta
|
|-
|}
 
==CUDA Environment variables==
Discovered via binary analysis and a shimmed <tt>getenv(3)</tt>. Effects determined via blackbox and binary analyses:
{| border="1"
! Variable
! Notes
! Documented?
! Effects
|-
| __RM_NO_VERSION_CHECK
|
| N
| Also checked by nvidia-smi
|-
| COMPUTE_PROFILE
|
| Y
| If set to 1, profiling will be performed. Implies CUDA_LAUNCH_BLOCKING.
|-
| COMPUTE_PROFILE_CONFIG
|
| Y
| Specifies a profiler configuration file. Only checked if COMPUTE_PROFILE is set.
|-
| COMPUTE_PROFILE_CSV
|
| Y
| If set to 1, profiling data will be written in CSV format. Only checked if COMPUTE_PROFILE is set.
|-
| COMPUTE_PROFILE_LOG
|
| Y
| Specifies profiler output file (default: "./cuda_profile.log"). Only checked if COMPUTE_PROFILE is set.
|-
| CUDA_AMODEL_DLL
|
| N
|
|-
| CUDA_AMODEL_GPU
|
| N
|
|-
| CUDA_API_TRACE_PTR
|
| N
|
|-
| CUDA_CACHE_DISABLE
|
| Y
| If this is unset, the code cache will be used.
|-
| CUDA_CACHE_MAXSIZE
|
| Y
|
|-
| CUDA_CACHE_PATH
|
| Y
| If this is set, it overrides the code cache's default path of $HOME/.nv/ComputeCache
|-
| CUDA_DEVCODE_CACHE
|
| Y
| PTX compilation cache.
|-
| CUDA_DEVCODE_PATH
|
| Y
| Search path for fat binaries.
|-
| CUDA_EMULATION_MODE
|
|
|
|-
| CUDA_FORCE_PTX_JIT
|
|
|
|-
| CUDA_HEAP_RANGE
| Checked each time a context is created
|
|
|-
| CUDA_INJECTION64_PATH
|
|
|
|-
| CUDA_LAUNCH_BLOCKING
|
| Y (CUDA 3.0 Programmer's Guide, 3.2.6.1)
| Forces synchronization of host threads on GPU kernels.
|-
| CUDA_MEMCHECK
| Checked each time a context is created
|
|
|-
| CUDA_MEMORY_LOG
| Checked each time a context is created
|
|
|-
| CUDA_VISIBLE_DEVICES
|
|
|
|-
|}
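
A shimmed <tt>getenv(3)</tt> of the kind mentioned at the top of this section can be sketched as a small interposer. This is not the shim that was actually used to build the table; the log format is an assumption, as are the file names in the usage note. It relies on <tt>dlsym(3)</tt> with <tt>RTLD_NEXT</tt>, a glibc extension enabled by <tt>_GNU_SOURCE</tt>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Interposed getenv(): logs every lookup to stderr, then returns whatever
 * the real libc getenv() returns. Not thread-safe; a sketch only. */
char *getenv(const char *name)
{
    static char *(*real_getenv)(const char *);

    if (real_getenv == NULL) {
        /* RTLD_NEXT skips this definition and resolves the next one (libc's). */
        real_getenv = (char *(*)(const char *))dlsym(RTLD_NEXT, "getenv");
        assert(real_getenv != NULL);
    }
    char *value = real_getenv(name);
    fprintf(stderr, "getenv(\"%s\") = %s\n", name, value ? value : "(unset)");
    return value;
}
```

Built as a shared object (for example <tt>gcc -shared -fPIC -o getenv_shim.so getenv_shim.c -ldl</tt>, with hypothetical file names) and loaded through <tt>LD_PRELOAD</tt>, it prints each variable name a CUDA program asks for, including ones that are unset.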


==Maps==
* Application (data region). read-write-private, variable, low in memory
* Application (text region). read-execute-private, variable, usually lowest mapping
===mmap()s===
{| border="1"
|-
! offset
! size
! notes
! [http://nouveau.freedesktop.org/wiki/HwIntroduction Nouveau name]
! block range
|-
| reg_addr + 0x0000
| 0x2000
| not mapped by libcuda
| PMC functional block
| 0x000000--0x001fff
|-
| reg_addr + 0x9000
| 0x1000
| [Rwxs] mapped in cuInit(). first mapping. per-device.
| PTIMER functional block
| 0x009000--0x009fff
|-
| reg_addr + 0xc0a000 / 0xc0c000
| 0x1000
| [RWxs] location is acquired from ioctl <tt>4e</tt>
| PFIFO command submission interface
| 0xc00000--0xcfffff
|-
|}
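
The offsets in the table above are relative to the register window. A minimal sketch of the arithmetic follows, assuming that <tt>reg_addr</tt> is the <tt>reg_address</tt> value reported by NV_ESC_CARD_INFO (0xf2000000, with a <tt>reg_size</tt> of 0x01000000, in the GTS 360M sample on this page). The struct and function names are hypothetical, and the PFIFO offset is only one of the two observed values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One row of the mmap table: a functional block inside the register window. */
struct reg_block {
    const char *name;
    uint64_t offset;  /* relative to reg_addr */
    uint64_t size;
};

static const struct reg_block reg_blocks[] = {
    { "PMC",    0x000000, 0x2000 },
    { "PTIMER", 0x009000, 0x1000 },
    { "PFIFO",  0xc0a000, 0x1000 },  /* 0xc0c000 was also observed */
};

/* Compute the inclusive physical range of a block. Returns 0 on success,
 * -1 if the block would not fit inside the register window. */
int block_range(uint64_t reg_addr, uint64_t reg_size,
                const struct reg_block *b, uint64_t *first, uint64_t *last)
{
    assert(b != NULL && first != NULL && last != NULL);
    if (b->size == 0 || b->offset > reg_size || b->size > reg_size - b->offset)
        return -1;
    *first = reg_addr + b->offset;
    *last = *first + b->size - 1;
    return 0;
}

/* Print every known block for one device. */
void print_blocks(uint64_t reg_addr, uint64_t reg_size)
{
    for (size_t i = 0; i < sizeof reg_blocks / sizeof reg_blocks[0]; i++) {
        uint64_t first, last;
        if (block_range(reg_addr, reg_size, &reg_blocks[i], &first, &last) == 0)
            printf("%-6s 0x%llx--0x%llx\n", reg_blocks[i].name,
                   (unsigned long long)first, (unsigned long long)last);
    }
}
```

With the sample values, PTIMER lands at 0xf2009000--0xf2009fff, inside the <tt>f2000000-f2ffffff : nvidia</tt> region that <tt>/proc/iomem</tt> reports.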


==ioctls==
An ioctl (on x86) is 32 bits. The following definition comes from <tt>linux/asm-generic/ioctl.h</tt> in a 2.6.34 kernel:
* Bit 31: Read?
* Bit 30: Write?
! Notes
|-
! COLSPAN="5" style="background:#efefef;" | /dev/nvidiactl
|-
| 0xc8
NV_ESC_CARD_INFO
| 0x600 (1536)
| anonymous page
| cuInit
|
* Largest parameter by far.
** Possibly scaled? Shifted 3 bits left, this is 0x3000, the size of the amd64 anonymous mapping.
** More likely we support returning up to 32x 48-byte descriptors (32 * 48 = 1536 = 0x600), and...
* Wants the first 32 bits to be 1, all others 0.
** ...this is most likely a mask indicating which card IDs we want information for!
<pre>typedef struct nv_ioctl_card_info
{
    NvU16    flags;              /* see below                  */
    NvU8    bus;                /* bus number (PCI, AGP, etc)  */
    NvU8    slot;                /* card slot                  */
    NvU16    vendor_id;          /* PCI vendor id              */
    NvU16    device_id;
    NvU16    interrupt_line;
    NvU64    reg_address    NV_ALIGN_BYTES(8);
    NvU64    reg_size      NV_ALIGN_BYTES(8);
    NvU64    fb_address    NV_ALIGN_BYTES(8);
    NvU64    fb_size        NV_ALIGN_BYTES(8);
} nv_ioctl_card_info_t;</pre>
* Returns (all subsequent bytes are 0):
<pre>0x00010001 0x0cb110de 0x00000026 0x00000000
0xf2000000 0x00000000 0x01000000 0x00000000
0xe0000000 0x00000000 0x10000000 0x00000000</pre>
* 0x0001: flags (NV_IOCTL_CARD_INFO_FLAG_PRESENT)
* 0x0001: bus (0x01) and slot (0x00), one byte each; this matches the <tt>01:00.0</tt> address from lspci
* 0x0cb110de: vendor + device IDs
** lspci -n: <tt>01:00.0 0300: 10de:0cb1 (rev a2)</tt>
** lspci -t -v: <tt> \-[0000:00]-+-03.0-[01]--+-00.0  nVidia Corporation GT215 [GeForce GTS 360M]</tt>
* 0x26: IRQ line (here, #38)
* 0xf2000000 00000000: reg_address
* 0x01000000 00000000: reg_size
* 0xe0000000 00000000: fb_address
* 0x10000000 00000000: fb_size
** the two addresses are physical addresses; compare with <tt>/proc/iomem</tt>:
<pre>  e0000000-f30fffff : PCI Bus 0000:01
    e0000000-efffffff : 0000:01:00.0
    f0000000-f1ffffff : 0000:01:00.0
    f2000000-f2ffffff : 0000:01:00.0
      f2000000-f2ffffff : nvidia
    f3000000-f307ffff : 0000:01:00.0
    f3080000-f3083fff : 0000:01:00.1
      f3080000-f3083fff : ICH HD audio</pre>
|-
| 0xca
NV_ESC_ENV_INFO
| 0x004
| anonymous page
| cuInit
|
* Seems to ignore input value.
* Writes result value (0x00000001).
<pre>typedef struct nv_ioctl_env_info
{
    NvU32 pat_supported;
} nv_ioctl_env_info_t;</pre>
|-
| 0xce
NV_ESC_ALLOC_OS_EVENT
| 0x14
|
|
|
|-
| 0xcf
NV_ESC_FREE_OS_EVENT
|
|
|
|
|-
| 0xd1
NV_ESC_STATUS_CODE
|
|
|
|
|-
| 0xd2
NV_ESC_CHECK_VERSION_STR
| 0x048
| stack
| cuInit
|
* Performed immediately following opening of the nvidiactl device
<pre>typedef struct nv_ioctl_rm_api_version
{
    NvU32 cmd;
    NvU32 reply;
    char versionString[NV_RM_API_VERSION_STRING_LENGTH];
} nv_ioctl_rm_api_version_t;
 
#define NV_RM_API_VERSION_CMD_STRICT        0
#define NV_RM_API_VERSION_CMD_RELAXED      '1'
#define NV_RM_API_VERSION_CMD_OVERRIDE      '2'
 
#define NV_RM_API_VERSION_REPLY_UNRECOGNIZED 0
#define NV_RM_API_VERSION_REPLY_RECOGNIZED  1</pre>
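* 0x312e36332e353931 followed by 0x35 == "195.36.15"
** Bytes in memory order: '1' '9' '5' '.' '3' '6' '.' '1' '5'
** The version is plain ASCII in <tt>versionString</tt>. The first eight characters only look reversed because they were dumped as a single little-endian 64-bit word; the remaining characters follow in the next word.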
* All other bytes are 0.
* Writes result to first 8 bytes (0x00000001), leaves others untouched
|-
| 0x22
| cuInit
|
* Inputs set to 0.
* Outputs (example):
<pre>3251635025 65 0</pre>
* First value is used as first input word to the majority of subsequent ioctls
* Second value ranges over (at least) 41--65...
* '''Not sent in 256.22/3.10...'''
|-
| 0x2a
| stack
| cuInit
|
* [[#GPU methods|GPU method]] invocation. Second and third words specify the method being called. Fifth and sixth specify the address being passed; seventh and eighth the size thereof.
Sample inputs:
<pre>0x7fffffffd310: 3251635025 3251635025 533 0
0x7fffffffd320: 4294955888 32767 132 0</pre>
* First and second words are ''not'' always equal.
* Outputs are usually unchanged, but not always:
<pre>ioctl 2a, 32-byte param, fd 3 0xc1d04214 0x5c000002 0x2080012f 0x00000000
0x0010 0x950713f0 0x00007fff 0x000000a8 0x00000000
GPU method 0x5c000002:2080012f 0x00000000 0x00000000 0x00000000 0x00000000
0x0010 0x00000000 0x00000000 0x00000000 0x00000000
0x0020 0x00000000 0x00000000 0x00000000 0x00000000
0x0030 0x00000000 0x00000000 0x00000000 0x00000000
0x0040 0x00000000 0x00000000 0x00000000 0x00000000
0x0050 0x00000000 0x00000000 0x00000000 0x00000000
0x0060 0x00000000 0x00000000 0x00000000 0x00000000
0x0070 0x00000000 0x00000000 0x00000000 0x00000000
0x0080 0x00000000 0x00000000 0x00000000 0x00000000
0x0090 0x00000000 0x00000000 0x00000000 0x00000000
0x00a0 0x00000000 0x00000000
RESULT: 0 0xc1d04214 0x5c000002 0x2080012f 0x00000000
0x0010 0x950713f0 0x00007fff 0x000000a8 0x00000029
GPU method 0x5c000002:2080012f **************MODIFICATION FROM CALL
0x00000000 0x00000000 0x00000000 0x00000000
0x0010 0x00000000 0x00000000 0x00000000 0x00000000
0x0020 0x00000000 0x00000000 0x00000000 0x00000000
0x0030 0x00000000 0x00000000 0x00000000 0x00000000
0x0040 0x00000000 0x00000000 0x00000000 0x00000000
0x0050 0x00000000 0x00000000 0x00000000 0x00000000
0x0060 0x00000000 0x00000000 0x00000000 0x00000000
0x0070 0x00000000 0x00000000 0x00000000 0x00000000
0x0080 0x00000000 0x00000000 0x00000000 0x00000000
0x0090 0x00000000 0x00000000 0x00000000 0x00000000
0x00a0 0x00000000 0x00000000 </pre>
|-
| 0x2b
| 0x020
| stack
| cuInit
|
* GPU object creation(?)
|-
| 0x4d
| stack
| cuInit
|
* Performed following opening of nvidiaX device
|-
| 0x2d
| stack
| cuInit
|
* Performed following read of /proc/interrupts
|-
| 0x4e
| 0x030
|
| cuInit
|
* Immediately prior to first mmap()
|-
| 0x4f
| 0x020
|
| cuInit
|
* Invoked if mmap() returns MAP_FAILED, prior to failing out
|-
| 0x54
| 0x30
|
|
|
|-
| 0x57
| 0x038
|
|
|
|-
| 0x58
| 0x28
|
|
|
|-
| 0x59
| 0x10
|
|
|
|-
! colspan="5" style="background:#ffdead;" | /dev/nvidiaX
|-
| 0x32
| stack
| cuInit
|
* Performed several times in succession
|-
| 0x37
| 0x020
| stack
| cuInit
|
* Follows burst of 3x 0x32's, then interwoven with bursts of 2a's
|-
|}
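
The request codes that appear in the disassembly on this page (for example <tt>0xc04846d2</tt> and <tt>0xc020462b</tt>) can be split into their fields with the <tt>asm-generic/ioctl.h</tt> layout: bit 31 read, bit 30 write, bits 29-16 parameter size, bits 15-8 type, bits 7-0 number. The following is a minimal sketch; the struct and function names are hypothetical and are not taken from libcudest.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 32-bit ioctl request code (asm-generic layout, x86). */
struct ioctl_fields {
    unsigned read;   /* bit 31 */
    unsigned write;  /* bit 30 */
    unsigned size;   /* bits 29-16: parameter size in bytes */
    unsigned type;   /* bits 15-8: driver "magic" byte */
    unsigned nr;     /* bits 7-0: command number */
};

/* Split a request code into its fields. */
struct ioctl_fields ioctl_decode(uint32_t req)
{
    struct ioctl_fields f;

    f.read  = (req >> 31) & 0x1;
    f.write = (req >> 30) & 0x1;
    f.size  = (req >> 16) & 0x3fff;
    f.type  = (req >> 8) & 0xff;
    f.nr    = req & 0xff;
    return f;
}

/* Build a request code from its fields (the inverse of ioctl_decode). */
uint32_t ioctl_encode(unsigned read, unsigned write,
                      unsigned type, unsigned nr, unsigned size)
{
    assert(size < (1u << 14) && type < 256 && nr < 256);
    return ((uint32_t)(read != 0) << 31) | ((uint32_t)(write != 0) << 30) |
           ((uint32_t)size << 16) | ((uint32_t)type << 8) | (uint32_t)nr;
}
```

Decoding <tt>0xc04846d2</tt> this way gives read and write both set, size 0x48, type 0x46 (ASCII 'F') and number 0xd2, which agrees with the 0xd2 row of the table.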
==GPU methods==
{| border="1" class="sortable"
! Code
! Param size
! Notes
|-
! COLSPAN="3" style="background:#efefef;" | 0x5c000002 (per-device)
|-
| 0x20800110
| 0x84
|
* Retrieves device name:
<pre>RESULT: 0 0xc1d04277 0x5c000002 0x20800110 0x00000000
0x0010 0x73be4970 0x00007fff 0x00000084 0x00000000
GPU method 0x5c000002:20800110 0x00000000 0x6f466547 0x20656372 0x20535447
0x0010 0x4d303633 0x00000000 0x00000000 0x00000000 </pre>
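* 6f46654720656372205354474d303633 == "oFeG ecr STGM063". Each 32-bit word holds its four characters in little-endian order, so the string in memory is "GeForce GTS 360M".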
|-
|}
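
The device name returned by method <tt>0x20800110</tt> and the version string passed to NV_ESC_CHECK_VERSION_STR both look scrambled in word-oriented dumps because x86 stores the least significant byte of each word first. A minimal sketch of undoing that for 32-bit words follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unpack n 32-bit words, as printed in a hex dump, into the bytes they
 * occupy in little-endian memory. out must have room for 4 * n + 1 bytes;
 * the result is always NUL-terminated. */
void words_to_ascii(const uint32_t *words, size_t n, char *out)
{
    assert(words != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t b = 0; b < 4; b++) {
            /* Byte 0 of the word is its least significant 8 bits. */
            out[4 * i + b] = (char)((words[i] >> (8 * b)) & 0xff);
        }
    }
    out[4 * n] = '\0';
}
```

Feeding it the four words from the dump above (<tt>0x6f466547 0x20656372 0x20535447 0x4d303633</tt>) yields "GeForce GTS 360M".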
==raw data==
<pre>edi == ebp
esi == 0xc04846d2
rdx == r12
call(edi,esi,rdx)
eax == 0
ebp == file descriptor
rsp(0x4c7) = 0
rsp(0x488) = rax
rsp(0x484) = 0
rsp(0x480) = 0
r12 = rsp + 0x480 (0x7ffff78b3c41)
rbx(0x30) = 0
rbx(0x28) = 0
rbx(0x20) = 0
rbx(0x18) = 0
rbx(0x10) = 0
rbx(0x8) = 0x35
rbx(0x38) = 0
cuInit:
  0x7ffff78b3031: mov    0x8(%rsp),%ecx
  0x7ffff78b3035: mov    $0x14,%r8d
  0x7ffff78b303b: mov    $0xa02,%edx
  0x7ffff78b3040: mov    %ebp,%esi
  0x7ffff78b3042: mov    %ebp,%edi
  0x7ffff78b3044: callq  0x7ffff78b1a60
  0x7ffff78b3049: test  %eax,%eax
  0x7ffff78b304b: jne    0x7ffff78b2b84
  0x7ffff78b3051: mov    0x1c(%rsp),%eax
  0x7ffff78b3055: cmp    0x6c(%rsp),%eax
  0x7ffff78b3059: jne    0x7ffff78b2b84
  0x7ffff78b305f: nop
  0x7ffff78b3060: jmpq  0x7ffff78b2c70
  0x7ffff78b3065: mov    0x704944(%rip),%r9        # 0x7ffff7fb79b0
  0x7ffff78b306c: mov    (%r9),%rdi
  0x7ffff78b306f: mov    0x10(%rdi),%rdx
  0x7ffff78b3073: test  %rdx,%rdx
  0x7ffff78b3076: je    0x7ffff78b3094
  0x7ffff78b3078: cmp    %r8d,(%rdx)
  0x7ffff78b307b: jne    0x7ffff78b308b
  0x7ffff78b307d: jmpq  0x7ffff78b2f82
  0x7ffff78b3082: cmp    (%rdx),%r8d
  0x7ffff78b3085: je    0x7ffff78b2f82
  0x7ffff78b308b: mov    0x10(%rdx),%rdx
  0x7ffff78b308f: test  %rdx,%rdx
  0x7ffff78b3092: jne    0x7ffff78b3082
  0x7ffff78b3094: mov    $0x1d,%r12d
  0x7ffff78b309a: movl  $0x0,0x708768(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b30a4: jmpq  0x7ffff78b29e5
  0x7ffff78b30a9: mov    0x58(%rsp),%edi
  0x7ffff78b30ad: test  %edi,%edi
  0x7ffff78b30af: je    0x7ffff78b29e5
  0x7ffff78b30b5: mov    %rbx,%rdi
  0x7ffff78b30b8: callq  0x7ffff78b22d0
  0x7ffff78b30bd: mov    0x58(%rsp),%r12d
  0x7ffff78b30c2: jmpq  0x7ffff78b29e5
  0x7ffff78b30c7: mov    0x70551a(%rip),%r15        # 0x7ffff7fb85e8
  0x7ffff78b30ce: mov    (%r15),%rbx
  0x7ffff78b30d1: test  %rbx,%rbx
  0x7ffff78b30d4: je    0x7ffff78b2f9e
  0x7ffff78b30da: lea    0x20(%rsp),%rdx
  0x7ffff78b30df: jmp    0x7ffff78b30ee
  0x7ffff78b30e1: mov    0x30(%rbx),%rbx
  0x7ffff78b30e5: test  %rbx,%rbx
  0x7ffff78b30e8: je    0x7ffff78b2f9e
  0x7ffff78b30ee: cmp    (%rbx),%ebp
  0x7ffff78b30f0: jne    0x7ffff78b30e1
  0x7ffff78b30f2: cmp    0x4(%rbx),%r14d
  0x7ffff78b30f6: jne    0x7ffff78b30e1
  0x7ffff78b30f8: movq  $0x0,0x20(%rsp)
  0x7ffff78b3101: movq  $0x0,0x28(%rsp)
  0x7ffff78b310a: xor    %eax,%eax
  0x7ffff78b310c: mov    %ebp,0x20(%rsp)
  0x7ffff78b3110: mov    %r14d,0x28(%rsp)
  0x7ffff78b3115: mov    $0xc020462b,%esi
  0x7ffff78b311a: mov    0x18(%rsp),%ebp
  0x7ffff78b311e: mov    0x10(%rsp),%r14
  0x7ffff78b3123: mov    0x676f57(%rip),%edi        # 0x7ffff7f2a080
  0x7ffff78b3129: movl  $0x0,0x7086d9(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b3133: movq  $0x0,0x38(%rsp)
  0x7ffff78b313c: movl  $0x83f3,0x2c(%rsp)
  0x7ffff78b3144: mov    %ebp,0x24(%rsp)
  0x7ffff78b3148: mov    %r14,0x30(%rsp)
  0x7ffff78b314d: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b3152: test  %eax,%eax
  0x7ffff78b3154: js    0x7ffff78b2f2a
  0x7ffff78b315a: mov    0x38(%rsp),%r9d
  0x7ffff78b315f: test  %r9d,%r9d
  0x7ffff78b3162: je    0x7ffff78b29e5
  0x7ffff78b3168: mov    %rbx,%rdi
  0x7ffff78b316b: callq  0x7ffff78b22d0
  0x7ffff78b3170: mov    0x38(%rsp),%r12d
  0x7ffff78b3175: jmpq  0x7ffff78b29e5
  0x7ffff78b317a: data32 xchg %ax,%ax
  0x7ffff78b317d: data32 xchg %ax,%ax
  0x7ffff78b3180: mov    %r12,-0x20(%rsp)
  0x7ffff78b3185: mov    %r13,-0x18(%rsp)
  0x7ffff78b318a: mov    %edi,%r12d
  0x7ffff78b318d: mov    %r14,-0x10(%rsp)
  0x7ffff78b3192: mov    %r15,-0x8(%rsp)
  0x7ffff78b3197: mov    %esi,%r14d
  0x7ffff78b319a: mov    %rbx,-0x30(%rsp)
  0x7ffff78b319f: mov    %rbp,-0x28(%rsp)
  0x7ffff78b31a4: sub    $0x68,%rsp
  0x7ffff78b31a8: cmp    $0x80,%edx
  0x7ffff78b31ae: mov    %edx,%r13d
  0x7ffff78b31b1: mov    %rcx,%r15
  0x7ffff78b31b4: jb    0x7ffff78b327a
  0x7ffff78b31ba: cmp    $0x87,%edx
  0x7ffff78b31c0: ja    0x7ffff78b3272
  0x7ffff78b31c6: lea    -0x80(%r13),%esi
  0x7ffff78b31ca: xor    %edx,%edx
  0x7ffff78b31cc: callq  0x7ffff78b2160
  0x7ffff78b31d1: mov    %eax,%edx
  0x7ffff78b31d3: mov    %eax,%edi
  0x7ffff78b31d5: shr    $0x1f,%edx
  0x7ffff78b31d8: cmp    $0x20,%eax
  0x7ffff78b31db: sete  %bl
  0x7ffff78b31de: or    %dl,%bl
  0x7ffff78b31e0: jne    0x7ffff78b327a
  0x7ffff78b31e6: mov    %edi,%ecx
  0x7ffff78b31e8: mov    %r14d,%esi
  0x7ffff78b31eb: mov    %r14d,%edx
  0x7ffff78b31ee: mov    %r12d,%edi
  0x7ffff78b31f1: callq  0x7ffff78b2400
  0x7ffff78b31f6: xor    %esi,%esi
  0x7ffff78b31f8: test  %eax,%eax
  0x7ffff78b31fa: mov    %eax,%ebp
  0x7ffff78b31fc: mov    $0x1,%ecx
  0x7ffff78b3201: jne    0x7ffff78b327f
  0x7ffff78b3203: mov    %esi,%eax
  0x7ffff78b3205:
    lock cmpxchg %ecx,0x7085ff(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b320d: setne  %dl
  0x7ffff78b3210: test  %dl,%dl
  0x7ffff78b3212: je    0x7ffff78b3400
  0x7ffff78b3218: mov    0x7085ee(%rip),%edi        # 0x7ffff7fbb80c
  0x7ffff78b321e: test  %edi,%edi
  0x7ffff78b3220: je    0x7ffff78b3203
  0x7ffff78b3222: mov    0x7085e3(%rip),%r8d        # 0x7ffff7fbb80c
  0x7ffff78b3229: test  %r8d,%r8d
  0x7ffff78b322c: je    0x7ffff78b3203
  0x7ffff78b322e: mov    0x7085d7(%rip),%r9d        # 0x7ffff7fbb80c
  0x7ffff78b3235: test  %r9d,%r9d
  0x7ffff78b3238: je    0x7ffff78b3203
  0x7ffff78b323a: mov    0x7085cb(%rip),%r10d        # 0x7ffff7fbb80c
  0x7ffff78b3241: test  %r10d,%r10d
  0x7ffff78b3244: je    0x7ffff78b3203
  0x7ffff78b3246: mov    0x7085bf(%rip),%r11d        # 0x7ffff7fbb80c
  0x7ffff78b324d: test  %r11d,%r11d
  0x7ffff78b3250: je    0x7ffff78b3203
  0x7ffff78b3252: mov    0x7085b4(%rip),%ebx        # 0x7ffff7fbb80c
  0x7ffff78b3258: test  %ebx,%ebx
  0x7ffff78b325a: je    0x7ffff78b3203
  0x7ffff78b325c: mov    0x7085aa(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b3262: test  %edx,%edx
  0x7ffff78b3264: je    0x7ffff78b3203
  0x7ffff78b3266: mov    0x7085a0(%rip),%eax        # 0x7ffff7fbb80c
  0x7ffff78b326c: test  %eax,%eax
  0x7ffff78b326e: jne    0x7ffff78b3218
  0x7ffff78b3270: jmp    0x7ffff78b3203
  0x7ffff78b3272: cmp    $0xff,%edx
  0x7ffff78b3278: je    0x7ffff78b32a4
  0x7ffff78b327a: mov    $0x2a,%ebp
  0x7ffff78b327f: mov    %ebp,%eax
  0x7ffff78b3281: mov    0x38(%rsp),%rbx
  0x7ffff78b3286: mov    0x40(%rsp),%rbp
  0x7ffff78b328b: mov    0x48(%rsp),%r12
  0x7ffff78b3290: mov    0x50(%rsp),%r13
  0x7ffff78b3295: mov    0x58(%rsp),%r14
  0x7ffff78b329a: mov    0x60(%rsp),%r15
  0x7ffff78b329f: add    $0x68,%rsp
  0x7ffff78b32a3: retq 
  0x7ffff78b32a4: test  %rcx,%rcx
  0x7ffff78b32a7: je    0x7ffff78b327a
  0x7ffff78b32a9: mov    $0x3a,%esi
  0x7ffff78b32ae: mov    %rcx,%rdi
  0x7ffff78b32b1: callq  0x7ffff782a980 <strchr@plt>
  0x7ffff78b32b6: test  %rax,%rax
  0x7ffff78b32b9: je    0x7ffff78b327a
  0x7ffff78b32bb: cmpb  $0x2a,(%r15)
  0x7ffff78b32bf: je    0x7ffff78b327a
  0x7ffff78b32c1: lea    0x28(%rsp),%rsi
  0x7ffff78b32c6: xor    %ecx,%ecx
  0x7ffff78b32c8: xor    %edx,%edx
  0x7ffff78b32ca: mov    %r15,%rdi
  0x7ffff78b32cd: callq  0x7ffff782a930 <__strtol_internal@plt>
  0x7ffff78b32d2: mov    0x28(%rsp),%rdi
  0x7ffff78b32d7: xor    %edx,%edx
  0x7ffff78b32d9: xor    %ecx,%ecx
  0x7ffff78b32db: xor    %esi,%esi
  0x7ffff78b32dd: mov    %eax,%ebx
  0x7ffff78b32df: inc    %rdi
  0x7ffff78b32e2: callq  0x7ffff782a930 <__strtol_internal@plt>
  0x7ffff78b32e7: mov    0x705012(%rip),%rdx        # 0x7ffff7fb8300
  0x7ffff78b32ee: mov    %eax,%r8d
  0x7ffff78b32f1: xor    %edi,%edi
  0x7ffff78b32f3: add    $0x30,%rdx
  0x7ffff78b32f7: jmpq  0x7ffff78b33d4
  0x7ffff78b32fc: lea    0x30(%rdx),%rcx
  0x7ffff78b3300: lea    0x1(%rdi),%esi
  0x7ffff78b3303: testb  $0x1,-0x30(%rcx)
  0x7ffff78b3307: mov    %esi,%edi
  0x7ffff78b3309: je    0x7ffff78b3317
  0x7ffff78b330b: movzbl -0x2e(%rcx),%ebp
  0x7ffff78b330f: cmp    %ebp,%ebx
  0x7ffff78b3311: je    0x7ffff78b34a2
  0x7ffff78b3317: lea    0x30(%rcx),%rdx
  0x7ffff78b331b: lea    0x1(%rsi),%edi
  0x7ffff78b331e: testb  $0x1,-0x30(%rdx)
  0x7ffff78b3322: je    0x7ffff78b3332
  0x7ffff78b3324: movzbl -0x2e(%rdx),%r10d
  0x7ffff78b3329: cmp    %r10d,%ebx
  0x7ffff78b332c: je    0x7ffff78b34b5
  0x7ffff78b3332: lea    0x60(%rcx),%rdx
  0x7ffff78b3336: lea    0x2(%rsi),%edi
  0x7ffff78b3339: testb  $0x1,-0x30(%rdx)
  0x7ffff78b333d: je    0x7ffff78b334b
  0x7ffff78b333f: movzbl -0x2e(%rdx),%eax
  0x7ffff78b3343: cmp    %eax,%ebx
  0x7ffff78b3345: je    0x7ffff78b34e5
  0x7ffff78b334b: lea    0x90(%rcx),%rdx
  0x7ffff78b3352: lea    0x3(%rsi),%edi
  0x7ffff78b3355: testb  $0x1,-0x30(%rdx)
  0x7ffff78b3359: je    0x7ffff78b3369
  0x7ffff78b335b: movzbl -0x2e(%rdx),%r9d
  0x7ffff78b3360: cmp    %r9d,%ebx
  0x7ffff78b3363: je    0x7ffff78b34f7
  0x7ffff78b3369: lea    0xc0(%rcx),%rdx
  0x7ffff78b3370: lea    0x4(%rsi),%edi
  0x7ffff78b3373: testb  $0x1,-0x30(%rdx)
  0x7ffff78b3377: je    0x7ffff78b3387
  0x7ffff78b3379: movzbl -0x2e(%rdx),%r11d
  0x7ffff78b337e: cmp    %r11d,%ebx
  0x7ffff78b3381: je    0x7ffff78b3510
  0x7ffff78b3387: lea    0xf0(%rcx),%rdx
  0x7ffff78b338e: lea    0x5(%rsi),%edi
  0x7ffff78b3391: testb  $0x1,-0x30(%rdx)
  0x7ffff78b3395: je    0x7ffff78b33a3
  0x7ffff78b3397: movzbl -0x2e(%rdx),%ebp
  0x7ffff78b339b: cmp    %ebp,%ebx
  0x7ffff78b339d: je    0x7ffff78b3525
  0x7ffff78b33a3: lea    0x120(%rcx),%rdx
  0x7ffff78b33aa: lea    0x6(%rsi),%edi
  0x7ffff78b33ad: testb  $0x1,-0x30(%rdx)
  0x7ffff78b33b1: je    0x7ffff78b33c1
  0x7ffff78b33b3: movzbl -0x2e(%rdx),%r10d
  0x7ffff78b33b8: cmp    %r10d,%ebx
  0x7ffff78b33bb: je    0x7ffff78b34d0
  0x7ffff78b33c1: lea    0x7(%rsi),%edi
  0x7ffff78b33c4: lea    0x150(%rcx),%rdx
  0x7ffff78b33cb: cmp    $0x20,%edi
  0x7ffff78b33ce: je    0x7ffff78b327a
  0x7ffff78b33d4: testb  $0x1,-0x30(%rdx)
  0x7ffff78b33d8: je    0x7ffff78b32fc
  0x7ffff78b33de: movzbl -0x2e(%rdx),%eax
  0x7ffff78b33e2: cmp    %eax,%ebx
  0x7ffff78b33e4: jne    0x7ffff78b32fc
  0x7ffff78b33ea: movzbl -0x2d(%rdx),%ecx
  0x7ffff78b33ee: cmp    %ecx,%r8d
  0x7ffff78b33f1: jne    0x7ffff78b32fc
  0x7ffff78b33f7: jmpq  0x7ffff78b31e6
  0x7ffff78b33fc: data32 data32 xchg %ax,%ax
  0x7ffff78b3400: mov    0x7051e1(%rip),%rsi        # 0x7ffff7fb85e8
  0x7ffff78b3407: mov    (%rsi),%rbx
  0x7ffff78b340a: test  %rbx,%rbx
  0x7ffff78b340d: jne    0x7ffff78b341b
  0x7ffff78b340f: nop
  0x7ffff78b3410: jmp    0x7ffff78b348e
  0x7ffff78b3412: mov    0x30(%rbx),%rbx
  0x7ffff78b3416: test  %rbx,%rbx
  0x7ffff78b3419: je    0x7ffff78b348e
  0x7ffff78b341b: cmp    (%rbx),%r12d
  0x7ffff78b341e: xchg  %ax,%ax
  0x7ffff78b3420: jne    0x7ffff78b3412
  0x7ffff78b3422: cmp    0x4(%rbx),%r14d
  0x7ffff78b3426: jne    0x7ffff78b3412
  0x7ffff78b3428: mov    0x676c52(%rip),%edi        # 0x7ffff7f2a080
  0x7ffff78b342e: xor    %eax,%eax
  0x7ffff78b3430: mov    %rsp,%rdx
  0x7ffff78b3433: mov    $0xc0204623,%esi
  0x7ffff78b3438: movq  $0x0,(%rsp)
  0x7ffff78b3440: movq  $0x0,0x8(%rsp)
  0x7ffff78b3449: movl  $0x0,0x7083b9(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b3453: movq  $0x0,0x18(%rsp)
  0x7ffff78b345c: mov    %r12d,(%rsp)
  0x7ffff78b3460: mov    %r14d,0x4(%rsp)
  0x7ffff78b3465: mov    %r13d,0x8(%rsp)
  0x7ffff78b346a: mov    %r15,0x10(%rsp)
  0x7ffff78b346f: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b3474: test  %eax,%eax
  0x7ffff78b3476: jns    0x7ffff78b353b
  0x7ffff78b347c: mov    %rbx,%rdi
  0x7ffff78b347f: mov    $0x2a,%ebp
  0x7ffff78b3484: callq  0x7ffff78b22d0
  0x7ffff78b3489: jmpq  0x7ffff78b327f
  0x7ffff78b348e: mov    $0xb,%ebp
  0x7ffff78b3493: movl  $0x0,0x70836f(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b349d: jmpq  0x7ffff78b327f
  0x7ffff78b34a2: movzbl -0x2d(%rcx),%r9d
  0x7ffff78b34a7: cmp    %r9d,%r8d
  0x7ffff78b34aa: jne    0x7ffff78b3317
  0x7ffff78b34b0: jmpq  0x7ffff78b31e6
  0x7ffff78b34b5: movzbl -0x2d(%rdx),%r11d
  0x7ffff78b34ba: cmp    %r11d,%r8d
  0x7ffff78b34bd: data32 xchg %ax,%ax
  0x7ffff78b34c0: jne    0x7ffff78b3332
  0x7ffff78b34c6: jmpq  0x7ffff78b31e6
  0x7ffff78b34cb: data32 xchg %ax,%ax
  0x7ffff78b34ce: xchg  %ax,%ax
  0x7ffff78b34d0: movzbl -0x2d(%rdx),%r11d
  0x7ffff78b34d5: cmp    %r11d,%r8d
  0x7ffff78b34d8: jne    0x7ffff78b33c1
  0x7ffff78b34de: xchg  %ax,%ax
  0x7ffff78b34e0: jmpq  0x7ffff78b31e6
  0x7ffff78b34e5: movzbl -0x2d(%rdx),%ebp
  0x7ffff78b34e9: cmp    %ebp,%r8d
  0x7ffff78b34ec: jne    0x7ffff78b334b
  0x7ffff78b34f2: jmpq  0x7ffff78b31e6
  0x7ffff78b34f7: movzbl -0x2d(%rdx),%r10d
  0x7ffff78b34fc: cmp    %r10d,%r8d
  0x7ffff78b34ff: nop
  0x7ffff78b3500: jne    0x7ffff78b3369
  0x7ffff78b3506: jmpq  0x7ffff78b31e6
  0x7ffff78b350b: data32 xchg %ax,%ax
  0x7ffff78b350e: xchg  %ax,%ax
  0x7ffff78b3510: movzbl -0x2d(%rdx),%eax
  0x7ffff78b3514: cmp    %eax,%r8d
  0x7ffff78b3517: jne    0x7ffff78b3387
  0x7ffff78b351d: data32 xchg %ax,%ax
  0x7ffff78b3520: jmpq  0x7ffff78b31e6
  0x7ffff78b3525: movzbl -0x2d(%rdx),%r9d
  0x7ffff78b352a: cmp    %r9d,%r8d
  0x7ffff78b352d: data32 xchg %ax,%ax
  0x7ffff78b3530: jne    0x7ffff78b33a3
  0x7ffff78b3536: jmpq  0x7ffff78b31e6
  0x7ffff78b353b: mov    0x18(%rsp),%r12d
  0x7ffff78b3540: test  %r12d,%r12d
  0x7ffff78b3543: je    0x7ffff78b327f
  0x7ffff78b3549: mov    %rbx,%rdi
  0x7ffff78b354c: callq  0x7ffff78b22d0
  0x7ffff78b3551: mov    0x18(%rsp),%ebp
  0x7ffff78b3555: jmpq  0x7ffff78b327f
  0x7ffff78b355a: data32 xchg %ax,%ax
  0x7ffff78b355d: data32 xchg %ax,%ax
  0x7ffff78b3560: push  %rbx
  0x7ffff78b3561: mov    %rdx,%r9
  0x7ffff78b3564: xor    %r8d,%r8d
  0x7ffff78b3567: mov    %rcx,%rbx
  0x7ffff78b356a: mov    $0x22,%edx
  0x7ffff78b356f: mov    $0x1,%ecx
  0x7ffff78b3574: sub    $0x10,%rsp
  0x7ffff78b3578: test  %r9,%r9
  0x7ffff78b357b: je    0x7ffff78b3681
  0x7ffff78b3581: mov    %r8d,%eax
  0x7ffff78b3584:
    lock cmpxchg %ecx,0x708280(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b358c: setne  %dl
  0x7ffff78b358f: test  %dl,%dl
  0x7ffff78b3591: je    0x7ffff78b35ed
  0x7ffff78b3593: mov    0x708272(%rip),%r10d        # 0x7ffff7fbb80c
  0x7ffff78b359a: test  %r10d,%r10d
  0x7ffff78b359d: je    0x7ffff78b3581
  0x7ffff78b359f: mov    0x708266(%rip),%r11d        # 0x7ffff7fbb80c
  0x7ffff78b35a6: test  %r11d,%r11d
  0x7ffff78b35a9: je    0x7ffff78b3581
  0x7ffff78b35ab: mov    0x70825b(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b35b1: test  %edx,%edx
  0x7ffff78b35b3: je    0x7ffff78b3581
  0x7ffff78b35b5: mov    0x708251(%rip),%eax        # 0x7ffff7fbb80c
  0x7ffff78b35bb: test  %eax,%eax
  0x7ffff78b35bd: je    0x7ffff78b3581
  0x7ffff78b35bf: mov    0x708246(%rip),%r10d        # 0x7ffff7fbb80c
  0x7ffff78b35c6: test  %r10d,%r10d
  0x7ffff78b35c9: je    0x7ffff78b3581
  0x7ffff78b35cb: mov    0x70823a(%rip),%r11d        # 0x7ffff7fbb80c
  0x7ffff78b35d2: test  %r11d,%r11d
  0x7ffff78b35d5: je    0x7ffff78b3581
  0x7ffff78b35d7: mov    0x70822f(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b35dd: test  %edx,%edx
  0x7ffff78b35df: je    0x7ffff78b3581
  0x7ffff78b35e1: mov    0x708225(%rip),%eax        # 0x7ffff7fbb80c
  0x7ffff78b35e7: test  %eax,%eax
  0x7ffff78b35e9: jne    0x7ffff78b3593
  0x7ffff78b35eb: jmp    0x7ffff78b3581
  0x7ffff78b35ed: mov    0x704ff4(%rip),%rcx        # 0x7ffff7fb85e8
  0x7ffff78b35f4: mov    (%rcx),%rax
  0x7ffff78b35f7: test  %rax,%rax
  0x7ffff78b35fa: jne    0x7ffff78b360e
  0x7ffff78b35fc: jmpq  0x7ffff78b3689
  0x7ffff78b3601: mov    0x30(%rax),%rax
  0x7ffff78b3605: test  %rax,%rax
  0x7ffff78b3608: je    0x7ffff78b3689
  0x7ffff78b360e: cmp    (%rax),%edi
  0x7ffff78b3610: jne    0x7ffff78b3601
  0x7ffff78b3612: mov    0x10(%rax),%rcx
  0x7ffff78b3616: test  %rcx,%rcx
  0x7ffff78b3619: jne    0x7ffff78b362b
  0x7ffff78b361b: data32 xchg %ax,%ax
  0x7ffff78b361e: xchg  %ax,%ax
  0x7ffff78b3620: jmp    0x7ffff78b3601
  0x7ffff78b3622: mov    0x10(%rcx),%rcx
  0x7ffff78b3626: test  %rcx,%rcx
  0x7ffff78b3629: je    0x7ffff78b3601
  0x7ffff78b362b: cmp    (%rcx),%esi
  0x7ffff78b362d: data32 xchg %ax,%ax
  0x7ffff78b3630: jne    0x7ffff78b3622
  0x7ffff78b3632: movl  $0x0,0x4(%rsp)
  0x7ffff78b363a: movl  $0x0,0x7081c8(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b3644: mov    %rsp,%rdx
  0x7ffff78b3647: movl  $0x0,0x8(%rsp)
  0x7ffff78b364f: movl  $0x0,0xc(%rsp)
  0x7ffff78b3657: xor    %eax,%eax
  0x7ffff78b3659: mov    %r9,(%rsp)
  0x7ffff78b365d: mov    (%rcx),%edi
  0x7ffff78b365f: mov    $0xc0104652,%esi
  0x7ffff78b3664: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b3669: test  %eax,%eax
  0x7ffff78b366b: mov    $0x2a,%edx
  0x7ffff78b3670: js    0x7ffff78b3681
  0x7ffff78b3672: test  %rbx,%rbx
  0x7ffff78b3675: je    0x7ffff78b367d
  0x7ffff78b3677: mov    0x8(%rsp),%esi
  0x7ffff78b367b: mov    %esi,(%rbx)
  0x7ffff78b367d: mov    0xc(%rsp),%edx
  0x7ffff78b3681: add    $0x10,%rsp
  0x7ffff78b3685: mov    %edx,%eax
  0x7ffff78b3687: pop    %rbx
  0x7ffff78b3688: retq 
  0x7ffff78b3689: mov    0x704320(%rip),%r8        # 0x7ffff7fb79b0
  0x7ffff78b3690: mov    (%r8),%rdi
  0x7ffff78b3693: mov    0x10(%rdi),%rcx
  0x7ffff78b3697: test  %rcx,%rcx
  0x7ffff78b369a: jne    0x7ffff78b36a9
  0x7ffff78b369c: jmp    0x7ffff78b36b7
  0x7ffff78b369e: xchg  %ax,%ax
  0x7ffff78b36a0: mov    0x10(%rcx),%rcx
  0x7ffff78b36a4: test  %rcx,%rcx
  0x7ffff78b36a7: je    0x7ffff78b36b7
  0x7ffff78b36a9: cmp    (%rcx),%esi
  0x7ffff78b36ab: data32 xchg %ax,%ax
  0x7ffff78b36ae: xchg  %ax,%ax
  0x7ffff78b36b0: jne    0x7ffff78b36a0
  0x7ffff78b36b2: jmpq  0x7ffff78b3632
  0x7ffff78b36b7: movl  $0x0,0x70814b(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b36c1: add    $0x10,%rsp
  0x7ffff78b36c5: mov    $0x1d,%edx
  0x7ffff78b36ca: pop    %rbx
  0x7ffff78b36cb: mov    %edx,%eax
  0x7ffff78b36cd: retq 
  0x7ffff78b36ce: xchg  %ax,%ax
  0x7ffff78b36d0: push  %rbp
  0x7ffff78b36d1: mov    $0x22,%eax
  0x7ffff78b36d6: mov    %ecx,%ebp
  0x7ffff78b36d8: push  %rbx
  0x7ffff78b36d9: mov    %edx,%ebx
  0x7ffff78b36db: sub    $0x38,%rsp
  0x7ffff78b36df: test  %r9,%r9
  0x7ffff78b36e2: je    0x7ffff78b3820
  0x7ffff78b36e8: mov    (%r9),%ecx
  0x7ffff78b36eb: xor    %r11d,%r11d
  0x7ffff78b36ee: mov    $0x1,%r10d
  0x7ffff78b36f4: test  %ecx,%ecx
  0x7ffff78b36f6: jle    0x7ffff78b381b
  0x7ffff78b36fc: data32 data32 xchg %ax,%ax
  0x7ffff78b3700: mov    %r11d,%eax
  0x7ffff78b3703:
    lock cmpxchg %r10d,0x708100(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b370c: setne  %dl
  0x7ffff78b370f: test  %dl,%dl
  0x7ffff78b3711: je    0x7ffff78b3765
  0x7ffff78b3713: mov    0x7080f3(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b3719: test  %edx,%edx
  0x7ffff78b371b: je    0x7ffff78b3700
  0x7ffff78b371d: mov    0x7080e9(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b3723: test  %edx,%edx
  0x7ffff78b3725: je    0x7ffff78b3700
  0x7ffff78b3727: mov    0x7080df(%rip),%eax        # 0x7ffff7fbb80c
  0x7ffff78b372d: test  %eax,%eax
  0x7ffff78b372f: je    0x7ffff78b3700
  0x7ffff78b3731: mov    0x7080d5(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b3737: test  %edx,%edx
  0x7ffff78b3739: je    0x7ffff78b3700
  0x7ffff78b373b: mov    0x7080cb(%rip),%eax        # 0x7ffff7fbb80c
  0x7ffff78b3741: test  %eax,%eax
  0x7ffff78b3743: je    0x7ffff78b3700
  0x7ffff78b3745: mov    0x7080c1(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b374b: test  %edx,%edx
  0x7ffff78b374d: je    0x7ffff78b3700
  0x7ffff78b374f: mov    0x7080b7(%rip),%eax        # 0x7ffff7fbb80c
  0x7ffff78b3755: test  %eax,%eax
  0x7ffff78b3757: je    0x7ffff78b3700
  0x7ffff78b3759: mov    0x7080ad(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b375f: test  %edx,%edx
  0x7ffff78b3761: jne    0x7ffff78b3713
  0x7ffff78b3763: jmp    0x7ffff78b3700
  0x7ffff78b3765: mov    0x704e7c(%rip),%r10        # 0x7ffff7fb85e8
  0x7ffff78b376c: mov    (%r10),%rax
  0x7ffff78b376f: test  %rax,%rax
  0x7ffff78b3772: jne    0x7ffff78b378d
  0x7ffff78b3774: jmpq  0x7ffff78b3827
  0x7ffff78b3779: data32 data32 xchg %ax,%ax
  0x7ffff78b377d: data32 xchg %ax,%ax
  0x7ffff78b3780: mov    0x30(%rax),%rax
  0x7ffff78b3784: test  %rax,%rax
  0x7ffff78b3787: je    0x7ffff78b3827
  0x7ffff78b378d: cmp    (%rax),%edi
  0x7ffff78b378f: nop
  0x7ffff78b3790: jne    0x7ffff78b3780
  0x7ffff78b3792: mov    0x10(%rax),%r10
  0x7ffff78b3796: test  %r10,%r10
  0x7ffff78b3799: jne    0x7ffff78b37ab
  0x7ffff78b379b: data32 xchg %ax,%ax
  0x7ffff78b379e: xchg  %ax,%ax
  0x7ffff78b37a0: jmp    0x7ffff78b3780
  0x7ffff78b37a2: mov    0x10(%r10),%r10
  0x7ffff78b37a6: test  %r10,%r10
  0x7ffff78b37a9: je    0x7ffff78b3780
  0x7ffff78b37ab: cmp    (%r10),%ecx
  0x7ffff78b37ae: xchg  %ax,%ax
  0x7ffff78b37b0: jne    0x7ffff78b37a2
  0x7ffff78b37b2: movq  $0x0,(%rsp)
  0x7ffff78b37ba: movq  $0x0,0x8(%rsp)
  0x7ffff78b37c3: xor    %eax,%eax
  0x7ffff78b37c5: movl  $0x0,0x70803d(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b37cf: movq  $0x0,0x10(%rsp)
  0x7ffff78b37d8: mov    %rsp,%rdx
  0x7ffff78b37db: movq  $0x0,0x18(%rsp)
  0x7ffff78b37e4: mov    %edi,(%rsp)
  0x7ffff78b37e7: mov    %esi,0x4(%rsp)
  0x7ffff78b37eb: movq  $0x0,0x20(%rsp)
  0x7ffff78b37f4: mov    $0xc0284644,%esi
  0x7ffff78b37f9: mov    %ebx,0x8(%rsp)
  0x7ffff78b37fd: mov    %ebp,0xc(%rsp)
  0x7ffff78b3801: mov    %r8d,0x10(%rsp)
  0x7ffff78b3806: mov    0x4(%r10),%ecx
  0x7ffff78b380a: mov    (%r9),%edi
  0x7ffff78b380d: mov    %rcx,0x18(%rsp)
  0x7ffff78b3812: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b3817: test  %eax,%eax
  0x7ffff78b3819: jns    0x7ffff78b386d
  0x7ffff78b381b: mov    $0x2a,%eax
  0x7ffff78b3820: add    $0x38,%rsp
  0x7ffff78b3824: pop    %rbx
  0x7ffff78b3825: pop    %rbp
  0x7ffff78b3826: retq 
  0x7ffff78b3827: mov    0x704182(%rip),%rax        # 0x7ffff7fb79b0
  0x7ffff78b382e: mov    (%rax),%r11
  0x7ffff78b3831: mov    0x10(%r11),%r10
  0x7ffff78b3835: test  %r10,%r10
  0x7ffff78b3838: jne    0x7ffff78b3849
  0x7ffff78b383a: jmp    0x7ffff78b3857
  0x7ffff78b383c: data32 data32 xchg %ax,%ax
  0x7ffff78b3840: mov    0x10(%r10),%r10
  0x7ffff78b3844: test  %r10,%r10
  0x7ffff78b3847: je    0x7ffff78b3857
  0x7ffff78b3849: cmp    (%r10),%ecx
  0x7ffff78b384c: data32 data32 xchg %ax,%ax
  0x7ffff78b3850: jne    0x7ffff78b3840
  0x7ffff78b3852: jmpq  0x7ffff78b37b2
  0x7ffff78b3857: movl  $0x0,0x707fab(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b3861: add    $0x38,%rsp
  0x7ffff78b3865: mov    $0x1d,%eax
  0x7ffff78b386a: pop    %rbx
  0x7ffff78b386b: pop    %rbp
  0x7ffff78b386c: retq 
  0x7ffff78b386d: mov    0x20(%rsp),%eax
  0x7ffff78b3871: add    $0x38,%rsp
  0x7ffff78b3875: pop    %rbx
  0x7ffff78b3876: pop    %rbp
  0x7ffff78b3877: retq 
  0x7ffff78b3878: data32 data32 xchg %ax,%ax
  0x7ffff78b387c: data32 data32 xchg %ax,%ax
  0x7ffff78b3880: push  %r13
  0x7ffff78b3882: mov    $0x22,%eax
  0x7ffff78b3887: mov    %rdi,%r13
  0x7ffff78b388a: push  %r12
  0x7ffff78b388c: push  %rbp
  0x7ffff78b388d: push  %rbx
  0x7ffff78b388e: sub    $0x4e8,%rsp
  0x7ffff78b3895: test  %rdi,%rdi
  0x7ffff78b3898: je    0x7ffff78b3933
  0x7ffff78b389e: xchg  %ax,%ax
  0x7ffff78b38a0: xor    %eax,%eax
  0x7ffff78b38a2: mov    $0x1,%ecx
  0x7ffff78b38a7: lock cmpxchg %ecx,0x707f5d(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b38af: setne  %dl
  0x7ffff78b38b2: test  %dl,%dl
  0x7ffff78b38b4: je    0x7ffff78b3941
  0x7ffff78b38ba: mov    0x707f4c(%rip),%ebx        # 0x7ffff7fbb80c
  0x7ffff78b38c0: test  %ebx,%ebx
  0x7ffff78b38c2: je    0x7ffff78b38a0
  0x7ffff78b38c4: mov    0x707f42(%rip),%ecx        # 0x7ffff7fbb80c
  0x7ffff78b38ca: test  %ecx,%ecx
  0x7ffff78b38cc: je    0x7ffff78b38a0
  0x7ffff78b38ce: mov    0x707f38(%rip),%esi        # 0x7ffff7fbb80c
  0x7ffff78b38d4: test  %esi,%esi
  0x7ffff78b38d6: je    0x7ffff78b38a0
  0x7ffff78b38d8: mov    0x707f2d(%rip),%r12d        # 0x7ffff7fbb80c
  0x7ffff78b38df: test  %r12d,%r12d
  0x7ffff78b38e2: je    0x7ffff78b38a0
  0x7ffff78b38e4: mov    0x707f22(%rip),%edx        # 0x7ffff7fbb80c
  0x7ffff78b38ea: test  %edx,%edx
  0x7ffff78b38ec: je    0x7ffff78b38a0
  0x7ffff78b38ee: mov    0x707f18(%rip),%eax        # 0x7ffff7fbb80c
  0x7ffff78b38f4: test  %eax,%eax
  0x7ffff78b38f6: je    0x7ffff78b38a0
  0x7ffff78b38f8: mov    0x707f0d(%rip),%r9d        # 0x7ffff7fbb80c
  0x7ffff78b38ff: test  %r9d,%r9d
  0x7ffff78b3902: je    0x7ffff78b38a0
  0x7ffff78b3904: mov    0x707f01(%rip),%r10d        # 0x7ffff7fbb80c
  0x7ffff78b390b: test  %r10d,%r10d
  0x7ffff78b390e: jne    0x7ffff78b38ba
  0x7ffff78b3910: jmp    0x7ffff78b38a0
  0x7ffff78b3912: mov    0x4d8(%rsp),%ecx
  0x7ffff78b3919: test  %ecx,%ecx
  0x7ffff78b391b: jne    0x7ffff78b3e97
  0x7ffff78b3921: mov    0x4d0(%rsp),%ebx
  0x7ffff78b3928: mov    %ebx,0x0(%r13)
  0x7ffff78b392c: mov    0x4d8(%rsp),%eax
  0x7ffff78b3933: add    $0x4e8,%rsp
  0x7ffff78b393a: pop    %rbx
  0x7ffff78b393b: pop    %rbp
  0x7ffff78b393c: pop    %r12
  0x7ffff78b393e: pop    %r13
  0x7ffff78b3940: retq 
  0x7ffff78b3941: mov    0x707ec1(%rip),%eax        # 0x7ffff7fbb808
  0x7ffff78b3947: test  %eax,%eax
  0x7ffff78b3949: je    0x7ffff78b3a4f
  0x7ffff78b394f: inc    %eax
  0x7ffff78b3951: movl  $0x0,0x707eb1(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b395b: mov    %eax,0x707ea7(%rip)        # 0x7ffff7fbb808
  0x7ffff78b3961: lea    0x4d0(%rsp),%rdx
  0x7ffff78b3969: movl  $0x0,0x0(%r13)
  0x7ffff78b3971: movl  $0x0,0x4d0(%rsp)
  0x7ffff78b397c: mov    $0xc00c4622,%esi
  0x7ffff78b3981: xor    %eax,%eax
  0x7ffff78b3983: movl  $0x0,0x8(%rdx)
  0x7ffff78b398a: mov    0x6766f0(%rip),%edi        # 0x7ffff7f2a080
  0x7ffff78b3990: movl  $0x0,0x4d4(%rsp)
  0x7ffff78b399b: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b39a0: xor    %esi,%esi
  0x7ffff78b39a2: test  %eax,%eax
  0x7ffff78b39a4: mov    $0x1,%ecx
  0x7ffff78b39a9: jns    0x7ffff78b3912
  0x7ffff78b39af: mov    %esi,%eax
  0x7ffff78b39b1: lock cmpxchg %ecx,0x707e53(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b39b9: setne  %dl
  0x7ffff78b39bc: test  %dl,%dl
  0x7ffff78b39be: je    0x7ffff78b3a1c
  0x7ffff78b39c0: mov    0x707e45(%rip),%r8d        # 0x7ffff7fbb80c
  0x7ffff78b39c7: test  %r8d,%r8d
  0x7ffff78b39ca: je    0x7ffff78b39af
  0x7ffff78b39cc: mov    0x707e39(%rip),%r10d        # 0x7ffff7fbb80c
  0x7ffff78b39d3: test  %r10d,%r10d
  0x7ffff78b39d6: je    0x7ffff78b39af
  0x7ffff78b39d8: mov    0x707e2d(%rip),%r11d        # 0x7ffff7fbb80c
  0x7ffff78b39df: test  %r11d,%r11d
  0x7ffff78b39e2: je    0x7ffff78b39af
  0x7ffff78b39e4: mov    0x707e21(%rip),%r13d        # 0x7ffff7fbb80c
  0x7ffff78b39eb: test  %r13d,%r13d
  0x7ffff78b39ee: je    0x7ffff78b39af
  0x7ffff78b39f0: mov    0x707e16(%rip),%ebx        # 0x7ffff7fbb80c
  0x7ffff78b39f6: test  %ebx,%ebx
  0x7ffff78b39f8: je    0x7ffff78b39af
  0x7ffff78b39fa: mov    0x707e0c(%rip),%ebp        # 0x7ffff7fbb80c
  0x7ffff78b3a00: test  %ebp,%ebp
  0x7ffff78b3a02: je    0x7ffff78b39af
  0x7ffff78b3a04: mov    0x707e02(%rip),%edi        # 0x7ffff7fbb80c
  0x7ffff78b3a0a: test  %edi,%edi
  0x7ffff78b3a0c: je    0x7ffff78b39af
  0x7ffff78b3a0e: mov    0x707df7(%rip),%r8d        # 0x7ffff7fbb80c
  0x7ffff78b3a15: test  %r8d,%r8d
  0x7ffff78b3a18: jne    0x7ffff78b39c0
  0x7ffff78b3a1a: jmp    0x7ffff78b39af
  0x7ffff78b3a1c: mov    0x707de6(%rip),%esi        # 0x7ffff7fbb808
  0x7ffff78b3a22: dec    %esi
  0x7ffff78b3a24: test  %esi,%esi
  0x7ffff78b3a26: mov    %esi,0x707ddc(%rip)        # 0x7ffff7fbb808
  0x7ffff78b3a2c: je    0x7ffff78b3dcb
  0x7ffff78b3a32: movl  $0x0,0x707dd0(%rip)        # 0x7ffff7fbb80c
  0x7ffff78b3a3c: mov    $0x2a,%eax
  0x7ffff78b3a41: add    $0x4e8,%rsp
  0x7ffff78b3a48: pop    %rbx
  0x7ffff78b3a49: pop    %rbp
  0x7ffff78b3a4a: pop    %r12
  0x7ffff78b3a4c: pop    %r13
  0x7ffff78b3a4e: retq 
  0x7ffff78b3a4f: mov    0x705b9a(%rip),%rbp        # 0x7ffff7fb95f0
  0x7ffff78b3a56: mov    $0x700,%edx
  0x7ffff78b3a5b: xor    %esi,%esi
  0x7ffff78b3a5d: mov    %rbp,%rdi
  0x7ffff78b3a60: callq  0x7ffff782a990 <memset@plt>
  0x7ffff78b3a65: lea    0x700(%rbp),%rdx
  0x7ffff78b3a6c: mov    %rbp,%rax
  0x7ffff78b3a6f: movl  $0xffffffff,(%rax)
  0x7ffff78b3a75: movl  $0xffffffff,0x38(%rax)
  0x7ffff78b3a7c: movl  $0xffffffff,0x70(%rax)
  0x7ffff78b3a83: movl  $0xffffffff,0xa8(%rax)
  0x7ffff78b3a8d: movl  $0xffffffff,0xe0(%rax)
  0x7ffff78b3a97: movl  $0xffffffff,0x118(%rax)
  0x7ffff78b3aa1: movl  $0xffffffff,0x150(%rax)
  0x7ffff78b3aab: movl  $0xffffffff,0x188(%rax)
  0x7ffff78b3ab5: add    $0x1c0,%rax
  0x7ffff78b3abb: cmp    %rdx,%rax
  0x7ffff78b3abe: jne    0x7ffff78b3a6f
  0x7ffff78b3ac0: callq  0x7ffff782a290 <geteuid@plt>
  0x7ffff78b3ac5: test  %eax,%eax
  0x7ffff78b3ac7: jne    0x7ffff78b3bf7
  0x7ffff78b3acd: movzbl 0x4d1b06(%rip),%esi        # 0x7ffff7d855da
  0x7ffff78b3ad4: lea    0x1(%rsp),%rdi
  0x7ffff78b3ad9: mov    $0x3ff,%edx
  0x7ffff78b3ade: lea    0x480(%rsp),%rbp
  0x7ffff78b3ae6: mov    %sil,(%rsp)
  0x7ffff78b3aea: xor    %esi,%esi
  0x7ffff78b3aec: callq  0x7ffff782a990 <memset@plt>
  0x7ffff78b3af1: lea    0x53ee73(%rip),%rsi        # 0x7ffff7df296b
  0x7ffff78b3af8: lea    0x4d178d(%rip),%rdi        # 0x7ffff7d8528c
  0x7ffff78b3aff: movl  $0x1,0x4dc(%rsp)
  0x7ffff78b3b0a: callq  0x7ffff782a520 <fopen64@plt>
  0x7ffff78b3b0f: test  %rax,%rax
  0x7ffff78b3b12: mov    %rax,%rbx
  0x7ffff78b3b15: jne    0x7ffff78b3b40
  0x7ffff78b3b17: jmp    0x7ffff78b3b60
  0x7ffff78b3b19: data32 data32 xchg %ax,%ax
  0x7ffff78b3b1d: data32 xchg %ax,%ax
  0x7ffff78b3b20: cld   
  0x7ffff78b3b21: lea    0x4d1772(%rip),%rdi        # 0x7ffff7d8529a
  0x7ffff78b3b28: movb  $0x0,0x48f(%rsp)
  0x7ffff78b3b30: mov    $0x7,%ecx
  0x7ffff78b3b35: mov    %rbp,%rsi
  0x7ffff78b3b38: repz cmpsb %es:(%rdi),%ds:(%rsi)
  0x7ffff78b3b3a: je    0x7ffff78b3f09
  0x7ffff78b3b40: lea    0x4d175a(%rip),%rsi        # 0x7ffff7d852a1
  0x7ffff78b3b47: xor    %eax,%eax
  0x7ffff78b3b49: mov    %rbp,%rdx
  0x7ffff78b3b4c: mov    %rbx,%rdi
  0x7ffff78b3b4f: callq  0x7ffff782a400 <fscanf@plt>
  0x7ffff78b3b54: dec    %eax
  0x7ffff78b3b56: je    0x7ffff78b3b20
  0x7ffff78b3b58: mov    %rbx,%rdi
  0x7ffff78b3b5b: callq  0x7ffff782a5a0 <fclose@plt>
  0x7ffff78b3b60: lea    0x4d1746(%rip),%rdi        # 0x7ffff7d852ad
  0x7ffff78b3b67: xor    %esi,%esi
  0x7ffff78b3b69: xor    %eax,%eax
  0x7ffff78b3b6b: callq  0x7ffff782a360 <open64@plt>
  0x7ffff78b3b70: test  %eax,%eax
  0x7ffff78b3b72: mov    %eax,%ebx
  0x7ffff78b3b74: js    0x7ffff78b3ba2
  0x7ffff78b3b76: mov    %rsp,%rsi
  0x7ffff78b3b79: mov    $0x3ff,%edx
  0x7ffff78b3b7e: mov    %eax,%edi
  0x7ffff78b3b80: callq  0x7ffff782a2b0 <read@plt>
  0x7ffff78b3b85: test  %eax,%eax
  0x7ffff78b3b87: jle    0x7ffff78b4012
  0x7ffff78b3b8d: dec    %eax
  0x7ffff78b3b8f: cltq 
  0x7ffff78b3b91: cmpb  $0xa,(%rsp,%rax,1)
  0x7ffff78b3b95: je    0x7ffff78b406a
  0x7ffff78b3b9b: mov    %ebx,%edi
  0x7ffff78b3b9d: callq  0x7ffff782ab40 <close@plt>
  0x7ffff78b3ba2: cmpb  $0x0,(%rsp)
  0x7ffff78b3ba6: je    0x7ffff78b401b
  0x7ffff78b3bac: callq  0x7ffff782a2a0 <fork@plt>
  0x7ffff78b3bb1: cmp    $0xffffffffffffffff,%eax
  0x7ffff78b3bb4: mov    %eax,%edi
  0x7ffff78b3bb6: je    0x7ffff78b3f46
  0x7ffff78b3bbc: test  %eax,%eax
  0x7ffff78b3bbe: xchg  %ax,%ax
  0x7ffff78b3bc0: je    0x7ffff78b3f72
  0x7ffff78b3bc6: lea    0x4dc(%rsp),%rsi
  0x7ffff78b3bce: xor    %edx,%edx
  0x7ffff78b3bd0: callq  0x7ffff782aa70 <waitpid@plt>
  0x7ffff78b3bd5: test  %eax,%eax
  0x7ffff78b3bd7: js    0x7ffff78b3f46
  0x7ffff78b3bdd: mov    0x4dc(%rsp),%eax
  0x7ffff78b3be4: test  $0x7f,%al
  0x7ffff78b3be6: jne    0x7ffff78b3f46
  0x7ffff78b3bec: movzbl %ah,%edx
  0x7ffff78b3bef: test  %edx,%edx
  0x7ffff78b3bf1: jne    0x7ffff78b3f46
  0x7ffff78b3bf7: lea    0x400(%rsp),%rbx
  0x7ffff78b3bff: lea    0x4d16c7(%rip),%rdx        # 0x7ffff7d852cd
  0x7ffff78b3c06: mov    $0x80,%esi
  0x7ffff78b3c0b: xor    %eax,%eax
  0x7ffff78b3c0d: mov    %rbx,%rdi
  0x7ffff78b3c10: callq  0x7ffff782a740 <snprintf@plt>
  0x7ffff78b3c15: mov    $0xff,%esi
  0x7ffff78b3c1a: mov    %rbx,%rdi
  0x7ffff78b3c1d: callq  0x7ffff78b0050
  0x7ffff78b3c22: xor    %eax,%eax
  0x7ffff78b3c24: mov    $0x2,%esi
  0x7ffff78b3c29: mov    %rbx,%rdi
  0x7ffff78b3c2c: callq  0x7ffff782a360 <open64@plt>
  0x7ffff78b3c31: test  %eax,%eax
  0x7ffff78b3c33: mov    %eax,%ebp
  0x7ffff78b3c35: mov    %eax,0x676445(%rip)        # 0x7ffff7f2a080
  0x7ffff78b3c3b: js    0x7ffff78b3e0c
  0x7ffff78b3c41: lea    0x480(%rsp),%r12
memset 0x48 (72) bytes to 0 at %r12. rbx points 8 bytes into the struct.
r12 == %rsp + 0x480
  0x7ffff78b3c49: xor    %esi,%esi
  0x7ffff78b3c4b: mov    $0x48,%edx
  0x7ffff78b3c50: lea    0x8(%r12),%rbx
  0x7ffff78b3c55: mov    %r12,%rdi
  0x7ffff78b3c58: callq  0x7ffff782a990 <memset@plt>
  0x7ffff78b3c5d: lea    0x4d1678(%rip),%rdi        # 0x7ffff7d852dc
  0x7ffff78b3c64: movabs $0x312e36332e353931,%rax
  0x7ffff78b3c6e: movq  $0x0,0x38(%rbx)
*(uint64_t *)(r12 + 16) = 0x35
  0x7ffff78b3c76: movq  $0x35,0x8(%rbx)
  0x7ffff78b3c7e: movq  $0x0,0x10(%rbx)
  0x7ffff78b3c86: movq  $0x0,0x18(%rbx)
  0x7ffff78b3c8e: movq  $0x0,0x20(%rbx)
  0x7ffff78b3c96: movq  $0x0,0x28(%rbx)
  0x7ffff78b3c9e: movq  $0x0,0x30(%rbx)
  0x7ffff78b3ca6: movl  $0x0,0x480(%rsp)
  0x7ffff78b3cb1: movl  $0x0,0x484(%rsp)
*(uint64_t *)(r12 + 8) = 0x312e36332e353931
  0x7ffff78b3cbc: mov    %rax,0x488(%rsp)


  0x7ffff78b3cc4: movb  $0x0,0x4c7(%rsp)
  0x7ffff78b3ccc: callq  0x7ffff782a460 <getenv@plt>
  0x7ffff78b3cd1: test  %rax,%rax
  0x7ffff78b3cd4: je    0x7ffff78b3ce0
  0x7ffff78b3cd6: movsbl (%rax),%edi
  0x7ffff78b3cd9: mov    %edi,0x480(%rsp)
  0x7ffff78b3ce0: xor    %eax,%eax
  0x7ffff78b3ce2: mov    %r12,%rdx
  0x7ffff78b3ce5: mov    $0xc04846d2,%esi
  0x7ffff78b3cea: mov    %ebp,%edi
  0x7ffff78b3cec: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b3cf1: test  %eax,%eax
  0x7ffff78b3cf3: js    0x7ffff78b3e4a
  0x7ffff78b3cf9: mov    0x704ad0(%rip),%rdx        # 0x7ffff7fb87d0
  0x7ffff78b3d00: xor    %eax,%eax
  0x7ffff78b3d02: mov    $0xc00446ca,%esi
  0x7ffff78b3d07: movl  $0x0,(%rdx)
  0x7ffff78b3d0d: mov    0x67636d(%rip),%edi        # 0x7ffff7f2a080
  0x7ffff78b3d13: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b3d18: test  %eax,%eax
  0x7ffff78b3d1a: js    0x7ffff78b3e78
  0x7ffff78b3d20: mov    0x7045d9(%rip),%rbp        # 0x7ffff7fb8300
  0x7ffff78b3d27: xor    %esi,%esi
  0x7ffff78b3d29: mov    $0x600,%edx
  0x7ffff78b3d2e: mov    %rbp,%rdi
  0x7ffff78b3d31: callq  0x7ffff782a990 <memset@plt>
  0x7ffff78b3d36: movl  $0xffffffff,0x0(%rbp)
  0x7ffff78b3d3d: mov    0x67633d(%rip),%edi        # 0x7ffff7f2a080
  0x7ffff78b3d43: xor    %eax,%eax
  0x7ffff78b3d45: mov    %rbp,%rdx
  0x7ffff78b3d48: mov    $0xc60046c8,%esi
  0x7ffff78b3d4d: callq  0x7ffff782ab20 <ioctl@plt>
  0x7ffff78b3d52: test  %eax,%eax
  0x7ffff78b3d54: js    0x7ffff78b3e78</pre>


==See Also==
* Kernel [http://www.mjmwired.net/kernel/Documentation/ioctl-number.txt ioctl numbering] documentation
* My [[CUDA]] and [[CUBAR]] pages
* I developed [[ptracer]] to get traces for this project
** Some [[CUDA traces|traces]]
[[CATEGORY: GPGPU]]
[[CATEGORY: Projects]]

Latest revision as of 22:18, 22 August 2011

Reverse engineering of the CUDA system. CUDA primarily communicates with the NVIDIA closed-source driver via several dozen undocumented ioctl()s. My open source implementation, libcudest, is located at GitHub. Sundry utilities for reverse engineering are also within this repository, though recent modifications to valgrind-mmt have rather superseded my tools.

libcudest began as a project for Hyesoon Kim's CS4803DGC at the Georgia Institute of Technology.

Driver versions

Newer drivers can be used with older CUDA versions, but the converse is not true. The "CUDA macroversion" listed below is the first CUDA release designed explicitly for use with the listed drivers.

Version CUDA macroversion Notes
195.36.15 3.0
195.36.24 3.0
195.36.31 3.0
256.22 3.1-beta
256.29 3.1-beta
256.35 3.1-beta

CUDA Environment variables

Discovered via binary analysis and a shimmed getenv(3). Effects determined via blackbox and binary analyses:

Variable Notes Documented? Effects
__RM_NO_VERSION_CHECK N Also checked by nvidia-smi
COMPUTE_PROFILE Y If set to 1, profiling will be performed. Implies CUDA_LAUNCH_BLOCKING.
COMPUTE_PROFILE_CONFIG Y Specifies a profiler configuration file. Only checked if COMPUTE_PROFILE is set.
COMPUTE_PROFILE_CSV Y If set to 1, profiling data will be written in CSV format. Only checked if COMPUTE_PROFILE is set.
COMPUTE_PROFILE_LOG Y Specifies profiler output file (default: "./cuda_profile.log"). Only checked if COMPUTE_PROFILE is set.
CUDA_AMODEL_DLL N
CUDA_AMODEL_GPU N
CUDA_API_TRACE_PTR N
CUDA_CACHE_DISABLE Y If this is unset, the code cache will be used.
CUDA_CACHE_MAXSIZE Y
CUDA_CACHE_PATH Y If this is set, it overrides the code cache's default path of $HOME/.nv/ComputeCache
CUDA_DEVCODE_CACHE Y PTX compilation cache.
CUDA_DEVCODE_PATH Y Search path for fat binaries.
CUDA_EMULATION_MODE
CUDA_FORCE_PTX_JIT
CUDA_HEAP_RANGE Checked each time a context is created
CUDA_INJECTION64_PATH
CUDA_LAUNCH_BLOCKING Y (CUDA 3.0 Programmer's Guide, 3.2.6.1) Forces synchronization of host threads on GPU kernels.
CUDA_MEMCHECK Checked each time a context is created
CUDA_MEMORY_LOG Checked each time a context is created
CUDA_VISIBLE_DEVICES

Maps

Ordered from highest to lowest locations in x86 memory. These are architecture-, and to a lesser degree driver- and kernel version-specific. Applications and libraries can of course create many more maps than these.

  • vsyscalls. read-execute-private, very few pages, topmost area of memory, usually highest mapping
  • VDSO. read-execute-private, one page, high in memory (SYSENTER/SYSEXIT)
  • Userspace stack. read-write-private, many pages, high in memory
  • Anonymous map, 3 read-write-private pages, high in memory.
    • Possibly associated with nvidia driver's NV_STACK_SIZE stack. read-write-private, (3 * 4096 on amd64, 2 * 4096 on i686)
  • Two sets of /dev/nvidiaX maps for each bound device. Sets are usually contiguous, and contain:
    • an anonymous page, read-write-private
    • several mappings of the device, having variable number of pages, all read-write-shared
  • Libraries. variable, middle of memory.
  • Userspace heap. read-write-private, many pages, low in memory
  • Application (data region). read-write-private, variable, low in memory
  • Application (text region). read-execute-private, variable, usually lowest mapping
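
The layout above can be checked against a live process by reading /proc/self/maps. The following is a minimal sketch, assuming the usual Linux maps line format ("start-end perms offset dev inode path"); the names map_entry, parse_maps_line, is_nvidia_map and dump_nvidia_maps are hypothetical helpers for illustration and are not part of libcudest.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/maps (illustrative type, not from libcudest). */
typedef struct {
    unsigned long start;
    unsigned long end;
    char perms[5];   /* e.g. "rw-s" */
    char path[256];  /* empty for anonymous maps */
} map_entry;

/* Parse one maps line. Returns 1 on success, 0 if the line is malformed. */
static int parse_maps_line(const char *line, map_entry *m)
{
    int n;

    m->path[0] = '\0';
    /* offset, device and inode are skipped; the path is optional */
    n = sscanf(line, "%lx-%lx %4s %*s %*s %*s %255[^\n]",
               &m->start, &m->end, m->perms, m->path);
    return n >= 3;
}

/* True if the mapping is backed by /dev/nvidiactl or /dev/nvidiaX. */
static int is_nvidia_map(const map_entry *m)
{
    return strncmp(m->path, "/dev/nvidia", 11) == 0;
}

/* Print every NVIDIA device mapping of the calling process. */
static void dump_nvidia_maps(FILE *out)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    map_entry m;

    if (fp == NULL)
        return;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_maps_line(line, &m) && is_nvidia_map(&m))
            fprintf(out, "%lx-%lx %s %s\n", m.start, m.end, m.perms, m.path);
    }
    fclose(fp);
}
```

Calling dump_nvidia_maps(stderr) after cuInit() should list the read-write-shared device mappings described above; the anonymous pages that accompany them have an empty path and are not matched by this filter.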

mmap()s

offset | size | notes | Nouveau name | block range
reg_addr + 0x0000 | 0x2000 | not mapped by libcuda | PMC functional block | 0x000000--0x001fff
reg_addr + 0x9000 | 0x1000 | [Rwxs] mapped in cuInit(). first mapping. per-device. | PTIMER functional block | 0x009000--0x009fff
reg_addr + 0xc0a000 / 0xc0c000 | 0x1000 | [RWxs] location is acquired from ioctl 4e | PFIFO command submission interface | 0xc00000--0xcfffff

ioctls

An ioctl request code (on x86) is 32 bits. The following layout comes from linux/asm-generic/ioctl.h in a 2.6.34 kernel:

  • Bit 31: Read?
  • Bit 30: Write?
  • Bits 29-16: Parameter size
  • Bits 15-8: Type (module)
  • Bits 7-0: Number (command)

Looking at the source of the 195.36.15 kernel driver's OS interface, we see that NVIDIA uses the standard ioctl-creation macros from ioctl.h, and can be expected to adhere to this format. The type code used (NV_IOCTL_MAGIC) is 'F' (0x46), which overlaps with the framebuffer ioctl range as registered in 2.6.34. We further see that only _IOWR() is used to declare ioctls, implying that the top two bits (31 and 30) will always be '11'. Both deductions are borne out by strace output of a CUDA process.
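
As a sketch of the bit layout above, the request codes seen in the disassembly can be split into their fields by hand. The type and function names here (ioctl_fields, decode_ioctl) are made up for illustration; the kernel's own _IOC_DIR(), _IOC_SIZE(), _IOC_TYPE() and _IOC_NR() macros from <linux/ioctl.h> do the same job.

```c
#include <stdint.h>

/* Fields of a 32-bit ioctl request code (illustrative type). */
typedef struct {
    unsigned read;   /* bit 31 */
    unsigned write;  /* bit 30 */
    unsigned size;   /* bits 29-16: parameter size in bytes */
    unsigned type;   /* bits 15-8: type (module), 0x46 'F' for NVIDIA */
    unsigned nr;     /* bits 7-0: number (command) */
} ioctl_fields;

/* Split a request code according to the asm-generic layout. */
static ioctl_fields decode_ioctl(uint32_t code)
{
    ioctl_fields f;

    f.read  = (code >> 31) & 0x1;
    f.write = (code >> 30) & 0x1;
    f.size  = (code >> 16) & 0x3fff;
    f.type  = (code >> 8) & 0xff;
    f.nr    = code & 0xff;
    return f;
}
```

For example, 0xc04846d2 from the cuInit disassembly decodes to read and write both set, size 0x48, type 0x46 and number 0xd2, which is the NV_ESC_CHECK_VERSION_STR entry in the table below.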

Code Param size Param location(s) Driver API call sites Notes
/dev/nvidiactl
0xc8

NV_ESC_CARD_INFO

0x600 (1536) anonymous page cuInit
  • Largest parameter by far.
    • Possibly scaled? Shifted 3 bits left, this is 0x3000, the size of the amd64 anonymous mapping.
    • More likely, it holds up to 32 48-byte descriptors (32 * 48 = 0x600).
  • Wants the first 32 bits set (0xffffffff), all other bits 0.
    • This is most likely a mask indicating which card IDs we want information for.
typedef struct nv_ioctl_card_info
{
    NvU16    flags;               /* see below                   */
    NvU8     bus;                 /* bus number (PCI, AGP, etc)  */
    NvU8     slot;                /* card slot                   */
    NvU16    vendor_id;           /* PCI vendor id               */
    NvU16    device_id;
    NvU16    interrupt_line;
    NvU64    reg_address    NV_ALIGN_BYTES(8);
    NvU64    reg_size       NV_ALIGN_BYTES(8);
    NvU64    fb_address     NV_ALIGN_BYTES(8);
    NvU64    fb_size        NV_ALIGN_BYTES(8);
} nv_ioctl_card_info_t;
  • Returns (all subsequent bytes are 0):
0x00010001	0x0cb110de	0x00000026	0x00000000
0xf2000000	0x00000000	0x01000000	0x00000000
0xe0000000	0x00000000	0x10000000	0x00000000
  • 0x0001: flag (NV_IOCTL_CARD_INFO_FLAG_PRESENT)
  • 0x0001: bus (0x01) and slot (0x00), one byte each
  • 0x0cb110de: vendor + device IDs
    • lspci -n: 01:00.0 0300: 10de:0cb1 (rev a2)
    • lspci -t -v: \-[0000:00]-+-03.0-[01]--+-00.0 nVidia Corporation GT215 [GeForce GTS 360M]
  • 0x26: IRQ line (here, #38)
  • 0xf2000000 00000000: reg_address
  • 0x01000000 00000000: reg_size
  • 0xe0000000 00000000: fb_address
  • 0x10000000 00000000: fb_size
    • these describe physical address ranges (base and size) within the card's PCI memory regions, see /proc/iomem:
  e0000000-f30fffff : PCI Bus 0000:01
    e0000000-efffffff : 0000:01:00.0
    f0000000-f1ffffff : 0000:01:00.0
    f2000000-f2ffffff : 0000:01:00.0
      f2000000-f2ffffff : nvidia
    f3000000-f307ffff : 0000:01:00.0
    f3080000-f3083fff : 0000:01:00.1
      f3080000-f3083fff : ICH HD audio
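
The field breakdown for 0xc8 above can be reproduced mechanically. This is a minimal sketch, assuming a little-endian amd64 host and substituting <stdint.h> types for the driver's NvU8/NvU16/NvU64; card_info, sample_reply and decode_card_info are hypothetical names, and the sample words are the ones returned above.

```c
#include <stdint.h>
#include <string.h>

/* Mirror of nv_ioctl_card_info_t using fixed-width types (assumed layout). */
typedef struct {
    uint16_t flags;
    uint8_t  bus;
    uint8_t  slot;
    uint16_t vendor_id;
    uint16_t device_id;
    uint16_t interrupt_line;
    /* the compiler pads here so that reg_address is 8-byte aligned */
    _Alignas(8) uint64_t reg_address;
    _Alignas(8) uint64_t reg_size;
    _Alignas(8) uint64_t fb_address;
    _Alignas(8) uint64_t fb_size;
} card_info;

/* 32 descriptors of 48 bytes account for the 0x600-byte parameter. */
_Static_assert(sizeof(card_info) == 48, "descriptor should be 48 bytes");
_Static_assert(32 * sizeof(card_info) == 0x600, "32 descriptors fill 0x600 bytes");

/* The twelve 32-bit words observed in the reply on the GTS 360M. */
static const uint32_t sample_reply[12] = {
    0x00010001, 0x0cb110de, 0x00000026, 0x00000000,
    0xf2000000, 0x00000000, 0x01000000, 0x00000000,
    0xe0000000, 0x00000000, 0x10000000, 0x00000000,
};

/* Reinterpret twelve reply words as one descriptor (little-endian host). */
static card_info decode_card_info(const uint32_t *words)
{
    card_info ci;

    memcpy(&ci, words, sizeof ci);
    return ci;
}
```

Decoding sample_reply this way yields vendor 0x10de, device 0x0cb1, bus 1, slot 0 and IRQ 0x26, matching the lspci output, with reg_address + reg_size and fb_address + fb_size covering the f2000000-f2ffffff and e0000000-efffffff ranges listed in /proc/iomem.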
0xca

NV_ESC_ENV_INFO

0x004 anonymous page cuInit
  • Seems to ignore input value.
  • Writes result value (0x00000001).
typedef struct nv_ioctl_env_info
{
    NvU32 pat_supported;
} nv_ioctl_env_info_t;
0xce

NV_ESC_ALLOC_OS_EVENT

0x14
0xcf

NV_ESC_FREE_OS_EVENT

0xd1

NV_ESC_STATUS_CODE

0xd2

NV_ESC_CHECK_VERSION_STR

0x048 stack cuInit
  • Performed immediately following opening of the nvidiactl device
typedef struct nv_ioctl_rm_api_version
{
    NvU32 cmd;
    NvU32 reply;
    char versionString[NV_RM_API_VERSION_STRING_LENGTH];
} nv_ioctl_rm_api_version_t;

#define NV_RM_API_VERSION_CMD_STRICT         0
#define NV_RM_API_VERSION_CMD_RELAXED       '1'
#define NV_RM_API_VERSION_CMD_OVERRIDE      '2'

#define NV_RM_API_VERSION_REPLY_UNRECOGNIZED 0
#define NV_RM_API_VERSION_REPLY_RECOGNIZED   1
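  • 0x312e36332e353931 followed by 0x35 == "195.36.15"
    • bytes in memory: '1' '9' '5' '.' '3' '6' '.' '1', then '5'
    • the string is plain ASCII stored in order; the first eight characters only look reversed in the 64-bit immediate because amd64 is little-endian, and the ninth ('5', 0x35) lands in the low byte of the next quadword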
  • All other bytes are 0.
  • Writes result to first 8 bytes (0x00000001), leaves others untouched
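
The version-check parameter can be rebuilt in a few lines to confirm the byte order. This is a sketch under stated assumptions: the string length of 64 is inferred from the 0x48-byte parameter size minus the two 32-bit fields, a little-endian host is assumed, and rm_api_version, make_version_request and version_qword are illustrative names rather than the driver's.

```c
#include <stdint.h>
#include <string.h>

/* Inferred: 0x48 total bytes minus cmd and reply (4 bytes each). */
#define VERSION_STRING_LENGTH 64

/* Mirror of nv_ioctl_rm_api_version_t with fixed-width types. */
typedef struct {
    uint32_t cmd;
    uint32_t reply;
    char versionString[VERSION_STRING_LENGTH];
} rm_api_version;

_Static_assert(sizeof(rm_api_version) == 0x48, "parameter should be 0x48 bytes");

/* Build a zeroed request (cmd 0 is the strict check) carrying the version. */
static rm_api_version make_version_request(const char *version)
{
    rm_api_version v;

    memset(&v, 0, sizeof v);
    strncpy(v.versionString, version, sizeof v.versionString - 1);
    return v;
}

/* Read the index-th 8-byte group of the string as the CPU would load it. */
static uint64_t version_qword(const rm_api_version *v, size_t index)
{
    uint64_t q;

    memcpy(&q, v->versionString + 8 * index, sizeof q);
    return q;
}
```

With "195.36.15" as input, the first quadword of the string reads back as 0x312e36332e353931 and the second as 0x35, the same two immediates libcuda stores at offsets 8 and 16 of the parameter before issuing the ioctl.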
0x22 0x00c stack cuInit
  • Inputs set to 0.
  • Outputs (example):
3251635025	65	0
  • First value is used as first input word to the majority of subsequent ioctls
  • Second value ranges over (at least) 41--65...
  • Not sent in 256.22/3.1...
0x2a 0x020 stack cuInit
  • GPU method invocation. Second and third words specify the method being called. Fifth and sixth specify the address being passed; seventh and eighth the size thereof.

Sample inputs:

0x7fffffffd310:	3251635025	3251635025	533	0
0x7fffffffd320:	4294955888	32767	132	0
  • First and second words are *not* always equivalent.
  • Outputs are usually unchanged, but not always:
ioctl 2a, 32-byte param, fd 3	0xc1d04214 0x5c000002 0x2080012f 0x00000000 
0x0010				0x950713f0 0x00007fff 0x000000a8 0x00000000 
GPU method 0x5c000002:2080012f	0x00000000 0x00000000 0x00000000 0x00000000 
0x0010				0x00000000 0x00000000 0x00000000 0x00000000 
0x0020				0x00000000 0x00000000 0x00000000 0x00000000 
0x0030				0x00000000 0x00000000 0x00000000 0x00000000 
0x0040				0x00000000 0x00000000 0x00000000 0x00000000 
0x0050				0x00000000 0x00000000 0x00000000 0x00000000 
0x0060				0x00000000 0x00000000 0x00000000 0x00000000 
0x0070				0x00000000 0x00000000 0x00000000 0x00000000 
0x0080				0x00000000 0x00000000 0x00000000 0x00000000 
0x0090				0x00000000 0x00000000 0x00000000 0x00000000 
0x00a0				0x00000000 0x00000000 
RESULT: 0			0xc1d04214 0x5c000002 0x2080012f 0x00000000 
0x0010				0x950713f0 0x00007fff 0x000000a8 0x00000029 
GPU method 0x5c000002:2080012f	**************MODIFICATION FROM CALL
0x00000000 0x00000000 0x00000000 0x00000000 
0x0010				0x00000000 0x00000000 0x00000000 0x00000000 
0x0020				0x00000000 0x00000000 0x00000000 0x00000000 
0x0030				0x00000000 0x00000000 0x00000000 0x00000000 
0x0040				0x00000000 0x00000000 0x00000000 0x00000000 
0x0050				0x00000000 0x00000000 0x00000000 0x00000000 
0x0060				0x00000000 0x00000000 0x00000000 0x00000000 
0x0070				0x00000000 0x00000000 0x00000000 0x00000000 
0x0080				0x00000000 0x00000000 0x00000000 0x00000000 
0x0090				0x00000000 0x00000000 0x00000000 0x00000000 
0x00a0				0x00000000 0x00000000 
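
The 0x2a parameter layout described above can be written down as a struct for experimentation. This is only a sketch: the field names are guesses made for illustration, a little-endian amd64 host is assumed, and the eighth word is kept separate from the size because the trace above shows it changing from 0 to 0x29 across the call.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the 0x20-byte parameter of ioctl 0x2a. */
typedef struct {
    uint32_t word1;      /* often the first value returned by ioctl 0x22 */
    uint32_t method_hi;  /* second word, e.g. 0x5c000002 */
    uint32_t method_lo;  /* third word, e.g. 0x2080012f */
    uint32_t word4;      /* zero in all samples shown here */
    uint64_t addr;       /* fifth and sixth words: userspace buffer address */
    uint32_t size;       /* seventh word: size of that buffer in bytes */
    uint32_t word8;      /* eighth word: written by the driver in the trace */
} gpu_method_param;

_Static_assert(sizeof(gpu_method_param) == 0x20, "parameter should be 0x20 bytes");

/* Reinterpret eight dumped words as a parameter block (little-endian host). */
static gpu_method_param load_param(const uint32_t words[8])
{
    gpu_method_param p;

    memcpy(&p, words, sizeof p);
    return p;
}
```

Loading the input words from the trace gives a buffer address of 0x7fff950713f0 and a size of 0xa8, which agrees with the 0xa8-byte zeroed buffer dumped alongside the call.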
0x2b 0x020 stack cuInit
  • GPU object creation(?)
0x4d 0x048 stack cuInit
  • Performed following opening of nvidiaX device
0x2d 0x014 stack cuInit
  • Performed following read of /proc/interrupts
0x4e 0x030 cuInit
  • Immediately prior to first mmap()
0x4f 0x020 cuInit
  • Invoked if mmap() returns MAP_FAILED, prior to failing out
0x54 0x30
0x57 0x038
0x58 0x28
0x59 0x10
/dev/nvidiaX
0x32 0x014 stack cuInit
  • Performed several times in succession
0x37 0x020 stack cuInit
  • Follows burst of 3x 0x32's, then interwoven with bursts of 2a's

GPU methods

Code Param size Notes
0x5c000002 (per-device)
0x20800110 0x84
  • Retrieves device name:
RESULT: 0			0xc1d04277 0x5c000002 0x20800110 0x00000000 
0x0010				0x73be4970 0x00007fff 0x00000084 0x00000000 
GPU method 0x5c000002:20800110	0x00000000 0x6f466547 0x20656372 0x20535447 
0x0010				0x4d303633 0x00000000 0x00000000 0x00000000 
  • 6f46654720656372205354474d303633 == "oFeG ecr STGM063"; reversing the bytes within each little-endian 32-bit word gives "GeForce GTS 360M"
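
The scrambled device name above is only an artifact of dumping the buffer as 32-bit words. A small sketch of the unscrambling follows; words_to_string is a hypothetical helper, and the words are taken from the 0x20800110 result shown above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack 32-bit words, least significant byte first, into a C string.
 * out must have room for 4 * nwords + 1 bytes.
 */
static void words_to_string(const uint32_t *words, size_t nwords, char *out)
{
    size_t i;
    unsigned b;

    for (i = 0; i < nwords; i++) {
        for (b = 0; b < 4; b++)
            out[4 * i + b] = (char)((words[i] >> (8 * b)) & 0xff);
    }
    out[4 * nwords] = '\0';
}
```

Feeding it the four words 0x6f466547, 0x20656372, 0x20535447 and 0x4d303633 produces the string "GeForce GTS 360M".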

disassembly

These disassemblies make use of libcuda.so.195.36.15 (0867d66be617faab3782fa0ba19ec9ba, 7404990 bytes). Symbols were extracted via objdump -T.

  • AMD64 ABI:
    • Integer arguments via RDI, RSI, RDX, RCX, R8 and R9, then stack
    • FP arguments in XMM0..XMM7, then stack
    • Return value in RAX
    • libcuda traces

See Also