CUBAR
[[CUDA]] (and [http://en.wikipedia.org/wiki/GPGPU General-Purpose Graphics Processing Unit] programming in general) is rapidly becoming a mainstay of high-performance computing. As CUDA and OpenCL move off of the workstation and into the server -- off of the console and into the cluster -- the security of these systems will become a critical part of the associated [http://en.wikipedia.org/wiki/Trusted_computing_base trusted computing base]. Even ignoring the issue of multiuser security, the properties of isolation and (to a lesser extent) confidentiality are important for debugging, profiling, and reproducibility. I've authored cudafucker and associated tools to investigate the security properties -- primarily the means and parameters of memory protection, and the division of protection between soft- and hardware -- of [[CUDA]] on NVIDIA hardware since the G80 architecture.
==Questions==
===Memory details===
* What address translations, if any, are performed? (a first-look sketch follows this list)
** If address translation is performed, can physical memory be multiply aliased?
* How are accesses affected by use of incorrect state space affixes?
** Compute Capability 2.0 introduces unified addressing, but still supports modal addressing
* How do physical addresses correspond to distinct memory regions?
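A minimal host-side sketch for a first look at the address-translation question (runtime API assumed; the allocation count and sizes are arbitrary): allocate a few buffers and inspect the raw device pointer values, whose spacing and magnitude hint at whether a flat, translated virtual space is presented and how allocations are packed into it.
<pre>
#include <stdio.h>
#include <cuda_runtime.h>

// Allocate a few buffers and print their raw device pointer values. The
// spacing and magnitude of the addresses hint at whether a translated,
// flat virtual space is presented, and how allocations are laid out in it.
int main(void){
    void *ptrs[4];

    for(int i = 0 ; i < 4 ; ++i){
        if(cudaMalloc(&ptrs[i], 1u << 20) != cudaSuccess){
            fprintf(stderr, "cudaMalloc() failed on allocation %d\n", i);
            return 1;
        }
        printf("allocation %d: %p\n", i, ptrs[i]);
    }
    for(int i = 0 ; i < 4 ; ++i){
        cudaFree(ptrs[i]);
    }
    return 0;
}
</pre>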
===Driver details===
* Is a CUDA context a true [http://en.wikipedia.org/wiki/Capability-based_security security capability]? (a sketch follows this list)
** Can a process modify details of the contexts it creates?
** Can a process transmit its contexts to another? Will they persist if the originating process exits?
** Can a process forge another process's contexts on its own?
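As a starting point for the capability and forgery questions, a hedged driver-API sketch: create a context and dump its opaque <tt>CUcontext</tt> handle so that a second process (not shown) can attempt <tt>cuCtxPushCurrent()</tt> on the raw value. Whether such a foreign handle is rejected is exactly what's under test; the output filename is arbitrary.
<pre>
#include <stdio.h>
#include <unistd.h>
#include <cuda.h>

// Create a context and write its opaque handle out, so that another process
// can attempt cuCtxPushCurrent() on the raw value (the forgery experiment).
int main(void){
    CUdevice dev;
    CUcontext ctx;

    if(cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS){
        return 1;
    }
    if(cuCtxCreate(&ctx, 0, dev) != CUDA_SUCCESS){
        return 1;
    }
    printf("CUcontext handle: %p\n", (void *)ctx); // opaque, pointer-sized value
    FILE *fp = fopen("ctxhandle", "w");            // arbitrary filename
    if(fp){
        fprintf(fp, "%p\n", (void *)ctx);
        fclose(fp);
    }
    pause(); // keep the context alive while the peer process probes it
    return 0;
}
</pre>
The peer would read the value back, cast it to <tt>CUcontext</tt>, and call <tt>cuCtxPushCurrent()</tt>; <tt>CUDA_ERROR_INVALID_CONTEXT</tt> versus success distinguishes a genuine capability check from mere obscurity.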
===Protection===
* What mechanisms, if any, exist to protect memory? At what granularities (of address and access) do they operate? (a probing sketch follows this list)
* How is memory protection split across hardware, kernelspace, and userspace?
** Any userspace protection can, of course, be trivially subverted
* Are code and data memories separated (a [http://en.wikipedia.org/wiki/Harvard_architecture Harvard architecture]), or unified ([http://en.wikipedia.org/wiki/Von_Neumann_architecture Von Neumann architecture])?
** In the case of Von Neumann or [http://en.wikipedia.org/wiki/Modified_Harvard_architecture Modified Harvard], is there [http://en.wikipedia.org/wiki/Executable_space_protection execution protection]?
* What memories, if any, are scrubbed between kernels' execution?
* How many different regions can be tracked? How many contexts? What behavior exists at these limits?
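One way to probe write-protection granularity (a sketch assuming the runtime API and illustrative offsets, not a definitive methodology): launch a single-thread kernel that writes at increasing distances past the end of an allocation, and note where, if anywhere, errors begin to be reported. On hardware or drivers where a fault poisons the context, each probe may need its own process; that behavior is itself part of the answer.
<pre>
#include <stdio.h>
#include <cuda_runtime.h>

// Write a word at a given byte offset past the end of the buffer.
__global__ void poke(char *past_end, size_t off){
    *(unsigned *)(past_end + off) = 0xdeadbeefu;
}

int main(void){
    char *buf;
    const size_t len = 1u << 20;

    if(cudaMalloc((void **)&buf, len) != cudaSuccess){
        return 1;
    }
    // Probe successively larger overruns; where errors begin (if anywhere)
    // bounds the protection granularity. A faulted context may need to be
    // torn down and recreated between probes on some hardware.
    for(size_t off = 0 ; off <= (1u << 16) ; off += 4096){
        poke<<<1, 1>>>(buf + len, off);
        cudaError_t err = cudaThreadSynchronize(); // CUDA 3.x-era sync call
        printf("overrun of %zu bytes: %s\n", off, cudaGetErrorString(err));
    }
    cudaFree(buf);
    return 0;
}
</pre>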
===Variation===
* Have these mechanisms changed over various hardware?
** The "Fermi" hardware (Compute Capability 2.0) adds unified addressing and caches for global memory. Effects?
* Have these mechanisms changed over the course of various driver releases? (a version-recording sketch follows this list)
* Open source efforts (particularly the [http://nouveau.freedesktop.org/wiki/ nouveau project]) are working on their own drivers.
** What must these drivers address?
* How is the situation affected by multiple devices, whether in an [http://en.wikipedia.org/wiki/Scalable_Link_Interface SLI]/[http://en.wikipedia.org/wiki/ATI_CrossFire CrossFire] setup or not?
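Since the answers above will likely vary by chip and driver release, each experiment's output ought to be tagged with its environment. A small sketch using standard runtime query calls:
<pre>
#include <stdio.h>
#include <cuda_runtime.h>

// Record the hardware and software against which results were gathered, so
// that behavior can be compared across GPUs and driver releases.
int main(void){
    int dcount, drv, rt;
    struct cudaDeviceProp prop;

    if(cudaGetDeviceCount(&dcount) != cudaSuccess){
        return 1;
    }
    cudaDriverGetVersion(&drv);
    cudaRuntimeGetVersion(&rt);
    printf("driver %d, runtime %d, %d device(s)\n", drv, rt, dcount);
    for(int i = 0 ; i < dcount ; ++i){
        if(cudaGetDeviceProperties(&prop, i) == cudaSuccess){
            printf("device %d: %s, compute capability %d.%d\n",
                   i, prop.name, prop.major, prop.minor);
        }
    }
    return 0;
}
</pre>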
===Accountability===
* What forensic data, if any, is created by typical CUDA programs? Adversarial programs? Broken programs?
* What relationship exists between CPU processes and GPU kernels?
==Experiments==
===Memory space exploration===
Probe memory via attempts to read, write and execute various addresses (a minimal probing kernel is sketched after the list), including:
* those unallocated within the probing context,
* those unallocated by any running context, and
* those unallocated by any existing context.
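A minimal read/write probe (runtime API assumed; the execute case would require a crafted kernel image and isn't sketched here): the target is taken as a raw integer so that arbitrary, possibly unallocated, device addresses can be tried, with the post-launch error code as the observable.
<pre>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <cuda_runtime.h>

// Attempt a read (and optionally a write) of an arbitrary device address.
__global__ void probe(uintptr_t addr, unsigned *out, int dowrite){
    volatile unsigned *p = (volatile unsigned *)addr;

    *out = *p;            // read probe
    if(dowrite){
        *p = 0xcafebabeu; // write probe
    }
}

int main(int argc, char **argv){
    unsigned *out;
    uintptr_t target = argc > 1 ? strtoull(argv[1], NULL, 0) : 0;

    if(cudaMalloc((void **)&out, sizeof(*out)) != cudaSuccess){
        return 1;
    }
    probe<<<1, 1>>>(target, out, 0);
    cudaError_t err = cudaThreadSynchronize();
    printf("probe of %p: %s\n", (void *)target, cudaGetErrorString(err));
    cudaFree(out);
    return 0;
}
</pre>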
===Context exploration===
Determine whether CUDA contexts can be moved or shared between processes (a <tt>fork(2)</tt> sketch follows the list):
* <tt>fork(2)</tt> and execute <tt>cudaMalloc(3)</tt> without creating a new context
** If this works, see whether the change is reflected in the parent process
** Ensure that PPID isn't just being checked (dubious, but possible) by <tt>fork(2)</tt>ing twice
* Transmit the CUcontext body to another process via IPC or the filesystem, and repeat the tests
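A sketch of the <tt>fork(2)</tt> experiment using the runtime API (error handling trimmed; the allocation sizes are arbitrary): force context creation in the parent, fork, and see whether the child can keep allocating without any reinitialization.
<pre>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <cuda_runtime.h>

// Does the CUDA context created in the parent survive, and remain usable,
// across fork(2)?
int main(void){
    void *p;

    // The first allocation forces the runtime to create a context.
    if(cudaMalloc(&p, 4096) != cudaSuccess){
        return 1;
    }
    pid_t pid = fork();
    if(pid == 0){
        // Child: reuse the inherited context state without reinitializing.
        void *q;
        cudaError_t err = cudaMalloc(&q, 4096);
        printf("child cudaMalloc: %s\n", cudaGetErrorString(err));
        _exit(err == cudaSuccess ? 0 : 1);
    }
    wait(NULL);
    // Parent: confirm its own context still behaves after the child's attempt.
    void *r;
    printf("parent cudaMalloc: %s\n", cudaGetErrorString(cudaMalloc(&r, 4096)));
    return 0;
}
</pre>
The double-<tt>fork(2)</tt> variant from the last sub-bullet only changes the process tree, not the structure of this code.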
