[[CUDA]] (and [http://en.wikipedia.org/wiki/GPGPU General-Purpose Graphics Processing Unit] programming in general) is rapidly becoming a mainstay of high-performance computing. As CUDA and OpenCL move off of the workstation and into the server -- off of the console and into the cluster -- the security of these systems will become a critical part of the associated [http://en.wikipedia.org/wiki/Trusted_computing_base trusted computing base]. Even ignoring the issue of multiuser security, the properties of isolation and (to a lesser extent) confidentiality are important for debugging, profiling and reproducibility. I've authored cudafucker and associated tools to investigate the security properties -- primarily the means and parameters of memory protection, and the division of protection between software and hardware -- of [[CUDA]] on NVIDIA hardware since the G80 architecture.
==Questions==
===Memory details===
* What address translations, if any, are performed?
** If address translation is performed, can physical memory be multiply aliased?
* How are accesses affected by the use of incorrect state space affixes?
** Compute Capability 2.0 introduces unified addressing, but still supports modal addressing.
* How do physical addresses correspond to distinct memory regions?
===Driver details===
* Is a CUDA context a true [http://en.wikipedia.org/wiki/Capability-based_security security capability]?
** Can a process modify details of the contexts it creates?
** Can a process transmit its contexts to another? Will they persist if the originating process exits?
** Can a process forge another process's contexts on its own?
===Protection===
* What mechanisms, if any, exist to protect memory? At what granularities (of address and access) do they operate?
* How is memory protection split across hardware, kernelspace, and userspace?
** Any userspace protection can, of course, be trivially subverted.
* Are code and data memories separated (a [http://en.wikipedia.org/wiki/Harvard_architecture Harvard architecture]) or unified (a [http://en.wikipedia.org/wiki/Von_Neumann_architecture Von Neumann architecture])?
** In the case of a Von Neumann or [http://en.wikipedia.org/wiki/Modified_Harvard_architecture Modified Harvard] design, is there [http://en.wikipedia.org/wiki/Executable_space_protection execution protection]?
* What memories, if any, are scrubbed between kernels' executions?
* How many distinct regions can be tracked? How many contexts? What behavior results at these limits?
===Variation===
* Have these mechanisms changed across hardware generations?
** The "Fermi" hardware (Compute Capability 2.0) adds unified addressing and caches for global memory. What are the effects?
* Have these mechanisms changed over the course of various driver releases?
* Open source efforts (particularly the [http://nouveau.freedesktop.org/wiki/ nouveau project]) are working on their own drivers.
** What must these drivers address?
* How is the situation affected by multiple devices, whether or not they're in an [http://en.wikipedia.org/wiki/Scalable_Link_Interface SLI]/[http://en.wikipedia.org/wiki/ATI_CrossFire CrossFire] configuration?
===Accountability===
* What forensic data, if any, is created by typical CUDA programs? Adversarial programs? Broken programs?
* What relationship exists between CPU processes and GPU kernels?
==Experiments==
===Memory space exploration===
Probe memory via attempts to read, write, and execute various addresses, including:
* those unallocated within the probing context,
* those unallocated by any running context, and
* those unallocated by any existing context.
===Context exploration===
Determine whether CUDA contexts can be moved or shared between processes:
* <tt>fork(2)</tt> and execute <tt>cudaAlloc(3)</tt> without creating a new context.
** If this works, see whether the change is reflected in the parent process.
** Ensure that the PPID isn't merely being checked (dubious, but possible) by <tt>fork(2)</tt>ing twice.
* Transmit the CUcontext body to another process via IPC or the filesystem, and repeat the tests.
Revision as of 10:24, 11 April 2010