Check out my first novel, midnight's simulacra!
CUBAR
CUDA (and General-Purpose Graphics Processing Unit programming in general) is rapidly becoming a mainstay of high-performance computing. As CUDA and OpenCL move off of the workstation, and into the server -- off of the console, and into the cluster -- the security of these systems will become critical parts of the associated trusted computing base. Even ignoring the issue of multiuser security, the properties of isolation and (to a lesser extent) confidentiality are important for debugging, profiling and reproducibility. I've authored cudafucker and associated tools to investigate the security properties -- primarily the means and parameters of memory protection, and the division of protection between soft- and hardware -- of CUDA on NVIDIA hardware since the G80 architecture.
Questions
Memory details
- What address translations, if any, are performed?
- If address translation is performed, can physical memory be multiply aliased?
- How are accesses affected by use of incorrect state space affixes?
- Compute Capability 2.0 introduces unified addressing, but still supports modal addressing
- How do physical addresses correspond to distinct memory regions?
Driver details
- Is a CUDA context a true security capability?
- Can a process modify details of the contexts it creates?
- Can a process transmit its contexts to another? Will they persist if the originating process exits?
- Can a process forge another process's contexts on its own?
Protection
- What mechanisms, if any, exist to protect memory? At what granularities (of address and access) do they operate?
- How is memory protection split across hardware, kernelspace, and userspace?
- Any userspace protection can, of course, be trivially subverted
- Are code and data memories separated (a Harvard architecture), or unified (Von Neumann architecture)?
- In the case of Von Neumann or Modified Harvard, is there execution protection?
- What memories, if any, are scrubbed between kernels' execution?
- How many different regions can be tracked? How many contexts? What behavior exists at these limits?
Variation
- Have these mechanisms changed over various hardware?
- The "Fermi" hardware (Compute Capability 2.0) adds unified addressing and caches for global memory. Effects?
- Have these mechanisms changed over the course of various driver releases?
- Open source efforts (particularly the nouveau project) are working on their own drivers.
- What all needs be addressed by these softwares?
- How is the situation affected by multiple devices, whether in an SLI/CrossFire setup or not?
Accountability
- What forensic data, if any, is created by typical CUDA programs? Adversarial programs? Broken programs?
- What relationship exists between CPU processes and GPU kernels?
Experiments
Memory space exploration
Probe memory via attempts to read, write and execute various addresses, including:
- those unallocated within the probing context,
- those unallocated by any running context, and
- those unallocated by any existing context.
Context exploration
Determine whether CUDA contexts can be moved or shared between processes:
- fork(2) and execute cudaAlloc(3) without creating a new context
- If this works, see whether the change is reflected in the parent binary
- Ensure that PPID isn't just being checked (dubious, but possible) by fork(2)ing twice
- Transmit the CUcontext body to another process via IPC or the filesystem, and repeat the tests
Determine how many contexts can be created across a process, and across the system.