CUBAR

CUDA (and general-purpose GPU programming more broadly) is rapidly becoming a mainstay of high-performance computing. As CUDA and OpenCL move off the workstation and into the server -- off the console and into the cluster -- the security of these systems becomes a critical part of the associated trusted computing base. Even ignoring multiuser security, the properties of isolation and (to a lesser extent) confidentiality are important for debugging, profiling and reproducibility. I've authored cudafucker and associated tools to investigate the security properties of CUDA on NVIDIA hardware since the G80 architecture -- primarily the means and parameters of memory protection, and the division of protection between software and hardware.

Questions

Memory details

  • What address translations, if any, are performed? (a small allocation-reuse sketch follows this list)
    • If address translation is performed, can physical memory be multiply aliased?
  • How are accesses affected by use of incorrect state space affixes?
    • Compute Capability 2.0 introduces unified addressing, but still supports modal addressing
  • How do physical addresses correspond to distinct memory regions?
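
For the translation question, a minimal probe, assuming only the CUDA runtime API and nothing about the tools above: allocate, free, and reallocate, then compare the pointers handed back. Address reuse across allocations (and stable-looking addresses across runs) suggests a per-context virtual space rather than raw physical addresses.

  #include <stdio.h>
  #include <cuda_runtime.h>

  /* Allocate, free, and reallocate a buffer, printing the device addresses
   * returned; reuse hints at translation through a per-context virtual
   * space. Error handling is minimal for brevity. */
  int main(void){
      void *p1, *p2;

      if(cudaMalloc(&p1, 1 << 20) != cudaSuccess){
          return 1;
      }
      printf("first allocation:  %p\n", p1);
      cudaFree(p1);
      if(cudaMalloc(&p2, 1 << 20) != cudaSuccess){
          return 1;
      }
      printf("second allocation: %p\n", p2);
      cudaFree(p2);
      return 0;
  }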

Driver details

  • Is a CUDA context a true security capability? (a handle-inspection sketch follows this list)
    • Can a process modify details of the contexts it creates?
    • Can a process transmit its contexts to another? Will they persist if the originating process exits?
    • Can a process forge another process's contexts on its own?
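
As a starting point for the capability questions, a minimal sketch using the driver API (link with -lcuda): CUcontext is an opaque handle, so all a process directly observes is the handle value itself; whatever transferability or forgeability exists must be probed through the driver, as in the experiments below.

  #include <stdio.h>
  #include <cuda.h>

  /* Create a context on device 0 and print the opaque handle the process
   * holds; the experiments below pass such handles between processes. */
  int main(void){
      CUdevice dev;
      CUcontext ctx;

      if(cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS){
          return 1;
      }
      if(cuCtxCreate(&ctx, 0, dev) != CUDA_SUCCESS){
          return 1;
      }
      printf("context handle: %p\n", (void *)ctx);
      cuCtxDestroy(ctx);
      return 0;
  }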

Protection

  • What mechanisms, if any, exist to protect memory? At what granularities (of address and access) do they operate? (a granularity probe sketch follows this list)
  • How is memory protection split across hardware, kernelspace, and userspace?
    • Any userspace protection can, of course, be trivially subverted
  • Are code and data memories separated (a Harvard architecture), or unified (a von Neumann architecture)?
  • What memories, if any, are scrubbed between kernels' execution?
  • How many different regions can be tracked? How many contexts? What behavior exists at these limits?
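
One way to probe granularity, sketched below assuming runtime-API semantics: store at increasing offsets past the end of an allocation and note where (and whether) a fault first appears. Note that a faulting kernel typically poisons its context, so a rigorous probe would recreate the context between attempts.

  #include <stdio.h>
  #include <cuda_runtime.h>

  /* Store one byte at base + off; offsets beyond the allocation probe
   * whatever protection the hardware/driver enforces. */
  __global__ void poke(unsigned char *p, size_t off){
      p[off] = 0xff;
  }

  int main(void){
      unsigned char *buf;
      const size_t len = 4096;

      if(cudaMalloc(&buf, len) != cudaSuccess){
          return 1;
      }
      for(size_t off = len ; off <= len * 16 ; off += len){
          poke<<<1, 1>>>(buf, off);
          cudaError_t e = cudaDeviceSynchronize();
          printf("offset %zu: %s\n", off, cudaGetErrorString(e));
      }
      cudaFree(buf);
      return 0;
  }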

Variation

  • Have these mechanisms changed over various hardware?
    • The "Fermi" hardware (Compute Capability 2.0) adds unified addressing and caches for global memory. Effects?
  • Have these mechanisms changed over the course of various driver releases?
  • Open source efforts (particularly the nouveau project) are working on their own drivers.
    • What all must these drivers address?
  • How is the situation affected by multiple devices, whether in an SLI/CrossFire setup or not?
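
Since the answers likely vary with hardware generation, any probe ought to record what it ran on. A minimal enumeration sketch, assuming the runtime API:

  #include <stdio.h>
  #include <cuda_runtime.h>

  /* Print each device's name and compute capability, so probe results can
   * be correlated with hardware generation (e.g. pre-Fermi vs. Fermi). */
  int main(void){
      int count;

      if(cudaGetDeviceCount(&count) != cudaSuccess){
          return 1;
      }
      for(int d = 0 ; d < count ; ++d){
          struct cudaDeviceProp prop;
          if(cudaGetDeviceProperties(&prop, d) == cudaSuccess){
              printf("device %d: %s (compute capability %d.%d)\n",
                     d, prop.name, prop.major, prop.minor);
          }
      }
      return 0;
  }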

Accountability

  • What forensic data, if any, is created by typical CUDA programs? Adversarial programs? Broken programs?
  • What relationship exists between CPU processes and GPU kernels?

Experiments

Memory space exploration

Probe memory via attempts to read, write and execute various addresses (a read-probe sketch follows the list), including:

  • those unallocated within the probing context,
  • those unallocated by any running context, and
  • those unallocated by any existing context.
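
A minimal read-probe sketch, assuming the runtime API: a kernel dereferences an arbitrary device address, and the host inspects the resulting launch status. The target address below is purely illustrative, and each fault typically requires a fresh context before the next probe; write and execute probes follow the same shape.

  #include <stdio.h>
  #include <stdint.h>
  #include <cuda_runtime.h>

  /* Attempt a load from an arbitrary address; a fault surfaces on the host
   * as an error from cudaDeviceSynchronize(). */
  __global__ void probe_read(const uint32_t *addr, uint32_t *out){
      *out = *addr;
  }

  int main(void){
      uint32_t *out;
      const uintptr_t target = 0xdeadbeef000ULL; /* hypothetical unallocated address */

      if(cudaMalloc(&out, sizeof(*out)) != cudaSuccess){
          return 1;
      }
      probe_read<<<1, 1>>>((const uint32_t *)target, out);
      cudaError_t e = cudaDeviceSynchronize();
      printf("read of %p: %s\n", (void *)target, cudaGetErrorString(e));
      cudaFree(out);
      return 0;
  }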

Context exploration

Determine whether CUDA contexts can be moved or shared between processes:

  • fork(2) and call cudaMalloc(3) without creating a new context (see the sketch following this list)
    • If this works, see whether the change is reflected in the parent process
    • Ensure that the driver isn't merely checking the PPID (dubious, but possible) by fork(2)ing twice
  • Transmit the CUcontext body to another process via IPC or the filesystem, and repeat the tests
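
A minimal sketch of the fork(2) test, assuming the runtime API (which creates its context implicitly on first use): allocate in the parent, fork, and attempt a further allocation in the child without any explicit context creation. Whether the child's call succeeds, and whether its effects are visible to the parent, speaks to context inheritance.

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/wait.h>
  #include <cuda_runtime.h>

  int main(void){
      void *p;

      if(cudaMalloc(&p, 4096) != cudaSuccess){ /* implicit context creation */
          return 1;
      }
      pid_t pid = fork();
      if(pid == 0){ /* child: try to use whatever context state was inherited */
          void *q;
          cudaError_t e = cudaMalloc(&q, 4096);
          printf("child cudaMalloc: %s\n", cudaGetErrorString(e));
          _exit(e == cudaSuccess ? 0 : 1);
      }
      int status;
      waitpid(pid, &status, 0);
      printf("child exited %d\n", WEXITSTATUS(status));
      cudaFree(p);
      return 0;
  }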

Determine how many contexts can be created within a single process, and across the system as a whole.
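
A minimal counting sketch via the driver API (link with -lcuda): create contexts until creation fails. Running several instances concurrently probes the system-wide limit; per-context device memory overhead presumably determines where the loop stops.

  #include <stdio.h>
  #include <cuda.h>

  int main(void){
      CUdevice dev;
      CUcontext ctx;
      unsigned n = 0;

      if(cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS){
          return 1;
      }
      while(cuCtxCreate(&ctx, 0, dev) == CUDA_SUCCESS){
          ++n; /* deliberately leak each context; we're counting capacity */
      }
      printf("created %u contexts before failure\n", n);
      return 0;
  }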