Check out my first novel, midnight's simulacra!
Libtorque: Difference between revisions
No edit summary |
No edit summary |
||
Line 1: | Line 1: | ||
My project for Professor [http://vuduc.org/ Rich Vuduc's] Fall 2009 [[High Performance Parallel Computing|CSE6230]], libtorque is a multithreaded event library for UNIX designed to take full advantage of the manycore, heterogenous, [[NUMA]] future. Previous, non-threaded event libraries include [http://www.monkey.org/~provos/libevent/ libevent], [http://software.schmorp.de/pkg/libev.html libev] and [http://liboop.ofb.net liboop]. My [http://dank.qemfd.net/tabpower/cse6230proposal.pdf project proposal] suggests motivation for libtorque: I believe it necessary to take scheduling and memory-placement decisions into account to most optimally handle events, especially on manycore machines and ''especially'' to handle unexpected traffic sets (denial of service attacks, oversubscribed pipes, mixed-latency connections, etc). | My project for Professor [http://vuduc.org/ Rich Vuduc's] Fall 2009 [[High Performance Parallel Computing|CSE6230]], <tt>libtorque</tt> is a multithreaded event library for UNIX designed to take full advantage of the manycore, heterogenous, [[NUMA]] future. Previous, non-threaded event libraries include [http://www.monkey.org/~provos/libevent/ libevent], [http://software.schmorp.de/pkg/libev.html libev] and [http://liboop.ofb.net liboop]. My [http://dank.qemfd.net/tabpower/cse6230proposal.pdf project proposal] suggests motivation for <tt>libtorque</tt>: I believe it necessary to take scheduling and memory-placement decisions into account to most optimally handle events, especially on manycore machines and ''especially'' to handle unexpected traffic sets (denial of service attacks, oversubscribed pipes, mixed-latency connections, etc). | ||
==Resources== | ==Resources== | ||
* [[git]] hosting from [http://github.com GitHub]: | * [[git]] hosting from [http://github.com GitHub]: | ||
Line 9: | Line 9: | ||
* 2009-12-10: CSE 6230 due date | * 2009-12-10: CSE 6230 due date | ||
==Design/Functionality== | ==Design/Functionality== | ||
libtorque exposes an affinity-managing, continuations-based, architecture-aware, [[pthreads|multithreaded]] scheduler and various utility functions, including an architecture-, OS-, and thread-aware [[allocators|allocator]] with strong scheduler feedbacks. It can analyze arbitrary object code via <tt>libdl</tt> and <tt>libelf</tt>, discover where instructions live, and allocate around those areas. It can take dynamically balanced or (static) asymmetric interrupt loads into account. By making service decisions and allocations based on whole-system effects, libtorque provides low latency and high throughput under even pedantically asymmetric, irregular loads. By carefully distributing edge-triggered event descriptors among various processors' notification sets, highly scalable use is made of advanced low-level primitives such as <tt>kqueue</tt> and <tt>epoll</tt>. Through the use of lazy asynchronous I/O, each (expensive!) core is kept busy doing real work. | <tt>libtorque</tt> exposes an affinity-managing, continuations-based, architecture-aware, [[pthreads|multithreaded]] scheduler and various utility functions, including an architecture-, OS-, and thread-aware [[allocators|allocator]] with strong scheduler feedbacks. It can analyze arbitrary object code via <tt>libdl</tt> and <tt>libelf</tt>, discover where instructions live, and allocate around those areas. It can take dynamically balanced or (static) asymmetric interrupt loads into account. By making service decisions and allocations based on whole-system effects, <tt>libtorque</tt> provides low latency and high throughput under even pedantically asymmetric, irregular loads. By carefully distributing edge-triggered event descriptors among various processors' notification sets, highly scalable use is made of advanced low-level primitives such as <tt>kqueue</tt> and <tt>epoll</tt>. Through the use of lazy asynchronous I/O, each (expensive!) core is kept busy doing real work. | ||
===System discovery=== | ===System discovery=== |
Revision as of 11:55, 31 October 2009
My project for Professor Rich Vuduc's Fall 2009 CSE6230, libtorque is a multithreaded event library for UNIX designed to take full advantage of the manycore, heterogenous, NUMA future. Previous, non-threaded event libraries include libevent, libev and liboop. My project proposal suggests motivation for libtorque: I believe it necessary to take scheduling and memory-placement decisions into account to most optimally handle events, especially on manycore machines and especially to handle unexpected traffic sets (denial of service attacks, oversubscribed pipes, mixed-latency connections, etc).
Resources
- git hosting from GitHub:
- Available from the dankamongmen/libtorque project page
- git clone from git://github.com/dankamongmen/libtorque.git
- bugzilla, hosted here on http://dank.qemfd.net/bugzilla/
Milestones
- 2009-11-19: CSE 6230 checkpoint
- 2009-12-10: CSE 6230 due date
Design/Functionality
libtorque exposes an affinity-managing, continuations-based, architecture-aware, multithreaded scheduler and various utility functions, including an architecture-, OS-, and thread-aware allocator with strong scheduler feedbacks. It can analyze arbitrary object code via libdl and libelf, discover where instructions live, and allocate around those areas. It can take dynamically balanced or (static) asymmetric interrupt loads into account. By making service decisions and allocations based on whole-system effects, libtorque provides low latency and high throughput under even pedantically asymmetric, irregular loads. By carefully distributing edge-triggered event descriptors among various processors' notification sets, highly scalable use is made of advanced low-level primitives such as kqueue and epoll. Through the use of lazy asynchronous I/O, each (expensive!) core is kept busy doing real work.
System discovery
- Full support for CPUID as most recently defined by Intel and AMD (more advanced, as of 2009-10-31, than x86info)
- Full support for Linux and FreeBSD's native cpuset libraries, and SGI's libcpuset and libNUMA
- Discovers and makes available, for each processor type:
- ISA, ISA-specific capabilities, and number of concurrent threads supported (degrees of SMT)
- Line count, associativity, line length, geometry, and type of all caches
- Entry count, associativity, page size and type of all TLBs
- Inclusiveness relationships among cache and TLB levels
- Interconnection topology, APIC ID's, and how caches are shared among processors
- More: properties of hardware prefetching, ability to support non-temporal loads (MOVNTDQA, PREFETCHNTA, etc)
- Discovers and makes available, for each memory node type:
- Connected processor groups and relative distance information
- Number of pages and bank geometry
- More: OS page prefetching policy, error-recovery info
References/Prior Art
- Philip Mucci's "Linux Multicore Performance Analysis and Optimization in a Nutshell", delivered at NOTUR 2009
- Elmeleegy et al's "Lazy Asynchronous I/O", USENIX 2004
- PGAS: Kathy Yelick's "Performance and Productivity Opportunities using Global Address Space Programming Models", 2006
- Emery Berger's Hoard and other manycore-capable allocators (libumem aka magazined slab, Google's ctmalloc, etc).
- A lot of discursive theorizing/ruminating is captured on my Fast UNIX Servers page
Phatty ASCII Logo
888 ,e, 888 d8 "...tear the roof off the sucka..." 888 " 888 88e d88 e88 88e 888,8, e88 888 8888 8888 ,e e, 888 888 888 888b d88888 d888 888b 888 " d888 888 8888 8888 d88 88b 888 888 888 888P 888 Y888 888P 888 Y888 888 Y888 888P 888 , 888 888 888 88" 888 "88 88" 888 "88 888 "88 88" "YeeP" _____________________________________________ 888 _________________ continuation-based unix i/o for manycore numa\888/© nick black 2009