Fast UNIX Servers
From dankwiki
Revision as of 22:31, 25 June 2009
Everyone ought to start with Dan Kegel's classic site, "The C10K Problem" (still updated from time to time). Jeff Darcy's "High-Performance Server Architecture" covers much of the same ground. Everything here is advanced followup material to these excellent works, and of course to the books of W. Richard Stevens.
Queueing Theory
- "Introduction to Queueing"
- Leonard Kleinrock's peerless Queueing Systems (Volume 1: Theory, Volume 2: Computer Applications)
Event Cores
- epoll on Linux, /dev/poll on Solaris, kqueue on FreeBSD
- liboop, libev and libevent
- Ulrich Drepper's "The Need for Asynchronous, Zero-Copy Network I/O"
  - If nothing else, Drepper's plans tend to become sudden and crushing realities in the glibc world
Edge and Level Triggering
- Historic interfaces like POSIX.1g/POSIX.1-2001's select(2) and POSIX.1-2001's poll(2) were level-triggered
- epoll (via EPOLLET) and kqueue (via EV_CLEAR) provide edge-triggered semantics
- FIXME: a thorough comparison of these is sorely needed
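Until that comparison exists, the core difference can be seen with a small Linux-only experiment. This is a minimal sketch, not a benchmark: the helper name count_wakeups is hypothetical, and the test fixture (one byte written to a pipe and deliberately never read) is an assumption chosen to make the behavior visible. A level-triggered registration keeps reporting the descriptor for as long as data is pending; an EPOLLET registration reports the transition once and then stays quiet until new data arrives.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Count how many of `tries` consecutive epoll_wait() calls report the read
 * end of a pipe as ready after one byte is written and never consumed.
 * `flags` is 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns the number of wakeups, or -1 if setup fails.
 * (count_wakeups is a hypothetical name used only for this sketch.) */
static int count_wakeups(unsigned int flags, int tries)
{
    assert(tries > 0);

    int p[2];
    if (pipe(p) != 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | flags };
    ev.data.fd = p[0];

    int wakeups = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        wakeups = 0;
        for (int i = 0; i < tries; i++) {
            struct epoll_event out;
            /* Zero timeout: poll without blocking. The byte stays unread. */
            if (epoll_wait(ep, &out, 1, 0) == 1)
                wakeups++;
        }
    }

    close(ep);
    close(p[0]);
    close(p[1]);
    return wakeups;
}
```

Calling count_wakeups(0, 3) should report the descriptor on every poll, while count_wakeups(EPOLLET, 3) should report it only on the first. The practical consequence is that an edge-triggered consumer has to drain the descriptor until EAGAIN before waiting again, or it can stall with data still queued.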
A Garden of Interfaces
We all know doddering old read(2) and write(2) (which can't, by the way, be portably used with shared memory). But what about...
- readv(2), writev(2) (FreeBSD's sendfile(2) has struct iovec headers and trailers handily attached, via struct sf_hdtr)
- splice(2), vmsplice(2) and tee(2) on Linux since version 2.6.17
  - (When the first page of results for your interface centers largely on exploits, might it be time to reconsider your design assumptions?)
- sendfile(2) (with charmingly different interfaces on FreeBSD and Linux)
  - On Linux since 2.6.2x (FIXME get a link), sendfile(2) is implemented in terms of splice(2)
- aio_ and friends for asynchronous I/O
- mmap(2) and an entire associated bag of tricks (FIXME detail)
  - most uses of mincore(2) and madvise(2) are questionable at best and likely useless. FIXME defend
  - broad use of mlock(2) as a performance hack is not even really questionable. FIXME defend
  - use of large pages is highly recommended for any large, non-sparse maps. FIXME explain
  - mremap(2) and remap_file_pages(2) on Linux can be used effectively at times
  - There's nothing wrong with MAP_FIXED so long as you've already allocated the region before (see caveats...)
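The MAP_FIXED point can be sketched as a reserve-then-commit pattern. This is an illustration under assumptions, not a recommendation for any particular allocator: the function name reserve_and_commit_first is hypothetical, and anonymous private mappings stand in for whatever a real server would map. The idea is that MAP_FIXED silently replaces anything already mapped at the target address, so it is only safe when the target lies inside a range the program reserved for itself beforehand.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `npages` pages of address space with no access rights, then
 * commit only the first page by mapping over it with MAP_FIXED.
 * Because the target address lies inside our own reservation, MAP_FIXED
 * cannot clobber an unrelated mapping (heap, stack, shared library).
 * On success, stores the page size in *pagesz_out and returns the base of
 * the reservation; returns NULL on failure.
 * (reserve_and_commit_first is a hypothetical name for this sketch.) */
static void *reserve_and_commit_first(size_t npages, size_t *pagesz_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || npages == 0 || pagesz_out == NULL)
        return NULL;

    size_t len = npages * (size_t)ps;

    /* Step 1: let the kernel pick an address and reserve the whole range. */
    void *base = mmap(NULL, len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Step 2: replace the first page of the reservation in place. */
    void *page = mmap(base, (size_t)ps, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (page == MAP_FAILED) {
        munmap(base, len);
        return NULL;
    }
    assert(page == base);   /* MAP_FIXED either fails or honors the address */

    *pagesz_out = (size_t)ps;
    return base;
}
```

The first page of the returned range is readable and writable; the remaining pages stay PROT_NONE and fault on access until they are committed the same way. The caller releases everything with a single munmap(2) over the full reserved length.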
The Full Monty: A Theory of UNIX Servers
We must mix and match:
- Many event sources, of multiple types and possibly various triggering mechanisms (edge- vs level-triggered):
  - Socket descriptors, pipes
  - File descriptors referring to actual files (these usually have different blocking semantics)
  - Signals, perhaps being used for asynchronous I/O with descriptors (signalfd(2) on Linux unifies these with socket descriptors; kqueue supports EVFILT_SIGNAL events)
  - Timers (timerfd_create(2) on Linux unifies these with socket descriptors; kqueue supports EVFILT_TIMER events)
  - Condition variables and/or mutexes becoming available
  - Filesystem events (inotify(7) on Linux, EVFILT_VNODE with kqueue)
  - Networking events (netlink(7) (PF_NETLINK) sockets on Linux, EVFILT_NETDEV with kqueue)
- One or more event notifiers (epoll or kqueue fd)
- One or more event vectors, into which notifiers dump events
  - kqueue supports vectorized registration of event changes, extending the issue
- Threads: one event notifier per thread? One shared event notifier with one event vector per thread? One shared event notifier feeding one shared event vector? Work-stealing/handoff?
  - It is doubtful (but not, AFAIK, proven impossible) that one scheduling/sharing solution is optimal for all workloads
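To make the "unification" of event sources concrete, here is a minimal Linux-only sketch in which one epoll descriptor acts as the notifier for both a signal and a timer. The function name wait_signal_and_timer, the choice of SIGUSR1, the 10 ms timer, and the 1 s wait cap are all assumptions made for illustration; a real server would add its sockets to the same epoll set and would not raise the signal itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

enum { SAW_SIGNAL = 1, SAW_TIMER = 2 };

/* Wait on a single epoll descriptor for a signal and a timer expiration.
 * Returns a bitmask of SAW_SIGNAL / SAW_TIMER, or -1 if setup fails.
 * Side effect: SIGUSR1 is left blocked in the calling thread.
 * (All names here are hypothetical, chosen for this sketch.) */
static int wait_signal_and_timer(void)
{
    int seen = -1;
    struct epoll_event ev = { .events = EPOLLIN };
    struct itimerspec its = {
        .it_value = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 } /* 10 ms */
    };
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    /* The signal must be blocked, or it is delivered the classic way
     * instead of staying pending for the signalfd. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) != 0)
        return -1;

    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (sfd < 0 || tfd < 0 || ep < 0)
        goto out;

    ev.data.fd = sfd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sfd, &ev) != 0)
        goto out;
    ev.data.fd = tfd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, tfd, &ev) != 0)
        goto out;
    if (timerfd_settime(tfd, 0, &its, NULL) != 0)
        goto out;
    if (raise(SIGUSR1) != 0)   /* now pending, so sfd becomes readable */
        goto out;

    seen = 0;
    while (seen != (SAW_SIGNAL | SAW_TIMER)) {
        struct epoll_event got[2];
        int n = epoll_wait(ep, got, 2, 1000);
        if (n <= 0)
            break;             /* timeout or error: report what was seen */
        for (int i = 0; i < n; i++) {
            if (got[i].data.fd == sfd) {
                struct signalfd_siginfo si;
                if (read(sfd, &si, sizeof si) == (ssize_t)sizeof si &&
                    si.ssi_signo == (uint32_t)SIGUSR1)
                    seen |= SAW_SIGNAL;
            } else if (got[i].data.fd == tfd) {
                uint64_t expirations;
                if (read(tfd, &expirations, sizeof expirations) ==
                    (ssize_t)sizeof expirations)
                    seen |= SAW_TIMER;
            }
        }
    }

out:
    if (ep >= 0)
        close(ep);
    if (tfd >= 0)
        close(tfd);
    if (sfd >= 0)
        close(sfd);
    return seen;
}
```

Both sources arrive through the same epoll_wait(2) call and are told apart only by the descriptor stored in data.fd, which is the property the list above relies on. The threading questions remain open: this sketch uses one notifier and one event vector in one thread, the simplest of the arrangements listed.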
DoS Prevention or, Maximizing Useful Service
- TCP SYN: to syncookie or nay? The "half-open session" isn't nearly as meaningful or important a concept on modern networking stacks as it was in 2000.
  - Syncookies have traditionally cost you long-fat-pipe options (window scaling and friends), limited you to fewer MSS values, etc., but recent work (in Linux, at least) has improved them (my gut feeling: nay)
- Various attacks like slowloris, TCPPersist as written up in Phrack 0x0d-0x42-0x09, Outpost24 etc...
- What are the winning feedbacks? Fractals and queueing theory, oh my! FIXME detail
See Also
- "sendfile(): fairly sexy (nothing to do with ECN)" on lkml
- "mmap() sendfile()" on freebsd-hackers
- "sharing memory map between processes (same parent)" on comp.unix.programmer
- "some mmap observations compared to Linux 2.6/OpenBSD" on freebsd-hackers
- Stuart Cheshire's "Laws of Networkdynamics" and "It's the Latency, Stupid"
- "mremap help? or no support for FreeBSD?" on freebsd-hackers
- "Edge-triggered interfaces are too difficult?" on LWN, 2003-05-16
- "Edge- vs Level-Triggered Events" on Pierre Phaneuf's livejournal (pphaneuf)
- "edge-triggered vs level-triggered epoll in kernel 2.6" on comp.unix.programmer, 2004-12-01