Fast UNIX Servers

From dankwiki

Revision as of 21:38, 25 June 2009

Everyone ought to start with Dan Kegel's classic site, "The C10K Problem" (still updated from time to time). Jeff Darcy's "High-Performance Server Architecture" is in much the same vein. Everything here is advanced follow-up material to these excellent works and, of course, to the books of W. Richard Stevens.

Queueing Theory

Event Cores

Edge and Level Triggering

  • Historic interfaces like POSIX.1g/POSIX.1-2001's select(2) and POSIX.1-2001's poll(2) are level-triggered
  • epoll (via EPOLLET) and kqueue (via EV_CLEAR) provide edge-triggered semantics
  • FIXME a thorough comparison of these is sorely needed
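
The practical consequence of edge triggering is that a reader must drain the descriptor until EAGAIN, since no further event arrives for data already reported. A minimal Linux-only sketch of that discipline follows; the helper names (set_nonblocking, watch_edge_triggered, drain) are hypothetical and not from any library, and a real server would process the data rather than merely count it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; edge-triggered use requires it,
 * or the final read() of a drain loop would block. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Register fd on epoll instance epfd for edge-triggered read readiness. */
static int watch_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev;

    if (set_nonblocking(fd) < 0)
        return -1;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Read until EAGAIN or EOF. Returns the number of bytes consumed, or -1 on
 * a real error. Stopping before EAGAIN would strand unread data, because
 * EPOLLET reports only the transition to readable. */
static long drain(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;               /* EOF */
        if (errno == EINTR)
            continue;                   /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* fully drained */
        return -1;
    }
}
```

With a level-triggered registration (EPOLLIN alone) the same loop still works, but a partial read is also safe, because the next epoll_wait(2) reports the descriptor again.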

A Garden of Interfaces

  • readv(2), writev(2) (FreeBSD's sendfile(2) has struct iovec header and trailer arrays handily attached, via struct sf_hdtr)
  • splice(2), vmsplice(2) and tee(2) on Linux since version 2.6.17
    • (When the first page of results for your interface centers largely on exploits, might it be time to reconsider your design assumptions?)
  • sendfile(2) (with charmingly different interfaces on FreeBSD and Linux)
    • On Linux since 2.6.2x (FIXME get a link), sendfile(2) is implemented in terms of splice(2)
  • aio_* and friends (POSIX AIO) for asynchronous I/O
  • mmap(2) and an entire associated bag of tricks (FIXME detail)
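
As a small illustration of the first entry, gathering a header and a body into one writev(2) call avoids both an extra system call and a copy into a contiguous buffer. This is only a sketch: send_response is a hypothetical helper, and it assumes the caller handles short writes (which a non-blocking socket can produce).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write hdr followed by body with a single writev(2) call.
 * Returns the number of bytes written (possibly fewer than requested)
 * or -1 on error; retrying the remainder is left to the caller. */
static ssize_t send_response(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;      /* writev never modifies the buffers */
    iov[0].iov_len = strlen(hdr);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = strlen(body);
    return writev(fd, iov, 2);
}
```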

The Full Monty: A Theory of UNIX Servers

We must mix and match:

  • Many event sources, of multiple types and possibly various triggering mechanisms (edge- vs level-triggered):
    • Socket descriptors, pipes
    • File descriptors referring to actual files (these usually have different blocking semantics)
    • Signals, perhaps being used for asynchronous I/O with descriptors (signalfd(2) on Linux unifies these with socket descriptors; kqueue supports EVFILT_SIGNAL events)
    • Timers (timerfd_create(2) on Linux unifies these with socket descriptors; kqueue supports EVFILT_TIMER events)
    • Condition variables and/or mutexes becoming available
    • Filesystem events (inotify(7) on Linux, EVFILT_VNODE with kqueue)
    • Networking events (netlink(7) (PF_NETLINK) sockets on Linux, EVFILT_NETDEV with kqueue)
  • One or more event notifiers (epoll or kqueue fd)
  • One or more event vectors, into which notifiers dump events
    • kqueue supports vectorized registration of event changes, extending the issue
  • Threads -- one event notifier per thread? One shared event notifier with one event vector per thread? One shared event notifier feeding one shared event vector? Work-stealing/handoff?
    • It is doubtful (but not, AFAIK, proven impossible) that one scheduling/sharing solution is optimal for all workloads

DoS Prevention or, Maximizing Useful Service

  • TCP SYN -- to Syncookie or nay? The "half-open session" isn't nearly as meaningful or important a concept on modern networking stacks as it was in 2000.
    • Syncookies have historically cost long-fat-pipe options (window scaling, SACK), restricted the MSS to a handful of values, etc., but recent work (in Linux, at least) has improved them (my gut feeling: nay)
  • Various attacks like slowloris, TCPPersist as written up in Phrack 0x0d-0x42-0x09, Outpost24 etc...
  • What are the winning feedback mechanisms? Fractals and queueing theory, oh my! (FIXME detail)

See Also