Io uring and xdp enter 2024
Latest revision as of 07:24, 9 March 2024
dankblog! 2024-02-15, 1452 EST, at the danktower
Last year (2023), I spent significant time writing code using XDP and io_uring. The latter was delightful, the former less so. Work on these technologies has progressed, and it seems time for an update.
XDP
One of the big problems I had with AF_XDP sockets was the lack of support for large packets. One could typically only use an MTU of 3KB and some change, putting XDP at a disadvantage relative to systems which happily process 8KB frames. Multi-buffer support now lifts this limit. To use it, one must supply the XDP_USE_SG flag when binding the socket, and use xdp.frags as the program's section name. A single RX ring frame might not hold the entire packet, in which case the packet will be written across several ring frames (with a corresponding RX ring descriptor for each). The XDP_PKT_CONTD flag is set in a descriptor's options field if more frames follow. Frames are always written in order, and if there are insufficient frames to hold all of a packet's data, none of it will be written.
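A minimal sketch of how a consumer might group RX descriptors back into packets using the continuation flag described above. The struct rx_desc type and PKT_CONTD constant are local stand-ins that are assumed to mirror struct xdp_desc and XDP_PKT_CONTD from <linux/if_xdp.h> (real code should include that header instead), and packet_span is a hypothetical helper name, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct xdp_desc (assumed layout: addr, len, options). */
struct rx_desc {
    uint64_t addr;    /* offset of the frame within the UMEM */
    uint32_t len;     /* bytes of packet data in this frame */
    uint32_t options; /* carries the continuation flag */
};

/* Stand-in for XDP_PKT_CONTD: "more frames of this packet follow". */
#define PKT_CONTD (1u << 0)

/*
 * Examine descs[0..n) and find the first complete packet.
 * Returns the number of descriptors that packet spans and stores its
 * total length in *total. Returns 0 (leaving *total untouched) if the
 * batch ends before a descriptor without PKT_CONTD is seen.
 */
static size_t packet_span(const struct rx_desc *descs, size_t n,
                          uint32_t *total)
{
    uint32_t sum = 0;

    assert(descs != NULL && total != NULL);
    for (size_t i = 0; i < n; i++) {
        sum += descs[i].len;
        if (!(descs[i].options & PKT_CONTD)) {
            *total = sum; /* last frame of the packet */
            return i + 1;
        }
    }
    return 0; /* packet continues beyond this batch */
}
```

Because frames are written in order, the caller can process the returned span front to back and release the whole run of descriptors at once. Returning 0 for a batch that ends mid-packet is a defensive choice in this sketch.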
io_uring
Easily the most exciting development is integration of futexes (Jens Axboe 2023-07-20 "Add io_uring futex/futexv support"). Multiple futexes ("vectored wait") can be supplied at once using IORING_OP_FUTEX_WAITV (Jens Axboe 2023-09-29 "add support for vectored futex waits"). I used to have the following on my io_uring page:
"It would be nice to have tight integration with condition variables or even mutex/futex (allow me to submit a request to get a lock, and when i get the CQE, i have that lock). Bonus points if the fast (uncontended) path never needs a system call (like mutexes built atop futexes today)."
The new API appears to satisfy all my desires!
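For readers unfamiliar with the "fast path" mentioned in that wishlist, here is a minimal sketch of a conventional futex-backed mutex (the classic three-state design), written against the plain futex(2) system call rather than io_uring. The names fmutex, fmutex_lock, fmutex_unlock, and the futex_syscalls counter are hypothetical and exist only for illustration. The blocking FUTEX_WAIT in the slow path is the piece an IORING_OP_FUTEX_WAIT submission would presumably stand in for; that integration is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Counts trips into the kernel, purely for illustration. */
static _Atomic long futex_syscalls;

/* state: 0 = unlocked, 1 = locked, 2 = locked with possible waiters */
struct fmutex {
    _Atomic uint32_t state;
};

static long futex(_Atomic uint32_t *addr, int op, uint32_t val)
{
    atomic_fetch_add(&futex_syscalls, 1);
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void fmutex_lock(struct fmutex *m)
{
    uint32_t c = 0;

    /* Fast path: uncontended acquire is one CAS, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Slow path: advertise a waiter, then sleep until the lock is free. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void fmutex_unlock(struct fmutex *m)
{
    uint32_t prev = atomic_fetch_sub(&m->state, 1);

    assert(prev != 0); /* unlocking an unlocked mutex is a bug */
    if (prev != 1) {
        /* There may be waiters: fully release and wake one of them. */
        atomic_store(&m->state, 0);
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

An uncontended lock/unlock pair never leaves userspace, so futex_syscalls stays at zero; only contention reaches the kernel.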
Also new is IORING_OP_WAITID for working with process state changes (though I would recommend use of pidfds for this kind of thing in new code).
Kernel 6.8 introduces IORING_SEND_MULTISHOT, bringing the multishot pattern to the TX (send and sendmsg) side. See Axboe's "Send and receive bundles" post for further optimization using "bundles" (these require provided buffers).
IORING_SETUP_REGISTERED_FD_ONLY registers the ring fd for use with IORING_REGISTER_USE_REGISTERED_RING.
io_uring_prep_cmd_sock(3) configures IORING_OP_URING_CMD SQEs to perform setsockopt(2) operations.
io_uring_prep_getxattr(3) and io_uring_prep_setxattr(3) prep SQEs for getxattr(2) and setxattr(2) operations; io_uring_prep_fgetxattr(3) and io_uring_prep_fsetxattr(3) do exactly what you'd think.
Making huge pages as unpleasant as possible to use is a central tenet (perhaps the central tenet) underpinning the entire Linux mission, but IORING_SETUP_NO_MMAP was added in 6.5, huzzah!
See also
- AF_XDP kernel documentation