io_uring and XDP enter 2024

dankblog! 2024-02-15, 1452 EST, at the danktower

Last year (2023), I spent significant time writing code using XDP and io_uring. The latter was delightful, the former less so. Work on these technologies has progressed, and it seems time for an update.

XDP

One of the big problems I had with AF_XDP sockets was the lack of support for large packets. One could typically only use an MTU of 3KB and some change, putting XDP at a disadvantage relative to systems which happily process 8KB frames. AF_XDP now supports multi-buffer packets, but they must be requested explicitly: one must supply XDP_USE_SG in the sxdp_flags of the address passed to bind(2) (without it, multi-buffer frames are dropped), and the XDP program must use xdp.frags as its section name. A single RX ring frame might not hold the entire packet, in which case the packet will be written across several ring frames (with attendant multiple RX ring descriptors). The XDP_PKT_CONTD flag is set in a descriptor's options field if more frames follow. They will always be written in order, and if there are insufficient frames to write all data, none will be written.
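To make the flag handling concrete, here is a minimal sketch, assuming aligned-chunk UMEM mode (so a descriptor's addr is a plain offset into the UMEM) and descriptors already taken from the RX ring. xsk_sg_addr() and xsk_gather() are hypothetical helpers, not part of libxdp or any other library. The sketch shows where XDP_USE_SG goes and how XDP_PKT_CONTD chains several descriptors into one packet; the fallback #defines only matter with uapi headers older than multi-buffer AF_XDP.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

/* Fallbacks for older uapi headers; values match linux/if_xdp.h. */
#ifndef XDP_USE_SG
#define XDP_USE_SG (1 << 4)
#endif
#ifndef XDP_PKT_CONTD
#define XDP_PKT_CONTD (1 << 0)
#endif
#ifndef AF_XDP
#define AF_XDP 44
#endif

/* Hypothetical helper: build the bind(2) address for an AF_XDP socket that
 * accepts multi-buffer packets. Without XDP_USE_SG such frames are dropped. */
static struct sockaddr_xdp xsk_sg_addr(uint32_t ifindex, uint32_t queue_id) {
    struct sockaddr_xdp sxdp;
    memset(&sxdp, 0, sizeof(sxdp));
    sxdp.sxdp_family = AF_XDP;
    sxdp.sxdp_ifindex = ifindex;
    sxdp.sxdp_queue_id = queue_id;
    sxdp.sxdp_flags = XDP_USE_SG;
    return sxdp;
}

/* Hypothetical helper: copy one packet, possibly spread across several RX
 * descriptors, into out. umem is the UMEM base (aligned mode assumed).
 * Returns the packet length and sets *used to the number of descriptors
 * consumed, or returns -1 if the final fragment (the first descriptor
 * without XDP_PKT_CONTD) is not among descs[0..n) or out is too small. */
static long xsk_gather(const struct xdp_desc *descs, size_t n,
                       const uint8_t *umem, uint8_t *out, size_t cap,
                       size_t *used) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (total + descs[i].len > cap)
            return -1;
        memcpy(out + total, umem + descs[i].addr, descs[i].len);
        total += descs[i].len;
        if (!(descs[i].options & XDP_PKT_CONTD)) { /* last fragment */
            *used = i + 1;
            return (long)total;
        }
    }
    return -1; /* remaining fragments not yet in this batch */
}
```

In an RX loop, one would hand xsk_gather() the batch returned by the ring peek; a -1 with a full buffer means the batch ended mid-packet, so wait for more descriptors rather than releasing any. Once a packet is processed, release `used` descriptors and return their frames to the fill ring.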

io_uring

Easily the most exciting development is the integration of futexes via IORING_OP_FUTEX_WAIT and IORING_OP_FUTEX_WAKE in Linux 6.7 (Jens Axboe 2023-07-20 "Add io_uring futex/futexv support"). Multiple futexes ("vectored wait") can be supplied at once using IORING_OP_FUTEX_WAITV (Jens Axboe 2023-09-29 "add support for vectored futex waits"). I used to have the following on my io_uring page:

It would be nice to have tight integration with condition variables or even mutex/futex (allow me to submit a request to get a lock, and when I get the CQE, I have that lock). Bonus points if the fast (uncontended) path never needs a system call (like mutexes built atop futexes today).

The new API appears to satisfy all my desires!
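To illustrate the pattern the quoted wish describes, here is a minimal sketch of a lock whose uncontended path is a single userspace compare-and-swap, and whose contended path waits through io_uring rather than blocking in futex(2). The uring_mutex type and um_* functions are hypothetical; the SQE submission itself is left to the caller (the comments mark where), and assumes a kernel and liburing with futex support. The lock-word protocol is the classic three-state futex mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Lock word values, after the three-state mutex in Drepper's
 * "Futexes Are Tricky". */
enum { UM_UNLOCKED = 0, UM_LOCKED = 1, UM_CONTENDED = 2 };

/* Hypothetical mutex whose slow path is driven through io_uring. */
typedef struct {
    _Atomic uint32_t word;
} uring_mutex;

/* Fast path: one compare-and-swap, no system call. */
static bool um_trylock(uring_mutex *m) {
    uint32_t expected = UM_UNLOCKED;
    return atomic_compare_exchange_strong(&m->word, &expected, UM_LOCKED);
}

/* Contended path: mark the word contended. Returns true if the lock was
 * free after all (we now hold it). Otherwise the caller submits a futex
 * wait SQE (e.g. via liburing's io_uring_prep_futex_wait()) on &m->word,
 * expecting UM_CONTENDED, and calls this again when that CQE arrives. */
static bool um_lock_contended(uring_mutex *m) {
    return atomic_exchange(&m->word, UM_CONTENDED) == UM_UNLOCKED;
}

/* Release the lock. Returns true if waiters may exist, in which case the
 * caller must wake one (e.g. an IORING_OP_FUTEX_WAKE SQE waking 1). */
static bool um_unlock(uring_mutex *m) {
    return atomic_exchange(&m->word, UM_UNLOCKED) == UM_CONTENDED;
}
```

Note that in this sketch the wait CQE signals a wakeup rather than ownership: it may even complete with -EAGAIN if the word changed before the kernel checked it. Either way the caller retries um_lock_contended(), and holds the lock once that returns true. Uncontended lock/unlock pairs never enter the kernel.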

Also new in 6.7 is IORING_OP_WAITID, an asynchronous waitid(2), for working with process state changes (though I would recommend using pidfds for this kind of thing in new code).

Kernel 6.8 introduces IORING_SEND_MULTISHOT, bringing the multishot pattern to the TX (send and sendmsg) side. See Axboe's "Send and receive bundles" post for further optimization using "bundles" (these require provided buffer rings).

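IORING_SETUP_REGISTERED_FD_ONLY makes io_uring_setup(2) register the ring fd and return its registered index instead of an ordinary file descriptor; subsequent io_uring_register(2) calls must then use IORING_REGISTER_USE_REGISTERED_RING, and io_uring_enter(2) calls IORING_ENTER_REGISTERED_RING. It must be combined with IORING_SETUP_NO_MMAP.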

io_uring_prep_cmd_sock(3) configures IORING_OP_URING_CMD SQEs to perform socket operations such as setsockopt(2) and getsockopt(2).

io_uring_prep_getxattr(3) and io_uring_prep_setxattr(3) prep SQEs for getxattr(2) and setxattr(2) operations; io_uring_prep_fgetxattr(3) and io_uring_prep_fsetxattr(3) do exactly what you'd think.

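Making huge pages as unpleasant as possible to use is a central tenet (perhaps the central tenet) underpinning the entire Linux mission, but IORING_SETUP_NO_MMAP, added in 6.5, lets the application supply its own memory (e.g. huge pages) for the rings rather than mmap(2)ing memory allocated by the kernel. Huzzah!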
