Netlink
Latest revision as of 11:56, 28 April 2023
Netlink sockets (PF_NETLINK) are a mechanism within Linux to retrieve and manage various aspects of the networking stacks -- they are a Linux-specific extension to the Berkeley Sockets model, and should not be used in portable programs. The information available via netlink sockets was previously available to userspace, if at all, via a collection of ioctl(2)s and a grab bag of get*(2) custom-purpose system calls; the majority of these are obsoleted by netlink sockets, but still implemented for backwards compatibility. RFC 3549 provides a snapshot current as of kernel 2.4.6; the netlink socket interface, however, is prone to change. That doesn't affect RFC 3549 as much as one might think, as it really has nothing to do with the netlink programming model; I suspect it to be a joke Andi Kleen perpetrated knowing that W. Richard Stevens wasn't around to call him out on it anymore.
IFA_ADDRESS v IFA_LOCAL
When browsing the IFA_* attributes of an rtnetlink ADDR message, do not naively think that IFA_ADDRESS is the local address. The local address is IFA_LOCAL. On a broadcast device, this will be the same as IFA_ADDRESS, but on a point-to-point link, IFA_ADDRESS is the remote side of the link!
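A minimal sketch of telling the two attributes apart when walking a single RTM_NEWADDR message. The helper name ifa_local_and_address is hypothetical (it is not part of any library), it only handles IPv4, and falling back to whichever attribute is present when the other is missing is an assumption of this sketch rather than documented behavior.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>
#include <string.h>

/* Hypothetical helper: extract IFA_LOCAL and IFA_ADDRESS from one IPv4
 * RTM_NEWADDR message. On a point-to-point link, *address is the peer.
 * Returns 0 on success, -1 if the message is unusable. */
int ifa_local_and_address(const struct nlmsghdr *nlh,
                          struct in_addr *local, struct in_addr *address)
{
    if (nlh->nlmsg_type != RTM_NEWADDR ||
        nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ifaddrmsg)))
        return -1;

    const struct ifaddrmsg *ifa = NLMSG_DATA(nlh);
    if (ifa->ifa_family != AF_INET)
        return -1;

    struct in_addr loc = {0}, addr = {0};
    int have_local = 0, have_addr = 0;
    const struct rtattr *rta = IFA_RTA(ifa);
    int len = IFA_PAYLOAD(nlh);

    /* Walk the rtattr TLVs that follow the ifaddrmsg header. */
    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
        if ((size_t)RTA_PAYLOAD(rta) < sizeof(struct in_addr))
            continue;
        if (rta->rta_type == IFA_LOCAL) {
            memcpy(&loc, RTA_DATA(rta), sizeof(loc));
            have_local = 1;
        } else if (rta->rta_type == IFA_ADDRESS) {
            memcpy(&addr, RTA_DATA(rta), sizeof(addr));
            have_addr = 1;
        }
    }

    if (!have_local && !have_addr)
        return -1;
    /* Assumption of this sketch: if only one attribute is present,
     * use it for both outputs. */
    *local = have_local ? loc : addr;
    *address = have_addr ? addr : loc;
    return 0;
}
```

If the two outputs differ, the interface is presumably point-to-point and the second one is the far end of the link, not something to bind to.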
Netlink Families
These are the values passed as the third argument to socket(2); the full and current list of families can be had at your local netlink(7) man page. Here are the important ones:
- NETLINK_ROUTE: pretty much everything corresponding to ip(8), also known as iproute, including:
  - RTM_NEWLINK, RTM_DELLINK, RTM_GETLINK: device tables (ifinfomsg and rtattr structs) (see netdevice(7))
  - RTM_NEWADDR, RTM_DELADDR, RTM_GETADDR: address tables (ifaddrmsg and rtattr structs)
  - RTM_NEWROUTE, RTM_DELROUTE, RTM_GETROUTE: routing tables (rtmsg and rtattr structs)
  - RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH: neighbor (ARP, for IPv4) tables (ndmsg structs)
  - RTM_NEWRULE, RTM_DELRULE, RTM_GETRULE: rule tables for advanced routing (rtmsg structs)
  - See rtnetlink(7) for more info
- NETLINK_SOCK_DIAG: socket snapshots, as used by ss(8)
  - Aliased as NETLINK_INET_DIAG
  - This is built around snapshots (it was originally added to assist checkpointing). It cannot be subscribed to for general streaming events; the one exception is socket-destruction notifications, which ss(8) subscribes to when invoked with -E to print events continuously.
- NETLINK_NFLOG: iptables replacement for NETLINK_QUEUE since 2.6.14
  - Userspace provided by libnetfilter
- NETLINK_QUEUE: obsolete iptables packet interface for userspace
  - Used the ip_queue kernel module and the QUEUE target
  - Userspace was provided the libipq wrapper library.
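To make the family argument concrete, here is a rough sketch of a NETLINK_ROUTE conversation: open the socket, send an RTM_GETLINK dump request, and count the RTM_NEWLINK replies until NLMSG_DONE. The function name count_links is made up for illustration, the 16 KiB receive buffer is an arbitrary choice, and error handling is reduced to returning -1.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count network devices via an RTM_GETLINK dump.
 * Returns the number of RTM_NEWLINK replies, or -1 on any failure. */
int count_links(void)
{
    /* The third argument to socket(2) selects the netlink family. */
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    struct {
        struct nlmsghdr nlh;
        struct ifinfomsg ifi;
    } req;
    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifi));
    req.nlh.nlmsg_type = RTM_GETLINK;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nlh.nlmsg_seq = 1;
    req.ifi.ifi_family = AF_UNSPEC;

    /* With no destination address given, the message goes to the kernel. */
    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0) {
        close(fd);
        return -1;
    }

    int count = 0;
    /* Union keeps the byte buffer aligned for struct nlmsghdr. */
    union { struct nlmsghdr hdr; char raw[16384]; } buf;
    for (;;) {
        /* A dump may be spread over several datagrams. */
        ssize_t n = recv(fd, buf.raw, sizeof(buf.raw), 0);
        if (n <= 0)
            break;
        int len = (int)n;
        for (struct nlmsghdr *nlh = &buf.hdr; NLMSG_OK(nlh, len);
             nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type == NLMSG_DONE) {
                close(fd);
                return count;
            }
            if (nlh->nlmsg_type == NLMSG_ERROR) {
                close(fd);
                return -1;
            }
            if (nlh->nlmsg_type == RTM_NEWLINK)
                count++;
        }
    }
    close(fd);
    return -1;
}
```

Swapping RTM_GETLINK and ifinfomsg for RTM_GETADDR and ifaddrmsg should give the address table in the same way; the other RTM_GET* types follow the same pattern with their own header structs.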
Extended error handling
Using the confusingly named NETLINK_EXT_ACK socket option (set at the SOL_NETLINK level, not at SOCK_RAW or SOL_SOCKET), the nlmsgerr structs accompanying NLMSG_ERROR messages will be followed by a set of TLVs from enum nlmsgerr_attrs, assuming the backend family supports this functionality.
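A hedged sketch of both halves: turning the option on with setsockopt(2) at the SOL_NETLINK level, and digging the human-readable NLMSGERR_ATTR_MSG string out of an NLMSG_ERROR reply. The function names enable_ext_ack and ext_ack_msg are invented for this example. The offset arithmetic assumes the layout as I understand it: the TLVs come after the nlmsgerr struct, and also after the echoed request payload unless NLM_F_CAPPED is set. Check it against your kernel headers before relying on it.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stddef.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270 /* some older libc headers do not define it */
#endif

/* Ask the kernel to append extended-ACK TLVs to NLMSG_ERROR replies. */
int enable_ext_ack(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one));
}

/* Hypothetical helper: return the NLMSGERR_ATTR_MSG string carried by an
 * NLMSG_ERROR message, or NULL if there is none. The pointer refers into
 * the caller's buffer. */
const char *ext_ack_msg(const struct nlmsghdr *nlh)
{
    if (nlh->nlmsg_type != NLMSG_ERROR ||
        !(nlh->nlmsg_flags & NLM_F_ACK_TLVS) ||
        nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
        return NULL;

    const struct nlmsgerr *err = NLMSG_DATA(nlh);
    size_t off = NLMSG_LENGTH(sizeof(*err));

    /* Unless capped, the original request's payload is echoed back
     * between the nlmsgerr struct and the TLVs. */
    if (!(nlh->nlmsg_flags & NLM_F_CAPPED)) {
        if (err->msg.nlmsg_len < NLMSG_HDRLEN)
            return NULL;
        off += NLMSG_ALIGN(err->msg.nlmsg_len - NLMSG_HDRLEN);
    }

    const char *base = (const char *)nlh;
    while (off + NLA_HDRLEN <= nlh->nlmsg_len) {
        const struct nlattr *nla = (const struct nlattr *)(base + off);
        if (nla->nla_len < NLA_HDRLEN || off + nla->nla_len > nlh->nlmsg_len)
            break;
        if ((nla->nla_type & NLA_TYPE_MASK) == NLMSGERR_ATTR_MSG) {
            const char *s = (const char *)nla + NLA_HDRLEN;
            size_t slen = nla->nla_len - NLA_HDRLEN;
            /* Only hand back strings that are NUL-terminated. */
            return (slen > 0 && s[slen - 1] == '\0') ? s : NULL;
        }
        off += NLA_ALIGN(nla->nla_len);
    }
    return NULL;
}
```

Call enable_ext_ack right after socket(2); replies from families that do not fill in extended ACK data simply arrive without NLM_F_ACK_TLVS, in which case ext_ack_msg returns NULL.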
Netlink Stupidity
Each time I come into contact with a new piece of netlink or the code that uses it, I'm flabbergasted by the utter lack of design elegance or even basic good taste. Alexey Kuznetsov, the primary author, is almost famous for getting the bits from one end of the wire to another in the most efficient and ugliest way possible (but I wouldn't try to write a better networking stack). PF_NETLINK and all it touches feels like someone took all the untyped, unsafe ioctl(2) layers, wrapped them up with some message queues, stuck them in an Eastern European hellhole and waited for NATO air power to solve the design problem.
The "Big Tent" approach to socket(2) (from netlink(7)):
Netlink is a datagram-oriented service. Both SOCK_RAW and SOCK_DGRAM are valid values for socket_type. However, the netlink protocol does not distinguish between datagram and raw sockets.
Things like this all over the place (taken from misc/ss.c in the iproute source package):
req.nlh.nlmsg_seq = 123456;
See Also
- Paul Moore's Generic Netlink document