
Netlink

From dankwiki
Latest revision as of 11:56, 28 April 2023

Netlink sockets (PF_NETLINK) are a mechanism within Linux to retrieve and manage various aspects of the networking stack. They are a Linux-specific extension to the Berkeley Sockets model, and should not be used in portable programs. The information available via netlink sockets was previously available to userspace, if at all, via a collection of ioctl(2)s and a grab bag of special-purpose get*(2) system calls; the majority of these are obsoleted by netlink sockets, but are still implemented for backwards compatibility. RFC 3549 provides a snapshot current as of kernel 2.4.6; the netlink socket interface, however, is prone to change. That doesn't affect RFC 3549 as much as one might think, as it really has nothing to do with the netlink programming model; I suspect it was a joke Andi Kleen perpetrated knowing that W. Richard Stevens wasn't around to call him out on it anymore.
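
Here is what "messages instead of ioctl(2)s" looks like in practice. This is a minimal sketch that assumes only the Linux UAPI headers. It builds an RTM_GETLINK dump request and walks one recv(2)'d chunk of the multipart reply using the NLMSG_OK/NLMSG_NEXT macros from <linux/netlink.h>. The names link_dump_req, build_link_dump and count_links are my own hypothetical ones, not part of any library. Sending the request (sendto(2) to a sockaddr_nl whose nl_pid is 0) and the receive loop are left out.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical request layout: netlink header, then the RTM_GETLINK
 * family header (struct ifinfomsg). */
struct link_dump_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifi;
};

/* Fill in a request asking the kernel for every link it knows about. */
static void build_link_dump(struct link_dump_req *req, uint32_t seq)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifi));
    req->nlh.nlmsg_type = RTM_GETLINK;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->ifi.ifi_family = AF_UNSPEC;  /* links of every family */
}

/* Walk one recv(2)'d chunk of the dump reply. Returns the number of
 * RTM_NEWLINK messages seen, or -1 on NLMSG_ERROR; sets *done once
 * NLMSG_DONE arrives. */
static int count_links(const void *buf, size_t len, int *done)
{
    const struct nlmsghdr *nh = buf;
    int remaining = (int)len;
    int count = 0;

    *done = 0;
    for (; NLMSG_OK(nh, remaining); nh = NLMSG_NEXT(nh, remaining)) {
        if (nh->nlmsg_type == NLMSG_DONE) {
            *done = 1;
            break;
        }
        if (nh->nlmsg_type == NLMSG_ERROR)
            return -1;
        if (nh->nlmsg_type == RTM_NEWLINK)
            count++;
    }
    return count;
}
```

A dump reply generally spans several recv(2) calls, so the caller keeps reading and calling count_links() until done is set.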

IFA_ADDRESS v IFA_LOCAL

When browsing the IFA_* attributes of an rtnetlink ADDR message, do not naively assume that IFA_ADDRESS is the local address. The local address is IFA_LOCAL. On a broadcast device the two will be the same, but on a point-to-point link, IFA_ADDRESS is the remote side of the link!
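
Here is a small sketch of the safe lookup. It assumes a single RTM_NEWADDR message is already sitting in memory. local_address() is a hypothetical helper, not a libnetlink function. It prefers IFA_LOCAL and falls back to IFA_ADDRESS, because, as I understand it, IPv6 addresses without a peer typically arrive carrying only IFA_ADDRESS.

```c
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>

/* Return the local address bytes of one RTM_NEWADDR message (length in
 * *alen), preferring IFA_LOCAL over IFA_ADDRESS. NULL if neither. */
static const void *local_address(const struct nlmsghdr *nh, size_t *alen)
{
    const struct ifaddrmsg *ifa = NLMSG_DATA(nh);
    const struct rtattr *rta = IFA_RTA(ifa);
    int len = IFA_PAYLOAD(nh);
    const struct rtattr *local = NULL, *address = NULL;

    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
        if (rta->rta_type == IFA_LOCAL)
            local = rta;
        else if (rta->rta_type == IFA_ADDRESS)
            address = rta;  /* the peer, on point-to-point links */
    }
    if (local == NULL)
        local = address;    /* no IFA_LOCAL: IFA_ADDRESS is ours */
    if (local == NULL)
        return NULL;
    *alen = RTA_PAYLOAD(local);
    return RTA_DATA(local);
}
```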

Netlink Families

The family is the third (protocol) argument to socket(2); the full and current list of families can be found in your local netlink(7) man page. Here are the important ones:

  • NETLINK_ROUTE — pretty much everything corresponding to ip(8), also known as iproute, including:
    • RTM_NEWLINK, RTM_DELLINK, RTM_GETLINK — device tables (ifinfomsg and rtattr structs) (see netdevice(7))
    • RTM_NEWADDR, RTM_DELADDR, RTM_GETADDR — address tables (ifaddrmsg and rtattr structs)
    • RTM_NEWROUTE, RTM_DELROUTE, RTM_GETROUTE — routing tables (rtmsg and rtattr structs)
    • RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH — neighbor (ARP, for IPv4) tables (ndmsg structs)
    • RTM_NEWRULE, RTM_DELRULE, RTM_GETRULE — rule tables for advanced routing (rtmsg structs)
    • See rtnetlink(7) for more info
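  • NETLINK_SOCK_DIAG — socket snapshots, as used by ss(8)
    • Aliased as NETLINK_INET_DIAG
    • This is primarily a snapshot interface (it was originally added to assist checkpointing), and it cannot be subscribed to for a general stream of socket events. Newer kernels do provide multicast groups for socket destruction (SKNLGRP_INET_TCP_DESTROY and friends), which is what ss(8) subscribes to when invoked with -E to print events continuously.
  • NETLINK_NFLOG — the netfilter/iptables ULOG interface
  • NETLINK_NETFILTER — nfnetlink, the netfilter subsystem interface since 2.6.14, and the iptables replacement for NETLINK_QUEUE
    • Userspace provided by the libnetfilter_* libraries (libnetfilter_queue, libnetfilter_log, etc.)
  • NETLINK_QUEUE — obsolete iptables packet interface for userspace
    • Used the ip_queue kernel module and the QUEUE target
    • Userspace was provided with the libipq wrapper library.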
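
With NETLINK_SOCK_DIAG, a snapshot is a single dump request. This sketch assumes the inet_diag_req_v2 request struct from <linux/inet_diag.h> and SOCK_DIAG_BY_FAMILY from <linux/sock_diag.h>. The names tcp_diag_req, build_tcp_snapshot and open_sock_diag are hypothetical. The request asks for every IPv4 TCP socket in every state. The replies come back as struct inet_diag_msg records, terminated by NLMSG_DONE like any other dump.

```c
#include <linux/inet_diag.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical request layout: netlink header plus inet_diag_req_v2. */
struct tcp_diag_req {
    struct nlmsghdr nlh;
    struct inet_diag_req_v2 r;
};

/* Build a snapshot request for all IPv4 TCP sockets. */
static void build_tcp_snapshot(struct tcp_diag_req *req, uint32_t seq)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->r));
    req->nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->r.sdiag_family = AF_INET;
    req->r.sdiag_protocol = IPPROTO_TCP;
    req->r.idiag_states = ~0U;  /* bitmask of TCP states: all of them */
}

/* The family goes in socket(2)'s third argument; -1 on failure. */
static int open_sock_diag(void)
{
    return socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
}
```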

Extended error handling

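Using the confusingly named NETLINK_EXT_ACK socket option (set at the SOL_NETLINK level), the nlmsgerr structs accompanying NLMSG_ERROR messages will be followed by a set of TLVs from enum nlmsgerr_attrs, assuming the backend family supports this functionality. The TLVs come after the copy of the offending request that the kernel echoes back (trimmed to just its header if NETLINK_CAP_ACK is set), and their presence is flagged with NLM_F_ACK_TLVS.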
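
This is a minimal sketch of turning the option on and fishing out the human-readable NLMSGERR_ATTR_MSG string. It assumes kernel headers new enough to define NETLINK_EXT_ACK, NLM_F_ACK_TLVS and NLM_F_CAPPED. enable_ext_ack and ext_ack_msg are hypothetical helpers. The parser has to skip the echoed request payload unless NLM_F_CAPPED is set, and the attributes are struct nlattr rather than the rtattr used by rtnetlink.

```c
#include <linux/netlink.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270  /* older C libraries may not define this */
#endif

/* Ask the kernel to attach extended ACK TLVs to this socket's errors. */
static int enable_ext_ack(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one));
}

/* Return the NLMSGERR_ATTR_MSG string of an NLMSG_ERROR message, or
 * NULL if there is none. */
static const char *ext_ack_msg(const struct nlmsghdr *nh)
{
    const char *base = (const char *)nh;
    const struct nlmsgerr *err = NLMSG_DATA(nh);
    size_t hlen = sizeof(*err);
    size_t off;

    if (nh->nlmsg_type != NLMSG_ERROR || !(nh->nlmsg_flags & NLM_F_ACK_TLVS))
        return NULL;
    if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
        return NULL;
    /* Unless capped, the echoed request's payload precedes the TLVs. */
    if (!(nh->nlmsg_flags & NLM_F_CAPPED) && err->msg.nlmsg_len > NLMSG_HDRLEN)
        hlen += err->msg.nlmsg_len - NLMSG_HDRLEN;
    off = NLMSG_HDRLEN + NLMSG_ALIGN(hlen);

    while (off + NLA_HDRLEN <= nh->nlmsg_len) {
        const struct nlattr *nla = (const struct nlattr *)(base + off);
        size_t plen;

        if (nla->nla_len < NLA_HDRLEN || off + nla->nla_len > nh->nlmsg_len)
            break;  /* malformed attribute: stop looking */
        plen = nla->nla_len - NLA_HDRLEN;
        /* Accept the message only if it is NUL-terminated in bounds. */
        if ((nla->nla_type & NLA_TYPE_MASK) == NLMSGERR_ATTR_MSG &&
            plen > 0 && base[off + NLA_HDRLEN + plen - 1] == '\0')
            return base + off + NLA_HDRLEN;
        off += NLA_ALIGN(nla->nla_len);
    }
    return NULL;
}
```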

Netlink Stupidity

Each time I come into contact with a new piece of netlink or the code that uses it, I'm flabbergasted by the utter lack of design elegance or even basic good taste. Alexey Kuznetsov, the primary author, is almost famous for getting the bits from one end of the wire to another in the most efficient and ugliest way possible (but I wouldn't try to write a better networking stack). PF_NETLINK and all it touches feels like someone took all the untyped, unsafe ioctl(2) layers, wrapped them up with some message queues, stuck them in an Eastern European hellhole and waited for NATO air power to solve the design problem.

The "Big Tent" approach to socket(2) (from netlink(7)):

Netlink is a datagram-oriented service. Both SOCK_RAW and SOCK_DGRAM are valid values for socket_type. However, the netlink protocol does not distinguish between datagram and raw sockets.

Things like this are all over the place (taken from misc/ss.c in the iproute source package):

req.nlh.nlmsg_seq = 123456;
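
The alternative isn't hard. This is a minimal sketch using a hypothetical nl_conn struct and helpers. Each request gets its own sequence number, and a reply is accepted only if it echoes that number and came from the kernel (nl_pid 0 in the sockaddr_nl filled in by recvfrom(2)). Seeding from time(2) is just one way to avoid reusing a previous run's numbers.

```c
#include <linux/netlink.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-socket state: the next sequence number to hand out. */
struct nl_conn {
    uint32_t next_seq;
};

static void nl_conn_init(struct nl_conn *c)
{
    c->next_seq = (uint32_t)time(NULL);
}

/* Stamp a request with a fresh sequence number and return it. */
static uint32_t nl_stamp(struct nl_conn *c, struct nlmsghdr *nh)
{
    nh->nlmsg_seq = c->next_seq++;
    nh->nlmsg_pid = 0;  /* customary for requests addressed to the kernel */
    return nh->nlmsg_seq;
}

/* A reply answers our request only if it echoes our sequence number
 * and was sent by the kernel (sender port ID 0). */
static int nl_reply_matches(const struct nlmsghdr *nh, uint32_t seq,
                            uint32_t sender_pid)
{
    return nh->nlmsg_seq == seq && sender_pid == 0;
}
```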

See Also

  • Paul Moore's Generic Netlink document (https://lwn.net/Articles/208755/)