The beginning of the end of iptables
dankblog! 2024-04-19, 2334 EST, at the danktower
when i was but a wee lad, Linux was circling around a sensible packet filtering infrastructure. there had been the generally unacceptable ipfwadm in 2.0, pretty clearly behind OpenBSD's PF (later brought into FreeBSD 5.3) and FreeBSD's IPFW. Rusty Russell (whom we haven't heard much from recently--I wonder where he's gone) implemented the stateless ipchains in 2.2, and then replaced it in 2.4 with iptables, a full-featured solution that stood for a decade. i memorized the entirety of iptables early on, and it served me very well in my career. indeed, even in 2022 i was employing obscure iptables functionality in conjunction with eBPF to implement advanced, novel warez (the "Illithid" traffic shaper at Microsoft).
kernel 3.13 introduced "nftables", controlled by the nft userspace binary. nftables unifies the packet filtering space, previously split among iptables, ip6tables, ebtables (for bridging), and arptables, combining the existing Netfilter netstack hooks with a virtual machine. nftables supported a compatibility layer for the vast majority (by popularity) of iptables functionality: for at least the past five years, if you've been using iptables, you've most likely been using iptables-nft, and actually creating nftables rulesets (iptables-legacy has continued to use the legacy kernel infrastructure). there are four sets of -nft and -legacy for the four elements of the space.
    [schwarzgerat](0) $ readlink -f $(command -v iptables)
    /usr/sbin/xtables-nft-multi
    [schwarzgerat](0) $ iptables -V
    iptables v1.8.10 (nf_tables)
    [schwarzgerat](0) $ iptables-nft -V
    iptables v1.8.10 (nf_tables)
    [schwarzgerat](0) $ iptables-legacy -V
    iptables v1.8.10 (legacy)
    [schwarzgerat](0) $
from the userspace perspective, major advantages of nftables include the unified approach (no more IPv4/IPv6 duplication!) and the ability to monitor changes to the ruleset. from the developer's perspective, addition/deletion of rules now operates on a single rule at a time, rather than requiring a complete replacement of the ruleset to make any change.
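to make the unified approach concrete, here is a minimal sketch (not a recommended firewall) of a single inet-family table that covers both IPv4 and IPv6. the table name demo_filter, the file path, and the choice of ports are all hypothetical. the script only writes the ruleset to a scratch file; the nft invocations need root, so they are left as comments.

```shell
#!/bin/sh
# Sketch: one "inet" table filters IPv4 and IPv6 with a single set of rules.
# "demo_filter" and the file path are hypothetical names for illustration.
ruleset="${TMPDIR:-/tmp}/demo-filter.nft"

cat > "$ruleset" <<'EOF'
table inet demo_filter {
    chain input {
        type filter hook input priority filter; policy drop;
        ct state established,related accept
        iifname "lo" accept
        meta l4proto { icmp, ipv6-icmp } accept
        tcp dport 22 accept
    }
}
EOF

echo "wrote $ruleset"

# The steps below need root and nf_tables support, so they stay commented out:
#   nft -c -f "$ruleset"    # check the file without changing anything
#   nft -f "$ruleset"       # load the table
#   nft add rule inet demo_filter input tcp dport 443 accept   # add one rule
#   nft monitor             # in another terminal: watch ruleset changes
```

the commented nft add rule line is the single-rule update described above, and nft monitor is the change-watching facility; both would act on the live ruleset, so try them on a scratch machine first.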
so, for the majority of iptables users who are just DROPping most incoming traffic and possibly jumping to MASQUERADE on an outgoing interface or two, maybe DNATting to a Docker, the move to nftables has been entirely under-the-hood, and many were probably unaware of the switch. i've used nft a few times, but there's twenty-five years of iptables muscle memory present, so i've largely continued to make use of that model.
recently, however, i've been trying to make more use of systemd-networkd. part of this was using IPMasquerade=ipv4 in a systemd.network unit, rather than hooking an iptables -w -t nat -I POSTROUTING -o iface -j MASQUERADE to the interface in some ad hoc manner as i've always done (usually a pre-up rule in debian's /etc/network/interfaces or its non-union redhat equivalent). it worked just as expected, until i ran iptables -t nat -L -v -n:
    [qgp](0) $ sudo iptables -t nat -L -v -n
    Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination

    Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination
    [qgp](0) $
whaaaaaaaaaaaat? where's my POSTROUTING rule? i quickly verified via inspection that ipv4 masquerading was taking place as expected. no iptables rule listed, though! i checked nft list ruleset:
    table ip io.systemd.nat {
            set masq_saddr {
                    type ipv4_addr
                    flags interval
                    elements = { 192.168.88.0/24, 192.168.90.0/24,
    --
                    type nat hook postrouting priority srcnat + 1; policy accept;
                    ip saddr @masq_saddr masquerade
            }
    }
    table ip6 io.systemd.nat {
            set masq_saddr {
                    type ipv6_addr
                    flags interval
            }
    --
                    type nat hook postrouting priority srcnat + 1; policy accept;
                    ip6 saddr @masq_saddr masquerade
            }
    }
    [qgp](0) $
well, there's my masquerade configuration. and note that, true to form, nft appears to now be masquerading for both ipv4 and ipv6, which i didn't intend, did not want, and did not expect based on the systemd-networkd directive IPMasquerade=ipv4 (this is presumably either a bug in systemd-networkd, or (more likely) i'm misunderstanding the output: the ip6 masq_saddr set above lists no elements, so that rule probably matches nothing). since v248, systemd has used nftables directly when available. it creates its own io.systemd.nat tables rather than touching the tables managed by iptables-nft (ip nat, ip filter, and their ip6 counterparts), and thus its rules don't show up in iptables-nft listings.
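for reference, a minimal sketch of the kind of systemd.network unit involved. the interface name lan0, the address, and the file name are hypothetical; only IPMasquerade=ipv4 comes from the setup described here. the script writes to a scratch directory rather than /etc/systemd/network/, so it touches nothing on the system.

```shell
#!/bin/sh
# Sketch only: "lan0", the address, and the file name are hypothetical.
# A real unit would live in /etc/systemd/network/; this writes to a scratch
# directory so that running it changes nothing on the system.
outdir="${TMPDIR:-/tmp}/networkd-demo"
mkdir -p "$outdir"

cat > "$outdir/50-lan0.network" <<'EOF'
[Match]
Name=lan0

[Network]
Address=192.168.88.1/24
# Masquerade IPv4 traffic forwarded from this link
IPMasquerade=ipv4
EOF

cat "$outdir/50-lan0.network"
```

note the difference in shape from the iptables approach: the iptables rule hangs off the outgoing interface (-o iface), while the networkd directive is set on the link whose forwarded traffic should be masqueraded. that would be consistent with the generated rule matching on source address (ip saddr @masq_saddr) rather than on an output interface.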
systemd haters ought to know that this move is mirrored in docker, libvirtd, podman, and other tools, so it's pretty inescapable.
which means i finally, after a decade plus, need to start using nft and abandon iptables, even in its iptables-nft incarnation. bah!
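one practical consequence: rules can live in tables that an iptables listing never displays. below is a rough sketch of a helper that reads nft list tables output and prints the tables iptables-nft would not normally manage. the function name foreign_tables is hypothetical, and the name matching is a heuristic (it assumes the iptables-family tools only use tables named filter, nat, mangle, raw, or security in the ip, ip6, arp, and bridge families). the sample input stands in for real output so the script runs without root.

```shell
#!/bin/sh
# Heuristic sketch: print tables that the iptables-family tools would not
# list. Reads `nft list tables` style lines ("table FAMILY NAME") on stdin.
# "foreign_tables" is a hypothetical helper name.
foreign_tables() {
    awk '$1 == "table" &&
         !($2 ~ /^(ip|ip6|arp|bridge)$/ &&
           $3 ~ /^(filter|nat|mangle|raw|security)$/) { print $2, $3 }'
}

# Sample input standing in for real `nft list tables` output
printf '%s\n' \
    'table ip filter' \
    'table ip nat' \
    'table ip io.systemd.nat' \
    'table ip6 io.systemd.nat' | foreign_tables
```

with the sample input this prints the two io.systemd.nat tables. on a live system the equivalent would be sudo nft list tables | foreign_tables, followed by sudo nft list table ip io.systemd.nat to inspect whatever turns up.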
See also
- iptables
- nftables
- ArchWiki's nftables page
- NixOS issue #156041
previously: "Io uring and xdp enter 2024" 2024-02-15