- Netdev features mess and how to get out from it alive
- =====================================================
- Author:
- Michał Mirosław <mirq-linux@rere.qmqm.pl>
- Part I: Feature sets
- ======================
- Long gone are the days when a network card would just take and give packets
- verbatim. Today's devices add multiple features and bugs (read: offloads)
- that relieve an OS of various tasks like generating and checking checksums,
- splitting packets, and classifying them. Those capabilities and their state
- are commonly referred to as netdev features in the Linux kernel world.
- There are currently three sets of features relevant to the driver, and
- one used internally by the network core:
- 1. netdev->hw_features set contains features whose state may
- be changed (enabled or disabled) for a particular device at the user's
- request. This set should be initialized in the ndo_init callback and not
- changed later.
- 2. netdev->features set contains features which are currently enabled
- for a device. This should be changed only by the network core or in
- error paths of the ndo_set_features callback.
- 3. netdev->vlan_features set contains features whose state is inherited
- by child VLAN devices (limits netdev->features set). This is currently
- used for all VLAN devices whether tags are stripped or inserted in
- hardware or software.
- 4. netdev->wanted_features set contains the feature set requested by the
- user. This set is filtered by the ndo_fix_features callback whenever it or
- some device-specific conditions change. This set is internal to the
- networking core and should not be referenced in drivers.
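- The four sets above can be pictured with a small user-space model. This is
- only a sketch: features_t, struct mock_netdev, the F_* bit values and both
- mock_* functions are hypothetical stand-ins, not kernel definitions, and the
- VLAN helper simplifies "inherited by child VLAN devices" to a plain mask.

```c
#include <stdint.h>

typedef uint64_t features_t;            /* stand-in for netdev_features_t */

/* Hypothetical bit values; the real ones are in netdev_features.h. */
#define F_SG       (1ULL << 0)
#define F_IP_CSUM  (1ULL << 1)
#define F_TSO      (1ULL << 2)
#define F_HIGHDMA  (1ULL << 3)

struct mock_netdev {
    features_t hw_features;     /* user-toggleable, set once at init */
    features_t features;        /* currently enabled */
    features_t vlan_features;   /* inherited by child VLAN devices */
    features_t wanted_features; /* core-internal, never touched by a driver */
};

/* Models what a driver might do in its ndo_init callback. */
static int mock_ndo_init(struct mock_netdev *dev)
{
    dev->hw_features = F_SG | F_IP_CSUM | F_TSO;
    /* F_HIGHDMA is enabled but not in hw_features: the user cannot toggle it. */
    dev->features = dev->hw_features | F_HIGHDMA;
    dev->vlan_features = F_SG | F_IP_CSUM;
    return 0;
}

/* Assumed simplification: a VLAN child sees the parent's enabled features
 * limited by vlan_features. */
static features_t mock_vlan_child_features(const struct mock_netdev *parent)
{
    return parent->features & parent->vlan_features;
}
```

- Note that the sketch never writes wanted_features, matching the rule that
- this set belongs to the networking core.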
- Part II: Controlling enabled features
- =======================================
- When the current feature set (netdev->features) is to be changed, a new set
- is calculated and filtered by calling the ndo_fix_features callback
- and netdev_fix_features(). If the resulting set differs from the current
- set, it is passed to the ndo_set_features callback and (if the callback
- returns success) replaces the value stored in netdev->features.
- A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
- set might have changed.
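- The sequence just described can be sketched as a user-space model. All names
- here (features_t, struct mock_netdev, struct mock_ops, mock_core_fix,
- mock_update_features, the F_* bits) are hypothetical; the way the candidate
- set is built from wanted_features, the single rule in mock_core_fix and the
- handling of negative versus positive errors are assumptions made for
- illustration, not a copy of the core code.

```c
#include <stdint.h>

typedef uint64_t features_t;            /* stand-in for netdev_features_t */

#define F_SG   (1ULL << 0)              /* hypothetical bit values */
#define F_TSO  (1ULL << 1)

struct mock_netdev;

struct mock_ops {                       /* both callbacks are optional */
    features_t (*fix_features)(struct mock_netdev *dev, features_t f);
    int (*set_features)(struct mock_netdev *dev, features_t f);
};

struct mock_netdev {
    features_t hw_features;
    features_t features;
    features_t wanted_features;
    const struct mock_ops *ops;
    int notifications;                  /* counts mock NETDEV_FEAT_CHANGE events */
};

/* Stand-in for netdev_fix_features(): one assumed core rule, TSO needs SG. */
static features_t mock_core_fix(features_t f)
{
    if ((f & F_TSO) && !(f & F_SG))
        f &= ~F_TSO;
    return f;
}

static int mock_update_features(struct mock_netdev *dev)
{
    /* Toggleable bits follow the user's request; fixed bits stay as they are. */
    features_t f = (dev->wanted_features & dev->hw_features) |
                   (dev->features & ~dev->hw_features);
    int err = 0;

    if (dev->ops && dev->ops->fix_features)
        f = dev->ops->fix_features(dev, f);
    f = mock_core_fix(f);

    if (f == dev->features)
        return 0;                       /* no difference, nothing to do */

    if (dev->ops && dev->ops->set_features)
        err = dev->ops->set_features(dev, f);   /* missing callback: success */
    if (err < 0)
        return err;                     /* hard failure, set left alone */
    if (err == 0)
        dev->features = f;              /* success: store the new set */
    dev->notifications++;               /* current set might have changed */
    return err;
}
```

- For example, with SG and TSO both toggleable and enabled, a request that
- keeps only TSO ends with both off in this model, because the core rule
- removes TSO once SG is gone.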
- The following events trigger recalculation:
- 1. device's registration, after ndo_init returned success
- 2. user requested changes in features state
- 3. netdev_update_features() is called
- ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
- are treated as always returning success.
- A driver that wants to trigger recalculation must do so by calling
- netdev_update_features() while holding rtnl_lock. This should not be done
- from ndo_*_features callbacks. netdev->features should not be modified by
- the driver except by means of the ndo_fix_features callback.
- Part III: Implementation hints
- ================================
- * ndo_fix_features:
- All dependencies between features should be resolved here. The resulting
- set can be reduced further by limitations imposed by the networking core (as
- coded in netdev_fix_features()). For this reason it is safer to disable a
- feature when its dependencies are not met instead of forcing the dependency on.
- This callback should modify neither hardware nor driver state (it should be
- stateless). It can be called multiple times between successive
- ndo_set_features calls.
- The callback must not alter features contained in NETIF_F_SOFT_FEATURES or
- NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED, but
- care must be taken as the change won't affect already configured VLANs.
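- A minimal sketch of a callback that follows these hints, written as a
- user-space model. The names (features_t, struct mock_netdev,
- mock_fix_features, the F_* bits) and both hardware rules (no TSO above a
- 1500-byte MTU, TSO requiring SG and IPv4 checksum offload) are invented for
- illustration and do not describe any real device.

```c
#include <stdint.h>

typedef uint64_t features_t;            /* stand-in for netdev_features_t */

#define F_SG       (1ULL << 0)          /* hypothetical bit values */
#define F_IP_CSUM  (1ULL << 1)
#define F_TSO      (1ULL << 2)
#define F_GSO      (1ULL << 3)          /* plays the role of a software feature */

struct mock_netdev {
    unsigned int mtu;
};

/* Stateless: dev is const, and the result depends only on the inputs. */
static features_t mock_fix_features(const struct mock_netdev *dev, features_t f)
{
    /* Invented device-specific condition: no TSO with jumbo frames. */
    if (dev->mtu > 1500)
        f &= ~F_TSO;

    /* Dependency not met: drop TSO instead of forcing SG/checksum on. */
    if ((f & F_TSO) && (f & (F_SG | F_IP_CSUM)) != (F_SG | F_IP_CSUM))
        f &= ~F_TSO;

    return f;                           /* F_GSO is never altered here */
}
```

- In this model, a driver whose MTU changes would ask for recalculation via
- netdev_update_features() rather than touching the feature set itself.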
- * ndo_set_features:
- Hardware should be reconfigured to match the passed feature set. The set
- should not be altered unless some error condition happens that can't
- be reliably detected in ndo_fix_features. In this case, the callback
- should update netdev->features to match the resulting hardware state.
- Errors returned are not (and cannot be) propagated anywhere except dmesg.
- (Note: a successful return is zero; >0 means a silent error.)
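- The error path described above might look like the following user-space
- sketch. Everything in it is hypothetical: features_t, struct mock_netdev,
- mock_hw_write, mock_set_features, the F_* bits and the tso_broken switch that
- simulates a failure only visible while programming the hardware.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t features_t;            /* stand-in for netdev_features_t */

#define F_RXCSUM  (1ULL << 0)           /* hypothetical bit values */
#define F_TSO     (1ULL << 1)

struct mock_netdev {
    features_t features;                /* set the core believes is enabled */
    features_t hw_state;                /* what the mock hardware really has on */
    int tso_broken;                     /* simulated runtime-only failure */
};

/* Pretend register write; fails when enabling TSO on a "broken" device. */
static int mock_hw_write(struct mock_netdev *dev, features_t bit, int on)
{
    if (bit == F_TSO && on && dev->tso_broken)
        return -1;
    if (on)
        dev->hw_state |= bit;
    else
        dev->hw_state &= ~bit;
    return 0;
}

static int mock_set_features(struct mock_netdev *dev, features_t wanted)
{
    features_t changed = dev->features ^ wanted;
    int err = 0;

    if (changed & F_RXCSUM)
        err = mock_hw_write(dev, F_RXCSUM, (wanted & F_RXCSUM) != 0);
    if (!err && (changed & F_TSO))
        err = mock_hw_write(dev, F_TSO, (wanted & F_TSO) != 0);

    if (err) {
        /* Partial reconfiguration: report the state the hardware ended in. */
        dev->features = dev->hw_state;
        return -EIO;
    }
    return 0;   /* on success the caller, not the callback, stores the set */
}
```

- On success the sketch leaves dev->features alone, mirroring the rule that
- the core replaces the stored value after the callback returns zero.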
- Part IV: Features
- ===================
- For the current list of features, see include/linux/netdev_features.h.
- This section describes the semantics of some of them.
- * Transmit checksumming
- For complete description, see comments near the top of include/linux/skbuff.h.
- Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
- It means that the device can fill in a TCP/UDP-like checksum anywhere in the
- packet, whatever headers there might be.
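- The superset relation can be expressed as a small decision function. This is
- a user-space sketch with hypothetical names (features_t, enum mock_l3,
- mock_can_offload_csum, the F_* bits); it reduces "whatever headers there
- might be" to a three-way protocol choice and is not the stack's own check.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t features_t;            /* stand-in for netdev_features_t */

#define F_IP_CSUM    (1ULL << 0)        /* hypothetical bit values */
#define F_IPV6_CSUM  (1ULL << 1)
#define F_HW_CSUM    (1ULL << 2)

enum mock_l3 { MOCK_L3_IPV4, MOCK_L3_IPV6, MOCK_L3_OTHER };

/* Can the device fill in the checksum for a packet of this kind? */
static bool mock_can_offload_csum(features_t f, enum mock_l3 l3)
{
    if (f & F_HW_CSUM)
        return true;                    /* generic: any headers are fine */

    switch (l3) {
    case MOCK_L3_IPV4:
        return (f & F_IP_CSUM) != 0;
    case MOCK_L3_IPV6:
        return (f & F_IPV6_CSUM) != 0;
    default:
        return false;                   /* protocol-specific bits do not help */
    }
}
```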
- * Transmit TCP segmentation offload
- NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR bit
- set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).
- * Transmit DMA from high memory
- On platforms where this is relevant, NETIF_F_HIGHDMA signals that
- ndo_start_xmit can handle skbs with frags in high memory.
- * Transmit scatter-gather
- These features indicate that ndo_start_xmit can handle fragmented skbs:
- NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
- chained skbs (skb->next/prev list).
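- One way to read the two flags is as the condition under which a packet must
- be flattened before it reaches the driver. The sketch below is a user-space
- model under that assumption; features_t, struct mock_skb,
- mock_needs_linearize and the F_* bits are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t features_t;            /* stand-in for netdev_features_t */

#define F_SG        (1ULL << 0)         /* hypothetical bit values */
#define F_FRAGLIST  (1ULL << 1)

struct mock_skb {
    unsigned int nr_frags;              /* paged fragments attached */
    bool has_frag_list;                 /* further skbs chained behind it */
};

/* True when the device cannot take the packet in its current layout. */
static bool mock_needs_linearize(const struct mock_skb *skb, features_t f)
{
    if (skb->has_frag_list && !(f & F_FRAGLIST))
        return true;                    /* chained skbs not supported */
    if (skb->nr_frags > 0 && !(f & F_SG))
        return true;                    /* paged skbs not supported */
    return false;
}
```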
- * Software features
- Features contained in NETIF_F_SOFT_FEATURES are features of the networking
- stack. Drivers should not change behaviour based on them.
- * LLTX driver (deprecated for hardware drivers)
- NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
- e.g. software tunnels.
- This is also used in a few legacy drivers that implement their
- own locking; don't use it for new (hardware) drivers.
- * netns-local device
- NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
- network namespaces (e.g. loopback).
- Don't use it in drivers.
- * VLAN challenged
- NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
- headers. Some drivers set this because the cards can't handle the bigger MTU.
- [FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
- VLANs. This may not be useful, though.]
- * rx-fcs
- This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
- to the end of the skb data. This allows sniffers and other tools to
- read the CRC recorded by the NIC on receipt of the packet.
- * rx-all
- This requests that the NIC receive all possible frames, including errored
- frames (such as those with a bad FCS). This can be helpful when sniffing a link with
- bad packets on it. Some NICs may receive more packets if also put into normal
- PROMISC mode.