Commit 141fc19

Merge: CNB96: netdev: add per-queue statistics
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5149

JIRA: https://issues.redhat.com/browse/RHEL-57771

```
netdev: add per-queue statistics

The ethtool-nl family does a good job exposing various protocol
related and IEEE/IETF statistics which used to get dumped under
ethtool -S, with creative names. Queue stats don't have a netlink
API, yet, and remain a lion's share of ethtool -S output for new
drivers. Not only is that bad because the names differ driver to
driver but it's also bug-prone. Intuitively drivers try to report
only the stats for active queues, but querying ethtool stats
involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still when user
space asks for values of the stats, it doesn't inform the kernel
how big the buffer is. If number of stats increases in the meantime
kernel will overflow user buffer.

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there. Support
per-queue and sum-for-device dumps. Latter will be useful when
subsequent patches add more interesting common stats than just
bytes and packets.

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <[email protected]>
Reviewed-by: Amritha Nambiar <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit ab63a23)
```

```
netdev: add queue stat for alloc failures

Rx alloc failures are commonly counted by drivers.
Support reporting those via netdev-genl queue stats.

Acked-by: Stanislav Fomichev <[email protected]>
Reviewed-by: Amritha Nambiar <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 92f8b1f)
```

Signed-off-by: CKI Backport Bot <[email protected]>

Approved-by: Marcelo Ricardo Leitner <[email protected]>
Approved-by: José Ignacio Tornos Martínez <[email protected]>
Approved-by: Petr Oros <[email protected]>
Approved-by: CKI KWF Bot <[email protected]>

Merged-by: Rado Vrbovsky <[email protected]>
2 parents 75a638b + c9a1aea commit 141fc19

9 files changed: 433 additions, 0 deletions


Documentation/netlink/specs/netdev.yaml

Lines changed: 91 additions & 0 deletions
@@ -74,6 +74,10 @@ definitions:
     name: queue-type
     type: enum
     entries: [ rx, tx ]
+  -
+    name: qstats-scope
+    type: flags
+    entries: [ queue ]
 
 attribute-sets:
   -
@@ -265,6 +269,73 @@ attribute-sets:
         doc: ID of the NAPI instance which services this queue.
         type: u32
 
+  -
+    name: qstats
+    doc: |
+      Get device statistics, scoped to a device or a queue.
+      These statistics extend (and partially duplicate) statistics available
+      in struct rtnl_link_stats64.
+      Value of the `scope` attribute determines how statistics are
+      aggregated. When aggregated for the entire device the statistics
+      represent the total number of events since last explicit reset of
+      the device (i.e. not a reconfiguration like changing queue count).
+      When reported per-queue, however, the statistics may not add
+      up to the total number of events, will only be reported for currently
+      active objects, and will likely report the number of events since last
+      reconfiguration.
+    attributes:
+      -
+        name: ifindex
+        doc: ifindex of the netdevice to which stats belong.
+        type: u32
+        checks:
+          min: 1
+      -
+        name: queue-type
+        doc: Queue type as rx, tx, for queue-id.
+        type: u32
+        enum: queue-type
+      -
+        name: queue-id
+        doc: Queue ID, if stats are scoped to a single queue instance.
+        type: u32
+      -
+        name: scope
+        doc: |
+          What object type should be used to iterate over the stats.
+        type: uint
+        enum: qstats-scope
+      -
+        name: rx-packets
+        doc: |
+          Number of wire packets successfully received and passed to the stack.
+          For drivers supporting XDP, XDP is considered the first layer
+          of the stack, so packets consumed by XDP are still counted here.
+        type: uint
+        value: 8 # reserve some attr ids in case we need more metadata later
+      -
+        name: rx-bytes
+        doc: Successfully received bytes, see `rx-packets`.
+        type: uint
+      -
+        name: tx-packets
+        doc: |
+          Number of wire packets successfully sent. Packet is considered to be
+          successfully sent once it is in device memory (usually this means
+          the device has issued a DMA completion for the packet).
+        type: uint
+      -
+        name: tx-bytes
+        doc: Successfully sent bytes, see `tx-packets`.
+        type: uint
+      -
+        name: rx-alloc-fail
+        doc: |
+          Number of times skb or buffer allocation failed on the Rx datapath.
+          Allocation failure may, or may not result in a packet drop, depending
+          on driver implementation and whether system recovers quickly.
+        type: uint
+
 operations:
   list:
     -
@@ -405,6 +476,26 @@ operations:
           attributes:
             - ifindex
         reply: *napi-get-op
+    -
+      name: qstats-get
+      doc: |
+        Get / dump fine grained statistics. Which statistics are reported
+        depends on the device and the driver, and whether the driver stores
+        software counters per-queue.
+      attribute-set: qstats
+      dump:
+        request:
+          attributes:
+            - scope
+        reply:
+          attributes:
+            - ifindex
+            - queue-type
+            - queue-id
+            - rx-packets
+            - rx-bytes
+            - tx-packets
+            - tx-bytes
 
 mcast-groups:
   list:

Documentation/networking/statistics.rst

Lines changed: 15 additions & 0 deletions
@@ -41,6 +41,15 @@ If `-s` is specified once the detailed errors won't be shown.
 
 `ip` supports JSON formatting via the `-j` option.
 
+Queue statistics
+~~~~~~~~~~~~~~~~
+
+Queue statistics are accessible via the netdev netlink family.
+
+Currently no widely distributed CLI exists to access those statistics.
+Kernel development tools (ynl) can be used to experiment with them,
+see `Documentation/userspace-api/netlink/intro-specs.rst`.
+
 Protocol-specific statistics
 ----------------------------

@@ -147,6 +156,12 @@ Statistics are reported both in the responses to link information
 requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
 when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).
 
+netdev (netlink)
+~~~~~~~~~~~~~~~~
+
+`netdev` generic netlink family allows accessing page pool and per queue
+statistics.
+
 ethtool
 -------

include/linux/netdevice.h

Lines changed: 3 additions & 0 deletions
@@ -1980,6 +1980,7 @@ enum netdev_ml_priv_type {
  *
  *	@sysfs_rx_queue_group:	Space for optional per-rx queue attributes
  *	@rtnl_link_ops:	Rtnl_link_ops
+ *	@stat_ops:	Optional ops for queue-aware statistics
  *
  *	@gso_max_size:	Maximum size of generic segmentation offload
  *	@tso_max_size:	Device (as in HW) limit on the max TSO request size
@@ -2370,6 +2371,8 @@ struct net_device {
 
 	const struct rtnl_link_ops *rtnl_link_ops;
 
+	const struct netdev_stat_ops *stat_ops;
+
 	/* for setting kernel sock attribute on TCP connection setup */
 #define GSO_MAX_SEGS		65535u
 #define GSO_LEGACY_MAX_SIZE	65536u

include/net/netdev_queues.h

Lines changed: 56 additions & 0 deletions
@@ -4,6 +4,62 @@
 
 #include <linux/netdevice.h>
 
+/* See the netdev.yaml spec for definition of each statistic */
+struct netdev_queue_stats_rx {
+	u64 bytes;
+	u64 packets;
+	u64 alloc_fail;
+};
+
+struct netdev_queue_stats_tx {
+	u64 bytes;
+	u64 packets;
+};
+
+/**
+ * struct netdev_stat_ops - netdev ops for fine grained stats
+ * @get_queue_stats_rx:	get stats for a given Rx queue
+ * @get_queue_stats_tx:	get stats for a given Tx queue
+ * @get_base_stats:	get base stats (not belonging to any live instance)
+ *
+ * Query stats for a given object. The values of the statistics are undefined
+ * on entry (specifically they are *not* zero-initialized). Drivers should
+ * assign values only to the statistics they collect. Statistics which are not
+ * collected must be left undefined.
+ *
+ * Queue objects are not necessarily persistent, and only currently active
+ * queues are queried by the per-queue callbacks. This means that per-queue
+ * statistics will not generally add up to the total number of events for
+ * the device. The @get_base_stats callback allows filling in the delta
+ * between events for currently live queues and overall device history.
+ * When the statistics for the entire device are queried, first @get_base_stats
+ * is issued to collect the delta, and then a series of per-queue callbacks.
+ * Only statistics which are set in @get_base_stats will be reported
+ * at the device level, meaning that unlike in queue callbacks, setting
+ * a statistic to zero in @get_base_stats is a legitimate thing to do.
+ * This is because @get_base_stats has a second function of designating which
+ * statistics are in fact correct for the entire device (e.g. when history
+ * for some of the events is not maintained, and reliable "total" cannot
+ * be provided).
+ *
+ * Device drivers can assume that when collecting total device stats,
+ * the @get_base_stats and subsequent per-queue calls are performed
+ * "atomically" (without releasing the rtnl_lock).
+ *
+ * Device drivers are encouraged to reset the per-queue statistics when
+ * number of queues change. This is because the primary use case for
+ * per-queue statistics is currently to detect traffic imbalance.
+ */
+struct netdev_stat_ops {
+	void (*get_queue_stats_rx)(struct net_device *dev, int idx,
+				   struct netdev_queue_stats_rx *stats);
+	void (*get_queue_stats_tx)(struct net_device *dev, int idx,
+				   struct netdev_queue_stats_tx *stats);
+	void (*get_base_stats)(struct net_device *dev,
+			       struct netdev_queue_stats_rx *rx,
+			       struct netdev_queue_stats_tx *tx);
+};
+
 /**
  * DOC: Lockless queue stopping / waking helpers.
  *
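
The aggregation rules described in the `netdev_stat_ops` kernel-doc can be modelled in plain userspace C. The sketch below is not kernel code: `toy_dev`, `toy_sum_rx`, `stat_add` and `STAT_UNSET` are hypothetical names, and it assumes the caller marks every field as unset with an all-ones pattern before invoking a callback, a detail that is not visible in the hunks shown here. It illustrates why a statistic that `get_base_stats` never assigns stays unreported at device scope, while a zero written by `get_base_stats` is a valid starting value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct netdev_queue_stats_rx from the diff. */
struct queue_stats_rx {
	uint64_t bytes;
	uint64_t packets;
	uint64_t alloc_fail;
};

/* Hypothetical marker for "the driver did not set this field" (assumption). */
#define STAT_UNSET UINT64_MAX

/* Hypothetical driver state: live per-queue counters plus the history
 * accumulated from queues that no longer exist. */
struct toy_dev {
	unsigned int n_rx;
	struct queue_stats_rx rxq[4];
	struct queue_stats_rx rx_base;
};

/* Per-queue callback: assign only the statistics the driver collects. */
static void toy_get_queue_stats_rx(const struct toy_dev *dev, unsigned int idx,
				   struct queue_stats_rx *stats)
{
	assert(idx < dev->n_rx);
	stats->bytes = dev->rxq[idx].bytes;
	stats->packets = dev->rxq[idx].packets;
	/* alloc_fail is deliberately left untouched: not collected here. */
}

/* Base callback: delta for retired queues; zero is a legitimate value. */
static void toy_get_base_stats(const struct toy_dev *dev,
			       struct queue_stats_rx *rx)
{
	rx->bytes = dev->rx_base.bytes;
	rx->packets = dev->rx_base.packets;
}

/* Fold a queue value into the total only if both sides are set. */
static void stat_add(uint64_t *sum, uint64_t val)
{
	if (*sum == STAT_UNSET || val == STAT_UNSET)
		return;
	*sum += val;
}

/* Device-scope total: base stats first, then every active queue. */
static struct queue_stats_rx toy_sum_rx(const struct toy_dev *dev)
{
	struct queue_stats_rx sum, q;
	unsigned int i;

	memset(&sum, 0xff, sizeof(sum));	/* everything starts unset */
	toy_get_base_stats(dev, &sum);

	for (i = 0; i < dev->n_rx; i++) {
		memset(&q, 0xff, sizeof(q));
		toy_get_queue_stats_rx(dev, i, &q);
		stat_add(&sum.bytes, q.bytes);
		stat_add(&sum.packets, q.packets);
		stat_add(&sum.alloc_fail, q.alloc_fail);
	}
	return sum;
}
```

Calling `toy_sum_rx()` on a device with two live queues and a non-empty `rx_base` returns byte and packet totals that include the retired-queue history, while `alloc_fail` remains `STAT_UNSET` and would be omitted from a device-scope reply under this model.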

include/uapi/linux/netdev.h

Lines changed: 20 additions & 0 deletions
@@ -70,6 +70,10 @@ enum netdev_queue_type {
 	NETDEV_QUEUE_TYPE_TX,
 };
 
+enum netdev_qstats_scope {
+	NETDEV_QSTATS_SCOPE_QUEUE = 1,
+};
+
 enum {
 	NETDEV_A_DEV_IFINDEX = 1,
 	NETDEV_A_DEV_PAD,
@@ -132,6 +136,21 @@ enum {
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
 };
 
+enum {
+	NETDEV_A_QSTATS_IFINDEX = 1,
+	NETDEV_A_QSTATS_QUEUE_TYPE,
+	NETDEV_A_QSTATS_QUEUE_ID,
+	NETDEV_A_QSTATS_SCOPE,
+	NETDEV_A_QSTATS_RX_PACKETS = 8,
+	NETDEV_A_QSTATS_RX_BYTES,
+	NETDEV_A_QSTATS_TX_PACKETS,
+	NETDEV_A_QSTATS_TX_BYTES,
+	NETDEV_A_QSTATS_RX_ALLOC_FAIL,
+
+	__NETDEV_A_QSTATS_MAX,
+	NETDEV_A_QSTATS_MAX = (__NETDEV_A_QSTATS_MAX - 1)
+};
+
 enum {
 	NETDEV_CMD_DEV_GET = 1,
 	NETDEV_CMD_DEV_ADD_NTF,
@@ -144,6 +163,7 @@ enum {
 	NETDEV_CMD_PAGE_POOL_STATS_GET,
 	NETDEV_CMD_QUEUE_GET,
 	NETDEV_CMD_NAPI_GET,
+	NETDEV_CMD_QSTATS_GET,
 
 	__NETDEV_CMD_MAX,
 	NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
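
The qstats attribute enum leaves ids 5 to 7 unassigned, matching the `value: 8` reservation in the YAML spec. A minimal userspace sketch of how a consumer might treat that layout follows; the enum is a local copy for illustration, and `qstats_attr_name` and `qstats_attr_is_counter` are hypothetical helpers, not part of any kernel or ynl API.

```c
#include <stddef.h>

/* Local copy of the attribute ids added by this commit, for illustration. */
enum {
	NETDEV_A_QSTATS_IFINDEX = 1,
	NETDEV_A_QSTATS_QUEUE_TYPE,
	NETDEV_A_QSTATS_QUEUE_ID,
	NETDEV_A_QSTATS_SCOPE,
	NETDEV_A_QSTATS_RX_PACKETS = 8,
	NETDEV_A_QSTATS_RX_BYTES,
	NETDEV_A_QSTATS_TX_PACKETS,
	NETDEV_A_QSTATS_TX_BYTES,
	NETDEV_A_QSTATS_RX_ALLOC_FAIL,

	__NETDEV_A_QSTATS_MAX,
	NETDEV_A_QSTATS_MAX = (__NETDEV_A_QSTATS_MAX - 1)
};

/* Hypothetical helper: spec name for an attribute id, NULL if unassigned. */
static const char *qstats_attr_name(int id)
{
	switch (id) {
	case NETDEV_A_QSTATS_IFINDEX:		return "ifindex";
	case NETDEV_A_QSTATS_QUEUE_TYPE:	return "queue-type";
	case NETDEV_A_QSTATS_QUEUE_ID:		return "queue-id";
	case NETDEV_A_QSTATS_SCOPE:		return "scope";
	case NETDEV_A_QSTATS_RX_PACKETS:	return "rx-packets";
	case NETDEV_A_QSTATS_RX_BYTES:		return "rx-bytes";
	case NETDEV_A_QSTATS_TX_PACKETS:	return "tx-packets";
	case NETDEV_A_QSTATS_TX_BYTES:		return "tx-bytes";
	case NETDEV_A_QSTATS_RX_ALLOC_FAIL:	return "rx-alloc-fail";
	default:				return NULL;
	}
}

/* Hypothetical helper: nonzero for counters, zero for metadata and gaps. */
static int qstats_attr_is_counter(int id)
{
	return id >= NETDEV_A_QSTATS_RX_PACKETS && id <= NETDEV_A_QSTATS_MAX;
}
```

Because the counters start at 8, a later metadata attribute could take id 5, 6 or 7 without renumbering any statistic, and a range check like the one above would keep working.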

net/core/netdev-genl-gen.c

Lines changed: 12 additions & 0 deletions
@@ -68,6 +68,11 @@ static const struct nla_policy netdev_napi_get_dump_nl_policy[NETDEV_A_NAPI_IFIN
 	[NETDEV_A_NAPI_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
 };
 
+/* NETDEV_CMD_QSTATS_GET - dump */
+static const struct nla_policy netdev_qstats_get_nl_policy[NETDEV_A_QSTATS_SCOPE + 1] = {
+	[NETDEV_A_QSTATS_SCOPE] = NLA_POLICY_MASK(NLA_UINT, 0x1),
+};
+
 /* Ops table for netdev */
 static const struct genl_split_ops netdev_nl_ops[] = {
 	{
@@ -138,6 +143,13 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.maxattr	= NETDEV_A_NAPI_IFINDEX,
 		.flags		= GENL_CMD_CAP_DUMP,
 	},
+	{
+		.cmd		= NETDEV_CMD_QSTATS_GET,
+		.dumpit		= netdev_nl_qstats_get_dumpit,
+		.policy		= netdev_qstats_get_nl_policy,
+		.maxattr	= NETDEV_A_QSTATS_SCOPE,
+		.flags		= GENL_CMD_CAP_DUMP,
+	},
 };
 
 static const struct genl_multicast_group netdev_nl_mcgrps[] = {
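
The `scope` attribute is validated with a bit mask of `0x1`, so a request may either omit the queue flag (device-wide totals) or set it (per-queue dump), and any other bit is rejected. The following userspace sketch mimics that check under the assumption that mask validation means "no bits outside the mask"; `check_scope` and `scope_is_per_queue` are hypothetical helpers, not kernel functions.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the value of NETDEV_QSTATS_SCOPE_QUEUE from the uapi header. */
#define QSTATS_SCOPE_QUEUE 0x1u

/* Mask from the generated policy: only bit 0 is a known scope flag. */
#define QSTATS_SCOPE_MASK 0x1u

/* Hypothetical helper: 0 if the value is acceptable, -EINVAL otherwise. */
static int check_scope(uint64_t scope)
{
	if (scope & ~(uint64_t)QSTATS_SCOPE_MASK)
		return -EINVAL;
	return 0;
}

/* Hypothetical helper: nonzero when a per-queue dump was requested. */
static int scope_is_per_queue(uint64_t scope)
{
	return (scope & QSTATS_SCOPE_QUEUE) != 0;
}
```

Defining the scope as flags rather than a plain enum leaves room for further object types to be added as new bits, at which point the mask in the policy would grow accordingly.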

net/core/netdev-genl-gen.h

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,8 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb,
 			       struct netlink_callback *cb);
 int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
+				struct netlink_callback *cb);
 
 enum {
 	NETDEV_NLGRP_MGMT,
