Commit 7930edc

Merge tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:

"Set of fixes/updates for io_uring that should go into this release. The ublk bits could've gone via either tree - usually I put them in block, but they got a bit mixed this series with the zero-copy supported that ended up dipping into both trees.

This contains:

- Fix for sendmsg zc, include in pinned pages accounting like we do for the other zc types
- Series for ublk fixing request aborting, doing various little cleanups, fixing some zc issues, and adding queue_rqs support
- Another ublk series doing some code cleanups
- Series cleaning up the io_uring send path, mostly in preparation for registered buffers
- Series doing little MSG_RING cleanups
- Fix for the newly added zc rx, fixing len being 0 for the last invocation of the callback
- Add vectored registered buffer support for ublk. With that, then ublk also supports this feature in the kernel revision where it could generically introduced for rw/net
- A bunch of selftest additions for ublk. This is the majority of the diffstat
- Silence a KCSAN data race warning for io-wq
- Various little cleanups and fixes"

* tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux: (44 commits)
  io_uring: always do atomic put from iowq
  selftests: ublk: enable zero copy for stripe target
  io_uring: support vectored kernel fixed buffer
  block: add for_each_mp_bvec()
  io_uring: add validate_fixed_range() for validate fixed buffer
  selftests: ublk: kublk: fix an error log line
  selftests: ublk: kublk: use ioctl-encoded opcodes
  io_uring/zcrx: return early from io_zcrx_recv_skb if readlen is 0
  io_uring/net: avoid import_ubuf for regvec send
  io_uring/rsrc: check size when importing reg buffer
  io_uring: cleanup {g,s]etsockopt sqe reading
  io_uring: hide caches sqes from drivers
  io_uring: make zcrx depend on CONFIG_IO_URING
  io_uring: add req flag invariant build assertion
  Documentation: ublk: remove dead footnote
  selftests: ublk: specify io_cmd_buf pointer type
  ublk: specify io_cmd_buf pointer type
  io_uring: don't pass ctx to tw add remote helper
  io_uring/msg: initialise msg request opcode
  io_uring/msg: rename io_double_lock_ctx()
  ...
2 parents c0dbd11 + 3905136 commit 7930edc

30 files changed, +673 -238 lines changed

Documentation/block/ublk.rst

Lines changed: 26 additions & 11 deletions
@@ -309,18 +309,35 @@ with specified IO tag in the command data:
 ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
 the server buffer (pages) read to the IO request pages.
 
-Future development
-==================
-
 Zero copy
 ---------
 
-Zero copy is a generic requirement for nbd, fuse or similar drivers. A
-problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
-can't be remapped any more in kernel with existing mm interfaces. This can
-occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
-big requests (IO size >= 256 KB) may benefit a lot from zero copy.
-
+ublk zero copy relies on io_uring's fixed kernel buffer, which provides
+two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
+
+ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
+`io_buffer_register_bvec()` for ublk server to register client request
+buffer into io_uring buffer table, then ublk server can submit io_uring
+IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
+calls `io_buffer_unregister_bvec()` to unregister the buffer, which is
+guaranteed to be live between calling `io_buffer_register_bvec()` and
+`io_buffer_unregister_bvec()`. Any io_uring operation which supports this
+kind of kernel buffer will grab one reference of the buffer until the
+operation is completed.
+
+ublk server implementing zero copy or user copy has to be CAP_SYS_ADMIN and
+be trusted, because it is ublk server's responsibility to make sure IO buffer
+filled with data for handling read command, and ublk server has to return
+correct result to ublk driver when handling READ command, and the result
+has to match with how many bytes filled to the IO buffer. Otherwise,
+uninitialized kernel IO buffer will be exposed to client application.
+
+ublk server needs to align the parameter of `struct ublk_param_dma_align`
+with backend for zero copy to work correctly.
+
+For reaching best IO performance, ublk server should align its segment
+parameter of `struct ublk_param_segment` with backend for avoiding
+unnecessary IO split, which usually hurts io_uring performance.
 
 References
 ==========
@@ -332,5 +349,3 @@ References
 .. [#userspace_nbdublk] https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk
 
 .. [#userspace_readme] https://github.com/ming1/ubdsrv/blob/master/README
-
-.. [#xiaoguang] https://lore.kernel.org/linux-block/[email protected]/
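
The buffer lifetime described in the new "Zero copy" text (register, use by io_uring operations that each hold a reference, unregister) can be illustrated with a small userspace model. This is only a sketch of one reading of that paragraph: every name below (`model_table`, `model_register()`, `model_op_start()` and so on) is hypothetical and is not a kernel or liburing API, and the model keeps one buffer per slot for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the buffer lifetime; none of these
 * names are kernel or liburing APIs.
 */
#define MODEL_TABLE_SIZE 4

struct model_buf {
	int refs;        /* table reference plus one per in-flight operation */
	bool registered; /* true between register and unregister */
};

struct model_table {
	struct model_buf slot[MODEL_TABLE_SIZE];
};

/* Models UBLK_IO_REGISTER_IO_BUF: the table takes the first reference. */
static int model_register(struct model_table *t, unsigned int idx)
{
	if (idx >= MODEL_TABLE_SIZE || t->slot[idx].refs != 0)
		return -1; /* bad index, or slot still holds a live buffer */
	t->slot[idx].refs = 1;
	t->slot[idx].registered = true;
	return 0;
}

/* Models an io_uring operation picking up the buffer by index. */
static int model_op_start(struct model_table *t, unsigned int idx)
{
	if (idx >= MODEL_TABLE_SIZE || !t->slot[idx].registered)
		return -1; /* only registered buffers can be looked up */
	t->slot[idx].refs++;
	return 0;
}

/* Models completion of such an operation: its reference is dropped. */
static int model_op_complete(struct model_table *t, unsigned int idx)
{
	struct model_buf *b;

	if (idx >= MODEL_TABLE_SIZE)
		return -1;
	b = &t->slot[idx];
	if (b->refs - (b->registered ? 1 : 0) <= 0)
		return -1; /* no operation in flight on this buffer */
	b->refs--;
	return 0;
}

/* Models UBLK_IO_UNREGISTER_IO_BUF: the table reference is dropped. */
static int model_unregister(struct model_table *t, unsigned int idx)
{
	if (idx >= MODEL_TABLE_SIZE || !t->slot[idx].registered)
		return -1;
	t->slot[idx].registered = false;
	t->slot[idx].refs--;
	return 0;
}

/* The buffer stays live while any reference remains. */
static bool model_is_live(const struct model_table *t, unsigned int idx)
{
	return idx < MODEL_TABLE_SIZE && t->slot[idx].refs > 0;
}
```

In this model, unregistering a buffer while an operation is still in flight does not release it: new operations can no longer start on that index, but the buffer stays live until the last in-flight operation completes.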
