Skip to content

Conversation

@behlendorf
Copy link
Contributor

Motivation and Context

Proposed patch set for zfs-2.4.0-rc3

Description

This includes important bug fixes, doc improvements, performance tuning, and CI updates landed in master since zfs-2.4.0-rc3.

Note that the libzfs and libzpool libraries have had their soname bumped from 6 to 7 in preparation for the final 2.4.0 release. External consumers of these libraries will need to recompile their applications.

How Has This Been Tested?

Will be tested by the CI. Additionally, all of these changes were clean cherry-pick from the master branch where they were previously tested.

Checklist:

robn and others added 30 commits November 12, 2025 12:57
zfs-tests.sh executes test-runner.py to do the actual test work. Any
exit code < 4 is interpreted as success, with the actual value
describing the outcome of the tests inside.

If a Python program crashes in some way (eg an uncaught exception), the
process exit code is 1.

Taken together, this means that test-runner.py can crash during setup,
but return a "success" error code to zfs-tests.sh, which will report and
exit 0. This in turn causes the CI runner to believe the test run
completed successfully.

This commit addresses this by making zfs-tests.sh interpret an exit code
of 255 as a failure in the runner itself. Then, in test-runner.py, the
"fail()" function defaults to a 255 return, and the main function gets
wrapped in a generic exception handler, which prints it and calls
fail().

All together, this should mean that any unexpected failure in the test
runner itself will be propagated out of zfs-tests.sh for CI or any other
calling program to deal with.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17858
We’re not always on the same page, but at least we’re in the same book.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17860
In zio_crypt_key_wrap and zio_crypt_key_unwrap, the cuio_s variable was
not initialized before the calls to zfs_uio_init, leading to
uninitialized access to cuio_s.uio_offset.  Initialize it to avoid gcc
warnings.

Similar issue as fixed in 2bf1520 ("Fix gcc uninitialized warning in
FreeBSD zio_crypt.c")

Signed-off-by: Ryan Libby <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Closes openzfs#17863
functional/trim tests do create pools of different types to test
trim, autotrim_config.ksh is missing the type from zpool
create command line while we are looping over different pool
types.

Sponsored-by: Edgecast Cloud LLC.
Signed-off-by: Toomas Soome <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Closes openzfs#17874
Over the time many of DMU functions got flags argument to control
prefetch, caching, etc.  Few functions though left without it, even
though closer look shown that many of them do not require prefetch
due to their access pattern.  This patch adds the flags argument to
dmu_write(), dmu_buf_hold_array() and dmu_buf_hold_array_by_bonus(),
passing DMU_READ_NO_PREFETCH where applicable.

I am going to also pass DMU_UNCACHEDIO to some of them later.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17872
Remove the out of date helper scripts originally used to port
Illumos commits to the ZoL repository.  Due to layout changes
made to this repository they're no longer entirely correct.
Remove them to make it clear they're no longer being used or
actively maintained.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#17880
Ultimately this is a revert of 779ac93, which according to
@nabijaczleweli is to paper over automake <1.14's lack of
%reldir% support.

As I understand it, EL8 is the lowest current build target.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Joseph Holsten <[email protected]>
Closes openzfs#17878
Since we set bv_mos_brtvdev block size, and since we keep dirty
bitmap at the same granularity, we should keep the allocations
and writes done with.  Otherwise it makes the last block write
short, that will be odd once we implement writing of only dirty
blocks, but also requires read-modify-write on DMU layer.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17875
Both DDT log and BRT counters we read on pool import and then only
append or overwrite in full blocks.  We don't need them in DMU or
ARC caches.  Fortunately we have DMU_UNCACHEDIO for this now.

Even more we don't need BRT in non-evictable metadata DMU caches,
since it will likely never fit there, while block the cache from
its original users.  Since DMU_OT_IS_METADATA_CACHED() has no way
to differentiate the new metadata types, mark BRT with storage
type of DMU_OT_DDT_ZAP.  As side effect it will also put it on
dedup device, but that should actually be right.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17875
In cases where all issued ZIOs must succeed, and we can't do
anything clever about the errors, we should just explicitly set
ZIO_FLAG_TRYHARD and let OS to do all the reasonable retries.

In other cases, where retries can be different from the original,
for example, some ZIOs are allowed to fail due to redundancy, or
we can disable aggregation on retrial to get at least some of
the data, we can do first pass without TRYHARD, and only if needed
retry with ZIO_FLAG_IO_RETRY (which implies TRYHARD semantics).

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17877
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by: Sivesh Kumar <[email protected]>
Closes openzfs#17889
Currently this function uses L0 offsets which:
1. is hard to read since it maps offsets to blkid and back each call
2. necessitates dnode_next_block to handle edge cases at limits
3. makes it hard to tell if the traversal can loop infinitely

Instead, update this and dnode_next_offset to work in (blkid, index).
This way the blkid manipulations are clear, and it's also clear that
the traversal always terminates since blkid goes one direction.

I've also considered updating dnode_next_offset to operate on blkid.
Callers use both patterns, so maybe another PR can split the cases?

While here tidy up dnode_next_offset_level comments.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Robert Evans <[email protected]>
Closes openzfs#17792
The label 'kfdok' is only used with O_TMPFILE, we need to use
the same #ifdef around this label.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Toomas Soome <[email protected]>
Closes openzfs#17894
Update FreeBSD versions:
- add FreeBSD 15.0-STABLE
- add FreeBSD 16.0-CURRENT

So we use the latest versions of each line now:
  - Freebsd 14.3 (RELEASE)
  - FreeBSD 15.0 (STABLE)
  - FreeBSD 16.0 (CURRENT)

In commits - you may specify which type of CI should run:
- ZFS-CI-Type: quick
- ZFS-CI-Type: linux
- ZFS-CI-Type: freebsd
- ZFS-CI-Type: full

Reviewed-by: Alexx Saver <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#17896
When a write comes in via dmu_sync_late_arrival, its txg is equal to the
open TXG. If that write gangs, and we have not yet activated the new
gang header feature, and the gang header we pick can store a larger gang
header, we will try to schedule the upgrade for the open TXG + 1. In
debug mode, this causes an assertion to trip. This PR sets the TXG for
activating the feature to be the larger of either the current open TXG
or the syncing TXG + 1.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#17824
We need to specifically use the FX_XFLAG_* macros in zpl_ioctl_*attr()
codepaths, and the FS_*_FL macros in the zpl_ioctl_*flags() codepaths.
The earlier code just assumes the FS_*_FL macros for both codepaths.
The 6.17 kernel add a bitmask check in copy_fsxattr_from_user() that
exposed this error via failing 'projectquota' ZTS tests.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#17884
Closes openzfs#17869
Add an introductory sentance explaining why the reader may want to use
this command, and establishing the requirement that the jail must be
running. Move other requirements from the description of the subcommands
to follow this for flow and structure. Move the caveat that this is for
FreeBSD down to a cannonical CAVEATS section, and crossreference Linux's
equivelant functionality. Mention that this utility can not be used to
delegate the root directory of the jail to that section also.

Reported by: Jan Brankamp <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Signed-off-by: Alexander Ziaee <[email protected]>
Closes openzfs#17883
We've heard anecdotes that suggest some
confusion/surprise/disappointment that a changed recordsize is not
applied during rewrite. Until such time as we actually can do that, we
can at least explicitly mention it at something that doesn't work.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17898
We have infinite loop and on certain condition, we exit this loop
and thread with pthread_exit(). But also after this loop,
we have a code to perform pthread_cleanup_pop() and return from the
thread.

The  problem is that modern compilers are able to recognize that we
actually never get to the statements after loop and therefore
it is dead code there.

I think, instead of pthread_exit(), it is better to break out of loop
and let the last statements to work as intended. This is because
we do need to keep pthread_cleanup_pop() anyhow. Of course,
it is matter of taste if we want to use return or pthread_exit as very
last statement in this function.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by: Toomas Soome <[email protected]>
Closes openzfs#17900
Change the spelling of "begining" on line 4875 to
"beginning".

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Adi Gollamudi <[email protected]>
Closes openzfs#17905
Disable the aarch64 NEON SIMD intrinsics for kernel builds.  Safely
using them in the kernel context requires saving/restoring the FPU
registers which is not currently done.

Additionally, remove the aarch64 optimized PREFETCH_L1 and PREFETCH_L2
instruction.  Rely on the more portable compiler built ins.

This lets us remove the problematic workaround in the aarch64_compat.h
header which undefines the __aarch64__ macro.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#17904
Closes openzfs#17852
FreeBSD now has a pathconf name called _PC_CASE_INSENSITIVE
used to check if a file system performs case insensitive
name lookups.

This patch adds support for this name.

Reviewed-by: Alexander Motin <[email protected]>
 Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rick Macklem <[email protected]>
Closes openzfs#17908
Free issue threads might block waiting for synchronous DDT, BRT or
GANG header reads. So unlike other taskqs using ZTI_SCALE to scale
with number of CPUs, here we also need some amount of threads to
potentially saturate pool reads.  I am not sure we always want the
96 threads we had before ZTI_SCALE introduction at openzfs#11966 on small
systems, but lets make it at least 32.

While here, make free taskqs configurable, similar to read and
write ones.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17903
Update description of zpool import --rewind-to-checkpoint in
man/man7/zpoolconcepts.7 to explain that rewinding automatically
discards a checkpoint.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Adi Gollamudi <[email protected]>
Closes openzfs#12646
Closes openzfs#17918
BRT_RANGESIZE_TO_NBLOCKS() takes number of ranges as its argument.
To get number of blocks we should multiply it by the entry size,
not divide by it, as it was due to missing parentheses.

Before openzfs#17875 this could cause small memory corruptions for vdevs
bigger than 64TB, but the change made the bug more noticeable.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17886
Closes openzfs#17915
This is useful as debugging support, as it lets namespace lock
operations be traced directly. It will also be useful for future work to
reduce the use of spa_namespace_lock, traditionally a source of
difficult deadlocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17906
dmu_object_info_from_dnode() takes two locks and copies plenty of
data that we don't need in zap_lockdir_impl().  Just read dn_type
directly in this hot path.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17921
According to my observations, BRT ZAPs are typically compressible
3:1 for data and 2:1 for indirects.  With ashift=12, typical these
days, it means increasing the block sizes to 8KB we may get most
of possible compression, reducing on-disk and in-ARC BRT footprint
in half by the cost of some compression/decompression overhead,
but without real write inflation, only some dirty data increase.

Increase to 32KB similar to DDT could further increase compression
and storage efficiency, but at the cost of write inflation and
much bigger dirty data increase, which we can not properly control
now.  So lets leave this for a time when BRT log gets implemented.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17916
Implement BRT (Block Reference Table) prefetch functionality similar
to existing DDT prefetch.  This allows preloading BRT metadata into
ARC to improve performance for block cloning operations and frees
of earlier cloned blocks.

Make -t parameter optional.  When omitted, prefetch all supported
metadata types (both DDT and BRT now).

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Closes openzfs#17890
oshogbo and others added 7 commits November 12, 2025 13:07
Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that
allows users to disable notifications for slow devices.
This prevents ZED and/or ZFSD from degrading the pool due to slow
I/O.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Mariusz Zaborski <[email protected]>
Closes 17477
Detect container environments and set timeout to zero unless
ZFS_MODULE_TIMEOUT is already set. This avoids an unnecessary ten
second delay after running zfs/zpool commands in a container where
/dev/zfs is unavailable.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Adi Gollamudi <[email protected]>
Closes openzfs#15165
Closes openzfs#17922
The nvlist_snprintf() function was added to the ABI of libnvpair.
No other symbols were modified or removed.  Bump the library-info
SONAME current and age args to reflect this is a minor library
version update.

Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#17911
The ABI of libzfs and libzpool have breaking changes since the
last major release.  Bump the SONAME for the upcoming 2.4 release
branch to libzfs7 and libzpool7.

Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#17911
The zfs_tunable_* functions are a public interface which are
part of the internal libspl convenience library.  They should
be hidden to prevent an unnecessary ABI change in installed
libraries which link against libspl (e.g. libzfs_core, libuutil).

We do already leak long standing libspl symbols.  This commit is
solely intended to prevent leaking these new ones until this is
properly sorted out.

Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#17911
Refresh all ABI files using the CI generated files to reflect
the library interfaces to be published for the 2.4 release.

Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#17911
Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 12, 2025
@github-actions github-actions bot added the Status: Work in Progress Not yet ready for general review label Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Code Review Needed Ready for review and testing Status: Work in Progress Not yet ready for general review

Projects

None yet

Development

Successfully merging this pull request may close these issues.