Skip to content

[CIQ 6.12.y] Merge v6.12.17 & config Updates #194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 156 commits into from

Conversation

PlaidCat
Copy link
Collaborator

@PlaidCat PlaidCat commented Apr 3, 2025

This is just a base update with refresh of the configs along with a merge rather than rebase ... its not how we usually operate but it might be the only way to work on this.

git checkout -b {jmaple}_merge_test_ciq-6.12.y
git merge v6.12.17
../kernel-src-tree-tools/lt_rebase.sh
git status
git diff

Last Commit is the changes CIQ has made otherwise everything else merged:
9117443

[CIQ] v6.12.17 config updates
All configs dropped the follwoing since its dependent on ARCH_MVEBU
which is not configured on so there is no reason to ask.
 # CONFIG_CZNIC_PLATFORMS is not set
See upstream commit: https://github.com/ctrliq/kernel-src-tree/commit/dd0f05b98925111f4530d7dab774398cdb32e9e3

x86_64 configs also dropped a previously defined y config
 -CONFIG_IMX_SCMI_MISC_DRV=y
 This comes from firmware: imx: IMX_SCMI_MISC_DRV should depend on ARCH_MXC
See Upstream Commit: https://github.com/ctrliq/kernel-src-tree/commit/be6686b823b30a69b1f71bde228ce042c78a1941
Whats a little confusing is that the fedora kernel-ark says that this is
marked as a `y`
$ cat redhat/configs/rhel/generic/CONFIG_IMX_SCMI_MISC_DRV
CONFIG_IMX_SCMI_MISC_DRV=y

[kernel-ark]$ ls redhat/configs/kernel-6.13.8-x86_64*
redhat/configs/kernel-6.13.8-x86_64-automotive.config
redhat/configs/kernel-6.13.8-x86_64.config
redhat/configs/kernel-6.13.8-x86_64-rt.config
redhat/configs/kernel-6.13.8-x86_64-automotive-debug.config
redhat/configs/kernel-6.13.8-x86_64-debug.config
redhat/configs/kernel-6.13.8-x86_64-rt-debug.config

[kernel-ark]$ grep CONFIG_IMX_SCMI_MISC_DRV redhat/configs/kernel-6.13.8-x86_64*
[kernel-ark]$

Do to this we're leaving this as the default Kconfig of off for x86_64

Build

Will be done by PR Builders

Testing

No Testing for this PR, we're still in development and this merge had no conflicts from the last PR we merged.

ctmarinas and others added 30 commits February 27, 2025 04:30
PROT_MTE (memory tagging extensions) is not supported on all user mmap()
types for various reasons (memory attributes, backing storage, CoW
handling). The arm64 arch_validate_flags() function checks whether the
VM_MTE_ALLOWED flag has been set for a vma during mmap(), usually by
arch_calc_vm_flag_bits().

Linux prior to 6.13 does not support PROT_MTE hugetlb mappings. This was
added by commit 25c17c4 ("hugetlb: arm64: add mte support").
However, earlier kernels inadvertently set VM_MTE_ALLOWED on
(MAP_ANONYMOUS | MAP_HUGETLB) mappings by only checking for
MAP_ANONYMOUS.

Explicitly check MAP_HUGETLB in arch_calc_vm_flag_bits() and avoid
setting VM_MTE_ALLOWED for such mappings.

Fixes: 9f34193 ("arm64: mte: Add PROT_MTE support to mmap() and mprotect()")
Cc: <[email protected]> # 5.10.x-6.12.x
Reported-by: Naresh Kamboju <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit dddcb19 ]

When we introduce xe_syncs, we don't wait for internal OA programming
batches to complete. That is, xe_syncs are signaled asynchronously. In
anticipation for this, separate out batch submission from waiting for
completion of those batches.

v2: Change return type of xe_oa_submit_bb to "struct dma_fence *" (Matt B)
v3: Retain init "int err = 0;" in xe_oa_submit_bb (Jose)

Reviewed-by: Jonathan Cavitt <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Stable-dep-of: f0ed398 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit c8507a2 ]

Now that we have laid the groundwork, introduce OA sync properties in the
uapi and parse the input xe_sync array as is done elsewhere in the
driver. Also add DRM_XE_OA_CAPS_SYNCS bit in OA capabilities for userspace.

v2: Fix and document DRM_XE_SYNC_TYPE_USER_FENCE for OA (Matt B)
    Add DRM_XE_OA_CAPS_SYNCS bit to OA capabilities (Jose)

Acked-by: José Roberto de Souza <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Stable-dep-of: f0ed398 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 2fb4350 ]

Add input fence dependencies which will make OA configuration wait till
these dependencies are met (till input fences signal).

v2: Change add_deps arg to xe_oa_submit_bb from bool to enum (Matt Brost)

Reviewed-by: Jonathan Cavitt <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Stable-dep-of: f0ed398 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 5503983 ]

This is a set of squashed commits to facilitate smooth applying to
stable. Each commit message is retained for reference.

1) Allow a GGTT mapped batch to be submitted to user exec queue

For a OA use case, one of the HW registers needs to be modified by
submitting an MI_LOAD_REGISTER_IMM command to the users exec queue, so
that the register is modified in the user's hardware context. In order
to do this a batch that is mapped in GGTT, needs to be submitted to the
user exec queue. Since all user submissions use q->vm and hence PPGTT,
add some plumbing to enable submission of batches mapped in GGTT.

v2: ggtt is zero-initialized, so no need to set it false (Matt Brost)

2) xe/oa: Use MI_LOAD_REGISTER_IMMEDIATE to enable OAR/OAC

To enable OAR/OAC, a bit in RING_CONTEXT_CONTROL needs to be set.
Setting this bit cause the context image size to change and if not done
correct, can cause undesired hangs.

Current code uses a separate exec_queue to modify this bit and is
error-prone. As per HW recommendation, submit MI_LOAD_REGISTER_IMM to
the target hardware context to modify the relevant bit.

In v2 version, an attempt to submit everything to the user-queue was
made, but it failed the unprivileged-single-ctx-counters test. It
appears that the OACTXCONTROL must be modified from a remote context.

In v3 version, all context specific register configurations were moved
to use LOAD_REGISTER_IMMEDIATE and that seems to work well. This is a
cleaner way, since we can now submit all configuration to user
exec_queue and the fence handling is simplified.

v2:
(Matt)
- set job->ggtt to true if create job is successful
- unlock vm on job error

(Ashutosh)
- don't wait on job submission
- use kernel exec queue where possible

v3:
(Ashutosh)
- Fix checkpatch issues
- Remove extra spaces/new-lines
- Add Fixes: and Cc: tags
- Reset context control bit when OA stream is closed
- Submit all config via MI_LOAD_REGISTER_IMMEDIATE

(Umesh)
- Update commit message for v3 experiment
- Squash patches for easier port to stable

v4:
(Ashutosh)
- No need to pass q to xe_oa_submit_bb
- Do not support exec queues with width > 1
- Fix disabling of CTX_CTRL_OAC_CONTEXT_ENABLE

v5:
(Ashutosh)
- Drop reg_lri related comments
- Use XE_OA_SUBMIT_NO_DEPS in xe_oa_load_with_lri

Fixes: 8135f1c ("drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close")
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]> # commit 1
Reviewed-by: Ashutosh Dixit <[email protected]>
Cc: [email protected]
Reviewed-by: Jonathan Cavitt <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Stable-dep-of: f0ed398 ("xe/oa: Fix query mode of operation for OAR/OAC")
Signed-off-by: Sasha Levin <[email protected]>
…page_io()

[ Upstream commit 928b4de ]

The function extent_writepage_io() will submit the dirty sectors inside
the page for the write.

But recently to co-operate with the incoming subpage compression
enhancement, a new bitmap is introduced to
btrfs_bio_ctrl::submit_bitmap, to only avoid a subset of the dirty
range.

This is because we can have the following cases with 64K page size:

    0      16K       32K       48K       64K
    |      |/////////|         |/|
                                 52K

For range [16K, 32K), we queue the dirty range for compression, which is
ran in a delayed workqueue.
Then for range [48K, 52K), we go through the regular submission path.

In that case, our btrfs_bio_ctrl::submit_bitmap will exclude the range
[16K, 32K).

The dirty flags for the range [16K, 32K) is only cleared when the
compression is done, by the extent_clear_unlock_delalloc() call inside
submit_one_async_extent().

This patch fix the false alert by removing the
btrfs_folio_assert_not_dirty() check, since it's no longer correct for
subpage compression cases.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 2bca8eb ]

Currently for subpage (sector size < page size) cases, we reuse subpage
locked bitmap to find out all delalloc ranges we have locked, and run
all those found ranges.

However such reuse is not perfect, e.g.:

    0       32K      64K      96K       128K
    |       |////////||///////|    |////|
                                   120K

For above range, writepage_delalloc() for page 0 will handle the range
[32K, 96k), note delalloc range can be beyond the page boundary.

But writepage_delalloc() for page 64K will only handle range [120K,
128K), as the previous run on page 0 has already handled range [64K,
96K).
Meanwhile for the writeback we should expect range [64K, 96K) to also be
locked, this leads to the mismatch from locked bitmap and delalloc
range.

This is not causing problems yet, but it's still an inconsistent
behavior.

So instead of relying on the subpage locked bitmap, move the delalloc
range search using local @delalloc_bitmap, so that we can remove the
existing btrfs_folio_find_writer_locked().

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit c96d0e3 ]

Currently we only mark sectors as locked if there is a *NEW* delalloc
range for it.

But NEW delalloc range is not the same as dirty sectors we want to
submit, e.g:

        0       32K      64K      96K       128K
        |       |////////||///////|    |////|
                                       120K

For above 64K page size case, writepage_delalloc() for page 0 will find
and lock the delalloc range [32K, 96K), which is beyond the page
boundary.

Then when writepage_delalloc() is called for the page 64K, since [64K,
96K) is already locked, only [120K, 128K) will be locked.

This means, although range [64K, 96K) is dirty and will be submitted
later by extent_writepage_io(), it will not be marked as locked.

This is fine for now, as we call btrfs_folio_end_writer_lock_bitmap() to
free every non-compressed sector, and compression is only allowed for
full page range.

But this is not safe for future sector perfect compression support, as
this can lead to double folio unlock:

              Thread A                 |           Thread B
---------------------------------------+--------------------------------
                                       | submit_one_async_extent()
				       | |- extent_clear_unlock_delalloc()
extent_writepage()                     |    |- btrfs_folio_end_writer_lock()
|- btrfs_folio_end_writer_lock_bitmap()|       |- btrfs_subpage_end_and_test_writer()
   |                                   |       |  |- atomic_sub_and_test()
   |                                   |       |     /* Now the atomic value is 0 */
   |- if (atomic_read() == 0)          |       |
   |- folio_unlock()                   |       |- folio_unlock()

The root cause is the above range [64K, 96K) is dirtied and should also
be locked but it isn't.

So to make everything more consistent and prepare for the incoming
sector perfect compression, mark all dirty sectors as locked.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 8511074 ]

This function is not really suitable to lock a folio, as it lacks the
proper mapping checks, thus the locked folio may not even belong to
btrfs.

And due to the above reason, the last user inside lock_delalloc_folios()
is already removed, and we can remove this function.

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 336e69f ]

Since commit d7172f5 ("btrfs: use per-buffer locking for
extent_buffer reading"), metadata read no longer relies on the subpage
reader locking.

This means we do not need to maintain a different metadata/data split
for locking, so we can convert the existing reader lock users by:

- add_ra_bio_pages()
  Convert to btrfs_folio_set_writer_lock()

- end_folio_read()
  Convert to btrfs_folio_end_writer_lock()

- begin_folio_read()
  Convert to btrfs_folio_set_writer_lock()

- folio_range_has_eb()
  Remove the subpage->readers checks, since it is always 0.

- Remove btrfs_subpage_start_reader() and btrfs_subpage_end_reader()

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 0f71202 ]

Since there is no user of reader locks, rename the writer locks into a
more generic name, by removing the "_writer" part from the name.

And also rename btrfs_subpage::writer into btrfs_subpage::locked.

Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 011a9a1 ]

As extent_writepage() is internal helper we should use our inode type,
so change it from struct inode.

Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 72dad8e ]

[BUG]
When running btrfs with block size (4K) smaller than page size (64K,
aarch64), there is a very high chance to crash the kernel at
generic/750, with the following messages:
(before the call traces, there are 3 extra debug messages added)

  BTRFS warning (device dm-3): read-write for sector size 4096 with page size 65536 is experimental
  BTRFS info (device dm-3): checking UUID tree
  hrtimer: interrupt took 5451385 ns
  BTRFS error (device dm-3): cow_file_range failed, root=4957 inode=257 start=1605632 len=69632: -28
  BTRFS error (device dm-3): run_delalloc_nocow failed, root=4957 inode=257 start=1605632 len=69632: -28
  BTRFS error (device dm-3): failed to run delalloc range, root=4957 ino=257 folio=1572864 submit_bitmap=8-15 start=1605632 len=69632: -28
  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 3020984 at ordered-data.c:360 can_finish_ordered_extent+0x370/0x3b8 [btrfs]
  CPU: 2 UID: 0 PID: 3020984 Comm: kworker/u24:1 Tainted: G           OE      6.13.0-rc1-custom+ #89
  Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
  Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
  pc : can_finish_ordered_extent+0x370/0x3b8 [btrfs]
  lr : can_finish_ordered_extent+0x1ec/0x3b8 [btrfs]
  Call trace:
   can_finish_ordered_extent+0x370/0x3b8 [btrfs] (P)
   can_finish_ordered_extent+0x1ec/0x3b8 [btrfs] (L)
   btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
   extent_writepage+0x10c/0x3b8 [btrfs]
   extent_write_cache_pages+0x21c/0x4e8 [btrfs]
   btrfs_writepages+0x94/0x160 [btrfs]
   do_writepages+0x74/0x190
   filemap_fdatawrite_wbc+0x74/0xa0
   start_delalloc_inodes+0x17c/0x3b0 [btrfs]
   btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
   shrink_delalloc+0x11c/0x280 [btrfs]
   flush_space+0x288/0x328 [btrfs]
   btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
   process_one_work+0x228/0x680
   worker_thread+0x1bc/0x360
   kthread+0x100/0x118
   ret_from_fork+0x10/0x20
  ---[ end trace 0000000000000000 ]---
  BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1605632 OE len=16384 to_dec=16384 left=0
  BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1622016 OE len=12288 to_dec=12288 left=0
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
  BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1634304 OE len=8192 to_dec=4096 left=0
  CPU: 1 UID: 0 PID: 3286940 Comm: kworker/u24:3 Tainted: G        W  OE      6.13.0-rc1-custom+ #89
  Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
  Workqueue:  btrfs_work_helper [btrfs] (btrfs-endio-write)
  pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : process_one_work+0x110/0x680
  lr : worker_thread+0x1bc/0x360
  Call trace:
   process_one_work+0x110/0x680 (P)
   worker_thread+0x1bc/0x360 (L)
   worker_thread+0x1bc/0x360
   kthread+0x100/0x118
   ret_from_fork+0x10/0x20
  Code: f84086a1 f9000fe1 53041c21 b9003361 (f9400661)
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Oops: Fatal exception
  SMP: stopping secondary CPUs
  SMP: failed to stop secondary CPUs 2-3
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Kernel Offset: 0x275bb9540000 from 0xffff800080000000
  PHYS_OFFSET: 0xffff8fbba0000000
  CPU features: 0x100,00000070,00801250,8201720b

[CAUSE]
The above warning is triggered immediately after the delalloc range
failure, this happens in the following sequence:

- Range [1568K, 1636K) is dirty

   1536K  1568K     1600K    1636K  1664K
   |      |/////////|////////|      |

  Where 1536K, 1600K and 1664K are page boundaries (64K page size)

- Enter extent_writepage() for page 1536K

- Enter run_delalloc_nocow() with locked page 1536K and range
  [1568K, 1636K)
  This is due to the inode having preallocated extents.

- Enter cow_file_range() with locked page 1536K and range
  [1568K, 1636K)

- btrfs_reserve_extent() only reserved two extents
  The main loop of cow_file_range() only reserved two data extents,

  Now we have:

   1536K  1568K        1600K    1636K  1664K
   |      |<-->|<--->|/|///////|      |
               1584K  1596K
  Range [1568K, 1596K) has an ordered extent reserved.

- btrfs_reserve_extent() failed inside cow_file_range() for file offset
  1596K
  This is already a bug in our space reservation code, but for now let's
  focus on the error handling path.

  Now cow_file_range() returned -ENOSPC.

- btrfs_run_delalloc_range() do error cleanup <<< ROOT CAUSE
  Call btrfs_cleanup_ordered_extents() with locked folio 1536K and range
  [1568K, 1636K)

  Function btrfs_cleanup_ordered_extents() normally needs to skip the
  ranges inside the folio, as it will normally be cleaned up by
  extent_writepage().

  Such split error handling is already problematic in the first place.

  What's worse is the folio range skipping itself, which is not taking
  subpage cases into consideration at all, it will only skip the range
  if the page start >= the range start.
  In our case, the page start < the range start, since for subpage cases
  we can have delalloc ranges inside the folio but not covering the
  folio.

  So it doesn't skip the page range at all.
  This means all the ordered extents, both [1568K, 1584K) and
  [1584K, 1596K) will be marked as IOERR.

  And these two ordered extents have no more pending ios, they are marked
  finished, and *QUEUED* to be deleted from the io tree.

- extent_writepage() do error cleanup
  Call btrfs_mark_ordered_io_finished() for the range [1536K, 1600K).

  Although ranges [1568K, 1584K) and [1584K, 1596K) are finished, the
  deletion from io tree is async, it may or may not happen at this
  time.

  If the ranges have not yet been removed, we will do double cleaning on
  those ranges, triggering the above ordered extent warnings.

In theory there are other bugs, like the cleanup in extent_writepage()
can cause double accounting on ranges that are submitted asynchronously
(compression for example).

But that's much harder to trigger because normally we do not mix regular
and compression delalloc ranges.

[FIX]
The folio range split is already buggy and not subpage compatible, it
was introduced a long time ago where subpage support was not even considered.

So instead of splitting the ordered extents cleanup into the folio range
and out of folio range, do all the cleanup inside writepage_delalloc().

- Pass @null as locked_folio for btrfs_cleanup_ordered_extents() in
  btrfs_run_delalloc_range()

- Skip the btrfs_cleanup_ordered_extents() if writepage_delalloc()
  failed

  So all ordered extents are only cleaned up by
  btrfs_run_delalloc_range().

- Handle the ranges that already have ordered extents allocated
  If part of the folio already has ordered extent allocated, and
  btrfs_run_delalloc_range() failed, we also need to cleanup that range.

Now we have a concentrated error handling for ordered extents during
btrfs_run_delalloc_range().

Fixes: d1051d6 ("btrfs: Fix error handling in btrfs_cleanup_ordered_extents")
CC: [email protected] # 5.15+
Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Stable-dep-of: 8bf334b ("btrfs: fix double accounting race when extent_writepage_io() failed")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 8bf334b ]

[BUG]
If submit_one_sector() failed inside extent_writepage_io() for sector
size < page size cases (e.g. 4K sector size and 64K page size), then
we can hit double ordered extent accounting error.

This should be very rare, as submit_one_sector() only fails when we
failed to grab the extent map, and such extent map should exist inside
the memory and has been pinned.

[CAUSE]
For example we have the following folio layout:

    0  4K          32K    48K   60K 64K
    |//|           |//////|     |///|

Where |///| is the dirty range we need to writeback. The 3 different
dirty ranges are submitted for regular COW.

Now we hit the following sequence:

- submit_one_sector() returned 0 for [0, 4K)

- submit_one_sector() returned 0 for [32K, 48K)

- submit_one_sector() returned error for [60K, 64K)

- btrfs_mark_ordered_io_finished() called for the whole folio
  This will mark the following ranges as finished:
  * [0, 4K)
  * [32K, 48K)
    Both ranges have their IO already submitted, this cleanup will
    lead to double accounting.

  * [60K, 64K)
    That's the correct cleanup.

The only good news is, this error is only theoretical, as the target
extent map is always pinned, thus we should directly grab it from
memory, other than reading it from the disk.

[FIX]
Instead of calling btrfs_mark_ordered_io_finished() for the whole folio
range, which can touch ranges we should not touch, instead
move the error handling inside extent_writepage_io().

So that we can cleanup exact sectors that ought to be submitted but failed.

This provides much more accurate cleanup, avoiding the double accounting.

CC: [email protected] # 5.15+
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d91060e ]

Access KVM's emulated APIC base MSR value directly instead of bouncing
through a helper, as there is no reason to add a layer of indirection, and
there are other MSRs with a "set" but no "get", e.g. EFER.

No functional change intended.

Reviewed-by: Kai Huang <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Stable-dep-of: 04bc93c ("KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit adfec1f ]

Inline kvm_get_apic_mode() in lapic.h to avoid a CALL+RET as well as an
export.  The underlying kvm_apic_mode() helper is public information, i.e.
there is no state/information that needs to be hidden from vendor modules.

No functional change intended.

Reviewed-by: Kai Huang <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Stable-dep-of: 04bc93c ("KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 04bc93c ]

If KVM emulates an EOI for L1's virtual APIC while L2 is active, defer
updating GUEST_INTERUPT_STATUS.SVI, i.e. the VMCS's cache of the highest
in-service IRQ, until L1 is active, as vmcs01, not vmcs02, needs to track
vISR.  The missed SVI update for vmcs01 can result in L1 interrupts being
incorrectly blocked, e.g. if there is a pending interrupt with lower
priority than the interrupt that was EOI'd.

This bug only affects use cases where L1's vAPIC is effectively passed
through to L2, e.g. in a pKVM scenario where L2 is L1's depriveleged host,
as KVM will only emulate an EOI for L1's vAPIC if Virtual Interrupt
Delivery (VID) is disabled in vmc12, and L1 isn't intercepting L2 accesses
to its (virtual) APIC page (or if x2APIC is enabled, the EOI MSR).

WARN() if KVM updates L1's ISR while L2 is active with VID enabled, as an
EOI from L2 is supposed to affect L2's vAPIC, but still defer the update,
to try to keep L1 alive.  Specifically, KVM forwards all APICv-related
VM-Exits to L1 via nested_vmx_l1_wants_exit():

	case EXIT_REASON_APIC_ACCESS:
	case EXIT_REASON_APIC_WRITE:
	case EXIT_REASON_EOI_INDUCED:
		/*
		 * The controls for "virtualize APIC accesses," "APIC-
		 * register virtualization," and "virtual-interrupt
		 * delivery" only come from vmcs12.
		 */
		return true;

Fixes: c7c9c56 ("x86, apicv: add virtual interrupt delivery support")
Cc: [email protected]
Link: https://lore.kernel.org/kvm/[email protected]
Reported-by: Markku Ahvenjärvi <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Cc: Janne Karhunen <[email protected]>
Signed-off-by: Chao Gao <[email protected]>
[sean: drop request, handle in VMX, write changelog]
Tested-by: Chao Gao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit b042004 ]

[Why]
For Header related changes for core

[How]
Refactoring if and endif statements to enable DC_LOGGER

Reviewed-by: Mounika Adhuri <[email protected]>
Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Lohita Mudimela <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Stable-dep-of: f88192d ("drm/amd/display: Correct register address in dcn35")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit a1fc283 ]

[why]
hw register offset delta

Reviewed-by: Martin Leung <[email protected]>
Signed-off-by: Charlene Liu <[email protected]>
Signed-off-by: Aurabindo Pillai <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Stable-dep-of: f88192d ("drm/amd/display: Correct register address in dcn35")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit f88192d ]

[Why]
the offset address of mmCLK5_spll_field_8 was incorrect for dcn35
which causes SSC not to be enabled.

Reviewed-by: Charlene Liu <[email protected]>
Signed-off-by: Lo-An Chen <[email protected]>
Signed-off-by: Zaeem Mohamed <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit a4c5a46 ]

Different connectivity boards may be attached to the same platform. For
example, QCA6698-based boards can support either a two-antenna or
three-antenna solution, both of which work on the sa8775p-ride platform.
Due to differences in connectivity boards and variations in RF
performance from different foundries, different NVM configurations are
used based on the board ID.

Therefore, in the firmware-name property, if the NVM file has an
extension, the NVM file will be used. Otherwise, the system will first
try the .bNN (board ID) file, and if that fails, it will fall back to
the .bin file.

Possible configurations:
firmware-name = "QCA6698/hpnv21";
firmware-name = "QCA6698/hpnv21.bin";

Signed-off-by: Cheng Jiang <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Stable-dep-of: a2fad24 ("Bluetooth: qca: Fix poor RF performance for WCN6855")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit a2fad24 ]

For WCN6855, board ID specific NVM needs to be downloaded once board ID
is available, but the default NVM is always downloaded currently.

The wrong NVM causes poor RF performance, and effects user experience
for several types of laptop with WCN6855 on the market.

Fix by downloading board ID specific NVM if board ID is available.

Fixes: 095327f ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6855")
Cc: [email protected] # 6.4
Signed-off-by: Zijun Hu <[email protected]>
Tested-by: Johan Hovold <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
Tested-by: Steev Klimaszewski <[email protected]> #Thinkpad X13s
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
…ports

[ Upstream commit 0e45a09 ]

serio_pause_rx() and serio_continue_rx() are usually used together to
temporarily stop receiving interrupts/data for a given serio port.
Define "serio_pause_rx" guard for this so that the port is always
resumed once critical section is over.

Example:

	scoped_guard(serio_pause_rx, elo->serio) {
		elo->expected_packet = toupper(packet[0]);
		init_completion(&elo->cmd_done);
	}

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
Stable-dep-of: 08bd5b7 ("Input: synaptics - fix crash when enabling pass-through port")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 08bd5b7 ]

When enabling a pass-through port an interrupt might come before psmouse
driver binds to the pass-through port. However synaptics sub-driver
tries to access psmouse instance presumably associated with the
pass-through port to figure out if only 1 byte of response or entire
protocol packet needs to be forwarded to the pass-through port and may
crash if psmouse instance has not been attached to the port yet.

Fix the crash by introducing open() and close() methods for the port and
check if the port is open before trying to access psmouse instance.
Because psmouse calls serio_open() only after attaching psmouse instance
to serio port instance this prevents the potential crash.

Reported-by: Takashi Iwai <[email protected]>
Fixes: 100e169 ("Input: libps2 - attach ps2dev instances as serio port's drvdata")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1219522
Cc: [email protected]
Reviewed-by: Takashi Iwai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 541011d ]

The stop trigger invokes rz_ssi_stop() and rz_ssi_stream_quit().
- The purpose of rz_ssi_stop() is to disable TX/RX, terminate DMA
  transactions, and set the controller to idle.
- The purpose of rz_ssi_stream_quit() is to reset the substream-specific
  software data by setting strm->running and strm->substream appropriately.

The function rz_ssi_is_stream_running() checks if both strm->substream and
strm->running are valid and returns true if so. Its implementation is as
follows:

static inline bool rz_ssi_is_stream_running(struct rz_ssi_stream *strm)
{
    return strm->substream && strm->running;
}

When the controller is configured in full-duplex mode (with both playback
and capture active), the rz_ssi_stop() function does not modify the
controller settings when called for the first substream in the full-duplex
setup. Instead, it simply sets strm->running = 0 and returns if the
companion substream is still running. The following code illustrates this:

static int rz_ssi_stop(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm)
{
    strm->running = 0;

    if (rz_ssi_is_stream_running(&ssi->playback) ||
        rz_ssi_is_stream_running(&ssi->capture))
        return 0;

    // ...
}

The controller settings, along with the DMA termination (for the last
stopped substream), are only applied when the last substream in the
full-duplex setup is stopped.

While applying the controller settings only when the last substream stops
is not problematic, terminating the DMA operations for only one substream
causes failures when starting and stopping full-duplex operations multiple
times in a loop.

To address this issue, call dmaengine_terminate_async() for both substreams
involved in the full-duplex setup when the last substream in the setup is
stopped.

Fixes: 4f8cd05 ("ASoC: sh: rz-ssi: Add full duplex support")
Cc: [email protected]
Reviewed-by: Biju Das <[email protected]>
Signed-off-by: Claudiu Beznea <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 82a0a3e ]

My static checker rule complains about this code.  The concern is that
if "sample_space" is negative then the "sample_space >= runtime->channels"
condition will not work as intended because it will be type promoted to a
high unsigned int value.

strm->fifo_sample_size is SSI_FIFO_DEPTH (32).  The SSIFSR_TDC_MASK is
0x3f.  Without any further context it does seem like a reasonable warning
and it can't hurt to add a check for negatives.

Cc: [email protected]
Fixes: 03e786b ("ASoC: sh: Add RZ/G2L SSIF-2 driver")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d9d959c ]

In order to remove the deprecated function
pcim_iomap_regions_request_all(), a few drivers need an interface to
request all BARs a PCI device offers.

Make pcim_request_all_regions() a public interface.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Philipp Stanner <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Stable-dep-of: d555ed4 ("PCI: Restore original INTX_DISABLE bit by pcim_intx()")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit f546e80 ]

pci_intx() is a hybrid function which sometimes performs devres operations,
depending on whether pcim_enable_device() has been used to enable the
pci_dev. This sometimes-managed nature of the function is problematic.
Notably, it causes the function to allocate under some circumstances which
makes it unusable from interrupt context.

Export pcim_intx() (which is always managed) and rename __pcim_intx()
(which is never managed) to pci_intx_unmanaged() and export it as well.

Then all callers of pci_intx() can be ported to the version they need,
depending whether they use pci_enable_device() or pcim_enable_device().

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Philipp Stanner <[email protected]>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Stable-dep-of: d555ed4 ("PCI: Restore original INTX_DISABLE bit by pcim_intx()")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit dfa2f4d ]

pci_intx() is a hybrid function which can sometimes be managed through
devres. This hybrid nature is undesirable.

Since all users of pci_intx() have by now been ported either to
always-managed pcim_intx() or never-managed pci_intx_unmanaged(), the
devres functionality can be removed from pci_intx().

Consequently, pci_intx_unmanaged() is now redundant, because pci_intx()
itself is now unmanaged.

Remove the devres functionality from pci_intx(). Have all users of
pci_intx_unmanaged() call pci_intx(). Remove pci_intx_unmanaged().

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Philipp Stanner <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Stable-dep-of: d555ed4 ("PCI: Restore original INTX_DISABLE bit by pcim_intx()")
Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d555ed4 ]

pcim_intx() tries to restore the INTx bit at removal via devres, but there
is a chance that it restores a wrong value.

Because the value to be restored is blindly assumed to be the negative of
the enable argument, when a driver calls pcim_intx() unnecessarily for the
already enabled state, it'll restore to the disabled state in turn.  That
is, the function assumes the case like:

  // INTx == 1
  pcim_intx(pdev, 0); // old INTx value assumed to be 1 -> correct

but it might be like the following, too:

  // INTx == 0
  pcim_intx(pdev, 0); // old INTx value assumed to be 1 -> wrong

Also, when a driver calls pcim_intx() multiple times with different enable
argument values, the last one will win no matter what value it is.  This
can lead to inconsistency, e.g.

  // INTx == 1
  pcim_intx(pdev, 0); // OK
  ...
  pcim_intx(pdev, 1); // now old INTx wrongly assumed to be 0

This patch addresses those inconsistencies by saving the original INTx
state at the first pcim_intx() call.  For that, get_or_create_intx_devres()
is folded into pcim_intx() caller side; it allows us to simply check the
already allocated devres and record the original INTx along with the
devres_alloc() call.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 25216af ("PCI: Add managed pcim_intx()")
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Philipp Stanner <[email protected]>
Cc: [email protected]	# v6.11+
Signed-off-by: Sasha Levin <[email protected]>
Amit Kumar Mahapatra and others added 21 commits February 27, 2025 04:30
commit 539bd20 upstream.

'commit 18bcb4a ("mtd: spi-nor: sst: Factor out common write operation
to `sst_nor_write_data()`")' introduced a bug where only one byte of data
is written, regardless of the number of bytes passed to
sst_nor_write_data(), causing a kernel crash during the write operation.
Ensure the correct number of bytes are written as passed to
sst_nor_write_data().

Call trace:
[   57.400180] ------------[ cut here ]------------
[   57.404842] While writing 2 byte written 1 bytes
[   57.409493] WARNING: CPU: 0 PID: 737 at drivers/mtd/spi-nor/sst.c:187 sst_nor_write_data+0x6c/0x74
[   57.418464] Modules linked in:
[   57.421517] CPU: 0 UID: 0 PID: 737 Comm: mtd_debug Not tainted 6.12.0-g5ad04afd91f9 #30
[   57.429517] Hardware name: Xilinx Versal A2197 Processor board revA - x-prc-02 revA (DT)
[   57.437600] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   57.444557] pc : sst_nor_write_data+0x6c/0x74
[   57.448911] lr : sst_nor_write_data+0x6c/0x74
[   57.453264] sp : ffff80008232bb40
[   57.456570] x29: ffff80008232bb40 x28: 0000000000010000 x27: 0000000000000001
[   57.463708] x26: 000000000000ffff x25: 0000000000000000 x24: 0000000000000000
[   57.470843] x23: 0000000000010000 x22: ffff80008232bbf0 x21: ffff000816230000
[   57.477978] x20: ffff0008056c0080 x19: 0000000000000002 x18: 0000000000000006
[   57.485112] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80008232b580
[   57.492246] x14: 0000000000000000 x13: ffff8000816d1530 x12: 00000000000004a4
[   57.499380] x11: 000000000000018c x10: ffff8000816fd530 x9 : ffff8000816d1530
[   57.506515] x8 : 00000000fffff7ff x7 : ffff8000816fd530 x6 : 0000000000000001
[   57.513649] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   57.520782] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0008049b0000
[   57.527916] Call trace:
[   57.530354]  sst_nor_write_data+0x6c/0x74
[   57.534361]  sst_nor_write+0xb4/0x18c
[   57.538019]  mtd_write_oob_std+0x7c/0x88
[   57.541941]  mtd_write_oob+0x70/0xbc
[   57.545511]  mtd_write+0x68/0xa8
[   57.548733]  mtdchar_write+0x10c/0x290
[   57.552477]  vfs_write+0xb4/0x3a8
[   57.555791]  ksys_write+0x74/0x10c
[   57.559189]  __arm64_sys_write+0x1c/0x28
[   57.563109]  invoke_syscall+0x54/0x11c
[   57.566856]  el0_svc_common.constprop.0+0xc0/0xe0
[   57.571557]  do_el0_svc+0x1c/0x28
[   57.574868]  el0_svc+0x30/0xcc
[   57.577921]  el0t_64_sync_handler+0x120/0x12c
[   57.582276]  el0t_64_sync+0x190/0x194
[   57.585933] ---[ end trace 0000000000000000 ]---

Cc: [email protected]
Fixes: 18bcb4a ("mtd: spi-nor: sst: Factor out common write operation to `sst_nor_write_data()`")
Signed-off-by: Amit Kumar Mahapatra <[email protected]>
Reviewed-by: Pratyush Yadav <[email protected]>
Reviewed-by: Tudor Ambarus <[email protected]>
Reviewed-by: Bence Csókás <[email protected]>
[[email protected]: add Cc stable tag]
Signed-off-by: Pratyush Yadav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 2b9df00 upstream.

Replace dma_request_channel() with dma_request_chan_by_mask() and use
helper functions to return proper error code instead of fixed -EBUSY.

Fixes: ec4ba01 ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: [email protected]
Signed-off-by: Niravkumar L Rabara <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit d76d22b upstream.

Remap the slave DMA I/O resources to enhance driver portability.
Using a physical address causes DMA translation failure when the
ARM SMMU is enabled.

Fixes: ec4ba01 ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: [email protected]
Signed-off-by: Niravkumar L Rabara <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f37d135 upstream.

dma_map_single is using physical/bus device (DMA) but dma_unmap_single
is using framework device(NAND controller), which is incorrect.
Fixed dma_unmap_single to use correct physical/bus device.

Fixes: ec4ba01 ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: [email protected]
Signed-off-by: Niravkumar L Rabara <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 782cffe upstream.

According to the latest event list, update the event constraint tables
for Lion Cove core.

The general rule (the event codes < 0x90 are restricted to counters
0-3.) has been removed. There is no restriction for most of the
performance monitoring events.

Fixes: a932aa0 ("perf/x86: Add Lunar Lake and Arrow Lake support")
Reported-by: Amiri Khalil <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4cb7779 upstream.

Christoph reports that their rk3399 system dies since commit 773c05f
("irqchip/gic-v3: Work around insecure GIC integrations").

It appears that some rk3399 have secure payloads, and that the firmware
sets SCR_EL3.FIQ==1. Obivously, disabling security in that configuration
leads to even more problems.

Revisit the workaround by:

  - making it rk3399 specific
  - checking whether Group-0 is available, which is a good proxy
    for SCR_EL3.FIQ being 0
  - either apply the workaround if Group-0 is available, or disable
    pseudo-NMIs if not

Note that this doesn't mean that the secure side is able to receive
interrupts, as all interrupts are made non-secure anyway.

Clearly, nobody ever tested secure interrupts on this platform.

Fixes: 773c05f ("irqchip/gic-v3: Work around insecure GIC integrations")
Reported-by: Christoph Fritz <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Christoph Fritz <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/[email protected]
Closes: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 860ca5e upstream.

Add check for the return value of cifs_buf_get() and cifs_small_buf_get()
in receive_encrypted_standard() to prevent null pointer dereference.

Fixes: eec04ea ("smb: client: fix OOB in receive_encrypted_standard()")
Cc: [email protected]
Signed-off-by: Haoxiang Li <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit c158647 upstream.

The previous implementation incorrectly configured the cmn_interrupt_2_enable
register for interrupt handling. Using cmn_interrupt_2_enable to configure
Tag, Data RAM ECC interrupts would lead to issues like double handling of the
interrupts (EL1 and EL3) as cmn_interrupt_2_enable is meant to be configured
for interrupts which needs to be handled by EL3.

EL1 LLCC EDAC driver needs to use cmn_interrupt_0_enable register to configure
Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable.

Fixes: 2745065 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Signed-off-by: Komal Bajaj <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 57b76be upstream.

The function tracer should record the preemption level at the point when
the function is invoked. If the tracing subsystem decrement the
preemption counter it needs to correct this before feeding the data into
the trace buffer. This was broken in the commit cited below while
shifting the preempt-disabled section.

Use tracing_gen_ctx_dec() which properly subtracts one from the
preemption counter on a preemptible kernel.

Cc: [email protected]
Cc: Wander Lairson Costa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: ce5e480 ("ftrace: disable preemption when recursion locked")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 38b1406 upstream.

Function graph uses a subops and manager ops mechanism to attach to
ftrace.  The manager ops connects to ftrace and the functions it connects
to is defined by a list of subops that it manages.

The function hash that defines what the above ops attaches to limits the
functions to attach if the hash has any content. If the hash is empty, it
means to trace all functions.

The creation of the manager ops hash is done by iterating over all the
subops hashes. If any of the subops hashes is empty, it means that the
manager ops hash must trace all functions as well.

The issue is in the creation of the manager ops. When a second subops is
attached, a new hash is created by starting it as NULL and adding the
subops one at a time. But the NULL ops is mistaken as an empty hash, and
once an empty hash is found, it stops the loop of subops and just enables
all functions.

  # echo "f:myevent1 kernel_clone" >> /sys/kernel/tracing/dynamic_events
  # cat /sys/kernel/tracing/enabled_functions
kernel_clone (1)           	tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60

  # echo "f:myevent2 schedule_timeout" >> /sys/kernel/tracing/dynamic_events
  # cat /sys/kernel/tracing/enabled_functions
trace_initcall_start_cb (1)             tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
run_init_process (1)            tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
try_to_run_init_process (1)             tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
x86_pmu_show_pmu_cap (1)                tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
cleanup_rapl_pmus (1)                   tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_free_pcibus_map (1)              tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_types_exit (1)                   tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_pci_exit.part.0 (1)              tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
kvm_shutdown (1)                tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
vmx_dump_msrs (1)               tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
vmx_cleanup_l1d_flush (1)               tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
[..]

Fix this by initializing the new hash to NULL and if the hash is NULL do
not treat it as an empty hash but instead allocate by copying the content
of the first sub ops. Then on subsequent iterations, the new hash will not
be NULL, but the content of the previous subops. If that first subops
attached to all functions, then new hash may assume that the manager ops
also needs to attach to all functions.

Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: 5fccc75 ("ftrace: Add subops logic to allow one ops to manage many")
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 8eb4b09 upstream.

Check if a function is already in the manager ops of a subops. A manager
ops contains multiple subops, and if two or more subops are tracing the
same function, the manager ops only needs a single entry in its hash.

Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Link: https://lore.kernel.org/[email protected]
Fixes: 4f554e9 ("ftrace: Add ftrace_set_filter_ips function")
Tested-by: Heiko Carstens <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 22bec11 upstream.

When the function tracing_set_tracer() switched over to using the guard()
infrastructure, it did not need to save the 'ret' variable and would just
return the value when an error arised, instead of setting ret and jumping
to an out label.

When CONFIG_TRACER_SNAPSHOT is enabled, it had code that expected the
"ret" variable to be initialized to zero and had set 'ret' while holding
an arch_spin_lock() (not used by guard), and then upon releasing the lock
it would check 'ret' and exit if set. But because ret was only set when an
error occurred while holding the locks, 'ret' would be used uninitialized
if there was no error. The code in the CONFIG_TRACER_SNAPSHOT block should
be self contain. Make sure 'ret' is also set when no error occurred.

Cc: Mathieu Desnoyers <[email protected]>
Link: https://lore.kernel.org/[email protected]
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Fixes: d33b10c ("tracing: Switch trace.c code over to use guard()")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 488fb6e upstream.

Fix a deadlock in pse_pi_get_current_limit and pse_pi_set_current_limit
caused by consecutive mutex_lock calls. One in the function itself and
another in pse_pi_get_voltage.

Resolve the issue by using the unlocked version of pse_pi_get_voltage
instead.

Fixes: e0a5e2b ("net: pse-pd: Use power limit at driver side instead of current limit")
Signed-off-by: Kory Maincent <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
… task_can_run_on_remote_rq()

commit f3f08c3 upstream.

While fixing migration disabled task handling, 3296682 ("sched_ext: Fix
migration disabled handling in targeted dispatches") assumed that a
migration disabled task's ->cpus_ptr would only have the pinned CPU. While
this is eventually true for migration disabled tasks that are switched out,
->cpus_ptr update is performed by migrate_disable_switch() which is called
right before context_switch() in __scheduler(). However, the task is
enqueued earlier during pick_next_task() via put_prev_task_scx(), so there
is a race window where another CPU can see the task on a DSQ.

If the CPU tries to dispatch the migration disabled task while in that
window, task_allowed_on_cpu() will succeed and task_can_run_on_remote_rq()
will subsequently trigger SCHED_WARN(is_migration_disabled()).

  WARNING: CPU: 8 PID: 1837 at kernel/sched/ext.c:2466 task_can_run_on_remote_rq+0x12e/0x140
  Sched_ext: layered (enabled+all), task: runnable_at=-10ms
  RIP: 0010:task_can_run_on_remote_rq+0x12e/0x140
  ...
   <TASK>
   consume_dispatch_q+0xab/0x220
   scx_bpf_dsq_move_to_local+0x58/0xd0
   bpf_prog_84dd17b0654b6cf0_layered_dispatch+0x290/0x1cfa
   bpf__sched_ext_ops_dispatch+0x4b/0xab
   balance_one+0x1fe/0x3b0
   balance_scx+0x61/0x1d0
   prev_balance+0x46/0xc0
   __pick_next_task+0x73/0x1c0
   __schedule+0x206/0x1730
   schedule+0x3a/0x160
   __do_sys_sched_yield+0xe/0x20
   do_syscall_64+0xbb/0x1e0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fix it by converting the SCHED_WARN() back to a regular failure path. Also,
perform the migration disabled test before task_allowed_on_cpu() test so
that BPF schedulers which fail to handle migration disabled tasks can be
noticed easily.

While at it, adjust scx_ops_error() message for !task_allowed_on_cpu() case
for brevity and consistency.

Signed-off-by: Tejun Heo <[email protected]>
Fixes: 3296682 ("sched_ext: Fix migration disabled handling in targeted dispatches")
Acked-by: Andrea Righi <[email protected]>
Reported-by: Jake Hillion <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4603618 upstream.

The mm kselftests are currently built with no optimisation (-O0).  It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended.  Let's build all the tests with
-O2.

[[email protected]: silence unused-result warnings]
  Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kevin Brodsky <[email protected]>
Cc: Aruna Ramakrishna <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Joey Gouly <[email protected]>
Cc: Keith Lucas <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
[ Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
  which is also in linux-6.12.y, thus backport this commit. ]
Signed-off-by: Yifei Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
… plus lts

commit a6a7cba upstream.

In general the delay should be added by the PHY instead of the MAC,
and this improves network stability on some boards which seem to
need different delay.

Fixes: 387b3bb ("arm64: dts: rockchip: Add Xunlong OrangePi R1 Plus LTS")
Cc: [email protected] # 6.6+
Signed-off-by: Tianling Shen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>
[Fix conflicts due to missing dtsi conversion]
Signed-off-by: Tianling Shen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit b35eb91 upstream.

When mesa started using compute queues more often
we started seeing additional hangs with compute queues.
Disabling gfxoff seems to mitigate that.  Manually
control gfxoff and gfx pg with command submissions to avoid
any issues related to gfxoff.  KFD already does the same
thing for these chips.

v2: limit to compute
v3: limit to APUs
v4: limit to Raven/PCO
v5: only update the compute ring_funcs
v6: Disable GFX PG
v7: adjust order

Reviewed-by: Lijo Lazar <[email protected]>
Suggested-by: Błażej Szczygieł <[email protected]>
Suggested-by: Sergey Kovalenko <[email protected]>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861
Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.12.x
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 55ed2b1 upstream.

Bump the driver version for RV/PCO compute stability fix
so mesa can use this check to enable compute queues on
RV/PCO.

Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.12.x
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: FLorian Fainelli <[email protected]>
Tested-by: Peter Schneider <[email protected]>
Tested-by: Shuah Khan <[email protected]>
Tested-by: Ron Economos <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Jon Hunter <[email protected]>
Tested-by: Harshit Mogalapalli <[email protected]>
Tested-by: Mark Brown <[email protected]>
Tested-by: Peter Schneider <[email protected]>
Tested-by: Markus Reichelt <[email protected]>
Tested-by: Slade Watkins <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Salvatore Bonaccorso <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
All configs dropped the follwoing since its dependent on ARCH_MVEBU
which is not configured on so there is no reason to ask.
 # CONFIG_CZNIC_PLATFORMS is not set
See upstream commit: dd0f05b

x86_64 configs also dropped a previously defined y config
 -CONFIG_IMX_SCMI_MISC_DRV=y
 This comes from firmware: imx: IMX_SCMI_MISC_DRV should depend on ARCH_MXC
See Upstream Commit: be6686b
Whats a little confusing is that the fedora kernel-ark says that this is
marked as a `y`
$ cat redhat/configs/rhel/generic/CONFIG_IMX_SCMI_MISC_DRV
CONFIG_IMX_SCMI_MISC_DRV=y

[kernel-ark]$ ls redhat/configs/kernel-6.13.8-x86_64*
redhat/configs/kernel-6.13.8-x86_64-automotive.config
redhat/configs/kernel-6.13.8-x86_64.config
redhat/configs/kernel-6.13.8-x86_64-rt.config
redhat/configs/kernel-6.13.8-x86_64-automotive-debug.config
redhat/configs/kernel-6.13.8-x86_64-debug.config
redhat/configs/kernel-6.13.8-x86_64-rt-debug.config

[kernel-ark]$ grep CONFIG_IMX_SCMI_MISC_DRV redhat/configs/kernel-6.13.8-x86_64*
[kernel-ark]$

Do to this we're leaving this as the default Kconfig of off for x86_64
Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥌

Copy link

@josephtate josephtate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🆗

@PlaidCat
Copy link
Collaborator Author

PlaidCat commented Apr 3, 2025

This did does not look like it will do what what we want.

I'm going to revert back to the previous rebase process and fall back to git push --force to do the update date.

This will keep our stuff directly dangling off the end.

@PlaidCat PlaidCat closed this Apr 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.