Commit 32b2253

Merge tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Core & fair scheduler changes:

   - Cancel the slice protection of the idle entity (Zihan Zhou)
   - Reduce the default slice to avoid tasks getting an extra tick (Zihan Zhou)
   - Force propagating min_slice of cfs_rq when {en,de}queue tasks (Tianchen Ding)
   - Refactor can_migrate_task() to eliminate looping (I Hsin Cheng)
   - Add unlikely branch hints to several system calls (Colin Ian King)
   - Optimize current_clr_polling() on certain architectures (Yujun Dong)

  Deadline scheduler: (Juri Lelli)

   - Remove redundant dl_clear_root_domain call
   - Move dl_rebuild_rd_accounting to cpuset.h

  Uclamp:

   - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan)
   - Optimize sched_uclamp_used static key enabling (Xuewen Yan)

  Scheduler topology support: (Juri Lelli)

   - Ignore special tasks when rebuilding domains
   - Add wrappers for sched_domains_mutex
   - Generalize unique visiting of root domains
   - Rebuild root domain accounting after every update
   - Remove partition_and_rebuild_sched_domains
   - Stop exposing partition_sched_domains_locked

  RSEQ: (Michael Jeanson)

   - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y
   - Fix segfault on registration when rseq_cs is non-zero
   - selftests: Add rseq syscall errors test
   - selftests: Ensure the rseq ABI TLS is actually 1024 bytes

  Membarriers:

   - Fix redundant load of membarrier_state (Nysal Jan K.A.)

  Scheduler debugging:

   - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior)
   - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar)

  Fixes and cleanups:

   - Always save/restore x86 TSC sched_clock() on suspend/resume (Guilherme G. Piccoli)
   - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian Andrzej Siewior)"

* tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling()
  sched/debug: Remove CONFIG_SCHED_DEBUG
  sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
  sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  rseq/selftests: Fix namespace collision with rseq UAPI header
  include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
  sched/topology: Stop exposing partition_sched_domains_locked
  cgroup/cpuset: Remove partition_and_rebuild_sched_domains
  sched/topology: Remove redundant dl_clear_root_domain call
  sched/deadline: Rebuild root domain accounting after every update
  sched/deadline: Generalize unique visiting of root domains
  sched/topology: Wrappers for sched_domains_mutex
  sched/deadline: Ignore special tasks when rebuilding domains
  tracing: Use preempt_model_str()
  xtensa: Rely on generic printing of preemption model
  x86: Rely on generic printing of preemption model
  s390: Rely on generic printing of preemption model
  ...
2 parents 5a658af + 3785c7d commit 32b2253

50 files changed, 589 additions and 441 deletions

Documentation/scheduler/sched-debug.rst

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 Scheduler debugfs
 =================

-Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+Booting a kernel with debugfs enabled will give access to
 scheduler specific debug files under /sys/kernel/debug/sched. Some of
 those files are described below.

Documentation/scheduler/sched-design-CFS.rst

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ picked and the current task is preempted.
 CFS uses nanosecond granularity accounting and does not rely on any jiffies or
 other HZ detail. Thus the CFS scheduler has no notion of "timeslices" in the
 way the previous scheduler had, and has no heuristics whatsoever. There is
-only one central tunable (you have to switch on CONFIG_SCHED_DEBUG):
+only one central tunable:

   /sys/kernel/debug/sched/base_slice_ns

Documentation/scheduler/sched-domains.rst

Lines changed: 2 additions & 3 deletions
@@ -73,9 +73,8 @@ Architectures may override the generic domain builder and the default SD flags
 for a given topology level by creating a sched_domain_topology_level array and
 calling set_sched_topology() with this array as the parameter.

-The sched-domains debugging infrastructure can be enabled by enabling
-CONFIG_SCHED_DEBUG and adding 'sched_verbose' to your cmdline. If you
-forgot to tweak your cmdline, you can also flip the
+The sched-domains debugging infrastructure can be enabled by 'sched_verbose'
+to your cmdline. If you forgot to tweak your cmdline, you can also flip the
 /sys/kernel/debug/sched/verbose knob. This enables an error checking parse of
 the sched domains which should catch most possible errors (described above). It
 also prints out the domain structure in a visual format.

Documentation/scheduler/sched-ext.rst

Lines changed: 1 addition & 2 deletions
@@ -107,8 +107,7 @@ detailed information:
   nr_rejected : 0
   enable_seq : 1

-If ``CONFIG_SCHED_DEBUG`` is set, whether a given task is on sched_ext can
-be determined as follows:
+Whether a given task is on sched_ext can be determined as follows:

 .. code-block:: none

Documentation/scheduler/sched-stats.rst

Lines changed: 1 addition & 1 deletion
@@ -88,7 +88,7 @@ One of these is produced per domain for each cpu described. (Note that if
 CONFIG_SMP is not defined, *no* domains are utilized and these lines
 will not appear in the output. <name> is an extension to the domain field
 that prints the name of the corresponding sched domain. It can appear in
-schedstat version 17 and above, and requires CONFIG_SCHED_DEBUG.)
+schedstat version 17 and above.

 domain<N> <name> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45

Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ CFS usa una granularidad de nanosegundos y no depende de ningún
 jiffy o detalles como HZ. De este modo, el gestor de tareas CFS no tiene
 noción de "ventanas de tiempo" de la forma en que tenía el gestor de
 tareas previo, y tampoco tiene heurísticos. Únicamente hay un parámetro
-central ajustable (se ha de cambiar en CONFIG_SCHED_DEBUG):
+central ajustable:

   /sys/kernel/debug/sched/base_slice_ns

arch/arm/kernel/traps.c

Lines changed: 2 additions & 9 deletions
@@ -258,13 +258,6 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
 	barrier();
 }

-#ifdef CONFIG_PREEMPT
-#define S_PREEMPT " PREEMPT"
-#elif defined(CONFIG_PREEMPT_RT)
-#define S_PREEMPT " PREEMPT_RT"
-#else
-#define S_PREEMPT ""
-#endif
 #ifdef CONFIG_SMP
 #define S_SMP " SMP"
 #else
@@ -282,8 +275,8 @@ static int __die(const char *str, int err, struct pt_regs *regs)
 	static int die_counter;
 	int ret;

-	pr_emerg("Internal error: %s: %x [#%d]" S_PREEMPT S_SMP S_ISA "\n",
-	         str, err, ++die_counter);
+	pr_emerg("Internal error: %s: %x [#%d]" S_SMP S_ISA "\n",
+	         str, err, ++die_counter);

 	/* trap and error numbers are mostly meaningless on ARM */
 	ret = notify_die(DIE_OOPS, str, regs, err, tsk->thread.trap_no, SIGSEGV);

arch/arm64/kernel/traps.c

Lines changed: 1 addition & 9 deletions
@@ -172,22 +172,14 @@ static void dump_kernel_instr(const char *lvl, struct pt_regs *regs)
 	printk("%sCode: %s\n", lvl, str);
 }

-#ifdef CONFIG_PREEMPT
-#define S_PREEMPT " PREEMPT"
-#elif defined(CONFIG_PREEMPT_RT)
-#define S_PREEMPT " PREEMPT_RT"
-#else
-#define S_PREEMPT ""
-#endif
-
 #define S_SMP " SMP"

 static int __die(const char *str, long err, struct pt_regs *regs)
 {
 	static int die_counter;
 	int ret;

-	pr_emerg("Internal error: %s: %016lx [#%d]" S_PREEMPT S_SMP "\n",
+	pr_emerg("Internal error: %s: %016lx [#%d] " S_SMP "\n",
 		 str, err, ++die_counter);

 	/* trap and error numbers are mostly meaningless on ARM */

arch/powerpc/kernel/traps.c

Lines changed: 1 addition & 2 deletions
@@ -263,10 +263,9 @@ static int __die(const char *str, struct pt_regs *regs, long err)
 {
 	printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);

-	printk("%s PAGE_SIZE=%luK%s%s%s%s%s%s %s\n",
+	printk("%s PAGE_SIZE=%luK%s %s%s%s%s %s\n",
 	       IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) ? "LE" : "BE",
 	       PAGE_SIZE / 1024, get_mmu_str(),
-	       IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
 	       IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
 	       IS_ENABLED(CONFIG_SMP) ? (" NR_CPUS=" __stringify(NR_CPUS)) : "",
 	       debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",

arch/s390/kernel/dumpstack.c

Lines changed: 1 addition & 6 deletions
@@ -198,13 +198,8 @@ void __noreturn die(struct pt_regs *regs, const char *str)
 	console_verbose();
 	spin_lock_irq(&die_lock);
 	bust_spinlocks(1);
-	printk("%s: %04x ilc:%d [#%d] ", str, regs->int_code & 0xffff,
+	printk("%s: %04x ilc:%d [#%d]", str, regs->int_code & 0xffff,
 	       regs->int_code >> 17, ++die_counter);
-#ifdef CONFIG_PREEMPT
-	pr_cont("PREEMPT ");
-#elif defined(CONFIG_PREEMPT_RT)
-	pr_cont("PREEMPT_RT ");
-#endif
 	pr_cont("SMP ");
 	if (debug_pagealloc_enabled())
 		pr_cont("DEBUG_PAGEALLOC");

arch/x86/kernel/dumpstack.c

Lines changed: 2 additions & 7 deletions
@@ -395,18 +395,13 @@ NOKPROBE_SYMBOL(oops_end);

 static void __die_header(const char *str, struct pt_regs *regs, long err)
 {
-	const char *pr = "";
-
 	/* Save the regs of the first oops for the executive summary later. */
 	if (!die_counter)
 		exec_summary_regs = *regs;

-	if (IS_ENABLED(CONFIG_PREEMPTION))
-		pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT";
-
 	printk(KERN_DEFAULT
-	       "Oops: %s: %04lx [#%d]%s%s%s%s%s\n", str, err & 0xffff,
-	       ++die_counter, pr,
+	       "Oops: %s: %04lx [#%d]%s%s%s%s\n", str, err & 0xffff,
+	       ++die_counter,
 	       IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
 	       debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
 	       IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "",

arch/x86/kernel/tsc.c

Lines changed: 2 additions & 2 deletions
@@ -959,7 +959,7 @@ static unsigned long long cyc2ns_suspend;

 void tsc_save_sched_clock_state(void)
 {
-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;

 	cyc2ns_suspend = sched_clock();
@@ -979,7 +979,7 @@ void tsc_restore_sched_clock_state(void)
 	unsigned long flags;
 	int cpu;

-	if (!sched_clock_stable())
+	if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
 		return;

 	local_irq_save(flags);
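
The two tsc.c hunks widen the early-return guard, which matches the "Always save/restore x86 TSC sched_clock() on suspend/resume" item in the merge message. The following is a minimal userspace sketch of the old and new guards, assuming plain booleans as stand-ins for `static_branch_likely(&__use_tsc)` and `sched_clock_stable()`; the function names `old_guard_skips` and `new_guard_skips` are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

/* Old guard: skip the save/restore whenever the clock is not marked stable. */
static bool old_guard_skips(bool use_tsc, bool clock_stable)
{
	(void)use_tsc;	/* the old condition did not look at this input */
	return !clock_stable;
}

/* New guard: skip only when the TSC is not in use AND the clock is unstable. */
static bool new_guard_skips(bool use_tsc, bool clock_stable)
{
	return !use_tsc && !clock_stable;
}
```

The only input where the two guards differ is "TSC in use, clock not marked stable": the old guard returned early there, while the new one falls through to the save/restore path.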

arch/xtensa/kernel/traps.c

Lines changed: 1 addition & 5 deletions
@@ -629,15 +629,11 @@ DEFINE_SPINLOCK(die_lock);
 void __noreturn die(const char * str, struct pt_regs * regs, long err)
 {
 	static int die_counter;
-	const char *pr = "";
-
-	if (IS_ENABLED(CONFIG_PREEMPTION))
-		pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT";

 	console_verbose();
 	spin_lock_irq(&die_lock);

-	pr_info("%s: sig: %ld [#%d]%s\n", str, err, ++die_counter, pr);
+	pr_info("%s: sig: %ld [#%d]\n", str, err, ++die_counter);
 	show_regs(regs);
 	if (!user_mode(regs))
 		show_stack(NULL, (unsigned long *)regs->areg[1], KERN_INFO);

fs/proc/base.c

Lines changed: 0 additions & 7 deletions
@@ -1489,7 +1489,6 @@ static const struct file_operations proc_fail_nth_operations = {
 #endif


-#ifdef CONFIG_SCHED_DEBUG
 /*
  * Print out various scheduling related per-task fields:
  */
@@ -1539,8 +1538,6 @@ static const struct file_operations proc_pid_sched_operations = {
 	.release = single_release,
 };

-#endif
-
 #ifdef CONFIG_SCHED_AUTOGROUP
 /*
  * Print out autogroup related information:
@@ -3331,9 +3328,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	ONE("status", S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits", S_IRUGO, proc_pid_limits),
-#ifdef CONFIG_SCHED_DEBUG
 	REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
-#endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
 #endif
@@ -3682,9 +3677,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	ONE("status", S_IRUGO, proc_pid_status),
 	ONE("personality", S_IRUSR, proc_pid_personality),
 	ONE("limits", S_IRUGO, proc_pid_limits),
-#ifdef CONFIG_SCHED_DEBUG
 	REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
-#endif
 	NOD("comm", S_IFREG|S_IRUGO|S_IWUSR,
 		&proc_tid_comm_inode_operations,
 		&proc_pid_set_comm_operations, {}),

include/linux/cpuset.h

Lines changed: 11 additions & 0 deletions
@@ -125,9 +125,11 @@ static inline int cpuset_do_page_mem_spread(void)

 extern bool current_cpuset_is_being_rebound(void);

+extern void dl_rebuild_rd_accounting(void);
 extern void rebuild_sched_domains(void);

 extern void cpuset_print_current_mems_allowed(void);
+extern void cpuset_reset_sched_domains(void);

 /*
  * read_mems_allowed_begin is required when making decisions involving
@@ -259,11 +261,20 @@ static inline bool current_cpuset_is_being_rebound(void)
 	return false;
 }

+static inline void dl_rebuild_rd_accounting(void)
+{
+}
+
 static inline void rebuild_sched_domains(void)
 {
 	partition_sched_domains(1, NULL, NULL);
 }

+static inline void cpuset_reset_sched_domains(void)
+{
+	partition_sched_domains(1, NULL, NULL);
+}
+
 static inline void cpuset_print_current_mems_allowed(void)
 {
 }

include/linux/energy_model.h

Lines changed: 0 additions & 2 deletions
@@ -240,9 +240,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	struct em_perf_state *ps;
 	int i;

-#ifdef CONFIG_SCHED_DEBUG
 	WARN_ONCE(!rcu_read_lock_held(), "EM: rcu read lock needed\n");
-#endif

 	if (!sum_util)
 		return 0;

include/linux/preempt.h

Lines changed: 2 additions & 0 deletions
@@ -515,6 +515,8 @@ static inline bool preempt_model_rt(void)
 	return IS_ENABLED(CONFIG_PREEMPT_RT);
 }

+extern const char *preempt_model_str(void);
+
 /*
  * Does the preemption model allow non-cooperative preemption?
  *
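
The arch hunks earlier in this diff each drop their own copy of the " PREEMPT" / " PREEMPT_RT" selection in favour of the shared `preempt_model_str()` declared here. This diff does not show that function's body or its exact output format, so the sketch below only illustrates the centralisation idea by reproducing the selection logic that the x86 and xtensa hunks removed. `demo_preempt_suffix` and its boolean parameters are hypothetical stand-ins for the `IS_ENABLED()` checks.

```c
#include <stdbool.h>

/*
 * Hypothetical single helper replacing the per-arch copies: maps the
 * build-time preemption settings to the suffix the removed code printed.
 * This is NOT the kernel's preempt_model_str() implementation.
 */
static const char *demo_preempt_suffix(bool preemption, bool preempt_rt)
{
	if (!preemption)
		return "";
	return preempt_rt ? " PREEMPT_RT" : " PREEMPT";
}
```

With one helper, every oops header can print the same string, and a new preemption model only needs to be taught to one function rather than to each architecture.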

include/linux/sched.h

Lines changed: 5 additions & 0 deletions
@@ -382,6 +382,11 @@ enum uclamp_id {
 #ifdef CONFIG_SMP
 extern struct root_domain def_root_domain;
 extern struct mutex sched_domains_mutex;
+extern void sched_domains_mutex_lock(void);
+extern void sched_domains_mutex_unlock(void);
+#else
+static inline void sched_domains_mutex_lock(void) { }
+static inline void sched_domains_mutex_unlock(void) { }
 #endif

 struct sched_param {

include/linux/sched/deadline.h

Lines changed: 4 additions & 0 deletions
@@ -34,7 +34,11 @@ static inline bool dl_time_before(u64 a, u64 b)
 struct root_domain;
 extern void dl_add_task_root_domain(struct task_struct *p);
 extern void dl_clear_root_domain(struct root_domain *rd);
+extern void dl_clear_root_domain_cpu(int cpu);

 #endif /* CONFIG_SMP */

+extern u64 dl_cookie;
+extern bool dl_bw_visited(int cpu, u64 cookie);
+
 #endif /* _LINUX_SCHED_DEADLINE_H */
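
Only the declarations of `dl_cookie` and `dl_bw_visited()` appear in this hunk; the bodies live elsewhere in the merge. As a rough illustration of what "unique visiting of root domains" with a cookie could look like, here is a userspace sketch of a generation-stamp scheme. Everything prefixed `demo_` is hypothetical, and the assumption that each domain stores the cookie of the last walk that touched it is mine, not something this diff confirms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a root domain; only the stamp matters here. */
struct demo_domain {
	uint64_t visit_cookie;
};

/* Walk generation counter; bumped once before each full traversal. */
static uint64_t demo_cookie;

/*
 * Returns true if the domain was already seen during the walk identified
 * by 'cookie'; otherwise stamps the domain and returns false.
 */
static bool demo_visited(struct demo_domain *d, uint64_t cookie)
{
	if (d->visit_cookie == cookie)
		return true;
	d->visit_cookie = cookie;
	return false;
}

/* Counts distinct domains in a per-CPU table where several CPUs share one. */
static int demo_count_unique(struct demo_domain **per_cpu, int ncpus)
{
	uint64_t cookie = ++demo_cookie;	/* never 0, so fresh domains differ */
	int unique = 0;

	for (int cpu = 0; cpu < ncpus; cpu++)
		if (!demo_visited(per_cpu[cpu], cookie))
			unique++;
	return unique;
}
```

The appeal of a scheme like this is that no "visited" flags need to be cleared between walks: bumping the counter invalidates every old stamp at once.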

include/linux/sched/debug.h

Lines changed: 0 additions & 2 deletions
@@ -35,12 +35,10 @@ extern void show_stack(struct task_struct *task, unsigned long *sp,

 extern void sched_show_task(struct task_struct *p);

-#ifdef CONFIG_SCHED_DEBUG
 struct seq_file;
 extern void proc_sched_show_task(struct task_struct *p,
 				 struct pid_namespace *ns, struct seq_file *m);
 extern void proc_sched_set_task(struct task_struct *p);
-#endif

 /* Attach to any functions which should be ignored in wchan output. */
 #define __sched __section(".sched.text")

include/linux/sched/idle.h

Lines changed: 16 additions & 7 deletions
@@ -79,6 +79,21 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
 	return unlikely(tif_need_resched());
 }

+static __always_inline void current_clr_polling(void)
+{
+	__current_clr_polling();
+
+	/*
+	 * Ensure we check TIF_NEED_RESCHED after we clear the polling bit.
+	 * Once the bit is cleared, we'll get IPIs with every new
+	 * TIF_NEED_RESCHED and the IPI handler, scheduler_ipi(), will also
+	 * fold.
+	 */
+	smp_mb__after_atomic(); /* paired with resched_curr() */
+
+	preempt_fold_need_resched();
+}
+
 #else
 static inline void __current_set_polling(void) { }
 static inline void __current_clr_polling(void) { }
@@ -91,21 +106,15 @@ static inline bool __must_check current_clr_polling_and_test(void)
 {
 	return unlikely(tif_need_resched());
 }
-#endif

 static __always_inline void current_clr_polling(void)
 {
 	__current_clr_polling();

-	/*
-	 * Ensure we check TIF_NEED_RESCHED after we clear the polling bit.
-	 * Once the bit is cleared, we'll get IPIs with every new
-	 * TIF_NEED_RESCHED and the IPI handler, scheduler_ipi(), will also
-	 * fold.
-	 */
 	smp_mb(); /* paired with resched_curr() */

 	preempt_fold_need_resched();
 }
+#endif

 #endif /* _LINUX_SCHED_IDLE_H */
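
In the branch where `__current_clr_polling()` performs a real atomic bit clear, the hunk swaps `smp_mb()` for `smp_mb__after_atomic()`. On architectures whose atomic read-modify-write already acts as a full barrier, that can avoid a second fence. The sketch below is a loose userspace analogy using C11 atomics, not kernel code: the flag bits and `demo_` names are hypothetical, and C11 sequentially consistent operations are only an approximation of the kernel's barrier semantics.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_POLLING		(1u << 0)	/* hypothetical flag bits */
#define DEMO_NEED_RESCHED	(1u << 1)

static atomic_uint demo_flags;

/*
 * Clear the polling bit, then check the resched bit. Both operations are
 * sequentially consistent, so the clear is ordered before the later load
 * without issuing an additional standalone fence.
 */
static bool demo_clr_polling_and_test(void)
{
	atomic_fetch_and(&demo_flags, ~DEMO_POLLING);
	return (atomic_load(&demo_flags) & DEMO_NEED_RESCHED) != 0;
}
```

The ordering matters because a remote CPU that sees the polling bit still set may skip sending an IPI; the local CPU must therefore not read the resched bit before its own clear is visible.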

include/linux/sched/mm.h

Lines changed: 7 additions & 0 deletions
@@ -531,6 +531,13 @@ enum {

 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
+	/*
+	 * The atomic_read() below prevents CSE. The following should
+	 * help the compiler generate more efficient code on architectures
+	 * where sync_core_before_usermode() is a no-op.
+	 */
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE))
+		return;
 	if (current->mm != mm)
 		return;
 	if (likely(!(atomic_read(&mm->membarrier_state) &
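
The added comment says the early `IS_ENABLED()` test lets the compiler drop work it cannot otherwise eliminate, because the later `atomic_read()` is not a candidate for removal on its own. A small userspace sketch of the same pattern follows, assuming a hypothetical compile-time switch `DEMO_HAS_SYNC_CORE` in place of the Kconfig symbol and a counting helper in place of the atomic load.

```c
#include <stdbool.h>

/* Hypothetical compile-time switch standing in for IS_ENABLED(CONFIG_...). */
#define DEMO_HAS_SYNC_CORE 0

static int demo_state_reads;	/* how many times the shared state was read */
static int demo_syncs;		/* how many times the sync step ran */

static int demo_read_state(void)
{
	demo_state_reads++;
	return 1;		/* pretend the sync-core bit is set */
}

static void demo_sync_core_before_usermode(bool same_mm)
{
	/* Constant-false feature test: everything below becomes dead code. */
	if (!DEMO_HAS_SYNC_CORE)
		return;
	if (!same_mm)
		return;
	if (!demo_read_state())
		return;
	demo_syncs++;
}
```

With the switch set to 0, the function returns before the state is ever read, which is the effect the hunk aims for on architectures where the sync step is a no-op.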
