a737737cdb9c94e40a9926cdc2320f874c05d709 - s390/percpu: Infrastructure for more efficient this_cpu operations

Tue, 26 May 2026 07:56:56 +0200

s390/percpu: Infrastructure for more efficient this_cpu operations

With the intended removal of PREEMPT_NONE this_cpu operations based on
atomic instructions, guarded with preempt_disable()/preempt_enable() pairs
become more expensive: the preempt_disable() / preempt_enable() pairs are
not optimized away anymore during compile time.

In particular the conditional call to preempt_schedule_notrace() after
preempt_enable() adds additional code and register pressure.

E.g. this simple C code sequence

DEFINE_PER_CPU(long, foo);
long bar(long a) { return this_cpu_add_return(foo, a); }

generates this code:

  11a976: eb af f0 68 00 24   stmg  %r10,%r15,104(%r15)
  11a97c: b9 04 00 ef         lgr   %r14,%r15
  11a980: b9 04 00 b2         lgr   %r11,%r2
  11a984: e3 f0 ff c8 ff 71   lay   %r15,-56(%r15)
  11a98a: e3 e0 f0 98 00 24   stg   %r14,152(%r15)
  11a990: eb 01 03 a8 00 6a   asi   936,1            <- __preempt_count_add(1)
  11a996: c0 10 00 d2 ac b5   larl  %r1,1b70300      <- address of percpu var
  11a9a0: e3 10 23 b8 00 08   ag    %r1,952          <- add percpu offset
  11a9a6: eb ab 10 00 00 e8   laag  %r10,%r11,0(%r1) <- atomic op
  11a9ac: eb ff 03 a8 00 6e   alsi  936,-1           <- __preempt_count_dec_and_test()
  11a9b2: a7 54 00 05         jnhe  11a9bc
  11a9b6: c0 e5 00 76 d1 bd   brasl %r14,ff4d30
  11a9bc: b9 e8 b0 2a         agrk  %r2,%r10,%r11
  11a9c0: eb af f0 a0 00 04   lmg   %r10,%r15,160(%r15)
  11a9c6: 07 fe               br    %r14

Even though the above example is more or less the worst case, since the
branch to preempt_schedule_notrace() requires a stackframe, which
otherwise wouldn't be necessary, there is also the conditional jnhe branch
instruction.

Get rid of the conditional branch with the following code sequence:

  11a8e6: c0 30 00 d0 c5 0d   larl  %r3,1b33300
  11a8ec: b9 04 00 43         lgr   %r4,%r3
  11a8f0: eb 00 43 c0 00 52   mviy  960,4
  11a8f6: e3 40 03 b8 00 08   ag    %r4,952
  11a8fc: eb 52 40 00 00 e8   laag  %r5,%r2,0(%r4)
  11a902: eb 00 03 c0 00 52   mviy  960,0
  11a908: b9 08 00 25         agr   %r2,%r5
  11a90c: 07 fe               br    %r14

The general idea is that this_cpu operations based on atomic instructions
are guarded with mviy instructions:

- The first mviy instruction writes the register number, which contains
  the percpu address variable to lowcore. This also indicates that a
  percpu code section is executed.

- The first instruction following the mviy instruction must be the ag
  instruction which adds the percpu offset to the percpu address register.

- Afterwards the atomic percpu operation follows.

- Then a second mviy instruction writes a zero to lowcore, which indicates
  the end of the percpu code section.

- In case of an interrupt/exception/nmi the register number which was
  written to lowcore is copied to the exception frame (pt_regs), and a
  zero is written to lowcore.

- On return to the previous context it is checked if a percpu code section
  was executed (saved register number not zero), and if the process was
  migrated to a different cpu. If the percpu offset was already added to
  the percpu address register (instruction address does _not_ point to the
  ag instruction) the content of the percpu address register is adjusted
  so it points to percpu variable of the new cpu.

Reviewed-by: Alexander Gordeev
Signed-off-by: Heiko Carstens
Signed-off-by: Alexander Gordeev

List of files: /linux/arch/s390/include/asm/entry-percpu.h
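
The entry and return handling described in the commit message can be
illustrated with a small user-space model. This is only a sketch and not
the kernel implementation: every name in it (model_lowcore, model_pt_regs,
model_entry, model_return, model_scenario) is hypothetical, the percpu
offsets and instruction addresses are made-up values, and both the way
migration is detected (comparing a cpu number recorded at entry) and the
restore of the lowcore marker on return are assumptions that the commit
message does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel identifiers. */
enum { MODEL_NR_CPUS = 2, MODEL_NR_GPRS = 16 };

/* Assumed percpu offsets, one per modelled cpu. */
static const uint64_t model_percpu_offset[MODEL_NR_CPUS] = { 0x10000, 0x20000 };

struct model_lowcore {
	uint8_t percpu_register;	/* 0: no percpu code section active */
};

struct model_pt_regs {
	uint64_t gprs[MODEL_NR_GPRS];
	uint64_t psw_addr;		/* next instruction of the interrupted context */
	uint8_t percpu_register;	/* marker copied from lowcore on entry */
	int cpu;			/* assumption: cpu recorded at entry */
};

/* Entry: copy the marker to the exception frame and zero it in lowcore. */
static void model_entry(struct model_lowcore *lc, struct model_pt_regs *regs, int cpu)
{
	regs->percpu_register = lc->percpu_register;
	regs->cpu = cpu;
	lc->percpu_register = 0;
}

/*
 * Return: if a percpu section was interrupted, the context now runs on a
 * different cpu, and the ag instruction at ag_addr was already executed,
 * rebase the percpu address register onto the new cpu's offset.
 */
static void model_return(struct model_lowcore *new_lc, struct model_pt_regs *regs,
			 int new_cpu, uint64_t ag_addr)
{
	uint8_t reg = regs->percpu_register;

	if (!reg)
		return;
	if (new_cpu != regs->cpu && regs->psw_addr != ag_addr)
		regs->gprs[reg] += model_percpu_offset[new_cpu] -
				   model_percpu_offset[regs->cpu];
	/* Assumption: the marker is restored so the section stays guarded. */
	new_lc->percpu_register = reg;
}

/* Scenario: interrupted on cpu 0, resumed on cpu 1; returns register 4. */
static uint64_t model_scenario(uint64_t var, int ag_done)
{
	struct model_lowcore lc[MODEL_NR_CPUS] = { { 4 }, { 0 } };	/* mviy wrote 4 */
	struct model_pt_regs regs = { .psw_addr = ag_done ? 0x106 : 0x100 };

	regs.gprs[4] = var + (ag_done ? model_percpu_offset[0] : 0);
	model_entry(&lc[0], &regs, 0);
	assert(lc[0].percpu_register == 0);
	model_return(&lc[1], &regs, 1, 0x100);
	return regs.gprs[4];
}
```

In this model, model_scenario(0x1000, 1) covers an interruption after the
ag instruction, so register 4 is rebased from the offset of cpu 0 to the
offset of cpu 1. model_scenario(0x1000, 0) covers an interruption directly
at the ag instruction, where the register is left untouched because the ag
instruction itself adds the offset of the new cpu when it executes.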

Changes in entry-percpu.h

a737737cdb9c94e40a9926cdc2320f874c05d709 - s390/percpu: Infrastructure for more efficient this_cpu operations