PostgreSQL Performance Plummets 50% on Linux 7.0!? The Scheduler Change Trap
📰 News Overview
- Dramatic Performance Drop: On a 96-vCPU Graviton4 machine running Linux 7.0, PostgreSQL throughput has dropped to about half (from approximately 98,000 to about 50,000 transactions per second) compared to Linux 6.x.
- Identifying the Bottleneck: AWS engineer Salvatore Dipietro’s investigation revealed that over 55% of CPU time is consumed by the s_lock (spinlock) function.
- Scheduler Specification Change: The removal of the previously recommended server setting “PREEMPT_NONE” in Linux 7.0 has forced the application of “PREEMPT_LAZY” or “FULL,” which is the root cause.
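To see which preemption model a given machine is actually running, one option is to read the scheduler's debugfs entry. The sketch below assumes a kernel built with dynamic preemption, debugfs mounted at /sys/kernel/debug, and root access; the file lists the available modes and wraps the active one in parentheses. The helper names and the sample string are illustrative, not taken from the original report.

```python
from pathlib import Path

# Assumed location of the scheduler's preemption switch. It exists only on
# kernels built with dynamic preemption and with debugfs mounted here.
PREEMPT_DEBUGFS = Path("/sys/kernel/debug/sched/preempt")


def parse_preempt_modes(text):
    """Split a line such as 'none voluntary full (lazy)' into (active, modes).

    The active mode is the token wrapped in parentheses. If no token is
    wrapped, active is None.
    """
    active = None
    modes = []
    for token in text.split():
        if token.startswith("(") and token.endswith(")"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return active, modes


def read_preempt_mode(path=PREEMPT_DEBUGFS):
    """Return the active preemption mode, or None if the file is unreadable."""
    try:
        return parse_preempt_modes(path.read_text())[0]
    except OSError:
        # Missing file, debugfs not mounted, or insufficient privileges.
        return None


if __name__ == "__main__":
    print("active preemption mode:", read_preempt_mode())
```

If the function returns None, the kernel's build configuration (for example the CONFIG_PREEMPT_* entries in the distribution's config file) is the next place to look.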
💡 Key Points
- Collapse of Spinlocks: A vicious cycle occurs when a process holding a spinlock is preempted by the kernel. The lock cannot be released until that process runs again, so every other process waiting for it keeps spinning and burning CPU time in the meantime.
- Contention in StrategyGetBuffer: Contention in the StrategyGetBuffer function, which PostgreSQL uses to obtain buffers from the shared buffer pool, is what causes the severe slowdown.
- Challenges of Modern Architecture: In an extreme setup with a many-core environment (96 vCPUs) and high parallel load, even tiny changes in OS scheduling behavior can paralyze the entire system.
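The feedback loop described above can be illustrated with a toy model. The sketch below is not PostgreSQL's s_lock implementation (which also backs off and sleeps); it is a simplified simulation in which a fixed number of workers alternate between unlocked work and a short critical section on one shared lock, and a lock holder is occasionally descheduled for a whole time slice. All parameter values are made-up assumptions chosen only to show the effect.

```python
import heapq
import random


def simulate(workers, work_us, hold_us, preempt_prob, slice_us,
             transactions, seed=0):
    """Toy model of one contended spinlock.

    Each worker does `work_us` of unlocked work, then needs the lock for
    `hold_us`. With probability `preempt_prob` the holder is descheduled
    for `slice_us` while still holding the lock. Waiters are assumed to
    spin for the whole wait.

    Returns (transactions per microsecond, fraction of CPU time spent spinning).
    """
    rng = random.Random(seed)
    # Min-heap of (time the worker requests the lock, worker id).
    ready = [(float(work_us), w) for w in range(workers)]
    heapq.heapify(ready)
    lock_free_at = 0.0
    spin_time = 0.0
    for _ in range(transactions):
        requested_at, worker = heapq.heappop(ready)
        acquired_at = max(requested_at, lock_free_at)
        spin_time += acquired_at - requested_at  # time burned waiting
        held = hold_us
        if rng.random() < preempt_prob:
            held += slice_us  # holder loses the CPU mid critical section
        lock_free_at = acquired_at + held
        heapq.heappush(ready, (lock_free_at + work_us, worker))
    elapsed = lock_free_at
    return transactions / elapsed, spin_time / (workers * elapsed)


if __name__ == "__main__":
    for label, prob in (("holder never preempted", 0.0),
                        ("holder preempted 1% of the time", 0.01)):
        tps, spin = simulate(workers=96, work_us=50.0, hold_us=0.5,
                             preempt_prob=prob, slice_us=1000.0,
                             transactions=20000)
        print(f"{label}: {tps:.3f} tx/us, {spin:.1%} of CPU spinning")
```

Even a small preemption probability matters here because the critical section is far shorter than a scheduler time slice, so a single badly timed preemption stalls every waiter for far longer than the lock would normally be held.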
🦈 Shark’s Eye (Curator’s Perspective)
This news represents a fascinating case where an “evolution” that was intended to be beneficial for the OS has struck at the very heart of middleware like databases!
Specifically, the introduction of “PREEMPT_LAZY” as a compromise in Linux 6.12 has proven to be insufficient for high-load server workloads that heavily utilize spinlocks, like PostgreSQL. The fact that 55% of CPU time is wasted on a single function, s_lock, illustrates just how finely balanced OS preemption is. Engineers considering a shift to the latest Linux 7.0 environment will need to implement patches or configuration changes to avoid falling into this “scheduler trap”!
🚀 What’s Next?
It’s highly likely that the Linux kernel community will discuss and introduce preemption control patches to remedy cases like PostgreSQL. Additionally, the database side may accelerate the transition to more advanced lock-free algorithms, reducing reliance on spinlocks.
💬 A Note from Haru-Shark
Just because it’s the latest OS doesn’t mean it’s faster! Spinlocks can spin you dizzy, and even sharks might end up buttered! 🦈🌀
📚 Terminology Breakdown
- Preemption: The OS forcibly interrupts a running process to allocate CPU to another process.
- Spinlock: A lightweight synchronization mechanism where the CPU waits in a loop until the lock is released.
- Buffer Pool: An area where a database caches data read from disk into memory.
Source: Linux 7.0 Broke PostgreSQL: The Preemption Regression Explained