The Linux 6.6 kernel ships with a new default scheduler[1]. Is the testing metho...

mochomocha · on Dec 14, 2023

I'd wager that EEVDF has been tested less methodically than this paper investigates CFS. The primary author of EEVDF, who also maintains the subsystem, has been dismissing alternative approaches and a plethora of robustly tested patches from Google and Facebook over the years, with replies mostly boiling down to "meh I don't like it".

I'd take a patch of CFS and its millions of broken knobs from Google over newly released EEVDF any day, because I trust scheduler A/B testing by Google over millions of machines and every single scheduling pattern under the sun way more than whatever synthetic micro-benchmark a single kernel dev (as competent as they might be) ran.

If you're interested in quantitative analysis of schedulers & tooling around it, these 2 projects are very interesting:

https://github.com/google/schedviz

https://fuchsia.dev/fuchsia-src/concepts/kernel/fair_schedul...

jorvi · on Dec 14, 2023

With the pretty stunning performance improvements EEVDF posts (over CFS), I’m not so sure where your hate for it comes from.

mochomocha · on Dec 14, 2023

Where can I find benchmarks of EEVDF vs CFS that were 1) not run by Zijlstra and 2) not synthetic? I.e. A/B tested on a large fleet of computers running heterogeneous processes.

I have nothing against the EEVDF algorithm itself (in fact I like it) and I dislike CFS very much. But I dislike the current development process of the Linux scheduler even more. Proper quantitative benchmarks of CPU schedulers are missing, which is why CFS ended up in the sad state it did, where hundreds of patches were submitted to fix random edge cases over the years. What makes you confident that the initial EEVDF Linux implementation won't suffer the same fate, given that the development process hasn't changed (a single kernel dev implementing it and running micro-benchmarks)?

jorvi · on Dec 14, 2023

I mean, I only know of the test at the time of the 6.4RC: https://openbenchmarking.org/result/2305210-NE-2305205NE89

An increase in performance of 10-30% across "real" workloads. Just for changing one line (well, two lines including disabling p-state drivers)? I'll take it.

I would say it's not a strange position to assume it has been improved further since then.

What would be interesting, although niche, is checking how iGPUs perform alongside it. I know that on Intel, "thermald" lowers iGPU performance because it improves CPU utilization and thus leaves fewer mW for the iGPU. Perhaps something similar will happen with EEVDF.

hinkley · on Dec 14, 2023

There’s nothing in that article that talks about how work stealing is managed, so there’s no knowing if this scheduler fixes the problem or makes it worse.

redeeman · on Dec 14, 2023

And something reasonably like it existed for... more than a decade, but mingo was not a fan, threw a hissy fit and originated CFS, and well, Torvalds didn't care too much, so he deferred to mingo.

mochomocha · on Dec 14, 2023

Having experimented with a lot of CFS knobs, my high-level conclusion is that every single non-default behavior is broken to various degrees, and default behavior isn't immune to pathological edge cases either.