Hacker News

Ironically, things have turned upside down with Linux scheduling over the last 15 years.

Around 2000, all the major OSes had CS PhDs making their schedulers "efficient," able to sustain 98%+ multiprocessor loads. Linux was the odd case: a very naive, "inefficient" scheduler that left the CPU 20-30% idle unless you really knew how to hack around that. Linux, however, came with almost no stalls or latency spikes.

Things have changed dramatically now. Linux got the attention of the entire industry, plus years of holy wars about scheduling.

You can now easily hit 95%+ under multithreaded load, but even on the desktop you can see how an I/O-hogging app can visibly freeze your system for a few seconds.

Now the trend has reversed again, and everybody is praising conservative scheduling that doesn't give out too much CPU :-/



> but even on the desktop you can see how an I/O-hogging app can visibly freeze your system for a few seconds.

Or worse. I ran into an issue when copying a disk image from an NVMe drive to a USB drive using `cat` (e.g. `cat somefile.img >/dev/sdX`) where mouse performance (the mouse is also a USB device) became so sluggish that the desktop was unusable until the copy was complete.
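A workaround that sometimes helps in this situation (a sketch, not a guaranteed fix; `somefile.img` and `/dev/sdX` are placeholders from the example above) is to bypass the page cache entirely with `dd` and `oflag=direct`, so dirty pages can't pile up behind the slow USB drive:

```shell
# Copy the image with O_DIRECT so writes go straight to the device
# instead of accumulating in the page cache. /dev/sdX is a
# placeholder -- triple-check the device name before writing to it.
dd if=somefile.img of=/dev/sdX bs=4M oflag=direct status=progress
```

`status=progress` is a GNU coreutils extension; drop it on other dd implementations.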

This wasn't the only instance where background I/O brought my desktop to its knees. Another time it was copying a large file across the network (Gigabit Ethernet) to a file server. The network was saturated and the desktop was unusable. I think the remote system was the destination, but my recollection could be in error. The network interface is a PCI device, and no USB device was involved in that transfer.

This is on Debian Stable with the GNOME desktop. The non-standard part of the system is that it runs off a ZFS filesystem; I don't know if that matters.

I'd like to know where to look and how to diagnose this if/when it crops up again. An interrupt storm? A USB driver issue? Some other resource starvation? (In neither case did overall CPU load appear high.)


I'd guess the apps were waiting for I/O, not CPU.

I don't know the exact reason, but lots of unwritten I/O destined for a (slow) drive can block processes waiting for I/O on an unrelated drive.

By default, the dirty-buffer limits are a percentage of RAM, so lots of RAM plus slow I/O can make for long pauses. I usually tweak sysctls on desktops to cap dirty buffers, e.g.

  vm.dirty_background_bytes = 67108864
  vm.dirty_bytes = 134217728
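For reference, those values are 64 MiB and 128 MiB. A sketch of applying them at runtime and making them persistent (requires root; the file name under `/etc/sysctl.d/` is my own choice):

```shell
# 64 MiB background-writeback threshold, 128 MiB hard limit on dirty pages
sudo sysctl -w vm.dirty_background_bytes=67108864   # 64 * 1024 * 1024
sudo sysctl -w vm.dirty_bytes=134217728             # 128 * 1024 * 1024

# Persist across reboots (file name is arbitrary)
printf 'vm.dirty_background_bytes = 67108864\nvm.dirty_bytes = 134217728\n' |
  sudo tee /etc/sysctl.d/90-dirty-bytes.conf
```

Note that setting the `_bytes` variants zeroes the corresponding `_ratio` sysctls, and vice versa.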


A task that does, say, 10% I/O and 90% CPU will not get enough CPU time if there is an I/O hog running in parallel. You either ionice everything to oblivion, or put up with it.

I/O blocks are almost always CPU blocks too for anything that does even minimal I/O. And the more simultaneous I/O, the more CPU overhead there is.

Try compiling something big with -j 32 or more.
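One mitigation along the "ionice everything" line (a sketch, not a cure): run the build in the idle I/O scheduling class and at low CPU priority at the same time:

```shell
# -c3 = idle I/O class: the build only gets disk time when nobody
# else wants it; nice -n 19 does the same for CPU time.
ionice -c3 nice -n 19 make -j"$(nproc)"
```

The idle class only has full effect with I/O schedulers that honor it (e.g. BFQ).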


When the system is I/O busy, processes block (more) waiting for I/O completion. I/O-blocked processes wouldn't know what to do with a CPU, so they won't have one assigned. Are you saying something different?

Decreasing the dirty-memory limits shortens flush times, which shortens process write/flush stalls, which reduces process/UI hangs and pauses. Presumably it also lowers throughput in some cases, but that's better than UI hangs.


Is that a CPU scheduler issue though?

Linux has traditionally had an overall system-architecture problem: userspace isn't set up to provide policy information such as "this thread is doing UI" vs. "this is a background task" by default.

However, if I help the scheduler along by nice(1)-ing my big compute tasks, then the desktop remains perfectly smooth even while the CPU is 100% busy.
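If the compute job is already running, the same help can be given after the fact with renice (PID 12345 is a placeholder):

```shell
# Drop an already-running compute job to the lowest CPU priority.
renice -n 19 -p 12345
```

Unprivileged users can lower a process's priority this way but not raise it again.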

It'd be much better to have this experience by default, but unfortunately it requires architectural improvements throughout the entire software stack. Trying to band-aid it in the kernel scheduler may allow you to heuristically improve some workloads, but it's ultimately a fool's errand.

Where the desktop does tend to fall over and become totally unusable is when the system starts to swap...


There is pretty much nothing you can do on a Linux system that doesn't make a buffer-swap test (showing, e.g., red and blue screens on alternating even and odd frames) glitch.


I remember when the "rotating cube" 3D desktop was released for Linux (Compiz?), and the animations were instant and flawless, which seemed absolutely _impossible_ on Windows at the time.


It still does seem impossible. If I accidentally hit super+tab on my work computer, I get to wait 1-2 seconds while the workspace overview stabilizes before I can get back to work.


This could be due to the focus of the kernel developers involved. At some point in the 2000s, companies such as IBM invested heavily in Linux, working on making the kernel scale well on "big iron" machines with tens of cores and NUMA-style architectures. As a result, the focus shifted away from "normal" desktop boxes.

But was this a good or bad thing? Even desktop machines can now have 16+ cores and NUMA. So, perhaps that focus has paid off for more users?




