Linus Torvalds Calls Blogger's Linux Scheduler Tests 'Pure Garbage' (phoronix.com)

On Wednesday Phoronix cited a blog post by C++ game developer Malte Skarupke claiming his spinlock experiments had revealed a Linux kernel scheduler issue affecting developers bringing games to Linux for Google Stadia.

Linus Torvalds has now responded: The whole post seems to be just wrong, and is measuring something completely different than what the author thinks and claims it is measuring.

First off, spinlocks can only be used if you actually know you're not being scheduled while using them. But the blog post author seems to be implementing his own spinlocks in user space with no regard for whether the lock user might be scheduled or not. And the code used for the claimed "lock not held" timing is complete garbage.

It basically reads the time before releasing the lock, and then it reads it after acquiring the lock again, and claims that the time difference is the time when no lock was held. Which is just inane and pointless and completely wrong...

[T]he code in question is pure garbage. You can't do spinlocks like that. Or rather, you very much can do them like that, and when you do that you are measuring random latencies and getting nonsensical values, because what you are measuring is "I have a lot of busywork, where all the processes are CPU-bound, and I'm measuring random points of how long the scheduler kept the process in place".
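To make the criticism concrete, here is a hypothetical C11 sketch (not the blogger's actual code) of the pattern described above: a naive test-and-set spinlock in user space, plus a "lock not held" figure computed by reading the clock just before unlocking and again right after the next acquisition. The names `naive_spinlock` and `run_flawed_measurement` are made up for illustration, and it assumes a POSIX system with `clock_gettime`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical naive test-and-set spinlock: it spins in user space with no
 * idea whether the current lock holder has been preempted. */
typedef struct { atomic_flag held; } naive_spinlock;

static void naive_lock(naive_spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait, even if the holder is not running at all */
}

static void naive_unlock(naive_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static naive_spinlock g_lock = { ATOMIC_FLAG_INIT };
static atomic_llong g_release_ns;  /* clock read just before an unlock */
static atomic_llong g_max_gap_ns;  /* the claimed "lock not held" time */
static int g_iters;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < g_iters; i++) {
        naive_lock(&g_lock);
        long long released = atomic_load(&g_release_ns);
        if (released != 0) {
            /* FLAWED: this interval also covers time the releasing thread
             * sat preempted between reading the clock and unlocking, and
             * time the acquirer spent descheduled, so it largely measures
             * scheduler behavior rather than how long the lock was free. */
            long long gap = now_ns() - released;
            long long prev = atomic_load(&g_max_gap_ns);
            while (gap > prev &&
                   !atomic_compare_exchange_weak(&g_max_gap_ns, &prev, gap))
                ;
        }
        atomic_store(&g_release_ns, now_ns());  /* read the clock... */
        naive_unlock(&g_lock);                  /* ...then release */
    }
    return NULL;
}

/* Returns the largest "lock not held" interval observed, in nanoseconds. */
long long run_flawed_measurement(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    g_iters = iters;
    atomic_store(&g_release_ns, 0);
    atomic_store(&g_max_gap_ns, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&g_max_gap_ns);
}
```

On a loaded machine, the maximum "gap" this reports tends to track how long the scheduler kept threads off the CPU rather than anything about the lock itself, which is the "random number generator" effect Linus describes.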

And then you write a blog-post blaming others, not understanding that it's your incorrect code that is garbage, and is giving random garbage values...

You might even see issues like "when I run this as a foreground UI process, I get different numbers than when I run it in the background as a batch process". Cool interesting numbers, aren't they?

No, they aren't cool and interesting at all, you've just created a particularly bad random number generator...

[Y]ou should never ever think that you're clever enough to write your own locking routines. Because the likelihood is that you aren't (and by that "you" I very much include myself -- we've tweaked all the in-kernel locking over decades, and gone through the simple test-and-set to ticket locks to cacheline-efficient queuing locks, and even people who know what they are doing tend to get it wrong several times).
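The step "from the simple test-and-set to ticket locks" can be illustrated with a minimal, hypothetical C11 ticket lock. Each thread takes a ticket with an atomic fetch-and-add and waits until `now_serving` reaches it, which makes acquisition FIFO-fair; every waiter still polls the same cache line, which is the problem queued locks go on to address. The `sched_yield()` in the wait loop is an assumption added so the sketch behaves tolerably when threads outnumber cores; it is not part of the textbook algorithm, and all names here are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical ticket lock: waiters are served in FIFO order. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket number to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to hold the lock */
} ticket_lock;

static void ticket_acquire(ticket_lock *l) {
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1u,
                                            memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        sched_yield();  /* a pure spin could burn a whole time slice waiting
                           for a preempted thread whose turn it is */
}

static void ticket_release(ticket_lock *l) {
    /* Only the holder writes now_serving, so a plain load + store is enough. */
    unsigned cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    atomic_store_explicit(&l->now_serving, cur + 1u, memory_order_release);
}

static ticket_lock g_tlock;  /* static storage: both counters start at 0 */
static long g_counter;
static int g_iters;

static void *ticket_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < g_iters; i++) {
        ticket_acquire(&g_tlock);
        g_counter++;  /* protected by the ticket lock */
        ticket_release(&g_tlock);
    }
    return NULL;
}

/* Returns the final counter; equals nthreads * iters if the lock works. */
long run_ticket_demo(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    g_counter = 0;
    g_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, ticket_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return g_counter;
}
```

The fairness cuts both ways in user space: if the thread holding the next ticket is preempted, everyone queued behind it waits too.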

There's a reason why you can find decades of academic papers on locking. Really. It's hard.

"It really means a lot to me that Linus responded," the blogger wrote later, "even if the response is negative." They replied to Torvalds' 1,500-word post on the same mailing list -- and this time received a 1900-word response arguing "you did locking fundamentally wrong..." The fact is, doing your own locking is hard. You need to really understand the issues, and you need to not over-simplify your model of the world to the point where it isn't actually describing reality any more...

Dealing with reality is hard. It sometimes means that you need to make your mental model for how locking needs to work a lot more complicated...

  • by Anonymous Coward on Sunday January 05, 2020 @08:15PM (#59590444)

    "Sir, thank you for your contribution, I believe you are in error and your demonstration code doesn't yield the correct values due to incorrect assumptions. Please note the following issues with your submission: Thank you."

    As opposed to, your work is crap.

    • by JaredOfEuropa ( 526365 ) on Sunday January 05, 2020 @08:47PM (#59590546) Journal
      A bit of both. Sure, Linus seems to not have forgotten his leadership training, specifically the part that says you will get your point across much better with insults and abuse than with a well structured argument. But if you read his post you will see that he also points out why he thinks the work is crap, some suggestions on how to do better, and even a statement that he himself might not have done better because it is a very hard problem that requires specialist knowledge. Like the developer in question, I personally would not mind getting feedback like this despite the abusive language and public derision. What I don't get is how this exchange merits an article on ./, the potential for clickbaity headlines notwithstanding.
      • by Cochonou ( 576531 ) on Monday January 06, 2020 @02:25AM (#59591232) Homepage
        Actually, I found him more civil this time... maybe he is really working on improving his communications skills. Or maybe it's because he's just interacting with someone discussing on a blog, and not with someone championing for questionable commits to the kernel source.
        • Pretty much. Said blogger isn't protected by any code of conduct.

        • I don't know what it says about the world if someone drums up complete shit and then blames the shitty results of that shit on someone else very publicly, and that someone else isn't allowed to call them out on it.

          Yeah, vulgarity probably doesn't help, but I didn't feel that Torvalds was out of line in calling garbage code "garbage." He explained why it was garbage, where the Linus of the past would have just said it's fucking shit and the author is an idiot, and not expounded upon why.

          It's incremental progress.

      • by Sumguy2436 ( 6186944 ) on Monday January 06, 2020 @06:10AM (#59591478)

        I personally would not mind getting feedback like this despite the abusive language and public derision

        Better to be blunt and honest than AC's overly sweet word vomit. Everybody knows you're just faking it and regurgitating empty phrases for the sake of political correctness. You're not being polite by doing it. You're being passive-aggressive and dishonest.

        Worst of all, people have to cut through all the clutter to figure out what your actual point is because you're too busy thinking about HOW to say something rather than WHAT you're saying.

      • What I don't get is how this exchange merits an article on ./

        News for Nerds? This topic totally merits an article on /. .

    • For Linus the response was incredibly polite and generous. Someone who doesn't understand what they are seeing, trying to tell others how wrong they are, doesn't really deserve a lot of politeness. People get abused for far less.
    • by peppepz ( 1311345 ) on Monday January 06, 2020 @03:05AM (#59591274)
      Did you, or the people who modded you up, actually read the post and Linus' response to it? It's the blog poster who titled his post "how Bad the Linux Scheduler Really is", and gave an invalid demonstration of that. Linus called that proof "garbage" that was producing "garbage results". He never used the word "crap"; that is something you made up. "Garbage" is a standard term for nonsensical data. He proceeded to show why that demonstration was invalid, and advised the blog author not to use spinlocks at all for his game, thereby giving valuable advice for his professional career.
      • That'll probably just make him pivot to lock-free algorithms instead, and that can be an even bigger can of worms.
    • Anyone who has followed Linus T.'s responses to Linux kernel critiques would know that this feedback rates, on the Torvalds-o-meter rudeness scale of 1-5 from benign to vicious,

      about a 1.

    • No, arrogant programmers need to get these smackdowns, shame is the only thing that can counteract narcissism.

      You should never ever think that you're clever enough to write your own locking routines

      This is really basic. Really basic. When you write your own locking routine, for reasons, know that it probably sucks. Just know it. I don't mean, "know it unless you feel lots of warm fuzzies about your code," I mean, know that your own locking routines are crap. Bugs are a force of nature; you can't macho your way past them with self-audits either. If you wrote locking code, it sucked. It doesn'

  • by engineer37 ( 6205042 ) on Sunday January 05, 2020 @08:19PM (#59590450)
    When code is bad someone needs to have the balls to say that the code is bad, otherwise you end up with bad code. If you coded something wrong, someone needs to tell you. Linus was quite polite here given that the original blog post leveled a baseless accusation against his baby on the basis of bad code and a fundamental misunderstanding (and general lack of knowledge) of the nature of the system being analyzed.
    • Saying "code is bad" is fine. Being a complete arsehole about it is not.
      • This wasn't even close to being a complete "arsehole" about it. He was rather gentle actually considering that the person making a claim about the Linux scheduler was totally and completely wrong, publicly making claims based on incorrect assumptions and without peer review.

        Bad behavior begets bad behavior, and Torvalds has lambasted others publicly for far less. He actually showed some restraint here.

        • He wasn't just nice or mean about it; he admitted the kernel devs have done it wrong several times, that it took them a long time with lots of the right people to get it right (well, as right as it is currently), and that he's not even sure he could do it correctly without that assistance. That is key advice IMO: he's telling this coder that what he's trying to model is very, very hard to get right, that he doesn't have the experience to do it right, and so he shouldn't be trying to model it without experienced people to help.

          I

      • Eh, sometimes being an asshole is the only way to get through some people's thick skulls.
    • The simple fact here is that you've got a block of code that is trivial to schedule and work properly on Windows and OSX, and getting it to schedule and work properly in Linux is a giant PITA.

      Anyone who thinks that Linux doesn't have scheduling problems is living in a fantasy land. It has had them for almost two decades... and I am convinced it is because people keep making their own net-new schedulers for all these special snowflake situations instead of focusing on making *ONE GOOD SCHEDULER* like every o

  • He's being so nice (Score:5, Informative)

    by ragahast ( 879945 ) on Sunday January 05, 2020 @08:26PM (#59590470)
    The first thing I noticed in TFS is how nice Linus is being (and helpful - seriously).
    • The first thing I noticed in TFS is how nice Linus is being (and helpful - seriously).

      Within that kernel of observational accurateness, we can distill a single truth: you can be a hardass and scroll back effectively to nicety once in a while, but you can't be a player's coach, all likable & blowing sunshine, and suddenly become an asshole... folks just don't respond well to it.

    • by Type44Q ( 1233630 ) on Sunday January 05, 2020 @09:18PM (#59590608)

      He's being so nice

      I'm worried; he's clearly sustained a head injury.

    • by 93 Escort Wagon ( 326346 ) on Sunday January 05, 2020 @09:38PM (#59590662)

      Agreed. Speaking as (an unimportant) someone who's faulted Linus' behavior in similar situations in the past - I gotta say this was well and reasonably written. It doesn't hold back regarding the "garbage" code, and it shouldn't; but he never ventures into anything a reasonable person might consider a personal attack.

    • by Zuriel ( 1760072 )
      He's not really being nice as such, he's just focusing criticism on the code itself instead of the developer.
    • by seebs ( 15766 )

      This really is a ton more polite and less insulting than Old Linus, and yet, it still communicates the problem clearly. It's very good.

  • It's good to see that after the "code of conduct" fiasco, Linus seems to have returned to his normal self.

    • by jon3k ( 691256 )
      This was AWFULLY mild compared to the old Linus.
    • I honestly don't think he was ever that bad and most of the examples people use are hacked up pieces of a single email he sent in 2004.

      If you're not familiar with those emails beyond the tabloid-style writing about them: in them he'd try to patiently explain to someone who was fairly badly wrong, but very confident in being right, that they were wrong, and only after he couldn't get through to them would he lose his temper. Even then, the insults he'd level at them would usually either be restrained or si
  • >"the blogger wrote later, "even if the response is negative." They replied to Torvalds' 1,500-word post on the same mailing list -- and this time received a 1900-word response"

    Who replied? Some group of people? Confusing.

    • by eagle42 ( 58594 )

      >"the blogger wrote later, "even if the response is negative." They replied to Torvalds' 1,500-word post on the same mailing list -- and this time received a 1900-word response"

      Who replied? Some group of people? Confusing.

      Hardly confusing, especially since there's also a direct link to that reply. "They" can be singular [wikipedia.org], too.

  • by TheDarkener ( 198348 ) on Sunday January 05, 2020 @09:33PM (#59590654) Homepage

    FTA: "Besides that I found that most mutex implementations are really good, that most spinlock implementations are pretty bad, and that the Linux scheduler is OK but far from ideal. The most popular replacement, the MuQSS scheduler has other problems instead. (the Windows scheduler is pretty good though)"

    That last part almost made me spit out my beer, lol

    • Re: Alternatives... (Score:5, Interesting)

      by chas.williams ( 6256556 ) on Sunday January 05, 2020 @10:03PM (#59590754)
      A spin lock in userspace is likely to make the scheduler treat you as CPU intensive and reduce your priority to favor other tasks to make the system more responsive. So yes, mutexes will be better behaved (in userspace) since they explicitly acknowledge scheduling by sleeping. It sounds like someone wants realtime Linux which is really quite a headache.
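The "sleep instead of spin" difference described here can be sketched with a Linux-only futex-based mutex, modeled on the three-state design from Ulrich Drepper's paper "Futexes Are Tricky". Uncontended lock and unlock stay entirely in user space; a contended waiter calls `FUTEX_WAIT` through `syscall(SYS_futex, ...)` and sleeps, so the scheduler sees it as blocked rather than CPU-bound. This is an illustration only: glibc's `pthread_mutex_t` is already built on futexes, and the names `futex_mutex`, `fm_lock`, `fm_unlock`, and `run_futex_demo` are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked with no waiters, 2 = locked, maybe waiters.
 * Assumes atomic_int has the same layout as int (true on Linux targets). */
typedef struct { atomic_int state; } futex_mutex;

static void sketch_futex_wait(atomic_int *addr, int expected) {
    /* The kernel puts us to sleep only if *addr still equals expected. */
    syscall(SYS_futex, (int *)addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static void sketch_futex_wake_one(atomic_int *addr) {
    syscall(SYS_futex, (int *)addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

static void fm_lock(futex_mutex *m) {
    int c = 0;
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;                          /* fast path: no contention, no syscall */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        sketch_futex_wait(&m->state, 2); /* sleep in the kernel, don't spin */
        c = atomic_exchange(&m->state, 2);
    }
}

static void fm_unlock(futex_mutex *m) {
    if (atomic_fetch_sub(&m->state, 1) != 1) {  /* old value was 2: waiters */
        atomic_store(&m->state, 0);
        sketch_futex_wake_one(&m->state);
    }
}

static futex_mutex g_fm;  /* zero-initialized: starts unlocked */
static long g_fm_counter;
static int g_fm_iters;

static void *fm_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < g_fm_iters; i++) {
        fm_lock(&g_fm);
        g_fm_counter++;  /* protected by the futex mutex */
        fm_unlock(&g_fm);
    }
    return NULL;
}

/* Returns the final counter; equals nthreads * iters if the mutex works. */
long run_futex_demo(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    g_fm_counter = 0;
    g_fm_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, fm_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return g_fm_counter;
}
```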
    • by sxpert ( 139117 )

      the reality of this whole thing is simple... if your code needs a userland spinlock of any kind, you're doing it wrong. period

    • Re:Alternatives... (Score:4, Interesting)

      by Antique Geekmeister ( 740220 ) on Monday January 06, 2020 @03:28AM (#59591300)

      Isn't the Windows NT scheduler straight from VMS? I thought it was one of the aspects of the Digital vs. Microsoft lawsuit when David Cutler brought his kernel team from Digital to Microsoft to write the original NT kernel, and a great deal of the kernel from VMS was simply copied wholesale.

    • Why? It's amusing to me when people treat OSs like sports teams. "Well, I don't like Windows so everything about Windows must suck ass, the people Microsoft pays to work there from the labor market are far dumber than the people other companies pay to work there!".

      It's nonsensical.

  • by Proudrooster ( 580120 ) on Sunday January 05, 2020 @09:36PM (#59590658) Homepage

    And Linus is back, baby!!! Woo hoo....

    I can see the sensitivity training kicking in. He didn't refer to the coder as a moron, nitwit, A**hole or tell him to take his spin locks and shove them up his a**.

    I am looking forward to the rest of 2020 now...

    All new Picard
    All new Dr. Who
    Top Gun Maverick
    Hellfire missiles blowing up terrorists
    Crazy global politics
    and Linus rants....

    However, I truly feel sorry for our human brothers burning in Australia.

  • "Dealing with reality is hard."

    No shit. You don't even have to be doing spinlocks to know that. It's _this close_ (||) to being a tautology.
  • Game developers are pretty bad at coding; they can't figure out locking, and they certainly can't figure out coding for multicore architecture.

  • You are sloppy as shit and publicly shout loud-ass claims.

    You are rewarded for your sloppiness and belligerence with hits and views.

    You are called out on being sloppy by someone famous.

    As a consequence you are rewarded even more.

  • by m.dillon ( 147925 ) on Monday January 06, 2020 @12:49AM (#59591110) Homepage

    Locks are complicated. It's really that simple (ha ha). All of the operating system projects have gone through a dozen generations of lock design over the last 30 years because performance depends heavily on all sorts of things. In modern-day, cache-line effects (what we call cache-line ping-ponging between CPUs) are a big deal due to the number of CPU cores that might be involved. Optimal implementations in the days of 4-core and 8-core machines fall flat on their faces as the core count increases.

    Even situations that you might think wouldn't be an issue, such as a simple non-contended shared lock, have serious performance consequences on multi-core machines when they are banged on heavily... consequences that can cause latencies in excess of one microsecond from JUST a single NON-CONTENDED atomic increment instruction. That's how bad it can get.

    In modern-day kernel programming, spin-locks can only be used safely because the kernel has fine control over the scheduler. Spin-locks in userland tend to be disastrous in the face of any sort of uncontrolled scheduler action. And to even make them work reliably on many-core machines we need backoff mechanisms to reduce the load on the cache coherency busses inside the CPU. Linus is exactly right.
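The backoff idea mentioned here can be sketched as a test-and-test-and-set (TTAS) lock with exponential backoff. Waiters poll with plain loads, attempt the exclusive write only when the lock looks free, and after losing a race wait progressively longer before retrying. The constants, the `sched_yield()` fallback, and all names are assumptions for illustration, not taken from any particular kernel; and as the thread stresses, none of this fixes the basic problem of spinning in user space under a scheduler you don't control.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKOFF 1024u          /* assumed cap on backoff iterations */
#define SPINS_BEFORE_YIELD 1000u   /* assumed; tune per machine */

typedef struct { atomic_bool locked; } ttas_lock;

static void ttas_acquire(ttas_lock *l) {
    unsigned backoff = 1;
    for (;;) {
        unsigned spins = 0;
        /* "Test": wait with plain loads so the cache line can stay shared
         * instead of bouncing between cores on every attempted write. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            if (++spins == SPINS_BEFORE_YIELD) {
                sched_yield();  /* give a preempted holder a chance to run */
                spins = 0;
            }
        }
        /* "Test-and-set": try the write only when the lock looks free. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
        /* Lost the race: back off exponentially before retrying. */
        for (volatile unsigned i = 0; i < backoff; i++)
            ;
        if (backoff < MAX_BACKOFF)
            backoff *= 2;
    }
}

static void ttas_release(ttas_lock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

static ttas_lock g_ttas;  /* zero-initialized: starts unlocked */
static long g_ttas_counter;
static int g_ttas_iters;

static void *ttas_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < g_ttas_iters; i++) {
        ttas_acquire(&g_ttas);
        g_ttas_counter++;  /* protected by the TTAS lock */
        ttas_release(&g_ttas);
    }
    return NULL;
}

/* Returns the final counter; equals nthreads * iters if the lock works. */
long run_ttas_demo(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    g_ttas_counter = 0;
    g_ttas_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, ttas_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return g_ttas_counter;
}
```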

    There are other major issues with locks that become dominant on systems with more cores. Shared/Exclusive lock conflict resolution becomes a big problem, so the locking code needs to handle situations where many overlapping shared locks are preventing a single exclusive lock from being taken, or where many serial exclusive locks are preventing one or more shared locks from being taken. Just two examples there.

    Even cache-line-friendly queued locks (sequence space locks) have major trade-offs. Stacked locks (that look like mini binary trees) eat up serious amounts of memory and have their own problems.
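For the queued locks mentioned here and in Linus's post, the best-known textbook example is the MCS lock (Mellor-Crummey and Scott): waiters form a linked queue and each one waits on a flag in its own node, so a release touches only the next waiter's cache line rather than one line every waiter is polling. Below is a hypothetical C11 sketch of that algorithm, not the Linux kernel's qspinlock. The `sched_yield()` calls are an assumption added for user space, and each thread must supply its own `mcs_node`, which is one of the trade-offs of queued designs.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;   /* each waiter waits on its own flag */
} mcs_node;

typedef struct { _Atomic(mcs_node *) tail; } mcs_lock;

static void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store_explicit(&me->next, (mcs_node *)NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    mcs_node *prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                          /* queue was empty: we own the lock */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        sched_yield();                   /* wait on our own node only */
}

static void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected,
                (mcs_node *)NULL, memory_order_acq_rel, memory_order_acquire))
            return;                      /* nobody queued behind us */
        /* A successor swapped the tail but has not linked itself in yet. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            sched_yield();
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

static mcs_lock g_mcs;  /* zero-initialized: tail starts NULL (unlocked) */
static long g_mcs_counter;
static int g_mcs_iters;

static void *mcs_worker(void *arg) {
    (void)arg;
    mcs_node node;  /* per-thread queue node, safely reused after release */
    for (int i = 0; i < g_mcs_iters; i++) {
        mcs_acquire(&g_mcs, &node);
        g_mcs_counter++;  /* protected by the MCS lock */
        mcs_release(&g_mcs, &node);
    }
    return NULL;
}

/* Returns the final counter; equals nthreads * iters if the lock works. */
long run_mcs_demo(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    g_mcs_counter = 0;
    g_mcs_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, mcs_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return g_mcs_counter;
}
```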

    The general answer to all of this is to develop code to be as lockless as possible through the use of per-cpu (or per-thread) data structures. The design of RCU was one early work-around to the problem (though RCU itself has serious problems, too). Locks cannot be entirely avoided, but real performance is gained only when you are able to code an algorithm where no locks are required in most critical path situations. That's where all the OS projects are moving in modern-day.

    -Matt
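A minimal illustration of the per-thread approach recommended above: instead of every thread hammering one shared atomic counter (the cache-line ping-pong described earlier in the comment), each thread increments its own cache-line-padded slot with no lock or atomic at all, and the slots are summed once after `pthread_join`. The 64-byte line size and the names `padded_counter` and `run_per_thread_count` are assumptions for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdalign.h>

#define MAX_THREADS 16
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Aligning the member pads each slot to a full cache line, so two threads
 * never write to the same line (no false sharing). */
typedef struct {
    alignas(CACHE_LINE) long count;
} padded_counter;

static padded_counter g_slots[MAX_THREADS];
static int g_iters;

static void *count_worker(void *arg) {
    padded_counter *mine = arg;
    for (int i = 0; i < g_iters; i++)
        mine->count++;  /* private slot: no lock, no atomic instruction */
    return NULL;
}

/* Returns the combined count across all per-thread slots. */
long run_per_thread_count(int nthreads, int iters) {
    pthread_t tids[MAX_THREADS];
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    g_iters = iters;
    for (int i = 0; i < nthreads; i++) {
        g_slots[i].count = 0;
        pthread_create(&tids[i], NULL, count_worker, &g_slots[i]);
    }
    long total = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);  /* join makes the slot's writes visible */
        total += g_slots[i].count;
    }
    return total;
}
```

The trade-off is that reading the total is more expensive and only exact once the writers have finished (or with extra synchronization), which is why this pattern suits counters and statistics better than general shared state.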

  • So, what should/could Stadia do? There appear to be glitches in some games that people are noticing (what started the investigation), and those problems aren't occurring on other platforms. Is this something Google needs to fix with their dev tools? To scan all the code to ensure the wrong type of locks aren't being used and show what needs changing/where? Can it be fixed with a recompile? Or will Google's Stadia servers need something tweaked? I just hope the original problem gets fixed.
    • by sad_ ( 7868 )

      wanted to make the same comment!

      Sure, the game dev's code might be garbage, but the fact is that his garbage code doesn't cause any problems on Windows, while it is an issue on Linux.
      Of course, Windows and Linux are two different beasts, but that doesn't change the fact that the problem is there.

      I also can't imagine this is the only game conversion running into this issue. Watching YouTube videos that compare fps between Linux and Windows, you'll notice that even native ports sometimes lag behind in fps. is this bec

  • by crustyhacker ( 6506576 ) on Monday January 06, 2020 @06:15AM (#59591484)

    The whole post is garbage. The problems with userspace spinlocks are pointed out in the pthread_spin_init(3) manpage, and it's clear he knows about that library function, as he mentions it in the comments [probablydance.com]. Shame he didn't bother to read the documentation.

    Spin locks should be employed in conjunction with real-time scheduling policies (SCHED_FIFO, or possibly SCHED_RR). Use of spin locks with nondeterministic scheduling policies such as SCHED_OTHER probably indicates a design mistake. The problem is that if a thread operating under such a policy is scheduled off the CPU while it holds a spin lock, then other threads will waste time spinning on the lock until the lock holder is once more rescheduled and releases the lock.

    If threads create a deadlock situation while employing spin locks, those threads will spin forever consuming CPU time.

    User-space spin locks are not applicable as a general locking solution. They are, by definition, prone to priority inversion and unbounded spin times. A programmer using spin locks must be exceptionally careful not only in the code, but also in terms of system configuration, thread placement, and priority assignment.

    And now a load of people with an actual clue are having to refute his nonsense. He should grow a pair and retract the post.
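For contrast with the hand-rolled locks discussed in this thread, here is a minimal sketch using the library's own POSIX spin lock API (`pthread_spin_init`, `pthread_spin_lock`, `pthread_spin_unlock`, `pthread_spin_destroy`). The manpage excerpt recommends pairing spin locks with SCHED_FIFO or SCHED_RR; switching policies usually requires privileges, so this sketch assumes the default policy and only demonstrates the API with a deliberately tiny critical section. The names `run_spin_demo` and `spin_worker` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static pthread_spinlock_t g_spin;
static long g_total;
static int g_iters;

static void *spin_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < g_iters; i++) {
        pthread_spin_lock(&g_spin);
        g_total++;  /* keep the critical section as short as possible */
        pthread_spin_unlock(&g_spin);
    }
    return NULL;
}

/* Returns the final total, or -1 if the spin lock could not be created. */
long run_spin_demo(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    if (pthread_spin_init(&g_spin, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    g_total = 0;
    g_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, spin_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_spin_destroy(&g_spin);
    return g_total;
}
```

Under SCHED_OTHER this still carries exactly the risk the manpage describes: if the holder is preempted inside the critical section, the other threads spin until it runs again.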

    • What blows my mind is that he didn't figure that all out on his own.
      When I first discovered the random number generator that is latency timing while under the control of SCHED_OTHER, trying to implement primitives as if I were the only process running, I investigated as to why... and it all became immediately clear to me.
      That was the very minute that preemption, and how it was accomplished, clicked for me.
      It was also about the same time I started growing hair on my balls.
      I apparently should have gone in
  • Linus is such a smart troll ...

  • If the blogger's post were a paper, it wouldn't have been accepted to any decent OS conference. It's full of mistakes and bad assumptions from an inexperienced developer who doesn't realize how wrong he is and decides to blame something else. I mean, the guy titled his post with "and how Bad the Linux Scheduler Really is".

    Linus's reply, while a little over the top, really contained some on the mark comments:

    I repeat: do not use spinlocks in user space, unless you actually know what you're doing. And be aware that the likelihood that you know what you are doing is basically nil.

    Because you should never ever think that you're clever enough to write your own locking routines.

    There's a reason why you can find decades of academic papers on locking. Really. It's hard.

  • It's a good thing he didn't get the older, grouchier Linus. That would have been something to read.
