Dev Boots Linux 292,612 Times to Find Kernel Bug (tomshardware.com) 32
Long-time Slashdot reader waspleg shared this story from Hot Hardware:
Red Hat Linux developer Richard WM Jones has shared an eyebrow-raising tale of Linux bug hunting. Jones noticed that Linux 6.4 has a bug which means it will hang on boot about 1 in 1,000 times. Jones set out to pinpoint the bug, and prove he had caught it red-handed. However, his headlining travail, involving booting Linux 292,612 times (and another 1,000 times to confirm the bug) apparently "only took 21 hours." It also seems that the bug is less common with Intel hardware than AMD-based machines.
Dedication (Score:5, Insightful)
Re:Dedication (Score:5, Funny)
Re: (Score:2)
Re: (Score:2)
Thank you for this rare dose of /. positivity!
Credit (Score:2)
A lot of credit goes to the people who made the Linux system faster to boot :)
Oblig. (Score:5, Funny)
You could say he gave that bug the boot.
Re: (Score:1)
Truly the hero we needed and did not deserve. (Score:3)
Digital isn't always precise and definitive (Score:2)
Digital stuff isn't always as precise and definitive as one would expect.
One plus one should equal two but due to the complexity of the machine and its interactions with both external and internal forces, sometimes one plus one equals 3, maybe 0, or 65535.
Re: (Score:2)
Oh it's precise, but there's enough variations in hardware to have race conditions. We know hardware race conditions as setup/hold violations where violating the timing may do the right thing, but it can also capture the wrong bit. And there are still anal
Found the repro; bug is still unclear (Score:4, Insightful)
Re:Found the repro; bug is still unclear (Score:5, Informative)
Either side of bisect (Score:2)
involving booting *pre-commit* Linux 292,612 times *successfully* (and another 1,000 times *post-commit* to confirm the *hanging* bug)
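To put rough numbers on why so many clean boots are needed on the "good" side of a bisect, here is a minimal sketch. It takes the 1-in-1,000 hang rate from the summary and assumes every boot is an independent trial, which is a simplification. The function names are hypothetical and this is not Jones's actual test harness.

```python
import math


def boots_needed(hang_rate: float, confidence: float) -> int:
    """Smallest n such that a buggy kernel would hang at least once
    in n boots with probability >= confidence (independent boots assumed)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


def prob_all_clean(hang_rate: float, boots: int) -> float:
    """Chance that a buggy kernel survives `boots` boots without hanging."""
    return (1.0 - hang_rate) ** boots


def seconds_per_boot(total_hours: float, boots: int) -> float:
    """Average wall-clock time per boot over the whole run."""
    return total_hours * 3600.0 / boots


HANG_RATE = 1 / 1000  # figure quoted in the summary

if __name__ == "__main__":
    print("boots for 99% confidence:", boots_needed(HANG_RATE, 0.99))
    print("P(292,612 clean boots | buggy):", prob_all_clean(HANG_RATE, 292_612))
    print("avg seconds per boot:", round(seconds_per_boot(21, 292_612), 3))
```

Under these assumptions a few thousand clean boots already make a buggy kernel very unlikely, so 292,612 clean boots go far beyond that threshold. The average time per boot comes out well under a second, which suggests the boots were short and possibly run in parallel, though the summary does not say so.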
2023 (Score:4, Funny)
The year of the Linux sweatshop
Lucky guy (Score:5, Interesting)
booting Linux 292,612 times (and another 1,000 times to confirm the bug) apparently "only took 21 hours."
Way back in 1999, I had to test the real time clocks and OS timekeeping for Y2K preparedness on several HP 9000 [ALNT]-Class production systems (three of each) running HP-UX 11, which involved resetting the clocks to various dates/times and cold rebooting the systems about 20 times. I wrote an Init script to handle everything except toggling the power switches. The T-600 systems took 20 minutes to cold boot to single-user and a few more to run the script and shut down, while the other systems were a bit faster but still long enough for it to get old. It was a very long, lonely night in the machine room.
Kudos to Jones.
Y2K (Score:3)
This is the reason why Y2K wasn't a widespread disaster as people feared it might be. It's not because the danger had been overblown (as many people said afterward). It's because people like this spent hours in testing and remediating beforehand.
I personally tested a Data General mini-computer running DG-UX (Unix) by shutting it down, rolling the date forward in the BIOS, and restarting the machine. Never mind obscure date bugs in the application software - the server was unable to even boot! We wound u
First Debian release (Score:2)
The initial stable release of Debian (1.1, iirc) was delayed a couple of days when I tripped over and reported one of these.
On the Gateway (?) machines in the student lab I was using for a test install, *alternate* boots from floppy would fail. I tested on another couple of machines, and found it consistent.
Regular day. (Score:4, Funny)
So it's essentially just another day trying to get nVidia drivers to work properly.
Linux developers go above and beyond once again. (Score:2)
Re: (Score:2)
Re: (Score:2)
The sleuthing is impressive - but wouldn't it have been better to simply design the function before writing it the first time? A well-defined domain and range for function inputs, along with some basic what-if analysis of fault conditions, probably would have uncovered the potential issue before initial release.
This is what disciplined engineering gives you - avoiding many problems before they occur. Sure it doesn't get all of them, but it gets most of them. And it's always[1] less expensive to find design
A man with two clocks (Score:4, Interesting)
It happened to me -- twice (Score:2)
This has happened to me twice in the last 2 years. Unexplained hang -- I looked at it and went, why?
This has to be it
Speedrunning the Linux kernel? (Score:2)
Seems Linux needs a "100% glitchless" category at speedruns.com...
Impressive (Score:1)
what kind of setup is he using? (Score:1)
Well (Score:2)