Putting Linux Reliability to the Test
Frank writes "This paper documents the test results and analysis of the Linux kernel and other core OS components, including everything from libraries and device drivers to file systems and networking, all under some fairly adverse conditions, and over lengthy durations. The IBM Linux Technology Center has just finished this comprehensive testing over a period of more than three months and shares the results of their LTP (Linux Test Project) testing."
Linux Test? (Score:5, Funny)
Re:Linux Test? (Score:3, Funny)
Re:Linux Test? (Score:3, Funny)
I had a feeling (Score:2, Troll)
Almost 1P, but I RTFAd :( (Score:4, Interesting)
USE BAD HARDWARE! (Score:5, Insightful)
Get some ECS motherboard, generic RAM... bang. You're in for the evening.
Re:USE BAD HARDWARE! (Score:5, Insightful)
I've done it with my ECS board with generic ram, and I came out on top.
It's the big computer makers that sell the cheap generic hardware. Try getting anything that's essential and non-OEM, hardware or software, to work with one of those boxes.
Re:USE BAD HARDWARE! (Score:2, Interesting)
andy
You have to plan for that. (Score:3, Insightful)
In my experience, most of the no-moving-parts hardware will fail within the first week, or last for years and years.
The stuff with moving parts will eventually fail. But that's harder to predict.
Re:USE BAD HARDWARE! (Score:2, Insightful)
IOW, I agree - pick decent parts and get *exactly* what you want. I usually pick the previous generation CPU and get the biggest mobo I can for that from trusted brands. Then I stuff the mobo with the most it can handle, which is a *lot* nowadays. Of course, I get it all below retail from local OEMs, cash pai
Re:USE BAD HARDWARE! (Score:2)
ECS motherboards == PC Chips motherboards
lol
Oh well, they still make half decent parts provided you select the correct chipset.
Re:USE BAD HARDWARE! (Score:2)
Re:USE BAD HARDWARE! (Score:2)
Really, it sucked. Windows sometimes crashed on boot on the thing, and reinstalling didn't help at all. Reliability was gruesome... until I installed Linux (this machine was one of the reasons I switched). I never had one single crash while the machine was running Linux, while running Windows on it (dual-boot) was still as unreliable as ever.
Of course this is only one story.
Re:USE BAD HARDWARE! (Score:3, Informative)
Re:USE BAD HARDWARE! (Score:2)
Re:USE BAD HARDWARE! (Score:2, Insightful)
Otherwise you may get poor OS stability simply because you have a bad combination of hardware.
Re:USE BAD HARDWARE! (Score:2)
Re:USE BAD HARDWARE! (Score:2)
The value of this test is for people considering Linux for mission critical applications. That is to say the kind of bet-the-company applications that people don't run on a surplus box snatched out of a basement storage room.
That said, the kind of "hey kids, let's put on a show" projects are where open source really shines. I'd swear that if we had to make the decision to pay money up front, my company still wouldn't have an email server, a web server, an i
Re:USE BAD HARDWARE! (Score:2)
Re:USE BAD HARDWARE! (Score:2)
So essentially this isn't a test of Linux, this is just a test of the pSeries hardware.
Now IBM can breathe easier, knowing they can just use Linux instead of having to pay people to support AIX. This is a great day for the pointy-haired boss!
USE GOOD HARDWARE, THEN! (Score:2)
There's plenty of good hardware to be had from places like Newegg, Directron and Computer Geeks, just to name a few. Get yourself an ASUS motherboard, RAM from Crucial or from a reputable manufacturer like Kingston ValueRAM or Viking or Mushkin or Corsair, get a video card from a good manufacturer, and you have a nice solid machine that can handle anything.
Re:USE BAD HARDWARE! (Score:2, Insightful)
Are you saying good hardware can compensate for lousy software?
Good software CAN deal with lousy hardware, but only up to a point. Even so, are you going to be running your mission-critical enterprise server on ECS motherboards and knock-off RAM? I hope not.
You don't trust Microsoft to evaluate Windows... (Score:5, Insightful)
Re:You don't trust Microsoft to evaluate Windows.. (Score:2, Insightful)
Re:You don't trust Microsoft to evaluate Windows.. (Score:4, Insightful)
They have much to gain: more corporate customers and more respect and funding by greater IBM. Just because IBM supports Linux doesn't mean its motives are pure (not financially driven). Another reason for bias is the division also stood to have huge setbacks if the tests were unfavorable. How could they justify expansion and better funding if their previous statements about Linux being enterprise-ready were unfounded?
Re:You don't trust Microsoft to evaluate Windows.. (Score:5, Insightful)
Also, Linux has weathered some unfavorable (and honest!) critiques before. Linus Torvalds said it best when he said (and I paraphrase since I am too lazy ATM to look up the actual quote) that it doesn't matter if there's negative publicity in the press about Linux. It just meant he got his bug reports from the Wall Street Journal as opposed to the regular kernel mailing list.
Re:You don't trust Microsoft to evaluate Windows.. (Score:2)
I think the rest of your post is either biased (assuming IBM only intends this article for a technical audience that won't use it for business decisions) or inflammatory (the comments on Microsoft that mostly have root in the attitude on Slashdot more than Microsoft's actual statements).
I need something more empirical than this assertion. I don't see kernel developers spending weeks duplicating IBM's results jus
Why? Here's why... (Score:5, Interesting)
Re:Why? Here's why... (Score:3)
Yes, they are documented, but some of the evaluation criteria ("expected behavior") depend on the opinions of the team performing the evaluation.
because it's disclosed up-front that it's IBM Linux Team testing Linux
Yes, that's better than the Ziff-Davis test (which I'm familiar with and mention in another post on this thread). However, assuming no bias because of a disclosed possibility of bias is illogical. Such disclosure is a necessary but insufficient co
Re:Why? Here's why... (Score:5, Informative)
You should not trust this evaluation at all.
After all... On the internet, nobody knows you're a dog.
Any JimBOB can write a convincing paper, with all the right buzzwords, that sounds as if X+Y=Z, especially if that was logically a likely/expected outcome in the first place.
As a well-known TV show once said (several times and loudly) Trust No-One.
Remember people, YMMV.
Re:You don't trust Microsoft to evaluate Windows.. (Score:2)
Because it is to IBM's advantage to find any weakness AND FIX IT before their customers run into the same problems. Any "insider knowledge" would be used to make the tests harder rather than easier. If it were "IBM Linux" rather than "SuSE Linux" that was being tested you'd have at least a chance of making a point.
Re:You don't trust Microsoft to evaluate Windows.. (Score:2)
Why do you trust IBM's Linux Technology Center to evaluate Linux?
Because the goal of the test is to find out whether or not IBM's customers should feel comfortable using Linux instead of AIX or some other highly reliable OS for mission critical computing applications, and IBM will look really bad in front of its own loyal, big-money customer base if the test results don't hold up in the real world.
More concisely: Because IBM will lose sales if Linux fails after IBM said it wouldn't.
Because it confirms daily experience? (Score:3, Insightful)
MS is running ads saying how Windows XP is so reliable. It is kinda hard to believe when you hear the ad because you are getting a cup of coffee waiting for XP to reboot. Same with 2k3. It crashes. Not as often as XP, just as XP doesn't crash as often as 98, and so on. But it still crashes.
Now on to my Linux machines, which don't crash. I run about a dozen of them in total and not one of them has crashed.
I also have had some experience with AIX. Typically on machines
Re:You don't trust Microsoft to evaluate Windows.. (Score:5, Interesting)
The people performing it have a vested financial interest in having it turn out a specific way, notably positive. If the test results showed poor reliability, then I would understand trusting it because it would go against the motives of the people performing it. Since the test affirms their business model, no matter how documented it is, it should be suspect.
It doesn't appear to be a test rigged to make one platform look better than the other.
It looks a bit skewed to me. Many of the test results depend on the computer systems meeting the expectations of the people testing them, particularly in overload cases. Since the people who tested work in the Linux Technology Center, their expectations stand a greater likelihood of being consistent with the system.
Take C/C++ and Java. Someone who regularly works with C/C++ knows certain libraries (notably the character ones) return ints for status in the form 0 being false and not 0 being true. If someone expects that, the system meets expectations and passes. If someone comes from a different background, say Java, he or she may not expect that, and the system would consequently fail the test of meeting expectations. I would like an evaluation from somewhere in-between, not someone whose years of experience allow them to gloss over what might be problems for another person.
Re:You don't trust Microsoft to evaluate Windows.. (Score:2, Insightful)
Re:You don't trust Microsoft to evaluate Windows.. (Score:2, Interesting)
That may be human instinct, but let's be honest: it's not fair either. Either trust the source of information or don't, but don't trust the result of a test based on the result of the test -- circular dependency.
It's much simpler to simply say
Re:You don't trust Microsoft to evaluate Windows.. (Score:3, Insightful)
I think it's more human than that. If a company or group releases results completely against their interests, integrity is the only reason such a group would go forward. Why would anyone skew results to spite themselves?
Re:You don't trust Microsoft to evaluate Windows.. (Score:3, Insightful)
"wow, last year, they had to admit their product just wasn't up to the task. but now, dang, look at 'em go!"
yes, it's quite human indeed. you don't know what all they're up to -- what seems to be self-defeating isn't always. and sometimes, well, you honestly find out that you're doing the job you had hoped you were doing, trouncing the competition. go figure: you might actually manage to not suck! but you don't get to tell anyone? and your only so
Java uses booleans for logic variables (Score:2)
Re:You don't trust Microsoft to evaluate Windows.. (Score:4, Informative)
Microsoft commonly hires outside companies to perform their tests. Do you remember the evaluation of Exchange versus Notes/Domino scalability by Ziff-Davis but funded by Microsoft? People justifiably questioned those results, as the company hired (Ziff-Davis) has an interest in pleasing the hiring company (Microsoft) so they get future work.
s/w -vs- h/w failure? (Score:5, Interesting)
I seem to recall getting random crashes with cheapo memory, and it was a pain to track down the offending component. Of course, one would assume that IBM wouldn't go for cheapo components, but still: how does one point the finger at the software, instead of hardware? Is it just repeatability?
Re:s/w -vs- h/w failure? (Score:5, Informative)
memtest86 on PPC? (Score:2)
Re:memtest86 on PPC? (Score:2)
Re:memtest86 on PPC? (Score:2)
Re:memtest86 on PPC? (Score:2)
Which is kinda teh suck when you try to figure out if you have a bad motherboard or just a bad stick of ram...
So, could someone point me towards a version (CD/floppy-bootable pre-built one, if possible) that does work?
Diagnosing software vs. hardware is easy. (Score:3, Insightful)
So when you run a test 5 times, and you get 5 results, the hardware is broken. When you run the same test 5 times, and it gets to the exact same point before sig11ing, you have a software flaw.
This is also why you do multiple tests to ensure you're getting an accurate picture of what's going on (flawed or not).
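The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: `classify_runs` and the failure labels are invented for the example, and as the replies in this thread point out, the rule is a heuristic and not a proof either way.

```python
from collections import Counter

def classify_runs(failure_points):
    """Apply the repeatability rule of thumb to a list of per-run outcomes.

    Each entry is a label for where a run died (hypothetical labels such as
    "step 42"), or None if that run completed cleanly.
    """
    failures = [p for p in failure_points if p is not None]
    if not failures:
        return "no failure observed"
    if len(failures) < len(failure_points):
        # Some runs passed and some failed.
        return "intermittent: suspect hardware (or a timing bug)"
    if len(Counter(failures)) == 1:
        # Every run died at exactly the same point.
        return "repeatable: suspect software"
    # Every run died, but at different points.
    return "scattered: suspect hardware (or a timing bug)"

print(classify_runs(["step 42"] * 5))
print(classify_runs(["step 3", "step 17", None, "step 42", "step 8"]))
```

Running more repetitions narrows the guess, but a single bad memory cell hit by a deterministic program would still land in the "repeatable" bucket.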
Re:Diagnosing software vs. hardware is easy. (Score:2, Informative)
Not necessarily: When uncompressing one of the XFree86 source tarballs, X430src-3.tgz, on my old k6 2-450, gzip would always die with a bad CRC. Nothing else at all seemed to go wrong with the machine, but I couldn't uncompress the file until I downed the memory clock to 66MHz, rather than 100.
I found one other person with the same motherboard having the same problem in a google search,
True. (Score:2)
Re:True. (Score:2)
Re:Diagnosing software vs. hardware is easy. (Score:4, Informative)
Re:Diagnosing software vs. hardware is easy. (Score:5, Insightful)
That's true... in theory. In practice, there are many ways software can fail in random (in the weak sense) ways. Many of these are related to timing. For example if you have many threads and fail to lock things properly, the result will depend on when the tasks are preempted. You can also have different results because of the way the interrupts (disk, net,
I'd say that the only kind of software that can't fail randomly is single-threaded and doesn't rely on any input other than regular files (and even then I'm not sure it's enough).
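For anyone who wants to see the locking case concretely, here is a minimal Python sketch. The function name `run_counter` and the thread and iteration counts are arbitrary choices for this example. With the lock the total is always exact; without it the total may or may not come up short, depending on the interpreter version and on when threads get preempted, which is exactly the kind of "random" software failure described above.

```python
import threading

def run_counter(n_threads, n_increments, use_lock):
    """Increment a shared counter from several threads and return the total."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_increments):
            if use_lock:
                with lock:
                    state["count"] += 1
            else:
                # Unprotected read-modify-write: a thread switch between the
                # read and the write can lose an update.
                current = state["count"]
                state["count"] = current + 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

expected = 4 * 50_000
print("locked:  ", run_counter(4, 50_000, use_lock=True), "expected", expected)
print("unlocked:", run_counter(4, 50_000, use_lock=False), "expected", expected)
```

The unlocked result can differ from run to run on the same machine, so a single clean run proves nothing about the code being correct.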
True in theory, very true in practice. (Score:2)
I've put in many hours of debugging software (my own and others), and all the crashes and other prob
Re:True in theory, very true in practice. (Score:2)
Right. That's where two bugs get together and you notice something.
Remove either of the bugs and nobody will notice anything.
Remove both of the bugs and maybe you get a bit closer to having something debugged.
(IE: Apache won't die in a random fashion)
One of us severely misunderstands Apache. My understanding is that it is quite possible to run Apache in production with some very broken modules.
"Try it again. Maybe your browser has some problems."
and v
Re:Diagnosing software vs. hardware is easy. (Score:3, Informative)
This isn't true. If you're running a program that uses a deterministic memory allocation algorithm (a compiler, for instance) and have a segment of bad memory, then you easily could crash at the exact same point (when a pointer in that segment is dereferenced, for instance).
I know. It's happened to me. I've even
Re:Diagnosing software vs. hardware is easy. (Score:4, Interesting)
I think you left something out (Score:2)
And this is the real way to determine it. Run the suspect software on a piece of hardware known to be good or run a piece of software known to be good on suspect hardware. Testing anything with a single sample is meaningless. If you
Maybe.. (Score:2)
Re:Maybe.. (Score:2)
Or you boot system, start process x, at stage y you get a segfault, you reboot and try it again, and again and again, it always happens at stage
Re:s/w -vs- h/w failure? (Score:2)
Yes, the box makes the hardware call. It is freaky to come to work and have IBM service sitting at your door wanting in to install a hard drive or I/O card... and the machine is still running.
And now most of the time, they change it while the machine is active.
Not bad (Score:5, Insightful)
Brian
Re:Not bad (Score:2)
The stability of a stripped-down Linux box doing just what it was intended to do might be much better than that of Windows XP (with all the bells and whistles), but I would really like to see both of them loaded with a similar number of apps and see how their performance compares. In my experience, XP has been much better in that situation.
Re:Not bad (Score:2)
I wasn't taking credit away from either operating system, just mentioning the problems I have observed working with Linux distros: RH 8.0, 9.0 and more recently SuSE 9.0 Pro (dual boot with XP). I do think XP has been much faster to boot and much more stable when it comes to managing rogue applications (killing one process doesn't affect the rest or freeze the system) on the same PC. Try that with SuSE 9.0 Pro and wat
Re:Not bad (Score:2)
They certainly aren't an every-day, every-week or even every-month event, but they do happen. Personally I just walk 3 feet to my wife's computer and ssh into my system and take care of it (the system hasn't c
Re:Not bad (Score:2, Interesting)
Windows, even the server versions, are not the enterprise class OSs that they are marketed as. This should come as no surprise, because they were not even designed that way in the first place.
All you have to do to realize this is boot up W2K AS and use it as a desktop machine for a while. All of the desktop crap is still there sucking up resources. Even Freecell is there, fer cryin' out loud! Try as I might, I can't come up with a good reason for a headless serve
not much available (Score:2)
How many operating systems run on IBM's pSeries machines? AIX and...?
Re:Not bad (Score:2)
Results... (Score:3, Interesting)
Re:Results... (Score:2)
Needs to be done independantly (Score:5, Insightful)
Second off, if this were M$ testing 2k3 and publishing the paper, everyone here would be crying foul. But because it's "Linux", it must be 100% unbiased and true.
I've been using Linux for 8 years now, including in high-stress environments (mainly 3D graphics rendering), and from experience I have seen very good things from Linux. We have had software glitches before, but the core software has caused maybe 3 - 5% of our downtime. Over 70% of our downtime involves human error and about 25% of failures are due to hardware giving out.
Still, what my customers want to see isn't benchmarks so much as "So easy Grandma could use it" in Linux. While the people in the datacenters want to know how well Linux will bear up under a load, most end-users and SMBs don't need to worry about it; they just need something easy to use that works.
Re:Needs to be done independantly (Score:3, Insightful)
Second, you're probably right about the publishers of the paper, but hey, what can you do? The people with the most interest in th
Re:Needs to be done independantly (Score:2)
Re:Needs to be done independantly (Score:4, Insightful)
They aren't making a comparison to other OSes or saying that Linux is more suitable than such-and-such operating system; just that it is suitable for particular tasks or environments.
A comparison between different OSes should be carried out by an independent testing facility but, in this particular case, I don't see anything wrong with their modus operandi.
Re:Needs to be done independantly (Score:4, Insightful)
An ironic assertion regarding bias. IBM isn't the author of Linux or any of its tools, add-ons, servers, etc. as Microsoft is of 2k3 and its support software. Microsoft also has a long and distinguished history of FUD. IBM doesn't have anywhere near the historical attachment to Linux that MS has to Windows, and IBM hasn't been caught lying about it yet. It would be irrational to treat the two equally at their word.
Re:Needs to be done independantly (Score:5, Insightful)
Pretty amusing that you would say that, considering the origin of the term: [catb.org]
Not that I disagree with your assertions - IBM doesn't have near the same ties to Linux that MS has to Windows. But it's amusing to see how much the technological landscape has changed, that a term coined to describe IBM can now be used to (in some sense) defend it.
WHAT is the failure? (Score:5, Interesting)
Sorry but that means nothing. Even if there -was- a comparison to other systems, it would still mean nothing. 95% success ratio, 78% happiness factor and 93% user satisfaction.
Re:WHAT is the failure? (Score:4, Informative)
Re:WHAT is the failure? (Score:2)
Lies, damn lies... (Score:2)
Failures needed (Score:4, Interesting)
I have been trying to write some tests of my own recently. So far I have found a filesystem Oops, a ptrace BUG(), and my system locks up in low-memory situations. Probably the lockup is because my ethernet driver allocates memory in the interrupt handler (GFP_ATOMIC) and can't handle the result when there is no memory available.
I need to fix the lock up first of all so the other tests have time to run...
Here goes (Score:5, Insightful)
The tests demonstrate that the Linux system is reliable and stable over long durations and can provide a robust, enterprise-level environment.
OK, now I don't mean to troll here, so mod down if you wish, I really don't care.... BUT...
I have been a Linux user/programmer/lover for the past few years now, and I wanna see a company that is not SO IN LOVE with Linux say what has just been said by IBM above.
In other words, I don't want to see companies who sell Linux, or who benefit from selling Linux, praise it. Does any one of you know of someone who meets these criteria? Sun for one is not very fond of Linux, nor is MS of course (despite the fact sometimes I doubt they have code in their stuff from Linux...)... to make a long story short
It would be really nice if such a judgment came from someone else besides IBM/REDHAT/ORACLE...
Re:Here goes (Score:3, Informative)
It takes a lot of time and money to do very thorough analyses of operating systems, hardware and enterprise apps. So that money has to come from somewhere. It would
Unbiased research is REALLY hard to get (Score:3, Insightful)
Even true independents are often not unbiased. For example, some individual with no ties to any OS developer or vendor might decide to test OSes. Of course it might be that they are a huge Linux or Mac or Windows zealot, so again they stack things in their favour
Look at companies using Linux (Score:3, Interesting)
The closest you're likely to get is good testimonials from companies using Linux. IBM, Sun etc. all have a stake in Linux, and the 'independent' research outfits are probably funded by them, or by Microsoft (in case Linux needs a good bashing).
My client is a big megacorp. Their strategy for the coming years is to migrate all Unix systems to Windows/.Net (client side), and to Linux or NT (server side, depending on which OS fits best). This is
Scaleable.... really? (Score:5, Insightful)
Sorry, but how can the scalability of the CPU resource be proven on a 2 CPU system? Show incremental results on 1, 2, 4, 8, 16, 32 etc. CPUs and then CPU scalability may be proven.
This is NOT an anti-Linux troll; rather, the evaluation needs to justify its outcomes or it starts to look like something from a company starting with M.
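To make the request concrete, this is roughly the table such a test would need to publish, sketched in Python. Everything here is an assumption for illustration: `scaling_report` is a made-up helper and the timings are placeholder numbers, not measurements from the IBM paper or from any real machine.

```python
def scaling_report(timings):
    """Summarize how a fixed workload scales with CPU count.

    timings maps cpu_count -> wall-clock seconds for the same workload.
    Returns (cpus, speedup, efficiency) rows relative to the smallest
    CPU count measured; efficiency 1.0 means perfectly linear scaling.
    """
    base_cpus = min(timings)
    base_time = timings[base_cpus]
    rows = []
    for cpus in sorted(timings):
        speedup = base_time / timings[cpus]
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, speedup, efficiency))
    return rows

# Placeholder numbers, NOT real measurements.
made_up = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
for cpus, speedup, eff in scaling_report(made_up):
    print(f"{cpus:>2} CPUs  speedup {speedup:5.2f}  efficiency {eff:6.1%}")
```

With only the 1 and 2 CPU rows you cannot tell whether efficiency holds steady or falls off a cliff at 8 or 16 CPUs, which is the point being made above.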
Axioms of Christmas Linux Economics (Score:3, Funny)
2) My time is not worthless
3) Linux is not free
4) Giving Linux as a Christmas present is not "cheap"
5) Linux is a good Christmas present.
So what ? (Score:4, Insightful)
The trouble is that, after a period of increased stability in the 1980s, in the last decade people have come to expect that computers fail, and they wonder with amazement if they don't.
OK: 30 years ago I remember it being a good day if the mainframe stayed up 12 hours. But things have moved on; today you expect your MVS, VMS, Unix or Linux machine to stay working. The only OS vendor whose products have not matured is the one in Redmond - largely because of rampant infestation with new features.
The above is not intended to belittle the fantastic efforts of all those involved.
I already knew GNU/Linux was stable (Score:4, Interesting)
IBM just confirmed what I already knew. Guess what, Win2k is pretty stable, too. Sorry, but it's true.
But, jeez, isn't anyone else drooling over those systems they tested on? Makes me hate my busted whiteboxes and horrible HPs a little more every day.
Repeat after me....."MMMM, dual Power4......MMMM, dual Power4...."
My experience (Score:2, Informative)
My notebook has a flaky RAM connection. 32 MB comes and goes depending on how the machine is squeezed. Win 9x products crash it hard, Linux and Win2k don't even notice.
So in my experience, Linux doesn't mind a hostile platform.
IMHO (Score:4, Informative)
The very reason Linux has already made so many inroads into corporations in the first place is because of its reliability and stability, and not because some marketing campaign has churned out the words on header paper.
Another point is that I personally expect the systems I administer to run for a darn sight longer than 30, 60 or 90 days unless I need to restart them because of a kernel upgrade. When my last bunch I worked for went tits-up, our SAMBA file server had a 790 day uptime, and had run the SAMBA daemons reliably throughout, as well as doing internal DNS and DHCP. That's what your average Linux sysadmin expects from a Linux server box.
A Linux desktop being used for all manner of things though is completely another story: if I muck around with the Linux install on my laptop, as I do because that's what I do, then I expect to break it from time to time, and so "reliability" is not measured in the same way on a desktop/laptop system, IMHO.
The ideal environment for Linux is as a networked server, where it can get on with doing what it was setup to do, and will continue doing so until someone pulls the power plug on it. In that context, there are few OS's playing on the same field that can rival it for reliability and stability.
The problem lies not with reliability (Score:4, Insightful)
I couldn't tell you the number of times I tried to install something and it fails because I was missing "X-Widget-2.41.so.1", so I try to install that "X-Widget-2.41" package and the "X-Widget-2.41-devel" package and they fail because they are missing several other depends as well.
Linux stability is fine. The GNU software stability is fine. We need a better way to install and maintain software.
LK
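The chain described above (package needs a library, which needs more libraries) is a dependency graph, and working out a safe install order is a depth-first walk over it. A minimal Python sketch, assuming made-up dependency data: only the X-Widget names come from the post, while `install_order`, `some-app`, `libfoo` and `libbar` are hypothetical. This is not how any particular package manager is implemented.

```python
def install_order(package, depends):
    """Return packages in an order where every dependency precedes its user.

    depends maps a package name to the list of packages it needs.
    Raises ValueError if the dependencies form a cycle.
    """
    order, done, in_progress = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle at {name}")
        in_progress.add(name)
        for dep in depends.get(name, []):
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(package)
    return order

# Hypothetical dependency data for illustration only.
depends = {
    "some-app": ["X-Widget-2.41"],
    "X-Widget-2.41": ["libfoo", "libbar"],
    "X-Widget-2.41-devel": ["X-Widget-2.41"],
    "libbar": ["libfoo"],
}
print(install_order("some-app", depends))
```

The ordering is the easy part; the pain in practice is that the dependency data has to exist somewhere and be correct for every package.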
My experience: Linux survives hard drive crash (Score:5, Interesting)
It's been running just fine for a month now with a dead hard drive.
(Yes, I'm getting a replacement because it won't survive an extended power outage on that ancient battery.)
Re:Linux Reliability? (Score:5, Funny)
I'm thinking it doesn't.
Re:Linux 2.4.19-ull-ppc64-SMP (SLES 8 SP 1) (Score:3, Interesting)
Kernel 2.6 (Score:3, Insightful)
Don't knock "yesterday's news". Far be it from some geeks to understand this, but there are times that "tried and true" is more important than having the latest and greatest. This testing started well before 2.6.0 was released! They can probably get started with 2.6 as soon as an enterprise Linux distribution incorporates it.
Re:Kernel 2.6 (Score:2)
They can probably get started with 2.6 as soon as an enterprise Linux distribution incorporates it.
Is it just me, or is that backwards? If SuSE or RedHat wants me to pay $BIGNUMBER for use of their Enterprise distro services, I'd expect the testing to be done before it gets integrated into the distro. In fact, this kind of testing should be part of the doco you get to see before you sign up for the newest Enterprise distro.
Hopefully this is the start of a trend.
Re:Linux 2.4.19-ull-ppc64-SMP (SLES 8 SP 1) (Score:3, Interesting)
Re:Linux 2.4.19-ull-ppc64-SMP (SLES 8 SP 1) (Score:5, Insightful)
I think IBM used SuSE instead of Redhat because IBM Global Services and SuSE have been partners [itworld.com] for almost two years.
Maybe you should stop hmmmmm'ing about these great mysteries and start googling.
Re:Linux 2.4.19-ull-ppc64-SMP (SLES 8 SP 1) (Score:2)
Re:Linux 2.4.19-ull-ppc64-SMP (SLES 8 SP 1) (Score:2, Informative)