Improving Linux Kernel Performance 97
developerWorks writes "The first step in improving Linux performance is quantifying it, but how exactly do you quantify performance for Linux or for comparable systems? In this article, members of the IBM Linux Technology Center share their expertise as they describe how they ran several benchmark tests on the Linux 2.4 and 2.5 kernels late last year. The benchmarks provide coverage for a diverse set of workloads, including Web serving, database, and file serving. In addition, we show the various components of the kernel (disk I/O subsystem, for example) that are stressed by each benchmark."
But how many 3dmarks can I get?? (Score:2, Funny)
The Problems with Benchmarking like this... (Score:5, Insightful)
It sounds interesting, but it looks like the tuning is done specifically on the IBM platform, which makes me wonder. Linux already blows any MS product away for these applications, so I'm curious what they are comparing the results to. Did they just take an arbitrary point (processor load) for specific applications, or are they creating a specialized measurement (like SysMark on Windows) that is only valid in their test suite?
Anyway, it should be interesting to see where it ends up, eventually.
Re:The Problems with Benchmarking like this... (Score:3, Insightful)
Some scientific/mathematical benchmarks would also be good to see.
Re:The Problems with Benchmarking like this... (Score:2, Informative)
You obviously need to look more closely at how Linux scales on 8P machines and larger before making such statements.
Go to http://www.tpc.org and look at the results for Linux and Windows on 16P systems and larger; Linux is non-existent, for a reason.
Re:The Problems with Benchmarking like this... (Score:2, Informative)
Seems like the post was probably more of a troll than an important issue; since the site has 90% of its tests on Windows servers and 2% on Linux, it cannot be taken too seriously.
Please correct me with real tech facts, not just some marketing BS about what Windows might be, but is not.
Re:The Problems with Benchmarking like this... (Score:2, Interesting)
Beyond that, they are not using a unified standard as their monitoring system. All of the Windows machines use COM+, and the non-Windows ones use a variety.
They also say that most of the best Price/Performance machines are running Windows 2000 Server, or
Re:The Problems with Benchmarking like this... (Score:4, Interesting)
That reason would be the cost of the tests and the fact that most linux hackers don't have pockets as deep as billg's.
Re:The Problems with Benchmarking like this... (Score:1)
Re:The Problems with Benchmarking like this... (Score:5, Informative)
According to their website, "Full Members of the TPC participate in all aspects of the TPC's work, including development of benchmark standards and setting strategic direction. Full Membership costs $15,000 per calendar year."
Wow, a large percentage of the benchmarks are using MS operating systems. Oh look, full members get to set benchmark standards. Mmmm, the only pure OS company that is a full member [tpc.org] is Microsoft. I wonder what kind of conclusion can be drawn.
Re:The Problems with Benchmarking like this... (Score:2)
Do it yourself, or,
Trust them, potential interest conflicts and all.
This is the usual story when these "mine's better" discussions arise.
For benchmarks, who has a reputation for
Knowing what they are about, and
Remaining objective?
Re:The Problems with Benchmarking like this... (Score:1, Insightful)
TPC-C is not a perfect benchmark (in fact all cluster numbers or "cluster in a box" numbers should be disregarded or completely separated from "single DB instance" numbers). Still it takes a lot of work to get good numbers on TPC-C. A lot of that work will benefit normal DB users.
MS has good numbers because they did that work.
Oracle also has excellent numbers on Unix and Windows systems. DB2 also.
Oh, and I don't like MS numbers. When scalability or performance is required I'll recommend Oracle or DB2 over SQL Server any day of the week.
But to think that the benchmarks are tailored to MS just because they are members? They are as much tailored to MS as they are to Oracle, HP, IBM, Sun (well, maybe not Sun; you'd need major tailoring to make Sun look good on any bench :-)).
You'll see open-source DB vendors join tpc.org when their software reaches the level of performance needed to show decent numbers on _current_ TPC benchmarks (I'm sure TPC-C will be replaced as it becomes increasingly irrelevant). Until then Op-Src zealots will feel the need to spread FUD about tpc.org.
Re:The Problems with Benchmarking like this... (Score:1)
If Linux scalability beyond 8 processors is really an issue, then I guess the SGI Altix 3700 [sgi.com] is just vaporware.
I suggest that you read the following articles that debunk the myth of the 8-processor barrier:
SGI Busts into Linux with 64-Processor Scalability [linuxplanet.com]
NEC Calls Dibs on Breaking Linux Eight-Processor Limit [linuxplanet.com]
I personally hope that these benchmarks can be run against more recent kernels and a full description of optimizations and patches used disclosed.
Considering that SGI is using a [somewhat] standard 2.4.19 kernel to scale this well, I am certain that the results will be much better.
Re:The Problems with Benchmarking like this... (Score:2)
IMHO most optimization and tuning issues come down to roughly three things: a static component (e.g. RAM used for caching), a variable component (e.g. RAM used for each request), and a 'panic' type component (e.g. extra work needed for requests when running out of RAM). It's typically these types of differences in behaviour and system load which are interesting to compare. Even with a M$ box.
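The three components above can be put into a toy model. This is purely illustrative; all the numbers (RAM size, per-request cost, panic penalty) are made up.

```python
# Toy model of the three tuning components: a static cost, a per-request
# cost, and a 'panic' cost once RAM runs out. All numbers are invented.

RAM_MB = 1024          # hypothetical machine
STATIC_MB = 256        # e.g. RAM reserved for caching
PER_REQUEST_MB = 4     # e.g. RAM used by each in-flight request
PANIC_PENALTY = 10.0   # cost multiplier once we start swapping

def cost(requests):
    """Relative cost of serving `requests` concurrent requests."""
    used = STATIC_MB + PER_REQUEST_MB * requests
    base = requests * 1.0
    if used <= RAM_MB:
        return base
    # 'panic' component: every MB over physical RAM is disproportionately
    # expensive, so cost stops growing linearly and shoots up
    return base + (used - RAM_MB) * PANIC_PENALTY

for n in (50, 150, 250):
    print(n, cost(n))
```

The interesting comparison between systems is exactly where that knee in the curve sits and how steep the panic region is.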
Re:The Problems with Benchmarking like this... (Score:2)
Re:The Problems with Benchmarking like this... (Score:1, Flamebait)
Amen. +5, Insightful.
A useful linux speedup guide (Score:1, Funny)
Some HOWTOs include recompiling the kernel, enabling UDMA, turning off logging, and enabling MMX enhancements.
Usually not necessary (Score:5, Insightful)
The things that are actually useful: disable unnecessary services on startup (if you don't use atd, don't start it, to save start-up time; on many machines it is also unnecessary to run kudzu to detect hardware changes on every boot), and for machines with multiple HDs, put the swap on the faster one.
Re:Usually not necessary (Score:1)
Now, I respect the testing and validation RedHat provides with their kernels, so I use them when I can. Arguably, if I would use more server-oriented hardware it wouldn't be an issue, but my budget is, to put it mildly, modest.
But you're right in the sense that there is probably little to be gained in saving, say, 50KB in your bzImage by cutting out drivers that you don't use, etc. At least I don't see it subjectively; maybe somebody else can volunteer some benchmarks. But I think the attitude that you can really see the difference by recompiling your own kernel for performance is a holdover from the days when the major distros only compiled for i386 and memory was a whole lot tighter.
Re:A useful linux speedup guide (Score:1)
No wonder M$ apps are so slow and bloated...
All their programmers are out pretending to be helping *nix admins. :)
Re:A useful linux speedup guide (Score:2)
"Bollocks to that", I thought, and put Unix back on.
Re:A useful linux speedup guide (Score:1)
Actually finding the performance problem? (Score:5, Interesting)
I was also surprised to see that they still use some of the old performance monitoring tools, like looking at /proc and other ASCII tools, rather than something like PCP [sgi.com] that collects all these statistics together so that you can look at any combination of subsystems on the same time line. Then they could have graphs showing the interaction and load on the disk, CPUs, VM, network, etc.
Re:Actually finding the performance problem? (Score:4, Insightful)
Re:Actually finding the performance problem? (Score:5, Informative)
Some of the issues we have addressed that have resulted in improvements include adding the O(1) scheduler, SMP scalable timer, tunable priority preemption, and soft affinity kernel patches.
contributed kernel improvements (Score:1)
Re:Actually finding the performance problem? (Score:2)
For the list of things IBM has had a hand in lately: there was the above-mentioned O(1) scheduler, lockless PID allocation, faster threads, IRQ load balancing improvements, and the retooling of several drivers' SMP locking. That's just what I can remember without actually going through my kernel archives.
Re:Huh? (Score:4, Informative)
I have to disagree, I thought Figure 3 illustrated how important it is to baseline to ensure that you are heading in the right direction with each change you make (although they did not have a uni-processor baseline result).
It also showed that with the changes in June they were able to get four times the performance with another 7 CPUs. Maybe next time they will show how it scales over the number of CPUs you have.
well duh, the article is about benchmarking (Score:2)
Notice not "IBM share Benchmark testing results"
Call me incredibly stupid, but.. (Score:5, Insightful)
Why is what we compare it to the most important issue?
Sure, we want to see how the Linux kernel is performing, but that's unrelated to increasing its performance - when working on the performance of a single part, people built a test for that part and tweaked it.
No benchmark or comparison is required in this case.
For simplicity's sake (Score:5, Informative)
Of course this applies to something else, like making transfers zero-copy, too.
Re:For simplicity's sake (Score:1)
Re:to improve linux kernel performance (Score:1)
Use the build-in benchmark tool (Score:1, Informative)
[...]
real 6m2.519s
user 5m13.950s
sys 0m20.080s
=> efficiency: 93.6%
(2.4.18,xfs,ide)
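Presumably "efficiency" here means the fraction of wall-clock time the CPU spent doing work, i.e. (user + sys) / real; the poster's exact formula is unclear, so this is just one common definition. A quick sketch that parses time(1)-style fields and computes it:

```python
# Parse time(1) output like "6m2.519s" and compute CPU efficiency,
# defined here as (user + sys) / real. (One plausible definition; the
# parent's exact formula is unknown.)

def seconds(field):
    """Convert a time(1) field like '6m2.519s' to seconds."""
    minutes, secs = field.rstrip("s").split("m")
    return int(minutes) * 60 + float(secs)

real = seconds("6m2.519s")
user = seconds("5m13.950s")
sys_ = seconds("0m20.080s")

print(f"efficiency: {(user + sys_) / real:.1%}")
```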
Re:Use the build-in benchmark tool (Score:2)
user 1m35.010s
sys 0m6.030s
Hah!
Is there deviation? (Score:5, Insightful)
In theory, all benchmarks should come with an average value and an error margin. Without these, the data should not be trusted: not only might the margin of error be over 100%, it indicates that the people running the benchmarks don't know what they are doing.
There are a lot of reasons benchmarks can have errors, one of them being that the benchmark program itself can be broken. How would you know that the numbers returned on some test weren't random if you didn't run it more than once?
Also, disk drives and networks have latencies which can make a huge difference; those differences can wash out the apparent benefits of OS tweaks.
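The fix is cheap: report mean plus-or-minus deviation over several runs instead of a single number. A minimal sketch, with made-up timings:

```python
# Sketch: report a benchmark as mean +/- deviation instead of one number.
# The run count and timings below are invented for illustration.
import statistics

runs = [12.1, 11.8, 12.4, 12.0, 19.7]  # hypothetical timings in seconds

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)  # sample standard deviation

print(f"{mean:.1f}s +/- {stdev:.1f}s over {len(runs)} runs")
# A huge deviation (driven by the 19.7s outlier here) is exactly the
# signal the parent describes: the benchmark or the environment is
# broken, and the single-number result would have been meaningless.
```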
Re:Is there deviation? (Score:2)
--Robert
More attention to IO needed (Score:5, Interesting)
The problem is that typical PC hardware is just not designed for that. Large proprietary Unix or mainframe systems usually have multiple very-high-speed buses; a single 32-bit PCI bus is rather low-end in comparison. Now of course this is not Linux's fault; but then again, Linux is not just a PC operating system! So I guess my question is: if this is about benchmarking Linux for enterprise use, how about some information about Linux running on enterprise-class hardware rather than souped-up PCs? I'm sure IBM must have a few resources there.
In particular I'm interested in how the Linux kernel is designed to handle multiple independent I/O buses. Are the I/O schedulers weighed down with locking issues or interrupt contention? What about the allocation of memory buffers between faster and slower I/O devices, or its support for the advisory I/O operations (hinting) that some proprietary OSes provide? What about asynchronous I/O?
And of course Linux suffers from the general Unix philosophy when it comes to giving I/O the same level of attention as the CPU. For instance, there are lots of processor use controls: process nice levels, processor affinities, real-time schedulers, threading options galore, etc. But how do you say that a given process may only use 30% of the I/O bandwidth on a particular bus? Those are things that mainframes were good at, so how does Linux on mainframes compare?
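The kind of knob the parent is asking for is usually built from a token bucket: refill bandwidth budget at a fixed rate, and make the consumer wait when the budget runs dry. A user-space sketch of the idea only; no kernel of the time exposed this knob, and the rates below are hypothetical:

```python
# Illustrative token-bucket bandwidth cap: the mechanism behind
# "this consumer may only use 30% of the bus". Pure user-space model.
import time

class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst):
        self.rate = rate_bytes_per_s   # sustained budget refill rate
        self.capacity = burst          # max budget we can save up
        self.tokens = burst
        self.last = time.monotonic()

    def consume(self, nbytes):
        """Block until nbytes of bandwidth budget is available."""
        while True:
            now = time.monotonic()
            # refill budget for the elapsed time, capped at capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# e.g. cap one writer at 30% of a hypothetical 100 MB/s bus:
bucket = TokenBucket(rate_bytes_per_s=30_000_000, burst=1_000_000)
```

A caller would invoke `bucket.consume(len(buf))` before each write; anything over the sustained rate ends up sleeping instead of hogging the bus.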
Re:More attention to IO needed (Score:5, Informative)
By running multiple kernels. Seriously: the way to get great performance out of PC hardware is to buy lots of it and cluster it. You still end up paying less for more performance than with the high end systems.
Re:More attention to IO needed (Score:1)
Re:More attention to IO needed (Score:2)
Some open source equivalent would sure be nice. But even something homegrown for particular applications isn't too hard; usually, you can find an obvious field pretty easily to distribute and balance database content to different servers by.
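That "obvious field" approach is just hashing a key to pick a server. A minimal sketch; the server names and key format are invented:

```python
# Minimal homegrown sharding of the kind described above: route each
# record to a server by hashing an obvious field (e.g. a customer ID).
# Server names are hypothetical.
import hashlib

SERVERS = ["db1", "db2", "db3", "db4"]

def shard_for(key):
    """Deterministically pick a server for a given key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# The same key always lands on the same server, so both reads and
# writes for one customer hit one box:
assert shard_for("customer-1234") == shard_for("customer-1234")
```

The usual caveat applies: adding a server reshuffles nearly every key with plain modulo hashing, which is why fancier schemes exist, but for a fixed farm this is often enough.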
Re:More attention to IO needed (Score:5, Informative)
A single 32-bit PCI bus is anemic these days. That's why high-end servers based on IA-32 processors include multiple PCI buses, increasingly PCI-X (133MHz, 64-bit). Note that servers based on other processors are increasingly moving away from proprietary buses and using the same PCI you'll find in those Intel-based systems.
Can Linux become Mozilla? (Score:2, Interesting)
Re:Can Linux become Mozilla? (Score:1)
Re:Can Linux become Mozilla? (Score:3, Interesting)
I'd say that's a rather strange conclusion. The only thing the 'C:' means there is a non-graphical shell to the OS.
My cellphone has something called Explorer, very similar to the M$ one. You can browse some kind of filesystem with it. Does that mean it's running Windows? Does that mean there is a disk in it? Download Cygwin and then Windows can come up with '/root/ $:'.
Re:Can Linux become Mozilla? (Score:3, Informative)
ostiguy
Re:Can Linux become Mozilla? (Score:2, Interesting)
Not to say that featuritis isn't a threat. But ironically, the very "disadvantage" of Linux, its monolithic design that microkernel hackers love to bash, is making it pretty hard to add new features willy-nilly. If we were using the HURD, the kernel would be 900 megs by now... (and Emacs would be a kernel module)
Re:Can Linux become Mozilla? (Score:2, Funny)
Since when is having a choice from 900 megs of kernelmodules bad?
Not THAT bloated. (Score:2, Interesting)
Mozilla uses C++ (and most methods are virtual) and component interfaces like XPCOM. Such things probably enhance developers' productivity, but they incur quite a bit of overhead in code size and (less so) in speed.
It is great that core developers actually care about code size and instruction-level speed (such as the recent syscall patch, or those highly optimized inline functions in headers), and there are many people sending patches to clean up code. Maybe linux won't get as bloated as mozilla after all...
Benchmark junkies (Score:1, Funny)
I am one of them.
Please, mooore!!!
Measurement - Lord Kelvin said it best (Score:4, Insightful)
Re:Measurement - Lord Kelvin said it best (Score:1)
but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind
This kind of knowledge is called Enlightenment.
Sub kernel? (Score:3, Interesting)
Is this a bad idea? Would it take too many hours of extra work?
-JB
Benchmarking for interactivity (Score:5, Interesting)
On a related note, my Mac Powerbook was really sluggish until I managed to kill some unneeded processes; they weren't really eating up time by themselves, but were somehow impacting system reactivity: The load factor hardly moved but the system became responsive to mouse clicks.
Something like the contest benchmark? (Score:3, Informative)
Still based on kernel compiles, granted, but at least it tries to measure responsiveness. Been used heavily to benchmark recent kernels - check Kernel Trap [kerneltrap.org] for results.
The Linux scheduling latency [zip.com.au] page of Andrew Morton might be useful as well. Alas, kernel patches tend to work on x86 first before PPC..
I/O vs CPU & Modularised Linux (Score:5, Interesting)
It is a pity that Linux developers, like Unix developers before them, have become a little stuck in their ways - hopefully they will do their best to address this in the 2.6 and 3.0 kernels.
I like the idea of a modularised kernel, where people could use the I/O system that best suited their setup - but this could involve an awful lot of division and argument, and the number of bugs that would result could be huge. Perhaps Linux itself could adapt the way it works automatically to suit its needs, solving the problem of Linux's hugely varying performance. Does anyone else have any suggestions or comments on this?
Re:I/O vs CPU & Modularised Linux (Score:1, Interesting)
mod_specweb (Score:1)
Uhmm... isn't this considered cheating?
source code for the patch [apache.org]
The fastest kernel configuration is... (Score:1, Funny)
Not very scientific, not very informative (Score:1)
Interesting ideas about performance profiling (Score:3, Interesting)
S-Check [nist.gov]
S-Check starts with your original source code and points suspected of being bottlenecks. It adds artificial delays at the specific points throughout the parallel code. These delays can be switched ON or OFF. The switched delays generate numerous new versions of the program, with the delays simulating adjustments in code efficiency. S-Check methodically executes the many variants, recording delay settings and corresponding run times. S-Check analyzes the recorded entries against a linear response model using techniques from statistics. The results are a sensitivity analysis from which program problem areas can be identified. This provides a portable, scalable, and generic basis for assaying parallel and network based programs.
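A toy version of the S-Check idea quoted above, with the "program" simulated so the runs are deterministic: toggle an artificial delay at each suspected bottleneck, time the runs, and see which delay the total run time is most sensitive to. All the call counts and costs are invented:

```python
# Toy S-Check-style sensitivity analysis. The instrumented points, their
# per-call cost, and their call counts are all made up for illustration.

# Hypothetical instrumented points and how often each executes per run.
CALLS = {"parse": 100, "sort": 10, "write": 1000}

def run_time(delays):
    """Simulated run time with the given artificial delays switched in.

    Each point costs a baseline 1ms per call, plus any injected delay.
    """
    return sum(count * (0.001 + delays.get(name, 0.0))
               for name, count in CALLS.items())

# S-Check's core move: compare a delayed run against the baseline for
# each point. The point whose delay hurts the most dominates run time.
baseline = run_time({})
for name in CALLS:
    slowed = run_time({name: 0.001})
    print(f"{name}: +{slowed - baseline:.3f}s when delayed")
```

With these numbers, delaying "write" costs far more than delaying "parse" or "sort", so the sensitivity analysis correctly fingers the hot spot without ever needing an accurate profiler.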
Paradyn [wisc.edu]
(overview) [wisc.edu]
"...a heuristic, goal-seeking algorithm was coupled with a dynamic instrumentation package to drive an automated, systematic inquiry into the performance of a parallel application."
The upshot is tools which can instrument a running system on the fly, and use statistical techniques that identify "hot spots" by looking for the amount of "collateral damage" when adding artificial delays to a particular location. You can even go farther, mapping out relationships, etc.
These are approaches that came out of parallel supercomputing, because in that field traditional approaches to benchmarking and profiling are often useless and/or impractical, and the systems (and programming problems) have become so complex that effective hand tuning becomes nearly impossible as well. Of course the kernel isn't so simple either, and these days you have parallelism to boot... I would love to see these techniques solving a wider range of problems.
With apologies to "Friends" (Score:1)
Cheers
Stor
It is very interesting who wrote the story (Score:1)
Two from IBM and one from AMD. It seems AMD is looking at Intel boxes? "The architecture used for the majority of this work is IA-32 (in other words, x86), from one to eight processors. We also study the issues associated with future use of non-uniform memory access (NUMA) IA-32 and NUMA IA-64 architectures."
Hmm, I am sure the next Hammers could do NUMA; maybe they are trying to do it better in Linux.
Spelling (Score:1)
Desktop performance... (Score:2)
I have a crap all-in-one mobo with shared-memory graphics and no DRI support (OK, I needed a PC quick). KDE is super clunky under 2.4, even with the CK performance patchset.
Under 2.5 the desktop is quick and smooth, applications seem to load a lot faster, and Java applets don't hog the CPU.
So, if you're running Linux on the desktop and you feel sufficiently competent, start testing 2.5.