Is Profiling Useless in Today's World?
rngadam writes "gprof doesn't work on Linux multithreaded programs without a workaround that doesn't work all that well. It seems that if you want to use profiling, you have to look for alternatives or agree with RedHat's Ulrich Drepper that "gprof is useless in today's world"... Is profiling useless? How do you profile your programs? Is the lack of good profiling tools under Linux leading us into a world of bloated applications and killing Linux adoption by embedded developers? Or will the adoption of a LinuxThreads replacement solve our problems?"
Profiling Again? (Score:4, Funny)
Down with profiling!
Profiling is Useful (Score:5, Insightful)
Saying "profiling isn't useful" is similar to saying "having information isn't useful".
That's just dumb.
Re:Profiling is Useful (Score:5, Informative)
Of course, for many applications, multi-threading achieves the vast majority of the speed increase, and profiling will only be of marginal utility. The profiler is just one tool of many, and is not a silver bullet.
Re:Profiling is Useful (Score:3, Funny)
Ulrich Drepper (Score:2, Insightful)
Yeah, mod me down, but I have insight into the things Ulrich does, and he mostly does sh*t. Just my 2 cents (USD or EUR, you decide).
Re:Ulrich Drepper (Score:2)
I tracked down a bug in __fsetlocking and he was most helpful in fixing glibc.
Pan
Re:Ulrich Drepper (Score:2)
OProfile (Score:5, Informative)
OProfile is a system-wide profiler for Linux x86 systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.
It consists of a kernel module and a daemon for collecting sample data, and several post-profiling tools for turning data into information.
OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications (the only exception being the oprofile interrupt handler itself).
OProfile + Prospect (Score:4, Informative)
No, take a look at FunctionCheck (Score:2, Informative)
Five bucks says that this server is slashdot'ed within the hour, so you may have more success with the less descriptive SourceForge project page [sourceforge.net], which indicates that the project is not dead, despite what the homepage says.
I discovered this program when I was optimizing some code I wrote to multiply sparse matrices. By the time I had gotten it 100x faster than the initial code, gprof had lost all semblance of granularity and was giving me obviously bogus results. The problem is that such things as cache performance (i.e. optimizing for cache hits) were now heavily affecting the profile and gprof could not figure such things out. FunctionCheck [univ-lyon1.fr] works much better than gprof and actually generates accurate profile information under high-stress situations.
From the homepage (all grammatical errors theirs):
"I created FunctionCheck [univ-lyon1.fr] because the well known profiler gprof have some limitations:
My approach is simple: I add (small) treatments at each enter and exit of all the functions of the profiled program. It allows me to compute many information:
Try it out and please contribute some source code.
tsprof: process profiling on Linux/x86 (Score:2, Informative)
'pstack' on Solaris (Score:2, Informative)
Running 'pstack' against a process ID under Solaris will give the execution stack trace of any threads present.
If you find that 80% of your threads are in slow_function(someParam), then ya better get to work fixing it. This also has the added advantage of not slowing down your program with profiling code and other hooks.
Obviously this isn't great for fine-grained profiling, or for applications with few threads, but I've found it helpful on my larger projects.
Re:'pstack' on Solaris (Score:3, Informative)
Anyway, we ran the equivalent of pstack at frequent intervals (like once per millisecond) and collected the addresses of all functions in the call tree present each time we polled the system. Got a humongous file. Then we postprocessed the file to record which functions called which other functions, and how often, and looked up the addresses in the symbol table to give usable names.
It turns out that polling the system like that usually gives all the important information you could want - it tends to show not the most-called functions but the heaviest users of the processor, because they are much more likely to be running when the pstack happens; the number of times a function appears is statistically proportional to the total time it runs for. And the technique is minimally invasive and doesn't require recompilation of the code under test.
Then we printed the summary out in a huge printout, each function sorted by the percentage of ticks spent in it, and then spent a week or two staring at it. It showed some amazing features, like certain functions the program was spending an order of magnitude longer in than originally designed, that kind of thing.
It is really quite a useful technique.
Re:'pstack' on Solaris (Score:2)
Hell, yes it's useful (Score:3, Insightful)
What could be more useful is if compiler implementors spent as much time on the profiler as on the compiler: you would then be able to easily see the faulty parts of your software and determine what needs to be optimized.
Good profilers would mean efficient code. Don't think profilers are useless just because most implementations of them suck.
Re:Hell, yes it's useful (Score:4, Insightful)
Better yet: Optimization from profiler feedback (Score:2)
What could be more useful is if compiler implementors spent as much time on the profiler as on the compiler: you would then be able to easily see the faulty parts of your software and determine what needs to be optimized.
Better yet, if an architecture has a static branch predictor that encodes "mostly taken" or "mostly not taken", the compiler could emit profile code that measures how fast a particular variant runs and then take that into account for the next optimization pass.
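GCC actually exposes both halves of that idea already: you can hand-annotate a branch with __builtin_expect(), or collect real counts with -fprofile-arcs and recompile with -fbranch-probabilities so the compiler lays the code out from measured behavior. A minimal sketch of the hand-annotated form (the function and its error path are invented for illustration):

    #include <stddef.h>

    /* Common kernel-style wrappers around GCC's branch hint. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int parse_packet(const unsigned char *buf, size_t len)
    {
        if (unlikely(buf == NULL || len == 0))  /* cold error path */
            return -1;
        return buf[0];                          /* hot path stays straight-line */
    }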
Dead on linux? What about windows? (Score:2)
VTune and Quantify (Score:4, Informative)
If you want a flat profiler or need to analyze the cost of specific low level operations then you MUST get Intel VTune.
Profiling will always be useful (Score:5, Informative)
But even if you aren't doing something that is speed-intensive like games, you always have tradeoffs when you choose your data structures and algorithms. Generally you first code up the easiest algorithm that you think will use an acceptable amount of memory and CPU time. Then, later, if something is too slow, you have to identify where the problem is. It could be that you chose an O(N^2) algorithm not realizing that N might be 1,000 instead of the max of 100 you were counting on, forcing you to switch to an O(N log N) algorithm that is more complex.
Now, if it is a small application, you might have enough familiarity with the code to be able to guess where the problem is -- then you fix it and see if it is still slow. If that works, then you're set and profiling isn't necessary. But if the fix doesn't speed it up enough, then you're stuck. You have to profile it somehow.
You might try simple tricks like changing the code to loop over a suspected bit of code 100 times and seeing how much longer it takes. Or maybe throw in some printf's that spit out the current time at different points. Or maybe create your own profiling code that you manually call in functions you want to time. Or you might use an actual profiler, without modifications to the code. But lacking a profiler doesn't mean you can't or won't profile your code.
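For the "create your own profiling code" option, a minimal sketch in C (all names invented; resolution is whatever gettimeofday() gives you):

    #include <stdio.h>
    #include <sys/time.h>

    static double now_seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void suspect_function(void)     /* stand-in for the code under test */
    {
        volatile double x = 0;
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;
    }

    int main(void)
    {
        double t0 = now_seconds();
        for (int i = 0; i < 100; i++)      /* the "loop it 100 times" trick */
            suspect_function();
        double t1 = now_seconds();
        fprintf(stderr, "100 calls took %.6f s\n", t1 - t0);
        return 0;
    }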
And even with CPU speed doubling every couple of years or so, that doesn't mean speed is no longer an issue. You can easily choose the wrong algorithm and have something take 1000s of times longer to run than the proper algorithm.
I used gprof (Score:3, Informative)
This program was parallelised at the network level - all clients were single-threaded. If a program has been multithreaded for performance (to utilise more than one CPU), I suppose gprof will still work well on a single-CPU machine with just one thread.
For programs that consume lots of CPU time on well-defined computations, it should not be hard to profile a single-threaded version (a single-threaded version is needed for debugging anyway).
More complex applications (for example a web browser), I imagine, are more dependent on multi-threading, and should pose a larger problem.
gprof is probably not dead - if you need it, you can adapt the program...
Good point there. (Score:2)
Isn't that what Open Source is all about?
Programmers, not tools (Score:4, Insightful)
Those of us who started programming in 1K and sub-megahertz machines can really feel the time taken by badly coded applications. We know that forgetting what is happening on the silicon can kill how well our code will run.
However, those who started coding after ~1987 don't really have a gut feeling for it. To them, the latest processor will make up for their bad coding. To a certain extent they are right: today's advances STILL keep up with Moore's law, still make up for their lack of skill. However, when one looks at what is actually accomplished with all that power, one tends to question why we are paying so much, for so little.
Can you actually say that MS Word XP is much better than the non-WYSIWYG word processors of yesteryear (themselves a blast from the past)?
We don't need profilers; we need coders who have that tacit knowledge of what really counts, of where they should put real effort.
Unfortunately that doesn't come in a software box.
Performance is not what really counts. (Score:2)
It's folks like you who are the reason people still write their SSH daemons in C, and why we live in a mixed up world where we have neither stability NOR speed!
Yes it does! (Score:2)
But he didn't say that... He said that programmers should know where to invest the effort, and take an interest in creating efficient code. That means, first and foremost, exactly what you just said: you have to be smart about your DS&As, aware of what you're writing and not pointlessly lazy when coding. It doesn't mean, and wasn't claimed to mean, that you have to micro-optimise everything at the assembler level.
Re:Programmers, not tools (Score:3, Insightful)
I used WordPerfect 5.0 (or whatever it was) on a dual 360K 5.25" floppy disk drive machine. Plain blue text screen only. I have to say, I *much* prefer Word XP. If given the choice, I would not go back to those crappy DOS days.
By all means, be sentimental and reminisce about the old days. But things have changed - accept it.
Re:Programmers, not tools (Score:2)
Not to be a troll, but I see a lot of programmers with this kind of attitude - "let the compiler catch mistakes", or "code it fast and use a debugger," etc... What invariably happens is that programmers who learn to code this way spend their careers writing code which is neither efficient nor easy to maintain. Worse, they waste a lot of time in the debugger that they could have avoided had they thoroughly planned their code.
I used to just blitz through the code without really planning what I was going to do. While this worked well for small projects, when I got into the professional world my debugging time went up by an order of magnitude. I've since found that I actually get code done faster if I think it through and plan it out before I start writing; if I scramble off and start coding without a plan, I end up living in the debugger. But then again, YMMV.
No, he's right (Score:4, Insightful)
Why are these mutually exclusive? There's efficient and there's optimised, and one is a much easier subset of the other.
He's not claiming that everyone should hand-optimise from the word go. He's saying programmers should have a basic knowledge of their craft. It doesn't take much extra effort to use an efficient sorting algorithm or store data in a fast look-up structure, rather than writing a naff, hand-crafted shuffle sort and using arrays for everything whether they're appropriate or not. And yet, through ignorance or plain laziness, most programmers in most languages take the latter approach. (If you've never seen any of the source code for big name applications/OSes, trust me, it's scary.)
Similarly, it is just careless to pass large structures by value unnecessarily in a language that has reference semantics. You have to know the basics of what is efficient use of your tools of choice if you want to write good code, and the old Moore's Law excuse is just a cover for laziness and failure to do the requisite amount of homework.
Note that, very importantly, none of these things requires more than a small effort. They certainly don't compromise maintainability, bug count or any other relevant metrics, and a competent programmer (if you can find one) will take these things in his stride, and still be faster than the others.
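To make the pass-by-value point concrete, a sketch in C (the struct and functions are invented):

    #include <stdio.h>

    struct record {
        char   name[256];
        double samples[1000];   /* roughly 8 KB in total */
    };

    /* Careless: every call copies the whole struct onto the stack. */
    static double mean_by_value(struct record r)
    {
        double sum = 0.0;
        for (int i = 0; i < 1000; i++)
            sum += r.samples[i];
        return sum / 1000;
    }

    /* No harder to write, and it copies one pointer instead: */
    static double mean_by_ref(const struct record *r)
    {
        double sum = 0.0;
        for (int i = 0; i < 1000; i++)
            sum += r->samples[i];
        return sum / 1000;
    }

    int main(void)
    {
        static struct record r = { "demo", { 0 } };
        printf("%f %f\n", mean_by_value(r), mean_by_ref(&r));
        return 0;
    }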
Interesting... We have just acquired a new P4/2.2GHz with 512MB RAM and running WinXP as a development machine at work. You know what? It's way, way slower than the 1.4GHz P4 running 2000 we already had. And that in turn is way slower than the 1GHz P3 running NT4. This is not subjective, it is based on obvious, objective measures. For example, my new machine (the fastest of the above) sometimes takes 3-4 minutes to act on an OK'd dialog in Control Panel. The NT4 box reacts instantly when you configure the equivalent options. Something is wrong at this point, and I'm betting it's a combination of code bloat and feature creep.
Re:Programmers, not tools (Score:2)
Thanks for making my point for me.
Quality is designed IN; taking bugs OUT is an admission that you really didn't pay enough attention at the beginning. Sure, you get the odd typo, but the real bugs are the ones in the logic of what you're writing - and you often don't catch all of those.
If you are thinking about what is actually happening, rather than just pasting in a bit of gash code, you are much more likely to create something with quality engineered IN. Trust me, it's the only way it's going to get there.
As for the 'speed it up after the event' crowd - did you ever think that if you used the right approaches, the right concepts, from the start, you wouldn't have to spend the time tweaking some supposed critical element at the end? It should be second nature IF you really understand what you are doing. Sure, there are always the games, device drivers etc., but I'm talking about the day-to-day code that gets executed every day by millions of people around the world. It generally takes no more effort to use the right technique than the wrong one - if you know the difference.
Have a little pride in your work, man! You might find that your 'good enough, let's stuff it out the door' mentality is why you don't move forward, and why your company goes to the wall as a result of a buggy product.
Re:Programmers, not tools (Score:2)
Hell, yes. WYSIWYG is very useful.
However, if you ask if WordXP is much better than Word for Windows 2.0, well, that's a much harder question to answer in the affirmative.
For me, anyway.
Tim
Re:Programmers, not tools (Score:2)
I seem to spend time every day helping out someone who is trying to fight Word into doing what THEY want, rather than what it wants to do. This is not you or me, the people who can just pick it up and use it; it's the non-expert, the majority. They simply find today's word processors no great advance over the text-mode word processors of yesteryear.
The reason? Back then they KNEW what it was doing; they could SEE the control codes, and delete them if they were wrong. Sure, it couldn't tell you your grammar was wrong, but it never really fought against you either.
If you look back to Word 2.0 and compare it against today, you can see certain elements that you can think of as advances. But you can't really see many, and it's certainly not an order of magnitude better. But we do have a whole load of attendant junk. Basically, we're going backward again.
If we are going to go in the direction of a 'smart' word processor, then I want a truly smart one. Something that means I do less work and produce a much better result. I don't want something with a level of complexity that means I'm forever fighting it while doing the actual job - the one of transferring knowledge from my head to someone else's with the minimum of time and effort.
Re:Programmers, not tools (Score:2)
But Word is smart; it has IntelliSense Technology(TM)(R)(C). That's how it knows that when I type "6 July 2002" at the top of my letter, I really mean "6 July 2002-07-06". Come on, it's obvious... ;-)
"Better" UI depends on your perspective (Score:2)
That depends on your point of view. Personally, I write lots of technical documents, where every other word (ish) isn't in the dictionary. That "better interface" makes my screen unreadable, since it's littered with red. On top of that, I usually spell correctly in the first place, and look words up in a dictionary as I go along if I'm not sure. Spell checkers rarely have to correct genuine mistakes in my documents. So personally, I'd much rather see that feature done away with and have the performance back, rather than waiting for Word to catch up as I type, as I had to ten years ago. If it's useful for others, by all means have it as an option, but don't call it "better" in a blanket statement.
working code, not pipe dreams (Score:3, Interesting)
listen to your profiler. everything else lies.
Re:Programmers, not tools (Score:2)
Considering my experience dates back to Wordstar, I can answer yes to that question. Of course, if you can show me a better way to do tables in Wordstar 1.0 for CP/M besides using | and - characters, it'd be greatly appreciated.
Ah, hokey (Score:2)
Bah yourself. Who's this Knuth guy, and what the hell does he know about efficient programming, anyway?
Re:Programmers, not tools (Score:2)
1k? Luxury! I had 512 bytes, for both program and data, come home and Dad would beat us around the head and neck with a broken bottle, if we were LUCKY!
</Monty Python>
You're so right (Score:2)
I couldn't agree more. Sadly, the fact that almost everyone replying to your post thinks it is advocating premature optimisation at the level of assembly-level tweaks makes your point all too well.
Re:Programmers, not tools (Score:2)
Yes.
I've had firsthand experience with two non-WYSIWYG word processors, Wordstar 2000 and WordPerfect 5.1. The first one was clunky, klutzy, and there was no way to tell what the darn thing was going to look like without printing it out. The second suffered from the "external blind manual" syndrome, to the point where it was necessary to memorize commands just in case the "secret cheat sheet" for the F* keys was missing.
'course, I'd love it if the @!$@!#%ing thing actually worked faster... but at least it gets em-dashes right.
Not useless (Score:5, Insightful)
Remember the words of Knuth: "Premature optimization is the root of all evil." Without profiling, you don't know what optimization is really needed and what isn't.
That said...
BEGIN RANT
I've used gprof successfully with plenty of recent code. It works perfectly fine in non-threaded code, which _should_ be the majority (99%+) of code out there. Yes, that includes big network servers (the last one I wrote just recently passed the 6 billion requests served mark without blinking). Threads are a really nasty programming rathole that should be applied in a limited way; they take much of the time and effort spent developing protected-memory OSes and toss it out the window. They also tend to encourage highly synchronized execution instead of decoupled execution, which often makes things slower, more bug-prone (locking issues are _tough_ to get right once they go beyond one level), and slower to implement than a well-designed multiprocess solution with an appropriate I/O paradigm. Just because two popular platforms (Windows and Java) make good non-threaded programming difficult doesn't mean you should cave in.
END RANT
Re:Not useless (Score:2)
WTF? How does Java make it hard to write non-threaded programs? If anything Java makes it easy to START writing threaded programs. When all the details start hitting you, you realize that it's trickier than it looks.
-jon
Re:Not useless (Score:2)
No fork(). No multiplexed I/O. Try writing a good scalable network server in a single thread without the moral equivalent of select().
Java 1.4 recognized that and added I/O multiplexing. There's still no good multiprocess (but not multithreaded) framework, though, and I/O multiplexing only solves a limited subset of cases.
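For anyone who hasn't seen it, the "moral equivalent of select()" looks like this in C - a single-threaded echo server handling many connections at once (port number invented, error handling mostly elided):

    #include <netinet/in.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(7777);
        bind(listen_fd, (struct sockaddr *)&addr, sizeof addr);
        listen(listen_fd, 16);

        fd_set all, ready;
        FD_ZERO(&all);
        FD_SET(listen_fd, &all);
        int maxfd = listen_fd;

        for (;;) {                            /* one thread, many sockets */
            ready = all;
            if (select(maxfd + 1, &ready, NULL, NULL, NULL) < 0)
                continue;                     /* e.g. EINTR */
            for (int fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &ready))
                    continue;
                if (fd == listen_fd) {        /* new connection */
                    int c = accept(listen_fd, NULL, NULL);
                    if (c >= 0) {
                        FD_SET(c, &all);
                        if (c > maxfd)
                            maxfd = c;
                    }
                } else {                      /* client data: echo it back */
                    char buf[4096];
                    ssize_t n = read(fd, buf, sizeof buf);
                    if (n <= 0) {
                        close(fd);
                        FD_CLR(fd, &all);
                    } else {
                        write(fd, buf, n);
                    }
                }
            }
        }
    }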
Sumner
Re:Not useless (Score:2)
So threads are evil -- now what? (Score:2, Insightful)
But processes as provided by current operating systems are too expensive to use. If I have a network server (e.g. an httpd) that has to create a process for each network request, it will never scale. In theory all that has to happen is that inetd (or equivalent) fork/execs and does the necessary plumbing so that the ends of the socket are STDIN and STDOUT. Then the process just reads and writes as necessary to fulfil the request. In practice, this just doesn't work.
That's why you can't use CGI for high-volume transactions. So let's make the server a single multithreaded daemon process instead [aolserver.com], where each request is handled by a thread. Now you can handle each request much faster, but you lose the protected address space the OS gives you with a process.
Obviously, the OS needs to change and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs. I don't see anybody making suggestions as to ways to move forward. Anybody know of research in this area?
Re:So threads are evil -- now what? (Score:5, Insightful)
Okay.
But processes as provided by current operating systems are too expensive to use.
No, they aren't. Have you measured fork() speeds under Linux vs. pthread_create() speeds? Sure, Windows and Solaris blow at process creation (and Windows doesn't have a reasonable fork() alternative--it conflates fork() and exec() into CreateProcess*()), but that doesn't make all OSes brain-dead.
If I have a network server (e.g. a httpd) that has to create a process for each network request, it will never scale.
Right. And if you create a new thread for each network request, you'll never scale--give it a try some time. Good servers that use a thread/process for every connection do so with pre-fork()'d/pre-pthread_create()'d/whatever pools. Apache, for instance, uses multiple processes (but no multithreading, except in some builds of 2.x) but pre-forks a pool of them. This is really basic stuff, even an introductory threading book will talk about pooling and other server designs.
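The pre-fork() shape is simple enough to sketch in a few lines of C (pool size and the trivial "work" are invented; assumes listen_fd is already bound and listening):

    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NWORKERS 8                        /* tuning knob, value illustrative */

    void prefork_server(int listen_fd)
    {
        for (int i = 0; i < NWORKERS; i++) {
            if (fork() == 0) {                /* worker: serve connections forever */
                for (;;) {
                    int c = accept(listen_fd, NULL, NULL);
                    if (c < 0)
                        continue;
                    write(c, "hello\n", 6);   /* stand-in for real work */
                    close(c);
                }
            }
        }
        for (;;)
            wait(NULL);                       /* parent reaps; real servers re-fork */
    }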
Really scalable/fast implementations don't even do that. They use just one process (or one per CPU) and multiplex the I/O with something like select, poll, queued realtime signals (Linux), I/O completion ports (NT), and so on.
Obviously, the OS needs to change and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs
http://www-124.ibm.com/pthreads/ proposes an M:N threading model and offers an implementation, but it still has the shared-memory problems of threads. Multiprocessing may not be sexy, but it's really a lot cleaner for most problems and can be more efficient in a lot of domains.
Sumner
Re:So threads are evil -- now what? (Score:2)
Umm, fork() is the one that's braindead. Who the hell dreamed up a system where creating a new process would copy the entire state of an existing one, only to have it wiped out when the new process did an exec()? fork() requires all sorts of nasty stuff (like copy-on-write in the VM) that is ditched if the OS follows a process/thread model. Windows might be braindead, but CreateProcess() makes a hell of a lot more sense than fork().
Re:So threads are evil -- now what? (Score:3, Insightful)
Uh, COW isn't ditched in a process/thread model. Shared libraries would suck without it. Demand paging of executables wouldn't work without it. It's a fundamentally good thing used by Unix, MacOS X, Windows, and almost all other modern OSes which support protected memory. Definitely not "nasty stuff", and by itself it eliminates 99% of the fork() overhead vs. threads.
You really want to be able to create a new process with the same state as the existing one, and fork/exec allows that. There's system() if you want an entirely new executable (which might call fork()/exec() or might call spawn(), vfork()/exec(), or whatever...). I don't feel like arguing over whether a spawn()/CreateProcess*()-style syscall is good, but not having a fork()-style syscall is simply braindead. There are things you can do with fork()/exec() that you can't do with spawn() or CreateProcess*(); the reverse isn't true.
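To illustrate: between fork() and exec() the child can rearrange its own world with ordinary code - redirect descriptors, chdir, drop privileges - where a one-shot spawn()-style call has to anticipate every such need with flags and option structs. All names below are invented:

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    pid_t run_with_log(const char *prog, const char *logfile)
    {
        pid_t pid = fork();
        if (pid == 0) {                        /* child */
            int fd = open(logfile, O_WRONLY | O_CREAT | O_APPEND, 0644);
            if (fd >= 0) {
                dup2(fd, STDOUT_FILENO);       /* stdout/stderr -> logfile */
                dup2(fd, STDERR_FILENO);
                close(fd);
            }
            chdir("/");                        /* arbitrary per-child setup */
            execlp(prog, prog, (char *)NULL);
            _exit(127);                        /* exec failed */
        }
        return pid;                            /* parent continues */
    }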
Sumner
Re:So threads are evil -- now what? (Score:2)
So at the expense of a little complexity in the VM system for copy-on-write, you get a much simpler, cleaner interface for the programmer building stuff on top of the OS. (Not to mention the fact that you can fork() without exec'ing, and in many situations that turns out to be a much more convenient and safe approach to concurrency than using threads.) Sounds like a good tradeoff to me. We're not talking about much complexity either; students routinely do this stuff in OS course project assignments.
I think understanding why fork()/exec() is better than CreateProcess() is an excellent lesson on how to design a good interface.
Re:So threads are evil -- now what? (Score:3, Interesting)
No problem.
I guess the key point I want people to remember (if I only clear one thing up...) is that a decision about whether to use threads or processes should be based on whether they want all (or mostly) shared memory, in which case threads are in order, or some protected memory (and possibly some shared) in which case processes are the way to go.
Windows has hoodwinked people into thinking threads are fast and processes are slow (and that processes have to start new executables), when that's really not the interesting detail and isn't really very true under well-designed operating systems. And you lose a lot by giving up protected memory (even giving it up only with respect to the other threads in your memory space).
Sumner
Re:So threads are evil -- now what? (Score:2)
The simple reason is that the OS can optimize context switches to avoid switching page tables, and the resulting cache and TLB flushes.
Sure, it's not likely to be more than a 5-10% speedup on Linux, but when you're groping for those last few TPS, it matters.
Can you name a single real-world application where using threads instead of processes on Linux speeds it up even 1%, let alone 5%?
Certainly if it does exist it's not an efficient application.
(Sure, other OSes can't context switch to save their lives and force you to use incorrect abstractions because of it--I don't much care)
I'm not saying to never use threads, but the decision to use threads vs. processes should be based on whether you want/need your memory to be shared (with all the problems that introduces as well as the convenience) or not, not on any perceived performance problems. Or, as Alan puts it, "threads are processes that share more". That's the way to think of them. And good modular programming remembers to share only what is absolutely necessary--keep your data hidden when possible.
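On Linux, Alan's line is close to literal: clone() creates both, and the flags decide how much is shared. A sketch (stack size and flag choice are illustrative):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    #define STACK_SIZE (64 * 1024)

    static int shared_counter = 0;

    static int worker(void *arg)
    {
        (void)arg;
        shared_counter++;        /* visible to the parent only with CLONE_VM */
        return 0;
    }

    int main(void)
    {
        char *stack = malloc(STACK_SIZE);

        /* With CLONE_VM | CLONE_FS | CLONE_FILES this is (roughly) a thread;
         * drop the flags and the same call behaves much like fork(). */
        pid_t pid = clone(worker, stack + STACK_SIZE,
                          CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, NULL);
        waitpid(pid, NULL, 0);
        printf("shared_counter = %d\n", shared_counter);   /* prints 1 */
        free(stack);
        return 0;
    }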
Sumner
write event driven programs; threads for CPU work (Score:4, Informative)
The basic mentality in switching from threads to event programming is this: anytime you're using a thread solely so that it can sit around and block on high-latency events (network or disk I/O) for most of its lifetime, it should not be a thread.
It's acceptable to have worker threads/processes that you hand computational tasks to, and that trigger an event in your event loop when they hand a result back, but don't use threads of execution to manage your state. You'll pull your hair out and still have a nonfunctional program.
Re:Not useless (Score:2)
When I'm running a graphical program, the UI must not lock up, no matter what processing is going on in the background. I don't care how you solve that problem, but a simple use of threads is one of the simplest methods.
Re:Not useless (Score:2)
Not really. It seems simple until you get into the details. Yes, for some things multi-threaded is the way to go. But a multi-process solution is usually easier to implement and more stable, and a straight asynchronous single-threaded state-machine solution is often the best (in terms of ease of implementation and performance). Remember, the difference between threads and processes is that processes have protected memory while threads share all memory. The number of cases where you really don't want most of your memory protected is very small, especially when you remember that processes can easily get shared memory segments for select pieces of memory. Most people choose threads because they think threads are better/faster/smaller than processes (which is true on some broken OSes but not meaningfully true on Linux) rather than based on whether or not they want most memory shared.
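"Shared memory segments for select pieces of memory" can be as small as one page - a sketch using anonymous mmap() across fork() on Linux (System V shmget() would also do; names invented):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Only this one mapping is shared; all other memory stays protected. */
        long *counter = mmap(NULL, sizeof *counter,
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        *counter = 0;

        if (fork() == 0) {               /* child writes through the shared page */
            (*counter)++;
            _exit(0);
        }
        wait(NULL);
        printf("counter = %ld\n", *counter);   /* prints 1 */
        return 0;
    }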
Sumner
Re:Not useless (Score:2)
Not so. Have you ever compared the time between a thread-switch and a process-switch? The only difference on Linux is changing the MMU context. Yet, on x86, changing the MMU context is the slowest operation you can perform.
Re:Not useless (Score:2)
Yep. And for most applications, it's not meaningful. If you spend all your time context switching, you're definitely not efficiently designed whether you use threads or processes--you can definitely measure the overhead in that case, but when you go to a situation where you're synchronizing on anything (mutexes, sockets, whatever) the difference essentially disappears. And even in the measurable situation, the difference isn't huge--about 2 usecs on my home machine, on a total overhead of 4 usecs for threads vs. 6 usecs for processes. Sure, that's 50% SLOWER!!!! Horror!!! In the real world it generally doesn't matter, and it's small enough that if context-switch overhead is hurting your multiprocess app, switching to multithreading won't really help.
Both are so fast that if you thought about your design at all they won't even be a blip on the radar, unlike on some OSes where switching processes can take 100s of usecs vs. 10 usecs for a thread switch.
There are exceptions, which is why I didn't say that threads are always bad. But the performance argument here is almost always specious, brought up by people who learned about threaded programming on other platforms where it is a huge win and used to defend a poor design choice (look, I can measure the difference in contrived situation X even though it has no effect on system performance).
Sumner
Re:Not useless (Score:2)
Re:Not useless (Score:3, Interesting)
(The quote in question is:
"A computer is a state machine. Threads are for people who can't program state machines." Alan Cox)
Except I'd assert that threads are far harder to program correctly than state machines. They're an easier concept at first, and it's easy to come up with a design for the 90% solution, but the devil's in the details and threads have a ton of details. Not to say that state machines don't, but they seem to cause fewer problems in practice.
Sumner
Linux pthreads breaks lots of things (Score:2)
ACE has the answer (Score:3, Informative)
And remember, in the immortal words of Michael Abrash, "Assume Nothing. Measure the improvements. If you don't measure, you're just guessing."
Not useless, just different (Score:2, Insightful)
There's also a continuing trend of software developers spending users' computing power to make their own jobs easier: Java, J2EE, C#...
Some people think that the wasted processing power is a crime. Me, I think it's just economics. It's much cheaper to pay for processing power than it is to pay for developers to squeeze every last bit of performance out of an app.
However, there are some applications where profiling is absolutely required. Database engines, games, simulations - anything that is CPU-bound has the potential of benefiting from profiling.
Quantify! (Score:3, Insightful)
Faced with a similar problem in Linux, I'd probably port the program to Solaris, Quantify it there, and hope the results are similar under Linux.
Plenty of options for Java (Score:2, Informative)
I use it on threaded code ... (Score:2)
Of course this worked because, from gprof's point of view, I was running in one kernel thread - apart from that, OProfile rocks.
short answer yes, the long answer no (Score:2, Insightful)
On the other hand, profilers are very good at indicating, if the code is well designed, that out of 10K lines of code, these three functions of 10 lines each eat up 80% of the time. A sufficiently clever programmer will focus on those areas for analysis. If the code is not good, the profiler will be unlikely to reduce the problem domain. If the programmer is not good, the information will not be so useful.
Wrt the multithreading issue, I find most problems occur in two cases. First, as in debugging, the programmer does not begin with sufficiently simple conditions. Often one cannot debug the whole application at once; likewise, profiling an entire application in multithreaded mode may not be the proper approach. Second, the function to be profiled may not be properly designed to allow a useful profile. Multithreaded applications are often best when they are made up of simple, single-purpose functions. These are easier to debug, and easier to profile.
No (Score:2)
Sample-based profiling (Score:3, Interesting)
This can be done in about 40 lines of code. All you need is to set up an alarm timer and install a signal handler for it that spits out the current program counter to a file. After the run is finished, filter the PC values through addr2line and voila. If you want to get really fancy, make it walk the stack via the ebp register (on x86) and you can build yourself a call stack.
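A minimal sketch of that sampler in C, assuming x86-64 Linux (the post describes 32-bit x86, where you'd read REG_EIP instead of REG_RIP; file name and sampling rate are invented):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/time.h>
    #include <ucontext.h>
    #include <unistd.h>

    static int sample_fd;

    /* Record the interrupted PC as one hex line; write() is async-signal-safe. */
    static void on_prof(int sig, siginfo_t *si, void *uctx)
    {
        (void)sig; (void)si;
        unsigned long pc = ((ucontext_t *)uctx)->uc_mcontext.gregs[REG_RIP];
        char buf[19] = "0x";
        for (int i = 0; i < 16; i++)
            buf[2 + i] = "0123456789abcdef"[(pc >> (60 - 4 * i)) & 0xf];
        buf[18] = '\n';
        write(sample_fd, buf, sizeof buf);
    }

    int main(void)
    {
        sample_fd = open("samples.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_prof;
        sa.sa_flags = SA_SIGINFO | SA_RESTART;
        sigaction(SIGPROF, &sa, NULL);

        struct itimerval it = { { 0, 1000 }, { 0, 1000 } };  /* ~1000 Hz */
        setitimer(ITIMER_PROF, &it, NULL);

        volatile double x = 0;                 /* workload to be sampled */
        for (long i = 0; i < 200000000L; i++)
            x += i;
        return 0;
    }

Afterwards, "addr2line -e ./a.out < samples.txt" turns the addresses into file:line pairs, as described above.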
Unit Profiling (Score:2)
Profiling should be performed at the unit-test level, and not on full-blown applications.
For the most part, this approach avoids hassles with threading and processes, and has worked effectively for me on multiprocessor clusters.
[PATCH] fix gprof bug with large c++ programs (Score:3, Informative)
Another gprof problem: it chokes after 65534 symbols. This makes it hard to profile large C++ programs.
I think gprof is still useful. Ulrich is just being cranky. The workaround for the multithreaded support works pretty well...
Oh no... (Score:2)
Then, the idea was to write in a high-level language, but always be careful about performance.
Then, the idea was to develop apps quickly, then profile to optimize the important parts.
Now, screw optimization, let the user buy more hardware!
I think this attitude sucks. Even my 1.5GHz Athlon XP is slower running KDE 3.x (or any version of GNOME, for that matter) than my old 300MHz PII was running Win98. And it doesn't do a hell of a lot of stuff that my old machine couldn't. I switched to Linux and took the performance hit because I hated Microsoft. I keep upgrading KDE (and my hardware) because the latest apps only work on the latest version. I don't expect more complex software to get faster, but I'd expect that as I upgrade my hardware, software should stay relatively the same speed. Yet it seems as if software is getting slower more quickly than system bottlenecks (specifically RAM and hard-drive speed) can keep up. That means the end-user experience is deteriorating, even as users pump more money into their hardware to get usable performance.
KDE vs. Win98 performance (Score:2)
Thanks for that information. I'm about to upgrade my trusty PII/350 running Win98 to a nice, new top-of-the-range custom-built beastie. Well, it's been four years, and it was my birthday last week. :-)
I'd been considering installing Linux as an alternative to MS stuff, since I now object enough to the nature of Microsoft's attitudes to make the effort to switch. In the light of your information, I think I'll just install Win2K instead.
Re:KDE vs. Win98 performance (Score:2)
You can use whatever you like, not just the latest KDE or whatever.
(Oh, and don't bother upgrading your hardware; I am writing this on a Celeron 266MHz with 64 megs of RAM, and it is quite fast with KDE, Mozilla and Netscape running.)
I always find it funny... (Score:2, Insightful)
what's the problem? (Score:4, Interesting)
Now, compute intensive code tends not to spend a lot of time in system calls, so it isn't clear that it matters whether a profiler counts time spent in system calls. I kind of prefer if it doesn't because it doesn't clutter up the profile with I/O delays (which are usually unavoidable).
If you want to find out where your code is spending time in system calls, you can use "strace -c".
There are also gcov-like tools that can be used for profiling via code insertion (as opposed to statistical profiling like gprof), although I'm not sure whether PC hardware has the necessary timer support.
Overall, the answer is: yes, profiling still matters for programs that push the limits of the machine. But fewer programs do. I think most people would be a lot better off not programming in C or C++ at all and not worrying about performance. Too much worry about "efficiency" often results in code that is not only buggy but also quite inefficient: tricks that are fine for optimizing a few inner loops wreak havoc with performance when applied throughout a program. Too much tuning of low-level stuff also causes people to miss opportunities for better data structure and program logic. This is actually an endemic problem in the industry that affects almost all big C/C++ software systems. Desktop software, major servers, and even major parts of the kernel should simply not be written in C/C++ anymore.
The thing with profiling and optimization is to know when to stop, and few people know that. So, maybe the best thing to say is: "no, profiling doesn't matter anymore". That will keep most people out of trouble, and the few that still need to profile will figure it out themselves.
Re:what's the problem? (Score:2)
I used to use gprof on Suns, which by your definition is "working".
In a working profiler, time spent waiting for I/O doesn't show up because it doesn't take CPU cycles to wait.
Oh? Could have fooled me. I have had plenty of CPU-intensive I/O. But in any case, when I do look at system calls, of course I want to know time waiting. It is of no interest to me whether the process is slow because the CPU is spinning or because it's waiting for a disk block.
Thread synchronization is expensive. I had a multithreaded server app that spent 15% of its time just in the POSIX mutex functions.
I don't get it: do you or don't you want to see time spent waiting? Waiting for a mutex may well be just--waiting.
Re:what's the problem? (Score:2)
This will cost me karma... (Score:2, Insightful)
The problem you are complaining about is that profilers can't handle threaded programs. Don't write threaded programs, and the problem is solved.
Frankly, I've always considered threading useful for only a few situations:
o When you have an SMP system, and you need to scale your application to multiple CPUs so that you can throw hardware at the problem instead of solving it the right way
o When you have programmers who can't write finite state automata, because they don't understand computer science, and should really be asking "Would you like fries with that?" somewhere, instead of cranking out code
o When your OS doesn't support async I/O, and you need to interleave your I/O in order to achieve better virtual concurrency
Other than those situations, threads don't make a lot of sense: you have all this extra context-switching overhead, and you have all sorts of other problems -- like an inability to reasonably profile the code with a statistical profiler.
OK... Whew! Boy do I feel better! 8-).
Statistically examining the PC, unless it's done on a per-thread basis, is just a waste of time in threaded programs.
If you want to solve the profiling problem for threaded programs, then you need to go to non-statistical profiling. This requires compiler support: the compiler needs to call a profile_enter and a profile_exit for each function, with the thread ID as one of the arguments. This lets you create an arc list per thread ID and deal with the profiling separately, as if you had written the threads as separate programs. It also catches inter-thread stalls.
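GCC has had this exact hook mechanism for a while: compile the code under test with -finstrument-functions and it emits a call to an enter hook and an exit hook around every function. A sketch of per-thread arc logging built on it (the log format is invented; a real tool would aggregate in memory rather than fprintf):

    #include <pthread.h>
    #include <stdio.h>

    /* Keep GCC from instrumenting the hooks themselves. */
    void __cyg_profile_func_enter(void *fn, void *site)
        __attribute__((no_instrument_function));
    void __cyg_profile_func_exit(void *fn, void *site)
        __attribute__((no_instrument_function));

    void __cyg_profile_func_enter(void *fn, void *site)
    {
        /* One arc record per call, tagged with the thread ID so a
         * post-processor can build a separate profile per thread. */
        fprintf(stderr, "enter %p <- %p tid %lu\n",
                fn, site, (unsigned long)pthread_self());
    }

    void __cyg_profile_func_exit(void *fn, void *site)
    {
        fprintf(stderr, "exit  %p <- %p tid %lu\n",
                fn, site, (unsigned long)pthread_self());
    }

    /* Build: gcc -finstrument-functions prog.c hooks.c -lpthread */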
-- Terry
Re:This will cost me karma... (Score:2)
"Solving it the right way"? If you know how to solve the travelling salesman problem, or chess, or simulate the world's weather without throwing hardware at the problem, you really ought to publish it for the good of mankind.
threads don't make a lot of sense
Some problems are conceptually parallel; it's almost always easiest to write a procedure in a way that mirrors the way it's conceptualized.
you have all this extra context switching overhead,
So your multitasking system does 1001 context switches a millisecond rather than 1000. Woo hoo.
Re:This will cost me karma... (Score:3, Insightful)
In that case... fork and use IPC. It's not substantially more expensive, and you won't have to ensure your parallel code is thread-safe.
Re:This will cost me karma... (Score:2)
But then you're forced to serialize and deserialize all the data you need to share.
Re:This will cost me karma... (Score:2)
I have a formal background in CS, I'm well aware of how to use FSAs, and I'm a professional software developer, and yet I disagree with this argument. One thing I've learned is that if a tool is available and its purpose matches your need, it's generally a better solution to use that tool than to reinvent the wheel.
I've worked on several multithreaded systems, some small scale, some enormous. While it would theoretically have been possible to rewrite the multithreaded code as a FSA, it would surely have led to a maintenance handicap and an increased bug count, in exchange for -- possibly -- a tiny increase in performance, and even that is not guaranteed by any means. Why spend hours writing a multithreading system of my own when there's a tried and tested one already there for me to use?
Threading... (Score:2)
Threading is useful when you have an application that needs to scale with SMP and which you cannot, for whatever reason, fork. But the accompanying pain of being forced to pay extremely close attention and mutex-lock the code all over makes it not worth it in most situations.
Use fork. Use other IPC methods if necessary. But don't thread, or you'll spend an order of magnitude more time debugging.
Here's how I profile my code.... (Score:3, Funny)
Me: Really? Which part?
User: When I click the "report" icon
Me: Oh (tinkers with report code). Try it now.
User: It's still slow
Me: (shakes BOFH excuse 8-ball) Hrmm, must be interference from sunspots, try it again tomorrow
Re:I don't know... (Score:3, Informative)
You could argue that with good up front design, you'll know in advance what 10% of the code to focus on, but I don't think that works that well in practice. At best, you're making educated guesses about where bottlenecks will appear, and you'll be wrong some of the time -- requiring profiling at that point.
And lacking tools doesn't mean you can't or won't profile -- it just means you'll have to do more work to profile the code.
Re:I don't know... (Score:5, Insightful)
And a lot of smart people, from Knuth and Kernighan to Linus and Guido, will freely admit that predicting what to optimize is nearly impossible. Even people at that level of programming prowess are often surprised by where the bottlenecks appear (and where they don't appear). You certainly want to design for flexible optimization from the start, but you'll often discover that the stupid O(n) scan you put in is good enough for now and that you better optimize the I/O system before you think about replacing it with a tree or hash table or whatever.
Sumner
Re:I don't know... (Score:3, Insightful)
Wrong. You design your code as a compromise between factors such as speed, maintenance, reusability, readability, and, most importantly, the resources you are allowed to expend.
If speed is a critical factor, then you might try to do some predictive profiling using existing principles to make sure the code is fast. Otherwise, you write the best damn code you can, which generally means using good practices to ensure that you don't waste time, and then profile it. Profiling works best if the code is written in such a way (read: a lot of reusable functions) that allows simple optimization.
BTW, the biggest wrinkle in this is that programmer time has become more valuable than clock cycles. We will now waste some clock cycles to save programmer time, which is why profiling is not nearly as important as it used to be.
If the code is not written well, and has to be rewritten when the profiler says it sucks, then you wasted your time.
Re:I don't know... (Score:2)
I agree completely that good design and good coding practices will save time when it comes time for profiling and optimizations.
Re:I don't know... (Score:2)
Anyhow, what's the definition of optimize?
Re:gprof far from useless (Score:2)
Re:gprof far from useless (Score:4, Insightful)
I would disagree with this wholeheartedly. What about databases like Oracle, MS SQL Server, and so on? They're internally multithreaded, and most definitely not "interactive" after you initiate a SQL query.
I believe Apache 2.0 is threaded. HTTP by nature is not interactive. And so on. There are many other examples, left as an exercise for the reader.
While it is true that threads are very useful for interactive programs, in fact critical, their use does not stop there by a longshot. Any program which needs to do two things at once without fear of blocking on a system call is a candidate for threads. Threads are also useful for distributing compute cycles over multiple processors within a single process, allowing it to gain the benefit of concurrency.
The project I'm currently working on is a custom database application, and without threads it would be useless. And there are no users talking to it directly, that's for sure.
reducing the amount of input required from the user will always pay off better than any optimizations.
I find this perplexing. Nobody cares about optimizing a user dialog. Reducing user input or optimization of user input code would serve little purpose in most multithreaded applications I'm aware of. Generally, interactive multithreaded programs use threads so they can interact with users while simultaneously performing some other task that shouldn't be stalled by waiting for user input. For example, a network monitor might have three threads: one for watching network traffic, one for resolving IP addresses to hostnames, and one for taking user input. It doesn't matter how long the user input thread sits around waiting for the user to type/click something. There are two other threads working away in the meantime, watching traffic and displaying it for the user, oblivious to whether or not the user is doing anything. In such a case as this, profiling the watcher/resolver threads might be very useful indeed, since they need to be more or less realtime.
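A skeleton of that three-thread layout in C (thread bodies are stubs; all names invented):

    #include <pthread.h>

    static void *watch_traffic(void *arg) { (void)arg; return NULL; }  /* sniff and queue packets */
    static void *resolve_names(void *arg) { (void)arg; return NULL; }  /* IP -> hostname lookups  */
    static void *handle_input(void *arg)  { (void)arg; return NULL; }  /* block on user input     */

    int main(void)
    {
        pthread_t watcher, resolver, ui;
        pthread_create(&watcher, NULL, watch_traffic, NULL);
        pthread_create(&resolver, NULL, resolve_names, NULL);
        pthread_create(&ui, NULL, handle_input, NULL);
        pthread_join(ui, NULL);    /* watcher/resolver work on, oblivious to the user */
        return 0;
    }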
This gprof problem is a serious issue, and minimizing it by saying that threaded programs generally wouldn't benefit from profiling is naive.
Re:gprof far from useless (Score:2)
Wouldn't this sentence be fun taken out of context? Seriously, though, I think what the original poster was getting at was that a lot of powerful interactive programs (3D modelers, for example) can really make the user cry if they run computations and UI code in the same thread. In those cases, splitting the calculation code off into a separate thread and giving it a lower priority than the UI thread ensures that the user interface stays responsive, no matter what's going on in the background.
Re:There is no question that profiling is necessar (Score:5, Insightful)
That's hardly true. Certainly you shouldn't waste time optimizing code until you know where the bottlenecks are. But in a lot of cases--I'd even venture to say most cases--code gets written and is fast enough. In such cases, profiling is a waste of time. Profiling is only indicated if there's a legitimate performance problem.
To a lesser extent, the same is true of unit testing and integration testing. If you're writing some code to convert one image to a GIF and you run it successfully to get the GIF, there's no reason to unit test. Even if the code has horrible bugs on some inputs, the job is done. One-off code isn't (unfortunately) uncommon. Prototype code is also very common, and often you don't need to do extensive testing on it, either. Any code where the total cost of code failure is lower than the cost of QA probably doesn't need to be QA'd (which is not to say that you should spend an amount on QA equal to the failure cost; if spending $1000 on QA reduces the chance of failure by 99.999% and spending $1000000 reduces it by 99.9999%, the $1000 expenditure suffices in all but the most demanding applications).
Sumner
Re:There is no question that profiling is necessar (Score:2, Insightful)
Of course "one off, disposable code" doesn't need the same degree of "analness" applied to it as does mission critical code.
However, "fast enough" is a really bad metric to use. Yes, utility "X" is fast enough. But oh, I didn't realize it was going to be used in conjunction with utility "Y" and "Z". Now, everything is really slow. Hey, can you say Microsoft?
Fortune telling is not part of any programming job description I've ever seen.
Re:There is no question that profiling is necessar (Score:3, Interesting)
Hey, I need this report on my desk every morning. It takes 3 hours to run. Let's kick it off every night at midnight.
Fast enough, even though a well-coded, well-designed implementation might take seconds to run. And mission critical. No point wasting programmer time speeding it up when we can do another project with big upside instead.
This sort of thing is not uncommon at all.
Sumner
Re:There is no question that profiling is necessar (Score:2)
Re:There is no question that profiling is necessar (Score:2)
Then you profile and optimize, because it's not "fast enough" any more.
Is that hard to understand?
Sumner
Re:There is no question that profiling is necessar (Score:2)
It's worth a quick overview of the profile, to determine how long it would take to optimize said report.
I speak from painful experience - a job I once worked at ran overnight DB jobs on its Oracle database. Nobody bothered checking the efficiency of their SQL until the jobs that had accrued grew to take more than 8 hours in total, and were still running when users came in for the morning.
Then, with a scant four days of programmer time, the jobs got pared back to three hours, AND some bugs got fixed. If they'd done that a few months earlier, we would have avoided four months of pain and anguish from users coming in, trying to use the system, and screaming bloody murder because it still wasn't available for them at 7:30 AM.
Re:There is no question that profiling is necessar (Score:3, Interesting)
Right. It's obviously a cost/benefit tradeoff. If you start the report at midnight and need it at 8:00 in the morning, then if it takes 15 minutes to run you probably don't even want to think about profiling. If it takes 7 hours, it's still fast enough for now but you may want to concern yourself with whether it'll always be fast enough. What's the cutoff? 1 hour? 4 hours? Depends on how crucial the report is and what other projects are on your plate at the moment.
Obviously "performance problem" is tough to quantify in general, but I still contend that you should normally only profile if there is a potential performance problem (or if you have idle resources, etc). Otherwise, go do some QA. Work on a new project. Clean up the nasty hack you wrote late at night to get it going. Write some documentation. Whatever.
Sumner
Re:There is no question that profiling is necessar (Score:2)
Um, but... I think there's a confusion of context occurring. The situation you describe happens when you're writing little chunks of one-off code to perform one task and be done with it. Usually it'll be used once, or is part of a stopgap "until there's a real solution." If you're producing a product - if an entity external to your workplace is paying money for what you're producing - then your code isn't good without testing; and if you've got some spare cycles going, profiling isn't too bad either. Something for a Malicious Coder to do when he's bored of adding bugs.
I'd even argue you have the same moral obligation to produce the same level of quality (in terms of well tested and possibly profiled code) if entities outside your workplace will use your software. Just because it was free doesn't mean it should suck.
Re:There is no question that profiling is necessar (Score:3)
With testing, that's generally right. If something's going to run often, it can potentially fail a lot of times, and so even a small cost of failure will be compounded to the point where QA is worthwhile.
With performance, that's often not true. There are a lot of jobs that don't need anything approaching "good" performance; batch reports are one extremely common example (as is other batch processing): I need a web usage report on my desk/in my inbox every morning, the quick-and-dirty multipass solution that takes 3 hours to run can be scheduled at midnight, and the programmer can then do another project with big ROI instead of spending time writing a faster solution that runs in seconds. Many applications fall into that domain, many of them absolutely mission critical and responsible for millions in revenue, but also not worth spending time optimizing when that time could be better spent testing, adding features, or working on another project entirely.
And many (I'd say most) interactive applications are fast enough from the get-go and never need optimization. Sure, there are some apps that do a lot of computation (mp3 players, games, compilers, etc), or are run many times at once (web servers), or are too slow when first run for unknown reasons. But a lot of programs are fine from the start, and profiling them is a waste.
Sumner
Re:There is no question that profiling is necessar (Score:2)
I agree that profiling isn't always necessary, and that sometimes profiling and optimization won't reap any advantage, but I think the range between not necessary and useless is wide, and the advantage from profiling in that range is subtle but real.
Additionally, profiling can serve other purposes. It's been suggested that, under a unit testing regime, a coder new to a project can serve as a "Malicious Coder," whose job it is to add bugs to code to catch out situations the unit tests miss. The advantage is that this can improve the testing as well as bringing up a new team member quickly. Profiling/optimization tasks can serve a similar purpose. By giving a direction to code investigation, it speeds the acquisition of familiarity with the code.