Running 100,000 Parallel Threads
An anonymous reader writes "This story explains how the latest Linux development kernel is now able to start and stop over 100,000 threads in parallel in only 2 seconds (about 14 minutes 58 seconds faster than with earlier Linux kernels)! Much of this impressive work is thanks to Ingo Molnar, author of the O(1) scheduler recently merged with the 2.5 Linux development kernel."
Hold this thread while I walk away (Score:3, Funny)
100,000 Linux threads (Score:5, Funny)
Re:100,000 Linux threads (Score:2, Funny)
Re:100,000 Linux threads (Score:2)
Is that red splatter on the ground the remains of Bill Gates?
Win ME Kicks that sorry statistic!!!! (Score:4, Funny)
Re:Win ME Kicks that sorry statistic!!!! (Score:4, Funny)
I'm only a humble C programmer, but.... (Score:4, Interesting)
Re:I'm only a humble C programmer, but.... (Score:5, Funny)
Re:I'm only a humble C programmer, but.... (Score:3, Insightful)
Re:I'm only a humble C programmer, but.... (Score:5, Informative)
Re:I'm only a humble C programmer, but.... (Score:5, Informative)
I'm not so sure about that.
A threaded model doesn't necessarily offer advantages -- Apache's multiprocess model is really just as good on platforms without serious performance penalties on fork(), and Boa (which neither forks nor threads) is much, much faster than either Apache mode (though of course on SMP systems multiple instances must be run to use all the available CPUs).
Indeed, unless SMP is being taken advantage of, a well-written single-threaded application will always be faster than an equivalent multithreaded application. Such an application has less overhead and is able to jump between its "subprocesses" only when needed -- and without the latencies involved by letting the OS handle said scheduling. Back in the Real World, I still write threaded code -- but because writing unthreaded code (in the problem spaces where threads are useful) is harder, not because it's faster.
Re:I'm only a humble C programmer, but.... (Score:2)
Re:I'm only a humble C programmer, but.... (Score:3, Insightful)
That said, though, sharing (and putting locks around) your DB connections or script interpreters is an easy way to lose performance and introduce potential deadlocks (or other hard-to-track, hard-to-reproduce bugs due to bad shared state) as opposed to having each process able to operate completely independently from the others. Shared state is a Good Thing when it's genuinely needed -- but should be avoided when it's not.
I'm not saying -- and I've never tried to say -- that threading is worthless; I just object to people who take the position that making an application multithreaded will necessarily make it faster.
Re:I'm only a humble C programmer, but.... (Score:3, Insightful)
I have written a dynamic content server that over the past 2 years has served over 6 billion requests, with 5 9's of uptime. I've written several realtime instrument control applications. I've written a distributed text mining application that does index-assisted regex searches of 1/2 terabyte of data in Threads can really be life savers when used correctly. Sure you have to implement locking but that's what pthread_mutex is for.
On low-mem devices making full copies of the process to spawn copies is just insane.
1) Look up COW and memory sharing.
2) I never said "use only processes". A combination of processes and event loops is the way to go 99% of the time. There are some corner cases where threads are useful, but they tend to be abused by people who think "threads are good" without considering the alternatives nor the ramifications of that choice.
And on Windows the thread implementation is *intentional*, not accidental. The idea is that people using threads will take advantage of the speed increase.
It's not a speed increase. Thread switching and thread creation on Windows are slower than process creation and process switching on Linux. On a par, but slower. Process creation on Windows is laughably slow, though, and process switching is substantially slower than thread switching.
It's not that Windows figured out how to make their threads go fast, it's that their processes were dog-slow and they had to create an entirely separate execution primitive to get any sort of reasonable concurrency. Linux did things the right way by making them both fast, and now allows you to choose between the two for _design_ reasons (do I want to share memory?) rather than artificial implementation reasons.
You'll find a lot of knowledgeable people (Larry McVoy, former SGI kernel architect) who echo the same belief: use threads sparingly. Use as many threads as you have CPUs, and use processes instead if that makes more sense. Use more threads than that only if you're intimately familiar with the alternatives and know why they don't work, because while a state machine with non-blocking I/O may seem hard at first glance it'll almost certainly turn out to be easier to implement correctly, easier to debug, faster, and easier to maintain.
Sumner
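As a rough illustration of the "state machine with non-blocking I/O" approach argued for above, here is a minimal sketch in Python. It is not code from any poster or from the kernel work; the function name run_echo_loop and the use of in-process socket pairs as stand-ins for client connections are assumptions made to keep the example self-contained. One thread services every connection and only touches the ones the OS reports as ready.

```python
import selectors
import socket


def run_echo_loop(pairs):
    """Serve every connection from one thread; return the replies the clients see."""
    sel = selectors.DefaultSelector()
    clients = []
    for _ in range(pairs):
        # socketpair() stands in for an accepted network connection.
        client, server = socket.socketpair()
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)
        clients.append(client)

    # Each hypothetical client sends one short request.
    for i, client in enumerate(clients):
        client.sendall(f"req{i}".encode())

    handled = 0
    while handled < pairs:
        # Block until at least one connection is readable, then service
        # only the ready ones; no thread is parked per connection.
        for key, _events in sel.select():
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(data.upper())
            sel.unregister(conn)
            conn.close()
            handled += 1

    replies = [c.recv(1024).decode() for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies


if __name__ == "__main__":
    print(run_echo_loop(3))
```

The trade-off discussed in the thread shows up directly: there is no locking at all, but any long computation inside the loop would stall every other connection.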
Re:I'm only a humble C programmer, but.... (Score:2)
My question is why does the multithreading in Mozilla suck so badly on Linux and will this help it?
Re:I'm only a humble C programmer, but.... (Score:2)
Well, for one thing, you're now going to have to start typing a helluva lot faster. The machine is not going to slow you down.
In truth, this is great news for those running servers but you probably won't notice much of a difference on a desktop, barring a few really thread heavy applications. UML (User Mode Linux) is one notorious example.
Re:I'm only a humble C programmer, but.... (Score:2)
If you want to destroy my boxen. . . (Score:3, Funny)
OK I'll shut up now.
boxen. . . (Score:2, Troll)
Re: boxen. . . (Score:2)
Parallelism (Score:5, Interesting)
Re:Parallelism (Score:2, Interesting)
Many algorithms work great for one extra processor but fail miserably with more.
In most cases, you can just busy wait on a semaphore with two CPUs and never notice the hit. 8, 32 or 512 CPUs and you're going to throw away most of your processing time.
Re:Parallelism (Score:2)
You are encouraged to read the list for yourself because it's early in the morning and my brain might be playing tricks on me.
--Knots;
Re:Parallelism (Score:2)
The p4 xeons can.
Great news! (Score:2, Funny)
Re:Great news! (Score:2)
Now THAT would be impressive.
Wow gotta try this out (Score:2, Funny)
Help with DoS? (Score:2)
Re:Help with DoS? (Score:5, Informative)
How the hell did this get +2?
a) most server software, apache 2 aside, doesn't use threads;
b) most DoS attacks are based on bandwidth saturation rather than local resource starvation. As a result, it doesn't help.
BS. This guy asks an innocent question, certainly not worth 2 antagonistic responses. Besides, Apache is a very important application, plus I know other server software that uses threads.
The DoS attacks that are based on bandwidth saturation are typically called DDoS attacks instead. I would say that most DoS attacks are in fact based on crashing the server, which is clearly not based on bandwidth saturation. Plus, there are important attacks that rely on resource saturation, such as the TCP SYN attack. Stopping DoS is not a one step process; you have to deal with it at every stage of the game.
-a
Re:Help with DoS? (Score:2)
Sounds cool, but all I could think of... (Score:5, Funny)
Sorry, had to
Re:Sounds cool, but all I could think of... (Score:5, Funny)
"Hello, my name is Ingo Molnar. You kill -9 my parent process. Prepare to vi."
NOOO!!!!! (Score:3, Funny)
Re:NOOO!!!!! (Score:2)
One of our final projects was to implement our own shell. This would of course necessitate a fork() command... he hadn't checked conditions quite right and managed to use up all the resources for his account. Fortunately someone had set the Ultrix (Unix on VAX) system up with a little intelligence. He only bombed his own account and had to get the Prof. to go in and kill the out of control Shell.
I on the other hand merely got half-baked tokenizing. Great teacher (pity they disbanded the Comp-Sci department around us).
Re:NOOO!!!!! (Score:5, Funny)
I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of run-on sentences.
I am further replying pre-emptively to dissuade the AC's who would otherwise reply to me and point out my egregious abuse of +1 bonus.
I am further replying pre-emptively to dissuade the AC's who would mod this post down as off-topic because they do not get the parallel allusion to fork-bombing.
Re:NOOO!!!!! (Score:5, Funny)
Not to mention.... (Score:2)
Re:NOOO!!!!! (Score:3, Funny)
The fun part of that was when the system operators saw the processes replicating like crazy and started to kill them, that made it worse.
Another fun trick with that machine was to set up a circularly-linked list and invoke the LLLU (linked list lookup) instruction on it...
(Yeah, stupid things to do. At least I only did them during relatively quiet times.)
How *I* got kicked out of the computer lab (Score:3, Funny)
prompt "Enter Password:"
No one could figure out that all I did was change the prompt from "$P$G" to that, and everyone was asking what the password was. Haha, good old teacher was infinitely frustrated as well! IT WAS BEAUTIFUL.
I got kicked out for a year (not beautiful).
Other similar mischief... (Score:3, Funny)
Imagine the poor victim vainly clicking on the buttons, and getting more and more worried. Said victim actually rebooted the machine to see it reappear, and was not happy when he started to notice the sniggering bunch behind him...
For example pic:
http://www.adobe.com/support/techguides/ope
Probably want to replace CCmail with Explorer or something more dear to heart
I also installed a bluescreen STOP screensaver on April Fool's day on a colleague's PC. Heh, he was shocked enough to actually call another colleague over and make the usual worried mumbles.
http://www.sysinternals.com/ntw2k/freeware/blue
Since I had admin privs, I was also tempted to have ad.doubleclick.net and similar dns names to resolve to a private webserver which served out custom banner ads.
Wonder how users would take it if they see the "Staff Meeting at 2pm banner ad". Or "Company Slogan here". Or "Big boss is watching you!". Or for search result sensitive ads: "Stop downloading mp3s/movies/porn!"
I could actually justify that as a useful application. It's probably more useful than a doubleclick ad...
But I'd probably need the 100K parallel thread kernel to serve up all those ad banners
Bwahaha!
Link.
Windows (Score:3, Interesting)
Re:Windows (Score:2, Troll)
Or they could just blatantly pay some other company that does "independent testing" *cough*mindcraft*cough* to lie about it :)
Re:Windows (Score:2, Insightful)
C//
Re:Windows (Score:2)
you can address 3GB with the
You can address significantly more with AWE/PAE, but I don't know that you can use that additional memory for thread stacks.
Just FYI, yesterday I had SQL Server 2k running with 1914 threads (in AWE mode).
Re:Windows comparison (Score:3, Interesting)
Max I can create in a process is 2031 threads... That being done in 700ms.
It's odd cause I can create more if I run several processes. It doesn't look like the kernel is choking on thread creation...
will investigate more.
Re:Windows comparison (Score:4, Informative)
C//
Re:Windows comparison (Score:2)
C//
Re:Windows comparison (Score:3, Informative)
This is not true; the kernel stack is two pages in size, i.e. 8KB on i386.
Also, in 2.5 (where these tests were done), the task_struct is no longer allocated on the stack. It is allocated off the slab cache, while the thread_info struct is on the stack. The task_struct slab object is another ~1.7KB per task.
Finally, I do not know what the pthreads default stack size is (user-space? what is that?) but it is certainly larger than one page.
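To put the per-thread figures quoted above into perspective, here is a back-of-the-envelope calculation in Python. The 8 KB kernel stack and the roughly 1.7 KB task_struct come from the comment; rounding the latter to 1740 bytes is an assumption, and user-space stacks are deliberately left out because their size is not given here.

```python
KERNEL_STACK_BYTES = 2 * 4096   # two 4 KB pages on i386, per the comment above
TASK_STRUCT_BYTES = 1740        # "~1.7KB" rounded; an assumption, not a measured value


def kernel_bytes_per_thread():
    """Kernel-side memory pinned by one thread (stack plus task_struct)."""
    return KERNEL_STACK_BYTES + TASK_STRUCT_BYTES


def kernel_bytes_total(threads):
    """Kernel-side memory for the given number of threads, user stacks excluded."""
    return threads * kernel_bytes_per_thread()


if __name__ == "__main__":
    total = kernel_bytes_total(100_000)
    print(f"{total / 2**20:.0f} MiB of kernel memory for 100,000 threads")
```

Under these assumptions, 100,000 threads pin a little under 1 GB of kernel memory before any user-space stack is counted, which is why per-thread memory keeps coming up in this discussion.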
whoa! (Score:4, Funny)
Great (Score:3, Funny)
Possible use (Score:2)
Re:Possible use (Score:2, Informative)
Re:Possible use (Score:2)
It is also generally the case that switching between processes is more expensive than switching between threads.
To the parent poster: 1 thread per connection is a pretty naive way to do it, but it's got advantages - simplicity. It's a moot point since on a stock OS you'd run out of socket descriptors long before you'd run into a thread-count maximum.
Re:Possible use (Score:2)
Re:Possible use (Score:4, Insightful)
>>>>>>>>
Then please stay away from my GUI apps. I hate those UNIX grognards that come from that school of thought, then try to code GUI applications with only one thread and end up with apps that can't update the GUI while doing I/O. On my 300 MHz PII, that particular trait made Galeon unusable. It had one rendering thread for all the tabs, so when I was loading a complex page like
nice, but... (Score:4, Interesting)
Re:nice, but... (Score:4, Informative)
According to a mail from Ingo Molnar halfway down the linked article, M:N threading doesn't really solve the real problem - it's good at switching back and forth between running threads, but the real reason for having very large amounts of threads (be they kernel or user space threads) to begin with, is to do IO, and for that, there is no real advantage of user space threads.
More info on the 1:1 vs M:N issue can be read in the white paper [redhat.com]
Re:nice, but... (Score:2)
user-level threads are useless (Score:2)
How will this affect Mozilla, OpenOffice... (Score:4, Interesting)
While it probably is generally true that it will take some time for most applications to start using the new threading model, some larger applications could support it fairly soon.
Can we expect these applications to be adapted to the new threading model some time soon, and how will it affect performance?
Great... Now every lamer with no design knowledge (Score:2)
big deal (Score:4, Funny)
Re:big deal (Score:2)
Hooray for fixing the dynamic linking problem! (Score:2, Interesting)
" - - libpthread should now be much more resistant to linking problems: even if the application doesn't list libpthread as a direct dependency functions which are extended by libpthread should work correctly."
This ought to be a big help for those of us who write plug-in modules for servers like Apache 1.x and PHP. The existing thread library doesn't work properly unless the program executable explicitly links to it, which means that my shared libraries can't take advantage of standard thread management such as pthread_atfork().
Does this help Apache 2.x (Score:2, Interesting)
100000?! (Score:3, Redundant)
LEXX
Wow! (Score:2)
POSIX compliance ahead? (Score:2, Informative)
missing cancellation points: testing whether a thread has been cancelled should be done in lots of system calls, but linux pthreads do not support this. Instead, you have to call pthread_testcancel() before and after every such call. A real drag.
signal handling: linux pthread signal handling is very different from the POSIX specification. However, proper signal handling is crucial for any real world application.
fork() will not work as expected. This is a real nuisance if you want proper daemon behaviour for your application.
documentation of linux-specific behaviour is poor. As a result, most of the existing literature on thread programming is pretty useless for linux.
All these points can be worked around, for sure. Nevertheless, it makes writing portable software a nightmare. Porting threaded software to linux, well
A solid, well documented, standard conforming threads implementation will make linux a much nicer environment for serious programming than it already is. I am really looking forward to this.
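The workaround described above, checking for cancellation by hand before and after every blocking call, can be sketched as follows. This is only an analogy in Python, which has no pthread_cancel(): the helper check_cancel is hypothetical and plays the role of pthread_testcancel(), and a threading.Event stands in for the pending cancellation request.

```python
import threading


class Cancelled(Exception):
    """Raised inside a worker once cancellation has been requested."""


def check_cancel(flag):
    # Hypothetical helper playing the role of pthread_testcancel().
    if flag.is_set():
        raise Cancelled()


def worker(flag, started, log):
    try:
        while True:
            check_cancel(flag)         # manual check before the blocking call
            started.set()
            flag.wait(timeout=0.01)    # stand-in for a blocking system call
            check_cancel(flag)         # manual check again after it returns
    except Cancelled:
        log.append("cleaned up")       # cleanup handlers would run here


def run_and_cancel():
    """Start one worker, request cancellation, and report what happened."""
    flag, started, log = threading.Event(), threading.Event(), []
    t = threading.Thread(target=worker, args=(flag, started, log))
    t.start()
    started.wait()
    flag.set()                         # request cancellation
    t.join(timeout=5)
    return log, t.is_alive()
```

The drag the poster complains about is visible in the worker: every blocking call has to be bracketed by explicit checks, which is the work a conforming implementation would do inside the system calls themselves.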
Re:POSIX compliance ahead? (Score:2, Interesting)
That all said, I totally agree with you -- especially regarding cancellation points, fork(), and documentation.
Please bear in mind that much of this behavior will be inherited from whatever libc it is compiled against. IMO, this simply shows the power of C, nothing else.
The above scenario simply points out the differences between OpenGroup/POSIX and GNU/FSF... if things like that "bug" you (no pun intended, seriously), then perhaps you should recompile with whatever "-- posixly-correct" options you have available.
And yes, I have a copy of the SUSV3 spec right here, in fact.
Mod this up, please (was: POSIX compliance ahead?) (Score:2)
And yes, I would of course
NGPT (Score:2)
Wait, there's more (Score:2)
Ingo:...Anton tested 1 million concurrent threads on one of his bigger PowerPC boxes, which started up in around 30 seconds. I think he saw a load average of around 200 thousand. [ie. the runqueue was probably a few hundred thousand entries long at times.]
Wow.. this is pretty good. The ability to spawn & run 1 million concurrent threads should keep even the most demanding users happy for a few years...
OTOH, I hope this post doesn't become the butt of jokes a few months from now ("and you thought 1 million was a lot! Ha! My Palm 5000XL does more than that!")...
Linus didn't think much of O1 scheduler (Score:3, Interesting)
Re:Posix thread... (Score:5, Informative)
http://www.cs.wustl.edu/~schmidt/ACE.html
This is so far the best library I have used for pthread programming. Powerful, easy to use, and encapsulates message passing really well...
Re:Posix thread... (Score:2, Funny)
Your answer:
http://www.linux.ncsu.edu/lug/lectures/rpm-pres/m
This is so true to all of us
ACE is nice for big systems (Score:2)
But it's also way overkill for small stuff. It's a whole distributed framework, not a wrapper around pthreads.
Re:Posix thread... (Score:2)
Probably you do the light one, and include it in the heavy when required.
Ah, the one-size-fits-all thought process...
Re:Not 100,000 threads in parallel, just 50. (Score:5, Informative)
Yeah right. And modded to "Informative"? Slashdot moderators are the _pits_.
Read Ingo's reply to Linus. They _did_ start
one test serially and also one in _parallel_. In short, he says that it's possible.
vv
Re:Not 100,000 threads in parallel, just 50. (Score:2)
So, am I right in thinking this means threading (and hence Apache 2.0) will be a big win for Linux web servers, now?
Re:Not 100,000 threads in parallel, just 50. (Score:3, Insightful)
That's called "making lemonade out of lemons". Clearly this test has shown that thread creation in Linux was horribly broken, not the flip side that process creation was so wonderfully good.
Re:Not 100,000 threads in parallel, just 50. (Score:5, Informative)
See, for example, http://www.linux.cu/pipermail/linux-prog/2001-Feb
Don't forget: Just because this is
Re:Not 100,000 threads in parallel, just 50. (Score:5, Informative)
And yes, Linux's process context switches are on a par (possibly faster - can't be bothered to look up benchmarks) with NT's thread context switches.
K.
Re:Not 100,000 threads in parallel, just 50. (Score:4, Interesting)
The low latency patches go through the kernel breaking up areas where spinlocks are held for long periods of time. That's what causes massive scheduling latency in the kernel.
Context switching under Linux
Go do some real research before you accuse someone who's right of karma whoring bullshit.
himi
Re:Not 100,000 threads in parallel, just 50. (Score:4, Informative)
- The speed with which the kernel can schedule and context-switch among threads. For some recent data on this, see http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=103228014211983 [theaimsgroup.com]. The O(1) scheduler patch for 2.4 seems to help here.
- Memory usage per thread.
- Concurrency limitations of the Apache code itself. This has been improving gradually with successive 2.0 releases, as the remaining global locks are removed or optimized.
- General robustness of the thread implementation. The current (2.4) Linux threading implementation doesn't work well with debuggers.
At first glance, it looks like the NPTL could be a win for threaded Apache on Linux, as it offers some solutions for the first and last of these issues.
Re:Not 100,000 threads in parallel, just 50. (Score:2)
Re:Not 100,000 threads in parallel, just 50. (Score:2, Insightful)
For instance, because of the expense many applications use thread pools, which are simply a bunch of idle threads that sit around doing nothing, waiting for work to do. These idle threads still take up system resources even though they're not actually using CPU. Not to mention the extra work the developers have to do to make the thread pools work for their applications.
Re:Not 100,000 threads in parallel, just 50. (Score:2)
Re:Not 100,000 threads in parallel, just 50. (Score:5, Informative)
Re:Not 100,000 threads in parallel, just 50. (Score:2)
> comments further down in the article, Linus
> points out that only 50 threads at a time were
> running in parallel:
And the next comment down is from Ingo:
actually, that was Ulrich's other test, which
tests the serial starting of 100,000 threads.
the test i did started up 100,000 concurrent
threads which shot up the load-average to a
couple of thousands. [the default timeslice the
parent has is enough to start more than 50,000
parallel threads a pop or so.]
So, yes, they did manage 100,000 threads running in parallel.
Matt
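The distinction drawn above between starting threads serially and having them all alive at once can be made concrete with a small sketch. This is not the benchmark Ingo or Ulrich ran; the function name start_and_stop is an assumption, and Python thread startup costs say nothing about the kernel numbers in the story. The point is only the structure: every thread blocks on a shared event, so all of them exist concurrently before any is allowed to exit.

```python
import threading
import time


def start_and_stop(n):
    """Start n threads that all wait on one event, release them, join them all."""
    release = threading.Event()
    finished = []
    lock = threading.Lock()

    def body():
        release.wait()                 # keeps every thread alive at the same time
        with lock:
            finished.append(1)

    t0 = time.perf_counter()
    threads = [threading.Thread(target=body) for _ in range(n)]
    for t in threads:
        t.start()
    alive = sum(t.is_alive() for t in threads)   # counted before anyone may exit
    release.set()
    for t in threads:
        t.join()
    return alive, len(finished), time.perf_counter() - t0


if __name__ == "__main__":
    alive, done, seconds = start_and_stop(1000)
    print(f"{alive} threads alive at once, {done} finished, {seconds:.3f}s")
```

A serial test would instead start a thread and join it before starting the next, so only one extra thread would ever exist at a time.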
Real World Example (Score:2)
I'm building a project where there will be one huge database with up to 200 different companies connected to it pretty much nonstop. 1-10 users from every company depending on the time of the year. 2 threads for every connection.
200*10*2=4000 threads.
Re:Real World Example (Score:2)
-Kevin
Re:Real World Example (Score:2, Interesting)
Just imagine a situation where a thread might need to calculate something, or initialize a big array. Now, if it's run under a select-loop, you need to do that in parts to avoid starving the server. With threads, you just do the trick and don't care about the rest of the world, which keeps serving the clients no matter how long you stay in the function.
Re:Real World Example (Score:2, Informative)
It's not practical to serve hundreds/thousands of clients with a thread per client model. A typical machine can't handle the load well because it has limited resources. It will thrash. By having a thread pool you place a limit (throttle if you will) on resource utilization. Most high performance, highly scalable web and app servers use this model or a variant.
There is another architecture based on event driven state machines aka SPED (single process event driven) that is high performance and single process/single thread in its pure form. The Zeus web server does this.
-Kevin
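The thread-pool-as-throttle idea described above can be sketched with Python's standard ThreadPoolExecutor. The function name serve_with_pool, the client count, and the sleep that stands in for request handling are all assumptions for illustration; the sketch records the peak number of requests in flight to show that the pool size caps it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def serve_with_pool(num_clients, pool_size):
    """Handle num_clients requests with at most pool_size worker threads."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle(client_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.001)              # stand-in for real per-request work
        with lock:
            active -= 1
        return client_id

    # Requests beyond pool_size queue up instead of each getting a thread.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(handle, range(num_clients)))
    return results, peak


if __name__ == "__main__":
    results, peak = serve_with_pool(num_clients=4000, pool_size=16)
    print(f"served {len(results)} clients with at most {peak} running at once")
```

With the 4000-connection figure from earlier in this subthread, the pool keeps resource use bounded by pool_size rather than by the number of clients, at the cost of queued requests waiting their turn.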
Re:Alternative headline (Score:4, Informative)
Uh, why did that get moderated as a troll? Oh, right, Linux is absolutely perfect, and anyone who says otherwise must be a troll.
Come on, Linux's scheduler has long been known to have performance problems once you have a lot of processes/threads... for example, read this paper [linuxalpha.org] [text version [216.239.39.100]] (appropriately subtitled "How I Learned to Love the Alpha and Hate the Scheduler"):
Moderators, don't be Slashbots, moderating according to the groupthink. Educate yourselves, and you'll be better moderators, and better people.
Re:Alternative headline (Score:3, Insightful)
Likewise, threading support under Linux has been oriented towards what the developers considered sane: a fairly small number of threads. They had good reasons for considering that the right way to do it - for a start, it worked nicely for what they wanted, and it was sufficiently simple that they didn't have to put in lots of complex code. Further, it's almost never a good idea to have a program architecture that requires very large numbers of threads - it generally only shows up in naive code where people simply don't understand the problems it brings. So, as far as the kernel developers were concerned, stupid people hurting themselves wasn't something to put any effort into ameliorating. This has changed recently, as people have started using Linux in areas where this kind of thing
You need to understand the reasoning behind a lot of these decisions before you can start complaining about them. First and foremost, you simply
himi
Re:Alternative headline (Score:2, Insightful)
I think I'll just slam XP performance based off of NT benchmarks and articles. What the hell, they're both from MS, the argument must be valid.
Get a grip!
Re:How long before (Score:2)
Speeding The Net is an excellent book about Netscape vs Microsoft, in case anybody cares (it's been a long while since I read it, thus why my date memory is rusty).
Re:How long before (Score:2)
Wrong in every respect.
First Mosaic was not the 'Ur browser'. Tim's NextStep browser was. Mosaic was browser number 15 or so. The significant things about Mosaic were that 1) it actually compiled without having to hack the code yourself or mess with 6 different support packages like tkwww and 2) it was the first X-Windows browser that did not look really amateur.
Second, Netscape does not contain any code from Mosaic, although it was written by the same main author - Eric Bina. NCSA sold the commercial rights to Mosaic to Spyglass.
Third IE was originally based on the Spyglass code, so if any browser is 'the direct descendant' it would be IE. Go look at the 'about' box on IE, although the original Mosaic actually had more lines of CERN code than NCSA code which were never acknowledged.
Re:Ur browser??? (Score:2, Informative)
Anyway, and I'm really not well qualified to answer this, Ur was an ancient city-state from which a prominent ancestor of the Jewish-Christian-Islamic heritage came (Abraham, if I'm not wrong).
This city, IIRC already found, was Sumerian (I'm not sure about this), the folks who are said to be the inventors of the wheel, among other neat things.
So an Ur browser would be the primeval browser, in other words.
Upon writing a note, one must be sure it will be understood; nonetheless, the "Ur" mention boosted the note level way up. All in all, I think it was great and I'm all for it.
But explanations as these sometimes become necessary.
Re:hmm (Score:2)
My name is Ingo Molnar.
You kill -15 my parent process - Prepare to die.