Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

Hyper-Threading Speeds Linux 246

Posted by Hemos on Tuesday January 14, 2003 @03:36PM from the making-things-faster dept.

developerWorks writes "The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that makes a single processor behave like two logical processors. The technology allows the processor to execute multiple threads simultaneously, which can yield significant performance improvement. But, exactly how much improvement can you expect to see? This article gives the results the investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not." Ah, the joys of high performance.

This discussion has been archived. No new comments can be posted.

Hyper-Threading Speeds Linux

Load All Comments

Search 246 Comments Log In/Create an Account

Comments Filter:

Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:5, Funny)

by deathcow ( 455995 ) writes: on Tuesday January 14, 2003 @03:56PM (#5082607)

Xeon folks arent having the only fun. The 3 Ghz Pentium 4 is also hyperthreaded for that crunchy flavor and great taste.

Share
twitter facebook
- Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:5, Informative)
  
  by deathcow ( 455995 ) writes: on Tuesday January 14, 2003 @04:02PM (#5082649)
  
  Here [intel.com] is the associated press release from Intel about the HT in 3 Ghz P4's. I have seen screenshots of Windows task manager showing (2) CPU performance graphs.
  
  Parent Share
  twitter facebook
  - Win2K 2 CPU == 1 HT CPU ?? (Score:2)
    
    by MyHair ( 589485 ) writes:
    
    I have seen screenshots of Windows task manager showing (2) CPU performance graphs.
    
    Since the "Professional" line of NT/2K/XP kernels only support two processors, does this mean you can only use one HT CPU?
    - Re:Win2K 2 CPU == 1 HT CPU ?? (Score:2)
      
      by spongman ( 182339 ) writes:
      
      XP sees HT processors as a single processor from a licensing standpoint so, yes, you can use two on XP pro.
- Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:2, Interesting)
  
  by TokyoBoy ( 217214 ) writes:
  
  Does anyone know if AMD will be doing something similar or if their current processors do something like this? I know that many High Performance Clusters use SMP machines and multi-threaded code and could take advantage of HT. Many clusters are made with AMD processors due to the fact that they are so much less expensive than Intel.
  - Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:3, Insightful)
    
    by Henry V .009 ( 518000 ) writes:
    
    Yes, they do something almost exactly like it. Simply buy two processors and a multi-processor motherboard. That defeats the purpose of this technology, of course, but it nearly accomplishes the same thing.
    
    Other than that, well, I'm--still--waiting for Hammer. AMD is dropping a long ways behind Intel. Price is all they've got, and AMD isn't even competeing on price-performance real well at the moment. My guess is that Intel hyperthreaded systems will probably be better price-performance wise than AMD before long--if they aren't already.
What's really cool also (Score:4, Interesting)

by esconsult1 ( 203878 ) writes: on Tuesday January 14, 2003 @03:57PM (#5082611) Homepage Journal

We've used XEON's on our DB server for a few months now. The performance has been outstanding. You also see 4 processors when you run top.

At first we thought this was an error, and got in touch with Dell's tech support. But the geeks there said this is normal behavior.

Share
twitter facebook
- Re:What's really cool also (Score:2, Interesting)
  
  by JCholewa ( 34629 ) writes:
  
  > We've used XEON's on our DB server for a few months now. The performance
  > has been outstanding. You also see 4 processors when you run top.
  
  > At first we thought this was an error, and got in touch with Dell's tech support.
  > But the geeks there said this is normal behavior.
  
  Of course it's normal behavior. Windows is (well, basically) counting the number of threads that the system can simultaneously execute (that's probably not entirely an accurate depiction), not the number of physical processors. But this does not mean that you're getting the performance of four processors. You still only have the execution resources of two processors at your system's disposal. The best that simultaneous multithreading can do is make more efficient use of the existing execution units. This can result in very nice performance boosts, no performance boosts at all, and (in some rarer circumstances) performance penalties. But it is in no way anything near like having that actual number of processors.
  
  Probably, a good rule of thumb would be "if it already stresses the execution units, then you won't see a boost, but if the code causes frequent thread stalls, then you'll probably see a nice jump".
  
  *EDIT* Crap, I didn't notice that you said top. Made the assumption about Windows. Sorry about that. My post more or less stands as the same, though. :)
- Re:What's really cool also (Score:4, Funny)
  
  by DivideX0 ( 177286 ) writes: on Tuesday January 14, 2003 @04:56PM (#5083136)
  
  But do you really want to see additional processors, wouldn't SCO want to charge you more for them?
  Earlier SCO Story [slashdot.org]
  
  Parent Share
  twitter facebook
  - Re:What's really cool also (Score:2)
    
    by garyrich ( 30652 ) writes:
    
    funny maybe, but the answer is probably yes. I know that If you have 2 mutithreading processors you have to have a versino of Windows that supports 4 processors - and that's a big additional chunk of change.
- Re:What's really cool also (Score:2)
  
  by ostiguy ( 63618 ) writes:
  
  Yup, I've heard task manager in win2k also shows 4 meters on a 2 physical cpu box.
  
  ostiguy
  - Re:What's really cool also (Score:3, Informative)
    
    by afidel ( 530433 ) writes:
    
    Nope, win2k only shows 2, well at least for pro. It also binds to the first physical cpu and it hyperthreaded child. For this reason you have to turn off hyperthreading if you are going to install win2k pro on a 2 physical cpu workstation, I should know I have a system reimaging right now because it came from the factory with hyperthreading enabled and so only 1 physical cpu was being used.
    - Re:What's really cool also (Score:4, Interesting)
      
      by Richard_at_work ( 517087 ) writes: on Tuesday January 14, 2003 @06:21PM (#5083675)
      
      Windows2000Pro only shows two as thats all it can handle. Its part of the Windows2000 limitation:
      
      Windows2000pro - 2 cpus
      
      Windows2000server - 4 cpus
      
      Windows2000AdvServer - 8 cpus
      
      We put win2kserver on a dual Xeon with HT, and it showed 4 cpus (this was when we realised we had HT capable Xeons! Suree enough, after checking, we were right)
      
      Parent Share
      twitter facebook
- Re:What's really cool also (Score:2)
  
  by Chundra ( 189402 ) writes:
  
  Yeah. I dunno about other systems, but on my supermicro p4dp6 the POST messages even say there are 4. Using my own ad hoc benchmarks with dnetc, it appears like there are 2 fast processors and the hyper-threaded ones crunch at around 20%-30% of those.
Fundamental mistake (Score:5, Insightful)

by cbcbcb ( 567490 ) writes: on Tuesday January 14, 2003 @03:58PM (#5082623)

>It compares the performance of a Linux SMP kernel that
>was aware of Hyper-Threading to one that was not."
But if you aren't going to use hyper threading you would use a UP (non-SMP) kernel, which would gain you considerable performance. The benefits are not so clear cut as many of the benchmarks show limited benefit from hyperthreading and would perform faster on a uniprocessor kernel.

Share
twitter facebook
- Re:Fundamental mistake (Score:2)
  
  by pergamon ( 4359 ) writes:
  
  Well, unless you have a computer that has multiple physical CPUs.
  - Re:Fundamental mistake (Score:2, Interesting)
    
    by JCholewa ( 34629 ) writes:
    
    > Well, unless you have a computer that has multiple physical CPUs.
    
    You have a point, but he does, as well. SMT ("hyper-threading") should work automatically for multiprocessor systems. So if you have a dual processor, SMT-capable board in a system that's unaware of the SMT functionality, you should still get a boost from SMT. Unless Hyper-Threading is a really, really bizarre implementation of SMT. Reviewers should really compare against an SMP system that is incapable of doing SMT, because it'll do it automatically (or it should), even if you don't tell it to. Alternatively, you could approximate the same results by forcing the system to only use a number of threads equivalent to the number of processors. Not all programs can do this, though (compiling is the only thing that immediately comes to mind).
    
    Granted, I've been out of the loop a bit, so I might be making some really off the wall (and inaccurate) assumptions about Intel's SMT implementation.
    - Re:Fundamental mistake (Score:5, Interesting)
      
      by afidel ( 530433 ) writes: on Tuesday January 14, 2003 @05:19PM (#5083291)
      
      nope, you make incorrect assumptions, the hyperthreaded portion of the cpu shows up to software as a seperate cpu. For this reason a win2k pro machine has to have hyperthreading disabled on a dual xeon machine or else it will just use the first physical cpu and its child hyperthread. This is why artificial smp limitations suck. Also win2k server will only allow 4 cpus in standard edition so it can only utilize two physical cpus and their hyperthreads. Windows Server 2003 ups the amount of cpus allowed for standard edition to 8 to account for this.
      
      Parent Share
      twitter facebook
      - Re:Fundamental mistake (Score:2)
        
        by afidel ( 530433 ) writes:
        
        WRONG. Win2k pro will only use 2 cpu's total, XP Pro on the other hand will handle 2 physical cpus with or without hyperthreading. I just had to reimage a HP X4000 because it came from the factory with hyperthreading enabled (this is not usually the case not sure how it happened) and so the first 2 cpus the installer saw were physical cpu #1 and HT #1, physical cpu #2 and HT #2 were not used. This may have changed with SP3, but I doubt it as it is a major architecture change.
    - Re:Fundamental mistake (Score:2)
      
      by sjames ( 1099 ) writes:
      
      Reviewers should really compare against an SMP system that is incapable of doing SMT, because it'll do it automatically (or it should), even if you don't tell it to.
      
      By default, hyperthreading will be used. Every board i've seen that supports it has a BIOS option to disable the virtual processor(s) by setting a bit in one of the MSRs.
Imagine a Beowulf Cluster of these (Score:5, Funny)

by Anonymous Coward writes: on Tuesday January 14, 2003 @04:00PM (#5082633)

All operating on a single chip!

Share
twitter facebook
But the real question... (Score:3, Interesting)

by Jace of Fuse! ( 72042 ) writes: on Tuesday January 14, 2003 @04:00PM (#5082635) Homepage

Does SMP support automatically allow benefits from Hyperthreading, or does that require special support all it's own?

Share
twitter facebook
- Re:But the real question... (Score:2, Insightful)
  
  by stratjakt ( 596332 ) writes:
  
  >> Does SMP support automatically allow benefits from Hyperthreading
  
  Yes
  
  HT essentially partitions out the CPUs pipeline into two pipelines executing concurrently: That is, two CPUs on the same die.
- Re:But the real question... (Score:5, Informative)
  
  by norton_I ( 64015 ) writes: <hobbes@utrek.dhs.org> on Tuesday January 14, 2003 @04:14PM (#5082787)
  
  SMP already can gain benefit from hyperthreading. However, an OS really needs special support to A) get the most out of hyperthreading and B) avoid worst-case scenarios, especially when you have both multiple physical CPUs and multiple logical CPUs per physical CPU.
  
  For instance, if you have two processes running, you want to put them on different physical CPUs, and if you have a choice, grouping threads with the same memory image on a single processor improves cache usage.
  
  Without this, hyperthreading may
  
  Parent Share
  twitter facebook
- Re:But the real question... (Score:3, Interesting)
  
  by lederhosen ( 612610 ) writes:
  
  Yes, as said in the other posts,
  BUT, you want to schedule the
  same process on the same CPU in
  order to not trash the cache.
  
  I.e. you can make a huge inprovement
  by make the scheduler aware of
  processors *and* logical processors.
- Re:But the real question... (Score:2)
  
  by Mattsson ( 105422 ) writes:
  
  Well, the *article* says that you get more performance if you patch the kernel to be optimized for ht. =)
  It also says that you get a performance boost ever by using the standard smp kernel.
- - Wrong about XP (Score:2, Informative)
    
    by DudemanX ( 44606 ) writes:
    
    While Win2K will see a hyper-threaded CPU as 2 physical CPUs, WinXP is smart enough to see it as a CPU and a Virtual CPU. At the last Intel conference I attended they made sure to emphasize that while XP Home doesn't support 2 phyisical prosessors it will properly recognize a hyper-threaded CPU and allocate resources accordingly. Do you think Intel would enable the technology in the P4 3Ghz(a desktop CPU) without making sure Microsoft supported it in their desktop operating systems?
good stuff (Score:5, Insightful)

by The Evil Couch ( 621105 ) writes: on Tuesday January 14, 2003 @04:02PM (#5082651) Homepage

The results on Linux kernel 2.4.19 show Hyper-Threading technology could improve multithreaded applications by 30%. Current work on Linux kernel 2.5.32 may provide performance speed-up as much as 51%.

while it may not be very useful for a single-user box(it actually looks like it would be a detriment), integrating it into client-server situations would give us some nice boosts in performance. web servers ought to see some real gains with this.

Share
twitter facebook
- - Re:good stuff (Score:2)
    
    by cioxx ( 456323 ) writes:
    
    you would, if you're running an ircD server.
  - Re:good stuff (Score:5, Insightful)
    
    by windex ( 92715 ) writes: on Tuesday January 14, 2003 @04:26PM (#5082893) Homepage
    
    You aren't looking at this logically. It's not that "you need that much CPU for a webserver", is that "look at how many more customers you can squeeze in per server".
    
    This lowers cost for providers, and eventually lowers costs for consumers.
    
    Yee haw.
    
    Parent Share
    twitter facebook
    - Re:good stuff (Score:2)
      
      by The Evil Couch ( 621105 ) writes:
      
      precisely. more powerful, more efficient webservers mean lower overhead costs, which is never a bad thing.
  - It depends... (Score:2)
    
    by sheldon ( 2322 ) writes:
    
    Obviously that depends.
    
    If your web server is just doing static content, then probably not as a 486 can saturate a T1.
    
    If your web server is doing dynamic content, then possibly.
  - Re:good stuff (Score:2)
    
    by Anonvmous Coward ( 589068 ) writes:
    
    "err ... do you really need that much CPU power for a webserver ?"
    
    Ask one of Slashdot's victims when they come back on-line.
  - Re:good stuff (Score:3, Informative)
    
    by koreth ( 409849 ) writes:
    
    Depends on what site you're running. If you read your traffic report and say to yourself, "Wow, 10000 hits yesterday! A new record!" then no. If you say to yourself, "Uh oh, only 7500000 hits yesterday, must have been a big network outage somewhere," then yes.
    There's a reason some sites have multiple racks of dedicated web servers, and any technology that lets them serve more users in less physical space is going to be a win if the cost isn't prohibitive.
What are you talking about? (Score:2, Interesting)

by pVoid ( 607584 ) writes:

The article clearly shows that syscalls and basically OS dependant stuff rarely improves in performance, in fact decreases in most spots.
Of course multi-threaded applications are going to improve. What's your point?
For those who didn't RTFA:
Simple syscall 1.10 1.10 0%
Simple read 1.49 1.49 0%
Simple write 1.40 1.40 0%
Simple stat 5.12 5.14 0%
Simple fstat 1.50 1.50 0%
Simple open/close 7.38 7.38 0%
Select on 10 fd's 5.41 5.41 0%
Select on 10 tcp fd's 5.69 5.70 0%
Signal handler installation 1.56 1.55 0%
Signal handler overhead 4.29 4.27 0%
Pipe latency 11.16 11.31 -1%
Process fork+exit 190.75 198.84 -4%
Process fork+execve 581.55 617.11 -6%
Process fork+/bin/sh -c 3051.28 3118.08 -2%
is it just me? or does the linux kernel not perform so much better in SMP HT?
- It's just you (Score:5, Insightful)
  
  by Royster ( 16042 ) writes: on Tuesday January 14, 2003 @04:40PM (#5083013) Homepage
  
  WHat you've conveniently snipped out in your trollish post is all of the applications benchmarks showing improvements. If you're not going to run any application code, you might as well shut the machine off and save the marginal stress on the environment.
  
  Most of us have our computers do work and those applications, running on an OS which has *barely* slowed, will be able to do more work in the same amount of time under the HT-aware OS than under one which does not utilize the second, virtual processor.
  
  Parent Share
  twitter facebook
- Re:What are you talking about? (Score:2)
  
  by be-fan ( 61476 ) writes:
  
  Hah! The first poster on the OSNews thread about this story wasn't impressed either. Apparently, a lot of people don't have the attention span to read more than the first few tables in an article!
- - Re:What are you talking about? (Score:5, Interesting)
    
    by pVoid ( 607584 ) writes: on Tuesday January 14, 2003 @04:22PM (#5082852)
    
    You're assuming the kernel is one thread =)
    The kernel has lots of work to do when you call into it. Of course it wouldn't boost up if all kernel calls were like this:
    void myKernelFunc( long param ) { return param * 2 - 23; }
    But kernels do work.
    See, something that very few people know is that the NT kernel is fully pre-emptible, fully-interuptible, and re-entrant by design. ie. even for single processor systems, it's like that.
    The linux kernel is not.
    It's very hard to 'tack on' SMP features onto a system that wasn't made with that in mind in the first place.
    This has some advantages, and some drawbacks... NT kernel programming is *frigging* hard. HARD.
    But it also has the advantage that it makes *much* better use of SMP.
    Sad, but true.
    
    Parent Share
    twitter facebook
    - Re:What are you talking about? (Score:2)
      
      by IamTheRealMike ( 537420 ) writes:
      
      Well, I think the Linux kernel is now pre-emptable to some extent (not everywhere, but most places). Robert Love did the work necessary. I don't know how well it stacks up against the NT kernel, but I'd guess over the years it'll close the gap.
      - Re:What are you talking about? (Score:2)
        
        by pVoid ( 607584 ) writes:
        
        Thank you for probably the only rational post I've seen so far.
        Well, I agree with you, over the years, the gap will probably slowly close. With kernels, really, much more than the kernel, it's all the parafanelia that counts. The countless drivers that have, or have not been designed with pre-emtability - and interruptability in mind.
        Even if the kernel is modified completely overnight, it will take a few years for the whole kernel mode system to catch up.
        Bottom line is, and given that this benchmark comes from IBM it really doesn't surprise me, there isn't much there to see *yet*.
    - Re:What are you talking about? (Score:2)
      
      by Paul Jakma ( 2677 ) writes:
      
      You're assuming the kernel is one thread =) Of course it wouldn't boost up if all kernel calls were like this:
      
      void myKernelFunc( long param ) { return param * 2 - 23; }
      
      urrgg... what balderdash.
      
      Look at what the LMBench benchmark is doing - in most cases it tests fairly specific OS paths. Eg a copy of data from userspace to kernel. Or a fork + exec. While these fairly shorts paths may not be "multi-threaded" the kernel itself still is.
      
      the NT kernel is fully pre-emptible, fully-interuptible, and re-entrant by design
      
      Care to explain what these mean?
      
      I doubt very much its 100% preemptible and interruptable - eg the initial OS interrupt vector must not be interrupted, drivers have often have a requirement to turn off interrupts (usually of the one they handle, but sometimes all interrupts). And even if by some magic NT is "fully interruptable and preemtible" - that does not mean it has much of a gain. You must lock data before you can access it, the kernel might preempt one task with another, but that second task might find that data it needs is locked and so has to spin or sleep.
      
      Highly threaded and preemtible kernel's also suffer from complexity (affects maintainability and stability) and this complexity and locking overhead can quite easily lead to /worse/ performance for many workloads.
      
      The linux kernel is not
      
      False. Linux acquired fine grained locking in 2.2, and has moved ever away from the big kernel lock since then. 2.5 is almost bkl free.
      
      But it also has the advantage that it makes *much* better use of SMP. (than linux you mean presumably.)
      
      Perhaps you may want to run some benchmarks and then come back and revisit that claim. Eg, which OS holds the SPECWeb record? Linux process/thread creation is an order of magnitude better than that of NT. etc..
      - Re:What are you talking about? (Score:3, Informative)
        
        by pVoid ( 607584 ) writes:
        
        the NT kernel is fully pre-emptible, fully-interuptible, and re-entrant by design
        Care to explain what these mean?
        I'll explain to you what this means:
        it means a piece of hardware raising an interupt will launch a driver's ISR, hardware wise, at that point, any other hardware raising an interupt at a lower IRQ level will not be serviced. BUT, any higher IRQLed hardware can take over of that 'thread' that is servicing the first interupt. There are *no* exceptions to this rule. In effect, your ISR is fully interuptible.
        The thread dispatcher runs at DIRQL (D=Disptach), any IRQL higher than DIRQL is kind of beyond the concept of threads. But any thread running bellow DIRQL (ie, APC or normal threads) are fully pre-emtible. There is no thread that has *any* priority of not being pre-empted anywhere in the system.
        All of the kernel is also re-entrant, which means from anywhere within the kernel, so long as you are in proper IRQL, you can call back the kernel.
        At these altitudes, or depths, whichever you wish, there are many strange beasts that you've never heard of - or never had a reason to really use - that are being used to do synchronization. Namely spin locks. Moft came up with queued spin locks a few years ago, and that was a rather Good Thing (tm). It made spin locks so much better under SMP systems.
        Now, to all the posts that say "all I need is an OS that doesn't suffer from SMP", all I have to say is why do you use SymmetricalMP in the first place??! Why not use assymetrical processing, and just queue all interupts on a single CPU? it'll sure as hell simplify everything, and reduce overloading the bus with those damn spin locks!
        If you're going to claim that the Linux kernel is doing a good job of an SMP system, you have to show me it's actually performing better. Not just allowing more threads to run on more processors... everyone can do that.
        Second thing is: don't flatter yourself by 'easy wins'. All I'm saying is that this is just *not* an win for linux. It's only a win for HT and multithreading... but hey, we all knew that Multithreading is a Good Thing(tm)... right?
Only Threads ? (Score:2, Insightful)

by makapuf ( 412290 ) writes:

I know, there might be many places where it has been discussed before, but could someone please tell me if HT is only for threading or can it be used for precesses, too.
And I know, they are essentially the same syscall under linux, and might be faster, b/c of synchronization issues wrt to the memory access IIRC ...
- Re:Only Threads ? (Score:2)
  
  by Wesley Felter ( 138342 ) writes:
  
  Threads and processes are almost the same thing in Linux, so HT benefits them both.
51% speed-up! (Score:5, Interesting)

by core plexus ( 599119 ) writes: on Tuesday January 14, 2003 @04:05PM (#5082688) Homepage

An excellent, detailed article. For those in a hurry:
"Conclusion
Intel Xeon Hyper-Threading is definitely having a positive impact on Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, to 51% in kernel 2.5.32 due to drastic changes in the scheduler run queue's support and Hyper-Threading awareness."
My questions: What's the downside? Is AMD doing anything similar?
Fight with computer brings SWAT team [xnewswire.com]

Share
twitter facebook
- Re:51% speed-up! (Score:5, Informative)
  
  by PCM2 ( 4486 ) writes: on Tuesday January 14, 2003 @04:16PM (#5082813) Homepage
  
  The downside is that for code that isn't SMP/HT-aware, performance can actually degrade. Tom's Hardware ran tests [tomshardware.com] of hyperthreading on the 3.06GHz P-4, and in almost every case, it performed better with hyperthreading disabled.
  
  Parent Share
  twitter facebook
  - Re:51% speed-up! (Score:2)
    
    by Julian Morrison ( 5575 ) writes:
    
    The downside is that for code that isn't SMP/HT-aware, performance can actually degrade.
    
    How many modern programs use no kernel threads / multiple processes at all? Not many I'm guessing.
    - Re:51% speed-up! (Score:3)
      
      by tshak ( 173364 ) writes:
      
      Although threading is popular for server based apps, for normal desktop apps threads should be sed lightly or not at all. Take an Mp3 coder for example. Sure, the MP3 encoding itself will launch a thread to update the status bar, but the real CPU hog is the encoding itself which is done in a single thread. According to Tom Pabst in this scenarion the MP3 encoding will perform slower than a non-HT proc.
      
      Also, consider another big peformance hog, games. Although a Game Server may take advantage of HT, I don't think (and this is purely speculation based on _minimal_ 3D engine programming experience) it would be a good idea for games to use threads. Threads carry overhead, and they also can make your codebase difficult to manage.
  - Re:51% speed-up! (Score:2)
    
    by neurojab ( 15737 ) writes:
    
    Those tests are very end user (luser) specific. Yes it's true that the majority of people running games on Windows won't benefit a bit from HT... neither would those people benefit from ordinary SMP. That's why Intel left HT disabled in all P4 models until recently. Server workloads are very different, where every decent app makes heavy use of threads, and therefore benefits much more from HT (and SMP). The IBM tests are pertinent to servers and some power users.
- Re:51% speed-up! (Score:2)
  
  by Zathrus ( 232140 ) writes:
  
  What's the downside?
  
  Well, if your apps aren't multi-threaded then they can't make use of it. If you don't run enough CPU-intensive processes on the box, it won't buy you anything and may actually hurt you.
  
  If you look at the benchmarks not all the numbers are in the positive realm... although if you exclude the sync read/write numbers then it's generally a rather small difference.
  
  Is AMD doing anything similar?
  
  Not to my knowledge. They're betting the farm on Opteron/Athlon64.
Application dependant (Score:5, Insightful)

by PaschalNee ( 451912 ) writes: <pnee@NosPAM.toombeola.com> on Tuesday January 14, 2003 @04:06PM (#5082695) Homepage

The pretty detailed (for me anyway) article [slashdot.org] on Ars Technica concludes [arstechnica.com] that performance on a HyperThreaded CPU will be very much dependant on the application mix. While research like this is useful it will probably always be a try and see scenario.

Share
twitter facebook
- Concurrency benefits from HT (Score:2)
  
  by Jeppe Salvesen ( 101622 ) writes:
  
  Simply put, you'll need two or more processes consuming all available CPU power before you'll see some real benefits from HT. If you're severely IO-bound, running a high-end FC SAN solution on an old P2 server will outperform a 5ghz machine with a mediocre disk.
  
  So - yes, not all people and applications will benefit from this. But no - it is not try and see.
HT hurt perf (Score:5, Interesting)

by steelerguy ( 172075 ) writes: on Tuesday January 14, 2003 @04:07PM (#5082701) Homepage

Tested HT running couple large jobs on a 2 CPU box with each process using over a GB of RAM. Performance went down.

Also HT can play havoc with a openMosix cluster since processes can start being migrated around to CPU's that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality.

It is not all peaches and cream.

Share
twitter facebook
- Re:HT hurt perf (Score:3, Informative)
  
  by Zathrus ( 232140 ) writes:
  
  processes can start being migrated around to CPU's that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality
  
  The article indicates that they're fixing this in the 2.5 branch. Lots of additional patches to the scheduler to let it comprehend the difference between physical and logical processors and do the Right Thing with them.
  
  Oh, and if you're running a 2 CPU box with only a couple (as in two) large jobs then no, you won't see a performance gain. You already have 1 CPU/process and HT would just be additional overhead.
  - Re:HT hurt perf (Score:2)
    
    by jelle ( 14827 ) writes:
    
    I didn't see a mention of OpenMosix [openmosix.org] in the article, and the original poster worried specifically about process migration in an openmosix cluster.
- Re:HT hurt perf (Score:2)
  
  by pizza_milkshake ( 580452 ) writes:
  
  the "unbalance" problem between physical/virtual cpus was mentioned in the article, and it was said that as of 2.5.32 it was addressed.
  i guess it depends what apps you're running; from the article it looks like (web|file|db)servers (and a kernel that runs a smarter scheduler than 2.4.17 had) might be able to squeeze out a little (~30%) performance gain.
  - HT Raises performance, depending. (Score:2)
    
    by mmol_6453 ( 231450 ) writes:
    
    Keep in mind, a 30% gain (for the 2.4 series) in a 2GHz machine would equate to a machine that performed server-oriented functions at an effective 2.6GHz.
    
    When they benchmarked 2.5.32, they showed a 51% increase, which would boost your effective server performance to 3GHz.
    
    Granted, the way I understand it, the actual coordination of core components for the two threads is hard-wired or in firmware. That means Intel can still improve HT, to get a better performance boost. To further that line, consider if Intel were to add additional core sections of their CPUs, to be allocated dynamically by the firmware. That means you're increasing your per-clock performance without the major overhead of developing a whole new CPU core.
    
    I can't see Microsoft standing for it. Intel could put all the pieces for two CPUs on the same die, and call it HT. You might have all the functionality of a dual-CPU setup, with less latency, and still have it show up as a single HT-enabled processor.
    
    With the way Microsoft's handling SMP machines (with CPU licenses), in addition to their statement that they are developing a 64-bit version of Windows based on the Hammer architecture, I think AMD's future looks pretty bright.
Useful for development? (Score:5, Insightful)

by NixterAg ( 198468 ) writes: on Tuesday January 14, 2003 @04:08PM (#5082710)

Like most development shops, we do a great deal of development for multiprocessor machines so we write a lot of multithreaded code. Multithreaded code creates a whole host of new debugging pitfalls that don't show up if the developer is debugging on a single processor workstation. As John Robbins says in his terrific Debugging Applications [microsoft.com] book, if you are developing a multithreaded application, you better be certain you are doing your debugging in a multiprocessor environment.

From a development standpoint, will a hyperthreaded chip provide an adequate environment in duplicating the behavior of a multi-processor PC well enough that shops can buy cheaper, one CPU machines for development and still be confident in their results? I'm guessing nothing will replace the real thing but I'd be interested in any commentary.

Share
twitter facebook
Humph! (Score:4, Funny)

by airrage ( 514164 ) writes: on Tuesday January 14, 2003 @04:08PM (#5082721) Homepage Journal

Well if I must say something, it' this: that's really going to put a fancy how-do-you-do in the knickers of all those pay-per-processor software types. I mean Oracle, for heaven's sake, is going to have to go absolutely bonkers trying to figure out how to screw the light-bulb into that buffalo (if you pardon my french). I mean what's a meglomanic to do? I mean I've got expenses! I've got tricarbonfiberalloy yacht hulls to pay for! Can have people going around trying to process code in a processor without us getting some slice of that monkey, I'll tell you right here and now sir! No sir! Maybe it's your not patriotic enough. Trying to cut corners, eh gov'nor? Now I'm gonna have to go and rewrite all the contracts stating explicitly that "processor" is defined as a virtual space for processing. Yes that ought to do it. But I'll still have to have the lawyers check it, just to make sure they aren't any loopies. Drats those laywers! Taking all my money too!

Share
twitter facebook
- - Summary (Score:3, Insightful)
    
    by swillden ( 191260 ) writes:
    
    So, in a nutshell, what MS says is: Windows 2000 counts processors in a broken way and requires you to buy licenses for every logical processor, even though you won't get nearly as much processing power as you would if you really had that many physical processors. But rather than fix this bug, we're going to solve the problem by making you buy .NET, which counts processors correctly. So either way, if you're going to use hyperthreading, expect to send us more money.
Hyper(Space)Threading (Score:5, Funny)

by LookSharp ( 3864 ) writes: on Tuesday January 14, 2003 @04:12PM (#5082758)

If you overclock the Xeons (And newer P4 CPUs) too high...

"Prepare to go to HyperThread."

"Go to HyperThread!"

*WHOOSH*

"My God, they've gone plaid!"

(Just to keep on topic, this is a very informative shootout between HT/non-HT Intel and AMD SMP processors setups here. [gamepc.com])

Just couldn't resist the Spaceballs reference, tho!

Share
twitter facebook
- Re:Hyper(Space)Threading (Score:5, Funny)
  
  by The Evil Couch ( 621105 ) writes: on Tuesday January 14, 2003 @04:56PM (#5083141) Homepage
  
  so you're saying AMD's response is going to be to go to LudiciousThreading?
  
  Parent Share
  twitter facebook
  - Re:Hyper(Space)Threading (Score:2)
    
    by Scooter ( 8281 ) writes:
    
    Well, it aint like dusting crops.
Executive summary... (Score:3, Informative)

by guido1 ( 108876 ) writes: on Tuesday January 14, 2003 @04:13PM (#5082771)

Hyperthread support vs not.

Standard API calls (w/ hyper thread) Increase (a bad thing (tm)) of latency of calls by 1-6%.

STD workload (w/ hyper thread) Increase in throughput an average of 5-10%. Disk writes decreased throughput by 30%.

Client network perf: "Chat room" test, increase of throughput 22-28%.

Server network perf: File serving, increase of 9-31%.

Kernal 2.5.24 roughly doubles the above benefits.

Looks like no real downfalls... (How often are you running a single thread? Me either.)

Share
twitter facebook
In other news... (Score:3, Insightful)

by dirvish ( 574948 ) writes: <dirvish.foundnews@com> on Tuesday January 14, 2003 @04:15PM (#5082792) Homepage Journal

Hyper-Threading Speeds Windows

Share
twitter facebook
But... (Score:2)

by suwain_2 ( 260792 ) writes:

...I don't understand how this helps. I'm typing this on a Dual 1.4 GHz system -- even if a process is multi-threaded, it's still not as fast as a 2800 MHz processor. In addition, many programs can't take advantage of SMP, rendering dual processors 'useless' (for any single process; Linux distributes processes across processors.)

So if 2*1400 1400? Shouldn't taking, say, the 3 GHz P4 and 'emulating' SMP actually slow things down slightly? I don't understand how it can help, and am actually surprised that it doesn't *hurt* speedwise.
- Re:But... (Score:2, Insightful)
  
  by stratjakt ( 596332 ) writes:
  
  It doesn't 'emulate' SMP, it actually performs two operations at the same time by splitting the instruction pipeline in half (well not in half, it varies as to how much pipeline each 'cpu' gets). It's not as good as SMP for various reasons, mostly boiling down to the two threads sharing the rest of the chip.
  
  It does 'hurt' sometimes, but it's usually negligable, and you have to pretty much go out of your way to design code that would run slower - such code can 'hurt' traditional SMP systems as well.
  
  I'm sure there will be plenty of cooked benchmarks for fanboys to rant about in the future, just like there are between 3DNow! and MMX/SSE/2..
  
  It is a cool development, and *can* be shut off if it's only hindering your system (ie; you're running Windows 98 or a linux kernel with no HT support - and thus wasting pipeline to a 'CPU' that isn't used)
- Re:But... (Score:2)
  
  by be-fan ( 61476 ) writes:
  
  Look at it this way. The CPU has a bunch of execution units on it. The P4, specifically, has two arithmatic units, two FPUs, and some other stuff. Since threads usually don't use all these units optimally, some are wasted. A second simultanious thread might be able to use the otherwise unused units, and thus the overall performance of the two threads combined increases.
Benefits!? (Score:2)

by caluml ( 551744 ) writes:

* 128-byte lock alignment
* Spin-wait loop optimization
* Non-execution based delay loops
* Detection of Hyper-Threading enabled processor and starting the logical processor as if machine was SMP
* Serialization in MTRR and Microcode Update driver as they affect shared state
* Optimization to scheduler when system is idle to prioritize scheduling on a physical processor before scheduling on logical processor
* Offset user stack to avoid 64K aliasing

Is that all?! I hoped it'd do the post-integer-supercooled-re-automation-longterm-bu zzword-cipher-reallignment too. That's something new that you guys haven't heard of yet ;)
In Other News. (Score:4, Funny)

by OS24Ever ( 245667 ) writes: <trekkie@nomorestars.com> on Tuesday January 14, 2003 @04:28PM (#5082909) Homepage Journal

Faster clock speed processors speed up Linux.

Share
twitter facebook
Proof Bogomips are bogus! (Score:4, Funny)

by Zathrus ( 232140 ) writes: on Tuesday January 14, 2003 @04:35PM (#5082965) Homepage

As if there wasn't enough already...

processor : 0
bogomips : 3191.60
processor : 1
bogomips : 3198.15

According to that the logical processor is actually faster than the physical one! Just think of what you could wind up with if you instantiated a logical CPU on the logical CPU!

Share
twitter facebook
SMP HT (Score:2)

by DrSkwid ( 118965 ) writes:

Anyone know of any details around SMP versions of HT CPUs. It's not a very google friendly set of search terms.

I expect that there would be a performance difference if the scheduler knew which were real cpus and which were half of an HT pair.

Even flags to fork concerning which processor to fork to. i.e. --this_cpu_but_different_HT_CPU
Because you might want the freedom to attempt to reduce the in-CPU cache misses and the like.

Likewise the the implmentation of Process Groups - setpgid() [die.net] warrants investigation.
Technical Summary (Score:5, Insightful)

by 0x69 ( 580798 ) writes: on Tuesday January 14, 2003 @04:48PM (#5083083) Journal

If you're running code that's efficient on a P4 (few mis-predicted branches, low cache miss rate, good parallelism, etc.) then HT is pretty much useless.

If you're running code that's inefficient on a P4 (which pays for its high GHz with long pipelines, large latencies, a slow decode stage, and several other drawbacks), then HT can usually paper over a fair percentage of these problems. But remember that HT requires OS support, may require application support, and "your mileage will vary".

Share
twitter facebook
- Re:Technical Summary (Score:2)
  
  by iggymanz ( 596061 ) writes:
  
  but what about 2 unrelated apps runninng at the same time? Not everyone runs just one heavy program.
- Re:Technical Summary (Score:3, Interesting)
  
  by cartman ( 18204 ) writes:
  
  What you said was false.
  
  Take the example of database & OLTP applications. Database transactions are heavily dependent on repeated access to RAM. Virtually no database is small enough to fit into cache, and there is often little regularity in which data is accessed. Memory latency will REQUIRE a non-SMT processor to wait IDLY each time there is a memory latency, which takes >100 proc cycles on a modern CPU. This has NOTHING to do with he p4 architecture or long pipelines.
  
  "But remember that HT requires OS support, may require application support..."
  
  HT does not require OS support as long as the OS is capable of recognizing more than 1 CPU. Any threaded app can benefit from HT.
Expensive HT or cheap real SMP? (Score:5, Insightful)

by ponos ( 122721 ) writes: on Tuesday January 14, 2003 @04:54PM (#5083115)

In Europe P4 3.0 with HT costs ~745 euro (+tax)
An Asus A7M for dual Athlon costs ~260 euro (+tax)
Two Athlon XP 2200+ cost ~340 euro (+tax).
Alternatively you can get two Athlon MP 2000+ for
roughly the same money (if you don't trust the
XPs).

Now, please explain to me why would someone
with real SMP needs in mind (and NOT games)
consider the P4 with HT.

P.

P.S. I understand that the prices in the US are
different, but still, it is VERY expensive.

Share
twitter facebook
1.6GHz? (Score:2)

by WasterDave ( 20047 ) writes:

This is fine, I guess, if you're going to run a processor as slow (!) as this. Point being that a hyperthreaded system will place greater demands on the ram bandwidth.

With a slow processor they may be using 80% of the available bandwidth instead of 60% with HT switched off. Upping to processor speed to ... say 3GHz, where HT is enabled in vanilla P4's ... and we can expect to see the memory bandwidth being toasted continuously. Under these conditions I doubt we would see a speedup at all, and quite possibly the reduced cache efficiency would reduce it.

Executive Summary: Can we do this again with a non-Xeon P4 3GHz?

Dave
The price comparison to 2 x Athlon-MP? (Score:2)

by wytcld ( 179112 ) writes:

If the results are similar to running SMP with two processors (and they look roughly similar), isn't a system with 2 Athlon-MPs still cheaper for a given performance level?
License issues? (Score:2)

by Stonent1 ( 594886 ) writes:

Don't companies like (guessing) Oracle charge by how many processors you use with their software? I know for solaris (even intel) you are licensed by how many cpus you can use. (Just like windows I guess, 1, 2, 4, 8+ cpus)

Also since XP Home is only single processor capable where does that leave the home users that buy 3.x Ghz computers? Surely it wouldn't be long before someone figures out how to swap a multiprocessor HAL into XP Home...
HT is not single-chip SMP (Score:2)

by TeknoHog ( 164938 ) writes:

There are too many posts here asking how HT compares with SMP. Correct me if I'm wrong, but isn't HT quite a lot different:
To simplify greatly, if the CPU has separate units for integer and floating-point math (for example), Hyperthreading means you can use these units in parallel. Therefore, HT will not speed up pure integer or pure FP math, like SMP would. It will only speed up things if you run different kinds of process simultaneously.
Also, many people have noted that HT sometimes slows things down a bit. I don't find this very surprising because the OS needs more work to organize things for HT, but it may not have more CPU resources than a non-HT version.
Personally, I think HT is a good idea because it's using the existing hardware more efficiently in a true hacker spirit. However, it's nowhere near proper SMP.
- Re:HT is not single-chip SMP (Score:3, Insightful)
  
  by SpinyNorman ( 33776 ) writes:
  
  I don't believe that's correct.
  
  As I understand it HT can indeed speed up pure integer code (or more generally code that's competing for a single CPU resource). HT will allow another thread to exceute if the current one is waiting on anything from pipeline results to memory access. I believe that modern CPU/memory speed disaparity was one of the driving forces behind it - if one thread gets a cache miss then another may be able to continue executing rather than having to sit idle waiting for main memory.
Hyperthreading and memory access (Score:5, Informative)

by cartman ( 18204 ) writes: on Tuesday January 14, 2003 @07:45PM (#5084322)

One of the major impediments to increasing CPU performance has been increasing memory latency. Memory latency has grown worse as CPUs have gotten faster. Accessing RAM will now cause a >150 cycle latency, during which the processor sits IDLE.

Cache only partly mitigates this problem. Some applications, such as databases and OLTP, are heavily dependent on repeatedly accessing non-cached RAM. There is no way to cache all the relevant data, since virtually all databases are larger than can fit in any present cache, no matter how large, and there is sometimes no way to predict which data will be accessed. ALL of these applications have CPUs that spend much of their time being IDLE, waiting for memory to be returned.

SMT (hyperthreading) allows the processor to perform useful work during these otherwise idle periods, by allowing the cpu to switch to a thread that is not blocked on memory access. The "idle bubbles" in the execution pipeline can therefore be "filled in" by useful work that advances the state of relevant programs.

SMT can cause a degredation in performance beceause it can lead to "cache thrashing." In an SMT-naive kernel, two unrelated threads could be scheduled for the same physical CPU. These unrelated threads will likely share very little code or data. The two threads will therefore "compete" for the single shared cache, with each thread's data being repeatedly displaced by the other's.

This difficulty can be substantially mitigated by making the kernel aware of "virtual processors," and by implementing scheduleing algorithms to minimize the impact. The performance of hyperthreading will likely improve as kernels are better able to exploit it.

Share
twitter facebook
What is the margin of error? (Score:2)

by rufusdufus ( 450462 ) writes:

It really bugs me when I see benchmark numbers relied upon when they have not been presented as statistically significant.
Whenever you run a benchmark, you MUST run it multiple times and do the proper statistical calculations for standard deviation.
It is NOT VALID to do one run, and it is NOT VALID to average a bunch of runs without knowing what the deviation is.
Some times a benchmark's time will vary by more than 100%. Sometimes the reasons are valid, sometimes they are because of an error in the benchmark.
Without this sort of validation, the numbers presented should not be trusted.
SMT will become increasingly important (Score:5, Interesting)

by cartman ( 18204 ) writes: on Tuesday January 14, 2003 @08:11PM (#5084482)

SMT (hyperthreading) will become increasingly important when processors are able to execute more than 2 threads simultaneously.

This development is inevitable. Previously, each new processor generation was faster than the prior one at a given clock rate, because each new processor core had more execution units, and was therefore able to perform more work in parallel. This trend abruptly ended recently, for one reason: there is no more instruction-level parallelism (ILP) to exploit. It is impossible for a processor to look at a thead of execution and find more than a few instructions to execute in parallel.

The only parallelism left to exploit is THREAD-LEVEL parallelism (TLP). Therefore the only way to continually increase performance is to increase the number of threads that a CPU can execute in parallel. This requires two modifications to CPU cores: first, increase the number of thread contexts per CPU, and second, increase the number of pipelines to which those threads can be dispatched.

With the P4, it would be pointless to have more than 2 thread contexts, because there aren't enough CPU resources lying idle to execute more than 2 threads. But future CPUs could make use of more than 2 thread contexts by having enough CPU resources to execute all of them. Future CPUs could have 20 execution units or more, which would be enough to execute several threads. Remember that the number of transistors per CPU continues to increase exponentially.

It's easy to forsee a time when processors have 20 execution units (10 integer, 10 fp) and 4 thread contexts, offering more than triple the performance of a non-SMT cpu. In the future, non-SMT CPUs will make as little sense as a non-superscalar CPU would today.

Share
twitter facebook
- Re:excellent (Score:2)
  
  by pVoid ( 607584 ) writes:
  
  You are quite the Lame one.
  if you had read the article, you would have seen that the kernel doesn't show too many signs of superb HT usage. In fact, performance degrades in many places.
  Also, if you knew just an itsy bit about kernels, you would know that Microsoft has done some pretty good advancements and achievements in the SMP realm.
- Re:excellent (Score:2, Informative)
  
  by stratjakt ( 596332 ) writes:
  
  Yeah
  
  SMP support has existed since NT 4.
  
  If you use NT 4 MP edition, 2k Pro or XP pro, HT just works if you have the hardware.
  
  Linux had to change to accomodate it, as it bypasses the original system BIOS with it's own code.
  
  So what you meant to say was "once again Linux plays catchup to MicroSoft, but only about a year or so later this time, and not 5-10."
  - Re:excellent (Score:5, Informative)
    
    by Russ Steffen ( 263 ) writes: on Tuesday January 14, 2003 @04:56PM (#5083137) Homepage
    
    Holy intellectual dishonesty, Batman!
    
    NT and Windows 2000 do not support HT and never will. NT will not becuase it's been end-of-lifed, and Windows 2000 will not because of Microsft policy. On a 2-CPU system with HyperThreading, NT and Windows 2000 will think they have real 4 CPUs (unsurprisingly, this is what a pre-HT version of Linux will see as well). HT support means the OS knows that it has, in this example, 2 real CPUS and 2 fakes, and the scheduler will weight the real CPUs accordingly.
    
    XPPro SP1 is the first, and only shipping version of Windows to support HT.
    
    Parent Share
    twitter facebook
    - Re:excellent (Score:3, Informative)
      
      by zdarnell ( 16295 ) writes:
      
      Any SMP capable operating system supports HT. On the other hand license issues make combining true SMP and HT a pain on non server version of Windows.
      
      You're totally missing that part of the beauty of HT is the transparency.
      
      On the other hand you can write things SPECIFICALLY for HT to deal with things such as cache issues, but saying that windows doesn't support it at all is rather misleading and makes it seem like people wouldn't see any improvements at all.
  - - - Re:excellent (Score:2)
        
        by benzapp ( 464105 ) writes:
        
        Wow. Its posts like this that really make me feel old at 25.
        
        They had 486 SMP systems. In fact there was an awesome upgrade that came out like ten years ago that let you put two 486 processors in one socket. Of course you needed the clearance for it. SMP was actually all the rage ten years ago for the same reason PowerPC was all the rage. Intel had a hard time scaling, so one of the solutions was to us multithreading and divide up the work.
        
        OS/2 2.11 SMP was out in 1993, and NT 3.1 came out shortly thereafter. Both supported SMP. The Pentium Pro, which came out in early 1995 was highly optimized for 32-bit code and multiprocessing. 4 and 8 way Pentium Pro boards existed. And were somewhat common.
        
        If anything, SMP is LESS common today. When was the last time you saw a 4-way SMP board for sale anywhere? You could easily get them back then. The reason its less common today is processors really are a lot faster. Intel is doing this Hyperthreading crap because they know that orders of magnitude performance gains are a thing of the past, so multithreading is the key.
        
        Of course, us old OS/2 fanatics were saying this ten years ago.
        
        Re:excellent (Score:2)
        
        by Listen Up ( 107011 ) writes:
        
        Incorrect, OS/2 was SMP since 2.1. The OS/2 SMP model is still known to be one of the best SMP models to have ever been written. Click on this link http://www.byte.com/art/9406/sec11/art2.htm [byte.com] and learn something about OS/2 SMP (oh geez, it's 1994) and SMP in general.
- Re:Underwhelmed (Score:3, Informative)
  
  by pVoid ( 607584 ) writes:
  
  Compilers are notoriously single threaded monsters.
  On moft platforms, CL.exe goes file per file, and outputs. It's a linear opertation. So HT for compiling makes no difference.
  However, the NT DDK has a cool feature that allows you to spawn as many instances of CL as there are processors. Which you guessed it, is only of any use if you are compiling tens/hundreds of files.
  Sorry, I do not really know of compiler internals for *NIX. Maybe someone can back me up? or clear it up?
  - Re:Underwhelmed (Score:3)
    
    by Xerithane ( 13482 ) writes:
    
    Sorry, I do not really know of compiler internals for *NIX. Maybe someone can back me up? or clear it up?
    
    With gcc, the -j will setup gcc to utilize SMP. You specify the number of processors you physical have. I do not know how it would work with HT, and I didn't RTFA to see if they covered it. There is native support inside of gcc for SMP-based compiling though.
    - Re:Underwhelmed (Score:2)
      
      by JoeBuck ( 7947 ) writes:
      
      -j is an option for GNU make, not gcc. And there is no rule that says you must specify the number of processors you physically have; for big compiles, you'll get a somewhat better time if you say -j2 on a single-processor machine. This is because, when two gcc's run in parallel, one can take the processor while the other is waiting for disk.
      There is no native support inside of gcc for SMP-based compiling. gcc itself is completely sequential. You are perhaps thinking of parallel makes.
    - Re:Underwhelmed (Score:2)
      
      by danish ( 60748 ) writes:
      
      With gcc, the -j will setup gcc to utilize SMP. You specify the number of processors you physical have. I do not know how it would work with HT, and I didn't RTFA to see if they covered it. There is native support inside of gcc for SMP-based compiling though.
      
      Um, no, not quite. You pass the -j option to make. make will then go through your makefile, and assuming you wrote it right, run specified commands (like gcc) in parallel. You have to be careful about target dependencies when doing this, though. And this parallelization is even useful on uniprocessor machines, as if you use make -j2 you will get some gain in time in a big compile because while one gcc is doing I/O, the other can be using the CPU and compiling.
      Just to be a pedant,
      
      -chris
    - Re:Underwhelmed (Score:2)
      
      by hackstraw ( 262471 ) writes:
      
      Maybe this is a feature of newer releases of gcc, but I've never heard of -j doing auto SMP. There is a -j option for parallel makes with gnu make, but this is only for the compilation and not runtime.
      
      The portland group compiler [pgroup.com] and the intel compiler [intel.com]. Do support some auto-parallalization via openmp [openmp.org] and threads.
    - Re:Underwhelmed (Score:2)
      
      by cant_get_a_good_nick ( 172131 ) writes:
      
      Typo, I assume you meant GNUmake -j, not gcc.
    - Re:Underwhelmed (Score:2)
      
      by sholden ( 12227 ) writes:
      
      Gcc has no -j option. Make has a -j option.
      
      Which has nothing to do with SMP, it's simply how many jobs make will run simultaneously, which of course is a wise thing to use in a multi processor environment, but also a good thing to use in places where the CPU waits for IO (ie. if your code and output is stored on a disk).....
- Re:Underwhelmed (Score:2, Insightful)
  
  by stratjakt ( 596332 ) writes:
  
  It's not so great, if you need SMP you still cant beat two or more physical CPUs.
  
  In this scheme, the pipeline is split into two and two concurrent threads run in it. Which is pretty neat, but hurts performance in some situations.
  
  - Cache latency is basically doubled, as two VCPUs now fight over access to the cache
  
  - Pipeline depth is shortened for either given VCPU, which hurts code that was optimized for the longer pipelines (lots of matrix math, MMX stuff).
  
  It's a cool development in CPU design, but it has a ways to go, and the OS needs to be aware of it. You should be able to 'shut it off' in code on the fly, if you want to dedicate 100% real CPU to a given task.
  - Re:Underwhelmed (Score:2, Insightful)
    
    by dcmeserve ( 615081 ) writes:
    
    > Cache latency is basically doubled, as two VCPUs now fight over access to the cache
    
    I'm pretty sure this is wrong -- cache latency isn't doubled; the SIZE is HALVED. The two threads access two different virtual caches. Trying to get them to contend for a single cache would be an architectural nightmare.
    
    Though I believe it's still one physical cache -- which means that the latency is going to be higher than what you'd expect for a cache of its apparent size.
    
    > Pipeline depth is shortened for either given VCPU, which hurts code that was optimized for the longer pipelines (lots of matrix math, MMX stuff).
    
    I don't actually know about the pipeline, but I suspect this is wrong too: shortening the pipeline (reducing the number of stages) is a fundamental change in the architecture; a pipeline isn't something you can cut in half and give the front end to one process and the back end to another. Each stage is quite unique.
    
    Now if you mean that the latest Pentiums have a shorter pipeline than previous incarnations, then maybe that's right (though I'd doubt it -- they're always *lengthening* the pipeline to get those higher GHz numbers). But that would have nothing to do with Hyperthreading.
- Re:Something else to think about... (Score:2)
  
  by cant_get_a_good_nick ( 172131 ) writes:
  
  As opposed to MS OSes that are also sped up by this? If anything, it's more of a threat to other UNIX vendors where Linux and other x86 based free Unices get better performance/cost ratios.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:5, Funny)

Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:5, Informative)

Win2K 2 CPU == 1 HT CPU ?? (Score:2)

Re:Win2K 2 CPU == 1 HT CPU ?? (Score:2)

Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:2, Interesting)

Re:Also the Pentium 4 - 3 Ghz is hyperthreaded. (Score:3, Insightful)

What's really cool also (Score:4, Interesting)

Re:What's really cool also (Score:2, Interesting)

Re:What's really cool also (Score:4, Funny)

Re:What's really cool also (Score:2)

Re:What's really cool also (Score:2)

Re:What's really cool also (Score:3, Informative)

Re:What's really cool also (Score:4, Interesting)

Re:What's really cool also (Score:2)

Fundamental mistake (Score:5, Insightful)

Re:Fundamental mistake (Score:2)

Re:Fundamental mistake (Score:2, Interesting)

Re:Fundamental mistake (Score:5, Interesting)

Re:Fundamental mistake (Score:2)

Re:Fundamental mistake (Score:2)

Imagine a Beowulf Cluster of these (Score:5, Funny)

But the real question... (Score:3, Interesting)

Re:But the real question... (Score:2, Insightful)

Re:But the real question... (Score:5, Informative)

Re:But the real question... (Score:3, Interesting)

Re:But the real question... (Score:2)

Wrong about XP (Score:2, Informative)

good stuff (Score:5, Insightful)

Re:good stuff (Score:2)

Re:good stuff (Score:5, Insightful)

Re:good stuff (Score:2)

It depends... (Score:2)

Re:good stuff (Score:2)

Re:good stuff (Score:3, Informative)

What are you talking about? (Score:2, Interesting)

It's just you (Score:5, Insightful)

Re:What are you talking about? (Score:2)

Re:What are you talking about? (Score:5, Interesting)

Re:What are you talking about? (Score:2)

Re:What are you talking about? (Score:2)

Re:What are you talking about? (Score:2)

Re:What are you talking about? (Score:3, Informative)

Only Threads ? (Score:2, Insightful)

Re:Only Threads ? (Score:2)

51% speed-up! (Score:5, Interesting)

Re:51% speed-up! (Score:5, Informative)

Re:51% speed-up! (Score:2)

Re:51% speed-up! (Score:3)

Re:51% speed-up! (Score:2)

Re:51% speed-up! (Score:2)

Application dependant (Score:5, Insightful)

Concurrency benefits from HT (Score:2)

HT hurt perf (Score:5, Interesting)

Re:HT hurt perf (Score:3, Informative)

Re:HT hurt perf (Score:2)

Re:HT hurt perf (Score:2)

HT Raises performance, depending. (Score:2)

Useful for development? (Score:5, Insightful)

Humph! (Score:4, Funny)

Summary (Score:3, Insightful)

Hyper(Space)Threading (Score:5, Funny)

Re:Hyper(Space)Threading (Score:5, Funny)

Re:Hyper(Space)Threading (Score:2)

Executive summary... (Score:3, Informative)

In other news... (Score:3, Insightful)

But... (Score:2)

Re:But... (Score:2, Insightful)

Re:But... (Score:2)

Benefits!? (Score:2)

In Other News. (Score:4, Funny)

Proof Bogomips are bogus! (Score:4, Funny)

SMP HT (Score:2)

Technical Summary (Score:5, Insightful)

Re:Technical Summary (Score:2)

Re:Technical Summary (Score:3, Interesting)

Expensive HT or cheap real SMP? (Score:5, Insightful)

1.6GHz? (Score:2)

The price comparison to 2 x Athlon-MP? (Score:2)

License issues? (Score:2)

HT is not single-chip SMP (Score:2)