IBM Software Linux

Linux Gains Support for NUMA

soosterh writes "CNet has an article about a NUMA patch from IBM. It says that the improvement adds some support in Linux for nonuniform memory access, or NUMA, a design for higher-end servers with many processors. Linus Torvalds, the original creator of the operating system and still its top authority, accepted the update this month into version 2.5, the current test version of the software."
  • I thought that I'd seen some other NUMA stuff in previous runs of 'make menuconfig'. Can anyone explain what's already there and what this patch adds?
  • Will this help with both 32-bit and 64-bit desktop platforms? It seems that in the future there won't be much distinction between the current server machines and desktops...
    • Re:32/64 (Score:3, Informative)

      by larien ( 5608 )
      I'd imagine it's mainly for 64-bit as that's the kind of systems which tend to ship with NUMA (usually with MIPS or Itanium). Without knowing more, I couldn't comment as to whether it will work under 32-bit or not, but I can't see how it would be so limited.

      Also, I seriously doubt if any desktop machine will use NUMA; it's primarily about systems which use system boards, where there are CPUs & RAM on a board which slots into the system & a CPU can access memory on a local board faster than that on other boards. Desktops tend to use one "system board" (i.e. the motherboard) so there isn't the difference in speed for accessing the data.

      • Re:32/64 (Score:3, Informative)

        by hansendc ( 95162 )
        I'd imagine it's mainly for 64-bit as that's the kind of systems which tend to ship with NUMA (usually with MIPS or Itanium). Without knowing more, I couldn't comment as to whether it will work under 32-bit or not, but I can't see how it would be so limited.

        That is an incredibly naive comment. NUMA systems have been around for quite a while (think Sequent), and the current generation of IBM x440 machines are NUMA. These are all 32-bit Intel architectures.

        This patch didn't even address memory, it only dealt with scheduling processes anyway.
      • The MIPS/Itanium systems the parent refers to are (I assume) the SGI Origin and Altix multiprocessor servers, both 64-bit; the first runs MIPS/IRIX, the second Itanium/Linux:

        Origin [sgi.com]

        Altix [sgi.com]
  • Imagine a beowulf... (Score:4, Informative)

    by Gordonjcp ( 186804 ) on Friday January 31, 2003 @03:01AM (#5194803) Homepage
    Seriously, this is something that will close one of the last remaining gaps between Linux and Solaris. Not that it will do much good for 99% of users out there, but if you need this, you *really* need it.
    • And close the gap between Linux and AIX - which is interesting when you consider:

      A) Where this patch has come from
      B) What the guts of the p690 look like
      C) What the guts of the x440 look like

      Death of AIX predicted. Film at 11.
  • And AMD... (Score:5, Informative)

    by addaon ( 41825 ) <addaon+slashdot@nOsPAM.gmail.com> on Friday January 31, 2003 @03:02AM (#5194808)
    And, of course, also support for the Hammer architecture, which is (smaller-scale) NUMA. Each processor in an x86-64 system has its own memory bus, so the time to access memory depends on whether the memory is directly connected to a given processor or whether another processor needs to mediate, which is the definition of NUMA.
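    To make the "depends on whether the memory is directly connected" point concrete, here is a toy Python model. It assumes a hypothetical two-socket box with 1 GiB behind each processor, laid out as contiguous per-node address ranges; `home_node` and `is_local` are illustrative names, not any kernel interface.

```python
# Toy model of a Hammer-style (x86-64) NUMA box: each processor owns the
# memory on its own bus. Node count and per-node size are hypothetical
# values chosen for illustration, not real hardware specs.
NODE_MEM_BYTES = 1 << 30  # assume 1 GiB attached to each processor
NUM_NODES = 2             # assume a two-socket system


def home_node(phys_addr: int) -> int:
    """Return the node whose memory controller owns phys_addr,
    assuming memory is laid out as contiguous per-node ranges."""
    node = phys_addr // NODE_MEM_BYTES
    if not 0 <= node < NUM_NODES:
        raise ValueError("address outside installed memory")
    return node


def is_local(cpu_node: int, phys_addr: int) -> bool:
    """True if the access goes straight to this CPU's own memory bus;
    False if another processor has to mediate (the non-uniform case)."""
    return home_node(phys_addr) == cpu_node


print(is_local(0, 0x1000))          # first GiB lives on node 0
print(is_local(0, NODE_MEM_BYTES))  # second GiB lives on node 1
```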
  • OK, got a noob RAM question then. Does NUMA allow upgrading total RAM beyond the original specs? Any sort of add-ons? I ask because I noticed it's used partly because of the physical distances over which it can access memory.

    thanks in advance to yon knowledgeable ones
    • I don't think so. What you're asking is (for example) can you now plug a 512 meg module into a 256 meg limited mobo, yep? The answer is no, as this limit is defined by the mobo, though it may be hackable in other ways - but that would be even more OT. NUMA is about accessing large sets of parallel memory banks, as others will expound at length elsewhere in the discussion. Essentially, it's of no practical use for playing Tux Racer, so it's OK to ignore it.
      • If the NUMA patch can handle latencies on the order of a few milliseconds, you might be able to use this to safely fool your kernel into thinking you have 120 gigs of RAM (from storage).

        Of course, actually doing this would involve jumping through a few hoops, and I have to think hard to come up with situations why this would be the way to go.

        Gees, any MB I've bought in the last several years can take more RAM than I'm willing to buy for it. I wonder what kind of memory limits the OP was asking about.

        • Re:ram question then (Score:4, Informative)

          by larien ( 5608 ) on Friday January 31, 2003 @07:55AM (#5195362) Homepage Journal
          Er, why use a hard drive as RAM when you can just add loads of swap space? The VM will handle that space more efficiently if it knows it's hard disk rather than RAM.

          However, the main way you might be able to add RAM over and above the MB limit is via some kind of PCI card with DIMMs on it. I'm not sure how that would work over PCI (even 66MHz/64bit) or how it would work at a lower level, but it might get around some limits. The limits the OP was asking about may be of the order of trying to get over 1GB of RAM for some simulation code. Of course if you need over 1GB of RAM, buy a system which supports it.
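          For a rough sense of the ceiling such a card would hit, the theoretical peak of a PCI bus is simply clock rate times bus width. This sketch uses the nominal 66 MHz / 64-bit figures from the comment and ignores arbitration, address phases and wait states, so real sustained throughput would be lower.

```python
def pci_peak_bytes_per_sec(clock_hz: float, bus_width_bits: int) -> float:
    """Theoretical peak: one bus-width transfer per clock cycle.
    Ignores protocol overhead, so real throughput is lower."""
    return clock_hz * bus_width_bits / 8


peak = pci_peak_bytes_per_sec(66e6, 64)
print(f"{peak / 1e6:.0f} MB/s")  # 66 MHz x 8 bytes per transfer
```

          With the nominal figures this works out to 528 MB/s per bus, shared by every transfer to or from the hypothetical RAM card.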

          In any event, from what people are saying, the NUMA patch is a change to the scheduler, to ensure that processes run on the CPU nearest the RAM bank storing the data. I don't think it addresses trying to add RAM from other sources (either disk or hypothetical PCI card)
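          A minimal sketch of that placement idea, assuming the scheduler knows which node each of a process's pages lives on and how long each CPU's run queue is. `preferred_node`, `pick_cpu` and the example topology are hypothetical illustrations, not code from the actual patch.

```python
from collections import Counter


def preferred_node(page_nodes):
    """Node holding most of a process's pages; ties go to the lowest node id.
    page_nodes: one node id per resident page (hypothetical input)."""
    counts = Counter(page_nodes)
    return min(counts, key=lambda n: (-counts[n], n))


def pick_cpu(page_nodes, cpu_to_node, runqueue_len):
    """Least-loaded CPU on the preferred node (assumes that node has a CPU)."""
    node = preferred_node(page_nodes)
    candidates = [cpu for cpu, n in cpu_to_node.items() if n == node]
    return min(candidates, key=lambda cpu: (runqueue_len[cpu], cpu))


# Hypothetical 2-node, 4-CPU box: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
runqueue_len = {0: 0, 1: 2, 2: 3, 3: 1}
pages = [1, 1, 0, 1]  # most of the process's memory is on node 1
print(pick_cpu(pages, cpu_to_node, runqueue_len))
```

          Here CPU 3 is chosen even though CPU 0 is idle, because the process's memory lives on node 1. Deciding when that trade-off stops paying off is the cross-node balancing question that a real scheduler also has to answer.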

          • Oh, my god!!! You've just reinvented EMS in 32-bit systems


            Remember those good ol' days?

    • No. You're still limited by the 36-bit PAE addressing scheme on initial IA-32 processors, as all memory is universally addressable, albeit at different speeds.
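      The address-width arithmetic behind that limit, as a quick check: 32-bit physical addresses reach 4 GiB, while PAE's 36 bits reach 64 GiB, and since all memory is addressable from every CPU, that ceiling covers the memory of all nodes combined.

```python
GIB = 2**30


def max_physical_gib(address_bits: int) -> int:
    """Physical memory reachable with the given number of address bits, in GiB."""
    return 2**address_bits // GIB


print(max_physical_gib(32))  # plain IA-32 physical addressing
print(max_physical_gib(36))  # IA-32 with PAE
```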

      We do end up with lots of PCI buses, etc. With careful programming, this gives you a shitload of IO bandwidth.

      Martin J. Bligh.
  • NUMA (Score:5, Informative)

    by Anonymous Coward on Friday January 31, 2003 @03:09AM (#5194831)
    NUMA [webopedia.com]
    Short for Non-Uniform Memory Access, a type of parallel processing architecture in which each processor has its own local memory but can also access memory owned by other processors. It's called non-uniform because the memory access times are faster when a processor accesses its own memory than when it borrows memory from another processor.

    NUMA computers offer the scalability of MPP and the programming ease of SMP.
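    A back-of-the-envelope way to see why locality matters: if a fraction f of accesses hit local memory, the average access time is f * t_local + (1 - f) * t_remote. The latencies below are made-up placeholder numbers, not measurements of any real machine.

```python
def average_access_ns(local_fraction: float, t_local_ns: float, t_remote_ns: float) -> float:
    """Expected memory access time when local_fraction of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns


# Placeholder latencies (hypothetical): remote access 3x slower than local.
for f in (1.0, 0.9, 0.5):
    print(f"{f:.0%} local -> {average_access_ns(f, 100, 300):.0f} ns")
```

    With these made-up numbers, dropping from 100% to 90% local accesses raises the average from 100 ns to 120 ns, which is why a NUMA-aware OS tries to keep work near its memory.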
  • by Screaming Lunatic ( 526975 ) on Friday January 31, 2003 @03:18AM (#5194847) Homepage
    I thought there was a feature freeze. There must have been some NUMA code in the kernel already and this cleans up all the loose ends.

    Someone correct me if I'm wrong.

    • by greppling ( 601175 ) on Friday January 31, 2003 @03:27AM (#5194872)
      The NUMA-aware scheduler was merged recently despite the feature freeze. The patch was considered non-intrusive (and safe for non-NUMA architectures). Feature freeze is not code freeze.

      See the good discussion in the LWN article [lwn.net] on this topic.

      • Sort some snow? Eat ice cream too fast? At any rate, you've experienced BRAIN FREEZE.
      • The article also says that "The NUMA software is being worked on and refined in version 2.5, the testing ground for 2.6."
  • NUMA? (Score:2, Funny)

    This article says the code was submitted by Martin Bligh, not Dirk Pitt [google.com].

    Clearly it's a typo. I'll have to e-mail Clive about this.
  • by awptic ( 211411 ) <infinite@@@complex...com> on Friday January 31, 2003 @03:33AM (#5194883)
    Not only is this beneficial for large computers, but also for smaller SMP systems with hyperthreading. On CPUs with hyperthreading, it's often faster for a process to reside on the same physical CPU, but not necessarily the same 'virtual' CPU, when accessing memory. And a lot of 8-way+ systems are NUMA whether or not they are advertised as such.
    • But those virtual CPUs are on the same physical chip with the same cache state so how can their distance to memory be different?
        • That's just it. Imagine a two-way SMP system with Hyperthreading. That's four virtual CPUs: two of them share one chip's local Level 1/Level 2 cache, while the other two share their own chip's local Level 1/Level 2 cache. Hence: NUMA...

        Of course, a single-CPU HT system has no need for this.
      • But those virtual CPUs are on the same physical chip with the same cache state so how can their distance to memory be different?

        Not different from each other, different from the other virtual CPUs, on other chips.

        If you have two threads, both accessing the same chunk of memory, think about these two options:

        1. Put both threads on different virtual processors of the same chip. That block of memory gets cached once, and both threads get cache hits.

        2. Put each thread on a different physical chip. Now, that block of memory is being cached by both chips at once, which means lots more bus traffic for each modification (each chip needs to tell the other what just happened); meanwhile, both threads are 'fighting' for cache with threads from other programs.

        Clearly, the first is better (more efficient) from a memory point of view: more data cache hits, less traffic between CPUs as they invalidate each other's cache lines. Similarly, it's better for the I-cache (instruction cache): you only cache and execute a single set of instructions, because both threads will tend to use the same code.
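        The bus-traffic half of that argument can be illustrated with a deliberately simplified model: one shared cache line, two threads taking turns writing it, and one cross-chip invalidation whenever the writer sits on a different chip from the previous writer. It ignores cache capacity entirely, and `cross_chip_invalidations` is a hypothetical helper, not a real coherence protocol.

```python
def cross_chip_invalidations(writes, thread_to_chip):
    """Count writes that must invalidate a copy held by another chip.
    writes: sequence of thread ids writing one shared cache line.
    Simplification: the chip that last wrote the line owns it exclusively."""
    owner = None
    invalidations = 0
    for thread in writes:
        chip = thread_to_chip[thread]
        if owner is not None and owner != chip:
            invalidations += 1  # other chip's copy must be invalidated
        owner = chip
    return invalidations


writes = ["A", "B"] * 4          # two threads take turns updating shared data
same_chip = {"A": 0, "B": 0}     # option 1: sibling virtual CPUs
split_chips = {"A": 0, "B": 1}   # option 2: different physical chips
print(cross_chip_invalidations(writes, same_chip))
print(cross_chip_invalidations(writes, split_chips))
```

        For these eight alternating writes, option 1 generates no cross-chip invalidations while option 2 generates seven, one per hand-off.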

        • Actually the HT implementation in the P4/Xeon chips does not act as you suggest in 1. When HT is enabled, the cache is cut in half and each virtual CPU gets a half cache ... which is probably the main reason HT can yield inferior performance for some applications.

          There is a very good reason for doing it this way. The P4 cache uses VIRTUAL addresses, so if each virtual CPU is executing in a different virtual address space (which is allowed), then you need a way to differentiate which cache lines belong to each virtual CPU, since they might very well both reference, let's say, virtual address 0xDEADBEEF, which translates into a different physical address (and hence different data). Intel engineers went with the simple solution of splitting the cache in two, instead of adding an extra tag to each cache line, which would have created extra overhead/latency on every cache access.

          I apologize for overusing the word virtual ... but I really couldn't help it too much. It just seems to be an overused word in CS/EE.
    • by be-fan ( 61476 )
      Actually, the previous 2.5 scheduler handled hyperthreading just fine. The real draw is that this new patch makes hyperthreading just a subset of NUMA, which makes things much cleaner.
      • P4s at 3.06 GHz and higher sold in the last few months are supposed to support hyperthreading. Does anybody know, though, whether slower P4s also do? I just bought a 1.8 GHz P4, and I'd really like to know whether I can hyperthread on it (running Linux).

      On that note, does one also need a hyperthreading mobo? I've searched, but after reading several linux kernel archives that google pointed me to, I'm still not sure whether my lowly 1.8 GHz P4 can hyperthread.

      • Hyperthreading was apparently included in the silicon for most if not all P4s, but is physically disabled on all but the Xeon and P4 3.06+ CPUs. There is no way to enable it on your 1.8.
  • For a great adventure story, read some of Clive's books.

    ~S
  • Cool, Disk Pitt (tm) [freeyellow.com] will be using Linux.


    Oh wait. You mean _that_ NUMA.

    /b

    • Wow. How many Dirk Pitt fanboys so far? /. has the most bizarre tastes.

      All I ever read was the one with Howard Hughes' secret moon base. I think I stopped after I realized Dirk was gonna try and fuck the rich old Jackie-O impersonator. Actually, no, it was after they got on the blimp the second time. I always say you can't have more than one blimp scene in a story. Did Indy go back to the Nazi blimp? Nooo. He moved on to exciting new vehicles. Dirk can't let go of the damn blimp. It's like Hamlet or something. Blimps are Dirk's tragic flaw. They'd probably have been his downfall if I'd finished the story, too. I'll bet he has dreams where the blimp talks to him, the poor bastard.
  • by Bemmu ( 42122 ) <lomise&uta,fi> on Friday January 31, 2003 @04:29AM (#5194994) Homepage Journal
    Thank you for explaining who Linus Torvalds is.
  • Linus Torvalds, the original creator of the operating system and still its top authority.

    You know Slashdot is going "mainstream" when people have to explain who Linus is.
  • Wow, that guy called Linux changed his name to be just like that computer thing!
  • by Anonymous Coward
    Hi,

    SGI must have added NUMA support for their Itanium-based Altix-servers as well. On their web-page [sgi.com] it says: "Enhanced Operating System for High-Productivity Computing" [aka Linux] with "High-performance NUMA support".

    Anyone ever seen a patch for this?

    - jarman

    • SGI's stuff is (I believe) supported by their own 2.4-based tree, and is pretty much unmergeable back into mainline.

      This is about stuff that's designed to be non-invasive enough to go into the mainline kernel, less bells and whistles, but a longer term goal, designed to work on every architecture. We'll slowly add features to it as they come up, if they're well tested across multiple arches and make sense.

      Martin J. Bligh.
  • Linus' Acceptance... (Score:2, Informative)

    by sd790 ( 643354 )
    ...can be found here [iu.edu].
  • by nibelung ( 226749 ) on Friday January 31, 2003 @06:35AM (#5195176)
    they are copying Linux related news from CNET.
  • Gee, when you have to -remind- us of Linus' claim to fame, it seems a sign that the History of Linux must be fading from the modern /.er's mind...

    I really -doubt- that it has...
  • by LJPeixoto ( 130298 ) on Friday January 31, 2003 @07:15AM (#5195250) Homepage
    "More recently, the NUMA scheduler patch has been reworked (by Martin Bligh, Erich Focht, Michael Hohnbaum, and others) around a simple observation: most of the NUMA problems can be solved by simply restricting the current scheduler's balancing code to processors within a single node. If the rebalancer - which moves processes across CPUs in order to keep them all busy - only balances inside a node, the worst processor imbalances will be addressed without moving processes into a foreign-node slow zone. A simple (three-line) patch which did nothing but add the within-node restriction yielded most of the benefits of the full NUMA scheduler; indeed, it performed better on some benchmarks. Real-world loads, however, will require a scheduler which can distribute processes evenly across nodes. Occasionally it is necessary, even, to move processes to a slower node; a lot of CPU time on a lightly-loaded node will give better performance than waiting in the run queue on a heavily-loaded node. So a bit of complexity had to be added back into the new scheduler to complete the job."

    Extracted from:
    http://lwn.net/Articles/20741/ [lwn.net]
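    The two-level policy in that quote can be sketched as follows. This is a hypothetical Python simplification, not the 2.5 scheduler's code: run queues are reduced to task counts, each call migrates at most one task, balancing happens within a node first, and a task crosses nodes only when node totals differ by at least an arbitrary, assumed `cross_node_threshold`.

```python
def balance_once(runqueues, cpu_to_node, cross_node_threshold=4):
    """Perform at most one task migration and return (src_cpu, dst_cpu),
    or None if nothing is worth moving. Mutates runqueues (cpu -> task count)."""
    nodes = {}
    for cpu, node in cpu_to_node.items():
        nodes.setdefault(node, []).append(cpu)

    # Step 1: balance only inside each node, so no task leaves its memory.
    for node in sorted(nodes):
        cpus = nodes[node]
        busiest = max(cpus, key=lambda c: runqueues[c])
        idlest = min(cpus, key=lambda c: runqueues[c])
        if runqueues[busiest] - runqueues[idlest] >= 2:
            runqueues[busiest] -= 1
            runqueues[idlest] += 1
            return busiest, idlest

    # Step 2: only when whole nodes are badly unbalanced, accept the cost of
    # moving a task away from its memory to a lightly loaded remote node.
    load = {node: sum(runqueues[c] for c in cpus) for node, cpus in nodes.items()}
    src_node = max(load, key=load.get)
    dst_node = min(load, key=load.get)
    if load[src_node] - load[dst_node] >= cross_node_threshold:
        src = max(nodes[src_node], key=lambda c: runqueues[c])
        dst = min(nodes[dst_node], key=lambda c: runqueues[c])
        runqueues[src] -= 1
        runqueues[dst] += 1
        return src, dst
    return None


# Hypothetical 2-node, 4-CPU box: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
print(balance_once({0: 3, 1: 0, 2: 1, 3: 1}, cpu_to_node))  # within node 0
print(balance_once({0: 3, 1: 3, 2: 0, 3: 0}, cpu_to_node))  # node 0 -> node 1
print(balance_once({0: 2, 1: 2, 2: 1, 3: 1}, cpu_to_node))  # left alone
```

    The first case is fixed without leaving node 0; only the second, where node 1 sits idle while node 0 is saturated, justifies a move into the "foreign-node slow zone".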
  • Linus Torvalds, the original creator of the operating system

    I was waiting for someone to jump all over this...

    T

  • IBM not replacing AIX with Linux? Yeah, right, we really believe that type of FUD... :)
  • Richard Stallman has sent an angry e-mail demanding a name change to GNUMA.
  • I know this bores everyone, but statements like
    Linus Torvalds, the original creator of the operating system and still its top authority
    are why RMS rants endlessly that the environment should be called "GNU/Linux" instead. Please get it right, and credit Linus for the Linux kernel (and a few utilities) only.
  • by slavemowgli ( 585321 ) on Friday January 31, 2003 @10:25AM (#5196234) Homepage
    Contrary to what is said in the post, NUMA support has been in Linux for quite a while already. The recent patches accepted by Linus merely add NUMA awareness to the scheduler, which, while certainly being a prerequisite for Linux being used on production NUMA boxen, is not at all required for NUMA support in general.
  • Since when were Dirk Pitt and Al Giordino kernel hackers? I thought they were marine engineers?
  • I wonder if this isn't some of the technology they snarfed from Sequent? I worked there in the late '80s and they were one of the top Unix technology shops around (or at least we thought so of ourselves at the time).
    • I wonder if this isn't some of the technology they snarfed from Sequent? I worked there in the late '80s and they were one of the top Unix technology shops around (or at least we thought so of ourselves at the time).

      That was probably true in the 80s. It hasn't been true for a while though, I think.

      I used a cluster of Sequent NUMA-Q machines running DYNIX 4.4.2 in 1999 (in fact, I still use them occasionally). I find DYNIX (at least, that version) to be the most annoying Unix implementation I've ever used. It's so antiquated!

      It also had annoying features like telnetd defaulting to stripping the 8th bit of the data stream (turning the UK sterling symbol into a #). Oh, yes, and there was a hard-coded limit on the number of lines "tail" would cope with.

      The support is good though (but I've only used the support since Sequent were bought by IBM).

  • ...now I can put Linux on that 256-node NUMA cluster sitting in my spare bedroom...

    Seriously, though, this is one of the strengths of the GPL and is proof that the Linux kernel can only advance in time as it sucks up more and more features that will never go away (I hope 'refactoring' is in the developers' vocabularies!).
  • by Cyno ( 85911 )
    I want to see this in a cluster. If the kernel can understand NUMA then it should be able to handle a cluster over a network, such as gigabit Ethernet, IMO. Craylink is only, what, 800 Mbps full duplex? But, dammit Jim, I'm a Sys Admin, not a programmer.

    If it were possible to scale a cluster of PCs easily and keep a single concurrent system image there would be no limits to what we're capable of doing. Encode a video in 30 minutes or less, play the latest games with 2 year old hardware, etc.

    Make it so.
