Linux On Another New Architecture: PowerPC 64-bit

An unnamed correspondent writes: "This one rather silently whizzed by on the kernel mailing list. IBM reports that they have ported Linux to PowerPC hardware running in 64-bit mode. This no doubt applies only to the larger processors but it's pretty cool all the same." I don't see this processor yet listed on the NetBSD page, even on the mind-bending list of not-yet-integrated ports; is this a first? :)
  • I think the impetus for this was their commitment to deliver Linux running on the i-series (formerly AS/400). Of course, there's a lot more to bringing Linux to the i-series than making it run on the processor (you also have to virtualize primary and secondary memory to map them into a single address space, which takes a bit of doing for OSes that weren't designed that way from the ground up).
  • by Anonymous Coward
    The POWER3 processor is used in a number of RS/6000 and e-server p-series systems: the 44P 170 desktop system [ibm.com], the 44P 270 deskside system [ibm.com], three types of SP node [ibm.com], and the e-server p-series p640 server [http]. These systems range from 1-2 processors (44P-170) up to a 16-way system (Nighthawk II). The p640 is probably the most interesting system, as it is rack-mounted and supports up to 4 processors. All in all, quite a few powerful systems to choose from. The more options the merrier!
  • 64 bit is going to be helpful if you're interested in high precision floating point math. A 64 bit processor can do operations on 64 bit FP numbers as a single operation, rather than 4 operations as you'd need with a 32 bit processor, so there's a big speed up in heavy duty number crunching. That's why you always see people doing really chunky numerical modeling- like predicting the weather- using 64 bit computers instead of cheaper 32 bit ones. Not what everyone needs, mind you, but the people who do need it are willing to pay.

  • ...ibm produces inexpensive ppc workstations ala sun? would ibm rather use their own processors as opposed to intel's, given the issues intel is having with itanium? linux is linux is linux, whether it runs on intel hardware or ppc.
  • I grabbed an "old" Alpha 21164PC 533Mhz/mainboard for $250 a while back. Miles faster than the Multia, and fits in an ATX case. Run it with standard PC hardware (ATA controllers built in). I haven't seen them for that cheap recently, but there's usually an auction on eBay once in a while.

    --

  • And where is a semi-usable UltraSPARC distribution?

    I actually have been running RH 6.2 for some time on this machine - runs like a champ. Often have to recompile software, but no real problems to speak of...

    [chris@gomez chris]$ cat /proc/cpuinfo
    cpu : TI UltraSparc IIi
    fpu : UltraSparc IIi integrated FPU
    promlib : Version 3 Revision 15
    prom : 3.15.2
    type : sun4u
    ncpus probed : 1
    ncpus active : 1
    BogoMips : 665.19
    MMU Type : Spitfire
  • _PowerPC Concepts, Architecture, and Design_ by Chakravarty and Cannon (McGraw Hill, 1994) mentioned the 64-bit architecture.

    --
  • Bitness is very important for the programming model. If (char *) cannot address a byte in a file of a size comparable to the size of a modern hard drive (it is not unreasonable to have such files) it's a major problem - you must either refuse to support large files or use long long everywhere, which is slow without native CPU support.
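
    To make that concrete, here's a rough illustrative sketch (made-up numbers, nothing from any real program): seeking to a byte past the 4GB mark needs a 64-bit offset, and on a 32-bit CPU without native 64-bit integers every long long operation below gets lowered to several 32-bit instructions.

    /* Illustrative only: a file offset beyond what a 32-bit
     * pointer/long can represent. */
    #include <stdio.h>

    int main(void)
    {
        long long offset = 6LL * 1024 * 1024 * 1024;  /* 6 GB into a big file */

        /* On a 32-bit CPU these are multi-instruction software operations. */
        long long block  = offset / 4096;   /* which 4K block */
        long long within = offset % 4096;   /* byte offset inside that block */

        printf("block %lld, byte %lld within it\n", block, within);
        return 0;
    }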
  • The PPC market seems to have splintered between AIX and the Mac, both declining markets...how long can IBM continue to develop and market an independent architecture when most users and developers have concluded that Intel processors are "fast enough"?

    Given the current trend of consolidation, I see room for Intel, AMD, and a high-end player yet to be named - either Alpha or PPC. I'm discounting the Mac userbase in advance as I believe Mac users care the least about the technical details of their platform, and hence constitute an OS market more than a microprocessor market.

  • Neither Sun nor Compaq builds mainframes... not sure what you are referring to...

    --
  • > and the one IBM/Motorola 64-bit PowerPC, the 620, was a horrible flop, coming
    > out six weeks before the internally developed 630.

    The 620 made it into at least one Apple server, iirc. And when it trounced the wintel boxes in a benchmark, the predictable response back was that it wasn't fair to compare a 64-bit machine to 32-bit machines.


    hawk

  • If the PPC group can't get their changes into the Linux kernel (as has been noted on /.), then why does it matter?

  • by hawk ( 1151 )

    > The guys in the server room who clean the crap off the floor get hot
    > and bothered by operating systems. The guys these people clean up
    > after only care about getting the job done right.

    For the most part. But if you told them to run NT on big iron, they would probably get hot under the collar and very bothered :)


    hawk


    :)

  • Ok, I'm sure I do need a clarification (I'm a compiler person, not a hardware person, and have never worked with the Power series). Why not just use multiple cores on a single chip? Last time I heard, that's what Power4 was going to do, right? Also, are you saying that the 8 execution units are split between several separate threads? If so, does anybody know if it's a fixed split (4 for first thread running, 4 for second) or dynamic (which would be surprising, but cool. . .)?
    Thanks, all!
    --JRZ
  • Convenient that you omit the only mature 64-bit port, with almost as many distributions available as anything else, second only to x86. That would be the Alpha processor. You can buy them new for around 2k and used for under 1k.

    Peter
    --
    www.alphalinux.org
  • What set of benchmarks were you running when you collected those numbers? I assume you were using a benchmark suite such as SPEC. This is only one measure, useful to a certain audience.

    For that particular project, I was using the go and cc1 integer benchmarks from the SPEC suite (not sure which year). No special reason; these were just the ones I had on-hand, and for the project it didn't really matter (as I was interested in relative and not absolute results).

    I cite the figures from that project as a reference point to give some idea of the ballpark values that can be expected. A 50% increase in ILP for "average" code I might believe. A 400% increase I wouldn't.

    Yes, certain scientific applications can be written to be easily parallelized, but this is only one niche. For most code, I am deeply skeptical of filling 8 issue units per clock. SMT offers the potential for across-the-board speedup (as long as you're running more than one CPU-bound thread on the machine at once).
  • Why not just use multiple cores on a single chip? Last time I heard, that's what Power4 was going to do, right? Also, are you saying that the 8 execution units are split between several separate threads? If so, does anybody know if it's a fixed split (4 for first thread running, 4 for second) or dynamic (which would be surprising, but cool. . .)?

    The multiple cores idea has been around for a while, and certainly works; SMT is just more resource-efficient.

    My impression is that the Power4 is going to have two cores, but I haven't been following it closely, so I could easily be wrong about that.

    In a SMT system, functional units are indeed shared dynamically between the threads. As far as most of the chip's concerned, there's only one instruction stream, composed of interleaved instructions from the two threads (well, not interleaved in lockstep, but close enough). All you'd need to do would be to add an extra bit onto the register specifier tags (so that the two threads access non-overlapping sections of the register file) and give each thread its own page table identifier (selected by a few bits tacked on to the address). You could even get away with having a single TLB cache.

    In summary, you can keep most of the design the same as for a single-thread machine, and make relatively minor changes in a few places to implement SMT. This takes far less silicon than dual cores, and lets you use the functional units more efficiently and use a wide issue unit efficiently (by boosting parallelism in the instruction stream).
  • It's all well and good that IBM has ported linux to the powerPC, but when are they going to port linux to the system that we've all been waiting for? I mean, I don't know how much longer I can wait to run linux on Natalie Portman. I can't wait to show her my uptime; the 'touch' command alone would make me happy... I mean, can you imagine a beowulf cluster of those? Ohhh mamma!
    This message was encrypted with rot-26 cryptography.
  • One of us must be mistaken here... I've always thought that the general "bitness" of a processor referred to the width of the data bus, which can be vastly different from its address bus. There is no obvious (to me) reason why a 32bit CPU cannot address more than 16GB of memory provided its address bus is sufficiently wide. But then again my CPU knowledge is limited to 16bit CPUs so take this with a grain of salt as I generally have no clue what I'm talking about.
  • "I am not a smart man" as a friend of mine is fond of saying, but I believe that it can best be explained this way. Let us say that we have two processors identical in every way save one is 32 bit and one is 64 bit. A majority of the work of a cpu is not in calculation but rather data transfer. A fetch takes 2 cycles while a write also takes 2 cycles. If we were to write 128 bits of information to disk from memory it would take 8 cycles for the 64 bit machine and 16 cycles for the 32 bit machine. End result: a 32 bit machine has to be clocked 2X the 64 bit machine to keep up in this sort of senario.
  • The PowerPC has a 64-bit mode, added for the AS/400, that supports the 64-bit single address space (really cool concept...basically combine your RAM and DASD into a really big virtual memory space, where the program can assume that something is always in memory, and the kernel takes care of making sure that it's in RAM when needed).

  • http://slashdot.org/article.pl?sid=01/02/27/1458203&mode=thread
  • Good way of looking at it, but lots of other things matter as well:

    pipeline efficiency, memory bus bandwidth, smp cache coherency efficiency.

    If you don't need >4GB of address-space then you're probably better off with a high-clock 32-bit chip and a good memory bus
  • power3 = the other 64-bit PowerPC implementation. Ignore previous comment.
  • by mike260 ( 224212 )
    Playstation2 is *kind of* 64bit:
    sizeof(int)==4
    sizeof(long)==8
    sizeof(void*)==4 (IIRC)
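
    If you want to check the data model on whatever box you have handy (the numbers above are from memory), a trivial program like the following, added purely as an illustration, prints the same three sizes:

    /* Print the basic type sizes to see whether a platform is
     * ILP32, LP64, or something odd like the PS2 numbers above. */
    #include <stdio.h>

    int main(void)
    {
        printf("sizeof(int)    = %u\n", (unsigned)sizeof(int));
        printf("sizeof(long)   = %u\n", (unsigned)sizeof(long));
        printf("sizeof(void *) = %u\n", (unsigned)sizeof(void *));
        return 0;
    }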
  • So if IBM can cut server OS development and maintenence costs by 50% by having much of the work done by the Linux community, that increases their profit margins. And it also benefits the Linux community, since they'd be developing and maintaining Linux anyway, and this adds IBM and IBM server customers to the people who have an interest in helping develop and maintain Linux.

    I think that this precisely underlines why Free/Open Source software is such a great idea. When you share, everyone wins. Getting more people onto the platform increases the development effort much less than the support base, so the average effort per user is less. As long as IBM is truly sharing by adding some effort into the system rather than leeching off everybody else, bringing them onboard helps everybody. Admittedly, adding new platforms as IBM is doing is more effort than more users on already supported platforms, but there's also potentially more benefit. Adding users with different needs adds new features (which is why it's more effort), but many of them will provide trickledown benefits to other users who wouldn't necessarily have been willing to develop them by themselves. And IBM is playing fair by putting in the development effort of adding those new platforms and features themselves rather than demanding that others do the work.

  • by Anonymous Coward
    You've been reading slashdot way too long if you need to see a conspiracy in everything. IBM is interested in having Linux available on all of its server platforms and has committed to that. This includes the former Netfinity, which I most assuredly predict will include Itanium. (Either this or possibly Athlon's Sledgehammer will be the processor for Windows, which unfortunately customers demand.)

    You have to understand that IBM has traditionally been a hardware company but services really drive revenue and profit now. Sun hasn't stepped up to Linux the way that IBM, Compaq, and HP have, so IBM is hoping to pick up some of the sales and services as Linux momentum picks up.

  • I've used GNU/Linux since early '98, and I haven't posted to USEnet since 1996, sooo.... what good are your figures?

    Seriously, most of us have better things to do with our time than chat in newsgroups or IRC (I finally gave up MUDding in 1997, and I still miss it...), like work for a living!

    Sheesh. This doesn't even qualify as FUD; your 'logic' is ^severely^ flawed!

  • Hot off the presses tonight:

    Maybe the folks who write the Slashcode would find it helpful.

    I've posted this here before, but don't want the IBM folks who might be reading to miss it:

    Comments, criticism, additional links and resources to add, suggestions for future articles to write and of course articles you would like to write are appreciated.

    I could also use some help from someone with expertise in designing database schemas.

    Thank you for your attention.


    Mike [goingware.com]

  • In summary, you can keep most of the design the same as for a single-thread machine, and make relatively minor changes in a few places to implement SMT. This takes far less silicon than dual cores, and lets you use the functional units more efficiently and use a wide issue unit efficiently (by boosting parallelism in the instruction stream).

    I'm not sure I buy this argument. It is far easier to duplicate an existing design to make another core than to modify the design to support SMT. SMT requires big fetch/decode/rename hardware, a big register file and probably big caches, too. All of this stuff is in the critical path and will be difficult to run at the high clock speeds expected from modern cores.

    Actually, unless you want to take a moderate speed hit from recycling external bus protocols internally, you'll have quite a bit of design work on your hands building the internal communications bus for a multi-core system. Whether this is comparable to the amount of work needed for a SMT system is an open question, but it's definitely not negligible.

    The fetch/decode hardware is a straight duplication of the existing hardware - it doesn't take up any more space than for two duplicated cores.

    The architectural register file is in two independent banks; again, no more space than you'd have normally. Your physical register file is probably in the form of distributed reservation stations on the functional units; again, no more space than for duplicate cores.

    You do need more bandwidth on your result busses if you want to use more functional units at once, but this holds true for an aggressively-superscalar single-thread processor too. This is a manageable problem. If necessary, you can trade off bandwidth and latency when building the thing, because your execution stream is much less sensitive to latency than it would be with a single-thread machine.

    Renaming hardware is manageable. You'd need the bandwidth anyways on a wide-issue single-thread processor with the same issue rate.

    Caches will have less locality, which can be partly addressed by the operating system (keeping related threads on the same die), but will still be a problem. I don't think this will kill performance. You can pull tricks like having multiple cache banks to fake multiporting, or you can pipeline the cache and run it at a higher clock speed (as you don't mind _some_ extra latency), or do a number of other things. We're reaching the point of diminishing returns with cache size anyways, so you'll probably still have enough cache to effectively handle both threads.

    Re. clock speed, again, you don't mind a _moderate_ amount of extra latency, because you have enough parallelism to reschedule around it. You also can get away with a smaller instruction window, because you won't have to work as hard to find independent instructions. This saves latency in the scheduler.

    In summary, while you raise legitimate concerns, I don't think that they'll be significant problems.
  • Actually, unless you want to take a moderate speed hit from recycling external bus protocols internally, you'll have quite a bit of design work on your hands building the internal communications bus for a multi-core system.

    I don't know exactly what you mean by "recycling external bus protocols." Certainly you'd want to design the core interface to be efficient in a single-die environment, but it seems to me that the fundamental protocols (cache coherence, etc.) are the same. But I'm not an MP expert, so you probably have more insight on this than I do.

    The fetch/decode hardware is a straight duplication of the existing hardware - it doesn't take up any more space than for two duplicated cores.

    Not true. The SMT has to worry about fetch policy. This is not a trivial problem to solve. Starvation is a real concern here. Two independent cores don't need to worry about the fetch stream of one interfering with that of the other.

    Decode is not a trivial problem, either. IA32 has big problems with this. My (and others') guess is that's why a trace cache was put on the P4. It's a decode cache!

    Then there are problems with a large, multi-ported L1 instruction cache. Or interleaved fetch, which gets back to the first problem.

    The architectural register file is in two independent banks; again, no more space than you'd have normally.

    Except for the additional routing logic and wiring.

    Your physical register file is probably in the form of distributed reservation stations on the functional units; again, no more space than for duplicate cores.

    First off, pet peeve of mine. This is not directed to you personally, but to the computer architecture community in general. It's function (or execution, etc.) unit, not functional unit. I would hope all our execution units are functional. :)

    As for the physical file, distributing it implies a non-uniform register access time a la the 21264. It's not an impossible problem to handle but there is a penalty with such a large file. Register caching can help with this and it may not be a large concern in the end. More study is needed in this area.

    If necessary, you can trade off bandwidth and latency when building the thing

    Certainly. This is why engineering is fun. :) The point of my post is that SMT is not a guaranteed win. It might be beneficial in some situations, but not all. I'm not sure it's justified for a POWER-class machine.

    Renaming hardware is manageable. You'd need the bandwidth anyways on a wide-issue single-thread processor with the same issue rate.

    Ah, but there is no single-thread machine that has the bandwidth of the proposed SMT schemes. Building a 4-way machine is a challenge. 8-way should be much more challenging.

    We're reaching the point of diminishing returns with cache size anyways, so you'll probably still have enough cache to effectively handle both threads.

    That's not true on server-class machines. Capacity is still a problem. This is why we see superlinear speedup on MP systems. The extra cache on each chip makes the machine as a whole run faster than you would expect, given the number of processors. SMT takes this distributed cache and puts it in one big array. This will slow it down in one way or another.

    Re. clock speed, again, you don't mind a _moderate_ amount of extra latency, because you have enough parallelism to reschedule around it.

    Eh? ILP has nothing to do with cycle time, save the impact on cycle time that complex O-O-O hardware can have. One cannot "get around" a slow clock through scheduling. Pipelining can be used, but that has its own costs.

    You also can get away with a smaller instruction window, because you won't have to work as hard to find independent instructions. This saves latency in the scheduler.

    Hmm...maybe. But you argue above that any extra latency in the system can be masked by the O-O-O engine. To me, this implies a larger window. Eventually one thread or another is going to get backed up waiting on memory. When that happens you have to have room available to fetch from the other threads. Has anyone done any studies of instruction queue utilization in SMT? I'd like to know how often the queue is full and how big it has to be to sustain execution. I seem to recall some of the work out of U. Wash. doing this, but I can't find the reference at the moment.

    All this is not to say that SMT is worthless. Far from it. In fact, the fast thread context switching allows some super-cool techniques not previously possible. I'm trying to temper the enthusiasm for SMT a bit. Think of me as a Devil's Advocate. :)

    --

  • Dunno how the latest snapshots are doing, but historically, the ia32 gcc compiler has produced *really* lousy code for 64-bit long long data.
    So much so, that Linus has shunned use of 'long long' in the kernel as much as possible. This is rather a pity. I hope that someone improves this in the future.

    Tim
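
    For what it's worth, here's a tiny made-up example of the sort of code in question; when gcc targets ia32, each 64-bit operation on these long long values has to be lowered to a sequence of 32-bit instructions, which is where the lousy code comes from:

    /* Illustrative only: 64-bit arithmetic that a 32-bit x86 target must
     * do in multi-instruction sequences (add plus add-with-carry, paired
     * shifts/moves, a widening multiply, ...). */
    #include <stdio.h>

    static unsigned long long mix64(unsigned long long a, unsigned long long b)
    {
        a += b;                             /* add + add-with-carry */
        a ^= a >> 33;                       /* lowered to 32-bit moves/shifts */
        return a * 0x9E3779B97F4A7C15ULL;   /* several 32-bit mul/add steps */
    }

    int main(void)
    {
        printf("%llx\n", mix64(0x123456789ULL, 0x987654321ULL));
        return 0;
    }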
  • Right now the top non-clustered TPC-C score [tpc.org] is held by IBM's s80 system. TPC-C [tpc.org] (not SPEC) is considered by many to be the most important server benchmark.

    The system from the benchmark report has 24 RS64-IV 64-bit processors running at 600 MHz with 96GB (yes, GB) of system DRAM. Each processor has 128kB L1 data cache, 128kB L1 instruction cache, and a 16MB L2 cache. The chips also support coarse-grain multithreading (simpler, but similar to SMT).

    (600 MHz sounds slow until you realize that it uses a simple, very efficient 5-stage pipeline. Intel and others achieve high clock rates through deep pipelines and rely on branch prediction and other techniques to keep the pipe full. Branch mispredictions and cache misses can kill the actual performance of these chips on real server code.)

    This system with 24 processors outperforms HP's 48 processor "SuperDome" and Sun's 64 processor UE10k (though the UE10k is an old system by now, it is the fastest server Sun is shipping.)

    The above system is not using the Power3 chip from the posted story. You can bet IBM will port Linux to this beast next. We won't see a 24-processor system with Linux right away, but an s80-like system would make a sweet 4-processor Linux server.

    One last note: these systems are not vapor-ware. A 12-processor system with an earlier version of the same processor has been shipping since the summer of '98.

  • You're comparing one SMT CPU to two regular CPUs (on the same chip or not.) Certainly, two CPUs will perform better than a single SMT CPU (for most, but probably not all, tasks.) So what's SMT good for? It doesn't cost nearly as much as two whole CPUs. The idea is to get quite a bit more performance for a bit more cost. You say "SMT is not a guaranteed win", but relative to a single non-SMT CPU on the same amount of silicon, I think it's almost always a win. (Unless you only have one process that needs to run fast, so multithreading won't help you much).

    From what I've seen, SMT is based on the idea that "We have all these execution units to supply peak demand, but they go unused a lot." You add more of everything else, and share the execution units. It's unlikely that both threads will be making peak demands at the same time, so you can probably keep them satisfied. It's a good way to get the average throughput up. This is like ethernet, where the bandwidth is shared, but not everybody uses it all at once, so it works.

    I also want to reply to some of your specific points:

    Decode is not a trivial problem, either. IA32 has big problems with this. My (and others') guess is that's why a trace cache was put on the P4. It's a decode cache!

    Decode is pretty much trivial on anything but an IA32. They have this problem because they have to run an old instruction set that wasn't designed to be easy for hardware like we have now to deal with. New designs try to be good compiler targets, and to make the CPU's job easy. Most instruction sets are like the decoded x86 instructions that are generated internally on modern IA32 processors.

    Eh? ILP has nothing to do with cycle time, save the impact on cycle time that complex O-O-O hardware can have. One cannot "get around" a slow clock through scheduling. Pipelining can be used, but that has its own costs.

    Of course you'd use pipelining! Pipelining has a similar goal to SMT: Keep more of the hardware busy more of the time, to increase total work/time. Since run time is the quantity of interest when measuring how fast a computer is, ILP and scheduling have everything to do with cycle time. You can trade off one to get the other, and have a CPU that gets the job done in the same amount of time.
    Time = instructions * CPI * clock period
    So raising the clock speed (smaller clock period) has the same effect as decreasing the average number of cycles per instruction (more ILP). (There are some toy numbers at the end of this post illustrating the trade-off.)

    As for window size, remember the law of diminishing returns. Two small windows on two threads will find more instructions to run than one large window on one thread.

    All this is not to say that SMT is worthless. Far from it. In fact, the fast thread context switching allows some super-cool techniques not previously possible. I'm trying to temper the enthusiasm for SMT a bit. Think of me as a Devil's Advocate. :)

    Cool. Just remember that we're not claiming an SMT cpu can do the work of two whole CPUs. As I said, it's an idea in the same category as pipelining. One pipelined CPU with 5 stages probably isn't as good as 5 non-pipelined CPUs (as long as there are five tasks to keep all the CPUs busy. If there's only one job to do, and the compiler didn't parallelize it, the single pipelined CPU will be faster). However, a pipelined CPU only takes a bit more silicon, and gives a big (but not quite x5) speedup.

    You probably figured out some of that before I said it, but I hope it helped :)
    #define X(x,y) x##y
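
    To put some toy numbers on the Time = instructions * CPI * clock period identity quoted above (purely illustrative values, not measurements of any real chip):

    /* Two hypothetical machines running the same 10^9-instruction program:
     * A has a modest clock but good ILP, B doubles the clock but stalls
     * push its CPI up.  Both finish in the same time. */
    #include <stdio.h>

    int main(void)
    {
        double insns  = 1e9;
        double time_a = insns * 0.8 / 600e6;    /* CPI 0.8 at 600 MHz */
        double time_b = insns * 1.6 / 1200e6;   /* CPI 1.6 at 1.2 GHz */

        printf("A: %.3f s   B: %.3f s\n", time_a, time_b);  /* both ~1.333 s */
        return 0;
    }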

  • You're comparing one SMT CPU to two regular CPUs (on the same chip or not.)

    Yes, but I'm assuming equal (or as near equal as is practically possible) execution resources.

    So what's SMT good for? It doesn't cost nearly as much as two whole CPUs.

    See, now that's where I disagree. In terms of raw transistors you are right. But you are forgetting the cost of design and slippage in time-to-market. It is less costly to design and fabricate an MP design from previously designed cores than it is to take such a core and modify it for SMT.

    Realize that I am not saying this should never be done. Just that more evaluation is needed. Some of that evaluation will involve silicon, probably in the consumer market.

    but relative to a single non-SMT CPU on the same amount of silicon, I think it's almost always a win.

    If you look at raw transistor count, I might agree with you. It depends greatly on the architecture and the expense of the duplicated vs. shared resources of the two designs (decode logic, etc.). SMT has a cycle time impact and you have to balance that against the extra transistors required for a CMP.

    You add more of everything else, and share the execution units.

    This is an argument I've never understood. The execution units are such a tiny, tiny part of the die that I don't see much benefit in sharing them. Sharing the decode/O-O-O logic seems more beneficial, but even an SMT requires more of that (in terms of bandwidth).

    Decode is pretty much trivial on anything but an IA32.

    Sure about that? The POWER architecture is pretty complex. Even on a MIPS-like machine, the rename and dependency logic complexity rises rapidly with fetch/issue width. As does the wakeup logic with larger instruction windows.

    ILP and scheduling have everything to do with cycle time. You can trade off one to get the other, and have a CPU that gets the job done in the same amount of time.

    Time = instructions * CPI * clock period

    So raising the clock speed (smaller clock period) has the same effect as decreasing the average number of cycles per instruction (more ILP).

    My point is that raising ILP with SMT can decrease the clock speed (increase the cycle time). So you get more ILP but everything runs more slowly. A CMP with an equivalent number of contexts should get the same ILP without the extra cycle time penalty (ignoring messaging and coherence overhead). SMT does not make any one thread run faster. It increases throughput, which is exactly what a CMP does.

    Pipelining is not a panacea, either. There is a limit to how deep you want to make your pipe. This is one of the reasons good branch prediction is so important -- it allows a longer pipe.

    Remember also that cycle time is pretty much the only thing companies have to market, everything else (i.e. number of threads) being equal. Is this unfortunate? Clearly. But it is reality and engineers need to deal with it when making design decisions. Which reminds me to plug the excellent book The Soul of a New Machine [barnesandnoble.com] by Tracy Kidder, a fascinating account of Data General's race to kill the VAX. There's a bit of discussion devoted to market times and perfect designs.

    Two small windows on two threads will find more instructions to run than one large window on one thread.

    A CMP has two small(er) windows for two threads. An SMT has one big window for two threads. Two smaller windows should run faster. Whether they find more ILP is an open question. SMT does have the advantage that it can trade off window space, etc. between threads. I think this is more difficult to do than most people realize due to the challenges with fetch policy.

    In any event, I am extremely curious to see what happens with the SMT chips coming out. Let's sit back and see if they make it! :)

    --

  • Try finding an old Multia. They have 166 MHz Alpha processors and generally cost around $100 on eBay.
  • Here's some acronyms for you, which should clear up the mess:

    POWER == Performance Optimised With Enhanced RISC
    PowerPC == POWER for Personal Computers

    The PowerPC was developed as a cut-down (32-bit instead of 64-bit and lacking a few rarely-used and complex instructions), largely binary-compatible version of the POWER.

    PowerPC isn't really any particular processor, but a specification, which was first implemented as the PowerPC 601 back in 1994 (remember how it totally wiped out the Pentium-75?). Subsequently, embedded versions have been made, along with more powerful desktop versions of the PowerPC - the 603, 604, 750 (G3), 7400 (G4) and now the 7450 (G4+).

    Meanwhile, the POWER has been developed as well, remaining a high-end 64-bit monster for the enterprise-level RS/6000 machines. The PowerPC 601 was based more on the POWER1 than anything else, the chip shown in the log is a POWER3, and the current hot topic is the POWER4 with all these nice new features (one or two of which have reportedly already made it into the 7450...).

    The bottom line is that the POWER and the PowerPC are different but surprisingly similar beasts. They are nearly binary-compatible, which is why the kernel reports it as a PowerPC-class processor.
  • by Cirvam ( 216911 )
    What value does this have? I don't know of too many systems that use that processor. Was there a demand for it in the community or was IBM just scratching an itch?
  • So if the last two are correct, does this mean we've won? Has IBM waved its wand and made Linux mainstream? Here's one hopin' happy human.
    :-)
  • Tell me...my school has one and the bastards use it for a *mail server*. Physics can't get dick out of "Office of Information Systems" because they all have MBAs and think Win2k is *the* answer. Oh the shame!
  • is when are they going to make swanky 64+bit hardware I can actually afford? :)
  • I don't see this processor yet listed on the NetBSD page, even on the mind-bending list of not-yet-integrated ports; is this a first? :)

    Well, not exactly. Perhaps some of us remember MkLinux? Apple ported that to the older Nubus based Macs, something nobody else seems to have done with any other OS. So IBM porting Linux to a chip they designed? Whoo hoo. Yay Linux. They could've just as easily ported NetBSD or any other operating system to it, if they had the inclination (which, obviously, they don't given their latest Linux kick). Now, if they'd ported Mac OS X to the PPC64, that'd be something to write home about. =-)
  • I'll bite, only because the parent comment somehow managed to be modded up.

    Linux was the first OS ever to boot on Itanium. (*bsd not there).

    Where are the Itanium computers? This port isn't of much use to nearly everyone.

    Linux was first on PPC64. (*bsd not there).

    Where are the PPC64 computers?

    Linux was first free OS on S/390. (*bsd not there.)

    How many people own an S/390?

    Linux was first on UltraSPARC.

    And where is a semi-usable UltraSPARC distribution?

    Heck, all of these ports require much hand-rolling. And you also mentioned hardware which the vast majority of people here have never even touched or seen- have you?

    Proof of concept ports, and ports that aren't deployed anywhere in the real world: these aren't of much use, regardless of whether the port is of a Linux or a BSD.

  • by carlfish ( 7229 ) <cmiller@pastiche.org> on Sunday March 04, 2001 @12:00PM (#385455) Homepage Journal
    Off the top of my head:
    • Products such as Websphere can be released on one OS platform (Linux) and run on IBM's entire range of hardware.
    • Linux has a lot more "geek momentum" than AIX. The guys in the server room would probably be much more excited to get a kick-ass RS/6000 if it meant they could stick Linux on it.
    • It gives them something to talk about in their upcoming advertising campaign.
    • IBM is a hardware company. To them, software is a way to sell hardware. If Linux is popular, then it's in IBM's interest to make sure it runs on their most expensive kit. They'd rather sell an RS/6000 than a Netfinity. (this also explains their porting it to S/390 first)
    • I wouldn't be surprised if the long term plan was to fold the enterprise functionality of AIX into Linux, have the OS maintained by the open source community with much less IBM manpower than AIX takes, and then put AIX out to pasture.
    Charles Miller
    --
  • Actually AIX was first on PPC64.

    Also most people cannot just buy new hardware, that is why NetBSD is ported to the VAX, but why not LINUX?
  • Datacenter can address 64 GB via Intel's Physical Address Extension hack (basically allows 36-bit addresses via "dual address cycle" hardware that can pump in two 32-bit addresses in one clock cycle).

    It performs pretty well for a kludge but does require your application to use the MS AWE (address windowing extensions) memory allocation APIs, which have some restrictions, such as only providing page-fixed memory and only allowing you to dealloc in the same unit you alloc'd (so writing dynamic memory handling is not easy).

    The increased address space is cool if your o/s has a good (fast, influenceable) vm manager - you can strip out buffer mgt code from your app (reduces complexity)

    Also great for server apps that do lots of read io as you can buffer even at large concurrent user workloads, so can see Real/Oracle/Akamai type apps benefiting
  • I have a number of rs/6000s at work, running AIX currently. It's nice to know that as soon as I don't need AIX (read: as soon as I don't need V4 CATIA or CATIA is ported to GNU(!)) I can run GNU/Linux on those bad boys!

    For those of you who haven't used a late model rs/6000, I highly recommend them. Those chips are darned fast!

  • by barneyfoo ( 80862 ) on Sunday March 04, 2001 @12:04PM (#385459)
    Where are the Itanium computers? This port isn't of much use to nearly everyone.

    Itanium represents the first commodity 64bit enterprise computing platform. A major advance if you ask me (regardless of performance), and linux will be there first, along with SCO, and win2k bringing up the rear.

    Where are the PPC64 computers?

    Ever hear of Power3 and Power4, and AIX? 'nuff said.

    How many people own an S/390?

    I think the count of people that use S/390 is far less important than the importance of those people. S/390 has no peer in its class as a mainframe. Sun's Starfire comes close.

    And where is a semi-usable UltraSPARC distribution?

    Debian has a semi-usable distribution for UltraSparc. I believe they have XFree86 working, among other things, along with the trivial ports that just require a Linux kernel.

    Proof of concept ports... these aren't of much use...

    Needless to say, I disagree.
  • The little thread this started [appwatch.com] noted discrepancies in the number of CPUs reported in the bootlog (4, 8, and 1). There are 4 CPUs in the machine, it supports 8, and the native 64bit Linux port supports 1. But the 32bit Linux port (emulates 32bit on the Power3) supports SMP. I'd be interested to see a performance comparison between the 64bit native and 32bit emulation kernels. :)
    Of course, I assume SMP will be arriving sometime shortly.
  • Yes, but if you don't artificially exlude [sic] clustered results because you don't like what they say...

    One of the well-known problems with TPC-C is that it uses a hierarchical system where all the data is part of a particular warehouse and only a small percent of transactions need to access any cross-warehouse data. The little intra-warehouse communication required is evenly distributed, so load imbalance is not a problem.

    For this reason, the TPC-C benchmark can run more efficiently on a cluster of servers than many real world OLTP setups. From what I've been told, real clustered DBMS setups suffer from load imbalance and are even more difficult to setup and tune than 'single instance' systems. The majority of OLTP systems in the field don't use clusters for performance.

    That said, one of the best uses of small clusters is to provide fault tolerance and high availability of data, but that is a different setup than the 'clustered' TPC-C results you mentioned.

  • I said early 90s, not 1991. You're telling me that Linux was stable and usable in 91? I never ran into *anyone* running a server doing serious work on Linux before 95.

    -lx
  • All I can do here is give my personal perspective, and point out that this is all based on a very small part of the original post I wrote.
    I'm a long time BSD user, who's also tried the top 10 Linux distros at various points. The only ones I even came close to liking were Slackware and SuSE. SuSE is very commercialized, and it's hard to get ISOs for, so I don't mess with it much.

    But I do know that the various times I've played with Linux over the years, it's proven to be quite a bit less stable (the first time I installed redhat, it locked up after the first reboot; redhat, caldera, debian and corel all failed to install and boot on a rather flaky p166, whereas FreeBSD did flawlessly), far behind in terms of package management, i.e., rpm (I know about apt, and I think it's cool, but that's one Linux distro, and one I don't dig on much), and fragmented (even though this is what people say about the BSDs, the different distros are very dissimilar, and are quite large in number).

    These are the things that have made me stick with BSD over the years. I get my OS from a central location, worked on by people in an actual team environment with democracy and accountability, released under a license which is truly free, easy to get ahold of and install (many linuces didn't have install over ftp for years, some still don't), with great package management, great performance, great support, and great stability. I'm not saying there's no place for Linux, but given the reasons that I've just mentioned, why would I want to use it?

    Anyhow, that's all just my opinion, and you expressed yours. What I *do* take issue with is the "never caught up" bit. In what way is any given BSD distribution not equal or superior to Linux?

    -lx
  • That sounds about right, as I said, it was a long time ago...
  • SGI's, on the other hand, was redesigned from the ground up (starting with the gcc parser for compatibility) to use all of the neat, theoretical tricks that you need to get ILP in this situation. TurboLinux has already gone with it and demonstrated good results (that's one reason why NCSA will be using Turbo for the second stage of their huge, new cluster).

    This sounds a bit like the apache patches SGI did for the Accelerated Apache project. The patches were submitted but aren't (apparently) going into apache.

    My concern is that as more and more code is given to us, will we refuse it because we didn't write it? And would that be a bad thing?
  • Segment registers, ala 80286...
  • Well, folks...
    I know that PowerPC is a subset architecture of the Power (not Power2, Power3, Power4).
    I'm not sure if current PowerPC processors have binary compatibility with current Power processors. Although the beginning of PowerPC is similar to the Power, I don't think the PowerPC architecture is identical to the Power architecture.

    You can't say that the Z80 is an 8080. (The Z80 is said to have binary compatibility with the 8080.)

    Does AIX for RS/6000 with PowerPC processors run on RS/6000 with Power2, Power3 without modification? I'm not sure.
    Has anyone tried it?
  • Right, but who said these chips aren't targeted at servers? That's what I'll be using them for :)
  • Except for the server market, where Linux is the only platform to grow faster (24%) than MS (20%). As long as we can "falter" our way to the front of the pack, then let's "falter".
  • I'm not sure you understand the market IBM's Power4 chips are aimed at. These chips are two processors on a die, 1GHz, but they'd smoke anything Intel or AMD have even if they were running at 250MHz. They can have incredibly low yields and still make a profit, because these chips are freaking expensive ($10-20k for the chip, I think, although you wouldn't buy one outside an IBM RS/6000). Intel and AMD don't reach that high end, so they have to have the commodity market to prop up their higher priced offerings (Xeons are where the bulk of the profit is). IBM is able to skim off the top, selling a small number of units integrated with all their own hardware and software, and make plenty of money. These are not chips 99% of users will ever have to use or care about.
  • Isn't the scalability and adaptability of Linux wonderful! :) Way to go IBM
  • 1) Q: Where are the Itanium computers?
    A: Still vapour.

    2) Q: Where are the PPC64 computers?
    A: Any late-model RS/6000. I have quite a few at work; don't you? *grin*

    3) Q: How many people own an S/390?
    A: A great many large corporations. Duh. What could anyone do with one in their living room! For that matter, how many people owned VAXen when they were new? Again, corporate hardware.

    4) Q: And where is a semi-usable UltraSPARC distribution?
    A: Try SuSE; you'll be glad you did.

  • Debian has a semi-usable distribution for UltraSparc. I believe they have XFree86 working, among other things, along with the trivial ports that just require a Linux kernel.

    dont forget mandrake. they also have a version for sparc: ftp://fr.rpmfind.net/linux/Mandrake-iso/sparc/ [rpmfind.net]

    although it is beta. i must confess that i have never used it, but i would think it's at least semi-useable.

    use LaTeX? want an online reference manager that
  • by Trepalium ( 109107 ) on Sunday March 04, 2001 @12:20PM (#385475)
    eh? The 'bitness' of the CPU rarely has anything to do with floating point capabilities. The Intel x86 line all have the ability to use 80-bit floating point numbers (10 bytes). In fact, it was because of this that the [in]famous FPU memory move was created for the Pentium processors -- it was faster to move memory into the FPU registers and then back out to memory than it was to use the usual movsd instructions to do the same, because via the FPU you moved 8 bytes (64 bits) at a time, whereas with movsd, you were only moving 4 bytes at a time. On the Pentium Pro and Pentium II, they finally fixed this by the use of write combining so that movsd'ing a block of memory was as fast or faster than doing it via the FPU. The number of bits generally refers to one of two features of the CPU -- either its bus, or the size of the general purpose registers and address space. The Intel Pentium, for example, had a 64-bit bus, but still only 32-bit registers and memory space. The Intel 80386SX had a 16-bit bus, and 32-bit registers.
  • this is kinda offtopic and pedantic but...

    IBM haven't ported Lunix (A UNIX implementation for the Commodore 64/128) to the 64-bit PPC platform. They ported Linux. Get It Right.

    Sorry... someone had to do it though :)
  • Power3/4 compatibility would be very nice for me. I can run Linux and OS/400 at the same time on the same machine. This means I do not need to buy other hardware, like PC servers, to provide a total solution: from databases and mail with OS/400 to file, print and web serving with Linux. The hardware based on the Power-series chip is scalable to a maximum of 24 processors, so one machine does it all!
  • by vipw ( 228 )
    hehe thanks :)

    i was referencing some Jeff K stuff.

    i suppose the moderators are gone by now, here's the link ;)
    USAR FREINDLEY [efront.com]

    and this strip in particular [efront.com]

  • The ancient Sinclair QL used a 68008, which could handle 32 bit addresses, and thus 4GB of memory, but only had an 8-bit combined address and data bus. It'd take 4 bus clocks to select an address, and another four to read/write a 32 bit value from/to the location.

    I'm sorry to correct you, but the 68000 series of microprocessors does not have a combined address / data bus. The 68008 had a separate address bus of 20 or 22 pins, so it could directly address 1 or 4 MB of memory (depending on package)
  • Ok, I'll bite.
    First, /. is biased, biased in favor of Linux, biased in favor of Open Source (whatever that really means).
    Second, /. likes to stir up controversy, with the result that the commentary is usually much more interesting and informative than the linked articles.
    Third, for anyone coming from outside, the flame wars help identify the major players, and occasionally even turn up some useful information.

    There seems to be some kind of natural progression from Windoze to Linux to BSD, with a number of people running a mix. My idea of "World Domination" is half the desktops running OpenBSD, but then I've got a warped sense of humor ;-)
  • "Unlike a typical PC microprocessor, the chip features eight execution units fed by a 6.4 gigabyte-per-second memory subsystem, allowing the POWER3 to outperform competitors' processors running at two to three times the clock speed"

    Eight execution units! I recall that the x86 line has half of that. And 6.4GB/s memory bandwidth is not to be laughed at either!


    Memory bandwidth is a good thing. Low latency cache hits are great thing, if you can get them (no idea if PPC does this or not).

    However, adding more execution units won't buy you much beyond a fairly small number. The reason: you just don't have that much extractable parallelism in the serial instruction stream.

    I had the good fortune to be playing with this recently via simulation. If you give the processor a *huge* instruction window (256 instructions) and the ability to execute *any* number of instructions of *any* type in parallel (except for memory accesses - see below), you still get an average Instructions Per Clock of about 2.1-2.2. 95% of the time, you're getting four instructions or fewer issued (and most of the time, far fewer than that).

    When SMT is put in silicon, wider issue will become practical (due to increased parallelism in the instruction stream), but as it is, you're better off spending the silicon on other improvements.

    Re. memory accesses; the reason why it's extremely difficult to do memory accesses out-of-order with each other is that you have to check to see if any given two memory accesses refer to the same location (indicating a dependence). You often don't know what the target address is until late in the pipeline, and you'll still need to do a TLB translation to get the physical address, and compare two large bit vectors (the addresses).

    Remember, to be useful for scheduling, you have to be able to do all of this very quickly and very early in the pipeline.

    All of this makes out-of-order memory accesses very difficult to implement theoretically, and a nightmare to implement in real silicon. It's still sometimes done in a limited manner, but this doesn't affect the IPC very much.
  • Oh, the awful irony! Seeing cubes on desks where bulky PC AT computers once sat!
  • "The difference is that NetBSD is an a complete Operating System, not just a kernel."

    Wow, you're right. All this time I have been interfacing my kernel by hand. You OS-bigots were right all along! What I need is an "operating system". I feel so bad that I have wasted all this time using a kernel when I could have been using a "cool" operating system. Sign me up!
  • by Anonymous Coward
    Unfortunately, there is demand for 64-bit PowerPC processors out there now. IBM will not sell these chips outside of the company as a general rule. It is because these processors were designed by the RS6000 group, and not the microelectronics group. Further, Apple and Motorola have repeatedly turned their backs on a 64-bit implementation, and the one IBM/Motorola 64-bit PowerPC, the 620, was a horrible flop, coming out six weeks before the internally developed 630. IBM developed these processors as a follow-on to the 604 processors, and only for the RS6000 and AS400 lines. This move is more in line with the migration to move RS6000 away from being AIX-centric and towards Linux. Linux is an ideal convergence point for IBM. IBM at this time has 6 OSes they support throughout the company: OS2, Win9X, WinNT, AIX, OS390, and OS400. Merging all their different hardware platforms, Netfinity, RS6000, S390, and AS400, onto one common software platform seems ideal to them, and seems consistent with their other porting efforts reported on slashdot such as the very cool S390 ports. I for one would love to see IBM sell these processors to outside vendors, but until that day comes.....
  • Right. Let's take a look at this.

    Itanium: Linux has an Itanium emulator written specifically for it, by Intel, I believe. That makes it kind of easier. Besides that, BSD does boot on the Itanium, even though they were severely impeded by lack of tools.

    PPC64: It was ported by a corporation, fuckwit. A corporation with more resources than a non-profit organization could ever put towards porting to a platform, porting Linux to run on their own hardware, whereas NetBSD is an independent effort. They can't just run out and get a PPC64 box for themselves.

    S/390: Same story.

    UltraSPARC: Both run on UltraSparc, but I don't know dates of when they first booted, or the extent of Linux/Sparc support. This might actually be...a *relevant point*!

    And then you call this stuff "mainstream, state of the art hardware". For all but the UltraSPARC, it's impossible for a normal person to even lay hands on one of those machines. Even in the case of corporations, how many do you know that are running Linux on IBM boxes instead of AIX? Why the hell would anyone want to, seeing as how AIX generally outperforms it anyhow?

    In any event, how about high-end hardware that people can actually buy? NetBSD was the first to be running on the Alpha, for instance, a high performance platform that actually matters. First on SGI boxes. How about i386, the architecture everyone uses? In the early 90s, NetBSD was far more complete and usable than Linux, and to this day has very complete hardware support for the platform. One could also point out that Linux has been lagging behind on new technologies, like IPv6. Might want to take that into account when you're tallying up the final "Score".
  • read the second line down, or so, they say just to go to netbsd anyway.
  • It's called AIX 5L, and is the next (and now late) release of AIX. The core feel is still AIX (I'm sorry, I know this will cause arguments, but AIX has about the best LVM I've used, plus its ls* commands are something I miss on other unices) but with Linux libraries in addition to the AIX ones, allowing for the easier porting of freeware and shareware.
  • Yes. I've used the same install media for a couple of winterhawk nodes (POWER2), a couple of 44P270s (POWER3) and an elderly C20 (Control workstation PPC), with no problems at all.
  • There's a list of most of the currently supported architectures available here [linux.org.uk], mentioning the architectures actually in the kernel tree, and some that aren't.
    Of course, this is not all of them, S/390 is even missing.

    And uClinux runs on architectures like the DragonBall, and other things too. I don't know of a complete list anywhere.

  • There's another list here [wanadoo.es], with some other ports mentioned, that a quick google search turned up.
  • Actually even the width of the address bus isn't necessarily a limiting factor. The ancient Sinclair QL used a 68008, which could handle 32 bit addresses, and thus 4GB of memory, but only had an 8-bit combined address and data bus. It'd take 4 bus clocks to select an address, and another four to read/write a 32 bit value from/to the location.

    Ouch!
  • If your general purpose registers are 32 bits (which is the definition of a 32-bit CPU) and addresses are 64 bits, where do you store the pointers? That's why in most recent chips, pointers are the same size as the integer registers.
  • IBM's current p680 box (up to 24 Power3 IV CPUs) does implement a kind of multi-threading already; it is too coarse to be called SMT, but it is multi-threading. As it is now, each processor 'presents' itself as two CPUs to the OS. They say that it took less than 5% of the chip real estate to support this multi-threading. If you look at their benchmark results on SPEC and TPC, it seems to have paid off quite well.
  • I had the good fortune to be playing with this recently via simulation. If you give the processor a *huge* instruction window (256 instructions) and the ability to execute *any* number of instructions of *any* type in parallel (except for memory accesses - see below), you still get an average Instructions Per Clock of about 2.1-2.2. 95% of the time, you're getting four instructions or fewer issued (and most of the time, far fewer than that).

    Yes. You average 2.1-2.2, and 95% of the time you're only getting 4 or fewer. However, when you look at the other stuff the Power3 architecture includes, it's pretty obvious what the overall intent is:

    Hardware Loop Unrolling.

    IBM has got some customers that use some serious CPU. We're talking national labs and the like. For them, the ability to run 8 of those neat 'multiply-and-add' instructions per clock cycle is quite an important feature.

    The chip *starts* at 375MHz, and can do 16 floating point ops/clock (an amazing amount of code uses that multiply-and-add over an array - and the IBM compilers are smart enough to detect and convert divide to multiply-by-inverse and add/subtract issues).

    And of course, IBM is hoping that even though the big SP/2 iron is limited to national labs and Fortune-500 companies (see The Top500 List [top500.org] for details), they'll be able to sell a lot of the smaller 43P deskside boxes (1-4 Power3 CPUs) and the 8-16 CPU rackmount servers to all the smaller companies that need number-crunching.

  • Well, a 64-bit integer solves the Y2038 bug inherent in Unix.

    ...and a 64-bit integer can be manipulated on a 32-bit machine - and even fairly conveniently, if the compiler cooperates, and GCC does cooperate here (think long long int).
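    As a rough sketch of what that looks like in practice (assuming GCC's long long on a 32-bit target; the variable names and the 2038 framing are mine, not part of any particular libc API):

      #include <stdio.h>

      int main(void)
      {
          /* A signed 32-bit time counter tops out at 2147483647 seconds
           * past the epoch (19 January 2038).  With long long, GCC
           * emulates 64-bit arithmetic on 32-bit machines using pairs
           * of 32-bit operations, so the counter just keeps going. */
          long long wide_time = 2147483647LL;  /* the 32-bit limit */
          wide_time += 1;                      /* no rollover here */
          printf("one second past the 32-bit limit: %lld\n", wide_time);
          return 0;
      }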

  • I was arguing with a friend a couple of days ago on the merits of BSD vs. Linux, and while he rattled off the list of CPU architectures that NetBSD supports (obviously not off the top of his head), I was unable to find a central listing of CPU's supported by Linux.

    My question is, is there such a page updated with such info? I don't believe that Linus Torvalds maintains all the different architecture branches himself...

    Thanks!

    r. ghaffari
    (25/M/Baltimore, MD)

  • As in almost any area where there's money to be earned, Big Blue is in there with some really cool hardware.

    Taken from:

    http://www.rs6000.ibm.com/resource/pressreleases/1998/Oct/power3.html [ibm.com]:

    "Unlike a typical PC microprocessor, the chip features eight execution units fed by a 6.4 gigabyte-per-second memory subsystem, allowing the POWER3 to outperform competitors' processors running at two to three times the clock speed"

    Eight execution units! I recall that the x86 line has half that. And 6.4GB/s of memory bandwidth is not to be laughed at either!
  • by vipw ( 228 )
    IBM can build big-ass proprietary servers and deploy them for customers while still using standard software products. That's a big deal for IBM, since Linux is now a well-respected server operating system: easy to port software to and easy to market.

    So you can see this as IBM scratching an itch, but at the same time making Linux more available in the high-end enterprise environment.
  • Linux and/or NetBSD being ported to yet another chip is nice, but not really big time news.

    These are platforms I'd like to see some porting being done for:

    Other people's brains (remote access, perhaps X-10 integration)

    Garage door opener (so I can apply sound themes to it, and replace the "so 1991" screeching)

    Alarm clock (see garage door opener)

    Pets (obedience school in C, Perl, or any other language you want).

  • by Anonymous Coward
    Without passing value judgement...

    The PowerPC running in 64-bit mode will help them get Linux up and running on the eServer iSeries (that platform has been 64-bit for longer than just about any other major server). It allows them to fulfill their goal of getting Linux running across all the eServers.

  • My concern is that as more and more code is given to us, will we refuse it because we didn't write it? And would that be a bad thing?

    In a word, yes. If the "Not Invented Here" syndrome becomes rampant, then large corporations will have less incentive to build improvements upon the system, and therefore, will start again with either a fork or a potentially closed-source proprietary system. Either way, support for the original open-source system will wither away, reducing the potential for corporate uptake.

    However, this is just a generalisation; I cannot comment specifically upon Apache itself.


    --
  • The PowerPC has a 64-bit mode, added for the AS/400, that supports the 64-bit single address space

    The 64-bit PowerPC architecture antedated the RISC AS/400's, as far as I know - as I remember, I saw a PowerPC architecture manual describing 64-bit mode before the RISC AS/400's came out (it was some time in 1994, I think, when I saw it).

    The PowerPC 620 was supposed to be the first 64-bit PowerPC; I don't know whether any machines shipped with it. IBM now have 64-bit PowerPC's in both the AS/400 and RS/6000 machines (I think some of the RS/6000's use the same chip as some of the AS/400's, with the tag bits and other AS/400 extensions disabled in the RS/6000's).

  • I completely understand your point and agree, but if I may offer an example or two of Slashdot reporting:

    http://slashdot.org/article.pl?sid=01/02/21/1450234&mode=thread [slashdot.org]
    "it looks like NetBSD could give Linux a run for its money in the handheld arena."

    http://slashdot.org/bsd/01/02/05/1859221.shtml [slashdot.org]
    " 'Linux 2.4.0 is available for no money. So is FreeBSD. Linux uses advanced hardware, so does FreeBSD. FreeBSD is more stable and faster than Linux, in my opinion. "

    Basically the precedent is that it is acceptable to be inflammatory as long as you aren't Linux. A majority of the articles comparing BSD and Linux do so on a well-known point: stress under high loads. Notice that Slashdot does not post articles comparing native application support, user base, or multi-processor support. Posting such articles or comments will likely be considered inflammatory.

    In posting this, I am in no way trying to start a flamewar. However, I do feel Slashdot holds a double standard in how it treats BSD remarks, especially on the front page. Being immature and biased is useless, regardless of OS choice. Thoughts?
  • IBM doesn't want to see Linux ported to the Itanium but not to the PPC in 64-bit mode. If that were to happen, it would pretty much kill support for the chip outside of things like the AS/400.

    Far better for them to put the work into ensuring a stable port to their new chip. Now all they need to do is to wait for Intel to put out a sickly version of the Itanium (like they did with the first release of the P4).
    --

  • You can check http://www.kernel.org/ for a list, and you can also look in the latest kernel source: check the directory /usr/src/linux/arch.
  • by rgmoore ( 133276 ) <glandauer@charter.net> on Sunday March 04, 2001 @11:35AM (#385524) Homepage

    This is probably IBM anticipating. After all, just because there's no demand now doesn't mean that there won't be demand when the system is available. Getting the system ready ahead of demand is smart; it means that when people running PPC want more horsepower, IBM will be able to provide them with a nice smooth path to 64 bit PPC. This looks like it's just a regular part of IBM's Linux strategy. They want to make it available everywhere, so companies can upgrade to more and more powerful systems without having to relearn everything.

  • by Cef ( 28324 )

    Something that I can see this having a use for is boxes that are (very probably) going to reach 'end of life' in the IBM OS camp soon. True, there is Project Monterey, but some of these 64-bit machines could definitely end up lying in the dust, especially with a whole new OS and designers who may decide that the earlier systems are too 'hard' to support easily.

    I'd much rather see IBM make sure that something decent still runs on older boxes, than having no shipping OS at all that will support such platforms.

    And I'm not saying IBM will drop support for these platforms anytime soon, but it's much easier to get the thing started now, than later, when it could be too little, too late.

    HP's PA-RISC machines are a prime example. Linux runs on these, as various numbers of the machines fell into the hands of Linux developers with experience in kernel code and porting. Unfortunately the machines never really made it, but at least the people out there have something that runs and is probably better supported than what you would otherwise get. It was the right thing, just too late in the game. At least IBM aren't making the mistake of leaving people lingering with unusable hardware.

    Also remember that IBM has a considerably large Second Hand group that reconditions trade-in systems and then sells them to more disadvantaged groups or countries - and if they don't have an OS to ship on those machines, what do they do with them?

    Of course, the contributions they get from some of the truly bright sparks in the community via the Linux port may actually improve the way their own code monkeys write/implement their next OS kernel, which they can only view as a win-win situation.

  • Where do you get your figures? (Everybody does not know that, yadda, yadda...)

    Don't fall into the arrogant assumption of thinking one architecture is enough for everything. You wouldn't want just VGA graphics now, would you?

    The 370 architecture is alive and quite well, thank you, and processing payroll, accounting and other mundane crap that you can't live without.

    But you wouldn't want to have to write a game for a 3279 terminal now, would you? No more than you'd want to bank with somebody who'd balance your accounts on a PS2.

    The PPC architecture is alive and well and the G4 is very useful for some types of processing and totally useless for other things but what it does, it does damn fast.

    The x86 is as much of a dead-end as the z80. It will be utterly swamped by the requirements of voice processing and image recognition that a wired economy needs. Forget passwords. Just say your name and smile for the cam. (And that's only the first app. The one at the gate, so to speak.)
  • Doing this kind of parallelism extraction in the compiler just plain makes more sense than doing it on the chip (with SMT). The compiler can see all of the source code at once and spend a huge amount of time studying the problem (which can be extremely complicated, if you want a really good, inter-procedural, flow-sensitive analysis) and then spit out code that bundles it up in explicit parallelism.
    That's exactly why IA-64 is going to kick the crap out of other architectures 3 years down the line (once the compilers actually get good). And it's also why RedHat is screwing over their users by sticking with gcc over SGI's (GPL'ed) IA-64 compiler. As the first author said, conventional compilers get only about 2.x IPC at best, so two-thirds of the Itanium execution units are wasted when you use a compiler like gcc. SGI's, on the other hand, was redesigned from the ground up (starting with the gcc parser for compatibility) to use all of the neat, theoretical tricks that you need to get ILP in this situation. TurboLinux has already gone with it and demonstrated good results (that's one reason why NCSA will be using Turbo for the second stage of their huge, new cluster). But gcc is Cygnus' baby, and they will fight to keep using it, no matter how badly it hurts performance in the end.
    --JRZ
  • Doing this kind of parallelism extraction in the compiler just plain makes more sense than doing it on the chip (with SMT). The compiler can see all of the source code at once and spend a huge amount of time studying the problem (which can be extremely complicated, if you want a really good, inter-procedural, flow-sensitive analysis) and then spit out code that bundles it up in explicit parallelism.

    You seem to have an incomplete picture of what SMT is.

    SMT - Simultaneous Multi-Threading - is simply the ability to have multiple threads running on a chip at the same time, with separate fetch units and register files but with the instruction window and the functional units still shared.

    The threads don't even have to be from the same program, or in the same address space (though it'll reduce TLB and cache load if they are).

    No extra effort is needed on the part of the programmer, and you get N times as much instruction level parallelism with N threads as you would for one thread. In one instruction stream, you'll always have dependencies that can't be avoided - true dependencies. Parallel threads don't have any shared dependencies for register operations, and are much less likely to have dependencies for memory operations (under most conditions).

    A compiler, on the other hand, has to be made extremely complex to extract much more parallelism than is currently extracted, and still won't be able to capture a lot of it. I know this far too well, having seen the guts of compilers on a few occasions. You'll also get no benefit for legacy code or for code that was compiled with a mediocre compiler (as almost all code is, to Intel's continuing dismay).

    SMT is especially nice because there's almost no extra hardware overhead for implementing SMT. It's a winning strategy from all angles.
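    For what it's worth, here is a small, purely illustrative C sketch of the kind of true dependency the parent is describing; it isn't tied to any particular chip. A single thread running this loop can't issue the adds in parallel no matter how wide the machine is, whereas two independent threads (say, each summing its own array) share no registers between them, which is exactly the slack SMT fills:

      /* Loop-carried (true) dependency: each add needs the previous sum. */
      double running_sum(const double *x, int n)
      {
          double s = 0.0;
          for (int i = 0; i < n; i++)
              s += x[i];
          return s;
      }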
  • Well, a 64-bit integer solves the Y2038 bug inherent in Unix.

    +++
