Porting Linux Software to the IA64 Platform
axehind writes "In this Byte.com article, Dr. Moshe Bar explains some of the differences between IA32 and IA64. He also explains some things to watch out for when porting applications to the IA64 architecture."
size_t (Score:2, Informative)
return (char *) ((((long) cp) + 15) & ~15);
is not portable.
return (char *) ((((size_t) cp) + 15) & ~15);
is much better.
The more things change ..... (Score:3, Informative)
When things were shifting from 16 to 32 bit (seems like just yesterday, oh wait, for M$ it was just yesterday), we had pretty much the same issues. Never had to do any 8 -> 16bit ports (since pretty much everything was either in BASIC, where it didn't matter, or assembler, which you couldn't "port" anyway).
Speaking of assembler, I guess the days of hand-crafting code in assembler are really going to take a hit if IA64 ever takes off. The assembler code would be so tied to a specific rev of EPIC that it would be hard to justify the future expense of doing so. It would be interesting to see what type of tools are available for the assembler developer. Does the chip provide any enhanced debugging capabilities (keeping writes straight at a particular point in execution, can you see speculative writes too?). It'd be cool if the assembler IDE could automagically group parallelizable (is that a word?) instructions together as you are coding.
Debian on the IA64 (Score:5, Informative)
See here [debian.org] for more details
Re:size_t (Score:1, Informative)
What he doesn't mention is that most Linux people have gcc, and last time I looked, the object code produced by gcc on IA64 ran at only about 20% of the speed of the Intel compiler's. This isn't a criticism of gcc; it's just that the IA64 arch. is so different that you absolutely _must_ have the Intel compiler to get any performance out of it.
NULL barfage (Score:3, Informative)
Re:Awesome! (Score:3, Informative)
Maybe you could try the patches here [debian.org]?
Re:The more things change ..... (Score:4, Informative)
But the example you mention won't actually cause assembly writers any problems: the code won't be tied to a specific version of EPIC.
The IA-64 assembly contains so-called "stop bits", which specify that the instruction(s) following the bit cannot be run in parallel with those before the bit.
Those bits have nothing to do with the actual number of instructions that the machine is capable of handling.
For example, if a program consisted of 100 independent instructions, the assembly would not contain any stop bits. Now the actual machine implementation might only handle 2 or 4 or 8 instructions at a time, but that does not appear anywhere in the assembly. The only requirement is that the machine respect the stop bits.
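The stop-bit idea is easiest to see in IA-64 assembly syntax, where `;;` marks an instruction-group boundary. A minimal sketch (register numbers chosen arbitrarily for illustration):

```asm
ld8   r14 = [r32]      // load a value from memory
add   r15 = r16, r17   // independent: same instruction group as the load
;;                     // stop bit: everything above may issue in parallel
add   r18 = r14, r15   // depends on both results, so it sits after the stop
```

Whether the machine actually issues two, four, or eight of those instructions per cycle is an implementation detail; the assembly only promises that the group boundary is respected.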
Now, you might question how it deals with load-value dependencies (ie. load a value into a register, use that register). Obviously, the load and use must be on different sides of a stop bit, but that would still not guarantee correctness. I'm not sure how IA64 actually works (and someone should reply with the real answer) but I imagine that either: a) loads have a fixed max latency, and the compiler is required to insert as many stop bits between the load and the use to ensure correctness, or b) the machine will stall (like current machines).
Either way, the whole point of speculative loads is to avoid that being a problem.
Re:Debian on the IA64 (Score:3, Informative)
Topic for #debian-ia64 is 95.70% up-to-date, 96.07% if also counting uploaded pkgs
There are over 8000 packages for i386 (the most up to date architecture) - ia64 currently has about 7650 or so packages built
More stats are available at buildd.debian.org/stats/ [debian.org]
Re:What's the deal with IA64? (Score:2, Informative)
Look for Sun and/or IBM to be selling 8-way Hammer machines by this time next year, according to my Spirit Guides.
Re:Why can't i386 assembler be used? (Score:3, Informative)
In any case, what makes it difficult to write an IA-64 compiler is taking advantage of the things that the new instruction set lets you tell the processor. It's not hard to write code for the IA64 that's as good as some code for the i386. It's just that you won't get the benefits of the new architecture until you write better code, and the processors aren't optimized for running code that doesn't take advantage of the architecture.
Re:Why can't i386 assembler be used? (Score:4, Informative)
Yes. Compatibility. Nothing more. Your old apps will run, but not fast. It's basically a bullet point to try to make the transition to Itanium sound more palatable.
Or is there some factor that means the choice of 32 bit vs 64 bit code must be made process-by-process?
It is highly likely that the procedure to change from 64- to 32-bit mode is a privileged operation, meaning you need operating system intervention. The operating system would therefore have to provide an interface for user code to switch modes, just so a small block of inline assembly could be executed. I highly doubt such an interface exists (ick... IA-64-specific syscalls).
Interesting question: which would run faster, hand-optimized i386 code running under emulation on an Itanium, or native IA-64 code produced by gcc?
An interesting question, but one for which the answer is clear: gcc will be faster, and by a lot. Itanium is horrible at 32-bit code. It isn't designed for it, it has to emulate it, and it stinks a lot at it.
They say that writing a decent IA-64 compiler is difficult, and I'm sure Intel has put a lot of work into making the backwards compatibility perform at a reasonable speed (if not quite as fast as a P4 at the same clock).
Writing the compiler is difficult, but a surmountable task. And your surety does not enhance IA-64 32-bit support in any way. It is quite poor, well behind a P4 at the same clock, and of course at a much lower clock. Even with a highly sub-optimal compiler going up against top-notch x86 assembly, you're better off going native on Itanium.
Re:No FP in kernel? (Score:4, Informative)
1/ The massive amount of FP state in IA-64 (128 FP registers). So the Linux kernel is compiled in such a way that only some FP registers can be used by the compiler. This means that on kernel entry and exit, only those FP registers need to be saved/restored. Also, by software convention, these FP registers are "scratch" (modified by a call), so the kernel need not save/restore them on a system call (which is seen as a call by the user code).
2/ The "software assist" for some FP operations. For instance, the FP divide and square root are not completely implemented in hardware (it's actually dependent on the particular IA-64 implementation, so future chips may implement it). For corner cases such as overflow, underflow, infinites, etc, the processor traps ("floating-point software assist" or FPSWA trap). The IA-64 Linux kernel designers decided to not support FPSWA from the kernel itself, which means that you can't do a FP divide in the kernel. I suspect this is what is more problematic for the application in question (load balancer doing FP computations, probably has some divides in there...)
XL: Programming in the large [sf.net]
Re:What's the deal with IA64? (Score:3, Informative)
What SPEC needs to benchmark is SPECInt-per-$. Considering that commodity Athlons, Pentiums, Celerons and Durons handily beat the extremely expensive Itanic in a straight SPECInt benchmark, what's the advantage of the IA64 performing more efficiently per MHz?
It was very silly of Intel to graft a 386 unit onto the IA64 chip, that's for sure. Fast int ops are important for running databases. They are essential in supporting that 64-bit I/O.
That's been Intel's promise since they announced the chip project many, many, many years ago. They also promised that the chip would be inexpensive. It isn't very fast, and it isn't a good value compared to today's 32-bit commodity CPUs.
From what I've read, the Itanic scales in a way very similar to the Hammer -- 8 CPUs at a time, and if you want more then you have to run a pipe between each group of eight. Hammer claims a Hypertransport link between each set with a one-cycle wait state (Intel simply calls theirs a pipe), but really, anything more than 8-way is still going to be the realm of POWER4, UltraSparc, etc., IMO. To tell the truth, the Itanic and the x86-64 will have very similar scalability, and the x86-64 is less than half the die size of the Itanic and better performing. Its NUMA setup gives greater throughput between multiple CPUs in an 8-way or less. It may be ugly on the inside, but both CPUs do about the same thing. And one will be faster and a whole lot cheaper. And don't forget AMD's 4-way chipset. The Taiwanese motherboard makers are going to be moving into that space with this chipset. Commoditization.
Well, just take a 32-bit commodity CPU and kludge it to 64 bits, gain about a 25% speedup in doing so, and SELL IT FOR AROUND $400 maximum, and you will quickly see that the Itanic is sinking! Sure the x86 instruction set is lame, but that's the roll of the dice. If the Motorola 68000 had been chosen by IBM for the PC, we would be singing the same tune. I think the x86 instruction set will be around ad infinitum. Just like the accelerator pedal is on the right side, the clutch is on the left and the brake pedal is in the middle. Totally arbitrary, but it somehow stuck.
The Itanic wasn't a piece of crap 5 years ago, but it is obsolete today. Intel raves about its "266MHz" memory bus and its 66MHz 64-bit PCI support. You can get this in a commodity motherboard and two Athlon CPUs for around $600. You can get the Pentium 4 with a 133MHz quad-pumped memory bus nowadays. The Itanic's parallel execution method is nice, but why did they wait till the CPU was released before they began making compilers that took advantage of it? Completely useless without the right tools (assuming decent tools can be made).