Linux x32 ABI Not Catching Wind

Posted by Soulskill
from the try-a-bigger-sail dept.
jones_supa writes "The x32 ABI for Linux allows the OS to take full advantage of an x86-64 CPU while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers. Though the x32 ABI limits the program to a virtual address space of 4GB, it also decreases the memory footprint of the program and in some cases can allow it to run faster. The ABI has been talked about since 2011 and there's been mainline support since 2012. x32 support within other programs has also trickled in. Despite this, there still seems to be no widespread interest. x32 support landed in Ubuntu 13.04, but no software packages were released. In 2012 we also saw some x32 support out of Gentoo and some Debian x32 packages. Besides the kernel support, we also saw last year the support for the x32 Linux ABI land in Glibc 2.16 and GDB 7.5. The only Linux x32 ABI news Phoronix had to report on in 2013 was of Google wanting mainline LLVM x32 support and other LLVM project x32 patches. The GCC 4.8.0 release this year also improved the situation for x32. Some people don't see the ABI as being worthwhile when it still requires 64-bit processors and the performance benefits aren't very convincing for all workloads to make maintaining an extra ABI worthwhile. Would you find the x32 ABI useful?"

  • no (Score:4, Insightful)

    by Anonymous Coward on Tuesday December 24, 2013 @07:24PM (#45778949)

    no

    • Catching Wind

      LOL

    • Re:no (Score:5, Insightful)

      by mlts (1038732) on Tuesday December 24, 2013 @09:27PM (#45779627)

      For general computing, iffish.

      For embedded computing where I am worried about every chunk of space, and I can deal with the 3-4 GB RAM limit, definitely.

      This is useful, and IMHO, should be considered the mainstream kernel, but it isn't something everyone would use daily.

    • Re:no (Score:5, Insightful)

      by GPLHost-Thomas (1330431) on Wednesday December 25, 2013 @06:16AM (#45781123)
      Well, I do find it extremely useful. Especially in Debian & Ubuntu, we have multi-arch support. For some specific workload using interpreted languages, it just reduces the memory footprint by a half. For example, PHP and Perl. If you once ran Amavis and spamassassin, you certainly know what I mean: it takes double the amount of RAM on 64 bits. Since most of our servers are running PHP, Amavis and Spamassassin, this would be a huge benefit (from 800 MB to 400 MB as the minimum server footprint), while still being able to run the rest of the workloads using 64 bits: for example, Apache itself and MySQL, which aren't taking much RAM anyway compared to these anti-spam dogs.
  • Subject (Score:2, Insightful)

    by Daimanta (1140543)

    With memory being dirt cheap I ask: Who cares?

    • Re:Subject (Score:4, Insightful)

      by mellon (7048) on Tuesday December 24, 2013 @07:48PM (#45779103) Homepage

      Memory? What about cache? Is cache dirt cheap?

      • by Bengie (1121981)
        Yes and no. The larger your cache, the higher its latency. Can't get around this. L1 caches tend to be small to keep the execution units fed with typically 1 or 2 cycle latencies. L2 caches tend to be about 16x larger, but have about 10x the latency.

        L2 cache may have high latency, but it still has decent bandwidth. To help hide the latency, modern CPUs have automatic pre-fetching and also async low-priority pre-fetching instructions that allow programmer to tell the CPU to attempt to load data from memor
    • Re:Subject (Score:5, Interesting)

      by KiloByte (825081) on Tuesday December 24, 2013 @07:49PM (#45779105)

      For some workloads, it's ~40% faster vs amd64, and for some, even more than that vs i386. In the typical case, though, expect a ~7% speed and ~35% memory boost over amd64.

      As for memory being cheap, this might not matter on your home box where you use 2GB of 16GB you have installed, but vserver hosting tends to be memory-bound. And using bad old i386 means a severe speed loss due to ancient instructions and register shortage.

      • by Tim12s (209786)

        That seems a reasonable advantage. If it could take me from 60K tps to 100K tps per blade it's a no-brainer. I doubt it's going to allow office/home applications to run any noticeably quicker, but with a blade centre of 16 blades, I'll want to get my money's worth before needing to expand.

      • by mwvdlee (775178)

        ~35% memory boost is quite nice if you're running memory-bound multithreaded processes, with each thread being relatively light on CPU% but using lots of memory.
        I run a webserver where one of the batch jobs is exactly that. ~35% memory boost would be very close to ~35% increase in throughput.

    • Re:Subject (Score:4, Interesting)

      by Evan Teran (2911843) on Tuesday December 24, 2013 @07:49PM (#45779107) Homepage

      It's not just about "having enough RAM". While that certainly is a factor, it's not the only one. As you suggest, pretty much everyone has enough RAM to run just about any normal application with 64-bit pointers.

      But if you want speed, you also have to pay attention to things like cache lines. 64-bit pointers often mean larger instructions are needed to do the same work, and larger instructions mean more cache misses. This can be a large difference in performance.

    • Desktop memory is cheap but ECC server memory can be very expensive
      • by haruchai (17472)

        Damn straight. Just spent $1000 for 16 used 4GB sticks of HP DDR3 ECC registered memory; that's considered a bargain. New sticks would be $120 each.

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        ECC memory is artificially expensive. Were ECC standard as it ought to be, it would only cost about 12.5% more. (1 bit for every byte) That is a pittance when considering the cost of the machine and the value of one's data and time. It is disgusting that Intel uses this basic reliability feature to segment their products.

        • That's right. Unfortunately it's called the market. The same boneheads who say x32 isn't worth it are the same boneheads who have no idea how important ECC is, or how hard it is to properly code everything while worrying about cache hits. Probably people that never wrote a single line of C or assembly code.

          But the Intel way of making the same physical hardware cost 50% more (with a simple on/off switch) will continue until ARM Cortex start giving intel some real competition (at least competing with the lates

    • This is what Apple needs for its silly 64 Bit mobile processors & OS.

    • Re:Subject (Score:4, Informative)

      by Reliable Windmill (2932227) on Tuesday December 24, 2013 @08:45PM (#45779391)
      You've not understood this correctly. x32 is an enhancement and optimization for executable files that do not require gigabytes of RAM, primarily regarding performance. It has nothing to do with the availability or lack of RAM in the system, or how much RAM costs to buy in the computer store.
    • The other applications running on your system who also want to use memory and are programmed by people who don't care about resource utilization. Or the other VM, or the 8 other VMs.

      Processor power, memory and disk space should be considered like earth's natural resources. They're in limited supply and should never be wasted no matter how available they may seem at the present.

    • Re:Subject (Score:5, Informative)

      by Forever Wondering (2506940) on Wednesday December 25, 2013 @12:03AM (#45780287)

      With x32 you get:
      - You get 16 registers instead of 8. This allows much more efficient code to be generated because you don't have to dump/reload automatic variables to the stack because the register pressure is reduced.
      - You also get a crossover from the 64 bit ABI where the first 6 arguments are passed in registers instead of push/pop on the stack.
      - If you need a 64 bit arithmetic op (e.g. long long), compiler will gen a single 64 bit instruction (vs. using multiple 32 bit ops).
      - You also get the RIP relative addressing mode which works great when a lot of dynamic relocation of the program occurs (e.g. .so files).

      You get all these things [and more] if you port your program to 64 bit. But, porting to 64 bit requires that you go through the entire code base and find all the places where you said:
          int x = ptr1 - ptr2;
      instead of:
          long x = ptr1 - ptr2;
      Or, you put a long into a struct that gets sent across a socket. You'd need to convert those to ints.
      Etc ...

      Granted, these should be cleaned up with abstract typedef's, but porting a large legacy 32 bit codebase to 64 bit may not be worth it [at least in the short term]. A port to x32 is pretty much just a recompile. You get [most of] the performance improvement for little hassle.

      It also solves the 2038 problem because time_t is now defined to be 64 bits, even in 32 bit mode. Likewise, in struct timeval, the tv_sec field is 64 bit.
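The "long in a struct sent across a socket" hazard mentioned above can be sidestepped with fixed-width types. What follows is only a minimal sketch: `wire_header`, `put_u32` and `encode_header` are hypothetical names invented for illustration, not part of any real protocol or library. The point is that explicit widths plus field-by-field encoding give the same bytes on i386, x32 and amd64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire header: every field has an explicit width, so its
 * meaning does not change between i386, x32 and amd64 builds. */
struct wire_header {
    uint32_t id;
    uint32_t length;
    int64_t  timestamp;   /* seconds; 64 bits whatever time_t happens to be */
};

#define WIRE_HEADER_BYTES 16

/* Store a 32-bit value in big-endian (network) byte order. */
void put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Serialise field by field rather than sending the raw struct, so that
 * neither padding nor host byte order leaks onto the wire. */
void encode_header(const struct wire_header *h,
                   unsigned char out[WIRE_HEADER_BYTES])
{
    uint64_t ts;

    assert(h != NULL && out != NULL);
    ts = (uint64_t)h->timestamp;

    put_u32(out, h->id);
    put_u32(out + 4, h->length);
    put_u32(out + 8, (uint32_t)(ts >> 32));   /* high half of timestamp */
    put_u32(out + 12, (uint32_t)ts);          /* low half of timestamp */
}
```

A receiver built for a different ABI can decode the same 16 bytes, which a struct containing a plain `long` would not guarantee.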

      • Re:Subject (Score:4, Informative)

        by TheRaven64 (641858) on Wednesday December 25, 2013 @06:15AM (#45781119) Journal
        The C standard does not guarantee that sizeof(long) is as big as sizeof(void*). The type that you want is intptr_t (or ptrdiff_t for differences between pointers). If you've gone through replacing everything with long, then good luck getting your code to run on win64 (where long is 4 bytes).
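As a rough illustration of the advice above, here is a sketch that uses `ptrdiff_t` for a pointer difference and `uintptr_t` for a pointer-to-integer round trip. The function names are made up for this example, and `uintptr_t` is formally an optional type in C, although mainstream platforms provide it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements between two pointers into the same array.
 * ptrdiff_t is the type the language defines for this result, so it is
 * the right width on i386, x32, amd64 and win64 alike. */
ptrdiff_t element_distance(const int *begin, const int *end)
{
    assert(begin != NULL && end != NULL);
    return end - begin;
}

/* Round-trip a pointer through an integer. uintptr_t is defined to be
 * able to hold any void*, unlike long, which is 4 bytes on win64. */
int survives_integer_round_trip(const void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (const void *)bits == p;
}
```

Code written this way should recompile for x32, amd64 or win64 without the `int`/`long` audit described in the parent comment.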
  • Eh? (Score:4, Insightful)

    by fuzzyfuzzyfungus (1223518) on Tuesday December 24, 2013 @07:27PM (#45778973) Journal
    If I wanted to divide my nice big memory space into 32-bit address spaces, I'd dig my totally bitchin' PAE-enabled Pentium Pro rig out of the basement, assuming the rats haven't eaten it...
  • Some people don't see the ABI as being worthwhile when it still requires 64-bit processors

    There's your answer. If I'm writing a program that won't need over 2GB, the decision is obvious: target x86. How many developers even know about x32? Of those, how many need what it offers? That little fraction will be the number of users.

    • by loufoque (1400831)

      This way you'll be able to make it magically much faster when building it for x32 or amd64.

    • Some people don't see the ABI as being worthwhile when it still requires 64-bit processors

      There's your answer. If I'm writing a program that won't need over 2GB, the decision is obvious: target x86. How many developers even know about x32? Of those, how many need what it offers? That little fraction will be the number of users.

      Wait, what are you talking about? "target x86" Wat? Are you writing code in Assembly? How do you target C or higher level code for x86 vs x86-64, or ARM for that matter?

      Ooooh, wait, you're one of those proprietary Linux software developers? Protip: 1's and 0's are in infinite supply, so Economics 101 says they have zero price regardless of cost to create. What's scarce is your ability to create new configurations of bits -- new source code -- not the bits. Just like a mechanic, home builder, burg

  • Nice concept (Score:3, Insightful)

    by Anonymous Coward on Tuesday December 24, 2013 @07:34PM (#45779025)

    I do not see many cases where this would be useful. If we have a 64-bit processor and a 64-bit operating system then it seems the only benefit to running a 32-bit binary is it uses a slightly smaller amount of memory. Chances are that is a very small difference in memory used. Maybe the program loads a little faster, but is it a measurable, consistent amount? For most practical use case scenarios it does not look like this technology would be useful enough to justify compiling a new package. Now, if the process worked with 64-bit binaries and could automatically (and safely) decrease pointer size on 64-bit binaries then it might be worth while. But I'm not going to re-build an application just for smaller pointers.

    • by mjrauhal (144713)

      You misunderstand the desired impact. "Loads a little faster" doesn't really enter into it. It's rather that system memory is _slow_, and you have to cram a lot of stuff into CPU cache for things to work quickly. That's where the smaller pointers help, with some workloads. Especially if you're doing a lot of pointery data structure heavy computing where you often compile your own stuff to run anyway.

      Still not saying it's necessarily worth the maintenance hassle, but let's understand the issues first.
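To make the cache argument concrete, here is a small sketch with a hypothetical binary-tree node (`tree_node` and `nodes_per_cache_line` are names invented for this example). With typical alignment the node is 24 bytes under LP64 (amd64) and 12 bytes under an ILP32 ABI such as x32 or i386, so a 64-byte cache line holds 2 nodes in the first case and 5 in the second. Actual speedups depend on the workload and are not something this sketch measures.

```c
#include <assert.h>
#include <stddef.h>

/* A "pointery" node: two links plus a small payload.
 * Typical size: 8 + 8 + 4 (+4 padding) = 24 bytes on amd64,
 *               4 + 4 + 4             = 12 bytes on x32 or i386. */
struct tree_node {
    struct tree_node *left;
    struct tree_node *right;
    int key;
};

/* How many whole nodes fit in one cache line of the given size. */
size_t nodes_per_cache_line(size_t line_bytes)
{
    assert(line_bytes > 0);
    return line_bytes / sizeof(struct tree_node);
}
```

Calling `nodes_per_cache_line(64)` in builds for each ABI shows the difference directly.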

    • Re: (Score:3, Informative)

      by maswan (106561)

      The main benefit is that it runs faster. 64-bit pointers take up twice the space in caches, and especially L1 cache is very space-limited. Loading and storing them also takes twice the bandwidth to main memory.

      So for code with lots of complex data types (as opposed to big arrays of floating point data), that still has to run fast, it makes sense. I imagine the Linux kernel developers' No. 1 benchmark of compiling the kernel would run noticeably faster with gcc in x32.

      The downside is that you need a proper fully

      • by sribe (304414)

        So for code with lots of complex data types (as opposed to big arrays of floating point data), that still has to run fast, it makes sense.

        Well, here's the problem. Code that is that performance-sensitive can often benefit a whole lot more from a better design that does not have so many pointers pointing to itty-bitty data bits. (For instance, instead of a binary tree, a B-tree with nodes that are at least a couple of cache lines, or maybe even a whole page, wide.) There are very very few problems that actually require that a significant portion of data memory be occupied by pointers. There are lots and lots of them where the most convenient

      • by Rockoon (1252108)

        64-bit pointers take up twice the space in caches, and especially L1 cache is very space-limited.

        L1 cache is typically 64KB, which is room for 8K 64-bit pointers or 16K 32-bit pointers. Now riddle me this.. if you are following thousands or more pointers, what are the chances that your access pattern is at all cache friendly?

        The chance is virtually zero.

        Of course, not all of the data is pointers, but that actually doesnt help the argument. The smaller the percentage of the cache that is pointers, the less important their size actually is, for after all when 0% are pointers then pointer size cannot

    • by LWATCDR (28044)

      Simple.
      It is just as fast.
      Takes less drive space.
      Uses less memory.
      As to rebuilding apps, it should be just a simple compile, and yes, while memory is cheap it is not always available even today. What about x86 tablets on Atom? I mean, really, does ls need to be 64-bit? What about more?

  • by 93 Escort Wagon (326346) on Tuesday December 24, 2013 @07:37PM (#45779045)

    The maintainer(s) find it interesting, and they're developing it on their own dime... so I don't get the hate in some of these first few posts. No one's forcing you to use it, or even to think about it when you're coding something else.

    If it's useful to someone, that's all that matters.

  • It's not only RAM (Score:5, Informative)

    by jandar (304267) on Tuesday December 24, 2013 @07:41PM (#45779071)

    The company I work for compiles almost all programs as 32-bit on x86-64 CPUs. It's not only cheap RAM usage, it's also expensive cache which is wasted with 64-bit pointers and 64-bit ints. Since 3 GB is much more than our programs are using, x86-64 would be foolish. I'm eagerly waiting for an x32 SuSE version.

    • by Ecuador (740021)

      I don't get it. x86-64 doubles the general purpose and SSE registers over x86. This alone makes a (usually quite big) difference even for programs that don't use 64bit arithmetic. The point of the x32 ABI as I understand it is to keep that advantage without having 64bit pointers.
      But you just compile with 32bits losing all the advantages of x86-64?

  • by bheading (467684) on Tuesday December 24, 2013 @07:43PM (#45779081)

    The idea makes sense in theory. Build binaries that are going to be smaller (32-bit binaries have smaller pointers compared with 64-bit) and faster (because the code is smaller, in theory cache should be used more efficiently and accesses to external memory should be reduced).

    But I suspect the problem is that the benefits simply outweigh the inconvenience of having to run with an entirely separate ABI. I doubt the average significant C program spends a lot of time doing direct addressing, and as such I suspect the size benefits of using 32-bit pointers is overstated.

    • by mysidia (191772)

      But I suspect the problem is that the benefits simply outweigh the inconvenience of having to run with an entirely separate ABI.

      Well; if the benefits outweigh the inconvenience --- then it seems x32 should be catching on more than it is.

      Personally I think it is a bad idea because of the 4GB program virtual address space limit; which applications will be frequently exceeding, especially the server applications that would otherwise benefit the most from optimization.

      • by bheading (467684)

        Oops, I meant the other way round. The inconvenience outweighs the benefit.

      • by tlhIngan (30335)

        Personally I think it is a bad idea because of the 4GB program virtual address space limit; which applications will be frequently exceeding, especially the server applications that would otherwise benefit the most from optimization.

        You're making an assumption that the 4GB limit is prohibitive. For some applications, it could be - databases and scientific processing, and definitely games. But there are plenty of other applications that won't really benefit from the enlarged address space - would a word proc

    • and faster (because the code is smaller, in theory cache should be used more efficiently

      Your skill is not enough. When you blow registers onto the stack the code crawls. x86-64 has more registers. Code compiled for it is far faster than x86 because of the extra registers. The L1 cache is how big on your CPU? Is your binary MEGABYTES in size? If your code is jumping all over the digital universe generating cache misses then you're purposefully doing something more idiotic than this universe should care about.

  • It depends on the delta. There are still many 32bit problems out there, and there are plenty of cases where having extra performance helps. If you have enough of the right size problems you could even reduce the number of systems that you would need.

    It looks like it could allow packing a single system tighter with less wasted resources.

    Reducing the footprint of individual programs could also have some benefits from system performance / management, especially in tight resource situations.

    One minor drawback

  • by KiloByte (825081) on Tuesday December 24, 2013 @07:56PM (#45779145)

    debootstrap --arch=x32 unstable /path/to/chroot http://ftp.debian-ports.org/debian/ [debian-ports.org]
    Requires an amd64 kernel compiled with CONFIG_X86_X32=y (every x32-capable kernel can also run 64 bit stuff).
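Inside such a chroot, a program can check which ABI it was built for. The sketch below relies on compiler-predefined macros: GCC defines `__x86_64__` for both amd64 and x32, and additionally `__ILP32__` when building with `-mx32`. The `enum abi`, `detect_abi` and `abi_name` names are hypothetical, and other compilers may define different macros.

```c
#include <assert.h>
#include <stddef.h>

enum abi { ABI_X32, ABI_AMD64, ABI_I386, ABI_OTHER };

/* Decide at compile time which x86 ABI this translation unit targets. */
enum abi detect_abi(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    assert(sizeof(void *) == 4);   /* x32: 64-bit ISA, 32-bit pointers */
    return ABI_X32;
#elif defined(__x86_64__)
    assert(sizeof(void *) == 8);   /* ordinary amd64 */
    return ABI_AMD64;
#elif defined(__i386__)
    return ABI_I386;
#else
    return ABI_OTHER;
#endif
}

/* Human-readable name for the detected ABI. */
const char *abi_name(enum abi a)
{
    switch (a) {
    case ABI_X32:   return "x32";
    case ABI_AMD64: return "amd64";
    case ABI_I386:  return "i386";
    default:        return "other";
    }
}
```

Building the same file with `gcc -m32`, `gcc -mx32` and `gcc -m64` should yield the three x86 results, provided the matching libc is installed for each target.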

  • This could have a home on smart phones. A smaller memory footprint is *key* on smartphone apps.

  • There's plenty of applications around still without a 64 bit binary. From what I understand this layer just allows 32 bit programs to utilize some performance enhancing features of 64 bit architecture. It seems a genuinely good idea.

    • by cnettel (836611)

      There's plenty of applications around still without a 64 bit binary. From what I understand this layer just allows 32 bit programs to utilize some performance enhancing features of 64 bit architecture. It seems a genuinely good idea.

      It allows 32-bit programs, which are *recompiled*, to benefit from those features. You still need the source and x32 builds of all dependencies. However, sometimes I guess there could be porting issues due to pointer size assumptions (but no other hard assumptions of x86 ABI behavior). Those codebases could not be recompiled for x64, but might port to x32 more easily.

  • x32 would have been nice as the first transition away from x86-32, but memory needs keep increasing, and we are far too used to full 64-bit spaces. In fact, it feels like we're finally over with the 32-64 bit transition, and people no longer worry about different kinds of x86 when buying new hardware. So introducing this alternative is a needless complication. As others have pointed out, it's too special a niche to warrant its own ABI.

    • It's not a complication, it's an enhancement. A majority of software does not need a 64-bit address space and can thus be streamlined while still getting the benefits of doing fast 64-bit integer math, among other things. Obviously you just select the target when compiling and that's that, it's like enabling an optimization, so what are you talking about?
  • General question about x32 ABI: can the OS still use more than 4GB RAM w/o penalties? IOW, is the kernel still 64-bit? Is only userspace x32? Or can x32 and pure 64-bit run alongside each other?

    Anyway. Most performance-sensitive programs went 64-bit anyway, since RAM is cheap and there are a bunch of faster but memory-hogging algorithms.

    • by mjrauhal (144713)

      The kernel needs to be an amd64 one for x32 to work, at least as things stand now. The most common situation would _probably_ be an amd64 system with some specialist x32 software doing performance intensive stuff. (Or possibly a hobbyist system running an all-x32 userspace for the hack value.)

      Yeah, working with big data is unlikely to benefit, and data _is_ generally getting bigger.

    • Of course the OS is still 64-bit in that regard, it's just the address space of that particular application which is reduced to 32-bit to streamline it. The majority of all executable files do not require several gigabytes of RAM, hence it makes sense to streamline their address space.
      • The majority of all executable files do not require several gigabytes of RAM, hence it makes sense to streamline their address space.

        I know that. Many commercial *NIX systems are doing it. Though... Having a 32-bit "cat" doesn't really change anything.

        That's why I mentioned the memory-hungry algorithms. Many applications are doing it these days. Needless to mention that java these days is started almost exclusively with the "-d64".

        The market for 4GB address space is really small. Because modern general programming practices generally disregard the resources in general, RAM in particular. (The (number of) CPUs being the most disrega

    • I do some alternative OS development. When I setup a program to run there are 3 different 64bit modes (programming models) for me to select to run the program under: ILP64, LLP64, and LP64. In ILP64 you get 64 bit ints, longs, long longs, and pointers. In LLP64 you get 32bit longs and ints, and 64bit long longs and pointers. In LP64 you get 32bit ints, 64 bit longs, long longs and pointers. Note: All these pointers are 64 bit (but the hardware may have less bits than this, the OS will query it, code mus

  • by billcarson (2438218) on Tuesday December 24, 2013 @08:22PM (#45779267)
    Wouldn't this require all common shared libraries (glib, mpi, etc.) to be recompiled for both x86-64 and x32? What am I missing here?
    • by mjrauhal (144713) on Tuesday December 24, 2013 @08:32PM (#45779307) Homepage

      Yes it would. That's among the nontrivial maintenance costs.

      • by Arker (91948)

        Funny thing I notice in articles of this sort. There are always comments saying it's dumb because there is no point in optimising software for performance because hardware is so cheap. And there are comments like yours, complaining that having to do a recompile to achieve it is too big a burden.

        Do you see the tension between the thoughts? Because if hardware is so cheap that it is more reasonable to tell the user to upgrade his computer, rather than optimise your software, then does it not follow that same

        • by bogjobber (880402)
          Nontrivial doesn't necessarily mean large. It just means significant enough that it needs to be accounted for. The actual cost will of course be dependent on the size and complexity of your codebase.
  • Think Atom processors running Android, or High-performance computing applications. Neither of these require a huge external ecosystem, but if you get a 30-40% boost in some workload, they are worth it. It's my understanding that small-cache Atoms benefit from this more than huge Xeons.
  • And I don't want another set of libraries in my system in addition to 64 bit and 32 bit emulation.

  • by Just Brew It! (636086) on Tuesday December 24, 2013 @10:29PM (#45779933)

    This sure feels a lot like a throwback to the old 16-bit DOS days, where you had small/medium/large memory models depending on the size of your code and data address spaces. We've already got 32-bit mode for supporting pure 32-bit apps and 64-bit mode for pure 64-bit; supporting yet a third ABI is just going to result in more bloat as all the runtime libraries need to be duplicated for yet another combination of code/data pointer size.

    I hate to say this since I'm sure a lot of smart people put significant effort into this, but it seems like a solution in search of a problem. RAM is cheap, and the performance advantage of using 32-bit pointers is typically small.

  • by manu0601 (2221348)

    I understand it is the same beast as the COMPAT_NETBSD32 [netbsd.org] option that has been available in NetBSD for 15 years now. It works amazingly well: one can throw a 64 bit kernel on a 32 bit userland and it just works, except for a few binaries that rely on ioctl(2) on some special device to cooperate with the kernel.

    NetBSD even had a COMPAT_LINUX32 [netbsd.org] option for 7 years, which enables running a 32 bit Linux binary on a 64 bit NetBSD kernel. Of course the Linux ABI is a fast moving target, and one often misses the lat

    • by adri (173121)

      No, it's not the same.

      The idea is that you use the 32 bit pointer model, with 32 bit indirect instructions, but you're doing it all using the x86-64 instruction set. Ie, the task is in 64 bit mode. The 64 bit mode includes primarily more registers, so you can write / compile to tighter code.

      The stuff you described is for running 32 bit binaries that use the i386/i486/i586 instruction set, complete with the limited set of temporary registers. x86-64 has many more registers to use.

      It's not just about cache li

  • by macpacheco (1764378) on Tuesday December 24, 2013 @11:38PM (#45780175)

    It's possible to have a system with 16GB that uses only x32 userland (the kernel is still x86_64 under x32, so the kernel can see the 16GB), for instance running thousands of tasks using up to 4GB each just fine. Plus, the page cache is a kernel thing, so the I/O cache can always use all memory.

    On the other hand, there are workloads that run on a 4GB system but need x86_64 (mmapping of huge files, for instance), and some boneheaded tasks reserve tons of never-used RAM: one could actually use 1GB of RAM but reserve 8GB. The issue there really should be putting the coder in jail, but I digress.
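The huge-file case can be sketched as a simple size check. The helper names below are hypothetical, and the test is optimistic: it only compares against `SIZE_MAX`, while in practice the usable limit is lower because the program, its stack and its libraries also occupy the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Can a file of this many bytes even be described by a size_t?
 * Under x32 or i386, size_t is 32 bits, so anything past 4GB cannot be
 * mapped in one piece; under amd64 the check always passes. */
int fits_in_address_space(uint64_t file_bytes)
{
    return file_bytes <= (uint64_t)SIZE_MAX;
}

/* Convert a file size to a mapping length, refusing silent truncation. */
size_t checked_mapping_length(uint64_t file_bytes)
{
    assert(fits_in_address_space(file_bytes));
    return (size_t)file_bytes;
}
```

A program that must map files larger than this has to stay on the full 64-bit ABI or map the file in windows.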

    But the vast majority of linux workloads today that use even a 8GB system would run just fine under x32. Like 95-98%.
    And nobody is even suggesting a mainstream linux distro without x86_64 userland. I'm suggesting all standard tools using x32, but keeping the x86_64 shared libraries and compilers, so if you need you could use some apps with full 64bit capability. Just use x32 by default.

    Plus it's a good way to remind lazy developers that no matter how cheap RAM is, you should be serious about being efficient (especially to the KDE developers) !
    KDE functionality is great, but they really have no clue about efficiency (RAM and CPU).

  • So for me the answer is no. The whole thing reminds me of doing ARM assembler with thumb code mixed in. If you have a very specific usage for it then yes, it would certainly be useful - but it's going to be up to the people who need it to actually use and improve it. Everyone else has no need to care and the average developer shouldn't *need* to care or even be aware of it.

  • Errm (Score:5, Interesting)

    by countach (534280) on Wednesday December 25, 2013 @09:10AM (#45781517)

    Won't this require a 2nd copy of the shared libraries in memory, which will negate the benefit of a slightly smaller binary?
