The Linux Kernel Is Now VLA-Free: A Win For Security, Less Overhead and Better For Clang (phoronix.com) 113
The in-development Linux 4.20 kernel is now effectively VLA-free. From a report: Variable-length arrays (VLAs) can be convenient and are part of the C99 standard, but they can have unintended consequences. VLAs allow for array lengths to be determined at run-time rather than compile time. The Linux kernel has long relied upon VLAs in different parts of the kernel -- including within structures -- but for months now (and years if counting the kernel Clang'ing efforts) there has been an effort to remove the usage of variable-length arrays within the kernel. The problems with them are:
1. Using variable-length arrays can add some minor run-time overhead to the code due to needing to determine the size of the array at run-time.
2. VLAs within structures are not supported by the LLVM Clang compiler and are thus an issue for those wanting to build the kernel outside of GCC; Clang only supports the C99-style VLAs.
3. Arguably most important, there can be security implications from VLAs around the kernel's stack usage.
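To illustrate the kind of change described above, here is a minimal sketch of a VLA and a VLA-free rewrite of the same helper. It is not taken from the kernel; the function names and the MAX_KEY_LEN bound are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound, chosen for this sketch only. */
#define MAX_KEY_LEN 64

/* VLA version: the stack frame grows with the caller-supplied length. */
size_t count_nonzero_vla(const unsigned char *key, size_t len)
{
    assert(len > 0);                /* a zero-length VLA is undefined behaviour */
    unsigned char scratch[len];     /* size only known at run time */
    size_t n = 0;

    memcpy(scratch, key, len);
    for (size_t i = 0; i < len; i++)
        n += scratch[i] != 0;
    return n;
}

/* VLA-free version: fixed worst-case buffer plus an explicit bounds check.
 * Returns 0 on success, -1 if the input exceeds the bound. */
int count_nonzero_fixed(const unsigned char *key, size_t len, size_t *out)
{
    unsigned char scratch[MAX_KEY_LEN];   /* size known at compile time */
    size_t n = 0;

    if (len > MAX_KEY_LEN)
        return -1;                        /* reject instead of growing the stack */

    memcpy(scratch, key, len);
    for (size_t i = 0; i < len; i++)
        n += scratch[i] != 0;
    *out = n;
    return 0;
}
```

The fixed version reserves the worst case on every call, but its stack cost is known at compile time and oversized input is rejected rather than silently enlarging the frame.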
Re: (Score:1)
VLAs within structures ... are not supported
Maybe read the whole sentence?
Re:"VLAs within structures" not part of C (Score:5, Informative)
This [gnu.org] is what they are referring to. Code like (from that link):
Re: (Score:2, Informative)
Looks like you are lost, buddy. This is C.
And how does your vector BS solve the problem? Is its storage allocated on stack entirely?
Re: (Score:2)
Sadly I knew someone who preferred std::map over std::vector. Including when the key range was tiny and it was guaranteed to only have a single element at a time (I so wish I was making this up). If the only tool in someone's tool box is a nail-gun, don't be surprised if their project has a lot of nails in it.
Re:"VLAs within structures" not part of C (Score:4, Informative)
Generally, you can get the tricky parts of the kernel done in C, then layer C++ on top of it. That's what a lot of embedded RTOS systems do. The biggest snag is the tendency of getting bloated code from developers not aware of what C++ does behind the scenes.
Re: (Score:2)
The biggest snag is the tendency of getting bloated code from developers not aware of what C++ does behind the scenes.
Doesn't the Linux kernel have a whole army of people approving code changes? They could be aware of what C++ does behind the scenes even if the developers aren't.
(But seriously: Do you think a Linux core developer is incapable of using C++ properly or knowing what it does behind the scenes?)
Re: (Score:2)
I was referring to kernel and OS work in general, not Linux specifically.
Re: (Score:2, Informative)
Re: (Score:3)
It's almost as if you don't know that std::vector can use a custom memory allocator, eg alloca().
Re: (Score:2)
Re: (Score:2)
I just read a few things about VLAs in C99, and my god, it makes Stroustrup look like a rocket scientist.
Convenient while it works, then brutally unsafe the moment it doesn't work (recompile for a new platform, whole new stack-size ballgame—you do the math, except you can't, because the C standard is deaf-mute on the existence of the primary stack, and hence, perforce, also its size limits).
Of course, when you're compiling the Linux kernel, you are compiling the platform itself, so internally it can c
Re: (Score:2)
Huh? There is nothing unsafe with VLAs. They also tend to *reduce* stack usage, as the alternative is oversized fixed-size arrays on the stack. If you care about the dynamically sized stack, then you also can't have function calls in a conditional path.
Re: (Score:2)
They reduce average stack size. They don't reduce the worst case stack size, which is what you care about.
Re: (Score:2)
They can reduce max stack size in some cases. E.g. if you split one array into two smaller arrays but you don't know how big the smaller arrays are. In any case, they never increase stack size when compared to static arrays on the stack.
Re: (Score:2)
The flexible array member of a struct is not the same as a variable length array.
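A rough sketch of the difference, assuming a made-up struct packet and hypothetical helper names: a flexible array member is a standard C99 trailing array whose storage the programmer allocates (usually on the heap), while a VLA is an automatic array whose size is computed when its declaration is reached.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Flexible array member (C99): the struct ends in an array of unspecified length. */
struct packet {
    size_t len;
    unsigned char data[];   /* storage comes from the allocation, not the type */
};

/* Heap allocation sized by the caller. Returns NULL on failure; caller frees. */
struct packet *packet_new(const unsigned char *src, size_t len)
{
    assert(src != NULL);
    struct packet *p = malloc(sizeof *p + len);
    if (!p)
        return NULL;
    p->len = len;
    memcpy(p->data, src, len);
    return p;
}

/* VLA: automatic storage sized at run time, released when the block ends. */
unsigned sum_bytes_vla(const unsigned char *src, size_t len)
{
    if (len == 0)
        return 0;                /* avoid a zero-length VLA */
    unsigned char tmp[len];
    unsigned sum = 0;

    memcpy(tmp, src, len);
    for (size_t i = 0; i < len; i++)
        sum += tmp[i];
    return sum;
}
```

The packet outlives the call that created it and its size never touches the stack; the VLA disappears on return and its size directly determines the stack frame.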
Re: (Score:1)
What the hell is APK? You're not talking about android app packages, right?
Bummer. (Score:2)
Vla [wikipedia.org]
Re:Finally! (Score:4, Funny)
I think the Linux community should join CAT - the Campaign to Abolish TLAs.
Re: (Score:3, Informative)
Acronyms are words that you pronounce, like laser (Light Amplification by Stimulated Emission of Radiation), scuba, radar, or PIN (Personal Identification Number number).
Initialisms are words you spell out, like FBI, CIA, DNR, ECG, MRI, DVLA etc.
A TLA is an initialism, not an acronym, so really it's not a TLA, it's a TLI. Not sure which one CAT is supposed to be though!
Re: (Score:1, Troll)
Your distinction is false. FBI, CIA, etc. are all acronyms. They're just abstracted names. Acronym literally means high name.
The term "initialism" is bullshit. It was bandied about in the late 1950s by some clown trying to foist your ridiculous distinction on the world. No one bought into it, nor should they have. The word "initialism" was already in use well prior, referring specifically to authors' names.
Re: (Score:2)
That was clearly intentional and meant as a joke, since everyone says "PIN number" all the time instead of just "PIN".
It's a case of RAS syndrome (or is it RIS syndrome?)
Re: (Score:1)
Not me. I think of Klaang (Score:2)
Klaang from Star Trek: http://memory-alpha.wikia.com/... [wikia.com]
Re: Clang (Score:1)
VLA-free does not resolve the problem. (Score:1)
The advantage of VLA is pushing items to it without knowing the max. capacity.
The risk of VLA-free is out of capacity, unknown max. capacity.
The stack is like VLA but for only 1 stack instead of many stacks as many VLAs.
Another example, the number of open files. I should not limit it to a small constant, for example max. 1024 open files, when I need 1 million open files (for P2P). With VLA it is more flexible on demand or as needed.
Many current PCs are 64-bit and have much memory as 32 GB by example for
Re: (Score:2)
Another example, the number of open files. I should not limit it to a small constant
There is no such limitation. You can re-allocate arrays as needed. VLA is just automating that for you.
GCCisms (Score:5, Informative)
The first problem is that they can be dropped from future versions of GCC. They're not part of any standard, after all.
The second problem is that there are situations in which GCC isn't the most suitable compiler. You want to minimize hacks for each different compiler supported.
Security is a big thing, too. It's hard to audit fundamentally unpredictable code.
A major step forward.
Re: (Score:2)
Re: (Score:2)
The estimated defect density is around 0.014 defects per thousand lines of code (ie: about 14 issues per million lines of code). That gives you an upper threshold on exploitable defects.
However, if we can reduce that to 0.001, through Clang, which implies 325 defects found with Clang, I'm not going to complain. I'm going to cheer.
Good for debugging too (Score:4, Interesting)
GNU'isms (Score:3)
Re: (Score:3)
The GNU over Linux refers to GNU userspace over Linux kernelspace. So a GNU userspace over OpenBSD would be GNU/OpenBSD. BSD/BSD is 1, since you're dividing by itself.
Re:GCCisms (Score:4, Informative)
Re: (Score:2)
There's lots of stuff in the Linux kernel that uses a GNU variant of a concept rather than the ISO variant. Often because GNU was there first. If you switch from the vendor-specific form, the isms, to the standard form, you don't change the concepts involved but you do make it more portable.
Re: (Score:1)
How the fuck is this a step forward?
You previously needed an array of variable size.
You had a language feature that automatically allocated storage for it on the stack in a virtually seamless way.
"Oh no! Advanced features!! Memory stuff!! Scary, scary, go back to K+R!"
Great. Now you still need the same functionality, but you've now gone BACK to playing around with malloc/free, adding additional function parameters, wasting memory "just in case", or whatever other ad hoc soluti
C does not really have arrays (Score:5, Interesting)
Just memory addresses. *Foo could be one or a few or many. Pointer arithmetic.
So variable arrays feel odd.
If you did not like chasing down weird memory corruption problems then you would not be using C (or C++) in the first place.
It would have been trivial to add a little bit of sanity with syntax like
void foo(char buf[blen], int blen)
so a compiler could, in debug mode, check. But no, that would not be a hero's C. Nor are variable length arrays.
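For what it's worth, C99 already accepts something close to this if the length parameter comes first. A minimal sketch follows; fill and bound_of are hypothetical names, and the standard does not require the compiler to check the bound, though it is free to use it for diagnostics.

```c
#include <assert.h>
#include <stddef.h>

/* C99: the length parameter must be declared before the array that uses it.
 * The bound documents intent; buf is still passed as a plain pointer. */
void fill(size_t blen, char buf[blen], char value)
{
    assert(buf != NULL);
    for (size_t i = 0; i < blen; i++)
        buf[i] = value;
}

/* With a pointer to a variably sized array, sizeof recovers the bound
 * inside the callee at run time. */
size_t bound_of(size_t blen, char (*buf)[blen])
{
    return sizeof *buf;   /* equals blen */
}
```

A caller would write fill(sizeof b, b, 'x') for a local char b[8], and bound_of(8, &b) returns 8.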
Incidentally, C's lack of arrays is not efficient. E.g. it is the reason we need 64 bit pointers, namely that C can only address 4 gig in 32 bit pointers. Java can access 32 gig of memory with 32 bit pointers because mallocs are aligned, and 32 gig is more than enough for the vast majority of current applications, and likely to remain so for a long time to come. Doubling your pointer size with lots of zeros is expensive, it clogs caches etc.
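The 32 gig figure follows from alignment arithmetic: if every object starts on an 8-byte boundary, the low 3 bits of its offset are always zero, so a 32-bit value can store offset / 8 and reach 2^32 * 8 bytes = 32 GiB. A sketch in C, assuming 8-byte alignment and offsets measured from a heap base; the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_SHIFT 3u   /* assumption: objects are 8-byte aligned */

/* Pack an aligned byte offset (from the heap base) into 32 bits. */
uint32_t compress(uint64_t offset)
{
    assert((offset & ((UINT64_C(1) << ALIGN_SHIFT) - 1)) == 0); /* aligned */
    assert((offset >> ALIGN_SHIFT) <= UINT32_MAX);              /* in range */
    return (uint32_t)(offset >> ALIGN_SHIFT);
}

/* Recover the byte offset from the 32-bit reference. */
uint64_t decompress(uint32_t ref)
{
    return (uint64_t)ref << ALIGN_SHIFT;
}

/* Total bytes reachable through a 32-bit compressed reference. */
uint64_t addressable_bytes(void)
{
    return (UINT64_C(1) << 32) << ALIGN_SHIFT;   /* 2^35 bytes = 32 GiB */
}
```

The cost is a shift (and usually an add of the heap base) on every dereference, traded against halving the size of each stored reference.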
Re: (Score:2)
Contiguous memory is the correct solution, yes. But nothing stops you having an index that tells you where the base is for a given offset. That lets you have a discontinuous set of arrays where each one is accessed as a contiguous array.
Examples: Memory pages in Linux.
If you go onto a different page, you have a different base address. But each page is contiguous. It works, we know it works, and except on crossing boundaries, it's the fastest method as you point out.
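A minimal sketch of that idea, with an index of base addresses in front of fixed-size contiguous chunks. The struct, the function names and the CHUNK_ELEMS value are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_ELEMS 1024u   /* hypothetical chunk size */

struct chunked {
    int  **chunks;    /* index: base address of each contiguous chunk */
    size_t nchunks;
};

/* Address of element i, growing the index and allocating the chunk on demand.
 * Returns NULL on allocation failure. */
int *chunked_at(struct chunked *c, size_t i)
{
    assert(c != NULL);
    size_t ci = i / CHUNK_ELEMS;

    if (ci >= c->nchunks) {
        int **grown = realloc(c->chunks, (ci + 1) * sizeof *grown);
        if (!grown)
            return NULL;
        for (size_t k = c->nchunks; k <= ci; k++)
            grown[k] = NULL;              /* new slots start unallocated */
        c->chunks = grown;
        c->nchunks = ci + 1;
    }
    if (!c->chunks[ci]) {
        c->chunks[ci] = calloc(CHUNK_ELEMS, sizeof **c->chunks);
        if (!c->chunks[ci])
            return NULL;
    }
    return &c->chunks[ci][i % CHUNK_ELEMS];
}

/* Release every chunk and the index itself. */
void chunked_free(struct chunked *c)
{
    for (size_t k = 0; k < c->nchunks; k++)
        free(c->chunks[k]);
    free(c->chunks);
    c->chunks = NULL;
    c->nchunks = 0;
}
```

Only the small index is ever reallocated, so pointers to existing elements stay valid as the structure grows, and accesses within one chunk are ordinary contiguous array accesses.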
Re: (Score:2)
If you did not like chasing down weird memory corruption problems then you would not be using C (or C++) in the first place.
Well, perhaps but OTOH many of us avoid malloc like the plague.
Re: C does not really have arrays (Score:1)
why the fuck are you coding in C then?
Re: (Score:2)
Well, most machines probably do treat a machine-level 64-bit pointer as a 64-bit pointer if the opcode says to. It's an atomic operation to load into a register, after all.
In practice, the compiler won't use a 64-bit opcode if a smaller operation is faster and will work. One reason strongly-typed languages are good for optimizers - you can place things precisely and thus work out how large pointers need to be.
That's the compiler. The machine just runs the opcode as provided.
Re: (Score:2)
If you don't want to allocate on the stack you can also not call a function as the stack frame is allocated on the stack...
Re: (Score:2)
1. Calm down
2. You're assuming GNU's method is the only method and thus the standard method
3. Plenty of people staple together blocks to create virtual arrays. Some are called filing systems, some are called Linux memory managers, and there's one called GMP. It's the method underlying any potentially fragmented workspace if you don't want to keep copying. Because it's required in a lot of Linux, a standard, portable, form in a helper library would be nice. It might mean we can get rid of the umpteen queue a
Re: (Score:2)
VLAs are part of the standard, and it is a myth that removing them helps security, as doing so removes precise information about run-time bounds from the compiler's view. In my opinion not using VLAs is a major step back for security. (Yes, I contributed to this overall effort myself, but only where there were fake VLAs which really had a constant size but the compiler couldn't know this).
For GNUisms in general, we will certainly try to standardize some of them (those which are useful, well-defined, and supported
Re: (Score:3)
So you're aware that GNU introduced features often way in advance of any standard and that the GNU syntax/semantics don't always match the ISO version.
Let's see what ISO says about VLAs:
C99 adds a new array type called a variable length array type. The inability to declare arrays whose size is known only at execution time was often cited as a primary deterrent to using C as a numerical computing language. Adoption of some standard notion of execution time arrays was considered crucial for C’s acc
Re: (Score:3)
So you're aware that GNU introduced features often way in advance of any standard and that the GNU syntax/semantics don't always match the ISO version.
Yes of course. In fact, I added myself a GNU extension. I am also participating in WG14.
Let's see what ISO says about VLAs:
C99 adds a new array type called a variable length array type. The inability to declare arrays whose size is known only at execution time was often cited as a primary deterrent to using C as a numerical computing language. Adoption of some standard notion of execution time arrays was considered crucial for C’s acceptance in the numerical computing world.
Does this match your experience?
Absolutely. I use C for numerical computing and VLAs are very important.
Would discontiguous pools of contiguous memory, giving you the ability to make anything flexible size, be that much worse as that's what the compiler will be using anyway?
I don't understand what you are trying to say. The VLA will live on the stack or the heap depending on where one allocates it. In both cases, there is no way to resize it. Making it resizable is much harder and no compiler does this as it would require a level of indirection which reduces performance and would require some kind of automatic memory man
Re: (Score:1)
The first problem is that they can be dropped from future versions of GCC. They're not part of any standard, after all.
IMHO, GCC rarely drops its proprietary extensions, once introduced and documented. Correct me if I'm wrong.
High vs low languages. (Score:4, Interesting)
VLAs are an example of C becoming ever so slightly higher level. When the language does things under the hood without telling you it's just an invitation to bite you in the ass. Good purge.
Re: (Score:2)
What was it doing under the hood without telling you? Isn't a VLA basically just a call to alloca() [stackoverflow.com]?
Re:High vs low languages. (Score:5, Interesting)
Yes. Exactly that. It's allocating space for you. It figures out at run-time the length of your array rather than you having to do it by hand at compile-time. I didn't actually know of any security flaws this would lead to, but it stops debuggers from knowing details about calls so it obscured some information from me and pissed me off once.
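One way to see the run-time work being described: sizeof applied to a VLA is evaluated when the code runs rather than folded to a constant, and the array's storage lives only as long as its enclosing block. A small sketch with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* sizeof on a VLA is computed at run time, not at compile time. */
size_t vla_bytes(size_t n)
{
    assert(n > 0);        /* a zero-length VLA is undefined behaviour */
    int buf[n];
    return sizeof buf;    /* n * sizeof(int), evaluated when the call runs */
}

/* A VLA is scoped to its block, so each loop iteration gets a fresh
 * array whose size may differ from the previous one. */
size_t total_bytes(const size_t *lens, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        assert(lens[i] > 0);
        char tmp[lens[i]];
        total += sizeof tmp;
    }
    return total;
}
```

Because the frame layout depends on values only known at run time, tools that expect a fixed frame size per function have less to work with.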
Re: (Score:1)
Calls to alloca() are explicit, which makes them less hidden to human review. VLAs encourage writing code without putting more thought into it. They also encourage the use of variable-length things when fixed-length things make more sense. They also discourage thinking about when variable lengths may come into play and how to handle them efficiently.
In general, all of these are bad things for code running at all times and with [near] unlimited power to do harm, especially when those properties are ripe for abu
Re: (Score:2)
Finally TFS helps with TFT (Score:1)
I was thinking a different VLA [wikipedia.org] and what the f*ck that had to do with the Linux kernel.
Re: (Score:1)
This monolithic kernel bloat simply has to stop...
Re: (Score:3)
When the Linux kernel depends on non-standard language extensions that only GCC implements, that's OK.
Except that VLAs are part of the C99 standard, and there's nothing in the standard that says they can't be used in a struct - it's just difficult for the compilers. gcc has chosen to technically implement it as an extension, while Clang/LLVM doesn't support it (nor the floating point pragmas of C99, which has also been an issue for some kernel code).
Re: (Score:2)
Re: (Score:1)
Used sanely Java isn't terrible. Even with realtime stuff - I had no problems. Just don't create objects in the critical path. Given the choice, I wouldn't choose to write the lowest levels of a kernel in Java (or C++) though.
The merit of variable length C99 arrays is a good question. My conservative side says just allocate fixed sized stuff for the smaller cases and for the others malloc and deal with it. If pointers to lists of structs scare you, C ain't for you. But FFS it's 2018 - Is it too much to ask
Oh, god, the kernel is already falling apart! (Score:2, Funny)
See, adopting a code of conduct is already undermining the foundations of the kernel.