C99’s VLAs are evil
At first glance, the ability to define an array whose length is determined at runtime from a variable seems like a good idea: it's clean and easy to understand, and all of the normal array operations work[1]. But there is one major flaw: there is no way to deal with errors.
Variable Length Arrays in C99 are defined just like any other array, except that the length doesn't need to be a compile-time constant:
void foo(int size) {
    int array[size];
    for (int i = 0; i < size; ++i) {
        array[i] = i;
    }
    ...
}
Calling the above function with any reasonable value will do the obvious thing: the array will be sized appropriately and will contain sequential numbers. sizeof(array) will even return size * sizeof(int). For the common case, everything works, and everybody is happy. But what happens when size is too large to fit on the stack? Who knows.[2]
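For instance, here is a minimal sketch of that common-case behavior, including the runtime sizeof (assuming a hosted environment with stdio):

#include <stdio.h>

void show(int size) {
    int array[size];
    /* For a VLA, sizeof is evaluated at run time (see footnote 1),
       so this prints size * sizeof(int). */
    printf("%zu\n", sizeof array);
}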
In the most common implementation of VLAs, the compiler literally decrements the stack pointer by size and goes on its merry way, without regard to whether or not the stack pointer now points to an invalid address (or worse, a valid address somewhere inside your program's heap). If you're lucky and the stack pointer points to an invalid address, your program will likely crash. If you're not so lucky, the seemingly innocuous function above will start trashing things in your program's memory space; you may not catch it immediately, and you may have a strange, untraceable crash hours later. Needless to say, this can be quite difficult to debug.
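To make the failure mode concrete, here is a hedged illustration; the element count is an arbitrary example, and on a typical 8 MB stack this cannot fit, so the behavior is undefined:

int main(void) {
    foo(64 * 1024 * 1024);  /* asks for 256 MB of ints on the stack: undefined behavior */
    return 0;
}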
The solution is simply not to use this feature of C99. If you've got a higher-level abstraction available (such as std::vector in C++, or NSData and NSArray in Cocoa), use that. If you really want to stick with pure, standard C, use malloc()/free()[3]:
#include <stdlib.h>

void foo(int size) {
    int *array = malloc(size * sizeof *array);
    if (!array) {
        // Error out here
        return;
    }
    for (int i = 0; i < size; ++i) {
        array[i] = i;
    }
    ...
    free(array);
}
1. Even sizeof works, though that required a distasteful exception to the normal sizeof rules. Before VLAs, sizeof was always evaluated at compile time, and the resulting value was always a compile-time constant. When applied to VLAs, this obviously cannot be the case. [↩]
2. I'm not joking. The official term is "undefined behavior". In such a case, the C standard doesn't specify anything about what happens. [↩]
3. Don't be tempted to use alloca(); it is not a standard C function, and it has exactly the same issue that VLAs do. [↩]
Thanks for sharing this. I’ve used VLAs in the past without even thinking about the implementation details – figured it was one of those ‘it just works’ situations.
You mention that this is a problem for the most common implementation of VLAs… do you know which compilers that does/doesn’t include?
I’d be very surprised to find a compiler that didn’t implement them this way. In order to do anything sensible, the compiler would have to:
1) Know how much stack space was available, so it could determine whether or not the allocation would run off the end thereof.
2) Have some way of failing gracefully, which would likely require some language-level exception mechanism, which C doesn’t have.
You know, this really isn’t a new problem; it applies to compile-time-constant-length stack arrays as well. How do you know there’s room on the stack to allocate, say, int[1024]? You don’t.
You *assume* the runtime has allocated enough room for the stack, and that your function isn’t being called at a point when the stack has grown to the point where your array doesn’t fit.
The solution is the same in both cases: Use the stack responsibly. Don’t use it to store enormous arrays, and only use VLAs when the length is being passed or calculated from trusted data. Also, assert() is your friend.
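For instance, a minimal sketch of that advice, where the 1024-element bound is an arbitrary illustration rather than a recommendation:

#include <assert.h>

void foo(int size) {
    /* Refuse oversized or non-positive lengths before declaring the VLA;
       the 1024 limit is made up for illustration. */
    assert(size > 0 && size <= 1024);
    int array[size];
    for (int i = 0; i < size; ++i)
        array[i] = i;
}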
I agree with SNF. This is not a new problem at all. One should not allocate large work arrays in functions in this way. They should allocate them dynamically. If one does not want to worry about calling free() for every malloc(), they can use alloca().
C99 VLAs do have a benefit for multi-dimensional arrays. For example:
void func(int n, double data[][n]) {
    …
}
This cannot be done in pre-C99 C unless the number of columns is static. This can have huge benefits in scientific programming, where pointer arrays cannot be used effectively.
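As a rough sketch of that benefit (function names here are illustrative): with a variably modified parameter the 2-D indexing is direct, whereas the pre-C99 alternative forces manual stride arithmetic:

double sum_vla(int rows, int cols, double data[rows][cols]) {
    double s = 0.0;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            s += data[i][j];          /* clean 2-D indexing with a runtime column count */
    return s;
}

double sum_flat(int rows, int cols, const double *data) {
    double s = 0.0;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            s += data[i * cols + j];  /* pre-C99 style: manual index arithmetic */
    return s;
}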
The case of using the VLA in the function parameter is a different issue. It won't lead to the same kind of stack corruption, since you aren't allocating that array on the stack; you're merely receiving a pointer to it. In that case, you're free to validate the value of (n) before using the array; you don't have that option for arrays with automatic storage.
The way C handles arrays and pointers in general is simultaneously one of its greatest strengths and one of its greatest weaknesses. VLAs serve to obfuscate the issue, with very little benefit, and make it even easier to crash your program.
Sure, that's right: passing the VLA to the function will not lead to a stack overflow.
Here is another example of allocating a VLA with malloc that is sometimes useful:
void func(int m, int n) {
    int i, j;
    double (*data)[n];
    data = malloc(m * n * sizeof(double));
    for (i = 0; i < m; ++i) {
        for (j = 0; j < n; ++j) {
            data[i][j] = 0.;
        }
    }
    /* do more stuff with data here */
    free((void *) data);
}
I disagree about VLAs having very little benefit. For numerical programming with a lot of matrices, VLAs can result in much cleaner code. The lack of VLAs in pre-C99 C is one reason C was never widely used for numerical programming.
All normal compilers do stack allocations correctly, not by simply decrementing the stack pointer. For example (gcc (GCC) 3.4.2 (mingw-special)):
.00401730: 51 push ecx
.00401731: 89E1 mov ecx,esp
.00401733: 83C108 add ecx,008 ;”◘”
.00401736: 3D00100000 cmp eax,000001000 ;” ► ”
.0040173B: 7210 jb .00040174D —–↓ (1)
.0040173D: 81E900100000 sub ecx,000001000 ;” ► ”
.00401743: 830900 or d,[ecx],000 ;” ”
.00401746: 2D00100000 sub eax,000001000 ;” ► ”
.0040174B: EBE9 jmps .000401736 —–↑ (2)
.0040174D: 29C1 sub ecx,eax
.0040174F: 830900 or d,[ecx],000 ;” ”
.00401752: 89E0 mov eax,esp
.00401754: 89CC mov esp,ecx
.00401756: 8B08 mov ecx,[eax]
.00401758: 8B4004 mov eax,[eax][04]
.0040175B: FFE0 jmp eax
How about doing something like this:
void foo(int size) {
    int asize = size;
    if (asize > 1024)
        asize = 1;
    int buffer[asize];
    int *array = buffer;
    if (asize != size)
        array = malloc(size * sizeof *array);
    if (!array) {
        // Error out here
        return;
    }
    for (int i = 0; i < size; ++i) {
        array[i] = i;
    }
    /* ... */
    if (array != buffer)
        free(array);
}
And the code can be wrapped in a macro to simplify the work :)
#define GET_BUFFER(size) ….
#define FREE_BUFFER(…)
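For example, one hypothetical shape such macros might take; this is a sketch of the idea using a fixed-size stack buffer as the fast path, not the actual definitions:

#include <stdlib.h>

/* Hypothetical sketch: use the stack buffer when the request fits,
   fall back to the heap otherwise. */
#define GET_BUFFER(stackbuf, count, limit) \
    ((count) <= (limit) ? (stackbuf) : malloc((count) * sizeof *(stackbuf)))

#define FREE_BUFFER(ptr, stackbuf) \
    do { if ((ptr) != (stackbuf)) free(ptr); } while (0)

/* Usage:
   int small[1024];
   int *array = GET_BUFFER(small, size, 1024);
   ...
   FREE_BUFFER(array, small);
*/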
Compilers like gcc are happy even if you pass -1 as the length of a variable-length array. In this case, sizeof(array) is 0xFFFFFFFC.
Have to say I completely disagree. The stack is a finite resource, yes. It’s also dangerous to use recursion. It’s dangerous to make big arrays on it even with a compile-time constant (or any other large aggregate for that matter). Any kind of code that exhausts the stack invokes undefined behavior.
That doesn't make this evil as far as I see. It doesn't make recursion evil either, or big arrays (with or without sizes defined at runtime). The whole point of C is to be very close to the hardware. There are risks involved with that, and developers must take greater caution with their code, but in exchange we get high performance.
VLAs are a great feature as far as I see since it gives C something we can do in assembly with the stack that other languages can’t really do very effectively. It’s no more evil than a lot of C features which exchange safety for high performance.
In fact, I’d go so far as to say they are often *safer* than stack-based alternatives where people allocate some maximum size like MAX_PATH or MAX_BUFFER, since that would often involve using *more* of the stack than is needed for that function, and also gets into problematic territory if we exceed that max size.
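To illustrate that point, here is a hedged sketch contrasting a fixed maximum-size buffer with a VLA sized to need; the names and the 4096 limit are made up for illustration:

#include <string.h>

#define MAX_PATH_LEN 4096                /* illustrative constant, not a real system limit */

void with_fixed_buffer(const char *src) {
    char buf[MAX_PATH_LEN];              /* always reserves 4096 bytes of stack */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';          /* and still truncates if src is longer */
    /* ... use buf ... */
}

void with_vla(const char *src) {
    char buf[strlen(src) + 1];           /* reserves only what this call needs,
                                            assuming the length comes from trusted data */
    strcpy(buf, src);
    /* ... use buf ... */
}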
Of course the safest alternatives involve the heap, but a heap allocation is generally far more expensive than a stack allocation, even with the most efficiently implemented allocators.
Yes, VLAs are evil, but for a different reason: backward compatibility. People use VLAs and then their programs do not compile with many standard C compilers.
Unlike VLAs, malloc() and free() work just fine everywhere.
As others have said, declaring anything huge on the stack is problematic. You can use malloc/free with VLAs, too:
void foo(size_t sizeX, size_t sizeY) {
    double (*array)[sizeY];
    array = malloc(sizeY * sizeX * sizeof(double));
    /* ... */
    free(array);
}
Well, VLA allocations can cause problems, but it is the same for all stack allocations. You can run out of space with constant-length stack arrays as well, you can get a stack overflow with too many nested function calls (e.g. recursive) and so on.
But the stack is efficient: stack allocations are cheaper computationally, they need not be synchronized, and they don't cause memory fragmentation.
And stack allocations in C are safer in the sense that you don't need to remember to free() everything – which also makes the code cleaner, as freeing temporary buffers can easily clutter the code, making it full of gotos during error handling and so on. Heap allocations without GC or RAII are just plain inelegant.
Like many other times in programming, this is a tradeoff you need to make – use the heap and sacrifice code readability and performance for fault tolerance and generality, or vice versa with the stack. VLAs are just another tool – you can use them for good, or to shoot yourself in the foot. C is a low-level language that gives you such choices, and it should stay this way.