Array of size 0 at the end of struct

Question

The professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:

struct array{
    size_t size;
    int data[0];
};

typedef struct array array;

This is useful for defining or initializing an array whose size is given by a variable, i.e., something like the following:

array *array_new(size_t size){
    array* a = malloc(sizeof(array) + size * sizeof(int));

    if(a){
        a->size = size;
    }

    return a;
}

That is, using malloc(), we also allocate memory for the array of size zero. This is completely new to me, and it seems odd because, from my understanding, structs do not have their elements necessarily in contiguous locations.

Why does the code in array_new allocate memory to data[0]? Why would it then be legal to access, say

array * a = array_new(3);
a->data[1] = 12;

?

From what he told us, it seems that an array of length zero declared at the end of a struct is guaranteed to come immediately after the last member of the struct, but this seems strange because, again, from my understanding, structs could have padding.

I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?

'structs do not have their elements necessarily in continuous locations' - arrays in structs do. — Martin James, 18 hours ago
@MartinJames how do you compute the size to pass to malloc afterwards ? — Quentin, 18 hours ago
This is deprecated syntax: since C99 the size must be left empty (not 0, and not anything else) for the behavior to be well defined. — user3528438, 18 hours ago
I believe this kind of hack may be common when programming for embedded devices or other "exotic platforms". But in that case, most of the time you are using a custom compiler for the platform that has various extensions/limitations, so there's no point in following the C standard when the only compiler able to produce code for that strange CPU isn't standard-conforming... — Bakuriu, 14 hours ago
@jamesqf because an array is not a pointer. Making only that change to the above example will absolutely not result in working code. — Leushenko, 11 hours ago

Sourav Ghosh · Answer 1 · 2016-04-12 15:52:31Z

Standard C has a feature for this, called a flexible array member, described in C11, chapter §6.7.2.1.

Quoting the standard,

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. [...]

The syntax should be

struct s { int n; double d[]; };

where the last member has an incomplete array type (no array dimension, not even 0).

So your code should instead look like

struct array{
    size_t size;
    int data[ ];
};

to be standard-conforming.
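For reference, here is a minimal sketch of the question's array_new rewritten around the flexible array member. The array_free helper is a hypothetical addition for illustration; the allocation formula is the one from the question, which is enough room because sizeof(array) already covers everything that comes before data.

```c
#include <stddef.h>
#include <stdlib.h>

struct array {
    size_t size;
    int data[];            /* flexible array member: no dimension, not even 0 */
};

typedef struct array array;

/* Allocate the header and room for `size` ints as one block. */
array *array_new(size_t size) {
    array *a = malloc(sizeof(array) + size * sizeof(int));
    if (a) {
        a->size = size;
    }
    return a;
}

/* Hypothetical helper: a single free() releases header and elements. */
void array_free(array *a) {
    free(a);
}
```

With this definition, the question's `array *a = array_new(3); a->data[1] = 12;` is well defined, and one call to array_free(a) (that is, free(a)) releases both the header and the elements.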

As for your example of a 0-sized array: this was a legacy way (a form of the "struct hack") of achieving the same thing. Before C99, GCC supported it as an extension to emulate flexible array member functionality.

@user3528438 elaborated a bit to clear the context. Added relevant link, too. — Sourav Ghosh, 17 hours ago
In really old code, before either the C99 feature or the GCC extension existed, you will see data[1] used almost exactly the same way, relying on the general lack of bounds checking in C. This (unlike the extension and the C99 feature) is arguably UB-provoking, depending on what you think the repeatedly-revised-yet-still-in-need-of-major-revision definition of an "object" in the C standard means. — zwol, 14 hours ago

Lundin · Answer 2 · 2016-04-12 15:24:18Z


Your professor is confused. They should go and read about what happens if you define a zero-size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).

Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:

struct array{
    size_t size;
    int data[];
};

A flexible array member is ignored by sizeof: the size of the struct is as if the member were omitted (possibly with some extra trailing padding), allowing you to do things like:

malloc(sizeof(array) + sizeof(int[size]));

(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.
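Because sizeof ignores the flexible member, header plus n elements is always enough room (possibly a few bytes more than strictly needed, due to trailing padding). The sketch below adds an overflow check that the one-line malloc call above leaves out; array_alloc_size and array_new_checked are hypothetical names. It multiplies by sizeof(int) instead of writing sizeof(int[size]): for a nonzero size both give the same value, but the latter names a variable length array type, which C11 made optional.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct array {
    size_t size;
    int data[];
};

/* Hypothetical helper: bytes for the header plus n ints, or 0 if the
   computation would wrap around SIZE_MAX. */
size_t array_alloc_size(size_t n) {
    if (n > (SIZE_MAX - sizeof(struct array)) / sizeof(int)) {
        return 0;
    }
    return sizeof(struct array) + n * sizeof(int);
}

/* Hypothetical allocator that refuses sizes that would overflow. */
struct array *array_new_checked(size_t n) {
    size_t bytes = array_alloc_size(n);
    if (bytes == 0) {
        return NULL;
    }
    struct array *a = malloc(bytes);
    if (a) {
        a->size = n;
    }
    return a;
}
```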



What "unsafe exploit" are you talking about? The only one I remember was initialising the array with size 1 and then using the offset to the first element in the array for the size of the basic struct. Which from my understanding is perfectly safe (if a bit roundabout and has the disadvantage of not enabling people to have structs with 0 arrays) - or does that cause UB somehow too? – Voo 16 hours ago

@Voo you could always malloc the too-small struct, cast to the pointer to the struct with the 1 element array, then trust people not to touch that last element. Quite unsafe, as many people reasonably presume that given T*, you can memcpy sizeof(T) bytes in/out of it... which this hack wouldn't permit. – Yakk 15 hours ago


@Voo That's quite unsafe, because there might be padding bytes at the end of the array, and you end up writing data into padding bytes, where there are no guarantees of value preservation. The compiler is free to assume there's nothing of value in the padding bytes. – Lundin 15 hours ago

@Lundin No you don't write to any padding bytes. offsetof(struct, arr) gives you the offset to the start of the array including the padding bytes in between. Not sure I understand your hack though - clearly you have to allocate the whole size - including the array and all padding. That sizeof(T) doesn't work is a given (how would it?) but the C99 variant doesn't help with that either. – Voo 15 hours ago


@Voo: Yes, you do write to padding bytes. struct {int size; char data[1];}; usually contains three padding bytes after data, which any writes to data[1] would write to. OTOH, the use of malloc managing the memory arguably made it safer. The C spec contradicts itself in this area. – Mooing Duck 4 hours ago


haccks · Answer 3 · 2016-04-13 03:37:23Z

Other answers explain that zero-length arrays are a GCC extension and that C allows flexible array members, but no one has addressed your other questions.

from my understanding, structs do not have their elements necessarily in contiguous locations.

Yes. The members of a struct are not necessarily in contiguous locations; there may be padding between them.

Why does the code in array_new allocate memory to data[0]? Why would it then be legal to access, say
array * a = array_new(3);
a->data[1] = 12;
?

You should note that one of the restrictions on a zero-length array is that it must be the last member of a structure. From this, the compiler knows that the struct can have a variable-length object and that more memory will be needed at runtime. But you shouldn't be confused into thinking, "since the zero-length array is the last member of the structure, the memory allocated for it must be added to the end of the structure, and since structs do not necessarily have their members in contiguous locations, how could that allocated memory be accessed?"
No. That's not the case. The allocated chunk of memory can go anywhere, just before or after the location of size, or far from it, but that allocated memory must be accessed through the member data. And yes, padding has no effect here.

I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?

Yes. As other answers have already mentioned, zero-length arrays are not supported by standard C; they are a GCC extension. C99 introduced the flexible array member. An example from the C standard (6.7.2.1):

After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).

"The allocated chunk of memory can go anywhere, just before or after the location of size"... That's not quite right. The memory has to go somewhere after size (possibly with padding), not before. If size was at the end of the variable-size object, indexing into data would have to go backwards. Given a struct array *, you have to be able to access the size member without knowing the size. I think the C standard has enough requirements to make it impossible for an intentionally-weird implementation to satisfy all the rules. char* can alias anything, so you can look at the bytes. — Peter Cordes, 3 hours ago

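Peter Cordes's point can be checked directly. The hypothetical function below confirms that the elements start at offsetof(struct array, data) from the start of the block and that this offset lies past size. The standard requires members to be laid out in declaration order with no padding before the first member, so this should hold on any conforming implementation; the code only makes it visible.

```c
#include <stddef.h>
#include <stdlib.h>

struct array {
    size_t size;
    int data[];
};

/* Hypothetical check: returns 1 if the elements start at the fixed offset of
   `data`, which lies after `size` (plus any padding); -1 if malloc fails. */
int data_follows_size(void) {
    struct array *a = malloc(sizeof(struct array) + 3 * sizeof(int));
    if (!a) {
        return -1;
    }
    char *base = (char *)a;
    char *elems = (char *)a->data;
    /* `size` is the first member, so it sits at offset 0 with no padding
       before it; `data` must start after all of its bytes. */
    int ok = elems == base + offsetof(struct array, data)
             && offsetof(struct array, data) >= sizeof a->size;
    free(a);
    return ok;
}
```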
JDługosz · Answer 4 · 2016-04-13 03:46:02Z


The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:

header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);

As for structs in general, you can rely on the fields being laid out in order. Being able to match some imposed structure needed by a binary file format, an operating system call, or hardware is one advantage of using C. You have to know how padding for alignment works, but the fields are in order and in one contiguous block.
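A minimal sketch of this header-plus-buffer technique, assuming a hypothetical header type and a hypothetical helper named header_new. It uses a char buffer, which has no alignment requirement; as the comments below discuss, a wider element type after the header needs its alignment handled by hand.

```c
#include <stdlib.h>

/* Hypothetical header type for illustration. */
typedef struct {
    size_t length;
    unsigned flags;
} header;

/* Allocate a header followed immediately by `buffersize` bytes.
   p + 1 points just past the header, including its trailing padding. */
header *header_new(size_t buffersize, char **buffer_out) {
    header *p = malloc(sizeof (header) + buffersize);
    if (!p) {
        return NULL;
    }
    p->length = buffersize;
    p->flags = 0;
    *buffer_out = (char *)(p + 1);   /* char needs no special alignment */
    return p;
}
```

Because sizeof(header) includes any trailing padding, p + 1 lands past that padding, so the buffer starts exactly sizeof(header) bytes into the block.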


The code you have shown fails if the size of header is 3 and buffer is of type uint32_t, which requires 4-byte alignment. You'd have to take care of the alignment manually, and the point of the flexible array member is that you don't have to do that. – user694733 2 hours ago

No it doesn't fail. The padding is part of the size of the structure (think about making an array of structures). I see that using a flex member (of type char, for example) lets you make the location of the array not aligned to a word boundary. IAC matching a known binary layout means understanding the compiler's alignment. – JDługosz 1 hour ago

Consider typedef struct { uint8_t a[3]; } header;. On some system it will have size of 3 bytes, and alignment of 1, which means there is no padding. If you then do uint32_t * buffer = (uint32_t*)(p+1);, you'll get UB, because (p+1) will result in address which is not correctly aligned for 32-bit type. – user694733 1 hour ago

I see what you're getting at. I recall doing things where the header is followed by various possible record types, depending on actual values (e.g. a file format). The buffer, if declared as a simple single primitive type, needs to match the actual alignment of the structures you will actually use. You have to understand (or control) alignment no matter what you do. (Meanwhile, before the first ANSI standard we didn't have formal "UB" either. We just had whatever the compiler did, and hoped it didn't change in the next version.) – JDługosz 1 hour ago
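A sketch of handling the alignment manually for user694733's 3-byte header, assuming C11's alignof from <stdalign.h>; header_new_u32 and round_up are hypothetical names. The buffer offset is rounded up to a multiple of alignof(uint32_t), and since malloc returns memory suitably aligned for any object type, the buffer ends up correctly aligned. A flexible array member of type uint32_t would have the compiler do this rounding automatically.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The 3-byte header from the comment above (alignment 1 on many systems). */
typedef struct {
    uint8_t a[3];
} header;

/* Round `offset` up to the next multiple of `align` (align > 0). */
static size_t round_up(size_t offset, size_t align) {
    return (offset + align - 1) / align * align;
}

/* Hypothetical helper: place `count` uint32_t values after the header at an
   offset that is a multiple of alignof(uint32_t). malloc's result is aligned
   for any object type, so base + off is aligned for uint32_t too. */
header *header_new_u32(size_t count, uint32_t **buffer_out) {
    size_t off = round_up(sizeof (header), alignof(uint32_t));
    header *p = malloc(off + count * sizeof (uint32_t));
    if (!p) {
        return NULL;
    }
    *buffer_out = (uint32_t *)((char *)p + off);
    return p;
}
```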


Neil · Answer 5 · 2016-04-12 16:06:21Z


A more standard way would be to define your array with a data size of 1, as in:

struct array{
    size_t size;
    int data[1]; // <--- will work across compilers
};

Then use the offset of the data member (not the size of the array) in the calculation:

array *array_new(size_t size){
    array* a = malloc(offsetof(array, data) + size * sizeof(int));

    if(a){
        a->size = size;
    }

    return a;
}

This is effectively using array.data as a marker for where the extra data might go (depending on size).
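To make the size arithmetic of this approach concrete, the sketch below computes the byte count that this answer's formula requests for n elements; hack_alloc_size is a hypothetical name. It deliberately stops at the arithmetic: writing past data[0] is the part the comments below describe as undefined behaviour. Note that for a size of 0 the request is smaller than sizeof(struct array), which is the limitation Yakk raises below.

```c
#include <stddef.h>

/* The layout from this answer: a one-element array used as a marker. */
struct array {
    size_t size;
    int data[1];
};

/* Hypothetical helper: bytes requested by this answer's formula for `n`
   elements. Only the size arithmetic is shown here. */
size_t hack_alloc_size(size_t n) {
    return offsetof(struct array, data) + n * sizeof(int);
}
```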



array_new(0); now you have a pointer to a struct where there are less than sizeof(struct) bytes valid! – Yakk 15 hours ago

@Yakk Indeed - a well known limitation of the old variant, which is the whole reason (I assume) that C99 introduced the new syntax. Apart from that little limitation, it works well and is standards compliant even in C90 as far as I know. – Voo 15 hours ago


Accessing beyond the first element of data causes undefined behaviour. Your code would be relying on non-standard compiler extensions. I don't think "more standard" is a good description here. Using 0 instead of 1 turns a compilation error into silent undefined behaviour. – M.M 12 hours ago


While there's frankly no good reason to use either, 1 is definitely worse than 0 because it removes the intent behind the code. 0 at least relies on a language extension that is defined in its own way. 1 is perfectly legal C code for a fixed size array and as a result communicates very little in practice (could easily be an artifact of strange metaprogramming or rigid style). – Leushenko 11 hours ago


