Type punning in C
Reading the chapter on strings in Crafting Interpreters led me down a rabbit hole about the strict aliasing rules in C. I can't say I've mastered the subject, but I learned a few things.
I started with Wikipedia's article on type punning, which was probably a bad idea: as far as I can tell, its BSD sockets example doesn't obey the strict aliasing rules. In glibc, sockaddr and sockaddr_in are defined like this:
// From <sys/socket.h>
struct __attribute_struct_may_alias__ sockaddr
{
    __SOCKADDR_COMMON (sa_); /* Common data: address family and length. */
    char sa_data[14];        /* Address data. */
};
// From <netinet/in.h> (which <arpa/inet.h> pulls in)
/* Structure describing an Internet socket address. */
struct __attribute_struct_may_alias__ sockaddr_in
{
    __SOCKADDR_COMMON (sin_);
    in_port_t sin_port;      /* Port number. */
    struct in_addr sin_addr; /* Internet address. */
    /* Pad to size of `struct sockaddr'. */
    unsigned char sin_zero[sizeof (struct sockaddr)
                           - __SOCKADDR_COMMON_SIZE
                           - sizeof (in_port_t)
                           - sizeof (struct in_addr)];
};
The C standard says:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
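You might think the fifth rule applies, since both structs start with __SOCKADDR_COMMON and sockaddr_in has the sin_zero padding at the end so that the two have the same size. As I understand it, though, a shared prefix and matching size aren't enough: you'd actually have to include struct sockaddr inside sockaddr_in (as its first member, so that a pointer to one can be converted to a pointer to the other) for it to apply, like: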
struct sockaddr {
    sa_family_t sa_family;
};
struct sockaddr_in {
    struct sockaddr sa_addr;
    in_port_t sin_port;
    struct in_addr sin_addr;
};
This is also what Crafting Interpreters does with Obj and ObjString.
(Note also that the actual declarations of sockaddr and sockaddr_in have the may_alias attribute, which makes GCC treat accesses through pointers to these types as if they had a character type for the purposes of aliasing.)
While researching this I found a lot of writing on the subject, including a 2006 post by Mike Acton (of Data-Oriented Design fame). The best explanation I found was "What is the Strict Aliasing Rule and Why do we care?". The GCC docs also have a good page. I also found this fun Linus email where he complains about strict aliasing and standards in general.