Type punning in C

Reading the chapter on strings in Crafting Interpreters led me down a rabbit hole about the strict aliasing rules in C. I can't say I've mastered the subject, but I learned a few things.

I started with Wikipedia's article on type punning, which was probably a bad idea: as far as I can tell, its BSD sockets example doesn't obey the strict aliasing rules. In glibc, sockaddr and sockaddr_in are defined like this:

  // From <sys/socket.h>
  struct __attribute_struct_may_alias__ sockaddr
    {
      __SOCKADDR_COMMON (sa_);    /* Common data: address family and length.  */
      char sa_data[14];           /* Address data.  */
    };

  // From <netinet/in.h>
  /* Structure describing an Internet socket address.  */
  struct __attribute_struct_may_alias__ sockaddr_in
    {
      __SOCKADDR_COMMON (sin_);
      in_port_t sin_port;                 /* Port number.  */
      struct in_addr sin_addr;            /* Internet address.  */

      /* Pad to size of `struct sockaddr'.  */
      unsigned char sin_zero[sizeof (struct sockaddr)
                             - __SOCKADDR_COMMON_SIZE
                             - sizeof (in_port_t)
                             - sizeof (struct in_addr)];
    };
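For context, the pattern in question looks roughly like this in application code: you fill in a sockaddr_in and hand it to bind() through a pointer cast to struct sockaddr. This is only a minimal sketch; make_loopback_addr and bind_loopback are hypothetical helper names, and the size check encodes my assumption that the sin_zero padding makes the two structs the same size on your platform.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: sin_zero pads sockaddr_in out to the size of struct sockaddr. */
static_assert(sizeof(struct sockaddr_in) == sizeof(struct sockaddr),
              "expected sockaddr_in to be padded to the size of sockaddr");

/* Hypothetical helper: build an IPv4 loopback address for the given port. */
struct sockaddr_in make_loopback_addr(uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);            /* also zeroes sin_zero */
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);              /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return addr;
}

/* Hypothetical helper: bind fd to 127.0.0.1:port.
   The cast below is the type pun under discussion: the object is a
   sockaddr_in, but the API takes a pointer to struct sockaddr. */
int bind_loopback(int fd, uint16_t port)
{
    struct sockaddr_in addr = make_loopback_addr(port);
    return bind(fd, (struct sockaddr *)&addr, sizeof addr);
}
```

The cast itself is harmless; the aliasing question only arises once something reads the object through the struct sockaddr pointer.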

The C standard (C11, section 6.5, paragraph 7) says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.
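To make the list concrete, here is a small sketch of reading the bits of a float. The function names are mine, and the example assumes a 32-bit float. The cast in the comment breaks the rules because the object's effective type is float and uint32_t matches none of the bullets. The two functions below it stay within the rules, one by copying bytes with memcpy and one by using the character-type bullet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* NOT allowed (undefined behaviour), shown only for contrast:
 *
 *     uint32_t bits = *(uint32_t *)&f;
 *
 * f has effective type float, and uint32_t is neither compatible with
 * float nor a character type. */

/* Allowed: copy the object representation into a separate uint32_t. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Allowed: any object may be inspected through a character type. */
unsigned char float_first_byte(float f)
{
    const unsigned char *p = (const unsigned char *)&f;
    return p[0];
}
```

Compilers generally recognise the memcpy idiom and compile it to a plain register move, so the compliant version need not cost anything.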

You might think the fifth rule applies, since both structs start with __SOCKADDR_COMMON and sockaddr_in is padded with sin_zero to the same size as sockaddr. As I understand it, though, sockaddr_in would actually have to contain a struct sockaddr as a member for the rule to apply, like this:

  struct sockaddr {
    sa_family_t sa_family;
  };

  struct sockaddr_in {
    struct sockaddr sa_addr;
    in_port_t sin_port;
    struct in_addr sin_addr;
  };

This is also what Crafting Interpreters does with Obj and ObjString.
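Here is a minimal sketch of that embedding pattern. It is modelled loosely on the book's Obj and ObjString but is not its actual code: the field set is trimmed down, and as_obj and as_string are hypothetical helper names. It relies on the guarantee that a pointer to a struct, suitably converted, points to its first member and vice versa.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OBJ_STRING, OBJ_FUNCTION } ObjType;

/* Common header shared by every object kind. */
typedef struct {
    ObjType type;
} Obj;

/* The header is the FIRST member, so an ObjString really does contain
   an Obj at offset 0. */
typedef struct {
    Obj obj;
    size_t length;
    const char *chars;
} ObjString;

/* Upcast: just take the address of the embedded header. */
Obj *as_obj(ObjString *s)
{
    return &s->obj;
}

/* Downcast: only valid when o really is the header of an ObjString,
   which the type tag lets us check in debug builds. */
ObjString *as_string(Obj *o)
{
    assert(o->type == OBJ_STRING);
    return (ObjString *)o;
}
```

Reading the type tag through an Obj pointer is fine here because the object at that address genuinely is an Obj, not merely something laid out like one.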

(Note also that the actual declarations of sockaddr and sockaddr_in carry the may_alias attribute, which makes GCC treat them like character types for the purposes of aliasing.)
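As a rough illustration of what the attribute buys you, here is a sketch that uses it on a typedef. This is a GCC extension (Clang accepts it too), not standard C. The names u32_alias and float_bits_may_alias are mine, and the example assumes that float is 32 bits wide and aligned at least as strictly as uint32_t.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* GCC extension: accesses through this type are exempt from
   type-based alias analysis, much like accesses through char. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Reads the bits of a float through the may_alias type. With a plain
   uint32_t pointer the same read would violate strict aliasing. */
uint32_t float_bits_may_alias(float f)
{
    const u32_alias *p = (const u32_alias *)&f;
    return *p;
}
```

Outside of GCC-compatible compilers the attribute is unavailable, so portable code would have to fall back on memcpy.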

While researching this I found a lot of writing on the subject, including a 2006 post by Mike Acton (of Data-Oriented Design fame). The best explanation I found was "What is the Strict Aliasing Rule and Why do we care?". The GCC docs also have a good page. I also found this fun Linus email where he complains about strict aliasing and standards in general.