9.12 Building an Authenticated Secure Channel Without SSL

9.12.1 Problem

You want to encrypt communications between two peers without using SSL and the overhead that it incurs. Because it is normally a bad idea to encrypt without integrity checking (to avoid attacks such as man-in-the-middle, capture replay, and bit-flipping in stream ciphers), you also want to employ some kind of integrity checking so you'll be able to determine whether the data has been tampered with in transit.

We also assume here that you'd like to stay away from a full-fledged PKI, instead using a more traditional model of user accounts managed on a per-machine basis.

9.12.2 Solution

Use an authenticating key exchange mechanism from Chapter 8, and use the resulting session key with a solution for authenticated encryption, while performing proper key and nonce management.

In this recipe, we provide an infrastructure for a simple secure channel, for use once authentication and key exchange have been performed.

9.12.3 Discussion

Given the tools we've discussed in previous recipes for authentication, key exchange, and the creation of secure channels, producing an end-to-end solution isn't drastically difficult. Nonetheless, there are some potential "gotchas" that we still need to address.

In protocols such as SSL/TLS, connection establishment is a bit more complex than simply authenticating and exchanging a key. In particular, such protocols tend to negotiate which version of the protocol is to be used, and perhaps negotiate which cryptographic algorithms and key sizes are to be used.

In situations like this, there is the threat of a rollback attack, which occurs when an attacker tampers with messages during establishment and tricks the parties into negotiating an insecure set of parameters (such as an old, broken version of a protocol).

A good protocol for authentication and key exchange, such as PAX or SAX (see Recipe 8.15), ensures that there are no opportunities for rollback in the context of the protocol. If you don't have messages that come before the key exchange, and if you immediately start using your encryption key after the exchange using an authenticated encryption solution, you can do other kinds of negotiation (such as agreeing on a protocol) and not have to worry about rollback.

If, on the other hand, you send messages before your key exchange, or you create your own end-to-end protocol (neither is a solution we recommend), you will need to protect against rollback attacks on your own. To accomplish this, after connection establishment, have each side MAC every message that it believes was exchanged during establishment. If the client sends its MAC first, and the server validates it, the server should MAC not only the establishment messages but also the MAC value sent by the client. Similarly, if the server sends the MAC first, the client should include the server's MAC in its response.
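One way to keep track of "every message that took place during the establishment" is to append each message to a transcript buffer as it is sent or received, and MAC the whole buffer at the end. The following is a minimal sketch of such a buffer, not part of the recipe's API: the names transcript_t, transcript_init(), transcript_append(), and TRANSCRIPT_MAX are hypothetical, and the MAC itself is left to whichever MAC you already use. Each message is prefixed with its length so that message boundaries cannot be shifted without changing the MAC input.

```c
#include <stddef.h>
#include <string.h>

#define TRANSCRIPT_MAX 4096   /* Hypothetical cap on total handshake size. */

typedef struct {
  unsigned char buf[TRANSCRIPT_MAX];
  size_t        used;
} transcript_t;

void transcript_init(transcript_t *t) {
  t->used = 0;
}

/* Append one establishment message, prefixed with its length as a 32-bit
 * big-endian value.  Returns 1 on success, 0 if the message does not fit.
 */
int transcript_append(transcript_t *t, const unsigned char *msg, size_t len) {
  size_t i;

  if (t->used > TRANSCRIPT_MAX - 4 || len > TRANSCRIPT_MAX - 4 - t->used)
    return 0;
  for (i = 0;  i < 4;  i++)
    t->buf[t->used + i] = (unsigned char)((len >> (8 * (3 - i))) & 0xff);
  if (len) memcpy(t->buf + t->used + 4, msg, len);
  t->used += 4 + len;
  return 1;
}
```

After the key exchange, each side would MAC t->buf[0 .. t->used - 1] with the session key. The side that answers second would first append the MAC value it received, so that its own MAC covers it.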

Our overall recommendation is not to introduce SSL-style configurability for your cryptography. If, for example, you use PAX, the only real option possible in the whole key exchange and authentication process is the size of the key that gets exchanged. We recommend that you use that key in a strong predetermined authenticated encryption scheme without negotiation. If you feel that you absolutely must allow for algorithm negotiation, we recommend you have a highly conservative default that you immediately start using after the key exchange, such as AES in CWC mode with 256-bit keys, and allow for renegotiation.

As we discuss in Recipe 6.21, you should use a message counter along with a MAC to thwart capture replay attacks. Message counters can also help determine when messages arrive out of order or are dropped, if you always check that the message number increases by exactly one (standard capture replay detection only checks to make sure the message number always increases).
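The difference between the two checks can be sketched as follows. This is an illustration only: seq_check_strict() and the seq_result_t values are hypothetical names, and the check is meant to run only after the message's MAC has validated, so that the counter being examined is trustworthy.

```c
#include <stdint.h>

typedef enum { SEQ_OK, SEQ_REPLAY, SEQ_GAP } seq_result_t;

/* *last holds the number of the most recently accepted message (0 means
 * that nothing has been accepted yet, so the first message is number 1).
 */
seq_result_t seq_check_strict(uint64_t *last, uint64_t received) {
  /* Standard capture replay detection: the number must always increase. */
  if (received <= *last) return SEQ_REPLAY;

  /* Stricter check: the number must increase by exactly one, otherwise a
   * message was dropped or delivered out of order.
   */
  if (received != *last + 1) return SEQ_GAP;

  *last = received;
  return SEQ_OK;
}
```

A protocol that tolerates drops would treat SEQ_GAP as acceptable and record the new number. A protocol that requires strict ordering would treat it as an error.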

Note that if you're using a "reliable" transport such as TCP, you will get modest prevention against message reordering and dropped messages. TCP's protection against these problems is not cryptographically secure, however. A savvy attacker can still launch such attacks in a manner that the TCP layer will not detect.

In some environments, message ordering and dropping aren't a big deal. These are the environments in which you would traditionally use an "unreliable" protocol such as UDP. Generally, cryptographically strong protocols may be able to tolerate drops, but they shouldn't tolerate reordering, because doing so means forgoing standard capture replay prevention. You can always drop out-of-order messages, or you can explicitly keep track of recent message numbers that have been seen, then drop any duplicates and any messages with a number that comes before that window.
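Tracking recent message numbers is commonly done with a bitmap that slides forward as higher numbers arrive. The sketch below assumes a 64-message window and message numbers that start at 1. The names replay_window_t, replay_window_init(), and replay_window_accept() are hypothetical, and as with any counter check, it should be applied only to messages whose MAC has already validated.

```c
#include <stdint.h>

#define WINDOW_BITS 64

typedef struct {
  uint64_t highest;   /* Highest message number accepted so far. */
  uint64_t seen;      /* Bit i set means (highest - i) was accepted. */
} replay_window_t;

void replay_window_init(replay_window_t *w) {
  w->highest = 0;
  w->seen    = 0;
}

/* Returns 1 and records n if it is fresh.  Returns 0 if n is a duplicate or
 * falls before the window.
 */
int replay_window_accept(replay_window_t *w, uint64_t n) {
  uint64_t diff;

  if (n == 0) return 0;
  if (n > w->highest) {
    /* Slide the window forward so that bit 0 corresponds to n. */
    diff = n - w->highest;
    w->seen = (diff >= WINDOW_BITS) ? 0 : (w->seen << diff);
    w->seen |= 1;
    w->highest = n;
    return 1;
  }
  diff = w->highest - n;
  if (diff >= WINDOW_BITS) return 0;               /* Before the window. */
  if (w->seen & ((uint64_t)1 << diff)) return 0;   /* Duplicate. */
  w->seen |= (uint64_t)1 << diff;
  return 1;
}
```

The window size is a trade-off: a larger window tolerates more reordering, at the cost of more state per connection.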

Particularly if you're using TCP, if a message fails to authenticate cryptographically, recovering is tremendously difficult. Accidental errors will almost always be caught at the TCP level, and you can assume that if the cryptography catches it, an attacker is tampering. In such a case, a smart attacker can cause a denial of service, no matter what. It's generally easiest to terminate the connection, perhaps sending back an error packet first.

Often, unrecoverable errors result in plaintext error messages. In such cases, you should be conservative and send no reason code signaling why you failed. There are instances in major protocols where verbose errors led to an important information leak.

When you're designing your protocol for client-server communications, you should include a sequence of messages between both parties to communicate to the other side that the connection is being terminated normally. That way, when a connection is prematurely terminated, both sides of the connection have some way of knowing whether the connection was terminated legitimately or was the result of a possible attack. In the latter case, you may wish to take appropriate action. For example, if the connection is prematurely terminated in the process of performing some database operation, you may want to roll back any changes that were made.

The next consideration is what the message format should look like. Generally, a message format will start out with a plaintext, fixed-size field encoding the length of the remainder of the message. Then, there may or may not be plaintext values, such as the message number (the message number can go inside the ciphertext, but sending it in the clear is often useful, because the receiver can then read it to compute the nonce instead of assuming it). Finally come the ciphertext and the MAC value (which may be one unit, depending on whether you use an authenticated encryption mode such as CWC).

Any unencrypted data in the message should be authenticated in a secure manner along with the encrypted data. Modes like CWC and CCM allow you to authenticate both plaintext and ciphertext with a single MAC value. CMAC has the same capability. With other MACs, you can simulate this behavior by MAC'ing the length of the plaintext portion, concatenated with the plaintext portion, concatenated with the ciphertext. To do this correctly, however, you must always include the plaintext length, even if it is zero.
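The length prefix matters because, without it, a plaintext portion "ab" followed by ciphertext "c" would produce the same MAC input as a plaintext portion "a" followed by ciphertext "bc". A minimal sketch of building that MAC input follows. The function name build_mac_input() and the choice of a 32-bit big-endian length field are assumptions made for illustration, and the resulting buffer would then be passed to whatever MAC you are using.

```c
#include <stdlib.h>
#include <string.h>

/* Build len(aad) || aad || ct, where aad is the plaintext portion that is
 * sent in the clear and len(aad) is a 32-bit big-endian value.  Returns a
 * malloc()'d buffer and stores its size in *outlen, or returns NULL on
 * failure.  The caller frees the buffer.
 */
unsigned char *build_mac_input(const unsigned char *aad, size_t aadlen,
                               const unsigned char *ct, size_t ctlen,
                               size_t *outlen) {
  unsigned char *buf;
  size_t        i, total;

  /* The length must fit in the 4-byte field, and the total must not wrap. */
  if ((unsigned long long)aadlen > 0xffffffffULL) return NULL;
  if (aadlen > (size_t)-1 - 4 || ctlen > (size_t)-1 - 4 - aadlen) return NULL;

  total = 4 + aadlen + ctlen;
  if (!(buf = (unsigned char *)malloc(total))) return NULL;

  /* The length field is always present, even when aadlen is zero. */
  for (i = 0;  i < 4;  i++)
    buf[i] = (unsigned char)((aadlen >> (8 * (3 - i))) & 0xff);
  if (aadlen) memcpy(buf + 4, aad, aadlen);
  if (ctlen)  memcpy(buf + 4 + aadlen, ct, ctlen);

  *outlen = total;
  return buf;
}
```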

Assume that we've established a TCP connection and exchanged a 128-bit key using a protocol such as PAX (as discussed in Recipe 8.15). Now, what should we do with that key? The answer depends on a few things. First, we might need separate keys for encryption and MAC'ing if we're not using a dual-use mode such as CWC. Second, we might have the client and server send messages in lockstep, or we might have them send messages asynchronously. If they send messages asynchronously, we can use a separate key for each direction or, if using a nonced encryption mode, manage two nonces, while ensuring that the client and server nonces never collide (we'll use this trick in the code below).

If you do need multiple keys for your setup, you can take the exchanged key and use it to derive those keys, as discussed in Recipe 4.11. If you do that, use the exchanged key only for derivation. Do not use it for anything else.

At this point, on each end of the connection, we should be left with an open file descriptor and whatever keys we need. Let's assume at this point that we're using CWC mode (using the API discussed in Recipe 5.10), our communication is synchronous, the file descriptor is in blocking mode, and the client sends the first message. We are using a random session key, so we don't have to make a derived key, as happens in Recipe 5.16.

The first thing we have to do is figure out how we're going to lay out the 11-byte nonce CWC mode gives us. We'll use the first byte to distinguish who is doing the sending, just in case we want to switch to asynchronous communication at a future point. The client will send with the high byte set to 0x80, and the server will send with that byte set to 0x00. We will then have a session-specific 40-bit (5-byte) random value chosen by the client, followed by a 5-byte message counter.
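To make the layout concrete, here is a small sketch that assembles such a nonce from its three parts. The names build_nonce(), NONCE_LEN, SENDER_CLIENT, and SENDER_SERVER are hypothetical and are separate from the definitions used in the recipe's own code. Because the first byte differs by sender, a client nonce and a server nonce can never be equal, even when the session value and the counter are identical.

```c
#include <stddef.h>
#include <string.h>

#define NONCE_LEN     11
#define SENDER_CLIENT 0x80
#define SENDER_SERVER 0x00

/* Layout: byte 0 is the sender, bytes 1-5 are the session-specific random
 * value chosen by the client, and bytes 6-10 are the message counter in
 * big-endian order.  Returns 0 if the counter does not fit in 40 bits.
 */
int build_nonce(unsigned char out[NONCE_LEN], unsigned char sender,
                const unsigned char session_id[5],
                unsigned long long counter) {
  int i;

  if (counter >> 40) return 0;
  out[0] = sender;
  memcpy(out + 1, session_id, 5);
  for (i = 0;  i < 5;  i++)
    out[6 + i] = (unsigned char)((counter >> (8 * (4 - i))) & 0xff);
  return 1;
}
```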

The message elements will be a status byte followed by the fixed-size nonce, followed by the length of the ciphertext encoded as a 32-bit big-endian value, followed finally by the CWC ciphertext (which includes the authentication value). The status byte, the nonce, and the length field will be sent in the clear.
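The cleartext portion of each message is therefore 16 bytes long: 1 status byte, 11 nonce bytes, and 4 length bytes. The following sketch packs and unpacks that header in a single buffer. The names header_pack(), header_unpack(), HDR_NONCE_LEN, and HDR_LEN are hypothetical, and the recipe's own code writes the three fields one at a time rather than through a combined buffer.

```c
#include <stdint.h>
#include <string.h>

#define HDR_NONCE_LEN 11
#define HDR_LEN       (1 + HDR_NONCE_LEN + 4)   /* 16 bytes in the clear. */

void header_pack(unsigned char out[HDR_LEN], unsigned char status,
                 const unsigned char nonce[HDR_NONCE_LEN], uint32_t ctlen) {
  int i;

  out[0] = status;
  memcpy(out + 1, nonce, HDR_NONCE_LEN);
  /* The ciphertext length is encoded as a 32-bit big-endian value. */
  for (i = 0;  i < 4;  i++)
    out[1 + HDR_NONCE_LEN + i] = (unsigned char)((ctlen >> (8 * (3 - i))) & 0xff);
}

/* Returns the ciphertext length, and copies out the status and the nonce. */
uint32_t header_unpack(const unsigned char in[HDR_LEN], unsigned char *status,
                       unsigned char nonce[HDR_NONCE_LEN]) {
  uint32_t ctlen = 0;
  int      i;

  *status = in[0];
  memcpy(nonce, in + 1, HDR_NONCE_LEN);
  for (i = 0;  i < 4;  i++)
    ctlen = (ctlen << 8) | in[1 + HDR_NONCE_LEN + i];
  return ctlen;
}
```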

The status byte will always be 0x00, unless we're closing the connection, in which case we'll send 0xff. (If there is an error on the sender's end, we simply drop the connection instead of sending back an error status.) If we receive any nonzero value, we will terminate the connection. If the value is not 0x00 or 0xff, there was probably some sort of tampering.

When MAC'ing, we do not need to consider the nonce, because it is an integral element when the CWC message is validated. Similarly, the length field is implicitly authenticated during CWC decryption. The status byte should be authenticated, and we can pass it as associated data to CWC.

Now we have all the tools we need to complete our authenticated secure channel. First, let's create an abstraction for the connection, which will consist of a CWC encryption context, state information about the nonce, and the file descriptor over which we are communicating:

#include <stdlib.h>
#include <string.h>   /* memset(), memcpy() */
#include <unistd.h>   /* read(), write() */
#include <errno.h>
#include <cwc.h>

#define SPC_CLIENT_DISTINGUISHER 0x80
#define SPC_SERVER_DISTINGUISHER 0x00
#define SPC_SERVER_LACKS_NONCE   0xff

#define SPC_IV_IX   1
#define SPC_CTR_IX  6
#define SPC_IV_LEN  5
#define SPC_CTR_LEN 5

#define SPC_CWC_NONCE_LEN (SPC_IV_LEN + SPC_CTR_LEN + 1)

typedef struct {
  cwc_t         cwc;
  unsigned char nonce[SPC_CWC_NONCE_LEN];
  int           fd;
} spc_ssock_t;

After the key exchange completes, the client will have a key and a file descriptor connected to the server. We can use this information to initialize an spc_ssock_t:

/* klen is in bytes.  Note that, on errors, we abort(), whereas you will
 * probably want to perform exception handling, as discussed in Recipe 13.1.
 * In any event, we never report an error to the other side; simply drop the
 * connection (by aborting).  We'll send a message when shutting down properly.
 */

void spc_init_client(spc_ssock_t *ctx, unsigned char *key, size_t klen, int fd) {
  if (klen != 16 && klen != 24 && klen != 32) abort();

  /* Remember that cwc_init() erases the key we pass in! */
  cwc_init(&(ctx->cwc), key, klen * 8);

  /* select 5 random bytes to place starting at nonce[1].  We use the API from
   * Recipe 11.2.
   */
  spc_rand(ctx->nonce + SPC_IV_IX, SPC_IV_LEN);

  /* Set the 5 counter bytes to 0, indicating that we've sent no messages. */
  memset(ctx->nonce + SPC_CTR_IX, 0, SPC_CTR_LEN);
  ctx->fd = fd;

  /* This byte always identifies whoever sent the last message.  If the
   * client goes to send a message, and this is set to
   * SPC_CLIENT_DISTINGUISHER, then we know there has been an error.
   */
  ctx->nonce[0] = SPC_SERVER_DISTINGUISHER;
}

The client may now send a message to the server using the following function, which accepts plaintext and encrypts it before sending:

#define SPC_CWC_TAG_LEN     16
#define SPC_MLEN_FIELD_LEN  4
#define SPC_MAX_MLEN        0xffffffff

static unsigned char spc_msg_ok  = 0x00;
static unsigned char spc_msg_end = 0xff;

static void spc_increment_counter(unsigned char *, size_t);
static void spc_ssock_write(int, unsigned char *, size_t);
static void spc_base_send(spc_ssock_t *ctx, unsigned char *msg, size_t mlen);

void spc_ssock_client_send(spc_ssock_t *ctx, unsigned char *msg, size_t mlen) {
  /* If it's not our turn to speak, abort. */
  if (ctx->nonce[0] != SPC_SERVER_DISTINGUISHER) abort();

  /* Set the distinguisher, then bump the counter before we actually send. */
  ctx->nonce[0] = SPC_CLIENT_DISTINGUISHER;
  spc_increment_counter(ctx->nonce + SPC_CTR_IX, SPC_CTR_LEN);
  spc_base_send(ctx, msg, mlen);
}

static void spc_base_send(spc_ssock_t *ctx, unsigned char *msg, size_t mlen) {
  unsigned char encoded_len[SPC_MLEN_FIELD_LEN];
  size_t        i, ctlen;
  unsigned char *ct;

  /* The callers have already checked whose turn it is and set the
   * distinguisher byte, so we do not repeat the turn check here.
   */

  /* First, write the status byte, then the nonce. */
  spc_ssock_write(ctx->fd, &spc_msg_ok, sizeof(spc_msg_ok));
  spc_ssock_write(ctx->fd, ctx->nonce, sizeof(ctx->nonce));

  /* Next, write the length of the ciphertext, which will be the size of the
   * plaintext plus SPC_CWC_TAG_LEN bytes for the tag.  We abort if the
   * ciphertext would be more than 2^32-1 bytes.  The comparison is written
   * so that it cannot overflow, whatever the word size.
   */
  if (mlen > (unsigned long)SPC_MAX_MLEN - SPC_CWC_TAG_LEN) abort();
  ctlen = mlen + SPC_CWC_TAG_LEN;
  for (i = 0;  i < SPC_MLEN_FIELD_LEN;  i++)
    encoded_len[SPC_MLEN_FIELD_LEN - i - 1] = (ctlen >> (8 * i)) & 0xff;
  spc_ssock_write(ctx->fd, encoded_len, sizeof(encoded_len));

  /* Now, we perform the CWC encryption, and send the result.  Note that,
   * if the send fails, and you do not abort as we do, you should remember to
   * deallocate the message buffer.
   */
  if (!(ct = (unsigned char *)malloc(ctlen))) abort(); /* Out of memory. */
  cwc_encrypt_message(&(ctx->cwc), &spc_msg_ok, sizeof(spc_msg_ok), msg,
                      mlen, ctx->nonce, ct);
  spc_ssock_write(ctx->fd, ct, ctlen);
  free(ct);
}

static void spc_increment_counter(unsigned char *ctr, size_t len) {
  while (len--) if (++ctr[len]) return;
  abort(); /* Counter rolled over, which is an error condition! */
}

static void spc_ssock_write(int fd, unsigned char *msg, size_t mlen) {
  ssize_t w;

  while (mlen) {
    if ((w = write(fd, msg, mlen)) == -1) {
      switch (errno) {
        case EINTR:
          break;
        default:
          abort();
      }
    } else {
      mlen -= w;
      msg += w;
    }
  }
}

Let's look at the rest of the client side of the connection, before we turn our attention to the server side. When the client wishes to terminate the connection politely, it will send an empty message but pass 0xff as the status byte. It must still send the proper nonce and encrypt a zero-length message (which CWC will quite happily do). That can be done with code very similar to the code shown previously, so we won't waste space by duplicating the code.

Now let's look at what happens when the client receives a message. The status byte should be 0x00. The nonce we get from the server should be unchanged from the one we just sent, except that the first byte should be SPC_SERVER_DISTINGUISHER. If the nonce is invalid, we'll just fail by aborting, though you could instead discard the message if you choose to do so (doing so is a bit problematic, though, because you then have to resync the connection somehow).

Next, we'll read the length value, dynamically allocating a buffer that's big enough to hold the ciphertext. This code can never allocate more than 2^32-1 bytes of memory. In practice, you should probably have a maximum message length and check to make sure the length field doesn't exceed that. Such a test can keep an attacker from launching a denial of service attack in which she has you allocate enough memory to slow down your machine.
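A length check of that kind might look like the following sketch. The 64 KB limit and the names decode_ct_len(), MAX_CT_LEN, and TAG_LEN are assumptions chosen for illustration, so pick a limit that suits your application. The lower bound reflects the fact that a valid ciphertext always contains the 16-byte tag.

```c
#include <stddef.h>

#define MAX_CT_LEN (64 * 1024)   /* Hypothetical application limit. */
#define TAG_LEN    16            /* Size of the authentication tag. */

/* Decode a 4-byte big-endian length field and validate it before any memory
 * is allocated.  Returns 1 and sets *ctlen if the length is acceptable, or
 * returns 0 if the message should be rejected.
 */
int decode_ct_len(const unsigned char enc[4], size_t *ctlen) {
  unsigned long n = 0;
  int           i;

  for (i = 0;  i < 4;  i++)
    n = (n << 8) | enc[i];
  if (n < TAG_LEN || n > MAX_CT_LEN) return 0;
  *ctlen = (size_t)n;
  return 1;
}
```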

Finally, we'll call cwc_decrypt_message( ) and see if the MAC validates. If it does, we'll return the message. Otherwise, we will abort.

static void spc_ssock_read(int, unsigned char *, size_t);
static void spc_get_status_and_nonce(int, unsigned char *, unsigned char *);
static unsigned char *spc_finish_decryption(spc_ssock_t *, unsigned char,
                                            unsigned char *, size_t *);

unsigned char *spc_client_read(spc_ssock_t *ctx,  size_t *len, size_t *end) {
  unsigned char status;
  unsigned char nonce[SPC_CWC_NONCE_LEN];

  /* If it's the client's turn to speak,  abort. */
  if (ctx->nonce[0] != SPC_CLIENT_DISTINGUISHER) abort();
  ctx->nonce[0] = SPC_SERVER_DISTINGUISHER;
  spc_get_status_and_nonce(ctx->fd, &status, nonce);
  *end = status;
  return spc_finish_decryption(ctx, status, nonce, len);
}

static void spc_get_status_and_nonce(int fd, unsigned char *status,
                                     unsigned char *nonce) {
  /* Read the status byte.  If it's 0x00 or 0xff, we're going to look at
   * the rest of the message, otherwise we'll just give up right away.  */
  spc_ssock_read(fd,  status, 1);
  if (*status != spc_msg_ok && *status != spc_msg_end) abort( );
  spc_ssock_read(fd, nonce, SPC_CWC_NONCE_LEN);
}

static unsigned char *spc_finish_decryption(spc_ssock_t *ctx, unsigned char status,
                                            unsigned char *nonce, size_t *len) {
  size_t        ctlen = 0, i;
  unsigned char *ct, encoded_len[SPC_MLEN_FIELD_LEN];

  /* Check the nonce. */
  for (i = 0;  i < SPC_CWC_NONCE_LEN;  i++)
    if (nonce[i] != ctx->nonce[i]) abort();

  /* Read the length field. */
  spc_ssock_read(ctx->fd, encoded_len, SPC_MLEN_FIELD_LEN);
  for (i = 0;  i < SPC_MLEN_FIELD_LEN;  i++) {
    ctlen <<= 8;
    ctlen += encoded_len[i];
  }

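  /* Read the ciphertext.  A valid ciphertext is never shorter than the tag;
   * rejecting short lengths also keeps *len from underflowing below.
   */
  if (ctlen < SPC_CWC_TAG_LEN) abort();
  if (!(ct = (unsigned char *)malloc(ctlen))) abort();
  spc_ssock_read(ctx->fd, ct, ctlen);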

  /* Decrypt the ciphertext, and abort if decryption fails.
   * We decrypt into the buffer in which the ciphertext already lives.
   */
  if (!cwc_decrypt_message(&(ctx->cwc), &status, 1, ct, ctlen, nonce, ct)) {
    free(ct);
    abort();
  }

  *len = ctlen - SPC_CWC_TAG_LEN;
  /* We'll go ahead and avoid the realloc(), which leaves the buffer
   * SPC_CWC_TAG_LEN bytes larger than the plaintext requires.
   */
  return ct;
}

static void spc_ssock_read(int fd, unsigned char *msg, size_t mlen) {
  ssize_t r;

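  while (mlen) {
    if ((r = read(fd, msg, mlen)) == -1) {
      switch (errno) {
        case EINTR:
          break;
        default:
          abort();
      }
    } else if (r == 0) {
      /* End of file: the peer closed the connection before the message was
       * complete.  Without this check, the loop would never terminate.
       */
      abort();
    } else {
      mlen -= r;
      msg += r;
    }
  }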
}

The client is responsible for deallocating the memory for messages. We recommend securely wiping messages before doing so, as discussed in Recipe 13.2. In addition, you should securely erase the spc_ssock_t context when you are done with it.
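As a reminder of what such wiping involves, here is a minimal sketch of a common idiom. It is not necessarily the routine from Recipe 13.2, and the name wipe_buffer() is hypothetical. Writing through a volatile pointer makes it less likely that an optimizing compiler will discard the stores as dead code, which can happen with a plain memset() on a buffer that is about to be freed.

```c
#include <stddef.h>

/* Overwrite len bytes at buf with zeros, writing through a volatile pointer
 * so that the stores are not treated as dead code.
 */
void wipe_buffer(void *buf, size_t len) {
  volatile unsigned char *p = (volatile unsigned char *)buf;

  while (len--) *p++ = 0;
}
```

You would call this on a decrypted message (using the full allocated size, including the extra tag bytes) just before calling free(), and on the spc_ssock_t context once the connection is finished.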

That's everything on the client side. Now we can move on to the server. The server can share the spc_ssock_t type that the client uses, as well as all the helper functions, such as spc_ssock_read( ) and spc_ssock_write( ). But the API for initialization, reading, and writing must change.

Here's the server-side initialization function that should get called once the key exchange is complete but before the client's first message is read:

void spc_init_server(spc_ssock_t *ctx, unsigned char *key, size_t klen, int fd) {
  if (klen != 16 && klen != 24 && klen != 32) abort();

  /* Remember that cwc_init() erases the key we pass in! */
  cwc_init(&(ctx->cwc), key, klen * 8);

  /* We need to wait for the random portion of the nonce from the client.
   * The counter portion we can initialize to zero.  We'll set the distinguisher
   * to SPC_SERVER_LACKS_NONCE, so that we know to copy in the random portion
   * of the nonce when we receive a message.
   */
  ctx->nonce[0] = SPC_SERVER_LACKS_NONCE;
  memset(ctx->nonce + SPC_CTR_IX, 0, SPC_CTR_LEN);
  ctx->fd = fd;
}

The first thing the server does is read data from the client's socket. In practice, the following code isn't designed for a single-threaded server that uses select( ) to determine which client has data to be read. This is because once we start reading data, we keep reading until we've taken in the entire message, and all the reads are blocking. The code is not designed to work in a nonblocking environment.

Instead, you should use this code from a thread, or use the traditional Unix model, where you fork( ) off a new process for each client connection. Or you can simply rearrange the code so that you incrementally read data without blocking.

unsigned char *spc_server_read(spc_ssock_t *ctx,  size_t *len, size_t *end) {
  unsigned char nonce[SPC_CWC_NONCE_LEN], status;

  /* If it's the server's turn to speak, abort. We know it's the server's turn
   * to speak if the first byte of the nonce is the CLIENT distinguisher.
   */
  if (ctx->nonce[0] != SPC_SERVER_DISTINGUISHER &&
      ctx->nonce[0] != SPC_SERVER_LACKS_NONCE) abort();

  spc_get_status_and_nonce(ctx->fd, &status, nonce);
  *end = status;

  /* If we need to do so, copy over the random bytes of the nonce. */
  if (ctx->nonce[0] == SPC_SERVER_LACKS_NONCE)
    memcpy(ctx->nonce + SPC_IV_IX, nonce + SPC_IV_IX, SPC_IV_LEN);

  /* Now, set the distinguisher field to client, and increment our copy of
   * the nonce.
   */
  ctx->nonce[0] = SPC_CLIENT_DISTINGUISHER;
  spc_increment_counter(ctx->nonce + SPC_CTR_IX, SPC_CTR_LEN);

  return spc_finish_decryption(ctx, status, nonce, len);
}

Now we just need to handle the server-side sending of messages, which requires only a little bit of work:

void spc_ssock_server_send(spc_ssock_t *ctx, unsigned char *msg, size_t mlen) {
  /* If it's not our turn to speak, abort. We know it's our turn if the client
   * spoke last.
   */
  if (ctx->nonce[0] != SPC_CLIENT_DISTINGUISHER) abort();

  /* Set the distinguisher, but don't bump the counter, because we already did
   * when we received the message from the client.
   */
  ctx->nonce[0] = SPC_SERVER_DISTINGUISHER;
  spc_base_send(ctx, msg, mlen);
}

There is one more potential issue that we should note. In some situations in which you're going to be dealing with incredibly long messages, it does not make sense to have to know how much data is going to be in a message before you start to send it. Doing so will require buffering up large amounts of data, which might not always be possible, particularly in an embedded device.

In such cases, you need to be able to read the message incrementally, yet have some indication of where the message stops, so you know where to stop decrypting. Such scenarios require a special message format.

In this situation, we recommend sending data in fixed-size "frames." At the end of each frame is a field that indicates the length of the data that was in that frame, and a field that indicates whether the frame represents the end of a message. In nonfull frames, the bytes from the end of the data to the informational fields should be set to 0.
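One possible plaintext layout for such a frame is sketched below. The 64-byte frame size, the 2-byte length field, the 1-byte end-of-message flag, and the names frame_pack() and frame_unpack() are all assumptions made for illustration. The sketch covers only the layout of the frame contents. Each frame would still need to be encrypted and authenticated as a unit, with its own nonce, in the same manner as the messages described earlier.

```c
#include <stddef.h>
#include <string.h>

#define FRAME_SIZE     64                            /* Hypothetical size. */
#define FRAME_TRAILER  3                             /* 2-byte length + flag. */
#define FRAME_CAPACITY (FRAME_SIZE - FRAME_TRAILER)

/* Fill one frame: the data, then zero padding, then the data length
 * (big-endian) and the end-of-message flag.  Returns 0 if len is too large.
 */
int frame_pack(unsigned char frame[FRAME_SIZE], const unsigned char *data,
               size_t len, int is_last) {
  if (len > FRAME_CAPACITY) return 0;
  memset(frame, 0, FRAME_SIZE);
  if (len) memcpy(frame, data, len);
  frame[FRAME_SIZE - 3] = (unsigned char)((len >> 8) & 0xff);
  frame[FRAME_SIZE - 2] = (unsigned char)(len & 0xff);
  frame[FRAME_SIZE - 1] = is_last ? 1 : 0;
  return 1;
}

/* Parse one frame.  Returns 0 if the trailer is inconsistent or if the
 * padding contains a nonzero byte.
 */
int frame_unpack(const unsigned char frame[FRAME_SIZE], size_t *len,
                 int *is_last) {
  size_t i, n;

  n = ((size_t)frame[FRAME_SIZE - 3] << 8) | frame[FRAME_SIZE - 2];
  if (n > FRAME_CAPACITY || frame[FRAME_SIZE - 1] > 1) return 0;
  for (i = n;  i < FRAME_CAPACITY;  i++)
    if (frame[i]) return 0;
  *len     = n;
  *is_last = frame[FRAME_SIZE - 1];
  return 1;
}
```

The receiver decrypts one frame at a time, passes the first *len bytes along, and stops once it sees a frame with the end-of-message flag set.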

9.12.4 See Also

Recipe 4.11, Recipe 5.10, Recipe 5.16, Recipe 6.21, Recipe 8.15, Recipe 13.2