You want to use a MAC that is fast in both software and hardware.
Use CMAC. It is available from http://www.zork.org/cmac/.
CMAC is the message-integrity component of the CWC encryption mode. It is based on a universal hash function that is similar to hash127. It requires an 11-byte nonce per message. The Zork implementation has the following API:
int cmac_init(cmac_t *ctx, unsigned char key[16]);
void cmac_mac(cmac_t *ctx, unsigned char *msg, u_int32 msglen,
              unsigned char nonce[11], unsigned char output[16]);
void cmac_cleanup(cmac_t *ctx);
void cmac_update(cmac_t *ctx, unsigned char *msg, u_int32 msglen);
void cmac_final(cmac_t *ctx, unsigned char nonce[11], unsigned char output[16]);
The cmac_t type keeps track of state and needs to be initialized only when you key the algorithm. You can then MAC messages interchangeably using the all-in-one API or the incremental API.
The all-in-one API consists of the cmac_mac( ) function. It takes an entire message and a nonce as arguments and produces a 16-byte output. If you want to use the incremental API, cmac_update( ) is used to pass in part of the message, and cmac_final( ) is used to set the nonce and get the resulting tag. The cmac_cleanup( ) function securely erases the context object.
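The two calling styles described above can be sketched as follows. This is a minimal illustration, not the Zork code: unless HAVE_ZORK_CMAC is defined, the block supplies insecure stand-in functions with the same names so that it compiles without cmac.c. The helpers nonce_from_counter( ) and tags_agree( ) are hypothetical names introduced here, the counter-based nonce layout is an assumption, and the sketch ignores the return value of cmac_init( ) because the text does not say what it means.

```c
#include <stdint.h>
#include <string.h>

#ifdef HAVE_ZORK_CMAC
#include "cmac.h"   /* real library; assumed to define cmac_t and u_int32 */
#else
/* STAND-IN ONLY: a toy, INSECURE placeholder with the same call shapes,
 * present so this sketch compiles without cmac.c. It is not the real MAC. */
typedef uint32_t u_int32;
typedef struct {
    unsigned char key[16];
    unsigned char acc[16];
    u_int32 pos;
} cmac_t;

int cmac_init(cmac_t *ctx, unsigned char key[16]) {
    memcpy(ctx->key, key, 16);
    memset(ctx->acc, 0, 16);
    ctx->pos = 0;
    return 0;   /* placeholder value; the real convention is not documented here */
}

void cmac_update(cmac_t *ctx, unsigned char *msg, u_int32 msglen) {
    for (u_int32 i = 0; i < msglen; i++)
        ctx->acc[ctx->pos++ % 16] ^= msg[i];
}

void cmac_final(cmac_t *ctx, unsigned char nonce[11], unsigned char output[16]) {
    for (int i = 0; i < 16; i++)
        output[i] = ctx->acc[i] ^ ctx->key[i] ^ nonce[i % 11];
    memset(ctx->acc, 0, 16);   /* reset so the context can take another message */
    ctx->pos = 0;
}

void cmac_mac(cmac_t *ctx, unsigned char *msg, u_int32 msglen,
              unsigned char nonce[11], unsigned char output[16]) {
    cmac_update(ctx, msg, msglen);
    cmac_final(ctx, nonce, output);
}

void cmac_cleanup(cmac_t *ctx) {
    memset(ctx, 0, sizeof *ctx);
}
#endif

/* Hypothetical helper: build an 11-byte nonce from a 64-bit message counter.
 * The first 3 bytes are zero; the counter fills the last 8, big-endian. */
static void nonce_from_counter(uint64_t counter, unsigned char nonce[11]) {
    memset(nonce, 0, 11);
    for (int i = 0; i < 8; i++)
        nonce[10 - i] = (unsigned char)(counter >> (8 * i));
}

/* Hypothetical demo: MAC one message with both APIs under the same key and
 * nonce, leave the all-in-one tag in `tag`, and return 1 if the tags match. */
static int tags_agree(unsigned char key[16], uint64_t counter,
                      unsigned char *msg, u_int32 len, unsigned char tag[16]) {
    cmac_t ctx;
    unsigned char nonce[11], tag2[16];
    u_int32 half = len / 2;

    nonce_from_counter(counter, nonce);
    (void)cmac_init(&ctx, key);                  /* key the context once */

    cmac_mac(&ctx, msg, len, nonce, tag);        /* all-in-one API */

    cmac_update(&ctx, msg, half);                /* incremental API, two pieces */
    cmac_update(&ctx, msg + half, len - half);
    cmac_final(&ctx, nonce, tag2);

    cmac_cleanup(&ctx);                          /* erase the key material */
    /* memcmp is fine for comparing our own two tags in a demo; it is not
     * a constant-time check for verifying a tag received from a peer. */
    return memcmp(tag, tag2, 16) == 0;
}
```

The nonce is reused here only because both calls MAC the same message for comparison; in real use each distinct message would get its own nonce, for example by incrementing the counter. With the real library one would expect both styles to produce the same tag for the same key, nonce, and message, but that is worth confirming against cmac.c rather than taking from this sketch.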
To use the CMAC API, just copy the cmac.h and cmac.c files, and compile and link against cmac.c.
The CMAC home page: http://www.zork.org/cmac/