You are using a cipher mode that is not intrinsically parallelizable, but you have a large data set and want to take advantage of multiple processors at your disposal.
Treat the data as multiple streams of interleaved data.
Recipe 5.13 demonstrates how to parallelize CTR mode encryption on a per-block level using a single encryption context. Instead of having spc_pctr_do_even( ) and spc_pctr_do_odd( ) share a key and nonce, you could use two separate encryption contexts. In such a case, there is no need to limit your choice of mode to one that is intrinsically parallelizable. However, note that you won't get the same results when using two separate contexts as you do when you use a single context, even if you use the same key and IV or nonce (remembering that IV/nonce reuse is a bad idea, and that certainly applies here).
One consideration is how much to interleave. There's no need to interleave on a block level. For example, if you are using two parallel encryption contexts, you could encrypt the first 1,024 bytes of data with the first context, the next 1,024 bytes with the second context, and continue alternating every 1,024 bytes.
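The chunked split described above can be sketched as follows. This is only an illustration: the helper name interleave_extract( ) and its parameters are hypothetical and do not come from any recipe in this book. It copies out the chunks that belong to one stream, so that each stream can then be handed to its own encryption context.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any library): gather the chunks of `in`
 * that belong to stream `which` (0-based) when the input is dealt out in
 * `chunk`-byte pieces across `nstreams` streams. The caller must supply an
 * `out` buffer large enough for that stream (len bytes always suffices).
 * Returns the number of bytes written, or 0 if the arguments are invalid.
 */
size_t interleave_extract(const unsigned char *in, size_t len,
                          size_t chunk, size_t nstreams, size_t which,
                          unsigned char *out) {
  size_t off, n, written = 0;

  if (!chunk || !nstreams || which >= nstreams) return 0;
  for (off = which * chunk; off < len; off += nstreams * chunk) {
    /* The final chunk of the input may be shorter than `chunk`. */
    n = (len - off < chunk) ? len - off : chunk;
    memcpy(out + written, in + off, n);
    written += n;
  }
  return written;
}
```

With two streams and a chunk size of 1,024, calling the helper with which set to 0 yields bytes 0 through 1,023, 2,048 through 3,071, and so on, while which set to 1 yields the remaining chunks. A larger chunk size generally means less bookkeeping per byte, at the cost of coarser load balancing on small inputs.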
Generally, it is best to use a different key for each context. You can derive multiple keys from a single base key, as shown in Recipe 4.11.
It's easiest to consider interleaving only at the plaintext level, particularly if you're using a block-based mode, where padding will generally be added for each cipher context. In such a case, you would send the encrypted data in multiple independent streams and reassemble it after decryption.
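Reassembly after decryption might look like the sketch below. The helper name interleave_merge( ) and its parameters are assumptions made for illustration only. It takes one decrypted stream (with any padding already removed) and scatters its chunks back to their original offsets in the output buffer; calling it once per stream rebuilds the original plaintext.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any library): scatter the decrypted
 * bytes of stream `which` (0-based) back into `out`, where `total` is the
 * length of the original plaintext and the data was dealt out in
 * `chunk`-byte pieces across `nstreams` streams.
 * Returns the number of bytes consumed from `stream`, or 0 if the
 * arguments are invalid.
 */
size_t interleave_merge(const unsigned char *stream, size_t stream_len,
                        size_t chunk, size_t nstreams, size_t which,
                        unsigned char *out, size_t total) {
  size_t off, n, used = 0;

  if (!chunk || !nstreams || which >= nstreams) return 0;
  for (off = which * chunk; off < total && used < stream_len;
       off += nstreams * chunk) {
    /* Never write past the end of `out` or read past the end of `stream`. */
    n = (total - off < chunk) ? total - off : chunk;
    if (n > stream_len - used) n = stream_len - used;
    memcpy(out + off, stream + used, n);
    used += n;
  }
  return used;
}
```

Note that the receiver needs to know the original total length, the chunk size, and the number of streams. How those values are communicated is up to the application and is not addressed by this sketch.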
Recipe 4.11, Recipe 5.13