Many of these suggestions apply only after a bottleneck has been identified:
Include multithreading at the design stage.
Parallelize tasks with threads to speed up calculations.
Run slow operations in their own threads to avoid slowing down the main thread.
Keep the user interface in a separate thread from other work so that the application feels more responsive.
Avoid designs and implementations that force points of serialized execution.
Race multiple resolution strategies in different threads and take whichever answer arrives first.
Avoid locking more resources than necessary.
Avoid synchronizing methods of stateless objects.
Build classes with synchronized wrappers, and use synchronized versions except when unsynchronized versions are definitely sufficient.
Selectively unwrap synchronized wrapped classes to eliminate identified bottlenecks.
Avoid synchronized blocks by using thread-specific data structures, combining data only when necessary.
Use atomic assignment in place of synchronization where applicable.
Load-balance the application by distributing tasks among multiple task-processing threads, using a queue and a thread-balancing mechanism.
Use thread pools to reuse threads if many threads are needed or if threads are needed for very short tasks.
Use a thread pool manager to limit the number of concurrent threads used.
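
As a rough illustration of the thread-pool, task-queue, and thread-specific-data suggestions above, the following sketch uses the standard java.util.concurrent executor framework. The class name PooledSum, the method sumOfSquares, and its parameters are hypothetical and chosen only for this example. It assumes the work splits cleanly into independent chunks, which is not true of every workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the names are illustrative only.
public class PooledSum {

    // Sums the squares of 1..n by splitting the range into chunks.
    static long sumOfSquares(int n, int chunkSize, int maxThreads) {
        // A fixed pool caps the number of concurrent worker threads, reuses
        // them across tasks, and queues tasks submitted beyond that limit.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int start = 1; start <= n; start += chunkSize) {
                final int from = start;
                final int to = Math.min(n, start + chunkSize - 1);
                tasks.add(() -> {
                    // Each task accumulates into its own local variable,
                    // so no lock is needed while the chunk is processed.
                    long local = 0;
                    for (int i = from; i <= to; i++) {
                        local += (long) i * i;
                    }
                    return local;
                });
            }
            // Partial results are combined only once, in the calling thread.
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, 100, 4));
    }
}
```

Creating the pool inside the method keeps the sketch self-contained. An application that wants to reuse threads across calls would more likely create one pool and share it.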
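
The suggestion to race several resolution strategies in different threads can be sketched with ExecutorService.invokeAny, which returns the result of one task that completes successfully and cancels the rest. The class RacingLookup, the method firstAnswer, and the two placeholder strategies are hypothetical. Real strategies would be, for example, a cache lookup racing a remote query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class; the names are illustrative only.
public class RacingLookup {

    // Runs the strategies in separate threads and returns the first result
    // that completes successfully. The list must not be empty.
    static String firstAnswer(List<Callable<String>> strategies) {
        ExecutorService pool = Executors.newFixedThreadPool(strategies.size());
        try {
            return pool.invokeAny(strategies);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // Thrown only if every strategy failed.
            throw new IllegalStateException(e);
        } finally {
            // Interrupt any strategy that is still running.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Placeholder strategies: one deliberately slow, one immediate.
        Callable<String> slow = () -> { Thread.sleep(2000); return "slow"; };
        Callable<String> fast = () -> "fast";
        System.out.println(firstAnswer(Arrays.asList(slow, fast)));
    }
}
```

Racing trades extra CPU and thread usage for lower latency, so it is most likely to pay off when the strategies have unpredictable and very different response times.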