The VM is optimized for creating threads, so you can usually create a new thread when you need one without worrying about performance. In some circumstances, however, maintaining a pool of threads can improve performance. For example, where you would otherwise create and destroy many short-lived threads, you are far better off holding onto a (variable-sized) pool of threads. Tasks are then assigned to an already created thread, and when a thread completes its task, it is returned to the pool, ready for the next one. This improves performance because thread creation (and, to some extent, destruction) still carries an overhead that is significant relative to a short-lived task and is better avoided.
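As a rough illustration of the reuse described above, the following sketch runs many short tasks on a variable-sized pool and counts how many distinct threads actually executed them. It assumes the standard java.util.concurrent executors are available, which the text does not name; the class and method names (PooledTasksSketch, runShortTasks) are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names are illustrative, not from the text.
class PooledTasksSketch {

    // Runs taskCount short tasks on a variable-sized pool and returns
    // the number of distinct threads that executed at least one task.
    static int runShortTasks(int taskCount) {
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        // A cached pool creates threads on demand and reuses idle ones.
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> threadsUsed.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // accept no new tasks; already submitted ones still run
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threadsUsed.size();
    }

    public static void main(String[] args) {
        int distinct = runShortTasks(1000);
        System.out.println("1000 tasks ran on " + distinct + " thread(s)");
    }
}
```

The exact thread count varies from run to run and machine to machine, but it cannot exceed the number of tasks and is typically well below it, since an idle thread picks up the next task instead of a new thread being created.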
A second situation is where you want to limit the number of threads in your application. In this case, your application needs to make all thread requests through a centralized pool manager. Although a pool manager cannot prevent code from starting threads outside it, it is a big step toward enforcing a limit. (Strictly speaking, limiting threads does not require a pool of threads, just a centralized pool manager, but the two usually come together.) Every system has a response curve with diminishing returns once a certain number of threads are running on it. This curve differs from system to system, so you need to identify the values for your particular setup. A heavy-duty server needs to show good behavior across a spectrum of loads, and at the high end, you don't want your server crashing when 10,000 requests try to spawn 10,000 threads; instead, you want the server's response to degrade gracefully (e.g., by queuing requests) while it maintains whatever maximum number of threads is optimal for the server system.
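One way to picture the queuing behavior is a pool with a hard cap on threads, where excess requests wait in a queue instead of spawning threads. The sketch below assumes java.util.concurrent's fixed-size executor, which queues work it cannot run immediately; the class name BoundedPoolSketch and the method peakConcurrency are hypothetical, and the cap of 4 is an arbitrary stand-in for a value you would measure on your own system.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: names are illustrative, not from the text.
class BoundedPoolSketch {

    // Submits requestCount tasks to a pool capped at maxThreads and returns
    // the highest number of tasks observed running at the same moment.
    static int peakConcurrency(int maxThreads, int requestCount) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // At most maxThreads threads exist; further requests wait in a queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (int i = 0; i < requestCount; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                // ... the request would be handled here ...
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 10,000 requests, but never more than 4 running at once.
        System.out.println("peak concurrency: " + peakConcurrency(4, 10_000));
    }
}
```

Note that the queue in this sketch is unbounded, so a sustained overload trades thread exhaustion for memory growth; a production server would probably also bound the queue and reject or delay requests beyond that.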
When deciding which pooled thread should run the next task, there may be a slight gain from choosing the available thread that ran most recently. This thread is the most likely to have its working set still fully in memory: the longer it has been since a thread was last used, the more likely it is that the thread has been paged or swapped out. Also, any caches (at any level of the system and application) that may apply are more likely to still contain data belonging to the most recently used thread. By choosing the most recently used thread, paging and cache overhead may be minimized.
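The most-recently-used policy amounts to keeping idle threads on a stack rather than in a queue. The following is a minimal sketch of that idea, not a full pool implementation; the class MruWorkerPool, its methods, and the worker names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: names are illustrative, not from the text.
class MruWorkerPool {
    // Idle workers kept as a stack: the head is the worker that finished most recently.
    private final Deque<Worker> idle = new ArrayDeque<>();
    private final int size;

    MruWorkerPool(int size) {
        this.size = size;
        synchronized (this) {
            for (int i = 0; i < size; i++) {
                Worker w = new Worker("worker-" + i);
                w.setDaemon(true); // do not keep the JVM alive for this sketch
                idle.push(w);
                w.start();
            }
        }
    }

    // Waits for an idle worker, hands the task to the most recently used one,
    // and returns that worker's name.
    synchronized String execute(Runnable task) {
        waitUntilIdle(1);
        Worker w = idle.pop();
        w.inbox.add(task); // inbox is empty because the worker was idle
        return w.getName();
    }

    // Blocks until every worker is back in the pool.
    synchronized void awaitAllIdle() {
        waitUntilIdle(size);
    }

    // Caller must hold the pool lock.
    private void waitUntilIdle(int count) {
        try {
            while (idle.size() < count) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    private synchronized void release(Worker w) {
        idle.push(w); // back on top of the stack, so it is chosen first next time
        notifyAll();
    }

    private final class Worker extends Thread {
        final BlockingQueue<Runnable> inbox = new ArrayBlockingQueue<>(1);

        Worker(String name) { super(name); }

        @Override public void run() {
            try {
                while (true) {
                    Runnable task = inbox.take();
                    try {
                        task.run();
                    } catch (RuntimeException e) {
                        // a real pool would log or report this
                    } finally {
                        release(this);
                    }
                }
            } catch (InterruptedException e) {
                // interrupted while idle: let the worker exit
            }
        }
    }

    public static void main(String[] args) {
        MruWorkerPool pool = new MruWorkerPool(3);
        String first = pool.execute(() -> {});
        pool.awaitAllIdle();
        String second = pool.execute(() -> {});
        System.out.println(first + " then " + second); // prints "worker-2 then worker-2"
    }
}
```

With a FIFO queue of idle workers, the two sequential tasks in main would land on different threads; with the stack, the second task goes back to the thread that has just finished the first, which is the thread most likely to be resident and cache-warm.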
Thread pools can be completely generic if necessary. By using the java.lang.reflect package, you can execute any public method from your pooled threads, which lets you implement a thread pool that handles general requests that were not anticipated or specified at implementation time.
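A generic request can be represented as a target object, a java.lang.reflect.Method, and an argument list, with the pooled thread calling Method.invoke. The sketch below shows one possible shape for this, assuming a java.util.concurrent executor as the underlying pool; ReflectivePoolSketch and its methods are hypothetical names, and String.concat is used only as a convenient public method to call.

```java
import java.lang.reflect.Method;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names are illustrative, not from the text.
class ReflectivePoolSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Queues a call to any public method. The pool knows nothing about the
    // target's type; the method is resolved and invoked reflectively.
    Future<Object> submit(Object target, Method method, Object... args) {
        Callable<Object> call = () -> method.invoke(target, args);
        return pool.submit(call);
    }

    void shutdown() {
        pool.shutdown();
    }

    // Invokes "thread ".concat("pool") on a pooled thread and returns the result.
    static Object demo() {
        ReflectivePoolSketch sketch = new ReflectivePoolSketch();
        try {
            Method concat = String.class.getMethod("concat", String.class);
            return sketch.submit("thread ", concat, "pool").get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            sketch.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "thread pool"
    }
}
```

Reflective invocation is slower than a direct call and moves type errors from compile time to runtime, so this design is probably worth it only when the set of requests really cannot be known in advance.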