A cache serves data from local storage when it is present there, avoiding the need to access nonlocal data. If the data is not present locally, the nonlocal data must be accessed or calculated; it is then stored locally as well as being returned. After the first access, the data is available locally, and access is quicker. How much quicker depends on the type of cache.
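The lookup sequence just described (check locally, otherwise fetch or calculate, store, and return) can be sketched in a few lines. This is a minimal illustration rather than a production design; the class name `ReadThroughCache` and the `loader` callable standing in for the nonlocal access are assumptions made for the example.

```python
class ReadThroughCache:
    """Minimal sketch: serve from a local dict, fall back to a loader on a miss."""

    def __init__(self, loader):
        # `loader` is a hypothetical stand-in for the nonlocal access or
        # expensive calculation; it takes a key and returns the value.
        self._loader = loader
        self._store = {}

    def get(self, key):
        if key in self._store:
            # Data is present locally: no nonlocal access is needed.
            return self._store[key]
        # Miss: access or calculate the data, keep a local copy, return it.
        value = self._loader(key)
        self._store[key] = value
        return value


if __name__ == "__main__":
    loads = []

    def expensive_lookup(key):
        loads.append(key)          # record each nonlocal access
        return key.upper()

    cache = ReadThroughCache(expensive_lookup)
    cache.get("alpha")             # first access goes to the loader
    cache.get("alpha")             # second access is served locally
    print(len(loads))              # prints 1
```

Note that this sketch never refreshes or removes entries, so it is only suitable for data that does not change and fits comfortably in memory.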
Most caches have to maintain the consistency of the data held in the cache: it is usually important for the data in the cache to be up to date. When considering the use of a cache, bear in mind the expected lifetime of the data and any refresh rate or time-to-live values associated with the data. Similarly, for output data, consider how long to keep data in the cache before it must be written out. You may have differing levels of priority for writing out different types of data. For example, some filesystems keep general written data in a write cache, but immediately write critical system data that ensures system consistency in case of crashes. Also, as caches cannot usually hold all the data you would like, a strategy for swapping data out of the cache to overcome cache space limitations is usually necessary. The memory used by the cache is often significant, and it is always better to release the resources used by it explicitly when it is no longer needed, or reduce resources being used by the cache when possible, even if the cache itself is still required.
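As one possible way to combine a time-to-live, a swap-out strategy, and explicit release of resources, the sketch below uses least-recently-used eviction, which is only one of several reasonable policies. The class name `BoundedTTLCache`, its parameters, and the injectable `clock` are all assumptions introduced for illustration.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Sketch of a size-limited cache whose entries expire after a fixed TTL."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._entries = OrderedDict()    # key -> (value, expiry time)

    def get(self, key, default=None):
        item = self._entries.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            # Entry has outlived its time-to-live: treat it as a miss.
            del self._entries[key]
            return default
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)
        self._entries.move_to_end(key)
        # Swap out least recently used entries once the limit is exceeded.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def clear(self):
        """Release everything held by the cache explicitly."""
        self._entries.clear()

    def __len__(self):
        return len(self._entries)
```

Expired entries are removed lazily here, only when they are next requested; a cache holding large values might also need a periodic sweep to reduce the resources it uses while it is still in service.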
Caching can apply to data held in single objects or groups of objects. For single objects, it is usual to maintain a structure or instance variable that holds cached values. For groups of objects, there is usually a structure maintained at the point of access to the elements of the group. In addition, caching applies generally to two types of locality of access, usually referred to as spatial and temporal. Spatial locality refers to the idea that if something is accessed, it is likely that something else nearby will be accessed soon. This is one of the reasons buffering I/O streams works so well. If every subsequent byte read from disk were in a completely different part of the disk, I/O buffering would be no help at all. Temporal locality refers to the idea that if you access something, you are likely to access it again in the near future. This is the principle behind browsers holding files locally once downloaded.
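For the single-object case, a cached value held in an instance variable might look like the following sketch. The `Polyline` class and its `computations` counter are hypothetical, chosen only to show the cached value being reused and then invalidated when the object changes.

```python
import math


class Polyline:
    """Sketch of a single object that caches a derived value in an instance variable."""

    def __init__(self, points):
        self._points = list(points)
        self._length = None      # cached value; None means "not yet computed"
        self.computations = 0    # counts real calculations, for illustration only

    def add_point(self, point):
        self._points.append(point)
        self._length = None      # the cached value is now stale, so drop it

    def length(self):
        if self._length is None:
            self.computations += 1
            # Sum the distances between consecutive points.
            self._length = sum(
                math.dist(a, b) for a, b in zip(self._points, self._points[1:])
            )
        return self._length
```

Repeated calls to `length()` between modifications reuse the stored result, which pays off when the same object is queried again soon after, the temporal locality described above.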
There is a lot of research into the use of caches, but most of it is related to CPU or disk hardware caches. Nevertheless, any good article or book chapter on caches should cover the basics and the pitfalls, and these are normally applicable (with some extra thought) to caches in applications. One thing you should do is monitor cache-hit rates, i.e., the number of times that accessing data retrieves data from the cache, compared to the total number of data accesses. This is important because if the cache-hit rate is too low, the overhead of having a cache may be more than any actual gain in performance. In this case, tune or disable the cache. It is frequently useful to build in the option of disabling and emptying the cache. This is helpful for two reasons. First, you can make direct comparisons of operations with and without the cache. Second, there are times when you want to measure the overhead of filling an empty cache, and to get a good measurement you may need to fill an empty cache repeatedly.
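A minimal sketch of hit-rate monitoring, together with the option to disable and empty the cache, is shown below. The names `MonitoredCache`, `enabled`, `hit_rate`, and `reset_stats` are assumptions made for this example and do not come from any particular library.

```python
class MonitoredCache:
    """Sketch of a cache that counts hits and can be disabled or emptied."""

    def __init__(self, loader, enabled=True):
        self._loader = loader    # hypothetical function that fetches uncached data
        self._store = {}
        self.enabled = enabled
        self.hits = 0
        self.accesses = 0

    def get(self, key):
        self.accesses += 1
        if self.enabled and key in self._store:
            self.hits += 1
            return self._store[key]
        value = self._loader(key)
        if self.enabled:
            self._store[key] = value
        return value

    def hit_rate(self):
        # Hits compared to the total number of data accesses.
        return self.hits / self.accesses if self.accesses else 0.0

    def clear(self):
        """Empty the cache, e.g. to measure the cost of refilling it."""
        self._store.clear()

    def reset_stats(self):
        self.hits = 0
        self.accesses = 0


if __name__ == "__main__":
    cache = MonitoredCache(lambda k: k * k)
    for key in (1, 1, 2, 1):
        cache.get(key)
    print(cache.hit_rate())      # prints 0.5 (two hits out of four accesses)
```

Setting `enabled = False` sends every access to the loader, which allows the same workload to be timed with and without the cache; calling `clear()` between timed runs allows the cost of filling an empty cache to be measured repeatedly.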