Evaluation of Design Alternatives for a
Directory-Based Cache Coherence Protocol
in Shared-Memory Multiprocessors

Håkan Grahn

Doctoral thesis submitted for partial fulfilment of the requirements for the degree of
Doctor of Philosophy in Computer Engineering.

Lund October, 1995


My thesis is a summary of papers, and all parts of the thesis are available as postscript files. If you would like a hard-copy instead, please send me a mail.

The following seven papers are summarized in the thesis. (In 1994, I changed my last name from Nilsson to Grahn, which is why some of the papers are published under my old name.)

  1. H. Grahn, P. Stenström, and M. Dubois: ``Implementation and Evaluation of Update-Based Cache Protocols Under Relaxed Memory Consistency Models,'' Future Generation Computer Systems, Vol. 11, No. 3, pages 247-271, June 1995. (pdf Note that this pdf file does not have the same layout as the journal paper.)

  2. H. Nilsson and P. Stenström: ``An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic,'' In Proceedings of Parallel Architectures and Languages Europe (PARLE), Lecture Notes in Computer Science 817, Springer-Verlag, pages 363-374, July 1994. The paper received Best Paper Award at the conference. (pdf)

  3. H. Nilsson and P. Stenström: ``The Scalable Tree Protocol -- A Cache Coherence Approach for Large-Scale Multiprocessors,'' In Proceedings of Fourth IEEE Symposium on Parallel and Distributed Processing, pages 498-506, December 1992. (pdf)

  4. H. Nilsson and P. Stenström: ``Performance Evaluation of Link-Based Cache Coherence Schemes,'' In Proceedings of the 26th Annual Hawaii International Conference on System Sciences, pages 486-495, January 1993. (pdf)

  5. H. Grahn and P. Stenström: ``Efficient Strategies for Software-Only Directory Protocols in Shared-Memory Multiprocessors,'' In Proceedings of the 22nd International Symposium on Computer Architecture, pages 38-47, June 1995. (pdf)

  6. H. Grahn and P. Stenström: ``Architectural Support for an Efficient Implementation of a Software-Only Directory Cache Coherence Protocol,'' Technical Report, Dept. of Computer Engineering, Lund University, June 1995. Presented at the Fifth Workshop on Scalable Shared-Memory Multiprocessors, June 1995. (pdf)

  7. H. Grahn and P. Stenström: ``Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques,'' Technical Report, Dept. of Computer Engineering, Lund University, August 1995. (pdf)


In shared-memory multiprocessors, caches are attached to the processors in order to reduce the memory access latency. To keep the memory consistent, a cache coherence protocol is needed. A well known approach is to record which caches have copies of a memory block in a directory and only notify the caches having a copy when a processor modifies the block. Such a protocol is called a directory-based cache coherence protocol. This thesis, which is a summary of seven papers, identifies three problems in a directory-based protocol, and evaluates implementation and performance aspects of some design alternatives. The evaluation methodology is based on program-driven simulation.

The write-invalidate policy, which is used in the baseline protocol, forces all other copies of a block to be invalidated when a processor modifies the block. This leads to a cache miss each time a processor accesses an invalidated block. To reduce the number of cache misses, a competitive-update policy is proposed in this thesis. The competitive-update policy is shown to reduce both the read stall and execution times as compared to write-invalidate under a relaxed memory consistency model. However, update-based policies need more buffering and hardware support in the caches.

In the baseline protocol, the implementation cost of the directory is proportional to the number of caches. To reduce this cost, an alternative directory organization is proposed which distributes the directory information among the caches sharing the same memory block. To achieve a low write latency, the caches sharing a block are organized in a tree. The caches are linked into the tree in parallel with application execution to achieve a low read latency.

The hardware-implemented directory controller in the baseline protocol may lead to high design complexity and implementation cost. This thesis evaluates a design alternative where the controller is implemented using software handlers executed on the compute processor. By using efficient strategies and proper architectural support, this design alternative is shown to be competitive with the baseline protocol. However, the performance of this alternative is more sensitive to other design choices, e.g., block size and latency tolerating techniques, than the baseline protocol.

Last Modified: October 10, 1995