DSA Memory coherence



Sufficient memory properties to ensure coherence (3)
1. Program order: a read by P to X that follows a write by P to X, with no intervening write to X by another processor, returns the value written by P.
2. Coherent view: a read by P to X that follows a write to X by another processor returns the written value, if the two are sufficiently separated in time and no other write to X occurs in between.
3. Write serialization: two writes to the same location by any two processors are seen in the same order by all processors.
Coherence vs consistency
Coherence: defines the behavior of reads and writes to the same memory location
Consistency: defines the behavior of reads and writes wrt accesses to other memory locations
How do caches in multiprocessors support migration and replication?
Migration: by moving data item to a local cache, reducing latency to access remotely allocated shared data item and reducing BW demand on the shared mem
Replication for shared data being simultaneously read, reduces both latency of access and contention for a read shared data item.
Basics of a directory-based protocol
The sharing status of a particular block of physical memory is kept in one location, called the directory
Basics of snooping protocol
All cache controllers monitor or snoop on the broadcast medium connecting all caches to determine whether or not they have a copy of the block that is requested on a bus or switch access. To invalidate, the writing processor puts the address on the bus and all snooping processors check whether they have that address in their cache - if they do, then it's invalidated.
Types of snooping protocols (2)
1. write invalidate: when a processor writes a value in its cache, all copies of that block in other caches are invalidated, so the writer has exclusive access and the memory copy becomes stale. When some other processor later tries to read the block, it misses in its own cache and fetches the new value; with write-back caches this is also when memory is updated. Otherwise write back occurs only when the block is replaced.
2. write update / write broadcast: every write to a shared cache line is broadcast so that all cached copies are updated. Consumes more bandwidth than write invalidate.
How is serialization maintained in snooping schemes
Since there's one common bus and only one processor can access it at any point of time, writes are automatically serialized
How is a data item located on a read miss for write back snooping
In write through, this is easy because memory is always updated. In write back, snooping strategy used - the address in memory which is to be accessed is placed on the bus. If a proc snooping on the bus finds that it has a dirty copy of the requested cache block, it provides that cache block in response, and the memory or shared cache access by the requesting proc is aborted.
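The write-invalidate and read-miss behaviour described in the cards above can be sketched as a toy simulation. This is a minimal sketch, not a real protocol implementation: the `Cache` and `Bus` classes, the two states "M" and "S", and the one-word blocks are all assumptions made for illustration.

```python
class Cache:
    """Hypothetical private write-back cache: addr -> {"state", "value"}."""
    def __init__(self, name):
        self.name = name
        self.lines = {}  # state is "M" (dirty, only copy) or "S" (clean, shared)


class Bus:
    """Hypothetical shared bus: one transaction at a time, snooped by all caches."""
    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = memory  # addr -> value
        self.log = []         # serialized record of bus transactions

    def read(self, requester, addr):
        line = requester.lines.get(addr)
        if line is not None:          # read hit: no bus traffic
            return line["value"]
        self.log.append((requester.name, "read miss", addr))
        for other in self.caches:
            copy = other.lines.get(addr)
            if other is not requester and copy and copy["state"] == "M":
                # The snooping owner supplies the dirty block; memory is updated.
                self.memory[addr] = copy["value"]
                copy["state"] = "S"
                self.log.append((other.name, "supplies dirty block", addr))
        value = self.memory.get(addr, 0)
        requester.lines[addr] = {"state": "S", "value": value}
        return value

    def write(self, requester, addr, value):
        line = requester.lines.get(addr)
        if not (line and line["state"] == "M"):
            # The address goes on the bus; every snooping cache drops its copy.
            self.log.append((requester.name, "invalidate", addr))
            for other in self.caches:
                if other is requester:
                    continue
                copy = other.lines.pop(addr, None)
                if copy and copy["state"] == "M":
                    self.memory[addr] = copy["value"]  # write back before dropping
        requester.lines[addr] = {"state": "M", "value": value}


memory = {0x40: 7}
a, b = Cache("A"), Cache("B")
bus = Bus([a, b], memory)
bus.read(a, 0x40)      # A gets a shared copy
bus.write(b, 0x40, 9)  # B invalidates A's copy; memory still holds 7
bus.read(a, 0x40)      # B supplies the dirty block; memory becomes 9
```

Because every transaction is appended to one `log` in order, the sketch also mirrors how a single bus serializes writes.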
Modified state
Also called "unshared" or "exclusive". The state "shared" signals that other caches may also have this block. When the proc writes a shared block, the other copies are invalidated and the block becomes "unshared" or "modified", implying that this cache holds the only up-to-date copy.
Parallel snooping
Snooping operations or cache requests for different blocks can proceed independently even though there's a single controller for each cache due to interleaving of bus access
Read miss on a modified cache block
Address conflict miss; write back block and then place read miss on the bus
Write hit on a shared cache block
Write to local cache and place invalidate on bus. Also called upgrade or ownership miss, since they do not fetch data but only change the state.
Write hit on a modified cache block
Write to local cache normally. No invalidate is placed on the bus since there was one sent to set the block as M, and nothing else has set it to S yet.
Read miss on the bus on a modified block
Owner places cache block on bus and its status is set to S
Writing to an S block
In a protocol without a separate invalidate (upgrade) transaction, the first step is a write miss, because to write, you have to invalidate all other copies. Send a write miss to the bus: every other cache holding the block invalidates its copy (a cache holding it as exclusive, i.e. modified, would write it back first). The initiating processor then writes to the block, which is now exclusive (modified).
Address conflict miss
Stimuli causing a state change apply to a block in a private cache, not to a specific address in the cache. A different address can map to the same cache block and is identified as different by the tag. A read miss can therefore occur on a block that is in the shared state but holds a different address: place the read miss on the bus.
Exclusive state in MESI
When a cache block is in only a single cache and is clean. A read miss from another cache converts it to S. Writing to an E block changes it to M silently: no bus use and no invalidate are needed, hence faster.
How to keep cache tag checking separate from CPU cache accesses
1. duplicate set of tags for L1 caches to allow checks in || with CPU
2. L2 cache already exists as duplicate provided there is multilevel inclusion
M: the only up-to-date copy is in this cache, memory out of date
E: only copy and clean, no need to invalidate others on a write, memory up to date
S: multiple coherent copies, memory up to date
I: invalid copy, e.g. because someone else has an M
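The four MESI states and the transitions covered in the cards above can be written as two small lookup functions. This is a sketch under stated assumptions: the function names `on_processor` and `on_bus`, the event strings, and the `others_have_copy` flag are hypothetical, and real protocols have more messages and transient states.

```python
def on_processor(state, op, others_have_copy=False):
    """Return (next_state, bus_action) for a local CPU read or write."""
    if op == "read":
        if state == "I":
            # A read miss lands in E if no other cache has the block, else S.
            return ("S" if others_have_copy else "E", "read miss")
        return (state, None)               # read hit in M, E or S
    if op == "write":
        if state in ("M", "E"):
            return ("M", None)             # E -> M is silent: no bus traffic
        if state == "S":
            return ("M", "invalidate")     # upgrade: no data fetched
        return ("M", "write miss")         # state == "I"
    raise ValueError(f"unknown op: {op}")


def on_bus(state, msg):
    """Return (next_state, response) when another cache's request is snooped."""
    if msg not in ("read miss", "write miss", "invalidate"):
        raise ValueError(f"unknown bus message: {msg}")
    if state == "I":
        return ("I", None)
    response = "write back" if state == "M" else None  # only M is dirty
    if msg == "read miss":
        return ("S", response)
    return ("I", response)                 # write miss or invalidate


print(on_processor("E", "write"))   # silent upgrade to M
print(on_bus("M", "read miss"))     # owner writes back and drops to S
```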
Coherency miss types (2)
1. true sharing misses: due to communication of data through the cache coherence mechanism, e.g. a write miss on a shared block
2. false sharing misses: due to multiword block size, whole block invalidated because some other word being written by another proc - no actual data communicated, just an extra cache miss
States of a memory block in dist directory protocol (3)
1. shared: one or more caches have it, memory up to date
2. uncached (unshared): no processor has it, not valid in any cache
3. exclusive: 1 processor has data, memory out of date
Why is a directory scheme preferred in larger multicore systems
Snooping schemes require communication with all caches on every cache miss. This doesn't scale well: on larger systems, the coherence traffic requires more bandwidth than a shared bus can provide. A directory scheme sends coherence messages only to the nodes that hold a copy, separating local traffic from remote memory traffic and reducing the BW demands on the memory system.
What does a dist directory track
A bit vector associated with each memory block tracks whether each processor (or multicore node) has a copy of that block; it serves as the sharers list. The bit vector also identifies the owner of the block when it is in the exclusive state. For efficiency, the state of each cache block is also tracked at the individual caches.
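A directory entry with its bit vector and three block states can be sketched as follows. This is an illustrative sketch only: the `DirectoryEntry` class, its method names, and the action tuples are assumptions, not a description of any particular machine.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one memory block."""
    state: str = "uncached"   # "uncached", "shared" or "exclusive"
    sharers: int = 0          # bit i set means node i holds a copy

    def nodes(self):
        """List the node numbers whose bit is set in the sharers vector."""
        return [i for i in range(self.sharers.bit_length())
                if (self.sharers >> i) & 1]

    def read_miss(self, node):
        actions = []
        if self.state == "exclusive":
            # In the exclusive state the single set bit names the owner.
            (owner,) = self.nodes()
            actions.append(("fetch", owner))   # owner writes back, stays a sharer
        self.sharers |= 1 << node
        self.state = "shared"
        actions.append(("data reply", node))
        return actions

    def write_miss(self, node):
        actions = []
        if self.state == "exclusive":
            (owner,) = self.nodes()
            actions.append(("fetch/invalidate", owner))
        elif self.state == "shared":
            actions += [("invalidate", n) for n in self.nodes() if n != node]
        self.sharers = 1 << node               # requester becomes the owner
        self.state = "exclusive"
        actions.append(("data reply", node))
        return actions


entry = DirectoryEntry()
entry.read_miss(0)
entry.read_miss(2)
print(bin(entry.sharers))      # nodes 0 and 2 share the block
print(entry.write_miss(1))     # both sharers are invalidated
```

Only the nodes named in the bit vector receive messages, which is the point of contrast with a snooping broadcast.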
The 3 nodes involved in a directory-based scheme
1. local node/ cache: where a request originates
2. home node/ directory: where the memory location of an address resides
3. remote node/cache: has a copy of a cache block, whether exclusive or shared. Might be the same as the home node, in which case interprocessor comm is replaced by intraprocessor comm
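The roles of the three nodes can be shown by tracing the messages for a single read miss. This is a rough sketch: the function `read_miss_messages` and the message names are illustrative assumptions, and hops between identical nodes are dropped to model intraprocessor communication.

```python
def read_miss_messages(local, home, owner=None):
    """Return (source, destination, message) network hops for a read miss.

    local: node where the request originates
    home:  node whose memory and directory hold the address
    owner: remote node holding the block exclusively, or None
    """
    hops = []

    def send(src, dst, msg):
        if src != dst:          # same node: no interprocessor message needed
            hops.append((src, dst, msg))

    send(local, home, "read miss")
    if owner is not None:
        send(home, owner, "fetch")
        send(owner, home, "data write-back")
    send(home, local, "data value reply")
    return hops


for hop in read_miss_messages(local=0, home=1, owner=2):
    print(hop)
```

Calling it with `owner=1` (remote node equal to the home node) leaves only the two hops between the local and home nodes.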