I've been fiddling with a high-performance queue and have noticed something.  There is a widespread belief that CPU caches have a line size of 64 bytes.
Alas, all the world is not X86.
Modern PowerPC CPUs have a cache line size of 128 bytes.  Itanium 2 is the same.  IBM's System z CPUs have a line size of 256 bytes.
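Rather than hard-coding 64, the line size can sometimes be queried at run time.  Here is a minimal sketch in C, assuming Linux with glibc, where `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)` exists as an extension.  The call can return 0 or -1 when the value is unknown, so the helper `cache_line_size` (a name made up for this example) falls back to an assumed 128 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: 128 is used as the fallback because it covers the PowerPC
   and Itanium 2 cases mentioned above. Raise it for larger lines. */
#define FALLBACK_LINE_SIZE 128

static_assert((FALLBACK_LINE_SIZE & (FALLBACK_LINE_SIZE - 1)) == 0,
              "fallback line size must be a power of two");

/* Hypothetical helper: L1 data cache line size in bytes as reported by
   the OS, or FALLBACK_LINE_SIZE when the OS cannot report one. */
size_t cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; not part of POSIX, so guarded by #ifdef. */
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return (size_t)n;
#endif
    return FALLBACK_LINE_SIZE;
}
```

Note that this reports the L1 data cache only; outer cache levels can use a different line size, so padding for the largest line you expect to meet is the safer choice.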
I saw the results of this personally on our Itanium Linux machine.  My queue performed VERY poorly, presumably because fields that I had padded apart for 64-byte lines were still sharing a single 128-byte line.  I increased the padding between critical fields to assume a 128-byte cache line, and the performance improved dramatically.
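For illustration, here is a sketch of that kind of padding in C11.  It is not my actual queue code: the struct `queue_ctl` and its fields `head` and `tail` are hypothetical stand-ins for "critical fields", and `LINE_SIZE` is the assumed worst-case line size.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed worst-case line size; 128 covers x86, PowerPC, and Itanium 2. */
#define LINE_SIZE 128

/* Hypothetical control block: each index starts on its own cache line,
   so a producer updating tail does not invalidate the consumer's head. */
struct queue_ctl {
    alignas(LINE_SIZE) size_t head;  /* advanced by the consumer */
    alignas(LINE_SIZE) size_t tail;  /* advanced by the producer */
};

/* Compile-time checks that the layout really separates the fields. */
static_assert(offsetof(struct queue_ctl, tail) -
              offsetof(struct queue_ctl, head) >= LINE_SIZE,
              "head and tail must be at least one line apart");
static_assert(sizeof(struct queue_ctl) % LINE_SIZE == 0,
              "struct must occupy whole cache lines");
```

The struct itself must also start on a `LINE_SIZE` boundary.  Static and automatic objects get that from `alignas`, but plain `malloc` does not promise it, so heap instances would need something like `aligned_alloc`.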
While I'm on the subject, I heard from a smart person that cache lines are not necessarily aligned on multiples of the line size in memory.  I.e. assuming a 64-byte line size and a cache miss, a memory access to address 0 would load a cache line with bytes 0-63.  However, if that access were instead to address 32, it would load a cache line with bytes 32-95.
However, I have not been able to verify this experimentally.  I've tried on X86, SparcV9, and Itanium, and they all appear to align cache lines to a multiple of the cache line size.  I.e. an access to address 32 would load bytes 0-63.
If anybody has clarification here, I would appreciate a note.  Thanks.
UPDATE: another colleague of mine has stated categorically that order of access will NOT affect the mapping of cache lines to memory addresses.  That matches my experimental evidence.
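The aligned mapping that my experiments suggest comes down to masking off the low address bits.  A small sketch in C, assuming the line size is a power of two; `line_base` and `same_line` are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* First address of the cache line containing addr, assuming lines are
   aligned to a multiple of line_size (which must be a power of two). */
uintptr_t line_base(uintptr_t addr, uintptr_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & ~(line_size - 1);
}

/* Nonzero if both addresses fall in the same aligned cache line. */
int same_line(uintptr_t a, uintptr_t b, uintptr_t line_size)
{
    return line_base(a, line_size) == line_base(b, line_size);
}
```

With a 64-byte line, `line_base(32, 64)` is 0, so an access to address 32 loads bytes 0-63 and address 95 lands in the next line.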