I've been doing some fiddling around with a high-performance queue, and I've noticed something. There is a widespread belief that CPU caches have a line size of 64 bytes.
Alas, all the world is not x86.
Modern PowerPC CPUs use a 128-byte cache line, as does the Itanium-2. IBM's System z CPUs have a line size of 256 bytes.
I saw the results of this personally on our Itanium Linux machine: my queue performed VERY poorly. Once I increased the padding between critical fields to assume a 128-byte cache line, performance magically improved, and dramatically so.
While I'm on the subject, I heard from a smart person that cache lines do not necessarily align themselves on the corresponding memory boundaries. I.e. assuming a 64-byte line size and a cache miss, a memory access to address 0 would load a cache line with 0-63. However, if that access were instead to address 32, it would load a cache line with 32-95.
However, I have not been able to verify this experimentally. I've tried on x86, SPARC V9, and Itanium, and they all appear to align cache lines to a multiple of the line size. I.e. an access to address 32 would load 0-63.
If anybody has clarification here, I would appreciate a note. Thanks.
UPDATE: another colleague of mine has stated categorically that order of access will NOT affect the mapping of cache lines to memory addresses. That matches my experimental evidence.