I've been doing some fiddling around with a high-performance queue, and have noticed something. There is a widespread belief that CPU caches have a line size of 64 bytes.
Alas, all the world is not X86.
Modern PowerPC cache lines are 128 bytes, as are Itanium 2's. IBM's System z CPUs have a line size of 256 bytes.
I saw the results of this personally on our Itanium Linux machine. My queue performed VERY poorly. I increased the padding between critical fields to assume a 128-byte cache line, and magically the performance improved dramatically.
While I'm on the subject, I heard from a smart person that cache lines do not necessarily align themselves on the corresponding memory boundaries. That is, assuming a 64-byte line size and a cache miss, a memory access to address 0 would load a line covering bytes 0-63; however, if that access were instead to address 32, it would load a line covering 32-95.
However, I have not been able to verify this experimentally. I've tried on X86, SparcV9, and Itanium, and they all appear to align cache lines to a multiple of the line size. That is, an access to address 32 loads 0-63.
If anybody has clarification here, I would appreciate a note. Thanks.
UPDATE: another colleague of mine has stated categorically that order of access will NOT affect the mapping of cache lines to memory addresses. That matches my experimental evidence.
Sunday, May 18, 2014