Core-to-Core Latency

As the core count of modern CPUs is growing, we are reaching a time when the time to access each core from a different core is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes can have different latencies to access the nearest core compared to the furthest core. This rings true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on if it was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours is the most accurate to how quick an access between two cores can happen.


(Click on image to enlarge)

Looking at core-to-core latencies going from Alder Lake (12th Gen) to Raptor Lake (13th Gen), things look quite similar on the surface. The P-cores are listed within Windows 11 from cores 0 to 15, and in comparison to Alder Lake, latencies are much the same as what we saw when we reviewed the Core i9-12900K last year. The same comments apply here as with the Core i9-12900K, as we again see more of a bi-directional cache coherence.

Latencies between each Raptor Cove core have actually improved when compared to the Golden Cove cores on Alder Lake from 4.3/4.4 ns, down to 3.8/4.1 ns per each L1 access point.

The biggest difference is the doubling of the E-cores (Gracemont) on the Core i9-13900K, which as a consequence, adds more paths and crossovers. These paths do come with a harsher latency penalty than we saw with the Core i9-12900K, with latencies around the E-cores ranging from 48 to 54 ns within four core jumps between them; this is actually slower than it was on Alder Lake.

One possible reason for the negative latency is the 200 MHz reduction in base frequency on the Gracemont cores on Raptor Lake when compared with Alder Lake. When each E-core (Gracemont) core is communicating with each other, they travel through the L2 cache clusters via the L3 cache ring and back again, which does seem quite an inefficient way to go.

Test Bed and Setup: Updating Our Test Suite for 2023 SPEC2017 Single-Threaded Results
POST A COMMENT

169 Comments

View All Comments

  • mode_13h - Friday, October 21, 2022 - link

    "The new instruction cache on Gracemont is actually very unique. x86 instruction encoding is all over the place and in the worst (and very rare) case can be as long as 15 bytes long. Pre-decoding an instruction is a costly linear operation and you can’t seek the next instruction before determining the length of the prior one. Gracemont, like Tremont, does not have a micro-op cache like the big cores do, so instructions do have to be decoded each time they are fetched. To assist that process, Gracemont introduced a new on-demand instruction length decoder or OD-ILD for short. The OD-ILD generates pre-decode information which is stored alongside the instruction cache. This allows instructions fetched from the L1$ for the second time to bypass the usual pre-decode stage and save on cycles and power."

    Source: https://fuse.wikichip.org/news/6102/intels-gracemo...
    Reply
  • Sailor23M - Friday, October 21, 2022 - link

    Interesting to see Ryzen 5 7600X perform so well in excel/ppt benchmarks. Why is that so? Reply
  • Makste - Friday, October 21, 2022 - link

    Thank you for the review. So Intel too, is finally throwing more cores and increasing frequencies to the problem these days, which increases heat and power usage in turn. AMD too, is a culprit of this practice but has not gone to these lengths as Intel. 16 cores versus supposedly efficiency cores. What is not happening? Reply
  • ricebunny - Friday, October 21, 2022 - link

    It would be a good idea to highlight that the MT Spec benchmarks are just N instantiations of the single thread test. They are not indicative of parallel computing application performance. There are a few dedicated SPEC benchmarks for parallel performance but for some reason they are never included in Anandtechs benchmarks. Reply
  • Ryan Smith - Friday, October 21, 2022 - link

    "There are a few dedicated SPEC benchmarks for parallel performance but for some reason they are never included in Anandtechs benchmarks."

    They're not part of the actual SPEC CPU suite. I'm assuming you're talking about the SPEC Workstation benchmarks, which are system-level benchmarks and a whole other kettle of fish.

    With SPEC, we're primarily after a holistic look at the CPU architecture, and in the rate-N workloads, whether there's enough memory bandwidth and other resources to keep the CPU cores fed.
    Reply
  • wolfesteinabhi - Friday, October 21, 2022 - link

    its strange to me that when we are talking about value ...especially for budget constraint buyers ... who are also willing to let go of bleeding edge/performance ... we dont even mention AM4 platform.

    AM4 is still good ..if not great (not to say mature/stable) platform for many ..and you can still buy a lot of reasonably price good procs including 5800X3D ...and users have still chance to upgrade it upto 5950X if they need more cpu at a later date.
    Reply
  • cowymtber - Friday, October 21, 2022 - link

    Burning hot POS. Reply
  • BernieW - Friday, October 21, 2022 - link

    Disappointed that you didn't spend more time investigating the serious regression for the 13900K vs the 12900K in the 502.gc_r test. The single threaded test does not have the same regression so it's a curious result that could indicate something wrong with the test setup. Alternately, perhaps the 13900K was throttling during that part of the test or maybe E cores are really not good at compiling code. Reply
  • Avalon - Friday, October 21, 2022 - link

    I had that same thought. Why publish something so obviously anomalous and not even say anything about it? Did you try re-testing it? Did you accidentally flip the scores between the 12th and 13th gen? There's no obvious reason this should be happening given the few changes between 12th and 13th gen cores. Reply
  • Ryan Smith - Friday, October 21, 2022 - link

    "Disappointed that you didn't spend more time investigating the serious regression for the 13900K vs the 12900K in the 502.gc_r test."

    We still are. That was flagged earlier this week, and re-runs have produced the same results.

    So at this point we're digging into matters a bit more trying to figure out what is going on, as the cause is non-obvious. I'm thinking it may be a thread director hiccup or an issue with the ratio of P and E cores, but there's a lot of different (and weird) ways this could go.
    Reply

Log in

Don't have an account? Sign up now