DTLB miss count vs. cache (LLC) miss count

Hello,

I have a query about the DTLB miss count and the cache (LLC) miss count.
As I understand it, the LLC miss count should always be greater than or equal to the DTLB miss count.
But for one of my test cases, a binary search tree, I observed that on a Nehalem server the LLC miss count (MEM_LOAD_RETIRED.LLC_MISS) is lower than the DTLB miss count (MEM_LOAD_RETIRED.DTLB_MISS).
Please find the graph in the attachment. (I have plotted the number of events, not the number of samples.)

Can you please give any suggestions about this behavior?

Thanking you,

Regards,
Dny

Attachment: NHL_BinTree.png (5.39 KB)

It seems entirely possible to incur more DTLB misses, including on accesses that hit in the (1st level) data cache, than misses in the last level cache.

Hello Sir,

Thanks for your reply,

I didn't completely understand your answer. Can you please explain in more detail?
As I understand it, for each DTLB miss (whether 1st level or 2nd level) there will always be an LLC miss, because a DTLB miss means the data needs to be fetched from main memory, as it is not available in any cache. So the LLC miss count should be greater than or equal to the DTLB miss count.

Is my understanding correct?

Thanking you,

Regards,
Dny

Hi Dny,

I suggest that you read this article: http://assets.devx.com/goparallel/18027.pdf

L2 Miss: MEM_LOAD_RETIRED.L2_LINE_MISS, ~165 desktop / ~300 server
L1 DTLB Miss: MEM_LOAD_RETIRED.DTLB_MISS, ~10

For measuring DTLB misses on Intel® Core™ i7 processors, read http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/

Estimate the impact of "TLB misses": ((DTLB_LOAD_MISSES.WALK_COMPLETED * 30) / CPU_CLK_UNHALTED.THREAD) * 100
If the impact is significant (> 5-10%), optimize the functions with high DTLB misses.
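
A minimal sketch of the estimate above in Python, in case it helps to plug in your own counter values. The function name and the sample counts are made up for illustration; the 30-cycle walk cost is the constant from the formula, kept as a parameter because the real cost will vary.

```python
def tlb_miss_impact_percent(walks_completed, unhalted_cycles, walk_cost_cycles=30):
    """Estimated share of cycles (in percent) spent on completed page walks.

    walks_completed  -- DTLB_LOAD_MISSES.WALK_COMPLETED event count
    unhalted_cycles  -- CPU_CLK_UNHALTED.THREAD event count
    walk_cost_cycles -- assumed cost per walk (30 in the formula above)
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return 100.0 * walks_completed * walk_cost_cycles / unhalted_cycles


# Hypothetical event counts, not measured data.
impact = tlb_miss_impact_percent(walks_completed=2_000_000,
                                 unhalted_cycles=1_000_000_000)
print(f"Estimated TLB miss impact: {impact:.1f}%")
```

Use event counts here, not sample counts, so that both events are on the same scale.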

Hope it helps!

Regards, Peter

Hello Sir,

I have already referred to these two documents and they helped me a lot. They help us calculate the impact of LLC and DTLB misses and see how the CPU clock cycles are being used.

My query is about the total number of LLC misses and DTLB misses. I'm wondering why the total number of DTLB misses is higher than the number of LLC misses.

Thanking you,

Regards,
Dny.

This is probably due to two situations: an L1 cache hit with an L1 DTLB miss, and an L2 cache hit with an L1 DTLB miss. They are rare, but possible if the data is circulating around.

-Vladimir

On earlier Intel CPUs, the in-cache DTLB miss (a DTLB miss on data that is still in cache) was common enough, and handled poorly enough, to be a significant cause of performance loss. The capacity of the DTLB covers only a small fraction of the last level cache capacity, a fraction soon to be reduced further on new models. Situations where attention to data locality may improve performance are already becoming more frequent on 6-core CPUs.
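
To put a rough number on "a small fraction", here is a back-of-the-envelope sketch in Python. The figures are assumptions chosen for illustration (a 64-entry first-level DTLB, 4 KiB pages, an 8 MiB last level cache); check the documentation for your own CPU before relying on them.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries, page_size):
    """Amount of memory the TLB can map at once: one page per entry."""
    return entries * page_size


# Assumed figures, for illustration only.
dtlb_entries = 64        # first-level DTLB entries for 4 KiB pages
page_size = 4 * KIB
llc_size = 8 * MIB

reach = tlb_reach_bytes(dtlb_entries, page_size)
fraction = reach / llc_size
print(f"DTLB reach: {reach // KIB} KiB, {fraction:.1%} of the LLC")
```

Under these assumptions, data spread over more pages than the DTLB has entries can stay fully resident in the LLC and still miss the DTLB again and again.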

Hello Sir,

How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?

I'm still trying to find the exact cause of this behavior.

Thanking you,

Regards,
Dny

One example I can imagine is circulating data that fits into the L1 cache but is accessed with a large stride, which causes page walks. It might be a corner case at the beginning of the cycle, so the number of DTLB misses counted could be one event more than the number of cache misses. Given enough cycles, it might become visible in the results. I didn't try to reproduce it, though.
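
The large-stride case can be sketched with a toy model in Python. This is not how the hardware works in detail: it assumes a single-level, fully associative LRU DTLB with 64 entries and a fully associative LRU L1 cache of 512 lines of 64 bytes, and it ignores the memory accesses made by the page walks themselves. All names and sizes are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
LINE_SIZE = 64


class LruSet:
    """Toy fully associative LRU structure that counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def access(self, tag):
        if tag in self.entries:
            self.entries.move_to_end(tag)      # hit: mark as most recent
            return
        self.misses += 1
        self.entries[tag] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used


def simulate(num_addresses=128, stride=PAGE_SIZE + LINE_SIZE, passes=10,
             tlb_entries=64, cache_lines=512):
    """Cycle over a small set of addresses that each sit on a different page."""
    tlb = LruSet(tlb_entries)
    cache = LruSet(cache_lines)
    for _ in range(passes):
        for i in range(num_addresses):
            addr = i * stride
            tlb.access(addr // PAGE_SIZE)      # one DTLB entry per page
            cache.access(addr // LINE_SIZE)    # one cache entry per line
    return tlb.misses, cache.misses


tlb_misses, cache_misses = simulate()
print(f"DTLB misses: {tlb_misses}, cache misses: {cache_misses}")
```

In this model the 128 cache lines fit in the cache, so only the first pass misses there, while the 128 pages exceed the 64 DTLB entries and every access misses the DTLB. A real Nehalem has set-associative caches and a second-level TLB, so treat the counts as an illustration of the mechanism only.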

-Vladimir

Could it be due to the fact that sampling is not accurate? It might be that the different counters are sampled at different times and that's the cause of the difference.

Guy.
