detail/CacheLocality.h - utilities for dynamic cache optimizations
Summary:
CacheLocality reads cache sharing information from sysfs to
determine how CPUs should be grouped to minimize contention, Getcpu
provides fast access to the current CPU via __vdso_getcpu, and
AccessSpreader uses these two to optimally spread accesses among a
predetermined number of stripes.
AccessSpreader<>::current(n) microbenchmarks at 22 nanos, which is
substantially less than the cost of a cache miss. This means that we
can effectively use it to reduce cache line ping-pong on striped data
structures such as IndexedMemPool or statistics counters.
Because CacheLocality looks at all of the cache levels, it can be used for
different levels of optimization. AccessSpreader<>::stripeByChip.current()
uses as few stripes as possible to avoid off-chip communication,
AccessSpreader<>::stripeByCore.current() uses as many stripes as necessary
to get the optimal speedup, but no more.
@override-unit-failures
Test Plan: new unit tests
Reviewed By: davejwatson@fb.com
FB internal diff:
D1076718