detail/CacheLocality.h - utilities for dynamic cache optimizations
author Nathan Bronson <ngbronson@fb.com>
Fri, 22 Nov 2013 01:15:06 +0000 (17:15 -0800)
committer Jordan DeLong <jdelong@fb.com>
Fri, 20 Dec 2013 21:04:59 +0000 (13:04 -0800)
commit 3d0c8f284dd1ce098db0fffb3b72c9396b6b50d2
tree 1d5bc5bbda08df07d00e46b9e40ee68d40c073e2
parent 3952281ca7ba3c4c133b9e6f0ed4651c35424bb8

Summary:
CacheLocality reads cache sharing information from sysfs to
determine how CPUs should be grouped to minimize contention.  Getcpu
provides fast access to the current CPU via __vdso_getcpu.
AccessSpreader combines the two to spread accesses optimally among a
predetermined number of stripes.
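
As a rough illustration of the two pieces above, the sketch below is not
the code in this commit.  It assumes the sysfs data is in the Linux cpulist
format (for example the contents of a cache's shared_cpu_list file), and it
assumes a simple proportional mapping from a CPU's position in a
locality-sorted ordering to a stripe.  The names parseCpuList and stripeFor
are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expands a Linux cpulist string such as "0-3,8"
// into the individual CPU ids {0, 1, 2, 3, 8}.
std::vector<int> parseCpuList(const std::string& text) {
  std::vector<int> cpus;
  std::istringstream in(text);
  std::string item;
  while (std::getline(in, item, ',')) {
    // Skip empty fragments and a bare trailing newline.
    if (item.find_first_of("0123456789") == std::string::npos) {
      continue;
    }
    std::size_t dash = item.find('-');
    int first = std::stoi(item.substr(0, dash));
    int last = (dash == std::string::npos)
        ? first
        : std::stoi(item.substr(dash + 1));
    for (int cpu = first; cpu <= last; ++cpu) {
      cpus.push_back(cpu);
    }
  }
  return cpus;
}

// Hypothetical mapping (an assumption, not taken from this commit):
// localityIndex is the CPU's position after all CPUs have been ordered
// so that CPUs sharing a cache are adjacent.  Neighbors in that order
// land in the same stripe whenever numStripes < numCpus.
std::size_t stripeFor(std::size_t localityIndex,
                      std::size_t numCpus,
                      std::size_t numStripes) {
  return localityIndex * numStripes / numCpus;
}
```

With 8 CPUs and 2 stripes, locality indices 0 through 3 map to stripe 0 and
indices 4 through 7 map to stripe 1, so CPUs that sit next to each other in
the ordering contend only with each other.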

AccessSpreader<>::current(n) microbenchmarks at 22 nanoseconds, which is
substantially less than the cost of a cache miss.  This means that we
can use it effectively to reduce cache line ping-pong on striped data
structures such as IndexedMemPool or statistics counters.
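
To make the statistics-counter case concrete, here is a minimal sketch of a
striped counter.  It is not part of this commit: StripedCounter is a
hypothetical name, the 64-byte alignment is an assumed cache line size, and
the stripe index is passed in by the caller so that the sketch does not
depend on the exact AccessSpreader interface.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical striped counter: each stripe lives on its own (assumed
// 64-byte) cache line, so writers using different stripes do not
// invalidate each other's lines.
template <std::size_t kStripes>
class StripedCounter {
 public:
  // stripe would normally come from something like
  // AccessSpreader<>::current(kStripes); here the caller supplies it.
  void add(std::size_t stripe, std::uint64_t delta) {
    slots_[stripe % kStripes].value.fetch_add(
        delta, std::memory_order_relaxed);
  }

  // Sums all stripes.  The result is only approximate while writers
  // are still running.
  std::uint64_t read() const {
    std::uint64_t total = 0;
    for (const auto& slot : slots_) {
      total += slot.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  struct alignas(64) Slot {
    std::atomic<std::uint64_t> value{0};
  };
  std::array<Slot, kStripes> slots_;
};
```

The trade-off is that reads touch every stripe, so this layout suits
counters that are written far more often than they are read.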

Because CacheLocality looks at all of the cache levels, it can be used for
different levels of optimization.  AccessSpreader<>::stripeByChip.current()
uses as few stripes as possible to avoid off-chip communication, while
AccessSpreader<>::stripeByCore.current() uses as many stripes as necessary
to get the optimal speedup, but no more.
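
One way to picture the difference between the two levels is to count the
distinct caches at each level.  The sketch below rests on an assumption
that this commit message does not spell out: that striping by core
corresponds to one stripe per innermost cache and striping by chip to one
stripe per outermost cache.  The function stripesForLevel and the group ids
are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// groupOfCpu[i] is a hypothetical id for the cache that CPU i uses at
// one cache level.  CPUs with equal ids share that cache.  The number
// of distinct ids is the number of stripes needed so that no two
// stripes share a cache at that level.
std::size_t stripesForLevel(const std::vector<int>& groupOfCpu) {
  return std::set<int>(groupOfCpu.begin(), groupOfCpu.end()).size();
}
```

For an imagined machine with 2 chips, 4 cores and 8 hardware threads, the
innermost level {0,0,1,1,2,2,3,3} yields 4 stripes and the outermost level
{0,0,0,0,1,1,1,1} yields 2.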

@override-unit-failures

Test Plan: new unit tests

Reviewed By: davejwatson@fb.com

FB internal diff: D1076718
folly/Makefile.am
folly/detail/CacheLocality.cpp [new file with mode: 0644]
folly/detail/CacheLocality.h [new file with mode: 0644]
folly/test/CacheLocalityTest.cpp [new file with mode: 0644]
folly/test/DeterministicSchedule.cpp
folly/test/DeterministicSchedule.h