Programming Languages Research Group: Git

author	Nathan Bronson <ngbronson@fb.com>
	Fri, 22 Nov 2013 01:15:06 +0000 (17:15 -0800)
committer	Jordan DeLong <jdelong@fb.com>
	Fri, 20 Dec 2013 21:04:59 +0000 (13:04 -0800)
commit	3d0c8f284dd1ce098db0fffb3b72c9396b6b50d2
tree	1d5bc5bbda08df07d00e46b9e40ee68d40c073e2
parent	3952281ca7ba3c4c133b9e6f0ed4651c35424bb8

detail/CacheLocality.h - utilities for dynamic cache optimizations

Summary:
CacheLocality reads cache sharing information from sysfs to
determine how CPUs should be grouped to minimize contention. Getcpu
provides fast access to the current CPU via __vdso_getcpu.
AccessSpreader uses these two to spread accesses optimally among a
predetermined number of stripes.
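
The mapping from CPU to stripe can be pictured with a minimal sketch. It
is an illustration under stated assumptions, not the code in this commit:
the helper name stripeForCpu and the localityIndexByCpu table are
hypothetical, and the table is assumed to have been built elsewhere so that
CPUs sharing caches receive adjacent ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not folly's API.
// localityIndexByCpu[cpu] is the rank of `cpu` in an ordering where CPUs
// that share caches are adjacent (assumed to be derived from sysfs elsewhere).
inline std::size_t stripeForCpu(
    std::size_t cpu,
    const std::vector<std::size_t>& localityIndexByCpu,
    std::size_t numStripes) {
  assert(numStripes > 0);
  assert(cpu < localityIndexByCpu.size());
  const std::size_t numCpus = localityIndexByCpu.size();
  // Scale the rank into [0, numStripes). CPUs that are neighbors in the
  // locality ordering land in the same stripe or in adjacent stripes.
  return (localityIndexByCpu[cpu] * numStripes) / numCpus;
}
```

In a real caller the cpu argument would come from a fast current-CPU
lookup; here it is a plain parameter so the arithmetic can be checked in
isolation.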

AccessSpreader<>::current(n) microbenchmarks at 22 nanoseconds, which is
substantially less than the cost of a cache miss. This means that we
can effectively use it to reduce cache line ping-pong on striped data
structures such as IndexedMemPool or statistics counters.
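
As a rough sketch of the kind of striped statistics counter this is meant
to help, consider the following. The names PaddedCounter and StripedCounter
are hypothetical and not part of folly, the 64-byte cache line size is an
assumption, and the stripe index is passed in by the caller rather than
taken from AccessSpreader.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration; not part of folly.
// Each stripe is aligned to 64 bytes (an assumed cache line size) so that
// writers on different stripes do not share a cache line.
struct alignas(64) PaddedCounter {
  std::atomic<std::uint64_t> value{0};
};

template <std::size_t NumStripes>
class StripedCounter {
 public:
  // Adds delta to the given stripe; the caller chooses the stripe, ideally
  // from the CPU it is currently running on.
  void add(std::size_t stripe, std::uint64_t delta) {
    assert(stripe < NumStripes);
    stripes_[stripe].value.fetch_add(delta, std::memory_order_relaxed);
  }

  // Sums all stripes. The result is exact only if no writer is active.
  std::uint64_t read() const {
    std::uint64_t total = 0;
    for (const auto& s : stripes_) {
      total += s.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  std::array<PaddedCounter, NumStripes> stripes_{};
};
```

Writers touch only their own stripe, so cache line ping-pong between
stripes is avoided; readers pay for a pass over every stripe.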

Because CacheLocality looks at all of the cache levels, it can be used for
different levels of optimization. AccessSpreader<>::stripeByChip.current()
uses as few stripes as possible to avoid off-chip communication, while
AccessSpreader<>::stripeByCore.current() uses as many stripes as necessary
to get the optimal speedup, but no more.
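
One way to see why the stripe count differs by level is to count the
distinct sharing groups at each cache level. The sketch below assumes the
sharing information has the form of the Linux sysfs shared_cpu_list files
(/sys/devices/system/cpu/cpuN/cache/indexM/shared_cpu_list); the helper
name countSharingGroups is hypothetical, and whether this commit derives
its stripe counts this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not folly's API.
// sharedCpuListByCpu[cpu] holds the text of that CPU's shared_cpu_list
// entry for a single cache level. CPUs that share the cache report the
// same text, so the number of distinct strings is the number of groups.
inline std::size_t countSharingGroups(
    const std::vector<std::string>& sharedCpuListByCpu) {
  assert(!sharedCpuListByCpu.empty());
  const std::set<std::string> groups(
      sharedCpuListByCpu.begin(), sharedCpuListByCpu.end());
  return groups.size();
}
```

On a hypothetical 8-CPU machine with two chips and four hyperthreaded
cores, the last-level cache lists would form two groups (one per chip) and
the first-level lists four groups (one per core).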

@override-unit-failures

Test Plan: new unit tests

Reviewed By: davejwatson@fb.com

FB internal diff: D1076718

folly/Makefile.am
folly/detail/CacheLocality.cpp	[new file with mode: 0644]
folly/detail/CacheLocality.h	[new file with mode: 0644]
folly/test/CacheLocalityTest.cpp	[new file with mode: 0644]
folly/test/DeterministicSchedule.cpp
folly/test/DeterministicSchedule.h