`folly/Histogram.h`
-------------------

### Classes
***

#### `Histogram`

`Histogram.h` defines a simple histogram class, templated on the type of data
you want to store.  This class is useful for tracking a large stream of data
points, where you want to remember the overall distribution of the data, but do
not need to remember each data point individually.

Each histogram bucket stores the number of data points that fell in the bucket,
as well as the overall sum of the data points in the bucket.  Note that no
overflow checking is performed, so if a bucket holds a large number of very
large values, its sum may overflow and make the data for that bucket inaccurate.
As such, the histogram class is not well suited to storing data points with
very large values.  However, it works very well for smaller data points such as
request latencies, request or response sizes, etc.
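
To make the overflow caveat concrete, here is a minimal sketch of a bucket that
keeps a count and a sum, together with a helper a caller could use to check for
overflow before adding a value.  `SketchBucket` and `wouldOverflow` are
hypothetical names used only for illustration; they are not part of folly and
do not reflect how folly lays out its buckets.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical bucket for illustration only; not folly's definition.
struct SketchBucket {
  int64_t sum = 0;     // running total of the values in this bucket
  uint64_t count = 0;  // number of values in this bucket

  // Adds a value without any overflow check, matching the caveat above.
  void add(int64_t value) {
    sum += value;
    ++count;
  }

  // Mean of the values in this bucket. Precondition: the bucket is non-empty.
  int64_t average() const {
    assert(count > 0);
    return sum / static_cast<int64_t>(count);
  }
};

// Returns true if sum + value would not fit in an int64_t.
inline bool wouldOverflow(int64_t sum, int64_t value) {
  if (value > 0) {
    return sum > std::numeric_limits<int64_t>::max() - value;
  }
  if (value < 0) {
    return sum < std::numeric_limits<int64_t>::min() - value;
  }
  return false;
}
```

A caller that cannot rule out very large values could test
`wouldOverflow(bucket.sum, value)` first and decide how to handle the value
instead of silently corrupting the bucket.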

In addition to providing access to the raw bucket data, the `Histogram` class
also provides methods for estimating percentile values.  This allows you to
estimate the median value (the 50th percentile) and other values such as the
95th or 99th percentiles.
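
As a rough illustration of how a percentile can be estimated from bucket data
alone, the sketch below walks the bucket counts until it reaches the target
rank and then interpolates linearly inside that bucket.  This is a simplified
stand-in, not folly's algorithm: `estimatePercentile` is a hypothetical
function, it assumes values are spread uniformly within each bucket, and it
ignores the out-of-range buckets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Estimates the value at percentile `pct` (a fraction in [0, 1]) from
// equal-width bucket counts. counts[i] covers
// [min + i * width, min + (i + 1) * width).
// Simplified illustration: assumes a uniform spread inside each bucket.
double estimatePercentile(const std::vector<uint64_t>& counts,
                          double min,
                          double width,
                          double pct) {
  assert(pct >= 0.0 && pct <= 1.0);
  assert(width > 0.0);

  uint64_t total = 0;
  for (uint64_t c : counts) {
    total += c;
  }
  if (total == 0) {
    return min;  // no data: fall back to the lower bound
  }

  // Rank of the requested percentile among all recorded values.
  const double target = pct * static_cast<double>(total);
  double seen = 0.0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    const double c = static_cast<double>(counts[i]);
    if (c > 0.0 && seen + c >= target) {
      // Interpolate within the bucket that contains the target rank.
      const double fraction = (target - seen) / c;
      return min + (static_cast<double>(i) + fraction) * width;
    }
    seen += c;
  }
  return min + static_cast<double>(counts.size()) * width;
}
```

With two buckets of width 100 starting at 0, each holding 10 values, this
sketch places the median at 100 and the 75th percentile at 150.  The estimate
is only as precise as the bucket width allows.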

All of the buckets have the same width.  The number of buckets and the bucket
width are fixed for the lifetime of the histogram.  As such, you do need to
know your expected data range ahead of time in order to have accurate
statistics.  The histogram does keep one bucket to store all data points that
fall below the histogram minimum, and one bucket for the data points above the
maximum.  However, because these buckets don't have a good lower/upper bound,
percentile estimates in these buckets may be inaccurate.
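
The fixed-width layout described above can be sketched as a mapping from a
value to a bucket index.  The helpers below are hypothetical and only
illustrate the idea; in particular, treating a value exactly equal to the
maximum as out of range is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of regular (in-range) buckets needed to cover [min, max).
// Rounds up, so the last regular bucket may be narrower than the rest.
inline std::size_t regularBucketCount(int64_t min, int64_t max, int64_t width) {
  assert(width > 0 && min < max);
  return static_cast<std::size_t>((max - min + width - 1) / width);
}

// Maps a value to a bucket index, with one extra bucket on each side:
//   index 0       : values below min
//   index 1 .. n  : regular buckets, each `width` wide
//   index n + 1   : values at or above max (assumption of this sketch)
inline std::size_t bucketIndex(int64_t value,
                               int64_t min,
                               int64_t max,
                               int64_t width) {
  const std::size_t n = regularBucketCount(min, max, width);
  if (value < min) {
    return 0;
  }
  if (value >= max) {
    return n + 1;
  }
  return 1 + static_cast<std::size_t>((value - min) / width);
}
```

For a range of 0 to 5000 with a width of 100, `regularBucketCount` yields 50,
a value of 250 maps to index 3, and any negative value maps to index 0.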

#### `HistogramBuckets`

The `Histogram` class is built on top of `HistogramBuckets`.
`HistogramBuckets` provides an API very similar to `Histogram`, but allows a
user-defined bucket class.  This allows users to implement more complex
histogram types that store more than just the count and sum in each bucket.

When computing percentile estimates, `HistogramBuckets` accepts user-defined
functions for computing the average value and data count in each bucket.  This
allows you to define more complex buckets which may have multiple different
ways of computing the average value and the count.

For example, one use case could be tracking timeseries data in each bucket.
Each set of timeseries data can have independent data in the bucket, which can
show how the data distribution is changing over time.
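
The idea of a user-defined bucket combined with a caller-supplied count
function can be sketched as follows.  `WindowedBucket` and `SketchBuckets` are
hypothetical types written for this illustration; they do not mirror the
actual `HistogramBuckets` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical user-defined bucket: separate counts for two time windows.
struct WindowedBucket {
  uint64_t recentCount = 0;   // values seen in the current window
  uint64_t allTimeCount = 0;  // values seen since the histogram was created
};

// Hypothetical container of equal-width buckets holding any bucket type.
template <typename BucketT>
class SketchBuckets {
 public:
  SketchBuckets(int64_t min, int64_t width, std::size_t numBuckets)
      : min_(min), width_(width), buckets_(numBuckets) {
    assert(width > 0 && numBuckets > 0);
  }

  // Returns the bucket covering `value`. Precondition: value is in range.
  BucketT& bucketFor(int64_t value) {
    assert(value >= min_);
    const std::size_t idx = static_cast<std::size_t>((value - min_) / width_);
    assert(idx < buckets_.size());
    return buckets_[idx];
  }

  // Sums a caller-chosen count over all buckets, so the same data can be
  // viewed in more than one way.
  template <typename CountFn>
  uint64_t totalCount(CountFn countFromBucket) const {
    uint64_t total = 0;
    for (const BucketT& b : buckets_) {
      total += countFromBucket(b);
    }
    return total;
  }

 private:
  int64_t min_;
  int64_t width_;
  std::vector<BucketT> buckets_;
};
```

Passing a lambda that reads `recentCount` or one that reads `allTimeCount`
gives two different views over the same set of buckets, which is the kind of
flexibility the user-defined functions are meant to provide.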

### Example Usage
***

Say we have code that sends many requests to remote services, and want to
generate a histogram showing how long the requests take.  The following code
initializes a histogram with 50 buckets, tracking values between 0 and 5000.
(There are 50 buckets since the bucket width is specified as 100.  If the
histogram range is not an even multiple of the bucket width, the last bucket
will simply be narrower than the others.)

``` Cpp
    folly::Histogram<int64_t> latencies(100, 0, 5000);
```

The `addValue()` method is used to add values to the histogram.  Each time a
request finishes we can add its latency to the histogram:

``` Cpp
    latencies.addValue(now - startTime);
```

You can access each of the histogram buckets to display the overall
distribution.  Note that bucket 0 tracks all data points that were below the
specified histogram minimum, and the last bucket tracks the data points that
were above the maximum.

``` Cpp
    // The bucket count includes the below-min and above-max buckets.
    size_t numBuckets = latencies.getNumBuckets();
    std::cout << "Below min: " << latencies.getBucketByIndex(0).count << "\n";
    // Regular buckets sit between the two out-of-range buckets.
    for (size_t n = 1; n < numBuckets - 1; ++n) {
      std::cout << latencies.getBucketMin(n) << "-" << latencies.getBucketMax(n)
                << ": " << latencies.getBucketByIndex(n).count << "\n";
    }
    std::cout << "Above max: "
              << latencies.getBucketByIndex(numBuckets - 1).count << "\n";
```

You can also use the `getPercentileEstimate()` method to estimate the value at
the Nth percentile in the distribution.  The percentile is passed as a fraction
between 0.0 and 1.0.  For example, to estimate the median, as well as the 95th
and 99th percentile values:

``` Cpp
    int64_t median = latencies.getPercentileEstimate(0.5);
    int64_t p95 = latencies.getPercentileEstimate(0.95);
    int64_t p99 = latencies.getPercentileEstimate(0.99);
```

### Thread Safety
***

Note that `Histogram` and `HistogramBuckets` objects are not thread-safe.  If
you wish to access a single `Histogram` from multiple threads, you must perform
your own locking to ensure that multiple threads do not access it at the same
time.
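
One way to perform that locking is to keep the histogram behind a mutex and
only touch it while the mutex is held.  The sketch below uses only the
standard library; `Locked`, `StandInHistogram`, and `recordLatency` are
hypothetical names, and `StandInHistogram` is a placeholder so the example
compiles without folly.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical wrapper that serializes all access to the wrapped object.
template <typename T>
class Locked {
 public:
  explicit Locked(T value) : value_(std::move(value)) {}

  // Runs fn with exclusive access to the wrapped object and returns its result.
  template <typename Fn>
  auto withLock(Fn fn) -> decltype(fn(std::declval<T&>())) {
    std::lock_guard<std::mutex> guard(mutex_);
    return fn(value_);
  }

 private:
  std::mutex mutex_;
  T value_;
};

// Placeholder standing in for a histogram; not folly's class.
struct StandInHistogram {
  uint64_t count = 0;
  int64_t sum = 0;
  void addValue(int64_t v) {
    sum += v;
    ++count;
  }
};

// Records one latency sample; safe to call from several threads at once
// because the histogram is only modified while the lock is held.
inline void recordLatency(Locked<StandInHistogram>& hist, int64_t latency) {
  assert(latency >= 0);
  hist.withLock([latency](StandInHistogram& h) { h.addValue(latency); });
}
```

Reads need the same protection as writes: a percentile query or a bucket scan
should also go through `withLock`, so that it never observes a histogram that
another thread is in the middle of updating.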