gdax-orderbook-hpp/demo/dependencies/rapidjson-1.1.0/doc/internals.md

   1 # Internals
   2
   3 This section records some design and implementation details.
   4
   5 [TOC]
   6
   7 # Architecture {#Architecture}
   8
   9 ## SAX and DOM
  10
  11 The basic relationships of SAX and DOM is shown in the following UML diagram.
  12
  13 ![Architecture UML class diagram](diagram/architecture.png)
  14
  15 The core of the relationship is the `Handler` concept. From the SAX side, `Reader` parses a JSON from a stream and publish events to a `Handler`. `Writer` implements the `Handler` concept to handle the same set of events. From the DOM side, `Document` implements the `Handler` concept to build a DOM according to the events. `Value` supports a `Value::Accept(Handler&)` function, which traverses the DOM to publish events.
  16
  17 With this design, SAX is not dependent on DOM. Even `Reader` and `Writer` have no dependencies between them. This provides flexibility to chain event publisher and handlers. Besides, `Value` does not depends on SAX as well. So, in addition to stringify a DOM to JSON, user may also stringify it to a XML writer, or do anything else.
  18
  19 ## Utility Classes
  20
  21 Both SAX and DOM APIs depends on 3 additional concepts: `Allocator`, `Encoding` and `Stream`. Their inheritance hierarchy is shown as below.
  22
  23 ![Utility classes UML class diagram](diagram/utilityclass.png)
  24
  25 # Value {#Value}
  26
  27 `Value` (actually a typedef of `GenericValue<UTF8<>>`) is the core of DOM API. This section describes the design of it.
  28
  29 ## Data Layout {#DataLayout}
  30
  31 `Value` is a [variant type](http://en.wikipedia.org/wiki/Variant_type). In RapidJSON's context, an instance of `Value` can contain 1 of 6 JSON value types. This is possible by using `union`. Each `Value` contains two members: `union Data data_` and a`unsigned flags_`. The `flags_` indiciates the JSON type, and also additional information.
  32
  33 The following tables show the data layout of each type. The 32-bit/64-bit columns indicates the size of the field in bytes.
  34
  35 | Null              |                                  |32-bit|64-bit|
  36 |-------------------|----------------------------------|:----:|:----:|
  37 | (unused)          |                                  |4     |8     |
  38 | (unused)          |                                  |4     |4     |
  39 | (unused)          |                                  |4     |4     |
  40 | `unsigned flags_` | `kNullType kNullFlag`            |4     |4     |
  41
  42 | Bool              |                                                    |32-bit|64-bit|
  43 |-------------------|----------------------------------------------------|:----:|:----:|
  44 | (unused)          |                                                    |4     |8     |
  45 | (unused)          |                                                    |4     |4     |
  46 | (unused)          |                                                    |4     |4     |
  47 | `unsigned flags_` | `kBoolType` (either `kTrueFlag` or `kFalseFlag`) |4     |4     |
  48
  49 | String              |                                     |32-bit|64-bit|
  50 |---------------------|-------------------------------------|:----:|:----:|
  51 | `Ch* str`           | Pointer to the string (may own)     |4     |8     |
  52 | `SizeType length`   | Length of string                    |4     |4     |
  53 | (unused)            |                                     |4     |4     |
  54 | `unsigned flags_`   | `kStringType kStringFlag ...`       |4     |4     |
  55
  56 | Object              |                                     |32-bit|64-bit|
  57 |---------------------|-------------------------------------|:----:|:----:|
  58 | `Member* members`   | Pointer to array of members (owned) |4     |8     |
  59 | `SizeType size`     | Number of members                   |4     |4     |
  60 | `SizeType capacity` | Capacity of members                 |4     |4     |
  61 | `unsigned flags_`   | `kObjectType kObjectFlag`           |4     |4     |
  62
  63 | Array               |                                     |32-bit|64-bit|
  64 |---------------------|-------------------------------------|:----:|:----:|
  65 | `Value* values`     | Pointer to array of values (owned)  |4     |8     |
  66 | `SizeType size`     | Number of values                    |4     |4     |
  67 | `SizeType capacity` | Capacity of values                  |4     |4     |
  68 | `unsigned flags_`   | `kArrayType kArrayFlag`             |4     |4     |
  69
  70 | Number (Int)        |                                     |32-bit|64-bit|
  71 |---------------------|-------------------------------------|:----:|:----:|
  72 | `int i`             | 32-bit signed integer               |4     |4     |
  73 | (zero padding)      | 0                                   |4     |4     |
  74 | (unused)            |                                     |4     |8     |
  75 | `unsigned flags_`   | `kNumberType kNumberFlag kIntFlag kInt64Flag ...` |4     |4     |
  76
  77 | Number (UInt)       |                                     |32-bit|64-bit|
  78 |---------------------|-------------------------------------|:----:|:----:|
  79 | `unsigned u`        | 32-bit unsigned integer             |4     |4     |
  80 | (zero padding)      | 0                                   |4     |4     |
  81 | (unused)            |                                     |4     |8     |
  82 | `unsigned flags_`   | `kNumberType kNumberFlag kUIntFlag kUInt64Flag ...` |4     |4     |
  83
  84 | Number (Int64)      |                                     |32-bit|64-bit|
  85 |---------------------|-------------------------------------|:----:|:----:|
  86 | `int64_t i64`       | 64-bit signed integer               |8     |8     |
  87 | (unused)            |                                     |4     |8     |
  88 | `unsigned flags_`   | `kNumberType kNumberFlag kInt64Flag ...`          |4     |4     |
  89
  90 | Number (Uint64)     |                                     |32-bit|64-bit|
  91 |---------------------|-------------------------------------|:----:|:----:|
  92 | `uint64_t i64`      | 64-bit unsigned integer             |8     |8     |
  93 | (unused)            |                                     |4     |8     |
  94 | `unsigned flags_`   | `kNumberType kNumberFlag kInt64Flag ...`          |4     |4     |
  95
  96 | Number (Double)     |                                     |32-bit|64-bit|
  97 |---------------------|-------------------------------------|:----:|:----:|
  98 | `uint64_t i64`      | Double precision floating-point     |8     |8     |
  99 | (unused)            |                                     |4     |8     |
 100 | `unsigned flags_`   | `kNumberType kNumberFlag kDoubleFlag` |4     |4     |
 101
 102 Here are some notes:
 103 * To reduce memory consumption for 64-bit architecture, `SizeType` is typedef as `unsigned` instead of `size_t`.
 104 * Zero padding for 32-bit number may be placed after or before the actual type, according to the endianess. This makes possible for interpreting a 32-bit integer as a 64-bit integer, without any conversion.
 105 * An `Int` is always an `Int64`, but the converse is not always true.
 106
 107 ## Flags {#Flags}
 108
 109 The 32-bit `flags_` contains both JSON type and other additional information. As shown in the above tables, each JSON type contains redundant `kXXXType` and `kXXXFlag`. This design is for optimizing the operation of testing bit-flags (`IsNumber()`) and obtaining a sequential number for each type (`GetType()`).
 110
 111 String has two optional flags. `kCopyFlag` means that the string owns a copy of the string. `kInlineStrFlag` means using [Short-String Optimization](#ShortString).
 112
 113 Number is a bit more complicated. For normal integer values, it can contains `kIntFlag`, `kUintFlag`,  `kInt64Flag` and/or `kUint64Flag`, according to the range of the integer. For numbers with fraction, and integers larger than 64-bit range, they will be stored as `double` with `kDoubleFlag`.
 114
 115 ## Short-String Optimization {#ShortString}
 116
 117  [Kosta](https://github.com/Kosta-Github) provided a very neat short-string optimization. The optimization idea is given as follow. Excluding the `flags_`, a `Value` has 12 or 16 bytes (32-bit or 64-bit) for storing actual data. Instead of storing a pointer to a string, it is possible to store short strings in these space internally. For encoding with 1-byte character type (e.g. `char`), it can store maximum 11 or 15 characters string inside the `Value` type.
 118
 119 | ShortString (Ch=char) |                                   |32-bit|64-bit|
 120 |---------------------|-------------------------------------|:----:|:----:|
 121 | `Ch str[MaxChars]`  | String buffer                       |11    |15    |
 122 | `Ch invLength`      | MaxChars - Length                   |1     |1     |
 123 | `unsigned flags_`   | `kStringType kStringFlag ...`       |4     |4     |
 124
 125 A special technique is applied. Instead of storing the length of string directly, it stores (MaxChars - length). This make it possible to store 11 characters with trailing `\0`.
 126
 127 This optimization can reduce memory usage for copy-string. It can also improve cache-coherence thus improve runtime performance.
 128
 129 # Allocator {#InternalAllocator}
 130
 131 `Allocator` is a concept in RapidJSON:
 132 ~~~cpp
 133 concept Allocator {
 134     static const bool kNeedFree;    //!< Whether this allocator needs to call Free().
 135
 136     // Allocate a memory block.
 137     // \param size of the memory block in bytes.
 138     // \returns pointer to the memory block.
 139     void* Malloc(size_t size);
 140
 141     // Resize a memory block.
 142     // \param originalPtr The pointer to current memory block. Null pointer is permitted.
 143     // \param originalSize The current size in bytes. (Design issue: since some allocator may not book-keep this, explicitly pass to it can save memory.)
 144     // \param newSize the new size in bytes.
 145     void* Realloc(void* originalPtr, size_t originalSize, size_t newSize);
 146
 147     // Free a memory block.
 148     // \param pointer to the memory block. Null pointer is permitted.
 149     static void Free(void *ptr);
 150 };
 151 ~~~
 152
 153 Note that `Malloc()` and `Realloc()` are member functions but `Free()` is static member function.
 154
 155 ## MemoryPoolAllocator {#MemoryPoolAllocator}
 156
 157 `MemoryPoolAllocator` is the default allocator for DOM. It allocate but do not free memory. This is suitable for building a DOM tree.
 158
 159 Internally, it allocates chunks of memory from the base allocator (by default `CrtAllocator`) and stores the chunks as a singly linked list. When user requests an allocation, it allocates memory from the following order:
 160
 161 1. User supplied buffer if it is available. (See [User Buffer section in DOM](doc/dom.md))
 162 2. If user supplied buffer is full, use the current memory chunk.
 163 3. If the current block is full, allocate a new block of memory.
 164
 165 # Parsing Optimization {#ParsingOptimization}
 166
 167 ## Skip Whitespaces with SIMD {#SkipwhitespaceWithSIMD}
 168
 169 When parsing JSON from a stream, the parser need to skip 4 whitespace characters:
 170
 171 1. Space (`U+0020`)
 172 2. Character Tabulation (`U+000B`)
 173 3. Line Feed (`U+000A`)
 174 4. Carriage Return (`U+000D`)
 175
 176 A simple implementation will be simply:
 177 ~~~cpp
 178 void SkipWhitespace(InputStream& s) {
 179     while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t')
 180         s.Take();
 181 }
 182 ~~~
 183
 184 However, this requires 4 comparisons and a few branching for each character. This was found to be a hot spot.
 185
 186 To accelerate this process, SIMD was applied to compare 16 characters with 4 white spaces for each iteration. Currently RapidJSON only supports SSE2 and SSE4.2 instructions for this. And it is only activated for UTF-8 memory streams, including string stream or *in situ* parsing.
 187
 188 To enable this optimization, need to define `RAPIDJSON_SSE2` or `RAPIDJSON_SSE42` before including `rapidjson.h`. Some compilers can detect the setting, as in `perftest.h`:
 189
 190 ~~~cpp
 191 // __SSE2__ and __SSE4_2__ are recognized by gcc, clang, and the Intel compiler.
 192 // We use -march=native with gmake to enable -msse2 and -msse4.2, if supported.
 193 #if defined(__SSE4_2__)
 194 #  define RAPIDJSON_SSE42
 195 #elif defined(__SSE2__)
 196 #  define RAPIDJSON_SSE2
 197 #endif
 198 ~~~
 199
 200 Note that, these are compile-time settings. Running the executable on a machine without such instruction set support will make it crash.
 201
 202 ### Page boundary issue
 203
 204 In an early version of RapidJSON, [an issue](https://code.google.com/archive/p/rapidjson/issues/104) reported that the `SkipWhitespace_SIMD()` causes crash very rarely (around 1 in 500,000). After investigation, it is suspected that `_mm_loadu_si128()` accessed bytes after `'\0'`, and across a protected page boundary.
 205
 206 In [Intel® 64 and IA-32 Architectures Optimization Reference Manual
 207 ](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), section 10.2.1:
 208
 209 > To support algorithms requiring unaligned 128-bit SIMD memory accesses, memory buffer allocation by a caller function should consider adding some pad space so that a callee function can safely use the address pointer safely with unaligned 128-bit SIMD memory operations.
 210 > The minimal padding size should be the width of the SIMD register that might be used in conjunction with unaligned SIMD memory access.
 211
 212 This is not feasible as RapidJSON should not enforce such requirement.
 213
 214 To fix this issue, currently the routine process bytes up to the next aligned address. After tha, use aligned read to perform SIMD processing. Also see [#85](https://github.com/miloyip/rapidjson/issues/85).
 215
 216 ## Local Stream Copy {#LocalStreamCopy}
 217
 218 During optimization, it is found that some compilers cannot localize some member data access of streams into local variables or registers. Experimental results show that for some stream types, making a copy of the stream and used it in inner-loop can improve performance. For example, the actual (non-SIMD) implementation of `SkipWhitespace()` is implemented as:
 219
 220 ~~~cpp
 221 template<typename InputStream>
 222 void SkipWhitespace(InputStream& is) {
 223     internal::StreamLocalCopy<InputStream> copy(is);
 224     InputStream& s(copy.s);
 225
 226     while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t')
 227         s.Take();
 228 }
 229 ~~~
 230
 231 Depending on the traits of stream, `StreamLocalCopy` will make (or not make) a copy of the stream object, use it locally and copy the states of stream back to the original stream.
 232
 233 ## Parsing to Double {#ParsingDouble}
 234
 235 Parsing string into `double` is difficult. The standard library function `strtod()` can do the job but it is slow. By default, the parsers use normal precision setting. This has has maximum 3 [ULP](http://en.wikipedia.org/wiki/Unit_in_the_last_place) error and implemented in `internal::StrtodNormalPrecision()`.
 236
 237 When using `kParseFullPrecisionFlag`, the parsers calls `internal::StrtodFullPrecision()` instead, and this function actually implemented 3 versions of conversion methods.
 238 1. [Fast-Path](http://www.exploringbinary.com/fast-path-decimal-to-floating-point-conversion/).
 239 2. Custom DIY-FP implementation as in [double-conversion](https://github.com/floitsch/double-conversion).
 240 3. Big Integer Method as in (Clinger, William D. How to read floating point numbers accurately. Vol. 25. No. 6. ACM, 1990).
 241
 242 If the first conversion methods fail, it will try the second, and so on.
 243
 244 # Generation Optimization {#GenerationOptimization}
 245
 246 ## Integer-to-String conversion {#itoa}
 247
 248 The naive algorithm for integer-to-string conversion involves division per each decimal digit. We have implemented various implementations and evaluated them in [itoa-benchmark](https://github.com/miloyip/itoa-benchmark).
 249
 250 Although SSE2 version is the fastest but the difference is minor by comparing to the first running-up `branchlut`. And `branchlut` is pure C++ implementation so we adopt `branchlut` in RapidJSON.
 251
 252 ## Double-to-String conversion {#dtoa}
 253
 254 Originally RapidJSON uses `snprintf(..., ..., "%g")`  to achieve double-to-string conversion. This is not accurate as the default precision is 6. Later we also find that this is slow and there is an alternative.
 255
 256 Google's V8 [double-conversion](https://github.com/floitsch/double-conversion
 257 ) implemented a newer, fast algorithm called Grisu3 (Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM Sigplan Notices 45.6 (2010): 233-243.).
 258
 259 However, since it is not header-only so that we implemented a header-only version of Grisu2. This algorithm guarantees that the result is always accurate. And in most of cases it produces the shortest (optimal) string representation.
 260
 261 The header-only conversion function has been evaluated in [dtoa-benchmark](https://github.com/miloyip/dtoa-benchmark).
 262
 263 # Parser {#Parser}
 264
 265 ## Iterative Parser {#IterativeParser}
 266
 267 The iterative parser is a recursive descent LL(1) parser
 268 implemented in a non-recursive manner.
 269
 270 ### Grammar {#IterativeParserGrammar}
 271
 272 The grammar used for this parser is based on strict JSON syntax:
 273 ~~~~~~~~~~
 274 S -> array | object
 275 array -> [ values ]
 276 object -> { members }
 277 values -> non-empty-values | ε
 278 non-empty-values -> value addition-values
 279 addition-values -> ε | , non-empty-values
 280 members -> non-empty-members | ε
 281 non-empty-members -> member addition-members
 282 addition-members -> ε | , non-empty-members
 283 member -> STRING : value
 284 value -> STRING | NUMBER | NULL | BOOLEAN | object | array
 285 ~~~~~~~~~~
 286
 287 Note that left factoring is applied to non-terminals `values` and `members`
 288 to make the grammar be LL(1).
 289
 290 ### Parsing Table {#IterativeParserParsingTable}
 291
 292 Based on the grammar, we can construct the FIRST and FOLLOW set.
 293
 294 The FIRST set of non-terminals is listed below:
 295
 296 |    NON-TERMINAL   |               FIRST              |
 297 |:-----------------:|:--------------------------------:|
 298 |       array       |                 [                |
 299 |       object      |                 {                |
 300 |       values      | ε STRING NUMBER NULL BOOLEAN { [ |
 301 |  addition-values  |              ε COMMA             |
 302 |      members      |             ε STRING             |
 303 |  addition-members |              ε COMMA             |
 304 |       member      |              STRING              |
 305 |       value       |  STRING NUMBER NULL BOOLEAN { [  |
 306 |         S         |                [ {               |
 307 | non-empty-members |              STRING              |
 308 |  non-empty-values |  STRING NUMBER NULL BOOLEAN { [  |
 309
 310 The FOLLOW set is listed below:
 311
 312 |    NON-TERMINAL   |  FOLLOW |
 313 |:-----------------:|:-------:|
 314 |         S         |    $    |
 315 |       array       | , $ } ] |
 316 |       object      | , $ } ] |
 317 |       values      |    ]    |
 318 |  non-empty-values |    ]    |
 319 |  addition-values  |    ]    |
 320 |      members      |    }    |
 321 | non-empty-members |    }    |
 322 |  addition-members |    }    |
 323 |       member      |   , }   |
 324 |       value       |  , } ]  |
 325
 326 Finally the parsing table can be constructed from FIRST and FOLLOW set:
 327
 328 |    NON-TERMINAL   |           [           |           {           |          ,          | : | ] | } |          STRING         |         NUMBER        |          NULL         |        BOOLEAN        |
 329 |:-----------------:|:---------------------:|:---------------------:|:-------------------:|:-:|:-:|:-:|:-----------------------:|:---------------------:|:---------------------:|:---------------------:|
 330 |         S         |         array         |         object        |                     |   |   |   |                         |                       |                       |                       |
 331 |       array       |       [ values ]      |                       |                     |   |   |   |                         |                       |                       |                       |
 332 |       object      |                       |      { members }      |                     |   |   |   |                         |                       |                       |                       |
 333 |       values      |    non-empty-values   |    non-empty-values   |                     |   | ε |   |     non-empty-values    |    non-empty-values   |    non-empty-values   |    non-empty-values   |
 334 |  non-empty-values | value addition-values | value addition-values |                     |   |   |   |  value addition-values  | value addition-values | value addition-values | value addition-values |
 335 |  addition-values  |                       |                       |  , non-empty-values |   | ε |   |                         |                       |                       |                       |
 336 |      members      |                       |                       |                     |   |   | ε |    non-empty-members    |                       |                       |                       |
 337 | non-empty-members |                       |                       |                     |   |   |   | member addition-members |                       |                       |                       |
 338 |  addition-members |                       |                       | , non-empty-members |   |   | ε |                         |                       |                       |                       |
 339 |       member      |                       |                       |                     |   |   |   |      STRING : value     |                       |                       |                       |
 340 |       value       |         array         |         object        |                     |   |   |   |          STRING         |         NUMBER        |          NULL         |        BOOLEAN        |
 341
 342 There is a great [tool](http://hackingoff.com/compilers/predict-first-follow-set) for above grammar analysis.
 343
 344 ### Implementation {#IterativeParserImplementation}
 345
 346 Based on the parsing table, a direct(or conventional) implementation
 347 that pushes the production body in reverse order
 348 while generating a production could work.
 349
 350 In RapidJSON, several modifications(or adaptations to current design) are made to a direct implementation.
 351
 352 First, the parsing table is encoded in a state machine in RapidJSON.
 353 States are constructed by the head and body of production.
 354 State transitions are constructed by production rules.
 355 Besides, extra states are added for productions involved with `array` and `object`.
 356 In this way the generation of array values or object members would be a single state transition,
 357 rather than several pop/push operations in the direct implementation.
 358 This also makes the estimation of stack size more easier.
 359
 360 The state diagram is shown as follows:
 361
 362 ![State Diagram](diagram/iterative-parser-states-diagram.png)
 363
 364 Second, the iterative parser also keeps track of array's value count and object's member count
 365 in its internal stack, which may be different from a conventional implementation.