From: Alex Lorenz Date: Tue, 19 Aug 2014 17:05:58 +0000 (+0000) Subject: Docs: add documentation for the coverage mapping format. X-Git-Url: http://plrg.eecs.uci.edu/git/?p=oota-llvm.git;a=commitdiff_plain;h=e3cea5fd9a8c275906eb059f844258ecd4cc6ca6 Docs: add documentation for the coverage mapping format. Differential Revision: http://reviews.llvm.org/D4729 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215990 91177308-0d34-0410-b5e6-96231b3b80d8 --- diff --git a/docs/CoverageMappingFormat.rst b/docs/CoverageMappingFormat.rst new file mode 100644 index 00000000000..8fcffb838a3 --- /dev/null +++ b/docs/CoverageMappingFormat.rst @@ -0,0 +1,576 @@ +.. role:: raw-html(raw) + :format: html + +================================= +LLVM Code Coverage Mapping Format +================================= + +.. contents:: + :local: + +Introduction +============ + +LLVM's code coverage mapping format is used to provide code coverage +analysis using LLVM's and Clang's instrumentation-based profiling +(Clang's ``-fprofile-instr-generate`` option). + +This document is aimed at those who use LLVM's code coverage mapping to provide +code coverage analysis for their own programs, and at those who would like +to know how it works under the hood. Prior knowledge of how Clang's profile +guided optimization works is useful, but not required. + +We start by showing how to use LLVM and Clang for code coverage analysis, +then we briefly describe LLVM's code coverage mapping format and the +way that Clang and LLVM's code coverage tool work with this format. After +the basics are down, more advanced features of the coverage mapping format +are discussed - such as the data structures, LLVM IR representation and +the binary encoding. + +Quick Start +=========== + +Here's a short story that describes how to generate a code coverage overview +for a sample source file called *test.c*. 
+ +* First, compile an instrumented version of your program using Clang's + ``-fprofile-instr-generate`` option with the additional ``-fcoverage-mapping`` + option: + + ``clang -o test -fprofile-instr-generate -fcoverage-mapping test.c`` +* Then, run the instrumented binary. The runtime will produce a file called + *default.profraw* containing the raw profile instrumentation data: + + ``./test`` +* After that, merge the profile data using the *llvm-profdata* tool: + + ``llvm-profdata merge -o test.profdata default.profraw`` +* Finally, run LLVM's code coverage tool (*llvm-cov*) to produce the code + coverage overview for the sample source file: + + ``llvm-cov show ./test -instr-profile=test.profdata test.c`` + +High Level Overview +=================== + +LLVM's code coverage mapping format is designed to be a self-contained +data format that can be embedded into the LLVM IR and object files. +It's described in this document as a **mapping** format because its goal is +to store the data that is required for a code coverage tool to map between +the specific source ranges in a file and the execution counts obtained +after running the instrumented version of the program. + +The mapping data is used in two places in the code coverage process: + +1. When Clang compiles a source file with ``-fcoverage-mapping``, it + generates the mapping information that describes the mapping between the + source ranges and the profiling instrumentation counters. + This information gets embedded into the LLVM IR and conveniently + ends up in the final executable file when the program is linked. + +2. It is also used by *llvm-cov* - the mapping information is extracted from an + object file and is used to associate the execution counts (the values of the + profile instrumentation counters) with the source ranges in a file. + After that, the tool is able to generate various code coverage reports + for the program. 
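As a rough illustration of the second step above, the following Python sketch pairs source ranges with the values of profile instrumentation counters. It is not part of LLVM: the ``MappingRegion`` class, the ``region_counts`` helper and the counter values are hypothetical names and data chosen for this example, and the in-memory layout says nothing about the real binary encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRegion:
    """Hypothetical in-memory view of one mapping entry (not the on-disk form)."""
    line_start: int
    col_start: int
    line_end: int
    col_end: int
    counter_index: int  # index of the profile instrumentation counter


def region_counts(regions, counter_values):
    """Pair each source range with the execution count of its counter."""
    return [
        ((r.line_start, r.col_start, r.line_end, r.col_end),
         counter_values[r.counter_index])
        for r in regions
    ]


# Two illustrative regions: a function body and the body of an ``if``.
regions = [
    MappingRegion(1, 40, 9, 2, counter_index=0),
    MappingRegion(3, 17, 5, 4, counter_index=1),
]
# Assumed counter values read from a profile: counter #0 ran once, #1 never.
counter_values = [1, 0]

report = region_counts(regions, counter_values)
for source_range, count in report:
    print(source_range, count)
```

A coverage tool would then turn such (range, count) pairs into per-line counts and highlight the ranges whose count is zero.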
+ +The coverage mapping format aims to be a "universal format" that would be +suitable for usage by any frontend, and not just by Clang. It also aims to +give the frontend the possibility of generating minimal coverage mapping +data in order to reduce the size of the IR and object files - for example, +instead of emitting mapping information for each statement in a function, the +frontend is allowed to group the statements with the same execution count into +regions of code, and emit the mapping information only for those regions. + +Advanced Concepts +================= + +The remainder of this guide is meant to give you insight into the way the +coverage mapping format works. + +The coverage mapping format operates on a per-function level as the +profile instrumentation counters are associated with a specific function. +For each function that requires code coverage, the frontend has to create +coverage mapping data that can map between the source code ranges and +the profile instrumentation counters for that function. + +Mapping Region +-------------- + +The function's coverage mapping data contains an array of mapping regions. +A mapping region stores the `source code range`_ that is covered by this region, +the `file id <coverage file id_>`_, the `coverage mapping counter`_ and +the region's kind. +There are several kinds of mapping regions: + +* Code regions associate portions of source code with `coverage mapping + counters`_. They make up the majority of the mapping regions. They are used + by the code coverage tool to compute the execution counts for lines, + highlight the regions of code that were never executed, and to obtain + the various code coverage statistics for a function. + For example: + + :raw-html:`
int main(int argc, const char *argv[]) {     // Code Region from 1:40 to 9:2
+                                              
+    if (argc > 1) {                            // Code Region from 3:17 to 5:4
+      printf("%s\n", argv[1]);              
+    } else {                                   // Code Region from 5:10 to 7:4
+      printf("\n");                         
+    }                                         
+    return 0;                                 
+  }
+  
` +* Skipped regions are used to represent source ranges that were skipped + by Clang's preprocessor. They don't associate with + `coverage mapping counters`_, as the frontend knows that they are never + executed. They are used by the code coverage tool to mark the skipped lines + inside a function as non-code lines that don't have execution counts. + For example: + + :raw-html:`
int main() {                // Code Region from 1:12 to 6:2
+  #ifdef DEBUG                // Skipped Region from 2:1 to 4:2
+    printf("Hello world"); 
+  #endif                     
+    return 0;                
+  }
+  
` +* Expansion regions are used to represent Clang's macro expansions. They + have an additional property - *expanded file id*. This property can be + used by the code coverage tool to find the mapping regions that are created + as a result of this macro expansion, by checking if their file id matches the + expanded file id. They don't associate with `coverage mapping counters`_, + as the code coverage tool can determine the execution count for this region + by looking up the execution count of the first region with a corresponding + file id. + For example: + + :raw-html:`
int func(int x) {                             
+    #define MAX(x,y) ((x) > (y)? (x) : (y))     
+    return MAX(x, 42);                           // Expansion Region from 3:10 to 3:13
+  }
+  
` + +.. _source code range: + +Source Range: +^^^^^^^^^^^^^ + +The source range record contains the starting and ending location of a certain +mapping region. Both locations include the line and the column numbers. + +.. _coverage file id: + +File ID: +^^^^^^^^ + +The file id is an integer value that tells us +in which source file or macro expansion this region is located. +It enables Clang to produce mapping information for the code +defined inside macros, as this example demonstrates: + +:raw-html:`
void func(const char *str) {         // Code Region from 1:28 to 6:2 with file id 0
+  #define PUT printf("%s\n", str)    // 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2
+  if(*str)                          
+    PUT;                             // Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1
+  PUT;                               // Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2
+}
+
` + +.. _coverage mapping counter: +.. _coverage mapping counters: + +Counter: +^^^^^^^^ + +A coverage mapping counter can represent a reference to the profile +instrumentation counter. The execution count for a region with such a counter +is determined by looking up the value of the corresponding profile +instrumentation counter. + +It can also represent a binary arithmetical expression that operates on +coverage mapping counters or other expressions. +The execution count for a region with an expression counter is determined by +evaluating the expression's arguments and then adding them together or +subtracting them from one another. +In the example below, a subtraction expression is used to compute the execution +count for the compound statement that follows the *else* keyword: + +:raw-html:`
int main(int argc, const char *argv[]) {    // Region's counter is a reference to the profile counter #0
+                                           
+  if (argc > 1) {                           // Region's counter is a reference to the profile counter #1
+    printf("%s\n", argv[1]);                
+  } else {                                  // Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)
+    printf("\n");                        
+  }                                        
+  return 0;                                
+}
+
` + +Finally, a coverage mapping counter can also represent an execution count of +zero. The zero counter is used to provide coverage mapping for +unreachable statements and expressions, like in the example below: + +:raw-html:`
int main() {                  
+  return 0;                   
+  printf("Hello world!\n");    // Unreachable region's counter is zero
+}
+
` + +The zero counters allow the code coverage tool to display proper line execution +counts for the unreachable lines and highlight the unreachable code. +Without them, the tool would think that those lines and regions were still +executed, as it doesn't possess the frontend's knowledge. + +LLVM IR Representation +====================== + +The coverage mapping data is stored in the LLVM IR using a single global +constant structure variable called *__llvm_coverage_mapping* +with the *__llvm_covmap* section specifier. + +For example, let’s consider a C file and how it gets compiled to LLVM: + +.. _coverage mapping sample: + +.. code-block:: c + + int foo() { + return 42; + } + int bar() { + return 13; + } + +The coverage mapping variable generated by Clang is: + +.. code-block:: llvm + + @__llvm_coverage_mapping = internal constant { i32, i32, i32, i32, [2 x { i8*, i32, i32 }], [40 x i8] } + { i32 2, ; The number of function records + i32 20, ; The length of the string that contains the encoded translation unit filenames + i32 20, ; The length of the string that contains the encoded coverage mapping data + i32 0, ; Coverage mapping format version + [2 x { i8*, i32, i32 }] [ ; Function records + { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_foo, i32 0, i32 0), ; Function's name + i32 3, ; Function's name length + i32 9 ; Function's encoded coverage mapping data string length + }, + { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_bar, i32 0, i32 0), ; Function's name + i32 3, ; Function's name length + i32 9 ; Function's encoded coverage mapping data string length + }], + [40 x i8] c"..." ; Encoded data (dissected later) + }, section "__llvm_covmap", align 8 + +Version: +-------- + +The coverage mapping version number can have the following values: + +* 0 — The first (current) version of the coverage mapping format. + +.. 
_function records: + +Function record: +---------------- + +A function record is a structure of the following type: + +.. code-block:: llvm + + { i8*, i32, i32 } + +It contains the pointer to the function's name, function's name length, +and the length of the encoded mapping data for that function. + +Encoded data: +------------- + +The encoded data is stored in a single string that contains +the encoded filenames used by this translation unit and the encoded coverage +mapping data for each function in this translation unit. + +The encoded data has the following structure: + +``[filenames, coverageMappingDataForFunctionRecord0, coverageMappingDataForFunctionRecord1, ..., padding]`` + +If necessary, the encoded data is padded with zeroes so that the size +of the data string is rounded up to the nearest multiple of 8 bytes. + +Dissecting the sample: +^^^^^^^^^^^^^^^^^^^^^^ + +Here's an overview of the encoded data that was stored in the +IR for the `coverage mapping sample`_ that was shown earlier: + +* The IR contains the following string constant that represents the encoded + coverage mapping data for the sample translation unit: + + .. code-block:: llvm + + c"\01\12/Users/alex/test.c\01\00\00\01\01\01\0C\02\02\01\00\00\01\01\04\0C\02\02\00\00" + +* The string contains values that are encoded in the LEB128 format, which is + used throughout for storing integers. It also contains a string value. + +* The length of the substring that contains the encoded translation unit + filenames is the value of the second field in the *__llvm_coverage_mapping* + structure, which is 20, thus the filenames are encoded in this string: + + .. code-block:: llvm + + c"\01\12/Users/alex/test.c" + + This string contains the following data: + + * Its first byte has a value of ``0x01``. It stores the number of filenames + contained in this string. + * Its second byte stores the length of the first filename in this string. + * The remaining 18 bytes are used to store the first filename. 
+ +* The length of the substring that contains the encoded coverage mapping data + for the first function is the value of the third field in the first + structure in an array of `function records`_ stored in the + fifth field of the *__llvm_coverage_mapping* structure, which is 9. + Therefore, the coverage mapping for the first function record is encoded + in this string: + + .. code-block:: llvm + + c"\01\00\00\01\01\01\0C\02\02" + + This string consists of the following bytes: + + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The number of file ids used by this function. There is only one file id used by the mapping data in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x00`` | An index into the filenames array which corresponds to the file "/Users/alex/test.c". | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x00`` | The number of counter expressions used by this function. This function doesn't use any expressions. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The number of mapping regions that are stored in an array for the function's file id #0. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The coverage mapping counter for the first region in this function. The value of 1 tells us that it's a coverage | + | | mapping counter that is a reference to the profile instrumentation counter with an index of 0. 
| + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x01`` | The starting line of the first mapping region in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x0C`` | The starting column of the first mapping region in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x02`` | The number of lines spanned by the first mapping region (its ending line minus its starting line). | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + | ``0x02`` | The ending column of the first mapping region in this function. | + +----------+-------------------------------------------------------------------------------------------------------------------------+ + +* The length of the substring that contains the encoded coverage mapping data + for the second function record is also 9. It's structured like the mapping data + for the first function record. + +* The two trailing bytes are zeroes and are used to pad the coverage mapping + data to give it 8-byte alignment. + +Encoding +======== + +The per-function coverage mapping data is encoded as a stream of bytes, +with a simple structure. The structure consists of the encoding +`types <cvmtypes_>`_, such as variable-length unsigned integers, that +are used to encode `File ID Mapping`_, `Counter Expressions`_ and +the `Mapping Regions`_. + +The format of the structure follows: + + ``[file id mapping, counter expressions, mapping regions]`` + +The translation unit filenames are encoded using the same encoding +`types <cvmtypes_>`_ as the per-function coverage mapping data, with the +following structure: + + ``[numFilenames : LEB128, filename0 : string, filename1 : string, ...]`` + +.. 
_cvmtypes: + +Types +----- + +This section describes the basic types that are used by the encoding format +and can appear after ``:`` in the ``[foo : type]`` description. + +.. _LEB128: + +LEB128 +^^^^^^ + +LEB128 is an unsigned integer value that is encoded using DWARF's LEB128 +encoding, optimizing for the case where values are small +(1 byte for values less than 128). + +.. _strings: + +Strings +^^^^^^^ + +``[length : LEB128, characters...]`` + +String values are encoded with a `LEB value <LEB128_>`_ for the length +of the string and a sequence of bytes for its characters. + +.. _file id mapping: + +File ID Mapping +--------------- + +``[numIndices : LEB128, filenameIndex0 : LEB128, filenameIndex1 : LEB128, ...]`` + +File id mapping in a function's coverage mapping stream +contains the indices into the translation unit's filenames array. + +Counter +------- + +``[value : LEB128]`` + +A `coverage mapping counter`_ is stored in a single `LEB value <LEB128_>`_. +It is composed of two things --- the `tag <counter-tag_>`_ +which is stored in the lowest 2 bits, and the `counter data`_ which is stored +in the remaining bits. + +.. _counter-tag: + +Tag: +^^^^ + +The counter's tag encodes the counter's kind +and, if the counter is an expression, the expression's kind. +The possible tag values are: + +* 0 - The counter is zero. + +* 1 - The counter is a reference to the profile instrumentation counter. + +* 2 - The counter is a subtraction expression. + +* 3 - The counter is an addition expression. + +.. _counter data: + +Data: +^^^^^ + +The counter's data is interpreted in the following manner: + +* When the counter is a reference to the profile instrumentation counter, + then the counter's data is the id of the profile counter. +* When the counter is an expression, then the counter's data + is the index into the array of counter expressions. + +.. 
_Counter Expressions: + +Counter Expressions +------------------- + +``[numExpressions : LEB128, expr0LHS : LEB128, expr0RHS : LEB128, expr1LHS : LEB128, expr1RHS : LEB128, ...]`` + +Counter expressions consist of two counters as they +represent binary arithmetic operations. +The expression's kind is determined from the `tag <counter-tag_>`_ of the +counter that references this expression. + +.. _Mapping Regions: + +Mapping Regions +--------------- + +``[numRegionArrays : LEB128, regionsForFile0, regionsForFile1, ...]`` + +The mapping regions are stored in an array of sub-arrays where every +region in a particular sub-array has the same file id. + +The file id for a sub-array of regions is the index of that +sub-array in the main array, e.g. the first sub-array will have the file id +of 0. + +Sub-Array of Regions +^^^^^^^^^^^^^^^^^^^^ + +``[numRegions : LEB128, region0, region1, ...]`` + +The mapping regions for a specific file id are stored in an array that is +sorted in an ascending order by the region's starting location. + +Mapping Region +^^^^^^^^^^^^^^ + +``[header, source range]`` + +The mapping region record contains two sub-records --- +the `header`_, which stores the counter and/or the region's kind, +and the `source range`_ that contains the starting and ending +location of this region. + +.. _header: + +Header +^^^^^^ + +``[counter]`` + +or + +``[pseudo-counter]`` + +The header encodes the region's counter and the region's kind. + +The value of the counter's tag distinguishes between the counters and +pseudo-counters --- if the tag is zero, then this header contains a +pseudo-counter, otherwise this header contains an ordinary counter. + +Counter: +"""""""" + +A mapping region whose header has a counter with a non-zero tag is +a code region. + +Pseudo-Counter: +""""""""""""""" + +``[value : LEB128]`` + +A pseudo-counter is stored in a single `LEB value <LEB128_>`_, just like +the ordinary counter. It has the following interpretation: + +* bits 0-1: tag, which is always 0. 
+ +* bit 2: expansionRegionTag. If this bit is set, then this mapping region + is an expansion region. + +* remaining bits: data. If this region is an expansion region, then the data + contains the expanded file id of that region. + + Otherwise, the data contains the region's kind. The possible region + kind values are: + + * 0 - This mapping region is a code region with a counter of zero. + * 2 - This mapping region is a skipped region. + +.. _source range: + +Source Range +^^^^^^^^^^^^ + +``[deltaLineStart : LEB128, columnStart : LEB128, numLines : LEB128, columnEnd : LEB128]`` + +The source range record contains the following fields: + +* *deltaLineStart*: The difference between the starting line of the + current mapping region and the starting line of the previous mapping region. + + If the current mapping region is the first region in the current + sub-array, then it stores the starting line of that region. + +* *columnStart*: The starting column of the mapping region. + +* *numLines*: The difference between the ending line and the starting line + of the current mapping region. + +* *columnEnd*: The ending column of the mapping region. diff --git a/docs/index.rst b/docs/index.rst index 1d4fbd9d34d..08666a8462d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -238,6 +238,7 @@ For API clients and LLVM developers. StackMaps InAlloca BigEndianNEON + CoverageMappingFormat :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. @@ -324,6 +325,8 @@ For API clients and LLVM developers. LLVM's support for generating NEON instructions on big endian ARM targets is somewhat nonintuitive. This document explains the implementation and rationale. +:doc:`CoverageMappingFormat` + This describes the format and encoding used for LLVM’s code coverage mapping. Development Process Documentation =================================