From: Chris Lattner Date: Mon, 5 Jan 2004 05:06:33 +0000 (+0000) Subject: First version of this document. It is still missing some pretty big pieces, and X-Git-Url: http://plrg.eecs.uci.edu/git/?p=oota-llvm.git;a=commitdiff_plain;h=bdfb339b8d1d0480c42bfbcf76b96c1f7fcdec75 First version of this document. It is still missing some pretty big pieces, and the debugging information formats will likely change, but it's a start, and I have to move on to other things in the short-term, so it might be a while before I get back to working on this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@10683 91177308-0d34-0410-b5e6-96231b3b80d8 --- diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html new file mode 100644 index 00000000000..ab9af996ba2 --- /dev/null +++ b/docs/SourceLevelDebugging.html @@ -0,0 +1,906 @@ + + + + Source Level Debugging with LLVM + + + + +
Source Level Debugging with LLVM
+ + + + +
Introduction
+ + +
+ +

This document is the central repository for all information pertaining to +debug information in LLVM. It describes how to use the llvm-db tool, which provides a +powerful source-level debugger to users of LLVM-based +compilers. When compiling a program in debug mode, the front-end in use adds +LLVM debugging information to the program in the form of normal LLVM program objects as well as a small set of LLVM intrinsic functions, which specify the mapping of the +program in LLVM form to the program in the source language. +

+ +
+ + +
+ Philosophy behind LLVM debugging information +
+ +
+ +

+The idea of the LLVM debugging information is to capture how the important +pieces of the source-language's Abstract Syntax Tree map onto LLVM code. +Several design aspects have shaped the solution that appears here. The +important ones are:

+ +

+ +

+The approach used by the LLVM implementation is to use a small set of intrinsic functions to define a mapping +between LLVM program objects and the source-level objects. The description of +the source-level program is maintained in LLVM global variables in an implementation-defined format (the C/C++ front-end +currently uses working draft 7 of the Dwarf 3 standard).

+ +

+When a program is debugged, the debugger interacts with the user and turns the +stored debug information into source-language specific information. As such, +the debugger must be aware of the source-language, and is thus tied to a +specific language or family of languages. The LLVM +debugger is designed to be modular in its support for source-languages. +

+ +
+ + + +
+ Debugging optimized code +
+ +
+

+An extremely high priority of LLVM debugging information is to make it interact +well with optimizations and analysis. In particular, the LLVM debug information +provides the following guarantees:

+ +

+ +

+Basically, the debug information allows you to compile a program with "-O0 +-g" and get full debug information, allowing you to arbitrarily modify the +program as it executes from the debugger. Compiling a program with "-O3 +-g" gives you full debug information that is always available and accurate +for reading (e.g., you get accurate stack traces despite tail call elimination +and inlining), but you might lose the ability to modify the program and call +functions that were optimized out of the program, or inlined away completely. +

+ +
+ + + +
+ Future work +
+ +
+

+There are several important extensions that could eventually be added to the +LLVM debugger. The most important extension would be to upgrade the LLVM code +generators to support debugging information. This would also allow, for +example, the X86 code generator to emit native objects that contain debugging +information consumable by traditional source-level debuggers like GDB or +DBX.

+ +

+Additionally, LLVM optimizations can be upgraded to incrementally update the +debugging information, new commands can be added to the +debugger, and thread support could be added to the debugger.

+ +

+The "SourceLanguage" modules provided by llvm-db could be substantially +improved to provide good support for C++ language features like namespaces and +scoping rules.

+ +

+After working with the debugger for a while, perhaps the nicest improvement +would be to add some sort of line editor, such as GNU readline (but one that is +compatible with the LLVM license).

+ +

+For someone so inclined, it should be straightforward to write different +front-ends for the LLVM debugger, as the LLVM debugging engine is cleanly +separated from the llvm-db front-end. A GUI debugger or IDE would be +an interesting project. +

+ +
+ + + +
+ Using the llvm-db tool +
+ + +
+ +

+The llvm-db tool provides a GDB-like interface for source-level +debugging of programs. This tool provides many standard commands for inspecting +and modifying the program as it executes, loading new programs, single stepping, +placing breakpoints, etc. This section describes how to use the debugger. +

+ +

llvm-db has been designed to be as similar to GDB in its user +interface as possible. This should make it extremely easy to learn +llvm-db if you already know GDB. In general, llvm-db +provides the subset of GDB commands that are applicable to LLVM debugging users. +If there is a command missing that makes a reasonable amount of sense within the +limitations of llvm-db, please report it as +a bug or, better yet, submit a patch to add it. :)

+ +
+ + +
+ Limitations of llvm-db +
+ +
+ +

llvm-db is the first LLVM debugger, and as such was designed to be +quick to prototype and build, and simple to extend. It is missing many +features, though they should be easy to add over time (patches welcome!). +Because the (currently only) debugger backend (implemented in +"lib/Debugger/UnixLocalInferiorProcess.cpp") was designed to work without any +cooperation from the code generators, it suffers from the following inherent +limitations:

+ +

+ +

That said, it is still quite useful, and all of these limitations can be +eliminated by integrating support for the debugger into the code generators. +See the future work section for ideas of how to extend +the LLVM debugger despite these limitations.

+ +
+ + + +
+ A sample llvm-db session +
+ +
+ +

+TODO +

+ +
+ + + + +
+ Starting the debugger +
+ +
+ +

There are three ways to start up the llvm-db debugger:

+ +

When run with no options, just llvm-db, the debugger starts up +without a program loaded at all. You must use the file command to load a program, and the set args or run +commands to specify the arguments for the program.

+ +

If you start the debugger with one argument, as llvm-db +<program>, the debugger will start up and load in the specified +program. You can then optionally specify arguments to the program with the set args or run +commands.

+ +

The third way to start the debugger is with the --args option. This +option allows you to specify the program to load and the arguments to start out +with. Example use: llvm-db --args ls /home
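The three startup modes described above can be summarized in a short C++ sketch. This is only an illustration of the dispatch logic, not the llvm-db source: the Invocation struct and the parseInvocation function are hypothetical names, and the sketch assumes the tool name has already been removed from the argument list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of parsing the llvm-db command line.
struct Invocation {
  bool HasProgram = false;            // false: start with no program loaded
  std::string Program;                // program to load, if any
  std::vector<std::string> ProgArgs;  // initial arguments for the program
};

// Args holds everything after the tool name, e.g. {"--args", "ls", "/home"}.
inline Invocation parseInvocation(const std::vector<std::string> &Args) {
  Invocation Inv;
  if (Args.empty())
    return Inv;                       // mode 1: llvm-db
  std::size_t First = 0;
  bool TakeArgs = false;
  if (Args[0] == "--args") {          // mode 3: llvm-db --args <program> ...
    TakeArgs = true;
    First = 1;
  }
  if (First >= Args.size())
    return Inv;                       // "--args" with nothing after it
  Inv.HasProgram = true;
  Inv.Program = Args[First];          // mode 2: llvm-db <program>
  if (TakeArgs)
    Inv.ProgArgs.assign(Args.begin() + First + 1, Args.end());
  return Inv;
}
```

In the first two modes ProgArgs stays empty, matching the text: the arguments are supplied later with the set args or run commands.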

+ +
+ + +
+ Commands recognized by the debugger +
+ +
+ +

FIXME: this needs work obviously. See the GDB documentation for +information about what these do, or try 'help [command]' within +llvm-db to get information.

+ +

+

General usage:

+ + +

Program inspection and interaction:

+ + +

Call stack inspection:

+ + + +

Debugger inspection and interaction:

+ + +

TODO:

+ +

+
+ + +
+ Architecture of the LLVM debugger +
+ + +
+ +

+lib/Debugger
+  - UnixLocalInferiorProcess.cpp
+
+tools/llvm-db
+  - SourceLanguage interfaces
+  - ProgramInfo/RuntimeInfo
+  - Commands
+
+

+ +
+ + +
+ Short-term TODO list +
+ +
+ +

+FIXME: this section will eventually go away. These are notes to myself of +things that should be implemented, but haven't yet. +

+ +

+Breakpoints: Support is already implemented in the 'InferiorProcess' +class, though it hasn't been tested yet. To finish breakpoint support, we need +to implement breakCommand (which should reuse the linespec parser from the list +command), and handle the fact that 'break foo' or 'break file.c:53' may insert +multiple breakpoints. Also, if you say 'break file.c:53' and there is no +stoppoint on line 53, the breakpoint should go on the next available line. My +idea was to have the Debugger class provide a "Breakpoint" class which +encapsulated this messiness, giving the debugger front-end a simple interface. +The debugger front-end would have to map the really complex semantics of +temporary breakpoints and 'conditional' breakpoints onto this intermediate +level. Also, breakpoints should survive as much as possible across program +reloads. +
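The "next available line" rule mentioned above amounts to a lower-bound search over the lines that carry stop points. A minimal C++ sketch, assuming the stop point lines of one file have already been collected into a sorted set; resolveBreakpointLine is a hypothetical helper, not part of the Debugger or InferiorProcess classes:

```cpp
#include <set>

// Returns the first line >= Requested that has a stop point, or 0 if no
// stop point exists at or after the requested line (source lines are
// 1-based, so 0 is free to mean "not found").
inline unsigned resolveBreakpointLine(const std::set<unsigned> &StopPointLines,
                                      unsigned Requested) {
  std::set<unsigned>::const_iterator It = StopPointLines.lower_bound(Requested);
  if (It == StopPointLines.end())
    return 0;
  return *It;
}
```

With stop points on lines 2, 3, 5, 6 and 8, a request for line 4 resolves to line 5, and a request for line 9 finds nothing.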

+ +

+run (with args) & set args: These need to be implemented. +Currently run doesn't support setting arguments as part of the command. The +only tricky part is handling quoted arguments correctly.
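One possible shape for the quote handling is sketched below in C++. The quoting rules are an assumption made for illustration and are not taken from llvm-db or GDB: whitespace separates arguments, single or double quotes group characters, the quote characters are dropped, and there are no backslash escapes. The name splitArgs is hypothetical.

```cpp
#include <string>
#include <vector>

// Splits a command tail such as:  foo "bar baz" 'x y'
// into the three arguments foo, bar baz, and x y.
inline std::vector<std::string> splitArgs(const std::string &Line) {
  std::vector<std::string> Args;
  std::string Cur;
  bool InArg = false;   // true once the current argument has started
  char Quote = 0;       // active quote character, or 0 when unquoted
  for (std::string::size_type I = 0; I < Line.size(); ++I) {
    char C = Line[I];
    if (Quote) {
      if (C == Quote)
        Quote = 0;      // closing quote
      else
        Cur += C;       // everything inside quotes is kept, spaces included
    } else if (C == '"' || C == '\'') {
      Quote = C;        // opening quote; "" yields an empty argument
      InArg = true;
    } else if (C == ' ' || C == '\t') {
      if (InArg) {      // whitespace ends the current argument
        Args.push_back(Cur);
        Cur.clear();
        InArg = false;
      }
    } else {
      Cur += C;
      InArg = true;
    }
  }
  if (InArg)
    Args.push_back(Cur);  // an unterminated quote is closed implicitly
  return Args;
}
```

A real implementation would also have to decide how to report an unterminated quote; this sketch simply accepts it.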

+ +

+UnixLocalInferiorProcess.cpp speedup: There is no reason for the debugged +process to code gen the globals corresponding to debug information. The +IntrinsicLowering object could instead change descriptors into constant expr +casts of the constant address of the LLVM objects for the descriptors. This +would also allow us to eliminate the mapping back and forth between physical +addresses that must be done.

+ +
+ + +
+ Debugging information implementation +
+ + +
+ +

LLVM debugging information has been carefully designed to make it possible +for the optimizer to optimize the program and debugging information without +necessarily having to know anything about debugging information. In particular, +the global constant merging pass automatically eliminates duplicated debugging +information (often caused by header files), the global dead code elimination +pass automatically deletes debugging information for a function if it decides to +delete the function, and the linker eliminates debug information when it merges +linkonce functions.

+ +

To do this, most of the debugging information (descriptors for types, +variables, functions, source files, etc) is inserted by the language front-end +in the form of LLVM global variables. These LLVM global variables are no +different from any other global variables, except that they have a web of LLVM +intrinsic functions that point to them. If the last references to a particular +piece of debugging information are deleted (for example, by the +-globaldce pass), the extraneous debug information will automatically +become dead and be removed by the optimizer.

+ +

The debugger is designed to be agnostic about the contents of most of the +debugging information. It uses a source-language-specific module to decode the +information that represents variables, types, functions, namespaces, etc: this +allows for arbitrary source-language semantics and type-systems to be used, as +long as there is a module written for the debugger to interpret the information. +

+ +

+To provide basic functionality, the LLVM debugger does have to make some +assumptions about the source-level language being debugged, though it keeps +these to a minimum. The only common features that the LLVM debugger assumes +exist are source files, global objects (aka methods, messages, global +variables, etc), and local variables. +These abstract objects are used by the debugger to form stack traces, show +information about local variables, etc. + +

This section of the documentation first describes the representation aspects +common to any source-language. The next section +describes the data layout conventions used by the C and C++ +front-ends.

+ +
+ + +
+ Anchors for global objects +
+ +
+

+One important aspect of the LLVM debug representation is that it allows the LLVM +debugger to efficiently index all of the global objects without having to scan +the program. To do this, all of the global objects use "anchor" globals of type +"{}", with designated names. These anchor objects obviously do not +contain any content or meaning by themselves, but all of the global objects of a +particular type (e.g., source file descriptors) contain a pointer to the anchor. +This pointer allows the debugger to use def-use chains to find all global +objects of that type. +

+ +

+So far, the following names are recognized as anchors by the LLVM debugger: +

+ +

+  %llvm.dbg.translation_units = linkonce global {} {}
+  %llvm.dbg.globals         = linkonce global {} {}
+

+ +

+Using anchors in this way (where the source file descriptor points to the +anchors, as opposed to having a list of source file descriptors) allows for the +standard dead global elimination and merging passes to automatically remove +unused debugging information. If the globals were kept track of through lists, +there would always be an object pointing to the descriptors, so they would never be +deleted. +
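The direction of the references is the important point, and a toy C++ model may make it concrete. This sketch does not use the LLVM API: ToyGlobal and the three helper functions are hypothetical, and the Users vector merely stands in for the use list that LLVM maintains for every value.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy model of a global together with its use list.
struct ToyGlobal {
  std::string Name;
  std::vector<ToyGlobal *> Users;   // globals whose initializer points here
};

// Make Desc reference Anchor, as a descriptor's anchor field does.
inline void addAnchorRef(ToyGlobal &Desc, ToyGlobal &Anchor) {
  Anchor.Users.push_back(&Desc);
}

// Drop the reference again, e.g. when a pass deletes a dead descriptor.
inline void removeAnchorRef(ToyGlobal &Desc, ToyGlobal &Anchor) {
  std::vector<ToyGlobal *> &U = Anchor.Users;
  U.erase(std::remove(U.begin(), U.end(), &Desc), U.end());
}

// Enumerate every descriptor of one kind by walking the anchor's users.
inline std::vector<std::string> descriptorNames(const ToyGlobal &Anchor) {
  std::vector<std::string> Names;
  for (std::vector<ToyGlobal *>::const_iterator I = Anchor.Users.begin(),
                                                E = Anchor.Users.end();
       I != E; ++I)
    Names.push_back((*I)->Name);
  return Names;
}
```

Because the anchor is only pointed to and never points at a descriptor itself, nothing in this model keeps a descriptor alive; deleting one simply shortens the anchor's use list.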

+ +
+ + + +
+ + Representing stopping points in the source program + +
+ +
+ +

LLVM debugger "stop points" are a key part of the debugging representation +that allows LLVM to maintain simple semantics for debugging optimized code. The basic idea is that the +front-end inserts calls to the %llvm.dbg.stoppoint intrinsic function +at every point in the program where the debugger should be able to inspect the +program (these correspond to places the debugger stops when you "step" +through it). The front-end can choose to place these as fine-grained as it +would like (for example, before every subexpression is evaluated), but it is +recommended to only put them after every source statement.

+ +

+Using calls to this intrinsic function to demarcate legal points for the debugger +to inspect the program automatically disables any optimizations that could +potentially confuse debugging information. To non-debug-information-aware +transformations, these calls simply look like calls to an external function, +which they must assume can do anything (including reading or writing to any part +of reachable memory). On the other hand, it does not impact many optimizations, +such as code motion of non-trapping instructions, nor does it impact +optimization of subexpressions, or any other code between the stop points.

+ +

+An important aspect of the calls to the %llvm.dbg.stoppoint intrinsic +is that the function-local debugging information is woven together with use-def +chains. This makes it easy for the debugger to, for example, locate the 'next' +stop point. For a concrete example of stop points, see the next section.

+ +
+ + + +
+ Object lifetimes and scoping +
+ +
+

+In many languages, the local variables in functions can have their lifetime or +scope limited to a subset of a function. In the C family of languages, for +example, variables are only live (readable and writable) within the source block +that they are defined in. In functional languages, values are only readable +after they have been defined. Though this is a very obvious concept, it is also +non-trivial to model in LLVM, because it has no notion of scoping in this sense, +and does not want to be tied to a language's scoping rules. +

+ +

+In order to handle this, the LLVM debug format uses the notion of "regions" of a +function, delineated by calls to intrinsic functions. These intrinsic functions +define new regions of the program and indicate when the region lifetime expires. +Consider the following C fragment, for example: +

+ +

+1.  void foo() {
+2.    int X = ...;
+3.    int Y = ...;
+4.    {
+5.      int Z = ...;
+6.      ...
+7.    }
+8.    ...
+9.  }
+

+ +

+Compiled to LLVM, this function would be represented like this (FIXME: CHECK AND +UPDATE THIS): +

+ +

+void %foo() {
+    %X = alloca int
+    %Y = alloca int
+    %Z = alloca int
+    %D1 = call {}* %llvm.dbg.func.start(%lldb.global* %d.foo)
+    %D2 = call {}* %llvm.dbg.stoppoint({}* %D1, uint 2, uint 2, %lldb.compile_unit* %file)
+
+    %D3 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D2, ...)
+    ;; Evaluate expression on line 2, assigning to X.
+    %D4 = call {}* %llvm.dbg.stoppoint({}* %D3, uint 3, uint 2, %lldb.compile_unit* %file)
+
+    %D5 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D4, ...)
+    ;; Evaluate expression on line 3, assigning to Y.
+    %D6 = call {}* %llvm.dbg.stoppoint({}* %D5, uint 5, uint 4, %lldb.compile_unit* %file)
+
+    %D7 = call {}* %llvm.region.start({}* %D6)
+    %D8 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D7, ...)
+    ;; Evaluate expression on line 5, assigning to Z.
+    %D9 = call {}* %llvm.dbg.stoppoint({}* %D8, uint 6, uint 4, %lldb.compile_unit* %file)
+
+    ;; Code for line 6.
+    %D10 = call {}* %llvm.region.end({}* %D9)
+    %D11 = call {}* %llvm.dbg.stoppoint({}* %D10, uint 8, uint 2, %lldb.compile_unit* %file)
+
+    ;; Code for line 8.
+    %D12 = call {}* %llvm.region.end({}* %D11)
+    ret void
+}
+

+ +

+This example illustrates a few important details about the LLVM debugging +information. In particular, it shows how the various intrinsics used are woven +together with def-use and use-def chains, similar to how anchors are used with globals. This allows the +debugger to analyze the relationship between statements, variable definitions, +and the code used to implement the function.

+ +

+In this example, two explicit regions are defined, one with the definition of the %D1 variable and one with the +definition of %D7. In the case of +%D1, the debug information indicates the start of the function whose descriptor is specified as an argument to the +intrinsic. This defines a new stack frame whose lifetime ends when the region +is ended by the %D12 call.

+ +

+Representing the boundaries of functions with regions allows normal LLVM +interprocedural optimizations to change the boundaries of functions without +having to worry about breaking mapping information between LLVM and source-level +functions. In particular, the inlining optimization requires no modification to +support inlining with debugging information: there is no correlation drawn +between LLVM functions and their source-level counterparts.

+ +

+Once the function has been defined, the stopping point corresponding to line #2 of the +function is encountered. At this point in the function, no local +variables are live. As lines 2 and 3 of the example are executed, their +variable definitions are automatically introduced into the program, without the +need to specify a new region. These variables do not require new regions to be +introduced because they go out of scope at the same point in the program: line +9. +

+ +

+In contrast, the Z variable goes out of scope at a different time, on +line 7. For this reason, it is defined within the +%D7 region, which kills the availability of Z before the +code for line 8 is executed. Through the use of LLVM debugger regions, +arbitrary source-language scoping rules can be supported, as long as they can +only be nested (i.e., one scope cannot partially overlap with a part of another +scope). +
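The scoping behavior walked through above can be replayed with a small C++ sketch. It is a simplified model, not debugger code: the chain of intrinsic calls is flattened into a list of hypothetical DebugEvent records, and liveVariablesAt is an illustrative name. Open regions are kept on a stack, which is exactly why only nested scopes can be expressed.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for the debug intrinsic calls in a function body.
enum EventKind { FuncStart, StopPoint, DefineVar, RegionStart, RegionEnd };

struct DebugEvent {
  EventKind Kind;
  unsigned Line;      // used by StopPoint
  std::string Name;   // used by DefineVar
};

// Returns the variables in scope when execution reaches the stop point on
// StopLine, or an empty vector if no such stop point is found.
inline std::vector<std::string>
liveVariablesAt(const std::vector<DebugEvent> &Events, unsigned StopLine) {
  std::vector<std::vector<std::string> > Regions;  // stack of open regions
  for (std::vector<DebugEvent>::size_type I = 0; I < Events.size(); ++I) {
    const DebugEvent &E = Events[I];
    switch (E.Kind) {
    case FuncStart:
    case RegionStart:
      Regions.push_back(std::vector<std::string>());  // open an empty region
      break;
    case RegionEnd:
      if (!Regions.empty())
        Regions.pop_back();              // its variables go out of scope
      break;
    case DefineVar:
      if (!Regions.empty())
        Regions.back().push_back(E.Name);
      break;
    case StopPoint:
      if (E.Line == StopLine) {
        std::vector<std::string> Live;   // outermost region first
        for (std::vector<std::vector<std::string> >::size_type R = 0;
             R < Regions.size(); ++R)
          Live.insert(Live.end(), Regions[R].begin(), Regions[R].end());
        return Live;
      }
      break;
    }
  }
  return std::vector<std::string>();
}
```

Feeding it the events of the foo example in order (function start, stop point 2, X, stop point 3, Y, stop point 5, region start, Z, stop point 6, region end, stop point 8, region end) yields no live variables at line 2, X, Y and Z at line 6, and only X and Y at line 8.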

+ +

+It is worth noting that this scoping mechanism is used to control scoping of all +declarations, not just variable declarations. For example, the scope of a C++ +using declaration is controlled with this, and the llvm-db C++ support +routines could use this to change how name lookup is performed (though this is +not yet implemented). +

+ +
+ + + +
+ Object descriptor formats +
+ +
+

+The LLVM debugger expects the descriptors for global objects to start in a +canonical format, but the descriptors can include additional information +appended at the end. All LLVM debugging information is versioned, allowing +backwards compatibility in the case that the core structures need to change in +some way. The lowest-level descriptors are those describing the files containing the program source +code; all other descriptors refer to them. +

+
+ + + +
+ Representation of source files +
+ +
+

+Source file descriptors were roughly patterned after the Dwarf "compile_unit" +object. The descriptor currently is defined to have the following LLVM +type:

+ +

+%lldb.compile_unit = type {
+       ushort,               ;; LLVM debug version number
+       ushort,               ;; Dwarf language identifier
+       sbyte*,               ;; Filename
+       sbyte*,               ;; Working directory when compiled
+       sbyte*,               ;; Producer of the debug information
+       {}*                   ;; Anchor for llvm.dbg.translation_units
+}
+

+ +

+These descriptors contain the version number for the debug info, a source +language ID for the file (we use the Dwarf 3.0 ID numbers, such as +DW_LANG_C89, DW_LANG_C_plus_plus, DW_LANG_Cobol74, +etc), three strings describing the filename, working directory of the compiler, +and an identifier string for the compiler that produced it, and the anchor for the descriptor. Here is an example +descriptor: +

+ +

+%arraytest_source_file = internal constant %lldb.compile_unit {
+    ushort 0,                                                     ; Version #0
+    ushort 1,                                                     ; DW_LANG_C89
+    sbyte* getelementptr ([12 x sbyte]* %.str_1, long 0, long 0), ; filename
+    sbyte* getelementptr ([12 x sbyte]* %.str_2, long 0, long 0), ; working dir
+    sbyte* getelementptr ([12 x sbyte]* %.str_3, long 0, long 0), ; producer
+    {}* %llvm.dbg.translation_units                               ; Anchor
+}
+%.str_1 = internal constant [12 x sbyte] c"arraytest.c\00"
+%.str_2 = internal constant [12 x sbyte] c"/home/sabre\00"
+%.str_3 = internal constant [12 x sbyte] c"llvmgcc 3.4\00"
+
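For readers more comfortable with C++ than with LLVM assembly, the same fields can be mirrored in a plain struct. This is a hedged sketch rather than the debugger's actual data structure: CompileUnitDesc, languageName and fullPath are hypothetical names, the anchor pointer is omitted, and joining the working directory with a relative filename is an assumption about how a consumer might use the two strings. The language codes are the values assigned by the Dwarf standard.

```cpp
#include <string>

// C++ mirror of the %lldb.compile_unit fields (anchor pointer omitted).
struct CompileUnitDesc {
  unsigned short Version;    // LLVM debug version number
  unsigned short Language;   // Dwarf language identifier
  std::string Filename;
  std::string Directory;     // working directory when compiled
  std::string Producer;      // producer of the debug information
};

// Maps the Dwarf language codes named in the text to their identifiers.
inline const char *languageName(unsigned short Lang) {
  switch (Lang) {
  case 0x0001: return "DW_LANG_C89";
  case 0x0004: return "DW_LANG_C_plus_plus";
  case 0x0005: return "DW_LANG_Cobol74";
  default:     return "unknown";
  }
}

// Assumed convention: a relative filename is resolved against the
// working directory; an absolute filename is used as-is.
inline std::string fullPath(const CompileUnitDesc &CU) {
  if (!CU.Filename.empty() && CU.Filename[0] == '/')
    return CU.Filename;
  return CU.Directory + "/" + CU.Filename;
}
```

Applied to the example descriptor, the language code 1 decodes to DW_LANG_C89 and the two path strings combine to /home/sabre/arraytest.c.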

+ + +
+ + + +
+ Representation of global objects +
+ +
+

+The LLVM debugger needs to know about the source-language global objects in +order to build stack traces and perform other related activities. Because +source-languages have widely varying forms of global objects, the LLVM debugger +only expects the following fields in the descriptor for each global: +

+ +

+%lldb.global = type {
+       %lldb.compile_unit*,   ;; The translation unit containing the global
+       sbyte*,                ;; The global object 'name'
+       [type]*,               ;; Source-language type descriptor for global
+       {}*                    ;; The anchor for llvm.dbg.globals
+}
+

+ +

+The first field contains a pointer to the translation unit the function is +defined in. This pointer allows the debugger to find out which version of debug +information the function corresponds to. The second field contains a string +that the debugger can use to identify the subprogram if it does not contain +explicit support for the source-language in use. This should be some sort of +unmangled string that corresponds to the function somehow. +

+ +

+Note again that descriptors can be extended to include source-language-specific +information in addition to the fields required by the LLVM debugger. See the section on the C/C++ front-end for more +information. +

+
+ + + + +
+ Representation of local variables +
+ +
+

+

+
+ + + +
+ Other intrinsic functions +
+ +
+

+ +

+
+ + + + +
+ C/C++ front-end specific debug information +
+ +
+ +

+The C and C++ front-ends represent information about the program in a format +that is effectively identical to Dwarf 3.0 in terms of +information content. This allows code generators to trivially support native +debuggers by generating standard dwarf information, and contains enough +information for non-dwarf targets to translate it as needed.

+ +

+TODO: document extensions to standard debugging objects, document how we +represent source types, etc. +

+ +
+ + +
+ Object Descriptor Formats +
+ +
+

+ +

+
+ + + + +
+ + + + diff --git a/docs/venusflytrap.jpg b/docs/venusflytrap.jpg new file mode 100644 index 00000000000..59340ef3ee1 Binary files /dev/null and b/docs/venusflytrap.jpg differ