X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FSourceLevelDebugging.html;h=fab63046a484414672b10315527ce3707ef08d74;hb=f21f85a0a72d091caa13b59562147d4d152d1a5e;hp=8a47a88d9a12f2569fae879e8a0044687b54236e;hpb=4f5659e829463806ffc538fe34bfb7b1906f0982;p=oota-llvm.git diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html index 8a47a88d9a1..fab63046a48 100644 --- a/docs/SourceLevelDebugging.html +++ b/docs/SourceLevelDebugging.html @@ -2,6 +2,7 @@ "http://www.w3.org/TR/html4/strict.dtd">
+This document is the central repository for all information pertaining to -debug information in LLVM. It describes the user -interface for the llvm-db tool, which provides a -powerful source-level debugger -to users of LLVM-based compilers. It then describes the various components that make up the debugger and the -libraries which future clients may use. Finally, it describes the actual format that the LLVM debug information takes, -which is useful for those interested in creating front-ends or dealing directly -with the information.
+ debug information in LLVM. It describes the actual format + that the LLVM debug information takes, which is useful for those + interested in creating front-ends or dealing directly with the information. + Further, this document provides specific examples of what debug information + for C/C++ looks like. The idea of the LLVM debugging information is to capture how the important -pieces of the source-language's Abstract Syntax Tree map onto LLVM code. -Several design aspects have shaped the solution that appears here. The -important ones are:
+ pieces of the source-language's Abstract Syntax Tree map onto LLVM code. + Several design aspects have shaped the solution that appears here. The + important ones are:The approach used by the LLVM implementation is to use a small set of intrinsic functions to define a mapping -between LLVM program objects and the source-level objects. The description of -the source-level program is maintained in LLVM global variables in an implementation-defined format (the C/C++ front-end -currently uses working draft 7 of the Dwarf 3 standard).
+The approach used by the LLVM implementation is to use a small set + of intrinsic functions to define a + mapping between LLVM program objects and the source-level objects. The + description of the source-level program is maintained in LLVM global + variables in an implementation-defined format + (the C/C++ front-end currently uses working draft 7 of + the DWARF 3 + standard).
-When a program is debugged, the debugger interacts with the user and turns -the stored debug information into source-language specific information. As -such, the debugger must be aware of the source-language, and is thus tied to a -specific language of family of languages. The LLVM -debugger is designed to be modular in its support for source-languages.
+When a program is being debugged, a debugger interacts with the user and + turns the stored debug information into source-language specific information. + As such, a debugger must be aware of the source-language, and is thus tied to + a specific language or family of languages.
An extremely high priority of LLVM debugging information is to make it -interact well with optimizations and analysis. In particular, the LLVM debug -information provides the following guarantees:
- -The role of debug information is to provide meta information normally + stripped away during the compilation process. This meta information provides + an LLVM user a relationship between generated code and the original program + source code.
-Currently, debug information is consumed by the DwarfWriter to produce dwarf + information used by the gdb debugger. Other targets could use the same + information to produce stabs or other debug forms.
-It would also be reasonable to use debug information to feed profiling tools + for analysis of generated code, or, tools for reconstructing the original + source from generated code.
-Basically, the debug information allows you to compile a program with -"-O0 -g" and get full debug information, allowing you to arbitrarily -modify the program as it executes from the debugger. Compiling a program with -"-O3 -g" gives you full debug information that is always available and -accurate for reading (e.g., you get accurate stack traces despite tail call -elimination and inlining), but you might lose the ability to modify the program -and call functions where were optimized out of the program, or inlined away -completely.
+TODO - expound a bit more.
There are several important extensions that could be eventually added to the -LLVM debugger. The most important extension would be to upgrade the LLVM code -generators to support debugging information. This would also allow, for -example, the X86 code generator to emit native objects that contain debugging -information consumable by traditional source-level debuggers like GDB or -DBX.
-Additionally, LLVM optimizations can be upgraded to incrementally update the -debugging information, new commands can be added to the -debugger, and thread support could be added to the debugger.
+An extremely high priority of LLVM debugging information is to make it + interact well with optimizations and analysis. In particular, the LLVM debug + information provides the following guarantees:
-The "SourceLanguage" modules provided by llvm-db could be -substantially improved to provide good support for C++ language features like -namespaces and scoping rules.
+After working with the debugger for a while, perhaps the nicest improvement -would be to add some sort of line editor, such as GNU readline (but one that is -compatible with the LLVM license).
+Basically, the debug information allows you to compile a program with + "-O0 -g" and get full debug information, allowing you to arbitrarily + modify the program as it executes from a debugger. Compiling a program with + "-O3 -g" gives you full debug information that is always available + and accurate for reading (e.g., you get accurate stack traces despite tail + call elimination and inlining), but you might lose the ability to modify the + program and call functions which were optimized out of the program, or + inlined away completely.
+ +The LLVM test suite provides a + framework to test the optimizer's handling of debugging information. It can be + run like this:
+ ++% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level +% make TEST=dbgopt ++
For someone so inclined, it should be straight-forward to write different -front-ends for the LLVM debugger, as the LLVM debugging engine is cleanly -separated from the llvm-db front-end. A new LLVM GUI debugger or IDE -would be nice.
+This will test the impact of debugging information on optimization passes. If + debugging information influences optimization passes then it will be reported + as a failure. See TestingGuide for more + information on the LLVM test infrastructure and how to run various tests.
The llvm-db tool provides a GDB-like interface for source-level -debugging of programs. This tool provides many standard commands for inspecting -and modifying the program as it executes, loading new programs, single stepping, -placing breakpoints, etc. This section describes how to use the debugger.
+LLVM debugging information has been carefully designed to make it possible + for the optimizer to optimize the program and debugging information without + necessarily having to know anything about debugging information. In + particular, the global constant merging pass automatically eliminates + duplicated debugging information (often caused by header files), the global + dead code elimination pass automatically deletes debugging information for a + function if it decides to delete the function, and the linker eliminates + debug information when it merges linkonce functions.
+ +To do this, most of the debugging information (descriptors for types, + variables, functions, source files, etc) is inserted by the language + front-end in the form of LLVM global variables. These LLVM global variables + are no different from any other global variables, except that they have a web + of LLVM intrinsic functions that point to them. If the last references to a + particular piece of debugging information are deleted (for example, by the + -globaldce pass), the extraneous debug information will + automatically become dead and be removed by the optimizer.
+ +Debug information is designed to be agnostic about the target debugger and + debugging information representation (e.g. DWARF/Stabs/etc). It uses a + generic machine debug information pass to decode the information that + represents variables, types, functions, namespaces, etc: this allows for + arbitrary source-language semantics and type-systems to be used, as long as + there is a module written for the target debugger to interpret the + information. In addition, debug global variables are declared in + the "llvm.metadata" section. All values declared in this section + are stripped away after target debug information is constructed and before + the program object is emitted.
+ +To provide basic functionality, the LLVM debugger does have to make some + assumptions about the source-level language being debugged, though it keeps + these to a minimum. The only common features that the LLVM debugger assumes + exist are source files, + and program objects. These abstract + objects are used by a debugger to form stack traces, show information about + local variables, etc.
-llvm-db has been designed to be as similar to GDB in its user -interface as possible. This should make it extremely easy to learn -llvm-db if you already know GDB. In general, llvm-db -provides the subset of GDB commands that are applicable to LLVM debugging users. -If there is a command missing that make a reasonable amount of sense within the -limitations of llvm-db, please report it as -a bug or, better yet, submit a patch to add it.
+This section of the documentation first describes the representation aspects + common to any source-language. The next section + describes the data layout conventions used by the C and C++ front-ends.
llvm-db is designed to be modular and easy to extend. This -extensibility was key to getting the debugger up-and-running quickly, because we -can start with simple-but-unsophisicated implementations of various components. -Because of this, it is currently missing many features, though they should be -easy to add over time (patches welcomed!). The biggest inherent limitations of -llvm-db are currently due to extremely simple debugger backend (implemented in -"lib/Debugger/UnixLocalInferiorProcess.cpp") which is designed to work without -any cooperation from the code generators. Because it is so simple, it suffers -from the following inherent limitations:
+In consideration of the complexity and volume of debug information, LLVM + provides a specification for well formed debug global variables. The + constant value of each of these globals is one of a limited set of + structures, known as debug descriptors.
+ +Consumers of LLVM debug information expect the descriptors for program + objects to start in a canonical format, but the descriptors can include + additional information appended at the end that is source-language + specific. All LLVM debugging information is versioned, allowing backwards + compatibility in the case that the core structures need to change in some + way. Also, all debugging information objects start with a tag to indicate + what type of object it is. The source-language is allowed to define its own + objects, by using unreserved tag numbers. We recommend using tags in + the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base = + 0x1000).
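The recommended tag range for source-language objects can be sketched as a small check. This is only an illustration: the helper name is hypothetical, and treating the upper end 0x2000 as inclusive is an assumption, since the text does not say.

```python
DW_TAG_USER_BASE = 0x1000  # enum value named in the text
USER_TAG_LIMIT = 0x2000    # upper end of the recommended range (assumed inclusive)


def is_recommended_user_tag(tag):
    """Return True if tag lies in the range recommended for language-defined objects."""
    return DW_TAG_USER_BASE <= tag <= USER_TAG_LIMIT
```

A front-end defining its own descriptor kinds could run such a check before emitting a new tag, so that it does not collide with the standard DWARF tags used by the core descriptors.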
+ +The fields of debug descriptors used internally by LLVM (MachineModuleInfo) + are restricted to only the simple data types int, uint, + bool, float, double, i8* and + { }*. References to arbitrary values are handled using a + { }* and a cast to { }* expression; typically + references to other field descriptors, arrays of descriptors or global + variables.
+ ++%llvm.dbg.object.type = type { + uint, ;; A tag + ... +} ++
The details of the various descriptors follow.
-+%llvm.dbg.compile_unit.type = type { + i32, ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit) + { }*, ;; Compile unit anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*) + i32, ;; DWARF language identifier (ex. DW_LANG_C89) + i8*, ;; Source file name + i8*, ;; Source file directory (includes trailing slash) + i8*, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") + i1, ;; True if this is a main compile unit. + i1, ;; True if this is optimized. + i8*, ;; Flags + i32 ;; Runtime version +} ++
These descriptors contain a source language ID for the file (we use the DWARF + 3.0 ID numbers, such as DW_LANG_C89, DW_LANG_C_plus_plus, + DW_LANG_Cobol74, etc), three strings describing the filename, + working directory of the compiler, and an identifier string for the compiler + that produced it.
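The first field of the descriptor combines the DWARF tag with the debug version ("Tag = 17 + LLVMDebugVersion"). A minimal sketch of how a consumer might build and split that field follows. The concrete version value and the assumption that the version occupies the upper 16 bits are not stated in this document; both are labeled as assumptions in the code, and the function names are hypothetical.

```python
DW_TAG_compile_unit = 17          # value given in the descriptor comment
ASSUMED_DEBUG_VERSION = 6 << 16   # hypothetical value, not taken from this document
VERSION_MASK = 0xFFFF0000         # assumption: version lives in the upper 16 bits


def encode_tag(dwarf_tag, version=ASSUMED_DEBUG_VERSION):
    """Build the tag field as 'DWARF tag + debug version'."""
    return dwarf_tag + version


def decode_tag(field):
    """Split a tag field into (DWARF tag, debug version) under the same assumption."""
    return field & 0xFFFF, field & VERSION_MASK
```

Under these assumptions, `decode_tag(encode_tag(DW_TAG_compile_unit))` gives back the pair `(17, ASSUMED_DEBUG_VERSION)`, which is how a reader could check both the kind of a descriptor and its version.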
+ +Compile unit descriptors provide the root context for objects declared in a + specific source file. Global variables and top level functions would be + defined using this context. Compile unit descriptors also provide context + for source line correspondence.
+ +Each input file is encoded as a separate compile unit in LLVM debugging + information output. However, many target specific tool chains prefer to + encode only one compile unit in an object file. In this situation, the LLVM + code generator will include debugging information entities in the compile + unit that is marked as main compile unit. The code generator accepts maximum + one main compile unit per module. If a module does not contain any main + compile unit then the code generator will emit multiple compile units in the + output object file.
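The rule for main compile units can be sketched as follows. Compile units are modeled here as plain dictionaries with a hypothetical `is_main` key, and raising an error for more than one main unit is an assumption about how "accepts maximum one" might be enforced.

```python
def pick_main_compile_unit(units):
    """Return the single main compile unit, or None if the module has none.

    A None result corresponds to the case where multiple compile units
    are emitted in the output object file.
    """
    mains = [unit for unit in units if unit["is_main"]]
    if len(mains) > 1:
        # Assumption: more than one main compile unit is treated as an error.
        raise ValueError("at most one main compile unit is allowed per module")
    return mains[0] if mains else None
```

For example, a module built from `a.c` (main) and `a.h` (not main) would yield the `a.c` unit, and all debugging entities would be placed in it.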
-That said, the debugger is still quite useful, and all of these limitations -can be eliminated by integrating support for the debugger into the code -generators, and writing a new InferiorProcess -subclass to use it. See the future work section for ideas -of how to extend the LLVM debugger despite these limitations.
++%llvm.dbg.global_variable.type = type { + i32, ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable) + { }*, ;; Global variable anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.global_variables to { }*), + { }*, ;; Reference to context descriptor + i8*, ;; Name + i8*, ;; Display name (fully qualified C++ name) + i8*, ;; MIPS linkage name (for C++) + { }*, ;; Reference to compile unit where defined + i32, ;; Line number where defined + { }*, ;; Reference to type descriptor + i1, ;; True if the global is local to compile unit (static) + i1, ;; True if the global is defined in the compile unit (not extern) + { }* ;; Reference to the global variable +} +
These descriptors provide debug information about global variables. They +provide details such as name, type and where the variable is defined.
+ +TODO: this is obviously lame, when more is implemented, this can be much -better.
- +-$ llvm-db funccall -llvm-db: The LLVM source-level debugger -Loading program... successfully loaded 'funccall.bc'! -(llvm-db) create -Starting program: funccall.bc -main at funccall.c:9:2 -9 -> q = 0; -(llvm-db) list main -4 void foo() { -5 int t = q; -6 q = t + 1; -7 } -8 int main() { -9 -> q = 0; -10 foo(); -11 q = q - 1; -12 -13 return q; -(llvm-db) list -14 } -(llvm-db) step -10 -> foo(); -(llvm-db) s -foo at funccall.c:5:2 -5 -> int t = q; -(llvm-db) bt -#0 -> 0x85ffba0 in foo at funccall.c:5:2 -#1 0x85ffd98 in main at funccall.c:10:2 -(llvm-db) finish -main at funccall.c:11:2 -11 -> q = q - 1; -(llvm-db) s -13 -> return q; -(llvm-db) s -The program stopped with exit code 0 -(llvm-db) quit -$ +%llvm.dbg.subprogram.type = type { + i32, ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram) + { }*, ;; Subprogram anchor = cast (%llvm.dbg.anchor.type* %llvm.dbg.subprograms to { }*), + { }*, ;; Reference to context descriptor + i8*, ;; Name + i8*, ;; Display name (fully qualified C++ name) + i8*, ;; MIPS linkage name (for C++) + { }*, ;; Reference to compile unit where defined + i32, ;; Line number where defined + { }*, ;; Reference to type descriptor + i1, ;; True if the global is local to compile unit (static) + i1 ;; True if the global is defined in the compile unit (not extern) +}+
These descriptors provide debug information about functions, methods and + subprograms. They provide details such as name, return types and the source + location where the subprogram is defined.
+ ++%llvm.dbg.block = type { + i32, ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block) + { }* ;; Reference to context descriptor +} +
These descriptors provide debug information about nested blocks within a + subprogram. The array of member descriptors is used to define local + variables and deeper nested blocks.
+There are three ways to start up the llvm-db debugger:
++%llvm.dbg.basictype.type = type { + i32, ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type) + { }*, ;; Reference to context (typically a compile unit) + i8*, ;; Name (may be "" for anonymous types) + { }*, ;; Reference to compile unit where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + i32 ;; DWARF type encoding +} ++
When run with no options, just llvm-db, the debugger starts up -without a program loaded at all. You must use the file command to load a program, and the set args or run -commands to specify the arguments for the program.
+These descriptors define primitive types used in the code. Examples are int, bool + and float. The context provides the scope of the type, which is usually the + top level. Since basic types are not usually user defined, the compile unit + and line number can be left as NULL and 0. The size, alignment and offset + are expressed in bits and can be 64 bit values. The alignment is used to + round the offset when embedded in a + composite type (for example, to keep float + doubles on 64 bit boundaries). The offset is the bit offset if embedded in + a composite type.
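The rounding of an offset to the alignment can be illustrated with a short sketch. The function name is hypothetical; all quantities are in bits, as in the descriptor.

```python
def align_offset(offset_bits, align_bits):
    """Round offset_bits up to the next multiple of align_bits (both in bits)."""
    if align_bits <= 0:
        # No alignment requirement recorded: leave the offset unchanged.
        return offset_bits
    return (offset_bits + align_bits - 1) // align_bits * align_bits
```

For a structure holding a 32 bit int followed by a double with 64 bit alignment, the double cannot start at bit 32; `align_offset(32, 64)` places it at bit 64.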
-If you start the debugger with one argument, as llvm-db -<program>, the debugger will start up and load in the specified -program. You can then optionally specify arguments to the program with the set args or run -commands.
+The type encoding provides the details of the type. The values are typically + one of the following:
-The third way to start the program is with the --args option. This -option allows you to specify the program to load and the arguments to start out -with. Example use: llvm-db --args ls /home
++DW_ATE_address = 1 +DW_ATE_boolean = 2 +DW_ATE_float = 4 +DW_ATE_signed = 5 +DW_ATE_signed_char = 6 +DW_ATE_unsigned = 7 +DW_ATE_unsigned_char = 8 ++
FIXME: this needs work obviously. See the GDB documentation for -information about what these do, or try 'help [command]' within -llvm-db to get information.
++%llvm.dbg.derivedtype.type = type { + i32, ;; Tag (see below) + { }*, ;; Reference to context + i8*, ;; Name (may be "" for anonymous types) + { }*, ;; Reference to compile unit where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i32, ;; Size in bits + i32, ;; Alignment in bits + i32, ;; Offset in bits + { }* ;; Reference to type derived from +} ++
-
These descriptors are used to define types derived from other types. The +value of the tag varies depending on the meaning. The following are possible +tag values:
-+DW_TAG_formal_parameter = 5 +DW_TAG_member = 13 +DW_TAG_pointer_type = 15 +DW_TAG_reference_type = 16 +DW_TAG_typedef = 22 +DW_TAG_const_type = 38 +DW_TAG_volatile_type = 53 +DW_TAG_restrict_type = 55 ++
DW_TAG_member is used to define a member of + a composite type + or subprogram. The type of the member is + the derived + type. DW_TAG_formal_parameter is used to define a member which + is a formal argument of a subprogram.
+DW_TAG_typedef is used to provide a name for the derived type.
-DW_TAG_pointer_type, DW_TAG_reference_type, + DW_TAG_const_type, DW_TAG_volatile_type + and DW_TAG_restrict_type are used to qualify + the derived type.
-Derived type location can be determined + from the compile unit and line number. The size, alignment and offset are + expressed in bits and can be 64 bit values. The alignment is used to round + the offset when embedded in a composite + type (example to keep float doubles on 64 bit boundaries.) The offset is + the bit offset if embedded in a composite + type.
+ +Note that the void * type is expressed as a + llvm.dbg.derivedtype.type with tag of DW_TAG_pointer_type + and NULL derived type.
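How derived type descriptors chain together can be sketched with a small model. The classes and the `describe` helper below are hypothetical and only mirror the "tag" and "type derived from" fields; `None` stands in for a NULL derived type.

```python
from dataclasses import dataclass
from typing import Optional

DW_TAG_pointer_type = 15
DW_TAG_reference_type = 16
DW_TAG_typedef = 22
DW_TAG_const_type = 38
DW_TAG_volatile_type = 53
DW_TAG_restrict_type = 55

QUALIFIERS = {
    DW_TAG_const_type: "const",
    DW_TAG_volatile_type: "volatile",
    DW_TAG_restrict_type: "restrict",
}


@dataclass
class BasicType:
    name: str


@dataclass
class DerivedType:
    tag: int
    base: Optional[object]  # None models a NULL derived type
    name: str = ""


def describe(ty):
    """Render a descriptor chain as a C-like type string."""
    if ty is None:
        return "void"          # NULL derived type, as in the void * note
    if isinstance(ty, BasicType):
        return ty.name
    if ty.tag == DW_TAG_typedef:
        return ty.name         # a typedef supplies its own name
    inner = describe(ty.base)
    if ty.tag == DW_TAG_pointer_type:
        return inner + " *"
    if ty.tag == DW_TAG_reference_type:
        return inner + " &"
    if ty.tag in QUALIFIERS:
        return inner + " " + QUALIFIERS[ty.tag]
    raise ValueError("tag %d is not handled in this sketch" % ty.tag)
```

In this model `DerivedType(DW_TAG_pointer_type, None)` describes `void *`, and a pointer to a const-qualified `char` is a pointer descriptor whose derived type is a const descriptor whose derived type is the basic type.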
The LLVM debugger is built out of three distinct layers of software. These -layers provide clients with different interface options depending on what pieces -of they want to implement themselves, and it also promotes code modularity and -good design. The three layers are the Debugger -interface, the "info" interfaces, and the llvm-db tool itself.
+ ++%llvm.dbg.compositetype.type = type { + i32, ;; Tag (see below) + { }*, ;; Reference to context + i8*, ;; Name (may be "" for anonymous types) + { }*, ;; Reference to compile unit where defined (may be NULL) + i32, ;; Line number where defined (may be 0) + i64, ;; Size in bits + i64, ;; Alignment in bits + i64, ;; Offset in bits + i32, ;; Flags + { }*, ;; Reference to type derived from + { }*, ;; Reference to array of member descriptors + i32 ;; Runtime languages +} ++
These descriptors are used to define types that are composed of 0 or more +elements. The value of the tag varies depending on the meaning. The following +are possible tag values:
+ ++DW_TAG_array_type = 1 +DW_TAG_enumeration_type = 4 +DW_TAG_structure_type = 19 +DW_TAG_union_type = 23 +DW_TAG_vector_type = 259 +DW_TAG_subroutine_type = 21 +DW_TAG_inheritance = 28 ++
The vector flag indicates that an array type is a native packed vector.
+ +The members of array types (tag = DW_TAG_array_type) or vector types + (tag = DW_TAG_vector_type) are subrange + descriptors, each representing the range of subscripts at that level of + indexing.
+ +The members of enumeration types (tag = DW_TAG_enumeration_type) are + enumerator descriptors, each representing + the definition of an enumeration value for the set.
+ +The members of structure (tag = DW_TAG_structure_type) or union (tag + = DW_TAG_union_type) types are any one of + the basic, + derived + or composite type descriptors, each + representing a field member of the structure or union.
+ +For C++ classes (tag = DW_TAG_structure_type), member descriptors + provide information about base classes, static members and member + functions. If a member is a derived type + descriptor and has a tag of DW_TAG_inheritance, then the type + represents a base class. If the member is + a global variable descriptor then it + represents a static member. And, if the member is + a subprogram descriptor then it represents + a member function. For static members and member + functions, getName() returns the member's link name or the C++ mangled + name. getDisplayName() returns the simplified version of the name.
+ +The first member of subroutine (tag = DW_TAG_subroutine_type) type + elements is the return type for the subroutine. The remaining elements are + the formal arguments to the subroutine.
+ +Composite type location can be + determined from the compile unit and line number. The size, alignment and + offset are expressed in bits and can be 64 bit values. The alignment is used + to round the offset when embedded in + a composite type (as an example, to keep + float doubles on 64 bit boundaries.) The offset is the bit offset if embedded + in a composite type.
+The Debugger class (defined in the include/llvm/Debugger/ directory) -is a low-level class which is used to maintain information about the loaded -program, as well as start and stop the program running as necessary. This class -does not provide any high-level analysis or control over the program, only -exposing simple interfaces like load/unloadProgram, -create/killProgram, step/next/finish/contProgram, and -low-level methods for installing breakpoints.
- --The Debugger class is itself a wrapper around the lowest-level InferiorProcess -class. This class is used to represent an instance of the program running under -debugger control. The InferiorProcess class can be implemented in different -ways for different targets and execution scenarios (e.g., remote debugging). -The InferiorProcess class exposes a small and simple collection of interfaces -which are useful for inspecting the current state of the program (such as -collecting stack trace information, reading the memory image of the process, -etc). The interfaces in this class are designed to be as low-level and simple -as possible, to make it easy to create new instances of the class. -
- --The Debugger class exposes the currently active instance of InferiorProcess -through the Debugger::getRunningProcess method, which returns a -const reference to the class. This means that clients of the Debugger -class can only inspect the running instance of the program directly. To -change the executing process in some way, they must use the interces exposed by -the Debugger class. -
+ ++%llvm.dbg.subrange.type = type { + i32, ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type) + i64, ;; Low value + i64 ;; High value +} ++
These descriptors are used to define ranges of array subscripts for an array + composite type. The low value defines + the lower bound, typically zero for C/C++. The high value is the upper + bound. Values are 64 bit. High - low + 1 is the size of the array. If low + == high the array will be unbounded.
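The size rule for a subrange can be written out as a short sketch. The function name is hypothetical, and `None` is used here to represent an unbounded array.

```python
def subrange_element_count(low, high):
    """Return the number of elements for one level of indexing.

    Follows the rule in the text: size is high - low + 1, and
    low == high marks an unbounded array (returned as None).
    """
    if low == high:
        return None
    return high - low + 1
```

A C declaration such as `int a[10]` would carry low = 0 and high = 9, giving 10 elements.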
+-The next-highest level of debugger abstraction is provided through the -ProgramInfo, RuntimeInfo, SourceLanguage and related classes (also defined in -the include/llvm/Debugger/ directory). These classes efficiently -decode the debugging information and low-level interfaces exposed by -InferiorProcess into a higher-level representation, suitable for analysis by the -debugger. -
- --The ProgramInfo class exposes a variety of different kinds of information about -the program objects in the source-level-language. The SourceFileInfo class -represents a source-file in the program (e.g. a .cpp or .h file). The -SourceFileInfo class captures information such as which SourceLanguage was used -to compile the file, where the debugger can get access to the actual file text -(which is lazily loaded on demand), etc. The SourceFunctionInfo class -represents a... FIXME: finish. The ProgramInfo class provides interfaces -to lazily find and decode the information needed to create the Source*Info -classes requested by the debugger. -
- --The RuntimeInfo class exposes information about the currently executed program, -by decoding information from the InferiorProcess and ProgramInfo classes. It -provides a StackFrame class which provides an easy-to-use interface for -inspecting the current and suspended stack frames in the program. -
- --The SourceLanguage class is an abstract interface used by the debugger to -perform all source-language-specific tasks. For example, this interface is used -by the ProgramInfo class to decode language-specific types and functions and by -the debugger front-end (such as llvm-db to -evaluate source-langauge expressions typed into the debugger. This class uses -the RuntimeInfo & ProgramInfo classes to get information about the current -execution context and the loaded program, respectively. -
+ ++%llvm.dbg.enumerator.type = type { + i32, ;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator) + i8*, ;; Name + i64 ;; Value +} ++
These descriptors are used to define members of an + enumeration composite type; each + associates a name with its value.
-The llvm-db is designed to be a debugger providing an interface as similar to GDB as reasonable, but no more so than that. -Because the Debugger and info classes implement all of the heavy lifting and -analysis, llvm-db (which lives in llvm/tools/llvm-db) consists -mainly of of code to interact with the user and parse commands. The CLIDebugger -constructor registers all of the builtin commands for the debugger, and each -command is implemented as a CLIDebugger::[name]Command method. -
+ ++%llvm.dbg.variable.type = type { + i32, ;; Tag (see below) + { }*, ;; Context + i8*, ;; Name + { }*, ;; Reference to compile unit where defined + i32, ;; Line number where defined + { }* ;; Type descriptor +} ++
These descriptors are used to define variables local to a subprogram. The + value of the tag depends on the usage of the variable:
+ ++DW_TAG_auto_variable = 256 +DW_TAG_arg_variable = 257 +DW_TAG_return_variable = 258 +
An auto variable is any variable declared in the body of the function. An + argument variable is any variable that appears as a formal argument to the + function. A return variable is used to track the result of a function and + has no source correspondent.
+ +The context is either the subprogram or block where the variable is defined. + Name is the source variable name. Compile unit and line indicate where the + variable was defined. Type descriptor defines the declared type of the + variable.
+ +-FIXME: this section will eventually go away. These are notes to myself of -things that should be implemented, but haven't yet. -
- --Breakpoints: Support is already implemented in the 'InferiorProcess' -class, though it hasn't been tested yet. To finish breakpoint support, we need -to implement breakCommand (which should reuse the linespec parser from the list -command), and handle the fact that 'break foo' or 'break file.c:53' may insert -multiple breakpoints. Also, if you say 'break file.c:53' and there is no -stoppoint on line 53, the breakpoint should go on the next available line. My -idea was to have the Debugger class provide a "Breakpoint" class which -encapsulated this messiness, giving the debugger front-end a simple interface. -The debugger front-end would have to map the really complex semantics of -temporary breakpoints and 'conditional' breakpoints onto this intermediate -level. Also, breakpoints should survive as much as possible across program -reloads. -
- --UnixLocalInferiorProcess.cpp speedup: There is no reason for the debugged -process to code gen the globals corresponding to debug information. The -IntrinsicLowering object could instead change descriptors into constant expr -casts of the constant address of the LLVM objects for the descriptors. This -would also allow us to eliminate the mapping back and forth between physical -addresses that must be done.
- --Process deaths: The InferiorProcessDead exception should be extended to -know "how" a process died, i.e., it was killed by a signal. This is easy to -collect in the UnixLocalInferiorProcess, we just need to represent it.
+LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to + provide debug information at various points in generated code.
+ void %llvm.dbg.stoppoint( uint, uint, { }* ) +-
LLVM debugging information has been carefully designed to make it possible -for the optimizer to optimize the program and debugging information without -necessarily having to know anything about debugging information. In particular, -the global constant merging pass automatically eliminates duplicated debugging -information (often caused by header files), the global dead code elimination -pass automatically deletes debugging information for a function if it decides to -delete the function, and the linker eliminates debug information when it merges -linkonce functions.
+This intrinsic is used to provide correspondence between the source file and + the generated code. The first argument is the line number (base 1), the second + argument is the column number (0 if unknown) and the third argument is the + source %llvm.dbg.compile_unit* + cast to a { }*. Code following a call to this intrinsic will + have been defined in close proximity of the line, column and file. This + information holds until the next call + to %llvm.dbg.stoppoint.
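The "holds until the next call" rule can be sketched as a pass over an instruction stream. The representation is hypothetical: stop points are modeled as `("stoppoint", line, column, unit)` tuples and every other instruction as a plain string.

```python
def attribute_locations(stream):
    """Pair each instruction with the most recent stop point before it.

    Instructions seen before any stop point get None as their location.
    """
    current = None
    located = []
    for item in stream:
        if isinstance(item, tuple) and item[0] == "stoppoint":
            current = item[1:]          # (line, column, compile unit)
        else:
            located.append((item, current))
    return located
```

For example, in a stream with a stop point for line 9 followed by a store, then a stop point for line 10 followed by a call, the store is attributed to line 9 and the call to line 10.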
-To do this, most of the debugging information (descriptors for types, -variables, functions, source files, etc) is inserted by the language front-end -in the form of LLVM global variables. These LLVM global variables are no -different from any other global variables, except that they have a web of LLVM -intrinsic functions that point to them. If the last references to a particular -piece of debugging information are deleted (for example, by the --globaldce pass), the extraneous debug information will automatically -become dead and be removed by the optimizer.
- -The debugger is designed to be agnostic about the contents of most of the -debugging information. It uses a source-language-specific -module to decode the information that represents variables, types, -functions, namespaces, etc: this allows for arbitrary source-language semantics -and type-systems to be used, as long as there is a module written for the -debugger to interpret the information.
+To provide basic functionality, the LLVM debugger does have to make some -assumptions about the source-level language being debugged, though it keeps -these to a minimum. The only common features that the LLVM debugger assumes -exist are source files, and program objects. These abstract objects are -used by the debugger to form stack traces, show information about local -variables, etc.
+ + -This section of the documentation first describes the representation aspects -common to any source-language. The next section -describes the data layout conventions used by the C and C++ front-ends.
++ void %llvm.dbg.func.start( { }* ) ++ +
This intrinsic is used to link the debug information + in %llvm.dbg.subprogram to the + function. It defines the beginning of the function's declarative region + (scope). It also implies a call to + %llvm.dbg.stoppoint which + defines a source line "stop point". The intrinsic should be called early in + the function, after all the alloca instructions. It should be paired off + with a closing + %llvm.dbg.region.end. + The function's single argument is + the %llvm.dbg.subprogram.type.
One important aspect of the LLVM debug representation is that it allows the -LLVM debugger to efficiently index all of the global objects without having the -scan the program. To do this, all of the global objects use "anchor" globals of -type "{}", with designated names. These anchor objects obviously do -not contain any content or meaning by themselves, but all of the global objects -of a particular type (e.g., source file descriptors) contain a pointer to the -anchor. This pointer allows the debugger to use def-use chains to find all -global objects of that type.
- -So far, the following names are recognized as anchors by the LLVM -debugger:
++ void %llvm.dbg.region.start( { }* ) ++ +
This intrinsic is used to define the beginning of a declarative scope (ex. + block) for local language elements. It should be paired off with a closing + %llvm.dbg.region.end. The + function's single argument is + the %llvm.dbg.block which is + starting.
+ +- %llvm.dbg.translation_units = linkonce global {} {} - %llvm.dbg.globals = linkonce global {} {} + void %llvm.dbg.region.end( { }* )-
Using anchors in this way (where the source file descriptor points to the -anchors, as opposed to having a list of source file descriptors) allows for the -standard dead global elimination and merging passes to automatically remove -unused debugging information. If the globals were kept track of through lists, -there would always be an object pointing to the descriptors, thus would never be -deleted.
+This intrinsic is used to define the end of a declarative scope (ex. block) + for local language elements. It should be paired off with an + opening %llvm.dbg.region.start + or %llvm.dbg.func.start. + The function's single argument is either + the %llvm.dbg.block or + the %llvm.dbg.subprogram.type + which is ending.
+ ++ void %llvm.dbg.declare( { } *, { }* ) ++ +
This intrinsic provides information about a local element (ex. variable.) The + first argument is the alloca for the variable, cast to a { }*. The + second argument is + the %llvm.dbg.variable containing + the description of the variable, also cast to a { }*.
LLVM debugger "stop points" are a key part of the debugging representation -that allows the LLVM to maintain simple semantics for debugging optimized code. The basic idea is that the -front-end inserts calls to the %llvm.dbg.stoppoint intrinsic function -at every point in the program where the debugger should be able to inspect the -program (these correspond to places the debugger stops when you "step" -through it). The front-end can choose to place these as fine-grained as it -would like (for example, before every subexpression evaluated), but it is -recommended to only put them after every source statement that includes -executable code.
+ that allows the LLVM to maintain simple semantics + for debugging optimized code. The basic idea is that + the front-end inserts calls to + the %llvm.dbg.stoppoint + intrinsic function at every point in the program where a debugger should be + able to inspect the program (these correspond to places a debugger stops when + you "step" through it). The front-end can choose to place these as + fine-grained as it would like (for example, before every subexpression + evaluated), but it is recommended to only put them after every source + statement that includes executable code.Using calls to this intrinsic function to demark legal points for the -debugger to inspect the program automatically disables any optimizations that -could potentially confuse debugging information. To non-debug-information-aware -transformations, these calls simply look like calls to an external function, -which they must assume to do anything (including reading or writing to any part -of reachable memory). On the other hand, it does not impact many optimizations, -such as code motion of non-trapping instructions, nor does it impact -optimization of subexpressions, code duplication transformations, or basic-block -reordering transformations.
- -An important aspect of the calls to the %llvm.dbg.stoppoint -intrinsic is that the function-local debugging information is woven together -with use-def chains. This makes it easy for the debugger to, for example, -locate the 'next' stop point. For a concrete example of stop points, see the -example in the next section.
+ debugger to inspect the program automatically disables any optimizations that + could potentially confuse debugging information. To + non-debug-information-aware transformations, these calls simply look like + calls to an external function, which they must assume to do anything + (including reading or writing to any part of reachable memory). On the other + hand, it does not impact many optimizations, such as code motion of + non-trapping instructions, nor does it impact optimization of subexpressions, + code duplication transformations, or basic-block reordering + transformations.In many languages, the local variables in functions can have their lifetime -or scope limited to a subset of a function. In the C family of languages, for -example, variables are only live (readable and writable) within the source block -that they are defined in. In functional languages, values are only readable -after they have been defined. Though this is a very obvious concept, it is also -non-trivial to model in LLVM, because it has no notion of scoping in this sense, -and does not want to be tied to a language's scoping rules.
+ or scope limited to a subset of a function. In the C family of languages, + for example, variables are only live (readable and writable) within the + source block that they are defined in. In functional languages, values are + only readable after they have been defined. Though this is a very obvious + concept, it is also non-trivial to model in LLVM, because it has no notion of + scoping in this sense, and does not want to be tied to a language's scoping + rules.In order to handle this, the LLVM debug format uses the notion of "regions" -of a function, delineated by calls to intrinsic functions. These intrinsic -functions define new regions of the program and indicate when the region -lifetime expires. Consider the following C fragment, for example:
+ of a function, delineated by calls to intrinsic functions. These intrinsic + functions define new regions of the program and indicate when the region + lifetime expires. Consider the following C fragment, for example: +1. void foo() { 2. int X = ...; @@ -763,352 +941,931 @@ lifetime expires. Consider the following C fragment, for example: 8. ... 9. }+
Compiled to LLVM, this function would be represented like this (FIXME: CHECK -AND UPDATE THIS):
+Compiled to LLVM, this function would be represented like this:
+void %foo() { +entry: %X = alloca int %Y = alloca int %Z = alloca int - %D1 = call {}* %llvm.dbg.func.start(%lldb.global* %d.foo) - %D2 = call {}* %llvm.dbg.stoppoint({}* %D1, uint 2, uint 2, %lldb.compile_unit* %file) - - %D3 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D2, ...) + + ... + + call void @llvm.dbg.func.start( %llvm.dbg.subprogram.type* @llvm.dbg.subprogram ) + + call void @llvm.dbg.stoppoint( uint 2, uint 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit ) + + call void @llvm.dbg.declare({}* %X, ...) + call void @llvm.dbg.declare({}* %Y, ...) + ;; Evaluate expression on line 2, assigning to X. - %D4 = call {}* %llvm.dbg.stoppoint({}* %D3, uint 3, uint 2, %lldb.compile_unit* %file) - - %D5 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D4, ...) + + call void @llvm.dbg.stoppoint( uint 3, uint 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit ) + ;; Evaluate expression on line 3, assigning to Y. - %D6 = call {}* %llvm.dbg.stoppoint({}* %D5, uint 5, uint 4, %lldb.compile_unit* %file) - - %D7 = call {}* %llvm.region.start({}* %D6) - %D8 = call {}* %llvm.dbg.DEFINEVARIABLE({}* %D7, ...) + + call void @llvm.dbg.region.start() + call void @llvm.dbg.stoppoint( uint 5, uint 4, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit ) + call void @llvm.dbg.declare({}* %Z, ...) + ;; Evaluate expression on line 5, assigning to Z. - %D9 = call {}* %llvm.dbg.stoppoint({}* %D8, uint 6, uint 4, %lldb.compile_unit* %file) - - ;; Code for line 6. - %D10 = call {}* %llvm.region.end({}* %D9) - %D11 = call {}* %llvm.dbg.stoppoint({}* %D10, uint 8, uint 2, %lldb.compile_unit* %file) - - ;; Code for line 8. - %D12 = call {}* %llvm.region.end({}* %D11) + + call void @llvm.dbg.stoppoint( uint 7, uint 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit ) + call void @llvm.dbg.region.end() + + call void @llvm.dbg.stoppoint( uint 9, uint 2, %llvm.dbg.compile_unit* @llvm.dbg.compile_unit ) + + call void @llvm.dbg.region.end() + ret void }
This example illustrates a few important details about the LLVM debugging -information. In particular, it shows how the various intrinsics used are woven -together with def-use and use-def chains, similar to how anchors are used with globals. This allows -the debugger to analyze the relationship between statements, variable -definitions, and the code used to implement the function.
- -In this example, two explicit regions are defined, one with the definition of the %D1 variable and one with the -definition of %D7. In the case of -%D1, the debug information indicates that the function whose descriptor is specified as an argument to the -intrinsic. This defines a new stack frame whose lifetime ends when the region -is ended by the %D12 call.
+ information. In particular, it shows how the various intrinsics are applied + together to allow a debugger to analyze the relationship between statements, + variable definitions, and the code used to implement the function. + +The first + intrinsic %llvm.dbg.func.start + provides a link with the subprogram + descriptor containing the details of this function. This call also + defines the beginning of the function region, bounded by + the %llvm.dbg.region.end at the + end of the function. This region is used to bracket the lifetime of + variables declared within. For a function, this outer region defines a new + stack frame whose lifetime ends when the region is ended.
+ +It is possible to define inner regions for short-lived variables by using the + %llvm.dbg.region.start + and %llvm.dbg.region.end to + bound a region. The inner region in this example would be for the block + containing the declaration of Z.
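The nesting behaviour described here can be sketched as a stack of regions. The following C++ is only an illustration of the idea, not LLVM code: the `ScopeTracker` class and its method names are hypothetical, and it assumes a consumer sees the region and declare intrinsics in program order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM): tracks which source variables are
// visible as region markers and declarations are met in program order.
class ScopeTracker {
public:
  // Models llvm.dbg.func.start or llvm.dbg.region.start: open a new region.
  void regionStart() { Regions.emplace_back(); }

  // Models llvm.dbg.declare: the variable belongs to the innermost region.
  void declare(const std::string &Name) {
    assert(!Regions.empty() && "declare outside of any region");
    Regions.back().push_back(Name);
  }

  // Models llvm.dbg.region.end: everything declared in the innermost region
  // goes out of scope.
  void regionEnd() {
    assert(!Regions.empty() && "unbalanced region end");
    Regions.pop_back();
  }

  // Names visible at the current point, outermost region first.
  std::vector<std::string> visible() const {
    std::vector<std::string> Out;
    for (const auto &R : Regions)
      Out.insert(Out.end(), R.begin(), R.end());
    return Out;
  }

private:
  std::vector<std::vector<std::string>> Regions;
};
```

For the foo example, opening the function region, declaring X and Y, opening the inner region and declaring Z makes all three names visible; closing the inner region leaves only X and Y. Because regions are kept on a stack, partially overlapping scopes cannot be expressed, which matches the nesting restriction in the text.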
Using regions to represent the boundaries of source-level functions allow -LLVM interprocedural optimizations to arbitrarily modify LLVM functions without -having to worry about breaking mapping information between the LLVM code and the -and source-level program. In particular, the inliner requires no modification -to support inlining with debugging information: there is no explicit correlation -drawn between LLVM functions and their source-level counterparts (note however, -that if the inliner inlines all instances of a non-strong-linkage function into -its caller that it will not be possible for the user to manually invoke the -inlined function from the debugger).
- -Once the function has been defined, the stopping point corresponding to line #2 of -the function is encountered. At this point in the function, no local -variables are live. As lines 2 and 3 of the example are executed, their -variable definitions are automatically introduced into the program, without the -need to specify a new region. These variables do not require new regions to be -introduced because they go out of scope at the same point in the program: line -9.
+ LLVM interprocedural optimizations to arbitrarily modify LLVM functions + without having to worry about breaking mapping information between the LLVM + code and the source-level program. In particular, the inliner requires + no modification to support inlining with debugging information: there is no + explicit correlation drawn between LLVM functions and their source-level + counterparts (note however, that if the inliner inlines all instances of a + non-strong-linkage function into its caller that it will not be possible for + the user to manually invoke the inlined function from a debugger). + +Once the function has been defined, + the stopping point + corresponding to line #2 (column #2) of the function is encountered. At this + point in the function, no local variables are live. As lines 2 and 3 + of the example are executed, their variable definitions are introduced into + the program using + %llvm.dbg.declare, without the + need to specify a new region. These variables do not require new regions to + be introduced because they go out of scope at the same point in the program: + line 9.
In contrast, the Z variable goes out of scope at a different time, -on line 7. For this reason, it is defined within the -%D7 region, which kills the availability of Z before the -code for line 8 is executed. In this way, regions can support arbitrary -source-language scoping rules, as long as they can only be nested (ie, one scope -cannot partially overlap with a part of another scope).
+ on line 7. For this reason, it is defined within the inner region, which + kills the availability of Z before the code for line 8 is executed. + In this way, regions can support arbitrary source-language scoping rules, as + long as they can only be nested (i.e., one scope cannot partially overlap with + a part of another scope).It is worth noting that this scoping mechanism is used to control scoping of -all declarations, not just variable declarations. For example, the scope of a -C++ using declaration is controlled with this, and the llvm-db C++ -support routines could use this to change how name lookup is performed (though -this is not implemented yet).
+ all declarations, not just variable declarations. For example, the scope of + a C++ using declaration is controlled with this and could change how name + lookup is performed. + +The C and C++ front-ends represent information about the program in a format + that is effectively identical + to DWARF 3.0 in + terms of information content. This allows code generators to trivially + support native debuggers by generating standard dwarf information, and + contains enough information for non-dwarf targets to translate it as + needed.
+ +This section describes the forms used to represent C and C++ programs. Other + languages could pattern themselves after this (which itself is tuned to + representing programs in the same way that DWARF 3 does), or they could + choose to provide completely different forms if they don't fit into the DWARF + model. As support for debugging information gets added to the various LLVM + source-language front-ends, the information used should be documented + here.
+ +The following sections provide examples of various C/C++ constructs and the + debug information that would best describe those constructs.
The LLVM debugger expects the descriptors for program objects to start in a -canonical format, but the descriptors can include additional information -appended at the end that is source-language specific. All LLVM debugging -information is versioned, allowing backwards compatibility in the case that the -core structures need to change in some way. Also, all debugging information -objects start with a tag to indicate what type -of object it is. The source-language is allows to define its own objects, by -using unreserved tag numbers.
-The lowest-level descriptor are those describing the files containing the program source -code, as most other descriptors (sometimes indirectly) refer to them. -
+Given the source files MySource.cpp and MyHeader.h located + in the directory /Users/mine/sources, the following code:
+ ++#include "MyHeader.h" + +int main(int argc, char *argv[]) { + return 0; +} +
a C/C++ front-end would generate the following descriptors:
+ ++... +;; +;; Define types used. In this case we need one for compile unit anchors and one +;; for compile units. +;; +%llvm.dbg.anchor.type = type { uint, uint } +%llvm.dbg.compile_unit.type = type { uint, { }*, uint, uint, i8*, i8*, i8* } +... +;; +;; Define the anchor for compile units. Note that the second field of the +;; anchor is 17, which is the same as the tag for compile units +;; (17 = DW_TAG_compile_unit.) +;; +%llvm.dbg.compile_units = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 17 }, section "llvm.metadata" + +;; +;; Define the compile unit for the source file "/Users/mine/sources/MySource.cpp". +;; +%llvm.dbg.compile_unit1 = internal constant %llvm.dbg.compile_unit.type { + uint add(uint 17, uint 262144), + { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*), + uint 1, + uint 1, + i8* getelementptr ([13 x i8]* %str1, i32 0, i32 0), + i8* getelementptr ([21 x i8]* %str2, i32 0, i32 0), + i8* getelementptr ([33 x i8]* %str3, i32 0, i32 0) }, section "llvm.metadata" + +;; +;; Define the compile unit for the header file "/Users/mine/sources/MyHeader.h". +;; +%llvm.dbg.compile_unit2 = internal constant %llvm.dbg.compile_unit.type { + uint add(uint 17, uint 262144), + { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.compile_units to { }*), + uint 1, + uint 1, + i8* getelementptr ([11 x i8]* %str4, int 0, int 0), + i8* getelementptr ([21 x i8]* %str2, int 0, int 0), + i8* getelementptr ([33 x i8]* %str3, int 0, int 0) }, section "llvm.metadata" + +;; +;; Define each of the strings used in the compile units. +;; +%str1 = internal constant [13 x i8] c"MySource.cpp\00", section "llvm.metadata"; +%str2 = internal constant [21 x i8] c"/Users/mine/sources/\00", section "llvm.metadata"; +%str3 = internal constant [33 x i8] c"4.0.1 LLVM (LLVM research group)\00", section "llvm.metadata"; +%str4 = internal constant [11 x i8] c"MyHeader.h\00", section "llvm.metadata"; +... ++
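The first field of each descriptor is written as add(uint 17, uint 262144). Since 262144 equals 4 << 16, one plausible reading is that the low 16 bits carry the DWARF tag and the high 16 bits carry a debug version number. That layout is an assumption inferred from the constants in the listing, not something this section states, and the helper names below are hypothetical. A minimal C++ sketch under that assumption:

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, inferred from "add(uint 17, uint 262144)" in the listing:
// low 16 bits hold the DWARF tag, high 16 bits hold a debug version number.
constexpr uint32_t kVersionShift = 16;
constexpr uint32_t kTagMask = 0xFFFFu;

struct TagAndVersion {
  uint32_t tag;
  uint32_t version;
};

// Combine a tag and a version the same way the listing's add() does.
constexpr uint32_t encodeTag(uint32_t tag, uint32_t version) {
  return (version << kVersionShift) + tag;
}

// Split a descriptor's first field back into tag and version.
constexpr TagAndVersion decodeTag(uint32_t field) {
  return TagAndVersion{field & kTagMask, field >> kVersionShift};
}
```

Under this reading, encodeTag(17, 4) yields 17 + 262144, and decoding that value gives back tag 17 (DW_TAG_compile_unit) and version 4.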
-Source file descriptors are patterned after the Dwarf "compile_unit" object. -The descriptor currently is defined to have at least the following LLVM -type entries:
+Given an integer global variable declared as follows:
+ ++int MyGlobal = 100; ++
a C/C++ front-end would generate the following descriptors:
+ +-%lldb.compile_unit = type { - uint, ;; Tag: LLVM_COMPILE_UNIT - ushort, ;; LLVM debug version number - ushort, ;; Dwarf language identifier - sbyte*, ;; Filename - sbyte*, ;; Working directory when compiled - sbyte* ;; Producer of the debug information +;; +;; Define types used. One for global variable anchors, one for the global +;; variable descriptor, one for the global's basic type and one for the global's +;; compile unit. +;; +%llvm.dbg.anchor.type = type { uint, uint } +%llvm.dbg.global_variable.type = type { uint, { }*, { }*, i8*, { }*, uint, { }*, bool, bool, { }*, uint } +%llvm.dbg.basictype.type = type { uint, { }*, i8*, { }*, int, uint, uint, uint, uint } +%llvm.dbg.compile_unit.type = ... +... +;; +;; Define the global itself. +;; +%MyGlobal = global int 100 +... +;; +;; Define the anchor for global variables. Note that the second field of the +;; anchor is 52, which is the same as the tag for global variables +;; (52 = DW_TAG_variable.) +;; +%llvm.dbg.global_variables = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 52 }, section "llvm.metadata" + +;; +;; Define the global variable descriptor. Note the reference to the global +;; variable anchor and the global variable itself. +;; +%llvm.dbg.global_variable = internal constant %llvm.dbg.global_variable.type { + uint add(uint 52, uint 262144), + { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.global_variables to { }*), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([9 x i8]* %str1, int 0, int 0), + i8* getelementptr ([1 x i8]* %str2, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + uint 1, + { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*), + bool false, + bool true, + { }* cast (int* %MyGlobal to { }*) }, section "llvm.metadata" + +;; +;; Define the basic type of 32 bit signed integer. Note that since int is an +;; intrinsic type the source file is NULL and line 0. 
+;; +%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([4 x i8]* %str3, int 0, int 0), + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + uint 5 }, section "llvm.metadata" + +;; +;; Define the names of the global variable and basic type. +;; +%str1 = internal constant [9 x i8] c"MyGlobal\00", section "llvm.metadata" +%str2 = internal constant [1 x i8] c"\00", section "llvm.metadata" +%str3 = internal constant [4 x i8] c"int\00", section "llvm.metadata" ++
Given a function declared as follows:
+ ++int main(int argc, char *argv[]) { + return 0; }+
-These descriptors contain the version number for the debug info, a source -language ID for the file (we use the Dwarf 3.0 ID numbers, such as -DW_LANG_C89, DW_LANG_C_plus_plus, DW_LANG_Cobol74, -etc), three strings describing the filename, working directory of the compiler, -and an identifier string for the compiler that produced it. Note that actual -compile_unit declarations must also include an anchor to llvm.dbg.translation_units, -but it is not specified where the anchor is to be located. Here is an example -descriptor: -
- --%arraytest_source_file = internal constant %lldb.compile_unit { - uint 17, ; Tag value - ushort 0, ; Version #0 - ushort 1, ; DW_LANG_C89 - sbyte* getelementptr ([12 x sbyte]* %.str_1, long 0, long 0), ; filename - sbyte* getelementptr ([12 x sbyte]* %.str_2, long 0, long 0), ; working dir - sbyte* getelementptr ([12 x sbyte]* %.str_3, long 0, long 0), ; producer - {}* %llvm.dbg.translation_units ; Anchor +a C/C++ front-end would generate the following descriptors:
+ ++-+;; +;; Define types used. One for subprogram anchors, one for the subprogram +;; descriptor, one for the global's basic type and one for the subprogram's +;; compile unit. +;; +%llvm.dbg.subprogram.type = type { uint, { }*, { }*, i8*, { }*, bool, bool } +%llvm.dbg.anchor.type = type { uint, uint } +%llvm.dbg.compile_unit.type = ... + +;; +;; Define the anchor for subprograms. Note that the second field of the +;; anchor is 46, which is the same as the tag for subprograms +;; (46 = DW_TAG_subprogram.) +;; +%llvm.dbg.subprograms = linkonce constant %llvm.dbg.anchor.type { uint 0, uint 46 }, section "llvm.metadata" + +;; +;; Define the descriptor for the subprogram. TODO - more details. +;; +%llvm.dbg.subprogram = internal constant %llvm.dbg.subprogram.type { + uint add(uint 46, uint 262144), + { }* cast (%llvm.dbg.anchor.type* %llvm.dbg.subprograms to { }*), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([5 x i8]* %str1, int 0, int 0), + i8* getelementptr ([1 x i8]* %str2, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + uint 1, + { }* null, + bool false, + bool true }, section "llvm.metadata" + +;; +;; Define the name of the subprogram. +;; +%str1 = internal constant [5 x i8] c"main\00", section "llvm.metadata" +%str2 = internal constant [1 x i8] c"\00", section "llvm.metadata" + +;; +;; Define the subprogram itself. +;; +int %main(int %argc, i8** %argv) { +... } -%.str_1 = internal constant [12 x sbyte] c"arraytest.c\00" -%.str_2 = internal constant [12 x sbyte] c"/home/sabre\00" -%.str_3 = internal constant [12 x sbyte] c"llvmgcc 3.4\00" -+ +-Note that the LLVM constant merging pass should eliminate duplicate copies of -the strings that get emitted to each translation unit, such as the producer. -
+
The following are the basic type descriptors for C/C++ core types:
- +-The LLVM debugger needs to know about some source-language program objects, in -order to build stack traces, print information about local variables, and other -related activities. The LLVM debugger differentiates between three different -types of program objects: subprograms (functions, messages, methods, etc), -variables (locals and globals), and others. Because source-languages have -widely varying forms of these objects, the LLVM debugger expects only a few -fields in the descriptor for each object: -
+-%lldb.object = type { - uint, ;; A tag - any*, ;; The context for the object - sbyte* ;; The object 'name' -} +%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([5 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + uint 2 }, section "llvm.metadata" +%str1 = internal constant [5 x i8] c"bool\00", section "llvm.metadata"+
The first field contains a tag for the descriptor. The second field contains -either a pointer to the descriptor for the containing source file, or it contains a pointer to -another program object whose context pointer eventually reaches a source file. -Through this context pointer, the -LLVM debugger can establish the debug version number of the object.
+ +The third field contains a string that the debugger can use to identify the -object if it does not contain explicit support for the source-language in use -(ie, the 'unknown' source language handler uses this string). This should be -some sort of unmangled string that corresponds to the object, but it is a -quality of implementation issue what exactly it contains (it is legal, though -not useful, for all of these strings to be null).
+Note again that descriptors can be extended to include -source-language-specific information in addition to the fields required by the -LLVM debugger. See the section on the C/C++ -front-end for more information. Also remember that global objects -(functions, selectors, global variables, etc) must contain an anchor to the llvm.dbg.globals -variable.
++%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([5 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 8, + uint 8, + uint 0, + uint 6 }, section "llvm.metadata" +%str1 = internal constant [5 x i8] c"char\00", section "llvm.metadata" +
-Allow source-language specific contexts, use to identify namespaces etc -Must end up in a source file descriptor. -Debugger core ignores all unknown context objects. +%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([14 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 8, + uint 8, + uint 0, + uint 8 }, section "llvm.metadata" +%str1 = internal constant [14 x i8] c"unsigned char\00", section "llvm.metadata"
-Define each intrinsics, as an extension of the language reference manual. +%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([10 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 16, + uint 16, + uint 0, + uint 5 }, section "llvm.metadata" +%str1 = internal constant [10 x i8] c"short int\00", section "llvm.metadata" ++
+%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([19 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 16, + uint 16, + uint 0, + uint 7 }, section "llvm.metadata" +%str1 = internal constant [19 x i8] c"short unsigned int\00", section "llvm.metadata"
Happen to be the same value as the similarly named Dwarf-3 tags, this may -change in the future.
++%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([4 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + uint 5 }, section "llvm.metadata" +%str1 = internal constant [4 x i8] c"int\00", section "llvm.metadata" +
- LLVM_COMPILE_UNIT : 17 - LLVM_SUBPROGRAM : 46 - LLVM_VARIABLE : 52 - +%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([13 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + uint 7 }, section "llvm.metadata" +%str1 = internal constant [13 x i8] c"unsigned int\00", section "llvm.metadata"
The C and C++ front-ends represent information about the program in a format -that is effectively identical to Dwarf 3.0 in terms of -information content. This allows code generators to trivially support native -debuggers by generating standard dwarf information, and contains enough -information for non-dwarf targets to translate it as needed.
- -The basic debug information required by the debugger is (intentionally) -designed to be as minimal as possible. This basic information is so minimal -that it is unlikely that any source-language could be adequately -described by it. Because of this, the debugger format was designed for -extension to support source-language-specific information. The extended -descriptors are read and interpreted by the language-specific modules in the debugger if there is -support available, otherwise it is ignored.
- -This section describes the extensions used to represent C and C++ programs. -Other languages could pattern themselves after this (which itself is tuned to -representing programs in the same way that Dwarf 3 does), or they could choose -to provide completely different extensions if they don't fit into the Dwarf -model. As support for debugging information gets added to the various LLVM -source-language front-ends, the information used should be documented here.
++%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([14 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 64, + uint 64, + uint 0, + uint 5 }, section "llvm.metadata" +%str1 = internal constant [14 x i8] c"long long int\00", section "llvm.metadata" ++
TODO
+ ++%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([23 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 64, + uint 64, + uint 0, + uint 7 }, section "llvm.metadata" +%str1 = internal constant [23 x i8] c"long long unsigned int\00", section "llvm.metadata" +
-Translation units do not add any information over the standard source file representation already -expected by the debugger. As such, it uses descriptors of the type specified, -with a trailing anchor. -
+ ++%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([6 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + uint 4 }, section "llvm.metadata" +%str1 = internal constant [6 x i8] c"float\00", section "llvm.metadata" +
+%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([7 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 64, + uint 64, + uint 0, + uint 4 }, section "llvm.metadata" +%str1 = internal constant [7 x i8] c"double\00", section "llvm.metadata" ++
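The basic type descriptors above differ only in their name, size, alignment, and final encoding field. As a rough illustration, the values from these listings can be collected into a small Python table. The record type and lookup helper are hypothetical, the sizes are simply those shown in the listings (they depend on the target), and the encoding constants are the Dwarf `DW_ATE_*` values 4, 5, and 7.

```python
from collections import namedtuple

# Dwarf base type encodings that appear as the last field of the listings.
DW_ATE_FLOAT = 4
DW_ATE_SIGNED = 5
DW_ATE_UNSIGNED = 7

# Hypothetical record holding the fields that vary between basic types.
BasicType = namedtuple("BasicType", "name size_bits align_bits offset_bits encoding")

# Values transcribed from the descriptors above; they are target dependent.
BASIC_TYPES = [
    BasicType("unsigned int", 32, 32, 0, DW_ATE_UNSIGNED),
    BasicType("long long int", 64, 64, 0, DW_ATE_SIGNED),
    BasicType("long long unsigned int", 64, 64, 0, DW_ATE_UNSIGNED),
    BasicType("float", 32, 32, 0, DW_ATE_FLOAT),
    BasicType("double", 64, 64, 0, DW_ATE_FLOAT),
]


def lookup(name):
    """Return the table entry for a C type name, or raise KeyError."""
    for entry in BASIC_TYPES:
        if entry.name == name:
            return entry
    raise KeyError(name)


if __name__ == "__main__":
    for entry in BASIC_TYPES:
        print(entry)
```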
Given the following as an example of a C/C++ derived type:
+ ++typedef const int *IntPtr; ++
a C/C++ front-end would generate the following descriptors:
+ ++;; +;; Define the typedef "IntPtr". +;; +%llvm.dbg.derivedtype1 = internal constant %llvm.dbg.derivedtype.type { + uint add(uint 22, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([7 x i8]* %str1, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + int 1, + uint 0, + uint 0, + uint 0, + { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype2 to { }*) }, section "llvm.metadata" +%str1 = internal constant [7 x i8] c"IntPtr\00", section "llvm.metadata" + +;; +;; Define the pointer type. +;; +%llvm.dbg.derivedtype2 = internal constant %llvm.dbg.derivedtype.type { + uint add(uint 15, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* null, + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype3 to { }*) }, section "llvm.metadata" + +;; +;; Define the const type. +;; +%llvm.dbg.derivedtype3 = internal constant %llvm.dbg.derivedtype.type { + uint add(uint 38, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* null, + { }* null, + int 0, + uint 0, + uint 0, + uint 0, + { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype1 to { }*) }, section "llvm.metadata" + +;; +;; Define the int type. +;; +%llvm.dbg.basictype1 = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([4 x i8]* %str2, int 0, int 0), + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + uint 5 }, section "llvm.metadata" +%str2 = internal constant [4 x i8] c"int\00", section "llvm.metadata" ++
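The derived type descriptors above form a chain: the typedef points at the pointer type, which points at the const type, which points at the basic type `int`. The following Python sketch models that chain and walks it from the outside in. The classes and the `describe` function are hypothetical stand-ins for illustration, not an LLVM interface; only the tag numbers (22, 15, 38) are taken from the listing.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Tag values as they appear in the first field of each descriptor above.
DW_TAG_POINTER_TYPE = 15
DW_TAG_TYPEDEF = 22
DW_TAG_CONST_TYPE = 38


@dataclass
class BasicType:
    """Hypothetical stand-in for a basic type descriptor."""
    name: str


@dataclass
class DerivedType:
    """Hypothetical stand-in for a derived type descriptor; base is its last field."""
    tag: int
    name: Optional[str]
    base: Union["DerivedType", BasicType]


def describe(ty):
    """Follow the base links and render the chain as text."""
    if isinstance(ty, BasicType):
        return ty.name
    inner = describe(ty.base)
    if ty.tag == DW_TAG_TYPEDEF:
        return f"typedef {ty.name} = {inner}"
    if ty.tag == DW_TAG_POINTER_TYPE:
        return f"pointer to {inner}"
    if ty.tag == DW_TAG_CONST_TYPE:
        return f"const {inner}"
    raise ValueError(f"unhandled tag {ty.tag}")


# Mirror of the four descriptors generated for: typedef const int *IntPtr;
int_type = BasicType("int")
const_int = DerivedType(DW_TAG_CONST_TYPE, None, int_type)
pointer = DerivedType(DW_TAG_POINTER_TYPE, None, const_int)
int_ptr = DerivedType(DW_TAG_TYPEDEF, "IntPtr", pointer)

if __name__ == "__main__":
    print(describe(int_ptr))
```

Note that only the typedef carries a name; the pointer and const descriptors use a null name, which the sketch models as `None`.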
TODO
+ +Given the following as an example of a C/C++ struct type:
+ ++struct Color { + unsigned Red; + unsigned Green; + unsigned Blue; +}; ++
a C/C++ front-end would generate the following descriptors:
+ ++;; +;; Define basic type for unsigned int. +;; +%llvm.dbg.basictype = internal constant %llvm.dbg.basictype.type { + uint add(uint 36, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([13 x i8]* %str1, int 0, int 0), + { }* null, + int 0, + uint 32, + uint 32, + uint 0, + uint 7 }, section "llvm.metadata" +%str1 = internal constant [13 x i8] c"unsigned int\00", section "llvm.metadata" + +;; +;; Define composite type for struct Color. +;; +%llvm.dbg.compositetype = internal constant %llvm.dbg.compositetype.type { + uint add(uint 19, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([6 x i8]* %str2, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + int 1, + uint 96, + uint 32, + uint 0, + { }* null, + { }* cast ([3 x { }*]* %llvm.dbg.array to { }*) }, section "llvm.metadata" +%str2 = internal constant [6 x i8] c"Color\00", section "llvm.metadata" + +;; +;; Define the Red field. +;; +%llvm.dbg.derivedtype1 = internal constant %llvm.dbg.derivedtype.type { + uint add(uint 13, uint 262144), + { }* null, + i8* getelementptr ([4 x i8]* %str3, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + int 2, + uint 32, + uint 32, + uint 0, + { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*) }, section "llvm.metadata" +%str3 = internal constant [4 x i8] c"Red\00", section "llvm.metadata" + +;; +;; Define the Green field. 
+;; +%llvm.dbg.derivedtype2 = internal constant %llvm.dbg.derivedtype.type { + uint add(uint 13, uint 262144), + { }* null, + i8* getelementptr ([6 x i8]* %str4, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + int 3, + uint 32, + uint 32, + uint 32, + { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*) }, section "llvm.metadata" +%str4 = internal constant [6 x i8] c"Green\00", section "llvm.metadata" + +;; +;; Define the Blue field. +;; +%llvm.dbg.derivedtype3 = internal constant %llvm.dbg.derivedtype.type { + uint add(uint 13, uint 262144), + { }* null, + i8* getelementptr ([5 x i8]* %str5, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + int 4, + uint 32, + uint 32, + uint 64, + { }* cast (%llvm.dbg.basictype.type* %llvm.dbg.basictype to { }*) }, section "llvm.metadata" +%str5 = internal constant [5 x i8] c"Blue\00", section "llvm.metadata" + +;; +;; Define the array of fields used by the composite type Color. +;; +%llvm.dbg.array = internal constant [3 x { }*] [ + { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype1 to { }*), + { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype2 to { }*), + { }* cast (%llvm.dbg.derivedtype.type* %llvm.dbg.derivedtype3 to { }*) ], section "llvm.metadata" ++
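The member descriptors above record each field's size and offset in bits (offsets 0, 32, and 64), and the composite descriptor records a total size of 96 bits with 32-bit alignment. One way to cross-check such numbers is to ask the host C ABI through Python's `ctypes` module, as sketched below. This assumes a platform where `unsigned int` is 32 bits wide, as in the listing; `member_layout` is a hypothetical helper.

```python
import ctypes


class Color(ctypes.Structure):
    """Python mirror of: struct Color { unsigned Red; unsigned Green; unsigned Blue; };"""
    _fields_ = [
        ("Red", ctypes.c_uint),
        ("Green", ctypes.c_uint),
        ("Blue", ctypes.c_uint),
    ]


def member_layout(struct_type):
    """Return (name, size in bits, offset in bits) for every field of a ctypes struct."""
    rows = []
    for name, _ctype in struct_type._fields_:
        field = getattr(struct_type, name)  # field descriptor with .size and .offset in bytes
        rows.append((name, field.size * 8, field.offset * 8))
    return rows


if __name__ == "__main__":
    print("size in bits:", ctypes.sizeof(Color) * 8)
    print("alignment in bits:", ctypes.alignment(Color) * 8)
    for row in member_layout(Color):
        print(row)
```

On a target with a different `unsigned int` width the numbers would differ, and a front-end for that target would be expected to emit correspondingly different descriptors.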
TODO
+ +Given the following as an example of a C/C++ enumeration type:
+ ++enum Trees { + Spruce = 100, + Oak = 200, + Maple = 300 +}; +
a C/C++ front-end would generate the following descriptors:
+ ++;; +;; Define composite type for enum Trees +;; +%llvm.dbg.compositetype = internal constant %llvm.dbg.compositetype.type { + uint add(uint 4, uint 262144), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + i8* getelementptr ([6 x i8]* %str1, int 0, int 0), + { }* cast (%llvm.dbg.compile_unit.type* %llvm.dbg.compile_unit to { }*), + int 1, + uint 32, + uint 32, + uint 0, + { }* null, + { }* cast ([3 x { }*]* %llvm.dbg.array to { }*) }, section "llvm.metadata" +%str1 = internal constant [6 x i8] c"Trees\00", section "llvm.metadata" + +;; +;; Define Spruce enumerator. +;; +%llvm.dbg.enumerator1 = internal constant %llvm.dbg.enumerator.type { + uint add(uint 40, uint 262144), + i8* getelementptr ([7 x i8]* %str2, int 0, int 0), + int 100 }, section "llvm.metadata" +%str2 = internal constant [7 x i8] c"Spruce\00", section "llvm.metadata" + +;; +;; Define Oak enumerator. +;; +%llvm.dbg.enumerator2 = internal constant %llvm.dbg.enumerator.type { + uint add(uint 40, uint 262144), + i8* getelementptr ([4 x i8]* %str3, int 0, int 0), + int 200 }, section "llvm.metadata" +%str3 = internal constant [4 x i8] c"Oak\00", section "llvm.metadata" + +;; +;; Define Maple enumerator. +;; +%llvm.dbg.enumerator3 = internal constant %llvm.dbg.enumerator.type { + uint add(uint 40, uint 262144), + i8* getelementptr ([6 x i8]* %str4, int 0, int 0), + int 300 }, section "llvm.metadata" +%str4 = internal constant [6 x i8] c"Maple\00", section "llvm.metadata" + +;; +;; Define the array of enumerators used by composite type Trees. +;; +%llvm.dbg.array = internal constant [3 x { }*] [ + { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator1 to { }*), + { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator2 to { }*), + { }* cast (%llvm.dbg.enumerator.type* %llvm.dbg.enumerator3 to { }*) ], section "llvm.metadata" ++
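Each enumerator descriptor above holds three fields: the tag (40 plus the debug version), the enumerator name, and its integer value. The array referenced by the composite type then lists the enumerators in declaration order. The following Python sketch produces the same triples from an `IntEnum`; the function name is hypothetical and the tuples are only an illustration of the field order, not an LLVM data structure.

```python
from enum import IntEnum

# Constants taken from the listing above.
LLVM_DEBUG_VERSION = 262144
DW_TAG_ENUMERATOR = 40


class Trees(IntEnum):
    """Python mirror of the C enum Trees."""
    Spruce = 100
    Oak = 200
    Maple = 300


def enumerator_descriptors(enum_type):
    """Return (tag field, name, value) triples in declaration order."""
    tag_field = DW_TAG_ENUMERATOR + LLVM_DEBUG_VERSION
    return [(tag_field, member.name, int(member.value)) for member in enum_type]


if __name__ == "__main__":
    for descriptor in enumerator_descriptors(Trees):
        print(descriptor)
```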