X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FProgrammersManual.html;h=d096f5a722d80b5b8abae79c9c762fd0e24c99a7;hb=5520ad4dd9e3e726f96cf2c32c2b90f9467ff783;hp=e2525ae84e0e5e766f170fa63d3d8a9b379290c6;hpb=6a65f4208f05af4cd2d76ae71c956b426d9e2373;p=oota-llvm.git diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html index e2525ae84e0..d096f5a722d 100644 --- a/docs/ProgrammersManual.html +++ b/docs/ProgrammersManual.html @@ -2,6 +2,7 @@ "http://www.w3.org/TR/html4/strict.dtd"> + LLVM Programmer's Manual @@ -28,6 +29,13 @@ @@ -117,6 +131,7 @@ with another Value
  • Deleting GlobalVariables
  • +
  • How to Create Types
  • @@ -412,6 +439,107 @@ are lots of examples in the LLVM source base.

    + + +
    + Passing strings (the StringRef +and Twine classes) +
    + +
    + +

    Although LLVM generally does not do much string manipulation, we do have +several important APIs which take strings. Two important examples are the +Value class -- which has names for instructions, functions, etc. -- and the +StringMap class which is used extensively in LLVM and Clang.

    + +

    These are generic classes, and they need to be able to accept strings which +may have embedded null characters. Therefore, they cannot simply take +a const char *, and taking a const std::string& requires +clients to perform a heap allocation which is usually unnecessary. Instead, +many LLVM APIs use a const StringRef& or a const +Twine& for passing strings efficiently.

    + +
    + + +
    + The StringRef class +
    + +
    + +

    The StringRef data type represents a reference to a constant string +(a character array and a length) and supports the common operations available +on std::string, but does not require heap allocation.

    + +

    It can be implicitly constructed using a C style null-terminated string, +an std::string, or explicitly with a character pointer and length. +For example, the StringMap find function is declared as:

    + +
    + iterator find(const StringRef &Key); +
    + +

    and clients can call it using any one of:

    + +
    +
    +  Map.find("foo");                 // Lookup "foo"
    +  Map.find(std::string("bar"));    // Lookup "bar"
    +  Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
    +
    +
    + +

    Similarly, APIs which need to return a string may return a StringRef +instance, which can be used directly or converted to an std::string +using the str member function. See +"llvm/ADT/StringRef.h" +for more information.
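
    The calling pattern described above can be sketched without LLVM itself. The following is a minimal illustration that uses the standard C++17 std::string_view as a stand-in for StringRef (an assumption made for portability; it is not LLVM's class). The function names keyLength and ownCopy are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical lookup-style function. It takes a non-owning view
// (pointer plus length), so callers never have to build a std::string.
std::size_t keyLength(std::string_view Key) { return Key.size(); }

// Copies the viewed bytes into an owning std::string. This is similar in
// spirit to calling str() on a StringRef before storing the result.
std::string ownCopy(std::string_view Key) { return std::string(Key); }
```

    A caller can pass a string literal, a std::string, or an explicit pointer and length such as std::string_view("\0baz", 4). Only the last form preserves the embedded null character, because the length is given explicitly rather than found by scanning for a terminator.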

    + +

    You should rarely use the StringRef class directly. Because it contains +pointers to external memory, it is not generally safe to store an instance of the +class (unless you know that the external storage will not be freed).

    + +
    + + +
    + The Twine class +
    + +
    + +

    The Twine class is an efficient way for APIs to accept concatenated +strings. A common LLVM paradigm is to name one instruction based on +the name of another instruction with a suffix, for example:

    + +
    +
    +    New = CmpInst::Create(..., SO->getName() + ".cmp");
    +
    +
    + +

    The Twine class is effectively a +lightweight rope +which points to temporary (stack allocated) objects. Twines can be implicitly +constructed as the result of the plus operator applied to strings (i.e., a C +string, an std::string, or a StringRef). The twine delays the +concatenation of strings until it is actually required, at which point +it can be efficiently rendered directly into a character array. This avoids +the unnecessary heap allocation involved in constructing the temporary results of +string concatenation. See +"llvm/ADT/Twine.h" +for more information.

    + +

    As with a StringRef, Twine objects point to external memory +and should almost never be stored or mentioned directly. They are intended +solely for use when defining a function which should be able to efficiently +accept concatenated strings.
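
    The idea of delaying concatenation can be illustrated with a much smaller class than Twine. The sketch below is not LLVM's implementation: LazyConcat and makeName are hypothetical names, and the class handles only two operands, using std::string_view (C++17) for the non-owning pieces.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical two-piece "rope": it records views of both operands and
// copies nothing until str() is called.
class LazyConcat {
  std::string_view LHS, RHS;

public:
  LazyConcat(std::string_view L, std::string_view R) : LHS(L), RHS(R) {}

  // Total length is known without touching the heap.
  std::size_t size() const { return LHS.size() + RHS.size(); }

  // Render both pieces into one buffer with a single reservation.
  std::string str() const {
    std::string Out;
    Out.reserve(size());
    Out.append(LHS);
    Out.append(RHS);
    return Out;
  }
};

// Hypothetical API in the style described above: it accepts the lazy
// concatenation by const reference and renders it only when needed.
std::string makeName(const LazyConcat &Name) { return Name.str(); }
```

    A call such as makeName(LazyConcat("add", ".cmp")) performs one allocation, inside str(). As with Twine, a LazyConcat only refers to its operands, so it must not outlive them.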

    + +
    + +
    The DEBUG() macro and -debug option @@ -436,7 +564,7 @@ tool) is run with the '-debug' command line argument:

    -DOUT << "I am here!\n";
    +DEBUG(errs() << "I am here!\n");
     
    @@ -481,16 +609,16 @@ option as follows:

    -DOUT << "No debug type\n";
     #undef  DEBUG_TYPE
    +DEBUG(errs() << "No debug type\n");
     #define DEBUG_TYPE "foo"
    -DOUT << "'foo' debug type\n";
    +DEBUG(errs() << "'foo' debug type\n");
     #undef  DEBUG_TYPE
     #define DEBUG_TYPE "bar"
    -DOUT << "'bar' debug type\n";
    +DEBUG(errs() << "'bar' debug type\n");
     #undef  DEBUG_TYPE
     #define DEBUG_TYPE ""
    -DOUT << "No debug type (2)\n";
    +DEBUG(errs() << "No debug type (2)\n");
     
    @@ -522,6 +650,21 @@ on when the name is specified. This allows, for example, all debug information for instruction scheduling to be enabled with -debug-type=InstrSched, even if the source lives in multiple files.

    +

    The DEBUG_WITH_TYPE macro is also available for situations where you +would like to set DEBUG_TYPE, but only for one specific DEBUG +statement. It takes an additional first parameter, which is the type to use. For +example, the preceding example could be written as:

    + + +
    +
    +DEBUG_WITH_TYPE("", errs() << "No debug type\n");
    +DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
    +DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n");
    +DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
    +
    +
    +
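
    To make the gating behaviour concrete, here is a simplified, self-contained sketch of a macro in the style of DEBUG_WITH_TYPE. It is an illustration only and does not reproduce LLVM's actual macro: DebugEnabled, OnlyType, typeEnabled, SKETCH_DEBUG_WITH_TYPE and countExecuted are all hypothetical names, and the rule "an empty filter enables every type" is an assumption of the sketch.

```cpp
#include <string>

namespace sketch {
// Hypothetical global state standing in for the command line options.
bool DebugEnabled = false; // debugging switched on at all?
std::string OnlyType;      // type filter; empty means "no filter"

// A statement runs only if debugging is on and its type passes the filter.
bool typeEnabled(const char *Type) {
  return DebugEnabled && (OnlyType.empty() || OnlyType == Type);
}
} // namespace sketch

// The type is passed as the first argument instead of being read from a
// DEBUG_TYPE definition that is in effect at that point in the file.
#define SKETCH_DEBUG_WITH_TYPE(TYPE, X)                                        \
  do {                                                                         \
    if (sketch::typeEnabled(TYPE)) {                                           \
      X;                                                                       \
    }                                                                          \
  } while (false)

// Counts how many of three differently typed statements actually execute.
int countExecuted(bool Enabled, const std::string &Only) {
  sketch::DebugEnabled = Enabled;
  sketch::OnlyType = Only;
  int Hits = 0;
  SKETCH_DEBUG_WITH_TYPE("", ++Hits);
  SKETCH_DEBUG_WITH_TYPE("foo", ++Hits);
  SKETCH_DEBUG_WITH_TYPE("bar", ++Hits);
  return Hits;
}
```

    With debugging disabled no statement runs; with a filter of "foo" only the statement tagged "foo" runs.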
    @@ -714,6 +857,10 @@ access the container. Based on that, you should use:

    iteration, but do not support efficient look-up based on a key. +
  • a string container is a specialized sequential + container or reference structure that is used for character or byte + arrays.
  • +
  • a bit container provides an efficient way to store and perform set operations on sets of numeric id's, while automatically eliminating duplicates. Bit containers require a maximum of 1 bit for each @@ -951,7 +1098,7 @@ in the default manner.

    -

    ilists have another speciality that must be considered. To be a good +

    ilists have another specialty that must be considered. To be a good citizen in the C++ ecosystem, it needs to support the standard container operations, such as begin and end iterators, etc. Also, the operator-- must work correctly on the end iterator in the @@ -1346,6 +1493,23 @@ inserted into the map) that it needs internally.

    + +
    + "llvm/ADT/ValueMap.h" +
    + +
    + +

    +ValueMap is a wrapper around a DenseMap mapping +Value*s (or subclasses) to another type. When a Value is deleted or RAUW'ed, +ValueMap will update itself so the new version of the key is mapped to the same +value, just as if the key were a WeakVH. You can configure exactly how this +happens, and what else happens on these two events, by passing +a Config parameter to the ValueMap template.

    + +
    +
    <map> @@ -1385,6 +1549,20 @@ always better.

    + +
    + String-like containers +
    + +
    + +

    +TODO: const char* vs stringref vs smallstring vs std::string. Describe twine, +xref to #string_apis. +

    + +
    +
    Bit storage containers (BitVector, SparseBitVector) @@ -1408,7 +1586,7 @@ please don't use it.

    -

    The BitVector container provides a fixed size set of bits for manipulation. +

    The BitVector container provides a dynamic size set of bits for manipulation. It supports individual bit setting/testing, as well as set operations. The set operations take time O(size of bitvector), but operations are performed one word at a time, instead of one bit at a time. This makes the BitVector very fast for @@ -1417,6 +1595,25 @@ the number of set bits to be high (IE a dense set).

    + +
    + SmallBitVector +
    + +
    +

    The SmallBitVector container provides the same interface as BitVector, but +it is optimized for the case where only a small number of bits, less than +25 or so, are needed. It also transparently supports larger bit counts, but +slightly less efficiently than a plain BitVector, so SmallBitVector should +only be used when larger counts are rare. +

    + +

    +At this time, SmallBitVector does not support set operations (and, or, xor), +and its operator[] does not provide an assignable lvalue. +

    +
    +
    SparseBitVector @@ -1496,7 +1693,7 @@ an example that prints the name of a BasicBlock and the number of for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i) // Print out the name of the basic block if it has one, and then the // number of instructions that it contains - llvm::cerr << "Basic block (name=" << i->getName() << ") has " + errs() << "Basic block (name=" << i->getName() << ") has " << i->size() << " instructions.\n";
    @@ -1529,14 +1726,14 @@ a BasicBlock:

    for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i) // The next statement works since operator<<(ostream&,...) // is overloaded for Instruction& - llvm::cerr << *i << "\n"; + errs() << *i << "\n";

    However, this isn't really the best way to print out the contents of a BasicBlock! Since the ostream operators are overloaded for virtually anything you'll care about, you could have just invoked the print routine on the -basic block itself: llvm::cerr << *blk << "\n";.

    +basic block itself: errs() << *blk << "\n";.

    @@ -1562,7 +1759,7 @@ small example that shows how to dump all instructions in a function to the stand // F is a pointer to a Function instance for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) - llvm::cerr << *I << "\n"; + errs() << *I << "\n"; @@ -1641,7 +1838,7 @@ without actually obtaining it via iteration over some structure:

    void printNextInstruction(Instruction* inst) { BasicBlock::iterator it(inst); ++it; // After this line, it refers to the instruction after *inst - if (it != inst->getParent()->end()) llvm::cerr << *it << "\n"; + if (it != inst->getParent()->end()) errs() << *it << "\n"; } @@ -1759,8 +1956,8 @@ Function *F = ...; for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i) if (Instruction *Inst = dyn_cast<Instruction>(*i)) { - llvm::cerr << "F is used in instruction:\n"; - llvm::cerr << *Inst << "\n"; + errs() << "F is used in instruction:\n"; + errs() << *Inst << "\n"; } @@ -2088,6 +2285,235 @@ GV->eraseFromParent(); + +
    + How to Create Types +
    + +
    + +

    In generating IR, you may need some complex types. If you know these types +statically, you can use TypeBuilder<...>::get(), defined +in llvm/Support/TypeBuilder.h, to retrieve them. TypeBuilder +has two forms depending on whether you're building types for cross-compilation +or native library use. TypeBuilder<T, true> requires +that T be independent of the host environment, meaning that it's built +out of types from +the llvm::types +namespace and pointers, functions, arrays, etc. built of +those. TypeBuilder<T, false> additionally allows native C types +whose size may depend on the host compiler. For example,

    + +
    +
    +FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get();
    +
    +
    + +

    is easier to read and write than the equivalent

    + +
    +
    +std::vector<const Type*> params;
    +params.push_back(PointerType::getUnqual(Type::Int32Ty));
    +FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false);
    +
    +
    + +

    See the class +comment for more details.
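
    The mechanism behind this kind of interface is template specialization over a compile-time type description. The following sketch shows the technique in isolation. It is not TypeBuilder: the namespace sketch, the Describe template and the example function are hypothetical, and the sketch produces a textual description rather than LLVM Type objects.

```cpp
#include <string>

namespace sketch {
// Stand-in for a fixed-width integer descriptor such as types::i<N>.
template <unsigned Bits> struct i {};

// Primary template is left undefined: unsupported types fail to compile.
template <typename T> struct Describe;

// Integer of a given width.
template <unsigned Bits> struct Describe<i<Bits>> {
  static std::string get() { return "i" + std::to_string(Bits); }
};

// Pointer to any describable type.
template <typename T> struct Describe<T *> {
  static std::string get() { return Describe<T>::get() + "*"; }
};

// Function taking one describable argument.
template <typename R, typename A> struct Describe<R(A)> {
  static std::string get() {
    return Describe<R>::get() + " (" + Describe<A>::get() + ")";
  }
};

// Mirrors the shape of the function type used in the text above.
std::string example() { return Describe<i<8>(i<32> *)>::get(); }
} // namespace sketch
```

    Each specialization handles one type constructor and recurses into its components, which is why the whole type can be written as a single C++ type expression.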

    + +
    + + +
    + Threads and LLVM +
    + + +
    +

    +This section describes the interaction of the LLVM APIs with multithreading, +both on the part of client applications, and in the JIT, in the hosted +application. +

    + +

    +Note that LLVM's support for multithreading is still relatively young. Up +through version 2.5, the execution of threaded hosted applications was +supported, but not threaded client access to the APIs. While this use case is +now supported, clients must adhere to the guidelines specified below to +ensure proper operation in multithreaded mode. +

    + +

    +Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic +intrinsics in order to support threaded operation. If you need a +multithreading-capable LLVM on a platform without a suitably modern system +compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and +using the resultant compiler to build a copy of LLVM with multithreading +support. +

    +
    + + +
    + Entering and Exiting Multithreaded Mode +
    + +
    + +

    +In order to properly protect its internal data structures while avoiding +excessive locking overhead in the single-threaded case, LLVM must initialize +certain data structures necessary to provide guards around its internals. To do +so, the client program must invoke llvm_start_multithreaded() before +making any concurrent LLVM API calls. To subsequently tear down these +structures, use the llvm_stop_multithreaded() call. You can also use +the llvm_is_multithreaded() call to check the status of multithreaded +mode. +

    + +

    +Note that both llvm_start_multithreaded() and llvm_stop_multithreaded() +must be called in isolation. That is to +say that no other LLVM API calls may be executing at any time during the +execution of llvm_start_multithreaded() or +llvm_stop_multithreaded(). It is the client's responsibility to enforce this isolation. +

    + +

    +The return value of llvm_start_multithreaded() indicates the success or +failure of the initialization. Failure typically indicates that your copy of +LLVM was built without multithreading support, usually because GCC atomic +intrinsics were not found in your system compiler. In this case, the LLVM API +will not be safe for concurrent calls. However, it will be safe for +hosting threaded applications in the JIT, though care +must be taken to ensure that side exits and the like do not accidentally +result in concurrent LLVM API calls. +

    +
    + + +
    + Ending Execution with llvm_shutdown() +
    + +
    +

    +When you are done using the LLVM APIs, you should call llvm_shutdown() +to deallocate memory used for internal structures. This will also invoke +llvm_stop_multithreaded() if LLVM is operating in multithreaded mode. +As such, llvm_shutdown() requires the same isolation guarantees as +llvm_stop_multithreaded(). +

    + +

    +Note that, if you use scope-based shutdown, you can use the +llvm_shutdown_obj class, which calls llvm_shutdown() in its +destructor. +
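
    Scope-based shutdown is an instance of the usual C++ RAII idiom. The sketch below shows the idiom on its own, without LLVM: teardown, ShutdownGuard and runScoped are hypothetical stand-ins, and the counter exists only so the effect can be observed.

```cpp
namespace sketch {
int TeardownCalls = 0; // counts teardown invocations, for illustration

// Hypothetical stand-in for a library-wide teardown routine.
void teardown() { ++TeardownCalls; }

// Calls teardown() when the guard goes out of scope, including on early
// return or exception. Copying is disabled so teardown runs exactly once.
class ShutdownGuard {
public:
  ShutdownGuard() = default;
  ShutdownGuard(const ShutdownGuard &) = delete;
  ShutdownGuard &operator=(const ShutdownGuard &) = delete;
  ~ShutdownGuard() { teardown(); }
};

// Returns how many teardown calls one guarded scope produced.
int runScoped() {
  int Before = TeardownCalls;
  {
    ShutdownGuard Guard;
    // ... use the library here ...
  }
  return TeardownCalls - Before;
}
} // namespace sketch
```

    Declaring such a guard at the top of main ties the teardown to the end of the program's useful lifetime, which is where the isolation requirement described above is easiest to satisfy.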

    + + +
    + Lazy Initialization with ManagedStatic +
    + +
    +

    +ManagedStatic is a utility class in LLVM used to implement static +initialization of static resources, such as the global type tables. Before the +invocation of llvm_shutdown(), it implements a simple lazy +initialization scheme. Once llvm_start_multithreaded() returns, +however, it uses double-checked locking to implement thread-safe lazy +initialization. +

    + +

    +Note that, because no other threads are allowed to issue LLVM API calls before +llvm_start_multithreaded() returns, it is possible to have +ManagedStatics of llvm::sys::Mutex objects. +

    + +

    +The llvm_acquire_global_lock() and llvm_release_global_lock +APIs provide access to the global lock used to implement the double-checked +locking for lazy initialization. These should only be used internally to LLVM, +and only if you know what you're doing! +
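
    For readers unfamiliar with double-checked locking, the following sketch shows the general pattern using standard C++11 atomics and a mutex. It is an assumption-laden illustration, not ManagedStatic: LazyStatic, exampleCounter and the per-object mutex are hypothetical (the text above describes a single global lock), and the sketch always takes the thread-safe path.

```cpp
#include <atomic>
#include <mutex>

namespace sketch {
// Hypothetical lazily constructed object holder.
template <typename T> class LazyStatic {
  std::atomic<T *> Ptr{nullptr};
  std::mutex Lock;

public:
  T &get() {
    // First check, without the lock: the common case after construction.
    T *P = Ptr.load(std::memory_order_acquire);
    if (!P) {
      std::lock_guard<std::mutex> Guard(Lock);
      // Second check, with the lock held: another thread may have won.
      P = Ptr.load(std::memory_order_relaxed);
      if (!P) {
        P = new T(); // value-initialized
        Ptr.store(P, std::memory_order_release);
      }
    }
    return *P;
  }

  ~LazyStatic() { delete Ptr.load(std::memory_order_acquire); }
};

// Example use: a lazily created integer shared by all callers.
int &exampleCounter() {
  static LazyStatic<int> Counter;
  return Counter.get();
}
} // namespace sketch
```

    The acquire load paired with the release store ensures that a thread which sees a non-null pointer also sees the fully constructed object.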

    +
    + + +
    + Achieving Isolation with LLVMContext +
    + +
    +

    +LLVMContext is an opaque class in the LLVM API which clients can use +to operate multiple, isolated instances of LLVM concurrently within the same +address space. For instance, in a hypothetical compile-server, the compilation +of an individual translation unit is conceptually independent from all the +others, and it would be desirable to be able to compile incoming translation +units concurrently on independent server threads. Fortunately, +LLVMContext exists to enable just this kind of scenario! +

    + +

    +Conceptually, LLVMContext provides isolation. Every LLVM entity +(Modules, Values, Types, Constants, etc.) +in LLVM's in-memory IR belongs to an LLVMContext. Entities in +different contexts cannot interact with each other: Modules in +different contexts cannot be linked together, Functions cannot be added +to Modules in different contexts, etc. What this means is that it is +safe to compile on multiple threads simultaneously, as long as no two threads +operate on entities within the same context. +

    + +

    +In practice, very few places in the API require the explicit specification of an +LLVMContext, other than the Type creation/lookup APIs. +Because every Type carries a reference to its owning context, most +other entities can determine what context they belong to by looking at their +own Type. If you are adding new entities to LLVM IR, please try to +maintain this interface design. +

    + +

    +For clients that do not require the benefits of isolation, LLVM +provides a convenience API getGlobalContext(). This returns a global, +lazily initialized LLVMContext that may be used in situations where +isolation is not a concern. +

    +
    + + +
    + Threads and the JIT +
    + +
    +

    +LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple +threads can call ExecutionEngine::getPointerToFunction() or +ExecutionEngine::runFunction() concurrently, and multiple threads can +run code output by the JIT concurrently. The user must still ensure that only +one thread accesses IR in a given LLVMContext while another thread +might be modifying it. One way to do that is to always hold the JIT lock while +accessing IR outside the JIT (the JIT modifies the IR by adding +CallbackVHs). Another way is to only +call getPointerToFunction() from the LLVMContext's thread. +

    + +

    When the JIT is configured to compile lazily (using +ExecutionEngine::DisableLazyCompilation(false)), there is currently a +race condition in +updating call sites after a function is lazily-jitted. It's still possible to +use the lazy JIT in a threaded program if you ensure that only one thread at a +time can call any particular lazy stub and that the JIT lock guards any IR +access, but we suggest using only the eager JIT in threaded programs. +

    +
    +
    Advanced Topics @@ -2573,9 +2999,9 @@ the lib/VMCore directory.

      -
    • bool isInteger() const: Returns true for any integer type.
    • +
    • bool isIntegerTy() const: Returns true for any integer type.
    • -
    • bool isFloatingPoint(): Return true if this is one of the two +
    • bool isFloatingPointTy(): Return true if this is one of the five floating point types.
    • bool isAbstract(): Return true if the type is abstract (contains @@ -2624,7 +3050,7 @@ the lib/VMCore directory.

      VectorType
      Subclass of SequentialType for vector types. A vector type is similar to an ArrayType but is distinguished because it is - a first class type wherease ArrayType is not. Vector types are used for + a first class type whereas ArrayType is not. Vector types are used for vector operations and are usually small vectors of of an integer or floating point type.
      StructType
      @@ -3184,7 +3610,7 @@ Superclasses: GlobalValue, Value

      The Function class represents a single procedure in LLVM. It is -actually one of the more complex classes in the LLVM heirarchy because it must +actually one of the more complex classes in the LLVM hierarchy because it must keep track of a large amount of data. The Function class keeps track of a list of BasicBlocks, a list of formal Arguments, and a @@ -3193,7 +3619,7 @@ of a list of BasicBlocks, a list of formal

      The list of BasicBlocks is the most commonly used part of Function objects. The list imposes an implicit ordering of the blocks in the function, which indicate how the code will be -layed out by the backend. Additionally, the first BasicBlock is the implicit entry node for the Function. It is not legal in LLVM to explicitly branch to this initial block. There are no implicit exit nodes, and in fact there may be multiple exit @@ -3323,7 +3749,7 @@ Superclasses: GlobalValue, User, Value

      -

      Global variables are represented with the (suprise suprise) +

      Global variables are represented with the (surprise surprise) GlobalVariable class. Like functions, GlobalVariables are also subclasses of GlobalValue, and as such are always referenced by their address (global values must live in memory, so their @@ -3373,7 +3799,7 @@ never change at runtime).

    • Constant *getInitializer() -

      Returns the intial value for a GlobalVariable. It is not legal +

      Returns the initial value for a GlobalVariable. It is not legal to call this method if there is no initializer.

    @@ -3389,7 +3815,7 @@ never change at runtime).

    #include "llvm/BasicBlock.h"
    -doxygen info: BasicBlock +doxygen info: BasicBlock Class
    Superclass: Value