X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FProgrammersManual.html;h=e920cbbc6460d54ac21bd5094e93bb824a08d58e;hb=f72596c04a06458c6152a51c9678690a34d59ab8;hp=ba8a38be9a9029b36bf65e370dacec0e13221b4e;hpb=b6d1f39cd0ed77997007db2053a5b5e277c67fc9;p=oota-llvm.git diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html index ba8a38be9a9..e920cbbc646 100644 --- a/docs/ProgrammersManual.html +++ b/docs/ProgrammersManual.html @@ -2,6 +2,7 @@ "http://www.w3.org/TR/html4/strict.dtd">
+Although LLVM generally does not do much string manipulation, we do have +several important APIs which take strings. Two important examples are the +Value class -- which has names for instructions, functions, etc. -- and the +StringMap class which is used extensively in LLVM and Clang.
+ +These are generic classes, and they need to be able to accept strings which +may have embedded null characters. Therefore, they cannot simply take +a const char *, and taking a const std::string& requires +clients to perform a heap allocation which is usually unnecessary. Instead, +many LLVM APIs use a const StringRef& or a const +Twine& for passing strings efficiently.
+ +The StringRef data type represents a reference to a constant string +(a character array and a length) and supports the common operations available +on std::string, but does not require heap allocation.
+ +It can be implicitly constructed using a C style null-terminated string, +an std::string, or explicitly with a character pointer and length. +For example, the StringMap find function is declared as:
+ +and clients can call it using any one of:
+ ++ Map.find("foo"); // Lookup "foo" + Map.find(std::string("bar")); // Lookup "bar" + Map.find(StringRef("\0baz", 4)); // Lookup "\0baz" ++
Similarly, APIs which need to return a string may return a StringRef +instance, which can be used directly or converted to an std::string +using the str member function. See +"llvm/ADT/StringRef.h" +for more information.
+ +You should rarely use the StringRef class directly: because it contains +pointers to external memory, it is not generally safe to store an instance of the +class (unless you know that the external storage will not be freed).
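The pointer-plus-length idea described above can be sketched without LLVM. The class below is a hypothetical stand-in (MiniStringRef and nameLength are illustrative names, not LLVM's API); it only shows why such a type can carry embedded null characters and why it does not own its storage.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a string reference: a pointer and a length.
// It never allocates and never owns the characters it points to.
class MiniStringRef {
  const char *Data;
  std::size_t Length;

public:
  // Implicit from a null-terminated C string (length found via strlen).
  MiniStringRef(const char *Str) : Data(Str), Length(std::strlen(Str)) {}
  // Implicit from a std::string (no copy is made).
  MiniStringRef(const std::string &Str) : Data(Str.data()), Length(Str.size()) {}
  // Explicit pointer and length, which allows embedded null characters.
  MiniStringRef(const char *Ptr, std::size_t Len) : Data(Ptr), Length(Len) {}

  std::size_t size() const { return Length; }

  // Copy the referenced characters into an owning std::string.
  std::string str() const { return std::string(Data, Length); }
};

// An API taking the reference type accepts all three spellings.
static std::size_t nameLength(MiniStringRef Name) { return Name.size(); }
```

Because the object only borrows memory, a caller that needs to keep the text beyond the lifetime of the original buffer would copy it out with str() first.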
+ +The Twine class is an efficient way for APIs to accept concatenated +strings. For example, a common LLVM paradigm is to name one instruction based on +the name of another instruction with a suffix, for example:
+ ++ New = CmpInst::Create(..., SO->getName() + ".cmp"); ++
The Twine class is effectively a +lightweight rope +which points to temporary (stack allocated) objects. Twines can be implicitly +constructed as the result of the plus operator applied to strings (i.e., a C +string, an std::string, or a StringRef). The twine delays the +actual concatenation of strings until it is required, at which point +it can be efficiently rendered directly into a character array. This avoids +unnecessary heap allocation involved in constructing the temporary results of +string concatenation. See +"llvm/ADT/Twine.h" +for more information.
+ +As with a StringRef, Twine objects point to external memory +and should almost never be stored or mentioned directly. They are intended +solely for use when defining a function which should be able to efficiently +accept concatenated strings.
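As a rough illustration of delayed concatenation, the following sketch records two pieces and joins them only when a result is requested. MiniTwine and makeName are hypothetical names; the real Twine supports nesting and more operand kinds, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical two-piece "rope": it remembers where the pieces live and
// defers the actual concatenation until str() is called.
class MiniTwine {
  const char *LHS;
  std::size_t LHSLen;
  const char *RHS;
  std::size_t RHSLen;

public:
  MiniTwine(const std::string &L, const char *R)
      : LHS(L.data()), LHSLen(L.size()), RHS(R), RHSLen(std::strlen(R)) {}

  std::size_t size() const { return LHSLen + RHSLen; }

  // Render both pieces into one buffer with a single allocation.
  std::string str() const {
    std::string Out;
    Out.reserve(size());
    Out.append(LHS, LHSLen);
    Out.append(RHS, RHSLen);
    return Out;
  }
};

// A function that accepts a concatenated name and renders it on demand.
static std::string makeName(const MiniTwine &Name) { return Name.str(); }
```

The sketch also shows why such objects should not be stored: both pieces point at memory owned by the caller, so the object is only valid for the duration of the call.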
+ +-DOUT << "I am here!\n"; +DEBUG(errs() << "I am here!\n");
-DOUT << "No debug type\n"; #undef DEBUG_TYPE +DEBUG(errs() << "No debug type\n"); #define DEBUG_TYPE "foo" -DOUT << "'foo' debug type\n"; +DEBUG(errs() << "'foo' debug type\n"); #undef DEBUG_TYPE #define DEBUG_TYPE "bar" -DOUT << "'bar' debug type\n"; +DEBUG(errs() << "'bar' debug type\n"); #undef DEBUG_TYPE #define DEBUG_TYPE "" -DOUT << "No debug type (2)\n"; +DEBUG(errs() << "No debug type (2)\n");
The DEBUG_WITH_TYPE macro is also available for situations where you +would like to set DEBUG_TYPE, but only for one specific DEBUG +statement. It takes an additional first parameter, which is the type to use. For +example, the preceding example could be written as:
+ + ++DEBUG_WITH_TYPE("", errs() << "No debug type\n"); +DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n"); +DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"); +DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n"); ++
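To make the filtering behaviour concrete, here is a self-contained sketch of a type-gated debug macro. Everything in it (DebugFlag, DebugOnlyType, Log, isDebugTypeEnabled, SKETCH_DEBUG_WITH_TYPE) is a hypothetical stand-in written for illustration; it is not LLVM's implementation, and it writes to a string stream rather than errs().

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-ins for the "debug output on" switch and the
// "only this type" filter. An empty filter means every type is printed.
static bool DebugFlag = false;
static std::string DebugOnlyType;
static std::ostringstream Log;

// A statement is enabled when debugging is on and its type passes the filter.
static bool isDebugTypeEnabled(const char *Type) {
  return DebugFlag && (DebugOnlyType.empty() || DebugOnlyType == Type);
}

// Execute X only if statements of the given type are currently enabled.
#define SKETCH_DEBUG_WITH_TYPE(TYPE, X)                                        \
  do {                                                                         \
    if (isDebugTypeEnabled(TYPE)) {                                            \
      X;                                                                       \
    }                                                                          \
  } while (0)
```

With DebugFlag set and DebugOnlyType set to "foo", only statements tagged "foo" run; the others are skipped without evaluating their stream expressions.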
ilist has the same drawbacks as std::list, and additionally requires an -ilist_traits implementation for the element type, but it provides some novel -characteristics. In particular, it can efficiently store polymorphic objects, -the traits class is informed when an element is inserted or removed from the -list, and ilists are guaranteed to support a constant-time splice operation. -
+ilist has the same drawbacks as std::list, and additionally +requires an ilist_traits implementation for the element type, but it +provides some novel characteristics. In particular, it can efficiently store +polymorphic objects, the traits class is informed when an element is inserted or +removed from the list, and ilists are guaranteed to support a +constant-time splice operation.
+ +These properties are exactly what we want for things like +Instructions and basic blocks, which is why these are implemented with +ilists.
+ +Related classes of interest are explained in the following subsections: +ilist_traits<T> is ilist<T>'s customization +mechanism. iplist<T> (and consequently ilist<T>) +publicly derive from this traits class.
+iplist<T> is ilist<T>'s base and as such +supports a slightly narrower interface. Notably, inserters from +T& are absent.
+ +ilist_traits<T> is a public base of this class and can be +used for a wide variety of customizations.
+These properties are exactly what we want for things like Instructions and -basic blocks, which is why these are implemented with ilists.
+ + + +ilist_node<T> implements the forward and backward links +that are expected by the ilist<T> (and analogous containers) +in the default manner.
+ +ilist_node<T>s are meant to be embedded in the node type +T, usually T publicly derives from +ilist_node<T>.
+ilists have another speciality that must be considered. To be a good +citizen in the C++ ecosystem, an ilist needs to support the standard container +operations, such as begin and end iterators, etc. Also, the +operator-- must work correctly on the end iterator in the +case of non-empty ilists.
+ +The only sensible solution to this problem is to allocate a so-called +sentinel along with the intrusive list, which serves as the end +iterator, providing the back-link to the last element. However, in keeping with +C++ convention, it is illegal to apply operator++ beyond the sentinel, and the +sentinel must not be dereferenced.
+ +These constraints give the ilist some implementation freedom in +how to allocate and store the sentinel. The corresponding policy is dictated +by ilist_traits<T>. By default a T gets heap-allocated +whenever the need for a sentinel arises.
+ +While the default policy is sufficient in most cases, it may break down when +T does not provide a default constructor. Also, in the case of many +instances of ilists, the memory overhead of the associated sentinels +is wasted. To alleviate the situation with numerous and voluminous +T-sentinels, sometimes a trick is employed, leading to ghostly +sentinels.
+ +Ghostly sentinels are obtained by specially-crafted ilist_traits<T> +which superpose the sentinel with the ilist instance in memory. Pointer +arithmetic is used to obtain the sentinel, which is relative to the +ilist's this pointer. The ilist is augmented by an +extra pointer, which serves as the back-link of the sentinel. This is the only +field in the ghostly sentinel which can be legally accessed.
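The role of the sentinel can be illustrated with a much simpler arrangement than the ghostly sentinel described above. The sketch below is an assumption-laden stand-in: NodeBase, Item and ItemList are hypothetical names, and the sentinel is a payload-free link object stored inside the list rather than a heap-allocated T or a superposed ghost. It shows the element type embedding its links by derivation and the sentinel supplying the back-link to the last element.

```cpp
#include <cassert>

// Hypothetical link base: only the two links, no payload.
struct NodeBase {
  NodeBase *Prev = nullptr;
  NodeBase *Next = nullptr;
};

// The element type embeds its links by publicly deriving from the link base.
struct Item : NodeBase {
  int Value;
  explicit Item(int V) : Value(V) {}
};

// Circular intrusive list. The sentinel lives inside the list object and
// doubles as the end position; it must never be treated as an Item.
class ItemList {
  NodeBase Sentinel;

public:
  ItemList() { Sentinel.Prev = Sentinel.Next = &Sentinel; }
  ItemList(const ItemList &) = delete;
  ItemList &operator=(const ItemList &) = delete;

  NodeBase *begin() { return Sentinel.Next; }
  NodeBase *end() { return &Sentinel; }
  bool empty() const { return Sentinel.Next == &Sentinel; }

  // Link an existing element in front of the sentinel; nothing is allocated.
  void push_back(Item &I) {
    I.Prev = Sentinel.Prev;
    I.Next = &Sentinel;
    Sentinel.Prev->Next = &I;
    Sentinel.Prev = &I;
  }

  // Stepping back from end() uses the sentinel's back-link.
  Item &back() {
    assert(!empty() && "back() on an empty list");
    return static_cast<Item &>(*Sentinel.Prev);
  }
};
```

Since the sentinel here carries no payload, the element type does not need a default constructor, which is one of the concerns the text raises about the default policy.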
The STL provides several other options, such as std::multiset and the various -"hash_set" like containers (whether from C++ TR1 or from the SGI library).
+"hash_set" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_set and unordered_set because they are generally very expensive +(each insertion requires a malloc) and very non-portable. +std::multiset is useful if you're not interested in elimination of duplicates, but has all the drawbacks of std::set. A sorted vector (where you don't delete duplicate entries) or some other approach is almost always better.
-The various hash_set implementations (exposed portably by -"llvm/ADT/hash_set") is a simple chained hashtable. This algorithm is as malloc -intensive as std::set (performing an allocation for each element inserted, -thus having really high constant factors) but (usually) provides O(1) -insertion/deletion of elements. This can be useful if your elements are large -(thus making the constant-factor cost relatively low) or if comparisons are -expensive. Element iteration does not visit elements in a useful order.
- @@ -1294,19 +1518,27 @@ another element takes place).The STL provides several other options, such as std::multimap and the various -"hash_map" like containers (whether from C++ TR1 or from the SGI library).
+"hash_map" like containers (whether from C++ TR1 or from the SGI library). We +never use hash_map and unordered_map because they are generally very expensive +(each insertion requires a malloc) and very non-portable. std::multimap is useful if you want to map a key to multiple values, but has all the drawbacks of std::map. A sorted vector or some other approach is almost always better.
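One way to read the "sorted vector" suggestion is sketched below: collect key/value pairs in a std::vector, sort once by key, and answer lookups with a binary search over the equal-key range. The names Entry, keyLess, sortByKey and lookup are hypothetical, and the sketch assumes insertions are batched before queries begin.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Compare on the key only, so entries with equal keys end up adjacent.
static bool keyLess(const Entry &L, const Entry &R) {
  return L.first < R.first;
}

// Sort once after bulk insertion. stable_sort keeps the insertion order of
// entries that share a key.
static void sortByKey(std::vector<Entry> &V) {
  std::stable_sort(V.begin(), V.end(), keyLess);
}

// Return every value mapped to Key using a binary search for the key range.
static std::vector<std::string> lookup(const std::vector<Entry> &V, int Key) {
  auto Range =
      std::equal_range(V.begin(), V.end(), Entry(Key, std::string()), keyLess);
  std::vector<std::string> Out;
  for (auto I = Range.first; I != Range.second; ++I)
    Out.push_back(I->second);
  return Out;
}
```

The storage is one contiguous buffer, so there is no per-element allocation; the trade-off is that inserting after sorting requires either re-sorting or shifting elements.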
-The various hash_map implementations (exposed portably by -"llvm/ADT/hash_map") are simple chained hash tables. This algorithm is as -malloc intensive as std::map (performing an allocation for each element -inserted, thus having really high constant factors) but (usually) provides O(1) -insertion/deletion of elements. This can be useful if your elements are large -(thus making the constant-factor cost relatively low) or if comparisons are -expensive. Element iteration does not visit elements in a useful order.
+ + + + + ++TODO: const char* vs stringref vs smallstring vs std::string. Describe twine, +xref to #string_apis. +
In generating IR, you may need some complex types. If you know these types +statically, you can use TypeBuilder<...>::get(), defined +in llvm/Support/TypeBuilder.h, to retrieve them. TypeBuilder +has two forms depending on whether you're building types for cross-compilation +or native library use. TypeBuilder<T, true> requires +that T be independent of the host environment, meaning that it's built +out of types from +the llvm::types +namespace and pointers, functions, arrays, etc. built of +those. TypeBuilder<T, false> additionally allows native C types +whose size may depend on the host compiler. For example,
+ ++FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get(); ++
is easier to read and write than the equivalent
+ ++std::vector<const Type*> params; +params.push_back(PointerType::getUnqual(Type::Int32Ty)); +FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false); ++
See the class +comment for more details.
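The underlying technique, mapping a static C++ type expression to a runtime description through template specialization, can be sketched without LLVM. Everything in namespace sketch below is hypothetical, and the sketch produces a textual description rather than llvm::Type objects; it is meant only to show how a function type such as the one in the example above can be taken apart at compile time.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Hypothetical fixed-width integer tag, analogous in spirit to types::i<N>.
template <unsigned Bits> struct i {};

// Primary template is left undefined: unsupported types fail to compile.
template <typename T> struct Describe;

// Integer of a fixed bit width.
template <unsigned Bits> struct Describe<i<Bits>> {
  static std::string get() { return "i" + std::to_string(Bits); }
};

// Pointer to any describable type.
template <typename T> struct Describe<T *> {
  static std::string get() { return Describe<T>::get() + "*"; }
};

// Function type: describe the result, then each parameter in order.
template <typename R, typename... Args> struct Describe<R(Args...)> {
  static std::string get() {
    std::vector<std::string> Parts = {Describe<Args>::get()...};
    std::string Out = Describe<R>::get() + " (";
    const char *Sep = "";
    for (const std::string &P : Parts) {
      Out += Sep;
      Out += P;
      Sep = ", ";
    }
    return Out + ")";
  }
};

} // namespace sketch
```

The whole signature is written once as a type, and the compiler selects the matching specializations recursively, which is why the single-line form is easier to read than building the parameter list by hand.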
+ ++This section describes the interaction of the LLVM APIs with multithreading, +both on the part of client applications, and in the JIT, in the hosted +application. +
+ ++Note that LLVM's support for multithreading is still relatively young. Up +through version 2.5, the execution of threaded hosted applications was +supported, but not threaded client access to the APIs. While this use case is +now supported, clients must adhere to the guidelines specified below to +ensure proper operation in multithreaded mode. +
+ ++Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic +intrinsics in order to support threaded operation. If you need a +multithreading-capable LLVM on a platform without a suitably modern system +compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and +using the resultant compiler to build a copy of LLVM with multithreading +support. +
++In order to properly protect its internal data structures while avoiding +excessive locking overhead in the single-threaded case, LLVM must initialize +certain data structures necessary to provide guards around its internals. To do +so, the client program must invoke llvm_start_multithreaded() before +making any concurrent LLVM API calls. To subsequently tear down these +structures, use the llvm_stop_multithreaded() call. You can also use +the llvm_is_multithreaded() call to check the status of multithreaded +mode. +
+ ++Note that both of these calls must be made in isolation. That is to +say that no other LLVM API calls may be executing at any time during the +execution of llvm_start_multithreaded() or llvm_stop_multithreaded(). +It is the client's responsibility to enforce this isolation. +
+ ++The return value of llvm_start_multithreaded() indicates the success or +failure of the initialization. Failure typically indicates that your copy of +LLVM was built without multithreading support, usually because GCC atomic +intrinsics were not found in your system compiler. In this case, the LLVM API +will not be safe for concurrent calls. However, it will be safe for +hosting threaded applications in the JIT, though care must be taken to ensure +that side exits and the like do not accidentally result in concurrent LLVM API +calls. +
++When you are done using the LLVM APIs, you should call llvm_shutdown() +to deallocate memory used for internal structures. This will also invoke +llvm_stop_multithreaded() if LLVM is operating in multithreaded mode. +As such, llvm_shutdown() requires the same isolation guarantees as +llvm_stop_multithreaded(). +
+ ++Note that, if you use scope-based shutdown, you can use the +llvm_shutdown_obj class, which calls llvm_shutdown() in its +destructor. +
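The scope-based pattern mentioned above is ordinary RAII. The sketch below uses hypothetical stand-ins (sketchShutdown, ShutdownCalls, ShutdownGuard) instead of the real llvm_shutdown() and llvm_shutdown_obj, so that it can be compiled on its own; it only demonstrates that the cleanup call runs when the guard leaves scope.

```cpp
// Hypothetical stand-in for a library-wide shutdown routine.
static int ShutdownCalls = 0;
static void sketchShutdown() { ++ShutdownCalls; }

// Guard object: its destructor performs the shutdown, so leaving the
// enclosing scope (normally or via an exception) triggers the cleanup.
struct ShutdownGuard {
  ShutdownGuard() = default;
  ShutdownGuard(const ShutdownGuard &) = delete;
  ShutdownGuard &operator=(const ShutdownGuard &) = delete;
  ~ShutdownGuard() { sketchShutdown(); }
};
```

A client would typically declare such a guard at the top of main, so the isolation requirement described above has to hold at the point where main returns.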
+ManagedStatic is a utility class in LLVM used to implement static +initialization of static resources, such as the global type tables. Before the +invocation of llvm_shutdown(), it implements a simple lazy +initialization scheme. Once llvm_start_multithreaded() returns, +however, it uses double-checked locking to implement thread-safe lazy +initialization. +
+ ++Note that, because no other threads are allowed to issue LLVM API calls before +llvm_start_multithreaded() returns, it is possible to have +ManagedStatics of llvm::sys::Mutexs. +
+ ++The llvm_acquire_global_lock() and llvm_release_global_lock +APIs provide access to the global lock used to implement the double-checked +locking for lazy initialization. These should only be used internally to LLVM, +and only if you know what you're doing! +
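For readers unfamiliar with double-checked locking, the following is a generic sketch of thread-safe lazy initialization using standard C++ atomics and a mutex. LazyStatic is a hypothetical name; this is not how ManagedStatic is implemented (it uses a per-object mutex rather than a global lock, for instance), only an instance of the same general technique.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical lazily constructed object using double-checked locking.
template <typename T> class LazyStatic {
  std::atomic<T *> Ptr{nullptr};
  std::mutex Lock;

public:
  LazyStatic() = default;
  LazyStatic(const LazyStatic &) = delete;
  LazyStatic &operator=(const LazyStatic &) = delete;

  T &get() {
    // First check: lock-free fast path once the object exists.
    T *P = Ptr.load(std::memory_order_acquire);
    if (!P) {
      std::lock_guard<std::mutex> Guard(Lock);
      // Second check: another thread may have won the race for the lock.
      P = Ptr.load(std::memory_order_relaxed);
      if (!P) {
        P = new T();
        // Publish the fully constructed object to other threads.
        Ptr.store(P, std::memory_order_release);
      }
    }
    return *P;
  }

  ~LazyStatic() { delete Ptr.load(std::memory_order_acquire); }
};
```

The acquire load paired with the release store is what makes the fast path safe: a thread that observes a non-null pointer also observes the completed construction.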
+The -User class provides a base for expressing the ownership of User +User class provides a basis for expressing the ownership of User towards other Values. The Use helper class is employed to do the bookkeeping and to facilitate O(1) @@ -2242,7 +2631,7 @@ addition and removal.
@@ -2446,7 +2835,8 @@ tag bits.
For layout b) instead of the User we find a pointer (User* with LSBit set). Following this pointer brings us to the User. A portable trick ensures that the first bytes of User (if interpreted as a pointer) never has -the LSBit set. +the LSBit set. (Portability relies on the fact that all known compilers place the +vptr in the first word of the instances.) Return whether or not the Function has a body defined. If the function is "external", it does not have a body, and thus must be resolved @@ -3275,11 +3665,12 @@ never change at runtime).
Create a new global variable of the specified type. If isConstant is true then the global variable will be marked as unchanging for the program. The Linkage parameter specifies the type of - linkage (internal, external, weak, linkonce, appending) for the variable. If - the linkage is InternalLinkage, WeakLinkage, or LinkOnceLinkage, then - the resultant global variable will have internal linkage. AppendingLinkage - concatenates together all instances (in different translation units) of the - variable into a single variable but is only applicable to arrays. See + linkage (internal, external, weak, linkonce, appending) for the variable. + If the linkage is InternalLinkage, WeakAnyLinkage, WeakODRLinkage, + LinkOnceAnyLinkage or LinkOnceODRLinkage, then the resultant + global variable will have internal linkage. AppendingLinkage concatenates + together all instances (in different translation units) of the variable + into a single variable but is only applicable to arrays. See the LLVM Language Reference for further details on linkage types. Optionally an initializer, a name, and the module to put the variable into may be specified for the global variable as @@ -3312,7 +3703,7 @@ never change at runtime).
#include "llvm/BasicBlock.h"
-doxygen info: BasicBlock
+doxygen info: BasicBlock
Class
Superclass: Value