LegalizeIntegerTypes: Reenable the large shift with small amount optimization.

diff --git a/docs/ProgrammersManual.html b/docs/ProgrammersManual.html

index bfa721ddc95a26d23a1a1501cf604a985ad23479..f4c03222b2065a6760cab92203940662d54593fc 100644 (file)
--- a/docs/ProgrammersManual.html
+++ b/docs/ProgrammersManual.html
@@ -68,17 +68,26 @@ option</a></li>
        <li><a href="#dss_packedvector">llvm/ADT/PackedVector.h</a></li>
        <li><a href="#dss_other">Other Sequential Container Options</a></li>
      </ul></li>
+    <li><a href="#ds_string">String-like containers</a>
+    <ul>
+      <li><a href="#dss_stringref">llvm/ADT/StringRef.h</a></li>
+      <li><a href="#dss_twine">llvm/ADT/Twine.h</a></li>
+      <li><a href="#dss_smallstring">llvm/ADT/SmallString.h</a></li>
+      <li><a href="#dss_stdstring">std::string</a></li>
+    </ul></li>
      <li><a href="#ds_set">Set-Like Containers (std::set, SmallSet, SetVector, etc)</a>
      <ul>
        <li><a href="#dss_sortedvectorset">A sorted 'vector'</a></li>
        <li><a href="#dss_smallset">"llvm/ADT/SmallSet.h"</a></li>
        <li><a href="#dss_smallptrset">"llvm/ADT/SmallPtrSet.h"</a></li>
        <li><a href="#dss_denseset">"llvm/ADT/DenseSet.h"</a></li>
+      <li><a href="#dss_sparseset">"llvm/ADT/SparseSet.h"</a></li>
        <li><a href="#dss_FoldingSet">"llvm/ADT/FoldingSet.h"</a></li>
        <li><a href="#dss_set">&lt;set&gt;</a></li>
        <li><a href="#dss_setvector">"llvm/ADT/SetVector.h"</a></li>
        <li><a href="#dss_uniquevector">"llvm/ADT/UniqueVector.h"</a></li>
-      <li><a href="#dss_otherset">Other Set-Like ContainerOptions</a></li>
+      <li><a href="#dss_immutableset">"llvm/ADT/ImmutableSet.h"</a></li>
+      <li><a href="#dss_otherset">Other Set-Like Container Options</a></li>
      </ul></li>
      <li><a href="#ds_map">Map-Like Containers (std::map, DenseMap, etc)</a>
      <ul>
@@ -90,12 +99,9 @@ option</a></li>
        <li><a href="#dss_intervalmap">"llvm/ADT/IntervalMap.h"</a></li>
        <li><a href="#dss_map">&lt;map&gt;</a></li>
        <li><a href="#dss_inteqclasses">"llvm/ADT/IntEqClasses.h"</a></li>
+      <li><a href="#dss_immutablemap">"llvm/ADT/ImmutableMap.h"</a></li>
        <li><a href="#dss_othermap">Other Map-Like Container Options</a></li>
      </ul></li>
-    <li><a href="#ds_string">String-like containers</a>
-    <!--<ul>
-       todo
-    </ul>--></li>
      <li><a href="#ds_bit">BitVector-like containers</a>
      <ul>
        <li><a href="#dss_bitvector">A dense bitvector</a></li>
@@ -884,7 +890,7 @@ cost of adding the elements to the container. </p>
  <div>
  There are a variety of sequential containers available for you, based on your
  needs.  Pick the first in this section that will do what you want.
-
+  
  <!-- _______________________________________________________________________ -->
  <h4>
    <a name="dss_arrayref">llvm/ADT/ArrayRef.h</a>
@@ -943,8 +949,6 @@ type, and 2) it cannot hold a null pointer.</p>
    
  </div>
      
-<div>
-
  <!-- _______________________________________________________________________ -->
  <h4>
    <a name="dss_smallvector">"llvm/ADT/SmallVector.h"</a>
@@ -994,7 +998,7 @@ vector is also useful when interfacing with code that expects vectors :).
  <pre>
  for ( ... ) {
     std::vector&lt;foo&gt; V;
-   use V;
+   // make use of V.
  }
  </pre>
  </div>
@@ -1005,7 +1009,7 @@ for ( ... ) {
  <pre>
  std::vector&lt;foo&gt; V;
  for ( ... ) {
-   use V;
+   // make use of V.
     V.clear();
  }
  </pre>
@@ -1209,9 +1213,187 @@ std::priority_queue, std::stack, etc.  These provide simplified access to an
  underlying container but don't affect the cost of the container itself.</p>
  
  </div>
+</div>
+
+<!-- ======================================================================= -->
+<h3>
+  <a name="ds_string">String-like containers</a>
+</h3>
+
+<div>
+
+<p>
+There are a variety of ways to pass around and use strings in C and C++, and
+LLVM adds a few new options to choose from.  Pick the first option on this list
+that will do what you need; they are ordered according to their relative cost.
+</p>
+<p>
+Note that it is generally preferred to <em>not</em> pass strings around as 
+"<tt>const char*</tt>"'s.  These have a number of problems, including the fact
+that they cannot represent embedded nul ("\0") characters, and do not have a
+length available efficiently.  The general replacement for '<tt>const 
+char*</tt>' is StringRef.
+</p>
+  
+<p>For more information on choosing string containers for APIs, please see
+<a href="#string_apis">Passing strings</a>.</p>
+  
+  
+<!-- _______________________________________________________________________ -->
+<h4>
+  <a name="dss_stringref">llvm/ADT/StringRef.h</a>
+</h4>
+
+<div>
+<p>
+The StringRef class is a simple value class that contains a pointer to a
+character and a length, and is closely related to the <a 
+href="#dss_arrayref">ArrayRef</a> class (but specialized for arrays of
+characters).  Because StringRef carries a length with it, it safely handles
+strings with embedded nul characters, getting the length does not require
+a strlen call, and it even has very convenient APIs for slicing and dicing the
+character range that it represents.
+</p>
+  
+<p>
+StringRef is ideal for passing simple strings around that are known to be live,
+either because they are C string literals, std::string, a C array, or a
+SmallVector.  Each of these cases has an efficient implicit conversion to
+StringRef, which doesn't result in a dynamic strlen being executed.
+</p>
+  
+<p>StringRef has a few major limitations which make more powerful string
+containers useful:</p>
+  
+<ol>
+<li>You cannot directly convert a StringRef to a 'const char*' because there is
+no way to add a trailing nul (unlike the .c_str() method on various stronger
+classes).</li>
+
+  
+<li>StringRef doesn't own or keep alive the underlying string bytes.
+As such it can easily lead to dangling pointers, and is not suitable for
+embedding in datastructures in most cases (instead, use an std::string or
+something like that).</li>
+  
+<li>For the same reason, StringRef cannot be used as the return value of a
+method if the method "computes" the result string.  Instead, use
+std::string.</li>
+    
+<li>StringRef does not allow you to mutate the pointed-to string bytes, and it
+does not allow you to insert or remove bytes from the range.  For editing 
+operations like this, it interoperates with the <a 
+href="#dss_twine">Twine</a> class.</li>
+</ol>
+  
+<p>Because of its strengths and limitations, it is very common for a function to
+take a StringRef and for a method on an object to return a StringRef that
+points into some string that it owns.</p>
+  
+</div>
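The pointer-plus-length idea described above can be sketched with the standard library alone. This is only an illustration that assumes std::string_view (C++17) as a stand-in for StringRef; the helper names countNuls, stem and stringViewDemo are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in for a function that takes a StringRef: it receives a pointer and
// a length, so embedded nul characters are preserved and counted.
std::size_t countNuls(std::string_view S) {
  std::size_t N = 0;
  for (char C : S)
    if (C == '\0')
      ++N;
  return N;
}

// Returns the part of S before the first '.', without copying any bytes.
// The result points into the caller's storage, so it is only valid while
// that storage is alive (the dangling-pointer hazard listed above).
std::string_view stem(std::string_view S) {
  return S.substr(0, S.find('.'));
}

void stringViewDemo() {
  std::string WithNul("a\0b", 3);      // length 3, nul in the middle
  assert(countNuls(WithNul) == 1);     // implicit conversion from std::string
  assert(stem("blarg.42") == "blarg"); // slice of a string literal
}
```

Returning the view from stem is safe here only because the argument outlives the result; a function that computes a fresh string should return an owning type instead.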
+  
+<!-- _______________________________________________________________________ -->
+<h4>
+  <a name="dss_twine">llvm/ADT/Twine.h</a>
+</h4>
  
+<div>
+  <p>
+  The Twine class is used as an intermediary datatype for APIs that want to take
+  a string that can be constructed inline with a series of concatenations.
+  Twine works by forming recursive instances of the Twine datatype (a simple
+  value object) on the stack as temporary objects, linking them together into a
+  tree which is then linearized when the Twine is consumed.  Twine is only safe
+  to use as the argument to a function, and should always be a const reference,
+  e.g.:
+  </p>
+  
+  <pre>
+    void foo(const Twine &amp;T);
+    ...
+    StringRef X = ...
+    unsigned i = ...
+    foo(X + "." + Twine(i));
+  </pre>
+  
+  <p>This example forms a string like "blarg.42" by concatenating the values
+  together, and does not form intermediate strings containing "blarg" or
+  "blarg.".
+  </p>
+  
+  <p>Because Twine is constructed with temporary objects on the stack, and
+  because these instances are destroyed at the end of the current statement,
+  it is an inherently dangerous API.  For example, this simple variant contains
+  undefined behavior and will probably crash:</p>
+  
+  <pre>
+    void foo(const Twine &amp;T);
+    ...
+    StringRef X = ...
+    unsigned i = ...
+    const Twine &amp;Tmp = X + "." + Twine(i);
+    foo(Tmp);
+  </pre>
+
+  <p>... because the temporaries are destroyed before the call.  That said,
+  Twine's are much more efficient than intermediate std::string temporaries, and
+  they work really well with StringRef.  Just be aware of their limitations.</p>
+  
  </div>
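The "tree of temporaries that is linearized when consumed" can be sketched as follows. This is a simplified illustration and not LLVM's Twine implementation: ConcatNode, render and concatDemo are hypothetical names, and only string leaves are supported.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for Twine: a node is either a leaf that views some
// existing characters, or an inner node that points at two child nodes.
class ConcatNode {
  const ConcatNode *LHS = nullptr;
  const ConcatNode *RHS = nullptr;
  std::string_view Leaf;

public:
  ConcatNode(std::string_view S) : Leaf(S) {}
  ConcatNode(const char *S) : Leaf(S) {}
  ConcatNode(const ConcatNode &L, const ConcatNode &R) : LHS(&L), RHS(&R) {}

  // Linearize the tree into Out by visiting the left child, then the right.
  void appendTo(std::string &Out) const {
    if (LHS) {
      LHS->appendTo(Out);
      RHS->appendTo(Out);
    } else {
      Out.append(Leaf);
    }
  }
};

// Concatenation only records pointers to its operands; nothing is copied.
inline ConcatNode operator+(const ConcatNode &L, const ConcatNode &R) {
  return ConcatNode(L, R);
}

// The tree is consumed here, while every temporary created in the caller's
// argument expression is still alive.
std::string render(const ConcatNode &N) {
  std::string Out;
  N.appendTo(Out);
  return Out;
}

void concatDemo() {
  std::string_view X = "blarg";
  std::string Num = std::to_string(42);
  assert(render(ConcatNode(X) + "." + ConcatNode(Num)) == "blarg.42");
}
```

Binding the result of the + expression to a named reference and calling render on a later line would leave the inner nodes pointing at destroyed temporaries, which is the same hazard the second example above describes.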
  
+  
+<!-- _______________________________________________________________________ -->
+<h4>
+  <a name="dss_smallstring">llvm/ADT/SmallString.h</a>
+</h4>
+
+<div>
+  
+<p>SmallString is a subclass of <a href="#dss_smallvector">SmallVector</a> that
+adds some convenience APIs like += that takes StringRef's.  SmallString avoids
+allocating memory in the case when the preallocated space is enough to hold its
+data, and it falls back to general heap allocation when required.  Since it owns
+its data, it is very safe to use and supports full mutation of the string.</p>
+  
+<p>As with SmallVector, the big downside to SmallString is its sizeof.  While
+SmallStrings are optimized for small strings, they themselves are not
+particularly small.  This means that they work great for temporary scratch
+buffers on the stack, but should not generally be put into the heap: it is very
+rare to see a SmallString as the member of a frequently-allocated heap data
+structure or returned by-value.
+</p>
+
+</div>
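The trade-off between inline storage and object size can be sketched with a toy class. This is an illustration only: InlineString is a hypothetical name, and its layout is deliberately naive and does not match how SmallString or SmallVector are actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Toy string with N bytes of inline storage and a heap fallback.
template <std::size_t N> class InlineString {
  char Inline[N];         // preallocated space inside the object itself
  std::vector<char> Heap; // used only once the inline space overflows
  std::size_t Size = 0;
  bool OnHeap = false;

public:
  InlineString &operator+=(std::string_view S) {
    if (!OnHeap && Size + S.size() <= N) {
      // Fits inline: no allocation happens.
      std::copy(S.begin(), S.end(), Inline + Size);
    } else {
      if (!OnHeap) {
        // First overflow: move the existing bytes to the heap.
        Heap.assign(Inline, Inline + Size);
        OnHeap = true;
      }
      Heap.insert(Heap.end(), S.begin(), S.end());
    }
    Size += S.size();
    return *this;
  }

  std::string_view str() const {
    return std::string_view(OnHeap ? Heap.data() : Inline, Size);
  }
  bool usesHeap() const { return OnHeap; }
};

void inlineStringDemo() {
  InlineString<8> S;
  S += "abc";
  S += "de";
  assert(S.str() == "abcde" && !S.usesHeap());
  S += "fghij"; // 10 bytes in total, more than the 8 inline bytes
  assert(S.str() == "abcdefghij" && S.usesHeap());
  // The inline buffer is part of the object, which is why sizeof is large.
  static_assert(sizeof(InlineString<64>) >= 64, "inline buffer is in-object");
}
```

The static_assert makes the cost concrete: every instance carries its inline buffer whether or not it is used, which is fine on the stack and wasteful in frequently allocated heap objects.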
+  
+<!-- _______________________________________________________________________ -->
+<h4>
+  <a name="dss_stdstring">std::string</a>
+</h4>
+
+<div>
+  
+  <p>The standard C++ std::string class is a very general class that (like
+  SmallString) owns its underlying data.  sizeof(std::string) is very reasonable
+  so it can be embedded into heap data structures and returned by-value.
+  On the other hand, std::string is highly inefficient for inline editing (e.g.
+  concatenating a bunch of stuff together) and because it is provided by the
+  standard library, its performance characteristics depend a lot on the host
+  standard library (e.g. libc++ and MSVC provide a highly optimized string
+  class, GCC contains a really slow implementation).
+  </p>
+
+  <p>The major disadvantage of std::string is that almost every operation that
+  makes it larger can allocate memory, which is slow.  As such, it is better
+  to use SmallVector or Twine as a scratch buffer, but then use std::string to
+  persist the result.</p>
+
+  
+</div>
+  
+<!-- end of strings -->
+</div>
+
+  
  <!-- ======================================================================= -->
  <h3>
    <a name="ds_set">Set-Like Containers (std::set, SmallSet, SetVector, etc)</a>
@@ -1307,6 +1489,24 @@ href="#dss_densemap">DenseMap</a> has.
  
  </div>
  
+<!-- _______________________________________________________________________ -->
+<h4>
+  <a name="dss_sparseset">"llvm/ADT/SparseSet.h"</a>
+</h4>
+
+<div>
+
+<p>SparseSet holds a small number of objects identified by unsigned keys of
+moderate size. It uses a lot of memory, but provides operations that are
+almost as fast as a vector. Typical keys are physical registers, virtual
+registers, or numbered basic blocks.</p>
+
+<p>SparseSet is useful for algorithms that need very fast clear/find/insert/erase
+and fast iteration over small sets.  It is not intended for building composite
+data structures.</p>
+
+</div>
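One classic way to get vector-like speed for small sets of unsigned keys is to pair a dense vector with a sparse index array. The sketch below shows that general technique under the assumption that all keys are smaller than a fixed universe size; SparseUIntSet is a hypothetical name and this is not the SparseSet interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Set of unsigned keys in the range [0, Universe).
class SparseUIntSet {
  std::vector<unsigned> Dense;  // the members, in insertion order
  std::vector<unsigned> Sparse; // Sparse[K] is K's index in Dense, if present

public:
  explicit SparseUIntSet(unsigned Universe) : Sparse(Universe, 0) {}

  // K is a member only if its sparse slot points at a dense entry holding K.
  bool contains(unsigned K) const {
    unsigned I = Sparse[K];
    return I < Dense.size() && Dense[I] == K;
  }

  bool insert(unsigned K) {
    if (contains(K))
      return false;
    Sparse[K] = static_cast<unsigned>(Dense.size());
    Dense.push_back(K);
    return true;
  }

  // Erase by moving the last member into the vacated dense slot.
  bool erase(unsigned K) {
    if (!contains(K))
      return false;
    unsigned I = Sparse[K];
    unsigned Last = Dense.back();
    Dense[I] = Last;
    Sparse[Last] = I;
    Dense.pop_back();
    return true;
  }

  // Clearing only resets the dense vector; stale sparse slots are harmless.
  void clear() { Dense.clear(); }

  std::size_t size() const { return Dense.size(); }
  std::vector<unsigned>::const_iterator begin() const { return Dense.begin(); }
  std::vector<unsigned>::const_iterator end() const { return Dense.end(); }
};

void sparseSetDemo() {
  SparseUIntSet S(16);
  S.insert(5);
  S.insert(2);
  S.insert(9);
  assert(S.contains(2) && !S.contains(3));
  assert(S.erase(5) && !S.contains(5) && S.contains(9));
  assert(S.size() == 2);
  S.clear();
  assert(S.size() == 0 && !S.contains(2));
}
```

The sparse array is sized by the key universe rather than by the number of members, which is where the "uses a lot of memory" cost comes from, while iteration touches only the dense vector.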
+
  <!-- _______________________________________________________________________ -->
  <h4>
    <a name="dss_FoldingSet">"llvm/ADT/FoldingSet.h"</a>
@@ -1400,12 +1600,13 @@ elements out of (linear time), unless you use it's "pop_back" method, which is
  faster.
  </p>
  
-<p>SetVector is an adapter class that defaults to using std::vector and std::set
-for the underlying containers, so it is quite expensive.  However,
-<tt>"llvm/ADT/SetVector.h"</tt> also provides a SmallSetVector class, which
-defaults to using a SmallVector and SmallSet of a specified size.  If you use
-this, and if your sets are dynamically smaller than N, you will save a lot of 
-heap traffic.</p>
+<p><tt>SetVector</tt> is an adapter class that defaults to
+   using <tt>std::vector</tt> and a size 16 <tt>SmallSet</tt> for the underlying
+   containers, so it is quite expensive. However,
+   <tt>"llvm/ADT/SetVector.h"</tt> also provides a <tt>SmallSetVector</tt>
+   class, which defaults to using a <tt>SmallVector</tt> and <tt>SmallSet</tt>
+   of a specified size. If you use this, and if your sets are dynamically
+   smaller than <tt>N</tt>, you will save a lot of heap traffic.</p>
  
  </div>
  
@@ -1428,6 +1629,29 @@ factors, and produces a lot of malloc traffic.  It should be avoided.</p>
  
  </div>
  
+<!-- _______________________________________________________________________ -->
+<h4>
+  <a name="dss_immutableset">"llvm/ADT/ImmutableSet.h"</a>
+</h4>
+
+<div>
+
+<p>
+ImmutableSet is an immutable (functional) set implementation based on an AVL
+tree.
+Adding or removing elements is done through a Factory object and results in the
+creation of a new ImmutableSet object.
+If an ImmutableSet already exists with the given contents, then the existing one
+is returned; equality is compared with a FoldingSetNodeID.
+The time and space complexity of add or remove operations is logarithmic in the
+size of the original set.
+
+<p>
+There is no method for returning an element of the set; you can only check for
+membership.
+
+</div>
+
  
  <!-- _______________________________________________________________________ -->
  <h4>
@@ -1548,7 +1772,7 @@ pointers, or map other small types to each other.
  
  <p>
  There are several aspects of DenseMap that you should be aware of, however.  The
-iterators in a densemap are invalidated whenever an insertion occurs, unlike
+iterators in a DenseMap are invalidated whenever an insertion occurs, unlike
  map.  Also, because DenseMap allocates space for a large number of key/value
  pairs (it starts with 64 by default), it will waste a lot of space if your keys
  or values are large.  Finally, you must implement a partial specialization of
@@ -1556,6 +1780,14 @@ DenseMapInfo for the key that you want, if it isn't already supported.  This
  is required to tell DenseMap about two special marker values (which can never be
  inserted into the map) that it needs internally.</p>
  
+<p>
+DenseMap's find_as() method supports lookup operations using an alternate key
+type. This is useful in cases where the normal key type is expensive to
+construct, but cheap to compare against. The DenseMapInfo is responsible for
+defining the appropriate comparison and hashing methods for each alternate
+key type used.
+</p>
+
  </div>
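The idea of looking up with an alternate key type can be illustrated with the standard library. This sketch does not use DenseMap or find_as: it assumes std::map with the transparent comparator std::less<> as an analogy, and it is comparison-based where DenseMap relies on hashing and equality supplied by DenseMapInfo. NameTable and lookup are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// The stored key type (std::string) is expensive to construct; std::less<>
// lets find() accept any type that can be compared against it.
using NameTable = std::map<std::string, int, std::less<>>;

// Looks up Name without building a temporary std::string key.
// Returns -1 when the name is absent.
int lookup(const NameTable &T, std::string_view Name) {
  auto It = T.find(Name);
  return It == T.end() ? -1 : It->second;
}

void lookupDemo() {
  NameTable T{{"entry", 0}, {"loop", 1}};
  assert(lookup(T, "loop") == 1);
  assert(lookup(T, "exit") == -1);
}
```

In both designs the container has to be told how the alternate key relates to the stored key: here through the comparator, and in DenseMap through the comparison and hashing methods that the DenseMapInfo defines.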
  
  <!-- _______________________________________________________________________ -->
@@ -1632,6 +1864,25 @@ it can be edited again.</p>
  
  </div>
  
+<!-- _______________________________________________________________________ -->
+<h4>
+  <a name="dss_immutablemap">"llvm/ADT/ImmutableMap.h"</a>
+</h4>
+
+<div>
+
+<p>
+ImmutableMap is an immutable (functional) map implementation based on an AVL
+tree.
+Adding or removing elements is done through a Factory object and results in the
+creation of a new ImmutableMap object.
+If an ImmutableMap already exists with the given key set, then the existing one
+is returned; equality is compared with a FoldingSetNodeID.
+The time and space complexity of add or remove operations is logarithmic in the
+size of the original map.
+
+</div>
+
  <!-- _______________________________________________________________________ -->
  <h4>
    <a name="dss_othermap">Other Map-Like Container Options</a>
@@ -1653,20 +1904,6 @@ always better.</p>
  
  </div>
  
-<!-- ======================================================================= -->
-<h3>
-  <a name="ds_string">String-like containers</a>
-</h3>
-
-<div>
-
-<p>
-TODO: const char* vs stringref vs smallstring vs std::string.  Describe twine,
-xref to #string_apis.
-</p>
-
-</div>
-
  <!-- ======================================================================= -->
  <h3>
    <a name="ds_bit">Bit storage containers (BitVector, SparseBitVector)</a>