Fix the order of these sections of the release notes.

diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index d4ffe54e2c2d18768739522ebf4c625e5e7653e1..9ddbbb765bfd46878c9461efbc0d7d3c9f915234 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -292,7 +292,6 @@ Major New Features
     strong phi elim
     loop dependence analysis
     CorrelatedValuePropagation
-   lib/Transforms/IPO/MergeFunctions.cpp => consider for 3.2.
     Integrated assembler on by default for arm/thumb?
  
    Near dead:
@@ -352,7 +351,21 @@ We vectorize under the following loops:
     '``noalias``' and are checked at runtime.
  #. ...
  
-SROA - We've re-written SROA to be significantly more powerful.
+SROA - We've re-written SROA to be significantly more powerful and generate
+code which is much more friendly to the rest of the optimization pipeline.
+Previously this pass had scaling problems that required it to only operate on
+relatively small aggregates, and at times it would mistakenly replace a large
+aggregate with a single very large integer in order to make it a scalar SSA
+value. The result was a large number of i1024 and i2048 values representing any
+small stack buffer. These in turn slowed down many subsequent optimization
+paths.
+
+The new SROA pass uses a different algorithm that allows it to only promote to
+scalars the pieces of the aggregate actively in use. Because of this it doesn't
+require any thresholds. It also always deduces the scalar values from the uses
+of the aggregate rather than the specific LLVM type of the aggregate. These
+features combine to both optimize more code with the pass but to improve the
+compile time of many functions dramatically.
  
  #. Branch weight metadata is preseved through more of the optimizer.
  #. ...
@@ -373,14 +386,6 @@ Post <http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html>`_.
  Target Independent Code Generator Improvements
  ----------------------------------------------
  
-Stack Coloring - We have implemented a new optimization pass to merge stack
-objects which are used in disjoin areas of the code.  This optimization reduces
-the required stack space significantly, in cases where it is clear to the
-optimizer that the stack slot is not shared.  We use the lifetime markers to
-tell the codegen that a certain alloca is used within a region.
-
-We now merge consecutive loads and stores.
-
  We have put a significant amount of work into the code generator
  infrastructure, which allows us to implement more aggressive algorithms and
  make it run faster:
@@ -395,6 +400,14 @@ which can be queried to determine legal groupings of instructions in a bundle.
  We have added a new target independent VLIW packetizer based on the DFA
  infrastructure to group machine instructions into bundles.
  
+Stack Coloring - We have implemented a new optimization pass to merge stack
+objects which are used in disjoin areas of the code.  This optimization reduces
+the required stack space significantly, in cases where it is clear to the
+optimizer that the stack slot is not shared.  We use the lifetime markers to
+tell the codegen that a certain alloca is used within a region.
+
+We now merge consecutive loads and stores.
+
  Basic Block Placement
  ^^^^^^^^^^^^^^^^^^^^^