LLE 6/6: Add LoopLoadElimination pass Summary: The goal of this pass is to perform store-to-load forwarding across the backedge of a loop. E.g.: for (i) A[i + 1] = A[i] + B[i] => T = A[0] for (i) T = T + B[i] A[i + 1] = T The pass relies on loop dependence analysis via LoopAccessAnalisys to find opportunities of loop-carried dependences with a distance of one between a store and a load. Since it's using LoopAccessAnalysis, it was easy to also add support for versioning away may-aliasing intervening stores that would otherwise prevent this transformation. This optimization is also performed by Load-PRE in GVN without the option of multi-versioning. As was discussed with Daniel Berlin in http://reviews.llvm.org/D9548, this is inferior to a more loop-aware solution applied here. Hopefully, we will be able to remove some complexity from GVN/MemorySSA as a consequence. In the long run, we may want to extend this pass (or create a new one if there is little overlap) to also eliminate loop-indepedent redundant loads and store that *require* versioning due to may-aliasing intervening stores/loads. I have some motivating cases for store elimination. My plan right now is to wait for MemorySSA to come online first rather than using memdep for this. The main motiviation for this pass is the 456.hmmer loop in SPECint2006 where after distributing the original loop and vectorizing the top part, we are left with the critical path exposed in the bottom loop. Being able to promote the memory dependence into a register depedence (even though the HW does perform store-to-load fowarding as well) results in a major gain (~20%). This gain also transfers over to x86: it's around 8-10%. Right now the pass is off by default and can be enabled with -enable-loop-load-elim. On the LNT testsuite, there are two performance changes (negative number -> improvement): 1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the critical paths is reduced 2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this outside of LNT The pass is scheduled after the loop vectorizer (which is after loop distribution). The rational is to try to reuse LAA state, rather than recomputing it. The order between LV and LLE is not critical because normally LV does not touch scalar st->ld forwarding cases where vectorizing would inhibit the CPU's st->ld forwarding to kick in. LoopLoadElimination requires LAA to provide the full set of dependences (including forward dependences). LAA is known to omit loop-independent dependences in certain situations. The big comment before removeDependencesFromMultipleStores explains why this should not occur for the cases that we're interested in. Reviewers: dberlin, hfinkel Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D13259 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252017 91177308-0d34-0410-b5e6-96231b3b80d8
[PM] Port ADCE to the new pass manager git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251725 91177308-0d34-0410-b5e6-96231b3b80d8
[PM] Port SROA to the new pass manager. In some ways this is a very boring port to the new pass manager as there are no interesting analyses or dependencies or other oddities. However, this does introduce the first good example of a transformation pass with non-trivial state porting to the new pass manager. I've tried to carve out patterns here to replicate elsewhere, and would appreciate comments on whether folks like these patterns: - A common need in the new pass manager is to effectively lift the pass class and some of its state into a public header file. Prior to this, LLVM used anonymous namespaces to provide "module private" types and utilities, but that doesn't scale to cases where a public header file is needed and the new pass manager will exacerbate that. The pattern I've adopted here is to use the namespace-cased-name of the core pass (what would be a module if we had them) as a module-private namespace. Then utility and other code can be declared and defined in this namespace. At some point in the future, we could even have (conditionally compiled) code that used modules features when available to do the same basic thing. - I've split the actual pass run method in two in order to expose a private method usable by the old pass manager to wrap the new class with a minimum of duplicated code. I actually looked at a bunch of ways to automate or generate these, but they are all quite terrible IMO. The fundamental need is to extract the set of analyses which need to cross this interface boundary, and that will end up being too unpredictable to effectively encapsulate IMO. This is also a relatively small amount of boiler plate that will live a relatively short time, so I'm not too worried about the fact that it is boiler plate. The rest of the patch is totally boring but results in a massive diff (sorry). It just moves code around and removes or adds qualifiers to reflect the new name and nesting structure. Differential Revision: http://reviews.llvm.org/D12773 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247501 91177308-0d34-0410-b5e6-96231b3b80d8
[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are *within* a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loop holes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247167 91177308-0d34-0410-b5e6-96231b3b80d8
Convert SampleProfile pass into a Module pass. Eventually, we will need sample profiles to be incorporated into the inliner's cost models. To do this, we need the sample profile pass to be a module pass. This patch makes no functional changes beyond the mechanical adjustments needed to run SampleProfile as a module pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245940 91177308-0d34-0410-b5e6-96231b3b80d8
[PM/AA] Hoist the interface to TBAA into a dedicated header along with its creation function. Update the relevant includes accordingly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245019 91177308-0d34-0410-b5e6-96231b3b80d8
[PM/AA] Hoist ScopedNoAliasAA's interface into a header and move the creation function there. Same basic refactoring as the other alias analyses. Nothing special required this time around. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245012 91177308-0d34-0410-b5e6-96231b3b80d8
[PM/AA] Hoist the interface for BasicAA into a header file. This is the first mechanical step in preparation for making this and all the other alias analysis passes available to the new pass manager. I'm factoring out all the totally boring changes I can so I'm moving code around here with no other changes. I've even minimized the formatting churn. I'll reformat and freshen comments on the interface now that its located in the right place so that the substantive changes don't triger this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244197 91177308-0d34-0410-b5e6-96231b3b80d8
Add a speculative execution pass Summary: This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%. Credit goes to Jingyue Wu for writing an earlier version of this pass. Patched by Bjarke Roune. Test Plan: This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll The pass is controlled by a flag which defaults to having the pass not run. Reviewers: eliben, dberlin, meheff, jingyue, hfinkel Reviewed By: jingyue, hfinkel Subscribers: majnemer, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9360 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237459 91177308-0d34-0410-b5e6-96231b3b80d8
New Loop Distribution pass Summary: This implements the initial version as was proposed earlier this year (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html). Since then Loop Access Analysis was split out from the Loop Vectorizer and was made into a separate analysis pass. Loop Distribution becomes the second user of this analysis. The pass is off by default and can be enabled with -enable-loop-distribution. There is currently no notion of profitability; if there is a loop with dependence cycles, the pass will try to split them off from other memory operations into a separate loop. I decided to remove the control-dependence calculation from this first version. This and the issues with the PDT are actively discussed so it probably makes sense to treat it separately. Right now I just mark all terminator instruction required which keeps identical CFGs for each distributed loop. This seems to be working pretty well for 456.hmmer where even though there is an empty if-then block in the distributed loop initially, it gets completely removed. The pass keeps DominatorTree and LoopInfo updated. I've tested this with -loop-distribute-verify with the testsuite where we distribute ~90 loops. SimplifyLoop is violated in some cases and I have a FIXME covering this. Reviewers: hfinkel, nadav, aschwaighofer Reviewed By: aschwaighofer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8831 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237358 91177308-0d34-0410-b5e6-96231b3b80d8
Simplify n-ary adds by reassociation Summary: This transformation reassociates a n-ary add so that the add can partially reuse existing instructions. For example, this pass can simplify void foo(int a, int b) { bar(a + b); bar((a + 2) + b); } to void foo(int a, int b) { int t = a + b; bar(t); bar(t + 2); } saving one add instruction. Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357). Test Plan: nary-add.ll Reviewers: broune, dberlin, hfinkel, meheff, sanjoy, atrick Reviewed By: sanjoy, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8950 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234855 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r233175 and r233183: float2int. This re-adds float2int to the tree, after fixing PR23038. It turns out the argument to APSInt() is true-if-unsigned, rather than true-if-signed :(. Added testcase and explanatory comment. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233370 91177308-0d34-0410-b5e6-96231b3b80d8
Revert r233175 and r233183 with it. This pulls float2int back out of the tree, due to PR23038. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233350 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r233062: "float2int": Add a new pass to demote from float to int where possible. Now with a fix for PR23008 and extra regression test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233175 91177308-0d34-0410-b5e6-96231b3b80d8
Revert r233062 ""float2int": Add a new pass to demote from float to int where possible." This caused PR23008, compiles failing with: "Use still stuck around after Def is destroyed: %.sroa.speculated" Also reverting follow-up r233064. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233105 91177308-0d34-0410-b5e6-96231b3b80d8
"float2int": Add a new pass to demote from float to int where possible. It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used. This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers. This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point. To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233062 91177308-0d34-0410-b5e6-96231b3b80d8
Verifier: Remove the separate -verify-di pass Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass. Instead, call into the `DebugInfoVerifier` from inside `VerifierLegacyPass::finalizeModule()`. This better matches the logic in `verifyModule()` (used by the new PassManager), avoids requiring two separate passes to verify the IR, and makes the API for "add a pass to verify the IR" simple. Note: the `-verify-debug-info` flag still works (for now, at least; eventually it might make sense to just remove it). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232772 91177308-0d34-0410-b5e6-96231b3b80d8
Add a new pass "Loop Interchange" This pass interchanges loops to provide a more cache-friendly memory access. For e.g. given a loop like - for(int i=0;i<N;i++) for(int j=0;j<N;j++) A[j][i] = A[j][i]+B[j][i]; is interchanged to - for(int j=0;j<N;j++) for(int i=0;i<N;i++) A[j][i] = A[j][i]+B[j][i]; This pass is currently disabled by default. To give a brief introduction it consists of 3 stages- LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix. LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time. LoopInterchangeTransform : Which does the actual transform. LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks. TODO: 1) Add support for reductions and lcssa phi. 2) Improve profitability model. 3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange. 4) Improve compile time regression found in llvm lnt due to this pass. 5) Fix issues in Dependency Analysis module. A special thanks to Hal for reviewing this code. Review: http://reviews.llvm.org/D7499 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231458 91177308-0d34-0410-b5e6-96231b3b80d8
Add a pass for constructing gc.statepoint sequences w/explicit relocations This patch consists of a single pass whose only purpose is to visit previous inserted gc.statepoints which do not have gc.relocates inserted yet, and insert them. This can be used either immediately after IR generation to perform 'early safepoint insertion' or late in the pass order to perform 'late insertion'. This patch is setting the stage for work to continue in tree. In particular, there are known naming and style violations in the current patch. I'll try to get those resolved over the next week or so. As I touch each area to make style changes, I need to make sure we have adequate testing in place. As part of the cleanup, I will be cleaning up a collection of test cases we have out of tree and submitting them upstream. The tests included in this change are very basic and mostly to provide examples of usage. The pass has several main subproblems it needs to address: - First, it has identify any live pointers. In the current code, the use of address spaces to distinguish pointers to GC managed objects is hard coded, but this will become parametrizable in the near future. Note that the current change doesn't actually contain a useful liveness analysis. It was seperated into a followup change as the code wasn't ready to be shared. Instead, the current implementation just considers any dominating def of appropriate pointer type to be live. - Second, it has to identify base pointers for each live pointer. This is a fairly straight forward data flow algorithm. - Third, the information in the previous steps is used to actually introduce rewrites. Rather than trying to do this by hand, we simply re-purpose the code behind Mem2Reg to do this for us. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229945 91177308-0d34-0410-b5e6-96231b3b80d8
[LoopAccesses] Create the analysis pass This is a function pass that runs the analysis on demand. The analysis can be initiated by querying the loop access info via LAA::getInfo. It either returns the cached info or runs the analysis. Symbolic stride information continues to reside outside of this analysis pass. We may move it inside later but it's not a priority for me right now. The idea is that Loop Distribution won't support run-time stride checking at least initially. This means that when querying the analysis, symbolic stride information can be provided optionally. Whether stride information is used can invalidate the cache entry and rerun the analysis. Note that if the loop does not have any symbolic stride, the entry should be preserved across Loop Distribution and LV. Since currently the only user of the pass is LV, I just check that the symbolic stride information didn't change when using a cached result. On the LV side, LoopVectorizationLegality requests the info object corresponding to the loop from the analysis pass. A large chunk of the diff is due to LAI becoming a pointer from a reference. A test will be added as part of the -analyze patch. Also tested that with AVX, we generate identical assembly output for the testsuite (including the external testsuite) before and after. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229893 91177308-0d34-0410-b5e6-96231b3b80d8