Programming Languages Research Group: Git

Add support for sub-byte aligned writes to lib/Support/Endian.h

Summary:
As per Duncan's review for D12536, I extracted the sub-byte bit aligned
reading and writing code into lib/Support, and generalized it. Added calls from
BackpatchWord. Also added unittests.

Reviewers: dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13189

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248897 91177308-0d34-0410-b5e6-96231b3b80d8

Refactor computeKnownBits alignment handling code

Reviewed By: reames, hfinkel

Differential Revision: http://reviews.llvm.org/D12958

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248892 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions

This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248887 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] Adjust the variables set by LLVMConfig.cmake

When using LLVMConfig.cmake from an installed toolchain in order to build a
loadable pass using add_llvm_loadable_module LLVM_ENABLE_PLUGINS and
LLVM_PLUGIN_EXT must be set. Also make LLVM_DEFINITIONS be set to what it
actually is.

Differential Revision: http://reviews.llvm.org/D13214

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248884 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions

The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248878 91177308-0d34-0410-b5e6-96231b3b80d8

InstrProf: Don't call std::unique twice here

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248872 91177308-0d34-0410-b5e6-96231b3b80d8

Add unittest for new samle profile format.

http://reviews.llvm.org/D13145

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248870 91177308-0d34-0410-b5e6-96231b3b80d8

http://reviews.llvm.org/D13145

Support hierarachical sample profile format.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248865 91177308-0d34-0410-b5e6-96231b3b80d8

[safestack] Fix a stupid mix-up in the direct-tls code path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248863 91177308-0d34-0410-b5e6-96231b3b80d8

InstrProf: Add a missing const_cast from r248833

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248859 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set

to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248858 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Setup RBP correctly in Win64 funclet prologues

Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248857 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Ensure that funclets obey the x64 ABI

The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow. However, we'd insert code to
initialize the return address after stack adjustments. Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248839 91177308-0d34-0410-b5e6-96231b3b80d8

InstrProf: Support for value profiling in the indexed profile format

Add support to the indexed instrprof reader and writer for the format
that will be used for value profiling.

Patch by Betul Buyukkurt, with minor modifications.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248833 91177308-0d34-0410-b5e6-96231b3b80d8

HHVM calling conventions.

HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248832 91177308-0d34-0410-b5e6-96231b3b80d8

Fix test from r248825.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248827 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add support for pre- and post-index LDPSWs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248825 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Teach AsmPrinter about funclets

Summary:
Funclets have been turned into functions by the time they hit the object
file. Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248824 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-pdbdump] Add include-only filters.

PDB files have a lot of noise in them, with hundreds (or thousands)
of symbols from system libraries and compiler generated types.  If
you're only looking for a specific type, this can be problematic.

This CL allows you to display *only* types, variables, or compilands
matching a particular pattern.  These filters can even be combined
with exclude filters.  Include-only filters are given priority, so
that first the set of items to display is limited only to those that
match the include filters, and then the set of exclude filters is
applied to those.  If there are no include filters specified, then
it means "display everything".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248822 91177308-0d34-0410-b5e6-96231b3b80d8

Rename some function arguments in MachineBasicBlock.cpp/h by turning the first letter into upper case. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248821 91177308-0d34-0410-b5e6-96231b3b80d8

http://reviews.llvm.org/D13231

Change lookup functions to const functions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248818 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add integer pre- and post-index halfword/byte loads and stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248817 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r248810 which breaks tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248814 91177308-0d34-0410-b5e6-96231b3b80d8

Fix Clang-tidy modernize-use-nullptr warnings in examples and include directories; other minor cleanups.

Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13172

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248811 91177308-0d34-0410-b5e6-96231b3b80d8

http://reviews.llvm.org/D13231

Change lookup functions to const functions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248810 91177308-0d34-0410-b5e6-96231b3b80d8

Addition of interfaces the BE to conform to Table A-2 of ELF V2 ABI V1.1

This patch corresponds to review:
http://reviews.llvm.org/D13191

Back end portion of the fifth round of additions to altivec.h.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248809 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Scale offsets by the size of the memory operation. NFC.

The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored. This change gets
us one step closer to forming LDPSW instructions. This change also enables
pre- and post-indexing for halfword and byte loads and stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248804 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking] Lower dom-conditions-dom-blocks and dom-conditions-max-uses thresholds

On some of our benchmarks this change shows about 50% compile time improvement without any noticeable performance difference.

Differential Revision: http://reviews.llvm.org/D13248

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248801 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Remove some redundant cases. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248800 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] Move the setting of LLVM_COMPILER_IS_GCC_COMPATIBLE to a separate file

Currently LLVM_COMPILER_IS_GCC_COMPATIBLE is set as a side-effect of determining
the stdlib to use in HandleLLVMStdlib, which causes problems when attempting to
use AddLLVM from an installed LLVM toolchain, as HandleLLVMStdlib is not used.
Move the setting of this variable into DetermineGCCCompatible and include that
from both AddLLVM and HandleLLVMStdlib.

Differential Revision: http://reviews.llvm.org/D13216

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248798 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking] Teach isKnownNonZero about monotonically increasing PHIs

If a PHI starts at a non-negative constant, monotonically increases
(only adds of a constant are supported at the moment) and that add
does not wrap, then the PHI is known never to be zero.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248796 91177308-0d34-0410-b5e6-96231b3b80d8

Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248786 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Improve Vector Demanded Bits Through Bitcasts

Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.

Differential Revision: http://reviews.llvm.org/D12935

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248784 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Rename test files to match platform naming conventions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248783 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopUnswitch] Add block frequency analysis to recognize hot/cold regions

Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.

Reviewers: broune, silvas, dnovillo, reames

Subscribers: davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D11605

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248777 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] X86AsmParser: Prune redundant LINK_LIBS.

It is described in LLVMBuild.txt.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248771 91177308-0d34-0410-b5e6-96231b3b80d8

Move dbg.declare intrinsics when merging and replacing allocas.

Place new and update dbg.declare calls immediately after the
corresponding alloca.

Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248769 91177308-0d34-0410-b5e6-96231b3b80d8

RegisterPressure: LiveRegSet tracks register units not physregs

There are always more physical registers and register units so the
previous behaviour was correct but we can do with less memory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248767 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Fix ip2state table emission with funclets

Previously we were hijacking the old LandingPadInfo data structures to
communicate our state numbers. Now we don't need that anymore.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248763 91177308-0d34-0410-b5e6-96231b3b80d8

Fix unused variable warning in non-debug builds.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248754 91177308-0d34-0410-b5e6-96231b3b80d8

tidy up comments; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248750 91177308-0d34-0410-b5e6-96231b3b80d8

add a FIXME for a CPU model check that should have an attribute instead

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248746 91177308-0d34-0410-b5e6-96231b3b80d8

move one-use check under the comment that describes it; NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248745 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Don't crash on pointer comparisons

`ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the
operand type of the comparison it is given to an `IntegerType`. This is
incorrect because it could actually be simplifying a comparison between
two pointers. Switch it to using `getTypeSizeInBits` instead, which
does the right thing for both pointers and integers.

Fixed PR24956.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248743 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Factor switch into separate function

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248742 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix splitting x16 SMRD loads

When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248741 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix moving SMRD loads with literal offsets on CI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248740 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix splitting SMRD with large offset

The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting an an instruction missing an operand
which would assert later.

Test will be included in a following commit
which fixes a related issue.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248739 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add testcases

Make sure we are testing moving users
of the moved and split SMRD loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248738 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Cleanup test

Run instnamer on it, and rename check prefix.

This is in preparation for adding new testcases to cover
bugs on other subtargets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248737 91177308-0d34-0410-b5e6-96231b3b80d8

Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.

Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248735 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalOpt] Sort members of llvm.used deterministically

Patch by Jake VanAdrighem!

Summary:
Fix the way we sort the llvm.used and llvm.compiler.used members.

This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr.

Reviewers: silvas

Subscribers: silvas, llvm-commits

Differential Revision: http://reviews.llvm.org/D12851

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248728 91177308-0d34-0410-b5e6-96231b3b80d8

Improve performance of SimplifyInstructionsInBlock

1. Use a worklist, not a recursive approach, to avoid needless
   revisitation and being repeatedly forced to jump back to the
   start of the BB if a handle is invalidated.

2. Only insert operands to the worklist if they become unused
   after a dead instruction is removed, so we don’t have to
   visit them again in most cases.

3. Use a SmallSetVector to track the worklist.

4. Instead of pre-initting the SmallSetVector like in
   DeadCodeEliminationPass, only put things into the worklist
   if they have to be revisited after the first run-through.
   This minimizes how much the actual SmallSetVector gets used,
   which saves a lot of time.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248727 91177308-0d34-0410-b5e6-96231b3b80d8

[mips][p5600] Added P5600 processor and initial scheduler.

Summary:
The P5600 is an out-of-order, superscalar implementation of the MIPS32R5
architecture.

The scheduler has a few missing details (see the 'Tricky Instructions'
section and some quirks of the P5600 are deliberately omitted due to
implementation difficulty and low chance of significant benefit (e.g. the
predicate on P5600WriteEitherALU). However, testing on SingleSource is
showing significant performance benefits on some apps (seven in the 10-30%
range) and only one significant regression (12%) when
-pre-RA-sched=linearize is given. Without -pre-RA-sched=linearize the
results are more variable. Some do even better (up to 55% improvement) but
increased numbers of copies are slowing others down (up to 12%).

Overall, the scheduler as it currently stands is a 2.4% win with
-pre-RA-sched=linearize and a 2.7% win without -pre-RA-sched=linearize.
I'm sure we can improve on this further.

For completeness, the FPGA this was tested on shows some failures with and
without the P5600 scheduler. These appear to be scheduling related since
the two test runs have fairly different sets of failing tests even after
accounting for other factors (e.g. spurious connection failures) however
it's not P5600 specific since we also get some for the generic scheduler.

Reviewers: vkalintiris

Subscribers: mpf, llvm-commits, atrick, vkalintiris

Differential Revision: http://reviews.llvm.org/D12193

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248725 91177308-0d34-0410-b5e6-96231b3b80d8

Introduce !align metadata for load instruction

Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D12853

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248721 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] Fold simple known implications to true

This was split off of http://reviews.llvm.org/D13040 to make it easier to test the correctness of the implication logic. For the moment, this only handles a single easy case which shows up when eliminating and combining range checks. In the (near) future, I plan to extend this for other cases which show up in range checks, but I wanted to make those changes incrementally once the framework was in place.

At the moment, the implication logic will be used by three places. One in InstSimplify (this review) and two in SimplifyCFG (http://reviews.llvm.org/D13040 & http://reviews.llvm.org/D13070). Can anyone think of other locations this style of reasoning would make sense?

Differential Revision: http://reviews.llvm.org/D13074

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248719 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopReroll] Ignore debug intrinsics

Originally, debug intrinsics and annotation intrinsics may prevent
the loop to be rerolled, now they are ignored.

Differential Revision: http://reviews.llvm.org/D13150

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248718 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Support for direct call and call_indirect.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248716 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Handling of immediates bigger than 16 bits
Differential Revision: http://reviews.llvm.org/D10539

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248706 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall()

supportsTailCall() has two callers. Both of them double-check isThumb1Only(),
and refuse to proceed with tail-calling in that case.
Therefore, it makes sense to move this check to
ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized;
and to eliminate the extra checks at the call sites.

Following a review comment, added an "assert(supportsTailCall())"
in IsEligibleForTailCall.

NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248703 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking

When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.

Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248698 91177308-0d34-0410-b5e6-96231b3b80d8

Remove 'const' from some ArrayRefs. ArrayRefs are already immutable. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248693 91177308-0d34-0410-b5e6-96231b3b80d8

AsmWriter: Print the argument names in declarations while debugging

When llvm declarations have argument names, it's helpful to actually
print those names when debugging. Arguably, it'd be nice to print them
all the time, but that would mean the IR we output wouldn't round trip
through bitcode, which doesn't store the names.

Make the varous print() methods in AsmWriter optionally print "for
debug" and set that flag in the dump() methods. The only thing this
does differently for now is print the argument names in declarations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248692 91177308-0d34-0410-b5e6-96231b3b80d8

Silence clang warning: variable ‘Status’ set but not used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248691 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] identical instructions don't compute equal values

Before this change `HasSameValue` would return true for distinct
`alloca` instructions if they happened to be allocating the same
type (`alloca` instructions are not specified as reading memory). This
change adds an explicit whitelist of instruction types for which
"identical" instructions compute the same value.

Fixes PR24952.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248690 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] fold zexts and constants into a phi (PR24766)

This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
   bool conditionMet = false;
   switch (condition) {
   case 0: conditionMet = (x1 == x2); break;
   case 1: conditionMet = (x1 <= x2); break;
   }
   return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
   switch (condition) {
   case 0: return (x1 == x2);
   case 1: return (x1 <= x2);
   }
  return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it,
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248689 91177308-0d34-0410-b5e6-96231b3b80d8

[EH] Create removeUnwindEdge utility

Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.

Reviewers: andrew.w.kaylor, majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13152

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248677 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Removed unnecessary meta attributes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248672 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-mc-fuzzer] Fix -jobs option.

The fuzzer argument parser will ignore all options starting with '--' so
operation mode options should begin with '--' and fuzzer options should begin
with '-'. Fuzzer arguments must still follow --fuzzer-args so that they escape
the parsing performed by the CommandLine library.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248671 91177308-0d34-0410-b5e6-96231b3b80d8

[BranchProbability] Manually round the floating point output.

llvm::format compiles down to snprintf which has no defined rounding for
floating point arguments, and MSVC has implemented it differently from
what the BSD libcs and glibc do. Try to emulate the glibc rounding
behavior to avoid changing tests.

While there simplify code a bit and move trivial methods inline.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248665 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove hasPostISelHook from most instructions

Since this is only needed for VOP3 and a few other special
case instructions, stop setting it on everything.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248657 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Switch over reg class size instead of checking all super classes

This gets isSGPRClass out of my profile of SIFixSGPRCopies.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248656 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Don't handle invalid reg classes in helper functions

No tests hit these and it would be better to have checks like
this explicit where they are used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248655 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: address -Winconsistent-missing-override

Add missing override. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248652 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Set CopyCost of register classes

These require multiple mov instructions to copy,
but the default value is that 1 instruction is needed.
I'm not sure if this actually changes anything.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248651 91177308-0d34-0410-b5e6-96231b3b80d8

[Bug 24848] Use range metadata to constant fold comparisons between two values

Summary:
This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

If both operands of a comparison have range metadata, they should be used to constant fold the comparison.

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13177

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248650 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: VOP3b definition cleanups

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248647 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix sched model for VOP2b instructions

Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248646 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Rename several functions and types according to the new spec.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248644 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Don't generate clrex for pre-v7 targets.

Since r248294, we emit clrex, but it doesn't exist on v6.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248640 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts'

Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.

Depends on D12948
Depends on D12949

The original checkin, r248608 had to be backed out due to an issue with
a ObjCXX unit test. That issue is now fixed, so re-landing.

Reviewers: atrick, reames, majnemer, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12950

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248638 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Reapply 'Exploit A < B => (A+K) < (B+K) when possible'

Summary:

This change teaches SCEV's `isImpliedCond` two new identities:

  A u< B u< -C          =>  (A + C) u< (B + C)
  A s< B s< INT_MIN - C =>  (A + C) s< (B + C)

While these are useful on their own, they're really intended to support
D12950.

The original checkin, r248606 had to be backed out due to an issue with
a ObjCXX unit test.  That issue is now fixed, so re-landing.

Reviewers: atrick, reames, majnemer, nlewycky, hfinkel

Subscribers: aadg, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12948

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248637 91177308-0d34-0410-b5e6-96231b3b80d8

LivePhysRegs: Fix live-outs of return blocks

I realized that the live-out set computed for the return block is
missing the callee saved registers (the non-pristine ones to be exact).

This only affects the liveness computed for instructions inside the
function epilogue which currently none of the LivePhysRegs users in llvm
cares about, so this is just a drive-by fix without a testcase.

Differential Revision: http://reviews.llvm.org/D13180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248636 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] match De Morgan's Law hidden by zext ops (PR22723)

This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb %sil, %dil
xorb $1, %dil
movb %dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248634 91177308-0d34-0410-b5e6-96231b3b80d8

Use fixed-point representation for BranchProbability.

BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:

Pros:

1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.

Cons:

1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.

One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.

Differential revision: http://reviews.llvm.org/D12603

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248633 91177308-0d34-0410-b5e6-96231b3b80d8

SelectionDAGDumper: Print simple operands inline.

Print simple operands inline instead of their pointer/value number.
Simple operands are SDNodes without predecessors like Constant(FP), Register,
UNDEF. This unifies the behaviour with dumpr() which was already doing this.

Previously:
  t0: ch = EntryToken
    t1: i64 = Register %vreg0
  t2: i64,ch = CopyFromReg t0, t1
    t3: i64 = Constant<1>
  t4: i64 = add t2, t3
    t5: i64 = Constant<2>
  t6: i64 = add t2, t5
  t10: i64 = undef
  t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
  t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
  t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>

Now:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
  t4: i64 = add t2, Constant:i64<1>
  t6: i64 = add t2, Constant:i64<2>
  t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
  t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
  t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64

Differential Revision: http://reviews.llvm.org/D12567

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248628 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Construct new buffer instruction when moving SMRD

It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248627 91177308-0d34-0410-b5e6-96231b3b80d8

DAGCombiner: Check if store is volatile first

This is the simpler check. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248625 91177308-0d34-0410-b5e6-96231b3b80d8

TargetRegisterInfo: Introduce PrintLaneMask.

This makes it more convenient to print lane masks and lead to more
uniform printing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248624 91177308-0d34-0410-b5e6-96231b3b80d8

TargetRegisterInfo: Add typedef unsigned LaneBitmask and use it where apropriate; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248623 91177308-0d34-0410-b5e6-96231b3b80d8

merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)

This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248622 91177308-0d34-0410-b5e6-96231b3b80d8

PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint

The algorithm would not modify the live-in list of blocks below the save
block point which is correct unless it happens to be a restore point at
the same time.
Also fixes the benign issue of live-in registers being added twice in
some cases.

The testcase is based on a test submitted by Kit Barton.

Differential Revision: http://reviews.llvm.org/D13176

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248620 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Use .hsatext section instead of .text for HSA

Reviewers: arsenm, grosbach, rafael

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12424

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248619 91177308-0d34-0410-b5e6-96231b3b80d8

MCAsmInfo: Allow targets to specify when the .section directive should be omitted

Summary:
The default behavior is to omit the .section directive for .text, .data,
and sometimes .bss, but some targets may want to omit this directive for
other sections too.

The AMDGPU backend will uses this to emit a simplified syntax for section
switches. For example if the section directive is not omitted (current
behavior), section switches to .hsatext will be printed like this:

.section .hsatext,#alloc,#execinstr,#write

This is actually wrong, because .hsatext has some custom STT_* flags,
which MC doesn't know how to print or parse.

If the section directive is omitted (made possible by this commit),
section switches will be printed like this:

.hsatext

The motivation for this patch is to make it possible to emit sections
with custom STT_* flags without having to teach MC about all the target
specific STT_* flags.

Reviewers: rafael, grosbach

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12423

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248618 91177308-0d34-0410-b5e6-96231b3b80d8

MachineBasicBlock: Factor out common code into isReturnBlock()

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248617 91177308-0d34-0410-b5e6-96231b3b80d8

Revert two SCEV changes that caused test failures in clang.

r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible"
r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts."

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248614 91177308-0d34-0410-b5e6-96231b3b80d8

ADCE: Fix typo in file comment. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248613 91177308-0d34-0410-b5e6-96231b3b80d8

PeepholeOptimizer: Remove redundant copies

If a virtual register is copied and another copy was already
seen, replace with the previous copy. This only handles the
simplest cases for now.

This pattern shows up from various operand restrictions
AMDGPU has which require inserting copies depending
on the register class of the operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248611 91177308-0d34-0410-b5e6-96231b3b80d8

Simplify code. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248610 91177308-0d34-0410-b5e6-96231b3b80d8

more space; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248609 91177308-0d34-0410-b5e6-96231b3b80d8