Programming Languages Research Group: Git

[CMake] We need to explicitly add llvm-config before clang so that LLVM_BUILD_EXTERNAL_COMPILER_RT can depend on llvm-config.

This patch is a required stepping stone to fix PR14109.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249202 91177308-0d34-0410-b5e6-96231b3b80d8

inariant.group handling in GVN

The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249196 91177308-0d34-0410-b5e6-96231b3b80d8

[libFuzzer] remove experimental flag and functionality

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249194 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Fix CFG stackification of nested loops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249187 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Support calls marked as "tail", fastcc, and coldcc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249184 91177308-0d34-0410-b5e6-96231b3b80d8

Call the correct overload.

Call the correct overload so a string literal does not get converted to a bool.
Also fix the test case to match the names given.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249183 91177308-0d34-0410-b5e6-96231b3b80d8

[libFuzzer] add a flag -max_total_time

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249181 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Add a resize_memory intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249178 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Refactor out a createNodeForSelect

Summary:
We will shortly re-use this for select-like br-phi pairs.

Reviewers: atrick, joker-eph, joker.eph

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13377

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249177 91177308-0d34-0410-b5e6-96231b3b80d8

[Tests] Add one more case to LoopUnroll/pr18861.ll for better coverage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249174 91177308-0d34-0410-b5e6-96231b3b80d8

[Tests] Give meaningful names to blocks in LoopUnroll/pr18861.ll, add a description of what's going on.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249173 91177308-0d34-0410-b5e6-96231b3b80d8

[Tests] Slightly reduce test LoopUnroll/pr18861.ll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249172 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Add a memory_size intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249171 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Add verifier check for exec reads

Make sure we aren't accidentally not setting
these in the instruction definitions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249170 91177308-0d34-0410-b5e6-96231b3b80d8

Add way to test for generic TargetOpcodes

The alternative would be to add a bit to the target's
InstrFlags but that seems like a waste of a bit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249169 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Try to prove predicates by splitting them

Summary:
This change teaches SCEV that to prove `A u< B` it is sufficient to
prove each of these facts individually:

- B >= 0
- A s< B
- A >= 0

In practice, SCEV sometimes finds it easier to prove these facts
individually than to prove `A u< B` as one atomic step.

Reviewers: reames, atrick, nlewycky, hfinkel

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13042

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249168 91177308-0d34-0410-b5e6-96231b3b80d8

Actually switch the arch when we see .arch. PR21695

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249165 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: diagnose invalid local fixups on Thumb1

We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249164 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: correctly align constant pool value on Thumb1 targets.

Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249163 91177308-0d34-0410-b5e6-96231b3b80d8

[lit] Raise the default soft process limit when possible

It is common to have a default soft process limit, at least on some families of
Linux distributions, of 1024. This is normally more than enough, but if you
have many cores, and you're running tests that create many threads, this can
become a problem. My POWER7 development machine has 48 cores, and when running
the lld regression tests, which often want to create up to 48 threads, I run
into problems. lit, by default, will want to run 48 tests in parallel, and
48*48 < 1024, and so many tests fail like this:

terminate called after throwing an instance of 'std::system_error'

what(): Resource temporarily unavailable
or lit fails like this when launching a test:

OSError: [Errno 11] Resource temporarily unavailable

lit can easily detect this situation and attempt to repair it before launching
tests (by raising the soft process limit to something that will allow ncpus^2
threads to be created), and should do so to prevent spurious test failures.

This is the follow-up to this thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/090942.html

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249161 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Typo. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249153 91177308-0d34-0410-b5e6-96231b3b80d8

Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."

This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

%1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249147 91177308-0d34-0410-b5e6-96231b3b80d8

Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.

r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249124 91177308-0d34-0410-b5e6-96231b3b80d8

[mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Differential Revision: http://reviews.llvm.org/D13235

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249123 91177308-0d34-0410-b5e6-96231b3b80d8

[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.

This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

%1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249121 91177308-0d34-0410-b5e6-96231b3b80d8

DenseMap: we're trying to call the reserved global placement allocation
function here; use "::new" to avoid accidentally picking up a class-specific
operator new.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249112 91177308-0d34-0410-b5e6-96231b3b80d8

dsymutil: Also ignore the ByteSize when building the DeclContext cache for
clang modules.

Forward decls of ObjC interfaces don't have a bytesize.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249110 91177308-0d34-0410-b5e6-96231b3b80d8

[LibFuzzer] test_single_input option to run a single test case.

-test_single_input flag specifies a file name with test data.

Review URL: http://reviews.llvm.org/D13359

Patch by Mike Aizatsky!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249096 91177308-0d34-0410-b5e6-96231b3b80d8

[SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization

When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its use after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249092 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix unused variable warning in release build

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249091 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] XFAILing test while diagnosing backend error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249088 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Move SIFixSGPRLiveRanges to be a regalloc pass

Replace LiveInterval usage with LiveVariables. LiveIntervals
computes far more information than is needed for this pass
which just needs to find if an SGPR is live out of the
defining block.

LiveIntervals are not usually available that early, requiring
computing them twice which is very expensive. The extra run of
LiveIntervals/LiveVariables/SlotIndexes was costing in total
about 5% of compile time.

Continuing to use LiveIntervals is problematic. It seems
there is an option (early-live-intervals) to run the analysis
about where it should go to avoid recomputing LiveVariables,
but it seems to be completely broken with subreg liveness
enabled. There are also problems from trying to recompute
LiveIntervals since this seems to undo LiveVariables
and clearing kill flags, causing TwoAddressInstructions
to make bad decisions.

Insert the pass right after live variables and preserve it.
The tricky case to worry about might be phis since
LiveVariables doesn't count a register as live out if
in the successor block it is only used in a phi,
but I don't think this is a concern right now
because SIFixSGPRCopies replaces SGPR phis.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249087 91177308-0d34-0410-b5e6-96231b3b80d8

Fix relocation used for GOT references in non-PIC mode. Fix relocations
for "set" pseudo op in PIC mode.

Differential Revision: http://reviews.llvm.org/D13173

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249086 91177308-0d34-0410-b5e6-96231b3b80d8

[PATCH] D13360: [llvm-objdump] Teach -d about AArch64 mapping symbols

AArch64 uses $d* and $x* to interleave between text and data.
llvm-objdump didn't know about this so it ended up printing garbage.
This patch is a first step towards a solution of the problem.

Differential Revision: http://reviews.llvm.org/D13360

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249083 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Merge if and switch

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249082 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove dead code

There's no point in checking VReg_1 because all uses
of it should already have been removed by SILowerI1Copies.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249081 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Make SIInsertWaits about a factor of 4 faster

This was the slowest target custom pass and was spending 80%
of the time in getMinimalPhysRegClass which was called
for every register operand.

Try to use the statically known register class when possible from
the instruction's MCOperandInfo. There are a few pseudo instructions
which are not well behaved with unknown register classes which still
require the expensive physical register class search.

There are a few other possibilities for making this even faster,
such as not inspecting implicit operands. For now those are checked
because it is technically possible to have a scalar load into
exec or vcc which can be implicitly used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249079 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Emit __C_specific_handler tables for the new IR

We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249078 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] XFAILing test while diagnosing backend error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249075 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass

Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function. This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13356

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249073 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Stop BranchFolding from merging across funclets

BranchFolding would merge two funclets together, this is not OK.
Disable this and strengthen the assertion in FuncletLayout.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249069 91177308-0d34-0410-b5e6-96231b3b80d8

Kill another reference to in-source builds

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249067 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Make FuncletLayout more robust against catchret

Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of. Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249052 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Deprecate a command-line option used for testing.

Support for pairing unscaled loads and stores has been enabled since the
original ARM64 port. This feature is no longer experimental, AFAICT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249049 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Add some generic (floating point support) load instructions.

Add generic instructions for load complement, load negative and load positive
for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so
give scheduler more freedom. SystemZElimCompare pass will convert them when it
can to the CC-setting variants.

Regression tests updated to expect the new opcodes in places where the old ones
where used. New test case SystemZ/fp-cmp-05.ll checks that
SystemZCompareElim.cpp can handle the new opcodes.

README.txt updated (bullet removed).

Note that fp128 is not yet handled, because it is relatively rare, and is a
bit trickier, because of the fact that l.dfr would operate on the sign bit of
one of the subregisters of a fp128, but we would not want to copy the other
sub-reg in case src and dst regs are not the same.

Reviewed by Ulrich Weigand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249046 91177308-0d34-0410-b5e6-96231b3b80d8

Fix printing of 64 bit values and make test more strict.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249043 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add MEM_RAT STORE_TYPED.

v2: Add test (Matt).
    Fix capitalization of isEOP (Matt).
    Move pattern to class parameter (Matt).
    Make the instruction available to Cayman (Matt).
    Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.

Patch by: Zoltan Gilian

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249042 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Factor out EOP query.

v2: Fix brace placement and capitalization (Matt).

Patch by: Zoltan Gilian

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249041 91177308-0d34-0410-b5e6-96231b3b80d8

Reformat.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249033 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"

It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249032 91177308-0d34-0410-b5e6-96231b3b80d8

Use more strict types. NFC.

On 32 bit ELF these are 32 bit values.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249022 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Remove trivially empty lifetime start/end ranges.

Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249018 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Add assembly instructions for obtaining clock values as well as CPU features

Provide assembler support for STCK, STCKF, STCKE, and STFLE.

Author: joncmu
Differential Revision: http://reviews.llvm.org/D13299

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249015 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Hoist commonly failing check. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249011 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Rename variable to improve readability. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249008 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Update comment to reflect reality.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249007 91177308-0d34-0410-b5e6-96231b3b80d8

[mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions
Differential Revision: http://reviews.llvm.org/D10337

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249004 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer

Differential Revision: http://reviews.llvm.org/D13240

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249002 91177308-0d34-0410-b5e6-96231b3b80d8

[NaryReassociate] SeenExprs records WeakVH

Summary:
The instructions SeenExprs records may be deleted during rewriting.
FindClosestMatchingDominator should ignore these deleted instructions.

Fixes PR24301.

Reviewers: grosser

Subscribers: grosser, llvm-commits

Differential Revision: http://reviews.llvm.org/D13315

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248983 91177308-0d34-0410-b5e6-96231b3b80d8

Fix performance problem in long-running SectionMemoryManagers

Summary:
Without this patch, the memory manager would call `mprotect` on every memory
region it ever allocated whenever it wanted to finalize memory (i.e. not just
the ones it just allocated). This caused terrible performance problems for
long running memory managers. In one particular compile heavy julia benchmark,
we were spending 50% of time in `mprotect` if running under MCJIT.

Fix this by splitting allocated memory blocks into those on which memory
permissions have been set and those on which they haven't and only running
`mprotect` on the latter.

Reviewers: lhames

Subscribers: reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D13156

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248981 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init order

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12451

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248978 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objdump] Fix time of check to time of use bug.

There's already a test that covers this situation, so we should be
fine.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248976 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Enable -Wdeprecated in the cmake build now that LLVM (& Clang, Polly, and LLD) are -Wdeprecated clean"

This reverts commit r248963.

Seems there's some standard libraries (and libcxxabi implementations)
that aren't -Wdeprecated clean... hrm.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248972 91177308-0d34-0410-b5e6-96231b3b80d8

Update sample profile propagation algorithm.

http://reviews.llvm.org/D13218

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248968 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.

The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248965 91177308-0d34-0410-b5e6-96231b3b80d8

Enable -Wdeprecated in the cmake build now that LLVM (& Clang, Polly, and LLD) are -Wdeprecated clean

This particularly helps enforce the C++ Rule of 5 (for new move ops this
is already an error, but for a type only using C++98 features (copy
ctor/assign, dtor) it is only deprecated, not invalid)

Applying the flag for any GCC compatible compiler - GCC doesn't warn on
the Rule of 5 cases that C++11 deprecates, but it doesn't have other
false positives so far as I could see (compiling with GCC 4.8 didn't
produce any -Wdeprecated warnings I could spot).

Reviewers: aaron.ballman

Differential Revision: http://reviews.llvm.org/D13314

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248963 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Emit int3 after noreturn calls on Win64

The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call. Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248959 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] undef Relocation names in PowerPC*.def

glibc's PowerPC /usr/include/asm/sigcontext.h, has this:

  #ifdef __powerpc64__
  #include <asm/elf.h>
  #endif

and that contains defines of all of the relocation symbols, like this:

  #define R_PPC_NONE              0

and if that file is included prior to including
include/llvm/Support/ELFRelocs/PowerPC*.def, which we cannot in general
prevent, the result will fail.

As it turns out, this happens when compiling
lld/unittests/DriverTests/GnuLdDriverTest.cpp under PPC64/Linux, because:

  lld/include/lld/ReaderWriter/ELFLinkingContext.h includes
  lld/unittests/DriverTests/DriverTest.h which includes
  utils/unittest/googletest/include/gtest/gtest.h which includes
  utils/unittest/googletest/include/gtest/internal/gtest-internal.h which includes
  /usr/include/sys/wait.h which includes
  /usr/include/signal.h which includes
  /usr/include/bits/sigcontext.h which includes
  /usr/include/asm/sigcontext.h which includes
  /usr/include/asm/elf.h

the test could be fixed to include ReaderWriter/ELFLinkingContext.h before
including unittests/DriverTests/DriverTest.h, but dealing with this in the
*.def files is a more-general solution that localizes the fix to the headers
instead of requiring changes to an unbounded number of other source files (both
in-tree and external).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248957 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] enable machine combiner reassociations for 256-bit vector logical integer insts

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248955 91177308-0d34-0410-b5e6-96231b3b80d8

[libFuzzer] Marking exported symbols as visible. Patch by Mike Aizatsky

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248954 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Remove an unnecessary run line and other cleanup. NFC.

Unscaled load/store combining has been enabled since the initial ARM64 port. No
need for a redundance run. Also, add CHECK-LABEL directives.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248945 91177308-0d34-0410-b5e6-96231b3b80d8

[SLP] Don't vectorize loads of non-packed types (like i1, i2).

Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.

PS: Initial patch was proposed by Arnold.

Reviewers: aschwaighofer, nadav, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13277

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248943 91177308-0d34-0410-b5e6-96231b3b80d8

Fix -Wsign-compare warning

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248942 91177308-0d34-0410-b5e6-96231b3b80d8

Move dw_op_minus test to DebugInfo/X86.

The test requires X86 target support, and checks the actual debug
info contents, including register numbers which would be different on
other platforms.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248938 91177308-0d34-0410-b5e6-96231b3b80d8

Fix debug info with SafeStack.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248933 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Remove an unnecessary restriction on pre-index instructions.

Previously, the index was constrained to the size of the memory operation for
no apparent reason. This change removes that constraint so that we can form
pre-index instructions with any valid offset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248931 91177308-0d34-0410-b5e6-96231b3b80d8

DeadCodeElimination: rewrite to be faster

Same strategy as simplifyInstructionsInBlock. ~1/3 less time
on my test suite. This pass doesn't have many in-tree users,
but getting rid of an O(N^2) worst case and making it cleaner
should at least make it a viable alternative to ADCE, since
it's now consistently somewhat faster.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248927 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Disable shrink wrapping

Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248924 91177308-0d34-0410-b5e6-96231b3b80d8

SLPVectorizer: add a test to check if the minimum region size works.

This is an addition to rL248917.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248923 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Support for ARMv6-Z / ARMv6-ZK missing

As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.

In particular, there were no tests for TrustZone being supported in these
architectures.

The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.

Differential Revision: http://reviews.llvm.org/D13236

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248921 91177308-0d34-0410-b5e6-96231b3b80d8

SLPVectorizer: limit the scheduling region size per basic block.

Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit for the accumulate scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and external test suite).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248917 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Use helper function to improve readability. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248914 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.

This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Differential Revision: http://reviews.llvm.org/D13252

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248913 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] Make the bindir and libdir arguments to set_output_directory optional

When building a plugin against an installed LLVM toolchain using
add_llvm_loadable_module (in the documented manner) doesn't work as nothing sets
the *_OUTPUT_INTDIR variables causing an error when set_output_directory is
called. Making those arguments optional (causing the default output directory
to be used) fixes this.

Differential Revision: http://reviews.llvm.org/D13215

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248911 91177308-0d34-0410-b5e6-96231b3b80d8

Add support for sub-byte aligned writes to lib/Support/Endian.h

Summary:
As per Duncan's review for D12536, I extracted the sub-byte bit aligned
reading and writing code into lib/Support, and generalized it. Added calls from
BackpatchWord. Also added unittests.

Reviewers: dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13189

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248897 91177308-0d34-0410-b5e6-96231b3b80d8

Refactor computeKnownBits alignment handling code

Reviewed By: reames, hfinkel

Differential Revision: http://reviews.llvm.org/D12958

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248892 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions

This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248887 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] Adjust the variables set by LLVMConfig.cmake

When using LLVMConfig.cmake from an installed toolchain in order to build a
loadable pass using add_llvm_loadable_module LLVM_ENABLE_PLUGINS and
LLVM_PLUGIN_EXT must be set. Also make LLVM_DEFINITIONS be set to what it
actually is.

Differential Revision: http://reviews.llvm.org/D13214

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248884 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions

The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248878 91177308-0d34-0410-b5e6-96231b3b80d8

InstrProf: Don't call std::unique twice here

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248872 91177308-0d34-0410-b5e6-96231b3b80d8

Add unittest for new samle profile format.

http://reviews.llvm.org/D13145

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248870 91177308-0d34-0410-b5e6-96231b3b80d8

http://reviews.llvm.org/D13145

Support hierarachical sample profile format.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248865 91177308-0d34-0410-b5e6-96231b3b80d8

[safestack] Fix a stupid mix-up in the direct-tls code path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248863 91177308-0d34-0410-b5e6-96231b3b80d8

InstrProf: Add a missing const_cast from r248833

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248859 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set

to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248858 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Setup RBP correctly in Win64 funclet prologues

Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248857 91177308-0d34-0410-b5e6-96231b3b80d8

[WinEH] Ensure that funclets obey the x64 ABI

The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow. However, we'd insert code to
initialize the return address after stack adjustments. Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248839 91177308-0d34-0410-b5e6-96231b3b80d8

InstrProf: Support for value profiling in the indexed profile format

Add support to the indexed instrprof reader and writer for the format
that will be used for value profiling.

Patch by Betul Buyukkurt, with minor modifications.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248833 91177308-0d34-0410-b5e6-96231b3b80d8

HHVM calling conventions.

HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248832 91177308-0d34-0410-b5e6-96231b3b80d8

Fix test from r248825.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248827 91177308-0d34-0410-b5e6-96231b3b80d8