AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257625 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Add s_waitcnt at the end of non-void functions Summary: v2: Make ReturnsVoid private, so that I can add another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257622 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Add support for non-void functions Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored. v2: don't do this for compute shaders Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16033 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257621 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Allow any number of PS inputs Summary: With the ability to concatenate shader binaries, the limit of 15 no longer applies. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16031 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257592 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Add new target attribute InitialPSInputAddr Summary: This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM. The register assigns VGPR locations to PS inputs, while the ENA register determines whether or not they are loaded. Mesa needs to set some inputs as not-movable, so that a pixel shader prolog binary appended at the beginning can assume where some inputs are. v2: Make PSInputAddr private, because there are never enough silly getters and setters for people to read. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16030 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257591 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA target It might be better to let this be a select failure instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257386 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Pattern match ffbh pattern to instruction. The hardware instruction's output on 0 is -1 rather than 32. Eliminate a test and select to -1. This removes an extra instruction from the compatibility function with HSAIL's firstbit instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257352 91177308-0d34-0410-b5e6-96231b3b80d8
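The equivalence this commit relies on can be sketched in Python. This is only an illustrative model, not the LLVM code: the function names `ctlz32`, `firstbit_via_select`, and `ffbh_u32_model` are hypothetical, and the hardware behaviour is modelled solely from the description above (result is -1 for an input of 0).

```python
def ctlz32(x):
    """Generic count-leading-zeros on a 32-bit value; yields 32 for an input of 0."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def firstbit_via_select(x):
    """The source pattern: test for zero and select -1, otherwise use ctlz."""
    x &= 0xFFFFFFFF
    return -1 if x == 0 else ctlz32(x)


def ffbh_u32_model(x):
    """Hypothetical model of the hardware instruction as described: -1 on 0."""
    x &= 0xFFFFFFFF
    if x == 0:
        return -1
    count = 0
    mask = 0x80000000
    # Walk down from the most significant bit until a set bit is found.
    while not (x & mask):
        count += 1
        mask >>= 1
    return count
```

Because the two functions agree on every input, including 0, the compare and select in the source pattern are redundant once the instruction is used directly.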
AMDGPU: Remove dead target dag combine git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257344 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions Summary: We were previously selecting all constant loads to SMRD instructions and legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopies pass. This new solution is simpler and also generates much better code, because the instruction selector is able to take advantage of all the MUBUF addressing modes that our legalization pass wasn't able to. We also no longer need to generate v_add_* instructions when we have a uniform pointer and a non-uniform offset, as this is now folded into the MUBUF instruction during instruction selection. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15425 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255672 91177308-0d34-0410-b5e6-96231b3b80d8
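The selection rule described in this commit can be sketched as a toy Python model. It is an assumption-laden illustration, not the actual instruction selector: `select_constant_load` and the returned dictionary keys are hypothetical names, and the only facts taken from the commit message are that non-uniform addresses now select to MUBUF and that a uniform pointer plus non-uniform offset no longer needs a separate v_add_*.

```python
def select_constant_load(base_uniform, offset_uniform):
    """Toy model: pick an instruction class for a constant load.

    base_uniform / offset_uniform say whether the pointer and the offset
    hold the same value in every lane of the wave.
    """
    if base_uniform and offset_uniform:
        # Fully uniform address: the scalar memory path still applies.
        return {"class": "SMRD", "needs_v_add": False}
    if base_uniform and not offset_uniform:
        # Uniform pointer with a per-lane offset: folded into the MUBUF
        # addressing mode, so no separate v_add_* is emitted.
        return {"class": "MUBUF", "needs_v_add": False}
    # Non-uniform pointer: selected to MUBUF. Whether extra address
    # arithmetic is needed here is not stated in the commit message.
    return {"class": "MUBUF", "needs_v_add": None}
```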
AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsics Summary: These are meant to be used instead of the llvm.SI.fs.interp intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15474 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255651 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Use generic bitreverse intrinsic Also fix bug in vector legalization for bitreverse. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255512 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Emit constant arrays in the .text section Summary: This allows us to remove the END_OF_TEXT_LABEL hack we had been using and simplifies the fixups used to compute the address of constant arrays. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15257 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255204 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraints Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs. Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15342 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255203 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Implement isNoopAddrSpaceCast git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254468 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now dead Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15050 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254427 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Fix unused function git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254333 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Rework how the private buffer is passed for HSA If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed in and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the prologue. This also enables only the input registers that are really required instead of always enabling all of them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254331 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Rename enums to be consistent with HSA code object terminology git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254330 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Remove SIPrepareScratchRegs It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254329 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU: Use assert zext for workgroup sizes git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254328 91177308-0d34-0410-b5e6-96231b3b80d8