From f390c22a9378fdad6919213ff4e23cdc7cf9c2d3 Mon Sep 17 00:00:00 2001 From: Tim Northover Date: Thu, 16 Jan 2014 12:29:55 +0000 Subject: [PATCH] ReMat: fix overly cavalier attitude to sub-register indices There are two attempted optimisations in reMaterializeTrivialDef, trying to avoid promoting the size of a register too much when rematerializing. Unfortunately, both appear to be flawed. First, we see if the original register would have worked, but this is inadequate. Consider: v1 = SOMETHING (v1 is QQ) v2:Q0 = COPY v1:Q1 (v1, v2 are QQ) ... uses of v2 In this case even though v2 *could* be used directly as the output of SOMETHING, this would set the wrong bits of the QQ register involved. The correct rematerialization must be: v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ) ... uses of v2:Q1_Q2 For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try to hunt for a class between v1 and v2 that works. Unfortunately, this is also wrong: v1 = SOMETHING (v1 is QQ) v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ) ... uses of v2 as a QQQ The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However current logic would decide that v2 could be a QQ (no interest is taken in later uses). This patch, therefore, always accepts the widened register class without trying to be clever. Generally there is no penalty to this (e.g. in the common GR32 < GR64 case, expanding the width doesn't matter because it's not like you were going to do anything else with the high bits of a GR32 register). It can increase register pressure in cases like the ARM VFP regs though (multiple non-overlapping but equivalent subregisters). This situation can be spotted by the fact that both source and destination in the not-quite-coalesced pair have a sub-register index and rematerialisation is skipped in that situation. Unfortunately, no in-tree targets actually expose this as far as I can tell (there are so few isAsCheapAsAMove instructions for it to trigger on) so I've been unable to produce a test. It was exposed in our ARM64 SPEC tests though, and I will be adding a test there that we should be able to contribute soon(TM). rdar://problem/15775279 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199376 91177308-0d34-0410-b5e6-96231b3b80d8 --- lib/CodeGen/RegisterCoalescer.cpp | 45 +++++++++++++++---------------- 1 file changed, 21 insertions(+), 24 deletions(-) diff --git a/lib/CodeGen/RegisterCoalescer.cpp b/lib/CodeGen/RegisterCoalescer.cpp index efe7161e16f..6fc4bffdcf9 100644 --- a/lib/CodeGen/RegisterCoalescer.cpp +++ b/lib/CodeGen/RegisterCoalescer.cpp @@ -769,6 +769,14 @@ bool RegisterCoalescer::reMaterializeTrivialDef(CoalescerPair &CP, if (DstOperand.getSubReg() && !DstOperand.isUndef()) return false; + // If both SrcIdx and DstIdx are set, correct rematerialization would widen + // the register substantially (beyond both source and dest size). This is bad + // for performance since it can cascade through a function, introducing many + // extra spills and fills (e.g. ARM can easily end up copying QQQQPR registers + // around after a few subreg copies). + if (SrcIdx && DstIdx) + return false; + const TargetRegisterClass *DefRC = TII->getRegClass(MCID, 0, TRI, *MF); if (!DefMI->isImplicitDef()) { if (TargetRegisterInfo::isPhysicalRegister(DstReg)) { @@ -816,30 +824,19 @@ bool RegisterCoalescer::reMaterializeTrivialDef(CoalescerPair &CP, } if (TargetRegisterInfo::isVirtualRegister(DstReg)) { - unsigned NewIdx; - const TargetRegisterClass *RCForInst; - - if (MRI->constrainRegClass(DstReg, DefRC)) { - // The materialized instruction is quite capable of setting DstReg - // directly, but it may still have a now-trivial subregister index which - // we should clear. - NewMI->getOperand(0).setSubReg(0); - } else if ((NewIdx = NewMI->getOperand(0).getSubReg()) && - (RCForInst = TRI->getMatchingSuperRegClass( - MRI->getRegClass(DstReg), DefRC, NewIdx))) { - // The subreg index on NewMI is essential; we still have to make sure - // DstReg:idx is in a class that NewMI can use. - MRI->constrainRegClass(DstReg, RCForInst); - } else { - // DstReg is actually incompatible with NewMI, we have to move to a - // super-reg's class. This could come from a sequence like: - // GR32 = MOV32r0 - // GR8 = COPY GR32:sub_8 - MRI->setRegClass(DstReg, CP.getNewRC()); - updateRegDefsUses(DstReg, DstReg, DstIdx); - NewMI->getOperand(0).setSubReg( - TRI->composeSubRegIndices(SrcIdx, DefMI->getOperand(0).getSubReg())); - } + const TargetRegisterClass *NewRC = CP.getNewRC(); + unsigned NewIdx = NewMI->getOperand(0).getSubReg(); + + if (NewIdx) + NewRC = TRI->getMatchingSuperRegClass(NewRC, DefRC, NewIdx); + else + NewRC = TRI->getCommonSubClass(NewRC, DefRC); + + assert(NewRC && "subreg chosen for remat incompatible with instruction"); + MRI->setRegClass(DstReg, NewRC); + + updateRegDefsUses(DstReg, DstReg, DstIdx); + NewMI->getOperand(0).setSubReg(NewIdx); } else if (NewMI->getOperand(0).getReg() != CopyDstReg) { // The New instruction may be defining a sub-register of what's actually // been asked for. If so it must implicitly define the whole thing. -- 2.34.1