From ef25bf0438feb1d21001219d3569f8d7f666940b Mon Sep 17 00:00:00 2001
From: Richard Sandiford
Date: Wed, 15 May 2013 12:53:31 +0000
Subject: [PATCH] [SystemZ] Add more future work items to the README

Based on an analysis by Ulrich Weigand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181882 91177308-0d34-0410-b5e6-96231b3b80d8
---
 lib/Target/SystemZ/README.txt | 98 ++++++++++++++++++++++++++++++++---
 1 file changed, 91 insertions(+), 7 deletions(-)

diff --git a/lib/Target/SystemZ/README.txt b/lib/Target/SystemZ/README.txt
index d1f56a49168..8f5a5476b41 100644
--- a/lib/Target/SystemZ/README.txt
+++ b/lib/Target/SystemZ/README.txt
@@ -29,17 +29,44 @@ to load 103. This seems to be a general target-independent problem.
 
 --
 
-The tuning of the choice between Load Address (LA) and addition in
+The tuning of the choice between LOAD ADDRESS (LA) and addition in
 SystemZISelDAGToDAG.cpp is suspect. It should be tweaked based on
 performance measurements.
 
 --
 
+We don't support tail calls at present.
+
+--
+
+We don't support prefetching yet.
+
+--
+
 There is no scheduling support.
 
 --
 
-We don't use the Branch on Count or Branch on Index families of instruction.
+We don't use the BRANCH ON COUNT or BRANCH ON INDEX families of instruction.
+
+--
+
+We might want to use BRANCH ON CONDITION for conditional indirect calls
+and conditional returns.
+
+--
+
+We don't use the combined COMPARE AND BRANCH instructions. Using them
+would require a change to the way we handle out-of-range branches.
+At the moment, we start with 32-bit forms like BRCL and shorten them
+to forms like BRC where possible, but COMPARE AND BRANCH does not have
+a 32-bit form.
+
+--
+
+We should probably model just CC, not the PSW as a whole. Strictly
+speaking, every instruction changes the PSW since the PSW contains the
+current instruction address.
 
 --
 
@@ -54,7 +81,30 @@ equality after an integer comparison, etc.
 
 --
 
-We don't optimize string and block memory operations.
+We don't use the LOAD AND TEST or TEST DATA CLASS instructions.
+
+--
+
+We could use the generic floating-point forms of LOAD COMPLEMENT,
+LOAD NEGATIVE and LOAD POSITIVE in cases where we don't need the
+condition codes. For example, we could use LCDFR instead of LCDBR.
+
+--
+
+We don't optimize block memory operations.
+
+It's definitely worth using things like MVC, CLC, NC, XC and OC with
+constant lengths. MVCIN may be worthwhile too.
+
+We should probably implement things like memcpy using MVC with EXECUTE.
+Likewise memcmp and CLC. MVCLE and CLCLE could be useful too.
+
+--
+
+We don't optimize string operations.
+
+MVST, CLST, SRST and CUSE could be useful here. Some of the TRANSLATE
+family might be too, although they are probably more difficult to exploit.
 
 --
 
@@ -63,9 +113,33 @@ conventions require f128s to be returned by invisible reference.
 
 --
 
+ADD LOGICAL WITH SIGNED IMMEDIATE could be useful when we need to
+produce a carry. SUBTRACT LOGICAL IMMEDIATE could be useful when we
+need to produce a borrow. (Note that there are no memory forms of
+ADD LOGICAL WITH CARRY and SUBTRACT LOGICAL WITH BORROW, so the high
+part of 128-bit memory operations would probably need to be done
+via a register.)
+
+--
+
+We don't use the halfword forms of LOAD REVERSED and STORE REVERSED
+(LRVH and STRVH).
+
+--
+
+We could take advantage of the various ... UNDER MASK instructions,
+such as ICM and STCM.
+
+--
+
+We could make more use of the ROTATE AND ... SELECTED BITS instructions.
+At the moment we only use RISBG, and only then for subword atomic operations.
+
+--
+
 DAGCombiner can detect integer absolute, but there's not yet an associated
-ISD opcode. We could add one and implement it using Load Positive.
-Negated absolutes could use Load Negative.
+ISD opcode. We could add one and implement it using LOAD POSITIVE.
+Negated absolutes could use LOAD NEGATIVE.
 
 --
 
@@ -142,5 +216,15 @@ See CodeGen/SystemZ/alloca-01.ll for an example.
 --
 
 Atomic loads and stores use the default compare-and-swap based implementation.
-This is probably much too conservative in practice, and the overhead is
-especially bad for 8- and 16-bit accesses.
+This is much too conservative in practice, since the architecture guarantees
+that 1-, 2-, 4- and 8-byte loads and stores to aligned addresses are
+inherently atomic.
+
+--
+
+If needed, we can support 16-byte atomics using LPQ, STPQ and CSDG.
+
+--
+
+We might want to model all access registers and use them to spill
+32-bit values.
-- 
2.34.1