//===---------------------------------------------------------------------===//
-How about intrinsics? An example is:
- *res = _mm_mulhi_epu16(*A, _mm_mul_epu32(*B, *C));
-
-compiles to
- pmuludq (%eax), %xmm0
- movl 8(%esp), %eax
- movdqa (%eax), %xmm1
- pmulhuw %xmm0, %xmm1
-
-The transformation probably requires a X86 specific pass or a DAG combiner
-target specific hook.
-
-//===---------------------------------------------------------------------===//
-
In many cases, LLVM generates code like this:
_test:
//===---------------------------------------------------------------------===//
-Start using the flags more. For example, compile:
+Use the FLAGS values from arithmetic instructions more. For example, compile:
int add_zf(int *x, int y, int a, int b) {
if ((*x += y) == 0)
movl %ecx, %eax
ret
-and:
-
-int add_zf(int *x, int y, int a, int b) {
- if ((*x + y) < 0)
- return a;
- else
- return b;
-}
-
-to:
-
-add_zf:
- addl (%rdi), %esi
- movl %edx, %eax
- cmovns %ecx, %eax
- ret
-
-instead of:
-
-_add_zf:
- addl (%rdi), %esi
- testl %esi, %esi
- cmovs %edx, %ecx
- movl %ecx, %eax
- ret
+As another example, compile function f2 in test/CodeGen/X86/cmp-test.ll
+without a test instruction.
//===---------------------------------------------------------------------===//
//===---------------------------------------------------------------------===//
-We need to teach the codegen to convert two-address INC instructions to LEA
-when the flags are dead (likewise dec). For example, on X86-64, compile:
-
-int foo(int A, int B) {
- return A+1;
-}
-
-to:
-
-_foo:
- leal 1(%edi), %eax
- ret
-
-instead of:
-
-_foo:
- incl %edi
- movl %edi, %eax
- ret
-
-Another example is:
-
-;; X's live range extends beyond the shift, so the register allocator
-;; cannot coalesce it with Y. Because of this, a copy needs to be
-;; emitted before the shift to save the register value before it is
-;; clobbered. However, this copy is not needed if the register
-;; allocator turns the shift into an LEA. This also occurs for ADD.
-
-; Check that the shift gets turned into an LEA.
-; RUN: llvm-as < %s | llc -march=x86 -x86-asm-syntax=intel | \
-; RUN: not grep {mov E.X, E.X}
-
-@G = external global i32 ; <i32*> [#uses=3]
-
-define i32 @test1(i32 %X, i32 %Y) {
- %Z = add i32 %X, %Y ; <i32> [#uses=1]
- volatile store i32 %Y, i32* @G
- volatile store i32 %Z, i32* @G
- ret i32 %X
-}
-
-define i32 @test2(i32 %X) {
- %Z = add i32 %X, 1 ; <i32> [#uses=1]
- volatile store i32 %Z, i32* @G
- ret i32 %X
-}
-
-//===---------------------------------------------------------------------===//
-
Sometimes it is better to codegen subtractions from a constant (e.g. 7-x) with
a neg instead of a sub instruction. Consider:
ret
-//===---------------------------------------------------------------------===//
-
-Re-materialize MOV32r0 etc. with xor instead of changing them to moves if the
-condition register is dead. xor reg reg is shorter than mov reg, #0.
-
//===---------------------------------------------------------------------===//
The following code:
cmpl $150, %edi
jne LBB1_1 ## bb1
+The issue is that we hoist the cast of "scaler" to long long outside of the
+loop, so the value comes into the loop as two values, and
+RegsForValue::getCopyFromRegs doesn't know how to put an AssertSext on the
+constructed BUILD_PAIR that represents the cast value.
+
//===---------------------------------------------------------------------===//
Test instructions can be eliminated by using EFLAGS values from arithmetic
//===---------------------------------------------------------------------===//
-It looks like we don't have patterns (or they aren't matching) for adc with
-immediate:
+Re-implement the atomic builtins __sync_add_and_fetch() and
+__sync_sub_and_fetch() properly.
-define i64 @f1(i64 %a) nounwind {
- %tmp = sub i64 %a, 734439407618
- ret i64 %tmp
-}
-$ llvm-as < t.ll | llc -march=x86
+When the return value is not used (i.e. we only care about the value in
+memory), x86 does not have to use xadd to implement these. Instead, it can use
+add, sub, inc, or dec instructions with the "lock" prefix.
-_f1:
- movl 4(%esp), %eax
- addl $4294967294, %eax
- movl $4294967124, %edx
- adcl 8(%esp), %edx
- ret
+This is currently implemented using an instruction selection trick. The
+issue is that the target-independent pattern produces one output and a chain,
+and we want to map it into one that just outputs a chain. The current trick is
+to select it into a MERGE_VALUES with the first definition being an
+implicit_def. The proper solution is to add new ISD opcodes for the no-output
+variant. The DAG combiner can then transform the node before it gets to target
+node selection.
-There is no need to clobber %edx there.
+Problem #2 is that we are adding a whole bunch of x86 atomic instructions when
+these instructions are in fact identical to the non-lock versions. We need a way
+to add target-specific information to target nodes and have this information
+carried over to machine instructions. The asm printer (or JIT) can then use this
+information to add the "lock" prefix.
//===---------------------------------------------------------------------===//