Use rsqrt (X86) to speed up reciprocal square root calcs
author Sanjay Patel <spatel@rotateright.com>
Fri, 24 Oct 2014 17:02:16 +0000 (17:02 +0000)
committer Sanjay Patel <spatel@rotateright.com>
Fri, 24 Oct 2014 17:02:16 +0000 (17:02 +0000)
commit a46f06efe217b3c3328d9ececf8de0c8c7dee446
tree 4455baca9177e6b4a2a3c607aa4947173660c302
parent 2992ea0cb5437b2eeddd75e738b3651bd7ba6cea

This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two-constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate().
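
As a rough illustration of the refinement step (a standalone sketch, not
the DAGCombiner code itself): for F(X) = 1/X^2 - A, one Newton-Raphson
iteration can be written with the two constants -0.5 and -3.0. The
function names below are hypothetical, and the crude estimate is a
stand-in for a hardware instruction such as rsqrtss, modeled here as
the exact value with a small relative error.

```cpp
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal square root estimate.
// Assumption: the estimate is modeled as the exact result scaled by a
// small relative error (0.05%); real hardware error differs.
static float crude_rsqrt_estimate(float a) {
    return (1.0f / std::sqrt(a)) * 1.0005f;
}

// One Newton-Raphson step for F(X) = 1/X^2 - A, arranged so that only
// two floating-point constants (-0.5 and -3.0) are needed:
//   X_next = (-0.5 * X) * (A * X * X + (-3.0))
static float refine_rsqrt_two_const(float a, float est) {
    return (-0.5f * est) * (a * est * est + -3.0f);
}
```

Calling refine_rsqrt_two_const(a, crude_rsqrt_estimate(a)) should land
much closer to 1/sqrt(a) than the estimate alone, since each step
roughly squares the relative error.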

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
include/llvm/Target/TargetLowering.h
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
lib/Target/PowerPC/PPCISelLowering.cpp
lib/Target/PowerPC/PPCISelLowering.h
lib/Target/X86/X86.td
lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86ISelLowering.h
lib/Target/X86/X86Subtarget.cpp
lib/Target/X86/X86Subtarget.h
test/CodeGen/X86/sqrt-fastmath.ll