From: Benjamin Kramer
Date: Mon, 19 Mar 2012 00:43:34 +0000 (+0000)
Subject: Add a note for -ffast-math optimization of vector norm.
X-Git-Url: http://plrg.eecs.uci.edu/git/?a=commitdiff_plain;h=8118c94a55b7e3d6bcd43b4a043c922d8e20a8aa;p=oota-llvm.git

Add a note for -ffast-math optimization of vector norm.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153031 91177308-0d34-0410-b5e6-96231b3b80d8
---

diff --git a/lib/Target/X86/README-SSE.txt b/lib/Target/X86/README-SSE.txt
index a581993c3c6..624e56fa0f6 100644
--- a/lib/Target/X86/README-SSE.txt
+++ b/lib/Target/X86/README-SSE.txt
@@ -922,3 +922,22 @@ _test2: ## @test2
 The insertps's of $0 are pointless complex copies.
 
 //===---------------------------------------------------------------------===//
+
+[UNSAFE FP]
+
+void foo(double, double, double);
+void norm(double x, double y, double z) {
+  double scale = __builtin_sqrt(x*x + y*y + z*z);
+  foo(x/scale, y/scale, z/scale);
+}
+
+We currently generate an sqrtsd and 3 divsd instructions. This is bad, fp div is
+slow and not pipelined. In -ffast-math mode we could compute "1.0/scale" first
+and emit 3 mulsd in place of the divs. This can be done as a target-independent
+transform.
+
+If we're dealing with floats instead of doubles we could even replace the sqrtss
+and inversion with an rsqrtss instruction, which computes 1/sqrt faster at the
+cost of reduced accuracy.
+
+//===---------------------------------------------------------------------===//