AVX-512: Fixed a bug in extracting subvector from v64i1

[oota-llvm.git] / docs / LangRef.rst
diff --git a/docs/LangRef.rst b/docs/LangRef.rst

index b33266af7c1cabc094f4b34bd9227a0883c49394..397d5fe3756734dfddf4f49209952b3f757878f4 100644 (file)
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -1196,6 +1196,13 @@ example:
      computing edge weights, basic blocks post-dominated by a cold
      function call are also considered to be cold; and, thus, given low
      weight.
+``convergent``
+    This attribute indicates that the callee is dependent on a convergent
+    thread execution pattern under certain parallel execution models.
+    Transformations that are execution model agnostic may only move or
+    tranform this call if the final location is control equivalent to its
+    original position in the program, where control equivalence is defined as
+    A dominates B and B post-dominates A, or vice versa.
  ``inlinehint``
      This attribute indicates that the source code contained a hint that
      inlining this function is desirable (such as the "inline" keyword in
@@ -2961,7 +2968,7 @@ DIFile
  Files are sometimes used in ``scope:`` fields, and are the only valid target
  for ``file:`` fields.
  
-.. _DILocation:
+.. _DIBasicType:
  
  DIBasicType
  """""""""""
@@ -3225,6 +3232,8 @@ discriminate between control flow within a single block in the source language.
      !1 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 0)
      !2 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 1)
  
+.. _DILocation:
+
  DILocation
  """"""""""
  
@@ -3423,7 +3432,7 @@ For example,
      %2 = load float, float* %c, align 4, !alias.scope !5
      store float %2, float* %arrayidx.i2, align 4, !noalias !6
  
-    ; These two instructions don't alias (for domain !0, the set of scopes in
+    ; These two instructions may alias (for domain !0, the set of scopes in
      ; the !noalias list is not a superset of, or equal to, the scopes in the
      ; !alias.scope list):
      %2 = load float, float* %c, align 4, !alias.scope !6
@@ -5656,7 +5665,7 @@ Syntax:
  
  ::
  
-      <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !nonnull !<index>]
+      <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !nonnull !<index>][, !dereferenceable !<index>][, !dereferenceable_or_null !<index>]
        <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment>
        !<index> = !{ i32 1 }
  
@@ -5721,6 +5730,25 @@ never be null.  This is analogous to the ''nonnull'' attribute
  on parameters and return values.  This metadata can only be applied
  to loads of a pointer type.
  
+The optional ``!dereferenceable`` metadata must reference a single
+metadata name ``<index>`` corresponding to a metadata node with one ``i64``
+entry. The existence of the ``!dereferenceable`` metadata on the instruction 
+tells the optimizer that the value loaded is known to be dereferenceable.
+The number of bytes known to be dereferenceable is specified by the integer 
+value in the metadata node. This is analogous to the ''dereferenceable'' 
+attribute on parameters and return values.  This metadata can only be applied 
+to loads of a pointer type.
+
+The optional ``!dereferenceable_or_null`` metadata must reference a single
+metadata name ``<index>`` corresponding to a metadata node with one ``i64``
+entry. The existence of the ``!dereferenceable_or_null`` metadata on the 
+instruction tells the optimizer that the value loaded is known to be either
+dereferenceable or null.
+The number of bytes known to be dereferenceable is specified by the integer 
+value in the metadata node. This is analogous to the ''dereferenceable_or_null'' 
+attribute on parameters and return values.  This metadata can only be applied 
+to loads of a pointer type.
+
  Semantics:
  """"""""""
  
@@ -9754,6 +9782,8 @@ intrinsic returns the executable address corresponding to ``tramp``
  after performing the required machine specific adjustments. The pointer
  returned can then be :ref:`bitcast and executed <int_trampoline>`.
  
+.. _int_mload_mstore:
+
  Masked Vector Load and Store Intrinsics
  ---------------------------------------
  
@@ -9776,13 +9806,13 @@ This is an overloaded intrinsic. The loaded data is a vector of any integer or f
  Overview:
  """""""""
  
-Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes in the passthru operand.
+Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the '``passthru``' operand.
  
  
  Arguments:
  """"""""""
  
-The first operand is the base pointer for the load. The second operand is the alignment of the source location. It must be a constant integer value. The third operand, mask, is a vector of boolean 'i1' values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of passthru operand are the same vector types.
+The first operand is the base pointer for the load. The second operand is the alignment of the source location. It must be a constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of the '``passthru``' operand are the same vector types.
  
  
  Semantics:
@@ -9841,6 +9871,115 @@ The result of this operation is equivalent to a load-modify-store sequence. Howe
         store <16 x float> %res, <16 x float>* %ptr, align 4
  
  
+Masked Vector Gather and Scatter Intrinsics
+-------------------------------------------
+
+LLVM provides intrinsics for vector gather and scatter operations. They are similar to :ref:`Masked Vector Load and Store <int_mload_mstore>`, except they are designed for arbitrary memory accesses, rather than sequential memory accesses. Gather and scatter also employ a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits are off, no memory is accessed.
+
+.. _int_mgather:
+
+'``llvm.masked.gather.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer or floating point data type gathered together into one vector.
+
+::
+
+      declare <16 x float> @llvm.masked.gather.v16f32 (<16 x float*> <ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
+      declare <2 x double> @llvm.masked.gather.v2f64  (<2 x double*> <ptrs>, i32 <alignment>, <2 x i1>  <mask>, <2 x double> <passthru>)
+
+Overview:
+"""""""""
+
+Reads scalar values from arbitrary memory locations and gathers them into one vector. The memory locations are provided in the vector of pointers '``ptrs``'. The memory is accessed according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the '``passthru``' operand.
+
+
+Arguments:
+""""""""""
+
+The first operand is a vector of pointers which holds all memory addresses to read. The second operand is an alignment of the source addresses. It must be a constant integer value. The third operand, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the vector of pointers and the type of the '``passthru``' operand are the same vector types.
+
+
+Semantics:
+""""""""""
+
+The '``llvm.masked.gather``' intrinsic is designed for conditional reading of multiple scalar values from arbitrary memory locations in a single IR operation. It is useful for targets that support vector masked gathers and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of scalar load operations.
+The semantics of this operation are equivalent to a sequence of conditional scalar loads with subsequent gathering all loaded values into a single vector. The mask restricts memory access to certain lanes and facilitates vectorization of predicated basic blocks.
+
+
+::
+
+       %res = call <4 x double> @llvm.masked.gather.v4f64 (<4 x double*> %ptrs, i32 8, <4 x i1>%mask, <4 x double> <true, true, true, true>)
+
+       ;; The gather with all-true mask is equivalent to the following instruction sequence
+       %ptr0 = extractelement <4 x double*> %ptrs, i32 0
+       %ptr1 = extractelement <4 x double*> %ptrs, i32 1
+       %ptr2 = extractelement <4 x double*> %ptrs, i32 2
+       %ptr3 = extractelement <4 x double*> %ptrs, i32 3
+
+       %val0 = load double, double* %ptr0, align 8
+       %val1 = load double, double* %ptr1, align 8
+       %val2 = load double, double* %ptr2, align 8
+       %val3 = load double, double* %ptr3, align 8
+
+       %vec0    = insertelement <4 x double>undef, %val0, 0
+       %vec01   = insertelement <4 x double>%vec0, %val1, 1
+       %vec012  = insertelement <4 x double>%vec01, %val2, 2
+       %vec0123 = insertelement <4 x double>%vec012, %val3, 3
+
+.. _int_mscatter:
+
+'``llvm.masked.scatter.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic. The data stored in memory is a vector of any integer or floating point data type. Each vector element is stored in an arbitrary memory addresses. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
+
+::
+
+       declare void @llvm.masked.scatter.v8i32 (<8 x i32>  <value>, <8 x i32*>  <ptrs>, i32 <alignment>,  <8 x i1>  <mask>)
+       declare void @llvm.masked.scatter.v16f32(<16 x i32> <value>, <16 x i32*> <ptrs>, i32 <alignment>,  <16 x i1> <mask>)
+
+Overview:
+"""""""""
+
+Writes each element from the value vector to the corresponding memory address. The memory addresses are represented as a vector of pointers. Writing is done according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.
+
+Arguments:
+""""""""""
+
+The first operand is a vector value to be written to memory. The second operand is a vector of pointers, pointing to where the value elements should be stored. It has the same underlying type as the value operand. The third operand is an alignment of the destination addresses. The fourth operand, mask, is a vector of boolean values. The types of the mask and the value operand must have the same number of vector elements.
+
+
+Semantics:
+""""""""""
+
+The '``llvm.masked.scatter``' intrinsics is designed for writing selected vector elements to arbitrary memory addresses in a single IR operation. The operation may be conditional, when not all bits in the mask are switched on. It is useful for targets that support vector masked scatter and allows vectorizing basic blocks with data and control divergency. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.
+
+::
+
+       ;; This instruction unconditionaly stores data vector in multiple addresses
+       call @llvm.masked.scatter.v8i32 (<8 x i32> %value, <8 x i32*> %ptrs, i32 4,  <8 x i1>  <true, true, .. true>)
+
+       ;; It is equivalent to a list of scalar stores
+       %val0 = extractelement <8 x i32> %value, i32 0
+       %val1 = extractelement <8 x i32> %value, i32 1
+       ..
+       %val7 = extractelement <8 x i32> %value, i32 7
+       %ptr0 = extractelement <8 x i32*> %ptrs, i32 0
+       %ptr1 = extractelement <8 x i32*> %ptrs, i32 1
+       ..
+       %ptr7 = extractelement <8 x i32*> %ptrs, i32 7
+       ;; Note: the order of the following stores is important when they overlap:
+       store i32 %val0, i32* %ptr0, align 4
+       store i32 %val1, i32* %ptr1, align 4
+       ..
+       store i32 %val7, i32* %ptr7, align 4
+
+
  Memory Use Markers
  ------------------