From 919d37151ae021eb419d69f5514f3bf8815a980b Mon Sep 17 00:00:00 2001
From: Reid Spencer
Date: Tue, 15 Aug 2006 03:32:10 +0000
Subject: [PATCH] Rearrange things for clarity, don't talk about "dereferencing" when we shouldn't, and add a better example for one of the questions. Thanks to Chris Lattner for these suggestions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29691 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/GetElementPtr.html | 138 ++++++++++++++++++++++++++--------------
 1 file changed, 90 insertions(+), 48 deletions(-)

diff --git a/docs/GetElementPtr.html b/docs/GetElementPtr.html
index ac910887a6a..99319a49924 100644
--- docs/GetElementPtr.html
+++ docs/GetElementPtr.html
@@ -56,15 +56,92 @@
 this leads to the following questions, all of which are answered in the following sections.

    +
  1. What is the first index of the GEP instruction? +
  2. Why is the extra 0 index required?
  3. What is dereferenced by GEP?
  4. -
  5. Why can you index through the first pointer but not - subsequent ones?
  6. Why don't GEP x,0,0,1 and GEP x,1 alias?
  7. Why do GEP x,1,0,0 and GEP x,1 alias?
+ +
+ What is the first index of the GEP instruction? +
+
+

Quick answer: Because it's already present.

+

Having understood the previous question, a new + question then arises:

+
Why is it okay to index through the first pointer, but + subsequent pointers won't be dereferenced?
+

The answer is simply that memory does not have to be accessed to + perform the computation. The first operand to the GEP instruction must be a + value of a pointer type. The value of the pointer is provided directly to + the GEP instruction without any need for accessing memory. It must, + therefore, be indexed like any other operand. Consider this example:

+
+  struct munger_struct {
+    int f1;
+    int f2;
+  };
+  void munge(struct munger_struct *P)
+  {
+    P[0].f1 = P[1].f1 + P[2].f2;
+  }
+  ...
+  struct munger_struct Array[3];
+  ...
+  munge(Array);
+

In this "C" example, the front end compiler (llvm-gcc) will generate three + GEP instructions for the three indices through "P" in the assignment + statement. The function argument P will be the first operand of each + of these GEP instructions. The second operand indexes through that pointer; the third operand will be the field offset into + the struct munger_struct type, for either the f1 or + f2 field. So, in LLVM assembly the munge function looks + like:

+
+  void %munge(%struct.munger_struct* %P) {
+  entry:
+    %tmp = getelementptr %struct.munger_struct* %P, int 1, uint 0
+    %tmp = load int* %tmp
+    %tmp6 = getelementptr %struct.munger_struct* %P, int 2, uint 1
+    %tmp7 = load int* %tmp6
+    %tmp8 = add int %tmp7, %tmp
+    %tmp9 = getelementptr %struct.munger_struct* %P, int 0, uint 0
+    store int %tmp8, int* %tmp9
+    ret void
+  }
+

In each case the first operand is the pointer through which the GEP + instruction starts. The same is true whether the first operand is an + argument, allocated memory, or a global variable.
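
As a rough C analogue of the three address computations above (a sketch only, not what llvm-gcc emits), each GEP with an element index and a field can be read as base + index * sizeof(struct) + offset of the field. The helper name `gep_field_addr` and the function `munge_explicit` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t, NULL */

struct munger_struct {
    int f1;
    int f2;
};

/* Hypothetical helper: mimics "getelementptr P, i, field" with byte
 * arithmetic. Only the value of P is used; nothing is loaded through it. */
static int *gep_field_addr(struct munger_struct *P, size_t i, size_t field_offset)
{
    assert(P != NULL);
    char *base = (char *)P;
    return (int *)(base + i * sizeof(struct munger_struct) + field_offset);
}

/* Same effect as munge() in the example, with the address math spelled out. */
void munge_explicit(struct munger_struct *P)
{
    int *a = gep_field_addr(P, 1, offsetof(struct munger_struct, f1)); /* &P[1].f1 */
    int *b = gep_field_addr(P, 2, offsetof(struct munger_struct, f2)); /* &P[2].f2 */
    int *c = gep_field_addr(P, 0, offsetof(struct munger_struct, f1)); /* &P[0].f1 */
    *c = *a + *b;  /* the two loads and the store happen here, not above */
}
```

The split between the helper and the final assignment mirrors the split between the GEP instructions and the load/store instructions in the LLVM assembly.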

+

To make this clear, let's consider a more contrived example:

+
+  %MyVar = uninitialized global int
+  ...
+  %idx1 = getelementptr int* %MyVar, long 0
+  %idx2 = getelementptr int* %MyVar, long 1
+  %idx3 = getelementptr int* %MyVar, long 2
+

These GEP instructions are simply making address computations from the + base address of MyVar. They compute the following (using C syntax): +

+
+   • idx1 = (char*) &MyVar + 0
+   • idx2 = (char*) &MyVar + 4
+   • idx3 = (char*) &MyVar + 8
+

Since the type int is known to be four bytes long, the indices + 0, 1 and 2 translate into memory offsets of 0, 4, and 8, respectively. No + memory is accessed to make these computations because the address of + %MyVar is passed directly to the GEP instructions.
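
The offsets above can be reproduced with a small C sketch. It assumes nothing about the size of int beyond `sizeof(int)` (the text assumes four bytes), and it works on integer addresses so that no out-of-bounds pointer is ever formed. The name `gep_int_addr` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the address produced by
 * "getelementptr int* base, long idx", as an integer.
 * base is used only as a value; it is never dereferenced. */
static uintptr_t gep_int_addr(const int *base, size_t idx)
{
    assert(base != NULL);
    return (uintptr_t)base + idx * sizeof(int);
}
```

With a four-byte int, indices 0, 1 and 2 give the base address plus 0, 4 and 8.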

+

The contrived part of this example is in the cases of %idx2 and + %idx3. They result in the computation of addresses that point to + memory past the end of the %MyVar global, which is only one + int long, not three ints long. While this is legal in LLVM, + it is inadvisable because any load or store with the pointer that results + from these GEP instructions would produce undefined results.

+
+
Why is the extra 0 index required? @@ -81,7 +158,7 @@

The GEP above yields an int* by indexing the int typed field of the structure %MyStruct. When people first look at it, they wonder why the long 0 index is needed. However, a closer inspection - of how globals and GEPs work reveals the need. Becoming aware of the following + of how globals and GEPs work reveals the need. Becoming aware of the following facts will dispel the confusion:

  1. The type of %MyStruct is not { float*, int } @@ -91,8 +168,11 @@
  2. Point #1 is evidenced by noticing the type of the first operand of the GEP instruction (%MyStruct) which is { float*, int }*.
  3. -
  4. The first index, long 0 is required to dereference the - pointer associated with %MyStruct.
  5. +
  6. The first index, long 0, is required to step over the global + variable %MyStruct. Since the first argument to the GEP + instruction must always be a value of pointer type, the first index + steps through that pointer. A value of 0 means 0 elements offset from that + pointer.
  7. The second index, ubyte 1, selects the second field of the structure (the int).
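
The role of the leading 0 has a close parallel in C, sketched here under the assumption that `struct my_struct` stands in for { float*, int }. The names `my_struct` and `second_field_addr` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct my_struct {          /* stands in for { float*, int } */
    float *fp;
    int    i;
};

struct my_struct MyStruct;  /* &MyStruct is the pointer value a GEP would receive */

/* Rough analogue of:
 *   getelementptr { float*, int }* %MyStruct, long 0, ubyte 1 */
int *second_field_addr(struct my_struct *p)
{
    assert(p != NULL);
    return &p[0].i;  /* [0]: zero elements from p; .i: select the second field */
}
```

Writing `&p[0].i` instead of `&p->i` makes the first index visible: it steps zero elements through the pointer before the field is selected.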
@@ -105,8 +185,9 @@

Quick answer: nothing.

The GetElementPtr instruction dereferences nothing. That is, it doesn't - access memory in any way. That's what the Load instruction is for. GEP is - only involved in the computation of addresses. For example, consider this:

+ access memory in any way. That's what the Load and Store instructions are for. + GEP is only involved in the computation of addresses. For example, consider + this:

   %MyVar = uninitialized global { [40 x int ]* }
   ...
@@ -137,45 +218,6 @@
   array there.
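
A C sketch of the same distinction, assuming `struct holder` stands in for { [40 x int ]* }: the address of the pointer field can be computed without touching memory, but reaching an element of the array it points to needs an explicit load first. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int int40[40];   /* stands in for [40 x int] */

struct holder {          /* stands in for { [40 x int]* } */
    int40 *arr;
};

/* GEP-like: address of the pointer field, computed from h's value alone. */
int40 **pointer_field_addr(struct holder *h)
{
    assert(h != NULL);
    return &h->arr;      /* no memory access */
}

/* Not a single address computation: the stored pointer is loaded first. */
int *element_addr(struct holder *h, size_t k)
{
    assert(h != NULL && k < 40);
    int40 *a = h->arr;   /* explicit load of the pointer field */
    return &(*a)[k];     /* address math on the loaded pointer */
}
```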

- - -
-

Quick answer: Because its already present.

-

Having understood the previous question, a new - question then arises:

-
Why is it okay to index through the first pointer, but - subsequent pointers won't be dereferenced?
-

The answer is simply because - memory does not have to be accessed to perform the computation. The first - operand to the GEP instruction must be a value of a pointer type. The value - of the pointer is provided directly to the GEP instruction without any need - for accessing memory. It must, therefore be indexed like any other operand. - Consider this example:

-
-  %MyVar = unintialized global int
-  ...
-  %idx1 = getelementptr int* %MyVar, long 0
-  %idx2 = getelementptr int* %MyVar, long 1
-  %idx3 = getelementptr int* %MyVar, long 2
-

These GEP instructions are simply making address computations from the - base address of MyVar. They compute, as follows (using C syntax):

-
    -
  • idx1 = &MyVar + 0
  • -
  • idx2 = &MyVar + 4
  • -
  • idx3 = &MyVar + 8
  • -
-

Since the type int is known to be four bytes long, the indices - 0, 1 and 2 translate into memory offsets of 0, 4, and 8, respectively. No - memory is accessed to make these computations because the address of - %MyVar is passed directly to the GEP instructions.

-

Note that the cases of %idx2 and %idx3 are a bit silly. - They are computing addresses of something of unknown type (and thus - potentially breaking type safety) because %MyVar is only one - integer long.

-
-
Why don't GEP x,0,0,1 and GEP x,1 alias? @@ -187,7 +229,7 @@ computation diverges with that index. Consider this example:

   %MyVar = global { [10 x int ] }
-  %idx1 = getlementptr { [10 x int ] }* %MyVar, long 0, byte 0, long 1
+  %idx1 = getelementptr { [10 x int ] }* %MyVar, long 0, ubyte 0, long 1
   %idx2 = getelementptr { [10 x int ] }* %MyVar, long 1

In this example, idx1 computes the address of the second integer in the array that is in the structure in %MyVar, that is MyVar+4. The @@ -210,7 +252,7 @@ the type. Consider this example:

   %MyVar = global { [10 x int ] }
-  %idx1 = getlementptr { [10 x int ] }* %MyVar, long 1, byte 0, long 0
+  %idx1 = getelementptr { [10 x int ] }* %MyVar, long 1, ubyte 0, long 0
   %idx2 = getelementptr { [10 x int ] }* %MyVar, long 1

In this example, the value of %idx1 is %MyVar+40 and its type is int*. The value of %idx2 is also -- 2.34.1
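
Both aliasing questions can be checked with byte offsets in C. This is a sketch that assumes `struct wrapper` stands in for { [10 x int ] }; the function names are hypothetical, and the concrete values 4 and 40 in the text correspond to a four-byte int.

```c
#include <assert.h>
#include <stddef.h>

struct wrapper { int a[10]; };   /* stands in for { [10 x int] } */

/* Byte offset of "GEP x, i, 0, j": i whole structs, field 0, then j ints. */
static size_t off_i_0_j(size_t i, size_t j)
{
    return i * sizeof(struct wrapper) + offsetof(struct wrapper, a) + j * sizeof(int);
}

/* Byte offset of "GEP x, i": i whole structs. */
static size_t off_i(size_t i)
{
    return i * sizeof(struct wrapper);
}

/* Checks the two claims made in the text. */
void check_alias_claims(void)
{
    assert(off_i_0_j(0, 1) == sizeof(int));  /* second int in the array       */
    assert(off_i_0_j(0, 1) != off_i(1));     /* GEP x,0,0,1 vs GEP x,1 differ */
    assert(off_i_0_j(1, 0) == off_i(1));     /* GEP x,1,0,0 vs GEP x,1 match  */
}
```

The second pair yields the same address but different result types, which is why those two GEPs alias even though they are not interchangeable.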