X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FInAlloca.rst;h=c7609cddb4f9c71f83e5276b8443cfb8fe7389c4;hb=d0518569ecfe74ad7d90b26cee2d8ebebc1bdb93;hp=b1779874e0e2a0436fdf962bc85782ed9190365b;hpb=4b70bfc905f3ac68a8429f9fe0016e30433b3b0c;p=oota-llvm.git

diff --git a/docs/InAlloca.rst b/docs/InAlloca.rst
index b1779874e0e..c7609cddb4f 100644
--- a/docs/InAlloca.rst
+++ b/docs/InAlloca.rst
@@ -5,21 +5,19 @@ Design and Usage of the InAlloca Attribute
 Introduction
 ============
 
-.. Warning:: This feature is unstable and not fully implemented.
-
-The :ref:`attr_inalloca` attribute is designed to allow taking the
-address of an aggregate argument that is being passed by value through
-memory. Primarily, this feature is required for compatibility with the
-Microsoft C++ ABI. Under that ABI, class instances that are passed by
-value are constructed directly into argument stack memory. Prior to the
-addition of inalloca, calls in LLVM were indivisible instructions.
-There was no way to perform intermediate work, such as object
-construction, between the first stack adjustment and the final control
-transfer. With inalloca, each argument is modelled as an alloca, which
-can be stored to independently of the call. Unfortunately, this
-complicated feature comes with a large set of restrictions designed to
-bound the lifetime of the argument memory around the call, which are
-explained in this document.
+The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
+taking the address of an aggregate argument that is being passed by
+value through memory. Primarily, this feature is required for
+compatibility with the Microsoft C++ ABI. Under that ABI, class
+instances that are passed by value are constructed directly into
+argument stack memory. Prior to the addition of inalloca, calls in LLVM
+were indivisible instructions. There was no way to perform intermediate
+work, such as object construction, between the first stack adjustment
+and the final control transfer. With inalloca, all arguments passed in
+memory are modelled as a single alloca, which can be stored to prior to
+the call. Unfortunately, this complicated feature comes with a large
+set of restrictions designed to bound the lifetime of the argument
+memory around the call.
 
 For now, it is recommended that frontends and optimizers avoid producing
 this construct, primarily because it forces the use of a base pointer.
@@ -30,48 +28,58 @@ passing by value with a copy.
 Intended Usage
 ==============
 
-In the example below, ``f`` is attempting to pass a default-constructed
-``Foo`` object to ``g`` by value.
+The example below is the intended LLVM IR lowering for some C++ code
+that passes two default-constructed ``Foo`` objects to ``g`` in the
+32-bit Microsoft C++ ABI.
+
+.. code-block:: c++
+
+    // Foo is non-trivial.
+    struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
+    void g(Foo a, Foo b);
+    void f() {
+      g(Foo(), Foo());
+    }
 
 .. code-block:: llvm
 
-    %Foo = type { i32, i32 }
-    declare void @Foo_ctor(%Foo* %this)
-    declare void @g(%Foo* inalloca %arg)
+    %struct.Foo = type { i32, i32 }
+    declare void @Foo_ctor(%struct.Foo* %this)
+    declare void @Foo_dtor(%struct.Foo* %this)
+    declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
 
     define void @f() {
-      ...
-
-    bb1:
+    entry:
       %base = call i8* @llvm.stacksave()
-      %arg = alloca %Foo
-      invoke void @Foo_ctor(%Foo* %arg)
+      %memargs = alloca <{ %struct.Foo, %struct.Foo }>
+      %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 1
+      call void @Foo_ctor(%struct.Foo* %b)
+
+      ; If a's ctor throws, we must destruct b.
+      %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0
+      invoke void @Foo_ctor(%struct.Foo* %a)
           to label %invoke.cont unwind %invoke.unwind
 
     invoke.cont:
-      call void @g(%Foo* inalloca %arg)
+      call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
       call void @llvm.stackrestore(i8* %base)
       ...
 
     invoke.unwind:
+      call void @Foo_dtor(%struct.Foo* %b)
       call void @llvm.stackrestore(i8* %base)
       ...
     }
 
-The alloca in this example is dynamic, meaning it is not in the entry
-block, and it can be executed more than once. Due to the restrictions
-against allocas between an alloca used with inalloca and its associated
-call site, all allocas used with inalloca are considered dynamic.
-
-To avoid any stack leakage, the frontend saves the current stack pointer
-with a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it
-allocates the argument stack space with alloca and calls the default
-constructor. One important consideration is that the default
-constructor could throw an exception, so the frontend has to create a
-landing pad. At this point, if there were any other inalloca arguments,
-the frontend would have to destruct them before restoring the stack
-pointer. If the constructor does not unwind, ``g`` is called, and then
-the stack is restored.
+To avoid stack leaks, the frontend saves the current stack pointer with
+a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
+argument stack space with alloca and calls the default constructor. The
+default constructor could throw an exception, so the frontend has to
+create a landing pad. The frontend has to destroy the already
+constructed argument ``b`` before restoring the stack pointer. If the
+constructor does not unwind, ``g`` is called. In the Microsoft C++ ABI,
+``g`` will destroy its arguments, and then the stack is restored in
+``f``.
 
 Design Considerations
 =====================
@@ -81,31 +89,43 @@ Lifetime
 
 The biggest design consideration for this feature is object lifetime.
 We cannot model the arguments as static allocas in the entry block,
-because all calls need to use the memory that is at the end of the call
-frame to pass arguments. We cannot vend pointers to that memory at
-function entry because after code generation they will alias. In the
-current design, the rule against allocas between the inalloca alloca
-values and the call site avoids this problem, but it creates a cleanup
-problem. Cleanup and lifetime is handled explicitly with stack save and
-restore calls. In the future, we may be able to avoid this by using
-:ref:`llvm.lifetime.start <int_lifestart>` and :ref:`llvm.lifetime.end
-<int_lifeend>` instead.
+because all calls need to use the memory at the top of the stack to pass
+arguments. We cannot vend pointers to that memory at function entry
+because after code generation they will alias.
+
+The rule against allocas between argument allocations and the call site
+avoids this problem, but it creates a cleanup problem. Cleanup and
+lifetime is handled explicitly with stack save and restore calls. In
+the future, we may want to introduce a new construct such as ``freea``
+or ``afree`` to make it clear that this stack adjusting cleanup is less
+powerful than a full stack save and restore.
 
 Nested Calls and Copy Elision
 -----------------------------
 
-The next consideration is the ability for the frontend to perform copy
-elision in the face of nested calls. Consider the evaluation of
-``foo(foo(Bar()))``, where ``foo`` takes and returns a ``Bar`` object by
-value and ``Bar`` has non-trivial constructors. In this case, we want
-to be able to elide copies into ``foo``'s argument slots. That means we
-need to have more than one set of argument frames active at the same
-time. First, we need to allocate the frame for the outer call so we can
-pass it in as the hidden struct return pointer to the middle call. Then
-we do the same for the middle call, allocating a frame and passing its
-address to ``Bar``'s default constructor. By wrapping the evaluation of
-the inner ``foo`` with stack save and restore, we can have multiple
-overlapping active call frames.
+We also want to be able to support copy elision into these argument
+slots. This means we have to support multiple live argument
+allocations.
+
+Consider the evaluation of:
+
+.. code-block:: c++
+
+    // Foo is non-trivial.
+    struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
+    Foo bar(Foo b);
+    int main() {
+      bar(bar(Foo()));
+    }
+
+In this case, we want to be able to elide copies into ``bar``'s argument
+slots. That means we need to have more than one set of argument frames
+active at the same time. First, we need to allocate the frame for the
+outer call so we can pass it in as the hidden struct return pointer to
+the middle call. Then we do the same for the middle call, allocating a
+frame and passing its address to ``Foo``'s default constructor. By
+wrapping the evaluation of the inner ``bar`` with stack save and
+restore, we can have multiple overlapping active call frames.
 
 Callee-cleanup Calling Conventions
 ----------------------------------