From 989ffa936e2a9ed89cf8139db3ea1aeac7c858de Mon Sep 17 00:00:00 2001 From: Philip Reames Date: Tue, 24 Feb 2015 19:44:46 +0000 Subject: [PATCH] Starting to cleanup the garbage collection documentation In this change: - Put the getting started section first - Create a dedicated section to document the built in collector strategies - Move discuss of ShadowStack into new section - Add placeholders for erlang, ocaml, and statepoint-example collectors There will be many more changes following. I plan on full integrating the documentation for gc.statepoint and gc.root. I want to make it much clearer on how to get started and what users should expect in terms of effort. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230359 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/GarbageCollection.rst | 297 +++++++++++++++++++++---------------- 1 file changed, 172 insertions(+), 125 deletions(-) diff --git a/docs/GarbageCollection.rst b/docs/GarbageCollection.rst index 40f5b1287b0..8d52a61f717 100644 --- a/docs/GarbageCollection.rst +++ b/docs/GarbageCollection.rst @@ -1,5 +1,5 @@ ===================================== -Accurate Garbage Collection with LLVM +Garbage Collection with LLVM ===================================== .. contents:: @@ -8,113 +8,9 @@ Accurate Garbage Collection with LLVM Introduction ============ -Garbage collection is a widely used technique that frees the programmer from -having to know the lifetimes of heap objects, making software easier to produce -and maintain. Many programming languages rely on garbage collection for -automatic memory management. There are two primary forms of garbage collection: -conservative and accurate. - -Conservative garbage collection often does not require any special support from -either the language or the compiler: it can handle non-type-safe programming -languages (such as C/C++) and does not require any special information from the -compiler. The `Boehm collector -`__ is an example of a -state-of-the-art conservative collector. - -Accurate garbage collection requires the ability to identify all pointers in the -program at run-time (which requires that the source-language be type-safe in -most cases). Identifying pointers at run-time requires compiler support to -locate all places that hold live pointer variables at run-time, including the -:ref:`processor stack and registers `. - -Conservative garbage collection is attractive because it does not require any -special compiler support, but it does have problems. In particular, because the -conservative garbage collector cannot *know* that a particular word in the -machine is a pointer, it cannot move live objects in the heap (preventing the -use of compacting and generational GC algorithms) and it can occasionally suffer -from memory leaks due to integer values that happen to point to objects in the -program. In addition, some aggressive compiler transformations can break -conservative garbage collectors (though these seem rare in practice). - -Accurate garbage collectors do not suffer from any of these problems, but they -can suffer from degraded scalar optimization of the program. In particular, -because the runtime must be able to identify and update all pointers active in -the program, some optimizations are less effective. In practice, however, the -locality and performance benefits of using aggressive garbage collection -techniques dominates any low-level losses. - -This document describes the mechanisms and interfaces provided by LLVM to -support accurate garbage collection. - -Goals and non-goals -------------------- - -LLVM's intermediate representation provides :ref:`garbage collection intrinsics -` that offer support for a broad class of collector models. For -instance, the intrinsics permit: - -* semi-space collectors - -* mark-sweep collectors - -* generational collectors - -* reference counting - -* incremental collectors - -* concurrent collectors - -* cooperative collectors - -We hope that the primitive support built into the LLVM IR is sufficient to -support a broad class of garbage collected languages including Scheme, ML, Java, -C#, Perl, Python, Lua, Ruby, other scripting languages, and more. - -However, LLVM does not itself provide a garbage collector --- this should be -part of your language's runtime library. LLVM provides a framework for compile -time :ref:`code generation plugins `. The role of these plugins is to -generate code and data structures which conforms to the *binary interface* -specified by the *runtime library*. This is similar to the relationship between -LLVM and DWARF debugging info, for example. The difference primarily lies in -the lack of an established standard in the domain of garbage collection --- thus -the plugins. - -The aspects of the binary interface with which LLVM's GC support is -concerned are: - -* Creation of GC-safe points within code where collection is allowed to execute - safely. - -* Computation of the stack map. For each safe point in the code, object - references within the stack frame must be identified so that the collector may - traverse and perhaps update them. - -* Write barriers when storing object references to the heap. These are commonly - used to optimize incremental scans in generational collectors. - -* Emission of read barriers when loading object references. These are useful - for interoperating with concurrent collectors. - -There are additional areas that LLVM does not directly address: - -* Registration of global roots with the runtime. - -* Registration of stack map entries with the runtime. - -* The functions used by the program to allocate memory, trigger a collection, - etc. - -* Computation or compilation of type maps, or registration of them with the - runtime. These are used to crawl the heap for object references. - -In general, LLVM's support for GC does not include features which can be -adequately addressed with other features of the IR and does not specify a -particular binary interface. On the plus side, this means that you should be -able to integrate LLVM with an existing runtime. On the other hand, it leaves a -lot of work for the developer of a novel language. However, it's easy to get -started quickly and scale up to a more sophisticated implementation as your -compiler matures. +This document covers how to integrate LLVM into a compiler for a language which +supports garbage collection. **Note that LLVM itself does not provide a +garbage collector.** You must provide your own. Getting started =============== @@ -177,7 +73,7 @@ To turn the shadow stack on for your functions, first call: .. code-block:: c++ - F.setGC("shadow-stack"); + F.setGC(); for each function your compiler emits. Since the shadow stack is built into LLVM, you do not need to load a plugin. @@ -252,27 +148,120 @@ data structure, but there are only 20 lines of meaningful code.) } } -About the shadow stack ----------------------- -Unlike many GC algorithms which rely on a cooperative code generator to compile -stack maps, this algorithm carefully maintains a linked list of stack roots -[:ref:`Henderson2002 `]. This so-called "shadow stack" mirrors the -machine stack. Maintaining this data structure is slower than using a stack map -compiled into the executable as constant data, but has a significant portability -advantage because it requires no special support from the target code generator, -and does not require tricky platform-specific code to crawl the machine stack. -The tradeoff for this simplicity and portability is: -* High overhead per function call. -* Not thread-safe. +What is Garbage Collection? +=========================== -Still, it's an easy way to get started. After your compiler and runtime are up -and running, writing a :ref:`plugin ` will allow you to take advantage -of :ref:`more advanced GC features ` of LLVM in order to -improve performance. +Garbage collection is a widely used technique that frees the programmer from +having to know the lifetimes of heap objects, making software easier to produce +and maintain. Many programming languages rely on garbage collection for +automatic memory management. There are two primary forms of garbage collection: +conservative and accurate. + +Conservative garbage collection often does not require any special support from +either the language or the compiler: it can handle non-type-safe programming +languages (such as C/C++) and does not require any special information from the +compiler. The `Boehm collector +`__ is an example of a +state-of-the-art conservative collector. + +Accurate garbage collection requires the ability to identify all pointers in the +program at run-time (which requires that the source-language be type-safe in +most cases). Identifying pointers at run-time requires compiler support to +locate all places that hold live pointer variables at run-time, including the +:ref:`processor stack and registers `. + +Conservative garbage collection is attractive because it does not require any +special compiler support, but it does have problems. In particular, because the +conservative garbage collector cannot *know* that a particular word in the +machine is a pointer, it cannot move live objects in the heap (preventing the +use of compacting and generational GC algorithms) and it can occasionally suffer +from memory leaks due to integer values that happen to point to objects in the +program. In addition, some aggressive compiler transformations can break +conservative garbage collectors (though these seem rare in practice). + +Accurate garbage collectors do not suffer from any of these problems, but they +can suffer from degraded scalar optimization of the program. In particular, +because the runtime must be able to identify and update all pointers active in +the program, some optimizations are less effective. In practice, however, the +locality and performance benefits of using aggressive garbage collection +techniques dominates any low-level losses. + +This document describes the mechanisms and interfaces provided by LLVM to +support accurate garbage collection. + +Goals and non-goals +------------------- + +LLVM's intermediate representation provides :ref:`garbage collection intrinsics +` that offer support for a broad class of collector models. For +instance, the intrinsics permit: + +* semi-space collectors + +* mark-sweep collectors + +* generational collectors + +* reference counting + +* incremental collectors + +* concurrent collectors + +* cooperative collectors + +We hope that the primitive support built into the LLVM IR is sufficient to +support a broad class of garbage collected languages including Scheme, ML, Java, +C#, Perl, Python, Lua, Ruby, other scripting languages, and more. + +However, LLVM does not itself provide a garbage collector --- this should be +part of your language's runtime library. LLVM provides a framework for compile +time :ref:`code generation plugins `. The role of these plugins is to +generate code and data structures which conforms to the *binary interface* +specified by the *runtime library*. This is similar to the relationship between +LLVM and DWARF debugging info, for example. The difference primarily lies in +the lack of an established standard in the domain of garbage collection --- thus +the plugins. + +The aspects of the binary interface with which LLVM's GC support is +concerned are: + +* Creation of GC-safe points within code where collection is allowed to execute + safely. + +* Computation of the stack map. For each safe point in the code, object + references within the stack frame must be identified so that the collector may + traverse and perhaps update them. + +* Write barriers when storing object references to the heap. These are commonly + used to optimize incremental scans in generational collectors. + +* Emission of read barriers when loading object references. These are useful + for interoperating with concurrent collectors. + +There are additional areas that LLVM does not directly address: + +* Registration of global roots with the runtime. + +* Registration of stack map entries with the runtime. + +* The functions used by the program to allocate memory, trigger a collection, + etc. + +* Computation or compilation of type maps, or registration of them with the + runtime. These are used to crawl the heap for object references. + +In general, LLVM's support for GC does not include features which can be +adequately addressed with other features of the IR and does not specify a +particular binary interface. On the plus side, this means that you should be +able to integrate LLVM with an existing runtime. On the other hand, it leaves a +lot of work for the developer of a novel language. However, it's easy to get +started quickly and scale up to a more sophisticated implementation as your +compiler matures. .. _gc_intrinsics: @@ -447,6 +436,64 @@ greater performance impact since pointer reads are more frequent than writes. .. _plugin: +Built In Collectors +==================== + +LLVM includes built in support for several varieties of garbage collectors. + +The Shadow Stack GC +---------------------- + +To use this collector strategy, mark your functions with: + +.. code-block:: c++ + + F.setGC("shadow-stack"); + +Unlike many GC algorithms which rely on a cooperative code generator to compile +stack maps, this algorithm carefully maintains a linked list of stack roots +[:ref:`Henderson2002 `]. This so-called "shadow stack" mirrors the +machine stack. Maintaining this data structure is slower than using a stack map +compiled into the executable as constant data, but has a significant portability +advantage because it requires no special support from the target code generator, +and does not require tricky platform-specific code to crawl the machine stack. + +The tradeoff for this simplicity and portability is: + +* High overhead per function call. + +* Not thread-safe. + +Still, it's an easy way to get started. After your compiler and runtime are up +and running, writing a :ref:`plugin ` will allow you to take advantage +of :ref:`more advanced GC features ` of LLVM in order to +improve performance. + +The 'Erlang' and 'Ocaml' GCs +----------------------------- + +LLVM ships with two example collectors which leverage the ''gcroot'' +mechanisms. To our knowledge, these are not actually used by any language +runtime, but they do provide a reasonable starting point for someone interested +in writing an ''gcroot' compatible GC plugin. In particular, these are the +only in tree examples of how to produce a custom binary stack map format using +a ''gcroot'' strategy. + +As there names imply, the binary format produced is intended to model that +used by the Erlang and OCaml compilers respectively. + + +The Statepoint Example GC +------------------------- + +.. code-block:: c++ + + F.setGC("statepoint-example"); + +This GC provides an example of how one might use the infrastructure provided +by ''gc.statepoint''. + + Implementing a collector plugin =============================== -- 2.34.1