docs/tutorial/LangImpl7.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
   7          construction</title>
   8   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   9   <meta name="author" content="Chris Lattner">
  10   <link rel="stylesheet" href="../llvm.css" type="text/css">
  11 </head>
  12
  13 <body>
  14
  15 <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
  16
  17 <div class="doc_author">
  18   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  19 </div>
  20
  21 <!-- *********************************************************************** -->
  22 <div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
  23 <!-- *********************************************************************** -->
  24
  25 <div class="doc_text">
  26
  27 <p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
  28 LLVM</a>" tutorial.  In parts 1 through 6, we've built a very respectable,
  29 albeit simple, <a
  30 href="http://en.wikipedia.org/wiki/Functional_programming">functional
  31 programming language</a>.  In our journey, we learned some parsing techniques,
  32 how to build and represent an AST, how to build LLVM IR, and how to optimize
  33 the resultant code and JIT compile it.</p>
  34
  35 <p>While Kaleidoscope is interesting as a functional language, this makes it
  36 "too easy" to generate LLVM IR for it.  In particular, a functional language
  37 makes it very easy to build LLVM IR directly in <a
  38 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
  39 Since LLVM requires that the input code be in SSA form, this is a very nice
  40 property and it is often unclear to newcomers how to generate code for an
  41 imperative language with mutable variables.</p>
  42
  43 <p>The short (and happy) summary of this chapter is that there is no need for
  44 your front-end to build SSA form: LLVM provides highly tuned and well tested
  45 support for this, though the way it works is a bit unexpected for some.</p>
  46
  47 </div>
  48
  49 <!-- *********************************************************************** -->
  50 <div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
  51 <!-- *********************************************************************** -->
  52
  53 <div class="doc_text">
  54
  55 <p>
  56 To understand why mutable variables cause complexities in SSA construction,
  57 consider this extremely simple C example:
  58 </p>
  59
  60 <div class="doc_code">
  61 <pre>
  62 int G, H;
  63 int test(_Bool Condition) {
  64   int X;
  65   if (Condition)
  66     X = G;
  67   else
  68     X = H;
  69   return X;
  70 }
  71 </pre>
  72 </div>
  73
  74 <p>In this case, we have the variable "X", whose value depends on the path
  75 executed in the program.  Because there are two different possible values for X
  76 before the return instruction, a PHI node is inserted to merge the two values.
  77 The LLVM IR that we want for this example looks like this:</p>
  78
  79 <div class="doc_code">
  80 <pre>
  81 @G = weak global i32 0   ; type of @G is i32*
  82 @H = weak global i32 0   ; type of @H is i32*
  83
  84 define i32 @test(i1 %Condition) {
  85 entry:
  86         br i1 %Condition, label %cond_true, label %cond_false
  87
  88 cond_true:
  89         %X.0 = load i32* @G
  90         br label %cond_next
  91
  92 cond_false:
  93         %X.1 = load i32* @H
  94         br label %cond_next
  95
  96 cond_next:
  97         %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  98         ret i32 %X.2
  99 }
 100 </pre>
 101 </div>
 102
 103 <p>In this example, the loads from the G and H global variables are explicit in
 104 the LLVM IR, and they live in the then/else branches of the if statement
 105 (cond_true/cond_false).  In order to merge the incoming values, the X.2 phi node
 106 in the cond_next block selects the right value to use based on where control
 107 flow is coming from: if control flow comes from the cond_false block, X.2 gets
 108 the value of X.1.  Alternatively, if control flow comes from cond_tree, it gets
 109 the value of X.0.  The intent of this chapter is not to explain the details of
 110 SSA form.  For more information, see one of the many <a
 111 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
 112 references</a>.</p>
 113
 114 <p>The question for this article is "who places phi nodes when lowering
 115 assignments to mutable variables?".  The issue here is that LLVM
 116 <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
 117 However, SSA construction requires non-trivial algorithms and data structures,
 118 so it is inconvenient and wasteful for every front-end to have to reproduce this
 119 logic.</p>
 120
 121 </div>
 122
 123 <!-- *********************************************************************** -->
 124 <div class="doc_section"><a name="memory">Memory in LLVM</a></div>
 125 <!-- *********************************************************************** -->
 126
 127 <div class="doc_text">
 128
 129 <p>The 'trick' here is that while LLVM does require all register values to be
 130 in SSA form, it does not require (or permit) memory objects to be in SSA form.
 131 In the example above, note that the loads from G and H are direct accesses to
 132 G and H: they are not renamed or versioned.  This differs from some other
 133 compiler systems, which do try to version memory objects.  In LLVM, instead of
 134 encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
 135 href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
 136 demand.</p>
 137
 138 <p>
 139 With this in mind, the high-level idea is that we want to make a stack variable
 140 (which lives in memory, because it is on the stack) for each mutable object in
 141 a function.  To take advantage of this trick, we need to talk about how LLVM
 142 represents stack variables.
 143 </p>
 144
 145 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
 146 it is carefully designed to not have (or need) an "address-of" operator.  Notice
 147 how the type of the @G/@H global variables is actually "i32*" even though the
 148 variable is defined as "i32".  What this means is that @G defines <em>space</em>
 149 for an i32 in the global data area, but its <em>name</em> actually refers to the
 150 address for that space.  Stack variables work the same way, but instead of being
 151 declared with global variable definitions, they are declared with the
 152 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
 153
 154 <div class="doc_code">
 155 <pre>
 156 define i32 @test(i1 %Condition) {
 157 entry:
 158         %X = alloca i32           ; type of %X is i32*.
 159         ...
 160         %tmp = load i32* %X       ; load the stack value %X from the stack.
 161         %tmp2 = add i32 %tmp, 1   ; increment it
 162         store i32 %tmp2, i32* %X  ; store it back
 163         ...
 164 </pre>
 165 </div>
 166
 167 <p>This code shows an example of how you can declare and manipulate a stack
 168 variable in the LLVM IR.  Stack memory allocated with the alloca instruction is
 169 fully general: you can pass the address of the stack slot to functions, you can
 170 store it in other variables, etc.  In our example above, we could rewrite the
 171 example to use the alloca technique to avoid using a PHI node:</p>
 172
 173 <div class="doc_code">
 174 <pre>
 175 @G = weak global i32 0   ; type of @G is i32*
 176 @H = weak global i32 0   ; type of @H is i32*
 177
 178 define i32 @test(i1 %Condition) {
 179 entry:
 180         %X = alloca i32           ; type of %X is i32*.
 181         br i1 %Condition, label %cond_true, label %cond_false
 182
 183 cond_true:
 184         %X.0 = load i32* @G
 185         store i32 %X.0, i32* %X   ; Update X
 186         br label %cond_next
 187
 188 cond_false:
 189         %X.1 = load i32* @H
 190         store i32 %X.1, i32* %X   ; Update X
 191         br label %cond_next
 192
 193 cond_next:
 194         %X.2 = load i32* %X       ; Read X
 195         ret i32 %X.2
 196 }
 197 </pre>
 198 </div>
 199
 200 <p>With this, we have discovered a way to handle arbitrary mutable variables
 201 without the need to create Phi nodes at all:</p>
 202
 203 <ol>
 204 <li>Each mutable variable becomes a stack allocation.</li>
 205 <li>Each read of the variable becomes a load from the stack.</li>
 206 <li>Each update of the variable becomes a store to the stack.</li>
 207 <li>Taking the address of a variable just uses the stack address directly.</li>
 208 </ol>
 209
 210 <p>While this solution has solved our immediate problem, it introduced another
 211 one: we have now apparently introduced a lot of stack traffic for very simple
 212 and common operations, a major performance problem.  Fortunately for us, the
 213 LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
 214 this case, promoting allocas like this into SSA registers, inserting Phi nodes
 215 as appropriate.  If you run this example through the pass, for example, you'll
 216 get:</p>
 217
 218 <div class="doc_code">
 219 <pre>
 220 $ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
 221 @G = weak global i32 0
 222 @H = weak global i32 0
 223
 224 define i32 @test(i1 %Condition) {
 225 entry:
 226         br i1 %Condition, label %cond_true, label %cond_false
 227
 228 cond_true:
 229         %X.0 = load i32* @G
 230         br label %cond_next
 231
 232 cond_false:
 233         %X.1 = load i32* @H
 234         br label %cond_next
 235
 236 cond_next:
 237         %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 238         ret i32 %X.01
 239 }
 240 </pre>
 241 </div>
 242
 243 <p>The mem2reg pass implements the standard "iterated dominator frontier"
 244 algorithm for constructing SSA form and has a number of optimizations that speed
 245 up very common degenerate cases.  mem2reg really is the answer for dealing with
 246 mutable variables, and we highly recommend that you depend on it.  Note that
 247 mem2reg only works on variables in certain circumstances:</p>
 248
 249 <ol>
 250 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
 251 promotes them.  It does not apply to global variables or heap allocations.</li>
 252
 253 <li>mem2reg only looks for alloca instructions in the entry block of the
 254 function.  Being in the entry block guarantees that the alloca is only executed
 255 once, which makes analysis simpler.</li>
 256
 257 <li>mem2reg only promotes allocas whose uses are direct loads and stores.  If
 258 the address of the stack object is passed to a function, or if any funny pointer
 259 arithmetic is involved, the alloca will not be promoted.</li>
 260
 261 <li>mem2reg only works on allocas of scalar values, and only if the array size
 262 of the allocation is 1 (or missing in the .ll file).  mem2reg is not capable of
 263 promoting structs or arrays to registers.  Note that the "scalarrepl" pass is
 264 more powerful and can promote structs, "unions", and arrays in many cases.</li>
 265
 266 </ol>
 267
 268 <p>
 269 All of these properties are easy to satisfy for most imperative languages, and
 270 we'll illustrate this below with Kaleidoscope.  The final question you may be
 271 asking is: should I bother with this nonsense for my front-end?  Wouldn't it be
 272 better if I just did SSA construction directly, avoiding use of the mem2reg
 273 optimization pass?  In short, we strongly recommend that use you this technique
 274 for building SSA form, unless there is an extremely good reason not to.  Using
 275 this technique is:</p>
 276
 277 <ul>
 278 <li>Proven and well tested: llvm-gcc and clang both use this technique for local
 279 mutable variables.  As such, the most common clients of LLVM are using this to
 280 handle a bulk of their variables.  You can be sure that bugs are found fast and
 281 fixed early.</li>
 282
 283 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
 284 common cases as well as fully general.  For example, it has fast-paths for
 285 variables that are only used in a single block, variables that only have one
 286 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
 287 </li>
 288
 289 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
 290 Debug information in LLVM</a> relies on having the address of the variable
 291 exposed to attach debug info to it.  This technique dovetails very naturally
 292 with this style of debug info.</li>
 293 </ul>
 294
 295 <p>If nothing else, this makes it much easier to get your front-end up and
 296 running, and is very simple to implement.  Lets extend Kaleidoscope with mutable
 297 variables now!
 298 </p>
 299 </div>
 300
 301
 302 <!-- *********************************************************************** -->
 303 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 304 <!-- *********************************************************************** -->
 305
 306 <div class="doc_text">
 307
 308 <p>
 309 Here is the complete code listing for our running example, enhanced with the
 310 if/then/else and for expressions..  To build this example, use:
 311 </p>
 312
 313 <div class="doc_code">
 314 <pre>
 315    # Compile
 316    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 317    # Run
 318    ./toy
 319 </pre>
 320 </div>
 321
 322 <p>Here is the code:</p>
 323
 324 <div class="doc_code">
 325 <pre>
 326 </pre>
 327 </div>
 328
 329 </div>
 330
 331 <!-- *********************************************************************** -->
 332 <hr>
 333 <address>
 334   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
 335   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
 336   <a href="http://validator.w3.org/check/referer"><img
 337   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
 338
 339   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
 340   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
 341   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
 342 </address>
 343 </body>
 344 </html>