docs/CodingStandards.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   2 <html><head><title>A Few Coding Standards</title></head>
   3 <body bgcolor=white>
   4
   5 <table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0>
   6 <tr><td>&nbsp; <font size=+5 color="#EEEEFF" face="Georgia,Palatino,Times,Roman"><b>A Few Coding Standards</b></font></td>
   7 </tr></table>
   8
   9 <ol>
  10   <li><a href="#introduction">Introduction</a>
  11   <li><a href="#mechanicalissues">Mechanical Source Issues</a>
  12     <ol>
  13       <li><a href="#sourceformating">Source Code Formatting</a>
  14         <ol>
  15           <li><a href="#scf_commenting">Commenting</a>
  16           <li><a href="#scf_commentformat">Comment Formatting</a>
  17           <li><a href="#scf_includes">#include Style</a>
  18           <li><a href="#scf_codewidth">Source Code Width</a>
  19           <li><a href="#scf_spacestabs">Use Spaces Instead of Tabs</a>
  20           <li><a href="#scf_indentation">Indent Code Consistently</a>
  21         </ol>
  22       <li><a href="#compilerissues">Compiler Issues</a>
  23         <ol>
  24           <li><a href="#ci_warningerrors">Treat Compiler Warnings Like Errors</a>
  25           <li><a href="#ci_cpp_features">Which C++ features can I use?</a>
  26           <li><a href="#ci_portable_code">Write Portable Code</a>
  27         </ol>
  28     </ol>
  29   <li><a href="#styleissues">Style Issues</a>
  30     <ol>
  31       <li><a href="#macro">The High Level Issues</a>
  32         <ol>
  33           <li><a href="#hl_module">A Public Header File <b>is</b> a Module</a>
  34           <li><a href="#hl_dontinclude">#include as Little as Possible</a>
  35           <li><a href="#hl_privateheaders">Keep "internal" Headers Private</a>
  36         </ol>
  37       <li><a href="#micro">The Low Level Issues</a>
  38         <ol>
  39           <li><a href="#hl_assert">Assert Liberally</a>
  40           <li><a href="#hl_preincrement">Prefer Preincrement</a>
  41           <li><a href="#hl_avoidendl">Avoid endl</a>
  42           <li><a href="#hl_exploitcpp">Exploit C++ to its Fullest</a>
  43         </ol>
  44       <li><a href="#iterators">Writing Iterators</a>
  45     </ol>
  46   <li><a href="#seealso">See Also</a>
  47 </ol><p>
  48
  49
  50 <!-- *********************************************************************** -->
  51 </ul><table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0><tr><td align=center><font color="#EEEEFF" size=+2 face="Georgia,Palatino"><b>
  52 <a name="introduction">Introduction
  53 </b></font></td></tr></table><ul>
  54 <!-- *********************************************************************** -->
  55
  56 This document attempts to describe a few coding standards that are being used in
  57 the LLVM source tree.  Although no coding standards should be regarded as
  58 absolute requirements to be followed in all instances, coding standards can be
  59 useful.<p>
  60
  61 This document intentionally does not prescribe fixed standards for religious
  62 issues such as brace placement and space usage.  For issues like this, follow
  63 the golden rule:
  64
  65 <a name="goldenrule">
  66 <blockquote><b>If you are adding a significant body of source to a project, feel
  67 free to use whatever style you are most comfortable with.  If you are extending,
  68 enhancing, or bug fixing already implemented code, use the style that is already
  69 being used so that the source is uniform and easy to follow.</b></blockquote>
  70
  71 The ultimate goal of these guidelines is the increase readability and
  72 maintainability of our common source base. If you have suggestions for topics to
  73 be included, please mail them to <a href="mailto:sabre@nondot.org">Chris</a>.<p>
  74
  75
  76 <!-- *********************************************************************** -->
  77 </ul><table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0><tr><td align=center><font color="#EEEEFF" size=+2 face="Georgia,Palatino"><b>
  78 <a name="mechanicalissues">Mechanical Source Issues
  79 </b></font></td></tr></table><ul>
  80 <!-- *********************************************************************** -->
  81
  82 <!-- ======================================================================= -->
  83 </ul><table width="100%" bgcolor="#441188" border=0 cellpadding=4 cellspacing=0><tr><td>&nbsp;</td><td width="100%">&nbsp; <font color="#EEEEFF" face="Georgia,Palatino"><b>
  84 <a name="sourceformating">Source Code Formatting
  85 </b></font></td></tr></table><ul>
  86
  87
  88 <!-- _______________________________________________________________________ -->
  89 </ul><a name="scf_commenting"><h4><hr size=0>Commenting</h4><ul>
  90
  91 Comments are one critical part of readability and maintainability.  Everyone
  92 knows they should comment, so should you.  :)  Although we all should probably
  93 comment our code more than we do, there are a few very critical places that
  94 documentation is very useful:<p>
  95
  96 <ol>
  97 <h4><li>File Headers</h4>
  98 Every source file should have a header on it that describes the basic purpose of
  99 the file.  If a file does not have a header, it should not be checked into CVS.
 100 Most source trees will probably have a standard file header format.  The
 101 standard format for the LLVM source tree looks like this:<p>
 102
 103 <pre>
 104 //===-- llvm/Instruction.h - Instruction class definition --------*- C++ -*--=//
 105 //
 106 // This file contains the declaration of the Instruction class, which is the
 107 // base class for all of the VM instructions.
 108 //
 109 //===----------------------------------------------------------------------===//
 110 </pre>
 111
 112 A few things to note about this particular format.  The "<tt>-*- C++ -*-</tt>"
 113 string on the first line is there to tell Emacs that the source file is a C++
 114 file, not a C file (Emacs assumes .h files are C files by default [Note that tag
 115 this is not necessary in .cpp files]).  The name of the file is also on the
 116 first line, along with a very short description of the purpose of the file.
 117 This is important when printing out code and flipping though lots of pages.<p>
 118
 119 The main body of the description does not have to be very long in most cases.
 120 Here it's only two lines.  If an algorithm is being implemented or something
 121 tricky is going on, a reference to the paper where it is published should be
 122 included, as well as any notes or "gotchas" in the code to watch out for.<p>
 123
 124
 125 <h4><li>Class overviews</h4>
 126
 127 Classes are one fundemental part of a good object oriented design.  As such, a
 128 class definition should have a comment block that explains what the class is
 129 used for... if it's not obvious.  If it's so completely obvious your grandma
 130 could figure it out, it's probably safe to leave it out.  Naming classes
 131 something sane goes a long ways towards avoiding writing documentation.  :)<p>
 132
 133
 134 <h4><li>Method information</h4>
 135
 136 Methods defined in a class (as well as any global functions) should also be
 137 documented properly.  A quick note about what it does any a description of the
 138 borderline behaviour is all that is necessary here (unless something
 139 particularly tricky or insideous is going on).  The hope is that people can
 140 figure out how to use your interfaces without reading the code itself... that is
 141 the goal metric.<p>
 142
 143 Good things to talk about here are what happens when something unexpected
 144 happens: does the method return null?  Abort?  Format your hard disk?<p>
 145 </ol>
 146
 147
 148 <!-- _______________________________________________________________________ -->
 149 </ul><a name="scf_commentformat"><h4><hr size=0>Comment Formatting</h4><ul>
 150
 151 In general, prefer C++ style (<tt>//</tt>) comments.  They take less space,
 152 require less typing, don't have nesting problems, etc.  There are a few cases
 153 when it is useful to use C style (<tt>/* */</tt>) comments however:<p>
 154
 155 <ol>
 156 <li>When writing a C code: Obviously if you are writing C code, use C style
 157 comments.  :)
 158 <li>When writing a header file that may be #included by a C source file.
 159 <li>When writing a source file that is used by a tool that only accepts C style
 160 comments.
 161 </ol><p>
 162
 163 To comment out a large block of code, use <tt>#if 0</tt> and <tt>#endif</tt>.
 164 These nest properly and are better behaved in general than C style comments.<p>
 165
 166 <!-- _______________________________________________________________________ -->
 167 </ul><a name="scf_includes"><h4><hr size=0>#include Style</h4><ul>
 168
 169 Immediately after the <a href="#scf_commenting">header file comment</a> (and
 170 include guards if working on a header file), the <a
 171 href="hl_dontinclude">minimal</a> list of #includes required by the file should
 172 be listed.  We prefer these #includes to be listed in this order:<p>
 173
 174 <ol>
 175 <li><a href="#mmheader">Main Module header</a>
 176 <li><a href="#hl_privateheaders">Local/Private Headers</a>
 177 <li>llvm/*
 178 <li>llvm/Analysis/*
 179 <li>llvm/Assembly/*
 180 <li>llvm/Bytecode/*
 181 <li>llvm/CodeGen/*
 182 <li>...
 183 <li>Support/*
 184 <li>Config/*
 185 <li>System #includes
 186 </ol>
 187
 188 ... and each catagory should be sorted by name.<p>
 189
 190 <a name="mmheader">The "Main Module Header" file applies to .cpp file which
 191 implement an interface defined by a .h file.  This #include should always be
 192 included <b>first</b> regardless of where it lives on the file system.  By
 193 including a header file first in the .cpp files that implement the interfaces,
 194 we ensure that the header does not have any hidden dependencies which are not
 195 explicitly #included in the header, but should be.  It is also a form of
 196 documentation in the .cpp file to indicate where the interfaces it implements
 197 are defined.<p>
 198
 199
 200 <!-- _______________________________________________________________________ -->
 201 </ul><a name="scf_codewidth"><h4><hr size=0>Source Code Width</h4><ul>
 202
 203 Write your code to fit within 80 columns of text.  This helps those of us who like to print out code and look at your code in an xterm without resizing it.
 204
 205
 206 <!-- _______________________________________________________________________ -->
 207 </ul><a name="scf_spacestabs"><h4><hr size=0>Use Spaces Instead of Tabs</h4><ul>
 208
 209 In all cases, prefer spaces to tabs in source files.  People have different
 210 prefered indentation levels, and different styles of indentation that they
 211 like... this is fine.  What isn't is that different editors/viewers expand tabs
 212 out to different tab stops.  This can cause your code to look completely
 213 unreadable, and it is not worth dealing with.<p>
 214
 215 As always, follow the <a href="#goldenrule">Golden Rule</a> above: follow the
 216 style of existing code if your are modifying and extending it.  If you like four
 217 spaces of indentation, <b>DO NOT</b> do that in the middle of a chunk of code
 218 with two spaces of indentation.  Also, do not reindent a whole source file: it
 219 makes for incredible diffs that are absolutely worthless.<p>
 220
 221
 222 <!-- _______________________________________________________________________ -->
 223 </ul><a name="scf_indentation"><h4><hr size=0>Indent Code Consistently</h4><ul>
 224
 225 Okay, your first year of programming you were told that indentation is
 226 important.  If you didn't believe and internalize this then, now is the time.
 227 Just do it.<p>
 228
 229
 230
 231
 232 <!-- ======================================================================= -->
 233 </ul><table width="100%" bgcolor="#441188" border=0 cellpadding=4 cellspacing=0><tr><td>&nbsp;</td><td width="100%">&nbsp; <font color="#EEEEFF" face="Georgia,Palatino"><b>
 234 <a name="compilerissues">Compiler Issues
 235 </b></font></td></tr></table><ul>
 236
 237
 238 <!-- _______________________________________________________________________ -->
 239 </ul><a name="ci_warningerrors"><h4><hr size=0>Treat Compiler Warnings Like Errors</h4><ul>
 240
 241 If your code has compiler warnings in it, something is wrong: you aren't casting
 242 values correctly, your have "questionable" constructs in your code, or you are
 243 doing something legitimately wrong.  Compiler warnings can cover up legitimate
 244 errors in output and make dealing with a translation unit difficult.<p>
 245
 246 It is not possible to prevent all warnings from all compilers, nor is it
 247 desirable.  Instead, pick a standard compiler (like <tt>gcc</tt>) that provides
 248 a good thorough set of warnings, and stick to them.  At least in the case of
 249 <tt>gcc</tt>, it is possible to work around any spurious errors by changing the
 250 syntax of the code slightly.  For example, an warning that annoys me occurs when
 251 I write code like this:<p>
 252
 253 <pre>
 254   if (V = getValue()) {
 255     ..
 256   }
 257 </pre><p>
 258
 259 <tt>gcc</tt> will warn me that I probably want to use the <tt>==</tt> operator,
 260 and that I probably mistyped it.  In most cases, I haven't, and I really don't
 261 want the spurious errors.  To fix this particular problem, I rewrite the code
 262 like this:<p>
 263
 264 <pre>
 265   if ((V = getValue())) {
 266     ..
 267   }
 268 </pre><p>
 269
 270 ...which shuts <tt>gcc</tt> up.  Any <tt>gcc</tt> warning that annoys you can be
 271 fixed by massaging the code appropriately.<p>
 272
 273 These are the <tt>gcc</tt> warnings that I prefer to enable: <tt>-Wall -Winline
 274 -W -Wwrite-strings -Wno-unused</tt><p>
 275
 276
 277 <!-- _______________________________________________________________________ -->
 278 </ul><a name="ci_cpp_features"><h4><hr size=0>Which C++ features can I use?</h4><ul>
 279
 280 Compilers are finally catching up to the C++ standard.  Most compilers implement
 281 most features, so you can use just about any features that you would like.  In
 282 the LLVM source tree, I have chosen to not use these features:<p>
 283
 284 <ol>
 285 <li>Exceptions: Exceptions are very useful for error reporting and handling
 286 exceptional conditions.  I do not use them in LLVM because they do have an
 287 associated performance impact (by restricting restructuring of code), and parts
 288 of LLVM are designed for performance critical purposes.<p>
 289
 290 Just like most of the rules in this document, this isn't a hard and fast
 291 requirement.  Exceptions are used in the Parser, because it simplifies error
 292 reporting <b>significantly</b>, and the LLVM parser is not at all in the
 293 critical path.<p>
 294
 295 <li>RTTI: RTTI has a large cost in terms of executable size, and compilers are
 296 not yet very good at stomping out "dead" class information blocks.  Because of
 297 this, typeinfo and dynamic cast are not used.
 298 </ol><p>
 299
 300 Other features, such as templates (without partial specialization) can be used
 301 freely.  The general goal is to have clear, consise, performant code... if a
 302 technique assists with that then use it.<p>
 303
 304
 305 <!-- _______________________________________________________________________ -->
 306 </ul><a name="ci_portable_code"><h4><hr size=0>Write Portable Code</h4><ul>
 307
 308 In almost all cases, it is possible and within reason to write completely
 309 portable code.  If there are cases where it isn't possible to write portable
 310 code, isolate it behind a well defined (and well documented) interface.<p>
 311
 312 In practice, this means that you shouldn't assume much about the host compiler,
 313 including its support for "high tech" features like partial specialization of
 314 templates.  In fact, Visual C++ 6 could be an important target for our work in
 315 the future, and we don't want to have to rewrite all of our code to support
 316 it.<p>
 317
 318
 319
 320 <!-- *********************************************************************** -->
 321 </ul><table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0><tr><td align=center><font color="#EEEEFF" size=+2 face="Georgia,Palatino"><b>
 322 <a name="styleissues">Style Issues
 323 </b></font></td></tr></table><ul>
 324 <!-- *********************************************************************** -->
 325
 326
 327 <!-- ======================================================================= -->
 328 </ul><table width="100%" bgcolor="#441188" border=0 cellpadding=4 cellspacing=0><tr><td>&nbsp;</td><td width="100%">&nbsp; <font color="#EEEEFF" face="Georgia,Palatino"><b>
 329 <a name="macro">The High Level Issues
 330 </b></font></td></tr></table><ul>
 331
 332
 333 <!-- _______________________________________________________________________ -->
 334 </ul><a name="hl_module"><h4><hr size=0>A Public Header File <b>is</b> a Module</h4><ul>
 335
 336 C++ doesn't do too well in the modularity department.  There is no real
 337 encapsulation or data hiding (unless you use expensive protocol classes), but it
 338 is what we have to work with.  When you write a public header file (in the LLVM
 339 source tree, they live in the top level "include" directory), you are defining a
 340 module of functionality.<p>
 341
 342 Ideally, modules should be completely independent of each other, and their
 343 header files should only include the absolute minimum number of headers
 344 possible. A module is not just a class, a function, or a namespace: <a
 345 href="http://www.cuj.com/articles/2000/0002/0002c/0002c.htm">it's a collection
 346 of these</a> that defines an interface.  This interface may be several
 347 functions, classes or data structures, but the important issue is how they work
 348 together.<p>
 349
 350 In general, a module should be implemented with one or more <tt>.cpp</tt> files.
 351 Each of these <tt>.cpp</tt> files should include the header that defines their
 352 interface first.  This ensure that all of the dependences of the module header
 353 have been properly added to the module header itself, and are not implicit.
 354 System headers should be included after user headers for a translation unit.<p>
 355
 356
 357 <!-- _______________________________________________________________________ -->
 358 </ul><a name="hl_dontinclude"><h4><hr size=0>#include as Little as Possible</h4><ul>
 359
 360 <tt>#include</tt> hurts compile time performance.  Don't do it unless you have
 361 to, especially in header files.<p>
 362
 363 But wait, sometimes you need to have the definition of a class to use it, or to
 364 inherit from it.  In these cases go ahead and #include that header file.  Be
 365 aware however that there are many cases where you don't need to have the full
 366 definition of a class.  If you are using a pointer or reference to a class, you
 367 don't need the header file.  If you are simply returning a class instance from a
 368 prototyped function or method, you don't need it.  In fact, for most cases, you
 369 simply don't need the definition of a class... and not <tt>#include</tt>'ing
 370 speeds up compilation.<p>
 371
 372 It is easy to try to go too overboard on this recommendation, however.  You
 373 <b>must</b> include all of the header files that you are using, either directly
 374 or indirectly (through another header file).  To make sure that you don't
 375 accidently forget to include a header file in your module header, make sure to
 376 include your module header <b>first</b> in the implementation file (as mentioned
 377 above).  This way there won't be any hidden dependencies that you'll find out
 378 about later...<p>
 379
 380
 381 <!-- _______________________________________________________________________ -->
 382 </ul><a name="hl_privateheaders"><h4><hr size=0>Keep "internal" Headers Private</h4><ul>
 383
 384 Many modules have a complex implementation that causes them to use more than one
 385 implementation (<tt>.cpp</tt>) file.  It is often tempting to put the internal
 386 communication interface (helper classes, extra functions, etc) in the public
 387 module header file.  Don't do this.  :)<p>
 388
 389 If you really need to do something like this, put a private header file in the
 390 same directory as the source files, and include it locally.  This ensures that
 391 your private interface remains private and undisturbed by outsiders.<p>
 392
 393 Note however, that it's okay to put extra implementation methods a public class
 394 itself... just make them private (or protected), and all is well.<p>
 395
 396
 397 <!-- ======================================================================= -->
 398 </ul><table width="100%" bgcolor="#441188" border=0 cellpadding=4 cellspacing=0><tr><td>&nbsp;</td><td width="100%">&nbsp; <font color="#EEEEFF" face="Georgia,Palatino"><b>
 399 <a name="micro">The Low Level Issues
 400 </b></font></td></tr></table><ul>
 401
 402
 403 <!-- _______________________________________________________________________ -->
 404 </ul><a name="hl_assert"><h4><hr size=0>Assert Liberally</h4><ul>
 405
 406 Use the "<tt>assert</tt>" function to its fullest.  Check all of your
 407 preconditions and assumptions, you never know when a bug (not neccesarily even
 408 yours) might be caught early by an assertion, which reduces debugging time
 409 dramatically.  The "<tt>&lt;cassert&gt;</tt>" header file is probably already
 410 included by the header files you are using, so it doesn't cost anything to use
 411 it.<p>
 412
 413 To further assist with debugging, make sure to put some kind of error message in
 414 the assertion statement (which is printed if the assertion is tripped). This
 415 helps the poor debugging make sense of why an assertion is being made and
 416 enforced, and hopefully what to do about it.  Here is one complete example:<p>
 417
 418 <pre>
 419   inline Value *getOperand(unsigned i) {
 420     assert(i &lt; Operands.size() && "getOperand() out of range!");
 421     return Operands[i];
 422   }
 423 </pre>
 424
 425 Here are some examples:
 426
 427 <pre>
 428   assert(Ty-&gt;isPointerType() && "Can't allocate a non pointer type!");
 429
 430   assert((Opcode == Shl || Opcode == Shr) && "ShiftInst Opcode invalid!");
 431
 432   assert(idx &lt; getNumSuccessors() && "Successor # out of range!");
 433
 434   assert(V1.getType() == V2.getType() && "Constant types must be identical!");
 435
 436   assert(isa&lt;PHINode&gt;(Succ-&gt;front()) && "Only works on PHId BBs!");
 437 </pre><p>
 438
 439 You get the idea...<p>
 440
 441
 442 <!-- _______________________________________________________________________ -->
 443 </ul><a name="hl_preincrement"><h4><hr size=0>Prefer Preincrement</h4><ul>
 444
 445 Hard fast rule: Preincrement (++X) may be no slower than postincrement (X++) and
 446 could very well be a lot faster than it.  Use preincrementation whenever
 447 possible.<p>
 448
 449 The semantics of postincrement include making a copy of the value being
 450 incremented, returning it, and then preincrementing the "work value".  For
 451 primitive types, this isn't a big deal... but for iterators, it can be a huge
 452 issue (for example, some iterators contains stack and set objects in them...
 453 copying an iterator could invoke the copy ctor's of these as well).  In general,
 454 get in the habit of always using preincrement, and you won't have a problem.<p>
 455
 456
 457 <!-- _______________________________________________________________________ -->
 458 </ul><a name="hl_avoidendl"><h4><hr size=0>Avoid endl</h4><ul>
 459
 460 The <tt>endl</tt> modifier, when used with iostreams outputs a newline to the
 461 output stream specified.  In addition to doing this, however, it also flushes
 462 the output stream.  In other words, these are equivalent:<p>
 463
 464 <pre>
 465   cout << endl;
 466   cout << "\n" << flush;
 467 </pre>
 468
 469 Most of the time, you probably have no reason to flush the output stream, so it's better to use a literal <tt>"\n"</tt>.<p>
 470
 471
 472 <!-- _______________________________________________________________________ -->
 473 </ul><a name="hl_exploitcpp"><h4><hr size=0>Exploit C++ to its Fullest</h4><ul>
 474
 475 C++ is a powerful language.  With a firm grasp on its capabilities, you can make
 476 write effective, consise, readable and maintainable code all at the same time.
 477 By staying consistent, you reduce the amount of special cases that need to be
 478 remembered.  Reducing the total number of lines of code you write is a good way
 479 to avoid documentation, and avoid giving bugs a place to hide.<p>
 480
 481 For these reasons, come to know and love the contents of your local
 482 &lt;algorithm&gt; header file.  Know about &lt;functional&gt; and what it can do
 483 for you.  C++ is just a tool that wants you to master it. :)<p>
 484
 485
 486
 487 <!-- ======================================================================= -->
 488 </ul><table width="100%" bgcolor="#441188" border=0 cellpadding=4 cellspacing=0><tr><td>&nbsp;</td><td width="100%">&nbsp; <font color="#EEEEFF" face="Georgia,Palatino"><b>
 489 <a name="iterators">Writing Iterators
 490 </b></font></td></tr></table><ul>
 491
 492 Here's a pretty good summary of how to write your own data structure iterators
 493 in a way that is compatible with the STL, and with a lot of other code out there
 494 (slightly edited by Chris):<p>
 495
 496 <pre>
 497 From: Ross Smith <ross.s@ihug.co.nz>
 498 Newsgroups: comp.lang.c++.moderated
 499 Subject: Writing iterators (was: Re: Non-template functions that take iterators)
 500 Date: 28 Jun 2001 12:07:10 -0400
 501
 502 Andre Majorel wrote:
 503 > Any pointers handy on "writing STL-compatible iterators for
 504 > dummies ?"
 505
 506 I'll give it a try...
 507
 508 The usual situation requiring user-defined iterators is that you have
 509 a type that bears some resemblance to an STL container, and you want
 510 to provide iterators so it can be used with STL algorithms. You need
 511 to ask three questions:
 512
 513 First, is this simply a wrapper for an underlying collection of
 514 objects that's held somewhere as a real STL container, or is it a
 515 "virtual container" for which iteration is (under the hood) more
 516 complicated than simply incrementing some underlying iterator (or
 517 pointer or index or whatever)? In the former case you can frequently
 518 get away with making your container's iterators simply typedefs for
 519 those of the underlying container; your begin() function would call
 520 member_container.begin(), and so on.
 521
 522 Second, do you only need read-only iterators, or do you need separate
 523 read-only (const) and read-write (non-const) iterators?
 524
 525 Third, which kind of iterator (input, output, forward, bidirectional,
 526 or random access) is appropriate? If you're familiar with the
 527 properties of the iterator types (if not, visit
 528 <a href="http://www.sgi.com/tech/stl/">http://www.sgi.com/tech/stl/</a>), the appropriate choice should be
 529 obvious from the semantics of the container.
 530
 531 I'll start with forward iterators, as the simplest case that's likely
 532 to come up in normal code. Input and output iterators have some odd
 533 properties and rarely need to be implemented in user code; I'll leave
 534 them out of discussion. Bidirectional and random access iterators are
 535 covered below.
 536
 537 The exact behaviour of a forward iterator is spelled out in the
 538 Standard in terms of a set of expressions with specified behaviour,
 539 rather than a set of member functions, which leaves some leeway in how
 540 you actually implement it. Typically it looks something like this
 541 (I'll start with the const-iterator-only situation):
 542
 543   #include &lt;iterator&gt;
 544
 545   class container {
 546     public:
 547       typedef something_or_other value_type;
 548       class const_iterator:
 549         public std::iterator&lt;std::forward_iterator_tag, value_type&gt; {
 550           friend class container;
 551           public:
 552             const value_type&amp; operator*() const;
 553             const value_type* operator->() const;
 554             const_iterator&amp; operator++();
 555             const_iterator operator++(int);
 556             friend bool operator==(const_iterator lhs,
 557                                    const_iterator rhs);
 558             friend bool operator!=(const_iterator lhs,
 559                                    const_iterator rhs);
 560           private:
 561             //...
 562         };
 563       //...
 564   };
 565
 566 An iterator should always be derived from an instantiation of the
 567 std::iterator template. The iterator's life cycle functions
 568 (constructors, destructor, and assignment operator) aren't declared
 569 here; in most cases the compiler-generated ones are sufficient. The
 570 container needs to be a friend of the iterator so that the container's
 571 begin() and end() functions can fill in the iterator's private members
 572 with the appropriate values.
 573
 574 <i>[Chris's Note: I prefer to not make my iterators friends.  Instead, two
 575 ctor's are provided for the iterator class: one to start at the end of the
 576 container, and one at the beginning.  Typically this is done by providing
 577 two constructors with different signatures.]</i>
 578
 579 There are normally only three member functions that need nontrivial
 580 implementations; the rest are just boilerplate.
 581
 582   const container::value_type&amp;
 583     container::const_iterator::operator*() const {
 584       // find the element and return a reference to it
 585     }
 586
 587   const container::value_type*
 588     container::const_iterator::operator->() const {
 589       return &amp;**this;
 590     }
 591
 592 If there's an underlying real container, operator*() can just return a
 593 reference to the appropriate element. If there's no actual container
 594 and the elements need to be generated on the fly -- what I think of as
 595 a "virtual container" -- things get a bit more complicated; you'll
 596 probably need to give the iterator a value_type member object, and
 597 fill it in when you need to. This might be done as part of the
 598 increment operator (below), or if the operation is nontrivial, you
 599 might choose the "lazy" approach and only generate the actual value
 600 when one of the dereferencing operators is called.
 601
 602 The operator->() function is just boilerplate around a call to
 603 operator*().
 604
 605   container::const_iterator&amp;
 606     container::const_iterator::operator++() {
 607       // the incrementing logic goes here
 608       return *this;
 609     }
 610
 611   container::const_iterator
 612     container::const_iterator::operator++(int) {
 613       const_iterator old(*this);
 614       ++*this;
 615       return old;
 616     }
 617
 618 Again, the incrementing logic will usually be trivial if there's a
 619 real container involved, more complicated if you're working with a
 620 virtual container. In particular, watch out for what happens when you
 621 increment past the last valid item -- this needs to produce an
 622 iterator that will compare equal to container.end(), and making this
 623 work is often nontrivial for virtual containers.
 624
 625 The post-increment function is just boilerplate again (and
 626 incidentally makes it obvious why all the experts recommend using
 627 pre-increment wherever possible).
 628
 629   bool operator==(container::const_iterator lhs,
 630                   container::const_iterator rhs) {
 631     // equality comparison goes here
 632   }
 633
 634   bool operator!=(container::const_iterator lhs,
 635                   container::const_iterator rhs) {
 636     return !(lhs == rhs);
 637   }
 638
 639 For a real container, the equality comparison will usually just
 640 compare the underlying iterators (or pointers or indices or whatever).
 641 The semantics of comparisons for virtual container iterators are often
 642 tricky. Remember that iterator comparison only needs to be defined for
 643 iterators into the same container, so you can often simplify things by
 644 taking for granted that lhs and rhs both point into the same container
 645 object. Again, the second function is just boilerplate.
 646
 647 It's a matter of taste whether iterator arguments are passed by value
 648 or reference; I've shown tham passed by value to reduce clutter, but
 649 if the iterator contains several data members, passing by reference
 650 may be better.
 651
 652 That convers the const-iterator-only situation. When we need separate
 653 const and mutable iterators, one small complication is added beyond
 654 the simple addition of a second class.
 655
 656   class container {
 657     public:
 658       typedef something_or_other value_type;
 659       class const_iterator;
 660       class iterator:
 661         public std::iterator&lt;std::forward_iterator_tag, value_type&gt; {
 662           friend class container;
 663           friend class container::const_iterator;
 664           public:
 665             value_type&amp; operator*() const;
 666             value_type* operator->() const;
 667             iterator&amp; operator++();
 668             iterator operator++(int);
 669             friend bool operator==(iterator lhs, iterator rhs);
 670             friend bool operator!=(iterator lhs, iterator rhs);
 671           private:
 672             //...
 673         };
 674       class const_iterator:
 675         public std::iterator&lt;std::forward_iterator_tag, value_type&gt; {
 676           friend class container;
 677           public:
 678             const_iterator();
 679             const_iterator(const iterator&amp; i);
 680             const value_type&amp; operator*() const;
 681             const value_type* operator->() const;
 682             const_iterator&amp; operator++();
 683             const_iterator operator++(int);
 684             friend bool operator==(const_iterator lhs,
 685                                    const_iterator rhs);
 686             friend bool operator!=(const_iterator lhs,
 687                                    const_iterator rhs);
 688           private:
 689             //...
 690         };
 691       //...
 692   };
 693
 694 There needs to be a conversion from iterator to const_iterator (so
 695 that mixed-type operations, such as comparison between an iterator and
 696 a const_iterator, will work). This is done here by giving
 697 const_iterator a conversion constructor from iterator (equivalently,
 698 we could have given iterator an operator const_iterator()), which
 699 requires const_iterator to be a friend of iterator, so it can copy its
 700 data members. (It also requires the addition of an explicit default
 701 constructor to const_iterator, since the existence of another
 702 user-defined constructor inhibits the compiler-defined one.)
 703
 704 Bidirectional iterators add just two member functions to forward
 705 iterators:
 706
 707   class iterator:
 708     public std::iterator&lt;std::bidirectional_iterator_tag, value_type&gt; {
 709       public:
 710         //...
 711         iterator&amp; operator--();
 712         iterator operator--(int);
 713         //...
 714     };
 715
 716 I won't detail the implementations, they're obvious variations on
 717 operator++().
 718
 719 Random access iterators add several more member and friend functions:
 720
 721   class iterator:
 722     public std::iterator&lt;std::random_access_iterator_tag, value_type&gt; {
 723       public:
 724         //...
 725         iterator&amp; operator+=(difference_type rhs);
 726         iterator&amp; operator-=(difference_type rhs);
 727         friend iterator operator+(iterator lhs, difference_type rhs);
 728         friend iterator operator+(difference_type lhs, iterator rhs);
 729         friend iterator operator-(iterator lhs, difference_type rhs);
 730         friend difference_type operator-(iterator lhs, iterator rhs);
 731         friend bool operator&lt;(iterator lhs, iterator rhs);
 732         friend bool operator&gt;(iterator lhs, iterator rhs);
 733         friend bool operator&lt;=(iterator lhs, iterator rhs);
 734         friend bool operator&gt;=(iterator lhs, iterator rhs);
 735         //...
 736     };
 737
 738   container::iterator&amp;
 739     container::iterator::operator+=(container::difference_type rhs) {
 740       // add rhs to iterator position
 741       return *this;
 742     }
 743
 744   container::iterator&amp;
 745     container::iterator::operator-=(container::difference_type rhs) {
 746       // subtract rhs from iterator position
 747       return *this;
 748     }
 749
 750   container::iterator operator+(container::iterator lhs,
 751                                 container::difference_type rhs) {
 752     return iterator(lhs) += rhs;
 753   }
 754
 755   container::iterator operator+(container::difference_type lhs,
 756                                 container::iterator rhs) {
 757     return iterator(rhs) += lhs;
 758   }
 759
 760   container::iterator operator-(container::iterator lhs,
 761                                 container::difference_type rhs) {
 762     return iterator(lhs) -= rhs;
 763   }
 764
 765   container::difference_type operator-(container::iterator lhs,
 766                                        container::iterator rhs) {
 767     // calculate distance between iterators
 768   }
 769
 770   bool operator&lt;(container::iterator lhs, container::iterator rhs) {
 771     // perform less-than comparison
 772   }
 773
 774   bool operator&gt;(container::iterator lhs, container::iterator rhs) {
 775     return rhs &lt; lhs;
 776   }
 777
 778   bool operator&lt;=(container::iterator lhs, container::iterator rhs) {
 779     return !(rhs &lt; lhs);
 780   }
 781
 782   bool operator&gt;=(container::iterator lhs, container::iterator rhs) {
 783     return !(lhs &lt; rhs);
 784   }
 785
 786 Four of the functions (operator+=(), operator-=(), the second
 787 operator-(), and operator&lt;()) are nontrivial; the rest are
 788 boilerplate.
 789
 790 One feature of the above code that some experts may disapprove of is
 791 the declaration of all the free functions as friends, when in fact
 792 only a few of them need direct access to the iterator's private data.
 793 I originally got into the habit of doing this simply to keep the
 794 declarations together; declaring some functions inside the class and
 795 some outside seemed awkward. Since then, though, I've been told that
 796 there's a subtle difference in the way name lookup works for functions
 797 declared inside a class (as friends) and outside, so keeping them
 798 together in the class is probably a good idea for practical as well as
 799 aesthetic reasons.
 800
 801 I hope all this is some help to anyone who needs to write their own
 802 STL-like containers and iterators.
 803
 804 --
 805 Ross Smith &lt;ross.s@ihug.co.nz&gt; The Internet Group, Auckland, New Zealand
 806 </pre>
 807
 808
 809 <!-- *********************************************************************** -->
 810 </ul><table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0><tr><td align=center><font color="#EEEEFF" size=+2 face="Georgia,Palatino"><b>
 811 <a name="seealso">See Also
 812 </b></font></td></tr></table><ul>
 813 <!-- *********************************************************************** -->
 814
 815 A lot of these comments and recommendations have been culled for other sources.
 816 Two particularly important books for our work are:<p>
 817
 818 <ol>
 819 <li><a href="http://www.aw.com/product/0,2627,0201924889,00.html">Effective C++</a> by Scott Meyers.  There is an online version of the book (only some chapters though) <a href="http://www.awlonline.com/cseng/meyerscddemo/">available as well</a>.
 820 <li><a href="http://cseng.aw.com/book/0,3828,0201633620,00.html">Large-Scale C++ Software Design</a> by John Lakos
 821 </ol><p>
 822
 823 If you get some free time, and you haven't read them: do so, you might learn
 824 something. :)
 825
 826
 827 <!-- *********************************************************************** -->
 828 </ul>
 829 <!-- *********************************************************************** -->
 830
 831 <hr>
 832 <font size=-1>
 833 <address><a href="mailto:sabre@nondot.org">Chris Lattner</a></address>
 834 <!-- Created: Tue Jan 23 15:19:28 CST 2001 -->
 835 <!-- hhmts start -->
 836 Last modified: Thu Aug  7 16:44:33 CDT 2003
 837 <!-- hhmts end -->
 838 </font>
 839 </body></html>