Alias Analysis Infrastructure in LLVM

Alias Analysis (or Pointer Analysis) is a technique which attempts to determine whether or not two pointers ever can point to the same object in memory. Traditionally, Alias Analyses respond to a query with either a Must, May, or No alias response, indicating that two pointers do point to the same object, might point to the same object, or are known not to point to the same object.

The AliasAnalysis class is the centerpiece of the LLVM Alias Analysis related infrastructure. This class is the common interface between clients of alias analysis information and the implementations providing it. In addition to simple alias analysis information, this class exposes Mod/Ref information from those implementations which can provide it, allowing for powerful analyses and transformations to work well together.

This document contains information necessary to successfully implement this interface, use it, and to test both sides. It also explains some of the finer points about what exactly results mean. If you feel that something is unclear or should be added, please let me know.

The AliasAnalysis class defines the interface that Alias Analysis implementations should support. This class exports two important enums: AliasResult and ModRefResult which represent the result of an alias query or a mod/ref query, respectively.

The AliasAnalysis interface exposes information about memory, represented in several different ways. In particular, memory objects are represented as a starting address and size, and function calls are represented as the actual call or invoke instructions that performs the call. The AliasAnalysis interface also exposes some helper methods which allow you to get mod/ref information for arbitrary instructions.

Most importantly, the AliasAnalysis class provides several methods which are used to query whether or not pointers alias, whether function calls can modify or read memory, etc.

Representing memory objects as a starting address and a size is critically important for precise Alias Analyses. For example, consider this (silly) C code:

  int i;
  char C[2];
  char A[10]; 
  /* ... */
  for (i = 0; i != 10; ++i) {
    C[0] = A[i];          /* One byte store */
    C[1] = A[9-i];        /* One byte store */
  }

In this case, the basicaa pass will disambiguate the stores to C[0] and C[1] because they are accesses to two distinct locations one byte apart, and the accesses are each one byte. In this case, the LICM pass can use store motion to remove the stores from the loop. In constrast, the following code:

  int i;
  char C[2];
  char A[10]; 
  /* ... */
  for (i = 0; i != 10; ++i) {
    ((short*)C)[0] = A[i];  /* Two byte store! */
    C[1] = A[9-i];          /* One byte store */
  }

In this case, the two stores to C do alias each other, because the access to the &C[0] element is a two byte access. If size information wasn't available in the query, even the first case would have to conservatively assume that the accesses alias.

An Alias Analysis implementation can return one of three responses: MustAlias, MayAlias, and NoAlias. The No and May alias results are obvious: if the two pointers may never equal each other, return NoAlias, if they might, return MayAlias.

The Must Alias response is trickier though. In LLVM, the Must Alias response may only be returned if the two memory objects are guaranteed to always start at exactly the same location. If two memory objects overlap, but do not start at the same location, MayAlias must be returned.

The getModRefInfo methods return information about whether the execution of an instruction can read or modify a memory location. Mod/Ref information is always conservative: if an action may read a location, Ref is returned.

Writing a new alias analysis implementation for LLVM is quite straight-forward. There are already several implementations that you can use for examples, and the following information should help fill in any details. For a minimal example, take a look at the no-aa implementation.

The first step to determining what type of LLVM pass you need to use for your Alias Analysis. As is the case with most other analyses and transformations, the answer should be fairly obvious from what type of problem you are trying to solve:

If you require interprocedural analysis, it should be a Pass.
If you are a global analysis, subclass FunctionPass.
If you are a local pass, subclass BasicBlockPass.
If you don't need to look at the program at all, subclass ImmutablePass.

In addition to the pass that you subclass, you should also inherit from the AliasAnalysis interface, of course, and use the RegisterAnalysisGroup template to register as an implementation of AliasAnalysis.

Your subclass of AliasAnalysis is required to invoke two methods on the AliasAnalysis base class: getAnalysisUsage and InitializeAliasAnalysis. In particular, your implementation of getAnalysisUsage should explicitly call into the AliasAnalysis::getAnalysisUsage method in addition to doing any declaring any pass dependencies your pass has. Thus you should have something like this:

    void getAnalysisUsage(AnalysisUsage &AU) const {
      AliasAnalysis::getAnalysisUsage(AU);
      // declare your dependencies here.
    }

Additionally, your must invoke the InitializeAliasAnalysis method from your analysis run method (run for a Pass, runOnFunction for a FunctionPass, runOnBasicBlock for a BasicBlockPass, or InitializeAliasAnalysis for an ImmutablePass). For example (as part of a Pass):

    bool run(Module &M) {
      InitializeAliasAnalysis(this);
      // Perform analysis here...
      return false;
    }

All of the AliasAnalysis virtual methods default to providing conservatively correct information (returning "May" Alias and "Mod/Ref" for alias and mod/ref queries respectively). Depending on the capabilities of the analysis you are implementing, you just override the interfaces you can improve.

With only two special exceptions (the basicaa and no-aa passes) every alias analysis pass should chain to another alias analysis implementation (for example, you could specify "-basic-aa -ds-aa -andersens-aa -licm" to get the maximum benefit from the three alias analyses). To do this, simply "Require" AliasAnalysis in your getAnalysisUsage method, and if you need to return a conservative MayAlias or Mod/Ref result, simply chain to a lower analysis.

From the LLVM perspective, the only thing you need to do to provide an efficient alias analysis is to make sure that alias analysis queries are serviced quickly. The actual calculation of the alias analysis results (the "run" method) is only performed once, but many (perhaps duplicate) queries may be performed. Because of this, try to move as much computation to the run method as possible (within reason).

There are several different ways to use alias analysis results. In order of preference, these are...

The load-vn pass uses alias analysis to provide value numbering information for load instructions. If your analysis or transformation can be modelled in a form that uses value numbering information, you don't have to do anything special to handle load instructions: just use the load-vn pass, which uses alias analysis.

Many transformations need information about alias sets that are active in some scope, rather than information about pairwise aliasing. The AliasSetTracker class is used to efficiently build these Alias Sets from the pairwise alias analysis information provided by the AliasAnalysis interface.

First you initialize the AliasSetTracker by use the "add" methods to add information about various potentially aliasing instructions in the scope you are interested in. Once all of the alias sets are completed, your pass should simply iterate through the constructed alias sets, using the AliasSetTracker begin()/end() methods.

The AliasSets formed by the AliasSetTracker are guaranteed to be disjoint, calculate mod/ref information and volatility for the set, and keep track of whether or not all of the pointers in the set are Must aliases. The AliasSetTracker also makes sure that sets are properly folded due to call instructions, and can provide a list of pointers in each set.

As an example user of this, the Loop Invariant Code Motion pass uses AliasSetTrackers to build alias information about each loop nest. If an AliasSet in a loop is not modified, then all load instructions from that set may be hoisted out of the loop. If any alias sets are stored and are must alias sets, then the stores may be sunk to outside of the loop, promoting the memory location to a register for the duration of the loop nest. Both of these transformations obviously only apply if the pointer argument is loop-invariant.

The AliasSetTracker class is implemented to be as efficient as possible. It uses the union-find algorithm to efficiently merge AliasSets when a pointer is inserted into the AliasSetTracker that aliases multiple sets. The primary data structure is a hash table mapping pointers to the AliasSet they are in.

The AliasSetTracker class must maintain a list of all of the LLVM Value*'s that are in each AliasSet. Since the hash table already has entries for each LLVM Value* of interest, the AliasesSets thread the linked list through these hash-table nodes to avoid having to allocate memory unnecessarily, and to make merging alias sets extremely efficient (the linked list merge is constant time).

You shouldn't need to understand these details if you are just a client of the AliasSetTracker, but if you look at the code, hopefully this brief description will help make sense of why things are designed the way they are.

As a last resort, your pass could use the AliasAnalysis interface directly to service your pass. If you find the need to do this, please let me know so I can see if something new needs to be added to LLVM.

If you're going to be working with the AliasAnalysis infrastructure, there are several nice tools that may be useful for you and are worth knowing about...

The -no-aa analysis is just like what it sounds: an alias analysis that never returns any useful information. This pass can be useful if you think that alias analysis is doing something wrong and are trying to narrow down a problem. If you don't specify an alias analysis, the default will be to use the basicaa pass which does quite a bit of disambiguation on its own.

The -print-alias-sets pass is exposed as part of the analyze tool to print out the Alias Sets formed by the AliasSetTracker class. This is useful if you're using the AliasSetTracker.

The -count-aa pass is useful to see how many queries a particular pass is making and what kinds of responses are returned by the alias analysis. An example usage is:

  $ opt -basicaa -count-aa -ds-aa -count-aa -licm

Which will print out how many queries (and what responses are returned) by the -licm pass (of the -ds-aa pass) and how many queries are made of the -basicaa pass by the -ds-aa pass. This can be useful when evaluating an alias analysis for precision.