LLVM vs. the World - Comparing Compilers to Compilers
  1. Introduction
  2. General Applicability
  3. Type System
  4. Control-flow and Data-flow Information
  5. Registers
  6. Programmer Interface
  7. Machine Code Emission

Written by Brian R. Gaeke

Introduction

Whether or not you are familiar with LLVM, and whether or not you are considering using it for your projects, you may find it useful to understand how we compare LLVM to other well-known compilers. The following points should help you understand, from our point of view, some of the important ways in which LLVM differs from other selected compilers and code generation systems.

At the moment, we compare LLVM only to GCC and GNU lightning, but we will try to revise and expand this comparison as our knowledge and experience permit. Contributions are welcome.

General Applicability

GNU lightning: Currently usable only for dynamic run-time emission of binary machine code to memory. Supports only one backend at a time.

LLVM: Supports compilation of C and C++ (with more languages coming soon), strong SSA-based optimization at compile-time, link-time, run-time, and off-line, and multiple platform backends with Just-in-Time and ahead-of-time compilation frameworks. (See our document on Lifelong Code Optimization for more.)

GCC: Many relatively mature platform backends support assembly-language code generation from many source languages. No run-time compilation support.

Type System

GNU lightning: C integer types and "void *" are supported. No type checking is performed. Explicit type casts are typically unnecessary unless the underlying machine-specific types are distinct (e.g., sign- or zero-extension is apparently necessary, but casting "int" to "void *" would not be). Floating-point support may not work on all platforms (it does not appear to be documented in the latest release).

LLVM: Compositional type system based on C types, supporting structures, opaque types, and C integer and floating-point types. Explicit cast instructions are required to transform a value from one type to another.
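As an illustration of what requiring explicit casts means in practice, the following is a toy model in Python, not LLVM's C++ API: every value carries a static type, a binary operation refuses operands whose types differ, and combining an 8-bit value with a 32-bit one requires an explicit cast first. The classes and functions (`IntType`, `Value`, `binop`, `cast`) are hypothetical; only the cast names `sext`, `zext`, and `trunc` are borrowed from LLVM's instruction set, and only their width rules are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Toy integer type identified only by its bit width (i8, i32, ...)."""
    bits: int

    def __str__(self) -> str:
        return f"i{self.bits}"


@dataclass(frozen=True)
class Value:
    """Toy SSA value: a name plus a static type."""
    name: str
    ty: IntType


def binop(op: str, lhs: Value, rhs: Value, name: str) -> Value:
    """Build a binary operation; both operands must already have the same type."""
    if lhs.ty != rhs.ty:
        # No implicit conversion: the caller has to insert a cast first.
        raise TypeError(f"{op}: operand types {lhs.ty} and {rhs.ty} differ")
    return Value(name, lhs.ty)


def cast(kind: str, v: Value, to: IntType, name: str) -> Value:
    """Build an explicit integer cast: 'sext'/'zext' widen, 'trunc' narrows."""
    if kind in ("sext", "zext"):
        ok = to.bits > v.ty.bits
    elif kind == "trunc":
        ok = to.bits < v.ty.bits
    else:
        raise ValueError(f"unknown cast kind {kind!r}")
    if not ok:
        raise TypeError(f"{kind}: cannot convert {v.ty} to {to}")
    return Value(name, to)


if __name__ == "__main__":
    i8, i32 = IntType(8), IntType(32)
    c = Value("%c", i8)
    n = Value("%n", i32)
    try:
        binop("add", c, n, "%sum")
    except TypeError as err:
        print(err)  # add: operand types i8 and i32 differ
    c_ext = cast("sext", c, i32, "%c.ext")  # explicit widening
    total = binop("add", c_ext, n, "%sum")
    print(total.name, total.ty)  # %sum i32
```

The design choice being modeled is that the type mismatch is caught when the operation is built, rather than silently resolved by an implicit conversion.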

GCC: Union of high-level types including those used in Pascal, C, C++, Ada, Java, and FORTRAN.

Control-flow and Data-flow Information

GNU lightning: No data-flow information encoded in the generated program. No support for calculating CFG or def-use chains over generated programs.

LLVM: Scalar values are in Static Single Assignment (SSA) form; def-use chains and the CFG are always implicitly available and automatically kept up to date.
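To make "automatically kept up to date" concrete, here is a toy Python sketch (hypothetical classes, not LLVM's C++ `Value`/`User` hierarchy): each value keeps a list of the instructions that use it, and operands are changed only through a method that updates both sides, so the def-use chains cannot go stale. The method name `replace_all_uses_with` echoes LLVM's `Value::replaceAllUsesWith`, but the code here is only a model of the idea and does not cover the CFG.

```python
class Value:
    """Toy SSA value that records every instruction using it (its def-use chain)."""

    def __init__(self, name):
        self.name = name
        self.users = []  # one entry per operand slot that refers to this value

    def replace_all_uses_with(self, new):
        """Redirect every use of this value to `new`, updating both use lists."""
        for user in list(self.users):  # copy: replace_operand mutates self.users
            user.replace_operand(self, new)


class Instruction(Value):
    """Toy instruction: it is itself a value, and it uses other values."""

    def __init__(self, name, opcode, operands):
        super().__init__(name)
        self.opcode = opcode
        self.operands = list(operands)
        for op in self.operands:
            op.users.append(self)  # register the use as the operand is added

    def replace_operand(self, old, new):
        """Swap every occurrence of `old` for `new`, keeping use lists in sync."""
        for i, op in enumerate(self.operands):
            if op is old:
                self.operands[i] = new
                old.users.remove(self)
                new.users.append(self)


if __name__ == "__main__":
    x, y = Value("%x"), Value("%y")
    a = Instruction("%a", "add", [x, x])
    m = Instruction("%m", "mul", [a, y])
    x.replace_all_uses_with(y)
    print([op.name for op in a.operands])  # ['%y', '%y']
    print(len(x.users), len(y.users))      # 0 3
```

Because every edit goes through `replace_operand`, a client never has to recompute def-use information after a transformation, which is the contrast being drawn with information computed "on the side".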

GCC: Trees and RTL do not directly encode data-flow information, but def-use chains and CFGs can be computed on the side; they are not automatically kept up to date.

Registers

GNU lightning: Very small fixed register set. It takes the least common denominator of the supported platforms; in effect, it inherits its tiny register set from IA-32, needlessly handicapping targets with large register sets, such as PowerPC.

LLVM: An infinite register set, reduced to a particular platform's finite register set by the register allocator.
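The step from an unbounded register set to a finite one can be sketched with classic linear-scan register allocation, chosen here only because it is short; this is not a description of the allocator LLVM uses. The sketch assumes each virtual register already has a single live interval `(start, end)` over instruction positions, and when no physical register is free, the interval that ends last is spilled to memory. The function name `linear_scan` and the register names are hypothetical.

```python
def linear_scan(intervals, registers):
    """Assign physical registers to virtual registers, spilling when they run out.

    intervals: dict mapping a virtual register name to its live range (start, end)
    registers: list of available physical register names
    Returns (assignment, spilled): assignment maps each non-spilled virtual
    register to a physical register; spilled holds the virtual registers that
    must live in memory instead.
    """
    free = list(registers)
    active = []  # (end, vreg) for intervals currently occupying a register
    assignment = {}
    spilled = set()
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts; free their registers.
        still_live = []
        for a_end, a_vreg in active:
            if a_end < start:
                free.append(assignment[a_vreg])
            else:
                still_live.append((a_end, a_vreg))
        active = still_live

        if free:
            assignment[vreg] = free.pop(0)
            active.append((end, vreg))
            continue

        # Every register is busy: spill whichever interval ends last.
        last = max(active, default=None)
        if last is not None and last[0] > end:
            # Steal the register from the longer-lived interval and spill that one.
            last_end, last_vreg = last
            assignment[vreg] = assignment.pop(last_vreg)
            spilled.add(last_vreg)
            active.remove(last)
            active.append((end, vreg))
        else:
            spilled.add(vreg)
    return assignment, spilled


if __name__ == "__main__":
    live = {"%v0": (0, 8), "%v1": (1, 3), "%v2": (2, 6), "%v3": (4, 9), "%v4": (7, 10)}
    regs, spills = linear_scan(live, ["r0", "r1"])
    print(regs)    # {'%v1': 'r1', '%v2': 'r0', '%v3': 'r1', '%v4': 'r0'}
    print(spills)  # {'%v0'}
```

Five virtual registers fit into two physical ones here because their live ranges only partly overlap; the long-lived `%v0` is the one sent to memory.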

GCC: Trees and RTL provide an arbitrarily large set of values, reduced to a particular platform's finite register set by the register allocator.

Programmer Interface

GNU lightning: Library interface based on C preprocessor macros that emit binary code for a particular instruction to memory. No support for manipulating code before emission.

LLVM: Library interface based on classes representing platform-independent intermediate code (Instruction) and platform-dependent code (MachineInstr) which can be manipulated arbitrarily and then emitted to memory.

GCC: An internal header file interface (tree.h) to abstract syntax trees, representing roughly the union of all supported source-language constructs; also, an internal header file interface (rtl.h, rtl.def) to a low-level IR called RTL, which represents roughly the union of all supported target machine instructions.

Machine Code Emission

GNU lightning: Only supports binary machine code emission to memory.

LLVM: Supports writing out assembly language to a file, and binary machine code to memory, from the same back-end.
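The idea of one back-end producing both kinds of output can be pictured with a toy Python sketch (hypothetical; it is not LLVM's code): a single list of target-level instructions, covering just two 32-bit x86 instructions, is either printed as assembly text or encoded as bytes. The opcode names `mov_imm` and `ret` and the tuple representation are assumptions made for brevity; the byte encodings (`B8+rd` followed by a 32-bit immediate for `mov r32, imm32`, and `C3` for `ret`) are the standard x86 ones.

```python
import struct

# Toy machine-level instructions: tuples of (opcode, *operands).
# Only a tiny subset of 32-bit x86 is modeled.
REG_CODES = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}


def emit_asm(instrs):
    """Render the instructions as AT&T-syntax assembly text (for a .s file)."""
    lines = []
    for op, *args in instrs:
        if op == "mov_imm":
            reg, imm = args
            lines.append(f"\tmovl ${imm}, %{reg}")
        elif op == "ret":
            lines.append("\tret")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return "\n".join(lines) + "\n"


def emit_binary(instrs):
    """Encode the same instructions as raw machine-code bytes (for a memory buffer)."""
    out = bytearray()
    for op, *args in instrs:
        if op == "mov_imm":
            reg, imm = args
            out.append(0xB8 + REG_CODES[reg])  # MOV r32, imm32: opcode B8+rd
            out += struct.pack("<i", imm)      # 32-bit little-endian immediate
        elif op == "ret":
            out.append(0xC3)                   # near RET
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return bytes(out)


if __name__ == "__main__":
    program = [("mov_imm", "eax", 42), ("ret",)]
    print(emit_asm(program), end="")
    print(emit_binary(program).hex(" "))  # b8 2a 00 00 00 c3
```

Running the bytes would additionally require copying them into executable memory (for example with `mmap` and suitable page protections), which this sketch deliberately leaves out.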

GCC: Supports writing out assembly language to a file. No support for emitting machine code to memory.