System Library

Warning: This document is a work in progress.

Written by Reid Spencer

Abstract

This document describes the requirements, design, and implementation details of LLVM's System Library. The library is composed of the header files in llvm/include/llvm/System and the source files in llvm/lib/System. The goal of this library is to completely shield LLVM from the variations in operating system interfaces. By centralizing LLVM's use of operating system interfaces, we make it possible for the LLVM tool chain and runtime libraries to be more easily ported to new platforms. The library also keeps the rest of LLVM free of #ifdef clutter and special cases for specific operating systems.

The System Library was donated to LLVM by Reid Spencer, who formulated the original design as part of the eXtensible Programming System (XPS), which is based, in part, on LLVM.

System Library Requirements

The System Library's requirements are aimed at shielding LLVM from the variations in operating system interfaces. The following sections define the requirements needed to fulfill this objective.

Hide System Header Files

To be written.

No Exposed Functions

To be written.

No Exposed Data

To be written.

No Exceptions

To be written.

Standard Error Codes

To be written.

Minimize Overhead

To be written.

System Library Design

In order to fulfill the requirements of the System Library, strict design objectives must be maintained in the library as it evolves. The goal here is to provide interfaces to operating system concepts (files, memory maps, sockets, signals, locking, etc.) efficiently and in such a way that the remainder of LLVM is completely operating system agnostic.

Use Opaque Classes

no public data

only primitive typed private/protected data

data size is "right" for platform, not max of all platforms

each class corresponds to O/S concept
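
The points above can be illustrated with a minimal sketch. The class name MemoryBlock and its members are hypothetical and are not taken from lib/System; the sketch only shows the shape these objectives suggest: no public data, private state limited to primitive types whose size is whatever the platform defines, and a class that stands for a single operating system concept (here, a contiguous block of memory).

```cpp
#include <cstddef>

// Hypothetical example class; not part of lib/System.
// It models one operating system concept: a contiguous block of memory.
class MemoryBlock {
public:
  // Wraps an existing region supplied by the caller; the class never
  // allocates memory itself.
  MemoryBlock(void *Base, std::size_t Size) : Address(Base), Length(Size) {}

  // Accessors are the only way to observe the state: no public data.
  void *base() const { return Address; }
  std::size_t size() const { return Length; }
  bool empty() const { return Length == 0; }

private:
  // Only primitive typed data. void* and std::size_t take the size that
  // is "right" for the platform being compiled for, not the maximum
  // across all platforms.
  void *Address;
  std::size_t Length;
};
```

A caller would construct the object over a buffer it already owns, for example MemoryBlock Block(Buffer, sizeof(Buffer)), and use only the accessors from then on.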

Common Implementations

To be written.

Multiple Implementations

To be written.

Use Low Level Interfaces

To be written.

No Memory Allocation

To be written.

No Virtual Methods

To be written.

System Library Details

To be written.

Bug 351

See bug 351 for further details on the progress of this work.

Rationale For #include Hierarchy

In order to provide different implementations of the lib/System interface for different platforms, it is necessary for the library to "sense" which operating system is being compiled for and conditionally compile only the applicable parts of the library. While several operating system wrapper libraries (e.g. APR, ACE) choose to use #ifdef preprocessor statements in combination with autoconf variables (the HAVE_* family), lib/System chooses an alternate strategy.

To put it succinctly, the lib/System strategy has traded "#ifdef hell" for "#include hell". That is, a given implementation file defines one or more functions for a particular operating system variant. The functions defined in that file have no #ifdef's to disambiguate the platform since the file is only compiled on one kind of platform. While this leads to the same function being implemented differently in different files, it is our contention that this leads to better maintenance and easier portability.

For example, consider a function having different implementations on a variety of platforms. Many wrapper libraries choose to deal with the different implementations by using #ifdef, like this:


      void SomeFunction(void) {
      #if defined(__linux__)
        // .. Linux implementation
      #elif defined(_WIN32)
        // .. Win32 implementation
      #elif defined(__sun)
        // .. SunOS implementation
      #else
      #warning "Don't know how to implement SomeFunction on this platform"
      #endif
      }
  

The problem with this is that it's very messy to read, especially as the number of operating systems and their variants grows. The above example is actually tame compared to what can happen when the implementation depends on specific flavors and versions of the operating system. In that case you end up with multiple levels of nested #if statements. This is what we mean by "#ifdef hell".
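
As a hedged illustration of that nesting, the following sketch adds a single flavor or version check under each platform. The function name describePlatform and the particular version threshold are made up for illustration and do not come from lib/System; the macros tested (__linux__, __GLIBC__, _WIN32, _WIN64, __sun, __SVR4) are ones commonly predefined by compilers or C libraries on those platforms.

```cpp
#include <string>

// Hypothetical function: reports which branch of the conditional
// compilation was selected. Not part of lib/System.
std::string describePlatform() {
#if defined(__linux__)
#  if defined(__GLIBC__)
#    if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 17)
  return "Linux, glibc 2.17 or later";
#    else
  return "Linux, older glibc";
#    endif
#  else
  return "Linux, non-glibc C library";
#  endif
#elif defined(_WIN32)
#  if defined(_WIN64)
  return "Windows, 64-bit";
#  else
  return "Windows, 32-bit";
#  endif
#elif defined(__sun)
#  if defined(__SVR4)
  return "Solaris";
#  else
  return "SunOS 4";
#  endif
#else
  return "unknown platform";
#endif
}
```

Even with only one extra check per platform, the body of the function is already mostly preprocessor directives, and every new platform or version adds another level.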

To avoid the situation above, we've chosen to put all the functions that a given implementation file provides for a specific operating system in one place. This has the following advantages:

So, given that we have decided to use #include instead of #if to provide platform specific implementations, there are actually three ways we can go about doing this. None of them is perfect, but we believe we've chosen the least of the three evils. Given that there is a variable named $OS which names the platform for which we must build, here's a summary of the three approaches we could use to determine the correct directory:

  1. Provide the compiler with a -I$(OS) on the command line. This could be provided in only the lib/System makefile.
  2. Use autoconf to transform #include statements in the implementation files by using substitutions of @OS@. For example, if we had a file, File.cpp.in, that contained "#include <@OS@/File.cpp>" this would get transformed to "#include <actual/File.cpp>" where "actual" is the actual name of the operating system.
  3. Create a link from $OBJ_DIR/platform to $SRC_DIR/$OS. This allows us to use a generic directory name to get the correct platform, as in #include <platform/File.cpp>.

Let's look at the pitfalls of each approach.

In approach #1, we end up with some confusion as to what gets included. Suppose we have lib/System/File.cpp that includes just File.cpp to get the platform specific part of the implementation. In this case, the include directive with the <> syntax will include the right file but the include directive with the "" syntax will recursively include the same file, lib/System/File.cpp. In the case of #include <File.cpp>, the -I options to the compiler are searched first so it works. But in the #include "File.cpp" case, the current directory is searched first. Furthermore, in both cases, neither include directive documents which File.cpp is getting included.

In approach #2, we have the problem of needing to reconfigure repeatedly. Developers generally hate that and we don't want lib/System to be a thorn in everyone's side because it will constantly need updating as operating systems change and as new operating systems are added. The problem occurs when a new implementation file is added to the library. First of all, you have to add a file with the .in suffix, then you have to add that file name to the list of configurable files in the autoconf/configure.ac file, then you have to run AutoRegen.sh to rebuild the configure script, then you have to run the configure script. This is deemed to be a pretty large hassle.

In approach #3, we have the problem that not all platforms support links. Fortunately the autoconf macro used to create the link can compensate for this. If a link can't be made, the configure script will copy the correct directory from $BUILD_SRC_DIR to $BUILD_OBJ_DIR under the new name. The only problem with this is that if a copy is made, the copy doesn't get updated if the programmer adds or modifies files in the $BUILD_SRC_DIR. A reconfigure or manual copying is needed to get things to compile.

The approach we have taken in lib/System is #3. Here's why:

Reference Implementation

The Linux implementation of the system library will always be the reference implementation. This means that (a) the concepts defined by the Linux implementation must be identically replicated in the other implementations and (b) the Linux implementation must always be complete (provide implementations for all concepts).


Reid Spencer
LLVM Compiler Infrastructure