===================================
Customizing LLVMC: Reference Manual
===================================

:Author: Mikhail Glushenkov <foldr@codedgers.com>

LLVMC is a generic compiler driver, designed to be customizable and
extensible. It plays the same role for LLVM as the ``gcc`` program
does for GCC - LLVMC's job is essentially to transform a set of input
files into a set of targets depending on configuration rules and user
options. What makes LLVMC different is that these transformation rules
are completely customizable - in fact, LLVMC knows nothing about the
specifics of transformation (even the command-line options are mostly
not hard-coded) and regards the transformation structure as an
abstract graph. The structure of this graph is completely determined
by plugins, which can be either statically or dynamically linked. This
makes it possible to easily adapt LLVMC for other purposes - for
example, as a build tool for game resources.

Because LLVMC employs TableGen [1]_ as its configuration language, you
need to be familiar with it to customize LLVMC.

LLVMC tries hard to be as compatible with ``gcc`` as possible,
although there are some small differences. Most of the time, however,
you shouldn't be able to notice them::

    $ # This works as expected:
    $ llvmc -O3 -Wall hello.cpp

One nice feature of LLVMC is that one doesn't have to distinguish
between different compilers for different languages (think ``g++`` and
``gcc``) - the right toolchain is chosen automatically based on input
language names (which are, in turn, determined from file
extensions). If you want to force files ending with ".c" to compile as
C++, use the ``-x`` option, just as you would with ``gcc``::

    $ # hello.c is really a C++ file
    $ llvmc -x c++ hello.c

On the other hand, when using LLVMC as a linker to combine several C++
object files you should provide the ``--linker`` option since it's
impossible for LLVMC to choose the right linker in that case::

    [A lot of link-time errors skipped]
    $ llvmc --linker=c++ hello.o

By default, LLVMC uses ``llvm-gcc`` to compile the source code. It is
also possible to choose the work-in-progress ``clang`` compiler with
the ``-clang`` option.

LLVMC has some built-in options that can't be overridden in the
configuration libraries:

* ``-o FILE`` - Output file name.

* ``-x LANGUAGE`` - Specify the language of the following input files
  until the next ``-x`` option.

* ``-load PLUGIN_NAME`` - Load the specified plugin DLL. Example:
  ``-load $LLVM_DIR/Release/lib/LLVMCSimple.so``.

* ``-v`` - Enable verbose mode, i.e. print out all executed commands.

* ``--view-graph`` - Show a graphical representation of the compilation
  graph. Requires that you have ``dot`` and ``gv`` programs
  installed. Hidden option, useful for debugging.

* ``--write-graph`` - Write a ``compilation-graph.dot`` file in the
  current directory with the compilation graph description in the
  Graphviz format. Hidden option, useful for debugging.

* ``--save-temps`` - Write temporary files to the current directory
  and do not delete them on exit. Hidden option, useful for debugging.

* ``--help``, ``--help-hidden``, ``--version`` - These options have
  their standard meaning.

Compiling LLVMC plugins
=======================

It's easiest to start working on your own LLVMC plugin by copying the
skeleton project which lives under ``$LLVMC_DIR/plugins/Simple``::

    $ cd $LLVMC_DIR/plugins
    $ cp -r Simple MyPlugin
    Makefile PluginMain.cpp Simple.td

As you can see, our basic plugin consists of only two files (not
counting the build script). ``Simple.td`` contains the TableGen
description of the compilation graph; its format is documented in the
following sections. ``PluginMain.cpp`` is just a helper file used to
compile the auto-generated C++ code produced from the TableGen source. It
can also contain hook definitions (see `below`__).

The first thing that you should do is to change the ``LLVMC_PLUGIN``
variable in the ``Makefile`` to avoid conflicts (since this variable
is used to name the resulting library)::

    LLVMC_PLUGIN=MyPlugin

It is also a good idea to rename ``Simple.td`` to something less
generic::

    $ mv Simple.td MyPlugin.td

Note that the plugin source directory must be placed under
``$LLVMC_DIR/plugins`` to make use of the existing build
infrastructure. To build a version of the LLVMC executable called
``mydriver`` with your plugin compiled in, use the following command::

    $ make BUILTIN_PLUGINS=MyPlugin DRIVER_NAME=mydriver

To build your plugin as a dynamic library, just ``cd`` to its source
directory and run ``make``. The resulting file will be called
``LLVMC$(LLVMC_PLUGIN).$(DLL_EXTENSION)`` (in our case,
``LLVMCMyPlugin.so``). This library can then be loaded with the
``-load`` option. Example::

    $ cd $LLVMC_DIR/plugins/Simple
    $ llvmc -load $LLVM_DIR/Release/lib/LLVMCSimple.so

Sometimes, you will want a 'bare-bones' version of LLVMC that has no
built-in plugins. It can be compiled with the following command::

    $ make BUILTIN_PLUGINS=""

Customizing LLVMC: the compilation graph
========================================

Each TableGen configuration file should include the common
definitions::

    include "llvm/CompilerDriver/Common.td"

Internally, LLVMC stores information about possible source
transformations in the form of a graph. Nodes in this graph represent
tools, and edges between two nodes represent a transformation path. A
special "root" node is used to mark entry points for the
transformations. LLVMC also assigns a weight to each edge (more on
this later) to choose between several alternative edges.

The definition of the compilation graph (see file
``plugins/Base/Base.td`` for an example) is just a list of edges::

    def CompilationGraph : CompilationGraph<[
        Edge<"root", "llvm_gcc_c">,
        Edge<"root", "llvm_gcc_assembler">,
        Edge<"llvm_gcc_c", "llc">,
        Edge<"llvm_gcc_cpp", "llc">,
        OptionalEdge<"llvm_gcc_c", "opt", (case (switch_on "opt"),
        OptionalEdge<"llvm_gcc_cpp", "opt", (case (switch_on "opt"),
        OptionalEdge<"llvm_gcc_assembler", "llvm_gcc_cpp_linker",
            (case (input_languages_contain "c++"), (inc_weight),
                  (or (parameter_equals "linker", "g++"),
                      (parameter_equals "linker", "c++")), (inc_weight))>,

As you can see, the edges can be either default or optional, where
optional edges are differentiated by an additional ``case`` expression
used to calculate the weight of this edge. Notice also that we refer
to tools via their names (as strings). This makes it possible to add
edges to an existing compilation graph in plugins without having to
know about all tool definitions used in the graph.

The default edges are assigned a weight of 1, and optional edges get a
weight of 0 + 2*N where N is the number of tests that evaluated to
true in the ``case`` expression. It is also possible to provide an
integer parameter to ``inc_weight`` and ``dec_weight`` - in this case,
the weight is increased (or decreased) by the provided value instead
of the default 2. It is also possible to change the default weight of
an optional edge by using the ``default`` clause of the ``case``
construct.

When passing an input file through the graph, LLVMC picks the edge
with the maximum weight. To avoid ambiguity, there should be only one
default edge between two nodes (with the exception of the root node,
which gets a special treatment - there you are allowed to specify one
default edge *per language*).

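
As a rough illustration of the weight rules above, consider the
following sketch. The tool names ``my_frontend``, ``my_optimizer`` and
``my_backend`` and the switches ``opt`` and ``fast`` are hypothetical
and assumed to be defined elsewhere; only the ``Edge``,
``OptionalEdge``, ``case``, ``switch_on`` and ``inc_weight`` constructs
are taken from this manual::

    def CompilationGraph : CompilationGraph<[
        // Default edge: always has weight 1.
        Edge<"my_frontend", "my_backend">,

        // Optional edge: starts at weight 0.
        //   neither switch given -> 0      (default edge wins, 1 > 0)
        //   only "opt" given     -> 0 + 2  (optional edge wins, 2 > 1)
        //   "opt" and "fast"     -> 2 + 3  (optional edge wins, 5 > 1)
        OptionalEdge<"my_frontend", "my_optimizer",
            (case (switch_on "opt"), (inc_weight),
                  (switch_on "fast"), (inc_weight 3))>
        ]>;
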
When multiple plugins are loaded, their compilation graphs are merged
together. Since multiple edges that have the same end nodes are not
allowed (i.e. the graph is not a multigraph), an edge defined in
several plugins will be replaced by the definition from the plugin
that was loaded last. Plugin load order can be controlled by using the
plugin priority feature described below.

To get a visual representation of the compilation graph (useful for
debugging), run ``llvmc --view-graph``. You will need ``dot`` and
``gv`` installed for this to work properly.

Command-line options that the plugin supports are defined by using an
``OptionList``::

    def Options : OptionList<[
    (switch_option "E", (help "Help string")),
    (alias_option "quiet", "q")

As you can see, the option list is just a list of DAGs, where each DAG
is an option description consisting of the option name and some
properties. A plugin can define more than one option list (they are
all merged together in the end), which can be handy if one wants to
separate option groups syntactically.

* Possible option types:

   - ``switch_option`` - a simple boolean switch, for example ``-time``.

   - ``parameter_option`` - option that takes an argument, for example

   - ``parameter_list_option`` - same as the above, but more than one
     occurrence of the option is allowed.

   - ``prefix_option`` - same as the ``parameter_option``, but the option name
     and parameter value are not separated.

   - ``prefix_list_option`` - same as the above, but more than one
     occurrence of the option is allowed; example: ``-lm -lpthread``.

   - ``alias_option`` - a special option type for creating
     aliases. Unlike other option types, aliases are not allowed to
     have any properties besides the aliased option name. Usage
     example: ``(alias_option "preprocess", "E")``

* Possible option properties:

   - ``help`` - help string associated with this option. Used for
     ``--help`` output.

   - ``required`` - this option is obligatory.

   - ``hidden`` - this option should not appear in the ``--help``
     output (but should appear in the ``--help-hidden`` output).

   - ``really_hidden`` - the option should not appear in any help
     output.

   - ``extern`` - this option is defined in some other plugin, see below.

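
The following sketch combines several of the option types and
properties listed above in one option list. The option names and help
strings are made up for illustration, and the spelling of the
argument-less ``(hidden)`` property is an assumption made by analogy
with ``(extern)``::

    def MyOptions : OptionList<[
        // Simple boolean switch.
        (switch_option "opt", (help "Run the optimizer")),
        // Takes exactly one argument.
        (parameter_option "linker", (help "Choose the linker to use")),
        // Name and value are not separated; may be given several times.
        (prefix_list_option "l", (help "Link with the given library")),
        // Listed only in the --help-hidden output.
        (switch_option "dump-stats", (hidden),
                       (help "Print internal statistics")),
        // "optimize" is another name for "opt".
        (alias_option "optimize", "opt")
        ]>;
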
Sometimes, when linking several plugins together, one plugin needs to
access options defined in some other plugin. Because of the way
options are implemented, such options should be marked as
``extern``. This is what the ``extern`` option property is
for. Example::

    (switch_option "E", (extern))

See also the section on plugin `priorities`__.

Conditional evaluation
======================

The 'case' construct is the main means by which programmability is
achieved in LLVMC. It can be used to calculate edge weights, program
actions and modify the shell commands to be executed. The 'case'
expression is designed after the similarly-named construct in
functional languages and takes the form ``(case (test_1), statement_1,
(test_2), statement_2, ... (test_N), statement_N)``. The statements
are evaluated only if the corresponding tests evaluate to true.

Examples::

    // Edge weight calculation

    // Increases edge weight by 5 if "-A" is provided on the
    // command-line, and by 5 more if "-B" is also provided.
    (case
        (switch_on "A"), (inc_weight 5),
        (switch_on "B"), (inc_weight 5))

    // Tool command line specification

    // Evaluates to "cmdline1" if the option "-A" is provided on the
    // command line; to "cmdline2" if "-B" is provided;
    // otherwise to "cmdline3".
    (case
        (switch_on "A"), "cmdline1",
        (switch_on "B"), "cmdline2",
        (default), "cmdline3")

Note the slight difference in 'case' expression handling in contexts
of edge weights and command line specification - in the second example
the value of the ``"B"`` switch is never checked when switch ``"A"`` is
enabled, and the whole expression always evaluates to ``"cmdline1"`` in
that case.

Case expressions can also be nested, i.e. the following is legal::

    (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...)

You should, however, try to avoid doing that because it hurts
readability. It is usually better to split tool descriptions and/or
use TableGen inheritance instead.

* Possible tests are:

  - ``switch_on`` - Returns true if a given command-line switch is
    provided by the user. Example: ``(switch_on "opt")``.

  - ``parameter_equals`` - Returns true if a command-line parameter equals
    a given value.
    Example: ``(parameter_equals "W", "all")``.

  - ``element_in_list`` - Returns true if a command-line parameter
    list contains a given value.
    Example: ``(element_in_list "l", "pthread")``.

  - ``input_languages_contain`` - Returns true if a given language
    belongs to the current input language set.
    Example: ``(input_languages_contain "c++")``.

  - ``in_language`` - Evaluates to true if the input file language
    equals the argument. At the moment works only with ``cmd_line``
    and ``actions`` (on non-join nodes).
    Example: ``(in_language "c++")``.

  - ``not_empty`` - Returns true if a given option (which should be
    either a parameter or a parameter list) is set by the
    user.
    Example: ``(not_empty "o")``.

  - ``default`` - Always evaluates to true. Should always be the last
    test in the ``case`` expression.

  - ``and`` - A standard logical combinator that returns true iff all
    of its arguments return true. Used like this: ``(and (test1),
    (test2), ... (testN))``. Nesting of ``and`` and ``or`` is allowed,

  - ``or`` - Another logical combinator that returns true if at least
    one of its arguments returns true. Example: ``(or (test1),
    (test2), ... (testN))``.

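
The tests above can be combined in a single ``case`` expression. The
sketch below uses only tests from this list in a command line
specification; the option names (``E``, ``o``, ``linker``) are
borrowed from other examples in this manual and the command line
strings are placeholders::

    (case
        // Both conditions must hold.
        (and (switch_on "E"), (not_empty "o")), "cmdline1",
        // Either condition is enough.
        (or (parameter_equals "linker", "g++"),
            (parameter_equals "linker", "c++")), "cmdline2",
        // Fallback; must be the last test.
        (default), "cmdline3")
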
Writing a tool description
==========================

As was said earlier, nodes in the compilation graph represent tools,
which are described separately. A tool definition looks like this
(taken from the ``include/llvm/CompilerDriver/Tools.td`` file)::

    def llvm_gcc_cpp : Tool<[
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),

This defines a new tool called ``llvm_gcc_cpp``, which is an alias for
``llvm-g++``. As you can see, a tool definition is just a list of
properties; most of them should be self-explanatory. The ``sink``
property means that this tool should be passed all command-line
options that aren't mentioned in the option list.

The complete list of all currently implemented tool properties follows.

* Possible tool properties:

  - ``in_language`` - input language name. Can be either a string or a
    list, in case the tool supports multiple input languages.

  - ``out_language`` - output language name. Tools are not allowed to
    have multiple output languages.

  - ``output_suffix`` - output file suffix. Can also be changed
    dynamically, see documentation on actions.

  - ``cmd_line`` - the actual command used to run the tool. You can
    use ``$INFILE`` and ``$OUTFILE`` variables, output redirection
    with ``>``, hook invocations (``$CALL``), environment variables
    (via ``$ENV``) and the ``case`` construct.

  - ``join`` - this tool is a "join node" in the graph, i.e. it gets a
    list of input files and joins them together. Used for linkers.

  - ``sink`` - all command-line options that are not handled by other
    tools are passed to this tool.

  - ``actions`` - A single big ``case`` expression that specifies how
    this tool reacts to command-line options (described in more detail
    below).

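
For illustration, here is a sketch of a small join tool built from the
properties above. The tool name ``my_archiver``, the ``my-ar`` program
and the ``static-library`` language name are hypothetical; the
``object-code`` language name is the one used by the linker example in
this manual::

    def my_archiver : Tool<[
        (in_language "object-code"),
        (out_language "static-library"),
        (output_suffix "a"),
        // A join node receives the whole list of input files.
        (cmd_line "my-ar $INFILE -o $OUTFILE"),
        (join)
        ]>;
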
A tool often needs to react to command-line options, and this is
precisely what the ``actions`` property is for. The next example
illustrates this feature::

    def llvm_gcc_linker : Tool<[
        (in_language "object-code"),
        (out_language "executable"),
        (output_suffix "out"),
        (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
        (actions (case (not_empty "L"), (forward "L"),
                       (not_empty "l"), (forward "l"),
                       [(append_cmd "-dummy1"), (append_cmd "-dummy2")])

The ``actions`` tool property is implemented on top of the omnipresent
``case`` expression. It associates one or more different *actions*
with given conditions - in the example, the actions are ``forward``,
which forwards a given option unchanged, and ``append_cmd``, which
appends a given string to the tool execution command. Multiple actions
can be associated with a single condition by using a list of actions
(used in the example to append some dummy options). The same ``case``
construct can also be used in the ``cmd_line`` property to modify the
tool command line.

The "join" property used in the example means that this tool behaves
like a linker.

The list of all possible actions follows.

* Possible actions:

   - ``append_cmd`` - append a string to the tool invocation
     command.
     Example: ``(case (switch_on "pthread"), (append_cmd "-lpthread"))``

   - ``forward`` - forward an option unchanged.
     Example: ``(forward "Wall")``.

   - ``forward_as`` - Change the name of an option, but forward the
     argument unchanged.
     Example: ``(forward_as "O0" "--disable-optimization")``.

   - ``output_suffix`` - modify the output suffix of this
     tool.
     Example: ``(output_suffix "i")``.

   - ``stop_compilation`` - stop compilation after this tool processes
     its input. Used without arguments.

   - ``unpack_values`` - used for splitting and forwarding
     comma-separated lists of options, e.g. ``-Wa,-foo=bar,-baz`` is
     converted to ``-foo=bar -baz`` and appended to the tool invocation
     command.
     Example: ``(unpack_values "Wa,")``.

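
As a sketch of how several of these actions might be used together,
consider the tool below. The tool name ``my_compiler`` and the
``my-cc`` program are hypothetical, and the options ``E``, ``I`` and
``Wa,`` are assumed to be declared in an option list elsewhere::

    def my_compiler : Tool<[
        (in_language "c"),
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "my-cc -c $INFILE -o $OUTFILE"),
        (actions (case
            // Preprocess only: pass the switch on, write a ".i" file
            // and do not run any further tools.
            (switch_on "E"), [(forward "E"), (stop_compilation),
                              (output_suffix "i")],
            // Pass the include path option through unchanged.
            (not_empty "I"), (forward "I"),
            // Split the comma-separated list and append its elements.
            (not_empty "Wa,"), (unpack_values "Wa,")))
        ]>;
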
If you are adding support for a new language to LLVMC, you'll need to
modify the language map, which defines mappings from file extensions
to language names. It is used to choose the proper toolchain(s) for a
given input file set. A language map definition looks like this::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,

For example, without those definitions the following command wouldn't work::

    llvmc: Unknown suffix: cpp

The language map entries should be added only for tools that are
linked with the root node. Since tools are not allowed to have
multiple output languages, for nodes "inside" the graph the input and
output languages should match. This is enforced at compile-time.

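
A minimal sketch of what adding a new language could look like. The
language name ``my-lang``, its suffixes and the tool
``my_lang_compiler`` are hypothetical; the tool is assumed to be
defined elsewhere with ``my-lang`` as its input language and an output
language that ``llc`` accepts::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"my-lang", ["myl", "mylang"]>
        ]>;

    def CompilationGraph : CompilationGraph<[
        // my_lang_compiler is linked with the root node, so its
        // input language needs an entry in the language map.
        Edge<"root", "my_lang_compiler">,
        Edge<"my_lang_compiler", "llc">
        ]>;
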
Hooks and environment variables
-------------------------------

Normally, LLVMC executes programs from the system ``PATH``. Sometimes,
this is not sufficient: for example, we may want to specify tool names
in the configuration file. This can be achieved via the mechanism of
hooks - to write your own hooks, just add their definitions to the
``PluginMain.cpp`` or drop a ``.cpp`` file into the
``$LLVMC_DIR/driver`` directory. Hooks should live in the ``hooks``
namespace and have the signature ``std::string hooks::MyHookName
(void)``. They can be used from the ``cmd_line`` tool property::

    (cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")

It is also possible to use environment variables in the same manner::

    (cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")

To change the command line string based on user-provided options use
the ``case`` expression (documented `above`__)::

    "llvm-g++ -E -x c $INFILE -o $OUTFILE",
    "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))

How plugins are loaded
----------------------

It is possible for LLVMC plugins to depend on each other. For example,
one can create edges between nodes defined in some other plugin. To
make this work, however, that plugin should be loaded first. To
achieve this, the concept of plugin priority was introduced. By
default, every plugin has priority zero; to specify the priority
explicitly, put the following line in your plugin's TableGen file::

    def Priority : PluginPriority<$PRIORITY_VALUE>;
    # Where PRIORITY_VALUE is some integer > 0

Plugins are loaded in order of their (increasing) priority, starting
with 0. Therefore, the plugin with the highest priority value will be
loaded last.

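
As a sketch, a plugin that wants to replace an edge from the base
compilation graph could raise its own priority so that it is loaded
after the plugin that defines the original edge. The priority value
and the weight used here are arbitrary; the edge between
``llvm_gcc_c`` and ``opt`` is the one shown in the compilation graph
example earlier in this manual::

    include "llvm/CompilerDriver/Common.td"

    // Load this plugin after all plugins that keep the default
    // priority of zero.
    def Priority : PluginPriority<1>;

    def CompilationGraph : CompilationGraph<[
        // Same end nodes as an existing edge, so this definition
        // replaces the earlier one when the graphs are merged.
        OptionalEdge<"llvm_gcc_c", "opt",
            (case (switch_on "opt"), (inc_weight 3))>
        ]>;
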
When writing LLVMC plugins, it can be useful to get a visual view of
the resulting compilation graph. This can be achieved via the command
line option ``--view-graph``. This option assumes that Graphviz [2]_ and
Ghostview [3]_ are installed. There is also a ``--write-graph`` option that
creates a Graphviz source file (``compilation-graph.dot``) in the
current directory.

.. [1] TableGen Fundamentals
       http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html

.. [2] Graphviz
       http://www.graphviz.org/

.. [3] Ghostview
       http://pages.cs.wisc.edu/~ghost/