NFC: Fix indentation and add braces to clarify nested of else-statement.

diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst

index 3b074aad6a137fc702d0c3ad7286be51e6afa0b2..8d4972296edef4a2d807a86616949e443e549a7d 100644
--- a/docs/LibFuzzer.rst
+++ b/docs/LibFuzzer.rst
@@ -61,14 +61,13 @@ The most important flags are::
    mutate_depth                         5       Apply this number of consecutive mutations to each input.
    timeout                              1200    Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort.
    help                                 0       Print help.
-  save_minimized_corpus                0       If 1, the minimized corpus is saved into the first input directory
+  save_minimized_corpus                0       If 1, the minimized corpus is saved into the first input directory. Example: ./fuzzer -save_minimized_corpus=1 NEW_EMPTY_DIR OLD_CORPUS
    jobs                                 0       Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log.
    workers                              0       Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used.
-  tokens                               0       Use the file with tokens (one token per line) to fuzz a token based input language.
-  apply_tokens                         0       Read the given input file, substitute bytes  with tokens and write the result to stdout.
    sync_command                         0       Execute an external command "<sync_command> <test_corpus>" to synchronize the test corpus.
    sync_timeout                         600     Minimum timeout between syncs.
    use_traces                            0       Experimental: use instruction traces
+  only_ascii                            0       If 1, generate only ASCII (isprint+isspace) inputs.
  
  
  For the full list of flags run the fuzzer binary with ``-help=1``.
@@ -238,7 +237,7 @@ to find Heartbleed with LibFuzzer::
      SSL_free(server);
    }
    EOF
-  # Build the fuzzer. 
+  # Build the fuzzer.
    clang++ -g handshake-fuzz.cc  -fsanitize=address \
      openssl-1.0.1f/libssl.a openssl-1.0.1f/libcrypto.a Fuzzer*.o
    # Run 20 independent fuzzer jobs.
@@ -257,23 +256,25 @@ Voila::
  Advanced features
  =================
  
-Tokens
-------
-
-By default, the fuzzer is not aware of complexities of the input language
-and when fuzzing e.g. a C++ parser it will mostly stress the lexer.
-It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>``
-from a test corpus that doesn't have it.
-See a detailed discussion of this topic at
-http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
-
-lib/Fuzzer implements a simple technique that allows to fuzz input languages with
-long tokens. All you need is to prepare a text file containing up to 253 tokens, one token per line,
-and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
-Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
-The fuzzer itself will still be mutating a string of bytes
-but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token.
-If there are less than ``b`` tokens, a space will be added instead.
+Dictionaries
+------------
+*EXPERIMENTAL*.
+LibFuzzer supports user-supplied dictionaries with input language keywords
+or other interesting byte sequences (e.g. multi-byte magic values).
+Use ``-dict=DICTIONARY_FILE``. For some input languages, using a dictionary
+may significantly improve the search speed.
+The dictionary syntax is similar to that used by AFL_ for its ``-x`` option::
+
+  # Lines starting with '#' and empty lines are ignored.
+
+  # Adds "blah" (w/o quotes) to the dictionary.
+  kw1="blah"
+  # Use \\ for backslash and \" for quotes.
+  kw2="\"ac\\dc\""
+  # Use \xAB for hex values
+  kw3="\xF7\xF8"
+  # the name of the keyword followed by '=' may be omitted:
+  "foo\x0Abar"
  
  Data-flow-guided fuzzing
  ------------------------
@@ -282,7 +283,7 @@ Data-flow-guided fuzzing
  With an additional compiler flag ``-fsanitize-coverage=trace-cmp`` (see SanitizerCoverageTraceDataFlow_)
  and extra run-time flag ``-use_traces=1`` the fuzzer will try to apply *data-flow-guided fuzzing*.
  That is, the fuzzer will record the inputs to comparison instructions, switch statements,
-and several libc functions (``memcmp``, ``strncmp``, etc).
+and several libc functions (``memcmp``, ``strcmp``, ``strncmp``, etc).
  It will later use those recorded inputs during mutations.
  
  This mode can be combined with DataFlowSanitizer_ to achieve better sensitivity.
@@ -335,18 +336,37 @@ Build (make sure to use fresh clang as the host compiler)::
  
  Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).
  
-TODO: commit the pre-fuzzed corpus to svn (?).
-
  Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
  
  clang-fuzzer
  ------------
  
-The default behavior is very similar to ``clang-format-fuzzer``.
-Clang can also be fuzzed with Tokens_ using ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.
+The behavior is very similar to ``clang-format-fuzzer``.
  
  Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
  
+llvm-as-fuzzer
+--------------
+
+Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=24639
+
+llvm-mc-fuzzer
+--------------
+
+This tool fuzzes the MC layer. Currently it can only fuzz the disassembler,
+but the hope is that assembly and round-trip verification will be added in
+the future.
+
+When run in disassembly mode, the inputs are opcodes to be disassembled. The
+fuzzer will consume as many instructions as possible and will stop when it
+finds an invalid instruction or runs out of data.
+
+Please note that the command line interface differs from that of other fuzzers.
+The fuzzer arguments are specified with ``-fuzzer-args`` while other arguments
+control the operation mode and target in a similar manner to ``llvm-mc``.
+
+Unfortunately, this fuzzer is currently unable to use the ``-jobs`` option.
+
  Buildbot
  --------
  
@@ -362,7 +382,7 @@ The corpuses are stored in git on github and can be used like this::
    git clone https://github.com/kcc/fuzzing-with-sanitizers.git
    bin/clang-format-fuzzer fuzzing-with-sanitizers/llvm/clang-format/C1
    bin/clang-fuzzer        fuzzing-with-sanitizers/llvm/clang/C1/
-  bin/clang-fuzzer        fuzzing-with-sanitizers/llvm/clang/TOK1  -tokens=$LLVM/llvm/lib/Fuzzer/cxx_fuzzer_tokens.txt
+  bin/llvm-as-fuzzer      fuzzing-with-sanitizers/llvm/llvm-as/C1  -only_ascii=1
  
  
  FAQ
@@ -421,6 +441,45 @@ small inputs, each input takes < 1ms to run, and the library code is not expecte
  to crash on invalid inputs.
  Examples: regular expression matchers, text or binary format parsers.
  
+Trophies
+========
+* GLIBC: https://sourceware.org/glibc/wiki/FuzzingLibc
+
+* MUSL LIBC:
+
+  * http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c
+  * http://www.openwall.com/lists/oss-security/2015/03/30/3
+
+* pugixml: https://github.com/zeux/pugixml/issues/39
+
+* PCRE: Search for "LLVM fuzzer" in http://vcs.pcre.org/pcre2/code/trunk/ChangeLog?view=markup
+
+* ICU: http://bugs.icu-project.org/trac/ticket/11838
+
+* Freetype: https://savannah.nongnu.org/search/?words=LibFuzzer&type_of_search=bugs&Search=Search&exact=1#options
+
+* Linux Kernel's BPF verifier: https://github.com/iovisor/bpf-fuzzer
+
+* LLVM:
+
+  * Clang: https://llvm.org/bugs/show_bug.cgi?id=23057
+
+  * Clang-format: https://llvm.org/bugs/show_bug.cgi?id=23052
+
+  * libc++: https://llvm.org/bugs/show_bug.cgi?id=24411
+
+  * llvm-as: https://llvm.org/bugs/show_bug.cgi?id=24639
+
+  * Disassembler:
+
+    * Mips: Discovered a number of untested instructions for the Mips target
+      (see valid-mips*.s in http://reviews.llvm.org/rL247405,
+      http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416,
+      http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420,
+      and http://reviews.llvm.org/rL247422) as well as some instructions that
+      successfully disassembled on ISAs where they were not valid (see
+      invalid-xfail.s files in the same commits).
+
  .. _pcre2: http://www.pcre.org/
  
  .. _AFL: http://lcamtuf.coredump.cx/afl/
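
The dictionary syntax introduced in the Dictionaries hunk above can be illustrated with a small standalone reader. This is only a sketch in Python, not LibFuzzer's own parser: the names ``parse_dictionary``, ``unescape``, and ``SAMPLE`` are hypothetical, and it assumes the rules are exactly those stated in the example comments (``#`` comments and empty lines ignored, optional ``name=`` prefix, and the escapes ``\\``, ``\"`` and ``\xAB``). Rejecting malformed lines with ``ValueError`` is a choice made here for illustration; the documentation does not say how LibFuzzer treats them.

```python
import re
import string

# Optional keyword name followed by '=', then a double-quoted value.
_ENTRY = re.compile(r'^(?:[^"=\s]+\s*=\s*)?"(.*)"$')


def unescape(body: str) -> bytes:
    """Decode the text between the outer quotes into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped quote inside value")
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1 : i + 2]  # empty string if the backslash is last
        if nxt in ("\\", '"'):
            out.append(ord(nxt))
            i += 2
        elif nxt == "x":
            digits = body[i + 2 : i + 4]
            if len(digits) != 2 or any(c not in string.hexdigits for c in digits):
                raise ValueError("\\x must be followed by two hex digits")
            out.append(int(digits, 16))
            i += 4
        else:
            raise ValueError(f"unsupported escape at offset {i}")
    return bytes(out)


def parse_dictionary(text: str) -> list:
    """Return the dictionary entries in file order as a list of bytes."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # comments and empty lines are ignored
        match = _ENTRY.match(stripped)
        if match is None:
            raise ValueError(f"line {lineno}: expected [name=]\"value\"")
        entries.append(unescape(match.group(1)))
    return entries


# The example dictionary from the documentation hunk.
SAMPLE = r'''
# Lines starting with '#' and empty lines are ignored.

kw1="blah"
kw2="\"ac\\dc\""
kw3="\xF7\xF8"
"foo\x0Abar"
'''

if __name__ == "__main__":
    for entry in parse_dictionary(SAMPLE):
        print(entry)
```

Returning raw ``bytes`` rather than ``str`` keeps multi-byte magic values such as ``\xF7\xF8`` intact, which is the use case the documentation mentions alongside language keywords.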