[libFuzzer] mention more trophies

[oota-llvm.git] / docs / LibFuzzer.rst
diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst

index 412087ac726957327d162f8442ebb6db05d80707..b51e0418c65e698b277626c4b0f2dee3af836bdd 100644 (file)
--- a/docs/LibFuzzer.rst
+++ b/docs/LibFuzzer.rst
@@ -59,15 +59,16 @@ The most important flags are::
    max_len                              64      Maximum length of the test input.
    cross_over                           1       If 1, cross over inputs.
    mutate_depth                         5       Apply this number of consecutive mutations to each input.
-  timeout                              -1      Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort.
+  timeout                              1200    Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort.
    help                                 0       Print help.
-  save_minimized_corpus                0       If 1, the minimized corpus is saved into the first input directory
+  save_minimized_corpus                0       If 1, the minimized corpus is saved into the first input directory. Example: ./fuzzer -save_minimized_corpus=1 NEW_EMPTY_DIR OLD_CORPUS
    jobs                                 0       Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log.
    workers                              0       Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used.
-  tokens                               0       Use the file with tokens (one token per line) to fuzz a token based input language.
-  apply_tokens                         0       Read the given input file, substitute bytes  with tokens and write the result to stdout.
    sync_command                         0       Execute an external command "<sync_command> <test_corpus>" to synchronize the test corpus.
    sync_timeout                         600     Minimum timeout between syncs.
+  use_traces                            0       Experimental: use instruction traces
+  only_ascii                            0       If 1, generate only ASCII (isprint+isspace) inputs.
+
  
  For the full list of flags run the fuzzer binary with ``-help=1``.
  
@@ -112,7 +113,7 @@ Here we show how to use lib/Fuzzer on something real, yet simple: pcre2_::
    (cd pcre; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install)
    # Build lib/Fuzzer files.
    clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
-  # Build the the actual function that does something interesting with PCRE2.
+  # Build the actual function that does something interesting with PCRE2.
    cat << EOF > pcre_fuzzer.cc
    #include <string.h>
    #include "pcre2posix.h"
@@ -255,23 +256,37 @@ Voila::
  Advanced features
  =================
  
-Tokens
-------
-
-By default, the fuzzer is not aware of complexities of the input language
-and when fuzzing e.g. a C++ parser it will mostly stress the lexer.
-It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>``
-from a test corpus that doesn't have it.
-See a detailed discussion of this topic at
-http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
-
-lib/Fuzzer implements a simple technique that allows to fuzz input languages with
-long tokens. All you need is to prepare a text file containing up to 253 tokens, one token per line,
-and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
-Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
-The fuzzer itself will still be mutating a string of bytes
-but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token.
-If there are less than ``b`` tokens, a space will be added instead.
+Dictionaries
+------------
+*EXPERIMENTAL*.
+LibFuzzer supports user-supplied dictionaries with input language keywords
+or other interesting byte sequences (e.g. multi-byte magic values).
+Use ``-dict=DICTIONARY_FILE``. For some input languages using a dictionary
+may significantly improve the search speed.
+The dictionary syntax is similar to that used by AFL_ for its ``-x`` option::
+
+  # Lines starting with '#' and empty lines are ignored.
+
+  # Adds "blah" (w/o quotes) to the dictionary.
+  kw1="blah"
+  # Use \\ for backslash and \" for quotes.
+  kw2="\"ac\\dc\""
+  # Use \xAB for hex values
+  kw3="\xF7\xF8"
+  # the name of the keyword followed by '=' may be omitted:
+  "foo\x0Abar"
+
+Data-flow-guided fuzzing
+------------------------
+
+*EXPERIMENTAL*.
+With an additional compiler flag ``-fsanitize-coverage=trace-cmp`` (see SanitizerCoverageTraceDataFlow_)
+and extra run-time flag ``-use_traces=1`` the fuzzer will try to apply *data-flow-guided fuzzing*.
+That is, the fuzzer will record the inputs to comparison instructions, switch statements,
+and several libc functions (``memcmp``, ``strcmp``, ``strncmp``, etc).
+It will later use those recorded inputs during mutations.
+
+This mode can be combined with DataFlowSanitizer_ to achieve better sensitivity.
  
  AFL compatibility
  -----------------
@@ -321,18 +336,20 @@ Build (make sure to use fresh clang as the host compiler)::
  
  Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).
  
-TODO: commit the pre-fuzzed corpus to svn (?).
-
  Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
  
  clang-fuzzer
  ------------
  
-The default behavior is very similar to ``clang-format-fuzzer``.
-Clang can also be fuzzed with Tokens_ using ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.
+The behavior is very similar to ``clang-format-fuzzer``.
  
  Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
  
+llvm-as-fuzzer
+--------------
+
+Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=24639
+
  Buildbot
  --------
  
@@ -348,7 +365,7 @@ The corpuses are stored in git on github and can be used like this::
    git clone https://github.com/kcc/fuzzing-with-sanitizers.git
    bin/clang-format-fuzzer fuzzing-with-sanitizers/llvm/clang-format/C1
    bin/clang-fuzzer        fuzzing-with-sanitizers/llvm/clang/C1/
-  bin/clang-fuzzer        fuzzing-with-sanitizers/llvm/clang/TOK1  -tokens=$LLVM/llvm/lib/Fuzzer/cxx_fuzzer_tokens.txt
+  bin/llvm-as-fuzzer      fuzzing-with-sanitizers/llvm/llvm-as/C1  -only_ascii=1
  
  
  FAQ
@@ -407,11 +424,44 @@ small inputs, each input takes < 1ms to run, and the library code is not expecte
  to crash on invalid inputs.
  Examples: regular expression matchers, text or binary format parsers.
  
+Trophies
+========
+* GLIBC: https://sourceware.org/glibc/wiki/FuzzingLibc
+
+* MUSL LIBC:
+
+  * http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c
+  * http://www.openwall.com/lists/oss-security/2015/03/30/3
+
+* pugixml: https://github.com/zeux/pugixml/issues/39
+
+* PCRE: Search for "LLVM fuzzer" in http://vcs.pcre.org/pcre2/code/trunk/ChangeLog?view=markup
+
+* ICU: http://bugs.icu-project.org/trac/ticket/11838
+
+* Freetype: https://savannah.nongnu.org/search/?words=LibFuzzer&type_of_search=bugs&Search=Search&exact=1#options
+
+* Linux Kernel's BPF verifier: https://github.com/iovisor/bpf-fuzzer
+
+* LLVM:
+
+  * Clang: https://llvm.org/bugs/show_bug.cgi?id=23057
+
+  * Clang-format: https://llvm.org/bugs/show_bug.cgi?id=23052
+
+  * libc++: https://llvm.org/bugs/show_bug.cgi?id=24411
+
+  * llvm-as: https://llvm.org/bugs/show_bug.cgi?id=24639
+
+
+
  .. _pcre2: http://www.pcre.org/
  
  .. _AFL: http://lcamtuf.coredump.cx/afl/
  
  .. _SanitizerCoverage: http://clang.llvm.org/docs/SanitizerCoverage.html
+.. _SanitizerCoverageTraceDataFlow: http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow
+.. _DataFlowSanitizer: http://clang.llvm.org/docs/DataFlowSanitizer.html
  
  .. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed