[libFuzzer] better documentation for -save_minimized_corpus=1

diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst

index f6f6e6548aae948f35e2220286ee0880358efa58..d3e0cb1e31050c08bf9770d18285c18a6bddb15d 100644
--- a/docs/LibFuzzer.rst
+++ b/docs/LibFuzzer.rst
@@ -61,11 +61,9 @@ The most important flags are::
    mutate_depth                         5       Apply this number of consecutive mutations to each input.
    timeout                              1200    Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort.
    help                                 0       Print help.
-  save_minimized_corpus                0       If 1, the minimized corpus is saved into the first input directory
+  save_minimized_corpus                0       If 1, the minimized corpus is saved into the first input directory. Example: ./fuzzer -save_minimized_corpus=1 NEW_EMPTY_DIR OLD_CORPUS
    jobs                                 0       Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log.
    workers                              0       Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used.
-  tokens                               0       Use the file with tokens (one token per line) to fuzz a token based input language.
-  apply_tokens                         0       Read the given input file, substitute bytes  with tokens and write the result to stdout.
    sync_command                         0       Execute an external command "<sync_command> <test_corpus>" to synchronize the test corpus.
    sync_timeout                         600     Minimum timeout between syncs.
    use_traces                            0       Experimental: use instruction traces
@@ -258,23 +256,25 @@ Voila::
  Advanced features
  =================
  
-Tokens
-------
-
-By default, the fuzzer is not aware of complexities of the input language
-and when fuzzing e.g. a C++ parser it will mostly stress the lexer.
-It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>``
-from a test corpus that doesn't have it.
-See a detailed discussion of this topic at
-http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
-
-lib/Fuzzer implements a simple technique that allows to fuzz input languages with
-long tokens. All you need is to prepare a text file containing up to 253 tokens, one token per line,
-and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
-Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
-The fuzzer itself will still be mutating a string of bytes
-but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token.
-If there are less than ``b`` tokens, a space will be added instead.
+Dictionaries
+------------
+*EXPERIMENTAL*.
+LibFuzzer supports user-supplied dictionaries with input language keywords
+or other interesting byte sequences (e.g. multi-byte magic values).
+Use ``-dict=DICTIONARY_FILE``. For some input languages using a dictionary
+may significantly improve the search speed.
+The dictionary syntax is similar to that used by AFL_ for its ``-x`` option::
+
+  # Lines starting with '#' and empty lines are ignored.
+
+  # Adds "blah" (w/o quotes) to the dictionary.
+  kw1="blah"
+  # Use \\ for backslash and \" for quotes.
+  kw2="\"ac\\dc\""
+  # Use \xAB for hex values
+  kw3="\xF7\xF8"
+  # the name of the keyword followed by '=' may be omitted:
+  "foo\x0Abar"
  
  Data-flow-guided fuzzing
  ------------------------
@@ -336,18 +336,20 @@ Build (make sure to use fresh clang as the host compiler)::
  
  Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).
  
-TODO: commit the pre-fuzzed corpus to svn (?).
-
  Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
  
  clang-fuzzer
  ------------
  
-The default behavior is very similar to ``clang-format-fuzzer``.
-Clang can also be fuzzed with Tokens_ using ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.
+The behavior is very similar to ``clang-format-fuzzer``.
  
  Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
  
+llvm-as-fuzzer
+--------------
+
+Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=24639
+
  Buildbot
  --------
  
@@ -363,7 +365,7 @@ The corpuses are stored in git on github and can be used like this::
    git clone https://github.com/kcc/fuzzing-with-sanitizers.git
    bin/clang-format-fuzzer fuzzing-with-sanitizers/llvm/clang-format/C1
    bin/clang-fuzzer        fuzzing-with-sanitizers/llvm/clang/C1/
-  bin/clang-fuzzer        fuzzing-with-sanitizers/llvm/clang/TOK1  -tokens=$LLVM/llvm/lib/Fuzzer/cxx_fuzzer_tokens.txt
+  bin/llvm-as-fuzzer      fuzzing-with-sanitizers/llvm/llvm-as/C1  -only_ascii=1
  
  
  FAQ
@@ -425,15 +427,27 @@ Examples: regular expression matchers, text or binary format parsers.
  Trophies
  ========
  * GLIBC: https://sourceware.org/glibc/wiki/FuzzingLibc
+
  * MUSL LIBC:
- * http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c
- * http://www.openwall.com/lists/oss-security/2015/03/30/3
+
+  * http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c
+  * http://www.openwall.com/lists/oss-security/2015/03/30/3
+
  * pugixml: https://github.com/zeux/pugixml/issues/39
+
  * PCRE: Search for "LLVM fuzzer" in http://vcs.pcre.org/pcre2/code/trunk/ChangeLog?view=markup
+
+* ICU: http://bugs.icu-project.org/trac/ticket/11838
+
  * LLVM:
- * Clang: https://llvm.org/bugs/show_bug.cgi?id=23057
- * Clang-format: https://llvm.org/bugs/show_bug.cgi?id=23052
- * libc++: https://llvm.org/bugs/show_bug.cgi?id=24411
+
+  * Clang: https://llvm.org/bugs/show_bug.cgi?id=23057
+
+  * Clang-format: https://llvm.org/bugs/show_bug.cgi?id=23052
+
+  * libc++: https://llvm.org/bugs/show_bug.cgi?id=24411
+
+  * llvm-as: https://llvm.org/bugs/show_bug.cgi?id=24639
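
The dictionary syntax described in the Dictionaries section added by this diff can be illustrated with a small parser. The following is a minimal sketch in Python, not libFuzzer's own parser: the function name ``parse_dictionary`` is hypothetical, it handles only the three escapes listed in the diff (``\\``, ``\"`` and ``\xAB``), and it assumes that anything before the ``=`` is a keyword name that can be discarded. The real implementation may accept or reject more than this.

```python
import re

# One dictionary line: an optional `name=` prefix, then a quoted value built
# only from plain bytes and the three escapes shown in the documentation.
_LINE = re.compile(rb'^(?:[^"=]*=)?\s*"((?:[^"\\]|\\\\|\\"|\\x[0-9A-Fa-f]{2})*)"$')
_ESCAPE = re.compile(rb'\\(\\|"|x[0-9A-Fa-f]{2})')


def _unescape(match):
    """Turn one escape sequence into the byte(s) it stands for."""
    token = match.group(1)
    if token.startswith(b"x"):
        return bytes([int(token[1:], 16)])  # \xAB -> single byte 0xAB
    return token  # \\ -> backslash, \" -> double quote


def parse_dictionary(data):
    """Return the dictionary entries in `data` (bytes) as a list of bytes.

    Hypothetical helper: comments and empty lines are skipped, keyword names
    are ignored, and any other malformed line raises ValueError.
    """
    entries = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(b"#"):
            continue
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        entries.append(_ESCAPE.sub(_unescape, m.group(1)))
    return entries


# The example entries from the documentation, as raw bytes.
SAMPLE = rb'''
# comment lines and empty lines are ignored

kw1="blah"
kw2="\"ac\\dc\""
kw3="\xF7\xF8"
"foo\x0Abar"
'''

if __name__ == "__main__":
    for entry in parse_dictionary(SAMPLE):
        print(entry)
```

Working on bytes rather than text keeps ``\xAB`` escapes lossless, which matters for the multi-byte magic values the section mentions.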