From: Kostya Serebryany
Date: Wed, 1 Apr 2015 21:33:20 +0000 (+0000)
Subject: [fuzzer] document the -tokens flag. Also change the diagnostic output
X-Git-Url: http://plrg.eecs.uci.edu/git/?p=oota-llvm.git;a=commitdiff_plain;h=01055ec7e316e4b6e1b37e9e165b66d07716830c

[fuzzer] document the -tokens flag. Also change the diagnostic output

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233842 91177308-0d34-0410-b5e6-96231b3b80d8
---
diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst
index 354e8719035..684d9def787 100644
--- a/docs/LibFuzzer.rst
+++ b/docs/LibFuzzer.rst
@@ -163,6 +163,27 @@ which will cause the fuzzer to exit on the first new synthesised input::
 
   N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M -exit_on_first=1
 
+Advanced features
+=================
+
+Tokens
+------
+
+By default, the fuzzer is not aware of the complexities of the input language,
+so when fuzzing e.g. a C++ parser it will mostly stress the lexer.
+It is very hard for the fuzzer to come up with something like ``reinterpret_cast``
+from a test corpus that doesn't have it.
+See a detailed discussion of this topic at
+http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
+
+lib/Fuzzer implements a simple technique that makes it possible to fuzz input languages with
+long tokens. All you need to do is prepare a text file containing up to 253 tokens, one token per line,
+and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
+Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
+The fuzzer itself still mutates a string of bytes,
+but before passing the input to the target library it replaces every byte ``b`` with the ``b``-th token.
+If there is no ``b``-th token, a space is inserted instead.
+
 Fuzzing components of LLVM
 ==========================
 
@@ -188,6 +209,7 @@ clang-fuzzer
 ------------
 
 The default behavior is very similar to ``clang-format-fuzzer``.
+Clang can also be fuzzed with Tokens_ using the ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.
 
 Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
 
diff --git a/lib/Fuzzer/FuzzerUtil.cpp b/lib/Fuzzer/FuzzerUtil.cpp
index 3f62a1f1d1e..3635f39a10d 100644
--- a/lib/Fuzzer/FuzzerUtil.cpp
+++ b/lib/Fuzzer/FuzzerUtil.cpp
@@ -19,15 +19,18 @@ namespace fuzzer {
 
 
 void Print(const Unit &v, const char *PrintAfter) {
-  std::cerr << v.size() << ": ";
   for (auto x : v)
-    std::cerr << (unsigned) x << " ";
+    std::cerr << "0x" << std::hex << (unsigned) x << std::dec << ",";
   std::cerr << PrintAfter;
 }
 
 void PrintASCII(const Unit &U, const char *PrintAfter) {
-  for (auto X : U)
-    std::cerr << (char)((isascii(X) && X >= ' ') ? X : '?');
+  for (auto X : U) {
+    if (isprint(X))
+      std::cerr << X;
+    else
+      std::cerr << "\\x" << std::hex << (int)(unsigned)X << std::dec;
+  }
   std::cerr << PrintAfter;
 }
 