LLVM Project News and Details from the Trenches

Showing posts with label sanitizer. Show all posts

Friday, April 1, 2016

My Little LLVM: Undefined Behavior is Magic!

A horrible mashup between LLVM's old dragon logo and a My Little Pony inspired pegasus pony
New LLVM logo

There’s been lots of discussion online (and then quite some more) about compilers abusing undefined behavior. In response, the LLVM compiler infrastructure is rebranding and adopting a motto to make undefined behavior friendlier and less prone to corruption.


The re-branding puts to rest a long-standing issue with LLVM’s “dragon” logo actually being a wyvern with an upside-down head, a special form of undefined behavior in its own right. The logo is now clearly a pegasus pony.


Another great side-effect of this rebranding is increased security by auto-magically closing all vulnerabilities used by the hacker who goes by the pseudonym “Pinkie Pie”.


These new features are enabled with the -rainbow clang option, in honor of Rainbow Dash’s unary name.

Thursday, April 9, 2015

Simple guided fuzzing for libraries using LLVM's new libFuzzer


Fuzzing (or fuzz testing) is becoming increasingly popular. Fuzzing Clang and fuzzing with Clang are not new: Clang-based AddressSanitizer has been used for fuzz-testing the Chrome browser for several years, and Clang itself has been extensively fuzzed using csmith and, more recently, using AFL. Now we’ve closed the loop and started to fuzz parts of LLVM (including Clang) using LLVM itself.

LibFuzzer, recently added to the LLVM tree, is a library for in-process fuzzing that uses Sanitizer Coverage instrumentation to guide test generation. With LibFuzzer one can implement a guided fuzzer for some library by writing one simple function: 
extern "C" void TestOneInput(const uint8_t *Data, size_t Size);
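
To make the one-function interface concrete, here is a minimal sketch of a fuzz target. PercentEncode and PercentDecode are hypothetical stand-ins for the library under test (they are not part of LLVM); only the TestOneInput signature comes from the declaration above. The target checks a round-trip property, so any input that breaks it aborts the process and shows up as a crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical library function under test (not part of LLVM):
// percent-encodes every byte that is not an ASCII letter or digit.
std::string PercentEncode(const std::string &In) {
  static const char Hex[] = "0123456789ABCDEF";
  std::string Out;
  for (unsigned char C : In) {
    bool Plain = (C >= '0' && C <= '9') || (C >= 'A' && C <= 'Z') ||
                 (C >= 'a' && C <= 'z');
    if (Plain) {
      Out += static_cast<char>(C);
    } else {
      Out += '%';
      Out += Hex[C >> 4];
      Out += Hex[C & 15];
    }
  }
  return Out;
}

// Hypothetical inverse of PercentEncode: turns each "%XX" back into a byte.
std::string PercentDecode(const std::string &In) {
  auto HexVal = [](char C) -> int {
    if (C >= '0' && C <= '9') return C - '0';
    if (C >= 'A' && C <= 'F') return C - 'A' + 10;
    return -1;
  };
  std::string Out;
  for (size_t I = 0; I < In.size(); ++I) {
    if (In[I] == '%' && I + 2 < In.size()) {
      int Hi = HexVal(In[I + 1]);
      int Lo = HexVal(In[I + 2]);
      if (Hi >= 0 && Lo >= 0) {
        Out += static_cast<char>(Hi * 16 + Lo);
        I += 2;
        continue;
      }
    }
    Out += In[I];
  }
  return Out;
}

// The fuzz target: called once per generated input.
extern "C" void TestOneInput(const uint8_t *Data, size_t Size) {
  std::string Input(reinterpret_cast<const char *>(Data), Size);
  // Property under test: decoding an encoded string gives back the input.
  assert(PercentDecode(PercentEncode(Input)) == Input);
}
```

There is no main() here because the fuzzing library supplies the driver loop. Later libFuzzer releases name this entry point LLVMFuzzerTestOneInput and have it return int; the sketch keeps the signature quoted in this post.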

We have implemented two fuzzers on top of LibFuzzer: clang-format-fuzzer and clang-fuzzer. Clang-format is mostly a lexical analyzer, so giving it random bytes to format worked perfectly and discovered over 20 bugs. Clang however is more than just a lexer and giving it random bytes barely scratches the surface, so in addition to testing with random bytes we also fuzzed Clang in token-aware mode. Both modes found bugs; some of them were previously detected by AFL, some others were not: we’ve run this fuzzer with AddressSanitizer and some of the bugs are not easily discoverable without it.

Just to give you the feeling, here are some of the input samples discovered by the token-aware clang-fuzzer starting from an empty test corpus:
 static void g(){}
 signed*Qwchar_t;
 overridedouble++!=~;inline-=}y=^bitand{;*=or;goto*&&k}==n
 int XS/=~char16_t&s<=const_cast<Xchar*>(thread_local3+=char32_t
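
One way to picture a token-aware mode is to let each fuzzer byte select a whole token rather than a single character. The sketch below is an illustration of that idea only, not the clang-fuzzer implementation; the token table kTokens and the helper BytesToTokens are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative token table (an assumption); a real one would cover the
// full set of tokens of the language being fuzzed.
static const std::vector<std::string> kTokens = {
    "int", "static", "void", "(", ")", "{", "}", ";", "*", "+=", "x"};

// Treats each input byte as an index into the token table and concatenates
// the selected tokens. Mutating one byte then swaps a whole token instead
// of corrupting a single character.
std::string BytesToTokens(const uint8_t *Data, size_t Size) {
  assert(!kTokens.empty());
  std::string Out;
  for (size_t I = 0; I < Size; ++I)
    Out += kTokens[Data[I] % kTokens.size()];
  return Out;
}
```

Because the tokens are joined without separators, adjacent keywords fuse into longer identifiers, much like the overridedouble run in the samples above.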

Fuzzing is not a one-off thing -- it shines when used continuously. We have set up a public build bot that runs clang-fuzzer and clang-format-fuzzer 24/7. This way, the fuzzers keep improving the test corpus and will periodically find old bugs or fresh regressions (the bot has caught at least one such regression already).

The benefit of in-process fuzzing compared to out-of-process is that you can test more inputs per second. This is also a weakness: you cannot effectively limit the execution time for every input. If some of the inputs trigger superlinear behavior, it may slow down or paralyze the fuzzing. Our fuzzing bot was nearly dead after it discovered exponential parsing time in clang-format. You can work around the problem by setting a timeout for the fuzzer, but it’s always better to fix superlinear behavior.
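
The limitation is easy to see in a small sketch. The hypothetical helper below (RunWithinBudget is a name introduced here, and this is not how LibFuzzer implements its timeout) measures how long a target spends on one input. Since the target runs in the same process, the helper can only notice a slow input after the call returns; it cannot cut the call short.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>

// Runs Target on one input and reports whether it finished within Budget.
// The check happens after the fact: an in-process harness has no safe way
// to interrupt Target while it is still running.
bool RunWithinBudget(
    const std::function<void(const uint8_t *, size_t)> &Target,
    const uint8_t *Data, size_t Size, std::chrono::milliseconds Budget) {
  assert(Target && "Target must be callable");
  auto Start = std::chrono::steady_clock::now();
  Target(Data, Size);
  auto Elapsed = std::chrono::steady_clock::now() - Start;
  return Elapsed <= Budget;
}
```

Inputs flagged this way are good candidates for a bug report about superlinear behavior, which is the fix the paragraph above recommends.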

It would be interesting to fuzz other parts of LLVM, but a requirement for in-process fuzzing is that the library does not crash on invalid inputs. This holds for clang and clang-format, but not for, e.g., the LLVM bitcode reader.

Help is more than welcome! You can start by fixing one of the existing bugs in clang or clang-format (see PR23057, PR23052 and the results from AFL), by writing your own fuzzer for some other part of LLVM, or by profiling one of the existing fuzzers and making it faster by fixing performance bugs.

Of course, LibFuzzer can be used to test things outside of the LLVM project. As an example, and following Hanno Böck’s blog post on Heartbleed, we’ve applied LibFuzzer to OpenSSL and found Heartbleed in less than a minute. Also, quite a few new bugs have been discovered in PCRE2 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), Glibc and MUSL libc (1, 2).

Fuzz testing, especially coverage-directed and sanitizer-aided fuzz testing, should directly complement unit testing, integration testing, and system functional testing. We encourage everyone to start actively fuzz testing their interfaces, especially those with even a small chance of being subject to attacker-controlled inputs. We hope the LLVM fuzzing library helps you start leveraging our tools to better test your code, and let us know about any truly exciting bugs they help you find!

Monday, April 1, 2013

Testing libc++ with -fsanitize=undefined

[This article is re-posted in a slightly expanded form from Marshall's blog]

After my last article, Testing libc++ with Address Sanitizer, I thought "what other tests can I run?"

Address Sanitizer (ASan) is not the only "sanitizer" that clang offers. There are "Thread Sanitizer" (TSan), "Undefined Behavior Sanitizer" (UBSan), and others. An integer overflow sanitizer called IOC is also coming in the 3.3 release of clang. The documentation for UBSan can be found on the LLVM site.

I have been looking at the results of running the libc++ test suite with UBSan enabled. Even if you're not interested in libc++ specifically, this post can be a useful introduction to Clang's bug detectors, and it shows several classes of problems they can find.
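
As a small illustration of one such class of problem, here is a sketch of signed integer overflow. It is not taken from the libc++ test suite; Next and CheckedNext are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <climits>

// Signed overflow is undefined behavior in C++: when X == INT_MAX the
// addition below has no defined result. A build with -fsanitize=undefined
// reports that case at run time instead of silently continuing.
int Next(int X) { return X + 1; }

// A checked variant that refuses the overflowing case instead of
// performing the addition.
bool CheckedNext(int X, int *Out) {
  assert(Out != nullptr);
  if (X == INT_MAX)
    return false;
  *Out = X + 1;
  return true;
}
```

Calling Next(INT_MAX) in a normal build often appears to work, which is why this kind of bug tends to survive until a sanitizer run.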

Thursday, March 28, 2013

Testing libc++ with Address Sanitizer

[This article is re-posted in a slightly expanded form from Marshall's blog]
I've been running the libc++ tests off and on for a while. It's quite an extensive test suite, but I wondered if there were any bugs that the test suite was not uncovering. In the upcoming clang 3.3, there is a new feature named Address Sanitizer which inserts a bunch of runtime checks into your executable to see if there are any "out of bounds" reads and writes to memory.
In the back of my head, I've always thought that it would be nice to be able to say that libc++ was "ASan clean" (i.e., passed all of the test suite when running with Address Sanitizer).
So I decided to do that.
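
For readers who have not used it, here is a minimal sketch of the kind of "out of bounds" read those runtime checks catch. ElementAt is a hypothetical helper written for this example and is not libc++ code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unchecked element access. Calling this with I == V.size() reads one
// element past the end of the heap buffer. An uninstrumented build may
// appear to work; a build with -fsanitize=address stops with a
// heap-buffer-overflow report that points at this line.
int ElementAt(const std::vector<int> &V, size_t I) {
  assert(!V.empty());
  return V.data()[I];
}
```

Only in-bounds calls such as ElementAt(V, V.size() - 1) are valid; the off-by-one call is the bug a test suite can silently exercise for years.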