Subject: Proposal for the inclusion of LLCov code
Date: Monday 16th June 2014 20:54:22 UTC
Hello all,

below is a proposal to include LLCov, a simple but helpful little tool based on LLVM/Clang, into the main LLVM code.

Tl;dr: It's a module pass that instruments basic blocks with calls to an external function, and it can be used for various things, including (live!) basic block coverage.

I'm looking forward to hearing opinions on this :)

Best,
Chris

=== Problem description ===

Code coverage has always been considered an important aspect of testing. Especially for automated testing (e.g. fuzzing), coverage is a requirement for success. Some recent fuzzing research is moving in the direction of genetic algorithms, where coverage can be part of the fitness function. However, applying all of this to a large codebase in a practical way is a complex endeavor.

Popular code coverage tools like GCov are not exactly designed to be used to obtain coverage while the program is running. We want to make decisions based on coverage without terminating the program, though (mainly for performance reasons, but depending on the type of fuzzing also because one would like to alter the mutation strategy mid-fuzzing), so we need to get coverage feedback live, as it happens.

Furthermore, we are often not interested in all of the coverage. Often, a particular portion of the code is targeted, and the rest (which is the majority) would only slow us down if instrumented.

=== Proposed solution ===

I propose to include LLCov in the main LLVM tree. LLCov is implemented as a module pass and allows selective instrumentation of code portions for basic block coverage measurement (or any other task that should be performed per basic block). It can instrument based on a combination of a blacklist and a whitelist that match on files, lines or functions. All of the instrumented code calls an arbitrary external function per basic block (that is, per control flow node). This external function can do whatever the tester wants it to do.
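To make the idea of the per-block external function a bit more concrete, here is a minimal sketch of what such a function could look like. The name llcov_block_hit, its parameters and the output format are made up for illustration and are not taken from the actual LLCov runtime; the sketch also assumes single-threaded use. It reports each basic block on stderr the first time it is hit:

```cpp
#include <cstddef>
#include <cstdio>
#include <set>
#include <string>
#include <tuple>

namespace {
// Blocks already reported, keyed by (file, line, block index).
// Not thread-safe; a real runtime would need locking or per-thread state.
std::set<std::tuple<std::string, unsigned, unsigned>> seen_blocks;
}

// Hypothetical callback: assumed to be called by the instrumented code
// once per executed basic block. The real LLCov interface may differ.
extern "C" void llcov_block_hit(const char *file, unsigned line, unsigned block) {
    // emplace() returns {iterator, inserted}; inserted is true on first hit.
    bool first_hit = seen_blocks.emplace(file, line, block).second;
    if (first_hit) {
        // Report new coverage live so that a fuzzer can react to it.
        std::fprintf(stderr, "cov %s:%u:%u\n", file, line, block);
    }
}

// Helper for the tester: number of distinct blocks seen so far.
std::size_t llcov_blocks_seen() {
    return seen_blocks.size();
}
```

Reporting only the first hit keeps the output small; a fuzzer that wants hit counts would replace the set with a map and a counter.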
The simplest task would be to output coverage information on stderr and have the fuzzer collect it there. It could also provide the information over a network socket, though.

=== Current status of the tool ===

The current LLCov code is maintained at https://github.com/choller/LLCov and consists of the main LLCov.cpp file, implementing the module pass, as well as two patches (one integrating the LLVM pass, the other patching the Clang frontend to support the necessary compiler flag and to link the runtime). Over time, the module pass itself has required only minor adjustments (e.g. some includes changed), but rebasing the patches for the frontend typically required manual work.

=== Alternatives ===

One alternative would be to add an interface such that the changes required to integrate this and other passes (especially into the Clang frontend) can be made dynamically. I'm not sure if this is possible, though. Another alternative would be to add this functionality to the GCov pass, but I am not sure if that is easily doable given the way GCov typically works.