POSTED BY: anestis / 24.09.2015

The road to efficient Android fuzzing

In the aftermath of the recent Android stagefright vulnerabilities, efficient fuzz testing techniques and tools for the Android ecosystem are again in the spotlight. In this post we would like to share some of the fuzz testing experience we have gained through our projects and show how it can be applied in the Android world. Additionally, we’ll list some of the public contributions we’ve made to open source tools, aiming to help the community focus more on the target and less on the tooling.

The race for unique crashes

When fuzzing campaigns complete, researchers expect to proceed with crash investigation and bug triaging rather than separating duplicates and sifting through “garbage” information. As such, every “decent” fuzzing tool or framework offers either post-run or runtime unique-crash detection logic.

While many prefer to filter unique crashes at the post-run stage, we strongly believe it is more efficient to implement this at runtime, especially for target environments with limited native debugging capabilities (e.g. Android OS) that otherwise require time-consuming remote analysis sessions. For example, in our initial fuzzing work against the new Android ART runtime [1], each post-run GDB-scripted analysis cost ~6-10 seconds per crash. This is a big cost, especially when the same “low hanging fruit” bug is triggered over and over again.

The most common unique-crash filtering technique is to unwind the stack of the thread that triggered the abnormal behavior and use the frame information to construct a signature of the crash. Depending on the characteristics of the fuzzing environment (OS, security controls, etc.) and the target application, the available frame information may vary greatly, affecting the fingerprinting logic. Hence, the decision algorithm that identifies a unique bug must cater for these target characteristics (ASLR, sanitizer-enabled builds, symbols, etc.) without requiring manual tuning or code refactoring per target.

Moving to the implementation details, post-run analysis tools typically rely on debuggers with scripting capabilities (GDB, LLDB, WinDbg, etc.) to unwind the stack and collect frame information. Runtime analysis tools require either a built-in unwinder (e.g. libunwind) or a target-aware dynamically linked backtrace library (e.g. libbacktrace for Android OS > 5.0).

For our Android fuzzing projects, we preferred the option of a built-in unwinder in order to avoid the dependency and versioning nightmare. Balancing our limited options, we chose libunwind [2] and patched it to support Android NDK cross-compilation for arm, arm64, x86 and x86_64 CPUs. A fully Android-compatible libunwind is also maintained by Google in the AOSP tree (used internally by libbacktrace) [3]. We prefer a vanilla upstream fork of libunwind, because the AOSP mirror contains significant changes made to support the requirements of Android’s debuggerd.

Having the unwinder available at runtime, the next step was to write a frame fingerprinting algorithm that could (also) be applied to the Android OS. After a series of discussions with @_argp, @fel1x and others, the following algorithm was selected for the signature hash:

  • If ProcFS maps read is available
    • Parse target ProcFS maps
    • Calculate frame’s relative PC
    • Use relative PC & file name (if region was mapped from a file) of each frame
    • Hash the first N (most important) frames to create the crash signature
  • If ProcFS maps read is not available
    • Use the last 3 nibbles of frame’s PC as frame fingerprint
    • Hash the first N (most important) frames to create the crash signature
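The decision logic above can be sketched in C. The frame structure, hash primitive (FNV-1a) and function names below are illustrative assumptions for the sketch, not the exact implementation used in honggfuzz:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SIG_FRAMES 7 /* default frame count from the post */

typedef struct {
    uint64_t pc;          /* absolute program counter of the frame */
    uint64_t map_start;   /* start of the mapping containing pc (from ProcFS maps) */
    const char *map_name; /* backing file of the mapping, or NULL if anonymous */
} frame_t;

/* FNV-1a, used here as an illustrative hash primitive. */
static uint64_t fnv1a(uint64_t h, const void *buf, size_t len) {
    const uint8_t *p = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

uint64_t crash_signature(const frame_t *frames, size_t n, int have_maps) {
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
    size_t limit = n < MAX_SIG_FRAMES ? n : MAX_SIG_FRAMES;
    for (size_t i = 0; i < limit; i++) {
        if (have_maps) {
            /* relative PC is stable across ASLR re-randomization */
            uint64_t rel = frames[i].pc - frames[i].map_start;
            h = fnv1a(h, &rel, sizeof(rel));
            if (frames[i].map_name)
                h = fnv1a(h, frames[i].map_name, strlen(frames[i].map_name));
        } else {
            /* last 3 nibbles = page offset, unaffected by page-aligned ASLR */
            uint64_t nib = frames[i].pc & 0xfff;
            h = fnv1a(h, &nib, sizeof(nib));
        }
    }
    return h;
}
```

Because ASLR slides are page-aligned, both the relative PC and the low 12 bits of the absolute PC survive re-randomization, which is exactly what keeps the signatures stable across runs and devices.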

Relative PCs and the 3 least significant address nibbles ensure that in cases where disabling ASLR is not an option, hash signatures remain stable across runs. Additionally, if the same binaries are loaded on different systems (e.g. different Nexus devices), the unique signatures remain valid across all devices. The number of frames used to construct the signature is a trade-off that highly depends on the code structure of the target; we usually default to 7 frames, which works well for most targets.

Our built-in libunwind support and parts of the signature algorithm have been contributed to the honggfuzz open source project [4] as part of the Linux Android PTRACE backend.

Blacklisting triaged crashes

The initial fuzzing campaigns usually produce crashes that are to a large extent (50%-70%) caused by non-exploitable bugs. To avoid carrying that “noise” into future campaign rounds, a blacklist file is maintained for each fuzzing target setup. We reuse the stack hash algorithm described above to assemble a list of hashes that represent non-interesting crashes. This list is then fed to the fuzzing engine, which skips matching crashes if they are triggered again during the campaign.
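A minimal sketch of such a blacklist check, assuming the triaged hashes are kept in a plain sorted array (the function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the blacklist once at engine start-up so lookups are O(log n). */
void blacklist_init(uint64_t *hashes, size_t n) {
    qsort(hashes, n, sizeof(uint64_t), cmp_u64);
}

/* Returns 1 if this stack hash was already triaged and should be skipped. */
int blacklist_contains(const uint64_t *hashes, size_t n, uint64_t sig) {
    return bsearch(&sig, hashes, n, sizeof(uint64_t), cmp_u64) != NULL;
}
```

The engine consults `blacklist_contains()` right after computing the signature of a new crash, before spending any time on report generation.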

A honggfuzz stack hash blacklisting feature has been committed here [5] and will be pushed upstream soon.

Automatic runtime reports

In highly successful fuzzing campaigns (i.e. campaigns that have identified a large number of unique crashes) there’s always the dilemma of where to start the analysis. Some have the charisma to (randomly?) pick the most promising crashes. For the rest of us, additional contextual data helps a lot in the prioritization process.

Inspired by the initial honggfuzz report created by @robertswiecki we save the following information at runtime for each identified unique (and non-blacklisted) crash:

  • Fuzzer configuration
  • Fuzzing target configuration and invoking arguments
  • Original seed file name used to generate the test case (for mutation based cases only)
  • PID of crashed target process
  • Signal type & code
  • Fault address if available (depends on the crash signal type)
  • Crash instruction assembly (via statically compiled libcapstone)
  • Stack hash signature
  • Full unwound backtrace (not limited to the first N frames used for the signature)
  • ProcFS maps copy if available
  • Register values at crashing frame
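A sketch of how the scalar fields of such a report might be rendered; the structure and field names are hypothetical, and the backtrace, disassembly and maps sections would be appended by the respective subsystems (unwinder, capstone, ProcFS reader):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *target_cmd; /* fuzzing target and invoking arguments */
    const char *seed_file;  /* original seed (mutation-based runs only) */
    int pid;                /* PID of the crashed target process */
    int sig;                /* signal number that terminated the target */
    uint64_t fault_addr;    /* faulting address, if the signal carries one */
    uint64_t stack_hash;    /* unique-crash stack hash signature */
} crash_report_t;

/* Render the scalar report fields into buf; returns bytes written. */
int report_render(char *buf, size_t len, const crash_report_t *r) {
    return snprintf(buf, len,
                    "CMD: %s\nSEED: %s\nPID: %d\nSIGNAL: %d\n"
                    "FAULT ADDR: 0x%" PRIx64 "\nSTACK HASH: 0x%" PRIx64 "\n",
                    r->target_cmd, r->seed_file, r->pid, r->sig,
                    r->fault_addr, r->stack_hash);
}
```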

We’ve contributed most of our Android-specific crash reporting features to upstream honggfuzz [4].

The road towards a native fuzzing framework

In order to efficiently perform fuzz testing against Android OS systems, we developed fully native C/C++ engines (fuzzing core, data generation, etc.) that can run on target devices without requiring remote host support. Although dealing with the Android internals (debuggerd ptrace, signal masking, bionic restrictions, compatibility quirks, etc.) imposed a significant cost during initial development, it was a cost we decided to pay, aiming at better results. Of course, a maintenance cost will always remain, caused mainly by the Android OS evolution and vendor fragmentation. That is one more reason behind our decision to publicly contribute part of our work, so that community feedback can (hopefully) offset it.

Once we had a stable and robust fuzzer core to work with, we were left to develop a data generation engine that would stress-test each target application. Until recently our approach was to add the target-specific data generation logic to the fuzzer core, as we didn’t want to pay the cost of an additional execve() per iteration to invoke external tools.

This approach resulted in significant fragmentation and backporting cost as we added features to the fuzzing engine. We therefore took a step back and decided to upgrade the fuzzing engine into something more modular that can easily interface with 3rd party code through extensions.

File format fuzzing (parsers, decoders, etc.) is a nice example of how researchers can benefit from the proposed extensions approach. In order to efficiently fuzz such targets, a common mutation-based strategy is to apply smart mutations to the original corpus while respecting parts of the target format structure (header metadata, CRCs, etc.). Open source libraries and public tools can easily be chained together to accomplish this task.
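For PNG, for instance, respecting the format structure after mangling boils down to recomputing each chunk’s CRC-32 so the decoder doesn’t reject the test case early. A self-contained sketch (bitwise CRC with the PNG polynomial, no zlib dependency; the function names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (polynomial 0xEDB88320), as used by PNG chunk checksums. */
static uint32_t crc32_png(const uint8_t *buf, size_t len) {
    uint32_t c = 0xffffffffu;
    for (size_t i = 0; i < len; i++) {
        c ^= buf[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (-(c & 1u)));
    }
    return c ^ 0xffffffffu;
}

/* Post-mangle fix-up: recompute the CRC of one PNG chunk in place.
 * `chunk` points at the 4-byte big-endian length field; the CRC covers
 * the 4-byte type plus the chunk data. */
void png_fix_chunk_crc(uint8_t *chunk) {
    uint32_t len = ((uint32_t)chunk[0] << 24) | ((uint32_t)chunk[1] << 16) |
                   ((uint32_t)chunk[2] << 8) | chunk[3];
    uint32_t crc = crc32_png(chunk + 4, 4 + len); /* type + data */
    uint8_t *p = chunk + 8 + len;
    p[0] = crc >> 24; p[1] = crc >> 16; p[2] = crc >> 8; p[3] = crc;
}
```

Running this over every chunk after mutation keeps the test case structurally valid while the payload bytes stay mangled.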

On the same note, native extensions are implemented by exporting a set of callbacks that will be triggered by the fuzzing engine. Callbacks are exported for the 5 main steps of a fuzzing campaign:

  1. Initialization phase corpus pre-parsing
  2. Per fuzzing iteration data preparation
    • Decision on mangling algorithm
    • Prepare / resize data buffers
  3. Per fuzzing iteration data mangling
  4. Per fuzzing iteration post mangle fix-ups
    • Repair checksums, metadata, headers, etc.
    • Discard iteration if some requirements haven’t been met
  5. Crash detection post event
    • In case additional triaging automation is desired

Native extensions are activated at compile time. They can make use of existing codebases (libraries, parsers, header definitions, etc.) and can be maintained as standalone entities. This cuts down on the tooling cost significantly, as math libs, file parsers and other libraries available in native code can be reused.

In order to get further feedback from the community we decided to port part of this approach to the honggfuzz engine. On the master-dev branch [6] one can find a prototype implementation of the extension callbacks feature. As a proof of concept, sample PNG and DEX file format fuzzing extensions have been pushed under the extensions directory. You can follow commit logs or the “new feature” issue discussion page [7] for more details about the current state of the prototype.

Concluding remarks

At the end of the day, the adoption of a fuzzing tool or fuzzing technique is really a matter of personal preference. There’s a wide variety of fuzzing engines to choose from, and they remain successful across many different targets. Having said that, we believe that in resource-limited environments, such as embedded devices, researchers should not spend their time reinventing the wheel, especially for core fuzzing functionality. Smart people should be motivated to write test case generation algorithms and targeted fuzzing templates, instead of debugging ptrace() et al. wrappers.

Through evaluation of the publicly available tools and discussions with infosec researchers, we realized that the Android security community was lacking a generic yet efficient fuzzing tool. The majority of publicly available fuzzing and triaging projects relied on very inefficient, fragile and hackish mechanisms for remote target execution (adb), monitoring (debuggerd over logcat) and crash data collection (tombstones). Such techniques practically negate all the previously discussed strategies for optimizing fuzzing campaigns.

For this reason, we decided to publicly contribute to a generic fuzzing tool, honggfuzz, by implementing native Android OS support for the PTRACE and POSIX API backends. Hopefully, Android researchers will benefit from our work, reuse the engine as a drop-in core for more native tools and contribute back with ideas or identified inefficiencies / bugs.

Additionally, we’re very happy to see that the Android team has started integrating the in-process LLVM libFuzzer fuzzing library in AOSP. When combined with LLVM sanitizers in the build tree, it definitely aids researchers that are focusing on the Android ecosystem.

If you would like to share your thoughts or comments with us, feel free to drop an email to the team.