Rust FFI - Fuzzing Like a (much faster) Caveman

Mar 6, 2022 | 12 minutes read

Tags: rust, ffi, fuzzing, python

In October of 2021, I gave a talk at the Texas Cyber Summit titled Rust, I choose you! A formative talk for the Rust-curious. I thought it would be nice to put the same information into blog form, for those who didn’t get to see it. I don’t plan to recreate every little piece of the talk in these blogs, but the parts that deal with rust’s Foreign Function Interface (FFI) make nice bite-sized chunks of information, so we’ll stick to those.

The source code and slides from the talk are located at my Rust I Choose You - TCS 2021 repository.


In early 2020, @h0mbre wrote a blog post titled Fuzzing Like A Caveman. At the time of this writing, he has five posts in the Fuzzing Like a Caveman series. We’re interested in the first two posts, which document his journey writing and then improving a simple JPEG fuzzer using python. This post explores replacing one of his mutation functions with a Rust implementation. The goal is to channel our inner @gamozolabs and increase that ever elusive perf!


Project Setup

Let’s kick things off by setting up our project directory. We’ll begin by creating a new rust library project using the cargo command below.

cargo new --lib call-rust-from-python

Once complete, we’re left with a directory structure that looks like this.

├── Cargo.toml
└── src

While we’re at it, let’s add a few python dependencies as well.

cwd: call-rust-from-python/

poetry init -n
poetry add pytest pytest-benchmark snakeviz

Once the commands above have been run, we can move along, confident in the fact that we have a nice new project workspace waiting for us.

Choosing the Mutator

We know that we want to attempt to speed up h0mbre’s fuzzer. In order to do that, we need to select an area of the program to target. To that end, we’ll examine the fuzzer from Fuzzing Like A Caveman 2: Improving Performance, as it’s been optimized a bit over the version that appeared in the first installment of the series.

The fuzzer has two mutation functions, one named magic, the other named bit_flip. bit_flip randomly flips bits in binary data while magic injects numbers that are at typical problem boundaries like max values for signed integers, etc…

def bit_flip(data):
    # randomly flip bits in binary data
    # snip ----8<---- snip

def magic(data):
    # use magic numbers at problem boundaries; i.e. 0x7FFFFFFF
    # snip ----8<---- snip

Since we don’t know which offers more room for improvement, let’s perform a little profiling!

Profiling the Mutators

For us to be able to profile the code, we’ll need to pull down h0mbre’s repository.

git clone
cd Fuzzing/JPEGMutation

Next, we can download the same jpeg file that h0mbre used while developing his fuzzer.


With fuzzer and jpeg in hand, we can profile execution.

python -m cProfile -o caveman.profile Canon_40D.jpg

Once that runs to completion, we can use snakeviz to visualize the results.

poetry run snakeviz caveman.profile

When we examine the graph, we can see that we’re spending nearly half of the execution time in the bit_flip function! Additionally, we don’t even see the magic function listed anywhere.


Based on our profiling, we have way more to gain by replacing the bit_flip function with an equivalent written in rust, so let’s give it a go!

Writing the Replacement

We want to create a drop-in replacement for h0mbre’s bit_flip function. In order to accomplish that goal, we’ll begin with porting h0mbre’s python version to rust. In the first iteration, we’re simply going to write normal rust code, without regard for python bindings.

In h0mbre’s implementation, he chooses to flip 1% of the input bytes. In order to flip the bytes, he randomly chooses a power of two and uses it to xor a randomly selected byte. The implementation is shown below.’s bit_flip function

def bit_flip(data):

	length = len(data) - 4

	num_of_flips = int(length * .01)

	picked_indexes = []
	flip_array = [1,2,4,8,16,32,64,128]

	counter = 0
	while counter < num_of_flips:
		counter += 1

	for x in picked_indexes:
		mask = random.choice(flip_array)
		data[x] = data[x] ^ mask

	return data

The same bit_flip function is shown below, just that it’s written in rust. The only appreciable difference is that the while loop to populate picked_indexes is folded into the same loop that loops over the number of flips that should be performed.’s bit_flip function ported to rust

use rand::prelude::SliceRandom;
use std::error::Error;
use std::ptr::slice_from_raw_parts_mut;

use rand::Rng;

/// array of bitmasks to apply to individual bytes via xor
static FLIP_ARRAY: [u8; 8] = [1, 2, 4, 8, 16, 32, 64, 128];

/// direct port of h0mbre's second version of the bit_flip function used in his blog post
fn bit_flip(data: &mut [u8]) -> &[u8] {
    // create thread-local random number generator, seeded by the system
    let mut rng = rand::thread_rng();

    // attempt to subtract 4 from data.len() and store it as a usize. In the event that data.len()
    // is 3 or less, checked_sub will panic, raising an exception
    // the original bit_flip function was designed around flipping bytes specifically in a JPEG.
    // Subtracting four from the length here is to preserve the SOI and EOI markers in the provided
    // JPEG.
    let length = data
        .expect("length of data is too small");

    // only flip 1% of the bits in the bytearray passed into the function
    let flip_mul = length as f64;
    let num_of_flips = (flip_mul * 0.01) as usize;

    // we create a raw mutable slice from the raw pointer, as they're a bit easier to work with
    let data_slice = slice_from_raw_parts_mut(data.as_mut_ptr(), data.len());

    (0..num_of_flips).for_each(|_| {
        // the extension trait, SliceRandom, from the rand crate allows us to call .choose on
        // our array, giving us a random value from the array
        let mask = FLIP_ARRAY.choose(&mut rng).unwrap();

        // get a random index that will be used as an index into our `data` buffer, pointed
        // to by `data_slice`
        let data_index = rng.gen_range(0, length);

        unsafe {
            // dereference the `data_slice` pointer, and index into it while xor'ing the value with our
            // randomized `mask` value
            (*data_slice)[data_index] ^= mask;

    data // return the mutated data to the caller

We can confirm that our ported mutator works with the following code.

mod tests {
    use super::*;

    fn mutator_mutates() {
        let original: Vec<_> = (0..1000).map(|_| rand::random::<u8>()).collect();
        let mut to_flip = original.clone();

        let mutated = bit_flip(&mut to_flip);

        assert!(mutated != original);

Sweet! We have a working rust implementation, now let’s expose it such that python can use the function.

Adding Python Bindings

In order to add python bindings, we’ll stand on the shoulders of giants and leverage the PyO3 crate. PyO3 can be used to generate a native Python module from our rust code. Below is an updated bit_flip that adds in the necessary components for us to expose the function to Python.

 2fn bit_flip(data: &PyByteArray) -> &PyByteArray {
 3    // create thread-local random number generator, seeded by the system
 4    let mut rng = rand::thread_rng();
 6    // attempt to subrtact 4 from data.len() and store it as a usize. In the event that data.len()
 7    // is 3 or less, checked_sub will panic, raising the exception below:
 8    //
 9    //      pyo3_runtime.PanicException: length of data is too small
10    //
11    // the original bit_flip function was designed around flipping bytes specifically in a JPEG.
12    // Subtracting four from the length here is to preserve the SOI and EOI markers in the provided
13    // JPEG.
14    let length = data
15        .len()
16        .checked_sub(4)
17        .expect("length of data is too small");
19    // only flip 1% of the bits in the bytearray passed into the function
20    let flip_mul = length as f64;
21    let num_of_flips = (flip_mul * 0.01) as usize;
23    // returns a raw pointer to the start of the buffer containing the contents
24    // of the bytearray
25    let data_ptr =;
27    // we create a raw mutable slice from the raw pointer, as they're a bit easier to work with
28    let data_slice = slice_from_raw_parts_mut(data_ptr, data.len());
30    (0..num_of_flips).for_each(|_| {
31        // the extension trait, SliceRandom, from the rand crate allows us to call .choose on
32        // our array, giving us a random value from the array
33        let mask = FLIP_ARRAY.choose(&mut rng).unwrap();
35        // get a random index that will be used as an index into our `` buffer, pointed
36        // to by `data_slice`
37        let data_index = rng.gen_range(0..length);
39        unsafe {
40            // dereference the `data_slice` pointer, and index into it while xor'ing the value with our
41            // randomized `mask` value
42            (*data_slice)[data_index] ^= mask;
43        }
44    });
46    data // return the mutated data to the caller

As shown by the highlighted code, there weren’t many changes necessary. The #[pyfunction] attribute (line 1) is used to define a Python function from a Rust function. We also changed the input parameter and return type to a PyByteArray (line 2), which is the PyO3 equivalent of Python’s bytearray. In plainer terms, we could say that the bit_flip function takes a reference to a python bytearray as input and returns the same. Finally, we needed to get a pointer to the internal buffer that the PyByteArray exposes via the .data() method (line 25).

Now that we’ve defined the python function, it needs to be added to a python module using PyO3. The canonical example is shown below, tailored to our needs.

/// A Python module implemented in Rust
fn bit_flipper(_py: Python, m: &PyModule) -> PyResult<()> {
    // add the bit_flip function to the given module
    m.add_function(wrap_pyfunction!(bit_flip, m)?)?;

If you were to write this module/function combination in python directly, one possible implementation would be a file with a bit_flip function inside the file. That’s effectively what we’ve set up in the rust code above.

That’s it; our rust implementation is ready to be used on the Python side of things. Let’s finalize the steps needed to make that dream a reality!

Running from Python

Before we can compile, we’ll need to make sure our Cargo.toml is similar to what’s shown below.

name = "call-rust-from-python"
version = "0.1.0"
edition = "2021"

pyo3 = { features = ["extension-module"], git = "" }
rand = "0.8"

name = "bit_flipper"
crate-type = ["cdylib"]

The important bits are the dependencies and crate-type = ["cdylib"]. That means that when we compile our rust code, we’ll get a shared object in return (assuming we’re on Linux).

With our setup complete, let’s compile!

cargo build --release

Finished release [optimized] target(s) in 0.03s

After that, all we need to do is rename the file to match the module name that we used in our source code, bit_flipper in this case

mv target/release/

And then it’s ready for use in python! Assuming is in our current directory, we can drop into a python interpreter and use the bit_flipper module and bit_flip function just as though it were normal python code, which I think is super neat.


>>> from bit_flipper import bit_flip
>>> data = bytearray(range(256))
>>> data_copy = data[:]
>>> bit_flip(data)
>>> data == data_copy

How cool is that? We just ran rust code straight outta python!

The Fun Part

Ok, we have this shiny new shared object, but how do we know it’s any better than the python version? Well, we get to do the the fun part, it’s time to benchmark!

As mentioned earlier, h0mbre went back in his second post and improved the bit_flip function with a more efficient implementation. The second implementation is the one that was ported to rust. We can use pytest-benchmark against h0mbre’s first and second iteration of his bit_flip function, as well as the one written in rust, to compare performance.

The benchmarking code is shown below.

 1import random
 3import bit_flipper
 6def caveman_bit_flip_one(data):
 7    """ code taken from with author's permission """
 8    num_of_flips = int((len(data) - 4) * .01)
10    indexes = range(4, (len(data) - 4))
12    chosen_indexes = []
14    # iterate selecting indexes until we've hit our num_of_flips number
15    counter = 0
16    while counter < num_of_flips:
17        chosen_indexes.append(random.choice(indexes))
18        counter += 1
20    for x in chosen_indexes:
21        current = data[x]
22        current = (bin(current).replace("0b", ""))
23        current = "0" * (8 - len(current)) + current
25        indexes = range(0, 8)
27        picked_index = random.choice(indexes)
29        new_number = []
31        # our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
32        for i in current:
33            new_number.append(i)
35        # if the number at our randomly selected index is a 1, make it a 0, and vice versa
36        if new_number[picked_index] == "1":
37            new_number[picked_index] = "0"
38        else:
39            new_number[picked_index] = "1"
41        # create our new binary string of our bit-flipped number
42        current = ''
43        for i in new_number:
44            current += i
46        # convert that string to an integer
47        current = int(current, 2)
49        # change the number in our byte array to our new number we just constructed
50        data[x] = current
52    return data
55def caveman_bit_flip_two(data):
56    """ code taken from with author's permission """
57    length = len(data) - 4
59    num_of_flips = int(length * .01)
61    picked_indexes = []
63    flip_array = [1, 2, 4, 8, 16, 32, 64, 128]
65    counter = 0
66    while counter < num_of_flips:
67        picked_indexes.append(random.choice(range(0, length)))
68        counter += 1
70    for x in picked_indexes:
71        mask = random.choice(flip_array)
72        data[x] = data[x] ^ mask
74    return data
77contents = bytearray(open('call-rust-from-python/corpus/Canon_40D.jpg', 'rb').read())
80def test_caveman_bit_flip_one(benchmark):
81    # benchmark.pedantic(caveman_bit_flip_one, (contents,), iterations=100000, rounds=10)
82    benchmark(caveman_bit_flip_one, contents)
85def test_caveman_bit_flip_two(benchmark):
86    # benchmark.pedantic(caveman_bit_flip_two, (contents,), iterations=100000, rounds=10)
87    benchmark(caveman_bit_flip_two, contents)
90def test_rust_bit_flip(benchmark):
91    # benchmark.pedantic(rust_exec.bit_flip, (contents,), iterations=100000, rounds=10)
92    benchmark(bit_flipper.bit_flip, contents)

caveman_bit_flip_one and caveman_bit_flip_two are the functions from h0mbre’s posts, simply copied and pasted into our benchmark code for the sake of simplicity.

We can run the benchmarks with poetry run python -m pytest


The first python implementation comes in at 7800 execs per second. The second iteration is better at almost 18000. However, neither python solution comes close to the rust implementation, which reached 680,000 execs per second!

By swapping out the function in h0mbre’s fuzzer, the amount of time spent in bit_flip over 10s of 1000s of iterations amounts to a small fraction of a second.

Not too shabby, I’d say!


There we have it, a relatively brief post showcasing calling rust code from python. I’ve personally used this strategy numerous times to increase performance of python code; it’s a handy tool to have in your toolbox.

If you enjoyed this post, you’ll be glad to know that there are at least a few more FFI-related posts to yank out of the slides. Til next time!

Additional Resources

  1. Rust I Choose You - TCS 2021
  2. Fuzzing Like A Caveman
  3. Fuzzing Like A Caveman 2: Improving Performance
  4. PyO3

comments powered by Disqus