Tip

Edit (2024-02-25): This is only here for historical reasons, and to show how much cargo-difftests has changed since it was first released. Although the basic commands are still almost the same, cargo-difftests today has some newer commands which provide an easier and nicer interface to interact with.

A few days ago, I wrote a post about upsilon-difftests, a tool that tells you about the tests that have changed since the last commit / test run, but somewhat tailored to my upsilon project. Since then, I thought it would be nice to extract the core functionality into a few separate crates, and make it available for others to use. In this post, I’ll give a quick introduction to the cargo-difftests crate, and show you how to get the most out of it.

TL;DR

cargo-difftests works with coverage data, and can tell you which tests are affected by source changes made since the last commit, or since the last test run (based on file system mtimes). In the next section, I will go over how it achieves this, but if you just want the guide to set it up, feel free to skip to the walkthrough section.

How does it work?

Similarly to my upsilon-difftests post, I would like to ask you to familiarize yourself with rustc’s instrumentation-based source coverage first, since that is the foundation on which cargo-difftests is built.

After that, we can get started.

cargo-difftests-testclient

cargo-difftests-testclient is a small crate that is used to generate the file system structure that cargo-difftests expects.

It takes a bit of information about the test and a directory. Do note that it will delete the directory if it exists, so make sure you don’t have anything important in there.

It then generates a directory structure that looks like this:

.
|- self.json
|- self.profraw
|- cargo_difftests_version
|- ....profraw
  • self.json contains the information that you pass about the tests.
  • self.profraw contains the coverage data for the code of the test binary itself.
  • cargo_difftests_version contains the version of cargo-difftests that generated the directory.
  • ....profraw are the other .profraw files generated by the binaries the test invokes.
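
To make the layout above concrete, here is a minimal sketch that inspects such a directory using only the standard library. The DifftestDir struct and the inspect_difftest_dir function are hypothetical names made up for this illustration; they are not part of cargo-difftests or cargo-difftests-testclient.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical summary of a difftest directory (not a cargo-difftests type).
#[derive(Debug)]
pub struct DifftestDir {
    pub self_json: PathBuf,
    pub self_profraw: PathBuf,
    pub version_file: PathBuf,
    pub other_profraws: Vec<PathBuf>,
}

/// Returns Ok(None) if one of the three well-known files is missing.
pub fn inspect_difftest_dir(dir: &Path) -> io::Result<Option<DifftestDir>> {
    let self_json = dir.join("self.json");
    let self_profraw = dir.join("self.profraw");
    let version_file = dir.join("cargo_difftests_version");
    if !(self_json.is_file() && self_profraw.is_file() && version_file.is_file()) {
        return Ok(None);
    }

    // Every other *.profraw file is assumed to come from a child binary.
    let mut other_profraws = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_profraw = path.extension().map_or(false, |ext| ext == "profraw");
        if is_profraw && path != self_profraw {
            other_profraws.push(path);
        }
    }
    other_profraws.sort();

    Ok(Some(DifftestDir {
        self_json,
        self_profraw,
        version_file,
        other_profraws,
    }))
}
```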

cargo-difftests

After we have the directory structure, we can use cargo-difftests to figure out if any of the source files involved in the test have changed.

Under the hood, it will call rust-profdata merge to merge all the .profraw files into a .profdata file, and then call rust-cov export to get the coverage data from the .profdata file. (rust-profdata and rust-cov come from cargo-binutils)
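
As a rough sketch of those two steps, the functions below only build the commands, they do not run them. The flags shown (-sparse, -o, -instr-profile) are the usual llvm-profdata and llvm-cov ones; whether cargo-difftests passes exactly these arguments is an assumption on my part, and the function names are made up for the example.

```rust
use std::path::Path;
use std::process::Command;

/// Builds `rust-profdata merge -sparse <profraws...> -o <output>`.
/// (Assumed invocation; cargo-difftests may pass different flags.)
pub fn merge_command(profraws: &[&Path], output: &Path) -> Command {
    let mut cmd = Command::new("rust-profdata");
    cmd.arg("merge")
        .arg("-sparse")
        .args(profraws)
        .arg("-o")
        .arg(output);
    cmd
}

/// Builds `rust-cov export -instr-profile=<profdata> <binary>`.
/// The JSON coverage report is what the command prints to stdout.
pub fn export_command(profdata: &Path, binary: &Path) -> Command {
    let mut cmd = Command::new("rust-cov");
    cmd.arg("export")
        .arg(format!("-instr-profile={}", profdata.display()))
        .arg(binary);
    cmd
}
```

Calling .output() on the returned commands would run them, provided cargo-binutils and the llvm-tools component are installed.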

Optionally, it can export the coverage data json into a smaller “test index”, which contains only the information that it actually uses, and nothing more.

In practice (in upsilon), when the exported coverage data is a JSON file of about 40 MiB, the test index comes to about 20 KiB, so using that for the analysis is a lot faster, almost instant.
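
To give an idea of why the index can be so much smaller, here is a sketch of the reduction, assuming the index only needs to know which line spans of which files were executed at least once. The Region and TestIndex types are hypothetical and do not reflect the actual index format used by cargo-difftests.

```rust
use std::collections::BTreeMap;

/// One coverage region as modeled by this sketch: a line span plus a hit count.
#[derive(Debug, Clone, PartialEq)]
pub struct Region {
    pub file: String,
    pub first_line: u32,
    pub last_line: u32,
    pub count: u64,
}

/// Hypothetical index: for every file, the line spans that ran at least once.
pub type TestIndex = BTreeMap<String, Vec<(u32, u32)>>;

/// Drops every region that was never executed and keeps only the line spans.
pub fn build_index(regions: &[Region]) -> TestIndex {
    let mut index = TestIndex::new();
    for region in regions.iter().filter(|r| r.count > 0) {
        index
            .entry(region.file.clone())
            .or_default()
            .push((region.first_line, region.last_line));
    }
    index
}
```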

After we have some exported coverage data, we can use it to figure out which source files were involved in the test, and then check if any of them have changed.

Note

cargo-difftests will use the mtime of the self.json file to determine when the test was run.

From the coverage data, and depending on the passed --algo option, it decides in one of three ways which files count as modified; for the sake of the explanation, we’ll just call those files “dirty”. If any of the files involved in a test is dirty, then the test is marked as dirty as well.

--algo=fs-mtime (default)

This is the simplest algorithm, and it will just check if any of the source files that have been involved in the test have changed since the test was run (by comparing mtimes). So, the set of dirty files is the set of all files from the repository that have been changed since the test was run.
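
The comparison can be sketched in a few lines. This assumes a file counts as dirty when its mtime is strictly later than the time of the test run; the function names are made up for the example and are not cargo-difftests APIs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Reads the modification time of a file, e.g. of self.json or a source file.
pub fn mtime_of(path: &Path) -> io::Result<SystemTime> {
    fs::metadata(path)?.modified()
}

/// Returns the files whose mtime is later than the time the test was run.
pub fn dirty_files(test_run: SystemTime, files: &[(PathBuf, SystemTime)]) -> Vec<&Path> {
    files
        .iter()
        .filter(|(_, mtime)| *mtime > test_run)
        .map(|(path, _)| path.as_path())
        .collect()
}
```

A test would then be dirty as soon as dirty_files returns a non-empty list for the files it touched.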

--algo=git-diff-files

Very similar to --algo=fs-mtime, but it will diff the HEAD with the worktree to find out which files have changed. Here, the set of dirty files is the set of all files from the repository that have been changed since the last commit (and not test run).
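
As an illustration, the same idea can be sketched on top of the output of git diff --name-only HEAD, which prints one changed path per line. This is only a stand-in for the explanation: the sketch assumes the paths are given relative to the repository root on both sides, and the function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Parses the output of `git diff --name-only HEAD` (one path per line).
pub fn parse_changed_files(git_output: &str) -> BTreeSet<String> {
    git_output
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// A test is dirty if it touched at least one of the changed files.
pub fn dirty_by_changed_files(
    changed: &BTreeSet<String>,
    touched_by_test: &BTreeSet<String>,
) -> bool {
    !changed.is_disjoint(touched_by_test)
}
```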

--algo=git-diff-hunks

This is a more advanced one. Similarly to --algo=git-diff-files, it will diff the HEAD with the worktree, but instead of just checking if the file was modified, it will look at the hunks that were modified, and it will try to intersect them with the regions from the coverage data. To try to put it mathematically, the set of dirty files is the set of all files that changed since the last commit, and each file has to have at least one diff hunk that intersects any of the regions of the coverage data with a count > 0.

It has almost the same drawback as --algo=git-diff-files, that is, it cannot know the state of the repository at the last test run, just at the last commit, but when that is indeed the case, it will yield the most accurate results out of the three.

In git, a hunk is a part of a file that was changed. It’s not always a single line, but it can be multiple lines. In libgit2, it is identified by a tuple like (old_line_start, old_line_count, new_line_start, new_line_count).

When old_line_count is 0, it means that the hunk is an addition, and when new_line_count is 0, it means that the hunk is a deletion. In other cases, it’s a modification.
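
The classification above can be written down directly. The Hunk struct below just mirrors the tuple from the previous paragraph; it is a hypothetical type for this sketch, not the libgit2 one.

```rust
/// Mirrors the (old_line_start, old_line_count, new_line_start, new_line_count) tuple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Hunk {
    pub old_line_start: u32,
    pub old_line_count: u32,
    pub new_line_start: u32,
    pub new_line_count: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HunkKind {
    Addition,
    Deletion,
    Modification,
}

/// Classifies a hunk by which side of the diff is empty.
pub fn classify(hunk: &Hunk) -> HunkKind {
    if hunk.old_line_count == 0 {
        HunkKind::Addition
    } else if hunk.new_line_count == 0 {
        HunkKind::Deletion
    } else {
        HunkKind::Modification
    }
}
```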

The way this algorithm works is that it will intersect (old_line_start..old_line_start + old_line_count) with the regions from the coverage data, and if there is an intersection, it will mark the file as dirty.

This only works if the file was not modified since the last commit when the tests were run.
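
Here is a small sketch of that intersection, under the assumption that a coverage region can be reduced to an inclusive line span plus an execution count. The types and function names are made up for the example. Note that, read literally, a pure addition has an empty old range and therefore never intersects anything in this sketch; how cargo-difftests itself treats additions is not covered here.

```rust
/// A coverage region reduced to what this sketch needs.
#[derive(Debug, Clone, Copy)]
pub struct Region {
    pub first_line: u32, // inclusive
    pub last_line: u32,  // inclusive
    pub count: u64,      // how many times the region was executed
}

/// The old side of a hunk: the half-open line range
/// old_line_start..old_line_start + old_line_count.
#[derive(Debug, Clone, Copy)]
pub struct OldRange {
    pub old_line_start: u32,
    pub old_line_count: u32,
}

/// True if the hunk overlaps a region that was executed at least once.
pub fn hunk_hits_region(hunk: OldRange, region: Region) -> bool {
    let hunk_end = hunk.old_line_start + hunk.old_line_count; // exclusive
    hunk.old_line_count > 0 // an empty range cannot intersect anything
        && region.count > 0
        && hunk.old_line_start <= region.last_line
        && region.first_line < hunk_end
}

/// A file is dirty if any of its hunks hits any executed region.
pub fn file_is_dirty(hunks: &[OldRange], regions: &[Region]) -> bool {
    hunks
        .iter()
        .any(|hunk| regions.iter().any(|region| hunk_hits_region(*hunk, *region)))
}
```

Because the line numbers on both sides are compared directly, the regions must have been recorded against the same file contents as the last commit, which is exactly the restriction stated above.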

--algo=git-diff-hunks vs. --algo=git-diff-files

--algo=git-diff-hunks will be more accurate than --algo=git-diff-files, assuming the last test run happened right after the last commit, with a clean worktree (that is, none of the source files had been changed). If the worktree of the repository was dirty when the test was run, --algo=git-diff-hunks can fail catastrophically, giving both false positives and false negatives, while --algo=git-diff-files would still yield somewhat correct results, although with a few false positives.

This is just an overview; we will compare how these algorithms work in practice in the walkthrough, including how --algo=git-diff-hunks can fail where --algo=git-diff-files still works. (Edit: git-diff-hunks was completely broken in earlier releases; this was fixed in 0.1.0-alpha.3.)

Walkthrough

Prerequisites

NOTE

Needs rust nightly.

cargo-difftests uses rust-profdata and rust-cov from cargo-binutils under the hood, so you will need to install that first, along with the llvm-tools themselves:

rustup component add llvm-tools-preview
cargo install cargo-binutils

Now, to install cargo-difftests:

cargo install cargo-difftests --version 0.1.0-alpha.3

Setup

Let us start with a new project:

cargo new --bin cargo-difftests-sample-project

We will create a new profile called difftests, that will use code coverage:

# .cargo/config.toml
[profile.difftests]
inherits = "dev"
rustflags = [
    "-C", "instrument-coverage", # flag required for instrumentation-based code coverage
    "--cfg", "cargo_difftests", # cfg required for cargo-difftests-testclient,
    # more on it in a second
]
 
[unstable]
profile-rustflags = true

If we just run cargo run --profile difftests, we will get:

> cargo run --profile difftests -q
Hello, world!

And if we ls, we should have a .profraw file:

> ls
Cargo.lock  default_8281569816464993346_0_147888.profraw  target/
Cargo.toml  src/

We don’t need it, so feel free to delete it:

rm default_*.profraw

Great! Now, let’s add some functions we can test:

// src/lib.rs
 
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}
 
pub fn sub(a: i32, b: i32) -> i32 {
    a - b
}
 
pub mod advanced_arithmetic;
 
pub use advanced_arithmetic::*;

// src/advanced_arithmetic.rs
 
pub fn mul(a: i32, b: i32) -> i32 {
    a * b
}
 
pub fn div_unchecked(a: i32, b: i32) -> i32 {
    a / b
}
 
pub fn div(a: i32, b: i32) -> Option<i32> {
    if b != 0 {
        Some(div_unchecked(a, b))
    } else {
        None
    }
}

And now we can add some tests:

// tests/tests.rs
 
use cargo_difftests_sample_project::*;
 
#[test]
fn test_add() {
    assert_eq!(add(1, 2), 3);
}
 
#[test]
fn test_sub() {
    assert_eq!(sub(3, 2), 1);
}
 
#[test]
fn test_mul() {
    assert_eq!(mul(2, 3), 6);
}
 
#[test]
fn test_div() {
    assert_eq!(div(6, 3), Some(2));
}
 
#[test]
fn test_div_2() {
    assert_eq!(div(6, 0), None);
}

Running them right now gives us:

> cargo t --profile difftests
   Compiling cargo-difftests-sample-project v0.1.0 (C:\Users\Dinu\samples\cargo-difftests-sample-project)
    Finished difftests [unoptimized + debuginfo] target(s) in 0.66s
     Running unittests src\lib.rs (target\difftests\deps\cargo_difftests_sample_project-0fa293eef4b2f5f9.exe)
 
running 0 tests
 
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
 
     Running unittests src\main.rs (target\difftests\deps\cargo_difftests_sample_project-3c5054455458f422.exe)
 
running 0 tests
 
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
 
     Running tests\tests.rs (target\difftests\deps\tests-53cb4ce840823521.exe)
 
running 5 tests
test test_add ... ok
test test_div ... ok
test test_sub ... ok
test test_mul ... ok
test test_div_2 ... ok
 
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
 
   Doc-tests cargo-difftests-sample-project
 
running 0 tests
 
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
 

And now we should have 3 .profraw files:

> ls
Cargo.lock
Cargo.toml
default_14538582753082375997_0_149916.profraw
default_17956391759092769319_0_141152.profraw
default_323744082823911785_0_145776.profraw
src/
target/
tests/

One came from the unit tests, one from the integration tests in tests/tests.rs, and one from the doc tests.

Again, we can go ahead and remove them:

rm default_*.profraw

Now, we will need cargo-difftests-testclient:

Cargo.toml

[package]
name = "cargo-difftests-sample-project"
version = "0.1.0"
edition = "2021"
 
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
[dependencies]
 
[dev-dependencies]
cargo-difftests-testclient = "0.1.0-alpha.3"

And we can go ahead and use it:

tests/tests.rs

use cargo_difftests_sample_project::*;
 
fn setup_difftests(test_name: &str) {
    #[cfg(cargo_difftests)] // the cargo_difftests_testclient crate is empty
    // without this cfg
    {
        // the temporary directory where we will store everything we need.
        // this should be passed to various `cargo difftests` subcommands as the
        // `--dir` option.
        let tmpdir = std::path::PathBuf::from(env!("CARGO_TARGET_TMPDIR"))
            .join("cargo-difftests").join(test_name);
        let difftests_env = cargo_difftests_testclient::init(
            cargo_difftests_testclient::TestDesc {
                // a "description" of the test.
                // cargo-difftests doesn't care about what you put here
                // (except for the bin_path field) but it is your job to use
                // the data in here to identify the test
                // and rerun it if needed.
                // those fields are here to guide you, but you can add any other
                // fields you might need (see the `other_fields` field below)
                pkg_name: env!("CARGO_PKG_NAME").to_string(),
                crate_name: env!("CARGO_CRATE_NAME").to_string(),
                bin_name: option_env!("CARGO_BIN_NAME").map(ToString::to_string),
                bin_path: std::env::current_exe().unwrap(),
                test_name: test_name.to_string(),
                other_fields: std::collections::HashMap::new(), // any other
                // fields you might want to add, to identify the test.
                // (the map is of type HashMap<String, String>)
            },
            &tmpdir,
        ).unwrap();
        // right now, the difftests_env is not used, but if
        // spawning children, it is needed to pass some environment variables to
        // them, like this:
        //
        // cmd.envs(difftests_env.env_for_children());
    }
}
 
#[test]
fn test_add() {
    setup_difftests("test_add");
    assert_eq!(add(1, 2), 3);
}
 
#[test]
fn test_sub() {
    setup_difftests("test_sub");
    assert_eq!(sub(3, 2), 1);
}
 
#[test]
fn test_mul() {
    setup_difftests("test_mul");
    assert_eq!(mul(2, 3), 6);
}
 
#[test]
fn test_div() {
    setup_difftests("test_div");
    assert_eq!(div(6, 3), Some(2));
}
 
#[test]
fn test_div_2() {
    setup_difftests("test_div_2");
    assert_eq!(div(6, 0), None);
}

NOTE

For it to work, you need to run the tests in separate processes. cargo nextest does that by default, but if you are using cargo test, you will need to do that yourself.

Also, the tests should not terminate via std::process::exit(code) or abort(), as that will prevent the coverage data from being written to the .profraw file.
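
If you are using cargo test, one way to get one process per test is to invoke it once per test name. A minimal sketch of building such an invocation is below; the function name is made up, and the arguments simply mirror the cargo t commands used in this walkthrough.

```rust
use std::process::Command;

/// Builds `cargo t --profile difftests --test <test_binary> <test_name> -- --exact`,
/// so that exactly one test runs in its own process.
pub fn single_test_command(test_binary: &str, test_name: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args([
        "t",
        "--profile",
        "difftests",
        "--test",
        test_binary,
        test_name,
        "--",
        "--exact",
    ]);
    cmd
}
```

Calling .status() on the result for each test name in turn would run the tests one after another, each in a fresh process.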

Now, we can run the tests:

cargo t --profile difftests --test tests test_add
cargo t --profile difftests --test tests test_sub
cargo t --profile difftests --test tests test_mul
cargo t --profile difftests --test tests test_div -- --exact
cargo t --profile difftests --test tests test_div_2

Now, we finally get to invoke cargo difftests for the first time:

cargo difftests analyze-all # since we used the default directory
cargo difftests analyze-all --dir target/tmp/cargo-difftests # explicit

We should get something like this:

[
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_add",
      "self_profraw": "target/tmp/cargo-difftests\\test_add\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_add\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_add\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_add\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_add",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_div",
      "self_profraw": "target/tmp/cargo-difftests\\test_div\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_div\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_div\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_div\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_div",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_div_2",
      "self_profraw": "target/tmp/cargo-difftests\\test_div_2\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_div_2\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_div_2\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_div_2\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_div_2",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_mul",
      "self_profraw": "target/tmp/cargo-difftests\\test_mul\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_mul\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_mul\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_mul\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_mul",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_sub",
      "self_profraw": "target/tmp/cargo-difftests\\test_sub\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_sub\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_sub\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_sub\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_sub",
      "other_fields": {}
    },
    "verdict": "clean"
  }
]

As you can see, it’s quite verbose, but it’s also quite easy to see what’s going on. We are only interested in the name of the test and the verdict.

Verdict is always either clean or dirty, and then you can use the test_desc to get the name of the test to rerun.
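
Once the JSON is parsed (with a JSON library of your choice, which is not shown here), picking the tests to rerun is a simple filter. The AnalysisEntry struct below is a hypothetical, trimmed-down view of one element of the output, keeping only the test name and the verdict.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Clean,
    Dirty,
}

/// Trimmed-down, hypothetical view of one entry of the analyze-all output.
#[derive(Debug, Clone)]
pub struct AnalysisEntry {
    pub test_name: String,
    pub verdict: Verdict,
}

/// Returns the names of all tests whose verdict is "dirty".
pub fn tests_to_rerun(entries: &[AnalysisEntry]) -> Vec<&str> {
    entries
        .iter()
        .filter(|entry| entry.verdict == Verdict::Dirty)
        .map(|entry| entry.test_name.as_str())
        .collect()
}
```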

Let’s touch the file src/lib.rs and see what happens:

touch src/lib.rs

cargo difftests analyze-all:

[
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_add",
      "self_profraw": "target/tmp/cargo-difftests\\test_add\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_add\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_add\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_add\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_add",
      "other_fields": {}
    },
    "verdict": "dirty"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_div",
      "self_profraw": "target/tmp/cargo-difftests\\test_div\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_div\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_div\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_div\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_div",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_div_2",
      "self_profraw": "target/tmp/cargo-difftests\\test_div_2\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_div_2\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_div_2\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_div_2\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_div_2",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_mul",
      "self_profraw": "target/tmp/cargo-difftests\\test_mul\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_mul\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_mul\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_mul\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_mul",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_sub",
      "self_profraw": "target/tmp/cargo-difftests\\test_sub\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_sub\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_sub\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_sub\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_sub",
      "other_fields": {}
    },
    "verdict": "dirty"
  }
]

We can see that the test_add and test_sub tests have the “dirty” verdict, while the other tests still have the “clean” verdict. That is because we modified the src/lib.rs file (well technically it’s still the same, but by mtime rules it is different), and only the test_add and test_sub tests used code from src/lib.rs, while the others didn’t. Let’s rerun the test_add and test_sub tests:

cargo t --profile difftests --test tests test_add
cargo t --profile difftests --test tests test_sub

Analyzing again:

cargo difftests analyze-all

Should give us "verdict": "clean" for all the tests.

Similarly, if we were to touch src/advanced_arithmetic.rs, we would get the “dirty” verdict for the test_mul, test_div and test_div_2 tests, but test_add and test_sub would still be “clean”.

I mentioned above that cargo difftests used the file system mtime by default to determine if a file was modified. This works well in most cases, but it also has 2 other git-diff based algorithms to choose from:

cargo difftests analyze-all --algo git-diff-files
# and
cargo difftests analyze-all --algo git-diff-hunks

To be able to use them, you need to have a git repository, with at least one commit, so let’s initialize one and commit our files:

git init
git add .
git commit -m "Initial commit"

In both cases, it’s recommended to rerun the tests right after each commit, so let’s do that:

cargo t --profile difftests --test tests test_add
cargo t --profile difftests --test tests test_sub
cargo t --profile difftests --test tests test_mul
cargo t --profile difftests --test tests test_div -- --exact
cargo t --profile difftests --test tests test_div_2

git-diff-files

What this does is explained above, but let’s see it in action.

If we analyze:

cargo difftests analyze-all --algo git-diff-files

It should give us clean on all tests.

Let’s try adding a few empty lines to src/lib.rs and analyzing again:

cargo difftests analyze-all --algo git-diff-files

Similarly to the mtime algorithm, we get the “dirty” verdict for the test_add and test_sub tests, but the others are still “clean”.

Now, if we remove the empty lines that we added and analyze again:

git reset --hard HEAD
cargo difftests analyze-all --algo git-diff-files

We should get the “clean” verdict for all the tests.

git-diff-hunks

Edit: this algorithm was broken in earlier releases; it was fixed in 0.1.0-alpha.3.

This algorithm is similar to the git-diff-files algorithm, but instead of considering the whole file, it looks only at hunks (groups of lines that were modified). If they were touched by a test, then that test should be considered dirty.

It’s highly recommended you go read the explanation of this in the first part of the blog post before deciding to use this, as it is the most error-prone if not used well, yet can be the most accurate out of all of them.

Let’s try it out:

git reset --hard HEAD # reset to HEAD
cargo t --profile difftests --test tests test_add -- --exact
cargo t --profile difftests --test tests test_sub -- --exact
cargo t --profile difftests --test tests test_mul -- --exact
cargo t --profile difftests --test tests test_div -- --exact
cargo t --profile difftests --test tests test_div_2 -- --exact

Let us edit just advanced_arithmetic::div_unchecked:

// src/advanced_arithmetic.rs
 
pub fn mul(a: i32, b: i32) -> i32 {
    a * b
}
 
pub fn div_unchecked(a: i32, b: i32) -> i32 {
    a / b // b is guaranteed to be != 0 // we modified this line
}
 
pub fn div(a: i32, b: i32) -> Option<i32> {
    if b != 0 {
        Some(div_unchecked(a, b))
    } else {
        None
    }
}

And analyze:

cargo difftests analyze-all --algo git-diff-hunks

test_div should be the only dirty test, as it is the only one that uses advanced_arithmetic::div_unchecked. test_div_2 is not dirty, because div_unchecked is only reached if b != 0, and that is not the case in test_div_2.

The problems arise when the profiling data was not collected in a clean working tree.

For example, let us perform the following steps:

git reset --hard HEAD

Edit file:

// src/advanced_arithmetic.rs
 
pub fn mul(a: i32, b: i32) -> i32 {
    a * b
}
 
// add a few empty lines
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
pub fn div_unchecked(a: i32, b: i32) -> i32 {
    a / b
}
 
pub fn div(a: i32, b: i32) -> Option<i32> {
    if b != 0 {
        Some(div_unchecked(a, b))
    } else {
        None
    }
}

Now rerun the tests:

cargo t --profile difftests --test tests test_add -- --exact
cargo t --profile difftests --test tests test_sub -- --exact
cargo t --profile difftests --test tests test_mul -- --exact
cargo t --profile difftests --test tests test_div -- --exact
cargo t --profile difftests --test tests test_div_2 -- --exact

Now if we remove those empty lines, and make div_unchecked return a / b + 1:

// src/advanced_arithmetic.rs
 
pub fn mul(a: i32, b: i32) -> i32 {
    a * b
}
 
// add a few empty lines here
pub fn div_unchecked(a: i32, b: i32) -> i32 {
    a / b + 1
}
 
pub fn div(a: i32, b: i32) -> Option<i32> {
    if b != 0 {
        Some(div_unchecked(a, b))
    } else {
        None
    }
}

Now if we rerun the analysis:

cargo difftests analyze-all --algo git-diff-hunks

We get:

[
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_add",
      "self_profraw": "target/tmp/cargo-difftests\\test_add\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_add\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_add\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_add\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_add",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_div",
      "self_profraw": "target/tmp/cargo-difftests\\test_div\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_div\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_div\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_div\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_div",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_div_2",
      "self_profraw": "target/tmp/cargo-difftests\\test_div_2\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_div_2\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_div_2\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_div_2\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_div_2",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_mul",
      "self_profraw": "target/tmp/cargo-difftests\\test_mul\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_mul\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_mul\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_mul\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_mul",
      "other_fields": {}
    },
    "verdict": "clean"
  },
  {
    "difftest": {
      "dir": "target/tmp/cargo-difftests\\test_sub",
      "self_profraw": "target/tmp/cargo-difftests\\test_sub\\self.profraw",
      "other_profraws": [],
      "self_json": "target/tmp/cargo-difftests\\test_sub\\self.json",
      "profdata_file": "target/tmp/cargo-difftests\\test_sub\\merged.profdata",
      "exported_profdata_file": "target/tmp/cargo-difftests\\test_sub\\exported.json",
      "index_data": null
    },
    "test_desc": {
      "pkg_name": "cargo-difftests-sample-project",
      "crate_name": "tests",
      "bin_name": null,
      "bin_path": "C:\\Users\\Dinu\\samples\\cargo-difftests-sample-project\\target\\difftests\\deps\\tests-53cb4ce840823521.exe",
      "test_name": "test_sub",
      "other_fields": {}
    },
    "verdict": "clean"
  }
]

All the tests are considered clean, but that is clearly wrong: test_div calls div_unchecked, whose result we just changed. This is one of the pitfalls of using --algo=git-diff-hunks: it’s not accurate when the tests were run in a worktree that had uncommitted changes. In this case, --algo=git-diff-files would still work, while --algo=git-diff-hunks gives flat-out incorrect results.

Now the question that would naturally arise:

Which algorithm should I use?

The answer is: it depends. If you understand and can manage the pitfalls of git-diff-hunks, that’s the best option. Otherwise, git-diff-files is another good option; it suffers from the same problems as git-diff-hunks, but they are not as severe. fs-mtime is not as accurate as the git-diff-based ones, but it is actually hard to get it to go wrong, which is why it is the default. So if you’re unsure, just use that.

Test indexes

The cargo difftests analyze-all command can also generate and use test indexes, which are JSON files that contain simpler versions of the extracted profdata files, making subsequent analyze calls a lot faster. In our small sample project, we don’t get much of an improvement, but in a larger project it can be significant. For example, in upsilon I got a 23x speedup (from 7s down to 0.3s) when using indexes.

To use them:

cargo difftests analyze-all --index-root ... --index-strategy always
# or
cargo difftests analyze-all --index-root ... --index-strategy if-available

But note that the if-available strategy will only use the index if it exists, and will not generate it if it doesn’t.

The --index-root argument is the path to the directory where the indexes will be stored.
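
The difference between the two strategies can be summarized as a small decision table. This sketch assumes that always generates the index when it is missing and reuses it when it already exists; the enum and function names are made up and are not taken from the cargo-difftests source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IndexStrategy {
    Always,
    IfAvailable,
}

/// Where the analysis reads its data from (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataSource {
    ExistingIndex,
    FreshlyGeneratedIndex,
    ExportedCoverage,
}

/// Decides what to read, given the strategy and whether an index file exists.
pub fn choose_source(strategy: IndexStrategy, index_exists: bool) -> DataSource {
    match (strategy, index_exists) {
        // Assumption: an existing index is reused under both strategies.
        (_, true) => DataSource::ExistingIndex,
        // Assumption: `always` creates the missing index first.
        (IndexStrategy::Always, false) => DataSource::FreshlyGeneratedIndex,
        // `if-available` never generates an index; it falls back to the full data.
        (IndexStrategy::IfAvailable, false) => DataSource::ExportedCoverage,
    }
}
```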

Appendix

Appendix A: Versions

The toolchain used in this guide was nightly-2023-02-03-x86_64-pc-windows-msvc.

cargo-difftests version: 0.1.0-alpha.3

Appendix B: Repository

The repository for this guide can be found here.