Edit (2024-02-25): This is only here for historical reasons, and to show how much cargo-difftests has changed since it was first released. Although the basic commands are still almost the same, cargo-difftests today has some newer commands which provide an easier and nicer interface to interact with.
A few days ago, I wrote a post about upsilon-difftests, a tool that tells you about the tests that have changed since the last commit / test run, but somewhat tailored to my upsilon project. Since then, I thought it would be nice to extract the core functionality into a few separate crates, and make it available for others to use. In this post, I’ll give a quick introduction to the cargo-difftests crate, and show you how to get the most out of it.
Abstract
cargo-difftests is a tool that works with coverage data, and can tell you which tests are affected by source changes, either since the last commit or based on file system mtimes.
In the next section, I will go over how it achieves this, but if you just want the guide to set it up, feel free to skip to the walkthrough section.
self.json contains the information that you pass about the tests.
self.profraw contains the coverage data for the code of the test binary itself.
cargo_difftests_version contains the version of cargo-difftests that generated the directory.
....profraw are the other profraw files generated by the binaries the test invokes.
cargo-difftests
After we have the directory structure, we can use cargo-difftests to figure out if any of the source files involved in the test have changed.
Under the hood, it will call rust-profdata merge to merge all the .profraw files into a .profdata file, and then call rust-cov export to get the coverage data from the .profdata file. (rust-profdata and rust-cov come from cargo-binutils)
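As a rough illustration of that two-step pipeline, here is a minimal sketch that only builds the two command lines, without running them. The function names are hypothetical, and the flags are the standard llvm-profdata / llvm-cov ones (`merge -sparse ... -o`, `export -instr-profile=...`); the exact arguments cargo-difftests passes are not shown in this post, so treat them as an assumption.

```rust
use std::ffi::OsString;
use std::path::Path;
use std::process::Command;

/// Builds `rust-profdata merge -sparse <profraws...> -o <out>`.
/// (Hypothetical helper; flags are the usual llvm-profdata ones.)
pub fn merge_command(profraws: &[&Path], profdata_out: &Path) -> Command {
    let mut cmd = Command::new("rust-profdata");
    cmd.arg("merge")
        .arg("-sparse")
        .args(profraws)
        .arg("-o")
        .arg(profdata_out);
    cmd
}

/// Builds `rust-cov export -instr-profile=<profdata> <binary>`,
/// which prints the coverage data as JSON on stdout when run.
pub fn export_command(profdata: &Path, test_binary: &Path) -> Command {
    let mut flag = OsString::from("-instr-profile=");
    flag.push(profdata);

    let mut cmd = Command::new("rust-cov");
    cmd.arg("export").arg(flag).arg(test_binary);
    cmd
}
```

Calling `.output()` on the returned commands would actually run the tools, which requires cargo-binutils and the llvm-tools to be installed.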
Optionally, it can export the coverage data JSON into a smaller “test index”, which contains only the information that it actually uses, and nothing more.
In practice (in upsilon), when the exported coverage data is a JSON file of about 40 MiB, the test index comes up to about 20 KiB, so using that for the analysis is a lot faster, almost instant.
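To give an idea of why the index can be so much smaller, here is a minimal sketch of such a reduction. The `ExportedRegion` and `TestIndex` types are hypothetical stand-ins, and keeping only the executed line ranges per file is my assumption for the illustration; the real index format is not described in this post.

```rust
use std::collections::BTreeMap;

/// One region from the exported coverage data (hypothetical, simplified shape).
#[derive(Debug, Clone)]
pub struct ExportedRegion {
    pub file: String,
    pub line_start: u32,
    pub line_end: u32,
    pub execution_count: u64,
}

/// Hypothetical index: for each file, the line ranges the test executed.
pub type TestIndex = BTreeMap<String, Vec<(u32, u32)>>;

/// Keeps only regions that were executed at least once and drops
/// everything else, so files the test never touched disappear entirely.
pub fn build_index(regions: &[ExportedRegion]) -> TestIndex {
    let mut index = TestIndex::new();
    for region in regions.iter().filter(|r| r.execution_count > 0) {
        index
            .entry(region.file.clone())
            .or_default()
            .push((region.line_start, region.line_end));
    }
    index
}
```

Since a single test usually executes only a small part of the code base, most of the exported regions are dropped by a filter like this one.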
After we have some exported coverage data, we can use it to figure out which source files were involved in the test, and then check if any of them have changed.
Note
cargo-difftests will use the mtime of the self.json file to determine when the test was run.
From the coverage data, and depending on the passed --algo option, it will go one of three ways. For the sake of the explanation, we’ll call the modified files “dirty”. If any of the files involved in a test is marked as dirty, then the test is dirty as well.
--algo=fs-mtime (default)
This is the simplest algorithm, and it will just check if any of the source files that have been involved in the test have changed since the test was run (by comparing mtimes). So, the set of dirty files is the set of all files from the repository that have been changed since the test was run.
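The mtime comparison can be sketched in a few lines. This is only an illustration of the rule described above, not the actual implementation; `mtime`, `dirty_files` and `test_is_dirty` are hypothetical names, and the test run time is assumed to come from the mtime of self.json.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Reads the modification time of a file, e.g. of self.json (the test run
/// time) or of a source file that shows up in the coverage data.
pub fn mtime(path: &Path) -> io::Result<SystemTime> {
    fs::metadata(path)?.modified()
}

/// Returns the files that were modified after the test was run.
pub fn dirty_files(test_run: SystemTime, files: &[(PathBuf, SystemTime)]) -> Vec<PathBuf> {
    files
        .iter()
        .filter(|(_, modified)| *modified > test_run)
        .map(|(path, _)| path.clone())
        .collect()
}

/// A test is dirty if at least one of the files it touched is dirty.
pub fn test_is_dirty(test_run: SystemTime, files: &[(PathBuf, SystemTime)]) -> bool {
    !dirty_files(test_run, files).is_empty()
}
```

Note that a plain `touch` on a file is enough to make it dirty under this rule, even if its contents are unchanged.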
--algo=git-diff-files
Very similar to --algo=fs-mtime, but it will diff the HEAD with the worktree to find out which files have changed. Here, the set of dirty files is the set of all files from the repository that have been changed since the last commit (rather than since the last test run).
--algo=git-diff-hunks
This is a more advanced one. Similarly to --algo=git-diff-files, it will diff the HEAD with the worktree, but instead of just checking if the file was modified, it will look at the hunks that were modified, and it will try to intersect them with the regions from the coverage data. To try to put it mathematically, the set of dirty files is the set of all files that changed since the last commit, and each file has to have at least one diff hunk that intersects any of the regions of the coverage data with a count > 0.
It has almost the same drawback as --algo=git-diff-files, that is, it cannot know the state of the repository at the last test run, just at the last commit, but when that is indeed the case, it will yield the most accurate results out of the three.
In git, a hunk is a part of a file that was changed. It is not necessarily a single line; it can span multiple lines. In libgit2, it is identified by a tuple like (old_line_start, old_line_count, new_line_start, new_line_count).
When old_line_count is 0, it means that the hunk is an addition, and when new_line_count is 0, it means that the hunk is a deletion. In other cases, it’s a modification.
The way this algorithm works is that it will intersect (old_line_start..old_line_start + old_line_count) with the regions from the coverage data, and if there is an intersection, it will mark the file as dirty.
This only works if, at the time the tests were run, the file had not been modified since the last commit.
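Here is a minimal sketch of the hunk classification and the intersection described above. The `Hunk` fields mirror the libgit2 tuple; `Region` is a hypothetical, simplified coverage region with an inclusive line range, and `file_is_dirty` is a hypothetical name. It follows the description literally, so a pure addition (an empty old range) never intersects anything here; how the real tool treats that case is not stated in this post.

```rust
use std::ops::Range;

/// A diff hunk, with the same four numbers libgit2 reports.
#[derive(Debug, Clone, Copy)]
pub struct Hunk {
    pub old_line_start: u32,
    pub old_line_count: u32,
    pub new_line_start: u32,
    pub new_line_count: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HunkKind {
    Addition,
    Deletion,
    Modification,
}

impl Hunk {
    pub fn kind(&self) -> HunkKind {
        if self.old_line_count == 0 {
            HunkKind::Addition
        } else if self.new_line_count == 0 {
            HunkKind::Deletion
        } else {
            HunkKind::Modification
        }
    }

    /// The half-open range of lines the hunk covers in the old file.
    pub fn old_range(&self) -> Range<u32> {
        self.old_line_start..self.old_line_start + self.old_line_count
    }
}

/// A coverage region (hypothetical shape): inclusive lines plus execution count.
#[derive(Debug, Clone, Copy)]
pub struct Region {
    pub line_start: u32,
    pub line_end: u32,
    pub count: u64,
}

fn overlaps(hunk: &Range<u32>, region: &Region) -> bool {
    !hunk.is_empty() && hunk.start <= region.line_end && region.line_start < hunk.end
}

/// The file is dirty if any hunk intersects any region with a count > 0.
pub fn file_is_dirty(hunks: &[Hunk], regions: &[Region]) -> bool {
    hunks.iter().any(|hunk| {
        let old = hunk.old_range();
        regions
            .iter()
            .filter(|region| region.count > 0)
            .any(|region| overlaps(&old, region))
    })
}
```

The line numbers in the hunks refer to the file as of the last commit, while the regions refer to the file as it was when the tests ran, which is why the two only line up when the worktree was clean during the test run.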
--algo=git-diff-hunks vs. --algo=git-diff-files
--algo=git-diff-hunks is more accurate than --algo=git-diff-files, provided the last test run happened right after the last commit, with a clean worktree (no source files changed). If the worktree was dirty when the tests were run, --algo=git-diff-hunks fails catastrophically, giving both false positives and false negatives, while --algo=git-diff-files still yields somewhat correct results, albeit with a few false positives.
This is just an overview. We will compare how these algorithms work in practice in the walkthrough, including how --algo=git-diff-hunks can fail where --algo=git-diff-files still works. (At the time of writing, git-diff-hunks was completely broken. Edit: fixed in 0.1.0-alpha.3.)
Walkthrough
Prerequisites
NOTE
Needs rust nightly.
cargo-difftests uses rust-profdata and rust-cov from cargo-binutils under the hood, so you will need to install that first, along with the llvm-tools themselves:
Now, to install cargo-difftests:
Setup
Let us start with a new project:
We will create a new profile called difftests, that will use code coverage:
If we just run cargo run --profile difftests, we will get:
And if we ls, we should have a .profraw file:
We don’t need it, so feel free to delete it:
Great! Now, let’s add some functions we can test:
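As a stand-in for that code, here is a sketch that is consistent with the function names used later in this walkthrough (add, sub, mul, div, div_unchecked). The bodies and signatures are my assumptions, in particular `div` returning an `Option`, and the module is inlined here for brevity while the walkthrough keeps it in src/advanced_arithmetic.rs.

```rust
// src/lib.rs (sketch)

pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

pub fn sub(a: i32, b: i32) -> i32 {
    a - b
}

// In the walkthrough this lives in its own file, src/advanced_arithmetic.rs.
pub mod advanced_arithmetic {
    pub fn mul(a: i32, b: i32) -> i32 {
        a * b
    }

    /// Divides without checking the divisor; panics if `b == 0`.
    pub fn div_unchecked(a: i32, b: i32) -> i32 {
        a / b
    }

    /// Only reaches `div_unchecked` when `b != 0`.
    pub fn div(a: i32, b: i32) -> Option<i32> {
        if b != 0 {
            Some(div_unchecked(a, b))
        } else {
            None
        }
    }
}
```

The split into two files matters later: it is what lets a change in one file leave the tests that only use the other file clean.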
And now we can add some tests:
Running them right now gives us:
And now we should have 3 .profraw files:
One came from the unit tests, one from the integration tests in tests/tests.rs, and one from the doc tests.
Again, we can go ahead and remove them:
Now, we will need cargo-difftests-testclient:
Cargo.toml
And we can go ahead and use it:
tests/tests.rs
NOTE
For it to work, you need to run the tests in separate processes. cargo nextest does that by default, but if you are using cargo test, you will need to do that yourself.
Also, the tests should not exit via ::std::process::exit(code) or abort(), as that will prevent the coverage data from being written to the .profraw file.
Now, we can run the tests:
Now, we finally get to invoke cargo difftests for the first time:
We should get something like this:
As you can see, it’s quite verbose, but it’s also quite easy to see what’s going on. We are only interested in the name of the test and the verdict.
The verdict is always either clean or dirty, and you can then use the test_desc to get the name of the test to rerun.
Let’s touch the file src/lib.rs and see what happens:
cargo difftests analyze-all:
We can see that the test_add and test_sub tests have the “dirty” verdict, while the other tests still have the “clean” verdict. That is because we modified the src/lib.rs file (well technically it’s still the same, but by mtime rules it is different), and only the test_add and test_sub tests used code from src/lib.rs, while the others didn’t. Let’s rerun the test_add and test_sub tests:
Analyzing again:
Should give us "verdict": "clean" for all the tests.
Similarly, if we were to touch src/advanced_arithmetic.rs, we would get the “dirty” verdict for the test_mul, test_div and test_div_2 tests, but test_add and test_sub would still be “clean”.
I mentioned above that cargo difftests uses the file system mtime by default to determine if a file was modified. This works well in most cases, but there are also two git-diff-based algorithms to choose from:
To be able to use them, you need to have a git repository, with at least one commit, so let’s initialize one and commit our files:
In both cases, it’s recommended to rerun the tests right after each commit, so let’s do that:
git-diff-files
What this does is explained above, but let’s see it in action.
If we analyze:
It should give us clean on all tests.
Let’s try adding a few empty lines to src/lib.rs and analyzing again:
Similarly to the mtime algorithm, we get the “dirty” verdict for the test_add and test_sub tests, but the others are still “clean”.
Now, if we remove the empty lines that we added and analyze again:
We should get the “clean” verdict for all the tests.
git-diff-hunks
Edit: this was broken at the time of writing, and was fixed in 0.1.0-alpha.3.
This algorithm is similar to the git-diff-files algorithm, but instead of considering the whole file, it looks only at hunks (groups of lines that were modified). If they were touched by a test, then that test should be considered dirty.
It’s highly recommended you go read the explanation of this in the first part of the blog post before deciding to use this, as it is the most error-prone if not used well, yet can be the most accurate out of all of them.
Let’s try it out:
Let us edit just advanced_arithmetic::div_unchecked:
And analyze:
test_div should be the only dirty test, as it is the only one that uses advanced_arithmetic::div_unchecked. test_div_2 is not dirty, because div_unchecked is only reached if b != 0, and that is not the case in test_div_2.
The problems arise when the profiling data was not collected in a clean working tree.
For example, let us perform the following steps:
Edit file:
Now rerun the tests:
Now if we remove those empty lines, and make div_unchecked return a / b + 1:
Now if we rerun the analysis:
All the tests are considered clean, but that is clearly wrong. This is one of the pitfalls of using --algo=git-diff-hunks: it’s not accurate when the tests were run in a worktree that had uncommitted changes. In this case, --algo=git-diff-files would still work, while --algo=git-diff-hunks gives flat-out incorrect results.
Now the question that would naturally arise:
Which algorithm should I use?
The answer is: it depends. If you understand and can manage the pitfalls of git-diff-hunks, that’s the best option. Otherwise, git-diff-files is another good option; it suffers from the same problems as git-diff-hunks, but they are not as severe. fs-mtime is not as accurate as the git-diff-based ones, but it can (almost) never go wrong (it is actually hard to get it to go wrong), and is therefore the default, so if you’re unsure, just use that.
Test indexes
The cargo difftests analyze-all command can also generate and use test indexes, which are JSON files that contain simpler versions of the extracted profdata files, making subsequent analyze calls a lot faster. In our small sample project, we don’t get much of an improvement, but in a larger project it can be significant. For example, in upsilon I got a 23x speedup (from 7s down to 0.3s) when using indexes.
To use them:
But note that the if-available strategy will only use the index if it exists, and will not generate it if it doesn’t.
The --index-root argument is the path to the directory where the indexes will be stored.
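The if-available behavior boils down to a simple check. Here is a sketch of it, with hypothetical names (`AnalysisInput`, `choose_input_if_available`); how the index files are actually laid out under the index root is not covered here, so the function just takes the path of one index file.

```rust
use std::path::{Path, PathBuf};

/// What the analysis reads from (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum AnalysisInput {
    /// A previously generated test index.
    Index(PathBuf),
    /// The full exported coverage data; no index gets generated.
    FullCoverage,
}

/// Uses the index if it already exists, and otherwise falls back to the
/// full coverage data without creating an index.
pub fn choose_input_if_available(index_file: &Path) -> AnalysisInput {
    if index_file.is_file() {
        AnalysisInput::Index(index_file.to_path_buf())
    } else {
        AnalysisInput::FullCoverage
    }
}
```

With a strategy like this, the speedup only shows up once the indexes have been generated by some earlier step.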
Appendix
Appendix A: Versions
The toolchain used in this guide was nightly-2023-02-03-x86_64-pc-windows-msvc.