A golden test is a fancy way of doing assertEquals() on a string or a directory tree, where the expected output is kept in a separate file or files -- the golden files. If the actual output does not match the expected output, the test runner can optionally run an interactive file comparison tool such as Meld to display the differences and allow you to selectively merge the differences into the golden file.
This is useful when
- the data being checked is large - too large to embed into the Python source; or
- the data contains relatively inconsequential details, such as boilerplate text or formatting, which might be changed frequently.
class ExampleTest(golden_test.GoldenTestCase): def test_formatting_html(self): obj = make_some_example_object() temp_dir = self.make_temp_dir() format_as_html(obj, temp_dir) self.assert_golden(temp_dir, os.path.join(os.path.dirname(__file__), "golden-files")) if __name__ == "__main__": golden_test.main()By default, the test runs non-interactively, which is what you want on a continuous integration machine, and it will print a diff if it fails. To switch on the semi-interactive mode which runs Meld, you run the test with the option --meld.
Here is a simple version of the test helper (taken from here):
import os import subprocess import sys import unittest class GoldenTestCase(unittest.TestCase): run_meld = False def assert_golden(self, dir_got, dir_expect): assert os.path.exists(dir_expect), dir_expect proc = subprocess.Popen(["diff", "--recursive", "-u", "-N", "--exclude=.*", dir_expect, dir_got], stdout=subprocess.PIPE) stdout, stderr = proc.communicate() if len(stdout) > 0: if self.run_meld: # Put expected output on the right because that is the # side we usually edit. subprocess.call(["meld", dir_got, dir_expect]) raise AssertionError( "Differences from golden files found.\n" "Try running with --meld to update golden files.\n" "%s" % stdout) self.assertEquals(proc.wait(), 0) def main(): if sys.argv[1:2] == ["--meld"]: GoldenTestCase.run_meld = True sys.argv.pop(1) unittest.main()(It's a bit cheesy to modify global state on startup to enable melding, but because unittest doesn't make it easy to pass parameters into tests this is the simplest way of doing it.)
Golden tests have the same sort of advantages that are associated with test-driven development in general.
- Golden files are checked into version control and help to make changesets self-documenting. A changeset that affects the program's output will include patches that demonstrate how the output is affected. You can see the history of the program's output in version control. (This assumes that everyone runs the tests before committing!)
- Sometimes you can point people at the golden files if they want to see example output. For HTML, sometimes you can contrive the CSS links to work so that the HTML looks right when viewed in a browser.
- And of course, this can catch cases where you didn't intend to change the program's output.
Other times, one change will affect many locations in the golden files, and adding a new test is not necessary. It's usually not too difficult to quickly eyeball the differences with Meld.
Here are some of the things that I have used golden files to test:
- formatting of automatically-generated e-mails
- automatically generated configuration files
- HTML formatting logs (build_log_test.py and its golden files)
- pretty-printed output of X Windows messages in xjack-xcb (golden_test.py and golden_test_data). This ended up testing several components in one go:
- the XCB protocol definitions for X11
- the encoders and decoders that work off of the XCB protocol definitions
- the pretty printer for the decoder
It can be tempting to overuse golden tests. As with any test suite, avoid creating one big example that tries to cover all cases. (This is particularly tempting if the test is slow to run.) Try to create smaller, separate examples. The test helper above is not so good at this, because if you are not careful it can end up running Meld several times. In the past I have put several examples into (from unittest's point of view) one test case, so that it runs Meld only once.
Golden tests might not work so well if components can be changed independently. For example, if your XML library changes its whitespace pretty-printing, the tests' output could change. This is less of a problem if your code is deployed with tightly-controlled versions of libraries, because you can just update the golden files when you upgrade libraries.
A note on terminology: I think I got the term "golden file" from my previous workplace, where other people were using them, and the term seems to enjoy some limited use judging from Google. "Golden test", however, may have been a term that I have made up and that no-one else outside my workplace is using for this meaning.