The approach is simple: use git to track the files that generate output.
Git automatically assigns each commit a unique 40-character hexadecimal string (a "hash") that identifies the state of the repository at that commit.
By saving the hash when an output file is created, we know exactly which version of the code created that output.
With data files, it is simple to add an extra variable containing the hash.
With figures, I use the metadata fields to save the hash value.
Getting the current hash in MATLAB
The following MATLAB function, githash, returns the hash of the last commit that modified the file fname. If fname is not provided, it returns the hash of the last commit in the repository. The optional second argument, gitdir, is passed to git as --git-dir.
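function [hash] = githash(fname, gitdir)
% GITHASH  Hash of the last commit that modified fname.
%   hash = githash()              last commit in the repository
%   hash = githash(fname)         last commit that modified fname
%   hash = githash(fname, gitdir) same, using the .git directory gitdir
if nargin < 1 || isempty(fname)
    fname = '.';
end
if nargin < 2 || isempty(gitdir)
    gitdir = '';
else
    gitdir = ['--git-dir=''' gitdir ''' '];
end
% "--" keeps git from mistaking fname for a revision name
[status, hashout] = system(['TERM=xterm git ' gitdir ...
    'log -n 1 --no-color --pretty=format:%H -- ''' ...
    fname ''' < /dev/null']);
if status ~= 0
    error('githash:gitFailed', 'git log failed: %s', hashout);
end
% the shell may prepend terminal escape characters, so extract the
% 40-digit hash instead of relying on its position in the output
hash = regexp(hashout, '[0-9a-f]{40}', 'match', 'once');
end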
Using it in a MATLAB script requires the incantation
hash = githash([mfilename('fullpath') '.m']);
which calls githash with the full path of the m-file that is calling it.
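The git query wrapped by githash can also be run outside MATLAB. A minimal bash sketch of the same idea, assuming git is on the PATH (the shell function reuses the name githash purely for illustration):

```bash
#!/usr/bin/env bash
# githash [FILE]: print the hash of the last commit that modified FILE.
# With no argument it uses ".", i.e. the last commit touching the current
# directory (the whole repository when run from its top level).
githash() {
    local fname="${1:-.}"
    # "--" stops git from mistaking a file name for a revision
    git log -n 1 --no-color --pretty=format:%H -- "$fname"
}
```

Inside a bash script, hash=$(githash "${BASH_SOURCE[0]}") plays the role of the mfilename incantation: it records the last commit that modified the script itself.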
Quite frequently, I calculate diagnostics that take a while to run, so rerunning them every time I make an image is not feasible. I save the hash variable to the file containing the diagnostic output, which tells me which version of the code created that saved output.
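In MATLAB the hash is just one more variable in the saved file. For plain-text output, a comparable bash sketch writes the hash as a header line; the helper name save_with_hash and the "# git hash:" header format are hypothetical choices, not the author's format:

```bash
#!/usr/bin/env bash
# save_with_hash OUTFILE SCRIPT (hypothetical helper): write stdin to OUTFILE,
# preceded by a header line recording the last commit that modified SCRIPT.
save_with_hash() {
    local outfile=$1 script=$2 hash
    hash=$(git log -n 1 --pretty=format:%H -- "$script") || return 1
    if [ -z "$hash" ]; then
        # SCRIPT has never been committed, so there is no hash to record
        echo "save_with_hash: no commit found for $script" >&2
        return 1
    fi
    {
        printf '# git hash: %s\n' "$hash"
        cat
    } > "$outfile"
}
```

For example, ./slow_diagnostics.sh | save_with_hash diagnostics.txt slow_diagnostics.sh (file names hypothetical) saves the output, and sed -n 's/^# git hash: //p' diagnostics.txt reads the hash back.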
Using the hash
MATLAB's File Exchange has a couple of useful scripts, including getAnnotation, that insert and recover metadata in MATLAB figure windows.
An obvious choice is to save the hash. More importantly, one can save the exact function call that generated a figure. Then, you know two things:
- the version of the code that created the figure, and
- all parameters provided to the code;
both of which are saved in the metadata of the figure itself.
getAnnotation can then recover the saved metadata when saving a figure to file.
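The same two pieces of information, code version and parameters, can be recorded for command-line programs. A minimal bash sketch, where provenance_header is a hypothetical helper and printf %q (which shell-quotes each word) is a bash feature:

```bash
#!/usr/bin/env bash
# provenance_header SCRIPT [ARGS...] (hypothetical helper): print comment lines
# recording the last commit that modified SCRIPT and the exact call, with each
# word shell-quoted by printf %q so the call can be copied back into bash.
provenance_header() {
    local hash
    hash=$(git log -n 1 --pretty=format:%H -- "$1") || return 1
    printf '# git hash: %s\n' "$hash"
    printf '# command:'
    printf ' %q' "$@"
    printf '\n'
}
```

A script could start with provenance_header "$0" "$@" > run.log; the quoting means an argument such as "two words" is logged as two\ words, so the recorded call reproduces the same parameters.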
Saving the hash in an image file
In general, all you need is a line that looks like
system(['exiftool -overwrite_original -Producer=' ...
        hash ' ' pdf_nam]);
The above tells exiftool to save the contents of the variable hash in the metadata field Producer of the file named pdf_nam. The slight complication is that metadata field names are not standardized across image formats.
exiftool is only required to modify the metadata fields of PDF and EPS files; MATLAB's imwrite can write metadata to bitmap files (e.g. PNG) itself. The code that handles hash in my fork of export_fig.m shows how imwrite can be used for this.
Extracting commit hash from image metadata
To recover the recorded hash, it suffices to call exiftool FILENAME, which prints all metadata stored in the image, not just the hash. grep can then pick out the recorded hash:
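#!/bin/bash
# displays the saved git hash of a provided file using exiftool
file=$1
# exiftool pads tag names before the colon ("Producer    : <hash>"), so match
# the 40-character hexadecimal hash itself instead of the text "hash:"
hash=$(exiftool "$file" | grep -iE '[0-9a-f]{40}')
echo "$hash"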
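Once the hash is known, git can print the exact code that produced the figure. A minimal bash sketch, assuming exiftool is installed and the hash was stored as a plain 40-character hexadecimal string; first_hash and code_for_figure are hypothetical helper names:

```bash
#!/usr/bin/env bash
# first_hash: print the first 40-character hexadecimal string read from stdin
first_hash() {
    grep -oiE '[0-9a-f]{40}' | head -n 1
}

# code_for_figure FIGURE PATH (hypothetical helper): print PATH as it was in
# the commit whose hash is stored in FIGURE's metadata
code_for_figure() {
    local hash
    hash=$(exiftool "$1" | first_hash)
    if [ -z "$hash" ]; then
        echo "code_for_figure: no hash found in $1" >&2
        return 1
    fi
    git show "$hash:$2"
}
```

For example, code_for_figure fig.pdf analysis/make_fig.m (file names hypothetical) prints make_fig.m as it was when fig.pdf was made. In git show HASH:PATH the path is taken relative to the repository's top level.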