Imsi Tracking
imsi has been built to facilitate the setup, configuration, and running of
complex physics-based models on different HPC platforms. For these problems,
there are countless degrees of freedom involved:
model code version
physical parameters/settings
compiler configuration (e.g., compiler used, optimization settings)
technical parameters/settings (e.g., MPI layout)
sequencing configuration
machine-specific settings
etc…
Together, these settings make reproducibility hard to ensure, as it is easy for
a human to lose track of everything that has been activated. To help with this, a tracking toolkit has been
added to imsi. Specifically, it tracks:
What imsi commands were executed to set up and manipulate the run
What config files are actually used by the simulation and how they have changed throughout it
What version of the source code was used for the simulation and whether any changes occurred during it
What is tracked?
At a high level, there are three distinct items that need to be considered to reproduce a run:
what version of the source repo was used?
what version of the config files were used?
what machine was the simulation run on?
The third item is implicitly tracked in the config files. As such, the majority of the
imsi tracking system is devoted to tracking the status/version of the source repo under src/
and the config files under config/. Note that, to facilitate tracking of the files under config/,
imsi initializes that directory as a local git repo.
For each of these directories, imsi tracks:
the commit hash
the status of the repo
any diffs found in the repo
These details are stored under .imsi/states within the run’s setup directory.
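To see what these three records correspond to in plain git, the shell sketch below approximates them for a single repository. It is not imsi's implementation: the `snapshot_repo` function, its arguments, and the use of `git diff HEAD` (which includes staged as well as unstaged changes) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: approximate imsi's per-repo records with plain git commands.
# snapshot_repo and its arguments are hypothetical; imsi's own file naming
# (src_*_rev.txt, etc.) may encode more than the prefix used here.
snapshot_repo() {
  local repo=$1 prefix=$2 out=$3
  # Keep $out outside $repo, otherwise the snapshot files themselves
  # show up as untracked entries in the status record.
  mkdir -p "$out"
  git -C "$repo" rev-parse HEAD     > "$out/${prefix}_rev.txt"     # commit hash
  git -C "$repo" status --porcelain > "$out/${prefix}_status.txt"  # changed/untracked files
  git -C "$repo" diff HEAD          > "$out/${prefix}_diff.diff"   # content of tracked changes
}
```

Untracked files never appear in `git diff`, which is why the status record is needed alongside the diff.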
The imsi states directory
To identify each distinct state of the config and src directories, imsi uses md5sum to checksum
their contents and produce a single hash representing the state of these directories. These hashes are
the directory names users will find under
.imsi/states
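As an illustration of how a directory's contents can be reduced to one md5 hash, the sketch below checksums every file in a stable (sorted) order and then hashes that list, so a change to any file's content or name changes the result. The `dir_checksum` name and the decision to skip the top-level `.git/` directory are assumptions; the exact inputs imsi feeds to md5sum are not described here, so its hashes will not match these.

```shell
#!/usr/bin/env bash
# Sketch only: derive one md5 hash from the contents of a directory.
# dir_checksum is hypothetical and is NOT imsi's actual hashing scheme.
dir_checksum() {
  local dir=$1
  (
    cd "$dir" || exit 1
    # List files (excluding ./.git/) in a locale-independent sorted order,
    # checksum each one, then checksum that combined "hash  path" listing.
    find . -type f -not -path './.git/*' -print0 \
      | LC_ALL=C sort -z \
      | xargs -0 md5sum \
      | md5sum \
      | cut -d' ' -f1
  )
}
```

Sorting matters because `find` returns files in filesystem order, which can differ between otherwise identical directories.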
Under each hash directory, you can find:
src_*_rev.txt
src_*_status.txt
src_*_diff.diff
config_*_rev.txt
config_*_status.txt
config_*_diff.diff
where (for each relevant repo):
*_rev.txt files contain the current git commit hashes
*_status.txt files contain information on which files have changes
*_diff.diff files contain the actual git diff output
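To review recorded states after the fact, a small loop over `.imsi/states` can print the commit hashes each state holds. The helper below (`list_states`) is hypothetical and relies only on the file layout listed above; note that states are printed in hash order, not in the order they were recorded.

```shell
#!/usr/bin/env bash
# Sketch only: print the commit hashes stored in every recorded state.
# list_states is hypothetical; it assumes the layout
# .imsi/states/<hash>/{src,config}_*_rev.txt
list_states() {
  local setup_dir=$1 state f
  for state in "$setup_dir"/.imsi/states/*/; do
    [ -d "$state" ] || continue   # glob did not match: no states recorded
    echo "== $(basename "$state")"
    for f in "$state"src_*_rev.txt "$state"config_*_rev.txt; do
      if [ -f "$f" ]; then
        echo "$(basename "$f"): $(cat "$f")"
      fi
    done
  done
}
```

For a line-by-line comparison of two states, `diff -r` on the two hash directories works as well.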
When does tracking occur?
By default, imsi only logs the above information for certain CLI commands, specifically:
imsi config
imsi reload
imsi set
imsi build
imsi submit
imsi save-restarts
In addition to tracking the config/ and src/ repos, imsi also keeps a
log of CLI commands at .imsi-cli.log in the setup directory.
Note
Because of how imsi ensemble is implemented, if the above commands are executed
using imsi ensemble <command>, imsi still logs the necessary information for
each member of the ensemble.
How to add tracking points
While the logging points above provide a good default state-logging framework, users might wish to log state explicitly at other points in their job scripts. For example, some groups might want to record the state right before the model launches, so that no local user changes go unnoticed. To do this, users can instrument their scripts with
imsi log-state -m "USEFUL LOGGING MESSAGE" -p /path/to/runid/setup/directory
This will then make imsi track the state of the various directories at that exact point.
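In a job script, this call would typically sit immediately before the model launch. The wrapper below is a hypothetical sketch: only the `imsi log-state -m ... -p ...` invocation comes from the text above, while `log_state_then_run` and the launch command passed to it are placeholders. Refusing to launch when `imsi log-state` fails is a design choice of this sketch, not documented imsi behaviour.

```shell
#!/usr/bin/env bash
# Sketch only: record the run's state with imsi, then launch the model.
# log_state_then_run is a hypothetical wrapper around the documented
# `imsi log-state -m MESSAGE -p SETUP_DIR` call.
log_state_then_run() {
  local setup_dir=$1
  shift                      # remaining arguments form the launch command
  if ! imsi log-state -m "pre-launch state check" -p "$setup_dir"; then
    echo "imsi log-state failed; not launching" >&2
    return 1
  fi
  "$@"                       # run the launch command exactly as given
}
```

For example, `log_state_then_run /path/to/runid/setup/directory mpirun ./model.exe`, where `mpirun ./model.exe` stands in for whatever launch command the run actually uses.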
What to do with tracking artifacts?
If you are on an HPC system where you can keep runs on disk for a long time, relying on the various directory structures alone might be enough. In most cases, however, users need to clean up simulations after they complete, and the necessary reproducibility information may then be lost.
As such, if you have access to an archiving system, it is recommended that users set up a job to copy
the local config/ directory and
the local .imsi/states directory
to whatever archive system their machines have access to. With these, users should be able to determine all the details needed to potentially re-run past simulations.
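A minimal archiving step might bundle both directories into one tarball before the setup directory is cleaned up. The sketch below uses a hypothetical `archive_tracking` helper with placeholder destination and tarball names; transferring the tarball to a site's archive system is machine-specific and is left out.

```shell
#!/usr/bin/env bash
# Sketch only: bundle one run's reproducibility artifacts into a tarball.
# archive_tracking is hypothetical; adapt the destination and naming to your site.
archive_tracking() {
  local setup_dir=$1 dest_dir=$2
  local name
  name="$(basename "$setup_dir")_tracking.tar.gz"
  mkdir -p "$dest_dir"
  # -C stores paths relative to the setup directory, so the tarball holds
  # config/... and .imsi/states/... entries. Because config/ is a git repo,
  # its .git/ history is archived along with the files.
  tar -czf "$dest_dir/$name" -C "$setup_dir" config .imsi/states
  echo "$dest_dir/$name"
}
```

The printed path can then be handed to whatever transfer tool the archive system provides.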