Ensemble Tool
Overview
The ensemble tool extends IMSI to many runs. It exposes existing IMSI commands as subcommands that are applied across all ensemble members. For the behaviour of each subcommand, refer to the documentation of the underlying IMSI command.
See imsi ensemble --help for a list of available commands.
Usage: imsi ensemble [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
Commands to manage ensembles of imsi runs. To determine behaviour of each
ensemble command, refer to the help message of the underlying imsi command.
WARNING: Where the main IMSI tools may prompt users for confirmation before
taking action, the ensemble tool does not. Users should take caution of the
location of their working directories and be aware of the underlying action
the ensemble tool is taking.
Options:
  --config-path TEXT  Path to the entry config.  [required]
  --show-diffs        Summarize differences between ensemble members if they
                      exist.
  -h, --help          Show this message and exit.
Commands:
  build          Compile the ensemble member directories.
  config         Run imsi config for each ensemble member.
  save-restarts  Save the restart files for each ensemble member.
  setup          Run imsi setup for each ensemble member.
  status         Check the status of the ensemble jobs.
  submit         Submit the ensemble jobs.
Configuration Files
The ensemble tool can be driven in several ways: from a single high-level config,
from external tables that define ensemble members as .yaml or .csv files,
or from a combination of tables and the high-level config. Regardless of how a user decides to use the ensemble tool,
it's critical to understand what the underlying tool is doing:
The ensemble tool composes a set of changes for each ensemble member.
These changes are mapped directly onto the Resolved Configuration with the defined hierarchy for each ensemble member (users are responsible for setting the hierarchy correctly!).
The ensemble tool then loops through imsi (ensemble) <command> for each ensemble member with the modified Resolved Configuration.
After setting up their first ensemble, users are encouraged to look at the Resolved Configuration files for each ensemble member,
which are stored in each ensemble member’s setup directory as imsi_configuration_{runid}.yaml, and verify that their modifications
have been correctly applied at the correct hierarchical levels.
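The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IMSI's implementation: the helper `apply_changes` and the sample dictionaries are hypothetical, and the Resolved Configuration is modelled as a plain nested dict.

```python
import copy


def apply_changes(resolved, changes):
    """Return a copy of `resolved` with each (key-path, value) change applied."""
    updated = copy.deepcopy(resolved)
    for key_path, value in changes.items():
        node = updated
        *parents, leaf = key_path
        for key in parents:
            # Walk down the hierarchy, creating levels that do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return updated


# Hypothetical Resolved Configuration before any member-specific change.
resolved = {
    "components": {
        "CanAM": {"namelists": {"canam_settings": {"phys_parm": {"pp_rdm_num_pert": 0}}}}
    }
}

path = ("components", "CanAM", "namelists", "canam_settings", "phys_parm", "pp_rdm_num_pert")

# Step 1: one set of changes per ensemble member.
member_changes = {"run-01": {path: 0}, "run-02": {path: 2}}

# Steps 2 and 3: map the changes onto the configuration, member by member.
member_configs = {
    runid: apply_changes(resolved, changes) for runid, changes in member_changes.items()
}
```

Because a wrong key-path would silently create a new branch in this sketch, it also shows why checking each member's imsi_configuration_{runid}.yaml after the first setup is worthwhile.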
The entry config (required)
Defining the entry configuration file
The entry config is a .yaml file that must be supplied at every invocation of the ensemble tool via --config-path ....
For convenience, the default file is $(pwd)/config.yaml, but --config-path can point to any valid .yaml file on disk.
The ensemble tool reads the high-level config and sets up the ensemble members accordingly, e.g.:
imsi ensemble --config-path=config.yaml <command>
Contents of the entry config file
Inside of the entry config are sets of required parameters: the ensemble-level and member-level parameters.
ensemble_level parameters
These are global parameters for ensemble runs that are logically defined once per ensemble run. They do not specify any information about individual members.
| Required Parameters | Description |
|---|---|
| | Defaults to |
| | Defaults to |
| | Defaults to |

| Optional Parameters | Description |
|---|---|
| config_table | Path to the configuration table. See details on configuration tables below. |
| | |
An example ensemble_level configuration could look like the following:
ensemble_level:
  user: ${oc.env:USER} # required, recommended example automatically sets omegaconf interpolation to $USER
  run_directory: /output/path/to/ensemble/setup_dirs/ # optional, defaults to pwd
  config_table: table.csv # optional table, see details below
  share_repo: true # optional, defaults to false
Warning
The ensemble tool is configured to overwrite any existing setup directories defined under run_directory.
Where the main IMSI tools may prompt users for confirmation before taking action, the ensemble tool does not.
Users should take caution of the location of their working directories and be aware of the
underlying action the ensemble tool is taking.
member_level parameters
These parameters are defined once per ensemble member and can be any parameter from the Resolved Configuration File, including the full hierarchy in the case of nested parameters.
Tip
In a configuration file, the full hierarchy to a parameter is referred to as its key-path. For example, the hierarchical configuration
components:
  CanAM:
    namelists:
      canam_settings:
        phys_parm:
          pp_rdm_num_pert: <value>
has the key-path components -> CanAM -> namelists -> canam_settings -> phys_parm -> pp_rdm_num_pert. This concept can be used to construct aliases for long key-paths.
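To make the idea of a key-path concrete, the following sketch lists every key-path in a nested dictionary. The function name `key_paths` and the sample value `2` are illustrative assumptions; IMSI itself may represent key-paths differently.

```python
def key_paths(config, prefix=()):
    """Yield (key-path, value) pairs for every leaf of a nested dict."""
    for key, value in config.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level and extend the path.
            yield from key_paths(value, path)
        else:
            yield path, value


config = {
    "components": {
        "CanAM": {"namelists": {"canam_settings": {"phys_parm": {"pp_rdm_num_pert": 2}}}}
    }
}

# Render each key-path with the same arrow notation used in this section.
paths = [" -> ".join(path) for path, _ in key_paths(config)]
```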
Creating ensembles via the entry config
Any member_level variable may be defined as either a single value or a list of values.
If a list is provided, the ensemble tool will generate a separate run for each value in that list.
When multiple member_level variables are defined as lists, their list indices are aligned: values with the same index across different variables correspond to the same ensemble member.
For example, the following member_level configuration
member_level:
  this:
    is:
      a:
        parameter: [value-a, value-b]
  another:
    parameter: [value-01, value-02]
constructs two ensemble members with the following key-paths defined:
Ensemble Member 1:
this -> is -> a -> parameter:value-a
another -> parameter:value-01
Ensemble Member 2:
this -> is -> a -> parameter:value-b
another -> parameter:value-02
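The index alignment described above can be sketched as follows. This is a hypothetical helper, not the tool's own code: parameters are assumed to be already flattened to colon-separated key-paths, and treating lists of different lengths as an error is an assumption of this sketch.

```python
def expand_members(flat_params):
    """Expand {key-path: value-or-list} into one dict per ensemble member."""
    lengths = {len(v) for v in flat_params.values() if isinstance(v, list)}
    if len(lengths) > 1:
        # Assumption: lists of different lengths cannot be aligned by index.
        raise ValueError(f"list lengths differ: {sorted(lengths)}")
    n_members = lengths.pop() if lengths else 1
    return [
        # Lists contribute their i-th entry; single values are copied as-is.
        {k: (v[i] if isinstance(v, list) else v) for k, v in flat_params.items()}
        for i in range(n_members)
    ]


members = expand_members(
    {
        "this:is:a:parameter": ["value-a", "value-b"],
        "another:parameter": ["value-01", "value-02"],
    }
)
```

Here `members[0]` pairs value-a with value-01 and `members[1]` pairs value-b with value-02, matching the two ensemble members listed above.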
Tip
In order to determine the correct key path or hierarchy for member level parameters,
users should refer to a reference Resolved Configuration File on disk
(i.e. imsi_configuration_{runid}.yaml). Running imsi setup ...
might be necessary to generate this file for reference.
Aliases
Tip
Key paths can quickly become long and ugly. To help shorten this and make tables more readable,
you can specify an alias for long key-paths in the ensemble_level: aliases section. This alias then becomes
global and can be used in configuration tables (see below) or in the member_level section of the entry config file.
ensemble_level:
  user: ${oc.env:USER}
  ...
  aliases:
    # the alias key can be any dictionary compatible string
    pp_rdm_num_pert: components:CanAM:namelists:canam_settings:phys_parm:pp_rdm_num_pert
pp_rdm_num_pert can then be used in place of the full key-path in configuration tables or the member_level section.
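Conceptually, an alias is a lookup from a short name to a full key-path. The sketch below illustrates that idea only; `resolve_alias` is a hypothetical helper and says nothing about how IMSI resolves aliases internally.

```python
def resolve_alias(key, aliases):
    """Return the full key-path for `key`, or `key` itself if it is not an alias."""
    return aliases.get(key, key)


aliases = {
    "pp_rdm_num_pert": "components:CanAM:namelists:canam_settings:phys_parm:pp_rdm_num_pert"
}

# Column names as they might appear in a table header.
columns = ["setup:runid", "pp_rdm_num_pert"]
resolved_columns = [resolve_alias(column, aliases) for column in columns]
```

Names that are not aliases, such as `setup:runid`, pass through unchanged.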
Setup parameters
Setup parameters include any argument that could be provided to imsi setup. All setup parameters must be defined under the subkey setup to be
correctly recognized and used by imsi. For example:
member_level:
  setup:
    runid: [run-01, run-02]
    model: [canesm51_p1, canam51_p1]
    exp: [cmip6-piControl, cmip6-amip]
Warning
Failure to place setup keys under setup will prevent imsi from
recognizing setup parameters and will either fail or resort to defaults leading to unexpected behaviour.
Configuration tables via the config_table parameter (optional)
While the entry config file is required for all ensembles, IMSI’s ensemble tool also supports the definition of external
configuration tables at the ensemble_level to define ensemble members in a bulk format.
Configuration tables allow for easier (and sometimes more flexible and explicit) modification of ensemble member parameters, especially for large ensembles.
Configuration tables store discrete ensemble member runs and their associated parameter modifications.
IMSI’s ensemble tool supports .yaml and .csv formats. Legacy .txt support is still available but deprecated, with a DeprecationWarning
issued when used.
Note
We recommend that users use the .yaml format for external tables due to its explicit representation of key hierarchies
When using external configuration tables, users have the option of defining all ensemble member parameters in the table itself,
or defining a subset of parameters in the table and using the entry config to define common parameters. When defining all parameters in the table,
the entry config’s member_level section can be left empty (i.e. member_level: {}).
Supported Configuration Formats for ensemble_level: config_table
To illustrate the use of configuration tables, consider the following entry config file:
# entry config file
ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  config_table: config/example.(yaml|csv) # path to config table
  share_repo: true
member_level: {}
.yaml table format
To have an equivalent ensemble to previous examples, the config/example.yaml would then contain:
# example.yaml
- setup:
    runid: run-01
    model: canesm51_p1
    exp: cmip6-piControl
- setup:
    runid: run-02
    model: canam51_p1
    exp: cmip6-amip
Note
The external .yaml config table is a list where each hyphen - denotes a new list item, with each item in the list representing an ensemble member.
CSV table format
For a .csv config table, the equivalent content (with config_table: config/example.csv) would look like:
# example.csv
setup:runid, setup:model, setup:exp
run-01, canesm51_p1, cmip6-piControl
run-02, canam51_p1, cmip6-amip
Note
In .csv config tables, the header row must contain the full key-paths for each parameter,
with nested keys separated by colons :. Each subsequent row represents an ensemble member.
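The mapping from colon-separated headers to nested keys can be illustrated with Python's standard csv module. This is a sketch under assumptions, not the tool's parser: `read_csv_table` is a hypothetical name, values are kept as strings, and empty cells are skipped on the assumption that they mean the parameter is not set for that member.

```python
import csv
import io


def read_csv_table(text):
    """Parse a CSV config table into one nested dict per ensemble member."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [column.strip() for column in next(reader)]
    members = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        member = {}
        for column, cell in zip(header, row):
            cell = cell.strip()
            if not cell:
                continue  # assumption: an empty cell leaves the parameter unset
            node = member
            *parents, leaf = column.split(":")
            for key in parents:
                node = node.setdefault(key, {})
            node[leaf] = cell
        members.append(member)
    return members


table = """setup:runid, setup:model, setup:exp
run-01, canesm51_p1, cmip6-piControl
run-02, canam51_p1, cmip6-amip
"""
members = read_csv_table(table)
```

Each row becomes one nested dictionary, so `members[0]` has the same shape as the first item of the equivalent .yaml table.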
Advanced Configuration Techniques
Tip
Users can always refer to the Resolved Configuration File for each ensemble member to understand how their configurations are being applied and to verify settings are being applied as expected.
Mixing member_level parameters and tables via broadcasting
The ensemble tool resolves configurations from multiple sources as follows:
If a key exists in both member_level and a config_table.(yaml|csv), the table value takes precedence and a warning is issued.
If a key exists only in member_level:
  Single values are broadcasted to all ensemble members.
  Lists must match the number of ensemble members defined in the table.
Any overlapping keys (even lists) are overridden by the config_table values. If they don't exist in the config_table, they are broadcasted to all ensemble members.
Note
Broadcasting in this context means that singular values are copied and applied to each ensemble member. Lists are broadcasted to each ensemble member in the order they are defined.
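The precedence rules above can be expressed as a short merge routine. The sketch assumes both sources are already flattened to colon-separated key-paths and checks for overlap row by row; `merge_sources` is a hypothetical helper, and the real tool may differ in where and how it warns.

```python
import warnings


def merge_sources(member_level, table_rows):
    """Combine flat member_level values with flat table rows, one dict per member."""
    n_members = len(table_rows)
    merged = []
    for index, row in enumerate(table_rows):
        member = {}
        for key, value in member_level.items():
            if key in row:
                # Table values win over member_level values.
                warnings.warn(f"'{key}' is set in both sources; using the table value")
                continue
            if isinstance(value, list):
                if len(value) != n_members:
                    raise ValueError(
                        f"'{key}' has {len(value)} values for {n_members} members"
                    )
                member[key] = value[index]  # lists are applied in order
            else:
                member[key] = value  # single values are copied to every member
        member.update(row)
        merged.append(member)
    return merged


member_level = {
    "setup:ver": "imsi-integration",
    "setup:model": ["model_A", "model_B"],
    "setup:exp": "cmip6-piControl",
}
table_rows = [
    {"setup:runid": "run-01", "setup:exp": "cmip6-piControl"},
    {"setup:runid": "run-02", "setup:exp": "cmip6-amip"},
]
merged = merge_sources(member_level, table_rows)
```

In this example `setup:exp` appears in both sources, so the table values are kept and a warning is raised for each member, while `setup:ver` is copied to both members and `setup:model` is distributed in order.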
Omitting parameters in configuration tables
For tables, the ensemble tool supports configurations where users can omit parameters from ensemble runs that are present in other members. This makes the tool flexible to arbitrary configuration structures, and allows the ability to create diverse ensembles.
For example, the following config tables are valid:
CSV:
setup:runid, setup:model, some:imsi:parameter
run-01-csv, canesm51_p1,
run-02-csv, canam51_p1, 123
YAML:
- setup:
    runid: run-01-yaml-table
    model: canesm51_p1
- setup:
    runid: run-02-yaml-table
    model: canam51_p1
  some: # note how this level is only defined for this member
    imsi:
      parameter: 123
Warning
This technique is only available for table configurations, because of how ensemble
members are defined in a member_level section.
Modifying lower level configuration parameters
The ensemble tool allows for the modification of any non-setup parameter in the resolved yaml file (i.e. imsi_configuration_{runid}.yaml).
Below are some examples for how to modify the parameter pp_rdm_num_pert.
Important
As noted above, the parameter being modified must be specified with its entire
hierarchy as it appears in the Resolved Configuration file (i.e. imsi_configuration_{runid}.yaml). The ensemble tool
modifies the resolved .yaml file in place and runs imsi config on the modified file.
If the ensemble tool adds a new key to the resolved .yaml, it warns the user.
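The new-key warning matters because a misspelled key-path does not modify the intended parameter; it creates a new one. A hypothetical check along the following lines shows how such keys can be detected before they are applied (`find_new_keys` and the sample data are illustrative, not part of IMSI):

```python
def find_new_keys(resolved, changes, prefix=()):
    """Return key-paths in `changes` that do not already exist in `resolved`."""
    new_keys = []
    for key, value in changes.items():
        path = prefix + (key,)
        if not isinstance(resolved, dict) or key not in resolved:
            new_keys.append(path)  # this level is absent from the resolved config
        elif isinstance(value, dict):
            new_keys.extend(find_new_keys(resolved[key], value, path))
    return new_keys


resolved = {
    "components": {
        "CanAM": {"namelists": {"canam_settings": {"phys_parm": {"pp_rdm_num_pert": 0}}}}
    }
}

# "phys_param" is a deliberate typo for "phys_parm".
changes = {
    "components": {
        "CanAM": {"namelists": {"canam_settings": {"phys_param": {"pp_rdm_num_pert": 2}}}}
    }
}
new_keys = find_new_keys(resolved, changes)
```

Here `new_keys` contains the path ending in `phys_param`, flagging the typo rather than letting it pass unnoticed.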
Low level parameter modification in a .yaml table (recommended method):
- setup:
    runid: run-01
    model: canesm51_p1
    exp: cmip6-piControl
  components:
    CanAM:
      namelists:
        canam_settings:
          phys_parm:
            pp_rdm_num_pert: 0
- setup:
    runid: run-02
    model: canam51_p1
    exp: cmip6-amip
  components:
    CanAM:
      namelists:
        canam_settings:
          phys_parm:
            pp_rdm_num_pert: 2
Low level parameter modification in the entry config file
ensemble_level:
  user: ${oc.env:USER}
  ...
member_level:
  setup:
    runid: [run-01, run-02]
  components:
    CanAM:
      namelists:
        canam_settings:
          phys_parm:
            pp_rdm_num_pert: [0, 2]
Low level parameter modification in a .csv config table
setup:runid, setup:model, setup:exp, components:CanAM:namelists:canam_settings:phys_parm:pp_rdm_num_pert
run-01, canesm51_p1, cmip6-piControl, 0
run-02, canam51_p1, cmip6-amip, 2
As mentioned above, you can specify an alias for that very long key-path in your entry config:
ensemble_level:
  user: ${oc.env:USER}
  ...
  aliases:
    # the alias key can be any dictionary compatible string
    pp_rdm_num_pert: components:CanAM:namelists:canam_settings:phys_parm:pp_rdm_num_pert
member_level: {}
And then in your .csv config table:
setup:runid, setup:model, setup:exp, pp_rdm_num_pert
run-01, canesm51_p1, cmip6-piControl, 0
run-02, canam51_p1, cmip6-amip, 2
Simple examples from CanESM
To help get users started in the canesm world, we provide some simple ensemble config files here.
Simple ensemble with varying adjustable params:
A very common use case for an ensemble is to assess the effects of adjusting parameters. Say we
are working with CanESM6 and want to assess the effect of varying ap_uicefac and ap_facacc, where
we also want to:
share the source repo and executables
make the simulations run until year 3000
Example 1: the entry config
For simple experiments it is possible to set all the configuration in a single file. For example, we
can create a single config.yaml with:
# config.yaml
ensemble_level:
  user: ${oc.env:USER}
  share_repo: true
  aliases:
    uicefac: components:CanAM:namelists:model_settings.nml:adjustable_parm:ap_uicefac
    facacc: components:CanAM:namelists:model_settings.nml:adjustable_parm:ap_facacc
member_level:
  setup:
    model: canesm6_p1
    exp: dev-repeated-cycle-phy53
    ver: v6.0-imsi-integration
    runid: [myrun01, myrun02, myrun03]
  uicefac: [ 4078.0351998582, 4084.30063527445, 3935.17992548678 ]
  facacc: [ 9.15046755786538, 14.0613122621687, 11.5656114129704 ]
  sequencing:
    run_dates:
      run_segment_stop_time: 3000
Example 2: .yaml format config table
The .yaml format lets you organize your run information explicitly and cleanly, and also makes it easier to
only define the modifications you want applied to that single run - i.e. say you only want to modify
the default value for one run.
# config.yaml
ensemble_level:
  user: ${oc.env:USER}
  config_table: table.yaml
  share_repo: true
  aliases:
    uicefac: components:CanAM:namelists:model_settings.nml:adjustable_parm:ap_uicefac
    facacc: components:CanAM:namelists:model_settings.nml:adjustable_parm:ap_facacc
member_level:
  setup:
    model: canesm6_p1
    exp: dev-repeated-cycle-phy53
    ver: v6.0-imsi-integration
  sequencing:
    run_dates:
      run_segment_stop_time: 3000
# table.yaml
- setup:
    runid: myrun07
  uicefac: 4078.0351998582
  facacc: 9.15046755786538
- setup:
    runid: myrun08
  uicefac: 4084.30063527445
  facacc: 14.0613122621687
- setup:
    runid: myrun09
  uicefac: 3935.17992548678
  facacc: 11.5656114129704
Example 3: .csv format config table
The .csv format offers a tabular format that uses explicit delimiters
and is able to leverage aliases to limit the column length of the tables.
# config.yaml
ensemble_level:
  user: ${oc.env:USER}
  config_table: table.csv
  share_repo: true
  aliases:
    uicefac: components:CanAM:namelists:model_settings.nml:adjustable_parm:ap_uicefac
    facacc: components:CanAM:namelists:model_settings.nml:adjustable_parm:ap_facacc
member_level:
  setup:
    model: canesm6_p1
    exp: dev-repeated-cycle-phy53
    ver: v6.0-imsi-integration
  sequencing:
    run_dates:
      run_segment_stop_time: 3000
# table.csv
setup:runid, uicefac, facacc
myrun01, 4078.0351998582, 9.15046755786538
myrun02, 4084.30063527445, 14.0613122621687
myrun03, 3935.17992548678, 11.5656114129704
Simple ensemble for testing experiments:
To help launch production/test ensembles, users may wish to launch an ensemble of multiple canned experiments. For example,
for CanESM6, we might want to test 2 years of the AMIP, ESM, and OMIP models in the same ensemble. This can easily be done with
the following config file
# define a runid prefix to refer to runs in the ensemble
prefix: v6-sys
ensemble_level:
  user: ${oc.env:USER} # pick up the user from the account running `imsi ensemble ...`
  share_repo: true # share execs and repo
member_level:
  setup:
    repo: git@gitlab.science.gc.ca:CanESM/CanESM5.git
    ver: v6.0-imsi-integration
    runid: [
      "${oc.select:prefix}-amip53",
      "${oc.select:prefix}-omip",
      "${oc.select:prefix}-esm53"
    ]
    exp: [
      "dev-amip-v6-phy53",
      "dev-omip1-v6",
      "dev-repeated-cycle-phy53"
    ]
    model: [
      "canam6_p1",
      "cannemo6_p1",
      "canesm6_p1"
    ]
  sequencing:
    run_dates:
      run_segment_start_time: [
        "2003",
        "0001",
        "1001"
      ]
      run_segment_stop_time: [
        "2004",
        "0002",
        "1002"
      ]
Conceptual examples of broadcasting
Broadcasting techniques in the entry config file.
Running an ensemble with a single model and multiple experiments.
Running an ensemble with multiple models and a single experiment.
Running an ensemble with multiple models and multiple experiments.
Example 1: Single model, multiple experiments
ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true
member_level:
  setup:
    runid: [run-01, run-02]
    model: canesm51_p1 # this is broadcasted to all ensemble members and is equivalent to [canesm51_p1, canesm51_p1]
    exp: [cmip6-piControl, cmip6-amip]
Example 2: Multiple models, single experiment
ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true
member_level:
  setup:
    runid: [run-01, run-02]
    model: [canesm51_p1, canam51_p1]
    exp: cmip6-piControl # this is broadcasted to all ensemble members and is equivalent to [cmip6-piControl, cmip6-piControl]
Example 3: Multiple models, multiple experiments (no broadcasting)
ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true
member_level:
  setup:
    runid: [run-01, run-02, run-03, run-04]
    model: [canesm51_p1, canam51_p1, canesm51_p2, canam51_p2]
    exp: [cmip6-piControl, cmip6-amip, cmip6-historical, cmip6-ssp585]
Examples mixing the entry config with config tables:
Example 1: Single model and version; multiple experiments
Consider the following entry config:
ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true
  config_table: config/example.yaml
member_level:
  setup:
    ver: imsi-integration
    model: canesm51_p1
In config/example.yaml, the commented keys show the equivalent structure in a table:
- setup:
    runid: run-01
    # model: canesm51_p1
    exp: cmip6-piControl
    # ver: imsi-integration
- setup:
    runid: run-02
    # model: canesm51_p1
    exp: cmip6-historical
    # ver: imsi-integration
Example 2: Multiple models; single experiment and version
Consider the following entry config:
ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true
  config_table: config/example.yaml
member_level:
  setup:
    ver: imsi-integration
    exp: cmip6-piControl
In config/example.yaml, the commented keys show the equivalent structure in a table:
- setup:
    runid: run-01
    model: canesm51_p1
    # ver: imsi-integration
    # exp: cmip6-piControl
- setup:
    runid: run-02
    model: canam51_p1
    # ver: imsi-integration
    # exp: cmip6-piControl
Example 3: Multiple models, multiple experiments
Consider the following entry config:
ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true
  config_table: config/example.yaml
member_level:
  setup:
    model: [model_A, model_B]
    ver: imsi-integration
In config/example.yaml, the commented keys show the equivalent structure in a table:
- setup:
    runid: run-01
    # model: model_A
    exp: cmip6-piControl
    # ver: imsi-integration
- setup:
    runid: run-02
    # model: model_B
    exp: cmip6-amip
    # ver: imsi-integration