Ensemble Tool

imsi ensemble --help
Usage: imsi ensemble [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...

  Commands to manage ensemble runs

Options:
  --config-path TEXT  Name of the configuration file.  [required]
  --show-diffs        Summarize differences between ensemble members if they
                      exist.
  -h, --help          Show this message and exit.

Commands:
  build          Compile the ensemble member directories.
  config         Run imsi config for each ensemble member.
  save-restarts  Save the restart files for each ensemble member.
  setup          Setup an ensemble of configurations
  status         Check the status of the ensemble jobs.
  submit         Submit the ensemble jobs.

Overview

The ensemble tool is a comprehensive way to configure and run multiple ensemble members. It is designed to be flexible and user-friendly, allowing for easy setup of multiple ensemble members with minimal effort. The ensemble tool is built on top of the IMSI framework and so existing commands are replicated in the tool but applied across multiple ensemble members. Users can configure an ensemble run using one of two main methods: as a configuration of lists in a high-level configuration file, or as a list of configurations in a configuration table. Starting with the high-level configuration file, the ensemble tool sets up a new run for each unique value in the lists. The configuration table method is more flexible and allows for more complex configurations. The ensemble tool also supports broadcasting of configuration parameters from the high-level configuration file to all ensemble members.

High-Level Config: The Entry Config

Calling the ensemble tool requires a high-level config to be set in the --config-path argument. The default config file is config.yaml. The high-level config is a YAML file that defines the ensemble-level and member-level parameters. The ensemble tool reads the high-level config and sets up the ensemble members accordingly.:

imsi ensemble --config-path=config.yaml <command>

The entry config requires the definition of two main sub-levels: ensemble_level and member_level.

ensemble_level Parameters

These parameters need to be logically defined once per ensemble run. For example:

ensemble_level:
  user: ${oc.env:USER}  # required, recommended example sets omegaconf interpolation to $USER
  run_directory: /output/path/to/ensemble/setup_dirs/  # optional, defaults to cwd
  config_table: table.csv  # optional
  share_repo: true  # optional, defaults to false
ensemble_level Parameters

Parameter

Description

user

The user running the ensemble. Defaults to $USER but can be overridden.

run_directory

Directory where ensemble setup directories are created. Defaults to the current working directory.

config_table

Path to the configuration table. See details on configuration tables below.

share_repo

If True, the first ensemble member’s setup src repository is symlinked to subsequent setup directories. Defaults to False.

aliases

alias: parameter pairs to help keep table headers short. Defaults to None.

The minimum required ensemble_level parameters are user, so a minimal configuration would look like the following:

ensemble_level:
  user: ${oc.env:USER}

member_level Parameters

These parameters are defined once per ensemble member and represent any IMSI-compatible parameter. Most importantly, any member_level variable can be defined as either a value or a list of values. If a list is specified, the ensemble tool sets up a new run for each value in the list. All setup parameters must be defined under the subkey setup in order to be correctly used by imsi (e.g. setup:runid).

Note

Single member runs can be defined under the member_level section, but values must be added as a list with a single value. For example, runid: [run-01]. This is to maintain consistency with ensemble runs.

Supported Configuration Formats for ensemble_level: config_table

Configuration tables store discrete ensemble member runs and their associated parameter modifications. IMSI’s ensemble tool supports .yaml and .csv formats. Legacy .txt support is still available but deprecated, with a DeprecationWarning issued when used. We recommend that users use the .yaml format due to its explicit representation of key heirarchies.

Example member_level configuration of two ensemble members:

ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true

member_level:
  setup:
    runid: [run-01, run-02]
    model: [canesm51_p1, canam51_p1]
    exp: [cmip6-piControl, cmip6-amip]

is equivalent to:

ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  config_table: config/example.csv
  share_repo: true

member_level: {}

with config/example.csv containing:

setup:runid,  setup:model, setup:exp
run-01,       canesm51_p1, cmip6-piControl
run-02,       canam51_p1,  cmip6-amip

or YAML config table:

- setup:
    runid: run-01
    model: canesm51_p1
    exp: cmip6-piControl

- setup:
    runid: run-02
    model: canam51_p1
    exp: cmip6-amip

Minimal member_level for config table-only runs:

member_level: {}

Broadcasting Configuration Parameters from member_level

The ensemble tool resolves configurations as follows:

  1. If a key exists in both member_level and config_table, the config_table value overrides and issues a warning.

  2. If a key exists only in member_level:

    • Single values are broadcasted to all ensemble members.

    • Lists must match the number of ensemble members.

    • Any overlapping keys (even lists) are overridden by the config_table values. If they don’t exist in the config_table, they are broadcasted to all ensemble members.

Note

Broadcasting in this context means that singular values are copied and applied to each ensemble member. Lists are broadcasted to each ensemble member in the order they are defined.

For .csv and .yaml config tables, the ensemble tool now supports configurations where users can omit parameters from ensemble runs that are present in other members. For example, the following config tables are valid:

CSV:

setup:runid, setup:model, some:imsi:parameter
run-01-csv,  canesm51_p1,
run-02-csv,  canam51_p1,  123

YAML:

- setup:
    runid: run-01-yaml-table
    model: canesm51_p1

- setup:
    runid: run-02-yaml-table
    model: canam51_p1
    some:
      imsi:
        parameter: 123

Modifying lower level configuration parameters

The ensemble tool allows for the modification of any non-setup parameter in the resolved yaml file (i.e. imsi_configuration_{runid}.yaml). For instance, to modify the parameter ppm_rdm_num_pert, the user can acheive this in multiple ways:

1. In the entry level config file, add the following:

ensemble_level:
  user: ${oc.env:USER}
  ...
member_level:
 setup:
   runid: [run-01, run-02]
 components:
   CanAM:
     namelists:
       canam_settings:
         phys_parm:
           ppm_rdm_num_pert: [0, 2]

Important

The parameter that is being modified must contain the entire heriarchy of the resolved yaml (i.e. imsi_configuration_{runid}.yaml). The ensemble tool modifies the resolved yaml file in place and runs imsi config on the modified file. If a new key is added to the resolved yaml by the ensemble tool, it will warn users.

2. In a CSV config table, add the following:

runid,  model,       exp,             components:CanAM:namelists:canam_settings:phys_parm:ppm_rdm_num_pert
run-01, canesm51_p1, cmip6-piControl, 0
run-02, canam51_p1,  cmip6-amip,      2

Or, if you’re like us and think that column name is long and ugly, you can specify an alias for that very long key-path in your entry config:

ensemble_level:
  user: ${oc.env:USER}
  ...
  aliases:
    # the alias key can be any dictionary compatible string
    ppm_rdm_num_pert: components:CanAM:namelists:canam_settings:phys_parm:ppm_rdm_num_pert
member_level: {}

And then in your CSV config table:

setup:runid,  setup:model, setup:exp,       ppm_rdm_num_pert
run-01,       canesm51_p1, cmip6-piControl, 0
run-02,       canam51_p1,  cmip6-amip,      2

3. In a YAML configuration table (the most explicit way):

- setup:
    runid: run-01
    model: canesm51_p1
    exp: cmip6-piControl
  components:
    CanAM:
      namelists:
        canam_settings:
          phys_parm:
            ppm_rdm_num_pert: 0

- setup:
    runid: run-02
    model: canam51_p1
    exp: cmip6-amip
  components:
    CanAM:
      namelists:
        canam_settings:
          phys_parm:
            ppm_rdm_num_pert: 2

Common configuration examples in the entry YAML:

  • Running an ensemble with a single model and multiple experiments.

  • Running an ensemble with multiple models and a single experiment.

  • Running an ensemble with multiple models and multiple experiments.

Example 1: Single model, multiple experiments

ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true

member_level:
  setup:
    runid: [run-01, run-02]
    model: canesm51_p1 # this is broadcasted to all ensemble members and is equivalent to [canesm51_p1, canesm51_p1]
    exp: [cmip6-piControl, cmip6-amip]

Example 2: Multiple models, single experiment

ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true

member_level:
  setup:
    runid: [run-01, run-02]
    model: [canesm51_p1, canam51_p1]
    exp: cmip6-piControl # this is broadcasted to all ensemble members and is equivalent to [cmip6-piControl, cmip6-piControl]

Example 3: Multiple models, multiple experiments

ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true

member_level:
  setup:
    runid: [run-01, run-02, run-03, run-04]
    model: [canesm51_p1, canam51_p1, canesm51_p2, canam51_p2]
    exp: [cmip6-piControl, cmip6-amip, cmip6-historical, cmip6-ssp585]

Common configuration examples from a YAML config table:

Example 1: Single model, multiple experiments

Consider the following entry level YAML:

ensemble_level:
  user: ${oc.env:USER}
  run_directory: /output/path/to/ensemble/setup_dirs/
  share_repo: true
  config_table: config/example.yaml

member_level:
  setup:
    ver: imsi-integration

In config/config.yaml, the commented keys show how the values are resolved into the table

- setup:
    runid: run-01
    model: canesm51_p1
    exp: cmip6-piControl
  # ver: imsi-integration is broadcasted into resolved ensemble config and is equivalent to specifying directly
- setup:
     runid: run-02
     model: canesm51_p1
     exp: cmip6-amip
  # ver: imsi-integration is broadcasted into resolved ensemble config and is equivalent to specifying directly