CMIP data reference syntax handling

For CMIP experiments, the paths in which data is saved are defined by a data reference syntax (DRS). These syntaxes can be hard to remember. However, for a given ‘CMIP ScmCube’, the expected path can easily be queried in two ways:

  1. Look at the output of the cube’s get_data_reference_syntax method

    • To see how the time period is included, look at the output of the same method with the relevant ‘time period/range’ argument (this argument is None by default).

  2. Look at the cube’s docstring

Note:

  • the root directory defaults to ‘.’, i.e. the current working directory.

  • the file extension does not include a leading ‘.’; you must add it yourself!
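Because the extension is appended as-is to the end of the filename, it can help to normalise it before passing it in. The helper below, normalise_file_ext, is a hypothetical convenience function written for illustration (it is not part of netcdf_scm); it simply guarantees exactly one leading ‘.’.

```python
def normalise_file_ext(file_ext):
    """Return ``file_ext`` with exactly one leading dot.

    Hypothetical helper for illustration only, not part of netcdf_scm.
    An empty or None extension is returned as an empty string.
    """
    if not file_ext:
        return ""
    # strip any leading dots the caller supplied, then add exactly one back
    return "." + file_ext.lstrip(".")


print(normalise_file_ext("nc"))   # .nc
print(normalise_file_ext(".nc"))  # .nc
```

With netcdf_scm installed, the result would then be passed on as usual, e.g. CMIP6Input4MIPsCube.get_data_reference_syntax(file_ext=normalise_file_ext("nc")).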

The meaning of these arguments is partly explained by the docstrings of the cube’s properties, but pull requests to improve them are always welcome!

In the following cells, we give a few examples for some of the available cubes.

[1]:
# NBVAL_IGNORE_OUTPUT
from netcdf_scm import iris_cube_wrappers

We can see the full list of available cubes with:

[2]:
[
    el
    for el in dir(iris_cube_wrappers)
    if el.endswith("Cube") and not (el.startswith("_") or el.startswith("Scm"))
]
[2]:
['CMIP6Input4MIPsCube', 'CMIP6OutputCube', 'MarbleCMIP5Cube']

CMIP6Input4MIPsCube

[3]:
from netcdf_scm.iris_cube_wrappers import CMIP6Input4MIPsCube

CMIP6Input4MIPsCube.get_data_reference_syntax()
[3]:
'root-dir/activity-id/mip-era/target-mip/institution-id/source-id/realm/frequency/variable-id/grid-label/version/variable-id_activity-id_dataset-category_target-mip_source-id_grid-label_time-rangefile-ext'

Including the ‘.’ in the file extension makes the result look as expected.

[4]:
CMIP6Input4MIPsCube.get_data_reference_syntax(file_ext=".nc")
[4]:
'root-dir/activity-id/mip-era/target-mip/institution-id/source-id/realm/frequency/variable-id/grid-label/version/variable-id_activity-id_dataset-category_target-mip_source-id_grid-label_time-range.nc'

Including the time range too completes the picture.

[5]:
CMIP6Input4MIPsCube.get_data_reference_syntax(
    time_range="YYYY-YYYY", file_ext=".nc"
)
[5]:
'root-dir/activity-id/mip-era/target-mip/institution-id/source-id/realm/frequency/variable-id/grid-label/version/variable-id_activity-id_dataset-category_target-mip_source-id_grid-label_YYYY-YYYY.nc'
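To see how this template maps onto a concrete path, the sketch below substitutes values into the placeholder string printed above. fill_drs is a hypothetical helper written for illustration (not a netcdf_scm function) and all identifier values are made up. It swaps each hyphenated placeholder (e.g. variable-id) for the value passed under the matching underscored keyword (e.g. variable_id), mirroring how the method's own keywords (time_range, file_ext) correspond to the time-range and file-ext placeholders.

```python
import re

# Template copied from the output of
# CMIP6Input4MIPsCube.get_data_reference_syntax(time_range="YYYY-YYYY", file_ext=".nc")
TEMPLATE = (
    "root-dir/activity-id/mip-era/target-mip/institution-id/source-id/realm/"
    "frequency/variable-id/grid-label/version/"
    "variable-id_activity-id_dataset-category_target-mip_source-id_grid-label_YYYY-YYYY.nc"
)

# Placeholders are lower-case words joined by hyphens, e.g. "grid-label"
PLACEHOLDER = re.compile(r"[a-z]+(?:-[a-z]+)*")


def fill_drs(template, **values):
    """Replace DRS placeholders with values (hypothetical helper).

    Keyword names use underscores (``source_id``) and are matched against the
    hyphenated placeholders (``source-id``). Tokens without a value, such as
    the "nc" of the extension, are left unchanged.
    """
    lookup = {key.replace("_", "-"): str(val) for key, val in values.items()}
    return PLACEHOLDER.sub(lambda m: lookup.get(m.group(0), m.group(0)), template)


# Made-up identifiers, for illustration only
path = fill_drs(
    TEMPLATE,
    root_dir=".",
    activity_id="input4MIPs",
    mip_era="CMIP6",
    target_mip="CMIP",
    institution_id="INST",
    source_id="SOURCE-1-0",
    realm="atmos",
    frequency="yr",
    variable_id="var",
    grid_label="gn",
    version="v20190101",
    dataset_category="cat",
)
print(path)
```

Because the substitution is a single pass over the template, values that happen to contain hyphens (like SOURCE-1-0) are inserted verbatim and never re-matched.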

Alternatively, passing time_range=None drops the time range, which shows what files without one look like (e.g. surface land-fraction or cell-area files).

[6]:
CMIP6Input4MIPsCube.get_data_reference_syntax(time_range=None, file_ext=".nc")
[6]:
'root-dir/activity-id/mip-era/target-mip/institution-id/source-id/realm/frequency/variable-id/grid-label/version/variable-id_activity-id_dataset-category_target-mip_source-id_grid-label.nc'
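The only difference between this template and the previous one is the final time-range field of the filename. A short standard-library sketch makes that explicit by splitting each string into directory levels, underscore-separated filename fields and extension. split_drs is a hypothetical helper for illustration, and it assumes ‘/’ as the separator, as in the printed templates.

```python
import posixpath

WITH_TIME = (
    "root-dir/activity-id/mip-era/target-mip/institution-id/source-id/realm/"
    "frequency/variable-id/grid-label/version/"
    "variable-id_activity-id_dataset-category_target-mip_source-id_grid-label_YYYY-YYYY.nc"
)
WITHOUT_TIME = (
    "root-dir/activity-id/mip-era/target-mip/institution-id/source-id/realm/"
    "frequency/variable-id/grid-label/version/"
    "variable-id_activity-id_dataset-category_target-mip_source-id_grid-label.nc"
)


def split_drs(template):
    """Split a DRS string into (directory levels, filename fields, extension)."""
    directory, filename = posixpath.split(template)
    stem, ext = posixpath.splitext(filename)
    return directory.split("/"), stem.split("_"), ext


dirs_t, fields_t, ext_t = split_drs(WITH_TIME)
dirs_n, fields_n, ext_n = split_drs(WITHOUT_TIME)

# Same directory tree and extension; only the trailing time field differs
print(dirs_t == dirs_n, ext_t == ext_n)            # True True
print([f for f in fields_t if f not in fields_n])  # ['YYYY-YYYY']
```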

MarbleCMIP5Cube

[7]:
from netcdf_scm.iris_cube_wrappers import MarbleCMIP5Cube

MarbleCMIP5Cube.get_data_reference_syntax(
    time_period="YYYY-YYYY", file_ext=".nc"
)
[7]:
'root-dir/activity/experiment/mip-table/variable-name/model/ensemble-member/variable-name_mip-table_model_experiment_ensemble-member_YYYY-YYYY.nc'
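Note that MarbleCMIP5Cube takes time_period where the CMIP6 cubes take time_range. When looping over several cube classes, a small lookup keeps the keyword straight. TIME_KEYWORD and drs_kwargs are hypothetical names for illustration; only the keyword names themselves are taken from the calls in this notebook.

```python
# Time keyword used in this notebook's get_data_reference_syntax calls
TIME_KEYWORD = {
    "CMIP6Input4MIPsCube": "time_range",
    "CMIP6OutputCube": "time_range",
    "MarbleCMIP5Cube": "time_period",
}


def drs_kwargs(cube_name, time="YYYY-YYYY", file_ext=".nc"):
    """Build keyword arguments for get_data_reference_syntax (hypothetical helper)."""
    return {TIME_KEYWORD[cube_name]: time, "file_ext": file_ext}


# With netcdf_scm installed, e.g.:
# MarbleCMIP5Cube.get_data_reference_syntax(**drs_kwargs("MarbleCMIP5Cube"))
print(drs_kwargs("MarbleCMIP5Cube"))
print(drs_kwargs("CMIP6OutputCube", time=None))
```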

CMIP6OutputCube

[8]:
from netcdf_scm.iris_cube_wrappers import CMIP6OutputCube

CMIP6OutputCube.get_data_reference_syntax(
    time_range="YYYY-YYYY", file_ext=".nc"
)
[8]:
'root-dir/mip-era/activity-id/institution-id/source-id/experiment-id/member-id/table-id/variable-id/grid-label/version/variable-id_table-id_source-id_experiment-id_member-id_grid-label_YYYY-YYYY.nc'
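Going the other way, the filename part of the template can be used to read identifiers back out of an existing file name. The sketch below pairs each underscore-separated placeholder with the matching field of a filename. parse_filename is a hypothetical helper and the filename is illustrative; the sketch assumes no identifier itself contains an underscore and treats the final field as the time range.

```python
import posixpath

# Template from CMIP6OutputCube.get_data_reference_syntax(time_range="YYYY-YYYY", file_ext=".nc")
TEMPLATE = (
    "root-dir/mip-era/activity-id/institution-id/source-id/experiment-id/"
    "member-id/table-id/variable-id/grid-label/version/"
    "variable-id_table-id_source-id_experiment-id_member-id_grid-label_YYYY-YYYY.nc"
)


def parse_filename(template, filename):
    """Map each filename placeholder to its value (hypothetical helper)."""
    keys = posixpath.splitext(posixpath.basename(template))[0].split("_")
    values = posixpath.splitext(filename)[0].split("_")
    if len(keys) != len(values):
        raise ValueError(f"{filename!r} does not have the template's {len(keys)} fields")
    # the template shows the time range as "YYYY-YYYY"; give it a readable key
    keys[-1] = "time-range"
    return dict(zip(keys, values))


ids = parse_filename(TEMPLATE, "tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc")
print(ids["source-id"], ids["member-id"], ids["time-range"])
```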

Printing the cube’s docstring gives the link to the data reference syntax page (between the angle brackets, i.e. between < and >).

[9]:
# in a notebook, putting a question mark after an object/function/method
# is a nice shortcut to see the docstring
CMIP6OutputCube?
[10]:
print(CMIP6OutputCube.__doc__)

    Cube which can be used with CMIP6 model output data

    The data must match the CMIP6 data reference syntax as specified in the 'File name
    template' and 'Directory structure template' sections of the
    `CMIP6 Data Reference Syntax <https://goo.gl/v1drZl>`_.
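If you want the link programmatically rather than by eye, a regular expression over the docstring is enough. This sketch copies the docstring text printed above so it runs on its own; with netcdf_scm installed you would use CMIP6OutputCube.__doc__ instead. It assumes the link is written as a reStructuredText hyperlink (‘`text <url>`_’), as it is here.

```python
import re

# Docstring text as printed above
DOC = """
    Cube which can be used with CMIP6 model output data

    The data must match the CMIP6 data reference syntax as specified in the 'File name
    template' and 'Directory structure template' sections of the
    `CMIP6 Data Reference Syntax <https://goo.gl/v1drZl>`_.
"""

# reStructuredText hyperlinks look like `text <url>`_; capture what sits between < and >
links = re.findall(r"<([^<>]+)>`_", DOC)
print(links)  # ['https://goo.gl/v1drZl']
```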