Iris cube wrappers API

Wrappers of the iris cube.

These classes automate a number of netCDF processing steps, for example finding surface land fraction files, applying regions to data and returning timeseries in key regions for simple climate models.

class netcdf_scm.iris_cube_wrappers.CMIP6Input4MIPsCube

Bases: netcdf_scm.iris_cube_wrappers._CMIPCube

Cube which can be used with CMIP6 input4MIPs data

The data must match the CMIP6 Forcing Datasets Summary, specifically the Forcing Dataset Specifications.

activity_id = None

The activity_id for which we want to load data.

For these cubes, this will almost always be input4MIPs.

Type

str

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).

dataset_category = None

The dataset_category for which we want to load data e.g. GHGConcentrations

Type

str

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list

file_ext = None

The file extension of the data file we want to load e.g. .nc

Type

str

frequency = None

The frequency for which we want to load data e.g. yr

Type

str

get_area_weights(areacell_scmcube=None)

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).
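When no cell area data is available and the grid is a regular latitude-longitude grid, area weights can be derived from the cell bounds alone. The following is a minimal sketch of that idea, assuming a spherical Earth. The function name cell_area and the radius constant are illustrative assumptions; they are not part of netCDF-SCM or iris.

```python
import math

EARTH_RADIUS_M = 6371229.0  # assumed radius of a spherical Earth, in metres


def cell_area(lat_bounds_deg, lon_bounds_deg, radius=EARTH_RADIUS_M):
    """Area (m^2) of a latitude-longitude cell on a sphere (hypothetical helper)."""
    lat0, lat1 = (math.radians(v) for v in lat_bounds_deg)
    lon0, lon1 = (math.radians(v) for v in lon_bounds_deg)
    # A = R^2 * (longitude width) * (difference in sine of latitude bounds)
    return radius ** 2 * abs(lon1 - lon0) * abs(math.sin(lat1) - math.sin(lat0))


# Cells of equal angular size shrink towards the poles
equator_cell = cell_area((0.0, 10.0), (0.0, 10.0))
polar_cell = cell_area((80.0, 90.0), (0.0, 10.0))
```

Weights like these only make sense for one-dimensional latitude and longitude co-ordinates, which is consistent with the CoordinateMultiDimError documented above.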

get_data_directory()

Get the path to a data file from self’s attributes.

This can take multiple forms: it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.

Returns

path to the data file from which this cube has been/will be loaded

Return type

str

Raises

OSError – The data directory cannot be determined

get_data_filename()

Get the name of a data file from self’s attributes.

This can take multiple forms: it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.

Returns

name of the data file from which this cube has been/will be loaded.

Return type

str

Raises

OSError – The data directory cannot be determined

classmethod get_data_reference_syntax(**kwargs)

Get data reference syntax for this cube

Parameters

kwargs (str) – Attributes of the cube to set before generating the example data reference syntax.

Returns

Example of the full path to a file for the given kwargs with this cube’s data reference syntax.

Return type

str

get_filepath_from_load_data_from_identifiers_args(**kwargs)

Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.

Full details about the meaning of the identifiers are given in the Forcing Dataset Specifications.

Parameters

kwargs (str) – Identifiers to use to load the data

Returns

The full filepath (path and name) of the file to load.

Return type

str

Raises

AttributeError – An input argument does not match with the cube’s data reference syntax

get_load_data_from_identifiers_args_from_filepath(filepath)

Get the set of identifiers to use to load data from a filepath.

Parameters

filepath (str) – The filepath from which to load the data.

Returns

Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.

Return type

dict

Raises

ValueError – Path and filename contradict each other

get_metadata_cube(metadata_variable, cube=None)

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube

get_scm_timeseries(**kwargs)

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

dict of str: ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions
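The three land fraction co-ordinates described above are area-weighted averages of the land surface fraction. The following is a minimal sketch of that calculation in plain Python. The helper name land_fractions, the flat lists of cells and the choice to assign latitude 0 to the Northern Hemisphere are all assumptions made for illustration, not netCDF-SCM's implementation.

```python
def land_fractions(areas, land_fracs, latitudes):
    """Area fraction considered land: globally and per hemisphere (hypothetical)."""

    def fraction(select):
        # total area and land area of the cells picked out by `select`
        total = sum(a for a, lat in zip(areas, latitudes) if select(lat))
        land = sum(
            a * f for a, f, lat in zip(areas, land_fracs, latitudes) if select(lat)
        )
        return land / total

    return {
        "land_fraction": fraction(lambda lat: True),
        "land_fraction_northern_hemisphere": fraction(lambda lat: lat >= 0),
        "land_fraction_southern_hemisphere": fraction(lambda lat: lat < 0),
    }


# Four equal-area cells, two in each hemisphere
result = land_fractions(
    areas=[1.0, 1.0, 1.0, 1.0],
    land_fracs=[1.0, 0.5, 0.0, 0.5],
    latitudes=[45.0, 15.0, -15.0, -45.0],
)
```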

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).

  • log_failure (bool) – Should regions which fail be logged? If False, failures are raised as warnings.

Returns

Dictionary of ‘region name’: weights, key: value pairs

Return type

dict of str: np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.

get_variable_constraint()

Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.

Returns

constraint to use which ensures that only the variable of interest is loaded.

Return type

iris.Constraint

grid_label = None

The grid_label for which we want to load data e.g. gr1-GMNHSH

Type

str

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Return type

dict

institution_id = None

The institution_id for which we want to load data e.g. UoM

Type

str

lat_dim

iris.coords.DimCoord – The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D tuple of int which gives the shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple
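The lat-lon shape can be thought of as picking the latitude and longitude entries out of the full shape. The following is a minimal sketch using plain tuples and lists rather than an iris cube; the function name and the use of dimension names to find the indices are assumptions for illustration.

```python
def lat_lon_shape(shape, dim_names):
    """Pick the latitude and longitude sizes out of a full shape (hypothetical)."""
    lat_dim_number = dim_names.index("latitude")
    lon_dim_number = dim_names.index("longitude")
    return (shape[lat_dim_number], shape[lon_dim_number])


# The example above: shape (4, 3, 5, 4) with dimensions (time, lat, depth, lon)
example = lat_lon_shape((4, 3, 5, 4), ["time", "latitude", "depth", "longitude"])
```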

load_data_from_identifiers(process_warnings=True, **kwargs)

Load data using key identifiers.

The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.

Parameters
  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

  • kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.

load_data_from_path(filepath, process_warnings=True)

Load data from a path.

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

load_data_in_directory(directory=None, process_warnings=True)

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.

lon_dim

iris.coords.DimCoord – The longitude dimension of the data.

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

mip_era = None

The mip_era for which we want to load data e.g. CMIP6

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

process_filename(filename)

Cut a filename into its identifiers

Parameters

filename (str) – The filename to process. Filename here means just the filename, no path should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input filename

Return type

dict
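Cutting a filename into identifiers amounts to splitting it on underscores and pairing each piece with an identifier name. The sketch below assumes the identifier order variable_id, activity_id, dataset_category, target_mip, source_id, grid_label and an optional time_range. That order, the function name and the example filename (built from the example values on this page) are assumptions and should be checked against the Forcing Dataset Specifications.

```python
import os

# Assumed identifier order for input4MIPs filenames (time_range is optional)
FILENAME_IDS = [
    "variable_id", "activity_id", "dataset_category", "target_mip",
    "source_id", "grid_label", "time_range",
]


def split_filename(filename):
    """Cut a filename into a dictionary of identifiers (hypothetical helper)."""
    stem, file_ext = os.path.splitext(filename)
    parts = stem.split("_")
    if len(parts) not in (len(FILENAME_IDS) - 1, len(FILENAME_IDS)):
        raise ValueError("cannot process filename: {}".format(filename))
    identifiers = dict(zip(FILENAME_IDS, parts))
    identifiers.setdefault("time_range", None)  # metadata files have no time range
    identifiers["file_ext"] = file_ext
    return identifiers


ids = split_filename(
    "mole-fraction-of-carbon-dioxide-in-air_input4MIPs_GHGConcentrations_"
    "ScenarioMIP_UoM-REMIND-MAGPIE-ssp585-1-2-0_gr1-GMNHSH_2005-2100.nc"
)
```

Splitting on underscores works here because the identifiers themselves use hyphens rather than underscores.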

process_path(path)

Cut a path into its identifiers

Parameters

path (str) – The path to process. Path here means just the path, no filename should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input path

Return type

dict

realm = None

The realm for which we want to load data e.g. atmos

Type

str

root_dir = None

The root directory of the database i.e. where the cube should start its path, e.g. /home/users/usertim/cmip6input.

Type

str

source_id = None

The source_id for which we want to load data e.g. UoM-REMIND-MAGPIE-ssp585-1-2-0

This must include the institution_id.

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

target_mip = None

The target_mip for which we want to load data e.g. ScenarioMIP

Type

str

time_dim

iris.coords.DimCoord – The time dimension of the data.

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see regular expressions.

Type

_sre.SRE_Pattern

time_range = None

The time range for which we want to load data e.g. 2005-2100

If None, this information isn’t included in the filename which is useful for loading metadata files which don’t have a relevant time period.

Type

str

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
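To illustrate how a dictionary keyed by timestamp length can be used, the sketch below parses a time range into datetimes. The dictionary is a simplified, hypothetical stand-in for the property described here: the regular expressions are guesses and expected_timestep is left out to avoid a dependency on dateutil.

```python
import datetime as dt

# Hypothetical, simplified stand-in keyed by timestamp length
TIMESTAMP_DEFINITIONS = {
    4: {"datetime_str": "%Y", "generic_regexp": r"\d{4}"},
    6: {"datetime_str": "%Y%m", "generic_regexp": r"\d{6}"},
    8: {"datetime_str": "%Y%m%d", "generic_regexp": r"\d{8}"},
    10: {"datetime_str": "%Y%m%d%H", "generic_regexp": r"\d{10}"},
}


def parse_time_range(time_range):
    """Convert e.g. "198001-198412" into a (start, end) pair of datetimes."""
    start, end = time_range.split("-")
    if len(start) != len(end):
        raise ValueError("inconsistent timestamps: {}".format(time_range))
    # look up the strptime format from the length of the timestamp
    fmt = TIMESTAMP_DEFINITIONS[len(start)]["datetime_str"]
    return dt.datetime.strptime(start, fmt), dt.datetime.strptime(end, fmt)


start, end = parse_time_range("198001-198412")
```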

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"

variable_id = None

The variable_id for which we want to load data e.g. mole-fraction-of-carbon-dioxide-in-air

Type

str

version = None

The version for which we want to load data e.g. v20180427

Type

str

class netcdf_scm.iris_cube_wrappers.CMIP6OutputCube

Bases: netcdf_scm.iris_cube_wrappers._CMIPCube

Cube which can be used with CMIP6 model output data

The data must match the CMIP6 data reference syntax as specified in the ‘File name template’ and ‘Directory structure template’ sections of the CMIP6 Data Reference Syntax.

activity_id = None

The activity_id for which we want to load data.

In CMIP6, this denotes the responsible MIP e.g. DCPP.

Type

str

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list

experiment_id = None

The experiment_id for which we want to load data e.g. dcppA-hindcast

Type

str

file_ext = None

The file extension of the data file we want to load e.g. .nc

Type

str

get_area_weights(areacell_scmcube=None)

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).

get_data_directory()

Get the path to a data file from self’s attributes.

This can take multiple forms: it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.

Returns

path to the data file from which this cube has been/will be loaded

Return type

str

Raises

OSError – The data directory cannot be determined

get_data_filename()

Get the name of a data file from self’s attributes.

This can take multiple forms: it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.

Returns

name of the data file from which this cube has been/will be loaded.

Return type

str

Raises

OSError – The data directory cannot be determined

classmethod get_data_reference_syntax(**kwargs)

Get data reference syntax for this cube

Parameters

kwargs (str) – Attributes of the cube to set before generating the example data reference syntax.

Returns

Example of the full path to a file for the given kwargs with this cube’s data reference syntax.

Return type

str

get_filepath_from_load_data_from_identifiers_args(**kwargs)

Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.

Full details about the meaning of each identifier are given in Table 1 of the CMIP6 Data Reference Syntax.

Parameters

kwargs (str) – Identifiers to use to load the data

Returns

The full filepath (path and name) of the file to load.

Return type

str

Raises

AttributeError – An input argument does not match with the cube’s data reference syntax
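The mapping from identifiers to a filepath can be sketched as joining the identifiers in a fixed order. The directory and filename templates below are assumed from the CMIP6 Data Reference Syntax, the helper build_filepath is hypothetical, and the values are the example values used on this page. Treat it as an illustration rather than the method's actual implementation.

```python
import posixpath

# Assumed identifier order, following the CMIP6 Data Reference Syntax templates
DIRECTORY_IDS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]
FILENAME_IDS = [
    "variable_id", "table_id", "source_id", "experiment_id", "member_id",
    "grid_label",
]


def build_filepath(root_dir, time_range=None, file_ext=".nc", **identifiers):
    """Combine identifiers into a full filepath (hypothetical helper)."""
    unknown = sorted(set(identifiers) - set(DIRECTORY_IDS))
    if unknown:
        raise AttributeError("unexpected identifiers: {}".format(unknown))
    missing = sorted(set(DIRECTORY_IDS) - set(identifiers))
    if missing:
        raise AttributeError("missing identifiers: {}".format(missing))
    directory = posixpath.join(root_dir, *[identifiers[k] for k in DIRECTORY_IDS])
    name_parts = [identifiers[k] for k in FILENAME_IDS]
    if time_range is not None:
        name_parts.append(time_range)  # omitted for time-independent metadata files
    return posixpath.join(directory, "_".join(name_parts) + file_ext)


example_path = build_filepath(
    "/home/users/usertim/cmip6_data",
    time_range="198001-198412",
    mip_era="CMIP6",
    activity_id="DCPP",
    institution_id="CNRM-CERFACS",
    source_id="CNRM-CM6-1",
    experiment_id="dcppA-hindcast",
    member_id="s1960-r2i1p1f3",
    table_id="day",
    variable_id="pr",
    grid_label="grn",
    version="v20160215",
)
```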

classmethod get_instance_id(filepath)

Get the instance_id from a given path

This is used as a unique identifier for datasets on the ESGF.

Parameters

filepath (str) – Full file path including directory structure

Raises

ValueError – The filepath provided results in an instance id which is obviously incorrect

Returns

Instance ID

Return type

str
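One plausible way to derive such an identifier is to join the directory components of the data reference syntax with dots. The sketch below assumes that the last ten directories of the path are mip_era through to version and that the first of them starts with "CMIP"; both assumptions, and the function name, are illustrative only.

```python
import posixpath

N_DRS_DIRECTORIES = 10  # mip_era ... version (assumed directory template)


def instance_id_from_filepath(filepath):
    """Join the data reference syntax directories with dots (hypothetical)."""
    directory = posixpath.dirname(filepath)
    parts = [p for p in directory.split("/") if p]
    if len(parts) < N_DRS_DIRECTORIES:
        raise ValueError("path is too short: {}".format(filepath))
    drs_parts = parts[-N_DRS_DIRECTORIES:]
    if not drs_parts[0].startswith("CMIP"):
        # a simple sanity check that the result looks like an instance id
        raise ValueError("unexpected mip_era: {}".format(drs_parts[0]))
    return ".".join(drs_parts)


instance_id = instance_id_from_filepath(
    "/home/users/usertim/cmip6_data/CMIP6/DCPP/CNRM-CERFACS/CNRM-CM6-1/"
    "dcppA-hindcast/s1960-r2i1p1f3/day/pr/grn/v20160215/"
    "pr_day_CNRM-CM6-1_dcppA-hindcast_s1960-r2i1p1f3_grn_198001-198412.nc"
)
```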

get_load_data_from_identifiers_args_from_filepath(filepath)

Get the set of identifiers to use to load data from a filepath.

Parameters

filepath (str) – The filepath from which to load the data.

Returns

Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.

Return type

dict

Raises

ValueError – Path and filename contradict each other
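The ValueError arises because several identifiers appear in both the directory path and the filename, so the two can disagree. The following is a minimal sketch of such a consistency check, again assuming the CMIP6 Data Reference Syntax templates; the helper name and the returned keys are hypothetical.

```python
import posixpath

# Assumed identifier order for directories and filenames
DIRECTORY_IDS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]
FILENAME_IDS = [
    "variable_id", "table_id", "source_id", "experiment_id", "member_id",
    "grid_label", "time_range",
]


def identifiers_from_filepath(filepath):
    """Recover identifiers from a filepath, checking path against filename."""
    directory, filename = posixpath.split(filepath)
    dir_parts = directory.split("/")
    from_path = dict(zip(DIRECTORY_IDS, dir_parts[-len(DIRECTORY_IDS):]))
    stem, file_ext = posixpath.splitext(filename)
    from_name = dict(zip(FILENAME_IDS, stem.split("_")))
    # identifiers present in both places must agree
    for key in sorted(set(from_path) & set(from_name)):
        if from_path[key] != from_name[key]:
            raise ValueError("path and filename disagree on {}".format(key))
    identifiers = dict(from_path, **from_name)
    identifiers.setdefault("time_range", None)
    identifiers["file_ext"] = file_ext
    identifiers["root_dir"] = "/".join(dir_parts[:-len(DIRECTORY_IDS)])
    return identifiers


drs_dir = (
    "/home/users/usertim/cmip6_data/CMIP6/DCPP/CNRM-CERFACS/CNRM-CM6-1/"
    "dcppA-hindcast/s1960-r2i1p1f3/day/pr/grn/v20160215/"
)
ids = identifiers_from_filepath(
    drs_dir + "pr_day_CNRM-CM6-1_dcppA-hindcast_s1960-r2i1p1f3_grn_198001-198412.nc"
)
```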

get_metadata_cube(metadata_variable, cube=None)

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube

get_scm_timeseries(**kwargs)

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

dict of str: ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).

  • log_failure (bool) – Should regions which fail be logged? If False, failures are raised as warnings.

Returns

Dictionary of ‘region name’: weights, key: value pairs

Return type

dict of str: np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
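The difference between the two cell_weights options can be seen in a small worked example for a land region. The sketch below uses plain lists instead of cubes, and it assumes that a cell belongs to the land region whenever its land fraction is greater than zero. That membership rule and the helper names are assumptions for illustration, not netCDF-SCM's implementation.

```python
def region_weights(areas, land_fractions, in_region, cell_weights="area-only"):
    """Return one weight per cell for a land region (hypothetical helper)."""
    if cell_weights not in ("area-only", "area-surface-fraction"):
        raise ValueError("unknown cell_weights: {}".format(cell_weights))
    weights = []
    for area, frac, inside in zip(areas, land_fractions, in_region):
        if not inside:
            weights.append(0.0)  # cells outside the region are always excluded
        elif cell_weights == "area-only":
            weights.append(area)
        else:
            weights.append(area * frac)
    return weights


def weighted_mean(values, weights):
    """Weighted mean of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


areas = [1.0, 1.0, 2.0]
land_fractions = [1.0, 0.5, 0.0]
in_land_region = [frac > 0 for frac in land_fractions]  # assumed membership rule
values = [10.0, 20.0, 30.0]

area_only = weighted_mean(
    values, region_weights(areas, land_fractions, in_land_region)
)
with_fraction = weighted_mean(
    values,
    region_weights(areas, land_fractions, in_land_region, "area-surface-fraction"),
)
```

With 'area-only' the two land cells count equally, whereas with 'area-surface-fraction' the half-land cell contributes half as much, which pulls the mean towards the fully land cell.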

get_variable_constraint()

Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.

Returns

constraint to use which ensures that only the variable of interest is loaded.

Return type

iris.Constraint

grid_label = None

The grid_label for which we want to load data e.g. grn

Type

str

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Return type

dict

institution_id = None

The institution_id for which we want to load data e.g. CNRM-CERFACS

Type

str

lat_dim

iris.coords.DimCoord – The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D tuple of int which gives the shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple

load_data_from_identifiers(process_warnings=True, **kwargs)

Load data using key identifiers.

The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.

Parameters
  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

  • kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.

load_data_from_path(filepath, process_warnings=True)

Load data from a path.

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

load_data_in_directory(directory=None, process_warnings=True)

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.

lon_dim

iris.coords.DimCoord – The longitude dimension of the data.

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

member_id = None

The member_id for which we want to load data e.g. s1960-r2i1p1f3

Type

str

mip_era = None

The mip_era for which we want to load data e.g. CMIP6

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

process_filename(filename)

Cut a filename into its identifiers

Parameters

filename (str) – The filename to process. Filename here means just the filename, no path should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input filename

Return type

dict

process_path(path)

Cut a path into its identifiers

Parameters

path (str) – The path to process. Path here means just the path, no filename should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input path

Return type

dict

root_dir = None

The root directory of the database i.e. where the cube should start its path, e.g. /home/users/usertim/cmip6_data.

Type

str

source_id = None

The source_id for which we want to load data e.g. CNRM-CM6-1

This was known as model in CMIP5.

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_id = None

The table_id for which we want to load data. e.g. day

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

time_dim

iris.coords.DimCoord – The time dimension of the data.

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see regular expressions.

Type

_sre.SRE_Pattern

time_range = None

The time range for which we want to load data e.g. 198001-198412

If None, this information isn’t included in the filename which is useful for loading metadata files which don’t have a relevant time period.

Type

str
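
As an illustration of the format, a time range such as 198001-198412 can be split into its two timestamps with the standard library. This is only a sketch: split_time_range is a hypothetical helper, and it assumes both stamps are six-digit YYYYMM strings.

```python
from datetime import datetime


def split_time_range(time_range, separator="-"):
    """Split a YYYYMM-YYYYMM time range into start and end datetimes."""
    start, end = time_range.split(separator)
    fmt = "%Y%m"  # assumption: monthly stamps; other lengths need other formats
    return datetime.strptime(start, fmt), datetime.strptime(end, fmt)


start, end = split_time_range("198001-198412")
```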

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp

Returns

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"

variable_id = None

The variable_id for which we want to load data e.g. pr

Type

str

version = None

The version for which we want to load data e.g. v20160215

Type

str

class netcdf_scm.iris_cube_wrappers.MarbleCMIP5Cube[source]

Bases: netcdf_scm.iris_cube_wrappers._CMIPCube

Cube which can be used with the cmip5 directory on marble (identical to ETH Zurich’s archive).

This directory structure is very similar, but not quite identical, to the recommended CMIP5 directory structure described in section 3.1 of the CMIP5 Data Reference Syntax.

activity = None

The activity for which we want to load data e.g. ‘cmip5’

Type

str

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str
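
A rough sketch of how the name of a cell area file might be derived from a data filename. guess_metadata_filename is a hypothetical helper, and the defaults (the areacella variable, the fx table and the r0i0p0 ensemble member) are assumptions based on common CMIP5 conventions rather than values taken from netCDF-SCM.

```python
def guess_metadata_filename(data_filename, metadata_var, table="fx",
                            ensemble_member="r0i0p0", file_ext=".nc"):
    """Guess the name of the metadata file which belongs to a data file."""
    stem = data_filename[: -len(file_ext)]
    # CMIP5-style stems read variable_table_model_experiment_ensemble[_period]
    _variable, _table, model, experiment = stem.split("_")[:4]
    parts = [metadata_var, table, model, experiment, ensemble_member]
    return "_".join(parts) + file_ext


areacell_filename = guess_metadata_filename(
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc", "areacella"
)
```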

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list
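
Because entries can be None, code which looks up a dimension by name should handle a missing name explicitly. A small sketch with a made-up dim_names list; dim_number is a hypothetical helper, not a netCDF-SCM function.

```python
# Hypothetical output: the second dimension has no standard_name
dim_names = ["time", None, "latitude", "longitude"]


def dim_number(names, name):
    """Return the zero-based index of name, or raise a clear error."""
    try:
        return names.index(name)
    except ValueError:
        raise KeyError("no dimension with standard_name {!r}".format(name)) from None


time_number = dim_number(dim_names, "time")
lon_number = dim_number(dim_names, "longitude")
unnamed = [i for i, n in enumerate(dim_names) if n is None]
```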

ensemble_member = None

The ensemble member for which we want to load data e.g. ‘r1i1p1’

Type

str

experiment = None

The experiment for which we want to load data e.g. ‘1pctCO2’

Type

str

file_ext = None

The file extension of the data file we want to load e.g. ‘.nc’

Type

str

get_area_weights(areacell_scmcube=None)

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).

get_data_directory()

Get the path to a data file from self’s attributes.

This can take multiple forms: it may simply return a previously set filepath attribute, or it may combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.

Returns

path to the data file from which this cube has been/will be loaded

Return type

str

Raises

OSError – The data directory cannot be determined

get_data_filename()

Get the name of a data file from self’s attributes.

This can take multiple forms: it may simply return a previously set filename attribute, or it may combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.

Returns

name of the data file from which this cube has been/will be loaded.

Return type

str

Raises

OSError – The data directory cannot be determined

classmethod get_data_reference_syntax(**kwargs)

Get data reference syntax for this cube

Parameters

kwargs (str) – Attributes of the cube to set before generating the example data reference syntax.

Returns

Example of the full path to a file for the given kwargs with this cube’s data reference syntax.

Return type

str

get_filepath_from_load_data_from_identifiers_args(**kwargs)[source]

Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.

Full details about the identifiers are given in Section 2 of the CMIP5 Data Reference Syntax.

Parameters

kwargs (str) – Identifiers to use to load the data

Returns

The full filepath (path and name) of the file to load.

Return type

str

Raises

AttributeError – An input argument does not match with the cube’s data reference syntax

get_load_data_from_identifiers_args_from_filepath(filepath)

Get the set of identifiers to use to load data from a filepath.

Parameters

filepath (str) – The filepath from which to load the data.

Returns

Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.

Return type

dict

Raises

ValueError – Path and filename contradict each other
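
The consistency check can be sketched as follows. This is not the library's implementation: identifiers_from_filepath is a hypothetical helper, and the directory order in PATH_KEYS is an assumption which should be checked against the output of get_data_reference_syntax().

```python
import posixpath

# Assumed directory order (hypothetical); the filename order follows CMIP5
PATH_KEYS = ["activity", "experiment", "mip_table", "variable_name",
             "model", "ensemble_member"]
NAME_KEYS = ["variable_name", "mip_table", "model", "experiment",
             "ensemble_member", "time_period"]


def identifiers_from_filepath(filepath):
    """Split a filepath into identifiers, checking path against filename."""
    directory, filename = posixpath.split(filepath)
    path_ids = dict(zip(PATH_KEYS, directory.split("/")[-len(PATH_KEYS):]))
    stem, file_ext = posixpath.splitext(filename)
    name_ids = dict(zip(NAME_KEYS, stem.split("_")))
    for key in sorted(set(path_ids) & set(name_ids)):
        if path_ids[key] != name_ids[key]:
            raise ValueError(
                "Path and filename contradict each other for {}: "
                "{!r} vs {!r}".format(key, path_ids[key], name_ids[key])
            )
    identifiers = dict(path_ids)
    identifiers.update(name_ids)
    identifiers["file_ext"] = file_ext
    return identifiers


good_path = ("/data/cmip5/rcp45/Amon/tas/HadCM3/r1i1p1/"
             "tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc")
identifiers = identifiers_from_filepath(good_path)
```

A path whose model directory is CanESM2 but whose filename says HadCM3 would raise ValueError in this sketch.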

get_metadata_cube(metadata_variable, cube=None)

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube

get_scm_timeseries(**kwargs)

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched, i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

Dictionary of region: cube key-value pairs, with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

dict of str: ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions
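
To make the three land fraction co-ordinates concrete, here is a toy calculation on a two-cell grid. It is a sketch under stated assumptions: land_fraction here is a hypothetical function, cells are (latitude, area, land fraction between 0 and 1) tuples, and the equator is assigned to the Northern Hemisphere.

```python
def land_fraction(cells, hemisphere=None):
    """Area-weighted land fraction of the globe or of one hemisphere."""
    if hemisphere == "northern":
        cells = [c for c in cells if c[0] >= 0]
    elif hemisphere == "southern":
        cells = [c for c in cells if c[0] < 0]
    total_area = sum(area for _, area, _ in cells)
    land_area = sum(area * frac for _, area, frac in cells)
    return land_area / total_area


# (latitude in degrees, cell area, land fraction)
toy_cells = [(45.0, 1.0, 0.5), (-45.0, 1.0, 0.25)]
global_fraction = land_fraction(toy_cells)
northern_fraction = land_fraction(toy_cells, "northern")
southern_fraction = land_fraction(toy_cells, "southern")
```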

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) –

    How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).

  • log_failure (bool) – Should regions which fail be logged? If no, failures are raised as warnings.

Returns

Dictionary of ‘region name’: weights key-value pairs

Return type

dict of str: np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.

get_variable_constraint()

Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.

Returns

constraint to use which ensures that only the variable of interest is loaded.

Return type

iris.Constraint

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Returns

Return type

dict

lat_dim

iris.coords.DimCoord The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D Tuple of int which gives shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple
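
The example above can be reproduced with a few lines of plain Python. This is only an illustration of the indexing; lat_lon_shape here is a hypothetical free function and the short dimension names are taken from the example, not from the library.

```python
def lat_lon_shape(shape, dims, lat_name="lat", lon_name="lon"):
    """Pick the latitude and longitude lengths out of a cube's shape."""
    assert lat_name in dims and lon_name in dims, "no lat-lon slice can be deduced"
    return (shape[dims.index(lat_name)], shape[dims.index(lon_name)])


slice_shape = lat_lon_shape((4, 3, 5, 4), ("time", "lat", "depth", "lon"))
```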

load_data_from_identifiers(process_warnings=True, **kwargs)

Load data using key identifiers.

The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.

Parameters
  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

  • kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.

load_data_from_path(filepath, process_warnings=True)

Load data from a path.

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

load_data_in_directory(directory=None, process_warnings=True)

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
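
The two conditions behind this ValueError can be sketched with the standard library. This is not the library's implementation: check_continuous and month_after are hypothetical helpers, and the sketch assumes monthly data with YYYYMM-YYYYMM time periods and a .nc extension.

```python
from datetime import datetime


def month_after(stamp):
    """Return the YYYYMM stamp of the month following stamp."""
    date = datetime.strptime(stamp, "%Y%m")
    if date.month == 12:
        year, month = date.year + 1, 1
    else:
        year, month = date.year, date.month + 1
    return "{:04d}{:02d}".format(year, month)


def check_continuous(filenames):
    """Raise ValueError unless the files are one run and one continuous series."""
    prefixes = set()
    periods = []
    for name in filenames:
        prefix, period = name[: -len(".nc")].rsplit("_", 1)
        prefixes.add(prefix)
        start, end = period.split("-")
        periods.append((start, end))
    if len(prefixes) != 1:
        raise ValueError("files are not from the same run")
    periods.sort()
    for (_, prev_end), (next_start, _) in zip(periods, periods[1:]):
        if month_after(prev_end) != next_start:
            raise ValueError("gap between {} and {}".format(prev_end, next_start))
    return periods


files = [
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc",
]
periods = check_continuous(files)
```

Dropping the middle file leaves a gap between 203012 and 203601, so the same call would raise ValueError.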

lon_dim

iris.coords.DimCoord The longitude dimension of the data.

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

mip_era = 'CMIP5'

The MIP era to which this cube belongs

Type

str

mip_table = None

The mip_table for which we want to load data e.g. ‘Amon’

Type

str

model = None

The model for which we want to load data e.g. ‘CanESM2’

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

process_filename(filename)[source]

Cut a filename into its identifiers

Parameters

filename (str) – The filename to process. Filename here means just the filename, no path should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input filename

Return type

dict

process_path(path)[source]

Cut a path into its identifiers

Parameters

path (str) – The path to process. Path here means just the path, no filename should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input path

Return type

dict

root_dir = None

The root directory of the database i.e. where the cube should start its path

e.g. /home/users/usertim/cmip5_25x25

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

time_dim

iris.coords.DimCoord The time dimension of the data.

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_period = None

The time period for which we want to load data

If None, this information isn’t included in the filename which is useful for loading metadata files which don’t have a relevant time period.

Type

str
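
A sketch of how a None time period drops out of the filename. build_filename is a hypothetical helper; the element order follows the CMIP5-style filenames used as examples in this documentation, and the sftlf/fx/r0i0p0 values are illustrative assumptions.

```python
def build_filename(variable_name, mip_table, model, experiment,
                   ensemble_member, time_period=None, file_ext=".nc"):
    """Join identifiers into a filename, skipping the time period if None."""
    parts = [variable_name, mip_table, model, experiment, ensemble_member]
    if time_period is not None:
        parts.append(time_period)
    return "_".join(parts) + file_ext


data_name = build_filename("tas", "Amon", "HadCM3", "rcp45", "r1i1p1",
                           time_period="200601-203012")
metadata_name = build_filename("sftlf", "fx", "HadCM3", "rcp45", "r0i0p0")
```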

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see regular expressions.

Type

_sre.SRE_Pattern

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp

Returns

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"

variable_name = None

The variable for which we want to load data e.g. ‘tas’

Type

str

class netcdf_scm.iris_cube_wrappers.ScmCube[source]

Bases: object

Class for processing netCDF files for use with simple climate models.

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)[source]

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).

cube = None

The Iris cube which is wrapped by this ScmCube instance.

Type

iris.cube.Cube

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list

get_area_weights(areacell_scmcube=None)[source]

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).
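
When no cell area data is available, weights for a regular latitude-longitude grid can be computed from the grid geometry. The sketch below shows the underlying formula in plain Python; it is not iris's implementation, regular_grid_area_weights is a hypothetical helper and the Earth radius is an assumed round value in metres.

```python
import math


def regular_grid_area_weights(lat_bounds, n_lon, radius=6371000.0):
    """Cell areas (m^2) for a regular grid: r^2 * dlon * |sin(north) - sin(south)|."""
    dlon = 2 * math.pi / n_lon
    weights = []
    for south, north in lat_bounds:
        band = radius ** 2 * dlon * abs(
            math.sin(math.radians(north)) - math.sin(math.radians(south))
        )
        # Every cell in a latitude band has the same area
        weights.append([band] * n_lon)
    return weights


weights = regular_grid_area_weights([(-90, -30), (-30, 30), (30, 90)], n_lon=4)
total_area = sum(sum(row) for row in weights)
```

Summed over the whole globe the weights recover the surface area of the sphere, which is a useful sanity check on any set of area weights.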

get_metadata_cube(metadata_variable, cube=None)[source]

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube.

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube

get_scm_timeseries(**kwargs)[source]

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)[source]

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched, i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

Dictionary of region: cube key-value pairs, with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

dict of str: ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)[source]

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) –

    How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).

  • log_failure (bool) – Should regions which fail be logged? If no, failures are raised as warnings.

Returns

Dictionary of ‘region name’: weights key-value pairs

Return type

dict of str: np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
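
The difference between the two cell_weights options is easiest to see on a tiny example. The sketch below is not the library's code: land_weights and weighted_mean are hypothetical helpers working on flat lists, land fractions are given in percent, and treating any cell with a non-zero land fraction as belonging to the land region is an assumption.

```python
def land_weights(areas, land_fractions, cell_weights):
    """Weights for a land region under the two cell_weights options."""
    if cell_weights == "area-surface-fraction":
        return [a * f / 100 for a, f in zip(areas, land_fractions)]
    if cell_weights == "area-only":
        # Cells with no land are excluded; the rest count with their full area
        return [a if f > 0 else 0.0 for a, f in zip(areas, land_fractions)]
    raise ValueError("unknown cell_weights: {!r}".format(cell_weights))


def weighted_mean(values, weights):
    """Weighted mean of values; the weights need not be normalised."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


areas = [1.0, 1.0, 2.0]
land_fractions = [100, 50, 0]  # percent
values = [10.0, 20.0, 30.0]

mean_fraction = weighted_mean(
    values, land_weights(areas, land_fractions, "area-surface-fraction")
)
mean_area_only = weighted_mean(
    values, land_weights(areas, land_fractions, "area-only")
)
```

Here the half-land cell counts half as much under 'area-surface-fraction', which pulls the land mean towards the fully-land cell; under 'area-only' both land cells count equally.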

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Returns

Return type

dict

lat_dim

iris.coords.DimCoord The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D Tuple of int which gives shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple

lat_name = 'latitude'

The expected name of the latitude co-ordinate in data.

Type

str

load_data_from_path(filepath, process_warnings=True)[source]

Load data from a path.

If you are using the ScmCube class directly, this method simply loads the path into an iris cube which can be accessed through self.cube.

If implemented on a subclass of ScmCube, this method should:

  • use self.get_load_data_from_identifiers_args_from_filepath to determine the suitable set of arguments to pass to self.load_data_from_identifiers from the filepath

  • load the data using self.load_data_from_identifiers as this method contains much better checks and helper components

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?
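
The delegation pattern described for subclasses can be sketched with toy classes. Nothing here is netCDF-SCM code: ToyBaseCube and ToySubclassCube are hypothetical, the "cube" is a plain dict, and the identifier parsing is deliberately simplistic.

```python
class ToyBaseCube:
    """Hypothetical base wrapper which only records where it loaded from."""

    def __init__(self):
        self.cube = None

    def load_data_from_path(self, filepath, process_warnings=True):
        self.cube = {"loaded_from": filepath}


class ToySubclassCube(ToyBaseCube):
    """Hypothetical subclass which routes path loading through identifiers."""

    def get_load_data_from_identifiers_args_from_filepath(self, filepath):
        directory, _, filename = filepath.rpartition("/")
        return {
            "root_dir": directory,
            "filename": filename,
            "variable_name": filename.split("_")[0],
        }

    def load_data_from_identifiers(self, process_warnings=True, **kwargs):
        # A real implementation would run its checks and helpers here
        filepath = "{root_dir}/{filename}".format(**kwargs)
        super().load_data_from_path(filepath, process_warnings=process_warnings)
        self.identifiers = kwargs

    def load_data_from_path(self, filepath, process_warnings=True):
        kwargs = self.get_load_data_from_identifiers_args_from_filepath(filepath)
        self.load_data_from_identifiers(process_warnings=process_warnings, **kwargs)


toy = ToySubclassCube()
toy.load_data_from_path("/data/tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc")
```

Routing both entry points through load_data_from_identifiers means any checks only need to live in one place.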

load_data_in_directory(directory=None, process_warnings=True)[source]

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.

lon_dim

iris.coords.DimCoord The longitude dimension of the data.

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

lon_name = 'longitude'

The expected name of the longitude co-ordinate in data.

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

time_dim

iris.coords.DimCoord The time dimension of the data.

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_name = 'time'

The expected name of the time co-ordinate in data.

Type

str

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see regular expressions.

Type

_sre.SRE_Pattern
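
For illustration, a pattern of this kind might look like the one below. The pattern is a hypothetical example written for this sketch, not the value of time_period_regex; it assumes the time period is the last underscore-separated element before a .nc extension.

```python
import re

# Hypothetical pattern: one stamp of 4 to 10 digits, optionally followed by
# "-" and a second stamp, sitting between the last "_" and ".nc"
EXAMPLE_TIME_PERIOD_REGEX = re.compile(r"_(\d{4,10}(?:-\d{4,10})?)\.nc$")

with_range = EXAMPLE_TIME_PERIOD_REGEX.search(
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc"
)
single_stamp = EXAMPLE_TIME_PERIOD_REGEX.search(
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc"
)
no_period = EXAMPLE_TIME_PERIOD_REGEX.search("sftlf_fx_HadCM3_rcp45_r0i0p0.nc")
```

The last search finds nothing, which matches the case of metadata files whose names carry no time period.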

time_period_separator = '-'

Character used to separate time period strings in the time period indicator in filenames.

e.g. - is the ‘time period separator’ in “2015-2030”.

Type

str

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
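
The structure described by this list can be sketched as a dictionary keyed by timestamp length. This is a hypothetical reconstruction, not the attribute's actual value: the regular expressions are assumptions, and expected_timestep is left out so that the sketch only needs the standard library.

```python
from datetime import datetime

# Keys are timestamp lengths; values describe how to handle that length
EXAMPLE_TIMESTAMP_DEFINITIONS = {
    len("2012"): {"datetime_str": "%Y", "generic_regexp": r"\d{4}"},
    len("201201"): {"datetime_str": "%Y%m", "generic_regexp": r"\d{6}"},
    len("20120101"): {"datetime_str": "%Y%m%d", "generic_regexp": r"\d{8}"},
    len("2012010100"): {"datetime_str": "%Y%m%d%H", "generic_regexp": r"\d{10}"},
}


def parse_timestamp(stamp):
    """Convert a CMIP-style timestamp into a datetime via its length."""
    definition = EXAMPLE_TIMESTAMP_DEFINITIONS[len(stamp)]
    return datetime.strptime(stamp, definition["datetime_str"])


yearly = parse_timestamp("2012")
monthly = parse_timestamp("201203")
hourly = parse_timestamp("2012010106")
```

Keying on length works because the four permitted formats all have different lengths, so the length alone identifies the format.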

Returns

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"