Iris cube wrappers API¶
Wrappers of the iris cube.
These classes automate a number of netCDF processing steps, for example finding surface land fraction files, applying regions to data, and returning timeseries in key regions for simple climate models.
-
class netcdf_scm.iris_cube_wrappers.CMIP6Input4MIPsCube [source]¶
Bases: netcdf_scm.iris_cube_wrappers._CMIPCube
Cube which can be used with CMIP6 input4MIPs data.
The data must match the CMIP6 Forcing Datasets Summary, specifically the Forcing Dataset Specifications.
-
activity_id = None¶
The activity_id for which we want to load data.
For these cubes, this will almost always be input4MIPs.
- Type
-
areacell_var¶
The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.
- Type
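As a sketch of the idea (the exact file-naming rules are project-specific; the helper name `guess_areacell_filename` and the default `areacella` are illustrative assumptions, not netCDF-SCM's API):

```python
# Illustrative only: derive a cell-area filename from a data filename by
# swapping the data variable name for the cell-area variable name
# ("areacella" is the CMIP cell-area variable for atmospheric grids).
def guess_areacell_filename(data_filename, variable, areacell_var="areacella"):
    # Replace only the first occurrence, i.e. the leading variable component
    return data_filename.replace(variable, areacell_var, 1)

print(guess_areacell_filename("tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc", "tas"))
# → areacella_Amon_HadCM3_rcp45_r1i1p1_200601.nc
```

In practice the table component (e.g. Amon) may also differ for cell-area files, which is why the documentation hedges with "might be as simple as".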
-
convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)¶
Convert a dictionary of SCM timeseries cubes to an scmdata.ScmRun.
- Parameters
- Returns
scmdata.ScmRun containing the data from the SCM timeseries cubes
- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dataset_category = None¶
The dataset_category for which we want to load data, e.g. GHGConcentrations.
- Type
-
dim_names¶
Names of the dimensions in this cube.
Here the names are the standard_names, which means there can be None in the output.
- Type
-
get_area_weights(areacell_scmcube=None)¶
Get area weights for this cube.
- Parameters
areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.
- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (contradicting self._area_weights_units).
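When no cell-area file is available, area weights on a regular latitude-longitude grid are, to a good approximation, proportional to the cosine of latitude. A stdlib-only sketch of that idea (illustrative, not the iris implementation, which works from cell bounds):

```python
import math

# Illustrative only: on a regular latitude-longitude grid, the area of a
# cell is approximately proportional to cos(latitude), which is the key
# ingredient of area weighting when no areacell file is available.
def cos_lat_weights(latitudes_deg, n_lon):
    """Return a lat x lon nested list of (unnormalised) area weights."""
    return [[math.cos(math.radians(lat))] * n_lon for lat in latitudes_deg]

weights = cos_lat_weights([-45.0, 0.0, 45.0], n_lon=4)
# The equatorial row gets the largest weight, the poleward rows smaller ones.
```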
-
get_data_directory()¶
Get the path to a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.
-
get_data_filename()¶
Get the name of a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.
-
classmethod get_data_reference_syntax(**kwargs)¶
Get data reference syntax for this cube.
-
get_filepath_from_load_data_from_identifiers_args(**kwargs) [source]¶
Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.
Full details about the meaning of the identifiers are given in the Forcing Dataset Specifications.
- Parameters
kwargs (str) – Identifiers to use to load the data
- Returns
The full filepath (path and name) of the file to load.
- Return type
- Raises
AttributeError – An input argument does not match the cube’s data reference syntax
-
get_load_data_from_identifiers_args_from_filepath(filepath)¶
Get the set of identifiers to use to load data from a filepath.
- Parameters
filepath (str) – The filepath from which to load the data.
- Returns
Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.
- Return type
- Raises
ValueError – Path and filename contradict each other
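The ValueError above guards against a directory and a filename that encode different identifiers. A toy illustration of that check (the function and the dictionary keys are hypothetical, not netCDF-SCM's internals):

```python
# Toy illustration: identifiers recovered from the directory must agree
# with those recovered from the filename wherever both supply a value.
def check_path_filename_consistent(ids_from_path, ids_from_filename):
    for key in set(ids_from_path) & set(ids_from_filename):
        if ids_from_path[key] != ids_from_filename[key]:
            raise ValueError(
                "Path and filename contradict each other: {}".format(key)
            )

# Consistent: the shared "model" identifier matches, so no error is raised.
check_path_filename_consistent(
    {"model": "HadCM3", "experiment": "rcp45"},
    {"model": "HadCM3", "time_period": "200601-203012"},
)
```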
-
get_metadata_cube(metadata_variable, cube=None)¶
Load a metadata cube from self’s attributes.
- Parameters
- Returns
Instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries(**kwargs)¶
Get SCM relevant timeseries from self.
- Parameters
**kwargs – Passed to get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.
- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes(lazy=False, **kwargs)¶
Get SCM relevant cubes.
The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched: land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land, and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.
- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
-
get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)¶
Get the SCM timeseries weights.
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction: “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing on this in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
dict of str – Dictionary of ‘region name’: weights, key: value pairs
- Return type
np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
-
get_variable_constraint()¶
Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.
- Returns
Constraint to use which ensures that only the variable of interest is loaded.
- Return type
iris.Constraint
-
info¶
Information about the cube’s source files.
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim¶
iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number¶
The index which corresponds to the latitude dimension.
E.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape¶
2D tuple of int which gives the shape of a lat-lon slice of the data.
E.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4).
- Raises
AssertionError – No lat-lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
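The deduction documented above can be sketched as follows (a hypothetical stand-in, not netCDF-SCM's implementation, assuming the dimensions carry the standard names "latitude" and "longitude"):

```python
# Hypothetical sketch: pick out the lat-lon shape from a cube's
# dimension names and full shape, mirroring the documented example.
def deduce_lat_lon_shape(dim_names, shape):
    picked = tuple(
        size for name, size in zip(dim_names, shape)
        if name in ("latitude", "longitude")
    )
    if len(picked) != 2:
        raise AssertionError("No lat-lon slice can be deduced")
    return picked

print(deduce_lat_lon_shape(("time", "latitude", "depth", "longitude"), (4, 3, 5, 4)))
# → (3, 4), matching the documented example
```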
-
load_data_from_identifiers(process_warnings=True, **kwargs)¶
Load data using key identifiers.
The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.
- Parameters
process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?
kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.
-
load_data_from_path(filepath, process_warnings=True)¶
Load data from a path.
-
load_data_in_directory(directory=None, process_warnings=True)¶
Load data in a directory.
The data is loaded into an iris cube which can be accessed through self.cube.
Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files, e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
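The continuity requirement can be illustrated with a toy check (not the library's code): each monthly timeslice file must start the month after the previous one ended.

```python
import re
from datetime import datetime

# Toy version of the continuity check: parse the YYYYMM-YYYYMM suffix of
# each filename and verify consecutive files abut with no gap or overlap.
def files_form_continuous_timeseries(filenames):
    spans = []
    for filename in filenames:
        start, end = re.search(r"_(\d{6})-(\d{6})\.nc$", filename).groups()
        spans.append(
            (datetime.strptime(start, "%Y%m"), datetime.strptime(end, "%Y%m"))
        )
    spans.sort()
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        gap = (next_start.year - prev_end.year) * 12 + next_start.month - prev_end.month
        if gap != 1:  # exactly one month between one file's end and the next file's start
            return False
    return True

print(files_form_continuous_timeseries([
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc",
]))
# → True (the three documented timeslice files abut exactly)
```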
-
lon_dim¶
iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number¶
The index which corresponds to the longitude dimension.
E.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm¶
The realm in which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
root_dir = None¶
The root directory of the database, i.e. where the cube should start its path, e.g. /home/users/usertim/cmip6input.
- Type
-
source_id = None¶
The source_id for which we want to load data, e.g. UoM-REMIND-MAGPIE-ssp585-1-2-0.
This must include the institution_id.
- Type
-
surface_fraction_var¶
The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars¶
The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways, e.g. as mip_table in CMIP5.
- Type
-
time_dim¶
iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number¶
The index which corresponds to the time dimension.
E.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period_regex¶
Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see regular expressions.
- Type
_sre.SRE_Pattern
-
time_range = None¶
The time range for which we want to load data, e.g. 2005-2100.
If None, this information isn’t included in the filename, which is useful for loading metadata files which don’t have a relevant time period.
- Type
-
timestamp_definitions¶
Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards, where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH, or one of the previous combined with a hyphen, e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
'%Y'
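A minimal sketch of how the datetime_str entries are used (the datetime_strs mapping below is an illustrative reconstruction from the description above, not the library's actual dictionary):

```python
from datetime import datetime

# Illustrative reconstruction: map timestamp length to the strptime
# format described above (YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH).
datetime_strs = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d", 10: "%Y%m%d%H"}

def parse_timestamp(stamp):
    """Convert a CMIP timestamp string into a datetime, keyed on its length."""
    return datetime.strptime(stamp, datetime_strs[len(stamp)])

print(parse_timestamp("2012"))    # 2012-01-01 00:00:00
print(parse_timestamp("203012"))  # 2030-12-01 00:00:00
```

A hyphenated range such as 200601-203012 is handled by splitting on the hyphen and parsing each half the same way.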
-
variable_id = None¶
The variable_id for which we want to load data, e.g. mole-fraction-of-carbon-dioxide-in-air.
- Type
-
class netcdf_scm.iris_cube_wrappers.CMIP6OutputCube [source]¶
Bases: netcdf_scm.iris_cube_wrappers._CMIPCube
Cube which can be used with CMIP6 model output data.
The data must match the CMIP6 data reference syntax as specified in the ‘File name template’ and ‘Directory structure template’ sections of the CMIP6 Data Reference Syntax.
-
activity_id = None¶
The activity_id for which we want to load data.
In CMIP6, this denotes the responsible MIP, e.g. DCPP.
- Type
-
areacell_var¶
The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.
- Type
-
convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)¶
Convert a dictionary of SCM timeseries cubes to an scmdata.ScmRun.
- Parameters
- Returns
scmdata.ScmRun containing the data from the SCM timeseries cubes
- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dim_names¶
Names of the dimensions in this cube.
Here the names are the standard_names, which means there can be None in the output.
- Type
-
get_area_weights(areacell_scmcube=None)¶
Get area weights for this cube.
- Parameters
areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.
- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (contradicting self._area_weights_units).
-
get_data_directory()¶
Get the path to a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.
-
get_data_filename()¶
Get the name of a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.
-
classmethod get_data_reference_syntax(**kwargs)¶
Get data reference syntax for this cube.
-
get_filepath_from_load_data_from_identifiers_args(**kwargs) [source]¶
Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.
Full details about the meaning of each identifier are given in Table 1 of the CMIP6 Data Reference Syntax.
- Parameters
kwargs (str) – Identifiers to use to load the data
- Returns
The full filepath (path and name) of the file to load.
- Return type
- Raises
AttributeError – An input argument does not match the cube’s data reference syntax
-
classmethod get_instance_id(filepath) [source]¶
Get the instance_id from a given path.
This is used as a unique identifier for datasets on the ESGF.
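A hypothetical sketch of the idea (not netCDF-SCM's implementation): under the CMIP6 directory structure template, the path below the root directory runs mip_era/activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable_id/grid_label/version, and the ESGF instance_id is those components joined with dots. The example path and helper name are illustrative assumptions.

```python
import posixpath

# Hypothetical sketch: join the last ten directory components of a
# CMIP6-style path with dots to recover an ESGF-style instance_id.
def instance_id_from_path(filepath, n_components=10):
    parts = posixpath.dirname(filepath).split("/")
    return ".".join(parts[-n_components:])

path = (
    "/data/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/"
    "Amon/tas/gr/v20180917/tas_Amon_CNRM-CM6-1_historical_r1i1p1f2_gr_185001-201412.nc"
)
print(instance_id_from_path(path))
# → CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Amon.tas.gr.v20180917
```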
-
get_load_data_from_identifiers_args_from_filepath(filepath)¶
Get the set of identifiers to use to load data from a filepath.
- Parameters
filepath (str) – The filepath from which to load the data.
- Returns
Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.
- Return type
- Raises
ValueError – Path and filename contradict each other
-
get_metadata_cube(metadata_variable, cube=None)¶
Load a metadata cube from self’s attributes.
- Parameters
- Returns
Instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries(**kwargs)¶
Get SCM relevant timeseries from self.
- Parameters
**kwargs – Passed to get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.
- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes(lazy=False, **kwargs)¶
Get SCM relevant cubes.
The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched: land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land, and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.
- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
-
get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)¶
Get the SCM timeseries weights.
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction: “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing on this in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
dict of str – Dictionary of ‘region name’: weights, key: value pairs
- Return type
np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
-
get_variable_constraint()¶
Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.
- Returns
Constraint to use which ensures that only the variable of interest is loaded.
- Return type
iris.Constraint
-
info¶
Information about the cube’s source files.
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim¶
iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number¶
The index which corresponds to the latitude dimension.
E.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape¶
2D tuple of int which gives the shape of a lat-lon slice of the data.
E.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4).
- Raises
AssertionError – No lat-lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
-
load_data_from_identifiers(process_warnings=True, **kwargs)¶
Load data using key identifiers.
The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.
- Parameters
process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?
kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.
-
load_data_from_path(filepath, process_warnings=True)¶
Load data from a path.
-
load_data_in_directory(directory=None, process_warnings=True)¶
Load data in a directory.
The data is loaded into an iris cube which can be accessed through self.cube.
Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files, e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
-
lon_dim¶
iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number¶
The index which corresponds to the longitude dimension.
E.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm¶
The realm in which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
root_dir = None¶
The root directory of the database, i.e. where the cube should start its path, e.g. /home/users/usertim/cmip6_data.
- Type
-
source_id = None¶
The source_id for which we want to load data, e.g. CNRM-CM6-1.
This was known as model in CMIP5.
- Type
-
surface_fraction_var¶
The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars¶
The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways, e.g. as mip_table in CMIP5.
- Type
-
time_dim¶
iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number¶
The index which corresponds to the time dimension.
E.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period_regex¶
Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see regular expressions.
- Type
_sre.SRE_Pattern
-
time_range = None¶
The time range for which we want to load data, e.g. 198001-198412.
If None, this information isn’t included in the filename, which is useful for loading metadata files which don’t have a relevant time period.
- Type
-
timestamp_definitions¶
Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards, where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH, or one of the previous combined with a hyphen, e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
'%Y'
-
class netcdf_scm.iris_cube_wrappers.MarbleCMIP5Cube [source]¶
Bases: netcdf_scm.iris_cube_wrappers._CMIPCube
Cube which can be used with the cmip5 directory on marble (identical to ETH Zurich’s archive).
This directory structure is very similar, but not quite identical, to the recommended CMIP5 directory structure described in section 3.1 of the CMIP5 Data Reference Syntax.
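CMIP5-style filenames of the form variable_table_model_experiment_ensemble-member_time-range.nc can be split into identifiers with a toy parse (not the library's implementation; the dictionary keys below are illustrative names only):

```python
# Toy parse of a CMIP5-style filename into identifiers; the key names
# are hypothetical, not netCDF-SCM's actual argument names.
def parse_cmip5_filename(filename):
    stem = filename[: -len(".nc")] if filename.endswith(".nc") else filename
    variable, mip_table, model, experiment, ensemble_member, time_period = stem.split("_")
    return {
        "variable_name": variable,
        "mip_table": mip_table,
        "model": model,
        "experiment": experiment,
        "ensemble_member": ensemble_member,
        "time_period": time_period,
    }

ids = parse_cmip5_filename("tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc")
# ids["model"] is "HadCM3", ids["time_period"] is "200601-203012"
```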
-
areacell_var
¶ The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc
thenareacell_var
can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacingtas
with the value ofareacell_var
.- Type
-
convert_scm_timeseries_cubes_to_openscmdata
(scm_timeseries_cubes, out_calendar=None)¶ Convert dictionary of SCM timeseries cubes to an
scmdata.ScmRun
- Parameters
- Returns
scmdata.ScmRun
containing the data from the SCM timeseries cubes- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dim_names
¶ Names of the dimensions in this cube
Here the names are the
standard_names
which means there can beNone
in the output.- Type
-
get_area_weights
(areacell_scmcube=None)¶ Get area weights for this cube
- Parameters
areacell_scmcube (
ScmCube
) –ScmCube
containing areacell data. IfNone
, we calculate the weights using iris.- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (contradict with
self._area_weights_units
).
-
get_data_directory
()¶ Get the path to a data file from self’s attributes.
This can take multiple forms, it may just return a previously set filepath attribute or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.
-
get_data_filename
()¶ Get the name of a data file from self’s attributes.
This can take multiple forms, it may just return a previously set filename attribute or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.
-
classmethod
get_data_reference_syntax
(**kwargs)¶ Get data reference syntax for this cube
-
get_filepath_from_load_data_from_identifiers_args
(**kwargs)[source]¶ Get the full filepath of the data to load from the arguments passed to
self.load_data_from_identifiers
.Full details about the identifiers are given in Section 2 of the CMIP5 Data Reference Syntax.
- Parameters
kwargs (str) – Identifiers to use to load the data
- Returns
The full filepath (path and name) of the file to load.
- Return type
- Raises
AttributeError – An input argument does not match with the cube’s data reference syntax
-
get_load_data_from_identifiers_args_from_filepath
(filepath)¶ Get the set of identifiers to use to load data from a filepath.
- Parameters
filepath (str) – The filepath from which to load the data.
- Returns
Set of arguments which can be passed to
self.load_data_from_identifiers
to load the data in the filepath.- Return type
- Raises
ValueError – Path and filename contradict each other
-
get_metadata_cube
(metadata_variable, cube=None)¶ Load a metadata cube from self’s attributes.
- Parameters
- Returns
instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries
(**kwargs)¶ Get SCM relevant timeseries from
self
.- Parameters
**kwargs – Passed to
get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun
instance with the data in thedata
attribute and metadata in themetadata
attribute.- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes
(lazy=False, **kwargs)¶ Get SCM relevant cubes
The effective areas used for each of the regions are added as auxillary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxillary co-ordinates are also added to each cube:
land_fraction
,land_fraction_northern_hemisphere
andland_fraction_southern_hemisphere
. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e.land_fraction
is the fraction of the entire globe which was considered to be land,land_fraction_northern_hemisphere
is the fraction of the Northern Hemisphere which was considered to be land andland_fraction_southern_hemisphere
is the fraction of the Southern Hemisphere which was considered to be land.- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to
get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
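The land_fraction auxiliary co-ordinates described above are area-weighted ratios. A hypothetical calculation in plain Python (the cell areas and sftlf values here are made up; the library works on full iris cubes):

```python
# Hypothetical 1-D 'grid': per-cell horizontal areas and land surface
# fractions (sftlf is reported in percent in CMIP output).
cell_areas = [1.0, 2.0, 2.0, 1.0]
sftlf_percent = [100.0, 50.0, 0.0, 0.0]

# land_fraction = area-weighted mean land surface fraction over the region
land_area = sum(a * f / 100.0 for a, f in zip(cell_areas, sftlf_percent))
total_area = sum(cell_areas)
land_fraction = land_area / total_area
```

The hemispheric variants apply the same ratio restricted to the cells in each hemisphere.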
-
get_scm_timeseries_weights
(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)¶ Get the SCM timeseries weights
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and, if that fails, we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing relevant in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
Dictionary of ‘region name’: weights key: value pairs
- Return type
dict of str: np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
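The two cell_weights modes differ only in whether the surface fraction multiplies the area. A minimal sketch of the idea (a hand-rolled illustration, not the library's implementation, which operates on whole iris cubes):

```python
def cell_weight(area, surface_fraction_percent, mode):
    """Return the weight of one cell under the two documented modes."""
    if mode == "area-only":
        # weight by horizontal area alone (e.g. ocean variables)
        return area
    if mode == "area-surface-fraction":
        # weight by area scaled by the (percentage) surface fraction
        return area * surface_fraction_percent / 100.0
    raise ValueError(f"unknown mode: {mode}")

w_area = cell_weight(2.0, 50.0, "area-only")
w_both = cell_weight(2.0, 50.0, "area-surface-fraction")
```

For a cell of area 2.0 that is 50% land, the second mode halves the weight relative to the first.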
-
get_variable_constraint
()¶ Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.
- Returns
constraint to use which ensures that only the variable of interest is loaded.
- Return type
iris.Constraint
-
info
¶ Information about the cube’s source files
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim
¶ iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number
¶ The index which corresponds to the latitude dimension.
e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape
¶ 2D Tuple of int which gives the shape of a lat-lon slice of the data.
e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)
- Raises
AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
-
load_data_from_identifiers
(process_warnings=True, **kwargs)¶ Load data using key identifiers.
The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through
self.cube.
- Parameters
process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?
kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.
-
load_data_from_path
(filepath, process_warnings=True)¶ Load data from a path.
-
load_data_in_directory
(directory=None, process_warnings=True)¶ Load data in a directory.
The data is loaded into an iris cube which can be accessed through
self.cube.
Initially, this method is intended only to be used to load data when it is saved in a number of different timeslice files e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means
creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
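The consistency checks described above can be sketched in plain Python. This is a simplified, hypothetical version that only handles monthly YYYYMM-YYYYMM filenames; the real implementation uses its timestamp_definitions to handle all the CMIP timestamp formats.

```python
import re

# Hypothetical pattern: trailing YYYYMM-YYYYMM time period before .nc
TIME_RE = re.compile(r"_(\d{6})-(\d{6})\.nc$")

def check_contiguous(filenames):
    """Raise ValueError if files are from different runs or leave gaps."""
    # same run <=> filenames identical once the timestamp is stripped
    stems = {TIME_RE.sub("", f) for f in filenames}
    if len(stems) != 1:
        raise ValueError("files are not from the same run")
    periods = sorted(
        (m.group(1), m.group(2))
        for m in (TIME_RE.search(f) for f in filenames)
    )
    # each file must start in the month after the previous file ends
    for (_, end), (start, _) in zip(periods, periods[1:]):
        year, month = int(end[:4]), int(end[4:])
        next_month = (year + month // 12, month % 12 + 1)
        if (int(start[:4]), int(start[4:])) != next_month:
            raise ValueError("files do not form a continuous timeseries")

check_contiguous([
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc",
])
```

The example call above passes silently; removing the middle file would raise the documented ValueError.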
-
lon_dim
¶ iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number
¶ The index which corresponds to the longitude dimension.
e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm
¶ The realm to which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
root_dir
= None¶ The root directory of the database i.e. where the cube should start its path
e.g.
/home/users/usertim/cmip5_25x25
- Type
-
surface_fraction_var
¶ The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars
¶ The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.
- Type
-
time_dim
¶ iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number
¶ The index which corresponds to the time dimension.
e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period
= None¶ The time period for which we want to load data
If None, this information isn’t included in the filename, which is useful for loading metadata files which don’t have a relevant time period.
- Type
-
time_period_regex
¶ Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see the Python re module documentation.
- Type
_sre.SRE_Pattern
-
timestamp_definitions
¶ Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"
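A plain-Python sketch of such a definitions table follows; the exact keys and regular expressions in netCDF-SCM may differ, and the expected_timestep entries are omitted here to keep the sketch stdlib-only.

```python
import datetime
import re

# Hypothetical re-creation of the documented structure: the key is the
# timestamp length, the value holds the strptime format and a matching regexp.
TIMESTAMP_DEFINITIONS = {
    4: {"datetime_str": "%Y", "generic_regexp": r"\d{4}"},
    6: {"datetime_str": "%Y%m", "generic_regexp": r"\d{6}"},
    8: {"datetime_str": "%Y%m%d", "generic_regexp": r"\d{8}"},
    10: {"datetime_str": "%Y%m%d%H", "generic_regexp": r"\d{10}"},
}

def parse_timestamp(stamp):
    """Turn a CMIP timestamp string into a datetime."""
    defn = TIMESTAMP_DEFINITIONS[len(stamp)]
    if not re.fullmatch(defn["generic_regexp"], stamp):
        raise ValueError(f"invalid timestamp: {stamp}")
    return datetime.datetime.strptime(stamp, defn["datetime_str"])

parsed = parse_timestamp("200601")
```

Hyphenated ranges such as "200601-203012" would be handled by splitting on the separator and parsing each half with the same table.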
-
-
class
netcdf_scm.iris_cube_wrappers.
ScmCube
[source]¶ Bases:
object
Class for processing netCDF files for use with simple climate models.
-
areacell_var
¶ The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc
thenareacell_var
can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacingtas
with the value ofareacell_var
.- Type
-
convert_scm_timeseries_cubes_to_openscmdata
(scm_timeseries_cubes, out_calendar=None)[source]¶ Convert dictionary of SCM timeseries cubes to an
scmdata.ScmRun
- Parameters
- Returns
scmdata.ScmRun
containing the data from the SCM timeseries cubes
- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dim_names
¶ Names of the dimensions in this cube
Here the names are the standard_names, which means there can be None in the output.
- Type
-
get_area_weights
(areacell_scmcube=None)[source]¶ Get area weights for this cube
- Parameters
areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.
- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (they contradict self._area_weights_units).
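When no areacell data is available, the iris fallback weights a regular grid roughly in proportion to the cosine of latitude. The sketch below is a hand-rolled approximation of that idea, not what iris.analysis.cartography.area_weights actually returns (iris uses the latitude bounds of each cell):

```python
import math

def approx_area_weights(latitudes_deg, n_lon):
    """Rough area weights for a regular lat-lon grid: each row of cells
    is weighted by cos(latitude), constant along longitude."""
    return [
        [math.cos(math.radians(lat))] * n_lon for lat in latitudes_deg
    ]

weights = approx_area_weights([-45.0, 0.0, 45.0], 4)
```

Equatorial cells get the largest weight and the weights are symmetric about the equator, mirroring how real cell areas shrink towards the poles.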
-
get_metadata_cube
(metadata_variable, cube=None)[source]¶ Load a metadata cube from self’s attributes.
- Parameters
- Returns
instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries
(**kwargs)[source]¶ Get SCM relevant timeseries from self.
- Parameters
**kwargs – Passed to
get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.
- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes
(lazy=False, **kwargs)[source]¶ Get SCM relevant cubes
The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched, i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.
- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to
get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
-
get_scm_timeseries_weights
(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)[source]¶ Get the SCM timeseries weights
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and, if that fails, we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing relevant in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
Dictionary of ‘region name’: weights key: value pairs
- Return type
dict of str: np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
-
info
¶ Information about the cube’s source files
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim
¶ iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number
¶ The index which corresponds to the latitude dimension.
e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape
¶ 2D Tuple of int which gives the shape of a lat-lon slice of the data.
e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)
- Raises
AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
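Deducing the lat-lon slice shape from the dimension names can be pictured like this. This is a simplification using a hypothetical helper; the real property also copes with cubes whose latitude and longitude share dimensions.

```python
def lat_lon_shape(shape, dim_names):
    """Pick out the (lat, lon) entries of a cube's shape, in that order."""
    try:
        lat_idx = dim_names.index("latitude")
        lon_idx = dim_names.index("longitude")
    except ValueError:
        # mirrors the documented AssertionError when no slice can be deduced
        raise AssertionError("no lat-lon slice can be deduced") from None
    return (shape[lat_idx], shape[lon_idx])

# the example from the docs: shape (4, 3, 5, 4) over (time, lat, depth, lon)
shape = lat_lon_shape((4, 3, 5, 4), ("time", "latitude", "depth", "longitude"))
```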
-
load_data_from_path
(filepath, process_warnings=True)[source]¶ Load data from a path.
If you are using the ScmCube class directly, this method simply loads the path into an iris cube which can be accessed through self.cube.
If implemented on a subclass of ScmCube, this method should:
use self.get_load_data_from_identifiers_args_from_filepath to determine the suitable set of arguments to pass to self.load_data_from_identifiers from the filepath
load the data using self.load_data_from_identifiers as this method contains much better checks and helper components
-
load_data_in_directory
(directory=None, process_warnings=True)[source]¶ Load data in a directory.
The data is loaded into an iris cube which can be accessed through
self.cube.
Initially, this method is intended only to be used to load data when it is saved in a number of different timeslice files e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means
creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
-
lon_dim
¶ iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number
¶ The index which corresponds to the longitude dimension.
e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm
¶ The realm to which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
surface_fraction_var
¶ The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars
¶ The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.
- Type
-
time_dim
¶ iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number
¶ The index which corresponds to the time dimension.
e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period_regex
¶ Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see the Python re module documentation.
- Type
_sre.SRE_Pattern
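A hypothetical pattern of this kind (not the library's actual compiled expression) that captures a monthly time period from a filename:

```python
import re

# Hypothetical stand-in for time_period_regex: capture the trailing
# YYYYMM-YYYYMM block before the .nc extension.
time_period_regex = re.compile(r"_(\d{6}-\d{6})\.nc$")

match = time_period_regex.search("tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc")
time_period = match.group(1)
```

The captured group is what the time_period attribute documented above holds for a loaded file.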
-
time_period_separator
= '-'¶ Character used to separate time period strings in the time period indicator in filenames.
e.g. - is the ‘time period separator’ in “2015-2030”.
- Type
-
timestamp_definitions
¶ Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"
-