Iris cube wrappers API¶
Wrappers of the iris cube.
These classes automate a number of netCDF processing steps, for example finding surface land fraction files, applying regions to data, and returning timeseries in key regions for simple climate models.
-
class netcdf_scm.iris_cube_wrappers.CMIP6Input4MIPsCube [source]¶
Bases: netcdf_scm.iris_cube_wrappers._CMIPCube
Cube which can be used with CMIP6 input4MIPs data.
The data must match the CMIP6 Forcing Datasets Summary, specifically the Forcing Dataset Specifications.
-
activity_id = None¶
The activity_id for which we want to load data.
For these cubes, this will almost always be input4MIPs.
- Type
-
areacell_var¶
The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.
- Type
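As a sketch of the idea (the exact file-naming rules are project-specific; the helper name `guess_areacell_filename` and the default `areacella` are illustrative assumptions, not netCDF-SCM's API):

```python
# Illustrative only: derive a cell-area filename from a data filename by
# swapping the data variable name for the cell-area variable name
# ("areacella" is the CMIP cell-area variable for atmospheric grids).
def guess_areacell_filename(data_filename, variable, areacell_var="areacella"):
    # Replace only the first occurrence, i.e. the leading variable component
    return data_filename.replace(variable, areacell_var, 1)

print(guess_areacell_filename("tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc", "tas"))
# → areacella_Amon_HadCM3_rcp45_r1i1p1_200601.nc
```

In practice the table component (e.g. Amon) may also differ for cell-area files, which is why the documentation hedges with "might be as simple as".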
-
convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)¶
Convert a dictionary of SCM timeseries cubes to an scmdata.ScmRun.
- Parameters
- Returns
scmdata.ScmRun containing the data from the SCM timeseries cubes
- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dataset_category = None¶
The dataset_category for which we want to load data, e.g. GHGConcentrations.
- Type
-
dim_names¶
Names of the dimensions in this cube.
Here the names are the standard_names, which means there can be None in the output.
- Type
-
get_area_weights(areacell_scmcube=None)¶
Get area weights for this cube.
- Parameters
areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.
- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (contradicting self._area_weights_units).
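When no cell-area file is available, area weights on a regular latitude-longitude grid are, to a good approximation, proportional to the cosine of latitude. A stdlib-only sketch of that idea (illustrative, not the iris implementation, which works from cell bounds):

```python
import math

# Illustrative only: on a regular latitude-longitude grid, the area of a
# cell is approximately proportional to cos(latitude), which is the key
# ingredient of area weighting when no areacell file is available.
def cos_lat_weights(latitudes_deg, n_lon):
    """Return a lat x lon nested list of (unnormalised) area weights."""
    return [[math.cos(math.radians(lat))] * n_lon for lat in latitudes_deg]

weights = cos_lat_weights([-45.0, 0.0, 45.0], n_lon=4)
# The equatorial row gets the largest weight, the poleward rows smaller ones.
```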
-
get_data_directory()¶
Get the path to a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.
-
get_data_filename()¶
Get the name of a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.
-
classmethod get_data_reference_syntax(**kwargs)¶
Get data reference syntax for this cube.
-
get_filepath_from_load_data_from_identifiers_args(**kwargs) [source]¶
Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.
Full details about the meaning of the identifiers are given in the Forcing Dataset Specifications.
- Parameters
kwargs (str) – Identifiers to use to load the data
- Returns
The full filepath (path and name) of the file to load.
- Return type
- Raises
AttributeError – An input argument does not match the cube’s data reference syntax
-
get_load_data_from_identifiers_args_from_filepath(filepath)¶
Get the set of identifiers to use to load data from a filepath.
- Parameters
filepath (str) – The filepath from which to load the data.
- Returns
Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.
- Return type
- Raises
ValueError – Path and filename contradict each other
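The ValueError above guards against a directory and a filename that encode different identifiers. A toy illustration of that check (the function and the dictionary keys are hypothetical, not netCDF-SCM's internals):

```python
# Toy illustration: identifiers recovered from the directory must agree
# with those recovered from the filename wherever both supply a value.
def check_path_filename_consistent(ids_from_path, ids_from_filename):
    for key in set(ids_from_path) & set(ids_from_filename):
        if ids_from_path[key] != ids_from_filename[key]:
            raise ValueError(
                "Path and filename contradict each other: {}".format(key)
            )

# Consistent: the shared "model" identifier matches, so no error is raised.
check_path_filename_consistent(
    {"model": "HadCM3", "experiment": "rcp45"},
    {"model": "HadCM3", "time_period": "200601-203012"},
)
```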
-
get_metadata_cube(metadata_variable, cube=None)¶
Load a metadata cube from self’s attributes.
- Parameters
- Returns
Instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries(**kwargs)¶
Get SCM relevant timeseries from self.
- Parameters
**kwargs – Passed to get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.
- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes(lazy=False, **kwargs)¶
Get SCM relevant cubes.
The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched: land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land, and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.
- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
-
get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)¶
Get the SCM timeseries weights.
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction: “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing on this in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
dict of str – Dictionary of ‘region name’: weights, key: value pairs
- Return type
np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
-
get_variable_constraint()¶
Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.
- Returns
Constraint to use which ensures that only the variable of interest is loaded.
- Return type
iris.Constraint
-
info¶
Information about the cube’s source files.
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim¶
iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number¶
The index which corresponds to the latitude dimension.
E.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape¶
2D tuple of int which gives the shape of a lat-lon slice of the data.
E.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4).
- Raises
AssertionError – No lat-lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
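The deduction documented above can be sketched as follows (a hypothetical stand-in, not netCDF-SCM's implementation, assuming the dimensions carry the standard names "latitude" and "longitude"):

```python
# Hypothetical sketch: pick out the lat-lon shape from a cube's
# dimension names and full shape, mirroring the documented example.
def deduce_lat_lon_shape(dim_names, shape):
    picked = tuple(
        size for name, size in zip(dim_names, shape)
        if name in ("latitude", "longitude")
    )
    if len(picked) != 2:
        raise AssertionError("No lat-lon slice can be deduced")
    return picked

print(deduce_lat_lon_shape(("time", "latitude", "depth", "longitude"), (4, 3, 5, 4)))
# → (3, 4), matching the documented example
```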
-
load_data_from_identifiers(process_warnings=True, **kwargs)¶
Load data using key identifiers.
The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.
- Parameters
process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?
kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.
-
load_data_from_path(filepath, process_warnings=True)¶
Load data from a path.
-
load_data_in_directory(directory=None, process_warnings=True)¶
Load data in a directory.
The data is loaded into an iris cube which can be accessed through self.cube.
Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files, e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
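The continuity requirement can be illustrated with a toy check (not the library's code): each monthly timeslice file must start the month after the previous one ended.

```python
import re
from datetime import datetime

# Toy version of the continuity check: parse the YYYYMM-YYYYMM suffix of
# each filename and verify consecutive files abut with no gap or overlap.
def files_form_continuous_timeseries(filenames):
    spans = []
    for filename in filenames:
        start, end = re.search(r"_(\d{6})-(\d{6})\.nc$", filename).groups()
        spans.append(
            (datetime.strptime(start, "%Y%m"), datetime.strptime(end, "%Y%m"))
        )
    spans.sort()
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        gap = (next_start.year - prev_end.year) * 12 + next_start.month - prev_end.month
        if gap != 1:  # exactly one month between one file's end and the next file's start
            return False
    return True

print(files_form_continuous_timeseries([
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc",
]))
# → True (the three documented timeslice files abut exactly)
```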
-
lon_dim¶
iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number¶
The index which corresponds to the longitude dimension.
E.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm¶
The realm in which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
root_dir = None¶
The root directory of the database, i.e. where the cube should start its path, e.g. /home/users/usertim/cmip6input.
- Type
-
source_id = None¶
The source_id for which we want to load data, e.g. UoM-REMIND-MAGPIE-ssp585-1-2-0.
This must include the institution_id.
- Type
-
surface_fraction_var¶
The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars¶
The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways, e.g. as mip_table in CMIP5.
- Type
-
time_dim¶
iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number¶
The index which corresponds to the time dimension.
E.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period_regex¶
Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see regular expressions.
- Type
_sre.SRE_Pattern
-
time_range = None¶
The time range for which we want to load data, e.g. 2005-2100.
If None, this information isn’t included in the filename, which is useful for loading metadata files which don’t have a relevant time period.
- Type
-
timestamp_definitions¶
Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards, where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH, or one of the previous combined with a hyphen, e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
'%Y'
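A minimal sketch of how the datetime_str entries are used (the datetime_strs mapping below is an illustrative reconstruction from the description above, not the library's actual dictionary):

```python
from datetime import datetime

# Illustrative reconstruction: map timestamp length to the strptime
# format described above (YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH).
datetime_strs = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d", 10: "%Y%m%d%H"}

def parse_timestamp(stamp):
    """Convert a CMIP timestamp string into a datetime, keyed on its length."""
    return datetime.strptime(stamp, datetime_strs[len(stamp)])

print(parse_timestamp("2012"))    # 2012-01-01 00:00:00
print(parse_timestamp("203012"))  # 2030-12-01 00:00:00
```

A hyphenated range such as 200601-203012 is handled by splitting on the hyphen and parsing each half the same way.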
-
variable_id = None¶
The variable_id for which we want to load data, e.g. mole-fraction-of-carbon-dioxide-in-air.
- Type
-
class netcdf_scm.iris_cube_wrappers.CMIP6OutputCube [source]¶
Bases: netcdf_scm.iris_cube_wrappers._CMIPCube
Cube which can be used with CMIP6 model output data.
The data must match the CMIP6 data reference syntax as specified in the ‘File name template’ and ‘Directory structure template’ sections of the CMIP6 Data Reference Syntax.
-
activity_id = None¶
The activity_id for which we want to load data.
In CMIP6, this denotes the responsible MIP, e.g. DCPP.
- Type
-
areacell_var¶
The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.
- Type
-
convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)¶
Convert a dictionary of SCM timeseries cubes to an scmdata.ScmRun.
- Parameters
- Returns
scmdata.ScmRun containing the data from the SCM timeseries cubes
- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dim_names¶
Names of the dimensions in this cube.
Here the names are the standard_names, which means there can be None in the output.
- Type
-
get_area_weights(areacell_scmcube=None)¶
Get area weights for this cube.
- Parameters
areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.
- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (contradicting self._area_weights_units).
-
get_data_directory()¶
Get the path to a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.
-
get_data_filename()¶
Get the name of a data file from self’s attributes.
This can take multiple forms: it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.
-
classmethod get_data_reference_syntax(**kwargs)¶
Get data reference syntax for this cube.
-
get_filepath_from_load_data_from_identifiers_args(**kwargs) [source]¶
Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.
Full details about the meaning of each identifier are given in Table 1 of the CMIP6 Data Reference Syntax.
- Parameters
kwargs (str) – Identifiers to use to load the data
- Returns
The full filepath (path and name) of the file to load.
- Return type
- Raises
AttributeError – An input argument does not match the cube’s data reference syntax
-
classmethod get_instance_id(filepath) [source]¶
Get the instance_id from a given path.
This is used as a unique identifier for datasets on the ESGF.
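A hypothetical sketch of the idea (not netCDF-SCM's implementation): under the CMIP6 directory structure template, the path below the root directory runs mip_era/activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable_id/grid_label/version, and the ESGF instance_id is those components joined with dots. The example path and helper name are illustrative assumptions.

```python
import posixpath

# Hypothetical sketch: join the last ten directory components of a
# CMIP6-style path with dots to recover an ESGF-style instance_id.
def instance_id_from_path(filepath, n_components=10):
    parts = posixpath.dirname(filepath).split("/")
    return ".".join(parts[-n_components:])

path = (
    "/data/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r1i1p1f2/"
    "Amon/tas/gr/v20180917/tas_Amon_CNRM-CM6-1_historical_r1i1p1f2_gr_185001-201412.nc"
)
print(instance_id_from_path(path))
# → CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.r1i1p1f2.Amon.tas.gr.v20180917
```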
-
get_load_data_from_identifiers_args_from_filepath(filepath)¶
Get the set of identifiers to use to load data from a filepath.
- Parameters
filepath (str) – The filepath from which to load the data.
- Returns
Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.
- Return type
- Raises
ValueError – Path and filename contradict each other
-
get_metadata_cube(metadata_variable, cube=None)¶
Load a metadata cube from self’s attributes.
- Parameters
- Returns
Instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries(**kwargs)¶
Get SCM relevant timeseries from self.
- Parameters
**kwargs – Passed to get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.
- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes(lazy=False, **kwargs)¶
Get SCM relevant cubes.
The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched: land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land, and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.
- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
-
get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)¶
Get the SCM timeseries weights.
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction: “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing on this in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
dict of str – Dictionary of ‘region name’: weights, key: value pairs
- Return type
np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
-
get_variable_constraint()¶
Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.
- Returns
Constraint to use which ensures that only the variable of interest is loaded.
- Return type
iris.Constraint
-
info¶
Information about the cube’s source files.
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim¶
iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number¶
The index which corresponds to the latitude dimension.
E.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape¶
2D tuple of int which gives the shape of a lat-lon slice of the data.
E.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4).
- Raises
AssertionError – No lat-lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
-
load_data_from_identifiers(process_warnings=True, **kwargs)¶
Load data using key identifiers.
The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.
- Parameters
process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?
kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.
-
load_data_from_path(filepath, process_warnings=True)¶
Load data from a path.
-
load_data_in_directory(directory=None, process_warnings=True)¶
Load data in a directory.
The data is loaded into an iris cube which can be accessed through self.cube.
Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files, e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
-
lon_dim¶
iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number¶
The index which corresponds to the longitude dimension.
E.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm¶
The realm in which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
root_dir = None¶
The root directory of the database, i.e. where the cube should start its path, e.g. /home/users/usertim/cmip6_data.
- Type
-
source_id = None¶
The source_id for which we want to load data, e.g. CNRM-CM6-1.
This was known as model in CMIP5.
- Type
-
surface_fraction_var¶
The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars¶
The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways, e.g. as mip_table in CMIP5.
- Type
-
time_dim¶
iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number¶
The index which corresponds to the time dimension.
E.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period_regex¶
Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see regular expressions.
- Type
_sre.SRE_Pattern
-
time_range = None¶
The time range for which we want to load data, e.g. 198001-198412.
If None, this information isn’t included in the filename, which is useful for loading metadata files which don’t have a relevant time period.
- Type
-
timestamp_definitions¶
Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards, where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH, or one of the previous combined with a hyphen, e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
'%Y'
-
class netcdf_scm.iris_cube_wrappers.MarbleCMIP5Cube [source]¶
Bases: netcdf_scm.iris_cube_wrappers._CMIPCube
Cube which can be used with the cmip5 directory on marble (identical to ETH Zurich’s archive).
This directory structure is very similar, but not quite identical, to the recommended CMIP5 directory structure described in section 3.1 of the CMIP5 Data Reference Syntax.
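CMIP5-style filenames of the form variable_table_model_experiment_ensemble-member_time-range.nc can be split into identifiers with a toy parse (not the library's implementation; the dictionary keys below are illustrative names only):

```python
# Toy parse of a CMIP5-style filename into identifiers; the key names
# are hypothetical, not netCDF-SCM's actual argument names.
def parse_cmip5_filename(filename):
    stem = filename[: -len(".nc")] if filename.endswith(".nc") else filename
    variable, mip_table, model, experiment, ensemble_member, time_period = stem.split("_")
    return {
        "variable_name": variable,
        "mip_table": mip_table,
        "model": model,
        "experiment": experiment,
        "ensemble_member": ensemble_member,
        "time_period": time_period,
    }

ids = parse_cmip5_filename("tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc")
# ids["model"] is "HadCM3", ids["time_period"] is "200601-203012"
```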
-
areacell_var
¶ The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc
thenareacell_var
can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacingtas
with the value ofareacell_var
.- Type
-
convert_scm_timeseries_cubes_to_openscmdata
(scm_timeseries_cubes, out_calendar=None)¶ Convert dictionary of SCM timeseries cubes to an
scmdata.ScmRun
- Parameters
- Returns
scmdata.ScmRun
containing the data from the SCM timeseries cubes- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dim_names
¶ Names of the dimensions in this cube
Here the names are the
standard_names
which means there can beNone
in the output.- Type
-
get_area_weights
(areacell_scmcube=None)¶ Get area weights for this cube
- Parameters
areacell_scmcube (
ScmCube
) –ScmCube
containing areacell data. IfNone
, we calculate the weights using iris.- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (contradict with
self._area_weights_units
).
-
get_data_directory
()¶ Get the path to a data file from self’s attributes.
This can take multiple forms, it may just return a previously set filepath attribute or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.
-
get_data_filename
()¶ Get the name of a data file from self’s attributes.
This can take multiple forms, it may just return a previously set filename attribute or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.
-
classmethod
get_data_reference_syntax
(**kwargs)¶ Get data reference syntax for this cube
-
get_filepath_from_load_data_from_identifiers_args
(**kwargs)[source]¶ Get the full filepath of the data to load from the arguments passed to
self.load_data_from_identifiers
.Full details about the identifiers are given in Section 2 of the CMIP5 Data Reference Syntax.
- Parameters
kwargs (str) – Identifiers to use to load the data
- Returns
The full filepath (path and name) of the file to load.
- Return type
- Raises
AttributeError – An input argument does not match with the cube’s data reference syntax
-
get_load_data_from_identifiers_args_from_filepath
(filepath)¶ Get the set of identifiers to use to load data from a filepath.
- Parameters
filepath (str) – The filepath from which to load the data.
- Returns
Set of arguments which can be passed to
self.load_data_from_identifiers
to load the data in the filepath.- Return type
- Raises
ValueError – Path and filename contradict each other
-
get_metadata_cube
(metadata_variable, cube=None)¶ Load a metadata cube from self’s attributes.
- Parameters
- Returns
instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries
(**kwargs)¶ Get SCM relevant timeseries from
self
.- Parameters
**kwargs – Passed to
get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun
instance with the data in thedata
attribute and metadata in themetadata
attribute.- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes
(lazy=False, **kwargs)¶ Get SCM relevant cubes
The effective areas used for each of the regions are added as auxillary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxillary co-ordinates are also added to each cube:
land_fraction
,land_fraction_northern_hemisphere
andland_fraction_southern_hemisphere
. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e.land_fraction
is the fraction of the entire globe which was considered to be land,land_fraction_northern_hemisphere
is the fraction of the Northern Hemisphere which was considered to be land andland_fraction_southern_hemisphere
is the fraction of the Southern Hemisphere which was considered to be land.- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to
get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
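The land_fraction auxiliary co-ordinates described above are area-weighted ratios. A hypothetical calculation in plain Python (the cell areas and sftlf values here are made up; the library works on full iris cubes):

```python
# Hypothetical 1-D 'grid': per-cell horizontal areas and land surface
# fractions (sftlf is reported in percent in CMIP output).
cell_areas = [1.0, 2.0, 2.0, 1.0]
sftlf_percent = [100.0, 50.0, 0.0, 0.0]

# land_fraction = area-weighted mean land surface fraction over the region
land_area = sum(a * f / 100.0 for a, f in zip(cell_areas, sftlf_percent))
total_area = sum(cell_areas)
land_fraction = land_area / total_area
```

The hemispheric variants apply the same ratio restricted to the cells in each hemisphere.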
-
get_scm_timeseries_weights
(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)¶ Get the SCM timeseries weights
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and, if that fails, we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing relevant in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
Dictionary of ‘region name’: weights key: value pairs
- Return type
dict of str: np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
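The two cell_weights modes differ only in whether the surface fraction multiplies the area. A minimal sketch of the idea (a hand-rolled illustration, not the library's implementation, which operates on whole iris cubes):

```python
def cell_weight(area, surface_fraction_percent, mode):
    """Return the weight of one cell under the two documented modes."""
    if mode == "area-only":
        # weight by horizontal area alone (e.g. ocean variables)
        return area
    if mode == "area-surface-fraction":
        # weight by area scaled by the (percentage) surface fraction
        return area * surface_fraction_percent / 100.0
    raise ValueError(f"unknown mode: {mode}")

w_area = cell_weight(2.0, 50.0, "area-only")
w_both = cell_weight(2.0, 50.0, "area-surface-fraction")
```

For a cell of area 2.0 that is 50% land, the second mode halves the weight relative to the first.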
-
get_variable_constraint
()¶ Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.
- Returns
constraint to use which ensures that only the variable of interest is loaded.
- Return type
iris.Constraint
-
info
¶ Information about the cube’s source files
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim
¶ iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number
¶ The index which corresponds to the latitude dimension.
e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape
¶ 2D Tuple of int which gives the shape of a lat-lon slice of the data.
e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)
- Raises
AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
-
load_data_from_identifiers
(process_warnings=True, **kwargs)¶ Load data using key identifiers.
The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through
self.cube.
- Parameters
process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?
kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.
-
load_data_from_path
(filepath, process_warnings=True)¶ Load data from a path.
-
load_data_in_directory
(directory=None, process_warnings=True)¶ Load data in a directory.
The data is loaded into an iris cube which can be accessed through
self.cube.
Initially, this method is intended only to be used to load data when it is saved in a number of different timeslice files e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means
creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
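The consistency checks described above can be sketched in plain Python. This is a simplified, hypothetical version that only handles monthly YYYYMM-YYYYMM filenames; the real implementation uses its timestamp_definitions to handle all the CMIP timestamp formats.

```python
import re

# Hypothetical pattern: trailing YYYYMM-YYYYMM time period before .nc
TIME_RE = re.compile(r"_(\d{6})-(\d{6})\.nc$")

def check_contiguous(filenames):
    """Raise ValueError if files are from different runs or leave gaps."""
    # same run <=> filenames identical once the timestamp is stripped
    stems = {TIME_RE.sub("", f) for f in filenames}
    if len(stems) != 1:
        raise ValueError("files are not from the same run")
    periods = sorted(
        (m.group(1), m.group(2))
        for m in (TIME_RE.search(f) for f in filenames)
    )
    # each file must start in the month after the previous file ends
    for (_, end), (start, _) in zip(periods, periods[1:]):
        year, month = int(end[:4]), int(end[4:])
        next_month = (year + month // 12, month % 12 + 1)
        if (int(start[:4]), int(start[4:])) != next_month:
            raise ValueError("files do not form a continuous timeseries")

check_contiguous([
    "tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc",
    "tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc",
])
```

The example call above passes silently; removing the middle file would raise the documented ValueError.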
-
lon_dim
¶ iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number
¶ The index which corresponds to the longitude dimension.
e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm
¶ The realm to which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
root_dir
= None¶ The root directory of the database i.e. where the cube should start its path
e.g.
/home/users/usertim/cmip5_25x25
- Type
-
surface_fraction_var
¶ The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars
¶ The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.
- Type
-
time_dim
¶ iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number
¶ The index which corresponds to the time dimension.
e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period
= None¶ The time period for which we want to load data
If None, this information isn’t included in the filename, which is useful for loading metadata files which don’t have a relevant time period.
- Type
-
time_period_regex
¶ Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see the Python re module documentation.
- Type
_sre.SRE_Pattern
-
timestamp_definitions
¶ Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"
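A plain-Python sketch of such a definitions table follows; the exact keys and regular expressions in netCDF-SCM may differ, and the expected_timestep entries are omitted here to keep the sketch stdlib-only.

```python
import datetime
import re

# Hypothetical re-creation of the documented structure: the key is the
# timestamp length, the value holds the strptime format and a matching regexp.
TIMESTAMP_DEFINITIONS = {
    4: {"datetime_str": "%Y", "generic_regexp": r"\d{4}"},
    6: {"datetime_str": "%Y%m", "generic_regexp": r"\d{6}"},
    8: {"datetime_str": "%Y%m%d", "generic_regexp": r"\d{8}"},
    10: {"datetime_str": "%Y%m%d%H", "generic_regexp": r"\d{10}"},
}

def parse_timestamp(stamp):
    """Turn a CMIP timestamp string into a datetime."""
    defn = TIMESTAMP_DEFINITIONS[len(stamp)]
    if not re.fullmatch(defn["generic_regexp"], stamp):
        raise ValueError(f"invalid timestamp: {stamp}")
    return datetime.datetime.strptime(stamp, defn["datetime_str"])

parsed = parse_timestamp("200601")
```

Hyphenated ranges such as "200601-203012" would be handled by splitting on the separator and parsing each half with the same table.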
-
-
class
netcdf_scm.iris_cube_wrappers.
ScmCube
[source]¶ Bases:
object
Class for processing netCDF files for use with simple climate models.
-
areacell_var
¶ The name of the variable associated with the area of each gridbox.
If required, this is used to determine the area of each cell in a data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc
thenareacell_var
can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacingtas
with the value ofareacell_var
.- Type
-
convert_scm_timeseries_cubes_to_openscmdata
(scm_timeseries_cubes, out_calendar=None)[source]¶ Convert dictionary of SCM timeseries cubes to an
scmdata.ScmRun
- Parameters
- Returns
scmdata.ScmRun
containing the data from the SCM timeseries cubes
- Return type
scmdata.ScmRun
- Raises
NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
-
dim_names
¶ Names of the dimensions in this cube
Here the names are the standard_names, which means there can be None in the output.
- Type
-
get_area_weights
(areacell_scmcube=None)[source]¶ Get area weights for this cube
- Parameters
areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.
- Returns
Weights on the cube’s latitude-longitude grid.
- Return type
np.ndarray
- Raises
iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.
ValueError – Area weights units are not as expected (they contradict self._area_weights_units).
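When no areacell data is available, the iris fallback weights a regular grid roughly in proportion to the cosine of latitude. The sketch below is a hand-rolled approximation of that idea, not what iris.analysis.cartography.area_weights actually returns (iris uses the latitude bounds of each cell):

```python
import math

def approx_area_weights(latitudes_deg, n_lon):
    """Rough area weights for a regular lat-lon grid: each row of cells
    is weighted by cos(latitude), constant along longitude."""
    return [
        [math.cos(math.radians(lat))] * n_lon for lat in latitudes_deg
    ]

weights = approx_area_weights([-45.0, 0.0, 45.0], 4)
```

Equatorial cells get the largest weight and the weights are symmetric about the equator, mirroring how real cell areas shrink towards the poles.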
-
get_metadata_cube
(metadata_variable, cube=None)[source]¶ Load a metadata cube from self’s attributes.
- Parameters
- Returns
instance of self which has been loaded from the file containing the metadata variable of interest.
- Return type
type(self)
- Raises
-
get_scm_timeseries
(**kwargs)[source]¶ Get SCM relevant timeseries from self.
- Parameters
**kwargs – Passed to
get_scm_timeseries_cubes()
- Returns
scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.
- Return type
scmdata.ScmRun
-
get_scm_timeseries_cubes
(lazy=False, **kwargs)[source]¶ Get SCM relevant cubes
The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.
If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched, i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.
- Parameters
lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.
kwargs (any) – Passed to
get_scm_timeseries_weights()
- Returns
dict of str – Dictionary of cubes (region: cube key: value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.
- Return type
- Raises
InvalidWeightsError – No valid weights are found for the requested regions
-
get_scm_timeseries_weights
(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)[source]¶ Get the SCM timeseries weights
- Parameters
surface_fraction_cube (ScmCube, optional) – Land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.
areacell_scmcube (ScmCube, optional) – Cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and, if that fails, we fall back onto iris.analysis.cartography.area_weights.
regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.
cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell, i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that there seems to be nothing relevant in Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016).
log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.
- Returns
Dictionary of ‘region name’: weights key: value pairs
- Return type
dict of str: np.ndarray
Notes
Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
-
info
¶ Information about the cube’s source files
res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.
- Returns
- Return type
-
lat_dim
¶ iris.coords.DimCoord
The latitude dimension of the data.
-
lat_dim_number
¶ The index which corresponds to the latitude dimension.
e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).
- Type
-
lat_lon_shape
¶ 2D Tuple of int which gives the shape of a lat-lon slice of the data.
e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)
- Raises
AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).
- Type
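Deducing the lat-lon slice shape from the dimension names can be pictured like this. This is a simplification using a hypothetical helper; the real property also copes with cubes whose latitude and longitude share dimensions.

```python
def lat_lon_shape(shape, dim_names):
    """Pick out the (lat, lon) entries of a cube's shape, in that order."""
    try:
        lat_idx = dim_names.index("latitude")
        lon_idx = dim_names.index("longitude")
    except ValueError:
        # mirrors the documented AssertionError when no slice can be deduced
        raise AssertionError("no lat-lon slice can be deduced") from None
    return (shape[lat_idx], shape[lon_idx])

# the example from the docs: shape (4, 3, 5, 4) over (time, lat, depth, lon)
shape = lat_lon_shape((4, 3, 5, 4), ("time", "latitude", "depth", "longitude"))
```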
-
load_data_from_path
(filepath, process_warnings=True)[source]¶ Load data from a path.
If you are using the ScmCube class directly, this method simply loads the path into an iris cube which can be accessed through self.cube.
If implemented on a subclass of ScmCube, this method should:
use self.get_load_data_from_identifiers_args_from_filepath to determine the suitable set of arguments to pass to self.load_data_from_identifiers from the filepath
load the data using self.load_data_from_identifiers as this method contains much better checks and helper components
-
load_data_in_directory
(directory=None, process_warnings=True)[source]¶ Load data in a directory.
The data is loaded into an iris cube which can be accessed through
self.cube.
Initially, this method is intended only to be used to load data when it is saved in a number of different timeslice files e.g.:
tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc
tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc
It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.
Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means
creation_date, tracking_id and history are deleted. If unsure, please check.
- Parameters
- Raises
ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
-
lon_dim
¶ iris.coords.DimCoord
The longitude dimension of the data.
-
lon_dim_number
¶ The index which corresponds to the longitude dimension.
e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).
- Type
-
netcdf_scm_realm
¶ The realm to which netCDF-SCM thinks the data belongs.
This is used to make decisions about how to take averages of the data and where to find metadata variables.
If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.
- Type
-
surface_fraction_var
¶ The name of the variable associated with the surface fraction in each gridbox.
If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is
tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.
- Type
-
table_name_for_metadata_vars
¶ The name of the ‘table’ in which metadata variables can be found.
For example, fx or Ofx.
We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.
- Type
-
time_dim
¶ iris.coords.DimCoord
The time dimension of the data.
-
time_dim_number
¶ The index which corresponds to the time dimension.
e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).
- Type
-
time_period_regex
¶ Regular expression which captures the timeseries identifier in input data files.
For help on regular expressions, see the Python re module documentation.
- Type
_sre.SRE_Pattern
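A hypothetical pattern of this kind (not the library's actual compiled expression) that captures a monthly time period from a filename:

```python
import re

# Hypothetical stand-in for time_period_regex: capture the trailing
# YYYYMM-YYYYMM block before the .nc extension.
time_period_regex = re.compile(r"_(\d{6}-\d{6})\.nc$")

match = time_period_regex.search("tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc")
time_period = match.group(1)
```

The captured group is what the time_period attribute documented above holds for a loaded file.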
-
time_period_separator
= '-'¶ Character used to separate time period strings in the time period indicator in filenames.
e.g. - is the ‘time period separator’ in “2015-2030”.
- Type
-
timestamp_definitions
¶ Definition of valid timestamp information and corresponding key values.
This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.
Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:
datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime
generic_regexp: a regular expression which will match timestamps in this format
expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp
- Returns
- Return type
Examples
>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"
-