netCDF-SCM

netCDF-SCM is a Python package for processing netCDF files. It focusses on metrics which are relevant to simple climate models and is built on top of the Iris package.

netCDF-SCM is free software under a BSD 3-Clause License, see LICENSE. If you make any use of netCDF-SCM, please cite the Geoscience Data Journal (GDJ) paper (Nicholls et al., GDJ 2021) as well as the relevant Zenodo release.

Disclaimer

One of the most common uses of netCDF-SCM is for processing Coupled Model Intercomparison Project (CMIP) data. If this is your use case, please note that you must abide by the terms of use of the data, in particular the required acknowledgement statements (see the CMIP5 terms of use, CMIP6 terms of use and CMIP6 GMD Special Issue).

To make this easier, we have developed some basic tools which simplify the process of checking model license terms and creating the tables required in publications to cite CMIP data (check them out here). However, we provide no guarantee that these tools are up to date, so all users should double-check that they do in fact produce output consistent with the terms of use referenced above (and if there are issues, please raise them at our issue tracker :) ).

Installation

The easiest way to install netCDF-SCM is with conda:

# if you're using a conda environment, make sure you're in it
conda install -c conda-forge netcdf-scm
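
If you don't have a conda environment yet, one can be created with netCDF-SCM installed in a single step (a sketch; the environment name here is arbitrary)

# create and activate a fresh environment containing netcdf-scm
conda create --name netcdf-scm-env -c conda-forge netcdf-scm
conda activate netcdf-scm-env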

It is also possible to install it with pip:

# if you're using a virtual environment, make sure you're in it
pip install netcdf-scm

However, installing with pip requires installing all of Iris's dependencies yourself, which is not trivial. As far as we know, Iris cannot be installed with pip alone.

Usage

Here we provide various examples of netCDF-SCM’s behaviour and usage. The source code of these usage examples is available in the folder docs/source/usage of the GitLab repository.

Basic demos

Handling netCDF files for simple climate models

In this notebook we give a brief introduction to Iris, the library we use for our analysis, before demonstrating some of the key functionality of netCDF-SCM.

[1]:
# NBVAL_IGNORE_OUTPUT
from os.path import join

import numpy as np
import iris
import iris.coord_categorisation
import iris.quickplot as qplt
import iris.analysis.cartography
import matplotlib
import matplotlib.pyplot as plt

from netcdf_scm.iris_cube_wrappers import ScmCube, MarbleCMIP5Cube
[2]:
plt.style.use("bmh")
%matplotlib inline
[3]:
DATA_PATH_TEST = join("..", "..", "..", "tests", "test-data")
DATA_PATH_TEST_MARBLE_CMIP5_ROOT = join(DATA_PATH_TEST, "marble-cmip5")
Loading a cube
Loading with iris

Here we show how to load a cube directly using iris.

[4]:
tas_file = join(
    DATA_PATH_TEST_MARBLE_CMIP5_ROOT,
    "cmip5",
    "1pctCO2",
    "Amon",
    "tas",
    "CanESM2",
    "r1i1p1",
    "tas_Amon_CanESM2_1pctCO2_r1i1p1_189201-190312.nc",
)
[5]:
# NBVAL_IGNORE_OUTPUT
# Ignore output as the warnings are likely to change with
# new iris versions
tas_iris_load = ScmCube()
# you need this in case your cube has multiple variables
variable_constraint = iris.Constraint(
    cube_func=(lambda c: c.var_name == "tas")
)

tas_iris_load.cube = iris.load_cube(tas_file, variable_constraint)
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/fileformats/cf.py:803: UserWarning: Missing CF-netCDF measure variable 'areacella', referenced by netCDF variable 'tas'
  warnings.warn(message % (variable_name, nc_var_name))

The warning tells us that we need to add the areacella data as a cell measure on our cube. Doing this manually every time involves finding the areacella file, loading it, turning it into a cell measure and then adding it to the cube. This is a pain and involves about 100 lines of code. To make life easier, we wrap all of that away using netcdf_scm, which we introduce in the next section.
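
For illustration, a condensed sketch of the manual route (the areacella filename below is hypothetical; in practice you also have to locate the file in the archive):

import iris
from iris.coords import CellMeasure

# hypothetical path to the matching cell-area file
areacella_cube = iris.load_cube("areacella_fx_CanESM2_1pctCO2_r0i0p0.nc")

# turn the cell areas into a cell measure and attach it to the
# latitude-longitude dimensions of the tas cube
cell_area = CellMeasure(
    areacella_cube.data, standard_name="cell_area", units="m2", measure="area"
)
cube = tas_iris_load.cube
lat_dim = cube.coord_dims(cube.coord("latitude"))[0]
lon_dim = cube.coord_dims(cube.coord("longitude"))[0]
cube.add_cell_measure(cell_area, data_dims=[lat_dim, lon_dim])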

Loading with netcdf_scm

There are a couple of particularly annoying things involved in processing netCDF data. Firstly, the data is often stored in a folder hierarchy which can be fiddly to navigate. Secondly, the metadata is often stored separately from the variable cubes.

Hence, in netcdf_scm, we abstract away the code which solves these two problems to make life a bit easier. This involves defining a cube in netcdf_scm.iris_cube_wrappers; the details can be read there, for now we just give an example.

Our example uses MarbleCMIP5Cube. This cube is designed to work with the CMIP5 data on our server at University of Melbourne, which has been organised into a number of folders which are similar, but not quite identical, to the CMOR directory structure described in section 3.1 of the CMIP5 Data Reference Syntax. To facilitate our example, the test data in DATA_PATH_TEST_MARBLE_CMIP5_ROOT is organised in the same way.

Loading with identifiers

With our MarbleCMIP5Cube, we can simply pass in the information about the data we want (experiment, model, ensemble member etc.) and it will load our desired cube using the load_data_from_identifiers method.

[6]:
tas = MarbleCMIP5Cube()
tas.load_data_from_identifiers(
    root_dir=DATA_PATH_TEST_MARBLE_CMIP5_ROOT,
    activity="cmip5",
    experiment="1pctCO2",
    mip_table="Amon",
    variable_name="tas",
    model="CanESM2",
    ensemble_member="r1i1p1",
    time_period="189201-190312",
    file_ext=".nc",
)

We can verify that the loaded cube is exactly the same as the cube we loaded in the previous section (where we provided the full path).

[7]:
# NBVAL_IGNORE_OUTPUT
assert tas.cube == tas_iris_load.cube

We can have a look at our cube too (note that the broken cell measures representation is intended to be fixed in https://github.com/SciTools/iris/pull/3173).

[8]:
# NBVAL_IGNORE_OUTPUT
tas.cube
[8]:
air_temperature / (K)                    (time: 144; latitude: 64; longitude: 128)
     Dimension coordinates:
          time                                x            -             -
          latitude                            -            x             -
          longitude                           -            -             x
     Cell measures:
          cell_area                           -            x             x
     Scalar coordinates:
          height: 2.0 m
     Attributes:
          CCCma_data_licence: 1) GRANT OF LICENCE - The Government of Canada (Environment Canada) is...
          CCCma_parent_runid: IGA
          CCCma_runid: IDK
          CDI: Climate Data Interface version 1.9.7.1 (http://mpimet.mpg.de/cdi)
          CDO: Climate Data Operators version 1.9.7.1 (http://mpimet.mpg.de/cdo)
          Conventions: CF-1.4
          associated_files: baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_CanESM2_1pctCO2_r0i0p0.nc...
          branch_time: 171915.0
          branch_time_YMDH: 2321:01:01:00
          cmor_version: 2.5.4
          contact: cccma_info@ec.gc.ca
          creation_date: 2011-03-10T12:09:13Z
          experiment: 1 percent per year CO2
          experiment_id: 1pctCO2
          forcing: GHG (GHG includes CO2 only)
          frequency: mon
          history: 2011-03-10T12:09:13Z altered by CMOR: Treated scalar dimension: 'height'....
          initialization_method: 1
          institute_id: CCCma
          institution: CCCma (Canadian Centre for Climate Modelling and Analysis, Victoria, BC,...
          model_id: CanESM2
          modeling_realm: atmos
          original_name: ST
          parent_experiment: pre-industrial control
          parent_experiment_id: piControl
          parent_experiment_rip: r1i1p1
          physics_version: 1
          product: output
          project_id: CMIP5
          realization: 1
          references: http://www.cccma.ec.gc.ca/models
          source: CanESM2 2010 atmosphere: CanAM4 (AGCM15i, T63L35) ocean: CanOM4 (OGCM4.0,...
          table_id: Table Amon (31 January 2011) 53b766a395ac41696af40aab76a49ae5
          title: CanESM2 model output prepared for CMIP5 1 percent per year CO2
          tracking_id: 36b6de63-cce5-4a7a-a3f4-69e5b4056fde
     Cell methods:
          mean: time (15 minutes)
Loading with filepath

With our MarbleCMIP5Cube, we can also pass in the filepath and the cube will determine the relevant attributes for us, as well as loading the other required cubes.

[9]:
example_path = join(
    DATA_PATH_TEST_MARBLE_CMIP5_ROOT,
    "cmip5",
    "1pctCO2",
    "Amon",
    "tas",
    "CanESM2",
    "r1i1p1",
    "tas_Amon_CanESM2_1pctCO2_r1i1p1_189201-190312.nc",
)
example_path
[9]:
'../../../tests/test-data/marble-cmip5/cmip5/1pctCO2/Amon/tas/CanESM2/r1i1p1/tas_Amon_CanESM2_1pctCO2_r1i1p1_189201-190312.nc'
[10]:
tas_from_path = MarbleCMIP5Cube()
tas_from_path.load_data_from_path(example_path)
tas_from_path.model
[10]:
'CanESM2'

We can also confirm that this cube is identical to the one loaded with iris directly.

[11]:
# NBVAL_IGNORE_OUTPUT
assert tas_from_path.cube == tas_iris_load.cube
Acting on the cube

Once we have loaded our ScmCube, we can act on its cube attribute like any other Iris Cube. For example, we can add a year categorisation, take an annual mean and then plot the timeseries.

[12]:
# NBVAL_IGNORE_OUTPUT
year_cube = tas.cube.copy()
iris.coord_categorisation.add_year(year_cube, "time")
annual_mean = year_cube.aggregated_by(
    ["year"], iris.analysis.MEAN  # Do the averaging
)
global_annual_mean = annual_mean.collapsed(
    ["latitude", "longitude"],
    iris.analysis.MEAN,
    weights=iris.analysis.cartography.area_weights(annual_mean),
)
qplt.plot(global_annual_mean);
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/analysis/cartography.py:394: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
  warnings.warn("Using DEFAULT_SPHERICAL_EARTH_RADIUS.")
[image: _images/usage_demo_20_1.png]

We can also take a time average and make a spatial plot.

[13]:
# NBVAL_IGNORE_OUTPUT
time_mean = tas.cube.collapsed("time", iris.analysis.MEAN)
qplt.pcolormesh(time_mean);
[image: _images/usage_demo_22_0.png]

We can also add coastlines to the spatial plot.

[14]:
# NBVAL_IGNORE_OUTPUT
# we ignore output here as CI sometimes has to
# download the map file
qplt.pcolormesh(time_mean)
plt.gca().coastlines();
[image: _images/usage_demo_24_0.png]
SCM specifics

Finally, we present the key functions of this package. These are directly related to processing netCDF files for simple climate models.

Getting SCM timeseries

The major one is get_scm_timeseries. This function wraps a number of steps:

  1. load the land surface fraction data

  2. combine the land surface fraction and latitude data to determine the hemisphere and land/ocean boxes

  3. cut the data into the relevant boxes

  4. take an area-weighted average in each box

  5. return it all as an ScmRun instance

As you can imagine, we find it very useful to have all of these otherwise fiddly steps abstracted away.

[15]:
tas_scm_timeseries = tas.get_scm_timeseries()
type(tas_scm_timeseries)
[15]:
scmdata.run.ScmRun
[16]:
tas_scm_timeseries.tail()
[16]:
time 1892-01-16 12:00:00 1892-02-15 00:00:00 1892-03-16 12:00:00 1892-04-16 00:00:00 1892-05-16 12:00:00 1892-06-16 00:00:00 1892-07-16 12:00:00 1892-08-16 12:00:00 1892-09-16 00:00:00 1892-10-16 12:00:00 ... 1903-03-16 12:00:00 1903-04-16 00:00:00 1903-05-16 12:00:00 1903-06-16 00:00:00 1903-07-16 12:00:00 1903-08-16 12:00:00 1903-09-16 00:00:00 1903-10-16 12:00:00 1903-11-16 00:00:00 1903-12-16 12:00:00
activity_id climate_model member_id mip_era model region scenario unit variable variable_standard_name
cmip5 CanESM2 r1i1p1 CMIP5 unspecified World|Southern Hemisphere 1pctCO2 K tas air_temperature 289.995973 290.172797 289.714070 288.535888 286.995946 285.540528 284.223201 284.026061 284.104412 285.718940 ... 289.463490 288.250221 287.143196 285.560893 284.483592 284.028544 284.407818 285.688775 287.605868 289.166832
World|Northern Hemisphere|Land 1pctCO2 K tas air_temperature 273.761214 274.977433 279.443819 285.065822 290.592045 295.710179 298.054659 296.513476 291.840867 285.930817 ... 280.092979 285.314135 290.661860 295.907703 298.376198 296.972201 292.059519 286.135415 280.426803 275.705298
World|Southern Hemisphere|Land 1pctCO2 K tas air_temperature 285.622468 284.234503 282.236887 279.420272 277.156861 275.346434 273.870726 276.048329 276.954356 280.633229 ... 281.224552 278.331160 277.761742 275.291770 274.902389 275.396653 277.198751 280.436202 283.624803 285.681420
World|Northern Hemisphere|Ocean 1pctCO2 K tas air_temperature 289.822557 289.047582 289.343519 290.817539 292.586042 294.197028 295.573417 296.190904 295.730899 294.419906 ... 289.406973 290.787364 292.588619 294.351466 295.773334 296.487474 296.136925 294.893012 293.140459 291.236986
World|Southern Hemisphere|Ocean 1pctCO2 K tas air_temperature 291.079561 291.644080 291.566631 290.794390 289.433696 288.066236 286.788149 286.002639 285.875923 286.978986 ... 291.504785 290.707786 289.467563 288.105189 286.857449 286.167197 286.193950 286.990162 288.592224 290.030384

5 rows × 144 columns

Having the data as an ScmRun makes it trivial to plot and work with.

[17]:
# NBVAL_IGNORE_OUTPUT
restricted_time_df = tas_scm_timeseries.filter(
    year=range(1895, 1901),
    region="*Ocean*",  # Try e.g. "*", "World", "*Land", "*Southern Hemisphere*" here
)
restricted_time_df.line_plot(
    x="time", hue="region",
);
[image: _images/usage_demo_29_0.png]
[18]:
# NBVAL_IGNORE_OUTPUT
tas_scm_timeseries_annual_mean = (
    tas_scm_timeseries.filter(region="World").timeseries().T
)
tas_scm_timeseries_annual_mean = tas_scm_timeseries_annual_mean.groupby(
    tas_scm_timeseries_annual_mean.index.map(lambda x: x.year)
).mean()
tas_scm_timeseries_annual_mean.head()
[18]:
activity_id cmip5
climate_model CanESM2
member_id r1i1p1
mip_era CMIP5
model unspecified
region World
scenario 1pctCO2
unit K
variable tas
variable_standard_name air_temperature
time
1892 288.398356
1893 288.119690
1894 288.089608
1895 288.392593
1896 288.339564
[19]:
tas_scm_timeseries_annual_mean.plot(figsize=(16, 9));
[image: _images/usage_demo_31_0.png]
Getting SCM timeseries cubes

As part of the process above, we calculate all the timeseries as iris.cube.Cube objects. These intermediate cubes can be extracted with get_scm_timeseries_cubes. They are useful as they contain all the metadata from the source cube in a slightly friendlier format than ScmRun's metadata attribute [Note: ScmRun's metadata handling is a work in progress].

[20]:
tas_scm_ts_cubes = tas.get_scm_timeseries_cubes()
[21]:
# NBVAL_IGNORE_OUTPUT
print(tas_scm_ts_cubes["World"].cube)
air_temperature                               (time: 144)
     Dimension coordinates:
          time                                     x
     Scalar coordinates:
          area_world: 510099672793088.0 m**2
          area_world_land: 154606610000000.0 m**2
          area_world_northern_hemisphere: 255049836396544.0 m**2
          area_world_northern_hemisphere_land: 103962633261056.0 m**2
          area_world_northern_hemisphere_ocean: 151087203135488.0 m**2
          area_world_ocean: 355493070000000.0 m**2
          area_world_southern_hemisphere: 255049836396544.0 m**2
          area_world_southern_hemisphere_land: 50643977650176.0 m**2
          area_world_southern_hemisphere_ocean: 204405858746368.0 m**2
          height: 2.0 m
          land_fraction: 0.30309097827134884
          land_fraction_northern_hemisphere: 0.4076169376537766
          land_fraction_southern_hemisphere: 0.19856502699902237
          latitude: 0.0 degrees, bound=(-90.0, 90.0) degrees
          longitude: 178.59375 degrees, bound=(-1.40625, 358.59375) degrees
    Scalar cell measures:
          cell_area
     Attributes:
          CCCma_data_licence: 1) GRANT OF LICENCE - The Government of Canada (Environment Canada) is...
          CCCma_parent_runid: IGA
          CCCma_runid: IDK
          CDI: Climate Data Interface version 1.9.7.1 (http://mpimet.mpg.de/cdi)
          CDO: Climate Data Operators version 1.9.7.1 (http://mpimet.mpg.de/cdo)
          Conventions: CF-1.4
          activity_id: cmip5
          associated_files: baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_CanESM2_1pctCO2_r0i0p0.nc...
          branch_time: 171915.0
          branch_time_YMDH: 2321:01:01:00
          climate_model: CanESM2
          cmor_version: 2.5.4
          contact: cccma_info@ec.gc.ca
          creation_date: 2011-03-10T12:09:13Z
          crunch_netcdf_scm_version: 2.0.0rc5+3.gc7d2d42.dirty (more info at gitlab.com/netcdf-scm/netcdf-s...
          crunch_netcdf_scm_weight_kwargs: {}
          crunch_source_files: Files: ['/cmip5/1pctCO2/Amon/tas/CanESM2/r1i1p1/tas_Amon_CanESM2_1pctCO2_r1i1p1_189201-190312.nc'];...
          experiment: 1 percent per year CO2
          experiment_id: 1pctCO2
          forcing: GHG (GHG includes CO2 only)
          frequency: mon
          history: 2011-03-10T12:09:13Z altered by CMOR: Treated scalar dimension: 'height'....
          initialization_method: 1
          institute_id: CCCma
          institution: CCCma (Canadian Centre for Climate Modelling and Analysis, Victoria, BC,...
          member_id: r1i1p1
          mip_era: CMIP5
          model_id: CanESM2
          modeling_realm: atmos
          original_name: ST
          parent_experiment: pre-industrial control
          parent_experiment_id: piControl
          parent_experiment_rip: r1i1p1
          physics_version: 1
          product: output
          project_id: CMIP5
          realization: 1
          references: http://www.cccma.ec.gc.ca/models
          region: World
          scenario: 1pctCO2
          source: CanESM2 2010 atmosphere: CanAM4 (AGCM15i, T63L35) ocean: CanOM4 (OGCM4.0,...
          table_id: Table Amon (31 January 2011) 53b766a395ac41696af40aab76a49ae5
          title: CanESM2 model output prepared for CMIP5 1 percent per year CO2
          tracking_id: 36b6de63-cce5-4a7a-a3f4-69e5b4056fde
          variable: tas
          variable_standard_name: air_temperature
     Cell methods:
          mean: time (15 minutes)
          mean: latitude, longitude

In particular, the land_fraction* auxiliary coordinates provide useful information about the fraction of the area that was assumed to be land in the crunching.

[22]:
tas_scm_ts_cubes["World"].cube.coords("land_fraction")
[22]:
[AuxCoord(array([0.30309098]), standard_name=None, units=Unit('1'), long_name='land_fraction')]
[23]:
tas_scm_ts_cubes["World"].cube.coords("land_fraction_northern_hemisphere")
[23]:
[AuxCoord(array([0.40761694]), standard_name=None, units=Unit('1'), long_name='land_fraction_northern_hemisphere')]
Investigating the weights

Another utility function is get_scm_timeseries_weights. This function is very similar to get_scm_timeseries but returns just the weights rather than an ScmRun. The returned weights include the area weighting.

[24]:
tas_scm_weights = tas.get_scm_timeseries_weights()
[25]:
# NBVAL_IGNORE_OUTPUT
plt.figure(figsize=(18, 18))
no_rows = 3
no_cols = 4

# lay out the panels so that Northern Hemisphere regions sit in the top row,
# Southern Hemisphere regions in the bottom row and World spans the middle row
for i, (label, weights) in enumerate(tas_scm_weights.items()):
    if label == "World":
        index = int((no_rows + 1) / 2)
        plt.subplot(no_rows, 1, index)
    else:
        if label == "World|Northern Hemisphere":
            index = 1
        elif label == "World|Southern Hemisphere":
            index = 1 + (no_rows - 1) * no_cols
        elif label == "World|Land":
            index = 2
        elif label == "World|Ocean":
            index = 2 + (no_rows - 1) * no_cols
        else:
            index = 3
            if "Ocean" in label:
                index += 1
            if "Southern Hemisphere" in label:
                index += (no_rows - 1) * no_cols

        plt.subplot(no_rows, no_cols, index)

    # use a time-collapsed copy of the cube as a template grid for the weights
    weight_cube = tas.cube.collapsed("time", iris.analysis.MEAN)
    weight_cube.data = weights
    qplt.pcolormesh(weight_cube)
    plt.title(label)
    plt.gca().coastlines()


plt.tight_layout()
<ipython-input-25-462c0f3bdde1>:38: UserWarning: Tight layout not applied. tight_layout cannot make axes width small enough to accommodate all axes decorations
  plt.tight_layout()
[image: _images/usage_demo_40_1.png]

More detail

Atmospheric, oceanic and land data handling

In this notebook we discuss the subtleties of how netCDF-SCM handles different data 'realms' and why these choices are made. The realms of interest are atmosphere, ocean and land, and the distinction between them follows the CMIP6 realm controlled vocabulary.

[1]:
# NBVAL_IGNORE_OUTPUT
import traceback
from os.path import join

import iris
import iris.quickplot as qplt
import matplotlib.pyplot as plt
import numpy as np

from netcdf_scm.iris_cube_wrappers import CMIP6OutputCube
from netcdf_scm.utils import broadcast_onto_lat_lon_grid
[2]:
from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()
plt.style.use("bmh")
[3]:
import logging

logging.captureWarnings(True)

root_logger = logging.getLogger()
root_logger.setLevel(logging.WARNING)
fmt = logging.Formatter("{levelname}:{name}:{message}", style="{")
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(fmt)
root_logger.addHandler(stream_handler)
[4]:
DATA_PATH_TEST = join("..", "..", "..", "tests", "test-data")

Note that all of our data is on a regular grid; we show an example of using native model grid data in the ocean section.

[5]:
tas_file = join(
    DATA_PATH_TEST,
    "cmip6output",
    "CMIP6",
    "CMIP",
    "IPSL",
    "IPSL-CM6A-LR",
    "historical",
    "r1i1p1f1",
    "Amon",
    "tas",
    "gr",
    "v20180803",
    "tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_191001-191003.nc",
)

gpp_file = tas_file.replace("Amon", "Lmon").replace("tas", "gpp")
csoilfast_file = gpp_file.replace("gpp", "cSoilFast")

hfds_file = join(
    DATA_PATH_TEST,
    "cmip6output",
    "CMIP6",
    "CMIP",
    "NOAA-GFDL",
    "GFDL-CM4",
    "piControl",
    "r1i1p1f1",
    "Omon",
    "hfds",
    "gr",
    "v20180701",
    "hfds_Omon_GFDL-CM4_piControl_r1i1p1f1_gr_015101-015103.nc",
)
Oceans

We start by loading our data.

[6]:
# NBVAL_IGNORE_OUTPUT
hfds = CMIP6OutputCube()
hfds.load_data_from_path(hfds_file)

netCDF-SCM infers whether the data is "ocean", "land" or "atmosphere". The inferred realm can be checked by examining an ScmCube's netcdf_scm_realm property.

In our case we have “ocean” data.

[7]:
hfds.netcdf_scm_realm
[7]:
'ocean'

If we have ocean data, then there is no data which will go in a “land” box. Hence, if we request e.g. World|Land data, we will get a warning and land data will not be returned.

[8]:
out = hfds.get_scm_timeseries(regions=["World", "World|Land"])
out["region"].unique()
WARNING:py.warnings:/Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/src/netcdf_scm/weights/__init__.py:869: UserWarning: Failed to create 'World|Land' weights: All weights are zero for region: `World|Land`
  warnings.warn(warn_str)

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
WARNING:netcdf_scm.iris_cube_wrappers:Performing lazy conversion to datetime for calendar: 365_day. This may cause subtle errors in operations that depend on the length of time between dates
[8]:
array(['World'], dtype=object)

As there is no land data, the World mean is equal to the World|Ocean mean.

[9]:
# NBVAL_IGNORE_OUTPUT
hfds_scm_ts = hfds.get_scm_timeseries(regions=["World", "World|Ocean"])
hfds_scm_ts.line_plot(style="region")
np.testing.assert_allclose(
    hfds_scm_ts.filter(region="World").values,
    hfds_scm_ts.filter(region="World|Ocean").values,
);
WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
WARNING:netcdf_scm.iris_cube_wrappers:Performing lazy conversion to datetime for calendar: 365_day. This may cause subtle errors in operations that depend on the length of time between dates
[image: _images/usage_atmos-land-ocean-handling_14_1.png]

When taking averages, there are three obvious options:

  • unweighted average

  • area weighted average

  • area and surface fraction weighted average

In netCDF-SCM, we provide the choice of the latter two (if you want an unweighted average, please raise an issue on our issue tracker). Depending on the context, one will likely make more sense than the other. The user can specify the choice to ScmCube.get_scm_timeseries_weights via the cell_weights argument. If the user doesn't supply a value, ScmCube will guess the most appropriate option based on the data being processed; see the docstring and the sketch below for more details.

[10]:
print(hfds.get_scm_timeseries_weights.__doc__)

        Get the scm timeseries weights

        Parameters
        ----------
        surface_fraction_cube : :obj:`ScmCube`, optional
            land surface fraction data which is used to determine whether a given
            gridbox is land or ocean. If ``None``, we try to load the land surface fraction automatically.

        areacell_scmcube : :obj:`ScmCube`, optional
            cell area data which is used to take the latitude-longitude mean of the
            cube's data. If ``None``, we try to load this data automatically and if
            that fails we fall back onto ``iris.analysis.cartography.area_weights``.

        regions : list[str]
            List of regions to use. If ``None`` then
            ``netcdf_scm.regions.DEFAULT_REGIONS`` is used.

        cell_weights : {'area-only', 'area-surface-fraction'}
            How cell weights should be calculated. If ``'area-surface-fraction'``, both cell area and its
            surface fraction will be used to weight the cell. If ``'area-only'``, only the cell's area
            will be used to weight the cell (cells which do not belong to the region are nonetheless
            excluded). If ``None``, netCDF-SCM will guess whether land surface fraction weights should
            be included or not based on the data being processed. When guessing, for ocean data,
            netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction
            (see Section L5 of Griffies et al., *GMD*, 2016, `<https://doi.org/10.5194/gmd-9-3231-2016>`_).
            For land variables, netCDF-SCM will weight cells by both their horizontal area and their land
            surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP
            variable name).” (Chris Jones, *personal communication*, 18 April 2020). For land variables,
            note that there seems to be nothing in Jones et al., *GMD*, 2016
            (`<https://doi.org/10.5194/gmd-9-2853-2016>`_).

        log_failure : bool
            Should regions which fail be logged? If no, failures are raised as
            warnings.

        Returns
        -------
        dict of str: :obj:`np.ndarray`
            Dictionary of 'region name': weights, key: value pairs

        Notes
        -----
        Only regions which can be calculated are returned. If no regions can be calculated, an empty
        dictionary will be returned.
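
For example, the realm-based guess can be overridden by passing cell_weights explicitly. A minimal sketch, using the hfds cube loaded above (the function returns a dictionary of region name to weights):

# force area-only weighting rather than relying on the realm-based guess
weights_area_only = hfds.get_scm_timeseries_weights(
    regions=["World", "World|Ocean"], cell_weights="area-only"
)
print(sorted(weights_area_only.keys()))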

In the cells below, we show the difference the choice of cell weighting makes.

[11]:
def compare_weighting_options(input_scm_cube):
    # compare unweighted, area-weighted and area-surface-fraction-weighted
    # global means of the input cube
    unweighted_mean = input_scm_cube.cube.collapsed(
        ["latitude", "longitude"], iris.analysis.MEAN
    )

    area_cell = input_scm_cube.get_metadata_cube(
        input_scm_cube.areacell_var
    ).cube

    area_weights = broadcast_onto_lat_lon_grid(input_scm_cube, area_cell.data)
    area_weighted_mean = input_scm_cube.cube.collapsed(
        ["latitude", "longitude"], iris.analysis.MEAN, weights=area_weights
    )

    surface_frac = input_scm_cube.get_metadata_cube(
        input_scm_cube.surface_fraction_var
    ).cube

    area_sf = area_cell * surface_frac
    area_sf_weights = broadcast_onto_lat_lon_grid(input_scm_cube, area_sf.data)
    area_sf_weighted_mean = input_scm_cube.cube.collapsed(
        ["latitude", "longitude"], iris.analysis.MEAN, weights=area_sf_weights
    )

    plt.figure(figsize=(8, 4.5))
    qplt.plot(unweighted_mean, label="unweighted")
    qplt.plot(area_weighted_mean, label="area weighted")
    qplt.plot(
        area_sf_weighted_mean,
        label="area-surface fraction weighted",
        linestyle="--",
        dashes=(10, 10),
        linewidth=4,
    )

    plt.legend();
[12]:
# NBVAL_IGNORE_OUTPUT
compare_weighting_options(hfds)
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/cube.py:3218: UserWarning: Collapsing spatial coordinate 'latitude' without weighting
  warnings.warn(msg.format(coord.name()))

[image: _images/usage_atmos-land-ocean-handling_19_1.png]

We go to the trouble of taking these area-surface fraction weightings because they matter. In particular, the area weight is required to not overweight the poles (on whatever grid we're working on), whilst the surface fraction ensures that each cell's contribution to an average reflects how much of it belongs in a given 'SCM box'.
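
As a quick illustration of the polar point: on a regular latitude-longitude grid, cell areas scale roughly with the cosine of latitude, so an unweighted mean substantially overweights high latitudes.

import numpy as np

# relative area of equal-angle grid cells at different latitudes
lats_deg = np.array([0.0, 45.0, 80.0])
print(np.cos(np.deg2rad(lats_deg)).round(2))  # [1.   0.71 0.17]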

More detail

We can check which variable is being used for the cell areas by looking at ScmCube.areacell_var. For ocean data this is areacello.

[13]:
hfds.areacell_var
[13]:
'areacello'
[14]:
hfds_area_cell = hfds.get_metadata_cube(hfds.areacell_var).cube
qplt.pcolormesh(hfds_area_cell);
[image: _images/usage_atmos-land-ocean-handling_23_0.png]

We can check which variable is being used for the surface fraction by looking at ScmCube.surface_fraction_var. For ocean data this is sftof.

[15]:
hfds.surface_fraction_var
[15]:
'sftof'
[16]:
hfds_surface_frac = hfds.get_metadata_cube(hfds.surface_fraction_var).cube
qplt.pcolormesh(hfds_surface_frac);
[image: _images/usage_atmos-land-ocean-handling_26_0.png]

The product of the area of the cells and the surface fraction gives us the area-surface fraction weights. The addition of the surface fraction only really matters near the coastlines where cells are neither entirely land nor entirely ocean.

[17]:
hfds_area_sf = hfds_area_cell * hfds_surface_frac

plt.figure(figsize=(16, 9))
plt.subplot(121)
qplt.pcolormesh(hfds_area_sf)

plt.subplot(122)
lat_con = iris.Constraint(latitude=lambda cell: -50 < cell < -20)
lon_con = iris.Constraint(longitude=lambda cell: 140 < cell < 160)
qplt.pcolormesh(hfds_area_sf.extract(lat_con & lon_con));
[image: _images/usage_atmos-land-ocean-handling_28_0.png]

For ocean data, by default netCDF-SCM will only use the area weights. If we turn the logging up, we can see the decisions being made internally (look at the line following the line containing cell_weights).

[18]:
# NBVAL_IGNORE_OUTPUT
root_logger.setLevel(logging.DEBUG)
# also load the cube again so the caching doesn't hide the logging messages
hfds = CMIP6OutputCube()
hfds.load_data_from_path(hfds_file)
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Omon/hfds/gr/v20180701/hfds_Omon_GFDL-CM4_piControl_r1i1p1f1_gr_015101-015103.nc
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Ofx/areacello/gr/v20180701/areacello_Ofx_GFDL-CM4_piControl_r1i1p1f1_gr.nc
[19]:
# NBVAL_IGNORE_OUTPUT
hfds_area_weights = broadcast_onto_lat_lon_grid(hfds, hfds_area_cell.data)
hfds_area_weighted_mean = hfds.cube.collapsed(
    ["latitude", "longitude"], iris.analysis.MEAN, weights=hfds_area_weights
)

netcdf_scm_calculated = hfds.get_scm_timeseries(regions=["World"]).timeseries()

np.testing.assert_allclose(
    hfds_area_weighted_mean.data,
    netcdf_scm_calculated.values.squeeze(),
    rtol=1e-6,
)

netcdf_scm_calculated.T
DEBUG:netcdf_scm.iris_cube_wrappers:cell_weights: None
DEBUG:netcdf_scm.iris_cube_wrappers:self.netcdf_scm_realm: ocean
DEBUG:netcdf_scm.iris_cube_wrappers:Using: <class 'netcdf_scm.weights.AreaWeightCalculator'>
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Ofx/sftof/gr/v20180701/sftof_Ofx_GFDL-CM4_piControl_r1i1p1f1_gr.nc
DEBUG:netcdf_scm.weights:sftof data max is 100.0, dividing by 100.0 to convert units to fraction
DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
WARNING:netcdf_scm.iris_cube_wrappers:Performing lazy conversion to datetime for calendar: 365_day. This may cause subtle errors in operations that depend on the length of time between dates
[19]:
activity_id CMIP
climate_model GFDL-CM4
member_id r1i1p1f1
mip_era CMIP6
model unspecified
region World
scenario piControl
unit W m^-2
variable hfds
variable_standard_name surface_downward_heat_flux_in_sea_water
time
0151-01-16 12:00:00 12.899261
0151-02-15 00:00:00 12.346571
0151-03-16 12:00:00 7.410532

If we specify that surface fractions should be included, the timeseries calculated by netCDF-SCM is the same as the timeseries calculated using the surface fraction and area weights.

[20]:
# NBVAL_IGNORE_OUTPUT
hfds_area_sf_weights = broadcast_onto_lat_lon_grid(hfds, hfds_area_sf.data)
hfds_area_sf_weighted_mean = hfds.cube.collapsed(
    ["latitude", "longitude"], iris.analysis.MEAN, weights=hfds_area_sf_weights
)

netcdf_scm_calculated = hfds.get_scm_timeseries(
    regions=["World"], cell_weights="area-surface-fraction"
).timeseries()

np.testing.assert_allclose(
    hfds_area_sf_weighted_mean.data,
    netcdf_scm_calculated.values.squeeze(),
    rtol=1e-6,
)

netcdf_scm_calculated.T
DEBUG:netcdf_scm.iris_cube_wrappers:cell_weights: area-surface-fraction
DEBUG:netcdf_scm.iris_cube_wrappers:Using: <class 'netcdf_scm.weights.AreaSurfaceFractionWeightCalculator'>
DEBUG:netcdf_scm.weights:sftof data max is 100.0, dividing by 100.0 to convert units to fraction
DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
WARNING:netcdf_scm.iris_cube_wrappers:Performing lazy conversion to datetime for calendar: 365_day. This may cause subtle errors in operations that depend on the length of time between dates
[20]:
activity_id CMIP
climate_model GFDL-CM4
member_id r1i1p1f1
mip_era CMIP6
model unspecified
region World
scenario piControl
unit W m^-2
variable hfds
variable_standard_name surface_downward_heat_flux_in_sea_water
time
0151-01-16 12:00:00 13.440214
0151-02-15 00:00:00 12.608150
0151-03-16 12:00:00 7.226662
[21]:
root_logger.setLevel(logging.WARNING)
Land

Next we look at land data.

[22]:
gpp = CMIP6OutputCube()
gpp.load_data_from_path(gpp_file)

csoilfast = CMIP6OutputCube()
csoilfast.load_data_from_path(csoilfast_file)
[23]:
gpp.netcdf_scm_realm
[23]:
'land'
[24]:
csoilfast.netcdf_scm_realm
[24]:
'land'

If we have land data, then there is no data which will go in an "ocean" box. Hence, if we request e.g. World|Ocean data, we will get a warning and ocean data will not be returned.

[25]:
out = gpp.get_scm_timeseries(regions=["World", "World|Ocean"])
out["region"].unique()
WARNING:py.warnings:/Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/src/netcdf_scm/weights/__init__.py:869: UserWarning: Failed to create 'World|Ocean' weights: All weights are zero for region: `World|Ocean`
  warnings.warn(warn_str)

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[25]:
array(['World'], dtype=object)

As there is no ocean data, the World mean is equal to the World|Land mean.

[26]:
# NBVAL_IGNORE_OUTPUT
gpp_scm_ts = gpp.get_scm_timeseries(regions=["World", "World|Land"])
gpp_scm_ts.line_plot(style="region")
np.testing.assert_allclose(
    gpp_scm_ts.filter(region="World").values,
    gpp_scm_ts.filter(region="World|Land").values,
);
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[image: _images/usage_atmos-land-ocean-handling_42_1.png]
[27]:
# NBVAL_IGNORE_OUTPUT
compare_weighting_options(gpp)
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/cube.py:3218: UserWarning: Collapsing spatial coordinate 'latitude' without weighting
  warnings.warn(msg.format(coord.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

[image: _images/usage_atmos-land-ocean-handling_43_1.png]
[28]:
# NBVAL_IGNORE_OUTPUT
compare_weighting_options(csoilfast)
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/cube.py:3218: UserWarning: Collapsing spatial coordinate 'latitude' without weighting
  warnings.warn(msg.format(coord.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

[image: _images/usage_atmos-land-ocean-handling_44_1.png]

For land data, by default netCDF-SCM will use the area and surface fraction weights. Once again, if we turn the logging up, we can see the decisions being made internally.

[29]:
# NBVAL_IGNORE_OUTPUT
root_logger.setLevel(logging.DEBUG)
csoilfast = CMIP6OutputCube()
csoilfast.load_data_from_path(csoilfast_file)
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Lmon/cSoilFast/gr/v20180803/cSoilFast_Lmon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_191001-191003.nc
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/fx/areacella/gr/v20180803/areacella_fx_IPSL-CM6A-LR_historical_r1i1p1f1_gr.nc
[30]:
# NBVAL_IGNORE_OUTPUT
netcdf_scm_calculated = csoilfast.get_scm_timeseries(
    regions=["World"]
).timeseries()

netcdf_scm_calculated.T
DEBUG:netcdf_scm.iris_cube_wrappers:cell_weights: None
DEBUG:netcdf_scm.iris_cube_wrappers:self.netcdf_scm_realm: land
DEBUG:netcdf_scm.iris_cube_wrappers:Using: <class 'netcdf_scm.weights.AreaSurfaceFractionWeightCalculator'>
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/fx/sftlf/gr/v20180803/sftlf_fx_IPSL-CM6A-LR_historical_r1i1p1f1_gr.nc
DEBUG:netcdf_scm.weights:sftlf data max is 100.0, dividing by 100.0 to convert units to fraction
DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[30]:
activity_id CMIP
climate_model IPSL-CM6A-LR
member_id r1i1p1f1
mip_era CMIP6
model unspecified
region World
scenario historical
unit kg m^-2
variable cSoilFast
variable_standard_name fast_soil_pool_carbon_content
time
1910-01-16 12:00:00 0.058512
1910-02-15 00:00:00 0.058663
1910-03-16 12:00:00 0.059181
Atmosphere

Finally we look at atmospheric data.

[31]:
tas = CMIP6OutputCube()
tas.load_data_from_path(tas_file)
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20180803/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_191001-191003.nc
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/fx/areacella/gr/v20180803/areacella_fx_IPSL-CM6A-LR_historical_r1i1p1f1_gr.nc
[32]:
tas.netcdf_scm_realm
[32]:
'atmosphere'

If we have atmosphere data, then we have global coverage and so can split data into both the land and ocean boxes.

[33]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 14))
ax1 = fig.add_subplot(311)
tas.get_scm_timeseries(
    regions=[
        "World",
        "World|Land",
        "World|Ocean",
        "World|Northern Hemisphere",
        "World|Southern Hemisphere",
    ]
).lineplot(hue="region", ax=ax1)

ax2 = fig.add_subplot(312, sharey=ax1, sharex=ax1)
tas.get_scm_timeseries(
    regions=[
        "World",
        "World|Northern Hemisphere|Land",
        "World|Southern Hemisphere|Land",
        "World|Northern Hemisphere|Ocean",
        "World|Southern Hemisphere|Ocean",
    ]
).lineplot(hue="region", ax=ax2)

ax3 = fig.add_subplot(313, sharey=ax1, sharex=ax1)
tas.get_scm_timeseries(
    regions=[
        "World",
        "World|Ocean",
        "World|North Atlantic Ocean",
        "World|El Nino N3.4",
    ]
).lineplot(hue="region", ax=ax3);
DEBUG:netcdf_scm.iris_cube_wrappers:cell_weights: None
DEBUG:netcdf_scm.iris_cube_wrappers:self.netcdf_scm_realm: atmosphere
DEBUG:netcdf_scm.iris_cube_wrappers:Using: <class 'netcdf_scm.weights.AreaSurfaceFractionWeightCalculator'>
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/fx/sftlf/gr/v20180803/sftlf_fx_IPSL-CM6A-LR_historical_r1i1p1f1_gr.nc
DEBUG:netcdf_scm.weights:sftlf data max is 100.0, dividing by 100.0 to convert units to fraction
DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[image: _images/usage_atmos-land-ocean-handling_52_1.png]
[34]:
# NBVAL_IGNORE_OUTPUT
compare_weighting_options(tas)
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/cube.py:3218: UserWarning: Collapsing spatial coordinate 'latitude' without weighting
  warnings.warn(msg.format(coord.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

[image: _images/usage_atmos-land-ocean-handling_53_1.png]

As our data is global, the “World” data is simply an area-weighted mean.

[35]:
# NBVAL_IGNORE_OUTPUT
tas_area = tas.get_metadata_cube(tas.areacell_var).cube

tas_area_weights = broadcast_onto_lat_lon_grid(tas, tas_area.data)
tas_area_weighted_mean = tas.cube.collapsed(
    ["latitude", "longitude"], iris.analysis.MEAN, weights=tas_area_weights
)

netcdf_scm_calculated = tas.get_scm_timeseries(regions=["World"]).timeseries()

np.testing.assert_allclose(
    tas_area_weighted_mean.data, netcdf_scm_calculated.values.squeeze()
)

netcdf_scm_calculated.T
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[35]:
activity_id CMIP
climate_model IPSL-CM6A-LR
member_id r1i1p1f1
mip_era CMIP6
model unspecified
region World
scenario historical
unit K
variable tas
variable_standard_name air_temperature
time
1910-01-16 12:00:00 284.148122
1910-02-15 00:00:00 284.196805
1910-03-16 12:00:00 284.876555

The “World|Land” data is surface fraction weighted.

[36]:
# NBVAL_IGNORE_OUTPUT
tas_sf = tas.get_metadata_cube(tas.surface_fraction_var).cube
# netcdf-scm normalises weights to 1 internally so we do so here too
tas_sf = tas_sf / tas_sf.data.max()


tas_area_sf = tas_area * tas_sf

tas_area_sf_weights = broadcast_onto_lat_lon_grid(tas, tas_area_sf.data)
tas_area_sf_weighted_mean = tas.cube.collapsed(
    ["latitude", "longitude"], iris.analysis.MEAN, weights=tas_area_sf_weights
)

netcdf_scm_calculated = tas.get_scm_timeseries(
    regions=["World|Land"]
).timeseries()

np.testing.assert_allclose(
    tas_area_sf_weighted_mean.data, netcdf_scm_calculated.values.squeeze()
)

netcdf_scm_calculated.T
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[36]:
activity_id CMIP
climate_model IPSL-CM6A-LR
member_id r1i1p1f1
mip_era CMIP6
model unspecified
region World|Land
scenario historical
unit K
variable tas
variable_standard_name air_temperature
time
1910-01-16 12:00:00 273.530365
1910-02-15 00:00:00 273.393341
1910-03-16 12:00:00 275.527954

The “World|Ocean” data is also surface fraction weighted (calculated as 100 minus land surface fraction).

[37]:
# NBVAL_IGNORE_OUTPUT
tas_sf_ocean = tas.get_metadata_cube(tas.surface_fraction_var).cube
tas_sf_ocean.data = 100 - tas_sf_ocean.data

# netcdf-scm normalises weights to 1 internally so we do so here too
tas_sf_ocean = tas_sf_ocean / tas_sf_ocean.data.max()

tas_area_sf_ocean = tas_area.data * tas_sf_ocean.data

tas_area_sf_ocean_weights = broadcast_onto_lat_lon_grid(tas, tas_area_sf_ocean)
tas_area_sf_ocean_weighted_mean = tas.cube.collapsed(
    ["latitude", "longitude"],
    iris.analysis.MEAN,
    weights=tas_area_sf_ocean_weights,
)

netcdf_scm_calculated = tas.get_scm_timeseries(
    regions=["World|Ocean"]
).timeseries()

np.testing.assert_allclose(
    tas_area_sf_ocean_weighted_mean.data,
    netcdf_scm_calculated.values.squeeze(),
)

netcdf_scm_calculated.T
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[37]:
activity_id CMIP
climate_model IPSL-CM6A-LR
member_id r1i1p1f1
mip_era CMIP6
model unspecified
region World|Ocean
scenario historical
unit K
variable tas
variable_standard_name air_temperature
time
1910-01-16 12:00:00 288.427979
1910-02-15 00:00:00 288.551514
1910-03-16 12:00:00 288.644806

For atmosphere data, by default netCDF-SCM will use the area and surface fraction weights. Once again, if we turn the logging up, we can see the decisions being made internally.

[38]:
# NBVAL_IGNORE_OUTPUT
root_logger.setLevel(logging.DEBUG)
tas = CMIP6OutputCube()
tas.load_data_from_path(tas_file)
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20180803/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_191001-191003.nc
DEBUG:netcdf_scm.iris_cube_wrappers:loading cube ../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/fx/areacella/gr/v20180803/areacella_fx_IPSL-CM6A-LR_historical_r1i1p1f1_gr.nc
[39]:
# NBVAL_IGNORE_OUTPUT
netcdf_scm_calculated = tas.get_scm_timeseries(regions=["World"]).timeseries()

netcdf_scm_calculated.T
DEBUG:netcdf_scm.iris_cube_wrappers:cell_weights: None
DEBUG:netcdf_scm.iris_cube_wrappers:self.netcdf_scm_realm: atmosphere
DEBUG:netcdf_scm.iris_cube_wrappers:Using: <class 'netcdf_scm.weights.AreaSurfaceFractionWeightCalculator'>
DEBUG:netcdf_scm.iris_cube_wrappers:Crunching SCM timeseries in memory
WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))

WARNING:py.warnings:/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1410: UserWarning: Collapsing a non-contiguous coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))

WARNING:netcdf_scm.iris_cube_wrappers:Not calculating land fractions as all required cubes are not available
[39]:
activity_id CMIP
climate_model IPSL-CM6A-LR
member_id r1i1p1f1
mip_era CMIP6
model unspecified
region World
scenario historical
unit K
variable tas
variable_standard_name air_temperature
time
1910-01-16 12:00:00 284.148122
1910-02-15 00:00:00 284.196805
1910-03-16 12:00:00 284.876555
Ocean data handling

In this notebook we show how ocean data is handled.

[1]:
# NBVAL_IGNORE_OUTPUT
import traceback
from os.path import join

import numpy as np
import iris
import iris.quickplot as qplt
import matplotlib
import matplotlib.pyplot as plt
from scmdata import ScmRun

from netcdf_scm.iris_cube_wrappers import CMIP6OutputCube
[2]:
# make all logs appear
import logging

root_logger = logging.getLogger()
root_logger.addHandler(logging.StreamHandler())
[3]:
plt.style.use("bmh")
%matplotlib inline
[4]:
DATA_PATH_TEST = join("..", "..", "..", "tests", "test-data")
DATA_PATH_TEST_CMIP6_OUTPUT_ROOT = join(DATA_PATH_TEST, "cmip6output")
Test data

For this notebook’s test data we use CMIP6 output from NCAR’s CESM2 model.

2D data

Some ocean data is 2D. Here we use surface downward heat flux in sea water.

Firstly we use data which has been regridded by the modelling group.

[5]:
hfds_file = join(
    DATA_PATH_TEST,
    "cmip6output",
    "CMIP6",
    "CMIP",
    "NCAR",
    "CESM2",
    "historical",
    "r7i1p1f1",
    "Omon",
    "hfds",
    "gr",
    "v20190311",
    "hfds_Omon_CESM2_historical_r7i1p1f1_gr_195701-195703.nc",
)

We also examine how iris handles data which is provided on the native model grid.

[6]:
hfds_file_gn = hfds_file.replace("gr", "gn")
3D data

Some ocean data is 3D. netCDF-SCM currently supports crunching this to iris cubes but will not convert those cubes to SCM timeseries.

[7]:
thetao_file = join(
    DATA_PATH_TEST,
    "cmip6output",
    "CMIP6",
    "CMIP",
    "NCAR",
    "CESM2",
    "historical",
    "r10i1p1f1",
    "Omon",
    "thetao",
    "gn",
    "v20190313",
    "thetao_Omon_CESM2_historical_r10i1p1f1_gn_195310-195312.nc",
)
2D data handling
[8]:
# NBVAL_IGNORE_OUTPUT
hfds_cube = CMIP6OutputCube()
hfds_cube.load_data_from_path(hfds_file)
[9]:
print(hfds_cube.cube)
surface_downward_heat_flux_in_sea_water / (W m-2) (time: 3; latitude: 180; longitude: 360)
     Dimension coordinates:
          time                                         x            -               -
          latitude                                     -            x               -
          longitude                                    -            -               x
     Cell Measures:
          cell_area                                    -            x               x
     Attributes:
          CDI: Climate Data Interface version 1.8.2 (http://mpimet.mpg.de/cdi)
          CDO: Climate Data Operators version 1.8.2 (http://mpimet.mpg.de/cdo)
          Conventions: CF-1.7 CMIP-6.2
          activity_id: CMIP
          branch_method: standard
          branch_time_in_child: 674885.0
          branch_time_in_parent: 273750.0
          case_id: 21
          cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.007
          comment: Model data on the 1x1 grid includes values in all cells for which ocean...
          contact: cesm_cmip6@ucar.edu
          creation_date: 2019-01-19T03:13:13Z
          data_specs_version: 01.00.29
          description: This is the net flux of heat entering the liquid water column through its...
          experiment: all-forcing simulation of the recent past
          experiment_id: historical
          external_variables: areacello
          frequency: mon
          further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.historical.none.r7i1p1...
          grid: ocean data regridded from native gx1v7 displaced pole grid (384x320 latxlon)...
          grid_label: gr
          history: Sun Aug 18 22:57:15 2019: cdo -selmonth,1/3 tmp.nc hfds_Omon_CESM2_historical_r7i1p1f1_gr_195701-195703.nc
Sun...
          id: hfds
          institution: National Center for Atmospheric Research
          institution_id: NCAR
          license: CMIP6 model data produced by <The National Center for Atmospheric Research>...
          mipTable: Omon
          mip_era: CMIP6
          model_doi_url: https://doi.org/10.5065/D67H1H0V
          nominal_resolution: 1x1 degree
          out_name: hfds
          parent_activity_id: CMIP
          parent_experiment_id: piControl
          parent_mip_era: CMIP6
          parent_source_id: CESM2
          parent_time_units: days since 0001-01-01 00:00:00
          parent_variant_label: r1i1p1f1
          product: model-output
          prov: Omon ((isd.003))
          realm: ocean
          source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite volume grid; 288 x 192...
          source_id: CESM2
          source_type: AOGCM BGC
          sub_experiment: none
          sub_experiment_id: none
          table_id: Omon
          time: time
          time_label: time-mean
          time_title: Temporal mean
          title: Downward Heat Flux at Sea Water Surface
          tracking_id: hdl:21.14100/18907361-7d4d-4a3c-b355-4450472ab458
          type: real
          variable_id: hfds
          variant_info: CMIP6 20th century experiments (1850-2014) with CAM6, interactive land...
          variant_label: r7i1p1f1
     Cell methods:
          mean where sea: area
          mean: time
[10]:
# NBVAL_IGNORE_OUTPUT
time_mean = hfds_cube.cube.collapsed("time", iris.analysis.MEAN)
qplt.pcolormesh(time_mean)
plt.gca().coastlines();
_images/usage_ocean-data_16_0.png

Iris’s handling of data on the native model grid is mostly workable, but not yet perfect.

[11]:
# NBVAL_IGNORE_OUTPUT
hfds_cube_gn = CMIP6OutputCube()
hfds_cube_gn.load_data_from_path(hfds_file_gn)

print(hfds_cube_gn.cube)
WARNING: missing_value not used since it
cannot be safely cast to variable data type
surface_downward_heat_flux_in_sea_water / (W m-2) (time: 3; -- : 384; -- : 320)
     Dimension coordinates:
          time                                         x       -         -
     Auxiliary coordinates:
          latitude                                     -       x         x
          longitude                                    -       x         x
     Cell Measures:
          cell_area                                    -       x         x
     Attributes:
          CDI: Climate Data Interface version 1.8.2 (http://mpimet.mpg.de/cdi)
          CDO: Climate Data Operators version 1.8.2 (http://mpimet.mpg.de/cdo)
          Conventions: CF-1.7 CMIP-6.2
          activity_id: CMIP
          branch_method: standard
          branch_time_in_child: 674885.0
          branch_time_in_parent: 273750.0
          case_id: 21
          cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.007
          comment: This is the net flux of heat entering the liquid water column through its...
          contact: cesm_cmip6@ucar.edu
          creation_date: 2019-01-19T03:13:13Z
          data_specs_version: 01.00.29
          description: This is the net flux of heat entering the liquid water column through its...
          experiment: all-forcing simulation of the recent past
          experiment_id: historical
          external_variables: areacello
          frequency: mon
          further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.historical.none.r7i1p1...
          grid: native gx1v7 displaced pole grid (384x320 latxlon)
          grid_label: gn
          history: Sun Aug 18 22:57:16 2019: cdo -selmonth,1/3 tmp.nc hfds_Omon_CESM2_historical_r7i1p1f1_gn_195701-195703.nc
Sun...
          id: hfds
          institution: National Center for Atmospheric Research
          institution_id: NCAR
          license: CMIP6 model data produced by <The National Center for Atmospheric Research>...
          mipTable: Omon
          mip_era: CMIP6
          model_doi_url: https://doi.org/10.5065/D67H1H0V
          nominal_resolution: 100 km
          out_name: hfds
          parent_activity_id: CMIP
          parent_experiment_id: piControl
          parent_mip_era: CMIP6
          parent_source_id: CESM2
          parent_time_units: days since 0001-01-01 00:00:00
          parent_variant_label: r1i1p1f1
          product: model-output
          prov: Omon ((isd.003))
          realm: ocean
          source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite volume grid; 288 x 192...
          source_id: CESM2
          source_type: AOGCM BGC
          sub_experiment: none
          sub_experiment_id: none
          table_id: Omon
          time: time
          time_label: time-mean
          time_title: Temporal mean
          title: Downward Heat Flux at Sea Water Surface
          tracking_id: hdl:21.14100/f92a6db7-e8ea-44f1-882c-076226f8a62b
          type: real
          variable_id: hfds
          variant_info: CMIP6 20th century experiments (1850-2014) with CAM6, interactive land...
          variant_label: r7i1p1f1
     Cell methods:
          mean where sea: area
          mean: time
[12]:
# NBVAL_IGNORE_OUTPUT
time_mean = hfds_cube_gn.cube.collapsed("time", iris.analysis.MEAN)
qplt.pcolormesh(time_mean)
plt.gca().coastlines();
_images/usage_ocean-data_19_0.png
Getting SCM Timeseries

We cut down to SCM timeseries in the standard way.

[13]:
# NBVAL_IGNORE_OUTPUT
regions_to_get = [
    "World",
    "World|Northern Hemisphere",
    "World|Northern Hemisphere|Ocean",
    "World|Ocean",
    "World|Southern Hemisphere",
    "World|Southern Hemisphere|Ocean",
    "World|North Atlantic Ocean",
    "World|El Nino N3.4",
]
hfds_ts = hfds_cube.get_scm_timeseries(regions=regions_to_get)
hfds_gn_ts = hfds_cube_gn.get_scm_timeseries(regions=regions_to_get)

ax = plt.figure(figsize=(16, 9)).add_subplot(111)
ax = hfds_ts.lineplot(hue="region", style="variable", dashes=[(3, 3)], ax=ax)
hfds_gn_ts.lineplot(
    hue="region", style="variable", dashes=[(10, 30)], ax=ax, legend=False
);
Not calculating land fractions as all required cubes are not available
Performing lazy conversion to datetime for calendar: 365_day. This may cause subtle errors in operations that depend on the length of time between dates
WARNING: missing_value not used since it
cannot be safely cast to variable data type
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/fileformats/netcdf.py:395: UserWarning: WARNING: missing_value not used since it
cannot be safely cast to variable data type
  var = variable[keys]
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/coords.py:1406: UserWarning: Collapsing a multi-dimensional coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/coords.py:1406: UserWarning: Collapsing a multi-dimensional coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))
Not calculating land fractions as all required cubes are not available
Performing lazy conversion to datetime for calendar: 365_day. This may cause subtle errors in operations that depend on the length of time between dates
_images/usage_ocean-data_22_1.png

Comparing the results of collapsing the native-grid data and the regridded data reveals a small difference (approximately 1%), most notably in the small El Nino N3.4 region.

[14]:
ax1, ax2 = plt.figure(figsize=(16, 9)).subplots(nrows=1, ncols=2)

ScmRun(hfds_ts.timeseries() - hfds_gn_ts.timeseries()).lineplot(
    hue="region", ax=ax1, legend=False
)
ax1.set_title("Absolute difference")

ScmRun(
    (
        (hfds_ts.timeseries() - hfds_gn_ts.timeseries()) / hfds_ts.timeseries()
    ).abs()
    * 100
).lineplot(hue="region", ax=ax2)
ax2.set_title("Percentage difference");
_images/usage_ocean-data_24_0.png
3D data handling
[15]:
# NBVAL_IGNORE_OUTPUT
thetao_cube = CMIP6OutputCube()
thetao_cube.load_data_from_path(thetao_file)
WARNING: missing_value not used since it
cannot be safely cast to variable data type
Missing CF-netCDF measure variable 'volcello', referenced by netCDF variable 'thetao'
[16]:
print(thetao_cube.cube)
sea_water_potential_temperature / (degC) (time: 3; generic: 60; -- : 384; -- : 320)
     Dimension coordinates:
          time                                x           -        -         -
          generic                             -           x        -         -
     Auxiliary coordinates:
          latitude                            -           -        x         x
          longitude                           -           -        x         x
     Cell Measures:
          cell_area                           -           -        x         x
     Attributes:
          CDI: Climate Data Interface version 1.8.2 (http://mpimet.mpg.de/cdi)
          CDO: Climate Data Operators version 1.8.2 (http://mpimet.mpg.de/cdo)
          Conventions: CF-1.7 CMIP-6.2
          activity_id: CMIP
          branch_method: standard
          branch_time_in_child: 674885.0
          branch_time_in_parent: 306600.0
          case_id: 24
          cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.010
          comment: Diagnostic should be contributed even for models using conservative temperature...
          contact: cesm_cmip6@ucar.edu
          creation_date: 2019-03-12T02:46:53Z
          data_specs_version: 01.00.29
          description: Diagnostic should be contributed even for models using conservative temperature...
          experiment: Simulation of recent past (1850 to 2014). Impose changing conditions (consistent...
          experiment_id: historical
          external_variables: areacello volcello
          frequency: mon
          further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.historical.none.r10i1p...
          grid: native gx1v7 displaced pole grid (384x320 latxlon)
          grid_label: gn
          history: Mon Aug 19 17:25:30 2019: cdo -selmonth,10/12 tmp.nc thetao_Omon_CESM2_historical_r10i1p1f1_gn_195310-195312.nc
Mon...
          id: thetao
          institution: National Center for Atmospheric Research
          institution_id: NCAR
          license: CMIP6 model data produced by <The National Center for Atmospheric Research>...
          mipTable: Omon
          mip_era: CMIP6
          model_doi_url: https://doi.org/10.5065/D67H1H0V
          nominal_resolution: 100 km
          out_name: thetao
          parent_activity_id: CMIP
          parent_experiment_id: piControl
          parent_mip_era: CMIP6
          parent_source_id: CESM2
          parent_time_units: days since 0001-01-01 00:00:00
          parent_variant_label: r1i1p1f1
          product: model-output
          prov: Omon ((isd.003))
          realm: ocean
          source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite volume grid; 288 x 192...
          source_id: CESM2
          source_type: AOGCM BGC
          sub_experiment: none
          sub_experiment_id: none
          table_id: Omon
          time: time
          time_label: time-mean
          time_title: Temporal mean
          title: Sea Water Potential Temperature
          tracking_id: hdl:21.14100/19f9ed4d-daf4-4a51-8563-fe32b9c2a0cd
          type: real
          variable_id: thetao
          variant_info: CMIP6 20th century experiments (1850-2014) with CAM6, interactive land...
          variant_label: r10i1p1f1
     Cell methods:
          mean where sea: area
          mean: time

If we take a time mean of a cube with 3D spatial data, we end up with a 3D cube, which cannot be plotted on a 2D plot.

[17]:
# NBVAL_IGNORE_OUTPUT
time_mean = thetao_cube.cube.collapsed("time", iris.analysis.MEAN)
try:
    qplt.pcolormesh(time_mean,)
except ValueError as e:
    traceback.print_exc(limit=0, chain=False)
Traceback (most recent call last):
ValueError: Cube must be 2-dimensional. Got 3 dimensions.

If we also take, e.g., a depth mean, then we can plot the result (although, as this data is on the model’s native grid, iris doesn’t do a great job of plotting it).

[18]:
# NBVAL_IGNORE_OUTPUT
# the depth co-ordinate is labelled as 'generic' for some reason
time_depth_mean = time_mean.collapsed("generic", iris.analysis.MEAN)
qplt.pcolormesh(time_depth_mean);
_images/usage_ocean-data_31_0.png

We can still crunch the data into SCM timeseries cubes.

[19]:
# NBVAL_IGNORE_OUTPUT
thetao_ts_cubes = thetao_cube.get_scm_timeseries_cubes(regions=regions_to_get)
WARNING: missing_value not used since it
cannot be safely cast to variable data type
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/fileformats/netcdf.py:395: UserWarning: WARNING: missing_value not used since it
cannot be safely cast to variable data type
  var = variable[keys]
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/coords.py:1406: UserWarning: Collapsing a multi-dimensional coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/coords.py:1406: UserWarning: Collapsing a multi-dimensional coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))
Not calculating land fractions as all required cubes are not available

These cubes now have dimensions of time and depth (labelled as ‘generic’ here). Hence we can plot them.

[20]:
plt.figure(figsize=(12, 15))

plt.subplot(311)
qplt.pcolormesh(thetao_ts_cubes["World"].cube,)
plt.title("World")

plt.subplot(323)
qplt.pcolormesh(thetao_ts_cubes["World|Northern Hemisphere|Ocean"].cube,)
plt.title("World|Northern Hemisphere|Ocean")

plt.subplot(324)
qplt.pcolormesh(thetao_ts_cubes["World|Southern Hemisphere|Ocean"].cube,)
plt.title("World|Southern Hemisphere|Ocean")

plt.subplot(325)
qplt.pcolormesh(thetao_ts_cubes["World|El Nino N3.4"].cube,)
plt.title("World|El Nino N3.4")

plt.subplot(326)
qplt.pcolormesh(thetao_ts_cubes["World|North Atlantic Ocean"].cube,)
plt.title("World|North Atlantic Ocean")

plt.tight_layout()
_images/usage_ocean-data_35_0.png

We have also not yet decided on a convention for handling the depth information in ScmRun objects, hence attempting to retrieve SCM timeseries results in an error.

[21]:
# NBVAL_IGNORE_OUTPUT
try:
    thetao_cube.get_scm_timeseries(regions=regions_to_get)
except NotImplementedError as e:
    traceback.print_exc(limit=0, chain=False)
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/fileformats/netcdf.py:395: UserWarning: WARNING: missing_value not used since it
cannot be safely cast to variable data type
  var = variable[keys]
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/coords.py:1406: UserWarning: Collapsing a multi-dimensional coordinate. Metadata may not be fully descriptive for 'latitude'.
  warnings.warn(msg.format(self.name()))
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.8/site-packages/iris/coords.py:1406: UserWarning: Collapsing a multi-dimensional coordinate. Metadata may not be fully descriptive for 'longitude'.
  warnings.warn(msg.format(self.name()))
Not calculating land fractions as all required cubes are not available
Traceback (most recent call last):
NotImplementedError: Cannot yet get SCM timeseries for data with dimensions other than time, latitude and longitude
Wranglers

In this notebook we give a brief overview of wrangling with netCDF-SCM.

[1]:
# NBVAL_IGNORE_OUTPUT
import glob
from pathlib import Path

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymagicc
[2]:
plt.style.use("bmh")
%matplotlib inline
Wrangling help

The wrangling help can be accessed via our command line interface.

[3]:
# NBVAL_IGNORE_OUTPUT
!netcdf-scm wrangle -h
Usage: netcdf-scm wrangle [OPTIONS] SRC DST WRANGLE_CONTACT

  Wrangle netCDF-SCM ``.nc`` files into other formats and directory
  structures.

  ``src`` is searched recursively and netcdf-scm will attempt to wrangle all
  the files found.

  ``wrangle_contact`` is written into the header of the output files.

Options:
  --regexp TEXT                   Regular expression to apply to file
                                  directory (only wrangles matches). Be
                                  careful, if you use a very complex regexp
                                  directory sorting can be extremely slow (see
                                  e.g. discussion at
                                  https://stackoverflow.com/a/5428712)!
                                  [default: ^(?!.*(fx)).*$]

  --prefix TEXT                   Prefix to apply to output file names (not
                                  paths).

  --out-format [mag-files|mag-files-average-year-start-year|mag-files-average-year-mid-year|mag-files-average-year-end-year|mag-files-point-start-year|mag-files-point-mid-year|mag-files-point-end-year|magicc-input-files|magicc-input-files-average-year-start-year|magicc-input-files-average-year-mid-year|magicc-input-files-average-year-end-year|magicc-input-files-point-start-year|magicc-input-files-point-mid-year|magicc-input-files-point-end-year|tuningstrucs-blend-model]
                                  Format to re-write crunched data into. The
                                  time operation conventions follow those in
                                  `Pymagicc <https://pymagicc.readthedocs.io/e
                                  n/latest/file_conventions.html#namelists>`_.
                                  [default: mag-files]

  --drs [None|MarbleCMIP5|CMIP6Input4MIPs|CMIP6Output]
                                  Data reference syntax to use to decipher
                                  paths. This is required to ensure the output
                                  folders match the input data reference
                                  syntax.  [default: None]

  -f, --force / --do-not-force    Overwrite any existing files.  [default:
                                  False]

  --number-workers INTEGER        Number of workers (threads) to use when
                                  wrangling.  [default: 4]

  --target-units-specs PATH       csv containing target units for wrangled
                                  variables.

  -h, --help                      Show this message and exit.
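
As an aside, the default --regexp value ^(?!.*(fx)).*$ uses a negative lookahead to skip any directory whose path contains fx, i.e. the fixed-field data such as areacella. A quick check of that behaviour on example path fragments (the paths below are illustrative, not real files):

import re

# the default --regexp matches everything except paths containing "fx"
fx_excluding_pattern = re.compile(r"^(?!.*(fx)).*$")

assert fx_excluding_pattern.match(
    "CMIP6/CMIP/NCAR/CESM2/historical/r7i1p1f1/Omon/hfds/gr"
)
assert not fx_excluding_pattern.match(
    "CMIP6/CMIP/NCAR/CESM2/historical/r7i1p1f1/fx/areacella/gr"
)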
MAG file wrangling

The most common format to wrangle to is the .MAG format. This is a custom MAGICC format (see https://pymagicc.readthedocs.io/en/latest/file_conventions.html#the-future). Data which has already been crunched can be wrangled into this format as shown below.

[4]:
# NBVAL_IGNORE_OUTPUT
!netcdf-scm wrangle \
    "../../../tests/test-data/expected-crunching-output/cmip6output/Lmon/CMIP6/CMIP/NCAR" \
    "../../../output-examples/wrangled-files" "notebook example <email address>" \
    --force \
    --drs "CMIP6Output" \
    --out-format "mag-files" \
    --regexp ".*cSoilFast.*"
87973 2021-03-18 13:05:52,227 INFO:netcdf_scm:netcdf-scm: 2.0.2+15.g74db9d85.dirty
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:wrangle_contact: notebook example <email address>
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:source: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/tests/test-data/expected-crunching-output/cmip6output/Lmon/CMIP6/CMIP/NCAR
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:destination: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/output-examples/wrangled-files
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:regexp: .*cSoilFast.*
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:prefix: None
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:drs: CMIP6Output
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:out_format: mag-files
87973 2021-03-18 13:05:52,228 INFO:netcdf_scm:force: True
87973 2021-03-18 13:05:52,230 INFO:netcdf_scm:Finding directories with files
Walking through directories and applying `check_func`: 11it [00:00, 9394.69it/s]
87973 2021-03-18 13:05:52,238 INFO:netcdf_scm:Found 1 directories with files
87973 2021-03-18 13:05:52,239 INFO:netcdf_scm.cli_parallel:Processing in parallel with 4 workers
87973 2021-03-18 13:05:52,239 INFO:netcdf_scm.cli_parallel:Forcing dask to use a single thread when reading
100%|████████████████████████████████████████| 1.00/1.00 [00:04<00:00, 4.32s/it]

We can then load the .MAG files using Pymagicc.

[5]:
written_files = [
    f for f in Path("../../../output-examples/wrangled-files").rglob("*.MAG")
]
written_files
[5]:
[PosixPath('../../../output-examples/wrangled-files/CMIP6/CMIP/NCAR/CESM2/historical/r7i1p1f1/Lmon/cSoilFast/gn/v20190311/netcdf-scm_cSoilFast_Lmon_CESM2_historical_r7i1p1f1_gn_195701-195703.MAG')]
[6]:
wrangled = pymagicc.io.MAGICCData(str(written_files[0]))
[7]:
# NBVAL_IGNORE_OUTPUT
wrangled.timeseries()
[7]:
time 1957-01-15 12:00:00 1957-02-14 00:00:00 1957-03-15 12:00:00
climate_model model region scenario todo unit variable
unspecified unspecified World unspecified SET kg m^-2 cSoilFast 0.085600 0.085547 0.085422
World|Northern Hemisphere unspecified SET kg m^-2 cSoilFast 0.097727 0.097910 0.098135
World|Southern Hemisphere unspecified SET kg m^-2 cSoilFast 0.060421 0.059879 0.059024
World|Land unspecified SET kg m^-2 cSoilFast 0.085600 0.085547 0.085422
World|Northern Hemisphere|Land unspecified SET kg m^-2 cSoilFast 0.097727 0.097910 0.098135
World|Southern Hemisphere|Land unspecified SET kg m^-2 cSoilFast 0.060421 0.059879 0.059024
[8]:
# NBVAL_IGNORE_OUTPUT
wrangled.lineplot(hue="region")
[8]:
<AxesSubplot:xlabel='time', ylabel='kg m^-2'>
_images/usage_wranglers_11_1.png
Adjusting units

The units of the wrangled data are kg m^-2. This might not be super helpful. As such, netcdf-scm wrangle allows users to specify a csv which defines the target units to use for variables when wrangling.

The conversion csv should look like the below.

[9]:
conv_csv = pd.DataFrame(
    [["cSoilFast", "t / m**2"], ["tos", "K"]], columns=["variable", "unit"]
)
conv_csv_path = "../../../output-examples/conversion-new-units.csv"
conv_csv.to_csv(conv_csv_path, index=False)
with open(conv_csv_path) as f:
    conv_csv_content = f.read()

print(conv_csv_content)
variable,unit
cSoilFast,t / m**2
tos,K

With such a csv, we can now wrangle to our desired units.

[10]:
# NBVAL_IGNORE_OUTPUT
!netcdf-scm wrangle \
    "../../../tests/test-data/expected-crunching-output/cmip6output/Lmon/CMIP6/CMIP/NCAR" \
    "../../../output-examples/wrangled-files-new-units" \
    "notebook example <email address>" \
    --force --drs "CMIP6Output" \
    --out-format "mag-files" \
    --regexp ".*cSoilFast.*" \
    --target-units-specs "../../../output-examples/conversion-new-units.csv"
87988 2021-03-18 13:06:01,020 INFO:netcdf_scm:netcdf-scm: 2.0.2+15.g74db9d85.dirty
87988 2021-03-18 13:06:01,020 INFO:netcdf_scm:wrangle_contact: notebook example <email address>
87988 2021-03-18 13:06:01,020 INFO:netcdf_scm:source: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/tests/test-data/expected-crunching-output/cmip6output/Lmon/CMIP6/CMIP/NCAR
87988 2021-03-18 13:06:01,020 INFO:netcdf_scm:destination: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/output-examples/wrangled-files-new-units
87988 2021-03-18 13:06:01,020 INFO:netcdf_scm:regexp: .*cSoilFast.*
87988 2021-03-18 13:06:01,021 INFO:netcdf_scm:prefix: None
87988 2021-03-18 13:06:01,021 INFO:netcdf_scm:drs: CMIP6Output
87988 2021-03-18 13:06:01,021 INFO:netcdf_scm:out_format: mag-files
87988 2021-03-18 13:06:01,021 INFO:netcdf_scm:force: True
87988 2021-03-18 13:06:01,022 INFO:netcdf_scm:Finding directories with files
Walking through directories and applying `check_func`: 11it [00:00, 9150.60it/s]
87988 2021-03-18 13:06:01,030 INFO:netcdf_scm:Found 1 directories with files
87988 2021-03-18 13:06:01,031 INFO:netcdf_scm.cli_parallel:Processing in parallel with 4 workers
87988 2021-03-18 13:06:01,031 INFO:netcdf_scm.cli_parallel:Forcing dask to use a single thread when reading
100%|████████████████████████████████████████| 1.00/1.00 [00:04<00:00, 4.28s/it]
[11]:
# NBVAL_IGNORE_OUTPUT
written_files = [
    f
    for f in Path("../../../output-examples/wrangled-files-new-units").rglob(
        "*.MAG"
    )
]
wrangled_new_units = pymagicc.io.MAGICCData(str(written_files[0]))
wrangled_new_units.timeseries()
[11]:
time 1957-01-15 12:00:00 1957-02-14 00:00:00 1957-03-15 12:00:00
climate_model model region scenario todo unit variable
unspecified unspecified World unspecified SET t / m^2 cSoilFast 0.000086 0.000086 0.000085
World|Northern Hemisphere unspecified SET t / m^2 cSoilFast 0.000098 0.000098 0.000098
World|Southern Hemisphere unspecified SET t / m^2 cSoilFast 0.000060 0.000060 0.000059
World|Land unspecified SET t / m^2 cSoilFast 0.000086 0.000086 0.000085
World|Northern Hemisphere|Land unspecified SET t / m^2 cSoilFast 0.000098 0.000098 0.000098
World|Southern Hemisphere|Land unspecified SET t / m^2 cSoilFast 0.000060 0.000060 0.000059
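
As a quick check of the conversion itself: kg m^-2 to t m^-2 is just a factor of 1000 (1 t = 1000 kg), so the first World value becomes

# 0.085600 kg m^-2 expressed in t m^-2
print(0.085600 / 1000)  # 8.56e-05, displayed above as 0.000086 after rounding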
[12]:
# NBVAL_IGNORE_OUTPUT
wrangled_new_units.lineplot(hue="region")
[12]:
<AxesSubplot:xlabel='time', ylabel='t / m^2'>
_images/usage_wranglers_18_1.png
Taking an area sum

We can also set the units to include an area sum. For example, if we set our units to Gt rather than Gt / m**2 (or, for a flux, Gt / yr rather than Gt / m**2 / yr), then the wrangler will automatically take an area sum of the data (weighted by the effective area used in the crunching) before returning the data.
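
To get a feel for the size of the numbers involved, here is a back-of-the-envelope version of that conversion. The effective area below is an assumption for illustration only; the wrangler uses the exact weights from the crunching.

# rough illustration only: an area mean in t / m^2 multiplied by an
# assumed effective area gives a total mass in Gt
mean_t_per_m2 = 8.56e-5  # first "World" value from the t / m^2 table above
effective_area_m2 = 1.49e14  # assumed effective (land) area, for illustration
print(mean_t_per_m2 * effective_area_m2 / 1e9)  # ~12.8, cf. the Gt table below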

[13]:
conv_csv = pd.DataFrame(
    [["cSoilFast", "Gt"], ["tos", "K"]], columns=["variable", "unit"]
)
conv_csv_path = "../../../output-examples/conversion-area-sum-units.csv"
conv_csv.to_csv(conv_csv_path, index=False)
with open(conv_csv_path) as f:
    conv_csv_content = f.read()

print(conv_csv_content)
variable,unit
cSoilFast,Gt
tos,K

[14]:
# NBVAL_IGNORE_OUTPUT
!netcdf-scm wrangle \
    "../../../tests/test-data/expected-crunching-output/cmip6output/Lmon/CMIP6/CMIP/NCAR" \
    "../../../output-examples/wrangled-files-area-sum-units" \
    "notebook example <email address>" \
    --force \
    --drs "CMIP6Output" \
    --out-format "mag-files" \
    --regexp ".*cSoilFast.*" \
    --target-units-specs "../../../output-examples/conversion-area-sum-units.csv"
88003 2021-03-18 13:06:10,165 INFO:netcdf_scm:netcdf-scm: 2.0.2+15.g74db9d85.dirty
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:wrangle_contact: notebook example <email address>
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:source: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/tests/test-data/expected-crunching-output/cmip6output/Lmon/CMIP6/CMIP/NCAR
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:destination: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/output-examples/wrangled-files-area-sum-units
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:regexp: .*cSoilFast.*
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:prefix: None
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:drs: CMIP6Output
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:out_format: mag-files
88003 2021-03-18 13:06:10,166 INFO:netcdf_scm:force: True
88003 2021-03-18 13:06:10,168 INFO:netcdf_scm:Finding directories with files
Walking through directories and applying `check_func`: 11it [00:00, 9617.96it/s]
88003 2021-03-18 13:06:10,176 INFO:netcdf_scm:Found 1 directories with files
88003 2021-03-18 13:06:10,177 INFO:netcdf_scm.cli_parallel:Processing in parallel with 4 workers
88003 2021-03-18 13:06:10,177 INFO:netcdf_scm.cli_parallel:Forcing dask to use a single thread when reading
100%|████████████████████████████████████████| 1.00/1.00 [00:04<00:00, 4.44s/it]
[15]:
# NBVAL_IGNORE_OUTPUT
written_files = [
    f
    for f in Path(
        "../../../output-examples/wrangled-files-area-sum-units"
    ).rglob("*.MAG")
]
wrangled_area_sum_units = pymagicc.io.MAGICCData(str(written_files[0]))
wrangled_area_sum_units.timeseries()
[15]:
time 1957-01-15 12:00:00 1957-02-14 00:00:00 1957-03-15 12:00:00
climate_model model region scenario todo unit variable
unspecified unspecified World unspecified SET Gt cSoilFast 12.79290 12.7849 12.76610
World|Land unspecified SET Gt cSoilFast 12.79290 12.7849 12.76610
World|Northern Hemisphere unspecified SET Gt cSoilFast 9.85760 9.8760 9.89873
World|Northern Hemisphere|Land unspecified SET Gt cSoilFast 9.85760 9.8760 9.89873
World|Southern Hemisphere unspecified SET Gt cSoilFast 2.93526 2.9089 2.86740
World|Southern Hemisphere|Land unspecified SET Gt cSoilFast 2.93526 2.9089 2.86740
[16]:
# NBVAL_IGNORE_OUTPUT
solid_regions = [
    "World",
    "World|Northern Hemisphere",
    "World|Southern Hemisphere",
]
ax = wrangled_area_sum_units.filter(region=solid_regions).lineplot(
    hue="region", linestyle="-"
)
wrangled_area_sum_units.filter(region=solid_regions, keep=False).lineplot(
    hue="region", linestyle="--", dashes=(5, 7.5), ax=ax
)
[16]:
<AxesSubplot:xlabel='time', ylabel='Gt'>
_images/usage_wranglers_23_1.png

As one last sanity check, we can make sure that the world total equals the sum of the hemispheric totals to within rounding error.

[17]:
np.testing.assert_allclose(
    wrangled_area_sum_units.filter(region="World")
    .timeseries()
    .values.squeeze(),
    wrangled_area_sum_units.filter(
        region=["World|Northern Hemisphere", "World|Southern Hemisphere"]
    )
    .timeseries()
    .sum()
    .values.squeeze(),
    rtol=1e-5,
)
Time operations

The wrangling can also include a few basic time operations, e.g. annual means or interpolation onto different time grids. The different out-format codes follow the conventions in Pymagicc (see https://pymagicc.readthedocs.io/en/latest/file_conventions.html#namelists). Here we show one example where we take the annual mean as part of the wrangling process; the mid-year part of the out-format code means each annual mean is stamped at the middle of the year, as the July timestamps in the output below show.

[18]:
# NBVAL_IGNORE_OUTPUT
!netcdf-scm wrangle \
    "../../../tests/test-data/expected-crunching-output/cmip6output/Amon/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/piControl" \
    "../../../output-examples/wrangled-files-average-year" \
    "notebook example <email address>" \
    --force \
    --drs "CMIP6Output" \
    --out-format "mag-files-average-year-mid-year"
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:netcdf-scm: 2.0.2+15.g74db9d85.dirty
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:wrangle_contact: notebook example <email address>
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:source: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/tests/test-data/expected-crunching-output/cmip6output/Amon/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/piControl
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:destination: /Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/output-examples/wrangled-files-average-year
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:regexp: ^(?!.*(fx)).*$
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:prefix: None
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:drs: CMIP6Output
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:out_format: mag-files-average-year-mid-year
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:force: True
88024 2021-03-18 13:06:19,120 INFO:netcdf_scm:Finding directories with files
Walking through directories and applying `check_func`: 6it [00:00, 9062.23it/s]
88024 2021-03-18 13:06:19,127 INFO:netcdf_scm:Found 1 directories with files
88024 2021-03-18 13:06:19,128 INFO:netcdf_scm.cli_parallel:Processing in parallel with 4 workers
88024 2021-03-18 13:06:19,128 INFO:netcdf_scm.cli_parallel:Forcing dask to use a single thread when reading
  0%|                                               | 0.00/1.00 [00:00<?, ?it/s]/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/xarray/coding/times.py:463: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/xarray/coding/times.py:463: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/numpy/core/_asarray.py:102: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  return array(a, dtype, copy=False, order=order)
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/numpy/core/_asarray.py:102: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  return array(a, dtype, copy=False, order=order)
100%|████████████████████████████████████████| 1.00/1.00 [00:04<00:00, 4.82s/it]
[19]:
# NBVAL_IGNORE_OUTPUT
written_files = list(
    Path("../../../output-examples/wrangled-files-average-year").rglob("*.MAG")
)
wrangled_annual_mean = pymagicc.io.MAGICCData(str(written_files[0]))
wrangled_annual_mean.timeseries()
[19]:
time 2840-07-01 00:00:00 2841-07-01 00:00:00 2842-07-01 00:00:00 2843-07-01 00:00:00 2844-07-01 00:00:00 2845-07-01 00:00:00 2846-07-01 00:00:00 2847-07-01 00:00:00 2848-07-01 00:00:00 2849-07-01 00:00:00 2850-07-01 00:00:00 2851-07-01 00:00:00 2852-07-01 00:00:00 2853-07-01 00:00:00 2854-07-01 00:00:00 2855-07-01 00:00:00 2856-07-01 00:00:00 2857-07-01 00:00:00 2858-07-01 00:00:00 2859-07-01 00:00:00
climate_model model region scenario todo unit variable
unspecified unspecified World unspecified SET K tas 285.883 285.841 285.847 285.860 286.026 285.880 285.612 285.717 285.700 285.913 285.862 285.964 286.154 285.959 286.096 286.110 286.074 285.771 285.768 285.969
World|Northern Hemisphere unspecified SET K tas 286.566 286.482 286.411 286.538 286.717 286.539 286.240 286.387 286.297 286.521 286.558 286.596 286.700 286.628 286.699 286.866 286.802 286.469 286.387 286.682
World|Southern Hemisphere unspecified SET K tas 285.184 285.185 285.270 285.167 285.320 285.206 284.970 285.033 285.090 285.292 285.150 285.318 285.596 285.275 285.479 285.338 285.331 285.057 285.135 285.240
World|Land unspecified SET K tas 279.502 279.494 279.590 279.505 279.732 279.534 279.048 279.332 279.280 279.630 279.474 279.452 279.771 279.458 279.714 279.814 279.724 279.463 279.369 279.686
World|Ocean unspecified SET K tas 288.453 288.397 288.366 288.419 288.561 288.435 288.255 288.289 288.286 288.444 288.434 288.587 288.725 288.577 288.666 288.646 288.632 288.311 288.344 288.499
World|Northern Hemisphere|Land unspecified SET K tas 281.049 281.014 281.052 281.100 281.308 281.099 280.672 280.947 280.782 281.140 281.171 280.932 281.135 281.040 281.087 281.490 281.301 281.107 280.946 281.283
World|Southern Hemisphere|Land unspecified SET K tas 276.230 276.279 276.497 276.131 276.400 276.223 275.613 275.917 276.103 276.438 275.885 276.322 276.885 276.113 276.811 276.270 276.389 275.986 276.034 276.309
World|Northern Hemisphere|Ocean unspecified SET K tas 290.029 289.914 289.774 289.950 290.112 289.952 289.734 289.801 289.759 289.899 289.939 290.151 290.192 290.134 290.221 290.239 290.254 289.833 289.801 290.070
World|Southern Hemisphere|Ocean unspecified SET K tas 287.236 287.225 287.279 287.237 287.363 287.264 287.113 287.121 287.149 287.320 287.273 287.379 287.592 287.374 287.465 287.416 287.379 287.135 287.220 287.286
World|North Atlantic Ocean unspecified SET K tas 291.030 291.114 290.932 290.894 291.015 290.987 290.787 290.591 290.833 290.852 290.752 290.677 291.057 291.026 291.205 291.117 291.371 291.050 290.847 291.153
World|El Nino N3.4 unspecified SET K tas 297.656 296.947 296.951 297.666 297.818 297.051 296.019 296.918 296.887 297.067 297.055 298.488 297.646 297.392 298.025 297.780 296.800 295.766 297.053 298.179
[20]:
# NBVAL_IGNORE_OUTPUT
wrangled_annual_mean.lineplot(hue="region")
[20]:
<AxesSubplot:xlabel='time', ylabel='K'>
_images/usage_wranglers_29_1.png
[21]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 9))
ax = fig.add_subplot(221)
wrangled_annual_mean.filter(region=["World", "World|*Hemisphere"]).lineplot(
    hue="region", ax=ax
)

ax = fig.add_subplot(222, sharey=ax, sharex=ax)
wrangled_annual_mean.filter(
    region=["World", "World|Land", "World|Ocean"]
).lineplot(hue="region", ax=ax)

ax = fig.add_subplot(223, sharey=ax, sharex=ax)
wrangled_annual_mean.filter(region=["World", "World|*Hemis*|*"]).lineplot(
    hue="region", ax=ax
)

ax = fig.add_subplot(224, sharey=ax, sharex=ax)
wrangled_annual_mean.filter(
    region=["World", "World|*El*", "World|*Ocean*"]
).lineplot(hue="region", ax=ax)
[21]:
<AxesSubplot:xlabel='time', ylabel='K'>
_images/usage_wranglers_30_1.png
Weights

In this notebook we demonstrate all of netCDF-SCM's known weightings. These weights are used when taking area averages for the different SCM boxes, e.g. the ocean/land boxes or the El Nino box (a minimal sketch of such a weighted average is given after the outline below).

Note: here we use the “last resort” land surface fraction values. If land surface fraction data is available for the model, that is used for the land/ocean weighting instead of the “last resort” values.

This notebook is set out as follows:

  1. we show the default weights

  2. we show the different available options for combining area and surface fraction information

  3. we show all our inbuilt weights

  4. we show how the user can define their own custom weights.
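
Before diving in, here is a minimal sketch (plain numpy, not netCDF-SCM's internal implementation) of how a grid of weights turns gridded data into a single regional mean:

    import numpy as np

    # data and weights defined on the same (lat, lon) grid
    data = np.array([[1.0, 2.0], [3.0, 4.0]])
    weights = np.array([[0.0, 1.0], [1.0, 2.0]])

    # zero-weight cells are excluded; the rest contribute in proportion
    # to their weight
    weighted_mean = (data * weights).sum() / weights.sum()
    print(weighted_mean)  # 3.25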

Imports
[1]:
# NBVAL_IGNORE_OUTPUT
from os.path import join

import iris
import iris.quickplot as qplt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import regionmask

from netcdf_scm.iris_cube_wrappers import CMIP6OutputCube
from netcdf_scm.weights import (
    AreaSurfaceFractionWeightCalculator,
    AreaWeightCalculator,
    get_weights_for_area,
    WEIGHTS_FUNCTIONS_WITHOUT_AREA_WEIGHTING,
)
[2]:
plt.style.use("bmh")
%matplotlib inline
Data path

Here we use our test data.

[3]:
DATA_PATH_TEST = join("..", "..", "..", "tests", "test-data")
DATA_PATH_TEST_CMIP6_ROOT = join(DATA_PATH_TEST, "cmip6output")
Load the cube
[4]:
example = CMIP6OutputCube()
example.load_data_in_directory(
    join(
        DATA_PATH_TEST_CMIP6_ROOT,
        #         "CMIP6/ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/Amon/example/gn/v20190314",
        "CMIP6/CMIP/NCAR/CESM2/historical/r10i1p1f1/Amon/tas/gn/v20190313",
    )
)

Interpolate the cube to get higher resolution data.

[5]:
sample_points = [
    ("longitude", np.arange(0, 360, 2)),
    ("latitude", np.arange(-90, 90 + 1, 2)),
]
example.cube = example.cube.interpolate(sample_points, iris.analysis.Linear())
Weights
Default weights

By default, only land/ocean and hemispheric weights are considered.

[6]:
# NBVAL_IGNORE_OUTPUT
default_weights = example.get_scm_timeseries_weights()
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/analysis/cartography.py:394: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
  warnings.warn("Using DEFAULT_SPHERICAL_EARTH_RADIUS.")
[7]:
# NBVAL_IGNORE_OUTPUT
def plot_weights(weights_to_plot, constraint=None, axes=None, **kwargs):
    # plot each region's weights as its own panel (or onto the given axes)
    for i, (label, weights) in enumerate(weights_to_plot.items()):
        if axes is None:
            ax = plt.figure().add_subplot(111)
        else:
            ax = axes[i]

        # use a time-collapsed copy of the data cube as a template so the
        # weights sit on the cube's latitude-longitude grid
        weight_cube = example.cube.collapsed("time", iris.analysis.MEAN)
        weight_cube.data = weights
        weight_cube.units = ""
        if constraint is not None:
            weight_cube = weight_cube.extract(constraint)

        plt.sca(ax)

        qplt.pcolormesh(
            weight_cube, **kwargs,
        )

        plt.gca().set_title(label)
        plt.gca().coastlines()


plot_weights(default_weights)
_images/usage_weights_12_0.png … _images/usage_weights_12_8.png (nine panels, one per default region weight)
Area and surface fraction combination options

By default, the weights are calculated as the combination of area and surface fractions using netcdf_scm.weights.AreaSurfaceFractionWeightCalculator.

[8]:
# NBVAL_IGNORE_OUTPUT
print(AreaSurfaceFractionWeightCalculator.__doc__)

    Calculates weights which are both area and surface fraction weighted

    .. math::

        w(lat, lon) = a(lat, lon) \\times s(lat, lon)

    where :math:`w(lat, lon)` is the weight of the cell at given latitude and
    longitude, :math:`a` is area of the cell and :math:`s` is the surface
    fraction of the cell (e.g. fraction of ocean area for ocean based regions).

For land/ocean weights, this causes cells along coastlines to have weights less than their area weight, because they are not fully land or ocean.

The user can instead use netcdf_scm.weights.AreaWeightCalculator, which focusses on area weights but removes any areas that have a surface fraction of zero.

[9]:
# NBVAL_IGNORE_OUTPUT
print(AreaWeightCalculator.__doc__)

    Calculates weights which are area weighted but surface fraction aware.

    This means that any cells which have a surface fraction of zero will
    receive zero weight, otherwise cells are purely area weighted.

    .. math::

        w(lat, lon) = \\begin{cases}
            a(lat, lon), & s(lat, lon) > 0 \\\\
            0, & s(lat, lon) = 0
        \\end{cases}

    where :math:`w(lat, lon)` is the weight of the cell at given latitude and
    longitude, :math:`a` is area of the cell and :math:`s` is the surface
    fraction of the cell (e.g. fraction of ocean area for ocean based regions).
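
To make the difference concrete, here is a toy comparison (plain numpy, not the library's internals) of the two combination rules for three cells: one fully land, one coastal, one open ocean.

    import numpy as np

    area = np.array([100.0, 100.0, 100.0])  # cell areas (arbitrary units)
    land_frac = np.array([1.0, 0.4, 0.0])  # land surface fraction per cell

    # AreaSurfaceFractionWeightCalculator-style rule: w = a * s
    print(area * land_frac)  # [100.  40.   0.]

    # AreaWeightCalculator-style rule: w = a where s > 0, else 0
    print(np.where(land_frac > 0, area, 0.0))  # [100. 100.   0.]

The coastal cell gets a reduced weight under the first rule but its full area weight under the second.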

[10]:
# NBVAL_IGNORE_OUTPUT
area_weights = example.get_scm_timeseries_weights(cell_weights="area-only")
/Users/znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/analysis/cartography.py:394: UserWarning: Using DEFAULT_SPHERICAL_EARTH_RADIUS.
  warnings.warn("Using DEFAULT_SPHERICAL_EARTH_RADIUS.")
[11]:
# NBVAL_IGNORE_OUTPUT
fig, axes = plt.subplots(figsize=(16, 9), nrows=2, ncols=2)

for i, (w, title) in enumerate(
    ((default_weights, "Default"), (area_weights, "No land fraction"))
):
    plt_weights = {k: w[k] for k in ["World|Ocean", "World|Land"]}
    zoom_constraint = iris.Constraint(
        latitude=lambda cell: -45 < cell < -25
    ) & iris.Constraint(longitude=lambda cell: 120 < cell < 160)
    plot_weights(
        plt_weights, constraint=zoom_constraint, axes=[axes[0][i], axes[1][i]],
    )

cf = plt.gcf()
for i, (w, title) in enumerate(
    ((default_weights, "Default"), (area_weights, "Area only"))
):
    title_ax = cf.axes[i * 4]
    title_ax.set_title("{}\n{}".format(title, title_ax.get_title()))
_images/usage_weights_19_0.png
All inbuilt weights

The default weights do not contain all of the inbuilt weights. We also provide weights for the IPCC AR6 regions, as defined in Iturbide et al. (2020), as well as country-level weights (at Natural Earth's 1:50 million, "50m", scale) defined by Natural Earth. For both of these we use the regionmask implementation.

The regionmask names can be inspected as shown below. Note that the abbreviations for the countries are not unique.

[12]:
regionmask_countries = (
    pd.DataFrame(
        {
            "name": regionmask.defined_regions.natural_earth.countries_50.names,
            "abbreviation": regionmask.defined_regions.natural_earth.countries_50.abbrevs,
        }
    )
    .sort_values(by="name")
    .reset_index(drop=True)
)
regionmask_countries
[12]:
name abbreviation
0 Afghanistan AF
1 Albania AL
2 Algeria DZ
3 American Samoa AS
4 Andorra AND
... ... ...
236 Yemen YE
237 Zambia ZM
238 Zimbabwe ZW
239 eSwatini SW
240 Åland AI

241 rows × 2 columns
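
The AR6 region names can be inspected in the same way. A minimal sketch (this assumes a regionmask version, 0.6 or newer, which ships the Iturbide et al. (2020) regions as regionmask.defined_regions.ar6):

    # inspect the AR6 region names and abbreviations shipped with regionmask
    ar6_regions = pd.DataFrame(
        {
            "name": regionmask.defined_regions.ar6.all.names,
            "abbreviation": regionmask.defined_regions.ar6.all.abbrevs,
        }
    )
    ar6_regions.head()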

Below we show a selection of plots for the regions we include.

[13]:
selection_inbuilt_weights = example.get_scm_timeseries_weights(
    regions=[
        "World",
        "World|Northern Hemisphere",
        "World|Southern Hemisphere",
        "World|Land",
        "World|Ocean",
        "World|Northern Hemisphere|Land",
        "World|Southern Hemisphere|Land",
        "World|Northern Hemisphere|Ocean",
        "World|Southern Hemisphere|Ocean",
        "World|North Atlantic Ocean",
        "World|El Nino N3.4",
        "World|AR6|GIC",
        "World|AR6|NWN",
        "World|AR6|NEN",
        "World|AR6|WNA",
        "World|AR6|SSA",
        "World|AR6|NEU",
    ]
    + [
        "World|Natural Earth 50m|{}".format(c)
        for c in [
            "Australia",
            "Austria",
            "China",
            "New Zealand",
            "United States of America",
            # this fails as the region is tiny and
            # our data is not high-resolution enough to capture it
            "Vatican",
            "Vietnam",
        ]
    ]
)
/Users/znicholls/Documents/AGCEC/netCDF-SCM/netcdf-scm/src/netcdf_scm/weights/__init__.py:869: UserWarning: Failed to create 'World|Natural Earth 50m|Vatican' weights: All weights are zero for region: `World|Natural Earth 50m|Vatican`
  warnings.warn(warn_str)
[14]:
# NBVAL_IGNORE_OUTPUT
plot_weights(selection_inbuilt_weights)
<ipython-input-7-e10806626a7c>:5: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  ax = plt.figure().add_subplot(111)
_images/usage_weights_25_1.png … _images/usage_weights_25_23.png (one panel per successfully created region weight)
[15]:
# full list of available regions
sorted(list(WEIGHTS_FUNCTIONS_WITHOUT_AREA_WEIGHTING.keys()))
[15]:
['World',
 'World|AR6|ARO',
 'World|AR6|ARP',
 'World|AR6|ARS',
 'World|AR6|BOB',
 'World|AR6|CAF',
 'World|AR6|CAR',
 'World|AR6|CAU',
 'World|AR6|CNA',
 'World|AR6|EAN',
 'World|AR6|EAO',
 'World|AR6|EAS',
 'World|AR6|EAU',
 'World|AR6|ECA',
 'World|AR6|EEU',
 'World|AR6|EIO',
 'World|AR6|ENA',
 'World|AR6|EPO',
 'World|AR6|ESAF',
 'World|AR6|ESB',
 'World|AR6|GIC',
 'World|AR6|MDG',
 'World|AR6|MED',
 'World|AR6|NAO',
 'World|AR6|NAU',
 'World|AR6|NCA',
 'World|AR6|NEAF',
 'World|AR6|NEN',
 'World|AR6|NES',
 'World|AR6|NEU',
 'World|AR6|NPO',
 'World|AR6|NSA',
 'World|AR6|NWN',
 'World|AR6|NWS',
 'World|AR6|NZ',
 'World|AR6|RAR',
 'World|AR6|RFE',
 'World|AR6|SAH',
 'World|AR6|SAM',
 'World|AR6|SAO',
 'World|AR6|SAS',
 'World|AR6|SAU',
 'World|AR6|SCA',
 'World|AR6|SEA',
 'World|AR6|SEAF',
 'World|AR6|SES',
 'World|AR6|SIO',
 'World|AR6|SOO',
 'World|AR6|SPO',
 'World|AR6|SSA',
 'World|AR6|SWS',
 'World|AR6|TIB',
 'World|AR6|WAF',
 'World|AR6|WAN',
 'World|AR6|WCA',
 'World|AR6|WCE',
 'World|AR6|WNA',
 'World|AR6|WSAF',
 'World|AR6|WSB',
 'World|El Nino N3.4',
 'World|Land',
 'World|Natural Earth 50m|Afghanistan',
 'World|Natural Earth 50m|Albania',
 'World|Natural Earth 50m|Algeria',
 'World|Natural Earth 50m|American Samoa',
 'World|Natural Earth 50m|Andorra',
 'World|Natural Earth 50m|Angola',
 'World|Natural Earth 50m|Anguilla',
 'World|Natural Earth 50m|Antarctica',
 'World|Natural Earth 50m|Antigua and Barb.',
 'World|Natural Earth 50m|Argentina',
 'World|Natural Earth 50m|Armenia',
 'World|Natural Earth 50m|Aruba',
 'World|Natural Earth 50m|Ashmore and Cartier Is.',
 'World|Natural Earth 50m|Australia',
 'World|Natural Earth 50m|Austria',
 'World|Natural Earth 50m|Azerbaijan',
 'World|Natural Earth 50m|Bahamas',
 'World|Natural Earth 50m|Bahrain',
 'World|Natural Earth 50m|Bangladesh',
 'World|Natural Earth 50m|Barbados',
 'World|Natural Earth 50m|Belarus',
 'World|Natural Earth 50m|Belgium',
 'World|Natural Earth 50m|Belize',
 'World|Natural Earth 50m|Benin',
 'World|Natural Earth 50m|Bermuda',
 'World|Natural Earth 50m|Bhutan',
 'World|Natural Earth 50m|Bolivia',
 'World|Natural Earth 50m|Bosnia and Herz.',
 'World|Natural Earth 50m|Botswana',
 'World|Natural Earth 50m|Br. Indian Ocean Ter.',
 'World|Natural Earth 50m|Brazil',
 'World|Natural Earth 50m|British Virgin Is.',
 'World|Natural Earth 50m|Brunei',
 'World|Natural Earth 50m|Bulgaria',
 'World|Natural Earth 50m|Burkina Faso',
 'World|Natural Earth 50m|Burundi',
 'World|Natural Earth 50m|Cabo Verde',
 'World|Natural Earth 50m|Cambodia',
 'World|Natural Earth 50m|Cameroon',
 'World|Natural Earth 50m|Canada',
 'World|Natural Earth 50m|Cayman Is.',
 'World|Natural Earth 50m|Central African Rep.',
 'World|Natural Earth 50m|Chad',
 'World|Natural Earth 50m|Chile',
 'World|Natural Earth 50m|China',
 'World|Natural Earth 50m|Colombia',
 'World|Natural Earth 50m|Comoros',
 'World|Natural Earth 50m|Congo',
 'World|Natural Earth 50m|Cook Is.',
 'World|Natural Earth 50m|Costa Rica',
 'World|Natural Earth 50m|Croatia',
 'World|Natural Earth 50m|Cuba',
 'World|Natural Earth 50m|Curaçao',
 'World|Natural Earth 50m|Cyprus',
 'World|Natural Earth 50m|Czechia',
 "World|Natural Earth 50m|Côte d'Ivoire",
 'World|Natural Earth 50m|Dem. Rep. Congo',
 'World|Natural Earth 50m|Denmark',
 'World|Natural Earth 50m|Djibouti',
 'World|Natural Earth 50m|Dominica',
 'World|Natural Earth 50m|Dominican Rep.',
 'World|Natural Earth 50m|Ecuador',
 'World|Natural Earth 50m|Egypt',
 'World|Natural Earth 50m|El Salvador',
 'World|Natural Earth 50m|Eq. Guinea',
 'World|Natural Earth 50m|Eritrea',
 'World|Natural Earth 50m|Estonia',
 'World|Natural Earth 50m|Ethiopia',
 'World|Natural Earth 50m|Faeroe Is.',
 'World|Natural Earth 50m|Falkland Is.',
 'World|Natural Earth 50m|Fiji',
 'World|Natural Earth 50m|Finland',
 'World|Natural Earth 50m|Fr. Polynesia',
 'World|Natural Earth 50m|Fr. S. Antarctic Lands',
 'World|Natural Earth 50m|France',
 'World|Natural Earth 50m|Gabon',
 'World|Natural Earth 50m|Gambia',
 'World|Natural Earth 50m|Georgia',
 'World|Natural Earth 50m|Germany',
 'World|Natural Earth 50m|Ghana',
 'World|Natural Earth 50m|Greece',
 'World|Natural Earth 50m|Greenland',
 'World|Natural Earth 50m|Grenada',
 'World|Natural Earth 50m|Guam',
 'World|Natural Earth 50m|Guatemala',
 'World|Natural Earth 50m|Guernsey',
 'World|Natural Earth 50m|Guinea',
 'World|Natural Earth 50m|Guinea-Bissau',
 'World|Natural Earth 50m|Guyana',
 'World|Natural Earth 50m|Haiti',
 'World|Natural Earth 50m|Heard I. and McDonald Is.',
 'World|Natural Earth 50m|Honduras',
 'World|Natural Earth 50m|Hong Kong',
 'World|Natural Earth 50m|Hungary',
 'World|Natural Earth 50m|Iceland',
 'World|Natural Earth 50m|India',
 'World|Natural Earth 50m|Indian Ocean Ter.',
 'World|Natural Earth 50m|Indonesia',
 'World|Natural Earth 50m|Iran',
 'World|Natural Earth 50m|Iraq',
 'World|Natural Earth 50m|Ireland',
 'World|Natural Earth 50m|Isle of Man',
 'World|Natural Earth 50m|Israel',
 'World|Natural Earth 50m|Italy',
 'World|Natural Earth 50m|Jamaica',
 'World|Natural Earth 50m|Japan',
 'World|Natural Earth 50m|Jersey',
 'World|Natural Earth 50m|Jordan',
 'World|Natural Earth 50m|Kazakhstan',
 'World|Natural Earth 50m|Kenya',
 'World|Natural Earth 50m|Kiribati',
 'World|Natural Earth 50m|Kosovo',
 'World|Natural Earth 50m|Kuwait',
 'World|Natural Earth 50m|Kyrgyzstan',
 'World|Natural Earth 50m|Laos',
 'World|Natural Earth 50m|Latvia',
 'World|Natural Earth 50m|Lebanon',
 'World|Natural Earth 50m|Lesotho',
 'World|Natural Earth 50m|Liberia',
 'World|Natural Earth 50m|Libya',
 'World|Natural Earth 50m|Liechtenstein',
 'World|Natural Earth 50m|Lithuania',
 'World|Natural Earth 50m|Luxembourg',
 'World|Natural Earth 50m|Macao',
 'World|Natural Earth 50m|Macedonia',
 'World|Natural Earth 50m|Madagascar',
 'World|Natural Earth 50m|Malawi',
 'World|Natural Earth 50m|Malaysia',
 'World|Natural Earth 50m|Maldives',
 'World|Natural Earth 50m|Mali',
 'World|Natural Earth 50m|Malta',
 'World|Natural Earth 50m|Marshall Is.',
 'World|Natural Earth 50m|Mauritania',
 'World|Natural Earth 50m|Mauritius',
 'World|Natural Earth 50m|Mexico',
 'World|Natural Earth 50m|Micronesia',
 'World|Natural Earth 50m|Moldova',
 'World|Natural Earth 50m|Monaco',
 'World|Natural Earth 50m|Mongolia',
 'World|Natural Earth 50m|Montenegro',
 'World|Natural Earth 50m|Montserrat',
 'World|Natural Earth 50m|Morocco',
 'World|Natural Earth 50m|Mozambique',
 'World|Natural Earth 50m|Myanmar',
 'World|Natural Earth 50m|N. Cyprus',
 'World|Natural Earth 50m|N. Mariana Is.',
 'World|Natural Earth 50m|Namibia',
 'World|Natural Earth 50m|Nauru',
 'World|Natural Earth 50m|Nepal',
 'World|Natural Earth 50m|Netherlands',
 'World|Natural Earth 50m|New Caledonia',
 'World|Natural Earth 50m|New Zealand',
 'World|Natural Earth 50m|Nicaragua',
 'World|Natural Earth 50m|Niger',
 'World|Natural Earth 50m|Nigeria',
 'World|Natural Earth 50m|Niue',
 'World|Natural Earth 50m|Norfolk Island',
 'World|Natural Earth 50m|North Korea',
 'World|Natural Earth 50m|Norway',
 'World|Natural Earth 50m|Oman',
 'World|Natural Earth 50m|Pakistan',
 'World|Natural Earth 50m|Palau',
 'World|Natural Earth 50m|Palestine',
 'World|Natural Earth 50m|Panama',
 'World|Natural Earth 50m|Papua New Guinea',
 'World|Natural Earth 50m|Paraguay',
 'World|Natural Earth 50m|Peru',
 'World|Natural Earth 50m|Philippines',
 'World|Natural Earth 50m|Pitcairn Is.',
 'World|Natural Earth 50m|Poland',
 'World|Natural Earth 50m|Portugal',
 'World|Natural Earth 50m|Puerto Rico',
 'World|Natural Earth 50m|Qatar',
 'World|Natural Earth 50m|Romania',
 'World|Natural Earth 50m|Russia',
 'World|Natural Earth 50m|Rwanda',
 'World|Natural Earth 50m|S. Geo. and the Is.',
 'World|Natural Earth 50m|S. Sudan',
 'World|Natural Earth 50m|Saint Helena',
 'World|Natural Earth 50m|Saint Lucia',
 'World|Natural Earth 50m|Samoa',
 'World|Natural Earth 50m|San Marino',
 'World|Natural Earth 50m|Saudi Arabia',
 'World|Natural Earth 50m|Senegal',
 'World|Natural Earth 50m|Serbia',
 'World|Natural Earth 50m|Seychelles',
 'World|Natural Earth 50m|Siachen Glacier',
 'World|Natural Earth 50m|Sierra Leone',
 'World|Natural Earth 50m|Singapore',
 'World|Natural Earth 50m|Sint Maarten',
 'World|Natural Earth 50m|Slovakia',
 'World|Natural Earth 50m|Slovenia',
 'World|Natural Earth 50m|Solomon Is.',
 'World|Natural Earth 50m|Somalia',
 'World|Natural Earth 50m|Somaliland',
 'World|Natural Earth 50m|South Africa',
 'World|Natural Earth 50m|South Korea',
 'World|Natural Earth 50m|Spain',
 'World|Natural Earth 50m|Sri Lanka',
 'World|Natural Earth 50m|St-Barthélemy',
 'World|Natural Earth 50m|St-Martin',
 'World|Natural Earth 50m|St. Kitts and Nevis',
 'World|Natural Earth 50m|St. Pierre and Miquelon',
 'World|Natural Earth 50m|St. Vin. and Gren.',
 'World|Natural Earth 50m|Sudan',
 'World|Natural Earth 50m|Suriname',
 'World|Natural Earth 50m|Sweden',
 'World|Natural Earth 50m|Switzerland',
 'World|Natural Earth 50m|Syria',
 'World|Natural Earth 50m|São Tomé and Principe',
 'World|Natural Earth 50m|Taiwan',
 'World|Natural Earth 50m|Tajikistan',
 'World|Natural Earth 50m|Tanzania',
 'World|Natural Earth 50m|Thailand',
 'World|Natural Earth 50m|Timor-Leste',
 'World|Natural Earth 50m|Togo',
 'World|Natural Earth 50m|Tonga',
 'World|Natural Earth 50m|Trinidad and Tobago',
 'World|Natural Earth 50m|Tunisia',
 'World|Natural Earth 50m|Turkey',
 'World|Natural Earth 50m|Turkmenistan',
 'World|Natural Earth 50m|Turks and Caicos Is.',
 'World|Natural Earth 50m|U.S. Virgin Is.',
 'World|Natural Earth 50m|Uganda',
 'World|Natural Earth 50m|Ukraine',
 'World|Natural Earth 50m|United Arab Emirates',
 'World|Natural Earth 50m|United Kingdom',
 'World|Natural Earth 50m|United States of America',
 'World|Natural Earth 50m|Uruguay',
 'World|Natural Earth 50m|Uzbekistan',
 'World|Natural Earth 50m|Vanuatu',
 'World|Natural Earth 50m|Vatican',
 'World|Natural Earth 50m|Venezuela',
 'World|Natural Earth 50m|Vietnam',
 'World|Natural Earth 50m|W. Sahara',
 'World|Natural Earth 50m|Wallis and Futuna Is.',
 'World|Natural Earth 50m|Yemen',
 'World|Natural Earth 50m|Zambia',
 'World|Natural Earth 50m|Zimbabwe',
 'World|Natural Earth 50m|eSwatini',
 'World|Natural Earth 50m|Åland',
 'World|North Atlantic Ocean',
 'World|Northern Hemisphere',
 'World|Northern Hemisphere|Land',
 'World|Northern Hemisphere|Ocean',
 'World|Ocean',
 'World|Southern Hemisphere',
 'World|Southern Hemisphere|Land',
 'World|Southern Hemisphere|Ocean']
User-defined weights

As a user, you can also define your own weights. Simply add them to netcdf_scm.weights.WEIGHTS_FUNCTIONS_WITHOUT_AREA_WEIGHTING and then use them in your get_scm_timeseries_weights call.

[16]:
WEIGHTS_FUNCTIONS_WITHOUT_AREA_WEIGHTING["custom mask"] = get_weights_for_area(
    -60, 100, -10, 330
)
WEIGHTS_FUNCTIONS_WITHOUT_AREA_WEIGHTING[
    "Northern Atlantic area bounds"
] = get_weights_for_area(0, -80, 65, 0)
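
The argument order is worth noting. Judging purely from the two examples above (an inference, not a documented signature), get_weights_for_area appears to take bounds as (lower latitude, left longitude, upper latitude, right longitude):

    # the "Northern Atlantic area bounds" call again, with the assumed
    # argument order spelled out (all four labels are our inference)
    north_atlantic_sketch = get_weights_for_area(
        0,  # lower latitude bound (degrees north)
        -80,  # left longitude bound (degrees east)
        65,  # upper latitude bound
        0,  # right longitude bound
    )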
[17]:
custom_weights = example.get_scm_timeseries_weights(
    regions=[
        "World|El Nino N3.4",
        "custom mask",
        "World|Land",
        "Northern Atlantic area bounds",
    ]
)
[18]:
plot_weights(custom_weights)
_images/usage_weights_30_0.png
_images/usage_weights_30_1.png
_images/usage_weights_30_2.png
_images/usage_weights_30_3.png
Default land/ocean mask

When crunching data with netCDF-SCM, we want to cut files into (at least) Northern/Southern Hemisphere, land/ocean boxes. However, we don’t always have access to land-surface fraction information from the raw model output. In these cases, we simply apply a default land/ocean mask instead. In this notebook, we show how this mask looks and how it was derived.

Imports
[1]:
import iris
import numpy as np
[2]:
from matplotlib import pyplot as plt
import iris.plot as iplt
import iris.quickplot as qplt
Default mask

Our default mask lives in netcdf_scm.weights. We can access it using netcdf_scm.weights.get_default_sftlf_cube.

[3]:
from netcdf_scm.weights import get_default_sftlf_cube
[4]:
default_sftlf = get_default_sftlf_cube()
[5]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 9))
qplt.pcolormesh(default_sftlf,);
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'longitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'latitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
_images/usage_default-land-ocean-mask_7_1.png
[6]:
zoomed = default_sftlf.extract(
    iris.Constraint(latitude=lambda cell: -45 < cell < -25)
    & iris.Constraint(longitude=lambda cell: 120 < cell < 160)
)
[7]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 9))
qplt.pcolormesh(zoomed,);
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'longitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'latitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
_images/usage_default-land-ocean-mask_9_1.png
Deriving the mask

To derive the mask, we simply use the land surface fraction (sftlf) data from the IPSL-CM6A-LR model in CMIP6, interpolated onto a regular 1° x 1° grid.

[8]:
source_file = "../../../tests/test-data/cmip6output/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/fx/sftlf/gr/v20180803/sftlf_fx_IPSL-CM6A-LR_historical_r1i1p1f1_gr.nc"
[9]:
comp_cube = iris.load_cube(source_file)
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/fileformats/cf.py:803: UserWarning: Missing CF-netCDF measure variable 'areacella', referenced by netCDF variable 'sftlf'
  warnings.warn(message % (variable_name, nc_var_name))
[10]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 9))
qplt.pcolormesh(comp_cube);
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'longitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'latitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
_images/usage_default-land-ocean-mask_13_1.png
[11]:
sample_points = [
    ("longitude", np.arange(0.5, 360, 1)),
    ("latitude", np.arange(-89.5, 90, 1)),
]
[12]:
comp_cube_interp = comp_cube.interpolate(sample_points, iris.analysis.Linear())
comp_cube_interp.attributes[
    "history"
] = "Interpolated to a 1deg x 1deg grid using iris.interpolate with linear interpolation"
comp_cube_interp.attributes[
    "title"
] = "Default land area fraction assumption in netcdf-scm. Based on {}".format(
    comp_cube_interp.attributes["title"]
)
[13]:
iris.save(comp_cube_interp, "default_weights.nc")
!ncdump -h default_weights.nc
netcdf default_weights {
dimensions:
        lat = 180 ;
        lon = 360 ;
        string8 = 8 ;
variables:
        float sftlf(lat, lon) ;
                sftlf:standard_name = "land_area_fraction" ;
                sftlf:long_name = "Land Area Fraction" ;
                sftlf:units = "%" ;
                sftlf:cell_methods = "area: mean" ;
                sftlf:coordinates = "type" ;
        double lat(lat) ;
                lat:axis = "Y" ;
                lat:units = "degrees_north" ;
                lat:standard_name = "latitude" ;
                lat:long_name = "Latitude" ;
        double lon(lon) ;
                lon:axis = "X" ;
                lon:units = "degrees_east" ;
                lon:standard_name = "longitude" ;
                lon:long_name = "Longitude" ;
        char type(string8) ;
                type:units = "1" ;
                type:standard_name = "area_type" ;
                type:long_name = "Land area type" ;

// global attributes:
                :CMIP6_CV_version = "cv=6.2.3.5-2-g63b123e" ;
                :EXPID = "historical" ;
                :NCO = "\"4.6.0\"" ;
                :activity_id = "CMIP" ;
                :branch_method = "standard" ;
                :branch_time_in_child = 0. ;
                :branch_time_in_parent = 21914. ;
                :contact = "ipsl-cmip6@listes.ipsl.fr" ;
                :creation_date = "2018-07-11T07:27:04Z" ;
                :data_specs_version = "01.00.21" ;
                :description = "Land Area Fraction" ;
                :dr2xml_md5sum = "f1e40c1fc5d8281f865f72fbf4e38f9d" ;
                :dr2xml_version = "1.11" ;
                :experiment = "all-forcing simulation of the recent past" ;
                :experiment_id = "historical" ;
                :forcing_index = 1 ;
                :frequency = "fx" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.IPSL.IPSL-CM6A-LR.historical.none.r1i1p1f1" ;
                :grid = "LMDZ grid" ;
                :grid_label = "gr" ;
                :history = "Interpolated to a 1deg x 1deg grid using iris.interpolate with linear interpolation" ;
                :initialization_index = 1 ;
                :institution = "Institut Pierre Simon Laplace, Paris 75252, France" ;
                :institution_id = "IPSL" ;
                :license = "CMIP6 model data produced by IPSL is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https://cmc.ipsl.fr/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :mip_era = "CMIP6" ;
                :model_version = "6.1.5" ;
                :name = "/ccc/work/cont003/gencmip6/p86caub/IGCM_OUT/IPSLCM6/PROD/historical/CM61-LR-hist-03.1910/CMIP6/ATM/sftlf_fx_IPSL-CM6A-LR_historical_r1i1p1f1_gr" ;
                :nominal_resolution = "250 km" ;
                :online_operation = "once" ;
                :parent_activity_id = "CMIP" ;
                :parent_experiment_id = "piControl" ;
                :parent_mip_era = "CMIP6" ;
                :parent_source_id = "IPSL-CM6A-LR" ;
                :parent_time_units = "days since 1850-01-01 00:00:00" ;
                :parent_variant_label = "r1i1p1f1" ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :realization_index = 1 ;
                :realm = "atmos" ;
                :source = "IPSL-CM6A-LR (2017):  atmos: LMDZ (NPv6, N96; 144 x 143 longitude/latitude; 79 levels; top level 40000 m) land: ORCHIDEE (v2.0, Water/Carbon/Energy mode) ocean: NEMO-OPA (eORCA1.3, tripolar primarily 1deg; 362 x 332 longitude/latitude; 75 levels; top grid cell 0-2 m) ocnBgchem: NEMO-PISCES seaIce: NEMO-LIM3" ;
                :source_id = "IPSL-CM6A-LR" ;
                :source_type = "AOGCM BGC" ;
                :sub_experiment = "none" ;
                :sub_experiment_id = "none" ;
                :table_id = "fx" ;
                :title = "Default land area fraction assumption in netcdf-scm. Based on IPSL-CM6A-LR model output prepared for CMIP6 / CMIP historical" ;
                :tracking_id = "hdl:21.14100/cc6c4852-271d-4c5a-adc3-42530ef19550" ;
                :variable_id = "sftlf" ;
                :variant_label = "r1i1p1f1" ;
                :Conventions = "CF-1.7" ;
}
[14]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 9))
qplt.pcolormesh(comp_cube_interp);
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'longitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
/data/ubuntu-znicholls/miniconda3/envs/netcdf-scm/lib/python3.9/site-packages/iris/coords.py:1192: UserWarning: Coordinate 'latitude' is not bounded, guessing contiguous bounds.
  warnings.warn('Coordinate {!r} is not bounded, guessing '
_images/usage_default-land-ocean-mask_17_1.png
[15]:
comp_cube_regrid = comp_cube.regrid(default_sftlf, iris.analysis.Linear())

As expected, the default mask is more or less identical to the IPSL mask, even with regridding.

[16]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 9))
qplt.pcolormesh((default_sftlf - comp_cube_regrid));
_images/usage_default-land-ocean-mask_20_0.png
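
For a quantitative version of the same check, a short sketch (reusing the cubes from above) reports the largest absolute difference in land fraction:

    # maximum absolute land fraction difference (in %) between the default
    # mask and the regridded IPSL data
    print(float(np.abs((default_sftlf - comp_cube_regrid).data).max()))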

Miscellaneous

Year zero handling

The CMIP6 historical concentration data files use a Gregorian calendar with a reference year of zero. There is no year zero in a Gregorian calendar, so this case cannot be handled by iris directly. As a result, we provide a simple wrapper to handle this edge case. Note that, because we have to read in the entire data file, it can be slow.

[1]:
# NBVAL_IGNORE_OUTPUT
import datetime

import iris
import iris.coord_categorisation
import iris.plot as iplt
from netcdf_scm.iris_cube_wrappers import CMIP6Input4MIPsCube

import matplotlib.pyplot as plt

plt.style.use("bmh")
[2]:
# NBVAL_IGNORE_OUTPUT
cmip6_hist_concs = CMIP6Input4MIPsCube()
cmip6_hist_concs.load_data_from_identifiers(
    root_dir="../../../tests/test-data/cmip6input4mips",
    activity_id="input4MIPs",
    mip_era="CMIP6",
    target_mip="CMIP",
    institution_id="UoM",
    source_id="UoM-CMIP-1-2-0",
    realm="atmos",
    frequency="mon",
    variable_id="mole-fraction-of-carbon-dioxide-in-air",
    grid_label="gr1-GMNHSH",
    version="v20100304",
    dataset_category="GHGConcentrations",
    time_range="000001-201412",
    file_ext=".nc",
)
[3]:
# NBVAL_IGNORE_OUTPUT
cmip6_hist_concs.cube
[3]:
Mole (1.e-6)                    time       sector
Shape                           24168      3
Dimension coordinates
    time                        x          -
    sector                      -          x
Attributes
    Conventions                 CF-1.6
    activity_id                 input4MIPs
    comment                     Data provided are global and hemispheric area-weighted means. Zonal means...
    contact                     malte.meinshausen@unimelb.edu.au
    creation_date               2016-08-30T18:22:16Z
    dataset_category            GHGConcentrations
    dataset_version_number      1.2.0
    frequency                   mon
    further_info_url            http://climatecollege.unimelb.edu.au/cmip6
    grid                        global and hemispheric means - area-averages from the original latitudinal...
    grid_label                  gr1-GMNHSH
    institution                 Australian-German Climate & Energy College, The University of Melbourne...
    institution_id              UoM
    license                     GHG concentrations produced by UoM are licensed under a Creative Commons...
    mip_era                     CMIP6
    nominal_resolution          10000 km
    product                     assimilated observations
    realm                       atmos
    references                  Malte Meinshausen, Elisabeth Vogel, Alexander Nauels, Katja Lorbacher,...
    source                      UoM-CMIP-1-2-0: Historical GHG mole fractions from NOAA & AGAGE networks...
    source_id                   UoM-CMIP-1-2-0
    table_id                    input4MIPs
    target_mip                  CMIP
    title                       UoM-CMIP-1-2-0: historical GHG concentrations: global and hemispheric means...
    tracking_id                 hdl:21.14100/3ef0a11f-2ed2-4004-9234-4087c2d41cee
    variable_id                 mole_fraction_of_carbon_dioxide_in_air
Cell methods
    mean                        time
    mean                        area

We also make a plot to show how the underlying data looks.

[4]:
# NBVAL_IGNORE_OUTPUT
yearmin = 1700
cube = cmip6_hist_concs.cube.extract(
    iris.Constraint(time=lambda t: t[0].year > yearmin)
)

fig = plt.figure(figsize=(16, 9))

for i in range(3):
    region = cube.extract(iris.Constraint(sector=i))
    for title in region.coord("sector").attributes["ids"].split(";"):
        if title.strip().startswith(str(i)):
            title = title.split(":")[1].strip()
            break

    if "Global" in title:
        plt.subplot(322)
    elif "Northern" in title:
        plt.subplot(324)
    elif "Southern" in title:
        plt.subplot(326)

    iplt.plot(region, lw=2.0)
    xlabel = "Time"
    plt.title(title)
    plt.xlabel(xlabel)
    plt.xlim(
        [datetime.date(1965, 1, 1), datetime.date(2015, 1, 1),]
    )

    if "Global" in title:
        plt.subplot(121)

        iris.coord_categorisation.add_year(region, "time", name="year")
        region_annual_mean = region.aggregated_by(["year"], iris.analysis.MEAN)

        iplt.plot(region_annual_mean, lw=2.0)

        var_name = region.var_name.replace("_", " ")
        var_name = var_name.replace("in", "\nin")
        plt.ylabel("{} ({})".format(var_name, region.units))
        plt.title(title + "-annual mean")
        plt.xlabel("Time")

plt.tight_layout();
_images/usage_year-zero-handling_5_0.png
Miscellaneous reading

There are some files which don’t fit within the standard CMIP6 output but which we would nonetheless like to read. For this purpose, we have netcdf_scm.misc_readers. At the moment it only helps us to read hemispheric-mean data for CMIP6 input concentrations, but more options can be added as needed (pull requests welcome :)).

CMIP6 concentrations input

The CMIP6 input concentrations are provided on a grid. However, hemispheric-mean data was also provided; it can be read as shown below.

[1]:
# NBVAL_IGNORE_OUTPUT
import os.path

import matplotlib.pyplot as plt

from netcdf_scm.misc_readers import read_cmip6_concs_gmnhsh
[2]:
TEST_DATA_DIR = os.path.join("..", "..", "..", "tests", "test-data")
TEST_HISTORICAL_FILE = os.path.join(
    TEST_DATA_DIR,
    "mole-fraction-of-carbon-dioxide-in-air_input4MIPs_GHGConcentrations_CMIP_UoM-CMIP-1-2-0_gr1-GMNHSH_000001-201412.nc",
)
TEST_PROJECTION_FILE = os.path.join(
    TEST_DATA_DIR,
    "mole-fraction-of-carbon-dioxide-in-air_input4MIPs_GHGConcentrations_ScenarioMIP_UoM-MESSAGE-GLOBIOM-ssp245-1-2-1_gr1-GMNHSH_201501-250012.nc",
)
[3]:
# NBVAL_IGNORE_OUTPUT
historical_concs = read_cmip6_concs_gmnhsh(TEST_HISTORICAL_FILE)
historical_concs.head()
[3]:
time 0001-01-17 12:00:00 0001-02-16 00:00:00 0001-03-17 12:00:00 0001-04-17 00:00:00 0001-05-17 12:00:00 0001-06-17 00:00:00 0001-07-17 12:00:00 0001-08-17 12:00:00 0001-09-17 00:00:00 0001-10-17 12:00:00 ... 2014-03-17 12:00:00 2014-04-17 00:00:00 2014-05-17 12:00:00 2014-06-17 00:00:00 2014-07-17 12:00:00 2014-08-17 12:00:00 2014-09-17 00:00:00 2014-10-17 12:00:00 2014-11-17 00:00:00 2014-12-17 12:00:00
model scenario region variable unit variable_standard_name mip_era climate_model activity_id member_id
unspecified historical World mole_fraction_of_carbon_dioxide_in_air ppm NaN CMIP6 MAGICC7 input4MIPs unspecified 277.876678 278.231598 278.551178 278.774658 278.706543 277.966461 276.322845 274.719147 274.680359 275.719604 ... 399.020050 399.094604 398.623932 397.337616 395.648834 394.573456 395.026825 396.668762 398.189087 399.179688
World|Northern Hemisphere mole_fraction_of_carbon_dioxide_in_air ppm NaN CMIP6 MAGICC7 input4MIPs unspecified 278.555908 279.183929 279.804108 280.321655 280.213593 278.691864 275.406738 272.345093 272.443939 274.555481 ... 402.959412 403.127563 402.125000 399.311371 395.616882 393.376556 394.318665 397.456665 400.321228 402.195099
World|Southern Hemisphere mole_fraction_of_carbon_dioxide_in_air ppm NaN CMIP6 MAGICC7 input4MIPs unspecified 277.197449 277.279266 277.298248 277.227661 277.199493 277.241058 277.238922 277.093170 276.916809 276.883728 ... 395.080719 395.061676 395.122864 395.363861 395.680786 395.770386 395.734955 395.880859 396.056915 396.164307

3 rows × 24168 columns

[4]:
# NBVAL_IGNORE_OUTPUT
projection_concs = read_cmip6_concs_gmnhsh(TEST_PROJECTION_FILE)
projection_concs.head()
[4]:
time 2015-01-16 12:00:00 2015-02-15 00:00:00 2015-03-16 12:00:00 2015-04-16 00:00:00 2015-05-16 12:00:00 2015-06-16 00:00:00 2015-07-16 12:00:00 2015-08-16 12:00:00 2015-09-16 00:00:00 2015-10-16 12:00:00 ... 2500-03-16 12:00:00 2500-04-16 00:00:00 2500-05-16 12:00:00 2500-06-16 00:00:00 2500-07-16 12:00:00 2500-08-16 12:00:00 2500-09-16 00:00:00 2500-10-16 12:00:00 2500-11-16 00:00:00 2500-12-16 12:00:00
model scenario region variable unit variable_standard_name climate_model activity_id mip_era member_id
MESSAGE-GLOBIOM ssp245 World mole_fraction_of_carbon_dioxide_in_air ppm NaN MAGICC7 input4MIPs CMIP6 unspecified 399.985443 400.471680 400.829407 401.061829 400.765961 399.643005 398.118835 397.217529 397.855652 399.668549 ... 581.471252 581.205750 580.177002 578.151550 576.184753 575.389526 576.097717 578.050537 579.750366 580.699402
World|Northern Hemisphere mole_fraction_of_carbon_dioxide_in_air ppm NaN MAGICC7 input4MIPs CMIP6 unspecified 403.364502 404.067444 404.587128 404.826294 403.919159 401.186096 397.617981 395.575470 396.728790 400.064789 ... 583.975220 583.604553 581.599060 577.361206 573.039368 571.351440 572.867310 576.690613 580.148682 582.227295
World|Southern Hemisphere mole_fraction_of_carbon_dioxide_in_air ppm NaN MAGICC7 input4MIPs CMIP6 unspecified 396.606384 396.875916 397.071716 397.297333 397.612732 398.099915 398.619690 398.859619 398.982513 399.272278 ... 578.967224 578.806885 578.754883 578.941895 579.330139 579.427612 579.328125 579.410461 579.352112 579.171448

3 rows × 5832 columns

[5]:
combined_concs = historical_concs.append(projection_concs)
# hack around Pyam's inability to handle NaN for now
combined_concs = combined_concs.timeseries().reset_index()
combined_concs = combined_concs.drop("variable_standard_name", axis="columns")
combined_concs = type(historical_concs)(combined_concs)
[6]:
# NBVAL_IGNORE_OUTPUT
combined_concs.head()
[6]:
time 0001-01-17 12:00:00 0001-02-16 00:00:00 0001-03-17 12:00:00 0001-04-17 00:00:00 0001-05-17 12:00:00 0001-06-17 00:00:00 0001-07-17 12:00:00 0001-08-17 12:00:00 0001-09-17 00:00:00 0001-10-17 12:00:00 ... 2500-03-16 12:00:00 2500-04-16 00:00:00 2500-05-16 12:00:00 2500-06-16 00:00:00 2500-07-16 12:00:00 2500-08-16 12:00:00 2500-09-16 00:00:00 2500-10-16 12:00:00 2500-11-16 00:00:00 2500-12-16 12:00:00
model scenario region variable unit mip_era climate_model activity_id member_id
unspecified historical World mole_fraction_of_carbon_dioxide_in_air ppm CMIP6 MAGICC7 input4MIPs unspecified 277.876678 278.231598 278.551178 278.774658 278.706543 277.966461 276.322845 274.719147 274.680359 275.719604 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
World|Northern Hemisphere mole_fraction_of_carbon_dioxide_in_air ppm CMIP6 MAGICC7 input4MIPs unspecified 278.555908 279.183929 279.804108 280.321655 280.213593 278.691864 275.406738 272.345093 272.443939 274.555481 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
World|Southern Hemisphere mole_fraction_of_carbon_dioxide_in_air ppm CMIP6 MAGICC7 input4MIPs unspecified 277.197449 277.279266 277.298248 277.227661 277.199493 277.241058 277.238922 277.093170 276.916809 276.883728 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
MESSAGE-GLOBIOM ssp245 World mole_fraction_of_carbon_dioxide_in_air ppm CMIP6 MAGICC7 input4MIPs unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 581.471252 581.205750 580.177002 578.151550 576.184753 575.389526 576.097717 578.050537 579.750366 580.699402
World|Northern Hemisphere mole_fraction_of_carbon_dioxide_in_air ppm CMIP6 MAGICC7 input4MIPs unspecified NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 583.975220 583.604553 581.599060 577.361206 573.039368 571.351440 572.867310 576.690613 580.148682 582.227295

5 rows × 30000 columns

[7]:
# NBVAL_IGNORE_OUTPUT
fig = plt.figure(figsize=(16, 9))

ax = fig.add_subplot(121)
combined_concs.filter(year=range(2010, 2021)).line_plot(
    hue="scenario", style="region", ax=ax
)

ax = fig.add_subplot(122)
combined_concs.filter(year=range(1500, 2300)).line_plot(
    hue="scenario", style="region", ax=ax
);
_images/usage_miscellaneous-reading_8_0.png

Development

If you’re interested in contributing to netCDF-SCM, we’d love to have you on board! This section of the docs details how to get setup to contribute and how best to communicate.

Contributing

All contributions are welcome, some possible suggestions include:

  • tutorials (or support questions which, once solved, result in a new tutorial :D)

  • blog posts

  • improving the documentation

  • bug reports

  • feature requests

  • pull requests

Please report issues or discuss feature requests in the netCDF-SCM issue tracker. If your issue is a feature request or a bug, please use the templates available, otherwise, simply open a normal issue :)

As a contributor, please follow a couple of conventions; in particular, see the Formatting and Docstring style sections below.

Getting setup

To get setup as a developer, we recommend the following steps (if any of these tools are unfamiliar, please see the resources we recommend in Development tools):

  1. Install conda and make

  2. Run make conda-environment; if that fails, you can try doing it manually by reading the commands from the Makefile

  3. Make sure the tests pass by running make test; as above, if that fails you can try doing it manually by reading the commands from the Makefile

Getting help

Whilst developing, unexpected things can go wrong (that’s why it’s called ‘developing’: if we knew what we were doing, it would already be ‘developed’). Normally, the fastest way to solve an issue is to contact us via the issue tracker. The other option is to debug yourself. For this purpose, we provide a list of the tools we use during our development as starting points for working out what has gone wrong.

Development tools

This list of development tools is what we rely on to develop netCDF-SCM reliably and reproducibly. It gives you a few starting points in case things do go inexplicably wrong and you want to work out why. We include links with each of these tools to starting points that we think are useful, in case you want to learn more.

  • Git

  • Make

  • Conda virtual environments
    • note the common gotcha that source activate has now changed to conda activate

    • we use conda instead of pure pip environments because they help us deal with Iris’ dependencies: if you want to learn more about pip and pip virtual environments, check out this introduction

  • Tests
    • we use a blend of pytest and the inbuilt Python testing capabilities for our tests so checkout what we’ve already done in tests to get a feel for how it works

  • Continuous integration (CI)
    • we use GitLab CI for our CI but there are a number of good providers

  • Jupyter Notebooks
    • we’d recommend simply installing jupyter (conda install jupyter) in your virtual environment

  • Sphinx

Other tools

We also use some other tools which aren’t necessarily the most familiar. Here we provide a list of these along with useful resources.

  • Regular expressions
    • we use regex101.com to help us write and check our regular expressions, make sure the language is set to Python to make your life easy!

Formatting

To help us focus on what the code does, not how it looks, we use a couple of automatic formatting tools. These automatically format the code for us and tell us where the errors are. To use them, after setting yourself up (see Getting setup), simply run make black and make flake8. Note that make black can only be run if you have committed all your work, i.e. your working directory is ‘clean’. This restriction is made to ensure that you don’t format code without being able to undo it, just in case something goes wrong.

Building the docs

After setting yourself up (see Getting setup), building the docs is as simple as running make docs (note: run make -B docs to force the docs to rebuild if make says ‘… index.html is up to date’). You can preview the built docs by opening docs/build/html/index.html in a browser.

For documentation we use Sphinx. To get ourselves started with Sphinx, we started with this example then used Sphinx’s getting started guide.

Gotchas

To get Sphinx to generate pdfs (rarely worth the hassle), you require Latexmk. On a Mac this can be installed with sudo tlmgr install latexmk. You will most likely also need to install some other packages (if you don’t have the full distribution). You can check which package contains any missing files with tlmgr search --global --file [filename]. You can then install the packages with sudo tlmgr install [package].

Docstring style

For our docstrings we use numpy style docstrings. For more information on these, here is the full guide and the quick reference we also use.

Releasing

The steps to release a new version of netCDF-SCM are shown below. Please do all of the steps, including the steps for both release platforms (PyPI and conda).

First step
  1. Test installation with dependencies make test-install

  2. Update CHANGELOG.rst:

    • add a header for the new version between master and the latest bullet point

    • this should leave the section underneath the master header empty

  3. git add .

  4. git commit -m "Prepare for release of vX.Y.Z"

  5. git tag vX.Y.Z

  6. Test version updated as intended with make test-install

PyPI

If uploading to PyPI, do the following (otherwise skip these steps)

  1. make publish-on-testpypi

  2. Go to test PyPI and check that the new release is as intended. If it isn’t, stop and debug.

  3. Test the install with make test-testpypi-install (this doesn’t test all the imports as most required packages are not on test PyPI).

Assuming test PyPI worked, now upload to the main repository

  1. make publish-on-pypi

  2. Go to netCDF-SCM’s PyPI and check that the new release is as intended.

  3. Test the install with make test-pypi-install (a pip only install will throw warnings about Iris not being installed, that’s fine).

Push to repository

Finally, push the tags and the repository

  1. git push

  2. git push --tags

Conda
  1. If you haven’t already, fork the netCDF-SCM conda feedstock. In your fork, add the feedstock upstream with git remote add upstream https://github.com/conda-forge/netcdf-scm-feedstock (upstream should now appear in the output of git remote -v)

  2. Update your fork’s master to the upstream master with:

    1. git checkout master

    2. git fetch upstream

    3. git reset --hard upstream/master

  3. Create a new branch in the feedstock for the version you want to bump to.

  4. Edit recipe/meta.yaml and update:

    • version number in line 1 (don’t include the ‘v’ in the version tag)

    • the build number to zero (you should only be here if releasing a new version)

    • update sha256 in line 9 (you can get the sha from netCDF-SCM’s PyPI by clicking on ‘Download files’ on the left and then clicking on ‘SHA256’ of the .tar.gz file to copy it to the clipboard)

  5. git add .

  6. git commit -m "Update to vX.Y.Z"

  7. git push

  8. Make a PR into the netCDF-SCM conda feedstock

  9. If the PR passes (give it at least 10 minutes to run all the CI), merge

  10. Check https://anaconda.org/conda-forge/netcdf-scm to double check that the version has increased (this can take a few minutes to update)

Archiving on zenodo
  1. Create a clean version of the repo (note, this deletes all files not tracked by git, use with care!), git clean -xdf (dry run can be done with git clean -ndf)

  2. Tar the repo

    VERSION=`python -c 'import netcdf_scm; print(netcdf_scm.__version__)'` \
        && tar --exclude='./.git' -czvf "netcdf-scm-${VERSION}.tar.gz" .
    
  3. Run the zenodo script to get the curl command for the file to upload: python scripts/prepare_zenodo_upload.py <file-to-upload>

  4. The above script spits out a curl command; run it (having set the ZENODO_TOKEN environment variable first) to upload your archive

  5. Go to zenodo.org, read through and finalise the upload by pushing publish

Why is there a Makefile in a pure Python repository?

Whilst it may not be standard practice, a Makefile is a simple way to automate general setup (environment setup in particular). Hence we have one here which basically acts as a notes file for how to do all those little jobs which we often forget e.g. setting up environments, running tests (and making sure we’re in the right environment), building docs, setting up auxiliary bits and pieces.

Why did we choose a BSD 3-Clause License?

We want to ensure that our code can be used and shared as easily as possible. Whilst we love transparency, we didn’t want to force all future users to also comply with a stronger license such as AGPL. Hence the choice we made.

We recommend Morin et al. 2012 for more information for scientists about open-source software licenses.

Iris cube wrappers API

Wrappers of the iris cube.

These classes automate handling of a number of netCDF processing steps, for example finding land surface fraction files, applying regions to data and returning timeseries in key regions for simple climate models.

class netcdf_scm.iris_cube_wrappers.CMIP6Input4MIPsCube[source]

Bases: netcdf_scm.iris_cube_wrappers._CMIPCube

Cube which can be used with CMIP6 input4MIPs data

The data must match the CMIP6 Forcing Datasets Summary, specifically the Forcing Dataset Specifications.

activity_id = None

The activity_id for which we want to load data.

For these cubes, this will almost always be input4MIPs.

Type

str

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).

dataset_category = None

The dataset_category for which we want to load data e.g. GHGConcentrations

Type

str

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list

file_ext = None

The file extension of the data file we want to load e.g. .nc

Type

str

frequency = None

The frequency for which we want to load data e.g. yr

Type

str

get_area_weights(areacell_scmcube=None)

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).
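
A hedged sketch of the two call styles, assuming tas is an already-loaded instance of this class and that an associated areacella file is available (both names are illustrative):

>>> areacella = tas.get_metadata_cube("areacella")
>>> weights = tas.get_area_weights(areacell_scmcube=areacella)
>>> weights_iris = tas.get_area_weights()  # weights calculated by iris instead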

get_data_directory()

Get the path to a data file from self’s attributes.

This can take multiple forms; it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.

Returns

path to the data file from which this cube has been/will be loaded

Return type

str

Raises

OSError – The data directory cannot be determined

get_data_filename()

Get the name of a data file from self’s attributes.

This can take multiple forms; it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.

Returns

name of the data file from which this cube has been/will be loaded.

Return type

str

Raises

OSError – The data directory cannot be determined

classmethod get_data_reference_syntax(**kwargs)

Get data reference syntax for this cube

Parameters

kwargs (str) – Attributes of the cube to set before generating the example data reference syntax.

Returns

Example of the full path to a file for the given kwargs with this cube’s data reference syntax.

Return type

str

get_filepath_from_load_data_from_identifiers_args(**kwargs)[source]

Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.

Full details about the meaning of the identifiers are given in the Forcing Dataset Specifications.

Parameters

kwargs (str) – Identifiers to use to load the data

Returns

The full filepath (path and name) of the file to load.

Return type

str

Raises

AttributeError – An input argument does not match with the cube’s data reference syntax

get_load_data_from_identifiers_args_from_filepath(filepath)

Get the set of identifiers to use to load data from a filepath.

Parameters

filepath (str) – The filepath from which to load the data.

Returns

Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.

Return type

dict

Raises

ValueError – Path and filename contradict each other

get_metadata_cube(metadata_variable, cube=None)

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube

get_scm_timeseries(**kwargs)

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

dict of str – Dictionary of cubes (‘region name’: cube key-value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) – How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016) does not appear to give explicit guidance.

  • log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.

Returns

dict of str – Dictionary of ‘region name’: weights key-value pairs

Return type

np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.

get_variable_constraint()

Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.

Returns

constraint to use which ensures that only the variable of interest is loaded.

Return type

iris.Constraint

grid_label = None

The grid_label for which we want to load data e.g. gr1-GMNHSH

Type

str

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Returns

Return type

dict

institution_id = None

The institution_id for which we want to load data e.g. UoM

Type

str

lat_dim

iris.coords.DimCoord The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D Tuple of int which gives shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple

load_data_from_identifiers(process_warnings=True, **kwargs)

Load data using key identifiers.

The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.

Parameters
  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

  • kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.
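
As a concrete sketch, the identifiers can be passed as keyword arguments. The values below are simply the examples given on this class’s attribute docstrings, and the root_dir is the example root directory from the root_dir docstring (replace it with your local archive root):

>>> from netcdf_scm.iris_cube_wrappers import CMIP6Input4MIPsCube
>>> co2 = CMIP6Input4MIPsCube()
>>> co2.load_data_from_identifiers(
...     root_dir="/home/users/usertim/cmip6input",
...     activity_id="input4MIPs",
...     mip_era="CMIP6",
...     target_mip="ScenarioMIP",
...     institution_id="UoM",
...     source_id="UoM-REMIND-MAGPIE-ssp585-1-2-0",
...     realm="atmos",
...     frequency="yr",
...     variable_id="mole-fraction-of-carbon-dioxide-in-air",
...     grid_label="gr1-GMNHSH",
...     version="v20180427",
...     dataset_category="GHGConcentrations",
...     time_range="2005-2100",
...     file_ext=".nc",
... )
>>> co2.cube  # the loaded iris cube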

load_data_from_path(filepath, process_warnings=True)

Load data from a path.

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

load_data_in_directory(directory=None, process_warnings=True)

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
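
A minimal sketch, assuming tas_dir is a hypothetical directory containing only a set of timeslice files like those listed above:

>>> tas = CMIP6Input4MIPsCube()
>>> tas.load_data_in_directory(tas_dir)
>>> tas.cube  # a single cube joined from the timeslices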

lon_dim

iris.coords.DimCoord The longitude dimension of the data.

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

mip_era = None

The mip_era for which we want to load data e.g. CMIP6

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

process_filename(filename)[source]

Cut a filename into its identifiers

Parameters

filename (str) – The filename to process. Filename here means just the filename; no path should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input filename

Return type

dict

process_path(path)[source]

Cut a path into its identifiers

Parameters

path (str) – The path to process. Path here means just the path; no filename should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input path

Return type

dict

realm = None

The realm for which we want to load data e.g. atmos

Type

str

root_dir = None

The root directory of the database i.e. where the cube should start its path

e.g. /home/users/usertim/cmip6input.

Type

str

source_id = None

The source_id for which we want to load data e.g. UoM-REMIND-MAGPIE-ssp585-1-2-0

This must include the institution_id.

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

target_mip = None

The target_mip for which we want to load data e.g. ScenarioMIP

Type

str

time_dim

iris.coords.DimCoord The time dimension of the data.

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see the Python re module documentation.

Type

_sre.SRE_Pattern

time_range = None

The time range for which we want to load data e.g. 2005-2100

If None, this information isn’t included in the filename which is useful for loading metadata files which don’t have a relevant time period.

Type

str

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp

Returns

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"

variable_id = None

The variable_id for which we want to load data e.g. mole-fraction-of-carbon-dioxide-in-air

Type

str

version = None

The version for which we want to load data e.g. v20180427

Type

str

class netcdf_scm.iris_cube_wrappers.CMIP6OutputCube[source]

Bases: netcdf_scm.iris_cube_wrappers._CMIPCube

Cube which can be used with CMIP6 model output data

The data must match the CMIP6 data reference syntax as specified in the ‘File name template’ and ‘Directory structure template’ sections of the CMIP6 Data Reference Syntax.

activity_id = None

The activity_id for which we want to load data.

In CMIP6, this denotes the responsible MIP e.g. DCPP.

Type

str

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list

experiment_id = None

The experiment_id for which we want to load data e.g. dcppA-hindcast

Type

str

file_ext = None

The file extension of the data file we want to load e.g. .nc

Type

str

get_area_weights(areacell_scmcube=None)

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).

get_data_directory()

Get the path to a data file from self’s attributes.

This can take multiple forms; it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.

Returns

path to the data file from which this cube has been/will be loaded

Return type

str

Raises

OSError – The data directory cannot be determined

get_data_filename()

Get the name of a data file from self’s attributes.

This can take multiple forms; it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.

Returns

name of the data file from which this cube has been/will be loaded.

Return type

str

Raises

OSError – The data directory cannot be determined

classmethod get_data_reference_syntax(**kwargs)

Get data reference syntax for this cube

Parameters

kwargs (str) – Attributes of the cube to set before generating the example data reference syntax.

Returns

Example of the full path to a file for the given kwargs with this cube’s data reference syntax.

Return type

str

get_filepath_from_load_data_from_identifiers_args(**kwargs)[source]

Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.

Full details about the meaning of each identifier are given in Table 1 of the CMIP6 Data Reference Syntax.

Parameters

kwargs (str) – Identifiers to use to load the data

Returns

The full filepath (path and name) of the file to load.

Return type

str

Raises

AttributeError – An input argument does not match with the cube’s data reference syntax

classmethod get_instance_id(filepath)[source]

Get the instance_id from a given path

This is used as a unique identifier for datasets on the ESGF.

Parameters

filepath (str) – Full file path including directory structure

Raises

ValueError – If the filepath provided results in an instance id which is obviously incorrect

Returns

Instance ID

Return type

str
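
For illustration, a hypothetical filepath assembled from the example identifier values used throughout this class’s documentation (the exact form of the returned instance id should be checked against your own data):

>>> from netcdf_scm.iris_cube_wrappers import CMIP6OutputCube
>>> filepath = (
...     "/data/CMIP6/DCPP/CNRM-CERFACS/CNRM-CM6-1/dcppA-hindcast/"
...     "s1960-r2i1p1f3/day/pr/grn/v20160215/"
...     "pr_day_CNRM-CM6-1_dcppA-hindcast_s1960-r2i1p1f3_grn_198001-198412.nc"
... )
>>> instance_id = CMIP6OutputCube.get_instance_id(filepath)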

get_load_data_from_identifiers_args_from_filepath(filepath)

Get the set of identifiers to use to load data from a filepath.

Parameters

filepath (str) – The filepath from which to load the data.

Returns

Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.

Return type

dict

Raises

ValueError – Path and filename contradict each other

get_metadata_cube(metadata_variable, cube=None)

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube

get_scm_timeseries(**kwargs)

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

dict of str – Dictionary of cubes (‘region name’: cube key-value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) –

    How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016) does not appear to give explicit guidance.

  • log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.

Returns

dict of str – Dictionary of ‘region name’: weights key-value pairs

Return type

np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.

get_variable_constraint()

Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.

Returns

constraint to use which ensures that only the variable of interest is loaded.

Return type

iris.Constraint

grid_label = None

The grid_label for which we want to load data e.g. grn

Type

str

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Returns

Return type

dict

institution_id = None

The institution_id for which we want to load data e.g. CNRM-CERFACS

Type

str

lat_dim

iris.coords.DimCoord The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D Tuple of int which gives shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple

load_data_from_identifiers(process_warnings=True, **kwargs)

Load data using key identifiers.

The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.

Parameters
  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

  • kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.

load_data_from_path(filepath, process_warnings=True)

Load data from a path.

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

load_data_in_directory(directory=None, process_warnings=True)

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.

lon_dim

iris.coords.DimCoord The longitude dimension of the data.

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

member_id = None

The member_id for which we want to load data e.g. s1960-r2i1p1f3

Type

str

mip_era = None

The mip_era for which we want to load data e.g. CMIP6

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

process_filename(filename)[source]

Cut a filename into its identifiers

Parameters

filename (str) – The filename to process. Filename here means just the filename; no path should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input filename

Return type

dict

process_path(path)[source]

Cut a path into its identifiers

Parameters

path (str) – The path to process. Path here means just the path; no filename should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input path

Return type

dict

root_dir = None

The root directory of the database i.e. where the cube should start its path

e.g. /home/users/usertim/cmip6_data.

Type

str

source_id = None

The source_id for which we want to load data e.g. CNRM-CM6-1

This was known as model in CMIP5.

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_id = None

The table_id for which we want to load data. e.g. day

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

time_dim

iris.coords.DimCoord The time dimension of the data.

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see the Python re module documentation.

Type

_sre.SRE_Pattern

time_range = None

The time range for which we want to load data e.g. 198001-198412

If None, this information isn’t included in the filename which is useful for loading metadata files which don’t have a relevant time period.

Type

str

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp

Returns

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"

variable_id = None

The variable_id for which we want to load data e.g. pr

Type

str

version = None

The version for which we want to load data e.g. v20160215

Type

str

class netcdf_scm.iris_cube_wrappers.MarbleCMIP5Cube[source]

Bases: netcdf_scm.iris_cube_wrappers._CMIPCube

Cube which can be used with the cmip5 directory on marble (identical to ETH Zurich’s archive).

This directory structure is very similar, but not quite identical, to the recommended CMIP5 directory structure described in section 3.1 of the CMIP5 Data Reference Syntax.
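
A minimal usage sketch via load_data_from_identifiers() (documented below). The identifier values mirror the example values documented on this class’s attributes; the root_dir and time_period are placeholders for your local archive root and file of interest:

>>> from netcdf_scm.iris_cube_wrappers import MarbleCMIP5Cube
>>> tas = MarbleCMIP5Cube()
>>> tas.load_data_from_identifiers(
...     root_dir="tests/test-data/marble-cmip5",
...     activity="cmip5",
...     experiment="1pctCO2",
...     mip_table="Amon",
...     variable_name="tas",
...     model="CanESM2",
...     ensemble_member="r1i1p1",
...     time_period="189201-190312",
...     file_ext=".nc",
... )
>>> tas.cube  # the loaded iris cube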

activity = None

The activity for which we want to load data e.g. ‘cmip5’

Type

str

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list

ensemble_member = None

The ensemble member for which we want to load data e.g. ‘r1i1p1’

Type

str

experiment = None

The experiment for which we want to load data e.g. ‘1pctCO2’

Type

str

file_ext = None

The file extension of the data file we want to load e.g. ‘.nc’

Type

str

get_area_weights(areacell_scmcube=None)

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).

get_data_directory()

Get the path to a data file from self’s attributes.

This can take multiple forms; it may just return a previously set filepath attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data path.

Returns

path to the data file from which this cube has been/will be loaded

Return type

str

Raises

OSError – The data directory cannot be determined

get_data_filename()

Get the name of a data file from self’s attributes.

This can take multiple forms; it may just return a previously set filename attribute, or it could combine a number of different metadata elements (e.g. model name, experiment name) to create the data name.

Returns

name of the data file from which this cube has been/will be loaded.

Return type

str

Raises

OSError – The data directory cannot be determined

classmethod get_data_reference_syntax(**kwargs)

Get data reference syntax for this cube

Parameters

kwargs (str) – Attributes of the cube to set before generating the example data reference syntax.

Returns

Example of the full path to a file for the given kwargs with this cube’s data reference syntax.

Return type

str

get_filepath_from_load_data_from_identifiers_args(**kwargs)[source]

Get the full filepath of the data to load from the arguments passed to self.load_data_from_identifiers.

Full details about the identifiers are given in Section 2 of the CMIP5 Data Reference Syntax.

Parameters

kwargs (str) – Identifiers to use to load the data

Returns

The full filepath (path and name) of the file to load.

Return type

str

Raises

AttributeError – An input argument does not match with the cube’s data reference syntax

get_load_data_from_identifiers_args_from_filepath(filepath)

Get the set of identifiers to use to load data from a filepath.

Parameters

filepath (str) – The filepath from which to load the data.

Returns

Set of arguments which can be passed to self.load_data_from_identifiers to load the data in the filepath.

Return type

dict

Raises

ValueError – Path and filename contradict each other

get_metadata_cube(metadata_variable, cube=None)

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube

get_scm_timeseries(**kwargs)

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

dict of str – Dictionary of cubes (‘region name’: cube key-value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) –

    How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016) does not appear to give explicit guidance.

  • log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.

Returns

dict of str – Dictionary of ‘region name’: weights key-value pairs

Return type

np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.

get_variable_constraint()

Get the iris variable constraint to use when loading data with self.load_data_from_identifiers.

Returns

constraint to use which ensures that only the variable of interest is loaded.

Return type

iris.Constraint

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Returns

Return type

dict

lat_dim

iris.coords.DimCoord The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D Tuple of int which gives shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple

load_data_from_identifiers(process_warnings=True, **kwargs)

Load data using key identifiers.

The identifiers are used to determine the path of the file to load. The file is then loaded into an iris cube which can be accessed through self.cube.

Parameters
  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

  • kwargs (any) – Arguments which can then be processed by self.get_filepath_from_load_data_from_identifiers_args and self.get_variable_constraint to determine the full filepath of the file to load and the variable constraint to use.

load_data_from_path(filepath, process_warnings=True)

Load data from a path.

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

load_data_in_directory(directory=None, process_warnings=True)

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.

lon_dim

iris.coords.DimCoord The longitude dimension of the data.

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

mip_era = 'CMIP5'

The MIP era to which this cube belongs

Type

str

mip_table = None

The mip_table for which we want to load data e.g. ‘Amon’

Type

str

model = None

The model for which we want to load data e.g. ‘CanESM2’

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

process_filename(filename)[source]

Cut a filename into its identifiers

Parameters

filename (str) – The filename to process. Filename here means just the filename; no path should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input filename

Return type

dict
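
For illustration, with a filename following this class’s data reference syntax (which identifier keys are returned should be checked against your netCDF-SCM version; the filename itself is only an example):

>>> ids = MarbleCMIP5Cube().process_filename(
...     "tas_Amon_CanESM2_1pctCO2_r1i1p1_189201-190312.nc"
... )
>>> sorted(ids)  # e.g. ensemble_member, experiment, mip_table, model, ...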

process_path(path)[source]

Cut a path into its identifiers

Parameters

path (str) – The path to process. Path here means just the path; no filename should be included.

Returns

A dictionary where each key is the identifier name and each value is the value of that identifier for the input path

Return type

dict

root_dir = None

The root directory of the database i.e. where the cube should start its path

e.g. /home/users/usertim/cmip5_25x25

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

time_dim

iris.coords.DimCoord The time dimension of the data.

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_period = None

The time period for which we want to load data

If None, this information isn’t included in the filename which is useful for loading metadata files which don’t have a relevant time period.

Type

str

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see the Python re module documentation.

Type

_sre.SRE_Pattern

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp

Returns

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"

variable_name = None

The variable for which we want to load data e.g. ‘tas’

Type

str

class netcdf_scm.iris_cube_wrappers.ScmCube[source]

Bases: object

Class for processing netCDF files for use with simple climate models.
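
The simplest entry point is to load a file directly via load_data_from_path() (documented below). A minimal sketch, assuming tas_file is the path to a netCDF data file:

>>> from netcdf_scm.iris_cube_wrappers import ScmCube
>>> tas = ScmCube()
>>> tas.load_data_from_path(tas_file)
>>> tas.cube  # the underlying iris cube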

areacell_var

The name of the variable associated with the area of each gridbox.

If required, this is used to determine the area of each cell in a data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then areacell_var can be used to work out the name of the associated cell area file. In some cases, it might be as simple as replacing tas with the value of areacell_var.

Type

str

convert_scm_timeseries_cubes_to_openscmdata(scm_timeseries_cubes, out_calendar=None)[source]

Convert dictionary of SCM timeseries cubes to an scmdata.ScmRun

Parameters
  • scm_timeseries_cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs.

  • out_calendar (str) – Calendar to use for the time axis of the output

Returns

scmdata.ScmRun containing the data from the SCM timeseries cubes

Return type

scmdata.ScmRun

Raises

NotImplementedError – The (original) input data has dimensions other than time, latitude and longitude (so the data to convert has dimensions other than time).
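
A hedged sketch of the intended flow, assuming tas is an already-loaded instance (the higher-level get_scm_timeseries() gives the same result in a single call):

>>> scm_cubes = tas.get_scm_timeseries_cubes()
>>> run = tas.convert_scm_timeseries_cubes_to_openscmdata(scm_cubes)
>>> run  # an scmdata.ScmRun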

cube = None

The Iris cube which is wrapped by this ScmCube instance.

Type

iris.cube.Cube

dim_names

Names of the dimensions in this cube

Here the names are the standard_names which means there can be None in the output.

Type

list

get_area_weights(areacell_scmcube=None)[source]

Get area weights for this cube

Parameters

areacell_scmcube (ScmCube) – ScmCube containing areacell data. If None, we calculate the weights using iris.

Returns

Weights on the cube’s latitude-longitude grid.

Return type

np.ndarray

Raises
  • iris.exceptions.CoordinateMultiDimError – The cube’s co-ordinates are multi-dimensional and we don’t have cell area data.

  • ValueError – Area weights units are not as expected (contradict with self._area_weights_units).

get_metadata_cube(metadata_variable, cube=None)[source]

Load a metadata cube from self’s attributes.

Parameters
  • metadata_variable (str) – the name of the metadata variable to get, as it appears in the filename.

  • cube (ScmCube) – Optionally, pass in an already loaded metadata cube to link it to currently loaded cube.

Returns

instance of self which has been loaded from the file containing the metadata variable of interest.

Return type

type(self)

Raises

TypeError – cube is not an ScmCube
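
For instance, to load the land surface fraction file belonging to an already-loaded instance tas (sftlf is the CMIP variable name, as noted in the get_scm_timeseries_weights() docstring):

>>> sftlf = tas.get_metadata_cube("sftlf")
>>> sftlf.cube  # the metadata variable as an iris cube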

get_scm_timeseries(**kwargs)[source]

Get SCM relevant timeseries from self.

Parameters

**kwargs – Passed to get_scm_timeseries_cubes()

Returns

scmdata.ScmRun instance with the data in the data attribute and metadata in the metadata attribute.

Return type

scmdata.ScmRun

get_scm_timeseries_cubes(lazy=False, **kwargs)[source]

Get SCM relevant cubes

The effective areas used for each of the regions are added as auxiliary co-ordinates of each timeseries cube.

If global, Northern Hemisphere and Southern Hemisphere land cubes are calculated, then three auxiliary co-ordinates are also added to each cube: land_fraction, land_fraction_northern_hemisphere and land_fraction_southern_hemisphere. These co-ordinates document the area fraction that was considered to be land when the cubes were crunched i.e. land_fraction is the fraction of the entire globe which was considered to be land, land_fraction_northern_hemisphere is the fraction of the Northern Hemisphere which was considered to be land and land_fraction_southern_hemisphere is the fraction of the Southern Hemisphere which was considered to be land.

Parameters
  • lazy (bool) – Should I process the data lazily? This can be slow as data has to be read off disk multiple times.

  • kwargs (any) – Passed to get_scm_timeseries_weights()

Returns

dict of str – Dictionary of cubes (‘region name’: cube key-value pairs), with latitude-longitude mean data as appropriate for each of the requested regions.

Return type

ScmCube

Raises

InvalidWeightsError – No valid weights are found for the requested regions

get_scm_timeseries_weights(surface_fraction_cube=None, areacell_scmcube=None, regions=None, cell_weights=None, log_failure=False)[source]

Get the scm timeseries weights

Parameters
  • surface_fraction_cube (ScmCube, optional) – land surface fraction data which is used to determine whether a given gridbox is land or ocean. If None, we try to load the land surface fraction automatically.

  • areacell_scmcube (ScmCube, optional) – cell area data which is used to take the latitude-longitude mean of the cube’s data. If None, we try to load this data automatically and if that fails we fall back onto iris.analysis.cartography.area_weights.

  • regions (list[str]) – List of regions to use. If None then netcdf_scm.regions.DEFAULT_REGIONS is used.

  • cell_weights ({'area-only', 'area-surface-fraction'}) –

    How cell weights should be calculated. If 'area-surface-fraction', both cell area and its surface fraction will be used to weight the cell. If 'area-only', only the cell’s area will be used to weight the cell (cells which do not belong to the region are nonetheless excluded). If None, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. When guessing, for ocean data, netCDF-SCM will weight cells only by the horizontal area of the cell i.e. no land fraction (see Section L5 of Griffies et al., GMD, 2016, https://doi.org/10.5194/gmd-9-3231-2016). For land variables, netCDF-SCM will weight cells by both their horizontal area and their land surface fraction. “Yes, you do need to weight the output by land frac (sftlf is the CMIP variable name).” (Chris Jones, personal communication, 18 April 2020). For land variables, note that Jones et al., GMD, 2016 (https://doi.org/10.5194/gmd-9-2853-2016) does not appear to give explicit guidance.

  • log_failure (bool) – Should regions which fail be logged? If not, failures are raised as warnings.

Returns

dict of str – Dictionary of ‘region name’: weights key-value pairs

Return type

np.ndarray

Notes

Only regions which can be calculated are returned. If no regions can be calculated, an empty dictionary will be returned.
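
A hedged sketch combining the options above, assuming tas is an already-loaded instance; the region names are illustrative values in the style of netcdf_scm.regions:

>>> weights = tas.get_scm_timeseries_weights(
...     regions=["World", "World|Land", "World|Ocean"],
...     cell_weights="area-only",
... )
>>> sorted(weights)  # the regions for which weights could be calculated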

info

Information about the cube’s source files

res["files"] contains the files used to load the data in this cube. res["metadata"] contains information for each of the metadata cubes used to load the data in this cube.

Returns

Return type

dict

lat_dim

iris.coords.DimCoord The latitude dimension of the data.

lat_dim_number

The index which corresponds to the latitude dimension.

e.g. if latitude is the first dimension of the data, then self.lat_dim_number will be 0 (Python is zero-indexed).

Type

int

lat_lon_shape

2D Tuple of int which gives shape of a lat-lon slice of the data

e.g. if the cube’s shape is (4, 3, 5, 4) and its dimensions are (time, lat, depth, lon) then cube.lat_lon_shape will be (3, 4)

Raises

AssertionError – No lat lon slice can be deduced (if this happens, please raise an issue at https://gitlab.com/netcdf-scm/netcdf-scm/issues so we can address your use case).

Type

tuple

lat_name = 'latitude'

The expected name of the latitude co-ordinate in data.

Type

str

load_data_from_path(filepath, process_warnings=True)[source]

Load data from a path.

If you are using the ScmCube class directly, this method simply loads the path into an iris cube which can be accessed through self.cube.

If implemented on a subclass of ScmCube, this method should:

  • use self.get_load_data_from_identifiers_args_from_filepath to determine the suitable set of arguments to pass to self.load_data_from_identifiers from the filepath

  • load the data using self.load_data_from_identifiers as this method contains much better checks and helper components

Parameters
  • filepath (str) – The filepath from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

load_data_in_directory(directory=None, process_warnings=True)[source]

Load data in a directory.

The data is loaded into an iris cube which can be accessed through self.cube.

Initially, this method is intended to only be used to load data when it is saved in a number of different timeslice files e.g.:

  • tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc

  • tas_Amon_HadCM3_rcp45_r1i1p1_203601-203812.nc

It is not intended to be used to load multiple different variables or non-continuous timeseries. These use cases could be added in future, but are not required yet so have not been included.

Note that this function removes any attributes which aren’t common between the loaded cubes. In general, we have found that this mainly means creation_date, tracking_id and history are deleted. If unsure, please check.

Parameters
  • directory (str) – Directory from which to load the data.

  • process_warnings (bool) – Should I process warnings to add e.g. missing metadata information?

Raises

ValueError – If the files in the directory are not from the same run (i.e. their filenames are not identical except for the timestamp) or if the files don’t form a continuous timeseries.
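
As a minimal usage sketch (the directory path below is hypothetical and assumed to contain a continuous set of timeslice files like those above):

from netcdf_scm.iris_cube_wrappers import ScmCube

cube = ScmCube()
# load and join all the timeslice files in the directory
cube.load_data_in_directory("./data/tas_Amon_HadCM3_rcp45_r1i1p1")
print(cube.cube)  # the resulting iris cube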

lon_dim

The longitude dimension of the data.

Type

iris.coords.DimCoord

lon_dim_number

The index which corresponds to the longitude dimension.

e.g. if longitude is the third dimension of the data, then self.lon_dim_number will be 2 (Python is zero-indexed).

Type

int

lon_name = 'longitude'

The expected name of the longitude co-ordinate in data.

Type

str

netcdf_scm_realm

The realm in which netCDF-SCM thinks the data belongs.

This is used to make decisions about how to take averages of the data and where to find metadata variables.

If it is not sure, netCDF-SCM will guess that the data belongs to the ‘atmosphere’ realm.

Type

str

surface_fraction_var

The name of the variable associated with the surface fraction in each gridbox.

If required, this is used when looking for the surface fraction file which belongs to a given data file. For example, if our data file is tas_Amon_HadCM3_rcp45_r1i1p1_200601.nc then surface_fraction_var can be used to work out the name of the associated surface fraction file. In some cases, it might be as simple as replacing tas with the value of surface_fraction_var.

Type

str

table_name_for_metadata_vars

The name of the ‘table’ in which metadata variables can be found.

For example, fx or Ofx.

We wrap this as a property as table typically means table_id but is sometimes referred to in other ways e.g. as mip_table in CMIP5.

Type

str

time_dim

The time dimension of the data.

Type

iris.coords.DimCoord

time_dim_number

The index which corresponds to the time dimension.

e.g. if time is the first dimension of the data, then self.time_dim_number will be 0 (Python is zero-indexed).

Type

int

time_name = 'time'

The expected name of the time co-ordinate in data.

Type

str

time_period_regex

Regular expression which captures the timeseries identifier in input data files.

For help on regular expressions, see the Python re module documentation, https://docs.python.org/3/library/re.html.

Type

_sre.SRE_Pattern

time_period_separator = '-'

Character used to separate time period strings in the time period indicator in filenames.

e.g. - is the ‘time period separator’ in “2015-2030”.

Type

str

timestamp_definitions

Definition of valid timestamp information and corresponding key values.

This follows the CMIP standards where time strings must be one of the following: YYYY, YYYYMM, YYYYMMDD, YYYYMMDDHH or one of the previous combined with a hyphen e.g. YYYY-YYYY.

Each key in the definitions dictionary is the length of the timestamp. Each value is itself a dictionary, with keys:

  • datetime_str: the string required to convert a timestamp of this length into a datetime using datetime.datetime.strptime

  • generic_regexp: a regular expression which will match timestamps in this format

  • expected_timestep: a dateutil.relativedelta.relativedelta object which contains the expected timestep in files with this timestamp

Returns

Return type

dict

Examples

>>> self.timestamp_definitions[len("2012")]["datetime_str"]
"%Y"

Weights API

Module which calculates the weights to be used when taking SCM-box averages

This typically requires considering both the fraction of each cell which is of the desired type (e.g. land or ocean) and the area of each cell. The combination of these two pieces of information creates the weights for each cell which are used when taking area-weighted means.

class netcdf_scm.weights.AreaSurfaceFractionWeightCalculator(cube, **kwargs)[source]

Bases: netcdf_scm.weights.CubeWeightCalculator

Calculates weights which are both area and surface fraction weighted

\[w(lat, lon) = a(lat, lon) \times s(lat, lon)\]

where \(w(lat, lon)\) is the weight of the cell at given latitude and longitude, \(a\) is area of the cell and \(s\) is the surface fraction of the cell (e.g. fraction of ocean area for ocean based regions).

get_weights(weights_names, log_failure=False)

Get a number of weights

Parameters
  • weights_names (list of str) – List of weights to attempt to load/calculate.

  • log_failure (bool) – Should failures be logged? If not, failures are raised as warnings instead.

Returns

dict of str – Dictionary where keys are weights names and values are np.ndarray of bool. The result only contains valid weights. Any invalid weights are dropped.

Return type

dict of str : np.ndarray

Notes

This method handles all exceptions and will only return weights which can actually be calculated. If no weights could be calculated, an empty dictionary will be returned.

get_weights_array(weights_name)

Get a single weights array

If the weights have previously been calculated the precalculated result is returned from the cache. Otherwise the appropriate WeightFunc is called with any kwargs specified in the constructor.

Parameters

weights_name (str) – Region to get weights for

Returns

Weights for the region specified by weights_name

Return type

ndarray[bool]

Raises

InvalidWeightsError – If the cube has no data which matches the input weights or is invalid in any other way

get_weights_array_without_area_weighting(weights_name)

Get a single normalised weights array without any consideration of area weighting

The weights are normalised to be in the range [0, 1]

Parameters

weights_name (str) – Region to get weights for

Returns

Weights, normalised to be in the range [0, 1]

Return type

np.ndarray

Raises
  • InvalidWeightsError – If the requested weights cannot be found or evaluated

  • ValueError – The retrieved weights are not normalised to the range [0, 1]

class netcdf_scm.weights.AreaWeightCalculator(cube, **kwargs)[source]

Bases: netcdf_scm.weights.CubeWeightCalculator

Calculates weights which are area weighted but surface fraction aware.

This means that any cells which have a surface fraction of zero will receive zero weight, otherwise cells are purely area weighted.

\[w(lat, lon) = \begin{cases} a(lat, lon), & s(lat, lon) > 0 \\ 0, & s(lat, lon) = 0 \end{cases}\]

where \(w(lat, lon)\) is the weight of the cell at given latitude and longitude, \(a\) is area of the cell and \(s\) is the surface fraction of the cell (e.g. fraction of ocean area for ocean based regions).

get_weights(weights_names, log_failure=False)

Get a number of weights

Parameters
  • weights_names (list of str) – List of weights to attempt to load/calculate.

  • log_failure (bool) – Should failures be logged? If not, failures are raised as warnings instead.

Returns

dict of str – Dictionary where keys are weights names and values are np.ndarray of bool. The result only contains valid weights. Any invalid weights are dropped.

Return type

dict of str : np.ndarray

Notes

This method handles all exceptions and will only return weights which can actually be calculated. If no weights could be calculated, an empty dictionary will be returned.

get_weights_array(weights_name)

Get a single weights array

If the weights have previously been calculated the precalculated result is returned from the cache. Otherwise the appropriate WeightFunc is called with any kwargs specified in the constructor.

Parameters

weights_name (str) – Region to get weights for

Returns

Weights for the region specified by weights_name

Return type

ndarray[bool]

Raises

InvalidWeightsError – If the cube has no data which matches the input weights or is invalid in any other way

get_weights_array_without_area_weighting(weights_name)

Get a single normalised weights array without any consideration of area weighting

The weights are normalised to be in the range [0, 1]

Parameters

weights_name (str) – Region to get weights for

Returns

Weights, normalised to be in the range [0, 1]

Return type

np.ndarray

Raises
  • InvalidWeightsError – If the requested weights cannot be found or evaluated

  • ValueError – The retrieved weights are not normalised to the range [0, 1]

class netcdf_scm.weights.CubeWeightCalculator(cube, **kwargs)[source]

Bases: abc.ABC

Computes weights for a given cube in a somewhat efficient manner.

Previously calculated weights are cached so each set of weights is only calculated once. This implementation trades off some additional memory overhead for the ability to generate arbitrary weights.

Adding new weights

Additional weights can be added to netcdf_scm.weights.weights. The values in weights should be WeightFunc’s. A WeightFunc is a function which takes a CubeWeightCalculator, an ScmCube and any additional keyword arguments (matching the signatures of the get_*_weights functions documented below). The function should return a numpy array with the same dimensionality as the ScmCube.

These WeightFunc’s can be composed together to create more complex functionality using e.g. multiply_weights.
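
A rough sketch of adding a custom weight, under the assumptions that the dictionary is exposed as netcdf_scm.weights.weights (as stated above; check your installed version) and that the WeightFunc signature matches the get_*_weights functions documented below. The region name and function are purely illustrative.

import numpy as np

import netcdf_scm.weights
from netcdf_scm.utils import broadcast_onto_lat_lon_grid


def get_tropics_weights(weight_calculator, cube, **kwargs):
    # hypothetical WeightFunc: weight of 1 between 30S and 30N, 0 elsewhere
    lat_points = cube.lat_dim.points
    in_tropics = ((lat_points > -30) & (lat_points < 30)).astype(float)
    # duplicate the 1D latitude weights along the longitude dimension
    lat_lon_weights = np.broadcast_to(in_tropics[:, np.newaxis], cube.lat_lon_shape)
    # broadcast onto all of the cube's dimensions, as required by the text above
    return broadcast_onto_lat_lon_grid(cube, lat_lon_weights)


# register the new weights (dictionary name taken from the description above)
netcdf_scm.weights.weights["World|Tropics"] = get_tropics_weights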

get_weights(weights_names, log_failure=False)[source]

Get a number of weights

Parameters
  • weights_names (list of str) – List of weights to attempt to load/calculate.

  • log_failure (bool) – Should failures be logged? If not, failures are raised as warnings instead.

Returns

dict of str – Dictionary where keys are weights names and values are np.ndarray of bool. The result only contains valid weights. Any invalid weights are dropped.

Return type

dict of str : np.ndarray

Notes

This method handles all exceptions and will only return weights which can actually be calculated. If no weights could be calculated, an empty dictionary will be returned.

get_weights_array(weights_name)[source]

Get a single weights array

If the weights have previously been calculated the precalculated result is returned from the cache. Otherwise the appropriate WeightFunc is called with any kwargs specified in the constructor.

Parameters

weights_name (str) – Region to get weights for

Returns

Weights for the region specified by weights_name

Return type

ndarray[bool]

Raises

InvalidWeightsError – If the cube has no data which matches the input weights or is invalid in any other way

get_weights_array_without_area_weighting(weights_name)[source]

Get a single normalised weights array without any consideration of area weighting

The weights are normalised to be in the range [0, 1]

Parameters

weights_name (str) – Region to get weights for

Returns

Weights, normalised to be in the range [0, 1]

Return type

np.ndarray

Raises
  • InvalidWeightsError – If the requested weights cannot be found or evaluated

  • ValueError – The retrieved weights are not normalised to the range [0, 1]

exception netcdf_scm.weights.InvalidWeightsError[source]

Bases: Exception

Raised when a weight cannot be calculated.

This error usually propagates. For example, if a child weight used in the calculation of a parent weight fails, then the parent weight should also raise an InvalidWeightsError exception (unless it can be satisfactorily handled).

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

netcdf_scm.weights.get_ar6_region_weights(region)[source]

Get a function to calculate the weights for a given AR6 region

AR6 regions defined in Iturbide et al., 2020 https://essd.copernicus.org/preprints/essd-2019-258/

Parameters

region (str) – AR6 region to extract

Returns

WeightFunc which weights out everything except the specified area

Return type

WeightFunc()

netcdf_scm.weights.get_binary_nh_weights(weight_calculator, cube, **kwargs)[source]

Get binary weights to only include the Northern Hemisphere

Parameters
  • weight_calculator (CubeWeightCalculator) – Cube weight calculator from which to retrieve the weights

  • cube (ScmCube) – Cube to create weights for

  • kwargs (Any) – Ignored (required for compatibility with CubeWeightCalculator)

Returns

Binary northern hemisphere weights

Return type

np.ndarray

netcdf_scm.weights.get_default_sftlf_cube[source]

Load netCDF-SCM’s default (last resort) surface land fraction cube

netcdf_scm.weights.get_land_weights(weight_calculator, cube, sftlf_cube=None, **kwargs)[source]

Get the land weights

The weights are always adjusted to have units of percentage. If the units are detected to be fraction rather than percentage, they will be automatically adjusted and a warning will be raised.

If the default sftlf cube is used, it is regridded onto the cube’s grid using linear interpolation. We hope to use area-weighted regridding in future but at the moment its performance is not good enough to be put into production (approximately 100x slower than the linear interpolation regridding).

Parameters
  • weight_calculator (CubeWeightCalculator) – Cube weight calculator from which to retrieve the weights

  • cube (ScmCube) – Cube to create weights for

  • sftlf_cube (ScmCube) – Cube containing the surface land-fraction data

  • kwargs (Any) – Ignored (required for compatibility with CubeWeightCalculator)

Returns

Land weights

Return type

np.ndarray

Raises

AssertionError – The land weights are incompatible with the cube’s lat-lon grid

netcdf_scm.weights.get_natural_earth_50m_scale_region_weights(region)[source]

Get a function to calculate the weights for a given Natural Earth defined region

We use the 50m scale from Natural Earth and the implementation provided by regionmask.

Parameters

region (str) – Natural Earth region to extract

Returns

WeightFunc which weights out everything except the specified area

Return type

WeightFunc()

netcdf_scm.weights.get_nh_weights(weight_calculator, cube, **kwargs)[source]

Get weights to only include the Northern Hemisphere

Parameters
  • weight_calculator (CubeWeightCalculator) – Cube weight calculator from which to retrieve the weights

  • cube (ScmCube) – Cube to create weights for

  • kwargs (Any) – Ignored (required for compatibility with CubeWeightCalculator)

Returns

Northern hemisphere weights

Return type

np.ndarray

netcdf_scm.weights.get_ocean_weights(weight_calculator, cube, sftof_cube=None, **kwargs)[source]

Get the ocean weights

The weights are always adjusted to have units of percentage.

Parameters
  • weight_calculator (CubeWeightCalculator) – Cube weight calculator from which to retrieve the weights

  • cube (ScmCube) – Cube to create weights for

  • sftof_cube (ScmCube) – Cube containing the surface ocean-fraction data

  • kwargs (Any) – Ignored (required for compatibility with CubeWeightCalculator)

Returns

Ocean weights

Return type

np.ndarray

Raises

AssertionError – The ocean weights are incompatible with the cube’s lat-lon grid

netcdf_scm.weights.get_sh_weights(weight_calculator, cube, **kwargs)[source]

Get weights to only include the Southern Hemisphere

Parameters
  • weight_calculator (CubeWeightCalculator) – Cube weight calculator from which to retrieve the weights

  • cube (ScmCube) – Cube to create weights for

  • kwargs (Any) – Ignored (required for compatibility with CubeWeightCalculator)

Returns

Southern hemisphere weights

Return type

np.ndarray

netcdf_scm.weights.get_weights_for_area(lower_lat, left_lon, upper_lat, right_lon)[source]

Weights a subset of the globe using latitudes (in degrees North) and longitudes (in degrees East)

Iris’ standard behaviour is to include any point whose bounds overlap with the given ranges e.g. if the range is (0, 130) then a cell whose bounds were (-90, 5) would be included even if its point were -42.5.

This can be altered with the ignore_bounds keyword argument to cube.intersection. In that case only cells whose points lie within the range are included, so if the range is (0, 130) then a cell whose bounds were (-90, 5) would be excluded if its point were -42.5.

Here we follow the ignore_bounds=True behaviour (i.e. only include a cell if its point lies within the specified range). Only including a cell if its entire box lies within the range would require further work; given this isn’t available in iris, and it seems to be an unusual way to do intersection, we haven’t implemented it.

Circular coordinates (longitude) can cross 0°E.

Parameters
  • lower_lat (int or float) – Lower latitude bound (degrees North)

  • left_lon (int or float) – Lower longitude bound (degrees East)

  • upper_lat (int or float) – Upper latitude bound (degrees North)

  • right_lon (int or float) – Upper longitude bound (degrees East)

Returns

WeightFunc which weights out everything except the specified area

Return type

WeightFunc()
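
As a usage sketch, the returned function behaves like any other WeightFunc (the box below is the El Nino 3.4 region; the weight_calculator and cube objects are assumed to have been created already):

from netcdf_scm.weights import get_weights_for_area

# El Nino 3.4 box: 5S-5N, 190E-240E
nino34_weight_func = get_weights_for_area(-5, 190, 5, 240)
# weights = nino34_weight_func(weight_calculator, cube)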

netcdf_scm.weights.get_world_weights(weight_calculator, cube, **kwargs)[source]

Get weights for the world

Parameters
  • weight_calculator (CubeWeightCalculator) – Cube weight calculator from which to retrieve the weights

  • cube (ScmCube) – Cube to create weights for

  • kwargs (Any) – Ignored (required for compatibility with CubeWeightCalculator)

Returns

Weights which can be used for the world mean calculation

Return type

np.ndarray

netcdf_scm.weights.multiply_weights(weight_a, weight_b)[source]

Take the product of two weights

Parameters
  • weight_a (str or WeightFunc) – If a string is provided, the weights specified by the string are retrieved. Otherwise the WeightFunc is evaluated at runtime

  • weight_b (str or WeightFunc) – If a string is provided, the weights specified by the string are retrieved. Otherwise the WeightFunc is evaluated at runtime

Returns

WeightFunc which multiplies the input weights

Return type

WeightFunc()

netcdf_scm.weights.subtract_weights(weights_to_subtract, subtract_from)[source]

Subtract weights from some other number

For example, this is useful to convert from land fractions to ocean fractions (where ocean fractions are 1 - land fractions)

Parameters
  • weights_to_subtract (str) – Name of the weights to subtract. These weights are loaded at evaluation time.

  • subtract_from (float) – The number from which to subtract the values of weights_to_subtract (once loaded)

Returns

WeightFunc which subtracts the input weights from subtract_from

Return type

WeightFunc()
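
A hedged sketch of the land-to-ocean example above. The region name is an assumption; note that netCDF-SCM's land weights are expressed as percentages (see get_land_weights), so here we subtract from 100 rather than 1.

from netcdf_scm.weights import subtract_weights

# ocean weights as the complement of land weights
ocean_weight_func = subtract_weights("World|Land", 100)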

Citing API

Helper tools for citing Coupled Model Intercomparison Project data

netcdf_scm.citing.check_licenses(scmruns)[source]

Check datasets for non-standard licenses

Non-standard licenses result in a warning

Parameters

scmruns (list of scmdata.ScmRun) – Datasets to check the licenses of

Returns

Datasets with non-standard licenses

Return type

list of scmdata.ScmRun

netcdf_scm.citing.get_citation_tables(database)[source]

Get citation tables for a given set of CMIP data

Parameters

database (list of ScmRun) – Set of CMIP data for which we want to create citation tables

Returns

dict of str – Dictionary containing the citation table and bibtex references for each MIP era used in database

Return type

dict of str : Union[List, pd.DataFrame]

Raises

ValueError – Any ScmRun in database has a mip_era other than “CMIP5” or “CMIP6”
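
A sketch of how these helpers might be combined (the list of crunched files is hypothetical):

from netcdf_scm.citing import check_licenses, get_citation_tables
from netcdf_scm.io import load_scmrun

crunched_files = ["netcdf-scm_tas_Amon_CanESM2_1pctCO2_r1i1p1.nc"]  # hypothetical
database = [load_scmrun(f) for f in crunched_files]
non_standard = check_licenses(database)  # datasets whose licenses need a closer look
citation_tables = get_citation_tables(database)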

Command-line interface

netcdf-scm

NetCDF-SCM’s command-line interface

netcdf-scm [OPTIONS] COMMAND [ARGS]...

Options

--log-level <log_level>
Options

DEBUG | INFO | WARNING | ERROR | EXCEPTION | CRITICAL

crunch

Crunch data in src to netCDF-SCM .nc files in dst.

src is searched recursively and netcdf-scm will attempt to crunch all the files found. The directory structure in src will be mirrored in dst.

Failures and warnings are recorded and written into a text file in dst. We recommend examining this file using a file analysis tool such as grep. We often use the command grep "WARNING\|INFO\|ERROR" <log-file>.

crunch_contact is written into the output .nc files’ crunch_contact attribute.

netcdf-scm crunch [OPTIONS] SRC DST CRUNCH_CONTACT

Options

--drs <drs>

Data reference syntax to use for crunching.

Default

Scm

Options

Scm | MarbleCMIP5 | CMIP6Input4MIPs | CMIP6Output

--regexp <regexp>

Regular expression to apply to file directory (only crunches matches). Be careful, if you use a very complex regexp directory sorting can be extremely slow (see e.g. discussion at https://stackoverflow.com/a/5428712)!

Default

^(?!.*(fx)).*$

--regions <regions>

Comma-separated regions to crunch.

Default

World,World|Northern Hemisphere,World|Southern Hemisphere,World|Land,World|Ocean,World|Northern Hemisphere|Land,World|Southern Hemisphere|Land,World|Northern Hemisphere|Ocean,World|Southern Hemisphere|Ocean

--data-sub-dir <data_sub_dir>

Sub-directory of dst to save data in.

Default

netcdf-scm-crunched

-f, --force, --do-not-force

Overwrite any existing files.

Default

False

--small-number-workers <small_number_workers>

Maximum number of workers to use when crunching files.

Default

10

--small-threshold <small_threshold>

Maximum number of data points (in millions) in a file for it to be processed in parallel with small-number-workers

Default

50.0

--medium-number-workers <medium_number_workers>

Maximum number of workers to use when crunching files too large for small-number-workers (i.e. above small-threshold but at most medium-threshold).

Default

3

--medium-threshold <medium_threshold>

Maximum number of data points (in millions) in a file for it to be processed in parallel with medium-number-workers

Default

120.0

--force-lazy-threshold <force_lazy_threshold>

Maximum number of data points (in millions) in a file for it to be processed in memory

Default

1000.0

--cell-weights <cell_weights>

How to weight cells when calculating aggregates. If ‘area-surface-fraction’, land surface fraction weights will be included when taking cell means. If ‘area-only’, land surface fraction weights will not be included when taking cell means, hence cells will only be weighted by their area. If nothing is provided, netCDF-SCM will guess whether land surface fraction weights should be included or not based on the data being processed. See netcdf_scm.iris_cube_wrappers.ScmCube.get_scm_timeseries_weights() for more details.

Options

area-only | area-surface-fraction

Arguments

SRC

Required argument

DST

Required argument

CRUNCH_CONTACT

Required argument
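
For example, a typical crunching call might look like the following (the paths and contact details are placeholders):

netcdf-scm crunch /data/cmip6 /data/crunched "Jane Doe <jane@example.com>" --drs CMIP6Output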

stitch

Stitch netCDF-SCM .nc files together and write out in the specified format.

SRC is searched recursively and netcdf-scm will attempt to stitch all the files found. Output is written in DST.

STITCH_CONTACT is written into the header of the output files.

netcdf-scm stitch [OPTIONS] SRC DST STITCH_CONTACT

Options

--regexp <regexp>

Regular expression to apply to file directory (only stitches matches). Be careful, if you use a very complex regexp directory sorting can be extremely slow (see e.g. discussion at https://stackoverflow.com/a/5428712)!

Default

^(?!.*(fx)).*$

--prefix <prefix>

Prefix to apply to output file names (not paths).

--out-format <out_format>

Format to re-write crunched data into. The time operation conventions follow those in Pymagicc.

Default

mag-files

Options

mag-files | mag-files-average-year-start-year | mag-files-average-year-mid-year | mag-files-average-year-end-year | mag-files-point-start-year | mag-files-point-mid-year | mag-files-point-end-year | magicc-input-files | magicc-input-files-average-year-start-year | magicc-input-files-average-year-mid-year | magicc-input-files-average-year-end-year | magicc-input-files-point-start-year | magicc-input-files-point-mid-year | magicc-input-files-point-end-year | tuningstrucs-blend-model

--drs <drs>

Data reference syntax to use to decipher paths. This is required to ensure the output folders match the input data reference syntax.

Default

None

Options

None | MarbleCMIP5 | CMIP6Input4MIPs | CMIP6Output

-f, --force, --do-not-force

Overwrite any existing files.

Default

False

--number-workers <number_workers>

Number of workers (threads) to use when stitching.

Default

4

--target-units-specs <target_units_specs>

CSV file containing target units for stitched variables.

--normalise <normalise>

How to normalise the data relative to piControl (if not provided, no normalisation is performed).

Options

31-yr-mean-after-branch-time | 21-yr-running-mean | 21-yr-running-mean-dedrift | 30-yr-running-mean | 30-yr-running-mean-dedrift

Arguments

SRC

Required argument

DST

Required argument

STITCH_CONTACT

Required argument
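
For example (again with placeholder paths and contact details):

netcdf-scm stitch /data/crunched /data/stitched "Jane Doe <jane@example.com>" --drs CMIP6Output --normalise 21-yr-running-mean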

wrangle

Wrangle netCDF-SCM .nc files into other formats and directory structures.

src is searched recursively and netcdf-scm will attempt to wrangle all the files found.

wrangle_contact is written into the header of the output files.

netcdf-scm wrangle [OPTIONS] SRC DST WRANGLE_CONTACT

Options

--regexp <regexp>

Regular expression to apply to file directory (only wrangles matches). Be careful, if you use a very complex regexp directory sorting can be extremely slow (see e.g. discussion at https://stackoverflow.com/a/5428712)!

Default

^(?!.*(fx)).*$

--prefix <prefix>

Prefix to apply to output file names (not paths).

--out-format <out_format>

Format to re-write crunched data into. The time operation conventions follow those in Pymagicc.

Default

mag-files

Options

mag-files | mag-files-average-year-start-year | mag-files-average-year-mid-year | mag-files-average-year-end-year | mag-files-point-start-year | mag-files-point-mid-year | mag-files-point-end-year | magicc-input-files | magicc-input-files-average-year-start-year | magicc-input-files-average-year-mid-year | magicc-input-files-average-year-end-year | magicc-input-files-point-start-year | magicc-input-files-point-mid-year | magicc-input-files-point-end-year | tuningstrucs-blend-model

--drs <drs>

Data reference syntax to use to decipher paths. This is required to ensure the output folders match the input data reference syntax.

Default

None

Options

None | MarbleCMIP5 | CMIP6Input4MIPs | CMIP6Output

-f, --force, --do-not-force

Overwrite any existing files.

Default

False

--number-workers <number_workers>

Number of workers (threads) to use when wrangling.

Default

4

--target-units-specs <target_units_specs>

CSV file containing target units for wrangled variables.

Arguments

SRC

Required argument

DST

Required argument

WRANGLE_CONTACT

Required argument

Crunching API

Module for crunching raw netCDF data into netCDF-SCM netCDF files

Definitions API

Miscellaneous definitions used in netCDF-SCM

netcdf_scm.definitions.NAME_COMPONENTS_SEPARATOR = '_'

Character assumed to separate different components within a filename

For example, if we come across a filename like ‘tas_r1i1p1f1_UoM-Fancy’ then we assume that ‘tas’, ‘r1i1p1f1’ and ‘UoM-Fancy’ all refer to different bits of metadata which are encoded within the filename.

Type

str

netcdf_scm.definitions.OUTPUT_PREFIX = 'netcdf-scm'

Prefix attached to outputs from netCDF-SCM by default

Type

str

Errors API

netCDF-SCM’s custom error handling

exception netcdf_scm.errors.NoLicenseInformationError[source]

Bases: AttributeError

Exception raised when a dataset contains no license information

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception netcdf_scm.errors.NonStandardLicenseError[source]

Bases: ValueError

Exception raised when a dataset contains a non-standard license

For example, if a CMIP6 dataset does not contain a Creative Commons Attribution ShareAlike 4.0 International License

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

netcdf_scm.errors.raise_no_iris_warning()[source]

Raise a warning that iris is not installed

Warns

UserWarning – Iris is not installed

IO API

Input and output from netCDF-SCM’s netCDF format

netcdf_scm.io.get_scmcube_helper(drs)[source]

Get ScmCube helper for a given data reference syntax

Parameters

drs (str) – Data reference syntax to get the helper cube for

Returns

Instance of sub-class of netcdf_scm.iris_cube_wrappers.ScmCube which matches the input data reference syntax

Return type

netcdf_scm.iris_cube_wrappers.ScmCube

netcdf_scm.io.load_mag_file(infile, drs)[source]

Load .MAG file with automatic infilling of metadata if possible

Parameters
  • infile (str) – File to load (use the full path for best results as this is used to determine the metadata)

  • drs (str) – Data reference syntax to use with this file

Returns

pymagicc.io.MAGICCData with the data and metadata contained in the file.

Return type

pymagicc.io.MAGICCData

Warns

UserWarning – Some or all of the metadata couldn’t be determined from infile with the given drs.
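
A minimal usage sketch (the file path is hypothetical):

from netcdf_scm.io import load_mag_file

mag = load_mag_file(
    "/data/stitched/netcdf-scm_tas_Amon_HadCM3_rcp45_r1i1p1.MAG",
    "MarbleCMIP5",
)
print(mag.metadata)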

netcdf_scm.io.load_scmrun(path)[source]

Load an scmdata.ScmRun instance from a netCDF-SCM .nc file

Parameters

path (str) – Path from which to load the data

Returns

scmdata.ScmRun containing the data in path.

Return type

scmdata.ScmRun

netcdf_scm.io.save_netcdf_scm_nc(cubes, out_path)[source]

Save a series of cubes to a .nc file

Parameters
  • cubes (dict) – Dictionary of “region name”-ScmCube key-value pairs. The cubes will all be saved in the same .nc file.

  • out_path (str) – Path in which to save the data

Miscellaneous Readers API

Miscellaneous readers for files which can’t otherwise be read

netcdf_scm.misc_readers.read_cmip6_concs_gmnhsh(filepath, region_coord_name='sector')[source]

Read CMIP6 concentrations global and hemispheric mean data

Parameters
  • filepath (str) – Filepath from which to read the data

  • region_coord_name (str) – The name of the co-ordinate which represents the region in the datafile.

Returns

scmdata.ScmRun containing the global and hemispheric mean data

Return type

scmdata.ScmRun

Raises

AssertionError – Defensive assertion: the code is being used in an unexpected way

Normalisation API

Normalisation handling

Within netCDF-SCM, ‘normalisation’ refers to taking anomalies from some set of reference values. For example, subtracting a 21-year running mean of a pre-industrial control experiment from the results of a projection experiment.
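
As a sketch of how this fits together (the key is taken from the command-line interface's --normalise options; indata, picontrol and branch_time are assumed to have been loaded already):

from netcdf_scm.normalisation import get_normaliser

normaliser = get_normaliser("21-yr-running-mean")
# branch_time is a datetime.datetime, indata and picontrol are scmdata.ScmRun
# normalised = normaliser.normalise_against_picontrol(indata, picontrol, branch_time)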

netcdf_scm.normalisation.get_normaliser(key)[source]

Get the appropriate normaliser for a given key

Parameters

key (str) – Key which specifies the type of normaliser to get

Returns

Normaliser appropriate for key

Return type

netcdf_scm.normalisation.base.Normaliser

Raises

ValueError – key cannot be mapped to a known normaliser

Base API

Base class for normalisation operations

class netcdf_scm.normalisation.base.Normaliser[source]

Bases: abc.ABC

Base class for normalising operations

get_reference_values(indata, picontrol, picontrol_branching_time)[source]

Get reference values for an experiment from its equivalent piControl experiment

Parameters
  • indata (scmdata.ScmRun) – Experiment to calculate reference values for

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Reference values with the same index and columns as indata

Return type

pd.DataFrame

method_name

Name of the method used for normalisation

This string is included in the metadata of normalised data/files.

Type

str

normalise_against_picontrol(indata, picontrol, picontrol_branching_time)[source]

Normalise data against picontrol

Parameters
  • indata (scmdata.ScmRun) – Data to normalise

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Normalised data including metadata about the file which was used for normalisation and the normalisation method

Return type

scmdata.ScmRun

After branch time mean API

Module for the normaliser which calculates anomalies from a mean of a fixed number of years in the pre-industrial control run

class netcdf_scm.normalisation.after_branch_time_mean.AfterBranchTimeMean[source]

Bases: netcdf_scm.normalisation.base.Normaliser

Normaliser which calculates anomalies from a mean of a fixed number of years after the branch time in the pre-industrial control run

At present, only a 31-year mean after the branch time is implemented.

get_reference_values(indata, picontrol, picontrol_branching_time)

Get reference values for an experiment from its equivalent piControl experiment

Parameters
  • indata (scmdata.ScmRun) – Experiment to calculate reference values for

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Reference values with the same index and columns as indata

Return type

pd.DataFrame

method_name

Name of the method used for normalisation

This string is included in the metadata of normalised data/files.

Type

str

normalise_against_picontrol(indata, picontrol, picontrol_branching_time)

Normalise data against picontrol

Parameters
  • indata (scmdata.ScmRun) – Data to normalise

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Normalised data including metadata about the file which was used for normalisation and the normalisation method

Return type

scmdata.ScmRun

Running mean API

Module for the normaliser which calculates anomalies from a running mean in the pre-industrial control run

class netcdf_scm.normalisation.running_mean.NormaliserRunningMean(nyears=21)[source]

Bases: netcdf_scm.normalisation.base.Normaliser

Normaliser which calculates anomalies from a running mean in the pre-industrial control run

Each normalisation value is an n-year mean, centred on the equivalent point in the pre-industrial control simulation. If there is insufficient data to create a full n-year window at the edge of the simulation then a linear extrapolation of the running-mean is used to extend the normalisation values to cover the required full range.

get_reference_values(indata, picontrol, picontrol_branching_time)

Get reference values for an experiment from its equivalent piControl experiment

Parameters
  • indata (scmdata.ScmRun) – Experiment to calculate reference values for

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Reference values with the same index and columns as indata

Return type

pd.DataFrame

method_name

Name of the method used for normalisation

This string is included in the metadata of normalised data/files.

Type

str

normalise_against_picontrol(indata, picontrol, picontrol_branching_time)

Normalise data against picontrol

Parameters
  • indata (scmdata.ScmRun) – Data to normalise

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Normalised data including metadata about the file which was used for normalisation and the normalisation method

Return type

scmdata.ScmRun

Running mean de-drift API

Module for the normaliser which only removes drift in the pre-industrial control run (drift is calculated using a running-mean)

class netcdf_scm.normalisation.running_mean_dedrift.NormaliserRunningMeanDedrift(nyears=21)[source]

Bases: netcdf_scm.normalisation.running_mean.NormaliserRunningMean

Normaliser which calculates drift in the pre-industrial control using a running mean

Each normalisation value is the change in an n-year mean with respect to the running mean at the branch point. This means that the reference values are always zero in their first timestep. Each point is centred on the equivalent point in the pre-industrial control simulation.

If there is insufficient data to create a full n-year window at the edge of the simulation then a linear extrapolation of the running-mean is used to extend the normalisation values to cover the required full range.

get_reference_values(indata, picontrol, picontrol_branching_time)

Get reference values for an experiment from its equivalent piControl experiment

Parameters
  • indata (scmdata.ScmRun) – Experiment to calculate reference values for

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Reference values with the same index and columns as indata

Return type

pd.DataFrame

method_name

Name of the method used for normalisation

This string is included in the metadata of normalised data/files.

Type

str

normalise_against_picontrol(indata, picontrol, picontrol_branching_time)

Normalise data against picontrol

Parameters
  • indata (scmdata.ScmRun) – Data to normalise

  • picontrol (scmdata.ScmRun) – Pre-industrial control run data

  • picontrol_branching_time (datetime.datetime) – The branching time in the pre-industrial experiment. It is assumed that the first timepoint in input follows immediately from this branching time.

Returns

Normalised data including metadata about the file which was used for normalisation and the normalisation method

Return type

scmdata.ScmRun

Output API

Module for handling crunching output tracking

This module handles checking whether a file has already been crunched and if its source files have been updated since it was last crunched.

class netcdf_scm.output.OutputFileDatabase(out_dir)[source]

Bases: object

Holds a list of output files which have been written.

Also keeps track of the source files used to create each output file.

contains_file(filepath)[source]

Return whether a filepath exists in the database

Parameters

filepath (str) – Filepath to check (use absolute paths to be safe)

Returns

If the file is in the database, True, otherwise False

Return type

bool

dump()[source]

Rewrite the entire file

load_from_file()[source]

Load database from self.out_dir

Returns

Handle to the loaded filepath

Return type

io.TextIOWrapper

Raises

ValueError – The loaded file contains more than one entry for a given filename

register(out_fname, info)[source]

Register a filepath with info in the database

Parameters
  • out_fname (str) – Filepath to register

  • info (dict) – out_fname’s metadata

Retractions API

Utilities for checking for retracted datasets

netcdf_scm.retractions.check_depends_on_retracted(mag_files, raise_on_mismatch=True, **kwargs)[source]

Check if a .MAG file was calculated from now retracted data

Notes

This queries external ESGF servers. Please limit the number of parallel requests.

Parameters
  • mag_files (list of str) – List of .MAG files to check

  • raise_on_mismatch (bool) – If a file cannot be processed, should an error be raised? If False, an error message is logged instead.

  • **kwargs (any) – Passed to check_retractions()

Returns

Dataframe which describes the retracted status of each file in mag_files. The columns are:

  • "mag_file": the files in mag_files

  • "dependency_file": the file which the file in the "mag_file" column depends on (note that .MAG files may have more than one dependency, so they may appear more than once in the "mag_file" column)

  • "dependency_instance_id": instance id (i.e. unique ESGF identifier) of the dependency file

  • "dependency_retracted": whether the dependency file has been retracted or not (True if the file has been retracted)

The list of retracted .MAG files can then be accessed with e.g. res.loc[res["dependency_retracted"], "mag_file"].unique()

Return type

pd.DataFrame

Raises
  • ValueError – The .MAG file is not based on CMIP6 data (retractions cannot be checked automatically for CMIP5 data with netCDF-SCM).

  • ValueError – Metadata about a .MAG file’s source is not included in the .MAG file.

netcdf_scm.retractions.check_retracted_files(filenames_or_dir, filename_filter='*.nc', **kwargs)[source]

Check if any files are retracted

Notes

This queries external ESGF servers. Please limit the number of parallel requests.

Parameters
  • filenames_or_dir (list of str or str) – A list of filenames or a directory to check for any retractions. If a string is provided, it is assumed to reference a directory and any files within that directory matching the filename_filter will be checked.

  • filename_filter (str) – If a directory is passed all files matching the filter will be checked.

  • **kwargs (any) – Passed to check_retracted()

Returns

List of the retracted files

Return type

list

netcdf_scm.retractions.check_retractions(instance_ids, esgf_query_batch_size=100, nworkers=8)[source]

Check a list of instance_ids for any retracted datasets

Notes

This queries external ESGF servers. Please limit the number of parallel requests.

Parameters
  • instance_ids (list of str) – Datasets to check. instance_id is the unique identifier for a dataset, for example CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.esm-hist.r1i1p1f1.Amon.rsut.gn.v20191128

  • esgf_query_batch_size (int) – Maximum number of ids to include in each query.

  • nworkers (int) – Number of workers to use for parallel queries to ESGF.

Returns

A list of retracted instance_ids

Return type

list of str
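
A short sketch, re-using the instance id from the parameter description above:

from netcdf_scm.retractions import check_retractions

retracted = check_retractions(
    ["CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.esm-hist.r1i1p1f1.Amon.rsut.gn.v20191128"]
)
print(retracted)  # any retracted instance ids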

Stitching API

Module for stitching netCDF-SCM netCDF files together

‘Stitching’ here means combining results from multiple experiments e.g. combining historical and scenario experiments. This relies on the ‘parent’ conventions within CMIP experiments which define the experiment from which a given set of output started (in CMIP language, the experiment from which a given experiment ‘branched’).

netcdf_scm.stitching.get_branch_time(openscmrun, parent=True, source_path=None, parent_path=None)[source]

Get branch time of an experiment

Parameters
  • openscmrun (scmdata.ScmRun) – Data of which to get the branch time

  • parent (bool) – Should I get the branch time in the parent experiment’s time co-ordinates? If False, return the branch time in the child (i.e. openscmrun’s) time co-ordinates.

  • source_path (str) – Path to the data file from which openscmrun is derived. This is only required if parent is False. It is needed because information about the time calendar and units of the data in openscmrun is only available in the source file.

  • parent_path (str) – Path to the data file containing the parent data of openscmrun. This is only required if the data is from CMIP5 because CMIP5 data does not store information about the parent experiment’s time calendar and units.

Returns

The branch time, rounded to the nearest year, month and day. netCDF-SCM is not designed for very precise calculations; if you need to keep finer information, please raise an issue on our issue tracker to discuss.

Return type

datetime.datetime

Raises
  • ValueError – parent is not True and the data is CMIP5 data. It is impossible to determine the branch time in the child time co-ordinates from CMIP5 data because of a lack of information.

  • ValueError – parent_path is None and the data is CMIP5 data. You must supply the parent path if the data is CMIP5 data because the parent file is the only place the parent experiment’s time units and calendar information is available.

netcdf_scm.stitching.get_continuous_timeseries_with_meta(infile, drs, return_picontrol_info=True, log_warning=False)[source]

Load a continuous timeseries with metadata

Continuous here means including all parent experiments up to (but not including) piControl

Parameters
  • infile (str) – netCDF-SCM crunched file to load

  • drs (str) – Data reference syntax which applies to this file

  • return_picontrol_info (bool) – If True, piControl information will be returned in the second and third outputs where available (rather than None). A caveat is that if the experiment itself is a piControl experiment, None will be returned in the second and third outputs.

  • log_warning (bool) – Should warnings be logged? If False, warnings are raised with warnings.warn instead.

Returns

  • scmdata.ScmRun – Loaded timeseries, including metadata

  • dt.datetime – Branch time from piControl. If infile points to a piControl or piControl-spinup experiment then this will be None.

  • str – Path from which the piControl data was loaded. If infile points to a piControl or piControl-spinup experiment then this will be None.
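
For example, a hedged sketch of loading a continuous timeseries (the file path is hypothetical):

from netcdf_scm.stitching import get_continuous_timeseries_with_meta

run, picontrol_branch_time, picontrol_file = get_continuous_timeseries_with_meta(
    "/data/crunched/netcdf-scm_tas_Amon_HadCM3_rcp45_r1i1p1.nc",
    "MarbleCMIP5",
)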

netcdf_scm.stitching.get_parent_file_path(infile, parent_replacements, drs, log_warning=False)[source]

Get parent file path for a given file

If multiple versions are available the latest version is chosen.

Parameters
  • infile (str) – File path of which to get the parent

  • parent_replacements (dict of str : str) – Replacements to insert in infile to determine the parent filepath

  • drs (str) – Data reference syntax which is applicable to these filepaths

  • log_warning (bool) – Should a warning be logged? If not, the warning is raised using warnings.warn instead.

Returns

Path of the parent file

Return type

str

netcdf_scm.stitching.get_parent_replacements(scmdf)[source]

Get changes in metadata required to identify a dataset’s parent file

Parameters

scmdf (scmdata.ScmRun) – Dataset of which to identify the parent file

Returns

dict of str – Replacements which must be made to the dataset’s metadata in order to identify its parent file

Return type

dict of str : str

Raises

KeyError – The variant label (e.g. r1i1p1f1) of the parent dataset is missing

netcdf_scm.stitching.step_up_family_tree(in_level)[source]

Step name up the family tree

Parameters

in_level (str) – Level from which to step up

Returns

Level one up from in_level

Return type

str

Examples

>>> step_up_family_tree("(child)")
"(parent)"
>>> step_up_family_tree("(parent)")
"(grandparent)"
>>> step_up_family_tree("(grandparent)")
"(grandparent)"
>>> step_up_family_tree("(greatgreatgrandparent)")
"(greatgreatgreatgrandparent)"

Utils API

Utils contains a number of helpful functions for doing common cube operations.

For example, applying masks to cubes, taking latitude-longitude means and getting timeseries from a cube as datetime values.

netcdf_scm.utils.apply_mask(in_scmcube, in_mask)[source]

Apply a mask to an scm cube’s data

Parameters
  • in_scmcube (ScmCube) – An ScmCube instance.

  • in_mask (np.ndarray) – The mask to apply

Returns

A copy of the input cube with the mask applied to its data

Return type

ScmCube

netcdf_scm.utils.assert_all_time_axes_same(time_axes)[source]

Assert all time axes in a set are the same.

Parameters

time_axes (list_like of array_like) – List of time axes to compare.

Raises

AssertionError – If not all time axes are the same.

netcdf_scm.utils.broadcast_onto_lat_lon_grid(cube, array_in)[source]

Broadcast an array onto the latitude-longitude grid of cube.

Here, broadcasting means taking the array and ‘duplicating’ it so that it has the same number of dimensions as the cube’s underlying data.

For example, given a cube with a time dimension of length 3, a latitude dimension of length 4 and a longitude dimension of length 2 (shape 3x4x2) and array_in of shape 4x2, results in a 3x4x2 array where each slice in the broadcasted array’s time dimension is identical to array_in.

Parameters
  • cube (ScmCube) – ScmCube instance whose lat-lon grid we want to check against

  • array_in (np.ndarray) – The array we want to broadcast

Returns

The original array, broadcast onto the cube’s lat-lon grid (i.e. duplicated along all dimensions except for latitude and longitude). Note: If the cube has lazy data, we return a da.Array, otherwise we return an np.ndarray.

Return type

array_out

Raises
  • AssertionErrorarray_in cannot be broadcast onto the cube’s lat-lon grid because their shapes are not compatible

  • ValueErrorarray_in cannot be broadcast onto the cube’s lat-lon grid by iris.util.broadcast_to_shape

netcdf_scm.utils.cube_lat_lon_grid_compatible_with_array(cube, array_in)[source]

Assert that an array can be broadcast onto the cube’s lat-lon grid

Parameters
  • cube (ScmCube) – ScmCube instance whose lat-lon grid we want to check against

  • array_in (np.ndarray) – The array we want to ensure can be broadcast

Returns

True if the cube’s lat-lon grid is compatible with array_in, otherwise False

Return type

bool

Raises

AssertionError – The array cannot be broadcast onto the cube’s lat-lon grid

netcdf_scm.utils.get_cube_timeseries_data(scm_cube, realise_data=False)[source]

Get a timeseries from a cube.

This function only works on cubes which are on a time grid only i.e. have no other dimension coordinates.

Parameters
  • scm_cube (ScmCube) – An ScmCube instance with only a ‘time’ dimension.

  • realise_data (bool) – If True, force the data to be realised before returning

Returns

The cube’s timeseries data. If realise_data is False then a da.Array will be returned if the data is lazy.

Return type

np.ndarray

netcdf_scm.utils.get_scm_cube_time_axis_in_calendar(scm_cube, calendar)[source]

Get a cube’s time axis in a given calendar

Parameters
  • scm_cube (ScmCube) – An ScmCube instance.

  • calendar (str) – The calendar to return the time axis in e.g. ‘365_day’, ‘gregorian’.

Returns

Array of datetimes, in the given calendar.

Return type

np.ndarray

netcdf_scm.utils.take_lat_lon_mean(in_scmcube, in_weights)[source]

Take the latitude longitude mean of a cube with given weights

Parameters
  • in_scmcube (ScmCube) – An ScmCube instance.

  • in_weights (np.ndarray) – Weights to use when taking the mean.

Returns

First output is a copy of the input cube in which the data is now the latitude-longitude mean of the input cube’s data. Second output is the sum of weights i.e. normalisation used in the weighted mean.

Return type

ScmCube, float

netcdf_scm.utils.unify_lat_lon(cubes, rtol=1e-06)[source]

Unify latitude and longitude co-ordinates of cubes in place.

The co-ordinates will only be unified if they already match to within a given tolerance.

Parameters
  • cubes (iris.cube.CubeList) – List of iris cubes whose latitude and longitude co-ordinates should be unified.

  • rtol (float) – Maximum relative difference which can be accepted between co-ordinate values.

Raises

ValueError – If the co-ordinates differ by more than relative tolerance or are not compatible (e.g. different shape).

Wranglers API

Functions used to ‘wrangle’ netCDF-SCM netCDF files into other formats

netcdf_scm.wranglers.convert_scmdf_to_tuningstruc(scmdf, outdir, prefix=None, force=False)[source]

Convert an scmdata.ScmRun to a matlab tuningstruc

One tuningstruc file will be created for each unique [“model”, “scenario”, “variable”, “region”, “unit”] combination in the input scmdata.ScmRun.

Parameters
  • scmdf (scmdata.ScmRun) – scmdata.ScmRun to convert to a tuningstruc

  • outdir (str) – Directory in which to save the tuningstruc

  • prefix (str) – Prefix for the filename. The rest of the filename is generated from the metadata. .mat is also appended automatically. If None, no prefix is used.

  • force (bool) – If True, overwrite any existing files

Returns

List of files which were not re-written as they already exist

Return type

list

Raises

AssertionError – If timeseries are not unique for a given [“climate_model”, “model”, “scenario”, “variable”, “region”, “unit”] combination.

netcdf_scm.wranglers.convert_tuningstruc_to_scmdf(filepath, variable=None, region=None, unit=None, scenario=None, model=None)[source]

Convert a matlab tuningstruc to an scmdata.ScmRun

Parameters
  • filepath (str) – Filepath from which to load the data

  • variable (str) – Name of the variable contained in the tuningstruc. If None, convert_tuningstruc_to_scmdf will attempt to determine it from the input file.

  • region (str) – Region to which the data in the tuningstruc applies. If None, convert_tuningstruc_to_scmdf will attempt to determine it from the input file.

  • unit (str) – Units of the data in the tuningstruc. If None, convert_tuningstruc_to_scmdf will attempt to determine it from the input file.

  • scenario (str) – Scenario to which the data in the tuningstruc applies. If None, convert_tuningstruc_to_scmdf will attempt to determine it from the input file.

  • model (str) – The (integrated assessment) model which generated the emissions scenario associated with the data in the tuningstruc. If None, convert_tuningstruc_to_scmdf will attempt to determine it from the input file and if it cannot, it will be set to “unspecified”.

Raises

KeyError – If a metadata variable is not supplied and it cannot be determined from the tuningstruc.

Returns

scmdata.ScmRun with the tuningstruc data

Return type

scmdata.ScmRun

netcdf_scm.wranglers.get_tuningstruc_name_from_df(df, outdir, prefix)[source]

Get the name of a tuningstruc from a pd.DataFrame

Parameters
  • df (pd.DataFrame) – pandas DataFrame to convert to a tuningstruc

  • outdir (str) – Base path to which the metadata-based filename and .mat extension are appended.

  • prefix (str) – Prefix to prepend to the name. If None, no prefix is prepended.

Returns

tuningstruc name

Return type

str

Raises

ValueError – A name cannot be determined because e.g. more than one scenario is contained in the dataframe

Wrangling API

Module for wrangling netCDF-SCM netCDF files into other formats

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

The changes listed in this file are categorised as follows:

  • Added: new features

  • Changed: changes in existing functionality

  • Deprecated: soon-to-be removed features

  • Removed: now removed features

  • Fixed: any bug fixes

  • Security: in case of vulnerabilities.

v2.1.0 - 2021-03-31

Changed
  • (!83) Raise warning or log warning if the branch times cannot be verified rather than raising a NotImplementedError (partially addresses #61)

  • (!80) Require xarray<0.17 until xarray #5050 is resolved

Fixed
  • (!84) Look for parent data submitted under a different institution id rather than immediately raising an IOError if no parent data under the same institution id is found

  • (!81) During stitching select the parent with the latest version if multiple parents are found (closes #59)

v2.0.2 - 2021-02-25

Added
  • (!79) Ability to crunch files if parent_time_units metadata is missing (closes #56)

Fixed
  • Incorrect paper link in README

v2.0.1 - 2021-02-25

Added
  • (!77) DOI reference to paper in crunched files (and anything derived from crunched files)

  • (!77) Tweaks to paper following proofs

v2.0.0 - 2021-01-19

Added
  • (!76) Added missing modules to documentation

  • (!72) v2 paper revisions round 2

  • (!67) v2 paper revisions

  • (!69) Added AR6 reference regions

  • (!68) “30-yr-running-mean” and “30-yr-running-mean-dedrift” normalisation options when stitching

  • (!68) nyears keyword argument when initialising netcdf_scm.normalisation.NormaliserRunningMean and netcdf_scm.normalisation.NormaliserRunningMeanDedrift so that the number of years to use when calculating the running-mean is now arbitrary (default value is 21 so there is no change to the default behaviour)

  • (!32) First submission to Earth System Science Data (ESSD)

  • (!56) Instructions and scripts for doing zenodo releases

  • (!40) Add netcdf_scm.citing module (closes #39)

  • (!35) Add netcdf_scm.retractions module (closes #29)

  • (!51) Add normalisation module to docs

  • (!49) Add progress bar to directory sorting so it’s obvious when things are going very slowly

  • (!46) Add netcdf_scm.errors to docs (closes #41)

  • (!43) Add normalisation method 21-yr-running-mean-dedrift

  • (!39) Put basic license checking tools in new module: netcdf_scm.citing (closes #30)

  • (!34) Add convenience .MAG reader (netcdf_scm.io.load_mag_file) which automatically fills in metadata. Also adds netcdf_scm.io.get_scmcube_helper to the ‘public’ API.

  • (!25) Add regular test of conda installation

  • (!30) Added scipy to dependencies so pip install works

  • (!26) Added 21-year running mean normalisation option

  • (!22) Allow user to choose weighting scheme in CLI

  • (!17) Add netcdf_scm.weights.AreaWeightCalculator

  • (!16) Add CMIP5 stitching support

  • (!8) Add process id to logging calls (fixes #13)

  • (!1) Add netcdf-scm-stitch so e.g. historical and scenario files can be joined and also normalised against e.g. piControl

  • (#108 (github)) Optimise wranglers and add regression tests

  • (#107 (github)) Add wrangling options for average/point start/mid/end year time manipulations for .MAG and .IN files

  • (#104 (github)) Allow wranglers to also handle unit conversions (see #101 (github))

  • (#102 (github)) Keep effective area as metadata when calculating SCM timeseries (see #100 (github))

  • (#98 (github)) Add support for reading CMIP6 concentration GMNHSH data

  • (#95 (github)) Add support for CO2 flux data (fgco2) reading, in the process simplifying crunching and improving lazy weights

  • (#87 (github)) Add support for crunching data with a height co-ordinate

  • (#84 (github)) Add ability to crunch land, ocean and atmosphere data separately (and sensibly)

  • (#75 (github)) Check land_mask_threshold is sensible when retrieving land mask (automatically update if not)

  • (#69 (github)) Add El Nino 3.4 mask

  • (#66 (github)) Add devops tools and refactor to pass new standards

  • (#62 (github)) Add netcdf-scm format and crunch to this by default

  • (#61 (github)) Add land fraction when crunching scm timeseries cubes
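
To make the !68 entry above concrete, here is a minimal sketch of the nyears keyword (the class names are taken from the entry above; how the normalisers are then applied during stitching is not shown):

from netcdf_scm.normalisation import (
    NormaliserRunningMean,
    NormaliserRunningMeanDedrift,
)

# default window is 21 years, so default behaviour is unchanged
default_normaliser = NormaliserRunningMean()

# 30-year windows, matching the "30-yr-running-mean" and
# "30-yr-running-mean-dedrift" options when stitching
running_mean_30 = NormaliserRunningMean(nyears=30)
dedrift_30 = NormaliserRunningMeanDedrift(nyears=30)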

Changed
  • (!73) Handling of invalid regions while crunching. If crunching requests regions which aren’t compatible with a file, a warning will be raised but the crunching will continue with all the valid regions it can. Previously, if invalid regions were requested, the crunch would fail and no regions would be crunched for that file.

  • (!73) Renamed netcdf_scm.weights.InvalidWeights to netcdf_scm.weights.InvalidWeightsError and ensured that all weights-related errors are now raised as netcdf_scm.weights.InvalidWeightsError rather than being a mix of netcdf_scm.weights.InvalidWeightsError and ValueError as was previously the case.

  • (!73) netcdf_scm.iris_cube_wrappers.ScmCube.get_scm_timeseries_cubes() will now raise a netcdf_scm.weights.InvalidWeightsError if none of the requested regions have valid weights.

  • (!73) Improved logging handling so only netCDF-SCM’s logger is used by netCDF-SCM, with the root logger never being used.

  • (!71) Rename prefix for AR6 regions from World|AR6 regions to World|AR6

  • (!70) Update the default land-fraction cube, netcdf_scm.weights.default_land_ocean_weights.nc, so it is based on CMIP6 data and no longer treats e.g. the Caspian Sea and the Great Lakes as purely land

  • (!5) Use xarray to load crunched netCDF files in netcdf_scm.io.load_scmrun(), reducing load time by about a factor of 3

  • (!64) Upgraded to pymagicc 2.0.0rc5 and changed all use of scmdata.ScmDataFrame to scmdata.ScmRun

  • (!64) Renamed netcdf_scm.io.load_scmdataframe to netcdf_scm.io.load_scmrun; this function now automatically drops the “todo” column on reading (see the sketch after this list)

  • (!62) Changed the command-line interface to use command groups rather than hyphenated commands: netcdf-scm-crunch becomes netcdf-scm crunch, netcdf-scm-stitch becomes netcdf-scm stitch and netcdf-scm-wrangle becomes netcdf-scm wrangle

  • (!60) Target journal for v2 paper

  • (!55) Added check that region areas are sensible when calculating SCM timeseries cubes (see ScmCube._sanity_check_area(), closes #34)

  • (!52) Put notebooks into documentation, hence moved them from notebooks to docs/source/usage

  • (!48) Workaround erroneous whitespace in parent metadata when stitching (closes #36)

  • (!47) Rework CHANGELOG to follow Keep a Changelog (closes #27)

  • (!45) Move from https://gitlab.com/znicholls/netcdf-scm to https://gitlab.com/netcdf-scm/netcdf-scm

  • (!38) Split out normalisation module: netcdf_scm.normalisation (closes #31)

  • (!37) Do not duplicate files into a flat directory when wrangling and stitching (closes #33)

  • (!31) Rename SCMCube to ScmCube. Also use “netCDF” rather than “NetCDF” throughout.

  • (!28) Move multiple stitching utility functions into the ‘public’ API

  • (!29) Parallelise directory sorting when crunching

  • (!27) Refactored stitching into its own module to make room for the new normalisation method

  • (!24) Parallelise unit, integration and regression tests in CI to reduce run time

  • (!23) Split netcdf_scm.cli into smaller parts

  • (!21) Remove use of contourf in notebooks as it can give odd results

  • (!20) Update weight retrieval so that non-area weights are normalised (fixes #11)

  • (!19) Update notebooks and refactor so cubes can have multiple weights calculators

  • (#106 (github)) Upgrade to new Pymagicc release

  • (#105 (github)) Upgrade to new Pylint release

  • (#99 (github)) Switch to BSD-3-Clause license

  • (#92 (github)) Shrink test files (having moved entire repository to use git lfs properly)

  • (#90 (github)) Rely on iris for lazy crunching

  • (#89 (github)) Change crunching thresholds to be based on data size rather than number of years

  • (#82 (github)) Prepare to add land data handling

  • (#81 (github)) Refactor masks to use weighting instead of masking, doing all the renaming in the process

  • (#80 (github)) Refactor to avoid import conftest in tests

  • (#77 (github)) Refactor netcdf_scm.masks.get_area_mask logic to make multi-dimensional co-ordinate support easier

  • (#72 (github)) Monkey patch iris to speed up crunching and go back to linear regridding of default sftlf mask

  • (#70 (github)) Dynamically decide whether to handle data lazily (fix regression tests in process)

  • (#64 (github)) Update logging to make post analysis easier and output clearer

  • (#63 (github)) Switch to using cmor name for variable in SCM timeseries output and put standard name in standard_variable_name

  • (#58 (github)) Lock tuningstruc wrangling so it can only wrangle to flat tuningstrucs; also includes:

    • turning off all wrangling in preparation for re-doing crunching format

    • adding default sftlf cube

  • (#50 (github)) Make pyam-iamc a core dependency
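
To make the !64 entry above concrete, here is a minimal sketch of the renamed loader (the file path is illustrative):

from netcdf_scm.io import load_scmrun

# load a crunched netCDF-SCM file as an scmdata.ScmRun;
# the "todo" column is dropped automatically on reading
run = load_scmrun("path/to/netcdf-scm-crunched-file.nc")
print(run.timeseries().head())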

Fixed
  • (!75) Check regionmask version before trying to access regionmask’s AR6 region definitions

  • (!66) Upgraded to scmdata 0.7

  • (!59) Updated ScmCube.lat_lon_shape so it is better able to handle non-standard datasets

  • (!58) Upgraded to pymagicc>=2.0.0rc3 to ensure pint compatible unit handling when writing .MAG files

  • (!57) Include cmip5 reference csv in package (closes #43)

  • (!36) Ensure areas are only calculated based on non-masked data (fixes bugs identified in #35 and #37)

  • (!33) Fix bug in stitching.get_branch_time where wrong time units were used when converting raw time to datetime

  • (!18) Hotfix tests

  • (!15) Fixed bug in unit conversion which caused it to fail for hfds

  • (!14) Fixed error in stitching when the start year is 1 (#15)

  • (!13) Make cube concatenation work around small errors in raw data metadata

  • (!12) Fixed stitched .MAG filename bug identified in (#14)

  • (!10) Add support for esm* experiments when stitching (fixes #2)

  • (!11) Add ability to read CanESM5 ocean data with depth and ‘extra’ co-ordinates. Also:

    • split regression testing into smaller pieces so memory requirements aren’t so high

  • (!9) Add ability to read CanESM5 ocean data, making handling of ‘extra’ co-ordinates more robust

  • (!6) Allow hfds crunching to work by handling extra ocean data coordinates properly

  • (#114 (github)) Ensure that default sftlf file is included in wheel

  • (#111 (github)) Write tuningstrucs with data in columns rather than rows

  • (#97 (github)) Add support for tuningstruc data which has been transposed

  • (#88 (github)) Fix bug when reading more than one multi-dimensional file in a directory

  • (#74 (github)) Fix bug in mask generation

  • (#67 (github)) Fix crunching filenaming, tidy up more and add catch for IPSL time_origin time variable attribute

  • (#55 (github)) Hotfix docs so they build properly

Removed
  • (!62) netcdf_scm.cli_utils._init_logging; netCDF-SCM will now only initialise a logger when used from the command line, giving users full control of logging again

  • (!61) Redundant files

  • (!42) Remove redundant test files (leftover from previous behaviour)

v1.0.0 - 2019-05-21

Changed
  • (#49 (github)) Make bandit only check src

  • (#45 (github)) Refactor the masking of regions into a module allowing for more regions to be added as needed

Added
  • (#48 (github)) Add isort to checks

  • (#47 (github)) Add regression tests on crunching output to ensure stability. Also:

    • fixes minor docs bug

    • updates default regexp option in crunch and wrangle to avoid fx files

    • refactors cli.py a touch to reduce duplication

    • avoids collections deprecation warning in mat4py

Fixed
  • (#46 (github)) Fix a number of bugs in netcdf-scm-wrangle’s data handling when converting to tuningstrucs

v0.7.3 - 2019-05-16

Changed
  • (#44 (github)) Speed up crunching by forcing data to load before applying masks, not each time a mask is applied

v0.7.2 - 2019-05-16

Changed
  • (#43 (github)) Speed up crunching, in particular remove string parsing to convert cftime to python datetime

v0.7.1 - 2019-05-15

Added
  • (#42 (github)) Add netcdf-scm-wrangle command line interface

Fixed
  • (#41 (github)) Fixed bug in path handling of CMIP6OutputCube

v0.6.2 - 2019-05-14

Added
  • (#39 (github)) Add netcdf-scm-crunch command line interface

v0.6.1 - 2019-05-13

Added
  • (#29 (github)) Put crunching script into formal testsuite which confirms results against KNMI data available here; however, there are no docs or formal example until #6 (github) is closed

  • (#28 (github)) Added cmip5 crunching script example; not tested, so use with caution until #6 (github) is closed

Changed
  • (#40 (github)) Upgrade to pyam v0.2.0

  • (#38 (github)) Update to using openscm releases and hence drop Python 3.6 support

  • (#37 (github)) Adjusted read-in of the gregorian calendar with a 0 reference so that all data from year 1 onwards is given back

  • (#34 (github)) Move to new openscm naming i.e. returning ScmDataFrame rather than OpenSCMDataFrameBase

  • (#32 (github)) Move to returning OpenSCMDataFrameBase rather than pandas DataFrame when crunching to scm format

Fixed
  • (#35 (github)) Fixed bug which prevented SCMCube from crunching to scm timeseries with default earth radius when areacella cube was missing

  • (#29 (github)) Fixed bug identified in #30 (github)

v0.5.1 - 2018-11-12

Changed
  • (#26 (github)) Expose directory and filename parsers directly

v0.4.3 - 2018-11-12

Changed
  • Move import cftime into same block as iris imports

v0.4.2 - 2018-11-12

Changed
  • Update setup.py to install dependencies so that non-Iris dependent functionality can be run from a pip install

v0.4.1 - 2018-11-12

Added
  • (#23 (github)) Added ability to handle cubes with invalid calendar (e.g. CMIP6 historical concentrations cubes)

  • (#20 (github)) Added CMIP6Input4MIPsCube and CMIP6OutputCube which add compatibility with CMIP6 data

v0.3.1 - 2018-11-05

Added
  • (#15 (github)) Add ability to load from a directory with data that is saved in multiple timeslice files; also:

    • adds regular expressions section to development part of docs

    • adds an example script of how to crunch netCDF files into SCM csvs

  • (#13 (github)) Add load_from_path method to SCMCube

  • (#10 (github)) Add land/ocean and hemisphere splits to _get_scm_masks outputs

Changed
  • (#17 (github)) Update to crunch global and hemispheric means even if land-surface fraction data is missing

  • (#16 (github)) Tidy up experimental crunching script

  • (#14 (github)) Streamline install process

  • (#12 (github)) Update to use output format that is compatible with pyam

  • Update netcdftime to cftime to track name change

v0.2.4 - 2018-10-15

Added
  • Include simple tests in package

v0.2.3 - 2018-10-15

Added
  • Include LICENSE in package

v0.2.2 - 2018-10-15

Added
  • Add conda dev environment details

v0.2.1 - 2018-10-15

Changed
  • Update setup.py to reflect actual supported python versions

v0.2.0 - 2018-10-14

Added
  • (#4 (github)) Add work previously done elsewhere:
    • SCMCube base class for handling netCDF files
      • reading, cutting and manipulating files for SCM use

    • MarbleCMIP5Cube for handling CMIP5 netCDF files within a particular directory structure

    • automatic loading and use of surface land fraction and cell area files

    • returns timeseries data, once processed, in pandas DataFrames rather than netCDF format for easier use

    • demonstration notebook of how this first step works

    • CI for entire repository including notebooks

    • automatic documentation with Sphinx

v0.0.1 - 2018-10-05

Added
  • initial release

v0.0 - 2018-10-05

Added
  • dummy release