Retractions API

Utilities for checking for retracted datasets

netcdf_scm.retractions.check_depends_on_retracted(mag_files, raise_on_mismatch=True, **kwargs)[source]

Check whether .MAG files were calculated from data that has since been retracted

Notes

This queries external ESGF servers. Please limit the number of parallel requests.

Parameters
  • mag_files (list of str) – List of .MAG files to check

  • raise_on_mismatch (bool) – If a file cannot be processed, should an error be raised? If False, an error message is logged instead.

  • **kwargs (any) – Passed to check_retractions()

Returns

Dataframe which describes the retracted status of each file in mag_files. The columns are:

  • “mag_file”: the files in mag_files

  • “dependency_file”: file which the file in the “mag_file” column depends on (note that a .MAG file may have more than one dependency, so it may appear more than once in the “mag_file” column)

  • “dependency_instance_id”: instance id (i.e. unique ESGF identifier) of the dependency file

  • “dependency_retracted”: whether the dependency file has been retracted (True if the file has been retracted)

The .MAG files that depend on retracted data can then be listed with, e.g., res.loc[res["dependency_retracted"], "mag_file"].unique()
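Because the result holds one row per dependency, a .MAG file with several retracted dependencies would be listed several times without the final unique() call. The sketch below mirrors that selection on plain Python records so the de-duplication is explicit. The rows, file names and instance ids are made up for illustration; with a real result, records of this shape could be obtained with pandas' res.to_dict("records").

```python
# Hypothetical rows using the documented column names (one row per dependency).
rows = [
    {"mag_file": "a.MAG", "dependency_file": "tas_1.nc",
     "dependency_instance_id": "id-1", "dependency_retracted": False},
    {"mag_file": "a.MAG", "dependency_file": "tas_2.nc",
     "dependency_instance_id": "id-2", "dependency_retracted": True},
    {"mag_file": "b.MAG", "dependency_file": "tas_3.nc",
     "dependency_instance_id": "id-3", "dependency_retracted": False},
    {"mag_file": "c.MAG", "dependency_file": "tas_4.nc",
     "dependency_instance_id": "id-4", "dependency_retracted": True},
    {"mag_file": "c.MAG", "dependency_file": "tas_5.nc",
     "dependency_instance_id": "id-5", "dependency_retracted": True},
]

# Keep rows whose dependency is retracted, then de-duplicate the .MAG names.
# The set comprehension plays the role of .unique() in the pandas expression.
affected_mag_files = sorted(
    {row["mag_file"] for row in rows if row["dependency_retracted"]}
)

print(affected_mag_files)
```

Here c.MAG has two retracted dependencies but appears only once in the result, and b.MAG is absent because none of its dependencies is retracted.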

Return type

pd.DataFrame

Raises
  • ValueError – The .MAG file is not based on CMIP6 data (retractions cannot be checked automatically for CMIP5 data with netCDF-SCM).

  • ValueError – Metadata about a .MAG file’s source is not included in the .MAG file.

netcdf_scm.retractions.check_retracted_files(filenames_or_dir, filename_filter='*.nc', **kwargs)[source]

Check if any files are retracted

Notes

This queries external ESGF servers. Please limit the number of parallel requests.

Parameters
  • filenames_or_dir (list of str or str) – A list of filenames or a directory to check for any retractions. If a string is provided, it is assumed to reference a directory and any files within that directory matching the filename_filter will be checked.

  • filename_filter (str) – If a directory is passed all files matching the filter will be checked.

  • **kwargs (any) – Passed to check_retractions()
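The two accepted forms of filenames_or_dir could be resolved into a file list roughly as follows. This is a minimal sketch under assumptions, not netCDF-SCM's implementation: the helper name resolve_files is hypothetical, and the sketch only looks at files directly inside the directory (whether the real function also searches subdirectories is not stated here).

```python
import fnmatch
import os
import tempfile


def resolve_files(filenames_or_dir, filename_filter="*.nc"):
    """Hypothetical helper: turn a directory or a list of names into a file list."""
    if isinstance(filenames_or_dir, str):
        # A single string is treated as a directory; keep only matching files.
        # Assumption: top-level files only, no recursion into subdirectories.
        return sorted(
            os.path.join(filenames_or_dir, name)
            for name in os.listdir(filenames_or_dir)
            if fnmatch.fnmatch(name, filename_filter)
        )
    # Anything else is treated as an explicit list of filenames.
    return list(filenames_or_dir)


# Demonstrate with a throwaway directory holding two .nc files and one other file.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("tas_a.nc", "tas_b.nc", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    found = [os.path.basename(path) for path in resolve_files(tmp_dir)]

print(found)
```

Only the two files matching the default filter are kept; notes.txt is ignored. Passing a list instead of a string returns the list unchanged.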

Returns

List of the retracted files

Return type

list

netcdf_scm.retractions.check_retractions(instance_ids, esgf_query_batch_size=100, nworkers=8)[source]

Check a list of instance_ids for any retracted datasets

Notes

This queries external ESGF servers. Please limit the number of parallel requests.

Parameters
  • instance_ids (list of str) – Datasets to check. instance_id is the unique identifier for a dataset, for example CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.esm-hist.r1i1p1f1.Amon.rsut.gn.v20191128

  • esgf_query_batch_size (int) – Maximum number of ids to include in each query.

  • nworkers (int) – Number of workers used to query ESGF in parallel.
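To make the structure of an instance_id concrete, the sketch below splits the example id into named parts. The field names follow the CMIP6 data reference syntax for dataset ids; the helper split_instance_id is hypothetical and not part of netCDF-SCM.

```python
# Field order of a CMIP6 dataset instance_id (dot-separated).
CMIP6_FIELDS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def split_instance_id(instance_id):
    """Hypothetical helper: map each dot-separated part to its field name."""
    parts = instance_id.split(".")
    if len(parts) != len(CMIP6_FIELDS):
        raise ValueError(
            f"Expected {len(CMIP6_FIELDS)} dot-separated parts, got {len(parts)}"
        )
    return dict(zip(CMIP6_FIELDS, parts))


example = split_instance_id(
    "CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.esm-hist.r1i1p1f1.Amon.rsut.gn.v20191128"
)
print(example["source_id"], example["variable_id"], example["version"])
```

The trailing version component is what distinguishes one published release of a dataset from another, which is why retractions are tracked per instance_id.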

Returns

A list of retracted instance_ids

Return type

list of str
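How esgf_query_batch_size and nworkers might interact can be sketched as below: ids are split into batches of at most the batch size, and a small pool of workers processes the batches concurrently. This is an illustration under assumptions, not the library's code. make_batches and fake_query are hypothetical names, and fake_query stands in for a real ESGF request so that nothing is sent over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(instance_ids, batch_size):
    """Split instance_ids into consecutive chunks of at most batch_size ids."""
    return [
        instance_ids[start:start + batch_size]
        for start in range(0, len(instance_ids), batch_size)
    ]


def fake_query(batch):
    """Stand-in for one ESGF query: flags ids ending in 'v1' as retracted."""
    return [instance_id for instance_id in batch if instance_id.endswith("v1")]


# Seven made-up ids, split into batches of at most three ids each.
ids = [f"dataset-{i}.v{i % 3}" for i in range(7)]
batches = make_batches(ids, batch_size=3)

# Two workers handle the batches; map() returns results in batch order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_query, batches))

# Flatten the per-batch results into a single list of retracted ids.
retracted = [instance_id for batch_result in results for instance_id in batch_result]
print(len(batches), retracted)
```

A larger batch size means fewer, larger requests, while fewer workers means fewer requests in flight at once; keeping the worker count modest is in line with the note above about limiting parallel requests.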