Retractions API
Utilities for checking for retracted datasets
- netcdf_scm.retractions.check_depends_on_retracted(mag_files, raise_on_mismatch=True, **kwargs)
Check if a .MAG file was calculated from now-retracted data.
Notes
This queries external ESGF servers. Please limit the number of parallel requests.
- Parameters
mag_files (list of str) – List of .MAG files to check
raise_on_mismatch (bool) – If a file cannot be processed, should an error be raised? If False, an error message is logged instead.
**kwargs (any) – Passed to check_retractions()
- Returns
DataFrame which describes the retracted status of each file in mag_files. The columns are:
- "mag_file": the files in mag_files
- "dependency_file": file which the file in the "mag_file" column depends on (note that the .MAG files may have more than one dependency so they may appear more than once in the "mag_file" column)
- "dependency_instance_id": instance id (i.e. unique ESGF identifier) of the dependency file
- "dependency_retracted": whether the dependency file has been retracted or not (True if the file has been retracted)
The list of retracted .MAG files can then be accessed with e.g. res.loc[res["dependency_retracted"], "mag_file"].unique()
- Return type
pd.DataFrame
- Raises
ValueError – The .MAG file is not based on CMIP6 data (retractions cannot be checked automatically for CMIP5 data with netCDF-SCM).
ValueError – Metadata about a .MAG file's source is not included in the .MAG file.
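The column layout described above can be explored without contacting ESGF. The following is a minimal sketch that builds a stand-in table by hand and applies the documented selection. It assumes pandas is installed; the file names and instance ids are invented for illustration and do not come from netCDF-SCM.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by check_depends_on_retracted.
# All file names and instance ids below are made up for illustration.
res = pd.DataFrame(
    [
        {"mag_file": "a.MAG", "dependency_file": "rsut_a.nc",
         "dependency_instance_id": "id-1", "dependency_retracted": False},
        {"mag_file": "a.MAG", "dependency_file": "areacella_a.nc",
         "dependency_instance_id": "id-2", "dependency_retracted": True},
        {"mag_file": "b.MAG", "dependency_file": "rsut_b.nc",
         "dependency_instance_id": "id-3", "dependency_retracted": False},
        {"mag_file": "c.MAG", "dependency_file": "rsut_c.nc",
         "dependency_instance_id": "id-4", "dependency_retracted": True},
        {"mag_file": "c.MAG", "dependency_file": "areacella_c.nc",
         "dependency_instance_id": "id-5", "dependency_retracted": True},
    ]
)

# A .MAG file appears once per dependency, so unique() collapses repeats.
retracted_mag_files = (
    res.loc[res["dependency_retracted"], "mag_file"].unique().tolist()
)

# Files with no retracted dependency at all.
clean_mag_files = sorted(set(res["mag_file"]) - set(retracted_mag_files))

print(retracted_mag_files)
print(clean_mag_files)
```

A file counts as affected as soon as any one of its dependencies is retracted, which is why "a.MAG" is selected here even though one of its two dependencies is still valid.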
- netcdf_scm.retractions.check_retracted_files(filenames_or_dir, filename_filter='*.nc', **kwargs)
Check if any files are retracted.
Notes
This queries external ESGF servers. Please limit the number of parallel requests.
- Parameters
filenames_or_dir (list of str or str) – A list of filenames or a directory to check for any retractions. If a string is provided, it is assumed to reference a directory and any files within that directory matching the filename_filter will be checked.
filename_filter (str) – If a directory is passed, all files matching the filter will be checked.
**kwargs (any) – Passed to check_retractions()
- Returns
List of the retracted files
- Return type
list of str
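The way filenames_or_dir is interpreted (a string means a directory, anything else means a list of files) can be sketched with the standard library. The helper collect_files below is hypothetical and is not part of netCDF-SCM. It searches only the top level of the directory, which is an assumption, since the documentation does not say whether subdirectories are included.

```python
import tempfile
from pathlib import Path


def collect_files(filenames_or_dir, filename_filter="*.nc"):
    """Hypothetical helper: resolve the argument to a list of file paths."""
    if isinstance(filenames_or_dir, str):
        # A single string is treated as a directory. Assumption: only the
        # top level is searched, not subdirectories.
        directory = Path(filenames_or_dir)
        return sorted(
            str(p) for p in directory.glob(filename_filter) if p.is_file()
        )
    # Otherwise assume an iterable of file names and use it as given.
    return list(filenames_or_dir)


# Demonstrate with a throwaway directory containing two matching files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("tas.nc", "rsut.nc", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [Path(p).name for p in collect_files(tmp)]

explicit = collect_files(["one.nc", "two.nc"])

print(found)
print(explicit)
```

Passing an explicit list skips the filter entirely in this sketch, so filename_filter only matters when a directory is given.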
- netcdf_scm.retractions.check_retractions(instance_ids, esgf_query_batch_size=100, nworkers=8)
Check a list of instance_ids for any retracted datasets.
Notes
This queries external ESGF servers. Please limit the number of parallel requests.
- Parameters
instance_ids (list of str) – Datasets to check. instance_id is the unique identifier for a dataset, for example CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.esm-hist.r1i1p1f1.Amon.rsut.gn.v20191128
esgf_query_batch_size (int) – Maximum number of ids to include in each query.
nworkers (int) – Number of workers used to parallelise queries to ESGF.
- Returns
A list of retracted instance_ids
- Return type
list of str
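To make the instance_id format and the effect of esgf_query_batch_size more concrete, here is a small standard-library sketch. The helpers split_instance_id and make_batches are hypothetical names introduced for illustration. The field names follow the CMIP6 data reference syntax, and the batching shown is one plausible reading of "maximum number of ids per query", not necessarily how netCDF-SCM splits its requests internally.

```python
# Field order of a CMIP6 dataset instance_id (CMIP6 data reference syntax).
DRS_FIELDS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def split_instance_id(instance_id):
    """Hypothetical helper: label the dot-separated parts of an instance_id."""
    parts = instance_id.split(".")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(f"unexpected instance_id: {instance_id}")
    return dict(zip(DRS_FIELDS, parts))


def make_batches(instance_ids, batch_size=100):
    """Hypothetical helper: split ids into chunks of at most batch_size."""
    return [
        instance_ids[start:start + batch_size]
        for start in range(0, len(instance_ids), batch_size)
    ]


example = "CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.esm-hist.r1i1p1f1.Amon.rsut.gn.v20191128"
fields = split_instance_id(example)

# 250 ids with a batch size of 100 need three queries in this sketch.
ids = [example] * 250
batch_sizes = [len(batch) for batch in make_batches(ids, batch_size=100)]

print(fields["source_id"], fields["variable_id"], fields["version"])
print(batch_sizes)
```

Under this reading, a smaller batch size means more, lighter queries, while nworkers controls how many of them run at once. Keeping both modest respects the request above to limit parallel load on the ESGF servers.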