Retractions API
Utilities for checking for retracted datasets
-
netcdf_scm.retractions.check_depends_on_retracted(mag_files, raise_on_mismatch=True, **kwargs)
Check if a .MAG file was calculated from now-retracted data
Notes
This queries external ESGF servers. Please limit the number of parallel requests.
- Parameters
mag_files (list of str) – List of .MAG files to check
raise_on_mismatch (bool) – If a file cannot be processed, should an error be raised? If False, an error message is logged instead.
**kwargs (any) – Passed to check_retractions()
- Returns
Dataframe which describes the retracted status of each file in mag_files. The columns are:
- "mag_file": the files in mag_files
- "dependency_file": file which the file in the "mag_file" column depends on (note that the .MAG files may have more than one dependency, so they may appear more than once in the "mag_file" column)
- "dependency_instance_id": instance id (i.e. unique ESGF identifier) of the dependency file
- "dependency_retracted": whether the dependency file has been retracted or not (True if the file has been retracted)
The list of retracted .MAG files can then be accessed with e.g. res.loc[res["dependency_retracted"], "mag_file"].unique()
- Return type
pd.DataFrame
- Raises
ValueError – The .MAG file is not based on CMIP6 data (retractions cannot be checked automatically for CMIP5 data with netCDF-SCM).
ValueError – Metadata about a .MAG file's source is not included in the .MAG file.
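As a rough illustration of how the returned dataframe might be filtered, the sketch below builds a small stand-in dataframe with the four columns described above. The file names, instance ids and retraction flags are invented for illustration; they are not output from check_depends_on_retracted, and no ESGF server is queried.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe returned by
# check_depends_on_retracted; every value here is made up.
res = pd.DataFrame(
    {
        "mag_file": ["a.MAG", "a.MAG", "b.MAG"],
        "dependency_file": ["tas_1.nc", "tas_2.nc", "rsut_1.nc"],
        "dependency_instance_id": ["id-1", "id-2", "id-3"],
        "dependency_retracted": [False, True, False],
    }
)

# A .MAG file appears once per dependency, so de-duplicate after filtering.
retracted_mag_files = res.loc[res["dependency_retracted"], "mag_file"].unique()
print(list(retracted_mag_files))
```

In this made-up data only a.MAG has a retracted dependency, so it is the only file selected, even though it appears in two rows.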
-
netcdf_scm.retractions.check_retracted_files(filenames_or_dir, filename_filter='*.nc', **kwargs)
Check if any files are retracted
Notes
This queries external ESGF servers. Please limit the number of parallel requests.
- Parameters
filenames_or_dir (list of str or str) – A list of filenames or a directory to check for any retractions. If a string is provided, it is assumed to reference a directory and any files within that directory matching the filename_filter will be checked.
filename_filter (str) – If a directory is passed all files matching the filter will be checked.
**kwargs (any) – Passed to check_retracted()
- Returns
List of the retracted files
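The way a directory and filename_filter could be turned into a list of files can be sketched as follows. This is only an illustration of the behaviour described for the parameters, not the library's implementation: collect_files is a hypothetical helper, and the recursive search is an assumption of this sketch.

```python
from pathlib import Path
import tempfile


def collect_files(filenames_or_dir, filename_filter="*.nc"):
    """Hypothetical helper: resolve the argument to a list of file paths."""
    if isinstance(filenames_or_dir, str):
        # A single string is treated as a directory. Searching it
        # recursively is an assumption of this sketch.
        matches = Path(filenames_or_dir).rglob(filename_filter)
        return sorted(str(p) for p in matches if p.is_file())
    # Otherwise assume an iterable of filenames and use it as given.
    return list(filenames_or_dir)


# Demonstrate on a temporary directory with two matching files and one other.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["tas.nc", "rsut.nc", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = [Path(f).name for f in collect_files(tmp)]

print(found)
```

With the default filter, notes.txt is skipped and only the two .nc files are collected.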
-
netcdf_scm.retractions.check_retractions(instance_ids, esgf_query_batch_size=100, nworkers=8)
Check a list of instance_ids for any retracted datasets
Notes
This queries external ESGF servers. Please limit the number of parallel requests.
- Parameters
instance_ids (list of str) – Datasets to check. instance_id is the unique identifier for a dataset, for example CMIP6.CMIP.CSIRO.ACCESS-ESM1-5.esm-hist.r1i1p1f1.Amon.rsut.gn.v20191128
esgf_query_batch_size (int) – Maximum number of ids to include in each query.
nworkers (int) – Number of workers used to run queries to ESGF in parallel.
- Returns
A list of retracted instance_ids
- Return type
list of str
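How a batch size and a worker count might interact can be sketched as below. This is a minimal illustration under stated assumptions rather than the library's code: make_batches and query_batch are hypothetical names, the ids are invented, and query_batch is a local stub that never contacts ESGF.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(instance_ids, batch_size=100):
    """Split the ids into consecutive batches of at most batch_size."""
    return [
        instance_ids[i : i + batch_size]
        for i in range(0, len(instance_ids), batch_size)
    ]


def query_batch(batch):
    """Hypothetical stand-in for one ESGF query.

    It simply flags made-up ids that end in '-retracted'.
    """
    return [i for i in batch if i.endswith("-retracted")]


# 251 invented ids, one of which the stub treats as retracted.
ids = [f"dataset-{n}" for n in range(250)] + ["dataset-x-retracted"]
batches = make_batches(ids, batch_size=100)

# A small worker pool keeps the number of simultaneous requests low.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(query_batch, batches))

# Flatten the per-batch results into one list of retracted ids.
retracted = [i for batch_result in results for i in batch_result]
print(len(batches), retracted)
```

A larger batch size means fewer queries, while a smaller worker count means fewer of them run at once; both help keep the load on the external servers modest.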