Crunching CMIP5 data

In this notebook we give a very brief example of how we can crunch CMIP5 data using netCDF-SCM. This functionality is highly experimental. Crunching is really a command-line tool, so the notebook does little more than call it through shell commands.

[1]:
# NBVAL_IGNORE_OUTPUT
!netcdf-scm-crunch -h
Usage: netcdf-scm-crunch [OPTIONS] SRC DST CRUNCH_CONTACT

  Crunch data in ``src`` to netCDF-SCM ``.nc`` files in ``dst``.

  ``src`` is searched recursively and netcdf-scm will attempt to crunch all
  the files found. The directory structure in ``src`` will be mirrored in
  ``dst``.

  Failures and warnings are recorded and written into a text file in
  ``dst``. We recommend examining this file using a file analysis tool such
  as ``grep``. We often use the command ``grep "\|WARNING\|INFO\|ERROR <log-
  file>``.

  ``crunch_contact`` is written into the output ``.nc`` files'
  ``crunch_contact`` attribute.

Options:
  --drs [Scm|MarbleCMIP5|CMIP6Input4MIPs|CMIP6Output]
                                  Data reference syntax to use for crunching.
                                  [default: Scm]

  --regexp TEXT                   Regular expression to apply to file
                                  directory (only crunches matches).
                                  [default: ^(?!.*(fx)).*$]

  --regions TEXT                  Comma-separated regions to crunch.
                                  [default: World,World|Northern
                                  Hemisphere,World|Southern Hemisphere,World|L
                                  and,World|Ocean,World|Northern
                                  Hemisphere|Land,World|Southern
                                  Hemisphere|Land,World|Northern
                                  Hemisphere|Ocean,World|Southern
                                  Hemisphere|Ocean]

  --data-sub-dir TEXT             Sub-directory of ``dst`` to save data in.
                                  [default: netcdf-scm-crunched]

  -f, --force / --do-not-force    Overwrite any existing files.  [default:
                                  False]

  --small-number-workers INTEGER  Maximum number of workers to use when
                                  crunching files.  [default: 10]

  --small-threshold FLOAT         Maximum number of data points (in millions)
                                  in a file for it to be processed in parallel
                                  with ``small-number-workers``  [default:
                                  50.0]

  --medium-number-workers INTEGER
                                  Maximum number of workers to use when
                                  crunching files.  [default: 3]

  --medium-threshold FLOAT        Maximum number of data points (in millions)
                                  in a file for it to be processed in parallel
                                  with ``medium-number-workers``  [default:
                                  120.0]

  --force-lazy-threshold FLOAT    Maximum number of data points (in millions)
                                  in a file for it to be processed in memory
                                  [default: 1000.0]

  -h, --help                      Show this message and exit.
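The threshold options in the help text decide how a directory is processed, based on how many data points it holds. The following is a minimal sketch of that routing, not netCDF-SCM's actual implementation. The names `worker_tier` and `loads_in_memory` are hypothetical. The boundaries follow the wording of the crunching log messages ("less than", "greater than or equal to"), and the exact comparison used for the lazy-loading threshold is an assumption.

```python
# Defaults copied from the help text above (units: millions of data points).
SMALL_THRESHOLD = 50.0
MEDIUM_THRESHOLD = 120.0
FORCE_LAZY_THRESHOLD = 1000.0


def worker_tier(n_points_millions, small=SMALL_THRESHOLD, medium=MEDIUM_THRESHOLD):
    """Return a (hypothetical) label for the tier a directory falls into.

    "small" would be crunched with ``--small-number-workers`` workers,
    "medium" with ``--medium-number-workers`` workers, and "large" is
    everything at or above the medium threshold.
    """
    if n_points_millions < small:
        return "small"
    if n_points_millions < medium:
        return "medium"
    return "large"


def loads_in_memory(n_points_millions, force_lazy=FORCE_LAZY_THRESHOLD):
    """Guess whether the data would be processed in memory.

    Assumption: data at or below the threshold is loaded into memory and
    anything larger is handled lazily. The real boundary may differ.
    """
    return n_points_millions <= force_lazy


for n_points in (9.5, 50.0, 500.0, 2000.0):
    print(n_points, worker_tier(n_points), loads_in_memory(n_points))
```

With the defaults, small test files like the ones used in this notebook all fall into the "small" tier.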
[2]:
# NBVAL_IGNORE_OUTPUT
!netcdf-scm-crunch "../tests/test-data/marble-cmip5" "../output-examples/crunched-files" "notebook example <email address>" --drs "MarbleCMIP5" --regexp ".*tas.*"
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:netcdf-scm: 2.0.0-beta.7+14.g258d6b1.dirty
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:crunch-contact: notebook example <email address>
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:source: /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:destination: /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:drs: MarbleCMIP5
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:regexp: .*tas.*
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:regions: World,World|Northern Hemisphere,World|Southern Hemisphere,World|Land,World|Ocean,World|Northern Hemisphere|Land,World|Southern Hemisphere|Land,World|Northern Hemisphere|Ocean,World|Southern Hemisphere|Ocean
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:force: False
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:small_number_workers: 10
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:small_threshold: 50.0
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:medium_number_workers: 3
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:medium_threshold: 120.0
5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:force_lazy_threshold: 1000.0
5982 2020-05-01 12:52:38,836 INFO:netcdf_scm.output:Read in 9 items from database netcdf-scm_crunched.jsonl
5982 2020-05-01 12:52:38,836 INFO:netcdf_scm:Finding directories with files
5982 2020-05-01 12:52:38,837 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/rcp26/Amon/tas/bcc-csm1-1/r1i1p1
5982 2020-05-01 12:52:38,837 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/piControl/Amon/tas/NorESM1-M/r1i1p1
5982 2020-05-01 12:52:38,838 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/rcp45/Amon/tas/NorESM1-M/r1i1p1
5982 2020-05-01 12:52:38,839 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/rcp45/Amon/tas/ACCESS1-0/r1i1p1
5982 2020-05-01 12:52:38,839 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/rcp45/Amon/tas/HadCM3/r1i1p1
5982 2020-05-01 12:52:38,841 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/historical/Amon/tas/NorESM1-M/r1i1p1
5982 2020-05-01 12:52:38,841 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/historical/Amon/tas/ACCESS1-0/r1i1p1
5982 2020-05-01 12:52:38,843 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/1pctCO2/Amon/tas/CanESM2/r1i1p1
5982 2020-05-01 12:52:38,845 INFO:netcdf_scm:Adding directory to queue /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/tests/test-data/marble-cmip5/cmip5/rcp85/Amon/tas/NorESM1-ME/r1i1p1
5982 2020-05-01 12:52:38,845 INFO:netcdf_scm:Found 9 directories with files
Sorting directories: 100%|████████████████████████| 9/9 [00:00<00:00, 36.05it/s]
5982 2020-05-01 12:52:39,096 INFO:netcdf_scm:Crunching 9 directories with less than 50.0 million data points
5982 2020-05-01 12:52:39,096 INFO:netcdf_scm:Processing in parallel with 10 workers
5982 2020-05-01 12:52:39,096 INFO:netcdf_scm:Forcing dask to use a single thread when reading
  0%|                                               | 0.00/9.00 [00:00<?, ?it/s]5994 2020-05-01 12:52:39,123 INFO:netcdf_scm:Attempting to process: ['tas_Amon_NorESM1-M_rcp45_r1i1p1_200601-230012.nc']
5992 2020-05-01 12:52:39,122 INFO:netcdf_scm:Attempting to process: ['tas_Amon_bcc-csm1-1_rcp26_r1i1p1_209001-209912.nc', 'tas_Amon_bcc-csm1-1_rcp26_r1i1p1_210001-211012.nc']
5993 2020-05-01 12:52:39,123 INFO:netcdf_scm:Attempting to process: ['tas_Amon_NorESM1-M_piControl_r1i1p1_070001-120012.nc']
5995 2020-05-01 12:52:39,123 INFO:netcdf_scm:Attempting to process: ['tas_Amon_ACCESS1-0_rcp45_r1i1p1_200601-201012.nc']
5998 2020-05-01 12:52:39,124 INFO:netcdf_scm:Attempting to process: ['tas_Amon_ACCESS1-0_historical_r1i1p1_187701-187703.nc']
5999 2020-05-01 12:52:39,125 INFO:netcdf_scm:Attempting to process: ['tas_Amon_CanESM2_1pctCO2_r1i1p1_189201-190312.nc']
5996 2020-05-01 12:52:39,125 INFO:netcdf_scm:Attempting to process: ['tas_Amon_HadCM3_rcp45_r1i1p1_200601-203012.nc', 'tas_Amon_HadCM3_rcp45_r1i1p1_203101-203512.nc']
5997 2020-05-01 12:52:39,124 INFO:netcdf_scm:Attempting to process: ['tas_Amon_NorESM1-M_historical_r1i1p1_185001-200512.nc']
6000 2020-05-01 12:52:39,127 INFO:netcdf_scm:Attempting to process: ['tas_Amon_NorESM1-ME_rcp85_r1i1p1_204501-205012.nc', 'tas_Amon_NorESM1-ME_rcp85_r1i1p1_204001-204412.nc']
5997 2020-05-01 12:52:39,173 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/historical/Amon/tas/NorESM1-M/r1i1p1/netcdf-scm_tas_Amon_NorESM1-M_historical_r1i1p1_185001-200512.nc
5994 2020-05-01 12:52:39,174 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/rcp45/Amon/tas/NorESM1-M/r1i1p1/netcdf-scm_tas_Amon_NorESM1-M_rcp45_r1i1p1_200601-230012.nc
5998 2020-05-01 12:52:39,176 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/historical/Amon/tas/ACCESS1-0/r1i1p1/netcdf-scm_tas_Amon_ACCESS1-0_historical_r1i1p1_187701-187703.nc
5999 2020-05-01 12:52:39,181 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/1pctCO2/Amon/tas/CanESM2/r1i1p1/netcdf-scm_tas_Amon_CanESM2_1pctCO2_r1i1p1_189201-190312.nc
5993 2020-05-01 12:52:39,181 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/piControl/Amon/tas/NorESM1-M/r1i1p1/netcdf-scm_tas_Amon_NorESM1-M_piControl_r1i1p1_070001-120012.nc
5995 2020-05-01 12:52:39,185 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/rcp45/Amon/tas/ACCESS1-0/r1i1p1/netcdf-scm_tas_Amon_ACCESS1-0_rcp45_r1i1p1_200601-201012.nc
5996 2020-05-01 12:52:39,190 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/rcp45/Amon/tas/HadCM3/r1i1p1/netcdf-scm_tas_Amon_HadCM3_rcp45_r1i1p1_200601-203512.nc
5992 2020-05-01 12:52:39,193 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/rcp26/Amon/tas/bcc-csm1-1/r1i1p1/netcdf-scm_tas_Amon_bcc-csm1-1_rcp26_r1i1p1_209001-211012.nc
6000 2020-05-01 12:52:39,199 INFO:netcdf_scm:Skipped (already exists, not overwriting) /Users/znicholls/Documents/AGCEC/Misc/netcdf-scm/output-examples/crunched-files/netcdf-scm-crunched/cmip5/rcp85/Amon/tas/NorESM1-ME/r1i1p1/netcdf-scm_tas_Amon_NorESM1-ME_rcp85_r1i1p1_204001-205012.nc
100%|█████████████████████████████████████████| 9.00/9.00 [00:00<00:00, 111it/s]
5982 2020-05-01 12:52:39,212 INFO:netcdf_scm:Crunching 0 directories with greater than or equal to 50.0 and less than 120.0 million data points
5982 2020-05-01 12:52:39,213 INFO:netcdf_scm:Crunching 0 directories with greater than or equal to 120.0 million data points
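The `--regexp` option controls which directories are crunched: the default `^(?!.*(fx)).*$` rejects any path containing `fx`, while the `.*tas.*` used above keeps only paths containing `tas`. The sketch below illustrates both patterns with Python's `re` module. It assumes the pattern is matched against the directory path from its start; how netCDF-SCM applies it internally is not shown here. `select_directories` is a hypothetical helper, and two of the example paths are made up for illustration.

```python
import re

DEFAULT_REGEXP = r"^(?!.*(fx)).*$"  # default from the help text
TAS_REGEXP = r".*tas.*"  # pattern passed in the cell above


def select_directories(directories, regexp):
    """Return the directories whose path matches ``regexp`` (hypothetical helper)."""
    pattern = re.compile(regexp)
    return [d for d in directories if pattern.match(d)]


example_dirs = [
    # Relative form of a directory seen in the log above
    "cmip5/rcp45/Amon/tas/NorESM1-M/r1i1p1",
    # Made-up paths, for illustration only
    "cmip5/rcp45/Amon/pr/NorESM1-M/r1i1p1",
    "cmip5/historical/fx/sftlf/NorESM1-M/r0i0p0",
]

print(select_directories(example_dirs, DEFAULT_REGEXP))
print(select_directories(example_dirs, TAS_REGEXP))
```

Note that the default pattern excludes a path if `fx` appears anywhere in it, not only as a directory name.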

The output is saved in a log file in the destination directory. It can be searched using any standard tool, e.g. grep (in particular grep "\|WARNING\|INFO\|ERROR" <log-file>). The most useful keywords to search for are “DEBUG”, “INFO”, “WARNING” and “ERROR”.
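If we would rather stay in Python than use grep, the same keyword search can be sketched as below. This assumes the log file uses the same line layout as the output printed above (process id, timestamp, then `LEVEL:logger:message`), which we have not verified. `filter_log_lines` is a hypothetical helper, and the WARNING line in the example is made up for illustration.

```python
import re

# Layout assumed from the printed output, e.g.
# "5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:force: False"
LOG_LINE = re.compile(
    r"^(?P<pid>\d+) (?P<timestamp>\S+ \S+) "
    r"(?P<level>[A-Z]+):(?P<logger>[^:]+):(?P<message>.*)$"
)


def filter_log_lines(lines, levels=("WARNING", "ERROR")):
    """Return parsed log records whose level is in ``levels``.

    Lines that do not fit the assumed layout (e.g. progress bars) are skipped.
    """
    selected = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            selected.append(match.groupdict())
    return selected


sample_lines = [
    "5982 2020-05-01 12:52:38,835 INFO:netcdf_scm:force: False",
    # Made-up line, for illustration only
    "5982 2020-05-01 12:52:39,000 WARNING:netcdf_scm:hypothetical warning message",
]

for record in filter_log_lines(sample_lines):
    print(record["level"], record["message"])
```

To search a real log, open the file and pass the file object in place of `sample_lines`, since iterating over it yields one line at a time.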