# checkGRIB

This utility was created primarily for CTBTO operations to check incoming ECMWF, NCEP and NCEPFV3 GRIB2 files before they are staged into the ATM pipeline. The idea is to catch problems in the files long before they are actually used, rather than having them surface, mysteriously, as the termination of an application that relies on the files.

## Author information

Don Morton
Boreal Scientific Computing
Fairbanks, Alaska, USA
Don.Morton@borealscicomp.com

I'm horrible at legalese but, as far as I'm concerned, this utility is totally free and open to the public for use and modification, and is included with the FLEXPART distribution with the same permissions.

----

## Installation

Code in this directory is self-contained, but relies on an installation of the GRIB-API library. Additionally, a test package is provided that depends on Python v2.7 (it also works with Python 3.6) and the GRIB-API command line tools.

Compilation simply requires setting the `GRIB_API` variable in the *Makefile* to the local installation location, and then invoking *make*.

----

## Usage

Usage is pretty straightforward, requiring the flags `--source` and `--levels`, where the source is expected to be `ECMWF`, `NCEP` or `NCEPFV3`. Originally, we assumed that all messages in a file were GRIB2, but we've learned that this is not valid for EF files, so `--checkgrib2` is now an optional test, not performed by default. The number of levels should be the expected number of levels for 3D variables.

The utility currently checks that all variables required for FLEXPART (including dry and wet deposition) are in the GRIB file, that all expected levels are included, and that the source is, indeed, the expected source. There are many more ambitious (and time-consuming) checks that could be added in the future. For example, the GRIB messages have fields for max, min and mean values, and we could actually read in each field and verify these.
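
As a rough illustration of the max/min/mean idea (this is not something the utility does today), the comparison itself could look like the following Python 3 sketch. It assumes the field values and the claimed statistics have already been decoded from a GRIB message by some other means; the function name `stats_match` and the tolerances are hypothetical.

```python
import math


def stats_match(values, claimed_min, claimed_max, claimed_mean, rel_tol=1e-6):
    """Return True if the claimed min/max/mean agree with the decoded values.

    Hypothetical helper: 'values' is a non-empty list of floats that has
    already been decoded from one GRIB message.
    """
    if len(values) == 0:
        raise ValueError("empty field")
    # fsum limits round-off when averaging a large field
    actual_mean = math.fsum(values) / len(values)
    pairs = [
        (min(values), claimed_min),
        (max(values), claimed_max),
        (actual_mean, claimed_mean),
    ]
    # abs_tol keeps the comparison meaningful when a statistic is exactly zero
    return all(
        math.isclose(actual, claimed, rel_tol=rel_tol, abs_tol=1e-12)
        for actual, claimed in pairs
    )
```

Reading every field is what would make such a check time-consuming, so it would probably need to be optional.
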
The utility prints a simple message to *stdout* if all is well with a file. If an error is found, a message is printed to *stdout* and the utility aborts with a system return code of 1 to indicate a problem with the GRIB file, or 2 to indicate a problem with the utility. Some examples follow:

### A normal, successful test

```
$ ./checkGRIB --source ECMWF --levels 137 test/gribfiles/ecmwf0p5/EN19011112
Passed: test/gribfiles/ecmwf0p5/EN19011112
$ echo $?
0
```

### Checking with an incorrect source parameter

```
$ ./checkGRIB --source NCEP --levels 137 test/gribfiles/ecmwf0p5/EN19011112
ERROR: Unexpected grib_centre: 98
Failed on: test/gribfiles/ecmwf0p5/EN19011112
$ echo $?
1
```

### Non-existent file

```
$ ./checkGRIB --source NCEP --levels 137 Huh???
GRIB_API ERROR : IO ERROR: No such file or directory: Huh??? (No such file or directory)
ERROR: problem opening GRIB file: Huh???
Failed on: Huh???
$ echo $?
2
```

### Finding more levels than expected

```
$ ./checkGRIB --source NCEP --levels 27 test/gribfiles/ncep0p5/GD19010818
ERROR: Found more than 27 levels
Failed on: test/gribfiles/ncep0p5/GD19010818
$ echo $?
1
```

### Finding fewer levels than expected

```
$ ./checkGRIB --source NCEP --levels 48 test/gribfiles/ncep0p5/GD19010818
ERROR: Only found 31 levels
Failed on: test/gribfiles/ncep0p5/GD19010818
$ echo $?
1
```

----

## Structure

All of the GRIB-related code is in the module *cgutils.F90*. There is a lot of "file-specific" code in here for ECMWF, NCEP and NCEP FV3 GRIB files. The main program, *checkGRIB.F90*, merely collects and parses command line arguments, and invokes the appropriate routines in the *cgutils* module.

----

## Testing

A comprehensive testing package, located in the *test* subdirectory, was developed to test this utility under a wide variety of conditions, including successful execution, bad command line arguments, missing levels and missing variables.
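
To give a feel for what one such test involves, here is a minimal Python 3 sketch: run the utility, then compare the return code and the printed message against what is expected. This is not the code in *checkgrib_test.py*; the helper names `run_check`, `checkgrib_cmd` and `expect_result` are hypothetical, and the argument order simply follows the usage examples above.

```python
import subprocess
import sys


def run_check(cmd):
    """Run a command and return (return_code, combined stdout/stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def checkgrib_cmd(exe, source, levels, path):
    """Build the argument list in the form shown in the usage examples."""
    return [exe, "--source", source, "--levels", str(levels), path]


def expect_result(cmd, expected_code, expected_text):
    """True if the command exits with expected_code and prints expected_text."""
    code, out = run_check(cmd)
    return code == expected_code and expected_text in out
```

With a compiled *checkGRIB* and the test files in place, a call such as `expect_result(checkgrib_cmd("./checkGRIB", "NCEP", 137, "test/gribfiles/ecmwf0p5/EN19011112"), 1, "Unexpected grib_centre")` would mirror the incorrect-source example from the Usage section.
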
Obviously, not every potential problem is tested, but a broad assortment of spot checks leaves me confident that the utility is robust.

The testing package requires the default Python v2.7 found on *devlan* (it also runs with Python v3.6), and the command line grib tools from GRIB-API. The grib tools are required because some of the tests involve taking a good GRIB file and making it bad. I decided it would be better to do this programmatically rather than requiring a large number of GRIB files for testing.

The current tests include:

```
--------test_missing_source_cli_arg_detected--------
--------test_invalid_source_cli_arg_detected-------
--------test_missing_levels_cli_arg_detected--------
--------test_checkanl_and_checkfcst_detected--------
--------test_reports_no_cli_args--------
--------test_reports_no_path_args--------
--------test_successful_ecmwfgrib2_check--------
--------test_successful_ecmwfgrib_ef_check--------
--------test_reports_ecmwfgrib_ef_fails_with_checkgrib2--------
--------test_reports_unexpected_ecmwf_grib_center--------
--------test_reports_unexpected_ecmwf_level--------
--------test_reports_missing_ecmwf_levels--------
--------test_reports_failed_ecmwf_anl_check--------
--------test_reports_failed_ecmwf_fcst_check--------
--------test_ecmwf_detects_missing_level_14--------
--------test_ecmwf_detects_missing_level_14_etadot--------
--------test_ecmwf_detects_missing_var_lsp--------
--------test_ecmwf_detects_grib1_nsss--------
--------test_reports_unexpected_ncep_grib_center--------
--------test_successful_ncepgrib2_check--------
--------test_detects_ncep_more_pressure_levels_than_expected------
--------test_detects_ncep_fewer_pressure_levels_than_expected------
--------test_ncep_detects_missing_level_850------
--------test_ncep_detects_missing_level_100_r------
--------test_ncep_detects_missing_var_10u------
--------test_ncep_detects_missing_var_tsig1------
--------test_ncep_detects_grib1_2t------
--------test_detects_ncepfv3_more_pressure_levels_than_expected------
--------test_detects_ncepfv3_fewer_pressure_levels_than_expected------
--------test_ncepfv3_detects_missing_level_850------
--------test_ncepfv3_detects_missing_level_100_r------
--------test_ncepfv3_detects_missing_var_10u------
--------test_ncepfv3_detects_missing_var_tsig1------
--------test_ncepfv3_detects_grib1_2t------
```

Successful execution of the tests looks like

```
$ ./checkgrib_test.py
Compiling checkGRIB in dir: /home/morton/git/MyFlexpartTools/CTBTO_SWEATM/WO02/checkgrib
compile_passed: True
--------test_missing_source_cli_arg_detected--------
args missing or out of sync
Usage: checkgrib --source [ECMWF | NCEP | NCEPFV3] --levels [ --checkfcst | --checkanl | --checkgrib2 ] path1 path2 ...
passed: True
-----------------------------------------------
.
.
.
--------test_ecmwf_detects_missing_var_lsp--------
ERROR: lsp: not found
Failed on: /tmp/1fbd642c-316e-4fc6-910a-24870fc2611b.gr2
passed: True
-----------------------------------------------
.
.
.
--------test_ncep_detects_grib1_2t------
Grib message not GRIB2
ERROR: Bad gribmsg_shortname: 2t, gribmsg_level: 2
Failed on: /tmp/badeefa8-2ab8-4474-849c-45ac2ec4ac44.gr2
passed: True
-----------------------------------------------
.
.
.
--------test_ncepfv3_detects_grib1_2t------
Grib message not GRIB2
ERROR: Bad gribmsg_shortname: 2t, gribmsg_level: 2
Failed on: /tmp/da9dd1fc-d843-4b9e-ab74-82ea8769fbe3.gr2
passed: True
-----------------------------------------------
************************
Passed tests: 35
Failed tests: 0
************************
```

## Testing files

The testing package requires a set of GRIB files. I have this set up so that it can use either a prepared package of files or files in place at CTBTO.

### Prepared test files

These are available at http://borealscicomp.com/CTBTO_SWEATM/checkgrib_testfiles/gribfiles/.
Unfortunately, because they contain ECMWF files and ECMWF is very strict about the posting of such files, I have to make these non-readable. If somebody wants to retrieve them, they should notify me and I can make them temporarily readable. Once readable, they can be placed in the test package by going to the *checkgrib/test/* directory, then

```
$ wget --recursive --no-parent --cut-dirs=2 -nH -R "index.html*" --execute robots=off http://borealscicomp.com/CTBTO_SWEATM/checkgrib_testfiles/gribfiles
```

Then, in *checkgrib_test.py*, be sure to set the following

```
ECMWF_PREFIX = 'gribfiles/ecmwf0p5'
NCEP_PREFIX = 'gribfiles/ncep0p5'
NCEPFV3_PREFIX = 'gribfiles/ncepfv30p5'
```

There is a *gribfiles* entry in *test/.gitignore* so that these large files won't be committed to the repo.

### Using CTBTO files

#WARNING - this section is not relevant right now#

The ncep subdir structure now has subdirectories *0.5* and *0.5.fv3*, which would require some recoding to make this work correctly.

In *checkgrib_test.py* one would want to set (for example)

```
CTBTO_PREFIX = '/ops/data/atm'
ECMWF_PREFIX = CTBTO_PREFIX + '/ecmwf/2019/01/11/0.5'
NCEP_PREFIX = CTBTO_PREFIX + '/ncep/2019/01/08/0.5'
```

Of course, you would want to make sure that the paths are actually valid, as they may change over time.

----

## Notes

### ECMWF notes

* Depending on the version of *GRIB-API* used, snow depth in ECMWF files may have a *shortName* of *sd* or *sde*. The utility will handle both cases.
* EF files seem to have the old standard GRIB2 for model level variables and GRIB1 for the surface variables, and tests have been adjusted to account for this.

### NCEP notes

* Vertical wind, *w*, seems to be available only at pressure levels of 100mb and lower in altitude, so that's all I'm checking for, and it's done in its own function, *ncep_all_expected_w_levels_present()*. I have verified through my own operations with other NCEP files that *w* is not available for levels higher than 100mb.
* It doesn't seem like total precipitation, *tc*, convective precipitation, *acpcp*, or total cloud cover, *tcc*, are available in the NCEP files being downloaded to CTBTO. They used to be available. So, I'm not checking for those right now (I've commented out the checks in *ncep_all_2dvars_present()*).
* For 2-meter RH, one needs to look for leveltype *heightAboveGround* and then, for older GRIB-API, look for *r* specifically at *level* 2, because there are other messages with a *shortName* of *r*. For newer GRIB-API, however, the *shortName* to look for is a unique *2r* (also at *level* 2, but that doesn't really matter once you have the *2r*). The utility handles both situations.

### NCEP FV3 notes

* The notes concerning NCEP, above, apply.
* NCEP has added 15 mb and 40 mb pressure levels to the GRIB files, but only for the temperature variable and some other variables we don't care about. This ends up breaking the NCEP-checking code, so FV3-specific code has been added.

### Forecast vs analysis messages

This got complicated and confusing and is not complete, pending a better understanding of how we really want to categorise a GRIB file as being *analysis* or *forecast*.

* GRIB messages are checked for analysis or forecast by looking at the *dataType* field for values of *an* or *fc*. I'm a little confused about these values, and I have some suspicion that they may not always be correct. There is also a *forecastTime* field that "seems" to be zero for what might be analysis messages and nonzero (number of hours since the last analysis) for forecast messages.
* For ECMWF, it appears that the analysis files (as determined by a *dataType* of *an*) are 12Z. Most of the messages are *an* in the 12Z files, yet there are six messages in these files that are *fc*:

```
2            ecmf 20190111     fc regular_ll surface      0 0 lsp grid_jpeg
2            ecmf 20190111     fc regular_ll surface      0 0 acpcp grid_jpeg
2            ecmf 20190111     fc regular_ll surface      0 0 sshf grid_jpeg
2            ecmf 20190111     fc regular_ll surface      0 0 ewss grid_jpeg
2            ecmf 20190111     fc regular_ll surface      0 0 nsss grid_jpeg
2            ecmf 20190111     fc regular_ll surface      0 0 ssr grid_jpeg
```

* Meanwhile, the non-12Z ECMWF files seem to have a *dataType* of *fc* for all messages except for four:

```
2            ecmf 20190111     an regular_ll surface      0 0 sdor grid_jpeg
2            ecmf 20190111     an regular_ll surface      0 0 cvl grid_jpeg
2            ecmf 20190111     an regular_ll surface      0 0 cvh grid_jpeg
2            ecmf 20190111     an regular_ll surface      0 0 sr grid_jpeg
```

The problem with all of this is that a test for a successful *fcst* or *anl* file will always fail, since the two message types are currently mixed.

* For NCEP files, it appears that all messages in all files are *fc*.

Meanwhile, correspondence with both Leo and Henrik suggests that people "generally" assume that 00/06/12/18 files are *analysis* (though some messages will still be *forecast*).

So, the problems are

* For ECMWF files, I "think" we will always have a mixture of *analysis* and *forecast* messages
* The use of *dataType* values of *fc* or *an* doesn't seem to correlate with the expected 00/06/12/18 *analysis* files.

So, in short, we need to better define what we want to look for. The code is generally in the utility, but would need to be modified for specific definitions.
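
Purely as an illustration of what one such definition could look like (it is not what the utility does, and not an agreed definition), the Python 3 sketch below classifies an ECMWF file from its messages while tolerating the exceptions listed above. The function name, the two exception sets and the idea of treating those short names as allowed exceptions are all assumptions, based only on the 2019-01-11 samples shown in this section.

```python
# Assumed exception lists, taken from the sample listings in this section.
FC_ALLOWED_IN_ANALYSIS = {"lsp", "acpcp", "sshf", "ewss", "nsss", "ssr"}
AN_ALLOWED_IN_FORECAST = {"sdor", "cvl", "cvh", "sr"}


def classify_ecmwf_file(messages):
    """Classify a file from an iterable of (dataType, shortName) pairs.

    Returns "analysis", "forecast" or "undetermined".
    """
    messages = list(messages)
    an_names = {name for dtype, name in messages if dtype == "an"}
    fc_names = {name for dtype, name in messages if dtype == "fc"}

    # Messages that are not explained by the known exceptions
    core_an = an_names - AN_ALLOWED_IN_FORECAST
    core_fc = fc_names - FC_ALLOWED_IN_ANALYSIS

    if core_an and not core_fc:
        return "analysis"
    if core_fc and not core_an:
        return "forecast"
    return "undetermined"
```

Under this definition, a file whose only *fc* messages are the six listed above is still reported as *analysis*, and a file with no *an* messages at all (as seems to be the case for NCEP) comes out as *forecast*, which may or may not be what is wanted for the 00/06/12/18 files.
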