# checkGRIB

This utility was created primarily for CTBTO operations, to check
incoming ECMWF, NCEP and NCEPFV3 GRIB2 files before they are staged into the ATM
pipeline. The idea is to catch problems in the files long before they are
actually used, rather than having the problems surface, mysteriously, as
the termination of an application that relies on the files.

## Author information

Don Morton
Boreal Scientific Computing
Fairbanks, Alaska, USA
Don.Morton@borealscicomp.com

I'm horrible at legalese but, as far as I'm concerned, this utility is totally
free and open to the public for use and modification, and is included with
the FLEXPART distribution with the same permissions.

----

## Installation

Code in this directory is self-contained, but relies on an installation of
the GRIB-API library. Additionally, a test package is provided that depends
on Python v2.7 (it also works with Python 3.6) and the GRIB-API command line tools.

Compilation simply requires setting the `GRIB_API` variable in the *Makefile*
to the location of the local GRIB-API installation, and then invoking *make*.

----

## Usage

Usage is pretty straightforward, requiring the flags `--source` and `--levels`,
where the source is expected to be `ECMWF`, `NCEP` or `NCEPFV3`.
Originally, we assumed that all messages in a file were GRIB2, but we've learned that
this is not valid for EF files, so `--checkgrib2` is now an optional test, not
performed by default. The number of levels should be the expected number
of levels for 3D variables.

The utility currently checks that all variables required for FLEXPART
(including dry and wet deposition) are in the GRIB file, that all expected
levels are included, and that the source is, indeed, the expected source.

There are many more ambitious (and time-consuming) checks that could be
added in the future. For example, the GRIB messages have fields for max,
min and mean values, and we could actually read in each field and verify these.

The utility prints a simple message to *stdout* if all is well with a file.
If an error is found, a message is printed to *stdout* and the utility
aborts with a system return code of 1 to indicate a problem with the GRIB file,
or 2 to indicate a problem with the utility.

Some examples follow:

### A normal, successful test

```
$ ./checkGRIB --source ECMWF --levels 137 test/gribfiles/ecmwf0p5/EN19011112
Passed: test/gribfiles/ecmwf0p5/EN19011112

$ echo $?
0
```

### Checking with an incorrect source parameter

```
$ ./checkGRIB --source NCEP --levels 137 test/gribfiles/ecmwf0p5/EN19011112
ERROR: Unexpected grib_centre: 98
Failed on: test/gribfiles/ecmwf0p5/EN19011112

$ echo $?
1
```

### Non-existent file

```
$ ./checkGRIB --source NCEP --levels 137 Huh???
GRIB_API ERROR : IO ERROR: No such file or directory: Huh??? (No such file or directory)
ERROR: problem opening GRIB file: Huh???
Failed on: Huh???

$ echo $?
2
```

### Finding more levels than expected

```
$ ./checkGRIB --source NCEP --levels 27 test/gribfiles/ncep0p5/GD19010818
ERROR: Found more than 27 levels
Failed on: test/gribfiles/ncep0p5/GD19010818

$ echo $?
1
```

### Finding fewer levels than expected

```
$ ./checkGRIB --source NCEP --levels 48 test/gribfiles/ncep0p5/GD19010818
ERROR: Only found 31 levels
Failed on: test/gribfiles/ncep0p5/GD19010818

$ echo $?
1
```

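
### Acting on the return code from a script

The return codes shown in the examples above are easy to act on from a staging script. What follows is only a rough sketch of how that might look, not something shipped with the utility: the helper names `describe_return_code` and `check_file` are made up for illustration, and the default `./checkGRIB` path is an assumption about where the executable lives.

```python
import subprocess

# Return codes as described in the Usage section:
# 0 = passed, 1 = problem with the GRIB file, 2 = problem with the utility.
RC_OK, RC_BAD_GRIB, RC_UTILITY_PROBLEM = 0, 1, 2


def describe_return_code(rc):
    """Map a checkGRIB return code to a short verdict (hypothetical helper)."""
    if rc == RC_OK:
        return "passed"
    if rc == RC_BAD_GRIB:
        return "bad GRIB file"
    if rc == RC_UTILITY_PROBLEM:
        return "utility problem"
    return "unexpected return code %d" % rc


def check_file(path, source, levels, exe="./checkGRIB"):
    """Run checkGRIB on one file and return (return_code, verdict).

    The exe default is an assumption; point it at the compiled binary.
    """
    cmd = [exe, "--source", source, "--levels", str(levels), path]
    rc = subprocess.call(cmd)
    return rc, describe_return_code(rc)


# Example call (requires the compiled utility and a GRIB file):
#   rc, verdict = check_file("test/gribfiles/ecmwf0p5/EN19011112", "ECMWF", 137)
```
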

----

## Structure

All of the GRIB-related code is in the module *cgutils.F90*. There is a
lot of "file-specific" code in there for ECMWF, NCEP and NCEP FV3 GRIB files.
The main program, *checkGRIB.F90*, merely collects and parses command
line arguments, and invokes the appropriate routines in the *cgutils* module.

----

## Testing

A comprehensive testing package, located in the *test* subdirectory,
was developed to test this utility under a wide variety of conditions,
including successful execution, bad command line arguments, missing levels
and missing variables. Obviously, not every potential problem is tested,
but a broad assortment of "spot-checking" leaves me confident that the utility
is robust.

The testing package requires the default Python v2.7 found on *devlan*
(it also runs with Python v3.6), and the command line grib tools from
GRIB-API. The grib tools are required because some of the tests involve
taking a good GRIB file and making it bad. I decided it would be better
to do this programmatically rather than requiring a large number of GRIB
files for testing.

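
To give a feel for what "making a good file bad" can look like, here is a minimal sketch for the missing-variable case. It is not taken from *checkgrib_test.py*, which may well do this differently. It assumes the `grib_copy` tool and its `-w` (where-clause) option from the GRIB-API command line tools; the function names `grib_copy_without_var` and `make_bad_copy` are hypothetical.

```python
import os
import subprocess
import tempfile


def grib_copy_without_var(in_path, out_path, short_name):
    """Build a grib_copy command that copies every message except those
    whose shortName equals short_name (hypothetical helper)."""
    return ["grib_copy", "-w", "shortName!=%s" % short_name, in_path, out_path]


def make_bad_copy(good_path, short_name):
    """Write a temporary copy of good_path with one variable removed and
    return its path. Assumes grib_copy is on the PATH."""
    fd, bad_path = tempfile.mkstemp(suffix=".gr2")
    os.close(fd)  # grib_copy writes the output file itself
    subprocess.check_call(grib_copy_without_var(good_path, bad_path, short_name))
    return bad_path
```

A test along the lines of `test_ecmwf_detects_missing_var_lsp` could then run the utility on the file returned by `make_bad_copy(good_file, "lsp")` and expect a return code of 1.
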

The current tests include:

```
--------test_missing_source_cli_arg_detected--------
--------test_invalid_source_cli_arg_detected-------
--------test_missing_levels_cli_arg_detected--------
--------test_checkanl_and_checkfcst_detected--------
--------test_reports_no_cli_args--------
--------test_reports_no_path_args--------
--------test_successful_ecmwfgrib2_check--------
--------test_successful_ecmwfgrib_ef_check--------
--------test_reports_ecmwfgrib_ef_fails_with_checkgrib2--------
--------test_reports_unexpected_ecmwf_grib_center--------
--------test_reports_unexpected_ecmwf_level--------
--------test_reports_missing_ecmwf_levels--------
--------test_reports_failed_ecmwf_anl_check--------
--------test_reports_failed_ecmwf_fcst_check--------
--------test_ecmwf_detects_missing_level_14--------
--------test_ecmwf_detects_missing_level_14_etadot--------
--------test_ecmwf_detects_missing_var_lsp--------
--------test_ecmwf_detects_grib1_nsss--------
--------test_reports_unexpected_ncep_grib_center--------
--------test_successful_ncepgrib2_check--------
--------test_detects_ncep_more_pressure_levels_than_expected------
--------test_detects_ncep_fewer_pressure_levels_than_expected------
--------test_ncep_detects_missing_level_850------
--------test_ncep_detects_missing_level_100_r------
--------test_ncep_detects_missing_var_10u------
--------test_ncep_detects_missing_var_tsig1------
--------test_ncep_detects_grib1_2t------
--------test_detects_ncepfv3_more_pressure_levels_than_expected------
--------test_detects_ncepfv3_fewer_pressure_levels_than_expected------
--------test_ncepfv3_detects_missing_level_850------
--------test_ncepfv3_detects_missing_level_100_r------
--------test_ncepfv3_detects_missing_var_10u------
--------test_ncepfv3_detects_missing_var_tsig1------
--------test_ncepfv3_detects_grib1_2t------
```

Successful execution of the tests looks like:

```
$ ./checkgrib_test.py
Compiling checkGRIB in dir: /home/morton/git/MyFlexpartTools/CTBTO_SWEATM/WO02/checkgrib
compile_passed: True
--------test_missing_source_cli_arg_detected--------
args missing or out of sync

Usage:

checkgrib --source [ECMWF | NCEP | NCEPFV3] --levels <int>
[ --checkfcst | --checkanl | --checkgrib2 ]
path1 path2 ...

passed: True
-----------------------------------------------
.
.
.
--------test_ecmwf_detects_missing_var_lsp--------
ERROR: lsp: not found
Failed on: /tmp/1fbd642c-316e-4fc6-910a-24870fc2611b.gr2
passed: True
-----------------------------------------------
.
.
.
--------test_ncep_detects_grib1_2t------
Grib message not GRIB2
ERROR: Bad gribmsg_shortname: 2t, gribmsg_level: 2
Failed on: /tmp/badeefa8-2ab8-4474-849c-45ac2ec4ac44.gr2
passed: True
-----------------------------------------------
.
.
.
--------test_ncepfv3_detects_grib1_2t------
Grib message not GRIB2
ERROR: Bad gribmsg_shortname: 2t, gribmsg_level: 2
Failed on: /tmp/da9dd1fc-d843-4b9e-ab74-82ea8769fbe3.gr2
passed: True
-----------------------------------------------

************************
Passed tests: 35
Failed tests: 0
************************

```

## Testing files

The testing package requires a set of GRIB files. I have this set up
so that it can use either a prepared package of files or files in place at
CTBTO.

### Prepared test files

These are available at http://borealscicomp.com/CTBTO_SWEATM/checkgrib_testfiles/gribfiles/. Unfortunately, because they include ECMWF files, and ECMWF is
very strict about the posting of such files, I have to make them non-readable.
If somebody wants to retrieve them, they should notify me and I can make them
temporarily readable. Once readable, they can be placed in the test
package by going to the *checkgrib/test/* directory and running

```
$ wget --recursive --no-parent --cut-dirs=2 -nH -R "index.html*" --execute robots=off http://borealscicomp.com/CTBTO_SWEATM/checkgrib_testfiles/gribfiles
```

Then, in *checkgrib_test.py*, be sure to set the following:

```
ECMWF_PREFIX = 'gribfiles/ecmwf0p5'
NCEP_PREFIX = 'gribfiles/ncep0p5'
NCEPFV3_PREFIX = 'gribfiles/ncepfv30p5'
```

There is a *gribfiles* entry in *test/.gitignore* so that these large files
won't be committed to the repo.

### Using CTBTO files

**WARNING: this section is not relevant right now.**

The ncep subdir structure now has subdirectories *0.5* and
*0.5.fv3*, which would require some recoding to make this work
correctly.

In *checkgrib_test.py* one would want to set (for example)

```
CTBTO_PREFIX = '/ops/data/atm'
ECMWF_PREFIX = CTBTO_PREFIX + '/ecmwf/2019/01/11/0.5'
NCEP_PREFIX = CTBTO_PREFIX + '/ncep/2019/01/08/0.5'
```

Of course, you would want to make sure that the paths are actually valid,
as they may change over time.

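
One quick way to confirm that the configured prefixes exist before launching the tests is sketched below. This is not part of *checkgrib_test.py*; the helper name `missing_prefixes` is made up, and it only checks that each path is a directory, not that the expected GRIB files are inside it.

```python
import os


def missing_prefixes(prefixes):
    """Return the sorted names of prefixes whose directory does not exist.

    prefixes maps a setting name (e.g. 'ECMWF_PREFIX') to its path.
    Hypothetical helper, shown for illustration only.
    """
    return sorted(name for name, path in prefixes.items()
                  if not os.path.isdir(path))


# Example call, using the setting names from checkgrib_test.py:
#   bad = missing_prefixes({'ECMWF_PREFIX': ECMWF_PREFIX,
#                           'NCEP_PREFIX': NCEP_PREFIX})
#   if bad:
#       raise SystemExit('Invalid GRIB prefixes: ' + ', '.join(bad))
```
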

----

## Notes

### ECMWF notes

* Depending on the version of *GRIB-API* used, snow depth in ECMWF files may
have a *shortName* of *sd* or *sde*. The utility will handle both cases.

* EF files seem to have the old standard GRIB2 for model level variables
and GRIB1 for the surface variables, and tests have been adjusted to
account for this.

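
The *sd*/*sde* situation boils down to accepting any one of several *shortName* aliases for a required variable. The utility does this in Fortran in *cgutils.F90*; the Python below is only an illustrative sketch of the idea, with a made-up table `ACCEPTED_SHORTNAMES` and helper `missing_variables` that do not exist in the code base.

```python
# Each required variable maps to the shortNames that satisfy it.
# Only the sd/sde pair comes from the note above; lsp is included
# as an example of a variable with a single accepted name.
ACCEPTED_SHORTNAMES = {
    "snow depth": ("sd", "sde"),
    "large-scale precipitation": ("lsp",),
}


def missing_variables(found_shortnames, accepted=ACCEPTED_SHORTNAMES):
    """Return the sorted names of required variables for which none of
    the accepted shortNames was found (hypothetical helper)."""
    found = set(found_shortnames)
    return sorted(var for var, names in accepted.items()
                  if not found.intersection(names))
```

With this table, a file containing *sde* and *lsp* is reported as complete, and so is one containing *sd* and *lsp*.
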

### NCEP notes

* Vertical wind, *w*, seems to be available only at pressure levels of 100mb
and lower in altitude, so that's all I'm checking for, and it's done in its
own function, *ncep_all_expected_w_levels_present()*. I have verified
through my own operations with other NCEP files that *w* is not available
for levels higher in altitude than 100mb.

* It doesn't seem like total precipitation, *tc*, convective precipitation,
*acpcp*, or total cloud cover, *tcc*, are available in the NCEP files being
downloaded to CTBTO. They used to be available. So, I'm not checking for
those right now (I've commented out the checks in *ncep_all_2dvars_present()*).

* For 2-meter RH, one needs to look for leveltype *heightAboveGround* and
then, for older GRIB-API, look for *r* specifically at *level* 2, because
there are other messages with *shortName* of *r*. But, for newer GRIB-API
the *shortName* to look for is a unique *2r* (also at *level* 2, but that
doesn't really matter once you have the *2r*). The utility handles both
situations.

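
The 2-meter RH rule in the last note can be written down compactly. The sketch below is an illustration in Python, not the Fortran in *cgutils.F90*, and the function name `is_2m_rh` and its argument names are hypothetical.

```python
def is_2m_rh(short_name, level_type, level):
    """Return True if a message's keys identify 2-meter relative humidity.

    Mirrors the rule in the note above: the level type must be
    heightAboveGround, and then either the newer, unique shortName 2r
    or the older shortName r at level 2 is accepted.
    """
    if level_type != "heightAboveGround":
        return False
    if short_name == "2r":
        return True  # newer GRIB-API: the shortName alone is enough
    return short_name == "r" and level == 2  # older GRIB-API
```

The level test matters only for the older *r* name, since *r* also appears in messages at other levels.
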

### NCEP FV3 notes

* The notes concerning NCEP, above, apply.

* NCEP has added 15 mb and 40 mb pressure levels to the GRIB files, but only for
the temperature variable and some other variables we don't care about. This
ends up breaking the NCEP-checking code, so FV3-specific code has been added.

### Forecast vs analysis messages

This got complicated and confusing and is not complete, pending a more
complete future understanding of how we really want to categorise a GRIB
file as being *analysis* or *forecast*.

* GRIB messages are checked for analysis or forecast by looking at the
*dataType* field for values of *an* or *fc*. I'm a little confused about
these values, and I have some suspicion that they may not always be correct.
There is also a *forecastTime* field that "seems" to be zero for what might
be analysis messages and nonzero (number of hours since last analysis)
for forecast messages.

* For ECMWF, it appears that the analysis files (as determined by a
*dataType* of *an*) are 12Z. Most of the messages are *an* in the 12Z files,
yet there are six messages in these files that are *fc*:

```
2 ecmf 20190111 fc regular_ll surface 0 0 lsp grid_jpeg
2 ecmf 20190111 fc regular_ll surface 0 0 acpcp grid_jpeg
2 ecmf 20190111 fc regular_ll surface 0 0 sshf grid_jpeg
2 ecmf 20190111 fc regular_ll surface 0 0 ewss grid_jpeg
2 ecmf 20190111 fc regular_ll surface 0 0 nsss grid_jpeg
2 ecmf 20190111 fc regular_ll surface 0 0 ssr grid_jpeg
```

* Meanwhile, the non-12Z ECMWF files seem to have *dataType* of *fc* for all
messages except for four:

```
2 ecmf 20190111 an regular_ll surface 0 0 sdor grid_jpeg
2 ecmf 20190111 an regular_ll surface 0 0 cvl grid_jpeg
2 ecmf 20190111 an regular_ll surface 0 0 cvh grid_jpeg
2 ecmf 20190111 an regular_ll surface 0 0 sr grid_jpeg
```

The problem with all of this is that a test for a successful *fcst* or *anl*
ECMWF file will always fail, since the message types are currently mixed.

* For NCEP files, it appears that all messages in all files are *fc*.

Meanwhile, correspondence with both Leo and Henrik suggests that people
"generally" assume that 00/06/12/18 files are *analysis* (though some
messages will still be *forecast*). So, the problems are:

* For ECMWF files, I "think" we will always have a mixture of *analysis*
and *forecast* messages.
* The use of *dataType* values of *fc* or *an* doesn't seem to correlate
with the expected 00/06/12/18 *analysis* files.

So, in short, we need to better define what we want to look for.
The code is generally in the utility, but would need to be modified for
specific definitions.

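
To make the "mixed" problem concrete, here is a small sketch that labels a file from the *dataType* values of its messages. It is an illustration only and says nothing about how `--checkanl` and `--checkfcst` are implemented; the function name `classify_by_datatype` is made up, and the list of *dataType* strings is assumed to have been extracted from the file beforehand (for example with the GRIB-API command line tools).

```python
from collections import Counter


def classify_by_datatype(data_types):
    """Label a file from the dataType values of its messages.

    Returns (label, counts), where label is 'analysis' if every message
    is 'an', 'forecast' if every message is 'fc', 'empty' if there are
    no messages, and 'mixed' otherwise. Hypothetical helper.
    """
    counts = Counter(data_types)
    kinds = set(counts)
    if not kinds:
        label = "empty"
    elif kinds == {"an"}:
        label = "analysis"
    elif kinds == {"fc"}:
        label = "forecast"
    else:
        label = "mixed"
    return label, dict(counts)
```

Under a strict rule like this one, a 12Z ECMWF file with mostly *an* messages plus the six *fc* messages listed above comes out as "mixed", which is why a looser definition (a threshold, say, or a list of messages allowed to differ) would have to be agreed on first.
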