Opened 6 years ago
Last modified 4 years ago
#234 new Defect
Segmentation fault in interpol_wind_short
| Reported by: | ahilboll | Owned by: | |
| --- | --- | --- | --- |
| Priority: | major | Milestone: | |
| Component: | FP coding/compilation | Version: | FLEXPART 10.3 |
| Keywords: | | Cc: | |
Description
I'm running FLEXPART 10.3beta with ERA-5 meteorological data which I downloaded using flex_extract; here are the relevant parts of the CONTROL file I used:
```
M_GRID     250
M_LEFT     -155000
M_LOWER    35000
M_UPPER    70000
M_RIGHT    -40000
M_LEVELIST 60/to/137
```
After some days of simulation, I'm running into a segmentation fault:
```
 call releaseparticles
 timemanager> SYSTEM CLOCK   6259.19092
 timemanager> call convmix -- forward
 timemanager> call advance
At line 125 of file interpol_wind_short.f90
Fortran runtime error: Index '-2147483648' of dimension 2 of array 'uu' below lower bound of 0

Error termination. Backtrace:
#0  0x493c4e in interpol_wind_short_
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/interpol_wind_short.f90:131
#1  0x5ee861 in advance_
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/advance.f90:911
#2  0x43cc02 in timemanager
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/timemanager.f90:534
#3  0x441656 in flexpart
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/FLEXPART.f90:455
#4  0x441b7c in main
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/FLEXPART.f90:50
```
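As a side note, `-2147483648` is `INT_MIN` for a 32-bit integer, which is what you typically get on x86 when a NaN coordinate is truncated to an integer grid index (my interpretation, not something I verified in the generated code). NaN also slips past ordinary range checks, which a quick Python session illustrates:

```python
xt = float("nan")

# Every ordered comparison with NaN is False, so bounds checks of the form
# (xt.lt.0.) .or. (xt.gt.real(nxmin1)) never trigger for a NaN coordinate,
# and it reaches the interpolation routine untouched.
print(xt < 0.0)    # False
print(xt > 100.0)  # False
print(xt != xt)    # True -- the one comparison NaN cannot pass

# Python refuses to truncate NaN to an integer; Fortran's INT() has no such
# guard, and on x86 the hardware conversion yields INT_MIN = -2147483648.
try:
    int(xt)
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer
```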
So I re-ran the simulation using IPOUT=1 to get an idea of each particle's evolution, and I can see that in the last partposit file before the segmentation fault, there is at least one particle very close to the edge of the meteo domain:
```
In [2]: read_partposit('output_partdump/partposit_20180821160000')[1].ylat.min()
Out[2]: 35.000393
```
So it seems to me that there is a problem when applying the Petterssen scheme to particles at the edges of the meteo domain. From a logical point of view, I believe that in advance.f90, before applying the Petterssen scheme, one should check whether the particle is still within the meteo domain, and terminate the particle otherwise. There are already several other return statements in there, so maybe just adding another check-and-return is enough?
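To make the idea concrete, here is a rough sketch of the guard I have in mind (in Python rather than Fortran, just for illustration; `particle_in_domain`, `nxmin1` and `nymin1` are hypothetical stand-ins for the corresponding bounds in advance.f90):

```python
import math

def particle_in_domain(xt, yt, nxmin1, nymin1):
    """Return True if (xt, yt) is a usable grid coordinate.

    Hypothetical helper: nxmin1/nymin1 stand in for the domain bounds
    already used by the range checks in advance.f90.  NaN coordinates
    are rejected explicitly via math.isnan(), since they pass no
    ordered comparison.
    """
    if math.isnan(xt) or math.isnan(yt):
        return False
    return 0.0 <= xt <= float(nxmin1) and 0.0 <= yt <= float(nymin1)

# A particle outside the domain (or with an undefined position) would get
# nstop=3 and be terminated before the Petterssen corrector step:
print(particle_in_domain(10.5, 20.5, 100, 50))          # True
print(particle_in_domain(-0.1, 20.5, 100, 50))          # False
print(particle_in_domain(float("nan"), 20.5, 100, 50))  # False
```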
I'm happy to try and contribute a patch, but would like to hear your thoughts first.
Change History (6)
comment:1 Changed 6 years ago by pesei
comment:2 Changed 6 years ago by ahilboll
Okay, I have a dirty fix which prevents the crash but doesn't go to the root of the problem:
```diff
diff --git a/src/advance.f90 b/src/advance.f90
index 539a473..50b024d 100644
--- a/src/advance.f90
+++ b/src/advance.f90
@@ -828,6 +828,13 @@ subroutine advance(itime,nrelpoint,ldt,up,vp,wp, &
          (yt.gt.real(nymin1))) then
       nstop=3
       return
+    else if ((xt /= xt) .or. (yt /= yt)) then
+      ! AH: with non-global windfields, xt and yt can become NaN when using ERA-5
+      if (verbosity.gt.0) then
+        write (*,*) 'advance> terminating particle due to NaN in xt or yt'
+      endif
+      nstop=3
+      return
     endif
 
 ! If particle above highest model level, set it back into the domain
```
In my case, this allows the model to continue running without any problems.
By adding additional debug output inside that if clause, I could see that this case happens only once during my simulation.
I tried to dig a little deeper to get to the bottom of this, but at some point I stopped: dxsave and dysave both became NaN because u and v were NaN, but I couldn't figure out why.
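For illustration, the propagation chain looks like this (a minimal Python sketch; the variable names are borrowed from advance.f90, and the time step value is made up):

```python
import math

# One NaN wind component poisons the whole position update: every
# arithmetic operation involving NaN yields NaN again.
u = float("nan")   # interpolated wind components gone bad
v = float("nan")
dt = 900.0         # hypothetical integration time step [s]

dxsave = 0.0
dysave = 0.0
dxsave += u * dt   # NaN * dt = NaN, and 0.0 + NaN = NaN
dysave += v * dt

print(math.isnan(dxsave))  # True
print(math.isnan(dysave))  # True
```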
comment:3 Changed 6 years ago by ahilboll
P.S.: I should mention that this problem occurred in quite a few of my simulations. However, in the one test case I was debugging, the above if statement only evaluates to TRUE once during the simulation.
comment:4 Changed 5 years ago by pesei
Please let us know about the status.
comment:5 Changed 5 years ago by ahilboll
I implemented the patch above, as it allows the model to finish successfully. I did not further investigate why u and v can become NaN sometimes.
comment:6 Changed 4 years ago by pesei
- Component changed from FP other to FP coding/compilation
The question is: why did this never happen before? Has there been any change in v10 that could cause it? In principle, such problems should have been taken care of.