Opened 9 months ago

Last modified 8 months ago

#234 new Defect

Segmentation fault in interpol_wind_short

Reported by: ahilboll Owned by:
Priority: major Milestone:
Component: FP other Version: FLEXPART 10.3
Keywords: Cc:

Description

I'm running FLEXPART 10.3beta with ERA-5 meteorological data which I downloaded using flex_extract; here's the relevant parts of the CONTROL file I used:

M_GRID 250  
M_LEFT -155000
M_LOWER 35000
M_UPPER 70000
M_RIGHT -40000
M_LEVELIST 60/to/137

After some days of simulation, I'm running into a segmentation fault:

 call releaseparticles
 timemanager> SYSTEM CLOCK   6259.19092
 timemanager> call convmix -- forward
 timemanager> call advance
At line 125 of file interpol_wind_short.f90
Fortran runtime error: Index '-2147483648' of dimension 2 of array 'uu' below lower bound of 0

Error termination. Backtrace:
#0  0x493c4e in interpol_wind_short_
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/interpol_wind_short.f90:131
#1  0x5ee861 in advance_
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/advance.f90:911
#2  0x43cc02 in timemanager
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/timemanager.f90:534
#3  0x441656 in flexpart
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/FLEXPART.f90:455
#4  0x441b7c in main
        at /mnt/beegfs/user/hilboll/tmp/flexpart_era5/test_segfault/flexpart/src/FLEXPART.f90:50

So I re-ran the simulation using IPOUT=1 to get an idea of each particle's evolution, and I can see that in the last partposit file before the segmentation fault, there is at least one particle very close to the edge of the meteo domain:

In [2]: read_partposit('output_partdump/partposit_20180821160000')[1].ylat.min()
Out[2]: 35.000393

So it seems to me that there is a problem when applying the Petterssen scheme to particles at the meteo domain edges. From a logical point of view, I believe that in advance.f90, before applying the Petterssen scheme, one should check if the particle is still within the meteo domain, and terminate the particle otherwise. There are already several other return statements in there, so maybe just adding another check and return in there is enough?

I'm happy to try and contribute a patch, but would like to hear your thoughts first.

Change History (5)

comment:1 Changed 9 months ago by pesei

The question is, why did this never happen before? Has there been any change in v10 that could cause that? In principle such problems should have been taken care of.

comment:2 Changed 9 months ago by ahilboll

Okay, I have a dirty fix which prevents the crash but doesn't go to the root of the problem:

diff --git a/src/advance.f90 b/src/advance.f90
index 539a473..50b024d 100644
--- a/src/advance.f90
+++ b/src/advance.f90
@@ -828,6 +828,13 @@ subroutine advance(itime,nrelpoint,ldt,up,vp,wp, &
        (yt.gt.real(nymin1))) then
     nstop=3
     return
+  else if ((xt /= xt) .or. (yt /= yt)) then
+    ! AH: with non-global windfields, xt and yt can become NaN when using ERA-5
+    if (verbosity.gt.0) then
+      write (*,*) 'advance> terminating particle due to NaN in xt or yt'
+    endif
+    nstop=3
+    return
   endif
 
   ! If particle above highest model level, set it back into the domain

In my case, this allows the model to continue running without any problems.

By adding the additional debug output inside that if clause I could see that this case happens only once during my simulation.

I tried to dig a little deeper to get to the bottom of this, but at some point I stopped. dxsave and dysave both became NaN, because u and v were NaN, but I couldn't figure out why.

comment:3 Changed 9 months ago by ahilboll

P.S.: I should mention that this problem occurred with quite many simulations. However, in the one test case I was debugging, the above if statement only returns TRUE once during the simulation.

comment:4 Changed 8 months ago by pesei

Please let us know about the status.

comment:5 Changed 8 months ago by ahilboll

I implemented the patch above, as it allows the model to finish successfully. I did not further investigate why u and v can become NaN sometimes.

Note: See TracTickets for help on using tickets.
hosted by ZAMG