Opened 5 weeks ago

Last modified 31 hours ago

#252 new Defect

FLEXPART-WRF crashes for array bound mismatch

Reported by: harish
Owned by:
Priority: major
Milestone:
Component: FP coding/compilation
Version: FLEXPART-WRF
Keywords: mpi, sendint_mpi
Cc:

Description

Hi,
FLEXPART-WRF version 3.3.2 crashes after about five hours of a backward-mode run when it is launched with mpirun.

Using the -fbacktrace and -fbounds-check options, I found that it crashes at line 132 of sendint_mpi.f90 (subroutine sendint_mpi):

if (tag.eq.1) npoint(jj2+1:numpart2)=mpi_npoint(1:chunksize2)

with error

Fortran runtime error: Array bound mismatch for dimension 1 of array 'npoint' (2500/2511)

My input file is similar to flexwrf.input.backward1 provided in the flexwrf_v31_testcases.tar.gz except for the release locations and dates.

Also, par_mod.f90 has been modified as follows to accommodate the larger grid of the WRF data being used:

integer, parameter :: nxmax=721, nymax=361, nuvzmax=64, nwzmax=64, nzmax=64

The job was submitted with the LSF command bsub < mpijobfile, where mpijobfile contains:

mpirun -np 12 ./flexwrf33_gnu_mpi flexwrf.input.backward1

In serial mode the code runs without crashing. However, the grid_time_yyyymmddhhmmss files then contain only zero values, and each file is 372 bytes in size. I am not sure whether these two issues are related.

Any help or hints toward resolving this issue would be highly appreciated. The input file used is attached, and I will be glad to provide more information.

Harish

Attachments (1)

flexwrf.input.backward1 (6.2 KB) - added by harish 5 weeks ago.
input file used for the run


Change History (6)

Changed 5 weeks ago by harish

input file used for the run

comment:2 Changed 4 weeks ago by pesei

Thank you for reporting this issue. I am currently on leave and away from my office, so I regret that I cannot investigate the issue soon. I suggest that you also post this to the FLEXPART mailing list, in the hope that other users might be able to help.

comment:3 Changed 33 hours ago by harish

I think the error comes from the way the chunksize2 variable is calculated at lines 66 to 71 of sendint_mpi.f90. The lines are shown below.

ii=0
do jj=1, numpart2, ntasks
  ii = ii + 1
  jj2 = jj
enddo
chunksize2=ii+numpart2-jj2

The ntasks variable is the number of MPI tasks over which the job is distributed, and numpart2 is the total number of particles being released in the simulation. When numpart2 is an exact multiple of ntasks, the value calculated for chunksize2 is wrong.
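The arithmetic of the quoted loop can be checked in isolation. The following is a minimal standalone sketch, not part of FLEXPART-WRF: it copies the quoted statements into a small test program. The value ntasks = 12 is taken from the mpirun -np 12 command in the description. The value numpart2 = 30000 is an assumption, chosen only because it is an exact multiple of 12; the ticket does not state the real particle count. Under that assumption the loop runs 2500 times and chunksize2 comes out as 2511, which happens to be the same pair of numbers as in the bounds-check error (2500/2511).

```fortran
program chunksize2_check
  implicit none
  integer :: ii, jj, jj2, numpart2, ntasks, chunksize2

  ntasks   = 12      ! from "mpirun -np 12" in the description
  numpart2 = 30000   ! ASSUMPTION: illustrative value, an exact multiple of ntasks

  ! Same statements as the lines quoted from sendint_mpi.f90
  ii  = 0
  jj2 = 0            ! initialised here only so the sketch is well defined
  do jj = 1, numpart2, ntasks
     ii  = ii + 1
     jj2 = jj
  end do
  chunksize2 = ii + numpart2 - jj2

  print '(a,i0)', 'loop iterations (ii)  = ', ii
  print '(a,i0)', 'last loop index (jj2) = ', jj2
  print '(a,i0)', 'chunksize2            = ', chunksize2
end program chunksize2_check
```

It can be built with, for example, gfortran chunksize2_check.f90 -o chunksize2_check. Changing numpart2 to other values shows how chunksize2 varies with the remainder of numpart2 divided by ntasks.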

Until this bug is fixed, it can be avoided by deliberately choosing either the total number of particles to be released or the number of MPI tasks over which the job is distributed, so that the number of tasks is not an integer factor of the total number of particles.
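The condition in this workaround can be tested before submitting a job. The following is a minimal sketch that only encodes the rule stated above; the program name and the candidate particle counts are hypothetical and not part of FLEXPART-WRF. It uses the intrinsic function mod to flag particle counts that are exact multiples of the task count.

```fortran
program divisibility_check
  implicit none
  integer, parameter :: ntasks = 12                            ! from "mpirun -np 12"
  integer, parameter :: candidates(3) = [29999, 30000, 30001]  ! hypothetical particle counts
  integer :: k

  do k = 1, size(candidates)
     if (mod(candidates(k), ntasks) == 0) then
        ! ntasks divides the particle count exactly: the case reported as failing
        print '(i0,a,i0,a)', candidates(k), ' is a multiple of ', ntasks, ': avoid'
     else
        print '(i0,a,i0,a)', candidates(k), ' is not a multiple of ', ntasks, ': ok per the workaround'
     end if
  end do
end program divisibility_check
```

This check does not show that the other cases are handled correctly by sendint_mpi; it only reports which combinations match the case described in this comment.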

comment:4 Changed 33 hours ago by harish

If possible, this ticket should be moved from the Support category to the Defect category.

comment:5 Changed 31 hours ago by pesei

  • Type changed from Support to Defect