Opened 5 years ago

Last modified 9 months ago

#77 accepted Enhancement

Speeding up FLEXPART

Reported by: pesei
Owned by: pesei
Priority: major
Milestone: FLEXPART 9.2
Component: FP input data
Version: FLEXPART 9.0.2
Keywords:
Cc:

Description

Using 1-h meteorological input, CTBTO/PTS/IDC found that calculation times rose beyond the sustainable limit for their operational application, and asked for support.

With their specific setup and 119 levels, they found the following distribution of CPU time per subroutine for 3-h data:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ks/call  Ks/call  name
 24.25   9198.93  9198.93 2983679872     0.00     0.00  advance_
 14.62  14743.46  5544.53 1976235129     0.00     0.00  interpol_wind_
 12.12  19339.47  4596.02 1009684743     0.00     0.00  interpol_all_
 10.84  23453.55  4114.08 2983414505     0.00     0.00 interpol_wind_short_
  9.23  26954.62  3501.07      119     0.03     0.03  verttransform_
  8.08  30021.38  3066.76      119     0.03     0.03  calcpv_
  3.81  31466.55  1445.17        1     1.45    35.87  timemanager_
  3.62  32840.83  1374.28      119     0.01     0.01  readwind_
  2.98  33971.49  1130.66      469     0.00     0.00  conccalc_
  1.67  34606.72   635.24      119     0.01     0.03  calcpar_
...

whereas with 1-h data the distribution is:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ks/call  Ks/call  name
 21.00   8774.93  8774.93      359     0.02     0.03  verttransform_
 16.29  15581.52  6806.59 3573265152     0.00     0.00  advance_
 12.84  20947.59  5366.07      359     0.01     0.01  calcpv_
  9.18  24786.27  3838.68 889791459     0.00     0.00  interpol_all_
  7.70  28002.91  3216.65      359     0.01     0.01  readwind_
  7.34  31071.00  3068.08 3572065979     0.00     0.00 interpol_wind_short_
  6.84  33931.05  2860.06 2692433693     0.00     0.00  interpol_wind_
  4.13  35655.92  1724.86        1     1.72    40.23  timemanager_
  3.64  37177.09  1521.17      359     0.00     0.02  calcpar_
  3.48  38630.15  1453.06      715     0.00     0.00  conccalc_
  0.96  39033.10   402.94                             grib_jasper_decode 
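
The two listings can be compared routine by routine. The following is a minimal sketch, not part of the original report, that takes the self seconds and call counts of the routines called once per input field (verttransform_, calcpv_, readwind_) straight from the tables above and computes the cost per call and the combined cost. The function and variable names are illustrative only.

```python
# Self seconds and call counts copied from the two gprof listings above.
PROFILE = {
    "3-h": {
        "verttransform_": (3501.07, 119),
        "calcpv_": (3066.76, 119),
        "readwind_": (1374.28, 119),
    },
    "1-h": {
        "verttransform_": (8774.93, 359),
        "calcpv_": (5366.07, 359),
        "readwind_": (3216.65, 359),
    },
}


def per_call_seconds(profile):
    """Return {dataset: {routine: self seconds per call}}."""
    return {
        dataset: {name: secs / calls for name, (secs, calls) in routines.items()}
        for dataset, routines in profile.items()
    }


def total_self_seconds(profile):
    """Return {dataset: summed self seconds of the listed routines}."""
    return {
        dataset: sum(secs for secs, _ in routines.values())
        for dataset, routines in profile.items()
    }


if __name__ == "__main__":
    for dataset, routines in per_call_seconds(PROFILE).items():
        for name, secs in routines.items():
            print(f"{dataset} {name:<16} {secs:8.2f} s/call")
    for dataset, total in total_self_seconds(PROFILE).items():
        print(f"{dataset} combined self time: {total:.2f} s")
```

The cost per call stays of the same order in both runs, while the number of calls rises from 119 to 359, so the combined cost of these routines grows with the number of input fields.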

Change History (6)

comment:1 Changed 5 years ago by pesei

  • Owner changed from somebody to pesei
  • Status changed from new to accepted

Remedies under discussion are:

  1. Removing unnecessary levels -> ticket:67
  2. Commenting out calcpv.f90 as it is not needed for applications which do not involve PV-related actions or output
  3. Modifying verttransform.f90 so that it checks for an existing dump of the met input data already transformed from ECMWF eta levels to FLEXPART levels. If a dump is found, it is used; if not, verttransform runs and then writes the dump. This would be very useful in an operational environment where daily calculations are performed, shifted by 1 day, and where multiple instances of FLEXPART run on the same input data.
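
The check-then-dump logic of remedy 3 can be sketched as follows. This is an illustration in Python under stated assumptions, not the FLEXPART implementation: the function name `cached_transform`, the `.dump` suffix, the cache directory and the use of `pickle` are all hypothetical, and the real code would use Fortran I/O inside verttransform.f90.

```python
import os
import pickle
import tempfile


def cached_transform(input_path, cache_dir, transform):
    """Return transform(input_path), reusing a dump in cache_dir if present.

    `transform` stands in for the vertical transformation; it is any
    callable that takes the input path and returns a picklable result.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hypothetical naming scheme: one dump per input file name.
    dump_path = os.path.join(cache_dir, os.path.basename(input_path) + ".dump")

    # Dump found: use it and skip the transformation.
    if os.path.exists(dump_path):
        with open(dump_path, "rb") as fh:
            return pickle.load(fh)

    # Dump not found: transform, then write the dump for later runs.
    result = transform(input_path)
    # Write to a temporary file and rename it, so that another instance
    # running at the same time never reads a partially written dump.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as fh:
        pickle.dump(result, fh)
    os.replace(tmp_path, dump_path)
    return result
```

A real implementation would also have to decide when a dump is stale, for example after a change in the level selection (see ticket:67); this sketch does not address that.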

comment:2 Changed 5 years ago by pesei

Remark on calcpv.f90

It would be good to modify the code in such a way that calcpv is only called if it is really needed.

comment:3 Changed 5 years ago by pesei

Speeding up calcpv and verttransform

Leo Haimberger has provided new versions of these two subroutines that considerably speed up the code.

  • calcpv: look-up tables replace calls to the exponential function

In a realistic setting (Intel Fortran compiler version 13.1.2, optimized with -O2, an input grid of 720x360 and 250000 trajectories for 2.5 days), the run times were:
2450.034u 11.239s 41:02.46 99.9% with the old calcpv
1451.342u 06.654s 24:18.70 99.9% with the new calcpv
This is a reduction of about 41% in user CPU time.

  • verttransform: nested loops that made frequent jumps in the memory stack were rearranged using auxiliary variables. Speed improved by a factor of 2.
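
The look-up table idea used for calcpv can be illustrated with a minimal sketch. The ticket does not say which exponent is tabulated or how the table is indexed, so the uniform grid, the linear interpolation and the class name `ExpTable` below are assumptions made for illustration only.

```python
import math


class ExpTable:
    """Tabulated exp(x) on a uniform grid with linear interpolation."""

    def __init__(self, x_min, x_max, n):
        self.x_min = x_min
        self.step = (x_max - x_min) / (n - 1)
        # Evaluate exp once per grid point; later calls only read the table.
        self.values = [math.exp(x_min + i * self.step) for i in range(n)]

    def __call__(self, x):
        pos = (x - self.x_min) / self.step
        # Clamp the interval index; arguments outside [x_min, x_max] are
        # linearly extrapolated from the first or last interval.
        i = max(0, min(int(pos), len(self.values) - 2))
        frac = pos - i
        return self.values[i] + frac * (self.values[i + 1] - self.values[i])


if __name__ == "__main__":
    table = ExpTable(-5.0, 0.0, 5001)
    x = -1.2345
    print(table(x), math.exp(x))
```

In Python this is not faster than `math.exp`; the sketch only shows the structure of the technique. The trade-off is accuracy against table size: with linear interpolation the error shrinks with the square of the grid spacing.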

Routines are now being tested by NILU.

comment:4 Changed 4 years ago by pesei

  • Priority changed from critical to major

Update

  1. Tests by NILU and by IMGW/LH gave different results in terms of speed-up; the discrepancy is not yet resolved.
  2. CTBTO has awarded a contract for various FLEXPART-related work, including this item. I am therefore reducing the priority and hope that we will soon see input from the contractors here; see FpCtbtoOverview.
Last edited 9 months ago by pesei

comment:5 follow-up: Changed 3 years ago by dearn

Within the CTBTO project, an attempt has been made to create a pre-processing tool. The results can be found in wiki:FpCtbtoWo4PreprocessingUtil

I leave to the ticket owner the decision on whether to close this ticket or not.

Last edited 9 months ago by pesei

comment:6 in reply to: ↑ 5 Changed 9 months ago by pesei

Replying to dearn:

I leave to the ticket owner the decision on whether to close this ticket or not.

We need to keep this ticket open as the method is not sufficiently general yet. There is also interaction with ticket:140 and possible future modifications such as designating a certain field (date/time) as reference for a simulation in the COMMAND file.
