Opened 10 years ago
Last modified 6 years ago
#77 accepted Enhancement
Speeding up FLEXPART
Reported by: | pesei | Owned by: | pesei |
---|---|---|---|
Priority: | major | Milestone: | FLEXPART 9.2 |
Component: | FP input data | Version: | FLEXPART 9.0.2 |
Keywords: | Cc: |
Description
Using 1-h meteorological input, CTBTO/PTS/IDC found that calculation times rose beyond what is sustainable for their operational application, and asked for support.
With their specific setup and 119 levels, they found the following distribution of CPU time per subroutine for 3-h data:
     % cumulative      self                 self    total
  time    seconds   seconds       calls  Ks/call  Ks/call  name
 24.25    9198.93   9198.93  2983679872     0.00     0.00  advance_
 14.62   14743.46   5544.53  1976235129     0.00     0.00  interpol_wind_
 12.12   19339.47   4596.02  1009684743     0.00     0.00  interpol_all_
 10.84   23453.55   4114.08  2983414505     0.00     0.00  interpol_wind_short_
  9.23   26954.62   3501.07         119     0.03     0.03  verttransform_
  8.08   30021.38   3066.76         119     0.03     0.03  calcpv_
  3.81   31466.55   1445.17           1     1.45    35.87  timemanager_
  3.62   32840.83   1374.28         119     0.01     0.01  readwind_
  2.98   33971.49   1130.66         469     0.00     0.00  conccalc_
  1.67   34606.72    635.24         119     0.01     0.03  calcpar_
...
whereas with 1-h data it is
     % cumulative      self                 self    total
  time    seconds   seconds       calls  Ks/call  Ks/call  name
 21.00    8774.93   8774.93         359     0.02     0.03  verttransform_
 16.29   15581.52   6806.59  3573265152     0.00     0.00  advance_
 12.84   20947.59   5366.07         359     0.01     0.01  calcpv_
  9.18   24786.27   3838.68   889791459     0.00     0.00  interpol_all_
  7.70   28002.91   3216.65         359     0.01     0.01  readwind_
  7.34   31071.00   3068.08  3572065979     0.00     0.00  interpol_wind_short_
  6.84   33931.05   2860.06  2692433693     0.00     0.00  interpol_wind_
  4.13   35655.92   1724.86           1     1.72    40.23  timemanager_
  3.64   37177.09   1521.17         359     0.00     0.02  calcpar_
  3.48   38630.15   1453.06         715     0.00     0.00  conccalc_
  0.96   39033.10    402.94                                grib_jasper_decode
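The two profiles are easier to compare per call than as percentages. The following is a minimal Python sketch of that arithmetic for the three routines that run once per input field. The self seconds and call counts are copied from the profiles above; the dictionary and function names are illustrative only and are not part of FLEXPART.

```python
# Self seconds and call counts copied from the two gprof flat profiles.
# Only the routines that are called once per meteorological field are listed.
PROFILE_3H = {
    "verttransform_": (3501.07, 119),
    "calcpv_": (3066.76, 119),
    "readwind_": (1374.28, 119),
}
PROFILE_1H = {
    "verttransform_": (8774.93, 359),
    "calcpv_": (5366.07, 359),
    "readwind_": (3216.65, 359),
}


def seconds_per_call(profile):
    """Return self seconds per call for each routine (illustrative helper)."""
    return {name: secs / calls for name, (secs, calls) in profile.items()}


def total_self_seconds(profile):
    """Sum the self seconds of all routines in the profile."""
    return sum(secs for secs, _ in profile.values())


if __name__ == "__main__":
    for label, profile in (("3-h", PROFILE_3H), ("1-h", PROFILE_1H)):
        print(label, "total self seconds:", round(total_self_seconds(profile), 2))
        for name, value in seconds_per_call(profile).items():
            print(f"  {name:<16} {value:8.2f} s/call")
```

With 1-h input these routines are called 359 times instead of 119, roughly three times as often, which is consistent with verttransform and calcpv moving to the top of the profile.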
Change History (6)
comment:1 Changed 10 years ago by pesei
- Owner changed from somebody to pesei
- Status changed from new to accepted
comment:2 Changed 10 years ago by pesei
Remark on calcpv.f90
It would be good to modify the code in such a way that calcpv is only called if it is really needed.
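One way to read this remark is as a guard around the call. The sketch below is a minimal Python illustration, not FLEXPART code: the `Fields` container, the `needs_pv` flag and the `compute_pv` callback are hypothetical stand-ins, and which run settings actually require potential vorticity is left open here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Fields:
    """Hypothetical container for one meteorological time step."""
    wind: List[float]
    pv: Optional[List[float]] = None  # stays None unless PV is requested


def process_time_step(
    fields: Fields,
    needs_pv: bool,
    compute_pv: Callable[[List[float]], List[float]],
) -> Fields:
    """Run the expensive PV computation only when the run needs it."""
    if needs_pv:
        fields.pv = compute_pv(fields.wind)
    return fields
```

In such a design the decision would be made once from the run configuration, so that runs which never use potential vorticity skip the cost for every input field.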
comment:3 Changed 10 years ago by pesei
Speeding up calcpv and verttransform
Leo Haimberger has provided new versions of these two subroutines that considerably speed up the code.
- calcpv, look-up tables replacing calls to exponential function
In a realistic setting (Intel Fortran compiler version 13.1.2, optimization level -O2, an input grid of 720x360, and 250000 trajectories over 2.5 days), the run times were
2450.034u 11.239s 41:02.46 99.9% with the old calcpv
1451.342u 06.654s 24:18.70 99.9% with the new calcpv
- verttransform, rearrangement of nested loops that caused frequent jumps in memory, using auxiliary variables. Speed improved by a factor of 2.
Routines are now being tested by NILU.
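The look-up table idea mentioned for calcpv can be sketched as follows. This is a generic Python illustration of the technique and not the code provided by Leo Haimberger: the tabulated range, the table size and the use of linear interpolation are all assumptions, and the class name `ExpTable` is hypothetical. Python is used for readability only; the speed gain itself would come from the compiled Fortran version.

```python
import math


class ExpTable:
    """Tabulated exp(x) on [x_min, x_max] with linear interpolation."""

    def __init__(self, x_min: float, x_max: float, n: int) -> None:
        if n < 2 or x_max <= x_min:
            raise ValueError("need n >= 2 and x_max > x_min")
        self.x_min = x_min
        self.x_max = x_max
        self.step = (x_max - x_min) / (n - 1)
        # The exponential is evaluated only here, once per table entry.
        self.values = [math.exp(x_min + i * self.step) for i in range(n)]

    def __call__(self, x: float) -> float:
        if not self.x_min <= x <= self.x_max:
            return math.exp(x)  # outside the table: fall back to the exact call
        pos = (x - self.x_min) / self.step
        i = min(int(pos), len(self.values) - 2)  # keep i + 1 inside the table
        frac = pos - i
        return self.values[i] + frac * (self.values[i + 1] - self.values[i])


if __name__ == "__main__":
    table = ExpTable(-5.0, 0.0, 5001)  # assumed range and resolution
    for x in (-4.2, -1.0, -0.0005):
        print(x, table(x), math.exp(x))
```

The trade-off is accuracy against speed: a finer table reduces the interpolation error but uses more memory, so the resolution has to be chosen for the range of arguments that actually occurs.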
comment:4 Changed 9 years ago by pesei
- Priority changed from critical to major
Update
- Tests by NILU and by IMGW/LH gave different results for the speed-up; the discrepancy is not yet resolved.
- CTBTO has awarded a contract for various Fp-related work, including this item. I am therefore reducing the priority and hope we will soon see input from the contractors here; see FpCtbtoOverview
comment:5 follow-up: ↓ 6 Changed 8 years ago by dearn
Within the CTBTO project, an attempt has been made to build a pre-processing tool. The results can be found in wiki:FpCtbtoWo4PreprocessingUtil
I leave to the ticket owner the decision on whether to close this ticket or not.
comment:6 in reply to: ↑ 5 Changed 6 years ago by pesei
Replying to dearn:
I leave to the ticket owner the decision on whether to close this ticket or not.
We need to keep this ticket open as the method is not sufficiently general yet. There is also interaction with ticket:140 and possible future modifications such as designating a certain field (date/time) as reference for a simulation in the COMMAND file.
Remedies under discussion are: