
FLEXPART Performance

This is the place to document performance tests with different hardware, compilers, compiler options, and run configurations so that we can learn from each other and avoid reinventing the wheel. Please document all relevant parameters.

Hardware          Version    Compilation  Setup       Runtime (mm:ss)  Note
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O2)    AL-500-300  21:29
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O3a)   AL-500-300  21:05
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-500-300  20:50
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-250-300  13:18            /1/
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-250-120  14:36
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-250-060  20:07
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O3a)   AL-350-060  33:41
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O3b)   AL-350-060  30:35
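
To compare rows, it helps to convert the runtimes to seconds first. The following is a minimal Python sketch, assuming the runtimes are given as mm:ss (consistent with note /1/, which quotes 13:18 as 13.3 m). The helper names `to_seconds` and `relative_change` are illustrative only and are not part of FLEXPART.

```python
def to_seconds(runtime: str) -> int:
    """Convert a 'mm:ss' runtime string from the table to seconds."""
    minutes, seconds = runtime.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_change(before: str, after: str) -> float:
    """Fractional runtime change from `before` to `after` (negative = faster)."""
    t0 = to_seconds(before)
    return (to_seconds(after) - t0) / t0


# Pairs of table rows that share hardware (A) and Fp setup.
pairs = {
    "AL-500-300, O2 -> O3a": ("21:29", "21:05"),
    "AL-350-060, O3a -> O3b": ("33:41", "30:35"),
}

for label, (before, after) in pairs.items():
    print(f"{label}: {100 * relative_change(before, after):+.1f} %")
```

Note that (O3b) also links a different grib_api build than (O3a), so the second difference cannot be attributed to the compiler flags alone.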

Hardware

(A)
Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, microcode 1808, cache size: 20480 KB. 2 CPUs with 8 cores each.
(B)
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, microcode 57, cache size: 35840 KB. 2 CPUs with 14 cores each.

Compiler

if13
ifort 13.1.2

Compiler Options

(O2)
-O2 -mcmodel=medium. grib_api-1.12.3 (compiled with ifort, FCFLAGS = -g -O1 -fp-model precise)
(O3a)
-ipo -O3 -mcmodel=medium -no-prec-div -opt-prefetch=3. grib_api-1.12.3
(O3b)
-O3 -mcmodel=medium -unroll -inline -heap-arrays 32. grib_api-12.25.0 compiled with ifort and the same optimisation parameters

Fp Setup

AL-500-300
500k particles, 300 s lsynctime. Output size 17M. COMMAND:
-1 LDIRECT
20170418 000000
20170423 000000
3600 OUTPUT EVERY
3600 TIME AVERAGE OF OUTPUT
300 SAMPLING RATE OF OUTPUT
999999999 TIME CONSTANT FOR PARTICLE SPLITTING
300 SYNCHRONISATION INTERVAL
3.0 CTL
4 IFINE
1 IOUT
0 IPOUT
1 LSUBGRID
0 LCONVECTION
0 LAGESPECTRA
0 IPIN
1 IOUTPUTFOREACHREL
0 IFLUX
0 MDOMAINFILL
1 IND_SOURCE
2 IND_RECEPTOR
0 MQUASILAG
0 NESTED_OUTPUT
0 LIMIT_COND

OUTGRID dimensions: 450 x 300 x 2. Met. input dimensions: 161 x 81 x 91

AL-250-300
250k particles, otherwise as AL-500-300
AL-250-120
as AL-250-300 except:
240 SAMPLING RATE OF OUTPUT
120 SYNCHRONISATION INTERVAL
1.0 CTL
2 IFINE
AL-250-060
as AL-250-120 except lsynctime = 60 s
AL-350-060
as AL-250-060 except:
120 SAMPLING RATE OF OUTPUT
350000 ! number of particles
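
Because each setup is defined as a delta against an earlier one, the effective settings of a run are easy to misread. The following Python sketch resolves the chain of "as ... except" definitions above into full settings. It is only an illustration: the dictionary keys reuse the COMMAND descriptions from this page, `particles` is a label chosen here for the particle count, and `resolve` is a hypothetical helper, not a FLEXPART tool.

```python
# Each setup: (parent setup or None, settings that differ from the parent).
SETUPS = {
    "AL-500-300": (None, {
        "particles": 500_000,
        "SAMPLING RATE OF OUTPUT": 300,
        "SYNCHRONISATION INTERVAL": 300,
        "CTL": 3.0,
        "IFINE": 4,
    }),
    "AL-250-300": ("AL-500-300", {"particles": 250_000}),
    "AL-250-120": ("AL-250-300", {
        "SAMPLING RATE OF OUTPUT": 240,
        "SYNCHRONISATION INTERVAL": 120,
        "CTL": 1.0,
        "IFINE": 2,
    }),
    "AL-250-060": ("AL-250-120", {"SYNCHRONISATION INTERVAL": 60}),
    "AL-350-060": ("AL-250-060", {
        "SAMPLING RATE OF OUTPUT": 120,
        "particles": 350_000,
    }),
}


def resolve(name: str) -> dict:
    """Return the full settings of a setup by applying overrides along its parent chain."""
    parent, overrides = SETUPS[name]
    settings = resolve(parent) if parent is not None else {}
    settings.update(overrides)
    return settings


for setup in SETUPS:
    print(setup, resolve(setup))
```

Read this way, AL-350-060 inherits CTL and IFINE from AL-250-120 and the 60 s synchronisation interval from AL-250-060, which is worth keeping in mind when comparing its runtime with the AL-500-300 rows.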

Notes

/1/
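Compared with the AL-500-300 run on the same hardware (B), this implies that runtime = 13.3 m + 0.03 m * (npart(k) - 250), where npart(k) is the number of particles in thousands. This is equivalent to about 5.8 m + 0.03 m * npart(k), or about 1.8 ms per additional particle.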
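
The per-particle cost in note /1/ can be reproduced from the two runs on hardware (B) that differ only in particle count (AL-250-300 and AL-500-300). The sketch below fits a straight line through those two points. It assumes runtimes in mm:ss and a linear dependence on particle count, which two data points can neither confirm nor refute; the variable names are illustrative.

```python
def to_minutes(runtime: str) -> float:
    """Convert a 'mm:ss' runtime string to minutes."""
    minutes, seconds = runtime.split(":")
    return int(minutes) + int(seconds) / 60


# (particles in thousands, runtime) for hardware (B), compilation if13 (O3a).
n1, t1 = 250, to_minutes("13:18")  # AL-250-300
n2, t2 = 500, to_minutes("20:50")  # AL-500-300

slope = (t2 - t1) / (n2 - n1)      # minutes per thousand particles
intercept = t1 - slope * n1        # minutes, extrapolated to zero particles
ms_per_particle = slope * 60       # (min per 1000 particles) * 60 s/min = ms per particle

print(f"slope     = {slope:.4f} min per 1000 particles")
print(f"intercept = {intercept:.2f} min")
print(f"cost      = {ms_per_particle:.2f} ms per particle")
```

The intercept is an extrapolation well outside the measured range, so it is best read as a rough estimate of the particle-independent overhead (for example reading the meteorological input) rather than a measured value.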
Last modified on Feb 15, 2018, 7:04:03 PM
hosted by ZAMG