Version 4 (modified by pesei, 11 months ago)


FLEXPART Performance

This is the place to document performance tests with different hardware, compilers, compiler options, and run configurations so that we can learn from each other and avoid reinventing the wheel. Please document all relevant parameters.

Hardware          Version    Compilation  Setup       Runtime (mm:ss)  Note
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O2)    AL-500-300  21:29
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O3a)   AL-500-300  21:05
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-500-300  20:50
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-250-300  13:18            /1/
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-250-120  14:36
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-250-060  20:07
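
To compare rows, it helps to convert the runtimes to seconds first. The following is a minimal sketch, assuming the Runtime column is given as mm:ss (consistent with note /1/, which quotes 13:18 as 13.3 m). The helper names to_seconds and relative_change are illustrative only and not part of FLEXPART.

```python
def to_seconds(mmss: str) -> int:
    """Convert a runtime written as 'mm:ss' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_change(reference: str, other: str) -> float:
    """Runtime of `other` relative to `reference`, as a signed percentage.

    Negative values mean `other` ran faster than `reference`.
    """
    ref = to_seconds(reference)
    return 100.0 * (to_seconds(other) - ref) / ref


# Runtimes copied from the table above (all AL-500-300).
o3a_vs_o2_on_a = relative_change("21:29", "21:05")   # hardware (A): O2 -> O3a
b_vs_a_with_o3a = relative_change("21:05", "20:50")  # O3a: hardware (A) -> (B)

print(f"O3a vs O2 on (A):    {o3a_vs_o2_on_a:+.1f} %")
print(f"(B) vs (A) with O3a: {b_vs_a_with_o3a:+.1f} %")
```

Each comparison rests on a single run per configuration, so differences of a few percent may lie within run-to-run variation.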

Hardware

(A)
Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, microcode 1808, cache size: 20480 KB. 2 CPUs with 8 cores each.
(B)
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, microcode 57, cache size: 35840 KB. 2 CPUs with 14 cores each.

Compiler

if13 ifort 13.1.2

Compiler Options

(O2) -O2 -mcmodel=medium. grib_api-1.12.3 (compiled with ifort, FCFLAGS = -g -O1 -fp-model precise)
(O3a) -ipo -O3 -mcmodel=medium -no-prec-div -opt-prefetch3. grib_api-1.12.3

Fp Setup

AL-500-300
500k particles, 300 s lsynctime. Output size 17M. COMMAND:
-1 LDIRECT
20170418 000000
20170423 000000
3600 OUTPUT EVERY
3600 TIME AVERAGE OF OUTPUT
300 SAMPLING RATE OF OUTPUT
999999999 TIME CONSTANT FOR PARTICLE SPLITTING
300 SYNCHRONISATION INTERVAL
3.0 CTL
4 IFINE
1 IOUT
0 IPOUT
1 LSUBGRID
0 LCONVECTION
0 LAGESPECTRA
0 IPIN
1 IOUTPUTFOREACHREL
0 IFLUX
0 MDOMAINFILL
1 IND_SOURCE
2 IND_RECEPTOR
0 MQUASILAG
0 NESTED_OUTPUT
0 LIMIT_COND

OUTGRID dimensions: 450 x 300 x 2. Met. input dimensions: 161 x 81 x 91

AL-250-300
250k particles, otherwise as AL-500-300
AL-250-120
as AL-250-300 except:
240 SAMPLING RATE OF OUTPUT
120 SYNCHRONISATION INTERVAL
1.0 CTL
2 IFINE
AL-250-060
as AL-250-120 except lsynctime = 60 s

Notes

/1/
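Compared with the AL-500-300 run on the same hardware and compiler options (20:50), this implies that in this case runtime = 13.3 min + 0.03 min * (npart(k) - 250), where npart(k) is the number of particles in thousands, i.e. about 1.8 ms per additional particle.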
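
The per-particle cost quoted in /1/ can be reproduced from the two runs on hardware (B) with options (O3a) that differ only in particle count (AL-250-300 and AL-500-300). The following is a minimal sketch of a two-point linear fit. The variable and function names are illustrative, and two measurements cannot confirm that the scaling is actually linear.

```python
def to_minutes(mmss: str) -> float:
    """Convert a runtime written as 'mm:ss' into decimal minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60.0


# (particles in thousands, runtime) on hardware (B) with options (O3a)
n_lo, t_lo = 250, to_minutes("13:18")  # AL-250-300
n_hi, t_hi = 500, to_minutes("20:50")  # AL-500-300

slope = (t_hi - t_lo) / (n_hi - n_lo)  # minutes per 1000 particles
intercept = t_lo - slope * n_lo        # extrapolated runtime at zero particles
ms_per_particle = slope * 60.0         # min per 1000 particles equals s per 1000, i.e. ms each


def predicted_runtime(npart_k: float) -> float:
    """Runtime in minutes for npart_k thousand particles (linear assumption)."""
    return intercept + slope * npart_k


print(f"slope:     {slope:.3f} min per 1000 particles")
print(f"per part.: {ms_per_particle:.1f} ms")
print(f"intercept: {intercept:.1f} min")
```

The intercept is an extrapolation to zero particles and should be read only as a rough estimate of the particle-independent cost (for example reading meteorological input and writing output).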
hosted by ZAMG