Version 2 (modified by pesei, 6 years ago)

add runs 1-4

FLEXPART Performance

This is the place to document performance tests with different hardware, compilers, compiler options, and run configurations so that we can learn from each other and avoid reinventing the wheel. Please document all relevant parameters.

Hardware          Version    Compilation  Setup       Runtime (mm:ss)  Note
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O2)    AL-500-300  21:29
Xeon E5-2690 (A)  Fp8.2.3fr  if13 (O3a)   AL-500-300  21:05
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-500-300  20:50
Xeon E5-2697 (B)  Fp8.2.3fr  if13 (O3a)   AL-250-300  13:18            /1/

Hardware

(A)
Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, microcode 1808, cache size: 20480 KB. 2 CPUs with 8 cores each.
(B)
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, microcode 57, cache size: 35840 KB. 2 CPUs with 14 cores each.

Compiler

if13 ifort 13.1.2

Compiler Options

(O2) -O2 -mcmodel=medium. grib_api-1.12.3 (compiled with ifort, FCFLAGS = -g -O1 -fp-model precise)
(O3a) -ipo -O3 -mcmodel=medium -no-prec-div -opt-prefetch3. grib_api-1.12.3

Fp Setup

AL-500-300
500k particles, 300 s lsynctime. Output size 17M. (more details to come).
AL-250-300
250k particles, otherwise as AL-500-300

Notes

/1/
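Compared with the otherwise identical AL-500-300 run (20:50), this implies that in this case runtime = 5.8 min + 0.03 min * npart(k) approximately, i.e. 13.3 min at 250k particles plus 0.03 min for every further 1000 particles (or +1.8 ms per particle).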
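
The estimate in note /1/ can be reproduced from the two runs on hardware (B) with options (O3a). The following is a minimal Python sketch, assuming the runtimes in the table are given as mm:ss and that runtime grows linearly with the number of particles (two data points cannot confirm this). The function names are illustrative only and are not part of FLEXPART.

```python
def to_seconds(mmss: str) -> int:
    """Convert a runtime string of the form 'mm:ss' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def linear_runtime_model(npart_a, runtime_a, npart_b, runtime_b):
    """Return (intercept_s, slope_s_per_particle) of the line through two runs."""
    slope = (runtime_b - runtime_a) / (npart_b - npart_a)
    intercept = runtime_a - slope * npart_a
    return intercept, slope


# The two runs on hardware (B), compiled with (O3a), as listed in the table.
t_250k = to_seconds("13:18")  # setup AL-250-300
t_500k = to_seconds("20:50")  # setup AL-500-300

intercept_s, slope_s = linear_runtime_model(250_000, t_250k, 500_000, t_500k)

print(f"particle-independent cost: {intercept_s / 60:.1f} min")
print(f"cost per particle:         {slope_s * 1000:.2f} ms")
```

With these inputs the sketch gives a particle-independent cost of about 5.8 min and about 1.81 ms per particle. The 13.3 min in the note is the total runtime at 250k particles, not the intercept at zero particles.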
hosted by ZAMG