== FLEXPART Performance ==

This is the place to document performance tests with different hardware, compilers, compiler options, and run configurations, so that we can learn from each other and avoid reinventing the wheel. Please document all relevant parameters.

||= '''Hardware''' =||= '''Version''' =||= '''Compilation''' =||= '''Setup''' =||= '''Runtime (mm:ss)''' =||= '''Note''' =||
|| Xeon E5-2690 (A) || Fp8.2.3fr || if13 (O2) || AL-500-300 || '''21:29''' || ||
|| Xeon E5-2690 (A) || Fp8.2.3fr || if13 (O3a) || AL-500-300 || '''21:05''' || ||
|| Xeon E5-2697 (B) || Fp8.2.3fr || if13 (O3a) || AL-500-300 || '''20:50''' || ||
|| Xeon E5-2697 (B) || Fp8.2.3fr || if13 (O3a) || AL-250-300 || '''13:18''' || /1/ ||
|| Xeon E5-2697 (B) || Fp8.2.3fr || if13 (O3a) || AL-250-120 || '''14:36''' || ||
|| Xeon E5-2697 (B) || Fp8.2.3fr || if13 (O3a) || AL-250-060 || '''20:07''' || ||
|| Xeon E5-2690 (A) || Fp8.2.3fr || if13 (O3a) || AL-350-060 || '''33:41''' || ||
|| Xeon E5-2690 (A) || Fp8.2.3fr || if13 (O3b) || AL-350-060 || '''30:35''' || ||

=== Hardware ===
 (A):: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, microcode 1808, cache size: 20480 KB. 2 CPUs with 8 cores each.
 (B):: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, microcode 57, cache size: 35840 KB. 2 CPUs with 14 cores each.

=== Compiler ===
 '''if13''':: ifort 13.1.2

=== Compiler Options ===
 '''(O2)''':: `-O2 -mcmodel=medium`. grib_api-1.12.3 (compiled with ifort, `FCFLAGS = -g -O1 -fp-model precise`)
 '''(O3a)''':: `-ipo -O3 -mcmodel=medium -no-prec-div -opt-prefetch3`. grib_api-1.12.3
 '''(O3b)''':: `-O3 -mcmodel=medium -unroll -inline -heap-arrays 32`. grib_api-12.25.0, compiled with ifort and the same optimisation parameters

=== Fp Setup ===
 AL-500-300:: 500k particles, 300 s `lsynctime`. Output size 17M.
   `COMMAND`:
{{{
-1 LDIRECT
20170418 000000
20170423 000000
3600 OUTPUT EVERY
3600 TIME AVERAGE OF OUTPUT
300 SAMPLING RATE OF OUTPUT
999999999 TIME CONSTANT FOR PARTICLE SPLITTING
300 SYNCHRONISATION INTERVAL
3.0 CTL
4 IFINE
1 IOUT
0 IPOUT
1 LSUBGRID
0 LCONVECTION
0 LAGESPECTRA
0 IPIN
1 IOUTPUTFOREACHREL
0 IFLUX
0 MDOMAINFILL
1 IND_SOURCE
2 IND_RECEPTOR
0 MQUASILAG
0 NESTED_OUTPUT
0 LIMIT_COND
}}}
   `OUTGRID` dimensions: 450 x 300 x 2. Met. input dimensions: 161 x 81 x 91.
 AL-250-300:: 250k particles, otherwise as AL-500-300.
 AL-250-120:: as AL-250-300, except:
{{{
240 SAMPLING RATE OF OUTPUT
120 SYNCHRONISATION INTERVAL
1.0 CTL
2 IFINE
}}}
 AL-250-060:: as AL-250-120, except `lsynctime` = 60 s.
 AL-350-060:: as AL-250-060, except:
{{{
120 SAMPLING RATE OF OUTPUT
350000 ! number of particles
}}}

=== Notes ===
 /1/:: Compared with AL-500-300 on the same machine (B) and options (20:50), this implies that in this case runtime ≈ 13.3 min + 0.03 min × (npart - 250k), with npart in thousands, i.e. about 1.8 ms per additional particle. Equivalently, runtime ≈ 5.8 min + 0.03 min × npart(k).
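The arithmetic behind note /1/ can be reproduced with a minimal Python sketch: it converts the mm:ss runtimes of the two machine (B), if13 (O3a), 300 s `lsynctime` runs (AL-250-300 and AL-500-300) to minutes and fits a straight line through them. The helper names (`to_minutes`, `linear_fit`, `predict`) are hypothetical, and a line through only two measurements is a rough model that holds, at best, for this particular setup.

{{{#!python
from typing import Tuple


def to_minutes(mmss: str) -> float:
    """Convert a runtime written as 'mm:ss' (as in the table) to minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60.0


def linear_fit(n1: float, t1: float, n2: float, t2: float) -> Tuple[float, float]:
    """Return (intercept, slope) of the straight line through (n1, t1) and (n2, t2)."""
    slope = (t2 - t1) / (n2 - n1)
    return t1 - slope * n1, slope


def predict(npart_k: float, intercept: float, slope: float) -> float:
    """Hypothetical helper: modelled runtime in minutes for npart_k thousand particles."""
    return intercept + slope * npart_k


# Machine (B), Fp8.2.3fr, if13 (O3a), lsynctime = 300 s; particle counts in thousands.
t250 = to_minutes("13:18")  # AL-250-300
t500 = to_minutes("20:50")  # AL-500-300
intercept, slope = linear_fit(250, t250, 500, t500)

# slope is in minutes per 1000 particles; x 60 gives seconds per 1000 particles,
# which is numerically the same as milliseconds per particle.
ms_per_particle = slope * 60.0

print(f"runtime ~ {intercept:.1f} min + {slope:.3f} min * npart(k)")  # runtime ~ 5.8 min + 0.030 min * npart(k)
print(f"marginal cost ~ {ms_per_particle:.1f} ms per particle")      # marginal cost ~ 1.8 ms per particle
}}}

By construction, `predict(250, intercept, slope)` returns the measured 13.3 min. Particle counts outside 250k to 500k are extrapolations, and the AL-250-120 and AL-250-060 rows show that changing `lsynctime` shifts the runtime, so the fit should not be reused for other setups.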