FLEXPART Performance
This is the place to document performance tests with different hardware, compilers, compiler options, and run configurations so that we can learn from each other and avoid reinventing the wheel. Please document all relevant parameters.
Hardware | Version | Compilation | Setup | Runtime (mm:ss) | Note |
---|---|---|---|---|---|
Xeon E5-2690 (A) | Fp8.2.3fr | if13 (O2) | AL-500-300 | 21:29 | |
Xeon E5-2690 (A) | Fp8.2.3fr | if13 (O3a) | AL-500-300 | 21:05 | |
Xeon E5-2697 (B) | Fp8.2.3fr | if13 (O3a) | AL-500-300 | 20:50 | |
Xeon E5-2697 (B) | Fp8.2.3fr | if13 (O3a) | AL-250-300 | 13:18 | /1/ |
Xeon E5-2697 (B) | Fp8.2.3fr | if13 (O3a) | AL-250-120 | 14:36 | |
Xeon E5-2697 (B) | Fp8.2.3fr | if13 (O3a) | AL-250-060 | 20:07 | |
Xeon E5-2690 (A) | Fp8.2.3fr | if13 (O3a) | AL-350-060 | 33:41 | |
Xeon E5-2690 (A) | Fp8.2.3fr | if13 (O3b) | AL-350-060 | 30:35 | |
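To compare rows, the runtimes can be converted to seconds. The following is a minimal sketch, assuming the Runtime column is given as mm:ss (as note /1/ suggests); the helper names `to_seconds` and `relative_saving` are illustrative only and not part of FLEXPART.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' runtime string from the table to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_saving(baseline: str, candidate: str) -> float:
    """Fraction of the baseline runtime saved by the candidate run."""
    base = to_seconds(baseline)
    return (base - to_seconds(candidate)) / base


# Hardware (A), setup AL-500-300: (O2) vs (O3a)
o2_vs_o3a = relative_saving("21:29", "21:05")

# Hardware (A), setup AL-350-060: (O3a) vs (O3b)
o3a_vs_o3b = relative_saving("33:41", "30:35")

print(f"(O3a) saves {o2_vs_o3a:.1%} relative to (O2)")
print(f"(O3b) saves {o3a_vs_o3b:.1%} relative to (O3a)")
```

For these rows the savings come out at roughly 2% and 9%. Note that the (O3b) run also uses a different grib_api build, so the second figure cannot be attributed to the compiler flags alone.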
Hardware
- (A)
- Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, microcode 1808, cache size: 20480 KB. 2 CPUs with 8 cores each.
- (B)
- Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz, microcode 57, cache size: 35840 KB. 2 CPUs with 14 cores each.
Compiler
- if13
- ifort 13.1.2
Compiler Options
- (O2)
- -O2 -mcmodel=medium. grib_api-1.12.3 (compiled with ifort, FCFLAGS = -g -O1 -fp-model precise)
- (O3a)
- -ipo -O3 -mcmodel=medium -no-prec-div -opt-prefetch3. grib_api-1.12.3
- (O3b)
- -O3 -mcmodel=medium -unroll -inline -heap-arrays 32. grib_api-12.25.0 compiled with ifort and the same optimisation parameters
Fp Setup
- AL-500-300
- 500k particles, 300 s lsynctime. Output size 17M. COMMAND:
-1 LDIRECT
20170418 000000
20170423 000000
3600 OUTPUT EVERY
3600 TIME AVERAGE OF OUTPUT
300 SAMPLING RATE OF OUTPUT
999999999 TIME CONSTANT FOR PARTICLE SPLITTING
300 SYNCHRONISATION INTERVAL
3.0 CTL
4 IFINE
1 IOUT
0 IPOUT
1 LSUBGRID
0 LCONVECTION
0 LAGESPECTRA
0 IPIN
1 IOUTPUTFOREACHREL
0 IFLUX
0 MDOMAINFILL
1 IND_SOURCE
2 IND_RECEPTOR
0 MQUASILAG
0 NESTED_OUTPUT
0 LIMIT_COND
OUTGRID dimensions: 450 x 300 x 2. Met. input dimensions: 161 x 81 x 91
- AL-250-300
- 250k particles, otherwise as AL-500-300
- AL-250-120
- as AL-250-300 except:
240 SAMPLING RATE OF OUTPUT
120 SYNCHRONISATION INTERVAL
1.0 CTL
2 IFINE
- AL-250-060
- as AL-250-120 except lsynctime = 60 s
- AL-350-060
- as AL-250-060 except:
120 SAMPLING RATE OF OUTPUT
350000 ! number of particles
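Because each setup is defined as a difference from an earlier one, it can help to resolve the chain into a full parameter set. The sketch below is one way to do that, assuming the values listed above; the `SETUPS` table, the `resolve` helper and the `particles` key are hypothetical names chosen for illustration, and only the parameters that differ between setups are included.

```python
# Each setup maps to (parent setup, overrides relative to that parent).
# Keys reuse the labels from the COMMAND listings above; "particles" is a
# made-up key for the particle count.
SETUPS = {
    "AL-500-300": (None, {
        "particles": 500_000,
        "SAMPLING RATE OF OUTPUT": 300,
        "SYNCHRONISATION INTERVAL": 300,
        "CTL": 3.0,
        "IFINE": 4,
    }),
    "AL-250-300": ("AL-500-300", {"particles": 250_000}),
    "AL-250-120": ("AL-250-300", {
        "SAMPLING RATE OF OUTPUT": 240,
        "SYNCHRONISATION INTERVAL": 120,
        "CTL": 1.0,
        "IFINE": 2,
    }),
    "AL-250-060": ("AL-250-120", {"SYNCHRONISATION INTERVAL": 60}),
    "AL-350-060": ("AL-250-060", {
        "SAMPLING RATE OF OUTPUT": 120,
        "particles": 350_000,
    }),
}


def resolve(name: str) -> dict:
    """Return the full parameter set for a setup by applying its
    overrides on top of its (recursively resolved) parent."""
    parent, overrides = SETUPS[name]
    params = resolve(parent) if parent else {}
    params.update(overrides)
    return params


for setup in SETUPS:
    print(setup, resolve(setup))
```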
Notes
- /1/
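- Compared with the AL-500-300 run on the same hardware (20:50), this implies runtime ≈ 13.3 min + 0.03 min per 1000 particles above 250k (about 1.8 ms per additional particle), i.e. a fixed cost of roughly 5.8 min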
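The scaling estimate in note /1/ can be reproduced from the two hardware (B) rows that differ only in particle count. This is a sketch, assuming runtime grows linearly with the number of particles and that the Runtime column is mm:ss. With only two data points the line is exact by construction, so it says nothing about how well linearity actually holds. The function names are illustrative.

```python
def to_minutes(mmss: str) -> float:
    """Convert a 'mm:ss' runtime string to decimal minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60.0


def fit_line(n1_k: float, t1: float, n2_k: float, t2: float):
    """Line through two (particles in thousands, runtime in minutes) points.
    Returns (slope in min per 1000 particles, intercept in min)."""
    slope = (t2 - t1) / (n2_k - n1_k)
    intercept = t1 - slope * n1_k
    return slope, intercept


# Hardware (B), if13 (O3a): AL-250-300 and AL-500-300
slope, intercept = fit_line(250, to_minutes("13:18"), 500, to_minutes("20:50"))

# min per 1000 particles -> ms per particle: x 60 s/min x 1000 ms/s / 1000
ms_per_particle = slope * 60.0

print(f"slope     = {slope:.4f} min per 1000 particles")
print(f"intercept = {intercept:.2f} min")
print(f"cost      = {ms_per_particle:.2f} ms per particle")
```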
Last modified on Feb 15, 2018, 7:04:03 PM