J. Brioude, Sept 19 2013
**************************************************************
To compile flexwrf, choose your compiler in makefile.mom (line 23), set the path to the NetCDF library, and then type
make -f makefile.mom mpi     for an MPI+OpenMP hybrid run
make -f makefile.mom omp     for an OpenMP parallel run
make -f makefile.mom serial  for a serial run
7********************************************************************
8To run flexwrf, you can pass an argument to the executable that gives the name of the input file.
9for instance
10./flexwrf31_mpi /home/jbrioude/inputfile.txt
11Otherwise, the file flexwrf.input in the current directory is read by default.
12
13Examples of forward and backward runs are available in the examples directory.
14
15
16*****************************************************************
17Versions timeline
18
19version 3.1: bug fix on the sign of sshf in readwind.f90
20             modifications of advance.f90 to limit the vertical velocity from cbl scheme
21             bug fix in write_ncconc.f90
22             modifications of interpol*f90 routines to avoid crashes using tke_partition_hanna.f90 and tke_partition_my.f90
23version 3.0    First public version

version 2.4.1: New modifications to the wet deposition scheme from Petra Seibert

version 2.3.1: a NetCDF output format is implemented.

version 2.2.7: the CBL scheme is implemented. A new random number generator is implemented.

version 2.0.6:
- map factors are used in advance.f90 when converting the calculated distance into a WRF grid distance.
- fix on the divergence-based vertical wind

version 2.0.5:
the time over which the kernel is not used has been reduced from 10800 seconds to 7200 seconds. Those numbers depend on the horizontal resolution, and a more flexible solution might come up in a future version.
version 2.0.4:
- bug fix for regular output grid
- IO problems in ASCII have been fixed
- add the option of running flexpart with an argument that gives the name of the input file instead of flexwrf.input
version 2.0.3:
- bug fix when flexpart is restarted
- bug fix in coordtrafo.f90
- a new option that lets the user decide whether the time for the time-averaged fields from WRF has to be corrected or not

version 2.0.2:
- bug fix in sendint2_mpi_old.f90
- all the *mpi*.f90 routines have been changed to handle memory more properly.
- timemanager_mpi has been changed accordingly. Some bug fixes too.
- bug fix in writeheader
- parallelization of calcpar and verttransform.f90, same for the nests.

version 2.0.1:
- 1 option added in flexwrf.input to define the output grid with dxout and dyout
- fix in readinput.f90 to calculate maxpart more accurately

version 2.0: first OpenMP/MPI version

version 1.0:
This is a Fortran 90 version of FLEXPART.
Compared to PILT, the version from Jerome Fast available on the NILU flexpart website, several bugs have been fixed and improvements made (not necessarily commented) in the subroutines.
Non-exhaustive list:
1) optimization of the Kain-Fritsch convective scheme (expensive)
2) possibility to output the flexpart run on a regular lat/lon output grid. flexwrf.input has 2 options to let the model know which coordinates are used for the output domain and the release boxes.
3) Differences in Earth radius between WRF and WRF-Chem are handled.
4) time-averaged wind, instantaneous omega, or a vertical velocity calculated internally in FLEXPART can now be used.
5) a bug fix in pbl_profile.f due to the variable kappa.

Turbulence options 2 and 3 from Jerome Fast's version lose mass in the model. Those options are not recommended.

***********************************************************************
General comments on the hybrid version of FLEXPART-WRF:
This version includes a parallelized hybrid version of FLEXPART that can be used with:
- 1 node (1 computer) with multiple threads using OpenMP in shared memory,
- or several nodes (computers) in distributed memory (using MPI), with several threads in shared memory (using OpenMP) on each node.
If an MPI library is not available with your compiler, use makefile.nompi to compile flexwrf.
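For instance (assuming makefile.nompi provides the same omp and serial targets as makefile.mom, which you should verify in the makefile itself):
make -f makefile.nompi omp     for an OpenMP-only build without an MPI library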

The environment variable OMP_NUM_THREADS has to be set before running the model to define the number of threads used.
It can also be fixed in the timemanager*.f90 routines.
If it is not set, flexwrf31_mpi will use 1 thread.
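For example, in a bash shell (the thread count of 8 is only illustrative):
export OMP_NUM_THREADS=8
./flexwrf31_mpi /home/jbrioude/inputfile.txt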

When submitting a job to several nodes, mpiexec or mpirun needs to know that 1 task has to be allocated per node, so that OpenMP can do the work within each node in shared memory.
See submit.sh as an example.
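A minimal sketch of such a submission, assuming Open MPI on 4 nodes (the option that places 1 task per node differs between MPI implementations and batch systems, so check submit.sh for the settings actually used):
export OMP_NUM_THREADS=8                 # threads per node, shared memory
mpirun -np 4 -npernode 1 ./flexwrf31_mpi /home/jbrioude/inputfile.txt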

Compared to the single-node version, this version includes modifications of:

- flexwrf.f90, which is renamed flexwrf_mpi.f90
- timemanager.f90, which is renamed timemanager_mpi.f90
- the interpol*.f90 and hanna* routines, which have been modified
- the *mpi*.f90 routines, which are used to send or receive data between nodes

The most important modifications are in timemanager_mpi.f90, initialize.f90 and advance.f90.
Search for JB in timemanager_mpi.f90 for additional comments.
In advance.f90, I modified the way the random number is picked (line 187). I use a simple counter and the id of the thread instead of the random pick that uses ran3.
If the series of random numbers is output for a given release box (uncomment lines 195 to 198), the distribution is quite good, and I don't see any larger bias than the one in the single-thread version.
Of course, the distribution becomes less and less random as you increase the number of nodes or threads.

*********************************************************
Performance:
This is the performance of the loop at line 581 in timemanager_mpi.f90 that calculates the trajectories.
I use version v74 as the reference (single thread, Fortran 77).
There is a loss in performance between v74 and v90 because of the temporary th_* variables that have to be used as private variables in timemanager_mpi.f90.

                 speedup vs. v74
v90, 1 thread    0.96
v90, 2 threads   1.86
v90, 4 threads   3.57
v90, 8 threads   6.22

Performance of the communication between nodes:
This depends on the system. The supercomputer that I use can transfer about 1 Gb in 1 second.
In timemanager_mpi.f90, the output at lines 540 and 885 gives the time needed by the system to communicate between nodes. Using 100 million particles and, say, 4 nodes, it takes about 1 second.