***********
Quick Start
***********

``flex_extract`` is a command-line tool. In its first versions, it was started
via a Korn shell script; from version 6 on, the entry point was a Python
script. Since version 7.1, a Bash shell script calls ``flex_extract`` with the
command-line parameters.

To submit an extraction job, change the working directory to the subdirectory
``Run`` (directly under the ``flex_extract_vX.X`` root directory, where
``X.X`` is the version number):

.. code-block:: bash

    cd <path-to-flex_extract_vX.X>/Run

Within this directory you can find everything you need to modify and run
``flex_extract``. The following tree shows a shortened list of directories and
important files. The ``*`` serves as a wildcard. The brackets ``[]`` indicate
that the file is present only in certain modes of application.

.. code-block:: bash

    Run
    ├── Control
    │   ├── CONTROL_*
    ├── Jobscripts
    │   ├── compilejob.ksh
    │   ├── job.ksh
    │   ├── [joboper.ksh]
    ├── Workspace
    │   ├── CERA_example
    │   │   ├── CE000908*
    ├── [ECMWF_ENV]
    ├── run_local.sh
    └── run.sh

The ``Jobscripts`` directory is used to store the Korn shell job scripts
generated by a ``flex_extract`` run in the **Remote** or **Gateway** mode.
They are used to submit the setup information to the ECMWF server and to start
the jobs in ECMWF's batch mode. Typical users do not have to touch these
files. They are generated from template files which are stored in the
``Templates`` directory under ``flex_extract_vX.X``. Usually there will be a
``compilejob.ksh`` and a ``job.ksh`` script, which are explained in the
section :doc:`Documentation/input`. In the rare case of operational data
extraction, there will be a ``joboper.ksh``, which reads time information from
environment variables on the ECMWF servers.

The ``Control`` directory contains a number of sample ``CONTROL`` files. They
cover the current range of possible kinds of extractions. Some parameters in
the ``CONTROL`` files can be adapted, while others should not be changed. In
this :doc:`quick_start` guide we explain how an extraction with
``flex_extract`` can be started in the different
:doc:`Documentation/Overview/app_modes`, and we point out some specifics of
each dataset and ``CONTROL`` file.

Directly under ``Run`` you find the files ``run.sh`` and ``run_local.sh``,
and, depending on the selected :doc:`Documentation/Overview/app_modes`, there
might also be a file named ``ECMWF_ENV`` with the user credentials for quick
and automatic access to the ECMWF servers. From version 7.1 on, the ``run.sh``
(or ``run_local.sh``) script is the main entry point to ``flex_extract``.

.. note::

    Experienced users (and users of older versions) can still start
    ``flex_extract`` directly via the ``submit.py`` script in the directory
    ``flex_extract_vX.X/Source/Python``.

Job preparation
===============

To actually start a job with ``flex_extract``, it is sufficient to start
either ``run.sh`` or ``run_local.sh``. Data sets and access modes are selected
in the ``CONTROL`` files and within the user section of the ``run`` scripts.
One should select one of the sample ``CONTROL`` files. The following sections
describe the differences between the application modes and where the results
will be stored.
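
For example, to see which sample ``CONTROL`` files are available and to take a
first look at one of them, something like the following works (a minimal
sketch; paths are relative to the ``Run`` directory):

.. code-block:: bash

    # List the sample CONTROL files shipped with flex_extract
    ls Control/CONTROL_*

    # Inspect one of them before adapting it
    less Control/CONTROL_CERA
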
Remote and gateway modes
------------------------

For member-state users, it is recommended to use the *remote* or *gateway*
mode, especially for more demanding tasks; these modes retrieve and convert
the data on ECMWF machines, and only the final output files are transferred to
the local host.

Remote mode
~~~~~~~~~~~

The only difference between the two modes is the user's working location. In
the *remote* mode, you have to log in to the ECMWF server and then go to the
``Run`` directory as shown above. On the ECMWF servers, ``flex_extract`` is
installed in the ``$HOME`` directory. However, to be able to start the
program, you first have to load the ``Python3`` environment with the module
system.

.. code-block:: bash

    # Remote mode
    ssh -X <ecuid>@ecaccess.ecmwf.int

.. code-block:: bash

    # On the ECMWF server
    [<ecuid>@ecgb11 ~]$ module load python3
    [<ecuid>@ecgb11 ~]$ cd flex_extract_vX.X/Run

Gateway mode
~~~~~~~~~~~~

For the gateway mode, you have to log in on the gateway server and go to the
``Run`` directory of ``flex_extract``:

.. code-block:: bash

    # Gateway mode
    ssh <user>@<gateway-server>
    cd <path-to-flex_extract_vX.X>/Run

From here on, the working process is the same for both modes. For your first
submission, you should use one of the example ``CONTROL`` files stored in the
``Control`` directory. We recommend extracting *CERA-20C* data, since these
retrievals usually deliver quick results and are well suited for testing.
Open the ``run.sh`` file and modify the parameter block marked in the file as
shown below:

.. code-block:: bash

    # -----------------------------------------------------------------
    # AVAILABLE COMMANDLINE ARGUMENTS TO SET
    #
    # THE USER HAS TO SPECIFY THESE PARAMETERS:

    QUEUE='ecgate'
    START_DATE=None
    END_DATE=None
    DATE_CHUNK=None
    JOB_CHUNK=3
    BASETIME=None
    STEP=None
    LEVELIST=None
    AREA=None
    INPUTDIR=None
    OUTPUTDIR=None
    PP_ID=None
    JOB_TEMPLATE='job.temp'
    CONTROLFILE='CONTROL_CERA'
    DEBUG=0
    REQUEST=2
    PUBLIC=0

This would retrieve a one-day (8 September 2000) *CERA-20C* dataset with
3-hourly temporal resolution and a small 1° domain over Europe. Since the
``ectrans`` parameter is set to ``1``, the resulting output files will be
transferred to the local gateway, into the path stored in the destination
(see the installation instructions). The parameters listed in the ``run.sh``
file override the corresponding settings in the ``CONTROL`` file. To start the
retrieval, you only have to start the script:

.. code-block:: bash

    ./run.sh

``Flex_extract`` will print some information about the job. If there is no
error in the submission to the ECMWF server, you will see something like this:

.. code-block:: bash

    ---- On-demand mode! ----
    The job id is: 10627807
    You should get an email per job with subject flex.hostname.pid
    FLEX_EXTRACT JOB SCRIPT IS SUBMITED!

Once submitted, you can check the progress of the job using
``ecaccess-job-list``. You should receive an email after the job has finished,
with a detailed protocol of what was done.

In case the job fails, you will receive an email with the subject ``ERROR!``
and the job name. You can then check the email for information, or look for
debugging information in the ``$SCRATCH`` directory on the ECMWF server.

.. code-block:: bash

    cd $SCRATCH
    ls -rthl

The last command lists the most recent logs and temporary retrieval
directories (usually ``pythonXXXXX``, where ``XXXXX`` is the process id).
Under ``pythonXXXXX``, a copy of the ``CONTROL`` file is stored under the name
``CONTROL``, the protocol is stored in the file ``prot``, and the temporary
files as well as the resulting files are stored in a directory ``work``. The
original name of the ``CONTROL`` file is stored in this new file under the
parameter ``controlfile``.

.. code-block:: bash
    :caption: Example structure of the ``flex_extract`` output directory on ECMWF servers.

    pythonXXXXX
    ├── CONTROL
    ├── prot
    ├── work
    │   ├── temporary files
    │   ├── CE000908* (resulting files)
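
A minimal sketch for inspecting the most recent of these run directories on
the ECMWF server (relying only on the ``pythonXXXXX`` naming pattern described
above):

.. code-block:: bash

    cd $SCRATCH
    cd $(ls -dt python* | head -1)   # most recent temporary run directory
    cat prot                         # protocol of what was done
    ls -l work                       # temporary and resulting files
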
If the job was submitted to the HPC (``queue=cca`` or ``queue=ccb``), you may
log in to the HPC and look into the directory
``/scratch/ms/ECGID/ECUID/.ecaccess_do_not_remove`` for job logs. The working
directories are deleted after a job failure and thus normally cannot be
accessed.

To check whether the resulting files were transferred to the local gateway
server, you can use the command ``ecaccess-ectrans-list`` or check the
destination path for the resulting files on your local gateway server.

Local mode
----------

To get to know the working process and to start your first submission, you
can use one of the example ``CONTROL`` files stored in the ``Control``
directory as they are. For quick results and testing, it is recommended to
extract *CERA-20C* data. Open the ``run_local.sh`` file and modify the
parameter block marked in the file as shown below. The differences between
the two user groups are highlighted.

+--------------------------------------------------+--------------------------------------------------+
| Take this for **member-state user**              | Take this for **public user**                    |
+--------------------------------------------------+--------------------------------------------------+
| .. code-block:: bash                             | .. code-block:: bash                             |
|    :emphasize-lines: 16,20,23                    |    :emphasize-lines: 16,20,23                    |
|                                                  |                                                  |
|    # ------------------------------------------  |    # ------------------------------------------  |
|    # AVAILABLE COMMANDLINE ARGUMENTS TO SET      |    # AVAILABLE COMMANDLINE ARGUMENTS TO SET      |
|    #                                             |    #                                             |
|    # THE USER HAS TO SPECIFY THESE PARAMETERS:   |    # THE USER HAS TO SPECIFY THESE PARAMETERS:   |
|    #                                             |    #                                             |
|                                                  |                                                  |
|    QUEUE=''                                      |    QUEUE=''                                      |
|    START_DATE=None                               |    START_DATE=None                               |
|    END_DATE=None                                 |    END_DATE=None                                 |
|    DATE_CHUNK=None                               |    DATE_CHUNK=None                               |
|    JOB_CHUNK=None                                |    JOB_CHUNK=None                                |
|    BASETIME=None                                 |    BASETIME=None                                 |
|    STEP=None                                     |    STEP=None                                     |
|    LEVELIST=None                                 |    LEVELIST=None                                 |
|    AREA=None                                     |    AREA=None                                     |
|    INPUTDIR='./Workspace/CERA'                   |    INPUTDIR='./Workspace/CERApublic'             |
|    OUTPUTDIR=None                                |    OUTPUTDIR=None                                |
|    PP_ID=None                                    |    PP_ID=None                                    |
|    JOB_TEMPLATE=''                               |    JOB_TEMPLATE=''                               |
|    CONTROLFILE='CONTROL_CERA'                    |    CONTROLFILE='CONTROL_CERA.public'             |
|    DEBUG=0                                       |    DEBUG=0                                       |
|    REQUEST=0                                     |    REQUEST=0                                     |
|    PUBLIC=0                                      |    PUBLIC=1                                      |
+--------------------------------------------------+--------------------------------------------------+

This would retrieve a one-day (8 September 2000) *CERA-20C* dataset with
3-hourly temporal resolution and a small 1° domain over Europe. The
destination location for this retrieval is the ``Workspace`` directory within
``Run``; this can be changed to any other path. The parameters listed in
``run_local.sh`` override the corresponding settings in the ``CONTROL`` file.
To start the retrieval, you then start the script:

.. code-block:: bash

    ./run_local.sh

While job submission on the local host is convenient and easy to monitor (on
standard output), there are a few caveats with this option:

1. There is a maximum size of 20 GB for a single retrieval via the ECMWF Web
   API. Normally this is not a problem, but for global fields with T1279
   resolution and hourly time steps the limit may already apply.

2. If the retrieved MARS files are large but the resulting files are
   relatively small (small local domain), then retrieval to the local host
   may be inefficient, since all data must be transferred via the Internet.
   This applies most notably if ``etadot`` has to be calculated via the
   continuity equation, as this requires global fields even if the domain is
   local. In such cases, job submission via *ecgate* might be the better
   choice. It really depends on the usage patterns and on the Internet
   connection speed.
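
If the run finishes successfully, the resulting files can be checked directly
on the local host. A minimal sketch, assuming the ``INPUTDIR`` from the
member-state example above and the ``CE000908*`` file naming shown in the
directory tree:

.. code-block:: bash

    # Resulting files of the CERA example (8 September 2000)
    ls -l Workspace/CERA/CE000908*
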
Selection and adjustment of ``CONTROL`` files
==============================================

This section describes how to work with the ``CONTROL`` files. A detailed
explanation of the ``CONTROL`` file parameters and of the naming composition
can be found `here `_. The more accurately a ``CONTROL`` file describes the
required retrieval, the fewer command-line parameters need to be set in the
``run`` scripts. Since version ``7.1``, all ``CONTROL`` file parameters have
default values. They can be found in the section `CONTROL parameters `_ or in
the ``CONTROL.documentation`` file within the ``Control`` directory. Only
those parameters which have to deviate from their defaults for a given
dataset retrieval need to be set in a ``CONTROL`` file!

The definition of the dataset subset to be retrieved should be done very
cautiously. The datasets can differ in many ways and vary over time in
resolution and parameterisation methods; in particular, the operational model
has improved through many cycle changes over the years. If you are not
familiar with the data, it might be useful or even necessary to check the
availability of the data in ECMWF's MARS:

- **Public users** can check and list available data at the
  `Public datasets web interface `_.
- **Member-state users** can check the availability of data online in the
  `MARS catalogue `_. There you can select, step by step, the data that suit
  your needs.

This is the most straightforward way of checking for available data, and it
reduces the risk of failing ``flex_extract`` requests. The following figure
shows an example of the web interface:

.. _ref-fig-mars-catalogue-ss:

.. figure:: _files/MARS_catalogue_snapshot.png

Additionally, you can find many helpful links to dataset documentation,
direct links to specific dataset web catalogues, and further general
information in the `link collection `_ of the ECMWF data section.

``Flex_extract`` is specialised in retrieving a limited number of datasets,
namely *ERA-Interim*, *CERA-20C*, *ERA5*, and *HRES (operational data)*, as
well as the *ENS (operational data, 15-day forecast)*. The distinction
relates mainly to the dataset itself, the stream (which kind of forecast or
which subset of the dataset), and the experiment number. In most cases, the
experiment number is ``1``, which selects the current version. The next
levels of differentiation are the field type, the level type, and the time
period. ``Flex_extract`` currently supports only the main streams for the
re-analysis datasets, but provides the extraction of different streams for
the operational dataset. The possible combinations of dataset and stream are
represented by the current list of example ``CONTROL`` files and are
reflected in their naming:

.. code-block:: bash
    :caption: Current example ``CONTROL`` files distributed with ``flex_extract``.

    CONTROL_CERA
    CONTROL_CERA.global
    CONTROL_CERA.public
    CONTROL_EA5
    CONTROL_EA5.global
    CONTROL_EI
    CONTROL_EI.global
    CONTROL_EI.public
    CONTROL_OD.ELDA.FC.eta.ens.double
    CONTROL_OD.ENFO.CF
    CONTROL_OD.ENFO.CV
    CONTROL_OD.ENFO.PF
    CONTROL_OD.ENFO.PF.36hours
    CONTROL_OD.ENFO.PF.ens
    CONTROL_OD.OPER.4V.operational
    CONTROL_OD.OPER.FC.36hours
    CONTROL_OD.OPER.FC.eta.global
    CONTROL_OD.OPER.FC.eta.highres
    CONTROL_OD.OPER.FC.gauss.highres
    CONTROL_OD.OPER.FC.operational
    CONTROL_OD.OPER.FC.twiceaday.1hourly
    CONTROL_OD.OPER.FC.twiceaday.3hourly
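
To use one of these files, point the ``CONTROLFILE`` parameter of the ``run``
scripts to it; for example:

.. code-block:: bash

    # In the user section of run.sh or run_local.sh
    CONTROLFILE='CONTROL_OD.OPER.FC.eta.global'
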
" CONTROL_CERA CONTROL_CERA.global CONTROL_CERA.public CONTROL_EA5 CONTROL_EA5.global CONTROL_EI CONTROL_EI.global CONTROL_EI.public CONTROL_OD.ELDA.FC.eta.ens.double CONTROL_OD.ENFO.CF CONTROL_OD.ENFO.CV CONTROL_OD.ENFO.PF CONTROL_OD.ENFO.PF.36hours CONTROL_OD.ENFO.PF.ens CONTROL_OD.OPER.4V.operational CONTROL_OD.OPER.FC.36hours CONTROL_OD.OPER.FC.eta.global CONTROL_OD.OPER.FC.eta.highres CONTROL_OD.OPER.FC.gauss.highres CONTROL_OD.OPER.FC.operational CONTROL_OD.OPER.FC.twiceaday.1hourly CONTROL_OD.OPER.FC.twiceaday.3hourly The main differences and features in the datasets are listed in the table shown below: .. _ref-tab-dataset-cmp: .. figure:: _files/dataset_cmp_table.png A common problem for beginners in retrieving ECMWF datasets is the mismatch in the definition of these parameters. For example, if you would like to retrieve operational data before ``June 25th 2013`` and set the maximum level to ``137`` you will get an error because this number of levels was first introduced at this effective day. So, be cautious in the combination of space and time resolution as well as the field types which are not available all the time. .. note:: Sometimes it might not be clear how specific parameters in the control file must be set in terms of format. Please see the description of the parameters in section `CONTROL parameters `_ or have a look at the ECMWF user documentation for `MARS keywords `_ In the following we shortly discuss the main retrieval opportunities of the different datasets and categoize the ``CONTROL`` files. Public datasets --------------- The main difference in the definition of a ``CONRTOL`` file for a public dataset is the setting of the parameter ``DATASET``. This specification enables the selection of a public dataset in MARS. Otherwise the request would not find the dataset. For the two public datasets *CERA-20C* and *ERA-Interim* an example file with the ending ``.public`` is provided and can be used straightaway. .. code-block:: bash CONTROL_CERA.public CONTROL_EI.public For *CERA-20C* it seems that there are no differences in the dataset against the full dataset, while the *public ERA-Interim* has only analysis fields every 6 hour without filling forecasts in between for model levels. Therefore it is only possible to retrieve 6-hourly data for *public ERA-Interim*. .. note:: In general, *ERA5* is a public dataset. However, since the model levels are not yet publicly available, it is not possible to retrieve *ERA5* data to drive the ``FLEXPART`` model. As soon as this is possible it will be announced at the community website and per newsletter. CERA ---- For this dataset it is important to keep in mind that the dataset is available for the period 09/1901 until 12/2010 and the temporal resolution is limited to 3-hourly fields. It is also a pure ensemble data assimilation dataset and is stored under the ``enda`` stream. It has ``10`` ensemble members. The example ``CONTROL`` files will only select the first member (``number=0``). You may change this to another number or a list of numbers (e.g. ``NUMBER 0/to/10``). Another important difference to all other datasets is the forecast starting time which is 18 UTC. Which means that the forecast in *CERA-20C* for flux fields is 12 hours long. Since the forecast extends over a single day we need to extract one day in advance and one day subsequently. This is automatically done in ``flex_extract``. ERA 5 ----- This is the newest re-analysis dataset and has a temporal resolution of 1-hourly analysis fields. 
ERA 5
-----

This is the newest re-analysis dataset, with a temporal resolution of
1-hourly analysis fields. At the time of writing, it is available up to April
2019, with new months released regularly. The original horizontal resolution
is ``0.28125°``, which requires some care in the definition of the domain,
since the extent of the domain in the longitude and latitude directions must
be an exact multiple of the resolution. It might be easier to use ``0.25°``,
to which MARS automatically interpolates. The forecast starting times are
``06/18 UTC``, which is important for the flux data. This should be set in
the ``CONTROL`` file via the ``ACCTIME 06/18`` parameter, in correspondence
with ``ACCMAXSTEP 12`` and ``ACCTYPE FC``.

.. note::

    *ERA5* also has an ensemble data assimilation system, but this is not
    yet retrievable with ``flex_extract``, since the de-accumulation of the
    flux fields works differently in that stream. Ensemble retrieval for
    *ERA5* is on the to-do list.

ERA-Interim
-----------

Production of this re-analysis dataset will end on 31 August 2019! It will
then cover the period from 1 January 1979 to 31 August 2019. ``etadot`` is
not available in this dataset. Therefore, ``flex_extract`` must use the
``GAUSS`` option, which retrieves the divergence field in addition; the
vertical velocity is then calculated with the continuity equation in the
Fortran program ``calc_etadot``. Since the analysis fields are available only
every 6 hours, the dataset can be made 3-hourly by adding forecast fields in
between. No ensemble members are available.

Operational data
----------------

This is the real-time, high-resolution atmospheric model with a 10-day
forecast. It has undergone regular adaptations and improvements over the
years; retrieving data from this dataset therefore requires extra care in
selecting the correct parameter settings. See :ref:`ref-tab-dataset-cmp` for
the most important parameters. Nowadays, the data are available 1-hourly, by
filling the gaps between the 6-hourly analysis fields with 1-hourly forecast
fields. Since 4 June 2008, the eta coordinate is directly available, so
``ETA`` should be set to ``1`` to save computation time.

The horizontal resolution can be up to ``0.1°`` which, in combination with
``137`` vertical levels, can lead to problems in terms of job duration and
disk quota when retrieving this high-resolution dataset. It is recommended to
split such high-resolution cases into single-day retrievals (see the
``JOB_CHUNK`` parameter in the ``run.sh`` script) to avoid job failures
caused by exceeded limits (see the sketch below).
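
A minimal sketch of such settings in the ``run.sh`` user section (the dates
and chunk sizes are purely illustrative):

.. code-block:: bash

    # Illustrative: split a month-long, high-resolution retrieval into
    # single-day pieces and submit one batch job per day
    START_DATE='20190101'
    END_DATE='20190131'
    DATE_CHUNK=1
    JOB_CHUNK=1
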
``CONTROL`` files for normal daily retrievals with a mix of analysis and
forecast fields are listed below:

.. code-block:: bash

    CONTROL_OD.OPER.4V.eta.global
    CONTROL_OD.OPER.FC.eta.global
    CONTROL_OD.OPER.FC.eta.highres
    CONTROL_OD.OPER.FC.gauss.highres

These files define the minimum number of parameters necessary to retrieve a
daily subset. The setup of the field types is optimal and should only be
changed by users who understand the consequences. The grid, domain, and
temporal resolution can be changed according to availability.

.. note::

    Please see `Information about MARS retrievals `_ for hints on retrieval
    efficiency and troubleshooting.

Pure forecast
~~~~~~~~~~~~~

It is possible to retrieve pure forecasts extending over more than one day.
The available forecast period depends on the date and the forecast field
type; please use the MARS catalogue to check the availability. Below are some
examples for 36-hour forecasts of the *forecast (FC)*, *control forecast
(CF)*, and *calibration/validation forecast (CV)* types. The *CV* field type
was available 3-hourly only from 2006 to 2016. It is recommended to use the
*CF* type, since it is available from 1992 (3-hourly) up to today (1-hourly).
The *CV* and *CF* field types belong to the *ensemble prediction system
(ENFO)*, which contains 50 ensemble members. Please be aware that in this
case it is necessary to set the type for the flux fields explicitly;
otherwise a default would be selected, which might differ from what you
expect!

.. code-block:: bash

    CONTROL_OD.ENFO.CF.36hours
    CONTROL_OD.ENFO.CV.36hours
    CONTROL_OD.OPER.FC.36hours

Half-day retrievals
~~~~~~~~~~~~~~~~~~~

If a forecast for just half a day is wanted, the analysis fields can also be
substituted by forecast fields, as shown in the files with ``twiceaday`` in
their names. They produce a full-day retrieval from pure 12-hour forecasts,
twice a day. It is also possible to use the operational version, which takes
the time information from ECMWF's environment variables and therefore
retrieves the newest forecast of each day. This version uses a ``BASETIME``
parameter, which tells MARS to extract exactly the 12 hours leading up to the
selected time. If the ``CONTROL`` file with ``basetime`` in its name is used,
this can also be done for any other date.

.. code-block:: bash

    CONTROL_OD.OPER.FC.eta.basetime
    CONTROL_OD.OPER.FC.operational
    CONTROL_OD.OPER.FC.twiceaday.1hourly
    CONTROL_OD.OPER.FC.twiceaday.3hourly

Ensemble members
~~~~~~~~~~~~~~~~

The retrieval of ensemble members was already mentioned in the pure-forecast
section and for the *CERA-20C* data. In this ``flex_extract`` version, there
is the additional possibility to retrieve the *Ensemble Long window Data
Assimilation (ELDA)* stream of the real-time dataset. Up to May 2019, this
model version had 25 ensemble members and a control run (``number 0``); since
June 2019 it has 50 ensemble members. For the period before June 2019, the 25
ensemble members can therefore be doubled to 50: the original 25 members are
taken from MARS, and mirrored members are created by subtracting twice the
difference between the member value and the control value. This is done by
setting the parameter ``DOUBLEELDA`` to ``1``.

.. code-block:: bash

    CONTROL_OD.ELDA.FC.eta.ens.double
    CONTROL_OD.ENFO.PF.ens

Specific features
-----------------

rrint
    Selects the disaggregation scheme for the precipitation flux: the old
    (``0``) or the new (``1``) scheme. See :doc:`Documentation/disagg` for an
    explanation.

cwc
    If set to ``1``, the total cloud water content is retrieved in addition.
    This is the sum of the cloud liquid and cloud ice water content.

addpar
    With this parameter, a list of additional 2-dimensional (non-flux)
    parameters can be retrieved. Use the format ``param1/param2/.../paramx``
    to list the parameters. Please be consistent in using either the
    parameter IDs or the short names.

doubleelda
    Use this to double the number of ensemble members by adding a further
    disturbance to each member.

debug
    If set to ``1``, all temporary files are kept at the end. Otherwise,
    everything except the final output files is deleted.

request
    Produces an extra *csv* file, ``mars_requests.csv``, in which the content
    of each MARS request of the job is stored. Useful for debugging and
    documentation.

mailfail
    By default, the mail is sent to the address connected with the user
    account. Additional e-mail addresses can be added, but as soon as a new
    address is entered, the default is overwritten. If you would like to keep
    the address from your user account, add ``${USER}`` to the
    (comma-separated) list of mail addresses.
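
Several of these switches could be combined in one ``CONTROL`` file. A purely
illustrative sketch (parameter names written in the upper-case style of the
other ``CONTROL`` examples; the ``ADDPAR`` IDs and the e-mail address are
placeholders):

.. code-block:: bash

    RRINT 1                  # new precipitation disaggregation scheme
    CWC 1                    # also retrieve total cloud water content
    ADDPAR 186/187/188       # example IDs for additional 2D parameters
    DEBUG 1                  # keep all temporary files
    REQUEST 2                # also write mars_requests.csv (as in run.sh)
    MAILFAIL ${USER},someone@example.com
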
Hints for the definition of some parameter combinations
--------------------------------------------------------

Field types and times
    This combination is very important. It defines the temporal resolution
    and which field type is extracted for each time step. For analysis (AN)
    fields, the time declaration uses the times of the specific analyses, and
    the (forecast) steps have to be ``0``. The forecast field types (e.g. FC,
    CF, CV, PF) need a combination of (forecast start) times and (forecast)
    steps; together, they define the actual time step. It is important to
    know the forecast starting times of the dataset to be retrieved, since
    they differ between datasets. In general, it is enough to give the
    information for the exact time steps needed, but it is also possible to
    specify more combinations of ``TYPE``, ``TIME``, and ``STEP``, because
    the temporal (hourly) resolution given by the ``DTIME`` parameter selects
    the correct combinations.

    .. code-block:: bash
        :caption: Example of a setting for the field types and temporal resolution.

        DTIME 3
        TYPE AN FC FC FC AN FC FC FC
        TIME 00 00 00 00 12 12 12 12
        STEP 00 03 06 09 00 03 06 09

Vertical velocity
    The vertical velocity for ``FLEXPART`` is not directly available from
    MARS and therefore has to be calculated. There are a couple of different
    options, and the following parameters are responsible for the selection;
    see :doc:`Documentation/vertco` for a detailed explanation. The
    ``ETADIFF``, ``OMEGA``, and ``OMEGADIFF`` versions are recommended for
    debugging and testing only. Usually, the decision is between ``GAUSS``
    and ``ETA``: with ``GAUSS``, spectral fields of the horizontal winds and
    the divergence are retrieved, and the vertical velocity is calculated
    from the continuity equation; with ``ETA``, latitude/longitude fields of
    the horizontal winds and the eta-coordinate vertical velocity are
    retrieved. It is recommended to use ``ETA`` where available, because of
    the reduced computation time.

    .. code-block:: bash
        :caption: Example setting for the vertical coordinate retrieval.

        GAUSS 0
        ETA 1
        ETADIFF 0
        DPDETA 1
        OMEGA 0
        OMEGADIFF 0

Grid resolution and domain
    The grid and domain selections depend on each other. The grid can be
    defined as plain degrees (e.g. ``1.``) or, as in older versions, in
    thousandths of a degree (e.g. ``1000`` for ``1°``). After selecting the
    grid, the domain has to be defined such that its extent in the longitude
    and latitude directions is an exact multiple of the grid spacing (a quick
    check is sketched after the example below). The horizontal resolution for
    spectral fields is set by the parameter ``RESOL``; for information on how
    to select an appropriate value, see the explanation of the MARS keyword
    `here `_ and `this table `_.

    .. code-block:: bash
        :caption: Example setting for a northern-hemisphere domain with a grid of ``0.25°``.

        GRID 0.25
        RESOL 799
        SMOOTH 0
        UPPER 90.
        LOWER 0.
        LEFT -179.75
        RIGHT 180.
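
    Returning to the multiple-of-the-grid requirement: a quick way to verify
    a domain definition before starting a retrieval is sketched below (values
    taken from the example above).

    .. code-block:: bash

        # Check that the domain extent is an exact multiple of the grid
        awk 'BEGIN {
            grid = 0.25
            lon  = 180.0 - (-179.75)   # RIGHT - LEFT
            lat  = 90.0  - 0.0         # UPPER - LOWER
            print "longitude:", ((lon / grid == int(lon / grid)) ? "ok" : "not a multiple")
            print "latitude: ", ((lat / grid == int(lat / grid)) ? "ok" : "not a multiple")
        }'
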
Flux data
    The flux fields are always accumulated forecast fields. Since some
    re-analysis datasets nowadays provide a complete set of analysis fields
    at their full temporal resolution, a separate set of parameters had to be
    introduced for the flux fields, because the necessary information can no
    longer be taken from ``TYPE``, ``TIME``, and ``STEP``. Select a forecast
    field type with ``ACCTYPE``, the forecast starting time with ``ACCTIME``,
    and the maximum forecast step with ``ACCMAXSTEP``; the ``DTIME``
    parameter defines the temporal resolution for the whole period.

    .. code-block:: bash
        :caption: Example setting for the definition of flux fields.

        DTIME 3
        ACCTYPE FC
        ACCTIME 00/12
        ACCMAXSTEP 36

.. toctree::
    :hidden:
    :maxdepth: 2

.. user_guide/oper_modes
.. user_guide/ecmwf
.. user_guide/how_to
.. user_guide/control_templates