CMAQv5.4+ CONUS Benchmark Tutorial using 12US1 Domain
18.3.1. Use Cycle Cloud pre-installed with CMAQv5.4+ software and 12US1 Benchmark data.#
Step-by-step instructions for running the CMAQ 12US1 benchmark for 2 days on a CycleCloud cluster. The input files are *.nc4, which require the netCDF-4 compressed libraries. Instructions are provided below on how to use the module load command to obtain the correct versions of the I/O API, netCDF-C, and netCDF-Fortran libraries.
18.3.2. This method relies on obtaining the code and data from blob storage.#
Note
For information about how to share a snapshot of a Blob Storage account, see: Share from Blob Storage Account
You will need to copy the snapshot and create a new Blob Storage account, and then use that Blob Storage account as the backing storage for your Lustre filesystem.
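A minimal sketch of creating the new Blob Storage account and a container with the Azure CLI (assuming the az CLI is installed and you are logged in; the account, container, resource group, and region names below are placeholders, not values from this tutorial):
# placeholder names; substitute your own resource group, storage account, container, and region
az storage account create --name mycmaqstorage --resource-group my-resource-group --location eastus --sku Standard_LRS
az storage container create --account-name mycmaqstorage --name cmaq-lustre-hsm --auth-mode login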
Use a configuration file from GitHub by cloning the repo to your local machine#
cd /lustre
sudo mkdir cyclecloud-cmaq
sudo chown username cyclecloud-cmaq
git clone -b main https://github.com/CMASCenter/cyclecloud-cmaq
cd cyclecloud-cmaq
Lustre - Request Public Preview#
Note
For information about the Public Preview for Azure Managed Lustre, see: Azure Managed Lustre and Benchmarking Lustre
See information on how to join: Azure Managed Lustre - Registration form link
Create Lustre Server#
Blob Storage - Lustre hierarchical storage management#
18.3.3. Update Cycle Cloud#
18.3.4. Log into the new cluster#
Note
Use your username and credentials to log in
ssh -Y username@IP-address
18.3.5. Verify Software#
The software is pre-loaded on the /lustre volume of the CycleCloud cluster.
ls /lustre/build
List the available modules
module avail
Output:
---------------------------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------------------------
amd/aocl dot module-git modules mpi/hpcx-v2.9.0 mpi/impi_2021.2.0 mpi/mvapich2-2.3.6 mpi/openmpi-4.1.1 use.own
amd/aocl-2.2.1 gcc-9.2.1 module-info mpi/hpcx mpi/impi-2021 mpi/mvapich2 mpi/openmpi null
-------------------------------------------------------- /shared/build/Modules/modulefiles ---------------------------------------------------------
hdf5-1.10.5/gcc-9.2.1 ioapi-3.2_20200828/gcc-9.2.1-hdf5 ioapi-3.2_20200828/gcc-9.2.1-netcdf netcdf-4.8.1/gcc-9.2.1
Load the modules for the netCDF-4 compressed libraries.
module load ioapi-3.2_20200828/gcc-9.2.1-hdf5
Change the ownership and group of the /lustre/data directory
sudo chown ubuntu /lustre/data
sudo chgrp ubuntu /lustre/data
Create the output directory
mkdir -p /lustre/data/output
18.3.6. Download the input data from the AWS Open Data CMAS Data Warehouse.#
Do a git pull to obtain the latest scripts in the cyclecloud-cmaq repo.
cd /lustre/cyclecloud-cmaq
git pull
cd /shared/cyclecloud-cmaq/s3_scripts
./s3_copy_nosign_2018_12US1_conus_cmas_opendata_to_lustre_20171222_cb6r3.csh
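The script name indicates an anonymous (no-sign-request) copy from the CMAS Open Data bucket on AWS. A minimal sketch of the kind of command such a script runs (assuming the AWS CLI is installed; the bucket path below is a placeholder, and the real source path is set inside the script):
# placeholder source bucket; see the s3_copy script for the actual path
aws s3 cp s3://<cmas-open-data-bucket>/2018_12US1/ /lustre/data_lim/CMAQ_Modeling_Platform_2018/2018_12US1/ --recursive --no-sign-request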
18.3.7. Verify Input Data#
cd /lustre/data_lim/CMAQ_Modeling_Platform_2018/2018_12US1
du -h
Output
40K ./CMAQ_v54+_cb6r5_scripts
44K ./CMAQ_v54+_cracmm_scripts
1.6G ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/cmv_c1c2_12
2.4G ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/cmv_c3_12
5.1G ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/merged_nobeis_norwc
1.4G ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/othpt
1.3G ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/pt_oilgas
6.7M ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/ptagfire
255M ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/ptegu
19M ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/ptfire
2.9M ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/ptfire_grass
3.0M ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/ptfire_othna
5.9G ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready/ptnonipm
18G ./emis/cb6r3_ae6_20200131_MYR/cmaq_ready
3.5G ./emis/cb6r3_ae6_20200131_MYR/premerged/rwc
3.5G ./emis/cb6r3_ae6_20200131_MYR/premerged
22G ./emis/cb6r3_ae6_20200131_MYR
60K ./emis/emis_dates
22G ./emis
2.3G ./epic
13G ./icbc/CMAQv54_2018_108NHEMI_M3DRY
17G ./icbc
41G ./met/WRFv4.3.3_LTNG_MCIP5.3.3_compressed
41G ./met
4.0G ./misc
697M ./surface
85G .
18.3.8. Examine CMAQ Run Scripts#
The run scripts are available in two locations: one copy is in the CMAQ scripts directory.
Another copy is available in the cyclecloud-cmaq repo.
Copy the run scripts from the repo. Note that there are different run scripts depending on which compute node is used. This tutorial assumes the hbv3_120 (HBv3-series) compute node.
cp /shared/cyclecloud-cmaq/run_scripts/2018_12US1_CMAQv54plus/run_cctm_2018_12US1_v54_cb6r3_ae6.20171222.csh /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/
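To see which run scripts are provided for other node configurations, list the run script directory referenced in the copy command above:
ls /shared/cyclecloud-cmaq/run_scripts/2018_12US1_CMAQv54plus/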
Note
The time it takes the 2-day CONUS benchmark to run will vary with the number of CPUs used, the type of compute node, and which disks are used for I/O (/shared or /lustre). (Benchmark scaling plot for hbv3_120 on /lustre and /shared to be included here.)
Examine how the run script is configured
head -n 30 /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/run_cctm_2018_12US1_v54_cb6r3_ae6.20171222.csh
#!/bin/csh -f
## For CycleCloud 120pe
## data on /lustre data directory
## https://dataverse.unc.edu/dataset.xhtml?persistentId=doi:10.15139/S3/LDTWKH
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64
#SBATCH --exclusive
#SBATCH -J CMAQ
#SBATCH -o /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/run_cctm5.4+_Bench_2018_12US1_M3DRY_cb6r3_ae6_20200131_MYR.256.16x16pe.2day.20171222start.log
#SBATCH -e /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts/run_cctm5.4+_Bench_2018_12US1_M3DRY_cb6r3_ae6_20200131_MYR.256.16x16pe.2day.20171222start.log
# ===================== CCTMv5.4.X Run Script =========================
# Usage: run.cctm >&! cctm_2018_12US1.log &
#
# To report problems or request help with this script/program:
# http://www.epa.gov/cmaq (EPA CMAQ Website)
# http://www.cmascenter.org (CMAS Website)
# ===================================================================
# ===================================================================
#> Runtime Environment Options
# ===================================================================
echo 'Start Model Run At ' `date`
#> Toggle Diagnostic Mode which will print verbose information to
#> standard output
setenv CTM_DIAG_LVL 0
Note
In this run script, the SBATCH directives request 4 nodes with 64 PEs each, or 4 x 64 = 256 PEs.
Verify that the NPCOL and NPROW settings in the script match what is being requested in the SBATCH commands that tell Slurm how many compute nodes to provision. In this case, to run CMAQ on 256 CPUs (SBATCH --nodes=4 and --ntasks-per-node=64), use NPCOL=16 and NPROW=16.
grep NPCOL run_cctm_2018_12US1_v54_cb6r3_ae6.20171222.csh
Output:
#> Horizontal domain decomposition
if ( $PROC == serial ) then
setenv NPCOL_NPROW "1 1"; set NPROCS = 1 # single processor setting
else
@ NPCOL = 16; @ NPROW = 16
@ NPROCS = $NPCOL * $NPROW
setenv NPCOL_NPROW "$NPCOL $NPROW";
endif
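For example, here is a sketch of the corresponding settings for the 192-PE (2 nodes x 96 tasks) runs whose logs appear later in this section. Any NPCOL x NPROW pair whose product equals the total task count requested from Slurm will work:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96

@ NPCOL = 16; @ NPROW = 12
@ NPROCS = $NPCOL * $NPROW   # 16 x 12 = 192 = 2 nodes x 96 tasks
setenv NPCOL_NPROW "$NPCOL $NPROW";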
18.3.10. Build the code by running the makefile#
cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts
Check that you have the modules loaded
module list
Currently Loaded Modulefiles:
1) gcc-9.2.1 2) mpi/openmpi-4.1.1 3) hdf5-1.10.5/gcc-9.2.1 4) ioapi-3.2_20200828/gcc-9.2.1-hdf5
Run the make command
make
Verify that the executable has been created
ls -lrt CCTM_v54+.exe
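As an optional sanity check (a sketch using the standard ldd utility, not part of the original workflow), confirm that the executable resolves the netCDF and MPI libraries provided by the loaded modules:
ldd CCTM_v54+.exe | grep -E 'netcdf|mpi'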
18.3.11. Submit Job to Slurm Queue to run CMAQ on Lustre#
cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts
sbatch run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.2x96.ncclassic.retest.csh
Check status of run#
squeue
Output:
It takes about 5-8 minutes for the compute nodes to spin up; once the nodes are available, the job status will change from CF (configuring) to R (running).
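To monitor the queue without re-running the command by hand, squeue can be wrapped in the standard watch utility (a convenience sketch, not part of the original workflow):
watch -n 30 squeue -u $USER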
Successfully started run#
squeue
Once the job is successfully running#
Check on the log file status
grep -i 'Processing completed.' run_cctm5.4+_Bench_2018_12US1_M3DRY_cb6r3_ae6_20200131_MYR.256.16x16pe.2day.20171222start.log
Output:
Once the job has completed running the two-day benchmark, check the log file for the timings.
tail -n 18 run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.192.16x12pe.2day.20171222start.2x96.log
Output:
OUTDIR | /lustre/data_lim/output/output_v54+_cb6r5_ae7_aq_WR413_MYR_gcc_2018_12US1_2x96_classic
==================================
***** CMAQ TIMING REPORT *****
==================================
Start Day: 2017-12-22
End Day: 2017-12-23
Number of Simulation Days: 2
Domain Name: 12US1
Number of Grid Cells: 4803435 (ROW x COL x LAY)
Number of Layers: 35
Number of Processes: 192
All times are in seconds.
Num Day Wall Time
01 2017-12-22 1813.3
02 2017-12-23 2077.7
Total Time = 3891.00
Avg. Time = 1945.50
18.3.13. Submit a minimum of 2 benchmark runs#
Note: when attempting to reproduce this run on the HBv120 cluster, slower times were observed (the log below is from a run using the /shared filesystem).
tail -n 30 run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.192.16x12pe.2day.20171222start.2x96.shared.log
Output
==================================
***** CMAQ TIMING REPORT *****
==================================
Start Day: 2017-12-22
End Day: 2017-12-23
Number of Simulation Days: 2
Domain Name: 12US1
Number of Grid Cells: 4803435 (ROW x COL x LAY)
Number of Layers: 35
Number of Processes: 192
All times are in seconds.
Num Day Wall Time
01 2017-12-22 2549.7
02 2017-12-23 2752.4
Total Time = 5302.10
Avg. Time = 2651.05
Ideally, two CMAQ runs should be submitted to the Slurm queue using two different NPCOL x NPROW configurations, to create the output needed for the QA and Post Processing sections in Chapter 6, as sketched below.
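A sketch of how this might look (the second script name and its decomposition are hypothetical; edit NPCOL/NPROW and the #SBATCH --nodes / --ntasks-per-node lines in the copy before submitting):
cd /shared/build/openmpi_gcc/CMAQ_v54+/CCTM/scripts
# hypothetical second copy with a different decomposition (e.g., 4 nodes x 64 tasks, NPCOL=16, NPROW=16)
cp run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.2x96.ncclassic.retest.csh run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.4x64.ncclassic.csh
sbatch run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.2x96.ncclassic.retest.csh
sbatch run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.4x64.ncclassic.csh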