4.5. Copy the run scripts from the CycleCloud repo#

Note, the run scripts are tailored to the Compute Node. This assumes the cluster was built with HC44rs compute nodes.

Change directories to where the run scripts are available from the git repo.

cd /shared/cyclecloud-cmaq/run_scripts/HC44rs

Copy the run scripts to the run directory

cp * /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/

4.6. Run the CONUS Domain on 180 pes#

cd /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/

sbatch run_cctm_2016_12US2.180pe.csh

Note, it will take about 3-5 minutes for the compute notes to start up This is reflected in the Status (ST) of PD (pending), with the NODELIST reason being that it is configuring the partitions for the cluster

4.7. Check the status in the queue#

squeue

output:

[lizadams@CMAQSlurmHC44rsAlmaLinux-scheduler scripts]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 6       hpc     CMAQ lizadams CF       5:03      5 cmaqslurmhc44rsalmalinux-hpc-pg0-[1-5]

After 5 minutes the status will change once the compute nodes have been created and the job is running

squeue

output:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 6       hpc     CMAQ lizadams  R       0:37      5 cmaqslurmhc44rsalmalinux-hpc-pg0-[1-5]

The 180 pe job should take 60 minutes to run (30 minutes per day)

Note, if the job does not get scheduled, examine the slurm logs

sudo vi /var/log/slurmctld/slurmctld.log

sudo vi //var/log/slurmctld/resume.log

4.8. check the timings while the job is still running using the following command#

grep 'Processing completed' CTM_LOG_001*

output:

   Processing completed...    4.6 seconds
            Processing completed...    4.8 seconds
            Processing completed...    4.8 seconds
            Processing completed...    5.2 seconds
            Processing completed...    4.4 seconds
            Processing completed...    5.0 seconds
            Processing completed...    4.6 seconds
            Processing completed...    4.7 seconds
            Processing completed...    4.7 seconds
            Processing completed...    5.1 seconds

4.9. When the job has completed, use tail to view the timing from the log file.#

tail /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/run_cctmv5.3.3_Bench_2016_12US2.2x90.10x18pe.2day.log

output:

==================================
  ***** CMAQ TIMING REPORT *****
==================================
Start Day: 2015-12-22
End Day:   2015-12-23
Number of Simulation Days: 2
Domain Name:               12US2
Number of Grid Cells:      3409560  (ROW x COL x LAY)
Number of Layers:          35
Number of Processes:       180
   All times are in seconds.

Num  Day        Wall Time
01   2015-12-22   2097.37
02   2015-12-23   1809.84
     Total Time = 3907.21
      Avg. Time = 1953.60

4.10. Check whether the scheduler thinks there are cpus or vcpus#

sinfo -lN

output:

Thu Feb 17 14:53:19 2022
NODELIST             NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
cmaq-hbv3-hpc-pg0-1      1      hpc*       idle% 120   120:1:1 435814        0      1    cloud none                
cmaq-hbv3-hpc-pg0-2      1      hpc*       idle% 120   120:1:1 435814        0      1    cloud none                
cmaq-hbv3-hpc-pg0-3      1      hpc*       idle% 120   120:1:1 435814        0      1    cloud none                
cmaq-hbv3-htc-1          1       htc       idle~ 1       1:1:2   3072        0      1    cloud none