18.1.13. Copy the run scripts from the CycleCloud repo#
Note, the run scripts are tailored to the Compute Node.
Change directories to where the run scripts are available from the git repo.
cd /shared/cyclecloud-cmaq/run_scripts/HB120v3_12US1_CMAQv54plus
Copy the run scripts to the run directory
cp *.csh /shared/build/openmpi_gcc/CMAQ_v54/CCTM/scripts/`
18.1.14. Run the CONUS Domain on 176 pes#
cd /shared/build/openmpi_gcc/CMAQ_v54/CCTM/scripts/
sbatch run_cctm_2018_12US1_v54_cb6r5_ae6.20171222.1x176.ncclassic.csh
Note, it will take about 3-5 minutes for the compute notes to start up This is reflected in the Status (ST) of PD (pending), with the NODELIST reason being that it is configuring the partitions for the cluster
18.1.15. Check the status in the queue#
squeue
output:
[lizadams@CMAQSlurmHC44rsAlmaLinux-scheduler scripts]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 hpc CMAQ lizadams CF 0:02 1 CycleCloud8-5-hpc-1
After 5 minutes the status will change once the compute nodes have been created and the job is running
squeue
output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6 hpc CMAQ lizadams R 0:37 5 cmaqslurmhc44rsalmalinux-hpc-pg0-[1-5]
The 176 pe job should take 85 minutes to run (42 minutes per day)
Note, if the job does not get scheduled, examine the slurm logs
sudo vi /var/log/slurmctld/slurmctld.log
sudo vi //var/log/slurmctld/resume.log
18.1.16. check the timings while the job is still running using the following command#
grep 'Processing completed' CTM_LOG_001*
output:
Processing completed... 29.9047 seconds
Processing completed... 4.7678 seconds
Processing completed... 4.8123 seconds
Processing completed... 4.7888 seconds
Processing completed... 4.7633 seconds
Processing completed... 4.8243 seconds
18.1.17. When the job has completed, use tail to view the timing from the log file.#
tail -n 30 /shared/build/openmpi_gcc/CMAQ_v54/CCTM/scripts/run_cctm5.4+_Bench_2018_12US1_cb6r5_ae6_20200131_MYR.176.16x11pe.2day.20171222start.1x176.log
output:
18.1.18. Check whether the scheduler thinks there are cpus or vcpus#
sinfo -lN
output:
Tue Jan 09 19:11:04 2024
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
CycleCloud8-5-hpc-1 1 hpc* allocated# 176 176:1:1 747110 0 1 cloud none
CycleCloud8-5-hpc-2 1 hpc* idle~ 176 176:1:1 747110 0 1 cloud none
CycleCloud8-5-hpc-3 1 hpc* idle~ 176 176:1:1 747110 0 1 cloud none
CycleCloud8-5-hpc-4 1 hpc* idle~ 176 176:1:1 747110 0 1 cloud none
CycleCloud8-5-htc-1 1 htc idle~ 2 2:1:1 3072 0 1 cloud none
CycleCloud8-5-htc-2 1 htc idle~ 2 2:1:1 3072 0 1 cloud none