5. CMAQv5.4+ Benchmark on HBv3_120 Compute Nodes and BeeOND
Run CMAQv5.4+ using pre-loaded software and input data on BeeOND with an HBv3_120 CycleCloud cluster. The BeeOND filesystem, which pools the fast local /nvme storage on each compute node, is free to use and gives performance nearly as good as running on Lustre. In addition, the BeeOND filesystem is created and destroyed as part of the run script. For the Lustre managed filesystem, we did not have a way to create the filesystem when a job started and tear it down when the job finished, so we incurred costs for leaving it running for months.
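The create-on-start, destroy-on-stop BeeOND lifecycle described above can be sketched as a Slurm batch fragment. This is a minimal sketch, not the repository's actual run script: the mount points `/mnt/nvme` and `/mnt/beeond` and the hostfile name are assumptions.

```shell
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --exclusive

# Build a hostfile from the nodes Slurm allocated to this job
scontrol show hostnames "$SLURM_JOB_NODELIST" > "hostfile.$SLURM_JOB_ID"

# Start BeeOND: pool the local NVMe storage (-d) on each node
# and mount the resulting shared filesystem (-c) on all of them
beeond start -n "hostfile.$SLURM_JOB_ID" -d /mnt/nvme -c /mnt/beeond

# ... stage input data onto /mnt/beeond and run CMAQ here ...

# Tear BeeOND down and delete its local data (-L -d) when the job ends
beeond stop -n "hostfile.$SLURM_JOB_ID" -L -d
```

Because the filesystem lives only for the duration of the job, there is nothing provisioned (and nothing billed) between runs.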
Note on the Lustre managed filesystem: its cost varies by performance tier, and the minimum size also varies by tier. For these benchmarks we used a 250 MB/s Lustre filesystem provisioned at 20 TiB.
Lustre at 250 MB/s provisioned for 20 TiB cost our account $3,386 / month.
Due to these significant costs, we do not recommend using Lustre; instead, we recommend the BeeOND filesystem, which is free.
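To put the figure above in perspective, a few lines of arithmetic on the numbers quoted in the text show the per-TiB rate and the cost of leaving the filesystem provisioned for a year:

```python
# Figures from the text: 250 MB/s Lustre tier, 20 TiB provisioned
lustre_monthly_usd = 3386
capacity_tib = 20

per_tib_month = lustre_monthly_usd / capacity_tib
print(f"Lustre cost: ${per_tib_month:.2f} per TiB-month")  # $169.30

annual_usd = lustre_monthly_usd * 12
print(f"Left provisioned for a year: ${annual_usd:,}")     # $40,632
```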
- 5.1. Use Cycle Cloud with CMAQv5.4+ software and 12US1 Benchmark data.
- 5.2. Log into the new cluster
- 5.3. Download the input data from the AWS Open Data CMAS Data Warehouse using the `aws s3 cp` command.
- 5.4. Verify Input Data
- 5.5. Install CMAQv5.4+
- 5.6. Copy and Examine CMAQ Run Scripts
- 5.7. Submit Job to Slurm Queue to run CMAQ on BeeOND
- 5.8. Submit job to run on 1 node x 96 processors
- 5.9. Submit job to run on 3 nodes
- 5.10. Check how quickly the processing is being completed
- 5.11. Check results when job has completed successfully
- 5.12. Check to see if spot VMs are available
- 5.13. Unsuccessful Slurm status messages
- 5.14. Change to HB176_v4 compute node
- 5.15. To recover from failure use the terminate cluster option
- 5.16. If Slurm jobs are in a bad state
- 5.17. Run CMAQ DESID on HBv3_120 using the BeeOND filesystem