azure-cmaq documentation

Contents:

  • 1. Introductory Tutorial
    • 1.1. Create an Azure Account
    • 1.2. Sign up for a Developer Azure Support Plan
    • 1.3. Use Azure CLI to examine your quota
    • 1.4. Create a virtual machine
    • 1.5. Log in to the Virtual Machine
    • 1.6. Delete the virtual machine and all of the associated resources by deleting the resource group.
  • 2. System Requirements
    • 2.1. System Requirements for a Single Virtual Machine or Cycle Cloud Cluster
    • 2.2. Software Requirements for CMAQ on Single VM or CycleCloud Cluster
    • 2.3. Storage Options
    • 2.4. Recommended Cycle Cloud Configuration for CONUS Domain 12US1
  • 3. Create Single VM using HB120rs_v3 Tutorial
    • 3.1. Create a HB120rs_v3 Virtual Machine
    • 3.2. Log in to the Virtual Machine
    • 3.3. Mount the disk on the server as /shared using the instructions on the following link:
    • 3.4. Alternatively, you can create an NVMe striped disk that has faster performance.
    • 3.5. Download the Input data from the S3 Bucket
    • 3.6. Change shell to use tcsh
    • 3.7. Create Environment Module for Libraries
    • 3.8. Install and Build CMAQ
    • 3.9. Copy the run scripts from the repo to the run directory
    • 3.10. Run CMAQ interactively using the following command:
    • 3.11. Created another single VM using HBv120_v2 and ran again
    • 3.12. Created another VM using the HB120v3 CPUs
    • 3.13. Verify that the correct number of CPUs are installed using lscpu
    • 3.14. Timing information
    • 3.15. Review performance metrics in the Azure portal
    • 3.16. If your performance is much slower than this, we recommend that you terminate the resource group and rebuild the VM
  • 4. Create CycleCloud HB120rs_v3 Cluster
    • 4.1. Create CycleCloud CMAQ Cluster
  • 5. CMAQv5.4+ Benchmark on HBv3_120 compute nodes and beeond
    • 5.1. Use Cycle Cloud with CMAQv5.4+ software and 12US1 Benchmark data.
    • 5.2. Log into the new cluster
    • 5.3. Download the input data from the AWS Open Data CMAS Data Warehouse using the aws copy command.
    • 5.4. Verify Input Data
    • 5.5. Install CMAQv5.4+
    • 5.6. Copy and Examine CMAQ Run Scripts
    • 5.7. Submit Job to Slurm Queue to run CMAQ on beeond
    • 5.8. Submit job to run on 1 node x 96 processors
    • 5.9. Submit job to run on 3 nodes
    • 5.10. Check how quickly the processing is being completed
    • 5.11. Check results when job has completed successfully
    • 5.12. Check to see if spot VMs are available
    • 5.13. Unsuccessful Slurm status messages
    • 5.14. Change to HB176_v4 compute node
    • 5.15. To recover from failure use the terminate cluster option
    • 5.16. If SLURM jobs are in a bad state
    • 5.17. Run DESID CMAQ on hbv3_120 using the beeond filesystem
  • 6. Scripts to run combine and post processing
  • 7. Scripts to post-process CMAQ output
  • 8. Install R, Rscript and Packages
  • 9. Install Anaconda on the /shared/build directory
  • 10. QA CMAQ
    • 10.1. Quality Assurance Checks for Successful CMAQ Run on CycleCloud
    • 10.2. R analysis scripts
    • 10.3. Run Jupyter Notebook to analyze difference between with DESID Emissions and the base case (no emission reduction)
  • 11. Compare Timing of CMAQ Routines
    • 11.1. Parse timings from the log file
  • 12. Copy Output to S3 Bucket
    • 12.1. Copy Output Data and Run script logs to S3 Bucket
  • 13. Logout and Delete CycleCloud
    • 13.1. Link to Azure Instructions on how to logout and delete cyclecloud
  • 14. Performance Optimization
    • 14.1. Right-sizing Compute Nodes for a Single Virtual Machine.
    • 14.2. An explanation of why a scaling analysis is required for Single Node
    • 14.3. Benchmark Scaling Plots using Single Virtual Machine HBv120
    • 14.4. Right-sizing Compute Nodes for the CycleCloud
    • 14.5. An explanation of why a scaling analysis is required for Multinode or Parallel MPI Codes
    • 14.6. Slurm Compute Node Provisioning
    • 14.7. Benchmark Scaling Plots using CycleCloud
  • 15. Additional Resources
    • 15.1. Cycle Cloud Resources
    • 15.2. Help Resources for CMAQ
    • 15.3. Resources from Azure for diagnosing issues with running the Cycle Cloud
    • 15.4. Frequently Asked Questions
    • 15.5. Computing on the Cloud References
  • 16. Future Work
    • 16.1. List of ideas for future work
  • 17. Contribute to this Tutorial
    • 17.1. Contribute to Azure-cmaq Documentation
  • 18. Optional instructions for Creating CycleCloud Cluster using /shared and /lustre
    • 18.1. Advanced Tutorial
      • 18.1.1. Modify CycleCloud CMAQ Cluster
      • 18.1.2. Install CMAQ and prerequisite libraries on Linux on Alinux
      • 18.1.3. Install CMAQ and prerequisite libraries on Linux on SUSE Linux
      • 18.1.4. Configuring selected storage and obtaining input data
      • 18.1.5. Copy the run scripts from the CycleCloud repo to run on HBv120
      • 18.1.6. Edit the run script to run on 192 pes
      • 18.1.7. Check the status in the queue
      • 18.1.8. Check the timings while the job is still running using the following command
      • 18.1.9. When the job has completed, use tail to view the timing from the log file.
      • 18.1.10. Check whether the scheduler thinks there are cpus or vcpus
      • 18.1.11. Edit the run script to run on 96 pes
      • 18.1.12. Check the timing after run completed
      • 18.1.13. Copy the run scripts from the CycleCloud repo
      • 18.1.14. Run the CONUS Domain on 176 pes
      • 18.1.15. Check the status in the queue
      • 18.1.16. Check the timings while the job is still running using the following command
      • 18.1.17. When the job has completed, use tail to view the timing from the log file.
      • 18.1.18. Check whether the scheduler thinks there are cpus or vcpus
    • 18.2. CMAQv5.4+ Benchmark on HB176_v4 compute nodes and shared
      • 18.2.1. Use Cycle Cloud pre-installed with CMAQv5.4+ software and 12US1 Benchmark data.
      • 18.2.2. Log into the new cluster
      • 18.2.3. Download the input data from the AWS Open Data CMAS Data Warehouse.
      • 18.2.4. Verify Input Data
      • 18.2.5. Examine CMAQ Run Scripts
      • 18.2.6. Submit Job to Slurm Queue to run CMAQ on 2 nodes
      • 18.2.7. Submit a run script to run on the shared volume
    • 18.3. CMAQv5.4+ Benchmark on HBv3_120 compute nodes and lustre
      • 18.3.1. Use Cycle Cloud pre-installed with CMAQv5.4+ software and 12US1 Benchmark data.
      • 18.3.2. This method relies on obtaining the code and data from blob storage.
      • 18.3.3. Update Cycle Cloud
      • 18.3.4. Log into the new cluster
      • 18.3.5. Verify Software
      • 18.3.6. Download the input data from the AWS Open Data CMAS Data Warehouse.
      • 18.3.7. Verify Input Data
      • 18.3.8. Examine CMAQ Run Scripts
      • 18.3.9. To run on the Shared Volume a code modification is required.
      • 18.3.10. Build the code by running the makefile
      • 18.3.11. Submit Job to Slurm Queue to run CMAQ on Lustre
      • 18.3.12. Submit a run script to run on the shared volume
      • 18.3.13. Submit a minimum of 2 benchmark runs

16. Future Work

  • 16.1. List of ideas for future work
Copyright © 2022, CMAS Center
Last updated on 2024-10-04 17:55:49 +0000