Intermediate Tutorial: Run CMAQ from HBv120 Compute Node
These instructions build and install CMAQ on an HBv120 compute node created from the AlmaLinux 8.5 HPC - Gen2 image, which provides modules for git, openmpi, and gcc. The compute node does not have a SLURM scheduler, so jobs are run interactively from the command line.
Instructions to install the input data, the CMAQ libraries, and the model are provided, along with sample run scripts to run CMAQ on 16, 36, 90, and 120 processors on a single HBv120 instance.
This tutorial gives users experience using the Azure Portal to create a Virtual Machine: selecting AlmaLinux 8.5 HPC - Gen2 as the image, selecting HB120rs_v2 (120 vcpus, 456 GiB memory) as the VM size, using an SSH private key to log in, and then installing and running CMAQ.
With this method, the user must be careful to start and stop the Virtual Machine so that it runs only during the initial installation and while running CMAQ. The full HBv120 instance incurs charges as long as it is on, even if no job is running on it.
This differs from Azure CycleCloud, where the HBv120 compute nodes are shut down, and incur no costs, whenever CMAQ is not running in the queue.
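To make the cost concern concrete, the following is a rough sketch of the arithmetic, not an official price calculation. It assumes the monthly price listed for this VM size in the portal step of this tutorial (2628 USD), an assumed 730 billable hours per month, and the 120-processor benchmark wall time reported later in this tutorial (4663.43 seconds). The helper names `hourly_rate` and `run_cost` are hypothetical, and actual Azure prices vary by region and over time.

```shell
#!/bin/sh
# Assumptions: 2628 USD/month is the price quoted in this tutorial;
# 730 hours/month is an assumed billing convention, not a quoted figure.

# hourly_rate MONTHLY_USD HOURS_PER_MONTH: print the approximate hourly price.
hourly_rate() {
    awk -v m="$1" -v h="$2" 'BEGIN { printf "%.2f\n", m / h }'
}

# run_cost HOURLY_USD WALL_SECONDS: print the approximate cost of one run.
run_cost() {
    awk -v r="$1" -v s="$2" 'BEGIN { printf "%.2f\n", r * s / 3600 }'
}

rate=$(hourly_rate 2628 730)
echo "Approximate hourly rate: $rate USD"
echo "Approximate cost of a 4663.43 s run: $(run_cost "$rate" 4663.43) USD"
```

Under these assumptions an idle hour costs about as much as a large fraction of a full benchmark run, which is why stopping the VM between sessions matters.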
3.1. Create a HB120rs_v2 Virtual Machine
Log in to the Azure Portal
Select Create a Virtual Machine
Click on See all images next to Image and use the search bar to search for HPC. Look for AlmaLinux 8.5 HPC, select either the Gen 1 or the Gen 2 entry, and click it. That option should now pre-populate the form.
Select Size - Standard_HB120rs_v2 - 120 vcpus, 456 GiB memory ($2,628.0/monthly)
Enter a Virtual Machine Name in the text box
Use your username or azureuser
Select Authentication type - SSH public key
Select SSH public key source - Generate new key pair
Click on Next > Disks
Click on Create and attach a new disk - select a 1TB disk
Select Checkbox to Delete disk with VM
(Note: this creates the disk, but you will still need to log in and mount the disk as the shared volume following the instructions below.)
Click on Next > Management
Select check box for Identity > System assigned managed identity
Click on Next > Advanced
No changes are needed on this tab
Click on Next > Tags
No changes are needed on this tab
Click on Next > Review and create
Click on download private key and provision resource
Click on Go to Resource once the deployment is completed.
3.2. Log in to the Virtual Machine
Change the permissions on the downloaded private key using the command
chmod 400 HPC-CMAQ-AlmaLinux-HB120_key.pem
Log in to the Virtual Machine using ssh with its IP address and the private key.
ssh -Y -i ./xxxxxxx_key.pem username@xx.xx.xx.xx
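ssh refuses a private key file that other users can read, which is why the chmod step comes first. The sketch below shows one way to check the permissions before connecting. The helper name `key_is_private` is hypothetical, and the check assumes GNU `stat` (as found on AlmaLinux and most Linux desktops). The demo uses a temporary file so it does not touch a real key.

```shell
#!/bin/sh
# key_is_private FILE: succeed when FILE grants nothing to group or other
# (for example mode 400 or 600). Assumes GNU stat, which supports -c '%a'.
key_is_private() {
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        ?00) return 0 ;;
        *)   return 1 ;;
    esac
}

# Demo on a throwaway file rather than a real key.
demo_key=$(mktemp)
chmod 644 "$demo_key"
key_is_private "$demo_key" || echo "too open: ssh would reject this key"
chmod 400 "$demo_key"
key_is_private "$demo_key" && echo "ok: key is readable by owner only"
rm -f "$demo_key"
```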
3.4. Alternatively, you can create an NVMe striped disk that has faster performance.
mkdir -p /mnt/nvme
mdadm --create /dev/md10 --level 0 --raid-devices 2 /dev/nvme0n1 /dev/nvme1n1
mkfs.xfs /dev/md10
mount /dev/md10 /mnt/nvme
chmod 1777 /mnt/nvme
That should create a file system of about 1.8 TiB.
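Because `mdadm --create` and `mkfs.xfs` overwrite whatever is on the target devices, it may be worth confirming that both device names exist before running them. The following is a minimal pre-flight sketch. The helper name `missing_block_devices` is hypothetical, and the device names are simply the ones used in the commands above, which may differ on other VM sizes.

```shell
#!/bin/sh
# missing_block_devices DEV...: print each argument that is not a block device.
missing_block_devices() {
    for dev in "$@"; do
        [ -b "$dev" ] || echo "$dev"
    done
}

missing=$(missing_block_devices /dev/nvme0n1 /dev/nvme1n1)
if [ -n "$missing" ]; then
    echo "Not found, do not run mdadm yet:"
    echo "$missing"
else
    echo "Both NVMe devices are present"
fi
```

If a device is reported missing, `lsblk` lists the block devices that are actually present on the VM.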
3.5. Obtain the cyclecloud-cmaq code from GitHub
Load the git module
module load module-git
If you do not see git available as a module, you may need to install it as follows:
sudo yum install git
3.5.1. Load the openmpi module
module load mpi/openmpi-4.1.1
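After loading the modules, a quick check that the expected tools are on the PATH can save a failed build later. This is a sketch only: the helper name `missing_tools` is hypothetical, and the tool list (git, gcc, gfortran, mpirun) is an assumption about what the build scripts need rather than something the scripts document.

```shell
#!/bin/sh
# missing_tools NAME...: print each command name that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool list; adjust to match what the build scripts actually call.
missing=$(missing_tools git gcc gfortran mpirun)
if [ -n "$missing" ]; then
    echo "Not on PATH (load the module or install the package):"
    echo "$missing"
else
    echo "All required tools found"
fi
```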
3.5.2. Install Cycle Cloud Repo
git clone -b main https://github.com/CMASCenter/cyclecloud-cmaq.git
3.5.3. Install and build netCDF-C, netCDF-Fortran, I/O API, and CMAQ
cd /shared/cyclecloud-cmaq
Install netcdf-C and netcdf-Fortran
./gcc_install.csh
If successful, you will see the following output; the last lines show which versions of the netCDF libraries were installed.
+-------------------------------------------------------------+
| Congratulations! You have successfully installed the netCDF |
| Fortran libraries. |
| |
| You can use script "nf-config" to find out the relevant |
| compiler options to build your application. Enter |
| |
| nf-config --help |
| |
| for additional information. |
| |
| CAUTION: |
| |
| If you have not already run "make check", then we strongly |
| recommend you do so. It does not take very long. |
| |
| Before using netCDF to store important data, test your |
| build with "make check". |
| |
| NetCDF is tested nightly on many platforms at Unidata |
| but your platform is probably different in some ways. |
| |
| If any tests fail, please see the netCDF web site: |
| https://www.unidata.ucar.edu/software/netcdf/ |
| |
| NetCDF is developed and maintained at the Unidata Program |
| Center. Unidata provides a broad array of data and software |
| tools for use in geoscience education and research. |
| https://www.unidata.ucar.edu |
+-------------------------------------------------------------+
make[3]: Leaving directory '/shared/build/netcdf-fortran-4.5.4'
make[2]: Leaving directory '/shared/build/netcdf-fortran-4.5.4'
make[1]: Leaving directory '/shared/build/netcdf-fortran-4.5.4'
netCDF 4.8.1
netCDF-Fortran 4.5.4
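The same version information can be queried again later with the `nc-config` and `nf-config` tools that the netCDF builds install. The sketch below assumes both tools ended up in /shared/build/netcdf/bin, the prefix used by the custom module file later in this tutorial; that location is an assumption, and the helper name `version_number` is hypothetical.

```shell
#!/bin/sh
# version_number "netCDF-Fortran 4.5.4" prints 4.5.4 (the last field).
version_number() {
    echo "$1" | awk '{ print $NF }'
}

# Assumed install prefix; change it if the build script used another one.
prefix=/shared/build/netcdf/bin
for cfg in nc-config nf-config; do
    tool=$prefix/$cfg
    if [ -x "$tool" ]; then
        echo "$cfg reports version $(version_number "$("$tool" --version)")"
    else
        echo "$cfg not found at $tool"
    fi
done
```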
Install I/O API
./gcc_ioapi.csh
Find what operating system is on the system:
cat /etc/os-release
Output
NAME="AlmaLinux"
VERSION="8.5 (Arctic Sphynx)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="AlmaLinux 8.5 (Arctic Sphynx)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"
ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
ALMALINUX_MANTISBT_PROJECT_VERSION="8.5"
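When only the distribution and version matter, the file can be read as shell variable assignments instead of being inspected by eye. A minimal sketch, with a hypothetical helper name `os_id_version`; it runs in a subshell so the variables do not leak into the calling script.

```shell
#!/bin/sh
# os_id_version FILE: print "ID VERSION_ID" from an os-release style file.
# The file is sourced in a subshell so its variables stay local.
os_id_version() {
    ( . "$1" && echo "$ID $VERSION_ID" )
}

if [ -r /etc/os-release ]; then
    os_id_version /etc/os-release
else
    echo "/etc/os-release not found"
fi
```

On the VM described here this would be expected to print the `ID` and `VERSION_ID` values shown in the output above.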
3.6. Change shell to use tcsh
sudo usermod -s /bin/tcsh azureuser
Log out and then log back in to have the shell take effect.
Copy a file to set paths
cd /shared/cyclecloud-cmaq
cp dot.cshrc.vm ~/.cshrc
3.7. Create Environment Module for Libraries
There are two steps required to create your own custom module:
write a module file
add a line to your ~/.cshrc to update the MODULEPATH
The goal is a custom module that loads, together with its dependencies, using the following command:
module load ioapi-3.2_20200828/gcc-9.2.1-netcdf
Step 1: Create the module file.
First, create a path to store the module file. The path must contain /Modules/modulefiles/ and should have the general form
/
mkdir -p /shared/build/Modules/modulefiles/ioapi-3.2_20200828
Next, create the module file and save it in the directory above.
cd /shared/build/Modules/modulefiles/ioapi-3.2_20200828
vim gcc-9.2.1-netcdf
Contents of gcc-9.2.1-netcdf:
#%Module
proc ModulesHelp { } {
puts stderr "This module adds ioapi-3.2_20200828/gcc-9.2.1 to your path"
}
module-whatis "This module adds ioapi-3.2_20200828/gcc-9.2.1 to your path\n"
set basedir "/shared/build/ioapi-3.2_branch_20200828/"
prepend-path PATH "${basedir}/Linux2_x86_64gfort"
prepend-path LD_LIBRARY_PATH "${basedir}/ioapi/fixed_src"
module load mpi/openmpi-4.1.1
module load gcc-9.2.1
module load netcdf-4.8.1/gcc-9.2.1
The example module file above updates two environment variables, PATH and LD_LIBRARY_PATH, and loads two system modules and a custom module (that we also need to define).
Now create the custom module to define the netCDF libraries that were used to build I/O API.
mkdir -p /shared/build/Modules/modulefiles/netcdf-4.8.1
cd /shared/build/Modules/modulefiles/netcdf-4.8.1
vim gcc-9.2.1
Contents of gcc-9.2.1
#%Module
proc ModulesHelp { } {
puts stderr "This module adds netcdf-4.8.1/gcc-9.2.1 to your path"
}
module-whatis "This module adds netcdf-4.8.1/gcc-9.2.1 to your path\n"
set basedir "/shared/build/netcdf"
prepend-path PATH "${basedir}/bin"
prepend-path LD_LIBRARY_PATH "${basedir}/lib"
module load mpi/openmpi-4.1.1
module load gcc-9.2.1
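As an alternative to typing the file in vim, the module file can be written from a script with a here-document, which makes the step repeatable. This is a sketch: the function name `write_netcdf_module` is hypothetical, the file contents are copied from the netCDF module file shown above, and the demo writes into a temporary directory rather than /shared/build/Modules/modulefiles. It is written for sh, so save it to a file and run it with `sh` if your login shell is tcsh.

```shell
#!/bin/sh
# write_netcdf_module DIR: create DIR/netcdf-4.8.1/gcc-9.2.1.
# The quoted 'EOF' keeps ${basedir} literal for the module system to expand.
write_netcdf_module() {
    mkdir -p "$1/netcdf-4.8.1" || return 1
    cat > "$1/netcdf-4.8.1/gcc-9.2.1" <<'EOF'
#%Module
proc ModulesHelp { } {
    puts stderr "This module adds netcdf-4.8.1/gcc-9.2.1 to your path"
}
module-whatis "This module adds netcdf-4.8.1/gcc-9.2.1 to your path\n"
set basedir "/shared/build/netcdf"
prepend-path PATH "${basedir}/bin"
prepend-path LD_LIBRARY_PATH "${basedir}/lib"
module load mpi/openmpi-4.1.1
module load gcc-9.2.1
EOF
}

# Demo in a scratch directory; pass the real modulefiles path on the VM.
demo_dir=$(mktemp -d)
write_netcdf_module "$demo_dir"
head -n 1 "$demo_dir/netcdf-4.8.1/gcc-9.2.1"
rm -rf "$demo_dir"
```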
Step 2: Add the module path to MODULEPATH.
Now that the two custom module files have been created, add the following line to your ~/.cshrc file so that they can be found:
module use --append /shared/build/Modules/modulefiles
Step 3: View the modules available after creation of the new module
The module avail command shows the paths to the module files on a given cluster.
module avail
Step 4: Load the new module
module load ioapi-3.2_20200828/gcc-9.2.1-netcdf
Output:
Loading ioapi-3.2_20200828/gcc-9.2.1-netcdf
Loading requirement: gcc-9.2.1 mpi/openmpi-4.1.1 netcdf-4.8.1/gcc-9.2.1
Verify that the libraries required for netCDF and I/O API have been added to the $LD_LIBRARY_PATH environment variable
echo $LD_LIBRARY_PATH
Output:
/shared/build/netcdf/lib:/opt/openmpi-4.1.1/lib:/opt/rh/gcc-toolset-9/root/lib64:/shared/build/ioapi-3.2_branch_20200828//ioapi/fixed_src::
Verify that the I/O API bin directory and the netCDF bin directory that you specified in the custom modules have been added to the $PATH environment variable
echo $PATH
Output
/shared/build/netcdf/bin:/opt/openmpi-4.1.1/bin:/opt/rh/gcc-toolset-9/root/bin:/shared/build/ioapi-3.2_branch_20200828//Linux2_x86_64gfort:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/slurm/bin/:/usr/local/bin:/opt/slurm/bin/:/usr/local/bin
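Scanning a long colon-separated list by eye is error-prone, so the check can be scripted. The sketch below uses a hypothetical helper, `path_contains`, and tests it against the LD_LIBRARY_PATH value shown in the output above. The match is exact, so the doubled slash that the module file produces in the I/O API entry has to be included in the directory being searched for.

```shell
#!/bin/sh
# path_contains LIST DIR: succeed when DIR is an exact entry of the
# colon-separated LIST.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example value copied from the output above; on the VM use "$LD_LIBRARY_PATH".
example='/shared/build/netcdf/lib:/opt/openmpi-4.1.1/lib:/opt/rh/gcc-toolset-9/root/lib64:/shared/build/ioapi-3.2_branch_20200828//ioapi/fixed_src::'
for dir in /shared/build/netcdf/lib /shared/build/ioapi-3.2_branch_20200828//ioapi/fixed_src; do
    if path_contains "$example" "$dir"; then
        echo "found:   $dir"
    else
        echo "missing: $dir"
    fi
done
```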
3.8. Install and Build CMAQ
./gcc_cmaq.csh
Verify that the executable was successfully built.
ls /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/BLD_CCTM_v533_gcc/*.exe
Output
/shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts/BLD_CCTM_v533_gcc/CCTM_v533.exe
3.9. Copy the run scripts from the repo to the run directory
cd /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts
cp /shared/cyclecloud-cmaq/run_scripts/HB120v3/*pe.csh .
List the scripts available
ls -rlt *pe.csh*
Output
run_cctm_2016_12US2.90pe.csh
run_cctm_2016_12US2.36pe.csh
run_cctm_2016_12US2.16pe.csh
run_cctm_2016_12US2.120pe.csh
3.10. Download the input data from the S3 Bucket
3.10.1. Install the AWS command line interface
see Install AWS CLI
cd /shared/build
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
3.10.2. Install the input data using the s3 script
cd /shared/cyclecloud-cmaq/s3_scripts/
./s3_copy_nosign_conus_cmas_opendata_to_shared.csh
Note, this Virtual Machine does not have Slurm installed or configured.
3.11. Run CMAQ interactively using the following command:
cd /shared/build/openmpi_gcc/CMAQ_v533/CCTM/scripts
./run_cctm_2016_12US2.120pe.csh |& tee ./run_cctm_2016_12US2.120pe.log
When the run has completed, record the timing of the two-day benchmark.
tail -n 30 run_cctm_2016_12US2.120pe.log
Output:
==================================
***** CMAQ TIMING REPORT *****
==================================
Start Day: 2015-12-22
End Day: 2015-12-23
Number of Simulation Days: 2
Domain Name: 12US2
Number of Grid Cells: 3409560 (ROW x COL x LAY)
Number of Layers: 35
Number of Processes: 120
All times are in seconds.
Num Day Wall Time
01 2015-12-22 2458.35
02 2015-12-23 2205.08
Total Time = 4663.43
Avg. Time = 2331.71
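To record the timing without copying it by hand, the total can be pulled out of the log with awk. This is a sketch that assumes the report contains a `Total Time = <seconds>` line in the format shown above; the helper name `total_time` is hypothetical.

```shell
#!/bin/sh
# total_time LOGFILE: print the value from the "Total Time = <seconds>" line
# of a CMAQ timing report.
total_time() {
    awk '/Total Time =/ { print $NF }' "$1"
}

log=run_cctm_2016_12US2.120pe.log
if [ -r "$log" ]; then
    echo "Total wall time: $(total_time "$log") s"
else
    echo "$log not found in $(pwd)"
fi
```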
If a run is submitted immediately after another run completes successfully, the scaling results may be skewed. Ideally, wait 30 minutes before running a second job.
3.11.1. Run second job interactively using the following command:
./run_cctm_2016_12US2.90pe.csh |& tee ./run_cctm_2016_12US2.90pe.log
Output
==================================
***** CMAQ TIMING REPORT *****
==================================
Start Day: 2015-12-22
End Day: 2015-12-23
Number of Simulation Days: 2
Domain Name: 12US2
Number of Grid Cells: 3409560 (ROW x COL x LAY)
Number of Layers: 35
Number of Processes: 90
All times are in seconds.
Num Day Wall Time
01 2015-12-22 2786.21
02 2015-12-23 2417.74
Total Time = 5203.95
Avg. Time = 2601.97
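The two timing reports can be compared by computing the speedup and the parallel efficiency relative to the smaller run. The sketch below uses the total times reported above for 90 and 120 processes; the helper name `scaling` is hypothetical, and efficiency is defined here as the measured speedup divided by the ideal speedup (the ratio of process counts).

```shell
#!/bin/sh
# scaling BASE_PROCS BASE_SECONDS PROCS SECONDS
# speedup    = BASE_SECONDS / SECONDS
# efficiency = speedup / (PROCS / BASE_PROCS), printed as a percentage
scaling() {
    awk -v p0="$1" -v t0="$2" -v p1="$3" -v t1="$4" 'BEGIN {
        speedup = t0 / t1
        printf "speedup %.2f, parallel efficiency %.1f%%\n", speedup, 100 * speedup * p0 / p1
    }'
}

# Total times from the 90- and 120-process timing reports above.
scaling 90 5203.95 120 4663.43
# prints: speedup 1.12, parallel efficiency 83.7%
```

With these two runs, moving from 90 to 120 processes shortens the benchmark by roughly 10 percent, well short of the ideal 1.33x speedup.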