NVIDIA HPC SDK is free, and it has gotten really, really good recently. NVIDIA is phasing out nvprof and switching to Nsight (nsys). You cannot use nvprof for GPUs with compute capability >= 8 anymore. nsys is bundled freely with NVHPC; only the GUI needs to be downloaded separately here (user account required). I shall be using "Nsight Systems".
Preliminaries:
- Download and install NVHPC and Nsight on your PC. Hopefully the HPC center does this for you.
- Compile your program using NVHPC. I shall be using Quantum Espresso. Although there are no special instructions in the Nsight documentation, the older NVIDIA Profiler documentation did make a recommendation. I don't know if this helps, but here is the info anyway:
> 1.5. Profiling CUDA Fortran Applications: CUDA Fortran applications compiled with the PGI CUDA Fortran compiler can be profiled by nvprof and the Visual Profiler. In cases where the profiler needs source file and line information (kernel profile analysis, global memory access pattern analysis, divergent execution analysis, etc.), use the "-Mcuda=lineinfo" option when compiling. This option is supported on Linux 64-bit targets in PGI 2019 version 19.1 or later.
Caveat: `-Mcuda=lineinfo` is not compatible with `-cuda`, and the compiler refuses to proceed. I disabled `-cuda` in make.inc.
In the Quantum Espresso installation, two lines in make.inc need to be edited:
F90FLAGS = -fast -Mcuda=lineinfo -Mcache_align -Mpreprocess -Mlarge_arrays -mp $(FDFLAGS) $(CUDA_F90FLAGS) $(IFLAGS) $(MODFLAGS) ### add -Mcuda=lineinfo
CUDA_F90FLAGS=-Mcuda=lineinfo -gpu=ccall,cuda11.2 $(MOD_FLAG)$(TOPDIR)/external/devxlib/src $(MOD_FLAG)$(TOPDIR)/external/devxlib/include -acc $(MOD_FLAG)$(TOPDIR)/external/devxlib/src ### remove -cuda and add -Mcuda=lineinfo
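After editing, a quick grep can confirm that the bare `-cuda` flag is gone and `-Mcuda=lineinfo` is present. A minimal sketch (the `check_makeinc` helper and its messages are my own, not part of the QE build system):

```shell
# Sketch: verify that a make.inc uses -Mcuda=lineinfo and no bare -cuda flag.
check_makeinc() {
  f=$1
  if grep -q -- '-Mcuda=lineinfo' "$f" \
     && ! grep -qE -- '(^|[[:space:]])-cuda([[:space:]]|$)' "$f"; then
    echo "make.inc looks consistent"
  else
    echo "make.inc still has a bare -cuda or lacks -Mcuda=lineinfo"
  fi
}
```

The second grep matches `-cuda` only as a standalone word, so it does not trip on `-Mcuda=lineinfo` itself.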
HPC Center
- How to run an MPI program using the nsys CLI is explained here. The command-line argument scheme is a bit nasty, but I finally managed it after a couple of trials. The generated profiling files are huge, so be prepared!
- Basically you add `nsys profile` after `mpirun` and before the task. For example:
mpirun -np 4 nsys profile -o '%h-%p' -w true -t 'cuda,cublas,openacc,openmp,mpi,nvtx' --cudabacktrace all /share/apps/JRTI/q-e/nvhpc/git-7.1-profiling/bin/pw.x -npool 4 -ndiag 2 -ntg 1 -inp /home/obm/TPP-crystal/ground_state/H2TPP-kanoetal-pbesol/in.H2TPP-kanoetal-pbesol > /home/obm/TPP-crystal/ground_state/H2TPP-kanoetal-pbesol/out.H2TPP-kanoetal-pbesol-130922-1450_988
- `profile`: profile the program.
- `-o`: output file name; here it is the host name and process ID (`%h-%p`). Since the profiling data will be copied from the tmp directories of the hosts to your home directory automatically, some mechanism making the names unique is necessary.
- `-w true`: do not block stdout/stderr. You need this if you want to get the output of Espresso in the usual way.
- `-t`: the things you want to trace. The available options can change depending on your NVHPC installation version.
- `--cudabacktrace all`: collect backtraces for CUDA calls, to help identify the originating routines.
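Since the command line is long and easy to mistype, it can help to assemble it in a small wrapper script. A minimal sketch, where the binary path, input, and output names are placeholders (not the paths from my run):

```shell
#!/bin/sh
# Sketch: assemble the mpirun + nsys profile command from variables.
# PWX, INP, and OUT are example placeholders; adjust for your system.
NP=4
TRACE='cuda,cublas,openacc,openmp,mpi,nvtx'
PWX=/path/to/pw.x
INP=in.example
OUT=out.example
CMD="mpirun -np $NP nsys profile -o '%h-%p' -w true -t '$TRACE' --cudabacktrace all $PWX -inp $INP > $OUT"
echo "$CMD"    # inspect the assembled command first
# eval "$CMD"  # uncomment to actually launch the run
```

Printing the command before launching makes it easy to check the nsys options separately from the pw.x arguments.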
- Once the run is complete, `nsys` will generate huge profiling files, e.g. `JRTI.cluster-27795.qdrep`, one for each MPI process. Copy these back to your workstation (hopefully one with a lot of RAM and a fast disk). Alternatively, you can extract specific information from these files and put it in various formats suitable for further analysis using `nsys stats`. See here.
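As a sketch of the `nsys stats` route, a loop like the following could summarize every report in a directory. The `--format csv` and `--output` flags are from recent nsys versions; check `nsys stats --help` on your installation, and note the `summarize_reports` helper is my own:

```shell
# Sketch: run `nsys stats` on every .qdrep report in a directory.
# Assumes a recent nsys; prints a message instead when nsys is absent.
summarize_reports() {
  dir=${1:-.}
  for f in "$dir"/*.qdrep; do
    [ -e "$f" ] || { echo "no .qdrep files in $dir"; return 0; }
    if command -v nsys >/dev/null 2>&1; then
      nsys stats --format csv --output "${f%.qdrep}" "$f" \
        || echo "nsys stats failed on $f"
    else
      echo "nsys not found; skipping $f"
    fi
    echo "processed $f"
  done
}
```

Usage: `summarize_reports /path/to/reports` writes one set of CSV summaries per report, which is much lighter to move around than the raw .qdrep files.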
Workstation
- Invoke the UI with `nsys-ui qdrep-file`. `nsys-ui` comes with the Nsight Systems installation.