(QE 7.1 pw.x): real_space flag and GPU

Is it a good idea to turn on real_space when using GPU?

In my latest benchmarks, I noticed that the real-space implementation does not use the GPU resources as efficiently.

Now look at this code excerpt from PW/src/sum_band_gpu.f90:

  CALL start_clock_gpu( 'sum_band:calbec' )
  npw = ngk(ik)
  IF ( .NOT. real_space ) THEN
     CALL using_evc_d(0)
     CALL using_becp_d_auto(2)
     ! calbec computes becp = <vkb_i|psi_j>
!$acc data present(vkb(:,:))
!$acc host_data use_device(vkb)
     CALL calbec_gpu( npw, vkb, evc_d(:,ibnd_start:ibnd_end), becp_d )
!$acc end host_data
!$acc end data
  ELSE
     CALL using_evc(0)
     CALL using_becp_auto(2)
     if (gamma_only) then
        do ibnd = ibnd_start, ibnd_end, 2
           call invfft_orbital_gamma(evc,ibnd,ibnd_end)
           call calbec_rs_gamma(ibnd,ibnd_end,becp%r)
        enddo
        call mp_sum(becp%r,inter_bgrp_comm)
     else
        current_k = ik
        becp%k = (0.d0,0.d0)
        do ibnd = ibnd_start, ibnd_end
           call invfft_orbital_k(evc,ibnd,ibnd_end)
           call calbec_rs_k(ibnd,ibnd_end)
        enddo
        call mp_sum(becp%k,inter_bgrp_comm)
     endif
  ENDIF
  CALL stop_clock_gpu( 'sum_band:calbec' )

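It seems the real_space flag bypasses the OpenACC and GPU variants here: in this excerpt the real_space branch calls using_evc(0) and using_becp_auto(2) instead of their _d counterparts, and the invfft_orbital_* and calbec_rs_* calls are not wrapped in any OpenACC directives. I still need to profile to see whether this makes a large difference, but for now I am simply disabling real_space. tqr, on the other hand, seems to call GPU routines.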
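
For anyone who wants to do the same, here is a minimal sketch of the relevant part of the pw.x input. It assumes real_space is set in the &ELECTRONS namelist, as listed in the INPUT_PW documentation; the conv_thr line is only a placeholder and is not taken from my runs.

  &ELECTRONS
     conv_thr   = 1.0d-8     ! placeholder value, for illustration only
     real_space = .false.    ! the default; selects the calbec_gpu branch shown above
  /

Since .false. is the default, removing the real_space line from the input should have the same effect.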
