Now these are interesting. JRTI has an NVIDIA A100 GPU with 80 GB of RAM. It performs best with 8 CPUs per GPU and NPOOL=8, so heavy oversubscription of the GPU seems to suit this architecture. Still, Forever-Diamond, with its gaming cards, runs circles around it. It seems that for Q-E, quantity matters more than quality. The more shocking result comes from the CPU-only test. Triumphant Coal is an older workstation with 32 CPUs, and it seems to outperform the GPU! This is very strange. Am I doing something wrong?
Note: As of 9/2022, NDIAG still does not seem to be implemented for the GPU version. Increasing NDIAG on Forever-Diamond does not change the timings in any meaningful way.