AMDGPU/SI: Add s_waitcnt at the end of non-void functions