combine consecutive 16-byte subvector loads into one 32-byte load
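
As a rough illustration of the equivalence this relies on, here is a portable C sketch. It models the 32-byte vector as a plain byte array; the names `vec256`, `load_as_two_halves`, `load_as_one`, and `loads_agree` are hypothetical and are not taken from the change itself. It only shows that two 16-byte loads from adjacent addresses produce the same bytes as one 32-byte load, not how the combine is actually implemented.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte vector, modeled as a plain byte array. */
typedef struct { uint8_t bytes[32]; } vec256;

/* Before: two 16-byte loads from adjacent addresses fill the
   low and high halves of the wide vector. */
static vec256 load_as_two_halves(const uint8_t *p) {
    vec256 v;
    memcpy(v.bytes, p, 16);            /* low half:  p[0..15]  */
    memcpy(v.bytes + 16, p + 16, 16);  /* high half: p[16..31] */
    return v;
}

/* After: a single 32-byte load covers the same address range. */
static vec256 load_as_one(const uint8_t *p) {
    vec256 v;
    memcpy(v.bytes, p, 32);
    return v;
}

/* Returns 1 when both forms produce identical bytes. */
static int loads_agree(const uint8_t *p) {
    vec256 a = load_as_two_halves(p);
    vec256 b = load_as_one(p);
    return memcmp(a.bytes, b.bytes, 32) == 0;
}
```

The two forms agree only because the second 16-byte load starts exactly 16 bytes after the first; loads that are not adjacent in memory cannot be merged this way.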