~40% faster vector shl <4 x i32> on SSE 4.1. Larger improvements for smaller types...