Second attempt: fixes the SSE4.1 issues and implements feedback from duncan.