Add a quick-and-dirty "loop aligner pass". x86 uses it to align its loops to 16-byte...
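
For reference, the arithmetic behind 16-byte alignment can be sketched as below. This is only an illustration, not code from the pass: `paddingToAlign` and `alignTo` are hypothetical helper names, and the sketch assumes the alignment is a power of two and that offsets are plain byte counts.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the pass): number of padding bytes needed
// to move `offset` up to the next multiple of `align`.
// Assumes `align` is a power of two, e.g. 16.
constexpr std::uint64_t paddingToAlign(std::uint64_t offset, std::uint64_t align) {
    // (offset & (align - 1)) is the distance past the previous boundary;
    // the outer mask turns a full `align` into 0 when already aligned.
    return (align - (offset & (align - 1))) & (align - 1);
}

// Hypothetical helper: the offset after padding has been inserted.
constexpr std::uint64_t alignTo(std::uint64_t offset, std::uint64_t align) {
    return offset + paddingToAlign(offset, align);
}
```

A loop header at offset 0x23 would need 13 bytes of padding to start at 0x30, while one already at 0x20 would need none.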