On my old machine the time per OR improves from about 7 ns (the Gcc runs) to about 5 ns (the first asm runs) and then to about 1.3 ns (the last asm runs).
99999999 elements
asm: 638 ms for or'ing 99999999 elements
asm: 521 ms for or'ing 99999999 elements
asm: 442 ms for or'ing 99999999 elements
asm: 518 ms for or'ing 99999999 elements
asm: 466 ms for or'ing 99999999 elements
4095
15791
8187
11977
12247
14803
9085
6078
6143
10239
99999999 elements
Gcc: 702 ms for or'ing 99999999 elements
Gcc: 677 ms for or'ing 99999999 elements
Gcc: 714 ms for or'ing 99999999 elements
Gcc: 706 ms for or'ing 99999999 elements
Gcc: 635 ms for or'ing 99999999 elements
99999999 elements
asm: 131 ms for or'ing 99999999 elements
asm: 149 ms for or'ing 99999999 elements
asm: 136 ms for or'ing 99999999 elements
asm: 127 ms for or'ing 99999999 elements
asm: 124 ms for or'ing 99999999 elements