Exactly the same timings. However, I forgot something: a Naked function has no prologue/epilogue overhead...!
Code:
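Function paritychkA Naked ( wordvar As UShort ) As Byte
  ASM
    mov eax, [esp+4]   ' wordvar (first argument on the stack)
    xor al, ah         ' fold high byte into low byte; PF now reflects all 16 bits
    setnp al           ' al = 1 if the number of set bits is odd, else 0
    ret 4              ' return and pop the 4-byte argument (callee cleanup)
  End ASM
End Function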
And that one even beats the native popcnt instruction, if only by a hair!
Code:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
1765 milliseconds for paritychk FZ, sum= 9680484
59 milliseconds for paritychkFZA, sum= 9680484
60 milliseconds for paritychkFZA, sum= 9680484
59 milliseconds for paritychkFZA, sum= 9680484
197 milliseconds for paritychk BI, sum= 9680484
349 milliseconds for paritychk J0, sum= 9680484 (cmp ecx, 16)
350 milliseconds for paritychk J0, sum= 9680484 (dec ecx)
350 milliseconds for paritychk J0, sum= 9680484 (cmp ecx, 16)
354 milliseconds for paritychk J0, sum= 9680484 (dec ecx)
356 milliseconds for paritychk J0, sum= 9680484 (cmp ecx, 16)
351 milliseconds for paritychk J0, sum= 9680484 (dec ecx)
96 milliseconds for MB PopCount, sum= 9680484
61 milliseconds for paritychk J1, sum= 9680484 (native popcnt instruction)
However, PopCount and popcnt do more than just return parity: they count the set bits.
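To make the relationship concrete: parity is simply the lowest bit of the bit count, so a parity-only routine can skip most of the work. Here is a minimal plain-FreeBASIC sketch of both. The names popCount16 and parityFold are made up for illustration and are not from this thread, and neither routine is tuned for speed; they are only meant as a reference to cross-check the faster versions against.

```freebasic
' Hypothetical helpers for illustration only (names are assumptions).

' Full bit count: clear the lowest set bit until nothing is left.
Function popCount16( ByVal w As UShort ) As ULong
    Dim As ULong x = w
    Dim As ULong n = 0
    While x <> 0
        x And= x - 1      ' drops exactly one set bit per pass
        n += 1
    Wend
    Return n
End Function

' Parity only: XOR-fold the word onto itself and keep bit 0.
' Returns 1 for an odd number of set bits, like setnp in the asm version.
Function parityFold( ByVal w As UShort ) As ULong
    Dim As ULong x = w
    x Xor= x Shr 8
    x Xor= x Shr 4
    x Xor= x Shr 2
    x Xor= x Shr 1
    Return x And 1
End Function

' Exhaustive check over all 16-bit inputs: parity = bit count And 1.
Dim As ULong mismatches = 0
For i As ULong = 0 To 65535
    If parityFold( CUShort( i ) ) <> ( popCount16( CUShort( i ) ) And 1 ) Then
        mismatches += 1
    End If
Next
Print "mismatches:"; mismatches
```

The loop should report zero mismatches. The asm version gets the last three folding steps for free, because the CPU's parity flag already covers the low 8 bits of the xor result.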