marcov wrote:It's not just memory savings. Bitpacked arrays are also faster when taking copies (fewer bytes copied is faster, and so is the initial initialization).
The copies will be much faster. For the initial initialization, it depends on how you do it: doing it as a copy operation will be much faster,
The bit array will be copied using memcpy in the same way as the integer one; it will just have 1/32 of the bytes. Note that you practically always have at least one such whole-array operation (initialization, filling with zeroes, testing whether the set contains any elements).
but doing it through the bit access mechanism will be slower.
I'm not sure that is measurable. I already showed above that bit array accesses can also be one- and two-instruction sequences, though more expensive ones (bt* instead of mov).
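To make the copy/init point concrete, here is a minimal Free Pascal sketch (the type and variable names are my own, purely illustrative) contrasting block operations on a bitpacked array with going through the bit access mechanism:

{$mode objfpc}
program bitpackdemo;

type
  // 1000 flags in 125 bytes (1 bit each)
  TBitArray  = bitpacked array[0..999] of Boolean;
  // the unpacked equivalent: 1000 bytes
  TByteArray = array[0..999] of Boolean;

var
  A, B: TBitArray;
  i: Integer;
begin
  // init as a block operation: touches only SizeOf(A) = 125 bytes
  FillChar(A, SizeOf(A), 0);

  // init through the bit access mechanism: one read-modify-write
  // per element, so considerably slower
  for i := Low(A) to High(A) do
    A[i] := False;

  // the copy is a plain block move, 1/8 the bytes of TByteArray
  // (1/32 compared to an array of 32-bit integers)
  B := A;

  WriteLn('bitpacked: ', SizeOf(TBitArray), ' bytes, ',
          'plain: ', SizeOf(TByteArray), ' bytes');
end.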
It did occur to me that for random access, the more compact bit array might have a significant speed advantage.
There are cache-related aspects there too, yes. But the impacts are hard to predict and probably depend a lot on usage. That is also why naive benchmarking, like putting such an instruction in a 10-million-iteration loop, is pretty useless here: such benchmarks totally ignore the fact that include/exclude operations are not the only operations, and they allow the array to become fully cached after a few operations.
As a rule of thumb, a bigger data size is more expensive (e.g. 32-bit apps become about 5-10% slower just from recompiling them for 64-bit, if all other circumstances are kept the same(*); very data-intensive apps even more so).
Still, that is not exactly the same as this case, since this also involves different instruction selection (and those tradeoffs are different for every processor generation). I haven't benchmarked such instructions myself since 486 times, so while I lean towards the bit array solution being better overall, that is mostly based on the cheaper init, copy and pass-by-value, and on the assumption that the elementary operations are in the same category (cache effects in elementary operations vs. more expensive instructions average out in typical usage).
Now that I think of it, testing whether a set is empty is another special compound operation that happens often enough to be worth optimizing (e.g. with a repne scasb/scasl).
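In Pascal source that test is just a comparison against the empty set. A tiny sketch (hypothetical program, my own naming); the compiler or RTL is free to implement the comparison as a word-wise block scan like the one mentioned above, rather than 256 per-element tests:

program emptycheck;
var
  S: set of Byte;   // the largest native set: 256 elements, 32 bytes
begin
  S := [3, 7];
  Exclude(S, 3);
  Exclude(S, 7);
  // one logical operation over the whole 32-byte set; a block scan
  // beats testing each element separately
  if S = [] then
    WriteLn('S is empty');
end.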
I use (relatively small) bitsets in nearly all business code, and there I know they are much faster as bit operations (since they then fit in registers and are easily passed to other procedures).
Pascal bitsets have long been limited to 256 elements; bigger ones are based on a class wrapper around an array of <native word size> (bitpacked, but always a multiple of the word size). I still hope I'll get native sets > 256 one day (but probably only if I implement it myself).
The main reason for having bigger native sets is simply that when you use an enum as e.g. a state, and use sets of that enum as masks, you have to rewrite the code as soon as the enum hits 256 elements. That is an annoying limitation.
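For illustration, a sketch of that enum-as-state pattern (the enum and all names are hypothetical). Everything below works only while the enum stays at or under 256 values, and it also shows why small sets are pleasant to pass to other procedures:

program enummask;
type
  TFlag  = (fDirty, fVisible, fSelected, fLocked);
  TFlags = set of TFlag;   // small enough to fit in a register

const
  cSaveMask: TFlags = [fDirty, fSelected];

// a small set is passed by value; no wrapper class needed
function NeedsSave(AState: TFlags): Boolean;
begin
  Result := (AState * cSaveMask) <> [];  // set intersection as mask test
end;

var
  State: TFlags;
begin
  State := [fVisible];
  Include(State, fDirty);
  if NeedsSave(State) then
    WriteLn('needs saving');
  // once TFlag grows past 256 values, "set of TFlag" no longer
  // compiles, and code like this has to be rewritten around a
  // class or array wrapper
end.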
(*) One has to be very careful interpreting benchmarks here. The SSE2 baseline guaranteed by AMD64 heavily accelerates some procedures that are often rate-determining. While you can often simply turn on SSE2 for the compiled code, you can't make the handcoded routines in the runtime change.