bitarray

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
AGS
Posts: 1284
Joined: Sep 25, 2007 0:26
Location: the Netherlands

bitarray

Post by AGS »

A bitarray can be used for storing an array of boolean values. It uses 1 bit to store one boolean value. You can find implementations of bitarrays in many a compiler compiler package (and in the code of quite a few compilers as well).

This particular implementation of a bitarray was taken from coco/r ( http://ssw.jku.at/Coco/ ). coco/r is a compiler compiler that, given the description of a language, creates a top down parser and a scanner to recognize the described language.

Copyright notice has been copied verbose from the original source code (original source code can be found at the coco/r website).

Code: Select all

/'-------------------------------------------------------------------------
Compiler Generator Coco/R,
Copyright (c) 1990, 2004 Hanspeter Moessenboeck, University of Linz
extended by M. Loeberbauer & A. Woess, Univ. of Linz
ported to C++ by Csaba Balazs, University of Szeged
with improvements by Pat Terry, Rhodes University

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

As an exception, it is allowed to write an extension of Coco/R that is
used as a plugin in non-free software.

If not otherwise stated, any source code generated by Coco/R (other than
Coco/R itself) does not fall under the GNU General Public License.
-------------------------------------------------------------------------'/
#ifndef COCO_BITARRAY_H__
#define COCO_BITARRAY_H__

#ifdef NIL
#undef NIL
#endif

#define NIL 0

type  bitarray_

public:
	dim count as integer
	dim data as ubyte ptr

end type

declare function bitarray_create(byval length as integer, byval defaultvalue as byte) as bitarray_ ptr
declare function bitarray_create_copy(byval copy as bitarray_ ptr) as bitarray_ ptr
declare sub bitarray_destroy(byref bitarray as bitarray_ ptr)
declare function bitarray_getcount(byval bitarray as bitarray_ ptr) as integer
declare function bitarray_get(byval bitarray as bitarray_ ptr, byval index as integer) as byte
declare sub bitarray_set(byval bitarray as bitarray_ ptr,byval index as integer, byval value as byte)
declare sub bitarray_setall(byval bitarray as bitarray_ ptr, byval value as byte)
declare function bitarray_equal(byval bitarray as bitarray_ ptr, byval right_ as bitarray_ ptr) as byte
declare function bitarray_itemat(byval bitarray as bitarray_ ptr, byval index as integer) as byte
declare function bitarray_assign(byval bitarray as bitarray_ ptr, byval right_ as bitarray_ ptr) as bitarray_ ptr

declare sub bitarray_not(byval bitarray as bitarray_ ptr)
declare sub bitarray_and(byval bitarray as bitarray_ ptr, byval value as bitarray_ ptr)
declare sub bitarray_or(byval bitarray as bitarray_ ptr, byval value as bitarray_ ptr)
declare sub bitarray_xor(byval bitarray as bitarray_ ptr, byval value as bitarray_ ptr)

declare function bitarray_clone(byval bitarray as bitarray_ ptr) as bitarray_ ptr

#endif '' ifndef COCO_BITARRAY_H__

#include "crt/string.bi"
#include "crt/stdio.bi"

function bitarray_create(byval length as integer, byval defaultvalue as byte) as bitarray_ ptr

  dim bitarray as bitarray_ ptr
  
  if (length < 0) then
    return NIL
  end if
  
  bitarray = callocate(sizeof(bitarray_),1)
  bitarray->count = length
  bitarray->data = callocate(sizeof(ubyte),(length+7) shr 3)
  if (defaultvalue) then
    memset(bitarray->data, &hFF, (length+7) shr 3)
  else
    memset(bitarray->data, 0, (length+7) shr 3)
  end if
  return bitarray
  
end function

'bitarray_create_copy will return NIL when copy equals NIL
function bitarray_create_copy(byval copy as bitarray_ ptr) as bitarray_ ptr

	if (copy = NIL) then
		return NIL
	end if
  
  dim bitarray as bitarray_ ptr
  bitarray = callocate(sizeof(bitarray_),1)
  bitarray->count  = copy->count
  bitarray->data = callocate(sizeof(ubyte),(copy->count+7) shr 3)
  memcpy(bitarray->data, copy->data, (copy->count+7) shr 3)
  return bitarray
  
end function

sub bitarray_destroy(byref bitarray as bitarray_ ptr)

	if (bitarray = NIL) then
		return
	end if

  if (bitarray->data = NIL) then
    deallocate(bitarray)
    bitarray = NIL
    return
  end if
  
  deallocate(bitarray->data)
  bitarray->data = NIL
  deallocate(bitarray)
  bitarray = NIL

end sub

'returns -1 when bitarray equals NIL
function bitarray_getcount(byval bitarray as bitarray_ ptr) as integer

  if (bitarray = NIL) then
    return -1
  end if

  return bitarray->count

end function

'returns -1 if bitarray is NIL
function bitarray_get(byval bitarray as bitarray_ ptr, byval index as integer) as byte

	if (bitarray = NIL) then
		return -1
	end if

  if ((bitarray->data[(index shr 3)] and (1 shl (index and 7))) <> 0) then
    return 1
  else
    return 0
  end if

end function

sub bitarray_set(byval bitarray as bitarray_ ptr,byval index as integer, byval value as byte)

	if (bitarray = NIL) then
		return
	end if

  if (value) then
    bitarray->data[(index shr 3)] or= (1 shl (index and 7))
  else
    dim mask as ubyte = &hFF
    mask xor= (1 shl (index and 7))
    bitarray->data[(index shr 3)] and= mask
  end if

end sub

sub bitarray_setall(byval bitarray as bitarray_ ptr, byval value as byte)

	if (bitarray = NIL) then
		return
	end if

  if (value) then
    memset(bitarray->data, &hFF, (bitarray->count+7) shr 3)
  else
    memset(bitarray->data, 0, (bitarray->count+7) shr 3)
  end if

end sub


sub bitarray_not(byval bitarray as bitarray_ ptr)

	if (bitarray = NIL) then
		return
	end if

  for i as integer = 0 to ((bitarray->count+7) shr 3)
    bitarray->data[i] xor= &hff
  next i

end sub

sub bitarray_and(byval bitarray as bitarray_ ptr, byval value as bitarray_ ptr)

	if (bitarray = NIL) then
		return
	end if

  var i = 0l
  while ( i < ((bitarray->count+7) shr 3) andalso i < ((value->count+7) shr 3)) 
    bitarray->data[i] = (bitarray->data[i] and value->data[i])
    i += 1
  wend

end sub

sub bitarray_or(byval bitarray as bitarray_ ptr, byval value as bitarray_ ptr)

	if (bitarray = NIL) then
		return
	end if

  var i = 0l
  while ( i < ((bitarray->count+7) shr 3) andalso i < ((value->count+7) shr 3)) 
    bitarray->data[i] = (bitarray->data[i] or value->data[i])
    i += 1
  wend

end sub

sub bitarray_xor(byval bitarray as bitarray_ ptr, byval value as bitarray_ ptr)

	if (bitarray = NIL) then
		return
	end if

  var i = 0l
  while ( i < ((bitarray->count+7) shr 3) andalso i < ((value->count+7) shr 3)) 
    bitarray->data[i] = (bitarray->data[i] or value->data[i])
    i += 1
  wend

end sub

'returns NIL if bitarray equals NIL. Allocates a new bitarray and copies
'content of bitarray to the new bitarray otherwise.
function bitarray_clone(byval bitarray as bitarray_ ptr) as bitarray_ ptr

	if (bitarray = NIL) then
		return NIL
	end if

  dim newbitarray as bitarray_ ptr = bitarray_create(bitarray->count,0)
  newbitarray->count = bitarray->count
  memcpy(newbitarray->data, bitarray->data, (bitarray->count+7) shr 3)
  return newbitarray

end function

function bitarray_equal(byval bitarray as bitarray_ ptr, byval right_ as bitarray_ ptr) as byte

	if (bitarray = NIL) then
		if (right_ = NIL) then
      return 1
    else
      return 0
    end if
  end if
	
  if (right_ = NIL) then
    if (bitarray = NIL) then
      return 1
    else
      return 0
    end if
  end if
  
 
  if (bitarray->count <> right_->count) then
    return 0
  end if

  for i as integer = 0 to bitarray->count
    if ((bitarray->data[(i shr 3)] and  (1 shl (i and 7))) <> (right_->data[(i shr 3)] and (1 shl (i and 7)))) then
      return 0
    end if
  next i
  return 1

end function

'bitarray should be created before calling bitarray_assign.
'bitarray_assign fails when either bitarray or right_ equals NIL
function bitarray_assign(byval bitarray as bitarray_ ptr, byval right_ as bitarray_ ptr) as bitarray_ ptr

	if (bitarray = NIL) then
    return NIL
	end if
  
  if (right_ = NIL) then
    return NIL
  end if
  if ( right_ <> bitarray ) then         '' avoid self assignment
    if (bitarray->data <> NIL) then
      deallocate(bitarray->data)              '' prevents memory leak
    end if
    bitarray->count = right_->count
    bitarray->data = callocate(sizeof(ubyte), (bitarray->count+7) shr 3)
    memcpy(bitarray->data, right_->data, (bitarray->count+7) shr 3)
  end if
  return bitarray
  
end function

function bitarray_itemat(byval bitarray as bitarray_ ptr, byval index as integer) as byte

	if (bitarray = NIL) then
		return -1
	end if
  
  return bitarray_get(bitarray,index)

end function

sub bitarray_dump(byval bitarray as bitarray_ ptr) 

  if (bitarray = NIL) then
    return
  end if

  var i = bitarray_getcount(bitarray)
  for j as integer = 0 to i - 1
    if (j mod 8 = 0 andalso j <> 0) then
      print
    end if
    print bitarray_itemat(bitarray,j);" ";
  next j

end sub

dim bitarray as bitarray_ ptr
dim right_ as bitarray_ ptr
dim copy_ as bitarray_ ptr

bitarray = bitarray_create(32,0)
assertwarn(bitarray <> NIL)
var count = bitarray_getcount(bitarray)
dim l as byte
for i as integer = 0 to count - 1
  l = bitarray_get(bitarray,i)
  assertwarn(l = 0)
  l = bitarray_itemat(bitarray,i)
  assertwarn(l = 0)
next i

right_ = bitarray_create(32,0)
assertwarn(right_ <> NIL)
bitarray_set(right_,0,1) 
l = bitarray_get(right_,0)
assertwarn(l = 1)

bitarray_setall(right_,1)
bitarray_setall(bitarray,0)

count = bitarray_getcount(right_)
assertwarn(count = 32)
for i as integer = 0 to count - 1
  l = bitarray_get(right_,i)
  assertwarn(l = 1)
next i
for i as integer = 0 to count - 1
  l = bitarray_get(bitarray,i)
  assertwarn(l = 0)
next i

bitarray_and(right_,bitarray)
for i as integer = 0 to bitarray_getcount(right_) - 1
  l = bitarray_get(right_,i)
  assertwarn(l = 0)
next i

bitarray_not(right_)
for i as integer = 0 to bitarray_getcount(right_) - 1
  l = bitarray_get(right_,i)
  assertwarn(l = 1)
next i

bitarray_or(bitarray,right_)
for i as integer = 0 to bitarray_getcount(bitarray) - 1
  l = bitarray_get(bitarray,i)
  assertwarn(l = 1)
next i

copy_ = bitarray_create_copy(bitarray)
for i as integer = 0 to bitarray_getcount(copy_) - 1
  l = bitarray_get(copy_,i)
  assertwarn(l = 1)
next i

bitarray_xor(copy_,copy_)
for i as integer = 0 to bitarray_getcount(copy_) - 1
  l = bitarray_get(copy_,i)
  assertwarn(l = 1)
next i

bitarray_setall(right_,0)
l = bitarray_equal(right_,copy_)
assertwarn(l=0)

bitarray_destroy(copy_)
assertwarn(copy_=NIL)

copy_ = bitarray_clone(right_)
assertwarn(bitarray_getcount(copy_) = bitarray_getcount(right_))


bitarray_destroy(bitarray)
assertwarn(bitarray=NIL)
bitarray_destroy(right_)
assertwarn(right_=NIL)
bitarray_destroy(copy_)
assertwarn(copy_=NIL)

bitarray = bitarray_create(32,0)
right_ = bitarray_create(32,0)
bitarray_setall(right_,1)

copy_ = bitarray_assign(bitarray, right_)
l = bitarray_getcount(copy_)
assertwarn(l = bitarray_getcount(bitarray))
bitarray_destroy(bitarray)
assertwarn(bitarray=NIL)
bitarray_destroy(right_)
assertwarn(right_=NIL)
bitarray_destroy(copy_)
assertwarn(copy_=NIL)
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°
Contact:

Post by TJF »

Thanks for the code. Though it's not real OO style. A lot of global SUBs and FUNCTIONs are used.

You may find a smarter solution here

http://www.freebasic.net/forum/viewtopi ... 219#157219
1000101
Posts: 2556
Joined: Jun 13, 2005 23:14
Location: SK, Canada

Post by 1000101 »

tbh, I don't like nor would I use either of your solutions. Too much code bloat, to slow code and it's not flexible code.

Other macro versions have been supplied and imo are better since macros don't care about source/destination so long as you pass is valid parameters to go with; also because they tend to generate smaller faster code for the specific situation instead of a one-size-doesn't-fit-any solution.
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°
Contact:

Post by TJF »

1000101 wrote:tbh, I don't like nor would I use either of your solutions. Too much code bloat, to slow code and it's not flexible code.

Other macro versions have been supplied and imo are better since macros don't care about source/destination so long as you pass is valid parameters to go with; also because they tend to generate smaller faster code for the specific situation instead of a one-size-doesn't-fit-any solution.
Macros blow up the binaries, if you use them alot.

They need global variables. That's why they're unsave in big projects.

I haven't seen a macro solution using an index and being prepared to handle more than 64 entries.

I haven't seen a macro solution allowing to redim the bit array.


Mine is a clean OOP solution without these restrictions.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Post by marcov »

Coco/R is quite Wirthian oriented. The C code is probably just a stub to keep code ported from Modula2, Pascal that uses the built-in "set of <ordinal>" feature working.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Post by marcov »

TJF wrote: Macros blow up the binaries, if you use them alot.
This is not true per se. Specially in this case, since most elementary operations on sets are very, very short. (with 32-bit sets even a single instruction)

Only badly constructed macros blow up binary size. But if you can't handle it, don't use it. (and IMHO if you can't handle it, be reluctant to use macros, very reluctant)
They need global variables. That's why they're unsave in big projects.

I haven't seen a macro solution using an index and being prepared to handle more than 64 entries.

I haven't seen a macro solution allowing to redim the bit array.

Mine is a clean OOP solution without these restrictions.
The dilemma is always the same, use a bounded solution and avoid dynamic allocation, and hardcode the size limits, or have an heavy type that does a memory allocation for two bits, and can't be passed by value without a memcpy routine use, so your implementation will still call memcpy to copy 2 bytes and do all the variable size checks.

Most enums are short, only a few are longer than 32/64-bit.

Generics could be a solution, but afaik generics doesn't allow to change algorithms on size. (#ifdef based on the size of the argument passed during specialization).

Built in bitarrays have the advantage that the compiler can use different internal types for larger and smaller sets. Many languages with bitset/bitarray types have e.g. 8,16,32 bit types and only go to a dynamically size type beyond the highest normal type.

Sometimes the biggest size set is sparse even (only lists/hashes elements in the array, instead of reserving a bit for each possible element)

Pascal also has a 256-bit (32 byte) type. Simple operations (include/exclude) can still be done inline in a few instructions on such types. That's a bit historic though, since in Pascal a lot of SET OF CHAR was used. With Unicode this gets more difficult.

Built in sets (and specially those of enum) is something I always missed in C.
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°
Contact:

Post by TJF »

marcov wrote:
TJF wrote: Macros blow up the binaries, if you use them alot.
This is not true per se. Specially in this case, since most elementary operations on sets are very, very short. (with 32-bit sets even a single instruction)
How will you operate an indexed bit array with more than 1000 entries by a single instruction in a macro?

Try to understand the words before you start blaming them. I wait for the day when you serve an advanced solution instead of criticism. And I wait for a marcov post on this FreeBasic forum without the word pascal in it.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Post by marcov »

TJF wrote:
marcov wrote:
TJF wrote: Macros blow up the binaries, if you use them alot.
This is not true per se. Specially in this case, since most elementary operations on sets are very, very short. (with 32-bit sets even a single instruction)
How will you operate an indexed bit array with more than 1000 entries by a single instruction in a macro?
First: Don't put me words in the mouth. Second: It just has to be cheaper than call overhead ( at least one instruction per parameter preparation + call and in heavier loops register saving) to save space.

And that is just common sense about inlining. Doesn't matter(*) if it is function inlining or by macro. You can read about that in any compiler text, or find out if you just objdump some code.

Of course all this is "on average", since you can't take decisions depending on circumstances (e.g. only inline if the bit to set/clear is constant, or use simplified routines if the set is small). But demonstrating that was exactly the idea of the Pascal comparisons you hate so much.

That is not Pascal or not, but simply a demonstration that when you pull types into the compiler (making them basetypes) they get more powerfull (but the compiler more complicated). It's a choice. I'm perfectly happy if sb says that is a bridge to far for the set problem.

That it doesn't matter performancewise that much, but then the way of implementing it (OOP or not) doesn't matter either.

The fact that I say something slightly on the positive side of a macro discussion should already hint to you that this is not nonsense, since normally I'm against macros.
Try to understand the words before you start blaming them. I wait for the day when you serve an advanced solution instead of criticism.
Just show me wrong and argue on points, and not on sentiments. I'll be happy to address any points you have.
And I wait for a marcov post on this FreeBasic forum without the word pascal in it.
Spare me the demagogy. Find faults with what I say, not with labelling.

(*) it doesn't have to matter, but tests might give slightly different results because different parts of the compiler might be stressed, and not all might be optimal
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°
Contact:

Post by TJF »

@marcov:

I red your post -- no comment.
vdecampo
Posts: 2992
Joined: Aug 07, 2007 23:20
Location: Maryland, USA
Contact:

Post by vdecampo »

I personally just use Integers to store flags. Memory is plentiful and the CPU works with Integers faster anyway.

-Vince
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Post by MichaelW »

vdecampo wrote:I personally just use Integers to store flags. Memory is plentiful and the CPU works with Integers faster anyway.
Same here. Even if you ignore all the memory overhang built into the OS (or at least into Windows), unless you have a very large number of flags the memory savings from using single-bit flags is negligible.
Last edited by MichaelW on May 09, 2011 5:52, edited 1 time in total.
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°
Contact:

Post by TJF »

MichaelW wrote:... unless you have a very large number of flags ...
That's what I'm talking about.

Think of parallel processing of an finite elemente simulation or the drawing of a big 3D scene. You need to save the done-status of the 1000 or more elements anywhere.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Post by marcov »

Example:

Pascal example:

Code: Select all

var x : set of 0..255; // 32 byte set
      i : integer;
begin
  read(i);  // get a value from input. Just so htat the compiler can't predict it
  include(x,5);
  include(x,99);
  include(x,i);    
end;
resulting code;

Code: Select all

# [5] include(x,5);
        orl     $32,U_P$PROGRAM_X
# [6] include(x,99);
        orl     $8,U_P$PROGRAM_X+12
# [7] include(x,i);
        movzbl  U_P$PROGRAM_I,%eax
        btsl    %eax,U_P$PROGRAM_X

1 assembler instructions for constant case, 2 for variable case.

The latter surprises me. Apparantly btsl works on more than 32-bits memory operands, because the code works. I tried to search on internet and found:

If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

So this will work for very large sets (up to 512MB/set)
vdecampo
Posts: 2992
Joined: Aug 07, 2007 23:20
Location: Maryland, USA
Contact:

Post by vdecampo »

TJF wrote:
MichaelW wrote:... unless you have a very large number of flags ...
That's what I'm talking about.

Think of parallel processing of an finite elemente simulation or the drawing of a big 3D scene. You need to save the done-status of the 1000 or more elements anywhere.
This is still small. 1000 Integer flags is only 1k of memory. If your are rendering a scene I would say it is even more important to deal in Integers for speed. Even 100,000 flags I would still use Integer.

-Vince
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°
Contact:

Post by TJF »

vdecampo wrote:1000 Integer flags is only 1k of memory. If your are rendering a scene I would say it is even more important to deal in Integers for speed. Even 100,000 flags I would still use Integer.

-Vince
On my system 1000 INTEGERs is about 4k of memory.

In an 64-bit app 100,000 flags as INTEGER will be nearly 1MB, instead of the needed 12.5 kB.
Post Reply