Dodicat Zlib

General FreeBASIC programming questions.
Locked
angros47
Posts: 2324
Joined: Jun 21, 2005 19:04

Re: Dodicat Zlib

Post by angros47 »

Albert, the reason s+= chr( int( rnd *128 ) ) can be compressed is that it is not completely random: the range of int( rnd *128 ) is 0-127 (so, 7 bits), while a byte is 0-255 (8 bits): one bit (the first one) is never used, it is always set to 0, so it can be discarded with no data loss. The Zlib library is smart enough to notice it, so it manages to exploit that to achieve light compression.


Also, if your messages are directed to a particular person, please don't use @. Use emails. The forum is not supposed to be used for private messages, if you want to talk with dodicat do it in private and don't clog the forum.
paul doe
Moderator
Posts: 1733
Joined: Jul 25, 2017 17:22
Location: Argentina

Re: Dodicat Zlib

Post by paul doe »

albert wrote:...
So I'm no longer going to post compression code... I don't want to get banned again..
...
Technically, you never did. All you posted was code that destroyed the original data in more or less half-witted ways (so it could not be 'decompressed'). Stop wasting your time and just use this code instead:

Code: Select all

function compress( s as string ) as string
  return( "" )
end function

var s = "This is a test"

s = compress( s )

'' It compresses a string by 100%! I'm gonna get filthy rich!
? len( s )
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Dodicat Zlib

Post by dodicat »

C'mon angros47,paul doe, srvaldez, stop brow beating the guy.
I hate bullies when I am unable to have a face to face discussion with them.
Folk are just picking on a particular member, reminds me of the school playground.
Plenty of people have been use @ in the forum for years.
Please pick on somebody else if you must.
Albert is discussing zlib.

Why I am so sensitive to this behaviour:
I was born in Scotland with a German mother just after the war.
I was preyed upon all during my primary school years.
If there was a game of cowboys and Indians I was always the Indian, taking on a whole bunch of cowboys.
If there was British versus Germans I of course was let loose as the German with a pile of British soldiers on my case.
I got tough very quickly, I actually enjoyed the predator prey games, I could drop large sticks on their heads from great heights (trees for example).

There is no need to prey on Albert, and the forum moderator should intervene here.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Dodicat Zlib

Post by marcov »

albert wrote:@Dodicat

I fumbled upon a quirk with Zlib...

If you make the random string s+= chr( int( rnd *128 ) )
It compresses on it's own , with no formula code..
Because random data still might contain entropy, iow in a certain sliding window not all tokens (characters) might exist, allowing Huffman encoding.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Dodicat Zlib

Post by dodicat »

For random data:
With the range 0 to 127 I get

original length 10000000
compressed length 8775519
return length 10000000
OK

With the full binary range 0 to 255 I get
original length 10000000
cpmpressed length 10003070
return length 10000000
OK

I have used My own random generator, tried and tested with the practrand test.

Code: Select all



#include "zlib1.bi"
namespace rr
dim as ulongint a,b,c,d,e
end namespace

function rndU() byref as  ulongint
   rr.e = rr.a - ((rr.b shl 7) or (rr.b shr (57)))
   rr.a = rr.b xor ((rr.c shl 13) or (rr.c shr (51)))
   rr.b = rr.c + ((rr.d shl 37) or (rr.d shr (27)))
   rr.c = rr.d + rr.e
   rr.d = rr.e + rr.a
   return rr.d
end function


 function range(f as longint,l as longint) as longint
    return (rndU() mod ((l-f)+1)) + f
end function


sub init(n as long)
    rr.a=n:rr.b=n:rr.c=n:rr.d=n
    for m as long=n to n+2
    rndU()+=m
    next
end sub

init(timer)

Function getpassedinfo(text As String,Byref passed_length As Integer) As String
    Dim As String var1,var2
    Dim As Integer pst
    #macro splice(stri,char,var1,var2)
    pst=Instr(stri,char)
    var1="":var2=""
    If pst<>0 Then
        var1=Mid(stri,1,pst-1)
        var2=Mid(stri,pst+1)
    Else
        var1=stri
    End If
    #endmacro
    splice(text,"|",var1,var2)
    text=var2
    passed_length=Valint(var1)
    Return text
End Function


'=================   UNPACK ===============
Function unpack(file As String) As String
    Dim As Integer passed_length
    Dim As String text=getpassedinfo(file,passed_length)
    Dim As Integer stringlength,destinationlength
    stringlength=Len(text)
    destinationlength =passed_length
    Dim As Ubyte Ptr source
    Dim As Ubyte Ptr  destination =Callocate(destinationlength,1)
    source=@text[0]
    Var mistake=uncompress(destination,@destinationlength, source, stringlength)
    If mistake<>0 Then Print "There was an error":Sleep:End
    Dim As String uncompressed
    uncompressed=String(destinationlength,0)
    For i As Integer = 0 To destinationlength- 1
        uncompressed[i]=(destination[i])
    Next
    Deallocate destination
    Return uncompressed
End Function

'===================  PACK ============
Function pack(file As String) As String
    Dim As String text=file
    Dim As Integer stringlength,destinationlength
    stringlength=Len(text)
    destinationlength = compressBound(stringlength)
    Dim As Ubyte Ptr source
    Dim As Ubyte Ptr destination =Callocate(destinationlength,1)
    source=@text[0]
    Var mistake=compress2(destination, @destinationlength, source, stringlength,Z_BEST_COMPRESSION)''<----  use compress2
    If mistake <>0 Then Print "There was an error"
    Dim As String compressed
    compressed=String(destinationlength,0)
    For n As Integer=0 To destinationlength-1
        compressed[n]=destination[n]
    Next n
    compressed=stringlength &"|"+compressed
    Deallocate destination
    Return compressed
End Function




dim as string s=string(10000000,0)
for n as long=0 to len(s)-1
    s[n]=range(0,255)
next


print "original length   ";len(s)
var p=pack(s)
print "cpmpressed length ";len(p)
var up=unpack(p)
print "return length     ";len(up)
print iif(up=s,"OK","ERROR")

sleep



 
With digits only (48 to 57) I get more than .5 compression.
So having more freedom outside the random character range gets more compression.
albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

@Dodicat

I thought that if 7 bit chars compress.. Then i could utilize it in a formula..

So , i stepped by 8 bits and turned 7 bits into a chr , and added 1 bit into a map..
When the map got to 7 bits , i added the map chr to the stream and zeroed the map , to start it over again..

It failed to compress... But like i said before somewhere.. "If it doesn't compress and can be decompressed , then it's encryption".


But one of the derogatory posts above , gave me a new idea , to try..

You bring in four bytes of bits and make them into a grid. and search columns for like bits..
If all of column n , is all 1's or all 0's then you can cancel that column or multiple columns. to arrive at compression..
I'm stupid , so i don't know how to go about writing the formula.. it might take me some time to figure it out..

Instead of Huffman it would be CCE ( column cancel encoding )
albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

@Dodicat

With my above CCE ( Column Cancel Encoding ) , compression idea..

You bring in two bytes for the initial column check.
If you get 1 or more columns with like bits , then you bring in another byte and add it to the grid , else you just output 1 byte..
If the number of like columns increases then you add another byte to the grid else you just use the first two bytes..

You might be able to cancel rows and columns both??

After playing around with the idea some , i determined that a 4 x 3 grid works best ( 12 bits )..

Here's a little program , that shows 12 bit grids..
If you look at the grids , practically every grid has at least 1 column or row , that equals all 1's or all 0's

Code: Select all


'Column Cancel Encoding ( CCE ) compression algorithm..
'
'You create a grid of bits and then check columns and rows for all bits being equal..
'If all of a row or column equals all 1's or all 0's then you can cancel those columns and rows.

screen 19

dim as string n1 , n2 , n3 , in 

do
        
        randomize
        
        in = ""
        for a as longint = 1 to 12
            in+= bin( int( rnd * 2 ) )
        next
        
        n1 = mid( in , 1 , 4 )
        n2 = mid( in , 5 , 4 )
        n3 = mid( in , 9 , 4 )
        
        print
        print n1
        print n2
        print n3
        sleep
        if inkey = chr( 27 ) then end
        
loop

sleep
end

albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

@Dodicat

With the 12 bit Grids...

How would you make the mapping of equated rows and columns??

1110
1010
0110

Column 3 ( 1's ) equates..
Column 4 ( 0's ) equates..

Would you map it to a ; 21 ?= ( 101) and a 30 ?= ( 110 ) ?? = 6 bits , so it's 1 for 1 ??
I got to figure a way to do the mapping...


One grid that came up:
0000
0011
0000

You have 2 rows and 2 columns equating..

=====================================================
If you use a 16 bit grid, 4 x 4

1010
0100
1010
0000

Then:
Column 4 ( all 0's ) equates = ( 30 ) = 110 , it would compress a bit..
Row 4 ( all 0's ) equates = ( 30 ) = 110 , it would compress another bit..

With a 12 bit grid , about 1 in 10 doesn't have an equate..
With a 16 bit grid , about 30% to 50% , don't have an equate..

If you use an 8 x 8 grid..
It would take 4 bits to specify a row or column , and 1 bit to specify 1's or 0's = 5 bits..
So an equate , would cause 3 bits compression per equate.
But the number of grids with no equate , might rise to 6 in 10...
albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

I wrote a grid analyzing program...

You set the size var , 4x4 grid = 4 , 8x8 grid = 8 , 6x6 grid = 6 , etc..


It lets you know how many bytes compression you can expect from a particular grid size...

It goes to 99,999 bytes of grids , and then pauses.. size = 4 , gives the best values..

Code: Select all


'Column Cancel Encoding ( CCE )
'
'You create a grid of bits and then check columns and rows for all bits being equal..
'If all of a row or column equals all 1's or all 0's then you can cancel those columns and rows.

screen 19

dim as longint size = 4
dim as longint gc , ec

do
        
        randomize
        
        dim as string n1( 0 to ( size - 1 ) )
        dim as string s1( 0 to ( size - 1 ) )
        
        for a as longint = 0 to ( size - 1 )
            for b as longint = 1 to size
                n1( a ) += bin( int( rnd * 2 ) )
            next
        next
        
        dim as longint place = 1
        dim as longint ele1 = 0
        dim as longint ele2
        do
            ele2 = 0
            do
                s1( ele1 ) += mid( n1( ele2 ) , place , 1 )
                ele2+= 1
            loop until ele2 = size
                place+= 1
                ele1+=1
        loop until ele1 = size
        
        gc+=1
        
        print
        for a as longint = 0 to ( size - 1 )
            if n1( a ) = string( size , "1" ) or n1( a ) = string( size , "0" ) then ec+=1 : print "row " ; a
        next
        
        for a as longint = 0 to ( size - 1 )
            if s1( a ) = string( size , "1" ) or s1( a ) = string( size , "0" ) then ec+=1 : print "col " ; a
        next
        
        print
        print "( rows )"
        for a as longint = 0 to ( size - 1 )
            print n1( a )
        next
        print
        print "( cols )"
        for a as longint = 0 to ( size - 1 )
            print s1( a )
        next
        
        dim as longint outbytes = ( ( size - ( len( bin( size - 1 ) ) + 1 ) ) * ec )
        
        print
        print "Total grids = " ; gc ,  " = ";  ( gc * ( size * size ) ) \ 8  ; " Bytes." 
        print "Total equates = " ; ec ,  " = ";  outbytes \ 8 ; " Bytes." 
        
        if ( gc * ( size * size ) ) \ 8 > 99999 then sleep
        
        if inkey = chr( 27 ) then exit do
        
loop

sleep
sleep
end

albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

@Dodicat

I've been having fun , playing around with the idea of ; Row-Column Cancel Encoding.. ( RCCE )

I'm still trying to figure out , how to map the equates..

For a 3 x 3 grid

n2 = "0"
if row( 0 ) = "000" then n2+= "0"
if row( 1 ) = "000" then n2+= "00"
if row( 2 ) = "000" then n2+= "01"

if row( 0 ) = "111" then n2+= "1"
if row( 1 ) = "111" then n2+= "10"
if row( 2 ) = "111" then n2+= "11"

n3 = "1"
if col( 0 ) = "000" then n3+= "0"
if col( 1 ) = "000" then n3+= "00"
if col( 2 ) = "000" then n3+= "01"

if col( 0 ) = "111" then n3+= "1"
if col( 1 ) = "111" then n3+= "10"
if col( 2 ) = "111" then n3+= "11"

if column matches are greater than row matches , then output n3 (col) else output n2 (row)

If their are no col or row matches , then you add "000" to the grid . So you know there's no output.

The problem is ; you don't know where n2 or n3 ends... there might be 2 or even 3 matches...
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Dodicat Zlib

Post by dodicat »

Hi Albert.
I had to get one of my little dogs put to sleep today, she was very ill with kidney disease.
That's only five years since my little hound died.
I had taken her on when her owner died three years ago, she was quite an old dog then, about 11.5 years old.
This meant that she stayed on in her own village rather than go to Ireland with her owner's son who lives over there.

So I am not really in the mood for abstract thinking.

You can work away independently with your RCCE I am sure.
You got the fastest forum bigint multiplier using a grid method, so maybe you are on to a winner, I hope so anyway.
But beware of upsetting the natives again, they just love going on the warpath.
albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

@Dodicat

I wrote a grid analyzing program...

You set the grid size , then it runs through all the binary values of that grid size..
It checks all the times that the row grid = the column grid.

For a size 2 grid , if row(0) = col(0) and row(1) = col(1) then record the value and increment the count...

For a 2 x 2 grid 4 bits , there are 8 times that the rows = cols ( 0 to 7 )
For a 3 x 3 grid 9 bits , there are 64 times that the rows = cols ( 0 to 63 )
For a 4 x 4 grid 16 bits , there are 1024 times that the rows = cols ( 0 to 1023 )

Code: Select all


'This program compares row to columns..
'If all the rows equal all the columns , ( row grid = col grid ) then it records the value of the grid , and adds 1 to the count ( eq )..

screen 19

'set grid size
dim as ulongint size = 3

dim as ulongint top = ( 2 ^ ( size * size ) )  - 1

dim as string match = ""
dim as longint eq = - 1

dim as ulongint grid_bits = 0 , eq_bits = 0

dim as string ink

for z as ulongint = 0 to top step 1
        
        dim as string s1 = right( string( size * size , "0" ) + bin( z ) , size * size )

        dim as string row( 0 to ( size - 1 ) )
        dim as string col( 0 to ( size - 1 ) )
        
        'build rows
        dim as ulongint ele = 0
        for a as ulongint = 1 to len( s1 ) step size
                row( ele ) = mid( s1 , a , size )
                ele+= 1
        next
        
        'build columns
        dim as ulongint place = 1
        dim as ulongint ele1 = 0
        dim as ulongint ele2
        do
            ele2 = 0
            do
                col( ele1 ) += mid( row( ele2 ) , place , 1 )
                ele2+= 1
            loop until ele2 = size
                place+= 1
                ele1+=1
        loop until ele1 = size
        
        grid_bits+= ( size * size )
        
        cls
        print
        print "( rows )"
        dim as string rows = ""
        for a as ulongint = 0 to ( size - 1 )
            print row( a )
            rows+= row( a )
        next
        print
        print "( cols )"
        dim as string cols = ""
        for a as ulongint = 0 to ( size - 1 )
            print col( a )
            cols+= col( a )
        next
        
        if rows = cols then match+= str( z ) + " " : eq+=1
        
        print
        print "Matches =  " ; eq
        
        ink = inkey
        
        if ink = chr( 27 ) then exit for

next

print
print match
print
print "Times Row grid = Column grid " ; eq

eq_bits = ( ( size * size ) - len( bin( eq ) ) ) * ( eq + 1 )

print
print "Grid bytes     = " ; ( grid_bits / 8 )
print "Compress bytes =  " ; ( eq_bits / 8 )

sleep
end

albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

@Dodicat

For the "Row , Column Cancel Encoding" ( RCCE ) , I finally got the the mapping figured out...

It's compressing 1,000,000 bytes input , by 23% , on the first run through the compressor..

So i might have an error , or an omission somewhere...
I've gone over the code many times now , and i can't see any errors..

=============================================================================

@CoderJeff

Can i post the compression formula for Dodicat to check out?? ( I don't want to get banned again.. )
I'd like him to go over it , to see if the code actually works , or , if i have an error somewhere..

=============================================================================
Richard
Posts: 3096
Joined: Jan 15, 2007 20:44
Location: Australia

Re: Dodicat Zlib

Post by Richard »

@Albert. Why even ask? You know the answer will be NO.
The FB forum is not the place for your ongoing attempts at data compression.

Put simply: Do not post your attempts at compression on this website. There is no point posting code that does not actually work, or that has not been tested. No other member wants to invest valuable time in working out exactly why your code does not work. You must learn to debug your own code.

You must write and debug both the compression and the expansion code yourself. Then you should use it on itself, to compress and expand the source code you wrote. If it runs correctly after expansion, then you are half way there.
Then compile the FB source to a file.exe, compress and expand the file.exe, and test if it still runs.

Then test your code really well on any and every file you can find, with any extension. If the compressed files are less than 66% of their original size, and if it never fails to expand correctly, then you might risk posting it on this forum. I expect someone will then find a file where it fails to work, because you couldn't be bothered testing it properly.

If you post code that you claim works, and someone else can find a file that breaks it, then you deserve a ban for wasting other members time.
albert
Posts: 6000
Joined: Sep 28, 2006 2:41
Location: California, USA

Re: Dodicat Zlib

Post by albert »

@Richard

Got it Richard!! I'll work alone in it ...

I wont post any more here , till i got the decompression done...
Just sometimes ; you need a proof reader , to check the code , or to respond with ideas , for speeding it up..
Locked