Dodicat Zlib
Re: Dodicat Zlib
Albert, the reason s+= chr( int( rnd *128 ) ) can be compressed is that it is not completely random: the range of int( rnd *128 ) is 0-127 (so, 7 bits), while a byte is 0-255 (8 bits): one bit (the first one) is never used, it is always set to 0, so it can be discarded with no data loss. The Zlib library is smart enough to notice it, so it manages to exploit that to achieve light compression.
Also, if your messages are directed to a particular person, please don't use @. Use emails. The forum is not supposed to be used for private messages, if you want to talk with dodicat do it in private and don't clog the forum.
Also, if your messages are directed to a particular person, please don't use @. Use emails. The forum is not supposed to be used for private messages, if you want to talk with dodicat do it in private and don't clog the forum.
Re: Dodicat Zlib
Technically, you never did. All you posted was code that destroyed the original data in more or less half-witted ways (so it could not be 'decompressed'). Stop wasting your time and just use this code instead:albert wrote:...
So I'm no longer going to post compression code... I don't want to get banned again..
...
Code: Select all
function compress( s as string ) as string
return( "" )
end function
var s = "This is a test"
s = compress( s )
'' It compresses a string by 100%! I'm gonna get filthy rich!
? len( s )
Re: Dodicat Zlib
C'mon angros47,paul doe, srvaldez, stop brow beating the guy.
I hate bullies when I am unable to have a face to face discussion with them.
Folk are just picking on a particular member, reminds me of the school playground.
Plenty of people have been use @ in the forum for years.
Please pick on somebody else if you must.
Albert is discussing zlib.
Why I am so sensitive to this behaviour:
I was born in Scotland with a German mother just after the war.
I was preyed upon all during my primary school years.
If there was a game of cowboys and Indians I was always the Indian, taking on a whole bunch of cowboys.
If there was British versus Germans I of course was let loose as the German with a pile of British soldiers on my case.
I got tough very quickly, I actually enjoyed the predator prey games, I could drop large sticks on their heads from great heights (trees for example).
There is no need to prey on Albert, and the forum moderator should intervene here.
I hate bullies when I am unable to have a face to face discussion with them.
Folk are just picking on a particular member, reminds me of the school playground.
Plenty of people have been use @ in the forum for years.
Please pick on somebody else if you must.
Albert is discussing zlib.
Why I am so sensitive to this behaviour:
I was born in Scotland with a German mother just after the war.
I was preyed upon all during my primary school years.
If there was a game of cowboys and Indians I was always the Indian, taking on a whole bunch of cowboys.
If there was British versus Germans I of course was let loose as the German with a pile of British soldiers on my case.
I got tough very quickly, I actually enjoyed the predator prey games, I could drop large sticks on their heads from great heights (trees for example).
There is no need to prey on Albert, and the forum moderator should intervene here.
Re: Dodicat Zlib
Because random data still might contain entropy, iow in a certain sliding window not all tokens (characters) might exist, allowing Huffman encoding.albert wrote:@Dodicat
I fumbled upon a quirk with Zlib...
If you make the random string s+= chr( int( rnd *128 ) )
It compresses on it's own , with no formula code..
Re: Dodicat Zlib
For random data:
With the range 0 to 127 I get
original length 10000000
compressed length 8775519
return length 10000000
OK
With the full binary range 0 to 255 I get
original length 10000000
cpmpressed length 10003070
return length 10000000
OK
I have used My own random generator, tried and tested with the practrand test.
With digits only (48 to 57) I get more than .5 compression.
So having more freedom outside the random character range gets more compression.
With the range 0 to 127 I get
original length 10000000
compressed length 8775519
return length 10000000
OK
With the full binary range 0 to 255 I get
original length 10000000
cpmpressed length 10003070
return length 10000000
OK
I have used My own random generator, tried and tested with the practrand test.
Code: Select all
#include "zlib1.bi"
namespace rr
dim as ulongint a,b,c,d,e
end namespace
function rndU() byref as ulongint
rr.e = rr.a - ((rr.b shl 7) or (rr.b shr (57)))
rr.a = rr.b xor ((rr.c shl 13) or (rr.c shr (51)))
rr.b = rr.c + ((rr.d shl 37) or (rr.d shr (27)))
rr.c = rr.d + rr.e
rr.d = rr.e + rr.a
return rr.d
end function
function range(f as longint,l as longint) as longint
return (rndU() mod ((l-f)+1)) + f
end function
sub init(n as long)
rr.a=n:rr.b=n:rr.c=n:rr.d=n
for m as long=n to n+2
rndU()+=m
next
end sub
init(timer)
Function getpassedinfo(text As String,Byref passed_length As Integer) As String
Dim As String var1,var2
Dim As Integer pst
#macro splice(stri,char,var1,var2)
pst=Instr(stri,char)
var1="":var2=""
If pst<>0 Then
var1=Mid(stri,1,pst-1)
var2=Mid(stri,pst+1)
Else
var1=stri
End If
#endmacro
splice(text,"|",var1,var2)
text=var2
passed_length=Valint(var1)
Return text
End Function
'================= UNPACK ===============
Function unpack(file As String) As String
Dim As Integer passed_length
Dim As String text=getpassedinfo(file,passed_length)
Dim As Integer stringlength,destinationlength
stringlength=Len(text)
destinationlength =passed_length
Dim As Ubyte Ptr source
Dim As Ubyte Ptr destination =Callocate(destinationlength,1)
source=@text[0]
Var mistake=uncompress(destination,@destinationlength, source, stringlength)
If mistake<>0 Then Print "There was an error":Sleep:End
Dim As String uncompressed
uncompressed=String(destinationlength,0)
For i As Integer = 0 To destinationlength- 1
uncompressed[i]=(destination[i])
Next
Deallocate destination
Return uncompressed
End Function
'=================== PACK ============
Function pack(file As String) As String
Dim As String text=file
Dim As Integer stringlength,destinationlength
stringlength=Len(text)
destinationlength = compressBound(stringlength)
Dim As Ubyte Ptr source
Dim As Ubyte Ptr destination =Callocate(destinationlength,1)
source=@text[0]
Var mistake=compress2(destination, @destinationlength, source, stringlength,Z_BEST_COMPRESSION)''<---- use compress2
If mistake <>0 Then Print "There was an error"
Dim As String compressed
compressed=String(destinationlength,0)
For n As Integer=0 To destinationlength-1
compressed[n]=destination[n]
Next n
compressed=stringlength &"|"+compressed
Deallocate destination
Return compressed
End Function
dim as string s=string(10000000,0)
for n as long=0 to len(s)-1
s[n]=range(0,255)
next
print "original length ";len(s)
var p=pack(s)
print "cpmpressed length ";len(p)
var up=unpack(p)
print "return length ";len(up)
print iif(up=s,"OK","ERROR")
sleep
So having more freedom outside the random character range gets more compression.
Re: Dodicat Zlib
@Dodicat
I thought that if 7 bit chars compress.. Then i could utilize it in a formula..
So , i stepped by 8 bits and turned 7 bits into a chr , and added 1 bit into a map..
When the map got to 7 bits , i added the map chr to the stream and zeroed the map , to start it over again..
It failed to compress... But like i said before somewhere.. "If it doesn't compress and can be decompressed , then it's encryption".
But one of the derogatory posts above , gave me a new idea , to try..
You bring in four bytes of bits and make them into a grid. and search columns for like bits..
If all of column n , is all 1's or all 0's then you can cancel that column or multiple columns. to arrive at compression..
I'm stupid , so i don't know how to go about writing the formula.. it might take me some time to figure it out..
Instead of Huffman it would be CCE ( column cancel encoding )
I thought that if 7 bit chars compress.. Then i could utilize it in a formula..
So , i stepped by 8 bits and turned 7 bits into a chr , and added 1 bit into a map..
When the map got to 7 bits , i added the map chr to the stream and zeroed the map , to start it over again..
It failed to compress... But like i said before somewhere.. "If it doesn't compress and can be decompressed , then it's encryption".
But one of the derogatory posts above , gave me a new idea , to try..
You bring in four bytes of bits and make them into a grid. and search columns for like bits..
If all of column n , is all 1's or all 0's then you can cancel that column or multiple columns. to arrive at compression..
I'm stupid , so i don't know how to go about writing the formula.. it might take me some time to figure it out..
Instead of Huffman it would be CCE ( column cancel encoding )
Re: Dodicat Zlib
@Dodicat
With my above CCE ( Column Cancel Encoding ) , compression idea..
You bring in two bytes for the initial column check.
If you get 1 or more columns with like bits , then you bring in another byte and add it to the grid , else you just output 1 byte..
If the number of like columns increases then you add another byte to the grid else you just use the first two bytes..
You might be able to cancel rows and columns both??
After playing around with the idea some , i determined that a 4 x 3 grid works best ( 12 bits )..
Here's a little program , that shows 12 bit grids..
If you look at the grids , practically every grid has at least 1 column or row , that equals all 1's or all 0's
With my above CCE ( Column Cancel Encoding ) , compression idea..
You bring in two bytes for the initial column check.
If you get 1 or more columns with like bits , then you bring in another byte and add it to the grid , else you just output 1 byte..
If the number of like columns increases then you add another byte to the grid else you just use the first two bytes..
You might be able to cancel rows and columns both??
After playing around with the idea some , i determined that a 4 x 3 grid works best ( 12 bits )..
Here's a little program , that shows 12 bit grids..
If you look at the grids , practically every grid has at least 1 column or row , that equals all 1's or all 0's
Code: Select all
'Column Cancel Encoding ( CCE ) compression algorithm..
'
'You create a grid of bits and then check columns and rows for all bits being equal..
'If all of a row or column equals all 1's or all 0's then you can cancel those columns and rows.
screen 19
dim as string n1 , n2 , n3 , in
do
randomize
in = ""
for a as longint = 1 to 12
in+= bin( int( rnd * 2 ) )
next
n1 = mid( in , 1 , 4 )
n2 = mid( in , 5 , 4 )
n3 = mid( in , 9 , 4 )
print
print n1
print n2
print n3
sleep
if inkey = chr( 27 ) then end
loop
sleep
end
Re: Dodicat Zlib
@Dodicat
With the 12 bit Grids...
How would you make the mapping of equated rows and columns??
1110
1010
0110
Column 3 ( 1's ) equates..
Column 4 ( 0's ) equates..
Would you map it to a ; 21 ?= ( 101) and a 30 ?= ( 110 ) ?? = 6 bits , so it's 1 for 1 ??
I got to figure a way to do the mapping...
One grid that came up:
0000
0011
0000
You have 2 rows and 2 columns equating..
=====================================================
If you use a 16 bit grid, 4 x 4
1010
0100
1010
0000
Then:
Column 4 ( all 0's ) equates = ( 30 ) = 110 , it would compress a bit..
Row 4 ( all 0's ) equates = ( 30 ) = 110 , it would compress another bit..
With a 12 bit grid , about 1 in 10 doesn't have an equate..
With a 16 bit grid , about 30% to 50% , don't have an equate..
If you use an 8 x 8 grid..
It would take 4 bits to specify a row or column , and 1 bit to specify 1's or 0's = 5 bits..
So an equate , would cause 3 bits compression per equate.
But the number of grids with no equate , might rise to 6 in 10...
With the 12 bit Grids...
How would you make the mapping of equated rows and columns??
1110
1010
0110
Column 3 ( 1's ) equates..
Column 4 ( 0's ) equates..
Would you map it to a ; 21 ?= ( 101) and a 30 ?= ( 110 ) ?? = 6 bits , so it's 1 for 1 ??
I got to figure a way to do the mapping...
One grid that came up:
0000
0011
0000
You have 2 rows and 2 columns equating..
=====================================================
If you use a 16 bit grid, 4 x 4
1010
0100
1010
0000
Then:
Column 4 ( all 0's ) equates = ( 30 ) = 110 , it would compress a bit..
Row 4 ( all 0's ) equates = ( 30 ) = 110 , it would compress another bit..
With a 12 bit grid , about 1 in 10 doesn't have an equate..
With a 16 bit grid , about 30% to 50% , don't have an equate..
If you use an 8 x 8 grid..
It would take 4 bits to specify a row or column , and 1 bit to specify 1's or 0's = 5 bits..
So an equate , would cause 3 bits compression per equate.
But the number of grids with no equate , might rise to 6 in 10...
Re: Dodicat Zlib
I wrote a grid analyzing program...
You set the size var , 4x4 grid = 4 , 8x8 grid = 8 , 6x6 grid = 6 , etc..
It lets you know how many bytes compression you can expect from a particular grid size...
It goes to 99,999 bytes of grids , and then pauses.. size = 4 , gives the best values..
You set the size var , 4x4 grid = 4 , 8x8 grid = 8 , 6x6 grid = 6 , etc..
It lets you know how many bytes compression you can expect from a particular grid size...
It goes to 99,999 bytes of grids , and then pauses.. size = 4 , gives the best values..
Code: Select all
'Column Cancel Encoding ( CCE )
'
'You create a grid of bits and then check columns and rows for all bits being equal..
'If all of a row or column equals all 1's or all 0's then you can cancel those columns and rows.
screen 19
dim as longint size = 4
dim as longint gc , ec
do
randomize
dim as string n1( 0 to ( size - 1 ) )
dim as string s1( 0 to ( size - 1 ) )
for a as longint = 0 to ( size - 1 )
for b as longint = 1 to size
n1( a ) += bin( int( rnd * 2 ) )
next
next
dim as longint place = 1
dim as longint ele1 = 0
dim as longint ele2
do
ele2 = 0
do
s1( ele1 ) += mid( n1( ele2 ) , place , 1 )
ele2+= 1
loop until ele2 = size
place+= 1
ele1+=1
loop until ele1 = size
gc+=1
print
for a as longint = 0 to ( size - 1 )
if n1( a ) = string( size , "1" ) or n1( a ) = string( size , "0" ) then ec+=1 : print "row " ; a
next
for a as longint = 0 to ( size - 1 )
if s1( a ) = string( size , "1" ) or s1( a ) = string( size , "0" ) then ec+=1 : print "col " ; a
next
print
print "( rows )"
for a as longint = 0 to ( size - 1 )
print n1( a )
next
print
print "( cols )"
for a as longint = 0 to ( size - 1 )
print s1( a )
next
dim as longint outbytes = ( ( size - ( len( bin( size - 1 ) ) + 1 ) ) * ec )
print
print "Total grids = " ; gc , " = "; ( gc * ( size * size ) ) \ 8 ; " Bytes."
print "Total equates = " ; ec , " = "; outbytes \ 8 ; " Bytes."
if ( gc * ( size * size ) ) \ 8 > 99999 then sleep
if inkey = chr( 27 ) then exit do
loop
sleep
sleep
end
Re: Dodicat Zlib
@Dodicat
I've been having fun , playing around with the idea of ; Row-Column Cancel Encoding.. ( RCCE )
I'm still trying to figure out , how to map the equates..
For a 3 x 3 grid
n2 = "0"
if row( 0 ) = "000" then n2+= "0"
if row( 1 ) = "000" then n2+= "00"
if row( 2 ) = "000" then n2+= "01"
if row( 0 ) = "111" then n2+= "1"
if row( 1 ) = "111" then n2+= "10"
if row( 2 ) = "111" then n2+= "11"
n3 = "1"
if col( 0 ) = "000" then n3+= "0"
if col( 1 ) = "000" then n3+= "00"
if col( 2 ) = "000" then n3+= "01"
if col( 0 ) = "111" then n3+= "1"
if col( 1 ) = "111" then n3+= "10"
if col( 2 ) = "111" then n3+= "11"
if column matches are greater than row matches , then output n3 (col) else output n2 (row)
If their are no col or row matches , then you add "000" to the grid . So you know there's no output.
The problem is ; you don't know where n2 or n3 ends... there might be 2 or even 3 matches...
I've been having fun , playing around with the idea of ; Row-Column Cancel Encoding.. ( RCCE )
I'm still trying to figure out , how to map the equates..
For a 3 x 3 grid
n2 = "0"
if row( 0 ) = "000" then n2+= "0"
if row( 1 ) = "000" then n2+= "00"
if row( 2 ) = "000" then n2+= "01"
if row( 0 ) = "111" then n2+= "1"
if row( 1 ) = "111" then n2+= "10"
if row( 2 ) = "111" then n2+= "11"
n3 = "1"
if col( 0 ) = "000" then n3+= "0"
if col( 1 ) = "000" then n3+= "00"
if col( 2 ) = "000" then n3+= "01"
if col( 0 ) = "111" then n3+= "1"
if col( 1 ) = "111" then n3+= "10"
if col( 2 ) = "111" then n3+= "11"
if column matches are greater than row matches , then output n3 (col) else output n2 (row)
If their are no col or row matches , then you add "000" to the grid . So you know there's no output.
The problem is ; you don't know where n2 or n3 ends... there might be 2 or even 3 matches...
Re: Dodicat Zlib
Hi Albert.
I had to get one of my little dogs put to sleep today, she was very ill with kidney disease.
That's only five years since my little hound died.
I had taken her on when her owner died three years ago, she was quite an old dog then, about 11.5 years old.
This meant that she stayed on in her own village rather than go to Ireland with her owner's son who lives over there.
So I am not really in the mood for abstract thinking.
You can work away independently with your RCCE I am sure.
You got the fastest forum bigint multiplier using a grid method, so maybe you are on to a winner, I hope so anyway.
But beware of upsetting the natives again, they just love going on the warpath.
I had to get one of my little dogs put to sleep today, she was very ill with kidney disease.
That's only five years since my little hound died.
I had taken her on when her owner died three years ago, she was quite an old dog then, about 11.5 years old.
This meant that she stayed on in her own village rather than go to Ireland with her owner's son who lives over there.
So I am not really in the mood for abstract thinking.
You can work away independently with your RCCE I am sure.
You got the fastest forum bigint multiplier using a grid method, so maybe you are on to a winner, I hope so anyway.
But beware of upsetting the natives again, they just love going on the warpath.
Re: Dodicat Zlib
@Dodicat
I wrote a grid analyzing program...
You set the grid size , then it runs through all the binary values of that grid size..
It checks all the times that the row grid = the column grid.
For a size 2 grid , if row(0) = col(0) and row(1) = col(1) then record the value and increment the count...
For a 2 x 2 grid 4 bits , there are 8 times that the rows = cols ( 0 to 7 )
For a 3 x 3 grid 9 bits , there are 64 times that the rows = cols ( 0 to 63 )
For a 4 x 4 grid 16 bits , there are 1024 times that the rows = cols ( 0 to 1023 )
I wrote a grid analyzing program...
You set the grid size , then it runs through all the binary values of that grid size..
It checks all the times that the row grid = the column grid.
For a size 2 grid , if row(0) = col(0) and row(1) = col(1) then record the value and increment the count...
For a 2 x 2 grid 4 bits , there are 8 times that the rows = cols ( 0 to 7 )
For a 3 x 3 grid 9 bits , there are 64 times that the rows = cols ( 0 to 63 )
For a 4 x 4 grid 16 bits , there are 1024 times that the rows = cols ( 0 to 1023 )
Code: Select all
'This program compares row to columns..
'If all the rows equal all the columns , ( row grid = col grid ) then it records the value of the grid , and adds 1 to the count ( eq )..
screen 19
'set grid size
dim as ulongint size = 3
dim as ulongint top = ( 2 ^ ( size * size ) ) - 1
dim as string match = ""
dim as longint eq = - 1
dim as ulongint grid_bits = 0 , eq_bits = 0
dim as string ink
for z as ulongint = 0 to top step 1
dim as string s1 = right( string( size * size , "0" ) + bin( z ) , size * size )
dim as string row( 0 to ( size - 1 ) )
dim as string col( 0 to ( size - 1 ) )
'build rows
dim as ulongint ele = 0
for a as ulongint = 1 to len( s1 ) step size
row( ele ) = mid( s1 , a , size )
ele+= 1
next
'build columns
dim as ulongint place = 1
dim as ulongint ele1 = 0
dim as ulongint ele2
do
ele2 = 0
do
col( ele1 ) += mid( row( ele2 ) , place , 1 )
ele2+= 1
loop until ele2 = size
place+= 1
ele1+=1
loop until ele1 = size
grid_bits+= ( size * size )
cls
print
print "( rows )"
dim as string rows = ""
for a as ulongint = 0 to ( size - 1 )
print row( a )
rows+= row( a )
next
print
print "( cols )"
dim as string cols = ""
for a as ulongint = 0 to ( size - 1 )
print col( a )
cols+= col( a )
next
if rows = cols then match+= str( z ) + " " : eq+=1
print
print "Matches = " ; eq
ink = inkey
if ink = chr( 27 ) then exit for
next
print
print match
print
print "Times Row grid = Column grid " ; eq
eq_bits = ( ( size * size ) - len( bin( eq ) ) ) * ( eq + 1 )
print
print "Grid bytes = " ; ( grid_bits / 8 )
print "Compress bytes = " ; ( eq_bits / 8 )
sleep
end
Re: Dodicat Zlib
@Dodicat
For the "Row , Column Cancel Encoding" ( RCCE ) , I finally got the the mapping figured out...
It's compressing 1,000,000 bytes input , by 23% , on the first run through the compressor..
So i might have an error , or an omission somewhere...
I've gone over the code many times now , and i can't see any errors..
=============================================================================
@CoderJeff
Can i post the compression formula for Dodicat to check out?? ( I don't want to get banned again.. )
I'd like him to go over it , to see if the code actually works , or , if i have an error somewhere..
=============================================================================
For the "Row , Column Cancel Encoding" ( RCCE ) , I finally got the the mapping figured out...
It's compressing 1,000,000 bytes input , by 23% , on the first run through the compressor..
So i might have an error , or an omission somewhere...
I've gone over the code many times now , and i can't see any errors..
=============================================================================
@CoderJeff
Can i post the compression formula for Dodicat to check out?? ( I don't want to get banned again.. )
I'd like him to go over it , to see if the code actually works , or , if i have an error somewhere..
=============================================================================
Re: Dodicat Zlib
@Albert. Why even ask? You know the answer will be NO.
The FB forum is not the place for your ongoing attempts at data compression.
Put simply: Do not post your attempts at compression on this website. There is no point posting code that does not actually work, or that has not been tested. No other member wants to invest valuable time in working out exactly why your code does not work. You must learn to debug your own code.
You must write and debug both the compression and the expansion code yourself. Then you should use it on itself, to compress and expand the source code you wrote. If it runs correctly after expansion, then you are half way there.
Then compile the FB source to a file.exe, compress and expand the file.exe, and test if it still runs.
Then test your code really well on any and every file you can find, with any extension. If the compressed files are less than 66% of their original size, and if it never fails to expand correctly, then you might risk posting it on this forum. I expect someone will then find a file where it fails to work, because you couldn't be bothered testing it properly.
If you post code that you claim works, and someone else can find a file that breaks it, then you deserve a ban for wasting other members time.
The FB forum is not the place for your ongoing attempts at data compression.
Put simply: Do not post your attempts at compression on this website. There is no point posting code that does not actually work, or that has not been tested. No other member wants to invest valuable time in working out exactly why your code does not work. You must learn to debug your own code.
You must write and debug both the compression and the expansion code yourself. Then you should use it on itself, to compress and expand the source code you wrote. If it runs correctly after expansion, then you are half way there.
Then compile the FB source to a file.exe, compress and expand the file.exe, and test if it still runs.
Then test your code really well on any and every file you can find, with any extension. If the compressed files are less than 66% of their original size, and if it never fails to expand correctly, then you might risk posting it on this forum. I expect someone will then find a file where it fails to work, because you couldn't be bothered testing it properly.
If you post code that you claim works, and someone else can find a file that breaks it, then you deserve a ban for wasting other members time.
Re: Dodicat Zlib
@Richard
Got it Richard!! I'll work alone in it ...
I wont post any more here , till i got the decompression done...
Just sometimes ; you need a proof reader , to check the code , or to respond with ideas , for speeding it up..
Got it Richard!! I'll work alone in it ...
I wont post any more here , till i got the decompression done...
Just sometimes ; you need a proof reader , to check the code , or to respond with ideas , for speeding it up..