Squares
Re: Squares
@Richard
How do you make a integer 0 to 65535 into base 256?
The way the compression works is that there's always duplicates in a file.
So a 90 KB *.exe file , will only have like , maybe 10,000 unique values or less of 4 bytes.
I've tested several *.exe files , and am getting a range of 3,00 to 10,000 , unique values..
all the *.exe files seem to have a large number of "PQRST" 's
With text files it seem to be much more repeats, more duplicates..
How do you make a integer 0 to 65535 into base 256?
The way the compression works is that there's always duplicates in a file.
So a 90 KB *.exe file , will only have like , maybe 10,000 unique values or less of 4 bytes.
I've tested several *.exe files , and am getting a range of 3,00 to 10,000 , unique values..
all the *.exe files seem to have a large number of "PQRST" 's
With text files it seem to be much more repeats, more duplicates..
Re: Squares
That is simply splitting a Ushort 16 bit integer into two 8 bit bytes.Albert wrote:How do you make a integer 0 to 65535 into base 256?
hi = Hibyte( n ) : lo = Lobyte( n ). See also Loword() : Hiword()
hi = n \ 256 : lo = n Mod 256.
Re: Squares
@Richard
I'll describe it to you, maybe you can help..
I got the idea from the bible, the Jews left the vowels out of the words.
So you could take a 4 letter word, like "stop" and output it as "stp". compressing it by a 1/4.
Or use sentences , like "Oh My God" , (9 bytes) , compresses to "OMG" , (3 bytes). (use and make up texting lingo)
The problem i'm having is , for the dictionary , you have to put in (Oh My God) = bytes 1 thru 9..
The program could have preset sentences and their byte equates, and their 3 byte substitute..so you just look up the 3 letter equate in the table and replace it in the file with the 9 bytes equate.
I'll describe it to you, maybe you can help..
I got the idea from the bible, the Jews left the vowels out of the words.
So you could take a 4 letter word, like "stop" and output it as "stp". compressing it by a 1/4.
Or use sentences , like "Oh My God" , (9 bytes) , compresses to "OMG" , (3 bytes). (use and make up texting lingo)
The problem i'm having is , for the dictionary , you have to put in (Oh My God) = bytes 1 thru 9..
The program could have preset sentences and their byte equates, and their 3 byte substitute..so you just look up the 3 letter equate in the table and replace it in the file with the 9 bytes equate.
Re: Squares
@Albert.
It seems that your compression will require a big dictionary that you have forgotten to include with the 4 byte condensate. That dictionary could be compressed with LZH and you will get the equivalent of LZH compression. You may as well use the same file format.
It seems that your compression will require a big dictionary that you have forgotten to include with the 4 byte condensate. That dictionary could be compressed with LZH and you will get the equivalent of LZH compression. You may as well use the same file format.
Re: Squares
@Richard
I tried to step by 2 bytes and store the two byte value into the dictionary and check for reversals.
( byte 1 + byte 2 )
( byte 2 + byte 1 ) , So there can only be a total possible of 32767 values instead of 65535.
But i tried to compress MP3 files and it generates over 32,000 unique values..15 bits
So it can't be compressed any further , cause the reversal would take a bit to signify the dictionary reversal..
I was thinking about stepping by 3 or more bytes and include all the permutations , which would take several bits to signify the permutation.
How many permutations are there for 3 bytes ?
Is it 3^3 or is it 3^3^3 ??
I'll try to use 3 bytes in the dictionary...
Do you have any sample code lying around to do the 3 byte permutations??
I tried to step by 2 bytes and store the two byte value into the dictionary and check for reversals.
( byte 1 + byte 2 )
( byte 2 + byte 1 ) , So there can only be a total possible of 32767 values instead of 65535.
But i tried to compress MP3 files and it generates over 32,000 unique values..15 bits
So it can't be compressed any further , cause the reversal would take a bit to signify the dictionary reversal..
I was thinking about stepping by 3 or more bytes and include all the permutations , which would take several bits to signify the permutation.
How many permutations are there for 3 bytes ?
Is it 3^3 or is it 3^3^3 ??
I'll try to use 3 bytes in the dictionary...
Do you have any sample code lying around to do the 3 byte permutations??
Re: Squares
@Richard
I figured it out on my own, the number of permutations is the ( length * ( length-1 ) ) is that right???
Heres the code i got so far.. best to try it on a file less than 50K , it takes 200+seconds on a 100K file.
So far it just builds the dictionary...
In the final , i'll create the dictionary as i go through the file.
If you try it on a *.ZIP file you can see that the dictionary takes almost the total file size.
So , somehow i got to figure out how to compress the dictionary..
I figured it out on my own, the number of permutations is the ( length * ( length-1 ) ) is that right???
Heres the code i got so far.. best to try it on a file less than 50K , it takes 200+seconds on a 100K file.
So far it just builds the dictionary...
In the final , i'll create the dictionary as i go through the file.
If you try it on a *.ZIP file you can see that the dictionary takes almost the total file size.
So , somehow i got to figure out how to compress the dictionary..
Code: Select all
#define WIN_INCLUDEALL
#Include "windows.bi"
#Include "File.bi"
declare sub getfilename()
declare sub compress()
declare sub decompress()
declare function MultiBase_Number( byval Number as ulongint , byval Number_Base as ulongint ) as string
dim shared as string file , extension , file_data , bytes , file_name
dim as ubyte value1
Dim As MSG msg
Dim shared As HWND hWnd
screen 19
getfilename()
if FileExists(file) then
for a as ulongint = len(file)-1 to 0 step -1
bytes=chr(file[a])
if bytes = "\" then file_name = mid(file,a+2):exit for
next
print file_name
file_data = ""
open file for binary as #1
do
get #1,,value1
file_data = file_data + chr(value1)
loop until eof(1)
close #1
if extension = ".BDC" then
print
print "DECOMPRESSING:" , len(file_data)
decompress()
else
print
print "COMPRESSING:" , len(file_data)
extension=".BDC"
compress()
end if
end if
'===============================================================================
'===============================================================================
'END Program
'===============================================================================
'===============================================================================
END
'===============================================================================
'===============================================================================
'===============================================================================
'===============================================================================
sub compress()
dim as string inputs = file_data
dim as longint size = 3
dim as double time1 = timer , time2
start:
redim as string numbers(1)
dim as string vals = ""
dim as string pm(1 to 6)
dim as string permut = ""
'dim as longint place = 1
for a as longint = 1 to len(inputs) step size
vals = mid(inputs,a,size)
'pm(1) = left(vals,1) + mid(vals,2,1) + right(vals,1)
'pm(2) = left(vals,1) + right(vals,1) + mid(vals,2,1)
'pm(3) = mid(vals,2,1) + left(vals,1) + right(vals,1)
'pm(4) = mid(vals,2,1) + right(vals,1) + left(vals,1)
'pm(5) = right(vals,1) + mid(vals,2,1) + left(vals,1)
'pm(6) = right(vals,1) + left(vals,1) + mid(vals,2,1)
pm(1) = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
pm(2) = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
pm(3) = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
pm(4) = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
pm(5) = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
pm(6) = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
permut = pm(1)+" "+pm(2)+" "+pm(3)+" "+pm(4)+" "+pm(5)+" "+pm(6)
for b as longint = 1 to ubound(numbers)
if instr(1,permut,numbers(b)) <> 0 then goto done
next
numbers(ubound(numbers)) = pm(1)
redim preserve numbers(ubound(numbers)+1)
'place+=1
'if (len(bin(place)) >= 14) then size+=1 : goto start
done:
next
time2 = timer
for a as longint = 1 to ubound(numbers)
print a , "step = " ; size , "dict len = " ; (a*size) , "data = " ; len(inputs)*8\3*(len(bin(a))+3)\8 , len(bin(a))+3 , len(inputs)
next
print "Time = " ; time2-time1
sleep
'sleep
' open file_name+".BDC" for output as #1
' print #1,bytes_out
' close #1
'sleep
end sub
'===============================================================================
'===============================================================================
'Decompress
'===============================================================================
'===============================================================================
sub decompress()
'not written yet
end sub
'===============================================================================
'===============================================================================
'Get File Name
'===============================================================================
'===============================================================================
sub getfilename()
dim ofn as OPENFILENAME
dim filename as zstring * MAX_PATH+1
with ofn
.lStructSize = sizeof( OPENFILENAME )
.hwndOwner = hWnd
.hInstance = GetModuleHandle( NULL )
.lpstrFilter = strptr( !"All Files, (*.*)\0*.*\0\0" )
.lpstrCustomFilter = NULL
.nMaxCustFilter = 0
.nFilterIndex = 1
.lpstrFile = @filename
.nMaxFile = sizeof( filename )
.lpstrFileTitle = NULL
.nMaxFileTitle = 0
.lpstrInitialDir = NULL
'.lpstrTitle = @"File Open Test"
.lpstrTitle = @"File to Compress/Decompress"
.Flags = OFN_EXPLORER 'or OFN_FILEMUSTEXIST or OFN_PATHMUSTEXIST
.nFileOffset = 0
.nFileExtension = 0
.lpstrDefExt = NULL
.lCustData = 0
.lpfnHook = NULL
.lpTemplateName = NULL
end with
if( GetOpenFileName( @ofn ) = FALSE ) then
file = ""
return
else
file = filename
extension = right$(filename,4)
end if
end sub
Re: Squares
The key to compression is to find repeated strings in the file.
Make a table of repeated sequences of length two or more characters.
Keep a count of the number of occurrences of each.
The first pattern to compress should be the one with the greatest volume = length * occurrences.
Make a table of repeated sequences of length two or more characters.
Keep a count of the number of occurrences of each.
The first pattern to compress should be the one with the greatest volume = length * occurrences.
Re: Squares
@Richard
Do you have any code to do permutations?? You posted one sometime back.
I searched but couldn't find it..
I want to create a table of all 3 byte values.. 0 to 255 , 0 to 255 , 0 to 255
And then take out the permutation repeats...
Theres 16,777,216 different values , but i thinking if you check for permutations it should be about 65536.
Can you repost the permutation code , or point me to the post where you posted it?
Do you have any code to do permutations?? You posted one sometime back.
I searched but couldn't find it..
I want to create a table of all 3 byte values.. 0 to 255 , 0 to 255 , 0 to 255
And then take out the permutation repeats...
Theres 16,777,216 different values , but i thinking if you check for permutations it should be about 65536.
Can you repost the permutation code , or point me to the post where you posted it?
Re: Squares
@rRchard
I figured it out , its just a triple nested loop..
I got it running , it might take a whole day or two to complete...
I figured it out , its just a triple nested loop..
I got it running , it might take a whole day or two to complete...
Code: Select all
screen 19
dim as string bytes =""
dim as string*3 values(1)
for a as longint = 0 to 255
for b as longint = 0 to 255
for c as longint = 0 to 255
bytes = chr(a) + chr(b) + chr(c)
values(ubound(values)) = bytes
redim preserve values(ubound(values)+1)
print a,b,c
next
next
next
dim as string*3 vals = ""
dim as string*3 pm1
dim as string*3 pm2
dim as string*3 pm3
dim as string*3 pm4
dim as string*3 pm5
dim as string*3 pm6
dim as string*23 permut = ""
for a as longint = 1 to ubound(values)
vals = values(a)
pm1 = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
pm2 = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
pm3 = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
pm4 = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
pm5 = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
pm6 = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
permut = pm1+" "+pm2+" "+pm3+" "+pm4+" "+pm5+" "+pm6
for b as longint = 1 to ubound(values)
if b <> a then if instr(1,permut,vals) > 0 then values(b) = ""
next
print a ;" out of " ; ubound(values)
next
dim as longint count = 0
dim as longint dups = 0
for a as longint = 1 to ubound(values)
if values(a) <> "" then count+=1
if values(a) = "" then dups+=1
next
print "16,777,216" , "unique = " ; count , "dups = " ;dups
sleep
end
Re: Squares
Hello Guys;
I had to stick in sleep 20,000 , after each "a" iteration , my processor was overheating and shutting down the computer.
So it might take several hours to build the 16,777,216 * 6 array()
I had to stick in sleep 20,000 , after each "a" iteration , my processor was overheating and shutting down the computer.
So it might take several hours to build the 16,777,216 * 6 array()
Code: Select all
screen 19
redim as string*3 values(6)
dim as string*3 vals
dim as string*3 pm1
dim as string*3 pm2
dim as string*3 pm3
dim as string*3 pm4
dim as string*3 pm5
dim as string*3 pm6
dim as ulongint place = 1
for a as longint = 0 to 255
for b as longint = 0 to 255
for c as longint = 0 to 255
vals = chr(a) + chr(b) + chr(c)
pm1 = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
pm2 = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
pm3 = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
pm4 = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
pm5 = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
pm6 = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
values(place) = pm1 : place+=1
values(place) = pm2 : place+=1
values(place) = pm3 : place+=1
values(place) = pm4 : place+=1
values(place) = pm5 : place+=1
values(place) = pm6 : place+=1
redim preserve values(ubound(values)+6)
next
next
print "a = " ; a
sleep 20000
next
print "done getting values"
sleep 60*10*1000
print "Checking for dups"
for chk1 as ulongint = 1 to ubound(values)
for chk2 as ulongint = chk1+1 to ubound(values)
if values(chk1) = values(chk2) then values(chk2) = ""
next
if chk1 mod 100000 = 0 then print ubound(values) , chk1
next
print "counting uniques and dups"
dim as ulongint dups = 0
dim as ulongint count = 0
for a as longint = 1 to ubound(values)
if values(a) = "" then dups+= 1
if values(a) <> "" then count+=1
if a mod 1000000 = 0 then print ubound(values) , a
next
print "16,777,216" , "unique = " ; count , "dups = " ;dups
sleep
end
Re: Squares
I think you are duplicating permutations.
For example, consider that the number of permutations is 256 * 256 * 256.
In your code it is 256 * 256 * 256 * 6. (I think)
For example, consider that the number of permutations is 256 * 256 * 256.
In your code it is 256 * 256 * 256 * 6. (I think)
Re: Squares
@sancho3
Each three bytes , can be permuted 6 ways...i just included each 6 ways.. then the 100MB bubble sort (16M * 6) can wipe out any and all dups.
Some values like "0 0 0" , "255 255 255" can't be permuted.. "00? " can be permuted only 3 ways. "123" can be permuted 6 ways.
So i just stuck all 6 ways into it..
I was thinking how to build the array faster, or how to do it without an array.. but it's still foggy in my head..
I redid the bubble sort to skip over blanks....It takes like 4 hours to build the 100MB array...
I redid it , for testing , to just do 65536 elements
Each three bytes , can be permuted 6 ways...i just included each 6 ways.. then the 100MB bubble sort (16M * 6) can wipe out any and all dups.
Some values like "0 0 0" , "255 255 255" can't be permuted.. "00? " can be permuted only 3 ways. "123" can be permuted 6 ways.
So i just stuck all 6 ways into it..
I was thinking how to build the array faster, or how to do it without an array.. but it's still foggy in my head..
I redid the bubble sort to skip over blanks....It takes like 4 hours to build the 100MB array...
I redid it , for testing , to just do 65536 elements
Code: Select all
screen 19
redim as string*2 values(1 to 65536)
dim as string*2 vals
dim as longint place = 1
for b as longint = 0 to 255
for c as longint = 0 to 255
vals = chr(b) + chr(c)
values(place) = vals
print place
place+=1
next
next
print "done getting values"
print "Checking for dups"
dim as ulongint a_count = 0
dim as ulongint b_count = 0
dim as string*2 chk1
dim as string*2 chk2
dim as string*2 chk3
dim as longint dups = 0
dim as double time1 , time2
time1 = timer
do
a_count+=1
do
chk1 = values(a_count)
if chk1 = "" then a_count+=1
loop until chk1 <> ""
chk2 = right(values(a_count),1) + left(values(a_count),1)
b_count = a_count+1
do
do
chk3 = values(b_count)
if chk3 = "" then b_count+=1
loop until chk3 <> ""
if chk1 = chk3 then values(b_count) = "" : dups+=1
if chk2 = chk3 then values(b_count) = "" : dups+=1
b_count+=1
loop until b_count > ubound(values)
time2 = timer
if time2 - time1 > 10 then
print ubound(values) , a_count , "# of dups found = " ; dups
time1 = timer
end if
loop until a_count > ubound(values)-1
print ubound(values) , a_count , "# of dups found = " ; dups
print "counting uniques and dups"
dups = 0
dim as ulongint count = 0
for a as longint = 1 to ubound(values)
if values(a) = "" then dups+= 1
if values(a) <> "" then count+=1
next
print "65,536" , "unique = " ; count , "dups = " ;dups , "total = " ; count+dups
sleep
end
Re: Squares
I preset the array at 16,777,216 elements.
Now it fills the array in about 10 seconds.
Then I check for permutations...Gonna take a while..
Heres the 255,255,255 code...
Now it fills the array in about 10 seconds.
Then I check for permutations...Gonna take a while..
Heres the 255,255,255 code...
Code: Select all
screen 19
redim as string*3 values(1 to 16777216)
dim as string*3 vals
dim as longint place = 1
for a as longint = 0 to 255
for b as longint = 0 to 255
for c as longint = 0 to 255
vals = chr(a) + chr(b) + chr(c)
values(place) = vals
place+=1
next
next
print place
next
print "done getting values"
print "Checking for dups"
dim as ulongint a_count = 0
dim as ulongint b_count = 0
dim as string*3 pm1
dim as string*3 pm2
dim as string*3 pm3
dim as string*3 pm4
dim as string*3 pm5
dim as string*3 pm6
dim as string*3 chk3
dim as longint dups = 0
dim as double time1 , time2
time1 = timer
do
a_count+=1
do
vals = values(a_count)
if vals = "" then a_count+=1
loop until vals <> ""
pm1 = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
pm2 = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
pm3 = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
pm4 = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
pm5 = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
pm6 = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
b_count = a_count+1
do
do
chk3 = values(b_count)
if chk3 = "" then b_count+=1
loop until chk3 <> ""
if pm1 = chk3 then values(b_count) = "" : dups+=1
if pm2 = chk3 then values(b_count) = "" : dups+=1
if pm3 = chk3 then values(b_count) = "" : dups+=1
if pm4 = chk3 then values(b_count) = "" : dups+=1
if pm5 = chk3 then values(b_count) = "" : dups+=1
if pm6 = chk3 then values(b_count) = "" : dups+=1
b_count+=1
loop until b_count > ubound(values)
time2 = timer
if time2 - time1 > 10 then
print ubound(values) , a_count , "# of dups found = " ; dups
sleep 10000
time1 = timer
end if
loop until a_count > ubound(values)-1
print ubound(values) , a_count , "# of dups found = " ; dups
print "counting uniques and dups"
dups = 0
dim as ulongint count = 0
for a as longint = 1 to ubound(values)
if values(a) = "" then dups+= 1
if values(a) <> "" then count+=1
next
print "16,777,216" , "unique = " ; count , "dups = " ;dups , "total = " ; count+dups
sleep
end
Re: Squares
@Richard
vals = chr(128) + chr(200) + chr(132)
How do you get the decimal value ( 0 to 16,777,215 ) of the vals var??
vals = chr(128) + chr(200) + chr(132)
How do you get the decimal value ( 0 to 16,777,215 ) of the vals var??
Re: Squares
The stack space on my computer is too small to define an array large enough to hold all the perms.
Here is a way of using a block of memory to do it. I reduced it to only the perms. of 65 and 66 (A and B). I don't know for sure it will work properly when you set it to load the whole 0-255. 65 and 66 are easier to read.
This also doesn't help you sort them any faster since it doesn't get into an array.
I only allocate 24 bytes since thats all I need in my demo. You will need 3 * 256 * 256 * 256.
Maybe it can help you in some way.
Here is a way of using a block of memory to do it. I reduced it to only the perms. of 65 and 66 (A and B). I don't know for sure it will work properly when you set it to load the whole 0-255. 65 and 66 are easier to read.
This also doesn't help you sort them any faster since it doesn't get into an array.
I only allocate 24 bytes since thats all I need in my demo. You will need 3 * 256 * 256 * 256.
Maybe it can help you in some way.
Code: Select all
dim as byte ptr buffer
dim as string s
dim as ulong n
buffer = allocate(24)
for a as integer = 65 to 66
for b as integer = 65 to 66
for c as integer = 65 to 66
buffer[n] = a
buffer[n+1] = b
buffer[n+2] = c
n += 3
next
next
next
s = *cast(zstring ptr, buffer)
? s
deallocate(buffer)