Squares

albert · Post by **albert** » Dec 27, 2017 21:08

@Richard

How do you make a integer 0 to 65535 into base 256?

The way the compression works is that there's always duplicates in a file.
So a 90 KB *.exe file , will only have like , maybe 10,000 unique values or less of 4 bytes.
I've tested several *.exe files , and am getting a range of 3,00 to 10,000 , unique values..
all the *.exe files seem to have a large number of "PQRST" 's
With text files it seem to be much more repeats, more duplicates..

Richard · Post by **Richard** » Dec 27, 2017 21:22

Albert wrote:How do you make a integer 0 to 65535 into base 256?

That is simply splitting a Ushort 16 bit integer into two 8 bit bytes.
hi = Hibyte( n ) : lo = Lobyte( n ). See also Loword() : Hiword()
hi = n \ 256 : lo = n Mod 256.

albert · Post by **albert** » Dec 27, 2017 23:51

@Richard

I'll describe it to you, maybe you can help..

I got the idea from the bible, the Jews left the vowels out of the words.
So you could take a 4 letter word, like "stop" and output it as "stp". compressing it by a 1/4.

Or use sentences , like "Oh My God" , (9 bytes) , compresses to "OMG" , (3 bytes). (use and make up texting lingo)
The problem i'm having is , for the dictionary , you have to put in (Oh My God) = bytes 1 thru 9..

The program could have preset sentences and their byte equates, and their 3 byte substitute..so you just look up the 3 letter equate in the table and replace it in the file with the 9 bytes equate.

Richard · Post by **Richard** » Dec 28, 2017 0:11

@Albert.
It seems that your compression will require a big dictionary that you have forgotten to include with the 4 byte condensate. That dictionary could be compressed with LZH and you will get the equivalent of LZH compression. You may as well use the same file format.

albert · Post by **albert** » Dec 29, 2017 2:01

@Richard

I tried to step by 2 bytes and store the two byte value into the dictionary and check for reversals.

( byte 1 + byte 2 )
( byte 2 + byte 1 ) , So there can only be a total possible of 32767 values instead of 65535.

But i tried to compress MP3 files and it generates over 32,000 unique values..15 bits
So it can't be compressed any further , cause the reversal would take a bit to signify the dictionary reversal..

I was thinking about stepping by 3 or more bytes and include all the permutations , which would take several bits to signify the permutation.

How many permutations are there for 3 bytes ?
Is it 3^3 or is it 3^3^3 ??

I'll try to use 3 bytes in the dictionary...

Do you have any sample code lying around to do the 3 byte permutations??

albert · Post by **albert** » Dec 29, 2017 4:53

@Richard

I figured it out on my own, the number of permutations is the ( length * ( length-1 ) ) is that right???

Heres the code i got so far.. best to try it on a file less than 50K , it takes 200+seconds on a 100K file.
So far it just builds the dictionary...
In the final , i'll create the dictionary as i go through the file.

If you try it on a *.ZIP file you can see that the dictionary takes almost the total file size.
So , somehow i got to figure out how to compress the dictionary..

Code: Select all

#define WIN_INCLUDEALL
#Include "windows.bi"
#Include "File.bi"

declare sub getfilename()
declare sub compress()
declare sub decompress()
declare function MultiBase_Number( byval Number as ulongint , byval Number_Base as ulongint ) as string

dim shared as string file , extension , file_data , bytes , file_name
dim as ubyte value1

Dim As MSG msg
Dim shared As HWND hWnd

screen 19

getfilename()

if FileExists(file) then
    for a as ulongint = len(file)-1 to 0 step -1
        bytes=chr(file[a])
        if bytes = "\" then file_name = mid(file,a+2):exit for
    next
    
    print file_name

    file_data = ""
    open file for binary as #1
        do
            get #1,,value1
            file_data = file_data + chr(value1)
        loop until eof(1)
    close #1
    
    if extension = ".BDC" then
        print
        print "DECOMPRESSING:" , len(file_data)
        decompress()
    else
        print
        print "COMPRESSING:" , len(file_data)
        extension=".BDC"
        compress()
    end if

end if
'===============================================================================
'===============================================================================
'END Program
'===============================================================================
'===============================================================================
END

'===============================================================================
'===============================================================================
'===============================================================================
'===============================================================================
sub compress()

    dim as string inputs = file_data
    
    dim as longint size = 3
    
    dim as double time1 = timer , time2
    
    start:
    
    redim as string numbers(1)
    dim as string vals = ""
    dim as string pm(1 to 6)
    dim as string permut = ""
    'dim as longint place = 1
    for a as longint = 1 to len(inputs) step size
        vals = mid(inputs,a,size)
        'pm(1) = left(vals,1)  + mid(vals,2,1) + right(vals,1)
        'pm(2) = left(vals,1)  + right(vals,1) + mid(vals,2,1)
        'pm(3) = mid(vals,2,1) + left(vals,1)  + right(vals,1)
        'pm(4) = mid(vals,2,1) + right(vals,1) + left(vals,1)
        'pm(5) = right(vals,1) + mid(vals,2,1) + left(vals,1)
        'pm(6) = right(vals,1) + left(vals,1)  + mid(vals,2,1)
        pm(1) = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
        pm(2) = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
        pm(3) = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
        pm(4) = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
        pm(5) = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
        pm(6) = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
        permut = pm(1)+" "+pm(2)+" "+pm(3)+" "+pm(4)+" "+pm(5)+" "+pm(6)
        for b as longint = 1 to ubound(numbers)
            if instr(1,permut,numbers(b)) <> 0 then goto done
        next
        numbers(ubound(numbers)) = pm(1)
        redim preserve numbers(ubound(numbers)+1)
        'place+=1
        'if (len(bin(place)) >= 14) then size+=1 : goto start 
        done:
    next
        
    time2 = timer
    
    for a as longint = 1 to ubound(numbers)
        print a , "step = " ; size , "dict len = " ; (a*size) , "data = " ; len(inputs)*8\3*(len(bin(a))+3)\8 , len(bin(a))+3  , len(inputs)
    next
    
    print "Time = " ; time2-time1
    
    
    sleep
    
    'sleep
    '    open file_name+".BDC" for output as #1
    '        print #1,bytes_out
    '    close #1
    'sleep
        
            
end sub
'===============================================================================
'===============================================================================
'Decompress
'===============================================================================
'===============================================================================
sub decompress()
'not written yet
end sub
'===============================================================================
'===============================================================================
'Get File Name
'===============================================================================
'===============================================================================
sub getfilename()
        dim ofn as OPENFILENAME
        dim filename as zstring * MAX_PATH+1
        
        with ofn
                .lStructSize            = sizeof( OPENFILENAME )
                .hwndOwner              = hWnd
                .hInstance              = GetModuleHandle( NULL )
                .lpstrFilter            = strptr( !"All Files, (*.*)\0*.*\0\0" )
                .lpstrCustomFilter      = NULL
                .nMaxCustFilter         = 0
                .nFilterIndex           = 1
                .lpstrFile              = @filename
                .nMaxFile               = sizeof( filename )
                .lpstrFileTitle         = NULL
                .nMaxFileTitle          = 0
                .lpstrInitialDir        = NULL
                '.lpstrTitle            = @"File Open Test"
                .lpstrTitle             = @"File to Compress/Decompress"
                .Flags                  = OFN_EXPLORER 'or OFN_FILEMUSTEXIST or OFN_PATHMUSTEXIST
                .nFileOffset            = 0
                .nFileExtension         = 0
                .lpstrDefExt            = NULL
                .lCustData              = 0
                .lpfnHook               = NULL
                .lpTemplateName         = NULL
        end with
        
        if( GetOpenFileName( @ofn ) = FALSE ) then
            file = ""
            return
        else
            file = filename
            extension = right$(filename,4)
    end if

end sub

Richard · Post by **Richard** » Dec 29, 2017 7:57

The key to compression is to find repeated strings in the file.
Make a table of repeated sequences of length two or more characters.
Keep a count of the number of occurrences of each.
The first pattern to compress should be the one with the greatest volume = length * occurrences.

albert · Post by **albert** » Dec 29, 2017 19:04

@Richard

Do you have any code to do permutations?? You posted one sometime back.
I searched but couldn't find it..

I want to create a table of all 3 byte values.. 0 to 255 , 0 to 255 , 0 to 255
And then take out the permutation repeats...

Theres 16,777,216 different values , but i thinking if you check for permutations it should be about 65536.

Can you repost the permutation code , or point me to the post where you posted it?

albert · Post by **albert** » Dec 29, 2017 20:20

@rRchard

I figured it out , its just a triple nested loop..

I got it running , it might take a whole day or two to complete...

Code: Select all



screen 19

dim as string bytes =""
dim as string*3 values(1)
for a as longint = 0 to 255
    for b as longint = 0 to 255
        for c as longint = 0 to 255
            bytes = chr(a) + chr(b) + chr(c)
            values(ubound(values)) = bytes
            redim preserve values(ubound(values)+1)
            print a,b,c
        next
    next
next

dim as string*3 vals = ""
dim as string*3 pm1
dim as string*3 pm2
dim as string*3 pm3
dim as string*3 pm4
dim as string*3 pm5
dim as string*3 pm6
dim as string*23 permut = ""
for a as longint = 1 to ubound(values)
    vals = values(a)
    pm1 = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
    pm2 = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
    pm3 = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
    pm4 = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
    pm5 = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
    pm6 = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
    permut = pm1+" "+pm2+" "+pm3+" "+pm4+" "+pm5+" "+pm6
    for b as longint = 1 to ubound(values)
        if b <> a then if instr(1,permut,vals) > 0 then values(b) = ""
    next
    print a ;" out of " ; ubound(values)
next

dim as longint count = 0
dim as longint dups = 0
for a as longint = 1 to ubound(values)
    if values(a) <> "" then count+=1
    if values(a) =  "" then dups+=1
next

print "16,777,216" , "unique = " ; count , "dups = " ;dups

sleep
end

albert · Post by **albert** » Dec 30, 2017 17:21

Hello Guys;

I had to stick in sleep 20,000 , after each "a" iteration , my processor was overheating and shutting down the computer.
So it might take several hours to build the 16,777,216 * 6 array()

Code: Select all


screen 19

redim as string*3 values(6)
dim as string*3 vals
dim as string*3 pm1
dim as string*3 pm2
dim as string*3 pm3
dim as string*3 pm4
dim as string*3 pm5
dim as string*3 pm6
dim as ulongint place = 1
for a as longint = 0 to 255
    for b as longint = 0 to 255
        for c as longint = 0 to 255
            
            vals = chr(a) + chr(b) + chr(c)
            
            pm1 = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
            pm2 = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
            pm3 = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
            pm4 = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
            pm5 = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
            pm6 = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
            
            values(place) = pm1 : place+=1
            values(place) = pm2 : place+=1
            values(place) = pm3 : place+=1
            values(place) = pm4 : place+=1
            values(place) = pm5 : place+=1
            values(place) = pm6 : place+=1
            redim preserve values(ubound(values)+6)
        next
    next
    print "a = " ; a
    sleep 20000
next

print "done getting values"

sleep 60*10*1000

print "Checking for dups"
for chk1 as ulongint = 1 to ubound(values)
    for chk2 as ulongint = chk1+1 to ubound(values)
        if values(chk1) = values(chk2) then values(chk2) = ""
    next
    if chk1 mod 100000 = 0 then print ubound(values) , chk1
next

print "counting uniques and dups"
dim as ulongint dups = 0
dim as ulongint count = 0
for a as longint = 1 to ubound(values)
    if values(a)  = "" then dups+= 1
    if values(a) <> "" then count+=1
    if a mod 1000000 = 0 then print ubound(values) , a
next

print "16,777,216" , "unique = " ; count , "dups = " ;dups

sleep
end

sancho3 · Post by **sancho3** » Dec 30, 2017 20:14

I think you are duplicating permutations.
For example, consider that the number of permutations is 256 * 256 * 256.
In your code it is 256 * 256 * 256 * 6. (I think)

albert · Post by **albert** » Dec 31, 2017 0:31

@sancho3

Each three bytes , can be permuted 6 ways...i just included each 6 ways.. then the 100MB bubble sort (16M * 6) can wipe out any and all dups.

Some values like "0 0 0" , "255 255 255" can't be permuted.. "00? " can be permuted only 3 ways. "123" can be permuted 6 ways.
So i just stuck all 6 ways into it..

I was thinking how to build the array faster, or how to do it without an array.. but it's still foggy in my head..

I redid the bubble sort to skip over blanks....It takes like 4 hours to build the 100MB array...
I redid it , for testing , to just do 65536 elements

Code: Select all


screen 19

redim as string*2 values(1 to 65536)
dim as string*2 vals
dim as longint place = 1
for b as longint = 0 to 255
    for c as longint = 0 to 255
        vals = chr(b) + chr(c)
        values(place) = vals
        print place
        place+=1
    next
next

print "done getting values"

print "Checking for dups"
dim as ulongint a_count = 0
dim as ulongint b_count = 0
dim as string*2 chk1
dim as string*2 chk2
dim as string*2 chk3
dim as longint dups = 0
dim as double time1 , time2
time1 = timer
do
    a_count+=1
    do
        chk1 = values(a_count)
        if chk1 = "" then a_count+=1
    loop until chk1 <> ""
    chk2 = right(values(a_count),1) + left(values(a_count),1)
    b_count = a_count+1
    do
        do
            chk3 = values(b_count)
            if chk3 = "" then b_count+=1
        loop until chk3 <> ""
        if chk1 = chk3 then values(b_count) = "" : dups+=1
        if chk2 = chk3 then values(b_count) = "" : dups+=1
        b_count+=1
    loop until b_count > ubound(values)
        
    time2 = timer
    if time2 - time1 > 10 then 
        print ubound(values) , a_count , "# of dups found = " ; dups
        time1 = timer
    end if
    
loop until a_count > ubound(values)-1

print ubound(values) , a_count , "# of dups found = " ; dups

print "counting uniques and dups"
dups = 0
dim as ulongint count = 0
for a as longint = 1 to ubound(values)
    if values(a)  = "" then dups+= 1
    if values(a) <> "" then count+=1
next

print "65,536" , "unique = " ; count , "dups = " ;dups , "total = " ; count+dups

sleep
end

albert · Post by **albert** » Dec 31, 2017 4:34

I preset the array at 16,777,216 elements.
Now it fills the array in about 10 seconds.

Then I check for permutations...Gonna take a while..
Heres the 255,255,255 code...

Code: Select all


screen 19

redim as string*3 values(1 to 16777216)
dim as string*3 vals
dim as longint place = 1
for a as longint = 0 to 255
    for b as longint = 0 to 255
        for c as longint = 0 to 255
            vals = chr(a) + chr(b) + chr(c)
            values(place) = vals
            place+=1
        next
    next
    print place
next

print "done getting values"

print "Checking for dups"
dim as ulongint a_count = 0
dim as ulongint b_count = 0

dim as string*3 pm1
dim as string*3 pm2
dim as string*3 pm3
dim as string*3 pm4
dim as string*3 pm5
dim as string*3 pm6
dim as string*3 chk3
dim as longint dups = 0
dim as double time1 , time2
time1 = timer
do
    a_count+=1
    do
        vals = values(a_count)
        if vals = "" then a_count+=1
    loop until vals <> ""
    pm1 = chr(vals[0]) + chr(vals[1]) + chr(vals[2])
    pm2 = chr(vals[0]) + chr(vals[2]) + chr(vals[1])
    pm3 = chr(vals[1]) + chr(vals[2]) + chr(vals[0])
    pm4 = chr(vals[1]) + chr(vals[0]) + chr(vals[2])
    pm5 = chr(vals[2]) + chr(vals[0]) + chr(vals[1])
    pm6 = chr(vals[2]) + chr(vals[1]) + chr(vals[0])
    b_count = a_count+1
    do
        do
            chk3 = values(b_count)
            if chk3 = "" then b_count+=1
        loop until chk3 <> ""
        if pm1 = chk3 then values(b_count) = "" : dups+=1
        if pm2 = chk3 then values(b_count) = "" : dups+=1
        if pm3 = chk3 then values(b_count) = "" : dups+=1
        if pm4 = chk3 then values(b_count) = "" : dups+=1
        if pm5 = chk3 then values(b_count) = "" : dups+=1
        if pm6 = chk3 then values(b_count) = "" : dups+=1
        b_count+=1
    loop until b_count > ubound(values)
        
    time2 = timer
    if time2 - time1 > 10 then 
        print ubound(values) , a_count , "# of dups found = " ; dups
        sleep 10000
        time1 = timer
    end if
    
loop until a_count > ubound(values)-1

print ubound(values) , a_count , "# of dups found = " ; dups

print "counting uniques and dups"
dups = 0
dim as ulongint count = 0
for a as longint = 1 to ubound(values)
    if values(a)  = "" then dups+= 1
    if values(a) <> "" then count+=1
next

print "16,777,216" , "unique = " ; count , "dups = " ;dups , "total = " ; count+dups

sleep
end

albert · Post by **albert** » Dec 31, 2017 4:46

@Richard

vals = chr(128) + chr(200) + chr(132)

How do you get the decimal value ( 0 to 16,777,215 ) of the vals var??

sancho3 · Post by **sancho3** » Dec 31, 2017 4:48

The stack space on my computer is too small to define an array large enough to hold all the perms.
Here is a way of using a block of memory to do it. I reduced it to only the perms. of 65 and 66 (A and B). I don't know for sure it will work properly when you set it to load the whole 0-255. 65 and 66 are easier to read.
This also doesn't help you sort them any faster since it doesn't get into an array.
I only allocate 24 bytes since thats all I need in my demo. You will need 3 * 256 * 256 * 256.
Maybe it can help you in some way.

Code: Select all

dim as byte ptr buffer
dim as string s
dim as ulong n

buffer = allocate(24) 

for a as integer = 65 to 66
	for b as integer = 65 to 66 
		for c as integer = 65 to 66
			buffer[n] = a
			buffer[n+1] = b
			buffer[n+2] = c
			n += 3
			next
	next 
next
s = *cast(zstring ptr, buffer)
? s
deallocate(buffer)

Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares

Re: Squares