Data Compression Different Methods

Post your FreeBASIC source, examples, tips and tricks here. Please don’t post code without including an explanation.
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

In a forum, someone posted to try a random number generator, so I did. The idea is that you try to match your four numbers with a PRNG. Then you save the loop count if its size is less than your four numbers. A decoder would use the 2 byte loop count number to restore 4 bytes. This is not reliable, and I had to change the seed number quite a few times to get it to work. This was just an experiment.

Code: Select all

Dim As Uinteger i,x,cnt
Dim As Ubyte n1,n2,n3,n4
Dim As Ubyte a,b,c,d

'my numbers
n1 = 33:n2 = 207:n3 = 94:n4 = 70

'seed number
randomize 112

Do
a = (rnd * 256)
b = (rnd * 256)
c = (rnd * 256)
d = (rnd * 256)

if n1 = a and n2 = b and n3 = c and n4 = d Then
Print "Your numbers matched Loop count = ";hex(cnt)
Exit do
End if
cnt += 1
Loop Until inkey = chr(27)

print
print n1,n2,n3,n4
print a,b,c,d

' decoder
randomize 112

For i = 0 to &H6D70
a = (rnd * 256)
b = (rnd * 256)
c = (rnd * 256)
d = (rnd * 256)
next
Print
Print "4 Bytes restored from loop count"
print a,b,c,d
sleep
The odds of matching 4 numbers is 1 in 4,294,967,296. You wont have many matches.
Last edited by neil on Nov 20, 2023 20:18, edited 1 time in total.
dodicat
Posts: 7923
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Data Compression Different Methods

Post by dodicat »

Slightly faster method, any seed.

Code: Select all

Dim As Uinteger i,x,cnt
Dim As Ubyte n1,n2,n3,n4
Dim As Ubyte a,b,c,d

'my numbers
n1 = 33:n2 = 207:n3 = 94:n4 = 70

'seed number
randomize timer
dim as long seed=rnd*50  

randomize seed
dim as byte fa,fb,fc,fd

Do
if fa=0 then a = (rnd * 256)
if fb=0 then b = (rnd * 256)
if fc=0 then c = (rnd * 256)
if fd=0 then d = (rnd * 256)
if n1=a then fa=1
if n2=b then fb=1
if n3=c then fc=1
if n4=d then fd=1
if n1 = a and n2 = b and n3 = c and n4 = d Then
Print "Your numbers matched Loop count = ";(cnt)
Exit do
End if
cnt += 1
Loop Until inkey = chr(27)

print
print n1,n2,n3,n4
print a,b,c,d

' decoder


randomize seed
fa=0:fb=0:fc=0:fd=0
For i = 0 to cnt
if fa=0 then a = (rnd * 256)
if fb=0 then b = (rnd * 256)
if fc=0 then c = (rnd * 256)
if fd=0 then d = (rnd * 256)
if n1=a then fa=1
if n2=b then fb=1
if n3=c then fc=1
if n4=d then fd=1

next
Print
Print "4 Bytes restored from loop count"
print a,b,c,d
print "seed ";seed
sleep  
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

@dodicat
Unless I missed something, your improvements to the algorithm could work consistently.
If you can save a 1 byte to 3 byte loop count number and restore your 4 bytes, then this is data compression.
It looks like you don't even have to save the seed number. Thanks for your help.

I just saw it. My 4 numbers I wanted to compress are stored in "n1, n2, n3, n4" and are in the decoder. If they were not in the decoder, then this would be data compression.
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

I found this information on encode.su. I thought it was interesting.
If you could compress this file by one byte, you could have won a $100 prize a few years ago.
https://marknelson.us/assets/2012-10-09 ... Digits.bin
As far as I know, the contest has ended with no winners.
For over 20 years now, the compression community has been trying to compress a file that can't even be compressed by one byte.
I don't believe lossless recursive random data compression is possible if they can't even compress a file by one byte.
Here's an old link with the rules.
https://marknelson.us/posts/2012/10/09/ ... s-ten.html

Someone claims they compressed the file by 8 Bytes.
https://github.com/cbarraford/random-data-challenge

This is how you could make the file 3 Bytes smaller.
The rules allow you to breakup the file into separate files. I would have searched the file for number 8. Then broke the files off at number 8's. In File 1 I would keep the number 8. Files 2,3,4 remove the 8's. This would be 3 Bytes smaller than the original file. To recover seek the end of the File 1 and load the character and put it at the end of the other 3 files. Then concatenate the files. This way I am not hiding any data in my algorithm. Mark Nelson made up the rules.

Number 8 has been removed from the end of 3 files.

File1 File2 File3 File4
-22----7F----63----33
-43----2E----44----6F
-7E----54----FF----9E
-37----22----16----FF
-08
Last edited by neil on Nov 22, 2023 3:56, edited 5 times in total.
dodicat
Posts: 7923
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Data Compression Different Methods

Post by dodicat »

Hi neil.
If you do pairs, it remains fast, only two numbers are required for the decode, the total count and a concatenation.
Any known seed.
example:

Code: Select all

 

Dim As Ulongint i,x,cnt,c1,c2,f1,f2,tot
Dim As Ubyte n1,n2,n3,n4
Dim As Ubyte a,b,c,d

Function mkulongint(n1 As Ulong,n2 As Ulong) As Ulongint
    Dim As Ulongint res
    Cast(Ulong Ptr,@res)[0]=n1
    Cast(Ulong Ptr,@res)[1]=n2
    Return res
End Function

#define low(n) Cast(Ulong Ptr,@(n))[0]
#define high(n) Cast(Ulong Ptr,@(n))[1]

Screen 17
start:
'my numbers
n1 = Rnd*256:n2 = Rnd*256:n3 = Rnd*256:n4 = Rnd*256
c1=0
c2=0
f1=0
f2=0
cnt=0
Var n=Timer
'seed
Randomize n

Cls

Do
    If c1=0 Then
        a = Rnd*256
        b = Rnd*256
    End If
    If c2=0 Then
        c = Rnd*256
        d = Rnd*256
    End If
    
    If n1 = a And n2 = b And f1=0 Then
        c1=cnt
        f1=1
    End If
    
    If  n3 = c  And n4 = d And f2=0 Then
        c2=cnt
        f2=1
    End If
    
    If c1 <>0 And c2<>0 Then
        tot=mkulongint(c1,c2)
        Print "Your numbers matched Loop count = ";(cnt)
        Exit Do
    End If
    cnt += 1
Loop Until Inkey = Chr(27)

Print
Print n1,n2,n3,n4
Print a,b,c,d


'==============================================
' decoder
'pass cnt and tot only
'seed
Randomize n

Var cc1=low(tot)
Var cc2=high(tot)
For i = 0 To cnt
    If i <= cc1 Then
        a = Rnd*256
        b = Rnd*256
    End If

    If i<=cc2 Then
        c = Rnd*256
        d = Rnd*256
    End If
Next
Print
Print "4 Bytes restored from loop count"

Print a,b,c,d
Print "press a key to refresh"
Sleep
Goto start

 
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

@dodicat
Nice work. I like the way you paired the PRNG's together it's very fast.
Thank you very much for your work.
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

Matt Mahoney's compressor compressed Mark Nelson's file that I posted that can't be compressed.
This was after using my sorting algorithm on the file. This did not work with ZIP.

Why did this work?
Did sorting the file create patterns?

paq8px -8 random.dat

This was before sorting the file got larger.
Total input size : 415248
Total archive size : 415403

paq8px -8 randomsorted.bin

This was after sorting the file got 96k smaller.
Total input size : 415248
Total archive size : 319429

I added 7 zero's to the end of the random.dat file because of my sorting algorithm.
It sorts in 8 byte blocks. 8 x 51906 = 415248. The original file size was 415241.

His bbb and paq9a compressor also worked.

More testing is needed. I still have to unsort the file.

Here's the sorting algorithm.

Code: Select all

'sorting a random data file by neil

DIM AS UBYTE a,b,c,d,e,f,g,h,nr
Dim As UShort i

open "random.dat" FOR INPUT AS #1
open "randomsorted.bin" FOR BINARY AS #2

for i = 1 to 51906

get #1,,a
get #1,,b
get #1,,c
get #1,,d
get #1,,e
get #1,,f
get #1,,g
get #1,,h

for nr = 1 to 8
if a >= b THEN swap a,b
if b >= c THEN swap b,c
if c >= d THEN swap c,d
if d >= e THEN swap d,e
if e >= f THEN swap e,f
if f >= g THEN swap f,g
if g >= h THEN swap g,h
next

put #2,,a
put #2,,b
put #2,,c
put #2,,d
put #2,,e
put #2,,f
put #2,,g
put #2,,h
next

close #1
close #2

print "Done"

sleep
Here's some links to the paq8px compressor.
Windows https://drive.google.com/file/d/1UY0ydz ... vilmB/view
Windows https://encode.su/attachment.php?attach ... 1683741419
Linux https://github.com/hxim/paq8px
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

I was looking up different sorting algorithms. The merge sort seemed interesting. You still would need some type of reverse merge sort to put the file back to it's original correct order.

Merge Sort is an implementation of divide and conquer algorithm. You need to divide the whole array into sub arrays. For example [1, 3 ,5, 6, 2, 4,1 10] is divided into [1 3 5 6] and [2 4 1 10]. [1 3 5 6] is divided into [1 3] and [5 6]. Now as both [1 3] and [5 6] are sorted swap procedure is not needed.
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

By sorting a random data file I have been able to compress it. I have not found an efficient way to store the data needed to unsort the file. Different types of sorting methods could be the future of data compression.
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

I did not think this would work. I only sorted 2 bytes at a time, and a random data file was compressed.

paq8px -8 randomsorted.bin

Total input size : 415248
Total archive size : 402568

With an absolute minimal 2 Byte sort the file compressed to about 12k smaller.

Code: Select all

'minimal 2 byte sort data compression test by neil

DIM AS UBYTE a,b
Dim As Ulong i

open "random.dat" FOR INPUT AS #1
open "randomsorted.bin" FOR BINARY AS #2

for i = 1 to 207624

get #1,,a
get #1,,b

if (a and 128) = 0 Then put #2,,a
if (b and 128) = 0 Then put #2,,b

if (a and 128) Then put #2,,a
if (b and 128) Then put #2,,b
next

close #1
close #2

print "Done"
sleep
neil
Posts: 431
Joined: Mar 17, 2022 23:26

Re: Data Compression Different Methods

Post by neil »

@moderators I probably should quit posting about compressing random data before I get permanently banned from the FreeBasic Forum like another member was. Yes?
Imortis
Moderator
Posts: 1918
Joined: Jun 02, 2005 15:10
Location: USA
Contact:

Re: Data Compression Different Methods

Post by Imortis »

neil wrote: Nov 27, 2023 21:35 @moderators I probably should quit posting about compressing random data before I get permanently banned from the FreeBasic Forum like another member was. Yes?
You are not making yourself a public nuisance, posting dubious medical advice, or any of the many other things that went into the other member being banned.

That user even made another account, which we could see was clearly the same member, and we were prepared to ignore it until the same pattern of behavior began again.

If you continue to be a reasonable member of the forum, and do as asked by moderators if things get out of hand, you should be just fine.
Post Reply