Search found 373 matches

by Provoni
Jun 20, 2020 10:56
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Re: Duplicates

Don't think my idea will work.
by Provoni
Jun 20, 2020 10:19
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Re: Duplicates

8 buckets: For every new bucket I add a unique character to the string that is otherwise unused throughout the corpus. Okay or not?

EDIT: Not okay, there was an error in my application of the idea.

screenres 640,480,32
dim as uinteger i,j,b,l
dim as string s
dim as double t=timer
redim shared as ubyte ...
by Provoni
Jun 20, 2020 9:56
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Re: Duplicates

I just realized that multiple CRC32 buckets can be used like so:

if crc32array(0,j)=0 then 'unique
    crc32array(0,j)=1
    print #2,s
else 'collision, go to 2nd bucket
    print #3,s
    j=crc32(s+"*") 'unused character in corpus
    if crc32array(1,j)=0 then 'unique
        crc32array(1,j)=1
        print #2,s
    else 'colli...
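
One property worth checking before relying on a second bucket indexed by crc32(s+"*"): a CRC is a running state, so the CRC of s+"*" can be computed from crc32(s) and the appended byte alone. The sketch below illustrates this with a standard table-driven CRC-32 (reflected polynomial &hEDB88320). The names initcrc, crc32 and crc32_append are hypothetical and not taken from the truncated post, whose own crc32 routine is not shown.

```freebasic
' Sketch only: a standard CRC-32, written here for illustration.
dim shared as ulong crctable(255)

sub initcrc()
    for i as ulong = 0 to 255
        dim as ulong c = i
        for k as integer = 1 to 8
            if (c and 1) <> 0 then
                c = (c shr 1) xor &hEDB88320ul
            else
                c = c shr 1
            end if
        next
        crctable(i) = c
    next
end sub

function crc32(byref s as const string) as ulong
    dim as ulong c = &hFFFFFFFFul
    for i as integer = 0 to len(s) - 1
        c = crctable((c xor s[i]) and 255) xor (c shr 8)
    next
    return c xor &hFFFFFFFFul
end function

' Extend a finished CRC-32 by one byte, using only the CRC value itself.
function crc32_append(byval crc as ulong, byval b as ubyte) as ulong
    dim as ulong c = crc xor &hFFFFFFFFul          ' undo the final inversion
    c = crctable((c xor b) and 255) xor (c shr 8)   ' one ordinary table step
    return c xor &hFFFFFFFFul                       ' redo the final inversion
end function

initcrc()
dim as string s = "some line of text"
' Both columns show the same value for any s.
print hex(crc32(s + "*")), hex(crc32_append(crc32(s), asc("*")))
```

If that holds, two different lines that collide on crc32(s) also collide on crc32(s+"*"), so a second bucket indexed this way would not separate them. A second, unrelated hash function would be needed for that.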
by Provoni
Jun 20, 2020 7:53
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Re: Duplicates

Can anyone make a CRC-n? For example CRC33 or CRC35.
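
A CRC of arbitrary width can be sketched with a plain bitwise loop, as below. This is a minimal illustration, not a tuned implementation: the function name crcn is hypothetical, the initial value and final XOR are both assumed to be zero, and no polynomial is assumed for odd widths such as 33 or 35. The caller has to supply one, and a poorly chosen polynomial gives a weak checksum.

```freebasic
' Bitwise, MSB-first CRC of width n bits, 8 <= n <= 64 (assumed range).
' poly is given without its implicit top x^n term.
function crcn(byref s as const string, byval n as integer, byval poly as ulongint) as ulongint
    dim as ulongint topbit = 1ull shl (n - 1)
    dim as ulongint mask = (topbit - 1) or topbit   ' low n bits set, avoids a shift by 64
    dim as ulongint c = 0
    for i as integer = 0 to len(s) - 1
        c xor= culngint(s[i]) shl (n - 8)           ' feed the byte into the top 8 bits
        for k as integer = 1 to 8
            if (c and topbit) <> 0 then
                c = ((c shl 1) xor poly) and mask
            else
                c = (c shl 1) and mask
            end if
        next
    next
    return c
end function

' Sanity check against a published parameter set: CRC-16/XMODEM
' (width 16, poly &h1021, init 0) has the check value 31C3 for "123456789".
print hex(crcn("123456789", 16, &h1021))

' A 64-bit CRC using the ECMA-182 polynomial.
print hex(crcn("123456789", 64, &h42F0E1EBA9EA3693ull))
```

Because the initial value is zero, leading zero bytes do not change the result, which is unlikely to matter for lines of text.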
by Provoni
Jun 20, 2020 7:29
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Re: Duplicates

Can't keep up with all the information but thanks a lot!

p.p.s. when replying please specify if this is for a one-off case, or that you have to do this e.g. every month/year etc for new measurement data or so.

Maybe about 1-3 times a year. I am creating letter n-gram frequencies for my solver http:...
by Provoni
Jun 18, 2020 19:01
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Re: Duplicates

Thanks everyone for all the replies. Still need to catch up.

Isn't a checksum just a 'low quality' hash?

No clue, it could be.

So we’re talking roughly a billion lines of text? One question that occurs to me is, how many unique lines are there likely to be?

I never counted the lines, will follow up ...
by Provoni
Jun 16, 2020 17:48
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Re: Duplicates

Looks like a case for hashing

Thanks jj2007. I was thinking of using a checksum for each line of text.

Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a datab...
by Provoni
Jun 15, 2020 16:46
Forum: General
Topic: Duplicates
Replies: 34
Views: 914

Duplicates

Hey all,

I have a very large text file (nearly 1 TB), and each line contains some text with a minimum length of twenty bytes and a maximum length of perhaps thousands of bytes. I want to remove the duplicate entries from this file. How would one approach this in FreeBASIC?

Thanks
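
One way to approach this, sketched under assumptions, is a first pass with one bit per possible CRC-32 value (2^32 bits, 512 MiB). A line whose CRC has not been seen before cannot duplicate an earlier line and is written out directly. A line whose CRC has been seen is either a true duplicate or a collision, so it goes to a much smaller candidates file for an exact comparison later. The function and file names are hypothetical, and the sketch assumes a 64-bit build with enough free memory.

```freebasic
' Sketch only: first pass of a CRC-32 based duplicate filter.
dim shared as ulong crctable(255)

sub initcrc()
    for i as ulong = 0 to 255
        dim as ulong c = i
        for k as integer = 1 to 8
            if (c and 1) <> 0 then
                c = (c shr 1) xor &hEDB88320ul
            else
                c = c shr 1
            end if
        next
        crctable(i) = c
    next
end sub

function crc32(byref s as const string) as ulong
    dim as ulong c = &hFFFFFFFFul
    for i as integer = 0 to len(s) - 1
        c = crctable((c xor s[i]) and 255) xor (c shr 8)
    next
    return c xor &hFFFFFFFFul
end function

' Returns false if memory or the input file is not available.
function dedupe_pass(byref infile as const string, byref uniqfile as const string, _
                     byref candfile as const string) as boolean
    dim as ubyte ptr seen = callocate(536870912)   ' 2^32 bits, zero-filled
    if seen = 0 then return false
    dim as long fin = freefile
    if open(infile for input as #fin) <> 0 then
        deallocate(seen)
        return false
    end if
    dim as long fu = freefile
    open uniqfile for output as #fu
    dim as long fc = freefile
    open candfile for output as #fc
    dim as string s
    do until eof(fin)
        line input #fin, s
        dim as ulong h = crc32(s)
        dim as ulong mask = 1ul shl (h and 7)
        if (seen[h shr 3] and mask) = 0 then
            seen[h shr 3] or= mask
            print #fu, s     ' CRC not seen before: not a duplicate of any earlier line
        else
            print #fc, s     ' CRC seen before: duplicate or collision, check exactly later
        end if
    loop
    close #fin
    close #fu
    close #fc
    deallocate(seen)
    return true
end function

initcrc()
' Hypothetical file names; returns false if input.txt does not exist.
print dedupe_pass("input.txt", "unique.txt", "candidates.txt")
```

The candidates file still has to be compared exactly against the lines already written, since a matching CRC does not prove that two lines are identical.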
by Provoni
Jun 14, 2020 13:59
Forum: General
Topic: PCG32II
Replies: 44
Views: 827

Re: PCG32II

Cool boxes program, dodicat!

deltarho[1859] wrote:With Windows, that is using CryptGenRandom, most folk would reckon I'd be here until Christmas before I saw 704/704. Nope - 52 seconds.

The program is slowed down by drawing functions and whatnot, which makes the speed of the RNG irrelevant.

dodicat, can you speed it up?
by Provoni
Jun 14, 2020 5:45
Forum: General
Topic: PCG32II
Replies: 44
Views: 827

Re: PCG32II

Hi David, I have a conceptual question: Most PRNGs cannot, technically speaking, produce the same number twice, since that would mean an infinite loop, right? I guess PractRand doesn't care, but a real random number generator would allow sequences such as 10, 1, 7, 7, 7, 5, 2 . Your 2cts? There is ...
by Provoni
Jun 13, 2020 8:47
Forum: General
Topic: Include dll into .exe
Replies: 15
Views: 410

Re: Include dll into .exe

jj2007 wrote:
Provoni wrote:worst case scenario.
Which is?

Program execution outside of original directory somehow.
by Provoni
Jun 13, 2020 7:31
Forum: General
Topic: Include dll into .exe
Replies: 15
Views: 410

Re: Include dll into .exe

jj2007 wrote:The cleanest solution is to include the DLL in the installation package. But then OP might run into licensing problems, I suppose. Is hiding it inside the exe the solution?

Not a licensing problem. I just want the .exe to be standalone for a worst-case scenario.
by Provoni
Jun 13, 2020 6:15
Forum: General
Topic: Array question [Solved]
Replies: 8
Views: 236

Re: Array question [Solved]

fxm wrote:Personally, I prefer the following syntax (with a pointer index), considering that the data to access is into an integer buffer pointed by test:
test[i] = 123


dodicat wrote:rewriting the array to a single-dim

Thanks
by Provoni
Jun 13, 2020 5:10
Forum: General
Topic: Problem with PUT
Replies: 5
Views: 167

Re: Problem with PUT

dodicat wrote:I don't have enough ram here, but does the crt file handling work better?

When I try 8GB with your example it says: warning 25(0): Overflow in constant conversion, and no file is made.

I've worked around the PUT issue by buffering it.
by Provoni
Jun 13, 2020 5:05
Forum: General
Topic: Problem with PUT
Replies: 5
Views: 167

Re: Problem with PUT

On my Win7-64 machine it doesn't freeze, but it doesn't write anything either. Some error checking might help.

#include "Windows.bi"
static shared as ubyte array() '8GB
redim array(8000000000)
print "GetLastError=";GetLastError(), " (8=not enough memory)"
open "te...
