dodicat wrote: AdeyBlue has a loop
Guys, I have feelings you know. I can't help it if my nose hooks around like that, have a heart.
some CPU info
Re: some CPU info
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: some CPU info
adeyblue wrote:have a heart
As with many of these exercises, it takes a million passes to separate methods. In practice with only a handful of passes, it doesn't matter which method we use. With RomuTrio with gcc in 64-bit mode, its best environment, I see no point in switching to the library.
And then someone comes along and says: "I'm using gas, and you won't believe how many passes I need".
Re: some CPU info
dodicat wrote: the libgcc.a has goodness knows what!
It's obscured by macros to generate the constants for various type sizes, but it's /almost/ the same as the method in post #6.
https://github.com/gcc-mirror/gcc/blob/ ... cc2.c#L842
It's the method under the 'The best method for counting bits...' line in the 'Counting bits set, in parallel' section here.
https://graphics.stanford.edu/~seander/bithacks.html
Basically all the various fancy methods of popcount, and other various funky bit twiddling things are on that page.
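For anyone who does not want to dig through the macros, here is a minimal C sketch of that parallel counting scheme, widened to 64 bits. The function name popcount64_swar is made up for illustration, and this is not the libgcc source, only the same idea written out with explicit constants.

```c
#include <stdint.h>

/* Parallel bit count for a 64-bit value (illustrative name, not libgcc's). */
unsigned popcount64_swar(uint64_t x)
{
    /* Each 2-bit field now holds the number of set bits in that field. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Add neighbouring 2-bit counts into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Add neighbouring 4-bit counts into bytes (each byte holds 0..8). */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* The multiply sums all eight bytes into the top byte. */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
```

The total is at most 64, so it fits in the top byte and no special case is needed for an all-ones input.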
Re: some CPU info
wow, dodicat. that is quite the sub. things i do not understand. if you could remove the branching slowness (IF)
not entirely sure my logic is correct
Code: Select all
Function popcount64(x As Ulongint) As Ulong
    const as ulongint mask_hi = 1 shl 63
    const as ulongint mask_lo = -1 xor mask_hi
    static as ulongint y: y = x and mask_hi
    'If x=18446744073709551615 Then Return 64
    x and= mask_lo
    x -= ((x Shr 1) And &h5555555555555555ull)
    x = (((x Shr 2) And &h3333333333333333ull) + (x And &h3333333333333333ull))
    x = (((x Shr 4) + x) And &hf0f0f0f0f0f0f0full)
    x += (x Shr 8)
    x += (x Shr 16)
    x += (x Shr 32)
    return (x And &h0000003full) + (y shr 63)
End function
Re: some CPU info
Hi dafhi.
The IF only catches the case of 64 ones, i.e. the top of ulongint, 18446744073709551615.
If you don't go there then you can scrap IF.
I get:
Shift value greater than or equal to number of bits in data type
with your adjustment, and only 32 popcount for 18446744073709551615
...
Here is a convolution (concatenation) of fb rnd using the default algorithm to achieve 1 terabyte with practrand.
Only 2 unusuals on the way up
...
rng=RNG_stdin32, seed=unknown
length= 4 gigabytes (2^32 bytes), time= 103 seconds
  Test Name                    Raw       Processed    Evaluation
  [Low1/32]BCFN(2+2,13-3,T)    R= -7.0   p =1-8.5e-4  unusual
...and 215 test result(s) without anomalies
rng=RNG_stdin32, seed=unknown
length= 8 gigabytes (2^33 bytes), time= 204 seconds
no anomalies in 229 test result(s)
rng=RNG_stdin32, seed=unknown
length= 16 gigabytes (2^34 bytes), time= 404 seconds
  Test Name                    Raw       Processed    Evaluation
  DC6-9x1Bytes-1               R= -4.9   p =1-4.3e-3  unusual
...and 239 test result(s) without anomalies
...
after that no more.
It isn't the fastest of course, but it brings fb into practrand terabyte range with its own algorithms.
I might try randomize ,2 later.
My sincere apologies AdeyBlue (nose), and thanks for looking into the library.
The concatenation:
Code: Select all
Const u32max = &hFFFFFFFFul ' ULong maximum value
#define rnd64 (culngint(rnd*u32max)+(culngint(rnd*u32max) shl 32))
#define u64rng(f,l) (culngint(rnd64) mod (((l)-(f))+1)) + (f)
'used this in practrand
'Do
'For j = 1 To 262144
' *SPtr = u64rng(0,4294967294)
'SPtr += 1
' Next
'Print S;
' SPtr = BasePtr
'Loop
dim as ulong count,v
screen 17
do
count+=1
v+=u64rng(1,99)
locate 2,2,0
print v/count,count
loop
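The concatenation idea in the two macros can be sketched in C as below. This is only an illustration under stated assumptions: next32 is a stand-in xorshift32 generator, not FreeBASIC's rnd, and the names next32, next64 and range64 are hypothetical. It assumes first <= last and that the range is not the full 64 bits, and like the macro it accepts the small bias that mod introduces.

```c
#include <stdint.h>

static uint32_t state = 2463534242u;   /* stand-in seed, must be nonzero */

/* Stand-in 32-bit source (xorshift32), used here in place of fb rnd. */
uint32_t next32(void)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

/* Concatenate two 32-bit draws: first draw is the low half, second the high half. */
uint64_t next64(void)
{
    uint64_t lo = next32();
    uint64_t hi = next32();
    return lo | (hi << 32);
}

/* Reduce a 64-bit draw to the closed range [first, last] with mod. */
uint64_t range64(uint64_t first, uint64_t last)
{
    return next64() % (last - first + 1) + first;
}
```

With range64(1, 99) the long-run average should settle near 50, which is what the screen loop above is watching.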
Re: some CPU info
@dodicat
Your 16 gigabytes are taking a long time. Are you using PractRand's '-multithreaded'? Using all available threads speeds things up quite a bit. If testing Ulongint, then use stdin64.
With regard to 'If x=18446744073709551615 Then Return 64', we are testing for a highly unlikely event on every pass of popcount64.
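One way to drop the test altogether, sketched in C with a made-up function name: the final mask of &h3f only keeps six bits, and a count of 64 needs seven, which is why the all-ones input comes out as 0 without the IF. Masking with 0x7F instead keeps the shift-and-add steps unchanged and needs no branch.

```c
#include <stdint.h>

/* Shift-and-add bit count with a 7-bit final mask (illustrative name). */
unsigned popcount64_nobranch(uint64_t x)
{
    x -= (x >> 1) & 0x5555555555555555ULL;
    x = ((x >> 2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);
    x = ((x >> 4) + x) & 0x0F0F0F0F0F0F0F0FULL;
    x += x >> 8;
    x += x >> 16;
    x += x >> 32;
    /* The low byte holds the total (0..64); 0x3F would turn 64 into 0. */
    return (unsigned)(x & 0x7F);
}
```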
Re: some CPU info
deltarho.
randomize ,2 gets nowhere. The only one that needs the concatenation is the twister (default), and randomize ,5 is flawless.
Here is the code I use (command at the top to paste to the console).
As you see I have given algo 5 a shot.
Code: Select all
'fbrnds.exe | rng_test stdin32 -multithreaded
'fbrnds.bas
randomize ,5
Const u32max = &hFFFFFFFFul ' ULong maximum value
#define rnd64 (culngint(rnd*u32max)+(culngint(rnd*u32max) shl 32))
#define u64rng(f,l) (culngint(rnd64) mod (((l)-(f))+1)) + (f)
Dim Shared S As String * 1048576
Dim As Ulong Ptr SPtr, BasePtr
Dim As Long j
SPtr = Cptr(Ulong Ptr, Strptr( S ))
BasePtr = SPtr
Do
For j = 1 To 262144
*SPtr = u64rng(0,4294967294)
SPtr += 1
Next
Print S;
SPtr = BasePtr
Loop
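The same feed-a-pipe pattern, as a rough C sketch: fill a 1 MiB block with 262144 32-bit words, write the raw bytes to stdout, repeat. The names fill_block, counter_gen and stream_forever are hypothetical, and counter_gen is only a placeholder that is NOT random; a real generator would be passed in its place. On Windows a C program would also need stdout switched to binary mode before piping.

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS 262144u              /* 262144 * 4 bytes = 1 MiB per write */

static uint32_t block[WORDS];

/* Placeholder source for the sketch: a plain counter, not a random generator. */
uint32_t counter_gen(void)
{
    static uint32_t n;
    return n++;
}

/* Fill the whole block from the supplied 32-bit source. */
void fill_block(uint32_t (*gen)(void))
{
    for (size_t i = 0; i < WORDS; i++)
        block[i] = gen();
}

/* Keep writing blocks until the reader closes the pipe. */
void stream_forever(uint32_t (*gen)(void), FILE *out)
{
    for (;;) {
        fill_block(gen);
        if (fwrite(block, sizeof block[0], WORDS, out) != WORDS)
            break;
    }
}
```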
results:
Code: Select all
Microsoft Windows [Version 10.0.19042.1165]
(c) Microsoft Corporation. All rights reserved.
C:\Users\Computer\Desktop\fb\test\practrand\msvc12_64bit>fbrnds.exe | rng_test stdin32 -multithreaded
RNG_test using PractRand version 0.94
RNG = RNG_stdin32, seed = unknown
test set = core, folding = standard (32 bit)
rng=RNG_stdin32, seed=unknown
length= 128 megabytes (2^27 bytes), time= 3.5 seconds
no anomalies in 154 test result(s)
rng=RNG_stdin32, seed=unknown
length= 256 megabytes (2^28 bytes), time= 7.4 seconds
no anomalies in 165 test result(s)
rng=RNG_stdin32, seed=unknown
length= 512 megabytes (2^29 bytes), time= 14.2 seconds
no anomalies in 178 test result(s)
rng=RNG_stdin32, seed=unknown
length= 1 gigabyte (2^30 bytes), time= 27.5 seconds
no anomalies in 192 test result(s)
rng=RNG_stdin32, seed=unknown
length= 2 gigabytes (2^31 bytes), time= 53.1 seconds
no anomalies in 204 test result(s)
rng=RNG_stdin32, seed=unknown
length= 4 gigabytes (2^32 bytes), time= 104 seconds
no anomalies in 216 test result(s)
rng=RNG_stdin32, seed=unknown
length= 8 gigabytes (2^33 bytes), time= 206 seconds
no anomalies in 229 test result(s)
rng=RNG_stdin32, seed=unknown
length= 16 gigabytes (2^34 bytes), time= 409 seconds
no anomalies in 240 test result(s)
rng=RNG_stdin32, seed=unknown
length= 32 gigabytes (2^35 bytes), time= 813 seconds
no anomalies in 251 test result(s)
rng=RNG_stdin32, seed=unknown
length= 64 gigabytes (2^36 bytes), time= 1630 seconds
no anomalies in 263 test result(s)
rng=RNG_stdin32, seed=unknown
length= 128 gigabytes (2^37 bytes), time= 3261 seconds
no anomalies in 273 test result(s)
rng=RNG_stdin32, seed=unknown
length= 256 gigabytes (2^38 bytes), time= 6523 seconds
no anomalies in 284 test result(s)
rng=RNG_stdin32, seed=unknown
length= 512 gigabytes (2^39 bytes), time= 13141 seconds
no anomalies in 295 test result(s)
rng=RNG_stdin32, seed=unknown
length= 1 terabyte (2^40 bytes), time= 26293 seconds
no anomalies in 304 test result(s)
Re: some CPU info
@dodicat
I thought that your latest PC's spec was similar to mine, but obviously not because with 'Randomize ,5' at 16 gigabytes you have 409 seconds to my 66.8 seconds.
There must be something else going on - that is a factor of 6.12.
I think that your macros are overkill - I'm using '*SPtr = Int(Rnd*2^32)' - but that will only get you another few per cent in speed.
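For reference, the Int(Rnd*2^32) scaling amounts to the following in C. This is a sketch that assumes the input is a double in [0,1); the name scale32 is made up.

```c
#include <stdint.h>

/* Map a double in [0,1) onto the full 32-bit range by truncation. */
uint32_t scale32(double u)
{
    return (uint32_t)(u * 4294967296.0);   /* 2^32 */
}
```

Because the input is strictly below 1, the product stays below 2^32 and the result never overflows the 32-bit type.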
Re: some CPU info
My macro is the main ranger in the ulongint range, but using only a ulong range for practrand.
I don't think you need concatenation for function 5, it is very random and it is ulong range anyway, so yours will do there.
But for the default twister without concatenation it fails.
My main theme is a 64 bit simulator using fb rnd and giving a good practrand.
As I said earlier, the speed is poor.
It has to call rnd twice, and rnd itself is already a double which needs to go back to ulong (twice).
So there is no practical use for it really, just an exercise in messing around and being off topic for so long.
Sorry srvaldez.
Re: some CPU info
dodicat, i've been messing with quicksort again. I found yours to be fastest. I will see if i can tweak further
[update] also made x shared, swapped the order of last 2 calls for later experiments
Code: Select all
#macro SetQsort(datatype,fname,dot)
Sub fname(a() As datatype,begin As Long,Finish As Long)
    Dim As Long i=begin,j=finish
    'Dim As datatype x =a((I+J)\2)
    x =a((I+J)\2) '' shared var
    While I <= J
        While a(I)dot direction X dot:I+=1:Wend
        While x dot direction a(J)dot:J-=1:Wend
        '' shared tmp var
        If I<=J Then tmp=a(i) : a(i)=a(j) : a(j)=tmp: I+=1:J-=1
    wend
    If I < Finish Then fname(a(),I,Finish)
    If J > begin Then fname(a(),begin,J)
End Sub
#endmacro
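For comparison, here is the same partition scheme as a plain C sketch for an array of long, sorting ascending. The name qsort_long is made up, and the pivot and swap temporary are kept local here rather than shared. Sharing them as in the macro still works because the pivot is only read inside the partition loop, which finishes before either recursive call overwrites it.

```c
/* Middle-pivot quicksort over a[begin..finish], inclusive bounds. */
void qsort_long(long *a, long begin, long finish)
{
    long i = begin, j = finish;
    long pivot = a[begin + (finish - begin) / 2];

    while (i <= j) {
        while (a[i] < pivot) i++;      /* skip items already on the left side  */
        while (pivot < a[j]) j--;      /* skip items already on the right side */
        if (i <= j) {
            long tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
            i++;
            j--;
        }
    }
    /* Recurse into whichever parts still hold more than one element. */
    if (i < finish) qsort_long(a, i, finish);
    if (j > begin)  qsort_long(a, begin, j);
}
```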