Random numbers not the same

New to FreeBASIC? Post your questions here.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Random numbers not the same

Post by deltarho[1859] »

@dodicat

No big deal but if Rnd = 0 in Sub shuffle then we get Swap a(n), a(n).

Instead of

Code: Select all

Swap a(n), a(range(n,Ubound(a)))
this should do it

Code: Select all

Swap a(n), a(range(n+1,Ubound(a)))
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Random numbers not the same

Post by jj2007 »

counting_pine wrote:How can you print 200 unique numbers between 0 and 99?!
Difficult indeed, especially with integers!

Another little problem:

Code: Select all

  Dim MyInt() As DWORD	; tested pseudo code
  For_ ct=0 To 999
	; load a unique random number between 1000 and 1999 into the array
	Rand(1000, 2000, MyInt(ct), unique)
  Next
  ArraySort MyInt()	; sort ascending
  For_ ct=0 To 999
	.if ct<9 || ct>=999-10
		If_ Zero? Then PrintLine "..." 
		PrintLine Str$(ct), Tb$, Str$(MyInt(ct))
	.endif
  Next
Not so random...:

Code: Select all

0       1000
1       1001
2       1002
3       1003
4       1004
5       1005
6       1006
7       1007
8       1008
...
989     1989
990     1990
991     1991
992     1992
993     1993
994     1994
995     1995
996     1996
997     1997
998     1998
999     1999
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Random numbers not the same

Post by counting_pine »

deltarho[1859] wrote:@dodicat

No big deal but if Rnd = 0 in Sub shuffle then we get Swap a(n), a(n).

Instead of

Code: Select all

Swap a(n), a(range(n,Ubound(a)))
this should do it

Code: Select all

Swap a(n), a(range(n+1,Ubound(a)))
It breaks the distribution of the shuffle if we disallow the possibility of Rnd=0, although yes, it may be worth checking for and skipping the self-swap, to make sure nothing bad happens. (I seem to recall swapping an object with itself can have construct/destroy ordering issues?)

(And yes, there is no point in looping up to k=n. A little careless on my part.)
jj2007 wrote:Difficult indeed, especially with integers!
(And yes, I should also have said Integers. There are of course infinitely many unique "numbers" between 0 and 99:)
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Random numbers not the same

Post by deltarho[1859] »

counting_pine wrote:It breaks the distribution of the shuffle if we disallow the possibility of Rnd=0
I would agree if that was true but we are not disallowing the possibility of Rnd = 0.

With 'Swap a(n), a(range(n+1,Ubound(a)))' we get 'Swap a(n), a(n+1)' by allowing Rnd = 0.

The issue with the original code is that the wrong range is being used and is not Knuth/Fisher Yates.

Code: Select all

-- To shuffle an array a of n elements (indices 1..n):
for i from 1 to n-1 do
     j ? random integer such that i+1 = j < n
     exchange a[i] and a[j]
Note i+1 and not i.
There are of course infinitely many unique "numbers" between 0 and 99:)
True if "numbers" is floating point but finite if "numbers" is either Single or Double.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Random numbers not the same

Post by deltarho[1859] »

Not sure where I got the piece of code in my last post - I have it in an old text file along with many miscellaneous items.

I have just spent a while searching for Knuth Shuffle and card shuffling algorithms and the event of an element exchanging/swapping with itself is allowed although I have not found an argument for allowing it - everyone just seems to accept it.

I mentioned earlier that the chance of Rnd = 0 will be very rare but if we allow the range to include the element being looked at then the chance of a self-swap is simply n-1 to 1 against where n is the number of elements. So, the chance of a self-swap decreases as the number of elements increases.

I take the view that a self-swap should not be allowed but then I do not have an argument for that. It is food for thought. <smile>
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Random numbers not the same

Post by counting_pine »

I guess I'm really not sure what is meant here by the idea of 'Rnd = 0'.
I don't see any special significance, compared to, say, 'Rnd = 1e-9'.

Re self-swaps.. If I adapt my code so that self-swaps are impossible, then the most immediately obvious result is that the first element will be immediately swapped out of its original place, and will never be put back.

I made a more elaborate demonstration here, to show the general distribution.

Code: Select all

const N = 3
function factorial(nn as integer) as integer
    if nn <= 1 then return 1
    return nn * factorial(nn-1)
end function

dim as integer i, rndn, r
dim as integer a(0 to N-1)

print "Self-swaps allowed:"

'' iterate through all "random" possibilities
for rndn = 0 to factorial(N)-1
    r = rndn

    for i = 0 to N-1: a(i) = i: next i

    for i as integer = 0 to N-2
        swap a(i), a(i + (r mod (N-i)))
        r \= (N-i)
    next i

    for i = 0 to N-1
        print a(i);
    next i
    print
next rndn
print

print "Self-swaps disallowed:"

'' iterate through all "random" possibilities
for rndn = 0 to factorial(N)-1
    r = rndn

    for i = 0 to N-1: a(i) = i: next i

    for i as integer = 0 to N-3
        swap a(i), a(i+1 + (r mod (N-(i+1))))
        r \= N-(i+1)
    next i

    for i = 0 to N-1
        print a(i);
    next i
    print
next rndn
print
Note that instead of using int(rnd * N), where 0<=rnd<1, I use (r mod N), where r is an integer, then divide r by N to remove this part of the entropy.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Random numbers not the same

Post by deltarho[1859] »

@counting_pine
I guess I'm really not sure what is meant here by the idea of 'Rnd = 0'.
I don't see any special significance, compared to, say, 'Rnd = 1e-9'.
There isn't. We would get Swap a(n), a(n) if Rnd was sufficiently small.

I ran your code and got

Code: Select all

Self-swaps allowed:
 0 1 2
 1 0 2
 2 1 0
 0 2 1
 1 2 0
 2 0 1
 
Self-swaps disallowed:
 1 0 2
 2 1 0
 1 0 2
 2 1 0
 1 0 2
 2 1 0
In the first case we have all possible permutations and each column has 2x0,2x1 and 2x2. I do not understand what the second case is telling me.

Anyway, I have played around with some thought experiments as follows:

Suppose we have a set of five elements and allow self_swapping. If we consider the first element then there is a 4 to 1 against the chance that it will remain in situ. If we carried out two consecutive shuffles then we get a 16 to 1 chance. With ten consecutive shuffles, the chance is greater than one million to one.

On the other hand, if self_swapping was disallowed we would not have any element left in situ after only one shuffle.

So, the first case tends to the second case the more consecutive shuffles we do.

I would love to come up with a mathematical statement here but I cannot. All that I can say is the first case does not seem right whereas the second case is doing what I would expect a shuffle to do.

Another thought experiment:

Suppose we have ten balls numbered one to ten, placed them in a drum and gave the drum a good shake. If we pull the balls out one at a time would we regard something suspicious if the fourth ball we pulled out was number four? Of course, we would not. If someone said "I can arrange for that not to happen" then I would say "No, that would be fiddling"

Two conflicting thought experiments. Which one is correct?

I have to say that I am buying into the second one more than the first.

So, it looks like that I have persuaded myself that self-swapping should be allowed. <Big smile>
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Random numbers not the same

Post by counting_pine »

deltarho[1859] wrote:

Code: Select all

Self-swaps allowed:
 0 1 2
 1 0 2
 2 1 0
 0 2 1
 1 2 0
 2 0 1
 
Self-swaps disallowed:
 1 0 2
 2 1 0
 1 0 2
 2 1 0
 1 0 2
 2 1 0
In the first case we have all possible permutations and each column has 2x0,2x1 and 2x2. I do not understand what the second case is telling me.
The second case is telling you what happens if self-swaps are disallowed.
It not only confirms that the first element can never appear in its original place, but also shows that only two permutations of six are attainable.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Random numbers not the same

Post by deltarho[1859] »

counting_pine wrote:but also shows that only two permutations of six are attainable.
I was never concerned about all possible permutations, only those which did not have self_swapping.

My second thought experiment above has convinced me that self_swapping should be allowed. It is more to do with perception than anything else.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Random numbers not the same

Post by deltarho[1859] »

More thought experiments.

Suppose we have a set of 100 elements and we have a self-swap at element number 53, say. The chance of that is 47 to 1 ( 100-53+1-1 ) against. However, what is the chance of the previous 52 elements not swapping with the 53rd element? I am not even going to try and work that out but the chance against the 53rd element remaining intact after a shuffle is much stronger than 47 to 1 against.

Going back to our numbered balls. Suppose we have just two. If self-swapping was allowed then we would pull ball number two out of the drum first 50% of the time, on average. If self-swapping was not allowed then we would pull ball number two out first with every shuffle. An onlooker, not knowing self-swapping was not allowed, may say "What kind of shuffling do you call that? I hope you are not using that drum for the national lottery."

Both of the above further strengthen the case for allowing self-swaps.

Added: I would recommend dodicat's Sub shuffle, above. The Knuth shuffle is clearly understood from 'Swap a(n), a(range(n,ubound(a)))'

On my machine, 10,000 integers are shuffled in about 0.378ms. If we use Randomize ,5 before invoking Sub shuffle then 10,000 integers are shuffled in about 109ms. That may not be an issue for you. Why use Randomize ,5? Well, being cryptographic, concerns about not having all possible permutations available is rendered academic.
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: Random numbers not the same

Post by counting_pine »

What are you deliberating over at this point? I'm not sure I follow..
The case for allowing self-swaps is not just "further strengthened", but iron-clad. You've seen that the possibility of self-swaps is crucial for the shuffle algorithm to be able to reach all permutations - much more so than a non-deterministic random number generator.

With the Knuth shuffle (i.e. with self-swaps allowed), even a fairly bad pseudorandom number generator will likely return a good sampling of the permutations, with every element occurring in any position with roughly equal probability.
With self-swaps disallowed, regardless of the random numbers generated, it's impossible for the first element to appear in its original position, and the probabilities for other elements will be skewed too.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Random numbers not the same

Post by deltarho[1859] »

counting_pine wrote:What are you deliberating over at this point?
Being very analytical sees me turning over more stones looking for trouble than most people do. <smile>
crucial for the shuffle algorithm to be able to reach all permutations
I agree but is that necessary from a shuffling perspective? If a set of numbers is sufficiently large then it will be impossible for a PRNG to reach all permutations anyway, as shown earlier.
it's impossible for the first element to appear in its original position
I know, but that did not concern me - I was concerned about shuffling.

In the first thought experiment of my last post, allowing self-swaps, it is clear that, in theory, a particular element may remain intact after a shuffle. That is not just with regard a shuffle but with each consideration of a range or iterate. So, in theory, we could have a set of 100 numbers, say, and have a bunch of them unchanged. The finished result then would appear to have terminated prematurely. In practice, however, there will exist a sufficiently large set of numbers, and not that large, where this scenario has a probability approaching zero.

I had a procedure at my local hospital a few weeks back where a tube with a camera at the end was pushed up my back passage so as to examine my bowels. I am currently on blood thinners and the consultant said that there was 1000 to 1 chance of a bleed so did I want to go ahead. I said, "I'm OK with that - 1000 to 1 doesn't bother me". As it happened the procedure did not find anything qualifying as a major concern and a different blood thinner may be recommended as a few small spots of blood were seen.

A probability approaching zero with the Knuth shuffle is not a major concern either.
Post Reply