# tips_tricks_30_random_is_not_random.py

# https://youtu.be/azFSGHGeawg
"""
Random numbers in computer programs are not truly random; they are pseudo-random.
A true random number can be generated by measuring natural events that are
inherently random, such as radioactive particle emission.

A pseudo-random number is generated by applying a deterministic operation to a
previous value. To get the process started, we need to supply a seed value.
The seed gives the random number generator its 'previous' value.
Changing the seed on every run gives 'irreproducible' random numbers.
Keeping the seed constant gives 'reproducible' random values.

Many algorithms exist to generate pseudo-random values.
Here are the default algorithms for Python and NumPy, respectively.
Python default: https://en.wikipedia.org/wiki/Mersenne_Twister
NumPy's newer Generator API (np.random.default_rng) uses PCG64: https://en.wikipedia.org/wiki/Permuted_congruential_generator
Note: the legacy np.random.seed / np.random.rand functions used in this script
use the Mersenne Twister (MT19937).
"""
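
# A minimal sketch of the idea described above: a toy linear congruential
# generator (LCG). This is an illustration only, NOT the algorithm that Python
# or NumPy actually uses. The name toy_lcg and the constants (a common
# textbook choice) are assumptions made for this example.
def toy_lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Return `count` pseudo-random floats in [0, 1), starting from `seed`."""
    values = []
    state = seed
    for _ in range(count):
        state = (a * state + c) % m   # the 'operation' applied to the previous value
        values.append(state / m)
    return values

# The same seed always reproduces the same sequence; a different seed does not.
print(toy_lcg(seed=42, count=3))
print(toy_lcg(seed=42, count=3))
print(toy_lcg(seed=7, count=3))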

import numpy as np
import matplotlib.pyplot as plt

# Print 5 random numbers without fixing a seed, 10 times: every row differs.
for i in range(10):
    print(np.random.rand(5))
    
# Print 5 random numbers after resetting the seed each time: every row is identical.
for i in range(10):
    np.random.seed(42)
    print(np.random.rand(5))
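
# For reference, a hedged sketch of the same reproducibility check using
# NumPy's newer Generator API (np.random.default_rng, NumPy 1.17+). It keeps
# the seed local to a generator object instead of the global np.random state.
# The variable names below are illustrative.
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
first_draw = rng_a.random(5)
second_draw = rng_b.random(5)
print(first_draw)
print(second_draw)
print(np.array_equal(first_draw, second_draw))   # identical seeds give identical draws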

# Common use in machine learning applications...
from sklearn.model_selection import train_test_split
data = np.random.rand(10)

# Without random_state, the split changes on every run.
a, b = train_test_split(data, test_size=0.6)

# With a fixed random_state, the split is reproducible.
c, d = train_test_split(data, test_size=0.6, random_state=42)



# With NumPy you can easily generate multi-dimensional random arrays.
# Test the following with and without a seed.
for i in range(5):
    np.random.seed(42)  # Comment this out to check without fixing a seed.
    my_arr = np.random.rand(512,512)
    plt.figure()
    plt.imshow(my_arr, cmap='gray')
    plt.show()

# randint lets you define a specific range (upper bound exclusive) and size.
for i in range(5):
    #np.random.seed(42)  # Uncomment this to fix the seed.
    my_arr = np.random.randint(0, 500, size=(512,512))
    plt.figure()
    plt.imshow(my_arr, cmap='gray')
    plt.show()
    

##############################

# How many tries before the numbers start to repeat?

unique_numbers = set()  # A set gives fast membership checks
for i in range(5000):  # Number of times to generate a random number
    x = np.random.randint(1, 100000)  # Random integer from 1 to 99999 (upper bound is exclusive)
    if x in unique_numbers:
        # Report and stop instead of raising, so the rest of the script still runs.
        print('The number %d repeated at try %d' % (x, i))
        break
    unique_numbers.add(x)
# Note: To pick random numbers without any duplicates, shuffle the full range
# with np.random.shuffle or use np.random.choice(..., replace=False).
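
# A sketch of drawing values with no duplicates, as the note above suggests.
# It uses Generator.choice(..., replace=False) as one possible option; the
# variable names are illustrative.
import numpy as np

sample_rng = np.random.default_rng(42)
no_repeat_sample = sample_rng.choice(np.arange(1, 100000), size=5000, replace=False)
# Both counts match, so every drawn value is distinct.
print(len(no_repeat_sample), len(np.unique(no_repeat_sample)))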
####################

# The likelihood of getting duplicates is very high, and they happen more often than we think.
# We may think that if we have 1,000,000 numbers to pick from, it takes about a million
# tries to repeat a pick, but it only takes about 1,200 picks to reach a 50% chance of a repeat.
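
# A small sketch that puts a number on the claim above: the smallest number of
# picks at which a repeat becomes more likely than not when drawing uniformly
# from pool_size values. The function name picks_for_even_odds is hypothetical.
def picks_for_even_odds(pool_size):
    p_all_different = 1.0
    picks = 0
    while p_all_different > 0.5:
        # The next pick must avoid the `picks` values already drawn.
        p_all_different *= (pool_size - picks) / pool_size
        picks += 1
    return picks

print(picks_for_even_odds(365))         # the classic birthday case
print(picks_for_even_odds(1_000_000))   # roughly 1.18 * sqrt(pool_size)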

# The birthday paradox is a good example to illustrate this.

###################################################################
"""
Birthday paradox. 

Probability of two people in a group having the same birthday.
We are all egocentric and compare other birthdays to our own. But if the question
is about any two birthdays matching, the probability of a match is much higher,
because the date itself doesn't have to be a specific one.
We need just 23 people to reach a 50% probability!
https://en.wikipedia.org/wiki/Birthday_problem#Understanding_the_problem


For a sequence of N numbers, if we want to match any 2 numbers,
we have N x (N-1) / 2 pairs.
(e.g., 1st number matching 2nd, 3rd, etc. and 2nd number matching 3rd, 4th, etc.)

For example: if you have 1, 2, 3, 4 - there are 4x3/2 = 6 possibilities
1,2 - 1,3 - 1,4 - 2,3 - 2,4 - 3,4

For 10 numbers you have 45 possibilities and for 100 numbers 4950 possibilities.
For 365 numbers, it is 66430 possibilities.
So 365 people can form 66430 different pairs of individuals.
The number of pairs grows much faster than the number of people: 23 people
already form 23x22/2 = 253 pairs, and each pair is a chance for a match.
This is why, although we would think the probability of two people having the
same birthday is very low, it only takes 23 people to get to a probability of 0.5.

We have 365 possible birthdays. We need to calculate the probability that any two
birthdays in a group of n people match.
It is easier to first compute the probability that all n birthdays are different:
p_bar(n) = 365! / (365**n x (365-n)!)

For example, with 100 people in our pool:
p_bar(100) = 365! / (365**100 x (365-100)!)  # See Wikipedia for reference

Probability of at least two of the n persons having the same birthday = 1 - p_bar(n)


"""
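
# A quick check of the pair counts quoted above, using only the standard
# library (math.comb needs Python 3.8+). comb(n, 2) equals n * (n - 1) / 2.
from itertools import combinations
from math import comb

print(list(combinations([1, 2, 3, 4], 2)))       # the 6 pairs listed above
print(comb(10, 2), comb(100, 2), comb(365, 2))   # pairs for 10, 100 and 365 numbers
print(comb(23, 2))                               # pairs that 23 people can form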
###########################

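from math import factorial

n = 10      # Number of people
p_barN = factorial(365) / ((365**n) * factorial(365 - n))   # P(all n birthdays differ)
pN = 1 - p_barN                                              # P(at least one shared birthday)
print('Probability of a shared birthday among %d people: %.3f' % (n, pN))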


# Let us capture the probabilities for a varying number of people
people = []
prob = []
for i in range(0, 101):  # 0 to 100 people
    p_bar_n = factorial(365) / ((365**i) * factorial(365 - i))
    p_n = 1. - p_bar_n
    prob.append(p_n)
    people.append(i)

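plt.plot(people, prob)
plt.plot((0, 23), (0.5, 0.5), 'k--')   # 50% probability line...
plt.plot((23, 23), (0., 0.5), 'k--')   # ...reached at 23 people
plt.xlabel('Number of people')
plt.ylabel('Probability of a shared birthday')
plt.show()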
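
# A hedged Monte Carlo cross-check of the formula above: simulate groups of
# 23 people with uniformly random birthdays and count how often any two match.
# The helper name simulate_shared_birthday, the trial count, and the seed are
# assumptions made for this sketch.
import numpy as np

def simulate_shared_birthday(group_size, trials=20000, seed=42):
    sim_rng = np.random.default_rng(seed)
    birthdays = sim_rng.integers(0, 365, size=(trials, group_size))
    birthdays.sort(axis=1)
    # After sorting each group, a shared birthday shows up as two equal neighbours.
    has_match = (np.diff(birthdays, axis=1) == 0).any(axis=1)
    return has_match.mean()

print(simulate_shared_birthday(23))   # should land close to 0.5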