Home / Index
 www.icosaedro.it 

 Creating generators for arbitrary random numbers distributions

Last update: 2018-08-27

This article shows how a random numbers generator with a given arbitrary distribution of probability could be created. All the examples of code shown here are based on the it\icosaedro\utils\Random PHP class that implements an uniform generator; that generator will be the base of all the other random generators.

Contents

Notations and definitions

The probability density function P(x) of a random variable x is a non-negative function; its integral over a specific interval [a,b] is the probability of finding x in that interval:

ab dt P(t)   = probability of finding x∈[a,b]

Obviously, the probability of finding a value on the whole real range [-∞,+∞] must be 1, that is the P function must be normalized.

The cumulative distribution function is defined as the probability of finding x in the range [-∞,x]:

C(x) ≡ ∫-∞x dt P(t)

Being P normalized, it follows that the cumulative distribution function C is monotonically increasing from C(-∞)=0 up to C(+∞)=1. Note that the derivative of the cumulative distribution function is the probability density function itself:

The mean value of the random variable x is defined as the "weighted" value of x:

μ ≡ ∫-∞+∞ dt P(t) t

The variance of the random variable x is defined as the "weighted" value of the squared deviation from the mean value (x-μ)2:

σ2 ≡ ∫-∞+∞ dt P(t) (t - μ)2

The standard deviation of the random variable x is defined as the square root of the variance:

σ = sqrt(σ2)

The Random class

The it\icosaedro\utils\Random class implements a random numbers generator that generates "int" and "float" values as well. In this article we will use only floating-point numbers. To create a new generator, an instance of that class must be created. Without arguments, the constructor will initialize a new sequence; a starting seed may be provided to generate a repeatable sequence, which could be handy for testing purposes. The same class also creates an internal instance randomly initialized that can be retrieved with the getCommon() static method; applications that only need to generate a random number "on the spot" may share this instance rather than trying to create their own.

In this article we will deal only with floating-point random numbers. So, for example, here is the code to display 10 random numbers in the range [0,1[:

use it\icosaedro\utils\Random;

$rnd = Random::getCommon();
for($i = 0; $i < 10; $i++)
	echo $rnd->randomFloat(), "\n";

The current implementation of the randomFloat() method only fills in 52 of the 53 bits of the IEEE 754 floating-point number mantissa, so the smallest returned value is exactly zero, while the higher value is 1-2-52. These facts should be taken into account while evaluating certain expressions to avoid INF and NAN results. For example, if $x is a value returned from the randomFloat() method, then:

Expression Resulting range
$x [0, 1-2-52] ≅ [0, 0.999999999999999778]
1.0 - $x [2-52, 1] ≅ [2.220e-16, 1]
1.0 / (1.0 - $x) [1, 252] ≅ [1, 4.504e15]
log(1.0 - $x) [-52 log(2), 0] ≅ [-36.04, 0]

Interface for random numbers generators

For the purpose to test and compare different generators, we define this abstract class as the base for any other generator:

use it\icosaedro\utils\Random;

abstract class DistributionAbstract {
	
	/**
	 * @var Random
	 */
	protected $generator;
	
	/**
	 * Initializes this object by setting the source of random bits.
	 * @param Random $generator Source of random bits. Normally the implementations
	 * of this abstract class will invoke the randomFloat() method only.
	 */
	function __construct($generator)
	{
		$this->generator = $generator;
	}
	
	/**
	 * Returns the next random value. Although random, these value are expected
	 * to be distributed according to some specific probability distribution law.
	 * @return float
	 */
	abstract function getValue();
}

Then, a random numbers generator is an object that spits out floating point numbers through its getValue() method. The constructor allows to pass the base uniform generator object used as a source of random bits; implementations may also extend the constructor to add their own specific parameters.

Uniform random generator

As a first example of generator, the uniform random numbers generator could be defined in this way:

class UniformDistribution extends DistributionAbstract {
	
	/**
	 * Returns the next random value.
	 * @return float Random number in the range [0,1[.
	 */
	function getValue(){ return $this->generator->randomFloat(); }
}

A new uniform random number generator could be created in several ways depending on the need:

// Use the pre-defined internal generator:
$uniform1 = new UniformDistribution(Random::getCommon());
// Use a brand new generator randomly initialized:
$uniform2 = new UniformDistribution(new Random());
// Generate the same sequence starting from a given seed:
$uniform3 = new UniformDistribution(new Random(12345));

A new random number can then be retrieved by invoking $uniform1->getValue(). For example, to display 10 random numbers the code is:

for($i = 0; $i < 10; $i++)
	echo $uniform1->getValue(), "\n";

The probability density function of this generator is then:

PU(x) ≡ 1   for x∈[0,1[,   zero elsewhere

and its cumulative density function is:

CU(x)   = 0 for x<0,   x for x∈[0,1],   1 for x>1.

The mean of the uniform distribution is μU=1/2 and its standard deviation is σU=1/sqrt(12)≅0.2887.

The transformation method to create arbitrary generators

In this paragraph we will show how to create a generator for a given arbitrary distribution by using the transformation method. The transformation method allows to calculate a function y=f(x) that applied to the values x of a distribution X gives the values y of another distribution Y. In this way our uniform generator can be used to create generators for any other distribution we want by simply applying a specific transformation function f. But first, a bit of theory.

Say we have a generator X of random numbers x with a given probability density function PX(x) and a cumulative density function defined as

CX(x) ≡ ∫-∞x dt PX(t)

and say we need another generator Y with probability distribution function PY and cumulative density function

CY(y) ≡ ∫-∞y dt PY(t)

This Y generator can be obtained by applying the following transformation function to the values of X:

y = f(x) ≡ CYinv( CX(x) )

where CYinv is the inverse function of CY, that is:

CY( CYinv(t) ) = 1   for any t∈[0,1]

Prove. Lets start by defining a new set of random numbers Z generated out of the X set by applying this function:

(1)   z = f(x) ≡ CYinv( CX(x) )

This f function is monotonically increasing because both CX and CYinv are monotonically increasing. This implies that for any given value x of X, all the samples that fall before this value are transformed by f(x) into values z of Z that fall before z=f(x). In other words, the value of the cumulative density function of Z equals the value of the cumulative density function of X for any pair of matching z,x values:

(2)   CZ(z) = CX(x)   for any z=f(x)

Now, by applying CY to both sides of the equation (1), we get

CY(z) = CX(x)

Then, by replacing this latter in (2) we finally get

CZ(z) = CY(z)

that is, our Z set of random numbers generated by transforming X with f follows the same cumulative distribution function of Y, and then also the same probability density function:

This proves the Z set has the same distribution of the target set Y.

This transformation method allows to easily generate arbitrary distributions of random numbers provided that the cumulative function of the starting set X and the inverse of the cumulative function of the target set Y could be analytically calculated with practical analytical formulas. We will show examples where these analytical formulae can be easily found and calculated.

The granularity issue

We seen our uniform random generator may generate up to 252 different floating point numbers equally spaced by Δx = 2-52. Starting from this random set X we may generate a random set Y by applying some transformation function f so that y ≡ f(x); the granularity of this set is then:

Depending on the specific transformation function f, the granularity of the generated set Y could become quite coarse, and this may in some case affect calculations and simulations. If this issue occurs, a very simple workaround is to replace x with the sum of two random floating point numbers,

x → xhi + 2-52 xlow

the first one covering the first 52 bists of the mantissa as usual, and the second one covering the next 52 bits of the wider (simulated) mantissa, for a total of 104 bits. Note that the 2-52 factor simply shifts right the low part by 52 binary digits. In this way the granularity of the starting set X becomes Δx = 2-104 so the granularity of the resulting set Y drastically reduces by several orders of magnitude. Obviously, an ordinary IEEE 754 double-precision floating point number cannot hold so many digits, so we must keep the high and low parts of x in two separate variables.

The extra benefit of this workaround is that the range of the generated set extends -- sometimes too much causing overflow, so some care should be applied verifying at each step of the intermediate calculations if the range of possible values is supported by the IEEE 754 format without generating INF, NAN or de-normalized values.

Linear increasing distribution

First example of generator obtained with the transformation method:

PLI(y)   ≡ 0 for y<0,   ≡ 2y for y∈[0,1],   ≡ 0 for 1<y

The cumulative density function is then:

CLI(y)   = 0 for y<0,   = y2 for y∈[0,1],   = 1 for 1<y.

The inverse cumulative of this distribution on the [0,1] interval is then:

CLIinv(t) = sqrt(t)

So, the transformation function to apply to our uniform distribution U to get this LI distribution is:

f(x) = CLIinv( CU(x) ) = sqrt(x)

The mean is μLI = 2/3 ≅ 0.6667 and the standard deviation is σLI = 1/(3 sqrt(2)) ≅ 0.2357.

The PHP implementation becomes:

class LinearIncreasingDistribution extends DistributionAbstract {
	
	/**
	 * Returns the next random value.
	 * @return float Random number in the range [0,1[.
	 */
	function getValue(){ return sqrt( $this->generator->randomFloat() ); }
}

Linear decreasing distribution

The linear decreasing distribution is defined by

PLD(y)   ≡ 0 for y<0,   ≡ 2 - 2y for y∈[0,1],   ≡ 0 for 1<y

Its cumulative distribution is then:

CLD(y)   = 0 for y<0,   = 2y - y2 for y∈[0,1],   = 1 for 1<y.

The inverse cumulative of this distribution on the [0,1] interval is then:

CLDinv(t) = 1 - sqrt(1-t)

So, the transformation function to apply to our uniform distribution U to get this LD distribution is:

f(x) = CLDinv( CU(x) ) = 1 - sqrt(1-x)

The mean is μLD = 1/3 ≅ 0.3333 and the standard deviation is σLD = 1/(3 sqrt(2)) ≅ 0.2357.

The PHP implementation becomes:

class LinearDecreasingDistribution extends DistributionAbstract {
	
	/**
	 * Returns the next random value.
	 * @return float Random number in the range [0,1[.
	 */
	function getValue(){ return 1.0 - sqrt( 1.0 - $this->generator->randomFloat() ); }
}

Power distribution

It is defined by this probability density function:

PPOW(y)   ≡ 0 for y<0,   ≡ (1+k) (1 - y)k for y∈[0,1],   ≡ 0 for 1<y

Its cumulative distribution is:

CPOW(y)   = 0 for y<0,   = 1 - (1 - y)1+k for y∈[0,1],   = 1 for 1<y.

The inverse cumulative of this distribution on the [0,1] interval is then:

CPOWinv(t) = 1 - (1 - t)1/(1+k) for t∈[0,1]

So, the transformation function to apply to our uniform distribution U to get this power distribution is:

f(x) = CPOWinv( CU(x) ) = 1 - (1 - x)1/(1+k)

The PHP implementation becomes:

class PowerDistribution extends DistributionAbstract {
	
	private $p = 0.0;

	/**
	 * 
	 * @param Random $generator
	 * @param float $k
	 */
	function __construct($generator, $k)
	{
		parent::__construct($generator);
		$this->p = 1.0 / (1.0 + $k);
	}
	
	/**
	 * @return float Value in the range [-1,1[.
	 */
	function getValue(){
		$x = $this->generator->randomFloat();
		return 1.0 - pow(1.0 - $x, $this->p);
	}
}

Triangle distribution

It is defined by this probability density function:

PTRI(y)   ≡ 0 for y<-1,   ≡ y+1 for y∈[-1,0],   ≡ 1-y for y∈[0,1],   ≡ 0 for 1<y

Its cumulative distribution is:

CTRI(y)   = 0 for y<-1,   = 1/2 y2 + y + 1/2 for y∈[-1,0],   = - 1/2 y2 + y + 1/2 for y∈[0,1],   = 1 for 1<y.

The inverse cumulative of this distribution on the [0,1] interval is then:

CTRIinv(t)   = -1 + sqrt(2 t) for t∈[0,1/2],   = 1 - sqrt(2 (1-t)) for t∈[1/2,1]

So, the transformation function to apply to our uniform distribution U to get this TRI distribution is:

f(x) = CTRIinv( CU(x) )   = -1 + sqrt(2 x) for x∈[0,1/2],   = 1 - sqrt(2 (1-x)) for x∈[1/2,1]

The mean is μTRI=0 and the standard deviation is σTRI = 1/sqrt(6) ≅ 0.4082.

The PHP implementation becomes:

class TriangleDistribution extends DistributionAbstract {
	
	/**
	 * @return float Value in the range [-1,1[.
	 */
	function getValue(){
		$x = $this->generator->randomFloat();
		if( $x < 0.5 )
			return sqrt(2*$x) - 1;
		else
			return 1 - sqrt(2*(1.0-$x));
	}
}

Logistic distribution

The logistic distribution (see https://en.wikipedia.org/wiki/Logistic_distribution) has many applications, it is very similar to the Normal distribution but it can be solved analytically, and it is often used as a replacement for the Normal distribution. Its probability density function is defined by:

Its cumulative distribution is:

and its inverse is:

CLinv(t) = μ + s log(t/(1-t))

so the transformation function to apply to the uniform distribution to get values distributed according to the logistic distribution is:

y = f(x) ≡ CLinv( CU(x) ) = μ + s log(x/(1-x))

The mean value is μL=μ and the standard deviation is σL = π s / sqrt(3).

The PHP generator is:

class LogisticDistribution extends DistributionAbstract {
	
	public $mu = 1.0, $s = 1.0;
	
	/**
	 * Initializes the "logistic" generator.
	 * @param Random $generator Source of random bits.
	 * @param float $mu "mu" parameter, that is the mean.
	 * @param type $s "s" parameter, that is sigma*sqrt(3)/M_PI being sigma the
	 * standard deviation.
	 */	
	function __construct($generator, $mu, $s)
	{
		parent::__construct($generator);
		$this->mu = $mu;
		$this->s = $s;
	}
	
	/**
	 * Returns the next random value.
	 * @return float Random number.
	 */
	function getValue(){
		do {
			$x = $this->generator->randomFloat();
		} while( $x == 0.0 );
		return $this->mu + $this->s * log($x / (1.0 - $x));
	}
}

The range of values that can be returned by our implementation is limited by the range of values of x ∈ [2-52, 1-2-52], and results to be y ∈ μ ± 36.04 s, quite far from infinity but still large enough for most applications. The granularity can be estimated to be:

becomes Δy ≅ s for Δx = 2-52 near to the limits of the generated interval.

Normal (or Gaussian) distribution

It is defined by the following probability density function:

The mean is μN = μ and the standard deviation is σN = σ. The cumulative distribution is:

This function cannot be inverted analytically, so the transformation method cannot be applied here and we must resort to some other solution. Among the possible alternatives (see https://en.wikipedia.org/wiki/Normal_distribution for a discussion) we choose the Box-Muller method that requires 2 random numbers x1 and x2 to calculate two independent values y1 and y2 distributed as required:

y1 = σ sqrt(-2 log(1-x1)) cos(2 π x2) + μ
y2 = σ sqrt(-2 log(1-x1)) sin(2 π x2) + μ

These two values are statistically independent from a strict mathematical point of view, but actually they are not due to our implementation. In fact, in our case log(1-x) ranges from about -36.04 up to exactly zero with quite coarse granularity toward the beginning of this interval, a granularity that in the worst case of cosine=1 can be estimated to be:

for x = 1-2-52 and Δx = 2-52. So, although the two values y1 and y2 are statistically independent in theory, they are correlated by the same granularity issue in our implementation. For this reason we must keep only one value, say y1 per turn:

class NormalDistribution extends DistributionAbstract {
	
	private $mu = 0.0;
	private $sigma = 0.0;
	
	/**
	 * @param Random $generator
	 * @param float $mu Mean.
	 * @param float $sigma Standard deviation.
	 */
	function __construct($generator, $mu, $sigma)
	{
		parent::__construct($generator);
		$this->mu = $mu;
		$this->sigma = $sigma;
	}
	
	/**
	 * Returns the next random value.
	 * @return float Random value.
	 */
	function getValue(){
		/* Box-Muller method: */
		$x1 = $this->generator->randomFloat();
		$x2 = $this->generator->randomFloat();
		return $this->sigma*sqrt(-2.0*log(1.0 - $x1))*cos(2.0*M_PI*$x2) + $this->mu;
	}
}

Finally, note that the maximum values that can be returned by our generator are μ ± 8.490 σ when x1 = 1 - 2-52, quite far from the infinity but still large enough for most applications.

Full test source code

The source code that generates and compares all the distributions here described is available here: test-dist.php.

References


Umberto Salsi
Comments
Contact
Site map
Home / Index
Still no comments to this page. Use the Comments link above to add your contribute.