Home / Index | www.icosaedro.it |
Last update: 2018-08-27
This article shows how a random numbers generator with a given arbitrary distribution of probability could be created. All the examples of code shown here are based on the it\icosaedro\utils\Random PHP class that implements an uniform generator; that generator will be the base of all the other random generators.
The probability density function P(x) of a random variable x is a non-negative function; its integral over a specific interval [a,b] is the probability of finding x in that interval:
∫_{a}^{b} dt P(t) = probability of finding x∈[a,b]
Obviously, the probability of finding a value on the whole real range [-∞,+∞] must be 1, that is the P function must be normalized.
The cumulative distribution function is defined as the probability of finding x in the range [-∞,x]:
C(x) ≡ ∫_{-∞}^{x} dt P(t)
Being P normalized, it follows that the cumulative distribution function C is monotonically increasing from C(-∞)=0 up to C(+∞)=1. Note that the derivative of the cumulative distribution function is the probability density function itself:
The mean value of the random variable x is defined as the "weighted" value of x:
μ ≡ ∫_{-∞}^{+∞} dt P(t) t
The variance of the random variable x is defined as the "weighted" value of the squared deviation from the mean value (x-μ)^{2}:
σ^{2} ≡ ∫_{-∞}^{+∞} dt P(t) (t - μ)^{2}
The standard deviation of the random variable x is defined as the square root of the variance:
σ = sqrt(σ^{2})
The it\icosaedro\utils\Random class implements a random numbers generator that generates "int" and "float" values as well. In this article we will use only floating-point numbers. To create a new generator, an instance of that class must be created. Without arguments, the constructor will initialize a new sequence; a starting seed may be provided to generate a repeatable sequence, which could be handy for testing purposes. The same class also creates an internal instance randomly initialized that can be retrieved with the getCommon() static method; applications that only need to generate a random number "on the spot" may share this instance rather than trying to create their own.
In this article we will deal only with floating-point random numbers. So, for example, here is the code to display 10 random numbers in the range [0,1[:
use it\icosaedro\utils\Random; $rnd = Random::getCommon(); for($i = 0; $i < 10; $i++) echo $rnd->randomFloat(), "\n";
The current implementation of the randomFloat() method only fills in 52 of the 53 bits of the IEEE 754 floating-point number mantissa, so the smallest returned value is exactly zero, while the higher value is 1-2^{-52}. These facts should be taken into account while evaluating certain expressions to avoid INF and NAN results. For example, if $x is a value returned from the randomFloat() method, then:
Expression | Resulting range |
---|---|
$x | [0, 1-2^{-52}] ≅ [0, 0.999999999999999778] |
1.0 - $x | [2^{-52}, 1] ≅ [2.220e-16, 1] |
1.0 / (1.0 - $x) | [1, 2^{52}] ≅ [1, 4.504e15] |
log(1.0 - $x) | [-52 log(2), 0] ≅ [-36.04, 0] |
For the purpose to test and compare different generators, we define this abstract class as the base for any other generator:
use it\icosaedro\utils\Random; abstract class DistributionAbstract { /** * @var Random */ protected $generator; /** * Initializes this object by setting the source of random bits. * @param Random $generator Source of random bits. Normally the implementations * of this abstract class will invoke the randomFloat() method only. */ function __construct($generator) { $this->generator = $generator; } /** * Returns the next random value. Although random, these value are expected * to be distributed according to some specific probability distribution law. * @return float */ abstract function getValue(); }
Then, a random numbers generator is an object that spits out floating point numbers through its getValue() method. The constructor allows to pass the base uniform generator object used as a source of random bits; implementations may also extend the constructor to add their own specific parameters.
As a first example of generator, the uniform random numbers generator could be defined in this way:
class UniformDistribution extends DistributionAbstract { /** * Returns the next random value. * @return float Random number in the range [0,1[. */ function getValue(){ return $this->generator->randomFloat(); } }
A new uniform random number generator could be created in several ways depending on the need:
// Use the pre-defined internal generator: $uniform1 = new UniformDistribution(Random::getCommon()); // Use a brand new generator randomly initialized: $uniform2 = new UniformDistribution(new Random()); // Generate the same sequence starting from a given seed: $uniform3 = new UniformDistribution(new Random(12345));
A new random number can then be retrieved by invoking $uniform1->getValue(). For example, to display 10 random numbers the code is:
for($i = 0; $i < 10; $i++) echo $uniform1->getValue(), "\n";
The probability density function of this generator is then:
P_{U}(x) ≡ 1 for x∈[0,1[, zero elsewhere |
and its cumulative density function is:
C_{U}(x) = 0 for x<0, x for x∈[0,1], 1 for x>1. |
The mean of the uniform distribution is μ_{U}=1/2 and its standard deviation is σ_{U}=1/sqrt(12)≅0.2887.
In this paragraph we will show how to create a generator for a given arbitrary distribution by using the transformation method. The transformation method allows to calculate a function y=f(x) that applied to the values x of a distribution X gives the values y of another distribution Y. In this way our uniform generator can be used to create generators for any other distribution we want by simply applying a specific transformation function f. But first, a bit of theory.
Say we have a generator X of random numbers x with a given probability density function P_{X}(x) and a cumulative density function defined as
C_{X}(x) ≡ ∫_{-∞}^{x} dt P_{X}(t)
and say we need another generator Y with probability distribution function P_{Y} and cumulative density function
C_{Y}(y) ≡ ∫_{-∞}^{y} dt P_{Y}(t)
This Y generator can be obtained by applying the following transformation function to the values of X:
y = f(x) ≡ C_{Y}^{inv}( C_{X}(x) )
where C_{Y}^{inv} is the inverse function of C_{Y}, that is:
C_{Y}( C_{Y}^{inv}(t) ) = 1 for any t∈[0,1]
Prove. Lets start by defining a new set of random numbers Z generated out of the X set by applying this function:
(1) z = f(x) ≡ C_{Y}^{inv}( C_{X}(x) )
This f function is monotonically increasing because both C_{X} and C_{Y}^{inv} are monotonically increasing. This implies that for any given value x of X, all the samples that fall before this value are transformed by f(x) into values z of Z that fall before z=f(x). In other words, the value of the cumulative density function of Z equals the value of the cumulative density function of X for any pair of matching z,x values:
(2) C_{Z}(z) = C_{X}(x) for any z=f(x)
Now, by applying C_{Y} to both sides of the equation (1), we get
C_{Y}(z) = C_{X}(x)
Then, by replacing this latter in (2) we finally get
C_{Z}(z) = C_{Y}(z)
that is, our Z set of random numbers generated by transforming X with f follows the same cumulative distribution function of Y, and then also the same probability density function:
This proves the Z set has the same distribution of the target set Y.
This transformation method allows to easily generate arbitrary distributions of random numbers provided that the cumulative function of the starting set X and the inverse of the cumulative function of the target set Y could be analytically calculated with practical analytical formulas. We will show examples where these analytical formulae can be easily found and calculated.
We seen our uniform random generator may generate up to 2^{52} different floating point numbers equally spaced by Δx = 2^{-52}. Starting from this random set X we may generate a random set Y by applying some transformation function f so that y ≡ f(x); the granularity of this set is then:
Depending on the specific transformation function f, the granularity of the generated set Y could become quite coarse, and this may in some case affect calculations and simulations. If this issue occurs, a very simple workaround is to replace x with the sum of two random floating point numbers,
x → x_{hi} + 2^{-52} x_{low}
the first one covering the first 52 bists of the mantissa as usual, and the second one covering the next 52 bits of the wider (simulated) mantissa, for a total of 104 bits. Note that the 2^{-52} factor simply shifts right the low part by 52 binary digits. In this way the granularity of the starting set X becomes Δx = 2^{-104} so the granularity of the resulting set Y drastically reduces by several orders of magnitude. Obviously, an ordinary IEEE 754 double-precision floating point number cannot hold so many digits, so we must keep the high and low parts of x in two separate variables.
The extra benefit of this workaround is that the range of the generated set extends -- sometimes too much causing overflow, so some care should be applied verifying at each step of the intermediate calculations if the range of possible values is supported by the IEEE 754 format without generating INF, NAN or de-normalized values.
First example of generator obtained with the transformation method:
P_{LI}(y) ≡ 0 for y<0, ≡ 2y for y∈[0,1], ≡ 0 for 1<y |
The cumulative density function is then:
C_{LI}(y) = 0 for y<0, = y^{2} for y∈[0,1], = 1 for 1<y. |
The inverse cumulative of this distribution on the [0,1] interval is then:
C_{LI}^{inv}(t) = sqrt(t)
So, the transformation function to apply to our uniform distribution U to get this LI distribution is:
f(x) = C_{LI}^{inv}( C_{U}(x) ) = sqrt(x)
The mean is μ_{LI} = 2/3 ≅ 0.6667 and the standard deviation is σ_{LI} = 1/(3 sqrt(2)) ≅ 0.2357.
The PHP implementation becomes:
class LinearIncreasingDistribution extends DistributionAbstract { /** * Returns the next random value. * @return float Random number in the range [0,1[. */ function getValue(){ return sqrt( $this->generator->randomFloat() ); } }
The linear decreasing distribution is defined by
P_{LD}(y) ≡ 0 for y<0, ≡ 2 - 2y for y∈[0,1], ≡ 0 for 1<y |
Its cumulative distribution is then:
C_{LD}(y) = 0 for y<0, = 2y - y^{2} for y∈[0,1], = 1 for 1<y. |
The inverse cumulative of this distribution on the [0,1] interval is then:
C_{LD}^{inv}(t) = 1 - sqrt(1-t)
So, the transformation function to apply to our uniform distribution U to get this LD distribution is:
f(x) = C_{LD}^{inv}( C_{U}(x) ) = 1 - sqrt(1-x)
The mean is μ_{LD} = 1/3 ≅ 0.3333 and the standard deviation is σ_{LD} = 1/(3 sqrt(2)) ≅ 0.2357.
The PHP implementation becomes:
class LinearDecreasingDistribution extends DistributionAbstract { /** * Returns the next random value. * @return float Random number in the range [0,1[. */ function getValue(){ return 1.0 - sqrt( 1.0 - $this->generator->randomFloat() ); } }
It is defined by this probability density function:
P_{POW}(y) ≡ 0 for y<0, ≡ (1+k) (1 - y)^{k} for y∈[0,1], ≡ 0 for 1<y |
Its cumulative distribution is:
C_{POW}(y) = 0 for y<0, = 1 - (1 - y)^{1+k} for y∈[0,1], = 1 for 1<y. |
The inverse cumulative of this distribution on the [0,1] interval is then:
C_{POW}^{inv}(t) = 1 - (1 - t)^{1/(1+k)} for t∈[0,1]
So, the transformation function to apply to our uniform distribution U to get this power distribution is:
f(x) = C_{POW}^{inv}( C_{U}(x) ) = 1 - (1 - x)^{1/(1+k)}
The PHP implementation becomes:
class PowerDistribution extends DistributionAbstract { private $p = 0.0; /** * * @param Random $generator * @param float $k */ function __construct($generator, $k) { parent::__construct($generator); $this->p = 1.0 / (1.0 + $k); } /** * @return float Value in the range [-1,1[. */ function getValue(){ $x = $this->generator->randomFloat(); return 1.0 - pow(1.0 - $x, $this->p); } }
It is defined by this probability density function:
P_{TRI}(y) ≡ 0 for y<-1, ≡ y+1 for y∈[-1,0], ≡ 1-y for y∈[0,1], ≡ 0 for 1<y |
Its cumulative distribution is:
C_{TRI}(y) = 0 for y<-1, = 1/2 y^{2} + y + 1/2 for y∈[-1,0], = - 1/2 y^{2} + y + 1/2 for y∈[0,1], = 1 for 1<y. |
The inverse cumulative of this distribution on the [0,1] interval is then:
C_{TRI}^{inv}(t) = -1 + sqrt(2 t) for t∈[0,1/2], = 1 - sqrt(2 (1-t)) for t∈[1/2,1]
So, the transformation function to apply to our uniform distribution U to get this TRI distribution is:
f(x) = C_{TRI}^{inv}( C_{U}(x) ) = -1 + sqrt(2 x) for x∈[0,1/2], = 1 - sqrt(2 (1-x)) for x∈[1/2,1]
The mean is μ_{TRI}=0 and the standard deviation is σ_{TRI} = 1/sqrt(6) ≅ 0.4082.
The PHP implementation becomes:
class TriangleDistribution extends DistributionAbstract { /** * @return float Value in the range [-1,1[. */ function getValue(){ $x = $this->generator->randomFloat(); if( $x < 0.5 ) return sqrt(2*$x) - 1; else return 1 - sqrt(2*(1.0-$x)); } }
The logistic distribution (see https://en.wikipedia.org/wiki/Logistic_distribution) has many applications, it is very similar to the Normal distribution but it can be solved analytically, and it is often used as a replacement for the Normal distribution. Its probability density function is defined by:
Its cumulative distribution is:
and its inverse is:
C_{L}^{inv}(t) = μ + s log(t/(1-t))
so the transformation function to apply to the uniform distribution to get values distributed according to the logistic distribution is:
y = f(x) ≡ C_{L}^{inv}( C_{U}(x) ) = μ + s log(x/(1-x))
The mean value is μ_{L}=μ and the standard deviation is σ_{L} = π s / sqrt(3).
The PHP generator is:
class LogisticDistribution extends DistributionAbstract { public $mu = 1.0, $s = 1.0; /** * Initializes the "logistic" generator. * @param Random $generator Source of random bits. * @param float $mu "mu" parameter, that is the mean. * @param type $s "s" parameter, that is sigma*sqrt(3)/M_PI being sigma the * standard deviation. */ function __construct($generator, $mu, $s) { parent::__construct($generator); $this->mu = $mu; $this->s = $s; } /** * Returns the next random value. * @return float Random number. */ function getValue(){ do { $x = $this->generator->randomFloat(); } while( $x == 0.0 ); return $this->mu + $this->s * log($x / (1.0 - $x)); } }
The range of values that can be returned by our implementation is limited by the range of values of x ∈ [2^{-52}, 1-2^{-52}], and results to be y ∈ μ ± 36.04 s, quite far from infinity but still large enough for most applications. The granularity can be estimated to be:
becomes Δy ≅ s for Δx = 2^{-52} near to the limits of the generated interval.
It is defined by the following probability density function:
The mean is μ_{N} = μ and the standard deviation is σ_{N} = σ. The cumulative distribution is:
This function cannot be inverted analytically, so the transformation method cannot be applied here and we must resort to some other solution. Among the possible alternatives (see https://en.wikipedia.org/wiki/Normal_distribution for a discussion) we choose the Box-Muller method that requires 2 random numbers x_{1} and x_{2} to calculate two independent values y_{1} and y_{2} distributed as required:
y_{1} = σ sqrt(-2 log(1-x_{1})) cos(2 π x_{2}) + μ
y_{2} = σ sqrt(-2 log(1-x_{1})) sin(2 π x_{2}) + μ
These two values are statistically independent from a strict mathematical point of view, but actually they are not due to our implementation. In fact, in our case log(1-x) ranges from about -36.04 up to exactly zero with quite coarse granularity toward the beginning of this interval, a granularity that in the worst case of cosine=1 can be estimated to be:
for x = 1-2^{-52} and Δx = 2^{-52}. So, although the two values y_{1} and y_{2} are statistically independent in theory, they are correlated by the same granularity issue in our implementation. For this reason we must keep only one value, say y_{1} per turn:
class NormalDistribution extends DistributionAbstract { private $mu = 0.0; private $sigma = 0.0; /** * @param Random $generator * @param float $mu Mean. * @param float $sigma Standard deviation. */ function __construct($generator, $mu, $sigma) { parent::__construct($generator); $this->mu = $mu; $this->sigma = $sigma; } /** * Returns the next random value. * @return float Random value. */ function getValue(){ /* Box-Muller method: */ $x1 = $this->generator->randomFloat(); $x2 = $this->generator->randomFloat(); return $this->sigma*sqrt(-2.0*log(1.0 - $x1))*cos(2.0*M_PI*$x2) + $this->mu; } }
Finally, note that the maximum values that can be returned by our generator are μ ± 8.490 σ when x_{1} = 1 - 2^{-52}, quite far from the infinity but still large enough for most applications.
The source code that generates and compares all the distributions here described is available here: test-dist.php.
Umberto Salsi | Comments | Contact | Site map | Home / Index |
Still no comments to this page. Use the Comments link above to add your contribute.