module documentation

various math utilities, notably eig and a collection of simple helper functions in Mh (an alias for MathHelperFunctions)

Class MathHelperFunctions static convenience math helper functions; if a function name is prefixed with an "a", a numpy array is returned
Class UpdatingAverage use instead of a list when too many values must be averaged
Function eig eigendecomposition of a symmetric matrix, much slower than numpy.linalg.eigh, return (EVals, Basis), the eigenvalues and an orthonormal basis of the corresponding eigenvectors
Function geometric_sd return geometric standard deviation of vals.
Function Hessian Hessian estimate for f at x0
Function moving_average rolling average without biasing boundary effects.
Function normal_ppf return an approximation of the inverse cdf value of a standard normal distribution, assuming 0 < p <= 1/2.
Function randhss n iid dim-dimensional vectors with length norm_(vector).
Function randhss_mixin n iid vectors uniformly distributed on the hypersphere surface with a normal distribution mixed in, which can be beneficial in smaller dimensions.
Function to_correlation_matrix change C in place into a correlation matrix, AKA whitening
Function _sqrt_len Undocumented
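
The class UpdatingAverage has no detail section below; the idea is to keep a constant-size running state instead of storing every value. A minimal sketch of that idea (hypothetical, the real class's API may differ):

class UpdatingAverageSketch:
    """keep a running sum and count instead of a growing list of values"""
    def __init__(self):
        self._sum, self._count = 0.0, 0
    def update(self, value):
        self._sum += value
        self._count += 1
    @property
    def value(self):
        return self._sum / self._count
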
def eig(C):

eigendecomposition of a symmetric matrix, much slower than numpy.linalg.eigh, return (EVals, Basis), the eigenvalues and an orthonormal basis of the corresponding eigenvectors, where

Basis[i]
    the i-th row of Basis
columns of Basis, [Basis[j][i] for j in range(len(Basis))]
    the i-th eigenvector with eigenvalue EVals[i]
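
A small usage check of the column convention stated above, with a symmetric 2x2 matrix; it relies only on the documented layout of Basis:

import numpy as np
from cma.utilities.math import eig

C = np.array([[2., 1.], [1., 2.]])
EVals, Basis = eig(C)
Basis = np.asarray(Basis)
for i in range(len(EVals)):
    v = Basis[:, i]  # the i-th eigenvector is the i-th column
    assert np.allclose(np.dot(C, v), EVals[i] * v)
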
def geometric_sd(vals, **kwargs):

return geometric standard deviation of vals.

The gsd is invariant under multiplicative scaling of vals and independent of the choice of the log-exp base.

kwargs are passed to np.std, in particular ddof.
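
Presumably the gsd is computed as exp of the standard deviation of the log values, which makes both stated invariances immediate; a minimal sketch under that assumption:

import numpy as np

def geometric_sd_sketch(vals, **kwargs):  # hypothetical re-implementation
    # multiplying vals by a constant only shifts log(vals), leaving the std unchanged
    return np.exp(np.std(np.log(vals), **kwargs))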

def Hessian(f, x0, eps=1e-06):

Hessian estimate for f at x0
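
The docstring does not spell out the scheme; a standard central finite-difference estimate with step eps could look as follows (a sketch with the hypothetical name hessian_fd; the library's Hessian may differ in detail):

import numpy as np

def hessian_fd(f, x0, eps=1e-6):  # hypothetical sketch, not the library code
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    E = eps * np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(i + 1):
            # central second difference in coordinate directions i and j
            H[i, j] = H[j, i] = (f(x0 + E[i] + E[j]) - f(x0 + E[i] - E[j])
                                 - f(x0 - E[i] + E[j]) + f(x0 - E[i] - E[j])
                                ) / (4 * eps**2)
    return H

For f(x) = sum(x**2) this returns a matrix close to 2 * np.eye(n).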

def moving_average(x, w=7):

rolling average without biasing boundary effects.

The first entries give the average over all first values (until the window width is reached).

If w is not an integer, exponential smoothing is applied instead, with weights proportional to (1 - 1/w)**i and normalized to sum to one, thereby putting about 1 - exp(-1) ≈ 0.63 of the total weight on the last w entries.

Details: the windowed average is mainly based on np.convolve, whereas the exponential smoothing is, for the time being, numerically inefficient and scales quadratically with the length of x. A sketch of the integer-w case follows.
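
A minimal sketch of the integer-w branch as described above, assuming len(x) >= w (hypothetical re-implementation, not the library code):

import numpy as np

def moving_average_sketch(x, w=7):
    x = np.asarray(x, dtype=float)
    head = np.cumsum(x[:w - 1]) / np.arange(1, w)  # averages over x[:1], x[:2], ...
    return np.hstack([head, np.convolve(x, np.ones(w) / w, mode='valid')])

For example, moving_average_sketch([1, 2, 3, 4, 5], w=3) gives [1, 1.5, 2, 3, 4].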

def normal_ppf(p):

return an approximation of the inverse cdf value of a standard normal distribution,

assuming 0 < p <= 1/2. This is the "sigma" for which the tail distribution has probability p (AKA quantile or percentile point function).

For p=1/2 we have sigma = 0. The approximation 0.79 + 1.49 * sqrt(-log(p)) + alpha * sqrt(p), where alpha = 0.637... is chosen such that p=1/2 maps to sigma=0, has a sigma error within [-0.02, 0.029] for p >= 1e-12, and, with the additional correction that is applied, even for p >= 1e-22. The error is less than 0.0121 for all p <= 0.1. The relative sigma error for p -> 1/2 (hence sigma -> 0) is < 0.11.

The input p may be an np.array.

The following is the Python code to assess the accuracy:

from pylab import *  # emulates IPython's %pylab: matplotlib's pyplot interface plus numpy (np)
import scipy
from scipy import stats
import cma

normal_ppf = cma.utilities.math.normal_ppf

pp = np.logspace(-15, np.log10(0.5), 5400)  # target tail probabilities

if 1 < 3:  # sigma vs p
    figure(53)
    gcf().clear()
    grid(True, which='both')
    xlabel('tail probability')
    ylabel('sigma')
    semilogx(pp, normal_ppf(pp), label='approximation')
    semilogx(pp, scipy.stats.norm.ppf(pp), label='true')
    legend()
    # semilogx(pp, -sqrt(7 * log(1/pp/4 + pp)) / 2)
    # semilogx(pp, -scipy.stats.norm.ppf(pp/2))
    # semilogx(pp, sqrt(7 * log((1/pp + pp) / 2)) / 2)
if 1 < 3:  # Delta sigma vs p
    figure(54)
    gcf().clear()
    grid(True, which='both')
    xlabel('tail probability')
    ylabel('sigma difference (error)')
    semilogx(pp, normal_ppf(pp) - scipy.stats.norm.ppf(pp),
            label='absolute')
    abs_error = max(np.abs(normal_ppf(pp) - scipy.stats.norm.ppf(pp)))
    semilogx(pp, (normal_ppf(pp) - scipy.stats.norm.ppf(pp)) / np.maximum(1e-12, -scipy.stats.norm.ppf(pp)),
            label='relative')
    rel_error = max((normal_ppf(pp) - scipy.stats.norm.ppf(pp))
                      / np.maximum(1e-12, -scipy.stats.norm.ppf(pp)))
    ylim(-max(np.abs(ylim())), max(np.abs(ylim())))
    text(xlim()[0], ylim()[0] * 0.98,
        ' the largest absolute and relative errors are {} and {}'
        .format(np.round(abs_error, 6), np.round(rel_error, 6)))
    legend()
if 11 < 3:  # disabled: sigma ratio vs p
    figure(55)
    gcf().clear()
    grid(True, which='both')
    xlabel('tail probability')
    ylabel('sigma ratio')
    semilogx(pp, normal_ppf(pp) / scipy.stats.norm.ppf(pp))
if 11 < 3:  # disabled: probability ratio vs sigma
    figure(56)
    gcf().clear()
    ylabel('probability ratio')
    xlabel('sigma')
    plot(stats.norm.ppf(pp), stats.norm.cdf(normal_ppf(pp)) / pp)
    grid(True, which='both')
if 1 < 3:  # true p of sigma vs p input
    figure(56)
    gcf().clear()
    ylabel('true probability ratio')
    xlabel('tail probability (input)')
    semilogx(pp, pp / stats.norm.cdf(normal_ppf(pp)))
    grid(True, which='both')

The approximation is by construction exact for p=1/2. For small probabilities (sigma < -7), the probability is quite sensitive: roughly speaking, a Delta sigma of 0.05 / 0.25 / 1 changes the probability by a factor of 1.4 / 3 / 50, respectively.

def randhss(n, dim, norm_=_sqrt_len, randn=np.random.randn):

n iid dim-dimensional vectors with length norm_(vector).

The vectors are uniformly distributed on a hypersphere surface.

Note: CMA-ES diverges with popsize 100 in 15-D unless the option 'CSA_clip_length_value': [0, 0] is set.

>>> from cma.utilities.math import randhss
>>> dim = 3
>>> assert dim - 1e-7 < sum(randhss(1, dim)[0]**2) < dim + 1e-7
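
The standard construction for such samples rescales Gaussian draws; a sketch under that assumption (not necessarily the library's exact code):

import numpy as np

def randhss_sketch(n, dim, norm_=lambda x: len(x)**0.5, randn=np.random.randn):
    vectors = randn(n, dim)  # isotropic, hence the direction is uniform on the sphere
    for v in vectors:
        v *= norm_(v) / np.sum(v**2)**0.5  # rescale to length norm_(v)
    return vectors
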
def randhss_mixin(n, dim, norm_=_sqrt_len, c=(lambda d: 1.0 / d), randn=np.random.randn):

n iid vectors uniformly distributed on the hypersphere surface with a normal distribution mixed in, which can be beneficial in smaller dimensions.
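
The docstring does not specify the mixing scheme. One plausible variance-motivated blend, stated here purely as an assumption and not as the library's code, combines hypersphere and Gaussian samples with weight c(dim):

import numpy as np
from cma.utilities.math import randhss

def randhss_mixin_sketch(n, dim, c=lambda d: 1.0 / d, randn=np.random.randn):
    c_ = c(dim)  # hypothetical blend; the library's exact scheme may differ
    return (1 - c_)**0.5 * randhss(n, dim, randn=randn) + c_**0.5 * randn(n, dim)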

def to_correlation_matrix(c):

change c in place into a correlation matrix, AKA whitening
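
The standard in-place transform divides entry (i, j) by sqrt(c[i, i] * c[j, j]), so the diagonal becomes all ones; a sketch assuming c is a float numpy array:

import numpy as np

def to_correlation_matrix_sketch(c):  # hypothetical re-implementation
    d = np.diag(c)**0.5
    c /= np.outer(d, d)  # in place; afterwards the diagonal of c is all ones
    return c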

def _sqrt_len(x):

Undocumented; judging from its use as the default norm_ in randhss, it presumably returns len(x)**0.5.