In [1]:
%pylab inline
from scipy.misc import derivative  # note: deprecated/removed in newer SciPy releases
from scipy.optimize import curve_fit
import inspect
Populating the interactive namespace from numpy and matplotlib

Numerical Cramér-Rao bound in Python

The Cramér-Rao bound is a lower limit on the variance of any unbiased estimator of a deterministic parameter.

Let's take an oscillating signal as an example, with three parameters: amplitude, frequency, and initial phase. We know it will be sampled at 100 Hz with a noise level of 0.1.

In [2]:
def signal(t, a, f, ph):
    return a * sin(2 * pi * f * t + ph)

p0 = [2, 8, 0]
noise = 0.1

T = arange(0, 1, 0.01)
plot(T, signal(T, *p0), '.-k')
xlabel('time (s)')
[Figure: the sampled signal plotted against time (s).]

For convenience, let's produce a list of the parameters' names and a dictionary mapping those names to their values.

In [3]:
labels = inspect.getfullargspec(signal).args[1:]
p0dict = dict(zip(labels, p0))
print('labels:', labels)
print('p0dict:', p0dict)
labels: ['a', 'f', 'ph']
p0dict: {'a': 2, 'f': 8, 'ph': 0}

Cramér-Rao bounds come from the inverse of the Fisher information matrix. Calculating it may be difficult in the general case, but for a sample from a multivariate normal distribution it is really simple. Most measurements are in fact a sample from a multivariate normal distribution with mean vector \(\mu\), whose number of dimensions equals the number of samples. If the measurements are not correlated and all have the same variance \(\sigma^2\), then an element of the Fisher information matrix is: \[\mathcal{I}_{mn} = \frac{1}{\sigma^2} \frac{\partial \mu^\mathrm{T}}{\partial \theta_m} \frac{\partial \mu}{\partial \theta_n} = \frac{1}{\sigma^2} \sum_k \frac{\partial \mu_k}{\partial \theta_m} \frac{\partial \mu_k}{\partial \theta_n}\] where \(\theta\) is the vector of parameters, in our case \([a, f, ph]^\mathrm{T}\).

We can define: \(D_{ik} := \frac{\partial \mu_k}{\partial \theta_i}\) to get: \[\mathcal{I}_{mn} = \frac{1}{\sigma^2} \sum_k \frac{\partial \mu_k}{\partial \theta_m} \frac{\partial \mu_k}{\partial \theta_n} = \frac{1}{\sigma^2} \sum_k D_{mk} D_{nk}\]
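In matrix notation this contraction is simply \[\mathcal{I} = \frac{1}{\sigma^2} D D^\mathrm{T},\] which is exactly what we will ask einsum to evaluate below.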

Let's first calculate the matrix \(D_{ik}\):

In [4]:
D = zeros((len(p0), len(T)))

# for every parameter
for i, argname in enumerate(labels):
    # for every sample
    for k, t in enumerate(T):
        # Define a function to be differentiated:
        # our signal at a given time t with all parameters fixed,
        # except the one named 'argname'.
        func = lambda x: signal(t, **dict(p0dict, **{argname: x}))

        # Calculate the derivative of func at the point given by p0.
        # Be careful with dx, since it's 1.0 by default!
        D[i,k] = derivative(func, p0dict[argname], dx=0.0001)
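
As a quick sanity check (a sketch, not part of the original computation; da_analytic is just a throwaway name), the numerical derivatives can be compared with an analytic one. For the amplitude, \(\partial \mu_k / \partial a = \sin(2 \pi f t_k + ph)\):

# Compare the numerical row for 'a' with the analytic derivative
# d(signal)/da = sin(2*pi*f*t + ph); this should print True.
da_analytic = sin(2 * pi * p0dict['f'] * T + p0dict['ph'])
print(allclose(D[0], da_analytic, atol=1e-6))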

We can take a closer look at the \(D_{ik}\) matrix to better understand what it is.

In [5]:
plot(T, signal(T, *p0), '--k', lw=2, label='signal')

for Di, argname in zip(D, labels):
    plot(T, Di, '.-', label=argname)
    
legend(loc='best')
xlabel('time (s)')
[Figure: the signal (dashed) and the rows of \(D\), one per parameter, plotted against time (s).]

The \(D_{ik}\) matrix expresses how much the \(k\)-th sample affects the \(i\)-th parameter. We see that the amplitude is affected most by the samples at the sine's apices. On the other hand, those points barely affect the phase. We also see that the longer the signal lasts, the more sensitive the samples become to frequency. This is intuitive, since even a very small change in frequency, given enough time, can cause an arbitrarily big change in the signal's phase.

Recall how to get the Fisher information matrix from \(D\): \[\mathcal{I}_{m,n}= \frac{1}{\sigma^2} \sum_k D_{mk} D_{nk}\]

NumPy provides an excellent function for summing arrays using Einstein summation notation: einsum

In [6]:
I = 1/noise**2 * einsum('mk,nk', D, D) # How cool is that?
print(I)
[[  5.00000000e+03  -5.71453545e+02  -1.20661259e-09]
 [ -5.71453545e+02   2.55477009e+05   6.15752139e+04]
 [ -1.20661259e-09   6.15752139e+04   1.99999999e+04]]
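
The same contraction can be written as an ordinary matrix product, which is a handy way to convince yourself what the einsum call does (a quick check using the D and noise defined above):

# einsum('mk,nk', D, D) sums over k, i.e. it equals D . D^T;
# this should print True.
print(allclose(I, dot(D, D.T) / noise**2))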

The Cramér-Rao bounds lie on the diagonal of the inverse of \(\mathcal{I}\); below we print their square roots, i.e. the lower bounds on the standard deviations of the estimates:

In [7]:
iI = inv(I)

for argname, variance in zip(labels, iI.diagonal()):
    print('{}: {:.2g}'.format(argname, sqrt(variance)))
a: 0.014
f: 0.0039
ph: 0.014
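
The off-diagonal elements of iI are the covariances between the parameter estimates; normalizing by the standard deviations turns them into correlation coefficients (a small sketch anticipating the correlation discussion further below; sigmas is a throwaway name):

# Normalize covariances to correlation coefficients:
# corr_mn = iI_mn / sqrt(iI_mm * iI_nn)
sigmas = sqrt(iI.diagonal())
print((iI / outer(sigmas, sigmas)).round(3))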

If we now fit the signal with the exact model, the accuracy of the fitted parameters should land exactly on the Cramér-Rao bound. In reality it will not, since the variances yielded by the fitting procedure are estimators themselves and thus have a certain spread.

In [8]:
S = signal(T, *p0) + randn(T.size) * noise
plot(T, S, '.-k')

popt, pcov = curve_fit(signal, T, S, p0=p0)

for argname, variance in zip(labels, pcov.diagonal()):
    print('{}: {:.2g}'.format(argname, sqrt(variance)))

Tl = linspace(T[0], T[-1], 10000)
plot(Tl, signal(Tl, *popt))
a: 0.013
f: 0.0035
ph: 0.012

[Figure: the noisy samples (black) with the fitted model overlaid.]
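
One way to see that spread (a sketch, not part of the original notebook; fits, Sn and popt_n are throwaway names) is to repeat the noisy experiment many times and compare the empirical scatter of the fitted parameters with the bounds above:

# Fit many independent noise realizations and compare the empirical
# standard deviations of the fitted parameters with the CRB values.
fits = []
for _ in range(200):
    Sn = signal(T, *p0) + randn(T.size) * noise
    popt_n, _ = curve_fit(signal, T, Sn, p0=p0)
    fits.append(popt_n)

for argname, s in zip(labels, std(fits, axis=0)):
    print('{}: {:.2g}'.format(argname, s))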

A function to calculate the inverse of the Fisher information matrix:

In [9]:
def cramer_rao(model, p0, X, noise, show_plot=False):
    """Calulate inverse of the Fisher information matrix for model
    sampled on grid X with parameters p0. Assumes samples are not
    correlated and have equal variance noise^2.
    
    Parameters
    ----------
    model : callable
        The model function, f(x, ...).  It must take the independent
        variable as the first argument and the parameters as separate
        remaining arguments.
    X : array
        Grid where model is sampled.
    p0 : M-length sequence
        Point in parameter space where Fisher information matrix is
        evaluated.
    noise: scalar
        Squared variance of the noise in data.
    show_plot : boolean
        If True shows plot.
    
    Returns
    -------
    iI : 2d array
        Inverse of Fisher information matrix.
    """
    labels = inspect.getargspec(model)[0][1:]
    p0dict = dict(zip(inspect.getargspec(model)[0][1:], p0))
    
    D = zeros((len(p0), X.size))
    for i, argname in enumerate(labels):
        D[i,:] = [ derivative(lambda p: model(x, **dict(p0dict, **{argname: p})),
                              p0dict[argname],
                              dx=0.0001)
                  for x in X ]
        
    if show_plot:
        plot(X, model(X, *p0), '--k', lw=2, label='signal')
        for d, label in zip(D, labels):
            plot(X, d, '.-', label=label)
        legend(loc='best')
        title('Parameter dependence on particular data point')
    
    I = 1/noise**2 * einsum('mk,nk', D, D)
    iI = inv(I)
    
    return iI
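
As a quick check, the function should reproduce the bounds we computed step by step earlier (same T, p0 and noise as above):

for argname, variance in zip(labels, cramer_rao(signal, p0, T, noise).diagonal()):
    print('{}: {:.2g}'.format(argname, sqrt(variance)))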

We can now use this function to show a funny little fact.

The off-diagonal elements of the inverse of the Fisher information matrix show the correlations between parameters. Let's see how the correlation between frequency and phase behaves depending on where we put zero in the time series. Zero is special, because that's where the phase is fixed to ph.

In [10]:
N = []
T0 = arange(0, 11)
for t0 in T0:
    T = arange(-t0, 10 - t0, 0.01)
    N.append( cramer_rao(signal, p0, T, noise)[2,1] )
    
plot(T0, N, 'o-')
axhline(0, color='black')
xlabel('zero location - point with fixed phase')
ylabel('frequency - phase correlation')
[Figure: frequency-phase correlation as a function of the zero location.]

The correlation drops to zero only when the phase is referenced to the middle of the signal. The \(D_{ik}\) matrix then looks like this:

In [11]:
T = arange(-0.5, 0.51, 0.01)
cramer_rao(signal, p0, T, noise, show_plot=True);
xlabel('time (s)')
[Figure: the signal and the rows of \(D\) for a time axis centred on zero.]

References

D. C. Rife and R. R. Boorstyn, "Single-Tone Parameter Estimation from Discrete-Time Observations", IEEE Transactions on Information Theory, vol. 20, no. 5, 1974.