%pylab inline
from scipy.misc import derivative
from scipy.optimize import curve_fit
import inspect
The Cramér-Rao bound is a lower limit on the variance of any unbiased estimator of a deterministic parameter.
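Formally: for any unbiased estimator \(\hat\theta\) of a parameter vector \(\theta\), \[\mathrm{var}(\hat\theta_i) \ge \left[\mathcal{I}(\theta)^{-1}\right]_{ii}\] where \(\mathcal{I}(\theta)\) is the Fisher information matrix, which we will construct below.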
Let's take an oscillating signal as an example, with three parameters: amplitude, frequency, and initial phase. We assume it is sampled at 100 Hz with a noise level of 0.1.
def signal(t, a, f, ph):
return a * sin(2 * pi * f * t + ph)
p0 = [2, 8, 0]  # true parameters: amplitude, frequency (Hz), initial phase
noise = 0.1     # standard deviation of the noise
T = arange(0, 1, 0.01)
plot(T, signal(T, *p0), '.-k')
xlabel('time (s)')
For convenience, let's produce a list of parameter names and a dictionary mapping those names to their values.
labels = inspect.getfullargspec(signal)[0][1:]
p0dict = dict(zip(labels, p0))
print('labels:', labels)
print('p0dict:', p0dict)
Cramér-Rao bounds come from the inverse of the Fisher information matrix. Calculating it may be difficult in the general case, but for a sample from a multivariate normal distribution it is really simple. Most measurements are in fact such a sample: a vector with mean \(\mu\) and as many dimensions as there are samples. If the measurements are uncorrelated and have the same variance \(\sigma^2\), then an element of the Fisher information matrix is: \[\mathcal{I}_{mn} = \frac{1}{\sigma^2} \frac{\partial \mu^\mathrm{T}}{\partial \theta_m} \frac{\partial \mu}{\partial \theta_n} = \frac{1}{\sigma^2} \sum_k \frac{\partial \mu_k}{\partial \theta_m} \frac{\partial \mu_k}{\partial \theta_n}\] where \(\theta\) is the vector of parameters, in our case \([a, f, ph]^\mathrm{T}\).
We can define: \(D_{ik} := \frac{\partial \mu_k}{\partial \theta_i}\) to get: \[\mathcal{I}_{mn} = \frac{1}{\sigma^2} \sum_k \frac{\partial \mu_k}{\partial \theta_m} \frac{\partial \mu_k}{\partial \theta_n} = \frac{1}{\sigma^2} \sum_k D_{mk} D_{nk}\]
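In matrix notation this is simply \[\mathcal{I} = \frac{1}{\sigma^2} D D^\mathrm{T}\] a form we will exploit when computing it.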
Let's first calculate the matrix \(D_{ik}\):
D = zeros((len(p0), len(T)))
# for every parameter
for i, argname in enumerate(labels):
# for every sample
for k, t in enumerate(T):
        # Define the function to be differentiated: the signal at a given
        # time t with all parameters fixed except the one named 'argname'.
        func = lambda x: signal(t, **dict(p0dict, **{argname: x}))
        # Calculate the derivative of func at the point given by p0.
        # Be careful with dx, since it is 1 by default!
        D[i,k] = derivative(func, p0dict[argname], dx=0.0001)
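As a quick sanity check (an extra step, with a, f, ph below simply unpacked from p0), we can compare the numerical derivatives with the analytic partial derivatives of the model:
# Analytic partial derivatives of a*sin(2*pi*f*t + ph), evaluated at p0.
a, f, ph = p0
D_analytic = array([sin(2 * pi * f * T + ph),                   # d mu / d a
                    2 * pi * T * a * cos(2 * pi * f * T + ph),  # d mu / d f
                    a * cos(2 * pi * f * T + ph)])              # d mu / d ph
print(allclose(D, D_analytic, atol=1e-5))  # should print True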
We can take a closer look at the \(D_{ik}\) matrix to better understand what it is.
plot(T, signal(T, *p0), '--k', lw=2, label='signal')
for Di, argname in zip(D, labels):
plot(T, Di, '.-', label=argname)
legend(loc='best')
xlabel('time (s)')
The \(D_{ik}\) matrix expresses how much the \(k\)-th sample affects the \(i\)-th parameter. We see that the amplitude is affected by the samples at the extrema of the sine. On the other hand, these points hardly affect the phase. We also see that the later the sample, the more sensitive it is to the frequency. This is intuitive: even a very small change in frequency, given enough time, causes an arbitrarily large change in the signal's phase.
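This is visible directly in the analytic derivative \[\frac{\partial \mu_k}{\partial f} = 2 \pi t_k \, a \cos(2 \pi f t_k + ph)\] whose envelope grows linearly with \(t_k\).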
Recall how to get the Fisher information matrix from \(D\): \[\mathcal{I}_{mn} = \frac{1}{\sigma^2} \sum_k D_{mk} D_{nk}\]
NumPy provides an excellent function for summing arrays with Einstein notation: einsum.
I = 1/noise**2 * einsum('mk,nk', D, D) # How cool is that?
print(I)
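Since the summation runs over the shared index \(k\), this is exactly the matrix product \(D D^\mathrm{T}\) from before; a one-line check:
print(allclose(I, 1/noise**2 * dot(D, D.T)))  # should print True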
The Cramér-Rao bounds lie on the diagonal of the inverse of \(\mathcal{I}\); taking square roots turns these variance bounds into standard deviations:
iI = inv(I)
for argname, variance in zip(labels, iI.diagonal()):
print('{}: {:.2g}'.format(argname, sqrt(variance)))
If we now fit the signal with the exact model, the accuracy of the fitted parameters should land exactly at the Cramér-Rao bound. In reality it will not, since the variances yielded by the fitting procedure are themselves estimators and thus have a certain spread.
S = signal(T, *p0) + randn(T.size) * noise
plot(T, S, '.-k')
popt, pcov = curve_fit(signal, T, S, p0=p0)
for argname, variance in zip(labels, pcov.diagonal()):
print('{}: {:.2g}'.format(argname, sqrt(variance)))
Tl = linspace(T[0], T[-1], 10000)
plot(Tl, signal(Tl, *popt))
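To see the spread for ourselves, here is a minimal Monte Carlo sketch (the number of trials is an arbitrary choice): repeat the fit over many independent noise realizations and compare the empirical standard deviations with the bound computed above.
trials = 1000  # arbitrary number of noise realizations
fits = zeros((trials, len(p0)))
for j in range(trials):
    Sj = signal(T, *p0) + randn(T.size) * noise
    fits[j], _ = curve_fit(signal, T, Sj, p0=p0)
for argname, fitted_std, variance in zip(labels, fits.std(axis=0), iI.diagonal()):
    print('{}: fitted {:.2g}, bound {:.2g}'.format(argname, fitted_std, sqrt(variance)))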
A function to calculate the inverse of the Fisher information matrix:
def cramer_rao(model, p0, X, noise, show_plot=False):
"""Calulate inverse of the Fisher information matrix for model
sampled on grid X with parameters p0. Assumes samples are not
correlated and have equal variance noise^2.
Parameters
----------
model : callable
The model function, f(x, ...). It must take the independent
variable as the first argument and the parameters as separate
remaining arguments.
    p0 : M-length sequence
        Point in parameter space where the Fisher information matrix is
        evaluated.
    X : array
        Grid where the model is sampled.
    noise : scalar
        Standard deviation of the noise in the data.
    show_plot : boolean
        If True, plot the model and its derivatives.
Returns
-------
iI : 2d array
Inverse of Fisher information matrix.
"""
    labels = inspect.getfullargspec(model)[0][1:]
    p0dict = dict(zip(labels, p0))
D = zeros((len(p0), X.size))
for i, argname in enumerate(labels):
D[i,:] = [ derivative(lambda p: model(x, **dict(p0dict, **{argname: p})),
p0dict[argname],
dx=0.0001)
for x in X ]
if show_plot:
plot(X, model(X, *p0), '--k', lw=2, label='signal')
for d, label in zip(D, labels):
plot(X, d, '.-', label=label)
legend(loc='best')
title('Parameter dependence on particular data point')
I = 1/noise**2 * einsum('mk,nk', D, D)
iI = inv(I)
return iI
We can now use this function to show a funny little fact.
Off-diagonal elements of the inverse of the Fisher information matrix are covariances between parameters. Let's see how the frequency-phase covariance behaves depending on where we put zero in the time series. Zero is special, because that is where the phase is fixed to ph.
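As an aside, the covariances can be normalized into a correlation matrix by dividing by the standard deviations; a small sketch (the variable names are arbitrary):
C = cramer_rao(signal, p0, T, noise)  # covariance bound on the original grid
sigmas = sqrt(C.diagonal())
print(C / outer(sigmas, sigmas))      # correlation matrix; diagonal is 1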
N = []
T0 = arange(0, 11)
for t0 in T0:
    # a 10-second signal with t = 0 placed t0 seconds into it
    T = arange(-t0, 10 - t0, 0.01)
    # off-diagonal element: frequency-phase covariance
    N.append( cramer_rao(signal, p0, T, noise)[2,1] )
plot(T0, N, 'o-')
axhline(0, color='black')
xlabel('zero location - point with fixed phase')
ylabel('frequency - phase correlation')
The correlation vanishes only when the phase is referenced to the middle of the signal. The \(D_{ik}\) matrix then looks like this:
T = arange(-0.5, 0.51, 0.01)
cramer_rao(signal, p0, T, noise, show_plot=True);
xlabel('time (s)')
D. C. Rife and R. R. Boorstyn, "Single-Tone Parameter Estimation from Discrete-Time Observations," IEEE Transactions on Information Theory, vol. 20, no. 5, 1974.