In [1]:
%pylab inline
from scipy.optimize import curve_fit
Populating the interactive namespace from numpy and matplotlib

Let's take a very simple model with two parameters, a straight line, and generate some data.

In [2]:
def model(x, a, b):
    return a*x + b

a, b = 1, 3
noise = 0.1

X = linspace(0, 2)
D = model(X, a, b) + randn(X.size) * noise

errorbar(X, D, yerr=noise, fmt='k.', label='data')
plot(X, model(X, a, b), 'b--', label='real model')
legend(loc='best')
Out[2]:
<matplotlib.legend.Legend at 0x112e40780>

Now let's fit the model to the data:

In [3]:
popt, pcov = curve_fit(model, X, D)

# model with fitted parameters
fit = lambda x: model(x, *popt)

errorbar(X, D, yerr=noise, fmt='k.', label='data')
plot(X, model(X, a, b), 'b--', label='real model')
plot(X, fit(X), 'r--', label='fit: {:.4g}*x + {:.4g}'.format(*popt))
legend(loc='best')
Out[3]:
<matplotlib.legend.Legend at 0x112f78c88>

Fitting procedures usually also return a covariance matrix. Most of the time people are only interested in what lies on its diagonal: estimates of the parameters' variances.

In [4]:
print('Fitted parameters:           ', popt)
print('Standard deviation estimates:', sqrt(pcov.diagonal()))
Fitted parameters:            [ 0.98018325  3.02793089]
Standard deviation estimates: [ 0.02081402  0.02415627]

However, the covariance matrix also has off-diagonal elements, and these carry very important information: the correlations between parameters. The correlations are more clearly visible if we calculate the Pearson correlation matrix:

In [5]:
print('Covariance matrix estimate returned by fitting procedure:')
print(pcov)
print()
print('Pearson correlation matrix estimate:')
Sigma = sqrt(pcov.diagonal())
# divide each row and each column by the standard deviations: rho_ij = cov_ij / (sigma_i * sigma_j)
print((pcov / Sigma).T / Sigma)
Covariance matrix estimate returned by fitting procedure:
[[ 0.00043322 -0.00043322]
 [-0.00043322  0.00058353]]

Pearson correlation matrix estimate:
[[ 1.         -0.86164044]
 [-0.86164044  1.        ]]

The elements of this matrix are Pearson correlation coefficients between the parameters. The Pearson correlation coefficient is 1 when two variables are perfectly linearly correlated, -1 when they are perfectly anticorrelated, and 0 when they are not linearly correlated at all. Obviously each parameter is perfectly correlated with itself, hence the ones on the diagonal, but the off-diagonal elements show that our parameters are also strongly anticorrelated with each other. What does that mean, and why should anybody care?
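One way to see what this anticorrelation actually means is to repeat the whole experiment many times: generate fresh noisy data, refit, and look at how the fitted slope and intercept co-vary. This is just a minimal sketch reusing the model, X, a, b and noise defined above (the 500 repetitions are an arbitrary choice).

In [ ]:
# refit the same model on many independent noise realisations
fits = array([curve_fit(model, X, model(X, a, b) + randn(X.size) * noise)[0]
              for _ in range(500)])

# sample correlation between the fitted a's and b's; it should come out
# close to the -0.86 estimated above from a single fit
print(corrcoef(fits[:, 0], fits[:, 1]))
scatter(fits[:, 0], fits[:, 1], s=5, alpha=0.3)
xlabel('fitted a')
ylabel('fitted b')

Whenever the fitted slope comes out high, the fitted intercept tends to come out low, which is exactly what the negative off-diagonal element is telling us.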

To explain this behaviour, it's good to realise what fitting actually is: nothing else than the minimisation of a cost function in parameter space. The inverse of the covariance matrix defines a quadratic form that describes the shape of this minimum. It's worth remembering that any sensible local minimum can, in its neighbourhood, be described by a quadratic form (this follows directly from the Taylor expansion: at an extremum the first derivative vanishes, so the first non-trivial term is the quadratic one).
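To make this concrete, here is a minimal sketch (reusing the model, the data D and the fit from above) that evaluates the least-squares cost on a grid of parameter values around the best fit. For a linear model the cost is exactly quadratic in the parameters, so its contours trace out precisely the kind of tilted ellipses that a quadratic form describes.

In [ ]:
# sum-of-squares cost evaluated on a grid of (a, b) values around the best fit
agrid = linspace(popt[0] - 0.1, popt[0] + 0.1, num=100)
bgrid = linspace(popt[1] - 0.1, popt[1] + 0.1, num=100)
cost = array([[sum((D - model(X, ai, bi))**2) for ai in agrid] for bi in bgrid])

contour(agrid, bgrid, cost, 20)
xlabel('a')
ylabel('b')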

In order to look at our fit's minimum, let's first define a function to visualise an arbitrary two-dimensional quadratic form:

In [6]:
def plot_quadratic_form(A, b):
    """Plot 2D quadratic form given by matrix A centered in vector b."""
    N = 100
    # reasonable boundaries
    X = linspace(b[0] - 1/sqrt(A[0,0]), b[0] + 1/sqrt(A[0,0]), num=N)
    Y = linspace(b[1] - 1/sqrt(A[1,1]), b[1] + 1/sqrt(A[1,1]), num=N)
    
    Z = zeros((N, N))
    
    for j, x in enumerate(X):
        for i, y in enumerate(Y):
            v = array([x,y])
            Z[i,j] = 0.5 * (v - b).dot(A.dot((v - b)))
            
    contourf(X, Y, Z, 100)
    contour(X, Y, Z, 10)
    
# example: an axis-aligned quadratic form centered at (10, 20)
A = array([[2, 0],
           [0, 1]])
bb = array([10,20])
plot_quadratic_form(A,bb)

Let's see what our fit's minimum looks like:

In [7]:
plot_quadratic_form(inv(pcov), popt)

We see that the minimum is centered at the fitted parameters' optimal values and its size roughly corresponds to their standard deviation estimates. But the minimum is also tilted, and that tilt is caused precisely by the fact that in our fit the \(a\) and \(b\) parameters are correlated. We can find a basis in which the parameters are not correlated by diagonalising the covariance matrix:

In [8]:
eigvalues, eigvectors = eig(pcov)
print('Eigenvectors:')
print(eigvectors)
Eigenvectors:
[[-0.76515256  0.64384903]
 [-0.64384903 -0.76515256]]

So the following two combinations of the parameters are uncorrelated: \[c \approx -0.77a - 0.64b\] \[d \approx 0.64a - 0.77b\]

In the basis defined by these new parameters the minimum isn't tilted, so if we rewrite the model in terms of \(c\) and \(d\) and perform the fit again, we should get diagonal covariance and Pearson correlation matrices.

In [9]:
def eigmodel(x, c, d):
    """The same line model, reparametrised in the eigenvector basis (c, d)."""
    p = array([c,d])
    # rows of eigvectors express a and b as linear combinations of (c, d)
    return eigvectors[0].dot(p) * x + eigvectors[1].dot(p)

eigpopt, eigpcov = curve_fit(eigmodel, X, D)

print('Covariance matrix estimate returned by fitting procedure:')
print(eigpcov)
print()
print('Pearson correlation matrix estimate:')
Sigma = sqrt(eigpcov.diagonal())
print((eigpcov / Sigma).T / Sigma)

plot_quadratic_form(inv(eigpcov), eigpopt)
Covariance matrix estimate returned by fitting procedure:
[[  6.86811022e-05  -8.70697971e-12]
 [ -8.70697971e-12   9.48067572e-04]]

Pearson correlation matrix estimate:
[[  1.00000000e+00  -3.41215899e-08]
 [ -3.41215899e-08   1.00000000e+00]]
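As a quick consistency check (a sketch using only the arrays already computed above), the parameters fitted in the rotated basis should simply be the projections of the original fitted parameters onto the eigenvectors:

In [ ]:
# c and d are the coordinates of (a, b) in the eigenvector basis,
# so projecting popt onto the eigenvectors should reproduce eigpopt
print('eigvectors.T @ popt:', eigvectors.T.dot(popt))
print('eigpopt:            ', eigpopt)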

What do these \(c\) and \(d\) parameters do? We can visualise that:

In [10]:
for c in linspace(0.9, 1.1) * eigpopt[0]:
    plot(X, eigmodel(X, c, eigpopt[1]), 'r-', alpha=0.5)
    
for d in linspace(0.9, 1.1) * eigpopt[1]:
    plot(X, eigmodel(X, eigpopt[0], d), 'g-', alpha=0.5)
    
errorbar(X, D, yerr=noise, fmt='k.', label='data')
plot(X, model(X, a, b), 'b--', label='real')

legend(loc='best')
Out[10]:
<matplotlib.legend.Legend at 0x1143bd908>

So why are these correlations important? Firstly, they help us understand the fit; secondly, they matter for the propagation of uncertainty.

Say that we want to calculate two simple quantities from the fitted parameters: \(f := a + b\) and \(g := a - b\):

In [11]:
print('f = a + b = %.2g' % (popt[0] + popt[1]))
print('g = a - b = %.2g' % (popt[0] - popt[1]))
f = a + b = 4
g = a - b = -2

If we naively calculated the uncertainties according to these equations: \[\sigma(f) = \sqrt{ \left(\frac{\partial f}{\partial a}\right)^2 \sigma(a)^2 + \left(\frac{\partial f}{\partial b}\right)^2 \sigma(b)^2 } = \sqrt{ 1^2 \sigma(a)^2 + 1^2 \sigma(b)^2 } = \sqrt{ \sigma(a)^2 + \sigma(b)^2 }\] \[\sigma(g) = \sqrt{ \left(\frac{\partial g}{\partial a}\right)^2 \sigma(a)^2 + \left(\frac{\partial g}{\partial b}\right)^2 \sigma(b)^2 } = \sqrt{ 1^2 \sigma(a)^2 + (-1)^2 \sigma(b)^2 } = \sqrt{ \sigma(a)^2 + \sigma(b)^2 } = \sigma(f)\] we would get the following wrong uncertainties:

In [12]:
# naive propagation with the covariance term ignored: s(f) and s(g) come out equal
print('s(f) = %.2g' % sqrt(sum(pcov.diagonal())) )
print('s(g) = %.2g' % sqrt(sum(pcov.diagonal())) )
s(f) = 0.032
s(g) = 0.032

To get them right we have to use the full equations: \[\sigma(f) = \sqrt{ \left(\frac{\partial f}{\partial a}\right)^2 \sigma(a)^2 + \left(\frac{\partial f}{\partial b}\right)^2 \sigma(b)^2 + 2 \frac{\partial f}{\partial a} \frac{\partial f}{\partial b} \mathrm{cov}(a,b) } = \sqrt{ \sigma(a)^2 + \sigma(b)^2 + 2\ \mathrm{cov}(a,b) }\] \[\sigma(g) = \sqrt{ \left(\frac{\partial g}{\partial a}\right)^2 \sigma(a)^2 + \left(\frac{\partial g}{\partial b}\right)^2 \sigma(b)^2 + 2 \frac{\partial g}{\partial a} \frac{\partial g}{\partial b} \mathrm{cov}(a,b) } = \sqrt{ \sigma(a)^2 + \sigma(b)^2 - 2\ \mathrm{cov}(a,b) }\]

It is most convenient to use the vector form, with a vector of derivatives, call it \(D^f\), for a function \(f\) of several parameters \(x_i\): \[D_{i}^f := \frac{\partial f}{\partial x_i} \] Then \[\sigma(f) = \sqrt{(D^f)^T \cdot C \cdot D^f}\] where \(C\) is the covariance matrix.

Now we can correctly calculate uncertainties:

In [13]:
D = array([1., 1.])   # vector of derivatives for f: (df/da, df/db)
print('s(f) = %.2g' % sqrt(D.dot(pcov).dot(D)))
D = array([1., -1.])  # vector of derivatives for g: (dg/da, dg/db)
print('s(g) = %.2g' % sqrt(D.dot(pcov).dot(D)))
s(f) = 0.012
s(g) = 0.043

So the uncertainty of \(f = a+b\) is in fact smaller than we might have thought without taking correlations into account. Worse, the naive calculation would have underestimated the uncertainty of \(g = a-b\).
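A quick way to double-check these numbers is to sample parameter pairs from the fitted covariance and look at the spread of \(a+b\) and \(a-b\) directly. A minimal sketch, using multivariate_normal from numpy.random (pulled in here by %pylab); the sample size is an arbitrary choice:

In [ ]:
# draw (a, b) pairs from the fitted multivariate normal and propagate
# them through f and g; the sample standard deviations should reproduce
# the 0.012 and 0.043 obtained analytically above
samples = multivariate_normal(popt, pcov, size=100000)
print('s(f) = %.2g' % std(samples[:, 0] + samples[:, 1]))
print('s(g) = %.2g' % std(samples[:, 0] - samples[:, 1]))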

Why? Well, it's clear when we visualise \(f\) and \(g\) in the parameter space:

In [14]:
plot_quadratic_form(inv(pcov), popt)
A = linspace(*gca().get_xlim())
# dashed line: direction along which f = a+b changes (g stays constant)
# dotted line: direction along which g = a-b changes (f stays constant)
plot(A, (A - popt[0]) + popt[1], ls='--', lw=5, color='black', label='f = a+b')
plot(A, -(A - popt[0]) + popt[1], ls=':', lw=5, color='black', label='g = a-b')
legend(loc='best')
Out[14]:
<matplotlib.legend.Legend at 0x113e939b0>

The uncertainty of \(f\) is small since it 'sees' the minimum across its relatively narrow direction. \(g\), on the other hand, lies very close to the long axis of the minimum, so it picks up the full width of the ellipse.
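The same picture can be read off numerically (a minimal sketch reusing the arrays computed above): the variance of the fit along a unit direction \(u\) in parameter space is \(u^T C u\), and the spread of the fit along its principal (eigen)axes is given by the square roots of the covariance eigenvalues.

In [ ]:
# standard deviation of the fit along the unit directions in which f and g vary
u_f = array([1., 1.]) / sqrt(2)
u_g = array([1., -1.]) / sqrt(2)
print('width along f direction:', sqrt(u_f.dot(pcov).dot(u_f)))
print('width along g direction:', sqrt(u_g.dot(pcov).dot(u_g)))

# half-widths of the minimum along its principal (eigen)axes
print('principal half-widths:  ', sqrt(eigvalues))

The width along the \(g\) direction should come out close to the larger principal half-width and the width along the \(f\) direction close to the smaller one, which is exactly what the plot shows.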