A Bayesian approach to recover signals from measured data

__Charles R. Hogg III__,
Igor Levin, Katharine Mullen

Many scientific investigations require measuring continuous functions. Examples are ubiquitous and diverse: the scattering function S(Q) in diffraction experiments, the strain field in a stretched metal plate, and many others. These measurements always involve uncertainty: determining the complete function would require an infinite number of measurements, and even the points that are measured are usually subject to noise. Our goal is to quantify the uncertainty in a measured continuous function, given a finite sample of (possibly noisy) datapoints.

We take a Bayesian approach, describing uncertainties
using probability theory. The central quantity is the *posterior
probability distribution* P(f|y), which gives the probability (density) that
the true curve is f, given the noisy datapoints y. Two factors contribute
to this probability. The *likelihood* P(y|f) gives the probability
of generating the observed datapoints, given a known true curve f. The *prior*
P(f) represents the plausibility of the curve f, given relevant background
information. Given a noise model (e.g., Poisson or Gaussian), the likelihood is
usually straightforward to write down.
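
For reference, the two factors combine through Bayes' theorem. The normalizing constant P(y) is not named in the text above; it is introduced here only to complete the formula, and it does not depend on f:

$$
P(f \mid y) = \frac{P(y \mid f)\, P(f)}{P(y)} \;\propto\; P(y \mid f)\, P(f)
$$

Because P(y) is constant in f, comparing candidate curves only requires the product of likelihood and prior.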

Our main tool for specifying the prior probability P(f)
is the *Gaussian Process* (GP). A GP treats a continuous function as a
continuously-indexed set of random variables, such that any finite subset has a
joint Gaussian distribution. The core of a GP model is a covariance
function k(X, X’), which gives the covariance between function values at X and
X’. A simple example is the Squared-Exponential (SE) covariance, which is
a Gaussian function of the separation (X-X’). This covariance means that
function values are highly correlated for nearby points, and mostly independent
for faraway points. It allows us to specify robust assumptions, such as
continuity and smoothness, without assuming a functional form. We use
variants of the SE covariance throughout, though extending to other covariance
functions is straightforward.
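
As a rough illustration of the SE covariance and of what a GP prior looks like in practice, the following sketch builds the covariance matrix on a grid and draws a few random curves from the prior. It is not the authors' implementation: the parameter names `sigma_f` (amplitude) and `ell` (length scale), the grid, and the small `jitter` term added for numerical stability are all assumptions made for this example.

```python
import numpy as np

def se_covariance(x1, x2, sigma_f=1.0, ell=1.0):
    """Squared-Exponential covariance: a Gaussian function of (X - X').

    sigma_f and ell are illustrative hyperparameters (assumed names).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    diff = x1[:, None] - x2[None, :]          # pairwise separations X - X'
    return sigma_f**2 * np.exp(-0.5 * (diff / ell) ** 2)

def sample_gp_prior(x, n_samples=3, sigma_f=1.0, ell=1.0, jitter=1e-6, seed=0):
    """Draw curves from a zero-mean GP prior evaluated on the grid x."""
    rng = np.random.default_rng(seed)
    # jitter keeps the matrix numerically positive definite (an assumption
    # of this sketch, not part of the model)
    K = se_covariance(x, x, sigma_f, ell) + jitter * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # If z ~ N(0, I), then L @ z ~ N(0, K)
    return (L @ rng.standard_normal((len(x), n_samples))).T

x = np.linspace(0.0, 10.0, 50)
K = se_covariance(x, x)
curves = sample_gp_prior(x)   # shape (3, 50): three smooth random curves
```

Neighbouring grid points have covariance close to `sigma_f**2`, while points many length scales apart have covariance near zero, which is the "highly correlated nearby, mostly independent far away" behaviour described above.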

Using Gaussian Processes, we can directly generate curves with high P(f|y), including the curve that maximizes this quantity. We include several examples. First, we analyze both simulated and experimental scattering datasets with Poisson noise. We find that Bayesian analysis with Gaussian Process priors gives a good estimate of the underlying true curve, competitive with benchmark techniques such as wavelet smoothing or Adaptive Weights Smoothing (AWS). Unlike these benchmarks, it also quantifies the uncertainty in its estimate. Second, we examine strain data from a stretched metal plate, interpolating the curve inside a central data-gap region where stress was measured instead. By quantifying the uncertainty as a function of gap size, we were able to improve the experimental design, clearing the way for a better mapping of the stress-strain relationship.
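
To make the gap-interpolation idea concrete, here is a minimal sketch of GP regression across a data gap. It assumes Gaussian noise, for which the posterior is available in closed form and the posterior mean is also the curve that maximizes P(f|y); the Poisson case discussed above needs a different treatment and is not covered here. The test function (a sine), the gap location, the noise level, and the fixed hyperparameters are all hypothetical choices for illustration, not the data or settings of the study.

```python
import numpy as np

def se_covariance(x1, x2, sigma_f=1.0, ell=1.0):
    """Squared-Exponential covariance (sigma_f, ell are assumed names)."""
    diff = np.asarray(x1, float)[:, None] - np.asarray(x2, float)[None, :]
    return sigma_f**2 * np.exp(-0.5 * (diff / ell) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_sd, sigma_f=1.0, ell=1.0):
    """Posterior mean and standard deviation of f at x_test (Gaussian noise)."""
    K = se_covariance(x_train, x_train, sigma_f, ell)
    K = K + noise_sd**2 * np.eye(len(x_train))       # add noise variance
    K_s = se_covariance(x_train, x_test, sigma_f, ell)
    K_ss = se_covariance(x_test, x_test, sigma_f, ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))   # guard tiny negatives
    return mean, sd

# Synthetic data with a central gap (all values are illustrative)
rng = np.random.default_rng(1)
x_all = np.linspace(0.0, 10.0, 60)
keep = (x_all < 3.0) | (x_all > 7.0)                 # no data in 3..7
x_train = x_all[keep]
noise_sd = 0.1
y_train = np.sin(x_train) + noise_sd * rng.standard_normal(x_train.size)

x_test = np.linspace(0.0, 10.0, 201)
mean, sd = gp_posterior(x_train, y_train, x_test, noise_sd)
in_gap = (x_test > 3.0) & (x_test < 7.0)
```

Comparing `sd[in_gap]` with `sd[~in_gap]` shows the posterior uncertainty growing inside the gap. Repeating the calculation for different gap widths gives uncertainty as a function of gap size, which is the kind of quantity the experimental-design argument relies on.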