A Bayesian approach to recover signals from measured data

Charles R. Hogg III, Igor Levin, Katharine Mullen

Many scientific investigations require measuring continuous functions.  Examples are ubiquitous and diverse: the scattering function S(Q) in diffraction experiments, the strain field in a stretched metal plate, and many others.  These measurements always involve uncertainty: the complete function would require an infinite number of measurements, and even the measured points are usually subject to noise.  Our goal is to quantify the uncertainty in a measured continuous function, given a finite sample of (possibly noisy) datapoints.

We take a Bayesian approach, describing uncertainties using probability theory.  The central quantity is the posterior probability distribution P(f|y), which gives the probability (density) that the true curve is f, given the noisy datapoints y.  By Bayes' theorem, P(f|y) is proportional to the product of two factors, P(y|f) P(f).  The likelihood P(y|f) gives the probability of generating the observed datapoints, given a known true curve f.  The prior P(f) represents the plausibility of the curve f, given relevant background information.  The likelihood is usually straightforward to write down once a noise model (e.g., Poisson or Gaussian) is chosen.
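As a minimal sketch of how the two factors combine, the Python below evaluates the Poisson log-likelihood log P(y|f) = sum_i [y_i log f_i - f_i - log(y_i!)] for counts y_i measured where the true curve takes values f_i, and adds a log-prior to get log P(f|y) up to an additive constant.  The function names and the `log_prior` callable are hypothetical illustrations, not the authors' code; the prior itself is left abstract here.

```python
import math

def poisson_log_likelihood(counts, rates):
    """log P(y|f) for independent Poisson counts y_i with means f_i > 0."""
    total = 0.0
    for y, f in zip(counts, rates):
        if f <= 0:
            raise ValueError("Poisson rates must be positive")
        # log of f^y * exp(-f) / y!, using lgamma(y + 1) == log(y!)
        total += y * math.log(f) - f - math.lgamma(y + 1)
    return total

def log_posterior_unnormalized(counts, rates, log_prior):
    """log P(f|y) up to an additive constant: log P(y|f) + log P(f).

    log_prior is a hypothetical user-supplied callable returning log P(f).
    """
    return poisson_log_likelihood(counts, rates) + log_prior(rates)
```

Working in log space avoids underflow when many datapoints are multiplied together; the normalizing constant P(y) does not depend on f, so it can be ignored when comparing candidate curves.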

Our main tool for specifying the prior probability P(f) is the Gaussian Process (GP).  A GP views a continuous function as a continuously indexed set of random variables, any finite subset of which has a joint Gaussian distribution.  The core of a GP model is a covariance function k(X, X'), which gives the covariance between the function values at X and X'.  A simple example is the Squared-Exponential (SE) covariance, which is a Gaussian function of the separation (X - X').  Under this covariance, function values are highly correlated for nearby points and nearly independent for distant points.  It allows us to specify robust assumptions, such as continuity and smoothness, without assuming a functional form.  We use variants of the SE covariance throughout, though extending to other covariance functions is straightforward.
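To make the SE covariance concrete, the sketch below uses the common parameterization k(X, X') = a^2 exp(-(X - X')^2 / (2 l^2)), where the amplitude a and length scale l (`amplitude`, `length_scale` in the code) are assumed hyperparameters; the exact variants used in this work may differ.  Because any finite set of function values is jointly Gaussian, one prior curve can be drawn by building the covariance matrix K on a grid, factoring K = L L^T (Cholesky), and computing f = L z with z standard normal.  All function names are hypothetical, and the small diagonal `jitter` is a standard numerical safeguard, not part of the model.

```python
import math
import random

def se_kernel(x1, x2, amplitude=1.0, length_scale=1.0):
    """Squared-Exponential covariance: amplitude^2 * exp(-(x1 - x2)^2 / (2 * length_scale^2))."""
    return amplitude ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))

def covariance_matrix(xs, **kernel_args):
    """Covariance matrix K[i][j] = k(xs[i], xs[j]) for a finite set of points."""
    return [[se_kernel(a, b, **kernel_args) for b in xs] for a in xs]

def cholesky(K, jitter=1e-6):
    """Lower-triangular L with L L^T = K + jitter * I (jitter keeps K numerically positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] + jitter - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def sample_gp_prior(xs, seed=0, **kernel_args):
    """Draw one zero-mean GP prior sample f(xs) ~ N(0, K) as f = L z with z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(covariance_matrix(xs, **kernel_args))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]
```

For example, `sample_gp_prior([0.1 * i for i in range(50)], length_scale=1.0)` returns one smooth random curve on [0, 4.9]; shrinking `length_scale` makes the draws vary more rapidly, which is how the SE covariance encodes smoothness without fixing a functional form.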

Using Gaussian Processes, we can directly generate curves with high P(f|y), including the curve that maximizes this quantity.  We include several examples.  First, we analyze both simulated and experimental scattering datasets with Poisson noise.  We find that Bayesian analysis with Gaussian Process priors gives a good estimate of the underlying true curve, competitive with benchmark techniques such as wavelet smoothing or Adaptive Weights Smoothing (AWS).  Unlike these benchmarks, it also quantifies the uncertainty in its estimate.  We also examined strain data from a stretched metal plate, interpolating the curve inside a central data-gap region where stress was measured instead.  By quantifying the uncertainty as a function of gap size, we were able to improve the experimental design, clearing the way for a better mapping of the stress-strain relationship.
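The sketch below illustrates how a GP quantifies uncertainty inside a data gap.  It assumes independent Gaussian noise with a known standard deviation (`noise_sd`), because then the posterior over f is itself Gaussian, with mean k_*^T K^-1 y and variance k(x_*, x_*) - k_*^T K^-1 k_*, where K includes the noise variance on its diagonal.  This is a deliberate simplification: the Poisson-noise scattering analyses described above need an approximation or numerical optimization that is not attempted here.  The kernel hyperparameters are fixed rather than fitted, and all names are hypothetical.

```python
import math

def se_kernel(x1, x2, amplitude=1.0, length_scale=1.0):
    """Squared-Exponential covariance function."""
    return amplitude ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward_sub_t(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_test, noise_sd=0.1, **kernel_args):
    """Posterior mean and variance of f(x_test) under a zero-mean GP prior
    and independent Gaussian noise with standard deviation noise_sd."""
    K = [[se_kernel(a, b, **kernel_args) + (noise_sd ** 2 if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = backward_sub_t(L, forward_sub(L, y_train))  # alpha = K^-1 y
    means, variances = [], []
    for xs in x_test:
        k_star = [se_kernel(xs, b, **kernel_args) for b in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)  # v = L^-1 k_star, so v.v = k_star^T K^-1 k_star
        variances.append(se_kernel(xs, xs, **kernel_args) - sum(vi * vi for vi in v))
    return means, variances
```

With training points on [0, 2] and [4, 6] only, the posterior variance at x = 3 (the middle of the gap) comes out much larger than at x = 1, where data are dense.  Repeating the calculation while widening the gap shows how the interpolation uncertainty grows with gap size, which is the kind of question that informs experimental design.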