We applied our Bayesian technique to several noisy X-ray scattering datasets, and submitted the results to the Journal of Applied Crystallography. (The paper is currently under review; to request a preprint, please e-mail charles.hogg [at] nist.gov (charles[dot]hogg[at]nist[dot]gov))

Here, we explain to readers of that paper how to reproduce its results.

We are presently working to make GPPois into a polished R package, suitable for public consumption. However, the results in our paper pre-date this effort. They stem from an earlier version of the software which is more *ad hoc* and exploratory.

One theme in our paper was that the basic Gaussian Process approach must be extended to handle crystallographic data. In the new software, we implement these extensions as functions which *use* the basic interface, rather than as modifications to that interface. Presently, we have written such functions for the non-stationary lengthscale ℓ(Q) case (simulated Au dataset), but not for the multiple-lengthscales spiky-on-flat case (experimental TiO2 dataset).

Presently, we only provide instructions for reproducing our results with the new software, since it is much easier to use. We are happy to explain in detail how we obtained the results in the paper, but we're unlikely to take the trouble unless requested. In any case, we could find no significant differences between the output of the new and old software. (We *have* noticed one bug: in the old software, the definition of τ^{2} was off by a factor of 2. This did not materially affect the results; we simply used the equivalent value with the correct definition.)

To reproduce our Au results, please download this .zip with the following files:

The following R code may then be copy-pasted into an R terminal:

source("GPPois.R") # Setup the Dataset object to store our data: au.data.frame <- read.table("simulated_Au_data.txt", header=TRUE, sep="\t") au.data <- Dataset(id="Au", data=au.data.frame, X.names="Q", column="noisy", poisson.names="noisy", data.offset=0) # Find optimal hyperparameters (ell(Q) and sigma.f) which describe the data: ell.bounds <- c(0.1, 20) sigma.f.bounds <- c(1e-7, 30) source("testing.R") opt.hyp <- TrainSEVaryingEllFocusRegions(d=au.data, n.pts.wide=200, decay.length=20, tau.sq=1e-4, hyp.norm=FALSE, hyp.lin=FALSE, model=Model(id='VaryingEll'), varying.ell.bounds=ell.bounds, varying.sigma.f.bounds=sigma.f.bounds); # Construct a Model using the optimized hyperparameters M <- Model(id="varyingEll") M$AddCovariance(CovarianceSEVaryingEll(id="vary", X.ell=opt.hyp$X, ell =VariableParameterValues('ell' , opt.hyp$opt$par), sigma.f=VariableParameterValues('sigma.f', opt.hyp$opt$par))) Au.fit <- M$PosteriorInterval(d=au.data) write.table(file="Au_results.txt", sep="\t", row.names=FALSE, quote=FALSE, Au.fit)

At this point, the file Au_results.txt will contain four columns: the Q-values, the posterior mean, and the curves for plus/minus one standard deviation of the posterior. (i.e., at each point, the true curve lies between lower and upper with the standard 1-sigma probability of 68.3%.)

This dataset cannot yet be analyzed with the new software. To reproduce the results, we are supplying a legacy version of the code. This older code is tentative, messy, and poorly documented. It contains some ideas which are only partly implemented, and others which simply didn't work out. Nevertheless, it does analyze this dataset: whatever polish it may lack, the underlying ideas are the same.

To reproduce the TiO_{2} results, please download this .zip file containing the necessary scripts and datafiles.

To reproduce the data, simply unzip the .zip file, run R, and type

source("reproduce.R")at the prompt. Alternatively, the commands can be copy-pasted from the script, one at a time, to gain understanding.

** NOTE:** This datafile contains (N = 10679) datapoints. Analysis requires numerous (N x N) matrices, each of which takes up almost half a gigabyte. Beware running out of RAM! We suggest a bare minimum of 4 GB RAM; more is much better. The author finds the Au dataset is easily analyzed on his personal workstation (4 GB RAM), but the TiO

Created December 7, 2011, Updated August 15, 2023