Principal Component Analysis, Scatter Diagrams and Color Overlays
for analysis of Compositional Maps
Compositional mapping often produces a number of registered images, each representing the concentration of one element or chemical constituent. The images can be created with a variety of analytical instruments: the electron probe microanalyzer, the analytical electron microscope, and the ion microscope, for example.
These images are usually first viewed as a series of gray level images. These give a general idea of the chemical distribution in the sample, but working out the phases of the material, or other relationships among more than a few images, can be confusing. Figure 1, showing a series of maps of a garnet mineral, is an example:
Figure 1
Color overlays of selected images help identify the phases in the material. The concentrations fall into distinct groupings according to the phases. The quantitative aspects of this are brought out in the Scatter Diagram (Concentration Histogram Image).
These tools are useful for looking at three or fewer maps. When there are four or more maps, some of the analytical information in the data set is missing from any one display (unless each skipped map happens to be linearly dependent on the ones that are shown).
Explanation of PCA in terms of scatter diagrams.
Principal component analysis is a method for reducing the dimensions, or components, of the data set. Practically speaking, this means that the number of images in the data set can be reduced. Color overlays and scatter diagrams can then be used on the most important components to identify the phases and to measure the signal-to-noise ratio.
PCA chooses linear combinations of the components of the data set (the images) such that the new components show the most variance in the data: the first component has the greatest variance, the second the next greatest, and so on. The linear combinations are special ones, in that they amount to a rotation of the scatter diagram in N-dimensional space (for N original images).
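The rotation described above can be sketched with NumPy. This is a minimal illustration, not the software used to produce the figures: it assumes the maps are stacked in one array, and the function name `pca_images` and the random test data are hypothetical. The eigenvector matrix is orthogonal, so it is a rotation, possibly combined with a reflection.

```python
import numpy as np

def pca_images(maps):
    """PCA of a stack of registered maps with shape (N, rows, cols).

    Returns the principal component images, the eigenvalues (variances,
    greatest first), and the matrix whose columns are the eigenvectors.
    """
    n, rows, cols = maps.shape
    # Each pixel is one point in N-dimensional concentration space.
    pixels = maps.reshape(n, -1).astype(float)
    centered = pixels - pixels.mean(axis=1, keepdims=True)
    cov = np.cov(centered)                    # N x N covariance between maps
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # greatest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rotating the pixel cloud gives the principal component images.
    pcs = eigvecs.T @ centered
    return pcs.reshape(n, rows, cols), eigvals, eigvecs

# Hypothetical test data: three unrelated 16 x 16 Poisson images.
rng = np.random.default_rng(0)
maps = rng.poisson(50, size=(3, 16, 16))
pcs, eigvals, eigvecs = pca_images(maps)
```

Each eigenvalue is the variance of the corresponding principal component image, which is why ordering by eigenvalue puts the most informative components first.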
For example, Figure 2 shows a computer-simulated example of two maps of a region of a sample with two phases - the background area and the disk to the upper right:
Figure 2 Simulated maps of element A (left) and B (right)
There could be more than two phases here, but that would be unlikely, because the composition of either element A or B would then be altered enough for the differences to be seen in these maps. Let us assume, for the moment, that only two phases are present.
A color overlay of these two maps shows two distinct hues (Figure 3), as expected. The disk is rich in element A, and the background in element B, giving the regions in Figure 3 the appropriate hues.
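A color overlay of this kind can be sketched as follows. The function name `color_overlay`, the choice of red for element A and green for element B, the independent contrast stretch of each map, and the noise-free disk geometry are all assumptions made for illustration, not details taken from the figures.

```python
import numpy as np

def color_overlay(red_map, green_map, blue_map=None):
    """Stack up to three maps into an RGB image with values in [0, 1]."""
    def scale(img):
        # Stretch each map independently to the full 0..1 range.
        img = np.asarray(img, dtype=float)
        span = img.max() - img.min()
        return (img - img.min()) / span if span > 0 else np.zeros_like(img)
    if blue_map is None:
        blue_map = np.zeros_like(np.asarray(red_map, dtype=float))
    return np.dstack([scale(red_map), scale(green_map), scale(blue_map)])

# Hypothetical two-phase sample: a disk to the upper right of a background.
yy, xx = np.mgrid[0:64, 0:64]
disk = (xx - 44) ** 2 + (yy - 20) ** 2 < 12 ** 2
map_a = np.where(disk, 100.0, 20.0)   # element A: rich in the disk
map_b = np.where(disk, 20.0, 80.0)    # element B: rich in the background
rgb = color_overlay(map_a, map_b)
```

With this assignment the disk appears red and the background green. Because each map is stretched separately, the hues identify the phases but do not give the proportions of the elements.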
The quantitative relationships between the elements (for example, the proportions of element A to element B in each of the phases) are not easily read from the color overlay above, or even from the individual images in Figure 2. A scatter diagram, or bivariate histogram (Figure 4), also called a Concentration Histogram Image (Bright and Newbury, 1991), immediately makes the relationship clear.
In this scatter diagram, each pixel in the maps is associated with two concentration values, one for each element. These values serve as coordinates in the scatter plot, so the count for the corresponding location, or bin, in the scatter plot is incremented once for each pixel. We display the scatter plots themselves as images, so the bin locations are discrete. (This corresponds well with the maps, where the intensity values are also counts, that is, discrete integers.) Brighter areas in the scatter plot show many counts per bin and correspond to larger areas in the original maps. Larger spots in the scatter plot indicate a spread in the concentration values of the maps: Poisson noise was added when the computer simulations were constructed. Since the amount of noise goes as √n, the spread in the counts (n) is larger for larger concentration values - this makes the spots elliptical in the scatter plot, rather than circular. The disk, with smaller area and higher concentration of element A, corresponds to the dimmer spot to the lower right. The background area corresponds to the brighter spot in the upper left.
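The binning just described can be sketched as below. The function name `scatter_diagram`, the mean counts (20 and 100 for element A, 80 and 20 for element B) and the disk geometry are hypothetical; they are chosen only to resemble the simulated maps. Rows index element B and columns index element A, so whether the disk appears to the lower right depends on how the array is displayed.

```python
import numpy as np

def scatter_diagram(map_a, map_b):
    """Bivariate histogram of two integer count maps.

    Rows index the counts of element B, columns those of element A.
    """
    size = int(max(map_a.max(), map_b.max())) + 1
    hist = np.zeros((size, size), dtype=np.int64)
    # Increment one bin per pixel; add.at handles repeated coordinates.
    np.add.at(hist, (map_b.ravel(), map_a.ravel()), 1)
    return hist

# Hypothetical two-phase sample with Poisson counting noise.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:64, 0:64]
disk = (xx - 44) ** 2 + (yy - 20) ** 2 < 12 ** 2
map_a = rng.poisson(np.where(disk, 100, 20))
map_b = rng.poisson(np.where(disk, 20, 80))

hist = scatter_diagram(map_a, map_b)
# The brightest bin belongs to the phase with the larger area (background).
peak_b, peak_a = np.unravel_index(hist.argmax(), hist.shape)
```

Within the disk, the spread of the element A counts is larger than that of the element B counts, because the mean count is larger. This is the elliptical shape noted above.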
When there are only two chemical components present, the amount of one can be determined from the amount of the other, since both must sum to 100%. This means that all of the chemical information is contained in one of the maps. Another way of saying this is that the scatter diagram in Figure 4 can be rotated so that the spots are separated only along one axis:
Figure 5: Scatter diagram of principal components of Figure 4
Figure 6: Figure 5 scaled more to match Figure 4
The horizontal axis in Figure 5 carries the chemical information, because it is along this axis that one differentiates between one cluster of bins and the other, that is, between the two phases in the material. The vertical axis shows only the widths of the clusters, which are due to noise - Poisson counting statistics in x-ray maps, and the random noise added to these simulated maps. The image that corresponds to the horizontal axis should then show the two phases, while the image corresponding to the vertical axis should show only random noise. These maps are shown in Figure 7.
Figure 7a,b Images of 1st (a) and 2nd (b) principal component
Because principal components are images, each can be inspected visually to see whether it contains any spatial chemical information. This adds confidence that no chemical information is lost when the smaller principal components are discarded. Here, we can discard the second principal component and view only the first, which clearly shows the two phases. The scatter diagram of Figure 5 now becomes just a histogram of the remaining principal component, Figure 7a, where the larger peak on the left, with the smaller intensities, represents the background, and the other peak represents the disk.
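The two-phase example can be reproduced in outline as follows. This is a sketch under assumed mean counts and geometry, not the simulation behind Figures 2 to 7. It computes both principal components, histograms the first, and checks that a simple threshold on the first component recovers the disk.

```python
import numpy as np

# Hypothetical two-phase maps with Poisson noise (element A, element B).
rng = np.random.default_rng(2)
yy, xx = np.mgrid[0:64, 0:64]
disk = (xx - 44) ** 2 + (yy - 20) ** 2 < 12 ** 2
maps = np.stack([rng.poisson(np.where(disk, 100, 20)),
                 rng.poisson(np.where(disk, 20, 80))]).astype(float)

pixels = maps.reshape(2, -1)
centered = pixels - pixels.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # greatest first
pc1, pc2 = (eigvecs.T @ centered).reshape(2, 64, 64)

# Histogram of the first component: one peak per phase.
counts, edges = np.histogram(pc1, bins=64)

# A threshold midway between the extremes separates the phases. The sign
# of an eigenvector is arbitrary, so the disk may fall on either side.
mask = pc1 > 0.5 * (pc1.min() + pc1.max())
agreement = max(np.mean(mask == disk), np.mean(mask != disk))
```

The first eigenvalue is many times the second, which is the numerical counterpart of seeing structure in the first component image and only noise in the second.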
If there are only two elements in the sample, it does not matter how many phases are present (although, in equilibrium, I think there can be only two): all of the phases can be represented by only one principal component.
The principal component images are linear combinations of the originals that are linearly independent. They can be thought of as a rotation of the scatter diagram (in whatever dimensional space is appropriate). This means that the scatter diagram can be rotated back, and that the original images can be reconstructed exactly from the principal component images. What happens if the original images are reconstructed from only some of the principal components? If the components are chosen properly (they are given in order of their eigenvalues), then the original images will be restored, less some random noise, but with all of the relevant chemical information.
Figure 9 shows two elements and four phases, or areas of unique composition.
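The reconstruction described above, from all or only some of the principal components, can be sketched as follows. The two-phase test maps and the helper names `reconstruct` and `rms` are hypothetical. The noise-free `truth` array is available only because the data are simulated.

```python
import numpy as np

# Hypothetical noise-free maps and their noisy (Poisson) counterparts.
rng = np.random.default_rng(3)
yy, xx = np.mgrid[0:64, 0:64]
disk = (xx - 44) ** 2 + (yy - 20) ** 2 < 12 ** 2
truth = np.stack([np.where(disk, 100.0, 20.0), np.where(disk, 20.0, 80.0)])
maps = rng.poisson(truth).astype(float)

pixels = maps.reshape(2, -1)
mean = pixels.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(pixels - mean))
eigvecs = eigvecs[:, ::-1]                 # greatest eigenvalue first
pcs = eigvecs.T @ (pixels - mean)          # rotate into PC space

def reconstruct(keep):
    """Rotate back using only the first `keep` principal components."""
    return (eigvecs[:, :keep] @ pcs[:keep] + mean).reshape(maps.shape)

def rms(a, b):
    """Root-mean-square difference between two image stacks."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

full = reconstruct(2)      # all components: the original maps, exactly
reduced = reconstruct(1)   # first component only: some noise removed
```

In this sketch `rms(reduced, truth)` is smaller than `rms(maps, truth)`: dropping the second component removes the part of the noise that lies across the line joining the two clusters, while the part along the line remains.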
The scatter diagram for the pair of images in Figure 9 is shown in Figure 10, which, when rotated, looks like Figure 11 - again essentially a one-dimensional scatter diagram, which can be plotted as an ordinary histogram.
The two principal component images look like this:
The same principle applies to three elements, as long as the clusters still line up in the scatter diagram. As an example, consider the following three maps:
The three images are the same except for random noise. This situation is unphysical because there would have to be at least one more component to make up the difference - but this is for illustration only. The three dimensional scatter diagram looks like this:
The clusters in the diagram lie along the line from the origin (which is away from the viewer) to the opposite corner (which is to the upper right, near the viewer). Since the three images are essentially the same, there is really only one principal component, as can be seen by inspecting the principal component images:
The scatter diagram for these three images is just the diagram of Figure 14, rotated so that the clusters fall along the first (x) axis. Since only one dimension is relevant, the information can be seen in a regular (one-dimensional) histogram:
Backing up a step, consider two maps with four clusters in line (so that only one component would be needed to describe them) and an additional cluster off the line (so that a second component is necessary):
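The case of three maps that differ only by noise can be checked numerically. The sketch below assumes a hypothetical layout of four quadrants with mean counts 20, 60, 100 and 140, which is not the layout of the figures.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical layout: four quadrants with different mean counts.
base = np.kron(np.array([[20, 60], [100, 140]]), np.ones((32, 32)))
# Three independent Poisson realizations of the same underlying image.
maps = rng.poisson(base, size=(3, 64, 64)).astype(float)

pixels = maps.reshape(3, -1)
centered = pixels - pixels.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # greatest first

# Direction of the first principal axis in the three-map space.
first_axis = eigvecs[:, 0]

# One-dimensional histogram of the only relevant component.
pc1 = first_axis @ centered
counts, edges = np.histogram(pc1, bins=100)
```

The components of `first_axis` are all close to 1/√3 in magnitude, that is, the axis runs along the diagonal of the cube, and the second and third eigenvalues are small compared with the first.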
The scatter diagram looks like this:
Principal component analysis still rotates the scatter diagram so that most of the variance is along the first (x) axis, but there is still significant variance (the last cluster, in fact) along the second PC axis, so that here both principal components are needed to describe the data. PCA offers no reduction in dimensionality.
This can also be seen by visually inspecting the two principal component images. Both show significant visual (chemical) information and must be kept:
Figure 20 illustrates what sometimes happens in PCA: the first component has the information for three of the phases, and the second has information largely for the last phase. One component is often associated with a particular feature or phase of the material.
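The effect of the off-line cluster on the eigenvalues can be illustrated with simulated pixel values. The cluster centers, the equal number of pixels per phase, and the function name `eigenvalue_ratio` are assumptions for this sketch. The pixels are generated as a flat list rather than as images, since only the pairs of values enter the covariance.

```python
import numpy as np

def eigenvalue_ratio(centers, rng, pixels_per_phase=800):
    """Ratio of the smaller to the larger PCA eigenvalue for two maps."""
    means = np.repeat(np.asarray(centers, dtype=float),
                      pixels_per_phase, axis=0)
    counts = rng.poisson(means).T                  # shape (2, n_pixels)
    eigvals = np.linalg.eigvalsh(np.cov(counts))   # ascending order
    return eigvals[0] / eigvals[1]

rng = np.random.default_rng(5)
# Hypothetical mean counts (element A, element B) for each phase.
in_line = [(20, 80), (40, 60), (60, 40), (80, 20)]
with_outlier = in_line + [(80, 80)]

ratio_line = eigenvalue_ratio(in_line, rng)       # noise only in the 2nd PC
ratio_off = eigenvalue_ratio(with_outlier, rng)   # 2nd PC carries a phase
```

With the clusters in line, the second eigenvalue is a small fraction of the first. Adding the fifth cluster raises that fraction several times over, so the second component can no longer be discarded.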
This same sequence can be illustrated in three dimensions. This time we need four clusters that are planar in the scatter diagram:
In Figure 21, the four clusters lie on a plane that passes through the three corners of the cube, other than the origin, that lie on an axis. The central, brighter cluster, representing the background of the images, lies in the exact center of the cube. The images corresponding to Figure 21 are:
Since the clusters in Figure 21 lie in a plane, the axes can be rotated so that this plane becomes the new x-y plane. The third principal component then contains mostly the noise (the spread in the clusters):
Since the scatter diagram in Figure 23 is essentially planar, it is best displayed as a two dimensional diagram:
and, reasonably enough, only two of the three principal component images retain any chemical information:
The next example is similar to Figure 21, except that the cluster in the center of the cube has been moved out of the plane toward the origin, so that the scatter diagram is no longer inherently planar (two-dimensional):
The images from which this diagram came are shown here. The background intensity was adjusted to move the central cluster:
PCA again does a rotation (which seems to be somewhat arbitrary), but the scatter diagram is still inherently three-dimensional:
And all three PC images contain chemical information:
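The planar and non-planar cases above can be compared through their eigenvalues. The cluster centers below are hypothetical: three clusters and a central one whose counts sum to 150, so that they lie on one plane, and then the central cluster moved toward the origin. Equal pixel numbers per phase and the function name `pca_eigenvalues` are likewise assumptions.

```python
import numpy as np

def pca_eigenvalues(centers, rng, pixels_per_phase=1000):
    """PCA eigenvalues (greatest first) for simulated three-map clusters."""
    means = np.repeat(np.asarray(centers, dtype=float),
                      pixels_per_phase, axis=0)
    counts = rng.poisson(means).T                  # shape (3, n_pixels)
    return np.linalg.eigvalsh(np.cov(counts))[::-1]

rng = np.random.default_rng(6)
# Hypothetical mean counts for three outer clusters (each sums to 150).
corners = [(100, 25, 25), (25, 100, 25), (25, 25, 100)]

planar = pca_eigenvalues(corners + [(50, 50, 50)], rng)    # all on one plane
shifted = pca_eigenvalues(corners + [(30, 30, 30)], rng)   # center moved
```

In the planar case the third eigenvalue is at the level of the counting noise. Once the central cluster leaves the plane, the third eigenvalue grows well above that level, and all three components are needed.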
PCA discards, or pays no attention to, visual information or spatial arrangement.
The areas of unique composition can be placed at random throughout the image. In other words, the pieces can be scrambled like a jigsaw puzzle, and the scatter diagrams and eigenvalues for the PCs would remain unchanged.
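This invariance is easy to verify numerically. The sketch below scrambles the pixels of two hypothetical maps with the same permutation and compares the eigenvalues and the scatter diagram before and after. The helper names `eigenvalues` and `scatter` are illustrative only.

```python
import numpy as np

# Hypothetical two-phase maps with Poisson noise.
rng = np.random.default_rng(7)
yy, xx = np.mgrid[0:64, 0:64]
disk = (xx - 44) ** 2 + (yy - 20) ** 2 < 12 ** 2
maps = np.stack([rng.poisson(np.where(disk, 100, 20)),
                 rng.poisson(np.where(disk, 20, 80))])

# Scramble the pixels, applying the same permutation to both maps so that
# each pixel keeps its pair of concentration values.
order = rng.permutation(maps[0].size)
scrambled = maps.reshape(2, -1)[:, order].reshape(maps.shape)

def eigenvalues(stack):
    """PCA eigenvalues of an image stack, in ascending order."""
    return np.linalg.eigvalsh(np.cov(stack.reshape(stack.shape[0], -1)))

def scatter(stack):
    """Scatter diagram (bivariate histogram) of a two-map stack."""
    hist, _, _ = np.histogram2d(stack[0].ravel(), stack[1].ravel(),
                                bins=160, range=[[0, 160], [0, 160]])
    return hist
```

The scrambled maps no longer show a disk, yet `eigenvalues` and `scatter` give the same results for `maps` and `scrambled`.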
Bright, D.S. and Newbury, D.E. (1991) "Concentration Histogram Imaging", Analytical Chemistry 63(4):243A-250A (Feb. 15).