This paper introduces a novel methodology that recognizes distributional functions without fitting the data to the distribution. The methodology combines the technique of equation signatures, which uses coefficient-independent properties of equations for recognition, with a machine learning algorithm, C4.5, to obtain accurate recognition of distributional functions. The signature of a distributional function is a linear probability plot. Thus this methodology automates the formation and interpretation of probability plots. This combination is shown to be more accurate than probability plots alone for samples of size one hundred on a group of eighteen noise models comprising four symmetric distributions of varying tail lengths and fourteen noise models drawn from the extreme value family. An analysis of the C4.5 rules indicates how to create the most predictive attributes for the general case.
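As a rough illustration of the probability-plot signature described above, the following Python sketch scores how linear a sample's probability plot is against a few candidate distributions and picks the most linear one. It covers only the "probability plots alone" baseline, not the C4.5 combination studied in the report. The use of a correlation coefficient as the linearity measure, the plotting-position formula, the candidate set, and all function names (`ppcc`, `recognize`, `QUANTILES`) are assumptions made for illustration and are not taken from the report.

```python
import math
import random
from statistics import NormalDist

# Quantile (inverse CDF) functions of standardized candidate distributions.
# The candidate set is illustrative only; it is not the report's eighteen models.
QUANTILES = {
    "normal": NormalDist().inv_cdf,
    "uniform": lambda p: p,
    "logistic": lambda p: math.log(p / (1.0 - p)),
    "gumbel_max": lambda p: -math.log(-math.log(p)),
}


def plotting_positions(n):
    # Assumed plotting positions (i - 0.375) / (n + 0.25), all strictly in (0, 1).
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]


def pearson(xs, ys):
    # Plain Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ppcc(sample, quantile):
    # Correlation between the ordered sample and the theoretical quantiles.
    # Values near 1 mean the probability plot is close to a straight line,
    # regardless of the sample's location and scale.
    ordered = sorted(sample)
    theoretical = [quantile(p) for p in plotting_positions(len(ordered))]
    return pearson(theoretical, ordered)


def recognize(sample):
    # Return the candidate with the most linear probability plot, plus all scores.
    scores = {name: ppcc(sample, q) for name, q in QUANTILES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    random.seed(0)
    # One hundred draws from a Gumbel (extreme value) distribution via inverse CDF.
    draws = [5.0 - 2.0 * math.log(-math.log(random.random())) for _ in range(100)]
    best, scores = recognize(draws)
    print(best, {name: round(score, 4) for name, score in scores.items()})
```

Because the correlation coefficient is unchanged by shifting or rescaling the sample, no location or scale parameters need to be estimated, which is the sense in which a recognition step of this kind avoids fitting the data to the distribution.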
Citation: NIST Interagency/Internal Report (NISTIR) - 6187
NIST Pub Series: NIST Interagency/Internal Report (NISTIR)
Pub Type: NIST Pubs
Keywords: decision tree, distribution finding, equation discovery, equation signatures, function finding, learning, probability plotting