Significance tests for multivariate normality of clusters from branching patterns in dendrograms
Authors: P. H. A. Sneath
Affiliation: Department of Microbiology, Leicester University, Leicester, England
Abstract: A significance test is presented for whether a cluster comes from a multivariate normal distribution, based on the levels of the branches in a dendrogram. The method compares the observed cumulative graph of the number of branches with a graph derived from a simple logistic function. Provided the number of objects or variables is not small, the difference between the two graphs can be tested with the Kolmogorov-Smirnov, Cramér-von Mises, and Lilliefors statistics.

Logistic functions, obtained by simulation, are available for three similarity measures and five cluster methods. The similarity measures are (1) Euclidean distances, (2) squared Euclidean distances, and (3) simple matching coefficients; the cluster methods are (1) WPGMA, (2) UPGMA, (3) single linkage (or minimum spanning trees), (4) complete linkage, and (5) Ward's increase in sums of squares. For the simple matching coefficient, the mean intracluster similarity is also required.

The method also allows a test of whether the dendrogram could have come from a cluster of smaller dimensionality because of correlations among characters. A good fit of the data to an abnormally large or small dimensionality is an important warning when interpreting the dendrogram. Simulation showed that the quantiles of the test statistics are well approximated by logistic functions. The Lilliefors test is recommended for general use; if a conservative test is required, the two-tailed Kolmogorov-Smirnov test is the most suitable. The method can be carried out with a hand calculator, and a computer program for it is available from the author.
Keywords: classification; cluster analysis; significance tests; multivariate normality; minimum spanning trees
This article is indexed in SpringerLink and other databases.
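The abstract compares an observed cumulative graph of branch levels with a logistic curve and measures the discrepancy with goodness-of-fit statistics. The sketch below illustrates that kind of comparison under stated assumptions; it is not Sneath's procedure or program. It assumes single linkage on Euclidean distances, whose branch levels equal the edge lengths of a minimum spanning tree. It treats the normalized cumulative count of branches as an empirical distribution function and computes the Kolmogorov-Smirnov D and Cramér-von Mises W² against a logistic CDF. All function names and the logistic location `mu` and scale `s` are hypothetical. The paper's logistic functions come from simulations that are not reproduced here. Because the demo estimates `mu` and `s` from the same levels, ordinary Kolmogorov-Smirnov critical values would not apply, which is the situation the Lilliefors approach addresses. No p-values are computed.

```python
"""Sketch: compare dendrogram branch levels with a logistic curve.

Assumptions (not from the paper): single linkage on Euclidean
distances, and hypothetical logistic parameters mu and s.
"""
import functools
import math
import random
import statistics


def single_linkage_levels(points):
    """Sorted branch levels of a single-linkage dendrogram.

    These equal the edge lengths of a minimum spanning tree,
    found here with Prim's algorithm in O(n^2) distance evaluations.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each vertex to the tree
    best[0] = 0.0
    levels = []
    for step in range(n):
        # Add the closest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            levels.append(best[u])  # length of the MST edge that joined u
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(levels)


def logistic_cdf(x, mu, s):
    """Logistic distribution function, written to avoid exp() overflow."""
    z = (x - mu) / s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ks_statistic(sorted_x, cdf):
    """Two-sided Kolmogorov-Smirnov D between the empirical CDF and cdf."""
    m = len(sorted_x)
    d = 0.0
    for i, x in enumerate(sorted_x, start=1):
        f = cdf(x)
        d = max(d, i / m - f, f - (i - 1) / m)
    return d


def cramer_von_mises(sorted_x, cdf):
    """Cramér-von Mises W^2 = 1/(12m) + sum(((2i-1)/(2m) - F(x_i))^2)."""
    m = len(sorted_x)
    return 1.0 / (12 * m) + sum(
        ((2 * i - 1) / (2 * m) - cdf(x)) ** 2
        for i, x in enumerate(sorted_x, start=1)
    )


if __name__ == "__main__":
    rng = random.Random(1)
    # 30 objects drawn from a 5-dimensional standard normal distribution.
    pts = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
    levels = single_linkage_levels(pts)
    # Placeholder logistic: median as location; logistic quartiles lie at
    # mu +/- s*ln(3), so the interquartile range gives a scale estimate.
    q1, q2, q3 = statistics.quantiles(levels, n=4)
    cdf = functools.partial(logistic_cdf, mu=q2, s=(q3 - q1) / (2 * math.log(3)))
    print(f"D = {ks_statistic(levels, cdf):.3f}, "
          f"W2 = {cramer_von_mises(levels, cdf):.3f}")
```

The MST shortcut applies only to single linkage. For the other cluster methods listed in the abstract, the branch levels would have to come from the corresponding agglomerative procedure before the same comparison is made.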