The chi-square plot: a tool for multivariate outlier recognition
Authors: Robert G Garrett
Abstract: In large multi-element regional surveys, statistically derived threshold levels of the form that define, for example, the top 2% of the data for each element as worthy of further investigation have led to the generation of inordinately large lists of geochemical samples for detailed study. This problem is compounded when a number of geological and secondary environments exist of sufficiently different character that separate thresholds should be estimated for each. Additionally, single-element thresholds for multi-element surveys can, in certain circumstances, lead to obviously out-of-character individuals not being recognized.
Numerical approaches to the problem of anomaly recognition have commonly used a principal-component or regression analysis procedure as their basis. These, as indeed do all such approaches, have a common drawback in that the outliers being sought can distort the analysis being used to detect them. In addition, regression models have the further problem that there may be outliers in both the response and explanatory variables.
A relatively simple approach would be to prepare a multivariate cumulative probability plot where each multi-element geochemical sample is represented as a single value. The resulting diagram would be interpreted much as a univariate probability plot, where the presence of more than one straight-line segment is taken as evidence of multiple populations, and outliers as individuals or small groups are separated from the remaining data by gaps on the plot.
Such a diagram may be prepared by plotting the rank-ordered values of the generalized or Mahalanobis distance, a multivariate distance measure, versus values of the chi-square statistic. This procedure is based on the covariance matrix of the data, a measure that underlies both principal-component and regression model approaches. In order to work effectively, a statistically robust starting covariance matrix is essential.
The procedure is described in detail with two examples, one a synthetic bivariate data set containing known outliers, and the other a small, well-studied stream sediment data set from Norway extensively used in methodological comparison studies. The result of the procedure is to identify statistical outliers, which are candidates for interpretation as true geochemical anomalies, and to isolate a multi-element subset that is representative of the geochemical background.
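The plotting step described in the abstract can be illustrated with a minimal sketch for the bivariate case. It is not the paper's implementation. The squared Mahalanobis distance d² = (x − m)ᵀ S⁻¹ (x − m) is computed for each sample, the distances are rank-ordered, and each is paired with a chi-square quantile. Several things here are assumptions made only for illustration: the function names, the synthetic data, the plotting position (rank − 0.5)/n, and the use of the classical mean and covariance instead of the robust starting estimate the abstract calls for. For two variables the chi-square quantile with 2 degrees of freedom has the closed form −2 ln(1 − p), which keeps the sketch within the Python standard library.

```python
import math
import random


def mean_and_cov(points):
    """Classical (non-robust) mean and 2x2 sample covariance of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis(point, mean, cov):
    """Squared Mahalanobis distance, with the 2x2 inverse written out."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def chi_square_plot_points(points, mean, cov):
    """Return (chi-square quantile, ordered d^2, sample index) triples.

    mean and cov are passed in so that a robust estimate can be substituted.
    """
    n = len(points)
    ordered = sorted(
        (squared_mahalanobis(p, mean, cov), i) for i, p in enumerate(points)
    )
    triples = []
    for rank, (dist, idx) in enumerate(ordered, start=1):
        prob = (rank - 0.5) / n                 # plotting position (assumed)
        quantile = -2.0 * math.log(1.0 - prob)  # chi-square quantile, 2 d.f.
        triples.append((quantile, dist, idx))
    return triples


# Hypothetical synthetic data: 100 correlated background samples
# plus three planted outliers at indices 100, 101 and 102.
random.seed(0)
data = []
for _ in range(100):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    data.append((z1, 0.8 * z1 + 0.6 * z2))
data += [(6.0, -6.0), (7.0, -5.0), (6.5, -6.5)]

mean, cov = mean_and_cov(data)  # classical estimate, for brevity only
pairs = chi_square_plot_points(data, mean, cov)

# The upper end of the plot, where gaps or departures from a line would show.
for quantile, dist, idx in pairs[-5:]:
    print(f"sample {idx:3d}  chi2 quantile {quantile:6.2f}  d^2 {dist:6.2f}")
```

Plotting the second element of each triple against the first gives the diagram; samples separated from the rest by a gap at the upper end are outlier candidates. Because the classical covariance is itself inflated by the outliers, a practical version would pass a robust mean and covariance to `chi_square_plot_points` instead.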
Keywords:
This article is indexed in ScienceDirect and other databases.