Summary statistics in geochemistry: A study of the performance of robust estimates |
Authors: | Nicholas M. S. Rock |
Affiliation: | (1) Department of Geology, University of Western Australia, 6009 Nedlands, Western Australia |
Abstract: | Numerical data summaries in many geochemical papers rely on arithmetic means, with or without standard deviations. Yet the mean is the worst average (estimate of location) for the extremely common geochemical data sets that are non-normally distributed or include outliers. The widely used geometric mean, although it allows for skewed distributions, is equally susceptible to outliers. The superior performance of 19 robust estimates of location (simple median, plus various combined, adaptive, trimmed, and skipped L, M, and W estimates) is illustrated using real geochemical data sets varying in sources of error (pure analytical error to multicomponent geological variability), modality (unimodal to polymodal), size (20 to >2000 data values), and continuity (continuous to truncated in either or both tails). The arithmetic mean tends to overestimate the location of many geochemical data sets because of positive skew and large outliers; robust estimates yield consistently smaller averages, although some (e.g., Hampel's and Andrews') perform better than others (e.g., Shorth mean, dominant cluster mode). Recommended values for international standard rocks, and for such important geochemical concepts as average chondrite, can be reproduced far more simply via robust estimation on complete interlaboratory data sets than via the rather complicated and subjective methods (e.g., laboratory ratings) so far used in the literature. Robust estimates also seem generally less affected by truncation than the mean; for example, similar averages are obtained whether values below machine detection limits are treated as missing values or as real values of zero. The standard (and mean) deviations yield consistently larger values of scale for many geochemical data sets than the hinge width (interquartile range) or the median absolute deviation from the median.
Therefore, summaries of geochemical data should always include at least the simple median and hinge width, to complement the often misleading mean and standard deviation. |
Keywords: | location, mean, median, nonparametric, normality, outlier, robust estimation, scale, truncation |
This article is indexed in SpringerLink and other databases.
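As a rough illustration of the contrasts described in the abstract, the following Python sketch computes classical and robust summaries for a small data set. The data values are invented for illustration and are not taken from the study. The estimator choices are also assumptions rather than the paper's exact methods: a 10% trimmed mean stands in for the L-estimates, Andrews' wave (with the commonly quoted tuning constant a = 1.339 and a fixed MAD-based scale) stands in for the M-estimates, the hinge width uses Tukey's hinges, and the MAD is unscaled. The helper names (`trimmed_mean`, `andrews_location`, `hinge_width`, `mad`, `summarize`) are hypothetical.

```python
"""Classical vs robust summaries for a small, made-up geochemical sample.

All data values are hypothetical. The estimators are textbook versions;
the variants and tuning constants used in the paper may differ.
"""
import math
import statistics


def trimmed_mean(xs, proportion=0.1):
    """L-estimate: mean after dropping `proportion` of the values from each tail."""
    s = sorted(xs)
    k = int(len(s) * proportion)  # number of values removed from each end
    return statistics.fmean(s[k:len(s) - k])


def andrews_location(xs, a=1.339, tol=1e-9, max_iter=100):
    """M-estimate of location with Andrews' wave, via iteratively reweighted means.

    Assumptions: scale is fixed at MAD / 0.6745 around the median, and
    a = 1.339 is a commonly quoted tuning constant, not one taken from the paper.
    """
    t = statistics.median(xs)  # robust starting value
    scale = statistics.median(abs(x - t) for x in xs) / 0.6745
    if scale == 0:
        return t
    for _ in range(max_iter):
        weights = []
        for x in xs:
            u = (x - t) / scale  # standardized residual
            if u == 0:
                weights.append(1.0)
            elif abs(u) <= a * math.pi:
                weights.append(math.sin(u / a) / (u / a))
            else:
                weights.append(0.0)  # far outliers are rejected entirely
        new_t = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_t - t) < tol * scale:
            return new_t
        t = new_t
    return t


def hinge_width(xs):
    """Tukey hinge width: upper hinge minus lower hinge (close to the IQR)."""
    s = sorted(xs)
    n = len(s)
    lower = statistics.median(s[:(n + 1) // 2])  # median of lower half
    upper = statistics.median(s[n // 2:])        # median of upper half
    return upper - lower


def mad(xs):
    """Median absolute deviation from the median (unscaled)."""
    m = statistics.median(xs)
    return statistics.median(abs(x - m) for x in xs)


def summarize(xs):
    """Classical and robust estimates of location and scale."""
    return {
        "mean": statistics.fmean(xs),
        "geometric_mean": statistics.geometric_mean(xs),
        "median": statistics.median(xs),
        "trimmed_mean_10pct": trimmed_mean(xs, 0.1),
        "andrews_wave": andrews_location(xs),
        "std_dev": statistics.stdev(xs),
        "hinge_width": hinge_width(xs),
        "mad": mad(xs),
    }


# Hypothetical trace-element values (ppm) with one large outlier.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 240]

# Hypothetical values with two results below the detection limit (None).
reported = [None, None, 3, 4, 5, 5, 6, 7, 8, 30]
as_missing = [x for x in reported if x is not None]
as_zero = [0 if x is None else x for x in reported]

if __name__ == "__main__":
    for name, value in summarize(sample).items():
        print(f"{name:>20}: {value:8.3f}")
    for label, xs in (("missing", as_missing), ("zero", as_zero)):
        print(f"<DL as {label}: mean={statistics.fmean(xs):.2f}, "
              f"median={statistics.median(xs):.2f}")
```

With these made-up values, the single large outlier pulls the arithmetic mean to 38.7 and the standard deviation to about 71, and the geometric mean (about 21) is also inflated, while the median (16.5), the trimmed mean and the Andrews estimate all stay between 16 and 17, and the hinge width (4) and MAD (2) stay small. Treating the two below-detection-limit results as missing rather than as zero shifts the mean from 6.8 to 8.5 but the median only from 5.0 to 5.5.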