Summary statistics in geochemistry: A study of the performance of robust estimates
Authors:Nicholas M. S. Rock
Affiliation:(1) Department of Geology, University of Western Australia, 6009 Nedlands, Western Australia
Abstract: Numerical data summaries in many geochemical papers rely on arithmetic means, with or without standard deviations. Yet the mean is the worst average (estimate of location) for those extremely common geochemical data sets which are non-normally distributed or include outliers. The widely used geometric mean, although allowing for skewed distributions, is equally susceptible to outliers. The superior performance of 19 "robust" estimates of location (simple median, plus various combined, adaptive, trimmed, and skipped, L, M, and W estimates) is illustrated using real geochemical data sets varying in sources of error (pure analytical error to multicomponent geological variability), modality (unimodal to polymodal), size (20 to >2000 data values), and continuity (continuous to truncated in either or both tails). The arithmetic mean tends to overestimate location of many geochemical data sets because of positive skew and large outliers; robust estimates yield consistently smaller averages, although some (e.g., Hampel's and Andrews') do perform better than others (e.g., Shorth mean, dominant cluster mode). Recommended values for international standard rocks, and for such important geochemical concepts as "average chondrite," can be reproduced far more simply via robust estimation on complete interlaboratory data sets than via the rather complicated and subjective methods (e.g., "laboratory ratings") so far used in the literature. Robust estimates also seem generally less affected by truncation than the mean; for example, if values below machine detection limits are alternatively treated as missing values or as real values of zero, similar averages are obtained. The standard (and mean) deviations yield consistently larger values of scale for many geochemical data sets than the hinge width (interquartile range) or median absolute deviation from the median.
Therefore, summaries of geochemical data should always include at least the simple median and hinge width, to complement the often misleading mean and standard deviation.
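The contrast drawn in the abstract between classical and robust summaries can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the concentration values are hypothetical (not taken from the study), the function names are invented for this example, and the 10% trimmed mean stands in for just one of the 19 robust estimators discussed. The hinge width is computed from Tukey-style hinges (medians of the lower and upper halves of the sorted data).

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k] if k else data
    return statistics.fmean(kept)


def hinge_width(values):
    """Upper hinge minus lower hinge (Tukey-style interquartile range)."""
    data = sorted(values)
    half = (len(data) + 1) // 2  # each half includes the median when n is odd
    return statistics.median(data[-half:]) - statistics.median(data[:half])


def median_absolute_deviation(values):
    """Median of absolute deviations from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def summarize(values):
    """Classical and robust estimates of location and scale side by side."""
    return {
        "mean": statistics.fmean(values),
        "geometric_mean": statistics.geometric_mean(values),
        "median": statistics.median(values),
        "trimmed_mean": trimmed_mean(values),
        "stdev": statistics.stdev(values),
        "hinge_width": hinge_width(values),
        "mad": median_absolute_deviation(values),
    }


# Hypothetical trace-element concentrations (ppm) with one large outlier.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 16, 250]

if __name__ == "__main__":
    for name, value in summarize(sample).items():
        print(f"{name:>15}: {value:.2f}")
```

On this made-up sample the single outlier pulls the arithmetic mean well above the bulk of the data and raises the geometric mean too, while the median and trimmed mean stay near the central cluster; the standard deviation is likewise far larger than the hinge width or the median absolute deviation.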
Keywords: location, mean, median, nonparametric, normality, outlier, robust estimation, scale, truncation
This article is indexed in SpringerLink and other databases.