Similar documents
1.
A totally objective procedure for outlier detection and rejection in a univariate sample is applied to a database of sixty-four elements in a recently issued international geochemical reference material (RM), the microgabbro PM-S from Scotland. The procedure involves sixteen statistical tests, a total of thirty-four single- or multiple-outlier versions of these tests. This example illustrates how important and useful these tests are for screening modern geochemical data for possible outliers. It also shows how the mean concentration and other statistical parameters are obtained from the final normal univariate sample. The final mean values are more reliable than those obtained earlier by applying an accommodation approach (robust techniques) to the same database: they have smaller standard deviations and narrower confidence limits. Very high quality mean data (certified value equivalent, cve) are now obtained for eleven elements in PM-S, together with high quality recommended values (rv) for thirty-three elements. Earlier work using the accommodation approach failed to establish a cve value for any of the sixty-four elements compiled here. The present procedure of outlier detection and elimination is therefore recommended for the study of RMs.
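The abstract does not list the sixteen tests. The Python sketch below illustrates the consecutive detect-and-reject loop with only one of them, the single-outlier Grubbs statistic G = max|xᵢ − x̄|/s. The test is applied repeatedly until no value is discordant. The critical values are the standard two-sided α = 0.05 Grubbs table entries for n = 3 to 10. Check them against a published table before relying on them. Larger samples need a fuller table or a t-distribution quantile. The function names are illustrative.

```python
import statistics

# Two-sided Grubbs critical values, alpha = 0.05, n = 3..10 (standard table values;
# verify against a published table before use).
GRUBBS_CRIT_05 = {3: 1.155, 4: 1.481, 5: 1.715, 6: 1.887,
                  7: 2.020, 8: 2.126, 9: 2.215, 10: 2.290}


def grubbs_statistic(values):
    """Return (G, index) with G = max|x_i - mean| / s, s being the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[idx] - mean) / sd, idx


def reject_outliers_grubbs(values):
    """Apply the single-outlier Grubbs test repeatedly until no value is discordant."""
    kept = list(values)
    rejected = []
    while len(kept) >= 3:
        if len(kept) not in GRUBBS_CRIT_05:
            raise ValueError(f"no tabulated critical value for n = {len(kept)}")
        g, idx = grubbs_statistic(kept)
        if g <= GRUBBS_CRIT_05[len(kept)]:
            break  # the most extreme value is not discordant: stop
        rejected.append(kept.pop(idx))
    return kept, rejected
```

For the seven results 50.1, 49.8, 50.3, 50.0, 49.9, 50.2 and 58.0, the first pass gives G ≈ 2.26, which exceeds 2.020, so 58.0 is rejected. The remaining six values then pass the test.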

2.
Numerous studies report geochemical data on reference materials (RMs) processed by outlier-based methods that use univariate discordancy tests. However, the relative efficiency of these discordancy tests is not precisely known. We used an extensive geochemical database for thirty-five RMs from four countries (Canada, Japan, South Africa and USA) to empirically evaluate the performance of nine single-outlier tests with thirteen test variants. The kurtosis test (N15) appears to be the most powerful test for detecting discordant outliers in such geochemical RM databases. It is closely followed by the Grubbs-type tests (N1 and N4) and the skewness test (N14). The Dixon-type tests (N7, N8, N9 and N10) and the Grubbs-type test N2 showed smaller global relative efficiency criterion values for detecting outlying observations in this database. Upper discordant outliers were more common than lower discordant outliers. This implies that positively skewed inter-laboratory geochemical datasets are more frequent than negatively skewed ones. It also implies that the median, a robust indicator of central tendency, is likely to be biased, especially for small samples. Our outlier-based procedure should be useful for objectively identifying and interpreting discordant outliers in many fields of science and engineering. We first processed these databases with single-outlier discordancy tests and obtained reliable estimates of central tendency and dispersion for the RMs. We then used these statistics in a weighted least-squares linear regression (WLR) model for major-element determinations by X-ray fluorescence spectrometry, and compared the WLR results with those of an ordinary least-squares linear regression model. This clearly established the advantage of using our outlier procedure and the new concentration values and uncertainty estimates for these RMs.
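The WLR/OLS comparison can be sketched as a straight-line fit in which each point carries the weight wᵢ = 1/σᵢ². Here σᵢ is assumed to be the standard uncertainty of that point, expressed in the y direction. Setting all weights equal recovers ordinary least squares. This is a generic weighted fit, not the authors' exact XRF calibration model.

```python
def fit_line(x, y, weights=None):
    """Weighted least-squares fit of y = a + b*x; weights=None gives ordinary least squares."""
    if weights is None:
        weights = [1.0] * len(x)
    sw = sum(weights)
    # Weighted means of x and y
    xm = sum(w * xi for w, xi in zip(weights, x)) / sw
    ym = sum(w * yi for w, yi in zip(weights, y)) / sw
    # Weighted sum of squares of x and cross-products about those means
    sxx = sum(w * (xi - xm) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xm) * (yi - ym) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    return intercept, slope


def inverse_variance_weights(sigmas):
    """w_i = 1 / sigma_i**2, the usual weights when sigma_i is a point's standard uncertainty."""
    return [1.0 / s ** 2 for s in sigmas]
```

Take x = 1 to 5 and y = 5, 8, 11, 14, 30, with an uncertainty of 100 on the last point and 0.1 on the others. OLS gives a slope of 5.6, whereas the weighted fit stays at about 3 because it almost ignores the highly uncertain point.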

3.
Various uncertainties arising during the acquisition of geoscience data may produce anomalous data instances (i.e., outliers) that do not conform to the expected pattern of regular data instances. With sparse multivariate data from geotechnical site investigation, outliers cannot be identified with certainty. The outliers distort the estimated statistics of geotechnical parameters, and data sparsity adds statistical uncertainty. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on the Mahalanobis distance. It classifies as outliers those instances whose outlying probability exceeds 0.5. A re-sampling technique addresses the distortion of statistics estimated from a dataset that contains outliers. Bayesian machine learning rationally accounts for the statistical uncertainty. The approach also provides a dedicated method to determine the outlying components of each outlier. It is illustrated and verified using simulated and real-life datasets. The results show that the approach properly identifies outliers among sparse multivariate data, together with their outlying components, in a probabilistic manner. It can significantly reduce the masking effect, i.e., missing some actual outliers because of statistical uncertainty and the distortion of statistics by the outliers. The study also found that outliers among sparse multivariate data significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the need for data cleaning (e.g., outlier detection) before uncertainty quantification based on geoscience data.
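The source combines Mahalanobis distances, resampling and Bayesian machine learning. The Python sketch below is a simplified stand-in that keeps only the resampling idea, for two-variable data, and does not reproduce the Bayesian part. Each point is scored against bootstrap estimates of the mean and covariance of the other points. Its "outlying probability" is the fraction of resamples in which its squared distance exceeds the χ² cutoff with 2 degrees of freedom, −2 ln α. Points with a probability above 0.5 are flagged, mirroring the decision rule in the abstract. The function names, α = 0.025 and the number of resamples are illustrative assumptions.

```python
import math
import random


def mean_cov_2d(points):
    """Sample mean (mx, my) and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq_2d(p, center, cov):
    """Squared Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def outlying_probability(points, n_boot=200, alpha=0.025, seed=0):
    """Fraction of bootstrap resamples in which each point's distance exceeds the chi-square cutoff."""
    rng = random.Random(seed)
    cutoff = -2.0 * math.log(alpha)  # upper-alpha quantile of chi-square with 2 d.f.
    probs = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]  # leave the scored point out
        flagged = used = 0
        for _ in range(n_boot):
            sample = [rng.choice(others) for _ in others]
            center, cov = mean_cov_2d(sample)
            if cov[0] * cov[2] - cov[1] ** 2 <= 1e-12:
                continue  # skip (near-)singular resamples
            used += 1
            if mahalanobis_sq_2d(p, center, cov) > cutoff:
                flagged += 1
        probs.append(flagged / used if used else float("nan"))
    return probs


def flag_outliers(probs, threshold=0.5):
    """Indices whose outlying probability exceeds the threshold."""
    return [i for i, q in enumerate(probs) if q > threshold]
```

Leaving the scored point out of its own estimates is a design choice made here. It stops a single extreme point from inflating the covariance against which it is judged.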

4.
This compilation report describes the field location, mineralogy, preparation and homogeneity testing of two new GIT-IWG reference materials: Whin Sill dolerite (WS-E) from England and Pitscurrie microgabbro (PM-S) from Scotland. The elemental composition of these two new reference materials has been established by an international cooperative analysis programme involving 104 laboratories. A full assessment of these analytical results is presented. From it, working values have been derived for the major elements, for 45 trace elements in WS-E and for 44 trace elements in PM-S. Isotopic ratios are also presented for both samples, particularly ⁸⁷Sr/⁸⁶Sr and ¹⁴³Nd/¹⁴⁴Nd.

5.
Concentration data compilations for geochemical reference materials (GRM) commonly show a skewed frequency distribution caused by aberrant analytical data. Rejecting outlying results is usually required to obtain a better estimate of mean concentration values. The present work applies an approach based on skewness and kurtosis statistical tests to establish reliable concentration values in rare-earth element compilations for GRM. The skewness and kurtosis coefficients show that frequency histograms of the initial concentrations of these elements differ significantly from a normal distribution. The statistical procedure shows that rejecting outliers yields normal distributions. In a large number of cases, these give mean concentrations with smaller standard deviations, although for many elements our %RSD values are similar (within 5%) to literature values. The procedure has been applied to derive new rare-earth element concentration data for 26 GRM. A comparison with earlier compilations shows that the proposed procedure provides higher-quality mean values. Most of the present mean concentrations are similar (within 5%) to those reported in the previous literature. For a few GRM, however, some differ significantly, by up to 40%.
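These tests rest on two coefficients: the moment skewness √b₁ = m₃/m₂^{3/2} and the kurtosis b₂ = m₄/m₂², where m_k is the k-th central moment. For a normal distribution, √b₁ = 0 and b₂ = 3. The sketch computes both and trims the most extreme value while either exceeds a critical value. Real critical values depend on sample size and must come from published tables. Here `skew_crit` and `kurt_crit` are fixed placeholders, which is a simplification.

```python
import statistics


def moment_skew_kurt(values):
    """Moment coefficients sqrt(b1) (skewness) and b2 (kurtosis, 3 for a normal distribution)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def trim_until_normal(values, skew_crit, kurt_crit):
    """Drop the value farthest from the mean while |skewness| or kurtosis exceeds its critical value.

    skew_crit and kurt_crit are placeholders: published critical values
    change with sample size, which this sketch ignores.
    """
    kept = sorted(values)
    while len(kept) > 3:
        skew, kurt = moment_skew_kurt(kept)
        if abs(skew) <= skew_crit and kurt <= kurt_crit:
            break
        mean = statistics.fmean(kept)
        # In a sorted list the most extreme value is at one end.
        if abs(kept[-1] - mean) >= abs(kept[0] - mean):
            kept.pop()
        else:
            kept.pop(0)
    return kept
```

A single high value makes √b₁ clearly positive, which matches the positively skewed compilations described above.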

6.
The statistical analysis of compositional data is based on determining an appropriate transformation from the simplex to real space. Possible transformations and outliers interact strongly. Outliers may particularly influence the parameters of transformations, and the results of goodness-of-fit tests will reflect their presence. Thus, identifying outliers in compositional datasets and selecting an appropriate transformation of the same data are problems that cannot be separated. A robust method for outlier detection, combined with the likelihood of the transformed data, is presented as a first approach to solving these problems. It applies when the additive-logratio and multivariate Box-Cox transformations are used. Three examples illustrate the proposed methodology.
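For reference, the additive log-ratio (alr) transformation named in the abstract maps a D-part composition x to the D − 1 real coordinates ln(xᵢ/x_D). Its inverse exponentiates the coordinates and closes them back onto the simplex. The sketch below uses the last part as the divisor, a common convention assumed here, and omits the multivariate Box-Cox alternative.

```python
import math


def alr(composition):
    """Additive log-ratio transform, using the last part as the common divisor."""
    *parts, last = composition
    return [math.log(p / last) for p in parts]


def alr_inverse(coords, total=1.0):
    """Map alr coordinates back to a composition closed to `total`."""
    expd = [math.exp(c) for c in coords] + [1.0]  # the divisor part maps to exp(0) = 1
    s = sum(expd)
    return [total * e / s for e in expd]
```

Every coordinate is divided by x_D. An aberrant value in the divisor part therefore shifts all alr coordinates at once, which is one concrete way in which transformations and outliers interact.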

7.
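Statistical identification of outliers in the certification of high-content gold reference materials   Total citations: 1 (self-citations: 0; citations by others: 1)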
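Outliers are commonly rejected with statistical methods such as the Grubbs test and the Dixon test. When applied to analytical results for high-content gold reference materials, however, these methods clearly fail to reject enough outliers. This paper establishes a statistical outlier identification method based on the allowable limit of relative deviation for replicate analyses of high-content gold. The method first calculates the arithmetic mean x̄ of gold in the sample being certified and the allowable relative deviation limit YG. From these it determines the interval of acceptable results, and it identifies and rejects the outliers outside that interval. After each round of rejection, the range for the next round is recalculated from the updated statistics. This continues until no outliers remain, and the mean gold value and its range of variation are then reported. Fifteen artificially compounded high-content gold reference materials were used to simulate the certification analysis of gold reference materials. They were distributed in coded (blind) form to different laboratories and analysts, and 10 sets of independent results were collected. After outliers were rejected with this method, the arithmetic mean of gold was closer to the reference value. The relative-deviation quality score was 0.35, rated excellent. The Grubbs (or Dixon) method scored 0.42 and the median method 0.40, both rated only good. The outlier identification method established here clearly raises the quality grade and makes the statistical analysis of the data more effective.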
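The iterative procedure described above can be sketched as follows. The abstract does not give the formula for the allowable limit YG, so `allowed_rel_dev` is a hypothetical caller-supplied function. Relative deviation is assumed here to mean |x − x̄|/x̄.

```python
import statistics


def reject_by_relative_deviation(values, allowed_rel_dev):
    """Iteratively drop results whose relative deviation from the mean exceeds a limit.

    allowed_rel_dev(mean) is a hypothetical stand-in for the allowable limit YG;
    relative deviation is taken as |x - mean| / mean (an assumption).
    """
    kept = list(values)
    while True:
        mean = statistics.fmean(kept)
        limit = allowed_rel_dev(mean)
        inside = [v for v in kept if abs(v - mean) / mean <= limit]
        if len(inside) == len(kept) or len(inside) < 2:
            break  # no outliers left, or too few results to continue
        kept = inside  # recompute the mean and limit from the surviving results
    return kept, statistics.fmean(kept)
```

Take the results 100, 101, 99, 100.5 and 120 with a constant 5% limit. The first pass (x̄ = 104.1) rejects 120, and the second pass (x̄ = 100.125) accepts all the remaining results.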

8.
‘Wild’, ‘rogue’ or outlying determinations occur periodically during geochemical analysis. Existing tests in the literature for detecting such determinations within a set of replicate measurements are often misleading. This account describes the chances of detecting outliers in samples of three to seven replicate measurements, and how far their presence can be corrected for. A systematic procedure for monitoring data for outliers is outlined. The problem of outliers grows as instrumental methods of analysis become faster and more highly automated, because it then becomes increasingly difficult for the analyst to examine every determination. The recommended procedure is easily adapted to such analytical systems.

9.
Outlier detection is often a key task in a statistical analysis. It helps guard against poor decisions based on results that anomalous observations have influenced. For multivariate data sets, outliers are routinely detected by large Mahalanobis distances, either in the raw data space or in the space of data transformed by principal components analysis (PCA). Detection in PCA space can also use goodness-of-fit distances. For spatial applications, however, these global forms can only detect outliers in a non-spatial manner. This can produce false positive detections, such as when an observation's spatial neighbours are similar. It can also produce false negative detections, such as when its spatial neighbours are dissimilar. To avoid such misclassifications, we demonstrate that a local adaptation of various global methods can detect multivariate spatial outliers. In particular, we account for local spatial effects by using geographically weighted data with either Mahalanobis distances or principal components analysis. Detection performance is assessed with simulated data and with freshwater chemistry data collected across Great Britain. The results clearly show the value of both geographically weighted approaches to outlier detection.
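The local idea can be sketched for two attributes. Each site is compared with the Gaussian-kernel weighted mean and covariance of the other sites. A value that is typical of the whole region but unlike its neighbours therefore gets a large distance. The kernel form, the `bandwidth` parameter and the exclusion of each site from its own local estimate are assumptions made for illustration, not details taken from the paper.

```python
import math


def gw_mahalanobis_sq(coords, data, bandwidth):
    """Local (geographically weighted) squared Mahalanobis distance for 2-attribute data.

    coords: (x, y) site locations; data: (a, b) attribute pairs at each site.
    Each site is compared with a Gaussian-kernel weighted mean and covariance
    of the other sites, so nearby sites dominate the local statistics.
    """
    out = []
    for i, (xi, yi) in enumerate(coords):
        # Kernel weight exp(-0.5 * (d / bandwidth)**2); the site itself gets weight 0.
        w = [0.0 if j == i else math.exp(-0.5 * (math.hypot(xi - xj, yi - yj) / bandwidth) ** 2)
             for j, (xj, yj) in enumerate(coords)]
        sw = sum(w)
        ma = sum(wj * a for wj, (a, _) in zip(w, data)) / sw
        mb = sum(wj * b for wj, (_, b) in zip(w, data)) / sw
        saa = sum(wj * (a - ma) ** 2 for wj, (a, _) in zip(w, data)) / sw
        sbb = sum(wj * (b - mb) ** 2 for wj, (_, b) in zip(w, data)) / sw
        sab = sum(wj * (a - ma) * (b - mb) for wj, (a, b) in zip(w, data)) / sw
        det = saa * sbb - sab * sab
        da, db = data[i][0] - ma, data[i][1] - mb
        # Closed-form 2x2 inverse of the local covariance
        out.append((sbb * da * da - 2 * sab * da * db + saa * db * db) / det)
    return out
```

Consider a site in a low-valued region that carries values typical of a distant high-valued region. A global distance would treat that site as ordinary, but its local distance is the largest in the data set.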

10.
Two main types of statistical methods are available for handling experimental data: robust methods and outlier-based methods. We document here the application of the outlier-based method. No suitable software system was available for applying the outlier-based method in a statistically correct way. We therefore wrote a new computer program, DODESSYS (Discordant Outlier DEtection and Separation SYStem). It applies 33 discordancy test variants to experimental data that form contaminated or uncontaminated normal statistical samples. We illustrate the discordant outlier-based scheme with five specific examples. Three involve univariate data, for which the procedure was specifically designed, and two involve bivariate data, to which the methodology can easily be adapted. We thus report new statistical information on two reference materials (granite G-2 and sediment IAEA-417) and on bryozoan species from eastern Oman. We also report a new improved Na/K geothermometric equation and a more significant correlation between water depth and the abundance of meiofauna from the Gulf of Mexico. Two sets of multi-dimensional discrimination diagrams for basic and acid rocks have recently been proposed. They are based on the statistically correct methodology of natural-logarithm transformation of element ratios, and they also require these ratios to be normally distributed. We present numerous examples of applying these new diagrams to infer the tectonic setting of Archaean to Recent rocks, both before and after testing the datasets for discordant outliers. We recommend that outlying observations always be evaluated for their discordancy.

11.
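Continuous online Internet of Things (IoT) systems for ecological observation of coastal wetlands generate anomalous observations. These arise from limitations in sensor technology and from environmental interference, and they affect how the data can be used, so effective data preprocessing is essential. This study used ecological observation data from the Chongming Dongtan wetland of international importance in Shanghai. The anomalous data were classified into three types: value anomalies, fluctuation anomalies and anomalous events. A data preprocessing method tailored to coastal wetland ecological observations was then constructed. It is based on an anomaly detection algorithm that uses the probability distribution of regression residuals. It also uses lookup tables and a multi-indicator time-series model that integrate the interrelationships among multiple environmental factors. Compared with traditional methods, the method maintains anomaly detection accuracy while better distinguishing anomalous events from sensor anomalies, which reduces misclassification. More than 50,000 records for 9 indicators were analysed with thresholds of 10⁻⁸ to 10⁻²⁰. The analysis detected value and fluctuation anomalies amounting to 0.18% to 8.12% of the records, as well as 2 anomalous events. The preprocessing results show that sensor stability is affected by factors such as the sensor's measurement principle and the observation season. Human activity is the main cause of anomalous events in the observation area.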
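The residual-probability idea can be illustrated with a simplified Python sketch. It regresses one indicator on another by ordinary least squares and treats the residuals as N(0, s²). Observations whose two-sided tail probability falls below a threshold such as 10⁻⁸ are flagged. The single-predictor linear model and the normal residual assumption are simplifications. The source uses lookup tables and a multi-indicator time-series model, which are not reproduced here.

```python
import math


def residual_anomalies(x, y, threshold=1e-8):
    """Indices whose OLS residual has a two-sided normal tail probability below threshold."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    intercept = ym - slope * xm
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom for a fitted line
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    flagged = []
    for i, r in enumerate(resid):
        # P(|R| >= |r|) for R ~ N(0, s^2); erfc keeps precision for very small tails
        p = math.erfc(abs(r) / (s * math.sqrt(2.0)))
        if p < threshold:
            flagged.append(i)
    return flagged
```

The tail probability is computed with `math.erfc` rather than as `1 - cdf`. This keeps probabilities as small as 10⁻²⁰ from being lost to floating-point rounding.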

12.
13.
The reflectance of vitrinite (collotelinite) particles is widely used as a geothermometer to estimate the thermal maturity of organic matter enclosed in rocks. However, several problems have arisen over recent decades. They can be traced back to three basic causes: human mistakes, technical problems, and the structural and compositional inhomogeneity of organic matter. In most cases, standardization can handle the first two types of uncertainty. The third can cause significant problems during interpretation because it generally cannot be estimated. Suppressed vitrinite reflectance belongs to this latter type. So do the statistical problems arising from small sample sizes and outliers. International standards, such as those of ASTM and ISO, define the vitrinite reflectance parameter as the statistical average of the measured data. They disregard the fact that the average is neither a resistant nor a robust statistical parameter, so it is very sensitive to outliers and to the shape of the distribution. This research aimed to find and test a better, more resistant and robust parameter from traditional parametric and nonparametric statistics that can replace the average in practice. Three categories of statistical problems were studied on coal and dispersed organic matter (DOM) samples: the distribution of measured values, the effect of the number of data, and the effect of outliers on statistical parameters. Statistical experiments on numerous original and generated sample sets show that two parameters estimate thermal maturity better than the average, especially above a reflectance of 1%. These are the median (med) and the most frequent value (Mn), a special weighted average.
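The sensitivity of the average can be shown with a few hypothetical reflectance readings. One anomalous particle moves the mean noticeably but hardly moves the median. The readings are invented for illustration, and the most frequent value (Mn) is not implemented here.

```python
import statistics


def location_shift(clean, extra):
    """Return how far adding one extra value moves the mean and the median."""
    contaminated = list(clean) + [extra]
    mean_shift = statistics.fmean(contaminated) - statistics.fmean(clean)
    median_shift = statistics.median(contaminated) - statistics.median(clean)
    return mean_shift, median_shift


# Hypothetical random-reflectance readings (%Ro); 2.10 plays the anomalous particle.
readings = [1.21, 1.18, 1.25, 1.22, 1.19, 1.23, 1.20, 1.24]
mean_shift, median_shift = location_shift(readings, 2.10)
print(f"mean shift {mean_shift:.3f}, median shift {median_shift:.3f}")
```

With these readings, the single 2.10% value shifts the mean by about 0.098 but the median by only 0.005.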

14.
Previous interpretations of surface-rock geochemical data from the sheeted-vein tin mineralization in the Emmaville district used classical statistics. These investigations revealed low-contrast geochemical patterns of 3 to 5 ppm Sn, supported by 80 to 160 ppm F. Block-average contours of these patterns defined four of the six known mineral occurrences. Principal component scores for the association dominated by F-Li-Rb defined the same four mineral occurrences. For prospecting for similar deposits, it is highly desirable to improve data processing techniques so that anomalous and background levels show greater geochemical contrast. Minimum volume ellipsoid (MVE) estimation is applied to a subset of the data from the northeastern part of the Emmaville district. MVE is a high-breakdown method, capable of accommodating up to 50% outliers, recently developed in robust statistics. The anomalies related to mineralization in this part of the district are not as well developed as those in the west. The data set used in this study consists of 133 observations of 6 elements, namely Cu, Li, Rb, F, As and Sn. Multivariate outliers (anomalous observations) in the surface-rock geochemical data were detected by calculating Mahalanobis distances. The robust Mahalanobis distances computed from MVE estimates of location and scatter show little variation over background areas but are sharply enhanced over mineralization. In contrast, the usual Mahalanobis distances either fail to indicate the presence of mineralization at all or, at best, respond with feebly enhanced values that do not indicate it satisfactorily. The graphical display of results from classical RQ-PCA performs poorly. It reveals only 6 weakly anomalous observations related to mineral occurrences, and several additional observations from these occurrences go undetected.
By contrast, results from MVE-robust RQ-mode principal component analysis show that the background observations cluster tightly within the 95% tolerance ellipse. The anomalous observations related to mineral occurrences are greatly enhanced, and the variables that characterize them are clearly indicated. These results are consistent with those of the robust Mahalanobis distance procedure: both techniques identify essentially the same observations as anomalous.
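A rough two-dimensional illustration of MVE estimation uses the classical resampling approximation. The algorithm draws many (p + 1)-point subsets and inflates each subset's ellipse until it covers h = ⌊(n + p + 1)/2⌋ points. It keeps the ellipse of smallest area and rescales it so that the covered distances match the χ² median with 2 degrees of freedom. The sketch omits the small-sample correction factor and uses an arbitrary number of trials and seed. It is an approximation, not an exact MVE solver, and the names are illustrative.

```python
import math
import random


def mean_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, center, cov):
    """Squared Mahalanobis distance of p using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def mve_2d(points, trials=500, seed=0):
    """Approximate MVE location and scatter for 2-D data from random (p + 1)-point subsets."""
    rng = random.Random(seed)
    h = (len(points) + 3) // 2  # (n + p + 1) // 2 with p = 2
    best = None
    for _ in range(trials):
        center, cov = mean_cov(rng.sample(points, 3))
        det = cov[0] * cov[2] - cov[1] ** 2
        if det <= 1e-12:
            continue  # collinear or repeated points define no ellipse
        # Inflate the trial ellipse until it covers h points; its area is pi * m2 * sqrt(det).
        m2 = sorted(mahalanobis_sq(p, center, cov) for p in points)[h - 1]
        area = m2 * math.sqrt(det)
        if best is None or area < best[0]:
            best = (area, center, cov, m2)
    if best is None:
        raise ValueError("no non-degenerate subset found")
    _, center, cov, m2 = best
    # Rescale so the h-th smallest distance equals the chi-square(2) median, 2 ln 2.
    factor = m2 / (2.0 * math.log(2.0))
    return center, tuple(c * factor for c in cov)
```

Suppose four identical anomalous points sit among 24. With the sample covariance, the classical squared distance of the anomalous point cannot exceed (n − 1)(1/k − 1/n) ≈ 4.8, which is below the 97.5% χ² cutoff of about 7.38 for 2 degrees of freedom. Its robust distance from the MVE estimate is far above that cutoff. This is the masking contrast the abstract describes.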

15.
This work compares the relative efficiency of fourteen statistical tests for detecting outliers in normally distributed samples. The tests include deviation/spread, Grubbs-type, Dixon-type and high-order moment statistics. Their performance is evaluated using Geochemical Reference Material databases from the United States Geological Survey. Test efficiency is compared in two ways. The first compares a single application of the statistics in versions k = 1 and k = 2. The second compares tests in version k = 1 applied consecutively with block procedures in which k = 2, 3 or 4 values are evaluated at the same time. In both evaluations, the sensitivity of the statistical tests depends strongly on sample size. The block procedures perform best, whereas the consecutive statistical tests are affected by masking effects.

16.
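Taking the WJH multifunctional potentiometric analyzer as a basis, methods for analysing and processing errors in electrochemical analytical instruments are discussed. Errors are processed with computer software, and the system implements 10 error-processing methods.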

17.
Debris flow is one of the most destructive types of mass movement. Regional susceptibility or hazard assessments for debris flows can sometimes be more difficult than those for other mass movements. One of the most crucial stages in debris flow investigations is determining debris accumulation zones and debris source areas, and morphological restrictions can make this very difficult. The main goal of the present study is to extract debris source areas by logistic regression analysis. The analysis is based on data from the slopes of the Barla, Besparmak and Kapi Mountains in the SW part of the Taurides mountain belt of Turkey, where the formation of debris material is clearly evident and common. To achieve this goal, extensive field observations identified the areal extent of debris source areas and debris material. Air-photo studies were used to determine the debris source areas. Desk studies included Geographical Information System (GIS) applications and statistical assessments. A random sampling procedure was applied to ensure that the training data used in the logistic regression analyses were representative. The results of the logistic regression analysis were used to produce a debris source area probability map of the region. However, according to the authors' field experience, this map over-predicted debris source areas. Field observations indicate that the main source of over-prediction is the structural relationship between bedding planes and slope aspect. For debris to be generated, the dip of the bedding planes relative to the slope face must be taken into account. To eliminate this problem, an approach was developed using the probability distribution of the aspect values. Applying this structural adjustment yields the final adjusted debris source area probability map for the study area.
Field observations revealed that the actual debris source areas coincide with the areas having high probability values on this final map.

18.
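The Mahalanobis distance is a multivariate anomaly identification method, and several Mahalanobis-distance-based anomaly identification methods already exist. Using 1:500,000 stream sediment survey data from the eastern segment of the East Kunlun Mountains, Qinghai Province, the author compared four methods in identifying Cu, Co, ... The four methods are the conventional Mahalanobis distance and three robust Mahalanobis distances, based respectively on the fast minimum covariance determinant (FMCD), the adjusted minimum covariance determinant (Adaptive) and the comedian (Comedian).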
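The comedian of two variables is COM(X, Y) = med[(X − med X)(Y − med Y)]. Replacing covariances with comedians, and the mean with coordinate-wise medians, gives a robust Mahalanobis distance. The two-variable sketch below omits any consistency scaling, so its distances are not on the χ² scale and are suitable only for ranking observations. The comedian matrix is also not guaranteed to be positive definite, which the code checks.

```python
import statistics


def comedian(x, y):
    """COM(X, Y) = median of (X - med X) * (Y - med Y)."""
    mx, my = statistics.median(x), statistics.median(y)
    return statistics.median([(a - mx) * (b - my) for a, b in zip(x, y)])


def comedian_distances_sq(x, y):
    """Squared robust Mahalanobis distances from coordinate-wise medians and the comedian matrix."""
    cx, cy = statistics.median(x), statistics.median(y)
    sxx, syy, sxy = comedian(x, x), comedian(y, y), comedian(x, y)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("comedian matrix is not positive definite for these data")
    out = []
    for a, b in zip(x, y):
        da, db = a - cx, b - cy
        # Closed-form 2x2 inverse of [[sxx, sxy], [sxy, syy]]
        out.append((syy * da * da - 2 * sxy * da * db + sxx * db * db) / det)
    return out
```

Because every ingredient is a median, a single gross value such as (6.0, −6.0) barely changes the centre or the comedian matrix. That value then receives by far the largest distance.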

19.
A high-resolution ¹⁴C chronology was generated for the Teopancazco archaeological site in the Teotihuacan urban center of Mesoamerica. It combines Bayesian analysis of 33 radiocarbon dates with detailed archaeological information on occupation stratigraphy, pottery and archaeomagnetic dates. The calibrated intervals obtained with the Bayesian model are up to ca. 70% shorter than those obtained with individual calibrations. For some samples, this reflects plateaus in the part of the calibration curve covered by the sample dates (2500 to 1450 ¹⁴C yr BP). The effects of outliers are explored by comparing two versions of the Bayesian model: one that incorporates radiocarbon data for two outlier samples and one that excludes them. The effect of the outliers was more significant than expected. The two outlier dates come from altered contexts and are 500 ¹⁴C yr earlier than those for the first occupational phase. Including them makes the model ages earlier than the archaeological record indicates. The Bayesian chronology that excludes these outliers separates the first two Teopancazco occupational phases. It suggests that the Xolalpan phase ended around cal AD 550, 100 yr earlier than previously estimated. This agrees with previously reported archaeomagnetic dates from lime plasters at the same site.

20.
Ordinary kriging is well known to be optimal when the data have a multivariate normal distribution and the variogram is known. Lognormal kriging presupposes that the data are multivariate lognormal. In practice, however, real data never entirely satisfy these assumptions. This article considers how sensitive these two kriging estimators are to departures from these assumptions, and in particular how resistant they are to outliers. It proposes an outlier effect index that assesses the effect of a single outlier on both estimators and can be extended to other types of estimators. Lognormal kriging is sensitive to slight variations in the sill of the variogram of the logs (i.e., their variance), but it is not influenced by the estimate of the mean of the logs. This paper was presented at the MGUS 87 Conference, Redwood City, California, 14 April 1987.
