Similar literature
1.
Various uncertainties arising during the acquisition of geoscience data may result in anomalous data instances (i.e., outliers) that do not conform with the expected pattern of regular data instances. With sparse multivariate data obtained from geotechnical site investigation, it is impossible to identify outliers with certainty, because outliers distort the statistics of geotechnical parameters and data sparsity adds statistical uncertainty. This paper develops a probabilistic outlier detection method for sparse multivariate data obtained from geotechnical site investigation. The proposed approach quantifies the outlying probability of each data instance based on Mahalanobis distance and determines outliers as those data instances with outlying probabilities greater than 0.5. It tackles the distortion of statistics estimated from a dataset containing outliers by a re-sampling technique and accounts, rationally, for the statistical uncertainty by Bayesian machine learning. Moreover, the proposed approach also suggests an exclusive method to determine the outlying components of each outlier. The proposed approach is illustrated and verified using simulated and real-life datasets. The results show that the proposed approach properly identifies outliers among sparse multivariate data and their corresponding outlying components in a probabilistic manner. It can significantly reduce the masking effect (i.e., missing some actual outliers due to the distortion of statistics by the outliers and statistical uncertainty). It is also found that outliers among sparse multivariate data instances significantly affect the construction of the multivariate distribution of geotechnical parameters for uncertainty quantification. This emphasizes the necessity of a data cleaning process (e.g., outlier detection) for uncertainty quantification based on geoscience data.

2.
3.
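To address the outliers that frequently appear in ultrasonic borehole televiewer logging images, an inpainting algorithm based on total variation (the TV algorithm) is proposed to remove them. The method first identifies the outliers to be removed within a selected region according to a threshold, and then repairs those outliers with the total variation image inpainting algorithm. Experimental results show that the method removes outliers well, preserves the integrity of the logging image, clearly improves image quality, and is of considerable practical significance for processing real logging data.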

4.
The multiquadric method (MQ), with its high interpolation accuracy, has been widely used for interpolating spatial data. However, MQ is an exact interpolation method, which makes it unsuitable for interpolating noisy sampling data. Although the least squares MQ (LSMQ) can smooth out sampling errors, it is inherently not robust to outliers because it uses the least squares criterion to estimate the weights of sampling knots. In order to reduce the impact of outliers on the accuracy of digital elevation models (DEMs), a robust method of MQ (MQ-R) has been developed. MQ-R includes two independent procedures: knot selection and the solution of the system of linear equations. The two procedures were achieved by the space-filling design and the least absolute deviation, respectively, both of which are very robust to outliers. A Gaussian synthetic surface, subject to a series of errors with different distributions, was employed to compare the performance of MQ-R with that of LSMQ. Results indicate that LSMQ is seriously affected by outliers, whereas MQ-R resists outliers well and can construct satisfactory surfaces even when the data are contaminated by severe outliers. A real-world example of DEM construction was employed to evaluate the robustness of MQ-R, LSMQ, and classical interpolation methods including the inverse distance weighting method, thin plate spline, and ANUDEM. Results showed that, compared with the classical methods, MQ-R has the highest accuracy in terms of root mean square error. In conclusion, when sampling data are subject to outliers, MQ-R can be considered as an alternative method for DEM construction.

5.
The statistical analysis of compositional data is based on determining an appropriate transformation from the simplex to real space. Possible transformations and outliers strongly interact: parameters of transformations may be influenced particularly by outliers, and the result of goodness-of-fit tests will reflect their presence. Thus, the identification of outliers in compositional datasets and the selection of an appropriate transformation of the same data are problems that cannot be separated. A robust method for outlier detection together with the likelihood of transformed data is presented as a first approach to solve those problems when the additive-logratio and multivariate Box-Cox transformations are used. Three examples illustrate the proposed methodology.
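
The additive-logratio transformation named in this abstract can be illustrated with a minimal sketch. The function names `alr` and `alr_inverse` and the three-part sample are illustrative assumptions, and the choice of the last part as divisor is a common convention rather than something the abstract specifies. The robust outlier detection and likelihood steps of the paper are not reproduced.

```python
import math


def alr(composition):
    """Additive log-ratio transform; the last part is used as the divisor (assumed convention)."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    divisor = composition[-1]
    return [math.log(part / divisor) for part in composition[:-1]]


def alr_inverse(coordinates):
    """Map alr coordinates back to a composition that sums to 1."""
    expanded = [math.exp(c) for c in coordinates] + [1.0]
    total = sum(expanded)
    return [value / total for value in expanded]


# Hypothetical three-part composition (proportions summing to 1).
sample = [0.2, 0.3, 0.5]
coords = alr(sample)            # two unconstrained real coordinates
restored = alr_inverse(coords)  # back on the simplex
```

Because the transformed coordinates are unconstrained, distance-based outlier measures can be applied to them, which is why the choice of transformation and the detection of outliers interact.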

6.
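Comparison of several methods for handling outliers   Total citations: 9, self-citations: 0, citations by others: 9
Outliers (also called extreme high grades) occur in sampling surveys. In geostatistics, outliers among the observations seriously affect the computed variogram and thus greatly reduce the accuracy of geostatistical results. This paper compares the outlier-handling methods currently used in China and abroad (1. the estimation neighbourhood method, ENM; 2. the influence coefficient method, ICM; 3. the relative variogram methods, GRV and PRV) to determine their relative merits, which helps improve the accuracy of geostatistical results.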

7.
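When the observations used in precise point positioning contain anomalous data, the accuracy of the Kalman filter degrades. A robust Kalman filter can effectively suppress observation anomalies and improve the accuracy and reliability of filtering. The method was verified with data from the Wuhan International GPS Service tracking station. The results show that the robust Kalman filter is somewhat more accurate than the standard Kalman filter, indicating that it can effectively suppress observation anomalies.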
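
The idea of down-weighting anomalous observations inside a Kalman filter can be sketched for a scalar constant state. The abstract does not say which robust scheme was used, so the Huber-style variance inflation below, the threshold `k`, the function name `kalman_constant` and the observation series are all assumptions chosen for illustration.

```python
def kalman_constant(observations, obs_var=1.0, robust=False, k=1.5):
    """Scalar Kalman filter for a constant state.

    With robust=True, an observation whose standardized innovation exceeds k
    has its variance inflated (Huber-style), which lowers its gain.
    """
    x, p = 0.0, 1e6  # diffuse prior on the state
    for z in observations:
        innovation = z - x
        r = obs_var
        if robust:
            standardized = abs(innovation) / (p + obs_var) ** 0.5
            if standardized > k:
                r = obs_var * standardized / k  # down-weight the suspect observation
        gain = p / (p + r)
        x = x + gain * innovation
        p = (1.0 - gain) * p
    return x


# Hypothetical series: a constant signal of 10 with one gross error.
observations = [10.0, 10.0, 10.0, 10.0, 100.0, 10.0, 10.0]
standard_estimate = kalman_constant(observations)
robust_estimate = kalman_constant(observations, robust=True)
```

In this toy case the standard filter behaves like a running mean and is pulled far from 10 by the single gross error, while the robust variant stays close to it.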

8.
Variograms for gold and lead values from the Loraine and Prieska mines, respectively, indicate that data outliers can seriously distort and/or mask the real variogram patterns. Studies show that this problem is best overcome for these mines by logarithmic transformation of the data, and/or a suitable screening out of such outliers, and/or more robust variogram estimation procedures; the benefits are particularly significant when the basic data is limited.

9.
The calibration of geothermometers and geobarometers should involve not only the determination of the parameters in the equation used, but also the uncertainties on, and the correlations between, these parameters. This necessitates the use of a technique such as least squares. Given the poor performance of least squares in the presence of outliers in the data, techniques for identifying outliers for exclusion (regression diagnostics) and techniques for handling data which include outliers (robust regression and jackknifing) are essential. These techniques are summarized and their importance is emphasized, and they are applied to the calibration of the garnet-clinopyroxene Fe-Mg exchange geothermometer.
The experimental data of Raheim & Green (1974) and Ellis & Green (1979) are explored using regression diagnostics to discover outliers in the data. After exclusion of the two influential outliers found, a new geothermometer equation for garnet-clinopyroxene Fe-Mg exchange is derived using robust regression and based on all the data: thus, $T\,(\mathrm{K}) = (2790 + 10P + 3140\,x_{\mathrm{Ca,g}}) / (1.735 + \ln K_D)$, where T is in Kelvin and P is in kbar. This equation, as might be hoped, is essentially identical to that of Ellis & Green (1979). Equations for calculating the uncertainty in a calculated temperature, contributed by uncertainties in the calibration, are also derived.
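
A worked evaluation of the geothermometer may help. The sketch below assumes the flattened equation in the abstract is a ratio, T = (2790 + 10 P + 3140 x_Ca,g) / (1.735 + ln K_D); that grouping is an interpretation of the extracted text. The function name and the input values are hypothetical and are not taken from the paper.

```python
import math


def garnet_cpx_temperature(pressure_kbar, x_ca_garnet, k_d):
    """Temperature in Kelvin from the garnet-clinopyroxene Fe-Mg exchange equation.

    Assumes the form T = (2790 + 10 P + 3140 x_Ca) / (1.735 + ln K_D),
    with P in kbar.
    """
    numerator = 2790.0 + 10.0 * pressure_kbar + 3140.0 * x_ca_garnet
    return numerator / (1.735 + math.log(k_d))


# Hypothetical inputs: P = 30 kbar, x_Ca in garnet = 0.2, K_D = 3.0.
temperature_k = garnet_cpx_temperature(30.0, 0.2, 3.0)
temperature_c = temperature_k - 273.15
```

The uncertainty equations mentioned in the abstract are not given in the text, so they are not sketched here.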

10.
Advantages of robust procedures over ordinary least-squares procedures in geochemical data analysis are demonstrated using NURE data from the Hot Springs Quadrangle, South Dakota, U.S.A. Robust principal components analysis with 5% multivariate trimming successfully guarded the analysis against perturbations by outliers and increased the number of interpretable factors. Regression with SINE estimates significantly increased the goodness-of-fit of the regression and improved the correspondence of delineated anomalies with known uranium prospects. Because of the ubiquitous existence of outliers in geochemical data, robust statistical procedures are suggested as routine procedures to replace ordinary least-squares procedures.

11.
Outlier detection is often a key task in a statistical analysis and helps guard against poor decision-making based on results that have been influenced by anomalous observations. For multivariate data sets, large Mahalanobis distances in raw data space or large Mahalanobis distances in principal components analysis transformed data space are routinely used to detect outliers. Detection in principal components analysis space can also utilise goodness of fit distances. For spatial applications, however, these global forms can only detect outliers in a non-spatial manner. This can result in false positive detections, such as when an observation’s spatial neighbours are similar, or false negative detections, such as when its spatial neighbours are dissimilar. To avoid mis-classifications, we demonstrate that a local adaptation of various global methods can be used to detect multivariate spatial outliers. In particular, we account for local spatial effects via the use of geographically weighted data with either Mahalanobis distances or principal components analysis. Detection performance is assessed using simulated data as well as freshwater chemistry data collected over all of Great Britain. Results clearly show the value of both geographically weighted methods for outlier detection.

12.
‘Wild’, ‘rogue’ or outlying determinations occur periodically during geochemical analysis. Existing tests in the literature for the detection of such determinations within a set of replicate measurements are often misleading. This account describes the chances of detecting outliers and the extent to which correction may be made for their presence in sample sizes of three to seven replicate measurements. A systematic procedure for monitoring data for outliers is outlined. The problem of outliers becomes more important as instrumental methods of analysis become faster and more highly automated; a state in which it becomes increasingly difficult for the analyst to examine every determination. The recommended procedure is easily adapted to such analytical systems.

13.
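Statistical identification of outliers in the certification of macro-content gold reference materials   Total citations: 1, self-citations: 0, citations by others: 1
Outliers are usually rejected with statistical methods such as the Grubbs test and the Dixon test, but when these are applied to analytical results for macro-content gold reference materials they clearly reject too few outliers. This paper establishes a statistical method for identifying outliers based on the allowable limit of relative deviation for replicate analyses of macro-content gold. The method computes the arithmetic mean x of gold in the sample to be certified and the allowable limit of relative deviation YG, determines the interval of acceptable results, and thereby identifies and rejects outliers. After each round of rejection, the rejection range for the next round is set from the updated statistics, until no outliers remain; the mean gold value and its range of variation are then reported. Using 15 artificially composed macro-content gold reference materials as an example, a certification analysis was simulated: the samples were distributed under coded labels to different institutions and analysts, and 10 independent sets of results were collected. After outlier rejection with the proposed method, the arithmetic mean of gold was closer to the reference value, with a quality score for the relative deviation of 0.35, rated excellent, whereas the Grubbs (or Dixon) method and the median method gave quality scores of 0.42 and 0.40, rated only good. The proposed method clearly raised the quality-score grade and made the statistical analysis of the data more effective.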
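
The iterative rejection loop described in this abstract can be sketched as follows. The abstract does not give the formula for the allowable limit YG, so the sketch takes it as a fixed percentage supplied by the caller, which is an assumption. The function name and the replicate values are hypothetical.

```python
def reject_by_relative_deviation(values, allowed_percent):
    """Iteratively reject values whose relative deviation from the mean exceeds a limit.

    allowed_percent stands in for the allowable limit YG, whose actual
    formula is not given in the abstract. Values are assumed positive.
    """
    kept = list(values)
    rejected = []
    while len(kept) > 2:
        mean = sum(kept) / len(kept)
        outside = [v for v in kept if abs(v - mean) / mean * 100.0 > allowed_percent]
        if not outside:
            break  # no outliers remain under the current statistics
        rejected.extend(outside)
        kept = [v for v in kept if v not in outside]
    return kept, rejected


# Hypothetical replicate gold results (arbitrary units) with a 10 % limit.
results = [10.0, 10.1, 9.9, 10.2, 12.0]
kept, rejected = reject_by_relative_deviation(results, 10.0)
final_mean = sum(kept) / len(kept)
```

Recomputing the mean after each round matters because a gross outlier shifts the first mean and can hide or exaggerate the deviation of the remaining values.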

14.
Highly Robust Variogram Estimation   Total citations: 5, self-citations: 0, citations by others: 5
The classical variogram estimator proposed by Matheron is not robust against outliers in the data, nor is it enough to make simple modifications such as the ones proposed by Cressie and Hawkins in order to achieve robustness. This paper proposes and studies a variogram estimator based on a highly robust estimator of scale. The robustness properties of these three estimators are analyzed and compared. Simulations with various amounts of outliers in the data are carried out. The results show that the highly robust variogram estimator improves the estimation significantly.
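
To make the comparison concrete, the sketch below implements the two reference estimators named in the abstract, Matheron's classical estimator and the Cressie-Hawkins modification, for a regularly spaced one-dimensional series. It does not implement the paper's highly robust estimator. The function names and the test series are illustrative assumptions.

```python
def lag_differences(series, lag):
    """Differences z(x + lag) - z(x) for a regularly spaced 1-D series."""
    return [series[i + lag] - series[i] for i in range(len(series) - lag)]


def matheron_semivariance(series, lag):
    """Classical estimator: half the mean squared difference at the given lag."""
    diffs = lag_differences(series, lag)
    return sum(d * d for d in diffs) / (2.0 * len(diffs))


def cressie_hawkins_semivariance(series, lag):
    """Cressie-Hawkins estimator based on the mean square-root absolute difference."""
    diffs = lag_differences(series, lag)
    n = len(diffs)
    mean_root = sum(abs(d) ** 0.5 for d in diffs) / n
    return mean_root ** 4 / (2.0 * (0.457 + 0.494 / n))


# Hypothetical series: a regular alternation with one gross outlier at the end.
contaminated = [1, 2, 1, 2, 1, 2, 1, 2, 1, 50]
classical = matheron_semivariance(contaminated, 1)
modified = cressie_hawkins_semivariance(contaminated, 1)
```

A single outlier inflates the classical value by more than an order of magnitude here, while the Cressie-Hawkins value is affected much less, though, as the abstract notes, such a modification is still not fully robust.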

15.
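Background values of nitrate in shallow groundwater of the Liujiang Basin   Total citations: 2, self-citations: 0, citations by others: 2
To explore methods for obtaining nitrate background values in groundwater, this study takes the Liujiang Basin as its study area. Building on a comparative analysis of methods used in China and abroad, it first applies a two-factor method based on absolute concentration and milliequivalent percentage to remove anomalous nitrate data at the macro level. It then uses hierarchical cluster analysis combined with principal component analysis to analyze groundwater hydrochemical characteristics and identify anomalous classes, removing further anomalous data. Finally, the distribution type of the remaining data is tested and the range of nitrate background values is determined by the cumulative concentration frequency method. The results show that, although the two-factor method cannot remove all anomalies, it reduces the anomalous information and the number of subsets that the subsequent hierarchical cluster analysis has to handle. Hierarchical cluster analysis identifies anomalous data through the hydrochemical characteristics of each subclass and has the advantage of recognizing both anthropogenic and natural anomalies. Comparison with commonly used mathematical-statistical methods shows that combining the two methods identifies anomalies more effectively and yields more reasonable nitrate background values. Analysis of the removed anomalous data indicates that nitrate anomalies in the shallow groundwater of the Liujiang Basin are closely related to excessive application of agricultural fertilizer and to the infiltration of domestic sewage, refuse and manure.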

16.
In large multi-element regional surveys statistically derived threshold levels of the form that define, for example, the top 2% of the data for each element as worthy of further investigation have led to the generation of inordinately large lists of geochemical samples for detailed study. This problem is compounded when a number of geological and secondary environments exists of sufficiently different character that separate thresholds should be estimated for each. Additionally, single-element thresholds for multi-element surveys can, in certain circumstances, lead to obviously out-of-character individuals not being recognized. Numerical approaches to the problem of anomaly recognition have commonly used a principal-component or regression analysis procedure as their basis. These, as indeed do all such approaches, have a common drawback in that the outliers being sought can distort the analysis being used to detect them. In addition, regression models have the further problem that there may be outliers in both the response and explanatory variables. A relatively simple approach would be to prepare a multivariate cumulative probability plot where each multi-element geochemical sample is represented as a single value. The resulting diagram would be interpreted much as a univariate probability plot where the presence of more than one straight-line segment is taken as evidence of multiple populations, and outliers as individuals or small groups are separated from the remaining data by gaps on the plot. Such a diagram may be prepared by plotting the rank-ordered values of the generalized or Mahalanobis distance, a multivariate distance measure, versus values of the chi-square statistic. This procedure is based on the covariance matrix of the data, a measure that underlies both principal-component and regression model approaches.
In order to work effectively a statistically robust starting covariance matrix is essential. The procedure is described in detail with two examples, one a synthetic bivariate data set containing known outliers, and the other a small, well studied stream sediment data set from Norway extensively used in methodological comparison studies. The result of the procedure is to identify statistical outliers, which are candidates for interpretation as true geochemical anomalies, and to isolate a multi-element subset that is representative of the geochemical background.
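
The plot described in this abstract pairs rank-ordered squared Mahalanobis distances with chi-square quantiles. The following sketch does this for bivariate data, where the chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - p). It uses the classical sample covariance for brevity, whereas the abstract stresses that a robust starting covariance matrix is essential. The plotting positions (i + 0.5)/n, the function names and the data are assumptions.

```python
import math


def mahalanobis_squared_2d(points):
    """Squared Mahalanobis distances of 2-D points from their mean (classical covariance)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        distances.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi_square_quantile_2df(probability):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - probability)


def chi_square_plot_pairs(points):
    """(theoretical quantile, ordered squared distance) pairs for plotting."""
    ordered = sorted(mahalanobis_squared_2d(points))
    n = len(ordered)
    return [(chi_square_quantile_2df((i + 0.5) / n), d) for i, d in enumerate(ordered)]


# Hypothetical data: seven points near a line plus one off-trend point (the last).
points = [(1, 2.0), (2, 3.1), (3, 3.9), (4, 5.2), (5, 6.0), (6, 6.8), (7, 8.1), (3, 9.0)]
d2 = mahalanobis_squared_2d(points)
pairs = chi_square_plot_pairs(points)
```

On the resulting plot, points that follow the chi-square reference fall near a straight line, and an outlier such as the last point appears separated from the rest by a gap at the upper end.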

17.
The presence of outliers and the statistical noise that affects the data for reference materials have undesirable effects on the mean and on other indicators of the central value. Five robust indicators of the central value, which are resistant to obvious outliers and less obvious contamination (spurious data), were investigated: the dominant cluster mode, the modian, the Gastwirth median, the trimean, and the trimmed mean. The mean and the median were investigated for purposes of comparison.
The results confirm that the mean is very unreliable, and that the Gastwirth median and the dominant cluster mode are strong indicators of the central value.
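
Three of the indicators named in this abstract have simple closed forms and can be sketched directly: the trimean, the Gastwirth median and the trimmed mean. The quantile convention (linear interpolation between order statistics), the 20% trimming proportion, the function names and the data are assumptions, since the abstract gives no such details. The dominant cluster mode is not sketched.

```python
def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (assumed convention)."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    s = sorted(values)
    return (quantile(s, 0.25) + 2.0 * quantile(s, 0.5) + quantile(s, 0.75)) / 4.0


def gastwirth_median(values):
    """Gastwirth's estimator: 0.3 * Q(1/3) + 0.4 * median + 0.3 * Q(2/3)."""
    s = sorted(values)
    return 0.3 * quantile(s, 1.0 / 3.0) + 0.4 * quantile(s, 0.5) + 0.3 * quantile(s, 2.0 / 3.0)


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the given proportion of values from each end."""
    s = sorted(values)
    cut = int(proportion * len(s))
    core = s[cut:len(s) - cut]
    return sum(core) / len(core)


# Hypothetical replicate results with one gross outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
mean_value = sum(data) / len(data)
robust_values = (trimean(data), gastwirth_median(data), trimmed_mean(data))
```

With this data the arithmetic mean is dragged to about 15 by the single value of 100, while all three robust indicators stay at the centre of the remaining values.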

18.
Numerous studies report geochemical data on reference materials (RMs) processed by outlier-based methods that use univariate discordancy tests. However, the relative efficiency of the discordancy tests is not precisely known. We used an extensive geochemical database for thirty-five RMs from four countries (Canada, Japan, South Africa and USA) to empirically evaluate the performance of nine single-outlier tests with thirteen test variants. It appears that the kurtosis test (N15) is the most powerful test for detecting discordant outliers in such geochemical RM databases and is closely followed by the Grubbs type tests (N1 and N4) and the skewness test (N14). The Dixon-type tests (N7, N8, N9 and N10) as well as the Grubbs type test (N2) showed smaller global relative efficiency criterion values for the detection of outlying observations in this extensive database. Upper discordant outliers were more common than lower discordant outliers, implying that positively skewed inter-laboratory geochemical datasets are more frequent than negatively skewed ones and that the median, a robust central tendency indicator, is likely to be biased, especially for small-sized samples. Our outlier-based procedure should be useful for objectively identifying discordant outliers in many fields of science and engineering and for interpreting them accordingly. After processing these databases by single-outlier discordancy tests and obtaining reliable estimates of central tendency and dispersion parameters of the geochemical data for the RMs in our database, we used these statistical data to apply a weighted least-squares linear regression (WLR) model for the major element determinations by X-ray fluorescence spectrometry and compared the WLR results with an ordinary least-squares linear regression model. An advantage in using our outlier procedure and the new concentration values and uncertainty estimates for these RMs was clearly established.
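
Two of the test statistics discussed in this abstract are easy to compute: a Grubbs-type statistic (largest absolute deviation from the mean in units of the sample standard deviation) and the sample kurtosis. The sketch below only computes the statistics. The exact variants behind the codes N1 to N15 and the critical-value tables needed to declare a value discordant are not reproduced, and the function names and data are illustrative assumptions.

```python
def grubbs_statistic(values):
    """Largest absolute deviation from the mean, in sample standard deviations."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return max(abs(v - mean) for v in values) / sd


def sample_kurtosis(values):
    """Sample kurtosis n * sum(d^4) / (sum(d^2))^2, with d the deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    m4 = sum((v - mean) ** 4 for v in values)
    return n * m4 / (m2 * m2)


# Hypothetical inter-laboratory results: a clean set and one with an upper outlier.
clean = [1, 2, 3, 4, 5]
contaminated = [1, 2, 3, 4, 5, 30]
statistics = {
    "clean": (grubbs_statistic(clean), sample_kurtosis(clean)),
    "contaminated": (grubbs_statistic(contaminated), sample_kurtosis(contaminated)),
}
```

Both statistics rise when an extreme value is added; in a real test each would be compared with a tabulated critical value for the given sample size and significance level.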

19.
Many data sets can be viewed as a collection of samples representing mixtures of a relatively small number of end members. When end members are present in the sample set, the algorithm QMODEL by Klovan and Miesch can efficiently determine proportionate contributions. EXTENDED QMODEL by Full, Ehrlich, and Klovan was designed to deduce the composition of realistic end members when the end members are not represented by samples. However, in the presence of high levels of random variation or outliers not belonging to the system of interest, EXTENDED QMODEL may not be reliable inasmuch as it is largely dependent on extreme values for definition of an initial mixing polyhedron. FUZZY QMODEL utilizes the fuzzy c-means algorithm of Bezdek to provide an alternative initial mixing polyhedron. This algorithm utilizes the collective property of all the data rather than outliers and so can produce suitable solutions in the presence of noisy or “messy” data points.
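
The fuzzy c-means algorithm mentioned in this abstract alternates between updating memberships and updating cluster centres. The sketch below shows those two steps for one-dimensional data. It is not FUZZY QMODEL itself; the function name, the fuzziness exponent of 2, the fixed iteration count, the starting centres and the data are all assumptions.

```python
def fuzzy_c_means_1d(data, centers, fuzziness=2.0, iterations=50):
    """Plain fuzzy c-means on 1-D data with a fixed number of iterations.

    Returns the final centres and the memberships computed in the last
    assignment step (one row per data point, each row summing to 1).
    """
    centers = list(centers)
    exponent = 2.0 / (fuzziness - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in data:
            distances = [abs(x - c) for c in centers]
            if 0.0 in distances:
                # The point coincides with a centre: assign it there entirely.
                row = [1.0 if d == 0.0 else 0.0 for d in distances]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [
                    1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
                    for d_i in distances
                ]
            memberships.append(row)
        # Each centre is the mean of the data weighted by membership ** fuzziness.
        centers = [
            sum((u[i] ** fuzziness) * x for u, x in zip(memberships, data))
            / sum(u[i] ** fuzziness for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships


# Hypothetical data with two well-separated groups and arbitrary starting centres.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, memberships = fuzzy_c_means_1d(data, [0.0, 13.0])
```

Because every point contributes to every centre according to its membership, the centres reflect the bulk of the data rather than a few extreme values, which is the property the abstract relies on.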

20.
Empirical discriminant analysis classified multivariate data from 2174 geochemical reconnaissance samples from South Greenland, so that they were related to known geological units or characterized as outliers. Training sets, comprising 514 samples from 14 geologic units, were selected in order to reflect only the background conditions of each geological unit. A smoothing parameter of 0.5 maximized correct classification of the training sets and extracted a reasonable number of outliers (289, 13% of the samples) representing geographically grouped anomalies. Plots of the geochemical samples classified into the geological units corresponded well to the geological map. Q-mode cluster analysis classified the 289 outliers into 30 groups with different element associations. All types of mineral occurrence known in South Greenland could be recognized amongst the clusters. For example, there were seven clusters which were characterized by samples with high U values and different associated elements, each one related to a different type of U mineralization. Another cluster containing samples with high Zr, Nb, and Y values reflects recently discovered pyrochlore mineralization. Other clusters were explained on the basis of geological units which were too small to be mapped or included amongst the training sets. Empirical discriminant analysis successfully reduced the multivariate data to one map, which made it easier to evaluate the varying element levels over the different geological units. Incorrectly classified samples require follow-up in order to appraise the accuracy of the geological mapping. Classification of the outliers by cluster analysis assists both in identifying samples influenced by mineral occurrences and in predicting the type of mineralization to be expected, thereby substantially aiding in the selection of areas for mineral exploration.
