Similar Articles
1.
An effect of closure on the structure of principal components (citations: 2 total, 0 self, 2 by others)
The principal components transformation generates, from any data array, a new set of variables (the scores of the components) whose total variance is exactly equal to that of the initial set. It is in this sense that the transformed variables are said to contain, preserve, or account for the variance of the original set. The scores, however, are uncorrelated. In the course of the transformation, what becomes of the strong interdependence of variance and covariance so characteristic of closed arrays? The question seems to have attracted little attention; we are aware of no study of it in the earth sciences. Experimental work reported here shows quite clearly that the overall equivalence of variance and covariance imposed by closure, though absent from the component scores, may emerge in relations between the coefficients of each of the lower-order components: if the raw data are complete rock analyses, the sum of all the covariances of the coefficients of such a component is negative and very nearly equal in absolute value to the sum of all the variances. (In all cases examined so far, the absolute value of the first sum is slightly less than that of the second.) The principal components transformation provides an elegant escape from closure correlation if a petrographic problem can be restated entirely in terms of component scores, but not if a physical interpretation of the component vectors is required.
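The closure constraint behind this question can be checked numerically. The sketch below, on hypothetical lognormal data closed to 100 percent, verifies that every row of a closed covariance matrix sums to zero (so the covariances sum to minus the variances), and that principal component scores are uncorrelated while keeping the total variance. It does not reproduce the authors' experiment on component coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical open data: 200 samples of 5 positive constituents.
open_data = rng.lognormal(mean=[3.0, 2.0, 1.5, 1.0, 0.5], sigma=0.3, size=(200, 5))

# Closure: rescale every row to sum to 100 (weight percent).
closed = 100 * open_data / open_data.sum(axis=1, keepdims=True)

cov = np.cov(closed, rowvar=False)
total_variance = np.trace(cov)
# Each row of cov sums to zero for closed data, so the off-diagonal
# covariances add up to minus the sum of the variances.
off_diagonal_sum = cov.sum() - total_variance

# Principal components of the covariance matrix: the scores are uncorrelated,
# and their variances (the eigenvalues) add up to the same total variance.
eigvals, eigvecs = np.linalg.eigh(cov)
scores = (closed - closed.mean(axis=0)) @ eigvecs
score_cov = np.cov(scores, rowvar=False)
```

The smallest eigenvalue is zero up to round-off, another trace of the unit-sum constraint.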

2.
The Chayes-Kruskal procedure for testing correlations between proportions uses a linear approximation to the actual closure transformation to provide a null value, $p_{ij}$, against which an observed closed correlation coefficient, $r_{ij}$, can be tested. It has been suggested that a significant difference between $p_{ij}$ and $r_{ij}$ would indicate a nonzero covariance relationship between the $i$th and $j$th open variables. In this paper, the linear approximation to the closure transformation is described in terms of a matrix equation. Examination of the solution set of this equation shows that estimating, or even identifying, significant nonzero open correlations is essentially impossible, even if the number of variables and the sample size are large. The method of solving the matrix equation is described in the appendix.
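As an illustration of what a linear approximation to closure can look like, the sketch below uses a first-order (delta-method) linearization of $y = x / \sum x$ at the mean. This is an assumption and may differ in detail from the Chayes-Kruskal formulation; the function names are hypothetical. The Jacobian `jac` annihilates the mean vector and has rank $n-1$, which hints at why the open covariance cannot be recovered uniquely from the closed one.

```python
import numpy as np

def linearized_closed_covariance(mu, open_cov):
    """First-order covariance of y = x / sum(x), evaluated at the open means mu."""
    mu = np.asarray(mu, dtype=float)
    total = mu.sum()
    y = mu / total
    n = mu.size
    # Jacobian of the closure map: dy_i/dx_j = (delta_ij - y_i) / total
    jac = (np.eye(n) - np.outer(y, np.ones(n))) / total
    return jac @ open_cov @ jac.T, jac

def cov_to_corr(cov):
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

mu = np.array([50.0, 25.0, 15.0, 10.0])
open_cov = np.diag([25.0, 9.0, 4.0, 1.0])    # hypothetical uncorrelated open variables
closed_cov, jac = linearized_closed_covariance(mu, open_cov)
null_corr = cov_to_corr(closed_cov)           # approximate null closed correlations
```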

3.
Multiple linear regression analysis may be used to describe the relation of one geologic variable to a number of other (independent) variables, and may also be used to fit a trend surface to geographically distributed variables. The least-squares estimates of the regression coefficients differ unpredictably from the true coefficients if the independent variables are correlated: the estimates can be too large in absolute value and may have the wrong sign. The least-squares solution may also be unstable, in that replicate samples can give widely differing values of the regression coefficients. Ridge-regression analysis is a technique for removing the effect of correlations from the regression analysis. The procedure involves adding a small constant K to the diagonal elements of the standardized covariance matrix. The estimates obtained are biased but have smaller sums of squared deviations between the coefficients and their estimates. The ridge trace, a plot of the coefficients versus K, helps determine the value of K that stabilizes the estimates. Correlations between geologic variables are common, and regression coefficients based on these data may be suspect. In trend-surface analysis, correlations between the geographic coordinates may differ widely, and extreme correlations may be introduced if higher-order terms are used in the trend. Ridge-regression analysis guides the geologist to a more reliable interpretation of the results of multiple regression when the independent variables are correlated.
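A minimal ridge-trace sketch, assuming the standardized form described above: coefficients $b(K) = (R + KI)^{-1} r$, where $R$ is the predictor correlation matrix and $r$ holds the predictor-response correlations. The function name and the synthetic data (two nearly collinear predictors) are hypothetical.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Standardized ridge coefficients b(K) = (R + K I)^-1 r, one row per K."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    n = X.shape[0]
    R = Xs.T @ Xs / (n - 1)          # correlation matrix of the predictors
    r = Xs.T @ ys / (n - 1)          # correlations of the predictors with y
    p = R.shape[0]
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=100)

ks = [0.0, 0.01, 0.1, 0.5, 1.0]
trace = ridge_trace(X, y, ks)   # plot each column of `trace` against ks
```

At K = 0 the result is ordinary least squares on standardized data; as K grows, the coefficient vector shrinks and the two collinear coefficients move toward each other.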

4.
Spatial declustering weights (citations: 1 total, 0 self, 1 by others)
Because of autocorrelation and spatial clustering, not all data within a given dataset carry the same statistical weight for estimating global statistics such as the mean, variance, or quantiles of the population distribution. A measure of redundancy (or nonredundancy) of any given regionalized random variable $Z(u_\alpha)$ within any given set of $N$ random variables is proposed. It is defined as the ratio of the determinant of the $N \times N$ correlation matrix to the determinant of the $(N-1) \times (N-1)$ correlation matrix excluding the random variable $Z(u_\alpha)$. This ratio measures the increase in redundancy when the random variable $Z(u_\alpha)$ is added to the $N-1$ remaining ones. It can be used as a declustering weight for any outcome (datum) $z(u_\alpha)$. When the redundancy matrix is a kriging covariance matrix, the proposed ratio is the cross-validation simple kriging variance. The covariance of the uniform scores of the clustered data is proposed as a redundancy measure that is robust with respect to data clustering.
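A sketch of the proposed determinant ratio for 1D data, assuming a hypothetical exponential correlation model. Computing it through log-determinants and, separately, as $1/(C^{-1})_{\alpha\alpha}$ shows the two agree; this cofactor identity is what links the ratio to the cross-validation simple kriging variance for a unit-sill correlation matrix.

```python
import numpy as np

def redundancy_weights(coords, corr_range):
    """det(C_N) / det(C_{N-1} without datum alpha), for each datum alpha.

    Uses a hypothetical exponential correlation exp(-h / corr_range).
    """
    coords = np.asarray(coords, dtype=float)
    h = np.abs(coords[:, None] - coords[None, :])
    C = np.exp(-h / corr_range)
    _, logdet_full = np.linalg.slogdet(C)
    n = len(coords)
    weights = np.empty(n)
    for a in range(n):
        keep = np.arange(n) != a
        _, logdet_sub = np.linalg.slogdet(C[np.ix_(keep, keep)])
        weights[a] = np.exp(logdet_full - logdet_sub)
    return weights, C

# Three clustered samples and two isolated ones along a 1D transect.
coords = [0.0, 0.1, 0.2, 5.0, 10.0]
weights, C = redundancy_weights(coords, corr_range=2.0)
# Same numbers via the cross-validation simple kriging variance, 1 / (C^-1)_aa.
sk_cv_variance = 1.0 / np.diag(np.linalg.inv(C))
```

Clustered samples receive small weights and isolated samples weights close to 1.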

5.
Studies of correlation coefficients between different sets of global geophysical data may lead to useful inferences concerning their relationship or independence. If one data set is allowed to rotate with respect to another, the statistical theory is complicated, and extra care is required before concluding that a maximized correlation coefficient has any statistical significance. If, for some relative rotation, two spherical harmonic fields are significantly correlated, then their individual degree-component harmonics of dominant power must also be significantly correlated. Rotations can be found that yield high correlations between the dominant low-degree spherical harmonics of the geomagnetic and terrestrial gravity field potentials, but rotations can also be found that yield equally high, yet meaningless, correlations if the lunar gravity field is substituted for the geomagnetic field. To explain such high correlations, the theoretical correlation distribution function between normally distributed component harmonics is derived and then verified for low-degree harmonics with a Monte Carlo technique that takes into account the three-dimensional rotation group. Some curious properties emerge: (1) the correlation distribution function over all possible relative orientations is almost the same for identical and for uncorrelated fields; and (2) a system for determining the correlation distribution function from randomly selected fields or from randomly rotated fields is almost ergodic.
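The sketch below illustrates only the baseline distribution for a single degree $l$ at a fixed orientation (no rotation-group averaging, unlike the paper): two independent vectors of $2l+1$ standard normal coefficients, correlated without mean removal. Under these assumptions $E[r^2] = 1/(2l+1)$, and at $l = 2$ about 12 percent of random pairs already reach $|r| > 0.7$, which shows how easily high low-degree correlations arise by chance.

```python
import numpy as np

def degree_correlation(a, b):
    """Correlation of two sets of degree-l harmonic coefficients (no mean removal)."""
    return np.sum(a * b, axis=-1) / np.sqrt(np.sum(a**2, axis=-1) * np.sum(b**2, axis=-1))

rng = np.random.default_rng(2)
degree = 2
n_coef = 2 * degree + 1          # number of real coefficients at degree l
trials = 200_000
a = rng.normal(size=(trials, n_coef))
b = rng.normal(size=(trials, n_coef))
r = degree_correlation(a, b)

mean_r2 = np.mean(r**2)                  # close to 1 / (2l + 1)
frac_high = np.mean(np.abs(r) > 0.7)     # share of chance correlations above 0.7
```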

6.
The use of principal component analysis in studying chemical trends in volcanic rock suites is described. It is suggested that eigenvectors generated from a correlation matrix, rather than a covariance matrix, could be used in this context. With a covariance matrix, many elements are swamped by the numerical size and range of silicon; with a correlation matrix, the alkalies and titanium begin to show their true importance.
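A small synthetic illustration of the swamping effect, using hypothetical oxide values in which SiO2 has by far the largest variance: the first eigenvector of the covariance matrix is almost entirely SiO2, while the correlation-matrix eigenvector gives a correlated minor oxide (here K2O) comparable weight.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
# Hypothetical oxide data (wt%): SiO2 varies widely, the others much less.
sio2 = rng.normal(55, 6, n)
tio2 = rng.normal(1.5, 0.4, n)
k2o = 0.05 * (sio2 - 55) + rng.normal(2.0, 0.3, n)
na2o = rng.normal(3.5, 0.5, n)
data = np.column_stack([sio2, tio2, k2o, na2o])

def first_eigenvector(m):
    """Eigenvector of the largest eigenvalue, with its largest entry made positive."""
    vals, vecs = np.linalg.eigh(m)
    v = vecs[:, -1]
    return v * np.sign(v[np.argmax(np.abs(v))])

pc1_cov = first_eigenvector(np.cov(data, rowvar=False))
pc1_corr = first_eigenvector(np.corrcoef(data, rowvar=False))
```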

7.
The application of R-mode principal components analysis to a set of closed chemical data is described using previously published chemical analyses of rocks from Gough Island. Different measures of similarity have been used, and the results compared by calculating the correlation coefficients between each of the elements of the extracted eigenvectors and each of the original variables. These correlations provide a convenient measure of the contribution of each variable to each principal component. The choice of similarity measure (variance-covariance or correlation coefficient) should reflect the nature of the data and the investigator's view of the proper weighting of the variables: according to their sample variances, or equally. If the data are appropriate for principal components analysis, then the Chayes and Kruskal concept of hypothetical open and closed arrays and the expected closure correlations would seem to be useful in defining the structure to be expected in the absence of significant departures from randomness. If the data are not multivariate normally distributed, the principal components may not be independent. This may result in significant nonzero covariances between various pairs of principal components.
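The component-variable correlations described above can be computed directly from the eigendecomposition of the variance-covariance matrix: $\mathrm{corr}(x_j, s_k) = e_{jk}\sqrt{\lambda_k}/\sigma_j$. The sketch checks this against brute-force correlations on synthetic multivariate normal data; the function name is hypothetical and this is not the author's code.

```python
import numpy as np

def component_variable_correlations(data):
    """Correlations between original variables (rows) and PC scores (columns),
    for a PCA of the variance-covariance matrix. Also returns the scores."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sd = np.sqrt(np.diag(cov))
    # corr(x_j, score_k) = e_jk * sqrt(lambda_k) / sd_j
    return eigvecs * np.sqrt(eigvals) / sd[:, None], centered @ eigvecs

rng = np.random.default_rng(4)
true_cov = [[4, 2, 0.5], [2, 3, 0.2], [0.5, 0.2, 1]]
data = rng.multivariate_normal([0, 0, 0], true_cov, size=500)
loadings, scores = component_variable_correlations(data)
```

Each row of `loadings` has unit sum of squares, since the components together reproduce each variable's variance.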

8.
Dynamic stochastic estimation of physical variables (citations: 1 total, 0 self, 1 by others)
A fundamental problem facing the physical sciences today is the analysis of natural variations and the mapping of spatiotemporal processes. Detailed maps describing the space/time distribution of groundwater contaminants, atmospheric pollutant deposition processes, rainfall intensity variables, external intermittency functions, etc., are tools whose importance in practical applications cannot be overestimated. Such maps are valuable inputs for numerous applications, including solute transport, storm modeling, turbulent-nonturbulent flow characterization, weather prediction, and human exposure to hazardous substances. The approach considered here uses spatiotemporal random field theory to study natural space/time variations and derive dynamic stochastic estimates of physical variables. The random field model is constructed in a space/time continuum that explicitly involves both spatial and temporal aspects and provides a rigorous representation of spatiotemporal variabilities and uncertainties. This offers considerable advantages for analytical investigations of natural processes. The model is used to study natural space/time variations of springwater calcium ion data from the Dyle River catchment area, Belgium. This dataset is characterized by spatially nonhomogeneous and temporally nonstationary variability, which is quantified by random field parameters such as orders of space/time continuity and random field increments. A rich class of covariance models is determined from the properties of the random field increments. The analysis leads to maps of continuity orders and covariances reflecting space/time calcium ion correlations and trends. Calcium ion estimates and the associated statistical errors are calculated at unmeasured locations/instants over the Dyle region using a space/time kriging algorithm. In practice, scale effects should be taken into consideration when interpreting the results of the dynamic stochastic analysis.
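A much simpler stand-in for space/time kriging, assuming a separable exponential space-time covariance with hypothetical ranges and a known mean (simple kriging). The paper's model, built on random field increments and orders of continuity, is more general; the function names, positions, times, and values here are invented.

```python
import numpy as np

def st_cov(dx, dt, sill=1.0, a_space=10.0, a_time=5.0):
    """Hypothetical separable covariance: exponential in space times exponential in time."""
    return sill * np.exp(-np.abs(dx) / a_space) * np.exp(-np.abs(dt) / a_time)

def simple_kriging_st(xs, ts, zs, x0, t0, mean):
    """Simple kriging estimate and variance at (x0, t0) from data (xs, ts, zs)."""
    xs, ts, zs = map(np.asarray, (xs, ts, zs))
    C = st_cov(xs[:, None] - xs[None, :], ts[:, None] - ts[None, :])
    c0 = st_cov(xs - x0, ts - t0)
    w = np.linalg.solve(C, c0)                 # kriging weights
    estimate = mean + w @ (zs - mean)
    variance = st_cov(0.0, 0.0) - w @ c0       # simple kriging variance
    return estimate, variance

# Hypothetical 1D positions (km), sampling times (days), and calcium values (mg/L).
xs = [0.0, 4.0, 9.0, 4.0]
ts = [0.0, 0.0, 3.0, 6.0]
zs = [80.0, 95.0, 70.0, 90.0]
est, var = simple_kriging_st(xs, ts, zs, x0=5.0, t0=2.0, mean=84.0)
```

Simple kriging is an exact interpolator: at a sampled location and instant it returns the datum with zero variance.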

9.
Correlation coefficients of modal variables from several suites of granitic rocks have been calculated and tested for significance using the Chayes-Kruskal and Chayes methods. The results show that although Chayes' remaining-space transformation does, in general, weaken the original proportion correlations, the positive and low-negative original proportion correlations tend to increase in absolute value, because the transformation tends to increase the covariance. However, no satisfactory procedure for choosing the variable to be transformed (V2) could be determined, and testing the significance of remaining-space correlations is found to be highly problematic. It is also shown that the Vistelius-Sarmanov procedure for calculating correlation coefficients from closed-table data does not effectively eliminate the closure effect. It is concluded that assigning statistical significance to correlation coefficients between modal variables is, in general, unsatisfactory, except where the number of variables 8, the sample size is large (say, 30), and there is no negative element in the variance vector of the corresponding open variables.

10.
In this article, we present the multivariable variogram, which is defined, like the traditional variogram, as the expected value of a squared distance, here in a space with $p$ dimensions. Combined with the linear model of coregionalization, this tool provides a way of finding the elementary variograms that characterize the different spatial scales contained in a set of data with $p$ variables. When the number of elementary components is less than or equal to the number of variables, it is possible, by nonlinear regression of variograms and cross-variograms, to estimate the coregionalization parameters directly and then obtain the elementary variables themselves, either by cokriging or by direct matrix inversion. This new tool greatly simplifies the procedure proposed by Matheron (1982) and Wackernagel (1985): the search for the elementary variograms is carried out using only one (multivariable) variogram, as opposed to the $p(p+1)/2$ required by Matheron's approach. Direct estimation of the linear coregionalization model parameters involves the creation of positive semidefinite coregionalization matrices of rank 1.
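With a Euclidean distance, the multivariable variogram is half the mean squared distance between $p$-variate vectors one lag apart, which equals the sum of the $p$ direct variograms. A sketch on a hypothetical regularly spaced 1D series; any scaling or weighting of the variables that the method may use is omitted, and the function names are hypothetical.

```python
import numpy as np

def experimental_variogram(z, lag):
    """Direct variogram of a regularly spaced 1D series at an integer lag."""
    d = z[lag:] - z[:-lag]
    return 0.5 * np.mean(d**2)

def multivariable_variogram(Z, lag):
    """Half the mean squared Euclidean distance between p-variate vectors `lag` steps apart."""
    D = Z[lag:] - Z[:-lag]               # shape (n - lag, p)
    return 0.5 * np.mean(np.sum(D**2, axis=1))

rng = np.random.default_rng(5)
n, p = 500, 3
# Hypothetical p-variate series: cumulative sums give spatially correlated values.
Z = np.cumsum(rng.normal(size=(n, p)), axis=0)
lags = [1, 2, 5, 10]
gamma_multi = [multivariable_variogram(Z, h) for h in lags]
```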

11.
The dominant feature distinguishing one method of principal components analysis from another is the manner in which the original data are transformed prior to the other computations. The only other distinguishing feature of any importance is whether the eigenvectors of the inner product-moment of the transformed data matrix are taken directly as the Q-mode scores, or scaled by the square roots of their associated eigenvalues and called the R-mode loadings. If the eigenvectors are extracted from the product-moment correlation matrix, the variables have, in effect, been transformed by column standardization (zero means and unit variances), and the sum of the $p$ largest eigenvalues divided by the sum of all the eigenvalues indicates the degree to which a model containing $p$ components accounts for the total variance in the original data. However, if the data were transformed in any manner other than column standardization, the eigenvalues cannot be used in this way; they can only be used to determine the degree to which the model accounts for the transformed data. Regardless of the type of principal components analysis performed, whether R- or Q-mode, the goodness-of-fit of the model to the original data is given better by the eigenvalues of the correlation matrix than by those of the matrix that was actually factored.
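For the column-standardized case, the eigenvalue ratio can be checked against an explicit reconstruction: the fraction carried by the $p$ leading eigenvalues of the correlation matrix equals one minus the relative residual sum of squares of the standardized data rebuilt from $p$ components. The function name, the mixing matrix, and the synthetic data are assumptions.

```python
import numpy as np

def explained_fraction(data, n_components):
    """Return (eigenvalue ratio, reconstruction fit) for a correlation-matrix PCA."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # column standardization
    R = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals[:n_components].sum() / eigvals.sum()
    V = eigvecs[:, :n_components]
    residual = z - z @ V @ V.T           # part of z not reproduced by the model
    fit = 1.0 - np.sum(residual**2) / np.sum(z**2)
    return ratio, fit

rng = np.random.default_rng(6)
mixing = np.array([[1.0, 0.8, 0.0, 0.0],
                   [0.0, 1.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.0, 1.0]])
data = rng.normal(size=(200, 4)) @ mixing
ratio, fit = explained_fraction(data, n_components=2)
```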

12.
Anders Lindh. Lithos, 1975, 8(2): 151-161
A population of 117 coexisting nonalkaline pyroxene pairs has been studied statistically to evaluate compositional and thermal effects on the element distribution. $K_D^{\mathrm{Mg}}$ (opx-cpx) is influenced by the Fe/Mg ratio, by the Ca content (especially of clinopyroxene), and by the content of tetrahedral Al. Fe and tetrahedral Al are found to be negatively correlated. A principal component analysis based on the variation of Si, Al$^{\mathrm{IV}}$, Al$^{\mathrm{VI}}$, Fe, Mg, Mn, and Ca is performed. Dropping highly correlated variables does not affect the result significantly. The first principal component reflects the major chemical variation in Fe and Mg. When ferrous and ferric iron are used as separate entries in the analysis, either the second or the third component demonstrates a temperature dependence. It is, however, not possible to obtain pure temperature and chemical components, because composition is correlated with the temperature of formation. From these components, a graph reflecting temperature of formation has been constructed.

13.
Fitting the Linear Model of Coregionalization by Generalized Least Squares (citations: 2 total, 0 self, 2 by others)
In geostatistical studies, the fitting of the linear model of coregionalization (LMC) to direct and cross experimental semivariograms is usually performed with a weighted least-squares (WLS) procedure based on the number of pairs of observations at each lag. So far, no study has investigated the efficiency of other least-squares procedures, such as ordinary least squares (OLS), generalized least squares (GLS), and WLS with other weighting functions, in the context of the LMC. In this article, we compare the statistical properties of the sill estimators obtained with eight least-squares procedures for fitting the LMC: OLS, four WLS, and three GLS. The WLS procedures are based on approximations of the variance of semivariogram estimates at each distance lag. The GLS procedures use a variance–covariance matrix of semivariogram estimates that is (i) estimated using the fourth-order moments with sill estimates (GLS1), (ii) calculated using the fourth-order moments with the theoretical sills (GLS2), or (iii) based on an approximation using the correlation between semivariogram estimates in the case of spatial independence of the observations (GLS3). The current algorithm for fitting the LMC by WLS while ensuring the positive semidefiniteness of sill matrix estimates is modified to include any least-squares procedure. A Monte Carlo study is performed for 16 scenarios corresponding to different combinations of the number of variables, number of spatial structures, values of ranges, and scale dependence of the correlations among variables. Simulation results show that the mean square error is accounted for mostly by the variance of the sill estimators rather than by their squared bias. Overall, the estimated GLS1 and theoretical GLS2 procedures are the most efficient, followed by the WLS procedure based on the number of pairs of observations and the average distance at each lag. On that basis, GLS1 can be recommended for future studies using the LMC.
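A simplified sketch of fitting LMC sills by weighted least squares with weights $N(h)$: every entry of the sill matrices is fitted by linear regression on unit-sill basic structures (a nugget and a spherical model, both assumed here), and each sill matrix is then projected onto the positive semidefinite cone by clipping negative eigenvalues. This one-shot projection is not the iterative algorithm referred to above, and a GLS variant would replace the diagonal weights with a full variance-covariance matrix of the semivariogram estimates. Function names and the synthetic sills are hypothetical.

```python
import numpy as np

def spherical(h, a):
    """Unit-sill spherical variogram with range a."""
    h = np.asarray(h, dtype=float)
    s = 1.5 * h / a - 0.5 * (h / a) ** 3
    return np.where(h < a, s, 1.0)

def fit_lmc_sills(gamma, basis, weights):
    """WLS sill matrices for an LMC, each projected to be positive semidefinite.

    gamma: (K, p, p) experimental direct/cross semivariograms at K lags.
    basis: (K, S) unit-sill basic structures evaluated at the same lags.
    weights: (K,) lag weights, e.g. the number of pairs N(h).
    """
    K, p, _ = gamma.shape
    w = np.sqrt(weights)[:, None]
    A = basis * w                                  # weighted design matrix (K, S)
    Y = gamma.reshape(K, p * p) * w                # weighted responses (K, p*p)
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # (S, p*p)
    sills = coef.reshape(-1, p, p)
    sills = 0.5 * (sills + np.transpose(sills, (0, 2, 1)))
    projected = []
    for B in sills:
        vals, vecs = np.linalg.eigh(B)
        projected.append(vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T)
    return np.array(projected)

# Noise-free synthetic semivariograms built from known sill matrices.
lags = np.arange(1, 21) * 1.0
basis = np.column_stack([np.ones_like(lags), spherical(lags, 8.0)])  # nugget + spherical
true_sills = np.array([[[0.2, 0.05], [0.05, 0.1]],
                       [[1.0, 0.6], [0.6, 0.8]]])
gamma = np.einsum("ks,sij->kij", basis, true_sills)
weights = np.arange(100, 80, -1)                    # hypothetical pair counts N(h)
fitted = fit_lmc_sills(gamma, basis, weights)
```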

15.
Ensemble size is critical to the efficiency and performance of the ensemble Kalman filter, but when the ensemble size is small, the Kalman gain generally cannot be well estimated. To reduce the negative effect of spurious correlations, a regularization process applied to either the covariance or the Kalman gain seems to be necessary. In this paper, we evaluate and compare the estimation errors when two regularization methods, distance-dependent localization and bootstrap-based screening, are applied to the covariance and to the Kalman gain. The investigations were carried out through two examples: a 1D linear problem without dynamics, for which the true Kalman gain can be computed, and a 2D highly nonlinear reservoir fluid flow problem. The investigation led to three primary conclusions. First, if the localizations of the two covariance matrices are not consistent, the estimate of the Kalman gain will generally be poor at the observation location; the consistency condition can be difficult to apply for nonlocal observations. Second, the estimate of the Kalman gain obtained from covariance regularization is generally subject to greater errors than the estimate obtained from Kalman gain regularization. Third, in terms of removing spurious correlations in the estimation of spatially correlated variables, screening the Kalman gain performs comparably to the localization methods (applied to either the covariance or the Kalman gain), but it is more generally applicable: the screening method can be used to estimate both spatially correlated and uncorrelated variables, and it requires no assumption about the prior covariance.
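To make covariance localization concrete, the sketch below forms an ensemble Kalman gain $K = CH^T(HCH^T + R)^{-1}$ with and without a Schur-product taper on the sample covariance. The Gaussian taper, its length scale, and the toy prior (independent unit-variance variables, so the true gain is nonzero only at the observed location) are assumptions; the bootstrap-based screening method is not shown, and the function name is hypothetical.

```python
import numpy as np

def ensemble_kalman_gain(ensemble, H, obs_err_var, taper=None):
    """Kalman gain from an (n_state x n_members) ensemble, optionally localized.

    taper: optional (n_state x n_state) matrix multiplied elementwise (Schur product)
    with the sample covariance before the gain is formed.
    """
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    n_members = ensemble.shape[1]
    C = anomalies @ anomalies.T / (n_members - 1)
    if taper is not None:
        C = C * taper
    S = H @ C @ H.T + obs_err_var * np.eye(H.shape[0])
    # K = C H^T S^-1, computed as (S^-1 H C)^T since C and S are symmetric
    return np.linalg.solve(S, H @ C).T

rng = np.random.default_rng(9)
n_state, n_members = 50, 10
ensemble = rng.normal(size=(n_state, n_members))   # small ensemble, independent prior
H = np.zeros((1, n_state))
H[0, 25] = 1.0                                      # observe variable 25 only
grid = np.arange(n_state, dtype=float)
dist = np.abs(grid[:, None] - grid[None, :])
taper = np.exp(-(dist / 5.0) ** 2)                  # hypothetical Gaussian taper
K_raw = ensemble_kalman_gain(ensemble, H, obs_err_var=0.5)
K_loc = ensemble_kalman_gain(ensemble, H, obs_err_var=0.5, taper=taper)
```

Far from the observation, the raw gain contains spurious nonzero entries from sampling noise, which the taper suppresses.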

16.
Although there are multiple methods for modeling matrix covariance functions and matrix variograms in the geostatistical literature, the linear coregionalization model is still widely used. In particular, it is easy to check whether the matrix covariance function is positive definite or whether the matrix variogram is conditionally negative definite. One of the difficulties in using a linear coregionalization model is determining the number of basic structures and the corresponding covariance functions or variograms. In this paper, a new procedure is given for identifying the basic structures of the space–time linear coregionalization model and modeling the matrix variogram. This procedure is based on the nearly simultaneous diagonalization of the sample matrix variograms computed for a set of spatiotemporal lags. A case study using a multivariate spatiotemporal data set provided by the Environmental Protection Agency of Lombardy, Italy, illustrates how nearly simultaneous diagonalization of the empirical matrix variograms simplifies their modeling. The new methodology is compared with a previous one by analyzing various indices and statistics.
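The paper works with many lags and nearly simultaneous diagonalization; for exactly two matrix variograms the problem has an exact solution through a generalized eigenproblem, sketched below with NumPy only. The mixing matrix and basic-structure values are hypothetical. When the matrix variograms follow an LMC with as many structures as variables, the same transform also diagonalizes the matrix variogram at other lags.

```python
import numpy as np

def simultaneous_diagonalize(A, B):
    """Return V with V.T @ B @ V = I and V.T @ A @ V diagonal (B symmetric positive definite)."""
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T          # symmetric, same eigenvalues as B^-1 A
    vals, U = np.linalg.eigh(M)
    V = L_inv.T @ U
    return V, vals

# Hypothetical LMC: Gamma(h) = mixing @ diag(g1(h), g2(h)) @ mixing.T
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])

def matrix_variogram(g):
    return mixing @ np.diag(g) @ mixing.T

G1 = matrix_variogram([0.8, 0.3])    # lag 1
G2 = matrix_variogram([0.95, 0.55])  # lag 2
G3 = matrix_variogram([1.0, 0.9])    # a third lag, not used in the fit
V, ratios = simultaneous_diagonalize(G1, G2)
```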

17.
Geologists may want to classify compositional data and express the classification as a map. Regionalized classification is a tool that can be used for this purpose, but it incorporates discriminant analysis, which requires the computation and inversion of a covariance matrix. Covariance matrices of compositional data will always be singular (noninvertible) because of the unit-sum constraint. Fortunately, discriminant analyses can be calculated using a pseudo-inverse of the singular covariance matrix; some statistical packages, such as SAS, do this automatically. Granulometric data from the Darss Sill region of the Baltic Sea are used to explore how the pseudo-inversion procedure influences discriminant analysis results, comparing the algorithm used by SAS to the more conventional Moore–Penrose algorithm. Logratio transforms have been recommended to overcome problems associated with the analysis of compositional data, including singularity. A regionalized classification of the Darss Sill data after logratio transformation differs only slightly from one based on the raw granulometric data, suggesting that closure problems do not severely influence regionalized classification of compositional data.
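A sketch of the singularity and two remedies on hypothetical four-part granulometric fractions: the covariance of closed data has rank $D-1$, `numpy.linalg.pinv` gives a Moore–Penrose pseudo-inverse (SAS's own algorithm is not reproduced here), and an additive logratio transform yields $D-1$ coordinates with a full-rank covariance. The explicit rank and pseudo-inverse cutoffs are assumptions chosen to separate the round-off eigenvalue from the genuine ones.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical granulometric fractions (gravel, sand, silt, clay) closed to 1.
raw = rng.gamma(shape=[2.0, 8.0, 5.0, 3.0], size=(100, 4))
comp = raw / raw.sum(axis=1, keepdims=True)

cov = np.cov(comp, rowvar=False)
rank = np.linalg.matrix_rank(cov, tol=1e-10)   # D - 1 because of the unit-sum constraint
cov_pinv = np.linalg.pinv(cov, rcond=1e-10)    # Moore-Penrose pseudo-inverse

def alr(x):
    """Additive logratio transform with the last part as divisor (D - 1 coordinates)."""
    return np.log(x[:, :-1] / x[:, -1:])

alr_cov = np.cov(alr(comp), rowvar=False)      # ordinary inverse now exists
alr_inv = np.linalg.inv(alr_cov)
```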

18.
A coregionalization simulation consists of generating realizations of a group of spatially related random variables. The Fourier integral method is presented, modified to carry out such a multivariable simulation. This method allows the simulation of realizations with any specified symmetric covariance matrix and is not limited to the classic linear model of coregionalization. Results of Gaussian nonconditional simulations from a case study modeling the spatial characteristics of a coal layer are given.

19.
In reservoir characterization, the covariance is often used to describe the spatial correlation and variation in rock properties, or the uncertainty in rock properties. The inverse of the covariance, on the other hand, is seldom discussed in geostatistics. In this paper, I show that the inverse is required for simulation and estimation of Gaussian random fields, and that it can be identified with the differential operator in regularized inverse theory. Unfortunately, because the covariance matrix for parameters in reservoir models can be extremely large, calculating the inverse can be a problem. I discuss four methods of calculating the inverse of the covariance, two analytical and two purely numerical. By taking advantage of the assumed stationarity of the covariance, none of the methods requires inversion of the full covariance matrix.
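One analytical case where stationarity makes the inverse cheap, offered as an illustration rather than as one of the paper's four methods: an exponential covariance on a regular 1D grid has a tridiagonal inverse. Its interior rows have the form $(1-\rho)^2 x_i - \rho\,(x_{i-1} - 2x_i + x_{i+1})$ up to a constant factor, which loosely resembles the differential-operator interpretation mentioned above. Function names and parameter values are hypothetical.

```python
import numpy as np

def exponential_cov_1d(n, spacing, corr_length, sill=1.0):
    """Covariance sill * exp(-|x_i - x_j| / corr_length) on a regular 1D grid."""
    x = np.arange(n) * spacing
    return sill * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_length)

def exponential_precision_1d(n, spacing, corr_length, sill=1.0):
    """Closed-form tridiagonal inverse of exponential_cov_1d (first-order Markov field)."""
    rho = np.exp(-spacing / corr_length)
    diag = np.full(n, 1.0 + rho**2)
    diag[0] = diag[-1] = 1.0
    Q = np.diag(diag) - rho * (np.eye(n, k=1) + np.eye(n, k=-1))
    return Q / (sill * (1.0 - rho**2))

C = exponential_cov_1d(200, spacing=1.0, corr_length=10.0, sill=2.0)
Q = exponential_precision_1d(200, spacing=1.0, corr_length=10.0, sill=2.0)
```

Only about $3n$ entries of `Q` are nonzero, so it can be stored and applied without ever inverting the dense covariance.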

20.
Geochemical samples from part of Lake Geneva were analyzed for 29 oxides and trace elements. The variables and samples were subjected to R- and Q-mode analyses. The following techniques were applied in sequence: data transformation (normalization and standardization), data reduction (principal component and factor analysis), and automatic classification (dendrograph). The data were treated using various combinations of these techniques, and the resulting classifications were evaluated by several criteria. The best classification of the samples is given by a cluster analysis performed on four principal components computed from standardized variables. The discriminatory power of the variables was also measured and found to depend on their degree of intercorrelation. As a final result, the 29 original variables were reduced to four components, and the sediment samples were classified into four facies, leading to easily interpretable geochemical maps.
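A sketch of the best-performing sequence reported above (standardize, keep four principal components, cluster hierarchically), assuming Ward linkage from SciPy as the hierarchical method; the abstract does not specify the linkage, and the function name and four synthetic facies are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def classify_samples(data, n_components=4, n_classes=4):
    """Standardize variables, keep leading principal components, then Ward clustering."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    R = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    top = np.argsort(eigvals)[::-1][:n_components]
    scores = z @ eigvecs[:, top]
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_classes, criterion="maxclust"), scores

rng = np.random.default_rng(8)
centers = rng.normal(scale=5.0, size=(4, 10))    # four hypothetical facies, 10 variables
data = np.vstack([c + rng.normal(size=(30, 10)) for c in centers])
labels, scores = classify_samples(data)
```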
