Similar literature
19 similar articles found (search time: 15 ms)
1.
Traditionally, one form of preprocessing in multivariate calibration methods such as principal component regression and partial least squares is mean centering the independent variables (responses) and the dependent variables (concentrations). However, upon examination of the statistical issue of error propagation in multivariate calibration, it was found that mean centering is not advised for some data structures. In this paper it is shown that for response data which (i) vary linearly with concentration, (ii) have no baseline (a baseline being a component with a non-zero response that does not change in concentration) and (iii) have no closure in the concentrations (closure meaning that for each sample the concentrations of all components add to a constant, e.g. 100%) it is better not to mean center the calibration data. That is, the prediction errors as evaluated by a root mean square error statistic will be smaller for a model made with the raw data than for a model made with mean-centered data. With simulated data, relative improvements ranging from 1% to 13% were observed, depending on the amount of error in the calibration concentrations and responses.
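The comparison described in this abstract can be illustrated with a deliberately simplified sketch. It assumes a single response channel that is linear in one concentration with no baseline, which is far simpler than the multivariate PCR/PLS setting of the paper. The function names, the slope of 2.0, the noise level and the sample sizes are all hypothetical choices made for illustration, and the sketch makes no claim about which model wins for any particular noise setting.

```python
import random


def fit_raw(r, c):
    """Inverse calibration through the origin, c ~ b * r (no mean centering)."""
    b = sum(ri * ci for ri, ci in zip(r, c)) / sum(ri * ri for ri in r)
    return lambda r_new: b * r_new


def fit_centered(r, c):
    """Same linear model, fitted after mean centering both r and c."""
    r_bar = sum(r) / len(r)
    c_bar = sum(c) / len(c)
    sxy = sum((ri - r_bar) * (ci - c_bar) for ri, ci in zip(r, c))
    sxx = sum((ri - r_bar) ** 2 for ri in r)
    b = sxy / sxx
    return lambda r_new: c_bar + b * (r_new - r_bar)


def rmsep(predict, r_test, c_test):
    """Root mean square error of prediction on a test set."""
    sq = [(predict(ri) - ci) ** 2 for ri, ci in zip(r_test, c_test)]
    return (sum(sq) / len(sq)) ** 0.5


def simulate(noise_sd=0.05, n_cal=10, n_test=200, seed=0):
    """Return (RMSEP raw, RMSEP centered) for one simulated data set.

    Assumed data model (hypothetical): r = 2.0 * c + Gaussian noise,
    i.e. linear response, no baseline, one component.
    """
    rng = random.Random(seed)

    def draw(n):
        c = [rng.uniform(0.1, 1.0) for _ in range(n)]
        r = [2.0 * ci + rng.gauss(0.0, noise_sd) for ci in c]
        return r, c

    r_cal, c_cal = draw(n_cal)
    r_test, c_test = draw(n_test)
    return (rmsep(fit_raw(r_cal, c_cal), r_test, c_test),
            rmsep(fit_centered(r_cal, c_cal), r_test, c_test))
```

Calling `simulate()` repeatedly with different seeds and noise levels gives a rough feel for how the two RMSEP values compare. The centered model estimates one extra parameter (the offset), which is the intuition behind the error-propagation argument in the abstract.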

2.
A multivariate calibration procedure based on principal component analysis is proposed. UV-vis spectra of ternary mixtures have been used to check the applicability of the procedure.

3.
A GLOBAL PERSPECTIVE ON MULTIVARIATE CALIBRATION METHODS   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper consists of two distinct but related parts. In the first part a geometric theory of generalized inverses is presented, and a methodology based on this theory is developed and applied to solve the K-matrix and P-matrix forms of Beer's law. It is shown that most currently accepted and practiced methods for solving these forms of Beer's law are just special cases of this geometric theory of generalized inverses. In addition, this geometric theory is used to explain why the current methods work and why they fail. In the second part a general methodology that includes as special cases least squares, principal component regression, partial least squares 1 and 2, continuum regression plus a variety of other described and undescribed methodologies is presented and then applied to solve the P-matrix formulation of Beer's law. This general methodology, like the first, is also geometric in nature and relies on an understanding of projections. The main emphasis of this paper is one of perspective, which, if understood, provides the proper foundation for answering the general but extremely hard and possibly unanswerable question 'what is the best method?'.

4.
Modern scanning (near-)infrared reflectance/absorption (NIR) spectroscopes measure the absorptions or reflectances at a sequence of around 1000 wavelengths. Training data may consist of 10-100 carefully designed sample mixtures for which the true composition of the mixture is either known by formulation or accurately determined by wet chemistry. In future one wishes to predict the true composition from the spectrum. In this paper we compare a simple wavelength selection approach with methods which retain all the wavelengths. It offers a powerful yet simple technique for choosing those wavelengths that are specific to each pure component as against the other components (including the medium) for the varying compositions. In the presence of a defined range of ingredients it thus chooses wavelengths which are highly selective for each particular component. It has the added advantage of selecting wavelengths which are little affected by interaction effects and consequent non-linearities. The calibration data used consist of 125 observations of three sugars, each varying at five levels in a full 5^3 design. The validation set consists of 21 further samples specially selected to have compositions outside the range of the training sample. The selection methods perform much better on this prediction set than methods which retain all the wavelengths, 700 in this case. The leave-one-out cross-validation internal to the calibration data would point to the opposite finding and suggests that such cross-validations may be overly flattering to techniques such as partial least squares and may encourage overfitting. After selection, simple straightforward least squares methods may be used, eschewing the need for 'shrinkage' methods such as partial least squares or ridge regression.

5.
PLS1 regression is generally viewed as lying in between PCR and OLS regression. Proof is given that the coefficient of determination, R², for a PLS multivariate calibration model is at least as high as that for a PCR model with the same number of components. It appears that PLS can be linked to a correlation-weighted polynomial regression of a constant response on the eigenvalues of the covariance matrix of the predictor variables.

6.
THE KERNEL ALGORITHM FOR PLS   Total citations: 3 (self-citations: 0, citations by others: 3)
A fast and memory-saving PLS regression algorithm for matrices with large numbers of objects is presented. It is called the kernel algorithm for PLS. Long (meaning having many objects, N) matrices X (N×K) and Y (N×M) are condensed into a small (K×K) square 'kernel' matrix X^T Y Y^T X of size equal to the number of X-variables. Using this kernel matrix X^T Y Y^T X together with the small covariance matrices X^T X (K×K), X^T Y (K×M) and Y^T Y (M×M), it is possible to estimate all necessary parameters for a complete PLS regression solution with some statistical diagnostics. The new developments are presented in equation form. A comparison of consumed floating point operations is given for the kernel and the classical PLS algorithm. As appendices, a condensed matrix algebra version of the kernel algorithm is given together with the MATLAB code.
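The condensation step in this abstract can be sketched in a few lines. This is not the paper's MATLAB code; it is a minimal pure-Python illustration with hypothetical helper names. It builds the K×K kernel X^T Y Y^T X from the K×M product X^T Y and extracts its dominant eigenvector by power iteration, relying on the standard property that the first PLS weight vector is that eigenvector. Deflation, further components and the diagnostics mentioned in the abstract are omitted.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def kernel_matrix(X, Y):
    """Return the K x K kernel X^T Y Y^T X.

    Only the small K x M matrix X^T Y is formed from the long N-row data;
    the kernel is then (X^T Y)(X^T Y)^T.
    """
    xty = matmul(transpose(X), Y)       # K x M
    return matmul(xty, transpose(xty))  # K x K


def dominant_eigenvector(A, iters=500):
    """Power iteration: unit eigenvector for the largest eigenvalue of A.

    Assumes the all-ones start vector is not orthogonal to that eigenvector.
    """
    w = [1.0] * len(A)
    for _ in range(iters):
        v = [sum(a * x for a, x in zip(row, w)) for row in A]
        norm = sum(x * x for x in v) ** 0.5
        w = [x / norm for x in v]
    return w


# Tiny made-up example: N = 4 objects, K = 2 X-variables, M = 1 Y-variable.
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
Y = [[1], [2], [3], [4]]
kernel = kernel_matrix(X, Y)            # 2 x 2, independent of N
w1 = dominant_eigenvector(kernel)       # first PLS weight vector (unit length)
```

The point of the design is that the size of `kernel` depends only on the number of X-variables, so the iteration cost no longer grows with the number of objects once X^T Y has been accumulated.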

7.
The usefulness of the Kalman filter as an algorithm for calibration in a real system is shown. Results are compared with classical least squares and pure component calibration. The prediction of four priority pollutant chlorophenols in binary, ternary and quaternary mixtures was also carried out by Kalman filtering. The condition number, standard deviation and prediction error have been employed to choose the most suitable wavelength range. Comparison of the standard error of prediction in the validation set shows significant differences between the evaluated chlorophenols, the best results being obtained with Kalman multivariate calibration.

8.
Regression between two blocks (usually called 'dependent' or Y and 'independent' or X) of data is a very important scientific and data-analytical tool. Regression on multivariate images is possible and constitutes a meaningful addition to existing univariate and multivariate techniques of image analysis. The regression can be used as a modeling tool or for prediction. The form of the regression equation chosen is dependent upon problem specification and information at hand. This paper describes the use of principal component regression (PCR). Both model building and prediction are presented for continuous Y-variables. The final goal is to supply new image material that can be used for visual inspection on a screen. Also, visual tools for diagnosis of model and prediction are provided, often based on derived image material. Examples of modeling and prediction are given for six channels in a seven-channel satellite image.

9.
The use of continuum regression (CR) for the identification of finite impulse response (FIR) dynamic models is investigated. CR encompasses the methods of principal component regression (PCR), partial least squares (PLS) and multiple linear regression (MLR). PCR and MLR are at the two extremes of the continuum. In PCR and PLS, cross-validation is used to determine the optimum number of factors or 'latent variables' to retain in the regression model. CR allows one to vary the method in addition. Cross-validation then determines both the optimum method and the number of latent variables. The CR 'prediction error surface' (a function of the method and number of latent variables) is elucidated. The optimal model is defined as the minimum of this surface. Among the cases studied, the optimal model usually comes from the region of the continuum between PCR and PLS. Few derive from the region between PLS and MLR. It is also demonstrated that FIR models identified by CR have frequency domain properties similar to those identified by PCR.

10.
Calibrations to predict crude protein (CP) and in vitro dry matter digestibility (IVDMD) in dried grass silage from reflectance data collected at 19 wavelengths on an InfraAlyzer 400R have been developed using stepwise multiple linear (SML) and principal component (PC) regression techniques. A direct comparison of the efficacy of each multivariate technique in this application has been possible by using identical calibration development and evaluation sample sets. The effect of two data transformation steps prior to PC regression was also investigated. PC regression of raw reflectance data yielded no significant improvement in the standard errors of prediction (SEP) for CP and IVDMD over those obtained by SMLR, viz. 0.61 vs 0.63 and 2.9 vs 3.0 respectively. Computation time for development and evaluation of the PC regression equation was less than for selection of the best SMLR equation, and PCR equations may be more robust. Data transformation to reduce granularity effects prior to PCR did not produce any improvement in predictive accuracy for either IVDMD or CP.

11.
In this paper a criterion is described for the construction of experimental designs for the evaluation of calibration models in analytical chemistry. The proposed criterion seeks a compromise between the D-optimal designs for estimating the parameters of different polynomial models. A computer algorithm is presented for a sequential construction of experimental designs using the optimality criterion. The performance of the optimality criterion and the computer algorithm is elaborated for the problem of discrimination between first- to third-degree polynomials for the calibration of analytical methods. An experimental design consisting of replicate measurements at five distinct levels equally spaced over the calibration range proved a good solution.
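The abstract does not state the compromise criterion itself, so the following sketch only shows its building block under stated assumptions: the classical D-criterion, det(X^T X), evaluated for polynomial models of a given degree at a chosen set of design levels. The function names and the coded range [-1, 1] are hypothetical choices for illustration.

```python
def design_matrix(levels, degree):
    """Rows [1, x, x^2, ..., x^degree] for each design point x."""
    return [[x ** p for p in range(degree + 1)] for x in levels]


def information_matrix(X):
    """Return X^T X for a design matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in A]
    n = len(A)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(A[i][k]))
        if abs(A[pivot][k]) < 1e-15:
            return 0.0                  # singular: model not estimable
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            det = -det                  # a row swap flips the sign
        det *= A[k][k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return det


def d_criterion(levels, degree):
    """D-criterion det(X^T X) of a design for a polynomial of given degree."""
    return determinant(information_matrix(design_matrix(levels, degree)))


# Five equally spaced levels on a coded calibration range [-1, 1].
five_levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
scores = {degree: d_criterion(five_levels, degree) for degree in (1, 2, 3)}
```

A two-level design such as `[-1, 1]` scores well for a straight line but gives a zero determinant for a quadratic, which is why a design meant to discriminate between first- to third-degree polynomials needs more distinct levels.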

12.
By means of Monte Carlo simulations a comparison has been made between ordinary least squares regression and robust regression. The robust regression procedure is based on the Huber estimate and is computed by means of the iteratively reweighted least squares algorithm. The performance of both procedures has been evaluated for estimation of the parameters of a calibration function and for determination of the concentration of unknown samples. The influence of the distributional characteristics skewness and kurtosis has been studied, and the number of measurements used for constructing the calibration curve has also been taken into account. Under certain conditions robust regression offers an advantage over least squares regression.
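A minimal sketch of a Huber estimate computed by iteratively reweighted least squares, for a straight-line calibration function y = a + b*x, is given below. The details are assumptions rather than the paper's procedure: the tuning constant 1.345 is the commonly used default, the scale is re-estimated in each iteration as the median absolute residual divided by 0.6745, and the function names are hypothetical.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def huber_line_fit(x, y, k=1.345, iters=50):
    """Huber M-estimate of a straight line via IRLS; returns (a, b)."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))   # start from OLS
    for _ in range(iters):
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale (assumption): median absolute residual / 0.6745.
        scale = median(abs(r) for r in res) / 0.6745
        if scale == 0.0:
            break                                    # at least half the points fit exactly
        # Huber weights: 1 inside the threshold, k*scale/|r| outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Made-up calibration data on y = 1 + 2x, with one gross outlier at x = 6.
x = [0, 1, 2, 3, 4, 5, 6]
y = [1, 3, 5, 7, 9, 11, 40]
a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_hub, b_hub = huber_line_fit(x, y)
```

On data like these the outlier pulls the ordinary least squares slope well away from 2, while the Huber weights shrink its influence. Whether the robust fit pays off in general depends on the error distribution, which is the question the Monte Carlo study addresses.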

13.
14.
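The Le'an River and its tributary basins in Jiangxi Province were selected as the study area to explore a method for indirectly retrieving soil heavy metal (Cu, Zn, Pb) contents from soil hyperspectral data. A partial least squares model was used for hyperspectral retrieval of soil organic matter content, and an artificial neural network regression model was introduced to establish the relationship between soil organic matter content and heavy metal content, so that trace-level heavy metal elements in the soil could be extracted; correlation and comparative analyses of their spatial distribution were then carried out. The experimental results show that, when retrieving Cu and Zn, the method effectively reflects their spatial distribution characteristics, is suitable for extension to similar floodplain areas, and provides a reference for monitoring the soil and hydrological ecological environment of the region.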

15.
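Analysis of spring low-temperature and rainy years in Guangdong using the principal component method   Total citations: 1 (self-citations: 0, citations by others: 1)
徐小英, 简裕庚. 《热带地理》 (Tropical Geography), 1997, 17(4): 364-370
Using the principal component method, this paper statistically analyses the February-March mean temperature at 47 stations in Guangdong for 1954-1991 and the years in which low-temperature rainy weather occurred in Guangdong in February-March. Based on the principal component approach, the spatial and temporal distribution characteristics of temperature in this period are calculated and the occurrence of low-temperature rainy years is evaluated directly: (1) the spatio-temporal distribution of February-March temperature in Guangdong is highly concentrated, with the first principal component alone accounting for 95.1% of the total variance of the field; (2) the first four principal components and their corresponding eigenvectors are used together to classify temperature distribution types; (3) the February-March temperature distribution in Guangdong is governed mainly by two types, namely a province-wide uniformly low (or high) pattern and a warm-south/cold-north or cold-south/warm-north pattern. The maxima (positive) and minima (negative) of the principal component indicate that 1957, 1968 and 1969 were years of province-wide low temperature, while 1973 and 1987 were years of province-wide high temperature. These years correspond exactly to the years in which February-March low-temperature rainy weather in Guangdong was severe and slight (or absent), respectively.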

16.
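Based on the semi-arid geographical conditions of the Guanzhong region of Shaanxi and on actual agricultural production statistics for 1980-2017, a principal component regression (PCR) model of agricultural production in Guanzhong was established using five-year periods as the computational time unit, in order to quantify the contributions of the geographical environment and production inputs to agricultural production performance. The results show that: (1) calculation and analysis of the mean elasticity coefficients of the independent variables of the PCR equations for each period indicate that the main indicators promoting agricultural production benefits are Y3 (actual irrigated farmland area, 0.117), Y4 (high- and stable-yield farmland area, 0.509), Y7 (total agricultural chemical fertilizer application, 0.793), Y8 (total power of agricultural machinery, 0.091), Y9 (total agricultural electricity consumption, 0.478) and Y10 (agricultural labour force, 0.106); the main indicators reducing benefits are Y1 (farmland area, -0.763), Y5 (disaster-affected farmland area, -0.052) and Y6 (disaster-damaged farmland area, -0.062). (2) Natural disasters have a very significant influence on grain production in Guanzhong, but overall the influence is fairly stable. (3) Under the combined influence of these indicators, grain output in Guanzhong shows a trend of fluctuating highs and lows, cyclical variation and sustained growth.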

17.
The diatom composition in surface sediments from 119 northern Swedish lakes was studied to examine the relationship with lake-water pH, alkalinity, and colour. Diatom-based predictive models, using weighted-averaging (WA) regression and calibration, partial least squares (PLS) regression and calibration, and weighted-averaging partial least squares (WA-PLS) regression and calibration, were developed for inferences of water chemistry conditions. The non-linear response between the diatom assemblages and pH and alkalinity was best modelled by weighted-averaging methods. The lowest prediction error for pH was obtained using weighted averaging, with or without tolerance downweighting. For alkalinity there was still some information in the residual structure after extracting the first weighted-averaging component, which resulted in a slight improvement of predictions when using a two component WA-PLS model. The best colour predictions were obtained using a two component PLS model. Principal component analysis (PCA) of the prediction errors, with some characteristics of the training set included as passive variables, was performed to compare the results for the different alkalinity predictive models. The results indicate that calibration techniques utilizing more than one component (PLS and WA-PLS) can improve the predictions for lakes with diatom taxa that have broad tolerances. Furthermore, we show that WA-PLS performs best compared with the other techniques for those lakes that have a high relative abundance of the most dominant taxa and a corresponding low sample heterogeneity.
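The two steps of weighted-averaging regression and calibration named in this abstract can be sketched as follows. This is the basic form only: the deshrinking step and tolerance downweighting used in practice are omitted, the function names are hypothetical, and the three-lake, two-taxon data set is made up for illustration.

```python
def wa_optima(abund, env):
    """WA regression: each taxon's optimum is the abundance-weighted mean
    of the environmental variable over the training lakes.

    abund: list of lakes, each a list of taxon abundances.
    env:   measured environmental value (e.g. pH) for each lake.
    """
    n_taxa = len(abund[0])
    optima = []
    for k in range(n_taxa):
        total = sum(row[k] for row in abund)   # taxon must occur somewhere
        optima.append(sum(row[k] * e for row, e in zip(abund, env)) / total)
    return optima


def wa_infer(sample, optima):
    """WA calibration: inferred value is the abundance-weighted mean of the
    optima of the taxa present in the sample."""
    total = sum(sample)
    return sum(a * u for a, u in zip(sample, optima)) / total


# Made-up training set: 3 lakes, 2 taxa, lake-water pH.
abund = [[10, 0], [5, 5], [0, 10]]
ph = [5.0, 6.0, 7.0]
optima = wa_optima(abund, ph)
inferred = wa_infer([5, 5], optima)
```

Because averages are taken twice, inferred values are pulled toward the mean of the training gradient, which is why a deshrinking regression is normally applied afterwards.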

18.
About 145 freshwater to hypersaline lakes of the eastern Tibetan Plateau were investigated to develop a transfer function for quantitative palaeoenvironmental reconstructions using ostracods. A total of 100 lakes provided sufficient numbers of ostracod shells. Multivariate statistical techniques were used to analyse the influence of a number of environmental variables on the distributions of surface sediment ostracod assemblages. Of 23 variables determined for each site, 19 were included in the statistical analysis. Lake water electrical conductivity (8.2%), Ca% (7.6%) and Fe% (4.8%, ion concentrations as % of the cations) explained the greatest amounts of variation in the distribution of ostracod taxa among the 100 lakes. Electrical conductivity optima and tolerances were calculated for abundant taxa. A transfer function, based on weighted averaging partial least squares regression (WA-PLS), was developed for electrical conductivity (r² = 0.71, root-mean-square-error of prediction [RMSEP] = 0.35 [12.4% of gradient length], maximum bias = 0.64 [22.4% of gradient length], as assessed by leave-one-out cross-validation) based on 96 lakes. Our results show that ostracods provide reliable estimates of electrical conductivity and can be used for quantitative palaeoenvironmental reconstructions similarly to more commonly used diatom, chironomid or pollen data.

19.
This study investigated the distribution of subfossil diatom assemblages in surficial sediments of 100 lakes along steep ecological and climatic gradients in northernmost Sweden (Abisko region, 67.07° N to 68.48° N latitude, 17.67° E to 23.52° E longitude) to develop and cross-validate transfer functions for paleoenvironmental reconstruction. Of 19 environmental variables determined for each site, 15 were included in the statistical analysis. Lake-water pH (8.0%), sedimentary loss-on-ignition (LOI, 5.9%) and estimated mean July air temperature (July T, 4.8%) explained the greatest amounts of variation in the distribution of diatom taxa among the 100 lakes. Temperature and pH optima and tolerances were calculated for abundant taxa. Transfer functions, based on WA-PLS (weighted averaging partial least squares), were developed for pH (r² = 0.77, root-mean-square-error of prediction (RMSEP) = 0.19 pH units, maximum bias = 0.31, as assessed by leave-one-out cross-validation) based on 99 lakes and for July T (r² = 0.75, RMSEP = 0.96 °C, max. bias = 1.37 °C) based on the full 100 lake set. We subsequently assessed the ability of the diatom transfer functions to estimate lake-water pH and July T using a form of independent cross-validation. To do this, the 100-lake set was divided into two subsets. An 85-lake training-set (based on single limnological measurements) was used to develop transfer functions with performance similar to those based on the full 100 lakes, and a 15-lake test-set (with 2 years of monthly limnological measurements throughout the ice-free seasons) was used to test the transfer functions developed from the 85-lake training-set. Results from the intra-set cross-validation exercise demonstrated that lake-specific prediction errors (RMSEP) for the 15-lake test-set corresponded closely with the median measured values (pH) and the estimations based on spatial interpolations of data from weather stations (July T).
The prediction errors associated with diatom inferences were usually within the range of seasonal and interannual variability. Overall, our results confirm that diatoms can provide reliable and robust estimates of lake-water pH and July T, that WA-PLS is a robust calibration method and that long-term environmental data are needed for further improvement of paleolimnological transfer functions.
