
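A Machine-Learning-Based Method for Estimating Soil Organic Matter from Sparse Samples
Cite as: LIU Mingjie, XU Zhuokui, GAO Yunbing, YANG Jing, PAN Yuchun, GAO Bingbo, ZHOU Yanbing, ZHOU Wanpeng, WANG Ling. A Machine-Learning-Based Method for Estimating Soil Organic Matter from Sparse Samples[J]. Geo-information Science, 2020, 22(9): 1799-1813.
Authors: LIU Mingjie  XU Zhuokui  GAO Yunbing  YANG Jing  PAN Yuchun  GAO Bingbo  ZHOU Yanbing  ZHOU Wanpeng  WANG Ling
Affiliations: 1. School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha 410114; 2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097; 3. Hunan Provincial Engineering Laboratory of Spatial Information Technology for Highway Geological Disaster Early Warning, Changsha University of Science and Technology, Changsha 410114; 4. Beijing Research Center for Information Technology in Agriculture, Beijing 100097; 5. China Agricultural University, Beijing 100083; 6. Henan Polytechnic University, Jiaozuo 454003; 7. Institute of Agricultural Resources and Environment, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang 050051
Funding: National Key Research and Development Program of China (2017YFD0801205); Science and Technology Innovation Capacity Building Project of the Beijing Academy of Agriculture and Forestry Sciences (KJCX20170407); Science and Technology Innovation Capacity Building Project of the Beijing Academy of Agriculture and Forestry Sciences (KJCX20200414); Scientific Research Project funded by the Education Department of Hunan Province (13B129); Open Fund of the Hunan Provincial Engineering Laboratory (KFJ180602)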
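Abstract: Two machine learning methods, GRNN (Generalized Regression Neural Network) and RF (Random Forest), were used to build soil organic matter prediction models in order to improve the accuracy of soil organic matter estimation under sparse sampling. Soil organic matter samples collected in 2007 from agricultural land in Daxing District, Beijing, were thinned according to the MMSD (Minimization of the Mean of the Shortest Distances) criterion into sample sets of 8 different sampling densities (2703, 1352, 676, 339, 169, 85, 43, and 22 samples). GRNN, RF, and ordinary kriging were each used to predict the unsampled points at every sampling density, and cross-validation was used to assess the prediction accuracy at those points. As the sampling density decreased, the spatial autocorrelation among sample points gradually weakened, the fit of the semivariogram deteriorated, the prediction error increased, and confidence in the predictions decreased. When the data were thinned to 43 and 22 sample points, the spatial autocorrelation among the points nearly vanished, and the semivariogram had a low coefficient of determination and large residuals. Ordinary kriging was clearly affected by the number and density of sample points and by their spatial structure, and its prediction accuracy declined as the number of sample points fell. At 85 sample points or fewer, its predicted values showed no significant correlation with the observed values. The prediction accuracy of GRNN and RF was only slightly affected by sampling density: it fluctuated within a narrow range, and the predicted values oscillated around the observed values within a limited band and correlated well with them. At 85 sample points or fewer, their prediction accuracy was substantially higher than that of ordinary kriging. Ordinary kriging is not suitable for spatial interpolation under sparse sampling, especially when spatial autocorrelation is weak. Machine learning models can fully learn the environmental information of the soil and the spatial proximity effects among sample points, taking account of both attribute similarity and spatial autocorrelation. They are more stable and adaptable, are less sensitive to the number, configuration, and density of sample points, and can make stable predictions even when the spatial autocorrelation among sample points is very weak.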
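As a rough illustration of the GRNN predictor and the cross-validation step named in the abstract, the sketch below writes a GRNN as a Gaussian-kernel-weighted average of the training targets and scores it by leave-one-out cross-validation. The feature layout (x, y, one environmental covariate), the toy values, the smoothing parameter `sigma`, and the choice of leave-one-out rather than another cross-validation scheme are all assumptions made for illustration; none of them come from the paper.

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """GRNN output: Gaussian-kernel-weighted mean of the training targets."""
    weights = []
    for x in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed to zero: fall back to the plain mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

def loo_rmse(xs, ys, sigma):
    """Leave-one-out RMSE: predict each point from all the other points."""
    squared_error = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = grnn_predict(rest_x, rest_y, xs[i], sigma)
        squared_error += (pred - ys[i]) ** 2
    return math.sqrt(squared_error / len(xs))

# Hypothetical toy data: (x, y, covariate) per sample and a made-up target value.
samples = [(0.0, 0.0, 0.2), (1.0, 0.0, 0.3), (0.0, 1.0, 0.8),
           (1.0, 1.0, 0.9), (0.5, 0.5, 0.5)]
som = [10.0, 11.0, 16.0, 17.0, 13.5]

estimate = grnn_predict(samples, som, (0.4, 0.6, 0.5), sigma=0.5)
error = loo_rmse(samples, som, sigma=0.5)
```

Because the prediction depends on distance in the full feature space rather than on a fitted semivariogram, a predictor of this kind can draw on covariate similarity as well as spatial proximity, which is the property the abstract attributes to the machine learning models.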

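Keywords: soil organic matter  spatial interpolation  machine learning  attribute similarity  spatial autocorrelation  Daxing District  sparse sample  sampling density
Received: 2019-08-13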

Estimating Soil Organic Matter based on Machine Learning Under Sparse Sample
LIU Mingjie,XU Zhuokui,GAO Yunbing,YANG Jing,PAN Yuchun,GAO Bingbo,ZHOU Yanbing,ZHOU Wanpeng,WANG Ling.Estimating Soil Organic Matter based on Machine Learning Under Sparse Sample[J].Geo-information Science,2020,22(9):1799-1813.
Authors:LIU Mingjie  XU Zhuokui  GAO Yunbing  YANG Jing  PAN Yuchun  GAO Bingbo  ZHOU Yanbing  ZHOU Wanpeng  WANG Ling
Abstract:To improve the accuracy of soil organic matter estimation from sparse samples, soil organic matter prediction models were built with two machine learning methods, GRNN (Generalized Regression Neural Network) and RF (Random Forest). Soil organic matter sampling data collected in 2007 from agricultural land in Daxing were thinned under the MMSD (Minimization of the Mean of the Shortest Distances) criterion into 8 sample sets of different sampling density (2703, 1352, 676, 339, 169, 85, 43, and 22 samples). GRNN, RF, and Ordinary Kriging were each applied at every sampling density, and cross-validation was used to verify the prediction accuracy at unsampled points. As the sampling density decreases, the spatial autocorrelation between sampling points gradually weakens, so the fit of the semivariogram deteriorates, the prediction error increases, and confidence in the predictions decreases. When the data are thinned to 43 and 22 samples, the spatial autocorrelation between sampling points nearly disappears, and the semivariogram has a low coefficient of determination and large residuals. Ordinary Kriging is clearly affected by the number of sampling points, the sampling density, and the spatial structure of the samples, and its prediction accuracy decreases as the number of sampling points decreases. At 85 sampling points or fewer, there is no significant correlation between its predicted and observed values. The prediction accuracy of GRNN and RF is almost independent of sampling density: the predicted values fluctuate within a limited range around the observed values and correlate well with them. At 85 sampling points or fewer, their prediction accuracy is considerably higher than that of Ordinary Kriging.
Ordinary Kriging is not suitable for spatial interpolation with sparse samples, especially when spatial autocorrelation is weak. The machine learning models can fully learn the environmental information and the spatial proximity information of the soil sampling points. They combine attribute similarity with spatial autocorrelation, are more stable and adaptable, are less affected by the number, configuration, and density of sampling points, and can make stable and accurate predictions even when the spatial autocorrelation between sampling points is very weak.
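The MMSD criterion used for thinning can be illustrated as follows: for a candidate subset, take the mean, over a set of evaluation points, of the distance to the nearest retained sample, and prefer subsets for which that mean is small. The abstract does not say how the criterion was optimised, so the greedy forward selection below is a simple stand-in rather than the authors' procedure. Using the full original sample set as the evaluation points and the 5 x 5 toy grid are likewise assumptions.

```python
import math

def mmsd(eval_points, selected):
    """Mean over eval_points of the distance to the nearest selected point."""
    total = 0.0
    for p in eval_points:
        total += min(math.hypot(p[0] - s[0], p[1] - s[1]) for s in selected)
    return total / len(eval_points)

def greedy_thin(points, n_keep):
    """Greedy stand-in optimiser: add, one at a time, the point that
    gives the lowest MMSD when joined to the points already kept."""
    chosen = []
    remaining = list(points)
    while len(chosen) < n_keep and remaining:
        best = min(remaining, key=lambda c: mmsd(points, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical 5 x 5 grid of sample locations, thinned to 4 and to 2 points.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
thin4 = greedy_thin(grid, 4)
thin2 = greedy_thin(grid, 2)
```

A greedy pass is not guaranteed to reach the global minimum of the criterion; it is shown only because it makes the definition of MMSD concrete in a few lines.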
Keywords:soil organic matter  spatial interpolation  machine learning  attribute similarity  spatial correlation  Daxing County  sparse sample  sampling density  