
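Hybrid feature selection for cropland identification using GF-5 satellite image
Citation: CHEN Zhulin, JIA Kun, LI Qiangzi, XIAO Chenchao, WEI Dandan, ZHAO Xiang, WEI Xiangqin, YAO Yunjun, LI Juan. Hybrid feature selection for cropland identification using GF-5 satellite image[J]. Journal of Remote Sensing, 2022, 26(7): 1383-1394.
Authors: CHEN Zhulin  JIA Kun  LI Qiangzi  XIAO Chenchao  WEI Dandan  ZHAO Xiang  WEI Xiangqin  YAO Yunjun  LI Juan
Affiliations: 1. State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875; 2. Beijing Engineering Research Center for Global Land Remote Sensing Products, Beijing Normal University, Beijing 100875; 3. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101; 4. Land Satellite Remote Sensing Application Center, Ministry of Natural Resources, Beijing 100048
Funding: National Key Research and Development Program of China (Nos. 2019YFE0127300, 2016YFB0501404); National Natural Science Foundation of China (No. 42171318)
Abstract: Accurate cropland identification is the basis of crop yield estimation and food security assessment. Remote sensing data are an important source for cropland identification and can provide dynamic, rapid monitoring results. Hyperspectral data have great potential for cropland classification, but their redundant bands reduce both classification efficiency and classification accuracy. This study therefore proposes a hybrid feature selection algorithm for cropland classification with hyperspectral data. First, the dimensionality is reduced stepwise, with a fixed step, according to the importance ranking or the degree of constraint of the variables. Second, the turning point at which classification accuracy drops sharply is located, and the variables at that point are taken as the feature subset. Finally, Sequential Backward Selection (SBS) is used to search for the optimal classification feature subset. Using GF-5 hyperspectral data, the study evaluated combinations of three dimensionality reduction methods (Random Forest (RF), Mutual Information (MI), and L1 regularization) and three classification algorithms (RF, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)) for cropland classification. The results show that the feature subset obtained with L1 regularization has low autocorrelation, and the red-edge and near-infrared bands it contains effectively improve the separability of cropland, forest, and bare soil. Among the classification models, SVM showed very good noise resistance in high-dimensional space and achieved higher accuracy than RF and KNN, whereas RF generalized better than SVM and KNN in low-dimensional space. Compared with the feature subsets obtained in the first dimensionality reduction step, the optimal feature subsets found by SBS improved classification accuracy in every case. The L1-SVM-SBS model with 23 input dimensions achieved the highest overall classification accuracy (94.64%) and cropland recall (95.83%). This study offers a new approach to feature selection for hyperspectral data: it selects more representative feature bands, improves cropland classification accuracy, and provides a reference for hyperspectral remote sensing classification research.

Keywords: cropland identification  GF-5  feature selection  hyperspectral remote sensing  L1 regularization  sequential backward selection
Received: 2020/10/26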
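The abstract does not say how the L1-regularized model is configured. As a minimal sketch, assuming a lasso (L1-penalized least squares) fitted by coordinate descent to a numeric cropland label (for example 1 = cropland, 0 = other) on standardized bands, the retained bands can be ranked by the magnitude of their non-zero coefficients. The helper names (`standardize`, `lasso_cd`, `rank_bands`) and the penalty `lam` are hypothetical; the authors' actual L1 model and settings may differ.

```python
from typing import List, Sequence


def standardize(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every band (column) to zero mean and unit population variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0  # constant band: avoid /0
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward 0 by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Minimize (1/2n)||y - mean(y) - Xw||^2 + lam * ||w||_1 by coordinate descent.

    X must be standardized, so each coordinate's curvature (1/n) * sum(x_ij^2) is 1.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual for w = 0 (intercept = mean of y)
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of band j with the residual that excludes band j's own term.
            rho = sum(X[i][j] * (r[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync with w
                w[j] = new_wj
    return w


def rank_bands(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[int]:
    """Indices of bands with non-zero lasso weight, largest |weight| first."""
    w = lasso_cd(standardize(X), y, lam)
    kept = [j for j, wj in enumerate(w) if wj != 0.0]
    return sorted(kept, key=lambda j: -abs(w[j]))


# Toy example (not data from the paper): the label depends only on band 0.
X = [[1, 2, 5], [2, 1, 3], [3, 2, 6], [4, 1, 2], [5, 2, 7], [6, 1, 1]]
y = [2 * row[0] for row in X]
print(rank_bands(X, y, lam=0.1))  # -> [0]
```

A larger `lam` typically drives more band weights to exactly zero, so sweeping `lam` is one way to order bands by how strongly the L1 constraint retains them.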

Hybrid feature selection for cropland identification using GF-5 satellite image
CHEN Zhulin,JIA Kun,LI Qiangzi,XIAO Chenchao,WEI Dandan,ZHAO Xiang,WEI Xiangqin,YAO Yunjun,LI Juan.Hybrid feature selection for cropland identification using GF-5 satellite image[J].Journal of Remote Sensing,2022,26(7):1383-1394.
Authors:CHEN Zhulin  JIA Kun  LI Qiangzi  XIAO Chenchao  WEI Dandan  ZHAO Xiang  WEI Xiangqin  YAO Yunjun  LI Juan
Institution:1.State Key Laboratory of Remote Sensing, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China;2.Beijing Engineering Research Center for Global Land Remote Sensing Products, Beijing Normal University, Beijing 100875, China;3.Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China;4.Land Satellite Remote Sensing Application Center, Ministry of Natural Resources of the People's Republic of China, Beijing 100048, China
Abstract:Accurate farmland area identification is the basis of crop yield estimation and an important indicator in food security assessment. As an important data source for farmland identification, remote sensing data can provide dynamic and fast observation results for classification. GF-5, the only hyperspectral satellite in the China High-resolution Earth Observation System, has great research and application potential in farmland identification. However, the curse of dimensionality caused by redundant bands in hyperspectral data seriously affects the computation speed and classification accuracy of models. To address this problem, this research proposes a hybrid feature selection algorithm for farmland identification. First, on the basis of the feature importance provided by a feature selection algorithm, the feature dimension is gradually reduced from 295 to 5 in steps of 10, and the overall accuracy of the classification at each feature dimension is recorded. Second, the turning point (the smallest dimension whose overall accuracy has hardly decreased, below which the overall accuracy drops sharply) is determined from the overall accuracy curve, and the corresponding variables are adopted as the feature subset. Lastly, the Sequential Backward Selection (SBS) method is used to search for the best subset. Three feature selection algorithms (i.e., Random Forest (RF), Mutual Information (MI), and L1 regularization (L1)) and three classification algorithms (RF, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)) are examined. The results indicate that the autocorrelation within the three subsets differs markedly. Most of the bands selected by the MI method are contiguous and concentrated in the blue and shortwave infrared ranges. As a result, this subset has extremely high autocorrelation, which lowers classification accuracy.
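The first two steps (reducing the dimension from 295 to 5 in steps of 10 while recording overall accuracy, then locating the turning point) can be sketched as below. The `evaluate` callback, the `drop_tol` threshold of 0.02, and the rule "first drop larger than `drop_tol` between consecutive dimensions" are assumptions made for illustration; the abstract does not give the exact criterion the authors used.

```python
from typing import Callable, Dict, List, Sequence


def dimension_schedule(start: int = 295, stop: int = 5, step: int = 10) -> List[int]:
    """Dimensions to test: start, start - step, ..., stop (295, 285, ..., 5 by default)."""
    return list(range(start, stop - 1, -step))


def accuracy_sweep(ranked_bands: Sequence[int],
                   evaluate: Callable[[List[int]], float],
                   dims: Sequence[int]) -> Dict[int, float]:
    """Record overall accuracy (OA) when only the top-d ranked bands are used.

    `evaluate` is a hypothetical callback that trains a classifier (RF, SVM or
    KNN in the paper) on the given bands and returns its validation OA.
    """
    return {d: evaluate(list(ranked_bands[:d])) for d in dims}


def turning_point(oa: Dict[int, float], drop_tol: float = 0.02) -> int:
    """Smallest dimension reached before the first sharp OA drop.

    Dimensions are scanned from largest to smallest; a drop larger than
    drop_tol (an assumed threshold) between consecutive dimensions marks the
    turning point. If no such drop occurs, the smallest dimension is returned.
    """
    dims = sorted(oa, reverse=True)
    for larger, smaller in zip(dims, dims[1:]):
        if oa[larger] - oa[smaller] > drop_tol:
            return larger
    return dims[-1]


# Illustrative OA values (not results from the paper).
oa = {45: 0.93, 35: 0.93, 25: 0.925, 15: 0.85, 5: 0.60}
print(turning_point(oa))  # -> 25
```

Under these assumptions, the subset handed to the SBS step would be `ranked_bands[:turning_point(oa)]`.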
By contrast, the correlation between bands in the RF and L1 feature subsets is relatively weak. However, these two subsets still yield different classification accuracies. Judging from the band distribution, the L1 feature subset contains many red-edge and near-infrared bands, which distinguish farmland, forest, and soil better than the blue and red bands selected by the RF algorithm. The classification algorithms also differ in capability. In high-dimensional space, the SVM algorithm is highly robust to noise and therefore achieves high accuracy; however, when the dimension decreases to a critical value, its accuracy drops sharply. By contrast, although RF is less robust than SVM in high-dimensional space, it generalizes very well in low-dimensional space. Compared with the subsets obtained after the first dimensionality reduction step, the optimal feature subsets found by SBS improve the classification accuracy of every model. The L1-SVM-SBS model with a 23-dimensional input achieves the highest overall classification accuracy (94.64%) and cropland recall (95.83%). This study provides a new method of farmland identification using hyperspectral data. By selecting a compact set of representative and informative bands, the method improves farmland classification accuracy and can also serve as a reference for other classification problems involving hyperspectral remote sensing.
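The third step, sequential backward selection, can be sketched as a greedy wrapper that starts from the turning-point subset and removes one band per pass. This is a minimal sketch under assumptions: `score` is a hypothetical callback returning validation overall accuracy for a band subset (for example from an SVM, as in L1-SVM-SBS), and keeping the best subset seen at any size is one common SBS variant, not necessarily the authors' stopping rule.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_backward_selection(
    features: Sequence[int],
    score: Callable[[List[int]], float],
    min_features: int = 1,
) -> Tuple[List[int], float]:
    """Greedy SBS over unique feature indices.

    Starting from the full candidate set, each pass removes the single feature
    whose removal gives the highest score. The best subset seen at any size is
    returned together with its score.
    """
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > min_features:
        # Score every subset that leaves out exactly one feature.
        candidates = [[f for f in current if f != drop] for drop in current]
        scores = [score(c) for c in candidates]
        k = max(range(len(candidates)), key=scores.__getitem__)
        current = candidates[k]
        if scores[k] > best_score:
            best_subset, best_score = list(current), scores[k]
    return best_subset, best_score


# Toy score (hypothetical): reward informative bands, penalize noisy ones.
informative = {1, 4, 7}


def toy_score(subset: List[int]) -> float:
    s = set(subset)
    return len(s & informative) - 0.1 * len(s - informative)


print(sequential_backward_selection(range(8), toy_score))  # -> ([1, 4, 7], 3.0)
```

Each pass refits one model per remaining band, so a search from k bands costs on the order of k²/2 fits; starting from the turning-point subset rather than all 295 bands keeps this wrapper step tractable.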
Keywords:cropland identification  GF-5  feature selection  hyperspectral remote sensing  L1 regularization  sequential backward selection