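Research on a Village Development Type Identification Method Based on the SMOTE-RF Algorithm
Cite this article:PAN Yupiao, ZHAO Xiang, WANG Jing, ZHANG Yiqing, LIU Yaolin. Research on a Village Development Type Identification Method Based on the SMOTE-RF Algorithm[J]. Journal of Geo-information Science, 2023, 25(1): 163-176. DOI: 10.12082/dqxxkx.2023.220468
Authors:PAN Yupiao  ZHAO Xiang  WANG Jing  ZHANG Yiqing  LIU Yaolin
Affiliation:1. School of Resources and Environmental Sciences, Wuhan University, Wuhan 430079; 2. College of Water Sciences, Beijing Normal University, Beijing 100875
Funding:National Natural Science Foundation of China (41971336); National Key Research and Development Program of China (2018YFD1100801)
Abstract:Accurately grasping the patterns of regional development and understanding village development types quantitatively and objectively are of great practical significance for advancing rural revitalization through locally adapted, category-by-category measures. To address the problem of automatically and accurately identifying village development types across a region, this study proposes a village development type identification model based on the SMOTE-RF algorithm. The study first proposes an indicator system that expresses the multi-dimensional characteristics of village development in terms of topography, location, socioeconomics, agricultural production, and ecological environment. On this basis, given the imbalanced distribution of village samples, the SMOTE oversampling technique is used to analyze and simulate the minority-class samples and synthesize a balanced village classification sample set. The random forest algorithm is then used to automatically construct the nonlinear relationship between the multi-dimensional attribute features of village development and village types, forming an intelligent classifier that can be used for automatic identification of village development types in a region. To verify the effectiveness of the model, Zhaoyuan City in Shandong Province was selected as the test area for an empirical study. The experimental results show that the random forest classification model coupled with SMOTE oversampling effectively ensures the reliability and accuracy of the village classification results. In the test area, the agreement between the model's automatic identification results and the planning experts' classification results reached 88.27%, with a Kappa coefficient of 0.78, indicating good overall consistency. Compared with manual classification, the SMOTE-RF-based automatic village type identification method reduces the uncertainty introduced by reliance on manual, experience-based classification, ensures the consistency of the classification results, and can provide territorial spatial planning and special rural revitalization planning decisions with reliable ...

Keywords:village classification  random forest  SMOTE method  multi-source data  oversampling  rural revitalization  territorial spatial planning  Zhaoyuan City
Received:2022-07-02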

Identifying the Class of the Villages based on SMOTE-RF Algorithm
PAN Yupiao,ZHAO Xiang,WANG Jing,ZHANG Yiqing,LIU Yaolin. Identifying the Class of the Villages based on SMOTE-RF Algorithm[J]. Geo-information Science, 2023, 25(1): 163-176. DOI: 10.12082/dqxxkx.2023.220468
Authors:PAN Yupiao  ZHAO Xiang  WANG Jing  ZHANG Yiqing  LIU Yaolin
Affiliation:1. School of Resources and Environmental Sciences, Wuhan University, Wuhan 430079, China; 2. College of Water Sciences, Beijing Normal University, Beijing 100875, China
Abstract:To achieve sustainable development and revitalization of rural areas, it is important to identify the development pattern of villages according to their natural, social, and economic conditions. To identify these patterns accurately, this study develops a village classification method based on the SMOTE-RF algorithm. First, we designed a multi-dimensional index system covering topography, location, socioeconomics, agricultural production, construction land, ecosystem services, and the characteristics of rural settlements to quantify and assess the development characteristics of villages. Second, village classifications made by planning experts were collected as a sample dataset for model training and validation. To address the overfitting of classification algorithms caused by imbalanced sample sets, the SMOTE oversampling algorithm, which is based on a K-nearest neighbor strategy, was applied to the expert-labeled samples to produce a balanced synthetic sample set. Third, the balanced sample set was used to train a Random Forest (RF) classifier, which captures the nonlinear relationship between the multi-dimensional development characteristics of villages and their development patterns. Finally, Zhaoyuan city in Shandong Province, China, was selected as the study area to evaluate the performance of the model. The experimental results show that the classification model built on the SMOTE-RF algorithm can automatically extract multi-dimensional, nonlinear expert knowledge for village classification from a small number of samples. Compared with unsupervised classification methods such as the SOFM algorithm, the classification results produced by our model can better support spatial planning decision-making, because the SMOTE-RF algorithm can intuitively present the classification rules in a tree structure. In addition, with the oversampling algorithm applied, the overall accuracy, the accuracy, and the AUC value of the classification model increased from 0.93 to 0.99, 0.73 to 0.88, and 0.895 to 0.982, respectively, compared with the model without oversampling. The village classification results in Zhaoyuan also demonstrated that the results obtained by the SMOTE-RF algorithm were consistent overall with those of planning experts. Specifically, the consistency between the results classified by our model and those of the planning experts reached 88.27%, and the Kappa coefficient was about 0.78. The village classification model developed in this study can significantly reduce the uncertainty of the classification results, thus providing a reliable decision-making basis for territorial planning and rural revitalization.
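The two agreement figures reported in the abstract, percent consistency and the Kappa coefficient, can be illustrated with a minimal Python sketch. It assumes the standard definition of Cohen's kappa; the function name `agreement_and_kappa` and the placeholder class labels "A", "B", "C" are hypothetical and do not come from the paper, whose actual village classes and evaluation code are not given here.

```python
from collections import Counter


def agreement_and_kappa(expert, model):
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    if len(expert) != len(model) or not expert:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(expert)
    # Observed agreement: share of villages given the same class by both sources.
    p_o = sum(e == m for e, m in zip(expert, model)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expert_counts = Counter(expert)
    model_counts = Counter(model)
    p_e = sum(expert_counts[c] * model_counts[c] for c in expert_counts) / (n * n)
    if p_e == 1.0:
        # Both sources put every village in one and the same class: kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for six villages (placeholders, not data from the study).
expert_labels = ["A", "A", "B", "B", "C", "A"]
model_labels = ["A", "A", "B", "C", "C", "A"]
p_o, kappa = agreement_and_kappa(expert_labels, model_labels)
print(f"agreement = {p_o:.3f}, kappa = {kappa:.3f}")
```

Kappa discounts the agreement expected by chance from the class frequencies alone, which is why it is lower than the raw percent agreement whenever some classes dominate the sample.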
Keywords:village classification  random forest  SMOTE method  multi-source data  oversampling  rural revitalization  territorial spatial planning  Zhaoyuan city  
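The oversampling step named in the abstract and keywords can be sketched as follows. This is a minimal illustration of the textbook SMOTE idea (interpolating between a minority-class sample and one of its K nearest minority-class neighbors), not the authors' implementation; the function name `smote_sketch`, the parameter defaults, and the two-feature sample values are assumptions made for the example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority-class
    samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class villages described by two made-up indicator values.
minority_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote_sketch(minority_samples, n_new=6, k=2)
print(len(new_samples), "synthetic samples generated")
```

Because each synthetic sample lies on the segment between two real minority samples, the minority class is enlarged without simply duplicating existing rows; the balanced set would then be passed to a random forest classifier for training.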