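Prediction model of first-frost date in Liaoning province using machine learning methods
Cite this article: Tao WANG, Yi-shu WANG, Chun-yu ZHAO, Xiao-tao WANG, Mei-ou QIN, Yu-min SHEN, Yi-ling HOU, Jian-yun ZHAO. Prediction model of first-frost date in Liaoning province using machine learning methods[J]. Journal of Meteorology and Environment, 2022, 38(4): 47-56.
Authors: Tao WANG, Yi-shu WANG, Chun-yu ZHAO, Xiao-tao WANG, Mei-ou QIN, Yu-min SHEN, Yi-ling HOU, Jian-yun ZHAO
Institution: 1. Shenyang Regional Climate Centre, Shenyang 110166, Liaoning, China; 2. Institute of Atmospheric Environment, China Meteorological Administration, Shenyang 110166, Liaoning, China
Funding: Climate Change Special Program of the China Meteorological Administration (CCSF202013); Innovation and Development Program of the China Meteorological Administration (CXFZ2021J047); Natural Science Foundation Guidance Plan of the Liaoning Provincial Department of Science and Technology (2019-ZD-0860); Open Fund of the Key Open Laboratory of Northeast Cold Vortex Research (2022SYIAZKFMS09); Agricultural Research and Industrialization Guidance Plan (2019JH8/10200023); Science and Technology Project of the Liaoning Meteorological Bureau (BA202005)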
Received: 2021-04-25

Abstract: Using antecedent monthly ERA5 reanalysis data, three machine learning algorithms (Lasso Regression, Random Forest, and Neural Network) were applied to predict the first-frost date in Liaoning province, and the resulting predictions were evaluated. The Lasso Regression algorithm was first used to select the set of meteorological variables (the feature set) most indicative of the first-frost date. Prediction models were then built with cross-validation and hyperparameter tuning. Prediction skill was assessed quantitatively with the root mean square error (RMSE) and qualitatively with the anomaly same-sign rate, i.e., the percentage of cases in which the predicted and observed anomalies have the same sign. The results show that building the models on the selected feature set improves their generalization ability, interpretability, and stability. The Lasso Regression model performs best for forecasts initialized in April (RMSE of 6-8 d), the Neural Network model for forecasts initialized in May (RMSE of 6-9 d), and the Random Forest model for forecasts initialized in March (RMSE of 8-9 d). At most stations in Liaoning province the anomaly same-sign rate is 50%-70%. It peaks at about 68% for May-initialized forecasts with the Lasso Regression and Neural Network models, and at about 62% for March-initialized forecasts with the Random Forest model. Feature selection and sensitivity experiments identify the low vegetation cover fraction as the key predictor of the first-frost date. Higher vegetation cover helps the land surface retain moisture, so frost forms more readily when temperatures drop and the first-frost date tends to come earlier. Removing the low vegetation cover fraction from the predictors markedly degrades model performance, which further confirms that it is a key antecedent predictor.
Overall, the machine learning algorithms show high skill in both the quantitative and the qualitative prediction of the first-frost date.
Keywords: ERA5; Machine learning; Lasso Regression; Random Forest; Neural Network
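This page does not give the authors' code, solver, or regularization strength. The pure-Python sketch below is a rough illustration of the two computational steps the abstract names. The first is Lasso screening of candidate predictors, implemented here with cyclic coordinate descent on standardized predictors (an assumed solver). The second is the pair of skill scores: RMSE and the anomaly same-sign rate. All function names are hypothetical. The climatology used to form anomalies is supplied by the caller (for example, an observed multi-year mean first-frost date). A zero anomaly agrees only with another zero anomaly. These conventions are assumptions, not the paper's definitions.

```python
import math


def standardize(values):
    """Z-scores with population SD; a constant column maps to all zeros."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the L1 proximal operator)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    `columns` holds one list per candidate predictor (e.g. an antecedent
    monthly ERA5 variable at a station); `y` holds first-frost dates, e.g. as
    day of year. Coefficients are returned on the standardized scale.
    """
    n = len(y)
    X = [standardize(col) for col in columns]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # centered y minus X @ b, with b = 0
    b = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Correlation of predictor j with the partial residual (its own
            # contribution added back). Standardized columns satisfy
            # (1/n)*sum(x**2) == 1 up to rounding, so no division is needed.
            rho = sum(x * (r + x * b[j]) for x, r in zip(xj, residual)) / n
            new_bj = soft_threshold(rho, alpha)
            delta = new_bj - b[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(xj, residual)]
                b[j] = new_bj
    return b


def select_features(columns, y, alpha):
    """Indices of predictors whose Lasso coefficient is non-zero."""
    coefs = lasso_coordinate_descent(columns, y, alpha)
    return [j for j, c in enumerate(coefs) if c != 0.0]


def rmse(predicted, observed):
    """Root mean square error, in the units of the inputs (days here)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def same_sign_rate(predicted, observed, climatology):
    """Percentage of cases whose predicted and observed anomalies share a sign.

    Anomalies are taken relative to `climatology` (assumed: a reference
    first-frost date such as the observed multi-year mean).
    """
    def sign(x):
        return (x > 0) - (x < 0)

    hits = sum(sign(p - climatology) == sign(o - climatology)
               for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)
```

A predictor is kept only if its coefficient survives the L1 shrinkage. A larger `alpha` keeps fewer predictors, so in practice `alpha` is the hyperparameter that cross-validation would tune.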