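Machine-learning-based prediction of rare earth elements from major elements in ocean island basalts
Cite this article: HONG Jin, GAN Chengshi, LIU Jie. Prediction of REEs in OIB by major elements based on machine learning[J]. Earth Science Frontiers (地学前缘), 2019, 26(4): 45-54.
Authors: HONG Jin (洪瑾)  GAN Chengshi (甘成势)  LIU Jie (刘洁)
Affiliations: School of Earth Sciences and Engineering, Sun Yat-sen University, Guangzhou 510275, Guangdong; School of Earth Sciences and Engineering, Sun Yat-sen University, Guangzhou 510275, Guangdong; Guangdong Provincial Key Laboratory of Mineral Resources & Geological Processes, Guangzhou 510275, Guangdong
Funding: National Key Research and Development Program of China (2016YFC0600506); National Natural Science Foundation of China (41574087)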
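Abstract: Shared geoscience databases (such as GEOROC and PetDB) provide important basic data for Earth science research. However, these databases share an obvious shortcoming: accurate data exist for the nine major elements of each sample (SiO2, TiO2, Al2O3, CaO, MgO, MnO, K2O, Na2O and P2O5), but rare earth element (REE) data are largely missing. Given the important role of REEs in geochemistry, we attempt to provide a scheme for filling in the missing REE values in these databases, namely using the random forest method from machine learning to predict REE values from the nine major elements. Taking ocean island basalt (OIB) as an example, 1283 sets of OIB data collected from the GEOROC database were split in an 8:2 ratio into two groups: 80% of the data served as the training set for building the model, and 20% served as the test set for validating it. We compared random forest and multiple linear regression for modeling and predicting the same data and found that random forest outperforms multiple linear regression in both regression modeling and prediction, and that this advantage becomes more pronounced as the relationship between the input and output parameters becomes more complex. Random forest predicts the test set well overall, but prediction quality gradually weakens as the atomic number of the REE increases. On the one hand, this may be because REEs with higher atomic numbers are more weakly related to the major elements; on the other hand, it may be because their relationship with the major elements is more complex. In addition, the REE distribution patterns predicted by random forest agree well with the actual patterns, and the predicted patterns discriminate well between samples, reflecting the relative differences among the actual patterns, which is particularly important for inferring geochemical processes. As the amount of training data grows, the model built by random forest becomes more stable and its predictions more accurate. Therefore, as the databases continue to improve, predicting the REE values in them will become more reliable and more feasible.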

Keywords: machine learning  random forest  ocean island basalt  major elements  rare earth elements
Received: 2018-04-12

Prediction of REEs in OIB by major elements based on machine learning
HONG Jin,GAN Chengshi,LIU Jie.Prediction of REEs in OIB by major elements based on machine learning[J].Earth Science Frontiers,2019,26(4):45-54.
Authors:HONG Jin  GAN Chengshi  LIU Jie
Institution:(School of Earth Sciences and Engineering,Sun Yat-sen University,Guangzhou 510275,China;Guangdong Provincial Key Laboratory of Mineral Resources & Geological Processes,Guangzhou 510275,China)
Abstract:Shared geoscience databases (GEOROC, PetDB, etc.) provide important basic data for geoscience research. However, these databases have an obvious defect: accurate data are available for the nine major elements of each sample (SiO2, TiO2, Al2O3, CaO, MgO, MnO, K2O, Na2O and P2O5), but rare earth element (REE) data are often missing. In view of the important role of REEs in geochemistry, we attempt to provide a solution for supplementing the missing REE data by using the random forest method of machine learning to predict REE values from the major elements. Taking ocean island basalt (OIB) as an example, 1283 OIB samples collected from the GEOROC database were divided into two groups: 80% of the data were used as training data for modeling and the remaining 20% as test data for model validation. Comparing the modeling and prediction results of the random forest and multiple linear regression methods on the same data, we found that the random forest method was superior in both respects, and that its advantage became more pronounced as the relationship between the input and output parameters became more complex. The random forest method predicted the test data well overall, but its predictive power decreased gradually with increasing REE atomic number, possibly because the relationship between heavier REEs and the major elements is weaker or more complex. The REE distribution patterns predicted by the random forest method matched the actual REE distribution patterns well and had good distinguishing power, reflecting the relative differences among the actual patterns, which is particularly important for inferring geochemical processes. With more training data, the model established by the random forest method will become more stable and thus provide more accurate predictions. Ultimately, REE value prediction will become more reliable and feasible as the databases continue to improve.
Keywords:machine learning  random forest  oceanic island basalt  major elements  rare earth elements  
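
The 8:2 split and hold-out validation described in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code: the helper names `train_test_split` and `r2_score`, the fixed random seed, and the choice of the coefficient of determination (R²) as the score are all assumptions, since the abstract does not say which metric or software was used. Integer IDs stand in for the 1283 OIB samples.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumed metric for illustration; the abstract does not name one.
    """
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical stand-in for the 1283 OIB samples: integer sample IDs.
samples = list(range(1283))
train, test = train_test_split(samples)
print(len(train), len(test))  # 1026 257
```

In such a workflow, a model (random forest or multiple linear regression) would be fitted on `train` only, and a score such as `r2_score` would be computed separately for each REE on `test`, which makes it possible to see how prediction quality changes with atomic number.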