Similar literature
Found 20 similar articles; search took 31 ms
1.
Geo-tagged travel photos on social networks often contain location data such as points of interest (POIs), and also reveal users’ travel preferences. In this paper, we propose a hybrid ensemble learning method, BAyes-Knn, that predicts personalized tourist routes for travelers by mining their geographical preferences from these location-tagged data. Our method trains two types of base classifiers to jointly predict the next travel destination: (1) The K-nearest neighbor (KNN) classifier quantifies users’ location history, weather condition, temperature and seasonality and uses a feature-weighted distance model to predict a user’s personalized interest in an unvisited location. (2) A Bayes classifier introduces a smooth kernel function to estimate a priori probabilities of features and then combines these probabilities to predict a user’s latent interest in a location. The outcomes from these subclassifiers are merged into one final prediction by the Borda count voting method. We evaluated our method on geo-tagged Flickr photos and Beijing weather data collected from 1 January 2005 to 1 July 2016. The results demonstrated that our ensemble approach outperformed 12 other baseline models. In addition, the results showed that our framework has better prediction accuracy than context-aware significant travel-sequence-pattern recommendations and frequent travel-sequence patterns.
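
The Borda count step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each base classifier outputs a best-first ranking of the same candidate locations; the function name `borda_count` and the example POIs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several best-first rankings of the same candidates.

    With n candidates, the item in position i (0-based) of a ranking
    receives n - 1 - i points; points are summed over all rankings.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical outputs of the two base classifiers (illustrative only).
knn_ranking = ["Forbidden City", "Summer Palace", "Temple of Heaven", "Beihai Park"]
bayes_ranking = ["Summer Palace", "Temple of Heaven", "Forbidden City", "Beihai Park"]

merged = borda_count([knn_ranking, bayes_ranking])
print(merged)
```

In this toy case the location ranked first or second by both classifiers wins, even though only one classifier put it first, which is the behaviour that makes Borda count a reasonable way to merge rankings rather than single votes.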

2.
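Accurately characterizing the distribution of urban residential land prices is important for guiding urban spatial planning scientifically and for achieving smart urban growth. However, the complex nonlinear relationship between residential land prices and their potential influencing factors makes fine-scale simulation of the price distribution challenging. This paper explores a methodology for simulating the distribution of urban residential land prices based on geographic big data and ensemble learning, in order to meet the need for rapid and accurate monitoring of land price dynamics. Taking Wuhan as a typical study area, the study used residential land transaction sample points, points of interest (POI) and nighttime light imagery as data sources and a 500 m resolution grid as the valuation unit. POI kernel density and nighttime light intensity were extracted as predictor variables, and machine learning algorithms together with the bagging and stacking ensemble methods were used to build residential land price prediction models, whose accuracies were then compared. The results show that: ① among the individual machine learning algorithms, support vector regression (SVR) achieved the highest prediction accuracy, followed by the k-nearest neighbor algorithm (k-NN), Gaussian process regression (GPR) and the back propagation neural network (BP-NN); ② in improving the prediction accuracy of individual algorithms, stacking outperformed bagging, and the model that stacks SVR and k-NN achieved the highest accuracy, with a mean absolute percentage error of only 8.29% and a goodness of fit R² of 0.814; ③ the residential land price map generated by the proposed model effectively represents the concentric-zone distribution of prices and local singularities. The results provide new ideas and methodological references for urban residential land price appraisal.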

3.

With an increasing demand for raw materials, predictive models that support successful mineral exploration targeting are of great importance. We evaluated different machine learning techniques with an emphasis on boosting algorithms and implemented them in an ArcGIS toolbox. Performance was tested on an exploration dataset from the Iberian Pyrite Belt (IPB) with respect to accuracy, performance, stability, and robustness. Boosting algorithms are ensemble methods used in supervised learning for regression and classification. They combine weak classifiers, i.e., classifiers that perform slightly better than random guessing, to obtain robust classifiers. Each time a weak learner is added, the learning set is reweighted to give more importance to misclassified samples. Our test area, the IPB, is one of the oldest mining districts in the world and hosts giant volcanic-hosted massive sulfide (VMS) deposits. The spatial density of ore deposits, as well as the size and tonnage, makes the area unique, and due to the high data availability and number of known deposits, well-suited for testing machine learning algorithms. We combined several geophysical datasets, as well as layers derived from geological maps, as predictors of the presence or absence of VMS deposits. Boosting algorithms such as BrownBoost and AdaBoost were tested and compared to Logistic Regression (LR), Random Forests (RF) and Support Vector Machines (SVM) in several experiments. We found the performance results to be relatively similar; BrownBoost slightly outperformed LR and SVM, with an accuracy of 0.96 compared to 0.89 and 0.93, respectively. Data augmentation by perturbing deposit location led to a 7% improvement in results. Variations in the split ratio of training and test data led to a reduction in the accuracy of the prediction result, with relative stability occurring at a critical point at around 26 training samples out of 130 total samples. When lower numbers of training data were introduced, accuracy dropped significantly. In comparison with other machine learning methods, AdaBoost is user-friendly due to relatively short training and prediction times, the low likelihood of overfitting and the reduced number of hyperparameters for optimization. Boosting algorithms gave high predictive accuracies, making them a potential data-driven alternative for regional scale and/or brownfields mineral exploration.
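
The reweighting step described in this abstract can be sketched for the standard discrete AdaBoost update. This is a minimal illustration and not the toolbox implementation from the paper; the function name `adaboost_reweight` and the five-sample example are assumptions, and the input weights are assumed to sum to 1.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One round of the discrete AdaBoost weight update.

    weights       -- current sample weights, assumed to sum to 1
    misclassified -- booleans, True where the new weak learner was wrong
    Returns (alpha, new_weights), with new_weights normalized to sum to 1.
    """
    # Weighted error of the weak learner on the current distribution.
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner needs a weighted error in (0, 0.5)")
    # Vote weight of the learner: larger when its error is smaller.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples are scaled up, correct ones are scaled down.
    updated = [w * math.exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five equally weighted samples; the weak learner misses the third one.
weights = [0.2] * 5
misclassified = [False, False, True, False, False]
alpha, new_weights = adaboost_reweight(weights, misclassified)
print(alpha, new_weights)
```

After the update the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.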


4.
Zhang Qiming, Wang Enyuan, Feng Xiaojun, Wang Chao, Qiu Liming, Wang Hao 《Natural Resources Research》2021,30(2):1817-1834

With the increasing depth of underground engineering, the risk of coal–rock dynamic disasters such as rockburst is becoming more and more serious and complex, which seriously threatens the safety of coal resources, mine production and the surface ecological environment. However, the existing risk indices and methods used for evaluating rockburst risk cannot be fully applied to deep coal seam group (DCG) mining. For the safe exploitation of coal resources, in this paper, based on statistical analyses of 300 cases of rockburst, six new indices are proposed for evaluating rockburst risk in the DCG, namely dip angle, moisture content, stability of coal seam, advancing speed of working face, disturbance factors and support patterns. In addition, the influence of the coupling and superposition of multiple factors on rockburst risk was considered. Thus, the Comprehensive Index Method of rockburst risk of Deep Coal seam Group (DCG–CIM) based on the analytic hierarchy process was established. Finally, rockburst risk in the evaluation area was quantitatively assessed into four grades: “No rockburst risk”, “Weak rockburst risk”, “Medium rockburst risk” and “Strong rockburst risk”. Taking the 2233 working face of Hengda Coalmine as an example, the evaluation results show that the ranges of 0–184 m, 224–284 m, 324–384 m, 424–484 m, 524–584 m and 594–624 m from the terminal line of the haulage roadway on the 2233 working face were the medium rockburst risk zones, which are in accordance with the on-site impact damage results and are more accurate than the traditional method. The DCG–CIM can consider more inducing factors, obtains more accurate and reliable evaluation results, and is more suitable for deep coal seam group mining.


5.
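Based on machine learning algorithms (random forest and extreme gradient boosting) and the principles of remote sensing bathymetry inversion, this article uses Sentinel-2 multispectral satellite data and water depths measured by an unmanned surface vessel to build random forest (RF), extreme gradient boosting (XGBoost) and support vector machine (SVM) bathymetry inversion models for an inland water body, the Meizhou Reservoir, and compares the inversion results. The results show that: 1) RF had a training accuracy of 97% and a test accuracy of 0.80; XGBoost had a training accuracy of 97% and a test accuracy of 0.79; SVM had a training accuracy of 90% and a test accuracy of 0.78. This indicates that the RF and XGBoost models perform better than the SVM model in water depth prediction and are more sensitive to the depth values in each depth interval. 2) Model efficiency was assessed by run time: from reading the data to outputting the results, the RF model took 3.92 s, the XGBoost model 4.26 s and the SVM model 6.66 s. Therefore, in both inversion accuracy and efficiency, the RF model is better than the XGBoost model, which in turn is better than the SVM model. The prediction map of the RF model shows richer detail and sharper outlines; the XGBoost model comes second but still performs well overall; the SVM model performs worst. Machine learning bathymetry inversion models therefore yield clearly more accurate water depths, addressing the low accuracy of traditional bathymetry inversion models.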

6.

Globally, groundwater plays a major role in supplying drinking water for urban and rural populations and is used for irrigation to grow crops and in many industrial processes. A novel self-learning random forest (SLRF) model is developed and validated for groundwater yield zonation within the Yeondong Province in South Korea. This study was conducted with inventory data initially divided randomly into 70% for training and 30% for testing, and with 13 groundwater-conditioning factors. SLRF was optimized using the Bayesian optimization method. We also compared our method to other machine learning methods including support vector machine (SVM), artificial neural networks (ANN), decision trees (DT), and voting ensemble models. Model validation was accomplished using several methods, including a confusion matrix, receiver operating characteristics, cross-validation, and McNemar’s test. Our proposed self-learning method improves random forest (RF) generalization performance by about 23%, with SLRF success rates of 0.76 and prediction rates of 0.83. In addition, the optimized SLRF performed better [according to a threefold cross-validated AUC (area under curve) of 0.75] than that using randomly initialized parameters (0.57). SLRF outperformed all of the other models for the testing dataset (RF, SVM, ANN, DT, and Voted ANN-RF) when the overall accuracy, prediction rate, and cross-validated AUC metrics were considered. The SLRF also estimated the contribution of individual groundwater conditioning factors and showed that the three most influential factors were geology (1.00), profile curvature (0.97), and TWI (0.95). Overall, SLRF effectively modeled groundwater potential, even within data-scarce regions.


7.
X. Yao, L.G. Tham, F.C. Dai 《Geomorphology》2008,101(4):572-582
The Support Vector Machine (SVM) is an increasingly popular learning procedure based on statistical learning theory, and involves a training phase in which the model is trained by a training dataset of associated input and target output values. The trained model is then used to evaluate a separate set of testing data. There are two main ideas underlying the SVM for discriminant-type problems. The first is an optimum linear separating hyperplane that separates the data patterns. The second is the use of kernel functions to convert the original non-linear data patterns into a format that is linearly separable in a high-dimensional feature space. In this paper, an overview of the SVM, both one-class and two-class SVM methods, is first presented, followed by its use in landslide susceptibility mapping. A study area was selected from the natural terrain of Hong Kong, and slope angle, slope aspect, elevation, profile curvature of slope, lithology, vegetation cover and topographic wetness index (TWI) were used as environmental parameters which influence the occurrence of landslides. One-class and two-class SVM models were trained and then used to map landslide susceptibility. The resulting susceptibility maps were compared with the map obtained by the logistic regression (LR) method. It is concluded that two-class SVM possesses better prediction efficiency than logistic regression and one-class SVM. However, one-class SVM, which only requires failed cases, has an advantage over the other two methods as only “failed” case information is usually available in landslide susceptibility mapping.
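
The second idea in this abstract, that a mapping into a higher-dimensional space can make non-linear patterns linearly separable, can be shown with a toy sketch. It uses an explicit feature map on one-dimensional data rather than a real SVM solver or the paper's landslide data; the map `feature_map`, the data and the hand-picked hyperplane are all illustrative assumptions.

```python
def feature_map(x):
    """Explicit map phi(x) = (x, x^2) from 1-D input to a 2-D feature space."""
    return (x, x * x)


def separable_by_threshold(xs, labels):
    """True if a single threshold on x splits the two classes in 1-D."""
    pos = [x for x, y in zip(xs, labels) if y == 1]
    neg = [x for x, y in zip(xs, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)


# Class +1 lies on both sides of class -1, so no threshold on x works.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1, 1, -1, -1, -1, 1, 1]
separable_raw = separable_by_threshold(xs, labels)

# In the mapped space the hyperplane w . phi(x) + b = 0 with w = (0, 1)
# and b = -1 (i.e. x^2 = 1) separates the classes. It is chosen by hand
# here; an SVM would instead find a maximum-margin hyperplane.
w, b = (0.0, 1.0), -1.0
predictions = [1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1
               for p in map(feature_map, xs)]

print(separable_raw, predictions == labels)
```

A kernel function lets an SVM obtain the same effect without building the mapped vectors: for this map the corresponding kernel is k(x, z) = xz + x²z², the inner product of phi(x) and phi(z).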

8.
Resource estimation of a placer deposit is always a difficult and challenging job because of high variability in the deposit. The complexity of resource estimation increases when drill-hole data are sparse. Since sparsely sampled placer deposits produce high-nugget variograms, a traditional geostatistical technique like ordinary kriging sometimes fails to produce satisfactory results. In this article, a machine learning algorithm, the support vector machine (SVM), is applied to the estimation of a platinum placer deposit. A combination of different neighborhood samples is selected for the input space of the SVM model. The trade-off parameter of the SVM and the bandwidth of the kernel function are selected by genetic algorithm learning, and the algorithm is tested on a testing data set. Results show that if eight neighborhood samples and their distances and angles from the estimated point are considered as the input space for the SVM model, the developed model performs better than other configurations. The proposed input space-configured SVM model is compared with ordinary kriging and the traditional SVM model (location as input) for resource estimation. Comparative results reveal that the proposed input space-configured SVM model outperforms the other two models.

9.

Blast-induced flyrock is a hazardous and undesirable phenomenon that may occur in surface mines, especially when blasting takes place near residential areas. Therefore, accurate prediction of flyrock distance is of high significance in the determination of the statutory danger area. To this end, there is a practical need to propose an accurate model to predict flyrock. Aiming at this topic, this study presents two machine learning models, including extreme learning machine (ELM) and outlier robust ELM (ORELM), for predicting flyrock. To the best of our knowledge, this is the first work that investigates the use of ORELM model in the field of flyrock prediction. To construct and verify the proposed ELM and ORELM models, a database including 82 datasets has been collected from the three granite quarry sites in Malaysia. Additionally, artificial neural network (ANN) and multiple regression models were used for comparison. According to the results, both ELM and ORELM models performed satisfactorily, and their performances were far better compared to the performances of ANN and multiple regression models.


10.
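Ensemble projection of 21st-century climate change scenarios in China based on model selection   Total citations: 1, self-citations: 1, citations by others: 0
Projecting future climate change scenarios is the scientific basis for formulating strategies to respond and adapt to climate change. Using simulation data from 30 climate models participating in phase 5 of the Coupled Model Intercomparison Project (CMIP5), this paper evaluates each model's ability to simulate historical climate change, selects the optimal combination of models for simulating regional climate change, and then builds a partial least squares regression (PLS) ensemble projection model, which is used with the outputs of the optimal models to project regional temperature and precipitation change scenarios. Comparison with historical data shows that the PLS ensemble projection model built on the optimal models outperforms not only the traditional multi-model ensemble mean but also the PLS ensemble projection model built on all models, which demonstrates the importance of the model selection step. The projections of the PLS ensemble model based on the selected models indicate that: ① temperature in all regions will continue to rise during the 21st century, with warming rates generally higher in the winter half-year than in the summer half-year and generally higher in the north than in the south; under the RCP 4.5 emission scenario, temperature rises quickly at first and then more slowly, with the turning point in the mid-21st century, whereas under the RCP 8.5 emission scenario temperature increases continuously, and the warming by the end of the 21st century is about twice that under RCP 4.5. ② Precipitation in all regions shows a significant increasing trend during the 21st century, with larger increases under the high-emission scenario than under the low-emission scenario and in dry regions than in wet regions, but the increase is accompanied by pronounced interdecadal fluctuations. By comparison, the traditional equal-weight ensemble mean of all models (EMC) projects a faster warming rate in summer than in winter in China and an essentially linear increase in precipitation, which contradicts the basic characteristics of global warming and the established understanding that precipitation in China has distinct interdecadal variability. The temperature and precipitation changes projected in this paper are therefore more consistent with the basic patterns of climate change in China.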

11.
Prefetching is a process in which the necessary portion of data is predicted and loaded into memory beforehand. The increasing usage of geographic data in different types of applications has motivated the development of different prefetching techniques. Each prefetching technique serves a specific type of application, such as two-dimensional geographic information systems or three-dimensional visualization, and each one is crafted for the corresponding navigation patterns. However, as the boundary between these application types blurs, these techniques become insufficient for hybrid applications (such as digital moving maps), which embody various capabilities and navigation patterns. Therefore, a set of techniques should be used in combination to handle different prefetching requirements. In this study, a priority-based tile prefetching approach is proposed, which enables the ensemble usage of various techniques at the same time. The proposed approach manages these techniques dynamically through a fuzzy-logic-based inference engine to increase prefetching performance and to adapt to various exhibited behaviours. This engine performs adaptive decisions about the advantages of each technique according to their individual accuracy and activity level using fuzzy logic to determine how each prefetching technique performs. The results obtained from the experiments showed that up to a 25% increase in prefetching performance is achieved with the proposed ensemble usage over individual usage. A generic model for prefetching techniques was also developed and used to describe the given approach. Finally, a cross-platform software framework with four different prefetching techniques was developed to let other users utilize the proposed approach.

12.
金昭  吕建树 《地理研究》2022,41(6):1731-1747
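To identify the spatial variability of regional soil heavy metals and clarify its influencing factors, this study built nine machine learning models: multiple linear regression (MLR), elastic net regression (ENR), random forest (RF), stochastic gradient boosting (SGB), a stacking ensemble model, back propagation artificial neural network (BP-ANN), model-averaged neural network ensemble (avNNet), support vector machine with a linear kernel (SVM-L) and support vector machine with a Gaussian kernel (SVM-R). Using soil heavy metal (Cd, Cu, Hg, Pb and Zn) and environmental auxiliary variable data from central Shandong Province, the study compared the accuracy of these models in regional spatial prediction of soil heavy metals. The results show that, for the five heavy metals, RF yielded coefficients of determination (R²) between 0.263 and 0.448, mean absolute errors (MAE) and root mean square errors (RMSE) below 8.408 and 10.636, respectively, and predicted/observed ratios (P/O) all close to 1. Its predictions were satisfactory for all five heavy metals, making it the best model for spatial prediction of soil heavy metals in the study area. The overall predictive performance of SVM-R was second only to RF, and all its accuracy metrics were relatively robust, so it can serve as an alternative model; the remaining seven models performed clearly worse than RF and SVM-R. The RF spatial predictions show that the five heavy metals have similar spatial distribution patterns, with concentrations decreasing from the northeast to the southwest of the study area and three high-value zones in the northeast, north and south. The high-value zones coincide with the distribution of local areas of dense industry and traffic, indicating that human activity is the main factor influencing the spatial differentiation of soil heavy metals in the study area. This study provides a scientific reference for regional soil pollution survey, assessment and control.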
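
The accuracy metrics named in this abstract can be computed as in the following sketch. The definitions are assumptions, since the abstract does not give formulas: R² is taken as 1 − SS_res/SS_tot, and P/O as the ratio of the mean predicted value to the mean observed value. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, MAE, RMSE and the predicted/observed ratio (P/O)."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
        # Assumed definition: mean predicted value over mean observed value.
        "P/O": (sum(predicted) / n) / mean_obs,
    }


# Made-up concentrations for illustration (not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

A P/O near 1 only shows that predictions are unbiased on average, which is why the abstract reports it alongside error measures such as MAE and RMSE.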

13.
赵雨  白宇  员学锋 《地理科学》2022,42(8):1421-1432
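Poverty identification driven by traditional socio-economic indicators relies on detailed census and sample survey data. Collecting and processing census and survey data of varying quality and quantity to study regional poverty costs a great deal of labor, resources and time, which makes it difficult to monitor poverty status quickly and dynamically. Nighttime light data, which have high temporal resolution and are objective and easy to obtain, can to some extent compensate for the shortcomings of statistical data and reflect socio-economic phenomena on the ground in a timely manner. Machine learning algorithms can learn regularities and patterns from these data and mine latent information to identify poor areas. Based on NPP-VIIRS nighttime light data for Shaanxi Province, multi-dimensional statistical variables were constructed, and six supervised classification algorithms (logistic regression, support vector machine, K-nearest neighbors, random forest, decision tree and gradient boosting tree) were used to identify poor areas. The results show that multi-dimensional features extracted from nighttime light data are better suited to identifying poor areas; all six algorithms identified poor areas accurately, their classification results were spatially similar and showed a degree of regional character, and classification accuracy ranged from 76.82% to 83.20%. A further comparison of the algorithms based on their confusion matrices suggests that random forest performs best overall in terms of error bias and classification accuracy.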

14.
In this contribution, we used discriminant analysis (DA) and support vector machine (SVM) to model subsurface gold mineralization by using a combination of the surface soil geochemical anomalies and earlier bore data for further drilling at the Sari-Gunay gold deposit, NW Iran. Seventy percent of the data were used as the training data and the remaining 30% were used as the testing data. The sum of the block grades, obtained by kriging, above the cutoff grade (0.5 g/t) was multiplied by the thickness of the blocks and used as the productivity index (PI). Then, the PI variable was classified into three classes of background, medium, and high by using the fractal method. Four classification functions of the SVM and DA methods were calculated from the training soil geochemical data. Also, by using all the geochemical data and classification functions, the general extension of the gold mineralized zones was predicted. The mineral prediction models at the Sari-Gunay hill were used to locate high and moderate potential areas for further infill systematic and reconnaissance drilling, respectively. These models at Agh-Dagh hill and the area between Sari-Gunay and Agh-Dagh hills were used to define the moderate and high potential areas for further reconnaissance drilling. The results showed that the nu-SVM method with 73.8% accuracy and c-SVM with 72.3% accuracy worked better than DA methods.

15.
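The Gurbantunggut Desert is the second largest desert in China and the main area of fixed and semi-fixed dunes in the country, with many sand-fixing shrub species. Crown width is an important parameter for visualizing sand-fixing shrubs and an important variable reflecting the growth of desert vegetation. Taking the main sand-fixing shrubs on three dune types (fixed, semi-fixed and mobile dunes) as the study objects, the study used 12 basic models and two machine learning algorithms, the BP (Backpropagation Neural Network) neural network and the support vector machine (SVM), to build crown width prediction models based on shrub height and crown length ratio. The fits of the two machine learning algorithms were compared with those of the basic models, and the crown width prediction models suited to the study area were selected. The results show that: (1) the optimal crown width prediction model differs among dune types and shrub species, and the models for fixed and semi-fixed dunes are better than those for mobile dunes. The best fit for the three dune types was M2 (Quadratic Model); (2) the optimal models for Haloxylon persicum on semi-fixed and mobile dunes were M2 and M7 (Gompertz), respectively, the optimal model for Calligonum mongolicum was M2, and Ephedra distachya and Artemisia ordosica on...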

16.
Arsenic is often present in gold mining areas. The high sensitivity of arsenic to biogeochemical conditions may lead to catastrophic consequences through contamination of resources such as ground water. Therefore, it is critical to understand the spatial occurrence of arsenic across a given site. Previous studies using traditional pattern recognition techniques such as neural networks and kriging have not been entirely successful in predicting arsenic concentrations across a gold mining area. The methods used in this paper are the support vector machines (SVM) and robust least-square support vector machines (robust LS-SVM). The two techniques were used to predict arsenic concentrations in the sediments of Circle City, Alaska, using the gold concentration distribution present within the sediments. The analysis of the results shows an improved performance and better predictive capabilities of SVM and robust LS-SVM than that of the neural networks and kriging techniques. The robust LS-SVM performed better than the SVM. The performance of the SVM was affected by outliers. The removal of the outliers from the data set and application of SVM showed improved results.

17.
Forest phenology is an important indicator of climate and environmental change. An optical camera was used as a near-surface remote sensing device to obtain forest images, and the green excess index (GEI) was calculated from the images; the seasonal variation curve of the GEI data was fitted using the double logistic method and a normalization method. LSTM and GRU deep learning models were introduced to train and test on the GEI data. The rationality and performance of the deep learning models were then evaluated, and finally the models predicted the trend of the GEI data over the next 60 days. The results showed that, for forest phenology training and prediction, the GRU and LSTM models were verified with histograms and autocorrelation plots, which indicated that the distribution of the predicted data was consistent with the trend of the real data, that the outputs of both models were feasible, and that the models were stable. The differences in MSE, RMSE, MAE and MAPE between the LSTM and GRU models were 0.0014, 0.013, 0.008 and 5.26%, respectively, and GRU performed better than LSTM. The 60-day predictions of the GEI data from both models showed a trend consistent with the change in the GEI data in the first half of the year. Deep learning prediction of GEI data with GRU and LSTM was thus realized for forest phenology, with GRU performing better than LSTM. This could further reveal the future growth and climate-related changes of forest phenology, and provides a theoretical basis for applying forest phenology prediction.
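
The abstract does not state how GEI was computed. The sketch below assumes the definition commonly used in camera-based phenology work, GEI = 2G − (R + B), applied per pixel and averaged over a region of interest. The function names and pixel values are hypothetical.

```python
def green_excess_index(red, green, blue):
    """GEI for one pixel, assuming the common definition 2G - (R + B)."""
    return 2 * green - (red + blue)


def mean_gei(pixels):
    """Average GEI over an iterable of (R, G, B) pixel tuples."""
    values = [green_excess_index(r, g, b) for r, g, b in pixels]
    return sum(values) / len(values)


# Three made-up canopy pixels with 8-bit digital numbers (illustrative only).
region = [(60, 120, 50), (70, 110, 60), (80, 100, 70)]
daily_gei = mean_gei(region)
print(daily_gei)
```

Computing one such value per image yields the daily GEI time series that a curve fit or a recurrent model such as LSTM or GRU would take as input.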

18.
Seabed sediment textural parameters such as mud, sand and gravel content can be useful surrogates for predicting patterns of benthic biodiversity. Multibeam sonar mapping can provide near-complete spatial coverage of high-resolution bathymetry and backscatter data that are useful in predicting sediment parameters. Multibeam acoustic data collected across a ~1000 km2 area of the Carnarvon Shelf, Western Australia, were used in a predictive modelling approach to map eight seabed sediment parameters. Four machine learning models were used for the predictive modelling: boosted decision tree, random forest decision tree, support vector machine and generalised regression neural network. The results indicate overall satisfactory statistical performance, especially for %Mud, %Sand, Sorting, Skewness and Mean Grain Size. The study also demonstrates that predictive modelling using the combination of machine learning models has provided the ability to generate prediction uncertainty maps. However, the single models were shown to have overall better prediction performance than the combined models. Another important finding was that choosing an appropriate set of explanatory variables, through a manual feature selection process, was a critical step for optimising model performance. In addition, machine learning models were able to identify important explanatory variables, which are useful in identifying underlying environmental processes and checking predictions against the existing knowledge of the study area. The sediment prediction maps obtained in this study provide reliable coverage of key physical variables that will be incorporated into the analysis of covariance of physical and biological data for this area.

19.
Recently, researchers have introduced deep learning methods such as convolutional neural networks (CNN) to model spatio-temporal data and achieved better results than those with conventional methods. However, these CNN-based models employ a grid map to represent spatial data, which is unsuitable for road-network-based data. To address this problem, we propose a deep spatio-temporal residual neural network for road-network-based data modeling (DSTR-RNet). The proposed model constructs locally-connected neural network layers (LCNR) to model road network topology and integrates residual learning to model the spatio-temporal dependency. We test the DSTR-RNet by predicting the traffic flow of Didi cab service, in an 8-km2 region with 2,616 road segments in Chengdu, China. The results demonstrate that the DSTR-RNet maintains the spatial precision and topology of the road network as well as improves the prediction accuracy. We discuss the prediction errors and compare the prediction results to those of grid-based CNN models. We also explore the sensitivity of the model to its parameters; this will aid the application of this model to network-based data modeling.

20.
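High-resolution climate data are the driving data for studying the impacts of climate change on agriculture, ecology and hydrology. Dynamical and statistical downscaling models are the two commonly used kinds of methods for generating high-resolution climate data, and in recent years machine learning models have also been applied to climate change research, but few studies have compared multiple statistical downscaling models across different stations (underlying surfaces). The Shiyang River basin has diverse land use types and marked variation in elevation, which makes it suitable for studying the applicability of downscaling models. This study selected two traditional statistical...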
