Similar articles
 20 similar articles found; search time 31 ms
1.
Advances in location-based services and telecommunications have yielded large volumes of trajectory data. Understanding these data effectively requires intuitive yet accurate visual analysis. The visual analysis of massive trajectory data is challenged by the numerous interactions among different locations, which cause heavy visual clutter. This paper presents a new methodology for visual analysis that integrates the algebraic multigrid (AMG) method into data aggregation. This non-parametric method helps to build a multi-layer node representation from a graph extracted from trajectory data. A comparison of AMG with other methods shows that the AMG method performs better in terms of both spatial representation and node importance. The new method is tested with a real-world dataset of cell-phone signalling records in Beijing. The results show that the method is suitable for processing massive trajectory datasets and creating abstractions of them, revealing inherent patterns and producing intuitive and vivid flow maps.

2.
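A comparative study of local singularity index calculation based on geostatistical interpolation methods   Total citations: 3 (self-citations: 0, citations by others: 3)
Taking soil Pb in the Tongling ore-concentration area as an example, this study examines how the geostatistical kriging method and the sequential Gaussian simulation method affect the calculation of the singularity index under sparse sampling. The results show that sequential Gaussian simulation emphasizes spatial uncertainty over short distances and compensates for the smoothing effect of kriging, giving better results for the fine reconstruction of the spatial distribution of soil elements. For sparsely sampled datasets, singularity indices obtained with sequential Gaussian simulation characterize local spatial structure in finer detail than those based on the original data or on kriging, and are better suited to extracting and identifying soil geochemical anomalies.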

3.
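Tan Min, Liu Kai, Liu Lin, Zhu Yuanhui, Wang Dashan. Progress in Geography (《地理科学进展》), 2017, 36(10): 1304-1312
Population spatialization is an effective way to enable integrated analysis of census data with other spatial data on environment and resources. This study selects night-time light data, road network data, water body distribution data, built-up area data, a digital elevation model and terrain slope data as variable factors influencing population distribution in the Pearl River Delta. It uses a random forest model to spatialize the 2010 population data of the Pearl River Delta onto a 30 m grid, compares the accuracy of the simulation with three public datasets, and finally analyses the factors influencing the spatial distribution of population based on the random forest variable importance. The results show that the overall accuracy of the simulation reaches 82.32%, which is better than both the WorldPop dataset and the China kilometre-grid population dataset and close to the GPW dataset, with the highest simulation accuracy in areas of medium population density. Measuring variable importance shows that night-time light intensity is the most important indicator of population distribution in the Pearl River Delta, and that distance to water bodies, distance to built-up areas and road network density all play important roles. Combining a random forest model with multi-source information can achieve population spatialization at high spatial resolution, providing an important data source for fine-grained urban management and support for related policy decisions.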

4.
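Guo Chunxia, Zhu Yunqiang, Sun Wei. Geographical Research (《地理研究》), 2015, 34(9): 1675-1684
Air temperature data at different time scales and in different seasons exhibit different spatial stationarity characteristics. To examine how spatial stationarity affects temperature interpolation, the trend-line method was used to explore the spatial stationarity of temperature data, and the interpolation accuracy and the spatial distribution of the results of ordinary linear regression, ordinary kriging and regression kriging were compared under different stationarity conditions. The results show that winter daily and monthly mean temperatures and annual mean temperature are spatially non-stationary; interpolation accuracy improves as the time series lengthens, and the size of the improvement gradually declines as the temperature data become more stable. Summer daily and monthly mean temperatures are spatially stationary, and accuracy does not improve significantly as the time series lengthens. For daily mean temperature, every interpolation method is generally more accurate in summer than in winter. Compared with ordinary kriging, regression kriging effectively improves interpolation accuracy for spatially non-stationary data. Lengthening the time series reduces the differences among interpolation algorithms in both accuracy and the spatial distribution of results.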

5.
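Theoretical basis and integration framework of geoscience data integration   Total citations: 17 (self-citations: 2, citations by others: 17)
The broadening of sources of geospatial data (geoscience data for short), the development of updating methods and the expansion of application fields have made research on, and practical implementation of, data integration or integrated use a necessity. Put simply, geoscience data integration means using data of different origins and different characteristics in the same environment. Geoscience data are a representation built on the cognition of geographical phenomena and processes and their spatio-temporal characteristics. The basis of geoscience data integration lies mainly in: the spatial and temporal unity of geographical phenomena and processes, the spatio-temporal continuity of geoscience processes, the hierarchical nature of geoscience phenomena and processes, the consistency of the cognition of geoscience data, the transparency of geoscience data that relies on metadata, and the relative independence of data content and form. On this basis, the authors describe a conceptual model and process of geoscience data integration based on geoscience knowledge and geographic information system functions, and explain the issues involved in the integration process.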

6.
A multi-layer perceptron (MLP) easily fits a stratified spatial interpolation pattern whose form is close to an open surface, while a radial basis function network (RBFN) easily fits a pocket (radial) spatial interpolation pattern whose form is close to a closed surface. In the real world, however, a spatial interpolation pattern may consist of both stratified and pocket patterns, and neither MLP nor RBFN can fit such a pattern easily. To combine their advantages in fitting complex hybrid spatial interpolation patterns, this article proposes a novel neural network, the MLP–RBFN hybrid network (MRHN), whose hidden layer contains sigmoid and Gaussian units at the same time. Although MRHN has two kinds of processing units, the supervised learning rules for all network parameters are derived from the single principle of minimizing the error sum of squares. The study took rainfall distribution in Taiwan as a case study. The results show that (1) measured by prediction error on a testing dataset separate from the training dataset, MRHN was the most accurate of the three networks, RBFN was the next best, and MLP was the worst; (2) the MLP model seriously underestimated high observed rainfall values; (3) over-learning may be a serious shortcoming of using RBFN in spatial interpolation applications; (4) MRHN may have better generalization capacity than RBFN in spatial interpolation applications.
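The hybrid hidden layer described in this abstract can be illustrated with a forward pass only. The following is a minimal sketch, not the authors' implementation: the function name `mrhn_forward`, the tuple layouts, and the linear output layer are assumptions made for illustration, and the learning rules derived in the paper are not reproduced.

```python
import math

def mrhn_forward(x, sigmoid_units, gaussian_units, output_bias=0.0):
    """Forward pass through a hidden layer mixing sigmoid and Gaussian units.

    sigmoid_units:  list of (weights, bias, out_weight)   -- hypothetical layout
    gaussian_units: list of (center, width, out_weight)   -- hypothetical layout
    The output is assumed to be a linear combination of hidden activations.
    """
    y = output_bias
    # Sigmoid units respond to a weighted sum (open, stratified surfaces).
    for weights, bias, out_w in sigmoid_units:
        net = sum(w * xi for w, xi in zip(weights, x)) + bias
        y += out_w / (1.0 + math.exp(-net))
    # Gaussian units respond to distance from a center (closed, pocket surfaces).
    for center, width, out_w in gaussian_units:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        y += out_w * math.exp(-sq_dist / (2.0 * width ** 2))
    return y
```

With one unit of each kind, the sigmoid term contributes a trend across the whole input space while the Gaussian term adds a localized bump around its center, which is the combination the abstract argues a single network type cannot fit easily.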

7.
Spatial association rule mining (SARM) is an important data mining task for understanding implicit and sophisticated interactions in spatial data. The usefulness of SARM results, represented as sets of rules, depends on their reliability: the abundance of rules, control over the risk of spurious rules, and the accuracy of rule interestingness measure (RIM) values. This study presents crisp-fuzzy SARM, a novel SARM method that can enhance the reliability of the resultant rules. The method first prunes dubious rules using statistically sound tests and crisp supports for the patterns involved, and then evaluates the RIMs of accepted rules using fuzzy supports. For the RIM evaluation stage, the study also proposes a Gaussian-curve-based fuzzy data discretization model for SARM with an improved design for spatial semantics. The proposed techniques were evaluated on both synthetic and real-world data. The synthetic data were generated with predesigned rules and RIM values, so the reliability of SARM results could be evaluated confidently and quantitatively. The proposed techniques showed high efficacy in enhancing the reliability of SARM results in all three aspects. The abundance of resultant rules was improved by 50% or more compared with conventional fuzzy SARM. Minimal risk of spurious rules was guaranteed by statistically sound tests: the probability that the entire result contained any spurious rule was below 1%. The RIM values also avoided the large positive errors committed by crisp SARM, which typically exceeded 50% for representative RIMs. The real-world case study on New York City points of interest reconfirms the improved reliability of crisp-fuzzy SARM results, and demonstrates that such improvement is critical for practical spatial data analytics and decision support.

8.
9.
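Spatial heterogeneity of several soil physical factors under sand-fixing vegetation at Shapotou   Total citations: 26 (self-citations: 11, citations by others: 15)
Geostatistical theory and methods were used to study the spatial heterogeneity of physical factors of the surface soil (0-15 cm and 15-30 cm) under artificial vegetation at Shapotou. Classical statistical analysis shows that mean soil moisture and bulk density are lower in the 0-15 cm layer than in the 15-30 cm layer, that capillary water-holding capacity and porosity are higher in the 0-15 cm layer than in the 15-30 cm layer, and that the coefficients of variation of all factors are larger in the 0-15 cm layer. Variogram analysis shows that soil moisture, bulk density, capillary water-holding capacity and porosity have marked spatial heterogeneity in the 0-15 cm layer: the effective range is largest for surface soil moisture (28.2 m) and smallest for capillary water-holding capacity (13.8 m), and the share of spatial heterogeneity attributable to autocorrelation ranges from 85.3% to 99.9% across factors, significantly larger than the share attributable to random variation. In the 15-30 cm layer, bulk density, capillary water-holding capacity and porosity follow linear models. Based on kriging interpolation, colour contour maps of each soil property were drawn, clearly and intuitively showing the spatial distribution of each factor. The relationship between soil spatial heterogeneity and vegetation is also discussed.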
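Variogram analysis of the kind reported in this abstract starts from the empirical semivariogram, gamma(h) = sum((z_i - z_j)^2) / (2 N(h)) over the N(h) point pairs separated by roughly distance h. The sketch below shows that calculation only; the function name and the `lag_width` and `n_lags` parameters are illustrative assumptions, and fitting a model to obtain the range or nugget is not covered.

```python
import math

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return a list of (lag_center, gamma, pair_count), one entry per distance bin.

    Bins with no point pairs report gamma as None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            k = int(d // lag_width)          # index of the distance bin
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    result = []
    for k in range(n_lags):
        center = (k + 0.5) * lag_width
        gamma = sums[k] / (2 * counts[k]) if counts[k] else None
        result.append((center, gamma, counts[k]))
    return result
```

Plotting gamma against the lag centers gives the curve from which a range such as the 28.2 m quoted for surface soil moisture would be read after model fitting.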

10.
Eye movement data convey a wealth of information that can be used to probe human behaviour and cognitive processes. To date, eye tracking studies have mainly focused on laboratory-based evaluations of cartographic interfaces; in contrast, little attention has been paid to eye movement data mining for real-world applications. In this study, we propose using machine-learning methods to infer user tasks from eye movement data in real-world pedestrian navigation scenarios. We conducted a real-world pedestrian navigation experiment in which we recorded eye movement data from 38 participants. We trained and cross-validated a random forest classifier for classifying five common navigation tasks using five types of eye movement features. The results show that the classifier can achieve an overall accuracy of 67%. We found that statistical eye movement features and saccade encoding features are more useful than the other investigated types of features for distinguishing user tasks. We also identified that the choice of classifier, the time window size and the eye movement features considered are all important factors that influence task inference performance. Results of the research open doors to some potential real-world innovative applications, such as navigation systems that can provide task-related information depending on the task a user is performing.

11.
Sketching, as a natural mode of human communication and creative processes, presents opportunities for improving human–computer interaction in geospatial information systems. However, to use a sketch map as user input, it must be localized within the underlying spatial data set of the information system, the base metric map. This can be achieved by a matching process called qualitative map alignment, in which qualitative spatial representations of the two input maps are used to establish correspondences between each sketched object and one or more objects in the metric map. The challenge is that, to the best of our knowledge, no method for matching qualitative spatial representations suggested so far is applicable in realistic scenarios, owing to excessively long runtimes, incorrect algorithm design or the inability to use more than one spatial aspect at a time. We address these challenges with a metaheuristic algorithm that uses novel data structures to match qualitative spatial representations of a pair of maps. We present the design, data structures and performance evaluation of the algorithm on real-world sketch and metric maps as well as on synthetic data. Our algorithm is novel in two main aspects. Firstly, it employs a novel system of matrices known as local compatibility matrices, which facilitate the computation of estimates for the future size of a partial alignment and allow several types of constraints to be used at the same time. Secondly, the heuristic it computes has a higher accuracy than the state-of-the-art heuristic for this task, yet requires less computation. Our algorithm is also a general method for matching labelled graphs, a special case of which involves complete graphs whose edges are labelled with spatial relations. The results of our evaluation demonstrate practical runtime performance and high solution quality.

12.
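Taking Li County in Aba Tibetan and Qiang Autonomous Prefecture, where geological hazards are frequent, as the study area, eleven influencing factors were selected from the aspects of topography and geomorphology, geological environment, hydrological conditions and human engineering activities, and Pearson correlation coefficients were used to examine correlations among the factors, in order to build an index system for landslide susceptibility assessment. The information value model was used to compute the information value of each influencing factor, and non-landslide samples were drawn from the very low and low susceptibility zones produced by that model. On this basis the sample data were fed into two machine learning models, random forest and a radial basis function neural network, to assess landslide susceptibility, and accuracy was verified with receiver operating characteristic (ROC) curves. The results show that landslide points are more concentrated per unit area in the high-susceptibility zone predicted by the random forest model, which places 74.026% of hazard points in only 6.666% of the area, a better result than that of the radial basis function neural network model. The AUC (area under the curve) values of the two models are 0.893 and 0.874 respectively, indicating that the random forest model is more reliable and has an advantage over the radial basis function neural network for geological hazard susceptibility assessment in this area.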
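The abstract does not state the formula behind the information value model. A form commonly used in landslide susceptibility work, assumed here rather than taken from the source, is I = ln((N_i / N) / (S_i / S)), where N_i of the N landslides fall in a factor class that covers area S_i of the total area S. A minimal sketch with a hypothetical function name and made-up numbers:

```python
import math

def information_value(slides_in_class, total_slides, area_in_class, total_area):
    """I = ln((N_i / N) / (S_i / S)) for one class of one influencing factor.

    Positive values mean the class holds more landslides than its share of
    the area would suggest; a class with no landslides returns -inf.
    """
    if slides_in_class == 0:
        return float("-inf")
    landslide_share = slides_in_class / total_slides
    area_share = area_in_class / total_area
    return math.log(landslide_share / area_share)

# Illustrative numbers only: 30 of 100 landslides on 10 of 100 km^2.
example = information_value(30, 100, 10.0, 100.0)
```

Summing the class values of all factors at a location gives a total score, and low-scoring zones are where the abstract says the non-landslide samples were drawn.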

13.
Urban segregation has received increasing attention in the literature due to the negative impacts that it has on urban populations. Indices of urban segregation are useful instruments for understanding the problem as well as for setting up public policies. The usefulness of spatial segregation indices depends on their ability to account for the spatial arrangement of population and to show how segregation varies across the city. This paper proposes global spatial indices of segregation that capture interaction among population groups at different scales. We also decompose the global indices to obtain local spatial indices of segregation, which enable visualization and exploration of segregation patterns. We propose the use of statistical tests to determine the significance of the indices. The proposed indices are illustrated using an artificial dataset and a case study of socio‐economic segregation in São José dos Campos (SP, Brazil).

14.
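Reliability assessment of global historical forest data for the China region   Total citations: 3 (self-citations: 1, citations by others: 2)
Global historical land-use datasets are important for understanding global and regional environmental change. Historical forest data are an important component of them, yet their reliability at the regional scale has rarely been assessed. Taking China as the study area and using the China historical forest dataset (CHFD), reconstructed by Chinese scholars from historical documents, as the reference, the reliability of the China forest data in global datasets (SAGE, PJ and KK10) was assessed by comparing trends, quantities and spatial patterns. The results show that: (1) although both the China forest data in the global datasets and CHFD show a decreasing trend over the past 300 years, they differ considerably in quantity. The SAGE dataset estimates China's forest area since 1700 to be about 20%-40% higher than CHFD; the KK10 dataset's reconstructed forest quantity for 1700-1850 is about 32%-46% higher; the PJ dataset, which incorporates regional research results, is close to CHFD in total, with differences below 20% at most time points. (2) At the provincial scale, for the PJ dataset, whose total is close to CHFD, provinces whose forest change trend differs considerably from CHFD account for 84%, and provinces with large quantitative differences account for as much as 92%. (3) At the grid scale, grid cells where the relative difference between PJ and CHFD exceeds 70% account for as much as 60%-80%, and the spatio-temporal dynamics of the two differ markedly. (4) The China historical forest data in the global datasets fail to objectively reflect the process and pattern of forest change in the region; the causes include the different source materials used by global and regional datasets to reconstruct historical data and the differences between reconstruction methods built for different spatial scales.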

15.
In machine learning, one often assumes the data are independent when evaluating model performance. However, this rarely holds in practice. Geographic information datasets are an example: the closer data points are geographically, the stronger the dependencies among them. This phenomenon, known as spatial autocorrelation (SAC), causes standard cross validation (CV) methods to produce optimistically biased prediction performance estimates for spatial models, which can result in increased costs and accidents in practical applications. To overcome this problem, we propose a modified version of the CV method called spatial k-fold cross validation (SKCV), which provides a useful estimate of model prediction performance without optimistic bias due to SAC. We test SKCV with three real-world cases involving open natural data, showing that the estimates produced by ordinary CV are up to 40% more optimistic than those of SKCV. Both regression and classification cases are considered in our experiments. In addition, we show how the SKCV method can be applied as a criterion for selecting the data sampling density for a new research area.
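The abstract does not spell out how SKCV removes the optimistic bias. One common way to do so, assumed here for illustration, is to drop from each training fold every point that lies within a buffer distance of any test point. The function name `spatial_kfold_splits` and the `dead_zone` parameter are hypothetical.

```python
import math

def spatial_kfold_splits(coords, k, dead_zone):
    """Yield (train_idx, test_idx) index lists for k folds.

    Training points closer than `dead_zone` to any test point are excluded,
    so the model is never scored on points with very near training neighbours.
    With dead_zone = 0 this reduces to ordinary k-fold cross validation.
    """
    n = len(coords)
    folds = [list(range(f, n, k)) for f in range(k)]   # simple interleaved folds
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [
            i for i in range(n)
            if i not in test_set
            and all(math.dist(coords[i], coords[t]) >= dead_zone for t in test_idx)
        ]
        yield train_idx, test_idx
```

Increasing `dead_zone` shrinks the training sets, so the trade-off is between removing spatial leakage and keeping enough data to fit the model in each fold.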

16.
The discovery of spatial clusters formed by proximal spatial units with similar non-spatial attribute values plays an important role in spatial data analysis. Although several spatial contiguity-constrained clustering methods are currently available, almost all of them discover clusters in a geographical dataset even when the dataset has no natural clustering structure. Statistically evaluating the significance of the degree of homogeneity within a single spatial cluster is difficult. To overcome this limitation, this study develops a permutation test approach. Specifically, the homogeneity of a spatial cluster is measured based on the local variance and cluster member permutation, and two-stage permutation tests are developed to determine the significance of the degree of homogeneity within each spatial cluster. The proposed permutation tests can be integrated into existing spatial clustering algorithms to detect homogeneous spatial clusters. The proposed tests are compared with four existing tests (i.e., Park’s test, the contiguity-constrained nonparametric analysis of variance (COCOPAN) method, the spatial scan statistic, and the q-statistic) using two simulated and two meteorological datasets. The comparison shows that the proposed two-stage permutation tests are more effective at identifying homogeneous spatial clusters and at determining homogeneous clustering structures in practical applications.
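The general idea of testing cluster homogeneity by permutation can be sketched in a much simplified, single-stage form. This is not the two-stage procedure of the paper: the sketch ignores spatial contiguity, uses plain population variance in place of the paper's local variance measure, and the function name is hypothetical.

```python
import random
import statistics

def homogeneity_p_value(values, cluster_idx, n_perm=999, seed=0):
    """One-sided permutation p-value for the homogeneity of one cluster.

    Compares the variance of the cluster's attribute values with the variance
    of randomly drawn member sets of the same size; a small p-value means the
    cluster is more homogeneous than random membership would produce.
    """
    rng = random.Random(seed)
    observed = statistics.pvariance([values[i] for i in cluster_idx])
    size = len(cluster_idx)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(len(values)), size)
        if statistics.pvariance([values[i] for i in sample]) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A clustering algorithm could call such a test on each candidate cluster and keep only those below a chosen significance level, which is the kind of integration the abstract describes.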

17.
A Monte Carlo approach is used to evaluate the uncertainty caused by incorporating Post Office Box (PO Box) addresses in point‐cluster detection for an environmental‐health study. Placing PO Box addresses at the centroids of postcode polygons, as in conventional geocoding, can introduce significant error into a cluster analysis of the point data generated from them. In the restricted Monte Carlo method presented in this paper, an address that cannot be matched to a precise location is assigned a random location within the smallest polygon believed to contain that address. These random locations are then combined with the locations of precisely matched addresses, and the resulting dataset is used for performing cluster analysis. After repeating this randomization‐and‐analysis process many times, one can use the variance in the calculated cluster evaluation statistics to estimate the uncertainty caused by the addresses that cannot be precisely matched. This method maximizes the use of the available spatial information, while also providing a quantitative estimate of the uncertainty in that utilization. The method is applied to lung‐cancer data from Grafton County, New Hampshire, USA, in which PO Box addresses account for more than half of the address dataset. The results show that less than 50% of the detected cluster area can be considered to have high certainty.
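The randomization-and-analysis loop described in this abstract can be sketched as follows. The function names are hypothetical, rejection sampling from the bounding box is an assumed way to draw a random location inside a polygon, and `statistic` stands in for whatever cluster evaluation statistic is used.

```python
import random
import statistics

def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in_polygon(polygon, rng):
    """Draw uniformly from the bounding box until a point falls inside."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            return (x, y)

def monte_carlo_uncertainty(precise_points, unmatched_polygons, statistic, runs, seed=0):
    """Return (mean, variance) of `statistic` over `runs` random placements.

    precise_points is a list of (x, y); each unmatched address contributes
    one polygon in unmatched_polygons.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        simulated = [random_point_in_polygon(poly, rng) for poly in unmatched_polygons]
        results.append(statistic(precise_points + simulated))
    return statistics.mean(results), statistics.pvariance(results)
```

The returned variance plays the role of the uncertainty estimate: a statistic that barely changes across runs is insensitive to where the imprecisely located addresses actually fall.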

18.
Georeferenced user-generated datasets like those extracted from Twitter are increasingly gaining the interest of spatial analysts. Such datasets often reflect a wide array of real-world phenomena, each of which takes place at a certain spatial scale. User-generated datasets are therefore multiscale in nature. Such datasets cannot be properly handled with the most common analysis methods, because these are typically designed for single-scale datasets in which all observations are expected to reflect a single phenomenon (e.g., crime incidents). In this paper, we focus on the popular local G statistics. We propose a modified, scale-sensitive version of a local G statistic. Furthermore, our approach comprises an alternative neighbourhood definition that enables the extraction of certain scales of interest. We compared our method with the original one on a real-world Twitter dataset. Our experiments show that our approach detects spatial autocorrelation at specific scales better than the original method. Based on the findings of our research, we identified a number of scale-related issues that our approach is able to overcome. Thus, we demonstrate the multiscale suitability of the proposed solution.
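For orientation, the original local statistic that this abstract builds on can be computed as below. This sketch implements the standard Getis-Ord Gi* z-score with binary distance-band weights, not the scale-sensitive modification proposed in the paper; the function name and the `radius` parameter are illustrative assumptions.

```python
import math

def local_g_star(coords, values, radius):
    """Standard Getis-Ord Gi* z-scores, one per observation.

    Weights are binary: w_ij = 1 if the distance between i and j is at most
    `radius` (each point counts as its own neighbour), else 0.
    Returns NaN where the statistic is undefined (zero denominator).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= radius else 0.0 for j in range(n)]
        w_sum = sum(w)
        w_sq = sum(x * x for x in w)
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else float("nan"))
    return scores
```

Large positive scores mark hot spots and large negative scores mark cold spots at the single scale fixed by `radius`, which is the single-scale limitation the abstract sets out to address.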

19.
Attractive regions can be detected and recommended by investigating users’ online footprints. However, social media data suffer from short, noisy text and a lack of a-priori knowledge, which limits the usefulness of traditional semantic modelling methods. Another challenge is the need for an effective strategy for selecting and recommending candidate regions. To address these challenges, we propose a comprehensive workflow that combines the semantic and location information of social media data to recommend thematic urban regions to users with specific interests. This workflow is novel in: (1) developing a data-driven geographic topic modelling method that utilizes the co-occurrence patterns of self-explanatory semantic information to detect semantic communities; (2) proposing a new recommendation strategy that takes a region’s spatial scale into account. The workflow was implemented using a real-world dataset and evaluated at three levels: semantic representativeness, topic identification and recommendation desirability. The evaluation showed that the detected semantic communities were internally consistent and externally differentiable, and that the recommended regions had a high degree of desirability. The work demonstrates the effectiveness of self-explanatory semantic information for geographic topic modelling and highlights the importance of including region spatial scale in the model for an effective region recommendation strategy.

20.
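Lin Danchun, Tan Min, Liu Kai, Liu Lin, Zhu Yuanhui. Tropical Geography (《热带地理》), 2020, 40(2): 346-356
Taking Guangdong Province, where population density varies markedly, as the study area, this study compares the spatial distribution consistency of WorldPop, GPW v4 and two China kilometre-grid population distribution datasets. Using the Sixth National Population Census data as ground truth and dividing the area into high, medium and low population density groups, it quantitatively evaluates the accuracy of the four datasets in terms of both the numerical and the spatial distribution of errors, and finally discusses the possible sources of estimation error and the suitability of each dataset. The results show that, of the four gridded population datasets, WorldPop has the highest overall accuracy and also the highest accuracy in densely populated areas; GPW v4 is slightly more accurate than WorldPop in low- and medium-density areas but does not capture the detail of population distribution within towns and subdistricts; the two China kilometre-grid datasets are less accurate than the other two, mainly limited by the spatialization method and the choice of model variables. WorldPop is suitable for fine-grained studies of medium- and high-density areas; GPW v4 is suitable for long time-series studies whose smallest unit is larger than a town or subdistrict; the first China kilometre-grid dataset is suitable for studies that need to consider population distribution and spatial heterogeneity within towns and subdistricts; and the second is suitable for long time-series studies that need to consider details of population distribution and changes in spatial pattern.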
