Similar Documents
1.
ABSTRACT

Geospatial data conflation is aimed at matching counterpart features from two or more data sources in order to combine and better utilize information in the data. Due to the importance of conflation in spatial analysis, different approaches to the conflation problem have been proposed, ranging from simple buffer-based methods to probability- and optimization-based models. In this paper, I propose a formal framework for conflation that integrates two powerful tools of geospatial computation: optimization and relational databases. I discuss the connection between relational database theory and conflation, and demonstrate how the conflation process can be formulated and carried out in standard relational databases. I also propose a set of new optimization models that can be used inside relational databases to solve the conflation problem. The optimization models are based on the minimum-cost circulation problem in operations research (also known as the network flow problem), which generalizes existing optimal conflation models that are primarily based on the assignment problem. Using comparable datasets, computational experiments show that the proposed conflation method is effective and outperforms existing optimal conflation models by a large margin. Given its generality, the new method may be applicable to other data types and conflation problems.
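The assignment-problem formulation that the paper generalizes can be sketched with a tiny brute-force solver; the 3x3 cost matrix below is invented for illustration (a production system would use the Hungarian algorithm or, as the paper proposes, a minimum-cost-circulation solver inside the database):

```python
from itertools import permutations

# Hypothetical matching costs between three features in dataset A (rows)
# and three candidate counterparts in dataset B (columns), e.g. pairwise
# distances in metres. All numbers are illustrative.
cost = [
    [1.0, 9.0, 7.0],
    [8.0, 2.0, 6.0],
    [5.0, 4.0, 3.0],
]

def optimal_assignment(cost):
    """Exhaustively solve the assignment problem: match every A-feature
    to exactly one B-feature so that the total matching cost is minimal.
    (Fine for a tiny demo; real conflation would use the Hungarian
    algorithm or a min-cost-flow solver.)"""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

matching, total = optimal_assignment(cost)
print(matching, total)  # (0, 1, 2) 6.0
```

A min-cost-circulation model additionally allows unmatched features (1:0 cases) and many-to-many matches by adding slack arcs, which is precisely where it generalizes the assignment problem.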

2.
Geospatial data matching is an important prerequisite for data integration, change detection and data updating. At present, crowdsourced geospatial data are attracting considerable attention for their significant potential for timely and cost-effective updating of geospatial data and Geographical Information Science (GIS) applications. To integrate the available and up-to-date information of multi-source geospatial data, this article proposes a heuristic probabilistic relaxation road network matching method. The proposed method starts with an initial probabilistic matrix based on dissimilarities in shape and then integrates the relative compatibility coefficients of neighbouring candidate pairs to iteratively update the probabilistic matrix until it is globally consistent. The initial 1:1 matching pairs are selected on the basis of the resulting probabilities and refined using the structural similarity of the selected matching pairs. A matching process is then implemented to find M:N matching pairs. Matching between OpenStreetMap network data and professional road network data shows that our method is independent of matching direction, successfully matches 1:0 (Null), 1:1 and M:N pairs, and achieves a robust matching precision above 95%.
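A minimal sketch of the relaxation idea, assuming a fixed support matrix rather than the paper's relative compatibility coefficients recomputed from neighbouring pairs at each pass; all probabilities and support values are invented:

```python
def relax(P, support, iters=20):
    """Simplified probabilistic relaxation: repeatedly scale each
    candidate-match probability by its (here fixed) neighbourhood
    support, then renormalise each row so probabilities sum to 1."""
    for _ in range(iters):
        new = []
        for i, row in enumerate(P):
            scaled = [p * support[i][j] for j, p in enumerate(row)]
            s = sum(scaled)
            new.append([v / s for v in scaled] if s else row)
        P = new
    return P

# Two source roads, two candidate target roads. Initial probabilities
# come from shape dissimilarity; support values stand in for the
# compatibility of neighbouring candidate pairs (all illustrative).
P0 = [[0.6, 0.4], [0.5, 0.5]]
support = [[1.2, 0.8], [0.7, 1.3]]
P = relax(P0, support)
best = [row.index(max(row)) for row in P]
print(best)  # [0, 1]
```

After enough iterations the matrix becomes near-binary, so the 1:1 pairs can be read off directly; the paper then grows these into M:N pairs.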

3.
Linear feature matching is one of the crucial components of data conflation, which is useful for updating existing data through the integration of newer data and for evaluating data accuracy. This article presents a simplified linear feature matching method to conflate historical and current road data. To measure similarity, the shorter-line median Hausdorff distance (SMHD), the absolute value of the cosine similarity (aCS) of the weighted linear directional mean values, and topological relationships are adopted. Decision tree analysis is employed to derive thresholds for the SMHD and the aCS. To demonstrate the usefulness of the method, four models with incremental configurations are designed and tested: (1) Model 1: one-to-one matching based on the SMHD; (2) Model 2: matching with only the SMHD threshold; (3) Model 3: matching with the SMHD and aCS thresholds; and (4) Model 4: matching with the SMHD, the aCS, and topological relationships. These experiments suggest that Model 2, which considers only distance, does not provide stable results, while Models 3 and 4, which also consider direction and topological relationships, produce stable results with accuracy of around 90% and 95%, respectively. The results suggest that the proposed method is simple yet robust for linear feature matching.
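A vertex-based sketch of the SMHD idea: take the median of the distances from each vertex of the shorter polyline to the other polyline. Sampling only at vertices is a simplification, and the coordinates are invented, so this is not the paper's exact definition:

```python
import math
from statistics import median

def _pt_seg_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def _pt_line_dist(p, line):
    """Distance from point p to the nearest segment of a polyline."""
    return min(_pt_seg_dist(p, line[i], line[i + 1])
               for i in range(len(line) - 1))

def shorter_median_hausdorff(l1, l2):
    """Vertex-sampled approximation of the shorter-line median Hausdorff
    distance: median distance from the shorter polyline's vertices to
    the other polyline."""
    def length(line):
        return sum(math.hypot(line[i + 1][0] - line[i][0],
                              line[i + 1][1] - line[i][1])
                   for i in range(len(line) - 1))
    short, other = (l1, l2) if length(l1) <= length(l2) else (l2, l1)
    return median(_pt_line_dist(p, other) for p in short)

# Two hypothetical road centrelines offset by one unit.
road_a = [(0, 0), (5, 0), (10, 0)]
road_b = [(0, 1), (5, 1), (10, 1), (15, 1)]
print(shorter_median_hausdorff(road_a, road_b))  # 1.0
```

Using the shorter line makes the measure tolerant of the partial overlaps common between historical and current road segments.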

4.
ABSTRACT

Address matching is a crucial step in geocoding, which plays an important role in urban planning and management. To date, the unprecedented development of location-based services has generated a large amount of unstructured address data. Traditional address matching methods mainly focus on the literal similarity of address records and are therefore not applicable to the unstructured address data. In this study, we introduce an address matching method based on deep learning to identify the semantic similarity between address records. First, we train the word2vec model to transform the address records into their corresponding vector representations. Next, we apply the enhanced sequential inference model (ESIM), a deep text-matching model, to make local and global inferences to determine if two addresses match. To evaluate the accuracy of the proposed method, we fine-tune the model with real-world address data from the Shenzhen Address Database and compare the outputs with those of several popular address matching methods. The results indicate that the proposed method achieves a higher matching accuracy for unstructured address records, with its precision, recall, and F1 score (i.e., the harmonic mean of precision and recall) reaching 0.97 on the test set.
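The vectorization step can be illustrated with a toy embedding table standing in for a trained word2vec model; the tokens and vector values are invented, and a plain cosine similarity replaces the ESIM inference stage here:

```python
import math

# Toy embedding table standing in for a trained word2vec model;
# tokens and vector values are purely illustrative.
emb = {
    "nanshan":  [0.9, 0.1, 0.0],
    "district": [0.1, 0.8, 0.1],
    "keji":     [0.0, 0.2, 0.9],
    "road":     [0.1, 0.7, 0.2],
}

def address_vector(tokens):
    """Represent an address record as the mean of its token embeddings."""
    dims = len(next(iter(emb.values())))
    vec = [0.0] * dims
    for t in tokens:
        for i, v in enumerate(emb[t]):
            vec[i] += v
    return [v / len(tokens) for v in vec]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Two differently written records for (hypothetically) the same address:
# one omits the administrative token "district".
sim = cosine(address_vector(["nanshan", "keji", "road"]),
             address_vector(["nanshan", "district", "keji", "road"]))
print(sim)
```

Because the comparison is done in embedding space rather than on literal strings, the two records score as highly similar despite their different surface forms, which is the property that makes the approach work on unstructured addresses.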

5.
6.
Different versions of the Web Coverage Service (WCS) schemas of the Open Geospatial Consortium (OGC) exhibit semantic conflicts. When applying the extended FRAG-BASE schema-matching approach (a COMA++-based method with improved schema-decomposition and schema-fragment identification algorithms that extend COMA++ to OGC Web Service schema matching), the average recall of WCS schema matching is only 72%, average precision is only 82% and the average overall measure is only 57%. To improve the quality of multi-version WCS retrieval, we propose a schema-matching method that measures node semantic similarity (NSS). The proposed method is based on WordNet, conjunctive normal form and a vector space model. A hybrid algorithm based on label meanings and annotations is designed to calculate the similarity between label concepts. We translate the semantic relationships between nodes into a propositional formula and verify the validity of this formula to confirm the semantic relationships. The algorithm first computes the label and node concepts, then calculates the conceptual relationship between the labels, and finally computes the conceptual relationship between nodes. We then use the NSS method in experiments on different versions of WCS. Results show that the average recall of WCS schema matching exceeds 83%, average precision reaches 92%, and the average overall measure is 67%.

7.
Abstract

Map compilation, or conflation, is now being accomplished by computer. Interactive routines manipulate the graphic images of two different digital maps of the same region in order to permit map similarities and differences to be recognized more easily. Rubber-sheeting one or both of the maps permits an operator or the computer to align the maps in stages through methods of successive approximation and to review each new alignment. The computer recognizes matches using mathematical relations of geometric position and graph network configuration to test for feature matches and, when the tests are satisfied, corresponding features can be flagged automatically as matches or highlighted for review by the operator. Techniques and methods developed for conflation systems have important applications in other areas of automated cartography and in image processing and computer graphics.

8.
Geometric conflation is the process undertaken to modify the coordinates of features in dataset A in order to match corresponding ones in dataset B. The overwhelming majority of the literature considers the use of points as features to define the transformation. In this article we present a procedure to consider one-dimensional curves also, which are commonly available as Global Navigation Satellite System (GNSS) tracks, routes, coastlines, and so on, in order to define the estimate of the displacements to be applied to each object in A. The procedure involves three steps, including the partial matching of corresponding curves, the computation of some analytical expression, and the addition of a correction term in order to satisfy basic cartographic rules. A numerical example is presented.

9.
The purpose of this project was to develop and test a methodology for determining the likelihood that mineral resource location records from two nationwide mineral resource information databases represent the same site. The long-term goal is to create a comprehensive database by merging the Mineral Resource Data System (MRDS) of the U.S. Geological Survey, and the Mineral Availability System/Mineral Industry Location System (MAS/MILS) of the U.S. Bureau of Mines (now part of the Geological Survey). Part of that process involves linking records for the same site from each database. Match probabilities were estimated using a logistic regression of mineral resource location attributes, derived from known matched (cross-referenced) and known unmatched randomly sampled mineral site pairs from within the conterminous United States (n=10,000). Model accuracy was assessed using a randomly sampled test dataset, not used in logistic model development (n=4,000). Probability distributions were similar between the development and test datasets. The overall agreement beyond chance was good for the test data set using the kappa statistic. Classification accuracy was 89.6% for known matched site pairs and 84.0% for known unmatched site pairs based on a probability threshold of 0.50 for a match. Distributions of attributes were similar between the development and test datasets. This classification method is a viable approach for estimating match probabilities between database records.
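The scoring step can be sketched with invented coefficients; the real model was fitted to MRDS/MAS-MILS attribute comparisons, so the variables and weights here are illustrative only:

```python
import math

# Hypothetical fitted coefficients for a record-linkage logistic model:
# intercept, a weight on the distance (km) between the two site records,
# and a weight on a name-similarity score in [0, 1]. These values are
# assumptions, not the coefficients estimated in the study.
B0, B_DIST, B_NAME = -1.0, -2.5, 6.0

def match_probability(dist_km, name_sim):
    """P(match) from the logistic model: 1 / (1 + exp(-(b0 + b.x)))."""
    z = B0 + B_DIST * dist_km + B_NAME * name_sim
    return 1.0 / (1.0 + math.exp(-z))

# Classify with the study's 0.50 probability threshold.
p_close = match_probability(0.1, 0.95)   # nearby sites, similar names
p_far   = match_probability(5.0, 0.20)   # distant sites, dissimilar names
print(p_close > 0.5, p_far > 0.5)  # True False
```

The 0.50 threshold is the one the study evaluated; shifting it trades the 89.6% matched-pair accuracy against the 84.0% unmatched-pair accuracy.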

10.
When classical rough set (CRS) theory is used to analyze spatial data, there is an underlying assumption that objects in the universe are completely randomly distributed over space. However, this assumption conflicts with the actual nature of spatial data. Generally, spatial heterogeneity and spatial autocorrelation are two important characteristics of spatial data, and both are important information sources for improving the modeling accuracy of spatial data. This paper extends CRS theory by introducing spatial heterogeneity and spatial autocorrelation. The extension adds spatial adjacency information to the information table, and many fundamental concepts in CRS theory, such as the indiscernibility relation, equivalence classes, and lower and upper approximations, are improved by incorporating this adjacency information. Based on these fundamental concepts, a new reduct and an improved rule matching method are proposed. The new reduct incorporates spatial heterogeneity in selecting the feature subset, preserving the local discriminant power of all features, and the new rule matching method uses spatial autocorrelation to improve the classification ability of rough set-based classifiers. Experimental results show that the proposed extension significantly increases classification and segmentation accuracy, and that computing the spatial reduct requires much less time than the classical reduct.

11.
This paper proposes methods for detecting apparent differences between spatial tessellations at two different points in time, with the objective of conflation of spatial tessellations at multiple time points. The methods comprise three steps. First, we eliminate systematic differences between tessellations using the affine transformation. Second, we match subregions between tessellations at two points in time and match boundaries based on matching relationships between the subregions. Third, we propose a distance metric for measuring differences between the matched boundaries and a method for determining whether the measured differences are apparent or not. We apply the proposed methods to a part of the US Census data for 1990 and 2000 and empirically demonstrate the effectiveness of these methods.

12.
An improved logistic-regression-based CA model: a case study of Guangzhou
Nie Ting, Xiao Rongbo, Wang Guoen, Liu Yunya. Geographical Research, 2010, 29(10): 1909-1919
Logistic-regression-based cellular automata (CA) models are widely used in urban simulation because of their simple structure and relatively modest data requirements, but spatial autocorrelation in the data weakens both the interpretation of the model's mechanisms and its simulation accuracy. This study builds an improved logistic-regression CA model by dividing the constraints on urban development into mandatory and ordinary constraints and by applying principal component analysis to reduce the correlation among the ordinary constraints, and applies the model to simulate urban growth in Guangzhou from 2000 to 2008. The results show that, compared with the conventional logistic-regression CA model, the improved model raises both goodness of fit and accuracy by about 4%. The division of constraints alone improves the simulation accuracy for non-urban pixels by about 6% and the overall accuracy by 3%. More importantly, once the data correlation is reduced, the logistic-regression CA model explains the mechanisms of urban expansion more realistically. The aim of this study is a simple, feasible, and easily constructed CA model for exploring the mechanisms of urban development and providing a more accurate scientific basis for urban planning and management.

13.
This research compares the geographic information retrieval (GIR) performance of a set of logistic regression models with those of five non-probabilistic methods that compute a spatial similarity score for a query-document pair. All methods are applied to a test collection of queries and documents indexed spatially by two convex conservative geometric approximations: the minimum bounding box (MBB) and the convex hull. In the comparison, the tested logistic regression models outperform, in terms of standard information retrieval recall and precision measures, all of the non-probabilistic methods. The retrieval performance achieved by the logistic regression models on MBB approximations is similar to that achieved by the non-probabilistic methods on convex hulls. Although these results are valid only for the test collection used in this study, they suggest that a logistic regression approach to GIR provides an alternative to the use of higher-quality geometric representations that are more difficult to obtain, implement, and process. Additionally, this research demonstrates the ability of a probabilistic approach to effectively incorporate information about geographic context in the spatial ranking process.
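One plausible non-probabilistic spatial similarity score (an assumption for illustration, not necessarily one of the study's five) is the intersection-over-union of the query and document MBBs:

```python
def mbb_similarity(q, d):
    """Spatial similarity of a query MBB q and a document MBB d as the
    ratio of intersection area to union area. Boxes are given as
    (xmin, ymin, xmax, ymax) tuples."""
    ix = max(0.0, min(q[2], d[2]) - max(q[0], d[0]))
    iy = max(0.0, min(q[3], d[3]) - max(q[1], d[1]))
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(q) + area(d) - inter
    return inter / union if union else 0.0

# Hypothetical query footprint scored against two indexed documents.
query = (0.0, 0.0, 10.0, 10.0)
doc_a = (5.0, 5.0, 15.0, 15.0)    # partial overlap
doc_b = (20.0, 20.0, 30.0, 30.0)  # disjoint
sim_a = mbb_similarity(query, doc_a)
sim_b = mbb_similarity(query, doc_b)
print(round(sim_a, 4), sim_b)  # 0.1429 0.0
```

Because the MBB is a conservative approximation, such scores can overestimate overlap for non-rectangular footprints, which is exactly the gap the study shows a logistic regression model can compensate for.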

14.
Few models for land-cover classification in the Mt. Qomolangma (Everest) National Nature Preserve (QNNP) in China have incorporated spectral data into ordinary logistic regression (OL) models. In this study, spectral variables were incorporated into OL and autologistic regression (AL) models to classify six main land covers. Twelve environmental variables and seven spectral variables at 10,000 stratified random sites in the QNNP were quantified and analyzed, and four models were estimated: the OL model, the AL model, the OL model with spectral data (OLM), and the AL model with spectral data (ALM). The OLM and ALM models produced better estimates of regression coefficients and significantly improved model performance and overall accuracy for grassland, sparsely vegetated land, and bare land compared with the OL and AL models.

15.
Wang Shibo, Wang Yong. Geographical Research, 2021, 40(7): 2102-2118
Cancer has become a major public health problem threatening residents worldwide, and choosing an appropriate spatial interpolation method to analyse the spatial characteristics of small-area cancer data can support the effective development of regional cancer prevention and control. Taking village-level lung cancer mortality in Suxian District, Hunan Province in 2012 and 2016 as the study data, and using mean error and root mean square error as evaluation indices, this study compares the accuracy of five typical spatial interpolation methods: inverse distance weighting (IDW), ordinary kriging (OK), trend surface analysis (TSA), multiple linear regression (MLR) and co-kriging (CK). Their parameters are optimised, and the optimal interpolation method for the cancer data is determined in light of each method's strengths and weaknesses. The results show that, in terms of accuracy, CK has the smallest root mean square error and the highest precision, followed by OK, IDW (power = 1) and MLR, with TSA (order = 5) the lowest. In terms of interpolation effect, the observed and predicted values of all five methods are significantly correlated, but all methods except CK substantially underestimate mortality, and CK and OK produce the best spatial distributions. CK, which considers both spatial factors and influencing factors, is the optimal interpolation method for 2012 and 2016 lung cancer mortality in the small-area setting of Suxian District, and its application can provide the best technical support for regional cancer prevention and control. The approach of this paper can also serve as a reference for selecting spatial interpolation methods and parameters for small-area cancer data.
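The simplest of the five compared methods, IDW, can be sketched directly; the sample locations and mortality values below are invented for illustration:

```python
def idw(sample_pts, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, zi)
    samples; `power` is the IDW exponent (the study tunes this
    parameter, e.g. power = 1). All data here are illustrative."""
    num = den = 0.0
    for xi, yi, zi in sample_pts:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0:
            return zi  # exactly at a sample point
        w = d2 ** (-power / 2.0)  # weight = 1 / distance**power
        num += w * zi
        den += w
    return num / den

# Hypothetical village-level mortality rates (per 100,000) at four points.
samples = [(0, 0, 20.0), (10, 0, 40.0), (0, 10, 40.0), (10, 10, 20.0)]
print(idw(samples, 5, 5))  # = 30.0 by symmetry (mean of the samples)
```

Unlike CK, IDW uses spatial distance alone and cannot incorporate covariates, which is consistent with the study's finding that CK performs best when influencing factors matter.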

16.
Sketching as a natural mode for human communication and creative processes presents opportunities for improving human–computer interaction in geospatial information systems. However, to use a sketch map as user input, it must be localized within the underlying spatial data set of the information system, the base metric map. This can be achieved by a matching process called qualitative map alignment in which qualitative spatial representations of the two input maps are used to establish correspondences between each sketched object and one or more objects in the metric map. The challenge is that, to the best of our knowledge, no method for matching qualitative spatial representations suggested so far is applicable in realistic scenarios due to excessively long runtimes, incorrect algorithm design or the inability to use more than one spatial aspect at a time. We address these challenges with a metaheuristic algorithm which uses novel data structures to match qualitative spatial representations of a pair of maps. We present the design, data structures and performance evaluation of the algorithm using real-world sketch and metric maps as well as on synthetic data. Our algorithm is novel in two main aspects. Firstly, it employs a novel system of matrices known as local compatibility matrices, which facilitate the computation of estimates for the future size of a partial alignment and allow several types of constraints to be used at the same time. Secondly, the heuristic it computes has a higher accuracy than the state-of-the-art heuristic for this task, yet requires less computation. Our algorithm is also a general method for matching labelled graphs, a special case of which is the one involving complete graphs whose edges are labelled with spatial relations. The results of our evaluation demonstrate practical runtime performance and high solution quality.

17.
A novel generalized pattern search (GPS)-based cellular automata (GPS-CA) model was developed to simulate urban land-use change in a GIS environment. The model is built on a fitness function that computes the difference between the observed results produced from remote-sensing images and the simulated results produced by a general CA model. GPS optimization incorporating genetic algorithms (GAs) searches for the minimum difference, i.e. the smallest accumulated residuals, in fitting the CA transition rules. The CA coefficients captured by the GPS method have clear physical meanings that are closely associated with the dynamic mechanisms of land-use change. The GPS-CA model was applied to simulate urban land-use change in Kunshan City in the Yangtze River Delta from 2000 to 2015. The results show that the GPS method had a smaller root mean squared error (0.2821) than a logistic regression (LR) method (0.5256) in fitting the CA transition rules. The GPS-CA model thus outperformed the LR-CA model, with an overall accuracy improvement of 4.7%. As a result, the GPS-CA model should be a superior tool for modeling land-use change and predicting future scenarios under different conditions to support sustainable urban development.

18.
This study describes the assessment of landslide susceptibility in Sicily (Italy) at a 1:100,000 scale using a multivariate logistic regression model. The model was implemented in a GIS environment using the ArcSDM (Arc Spatial Data Modeller) module, modified to develop spatial predictions from regional data sets. A newly developed algorithm was used to automatically extract the detachment area from mapped landslide polygons. The following factors were selected as independent variables of the logistic regression model: slope gradient, lithology, land cover, a curve-number-derived index and a pluviometric anomaly index. This configuration was verified to be the best among alternatives employing from three to eight factors. All the regression coefficients and parameters were calculated using selected landslide training data sets, and the results of the analysis were validated using an independent landslide data set. On average, 82% of the area affected by instability and 79% of the unaffected area were correctly classified by the model, which proved to be a useful tool for planners and decision-makers.

19.
Spatial optimization is complex because it usually involves numerous spatial factors and constraints. The optimization becomes more challenging if a large set of spatial data with fine resolutions are used. This article presents an agent-based model for optimal land allocation (AgentLA) by maximizing the total amount of land-use suitability and the compactness of patterns. The essence of the optimization is based on the collective efforts of agents for formulating the optimal patterns. A local and global search strategy is proposed to inform the agents to select the sites properly. Three sets of hypothetical data were first used to verify the optimization effects. AgentLA was then applied to the solution of the actual land allocation optimization problems in Panyu city in the Pearl River Delta. The study has demonstrated that the proposed method has better performance than the simulated annealing method for solving complex spatial optimization problems. Experiments also indicate that the proposed model can produce patterns that are very close to the global optimums.

20.
Integrating heterogeneous spatial data is a crucial problem for geographical information system (GIS) applications. Previous studies mainly focus on the matching of heterogeneous road networks or heterogeneous polygonal data sets; few have attempted to address the problem of integrating points of interest (POIs) from volunteered geographic information (VGI) with professional road networks from official mapping agencies. Hence, this article proposes an approach for integrating VGI POIs and professional road networks. The proposed method first generates a POI connectivity graph by mining linear cluster patterns from the POIs. Secondly, matching nodes between the POI connectivity graph and the associated road network are identified by probabilistic relaxation and refined by vector median filtering (VMF). Finally, the POIs are aligned to the road network by an affine transformation derived from the matching nodes. Experiments demonstrate that the proposed method effectively integrates both VGI POIs and POIs from official mapping agencies with the associated road networks, providing a promising solution for enriching professional road networks with VGI POIs.
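The final alignment step can be sketched as a least-squares affine fit from matched node pairs; the coordinates below are invented, and the underlying transform is a pure translation for simplicity:

```python
import numpy as np

# Hypothetical matched node pairs: POI-connectivity-graph nodes (src)
# and their road-network counterparts (dst). Here dst is simply src
# shifted by (2, 3); all coordinates are illustrative.
src = np.array([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)])
dst = np.array([(2.0, 3.0), (12.0, 3.0), (2.0, 13.0), (12.0, 13.0)])

# Least-squares fit of the affine map [x', y'] = A @ [x, y] + t,
# written as dst = M @ params with rows M = [x, y, 1].
M = np.column_stack([src, np.ones(len(src))])
params, *_ = np.linalg.lstsq(M, dst, rcond=None)  # shape (3, 2)

aligned = M @ params  # POIs aligned onto the road network
print(np.allclose(aligned, dst))  # True
```

With at least three non-collinear matched pairs the six affine parameters are determined; extra pairs make the fit robust to small errors left after the VMF refinement step.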
