Similar literature
 Found 20 similar articles; search took 321 ms
1.
Reverse geocoding, which transforms machine‐readable GPS coordinates into human‐readable location information, is widely used in a variety of location‐based services and analyses. The output quality of reverse geocoding is critical because it can greatly affect the services provided to end users. We argue that the output of reverse geocoding should be spatially close to and topologically correct with respect to the input coordinates, contain multiple suggestions ranked by a uniform standard, and incorporate GPS uncertainties. However, existing reverse geocoding systems often fail to fulfill these aims. To further improve the reverse geocoding process, we propose a probabilistic framework that includes: (1) a new workflow that can adapt all existing address models and unitizes distance and topology relations among retrieved reference data for candidate selections; (2) an advanced scoring mechanism that quantifies characteristics of the entire workflow and orders candidates according to their likelihood of being the best candidate; and (3) a novel algorithm that derives statistical surfaces for input GPS uncertainties and propagates such uncertainties into final output lists. The efficiency of the proposed approaches is demonstrated through comparisons with four commercial reverse geocoding systems and through human judgments. We envision that more advanced reverse geocoding output ranking algorithms specific to different application scenarios can be built upon this work.

2.
Record linkage is a frequent obstacle to unlocking the benefits of integrated (spatial) data sources. In the absence of unique identifiers to directly join records, practitioners often rely on text‐based approaches for resolving candidate pairs of records to a match. In geographic information science, spatial record linkage is a form of geocoding that pertains to the resolution of text‐based linkage between pairs of addresses into matches and non‐matches. These approaches link text‐based address sequences, integrating sources of data that would otherwise remain in isolation. While recent innovations in machine learning have been introduced in the wider record linkage literature, there is significant potential to apply machine learning to the address matching sub‐field of geographic information science. In response, this paper introduces two recent developments in text‐based machine learning, conditional random fields and word2vec, that have not been applied to address matching, and evaluates their comparative strengths and drawbacks.

3.
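With the growth of Internet applications, most of the unstructured text being produced is associated with geographic locations, so geographic information retrieval (GIR) has become a research hotspot in both the GIS and IR fields. Text geocoding, the process of establishing a correspondence between text and geographic coordinates, is the foundation of GIR. This paper classifies and summarizes the key techniques involved in text geocoding, including geographic entity recognition, geographic entity disambiguation, text location focusing, and regional language modeling. It also outlines future research directions and open challenges in the field, offering new ideas for further research on text geocoding.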

4.
Often, we are faced with questions regarding past events, and the answers are hidden in historical text archives. Growing developments in geographic information retrieval and temporal information retrieval techniques have opened new ways to explore digital text archives for spatio‐temporal data. The question is how to retrieve the answers from the text documents. This work contributes to a better understanding of spatio‐temporal information extraction from text documents. Natural language processing techniques were used to develop an information extraction approach using the GATE language processing software. The developed framework uses gazetteer matching, spatio‐temporal relationship extraction and pattern‐based rules to recognize and annotate elements in historical text documents. The extracted spatio‐temporal data are used as input for GIS studies on the time–geography context of the German–Herero resistance war of 1904 in Namibia. Related issues when analyzing the historical data in current GIS are discussed. Particularly problematic are movement data at small scale with poor temporal density and trajectories that are short or connect very distant locations.

5.
Gazetteers are instrumental in recognizing place names in documents such as Web pages, news, and social media messages. However, creating and maintaining gazetteers is still a complex task. Even though some online gazetteers provide rich sets of geographic names at planetary scale (e.g. GeoNames), other sources must be used to recognize references to urban locations, such as street names, neighborhood names or landmarks. We propose integrating Linked Data sources to create a gazetteer that combines a broad coverage of places with urban detail, including content on geographic and semantic relationships involving places, their multiple names and related non‐geographic entities. Our final goal is to expand the possibilities for recognizing, disambiguating and filtering references to places in texts for geographic information retrieval (GIR) and related applications. The resulting ontological gazetteer, named LoG (Linked OntoGazetteer), is accessible through Web services by applications and research initiatives on GIR, text processing, named entity recognition and others. The gazetteer currently contains over 13 million places, 140 million attributes and relationships, and 4.5 million non‐geographic entities. Data sources include GeoNames, Freebase, DBPedia and LinkedGeoData, which is based on OpenStreetMap data. An analysis of how these datasets overlap and complement one another is also presented.

6.
Using geographic information systems to link administrative databases with demographic, social, and environmental data allows researchers to use spatial approaches to explore relationships between exposures and health. Traditionally, spatial analysis in public health has focused on the county, ZIP code, or tract level because of limitations to geocoding at highly resolved scales. Using 2005 birth and death data from North Carolina, we examine our ability to geocode population‐level datasets at three spatial resolutions: ZIP code, street, and parcel. We achieve high geocoding rates at all three resolutions, with statewide street geocoding rates of 88.0% for births and 93.2% for deaths. We observe differences in geocoding rates across demographics and health outcomes, with lower geocoding rates in disadvantaged populations and the most dramatic differences occurring across the urban‐rural spectrum. Our results suggest that highly resolved spatial data architectures for population‐level datasets are viable through geocoding individual street addresses. We recommend routinely geocoding administrative datasets to the highest spatial resolution feasible, allowing public health researchers to choose the spatial resolution used in analysis based on an understanding of the spatial dimensions of the health outcomes and exposures being investigated. Such research, however, must acknowledge how disparate geocoding success across subpopulations may affect findings.

7.
Geocoding has become a routine task for many research investigations to conduct spatial analysis. However, the output quality of geocoding systems is found to impact the conclusions of subsequent studies that employ this workflow. The published development of geocoding systems has been limited to the same set of interpolation methods and reference data sets for quite some time. We introduce a novel geocoding approach utilizing object detection on remotely sensed imagery based on a deep learning framework to generate rooftop geocoding output. This allows geocoding systems to use and output exact building locations without employing typical geocoding interpolation methods or being completely limited by the availability of reference data sets. The utility of the proposed approach is demonstrated over a sample of 22,481 addresses resulting in significant spatial error reduction and match rates comparable to typical geocoding methods. For different land‐use types, our approach performs better on low‐density residential and commercial addresses than on high‐density residential addresses. With appropriate model setup and training, the proposed approach can be extended to search different object locations and to generate new address and point‐of‐interest reference data sets.

8.
Delineated built‐up areas may be used for applications such as navigation, database enrichment and the identification of urban sprawl. As more road network data have become available, many studies have considered using road network data to delineate built‐up areas. This study investigated three existing approaches to delineating built‐up areas on a map: the grid‐based approach, kernel density analysis and an approach based on street blocks. These approaches were evaluated and compared from three angles. First, two measures were proposed to quantitatively evaluate the land area of the delineated built‐up areas; second, a questionnaire was designed to visually compare the representations of the delineated built‐up areas; and, third, the time complexity of using each approach was tested. The three approaches were applied to different sets of road network data for New Zealand; data from buildings and residential areas were used as benchmarks. The results showed that: (1) in a quantitative assessment, both the grid‐based approach and kernel density analysis can usually detect more built‐up areas than the approach based on street blocks; (2) on visual inspection, most of the students who completed the questionnaire preferred the representations produced by the approach based on street blocks; and (3) in time complexity, the approach based on street blocks always takes the least time.

9.
Geocoding urban addresses usually requires the use of an underlying address database. Under the influence of the format defined for TIGER files decades ago, most address databases and street geocoding algorithms are organized around street centerlines, associating numbering ranges to thoroughfare segments between two street crossings. While this method has been successfully employed in the USA for a long time, its transposition to other countries may lead to increased errors. This article presents an evaluation of the centerline‐geocoding resources provided by Google Maps, as compared to the point‐geocoding method used in the city of Belo Horizonte, Brazil, which we took as a baseline. We generated a textual address for each point object found in the city's point‐based address database, and submitted it to the Google Maps geocoding API. We then compared the resulting coordinates with the ones recorded in Belo Horizonte's GIS. We demonstrate that the centerline segment interpolation method, employed by the online resources following the American practice, has problems that can considerably influence the quality of the geocoding outcome. Completeness and accuracy have been found to be irregular, especially within lower income areas. Such errors in online services can have a significant impact on geocoding efforts related to social applications, such as public health and education, since the online service can be faulty and error‐prone in the most socially demanding areas of the city. In the conclusion, we point out that a volunteered geographic information (VGI) approach can help with the enrichment and enhancement of current geocoding resources, and can possibly lead to their transformation into more reliable point‐based geocoding services.
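
The centerline interpolation idea described in this abstract can be illustrated with a minimal sketch. It assumes a simplified, hypothetical segment schema (two endpoints plus the house numbers at each end) and plain linear interpolation along a straight segment. It is not the method used by Google Maps or by any particular geocoder, and it ignores side-of-street offsets and curved geometries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """A street centerline segment between two crossings (hypothetical schema)."""
    x1: float
    y1: float
    x2: float
    y2: float
    from_number: int  # house number at the (x1, y1) end
    to_number: int    # house number at the (x2, y2) end


def interpolate_address(seg: Segment, house_number: int) -> tuple[float, float]:
    """Place a house number along the segment in proportion to its position in the range."""
    low, high = sorted((seg.from_number, seg.to_number))
    if not low <= house_number <= high:
        raise ValueError("house number outside the segment's numbering range")
    if seg.from_number == seg.to_number:
        t = 0.5  # single-number range: fall back to the segment midpoint
    else:
        t = (house_number - seg.from_number) / (seg.to_number - seg.from_number)
    return (seg.x1 + t * (seg.x2 - seg.x1), seg.y1 + t * (seg.y2 - seg.y1))


# Illustrative segment: numbers 100 to 200 along a 100-unit straight street.
seg = Segment(0.0, 0.0, 100.0, 0.0, from_number=100, to_number=200)
point = interpolate_address(seg, 150)  # (50.0, 0.0)
```

The sketch makes the source of error visible: the estimated position depends only on the numbering range, so irregular lot sizes or out-of-sequence numbering shift the result away from the true building location.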

10.
Place is a concept that is fundamental to how we orientate and communicate space in our everyday lives. Crowdsourced social media data present a valuable opportunity to develop bottom‐up inferences of places that are integral to social activities and settings. Conventional location‐led approaches use a predefined spatial unit to associate data and space with places, which cannot capture the richness of urban places (i.e., spatial extents and their dynamic functions). This article develops a name‐led framework to overcome these limitations in using social media data to study urban places. The framework first derives place names from georeferenced Twitter data combining text mining and spatial point pattern analysis, then estimates the spatial extents by spatial clustering, and further extracts their dynamic functions with time, which makes up a complete place profile. The framework is tested on a case study in Camden Borough, London and the results are evaluated through comparisons to the Foursquare point of interest data. This name‐led approach enables the shift from space‐based analysis to place‐based analysis of urban space.

11.
Exposure to traffic‐related pollutants is associated with both morbidity and mortality. Because vehicle exhausts are highly localized, within a few hundred meters of heavily traveled roadways, highly accurate spatial data are critical in studies concerned with exposure to vehicle emissions. We compared the positional accuracy of a widely used U.S. Geological Survey (USGS) roadway network containing traffic activity data versus a global positioning system (GPS)‐validated road network without traffic information; developed a geographical information system (GIS)‐based methodology for producing improved roadway data associated with traffic activities; evaluated errors from geocoding processes; and used the CALINE4 dispersion model to demonstrate potential exposure misclassifications due to inaccurate roadway data or incorrectly geocoded addresses. The GIS‐based algorithm we developed was effective in transferring vehicle activity information from the less accurate USGS roadway network to a GPS‐accurate road network, with a match rate exceeding 95%. Large discrepancies, up to hundreds of meters, were found between the two roadway networks, with the GPS‐validated network having higher spatial accuracy. In addition, identifying and correcting errors associated with geocoding resulted in improved address matching. We demonstrated that discrepancies in roadway geometry and geocoding errors can lead to serious exposure misclassifications, up to an order of magnitude in assigned pollutant concentrations.

12.
Initiated by the University Consortium of Geographic Information Science (UCGIS), the GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document geospatial topics related to geographic information science and technologies (GIS&T). In recent years, the GIS&T BoK has undergone rigorous development in terms of its topic re-organization and content updating, resulting in a new digital version of the project. While the BoK topics provide useful materials for researchers and students to learn about GIS, the semantic relationships among the topics, such as semantic similarity, should also be identified so that better, automated topic navigation can be achieved. Currently, the related topics are defined manually by either editors or authors, which may result in an incomplete assessment of topic relationships. To address this challenge, our research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text, including both deep neural networks and traditional machine learning approaches. In addition, a novel text summarizer, KACERS (Keyword-Aware Cross-Encoder-Ranking Summarizer), is proposed to generate a semantic summary of scientific publications. By identifying the semantic linkages among key topics, this work guides the future development and content organization of the GIS&T BoK project. It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications and demonstrates the potential of the KACERS summarizer in semantic understanding of long text documents.

13.
Integrating data on health outcomes with methods of disease mapping and spatially explicit models of environmental contaminants is an important aspect of environmental health surveillance. In this article, we describe a modular, web‐based spatial analysis system that uses GIS, spatial analysis methods and software services delivered over computer networks to achieve this end. The Environmental Health Surveillance System (EHSS) is a prototype system that is designed to serve three purposes: a secure environment for producing maps of disease outcomes from individual‐level data while preserving privacy; an automated process of linking environmental data, environmental models, and GIS tasks like geocoding for the purposes of estimating individual exposures to environmental contaminants; and mechanisms to visualize the spatial patterns of disease outcomes via Web‐based mapping interfaces and interactive tools like Google Earth.

14.
15.
GeoTxt: A scalable geoparsing system for unstructured text geolocation   Total citations: 1 (self-citations: 0, citations by others: 1)
In this article we present GeoTxt, a scalable geoparsing system for the recognition and geolocation of place names in unstructured text. GeoTxt offers six named entity recognition (NER) algorithms for place name recognition, and utilizes an enterprise search engine for the indexing, ranking, and retrieval of toponyms, enabling scalable geoparsing for streaming text. GeoTxt offers a flexible application programming interface (API), allowing for customized attribute and/or spatial ranking of retrieved toponyms. We evaluate the system on a corpus of manually geo‐annotated tweets. First, we benchmark the performance of the six NERs that GeoTxt provides access to. Second, we assess GeoTxt toponym resolution accuracy incrementally, demonstrating improvements in toponym resolution achieved (or not achieved) by adding specific heuristics and disambiguation methods. Compared to using the GeoNames web service, GeoTxt's toponym resolution demonstrates a 20% accuracy gain. Our results show that places mentioned in the same tweet do not tend to be geographically proximate.

16.
The widespread use of Internet-based mapping and geospatial analysis has increased the demand for online geocoding services. Although such services offer convenience, low (or no) cost and immediate solutions, these characteristics sometimes overshadow the expectation that they produce high-quality geocoded results. In recent years, several geocoding techniques have emerged, including rooftop geocoding, but they have yet to receive much attention in the literature. This paper examines and compares the quality of online rooftop and street geocoding services based on match rates and positional accuracy. Six geocoding services by five providers (i.e., Microsoft Virtual Earth, Google, Geocoder.us, MapQuest, and Yahoo!) were evaluated using addresses in Allegheny County, Pennsylvania. Results of the comparison indicate that rooftop geocoding produces slightly lower match rates but significantly higher positional accuracy than street geocoding. The hybrid service, which combines the two techniques, produces match rates as high as other street geocoding services while improving positional accuracy to a level close to that of rooftop geocoding. Geocoding services employing reference databases of similar quality tend to produce comparable match rates and positional accuracy. This paper also examines the sensitivity of geocoding quality to different address types. The results reveal that both rooftop and street geocoding produce high match rates and high accuracy for residential addresses. However, the positional accuracies reported for agricultural and industrial address types are not very reliable because of small sample sizes. Based on these findings, it is recommended to use online rooftop geocoding services if high positional accuracy is the priority, street geocoding if a high match rate is the priority, and the hybrid approach if both high match rates and high positional accuracy are required.
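
The two quality measures named in this abstract, match rate and positional accuracy, can be sketched as follows. This is an illustrative assumption rather than the paper's evaluation procedure: positional error is taken as the great-circle (haversine) distance between each geocoded point and a reference point, and the function names, input dictionaries and coordinates are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def evaluate(results, truth):
    """Return (match rate, mean positional error in meters).

    results: address -> (lat, lon), or None when the service found no match.
    truth:   address -> (lat, lon) reference location.
    """
    errors = [
        haversine_m(*results[a], *truth[a])
        for a in truth
        if results.get(a) is not None
    ]
    match_rate = len(errors) / len(truth)
    mean_error = sum(errors) / len(errors) if errors else float("nan")
    return match_rate, mean_error


# Illustrative data: one exact match and one unmatched address.
truth = {"a": (40.44, -79.99), "b": (40.45, -80.00)}
results = {"a": (40.44, -79.99), "b": None}
match_rate, mean_error_m = evaluate(results, truth)  # (0.5, 0.0)
```

Reporting the two numbers separately matters because, as the abstract notes, a service can trade one for the other: rooftop geocoding matched slightly fewer addresses but placed them more accurately.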

17.
Today, many services that can geocode addresses are available to domain scientists and researchers, software developers, and end‐users. For a number of reasons, including the quality of the reference database and the interpolation technique, a given address geocoded by different services often does not result in the same location. Considering that there are many widely available and accessible geocoding services and that each geocoding service may utilize a different reference database and interpolation technique, selecting a suitable geocoding service that meets the requirements of any application or user is a challenging task. This is especially true for online geocoding services, which are often used as black boxes and do not provide knowledge about the reference databases and the interpolation techniques they employ. In this article, we present a geocoding recommender algorithm that can recommend optimal online geocoding services by taking into account the characteristics (positional accuracy and match rate) of the services and the preferences of the user and/or their application. The algorithm is simulated and analyzed using six popular online geocoding services for different address types (agricultural, commercial, industrial, residential) and preferences (match rate, positional accuracy).

18.
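Design and implementation of an address matching engine in a geocoding system   Total citations: 4 (self-citations: 0, citations by others: 4)
This paper analyzes the address matching workflow and several fuzzy retrieval techniques, adopts the full-text search engine library Lucene to design an address matching engine, and implements a geocoding system based on XML Web Services.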

19.
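XML is an advanced, next-generation web markup language, and SVG is an open-standard, XML-based text language for describing vector graphics. Using SVG in WebGIS makes it possible to describe all kinds of high-quality vector graphics, and DOM techniques allow a range of operations on XML/SVG data documents. By creating DOM objects, this paper implements access to and querying of SVG data documents, the addition of attributes to document elements, and the insertion and deletion of elements, thereby realizing XML/SVG document data management and WebGIS map operations.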
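
The DOM operations listed in this abstract (querying, adding attributes, inserting and deleting elements) can be sketched with Python's standard `xml.dom.minidom` module. The SVG snippet, element ids and label text are made up for illustration, and the original work most likely used a browser-side DOM rather than Python.

```python
from xml.dom import minidom

# A tiny, made-up SVG map with one road and one point of interest.
SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">'
    '<path id="road-1" d="M0 50 L200 50" stroke="black"/>'
    '<circle id="poi-1" cx="40" cy="50" r="5"/>'
    '</svg>'
)

doc = minidom.parseString(SVG)
root = doc.documentElement

# Query: collect the id of every <circle> element.
circle_ids = [c.getAttribute("id") for c in root.getElementsByTagName("circle")]

# Add an attribute to an existing element.
road = root.getElementsByTagName("path")[0]
road.setAttribute("stroke-width", "2")

# Insert a new element with a text node.
label = doc.createElement("text")
label.setAttribute("x", "50")
label.setAttribute("y", "45")
label.appendChild(doc.createTextNode("Station"))
root.appendChild(label)

# Delete elements: iterate over a copy, because the tree is modified in the loop.
for circle in list(root.getElementsByTagName("circle")):
    circle.parentNode.removeChild(circle)

updated_svg = root.toxml()
```

Because SVG is plain XML, the same generic DOM calls serve both document management and map manipulation, which is the point the abstract makes about combining XML/SVG with WebGIS.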

20.
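Research on a place-name and address word segmentation algorithm in a geocoding system   Total citations: 4 (self-citations: 0, citations by others: 4)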
张林曼  吴升 《测绘科学》2010,35(2):46-48
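This paper analyzes and studies techniques related to Chinese word segmentation and, taking the characteristics of geocoding into account, designs a three-layer combined segmentation dictionary data structure based on two-character hashing and arrays, together with a forward maximum matching segmentation algorithm that adds one character at a time. Out-of-vocabulary words in place names and addresses are identified by building and traversing a dictionary of generic place-name terms. Tests show that the algorithm performs well in both speed and segmentation quality, and that it addresses the Chinese word segmentation problems encountered in geographic information systems.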
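
Forward maximum matching with one-character-at-a-time growth can be sketched as below. The index here is a plain Python dictionary keyed by the first two characters of each word. It is a simplified stand-in for, not a reconstruction of, the three-layer hash-and-array dictionary the abstract describes, and the sample vocabulary and address are made up.

```python
from collections import defaultdict


def build_index(words):
    """Index dictionary words by their first two characters (simplified stand-in)."""
    index = defaultdict(set)
    for word in words:
        if len(word) >= 2:
            index[word[:2]].add(word)
    return index


def segment(text, index):
    """Forward maximum matching that grows each candidate one character at a time."""
    tokens = []
    i = 0
    while i < len(text):
        candidates = index.get(text[i:i + 2], set())
        max_len = max((len(w) for w in candidates), default=0)
        best_end = i + 1  # fall back to a single character if nothing matches
        end = i + 2
        while end <= min(len(text), i + max_len):
            if text[i:end] in candidates:
                best_end = end  # remember the longest match seen so far
            end += 1
        tokens.append(text[i:best_end])
        i = best_end
    return tokens


# Made-up vocabulary and address for illustration.
index = build_index(["福建省", "福州市", "鼓楼区", "工业路", "福州"])
tokens = segment("福建省福州市鼓楼区工业路523号", index)
# ['福建省', '福州市', '鼓楼区', '工业路', '5', '2', '3', '号']
```

The single-character leftovers at the end of the result are the kind of fragments that a later step, such as the generic place-name dictionary lookup mentioned in the abstract, would need to regroup into out-of-vocabulary words.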
