

Thematic signatures for cleansing and enriching place-related linked data
Authors: Benjamin Adams, Krzysztof Janowicz
Institution: 1. Department of Computer Science, Centre for eResearch, The University of Auckland, Auckland, New Zealand (b.adams@auckland.ac.nz); 3. Department of Geography, University of California, Santa Barbara, CA, USA
Abstract: There has been significant progress in transforming semi-structured data about places into knowledge graphs that can be used in a wide variety of geographic information systems, such as digital gazetteers or geographic information retrieval systems. For instance, in addition to information about events, actors, and objects, DBpedia contains data about hundreds of thousands of places from Wikipedia and publishes it as Linked Data. Repositories that store data about places are among the most interlinked hubs on the Linked Data cloud. However, most content about places resides in unstructured natural language text and is therefore not captured in these knowledge graphs. Instead, place representations are limited to facts such as population counts, geographic locations, and relations to other entities, for example, headquarters of companies or historical figures. In this paper, we present a novel method to enrich the information stored about places in knowledge graphs using thematic signatures that are derived from unstructured text through topic modeling. As a proof of concept, we demonstrate that this enables the automatic categorization of articles into place types defined in the DBpedia ontology (e.g., mountain) and also provides a mechanism to infer relationships between place types that are not captured in existing ontologies. This method can also be used to uncover miscategorized places, a common problem arising from the automatic lifting of unstructured and semi-structured data.
Keywords: geographic knowledge discovery; ontology; semantics
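
Illustrative sketch: the abstract describes thematic signatures obtained by topic modeling and their use for categorizing articles into place types and for spotting miscategorized places. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each article has already been reduced to a topic distribution (a list of probabilities summing to 1), averages the distributions per place type, and assigns an article to the nearest type by Jensen-Shannon divergence. The function names, the toy three-topic data, and the choice of Jensen-Shannon divergence are all assumptions made for illustration; the abstract does not state which similarity measure or classifier is used.

```python
import math


def mean_signature(signatures):
    """Average several topic distributions into one type-level signature."""
    n = len(signatures)
    k = len(signatures[0])
    return [sum(s[i] for s in signatures) / n for i in range(k)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def classify(signature, type_signatures):
    """Return the place type whose signature is closest to the article's."""
    return min(
        type_signatures,
        key=lambda t: js_divergence(signature, type_signatures[t]),
    )


def flag_miscategorized(articles, type_signatures):
    """List (name, asserted_type, predicted_type) where the two disagree."""
    flagged = []
    for name, asserted, signature in articles:
        predicted = classify(signature, type_signatures)
        if predicted != asserted:
            flagged.append((name, asserted, predicted))
    return flagged


# Hypothetical toy data: three topics, two place types.
labeled = {
    "Mountain": [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    "City": [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
}
type_signatures = {t: mean_signature(s) for t, s in labeled.items()}

predicted = classify([0.65, 0.25, 0.10], type_signatures)

# An article asserted to be a City whose text reads like a Mountain article.
suspects = flag_miscategorized(
    [("Example Peak", "City", [0.7, 0.2, 0.1])], type_signatures
)
```

In this toy setup the disagreement between the asserted type and the nearest type-level signature is what marks an entry as a candidate for review; in practice the topic distributions would come from a topic model trained on the article text, and a flagged entry would still need manual or rule-based confirmation.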