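Word Embedding-based Method for Entity Category Alignment of Geographic Knowledge Base (基于词嵌入的地理知识库实体类别对齐方法研究)
Cite this article: XU Zhaohua, ZHU Yunqiang, SONG Jia, SUN Kai, WANG Shu. 基于词嵌入的地理知识库实体类别对齐方法研究[J]. 地球信息科学 (Geo-information Science), 2021, 23(8): 1372-1381.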
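Authors: XU Zhaohua (徐召华), ZHU Yunqiang (诸云强), SONG Jia (宋佳), SUN Kai (孙凯), WANG Shu (王曙)
Affiliations: 1. School of Architecture Engineering, Shandong University of Technology, Zibo 255000; 2. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101; 3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023; 4. University of Chinese Academy of Sciences, Beijing 100049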
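Funding: National Natural Science Foundation of China, General Program (41771430); Strategic Priority Research Program of the Chinese Academy of Sciences, Class A (XDA23100100); National Natural Science Foundation of China, Key Program (41631177)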
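Abstract: A geographic knowledge base is a collection of geographic entities and the relationships among them, and it provides important support for knowledge services such as intelligent search, question answering, and recommendation. However, because existing geographic knowledge bases differ in source, form, and builder, they exhibit semantic heterogeneity in entity place names, spatial locations, and categories: different forms can carry the same meaning, and the same form can carry different meanings. This hinders knowledge fusion and sharing among geographic knowledge bases. Semantic alignment is an effective way to resolve semantic heterogeneity, and entity category alignment is its foundation, playing an important role in improving the alignment accuracy of entity place names and spatial locations. Existing entity category alignment methods mainly measure category similarity with traditional string similarity and structural similarity, which cannot capture the deeper semantic relatedness of entity categories and therefore limit alignment accuracy. This paper therefore proposes a word embedding-based method for geographic entity category alignment. It uses a word embedding model to learn the semantic information of entity categories from a corpus and expresses that information as word vectors, compensating for the shortcomings of existing methods and improving entity alignment accuracy. Furthermore, by fusing a general corpus with a geographic information corpus, the paper strengthens the geographic semantics of the corpus used by the word embedding model, so that the relatedness between geographic entity categories is measured more accurately. Experiments on aligning entity categories across different geographic knowledge bases show that the proposed method effectively captures the deep semantic information of geographic entity categories; its harmonic mean (F1) for entity category alignment reaches 0.9568, effectively improving the alignment accuracy of entity categories.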

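Keywords: geographic knowledge base; semantic heterogeneity; geographic entity; entity category; category alignment; word embedding; word vector; geographic corpus; similarity
Received: 2020-09-29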

Word Embedding-based Method for Entity Category Alignment of Geographic Knowledge Base
XU Zhaohua, ZHU Yunqiang, SONG Jia, SUN Kai, WANG Shu. Word Embedding-based Method for Entity Category Alignment of Geographic Knowledge Base[J]. Geo-information Science, 2021, 23(8): 1372-1381.
Authors:XU Zhaohua  ZHU Yunqiang  SONG Jia  SUN Kai  WANG Shu
Institution: 1. School of Architecture Engineering, Shandong University of Technology, Zibo 255000, China; 2. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; 3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China; 4. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: A geographic knowledge base is a collection of geographic entities and the relationships between them, and it supports knowledge services such as intelligent search, question answering, and recommendation. However, because existing geographic knowledge bases differ in data source, data form, and publisher, they suffer from semantic heterogeneity in place names, spatial footprints, and feature types: the same concept appears in different forms, and the same form denotes different concepts. This is a barrier to knowledge sharing and fusion between geographic knowledge bases. Semantic alignment is an effective way to resolve semantic heterogeneity, and feature type alignment is its foundation, since it improves the accuracy of aligning place names and spatial footprints. Existing feature type alignment methods rely mainly on traditional string and structural similarity measures, which cannot capture the deep semantic correlation between feature types and therefore limit alignment accuracy. This paper proposes a word embedding-based method for aligning feature types. The method uses a word embedding model to learn the semantic information of feature types from a corpus and represents that information as word vectors, capturing deep semantic information that existing methods miss and thereby increasing alignment accuracy. In addition, the paper enhances the geographic semantics of the training corpus by combining a geographic information corpus with the general corpus used by the word embedding model, which helps measure the correlation between feature types more accurately. In the case study, the method is applied to align the feature types of different geographic knowledge bases. The results show that the F1 score (the harmonic mean of precision and recall) reaches 0.9568, indicating that the method effectively captures the deep semantic information of geographic feature types and improves the alignment accuracy of entity categories.
Keywords:geospatial knowledge base  semantic heterogeneity  geographical entity  feature type  type alignment  word embedding  word vector  geographical corpus  similarity  
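The abstract describes the approach only at a high level, so the following is a minimal sketch rather than the authors' implementation. It assumes that each category name has already been mapped to a word vector, that relatedness is measured with cosine similarity, and that a source category is matched to its most similar target category above a fixed threshold. The function names, the 0.8 threshold, the three-dimensional toy vectors, and the category names are all hypothetical; the paper's actual embedding model, corpus, and matching rule are not given in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_categories(source, target, threshold=0.8):
    """Match each source category to its most similar target category.

    `source` and `target` map category names to word vectors.
    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    matches = {}
    for s_name, s_vec in source.items():
        best_name, best_sim = None, threshold
        for t_name, t_vec in target.items():
            sim = cosine(s_vec, t_vec)
            if sim >= best_sim:
                best_name, best_sim = t_name, sim
        if best_name is not None:
            matches[s_name] = best_name
    return matches


def f1_score(predicted, gold):
    """F1: harmonic mean of precision and recall over aligned pairs."""
    true_pos = sum(1 for k, v in predicted.items() if gold.get(k) == v)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy vectors standing in for learned word embeddings.
kb_a = {"river": [0.9, 0.1, 0.0], "peak": [0.1, 0.9, 0.1], "airport": [0.0, 0.1, 0.9]}
kb_b = {"stream": [0.8, 0.2, 0.1], "mountain": [0.2, 0.8, 0.0], "school": [0.5, 0.5, 0.5]}
gold = {"river": "stream", "peak": "mountain"}

matches = align_categories(kb_a, kb_b)
score = f1_score(matches, gold)
```

In this toy setup, "airport" has no target category above the threshold and is left unaligned, which is the behavior a threshold is meant to provide when two knowledge bases do not cover the same categories.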
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.