
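A Bootstrapping Method for Open Geo-entity Relation Extraction
Cite this article: YU Li, LU Feng, LIU Xiliang. A Bootstrapping Method for Open Geo-entity Relation Extraction [J]. Acta Geodaetica et Cartographica Sinica, 2016, 45(5): 616-622.
Authors: YU Li, LU Feng, LIU Xiliang
Affiliations: 1. State Key Lab of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101; 2. University of Chinese Academy of Sciences, Beijing 100101; 3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing, Jiangsu 210023
Funding: The National Natural Science Foundation of China (41271408); The National High-Tech Research and Development Program of China (863 Program) (2013AA120305)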
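Abstract: Extracting spatial and semantic relations between geo-entities from web text demands high timeliness and strong robustness. This paper proposes an automatic method for extracting open geo-entity relations. Bootstrapping is used to collect the part-of-speech, position, and distance features of words, from which the weight of each word in its context is computed. The keywords that describe geo-entity relations are determined from these weights and are finally organized into structured instances. Experiments were conducted using Baidu Baike and Stanford CoreNLP. The results show that the method automatically mines some of the lexical features of natural language, requires neither domain expert knowledge nor large-scale annotated corpora, and is suitable for information extraction tasks with unknown relation types. Compared with the classical frequency-based statistical methods Frequency, TFIDF, and PPMI, precision and recall improve by about 5% and 23%, respectively.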

Keywords: text mining; geo-entities; relation extraction; quantitative evaluation; bootstrapping
Received: 2015-04-07
Revised: 2016-02-02

A Bootstrapping Based Approach for Open Geo-entity Relation Extraction
YU Li, LU Feng, LIU Xiliang. A Bootstrapping Based Approach for Open Geo-entity Relation Extraction [J]. Acta Geodaetica et Cartographica Sinica, 2016, 45(5): 616-622.
Authors: YU Li, LU Feng, LIU Xiliang
Institution: 1. State Key Lab of Resources and Environmental Information System, The Institute of Geographic Sciences and Natural Resources Research, Beijing 100101, China; 2. University of Chinese Academy of Sciences, Beijing 100101, China; 3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
Abstract: Extracting spatial and semantic relations between geo-entities from Web texts calls for robust and effective solutions. This paper puts forward a novel approach. First, the characteristics of terms (part-of-speech, position, and distance) are analyzed by means of bootstrapping. Second, the weight of each term is calculated, and the keyword is picked out as the clue to the geo-entity relation. Third, the geo-entity pairs and their keywords are organized into structured information. Finally, an experiment is conducted with Baidubaike and Stanford CoreNLP. The study shows that the presented method can automatically explore part of the lexical features and find additional relational terms, requiring neither domain expert knowledge nor large-scale corpora. Moreover, compared with three classical frequency statistics methods, namely Frequency, TF-IDF, and PPMI, precision and recall are improved by about 5% and 23%, respectively.
Keywords:text mining  geo-entities  relation extraction  quantitative evaluation  bootstrapping
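
For readers unfamiliar with the three baseline statistics named in the abstract (Frequency, TF-IDF, and PPMI), the following is a minimal Python sketch of how each could score candidate relation keywords. It is not the authors' implementation. The toy contexts, the function names `score_terms` and `top_keyword`, and the choice to treat all context words of one entity pair as a single "document" are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): context words observed
# between two geo-entities, grouped by entity pair.
contexts = {
    ("Yangtze River", "Shanghai"): ["flows", "through", "the", "city", "flows"],
    ("Beijing", "China"): ["is", "the", "capital", "of", "capital"],
    ("Nanjing", "Jiangsu"): ["is", "the", "capital", "of", "the", "province"],
}


def score_terms(contexts):
    """Score every context word of every entity pair with three statistics.

    Assumption: each entity pair's context words form one "document".
    """
    counts = {pair: Counter(words) for pair, words in contexts.items()}
    total = sum(sum(c.values()) for c in counts.values())  # all tokens
    word_totals = Counter()  # corpus-wide count of each word
    doc_freq = Counter()     # number of pairs whose contexts contain the word
    for c in counts.values():
        word_totals.update(c)
        doc_freq.update(c.keys())
    n_pairs = len(counts)

    scores = {}
    for pair, c in counts.items():
        pair_total = sum(c.values())
        rows = {}
        for word, n in c.items():
            # TF-IDF: relative term frequency times log inverse document frequency.
            tfidf = (n / pair_total) * math.log(n_pairs / doc_freq[word])
            # PMI: log2 of p(word, pair) / (p(word) * p(pair)); PPMI clips at 0.
            pmi = math.log2(
                (n / total) / ((word_totals[word] / total) * (pair_total / total))
            )
            rows[word] = {"frequency": n, "tfidf": tfidf, "ppmi": max(0.0, pmi)}
        scores[pair] = rows
    return scores


def top_keyword(rows, metric):
    """Return the highest-scoring word under one metric (first word wins ties)."""
    return max(rows, key=lambda w: rows[w][metric])


scores = score_terms(contexts)
for pair, rows in scores.items():
    print(pair, {m: top_keyword(rows, m) for m in ("frequency", "tfidf", "ppmi")})
```

All three statistics rely only on co-occurrence counts. The approach summarized in the abstract differs in that it additionally weights terms by their part-of-speech, position, and distance features.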
This article is indexed in CNKI, Wanfang Data, and other databases.