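Annotation of Geographical Named Entities in Chinese Text   Total citations: 1, self-citations: 0, citations by others: 1
Semantic parsing of the geographic information in text helps people understand spatial cognition and the regularities of spatial language, removes the semantic barrier between natural language and geographic information systems (GIS), and makes GIS spatial query, spatial reasoning, geographic information retrieval, and geographic information services more intelligent. Defining an annotation scheme and building an annotated corpus make it possible to discover the linguistic structures used to describe geographic information in natural language and to establish their metadata. Based on an analysis of how Chinese text and GIS differ in describing and representing geographic entities, and taking into account the linguistic characteristics of geographical named entity descriptions, this paper defines an annotation scheme and annotation specification for geographical named entities in Chinese text. Using GATE (General Architecture for Text Engineering) as the annotation platform, it builds a large-scale annotated corpus, "GeoCorpus", based on 《中国大百科全书中国地理》 (Encyclopedia of China: Chinese Geography), which goes some way toward addressing the current shortage of relevant standards and of large-scale standard data.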

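CH20121724 Annotation of Geographical Named Entities in Chinese Text (中文文本的地理命名实体标注) / Zhang Xueying, Zhu Shaonan, Zhang Chunju (Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University) // Acta Geodaetica et Cartographica Sinica (《测绘学报》). 2012, 41(1): 115-120. Semantic parsing of geographic information effectively addresses the semantic barrier between natural language and geographic information systems. Based on an analysis of how Chinese text and geographic information systems differ in describing and representing geographic entities, and taking into account the linguistic characteristics of geographical named entity descriptions, the paper defines an annotation scheme and annotation specification for geographical named entities in Chinese text and, using GATE (General Architecture for Text Engineering) as the annotation platform, constructs, based on 《中国大…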

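To evaluate geographic annotations in web map service environments, this work extends traditional geographic annotation content to unstructured text, defines a broad notion of geographic annotation, and models the annotating behavior as a placement process. First, the text content of geographic annotations is classified by a word-frequency classification method, and a quantitative, convergent description of the spatial correlation of the placement neighborhood is built from Voronoi k-order neighbor relations. The annotation text is then combined with the neighborhood in which the annotation lies to build a placement model based on annotation types and type transitions. Experimental results show that, given two known types of geographic annotation sets, the placement model can effectively evaluate the plausibility of newly added geographic annotations.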

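Because the classification standards of web data sources differ from, and are independent of, China's classification standards, this paper proposes a weight-aware algorithm for automatically grouping and labeling place names on maps. Based on the category attribute of the layer features, the best-matching weight knowledge is looked up in a knowledge base and the feature labels are divided into groups. Overlay analysis and optimization analysis are performed on the highest-weight group to obtain an optimal labeling. The optimized labels are then overlaid with the group of next-highest weight and optimized again, yielding the optimal labeling of that group. This step is repeated until the lowest-level group has been optimally labeled. The method automatically produces differentiated labels for features of different weights within the same layer of integrated data. The algorithm was implemented using the Chinese place-name classification codes and the OSM classification codes.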
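
The group-by-group procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the overlay or optimization analysis, so this sketch substitutes a simple greedy rule (a label is kept only if its bounding box does not overlap a label already placed by a higher-weight group). The `Label` class, the `overlaps` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Label:
    """Hypothetical label record: name, group weight, bounding box."""
    name: str
    weight: int
    box: tuple  # (xmin, ymin, xmax, ymax)


def overlaps(a, b):
    """Return True if two axis-aligned boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_by_weight(labels):
    """Place labels group by group, highest weight first.

    Assumption: a greedy overlap test stands in for the overlay and
    optimization analysis that the abstract mentions but does not detail.
    """
    placed = []
    ordered = sorted(labels, key=lambda label: -label.weight)
    for _, group in groupby(ordered, key=lambda label: label.weight):
        for label in group:
            if not any(overlaps(label.box, p.box) for p in placed):
                placed.append(label)
    return placed


labels = [
    Label("village", 1, (10, 10, 12, 11)),
    Label("city", 3, (0, 0, 4, 2)),
    Label("town", 2, (3, 1, 6, 3)),  # collides with "city"
]
placed_names = [label.name for label in place_by_weight(labels)]
```

In this toy input the "town" label is dropped because it collides with the higher-weight "city" label, while "village" is kept. A fuller implementation would try alternative label positions before discarding a label.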

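Line features are abstract representations of entities such as roads, pipelines, and boundaries, and are among the most basic components of a map display. Labeling line features helps people grasp the geographic characteristics of objects intuitively. Labeling algorithms are a basic function of GIS software. An adaptive labeling algorithm is used to label line features of arbitrary shape automatically; the method is simple and effective.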

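Yu Li, Lu Feng, Liu Xiliang. Acta Geodaetica et Cartographica Sinica (《测绘学报》), 2016, 45(5): 616-622
Extracting spatial and semantic relations between geographic entities from web text demands timeliness and robustness. This paper proposes an automatic method for open extraction of geographic entity relations. Using a bootstrapping technique, it collects statistics on the part of speech, position, and distance features of words to compute the weight of each word in its context, uses these weights to identify the keywords that describe a geographic entity relation, and finally organizes the results into structured instances. Experiments were carried out with Baidu Baike and Stanford CoreNLP. The results show that the method automatically mines some lexical features of natural language, needs neither domain expert knowledge nor a large annotated corpus, and suits information extraction tasks with unknown relation types. Compared with the classic Frequency, TFIDF, and PPMI frequency-statistics methods, precision and recall improve by about 5% and 23%, respectively.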
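
For readers unfamiliar with the PPMI baseline named in this abstract, the sketch below computes positive pointwise mutual information, max(0, log2(P(w,c) / (P(w)P(c)))), from word-context co-occurrence counts. It illustrates only that baseline, not the paper's bootstrapping method, and the function name and the tiny count table are hypothetical.

```python
import math
from collections import Counter


def ppmi(pair_counts):
    """Map each (word, context) pair to its PPMI score.

    pair_counts: dict of (word, context) -> co-occurrence count.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    scores = {}
    for (word, context), n in pair_counts.items():
        if n == 0:
            # A pair that never co-occurs gets score 0 by convention.
            scores[(word, context)] = 0.0
            continue
        # P(w,c) / (P(w) * P(c)) simplifies to n * total / (n_w * n_c).
        pmi = math.log2(n * total / (word_totals[word] * context_totals[context]))
        scores[(word, context)] = max(0.0, pmi)
    return scores


# Hypothetical counts of candidate relation keywords next to entity types.
counts = {
    ("flows", "river"): 3,
    ("flows", "city"): 1,
    ("located", "river"): 1,
    ("located", "city"): 3,
}
scores = ppmi(counts)
```

Pairs that co-occur more often than chance, such as ("flows", "river") here, receive a positive score; pairs that co-occur less often than chance are clipped to zero.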

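Spatial Relation Address Model and Analysis of Its Expression Patterns   Total citations: 1, self-citations: 0, citations by others: 1
An address model is the basis of address parsing and address matching. For non-standard Chinese addresses, building on the hierarchical address model and the finite automaton model, this paper proposes a spatial relation address model and analyzes its advantages. It describes the structure of the model and analyzes how spatial relations are expressed in addresses, dividing them into containment, intersection, fuzzy offset, direction, and distance relations. An address annotation scheme and specification are designed according to the model, and a corpus of addresses annotated under this specification is analyzed statistically using a HashTable-based address model tree to identify the common expression patterns of Chinese addresses that involve spatial relations.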

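To overcome the limitations of manual summarization based on common sense, semantic constraint relations between spatial relation terms and geographic feature types are built automatically by combining qualitative and quantitative methods. First, taking the spatial relation annotated corpus of 《中国大百科全书(地理版)》 (Encyclopedia of China, Geography edition) as the base data, the Overlap semantic relatedness measure is used to mine prior knowledge of the constraints between spatial relation terms and geographic feature types. This prior knowledge is then extended using the conceptual semantic relatedness provided by a geographic feature classification system. Finally, the ontology tool Protégé is used to build the corresponding knowledge base.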

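In today's information society, with the rapid development of the Internet, a large amount of geographic information exists as unstructured text, and toponym recognition is an important basis for mining it. Existing toponym recognition methods mostly approach the task from the perspective of natural language processing and do not fully consider features such as how toponyms are composed and habitually used, which leads to low recognition rates or overfitting. This paper draws on linguistics to analyze the characters used in Chinese toponyms. Starting from the traditional structure of specific name plus generic name, it divides toponym morphemes into finer types, summarizes the features of each type, and incorporates these features into a conditional random field (CRF), turning toponym recognition into a sequence labeling problem. Based on the features of Chinese toponyms, formal rules are defined and a character-based annotation specification is designed. On this basis, feature templates for Chinese toponyms are designed, and Chinese toponyms in natural language text are recognized through CRF training and prediction. Experiments on a 1.7-million-character annotated People's Daily corpus show that the method achieves recall, precision, and F-measure of 92.69%, 96.73%, and 94.67%, respectively, outperforming previous results, and can provide more effective toponym services for research and applications in geographic information science.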
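
Two points in this abstract lend themselves to a short sketch: recasting toponym recognition as character-level sequence labeling, and the relation between the reported precision, recall, and F-measure. The paper's own tag set (with finer morpheme types) is not given in the abstract, so the plain B/I/O scheme below is an assumption, as are the function names and the sample sentence. The F-measure is assumed to be the usual harmonic mean of precision and recall.

```python
def char_tags(text, toponym_spans):
    """Tag each character B (begins a toponym), I (inside), or O (outside).

    toponym_spans: list of (start, end) character offsets, end exclusive.
    Assumption: a plain B/I/O scheme, not the paper's finer morpheme tags.
    """
    tags = ["O"] * len(text)
    for start, end in toponym_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# "南京市" (Nanjing City) occupies characters 2..4 of this sentence.
tags = char_tags("我在南京市工作", [(2, 5)])

# The precision and recall reported in the abstract, in percent.
f1 = f_measure(96.73, 92.69)
```

Under this assumption, the reported precision of 96.73% and recall of 92.69% give an F-measure that rounds to the 94.67% stated in the abstract. A CRF would be trained to predict such per-character tags from the feature templates.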

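Construction and Service Sharing of the Chinese Geocoding Database for "Digital Hubei"   Total citations: 1, self-citations: 1, citations by others: 0
For the construction of the "Digital Hubei" geospatial framework, this paper discusses Chinese geocoding techniques suited to the conditions of Hubei Province. Existing address entities in the province are spatialized and standardized, a standardized address database is established, and a shared Chinese geocoding service for Hubei Province is implemented. Built on the standard address database, the service is published through a standard REST interface to meet user needs.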

Introduction: In many applications (CAD and geographical information system, GIS), data are managed by spatial databases, which store point, line, region objects and spatial relations between them. Spatial relations (topological relations, direction relations and approximate distance r…

Spatial relations are frequently described and used in natural language texts, and relations play a core role in a range of applications, from supporting geographic information retrieval in natural language texts to locating people and objects in natural disaster response situations. In this article, we present a neuro-net spatial extraction model (NeuroSPE) designed to address various language irregularities (i.e., a variety of sentence structures) that occur in natural language texts. We also propose a two-stage workflow to generate a training dataset based on a collection of words and their associated frequencies. The first stage of the proposed workflow focuses on processing the words in the input data and their associated frequencies; then, the words are segmented into a set of groups and used to accelerate model training. The second stage automatically generates a variety of sentences that include two geographic entities and related spatial relation terms through deep learning iteration based on a unigram language model. We evaluate our method both qualitatively and quantitatively using a real dataset. The experimental results demonstrate that the proposed two-stage workflow effectively extracts spatial relations from natural language texts and outperforms other current state-of-the-art approaches.

Because SQL is ineffective for querying data from spatial databases, queries based on natural or visual language have gradually become an attractive research field. However, how to define and represent natural language related to spatial data remains a major problem, because existing models of direction relations cannot describe it using common concepts. First, detailed direction relations are proposed to describe directions related to the interior of spatial objects, such as "east part of a region" and "east boundary of a region". Second, by integrating the detailed directions with exterior direction relations and topological relations, several NLSRs are defined, such as "a road goes across the east part of a lake" and "a river goes along the east boundary of a province". Finally, based on these NLSRs, a natural spatial query language (NSQL) is formed to retrieve data from spatial databases.

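Based on an analysis of the main shortcomings of traditional geographic information system theory, this paper introduces linguistics into geographic information systems and proposes a linguistic model of spatial information, a cognitive model of the structure and expression of spatial information in digital environments. It examines the linguistic characteristics of spatial information, chiefly its linguistic units, grammar, and semantic mechanisms, and discusses their impact on each stage of a geographic information system, especially intelligent processing of spatial information.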

Social media messages, such as tweets, are frequently used by people during natural disasters to share real-time information and to report incidents. Within these messages, geographic locations are often described. Accurate recognition and geolocation of these locations are critical for reaching those in need. This article focuses on the first part of this process, namely recognizing locations from social media messages. While general named entity recognition tools are often used to recognize locations, their performance is limited due to the various language irregularities associated with social media text, such as informal sentence structures, inconsistent letter cases, name abbreviations, and misspellings. We present NeuroTPR, which is a Neuro-net ToPonym Recognition model designed specifically with these linguistic irregularities in mind. Our approach extends a general bidirectional recurrent neural network model with a number of features designed to address the task of location recognition in social media messages. We also propose an automatic workflow for generating annotated data sets from Wikipedia articles for training toponym recognition models. We demonstrate NeuroTPR by applying it to three test data sets, including a Twitter data set from Hurricane Harvey, and comparing its performance with those of six baseline models.

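With the rapid development of Internet technology, searching spatial data through web maps has become an important way for people to obtain spatial information. This article analyzes the shortcomings and bottlenecks of current map search, describes its deficiencies in handling spatial semantics, and proposes a Solr-based scheme for semantic search of spatial data: the full-text search engine Solr is applied to spatial data search, and natural language processing and ontology techniques are introduced to enable semantic search of spatial data from natural language queries. Finally, a prototype system is built to verify the scheme, demonstrating its feasibility and effectiveness.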

A spatial tessellation is a set of regions that are collectively exhaustive and mutually exclusive except for the boundaries. In geographical analysis, it may represent such administrative units as census tracts, postal zones, and electoral or school districts. Spatial tessellations of a certain area are often closely related to each other. Areas of local communities are related to school districts, market areas of retail stores, and administrative units. Postal zones and census tracts are determined by collecting or dividing administrative units. Analysis of such relations among tessellations often reveals their underlying spatial phenomena. To this end, this paper proposes a new exploratory method for analyzing the relations among spatial tessellations. It aims to detect spatial patterns, especially those with a hierarchical structure, and to provide a tessellation classification scheme. Topological relations and similarity measures are introduced to evaluate the relations between a pair of tessellations. For more than two tessellations, tree representations are proposed. These not only visualize relations, but also provide a means of classifying tessellations. The method is applied to the analysis of two sets of spatial tessellations: one with five hypothetical tessellations, and another with 34 candidate plans for the new Doshusei administrative system in Japan. The application reveals the properties of the method and of the quantitative measures used in the analysis.

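Starting from the functional metaphor that spatial information and language are both means by which humans communicate about space, this article summarizes theoretical research related to linguistic models in cartography, geographic information science, knowledge engineering, and other disciplines, and discusses the theoretical significance of the linguistic model of spatial information for the development of geographic information science, including its significance from the perspectives of disciplinary paradigm, ontology, methodology, and qualitative views.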

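Route description is a common everyday use of geospatial information and involves how natural language and map language are expressed and converted into each other. Using cognitive experiments, this paper analyzes route descriptions in natural language and the cognitive processes behind them, studies the spatial relations involved in route cognition and expression, and summarizes the regularities and idiomatic usages of expressing map spatial information in natural language.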
