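Chinese Address Matching Method Based on a Pseudo-Semantic Similarity Model
Cite this article: YU Ting, WANG Duo, CHEN Qin. Chinese address matching method based on a pseudo-semantic similarity model[J]. Bulletin of Surveying and Mapping (测绘通报), 2022, 0(3): 101-106.
Authors: YU Ting (郁汀), WANG Duo (王铎), CHEN Qin (陈钦)
Affiliations: 1. The Third Research Institute of Ministry of Public Security, Shanghai 200031, China; 2. Fudan University, Shanghai 200433, China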
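Abstract: In address matching, traditional similarity models depend heavily on the number of overlapping characters, so mismatches are common when address element units are abbreviated or shortened. Deep learning methods need large numbers of training samples, but the sheer volume of address data and the variety of its forms make generating those samples too costly. To address these problems, this paper first applies a model based on conditional random fields and bidirectional long short-term memory neural networks (BiLSTM-CRF) to segment addresses, and then constructs a pseudo-semantic similarity to match address elements level by level. Tested on address data from public security operations, the proposed model handles non-standard address descriptions such as abbreviations and shortened forms well, with every reference metric above 0.9.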
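The abstract names two ideas: character-overlap similarity breaks down on abbreviated elements, and parsed address elements are matched level by level. The following is a minimal sketch of both, under stated assumptions. It assumes addresses are already segmented into (level, element) pairs, and it uses `char_jaccard` as one common character-overlap measure. The `ALIASES` table and `element_score` are a hypothetical stand-in, not the paper's pseudo-semantic similarity, whose definition the abstract does not give. The coarse-to-fine, stop-at-first-mismatch rule in `match_levels` is likewise an illustrative assumption.

```python
# Sketch only. ASSUMPTIONS (hypothetical, not from the paper): addresses are
# already parsed into (level, element) pairs, and ALIASES / element_score stand
# in for the pseudo-semantic similarity, which the abstract does not define.

def char_jaccard(a: str, b: str) -> float:
    """Character-overlap (Jaccard) similarity of two strings."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical alias table mapping abbreviations to standard names.
ALIASES = {"沪": "上海市", "复旦": "复旦大学"}

def element_score(a: str, b: str) -> float:
    """Hypothetical stand-in: 1.0 if both elements normalise to the same
    standard name, otherwise fall back to character overlap."""
    na, nb = ALIASES.get(a, a), ALIASES.get(b, b)
    return 1.0 if na == nb else char_jaccard(na, nb)

def match_levels(query, candidate, score=element_score, threshold=0.8):
    """Compare parsed addresses level by level (coarse to fine) and return
    the levels that matched, stopping at the first level that does not."""
    cand = dict(candidate)
    matched = []
    for level, element in query:
        if level not in cand:
            continue  # level absent from the candidate: skip it
        if score(element, cand[level]) < threshold:
            break
        matched.append(level)
    return matched

query = [("city", "沪"), ("district", "徐汇区"), ("road", "漕溪北路")]
standard = [("city", "上海市"), ("district", "徐汇区"), ("road", "漕溪北路")]

print(char_jaccard("沪", "上海市"))                         # 0.0: no shared characters
print(match_levels(query, standard, score=char_jaccard))  # []
print(match_levels(query, standard))                      # ['city', 'district', 'road']
```

With character overlap alone, 沪 and 上海市 share no characters, so matching fails at the city level. With the alias stand-in, all three levels match. A fixed lookup table, however, only recognizes abbreviations entered in advance.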

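Keywords: conditional random field and bidirectional long short-term memory neural network (BiLSTM-CRF)  address element parsing  pseudo-semantic similarity  address matching  address standardization  
Received: 2021-03-16
Revised: 2022-01-21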

A Chinese addresses matching method based on the pseudo-semantic model
YU Ting,WANG Duo,CHEN Qin.A Chinese addresses matching method based on the pseudo-semantic model[J].Bulletin of Surveying and Mapping,2022,0(3):101-106.
Authors:YU Ting  WANG Duo  CHEN Qin
Institution:1. The Third Research Institute of Ministry of Public Security, Shanghai 200031, China;2. Fudan University, Shanghai 200433, China
Abstract: Because address elements can be expressed in many ways, including abbreviations and logograms, address matching is a difficult task, especially for Chinese addresses. One important family of address matching methods relies on similarity measures. However, these traditional similarity measures focus on overlapping characters and cannot handle abbreviated or logogram forms of address elements. The other important family is based on deep learning, but it is difficult to generate the large number of training samples it needs. In this paper, a bidirectional long short-term memory conditional random field (BiLSTM-CRF) model is applied to segment Chinese addresses. Then a new similarity, named pseudo-semantic similarity, is constructed to handle abbreviations and logograms. According to the current results, the pseudo-semantic similarity performs better than other similarity models in the matching process, and both its recall and precision reach 0.9 on the test set. Examples show that the pseudo-semantic similarity can recognize abbreviations and logograms of address elements.
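The abstract states that a BiLSTM-CRF model performs the address segmentation. In such a model the BiLSTM scores every character against every tag, and the CRF layer then chooses the best complete tag sequence by Viterbi decoding over those emission scores plus tag-to-tag transition scores. The following is a minimal sketch of that decoding step. It assumes a BMES tagging scheme (the abstract does not say which scheme the authors use) and uses hand-picked scores in place of a trained network. The names `viterbi`, `segment`, and `peak` and all score values are hypothetical.

```python
# Sketch of linear-chain CRF (Viterbi) decoding as used in a BiLSTM-CRF
# segmenter. ASSUMPTIONS (illustrative only): BMES tags, hand-picked emission
# scores standing in for BiLSTM outputs, and hand-picked transition scores.

TAGS = ["B", "M", "E", "S"]  # Begin, Middle, End, Single-character element
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
NEG = -100.0  # penalty for tag transitions that BMES does not allow
TRANSITIONS = {(p, c): 0.0 if (p, c) in ALLOWED else NEG for p in TAGS for c in TAGS}
START, END = {"B", "S"}, {"E", "S"}  # valid first and last tags

def viterbi(emissions, transitions):
    """Return the tag sequence maximising emission + transition scores."""
    best = {t: emissions[0][t] + (0.0 if t in START else NEG) for t in TAGS}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in TAGS:
            # Best previous tag given the scores of paths ending at the last position.
            prev = max(TAGS, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    last = max(TAGS, key=lambda t: best[t] + (0.0 if t in END else NEG))
    path = [last]
    for pointers in reversed(backpointers):  # follow back-pointers to the start
        path.append(pointers[path[-1]])
    return path[::-1]

def segment(chars, tags):
    """Group characters into elements, closing one at each E or S tag."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in END:
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words

def peak(tag):
    """Hypothetical emission row that strongly prefers one tag."""
    return {t: (2.0 if t == tag else 0.0) for t in TAGS}

text = "上海市徐汇区"
# 市 is made ambiguous here: its emission slightly prefers B over E.
EMISSIONS = [peak("B"), peak("M"), {"B": 1.2, "M": 0.0, "E": 1.0, "S": 0.0},
             peak("B"), peak("M"), peak("E")]

greedy = [max(TAGS, key=lambda t: row[t]) for row in EMISSIONS]
tags = viterbi(EMISSIONS, TRANSITIONS)
print(greedy)               # ['B', 'M', 'B', 'B', 'M', 'E']
print(tags)                 # ['B', 'M', 'E', 'B', 'M', 'E']
print(segment(text, tags))  # ['上海市', '徐汇区']
```

Picking the best tag at each position independently would tag 市 as B and produce an invalid sequence containing M→B and B→B. The transition scores let the CRF layer reject that sequence and recover the elements 上海市 and 徐汇区.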
Keywords:BiLSTM-CRF  resolution of address elements  pseudo-semantic model  addresses matching  address standardization  
This article is indexed in databases including Wanfang Data.