Learning to combine multiple string similarity metrics for effective toponym matching
Authors: Rui Santos, Patricia Murrieta-Flores, Bruno Martins
Institution: 1. Instituto Superior Técnico and INESC-ID, University of Lisbon, Lisbon, Portugal; 2. Digital Humanities Research Center, University of Chester, Chester, UK
Abstract: Several tasks related to geographical information retrieval and to the geographical information sciences involve toponym matching, that is, the problem of matching place names that share a common referent. In this article, we present the results of a wide-ranging evaluation of the performance of different string similarity metrics on the toponym matching task. We also report on experiments involving the use of supervised machine learning for combining multiple similarity metrics, which has the natural advantage of avoiding the manual tuning of similarity thresholds. Experiments with a very large dataset show that the performance differences between the individual similarity metrics are relatively small, and that carefully tuning the similarity threshold is important for achieving good results. The methods based on supervised machine learning, particularly when considering ensembles of decision trees, can achieve good results on this task, significantly outperforming the individual similarity metrics.
Keywords: Toponym matching, supervised learning, string similarity metrics, duplicate detection, ensemble learning, geographic information retrieval
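
The two ideas in the abstract, tuning a threshold for a single similarity metric and describing each pair of names by several metrics at once, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three metrics (normalized edit distance, character-bigram Jaccard, and `difflib`'s ratio), the function names, and the toy labeled pairs are hypothetical and are not taken from the article, which does not list its metrics or data here.

```python
from difflib import SequenceMatcher


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def bigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the sets of character bigrams of a and b."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def features(a: str, b: str) -> list[float]:
    """Feature vector for one pair of names: one score per metric."""
    a, b = a.lower(), b.lower()
    return [
        levenshtein_similarity(a, b),
        bigram_jaccard(a, b),
        SequenceMatcher(None, a, b).ratio(),
    ]


def best_threshold(scores: list[float], labels: list[bool]) -> tuple[float, float]:
    """Sweep the observed scores as candidate thresholds; keep the most accurate."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical toy pairs (name_a, name_b, same_place); not data from the article.
pairs = [
    ("Lisboa", "Lisbon", True),
    ("Chester", "Chestre", True),
    ("Lisbon", "London", False),
    ("Porto", "Chester", False),
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

# Threshold tuning for one metric (column 0, the normalized edit distance).
threshold, accuracy = best_threshold([row[0] for row in X], y)
print(threshold, accuracy)
```

In a supervised setting, the rows of `X` and the labels `y` would be passed to a classifier instead of thresholding one column; a tree ensemble such as scikit-learn's `RandomForestClassifier` is one plausible choice, though the article's exact learners and settings are not given in this record.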