Computing the semantic similarity of geographic terms using volunteered lexical definitions |
| |
Authors: | Andrea Ballatore, David C. Wilson, Michela Bertolotto |
| |
Affiliation: | 1. School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland; andrea.ballatore@ucd.ie; 3. Department of Software and Information Systems, University of North Carolina, University City Boulevard, Charlotte, NC, USA; 4. School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland |
| |
Abstract: | Volunteered geographic information (VGI) is generated by heterogeneous ‘information communities’ that co-operate to produce reusable units of geographic knowledge. A consensual lexicon is a key factor in enabling this open production model. Lexical definitions help demarcate the boundaries of terms, forming a thin semantic ground on which knowledge can travel. In VGI, lexical definitions often appear to be inconsistent, circular, noisy and highly idiosyncratic. Computing the semantic similarity of these ‘volunteered lexical definitions’ has a wide range of applications in GIScience, including information retrieval, data mining and information integration. This article describes a knowledge-based approach to quantify the semantic similarity of lexical definitions. Grounded in the recursive intuition that similar terms are described using similar terms, the approach relies on paraphrase-detection techniques and the lexical database WordNet. The cognitive plausibility of the approach is evaluated in the context of the OpenStreetMap (OSM) Semantic Network, obtaining high correlation with human judgements. Guidelines are provided for the practical usage of the approach. |
| |
Keywords: | lexical definitions; semantic similarity; volunteered geographic information; crowdsourcing; geo-semantics; WordNet; OpenStreetMap |
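The abstract's core intuition, that similar terms are described using similar terms, can be illustrated with a minimal sketch of one common paraphrase-detection heuristic: each word in one definition is matched to its most similar word in the other, the best-match scores are averaged, and the two directions are averaged to make the score symmetric. This is not presented as the article's exact formulation. The hypothetical `TOY_SIM` table stands in for a WordNet-based term similarity measure, and its scores are invented purely for illustration. The stopword list and all function names (`term_sim`, `tokens`, `directed_sim`, `definition_sim`) are likewise assumptions introduced here.

```python
# Sketch: bidirectional best-match similarity between two lexical definitions.
# ASSUMPTION: TOY_SIM is a hypothetical stand-in for a WordNet-based term
# similarity measure; its scores are invented for illustration only.

STOPWORDS = {"a", "an", "the", "of", "for", "or", "and", "to", "in",
             "where", "that", "which", "is", "are"}

# Hypothetical term-to-term similarity scores in [0, 1].
TOY_SIM = {
    frozenset({"shop", "store"}): 0.9,
    frozenset({"goods", "products"}): 0.8,
    frozenset({"food", "groceries"}): 0.7,
}


def term_sim(a: str, b: str) -> float:
    """Return 1.0 for identical terms, the toy-table score if present, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def tokens(definition: str) -> list[str]:
    """Lowercase each word, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,;:()\"'").lower() for w in definition.split())
    return [w for w in words if w and w not in STOPWORDS]


def directed_sim(src: list[str], dst: list[str]) -> float:
    """Mean, over the terms in src, of each term's best-matching score in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(term_sim(s, d) for d in dst) for s in src) / len(src)


def definition_sim(def_a: str, def_b: str) -> float:
    """Symmetric definition similarity: the average of both directed scores."""
    ta, tb = tokens(def_a), tokens(def_b)
    return (directed_sim(ta, tb) + directed_sim(tb, ta)) / 2


if __name__ == "__main__":
    print(definition_sim("A shop where goods are sold.",
                         "A store that sells products."))
```

Running the script prints roughly 0.57. "shop"/"store" and "goods"/"products" match through the toy table. "sold" and "sells" score 0.0 because this sketch performs no lemmatization, which a real pipeline would likely add before querying WordNet.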