GeoCorpora: building a corpus to test and train microblog geoparsers
Authors: Jan Oliver Wallgrün, Morteza Karimzadeh, Alan M. MacEachren, Scott Pezanowski
Affiliation: 1. Independent Researcher (Affiliated with GeoVISTA Center and ChoroPhronesis, Pennsylvania State University), Ahrensburg, Germany; 2. Department of Geography, GeoVISTA Center, Pennsylvania State University, University Park, PA, USA
Abstract: In this article, we present the GeoCorpora corpus building framework and software tools, as well as a geo-annotated Twitter corpus built with these tools, to foster research and development in microblog/Twitter geoparsing and geographic information retrieval. The framework employs crowdsourcing and geovisual analytics to support the construction of large text corpora in which mentioned location entities are identified and geolocated to toponyms in existing geographical gazetteers. We describe how the approach has been applied to build a corpus of geo-annotated tweets, which will be made freely available to the research community alongside this article to support the evaluation, comparison and training of geoparsers. Additionally, we report lessons learned about corpus construction for geoparsing, as well as insights about the notions of place and natural spatial language that we derive from applying the framework to build this corpus.
Keywords: geoparsing, corpus building, microblogs, Twitter, geo-annotation, named entity recognition