A comprehensive methodology for discovering semantic relationships among geospatial vocabularies using oceanographic data discovery as an example
Authors:Yongyao Jiang  Yun Li  Kai Liu  Edward M. Armstrong  Thomas Huang
Affiliation:1. NSF Spatiotemporal Innovation Center, Dept. of Geography & GeoInformation Science, George Mason University, Fairfax, VA, USA;2. Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA
Abstract:It is challenging to find relevant data for research and development purposes in the geospatial big data era. One long-standing problem in data discovery is locating, assimilating and utilizing the semantic context for a given query. Most research in the geospatial domain has approached this problem in one of two ways: building a domain-specific ontology manually or automatically discovering semantic relationships using metadata and machine learning techniques. The former relies on rich expert knowledge but is static, costly and labor intensive, whereas the latter is automatic but prone to noise. An emerging trend in information science takes advantage of large-scale user search histories, which are dynamic but subject to user- and crawler-generated noise. Leveraging the benefits of these three approaches and avoiding their weaknesses, a novel methodology is proposed to (1) discover vocabulary-based semantic relationships from user search histories and clickstreams, (2) refine the similarity calculation methods from existing ontologies and (3) integrate the results of ontology, metadata, user search history and clickstream analysis to better determine their semantic relationships. An accuracy assessment by domain experts for the similarity values indicates an 83% overall accuracy for the top 10 related terms over randomly selected sample queries. This research functions as an example for building vocabulary-based semantic relationships for different geographical domains to improve various aspects of data discovery, including the accuracy of the vocabulary relationships of commonly used search terms.
Keywords:Query augmentation  semantic search  web mining  search history  click behavior  big data
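
Illustrative sketch (not the authors' implementation): the abstract does not give the similarity formulas or the integration rule, so the code below only shows one plausible shape of the idea. It assumes that two query terms are related when users click the same datasets after searching for them (cosine similarity over click counts) and that scores from several sources are merged by a weighted average. The function names, the source labels and the weights are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def click_vectors(click_log):
    """Turn (query, clicked_dataset) pairs into {query: {dataset: count}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for query, dataset in click_log:
        counts[query][dataset] += 1
    # Return plain dicts so later lookups never create keys by accident.
    return {query: dict(clicks) for query, clicks in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def integrate(scores, weights):
    """Weighted average over the sources that report a score (assumed rule)."""
    total = sum(weights[source] for source in scores)
    if total == 0:
        return 0.0
    return sum(weights[source] * value for source, value in scores.items()) / total


def top_related(term, vectors, other_scores, weights, k=10):
    """Rank other terms by integrated similarity to `term`; keep the top k."""
    base = vectors.get(term, {})
    ranked = []
    for other, clicks in vectors.items():
        if other == term:
            continue
        scores = {"clickstream": cosine(base, clicks)}
        # Scores from ontology, metadata, etc., if supplied for this pair.
        scores.update(other_scores.get((term, other), {}))
        ranked.append((other, integrate(scores, weights)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical click log and weights, for illustration only.
log = [
    ("ocean wind", "ds1"), ("ocean wind", "ds2"),
    ("sea surface wind", "ds1"), ("sea surface wind", "ds2"),
    ("sst", "ds3"),
]
weights = {"clickstream": 1.0, "ontology": 1.0, "metadata": 1.0}
vectors = click_vectors(log)
related = top_related("ocean wind", vectors, {}, weights)
```

Averaging only over the sources that report a score keeps a term pair from being penalized when, for example, the ontology does not contain one of the terms. Whether the published method handles missing sources this way is not stated in the abstract.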