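A Data Management System for Big Geospatial Data Based on Phoenix
Cite this article: CHEN Mian, LI Longhai, XIE Peng, FU Shaofeng, HE Liesong, ZHOU Xiaodong. A Data Management System for Big Geospatial Data Based on Phoenix[J]. Geomatics and Information Science of Wuhan University, 2020(5): 719-727.
Authors: CHEN Mian  LI Longhai  XIE Peng  FU Shaofeng  HE Liesong  ZHOU Xiaodong
Affiliations: School of Computer Science and Technology, Xidian University; Xi'an Research Institute of Surveying and Mapping
Funding: Open Fund of the State Key Laboratory of Geo-Information Engineering (SKLGIE2014-M-4-1); National Natural Science Foundation of China (41301527).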
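Abstract (translated from the Chinese): The NoSQL database HBase has been adopted by many applications as a solution for storing and managing massive datasets, but it provides no direct support for geospatial data. This paper therefore presents GS-Phoenix, a geospatial big data management system built on the open-source projects Phoenix and HBase. When spatial data are inserted, GS-Phoenix automatically generates a spatial index based on a space-filling curve, stored either as the primary-key index or as a secondary index. Using this index, GS-Phoenix implements the basic operations required by complex spatial queries: rectangular range queries, irregular range queries, and k nearest neighbor (kNN) queries. GS-Phoenix uses the user-defined function mechanism and the server-side sorting mechanism to place the main computation of a spatial query on the server side, which effectively reduces the computational load on the client. In addition, GS-Phoenix includes a query optimization method based on statistics of the spatial distribution of the data, which further improves spatial query efficiency. Experiments show that GS-Phoenix sustains a data insertion rate of about 170 000 insertions per second on a small cluster, and that common spatial range queries and kNN queries complete within a few hundred milliseconds. GS-Phoenix is therefore suitable for location-related applications that require high data throughput and real-time spatial queries.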

Keywords (translated from the Chinese): geospatial data  distributed storage  HBase  GS-Phoenix  spatial query processing

A Data Management System for Big Geospatial Data Based on Phoenix
CHEN Mian,LI Longhai,XIE Peng,FU Shaofeng,HE Liesong,ZHOU Xiaodong.A Data Management System for Big Geospatial Data Based on Phoenix[J].Geomatics and Information Science of Wuhan University,2020(5):719-727.
Authors:CHEN Mian  LI Longhai  XIE Peng  FU Shaofeng  HE Liesong  ZHOU Xiaodong
Institution: (School of Computer Science and Technology, Xidian University, Xi'an 710071, China; Xi'an Research Institute of Surveying and Mapping, Xi'an 710054, China)
Abstract: HBase, a NoSQL database, has been adopted in many applications as a solution for storing and managing huge datasets. However, it does not provide direct support for spatial data. We therefore present GS-Phoenix, a data management system for big geospatial data that builds on two open-source projects, Phoenix and HBase. When geospatial data are inserted, GS-Phoenix automatically generates a spatial index based on a space-filling curve, stored either as the primary key of the data table or as a secondary index. Using this spatial index, GS-Phoenix implements several basic spatial query operations, including rectangular range queries, irregular area queries, and k nearest neighbor (kNN) queries, all of which are essential primitives for complex spatial queries. GS-Phoenix employs user-defined functions and server-side sorting to place most spatial filtering work on the server side during query processing, which effectively reduces the computing burden on the client. GS-Phoenix also applies a query optimization method based on spatial distribution statistics, which further improves spatial query efficiency. Experimental results show that GS-Phoenix deployed on a small-scale cluster can sustain a throughput of over 170 000 data insertions per second while serving spatial range queries and kNN queries with response times on the order of hundreds of milliseconds. The experiments demonstrate that GS-Phoenix is applicable to a wide range of location-related applications that demand high insertion throughput and real-time spatial queries.
Keywords:geospatial data  distributed storage  HBase  GS-Phoenix  spatial query processing
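
The abstract describes a spatial index built from a space-filling curve and used for rectangular range queries, but it does not name the curve or the key layout. The following is a minimal Python sketch of the general idea only, assuming a Z-order (Morton) curve over quantized longitude and latitude. All function names and the 16-bit resolution are hypothetical and are not taken from GS-Phoenix, and a plain list stands in for an HBase key-range scan.

```python
from typing import List, Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x in even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a coordinate in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    ratio = (value - lo) / (hi - lo)
    return min(cells, max(0, int(ratio * cells)))


def z_key(lon: float, lat: float, bits: int = 16) -> int:
    """Hypothetical row key: Z-order value of the quantized (lon, lat)."""
    return interleave_bits(
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        bits,
    )


def range_query(
    points: List[Tuple[float, float]],
    min_lon: float, min_lat: float, max_lon: float, max_lat: float,
    bits: int = 16,
) -> List[Tuple[float, float]]:
    """Rectangular range query: coarse key-range filter, then exact filter.

    Z-order keys are monotone in each coordinate, so every point inside the
    rectangle has a key between the keys of the two corner points. The key
    range can also contain points outside the rectangle, hence the second,
    exact check.
    """
    lo_key = z_key(min_lon, min_lat, bits)
    hi_key = z_key(max_lon, max_lat, bits)
    hits = []
    for lon, lat in points:
        if not lo_key <= z_key(lon, lat, bits) <= hi_key:
            continue  # outside the key range a real store would scan
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            hits.append((lon, lat))
    return hits
```

In a real deployment the coarse step would be a sorted key-range scan rather than a loop over all points. The abstract indicates that GS-Phoenix runs the filtering on the server side, which this sketch does not model.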
This article is indexed in CNKI, VIP, and other databases.