
Design and Implementation of a Distributed Geospatial Data Storage Structure Based on Spark
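Cite this article: Yue Peng (乐鹏), Wu Zhaoyan (吴昭炎), Shangguan Boyi (上官博屹). Design and Implementation of a Distributed Geospatial Data Storage Structure Based on Spark [J]. Geomatics and Information Science of Wuhan University (武汉大学学报(信息科学版)), 2018, 43(12): 2295-2302.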
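Authors: Yue Peng (乐鹏), Wu Zhaoyan (吴昭炎), Shangguan Boyi (上官博屹)
Affiliation: 1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei 430079, China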
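Funding: National Key Research and Development Program of China (2017YFB0504103); National Natural Science Foundation of China (41722109); Wuhan Yellow Crane Talents Program for Science and Technology Innovation (2016); Hubei Province Natural Science Foundation for Distinguished Young Scholars (2018CFA053)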
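Abstract (translated from the Chinese): The Apache Spark distributed computing framework can be used to manage and process big geospatial data, providing a base platform for cloud GIS. Building on the data organization and computation models of Apache Spark, combined with the Apache HBase distributed database and starting from the concept of a distributed GIS kernel, this paper designs and implements a distributed geospatial data storage structure and its object interfaces, realized on the kernel of a Chinese GIS platform. For the storage and querying of point, line, and polygon data, a series of comparative experiments against the traditional spatial database system PostGIS verifies the feasibility and efficiency of the proposed distributed geospatial data storage architecture.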

Keywords: Spark; cloud GIS; distributed geospatial data organization; distributed GIS kernel; big geospatial data
Received: 2018-06-07

Design and Implementation of a Distributed Geospatial Data Storage Structure Based on Spark
Affiliation:1.School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Abstract: With the rapid development of sensor web and Earth observation technologies, geospatial data has become an important part of big data, and traditional geospatial data storage and processing systems are increasingly unable to meet its requirements. Apache Spark, a unified analytics engine for large-scale data processing, can provide both management and processing capabilities for big geospatial data. It can therefore serve as a fundamental platform for cloud-based GIS, moving the conventional GIS kernel toward a distributed GIS kernel in the era of cloud computing. Building on the data organization and computation models of Apache Spark, this paper couples Spark with the Apache HBase distributed database and presents the design and implementation of a distributed geospatial data storage and processing architecture. In the architecture, a variable-length GeoHash index method is proposed to improve query performance for geospatial point, polyline, and polygon data, and a SpatialRDD is presented to manage and process, in a distributed manner, the geospatial data queried from Apache HBase. The GIS kernel of the architecture is realized on top of a Chinese-brand GIS software platform. For the storage and processing of point, polyline, and polygon data, a series of comparative experiments against the traditional geospatial database PostGIS is performed, and the results demonstrate the applicability and efficiency of the approach.
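The abstract does not say how the variable-length GeoHash index is constructed, so the following is only a minimal sketch under an assumption: the code length for a geometry is taken to be the longest GeoHash prefix whose cell still contains the geometry's bounding box. The encoder is the standard public GeoHash scheme; the function names geohash_encode and covering_geohash are hypothetical and do not come from the paper.

```python
# Standard GeoHash base-32 alphabet (digits plus letters without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, length):
    """Encode a point as a GeoHash string of the given length.

    Bits alternate between longitude and latitude bisections,
    starting with longitude; every 5 bits form one base-32 character.
    """
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def covering_geohash(min_lon, min_lat, max_lon, max_lat, max_length=12):
    """Hypothetical variable-length code (an assumption, not the paper's method).

    Returns the longest common prefix of the GeoHashes of the two bounding-box
    corners. GeoHash cells are axis-aligned rectangles, so a cell that contains
    both corners contains the whole box. The result may be the empty string
    when the box straddles a top-level cell boundary.
    """
    sw = geohash_encode(min_lon, min_lat, max_length)
    ne = geohash_encode(max_lon, max_lat, max_length)
    prefix = []
    for a, b in zip(sw, ne):
        if a != b:
            break
        prefix.append(a)
    return "".join(prefix)


if __name__ == "__main__":
    print(geohash_encode(10.40744, 57.64911, 5))
    print(covering_geohash(10.40, 57.64, 10.41, 57.65))
```

Under this assumption a point keeps the full-length code, while a large polygon gets a short one. If such codes were used as the leading part of an HBase row key, a spatial query could be turned into prefix scans; whether the paper lays out its row keys this way is not stated in the abstract.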