
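Parallel Algorithm for Partitioning Massive Spatial Vector Data in Cloud Environment
Cite this article: YAO Xiaochuang, YANG Jianyu, LI Lin, YE Sijing, YUN Wenju, ZHU Dehai. Parallel Algorithm for Partitioning Massive Spatial Vector Data in Cloud Environment[J]. Geomatics and Information Science of Wuhan University, 2018, 43(7): 1092-1097.
Authors: YAO Xiaochuang, YANG Jianyu, LI Lin, YE Sijing, YUN Wenju, ZHU Dehai
Affiliation: 1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083
Funding: Special Fund for Public Welfare Industry Research of the Ministry of Land and Resources (201511010-06)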
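Abstract: Spatial data partitioning is an important component of both indexing methods and data storage for spatial big data. To address the shortcomings of the Hadoop cloud computing platform in spatial data partitioning and storage, this paper proposes a parallel partitioning algorithm for massive spatial vector data based on the Hilbert space-filling curve. In the data partitioning stage, the algorithm takes into account the spatial location relationship between adjacent objects, the size of each spatial object, and the number of spatial objects that share the same coded block. Following the partitioning principle of "merging small coded blocks and splitting large coded blocks", it implements parallel partitioning of massive spatial vector data in a cloud environment. Experiments show that the algorithm not only improves the indexing efficiency of massive spatial vector data but also effectively resolves the data skew of spatial vector data on the Hadoop distributed file system (HDFS).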
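As an illustration of the Hilbert coding that the abstract builds on, the following is a minimal Python sketch of the standard conversion from a grid cell to its position along a Hilbert curve. It is not taken from the paper: the function names `hilbert_index` and `point_to_cell`, the `order` parameter (grid resolution), and the `extent` tuple are assumptions introduced here for illustration only.

```python
def hilbert_index(order, x, y):
    """Return the Hilbert-curve distance of cell (x, y) in a 2^order x 2^order grid.

    Standard iterative conversion; names and signature are illustrative assumptions.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def point_to_cell(px, py, extent, order):
    """Map a coordinate to a grid cell; extent = (min_x, min_y, max_x, max_y) is assumed."""
    min_x, min_y, max_x, max_y = extent
    n = 1 << order
    cx = min(n - 1, int((px - min_x) / (max_x - min_x) * n))
    cy = min(n - 1, int((py - min_y) / (max_y - min_y) * n))
    return cx, cy


if __name__ == "__main__":
    # Hypothetical usage: code a representative point of a spatial object.
    cell = point_to_cell(116.3, 39.9, (73.0, 18.0, 135.0, 54.0), order=8)
    print(cell, hilbert_index(8, *cell))
```

Cells that are consecutive along the curve are always spatial neighbours, which is why sorting objects by such a code tends to keep nearby objects in the same or adjacent blocks. Which point represents each object (for example a bounding-box centre) is a design choice the abstract does not specify.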

Keywords: vector data; Hilbert code; spatial data partitioning; MapReduce; R-tree index; data skew
Received: 2017-02-05

Parallel Algorithm for Partitioning Massive Spatial Vector Data in Cloud Environment
Affiliation: 1. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; 2. Key Laboratory for Agriculture Land Quality, Monitoring and Control of the Ministry of Land and Resources, Beijing 100035, China; 3. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China; 4. Land Consolidation and Rehabilitation Center, Ministry of Land and Resources, Beijing 100035, China
Abstract: Spatial data partitioning plays an important role in spatial index methods and data storage strategies for spatial big data. To address the inherent shortcomings of the Hadoop cloud computing platform in spatial data partitioning and data storage, this paper presents a parallel algorithm based on the Hilbert space-filling curve for partitioning massive spatial vector data. In the partitioning phase, the algorithm takes several influence factors into full consideration, including the spatial location relationship between adjacent objects, the size of each spatial vector object, and the number of spatial objects in the same coded block. By following the partitioning principle of merging small coded blocks and splitting large coded blocks, the paper implements parallel partitioning of massive spatial vector data in a cloud environment. Experimental results show that the proposed algorithm not only improves the efficiency of the spatial R-tree index for massive spatial vector data but also achieves a good data balance in the Hadoop distributed file system (HDFS).
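The "merge small coded blocks, split large coded blocks" principle can be sketched in a few lines of Python. This is a single-machine illustration under stated assumptions, not the authors' MapReduce implementation: each object is reduced to a hypothetical `(hilbert_code, size_in_bytes)` pair, and `capacity` stands in for a target partition size such as an HDFS block size.

```python
from itertools import groupby


def partition_blocks(objects, capacity):
    """Group (hilbert_code, size) pairs into partitions of at most `capacity` bytes.

    Illustrative sketch only. Small coded blocks that are adjacent in Hilbert
    order are merged into one partition; a coded block larger than `capacity`
    is split into several partitions. An object larger than `capacity` gets a
    partition of its own.
    """
    ordered = sorted(objects, key=lambda o: o[0])
    partitions = []
    current, used = [], 0
    for _, group in groupby(ordered, key=lambda o: o[0]):
        block = list(group)
        block_size = sum(size for _, size in block)
        if block_size > capacity:
            # Large coded block: close the open partition, then split the block.
            if current:
                partitions.append(current)
            current, used = [], 0
            for obj in block:
                if current and used + obj[1] > capacity:
                    partitions.append(current)
                    current, used = [], 0
                current.append(obj)
                used += obj[1]
            # The last piece stays open so later small blocks can merge into it.
        else:
            # Small coded block: merge into the open partition if it still fits.
            if current and used + block_size > capacity:
                partitions.append(current)
                current, used = [], 0
            current.extend(block)
            used += block_size
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    # Hypothetical objects: three small coded blocks, one oversized block, one small block.
    demo = [(0, 10), (1, 10), (2, 10), (5, 40), (5, 40), (5, 40), (6, 5)]
    for part in partition_blocks(demo, capacity=64):
        print(sum(size for _, size in part), part)
```

Because every partition is filled toward the same capacity, partition sizes stay comparable, which is the intuition behind the reduced data skew reported in the abstract. The sketch weighs only object size and coded-block membership; the abstract also lists the spatial relationship between adjacent objects as a factor, which is represented here only indirectly through the Hilbert ordering.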
This article is indexed in CNKI and other databases.