首页 | 本学科首页   官方微博 | 高级检索  
     

基于密度的线数据分组算法研究
引用本文:魏海涛,杜云艳,许开辉,吴笛,易嘉伟,莫洋,刘张. 基于密度的线数据分组算法研究[J]. 地球信息科学学报, 2015, 17(5): 538-546. DOI: 10.3724/SP.J.1047.2015.00538
作者姓名:魏海涛  杜云艳  许开辉  吴笛  易嘉伟  莫洋  刘张
作者单位:1. 山东科技大学测绘科学与工程学院, 青岛266510;2. 中国科学院地理科学与资源研究所资源与环境信息系统国家重点实验室, 北京100101
基金项目:海洋公益性专项项目"海洋环境信息云计算与云服务体系框架应用研究"(201105033);海洋预报综合信息系统(MiFSIS)研究应用项目(201105017)。
摘    要:目前,地理空间数据面临着由于数据量膨胀和计算量高速增长而引起算法效率低的问题,采用"分而治之"的数据分组策略提高运算效率已成为研究的热点。面向分布不均匀的线数据,本文提出了基于密度的线数据分组算法(简称LGAD)。首先,算法通过查找高密度区提取样本线段,保证了分组算法的起点落到高密区;其次,考虑线空间拓扑关系的复杂性,引用水平、垂直和夹角距离度量线段间距离,创建样本线段与其他线段的距离矩阵;最后,以距离矩阵和最优选择方法实现数据负载均衡分组。实验结果显示,对数据分组和分组后数据进行线段聚类的2个过程中,该算法体现了较好的时间优势,与串行计算相比,在分组数为2-12 时,平均比率达4.3,提高了应用的响应速度,具有较好的实际意义。

关 键 词:分而治之  负载均衡  并行计算  分布不均匀  线数据分组  
收稿时间:2014-12-26

A Line Grouping Algorithm Based on Density
WEI Haitao,DU Yunyan,XU Kaihui,WU Di,YI Jiawei,MO Yang,LIU Zhang. A Line Grouping Algorithm Based on Density[J]. Geo-information Science, 2015, 17(5): 538-546. DOI: 10.3724/SP.J.1047.2015.00538
Authors:WEI Haitao  DU Yunyan  XU Kaihui  WU Di  YI Jiawei  MO Yang  LIU Zhang
Affiliation:1. Shandong University of Science and Technology, Qingdao 266510, China;2. LREIS, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
Abstract:Parallel computing provides a promising solution to accelerate complicated spatial data processing, which is becoming increasingly computational intense. Partitioning large datasets into workload-balanced subgroups remains a challenge, particularly for unevenly distributed spatial data. In this study, a density-based data grouping algorithm was developed to tackle the partition problem for large line data. The algorithm includes three procedures: (1) extracting representative segment samples based on data density distribution; (2) generating a distance matrix between segment samples and the rest of the data by using three line distance measurements into calculations; (3) grouping line segments with data load balanced. Experiments show that the algorithm is able to partition large line data efficiently and evenly into equally sized sub-groups. The speed-up ratios of parallel interpolation save up to 65% of the execution time in comparison with consequential interpolation. A high efficiency of parallel computing was achieved when the datasets were divided into an optimal number of child data groups.
Keywords:divide and rule  balanced workloads  parallel computing  unevenly distributed data  line segments  
本文献已被 CNKI 等数据库收录!
点击此处可从《地球信息科学学报》浏览原始摘要信息
点击此处可从《地球信息科学学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号