
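A spatial outlier mining algorithm based on distributed computing
Cite this article: ZHANG Weiping, LIU Jiping, QIU Agen, ZHANG Yongchuan, ZHAO Yangyang. A spatial outlier mining algorithm based on distributed computing[J]. Science of Surveying and Mapping, 2017, 42(8).
Authors: ZHANG Weiping  LIU Jiping  QIU Agen  ZHANG Yongchuan  ZHAO Yangyang
Affiliations: 1. Chinese Academy of Surveying and Mapping, Beijing 100830; 2. Wuhan University, Wuhan 430079; 3. Liaoning Technical University, Fuxin, Liaoning 123000
Funding: Special Scientific Research Fund for the Public Welfare Industry of Surveying, Mapping and Geoinformation; Basic Scientific Research Operating Fund of the Chinese Academy of Surveying and Mapping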
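Abstract: Existing spatial outlier mining algorithms cannot meet the demands of large-scale spatial data mining, so this paper proposes a spatial outlier mining algorithm for distributed environments. First, given the characteristics of distributed computing and storage on a cluster, a space-filling curve is used to partition the dataset, which speeds up the search for a target point's approximate spatial nearest neighbors. Second, information entropy theory is used to define the spatial outlier factor. Because different attributes of multidimensional data affect the outlier factor to different degrees, the algorithm automatically computes a weight for each attribute from the characteristics of the data itself; inverse distance weighting is used to define the influence of spatial factors on the outlier factor. Finally, experimental results show that the algorithm mines outliers in large-scale spatial datasets far more efficiently than traditional algorithms, with a mining accuracy above 90%.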
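
As a rough illustration of the partitioning step, the sketch below orders 2D points along a Z-order (Morton) curve and cuts the ordered list into contiguous chunks, so that points in the same chunk tend to be spatially close. The abstract does not say which space-filling curve the authors use, so the Z-order curve is an assumption here, and the names `interleave_bits`, `morton_key`, and `partition_by_curve` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def interleave_bits(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of two grid indices into one Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)       # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)   # y bits go to odd positions
    return key


def morton_key(p: Point, bounds: Tuple[float, float, float, float],
               bits: int = 16) -> int:
    """Map a point to a grid cell inside `bounds` and return its Morton key."""
    xmin, ymin, xmax, ymax = bounds
    cells = (1 << bits) - 1
    xspan = (xmax - xmin) or 1.0  # avoid division by zero for degenerate extents
    yspan = (ymax - ymin) or 1.0
    ix = int((p[0] - xmin) / xspan * cells)
    iy = int((p[1] - ymin) / yspan * cells)
    return interleave_bits(ix, iy, bits)


def partition_by_curve(points: Sequence[Point], n_parts: int,
                       bits: int = 16) -> List[List[Point]]:
    """Sort points along the curve and split them into contiguous chunks.

    Assumption: each chunk would be handed to one worker of the cluster;
    how the paper assigns partitions to nodes is not stated in the abstract.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    bounds = (min(xs), min(ys), max(xs), max(ys))
    ordered = sorted(points, key=lambda p: morton_key(p, bounds, bits))
    size = -(-len(ordered) // n_parts)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Because neighbors along the curve are usually, though not always, neighbors in space, a nearest-neighbor search restricted to one chunk is only approximate, which matches the abstract's wording of "approximate" nearest neighbors.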

Keywords: spatial outlier  distributed computing  nearest neighbor  spatial outlier factor

A spatial outlier mining algorithm based on distributed computing
ZHANG Weiping, LIU Jiping, QIU Agen, ZHANG Yongchuan, ZHAO Yangyang. A spatial outlier mining algorithm based on distributed computing[J]. Science of Surveying and Mapping, 2017, 42(8).
Authors:ZHANG Weiping  LIU Jiping  QIU Agen  ZHANG Yongchuan  ZHAO Yangyang
Abstract: Because existing spatial outlier mining algorithms cannot meet the needs of large-scale spatial data mining, this paper presents a spatial outlier mining algorithm for distributed systems. First, a space-filling curve is used to partition the dataset and to speed up the search for the nearest neighbors of a target point. Second, the theory of information entropy is used to define the spatial outlier factor: the differing effects of the attributes of multidimensional data on the outlier factor are taken into account, and the weight of each attribute is calculated automatically from the original features of the data. At the same time, the influence of spatial factors on the outlier factor is defined by inverse distance weighting. Finally, experimental results show that the algorithm is much more efficient than traditional algorithms and that the accuracy of outlier mining exceeds 90%.
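
The abstract does not give the formula for the spatial outlier factor, so the following is only a sketch of how the two ingredients it names could fit together. It assumes the common entropy weight method (an attribute's weight grows as its normalized entropy falls) and scores a point by the weighted gap between its attributes and the inverse-distance-weighted average of its neighbors' attributes. The names `entropy_weights` and `idw_outlier_score` and the `power` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy_weights(rows: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method: one weight per attribute column.

    Assumes at least two rows. A constant column has maximal entropy
    and therefore receives weight 0.
    """
    n, m = len(rows), len(rows[0])
    raw = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        scaled = [(v - lo) / span for v in col]   # min-max scale to [0, 1]
        total = sum(scaled)
        if total == 0:
            entropy = 1.0                          # constant column
        else:
            probs = [s / total for s in scaled]
            entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        raw.append(1.0 - entropy)
    s = sum(raw)
    return [w / s for w in raw] if s > 0 else [1.0 / m] * m


Neighbor = Tuple[Tuple[float, float], Sequence[float]]  # (xy, attributes)


def idw_outlier_score(target_xy: Tuple[float, float],
                      target_attrs: Sequence[float],
                      neighbors: Sequence[Neighbor],
                      weights: Sequence[float],
                      power: float = 2.0) -> float:
    """Weighted deviation of a point from the IDW average of its neighbors.

    This scoring rule is an illustrative assumption, not the paper's definition.
    """
    inv = [1.0 / max(math.dist(target_xy, xy), 1e-12) ** power
           for xy, _ in neighbors]
    total = sum(inv)
    score = 0.0
    for j, w in enumerate(weights):
        expected = sum(k * attrs[j] for k, (_, attrs) in zip(inv, neighbors)) / total
        score += w * abs(target_attrs[j] - expected)
    return score
```

In this sketch closer neighbors dominate the expected value, so a point that differs sharply from its immediate surroundings scores high even if it looks ordinary against the dataset as a whole.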
Keywords:spatial outlier  distributed computing  nearest neighbor  spatial outlier factor
This article is indexed in CNKI, Wanfang Data, and other databases.