首页 | 本学科首页   官方微博 | 高级检索  
     检索      

Web环境下地学数据共享用户行为模式分析
引用本文:王末,王卷乐.Web环境下地学数据共享用户行为模式分析[J].地球信息科学,2016,18(9):1174-1183.
作者姓名:王末  王卷乐
作者单位:1. 中国科学院地理科学与资源研究所 资源与环境信息系统国家重点实验室,北京 1001012. 中国科学院大学,北京 1000493. 江苏省地理信息资源开发与利用协同创新中心,南京 210023
基金项目:国家科技基础条件平台——地球系统科学数据共享平台(2005DKA32300)科技基础性工作重点项目(2011FY110400)中国工程院国际工程科技知识中心项目
摘    要:了解科学数据共享用户行为特征对实现高效、精准的数据共享服务具有重要的参考意义。本文基于国家地球系统科学数据共享平台网站服务器日志及服务记录数据,利用空间数据挖掘及Web使用挖掘技术,探索地球系统科学数据共享用户行为模式。在数据预处理阶段,完成用户识别、会话识别、位置识别,并对数据进行空间建模、空间数据库建库。在数据挖掘阶段,分别对用户产生的网页浏览数、会话数、数据集浏览数为对象进行空间“热点”分析,识别用户行为的地域差异。针对用户数据浏览和下载行为,采用FP-growth算法对用户——数据之间进行关联规则挖掘,发现用户对数据关注和使用的高频规律。分析结果表明:(1)该共享平台用户地在国内各省市均有分布,用户最多的3个省(市)分别为北京市、山东省、江苏省,该分布与国内高校学生分布相关程度不高,但与“211工程”高校学生的空间分布相关度较高;(2)空间“热点”分析表明,北京、天津及河北北部无论在网页浏览、数据浏览还是会话量上都是“热点”区域,但识别的“冷点”区域有较大不同,尤其是数据访问“冷点”分布较广,如南方沿海省份、河南省、山东省、四川省等;(3)关联规则挖掘发现多个数据浏览高频项目集以及关联规则。数据下载高频项与数据浏览高频模式较好吻合,但下载行为未表现出明显关联规则。本文提供了一种结合Web使用挖掘和空间数据挖掘的用户行为模式挖掘方法,该方法也可用于其他类型网站的数据挖掘。

关 键 词:网络数据挖掘  空间数据挖掘  用户行为模式  科学数据共享  地球系统科学数据  
收稿时间:2015-11-06

A Study on the User Behavior of Geoscience Data Sharing Based on Web Usage Mining
WANG Mo,WANG Juanle.A Study on the User Behavior of Geoscience Data Sharing Based on Web Usage Mining[J].Geo-information Science,2016,18(9):1174-1183.
Authors:WANG Mo  WANG Juanle
Institution:1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China2. University of Chinese Academy of Sciences, Beijing 100049, China3. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
Abstract:Understanding the user behavior of science data sharing is a key step to implement effective and accurate service for science data sharing. This study aims to explore the user behavior of science data sharing using spatial data mining and Web usage mining techniques for the National Earth System Science Data Sharing Platform. At the stage of data preprocessing, procedures of user identification, session identification and user location identification were performed. Spatial hotspot analysis was conducted to analyze the user pageviews, sessions, and dataset visits to explore the geographical variance of user behaviors using the Getis-Ord Gi* method. FP-growth was taken to be the algorithm for mining association rules, and was performed for analyzing data visits and data downloads. Data mining results show that: (1) the user distribution of data sharing platform does not show significant correlation with the overall university population distribution in China, but shows a significant positive correlation with the population of research-oriented universities; (2) the hotspot analysis shows that regions of hotspots were clustering in Beijing, Tianjin, and northern Hebei Province for all three perspectives, whereas the cold spots geographically scattered to a greater extent, e.g. the southern coastal provinces, Henan Province, Shandong Province, Sichuan Province, etc.; (3) the association rules mining reveals a number of frequently visited item sets and rules from the valuable user pageviews. The frequently visited item sets for data downloads were well coincided with the frequently visited data. However, no conspicuous rules occurred in data downloads. Results of the spatial hotspot analysis and association rules mining detected the geographical variance of users’ interests in data and discovered the usage patterns for the frequently visited data, which can be used for designing the personalized recommendation. This study provides a method for mining web user behaviors with the combination of Web usage mining and spatial data mining techniques, which can also be applied to the data mining of websites in other fields.
Keywords:Web usage mining  spatial data mining  user behavior mining  science data sharing  Earth System Science data  
本文献已被 CNKI 等数据库收录!
点击此处可从《地球信息科学》浏览原始摘要信息
点击此处可从《地球信息科学》下载免费的PDF全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号