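A New Social Media Topic Mining Method Based on Co-word Network
Citation: WANG Yandong, FU Xiaokang, LI Mengmeng. A New Social Media Topic Mining Method Based on Co-word Network[J]. Geomatics and Information Science of Wuhan University, 2018, 43(12): 2287-2294.
Authors: WANG Yandong, FU Xiaokang, LI Mengmeng
Affiliation: 1. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, Hubei, China
Funding: National Key Research and Development Program of China (2016YFB0501403); National Natural Science Foundation of China (41271399); Special Scientific Research Fund for Public Welfare Industry of Surveying, Mapping and Geoinformation (201512015)
Abstract: In-depth mining of the text data contained in social media supports effective subsequent spatio-temporal analysis. This paper proposes a new topic mining method for social media data based on a co-word network. Term frequency-inverse document frequency (TF-IDF) analysis is used to select topic-related keywords automatically; a co-word network with microblogs as nodes is then built according to whether two microblogs contain the same keywords, and the Louvain community detection algorithm is applied to this network to mine text topics. The method is unsupervised and does not require the number of clusters to be specified. Experiments show that it outperforms the commonly used latent Dirichlet allocation (LDA) topic model in both precision and recall. Applying the method to mine and spatio-temporally analyze keyword-matched microblogs collected during the 2012 Beijing rainstorm demonstrates its effectiveness in practical applications.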

Keywords: co-word network; social media; Louvain community detection; topic mining
Received: 2018-10-15
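The first two steps of the method summarized in the abstract, TF-IDF keyword selection and construction of a co-word network whose nodes are microblogs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: microblogs are assumed to be already segmented into token lists (Chinese word segmentation is out of scope), a term's corpus-level score is assumed to be its TF-IDF summed over all microblogs, and an edge weight is assumed to equal the number of keywords two microblogs share. The function names `select_keywords` and `build_coword_network` and the sample posts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def select_keywords(docs, top_n):
    """Return up to top_n terms ranked by TF-IDF summed over all documents.

    docs: list of token lists (word segmentation is assumed to be done upstream).
    Terms that occur in every document get IDF = log(1) = 0 and are dropped.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = Counter()
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, score in ranked[:top_n] if score > 0}


def build_coword_network(docs, keywords):
    """Co-word network with one node per document (its index in docs).

    Two documents are linked if they share at least one keyword; the edge
    weight is the number of shared keywords (an assumption for illustration).
    Returns {(i, j): weight} with i < j.
    """
    doc_keywords = [set(doc) & keywords for doc in docs]
    edges = {}
    for i, j in combinations(range(len(docs)), 2):
        shared = len(doc_keywords[i] & doc_keywords[j])
        if shared:
            edges[(i, j)] = shared
    return edges


if __name__ == "__main__":
    # Hypothetical, pre-tokenized microblogs.
    docs = [
        ["rainstorm", "flood", "road", "beijing"],
        ["flood", "road", "traffic", "beijing"],
        ["rescue", "volunteer", "shelter", "beijing"],
        ["volunteer", "rescue", "donation", "beijing"],
    ]
    keywords = select_keywords(docs, top_n=10)
    print(sorted(keywords))
    print(build_coword_network(docs, keywords))
```

In this toy run, `beijing` appears in every post, so its IDF is log(1) = 0 and it is never selected; the resulting network links microblogs 0 and 1 and microblogs 2 and 3, each with weight 2. The pairwise loop is O(n²) in the number of microblogs; on a real dataset an inverted index from keyword to microblogs would avoid comparing posts that share nothing.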

A New Social Media Topic Mining Method Based on Co-word Network
Affiliation: 1. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China; 2. Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China; 3. Faculty of Geomatics, East China University of Technology, Nanchang 330013, China
Abstract: In-depth mining of the text data contained in social media supports effective subsequent spatio-temporal analysis. This paper proposes a new social media topic mining method based on a co-word network and community detection. The method uses term frequency-inverse document frequency (TF-IDF) analysis to identify the topic-related keywords of the messages automatically. Based on whether microblogs contain the same keywords, we construct a microblog co-word network in which each microblog is a node. The Louvain community detection algorithm is then applied to this network to classify the microblogs into topic clusters. The proposed method is unsupervised and does not require the number of clusters to be specified. Experiments demonstrate that it outperforms the commonly used latent Dirichlet allocation (LDA) model in both precision and recall. Taking microblogs collected during the 2012 Beijing rainstorm as a case study, the method is used to conduct in-depth mining and spatio-temporal analysis of the microblog dataset. The results demonstrate that the proposed method is effective in real-world applications.
Keywords: co-word network; social media; Louvain community detection; topic mining
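The community detection step, which lets the method find topic clusters without fixing their number in advance, can be sketched in plain Python. This is a compact, unoptimized illustration of the standard two-phase Louvain heuristic (greedily move single nodes to the neighbouring community that most increases modularity, then collapse each community into a super-node and repeat until nothing changes), not the authors' implementation. The adjacency-dict representation, the sorted node visiting order (used only for determinism), and the names `from_edges`, `louvain` and `modularity` are assumptions; in practice a library routine such as NetworkX's `louvain_communities` would normally be used instead.

```python
from collections import defaultdict


def from_edges(edges, nodes=()):
    """Symmetric adjacency dict {u: {v: w}} from {(u, v): w} with u != v."""
    adj = {u: {} for u in nodes}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def _degrees(adj):
    # A self-loop of weight w contributes 2w to a node's degree.
    return {u: sum(w for v, w in nbrs.items() if v != u) + 2 * nbrs.get(u, 0.0)
            for u, nbrs in adj.items()}


def _local_moving(adj):
    """Phase 1: move single nodes to the neighbouring community with the largest
    modularity gain until no move improves modularity."""
    k = _degrees(adj)
    two_m = sum(k.values())
    com = {u: u for u in adj}  # every node starts in its own community
    if two_m == 0:
        return com, False
    tot = dict(k)  # total degree of each community
    moved_any = False
    while True:
        improved = False
        for u in sorted(adj):  # fixed order keeps the sketch deterministic
            links = defaultdict(float)  # weight from u to each neighbouring community
            for v, w in adj[u].items():
                if v != u:
                    links[com[v]] += w
            old = com[u]
            tot[old] -= k[u]  # take u out of its community first
            # Gain of inserting u into c is proportional to k_u,in(c) - tot(c) * k_u / 2m.
            best, best_gain = old, links[old] - tot[old] * k[u] / two_m
            for c, w_in in links.items():
                gain = w_in - tot[c] * k[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k[u]
            com[u] = best
            if best != old:
                improved = moved_any = True
        if not improved:
            return com, moved_any


def _aggregate(adj, com):
    """Phase 2: collapse each community into a super-node; internal edges become
    a self-loop so that degrees (and modularity) are preserved."""
    new = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        row = new[com[u]]  # creates the super-node even if u has no edges
        for v, w in nbrs.items():
            if u == v:
                row[com[u]] += w  # existing self-loop carries over
            elif com[u] == com[v]:
                row[com[u]] += w / 2  # internal edge is visited from both ends
            else:
                row[com[v]] += w
    return {c: dict(nbrs) for c, nbrs in new.items()}


def louvain(adj):
    """Alternate both phases until local moving changes nothing.
    Returns {original node: community id}."""
    membership = {u: u for u in adj}
    while True:
        com, moved = _local_moving(adj)
        if not moved:
            return membership
        membership = {u: com[c] for u, c in membership.items()}
        adj = _aggregate(adj, com)


def modularity(adj, membership):
    """Q = sum over communities c of [L_c / m - (d_c / 2m)^2]."""
    k = _degrees(adj)
    two_m = sum(k.values())
    internal, tot = defaultdict(float), defaultdict(float)
    for u, nbrs in adj.items():
        c = membership[u]
        tot[c] += k[u]
        for v, w in nbrs.items():
            if membership[v] == c:
                internal[c] += w if u == v else w / 2
    return sum(2 * internal[c] / two_m - (tot[c] / two_m) ** 2 for c in tot)
```

For example, on a hypothetical network of two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3), `louvain(from_edges(edges))` returns two communities, one per triangle, with modularity 5/14 ≈ 0.357. No cluster count is passed in; the number of communities falls out of modularity maximization, which is the property the abstract highlights.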