
Application and Comparison of Topic Models in Social Media-Based Disaster Classification
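Cite this article:SU Kai, CHENG Changxiu, Nikita Murzintcev, ZHANG Ting. Application and Comparison of Topic Models in Social Media-Based Disaster Classification[J]. Geo-information Science, 2019, 21(8): 1152-1160.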
Authors:SU Kai  CHENG Changxiu  Nikita Murzintcev  ZHANG Ting
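Institution:1. Center for Geodata and Analysis, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China;2. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China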
Funding:National Key Research and Development Program of China (2017YFB0504102); Fundamental Research Funds for the Central Universities
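Abstract:The regions along the "One Belt and One Road" are prone to natural disasters, and most of the countries there are developing countries with underdeveloped economies and weak disaster resistance. When a disaster occurs, mining and analyzing relevant Twitter data supports emergency rescue, disaster assessment, and disaster reduction and prevention, and provides important support for China's international rescue and relief work. Without requiring an empirical corpus, topic models can rapidly aggregate information that is valuable for disaster relief and assessment from massive numbers of disaster-related tweets. This paper uses the BTM and LDA models to cluster tweets related to Typhoon Haiyan (2013) into fine-grained topics, analyzes the accuracy of the two models, and tests their ability to distinguish similar disaster topics. Based on tweets in the "demand-related" topic class, it then uses place-name matching to analyze the spatial distribution of the degree of demand for supplies, medical care, and other needs in the Philippines during Typhoon Haiyan. The results show that: (1) in distinguishing short texts with similar topics, the overall accuracy of BTM is 0.598, whereas that of LDA is only 0.321, indicating that BTM is more accurate than LDA in identifying the topics of Typhoon Haiyan tweets; (2) BTM identifies relatively fine-grained disaster topics such as "disaster location-related" and "blessing-related" well; (3) after preliminary verification, the spatial distribution of the degree of demand for supplies, medical care, and other needs generated from "demand-related" topic texts is basically consistent with the actual demand.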
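The place-name matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `count_place_mentions`, the three-entry gazetteer, and the sample tweets are all invented for illustration, and a real pipeline would need a full gazetteer and handling of spelling variants.

```python
import re
from collections import Counter


def count_place_mentions(tweets, gazetteer):
    """Count, for each place name, how many tweets mention it at least once.

    Hypothetical helper: matching is case-insensitive and restricted to
    whole words, which is an assumption, not the paper's documented rule.
    """
    patterns = {
        place: re.compile(r"\b" + re.escape(place) + r"\b", re.IGNORECASE)
        for place in gazetteer
    }
    counts = Counter()
    for text in tweets:
        for place, pattern in patterns.items():
            if pattern.search(text):
                counts[place] += 1  # one count per tweet, not per occurrence
    return counts


# Invented "demand-related" tweets, for illustration only.
sample_tweets = [
    "Need water and food in Tacloban",
    "tacloban hospital needs medicine",
    "Cebu shelters need blankets",
    "Praying for everyone",
]
sample_gazetteer = ["Tacloban", "Cebu", "Ormoc"]
mention_counts = count_place_mentions(sample_tweets, sample_gazetteer)
```

Counting each tweet once per place keeps a single repetitive tweet from inflating a location's apparent demand; the per-place counts could then be joined to coordinates to map the distribution.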

Keywords:Topic model  BTM  LDA  Tweet  Topic categorization  Natural hazard  Emergency management
Received:2019-01-25

Application and Comparison of Topic Model in Identifying Latent Topics from Disaster-Related Tweets
SU Kai, CHENG Changxiu, Nikita Murzintcev, ZHANG Ting. Application and Comparison of Topic Model in Identifying Latent Topics from Disaster-Related Tweets[J]. Geo-information Science, 2019, 21(8): 1152-1160.
Authors:SU Kai  CHENG Changxiu  Nikita Murzintcev  ZHANG Ting
Institution:1. Center for Geodata and Analysis, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China;2. Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
Abstract:From 1990 to 2010, the occurrence of natural disasters increased in countries along the "One Belt and One Road", most of which are developing countries with underdeveloped economies and weak disaster resistance. When disasters happen, people in those countries tweet about them in real time. These tweets contain important information for emergency rescue, disaster assessment, and disaster reduction and prevention, so mining and analyzing them can provide powerful support for China's international rescue and relief work. However, Twitter data are fragmented and unstructured, and the topics that tweets cover are numerous and miscellaneous, so rapidly screening out relevant information from tweets is a research challenge. Without an empirical corpus, topic models can rapidly aggregate information that is valuable for disaster relief and assessment from a large number of disaster-related tweets. In this paper, the BTM and LDA models, which are widely used in natural language processing, were adopted to cluster Typhoon Haiyan-related tweets into fine-grained topics. We then verified and compared the accuracy of the two models and tested their ability to distinguish similar disaster topics. In addition, based on the "demand-related" tweets obtained from topic categorization, we used place-name matching to analyze the spatial distribution of the degree of demand for materials and medical care in the Philippines during Typhoon Haiyan. The results show that: (1) In classifying Typhoon Haiyan-related tweets into fine-grained topics, the overall accuracy of BTM was 0.598, while that of LDA was only 0.321, indicating that BTM outperforms LDA on this task. (2) The F1-measure values of BTM for "disaster location-related" and "blessing-related" tweets were 0.8 and 0.78, indicating that BTM identifies tweets on those two topics well. (3) After preliminary verification, the spatial distribution of material and medical needs generated from "demand-related" tweets was basically consistent with the actual demand. Our findings can help quickly obtain first-hand disaster information from Twitter when China lacks relevant data on disasters occurring in the "One Belt and One Road" region, and so provide data support for China's international rescue work. In addition, our methodology can be applied to studying domestic microblogs during disasters.
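The two evaluation measures reported above, overall accuracy and per-topic F1, can be computed as in the following minimal sketch. It assumes each tweet has one manually assigned topic and one topic predicted by the model; the function names and the five toy labels are invented for illustration and are not taken from the paper's data or code.

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of tweets whose predicted topic equals the reference topic."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def f1_per_topic(true_labels, predicted_labels):
    """Per-topic F1: the harmonic mean of precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted topic p, but the tweet belongs elsewhere
            fn[t] += 1  # a tweet of topic t was missed
    scores = {}
    for topic in set(true_labels) | set(predicted_labels):
        predicted_total = tp[topic] + fp[topic]
        actual_total = tp[topic] + fn[topic]
        precision = tp[topic] / predicted_total if predicted_total else 0.0
        recall = tp[topic] / actual_total if actual_total else 0.0
        denom = precision + recall
        scores[topic] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy labels, invented for illustration only.
toy_true = ["location", "location", "blessing", "demand", "demand"]
toy_pred = ["location", "blessing", "blessing", "demand", "location"]
toy_accuracy = overall_accuracy(toy_true, toy_pred)
toy_f1 = f1_per_topic(toy_true, toy_pred)
```

Overall accuracy summarizes all topics in one number, while per-topic F1 shows which individual topics a model separates well, which is why the abstract reports both.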
Keywords:Topic model  BTM  LDA  Tweet  Topic categorization  Natural hazard  Emergency management  