
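Identifying and Classifying Typhoon Disaster Loss Information Contained in Micro-blogs
Cite this article: YANG Tengfei, XIE Jibo, LI Zhenyu, LI Guoqing. Identifying and Classifying Typhoon Disaster Loss Information Contained in Micro-blogs[J]. Geo-information Science (地球信息科学), 2018, 20(7): 906-917.
Authors: YANG Tengfei (杨腾飞), XIE Jibo (解吉波), LI Zhenyu (李振宇), LI Guoqing (李国庆)
Affiliations: 1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100049; 2. University of Chinese Academy of Sciences, Beijing 100049; 3. Shandong University of Science and Technology, Qingdao 266000
Funding: National Key Research and Development Project (2016YFE0122600); National Natural Science Foundation of China (41771476)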
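Abstract: Social media plays an increasingly important role in the real-time release and dissemination of disaster information. While a disaster unfolds, the real-time loss information carried by social media is valuable for timely response and assessment. However, disaster-related posts are highly fragmented, their text features are sparse, and annotated corpora are scarce, so traditional supervised learning methods struggle to extract the loss information they contain. This paper therefore proposes a method that extends context features and matches feature words to quickly identify and classify the different categories of disaster loss information in social media. The method first applies Chinese grammar rules to extract disaster-related keywords from a small set of micro-blog texts in each loss category and builds feature-word collocation pairs from them. It then supplements and expands these pairs using a word vector model and existing lexicons. In addition, following the co-occurrence rules of Chinese words, it introduces an external corpus to optimize the semantic collocation relationships between feature words. Finally, a typhoon disaster loss classification knowledge base is built on this foundation and used to identify and classify the loss information in disaster-related text. The landfall of Typhoon "Meranti" (莫兰蒂) on 15 September 2016 serves as the case study for evaluating the method. The results show that the method identifies and classifies the different categories of wind-disaster loss information in micro-blog text effectively (the comprehensive evaluation index reaches 0.74 or higher for every category). Based on the classification results, the paper also maps the spatio-temporal distribution of the typhoon's impact, further illustrating the method's usefulness for disaster loss assessment and for disaster mitigation and relief.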

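Keywords: social media; typhoon disaster; short text classification; disaster loss information identification; disaster assessment
Received: 2018-01-18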

A Method of Typhoon Disaster Loss Identification and Classification Using Micro-blog Information
YANG Tengfei, XIE Jibo, LI Zhenyu, LI Guoqing. A Method of Typhoon Disaster Loss Identification and Classification Using Micro-blog Information[J]. Geo-information Science, 2018, 20(7): 906-917.
Authors: YANG Tengfei, XIE Jibo, LI Zhenyu, LI Guoqing
Institution: 1. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100049, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Shandong University of Science and Technology, Qingdao 266000, China
Abstract: Social media plays an increasingly important role in the real-time distribution and dissemination of disaster information. During a disaster, social media generates a large amount of real-time loss information that is valuable for timely response and loss assessment. However, social media data is highly fragmented, its text features are sparse, and annotated corpora are lacking, which makes traditional supervised learning methods hard to apply effectively to disaster information extraction. This paper proposes a fast method for identifying and classifying disaster loss information in social media data by extending context features and matching feature words. We first extract keywords, based on Chinese grammar rules, from a small sample of micro-blog texts in different disaster loss categories and construct feature-word collocation pairs. We then use a word vector model and existing lexicons to supplement and expand these pairs, and introduce an external corpus to optimize the semantic collocation relationships between feature words according to the co-occurrence rules of Chinese words. Finally, we build a classification knowledge base for identifying and classifying typhoon-related disaster loss information in micro-blogs. An experimental system was developed to evaluate the method, with Typhoon "Meranti", which made landfall on 15 September 2016, as the case study. The results show that the method is effective at identifying and classifying the different categories of disaster loss information in social media: the comprehensive evaluation index is greater than 0.74 for every category. We also mapped the spatio-temporal distribution of the typhoon's impact from the classification results; the experiment suggests that the classification output and maps could support disaster loss evaluation and mitigation.
Keywords: social media; typhoon disaster; short text classification; identification of disaster loss information; assessment of disasters
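
Illustrative sketch (not taken from the paper): the abstract describes classifying posts by matching feature-word collocation pairs stored in a knowledge base. The Python snippet below shows one plausible form of that matching step. The category names, the word pairs, the `classify` function, and the `window` parameter are all hypothetical; the example uses English tokens for readability, whereas the paper works on segmented Chinese micro-blog text and builds its pairs from grammar rules, word vectors, and an external corpus.

```python
from collections import defaultdict

# Hypothetical knowledge base: loss category -> set of
# (object word, damage word) collocation pairs. Invented for illustration.
KNOWLEDGE_BASE = {
    "power": {("power", "outage"), ("electricity", "cut")},
    "traffic": {("road", "blocked"), ("flight", "cancelled")},
    "vegetation": {("tree", "fallen"), ("tree", "uprooted")},
}


def classify(tokens, knowledge_base, window=5):
    """Return the categories that have at least one collocation pair
    whose two words both occur in `tokens` within `window` positions."""
    # Record every position at which each token occurs.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    matched = set()
    for category, pairs in knowledge_base.items():
        for first, second in pairs:
            close_enough = any(
                abs(i - j) <= window
                for i in positions.get(first, [])
                for j in positions.get(second, [])
            )
            if close_enough:
                matched.add(category)
                break  # one matching pair is enough for this category
    return matched


post = "the road near my home is blocked and a tree has fallen".split()
print(sorted(classify(post, KNOWLEDGE_BASE)))
```

A post can fall into several categories at once, which is why the sketch returns a set. Requiring both words of a pair to appear close together is one simple way to reduce false matches on short, fragmented text; the paper's actual matching rules may differ.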
This article is indexed in CNKI and other databases.