
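Research on Structuring Web Text Big Data for Public Security Events
Cite this article: Tao PEI, Sihui GUO, Yecheng YUAN, Xueying ZHANG, Wen YUAN, Ang GAO, Zhiyuan ZHAO, Cunjin XUE. Research on Structuring Web Text Big Data for Public Security Events[J]. Geo-information Science, 2019, 21(1): 2-13.
Authors: Tao PEI  Sihui GUO  Yecheng YUAN  Xueying ZHANG  Wen YUAN  Ang GAO  Zhiyuan ZHAO  Cunjin XUE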
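Affiliations:
1. State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101
2. University of Chinese Academy of Sciences, Beijing 100049
3. Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023
4. China National Institute of Standardization, Beijing 100088
5. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079
6. Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094
7. Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023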
Funding: National Natural Science Foundation of China (Nos. 41525004, 41421001)
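Abstract: Information contained in web text has become an important source for emergency rescue and impact assessment in public security events. Existing methods can extract individual event elements from text in a targeted way, but without an event-oriented framework for overall modeling and parsing they struggle to obtain systematic, structured information on event elements from web text: the extracted elements are either incomplete or do not match the target event, and the resulting omissions and errors cannot support systematic analysis of public security event information. To address this problem, this paper proposes a theoretical framework for structuring web text big data for public security events. First, a semantic framework for public security events is established, and a corresponding structured table schema is built with earthquake events as the example. Second, association annotation of the training corpus is used to resolve the difficulty of matching event elements to events. Finally, a text parsing algorithm that can incorporate association information is used to systematically extract event type, event name, event time, event location, and other attributes, largely achieving the structuring of information on different events in web text. Taking the Ludian earthquake in Zhaotong, Yunnan as an example, the paper demonstrates the process and results of structuring web text on an earthquake event, providing an important reference for analyzing the attention the earthquake received and the state of rescue efforts. Building on this work, a web text information mining system for public security events was developed, which demonstrates the structured parsing of earthquake event text and the event attention analysis based on it.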
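
The association annotation step can be pictured with a small sketch. This is not the authors' annotation system, which the abstract does not specify. It only assumes that each labeled text span carries an element type plus a link to the event it belongs to, so that elements from a report mentioning several events can be grouped per event. The `Annotation` class, the `event_id` field, the `group_by_event` helper, and all sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str      # surface string as it appears in the web text
    element: str   # element type, e.g. "time" or "location"
    event_id: str  # hypothetical association label linking the span to an event


def group_by_event(annotations):
    """Collect element values per event, keyed by the association label."""
    records = defaultdict(dict)
    for ann in annotations:
        records[ann.event_id].setdefault(ann.element, []).append(ann.text)
    return dict(records)


# Invented toy spans from one report that mentions two different events.
annotations = [
    Annotation("yesterday", "time", "E1"),
    Annotation("County A", "location", "E1"),
    Annotation("last year", "time", "E2"),
    Annotation("County B", "location", "E2"),
]
records = group_by_event(annotations)
```

Without the `event_id` link, the two locations and two times above could not be assigned to the right event, which is the mismatch problem the abstract describes.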

Keywords: semantic framework  text parsing  event attention  earthquake events  spatial search engine
Received: 2018-12-01

Public Security Event Themed Web Text Structuring
Tao PEI,Sihui GUO,Yecheng YUAN,Xueying ZHANG,Wen YUAN,Ang GAO,Zhiyuan ZHAO,Cunjin XUE.Public Security Event Themed Web Text Structuring[J].Geo-information Science,2019,21(1):2-13.
Authors:Tao PEI  Sihui GUO  Yecheng YUAN  Xueying ZHANG  Wen YUAN  Ang GAO  Zhiyuan ZHAO  Cunjin XUE
Abstract:Information about public security events contained in text can serve as a data source for impact evaluation and relief if it can be structured into a relational database. Although previous research can extract event information into separate attributes, determining which specific event each extracted attribute belongs to remains unsolved. To solve this problem, this paper proposes a theoretical framework for structuring web text on public security events, which consists of three parts. First, an event semantic model is used to construct a seismic event semantic framework, which defines the abstract elements of an event and their semantic relationships. Taking seismic events as an example, the spatial element, time element, attribute element, and source element are defined as the basic elements. The spatial element includes earthquake latitude, longitude, depth, and location. The attribute element is further subdivided into four sub-elements: cause, result, behavior, and influence. Next, an annotation system is applied to typical event materials to label semantic elements, e.g. the name of the place where an earthquake occurred; that is, the abstract elements are instantiated. The key to this step is labeling the relations between the elements and the specific event. Finally, the event text is structured into event type, event name, event time, event location, and other attributes using a text information extraction algorithm. The algorithm uses the materials labeled in the previous step as training data to optimize its parameters, and it can incorporate the linked information. The extracted event text (e.g. words, phrases) is finally normalized into structured information for further analysis. An event information mining platform that follows the whole framework has been developed; it includes modules for webpage searching, text cleaning, event information extraction, visualization, and analysis. The platform processed all Chinese webpages of 2014 and found 85 506 seismicity reports.
Taking the Ludian earthquake in Yunnan as an example, we present the structuring process and results for the related web text, which can serve as an important reference for disaster relief and for analyzing public concern. With the platform, we demonstrate the seismic text structuring results and the social concern about the event across China; the platform can serve as a new tool for event information mining and analysis.
Keywords:semantic framework  text parsing  social concern about events  seismic events  spatial search engine  
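
As a rough illustration of the semantic framework described in the abstract, the basic elements and their sub-elements could be modeled as nested records and then flattened into one row of a relational table. This is a minimal sketch under stated assumptions, not the authors' schema: the class names, field names, the `depth_km` unit, and the `to_row` helper are all hypothetical, and the time and source elements are reduced to plain strings for brevity.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class SpatialElement:
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    depth_km: Optional[float] = None  # the unit is an assumption
    location: Optional[str] = None


@dataclass
class AttributeElement:
    cause: Optional[str] = None
    result: Optional[str] = None
    behavior: Optional[str] = None
    influence: Optional[str] = None


@dataclass
class SeismicEvent:
    event_type: str
    event_name: str
    time: Optional[str] = None    # time element, simplified to a string
    source: Optional[str] = None  # source element, simplified to a string
    spatial: SpatialElement = field(default_factory=SpatialElement)
    attributes: AttributeElement = field(default_factory=AttributeElement)


def to_row(event):
    """Flatten the nested elements into one flat dict, like a table row."""
    row = {}
    for key, value in asdict(event).items():
        if isinstance(value, dict):
            # Nested element: prefix each sub-element with the element name.
            for sub_key, sub_value in value.items():
                row[f"{key}_{sub_key}"] = sub_value
        else:
            row[key] = value
    return row


event = SeismicEvent(
    event_type="earthquake",
    event_name="Ludian earthquake",
    spatial=SpatialElement(location="Ludian, Yunnan"),
)
row = to_row(event)
```

Fields that the extraction step cannot fill stay `None`, which keeps incomplete reports representable in the same table layout.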
This article is indexed in CNKI and other databases.