Modelling the Spatial Distribution of Epidemic by Search Engine Data (利用搜索引擎数据模拟疾病空间分布)
Citation: XIAO Yi, HE Zongyi, MIAO Jing, PAN Feng, YANG Hao. Modelling the Spatial Distribution of Epidemic by Search Engine Data[J]. Bulletin of Surveying and Mapping, 2018(2): 94-98.
Authors: XIAO Yi, HE Zongyi, MIAO Jing, PAN Feng, YANG Hao
Institution: 1. School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China; 2. Wuhan Geomatic Institute, Wuhan 430022, China; 3. Xi'an Information Technique Institute of Surveying and Mapping, Xi'an 710054, China
Funding: National Natural Science Foundation of China; Humanities and Social Sciences Research Project of the Ministry of Education
Abstract: The Internet records people's daily lives, and analyzing and mining location-tagged search engine data can reveal the geographic information hidden in it. In this paper, we calculated the correlation between monthly influenza case counts in each Chinese province and the Baidu search index of related keywords. The Baidu indices of the more highly correlated keywords were selected as explanatory variables, with the number of influenza cases as the dependent variable. After principal component analysis (PCA) was used to eliminate multicollinearity among the explanatory variables, spatial distribution models of influenza cases were built with ordinary least squares regression (OLS), geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR). Goodness of fit rose from 0.737 for OLS and 0.915 for GWR to 0.959 for GTWR, and the Akaike information criterion (AIC) also indicated that GTWR clearly outperforms both OLS and GWR. Validation results showed that the GTWR model can accurately identify areas with high influenza incidence. Combining this method with search engine data therefore models the spatial distribution of influenza well and provides a prediction model and statistical explanation for research in spatial epidemiology.
Keywords: geographically and temporally weighted regression; search engine data; influenza; spatial distribution model
Received: 2017-07-18
Revised: 2017-08-30
Indexed in: CNKI, Wanfang Data, and other databases.
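The abstract names the modelling steps (correlation screening of keywords, PCA, then OLS, GWR and GTWR compared by goodness of fit and AIC) but not the kernel, bandwidths, or selection threshold. The following is a minimal NumPy sketch of that workflow under stated assumptions, not the authors' code: a Gaussian kernel on the spatiotemporal distance d² = λ·d_s² + μ·d_t² often used for GTWR, one fixed bandwidth h, a hypothetical correlation cutoff of 0.6, and the corrected AIC (AICc) form commonly used for GWR. OLS and GWR fall out as special cases (λ = μ = 0, and μ = 0). All function names, parameter values and the synthetic data are hypothetical, and the sketch does not reproduce the reported values 0.737, 0.915 and 0.959.

```python
import numpy as np


def select_keywords(cases, indices, r_min=0.6):
    """Return columns of `indices` whose Pearson r with `cases` is >= r_min.

    r_min is a hypothetical cutoff; the paper only says 'relatively high' correlation.
    """
    r = np.array([np.corrcoef(indices[:, j], cases)[0, 1] for j in range(indices.shape[1])])
    return np.flatnonzero(r >= r_min), r


def pca_scores(X, n_components):
    """Standardize X and project it onto its leading principal components (via SVD).

    The score columns are mutually uncorrelated, which removes multicollinearity.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T


def local_fit(X, y, coords, times, h, lam, mu):
    """Locally weighted least squares at every observation (GTWR-style sketch).

    Gaussian kernel on d2 = lam * spatial_dist^2 + mu * time_dist^2.
    lam = mu = 0 -> OLS (all weights 1); mu = 0 -> GWR; lam, mu > 0 -> GTWR.
    Returns fitted values and trace(S), the effective number of parameters.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    fitted = np.empty(n)
    trace_s = 0.0
    for i in range(n):
        d2 = lam * np.sum((coords - coords[i]) ** 2, axis=1) + mu * (times - times[i]) ** 2
        w = np.exp(-d2 / h ** 2)  # kernel weights; w[i] == 1
        AtW = A.T * w  # A^T diag(w) without building an n x n matrix
        M = np.linalg.solve(AtW @ A, AtW)  # (A^T W A)^{-1} A^T W, shape (k, n)
        row = A[i] @ M  # i-th row of the hat matrix S
        fitted[i] = row @ y  # local prediction a_i^T beta_i
        trace_s += row[i]
    return fitted, trace_s


def r_squared(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)


def aicc(y, fitted, trace_s):
    """Corrected AIC in the form commonly used for GWR; lower is better."""
    n = len(y)
    sigma = np.sqrt(np.sum((y - fitted) ** 2) / n)
    return (2 * n * np.log(sigma) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))


if __name__ == "__main__":
    # Synthetic stand-in for 31 provinces x 12 months (hypothetical data).
    rng = np.random.default_rng(42)
    n_prov, n_month = 31, 12
    prov_xy = rng.uniform(0, 10, size=(n_prov, 2))  # hypothetical province centroids
    coords = np.repeat(prov_xy, n_month, axis=0)
    times = np.tile(np.arange(n_month), n_prov).astype(float)
    signal = rng.gamma(2.0, 1.0, size=n_prov * n_month)  # latent influenza activity
    indices = np.column_stack([signal + rng.normal(0, s, signal.size) for s in (0.2, 0.3, 3.0)])
    slope = 1.0 + 0.2 * coords[:, 0] + 0.3 * np.sin(times / 2)  # varies in space and time
    cases = 5.0 + slope * signal + rng.normal(0, 0.3, signal.size)

    keep, r = select_keywords(cases, indices)
    Z = pca_scores(indices[:, keep], n_components=1)
    for name, lam, mu in [("OLS", 0.0, 0.0), ("GWR", 1.0, 0.0), ("GTWR", 1.0, 0.5)]:
        fitted, tr = local_fit(Z, cases, coords, times, h=3.0, lam=lam, mu=mu)
        print(f"{name}: R2={r_squared(cases, fitted):.3f} AICc={aicc(cases, fitted, tr):.1f}")
```

Treating OLS and GWR as special cases of one weighted fit keeps the three models directly comparable on the same AICc scale. In practice h, λ and μ would be chosen by minimizing AICc or a cross-validation score rather than fixed as in this sketch.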