Abstract: China is one of the countries where landslides have caused the most fatalities in recent decades. The threat that landslide disasters pose to people may become even greater in the future, owing to climate change and the increasing urbanization of mountainous areas. A reliable national-scale model of rainfall-induced landslide susceptibility is therefore highly relevant for identifying regions that are more and less prone to landsliding and for developing suitable risk-mitigation strategies. However, relying on imperfect landslide data is inevitable when modelling landslide susceptibility for such a large study area. The purpose of this study is to investigate the influence of incomplete landslide data on national-scale statistical landslide susceptibility modelling for China. In this context, we aim to explore the benefit of mixed-effects modelling for counterbalancing the associated bias propagation. Six influencing factors, namely lithology, slope, soil moisture index, mean annual precipitation, land use, and geological environment regions, were selected based on an initial exploratory data analysis. Three sets of influencing variables were designed to represent different solutions for dealing with spatially incomplete landslide information: Set 1 (disregards the presence of incomplete landslide information), Set 2 (excludes factors related to the incompleteness of the landslide data), and Set 3 (accounts for factors related to the incompleteness via random effects). The variable sets were then introduced into a generalized additive model (GAM: Sets 1 and 2) and a generalized additive mixed-effects model (GAMM: Set 3) to establish three national-scale statistical landslide susceptibility models: Models 1, 2 and 3. The models were evaluated using the area under the receiver operating characteristic curve (AUROC) obtained from spatially explicit and non-spatial cross-validation. The spatial prediction patterns produced by the models were also investigated. The results show that the incompleteness of the landslide inventory had a substantial impact on the outcomes of the statistical landslide susceptibility models. The cross-validation results provided evidence that all three models performed well in predicting model-independent landslide information, with median AUROCs ranging from 0.8 to 0.9. However, although Model 1 reached the highest AUROCs in non-spatial cross-validation (median of 0.9), it did not provide the most plausible representation of landslide susceptibility. The Model 1 results were inconsistent with geomorphological process knowledge and reflected, to a large extent, the underlying data bias. The Model 2 susceptibility maps provided a less biased picture of landslide susceptibility. However, a lower predicted likelihood of landslide occurrence still appeared in areas known to be underrepresented in the landslide data (e.g., the Kuenlun Mountains in the northern Tibetan Plateau). The non-linear mixed-effects model (Model 3) reduced the impact of these biases best by introducing bias-describing variables as random effects. Among the three models, Model 3 was selected as the best national-scale susceptibility model for China, as it produced the most plausible portrayal of rainfall-induced landslide susceptibility and the highest spatially explicit predictive performance (median spatial cross-validation AUROC of 0.84) compared with the other two models (median AUROCs of 0.81 and 0.79, respectively). We conclude that ignoring the incompleteness of landslide inventories can lead to misleading modelling results and that applying non-linear mixed-effects models can reduce the propagation of such biases into the final results for very large areas.
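The evaluation design described in the abstract (median AUROC under non-spatial versus spatially explicit cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the fold schemes (random dealing versus contiguous strips along a hypothetical x-coordinate), all helper names, and the single-feature scorer standing in for the GAM/GAMM are hypothetical.

```python
import random
from statistics import mean, median
from typing import Callable, List, Sequence

Features = Sequence[float]
Scorer = Callable[[Features], float]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random landslide case (1) outscores a random non-landslide case (0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both landslide (1) and non-landslide (0) cases")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def random_folds(n: int, k: int, seed: int = 0) -> List[int]:
    """Non-spatial CV: shuffle observations and deal them into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for rank, i in enumerate(order):
        folds[i] = rank % k
    return folds


def spatial_strip_folds(xs: Sequence[float], k: int) -> List[int]:
    """Spatial CV (hypothetical scheme): sort by x-coordinate and cut into k contiguous strips."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    folds = [0] * len(xs)
    for rank, i in enumerate(order):
        folds[i] = rank * k // len(xs)
    return folds


def cv_aurocs(X, y, folds, fit) -> List[float]:
    """Train on all folds but one, score the held-out fold, and collect one AUROC per fold."""
    results = []
    for f in sorted(set(folds)):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        test_labels = [y[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # AUROC is undefined when a held-out fold contains only one class
        model = fit([X[i] for i in train], [y[i] for i in train])
        results.append(auroc([model(X[i]) for i in test], test_labels))
    return results


def fit_single_feature(X_train, y_train) -> Scorer:
    """Hypothetical stand-in for a GAM/GAMM: score by feature 0, oriented by the training mean difference."""
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    sign = 1.0 if not pos or not neg or mean(pos) >= mean(neg) else -1.0
    return lambda x: sign * x[0]


if __name__ == "__main__":
    xs = [float(i) for i in range(40)]               # hypothetical easting of each grid cell
    X = [[float((i * 7) % 40)] for i in range(40)]   # hypothetical slope-like predictor
    y = [1 if row[0] >= 20 else 0 for row in X]      # toy landslide presence/absence
    print("non-spatial median AUROC:", median(cv_aurocs(X, y, random_folds(40, 5), fit_single_feature)))
    print("spatial median AUROC:", median(cv_aurocs(X, y, spatial_strip_folds(xs, 5), fit_single_feature)))
```

On this perfectly separable toy data both schemes return a median AUROC of 1.0, so the sketch only shows the mechanics. With real inventories whose gaps are spatially clustered, randomly dealt folds let training and test cells share the same biased neighbourhoods, which is one reason non-spatial AUROCs can look more optimistic than spatially explicit ones.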