Institution: | 1. Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions (Henan University), Ministry of Education, Kaifeng, China; College of Geography and Environmental Science, Henan University, Kaifeng, China
2. Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions (Henan University), Ministry of Education, Kaifeng, China
3. Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions (Henan University), Ministry of Education, Kaifeng, China; College of Geography and Environmental Science, Henan University, Kaifeng, China; Urban Big Data Institute, Henan University, Kaifeng, China
4. Key Laboratory of Geospatial Technology for the Middle and Lower Yellow River Regions (Henan University), Ministry of Education, Kaifeng, China; College of Geography and Environmental Science, Henan University, Kaifeng, China; Henan Industrial Technology Academy of Spatiotemporal Big Data, Henan University, Zhengzhou, China; Henan Technology Innovation Center of Spatiotemporal Big Data, Henan University, Zhengzhou, China
5. Henan Industrial Technology Academy of Spatiotemporal Big Data, Henan University, Zhengzhou, China; Henan Technology Innovation Center of Spatiotemporal Big Data, Henan University, Zhengzhou, China; School of Computer and Information Engineering, Henan University, Kaifeng, China |
Abstract: | Increasing concern for urban public safety has led to the deployment of large numbers of surveillance cameras in open spaces such as city squares, stations, and shopping malls. Efficiently detecting crowd dynamics in these urban open spaces from multi-viewpoint surveillance video remains a fundamental problem in urban security. Existing methods that extract features from video images have made significant progress in single-camera image space. However, surveillance videos are geotagged, carrying location information, and few studies have fully exploited their spatial semantics. In this study, object trajectories from multi-viewpoint videos are fused in geographic space for crowd sensing and spatiotemporal analysis. The YOLOv3-DeepSORT model is used to detect pedestrians and extract their image coordinates; spatial semantics (such as the positions of pedestrians in the camera's field of view) are then combined to build a projection transformation matrix that maps the objects recorded by each single camera into geographic space. Trajectories from the multi-viewpoint videos are fused based on location, time, and direction features to generate complete pedestrian trajectories. Crowd spatial pattern analysis, density estimation, and motion trend analysis are then performed. Experimental results demonstrate that the proposed method can identify crowd dynamics and analyze their spatiotemporal patterns in an urban open space from a global perspective, providing a means of intelligent spatiotemporal analysis of geotagged videos. |
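The abstract does not specify how the projection matrix is built or how the location, time, and direction criteria are combined. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation: the projection transformation is modeled as a planar homography estimated from at least four image/ground control points with the direct linear transform (DLT); a pedestrian's ground position is assumed to be the bottom-centre of its detection box; and the function names (`fit_homography`, `image_to_geo`, `should_fuse`, `merge_tracks`) and thresholds (`max_dist`, `max_gap`, `max_angle`) are hypothetical.

```python
import numpy as np


def fit_homography(img_pts, geo_pts):
    """Estimate a 3x3 homography H mapping pixels (u, v) to ground (x, y)
    from >= 4 control-point pairs via the DLT. Assumes a single ground plane."""
    rows = []
    for (u, v), (x, y) in zip(np.asarray(img_pts, float), np.asarray(geo_pts, float)):
        rows.append([-u, -v, -1.0, 0.0, 0.0, 0.0, x * u, x * v, x])
        rows.append([0.0, 0.0, 0.0, -u, -v, -1.0, y * u, y * v, y])
    # Flattened H is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def image_to_geo(H, pts):
    """Project (n, 2) pixel points (assumed: bottom-centre of each detection
    box) into ground coordinates using homogeneous coordinates."""
    pts = np.asarray(pts, float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def heading(track):
    """Overall heading (radians) from the first to the last ground point."""
    d = track[-1] - track[0]
    return np.arctan2(d[1], d[0])


def should_fuse(track_a, t_a, track_b, t_b,
                max_dist=1.0, max_gap=2.0, max_angle=np.pi / 4):
    """Hypothetical rule: two ground-space tracks from different cameras are the
    same pedestrian if they are close in time, location, and direction."""
    track_a, track_b = np.asarray(track_a, float), np.asarray(track_b, float)
    t_a, t_b = np.asarray(t_a, float), np.asarray(t_b, float)
    # Time: gap between the two observation intervals (negative means overlap).
    gap = max(t_a[0], t_b[0]) - min(t_a[-1], t_b[-1])
    if gap > max_gap:
        return False
    # Location: distance between the pair of samples that are closest in time.
    i, j = np.unravel_index(np.argmin(np.abs(t_a[:, None] - t_b[None, :])),
                            (len(t_a), len(t_b)))
    if np.linalg.norm(track_a[i] - track_b[j]) > max_dist:
        return False
    # Direction: heading difference wrapped into (-pi, pi].
    diff = np.angle(np.exp(1j * (heading(track_a) - heading(track_b))))
    return abs(diff) <= max_angle


def merge_tracks(track_a, t_a, track_b, t_b):
    """Concatenate two matched tracks into one time-ordered trajectory.
    Overlapping samples are simply interleaved here, not averaged."""
    t = np.concatenate([np.asarray(t_a, float), np.asarray(t_b, float)])
    order = np.argsort(t, kind="stable")
    return np.vstack([track_a, track_b])[order], t[order]
```

The DLT above omits the coordinate normalization a production estimator would apply; OpenCV's `cv2.findHomography` is a well-tested alternative that also supports RANSAC for noisy control points. Comparing the samples closest in time keeps the location test meaningful both when two cameras observe a pedestrian simultaneously and when there is a short hand-over gap between their fields of view.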