A hybrid method for Chinese address segmentation |
| |
Authors: | Lin Li, Wei Wang, Biao He, Yu Zhang |
| |
Affiliation: | 1. School of Resource and Environmental Sciences, Wuhan University, Wuhan, China; 2. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen, China; 3. Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan, China; 4. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen, China; 5. College of Architecture and Urban Planning, Shenzhen University, Shenzhen, China |
| |
Abstract: | Chinese address segmentation is a major challenge in geocoding for geographic information systems. Most previous studies have relied on predefined gazetteers and have not considered the information contained in a raw address corpus. In this paper, a hybrid method that combines rule-based and statistical methods is proposed for Chinese address segmentation without a predefined gazetteer. The approach uses statistical methods to extract address information from a raw address corpus and a rule-based method to segment Chinese addresses. Two typical statistical methods, and their combinations with rule-based methods, are compared with the hybrid method in an experiment involving approximately 460,000 address items from Shenzhen City, China. The experimental results indicate that the proposed method achieves an F-score of over 0.8, which is higher than those of the existing methods compared, supporting the validity of the proposed method. |
| |
Keywords: | Geocoding; Chinese address segmentation; without gazetteers; hybrid method |
|
|
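
The abstract reports an F-score but does not say how it is computed. The sketch below shows one common way to score a segmentation against a reference, assuming that a predicted segment counts as correct only when both of its character boundaries match a reference segment. That matching criterion, the function names `spans` and `segmentation_f_score`, and the sample address are illustrative assumptions, not details taken from the paper.

```python
def spans(segments):
    """Convert a list of segments into a set of (start, end) character spans."""
    result, pos = set(), 0
    for seg in segments:
        result.add((pos, pos + len(seg)))
        pos += len(seg)
    return result


def segmentation_f_score(predicted, gold):
    """Return (precision, recall, F-score) under exact span matching (assumed criterion)."""
    pred_spans, gold_spans = spans(predicted), spans(gold)
    correct = len(pred_spans & gold_spans)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical address, for illustration only (not from the paper's data set).
gold = ["深圳市", "福田区", "深南大道", "1001号"]
predicted = ["深圳市", "福田区", "深南", "大道", "1001号"]

precision, recall, f_score = segmentation_f_score(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

Here the predicted segmentation splits the road name in two, so three of its five segments match the four reference segments.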