Location inference for hidden population with online text analysis

Table 1 Comparison of mainstream location inference algorithms

	The Gazetteer-based Method	Part-of-speech (POS) Tagging	Named Entity Recognition (NER)
Features	Identifying geographical names by matching text against external location knowledge (e.g., a dictionary of city and state names)	Recognizing geographical terms in a corpus from the part of speech of their component words, taking into account both word definitions and context	Identifying words in an unstructured corpus and classifying them into predefined entity classes (e.g., persons, locations, organizations) using hidden Markov models (HMMs)
Strengths	A popular approach for finding locations in Web text [45]; simple and easy to implement	Part-of-speech information is a prerequisite for many natural language processing (NLP) algorithms	Fast and suitable for processing large-scale datasets
Limitations	Relies heavily on the gazetteer, so results are sensitive to the external geographic database used [46,47,48]	Vulnerable to linguistic errors and idiosyncratic writing styles [38]; relatively low accuracy	Cannot identify local street or building names, non-standard place abbreviations, or misspellings, all of which are common in microtext
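
A minimal sketch of the gazetteer-based method, assuming a small hypothetical in-memory dictionary (`GAZETTEER`) in place of an external geographic database. The text is lower-cased and split into word tokens, and at each token the longest matching gazetteer entry is taken, so a multi-word name such as "new york" is preferred over "york". The names `GAZETTEER` and `find_locations` are illustrative only.

```python
import re

# Hypothetical toy gazetteer (lower-cased name -> place type). A real system
# would load an external geographic database of city and state names instead.
GAZETTEER = {
    "new york": "city",
    "york": "city",
    "san francisco": "city",
    "california": "state",
    "texas": "state",
}


def find_locations(text, gazetteer=GAZETTEER):
    """Greedy longest-match lookup of gazetteer names in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(name.split()) for name in gazetteer)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate n-gram first, so "new york" wins over "york".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                found.append((candidate, gazetteer[candidate]))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return found


print(find_locations("Moving from New York to San Francisco, CA next week"))
# [('new york', 'city'), ('san francisco', 'city')]
```

The abbreviation "CA" is missed simply because it is not in the dictionary, which shows how directly the output depends on the coverage of the gazetteer.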
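
The POS-based approach can be illustrated with a simple, hypothetical rule: treat a run of proper nouns (Penn Treebank tags `NNP`/`NNPS`) that directly follows a locative preposition as a candidate place name. The sketch assumes the tokens have already been tagged by an external POS tagger; `LOCATIVE_PREPS` and `candidate_places` are illustrative names, and the rule is an assumption, not necessarily the one used in the cited work.

```python
# Hypothetical set of prepositions that often introduce a place name.
LOCATIVE_PREPS = {"in", "at", "near", "from", "to"}


def candidate_places(tagged):
    """Return proper-noun phrases that directly follow a locative preposition.

    `tagged` is a list of (word, tag) pairs using Penn Treebank tags,
    assumed to come from an external POS tagger.
    """
    places = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        # Penn Treebank tags "to" as TO and other prepositions as IN.
        if tag in ("IN", "TO") and word.lower() in LOCATIVE_PREPS:
            j = i + 1
            # Collect the run of proper nouns after the preposition.
            while j < len(tagged) and tagged[j][1] in ("NNP", "NNPS"):
                j += 1
            if j > i + 1:
                places.append(" ".join(w for w, _ in tagged[i + 1:j]))
            i = j
        else:
            i += 1
    return places


tagged = [("Meet", "VB"), ("me", "PRP"), ("at", "IN"),
          ("Union", "NNP"), ("Square", "NNP"), ("tonight", "NN")]
print(candidate_places(tagged))  # ['Union Square']
```

Because the rule depends entirely on the tags, a misspelled or lower-cased place name that the tagger labels as a common noun (`NN`) is skipped, which is one way the vulnerability to linguistic errors can appear.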
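
For the HMM-based NER approach, the following is a minimal, hypothetical sketch of how an HMM tagger decodes a label sequence with the Viterbi algorithm. It uses two invented states (`O` for non-entity words, `LOC` for locations) and hand-set toy probabilities rather than parameters estimated from an annotated corpus; `START`, `TRANS`, `EMIT`, `UNK`, and `viterbi` are illustrative names, not taken from the source.

```python
import math

STATES = ("O", "LOC")  # O = outside any entity, LOC = location

# Toy, hand-set parameters (hypothetical, not normalized); a real NER
# model would estimate them from an annotated corpus.
START = {"O": 0.8, "LOC": 0.2}
TRANS = {
    "O": {"O": 0.8, "LOC": 0.2},
    "LOC": {"O": 0.6, "LOC": 0.4},
}
EMIT = {
    "O": {"i": 0.2, "live": 0.2, "in": 0.3, "near": 0.1},
    "LOC": {"boston": 0.5, "denver": 0.4},
}
UNK = 1e-4  # fallback probability for a word the state never emitted


def emit_logp(state, word):
    return math.log(EMIT[state].get(word.lower(), UNK))


def viterbi(words):
    """Return the most likely O/LOC label sequence for `words` (log-space Viterbi)."""
    if not words:
        return []
    # Each column maps state -> (best log-prob of a path ending there, previous state).
    columns = [{s: (math.log(START[s]) + emit_logp(s, words[0]), None) for s in STATES}]
    for word in words[1:]:
        prev = columns[-1]
        col = {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev[p][0] + math.log(TRANS[p][s]))
            score = prev[best_prev][0] + math.log(TRANS[best_prev][s]) + emit_logp(s, word)
            col[s] = (score, best_prev)
        columns.append(col)
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


words = "I live in Boston".split()
print(list(zip(words, viterbi(words))))
# [('I', 'O'), ('live', 'O'), ('in', 'O'), ('Boston', 'LOC')]
```

With these toy parameters, an out-of-vocabulary street name such as "Elm" in "I live near Elm" receives only the fallback probability `UNK` in both states, so the transition probabilities decide and it is labeled `O`. This mirrors the limitation on local street names listed in Table 1.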

ISSN: 1476-072X