Table 1 Comparison of mainstream location inferring algorithms

From: Location inference for hidden population with online text analysis

 

| | The Gazetteer-based Method | Part-of-speech (POS) Tagging | Named Entity Recognition (NER) |
| --- | --- | --- | --- |
| Features | Identifying geographical names according to external location knowledge (e.g., dictionary containing names of cities and states) | Recognizing geographical terms in a corpus based on the part of speech of its component words, according to both their definitions and contexts | Identifying and classifying words mentioned in unstructured corpus as pre-defined entity classes, i.e., persons, locations, organizations, etc. based on HMM models |
| Strengths | It is a popular approach when looking for locations in Web text [45]; the algorithm is simple and easy to implement | Part-of-speech information is a pre-requisite in many NLP (Natural Language Processing) algorithms | The algorithm is fast, and suitable for processing large-scale datasets |
| Limitations | Largely relies on the gazetteer, and easily affected by external geographic databases [46,47,48] | Vulnerable to linguistic errors and idiosyncratic style [38]; algorithm accuracy is relatively low | Cannot identify names of local streets or buildings, non-standard place abbreviations and misspellings which are common in microtext |
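
As an illustration of the gazetteer-based method compared in Table 1, the following is a minimal sketch of dictionary lookup with longest-match-first scanning. It is not the implementation from the article: the `GAZETTEER` entries, the `find_locations` function name, and the simple lowercase tokenization are all assumptions made for the example, and a real system would load its place names from an external geographic database.

```python
import re

# Hypothetical toy gazetteer (assumption for illustration only).
# A real one would be loaded from an external geographic database.
GAZETTEER = {
    "new york": "city",
    "boston": "city",
    "california": "state",
    "texas": "state",
}

# Longest entry, in words, so multi-word names are tried first.
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)


def find_locations(text):
    """Return (name, type) pairs found by longest-match dictionary lookup."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i, then shorter ones.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                found.append((candidate, GAZETTEER[candidate]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return found


print(find_locations("Moved from New York to Texas last year"))
print(find_locations("heading to bostn and nyc"))
```

The second call returns an empty list because neither "bostn" nor "nyc" appears in the dictionary, which shows in miniature how strongly this approach depends on the coverage of its gazetteer.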