Identification of points of tourist interest through social network data
DOI: https://doi.org/10.5821/ctv.8755
Keywords: Social media, tourism, Big Data, urban spatial analysis
Abstract
In most European cities, mass tourism is becoming a major industry, but the effects of increasing visitor pressure are beginning to be perceived negatively by residents. Local authorities must implement policies that modulate these activities so that they can coexist and become compatible with the daily life of the local population. However, these policies must take into account that tourist activity tends to concentrate in very specific areas, which must be clearly identified before implementing specific taxation schemes or requiring a mandatory license for certain activities. To designate these areas clearly, this research proposes using social media data, an emerging source of information for stakeholders in their decision-making process, to identify hot spots of visitor activity in Barcelona. The case study uses more than 75,000 geotagged pictures collected from the Panoramio picture-sharing community through its Application Programming Interface.
This data-driven approach to urban analysis must address some of what several authors describe as “the 4 V’s of Big Data”: volume, variety, velocity, and veracity. This research is especially sensitive to two of them, volume and veracity, which must be addressed accordingly. Regarding volume, the large number of locations made conventional spatial data analysis challenging; regarding veracity, the informal nature of the source data reduced confidence in location precision. The methodology followed a principled approach based on spatial statistics, analyzing the spatial distribution of locations with kernel density estimation (KDE), with the bandwidth chosen to capture general trends at the intended scale of analysis and to reduce spatial noise.
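As an illustration of the KDE step, the following is a minimal sketch of a fixed-bandwidth Gaussian KDE evaluated on a regular grid. It assumes point coordinates are already projected to metres; the function name `kde_grid`, its parameters, and the Gaussian kernel are illustrative choices, not the authors' implementation. The brute-force evaluation is processed in chunks of points to bound memory; for tens of thousands of locations, a binned or FFT-based KDE would usually be faster.

```python
import numpy as np

def kde_grid(points, xmin, xmax, ymin, ymax, cell, bandwidth, chunk=200):
    """Hypothetical helper: fixed-bandwidth Gaussian KDE on a regular grid.

    points    -- (n, 2) array of projected x/y coordinates in metres (assumed)
    cell      -- grid resolution in metres
    bandwidth -- kernel standard deviation in metres; larger values smooth
                 out local noise and keep only broader spatial trends
    Returns density per square metre at cell centres (integrates to ~1).
    """
    xs = np.arange(xmin + cell / 2, xmax, cell)   # cell-centre x coordinates
    ys = np.arange(ymin + cell / 2, ymax, cell)   # cell-centre y coordinates
    gx, gy = np.meshgrid(xs, ys)                  # shape (len(ys), len(xs))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    dens = np.zeros(grid.shape[0])
    # Accumulate kernel contributions chunk by chunk to limit memory use.
    for start in range(0, len(points), chunk):
        p = points[start:start + chunk]
        d2 = ((grid[:, None, :] - p[None, :, :]) ** 2).sum(axis=2)
        dens += np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)
    # Normalise by the 2-D Gaussian constant and the number of points.
    dens /= 2 * np.pi * bandwidth ** 2 * len(points)
    return dens.reshape(gx.shape)

# Usage with synthetic photo locations (not the Panoramio data).
rng = np.random.default_rng(0)
photos = rng.normal(loc=1000.0, scale=150.0, size=(500, 2))
surface = kde_grid(photos, 0, 2000, 0, 2000, cell=20, bandwidth=100)
```

Because the result is a density per square metre, summing it and multiplying by the cell area gives approximately 1 when the grid covers the points, which is what makes surfaces built from different point sets comparable later on.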
However, analyzing the density of photographs in isolation ignores the spatial variation in the intensity of use across the city, since a greater concentration of photographs is expected in areas of higher activity, ceteris paribus. The level of activity was estimated from the inventory of businesses in the city of Barcelona, which was considered a good indicator of the intensity of use in different areas of the city, under the premise that a higher number of businesses indicates increased pedestrian activity.
To analyze tourist attractiveness while accounting for this spatial heterogeneity in the use of the city, the spatial distributions of photographs and of street-level commercial activities (as an indirect indicator of intensity of use) were compared. To make both distributions comparable, a methodology was developed to normalize the values obtained from the KDE, yielding an indicator (relative attractiveness) that is robust to the resolution of discretization, the number of locations, and their spatial distribution. The differences between the two standardized surfaces were classified on a divergent scale to identify and quantify the hot and cold spots of tourist activity, highlighting both the areas of outstanding tourist pressure and the “deserts” almost devoid of visitors.
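The abstract does not spell out the normalization, so the sketch below assumes one plausible scheme purely for illustration: each KDE surface is divided by its mean over the grid, which cancels the total number of locations and, for a density surface, makes values roughly independent of cell size. Relative attractiveness is then taken as the difference between the normalized photo and business surfaces and binned on a symmetric divergent scale. The function names and the class breaks are hypothetical.

```python
import numpy as np

def normalise(surface):
    """Assumed scheme: rescale a density surface so its grid mean is 1."""
    return surface / surface.mean()

def relative_attractiveness(photo_kde, business_kde):
    """Positive where photos exceed what commercial activity predicts."""
    return normalise(photo_kde) - normalise(business_kde)

def classify_divergent(diff, breaks=(-1.0, -0.5, 0.5, 1.0)):
    """Map differences to classes -2..2 on a divergent scale.

    -2/-1 -> cold spots (fewer photos than expected), 0 -> neutral,
    +1/+2 -> hot spots. The break values are illustrative only.
    """
    return np.digitize(diff, breaks) - 2

# Toy 2x2 surfaces standing in for the photo and business KDEs.
photo = np.array([[4.0, 0.0], [2.0, 2.0]])
business = np.array([[1.0, 3.0], [1.0, 3.0]])
classes = classify_divergent(relative_attractiveness(photo, business))
```

Under this assumption, multiplying either input surface by a constant (for example, collecting ten times as many photographs with the same spatial pattern) leaves the classification unchanged, which is the kind of robustness to the number of locations the text describes.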
The proposed approach opens an emerging avenue of research in a traditionally data-scarce field of study and suggests that social media can become a valuable source of data in urban research. However, while the amount of data available is unprecedented, it also requires new analysis techniques, as well as specific domain knowledge in the data collection, cleaning, analysis, and visualization processes, to provide accurate and meaningful results.