This article is driven by a methodological interest in the opportunities and challenges of applying an automated text mining approach, in particular sentiment analysis, to several tourism blogs at the same time. The study asks to what extent advanced computational methods can improve the acquisition and analysis of unstructured data sets stemming from various blogs and forums. Furthermore, the authors explore to what extent sentiment analysis can objectify the qualitative results of an earlier analysis by the authors, which used content analysis based on thematic coding. For the specific tourism research question in this paper, a new approach is proposed that combines sentiment analysis, supervised learning, and dimensionality reduction in order to identify terms that load strongly on specific emotions. The contribution indicates that advanced computational methods have their own specific constraints but are also able to provide a richer and deeper analysis following a quantitative approach. Several issues have to be taken into account, such as data protection constraints, the need for data cleaning (for example word stemming), dimension reduction (for example the removal of custom stop words), and the development of decent ontologies. At the same time, owing to its standardised procedure, the quantitative method provides a less subjective insight into the given content, although it is no less time-consuming than traditional content analysis.
About the authors
After his PhD he was head of scoring and rating models at Santander Consumer Bank for seven years. He teaches statistics and research methods to students of leisure and tourism management at Stralsund University of Applied Sciences. He is head of the competence center for artificial intelligence and machine learning at the Institute of Applied Computer Sciences (IACS), Stralsund, and head of the working group on data analysis and numerical classification of the GfKl Data Science Society. His current research interests cover machine learning (autoML, explainable AI and algorithmic fairness), credit scoring, text mining, statistical computing and data literacy.
After graduating from high school she realized her dream of traveling through South America. While doing internships in the tourism and hospitality sector in Venezuela and Brazil, she decided to study Tourism- and Eventmanagement at EBC University of Applied Sciences in Berlin. After her B.A., instead of pursuing a Master's degree, she completed a second Bachelor's degree in applied computer science at the University of Applied Sciences (HTW) Berlin. There she was especially interested in text mining and artificial intelligence.
Werner Gronau is Professor for Tourism, Travel & Transport at Stralsund University of Applied Sciences in Germany. He has extensive experience in tourism research and education, with previous posts and visiting professorships in Cyprus, Australia and Italy. He has worked on several research projects funded by various institutions, such as the European Commission, the DFG (German Research Foundation) and the German Ministry of Research, and has presented the results at international conferences and in various journals and books. He acts as Editor of the Journal of Tourism Sciences and serves as a reviewer for several tourism and transport journals, such as the Journal of Sustainable Tourism and the Journal of Transport Geography.
Dr. Tine Lehmann is Professor of International Business at the University of Applied Sciences (HTW) Berlin. She received her PhD from the University of Passau. Before joining the HTW she worked for several years in economic and tourism development projects in Southeast Europe. She combined this function with her research interest in institutional development. Her research focuses on analyzing distorted institutional contexts and the potential to deal with institutional voids.
5 Appendix: Web-Scraping Source Code
The developed web-scraping program extracts the unstructured data into a common post data structure. The program stores this data in text files, using a clear directory structure that makes the origin of the data comprehensible for humans. This directory structure is shown in Figure 02. The post data structure contains the following information: the website URL, the message text of the blog post, the date the message was written, and the language of the message. Figure 02 illustrates the web scraping process: the top left image shows the part of the website that contains the desired information, including the message text and the date. The right part shows the underlying HTML structure for this part of the website. The lower part of Figure 02 shows the part of the web scraper program that extracts this information.
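The extraction step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' program: the class names `message-text` and `message-date`, the helper names `Post`, `PostParser`, `parse_post` and `save_post`, the directory layout `<root>/<site>/<language>/` and the example URL are all assumptions made for this sketch. The language of the message is passed in by the caller here, since the source does not state how it was determined.

```python
import tempfile
from dataclasses import dataclass
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

# Tags without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


@dataclass
class Post:
    url: str       # website URL the post was taken from
    text: str      # message text of the blog post
    date: str      # date the message was written, as shown on the page
    language: str  # language of the message (supplied by the caller)


class PostParser(HTMLParser):
    """Collects text inside elements carrying the (hypothetical)
    CSS classes 'message-text' and 'message-date'."""

    FIELDS = {"message-text": "text", "message-date": "date"}

    def __init__(self):
        super().__init__()
        self.values = {"text": [], "date": []}
        self._field = None  # field currently being captured, if any
        self._depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.FIELDS:
                self._field = self.FIELDS[cls]
                self._depth = 1
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.values[self._field].append(data)


def parse_post(html, url, language):
    """Extract one post from an HTML fragment into the Post structure."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.values["text"]).split())
    date = "".join(parser.values["date"]).strip()
    return Post(url=url, text=text, date=date, language=language)


def save_post(post, root, index):
    """Write the post to <root>/<site>/<language>/post_<index>.txt."""
    host = urlparse(post.url).netloc or "unknown-site"
    folder = Path(root) / host / post.language
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"post_{index:04d}.txt"
    header = f"url: {post.url}\ndate: {post.date}\nlanguage: {post.language}\n\n"
    path.write_text(header + post.text + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    sample = (
        '<div class="post"><span class="message-date">2019-05-12</span>'
        '<div class="message-text">Lovely <b>beach</b>,<br> friendly people.</div></div>'
    )
    post = parse_post(sample, "https://blog.example.org/thread/1", "en")
    with tempfile.TemporaryDirectory() as tmp:
        print(save_post(post, tmp, 1).read_text(encoding="utf-8"))
```

Keeping the site and the language in the directory path, and the URL and date in the file header, is one way to keep the origin of each text file traceable by a human reader. A real scraper would also need to respect the data protection and terms-of-use constraints mentioned in the abstract.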
© 2021 Walter de Gruyter GmbH, Berlin/Boston