Published by De Gruyter September 3, 2020

Exploring Evolution of Public Opinions on Tianya Club Using Dynamic Topic Models

Zhihua Yan and Xijin Tang

Abstract

Online media have brought tremendous changes to civic life, public opinion, and government administration. Compared with traditional media, online media not only allow individuals to browse news and express their views more freely, but also accelerate the transmission of opinions and expand their influence. As public opinion may arouse societal unrest, detecting the primary topics and uncovering how public opinion evolves is valuable for societal administration. Various algorithms have been developed to deal with the huge volume of unstructured online media data. In this study, a dynamic topic model is employed to explore the evolution of topic content and topic prevalence, using the original posts published from 2013 to 2017 on the Tianya Zatan Board of Tianya Club, one of the most popular bulletin board systems (BBS) in China. Based on semantic similarities, topics are grouped into three themes: family life, societal affairs, and government administration. The evolution of both topic prevalence and topic content is affected by emergent incidents. Topics on family life become popular, while the themes "societal affairs" and "government administration", which have larger standard deviations, are more likely to be influenced by emergent hot events. Representing content evolution as a monthly pairwise distance matrix makes change points in topic content easy to identify.
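As an illustration of the monthly pairwise distance matrix mentioned above, the following is a minimal sketch rather than the authors' implementation. The abstract does not state which distance is used; the Jensen-Shannon distance is assumed here, and the function names, the toy word distributions, and the rule of taking the largest month-to-month distance as the change point are all hypothetical.

```python
import numpy as np

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    # Clamp at zero to guard against tiny negative rounding errors.
    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))

def pairwise_distance_matrix(monthly):
    """Distance between one topic's word distribution in every pair of months."""
    monthly = np.asarray(monthly, dtype=float)
    n = monthly.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = js_distance(monthly[i], monthly[j])
    return dist

# Hypothetical toy data: one topic over a 4-word vocabulary in 4 months.
# The content shifts between the second and third month.
toy = [
    [0.70, 0.20, 0.05, 0.05],
    [0.68, 0.22, 0.05, 0.05],
    [0.10, 0.10, 0.40, 0.40],
    [0.12, 0.08, 0.40, 0.40],
]
dist = pairwise_distance_matrix(toy)

# Distances between consecutive months sit on the first superdiagonal.
consecutive = np.diag(dist, k=1)
# Assumed rule: the change point is the month (0-based) that follows
# the largest consecutive distance.
change_month = int(np.argmax(consecutive)) + 1
print(np.round(dist, 3))
print("change point at month index", change_month)
```

In such a matrix, months with similar topic content form blocks of small distances along the diagonal, and a boundary between two blocks marks a candidate change point.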


Supported by the National Key Research and Development Program of China (2016YFB1000902) and the National Natural Science Foundation of China (71731002 & 71971190)


Acknowledgements

The authors gratefully acknowledge the editor and two anonymous referees for their insightful comments and helpful suggestions that led to a marked improvement of the article.

Received: 2019-12-22
Accepted: 2020-05-29
Published Online: 2020-09-03
Published in Print: 2020-08-26

© 2020 Walter De Gruyter GmbH, Berlin/Boston

DOI: 10.21078/JSSI-2020-309-16