Abstract
Drawing on an analogy between discourse trees and syntactic trees, this paper selects 359 multi-paragraph Wall Street Journal articles from the Rhetorical Structure Theory (RST) Discourse Treebank and converts each discourse tree into three additional dependency trees, at the discourse, paragraph and sentence levels, whose elementary discourse units are exclusively clauses, sentences and paragraphs, respectively. It empirically tests and visualizes the genre-specific “summary + details”, or “inverted pyramid”, structuring of news discourse, and extends the notion of inverted-pyramid structuring to the paragraph and sentence levels. The results show that the body of a news report likewise has a schematic, top-down installment organization, with macro-propositions at the top. The paper also presents, visually and statistically, the rhetorical structures at the sentence level, which differ to some extent from grammatical structures. Because they are constructed in line with the compositionality criterion and hierarchy principle of RST, the converted trees offer distinct analytical advantages and open up new research prospects.
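The conversion from RST constituency trees to dependency trees can be illustrated with a minimal Python sketch. It assumes one widely used head-percolation convention: the head of a span is the head of its leftmost nucleus, and the heads of all other children (satellites and any further nuclei) attach to it. Everything in the block is a hypothetical illustration rather than the authors' procedure: the `Node` format, the toy tree, the `coarsen` step that projects clause-level arcs onto sentences or paragraphs, and `head_initial_ratio` as a rough proxy for "inverted pyramid" ordering.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Node:
    """Internal node of an RST tree (hypothetical, simplified format)."""
    relation: str            # e.g. "elaboration", "attribution", "list"
    children: List["Tree"]   # sub-spans in text order
    nuclearity: List[str]    # "N" or "S" for each child; at least one "N"


Tree = Union[int, Node]      # an int leaf is a 1-based EDU index in text order
Arc = Tuple[int, int, str]   # (head EDU, dependent EDU, relation)


def head(t: Tree) -> int:
    """Head EDU of a span: the head of its leftmost nucleus (assumed convention)."""
    if isinstance(t, int):
        return t
    for child, nuc in zip(t.children, t.nuclearity):
        if nuc == "N":
            return head(child)
    raise ValueError("every RST node needs at least one nucleus")


def to_dependencies(t: Tree) -> List[Arc]:
    """Attach the head of every non-head child to the head of its parent span."""
    arcs: List[Arc] = []

    def walk(node: Tree) -> None:
        if isinstance(node, int):
            return
        h = head(node)
        for child in node.children:
            ch = head(child)
            if ch != h:  # satellites and non-leftmost nuclei become dependents
                arcs.append((h, ch, node.relation))
            walk(child)

    walk(t)
    return arcs


def coarsen(arcs: List[Arc], unit_of: Dict[int, int]) -> List[Arc]:
    """Project EDU-level arcs onto larger units (sentences or paragraphs).

    Assumes each unit is a contiguous subtree of the RST tree, so arcs inside
    a unit are dropped and only arcs between units are kept.
    """
    return [(unit_of[h], unit_of[d], rel) for h, d, rel in arcs
            if unit_of[h] != unit_of[d]]


def head_initial_ratio(arcs: List[Arc]) -> float:
    """Share of arcs whose head precedes its dependent in the text."""
    if not arcs:
        return 0.0
    return sum(1 for h, d, _ in arcs if h < d) / len(arcs)


# Hypothetical toy tree: EDU 1 (the lead) is elaborated by EDUs 2-4, which form
# a list; EDU 3 ("officials said") is an attribution satellite of EDU 4.
toy = Node("elaboration",
           [1, Node("list",
                    [2, Node("attribution", [3, 4], ["S", "N"])],
                    ["N", "N"])],
           ["N", "S"])

arcs = to_dependencies(toy)
print(arcs)  # [(1, 2, 'elaboration'), (2, 4, 'list'), (4, 3, 'attribution')]
print(round(head_initial_ratio(arcs), 3))  # 0.667
print(coarsen(arcs, {1: 1, 2: 2, 3: 3, 4: 3}))  # [(1, 2, 'elaboration'), (2, 3, 'list')]
```

Under this assumed convention, a higher head-initial ratio means that more governing units precede the units that depend on them, which is the ordering a top-down "summary + details" organization would predict; the measures actually used in the paper may differ.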
Funding: This work was supported by the Department of Education of Zhejiang Province, China (Grant No. Y201223584), and the National Social Science Foundation of China (Grant No. 11&ZD188).
About the authors
Hongxin Zhang is a lecturer and PhD candidate at Zhejiang University, China. Her research interests include quantitative linguistics, discourse analysis and dependency grammar. Her recent work has been published in Journal of Linguistics, Journal of Quantitative Linguistics and Glottometrics.
Haitao Liu is a Qiushi Distinguished Professor of Linguistics and Applied Linguistics at Zhejiang University and Chair Professor of Linguistics at Ningbo Institute of Technology. His research interests include text analysis, quantitative linguistics and complex language networks. He is the author of more than 150 scientific publications on language and linguistics, over 40 of which are indexed in the Web of Science.
Acknowledgments
We are deeply indebted to the anonymous referees for their detailed and constructive comments. We also sincerely thank Timothy Osborne and Chunshan Xu for proofreading the paper, and Haiqi Wu for help with the research data.
©2016 by De Gruyter Mouton