Published by De Gruyter Mouton, January 28, 2021

The Information Structure–prosody interface in text-to-speech technologies. An empirical perspective

Mónica Domínguez ORCID logo, Mireia Farrús and Leo Wanner

Abstract

The correspondence between the communicative intention of a speaker in terms of Information Structure and the way this speaker reflects communicative aspects by means of prosody has been a fruitful field of study in Linguistics. However, text-to-speech applications still lack the variability and richness found in human speech in terms of how humans display their communication skills. Some attempts were made in the past to model one aspect of Information Structure, namely thematicity, for application to intonation generation in text-to-speech technologies. Yet, these applications suffer from two limitations: (i) they draw upon a small number of made-up simple question-answer pairs rather than on real (spoken or written) corpus material; and (ii) they do not explore whether any other interpretation would better suit a wider range of textual genres beyond dialogs. In this paper, two different interpretations of thematicity in the field of speech technologies are examined: the state-of-the-art binary (and flat) theme-rheme, and the hierarchical thematicity defined by Igor Mel’čuk within the Meaning-Text Theory. The outcome of the experiments on a corpus of native speakers of US English suggests that the latter interpretation of thematicity has a versatile implementation potential for text-to-speech applications of the Information Structure–prosody interface.


Corresponding author: Mónica Domínguez, Universitat Pompeu Fabra, Barcelona, Spain, E-mail:

Funding source: European Commission

Award Identifier / Grant number: H2020-645012-RIA, H2020-870930-IA

Funding source: Agencia Estatal de Investigación (AEI)

Award Identifier / Grant number: RYC-2015-17239 (AEI/FSE, UE)

Funding source: Ministerio de Ciencia, Innovación y Universidades

Award Identifier / Grant number: RYC-2015-17239 (AEI/FSE, UE)

Funding source: Fondo Social Europeo (FSE)

Award Identifier / Grant number: RYC-2015-17239 (AEI/FSE, UE)

Research funding: This work was partially funded by the European Commission in the context of its H2020 Programme under the contract numbers H2020-645012-RIA (KRISTINA) and H2020-870930-IA (WELCOME). The second author was funded by the Agencia Estatal de Investigación (AEI), Ministerio de Ciencia, Innovación y Universidades and the Fondo Social Europeo (FSE), grant RYC-2015-17239 (AEI/FSE, UE).

Received: 2020-02-18
Accepted: 2021-01-04
Published Online: 2021-01-28
Published in Print: 2022-05-25

© 2021 Walter de Gruyter GmbH, Berlin/Boston
