
Lingua Posnaniensis

The Journal of the Poznań Society for the Advancement of the Arts and Sciences and of Adam Mickiewicz University, Institute of Linguistics

2 Issues per year


SCImago Journal Rank (SJR) 2016: 0.100

Open Access
Online ISSN: 2083-6090

Duration and speed of speech events: A selection of methods

Dafydd Gibbon / Katarzyna Klessa / Jolanta Bachan
Published Online: 2015-07-24 | DOI: https://doi.org/10.2478/linpo-2014-0004

Abstract

The study of speech timing, i.e. the duration and speed or tempo of speech events, has grown in importance over the past twenty years. This growth is driven in particular by increased demands for accuracy, intelligibility and naturalness in speech technology, by applications in language teaching and testing, and by the study of speech timing patterns in language typology. However, the methods used in such studies are very diverse, and so far there is no accessible overview of them. Since the field is too broad for an exhaustive account, we have made two choices: first, to provide a framework of paradigmatic (classificatory), syntagmatic (compositional) and functional (discourse-oriented) dimensions for duration analysis; and second, to provide worked examples of a selection of methods associated primarily with these three dimensions. Some of the methods covered are established state-of-the-art approaches (e.g. paradigmatic Classification and Regression Tree (CART) analysis); others are discussed critically (e.g. so-called ‘rhythm metrics’). A set of syntagmatic approaches applies to the tokenisation and tree parsing of duration hierarchies based on speech annotations, and a functional approach describes duration distributions in relation to sociolinguistic variables. Several of the methods are supported by a new web-based software tool for analysing annotated speech data, the Time Group Analyser.

Keywords: speech timing; Polish; English; speech technology
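To make two of the steps named in the abstract more concrete, here is a minimal sketch: tokenising an annotation into pause-delimited groups of durations, then computing the Pairwise Variability Indices (raw rPVI and normalised nPVI, as proposed by Low, Grabe & Nolan) that are among the so-called ‘rhythm metrics’, together with a simple per-group rate. For durations d_1 … d_m, rPVI is the mean of |d_k − d_{k+1}|, and nPVI is 100 times the mean of |d_k − d_{k+1}| / ((d_k + d_{k+1}) / 2). This is not the Time Group Analyser's implementation. The tuple format `(label, start, end)` in seconds, the pause labels in `PAUSE_LABELS`, and all function names are hypothetical assumptions made for illustration. Tree parsing of duration hierarchies is not shown.

```python
"""Sketch: pause-delimited groups plus PVI 'rhythm metrics'.

Assumptions (hypothetical, not from the article): intervals are
(label, start, end) tuples in seconds; labels in PAUSE_LABELS mark pauses.
"""
from statistics import mean

PAUSE_LABELS = {"", "sil", "<p:>"}  # hypothetical pause labels


def time_groups(intervals, pause_labels=PAUSE_LABELS):
    """Split annotated intervals into lists of durations separated by pauses."""
    groups, current = [], []
    for label, start, end in intervals:
        if label in pause_labels:
            # A pause closes the current group (if it holds any units).
            if current:
                groups.append(current)
                current = []
        else:
            current.append(end - start)
    if current:
        groups.append(current)
    return groups


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    if len(durations) < 2:
        raise ValueError("PVI needs at least two durations")
    return mean(abs(a - b) for a, b in zip(durations, durations[1:]))


def npvi(durations):
    """Normalised PVI: each difference divided by the pair mean, times 100."""
    if len(durations) < 2:
        raise ValueError("PVI needs at least two durations")
    return 100 * mean(
        abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])
    )


def rate(durations):
    """Units per second in a group (syllables/s if the units are syllables)."""
    return len(durations) / sum(durations)


if __name__ == "__main__":
    # Invented toy annotation of syllable-sized units, not data from the article.
    intervals = [
        ("ta", 0.00, 0.20), ("ka", 0.20, 0.30), ("ti", 0.30, 0.50),
        ("sil", 0.50, 0.80),
        ("pa", 0.80, 0.95), ("ko", 0.95, 1.10),
    ]
    for i, group in enumerate(time_groups(intervals), start=1):
        print(i, round(rpvi(group), 3), round(npvi(group), 1), round(rate(group), 2))
```

The normalisation in nPVI divides out local tempo, so a group spoken uniformly faster keeps the same nPVI while its rPVI shrinks. This is one reason the two indices can diverge, and one reason such metrics are best interpreted with care.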


About the article


Citation Information: Lingua Posnaniensis, Volume 56, Issue 1, Pages 59–83, ISSN (Online) 2083-6090, DOI: https://doi.org/10.2478/linpo-2014-0004.


© by Dafydd Gibbon. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).
