Published by De Gruyter Mouton, January 29, 2020

Assessing the accuracy of existing forced alignment software on varieties of British English

Laurel MacKenzie and Danielle Turton
From the journal Linguistics Vanguard

Abstract

This paper presents an analysis of the performance and usability of automatic speech processing tools on six different varieties of English spoken in the British Isles. The tools used in the present study were developed for use with Mainstream American English, but we demonstrate that their forced alignment functionality nonetheless performs extremely well on a range of British varieties, encompassing both careful and casual speech. Where phone boundary placement is concerned, substantial errors in alignment occur infrequently, and although small displacements between aligner-placed and human-placed phone boundaries are found regularly, these will rarely have a significant effect on measurements of interest for the researcher. We demonstrate that gross phone boundary placement errors, when they do arise, are particularly likely to be introduced in fast speech or with varieties that are radically different from Mainstream American English (e.g. Scots). We also observe occasional problems with phonetic transcription. Overall, we advise that, although forced alignment software is highly reliable and improving continuously, human confirmation is needed to correct errors which can displace entire stretches of speech. Nevertheless, sociolinguists can be assured that the output of these tools is generally highly accurate for a wide range of varieties.

Acknowledgement

Many thanks to Sophie Holmes-Elliott and Meredith Tamminga for sharing their data, Adam Mearns for access to DECTE, James Stanford for help with DARLA, audiences at NWAV 42 for helpful comments and questions, and the reviewers and editors of this special issue for suggestions which have improved the paper.

Received: 2019-02-14
Accepted: 2019-07-15
Published Online: 2020-01-29

© 2020 Walter de Gruyter GmbH, Berlin/Boston