Assessing the accuracy of existing forced alignment software on varieties of British English

Laurel MacKenzie¹ (ORCID iD: https://orcid.org/0000-0001-6427-4762) and Danielle Turton²

  • 1 New York University, New York, NY, USA
  • 2 Lancaster University, Lancaster, UK

Abstract

This paper analyzes the performance and usability of automatic speech processing tools on six varieties of English spoken in the British Isles. Although the tools used in the present study were developed for Mainstream American English, we demonstrate that their forced alignment functionality performs extremely well on a range of British varieties, in both careful and casual speech. Where phone boundary placement is concerned, substantial alignment errors are infrequent; small displacements between aligner-placed and human-placed phone boundaries occur regularly, but these will rarely have a significant effect on the measurements a researcher is interested in. We show that gross phone boundary placement errors, when they do arise, are particularly likely in fast speech or in varieties that differ radically from Mainstream American English (e.g. Scots). We also observe occasional problems with phonetic transcription. Overall, we advise that, although forced alignment software is highly reliable and continuously improving, human verification is still needed to correct errors that can misalign entire stretches of speech. Nevertheless, sociolinguists can be assured that the output of these tools is generally highly accurate for a wide range of varieties.
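The comparison between aligner-placed and human-placed phone boundaries can be illustrated with a minimal Python sketch. It assumes that the two sets of boundary times (in seconds) have already been paired one-to-one, which in practice requires the aligner and the human annotator to have segmented the same phone sequence. The function names, the boundary times, and the 20 ms tolerance are hypothetical illustrations, not values or procedures taken from the study.

```python
def boundary_displacements(human, aligner):
    """Absolute displacement in milliseconds between paired boundary times given in seconds."""
    if len(human) != len(aligner):
        raise ValueError("boundary lists must be the same length")
    return [abs(h - a) * 1000 for h, a in zip(human, aligner)]


def proportion_within(displacements, tolerance_ms):
    """Share of boundaries whose displacement is at most tolerance_ms."""
    if not displacements:
        raise ValueError("no boundaries to compare")
    return sum(d <= tolerance_ms for d in displacements) / len(displacements)


# Hypothetical boundary times (seconds) for the same four phone boundaries:
# one set placed by hand, one set placed by a forced aligner.
human = [0.512, 0.634, 0.701, 0.889]
aligner = [0.520, 0.630, 0.742, 0.890]

disp = boundary_displacements(human, aligner)
print([round(d, 1) for d in disp])
# 20 ms is an illustrative tolerance chosen for this sketch, not a threshold from the study.
print(proportion_within(disp, 20))
```

With these hypothetical values, three of the four boundaries fall within the 20 ms tolerance, and the remaining 41 ms displacement is the kind of case a researcher would inspect by hand. Reporting both the proportion within tolerance and the individual large displacements keeps frequent small discrepancies separate from rarer gross errors.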



Linguistics Vanguard is a new channel for high-quality articles in all major fields of linguistics. Published solely online, the multimodal journal provides an accessible platform supporting both traditional contributions as well as innovative publications featuring interactive content. Linguistics Vanguard publishes concise and up-to-date reports on the state of the art in linguistics as well as cutting-edge research papers.
