The Speech Processing Lexicon
Neurocognitive and Behavioural Approaches
Ed. by Lahiri, Aditi / Kotzor, Sandra
Series: Phonology and Phonetics [PP] 22
- eBook (PDF)
- Publication Date: April 2017
Automatic speech recognition: What phonology can offer
Arora, Vipul / Reetz, Henning
This chapter presents phonological features, rather than the phones (or phonemes) typically used, as the underlying representation of speech for automatic speech recognition (ASR). Phonological features offer a number of advantages. Firstly, they can efficiently handle the pronunciation variability found in languages. Secondly, they form natural classes that represent speech universally, and so offer better ways to transfer the various models involved in ASR across languages and dialects. Moreover, the ubiquity of the perceptual properties of phonological features is supported by neurolinguistic experiments and language studies across the world's languages. Phonological features can thus provide a principled approach to ASR, thereby reducing the amount of training data and computational resources required.
The main challenge is to develop mathematical models that reliably detect these features in the speech signal, and to incorporate them into ASR systems. To this end, we describe some of our implementations. Firstly, we present a digit recognition system that detects the features with neural networks and maps them to phonemes with rule-based feature-to-phoneme mapping. Secondly, we describe a method based on deep neural networks for extracting the features from speech signals; the use of deep learning improves detection accuracy. Thirdly, we present a deep-neural-network-based ASR system that detects features and maps them to phonemes using statistical models. This system performs on par with state-of-the-art ASR systems on the task of phoneme recognition.
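As a rough illustration of what rule-based feature-to-phoneme mapping and natural classes can look like in code, the following is a minimal sketch in Python. The feature inventory, the phoneme set, the 0.5 threshold, and the function names (`active_features`, `map_to_phoneme`, `natural_class`) are all assumptions made for this example; they are not taken from the chapter, whose actual feature system and mapping rules may differ.

```python
# Hypothetical, heavily simplified feature inventory (an assumption for
# illustration only; not the feature system used in the chapter).
PHONEME_FEATURES = {
    "p": frozenset({"consonantal", "labial", "plosive"}),
    "b": frozenset({"consonantal", "labial", "plosive", "voiced"}),
    "t": frozenset({"consonantal", "coronal", "plosive"}),
    "d": frozenset({"consonantal", "coronal", "plosive", "voiced"}),
    "m": frozenset({"consonantal", "labial", "nasal", "voiced"}),
    "n": frozenset({"consonantal", "coronal", "nasal", "voiced"}),
}


def active_features(scores, threshold=0.5):
    """Turn per-feature detector scores (e.g. network outputs in [0, 1])
    into the set of features considered present."""
    return frozenset(name for name, score in scores.items() if score >= threshold)


def map_to_phoneme(scores, threshold=0.5):
    """Return the phoneme whose feature set mismatches the detected
    features least. Ties are broken alphabetically."""
    detected = active_features(scores, threshold)
    return min(
        sorted(PHONEME_FEATURES),
        # Size of the symmetric difference = number of mismatched features.
        key=lambda phoneme: len(PHONEME_FEATURES[phoneme] ^ detected),
    )


def natural_class(required):
    """Return all phonemes that carry every feature in `required`."""
    required = frozenset(required)
    return sorted(p for p, feats in PHONEME_FEATURES.items() if required <= feats)


if __name__ == "__main__":
    frame_scores = {"consonantal": 0.9, "labial": 0.8, "plosive": 0.7, "voiced": 0.2}
    print(map_to_phoneme(frame_scores))
    print(natural_class({"labial"}))
```

In this sketch a single feature such as `labial` picks out a whole class of phonemes, which is the property that makes feature-based models candidates for sharing across languages and dialects. A statistical mapping of the kind mentioned for the third system would replace the mismatch count with learned probabilities.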
Vipul Arora, Henning Reetz (2017). Automatic speech recognition: What phonology can offer. In Aditi Lahiri, Sandra Kotzor (Eds.), The Speech Processing Lexicon: Neurocognitive and Behavioural Approaches (pp. 211–235). Berlin, Boston: De Gruyter. https://doi.org/10.1515/9783110422658-011
Book DOI: https://doi.org/10.1515/9783110422658
Online ISBN: 9783110422658
© 2017 Walter de Gruyter GmbH, Berlin/Munich/Boston