One of the unresolved questions in audiovisual prosody is the relative contribution of acoustic and visual cues to the expression of prosodic meaning. Though the majority of studies on audiovisual prosody have found a complementary mode of processing whereby sight provides relatively weak and redundant information in comparison with strong auditory cues, other work has found that sight provides information more efficiently than hearing. In Catalan, a pitch range contrast in a rising-falling nuclear configuration conveys a difference between a contrastive focus statement and an echo question. The main goal of this study is to investigate the relative contribution of visual cues in conveying this distinction. Twenty native speakers of Central Catalan participated in two identification tasks in which they had to decide between a focus statement and a question interpretation. Experiment 1 used a pitch range auditory continuum combined, in congruent and incongruent pairings, with two videotapes showing the facial gestures that are characteristic of the two pragmatic meanings. Experiment 2 used the same auditory continuum in combination with another continuum for facial gestures produced using a digital image-morphing technique. The responses and reaction times obtained in both experiments revealed a consistent reliance on visual cues in listeners' decisions, but also a consistent effect of the auditory stimulus. We argue that although facial gestures are the most influential elements that Catalan listeners rely on to decide between contrastive focus and echo question interpretations, bimodal integration with the acoustic cues is necessary for perceptual processing to be accurate and fast. Finally, we discuss the implications of these results for models of audiovisual processing.
© 2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston