Drawing from ethnographic, empirical, and historical/cultural
perspectives, we examine the extent to which visual aspects of music
contribute to the communication that takes place between performers and their
listeners. First, we introduce a framework for understanding how media and
genres shape aural and visual experiences of music. Second, we present case
studies of two performances, and describe the relation between visual and aural
aspects of performance. Third, we report empirical evidence that visual aspects
of performance reliably influence perceptions of musical structure (pitch-related
features) and affective interpretations of music. Finally, we trace new and old
media trajectories of aural and visual dimensions of music, and highlight how
our conceptions, perceptions, and appreciation of music are intertwined with
technological innovation and media deployment strategies.
Twenty English-speaking listeners judged the emotive intent of utterances spoken by male and female speakers of English, German, Chinese, Japanese, and Tagalog. The verbal content of utterances was neutral but prosodic elements conveyed each of four emotions: joy, anger, sadness, and fear. Identification accuracy was above chance performance levels for all emotions in all languages. Across languages, sadness and anger were more accurately recognized than joy and fear. Listeners showed an in-group advantage for decoding emotional prosody, with highest recognition rates for English utterances and lowest recognition rates for Japanese and Chinese utterances. Acoustic properties of stimuli were correlated with the intended emotion expressed. Our results support the view that emotional prosody is decoded by a combination of universal and culture-specific cues.
How did human vocalizations come to acquire meaning in the evolution of our species? Charles Darwin proposed that language and music originated from a common emotional signal system based on the imitation and modification of sounds in nature. This protolanguage is thought to have diverged into two separate systems, with speech prioritizing referential functionality and music prioritizing emotional functionality. However, there has never been an attempt to empirically evaluate the hypothesis that a single communication system can split into two functionally distinct systems characterized by music- and language-like properties. Here, we demonstrate that when referential and emotional functions are introduced into an artificial communication system, that system diverges into vocalization forms with speech- and music-like properties, respectively. Participants heard novel vocalizations as part of a learning task. Half referred to physical entities and half functioned to communicate emotional states. Participants then reproduced each sound with the defined communicative intention in mind. Each recorded vocalization was used as the input for another participant in a serial reproduction paradigm, and this procedure was iterated to create 15 chains of five participants each. Referential vocalizations were rated as more speech-like, whereas emotional vocalizations were rated as more music-like, and this association was observed cross-culturally. In addition, a stable separation of the acoustic profiles of referential and emotional vocalizations emerged, with some attributes diverging immediately and others diverging gradually across iterations. The findings align with Darwin's hypothesis and provide insight into the roles of biological and cultural evolution in the divergence of language and music.