February 22, 2023

Spheres of interest: Space and social cognition in Phola deixis

From the journal Open Linguistics


After a period of space-centred description of demonstratives, recent research has highlighted the role of attention, psychological proximity and shared knowledge in determining deictic choice. While convincing evidence has been presented that mental states may define deictic reference (e.g. in Turkish, Jahai or Kogi), there is also neuroscientific data suggesting that spatial cognition is often drawn upon in the process and that spatial and attentional perspectives may interact with each other. Pragmatic analysis of deictic usage in some languages (e.g. Yucatec or Lao) suggests that demonstrative systems may respond to multidimensional search spaces that include not only spatial but also embodied, perceptual and social access to referents. On the basis of observational data from Phola, a Tibeto-Burman language of Southwest China, the present article contributes to these research endeavours by explicitly exploring how speaker and addressee demonstratives may independently respond to both spatial and sociocognitive modes of access to a referent. Advancing the notion of spheres of interest as a descriptive heuristic to capture this fluidity, the article shows how deictic choice not only passively reflects aspects of context but also actively projects intersubjective appraisals and expectations onto material and social reality.

1 Introduction

After a period of space-centred description of demonstratives (e.g. Anderson and Keenan 1985, Fillmore 1997), descriptive and experimental research has highlighted the role of shared attention (Özyürek 1998), cognitive accessibility (Hanks 2011) and psychological proximity (Johannessen 2008, Peeters, Hagoort and Özyürek 2015, Johannessen 2020) in deictic systems. Some languages exhibit demonstratives that explicitly signal that the speaker and addressee’s attention is either jointly directed towards the same entity or disjointly allocated. Thus, for example, Jahai ton ‘that which we are both attending to’ (Burenhult 2003) and Kogi twẽhié ‘idem’ (Knuchel 2019) have been characterised as demonstratives signalling addressee access and shared epistemic access, respectively. Conversely, Turkish şu ‘that which you are not as yet attending to’ signals lack of shared attention as assessed by gaze direction (Küntay and Özyürek 2006). While there is some neurolinguistic evidence that demonstrative production draws from spatial cognition (Stevens and Zhang 2013, Coventry et al. 2014), complementary empirical research suggests that it is also mediated by intersubjectivity and social cognition in principled ways (cf. Peeters et al. 2020). Thus, for example, in languages such as Dutch (Piwek et al. 2008), English (Bangerter 2004, Peeters and Özyürek 2016), Finnish (Laury 1997), Japanese (Naruoka 2006), Lao (Enfield 2003, 2018), Mano (Khachaturyan 2020), Norwegian (Johannessen 2020), Spanish (Coventry et al. 2008), Tiriyó (Meira 2003) or Tzeltal (Brown and Levinson 2018), the spatial range of proximal and speaker demonstratives has been found to be significantly sensitive to attentional and psychological focus on the part of the speaker, as well as to the location and (presumed) mental states of the addressee, i.e. where they are and what they are looking at.

Beyond nominal determination, demonstratives have been found in some languages like Abui (Kratochvíl 2011), Duna (San Roque 2008), Lai Chin (Barnes 1998) and Makhuwa (Van der Wal 2013) to exhibit uses as insubordinate clausal markers to signal an epistemic association between the propositional content of an utterance and a speech act participant, whereas other languages like Marind have grammaticalised demonstrative roots into verbal markers signalling lack of shared attention and/or knowledge (Olsson 2019). These developments are not particularly shocking when seen in the light of findings from ontogenetic psychology suggesting that deictic pointing in both language and gesture is primarily a mechanism to regulate joint attention (Clark 1978, Diessel 2006). Nonetheless, descriptive and typological accounts of demonstratives sometimes focus on purely spatial properties to the detriment of attentional ones, a situation referred to as ‘spatial bias’ in the literature (Hanks 2011, Levinson 2018a). At the same time, there is sometimes an assumption that either space or social cognition, but not both, ultimately lies at the core of any given demonstrative’s semantics, with deviant uses treated as secondary readings emerging as typicality effects (Burenhult 2003, Knuchel 2019) or as pragmatically conditioned implicatures (Levinson 2018a, 21, Levinson 2018b, 341). However, while some methods have substantiated a sociocognitive analysis for a given demonstrative system, such as that of Yucatec (Hanks 1990, 2009), complementary methods have favoured a spatialist analysis for the same forms (Bohnemeyer 2018). Against this backdrop, recent research (e.g. Peeters et al. 2020) has strived to integrate the spatial and intersubjective dimensions, trying to account for different possible context-dependent factors (Enfield 2018) or modes of access (Hanks 2009, Levinson 2018a), i.e. spatial, discursive, attentional and social channels. 
Along similar lines, the traditional distinction between exophoric and endophoric demonstratives, i.e. between referential, pointing-accompanied demonstratives, on the one hand, and non-referential, discourse-based ones, on the other (cf. Diessel 1999), has been recently challenged by evidence from demonstrative paradigms, e.g. in Akan (Amfo 2007), Dalabon (Cutfield 2011), Lao (Enfield 2003), Mano (Khachaturyan 2020), Pilagá (Payne and Vidal 2020) or Yurakaré (Gipper 2017), where both kinds derive from, and respond to, a procedural instruction to simply search for a referent situated in either or both space and discourse, whichever is most contextually relevant in a given situation (Khachaturyan 2020).

The aim of the present article is to contribute to these efforts through ethnographically grounded, multimodally enriched data from Phola, a Tibeto-Burman language of the Ngwi subbranch (cf. Pelkey 2011, Bradley 2012) spoken in Yunnan Province of Southwest China. The article makes a threefold contribution to our understanding of deixis in human language. First, it enriches the linguistic description and analysis of deictic usage through extensive multimodal data showcasing the most common gestural and visuospatial configurations that obtain in naturalistic interactions, including metaphoric gestures. Second, it ties in with previous suggestions (e.g. Enfield 2018) that space, cognition and socialisation may all be equally relevant to deictic systems providing empirical data that show how all three dimensions may jointly or independently underlie deictic use. Third, once established that Phola deictics are not semantically coded for space, cognition or social structure and that therefore speakers need to flexibly interpret how demonstratives relate to the context of reference, it is shown how deictic choice not only reflects reality but also construes it. In this regard, structural, pragmatic and multimodal aspects are all explored in tandem to arrive at a fine-grained, multidimensional understanding of how demonstrative use serves the agentive purpose of conveying beliefs, intersubjective appraisals and regulating social behaviour.

Because of the strong pragmatic focus of this research article, the data presented here stem predominantly from dialogues recorded by the author during seven fieldtrips to Luodie village between December 2018 and January 2020. Given my linguistic integration into Luodie’s speech community and social kin networks, I have not only been an external observer but also an actively engaged participant in communicative acts. Hence, some of the examples include Phola utterances produced by myself and are indicated as such via the label GMN (for the initials of my Chinese name). However, these are never used to illustrate the facts of Phola but merely provided as part of the pragmatic background required to understand the utterances produced by native Phola speakers themselves. The selection of examples presented in this piece was qualitatively guided by the research questions at hand and informed by extensive ethnographic observation and metalinguistic discussions with Phola speakers. With a battery of 30 examples drawn almost entirely from naturalistic recordings and 17 figures showcasing 38 stills extracted from videorecorded material,[1] the present study offers both a thorough first glimpse into the Phola language and a preliminary empirical basis for its theoretical claims. The five examples that stem from fieldnotes were identified as theoretically relevant on the fly, transcribed in situ and analysed with the help of the relevant speakers. These are indicated as such in the example code below the English translation line. Moreover, the textual genre, i.e. dialogue, elicitation, narrative, is provided in parenthesis to the right of the example codes. Although far from comprehensive, the body of the article presents all the major uses of the speaker and addressee demonstratives identified whilst transcribing and analysing 8+ hours of (partly) transcribed naturalistic interactions.

The structure of the article is as follows. Section 2 offers the first-ever description of Phola deictics, minimally sketching their basic lexical, morphophonological and syntactic properties. Section 3 presents the speech act demonstratives e⁵⁵ ‘this by me’ and tʰe⁵⁵ ‘that by you’, with a focus on their spatial uses. Section 4 examines non-spatial uses based on conversational and attentional forms of anchoring linked to rich networks of shared knowledge about the spheres of interest of the speaker and the addressee. Section 5 discusses how various contextual factors simultaneously mediate cognitive access to a given referent. This is further fleshed out in Section 6, examining the relative weighting of the spatial and sociocognitive channels. Section 7 moves beyond an examination of context and argues that deixis does not merely reflect reality but also serves to signal epistemic stances, manage conversational turns and evaluate beliefs and behaviour. By way of discussion, Section 8 summarises the main findings and addresses some implications and shortcomings of the study.

2 The Phola deictic system

2.1 The paradigm

Phola displays a six-way deictic–interrogative system structured along a four-way morphological grid as shown in Table 1. This includes an interrogative base alongside five deictic bases with clearly differentiated meaning-functions. The deictics are respectively anchored to the speaker (‘this by me’), the addressee (‘that by you’), the downhill area (‘that down there’), the uphill area (‘that up there’) and a topologically defined cross-boundary zone (‘that across a contextually relevant boundary’). They participate in a morphophonologically derived paradigm comprising a set of nominal forms, two sets of adverbs – locative and manner – and, exclusively for the deictic bases, a set of syntactically disjoint particles that function as a directive attention-aligning device. Each of these morphosyntactic classes is marked by a tonal and a vocalic exponent. In addition, all forms except the disjoint particles come with optional etymologically and functionally opaque suffixes, whose inclusion or exclusion responds to a complex interaction of syntactic and prosodic factors. For semantic and pragmatic reasons, the interrogative root qʰɑ⁵⁵ lacks a disjoint form, but otherwise partakes in the paradigm. Note that the up(hill) deictic base has two lexical forms, of which the one with the palatal nasal, i.e. ɲe⁵⁵ ‘that up there’, is by far the more common one.

Table 1

The deictic-interrogative paradigm

| Distance | Person | Deictic base | Nominal deictic | Locative adverbial deictic | Manner adverbial deictic | Disjoint deictic particle | Exponents |
|---|---|---|---|---|---|---|---|
| Non-distal | 1 | Speaker | e⁵⁵(ve³³) | e²²(ⁿte⁵⁵) | i³³(sɨ²²) | ʑiː²² | e/(ʑ)i |
| | | | this | here | like.this | | |
| | | | ‘This by me’ | ‘Here by me’ | ‘Like this by me’ | ‘Look at this by me!’ | |
| | 2 | Addressee | tʰe⁵⁵(ve³³) | tʰe²²(ⁿte⁵⁵) | tʰi³³(sɨ²²) | tʰuː²² | tʰV |
| | | | that | there | like.that | look.there | |
| | | | ‘That by you’ | ‘There by you’ | ‘Like that by you’ | ‘Look at that by you!’ | |
| Distal | 3 | Down(hill) | ke⁵⁵(ve³³) | ke²²(ⁿte⁵⁵) | ki³³(sɨ²²) | kuː²² | kV |
| | | | that.down | down.there | like.that.down | look.down | |
| | | | ‘That below’ | ‘There below’ | ‘Like that below’ | ‘Look at that below!’ | |
| | | Up(hill) | ɲe⁵⁵(ve³³), ŋɔ⁵⁵(ve³³) | ɲe²²(ⁿte⁵⁵), ŋɔ²²(ⁿte⁵⁵) | ɲi³³(sɨ²²), ŋɔ³³(sɨ²²) | ŋuː²² | ɲ/ŋV |
| | | | that.up | up.there | like.that.up | look.up | |
| | | | ‘That above’ | ‘There above’ | ‘Like that above’ | ‘Look at that above!’ | |
| | | Across (the hill) | qɔ⁵⁵(ve³³) | qɔ²²(ⁿte⁵⁵) | qɔ³³(sɨ²²) | qɔː²² | |
| | | | that.across | across.there | like.that.across | look.across | |
| | | | ‘That across’ | ‘There across’ | ‘Like that across’ | ‘Look at that across!’ | |
| | | Interrogative base | qʰɑ⁵⁵(ve³³) | qʰɑ²²(ⁿte⁵⁵) | qʰɑ³³(sɨ²²) | | qʰɑ |
| | | | int | where | how | | |
| | | | ‘What/Which’ | ‘Where’ | ‘How’ | | |
| | | Exponents | (C)V⁵⁵(ve³³) | (C)V²²(ⁿte⁵⁵) | (C)V³³(sɨ²²) | CVː²² | |
| | | | V = e/ɔ/ɑ | V = e/ɔ/ɑ | V = i/ɔ/ɑ | V = i/u | |
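
For readers who find it helpful to see the paradigm as data, the grid in Table 1 can be restated as a simple lookup. The following Python sketch is purely illustrative and not part of the original analysis: the names PARADIGM, SUFFIXES and cite_form are hypothetical, only the more common up(hill) base ɲe⁵⁵ is included, and the choice between suffixed and suffixless variants is reduced to a flag, although it actually depends on syntactic and prosodic factors.

```python
# Illustrative restatement of Table 1 (not the author's formalisation).
# Forms are copied from the table; all identifiers are hypothetical.

# Optional suffixes per morphosyntactic class; disjoint particles take none.
SUFFIXES = {"nominal": "ve³³", "locative": "ⁿte⁵⁵", "manner": "sɨ²²"}

# Suffixless forms per deictic base; None marks a gap in the paradigm.
PARADIGM = {
    "speaker": {"nominal": "e⁵⁵", "locative": "e²²", "manner": "i³³", "disjoint": "ʑiː²²"},
    "addressee": {"nominal": "tʰe⁵⁵", "locative": "tʰe²²", "manner": "tʰi³³", "disjoint": "tʰuː²²"},
    "downhill": {"nominal": "ke⁵⁵", "locative": "ke²²", "manner": "ki³³", "disjoint": "kuː²²"},
    "uphill": {"nominal": "ɲe⁵⁵", "locative": "ɲe²²", "manner": "ɲi³³", "disjoint": "ŋuː²²"},
    "across": {"nominal": "qɔ⁵⁵", "locative": "qɔ²²", "manner": "qɔ³³", "disjoint": "qɔː²²"},
    "interrogative": {"nominal": "qʰɑ⁵⁵", "locative": "qʰɑ²²", "manner": "qʰɑ³³", "disjoint": None},
}


def cite_form(base: str, kind: str, suffixed: bool = False) -> str:
    """Return the Table 1 form for a base and class, optionally with its suffix."""
    stem = PARADIGM[base][kind]
    if stem is None:
        # The interrogative base lacks a disjoint form.
        raise ValueError(f"no {kind} form for the {base} base")
    if not suffixed:
        return stem
    if kind not in SUFFIXES:
        # Disjoint deictic particles have no suffixed variant.
        raise ValueError(f"{kind} forms take no suffix")
    return stem + SUFFIXES[kind]


print(cite_form("addressee", "locative"))               # tʰe²²
print(cite_form("speaker", "locative", suffixed=True))  # e²²ⁿte⁵⁵
```

A lookup is used rather than a generative rule because the exponents are not fully predictable from the base: the speaker set, for instance, has i³³ in the manner form and ʑiː²² as its disjoint particle.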

In line with Dixon (2003), Guérin (2015) and König and Umbach (2018), the four populated columns in Table 1 can be seen as different morphosyntactic kinds of demonstratives, which assign referents to a different ontological type, i.e. object/being, place and manner (König and Umbach 2018, 286) or, in the case of disjoint deictic particles, set up an ontologically unspecified search zone, i.e. an instruction to pay attention to something, someone, somewhere, a way of doing things, a kind or an entire event associated with a deictic centre. However, since the term ‘demonstrative’ is often reserved for nominal, i.e. pronominal and adnominal, forms, e.g. in the studies by Himmelmann (1996), Diessel (1999) or Levinson (2018a), I avoid terminological confusions by using the term deictic for all four Phola forms in combination, whenever relevant, with morphosyntactic descriptors, i.e. nominal deictic, adverbial deictic (divided into locative deictic and manner deictic) and disjoint deictic particle. It is worth noting that there are other deictic forms expressing location, time and quantity, among others, that are less morphophonologically integrated into the paradigm and thus excluded from Table 1 for the sake of simplicity. The most important kind involves a lɑ²²-suffixed version of nominal deictics, used to indicate ‘location at’ (cf. Example 8).

Even more so than for the European languages examined by König and Umbach (2018), the four different forms attested for each deictic base in Phola deserve unitary treatment because they are clearly part of a unified paradigm, i.e. they are morphophonologically related and semantico-pragmatically analogous to each other. Moreover, although the exact etymological makeup of Phola demonstratives cannot be fully ascertained at this stage, contrary to the typological trend identified in Dixon (2003) and attested in other Ngwi languages such as Lisu (Bradley 2003), there is no language-internal evidence that the adverbs and the disjoint particle go back to or are based on the nominal forms (cf. González Pérez 2022). It is possible that all synchronically attested deictics, including the nominal forms themselves, go back to syntactically loose deictic particles as envisioned in Himmelmann (1997, 21).

2.2 Syntax

This section presents a minimal sketch of the grammar of Phola deictics, meant as a foundation for the reader to better follow the discussion of speech act deictics in Sections 3–8.

2.2.1 Nominal deictics

Nominal deictics are integrated into the noun phrase. In a maximal noun phrase, adnominal deictics (cf. Himmelmann 1996) follow a head noun and precede a quantifier phrase:

(1) LJF: tsʰɔ⁵⁵ e⁵⁵ tʰi³³ xɔ³¹ |
person this one clf.human
‘This person.’
(YPG1-20181206_01-extH5, 00:04:10.08—00:04:11.01) (bilingual elicitation)

Unlike articles in English, nominal deictics are not an obligatory grammatical category and deictic-less noun phrases are perfectly grammatical. Moreover, in actual discourse, they often appear in simpler noun phrases without a head noun, such as e⁵⁵ in Example 2, or entirely on their own, in what is known as a pronominal use, such as e⁵⁵ve³³ below:

(2) BLW: e⁵⁵ve³³ le³³ ŋɑ³³ kʰɑ̠³¹qɔ⁵⁵ qɔ⁵⁵ = tʰɑ³¹ tɕɔ²² fe⁵⁵ = xi⁵⁵ || e⁵⁵ tʰi³³ kʰɑ̠³¹qɔ⁵⁵ ||
this top 1 village that.across = time have exp = rel this one clf.village
pronominal use adnominal use
‘This is something that happened in ancient times in our village.’ ‘This village.’
(YPG1-20190430_01-ext1Q8, 00:03:24.05—00:03:28.77) (traditional narrative directed at author)

Taking a quantifier phrase but no head noun, adnominal e⁵⁵ here can actually be thought of as displaying adclassificatory behaviour, in that it is the accompanying classifier (e.g. kʰɑ̠³¹qɔ⁵⁵ ‘clf.village’) which provides qualitative information on the entity referred to. Note that the suffixed form with -ve³³ is ungrammatical before a numeral and classifier and slightly preferred over the suffixless variant in pronominal uses, suggesting that it may once have had a nominalising function.

2.2.2 Adverbial deictics

Locative and manner deictics are adverbs, a word class generally absent from Ngwi languages but recently identified for Phola (González Pérez 2022). As such, they may work as adjuncts within a verb phrase or as syntactically disjoint expressions. In the former case, they immediately precede the verb group, coming after both subject and object if there are any, but before pre-verbal elements including TAME markers and negators such as the prohibitive particle tʰɑ³³ ‘do not’. Note that suffixed vs suffixless variants of both locative and manner deictics correlate with prosodically and syntactically integrated vs autonomous uses, respectively. Thus, suffixless forms such as tʰe²² ‘there’ are more likely within a verb phrase, but suffixed forms such as e²²ⁿte⁵⁵ ‘here!’ are preferred as a self-contained utterance. Both patterns are illustrated in BLW’s turn in the following example:

(3) MSN: e⁵⁵-lɑ²² qɑ⁵⁵ le³³ | ʑiː²² || e⁵⁵-lɑ²² | ʑiː²² ||
this-at descend go this-at
‘Let’s go down here, look here! over here, look! It becomes hard to go down from over there, doesn’t it… We have to go down here, then!’
BLW: e²²ⁿte⁵⁵ || tʰe²² tʰɑ³³ qɑ⁵⁵ le³³ || e²²ⁿte⁵⁵ ɲæ⁵⁵ ɕɨ²² nɑ⁵⁵ ||
here there proh descend go here intf walk good
‘Here! Don’t go down there! It’s very easy to walk here!’
(YPG1-20190503_01-Q4b, 00:10:31:90—00:10:42:20; idem-Q8_c, 00:10:40:50—00:10:45:30) (spontaneous dialogues)

2.2.3 Syntactically disjoint deictic particles

Distributionally, disjoint deictic particles are markedly different from all the other sets. Unlike nominal and adverbial deictics, they do not exhibit phrasal constituency, nor do they directly modify other words. Occupying utterance-initial and utterance-final positions, they pre-empt shared attention, respectively, before and after referring to something. Moreover, they may intervene between syntactic and communicative units whenever a speaker feels the need to add an attentional prompt:

(4) LJF: ʑiː²² | kʰɑ⁵⁵ⁿtsɔ²² pu³¹ tʰi̠²² | kʰɑ⁵⁵ⁿtsɔ²² | e⁵⁵ve³³ le³³ kʰɑ⁵⁵ⁿtsɔ²² pu³¹ = xi⁵⁵ = niæ³³ ||
basket carry rslt basket this top basket carry = rel = ep
‘Look at this! This is carrying a bamboo basket. Bamboo basket, this is indeed carrying a bamboo basket.’
e⁵⁵ le³³ | ʑiː²² || e⁵⁵ le³³ kʰɑ⁵⁵ⁿtsɔ²² ||
this top this top basket
‘This, look at this! This is a basket…’
e⁵⁵ve³³ le³³ || kʰɑ⁵⁵ⁿtsɔ²² pu³¹ = xi⁵⁵ næ̠³¹ | kʰɑ⁵⁵ⁿtsɔ²² i³³sɨ²² næ̠³¹ | ʑiː²² |
this top basket carry = rel adv basket like.this adv
kʰɑ⁵⁵ⁿtsɔ²² i³³sɨ²² pu³¹ = xi⁵⁵ næ̠³¹ i³³sɨ²² | ʑiː²² ||
basket like.this carry = rel adv like.this
‘This one… One carries the basket like this, (carrying it) like this, look here! Carrying a basket like this, it’s done like this, look at me!’
(YPG1-20191201_03-intQ8, 00:00:07.63—00:00:19.63) (instructional speech directed at the author)

Disjoint deictic particles may constitute self-contained utterances both initiating a communicative act and providing a reply (cf. Example 5). While nominal and adverbial deictics may also occur in elliptic replies to an interrogative (cf. Table 1), this typically presupposes elided verbs or nouns, whereas disjoint deictic particles are not formally or semantically linked to any particular structure, and for that reason are much freer to appear on their own. Moreover, unlike the other three deictic kinds, which can be lumped together under the label phrasal deictics, disjoint deictic particles are not specified for ontological types and may be used to draw attention to things, people, places, properties, events or any combination thereof. Since they exhibit a very strong correlation with gestures, the direction and nature of reference is always fleshed out in a multimodal context as shown in Figure 1, corresponding to the following example, where MSN uses tʰuː²² ‘look at that’ to draw attention towards a qualitative–quantitative kind:

(5) GMN: qʰɑ⁵⁵ = xiæ³³ tɕɔ²² ||
int = size.ext have
‘How big is it?’
MSN: tʰuː²² ||
‘Look at that!’ (i.e. that big!)
(YPG1-20190512_05-Q8, 00:01:26.20—00:01:30.90) (covert monolingual elicitation)

Figure 1

MSN shows orange size. Co-timed with disjoint deictic particle from addressee set.

Although disjoint deictic particles do not constitute predicates in a canonical sense, they may be (natively) paraphrased through verbs of perception, in particular ɲi⁵⁵ ‘look’ (as reflected in translations above) but also verbs of action. As such they are similar to presentative particles such as Russian вот vot ‘behold’ or Hebrew הִנֵּה hinne ‘idem’, which are deictic in nature. However, given that Phola disjoint deictics are part of the deictic paradigm, they can be analysed as a dedicated grammaticalisation of what Hanks (1999) calls the directive function of demonstratives, i.e. a call on the addressee to pay attention to a specific linguistically conventionalised search zone (by speaker, by addressee, downhill, etc.). While this component is lexicalised in a subset of demonstratives in some of the world’s languages, e.g. Yucatec (Hanks 1990), Turkish (Özyürek 1998) and Jahai (Burenhult 2003), all deictic roots in Phola have a dedicated directive form, which is typologically unusual. As a formally structured device for attentional alignment, Phola disjoint deictics are relevant to the grammar of social cognition, i.e. what Evans et al. (2017a) call engagement. However, a full-blown exploration of their workings falls outside of the scope of the present article.

3 Spatial uses of speech act deictics

Unlike the situation in related Ngwi languages, e.g. Lahu (Matisoff 1973, 2017), Lisu (Bradley 2003, 2017), Nuosu (Gerner 2013) and Khatso (Donlay 2015), but in line with cross-linguistic trends in complex demonstrative systems (Diessel 2013), Phola has a fully paradigmatic opposition between two contrastive deictic terms anchored to the speech act. e⁵⁵ and tʰe⁵⁵, which stand in a complementary relation with one another, indicate that a referent is associated, respectively, with the speaker or the addressee on at least one of the dimensions in Table 2.

Table 2

Speech act deictics in Phola

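Referent/Location is associated with…

| Modes of access | The speaker’s… (e⁵⁵) | The addressee’s… (tʰe⁵⁵) |
|---|---|---|
| Spatial | Bodily sphere | Bodily sphere |
| Attentional | Visual and mental sphere | Visual and mental sphere |
| Intersubjective | Epistemic, interactional, social spheres of interest | Epistemic, interactional, social spheres of interest |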

The most concrete kind of usage is anchored in space and involves reference to the current location and immediate physical space around the speaker and the addressee. This includes entities in immediate contact with or very close to the speech act participants’ bodies, i.e. the so-called peripersonal domain (Coventry et al. 2008, Levinson 2018a). Neither deictic term is purely egocentric, requiring instead attention to a double perspective (in the sense of Evans 2006) straddling both speech act participants. What this means is that speakers first assess a referent’s relative positioning with respect to themselves and the addressee and then assign it to either the speaker’s or the addressee’s sphere:

(6) BLW: xɔ³¹tsɔ³³ le³³ e⁵⁵ kʰɨ⁵⁵ || ⁿtsɔ²² le³³ tʰe⁵⁵ kʰɨ⁵⁵ ||
head.hat top this on hair top that on
‘“Hat” refers to this. “Hair” refers to that’.
(YPG1-20191219_08-Q8b, 00:00:00.56—00:00:03.60) (instructional speech directed at the author)

The gestural channel is fundamental. Note how, in Figure 2, BLW points at his own head whilst producing the speaker deictic e⁵⁵ ‘this’ (blue and red arrows are included henceforth to show the direction of gestures co-timed with speaker and addressee deictics, respectively).

Figure 2

BLW points to his own hat. Co-timed with speaker deixis. Both stills show the same frame as captured by a lateral and a frontal camera, respectively.

Conversely, the gestures in Figure 3 show that when BLW refers to the referent ‘hair’, he is specifically referring to the addressee’s hair.

Figure 3

BLW points to his interlocutor’s hair. Co-timed with addressee deictic in Example 6. Same frame, different cameras.

Physically approaching the addressee allows the speaker to include the former’s peripersonal space within their own sphere of interest. Thus, BLW switches to speaker deixis as he points at his addressee’s hair from very close, as shown in Figure 4, corresponding to the following example:

(7) BLW: e⁵⁵ve³³ kʰɨ⁵⁵ ᵐpɑ³³ = xi⁵⁵ = niæ³³ || e⁵⁵ve³³ kʰɨ⁵⁵ ||
this on say = rel = ep this on
‘I mean this of course, this.’
(YPG1-20191219_08-Q8b, 01:14:34.30—01:14:36.02) (instructional speech directed at the author)

Figure 4

BLW points at his addressee’s hair. Co-timed with speaker deictic in Example 7 – same frame, different cameras.

Previous research on Lavukaleve (Terrill 2018), Dalabon (Cutfield 2018) and Tzeltal (Brown and Levinson 2018) has shown that pointing enlarges the proximal space, as does a distal addressee. Cases like Example 7 show a specific kind of enlargement where the speaker sphere extends into the addressee’s very own bodily space. More generally, while each set is associated with greater proximity to the corresponding speech act participant, absolute distance itself is not determining. For example, the addressee set can index both nearby and far away locations occupied by the addressee, as exemplified for tʰe²² ‘there (where you are)’ in Figure 5 corresponding to Example 3.

Figure 5

BLW (red circle on the right) monitors the location of MSN and BTL on the left. He uses addressee deictics to refer to their location. Same frame captured by different cameras (12 m apart).

Inasmuch as addressee deictics normally involve greater distance from the speaker than the speaker deictics and usually also lesser distance than the altitude-marked distal deictics, they may superficially seem to behave as altitude-unmarked medials. However, tʰe⁵⁵ ‘that by you’ is always understood by reference to an addressee, and in explicit or contextual opposition to e⁵⁵ ‘this by me’, as in Example 6. Multimodal evidence strongly suggests that usage of tʰe⁵⁵ ‘that by you’ involves tight monitoring on the part of the speaker of the addressee’s whereabouts, as illustrated in Figure 5.

The speaker and addressee can flexibly establish where their relative spheres begin and end. This is most obvious when reference is made to objects in between both of them, as in the following example, where reference to a video recorder fluctuates between speaker and addressee deictics. BLW switches from speaker deictics to addressee deictics and back as his hand approaches and withdraws from the camera’s tiniest lens. Consider first Example 8 where speaker deictics correlate with embodied proximity on the part of the speaker (Figure 6).

(8) BLW: ʑiː²² || e⁵⁵ = æ²²sɨ²² | e⁵⁵ le³³ tʰu⁵⁵ nɑ⁵⁵ | ʑiː²² ||
this = like this top thick ev
‘Look here! Like this one. This one is thick, as it is. Look at this!’
(YPG1-20191224_01-ext1Q8, 00:12:17.73—00:12:19.38) (instructional speech directed at the author)

Figure 6

BLW uses speaker deictic to refer to camera lens as his hand approaches it. Figures on top row and bottom row correspond to the exact same frame captured by a lateral and a frontal camera, respectively.

However, as soon as the addressee touches the camera, as per Figure 7, corresponding to Example 9, BLW switches to addressee deixis so as to signal that the referent is now in the addressee sphere:

(9) GMN: qʰɑ⁵⁵ve³³ || e⁵⁵ve³³ ||
which this
‘Which one? This here?’
BLW: me̠³¹ || tʰe⁵⁵ve³³ me̠³¹ ||
neg.right that neg.right
‘No, not that one!’
(YPG1-20191224_01-ext1Q8, 00:12:19.89—00:12:22.36) (instructional speech directed at the author)

Figure 7

Addressee’s hand approaches the small lens prompting BLW to switch to addressee deixis.

In the span of less than a second, however, the speaker reclaims physical proximity over the referent as the addressee’s hand withdraws (cf. Figure 8), which is reflected in the linguistic stream through a switch back to the speaker deictics as shown in Example 10.

(10) BLW: e⁵⁵ve³³ | ʑiː²² || e⁵⁵ tʰu⁵⁵ || e⁵⁵ le³³ pɑ³¹ ||
this this thick this top thin
‘This one here, look here! This one is thick. This one is thin.’
(YPG1-20191224_01-ext1Q8, 00:12:22.39—00:12:23.91) (instructional speech directed at the author)

Figure 8

Speaker reclaims physical proximity to referent, prompting a switch to speaker-anchored deictics. Figures on top row and bottom row represent the exact same frames as captured by a lateral and a frontal camera, respectively. They correspond to the bolded tokens in Example 10.

The pattern in Examples 9–10 is rather different from the one in Example 6, where e⁵⁵ ‘this by me’ and tʰe⁵⁵ ‘that by you’ were used contrastively (in the sense of Levinson 2018a) to refer to two attentionally and conceptually differentiated entities. In contradistinction to this, the two deictic terms are now used coreferentially to index one and the same referent. This clearly demonstrates that speakers are inclined to pay attention to and reflect material and/or perspectival changes in the unfolding speech context. However, it is important to highlight at this point that because of the highly schematic semantic nature of deictic terms, their usage does not always need to be interpreted as emanating from real-time decisions based on the current context. Due to their significantly high frequency in everyday speech (more than 600 tokens per hour as per a preliminary assessment of some transcripts), heuristic shortcuts may develop to streamline the burden of succeeding in choosing the right deictic term. In other words, deictics are often used in conventionalised ways that respond to generic aspects of communicative practice rather than to the full range of specific details of a single communicative event. Note in this regard that the switch from tʰe⁵⁵ ‘that (by you)’ to e⁵⁵ ‘this (by me)’ actually pre-empted the actual, physical change in relative proximity (cf. Figure 8), i.e. BLW uses ʑiː²² ‘look here (by me)’ even before his hand has actually drawn closer to the camera lens than my own. This suggests that the speech act spheres are not only physically established but also conceptually represented. More generally, e⁵⁵ appears to be a default choice for reference to whatever the speaker wishes to highlight in speech as something that is conceptually in their sphere. Prototypically, this will involve things that are physically closer to the speaker, but this is not semantically coded. 
Conversely, tʰe⁵⁵ is a default choice for reference to whatever the speaker can conceptually associate with the addressee, which prototypically involves things and locations closer to the addressee. As Sections 4–5 explore, an important pathway for conventionalisation is anchored in speech routines, so that whatever the addressee has mentioned conventionally qualifies for tʰe⁵⁵-marking, thus short-circuiting the need for real-time decisions regarding, for example, spatial distance. It follows that the use of tʰe⁵⁵ in Example 9 can be a conventionalised way to react to the speech turn of an addressee (regardless of the actual spatial layout of the scene) rather than a result of the speaker engaging in online processing of material context.

4 Spheres of interest

Beyond spatial proximity, Phola deixis can be based on discursive, attentional, epistemic, intersubjective and social associations between the speech act participants as deictic anchors and a given referent. Hence, e⁵⁵ ‘this by me’ and tʰe⁵⁵ ‘that by you’ can set up deictic spheres defined by association with the speaker and the addressee, not only as spatially positioned points in material reality but also as cognitively, communicatively and socially engaged agents. Together with space, these perspectives make up deictic zones contrasting the spheres of interest of the speech act participants. This notion is directly inspired by Laury’s (1997) and Naruoka’s (2006) interactional spheres as well as Kamio’s (1997) territory of information. In line with Enfield’s (2003, 2018) engagement areas and Khachaturyan’s (2020) spheres of engagement, the idea of spheres of interest is meant to integrate (not reject) spatial and sociocultural dimensions as well as subjectively construed epistemic domains (cf. Barnes 1998, Kratochvíl 2011, Cutfield 2011 and Khachaturyan 2020 on epistemic uses of Lai Chin, Abui, Dalabon and Mano demonstratives, respectively).

Speaker and addressee contrasts unfold in dialogue in principled ways. Once a speaker introduces proximal referents, locations or topics using speaker deictics, other speakers will systematically refer back to these same referents with addressee deictics. Note in this regard that e⁵⁵ and tʰe⁵⁵ can be used as syntactically disjoint anaphors at the end of an utterance to index a whole discourse topic or statement foregrounded by the speaker and the addressee, respectively:

(11) GMN: ɑ²²tɕʰi³¹ ⁿku²² = nɑ⁵⁵ me²² mɑ³³ tɕʰɔ⁵⁵ ||
what do = abl reach neg succeed
‘Why can’t (he) reach?’
LJF: mɔ²² ⁿti⁵⁵ ᵐpɔ⁵⁵ || nu³³ = ŋɔ³³ mɑ³³ tɕɔ²² ᵐpɔ⁵⁵ || nu³³ = ŋɔ³³
body short mp 2 = height.ext neg have mp 2 = height.ext
mɑ³³ tɕɔ²² = ve⁵⁵ me²² mɑ³³ tɕʰɔ⁵⁵ ᵐpɔ⁵⁵ | tʰe⁵⁵ ||
neg have = bridge reach neg succeed mp that
‘He’s short! He is not as tall as you! He is not as tall as you so he can’t reach, of course; as for that (which you are asking/which you are interested in/which belongs in your epistemic sphere).’
(YPG1-20190423_07-ext1H5, 00:13:02.78—00:13:07.65) (monolingually conducted video description task)

In principle, it can be ambiguous whether an addressee deictic is spatially anchored to the addressee or is being used as an anaphoric resumption of referents introduced by the addressee in the previous speech context. However, when the addressee refers to entities that are spatiotemporally distant or to abstract entities without a physical presence, it becomes possible to single out conversational, rather than spatial, proximity as the main motivating factor driving the choice of an addressee deictic:

(12) GMN: ɕɑ̠²²pɑ²² i⁵⁵ⁿtsɔ²² = pɑ⁵⁵ pu²² ɬe³¹ ɲi⁵⁵ = vɑ³³ || ŋɑ³³ tsɑ̠³¹ ki̠²² qɨ³³ ke⁵⁵
meat whole = all even put complete = pfv 1 eat able nmlzr that.down
le³³ | ɲi³¹ ʑɔ³³ me²² mɑ³³ tɕɔ²² || ɑ³³ɬe²²pʰɑ³³ ɲæ⁵⁵ tsɑ̠³¹ ⁿkɑ³¹ | ɕɑ̠²²pɑ²² ||
top two clf.kind only neg have Nisu intf eat like meat
‘They put meat in absolutely everything. There were only two things I could eat. They do like eating meat the Nisus.’
MSN: ɑ³³ɬe²²pʰɑ³³ tʰe⁵⁵ le³³ | ɲæ⁵⁵ tsɑ̠³¹ xu⁵⁵ ⁿti²²vɑ³³ || ɲæ⁵⁵ tsɑ̠³¹ xu⁵⁵ ᵐpe³³le²² ||
Nisu that top intf eat love ucg intf eat love exh
‘Yeah, those Nisus (which you are talking about/which you have had recent experiences with) love eating meat in fact, of course they love eating meat!’
(YPG1-20191216-fieldnotes_YPG5BB) (spontaneous dialogues with author)

Because the referent in Example 12, i.e. Nisu people, is far away from the addressee, discourse-conversational anchoring rather than space can be singled out as mediating the relationship between the referent and the addressee.

Conversely, the speaker set can also be used as an unmarked choice to refer back to speaker-introduced discourse referents, including distal ones. Moreover, as a marker of the speaker’s discursive sphere, e⁵⁵ also has a cataphoric use to pre-emptively signal upcoming content that is only known to, or prominent in, the speaker’s mind. The usage of speaker proximal demonstratives for cataphora, illustrated in Example 13, has been noted for several language families ranging from Indo-European (Fillmore 1997) to Sepik (Wilson 1980).

(13) BLW: qɔ⁵⁵ = tʰɑ³¹ e⁵⁵ = æ²²sɨ²² = xi⁵⁵ tɕɔ²² fe⁵⁵ […]
that.across = time this = like = rel have exp
‘Once upon a time it had been like this: …’
(YPG1-20190430_01-ext1Q8, 00:00:25.36—00:00:29.20) (traditional narrative directed at author)

For obvious pragmatic reasons, tʰe⁵⁵ lacks this function, although it can mark referents and topics that the speaker presumes the addressee to be invested in (cf. Section 7). Text-cohesive functions of Phola deictics also include recognitional and determinative uses, which are cross-linguistically common and attested elsewhere in Ngwi (Gerner 2003). These pertain to “relations between discourse segments” (Næss et al. 2020, 6) and often go by the name of endophoric (Diessel 1999). While a full treatment of such discourse uses falls outside the scope of this article, deictic usage in Examples 11–13 is discursive in the rather different sense, advanced among others by Burenhult (2003), Hanks (2009) and Næss et al. (2020), that the conversational stream works as a search domain. Put simply, what the addressee has talked about belongs in the addressee’s deictic sphere. Vice versa, what the speaker has talked or will talk about belongs in the speaker’s deictic sphere.

Deictic spheres can also be established on the basis of attentional and mental focus, which may or may not be aligned with spatial proximity. In Example 14, e⁵⁵xɔ²² ‘in this one’ serves to draw attention to the speaker’s own focus of attention, a specific path:

(14) BLW: ɑ³¹ʑɑ³³ || ɑ²²kʰɨ⁵⁵ = nɑ⁵⁵ vɨ⁵⁵ kɔ³³ me⁵⁵ || ɕɨ²² tʰɔ³³ mɑ³³ ⁿtɨ²²
intj above = from cross return mp walk through neg can
pe⁵⁵tɕɔ²² nɑ⁵⁵ ||
probably ev
‘Oh no! We’ll have to go all the way back up then. Looks like you can’t go through.’
BTL: ɑ²²ki̠³³ = nɑ⁵⁵ | e⁵⁵ = xɔ²² | e⁵⁵ = xɔ²² kɑ³³ tʰɔ³³ | tɕɑ³¹ ||
downhill = from this = in this = in also through tag
‘Downhill! Here! (we) can also go through here, right?’
(YPG1-20190503_01-STH2n, 00:21:29.35—00:21:34.83) (spontaneous dialogues)

The video recording of the interaction shows how the path that BTL is referring to, i.e. the one on the right-hand side, is closer to his two addressees, who are walking in front of him.

Crucially, however, these had turned their attention and their bodies towards the left-hand path as shown in Figure 9. This allows BTL to situate the right-hand path within a speaker-unique deictic sphere that was contrastive with the deictic sphere of his addressees. In this case, the deictic zones are defined by attentional and embodied engagement, not by spatial distance (Figure 10).

Figure 9

BLW and MSN, the frontrunners, direct their gaze and bodies to the left-hand road.

Figure 10

BTL uses speaker deictics while pointing towards the right-hand path.

In line with Levinson (2018a, 32) and Cutfield (2018, 103), an extended range is licensed by pointing, especially when the speaker is paying more attention to the relevant referent/location than the addressee. This extended range may reach into the immediate peripersonal domain of the addressee as was the case for Example 7. Figure 11 shows the pointing gestures used by two speakers referring to far away villages that they are introducing into the speech for the first time using the speaker deictics e⁵⁵ ‘this one’ and e²² ‘here’ (full transcripts omitted for the sake of conciseness).

Figure 11

Speaker deictics e⁵⁵ ‘this one’ and e²² ‘here’ for far-away villages pointed at and mentally focused by two different speakers (each photo corresponds to a different recording event with a different sitting arrangement).

Beyond space and attention, referents may be associated with the speech act participants from a social viewpoint. This may reflect interpersonal knowledge of what Manning (2001) calls perduring context, i.e. socially stable cultural structures such as kinship ties, but also layers of small-scale social relations that are more fluid and interactional in nature (cf. Laury 1997). To illustrate, consider Example 15, where MSN refers to her addressee’s room using the addressee set.

The speaker was standing closer to the relevant location, i.e. the addressee’s room, than the addressee himself. This strongly suggests that tʰe⁵⁵lɑ²² means ‘there where you usually reside’ and not ‘there where you are right now’. In other words, deictic spheres can be carved up on the basis of habitual praxis, e.g. habitual locations or what Bickel (2001) calls sociocentric space and not just the location in the here and now.

(15) MSN: lɔ³¹sɨ³³tɑ³³ tʰe⁵⁵-lɑ²² tɑ²² mɑ³³ kɨ⁵⁵ || nu³³ = ⁿtse²² tʰe⁵⁵-lɑ²² ||
screwer that -at leave neg worry 2 = among that -at
‘You can leave the screwdriver there, at your place over there.’
(YPG1-20191128-fieldnotes_YPG5AA) (spontaneous dialogues with author)

Habitual associations can even motivate a deictic choice that violates the spatial layout that is usually associated with a given deictic. For example, tʰe⁵⁵ can be used for a chair that the speaker is holding with his own hands because this is the chair that his addressee usually sits on at mealtimes:

(16) LJF: nu³³ tʰe⁵⁵-lɑ²² u̠³¹ ||
2 that -at sit
‘Sit (at your usual) there!’
(YPG1-20191220-Fieldnotes_YPG5BB) (spontaneous dialogues)

The fact that the chair was in the speaker’s immediate proximity shows how a socially conventional association may not only override but also directly contradict the usual implicature of spatial proximity between the addressee and a tʰe⁵⁵-indexed referent.

Previous accounts of such mismatches have appealed to mental transpositions or Deixis am Phantasma (Bühler 1965), whereby a speaker presumably situates the deictic centre, e.g. themselves, on a different spatial location, e.g. the addressee’s or some other contextually relevant one, which may be established through pragmatic (Levinson 2004, 2018a) or conventional cues (Bickel 2001). Other accounts, e.g. Peeters et al. (2020), have focused on how aspects of the referents themselves (from ownership to harmfulness) may motivate a semantically unexpected choice of deictic. A more economical solution is to acknowledge that both spatial and socio-epistemic proximity between speech act participants and a given referent are legitimate grounds for deictic reference. Indeed, Example 16 can of course also mean ‘sit right there where you are’. However, how exactly the addressee can (be expected to) identify the referent/location by reference to the deictic point of reference, i.e. in this case, the addressee, is not semantically coded but open to contextual interpretation. Such an account thus places the focus on an examination of pragmatic context understood as a broad search domain that mediates access to referents via different possible modes of access, to which we now turn.

5 Modes of access

Linguistic anthropology approaches to deixis (e.g. Hanks 1990, Manning 2001) have drawn attention to the limitations of overemphasising the ephemeral aspects of the speech act situation, e.g. a current spatial layout, to the detriment of more stable and socially entrenched aspects of the cultural context, e.g. ethnic and kinship structure. Table 3 provides a preliminary heuristic model showing how the usage of Phola deictics is sensitive to both transient and perduring context spanning spatial, attentional, and intersubjective perspectives. These different perspectives are idealised modes of cognitive access to a referent (Hanks 2009), i.e. dimensions or modalities along which speakers and hearers search for referents (Levinson 2018a), which can be thought of as layered along a gradual cline going from purely spatial to purely intersubjective, allowing room for intermediate overlapping zones in between. As such they straddle the whole spectrum between what some researchers call spatial cognition (cf. Levinson 2018a) and social cognition (cf. Evans 2021), which together make up holistic spheres of interest for the speaker and the addressee.

Table 3

Modes of access in Phola deixis; S/A = the speaker or the addressee

It is important to stress that the three proposed modes of access are neither categorically distinct nor mutually exclusive. In particular, spatial and attentional search domains often integrate intersubjective perspectives, in the sense that the speaker takes into consideration, among other things, the physical location of the addressee and their gaze direction when deciding to assign a referent to the speaker or the addressee sphere. Against this backdrop, the intersubjective mode of access is heuristically defined in negative terms, as a cover category for cases where physical and attentional perspectives do not explain deictic usage very well. At the same time, intersubjective anchoring is also positively defined as a category in its own right, which covers instances of deictic usage distinctly and primarily motivated by interactional and social factors.

A preliminary assessment of a small sample of 27 min of transcribed dialogues comprising 265 tokens of speech act deictics tagged for one or more modes of access[2] suggests that space, attention and intersubjectivity are roughly equally important factors mediating deictic choice. Some form of spatial footing was identified for 58% of examples, although only 24% were unambiguously motivated by direct spatial proximity, with the rest corresponding mostly to gestures whereby a non-proximal referent is metaphorically included within the sphere of the speaker or the addressee (on which more in Section 7). Attentional anchoring was identified for 40% of all examples, whereas intersubjective grounding was identified for 46% of tokens, of which 26% were primarily mediated by the appearance of the relevant referent or a thematically related discourse entity in the surrounding conversational turns of the speaker and the addressee, 5% were distinctly motivated by perduring social structures such as kinship ties or ethnicity, and the rest were best explained by reference to interactional and interpersonal knowledge about each other’s habits, activities, relations and the like. Note that since tokens were coded for as many modes of access as relevant, the percentages provided above do not add up to 100%.
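The arithmetic behind this last remark can be made explicit with a minimal sketch. The toy annotations below are hypothetical and are not the article's data; the mode labels and the function name `mode_shares` are likewise illustrative assumptions. The point is only that when each token may carry several tags, per-mode shares are computed over the same total and can therefore sum to more than 100% (as with the reported 58%, 40% and 46%).

```python
from collections import Counter

# Hypothetical toy annotations (NOT the article's data): each deictic token
# is tagged with every mode of access that plausibly licenses it.
tokens = [
    {"spatial"},
    {"spatial", "attentional"},
    {"intersubjective"},
    {"spatial", "intersubjective"},
    {"attentional", "intersubjective"},
]


def mode_shares(tagged_tokens):
    """Return, for each mode, the percentage of tokens carrying that tag."""
    counts = Counter(mode for tags in tagged_tokens for mode in tags)
    total = len(tagged_tokens)
    return {mode: 100 * n / total for mode, n in counts.items()}


shares = mode_shares(tokens)
# Shares are computed per mode over the same token total, so multiply
# tagged tokens are counted once under each of their modes.
total_share = sum(shares.values())
```

In this toy sample, three of the five tokens carry two tags each, which is what pushes the summed shares above 100%.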

Broadly acknowledged in the literature to drive deictic use, discourse has been identified as an important mode of access explaining up to a quarter of all speech act deictics in the data sample as per above (cf. Examples 11–13 for an illustration). As mentioned in Section 4, the term discourse here is used in a ‘non-relational sense’ to refer to the intersubjective anchoring of utterances to “a specific context, which is defined by the speech-act participants and their dialogic relations and interactions” (Næss et al. 2020, 6),[3] and not to the more usual ‘relational sense’ to do with discourse-segment-structuring functions (cf. Himmelmann 1996, Diessel 1999). Drawing inspiration from anthropologically and conversation-analytically orientated scholarship (Laury 1997, Hanks 1999, 2005, Burenhult 2003, Enfield 2003, 2018, Heritage 2005, Khachaturyan 2020), the model proposed here bypasses the traditional distinction between exophoric vs endophoric and that between deictic vs non-deictic uses, as per Diessel (1999) and Levinson (2004, 2018a), respectively. The Phola data is better accounted for by assuming that deictic reference is a multidimensional practice that can be anchored in either or both space and interactional discourse as well as more mental and social dimensions.

While the heuristic construct of spheres of interest may not be operationalised in exhaustive terms, it has high practical utility for describing the pragmatics of deictic usage in real life, especially in cases where speakers draw on integrated packages of perceptual, conceptual and social information. For example, while kinship relations can serve as a basis for deictic reference, kinship information often needs to be combined with contextually shared knowledge for a given deictic token to be successfully interpreted. To illustrate, consider how BLW refers to a specific person using addressee deixis in Example 17. The person indexed through tʰe⁵⁵(ve³³) ‘that by you’ is the addressee’s classificatory uncle, who is absent from the speech act and is being introduced into the focus of conversation for the first time by the speaker. He serves as a point of reference to convey the message that the traditional garments are now so outdated that not even the addressee’s classificatory uncle, who is significantly older than the addressee but younger than the speaker, would have worn or even seen them.

(17) BLW: nu⁵⁵pɔ³³mɔ³¹ = æ²²sɨ²² tʰe⁵⁵ve³³ = ve⁵⁵ || tʰe⁵⁵ ʑɔ²² = nɑ⁵⁵ = ⁿtse²² ke⁵⁵ = æ²²sɨ²²
2 .uncle = like that = bridge that house = from = among that.down = like
mɑ³³ qɑ²² = vɑ³³ = niæ³³ || tʰe⁵⁵ve³³ = æ²²sɨ²² = xi⁵⁵ ŋɔ³³ = æ²² ŋɔ³³ mɑ³³ fe⁵⁵
neg wear = cos = ep that = like = rel see = and see neg exp
pe⁵⁵tɕɔ²² || ŋɔ³³ mɑ³³ fe⁵⁵ | tʰe⁵⁵ ||
probably see neg exp that
‘And people like that uncle of yours, at that one’s house (they) already wouldn’t have worn such clothes anymore, of course. People like that one, they wouldn’t even have seen them. Hasn’t seen them, that one.’
(YPG1-20191219_08-ext1Q8, 00:25:57.43—00:26:05.41) (spontaneous dialogues with author)

Reference is not in this case spatially anchored but responds to the kinship affiliation between the addressee and the referred person, which motivates the choice of tʰe⁵⁵. At the same time, however, there are many elders who fit the category of addressee’s uncle, so the specific person talked about is only obvious because both know (that both know) about the addressee’s particularly close relation to the village shaman. Deictic reference relies on the common ground (cf. Clark 1996, Enfield 2018), which includes an understanding of both classificatory kinship networks and details about the interpersonal relations between specific individuals.

Speakers accrue a shared pool of knowledge across interactions leading to a proliferation over time of intersubjectively loaded deictic expressions that can only be understood by reference to their interpersonal relations (Deppermann 2018). An important analytical point is that addressee deixis in Example 17 can in principle be licensed by kinship, prior shared knowledge (cf. Knuchel 2019, Burenhult 2003) and habitual praxis, since all of these are contextually co-activated by tʰe⁵⁵ ‘that (which belongs in your sphere)’, which is by nature anchored in dialogic interactions. A potential niche for isolating socially entrenched structures from intersubjectivity is provided by internal monologues. By way of illustration, consider Example 18 from a traditional moralising fable about parent–offspring relations, where a woman’s thoughts, here rendered in direct quotation, include a speaker deictic to refer to her own son. Interestingly, she could see him approaching from above in the distance, and because she thought he was coming to beat her (even though he was in fact trying to seek redemption from his poor behaviour by bringing her food), she ran away, tripped over and died.

(18) BLW: ki³³sɨ²² ⁿtuæ³² le³³ || u³³ le³³ xɔ³³tɕɑ̠³¹ xɑ⁵⁵ tɔ²² sɔ⁵⁵ = ve⁵⁵ ||
like.that think top 3 top food send will intend = bridge
u⁵⁵mɔ³³ ke⁵⁵ | ŋɑ³³ zɑ³¹ e⁵⁵ | ŋɑ³³ kʰɨ⁵⁵ ⁿtu²² tɔ²² ⁿti²²vɑ³³ | ⁿtuæ²² lɨ³³ ||
3.mother that.down 1 son this 1 do hit will ucg think seq
ʑe³¹ = æ²² le³³ | tʰɑ̠³¹ lu³¹ = vɑ³³ næ̠³¹ nɑ⁵⁵ ||
run = also top pierce fall = pfv adv ev
‘Thinking like that, he set out to bring her food. His mother thought: “this son of mine is, I now know, going to hit me,” and so she ran, she tripped and got pierced (by a sharp tree stump), as the story goes.’
(YPG1-20190430_01-ext1Q8, 00:07:48.91—00:08:00.05) (internal monologue embedded in a traditional story)

Crucially, the distance that already obtained between the mother and her son and her obvious desire to distance herself even further from him do not preclude the usage of a speaker deictic, suggesting that it is their kin bond which motivates the choice of e⁵⁵ to the detriment of other modes of access. Since the deictic expression is embedded in an utterance that represents a personal thought that was presumably not even uttered aloud, this example perhaps makes it possible to isolate a conventional social dimension of anchoring from the conversational and intersubjective dimensions, although, of course, e⁵⁵ can still be understood as an index of a much more general kind of mental focusedness, e.g. ‘this son (which I am seeing and which I am acutely aware of right now)’. Importantly, this focusedness is not mediated by any obvious discourse-attentional factors. For example, e⁵⁵ ‘this’ in Example 18 is not used contrastively to pick out one particular son over other possible referents (no mention is made in the story of other sons or daughters). More generally, it is worth noting that, contrary to trends identified in the typological literature (e.g. Diessel 1999, Næss, Margetts and Treis 2020, Nakhola et al. 2020), Phola deictics are not semantically coded for the status of a referent as established or activated in discourse. Hence, both speaker and addressee deictics routinely appear in first, second and ulterior mentions of a referent (cf. Gipper 2017 for a similar situation in Yurakaré).

Even though speakers may creatively draw on various contextually intertwined factors when they choose to constitute deictic spheres of reference, one may ask whether any given one of them may be particularly likely to outweigh the others under certain conditions. For example, for deictic use in Spanish, Peeters et al. (2020) have assessed that the psychological level (cf. Jungbluth 2003) is usually predominant but may be overruled by physical distance when this is particularly salient in context. As a modest contribution towards tackling this issue, the next section looks at extreme cases where one can identify a clear mismatch between two important modes of access, space and attention. It will be shown how both can independently license deictic choice even when they ‘clash’ with one another.

6 Space vs attention

Prior accounts of engagement (in the sense of Evans et al. 2017a, 2017b) in demonstrative systems have argued that certain addressee-anchored forms, such as Kogi twẽhié (Knuchel 2019) and Jahai ton (Burenhult 2003), are markers of shared attention used when an entity (within the peripersonal domain of the addressee) is visible to both speaker and addressee. Although they have spatial uses, these emerge as a ‘typicality effect’ from the overlap between spatial proximity and attentional proximity, i.e. physically closer typically means attentionally accessible and vice versa (cf. Burenhult 2003, 2018). Moreover, they may even contrast with an addressee-inaccessible form, such as Jahai tũn ‘that (in your peripersonal domain) which you are not yet attending to’, which further confirms that it is attentional access rather than spatial proximity that drives deictic choice.

Unlike these forms, Phola tʰe⁵⁵ is not a purely attentional deictic. While it may latch onto the addressee’s cognitive access to a given referent, this is only a contextually given interpretation that is not semantically coded. This is evident from cases where tʰe⁵⁵ directs an addressee’s attention towards referents unknown to them, such as a cord to turn the lights on, which was hidden from view in the following interaction:

(19) PYNA: tʰuː²² | tʰe⁵⁵ = nɑ⁵⁵ tɕi̠³³ tʰi̠²² || tʰuː²² || tʰe⁵⁵ = ɣɔ²²tɔ³³ u̠³¹ tɕɑ³¹ […]
look.there that = from pull rslt look.there that = behind be tag
tʰe⁵⁵ = xɔ²² tʰe⁵⁵ tɕi̠³³ || tʰe⁵⁵ tɕi̠³³ tʰi̠²² || tɕi̠³³ = lɔ³³ = nɑ⁵⁵ tɕi̠³³ tʰi̠²² tɔ²² ||
that = in that pull that pull rslt pull = side = from pull rslt must
‘Look there! Pull it on using that one, look there by you! There by you! It’s behind that thing there by you, isn’t it? In there, pull that one by you, pull it down! You have to pull it from the pullable side.’
(YPG1-20190512_07-ext1Q8b, 00:08:27.16—00:08:49.13) (spontaneous dialogues with author)

Crucially, the speaker sticks to addressee deictics, even after it becomes clear that there is an epistemic asymmetry whereby only she knows the identity and location of the referent:

(20) PYNA: me³³ xe̠³¹ || i³³sɨ²² i³³sɨ²² i³³sɨ²² i³³sɨ²² || ŋɔ³³ mɑ³³ ki̠²² = e⁵⁵ ||
neg be.right like.this like.this like.this like.this see neg able = adr.asym
tɕɑ²²kʰɨ⁵⁵ || me̠³¹ || tɕɑ²²kʰɨ⁵⁵ ŋɔ³³ mɑ³³ ki̠²² = e⁵⁵ […] xæ²² ||
cord neg.right cord see neg able = adr.asym yes
tʰe⁵⁵ xæ³³ || tʰe⁵⁵ tɕi̠³³ || ɔ⁵⁵ ||
that yes that pull intj
‘That’s not it! Try doing like this, like this, like this, like this! You can’t see it, can you? There’s a cord! No! Can’t you see a cord? Oh yes! That’s the one! Pull that one! Oh!’
(YPG1-20190512_07-ext1Q8b, 00:08:54.7—00:09:09.02) (spontaneous dialogues with author)

Shared attention and epistemic states are hard to operationalise, but can be assessed through gaze direction, pointing and other multimodal cues (cf. Özyürek 1998, Enfield 2003, Hanks 2011, Peeters et al. 2015, Olsson 2019) that obtain in real-life interactions such as the one presented in Examples 19–20. Note in particular how PYNA’s enthusiastic pointing disappears once the addressee has found the referent, which strongly suggests that she is actively tracking his attentional states (Figure 12).

Figure 12

tʰe⁵⁵. Co-timed with pointing before, but not after, the addressee finds the referent.

However, contrary to the situation reported for purely attentional deixis in languages such as Turkish (Özyürek 1998, Küntay and Özyürek 2006) or Jahai (Burenhult 2003), there is no change in deictic choice that would correspond to the changes in attentional states of an addressee as per the following formula showing the correlation between attentional states and deictic choice:

It must be concluded that the choice of the addressee deictic set in Examples 19–20 is motivated by other factors. In this particular case, the referent’s immediate spatial proximity to the addressee, combined with the lack of physical access on the part of the speaker, comes into play in the context of a directive speech act in which the speaker projects a certain sense of urgency to locate the referent. In other words, the speaker needs the addressee to quickly identify the presence of a referent and assume an active role in engaging with the referent attentionally and manually. The use of tʰe⁵⁵ epistemically drags the referent into the addressee’s sphere of interest, over which he is expected/desired to exert an active control given the spatial layout.

At the same time, however, spatial anchoring can in some cases be unambiguously ruled out. The clearest example of this involves addressee-marked reference to the speaker’s own body or clothes, which obviously belong in their own spatial sphere. In Example 21, each token of tʰuː²² ‘look there by you’ is produced as the speaker reveals one layer of clothing (cf. Figure 13). These are physically close to the speaker but can still be anchored to the addressee’s sphere of interest to reflect the latter’s discursive authority over the conversational topic (cf. Examples 11–12).

(21) GMN: ɑ³³ɲi³³nɑ̠²² = xe³³ mɑ³³pɑ³¹ ⁿkɔ²² mɔ²² | li³¹ te⁵⁵ pu²² qɑ²² ɬe³¹ tɔ²² ||
-1year = amount.ext more cold cause four clf.layer even wear into must
‘Because it’s colder than last year (I) have to wear up to four layers’
MSN: ŋɑ³³ qʰɑ⁵⁵ = xu³³ tʰi³³ pɨ²² qɑ²² ɬe³¹ | tʰuː²² || tʰuː²² || tʰuː²² ||
1 int = size.ext one clf.pile wear into look.there look.there look.there
tʰuː²² || ŋɑ³¹ te⁵⁵ ||
look.there five clf.layer
‘I am wearing a lot! Look at that! Look at that! Look at that! Look at that! Five layers.’
(YPG1-20191204-fieldnotes_YPG5AA) (spontaneous dialogues with author)

Figure 13

MSN pointing at her own clothing. A photograph was taken during the dialogue in Example 21, which was, however, not audio recorded.

Similar to the situation reported for Dalabon (Cutfield 2011, 214), Phola speakers do not consistently show a preference for one mode of access over the other across identical or similar acts of reference. However, discourse anchoring (in the sense sketched out in Sections 4–5) seems particularly likely to trump spatial and social proximity. In other words, if an addressee talks about something, it almost automatically becomes tʰe⁵⁵-indexable to the detriment of any other possible contextual considerations.

The two case studies presented thus far in this section represent the extremes of the design space. In Examples 19–20, the choice of tʰe⁵⁵ is mainly motivated by a spatial layout, whereas in Example 21, there is no way it can be grounded in space. However, in most contexts, there is no mode of access or search domain that unambiguously takes precedence over the others. This semantic underspecification, which has been previously pointed out by scholars such as Enfield (2018, 87), is at the centre of a nominalisation construction in Phola where speaker and addressee deictics are deployed to set up a generic contrast between two spheres of interest corresponding, respectively, to a first-person and a second-person subject/topic. Crucially, the deictics here, which are syntactically obligatory, have scope over a whole event and cannot be said to be unambiguously spatial or epistemic, as neatly illustrated by the following example:

(22) LJF: nu³³ ɕɨ²² qɨ³³ tʰe⁵⁵ le³³ | ŋɑ³³ ɕɨ²² qɨ³³ e⁵⁵ = ŋɔ³³ mɑ³³ tɕɔ²² ||
2 walk nmlzr that top 1 walk nmlzr this = length.ext neg have
‘You didn’t walk as far as me (lit. That of you walking was not as long as this of me walking).’
(YPG1-20190516_04-Q8_d, 00:26:05.75—00:26:07.90) (monolingual elicitation)

This phenomenon can be functionally related to utterance-final locative demonstratives in Lai Chin, which locate not a referent “but rather the setting of the whole scene” (Barnes 1998, 55). Like Phola, Lai Chin demonstratives also show a strong correlation with the clausal subject/topic, so that, for example, a speech act distal form cannot be felicitously appended to a clause with a second person pronoun. Additionally, Phola exhibits a strong correlation between noun phrases containing first and second person personal pronouns, and speaker and addressee deictics, respectively. Thus, deictics are routinely used as modifiers of possessively modified nouns, e.g. ŋɑ³³ zɑ³¹ e⁵⁵ ‘(lit.) this my son’ in Example 18, as well as nouns with generic reference, e.g. lɑ̠²² e⁵⁵/tʰe⁵⁵ ‘(lit.) these/those tigers’, i.e. ‘tigers in general (as something belonging in the sphere of interest of the speaker/addressee)’.

Such paradigmatically contrastive epistemic nuances have been reported for nominalising demonstratives with clausal scope in Timor–Alor–Pantar languages (Schapper and San Roque 2011) and Abui (Kratochvíl 2011), which contribute to expressing epistemic and modal nuances. The next section is explicitly devoted to the analysis of deontic values with a focus on addressee deictics.

7 Deictics as enactors of social cognition

We have looked at how observable, inferable and mentally apprehensible factors of a spatial, attentional and social nature influence deictic choice. However, far from merely passively or deterministically reflecting reality, Phola deictics also serve to project a conceptual or deontic take on a situation, in a way that resonates with prior descriptions of demonstratives in languages like Finnish (Laury 1997), Lai Chin (Barnes 1998), Lao (Enfield 2003), Duna (San Roque 2008) and Abui (Kratochvíl 2011).

Borrowing from conversation analysis and anthropological linguistics, it is useful to think of Phola speech act deictics as markers of territories of information (Kamio 1997), territories of knowledge (Heritage 2012) and engagement zone perimeters (Enfield 2018). For example, in Examples 19–20, placing the switch cord within the addressee sphere not only ‘responds to’ or ‘reflects’ immediate physical proximity but also suggests that the enjoined action, i.e. pulling the switch, falls within the addressee’s physical capability and deontic responsibility. Put another way, the choice of an addressee deictic implies the potential for epistemic accessibility, i.e. ‘you should be able to locate it already’, thus allocating responsibility, i.e. ‘you need to activate the lights because I am all wired up with the mic and can’t get there’ (cf. Barnes 1998, 56 for a similar analysis of the Lai Chin addressee demonstrative).

For this reason, speech act deictics can work as interactional cues. For example, the two administrators of a video description task systematically used addressee deictics to prompt the remaining three participants to produce explanations of on-screen referents:

(23) LJF: tʰe⁵⁵ | ɑ²²tɕʰi̠³¹ ⁿku²² nɑ⁵⁵ || ke⁵⁵ = xɔ²² ||
that what do ev that.down = in
‘As for that (which I want you to talk about), what’s happening in there?’
BTL: tʰe⁵⁵ | ɑ²²tɕʰi̠³¹ ⁿku²² læ³³ ||
that what do q
‘As for that (which it is now your turn to comment on), what is happening?’
(YPG1-20190425_02, 00:22:51.23—00:22:52.74) (task-based instructional speech amongst Pholas)

The screen being equidistant from everyone, tʰe⁵⁵ ‘that by you’ in Example 23 is not spatially anchored to the addressees, but it also does not signal a presumed privileged epistemic access on their part, given that LJF and BTL, being as they were the task administrators, had privileged attentional and epistemic access to the content of the videos. Instead, what tʰe⁵⁵ does is assign a speech turn to the addressees. By deictically placing the matter at hand in the addressee’s sphere, the speaker passes on the right to speak and invites them to comment on it. This is in essence analogous to the light cord example (Examples 19–20) where the speaker also entitles the addressee to act, except that the expected action is in this case talking. Conversely, speaker deictics are a common device recruited by speakers who want to (abruptly) claim the right to speak. Assessing they know best, speakers will often interrupt an addressee saying something along the lines of:

(24) MSN: i³³sɨ²² ᵐpɑ³³ = xi⁵⁵ = niæ³³ | ʑiː²² ||
like.this say = rel = ep
‘This is how it is said, take a listen!’
(YPG1-20190511_05-ext1H5, 00:46:35.92—00:46:38.13) (spontaneous dialogues with author)

We will see in Examples 26–27 how epistemic nuances of the addressee deictic can correlate with metaphoric gestures on the multimodal stream, which place ideas in the embodied sphere of the addressee. However, when addressee deictics lack any concomitant gestures, the door is left open for other accompanying deictics to use the visuospatial channel. This is neatly illustrated in Example 23, where LJF (rightmost speaker) produces non-gestural tokens of tʰe⁵⁵ ‘that by you’, which contrast with a gestural token of ke⁵⁵ ‘that down there’ (corresponding to the yellow arrow in Figure 14). The latter, but not the former, spatially locates the referent.

Figure 14 
               LJF (rightmost speaker) uses the addressee deictic tʰe⁵⁵ in Example 23 without producing any gestures. Then, he points at the screen while producing the distal deictic ke⁵⁵.

Related to Example 23 and Figure 14 are cases where we find a single gesture corresponding to more than one deictic base that index the same referent from different perspectives as complementary or incremental steps towards shared attention. For example, to get his addressee to attend to a species of tree unknown to him, BLW first uses a speaker deictic to draw attention towards his pointing and gaze cues, and then adds a downhill deictic to redirect the addressee’s attention downhill where the referent is located:

(25) BLW: ʑiː²² || e⁵⁵ le³³ xɑ³¹pɨ³³ ||
this top rubber.oak
‘Look at me! This is a rubber oak’
GMN: xɑ³¹pɨ³³ ||
‘Rubber oak’
BLW: xɑ³¹pɨ³³ || kuː²² | ke⁵⁵ || kuː²² ||
rubber.oak look.downhill that.down look.downhill
‘It’s a rubber oak. Look down there, that one down there, look down!’
(YPG1-20190427_04-Q8a, 00:09:00.50–00:09:04.95) (instructional speech directed at the author)

As shown in Figure 15, speaker deictics (shown in blue) and downhill deictics (shown in yellow) correspond univocally to the exact same pointing gesture, sustained through the entire interaction, which strongly suggests that both are co-referential with the same attentional and denotational focus, the rubber oak tree. However, they each pick out a different aspect of the scene. The speaker deictic indexes the tree by reference to the speaker’s attentional and gestural focus whereas the downhill deictic indexes it by reference to its spatial location.

Figure 15 
               Speaker draws attention towards a tree using speaker and downhill deictics.

Co-referential deictics of the kind shown in Examples 23 and 25, as well as Examples 9–10 in Section 3, are a testament to “the ease with which speakers shift their perspectives on an object” (Hanks 2009, 22). There is often more than one contextual dimension, e.g. conversation and space for Example 23; attention and space for Example 25, that is simultaneously salient and useful for speakers. Providing two deictic anchors for the same referent can maximise the efficiency and precision of the referential act not only because it offers complementary bits of information but also because it can clarify the procedural instruction directed at the addressee, i.e. “first look at me and my gesture, then, in case you haven’t already, look downhill accordingly.”

Beyond interactional cues, speech act deictics can constitute spheres of interest based on speakers’ interpretations, assumptions and expectations about each other’s mental and social reality. A case in point involves a common use of addressee deictics when a speaker qualifies or rejects an idea that they presume their interlocutor to be epistemically invested in. In Example 26, BLW uses tʰe⁵⁵ ‘that by you’ to index the false belief that lotus plants could survive with just water or just mud. As suggested by the translation, the addressee deictic projects a reading along the lines of ‘as you may be (wrongly) thinking’. The speaker presents the content as something the addressee had brought into focus, as in Examples 11–12 and 21, even if they didn’t sensu stricto:

(26) GMN: ⁿkɑ̠³¹ = xɔ²² tɨ³³ mɑ³³ tɨ³³ ni³³ ||
water = in grow neg grow q
‘Does it grow in water?’
BLW: xæ²² || ⁿkɑ̠³¹ nɨ³³ ne³¹tɕʰɑ³¹ || ⁿkɑ̠³¹ = æ²² mæ²² | ne³¹tɕʰɑ³¹ = æ²² mæ²² ||
yes water and mud water = also need mud = also need
‘Yes! Both water and mud. It needs both water and mud.’
ne³¹tɕʰɑ³¹ le³³ || ɑ²²qʰu⁵⁵ = ɬɔ³³ = nɑ⁵⁵ pi̠²² tɔ²² ⁿti²²vɑ³³ || e²²ⁿte⁵⁵ te⁵⁵ |
mud top below = towards = from gesture will ucg here measure
ne³¹tɕʰɑ³¹ le³³ e⁵⁵ = ŋɔ³³ nɑ̠³¹ || ⁿkɑ̠³¹ le³³ e⁵⁵ = ŋɔ³³ = æ⁵⁵næ̠³¹ nɑ̠³¹ mæ²² ||
mud top this = height.ext deep water top this = height.ext = dim deep need
‘In terms of mud… I’m going to gesture from below, mind you! You need mud to this level of depth, and water only this small level of depth.’
ɲi³¹ ʑɔ³³ pɑ⁵⁵ mæ²² || ⁿkɑ̠³¹ = æ²² mæ²² | ne³¹tɕʰɑ³¹ = æ²² mæ²² […]
two clf.kind all need water = also need mud = also need
ne³¹tɕʰɑ³¹ pu²² ne³¹tɕʰɑ³¹ | tʰe⁵⁵ | ⁿku²² tɕʰɨ³¹ mɑ³³ ⁿtɨ²² niæ³³ ||
mud all mud that do complete neg can ep
‘Both kinds are needed.’ ‘Water is needed and so is mud… If it’s just mud, that (which you may have thought since you asked about water) won’t work.’
ⁿkɑ̠³¹ pu²² ⁿkɑ̠³¹ = æ²² nɨ³³ | ne³¹tɕʰɑ³¹ mɑ³³ u̠³¹ | ke⁵⁵-lɑ²² tɨ³³ mɑ³³ ki̠²² ||
water all water = also and mud neg be that.down-at grow neg able
xɔ³³tɕɑ̠³¹ tsɑ̠³¹ tɔ²² mɑ³³ tɕɔ²² niæ³³ || ne³¹tɕʰɑ³¹ = xɔ²² tsɑ³¹ pi̠²² = xi⁵⁵ || tʰe⁵⁵ ||
food eat nmlzr neg have ep mud = in eat caus = rel that
‘If it’s only water, also, if there’s no mud it will not grow down there. Since it doesn’t have food, which is provided from within the earth… As for that issue (which you are asking about).’
(YPG1-20191222_04-Q8, 00:12:13.87—00:12:45.51) (instructional speech directed at author)

While the question posed by the addressee, i.e. ‘does it grow in water?’, may imply the counterfactual belief that ‘it could grow only in mud or only in water’, this belief was not directly stated as such. Yet it was indexed as addressee-associated via tʰe⁵⁵, appended as a syntactically and prosodically loose particle after the entire preceding utterance over which it has epistemic scope in a way similar to demonstratives in other Tibeto-Burman languages, such as Lai Chin (Barnes 1998), and elsewhere, e.g. Lao (Enfield 2007) or Makhuwa (Bantu), where the demonstratives and vo are used to anchor an utterance in the speech situation (Van der Wal 2013, 30). As we have seen thus far, nominal deictics are attested as constituents in a noun phrase (Examples 1–2), in structurally embedded nominalisations (Example 22) and in non-embedded discourse frames (Examples 11 and 26), where they function as loose discourse-epistemic particles, albeit of a very different kind from syntactically disjoint deictic particles like tʰuː²² ‘look at that!’.

Multimodality again provides key insights. In Figure 16 (corresponding to Example 26), BLW produces gestures engaging the addressee, as an embodied way of placing the propositional content within their presumed sphere of epistemic interest. More specifically, a repeating puncturing gesture is co-timed with tokens of tʰe⁵⁵ ‘that by you’.

Interestingly, tʰe⁵⁵ ‘that by you’ seems to be a conventionalised rhetorical device to dismiss a belief that any hypothetical addressee might harbour, as a sort of ‘straw-man’ to ensure everyone is on the same page. Thus, in the following monologue, tʰe⁵⁵ marks the referent ‘other people’s villages’ even though the addressee-association does not follow from any input whatsoever by the addressee but, rather, rhetorically pre-empts it:

(27) BLW: zi̠³³ le³³ || ŋɑ³³ kɑ³¹ fe⁵⁵ le³³ || sæ²² kʰɨ⁵⁵ qɨ³³ […] e⁵⁵ tʰi³³
leopard top 1 hear exp top three clf enter this one
kʰɑ̠³¹qɔ⁵⁵ || sɨ⁵⁵ kʰɑ̠³¹qɔ⁵⁵ tʰe⁵⁵ ŋɑ³³ mɑ³³ si²² ||
clf.village others village that 1 neg know
‘As far as clouded leopards go, as far as I’ve heard, three of them entered… I mean this village, as for other villages, that (which YOU may be wondering about), I don’t know.’
(YPG1-20191204_02-Q8, 00:16:32.76—00:16:45.00) (informal narrative directed at author)

The preceding use is functionally similar to Marind absconditives, which are verbal affixes derived from demonstratives that are used to update the common ground by denying the addressee’s presuppositions (Olsson 2019). It can also be theoretically related to English you know, which, among other functions, is used when a speaker transmits non-shared knowledge (Macaulay 1991, 157–8, 2002, 755–6) to check whether the addressee is aligned with the speaker (Bernstein 1962, 235) and/or to construct familiarity by “get[ting] the addressee to cooperate and/or to accept the propositional content of his utterance as mutual background knowledge” (Östman 1981, 17).

While these epistemic nuances are somewhat abstract, multimodal observation of gestures co-timed with addressee deictics helps to reveal how, in uses like the one in Example 27, the speaker metaphorically recruits the visuospatial channel to index the addressee’s sphere, as shown in Figure 17.

This is in line with research on metaphoric language (e.g. Cienki and Müller 2008), suggesting that there is some form of cognitive reality to the association between concrete and abstract readings of the same linguistic forms, which surfaces in metaphoric gestures.

Associating referents or entire propositions with the addressee sometimes confers a sense of emotional disaffiliation on the part of the speaker. This is most obvious in contrastive scenarios opposing the addressee to the speaker on some explicit dimension, e.g. differences in age and level of fitness:

(28) MSN: ŋɑ³³-ʑi³³ tsʰɔ⁵⁵mɔ³¹ = æ²²sɨ²² le³³ […] liæ³¹ kɔ²² ɕiæ³¹ɕɨ²² nu⁵⁵ ɕɨ²² mæ²² […]
1-pl old.person = like top two clf hour walk need
nu³³ ɲi³¹ xɔ³¹ tʰe⁵⁵ le³³ ɕɨ²² tɕʰɔ⁵⁵ ||
2 two clf.human that top walk succeed
‘Old people like us, we have to walk up to two hours (to get back from the fields) … As for the two of you (lit. those you two), you can walk well!’
(YPG1-20190511_05-ext1H5, 00:49:43.52—00:51:09.16) (spontaneous dialogues)

Likewise, by placing a proposition within the addressee sphere, a speaker may also express (mild) reproach, as in BLW’s resolve to use drawings after the addressee repeatedly failed to understand oral explanations of a leopard capture:

(29) BLW: ŋɑ³³ xuɑ³¹ tʰi̠²² = lɨ³³ nu³³ ɲi⁵⁵ pi̠²² || nu³³ tʰe⁵⁵ || nɑ⁵⁵ kɑ³¹
1 draw rslt = seq 2 see caus 2 that listen hear
tɕʰi³³tɕʰu²² mɑ³³ ki̠²² = xi⁵⁵ tiæ⁵² ||
clearly neg able = rel contr
‘Let me make a drawing and show you. Since you boy don’t understand it when spoken.’
(YPG1-20191220_01-Q8, 00:43:10.29—00:43:14.51) (instructional speech directed at author)

As per Kratochvíl’s insightful analysis of similar uses of Abui addressee deictics, “by placing utterances within the epistemic or social jurisdiction of the addressee, the social implication emerges that they should have known or done something better” (2011, 773). Further crosslinguistic parallels include Lai Chin khaʔ (Barnes 1998), Japanese so- (Naruoka 2006, 492), Finnish se (Laury 1997) or Dalabon nunh. The latter is, for example, analysed in Cutfield (2011) as conveying that a referent is ‘not affiliated with speaker’ either because it is outside of the speaker’s physical, visual, mental or discursive territory, owned by someone else, explicitly dispreferred or emotionally disliked by the speaker (ibid, 247, 290–5).

Recall from Sections 1–2 that deixis has been defined as a linguistic device to achieve joint attention (Diessel 2006) and that Phola has a dedicated set of syntactically disjoint deictic particles whose primary function is to draw attention towards referents located in one of five deictic spheres, i.e. by the speaker, by the addressee, downhill, uphill and across the hill. In light of what we have seen in this and the previous section, we would expect disjoint deictic particles to be able to draw attention not only along the spatial and visually perceptible channel but also along more epistemic and social channels, and this is exactly what we find. To illustrate, consider Example 30 where the addressee disjoint deictic particle tʰuː²² ‘look there by you!’ is used in a reproachful admonition to draw attention towards the addressee’s censurable behaviour. The speaker here is not using tʰuː²² ‘look there by you!’ to achieve joint visual attention given that both speaker and addressee share a common visual focus on a rice bowl that the addressee keeps trying to fill with more rice. Instead, tʰuː²² is meant as a directive instruction for the addressee to use contextual cues residing in their own sphere of interest, i.e. their own behaviour, as evidence and legitimate grounds for the speaker’s reproach. The attentional deictic effectively works as a call towards a behavioural alignment inasmuch as it is used to prompt the addressee to acknowledge a rice-wine-driven behaviour as such and stop it.

(30) SPKR1: nu³³ ɣɔ³³tɕæ³¹ tʰe⁵⁵ ⁿtɕi²² pʰɑ³¹ ᵐpɔ²² le³³ | tʰi³³sɨ²² xɔ³³tɕɑ³¹ ɲæ⁵⁵ ɬe³¹ ki̠²² ||
2 man that cond full top like.that food intf able
‘You (lit. those) men, when you are drunk you are so likely to put a lot of food like that (on one’s bowl)’
(10 seconds later, SPKR2 tries to fill SPKR1’s bowl. The latter withdraws their bowl from SPKR2’s reach and says:)
SPKR1: tʰuː²² || tʰuː²² ||
look.there look.there
‘But do look at yourself!’
SPKR2: ŋɑ³³ ⁿtɔ²² ᵐpɔ²² = e⁵⁵ ||
1 drink full = adr.asym
‘So now I am drunk according to you!?’
SPKR1: ɲæ⁵⁵ ⁿtɔ²² fe⁵⁵ ||
intf drink exp
‘You have certainly been drinking.’
SPKR2: ŋɑ³³ qʰɑ³³ = tʰɑ̠³¹ ⁿtɔ²² ᵐpɔ²² | nu³³ si²² mɑ³³ si²² = læ³³ ||
1 int = time drink full 2 know neg know = q
‘Do you even know when I’m drunk?’
SPKR1: ɲæ⁵⁵ si²² ||
intf know
‘Course I know!’
(YPG1-20191201-fieldnotes_YPG5AA) (spontaneous dialogues)

8 Discussion

While there is some evidence that spatial deixis may be historically more basic (cf. Diessel 2006), and that spatial cognition is recruited during production and decoding of demonstratives (Stevens and Zhang 2013), recent trends have highlighted the role of attention and intersubjectivity (e.g. Diessel 2006, Burenhult 2003, Evans et al. 2017a), in line with an older tradition embodied among others by Bühler (1965). Using neurolinguistic methods, Peeters and Özyürek (2016) claim that physical proximity is not what governs deictic choice, even in languages like English, but rather psychological proximity, jointly established between speakers and addressees. From a more anthropological perspective, Hanks (1990, 2009, 2011) argues that Yucatec Maya deixis responds to conceptual and social modes of access to referents. But while these kinds of studies explicitly anchor the core of deixis in psychology and social interaction, different methods have yielded evidence substantiating a spatial analysis for the same forms (Bohnemeyer 2018). Likewise, Finnish (Laury 1997) and Lao demonstratives (Enfield 2003) have been convincingly analysed as encoding conceptually and interactionally constructed zones, rather than metric distance. However, distance has been subsequently acknowledged to pragmatically influence deictic construal in Lao after all (Enfield 2018). As Levinson has aptly put it: “the discussion of the importance of spatial distinctions is by no means over” (2018a, 9). The present study has contributed to this discussion by offering novel field data from Phola, a previously undocumented Tibeto-Burman language, where alongside space, socio-cognitive dimensions of experience, such as attention but also shared knowledge, discourse focus (in the sense sketched out in Sections 4 and 5) and social structure, allow one to partition the world into spheres of interest that work as deictic anchors for reference to entities, places and states of affairs.

However, it has emerged that Phola deictics are not purely sociocognitive in the same sense as attentionally specified demonstratives in Turkish, Jahai and Kogi. Instead of specifying space, attention, discourse or societal structure as a privileged search domain (Levinson 2018a), what is simply encoded in Phola is one of five possible referential anchors: the speaker sphere, the addressee sphere, the uphill sphere, the downhill sphere and the cross-boundary sphere. The specific kind of relation between a referent and a deictic sphere is left underspecified and may draw from either or both the ephemeral characteristics of the immediate situational context and more textured aspects of broader social context (in the sense of Voloshinov 1983 [1926], and Manning 2001). For example, objects held in an addressee’s hand, far-away locations they are pointing at or habitually occupy, referents they just introduced into speech, or a close kin relative of theirs are all within the addressee’s sphere. Correspondingly, marking all five instances with tʰe⁵⁵ ‘that by you’ establishes the addressee as a broad deictic anchor that mediates the identification of something as addressee-proximal in material, cognitive, actional, discursive, and social space, respectively.

This is interesting because previous studies have drawn attention to the difficulty of exhaustively defining the range of factors involved in sociocognitive deixis. Given that “the term ‘cognitive accessibility’ is rather undefined” (Burenhult 2003, 367), heuristic lists of factors are often enumerated: “‘Accessibility’ is to be understood as a wide concept incorporating a range of notions related to factors like reachability/approachability, perceptibility, distance, possession/ownership and topicality in discourse.” (Burenhult 2003, 365). In a similar vein, Enfield (2018) and Hanks (1990, 2009, 2011) list many parameters influencing demonstrative choice, including location, metric distance, visibility, physical access, motion, time, perceptual salience, attention, memory, anticipation, mutual knowledge, common sense, authority, ownership, habitual familiarity, kinship, social roles, culturally constructed space, engagement zone perimeters, stereotypical social scenarios, privileged relationships, politeness demands, stance, discourse, etc.

The notion of spheres of interest takes a different approach to the issue of indefinition by turning it into a heuristic model that assumes that the various different perspectives outlined above are complementary and intermeshed, rather than competing, parts of an integrated processing toolkit in the social mind, an interpretation compatible with embodied cognition models, e.g. Gazzaniga and LeDoux (1978), Gibbs (2005). It was shown how such a model receives validation from pragmatic analysis of contexts where the Phola speech act deictics e⁵⁵ ‘this by me’ and tʰe⁵⁵ ‘that by you’ latch onto holistic packages combining spatial, epistemic and social information (Example 17).

While spatial and social cognition are heuristic ends of a gradual continuum, it was shown that deictic reference may independently latch onto either dimension even when they ‘clash’ with each other. Thus, some uses are distinctly spatial in nature and allow no obvious attentional reading (e.g. Examples 6–8 and 19–20) while others are unambiguously sociocognitive and disallow any spatial interpretation (e.g. Examples 14 and 21). Crucially, neither dimension is promoted to an obligatory part of the deictic’s core semantics. Correspondingly, a given deictic expression may mean different things, depending on what aspects of context speakers and hearers decide to latch onto, e.g. ‘there’ can mean ‘there where you are right now’ or ‘there where you usually are’ (Example 16).

As insightfully noted in Enfield’s exploration of Lao demonstratives: “A range of context-dependent factors … can all affect the selection of demonstratives in consistent and principled ways, yet without any of them being encoded in the semantics” (2018, 87). In other words, the mapping between a referent and a deictic anchor remains to an important degree open to intersubjective negotiation rather than deterministically emanating from external context. In Enfield’s terms, Phola demonstratives are purely deictic in that they are “neutral with respect to the domain [they] appl[y] to” (2018, 73). Like Lao’s two-term demonstrative system, Phola encodes purely deictic spheres, albeit five fully contrastive ones.

Leaving the mode of access underspecified may come at the cost of ambiguity, but it also provides a communicative flexibility that allows speakers to appeal to visual and spatial cognition on the one hand, and to social cognition, on the other. An important implication of this approach, discussed in Section 7, is that deictics do not merely identify and locate entities but also have an important performative function whereby they can be used to express epistemic stances, evaluations, inferences, interpretive frames and social expectations. Unconstrained by semantically coded values in terms of metric distance or attentional states, speech act deictics are free to project idealised or desirable relations between a referent and the speaker or the addressee as deictic anchors. For example, by placing a referent in the addressee’s sphere, a speaker may encourage them to assume their physical capability and deontic responsibility to reach out, find it and use it. In Laury’s terms, the speech act participants’ spheres are “not only expressed, but constituted, by demonstrative usage” (1997, 55). This is consistent with Bickel’s observation that deixis ‘conveys’ rather than ‘encodes’ space by imposing ‘morally loaded’ ‘networks of divisions’ on material space (2001, 240–44). Phola deixis imposes not so much spatial but ontologically underspecified divisions that provide an instruction to relate a referent to a conventional point of reference in whichever way makes sense in context, i.e. spatially, epistemically, socially, etc. This may respond to principled pragmatic structures (cf. Hanks 2009), e.g. the interpersonal management of conversational topics and turns (as discussed in Sections 4 and 7), but can also be creatively and rhetorically manipulated by the speaker’s own understandings and ad hoc communicative agenda.

From a structural viewpoint, the sociocognitive affordances of Phola speech act deictics are both revealed and enacted through syntactic flexibility. Nominal deictics, for example, may display scope over a noun phrase (Section 1.2), a nominalised verb phrase (Example 22) as well as entire utterances and larger chunks of discourse (Examples 11 and 26). Placing both entities and entire propositions within a given speech act participant’s sphere of interest serves the purpose of negotiating territories of information, knowledge, understanding and action. Beyond phrasal deictics, the existence of a dedicated class of disjoint deictic particles, which are not syntactically embedded and can flexibly draw attention to objects, places, kinds, properties and entire events/situations (cf. Examples 5 and 30) is a particularly neat reflex of both the semantic leanness and the performative nature of Phola deictics. Importantly, evidence has been presented that this performative function can pay heed to material reality, attentional states and discourse structure, but does not deterministically reflect them. The present analysis thus supports a descriptive approach to deictic usage that calls for researchers to step back and consider moving away from the long-established theoretical ambition of predicting deictic choice from context. It is suggested that we should instead be focusing on an ethnographic, pragmatic and multimodal examination of how deictic choice constitutes interactional and social praxis. In light of their semantic leanness, the multimodal stream has been shown to be key in the analysis of the pragmatic functions of Phola deictics. An interesting finding in this regard is that, even though space is not a more basic semantic component of Phola speech act deictics than social cognition, there is nonetheless some evidence that the visuospatial dimension can be optionally recruited to co-express intersubjective and performative nuances, e.g. by gesturally placing an idea in the addressee’s peripersonal domain (Figures 16 and 17).

Figure 16 
               Repeated puncturing gesture towards the addressee obtains twice. This is co-timed with addressee deictics bolded in Example 26. Same frame captured by a lateral and a frontal camera, respectively.

Figure 17 
               Puncture stroke towards addressee’s chest is perfectly co-timed with bolded tʰe⁵⁵ in Example 27.

One obvious limitation of the present study is that it is primarily based on monolingual participant observation (cf. Everett 2001; González Pérez forthcoming). One important direction for further research is the principled gathering of data via questionnaires and matching tasks that experimentally control for contextual variables, such as those used in Burenhult (2003) and Levinson (2018a). Future studies shall also look into usage patterns from a quantitative perspective with the goal of assessing how frequent spatial, attentional and social modes of access are, and whether their relative frequency exhibits any principled correlation with text genres, speech acts, modality (e.g. realis vs irrealis), among others. In this regard, a promising avenue of research involves a detailed analysis of how deictic choices are modulated and negotiated across several speech turns, based on changes in the common ground. Work in progress is currently looking into the conversational embedding of the most interactionally loaded kind of Phola deictics, the class of disjoint deictic particles, with a focus on the kinds of intersubjective settings that motivate their use and the kinds of reactions that they trigger in the addressee.

While there may be some functional and communicative advantages to the semantic indetermination of deictics, recent research by Peeters et al. (2020) suggests that different contextual variables may act as a principled filter mediating the contributions of physical and psychological factors in deictic choice. Although the data presented in this article does not allow any unchallengeable generalisations to be made in this regard, a preliminary survey suggests that intersubjective anchoring is particularly common in dialogues and addressee-oriented speech, i.e. admonitions, requests, arguments, pedagogical and instructional speech, etc. In particular, addressee deictics seem to be systematically used for objects, places, topics and ideas that the addressee has brought into the discourse-conversational focus. Moreover, preliminary findings were presented suggesting that in combinations of multiple co-referential deictics, each deictic base will tend to map onto a different mode of access, e.g. attention and space (Example 25). Future research shall further investigate the possibility of isolating concrete contextual variables that specifically favour spatial over sociocognitive anchoring of Phola deictics, or vice versa. Connected to this is a systematic exploration of individual differences in usage and their potential correlations with social structure. While kinship emerged as a conventional anchor, there may be other salient factors at play. Previous research suggests for example that socially powerful and knowledgeable individuals may index a wider range of entities and locations with proximal demonstratives (cf. Hanks 2011). The complex hierarchies and relations in Phola society provide an excellent testing ground for the anthropological functions of deixis.

Finally, the picture of Phola deixis that has been presented here will most certainly be enriched and expanded by ongoing work on the spatial and intersubjective usages of the three distal deictics that are anchored to topographic features.

Published Online: 2023-02-22

