A Two-Step Theory of the Evolution of Human Thinking

Joint and (Various) Collective Forms of Intentionality

Glenda Satne
  • Department of Philosophy, Alberto Hurtado University, Alameda 1869, Of. 304, Santiago, Chile; and Vice-Chancellor Fellow, School of Humanities and Social Inquiry, Faculty of Law, Humanities and the Arts, University of Wollongong - NSW 2522, Australia
Published Online: 2016-03-23 | DOI: https://doi.org/10.1515/jso-2015-0053


Social accounts of objective content, like the one advanced by Tomasello (2014), are traditionally challenged by an ‘essential tension’ (Hutto and Satne 2015). The tension is the following: while sociality is deemed to be at the basis of thinking, in order to explain sociality, some form of thinking seems to be necessarily presupposed. In this contribution I analyse Tomasello’s two-step theory of the evolution of human thinking vis-à-vis this challenge. While his theory is in principle suited to address it, I claim that the specifics of the first step and the notion of perspective that infuse it are problematic in this regard. I end by briefly sketching an alternative.

Keywords: Collective intentionality; Joint intentionality; Objective content; Tomasello; Natural history

1 Introduction

How is it possible for something such as contentful states of mind to exist in a natural world? How can thoughts be objective, i.e. how can they refer to things beyond themselves? How did the capacity to think objectively emerge in natural history?

Any proposed naturalistic explanation of objective content must (i) not presuppose objective content and (ii) have recognized scientific credentials.

Attempts to answer this puzzle can be schematically classified into three types of theory. I follow Haugeland’s lead 1 in labelling these three main types neo-Cartesianism, neo-Behaviourism and neo-Pragmatism. Neo-Cartesians are committed to the idea that mental contents are original to the individual mind and prior to the existence of socio-cultural practices. Among proponents of such a view are Fodor and Millikan. Neo-Behaviourism is defined by its cautious attitude towards assuming the existence of mental states; instead, it renders content in terms of intentional ascriptions. Daniel Dennett is famously an advocate of such a position. Finally, neo-Pragmatists take social practices to be at the basis of the emergence of content. They claim that “contentful tokens, [e.g. mental contents], like ritual objects, customary performances, and tools, occupy determinate niches within the social fabric – and these niches ‘define’ them as what they are. Only in virtue of such culturally instituted roles can tokens have contents at all” (Haugeland 1990, p. 404).

Tomasello’s work on the phylogeny and ontogeny of human thinking lines up with the third group. He explicitly advocates this view from the very beginning of his A Natural History of Human Thinking: “[Thinking] is a solitary activity all right, but on an instrument made by others for that general purpose, after years of playing with and learning from other practitioners […]. Human thinking is individual improvisation enmeshed in a sociocultural matrix” (p. 1). It will thus come as no surprise that his philosophical companions include Wittgenstein, Brandom, Hegel, Vygotsky, Piaget and Davidson, all of whom comfortably side with the aforementioned social constructivist, neo-Pragmatist view.

The main idea that Tomasello has been developing over the years is that a distinctive tendency to cooperate, coupled with an ability for socio-cultural learning that is supported by environmental scaffolds, played a critical role in enabling contentful forms of cognition to emerge (Tomasello 1999). 2

Over the years, many have raised doubts about the prospects of such a social strategy. 3 The social constructivist strategy is thought to harbor an essential tension (Hutto and Satne 2015). The problem is easily seen if we make explicit the line of reasoning that seems to underlie most attempts to account for the origins of content in social terms:

  1. Contentful intentionality (i.e. objective thinking) develops through social cooperation and engagement in socio-cultural practices;

  2. Participating in and mastering socio-cultural practices requires intelligence;

  3. Intelligence requires intentionality;

  4. Intentionality requires content (i.e. objective thinking).

If intentionality depends on social practices, how could we ever participate in social practices in the first place, given that doing so requires a set of sophisticated capacities (e.g. learning capacities, the capacity to identify objects and situations, etc.) that uncontroversially presuppose the mastery of such intentional capacities? Unless this essential tension can be satisfactorily resolved, the social constructivist strategy is doomed.

Tomasello’s new book may provide the cure for this neo-Pragmatist problem. The cure would lie in the distinction between two kinds of intentionality, namely, joint intentionality and collective intentionality. This strategy has a double advantage. On the one hand, it acknowledges that only those with minds can harbor states with the right contents for engaging in practices of learning from others (besides language-learning practices, which are key for objective thinking, strategies for foraging, hunting, and so on should also be included here). On the other hand, it provides a social constructivist, neo-Pragmatist account of the emergence of objective content in the natural world: the specific ability that allowed humans to entertain objective thoughts has a natural history, one in which social forms of engagement and cooperation played an essential role.

The notion of joint intentionality promises to describe the set of abilities that a creature can possess without yet being capable of manipulating truth-functional representations of others and the world. At the same time, these cognitive tools would provide the platform for the development of the latter, truth-functional kind of representational capacity, which belongs with collective intentionality. This strategy seems to have all the ingredients needed to overcome the essential tension outlined above: it provides fresh tools for neo-Pragmatism to explain how organisms progressed from more primitive forms of intentionality to objective forms of intentionality. To elaborate a little, what the neo-Pragmatist story lacks is an account of how social practices are possible without concepts of others’ beliefs, desires and intentions. Tomasello’s account of the emergence of objective content avoids the essential tension by locating objective intentionality only in the context of special sorts of socio-cultural norms, while acknowledging a different form of intentionality that explains the social engagement in practices that lead to its development. It also assumes that our species-wide, biologically based tendencies constitute the platform from which objective content first arises. That is, it takes for granted that biological forces put in place mechanisms for social cooperation that enable individual learning and work in conjunction with mechanisms for the social inheritance of culturally evolved devices.

While I take this strategy to be a promising contribution to the debate about the natural origins of contentful thinking, in what follows I raise some important concerns about the specific way in which Tomasello develops it in his recent book. In the next section, I briefly describe the distinction between joint intentionality and collective intentionality that Tomasello deploys. After that, I explain why some of the features characteristic of the former kind of intentionality may raise problems for his view. To conclude, I briefly propose an alternative that, while consistent with Tomasello’s general strategy, steers clear of those concerns.

2 Joint Intentionality and Collective Intentionality

A Natural History of Human Thinking builds on the research that Tomasello and others have been conducting over the last decade or so, and argues that two forms of shared intentionality are necessary to account for both the ontogeny and the phylogeny of human thinking (p. 31). 4 Those two forms are the following:

  1. Joint intentionality. This involves small-scale collaboration and joint attention, and it is at play in short-lived, spatially and temporally located second-personal relations between particular individuals (p. 48). Some understanding of different perspectives arises as the individuals relate to each other and become aware that they are both attending to the same thing (p. 45). Individuals may come across a different perspective that they can contrast with their own. But even if there are several perspectives in play at this stage, this does not yet amount to real objectivity in the sense of universal validity.

  2. Collective intentionality. This involves large-scale forms of collaboration that go beyond the here and now and entail the construction of a common cultural ground as well as the creation of conventions and institutions (pp. 5–6). At this point individuals become group-directed or group-minded, which allows for a new level of generality. Individuals can now assume a view from nowhere, a “group’s agent-neutral point of view” (p. 5; see also p. 122).

Some remarks on the nature of this distinction are in order. First, Tomasello focuses on intentional action, as most of the literature on collective intentionality does (Bratman 1992; Knoblich et al. 2011; Butterfill 2012). Accordingly, Tomasello claims that the first form of cooperative activity – the one characteristic of joint intentionality – can be captured as a kind of shared intentional action. More specifically, he claims that this primitive form can be captured by Bratman’s analysis of joint action (p. 38 ff.). In doing so, he distinguishes three conditions that must be fulfilled in order to engage in joint intentional action of that kind:

If you and I are agents, and J is a goal, then:

  1. I must have the goal of doing J together with you,

  2. you must have the goal of doing J together with me, and

  3. we must have common knowledge, or common ground, that we both know each other’s goal (p. 38).

Condition (3) involves meta-representations of other people’s mental states, which in turn involve higher-order attitudes: I know that you know, and you know that I know, and you know that I know that you know, etc. While acknowledging that, in the case of humans, common ground might replace the need for this recursive mindreading, since the knowledge that we both want to J might suffice, Tomasello claims that in situations of conflict we fall back on recursive reasoning, thus showing that there is indeed “an underlying recursive structure” (p. 38; see also Tomasello 2008). Accordingly, and expanding on the cognitive structure that allows for engagement in joint action, he underlines the need for recursive inferences, simulations, ongoing self-monitoring and meta-representations (pp. 5, 9, 143). Understanding the other’s perspective requires, so he claims, simulating my partner’s abductive inferences ahead of time in order to anticipate how I might be understood by him (p. 94). 5 In sum, Tomasello grants an essential role to social cognition as part and parcel of joint intentionality and endorses a model that integrates elements from both simulation theory (ST) and the theory-theory of mind (TT).

Second, Tomasello links the distinction between joint and collective intentionality to issues of validity and objectivity. As noted, in the case of joint intentionality, when two people are aware that they are both attending to the same thing simultaneously, an understanding of different perspectives might arise. The individual may discover a different perspective that she can contrast with her own. However, although several perspectives can already be in play at this stage – when dealing with small-scale I-Thou relations – we are still far from real objectivity in the sense of universal validity. A significant further step takes place the moment individuals become group-minded. This allows for scalability through group identification: we-groups make it possible to integrate ancestors and descendants, thus furnishing large groups with history and traditions. In natural history, this process of group identification allows for a new level of generality. It is not simply a question of adding more perspectives to the two already in play, but of moving to the level of any perspective whatsoever, i.e. to a view from everywhere (and nowhere) (p. 122). Tomasello introduces a corresponding distinction between second-personal and group-level normative pressure. In the former case, I am sensitive to how another individual evaluates me and might regulate my actions accordingly. In the latter case, a more objective or agent-neutral standard is in play. Even if a particular individual enforces the group’s norm, she does so, so to speak, as an emissary of the group as a whole (p. 88).

Finally, Tomasello rejects the idea that language is the foundation of human cognition and claims that language can only come into the picture once an already existing social infrastructure is at work (p. 128).

A Natural History of Human Thinking must thus be credited with enriching the field by distinguishing two forms of social intentionality and doing so in terms of a two-step evolutionary story of the emergence of objective content. In this respect, Tomasello goes beyond many current approaches to collective intentionality that are committed to a uniformity thesis, i.e. to the idea that there is a single notion of intentionality that can capture any form of sharedness. 6 Prima facie, this seems to provide what is needed to overcome the essential tension and to pave the way for a successful social constructivist account of objective content. Nevertheless, the specifics of this strategy might raise doubts about its prospects of succeeding in this task. I turn to these doubts in the following section.

3 Joint Intentionality as a Primitive Kind of Social Intentionality

The introduction of joint intentionality seemed to avoid the need for objective representations, thus overcoming the essential tension at stake in many social accounts of objective content by acknowledging a form of intentionality that could underlie sociality without presupposing the mastery of socio-cultural practices. But the problem resurfaces once it is assumed that, in order to engage in joint intentionality, one must already entertain meta-representations of other people’s mental states (those required by Bratman’s analysis of jointness), be able to take perspectives on external states of affairs, and be able to assess those perspectives against each other. The question then arises as to how one could have states that suffice to portray someone else’s perspective on the same situation while at the same time falling short of what joint intentionality engagements are supposed to make possible, namely, thinking with objective content. 7 This raises the question of whether a non-objective kind of thinking that encompasses perspective differences is possible at all, and what the notion of content associated with it would amount to. 8

To be fair, Tomasello has extensively defended the possibility of pre-objective thought by describing a set of proto-thinking capacities in chimpanzees (Tomasello 1999, ch. 2). But even granting that we could make sense of chimps’ proto-thinking capacities in terms of the specific abilities they have, that kind of content cannot be equated with the sort associated with meta-representations – including Bratman’s recursive kind – and simulation, which is the kind of content at stake in explaining the possibility of joint intentionality. By Tomasello’s own lights, this latter kind is more complex, since it involves second- and third-order attitudes that chimps are unable to entertain. In order to account for the intermediate content needed to make sense of joint intentionality, he hypothesizes a set of cognitive mechanisms such as perspectival representations and social self-monitoring (Tomasello 2014, ch. 3). But even if those mechanisms could do the required job, the problem is whether the representation of perspective differences does not already presuppose an objective content to which both perspectives simultaneously refer. Furthermore, one might wonder whether a social account of content and thinking can really use TT, ST and Bratman’s analysis of joint action to understand the sort of sharing or jointness that underlies the very possibility of thinking. These tools are tailored to explain social cognition and joint action, respectively, but not to account for the emergence of objective content. Thus, it is not clear that ST, TT and Bratman’s account of joint action do not already presuppose the very notion of content that, according to a social account of objective content, developed from the use of these mechanisms. If this were so, it would amount to restating the essential tension under investigation here: presupposing the kind of content we need to explain.

As for collective intentionality, a related worry arises. The ability to adopt a we-perspective seems to fall short of the ability to entertain an objective point of view. The fact that I am able to adopt a we-perspective, i.e. the perspective of my group, does not by itself entail that I am adopting a view from nowhere. Tomasello claims that a member of the group can speak as a representative of the whole, and this seems fair; but even if a group’s point of view might suffice to distinguish in-group from out-group perspectives, it is not clear that this would imply the ability to adopt an objective point of view, independent of any particular perspective. Conversely, if the opposition between the in-group and the out-group perspective is thought to be enough to imply that members of the group can entertain a universal point of view that allows them to compare such perspectives, why should the same not hold for the contrast between two perspectives in the I-Thou case? 9

In sum, Tomasello (1) provides an analysis of joint intentionality exclusively in terms of joint action that assumes Bratman’s view of jointness and complements it with a model that integrates ST and TT accounts of social cognition, and (2) gives an account of objective thinking based on forms of group thinking that progressively amount to a universal group or a perspective from nowhere. Both (1) and (2) seem ill-equipped to give a naturalistic account of objective content, either falling short of what is required or presupposing what needs explaining. Thus, Tomasello’s two-step strategy, though promising, falls short of providing a naturalistic account of objective content. To close, I would like to suggest that other forms of shared intentionality might provide an alternative understanding of we-intentionality fit for successfully accounting for objective content. As noted above, Tomasello sees the evolution of culture as the milestone of the human species. But what does it take to belong to a culture?

4 A Primitive Form of Collective Intentionality

I would like to briefly suggest a different way of understanding the intermediate step between chimps’ proto-propositional attitudes and full-fledged human intentionality. I take this to be sympathetic to Tomasello’s strategy while avoiding some of the problems it raises. The proposed strategy would consist in characterizing the form of intentionality that precedes and grounds what Tomasello calls ‘collective intentionality’ as a more primitive but already collective form of intentionality. This form would not include the demanding apparatus that Tomasello, following Bratman, proposes in order to explain jointness, but would nevertheless be a form of social intentionality that can be thought to underlie and support the development of the full-fledged kind that already involves conventions, language, institutions and objective content.

Can there be a more primitive form of collective intentionality? We could hypothesize a more primitive kind of group behavior that is not preceded by joint intentionality but rather integrated with more primitive forms of joint action. This sort of group behavior would amount to special forms of group thinking, acting and feeling that are displayed in jointly acting and that do not require singular representations of others’ mental states but that spring from common background rules, norms and habits.

What would the details of such an account look like? This question is particularly pressing, since whether this view can avoid the essential tension that threatens social accounts of the emergence of content depends on such details. Three elements would be part and parcel of the primitive form of collective intentionality outlined here: (i) social conformism; (ii) forms of coordination that enable individuals to follow others and conform to group behavior; and (iii) social cognition that allows one to identify others’ approval or disapproval.

This strategy is inspired by Haugeland (1990), who gives one of the most developed expressions of neo-Pragmatism. He posits a mechanism for social conformism and compliance – a mechanism that oils the wheels of the kind of social engagement that makes objective thought possible:

when community members behave normally, how they behave is in general directly accountable to what is normal in their community; their dispositions have been inculcated and shaped according to those norms, and their behavior continues to be monitored for compliance. (Haugeland 1990, p. 406) 10

These mechanisms of social conformity get the practice of learning and teaching off the ground and do not require individuals to purposefully comply with rules from the get-go. 11 Instead of representing rules and others’ assessments of their actions, individuals would be sensitive to others’ assessments through emotional tuning to others’ approval and disapproval. 12 A second key element in this story is the individuals’ capacity for coordination, which enables them to follow others and conform to group behavior. 13 Finally, social cognition might play a key role in this picture as well, for example in explaining tuning to someone else’s approval and disapproval reactions. But instead of having recourse to TT, ST or a combination of the two, this strategy, following the phenomenological tradition, would align itself with the idea that at least some mental states are directly observable (see Merleau-Ponty 1964; Zahavi 2014). In the same vein, some studies in developmental psychology have provided evidence that face-to-face encounters involve direct perceptual understanding of others and intentionally directed responses to them that do not implicate inferences or simulations (Reddy et al. 2013). This approach has some interesting advantages over TT and ST models. It does not presuppose that a cognitive gap needs to be bridged, by either simulations or theoretical inferences, for social understanding in its basic forms to emerge, and it allows one to explain very early forms of social interaction in babies as young as two months of age (Rochat and Striano 1999; Reddy 2008). If children are immersed in these social practices early in development, it might be that the ability to adapt and integrate within groups and to adjust to the behavior of others in the group, including in one-to-one interactions, is prior to the kind of joint intentionality that depends on representing and simulating the mental states of others.

As to the intermediate evolutionary step, Tomasello stresses that individuals at this point only gathered on special occasions and with particular others. This might pose a challenge to the alternative view since, if this were the case, no groups, and thus no group norms, would be in place to be followed and learned. However, the primitive form of collective intentionality outlined here is not meant to depend on already stable groups but to contribute progressively to their development. We might think that a few individuals behaving alike under the same circumstances and with similar needs might suffice to get the practice off the ground.

Regarding the development of objective forms of thinking, one may hold that the capacity to act and think within groups expands with time, training and exposure to foreign groups, beginning with perspective blindness and progressively moving to more explicit, universal and objective forms of thinking that involve the capacity to explicitly represent foreign perspectives and ultimately a perspective from nowhere. If this were so, there would be no need to represent I-Thou perspectives from the outset; on the contrary, this capacity would co-develop with the capacity to represent objective facts, i.e. to adopt a point of view from nowhere. Essential to this second step would be the development of language, within which objective content would have its proper place. If this is a sound alternative, it might well be that group identification and membership underlie and support the development of the capacity to represent other persons’ perspectives (including joint intentionality as described by Tomasello and Bratman), and not the other way around.

This is, of course, only a rough sketch of an alternative; its aim is to lay out a possible and appealing path to be developed further. Its main attraction is that, if substantially developed, it might amount to an account of social conformism that dispenses with representations of others’ mental states and perspectives from the get-go. Introducing a primitive form of collective intentionality into the picture may then furnish the right tools to pursue a social account of the emergence of objective content.


  1. See Haugeland 1990. For a much more detailed treatment of the problem of content see Hutto and Satne 2015.

  2. Tomasello gives reasons for believing “that the amazing suite of cognitive skills and products displayed by modern humans is the result of some sort of species-unique mode or modes of cultural transmission. The evidence [for this] […] is overwhelming” (Tomasello 1999, p. 4).

  3. See for example Fodor and Pylyshyn 2015, p. 14. Their particular version of the challenge hangs on the claim – characteristic of some neo-Pragmatist theories – that objective content depends on language. As I explain below, the problem has a more general scope.

  4. According to Tomasello, we should in fact distinguish three steps in the evolution of human thinking: (i) individual intentionality, characteristic of great apes’ capacity for thinking; (ii) joint intentionality, characteristic of the thinking of early humans (a hypothesized intermediate species); and (iii) collective intentionality, characteristic of modern human thinking.

  5. More specifically, Tomasello lists the following three key components: cognitive representations (perspectival and symbolic), inferences (socially recursive) and self-monitoring (regulating one’s perspective from the perspective of a cooperative partner) (p. 33).

  6. Cf. Zahavi and Satne 2015, where we coin the term and defend a pluralist view against the uniformity thesis. For empirical evidence supporting a pluralist view see also Salice and Henriksen 2015.

  7. At the core of Davidson’s view, for example, is the idea that these two capacities amount to the same one (e.g. Davidson 1992).

  8. Echoing Davidson’s worry: “But words, like thoughts, have a familiar meaning […] only if they occur in a rich context, […which] is required […for them to have] a meaningful function”. This rich context involves full-fledged propositional attitudes (Davidson 1997, p. 127).

  9. Both Davidson (1992) and Brandom (1994, ch. 8, Section VI) claim that these two cases are alike, and that both imply that an objective point of view is already implicit.

  10. Tomasello himself endorses the idea that social conformism is an important element in the development of objective thinking, though he posits it later, alongside what he calls ‘Collective Intentionality’. There is an ongoing debate (Sterelny 2012; Pacherie 2013, 2015) about whether a single mechanism – for social and cultural transmission – or a suite of different ones, boosted by feedback loops, lies at the heart of such developments. Whatever the outcome of this debate, no one doubts that a capacity for social conformism will form at least part of the best explanation of how human cognition did, and does, come into being.

  11. Teaching and learning norm-abiding behavior are to be understood as biologically inherited human capacities (see also Tomasello 1999; Rakoczy et al. 2009; Csibra and Gergely 2009).

  12. I have developed this aspect of the proposal in more detail in Satne 2014.

  13. See Pacherie 2013, 2015. This would involve an innate tendency to coordinate with others one’s bodily movements and, progressively, more sophisticated activities (Knoblich et al. 2011).

