An ERP study of anaphor resolution with focused and non-focused antecedents

Abstract The goal of this study is to better understand when (and why) the combination of semantic overlap between antecedent and anaphor and antecedent focus leads to difficulty in anaphor processing. To investigate these questions, three ERP experiments manipulating semantic overlap and focus compared the ERPs from the onset of the anaphor as well as from the onset of the last word in the sentence containing the anaphor. Our results suggest that although the focus status of an antecedent and the semantic overlap between the antecedent and anaphor are important, these factors are not the only significant contributors to online anaphor resolution. Factors such as readers’ expectations about thematic shifts also influence the processing. We consider our results in relation to two accounts of anaphor resolution, the Informational Load Hypothesis (Almor, 1999; Almor & Eimas, 2008) and JANUS (Garnham & Cowles, 2008).


Introduction
Anaphor resolution is sensitive to the semantic overlap of the antecedent (with the anaphor), and the focus of the antecedent (e.g., Almor, 1999; Cowles & Garnham, 2005; Duffy & Rayner, 1990; Myers, Cook, Kambe, Mason, & O'Brien, 2000; Rayner, Kambe, & Duffy, 2000). By focus we mean discourse focus, or what is at the center of attention in the discourse. Discourse focus does not necessarily correspond to linguistic focus, although in the case of clefts, which we use in our studies, they coincide. Using event-related potentials (ERPs), the current study systematically manipulates semantic overlap and discourse focus to examine their influence on anaphor resolution. Specifically, we aim to better understand when (and, as a secondary goal, why) the combination of semantic overlap and focus (or lack of focus) leads to ease in processing. We hope that a better understanding of the underlying processes that use these two kinds of information can be used to assess and modify existing theories. Following a review of the relevant literature, we briefly discuss two models of anaphor resolution and suggest how an ERP study can significantly extend current understanding of effects of semantic overlap and focus on anaphor resolution.

Semantic overlap and typicality effects
In the present context, we are considering the semantic overlap between two nouns that are hierarchically related. The higher a term is in a hierarchy, the less specific semantic content it has, so a less specific term, higher in the hierarchy, will have less semantic content than a more specific term lower in the hierarchy. Semantic overlap refers to the semantic content in common between the two nouns. In relation to the current studies, a term two levels down in the hierarchy from a less specific term will share a smaller proportion of its content with the less specific term than a term only one level down. In relation to previous studies, discussed below, an atypical exemplar of a category, one level down from the category term, will share less of its content with the category noun than a more typical exemplar. Note that this notion of semantic overlap is developed primarily with content-bearing nouns in mind, not pronouns.
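As a toy illustration of this notion of overlap, the sketch below treats each noun as a set of semantic features, with each more specific term inheriting all the features of the term above it. The feature lists are invented purely for illustration; they are not taken from the materials of the present study or from any norming data, and a set-based representation is only one of several possible ways to model semantic content.

```python
# Toy model: each noun is a set of semantic features. The features below are
# hypothetical, chosen only so that reptile > snake > cobra forms a nested hierarchy.
REPTILE = {"animate", "cold-blooded", "scaly"}
SNAKE = REPTILE | {"limbless", "elongated"}
COBRA = SNAKE | {"venomous", "hooded"}


def shared_proportion(specific: set, general: set) -> float:
    """Proportion of the specific term's content that it shares with the general term."""
    return len(specific & general) / len(specific)


near = shared_proportion(SNAKE, REPTILE)  # one level down: 3 of 5 features shared
far = shared_proportion(COBRA, REPTILE)   # two levels down: 3 of 7 features shared
print(f"snake-reptile: {near:.2f}, cobra-reptile: {far:.2f}")
```

Under these assumed features, the term two levels down (cobra) shares a smaller proportion of its content with reptile than the term one level down (snake).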
With regard to semantic overlap, reading time and eye-tracking experiments have found that typical antecedents (robin) lead to faster processing than atypical antecedents (goose) when a category anaphor is used (bird; Duffy & Rayner, 1990; Garrod & Sanford, 1977; Rayner et al., 2000). Garrod and Sanford (1977) suggested that this effect arises because semantic overlap is greater between the typical antecedent (robin) and the category anaphor (bird) than between the atypical antecedent and the anaphor. Essentially, more overlapping semantic features between anaphor and antecedent lead to faster anaphor resolution.

Inverse typicality effects
In contrast to these findings, Almor (1999) demonstrated that under some circumstances the atypical antecedent leads to faster anaphor resolution. This finding challenged the original proposal from Garrod and Sanford (1977) that the typicality effect occurs because it is easier to identify an antecedent when it has a high semantic overlap with the anaphor. Almor (1999) argued that the effect of typicality depends on the focus of the antecedent. On this view, previous results suggested that typical antecedents are processed faster than atypical antecedents because focus was not systematically manipulated. Almor (1999) demonstrated that when antecedents are in syntactically clefted position and hence strongly focused (robin in 1a and goose in 1b), it is the atypical antecedent (goose) that results in faster reading times at the anaphor (bird), rather than the typical antecedent (robin).
1a. What the woman chased off was the robin. The bird had shown way too much interest in her pie.
1b. What the woman chased off was the goose. The bird had shown way too much interest in her pie.
This inverse typicality effect is not specific to cleft sentences. Using a self-paced reading paradigm, Cowles and Garnham (2005) demonstrated that syntactic prominence was the key factor in this inverse typicality effect, rather than the clefting of antecedents. Regardless of whether the antecedent was clefted or otherwise syntactically prominent, the inverse typicality effect was seen. Looking in more detail at the time course of processing, using eye-tracking, van Gompel, Liversedge, and Pearson (2004) found some evidence for the inverse typicality effect, but only in regression-path times in the final region of the anaphor-containing sentence (for example, demand in It was the coal that was ordered in great quantities before the winter. Obviously, the fuel was in great demand). Interestingly, first-pass reading times for the region after the anaphor (was in great) showed the opposite pattern: a standard typicality effect. These results suggest that there are at least two different types of processes in anaphor resolution, with different time courses, a point we will return to shortly. In addition to these findings, a similar effect is seen with variations in conceptual distance in a category hierarchy (Cowles & Garnham, 2005). For example, the anaphor vehicle is read more quickly when its (focused) antecedent is conceptually more distant (e.g., hatchback) than when the antecedent is conceptually closer (e.g., car). More generally, the typicality/inverse typicality effect can be seen as one example of a semantic distance effect, with the hierarchy effect as another example. In other words, it is not just the interaction between typicality and focus that affects anaphor resolution. Instead, and more generally, it is the semantic distance between the anaphor and antecedent in combination with the focus of the antecedent.
Furthermore, as demonstrated in the work of van Gompel, Liversedge, and Pearson (2004), there are questions about when these effects occur, which we will pursue in this study.

Accounts of the semantic overlap effect
Although our primary aim is to provide further evidence about the interaction of semantic distance and focus in anaphor resolution, and its time course, in this section we will briefly consider two accounts of the semantic overlap (or inverse typicality) effect that our findings may bear on. One account proposes that the difficulty of processing the anaphor after a typical antecedent arises because of the higher overlap of semantic information in working memory (the Informational Load Hypothesis, Almor, 1999; Almor & Eimas, 2008), an overlap that is not justified by pragmatic considerations, such as antecedent identification. According to the Informational Load Hypothesis (ILH), distinct representations of the anaphor and of the antecedent need to be maintained, at least temporarily, in working memory, which is difficult when the antecedent and anaphor are semantically similar, and may be particularly difficult if either representation is highly activated (e.g., because the antecedent is focused, Almor & Eimas, 2008). Almor (1999) characterises interference effects in working memory as determined by conceptual distance. Almor and Eimas (2008) specifically suggest a further role for activation, related to the focus status of the antecedent. This view is not incompatible with the original 1999 presentation by Almor, but it is not part of it.
Pertinent to our hypotheses, this account also suggests that semantic overlap can have different effects at different points in processing (Almor & Eimas, 2008). Again, this idea is not part of Almor's original (1999) presentation, but it is suggested by the empirical findings of Almor and Eimas (2008), and those of van Gompel, Liversedge, and Pearson (2004). Specifically, the revised model allows for the possibility that greater semantic overlap may facilitate initial processing of an anaphor. For example, semantic overlap might help in initially identifying a word (e.g., robin helps identify bird more than ostrich helps identify bird). However, at a later point in processing, when the anaphoric relation is being established, greater semantic overlap may lead to semantic interference in working memory, particularly if the antecedent is salient (see the results of van Gompel, et al., 2004 discussed above). When the antecedent is not in focus, semantic interference in working memory may be reduced (Almor & Eimas, 2008), because in this case semantic overlap performs a role in securing the antecedent-anaphor relation, thus leading to faster reading times and typicality effects. Inverse typicality arises, according to this approach, when the potential benefit of overlap for identification is not necessary because the antecedent is focused and hence the default antecedent, and is not justified by any other discourse function. Crucially, this model appears to make the following assumptions: (1) the focused antecedent is the default antecedent, even for fuller forms of reference, (2) the speed of processing of anaphors is dependent on the speed of identifying the referent and the degree of interference in semantic working memory, and (3) any additional overlap between antecedent and anaphor that does not aid in identifying the antecedent will incur a processing cost.
An alternative account of the focus/semantic overlap effect comes from the JANUS model of coreferential NP-anaphora interpretation (Garnham & Cowles, 2008). According to this model, the content of an anaphoric expression has two types of function. First, there are functions related to the text preceding the anaphor, the so-called looking back functions. The primary purpose of looking back functions is to identify the appropriate antecedent from a set of possible antecedents. The second type of function relates to the discourse structuring function of coreferring expressions. This looking forward function ensures that the referring expression is appropriate to what is later said about its referent (see Garnham & Cowles, 2008, for a more detailed discussion, also Schumacher, Backhaus, & Dangl, 2015). JANUS suggests that the inverse typicality effect is not explained by the difficulty of maintaining in working memory an antecedent and anaphor that have a large semantic overlap (Cowles & Garnham, 2005; Garnham & Cowles, 2008). Instead, it emphasizes the fact that there is often more than one possible antecedent in a given discourse. When an antecedent is in focus, it is likely to be referred to again, and the presence of other potential antecedents is less relevant. Focused antecedents become the defaults for anaphoric references, and reduced anaphoric forms, such as pronouns, can usually be used. According to JANUS, an over-specific anaphor, one that has a large semantic overlap with a focused antecedent, results in difficulty not because of working memory processes, as ILH assumes, but because the anaphor has additional content that is apparently not needed (compare Grice's, 1975, Maxim of Quantity), and the language processor cannot figure out what to do with that information. A general, but not completely compelling, argument for why over-specificity is inappropriate is that the language processor should work efficiently, and not do more work than is required.
The additional information will pose problems for comprehenders when they integrate the anaphor with the discourse representation, because the referential form will be inappropriate, given the status of the antecedent. In indirect support of this view, Cowles, Garnham, and Simner (2010) have shown that there is no evidence for effects of semantic overlap in working memory of the kind assumed in the ILH.
In relation to the secondary aim of the current study, the key difference between the ILH and JANUS is the locus of the difficulty in processing anaphors referring to focused antecedents when they have a high semantic overlap. Where ILH assumes the difficulty is due to semantic interference in working memory, JANUS assumes the difficulty is due to semantic integration. This leads to different predictions for any measure of language processing that can index working memory and semantic integration, such as ERPs. ERP data may, therefore, shed light on whether increased processing time and difficulty arise from increased working memory burden, as the ILH predicts, or from increased semantic integration difficulty, as JANUS predicts. In relation to the primary aim of the study, questions arise, whichever theoretical framework one adheres to, of when effects of semantic overlap and focusing manifest themselves, and again ERP studies can shed light on this issue, because of the high temporal resolution they provide.

ERPs as a tool to investigate anaphor resolution
The current study uses event-related potentials (ERPs) to explore the online processing of anaphors when focus and semantic distance are manipulated. ERPs are a good technique to investigate this issue for several reasons. First, ERPs can be used to explore processing at the anaphor as well as at the end of the sentence (or any other point) without requiring the participants to carry out secondary tasks that may change normal processing (see Kutas, Van Petten, & Kluender, 2006), and they provide a fine-grained temporal analysis of when such effects occur. Tapping into the early and later processing relative to anaphor resolution is particularly useful since Almor and Eimas (2008) suggest processing differences depending upon the point at which processing is measured. In addition, van Gompel et al.'s (2004) results also indicate that there are at least two different types of processes in anaphor resolution. Second, they provide a way to distinguish between increased working memory load and semantic integration difficulties, which other measures do not. Previous studies of anaphor resolution using ERPs (e.g., Schumacher et al., 2015; Swaab, Camblin, & Gordon, 2004; Van Berkum et al., 2003; Xu, 2015) have not directly addressed the issues investigated in this paper. However, Almor et al. (2017) interpret (or re-interpret) the results of these studies, and of their own, as suggesting that N400 effects in anaphor processing reflect predictability of reference, and not appropriateness of referential form, which they consider might be reflected in later processing. This idea is consistent with the version of the ILH presented in Almor and Eimas (2008), and is likely to be relevant to the interpretation of our own results. We will, however, continue to look for effects at the anaphor itself as previous studies, including those mentioned above, have found ERP effects at this point.

The current study
Using ERPs, the current study explores the influence of semantic overlap and discourse focus on anaphor resolution.
Based on previous findings, there are at least two points in the processing where ERP differences may be seen: at the anaphor and downstream from the anaphor. Both the ILH and JANUS predict that anaphors with a high degree of overlap with focused antecedents should be more difficult than those with either non-focused antecedents or antecedents that are not as closely related. Crucially, what we hope to discover in the current study is why these focused, semantically overlapping antecedents are more difficult. Furthermore, we hope to learn more about when the difficulty arises, since the results of van Gompel et al. (2004) and Almor and Eimas (2008) suggest that the difficulty only manifests itself downstream from the anaphor. The explanation that each of the models gives of why focused, semantically overlapping antecedents are problematic provides hints about which ERP components to study.
The ILH's assumption that working memory is the locus of the difficulty for focused, semantically overlapping antecedents suggests a possible modulation of the left anterior negativity (LAN), which is an ERP component associated with increased working memory burden (e.g., Kluender & Kutas, 1993).
JANUS claims that the difficulty of focused, semantically overlapping anaphors is that they disrupt semantic integration. In general, difficulties with semantic integration lead to a greater N400 amplitude, so although there are also other influences on the N400 (see Kutas & Federmeier, 2011, for an overview), JANUS suggests looking for N400 effects.
It is important to note that modulations of the LAN and the N400 are not the only ERP effects that we may see. Other ERP effects related to difficulty in processing do not clearly map onto one model or the other, but would nonetheless be informative about the locus of the difficulty of semantically overlapping anaphors with focused or defocused antecedents. For example, a late positivity (P3b-like) could indicate the need for reanalysis or other problems with discourse integration (e.g., Bornkessel, Schlesewsky & Friederici, 2003), or early effects in the N1-P2 complex could be related to resource allocation (e.g., Hillyard, 1985; Mangun & Hillyard, 1991).
Thus far, our ERP predictions might be interpreted as specifying what we may expect to see at the anaphor. However, the results of van Gompel et al. (2004) and Almor and Eimas (2008), and the analysis of Almor et al. (2017), suggest that some of the difficulties in anaphor resolution manifest themselves downstream from the anaphor, and these difficulties may be reflected in the downstream ERPs. Looking at the ERPs past the anaphor we may see a shift in slow potentials throughout the target sentence. Slow potentials reflect ease of processing or integration, and have been shown to be sensitive to syntactic properties of sentences, the use of working memory in syntactic processing, and thematic role assignment (see King & Kutas, 1995). We should, however, note that looking for ERPs indexed to words downstream from the anaphor potentially introduces issues of interpretation, as they are most naturally thought of as indicating processes (e.g. of integration) associated with the word to which they are indexed.
In summary, across three ERP experiments, we systematically manipulated semantic overlap and discourse focus to examine their influence on anaphor resolution. We aim to better understand when and why the combination of semantic overlap and focus leads to ease in processing. As there are suggestions that difficulty in anaphor resolution may occur downstream from the anaphor itself (van Gompel et al., 2004; Almor & Eimas, 2008), the use of ERPs will add unique information about this process, since we can look at multiple time points during anaphor resolution as well as linking ERP effects to specific processes (e.g., N400 effect to semantic integration).

Materials and Design of Experiments 1-3
Across the three experiments we used three-level noun hierarchies as in Cowles and Garnham (2005, e.g. reptile-snake-cobra), so that we could vary antecedent focus and semantic distance between antecedent and anaphor orthogonally. Ideally, we would have liked to test all four conditions in one experiment. However, given the number of stimulus items needed per participant per condition for an ERP study and the limited number of suitable hierarchies from which to construct materials (see Pretests below), each experiment includes only two of the conditions of the four defined by our manipulations. Additionally, we felt the experiment would become too long with all four conditions. The additional materials would have added an extra 50-60 minutes to the experiment, which was already about 50 minutes long.
For these three experiments, 80 exemplar-category triples from three-level hierarchies were selected (see Pretests for details) to use in four different lead-in sentences, and a single anaphor-containing sentence, to examine the effects of Conceptual Distance and Antecedent Focus.
Conceptual Distance was defined in terms of the three-level semantic hierarchies. For example, for the category reptile a conceptually near category member is snake (one level away) and a conceptually distant category member is cobra (two levels away). For each set of materials (e.g. tuple: reptile, snake, cobra), one target sentence (1e) containing the anaphor (underlined) followed the different versions of the lead-in sentence (1a to 1d).
1a. Defocused, near: It was the mongoose that stood up to the snake.
1b. Defocused, far: It was the mongoose that stood up to the cobra.
1c. Focused, near: What the mongoose stood up to was the snake.
1d. Focused, far: What the mongoose stood up to was the cobra.

1e. Target Sentence
The reptile hissed and got ready to strike.
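The crossing of Antecedent Focus with Conceptual Distance for this example item can be sketched as follows. This is a minimal illustration for the single mongoose item only: the frame strings are copied from 1a to 1d, but the names `FRAMES`, `ANTECEDENTS`, `conceptual_distance`, and `build_conditions` are hypothetical and do not describe how the actual stimulus lists were constructed.

```python
from itertools import product

# Three-level hierarchy for the example item, ordered general > middle > specific.
HIERARCHY = ["reptile", "snake", "cobra"]
ANAPHOR = HIERARCHY[0]

# Lead-in frames for this one item only; other items would need their own frames.
FRAMES = {
    "Defocused": "It was the mongoose that stood up to the {antecedent}.",
    "Focused": "What the mongoose stood up to was the {antecedent}.",
}
ANTECEDENTS = {"Near": "snake", "Far": "cobra"}
TARGET = "The reptile hissed and got ready to strike."


def conceptual_distance(antecedent: str, anaphor: str = ANAPHOR) -> int:
    """Number of hierarchy levels separating the antecedent from the anaphor."""
    return abs(HIERARCHY.index(antecedent) - HIERARCHY.index(anaphor))


def build_conditions() -> dict:
    """Cross Antecedent Focus with Conceptual Distance for the example item."""
    conditions = {}
    for focus, distance in product(FRAMES, ANTECEDENTS):
        lead_in = FRAMES[focus].format(antecedent=ANTECEDENTS[distance])
        conditions[f"{focus}-{distance}"] = (lead_in, TARGET)
    return conditions


for name, (lead_in, target) in build_conditions().items():
    print(f"{name}: {lead_in} {target}")
```

The target sentence is identical in all four cells, so any difference at the anaphor can only come from the lead-in sentence.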
Experiment 1 tests the effect of antecedent focus with a conceptually near antecedent (only) by comparing the processing of the anaphor in the Target Sentence following 1a compared to 1c. Experiments 2 and 3 are introduced in detail below. To preview, Experiment 2 explores the effect of Conceptual Distance between the anaphor and antecedent when the antecedent is in focus, and Experiment 3 compares defocused antecedents that vary in their Conceptual Distance from the anaphor. Three pretests were performed on potential hierarchies. The purpose of these pretests was twofold: First, as in previous research (Cowles & Garnham, 2005), two norming studies assessed whether participants' intuitions matched our own about the hierarchical relationship for each triple. Second, we determined the cloze probability of the anaphor in the Target Sentence. The purpose of the cloze test was to exclude items that had a high cloze probability for a word other than the anaphor. If another word has a high cloze probability, the anaphor will produce an enhanced N400, simply because another word is highly expected in that context.

Pretests
Three pretests were conducted on 175 sets of items to choose the final 80 for the ERP experiments. The first pretest was to establish the validity of the semantic hierarchies. Sixty-eight participants (51 female, mean age 22.7) completed short sentences such as: A(n) X is a type of ______. An individual participant was either given the most specific word in the (putative) hierarchy (e.g., X = cobra) and we looked at how often they produced snake (the middle word) or they were given the middle word (e.g., X = snake) and we looked at how often they produced the most general category name reptile.
Based on this pretest we established the following cut-off: if fewer than 20% of the participants gave the highest category (reptile), given snake, the item was excluded. If fewer than 25% of the participants gave the middle category (snake) given cobra, the item was excluded. This criterion is similar to the one used in previous research using three-level hierarchies (see Cowles & Garnham, 2005).
The final results for the included items were as follows: when the most specific word was included in the sentence (e.g., A cobra is a type of _____), 71.18% of the responses were the middle category word (e.g., snake; SD = 16.12, Range: 25-92%). When the middle word was included in the sentence (e.g., A snake is a type of ______), 60.07% of the responses were the most general category word (e.g., reptile; SD = 19.82, Range: 22-100%).
The second pretest was a rating test intended to provide further evidence that participants shared our intuitions about the semantic hierarchies. Thirty-four participants (26 female, mean age 20.8) read short sentences of the form: An X is a more specific type of Y than a Z. They were asked to rate their agreement on a 7-point scale (1 = fully disagree, 2 = disagree, 3 = somewhat disagree, 4 = neither agree nor disagree, 5 = somewhat agree, 6 = agree, 7 = fully agree). For example, for the triple reptile, snake, cobra, they would have read A cobra is a more specific type of reptile than a snake. Any item that was rated below 5 on this pretest was excluded. The mean rating for the items included was 5.98 (SD = 0.32, Range: 5.3-6.5).
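The exclusion criteria from the first two pretests can be expressed as a simple filter. The sketch below is illustrative only: the class and field names are hypothetical, the candidate values are made up and are not the study's norming data, and the thresholds are treated as inclusive lower bounds because the text excludes items that fall below them.

```python
from dataclasses import dataclass


@dataclass
class HierarchyNorms:
    """Pretest results for one candidate triple (field names are hypothetical)."""
    triple: tuple        # (general, middle, specific)
    pct_middle: float    # % of responses giving the middle word, given the specific word
    pct_general: float   # % of responses giving the general word, given the middle word
    mean_rating: float   # mean agreement on the 7-point scale


def passes_hierarchy_pretests(item: HierarchyNorms) -> bool:
    """Keep an item only if it is not below any of the three cut-offs in the text."""
    return (
        item.pct_middle >= 25.0
        and item.pct_general >= 20.0
        and item.mean_rating >= 5.0
    )


# Made-up values for three placeholder items, used only to exercise the filter.
candidates = [
    HierarchyNorms(("general_a", "middle_a", "specific_a"), 70.0, 60.0, 6.0),
    HierarchyNorms(("general_b", "middle_b", "specific_b"), 24.0, 60.0, 6.0),  # below 25%
    HierarchyNorms(("general_c", "middle_c", "specific_c"), 70.0, 60.0, 4.8),  # rated below 5
]
kept = [c.triple[0] for c in candidates if passes_hierarchy_pretests(c)]
print(kept)
```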
The last pretest was a cloze test. Cloze probability is inversely related to the magnitude of the N400 effect (see Kutas & Federmeier, 2011). Since the target word (e.g., reptile) always occurred as the second word of the critical sentence, we did not expect that the target word would have a high cloze probability. However, we wanted to ensure that there was not a high (or higher) cloze probability for a word other than the anaphor. If a word other than the anaphor had a high cloze probability, the presentation of the anaphor would result in an enhanced N400, simply because another word is highly expected in that context. Seventy participants (51 female, mean age 21.4) saw fragments such as the following and were asked to write the next word in the sentence: What the mongoose stood up to was the cobra. The ____________. Each participant saw only one version of each item (cobra or snake, focused or defocused), but contributed data to each of the four conditions (two conditions in Experiment 1, Focused-Near and Defocused-Near, and two other conditions tested in Experiments 2 and 3, Focused-Far and Defocused-Far, respectively). We considered two things when excluding items. First, the cloze probability of an unintended word (i.e., a word that is not the anaphor) should not be more than 60% (0-50% is often considered to be a low cloze probability, see for example Kutas & Federmeier, 2011). Secondly, our intended word (reptile) should not have a greater cloze probability in any one condition over the others. We specified that the cloze probability difference between different conditions for the target word should be no greater than 15%.
After excluding items, the highest cloze probability for a particular unintended word was 59%. We kept the cloze cut-off at 60%, which was higher than we would have preferred, but necessary to produce enough items. Two items had values of 59%; both were Focused-Far items. For example, for "What the woman was shopping for was the gown. The" ..., "gown" was given in 59% of responses instead of the intended word "clothing". The cloze probabilities for the intended target word (reptile) were as follows: Focused-Far, mean = 34.19, SD = 11.0, Range = 12-59; Focused-Near, mean = 38.98, SD = 8.75, Range = 18-58; Defocused-Far, mean = 32.39, SD = 9.42, Range = 15-55; Defocused-Near, mean = 34.77, SD = 9.33, Range = 17-58.
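The two cloze criteria can be sketched as a single check per item. The function name and argument layout are hypothetical, and one interpretive assumption is made: the "difference between different conditions" is read as the spread (maximum minus minimum) of the target word's cloze probability across the four conditions, in percentage points. The example values are invented.

```python
CONDITIONS = ("Focused-Near", "Focused-Far", "Defocused-Near", "Defocused-Far")


def passes_cloze_pretest(target_cloze: dict, max_unintended: float) -> bool:
    """Check one item against the two cloze criteria described in the text.

    target_cloze   maps each condition to the cloze probability (%) of the
                   intended anaphor in that condition.
    max_unintended is the highest cloze probability (%) of any single
                   unintended word in any condition.
    """
    values = [target_cloze[c] for c in CONDITIONS]
    spread = max(values) - min(values)  # assumption: spread across all four conditions
    return max_unintended <= 60.0 and spread <= 15.0


# Invented values for illustration only.
balanced = dict(zip(CONDITIONS, (40.0, 34.0, 36.0, 30.0)))    # spread of 10 points
unbalanced = dict(zip(CONDITIONS, (55.0, 34.0, 36.0, 30.0)))  # spread of 25 points

print(passes_cloze_pretest(balanced, max_unintended=59.0))
print(passes_cloze_pretest(unbalanced, max_unintended=59.0))
print(passes_cloze_pretest(balanced, max_unintended=61.0))
```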

Experiment 1
Experiment 1 looks at the effect of focus when the antecedent is semantically close to the anaphor. We compared ERPs to the anaphor reptile (in The reptile hissed and got ready to strike) when the antecedent (snake) is focused (as in What the mongoose stood up to was the snake, Focused-Near condition) versus when the antecedent is defocused (as in It was the mongoose that stood up to the snake, Defocused-Near condition). According to ILH, the Focused-Near condition should result in processing difficulties because of the overlap in semantic information between (focused) snake and reptile leading to interference in working memory. According to JANUS, if there is a difficulty in the Focused-Near condition, it is because of the difficulty in semantic integration compared to the Defocused-Near condition, where the semantic overlap is justified because it is needed to confirm the antecedent-anaphor link (which, in this case, is not with the default, focused, antecedent, mongoose). In terms of ERP effects, a LAN should appear if the difficulty is related to increased working memory load, as the ILH predicts, and an N400 effect if it is related to semantic integration, as JANUS predicts.

Participants
Twenty-seven native English speakers from the University of Sussex participated in the experiment for either a small fee or course credit. Of these, 24 are included in the final analysis (15 females; aged 18-32, mean = 22). All participants had normal or corrected-to-normal vision, normal hearing, were right-handed, and had no history of neurological impairment. Three participants were excluded from the final analysis because of excess eye-movements, excessive noise from muscle tension or technical problems with recording (see EEG Recording and Analysis below).

Stimulus materials
For Experiment 1, 80 lead-in sentences and Target Sentences were presented.
The lead-in sentences were presented in their entirety on a computer screen for 3000 ms. The average length of the lead-in sentences was 8.4 words (SD = 1.30). The target sentences were presented word by word, again visually (see Procedure for details).
For the Target Sentence, the content word from the anaphor (reptile in the reptile) was always the second word in the sentence and the anaphoric noun phrase was always the subject of the sentence. Target Sentences varied in length from 5 to 12 words (M = 8.3, SD = 1.84). The anaphor was always the same across conditions. Indeed, the whole of the target sentence was constant across the four conditions. None of the anaphors (e.g. reptile in 1e) was over 13 letters in length (M = 6.65, SD = 2.4).
The 80 experimental items were pseudorandomised across 2 versions of the experiment, with each version containing 40 exemplars of each of the 2 conditions. Each participant read only one of the lead-in/Target Sentence pairs from each set of materials. The experiment was split into 4 blocks lasting approximately twelve minutes each. In addition, there was a practice block of 10 items, 8 extra starter items, and 40 filler items, similar in nature to the experimental items. The filler items were included to reduce the number of comprehension questions on experimental items and to minimise ERP data loss after comprehension questions. There were two starter items at the beginning of each experimental block to minimise loss of data. Following each block there was a short break. To ensure that participants paid attention to the content of the passages, for approximately 30% of all of the items (40 questions in total, 41% on experimental items), a yes/no comprehension question was asked. The response hand for 'yes' or 'no' was varied across participants.
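One way to build two counterbalanced versions of this kind is sketched below. It is an assumption-laden illustration, not the procedure actually used: the function name, the seeded shuffle, and the rule that the second version simply swaps the condition of every item are all hypothetical choices that happen to satisfy the constraints stated above (40 items per condition per version, one condition per item per participant).

```python
import random
from collections import Counter

CONDITIONS = ("Focused-Near", "Defocused-Near")  # the two conditions of Experiment 1


def make_versions(n_items: int = 80, seed: int = 0):
    """Assign each item to one condition in version A and the other in version B."""
    items = list(range(n_items))
    rng = random.Random(seed)  # seeded so that the assignment is reproducible
    rng.shuffle(items)
    half = n_items // 2
    version_a = {
        item: CONDITIONS[0] if position < half else CONDITIONS[1]
        for position, item in enumerate(items)
    }
    # Version B swaps the condition of every item.
    version_b = {
        item: CONDITIONS[1] if cond == CONDITIONS[0] else CONDITIONS[0]
        for item, cond in version_a.items()
    }
    return version_a, version_b


version_a, version_b = make_versions()
print(Counter(version_a.values()))
print(Counter(version_b.values()))
```

Across the two versions every item appears once in each condition, while within a version each participant sees an item in one condition only.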

Procedure
Participants were tested individually in a quiet room. They were asked to fill out a questionnaire about their language and basic health background, and also a handedness questionnaire (Oldfield, 1971). In addition, they signed a consent form. Participants were told that the experiment investigated how people integrate information in a story and that some of the passages might be more difficult than others. They were instructed that they would read a number of short passages. Each passage would start with a sentence presented across the computer screen in its entirety (the lead-in context). This sentence would be followed by a second sentence presented word-by-word in the middle of the computer screen (the target sentence). Participants were asked to read the passages attentively and to try to understand them as well as possible. They were also asked to try not to move or blink during the target sentence. After 30% of the passages a simple comprehension question was asked that required a yes/no button press response.
The experimental stimuli were presented using E-Prime 2.0 (Schneider, Eschman, & Zuccolotto, 2002). The experimental session began with a practice block. At the end of the practice block the participant had a chance to ask any questions. A short break followed each block.
Trials began with the entire lead-in context displayed for 3000 ms (Courier New, 18-point font) in white text against a black background. The lead-in context was replaced by a fixation display (+++) for a time that varied randomly across trials from 400 to 800 ms. This display was followed by the target sentence, presented word-by-word in white lowercase letters (the same Courier New, 18-point font) against a black background. The first word and any proper noun were capitalised and the final word of each sentence was followed by a full stop. Each word was presented for 200 ms with an SOA of 500 ms. Following the target sentence, a row of stars (***) was presented for 3000 ms, during which time the participants were told they could blink, but to be prepared for the next sentence. Following the stars, either the next trial (lead-in context) began or a comprehension question appeared on the screen. The comprehension question remained on the screen until the participant pressed the 'yes' or 'no' button on a button box. Accuracy (not speed) was encouraged for the responses to the questions. Indeed, participants were instructed that waiting to respond to the comprehension question was a way in which they could give themselves a short break between items. Following the experiment, the participants were debriefed and given a short questionnaire to determine if they were aware of the purpose of the experiment. None was.
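The timing of a single trial can be laid out as a simple event schedule. This is a sketch under stated assumptions rather than the presentation script that was used: it assumes that the fixation jitter is drawn uniformly in whole milliseconds, and that the row of stars appears one full SOA after the onset of the final word. The function name `trial_schedule` is hypothetical.

```python
import random

def trial_schedule(words, rng, lead_in_ms=3000, word_ms=200, soa_ms=500, stars_ms=3000):
    """Return a list of (label, onset_ms, duration_ms) events for one trial."""
    # Fixation display (+++) lasts 400-800 ms, varying randomly across trials.
    fixation_ms = rng.randint(400, 800)
    events = [("lead-in", 0, lead_in_ms), ("fixation", lead_in_ms, fixation_ms)]
    t = lead_in_ms + fixation_ms
    for word in words:
        # Each word is visible for word_ms; the next word starts soa_ms later,
        # leaving a blank interval of soa_ms - word_ms between words.
        events.append((word, t, word_ms))
        t += soa_ms
    # Assumption: the stars (***) start one SOA after the last word's onset.
    events.append(("stars", t, stars_ms))
    return events

schedule = trial_schedule("The reptile hissed and got ready to strike.".split(), random.Random(0))
```

With a 200 ms word duration and a 500 ms SOA, the blank interval between successive words is 300 ms.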

EEG Recording and Analysis
The EEG was recorded from 29 Ag/AgCl electrodes mounted in a Quik-cap (Compumedics NeuroScan, Herndon, VA, USA), each referred to the left mastoid. Electrodes were placed according to the 10-20 system of the American Electroencephalographic Society at midline sites at Fz, FCz, Cz, Pz, and Oz, along with lateral pairs of electrodes over standard sites on frontal (AF3, AF4, F7, F8, F3, F4, FC3, and FC4), central (C5, C3, C4, C6), temporal (FT7, FT8), centro-parietal (CP5, CP6, CP3, CP4), parietal (P5, P6, P3, P4), and occipital (PO7, PO8) positions. Vertical eye movements were monitored via a supra- to sub-orbital bipolar montage. A right to left canthal bipolar montage was used to monitor horizontal eye movements. Activity over the right mastoid bone was recorded on an additional channel to determine if there were differential contributions of the experimental variables to the presumably neutral mastoid site. No such differential effects were observed. The EEG and EOG recordings were amplified with a SynAmps Model 5083 EEG amplifier (NeuroScan Inc., Herndon, VA, USA), high-pass filtered at 0.05 Hz and low-pass filtered at 70 Hz and with a time constant of 8 s (0.166 Hz). We aimed to keep electrode impedances below 5 kOhm. The EEG and EOG signals were digitised on-line with a sampling frequency of 500 Hz.
The EEG data were analysed offline using the EEGLAB toolbox (v5.02, Delorme & Makeig, 2004) for MATLAB (The Mathworks, Massachusetts, USA) in a critical window ranging from 150 ms before to 1800 ms after the onset of the critical word. Eyeblink artifacts were removed using Independent Components Analysis (ICA). For the ICA procedure, the filtered continuous raw data were exported to EEGLAB. We used the Infomax ICA algorithm (Delorme & Makeig, 2004) with default parameters. Data from all electrodes (both EEG and EOG) were included in the ICA. Components representing blink artifacts were identified by visual inspection of component activations and by the projections of the components to the scalp. The components representing blink artifacts were removed and the remaining independent components projected back (Delorme, Sejnowski, & Makeig, 2007). Additionally, non-blink-related artifacts (electrode drifting, amplifier blocking and EMG artifacts) were removed using the EEGLAB standard abnormal values and abnormal trend procedure on the epoched data. Trials containing such artifacts were rejected (8% overall). Three subjects were excluded from the final analysis because more than 20% of their trials were rejected owing to artifacts.
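Two pieces of arithmetic in this procedure can be made concrete: converting the critical window into sample indices at the 500 Hz sampling rate, and applying the 20% trial-rejection criterion for excluding a participant. The following is an illustrative sketch, not the EEGLAB code that was used; all function names are hypothetical.

```python
SAMPLING_HZ = 500  # sampling frequency reported for the recordings

def ms_to_samples(ms, fs=SAMPLING_HZ):
    """Convert a latency in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000))

def epoch_bounds(onset_sample, start_ms=-150, end_ms=1800, fs=SAMPLING_HZ):
    """First and last sample index of the critical window around a word onset."""
    return (onset_sample + ms_to_samples(start_ms, fs),
            onset_sample + ms_to_samples(end_ms, fs))

def exclude_participant(n_rejected, n_trials, max_rate=0.20):
    """True if more than max_rate of the participant's trials were rejected."""
    return n_rejected / n_trials > max_rate
```

At 500 Hz one sample spans 2 ms, so the window from 150 ms before to 1800 ms after onset runs from 75 samples before the onset sample to 900 samples after it.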

Experiment 1 Results
Average waveforms were computed across all trials per condition for the 24 participants. ANOVAs were conducted with factors Condition (Focused-Near, Defocused-Near) and Electrode. Because our studies have no neutral baseline, we will describe effects throughout by saying that one condition has a more positive (rather than a more negative) waveform than another. Initial visual examination of the ERPs suggested that neither the LAN nor the N400 was likely to show any significant effects. However, to be sure, we conducted a time course analysis in which multiple overlapping time windows of 50 ms in steps of 10 ms were used to identify the onset of any effect between 200 and 500 ms post target word onset.
Additionally, we examined ERPs time-locked to the onset of the last word in the target sentence and looked at the time window 350-900 ms post onset (see below). Grand average ERP waveforms time-locked to the onset of the target word for both conditions are presented in Figure 1 for 29 electrode sites.
The time course analysis for the time window 200-500 ms resulted in no significant effects of Condition and no significant interactions.
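The moving-window procedure (overlapping 50 ms windows in 10 ms steps between 200 and 500 ms) can be sketched as follows. The sketch assumes half-open windows, so that the sample at the window's end point is excluded, and a single-electrode waveform sampled every 2 ms; neither detail is specified in the text. The function names are hypothetical.

```python
from statistics import mean

def sliding_windows(start_ms=200, end_ms=500, width_ms=50, step_ms=10):
    """List of (lo, hi) windows of width_ms, advancing by step_ms, within [start, end]."""
    return [(s, s + width_ms) for s in range(start_ms, end_ms - width_ms + 1, step_ms)]

def window_means(waveform, times_ms, windows):
    """Mean amplitude of one electrode's waveform in each window (lo <= t < hi)."""
    return [
        mean(a for a, t in zip(waveform, times_ms) if lo <= t < hi)
        for lo, hi in windows
    ]

# Sample times for an epoch from -150 to 1800 ms at 500 Hz (one sample per 2 ms).
times_ms = list(range(-150, 1801, 2))
windows = sliding_windows()
```

Each window's mean amplitude per participant and condition would then enter the Condition x Electrode ANOVA; the onset of an effect is taken from the first windows in which Condition is significant.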
We examined the ERPs to the last word in the target sentence (-100 to 1200 ms after onset) for end of sentence wrap up effects. Using the same time window analysis of multiple overlapping time windows of 50 ms in steps of 10 ms, a significant effect of Condition was seen from 350-900 ms after onset of the last word (see Figure 2; F (1, 23) = 7.68, MSe = 15.36, p = 0.011). The Focused-Near condition was more positive than Defocused-Near (mean amplitude Focused-Near = 0.263 µV, Defocused-Near = -0.319 µV). This effect appeared larger over the right hemisphere. Additionally, a significant interaction with electrode was seen in this same time window (F (28, 644) = 1.74, MSe = 0.416, p = 0.011). We broke down this effect by region (quadrants, as the effect appeared to have a right frontal maximum: Right Frontal electrodes AF4, F8, F4, FC4, FT8; Left Frontal AF3, F7, F3, FC3, FT7; Right Posterior CP6, CP4, P6, P4, PO8; Left Posterior CP3, CP5, P5, P3, PO7). The region analysis indicated a significant effect of condition across right frontal electrodes (F (1, 23) = 8.23, MSe = 3.64, p = 0.009) and right posterior electrodes (F (1, 23) = 8.07, MSe = 2.81, p = 0.009). There was no significant effect of condition in the left frontal or left posterior quadrants (p = 0.112 and p = 0.19 respectively). We defer discussion of these results until after the results of Experiments 2 and 3, since only by comparing the results of all three experiments can we interpret the findings relative to the predictions of JANUS and the ILH.
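The regional breakdown can be illustrated with a short sketch. It assumes that amplitudes are averaged over the five electrodes of a quadrant before the two conditions are compared, which the text does not state explicitly. It relies on the standard result that, for a within-participant factor with two levels, the repeated-measures F with (1, n - 1) degrees of freedom equals the square of the paired t statistic. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Allocation of electrodes to quadrants, as listed in the text.
QUADRANTS = {
    "right_frontal": ("AF4", "F8", "F4", "FC4", "FT8"),
    "left_frontal": ("AF3", "F7", "F3", "FC3", "FT7"),
    "right_posterior": ("CP6", "CP4", "P6", "P4", "PO8"),
    "left_posterior": ("CP3", "CP5", "P5", "P3", "PO7"),
}

def region_mean(amplitudes, region):
    """Average a participant's {electrode: mean amplitude} dict over one quadrant."""
    return mean(amplitudes[e] for e in QUADRANTS[region])

def paired_f(cond_a, cond_b):
    """F statistic and degrees of freedom for a two-level within-participant factor.

    cond_a and cond_b hold one regional mean per participant, in the same order.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)
```

With 24 participants this yields the (1, 23) degrees of freedom reported for the quadrant analyses.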

Experiment 2
Experiment 2 explores the effect of semantic overlap between anaphor and antecedent when the antecedent is in focus by comparing the ERP to the anaphor reptile when the focused antecedent is either semantically near (snake in What the mongoose stood up to was the snake, Focused-Near) or semantically far (cobra in What the mongoose stood up to was the cobra, Focused-Far; comparing 1c to 1d). The ILH predicts greater processing difficulty in the Focused-Near case, which, because of the involvement of working memory, should show up as a LAN compared to the Focused-Far case. JANUS would also predict more difficulty in the Focused-Near case; however, a greater N400 for the Focused-Near condition compared to the Focused-Far condition would reflect the predicted cause of the difficulty: the unexpectedness of an overspecific anaphor.

Participants
Twenty-eight University of Sussex students participated in the experiment for course credit or a small fee. Six participants were excluded from the final analysis because of excess eye-movements, excessive noise from muscle tension or technical problems with EEG recording (see EEG Recording and Analysis below). The 22 participants (14 female) included in the analysis were between 18 and 30 years old (mean = 23). All participants included in the final analysis were native English speakers, had normal or corrected-to-normal vision, normal hearing, were right-handed and had no history of neurological impairment.

Design, stimulus materials and procedure
For Experiment 2, the Focused-Near (1c) and Focused-Far (1d) conditions were compared. All other details of the design and procedure were identical to Experiment 1.

EEG Recording and Analysis
The EEG recording and data analysis were identical to Experiment 1. The only difference lies in the number of trials and participants removed because of artifacts. In Experiment 2, 7% of trials were excluded, and six participants were excluded because of the number of trials containing artifacts, technical problems with recording (2 participants), or more than 20% errors on the comprehension questions (2 participants).

Results
For all trials per condition we computed average waveforms for the 22 participants. We conducted an ANOVA with the factors Condition (Focused-Near and Focused-Far) and Electrode. Figure 3 illustrates the grand average ERP waveforms time-locked to the onset of the target word for both conditions for 29 electrodes. Following the procedure of Experiment 1, we conducted a time course analysis in which multiple overlapping time windows of 50 ms in steps of 10 ms were used to identify the onset of any effect between 200 and 500 ms post target word onset. Additionally, we conducted this same time course analysis beginning at 600 ms to explore an unexpected effect which was seen around 600 ms post target word onset.
We found no significant effect of Condition and no significant interaction between Condition and Electrode in the 200-500 ms time window analysis. However, a significant effect was seen in the time window 660-750 ms, with the Focused-Far condition being more positive than the Focused-Near condition (F (1, 21) = 6.16, MSe = 35.65, p = 0.022; mean amplitude = -0.50 µV for Focused-Far, -1.33 µV for Focused-Near). No other significant effects were seen after target word onset.
We also looked at the ERP aligned to the last word in the target sentence, again using the moving time window analysis as in Experiment 1. However, unlike Experiment 1, we saw no significant differences of Condition and no interaction with electrodes.

Experiment 3
Experiment 3 compares defocused antecedents that vary in their semantic overlap with the anaphor. For example, we compare the ERP to the anaphor reptile (in The reptile hissed and got ready to strike) when the semantic overlap between the anaphor and antecedent is greater (Defocused-Near condition, snake in 1a) or less (Defocused-Far, cobra in 1b). Since both antecedents are defocused in this experiment, we would expect to find a benefit from semantic overlap, as it is needed to confirm that the correct antecedent has been selected. In other words, both the ILH and the JANUS model would predict that the Defocused-Far condition would lead to greater processing difficulties than the Defocused-Near condition. It is not entirely straightforward to ascertain what ERP effect the ILH would predict: semantic overlap in working memory should always produce interference, but with defocused antecedents there is also a processing difficulty from having to find the antecedent (making the anaphor-antecedent mapping). Possibly the Defocused-Far condition would lead to the larger working memory load, and hence a possible LAN effect in the ERP. On the other hand, JANUS would predict any difficulty to be due to semantic integration, hence a larger N400 for the Defocused-Far condition compared to the Defocused-Near condition.

Participants
Twenty-seven University of Sussex students participated in the experiment for course credit or a small fee. Participants were all native English speakers, had normal or corrected-to-normal vision, normal hearing, had no history of neurological impairment and were right-handed. Six participants were excluded from the final analysis because of excessive eye-movements, excessive noise from muscle tension or more than 20% errors on the comprehension questions. Twenty-one participants (14 female) were included in the analysis (age range 18-27, mean age 21).

Design, stimulus materials and procedure
Experiment 3 compared the effect of semantic overlap between the anaphor and antecedent for defocused antecedents (conditions: Defocused-Near, 1a, versus Defocused-Far, 1b). All other design and procedure details were identical to Experiments 1 and 2.

EEG Recording and Analysis
The EEG recording and data analysis were identical to Experiments 1 and 2. The number of trials and participants removed because of artifacts differed between the analyses. A total of 6.25% of the trials were excluded from the analysis in Experiment 3. Six participants were excluded because of EEG artifacts or more than 20% errors on the comprehension questions (1 participant).

Results
Average waveforms for the 21 participants were computed. ANOVAs were conducted with the factors Condition (Defocused-Near and Defocused-Far) and Electrode. The grand average waveforms time-locked to the onset of the target word for both conditions are presented in Figure 5. The time course analysis for the time window 200-500 ms resulted in no significant effects of Condition and no significant Condition x Electrode interaction.
As in Experiment 1, we see a difference between conditions at the end of the sentence (see Figure 6). However, when we aligned the ERP to the last word in the sentence, one participant had many more artifacts in the end of sentence time window (-100 to 1200 ms after onset of last word in target sentence) than in the time window -100 to 1200 ms after target word onset (39% trials rejected vs. 20%). This participant was, therefore, excluded from the end of sentence analysis, and the analyses were based on the remaining 20 participants. Using the same time window analysis as in Experiments 1 and 2, a significant effect of Condition was seen from 250-800 ms after final word onset (F (1, 19) = 7.15, MSe = 17.44, p = 0.015).
The Defocused-Near condition showed a greater positive shift than the Defocused-Far condition (mean amplitude: Defocused-Near = -0.03 µV, Defocused-Far = -0.68 µV). Additionally, there was a significant interaction between Condition and Electrode (F (28, 532) = 1.51, MSe = 0.45, p = 0.04). To compare this end of sentence effect with the one in Experiment 1, we conducted an additional analysis by region, with the same allocation of electrodes to quadrant. As in Experiment 1, this analysis indicated a significant effect of Condition in the right-frontal region (F (1, 19) = 5.4, MSe = 4.74, p = 0.031) and in the right-posterior region (F (1, 19) = 6.07, MSe = 5.18, p = 0.02). The effect of condition in the left-frontal and left-posterior regions was marginally significant (p = 0.09 and p = 0.08 respectively), but did show the same trend as in the two right regions.

Discussion
Across three ERP experiments we tested the effects of antecedent focus and semantic overlap between anaphors and their antecedents on the processing of anaphoric reference. Specifically, Experiment 1 explored the effect of focus when the antecedent was semantically close to the anaphor, comparing ERPs to the anaphor reptile (in The reptile hissed and got ready to strike.) when its antecedent (snake) is focused (What the mongoose stood up to was the snake, Focused-Near condition) versus when the antecedent is defocused (It was the mongoose that stood up to the snake, Defocused-Near condition). Experiment 2 compared the effect of semantic distance when the antecedent is focused (What the mongoose stood up to was the cobra, Focused-Far condition compared to What the mongoose stood up to was the snake, Focused-Near condition). Experiment 3 compared the effect of semantic distance when the antecedent is defocused (It was the mongoose that stood up to the cobra, Defocused-Far condition, compared to It was the mongoose that stood up to the snake, Defocused-Near condition). Our primary aim was to investigate the nature and time course of anaphor processing. In addition, we considered our results in relation to two extant theories that appear to make predictions for our study.
In relation to this secondary aim, although the Informational Load Hypothesis (ILH; Almor, 1999) and JANUS (Garnham & Cowles, 2008) made predictions about the ERP data (LAN and N400 differences respectively), neither of these effects was apparent at the target word. Instead, the results indicated no difference in the LAN and N400. The only significant effect of condition seen at the target word was in Experiment 2, 660-750 ms after the onset of the target word (reptile). Here the Focused-Far condition led to a greater positivity than the Focused-Near condition. In this later time window, processing difficulties typically show up as positivities (P600-like). If our positivity effect reflects the fact that the Focused-Far condition was more difficult than the Focused-Near condition at the anaphor, this would be the opposite pattern to the one predicted by both ILH and JANUS. Indeed, the positivity seen at the anaphor corresponds to a standard (non-inverted) typicality effect and is broadly consistent with early (in the sentence) effects reported by van Gompel et al. (2004) and Almor and Eimas (2008). Although this effect is not obviously related to a specific ERP effect in the literature, we speculate that this late positivity may be related to reanalysis or more specifically discourse integration. It has been suggested that late positivities may be related to the P3b component (King & Kutas, 1995; Bornkessel et al., 2003; Cowles, Kluender, Kutas, & Polinsky, 2007). For example, Bornkessel et al. (2003) manipulated the thematic structure of verbs in verb-final clauses in German and showed a similar positivity when sentences violated 'canonical' thematic order of arguments. Related to this finding, Cowles et al. (2007) found an increased positivity in answers to wh-questions at a focused element, which they interpreted as indexing the integration of focused constituents.
More generally the P3b is considered to represent a domain-general response to new information and resolution of uncertainty. This idea suggests that, at the anaphor, when an antecedent is focused, the semantically far antecedents led to greater integration difficulties than the semantically near antecedents, a finding not predicted by ILH or JANUS, but compatible with standard typicality effects. No such effect was seen in Experiment 1, because there was no semantic distance manipulation. In Experiment 3, any semantic distance effect may be masked by the difficulty of linking with a non-focused antecedent, as the initial processing goal is to determine that there is an appropriate antecedent.
In contrast to the lack of significant differences at the target word in Experiments 1 and 3, there were significant differences at the end of the sentence in both experiments, but no end of sentence effects in Experiment 2. Experiment 1 revealed an effect 350-900 ms after the onset of the final word. The Focused-Near condition resulted in a greater positivity compared to Defocused-Near. Further analysis revealed that this effect was maximal across the right hemisphere. Experiment 3 also indicated a significant effect at the end of the sentence (250-800 ms after the onset of the final word). There was greater positivity for the Defocused-Near condition compared to Defocused-Far.
The end of sentence results in Experiment 1 (a late processing difference between the Focused-Near and Defocused-Near conditions) fit well with the recall results of Almor and Eimas (2008). They showed poorer recall for focused repeated name referents, which are most similar to our Focused-Near condition (though we did not use repetition). Almor and Eimas's revised version of the ILH predicts that discourse integration measures (e.g., any end of sentence measure) will show a cost related to greater semantic overlap when an antecedent is focused, because the semantic overlap is not justified by a role in identifying the antecedent. The end of sentence results in Experiment 1 are also broadly consistent with the eye movement data of van Gompel et al. (2004), who found evidence for inverse typicality effects at the end of the sentence.
Although JANUS does not make specific predictions about early and late effects, the idea of processing costs related to the semantic integration of a Focused-Near antecedent fits with the overall predictions from the model. Furthermore, more general considerations suggest that such effects might appear at the end of sentences. In summary, this late effect is likely to reflect the processing difficulty associated with the greater semantic overlap between the focused antecedent and anaphor. The lateness of the effect is consistent with the idea that some processing associated with anaphors, and in particular the processing that determines, or is sensitive to, the fit between the anaphor and the antecedent, is delayed to the end of the sentence. In relation to the two models we considered, such effects can be accommodated within either model.
One perspective on these end of sentence effects comes from the revised version of the ILH. The proposal is that working memory effects associated with anaphor resolution occur downstream from the anaphor itself. A different perspective comes from JANUS, which proposes forward-looking functions of anaphors, signalling upcoming thematic shifts and related phenomena, which should affect processing of material that follows the anaphor.
Forward-looking functions explain why an anaphoric expression can have more content than it needs in order to determine its antecedent. Vonk, Hustinx, and Simons (1992) showed that such overspecific referring expressions signal a thematic shift for a reader. In (3) below, when (3a) is followed by (3d), the use of Johnson, a professor of medicine as an overspecified NP anaphor made the information from the preceding text less available, similar to when there was an actual thematic shift as in (3a) followed by (3b) or (3c). This finding suggests that when there is more information than necessary in the anaphor, a change of theme or perspective is expected.
In line with predictions of both the ILH and JANUS, a strongly-focused antecedent would almost always have unnecessary semantic overlap with an NP-anaphor (even in the 'far' case) because when the antecedent is focused, it is the default antecedent for any anaphor, and 'it' would be sufficient to link with the antecedent. However, unlike JANUS, the ILH does not consider forward-looking functions, such as signaling a perspective shift, that the unnecessary content might have. According to JANUS, expectations about the upcoming discourse (and the possible signal of a thematic shift) could have an effect on how extra semantic information is processed. For a focused antecedent, the use of an NP anaphor suggests there will be a shift in perspective, which in the current experimental items did not occur. Our end of sentence results for the focused antecedent conditions may be reflecting the "surprise" when the expected change in theme fails to materialise. Furthermore, one might expect any such effect to be stronger in the Focused-Far case than the Focused-Near case. Referring to a cobra as a reptile may indicate a change in perspective more strongly than referring to a snake as a reptile, because of the greater change in content.
Putting the end of sentence results from all three experiments together, we see persisting differences at the end of the sentence (ERP to last word) from about 300-900 ms, with the conditions being ordered (from the most positive) FF ≈ FN > DN > DF. With a focused antecedent, the anaphor should be "it", so the greater positivity with focused than with defocused antecedents may reflect the fact that the anaphors are overspecific, and that the expected thematic shift does not occur. This idea is consistent with JANUS.
Turning to the defocused cases, an anaphor that refers to a defocused antecedent already includes a (thematic) shift, in this case from one referent to another, rather than a perspective shift on a single referent. Our results show that the defocused near case is somewhat harder than the defocused far case. One possible, though speculative, explanation is that by using the distinctive term cobra as the antecedent, as opposed to snake (and at a different level of specificity from the focused antecedent mongoose), there may already have been an indication that the shift might take place, and hence the shift from the mongoose to the cobra is not so unexpected in the (Defocused) Far condition as the shift from mongoose to snake in the Near condition.
To recap, in the Focused conditions, the overspecific anaphors suggest that a change in perspective on the referent is going to happen, but it does not, and so, at the end of the sentence, the overspecific form appears (somewhat) inappropriate. In the Defocused conditions, there was already a shift to the unexpected antecedent at the anaphor, so a change in perspective on the antecedent is not required to justify a more specific anaphor. Also, the use of the specific term in the antecedent (cobra rather than snake) already presaged this change of antecedent, so the second sentence in the Defocused-Far passages is easier to process than the second sentence in the Defocused-Near passages. These ideas could be tested in further experiments in which (a) a pronoun is used as the subject of the second sentence, so that the anaphoric form is not overspecific and, according to our hypothesis, no change in perspective would be expected, and (b) a change in perspective on the referent of the subject noun phrase in the second sentence, or a further change in the Defocused cases, is signalled by the rest of the content of the second sentence (e.g., "The reptile had given birth the previous day"). In the former case, the pronominal anaphors would be potentially ambiguous (e.g., between the mongoose and the snake), so further changes to the materials might be necessary (e.g., pluralizing one of the nouns). In the latter case, it might be necessary to use longer passages, as a change in perspective in the second sentence of a two-sentence passage is somewhat odd.

Conclusion
Three ERP experiments explored the role of focus and semantic overlap in anaphor resolution both at the point of the anaphor, and at the end of the sentence. In the context of exploring the nature and time course of anaphor resolution, we contrasted the predictions of two models, the Informational Load Hypothesis (Almor, 1999; Almor & Eimas, 2008) and JANUS (Cowles & Garnham, 2005; Garnham & Cowles, 2008). The ILH suggests that keeping a semantically overlapping (focused) antecedent and anaphor in working memory has a processing cost. JANUS suggests that the difficulty arises from problems in allocating the content of the anaphor to its forward- and backward-looking functions. One of the forward-looking functions is to signal possible upcoming thematic shifts or changes in perspective. Prediction from the two models is complicated by the fact that the time course of various processes in anaphor resolution is unclear, which is why it is under investigation in our study. An advantage of the ILH, at least in its revised version (Almor & Eimas, 2008), is that it suggests two stages in anaphor resolution. However, the idea of two stages is not confined to the ILH. For example, it is crucial in Garrod and Terras's (2000) bonding and resolution model, and both the ILH and JANUS would benefit from more specific predictions about how focus and semantic distance influence anaphor resolution at the anaphor, immediately after the anaphor, and at later time points. Our own results failed to provide clear support for either theory, though they appear closer to the predictions of JANUS than to those of the ILH. In particular, our results are compatible with the idea of thematic shifts being predicted from anaphoric form, which is part of the JANUS model.
Overall, our results suggest that although the focus of an antecedent and the semantic overlap between the antecedent and anaphor are important, these factors are not the only important contributors to online anaphor resolution. Factors such as readers' expectations about thematic shifts also influence the processing.