On the computational modeling of English relative clauses

Even in this era of parameter-heavy statistical modeling requiring large training datasets, we believe explicit symbolic models of grammar have much to offer, especially when it comes to modeling complex syntactic phenomena using a minimal number of parameters. It is the goal of explanatory symbolic models to make explicit a minimal set of features that license phrase structure, and thus they should be of interest to engineers seeking parameter-efficient language models. Relative clauses have been much studied and have a long history in linguistics. We contribute a feature-driven account of the formation of a variety of basic English relative clauses in the Minimalist Program framework of Chomsky (1995, 2001) that is precisely defined, descriptively adequate, and computationally feasible in the sense that we have not observed exponential scaling with the number of heads in the Lexical Array (LA). Following proposals by Gallego (2006) and Pesetsky & Torrego (2001), we assume an analysis involving a uT feature and a uRel feature, possibly simultaneously valued. In this paper, we show a detailed mechanical implementation of this analysis, and describe the structures computed for that, which, and who/whom relatives in standard English.


Introduction
Relative clauses have been the subject of much research in modern Generative Grammar.1 These constructions are of particular interest because the head noun of the relative clause appears to be doubly licensed. In (1)a, resp. (1)b, the relative clause head noun man appears to obtain both grammatical case and a theta-role in two distinct positions (assuming that there is only one head noun man); i.e. as object (resp. subject) of the relative clause verb saw (resp. loves), and as object of the matrix verb like.
(1) (a) I don't like [the man who John saw man].
(b) I don't like [the man who man loves Mary].
Analyses of relative clauses in the Generative Grammar tradition have attempted to explain the structure of relative clauses, and clarify how a relative N is licensed. Analyses can be loosely categorized into matching analyses, in which there are two separate relative nouns that have the same reference, and head promotion/raising analyses, in which there is only one relative noun that undergoes movement. In an operator movement analysis, like that in (2)a (Chomsky 1977, Chomsky & Lasnik 1995), a relative operator is base generated in the relative clause and raises to the specifier of the CP. It then gets its interpretation by being associated with an external noun, expressed through coindexation here. In another account, shown in (2)b, there are two separate relative nouns that have the same reference, and the lower noun is deleted under identity (cf. Citko 2001). Both of these types of analyses in (2)a-b have been referred to as matching analyses, as there are two separate relative nouns. In a head promotion/raising account, the nominal head of a relative clause undergoes movement outside of the clause, as in (2)c (Brame 1968, Schachter 1973, Vergnaud 1974, Kayne 1994, Borsley 1997, Bianchi 2000, etc.). Unlike a matching analysis, there is only one relative noun which undergoes movement, and it is basically licensed in two positions. The matching analysis does not face the problem of a single noun receiving case and a theta-role in two positions, as it makes use of separate but co-indexed nouns. On the other hand, a raising analysis does not require a separate matching/coindexation operation. A thorough comparison of the various approaches, and their variants, is beyond the scope of this paper,2 but we adopt a version of the raising/promotion account in which new NPs with the same core noun lexeme can be formed through Internal Merge, obviating the need for a separate matching operation. At the same time, we avoid the problem of a single noun receiving multiple theta-role and case assignments, through the use of separate D heads.
English relative clauses pose a non-trivial problem as they vary with respect to the content of the head and edge of the CP, as summarized in Table 1. None of the standard English relative clauses permit which/who that;3 there is a well-known ban on doubly filled COMP (cf. Chomsky & Lasnik 1977, among others). Object relatives permit an empty COMP (indicated by Ø), but subject relative clauses do not.
This paper presents a computational model of relative clauses based on linguistic proposals in the Minimalist Program (MP) framework.5 We have a full computer implementation of the theory, verified across all core examples presented in this paper.6 The detailed step-by-step derivations computed by the program are too lengthy for inclusion in the body of this paper; they may be found in the online Appendix, thus permitting the reader to verify the accuracy of our claims.7 These derivations should also prove helpful to both linguists and engineers who wish to understand how the components of the theory interact in full detail. To our knowledge, this is the first computer implementation in the MP framework to accurately generate the complete set of basic subject and object English relative clause constructions, while also accounting for the usage of which, that, and who(m) in relative clauses.
3 The use of who vs. which is dependent on whether or not the head noun is human/animate and is subject to minor stylistic variation. In particular, who appears to require a human head noun (although it is sometimes used for animals with names; e.g., see https://erinwrightwriting.com/refer-animals), whereas which can be non-human and animate, but it can also be inanimate.
4 Most of these examples are from Gallego (2006), who has taken some of these examples from other works such as Kayne (1994) and Bianchi (1999).
5 The model was implemented independently in Prolog and in Python; nearly identical results were obtained, thus providing verification of correct implementation, as well as of the consistency of the theory. This involves checking for intended derivations, for the absence of unintended derivations, as well as crashes in cases of ungrammatical input.
6 Our model makes logical consistency checks with respect to feature valuation. The model also implements disjunctive logic, i.e. multiple derivations are, in principle, possible. Any logical inconsistencies will result in nonconvergence for a particular derivation, but other derivations may proceed independently. However, we also use the notion of economy of feature-checking to compare and disprefer other successful derivations.
7 Complete derivations, implemented independently by each author, are available at [redacted for anonymity].

Computational modeling of linguistic theory
The computational implementation of linguistic theory requires overcoming substantial barriers including: (1) the careful selection of compatible sub-theories of grammar, a particularly important aspect as relative clauses involve both theories of the noun phrase (NP) and sentential structure (CP); and (2) theory mechanization, as linguistic theories are not specified with a mechanical architecture in mind. We take a linguistically-faithful automatic computer program to be one that autonomously assembles syntactic derivations (beginning with a list of primitive lexical items). The Minimalist Program (MP) continues a long line of inquiry into the nature of the language faculty; it is a theory of competence, and our implementation concretely realizes the generative procedure. The problems of externalization, e.g. the mapping of linguistic representation into instructions to the sensory-motor system, and the problem of (efficient) parsing are important ones for which scientifically-motivated answers are still limited. In this paper, we limit our attention to the modeling of the generative procedure. We emphasize that the generative procedure does not automatically imply a psychologically-realistic parser; it generates structures and sentences starting from a list of (user-supplied) lexical heads.8
Let us consider the general problem of concrete modeling. First, the theory should be precise and substantial enough to withstand scrutiny. It must be possible to algorithmically specify details down to the level of the linguistic primitives assumed in the MP framework, i.e. Merge (combining syntactic objects (SOs)) and probe-goal feature checking (agreement relations between features on two SOs). This is a non-trivial requirement as theoretical development in the MP framework proceeds apace and in radical fashion.9 Suppose we are able to select a sufficiently precise and broad theory. We also need to provide a computationally-tractable implementation. For example, the implementation should not exhibit combinatorial characteristics, such as in terms of temporary syntactic ambiguity or lexical ambiguity, that require exponentially-scaled resources, e.g. as the list of initial heads grows. Finally, in line with broader MP goals, we submit that an implementation of a particular phenomenon should be succinct in terms of the number of construction-specific theoretical devices required, ideally none. Every grammatical feature or data structure we specifically introduce to limit or control derivations is an additional burden, not only to acquisition and evolution, but also with respect to the goal of simplifying core syntax. As we will show, our system is parameter-efficient in this sense. Since relative clauses embed sentential structure, any model of relativization must also include substantial modeling of sentential structure, and therefore will be of broad relevance to general modeling of grammar.
8 Simply by selecting a correct list of lexical items, our model implies a semi-decidable parsing procedure in the sense that we can explore a finite number of possible options, e.g. with respect to movement and possible empty categories. The problem of convergence for ungrammatical input leaves us with only semi-decidability. An efficient parser would require additional constraints to effectively limit the search space. A cognitively plausible model would also have to account for psycholinguistic data, i.e. express ranked preferences. We note attempts to account for this, e.g. the surprisal theory of Hale (2001), have been made, but not within the MP framework. We leave this important topic for future work.
9 For instance, a simple glance at the theoretical apparatus and structural descriptions suffices to confirm that the MP framework described in Chomsky (1995) differs substantially from Chomsky (2001), which, in turn, differs substantially from Chomsky (2013).
We also need to motivate the MP framework itself. For theoretical linguistics, this requires little justification: the goal of a universal theory built around binary Merge has resulted in a large body of work (since Government-Binding theories of the late seventies) that has contributed greatly to the understanding of language. However, there remains a substantial gap from theoretical to computer models. Müller (2015, 35) writes that there are no "large-scale computer implementations that incorporate insights from Mainstream Generative Grammar." We believe we have substantially narrowed that gap by clearly demonstrating that theoretical achievements can be implemented. Furthermore, Müller (2015, 37) writes that the system presented in Fong and Ginsburg (2012), which is similar to the system presented in this paper, "neither parses nor generates a single sentence from any natural language." We wish to clarify that we compute linguistic derivations which are spelled out as phrases and sentences of English.
We do not employ a phrase structure grammar-based formalism in this paper, choosing instead to implement devices described by theory directly. We are aware that there is a substantial body of work centered around the Minimalist Grammar formalism, e.g. Stabler (1997, 2011), including computational implementations such as Hale (2003), Harkema (2001), Torr, Stanojević, Steedman & Cohen (2019), and Indurkya (2021). Detailed comparison of our work with an equivalent Minimalist Grammar is a topic that is beyond the scope of this paper.10 There are also proposals for relative clauses in other linguistic frameworks such as Head-Driven Phrase Structure Grammar (HPSG), e.g. Sag (1997), that lend themselves to computational implementation, e.g. Müller (2015). Sag (1997) takes a construction-specific approach to relative clauses, slicing them up into a sort hierarchy.11 In Chomsky's MP, construction-specific rules are frowned upon: the goal is to reduce constructions to universal primitives. We also include key examples from Sag (1997) and show how they are handled in the MP framework.12
10 Minimalist Grammar (MG) is a grammar formalism that embraces strict feature checking to drive displacement (movement), i.e. Internal Merge, and selection (External Merge). Lexical entries embed a sequence of (possibly arbitrary) formal features to be applied in sequence to fix the correct word and hierarchical order (at the cost of requiring multiple lexical entries to account for different derivation paths). We limit the discussion here to the major points of departure from MG, including that displacement is not always feature-driven, the null hypothesis being that it is free (to take place or not). It is generally accepted that Merge is concerned only with hierarchy, i.e. encodes nothing about word order, e.g. Chomsky (2013). An MG account adds formal features and Merge operations to encode the correct word order. Each additional formal feature requires extraordinary justification due to evolutionary and acquisition burdens, criteria relevant to the notion of Genuine Explanation (Chomsky 2021). We finally note that theoretical linguistics has not adopted MG, and MG has not tracked the trajectory of recent theory, e.g. Labeling theory, and the fact that Internal Merge does not create copies.
11 HPSG uses feature structure inheritance and typing to factor out commonalities between sub-constructions. For example, subject relatives with overt wh-relatives are specified as wh-subj-rel-cl with constraints inherited from both general clause structure, viz. hd-subj-ph, and relatives in general, viz. wh-rel-cl. Similarly, object relatives, e.g. fin-wh-fill-rel-cl, inherit constraints from wh-rel-cl and general gap structures, viz. hd-fill-ph.
12 English relative clauses are a complex construction in the sense that there are many exceptions and restrictions, e.g. between subject and object relatives. Simplicity of theory and the elimination of redundancy are what both HPSG and Chomsky's MP aim for. Despite these common goals, it is perhaps telling that both frameworks are quite complicated. In fact, Sag (1997, 453-454) mentions "every constraint … is playing some role in this representation, making the combined effect … a rather intricate theorem."

The Minimalist Program
It is important to note that the Minimalist Program is a program of research inviting many different theories under the umbrella of eliminating complex operations in favor of the simplest possible operations that can be conceived (thereby contributing to evolutionary plausibility).
Our model follows much of the theory outlined in Chomsky (2000, 2001, 2008). We review the core assumptions and mechanisms below.
At the heart of the theory is binary set-Merge, the simplest possible operation taking two objects that, when iterated, creates hierarchical structure. Merge can result in either symmetric structure, resulting from set-Merge, or in asymmetric structure, resulting from pair-Merge (Chomsky 2000, Chomsky 2004). Pair-Merge is asymmetric in that one of the two merged objects in Pair-Merge is rendered inaccessible to further operations (not so in set-Merge). Set-Merge can be internal or external. Internal Merge (IM) encodes displacement from within a SO, and External Merge (EM) combines two distinct SOs, encoding argument structure. As a concrete example, EM applies to an object DP and a transitive verbal root V forming the set {V, DP}, followed by EM twice to form the theta configuration {DP, {v*, {V, DP}}}. v* is a verbalizer that licenses the outer DP subject. For unergatives, e.g. sleep, the equivalent configuration is {DP, {v, V}}, and v is the corresponding verbalizer.
In Chomsky (2000), agreement is implemented using (mostly) local c-command between a probe and a goal. A probe searches top-down into its c-command domain. T (tense), a probe, has unvalued phi-features (person, number, gender) that must match valued phi-features on the goal. For example, in {T, {DP, {v*, {V, DP}}}}, T finds the first DP, the subject. Similarly, unvalued phi-features on v*, a probe, match corresponding valued features on the DP object.
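The Merge and probe-goal mechanisms reviewed above can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the names Head, merge, search and agree are hypothetical, a 2-tuple stands in for the unordered set {a, b}, each DP is simplified to a single head, and "closest goal" is approximated by a breadth-first search of the probe's sister.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Head:
    """A lexical head (hypothetical encoding, for illustration only)."""
    name: str
    valued: dict = field(default_factory=dict)   # e.g. {"phi": "3sg"}
    unvalued: set = field(default_factory=set)   # e.g. {"phi"}


def merge(a, b):
    """Binary set-Merge; the tuple (a, b) stands in for the set {a, b}."""
    return (a, b)


def search(domain, feature):
    """Breadth-first search of `domain` for the closest head valued for `feature`."""
    queue = deque([domain])
    while queue:
        so = queue.popleft()
        if isinstance(so, Head):
            if feature in so.valued:
                return so
        else:
            queue.extend(so)  # descend one level into a phrase
    return None


def agree(probe, domain):
    """Value each unvalued feature of `probe` from the closest matching goal."""
    for feature in sorted(probe.unvalued):
        goal = search(domain, feature)
        if goal is not None:
            probe.valued[feature] = goal.valued[feature]
            probe.unvalued.discard(feature)


# Build {T, {DP, {v*, {V, DP}}}}, with each DP simplified to one head.
subj = Head("we", valued={"phi": "1pl"})
obj = Head("Mary", valued={"phi": "3sg"})
V = Head("love")
v_star = Head("v*", unvalued={"phi"})
T = Head("T", unvalued={"phi"})

VP = merge(V, obj)
agree(v_star, VP)                      # v* probes {V, DP} and finds the object
vP = merge(subj, merge(v_star, VP))
agree(T, vP)                           # T probes vP and finds the subject first
TP = merge(T, vP)

print(v_star.valued["phi"], T.valued["phi"])  # prints: 3sg 1pl
```

Under these simplifying assumptions, T is valued by the subject and v* by the object, mirroring the configuration {T, {DP, {v*, {V, DP}}}} discussed above.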
Implicit in this model is that features that remain unvalued will crash a derivation. In this paper, we use uF to represent unvalued F, F a feature: e.g., uT will be an unvalued T feature that needs to be valued in the course of a syntactic derivation.13 In our model, movement is generally considered to be feature driven.14 Heads may have an Edge Feature (EF) that permits movement to the edge of a phrase. For example, in (3)a, the EF on T results in movement of a subject from a v*P internal position to the edge of TP, and an EF on an interrogative complementizer, CQ, forces movement of a wh-phrase to C in (3)b (for visible wh-movement).
(3) a) I T[EF] read I v* the book.

b) What CQ[EF] did you read what?
We assume that theta-roles are associated with a determiner head D (rather than N).15 For example, when V Merges with an object DP, V assigns a theta-role to the DP, which lands on the D head of the object.
We also adopt Chomsky's Phase Theory (Chomsky 2001). Assuming the Phase Impenetrability Condition (PIC), once a phase is complete, constituents inside the complement of a phase head are invisible to operations (highlighted by underlining below). However, constituents displaced to the edge of a phase may be accessed outside of the phase. Phase heads are usually assumed to be transitive v* and C (possibly, also D). Thus v*P and CP (highlighted by boldface {..}) are the phases in (4)a. Cyclic movement must involve iterated displacement
Given the variety of possible feature mechanisms, one should seek to constrain the grammatical feature system that shapes syntactic derivations as much as possible, as suggested by Chomsky's MP.16 Formal language theory tells us that Turing-computability, i.e. arbitrarily powerful devices, can be built on formal features (see Black (1986) and Johnson (1988)). Use of formal features should be kept to a minimum, and conceptually unnecessary ones should be eliminated from the theory.17 Narrow syntax may make use of other features relevant to the interfaces. For example, we assume Q (question) and wh will be read at the semantic interface, and inflectional Case and phi-features will be read at Spell-Out. However, formal features, e.g. Edge, or its earlier incarnation, the EPP (Extended Projection Principle)18 or ECM (Exceptional Case Marking), are arguably fundamentally limited to Merge syntax, and therefore should be deleted prior to Interpretation and Externalization.19 The introduction of new features during the course of a derivation is also not permitted. Examples of such devices from the past include indices, as used in Binding theory, or the γ-feature, from the Barriers framework (Chomsky 1986).20
We posit, following fundamental MP assumptions, that there is a one-time selection of heads into a Lexical Array (LA) from the lexicon for Merge, but no explicit staging of features (as in MG, see note 10). In our implementation, the LA is ordered as a queue of heads (for input to Merge) purely for computational convenience.21
13 Approaches exist that rely on the idea that features are interpretable or uninterpretable, and valued or unvalued, so you can have unvalued interpretable features (cf. Pesetsky & Torrego 2007). However, since this notion of interpretability/valuation is not crucial to our analysis, we will simply assume that features can be either unvalued or valued, where unvalued features need to be checked and valued via agreement with matching valued features.
14 In more recent work, e.g. Chomsky (2013, 2015), IM is considered to be free, i.e. not feature-driven. In our system, we implement a small amount of free Merge in that a single input stream of lexical items can produce multiple structures.
15 Here, we assume the DP hypothesis (Abney 1987), which is relatively standard in syntactic theory. However, in some theories, theta-roles are associated with nouns and not determiners. In this case, the theta-role would be assigned to the N head and an argument would not be a DP, but rather an NP. See Bruening, Dinh, & Kim (2018) and Bruening (2020) for arguments that N, not D, is the head of a nominal argument.
16 Chomsky (2011; 2013) suggests that narrow syntax be reduced to general Merge, plus residual probe-goal for Agreement. Formal labeling of phrases is carried out at the interface, and therefore phrasal categories have no formal role in Merge. We note that in some other accounts, e.g. Epstein, Kitahara & Seely (2014), Merge may be conditioned on whether labeling obtains for a phrase.
17 We make substantial use of an Edge Feature (EF) in this paper. In MP development subsequent to our model, features such as Edge, which regulates the possibility of Merge to the periphery of a phrase, are no longer permitted (or relevant). Instead, movement, i.e. internal Merge, must be unrestricted, and therefore freely available, e.g. see Chomsky (2008) and much work thereafter.
18 This is basically the requirement that a clause have a subject (Chomsky 1981).
19 If present, any remaining unvalued formal features will crash the derivation.
20 Also, in some accounts, e.g. Müller (2011, 122), an Edge Feature (made optionally present) may trigger movement, e.g. for scrambling. Such a device would be ruled out on conceptual (not empirical) grounds.
21 For example, simplifying somewhat with respect to functional elements, in the case of a simple transitive sentence, the LA encodes object < verb < subject. In principle, if the LA is unordered, we can form from this LA sentences schematized as "subject verb object" and "object verb subject". The ordering is for convenience as we seek convergence on our intended sentence only.

The Basic Model
In this section, we briefly summarize the theoretical aspects underlying our account.
We implement a revised version of Gallego's (2006) relative clause analysis, which builds on work by Pesetsky & Torrego (2001) (henceforth: P&T). P&T propose that nominative case results from a checked uT feature (uT = uninterpretable T) on a head D. The head T locally c-commands the subject DP and checks the DP's uT feature, resulting in nominative case.
Embedded C also possesses a uT feature, which can be checked in two ways. One way is by raising T to the edge of C, so that T checks the uT on C. An alternative way is by raising the subject DP to the edge of CP, in which case the already checked uT on D of the subject checks uT on C. When checked by T, the uT feature on embedded C is pronounced as that (e.g., Mary thinks that Sue will buy the book; P&T 2001, 373). When a nominative subject raises to the edge of CP to check the uT feature on C (e.g., Mary thinks Sue will buy the book), there is no pronunciation of that.
Although in principle, two methods for checking the uT feature on C are available, P&T utilize economy to account for that-trace effects and the English subject/object wh-movement asymmetry.22 In principle, multiple Agree operations are possible in this system. Abstractly, in (5)a, uF1 and uF2 on X are checked by Y and W, respectively. In (5)b, Z checks both uF1 and uF2 at once instead. Economy dictates that we prefer a single operation (over multiple operations), and therefore, (5)b over (5)a. Note that the preference for a single Agree relation over a multiple Agree relation does not make any predictions about the presence of that in cases where both are possible, as in (6)a.
Note that P&T assume that in constructions in which there is wh-movement out of an embedded clause, there is a uWh feature in a non-interrogative C that hosts a wh-phrase. The uT feature on C can be checked by T, (6)b, in which case that is pronounced. The uT feature can also be checked by the subject, (6), in which case that is not pronounced. Therefore, that is optional, and there is no preference for or against that. Gallego extends P&T's proposal to relative clauses, i.e. that that is the pronunciation of uT in C. In addition, Gallego assumes that a relative clause C has a uRel feature that is checked by a relative DP containing a corresponding iRel (i = interpretable) feature. Economy, following P&T, comes into play if both uT and uRel on C can be checked by a single goal.
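The economy reasoning above can be made concrete with a small Python sketch. This is an illustrative assumption rather than P&T's or Gallego's own formalization: each candidate goal is mapped to the set of features on C that it is able to check, the function names are hypothetical, and economy is modeled as preferring assignments that use the fewest distinct goals, i.e. the fewest Agree relations.

```python
from itertools import product


def checking_options(unvalued, goals):
    """List every assignment of a checking goal to each unvalued feature on C.

    `goals` maps a goal's name to the set of features on C it can check.
    """
    per_feature = [[g for g, feats in goals.items() if f in feats] for f in unvalued]
    return [dict(zip(unvalued, combo)) for combo in product(*per_feature)]


def most_economical(options):
    """Keep only the assignments that use the fewest distinct goals."""
    if not options:
        return []
    fewest = min(len(set(o.values())) for o in options)
    return [o for o in options if len(set(o.values())) == fewest]


def pronounces_that(option):
    """Following P&T, `that` is pronounced only when T itself checks uT on C."""
    return option["uT"] == "T"


# Embedded declarative: either T or the nominative subject can check uT on C.
declarative = most_economical(
    checking_options(["uT"], {"T": {"uT"}, "subject": {"uT"}})
)

# Subject relative: nominative `who` can check both uT and uRel at once.
subject_relative = most_economical(
    checking_options(["uT", "uRel"], {"T": {"uT"}, "who": {"uT", "uRel"}})
)

print([pronounces_that(o) for o in declarative])       # prints: [True, False]
print([pronounces_that(o) for o in subject_relative])  # prints: [False]
```

Under these assumptions both options survive for the embedded declarative, so that is optional, whereas only the single-goal option survives for the subject relative, so that is not pronounced.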
Gallego's analysis is noteworthy in that it attempts to provide a unified account of relative clause formation and the distribution of which/who(m)/that/Ø.
According to Gallego, in (7)a, who man originates in the subject position of the clause, from where it moves to the relative CP edge, followed by further movement of man to a higher position in the CP. Assume the subject relative D who contains both an iRel feature and nominative case. Applying economy, who checks both uT and uRel on C via a single Agree relation, as shown in (7)b. There is no pronunciation of that, crucially because who, not T, checks uT on C. (Gallego proposes that uT and uRel have EPP (Extended Projection Principle) sub-features that force who man to raise to the edge of the CP, a complication that we do not adopt.) Gallego also proposes that there is an extra projection, referred to as cP, in the left periphery that "introduces a subject of predication" (Gallego 2006, 157). This c has uPhi features with EPP sub-features. The uPhi features probe for matching interpretable phi-features, and find iPhi on man in (7)a-b. The EPP sub-feature of uPhi forces man to raise to the edge of the cP. Example (7)a with that is blocked by economy: pronunciation of that would require uT on C to be checked by T, and uRel on C to be checked by who, separately. However, economy dictates who man checks both uT and uRel features on C simultaneously.
In (8)a, Gallego assumes the subject boy contains a null D, and, following Chomsky (2001), that a null D must remain in situ. Furthermore, the uRel on C (conveniently) lacks an EPP sub-feature so that uRel on C is checked by iRel on null D without triggering movement. The uT's EPP sub-feature then causes T to raise to C, and be pronounced as that. (Note that this analysis also implies that the relative DP boy does not move to Spec-T either, as it has a
(Adapted from Gallego 2006, 158)
This analysis has two potential problems. First, it requires a stipulation that a relative DP with a null D cannot move. Furthermore, there is lexical proliferation for C containing uRel. As uRel typically has an EPP sub-feature, this must be modified for relatives containing a null D, as they do not move in Gallego's account. Therefore, C with uRel must come in two versions: one with an EPP sub-feature (as in (7)b) and one without an EPP sub-feature (as in (8)b).
Gallego's analysis is also unable to account for the ill-formedness of (9)a; it has difficulty accounting for doubly filled COMP effects. As the structure (9)b indicates, T should be able to check uT on C, resulting in pronunciation of that. Because the relative DP is an object, it does not have nominative case, and thus it is unable to check uT on C. Hence, that must be pronounced, as in (9)a. But (9)a is ungrammatical in standard English.
(9) a) *The car which that John sold. (Gallego 2006, 160)
We adopt a modified version of Gallego's core proposals about uT feature checking on a relative C and about economy. While we follow Gallego's insights regarding uT and uRel feature checking, we omit the extra cP projection and we do not utilize EPP sub-features. We are able to account for the distribution of the relative D and a noun in the examples above, including the ban on *who that in (7). Also contrary to Gallego, we have no stipulation that a DP with a null D is unable to move. Our analysis, to be discussed below, is able to account for the ill-formedness of (9)a, and we also extend this analysis to account for headless relatives (which Gallego does not investigate) and genitive relatives, as well as other related relative clause types. Note that we assume the judgments of standard English about relative clauses; in particular, no doubly filled COMP. However, there are varieties of English that permit a doubly filled COMP, suggesting there may be some dialectal variation in whether or not a relative D can check a uT feature. We discuss this in detail in Section 5. Suppose all relative D heads check a uRel feature. We assume there is variation in whether or not a relative D may also check a uT feature on C. When relative D can check both a uRel and uT feature, pronunciation of that (possible only when uT on C is checked by T) is blocked due to economy. Only when relative D is unable to check uT on C can T (raise and) check uT on C, and that can be pronounced.
Merge with a determiner D. We assume all (and only) relative Ds lack the ability to check uD.
Thus, boy may subsequently emerge from sentential structure to head a new noun phrase, and its unchecked D-feature will be checked by the (regular) determiner the, as shown in Figure 3. 24 As only the highest copy may be pronounced, vertical lines in Figure 3 are used to pick out possible pronounced elements of the frontier of the structure. We put aside implementation of a theory of inserting lexical items into a lexical array for future work. 25 We assume Crel has no spellout in English. That, as in the object relative construction the story that the boy told, is derived via T to C movement, following P&T. See the following sections for details. N then gets its uD feature checked by a higher D in a regular sentence. The feature checking details regarding uD, uT and uRel are summarized in Table 2. Relativization productively occurs with external and internal arguments, and even oblique, i.e. non-core, arguments, as will be illustrated in the following sections. In each case, the mechanism is the same, i.e. a relative determiner (Drel) heading the argument to be relativized is attracted by a relative complementizer (Crel). There is no theta role clash (or theta criterion violation) with this movement-based account as we assume theta roles are hosted in D.27 In Figure 3, external the (not boy) bears the theta role assigned to the entire DP when it is Merged as an argument in a higher clause, and the lowest copy of whorel (not boy) bears the theta role assigned to the external argument of tell. As the and whorel are distinct, there is no theta problem.

We next summarize the algorithm used to derive the structure in Figure 1 above. Assume all sentences involve a selection of heads from the Lexicon to feed Merge. A head may bear both formal features, e.g. D and Case on nominals (discussed above), and unvalued phi-features (person, number in English) on T and v*. Unvalued formal features must be valued in the course of a derivation (or else the derivation will not converge). Heads may also bear interpretable features, e.g. Q on wh-words and intrinsic phi-features on nominals, e.g. 1st-person-singular on I/me. A head that probes for matching values gets only one opportunity to value its unvalued formal features, viz. when it is first Merged. If a head H Merges with a phrase YP, as in HP = {H, YP}, YP is the c-command search domain for H's formal features.
26 We suggest the entire pair-Merged structure raises like a head; we cannot have *book that on syntax I read. A reviewer asks about how labeling occurs in an example such as the destruction of the city that led to the collapse of the empire surprised the traders. Just as with the VP destroy the city, destroy cannot separate (by raising) from its argument to form an entirely new phrase. The same applies to the derived nominal form destruction: we cannot form *the destruction that of the city led to the collapse of the empire. The entire head with object the city must raise and relabel. (In terms of labeling theory, left to future work, both book on syntax and destruction of the city must be labeled as N.)
27 Alternatively, assume that a nominal consists of a root R that is categorized by a N categorizer, i.e. {N, R} (cf. Marantz 1997, Chomsky 2013, 2015 among others), and that R can raise independently and be categorized anew. In the boy who told the story, the root boy alone can raise and relabel the clause, as shown in (i). With two Ns, theta-roles can be assigned independently, without violating the Theta Criterion.
Once HP is Merged with another head or phrase, H is inactivated as a probe and cannot search again.As this policy is strict, leftover unvalued features on H will crash the computation, and no convergent structure will be produced.
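The probe-once policy can be illustrated with a minimal Python sketch. The names Head, Crash and first_merge, the use of plain strings for features, and the flattening of the c-command domain into a list of goals are all illustrative assumptions; this is not the authors' implementation.

```python
class Crash(Exception):
    """Signals a non-convergent derivation."""


class Head:
    def __init__(self, name, unvalued=(), valued=()):
        self.name = name
        self.unvalued = set(unvalued)   # formal features still to be valued
        self.valued = set(valued)       # features this head can supply as a goal
        self.active = True              # True until the head has probed once


def first_merge(head, domain):
    """Merge `head` with a phrase and let it probe exactly once.

    `domain` is a simplified, hypothetical stand-in for the c-command
    domain YP: a flat list of Head objects.
    """
    if not head.active:
        raise Crash(f"{head.name} is inactive and cannot probe again")
    for goal in domain:
        head.unvalued -= goal.valued    # matching valued features value the probe
    head.active = False                 # strict policy: no second chances
    if head.unvalued:
        raise Crash(f"{head.name} has leftover unvalued features "
                    f"{sorted(head.unvalued)}")
    return head


# Crel probes once and finds who_rel, which supplies both T and Rel.
c_rel = first_merge(Head("C_rel", unvalued={"T", "Rel"}),
                    [Head("who_rel", valued={"T", "Rel"})])
print(c_rel.active, c_rel.unvalued)
```

In this sketch a second call to first_merge on the same head, or a probe that leaves any feature unvalued, raises Crash, mirroring the strictness of the policy described above.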
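As the algorithm selects only the first head of a sequence for Merge, heads selected from the Lexicon are sequenced precisely for proper assembly. 28 For (10)a, the sequence of heads that derives the structure shown in Figure 1 is given in (12)a.
The system that we implemented is based on the feature-driven Merge model of Chomsky (2000, 2001, 2008, and other work). 32 In this work, Chomsky assumes there is a one-time selection of items from the Lexicon to form a Lexical Array (LA). Merge of Lexical Items is recursively applied to form an aggregate SO. An SO can be selected and Merged from the Lexical Array (External Merge) or it can be Merged from within the current SO; this is the process of movement (Internal Merge). The term Workspace refers to the LA and SO at any given stage. For a convergent derivation, the Workspace must consist solely of a single SO, with formal features eliminated. Any remaining uninterpretable features in the SO, or leftover LA items, will crash the derivation.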
Multiple threads of derivation are in principle possible if there are multiple possible operations, i.e. choice points, at any given point in the derivation. An example of a theoretical choice point that we use is the possibility of uT on C being checked either by movement of T (resulting in pronunciation of that) or by nominative case on the subject (in which case that is not pronounced). In such cases, e.g. "the man (that) John saw", the model correctly generates two different structures starting from the same LA. Another linguistic choice point permits the option of pied-piping for cases like "the man to who/whom I talked" and "the man who/whom I talked to". (Note that we assume that who and whom are inflectional variants of the same word who.) Again, two different structures will be generated from the same LA. The model we describe has only linguistic choice points predicted by the theory; there are no temporary ambiguities attributable solely to the algorithm or data structures required. In this sense, our model is maximally efficient with respect to the theory. Assuming we begin, as does Chomsky, with a one-time LA, our LA is selected in order, purely for computational efficiency. 33 As the LA is ordered as a queue, the current SO has the choice of External Merge with the first item in the sequence, or, ignoring the LA, the choice of Internal Merge, i.e. selecting a sub-SO from within itself. Based on the current SO and the head currently first in line in the LA, our machine will correctly select the right operation one step at a time to converge on the intended SO. (In the case of non-convergence, the machine will end up in a state with no possible continuation; call this a crash.) Lexical items selected from the LA may have unvalued and valued features. In the case of an LA head with an unvalued feature, when it is first Merged to the current SO, it must probe the existing SO for a matching valued feature. For efficiency, we assume all required probing for valued features can be accomplished during this first Merge time; i.e. no second chances are permitted nor needed. 34
Our model also incorporates an operation of Last Resort that enables an unlicensed relative head to move to the edge of a phase. If heads with remaining unvalued features are not to crash the derivation, the phrases they head must move to the edge of the Phase to keep the derivation going. This Last Resort operation happens automatically. A general remark about feature-driven movement is in order at this point: a head with an EF licenses movement to its phrase edge. Without it, movement is not permitted. Hence, for Last Resort to operate, either we must assume all Phases have an optional EF, or movement is generally licensed to all Phase edges. 35
The model that we created is complete in the sense that all convergent derivations are grammatical and all grammatical sentences in this paper are generated. For a summary of the basic operations of our model, please see the Appendix.
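The idea that one LA can yield several convergent derivations only at linguistic choice points can be sketched as a small enumeration in Python. This is a toy illustration under stated assumptions: the LA ordering for the man (that) John saw, the CHOICE_POINTS table, and the operation labels are hypothetical, and Internal Merge, probing and Last Resort are deliberately left out.

```python
# Hypothetical LA queue for "the man (that) John saw", patterned on (12)b.
LA = ("man", "D_rel", "see", "v*", "John", "Tpast", "C_rel", "the")

# Heads that open a linguistic choice point, with their alternative operations.
# Every other head has exactly one continuation (plain External Merge).
CHOICE_POINTS = {
    "C_rel": [
        "External Merge C_rel; uT valued by T-to-C (that pronounced)",
        "External Merge C_rel; uT valued by nominative subject (that silent)",
    ],
}


def derivations(la, trace=()):
    """Return every operation trace that consumes the whole LA queue.

    A state is (remaining LA, operations so far). An empty option list for
    a head would mean no continuation, i.e. a crash, and yields no trace.
    """
    if not la:
        return [trace]
    head, rest = la[0], la[1:]
    results = []
    for option in CHOICE_POINTS.get(head, [f"External Merge {head}"]):
        results.extend(derivations(rest, trace + (option,)))
    return results


traces = derivations(LA)
for t in traces:
    print(t[6])
```

The two traces differ only at the Crel step, which corresponds to the that/Ø alternation; the rest of the derivation is deterministic in this sketch.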

Derivations of Basic Relative Clauses
Our model crucially accounts for the (im)possibility of that in a variety of relative clauses.
Consider (13), in which that can be either pronounced or unpronounced.
33 With respect to LA ordering, see also footnote 21.
34 Heads may probe just once, when the head is first merged with an existing SO. Heads are not re-visited for probing once merged. As probes cannot re-try searching for goals in this theory, once an SO has been built, no probe operations can take place inside the SO. Single-probing is efficient because there is never any need to search the SO for probes.
35 An issue is whether the model should require movement to always be triggered by a feature or whether movement should be freely available (without reference to features) in limited cases. The latter option may result in needless overgeneration, i.e. be less efficient, but conceptually it is simpler (and evolutionarily more plausible).
(13) the book that/Ø I read (Gallego 2006, 151)
Snapshots of the derivation of (13) with that are shown in Figure 4. In Figure 4a, TP is the current SO, and Crel is about to be Merged from the LA. In Figure 4b, Crel has been Merged, and its unvalued features uRel and uT have been checked by Drel and T, respectively. uT on Crel, checked by T, results in pronunciation of that. As Crel possesses an EF, the DP {Drel, book} raises to the edge of CP. We assume Drel cannot check the D feature on the noun (in this case,
The corresponding derivation of (13) without that is given in Figure 5; in this case, the option of uT being checked by the subject (instead of T) is taken, so that is not pronounced.
The remainder of the derivation, illustrated in Figure 5a and Figure 5b, is the same as in the case with that, described earlier. We assume talk to is a verb-particle construction, where to is a particle and Case is valued by v*. 36 In this case, pied-piping is generally blocked, as *the man to that I talked and *the man to I talked are ill-formed. One explanation for the lack of pied-piping here is that to followed by an empty category disallows pied-piping (Chomsky 2001, 28).
(14) the man that/Ø I talked to
Figure 6 Derivation of the man that I talked to
36 Clearly v* + to assigns case. For simplicity of implementation we assume that v* assigns Case without to mediating it.
generally cannot check uT on Crel. Then the only available option is for T to raise to check uT on C, obligatorily pronounced as that in (15)a, and illustrated in Figure 8. 37 The covert/overt distinction neatly divides uT (on Crel) valuation; in short, covert Drel cannot check uT and overt Drel can. Thus, the wh-relative counterpart of (15)b, i.e. the boy who called Mary, is available.
37 For P&T, nominative case is a checked T feature on an argument, and thus an argument with nominative case can check a uT feature. However, if certain relative D heads can check uT, regardless of whether or not they have nominative case, then the checked T feature is not necessarily associated with just nominative case. One possibility is that this T feature-checking ability is associated with case in general, not just nominative case. We leave this issue for further investigation.
Drel boy raises from the subject of the embedded verb call out to the matrix CP. Since it passes through the edge of the embedded CP, there is no violation of the PIC. Drel will check the uRel feature on Crel at the matrix CP. However, in our theory, the uT feature on Crel cannot be checked by Drel, leaving it to be checked either by movement of the matrix subject to C, as in Figure 9, without the higher that, or by T-to-C, pronounced as the higher that, as in Figure 10. In the case of the embedded (non-relative) C, viz. Ce, the lower that is predicted to be obligatory because Drel from Drel boy cannot check Ce's uT feature as it passes through the edge of embedded CP.
One of the authors of this paper finds (16)b, which lacks that in the embedded clause, to be ill-
c) The man who John saw
d) *the man who that John saw (Gallego 2006, 154)
e) the man who loves Mary
f) *the man who that loves Mary (Gallego 2006, 151)
Our proposal is that whichrel and whorel are relative Ds that may value uT on Crel. 38 Economy then forces uT on Crel to always be checked by a relative wh-determiner when present.
This is summarized in Table 3, which states that it is more economical for a single goal to value multiple uFs on a probe than it is for multiple goals to value the uFs.Basically, the fewer Agree operations required, the better.
Suppose distinct goals G1,..,Gm (m ≤ n) suffice to value F1 through Fn. A derivation with mmin, the fewest number of goals required, blocks all derivations with m > mmin goals. 39
Table 3: Economy
In the derivation of (17
38 Consider (i)-(iii). P&T account for (i) and (ii) by economy. They assume that a non-interrogative embedded C has a uWh feature when there is a wh-phrase contained within. A subject wh-phrase will check both the uT and uWh features on embedded C (resulting in (i)). Economy therefore blocks T from checking just one feature, viz. uT on embedded C, and that is not permitted, i.e. (ii) is ruled out. We have argued that the relative which can check a uT feature on relative C. Suppose which book could also check the uT on embedded C, then by economy, that should not be possible in (iii), contrary to fact. We propose, however, that the which in which person does not have the properties of a relative D in that it cannot check a uT feature. Thus, the uT on the embedded C must be checked either by movement of a subject, in which case that is not pronounced, or by movement of T, in which case that is pronounced.
(i) Which person did Mary say bought the book?
(ii) *Which person did Mary say that bought the book?
(iii) Which book did Mary say (that) John bought?
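The economy condition of Table 3 (fewest goals wins) can be sketched as a small set-cover search in Python. The function name economical_valuations, the goal names, and the feature sets assigned to each goal are assumptions made for illustration; they encode the claims in the text that an overt relative D can value both uT and uRel, whereas null Drel values only uRel.

```python
from itertools import combinations


def economical_valuations(unvalued, goals):
    """Return every goal combination of minimal size that values all features.

    unvalued: set of unvalued features on the probe, e.g. {"T", "Rel"}
    goals:    dict mapping a goal name to the set of features it can value
    """
    names = sorted(goals)
    for m in range(1, len(unvalued) + 1):        # m <= n, smallest m first
        winners = [
            combo for combo in combinations(names, m)
            if unvalued <= set().union(*(goals[g] for g in combo))
        ]
        if winners:                              # this m is m_min: larger m is blocked
            return winners
    return []                                    # nothing values the probe: crash


# Overt relative D (hypothetical feature sets): one Agree relation suffices.
overt = {"who_rel": {"T", "Rel"}, "T": {"T"}, "subject_nom": {"T"}}
# Null relative D in an object relative: two goals are needed either way.
null = {"D_rel_null": {"Rel"}, "T": {"T"}, "subject_nom": {"T"}}

print(economical_valuations({"T", "Rel"}, overt))
print(economical_valuations({"T", "Rel"}, null))
```

With the overt relative D, the single-goal valuation blocks any derivation in which T separately checks uT, which is the *who that pattern; with null Drel, both two-goal options survive, matching the that/Ø alternation in object relatives.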
39 This leaves open, in principle, the possibility of there being simultaneous derivations with mmin. In the cases of uT valuation explored in this paper, this theoretical possibility does not occur.
The derivation of (18)a is given in Figure 14. Note we assume whorel may also be pronounced as whom at Spell-Out. For some speakers, the form of whorel can be sensitive to Case. For example, whom = who+Accusative. Crel agrees with the relative DP headed by whorel, and uT and uRel on Crel are simultaneously valued. Economy blocks the option of T separately checking uT on Crel, and (18)c is ruled out. The EF of Crel attracts the relative DP to the edge of CP, leaving to stranded.
40 A reviewer points out a contrast between the use of who versus whom, from Radford (1997: 141-142).
(i) a) *Whom were you talking to? b) To whom were you talking?
Note that one of the authors finds ia) to be better than ib) and the other author finds ib) to be better. (Both authors are native speakers of English.) As this is clearly a property of dialect, externalization is involved, and Spellout may be sensitive to pied-piping (especially as the pied-piped preposition will be adjacent to who/whom). The pronunciation of who/whom is not a property of Narrow Syntax.
Wh-adverbials such as when can also be relativized, as in (19)a-b. Note that when and where are adverbials, not determiners. We must extend the rel feature to wh-adverbials, i.e.
whenrel and whererel exist in the Lexicon. This raises a possible acquisition question as not all determiners have a relative counterpart. 41
41 For example, although there is no therel in modern English, the Old English demonstrative se (sometimes translated as the, cf. van Gelderen 2014, 64, 128) can function as a relative determiner (cf. Ringe and Taylor 2014, 444, 447).
As whenrel has an iRel feature and can check uT on Crel, our model straightforwardly accounts for (19)a, as shown in Figure 16. We assume that whenrel time initially adjoins at the TP level (as it is a temporal modifier). 42 Furthermore, we assume that whenrel checks both uT and uRel simultaneously on Crel, so economy blocks (19)b with that. Finally, the relative wh-adverbial whenrel cannot value the uD of time (as is the case with all relative Ds), hence time raises and its uD is valued via Merge with external the. 43 Other similar examples can be found in the Appendix. 44

Figure 16 Derivation of the time when I got drunk
42 A reviewer wonders if adverbs can generally take NP complements.We assume all wh-adverbs may take an unpronounced in-situ NP complement.Radford (2016, 423) observes that "when/where/why have the property that they cannot have an overt nominal complement at PF." 43 For further details of this derivation see the Appendix (Example 11).Note that our model tries Merging the adverbial whenrel time using both pair-Merge and set-Merge.Normally, an adjunct would be Merged via pair-Merge only.However, pair-Merge fails because extraction is impossible, assuming that a pair-Merged element is invisible to extraction.As a result, the set-Merge option is required. 44A reviewer wonders how we can account for adjectival relatives such as the following: (i) John will be [however helpful you are willing to be].Notably, the head of the relative is an adjective, so this could involve relabeling by the adjective helpful which then Merges with the adverbial however.This is an interesting type of example, which we must leave for future work.
Relatives can also occur in non-finite clauses, as in (20)a-b. 45Note that Sag (1997) indicates (20)b as being ill-formed; however, we find it grammatical.Pied-piping seems subject to dialectal variation. 46 (20) a) the baker in whom to place your trust (Sag 1997, 461) b) the baker whom to place your trust in (Sag 1997, 461 -marked as * by Sag) Relevant structures for (20)a-b are given in Figure 17.Note that we employ a non-finite T (Tinf) and a null subject (PRO).We assume a dyadic in that takes complement and specifier arguments.
Figure 17 Derivation of the baker in who to place your trust/the baker who to place your trust in 45 We thank a reviewer for questioning how our model can account for this type of example.
46 A reviewer notes that, in British English, some prefer stranding with who and not whom.Another type of non-finite relative clause can occur with an optional for as in (21)a-b.
(21) a) the person to visit b) the person for us to visit (Sag 1997, 464) Assume that Tinf (as with tensed T) and for are both capable of checking the uT feature on Crel.
For (21)a, as shown in Figure 18     whom occasionally used in some varieties of English. 48Relative D which can be used with non-human NPs.A reviewer notes nothing in our syntactic analysis blocks *the man which arrived.We assume semantic feature matching for determiner-noun combinations is also involved, e.g.-human for which and +human for who.We assume that the appropriate relative pronoun is selected from the lexicon, so whichrel occurs with a non-human relative noun and whorel occurs with a human relative noun.(But see also note 3.) Otherwise, these relative Ds are identical.We have also seen that what can be used with a null NP complement.
48 Radford (2019, 32) writes that "relative whom/whose have largely fallen out of use in contemporary colloquial English."But we note that this does not mean that whom is never used.See Radford (2019, 32-33) and references cited therein for discussion of use of whom in modern English.4 Comparative and Genitive Relatives Hale (2003) implemented a Minimalist Grammar that covers a variety of relative clauses from Keenan & Hawkins (1987) involving subject and object relatives, passivization, comparatives, and genitives.Although Hale covers a wide range of relative clause constructions, the reasons for the uses of the relative determiners which, who, what, and for restrictions on their use with that are not accounted for.Our model also accounts for all of the relative clause examples from Keenan (1987, 63), notably genitive relatives, as well as other related constructions.
The examples in (24) from Keenan & Hawkins (1987, 63) are essentially identical to examples that we have discussed earlier (Hale models (24)a, b, c, and a version of d). 49(24)a is a subject relative (see Figure 8 above), ( 24)b-c are object relatives (see Figure 11 and Figure 12).( 24)d-e contain relative nouns that originate as the object of a preposition (as in Figure 14 and Figure 15). 49Instead of (24)d, Hale (2003:98) lists "the box which Pat brought with apples in", which differs from the original Keenan and Hawkins example in that apples is contained within a PP.Some find these examples marginal or unacceptable.We modeled the original example on the assumption it is broadly acceptable. 50Although not important for our analysis of relative clauses, in (24)b we assume that the adverbial yesterday is a DP that is Merged at the TP level.Note that yesterday can behave as a nominal, as in (i-ii).We adopt Larson's (1985) view that the adverbial yesterday is really an NP with inherent case (Larson's proposal is that if Case isn't checked, a default case can be assigned to certain temporal NPs).
(i) yesterday's refusal (Larson 1985:598) (ii) Yesterday was a great day.We also follow Haumann's (2007) view that temporal adverbials like yesterday are outside the vP.The illformedness of (iii), in which yesterday occurs in a TP-internal position, can be accounted for if yesterday is Merged at the TP level.
(iii) *Illicit smokers were yesterday fined for taking a puff.(Haumann 2007, 265) Similarly, in the following examples adapted from van Gelderen (2013, 127), yesterday (van Gelderen uses last week instead of yesterday), is ill-formed in a TP-internal position, but fine in other positions, that are not necessarily TP-internal.
(iv) a) They were happy yesterday. b) Yesterday, they were very happy. c) *They were yesterday very happy.
See the Appendix for the complete derivation of (24)b. 51 The passive (16e), as shown in the Appendix, is formed with a v~ and a participle Prt. The v~ is a verbalizing head (Deal 2009, Sobin 2014). Both Prt and v~ have EF subfeatures that force remerge of the relative DP.
We next discuss some examples from Keenan that require some revisions to our core model. 52  In example (25) below, boy originates from a relative DP object of the comparative than. 53  (25) the boy who Mike writes better than (Keenan & Hawkins 1987, 63) 26) and ( 27), but not the other types of examples. 53It is crucial for our analysis that than has a relative clause DP complement (or that a relative DP occurs within the complement of than) in this example.Our analysis may not extend further to other constructions with than.One possibility is that than can be a P, following Hankamer (1973) and Chomsky (1977).Figure 23 analyzes (26) as follows: the possessive subject DP whorel girl 's friend raises from the edge of v*P to the surface subject position at the edge of TP.This is followed by raising to the edge of CrelP (whorel checks uRel on Crel).The head girl raises and relabels the structure.
We need to assume that although whorel is embedded in the specifier of the possessive DP, its Rel feature is visible to Crel and Agree(whorel, Crel) results in raising of the entire DP. 54['s friends] can't be substituted.We assume 0rel + 's is not a possible English word because 's is an affix and it has to affix to an overt word (see Radford 2016, 405-406).We assume that affixation happens at Spell-Out and is not a syntactic process.It requires an adjacent host with phonological content, so drel does not qualify as a host.In this case, that is not permitted because that-relative formation requires separate raising of T to C (to check uT on C) and is blocked by economy (following P&T).
55 Our model also predicts that T can check a uT feature on Crel, resulting in the man whose house that Patrick bought, an example that seems well-formed to us.See the Appendix for the complete derivation.The structure of (29)a is shown in Figure 25.

Figure 25 Derivation of the girl who friends of bought the cake
The relative DP whorel girl originates in a PP of-phrase that is the complement of friends, embedded within the subject.After the subject moves to the TP edge, the relative DP whorel girl moves to the CrelP edge to check the uRel feature, and head girl moves out and relabels.
Example (30)a is similar except that it involves a relative object, and (31)a involves a passivized relative object.
We next turn to deeply embedded genitive relatives, such as in (32).
(32) Give me the phone number of the person whose mother's friend's sister's dog's appearance had offended the audience.(Sag 1997, 450) The structure of the relative DP whose person 's mother's friend's sister's dog's appearance is shown in Figure 26.In our implementation, we must assume that the relative feature of deeply embedded whorel percolates up onto the highest 's, so Crel is able to attract the entire DP to check its uRel feature.

Other varieties of English
We have focused on an analysis that accounts for the structures of basic relative clauses in modern standard English.Although the relative clauses' heads in (33)a-c are unavailable in standard modern English, they do occur in older stages of English and in some modern dialects.
(39) Ain't [nobody know about no club] (Labov 1972, 188). 59
(40) The [man saw John] went to the store. (Sistrunk 2012, 5)
In our analysis so far, we have assumed the Lexicon contains whorel, whichrel, whenrel, and whatrel, all of which are able to value uT on Crel. It is certainly plausible that this property could vary over individual lexical items. Thus, in certain varieties of English, whichrel and whorel may lack this ability, thereby permitting uT on Crel to be checked separately by T or nominative Case on a subject, crucially licensing the pronunciation of that in the former case. Another point of variation concerns null Drel; we proposed earlier that null Drel, unlike overt Drel, is unable to value uT on Crel, but it appears that in some dialects null Drel may behave like overt Drel, permitting zero subject relatives, e.g. as in Belfast and African American English. To summarize, a relative D must contain a core Rel feature (by definition). However, the ability to value uT on Crel may vary diachronically and/or synchronically. This minor change in lexical feature specification can account for the data described above and is not a problem for our computer implementation in principle.

Conclusion
We have built our theory and verified implementation based on the insights of Gallego's (2006) analysis of relative clauses. Gallego, in turn, has built on the insights of P&T (2001). The fact that this refinement is possible is a sign that the Minimalist Program is a viable research program. Our account makes use of a relative complementizer (Crel) with separate unvalued Rel (relative) and T (Tense) features. Rel is a construction-specific formal feature, distinguishing relatives from normal clauses. Rel and T together are subject to economy considerations, i.e. simultaneous valuation (where possible). Our verified analyses improve upon Gallego in the following ways: a) there is no need for an extra projection in the left periphery, b) there is no stipulation that a null D cannot move, c) there is no need for two types of C, one with an EPP feature, and one without, and d) we are able to account for the absence of which that in standard English. The additional stipulations of our model are that: i) a relative D cannot check a uD feature on N, in order to trigger extraction of the relative noun for relabeling, and ii) the null Drel cannot generally check a uT feature, although other relative Ds, such as whichrel/whorel/whomrel/whatrel/whenrel, can. A natural question arises: is our three-feature system, Rel, T and D, minimal, i.e., parameter-efficient? As, in our theory, movement is feature-driven, both EF and something akin to our unvalued D are required to initiate raising and relabeling of the relative clause into a nominal. Finally, a minimum of two features, such as Rel and T, is needed in order to exploit economy. Economy simplifies operational complexity, enabling multiple features to be valued in one operation. More broadly, in the MP framework, the functional category T selects for verbal phrase structure and further projects phrase structure (with a surface subject position). 60 In Chomsky (2008), non-selectional properties of T, e.g. phi-features, the ability to value nominative Case and Tense, do not appear in T's lexical entry, but instead are transmitted from phase head C.
59 Labov (1972, 188) gives (i), with a zero subject relative, as a possible underlying structure for this example. However, another possible structure given by Labov is (ii) which does not contain a relative clause.
(i) (It) ain't nobody (that) know about no club.
(ii) Nobody ain't know about no club.
Overall, we have developed a detailed and logically consistent feature-driven theory of English relative clauses in the MP framework.We have also built a computer-implemented derivational system capable of converging on the correct analyses starting from an initial LA queue.The implementation confirms that our theory is both complete and detailed enough to We summarize the core relative D facts in Table 4. 60 It is unclear to us whether the Edge feature of T (the requirement for subjects in English, also known as the EPP in earlier theories) should also be inherited from C. 61 We note that a machine learning approach has been taken in predicting the correct stack operation to take in transition-based dependency parsing, Nivre (2003), and in subsequent large-scale models, e.g.Andor et al. (2016).This is reminiscent of selecting the correct Merge operation to perform in our system.Finally, we believe our analysis can be extended to account for data in other dialects and languages, assuming limited variation in determiner heads with respect to the ability to value uT on Crel.There are also a variety of other relative clause types that remain for future work. 62
(7) a) the man who loves Mary/*the man who that loves Mary b) [DP the [cP manj c[uPhi,EPP] [CP [DP who manj]i C[uT, EPP] [uRel,EPP] [DP who manj]i T [DP who manj]i loves Mary]]]]] (Adapted from Gallego 2006, 156) An issue is that Gallego's analysis requires an extra cP projection in the left periphery.It is also not entirely clear why c can attract N from inside of a relative D and not the head of the DP itself, viz.D.

Figure 2 Relative clause structure for the boy who told the story
unable to check uD)
uT: checked by T (pronounced as that), by nominative case (which is a form of T), or by certain relative Ds
uRel on Crel: checked by iRel of Drel
(12) a) [story, the, tell, v*, [the, boy], Tpast, C]
b) [story, the, tell, v*, [whorel, boy], Tpast, Crel, the]
Sequence (12)a is read from left to right with the algorithm selecting the appropriate Merge action based on the current state, i.e. the SO (Syntactic Object) constructed so far and the first input head. 29 In most cases, there will be only one possible Merge action per state. Nondeterminism, i.e. more than one possible Merge action, is limited solely to linguistic choice points; e.g. the option to pied-pipe a preposition with a DP in English or the T-to-C option described in this paper, both producing derivations that separately converge. 30 Let us sketch the steps for Figure 1:
(i) Merge combines story and the, forming a DP;
(ii) tell, the next head in the list, Merges with the DP formed in (i), and we obtain {tell, {the, story}} (a VP);
(iii) the next head, v*, Merges with the VP from (ii), forming {v*, VP};
(iv) the sub-list [the, boy] initiates a sub-computation producing {the, boy}, which replaces [the, boy] in the list of heads;
(v) the v* phrase in (iii) Merges with {the, boy}, the External Argument (EA), forming {EA, {v*, VP}} (a v*P);
(vi) the head Tpast Merges with the v*P from (v), forming {Tpast, v*P};
(vii) English T has an Edge feature (EF) which triggers Internal Merge for {Tpast, v*P}. By minimal search, EA, being the highest accessible DP, is raised, forming {EA, {Tpast, v*P}};
(viii) the last head, C, Merges to head the clause.
Note there is no ambiguity as to which sub-phrase must label the merged structure at each step. Therefore, the derivation is deterministic (and efficient in this sense). Minimal search itself is implemented using a stack to maximize the efficiency of search. Phrases with unvalued features (or rel) are placed onto a stack when Merged initially. When Internal Merge is triggered or a head probes to value unvalued features, generally only the top stack element is consulted. For Internal Merge, the top stack element is extracted, i.e. raised. As a goal, the top stack element features must be used (to satisfy the probe). Hence, minimal search typically involves no search at all and minimal c-command naturally results. 31 The derivation of Figure 3 proceeds similarly with the sequence of heads in (12)b. One crucial difference between (12)a and (12)b is that Crel in (12)b possesses EF, triggering Internal Merge after the equivalent of step (viii), and the EA {whorel, boy} raises (in similar fashion to wh-phrase fronting triggered by CQ).
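Steps (i)-(viii) and the stack-based minimal search can be sketched in Python as follows. This is a simplified illustration, not the authors' program: syntactic objects are modeled as frozensets, a DP is recognized simply by containing the, and the tables CASE_VALUERS and EDGE_FEATURE, as well as the decision to pop a DP from the stack once its Case is valued, are assumptions made for the example.

```python
CASE_VALUERS = {"v*", "Tpast"}   # heads that value Case on the closest DP (assumption)
EDGE_FEATURE = {"Tpast"}         # heads whose EF triggers Internal Merge in (12)a


def is_dp(x):
    # A DP is modeled as an unordered set containing the determiner "the".
    return isinstance(x, frozenset) and "the" in x


def derive(la):
    """Read the head sequence left to right; return (SO, operation log)."""
    so, stack, log = None, [], []
    for item in la:
        if isinstance(item, list):              # step (iv): sub-computation
            item, _ = derive(item)
        if so is None:                          # the first head starts the SO
            so = item
            continue
        for phrase in (so, item):               # DPs enter with unvalued Case
            if is_dp(phrase):
                stack.append(phrase)
        so = frozenset({item, so})              # External Merge
        log.append(("External Merge", item))
        if item in CASE_VALUERS and stack:
            goal = stack.pop()                  # only the top element is consulted
            log.append(("Agree", item, goal))
            if item in EDGE_FEATURE:
                so = frozenset({goal, so})      # Internal Merge: raise the goal
                log.append(("Internal Merge", goal))
    return so, log


LA_12A = ["story", "the", "tell", "v*", ["the", "boy"], "Tpast", "C"]
so, log = derive(LA_12A)
for entry in log:
    print(entry[0])
```

Because the probe only ever inspects the top of the stack, no traversal of the SO is needed, which is the sense in which minimal search involves no search at all in this sketch.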

Figure 4 Derivation of the book that I read

Figure 5 Derivation of the book Ø I read
Figure 7 Derivation of the man Ø I talked to

Figure 8 Derivation of the boy that called Mary

Figure 9 Derivation of the boy John thinks that called Mary
In the derivation of (17)a shown in Figure 11, a single Agree relation between Crel and whichrel results in simultaneous valuation of both uT and uRel on Crel. Economy blocks the option in which uRel and uT are separately checked (by whichrel (or whorel) and nominative Case on the subject, respectively). Hence, (17)b, which would require checking of uT on Crel by T, is blocked. Similarly, the derivation of (17)c, shown in Figure 12, results from a single Agree relation between Crel and whorel, and likewise, (17)d is blocked by economy.

Figure 11 Derivation of the book which I read
Figure 14 Derivation of the man who(m) I talked to

Figure 15 Derivation of the man to who(m) I talked
Figure annotations (the man who(m) I talked to): man remerges with Crel • man labels • Agree(the, man) • uD of man checked • Agree(Crel, whorel) • uT and uRel of Crel checked • EF of Crel forces remerge of whorel man • Economy blocks *the man whom that I talked to • Agree(Crel, whorel) is more economical than Agree(Crel, whorel) and Agree(Crel, Tpast)
Figure annotations (the man to who(m) I talked): man remerges with Crel • man labels • Agree(the, man) • uD of man checked • Agree(Crel, whorel) • uT and uRel of Crel checked • EF of Crel forces remerge of whorel man • Economy blocks *the man to whom that I talked • Agree(Crel, whorel) is more economical than Agree(Crel, whorel) and Agree(Crel, Tpast)
(19) a) the time when I got drunk
b) *the time when that I got drunk
For (21)a, as shown in Figure 18, Tinf raises and checks the uT on Crel. The noun person raises and relabels the clause as a nominal. For (21)b, as shown in Figure 19, we assume that the complementizer for raises and checks the uT feature on Crel. Since for is closer to Crel than Tinf, Tinf does not raise given minimal search. The relative DP raises to the edge of CP, and then person raises to relabel.

Figure 18 Derivation of the person to visit

Figure 20 Derivation of what I read
Figure annotations: pro n remerges with Crel • pro n labels • Agree(d, pro n) • uD of pro n checked • Agree(Crel, whatrel) • uT and uRel of Crel checked • EF of Crel forces remerge of whatrel pro n • Economy blocks *what that I read • Agree(Crel, whatrel) more economical than Agree(Crel, i/T)
We next turn to subject headless relative clauses, accounted for in parallel fashion.
(23) a) what annoys John
b) *what that annoys John
This is similar to the case of (22)a-b, except that the relative DP originates in subject position here. The derivation of (23)a is shown in Figure 21.

Figure 21 Derivation of what annoys John
Figure annotations: pro n remerges with Crel • pro n labels • Agree(d, pro n) • uD of pro n checked • Agree(Crel, whatrel) • uT and uRel of Crel checked • EF of Crel forces remerge of whatrel pro n • Economy blocks *what that annoys John • Agree(Crel, whatrel) more economical than Agree(Crel, T)
50 See the Appendix for complete derivations of these particular examples. 51
(24) a) the boy who told the story - Subject relative
b) the letter which Dick wrote yesterday - Object relative
c) the man who Ann gave the present to - Relative object of P
d) the box which Pat brought the apples in - Relative object of P
e) the dog which was taught by John - Passivized object relative (Keenan & Hawkins 1987, 63)
- relative object of comparative than
As shown in Figure 22, the relative whorel boy raises and remerges with Crel, checking uRel on Crel. The head boy with unvalued uD raises out of CrelP, then relabels and Merges with external the (which checks unvalued uD on boy).

Figure 22 Derivation of the boy who Mike writes better than

Agree(Crel, whorel)
• uT and uRel of Crel checked
• EF of Crel forces remerge of whorel boy
boy remerges with Crel
• boy labels
Agree(the, boy)
• uD of boy checked

Figure 23 Derivation of the girl whose friends bought the cake

Figure 24 Derivation of the man whose house Patrick bought

Figure 26 Embedded relative: the person whose mother's friend's sister's dog's appearance had offended the audience

a) *which that
b) *who(m) that
c) *Ø in a subject relative (e.g., *the boy called Mary)

Old and Middle English allow a doubly filled CP. Examples (34)a-b below are from Old English. Se is a demonstrative pronoun, although it is translated as a wh-pronoun (Ringe and Taylor 2014, 467).

(34) a) Se [weig se ðe laet to heofonrice] is for ði nearu & sticol
the way which C leads to heaven is therefore narrow and steep

Finally, the following two examples are from Black English Vernacular/African American English.
constitute an (automatic) computer program. The interested reader is referred to the Appendix, which contains step-by-step computer-generated derivations that are too detailed to be included in the main body of the paper. The Appendix includes all the English relative clause examples discussed in the paper (and others). The program is able to correctly select the precise Merge operation at each step (without human intervention), based on the state of the current SO and the first available item in the LA.61 Moreover, our implementation permits us to verify that the model does not generate spurious analyses, unpredicted by the theory, for any of the example sentences.
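To make the idea of selecting a Merge operation from the current SO and the first LA item concrete, here is a minimal sketch. It is not the authors' implementation: the `SO` class, the feature names `"EF"` and `"rel"`, and the single decision rule (an edge feature on the root triggers Internal Merge of a relative item; otherwise the first LA item is externally merged) are all assumptions chosen only to illustrate the control flow.

```python
from dataclasses import dataclass, field


@dataclass
class SO:
    """Toy syntactic object (hypothetical encoding for illustration)."""
    label: str
    features: set = field(default_factory=set)  # feature names on this object
    parts: tuple = ()                           # immediate constituents


def subterms(so):
    """Yield so and every constituent it contains, top-down."""
    yield so
    for part in so.parts:
        yield from subterms(part)


def next_operation(current, la):
    """Choose the next step from the current SO and the remaining LA.

    Assumed rule: an edge feature ("EF") on the root forces Internal Merge
    of a contained item marked "rel"; otherwise the first LA item is
    externally merged; with nothing left to do the derivation stops.
    """
    if current is not None and "EF" in current.features:
        for term in subterms(current):
            if term is not current and "rel" in term.features:
                return ("internal", term.label)
    if la:
        return ("external", la[0])
    return ("done", None)


# Toy state: C_rel with an edge feature has merged with a TP containing who_rel.
who = SO("who_rel", {"rel"})
tp = SO("TP", set(), (who,))
c_rel = SO("C_rel", {"EF"}, (tp,))
print(next_operation(c_rel, ["the"]))
print(next_operation(tp, ["C_rel"]))
```

Because the choice depends only on the current SO and the head of the LA, each step is deterministic in this sketch, which mirrors the claim that no human intervention is needed; the actual conditions used by the program are those documented in the Appendix.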
b *the book which that I read
(17)d *the man who that John saw
(17)f *the man who that loves Mary
(19)b *the time when that I got drunk
(22)b *what that I read
(23)b *what that annoys John

Table 1: Examples of core relative clauses

Table 2: Feature checking

A reviewer asks how examples such as (11), in which the relative is book on syntax, not a simple head, can be accounted for under this relabeling proposal. As the PP on syntax is an adjunct, the relevant structure is <book, {on, syntax}>, where book and the PP on syntax are pair-Merged, as indicated by the angle brackets. Since pair-Merge is asymmetric, the adjunct on syntax is essentially invisible, and book on syntax is treated exactly the same as the single head book; thus it can relabel.

26 other work).

32 In this work, Chomsky assumes there is a one-time selection of items from the Lexicon to form a Lexical Array (LA). Merge of Lexical Items is recursively applied to form an aggregate SO. An SO can be selected and Merged from the Lexical Array (External Merge) or it can be Merged from within the current SO; this is the process of movement (Internal Merge). The term Workspace refers to the LA and SO at any given stage. For a convergent derivation, the Workspace must consist solely of a single SO, with formal features eliminated. Any remaining uninterpretable features in the SO, or leftover

Agree(Crel, whorel)
• uT and uRel of Crel checked
• EF of Crel forces remerge of whorel man's house
man remerges with Crel
• man labels
Agree(the, man)
• uD of man checked

56 We note that, for some speakers, examples such as these are ill-formed.

57 Culicover (2013: 161) provides the following similar example.
(i) a man who friends of t think that enemies of t are everywhere

58 A reviewer asks whether the whole relative clause has to be attracted. The whole relative clause must be attracted, which we can see by examining examples of object relativization, shown below. Compare this with cases of pied-piping, where there can be some variation.
(i) the person [whose mother's friend] the play offended
(ii) *the person whose the play offended [whose mother's friend]

Table 4: Summary of relative Ds in English