Extraction from NP, Frequency, and Minimalist Gradient Harmonic Grammar

Extraction of a PP from an NP in German is possible only if the head noun and the governing verb together form a natural predicate (Müller (1995), Sauerland (1995), Schmellentin (2006)). We show that this corresponds to collocational frequency of the verb-noun combinations in corpora, based on the metric of ∆P (Gries (2013)). From this we conclude that frequency should be conceived of as a language-external grammatical building block that can directly interact with language-internal grammatical building blocks (like triggers for movement and economy constraints blocking movement) in excitatory and inhibitory ways. Integrating frequency directly into the syntax is not an option in most current grammatical theories. However, things are different in Gradient Harmonic Grammar (Smolensky & Goldrick (2016)), a version of Optimality Theory where linguistic objects of various kinds can be assigned strength in the form of numerical values (weights). We show that by combining a Minimalist approach to syntactic derivations (Chomsky (2001)) with a Gradient Harmonic Grammar approach to constraint evaluation, the role of frequency in licensing extraction from NP in German can be integrated straightforwardly, the only additional prerequisite being that (verb-noun) dependencies (Manzini (1995), O’Grady (1998), Osborne, Putnam & Groß (2013), Bowers (2017), Bruening (2018)) qualify as linguistic objects that can be assigned strength (based on their frequency).


Extraction from NP
It has often been noted that extraction from NP in German is subject both to structural and to lexical restrictions; cf. Fanselow (1987, ch. 2), Grewendorf (1989, ch. 2.8), Webelhuth (1988), Müller (1991), Sauerland (1995), De Kuthy & Meurers (2001), Schmellentin (2006), Ott (2011), and Frey (2015); also see Cattell (1976), Bach & Horn (1976), Chomsky (1977), Davies & Dubinsky (2003) and Koster (1987, ch. 4) for English and Dutch, respectively. 1 The examples in (1) illustrate extraction from NP in German. As shown by (1-ab) and (1-cd), wh-movement and scrambling can bring about extraction from NP; more generally, the operation is not confined to specific movement types. Furthermore, the operation can involve either complete PP complements of N, as in (1-ac), or R-pronouns that act as complements of the P heads of complements of N, as in (1-bd); the latter option is restricted to varieties of German that allow postposition stranding more generally (and, in the examples here, with a bare vocalic onset of the preposition in particular; see Riemsdijk (1978), Trissler (1993), Müller (2000), and Hein & Barnickel (2018)). We take the structural factors restricting the operation to be the following. First, extraction from NP is not possible with external arguments (of transitive or unergative verbs); cf. (2).

1 Two remarks. First, throughout this paper, we assume that nominal projections in German are NPs (with DPs as specifiers) rather than DPs (with NPs as complements); see Bruening (2009), Georgi & Müller (2010), and Bruening et al. (2018), among others. As a matter of fact, the dependence of extraction from nominal projections on a close relation of V and N that is at the heart of the present study can be viewed as a further argument in support of the NP-over-DP hypothesis. That said, by relaxing locality requirements for selection in head-head dependencies, the main claim of the present paper (viz., that collocational frequency can be assumed to directly play a role in licensing extraction) could in principle also be formulated in a DP-over-NP approach to nominal projections in German. Second, some kinds of extraction from NP in German are subject only to structural restrictions (the position of the NP in the clausal spine), not to lexical ones; e.g., this holds for so-called was-für split constructions (see Müller (1995) for a characterization of this asymmetry). For the purposes of the present paper, we can leave open the question of why was-für split does not require a close relation of V and N.
Next, extraction from NP cannot take place with indirect objects bearing dative case (cf. (3-a)), even if the verb as such allows extraction from NP (cf. (3-b), where extraction from the direct object occurs in a ditransitive, dative-accusative environment).
A fourth observation is that extraction from NP is blocked when there is a possessor NP present (either pre-nominally or post-nominally); see (5). All of these structural restrictions on extraction from NP can be derived without too much ado under current approaches to movement, based on (whatever derives) the Condition on Extraction Domain (Huang (1982), Chomsky (1986)) and the Minimal Link Condition (Chomsky (2001; 2008)); see, e.g., Müller (2011) for an account of these phenomena that relies on Chomsky's (2001) Phase Impenetrability Condition (PIC). 4

In addition to these structural factors, extraction from NP in German is conditioned by lexical factors. Thus, whereas a verb like lesen ('read') in (1-a) (repeated here as (7-a)) permits extraction from the NP headed by Buch ('book'), a verb like stehlen ('steal') does not, in an identical environment (see (7-b)). Note that syntactically, the two verbs otherwise behave the same (they take an internal theme argument as a direct object and an external agent argument as a subject, they assign accusative to the direct object, etc.). What is more, as observed by Sauerland (1995), not only is the nature of the verb relevant: By keeping the verb identical and modulating the head noun of the object, extraction can also become impossible; see (7-c), where Verlautbarung ('official statement') replaces Buch ('book') in the presence of lesen ('read'). As one might expect, a combination as in (7-d) will also block extraction from NP: Here Verlautbarung is the head noun and stehlen is the governing verb. This effect is not movement type-specific. As shown in (8) (where (8-a) = (1-c)), scrambling of a PP (or of a bare R-pronoun, in the varieties of German that permit this, as in (1-d)) instantiates the same pattern (see Webelhuth (1992); Müller (1995)).

The conclusion that suggests itself in view of this kind of evidence is that for extraction from NP of a PP complement (or an R-pronoun contained in it) to be legitimate in German, V and N must enter a tight relationship; they must form a natural predicate, i.e., a dependency of two lexical items that qualifies as entrenched. It is not a priori clear how this condition can be implemented in grammatical theory. Following Bach & Horn's (1976) proposal for English, Fanselow (1987) assumes that extraction from NP is in fact never possible in German; rather, data of the kind in (8-a) are the result of a pre-syntactic reanalysis rule that makes it possible for the verb to take not just NP, but also PP directly as arguments, so that PP does not have to leave NP in (8-a) in the first place.

2 We use examples with über ('about') here because this is the preposition that shows up with the canonical cases of extraction from NP; in contrast, with a preposition like von ('of'), which would be more innocuous in the sense that it would avoid the bare vocalic onset, there is good evidence that extraction data are blurred since PPs with such a head can be base-generated outside of NP; see footnote 4.

3 Also note that scrambling of the indefinite NP to a position in front of the external argument NP keiner ('no one', nom), although slightly marked and dependent on appropriate contexts and intonation, is well formed as such when there is no concurrent extraction from NP; see (i).

4 Haider (1983) and Diesing (1992) have argued that subject DPs can also be transparent for extraction in German. However, Fanselow (2001, 422) [...] De Kuthy & Meurers (2001, 149) and Haider (1993, 172-173), among others). For these, an analysis that does not involve actual extraction seems systematically available. For instance, von ('by') phrases are known to often involve external generation of an optional argument instead of extraction (see Koster (1987, 196f.), Cinque (1990, 47), Sternefeld (1991, 121), Müller (1995, 397f.), Barbiers (2002, 54), and Gallego (2007, 349)).
Whereas a reanalysis approach along these lines has sometimes been adopted by subsequent studies (cf., e.g., De Kuthy & Meurers (2001)), severe problems have been pointed out for it that, in our view, make such an analysis untenable (see Webelhuth (1988), Fanselow (1991), Müller (1998), and Schmellentin (2006), among others). For one thing, in the absence of a theory of general restrictions on reanalysis rules, it is completely unclear why reanalysis cannot involve a verb and agent (subject) or goal (indirect object) arguments; recall (2), (3-a). Furthermore, on this view, it is a mystery why specificity and possessor intervention effects should arise if there is no extraction from NP in the first place; see (4), (5). Next, if PP does not have to undergo extraction from NP in the well-formed examples discussed so far, how can it be that NP scrambling creates a typical freezing effect (as in (6-a) vs. (6-b))? Now, it is known that verbs like lesen ('read') in (7-a), in contrast to verbs like stehlen ('steal') in (7-b), may occur in constructions in which the PP is present but the NP is either completely absent or realized only as a pronoun. This is generally taken to be the strongest argument in support of the base-generation approach to extraction from NP; see (9-a) vs. (9-b). However, verbs like geben ('give') in German behave like lesen in that they permit extraction from a direct object NP (cf. (3-b)), but behave like stehlen ('steal') in that they do not allow the NP to be pronominal or dropped (cf. (10-a)). What is more, as shown in (7-c), lesen ('read') does not permit extraction if the head noun of its complement is Verlautbarung ('official.statement'), but NP can be pronominal or zero in this context; see (10-b).
Thus, the correlation breaks down, in both directions (there is the option of extraction without the option of pronominal/zero realization of NP, and there is the option of pronominal/zero realization of NP without the option of extraction); and with it goes the argument for reanalysis. 5 To conclude, reanalysis as a tool to account for extraction from NP is problematic from an empirical point of view. Furthermore, as noted above, there is no theory of what a reanalysis rule can and cannot look like; more generally, the concept emerges as dubious from a conceptual point of view, too (see, e.g., Baltin & Postal (1996, 135ff)). At this point, two basic questions need to be addressed as regards the influence of lexical factors on extraction from NP. The first question is how it can be determined whether a V and an N can form a natural predicate; i.e., how this lexical factor can be measured. And the second question is how this information then licenses or blocks the grammatical process of extraction, i.e., how the lexical factor, once its nature is determined, can interact with the building blocks of grammar that are involved in syntactic movement. 6 In a nutshell, the answers we will give are that the concept of a natural predicate corresponds to collocational frequency, which can be encoded as a numerical value for V-N dependencies (section 2); and that an approach to syntax that combines Minimalist derivations with constraint interaction in a Gradient Harmonic Grammar approach makes it possible to implement the lexical factor, by letting the numerical values capturing different collocational strengths of V-N dependencies interact with constraints that trigger and block extraction (section 3).

∆P
In what follows, we will pursue the hypothesis that frequency is the decisive factor in establishing a natural predicate, i.e., an entrenched V-N dependency, in the cases of extraction from NP that we are interested in. A basic premise is that the absolute frequencies of individual lexical items in corpora will not be particularly informative in this context, and that the same goes for the absolute frequencies of V-N collocations. Rather, what is needed is a more fine-grained approach to frequency that is based on how well the two lexical items in a V-N dependency predict each other. One such measure that has been proposed is collostructional strength (see Stefanowitsch (2009), Gries & Stefanowitsch (2004), and Gries et al. (2005)). More recently, Gries (2013) has suggested employing the measure of ∆P, and it is this concept that we will make use of in what follows. 7 ∆P_{X|Y} measures how well the presence of some item Y predicts the presence or absence of some other item X. ∆P is defined as in (11).

(11) ∆P (Gries (2013, 143)):
$\Delta P_{X|Y} = p(X \mid Y = \text{present}) - p(X \mid Y = \text{absent})$

Here, p(X|Y = present) captures the probability of the outcome X in the presence of the cue Y; p(X|Y = absent) is the probability of the outcome X in the absence of the cue Y; and to determine ∆P_{X|Y}, the latter is subtracted from the former. The values of ∆P range from −1.0 to 1.0; they are interpreted as follows:
• ∆P_{X|Y} approaching 1.0: Y is a good cue for the presence of X
• ∆P_{X|Y} approaching −1.0: Y is a good cue for the absence of X
• ∆P_{X|Y} approaching 0.0: Y is not a good cue for the presence or absence of X

Note that this relationship is asymmetric. An element predicting another element well is not necessarily well-predicted by that element. This means that for every pair of X and Y, there are two values ∆P_{X|Y} and ∆P_{Y|X}. As an illustration, let us look at how ∆Ps are determined for a V-N dependency involving kaufen ('buy') and Buch ('book') on the basis of the frequencies of the co-occurrences. To calculate ∆P_{X|Y}, we first search the corpus for the number of all cases where X and Y co-occur, where only one of the elements occurs, and where none of the elements occur. (12) shows such a co-occurrence table. 8 This kind of information can be used to calculate ∆P_{X|Y} by taking the difference between the probability of X given the presence of Y and the probability of X given the absence of Y. Suppose that X = Buch and Y = kaufen. ∆P_{X|Y} (= ∆P_{Buch|kaufen}) is then determined as shown in (13); it shows how well kaufen predicts Buch in the corpus. In the same way, ∆P_{Y|X} (= ∆P_{kaufen|Buch}) based on the data in (12) is calculated as shown in (14). The resulting value indicates how well Buch predicts kaufen.

By comparing the two ∆Ps, it becomes evident that kaufen is a somewhat better predictor for Buch than Buch is for kaufen: The likelihood of a buying event involving books (rather than, say, bikes or guitars) is greater (∆P = 0.01186) than the likelihood that a book is involved in a buying event (rather than, say, a reading or burning event, or some other scenario in which books may show up; ∆P = 0.00381). 9

6 Incidentally, it is worth noting that even if pre-syntactic reanalysis were to provide the correct approach to extraction from NP, essentially the same two questions would arise that arise under the view that extraction from NP is possible in principle. The first question would be identical, and the second question would then be how fixing the first factor can influence the application of the reanalysis rule.

7 It should be noted, though, that we have also investigated three other measures, viz., (i) Mutual Information (MI; see Church & Hanks (1990)), (ii) t-score (see Church et al. (1991)), and (iii) a further account for computing (asymmetrical) collocational strength suggested by a reviewer; cf. the Appendix. Although the results obtained with these measures differ from the results under ∆P in several respects, the basic conclusions carry over unchanged, and (with the possible exception of MI) these alternative approaches could in principle also have been employed in the present analysis (with arguably slightly less accurate empirical coverage).

8 The numbers here are actual numbers based on our corpus study; see the next section.
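The definition in (11) can be illustrated with a minimal Python sketch. The function name `delta_p`, the cell labels, and all counts below are assumptions made for illustration only; in particular, the counts are not the corpus counts of table (12). The sketch also shows the zero-count situation in which one of the two values is undefined.

```python
from typing import Optional


def delta_p(a: int, b: int, c: int, d: int) -> Optional[float]:
    """Return Delta-P(outcome | cue) from a 2x2 co-occurrence table.

    a: cue present, outcome present    b: cue present, outcome absent
    c: cue absent,  outcome present    d: cue absent,  outcome absent
    Returns None if one of the two probabilities is undefined.
    """
    if a + b == 0 or c + d == 0:
        return None  # division by zero: the cue is never present or never absent
    return a / (a + b) - c / (c + d)


# Hypothetical sentence counts (NOT the counts reported in the paper).
both, only_verb, only_noun, neither = 30, 70, 200, 9700

# Cue = verb, outcome = noun: how well does the verb predict the noun?
dp_noun_given_verb = delta_p(both, only_verb, only_noun, neither)
# Cue = noun, outcome = verb: the two "only" cells swap roles.
dp_verb_given_noun = delta_p(both, only_noun, only_verb, neither)

# A verb that is unattested in the corpus (again with made-up counts):
dp_noun_given_unattested = delta_p(0, 0, 200, 9800)   # undefined, returns None
dp_unattested_given_noun = delta_p(0, 200, 0, 9800)   # defined, equals 0.0
```

The two calls on the same table differ only in which of the single-occurrence cells counts as "cue present", which is what makes the measure asymmetric.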

Corpus
The data in our survey come from the core corpus of Digitales Wörterbuch der deutschen Sprache (DWDS; see Geyken (2007)). The DWDS is a freely searchable corpus consisting of about 5.8 m sentences in the German language. It contains a balanced mix of fictional, scientific, functional, and newspaper texts from the 20th century. The list in (15) shows the queries used to elicit the counts for nouns, verbs, and noun-verb pairs. Ideally, one would like to query the corpus for every instance where a given noun is the direct object of a given verb (recall that this is the only environment in which extraction from NP can be possible, given our characterization of the empirical evidence in the previous section). However, while the corpus is lemmatised and tagged for part-of-speech, it does not encode dependencies. Hence, without an additional step of dependency parsing applied to corpora, the queries can only ever be approximations.
(15) a. Query: Buch with $p=NN
Searches for the lemma Buch with the part-of-speech tag NN (common nouns)
b. Query: kaufen with $p=VV*
Searches for the lemma kaufen with a part-of-speech tag starting with VV (verbs)
c. Query: near(Buch with $p=NN, kaufen with $p=VV*, 3)
Searches for a sentence with the noun Buch and the verb kaufen, with zero to three tokens between them.

9 As a side remark, note that this approach of ∆P determination based on corpus frequencies can run into a technical problem if there are zero counts of some item in the corpus. As an example, consider the data in (i). The verb liken ('to give a like (on social media)') is a loanword from English in German that did not exist in the 20th century and is therefore not attested in the corpus that we base our analysis on (see the next subsection). In line with this, we obtain results for the V-N dependency consisting of liken and Buch where ∆P_{liken|Buch} is 0 whereas ∆P_{Buch|liken} is in fact undefined because it involves division by zero. In what follows, we will abstract away from this technical issue, which does not arise in practice (if we want to determine whether extraction from NP is possible with a certain kind of verb, that verb must exist, however rare its occurrence may be).
The query in (15-c) attempts to find noun-verb pairs by looking for sentences where the noun and the verb are close to each other. This avoids false positives as in (16-a) (where Buch ('book') and gekauft ('bought') are clause-mates in a VP coordination construction, but Buch is the (head of the) object of gelesen ('read'), not of gekauft). However, it also introduces false negatives as in (16-b), where Buch is the (head of the) object of gelesen, but is separated from it by more than three items as a consequence of having undergone topicalization to the clause-initial ('Vorfeld') position. Cases like (16-b) can only pose a potential problem if there is reason to assume that object topicalization also (i.e., like extraction from NP) shows asymmetries depending on how close the relation between the verb and the object's head noun is, such that, e.g., an object headed by Buch ('book') tends to undergo topicalization more often, or more easily (or, in fact, less often, or less easily) in the presence of lesen ('read') than in the presence of stehlen ('steal'). We are not aware of any claims in the literature that would go in this direction, and will assume, here and henceforth, that there is no such effect. Thus, false negatives like (16-b) generated by object movement can be ignored, assuming that they affect all kinds of V-N dependencies in the same way. 10
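The effect of the proximity window can be sketched in Python. This is not the DWDS implementation: the function `near_match`, the representation of a sentence as (lemma, tag) pairs, the assumption that the order of noun and verb does not matter, and the two example sentences are all hypothetical and serve only to show how a window of three intervening tokens admits an adjacent object but misses a topicalized one.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (lemma, part-of-speech tag); hypothetical representation


def near_match(sentence: List[Token], noun: str, verb: str, max_gap: int = 3) -> bool:
    """True if noun (tag NN) and verb (tag VV*) occur with at most max_gap tokens between them."""
    noun_positions = [i for i, (lemma, tag) in enumerate(sentence)
                      if lemma == noun and tag == "NN"]
    verb_positions = [i for i, (lemma, tag) in enumerate(sentence)
                      if lemma == verb and tag.startswith("VV")]
    # abs(i - j) - 1 is the number of tokens strictly between the two positions.
    return any(abs(i - j) - 1 <= max_gap
               for i in noun_positions for j in verb_positions)


# Made-up lemmatised sentences (not corpus data).
# "Fritz hat ein Buch gekauft": noun and verb are adjacent.
adjacent = [("Fritz", "NE"), ("haben", "VAFIN"), ("ein", "ART"),
            ("Buch", "NN"), ("kaufen", "VVPP")]
# "Das Buch hat Fritz gestern im Laden gekauft": topicalized object, five tokens intervene.
topicalized = [("das", "ART"), ("Buch", "NN"), ("haben", "VAFIN"), ("Fritz", "NE"),
               ("gestern", "ADV"), ("in", "APPRART"), ("Laden", "NN"), ("kaufen", "VVPP")]

hit = near_match(adjacent, "Buch", "kaufen")       # within the window
miss = near_match(topicalized, "Buch", "kaufen")   # false negative: outside the window
```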

Results
We have determined both ∆P values for every V-N pair where N is a noun in (17-a) and V is a verb in (17-b) (see the Appendix for the raw data). This results in high-frequency collocations like Buch lesen ('book read'), combinations of low-frequency pairs where this intuitively seems to be 'the noun's fault', like Verlautbarung lesen ('official.statement read'), and combinations where it is the verb that is responsible for the low frequency, as in Buch werfen ('book throw').

10 The same conclusion can be drawn for cases where verb-second movement leads to a larger distance between V and N than the one covered by the query.
(18) ∆Ps for two V-N pairs:

This is in full accordance with the fact that the V-N dependency Buch lesen permits extraction from the NP whereas the V-N dependency Buch stehlen does not; recall the examples in (7-a) and (7-b) above. As shown by the ∆Ps for some other V-N combinations in (19), this result can be generalized: The higher a ∆P is, the more likely it is that extraction is possible. These data also shed some light on which ∆P value may be most relevant for establishing the strength of a V-N dependency (and, consequently, for determining the option of extraction from NP). A priori, three options suggest themselves: ∆P_{V|N}, ∆P_{N|V}, and the arithmetic mean of these two values. Closer inspection reveals that ∆P_{N|V} is not fully reliable. On the one hand, there are cases like Buch weglegen ('book put.away') where ∆P_{N|V} is fairly high (i.e., weglegen ('put.away') is a reasonably good predictor for the presence of Buch ('book')), but extraction is not straightforwardly possible in this environment (cf. *Worüber hat der Fritz ein Buch weggelegt?, 'about.what has the Fritz nom a book acc put.away'). On the other hand, there are also cases like Bericht schreiben ('report write') where ∆P_{N|V} is quite low (i.e., schreiben ('write') is not a good predictor for the presence of Bericht ('report')), but extraction is easily possible (cf. Worüber hat der Fritz einen Bericht geschrieben?, 'about.what has the Fritz nom a report acc filed'). In contrast, ∆P_{V|N} makes the right predictions in these cases: Bericht ('report') is a good predictor for schreiben ('write'), and Buch ('book') is not such a good predictor for weglegen ('put.away'). This leaves ∆P_{V|N} and the arithmetic mean of the values as the remaining options. In what follows, we will settle for ∆P_{V|N} alone. Note that this introduces an asymmetry: Whether a V-N dependency qualifies as a natural predicate or not depends on how well the noun can predict the verb. 11

Scaling
In the next section, we will implement the frequency-based approach to extraction from NP in German in a version of Gradient Harmonic Grammar (see Smolensky & Goldrick (2016)). Standardly, numerical strength values assigned to linguistic objects in this grammatical theory are taken to be within the interval [0,1]. 12 We will therefore rescale numerical values of the type found for ∆P in (18) and (19), by min-max normalization (feature scaling), so that they end up squarely in the [0,1] interval. Thus, the data can be normalized into a range of [0,1] using the formula $X' = \frac{X - \min(X)}{\max(X) - \min(X)}$. For the V-N dependencies in (18) and (19), this produces the values in (20). We will adopt these normalized values for the theoretical modelling in the next section.
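The normalization step can be sketched as follows in Python. The function name `min_max_normalize` and the three input values are assumptions for illustration; they are not the ∆P values of (18) and (19).

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Map values linearly onto [0, 1]: X' = (X - min(X)) / (max(X) - min(X))."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical raw Delta-P values (NOT the values reported in the paper).
raw = [0.002, 0.010, 0.050]
scaled = min_max_normalize(raw)  # smallest value maps to 0.0, largest to 1.0
```

Note that the result depends on the set of values being normalized: adding a pair with a more extreme ∆P changes the strength assigned to every other pair.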
(20) Strength assignments for V-N dependencies:

(20) shows that there is a correlation between a higher normalized ∆P value and the option of extraction. In addition, the plot in (21) reveals that the cut-off point with respect to extraction is not so much between high-frequency and low-frequency pairs of N and V, but rather within the low-frequency area, at a strength of 0.14 (or thereabout). This picture persists when the complete set of data is taken into account (cf. the Appendix).

(21)

The Gist of the Analysis
In this section, we show how the different strength values of V-N dependencies correctly predict the options of extraction from direct object NPs in German, assuming (i) a gradient harmonic grammar approach where both violable syntactic constraints and linguistic expressions (like V-N dependencies) are associated with weights, (ii) a minimalist approach to syntactic derivations in which both intermediate and final movement steps target the left edge of a verbal phase (a specifier of v), and (iii) an approach to iterative optimization based on harmonic serialism, where optimization domains are small and the amount of information that can be taken into account during each optimization is limited. However, before we address these issues in detail, let us focus on the gist of the analysis. In all cases of extraction from NP, there is a dependency between a verbal head X and a noun Y that intervenes between the base position (α i+1 ) and (what is typically) the target position (intermediate or final) at the left edge of the verbal phase (α i ); see (22).
At the heart of the analysis is a well-established locality constraint, the CONDITION ON EXTRACTION DOMAIN (CED), which we take to be violable and weighted. If YP (the maximal projection of Y) is not a complement of X, ungrammaticality will invariably arise with extraction from NP (because of a violation of the CED that will always emerge as fatal); this covers the structural restrictions discussed in section 1. If, however, YP is a complement of X, the CED can be satisfied, and it is at this point that the weight of the X-Y dependency becomes crucial: CED satisfaction can bring about a reward, and this reward is required by each case of extraction from NP because the general constraint blocking movement (ECONOMY CONDITION) as such has slightly more weight than the general constraint forcing movement (MERGE CONDITION). So for the scales to be tipped in favour of the movement candidate, the derivation cannot do without a reward from the CED, and only if the reward for CED satisfaction generated by the X-Y dependency's weight is sufficiently high will extraction from NP (i.e., movement from α i+1 to α i ) be legitimate. This covers the lexical variation with extraction from NP, i.e., the natural predicate effect. In what follows, we flesh out this analysis, starting with Gradient Harmonic Grammar.
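The logic of this competition can be made concrete with a small Python sketch. Everything numerical here is an assumption: the three weights are placeholders chosen only so that EC weighs slightly more than MC, and the way the CED reward and penalty are computed (reward equal to the CED weight times the dependency strength, penalty equal to the full CED weight) is a simplification for illustration, not the analysis as it is worked out in the following sections.

```python
# Placeholder weights (hypothetical; chosen only so that W_EC is slightly above W_MC).
W_MC = 2.0   # MERGE CONDITION: violated if the triggered movement does not apply
W_EC = 2.2   # ECONOMY CONDITION: violated if movement applies
W_CED = 1.0  # CONDITION ON EXTRACTION DOMAIN


def extraction_wins(dependency_strength: float, yp_is_complement: bool) -> bool:
    """Compare the harmony of the movement candidate with that of the no-movement candidate."""
    harmony_no_movement = -W_MC  # one MC violation
    if yp_is_complement:
        # One EC violation, offset by a CED reward scaled by the V-N dependency strength.
        harmony_movement = -W_EC + W_CED * dependency_strength
    else:
        # One EC violation plus a CED violation (assumed here to count with full weight).
        harmony_movement = -W_EC - W_CED
    return harmony_movement > harmony_no_movement


strong_pair = extraction_wins(0.9, yp_is_complement=True)     # reward tips the scales
weak_pair = extraction_wins(0.05, yp_is_complement=True)      # reward too small
non_complement = extraction_wins(0.9, yp_is_complement=False) # CED violation is fatal
```

With these placeholder numbers, movement wins exactly when the dependency strength exceeds the difference between the EC and MC weights divided by the CED weight; the actual threshold depends on the weights adopted in the analysis.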

Gradient Harmonic Grammar
Harmonic Grammar (see Pater (2016)) is a version of Optimality Theory (see Prince & Smolensky (1993)) that abandons the strict domination property (according to which no number of violations of lower-ranked constraints can outweigh a single violation of a higher-ranked constraint) and replaces harmony evaluation by constraint ranking with harmony evaluation based on weight assignment to constraints. The central concept of harmony is defined in (23) (see Pater (2009)).
(23) Harmony:
$H = \sum_{k=1}^{K} s_k w_k$
($w_k$ = weight of a constraint; $s_k$ = violation score of a candidate)

According to (23), the weight of a constraint is multiplied with the violation score of a candidate for that constraint, and all the resulting numbers are added up, thereby determining the harmony score of a candidate. An output qualifies as optimal if it is the candidate with maximal harmony in its candidate set; i.e., if it has the highest harmony value.
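A minimal Python sketch of this evaluation is given below. The constraint names C1 to C3, their weights, and the two candidates are hypothetical; violation scores are encoded as negative integers, following the convention that a single violation has strength -1. The example is chosen to show the loss of strict domination: two violations of lower-weighted constraints together outweigh one violation of the highest-weighted constraint.

```python
from typing import Dict


def harmony(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """H = sum over constraints k of s_k * w_k (constraints without a score count as 0)."""
    return sum(weights[c] * scores.get(c, 0) for c in weights)


def optimal(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """Return the name of the candidate with maximal harmony."""
    return max(candidates, key=lambda name: harmony(weights, candidates[name]))


# Hypothetical constraints and weights (for illustration only).
weights = {"C1": 3.0, "C2": 2.0, "C3": 2.0}
candidates = {
    "A": {"C1": -1},            # one violation of the heaviest constraint
    "B": {"C2": -1, "C3": -1},  # one violation each of two lighter constraints
}

h_a = harmony(weights, candidates["A"])  # -3.0
h_b = harmony(weights, candidates["B"])  # -4.0
winner = optimal(candidates, weights)    # "A"
```

Under strict ranking C1 >> C2 >> C3, candidate B would win because it satisfies C1; with weights, the two lighter violations add up and A is optimal.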
Gradient Harmonic Grammar (see Smolensky & Goldrick (2016)), in turn, is an extension of Harmonic Grammar where it is not just the constraints that are given weights; rather, symbols in linguistic representations are also assigned weights (between 0 and 1). This gives rise to a very straightforward way of associating strength with linguistic objects. So far, most of the work on Gradient Harmonic Grammar has been in phonology; but cf. Smolensky (2017)

Minimalist Derivations
13 As a matter of fact, Squishy Grammar as developed in Ross (1973a;b;1975) is an immediate predecessor of Gradient Harmonic Grammar in syntax. It is interesting to note that even though Squishy Grammar is widely regarded as having been refuted once and for all (see Gazdar & Klein (1978) and Newmeyer (1986)), closer scrutiny reveals that very few actual counter-arguments against this approach have been presented. However, a detailed reconsideration of the original counter-arguments, while certainly worthwhile, is beyond the scope of the present paper.

We adopt a minimalist setting (cf. Chomsky (2001)), according to which syntactic structure is created incrementally by external and internal Merge operations, where the former are responsible for basic structure-building and the latter bring about structure-building by movement. We assume that syntactic movement is restricted by the inviolable Phase Impenetrability Condition (PIC; cf. Chomsky (2001; 2008)) in (24). 14

(24) Phase Impenetrability Condition (PIC; inviolable):
The domain of a head X of a phase XP is not accessible to operations outside XP; only X and its edge are accessible to such operations.
This implies that movement must take place successive-cyclically, via intermediate edge domains (i.e., specifiers) of phases, where the clausal spine is composed of CP, TP, vP, and VP, of which CP and vP qualify as phases. (We follow Chomsky in assuming that NP/DP is not a phase.) Next, suppose that all Merge operations, including movement steps to intermediate phase edges, are triggered by designated features (cf. Chomsky (1995), Pesetsky & Torrego (2006), Urk (2015), Collins & Stabler (2016) and Georgi (2017)); this can be enforced by the MERGE CONDITION (MC) in (25).

Next, there is a counteracting constraint that prohibits structure-building; for present purposes, it can be assumed that this role is played by the ECONOMY CONDITION (EC) in (26) (see Grimshaw (1997), Legendre et al. (2006); also see Grimshaw (2006) for an attempt at a yet more principled approach). Like MC, EC is violable, and associated with a weight. That said, as shown in section 1, extraction from NP in German does not distinguish between wh-movement and scrambling (or, for that matter, topicalization, relativization, or other movement types that exist in German); cf. Webelhuth (1992), Müller (1995). For this reason, to keep things simple, we will postulate in what follows that a violation of MC is always of strength -1.0, independently of which movement type is involved.

Against this background, two questions need to be answered to provide an account of extraction from NP in German. First, how does optimization of Merge operations proceed technically? And second, how can the (frequency-based) weights assigned to V-N dependencies be integrated as a factor that may enable or block extraction from NP in the presence of MC and EC? We address the two issues in turn.

Optimization
There are two general ways to model the interaction of minimalist derivations and harmony evaluation. A first option is that all syntactic operations (which, by assumption, take place in the GEN component of the grammar) precede a single, parallel step of harmony evaluation (H-EVAL). This then qualifies as a standard case of harmonic parallelism (see Prince & Smolensky (2004)), and it has been explicitly pursued by, e.g., Broekhuis (2006) and Broekhuis & Woolford (2013). Another option is that Merge operations (GEN) and harmony evaluation (H-EVAL) alternate constantly. On this view, syntactic operations and selection of the most harmonic (optimal) output are intertwined. This model is an instance of harmonic serialism (see Prince & Smolensky (2004)). It has been adopted in, e.g., Heck & Müller (2013) and Murphy (2017).

In what follows, we adopt an approach based on harmonic serialism. Harmonic serialism in syntax can be viewed as a procedure that is actually little more than a reasonably precise specification of standard minimalist approaches that incorporate a concept of the best next step at any given stage of the derivation (see, e.g., Chomsky (1995)). [...]

c. O_ij forms the input I_ij for the next generation step producing a new candidate set.
d. The output O_ijk with the best constraint profile is selected as optimal.
e. Candidate set generation stops (i.e., the derivation converges) when the output of an optimization procedure is identical to the input (i.e., when the constraint profile cannot be improved anymore).
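The alternation of generation and evaluation, including the convergence criterion in (e), can be sketched as a simple loop in Python. This is a generic illustration under stated assumptions, not the syntactic system itself: the function `harmonic_serialism`, the inclusion of the unchanged input in every candidate set, and the toy domain (integers standing in for structures, with one-step changes standing in for elementary operations) are all hypothetical.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def harmonic_serialism(start: S,
                       gen: Callable[[S], Iterable[S]],
                       harmony: Callable[[S], float]) -> S:
    """Alternate GEN and H-EVAL until the optimal output is identical to the input."""
    current = start
    while True:
        # GEN: candidates differ from the input by at most one operation.
        # The unchanged input is listed first, so ties are resolved in its favour.
        candidates = [current] + list(gen(current))
        # H-EVAL: select the candidate with maximal harmony.
        best = max(candidates, key=harmony)
        if best == current:
            return current  # convergence: the profile cannot be improved anymore
        current = best      # the optimal output is the input of the next step


# Toy stand-ins (hypothetical): states are integers, one operation changes a state by 1,
# and harmony is highest at the value 5.
def toy_gen(n: int) -> Iterable[int]:
    return [n - 1, n + 1]


def toy_harmony(n: int) -> float:
    return -float((n - 5) ** 2)


result = harmonic_serialism(0, toy_gen, toy_harmony)
```

Because each step compares only the input with its one-operation neighbours, the number of weights that interact in any single evaluation stays small, which is the locality property appealed to in the text.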
In the present context, the main reason for adopting a harmonic serialist approach is that, in interaction with the PIC, it directly implements strict locality of constraint interaction: Since all competing outputs are separated from the input by at most one elementary operation, it can be ensured that there is no danger that processes taking place in potentially radically different areas of the sentence can interact with the process at issue in unwanted and unforeseen ways; in line with this, harmony evaluation based on weights assigned to constraints and to linguistic expressions remains feasible throughout since the number of interacting weights remains small.

Integrating Dependencies
Finally, it needs to be clarified how the optimization of structures involving extraction from NP can be made sensitive to ∆P_{V|N}-based weight assignments to V-N dependencies. To this end, we postulate that X-Y dependencies relating two heads can function as syntactic primitives that constraints can refer to (and that they can restrict). This assumption has been made earlier in a number of otherwise quite different approaches, and sometimes with a different label attached to X-Y (like chains, catenae, or selections instead of dependencies); see, e.g., O'Grady (1998), Osborne et al. (2013), Manzini (1995), Bowers (2017), and Bruening (2018). For present purposes, we assume that dependencies (in this technical sense) are always two-membered (X-Y), and that they are characterized by a selection relation (X selects Y). 15 As detailed above, we assume that ∆P_{X|Y} determines the strength of an X-Y dependency. And we would like to suggest that the constraint where strength of dependencies plays a crucial role in the theory of extraction is the CONDITION ON EXTRACTION DOMAIN (CED; see Huang (1982), Chomsky (1986), and Cinque (1990)) in (28).

(28) CONDITION ON EXTRACTION DOMAIN (CED; violable, weighted):
For all X-Y dependencies, if X-Y intervenes between two adjacent members of a movement chain, X is a sister of the phrase headed by Y.
According to earlier versions of the CED, an XP blocks movement across it if it is not governed (see Huang (1982)), or not L(exically)-marked (see Chomsky (1986)), or not a complement (Cinque (1990)). It is this latter version that we adopt in (28). Furthermore, (28) formulates the CED as a constraint on X-Y dependencies intervening in a movement chain (rather than as a constraint on movement, or on adjacent members of movement chains, as in the original versions). This is so as to ensure that it is the strength of the intervening X-Y dependency (rather than, say, the strength of the moved item, or of the movement chain that it is a part of) that determines CED satisfaction. Assuming the concept of intervention in (29), this change is innocuous.
(29) Intervention:
An X-Y dependency intervenes between two members of a movement chain α_i and α_{i+1} iff (a), (b), and (c) hold.
a. α_i m-commands X. 16
b. Y m-commands α_{i+1}.
c. It is not the case that X m-commands α_i and c-commands α_{i+1}.
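The three clauses of (29) amount to a simple Boolean check over command relations. The following Python sketch makes this explicit; it is an illustration only. The function name `intervenes` and the encoding of m-command and c-command as sets of ordered pairs are assumptions made here, and the relations listed are just those that the text reports for the direct-object configuration in (30), not a complete description of the tree.

```python
def intervenes(x, y, a_hi, a_lo, m_command, c_command):
    """Return True iff the X-Y dependency intervenes between a_hi (alpha_i)
    and a_lo (alpha_{i+1}) according to clauses (a)-(c) of (29)."""
    clause_a = (a_hi, x) in m_command          # alpha_i m-commands X
    clause_b = (y, a_lo) in m_command          # Y m-commands alpha_{i+1}
    clause_c = not ((x, a_hi) in m_command     # not: X m-commands alpha_i
                    and (x, a_lo) in c_command)  # and c-commands alpha_{i+1}
    return clause_a and clause_b and clause_c


# Command facts as stated in the text for (30); (commander, commandee) pairs.
M_COMMAND = {
    ("a_i", "V"),      # alpha_i m-commands V
    ("a_i", "v"),      # alpha_i m-commands v
    ("N2", "a_i+1"),   # N_2 m-commands alpha_{i+1}
    ("V", "a_i+1"),    # V m-commands alpha_{i+1}
    ("v", "a_i"),      # v m-commands alpha_i
}
C_COMMAND = {
    ("V", "a_i+1"),    # V c-commands alpha_{i+1}
    ("v", "a_i+1"),    # v c-commands alpha_{i+1}
}

for x, y in [("V", "N2"), ("v", "V"), ("v", "N1")]:
    print(x, y, intervenes(x, y, "a_i", "a_i+1", M_COMMAND, C_COMMAND))
```

Under these facts, only V-N_2 passes all three clauses: v-V fails clause (c), and v-N_1 fails clause (b).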
Given (29), all but the most local instances of movement to either intermediate phase edges or final landing sites will cross an X-Y dependency. Let us illustrate the concept of intervention in (29) by looking at some of the relevant configurations. Consider first the case of extraction from a direct object NP to the Specv position; cf. (30).
(30) Dependency intervention with extraction from direct object NP to Specv [tree diagram rooted in vP]

There are three relevant X-Y dependencies to be considered in (30), viz., V-N_2 (V selects the head of a direct object NP_2), v-V (v selects the head of its complement VP), and v-N_1 (the head of the external argument NP_1 is selected by v). Of these, only the V-N_2 dependency intervenes between α_i and α_{i+1}: α_i m-commands V; N_2 m-commands α_{i+1}; and it is not the case that V both m-commands α_i and c-commands α_{i+1} (the latter is true but the former is not). In contrast, the v-V dependency does not intervene between α_i and α_{i+1}: α_i m-commands v; V m-commands α_{i+1}; but it is the case that v both m-commands α_i and c-commands α_{i+1}. 17 Third, the v-N_1 dependency does not intervene either: α_i m-commands v; but N_1 does not m-command α_{i+1}; furthermore, as we have just seen, v as an X makes clause (c) of (29) false. There are further dependencies that eventually need to be taken into account, but they will fail to intervene between α_i and α_{i+1} because one of their members is too deeply embedded to carry out m-command (for instance, this holds for the D head of DP, assuming that N selects D); so we can conclude that there is a unique intervening V-N_2 dependency with extraction from a direct object NP. Consider next extraction from a subject NP in Specv, to a higher Specv, as in (31). 18

(31) Dependency intervention with extraction from subject NP to Specv [tree diagram rooted in vP]

Let us focus again on the three X-Y dependencies V-N_2, v-V, and v-N_1. This time, the V-N_2 dependency does not intervene between α_i and α_{i+1}; the reason is that N_2 does not m-command α_{i+1}. As before, the v-V dependency does not intervene, though for a different reason: α_i m-commands v, and v does not both m-command α_i and c-command α_{i+1}, so clauses (a) and (c) of (29) are met; however, V does not m-command α_{i+1}, so clause (b) fails.
Still, there is again a unique intervening dependency, viz., v-N_1: α_i m-commands v; N_1 m-commands α_{i+1}; and, as we have just seen, v m-commands α_i but does not c-command α_{i+1} (whereas v m-commands α_{i+1}, thus supporting the use of c-command rather than m-command in the second subclause of (29-c)). For present purposes, (30) and (31) are the core contexts of extraction from NP. However, more generally, it can be verified that there is an intervening X-Y dependency (often a unique one) in other extraction from NP scenarios as well. For instance, with extraction from an indirect object NP in SpecV, there is a unique intervening V-N_3 dependency; see (32-a). With extraction from a direct object NP scrambled to Specv, there will be two intervening dependencies, viz., v-N_2 and V-N_2; cf. (32-b).

17 More generally, with movement to phase edge positions, the phase head itself can never be part of an intervening dependency if it c-commands the lower member of a link of a movement chain (but it can be if the dependency goes into a specifier, which we will turn to momentarily). This assumption is mainly made so as to reduce the number of intervening head-head dependencies, and keep evaluations as simple as possible. A version of the present approach where phase head dependencies always qualify as interveners would also be perfectly feasible.

18 See Müller (2011) for arguments that phase edges are not recursive, i.e., an item dominated by a category in a phase edge position is not in a phase edge position itself, and must undergo movement to a higher specifier in order to reach such a position.
(32) a. Dependency intervention with extraction from indirect object NP to Specv [tree diagram]
b. Dependency intervention with extraction to Specv from direct object NP scrambled to Specv [tree diagram]

Thus, we can conclude that in all these scenarios where extraction takes place from an NP in a specifier or complement position, there is a dependency intervening between the moved item and its base position. 19

Based on these assumptions, we postulate that the CED, as a constraint on X-Y dependencies, plays a dual role in harmony evaluation. On the one hand, it is a negative constraint, just like MC and EC are: The CED registers a violation if it is violated by an output (and the strength of the violation depends on the strength of the X-Y dependency that gives rise to it). On the other hand, however, the CED is also a positive constraint, unlike MC and EC: It assigns a reward if it is satisfied. Positive constraints of this type are difficult to implement in standard parallel optimality theory (because of an Infinite Goodness problem arising according to which one could in principle carry out an infinite number of processes yielding rewards from a given constraint), but as noted by Kimper (2016), this problem vanishes under harmonic serialism, where input and output can be separated by at most one operation. Kimper observes that adopting positive constraint evaluation is empirically advantageous in the area of autosegmental spreading in phonology; and it turns out to also give rise to a much simpler account of the natural predicate effect with extraction from NP than would otherwise be available. Positive evaluation of the CED has the consequence that if an X-Y dependency satisfies the constraint, it can yield an additional reward, depending on the weight assigned to the X-Y dependency via ∆P_{X|Y}.

19 Scrambling from an (in-situ) object NP may target a position preceding an in-situ subject, as in the examples in (8); this case falls under (30). However, such scrambling may also end up in a position following a subject, with an identical natural predicate effect arising; compare (8-a), (8-b) with (i-ab). In these contexts, there would in fact be no intervening V-N dependency if scrambling were to target the local SpecV position (V would m-command the moved item and c-command its trace). We assume that scrambling in German always targets Specv, with the option of subsequent fronting of the subject to a position preceding the direct object's landing site (either via scrambling to Specv again, or via optional movement to SpecT).

Analyses
Let us look at some consequences. Suppose that MC is associated with a weight of 4.0, and EC with a weight of 5.0. Based on just these two constraints, the default consequence is that movement (or, in fact, any other kind of structure-building) is not possible: An output that carries out movement (in the presence of a designated feature [•F•]) will incur a violation (-1) of EC, and end up with a harmony value of -5.0. In contrast, a competing output that fails to apply movement will only trigger a violation (-1) of MC, therefore has an overall harmony value of -4.0 (other things being equal), and will thus always be selected. On this view, to bring about movement (i.e., to make the output with movement optimal), it is necessary to get a reward from the remaining constraint, CED. 20 We take the CED to be associated with a weight of 7.5. Under these assumptions, a first prediction is that NP specifiers (subjects, indirect objects, and moved NPs) are invariably islands. Movement from a position within NP to the next edge of a phase will always violate the CED, and thus the bias against the movement-inducing MC will actually be strengthened. As we have seen, there are intervening dependencies in these environments: There is an intervening v-N dependency with extraction from subject NPs (see (31)), there is an intervening V-N dependency with extraction from indirect object NPs (see (32-a)), and there is an intervening v-N dependency (plus an intervening V-N dependency) with extraction from scrambled objects (see (32-b)). Consequently, the CED springs into action here, and rules out extraction. This is shown for the case of extraction from a subject NP in (33).
(33)
                          MC (w = 4.0)   EC (w = 5.0)   CED (w = 7.5)   H
O_1: XP_1 extracted                      -1             -1              -12.5
O_2: XP_1 in situ         -1                                            -4.0

In (33), output O_2 leaves XP_1 in situ, within the subject NP in Specv, even though, by assumption, there is a featural trigger for it. This gives rise to a -1 violation of the MC with weight 4.0, and to a harmony score of -4.0. On the other hand, O_1 extracts XP_1 out of the subject NP in Specv, to an outer Specv position, as required by MC (and ultimately by the PIC). This violates EC, yielding a violation score of -5.0. However, in addition, the CED is also violated since there is an intervening v-N dependency, and NP is not a sister of v. It is clear that, whatever the weight of the v-N dependency is, the constraint profile of the output that employs movement is thereby further worsened. For the sake of concreteness, we have registered a -1 violation of CED with O_1, yielding an overall harmony score of -12.5; but essentially the same result would have been obtained if the v-N dependency had a weight of, say, 0.01 (with -5.075 as the overall harmony score). The fact that the in-situ candidate O_2 wins this competition is, as such, not yet fatal. However, it is clear that XP_1 movement to the eventual target position later in the derivation (unless this already is the final landing site, as with local scrambling) will now eventually give rise to a fatal violation of the inviolable PIC.

Consider next the consequences that arise for extractions from NPs that are complements of V, i.e., direct objects. In this scenario, the CED is not violated. However, this does not yet suffice to permit extraction from the complement domain of N to the phase edge of v; in addition, there must be a sufficient reward from the CED (with weight 7.5) generated by an intervening V-N dependency. This reward may then render fatal the MC violation incurred by the output that does not apply movement, by lessening the EC violation incurred by the output that does.
The reward is big enough in the well-formed cases of extraction from NP (i.e., where a natural predicate is involved, with a strength >0.133), and too small in the ill-formed cases of extraction from NP (where V and N do not enter a tight relation, with a strength <0.133). 21 The dividing line follows from the weights: the movement candidate wins only if the CED reward (7.5 times the strength of the dependency) exceeds the difference between the EC and MC weights (5.0 - 4.0 = 1.0), i.e., only if the strength exceeds 1.0/7.5 ≈ 0.133. To illustrate this, we will focus on two weights assigned to V-N dependencies that are close to the dividing line between V-N dependencies that permit extraction and V-N dependencies that do not permit extraction; recall (20). Suppose first that the V-N dependency is equipped with a numerical value of 0.12 (roughly the strength associated with Buch stehlen ('book steal')). As shown in (34), this leads to a reward of 0.9 provided by the CED. Thus, the harmony score of the output that employs movement (i.e., O_1) is improved. However, (34) also shows that this does not yet suffice to license movement; the EC violation incurred by movement is still too strong, and leaving XP_1 in situ, as in O_2, remains the most harmonic strategy.
(34) Optimization of extraction from direct object, ∆P_{V|N} → 0.12:
                          MC (w = 4.0)   EC (w = 5.0)   CED (w = 7.5)   H
O_1: XP_1 extracted                      -1             +0.12           -4.1
O_2: XP_1 in situ         -1                                            -4.0

21 As a matter of fact, the idea that specific types of head-head dependencies can extend locality domains and permit extraction from XP which is otherwise blocked is not new. Koster (1987) postulates a Bounding Condition according to which each XP is a locality domain that as such blocks movement (and other processes), and that can only be made transparent by so-called "dynasties" of heads that stand in a government relation. Baker (1988) proposes that each XP is a priori a minimality barrier that can only be made transparent by movement of a head Y_1 to the next higher head X_2 that takes YP_1 as its sister; such head movement can be abstract (i.e., invisible), in which case it is signalled by a co-indexing of the two heads involved. Similarly, Staudacher (1990) suggests strengthening Chomsky's (1986) concept of L-marking to head-marking; on this view, a YP_1 is a barrier if it is not a complement of a head X_2 that specifically selects the head Y_1 of YP_1. Of course, none of these (and other, related) approaches can accommodate the frequency of V-N dependencies, in whatever form.
Things are different when the V-N dependency has a weight of 0.15, though (approximately the strength associated with Bericht schreiben ('report write')). As shown in (35), in this case the reduction effect brought about by the 1.125 reward for CED satisfaction is sufficiently large to permit the unavoidable violation of EC in the movement candidate O_1 (whose harmony score is -5.0 + 1.125 = -3.875, as against -4.0 for O_2); and the MC violation incurred by the in-situ candidate O_2 becomes fatal.
(35) Optimization of extraction from direct object, ∆P_{V|N} → 0.15:
                          MC (w = 4.0)   EC (w = 5.0)   CED (w = 7.5)   H
O_1: XP_1 extracted                      -1             +0.15           -3.875
O_2: XP_1 in situ         -1                                            -4.0

It is clear that all V-N dependencies with a weight higher than 0.15 (i.e., with higher ∆P_{V|N} values, as with, e.g., Buch lesen ('book read') or Buch schreiben ('book write'), which have normalized ∆P values in the 0.5 to 0.6 range) will ceteris paribus also permit extraction from a complement NP, and that all V-N dependencies with a weight smaller than 0.12 will invariably block it. Thus, by assuming frequency-based ∆P values to act as weights associated with V-N dependencies, the concept of a natural predicate can be given a precise characterization, and asymmetries arising with extractions from NP in German can be derived.
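The harmony computations in (33) to (35) can be replayed numerically. The Python sketch below uses the constraint weights given in the text (MC 4.0, EC 5.0, CED 7.5); the function names `harmony` and `compete`, the dictionary encoding of constraint profiles, and the decision to let the in-situ candidate win a tie are assumptions introduced here for illustration.

```python
# Constraint weights as given in the text.
WEIGHTS = {"MC": 4.0, "EC": 5.0, "CED": 7.5}


def harmony(profile):
    """Weighted sum of violations (negative) and rewards (positive)."""
    return sum(WEIGHTS[name] * score for name, score in profile.items())


def compete(ced_score):
    """Compare the movement candidate O_1 with the in-situ candidate O_2.

    ced_score is negative for a CED violation (specifier NPs) and positive
    for a CED reward (complement NPs); its size is the dependency strength.
    """
    h_move = harmony({"EC": -1.0, "CED": ced_score})  # O_1: movement applies
    h_stay = harmony({"MC": -1.0})                    # O_2: XP stays in situ
    # Assumption: ties go to the in-situ candidate.
    winner = "movement" if h_move > h_stay else "in situ"
    return round(h_move, 3), round(h_stay, 3), winner


# Strength above which the CED reward outweighs the EC/MC difference.
THRESHOLD = (WEIGHTS["EC"] - WEIGHTS["MC"]) / WEIGHTS["CED"]

print(compete(-1.0))   # subject NP, as in (33)
print(compete(0.12))   # Buch stehlen, as in (34)
print(compete(0.15))   # Bericht schreiben, as in (35)
print(round(THRESHOLD, 3))
```

The threshold of roughly 0.133 is not stipulated separately; it falls out of the three constraint weights.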

Consequences
Needless to say, the present analysis makes a lot of further predictions, and raises several new questions. One obvious consequence is that not just extraction from NP, but in fact all instances of movement that are not extremely local will depend on an intervening head-head dependency giving rise to a CED reward that sufficiently reduces the negative harmony value incurred by the EC violation inherent to movement, so as to make the output that carries out movement more harmonic than the output that does not (and that thereby violates MC). For instance, given (29), a movement step from Specv to SpecC (as in standard cases of wh-movement) crosses an intervening T-v dependency: In (36), α_i m-commands T, v m-commands α_{i+1}, and whereas T c-commands α_{i+1}, it is not the case that T m-commands α_i. In contrast, the C-T dependency does not intervene in (36) since C m-commands α_i and c-commands α_{i+1}. If nothing more is said, the T-v dependency must be strong enough to bring about a sufficient CED reward to license the movement step, i.e., T-v must be associated with a weight >0.133. We will assume that, more generally, when a head-head dependency involves two functional categories, or one functional category and one lexical category, the weight associated with it is typically very high; this follows naturally by determining the ∆P values: A category like v is an extremely good predictor for a category like T, even if it is assumed that the particular phonological realizations of v and T are taken to be decisive (rather than the abstract functional category labels); the reason is that the number of different manifestations of both v and T is very small (and v and T usually co-occur).

A further consequence of the analysis concerns EPP-driven movement of subject NPs to SpecT, which we take to be optional in German.
Given a clause structure as in (37), there is no head-head dependency intervening between two members of a movement chain α_i and α_{i+1} (T m-commands α_i and c-commands α_{i+1}).
Consequently, the CED cannot be violated in (37), but there is also no reward since there is no dependency that satisfies the constraint non-trivially (in general, trivial constraint satisfaction by dependencies must not be able to generate a reward). This means that movement should ceteris paribus be blocked in (37) (with the in-situ candidate violating MC being more harmonic than the movement candidate violating the constraint EC, which has a greater weight than MC). Several options suggest themselves to solve this problem. A simple solution would be that the EPP feature triggering (internal or external) Merge with T has more strength than other features triggering movement. 22

Next, recall that varieties of German allow for the option of moving an R-pronoun wo ('where') or da ('there') as the pronominal argument of a preposition, and this may also further involve extraction from an object NP, as in (1-b) and (1-d). R-pronoun extraction from NP is determined by exactly the same structural and lexical factors that PP extraction from NP is determined by; in the present context, this implies that the N-P dependency does not directly interact with the V-N dependency in the same optimization, neither by contributing additional weight, nor by reducing weight. The facts fall into place if it is assumed (i) that PP is accompanied by a functional projection pP on top of it, (ii) that pP qualifies as a phase, and (iii) that N continues to select P (deviating from strict locality in this environment; see footnote 1) but does not select p (cf. Riemsdijk (1978), Koopman (2000), and Abels (2012) for discussion of relevant proposals); cf. (38) (compare (30)).
Under these assumptions, an R-pronoun needs to reach Specp before moving on; and since there is no N-p dependency and the P item of the N-P dependency fails to m-command α_{i+1} in the specifier of pP, there is no additional intervening dependency to consider.
Finally, it can be noted that the present approach opens up the possibility of implementing Featherston's (2004) findings regarding the role of frequency in extraction from CPs in German in a very direct way. In German (and many other languages), the legitimacy of extraction from an embedded declarative clause headed by dass ('that') depends both on the grammatical function ([...]

Featherston's (2004) observation is that bridge verbs are more frequent than non-bridge verbs. Thus, the mean log frequencies of CP-embedding verbs strongly correlate with the option of extraction from CP (which was determined by experiments involving grammaticality judgements). These mean log frequencies are derived by collecting the absolute frequencies of the verbs in four different corpora, converting the numbers by applying a logarithm function, summing the four resulting numbers for each verb, and finally dividing the sum by four. This is shown for the verbs sagen ('say'), glauben ('believe'), and bezweifeln ('doubt') in (40). Interestingly, even though Featherston (2004) has, in our view, convincingly identified frequency of the matrix verb as a factor determining the option of extraction from CP in German, the grammatical theory he employs (the Decathlon Model; see Featherston (2005)), while designed to predict frequencies in outputs, does not in fact incorporate frequency as a grammatical building block that may interact with other building blocks (like MC, EC, or CED in the present approach) to license or block extraction. Accordingly, Featherston (2004) remains silent on how to actually account for the frequency effect with the bridge verb phenomenon in grammatical theory. In contrast, it seems clear how the effect of frequency on extraction from CP could be modelled in the present approach.
First, instead of bare V frequencies, ∆P_{V|C} values for V-C dependencies that intervene between a movement chain member α_{i+1} in SpecC and its immediate chain antecedent α_i in the matrix Specv have to be determined (we have not done this but are reasonably confident that the results will be very similar to Featherston's results). And second, normalized versions of these numbers are then predicted to bring about CED-based rewards that permit extraction from CP with highly frequent V-C dependencies (i.e., V-C dependencies that form a bridge).
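The mean log frequency measure attributed to Featherston (2004) above (log-transform the absolute frequency of a verb in each of four corpora, sum, divide by four) can be sketched as follows. This is a minimal illustration, not Featherston's actual procedure: the function name `mean_log_frequency` is hypothetical, the choice of base 10 is an assumption (the text only speaks of "a logarithm function"), and the counts are invented placeholders rather than corpus data.

```python
import math


def mean_log_frequency(counts):
    """Mean of the log-transformed absolute frequencies of one verb.

    counts: one absolute frequency per corpus (four corpora in the text).
    Assumption: base-10 logarithm; all counts must be greater than zero.
    """
    return sum(math.log10(c) for c in counts) / len(counts)


# Invented placeholder counts for a single verb in four corpora.
example_counts = [1000, 100, 10, 10000]
example_value = mean_log_frequency(example_counts)
```

In the approach advocated here, such bare verb frequencies would be replaced by ∆P values for the V-C dependency.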

Concluding Remarks
It is a standard observation that extraction of PPs and R-pronouns from direct object NPs in German is dependent on V and N forming a natural predicate. In this article, we have argued that this can and should be conceived of as a frequency effect: Only those V-N dependencies permit extraction from a direct object NP that have a sufficiently high ∆P_{V|N} value. In other words: Frequency can act as a language-external grammatical building block that transparently and directly interacts with language-internal grammatical building blocks regulating syntactic movement. We would like to contend that such a finding is difficult to reconcile with virtually all of the more widely adopted grammatical theories. It seems that the best one can do in standard approaches in order to implement the generalization is to view frequency as a factor determining the learning of syntactic operations, or rules. On such a view, highly frequent V-N dependencies could have become equipped with a special diacritic in the course of language acquisition, and the decision on whether movement can or cannot apply could then be made sensitive in the grammar to the presence or absence of this diacritic. 23 We take it to be uncontroversial that such a use of ad hoc diacritics whose sole purpose is to encode some other well-defined, independently existing piece of information that cannot be available in the grammar for systematic reasons is to be avoided if at all possible. As we have tried to show, Gradient Harmonic Grammar is unique among current theories of grammar in postulating that linguistic objects are associated with numerical weights that then interact with the weights assigned to the language-internal grammatical constraints, and that therefore make implementing frequency values a straightforward option.
Our approach combines standard constraint evaluation of Gradient Harmonic Grammar with standard Minimalist derivations and standard Harmonic Serialism (which independently suggests itself for Minimalist derivations due to its inherently derivational nature). The only innovative assumption that we had to make is that the weights of V-N dependencies (as well as of other head-head dependencies) are determined by frequency. 24

In addition to this substantive conceptual difference, a diacritic-based approach where frequency only plays a role in language acquisition and an approach where frequency acts as a language-external building block in the grammar itself are also not extensionally equivalent. At least in principle, they make different empirical predictions when it comes to variation in the domain of extraction from NP. Indeed, there seems to be quite a bit of variation with extraction from NP. In Gradient Harmonic Grammar, there are two natural sources for this: First, different weights of constraints (MC, EC, or CED, in the case at hand) can produce different optimal outputs. This implies that speakers with slightly different weights assigned to crucial constraints may simply have different thresholds for accepting or rejecting extraction from direct object NPs, without there being any weight differences with respect to V-N dependencies. Second, different weights of V-N dependencies can of course also produce different optimal outputs. To end this article, it is this latter consequence that we would briefly like to focus on. 25

Corpora like the DWDS core corpus can only approximate the frequency of V-N dependencies in the external and internal linguistic inputs accessible to speakers. If the external linguistic input (i.e., the body of linguistic data outside of a speaker, which is accessible by hearing or reading) is vastly different, different outputs may become grammatical. To give a concrete example: Suppose that a speaker is immersed in a culture which is just like that of a prototypical German-speaking community, except that there is a tradition of throwing books in the air after reading them. In that case, Buch ('book') will be a much better predictor for werfen ('throw') than it is in (19), and ∆P_{werfen|Buch} will be much higher. Here we may then expect that sentences like Worüber hat Fritz ein Buch (in die Luft) geworfen? ('about what has Fritz a book (in the air) thrown') will become well formed. The same conclusion can be drawn for internal linguistic inputs (i.e., all the acts of thinking in terms of language without ever externalizing it, conducting inner monologues, and the like). Suppose, for instance, that some Nazi speaker fantasizes about burning books all the time and very clearly distinguishes between authors, or between topics, of the books that he wants to burn. In this scenario, ∆P_{verbrennen|Buch} will go up, and it would seem to be likely that this speaker will accept sentences like Über wen soll ich heute ein Buch verbrennen? ('about whom should I today a book burn'), which are certainly not well formed otherwise for most speakers (unless they have extremely reduced thresholds).

These two thought experiments make it possible to distinguish empirically between the diacritic-based approach to frequency effects in extraction from NP and the purely frequency-based approach that we have pursued. In the former approach, frequency determines language acquisition and ceases to be active afterwards, whereas frequency stays active as a factor in the latter approach, and a change in frequency is expected to potentially lead to a change in the application of grammatical operations. Therefore, a change of the external linguistic input or of the internal linguistic input at any point in time is predicted to result in different extraction options under the direct approach to frequency effects advocated in the present paper, but not under the indirect approach that confines the role of frequency to language acquisition. Effects of the type hypothesized in this paragraph may then be taken as a further possible argument in support of the idea that frequency is directly active as a building block of grammar. 26

23 Arguably, the situation is basically identical in Construction Grammar, where entrenchment may make frequent V-N constructions amenable to extraction in the course of language acquisition; but frequency as such remains a factor relevant for learning a language here, and is not an actual building block active in the grammar (i.e., from a Construction Grammar perspective, the set of constructions exhibiting different degrees of abstractness, and the inheritance networks connecting them).

24 In contrast, the weights of individual lexical items (and constituents more generally) in general do not seem to correspond to frequency; see Smolensky (2017), Lee (2018), Müller (2019), and also the earlier proposals in Ross (1973a;b;1975).

25 A third possible source of variation arises if a stochastic component is added to the grammar; see, e.g., Hayes (2001), Bresnan et al. (2001), and Boersma & Pater (2016). We will not pursue this option here further; the present system is strictly categorical. However, it seems clear that the significant degree of variation especially in the low-frequency domain of V-N dependencies would naturally lend itself to such an approach.

26 [...] which do not permit extraction from NP) would seem to be traceable back to independent causes; in particular, an extreme overall rarity of a V-N dependency looks like an obvious additional factor.
Next, as noted in footnote 7, we have investigated three alternative measures of collocational strength, in addition to normalized ∆P values. These are, first, Mutual Information (MI); second, t-score; and third, an account for determining (asymmetrical) collocational strength that we will refer to as Alt. Let us begin with Mutual Information (cf. Church & Hanks (1990)). This is a measure that results in high values for low-frequency W1-W2 combinations if W1 and W2 are very faithful to each other. If a word occurs only once in a corpus, it will have high MI values for the preceding (and following) word, whatever those words are. Thus, Mutual Information rewards low-frequency collocations (as long as at least one of the members does not occur with many other words, which is trivially true for a word count of 1, for example). Second, the t-score (cf. Church et al. (1991)) is sensitive to the overall frequency of the collocation W1-W2 in the corpus. It produces high values even if either W1 or W2 occurs frequently with other words. And third, as yet another variation, a reviewer has suggested an asymmetrical indicator of collocational strength based on the frequency of O given C, relative to the overall frequency of O, accompanied by log-transformed and scaled values. Table 3 shows that while the results obtained with these measures differ from the results under ∆P, and also from one another, in several respects, the basic conclusions carry over unchanged (under normalization), and (with the possible exception of MI) these alternative approaches could in principle also have been employed in the present analysis. 27 Still, it turns out that none of the alternatives manages to establish the near-perfect match with extraction options that is predicted by (normalized) ∆P.
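The contrast between these measures can be made concrete with a 2×2 contingency table. The Python sketch below computes raw ∆P (the probability of the outcome given the cue minus its probability in the absence of the cue), Mutual Information, and the t-score in their common textbook formulations. It is an illustration under stated assumptions: the function name `association_measures` and all counts are invented, the normalization applied to ∆P in this paper is not reproduced, and the reviewer's Alt measure is left out because its exact definition is not given here.

```python
import math


def association_measures(a, b, c, d):
    """Association scores for a cue (e.g. N) and an outcome (e.g. V).

    a: cue and outcome together      b: cue without outcome
    c: outcome without cue           d: neither cue nor outcome
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n          # expected co-occurrence count
    delta_p = a / (a + b) - c / (c + d)       # P(outcome|cue) - P(outcome|no cue)
    mi = math.log2(a / expected)              # pointwise Mutual Information
    t_score = (a - expected) / math.sqrt(a)   # t-score
    return delta_p, mi, t_score


# Invented counts: a reasonably frequent combination ...
frequent = association_measures(a=30, b=70, c=170, d=9730)
# ... and a combination whose first member occurs exactly once in the corpus.
hapax = association_measures(a=1, b=0, c=9, d=9990)

print("frequent:", frequent)
print("hapax:   ", hapax)
```

With these invented counts, the one-off combination receives a higher MI value than the frequent one but a lower t-score, which is the behaviour described above.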